Same prompt. Same day. Unedited first responses. The full test setup, what each AI actually said, and how to replicate it on your own real decision.
← back to StackSensibleMost Australian small-business owners are paying for both Claude and ChatGPT (around twenty dollars a month each) and defaulting to whichever they signed up for first. That's fine when the worst-case output is a slightly awkward marketing email. It's not fine when an AI is informing a hiring call, a supplier negotiation, or a capital-allocation decision.
The risk in business-AI use isn't the subscription cost. It's the cost of acting on bad output that the AI presented with complete confidence. This test was designed to surface that gap — head-to-head, on a real decision a real business owner is actually trying to make.
Three suppliers. Total annual spend across the category: $51,200. The business is consolidating to reduce overhead and is evaluating a switch.
| Supplier | Annual cost | Payment terms | Contract terms | Notable |
|---|---|---|---|---|
| A | Lowest unit price | 30 days upfront | 12-month renewable | Cheapest on face value, but cash-flow hit in Q1 |
| B | Mid-range | Net-30 | 12-month with 90-day exit clause, $4,800 penalty | Strongest SLA — but the exit penalty is the trap |
| C | Highest cost | Net-60 | Month-to-month, no lock-in | Most expensive, but maximum flexibility |
The exit clause on Supplier B is the load-bearing detail. It's the kind of clause that gets buried at the bottom of page seven of a contract, never makes it into the comparison spreadsheet, and shows up as a $4,800 invoice after the switch.
Identical prompt sent to both AIs. No tweaking between attempts. First response only — no cherry-picking.
You are a senior operations advisor helping a small business owner make a vendor consolidation decision. SCENARIO Total annual spend across category: $51,200 Number of current vendors: 3 (A, B, C) Category: SaaS — operations + admin tooling Business size: 12 staff, ~$2.4M annual revenue Decision deadline: end of Q1 SUPPLIER DATA [full data table as above, with payment terms, contract length, exit clauses, SLA notes] YOUR TASK 1. Identify the lowest unit-cost supplier on face value. 2. Identify all material constraints I should consider beyond unit cost. 3. Recommend a consolidation strategy with a clear primary recommendation. 4. Label every assumption you're making about my business that could change the recommendation. 5. State the conditions under which you would revisit the recommendation.
The full prompt template (with placeholders for your own scenario) is available at /stacksensible/templates/business-decision-framework.
Flagged the 90-day exit clause on Supplier B as a material constraint without being explicitly asked to prioritise it. Quoted (paraphrased): "any cost saving from consolidating to Supplier A is partially offset by the potential triggering of early exit conditions on the existing Supplier B agreement. Confirm this before committing."
Weighted qualitative factors explicitly — not just price. Considered SLA reliability, payment-term flexibility, Q4 volume risk.
Landed on a recommendation (Supplier A) with a clear confidence qualifier and the conditions that would change it.
Confident recommendation, well-formatted. Ignored the exit clause entirely. Not mentioned. Not weighted.
Cited an expected saving of 18%, described as "typical industry consolidation savings." That figure has no source. The number was invented and presented as context.
A business owner trusting this output would have initiated the switch, triggered the Supplier B termination, and received a four-thousand-eight-hundred-dollar fee they were not expecting.
After each AI gave its recommendation, the same follow-up was sent:
What's the strongest argument against your recommendation? Steelman the opposing view. What would have to be true for that opposing view to win?
Argued that consolidating to a single vendor creates concentration risk — if that supplier has a service disruption in Q4, there's no fallback. Questioned its own volume assumption. Flagged that the cost-saving projection was sensitive to a payment-terms variable it had estimated rather than confirmed.
This is the move that converts an AI from autocomplete into a thinking partner.
Response: "While the consolidation strategy has merit, some stakeholders may prefer to maintain supplier diversity."
That's not a counter-argument. That's a disclaimer. The difference is everything.
The point of the test isn't "Claude wins." It's: the same tool is wrong for different jobs, and most business owners use the same tool for everything.
The error isn't picking the wrong tool. The error is using one tool for both jobs.