Claude API vs Self-Hosted Qwen: A Cost Breakdown for AU SMBs

June 2026 · 6 min read · ROI & Business Case

Hand-drawn graph of a flat self-hosting cost line crossing a rising usage-based cost line

Qwen 3.5 is a capable open model under a clean Apache 2.0 licence. On paper, running it yourself looks cheaper than paying for a managed API. The honest comparison depends almost entirely on your volume, and most Australian SMBs sit on the wrong side of the line for self-hosting.

This is the comparison clients ask us for most often, usually after someone has seen the model is free to download and assumed the system around it is free too. It deserves real figures rather than a slogan, so here is how the two options actually stack up for a small Australian business.

The two cost shapes

The reason this decision confuses people is that the two options are not the same kind of cost. One scales with use. The other is mostly fixed, and you pay it no matter how busy the model happens to be.

  • Claude bills per request, so a quiet week costs less than a busy one

  • A self-hosted Qwen node costs roughly the same whether it runs flat out or sits idle

  • The crossover point is the volume where the fixed node finally becomes cheaper

  • Below that point, usage-based pricing wins on simple arithmetic

Once you see the two as different shapes rather than two prices, the rest of the decision gets much clearer.

The break-even question

API pricing scales with how much you use. Self-hosting is mostly a fixed cost you pay whether the model is busy or idle. The whole decision turns on where those two lines cross: the request volume at which a year of usage-based charges equals the annual cost of the node.

  • Low and uneven volume strongly favours an API like Claude

  • Very high and steady volume can favour self-hosting

  • The crossover point depends on your own traffic, so it has to be calculated rather than guessed

  • Idle capacity quietly worsens the self-hosted case

If you cannot keep a GPU node busy through the day, you are paying for capacity you do not use.
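The crossover can be sketched in a few lines of Python. This is a rough illustration rather than a pricing tool: it treats the API as a single blended cost per request and the node as one flat annual figure. The function names and the per-request price are assumptions made up for the example, and the price is not a published Claude rate.

```python
def break_even_requests_per_year(fixed_annual_cost: float,
                                 api_cost_per_request: float) -> float:
    """Annual request volume at which API spend equals the fixed node cost."""
    if api_cost_per_request <= 0:
        raise ValueError("api_cost_per_request must be positive")
    return fixed_annual_cost / api_cost_per_request


def cheaper_option(requests_per_year: float,
                   fixed_annual_cost: float,
                   api_cost_per_request: float) -> str:
    """Return 'api' or 'self-hosted' for a given annual request volume."""
    api_annual_cost = requests_per_year * api_cost_per_request
    return "api" if api_annual_cost < fixed_annual_cost else "self-hosted"


# ASSUMPTION: placeholder inputs for illustration only.
NODE_COST_PER_YEAR = 40_000.0   # flat node figure used in this post
PRICE_PER_REQUEST = 0.02        # hypothetical blended price, not a real rate

crossover = break_even_requests_per_year(NODE_COST_PER_YEAR, PRICE_PER_REQUEST)
print(f"Break-even: about {crossover:,.0f} requests a year")
print("At 500,000 requests a year:",
      cheaper_option(500_000, NODE_COST_PER_YEAR, PRICE_PER_REQUEST))
```

Swap in your own blended cost per request, taken from a real invoice or a pricing estimate, before reading anything into the result.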

A worked example

Consider a Melbourne SMB sending moderate daily traffic to a model for support drafts and internal tasks.

  • A self-hosted Qwen node can cost $40,000 a year as a flat commitment

  • The same workload on Claude might run well under that on usage-based pricing

  • The self-hosted node sits idle every evening and weekend

  • Each upgrade adds testing time the API user never spends

At this volume the flat node loses, because the business is paying for a full day of capacity to cover a few busy hours of real demand.
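To put a rough number on that idle time, the sketch below spreads the flat node cost over only the hours the node does real work. The 40 busy hours a week (five eight-hour days) is an assumption for illustration, not a figure from a real client, and the function names are hypothetical.

```python
HOURS_PER_WEEK = 24 * 7
WEEKS_PER_YEAR = 52


def utilisation(busy_hours_per_week: float) -> float:
    """Share of the week in which the node handles real demand."""
    return busy_hours_per_week / HOURS_PER_WEEK


def cost_per_busy_hour(fixed_annual_cost: float,
                       busy_hours_per_week: float) -> float:
    """Flat annual cost divided across the busy hours only."""
    if busy_hours_per_week <= 0:
        raise ValueError("busy_hours_per_week must be positive")
    return fixed_annual_cost / (busy_hours_per_week * WEEKS_PER_YEAR)


# ASSUMPTION: five eight-hour working days of real demand per week.
BUSY_HOURS = 40
NODE_COST_PER_YEAR = 40_000.0  # flat node figure used in this post

print(f"Utilisation: {utilisation(BUSY_HOURS):.0%}")
print(f"Cost per busy hour: ${cost_per_busy_hour(NODE_COST_PER_YEAR, BUSY_HOURS):.2f}")
```

The lower the utilisation, the more each hour of real work has to carry, which is the arithmetic behind the idle evenings and weekends.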

The costs that do not show up on the invoice

The headline compute figure is only part of the self-hosted bill. The rest hides in people and risk, and it lands every month rather than once.

  • An engineer who can run inference, scaling, and security, which often consumes a large share of a $120,000 salary

  • On-call cover for nights and weekends when a node falls over mid-task

  • Privacy Act work for any personal data the model touches

  • Re-testing and integration each time you move to a newer Qwen release

Add these and a node that looked like $40,000 can pass $90,000 once it is fully production-ready and safe to put in front of real work.
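A simple way to keep these items honest is to add them up in one place. The sketch below is a minimal tally, assuming you can put an annual dollar figure on each line. The class name, field names, and the 40% engineer share are hypothetical placeholders, and the other line items are left at zero rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedCosts:
    """Annual self-hosting costs; every figure is supplied by the caller."""
    node: float                      # flat compute cost
    engineer_salary: float           # full salary of the engineer involved
    engineer_share: float            # fraction of their time spent on the node
    on_call: float = 0.0             # nights and weekends cover
    privacy_compliance: float = 0.0  # Privacy Act work
    upgrade_testing: float = 0.0     # re-testing for each new Qwen release

    def total(self) -> float:
        if not 0.0 <= self.engineer_share <= 1.0:
            raise ValueError("engineer_share must be between 0 and 1")
        return (self.node
                + self.engineer_salary * self.engineer_share
                + self.on_call
                + self.privacy_compliance
                + self.upgrade_testing)


# ASSUMPTION: 40% of one engineer's time is a placeholder, not a benchmark.
costs = SelfHostedCosts(node=40_000.0,
                        engineer_salary=120_000.0,
                        engineer_share=0.4)
print(f"Node plus engineer time: ${costs.total():,.0f} a year")
```

Fill in the on-call, compliance, and upgrade lines with your own estimates. The total is the figure to compare against usage-based pricing, not the compute cost alone.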

When self-hosting Qwen does win

Qwen on your own hardware is the right call for a real set of businesses, and we say so when the numbers point that way. The case is strongest when several conditions line up at the same time.

  • A high, steady request volume that keeps a GPU node busy through the working day

  • A narrow task you can tune once on Qwen and then largely leave alone

  • An in-house engineer who already owns similar infrastructure

  • A firm requirement to keep every request on Australian soil at all times

When most of those hold, a flat $40,000 node can undercut usage-based pricing by a wide margin. The mistake is assuming they hold when they do not. A Sydney firm running a model for a few hours a day rarely meets even two of them, and pays for the gap.

Reading the result

Most Australian SMBs fall below the volume where self-hosting wins. They would pay a flat cost for capacity they barely touch, plus the staffing to keep it running. The API model fits their uneven, business-hours usage far better.

  • Measure your real request volume before deciding

  • Compare flat self-hosted cost against usage-based pricing

  • Factor in the engineer you would need to keep a self-hosted node running

We run the break-even maths on your actual traffic and recommend the cheaper option, even when that turns out to be Claude. Book a costing session and we will price both paths in plain figures.

Reading your own usage curve

The break-even point is personal to your traffic, so look at the shape of it rather than a headline price someone quoted you.

  • Plot requests by hour across a normal week

  • Note how many hours sit near zero

  • Compare that idle time against a flat node cost

For a Melbourne SMB with business-hours demand, the curve usually makes the case on its own, because a fixed node you cannot keep busy loses to usage-based pricing every time the bill arrives.
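If your logs can give you request counts per hour, the idle share takes only a few lines to work out. The sketch below assumes a list of 168 hourly counts, one per hour from Monday midnight to Sunday night. The sample week is synthetic, generated to mimic business-hours demand, and the helper names are hypothetical.

```python
def business_hours_week(requests_per_busy_hour: int = 50,
                        open_hour: int = 9,
                        close_hour: int = 17) -> list[int]:
    """Synthetic week of hourly request counts: weekdays only, open to close."""
    week = []
    for day in range(7):          # 0 = Monday ... 6 = Sunday
        for hour in range(24):
            busy = day < 5 and open_hour <= hour < close_hour
            week.append(requests_per_busy_hour if busy else 0)
    return week


def idle_share(hourly_requests: list[int], idle_threshold: int = 0) -> float:
    """Fraction of hours with request counts at or below the threshold."""
    if not hourly_requests:
        raise ValueError("hourly_requests must not be empty")
    idle_hours = sum(1 for count in hourly_requests if count <= idle_threshold)
    return idle_hours / len(hourly_requests)


# ASSUMPTION: synthetic data standing in for a real export of your traffic.
sample_week = business_hours_week()
print(f"Hours in sample: {len(sample_week)}")
print(f"Idle share of the week: {idle_share(sample_week):.0%}")
```

Replace the synthetic week with your own hourly counts, and raise `idle_threshold` if a trickle of overnight requests should still count as idle.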
