Claude API vs Self-Hosted Qwen: A Cost Breakdown for AU SMBs

Qwen 3.5 is a capable open model under a clean Apache 2.0 licence. On paper, running it yourself looks cheaper than paying for a managed API. The honest comparison depends almost entirely on your volume, and most Australian SMBs sit on the wrong side of the line for self-hosting.

This is the comparison clients ask us for most often, usually after someone has seen the model is free to download and assumed the system around it is free too. It deserves real figures rather than a slogan, so here is how the two options actually stack up for a small Australian business.

The two cost shapes

The reason this decision confuses people is that the two options are not the same kind of cost. One scales with use. The other is mostly fixed, and you pay it no matter how busy the model happens to be.

Claude bills per request, so a quiet week costs less than a busy one
A self-hosted Qwen node costs roughly the same whether it runs flat out or sits idle
The crossover point is the volume where the fixed node finally becomes cheaper
Below that point, usage-based pricing wins on simple arithmetic

Once you see the two as different shapes rather than two prices, the rest of the decision gets much clearer.

The break-even question

Because API spend rises with use and a self-hosted node is mostly a fixed cost you pay whether the model is busy or idle, the whole decision turns on the request volume at which those two cost lines cross.

Low and uneven volume strongly favours an API like Claude
Very high and steady volume can favour self-hosting
The crossover point is where the real maths lives
Idle capacity quietly worsens the self-hosted case

If you cannot keep a GPU node busy through the day, you are paying for capacity you do not use.
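The crossover reduces to a single division. Here is a minimal sketch in Python, assuming one flat yearly node cost and one average price per request. The $0.02 per-request figure is a placeholder for illustration only, not a quoted Claude price, and the function names are hypothetical.

```python
def break_even_requests(fixed_annual_cost: float, api_cost_per_request: float) -> float:
    """Annual request volume at which a flat node and usage-based pricing cost the same."""
    if api_cost_per_request <= 0:
        raise ValueError("api_cost_per_request must be positive")
    return fixed_annual_cost / api_cost_per_request


def cheaper_option(annual_requests: float,
                   fixed_annual_cost: float,
                   api_cost_per_request: float) -> str:
    """Return which option costs less at a given yearly volume."""
    api_cost = annual_requests * api_cost_per_request
    return "api" if api_cost < fixed_annual_cost else "self-hosted"


FIXED_NODE_COST = 40_000             # flat yearly figure used in this article
ASSUMED_API_COST_PER_REQUEST = 0.02  # ASSUMPTION: placeholder, not a real price

if __name__ == "__main__":
    volume = break_even_requests(FIXED_NODE_COST, ASSUMED_API_COST_PER_REQUEST)
    print(f"Break-even at about {volume:,.0f} requests a year")
    print(cheaper_option(300_000, FIXED_NODE_COST, ASSUMED_API_COST_PER_REQUEST))
```

Under those assumed inputs the lines cross at around two million requests a year. Swap in your own average cost per request, taken from real invoices or a pricing calculator, before reading anything into the result.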

A worked example

Consider a Melbourne SMB sending moderate daily traffic to a model for support drafts and internal tasks.

A self-hosted Qwen node can cost $40,000 a year as a flat commitment
The same workload on Claude might run well under that on usage-based pricing
The self-hosted node sits idle every evening and weekend
Each upgrade adds testing time the API user never spends

At this volume the flat node loses, because the business is paying for a full day of capacity to cover a few busy hours of real demand.
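One way to put a number on "a full day of capacity to cover a few busy hours" is to spread the flat cost over only the hours the node does real work. The sketch below assumes four busy hours a day, five days a week, which is an illustrative guess rather than a measured figure. The function names are hypothetical.

```python
HOURS_PER_WEEK = 24 * 7
WEEKS_PER_YEAR = 52


def utilisation(busy_hours_per_day: float, busy_days_per_week: float) -> float:
    """Fraction of the week the node is actually serving requests."""
    return (busy_hours_per_day * busy_days_per_week) / HOURS_PER_WEEK


def cost_per_busy_hour(fixed_annual_cost: float,
                       busy_hours_per_day: float,
                       busy_days_per_week: float) -> float:
    """Flat yearly cost divided by the hours of real demand in a year."""
    busy_hours_per_year = busy_hours_per_day * busy_days_per_week * WEEKS_PER_YEAR
    return fixed_annual_cost / busy_hours_per_year


if __name__ == "__main__":
    # ASSUMPTION: 4 busy hours a day, 5 days a week; $40,000 is the article's figure.
    print(f"Utilisation: {utilisation(4, 5):.1%}")
    print(f"Cost per busy hour: ${cost_per_busy_hour(40_000, 4, 5):,.2f}")
```

With those inputs the node is busy for 20 of 168 hours a week, about 12 percent, and each busy hour carries roughly $38 of the flat cost before any staffing is counted.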

The costs that do not show up on the invoice

The headline compute figure is only part of the self-hosted bill. The rest hides in people and risk, and it lands every month rather than once.

An engineer who can run inference, scaling, and security, often costing a large share of a $120,000 salary
On-call cover for nights and weekends when a node falls over mid-task
Privacy Act work for any personal data the model touches
Re-testing and integration each time you move to a newer Qwen release

Add these and a node that looked like $40,000 can pass $90,000 once it is fully production-ready and safe to put in front of real work.
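The way those hidden items stack onto the compute figure can be sketched as a simple sum. Only the $40,000 node and the $120,000 salary come from this article. The engineer share and the on-call, compliance, and upgrade amounts below are assumptions picked to show the mechanics, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedCosts:
    compute: float                 # flat yearly node cost
    engineer_salary: float         # full yearly salary
    engineer_share: float          # fraction of that engineer's time spent on the node
    on_call: float = 0.0           # yearly cost of night and weekend cover
    compliance: float = 0.0        # yearly Privacy Act related work
    upgrade_testing: float = 0.0   # yearly re-testing for new model releases

    def total(self) -> float:
        """Fully loaded yearly cost of running the node."""
        return (self.compute
                + self.engineer_salary * self.engineer_share
                + self.on_call
                + self.compliance
                + self.upgrade_testing)


if __name__ == "__main__":
    costs = SelfHostedCosts(
        compute=40_000,
        engineer_salary=120_000,
        engineer_share=0.4,      # ASSUMPTION: illustrative share
        on_call=3_000,           # ASSUMPTION
        compliance=2_000,        # ASSUMPTION
        upgrade_testing=1_000,   # ASSUMPTION
    )
    print(f"Fully loaded: ${costs.total():,.0f} a year")
```

The engineer share dominates the total, so that is the input worth estimating most carefully for your own team.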

When self-hosting Qwen does win

Qwen on your own hardware is the right call for a real set of businesses, and we say so when the numbers point that way. The case is strongest when several conditions line up at the same time.

A high, steady request volume that keeps a GPU node busy through the working day
A narrow task you can tune once on Qwen and then largely leave alone
An in-house engineer who already owns similar infrastructure
A firm requirement to keep every request on Australian soil at all times

When most of those hold, a flat $40,000 node can undercut usage-based pricing by a wide margin. The mistake is assuming they hold when they do not. A Sydney firm running a model for a few hours a day rarely meets even two of them, and pays for the gap.

Reading the result

Most Australian SMBs fall below the volume where self-hosting wins. They would pay a flat cost for capacity they barely touch, plus the staffing to keep it running. The API model fits their uneven, business-hours usage far better.

Measure your real request volume before deciding
Compare flat self-hosted cost against usage-based pricing
Factor in the engineer a self-hosted node would need

We run the break-even maths on your actual traffic and recommend the cheaper option, even when that turns out to be Claude. Book a costing session and we will price both paths in plain figures.

Reading your own usage curve

The break-even point is personal to your traffic, so look at the shape of it rather than a headline price someone quoted you.

Plot requests by hour across a normal week
Note how many hours sit near zero
Compare that idle time against a flat node cost

For a Melbourne SMB with business-hours demand, the curve usually makes the case on its own, because a fixed node you cannot keep busy loses to usage-based pricing every time the bill arrives.
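The three steps above can be sketched in a few lines of Python, assuming you can export one timestamp per request from your logs. The function names and the idle threshold are hypothetical choices for illustration.

```python
from datetime import datetime
from typing import Iterable, List


def hourly_profile(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Return a 7x24 grid of request counts, indexed as grid[weekday][hour].

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


def idle_fraction(grid: List[List[int]], threshold: int = 0) -> float:
    """Share of the week's 168 hourly slots with at most `threshold` requests."""
    cells = [count for day in grid for count in day]
    idle = sum(1 for count in cells if count <= threshold)
    return idle / len(cells)


if __name__ == "__main__":
    # Tiny made-up sample; replace with timestamps from your own request logs.
    sample = [
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 30),
        datetime(2024, 1, 2, 14, 0),
    ]
    grid = hourly_profile(sample)
    print(f"Idle share of the week: {idle_fraction(grid):.1%}")
```

Feed it one normal week of traffic. If the timestamps are stored in UTC, convert them to local time first so business hours land in the right slots.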
