Why Most Australian SMBs Should Not Self-Host an LLM Yet

Running an open-weight model on your own hardware is more achievable in 2026 than it has ever been. DeepSeek, Qwen and Mistral all publish weights you can download this afternoon, and the tooling around them has matured to the point where a competent engineer can have a model answering requests within a day. So the question arrives on our desk most weeks: should an SMB self-host an LLM rather than pay for a managed model such as Claude?

For most Australian small and mid-sized businesses, the honest answer is not yet. Not never, just not yet. The blockers have little to do with model quality. They are about people, economics and compliance, and they bite hardest at exactly the size of business that finds self-hosting most tempting.

Why self-hosting looks attractive

The appeal deserves a fair hearing, because the arguments are not silly.

No per-token bill, so heavy usage does not show up as a growing invoice
Your data never leaves infrastructure you control
No dependency on a vendor's pricing, terms or roadmap
Freedom to fine-tune the model on your own data

Each point is true in isolation. The problem is what it costs to make each one true in practice, and who in your business carries that cost.

The people problem

A self-hosted model is a production system, and production systems need owners. Someone has to patch the serving stack, watch GPU utilisation, handle the 2am failure when an update breaks the inference server, and re-run the evaluation suite every time a new model version ships. In a 30-person Brisbane firm, that someone is usually the one technical person who already runs everything else.

Hiring for it instead means a machine learning or platform engineer, and in Sydney or Melbourne that is a $140,000 to $180,000 salary before super. Very few SMBs have enough inference volume to keep that person busy, so you end up paying a specialist's wage for a part-time problem. The skills also need maintaining, because the open-source serving ecosystem changes monthly.

The real numbers

Here is the arithmetic we walk clients through. A single GPU node capable of running a serious open model well, with redundancy and backups, rents from an Australian provider for roughly $6,000 to $10,000 a month. The card sits mostly idle outside business hours, but the bill keeps running around the clock.

Hardware or cloud GPU rental: $70,000 to $120,000 a year
Engineering time to run it safely: $40,000 to $80,000 a year, even part-time
Monitoring, security and compliance overhead: $10,000 to $20,000 a year
A managed Claude build doing the same job: often $500 to $3,000 a month in API usage

For most workloads we see, the managed option is cheaper by an order of magnitude until usage gets very large. The crossover point exists, but a business doing a few hundred thousand requests a month is nowhere near it. Paying $80,000 a year for hardware alone to avoid a $20,000 API bill is not a saving.
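The line items above can be totalled in a few lines of Python. This is a minimal sketch that uses only the ranges quoted in this section; the dictionary keys and function names are illustrative, and all amounts are assumed to be in Australian dollars.

```python
# Annual cost ranges (low, high) taken from the line items in this section.
# Key names are illustrative; amounts are assumed to be AUD.
SELF_HOSTED_ANNUAL = {
    "gpu_rental": (70_000, 120_000),
    "engineering_time": (40_000, 80_000),
    "monitoring_security_compliance": (10_000, 20_000),
}

# Managed API usage is quoted per month, so it is annualised below.
MANAGED_MONTHLY_API = (500, 3_000)


def self_hosted_annual():
    """Sum the low and high ends of every self-hosting line item."""
    low = sum(lo for lo, _ in SELF_HOSTED_ANNUAL.values())
    high = sum(hi for _, hi in SELF_HOSTED_ANNUAL.values())
    return low, high


def managed_annual():
    """Convert the monthly API range into an annual range."""
    lo, hi = MANAGED_MONTHLY_API
    return lo * 12, hi * 12


if __name__ == "__main__":
    sh_low, sh_high = self_hosted_annual()
    m_low, m_high = managed_annual()
    print(f"Self-hosted: ${sh_low:,} to ${sh_high:,} a year")
    print(f"Managed:     ${m_low:,} to ${m_high:,} a year")
    # Narrowest gap: cheapest self-hosting against the heaviest API usage.
    print(f"Narrowest ratio: {sh_low / m_high:.1f}x")
    # Widest gap: dearest self-hosting against the lightest API usage.
    print(f"Widest ratio:    {sh_high / m_low:.1f}x")
```

On these figures, self-hosting totals $120,000 to $220,000 a year against $6,000 to $36,000 for the managed build, a gap of roughly 3x at its narrowest and more than 30x at its widest.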

The compliance load under the Privacy Act

Self-hosting moves every security obligation onto you. Under the Privacy Act 1988 and Australian Privacy Principle (APP) 11, you are responsible for taking reasonable steps to protect the personal information the model touches. That means access controls on the box, audit logs for prompts and outputs, timely patching of the serving stack, and being able to show your work if the Office of the Australian Information Commissioner (OAIC) ever asks. A managed provider carries a large share of that load and holds certifications you would otherwise have to earn yourself.

None of this is impossible. It is simply work, and it is work that does not move your product or your customers forward.

When the maths changes

There are situations where self-hosting earns its keep, and it pays to know them in advance.

Sustained, high-volume inference where the per-token saving outruns the fixed costs
Hard data residency or air-gap requirements a managed service cannot meet
An existing platform team with GPU experience and on-call capacity
A genuine need to fine-tune deeply on proprietary data

If two or more of these describe your business, the question deserves a serious look. If none do, the maths is not close.

A sensible order of operations

The pattern that works is to prove value first and buy infrastructure later. Start with a managed model, ship a working automation, and measure real usage rather than the forecast. Volume data turns the self-hosting question from a debate into a calculation.

Ship something useful on a managed model first
Track monthly token volume and cost as it grows
Revisit self-hosting when the numbers clearly support it
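Once real usage has been tracked, the crossover can be estimated rather than argued. The sketch below assumes, for simplicity, that self-hosting is a flat annual cost and that managed spend scales linearly with request volume. The function name and the example inputs ($120,000 a year, $2,000 a month for 300,000 requests) are illustrative assumptions, not client data.

```python
def break_even_requests(self_hosted_annual_cost, monthly_api_spend, monthly_requests):
    """Estimate the monthly request volume at which a managed API bill
    would equal the cost of self-hosting.

    Simplifying assumptions (hypothetical model, not a full costing):
    - self-hosting is a fixed cost regardless of volume
    - managed cost per request stays constant as volume grows
    """
    if monthly_api_spend <= 0 or monthly_requests <= 0:
        raise ValueError("tracked spend and request volume must be positive")
    self_hosted_monthly = self_hosted_annual_cost / 12
    # Managed cost per request is monthly_api_spend / monthly_requests, so
    # break-even volume = self_hosted_monthly / cost_per_request.
    return self_hosted_monthly * monthly_requests / monthly_api_spend


if __name__ == "__main__":
    # Illustrative inputs: the low end of the self-hosting range and a
    # mid-range managed bill at a few hundred thousand requests a month.
    current_requests = 300_000
    volume = break_even_requests(120_000, 2_000, current_requests)
    print(f"Break-even volume: {volume:,.0f} requests a month")
    print(f"Growth needed: {volume / current_requests:.1f}x current volume")
```

With those inputs the break-even sits at 1.5 million requests a month, five times the assumed current volume. A real comparison would also need to account for GPU capacity limits and any change in per-request pricing at scale, both of which this sketch ignores.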

We help Australian SMBs run this calculation with real numbers, and we default to Claude until self-hosting genuinely pays. If you are weighing the options, book a short brainstorm and we will map the fastest path for your workload.

