The Real Cost of Self-Hosting an Open Source LLM in Australia

Self-hosting an open source model looks close to free on the first slide. You download the weights, the licence costs nothing, and the demo runs on a laptop. Then the real invoices start arriving. For an Australian business, the true cost of running your own model is a stack of recurring line items that rarely make it into the original plan. Naming them upfront is the difference between a sound decision and an expensive surprise six months in.

What you actually pay for

The model weights are the cheapest part of the exercise. Everything wrapped around them carries a cost that recurs every month, whether the system is busy or idle. A realistic budget has four layers: compute, storage and networking, people, and the quiet operational work that keeps the whole thing running safely.

Compute to run inference, billed by the hour for each GPU
Storage for model weights, logs, caches, and backups
Networking and data transfer that grows with usage
The engineering time to build, run, and maintain it

The hardware bill

A capable open model needs a capable GPU, and capable GPUs are not cheap to rent in Australia. Cloud GPU instances commonly land between $3 and $12 an hour depending on the card and the region. Run one instance around the clock and that is roughly $26,000 to $105,000 a year for a single node, before storage or transfer. Buy the hardware outright instead and a production-grade GPU server runs $40,000 or more as a capital outlay, plus power, cooling, and somewhere to rack it.

A single always-on cloud GPU node: roughly $26,000 to $105,000 a year
An owned production GPU server: $40,000 or more upfront
Redundancy for failover roughly doubles the compute line
A staging environment to test upgrades safely adds more again
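
The annual range above is simple arithmetic on the hourly rate. As a minimal sketch, assuming the node is billed for every hour of a 365-day year: the function name `annual_node_cost` and its `nodes` parameter are illustrative, and the rates are the ones quoted in this section rather than a current price list.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_node_cost(hourly_rate: float, nodes: int = 1) -> float:
    """Yearly bill for GPU nodes that run around the clock."""
    return hourly_rate * HOURS_PER_YEAR * nodes


low = annual_node_cost(3.0)    # cheapest rate quoted above
high = annual_node_cost(12.0)  # most expensive rate quoted above
print(f"Single node: ${low:,.0f} to ${high:,.0f} a year")

# A second node for failover doubles the compute line.
print(f"With failover: ${annual_node_cost(3.0, nodes=2):,.0f} to "
      f"${annual_node_cost(12.0, nodes=2):,.0f} a year")
```

The single-node results are $26,280 and $105,120, which is where the rounded range comes from.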

The people bill

Software does not run itself, and the people who keep it alive are the largest line in any serious plan. A self-hosted model is a production system that needs someone who understands inference, scaling, and security. In Sydney and Melbourne, an engineer with that skill set is not cheap, and a single person cannot cover nights, weekends, and public holidays on their own.

An engineer to manage inference, scaling, and tuning, often $120,000 or more in salary
On-call cover so the system survives leave and outages
Security work to meet the Privacy Act 1988 and your own policies
Time spent testing every model upgrade before it ships

The hidden costs that blow budgets

Self-hosted plans rarely fail on the obvious lines. They fail on the quiet ones that no one thought to budget. Idle compute is the classic example: a node sized for the busy hour still bills through the night for work that only happens between nine and five. Then the model you standardised on gets superseded, and re-testing the replacement eats a fortnight of engineering time.

Idle compute running overnight for daytime workloads
A second engineer needed the moment the first takes leave
Re-testing and re-tuning every time the model updates
Compliance and audit work treated as ongoing, not one-off
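
The idle-compute problem can be put in numbers. This is a rough sketch, assuming the workload runs eight hours a day on weekdays only. The weekday assumption is ours, not a figure from the article, and the names `idle_share` and `idle_cost_per_year` are illustrative.

```python
HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365


def idle_share(busy_hours_per_day: float, busy_days_per_week: int = 5) -> float:
    """Fraction of billed hours in which an always-on node does no useful work."""
    busy_hours = busy_hours_per_day * busy_days_per_week
    return 1 - busy_hours / HOURS_PER_WEEK


def idle_cost_per_year(hourly_rate: float,
                       busy_hours_per_day: float,
                       busy_days_per_week: int = 5) -> float:
    """Portion of the annual bill spent on idle hours."""
    share = idle_share(busy_hours_per_day, busy_days_per_week)
    return hourly_rate * HOURS_PER_YEAR * share


share = idle_share(busy_hours_per_day=8)  # nine to five, weekdays (assumed)
print(f"Idle share of billed hours: {share:.0%}")
print(f"Idle spend at $3 an hour: ${idle_cost_per_year(3.0, 8):,.0f} a year")
```

Under those assumptions the node is busy for 40 of the 168 hours in a week, so about 76% of what you pay for is idle time.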

Putting a real number on it

Add the pieces together and a serious self-hosted setup for an Australian SMB reaches around $80,000 a year before it earns a single dollar. Count a full engineer and proper redundancy and it climbs past $170,000: a $120,000 salary plus two always-on nodes at $3 an hour already comes to about $172,000. That figure reframes the whole conversation, because every one of those dollars has to be recovered through value the system creates. For a business whose usage would never keep a GPU busy, the maths rarely closes.

Count compute, storage, and networking as recurring costs
Add at least part of an engineer's salary, often most of one
Include security and compliance as ongoing work
Compare the fully loaded figure against a managed option
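
The checklist above can be turned into a small budget model. This is a sketch only: the class `SelfHostBudget` and its fields are hypothetical, the $3 hourly rate and $120,000 salary are the low-end figures quoted earlier, and storage, networking, and compliance default to zero because the article gives no numbers for them.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class SelfHostBudget:
    gpu_hourly_rate: float             # hourly rate for one GPU node
    engineer_salary: float             # full-time annual salary
    engineer_fraction: float           # share of that engineer's time, 0.0 to 1.0
    redundant: bool = False            # True adds a second node for failover
    storage_and_network: float = 0.0   # placeholder: fill in your own figure
    compliance: float = 0.0            # placeholder: fill in your own figure

    def compute(self) -> float:
        nodes = 2 if self.redundant else 1
        return self.gpu_hourly_rate * HOURS_PER_YEAR * nodes

    def people(self) -> float:
        return self.engineer_salary * self.engineer_fraction

    def annual_total(self) -> float:
        return (self.compute() + self.people()
                + self.storage_and_network + self.compliance)


lean = SelfHostBudget(gpu_hourly_rate=3.0, engineer_salary=120_000,
                      engineer_fraction=0.5)
full = SelfHostBudget(gpu_hourly_rate=3.0, engineer_salary=120_000,
                      engineer_fraction=1.0, redundant=True)

print(f"Lean setup: ${lean.annual_total():,.0f} a year")
print(f"Full engineer plus failover: ${full.annual_total():,.0f} a year")
```

With these inputs the lean case comes to $86,280 and the fully staffed, redundant case to $172,560, before any storage, networking, or compliance figure is entered. That total is the number to set against a managed quote.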

When self-hosting still makes sense

None of this means self-hosting is always the wrong call. For a business with very high and steady request volume, strict rules to keep data on Australian soil, and an engineering team with genuine spare capacity, running your own model can be both cheaper per request and more controlled. The point is to reach that conclusion from real numbers, not from the assumption that open source is free because the download is.
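
One way to test the "cheaper per request" claim is a break-even volume. The sketch below makes a deliberate simplification: it treats the self-hosted bill as entirely fixed and the managed bill as purely per-request. The function names are illustrative, and the per-request price is a placeholder, not a quoted rate from any provider.

```python
def self_hosted_cost_per_request(fixed_annual_cost: float,
                                 requests_per_year: float) -> float:
    """Spread a fixed annual bill across the requests actually served."""
    if requests_per_year <= 0:
        raise ValueError("requests_per_year must be positive")
    return fixed_annual_cost / requests_per_year


def break_even_requests(fixed_annual_cost: float,
                        managed_cost_per_request: float) -> float:
    """Annual volume above which the fixed-cost option is cheaper per request."""
    if managed_cost_per_request <= 0:
        raise ValueError("managed_cost_per_request must be positive")
    return fixed_annual_cost / managed_cost_per_request


fixed = 80_000.0         # rough annual self-hosted figure
managed_price = 0.02     # hypothetical price per request, for illustration only

threshold = break_even_requests(fixed, managed_price)
print(f"Break-even volume: about {threshold:,.0f} requests a year")
print(f"Self-hosted cost at 1,000,000 requests: "
      f"${self_hosted_cost_per_request(fixed, 1_000_000):.2f} each")
```

Below the break-even volume the fixed bill is spread over too few requests to compete. Above it, and only if the volume is steady, self-hosting starts to win.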

Where Claude changes the maths

For most Australian SMBs, a managed build on Claude hits the same goals for less, because the provider absorbs the GPU fleet and the on-call load that dominate a self-hosted budget. You pay for what you use rather than for capacity that sits idle overnight, and the cost scales with demand instead of arriving as a fixed annual commitment. The result is usually a faster path to something shipped, with a smaller and more predictable bill.

We cost both paths honestly and let the numbers make the case, whether that points to self-hosting or to a managed Claude build. Book a brainstorm and we will map your workload to the cheaper, safer option.
