Self-hosting an open source model is now genuinely viable for a mid-size business. Llama, Gemma, Qwen and their peers have closed much of the quality gap, the tooling has matured, and a capable engineer can stand up a model server in a week. None of that makes it the right call for every Australian team. The decision comes down to scale, risk, and the team you actually have, not to which model topped a benchmark this month.
We help Sydney and Melbourne businesses make this call regularly, and the honest answer is that both paths are legitimate. The trick is knowing which set of conditions you are actually in before you commit budget to either.
Good reasons to self-host
Some workloads reward the effort of running your own model server, and for those it can be the cheaper and more controlled choice.
Very high and steady request volume where per-token API costs would stack up month after month
Strict requirements to keep data on Australian soil at all times, including during inference
A workload narrow enough to tune once and maintain cheaply, like classification or extraction over a fixed document set
An existing engineering team with real capacity to own uptime, security patching, and model upgrades
When most of those are true at the same time, self-hosting starts to make economic sense. A logistics company processing two million documents a month with a stable, narrow prompt is a different proposition from a professional services firm running varied, judgement-heavy work.
Good reasons to pick Claude
For most Australian SMBs the managed path wins on speed and total cost, because it removes the parts that quietly drain a small team.
No GPU fleet to buy, scale, secure, or replace when the hardware generation turns over
A faster path from idea to a working build, often weeks instead of months
Reliability that does not depend on your on-call roster or a single engineer's tenure
Frontier-quality reasoning for complex tasks where smaller open models still fall short
Predictable cost that scales with use rather than with provisioned capacity
There is also a quality argument that gets lost in cost spreadsheets. The tasks businesses most want automated, like drafting client correspondence, reviewing contracts, and handling multi-step workflows, are exactly the ones where the gap between a small open model and Claude shows up in the output.
The numbers that usually decide it
A single production GPU node in Australia can cost $40,000 a year before anyone writes a line of application code. Add a fraction of an ML engineer's time to run it, at Sydney market rates of $160,000 to $200,000 a year, and the true cost of a modest self-hosted setup climbs past $100,000 annually.
Compare that with API usage. A team of 50 using Claude heavily for document work might spend $1,000 to $3,000 a month, or $12,000 to $36,000 a year, with no infrastructure to maintain. Against a self-hosted cost of $100,000 or more a year, break-even needs an API bill of roughly $8,000 a month or more, a usage volume most SMBs never reach. For a business whose workload would leave that GPU node idle most of the day, the maths settles itself.
Price the hardware and the people, not just the model weights
Match provisioned capacity to real, measured demand, not projected demand
Revisit self-hosting only when sustained volume clearly justifies it
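The arithmetic in this section can be sketched in a few lines of Python. This is a rough illustration, not a pricing model: the class and function names are hypothetical, the dollar figures are the ones quoted above, and the 0.5 share of an engineer's time is an assumption chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class SelfHostCost:
    """Annual cost inputs for a modest self-hosted setup (all values assumed)."""
    gpu_node_per_year: float   # one production GPU node
    engineer_salary: float     # full-time annual salary of an ML engineer
    engineer_fraction: float   # share of that engineer's time spent on the stack

    def annual(self) -> float:
        # Hardware plus the people cost, not just the model weights.
        return self.gpu_node_per_year + self.engineer_salary * self.engineer_fraction


def break_even_monthly_api_spend(cost: SelfHostCost) -> float:
    """Monthly API bill at which both options cost the same over a year."""
    return cost.annual() / 12


def cheaper_option(monthly_api_spend: float, cost: SelfHostCost) -> str:
    """Compare a measured monthly API bill against the self-hosted total."""
    if monthly_api_spend * 12 > cost.annual():
        return "self-host"
    return "managed API"


# Figures from the text; the 0.5 engineer share is an illustrative assumption.
setup = SelfHostCost(gpu_node_per_year=40_000,
                     engineer_salary=160_000,
                     engineer_fraction=0.5)

print(f"Self-hosted per year: ${setup.annual():,.0f}")
print(f"Break-even API bill per month: ${break_even_monthly_api_spend(setup):,.0f}")
print(cheaper_option(3_000, setup))
```

Feed it measured monthly spend rather than a projection, and rerun it whenever sustained volume changes.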
Data sovereignty and the regulatory angle
Australian businesses often assume self-hosting is the only way to satisfy data residency obligations. That is no longer accurate. Claude is available through cloud regions that keep data in Australia, and for most obligations under the Privacy Act the question is how data is handled and governed, not which company runs the GPU.
For APRA-regulated entities the analysis is more detailed, and CPS 230 puts the onus on you to assess any service provider, including your own internal platform team. A self-hosted stack you cannot patch promptly is a worse compliance story than a well-governed managed service. Either path can pass; the difference is the evidence you can produce.
A quick decision rule
When the choice feels close, three lines keep you out of trouble.
If you cannot keep a GPU node busy through the working day, do not self-host
If the data is sensitive or regulated, default to the option with the stronger governance story
If your team is already stretched, choose the path with less to run
For most Australian SMBs these three lines settle the question before a single benchmark is opened, and they point firmly toward a managed build on Claude. The exceptions are real but rare, and they tend to know who they are: high-volume, narrow-workload operations with engineering depth.
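For teams that like a checklist in code, the three lines above could be encoded roughly as follows. This is one possible reading of the rule, not a formal method: the field names are hypothetical, each answer is your own judgement, and the checks run in the order the rule lists them.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Answers to the three questions (all fields are hypothetical names)."""
    gpu_busy_through_working_day: bool
    data_sensitive_or_regulated: bool
    self_host_governance_is_stronger: bool  # your own assessment of the evidence
    team_already_stretched: bool


def recommend(s: Situation) -> str:
    # Rule 1: an idle GPU node rules out self-hosting.
    if not s.gpu_busy_through_working_day:
        return "managed"
    # Rule 2: for sensitive data, default to the stronger governance story.
    if s.data_sensitive_or_regulated and not s.self_host_governance_is_stronger:
        return "managed"
    # Rule 3: a stretched team should pick the path with less to run.
    if s.team_already_stretched:
        return "managed"
    # Only when all three pass is self-hosting worth a closer look.
    return "evaluate self-hosting"


typical_smb = Situation(
    gpu_busy_through_working_day=False,
    data_sensitive_or_regulated=True,
    self_host_governance_is_stronger=False,
    team_already_stretched=True,
)
print(recommend(typical_smb))
```

The function only returns "evaluate self-hosting" when every rule passes, which matches the point that the exceptions are real but rare.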
Where a hybrid setup earns its keep
The choice is not always binary. A pattern we see working well in 2026 is Claude for customer-facing and judgement-heavy work, with a small open model handling high-volume background tasks like tagging and routing. The open model runs where errors are cheap and caught downstream; Claude runs where the output goes out the door under your brand. This keeps cost down without betting the customer experience on a model you have to babysit.
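A hybrid split like this usually comes down to a small routing function in front of the two models. The sketch below only shows the routing decision and makes no API calls. The task labels, the `Backend` enum, and the choice to send unknown task types to Claude are all assumptions for illustration.

```python
from enum import Enum


class Backend(Enum):
    OPEN_MODEL = "self-hosted open model"
    CLAUDE = "Claude"


# High-volume background work where errors are cheap and caught downstream.
# These labels are hypothetical; use whatever your pipeline already calls them.
BACKGROUND_TASKS = {"tagging", "routing"}


def pick_backend(task_type: str, customer_facing: bool) -> Backend:
    # Anything that goes out the door under your brand goes to Claude.
    if customer_facing:
        return Backend.CLAUDE
    # Known background tasks go to the small open model.
    if task_type in BACKGROUND_TASKS:
        return Backend.OPEN_MODEL
    # Assumption: unknown or judgement-heavy work defaults to Claude.
    return Backend.CLAUDE


print(pick_backend("tagging", customer_facing=False).value)
print(pick_backend("contract_review", customer_facing=False).value)
```

Defaulting unknown work to the stronger model trades a little cost for safety; a team that is confident in its downstream checks could reverse that default.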
We map your workload to the cheaper, safer option and stay honest when that option turns out to be open source. Most often, for an Australian SMB, it points to Claude. If you are weighing this decision for your own business, book a brainstorm with us.



