Running Qwen 3.5 Locally: A Technical Walkthrough

Released under the Apache 2.0 licence, Qwen 3.5 is one of the friendlier open models to self-host, which makes it a common first choice for Australian teams that want their own deployment. The 235B mixture-of-experts design activates only about 22B parameters per token, so the compute per token stays lower than the headline size suggests, although all 235B parameters still have to sit in memory. Running it well, though, is a different job from running it at all. Here is how a sound local setup comes together, what it costs across a year, and where a managed model like Claude still earns its place.

Why teams look at a local Qwen

Self-hosting appeals for a few clear reasons, and they are worth naming honestly before any hardware is bought.

Data never leaves your own environment, which simplifies some Privacy Act questions
No per-token bill, so heavy and predictable workloads can be cheaper at scale
Full control over model version, uptime, and any fine-tuning
Independence from a single vendor's pricing or availability

Each of these is real. None of them is free, because the work and the risk simply move from a provider onto your own team.

Getting it running

The basic path is well trodden, and the tooling has matured over the past year.

Pull the weights from a trusted, verified source
Serve with an inference engine that supports MoE routing
Set sensible context and batch limits for your hardware
Confirm output quality on your real tasks before going further
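The first step above, pulling weights from a verified source, can be sketched as a checksum comparison. This is a minimal illustration rather than a prescribed workflow: the function names are hypothetical, and it assumes you have obtained SHA-256 hashes for each weight file from the publisher through a channel you trust.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large weight shards never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(weights_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return the file names that are missing or whose hash differs from the expected value.

    `expected` maps file name to hex SHA-256. Where those hashes come from is an
    assumption of this sketch: use whatever the weight publisher provides.
    """
    mismatches = []
    for name, expected_hash in expected.items():
        candidate = weights_dir / name
        if not candidate.is_file() or sha256_of_file(candidate) != expected_hash.lower():
            mismatches.append(name)
    return mismatches
```

An empty list from `find_mismatches` means every listed file is present and matches. Anything else is a reason to stop before the weights reach the inference engine.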

Getting a demo running is the easy part. A single workstation with a capable GPU can have a smaller or quantised Qwen 3.5 variant answering prompts in an afternoon. The next step is what separates a demo from a deployment your business can actually lean on.

Hardening for production

A working demo is not a production system, and treating it as one invites trouble the first time real users arrive.

Add authentication and request rate limits
Log requests for audit under the Privacy Act
Plan capacity around Australian business hours
Build alerting so failures surface before users notice
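Request rate limiting from the list above is often done with a token bucket. The sketch below is one common approach, not a requirement of any particular inference server. The class name and parameters are hypothetical, and in practice you would keep one bucket per API key or per user.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request handler would call `allow()` before forwarding a prompt to the model and return an HTTP 429 response when it comes back `False`.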

This is the stage most local pilots underestimate. A model that answers nine prompts in ten is fine in a test but a problem in front of a customer in Sydney at nine on a Monday morning. Reliability is an engineering job in its own right, and it does not end at launch.
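To make the nine-in-ten problem visible before a customer does, a rolling failure-rate check is one simple starting point. This is a sketch under assumptions: the window size and the 5% threshold are illustrative placeholders, and what counts as a failed request (timeout, error, empty output) is left for you to define.

```python
from collections import deque


class FailureRateMonitor:
    """Track the most recent request outcomes and flag a high failure rate."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.results = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self) -> bool:
        # Wait for a full window so one early failure does not page anyone
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.failure_rate() > self.threshold
```

With a window of ten, a model that fails one prompt in ten produces a 10% failure rate, which is above the illustrative threshold and would raise an alert.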

Where a local node actually pays off

Self-hosting is less a yes-or-no question than a workload question. Some jobs suit a local node, and plenty of others do not, so it helps to sort your tasks before you commit to hardware.

High-volume, repetitive tasks like bulk classification or document tagging
Work on sensitive data that must stay inside your own walls
Steady, predictable load that keeps the hardware busy through the week
Internal tools where an occasional slow response is acceptable

Customer-facing replies, anything that has to read perfectly the first time, and spiky or seasonal demand all tend to favour a managed model instead. Sorting your tasks into these two buckets before you buy a node avoids the most expensive kind of mistake, which is building the wrong thing well and paying for it every year.
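The two-bucket sort can be written down as a small set of rules. The sketch below is one possible ordering of the criteria named above, not a definitive policy. The `Workload` fields and the `suggest_hosting` function are hypothetical names, and the choice to let data sensitivity override everything else is an assumption you may want to change.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    customer_facing: bool
    sensitive_data: bool
    steady_load: bool
    high_volume: bool


def suggest_hosting(workload: Workload) -> str:
    """Return "local" or "managed" as a first-pass suggestion for one workload."""
    # Assumption: data that must stay in-house outranks every other factor
    if workload.sensitive_data:
        return "local"
    # Customer-facing or spiky demand tends to favour a managed model
    if workload.customer_facing or not workload.steady_load:
        return "managed"
    # Steady internal work only justifies a node when the volume is high
    return "local" if workload.high_volume else "managed"


tasks = [
    Workload("document tagging", False, False, True, True),
    Workload("support chat replies", True, False, False, True),
]
for task in tasks:
    print(f"{task.name}: {suggest_hosting(task)}")
```

Running a real task list through rules like these before buying hardware shows quickly whether enough work lands in the local bucket to keep a node busy.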

The honest cost

A modest production node for an Australian SMB starts around $35,000 a year once you count hardware, hosting, and the staff time to keep it healthy. That figure buys capacity you then have to keep busy to justify, which is the catch for businesses with uneven demand.

Match the node size to measured demand, not to the largest model you can fit
Account for idle hours, since a node costs the same whether it is busy or not
Keep a clear upgrade and testing routine, and budget the hours it takes
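The idle-hours point is easier to see as arithmetic. The sketch below spreads the article's $35,000 annual figure over only the hours the node is actually working. The utilisation levels are illustrative, not measurements, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_busy_hour(annual_cost: float, utilisation: float) -> float:
    """Spread a fixed annual cost over only the hours the node does useful work."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in the range (0, 1]")
    return annual_cost / (HOURS_PER_YEAR * utilisation)


# Illustrative utilisation levels, from mostly idle to mostly busy
for share in (0.10, 0.25, 0.50, 0.90):
    print(f"{share:.0%} busy: ${cost_per_busy_hour(35_000, share):,.2f} per busy hour")
```

The annual bill is the same in every row. What changes is how much useful work each dollar buys, which is why measured demand should set the node size.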

Set that against a managed alternative. For a team running a few thousand requests a day, a Claude deployment often lands well under $35,000 a year with no hardware to own and no node to keep watch over. The maths only favours self-hosting once volume is both high and steady, which is a narrower set of businesses than the download-it-free framing suggests.
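A rough break-even calculation makes the comparison concrete. Everything numeric below except the $35,000 node figure is a placeholder: the tokens per request and the blended price per million tokens are hypothetical and are not actual Claude pricing, so substitute your own measured usage and the current published rates.

```python
DAYS_PER_YEAR = 365


def managed_annual_cost(requests_per_day: float, tokens_per_request: float,
                        price_per_million_tokens: float) -> float:
    """Estimate a year of per-token billing for a steady daily request volume."""
    tokens_per_year = requests_per_day * DAYS_PER_YEAR * tokens_per_request
    return tokens_per_year / 1_000_000 * price_per_million_tokens


def break_even_requests_per_day(node_annual_cost: float, tokens_per_request: float,
                                price_per_million_tokens: float) -> float:
    """Daily volume at which per-token billing equals the fixed node cost."""
    cost_per_request = tokens_per_request / 1_000_000 * price_per_million_tokens
    return node_annual_cost / (cost_per_request * DAYS_PER_YEAR)


# Placeholder inputs: 3,000 requests a day, 2,000 tokens each, $5 per million tokens
yearly = managed_annual_cost(3_000, 2_000, 5.0)
threshold = break_even_requests_per_day(35_000, 2_000, 5.0)
print(f"Managed estimate: ${yearly:,.0f} a year")
print(f"Break-even volume: {threshold:,.0f} requests a day")
```

This treats the node as able to absorb any volume at a fixed cost, which flatters self-hosting. A real comparison would also check whether one node can serve the break-even load at all.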

We build and harden these setups when the case is genuinely there, and we keep Claude as the default where the volume does not warrant running your own node. If you want an honest read on which side of that line you sit, book a brainstorm.

Keeping the node honest

A local Qwen deployment needs ongoing care, not just a clean launch.

Track utilisation so you know whether the node earns its cost
Patch the server and dependencies on a schedule
Test each upgrade in staging before it ships
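Testing an upgrade in staging can be as simple as replaying a fixed set of prompts and checking each answer. The sketch below assumes a hypothetical `generate` callable that wraps whatever client talks to your staging server, and a hand-written set of checks drawn from your real tasks. It is a starting point, not a full evaluation harness.

```python
from typing import Callable


def regression_failures(generate: Callable[[str], str],
                        golden: dict[str, Callable[[str], bool]]) -> list[str]:
    """Return the prompts whose output fails its check or whose call raises."""
    failures = []
    for prompt, check in golden.items():
        try:
            output = generate(prompt)
        except Exception:
            # A crash or timeout in staging counts as a failed prompt
            failures.append(prompt)
            continue
        if not check(output):
            failures.append(prompt)
    return failures


# Illustrative golden set: each prompt maps to a cheap pass/fail check
golden_set = {
    "Classify: 'Invoice overdue 30 days'": lambda out: "finance" in out.lower(),
    "Classify: 'Password reset request'": lambda out: "support" in out.lower(),
}
```

Running the same golden set against the current build and the candidate build, and shipping only when the candidate fails no more prompts than the current one, gives the upgrade routine a concrete gate.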

For an Australian SMB, this routine is the real commitment behind self-hosting, and it is the part that decides whether the $35,000 a year is money well spent. A node nobody tends slowly drifts out of date and out of compliance, and the savings on paper quietly turn into risk. Run it like the production system it is, or pick the managed path and put the saved hours into the product itself.
