RAG With Open Source Models: An Australian Setup Guide

Retrieval-augmented generation, or RAG, lets an AI model answer questions from your own documents instead of guessing from its training data. For most Australian businesses it is the cheapest way to make any model genuinely useful on company knowledge, whether that model is open source or a managed one like Claude.

This guide maps the pieces of a working RAG setup, the decisions that matter for an Australian build, and the realistic costs. It is written for a technical owner or team lead who wants a grounded plan rather than a vendor pitch.

What RAG actually does

When someone asks a question, a RAG system first searches your document store for the most relevant passages, then hands those passages to the model along with the question. The model answers from the retrieved text rather than from memory. Done well, that means answers grounded in your contracts, policies and procedures, with far fewer invented facts.
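
The flow above can be sketched in a few lines. This is a minimal illustration only: the passages, the word-overlap scoring and the prompt wording are all assumptions made for the example, and a real build would use a proper search index and then send the prompt to whichever model you choose.

```python
import re

# Hypothetical in-memory "document store": passage id -> text.
PASSAGES = {
    "leave-policy#2": "Annual leave requests must be lodged 14 days in advance.",
    "expenses#1": "Expense claims over 500 dollars need manager approval.",
    "it-security#4": "Laptops must be encrypted and locked when unattended.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Rank passages by word overlap with the question (a stand-in for real search)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), pid, text) for pid, text in PASSAGES.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, text) for score, pid, text in scored[:top_k] if score > 0]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Combine the retrieved passages and the question into one grounded prompt."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )


question = "How many days in advance must leave requests be lodged?"
prompt = build_prompt(question, retrieve(question))
print(prompt)
```

The model never sees the whole document store, only the passages the retriever selected, which is why a weak retriever caps the quality of every answer.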

The quality of a RAG system is decided mostly by the retrieval step, not the model. A modest model fed exactly the right passages beats a frontier model fed the wrong ones, which is why the build order matters so much.

The core pieces

A RAG build has four standard components, and each one affects answer quality.

A document store, with files chunked into passages and indexed sensibly
A retriever that finds the chunks relevant to each question, usually a mix of vector and keyword search
A model that writes the answer using only the retrieved chunks
An evaluation loop that measures answer quality against real questions from your team
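
One common way to mix vector and keyword search is reciprocal rank fusion, which merges the two ranked lists without needing their scores to be comparable. The guide does not prescribe a fusion method, so treat this as one option among several; the chunk ids are made up and k = 60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk ids; each list adds 1 / (k + rank) per chunk."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["policy-3", "contract-7", "policy-1"]
keyword_hits = ["contract-7", "procedure-2", "policy-3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['contract-7', 'policy-3', 'procedure-2', 'policy-1']
```

Chunks that both searches rank highly rise to the top, while a chunk found by only one search still makes the list.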

Teams often rush the chunking and skip the evaluation set, then blame the model when answers disappoint. Spend the effort where the value sits: sensible chunking that follows the natural sections of your documents, a retriever tuned on your data, and a set of 50 to 100 real questions with known answers to test against before anyone calls the system finished.
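
As a rough sketch of chunking along natural sections, the function below splits a document at its headings and only falls back to paragraph breaks when a section is too long. It assumes Markdown-style headings and a hypothetical 1,200-character limit; contracts and policies in other formats would need their own heading rule.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # assumption: Markdown-style headings


def chunk_by_section(text: str, max_chars: int = 1200) -> list[dict[str, str]]:
    """Split text at headings, then split oversized sections at paragraph breaks."""
    sections: list[tuple[str, list[str]]] = [("(no heading)", [])]
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[dict[str, str]] = []
    for title, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue
        current = ""
        for para in re.split(r"\n\s*\n", body):
            para = para.strip()
            if current and len(current) + len(para) + 2 > max_chars:
                # Adding this paragraph would exceed the limit: close the chunk.
                chunks.append({"section": title, "text": current})
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        chunks.append({"section": title, "text": current})
    return chunks


sample = """# Leave policy
Staff accrue four weeks of annual leave.

Requests go to your manager.

# Expenses
Claims need receipts.
"""

for chunk in chunk_by_section(sample, max_chars=60):
    print(chunk["section"], "->", repr(chunk["text"]))
```

Each chunk keeps its section title, so the retriever and the audit trail can both say where a passage came from. A single paragraph longer than the limit is left whole in this sketch.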

Where open source models fit

Open source models in 2026 are genuinely capable, and at the embedding and retrieval layer they are often the right choice. Embedding models are small, run cheaply on your own infrastructure, and swapping one for another later is a contained change rather than a rebuild.

The answering model is a separate decision. Self-hosting a large open model means provisioning GPUs, patching, monitoring and capacity planning, and those costs land on your team every month. For a business with strong infrastructure skills and high volume, that can pay off. For most Australian SMBs, a managed model costs less once engineering time is counted honestly. We build the retrieval layer to be portable, then default to Claude for the answering step because it follows instructions reliably and sticks to the retrieved passages instead of improvising around them.

Use open source embeddings where data must stay on your own infrastructure
Keep the retrieval layer portable so the answering model can change later
Choose the answering model on accuracy against your evaluation set, not on a leaderboard
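
The last two points can be combined in one small sketch: put the answering model behind a narrow interface, then score each candidate on the same evaluation set. Everything here is illustrative. The two "models" are toy stand-ins, the three questions are invented, and substring matching is a deliberately crude scoring rule that a real harness would replace with human or rubric-based grading.

```python
from typing import Protocol


class Answerer(Protocol):
    """Anything that can turn a question plus retrieved passages into an answer."""

    def answer(self, question: str, passages: list[str]) -> str: ...


class FirstPassageAnswerer:
    """Toy stand-in for a model: echoes the top retrieved passage."""

    def answer(self, question: str, passages: list[str]) -> str:
        return passages[0] if passages else "I don't know."


class RefusingAnswerer:
    """Toy stand-in for a weaker model: always declines to answer."""

    def answer(self, question: str, passages: list[str]) -> str:
        return "I don't know."


# Hypothetical evaluation set: question, retrieved passages, phrase the answer must contain.
EVAL_SET = [
    ("What is the notice period?", ["The notice period is four weeks."], "four weeks"),
    ("Who approves expenses?", ["Managers approve expense claims."], "managers"),
    ("What is the parking policy?", [], "don't know"),
]


def accuracy(model: Answerer) -> float:
    """Share of evaluation questions whose answer contains the expected phrase."""
    hits = sum(
        expected.lower() in model.answer(question, passages).lower()
        for question, passages, expected in EVAL_SET
    )
    return hits / len(EVAL_SET)


for name, model in [("first-passage", FirstPassageAnswerer()), ("refusing", RefusingAnswerer())]:
    print(f"{name}: {accuracy(model):.2f}")
```

Because both candidates satisfy the same interface, swapping a self-hosted model for a managed one later means writing one new class and rerunning the same evaluation.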

Doing it well in Australia

Local context shapes a RAG build in ways a generic tutorial will miss.

Keep the document store and vector index in an Australian region when contracts or client data are involved
Treat personal information in line with the Privacy Act, including retrieval logs, which often reproduce sensitive passages verbatim
Control which documents each user can retrieve; access rules belong in the retriever itself, not just the front end
Keep an audit trail of what was retrieved for each answer; APRA-regulated firms will want this for any customer-facing use

That last point catches teams out. The model's answer is only half the record. If a dispute arises, you want to be able to show which passages the system retrieved and why it retrieved them.
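
A minimal sketch of both ideas, access rules enforced inside the retriever and an audit record per retrieval, might look like this. The group tags, chunk ids and log fields are hypothetical, and the in-memory list stands in for durable, access-controlled storage.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical access tag stored with each chunk


INDEX = [
    Chunk("hr-12", "Salary bands for 2025 are listed in appendix B.", frozenset({"hr"})),
    Chunk("ops-3", "Salary queries should be sent to the HR team.", frozenset({"hr", "staff"})),
]

AUDIT_LOG: list[str] = []  # stand-in for durable, access-controlled storage


def retrieve_for_user(user: str, groups: set[str], query: str) -> list[Chunk]:
    """Filter by permission first, then match; log what was returned and why."""
    visible = [c for c in INDEX if c.allowed_groups & groups]
    terms = set(query.lower().split())
    hits = [c for c in visible if terms & set(c.text.lower().split())]
    AUDIT_LOG.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "retrieved": [c.chunk_id for c in hits],
        "matched_terms": {c.chunk_id: sorted(terms & set(c.text.lower().split())) for c in hits},
    }))
    return hits


staff_hits = retrieve_for_user("sam", {"staff"}, "salary bands")
print([c.chunk_id for c in staff_hits])  # ['ops-3']
print(AUDIT_LOG[-1])
```

The log records chunk ids and matched terms rather than passage text, which keeps sensitive content out of the log itself while still letting you reconstruct what the system retrieved.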

What it costs and what it returns

A capable RAG build for an Australian SMB typically lands between $20,000 and $45,000. That covers chunking and indexing the document set, retriever tuning, an evaluation harness and a working interface. Running costs after that are modest at typical SMB volumes: a few hundred dollars a month for hosting and model calls.

The return shows up as time. If a 20-person Sydney team saves two hours per person each week by asking the knowledge base instead of interrupting a colleague, that is about 2,000 hours a year. Valued at roughly $30 an hour, it comes to about $60,000 a year, so a serious build recovers its cost well inside the first year.
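
The arithmetic is easy to check and to rerun with your own numbers. Two inputs below are assumptions, not figures from this guide: 48 working weeks a year, and an hourly value of $31.25 picked so that the total matches the $60,000 figure. Running costs are left out for simplicity.

```python
TEAM_SIZE = 20
HOURS_SAVED_PER_PERSON_PER_WEEK = 2
WORKING_WEEKS_PER_YEAR = 48      # assumption: not stated in the text
HOURLY_VALUE = 31.25             # assumption: chosen so the total matches $60,000

hours_per_year = TEAM_SIZE * HOURS_SAVED_PER_PERSON_PER_WEEK * WORKING_WEEKS_PER_YEAR
annual_value = hours_per_year * HOURLY_VALUE

print(f"Hours saved per year: {hours_per_year}")
print(f"Annual value: ${annual_value:,.0f}")

# Months to recover the build cost at each end of the quoted range.
for build_cost in (20_000, 45_000):
    months = build_cost / (annual_value / 12)
    print(f"Build cost ${build_cost:,}: payback in {months:.0f} months")
```

Under these assumptions the payback period runs from about four months at the low end of the range to about nine months at the high end.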

Common mistakes to avoid

The same few failures account for most disappointing RAG projects, and all of them are avoidable with a careful start.

Choosing the model first and treating retrieval as plumbing
Chunking documents at arbitrary sizes instead of natural sections
Skipping the evaluation set, so quality is judged by anecdote
Ignoring permissions, so any staff member can retrieve any document
Leaving the index stale while the source documents change weekly
Self-hosting on principle when a managed model is cheaper in practice
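
The stale-index failure in particular is cheap to guard against. One simple approach, sketched here with made-up document ids, is to store a content hash for each indexed document and compare it with the current source on a schedule.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to tell whether a document changed since indexing."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes with current content; report what needs work."""
    current = {doc_id: fingerprint(text) for doc_id, text in current_docs.items()}
    return {
        "add": sorted(d for d in current if d not in indexed),
        "update": sorted(d for d in current if d in indexed and indexed[d] != current[d]),
        "delete": sorted(d for d in indexed if d not in current),
    }


# Hypothetical state: hashes saved at the last indexing run vs. today's documents.
indexed_hashes = {
    "leave-policy": fingerprint("Four weeks of annual leave."),
    "old-procedure": fingerprint("Retired process."),
}
docs_today = {
    "leave-policy": "Five weeks of annual leave.",
    "travel-policy": "Book travel through the portal.",
}

print(plan_reindex(indexed_hashes, docs_today))
```

Only the documents listed under "add" and "update" need re-chunking and re-embedding, and anything under "delete" should be removed from the index so retired content stops appearing in answers.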

Key takeaways

Retrieval quality, not model brand, decides RAG quality
Open source fits best at the embedding and infrastructure layers; pick the answering model on measured accuracy
Australian builds need data residency, Privacy Act handling and per-user access rules from day one
Budget $20,000 to $45,000 for a serious build and measure the hours it gives back

Talk to a Claude specialist

Automata AI is a Sydney-based consultancy that builds retrieval first and picks the model second, with Claude as the default for grounded, reliable answers on Australian business data. If you are weighing up a RAG build, book a short brainstorm and we will map the fastest path to a system your team actually trusts.
