Most Australian law firms that ran a retrieval-augmented generation pilot in 2025 or 2026 are still running it. The pilot ingests a corpus of precedents, retrieves relevant passages, and produces a fluent answer with Claude. Then a partner asks one harder question, the system retrieves the wrong clause or invents a citation, and the rollout quietly stalls. What closes the gap between that demo and a system lawyers actually trust is not better prompting. It is deliberate retrieval engineering.
The prize justifies the work. For a 50-lawyer firm, a working RAG system over the precedent bank and historical advice files typically returns 12 to 25 hours per lawyer per month. At a fully loaded $260 per hour, that is roughly $1.9M to $3.9M of recovered annual capacity, before counting faster turnaround on client work.
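The arithmetic behind the capacity figure can be checked directly. The sketch below uses only the numbers quoted above (50 lawyers, 12 to 25 hours per lawyer per month, $260 per hour); the function and constant names are illustrative.

```python
LAWYERS = 50
HOURLY_RATE = 260        # fully loaded dollars per hour, as quoted above
MONTHS_PER_YEAR = 12


def recovered_capacity(hours_per_lawyer_per_month):
    """Annual recovered capacity in dollars across the whole firm."""
    return LAWYERS * hours_per_lawyer_per_month * MONTHS_PER_YEAR * HOURLY_RATE


low = recovered_capacity(12)
high = recovered_capacity(25)
print(f"low: ${low:,.0f}  high: ${high:,.0f}")
# prints "low: $1,872,000  high: $3,900,000"
```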
Why legal RAG pilots stall
A pilot proves the friendly case. The demo corpus is small and clean, the questions are cooperative, and nobody has wired in the firm's access controls. Production reverses every one of those conditions: messy scanned documents, adversarial questions from sceptical partners, and confidentiality rules that carry professional consequences. Firms that plan for those three reversals from day one get through them. Firms that treat the pilot as 90 percent done do not.
Chunking that respects legal structure
Generic chunking by character count produces mediocre legal retrieval. Legal documents have structure: clauses, schedules, definitions, precedents. A chunking strategy that ignores that structure splits obligations mid-clause and separates defined terms from their definitions, and retrieval quality pays for it on every query.
Clause-level chunking for contracts and statutes, with no mid-clause splits, so an indemnity or limitation clause is always retrieved whole
Document-level chunking for short advice memos under four pages, where the memo itself is the natural unit of meaning
Hybrid chunking for long opinions: section-level chunks plus the cross-references that let the model follow a chain of reasoning
Citation-preserving chunking, where the source citation travels with every chunk so the answer can ground itself
The difference between generic and structure-aware chunking is typically 30 to 50 percent on first-pass retrieval relevance. It is the single highest-return engineering decision in the whole build.
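As a minimal sketch of clause-level, citation-preserving chunking, the Python below splits a contract at numbered clause headings and attaches the source identifier, clause number, and date to every chunk. It assumes clauses start on a new line with a number such as "2." or "12.3"; real precedents will need a more careful parser, since any wrapped line that happens to begin with a number would be treated as a heading here. The `Chunk` fields and the `MSA-001` identifier are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed convention: a clause heading is a line starting with "1.", "2.1", "12.3" and so on.
CLAUSE_HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+", re.MULTILINE)


@dataclass
class Chunk:
    doc_id: str   # matter or precedent identifier
    clause: str   # clause number, so answers can cite the clause and not just the document
    dated: str    # date of the source document
    text: str     # the whole clause, never split mid-clause


def chunk_by_clause(doc_id, dated, body):
    """Return one chunk per clause. Text before the first heading is ignored in this sketch."""
    headings = list(CLAUSE_HEADING.finditer(body))
    chunks = []
    for i, match in enumerate(headings):
        # A clause runs from its heading to the start of the next heading (or end of document).
        end = headings[i + 1].start() if i + 1 < len(headings) else len(body)
        chunks.append(Chunk(doc_id, match.group(1), dated, body[match.start():end].strip()))
    return chunks


contract = """1. Definitions
"Loss" means any loss, damage, or cost.
2. Indemnity
The Supplier indemnifies the Customer against any Loss
arising from a breach of this agreement.
3. Limitation of liability
Liability is capped at the fees paid in the prior 12 months."""

chunks = chunk_by_clause("MSA-001", "2023-08-01", contract)
for c in chunks:
    print(c.doc_id, c.clause, c.dated, repr(c.text[:30]))
```

Because the split points are clause boundaries and not character counts, the indemnity clause comes back whole even though it wraps across two lines.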
Citation discipline is the trust layer
Lawyers do not trust answers without citations, and they should not. A legal RAG system must cite the source document, the section, and the date, and it must link to the source so the lawyer can verify the answer in one click. Claude handles this reliably when the pipeline passes citation metadata through with every retrieved chunk and the system prompt requires grounded answers only. What good citations look like in production:
The document title with the matter or precedent identifier
The section or clause reference, not just the document
The date of the source document, because legal advice goes stale
A direct link to the source for one-click verification
Just as important is the refusal path. When retrieval comes back thin, the system should say it cannot answer from the firm's materials rather than speculate. One confident hallucinated citation will get the system banned faster than fifty honest refusals will build credibility.
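One way to wire both halves together, the citation metadata and the refusal path, is sketched below. It is an illustration under stated assumptions, not a reference implementation: the `Hit` fields, the 0.55 score threshold, the example URL, and the prompt wording are all hypothetical, and the call to the model itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One retrieved chunk plus the citation metadata that travels with it."""
    title: str
    matter_id: str
    clause: str
    dated: str
    url: str
    score: float   # retriever similarity, assumed to lie in [0, 1]
    text: str


MIN_SCORE = 0.55   # hypothetical cut-off; tune it against the firm's eval set
REFUSAL = "I cannot answer this from the firm's materials."


def build_context(hits) -> Optional[str]:
    """Return a numbered, citable context block, or None when retrieval is thin."""
    strong = [h for h in hits if h.score >= MIN_SCORE]
    if not strong:
        return None
    return "\n\n".join(
        f"[{n}] {h.title} ({h.matter_id}), clause {h.clause}, dated {h.dated}\n"
        f"{h.url}\n{h.text}"
        for n, h in enumerate(strong, start=1)
    )


def prepare(question, hits) -> str:
    """Refuse before the model is called if nothing clears the bar; otherwise build the prompt."""
    context = build_context(hits)
    if context is None:
        return REFUSAL
    return (
        "Answer only from the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


strong_hit = Hit("Master Services Agreement", "M-1042", "14.2", "2023-08-01",
                 "https://dms.example.com/M-1042#14.2", 0.81,
                 "The Supplier indemnifies the Customer against any Loss.")
weak_hit = Hit("Lease Advice Memo", "M-0077", "3", "2016-02-11",
               "https://dms.example.com/M-0077#3", 0.22,
               "The tenant may assign with consent.")

print(prepare("Who bears the indemnity?", [strong_hit]))
print(prepare("What is the notice period?", [weak_hit]))
```

Refusing in code, before any generation happens, means the honest "cannot answer" does not depend on the model following an instruction.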
An eval suite catches drift before the lawyers do
Without an evaluation suite, RAG quality drifts silently every time you change an embedding model, a chunking rule, or a prompt. Australian firms that invest in a 200-question eval set tied to known correct answers detect quality drops within a day of any change, instead of hearing about them from an annoyed partner three weeks later. The evals worth building first:
Definition retrieval: does the right defined term come back for a term-of-art question?
Precedent matching: does a similar past matter surface for a new fact pattern?
Negative precedent: does a document the firm has flagged as wrong stay out of the top results?
Recency: when two precedents conflict, does the more recent one win?
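The four checks above can be expressed as a small retrieval harness. The sketch below is an assumption-laden illustration: the `EvalCase` fields, the document identifiers, and the two fake retrievers (plain dictionaries standing in for the real search call) are all hypothetical. Definition retrieval and precedent matching both reduce to "this document must appear in the top k".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalCase:
    question: str
    must_include: set = field(default_factory=set)   # definition and precedent checks
    must_exclude: set = field(default_factory=set)   # negative precedent check
    newer_before_older: Optional[tuple] = None        # recency check: (newer_id, older_id)


def passes(case, ranked, k):
    """True if the top-k ranked document ids satisfy every condition on the case."""
    top = ranked[:k]
    if not case.must_include <= set(top):
        return False
    if case.must_exclude & set(top):
        return False
    if case.newer_before_older:
        newer, older = case.newer_before_older
        if newer not in top:
            return False
        if older in top and top.index(older) < top.index(newer):
            return False
    return True


def pass_rate(retrieve, cases, k=5):
    """Fraction of cases passed; retrieve(question) must return ranked document ids."""
    return sum(passes(c, retrieve(c.question), k) for c in cases) / len(cases)


CASES = [
    EvalCase("What does 'Loss' mean in the MSA?", must_include={"MSA-001#1"}),
    EvalCase("Can we rely on the 2019 limitation advice?", must_exclude={"ADV-2019-07"}),
    EvalCase("Which precedent governs assignment?",
             newer_before_older=("PREC-2024-11", "PREC-2018-03")),
]

# Fake retrievers: a baseline, and one where a change has flipped the recency ordering.
BASELINE = {
    CASES[0].question: ["MSA-001#1", "MSA-001#2"],
    CASES[1].question: ["ADV-2022-03"],
    CASES[2].question: ["PREC-2024-11", "PREC-2018-03"],
}
DRIFTED = dict(BASELINE)
DRIFTED[CASES[2].question] = ["PREC-2018-03", "PREC-2024-11"]

print(pass_rate(BASELINE.get, CASES))
print(pass_rate(DRIFTED.get, CASES))
```

Running the same cases after every embedding, chunking, or prompt change and comparing the pass rate against the last known-good run is what turns silent drift into a visible regression.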
Privacy, ethics, and information barriers
Australian legal RAG carries obligations most industries do not. Client confidentiality, the firm's information barriers, and the Australian Solicitors' Conduct Rules all bind the retrieval layer, not just the people using it. The system must enforce matter-level access controls so a lawyer cannot retrieve content from a matter they are walled off from, and the Privacy Act 1988 (Cth) applies to personal information sitting inside the corpus.
This is an architecture decision, not a policy document. A RAG system that ignores information barriers exposes the firm to a conflict-of-interest finding that can disqualify it from future work. Retrieval-layer permissions, enforced at query time against the firm's matter management system, are the only control that holds up.
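A minimal sketch of a query-time permission check follows. It assumes the matter management system can be asked which matters a user may see; here that lookup is faked with a hypothetical in-memory dictionary, and the user names and matter identifiers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    matter_id: str
    text: str


# Hypothetical snapshot of matter permissions. In a real system this would be
# looked up from the matter management system at query time, not cached in code.
MATTER_ACCESS = {
    "alice": {"M-1042", "M-2210"},
    "bob": {"M-2210"},
}


def allowed_matters(user_id):
    """Deny by default: an unknown user is permitted no matters."""
    return MATTER_ACCESS.get(user_id, set())


def enforce_barriers(user_id, hits):
    """Drop every retrieved chunk from a matter the user is walled off from."""
    allowed = allowed_matters(user_id)
    return [h for h in hits if h.matter_id in allowed]


hits = [
    Hit("M-1042", "Indemnity advice for the acquirer."),
    Hit("M-2210", "Lease assignment precedent."),
]

print([h.matter_id for h in enforce_barriers("alice", hits)])
print([h.matter_id for h in enforce_barriers("bob", hits)])
print([h.matter_id for h in enforce_barriers("carol", hits)])
```

This version filters after retrieval because that is the simplest form to read. Where the search index supports metadata filters, applying the same allowed-matter set inside the search query is usually preferable, so walled-off chunks are never scored and the top results are not thinned out by the filter.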
What production actually costs
A production-grade RAG system for a 50-lawyer Australian firm typically costs $250,000 to $600,000 to build, and $80,000 to $180,000 a year to operate across hosting, evaluation upkeep, and corpus refresh. Against the recovered capacity above, payback usually lands inside 12 months. The build cost varies mostly with how messy the document estate is: a firm with clean matter management pays the low end; a firm with 20 years of scanned PDFs pays the high end.
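The payback claim can be sanity-checked with a short calculation. The sketch below takes the low end of the capacity range (50 lawyers at 12 hours a month and $260 an hour) and the cost ranges quoted above. The `realisation` parameter is an assumption introduced here, not a figure from the firm data: it is the share of recovered hours that actually turns into value, and 0.5 is used purely as a cautious illustration.

```python
def payback_months(build_cost, annual_operating_cost, annual_capacity, realisation=1.0):
    """Months until cumulative net benefit covers the build cost."""
    net_per_month = (annual_capacity * realisation - annual_operating_cost) / 12
    if net_per_month <= 0:
        raise ValueError("operating cost exceeds realised benefit; no payback")
    return build_cost / net_per_month


# Low end of the capacity range: 50 lawyers x 12 hours x 12 months x $260.
LOW_END_CAPACITY = 50 * 12 * 12 * 260

# Cheapest build and operation, with all recovered capacity realised.
best = payback_months(250_000, 80_000, LOW_END_CAPACITY)

# Most expensive build and operation, with only half the capacity realised (assumed).
worst = payback_months(600_000, 180_000, LOW_END_CAPACITY, realisation=0.5)

print(f"best case: {best:.1f} months, cautious case: {worst:.1f} months")
```

Even the cautious case stays under 12 months on these inputs, which is consistent with the payback range stated above; a firm should substitute its own realisation estimate before relying on the result.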
Sydney and Melbourne firms ask us the same first question: pilot again, or commit to production? The honest answer is that a stalled pilot usually contains 80 percent of the lessons and 20 percent of the engineering. If your firm is sizing a RAG build, or a pilot has gone quiet, book a pilot scoping conversation and we will map the path from where you are to a system your partners trust.



