
How Claude Code Handles Large Codebases: Agentic Search vs RAG

May 2026 · 7 min read · Technical

Australian engineering leads running Claude Code in production tell a consistent story. The tool behaves differently on a 4-million-line monorepo than it does on a 30,000-line side project. The mechanics that make Claude Code reliable at scale are not the same ones that other AI coding tools use, and the distinction matters when a Sydney bank or a Melbourne SaaS company is sizing a rollout across hundreds of engineers.

The pattern is documented in the May 2026 engineering writeup from Anthropic on how Claude Code operates in large codebases. We see the same dynamics on client work. Here is what the architecture means for an Australian engineering team weighing Claude Code against a retrieval-augmented (RAG) competitor.

Why RAG breaks at AU enterprise scale

RAG-based coding assistants work by embedding the codebase into a vector store, then retrieving relevant chunks at query time. That model is straightforward when the codebase is small and stable. It quietly degrades when the codebase is large and active.

A 2-million-line monorepo at a Sydney financial services company, with a hundred engineers committing daily, mutates faster than most embedding pipelines can keep up with. By the time a developer queries the index, the index reflects the codebase as it existed hours, days, or weeks earlier. The retrieved chunks reference functions that were renamed in the last sprint or modules that were deleted yesterday. The assistant returns plausible-looking suggestions that no longer apply.

This failure mode is hard for the engineer to see. The model output looks confident and the code reads as plausible. The problem only surfaces later, at build time, in review, or in production, when someone notices that the agent referenced a deprecated API or imported a file that no longer exists. For AU teams under APRA CPS 230 operational resilience requirements, a tool that is quietly wrong about the current state of the codebase is a control gap, not just a productivity issue.
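The staleness problem is easy to reproduce in miniature. The sketch below is a toy illustration only: a plain dictionary snapshot stands in for the vector store, and the names `build_index`, `retrieve`, `search_live`, and the sample `payments` files are hypothetical, not any real tool's API.

```python
# Toy illustration of index staleness. A dict snapshot stands in for the
# vector store; real RAG pipelines embed chunks, but the timing problem
# is the same. All names here are hypothetical.

def build_index(repo: dict[str, str]) -> dict[str, str]:
    """Snapshot the repo at index time (stand-in for an embedding run)."""
    return dict(repo)

def retrieve(index: dict[str, str], term: str) -> list[str]:
    """Return paths whose indexed content mentions the term."""
    return sorted(path for path, text in index.items() if term in text)

def search_live(repo: dict[str, str], term: str) -> list[str]:
    """Search the current state of the repo instead of the snapshot."""
    return sorted(path for path, text in repo.items() if term in text)

# State of the repo when the index was last rebuilt.
repo = {
    "payments/ledger.py": "def post_entry(account, amount): ...",
    "payments/api.py": "from payments.ledger import post_entry",
}
index = build_index(repo)

# A later commit renames the function; the index is not rebuilt.
repo["payments/ledger.py"] = "def post_ledger_entry(account, amount): ..."
repo["payments/api.py"] = "from payments.ledger import post_ledger_entry"

stale_hits = retrieve(index, "post_entry")    # still "finds" the old name
live_hits = search_live(repo, "post_entry")   # the old name is gone

print("index says:", stale_hits)
print("live tree says:", live_hits)
```

The snapshot still reports both files for the old name, while the live tree reports none. That gap is the window in which an assistant can confidently suggest code that no longer applies.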

How Claude Code searches a live codebase

Claude Code takes a different approach. It runs locally on the developer's machine and searches the file system the way a software engineer would. It walks directories, reads files, runs grep against the live tree, and follows references across the codebase from whatever entry point it has been given.

There is no embedding pipeline. There is no centralised index that has to be rebuilt every time a thousand engineers push commits. Each developer's instance works against the actual current state of the repository. When Claude Code finds a reference to a function, that function is the one that exists right now, not the one that existed when the index was last refreshed.
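To make the walk-and-grep style concrete, here is a minimal sketch of searching a live tree from Python. It is not Claude Code's implementation, which we have not seen; `grep_tree` and the sample `billing` files are hypothetical names chosen for illustration.

```python
import os
import re
import tempfile
from pathlib import Path

def grep_tree(root, pattern, suffixes=(".py",)):
    """Walk the live tree under root and return (path, line_no, line) matches."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than fail the search
            for line_no, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((str(path), line_no, line.strip()))
    return hits

# Demo against a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "billing"
    pkg.mkdir()
    (pkg / "invoice.py").write_text(
        "def total(lines):\n    return sum(lines)\n", encoding="utf-8"
    )
    (pkg / "report.py").write_text(
        "from billing.invoice import total\n", encoding="utf-8"
    )
    matches = grep_tree(tmp, r"\btotal\b")
    found = sorted((Path(p).name, n) for p, n, _ in matches)

print(found)
```

Every result comes from a file read at query time, so there is nothing to rebuild after a commit. The cost is that each search reads files again, which is where the token spend discussed in the next section comes from.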

For an Australian engineering team this changes the failure profile. The model can still be wrong about what to do with the code it finds, but the code it finds is real. The class of bug where the assistant references something that no longer exists effectively disappears.

Where agentic search outperforms RAG

Agentic search is not free. The model spends tokens reading files and running grep, and the spend grows with the size of the area it has to explore. In a small repository RAG can be faster and cheaper, because the index gives the model an immediate shortlist.

As a rough guide, the crossover sits at a few hundred thousand lines of code. Above that, the freshness problem and the index-maintenance overhead dominate. Below it, RAG is fine for most use cases.

For AU enterprises the relevant cases sit above that line. The teams that come to us asking for Claude Code rollouts are usually:

  • Banks with multi-million-line monorepos governed under APRA CPS 230 operational resilience requirements, where freshness of the assistant's understanding is a compliance concern, not just a productivity one.

  • Federal and state government agencies with legacy systems written in C, Java, and PHP that have grown over fifteen years, where no embedding pipeline can be kept current against the rate of change.

  • ASX-listed SaaS companies with distributed architectures spanning twenty or thirty service repositories, where the assistant needs to follow references across repo boundaries that an index does not see.

  • Mid-tier Brisbane and Melbourne consultancies running multi-tenant codebases for clients, where each tenant's branch diverges weekly and no centralised index can stay accurate across all of them.

In each case the question is not whether the assistant should be intelligent. It is whether the assistant is working against the real codebase or against a stale snapshot of it.

The tradeoff and how to size for it

The cost of agentic search is starting context. Claude Code performs best when the developer or the project conventions point it at roughly the right area before it begins searching. Drop it into a 4-million-line monorepo with no hints and it will spend tokens exploring directories that were never relevant.

AU engineering teams that get this right tend to do three things:

  • Keep a project-level conventions file at the repo root that names the entry points, the build commands, and the directories the agent should usually look in. A 60-line file changes the token economics meaningfully.

  • Constrain MCP server bindings to the repository being worked in, so a frontend project does not load eight unrelated backend servers and pull the agent into the wrong context.

  • Pin a specific Claude model version per repository so a rollover does not silently change the search behaviour partway through a release window.

In the rollouts we have measured with Sydney mid-market engineering teams, the combination tends to cut Claude Code token spend on a typical session by 30 to 45 percent. The savings are large enough to matter to a CTO, and the discipline is small enough to teach in a single workshop.
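One way to get a feel for why starting context matters is to compare how much text sits under the whole tree with how much sits under the directories a conventions file would point at. The sketch below is a crude proxy under stated assumptions: it treats file size in bytes as character count, uses a rule of thumb of about four characters per token (real tokenisers vary), and assumes a worst case where everything in scope is read. The function name and directory layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

CHARS_PER_TOKEN = 4  # rough rule of thumb, an assumption; real tokenisers vary

def estimate_read_tokens(root: Path, hint_dirs: Optional[list] = None) -> int:
    """Estimate tokens to read every file under root, or only under hint_dirs."""
    starts = [root / d for d in hint_dirs] if hint_dirs else [root]
    total_bytes = 0
    for start in starts:
        for dirpath, _dirnames, filenames in os.walk(start):
            for name in filenames:
                total_bytes += (Path(dirpath) / name).stat().st_size
    return total_bytes // CHARS_PER_TOKEN

# Demo: a tiny stand-in for a monorepo with one relevant service.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, size in [
        ("services/payments/a.py", 400),
        ("services/search/b.py", 4_000),
        ("legacy/c.py", 8_000),
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("x" * size, encoding="utf-8")

    unscoped = estimate_read_tokens(root)
    scoped = estimate_read_tokens(root, hint_dirs=["services/payments"])

print(f"unscoped: ~{unscoped} tokens, scoped: ~{scoped} tokens")
```

An agent never reads an entire tree in practice, so the absolute numbers mean little. The ratio is the useful part: it shows how much smaller the search area becomes once the entry points are named.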

The cost picture in AUD

For a 25-engineer Sydney engineering team running Claude Code daily, the monthly Claude API spend usually settles between $4,500 and $9,000 once the conventions above are in place. That is roughly $180 to $360 per engineer per month for a tool that, in the engagements we run, returns four to six hours per engineer per week.

At an average loaded engineering cost of $180,000 per year, four hours back per week is around $18,000 of recovered capacity per engineer per year. For the team of 25, that is roughly $450,000 of annual capacity against an annual tool cost of $54,000 to $108,000, which is the monthly range above multiplied by twelve. The return is not subtle, but it depends on the team using the tool the way the architecture intends.
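The arithmetic behind these figures can be checked in a few lines. The inputs come from the text above, with one assumption of ours that the text does not state: a 40-hour working week, which is what makes four hours equal to 10 percent of loaded cost.

```python
# Worked version of the cost figures. All amounts are in AUD.
TEAM_SIZE = 25
LOADED_COST_PER_YEAR = 180_000        # per engineer, from the text
HOURS_RETURNED_PER_WEEK = 4           # low end of the four-to-six hour range
HOURS_PER_WEEK = 40                   # assumption: not stated in the text
MONTHLY_SPEND_RANGE = (4_500, 9_000)  # team-wide API spend, from the text

# Capacity recovered, valued at loaded cost.
recovered_per_engineer = (
    LOADED_COST_PER_YEAR * HOURS_RETURNED_PER_WEEK / HOURS_PER_WEEK
)
recovered_team = recovered_per_engineer * TEAM_SIZE

# Tool cost, annualised and per engineer.
annual_tool_cost = tuple(m * 12 for m in MONTHLY_SPEND_RANGE)
per_engineer_monthly = tuple(m / TEAM_SIZE for m in MONTHLY_SPEND_RANGE)

# Recovered capacity divided by tool cost, at each end of the spend range.
return_multiple = tuple(recovered_team / cost for cost in annual_tool_cost)

print(f"recovered per engineer: ${recovered_per_engineer:,.0f}")
print(f"recovered for the team: ${recovered_team:,.0f}")
print(f"annual tool cost: ${annual_tool_cost[0]:,} to ${annual_tool_cost[1]:,}")
print(f"return multiple: {return_multiple[1]:.1f}x to {return_multiple[0]:.1f}x")
```

At the low end of the hours range the recovered capacity is roughly four to eight times the tool cost, depending on where the monthly spend settles. Swapping in your own loaded cost and hours is the quickest way to sanity-check a business case.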

Teams that try to use Claude Code as if it were a RAG-based assistant, dropping it into the repository with no conventions and expecting an index to carry the work, see a meaningfully smaller return. The architecture is doing different work, and the operating model has to match.

What to do this fortnight

If your team is evaluating Claude Code against a RAG-based competitor, three things are worth running in the next two weeks:

  • A side-by-side test on your real monorepo. Pick a task that touches code that changed in the last sprint. RAG-based tools tend to miss the rename. Claude Code tends to find it.

  • A conventions file at the repo root. Even a rough draft makes a measurable difference to token spend on the second day.

  • A token-spend dashboard. Without it the conversation with finance becomes a faith argument. With it the numbers carry the discussion.
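A first version of the dashboard does not need much. The sketch below aggregates per-session token counts into spend per engineer. The `SessionUsage` record, the sample data, and both rates are hypothetical placeholders: the rates are not real prices and should be replaced with the figures from your own invoice, and the usage records would come from whatever export your billing or logging setup provides.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionUsage:
    """One session's token counts (hypothetical record shape)."""
    engineer: str
    input_tokens: int
    output_tokens: int

# Placeholder rates in AUD per million tokens. These are NOT real prices;
# substitute the rates from your own contract or invoice.
INPUT_RATE_PER_MTOK = 5.00
OUTPUT_RATE_PER_MTOK = 25.00

def spend_by_engineer(sessions):
    """Sum session cost per engineer, rounded to cents."""
    totals = defaultdict(float)
    for s in sessions:
        cost = (
            s.input_tokens * INPUT_RATE_PER_MTOK
            + s.output_tokens * OUTPUT_RATE_PER_MTOK
        ) / 1_000_000
        totals[s.engineer] += cost
    return {name: round(total, 2) for name, total in sorted(totals.items())}

# Made-up sample data for illustration.
sessions = [
    SessionUsage("alice", 1_200_000, 80_000),
    SessionUsage("bob", 400_000, 20_000),
    SessionUsage("alice", 600_000, 40_000),
]
report = spend_by_engineer(sessions)

for name, cost in report.items():
    print(f"{name}: ${cost:.2f}")
```

Grouping by repository or by week instead of by engineer is a one-line change to the key, and is usually the view finance asks for next.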

Automata AI ships Claude Code rollouts for Australian engineering teams, from the conventions file to the monorepo onboarding to the cost dashboard. If your team is sizing a rollout, book a 30-minute brainstorm via our contact page.
