How to Cut Token Waste in Claude Code (and Keep Your AUD Spend Down)

Most of the money an Australian team spends on Claude Code does not go on the instructions you type. It goes on everything the agent pulls in around them: file reads, command output, search results, logs, and the long back-and-forth history that piles up in the context window. The model is billed by the token, and by default very little governs how many tokens reach it on each turn.

Context bloat is the quiet line item on your bill. You rarely see it as a single charge, because it hides inside thousands of ordinary turns, each one a little heavier than it needed to be. The encouraging part is that trimming it is one of the cheapest reliability and cost wins available to an engineering team, and most of the work is habit rather than new infrastructure.

Where the tokens actually go

Before changing anything, picture a single Claude Code turn. The model receives your message, the system prompt, your project CLAUDE.md, the recent conversation history, and the raw output of whatever tool the agent just ran. On a tidy small task that might be a few thousand tokens. On a messy one, where the agent has read an entire directory or pasted a two thousand line log, it can balloon past a hundred thousand.

The cost is not only financial. A context window stuffed with irrelevant material also makes Claude slower and more likely to lose the thread, because the detail you care about is buried under noise. So the same change that lowers your AUD spend usually lifts answer quality at the same time, which is a rare thing to get for free.

Govern what enters the context window

The principle worth adopting is plain: put a filter between the agent and the model, so what reaches the context window is the part you need rather than everything that happened to be produced. In practice that filter is a mix of built-in Claude Code features and a few working habits your team can adopt today.

Reset deliberately. Use /clear when you move to an unrelated task, so a fresh request does not drag the previous one's history along for the ride.
Compact long sessions. When a conversation has to stay long, /compact swaps the full transcript for a summary and keeps the useful conclusions.
Scope your reads. Point Claude at the specific files or folders that matter instead of letting it crawl the whole repository. A precise path is cheaper and more accurate than a broad sweep.
Push fan-out work to subagents. Search-heavy steps that generate large intermediate output can run in a subagent that returns only its conclusion, so the bulky middle never reaches your main thread.
Keep CLAUDE.md lean. Project instructions are re-sent on every turn, so a tidy file pays for itself many times over across a week of real work.
Filter before you paste. Trim logs and test output down to the failing lines rather than dropping the entire dump into the chat.

Cache the parts that do not change

A second lever is caching. Stable context, such as your system prompt and project guidelines, can be cached so you are not paying full price to resend the same material on every turn. For a team that runs Claude Code all day, that saving compounds quietly in the background. Pair it with sensible model choice: send cheap mechanical steps to a smaller, faster model such as Claude Haiku, and keep the heavier models for the reasoning that genuinely needs them.

Compressing tool output, and a note on outside tools

Beyond the built-in controls, a small group of open-source projects now sits in the same spot, between the agent and the model, compressing tool output before it reaches the context window while keeping the original retrievable on demand. One example doing the rounds in the builder community, called Headroom, reports figures from its own test traces along these lines:

Code search: roughly 92 percent fewer tokens
Incident debugging: roughly 92 percent fewer tokens
Broad codebase exploration: roughly 47 percent fewer tokens

Its authors also report benchmark accuracy holding above 97 percent after compression, and the tool offers an MCP server mode so Claude can call it directly. Treat those numbers as community-reported rather than independently verified, and treat the tool itself as one example of the pattern rather than an endorsement. Compression is lossy by nature, so the only number that should guide your decision is the one you measure on your own workflow.

What this is worth in AUD

Put rough figures on it. Say a Sydney engineering team runs Claude Code across daily work at around $4,000 a month, or close to $48,000 a year. If a conservative third of that is avoidable context that never needed to reach the model, governing it well puts something near $16,000 a year back on the table, before you count the speed and quality gains. Scale the same logic to a larger group whose annual Claude Code spend sits closer to $120,000, and the avoidable slice alone can fund a meaningful chunk of next year's tooling.

Those are illustrative numbers, not a quote, and your real figure depends on how your team works. The point is the shape of it. Token waste is never dramatic on any single turn, which is exactly why it goes unmanaged for months. Measured across a year and a whole team, it is well worth an afternoon of attention.

Where to start this week

You do not need a big project to capture most of this. Pick one active repository, switch on the habits above for a week, and watch your usage dashboard. Reset between tasks, scope your reads, move search-heavy work to subagents, and trim what you paste. Most teams see the line move within days, and the habits tend to stick because the sessions feel sharper as well as cheaper.

If you would like a hand working out where your Claude Code tokens actually go, and setting up the controls and caching that suit how your team works, that is the kind of audit we run for Australian businesses. You can book a short brainstorm with us here.

How to Cut Token Waste in Claude Code (and Keep Your AUD Spend Down)

Where the tokens actually go

Govern what enters the context window

Cache the parts that do not change

Compressing tool output, and a note on outside tools

What this is worth in AUD

Where to start this week

Ready to move from AI pilot to production?

More from the blog

The Hybrid AI Stack: Claude Where It Counts, an Open Model Where It's Cheap

When Open Source Beats Claude on Cost, and When It Quietly Doesn't

The Real Cost of Self-Hosting an Open-Source LLM in Australia: 2026 Numbers