How to Manage Claude Code Sessions at Million-Token Scale

You got access to a million-token context window. The first thing most engineering teams do is load everything. The entire source tree, all the docs, every SDK reference, poured into a single session.

The cost comes back at $30 for a morning's work. The session times out halfway through a refactor. The cache goes cold between meetings and you're paying to reprocess the same 200K tokens of static boilerplate again.

That's not a model limitation. That's a session architecture problem.

The naive approach costs more than you think

A million-token context is genuinely useful. A senior engineer at a Sydney-based software company can now ask Claude Code to reason across their entire codebase, not a curated sample of it, and get answers that account for the full system. No more manually scoping the context to the files you think are relevant. That's a real change. But reasoning across a codebase does not mean reloading it from scratch every session, and that's the part most teams get wrong.

Claude's prompt cache has a 5-minute TTL. If your session design doesn't account for that, every new conversation reprocesses the same large block of static content from the ground up. The model can't tell the difference between cached tokens and freshly processed ones. Your bill can.

A typical work session without cache hits costs AUD$10–$15. With a warm cache, that drops to AUD$4–$6. Run 30 work sessions a month per developer and you're looking at AUD$300–$450 versus AUD$120–$180. Across a five-person engineering team, that's a gap worth closing. Our ROI Calculator can help you model the exact figures for your team's usage pattern and session volume.

The foundation-work split

The pattern that fixes this is the foundation-work split: two types of sessions, used for two different purposes, with the cache TTL built into the design rather than worked around. Think of it as separating your static inputs from your active outputs. The foundation stays constant; the work sessions change.

Foundation session. Run once per work block. Load everything that won't change during the sprint: project structure, architecture docs, locked dependency versions, internal conventions, SDK references. This is your 150K–200K token investment, amortised across every work session that follows.
Work sessions. Run many. Each starts from the cached foundation and adds only what's active: the files you're modifying, the error you're debugging, the diff you're reviewing. Fast, targeted, and cheap because the heavy lifting is already cached.

In practice: start a foundation session, kick off your first work session within 60 seconds, and you'll see 90%+ cache hit rates across the day. Let a session lapse past the 5-minute TTL without a follow-up and the cache goes cold. The foundation needs to stay warm, not just be loaded once and forgotten.

Four-step framework card titled The Foundation-Work Split, showing steps to build the foundation, start work quickly, keep the foundation stable, and monitor cache hit rate

Scaling the pattern for larger codebases

The foundation-work split scales with codebase complexity, but it doesn't scale by making the foundation bigger. A 400K-token monorepo doesn't need one monolithic foundation. It needs four focused ones, each scoped to the team working in that domain, so agents aren't carrying irrelevant context into every conversation.

Per 100K tokens of active codebase, budget one foundation layer per major feature area. Auth, payments, data pipeline, and API layer each get their own foundation. Agents working in one domain don't need to load another domain's static content.
Monitor your cache hit rate. Below 40% means the foundation is too volatile. Static docs and stable interfaces belong in the foundation. Active feature branches and files that change daily don't.
Budget AUD$20–$50 per developer per month for full-project context. If you're spending more, the foundation is either too large or not being reused across enough work sessions.

For a 10-person engineering team in Melbourne running a mid-market product, that's AUD$200–$500 per month total. Less than one day of a mid-level developer's time, for something that fundamentally changes what a single agent can do. We cover how we structure these patterns for teams at different stages in our AI Automation Services.

When session architecture is the wrong move

Not every project needs this. Session architecture adds design overhead, and if you reach for it before you've earned it, you're building infrastructure for a problem you don't have yet.

Codebases under 50K tokens. Load everything every time. The economics don't justify a separate foundation layer at this scale, and the overhead of maintaining one isn't worth it.
Greenfield prototypes with no stable structure. A foundation session assumes the foundation is stable. If the codebase is being restructured every few days, cached content churns faster than you build hit rate.
Exploratory or one-off use. If you're using Claude Code to understand an unfamiliar codebase rather than to build in it systematically, session architecture adds friction without return.

This is also where I'd push back on vendors selling 1M-context as a fix for unstructured workflows. The context window is not the bottleneck for most teams. Session structure is. Throwing more tokens at an unstructured workflow means paying more for the same problem. Our AI Readiness Assessment covers whether your team's AI usage has hit the inflection point where session architecture starts to pay.

Stats card showing session cost comparison: AUD$4 to $6 per cached session, AUD$10 to $15 per cold session, 90 percent plus cache hit rate with proper architecture

Enterprise codebases: coordinating multiple foundations

At enterprise scale — 500K+ token codebases, multi-squad teams — the foundation-work split evolves into what I call a session graph. Each squad maintains a domain-scoped foundation built around the code they own. A shared foundation covers cross-cutting concerns: auth contracts, API schemas, shared utilities, third-party integration specs. Agents working on auth don't carry the payments domain's context. Agents working on the data pipeline don't carry the frontend's.

A single agent can reason across the entire system when sessions are structured this way. Not through a monolithic context window that carries everything at once, but through coordinated foundations that each agent accesses within their domain, with handoffs that carry only what's needed across domain boundaries. The 3–6x productivity gains that engineering teams report from Claude Code almost always trace back to this kind of session discipline. Not to the raw size of the context window, but to how carefully that window is filled.

The per-developer budget holds at AUD$20–$50 per month even at this scale. The compound effect of multiple agents with warm caches, each scoped to their domain, is where the gains become structural rather than individual.

If this is the architecture you're building toward, talk to our team — we've structured this pattern for Australian engineering teams across several industries and can tell you quickly whether the scale justifies the approach.

The context window is not the bottleneck. Session architecture is. Get that right, and a single Claude Code agent can hold the complete mental model of your production system across every conversation.

How to Manage Claude Code Sessions at Million-Token Scale

The naive approach costs more than you think

The foundation-work split

Scaling the pattern for larger codebases

When session architecture is the wrong move

Enterprise codebases: coordinating multiple foundations

Ready to move from AI pilot to production?

More from the blog

A CISO's Framework for Agentic AI: What Anthropic's Security Team Learned

Claude Code Can Migrate a Million Lines of Legacy Code in Two Weeks

Claude Code Can Set Up Your Server So You Don't Need a DevOps Hire