Most teams run Claude Code as a single agent. For a one-off question, that's fine. For a recurring workflow, it's the wrong shape entirely.
The teams getting disproportionate returns from Claude Code subagents have stopped thinking about them as a chat interface and started thinking about them as an orchestration layer. The single-agent model is a bottleneck when the task has distinct phases: research, execution, verification. A single agent switching between those modes reloads context on every turn. A composition locks each mode to the agent best suited for it.
The four compositions worth standardising
Not every subagent pattern is worth building. These four have earned their keep in production. Together they form what we call the four subagent compositions: a named set that Australian platform teams can adopt, adapt, and measure.
1. Researcher plus implementer
One subagent reads the codebase, pulls the relevant modules and documentation, and writes a structured brief: what needs changing, what must not break, what edge cases to handle. A second subagent picks up that brief and executes the change. The handoff in writing is the whole point. Without it, the implementer makes assumptions the researcher already resolved.
This pattern cuts context-loading from every implementation run. For codebases over 80,000 lines, it's often the only way to keep token costs rational.
The ROI shows up in two places. First, the researcher builds a brief once and the implementer works against a fixed target. No mid-task context re-reads. Second, the brief is reviewable: a senior engineer can approve it before the implementer runs, catching scope creep before any code is written. That's a meaningful shift from reviewing code after the fact.
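To make the handoff concrete, here is a minimal sketch of the two-phase flow. The `run_subagent` helper is a stand-in for however your team invokes Claude Code, and the `approve` callback is the point where a senior engineer signs off on the brief; both are illustrative, not part of any Claude Code API.

```python
def run_subagent(role: str, prompt: str) -> str:
    """Stand-in for however your team invokes a Claude Code subagent."""
    raise NotImplementedError("wire this up to your own orchestration layer")

def researcher_then_implementer(task: str, approve) -> str:
    # Phase 1: the researcher reads the codebase once and writes the brief:
    # what needs changing, what must not break, which edge cases to handle.
    brief = run_subagent(
        "researcher",
        f"Write a structured brief for: {task}. "
        "Cover what changes, what must not break, and the edge cases.",
    )
    # Review gate: a senior engineer approves the brief before any code is written.
    if not approve(brief):
        raise RuntimeError("Brief rejected: scope creep caught before implementation")
    # Phase 2: the implementer executes against the fixed brief, not a live transcript.
    return run_subagent("implementer", f"Implement exactly this brief:\n\n{brief}")
```

The review gate is the part teams skip and regret: the brief is cheap to reject, the implementation is not.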
2. Reviewer plus fixer
A subagent reviews a pull request and produces a structured comment list: what to fix, what to flag, what to ignore. A second subagent applies the fixes. The reviewer does not fix; the fixer does not evaluate. Scope is everything here. In practice this halves the round-trip on code review for medium-complexity PRs.
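One way to keep the two roles in their lanes is to make the reviewer's output a typed list rather than free text. A sketch, with illustrative field names:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ReviewComment:
    """One entry in the reviewer's structured output; the fixer's only input."""
    file: str
    line: int
    verdict: Literal["fix", "flag", "ignore"]
    note: str

def items_for_fixer(comments: list[ReviewComment]) -> list[ReviewComment]:
    # The fixer applies 'fix' items and nothing else; it never re-evaluates the PR.
    return [c for c in comments if c.verdict == "fix"]
```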
3. Triager plus runners
A parent agent reads a queue of failing tests and classifies them into clusters: infrastructure failures, logic regressions, data issues. It then dispatches subagents in parallel against each cluster. What takes an engineer a morning becomes 20 minutes.
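A sketch of the dispatch step, under the same caveat as before: the classify and runner calls are placeholders for your triager and runner subagents, and the cluster names come straight from the paragraph above.

```python
from concurrent.futures import ThreadPoolExecutor

CLUSTERS = ("infrastructure", "logic-regression", "data")

def classify(failing_test: str) -> str:
    """Placeholder for the triager subagent assigning a test to one cluster."""
    raise NotImplementedError("replace with the triager's classification call")

def run_cluster(cluster: str, tests: list[str]) -> str:
    """Placeholder for one runner subagent working a single cluster."""
    raise NotImplementedError("replace with your runner subagent invocation")

def triage_and_dispatch(failing_tests: list[str]) -> dict[str, str]:
    # The parent agent groups failures into clusters...
    clusters: dict[str, list[str]] = {c: [] for c in CLUSTERS}
    for test in failing_tests:
        clusters[classify(test)].append(test)
    # ...then fans one runner out per non-empty cluster, in parallel.
    with ThreadPoolExecutor() as pool:
        futures = {c: pool.submit(run_cluster, c, tests)
                   for c, tests in clusters.items() if tests}
        return {c: f.result() for c, f in futures.items()}
```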
A Sydney engineering team of 12 running this pattern reduced their weekly triage load from roughly 4 hours to 40 minutes, saving around $310,000 in attention cost over six months. The pattern paid for the platform engineer who built it in under two months.

4. Verifier loop
Any subagent that writes code is paired with a verifier subagent that runs the tests, reads the diff, and reports back only if the work passes. Without this, you get false-positive completions: a subagent announces success, the parent agent reports up the chain, and nobody has checked the actual output. A subagent that finishes in 3 minutes with broken tests is worse than a single agent that takes 12 minutes and finishes correctly. The verifier loop is not optional in a production workflow.
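The gate itself is small. A sketch, assuming your test suite can be run from a shell command; the writer subagent's summary is only forwarded if the suite actually passes:

```python
import subprocess

def tests_pass(test_command: list[str]) -> bool:
    """The verifier's core check: run the suite and read the exit code."""
    return subprocess.run(test_command, capture_output=True).returncode == 0

def verified_result(writer_summary: str, test_command: list[str]) -> str:
    # Gate every writer subagent's "done" behind an actual check. Without it,
    # a subagent can announce success on broken tests and the parent agent
    # reports that success up the chain unchecked.
    if not tests_pass(test_command):
        return "REJECTED: tests failing; do not report this result as complete"
    return writer_summary
```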

The rule that makes subagents reliable
A subagent is a contract, not a chat.
Give it a brief. Give it a scope. Give it the tools it needs and nothing else. The parent agent reads the result, not the transcript. Teams that try to monitor subagents in real time or intervene mid-execution end up with worse outcomes than a single agent would have produced. The orchestration layer only works when you trust the contracts you wrote upfront. Intervening mid-run breaks the contract.
That means investing time in the briefs before you invest time in the tooling. A vague brief produces a vague subagent. If your team is building these patterns from scratch, our AI Automation Services cover the brief design work before touching the orchestration layer.
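One way to make the contract literal is to write it down as a fixed structure before the run starts. The fields below are illustrative, not a Claude Code schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubagentContract:
    """Written upfront, never edited mid-run; the parent reads the result, not the transcript."""
    brief: str              # what to do, specific enough to review before execution
    scope: tuple[str, ...]  # the files or modules the subagent may touch
    tools: tuple[str, ...]  # the tools it needs and nothing else
    done_when: str          # the criterion the parent judges the returned result against
```

Freezing the structure is the design choice: if the contract cannot be edited mid-run, nobody is tempted to intervene mid-run.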
When subagents are the wrong move
Not every task benefits from composition.
If the work takes under 15 minutes with a single agent and does not repeat more than a few times a week, the overhead of building and maintaining the pattern probably is not worth it. Subagents amplify token usage. On a complex triager-plus-runners run across 50 failing tests, your token spend can be three to four times what a sequential single-agent approach would cost. That is justified when the workflow runs daily and the time savings are real. It is not justified for a one-off.
Two situations where we tell teams to hold off:
The brief discipline is not there yet. If your team has not developed clear briefs for single-agent tasks, adding subagent orchestration amplifies the ambiguity. Get the brief practice right first.
Token budgets are not instrumented. Subagents running unchecked can burn through your monthly allocation in days. Model the cost before building. Our ROI Calculator is the fastest way to estimate token spend against your workflows.
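A back-of-the-envelope model is enough to start. The sketch below assumes a blended token price and reuses the $120 per hour fully loaded rate quoted later in this piece; swap in your organisation's real numbers before trusting the output.

```python
def weekly_business_case_aud(tokens_per_run: int,
                             runs_per_week: int,
                             minutes_saved_per_run: float,
                             aud_per_million_tokens: float = 25.0,  # assumed blended rate
                             engineer_aud_per_hour: float = 120.0) -> dict[str, float]:
    token_cost = tokens_per_run * runs_per_week * aud_per_million_tokens / 1_000_000
    labour_saved = minutes_saved_per_run / 60 * engineer_aud_per_hour * runs_per_week
    return {"token_cost_aud": round(token_cost, 2),
            "labour_saved_aud": round(labour_saved, 2),
            "net_aud": round(labour_saved - token_cost, 2)}

# Example: a daily triager-plus-runners flow at 3-4x single-agent token spend.
print(weekly_business_case_aud(tokens_per_run=600_000, runs_per_week=5,
                               minutes_saved_per_run=180))
```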
What measurement looks like in practice
Three numbers on a shared dashboard from week one.
Time-to-first-result on the agent flow. Not the total run time: the elapsed time from task submission to the first result the engineer can actually evaluate.
Cost-per-task in tokens. Translate to AUD at your organisation's effective rate. An engineer at $120 per hour fully loaded is the yardstick: if the subagent pattern saves 30 minutes per run and runs 50 times a week, the business case writes itself.
Acceptance rate. How often does the engineer accept the agent's output without major edits? If this drops below 70%, the briefs need work before the architecture does.
These three numbers are interdependent. A high acceptance rate with high cost-per-task means the output quality is there but the token efficiency is not. A low acceptance rate with low cost-per-task means the briefs are weak. Getting all three into an acceptable range at the same time is the sign that the pattern has stabilised.
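A minimal sketch of how the three numbers fall out of per-task records; the record fields are assumptions about what your logging captures, not an existing schema.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class TaskRecord:
    seconds_to_first_result: float   # submission to the first output an engineer can evaluate
    tokens_used: int
    accepted_without_major_edits: bool

def dashboard(tasks: list[TaskRecord], aud_per_million_tokens: float) -> dict[str, float]:
    n = len(tasks)
    return {
        "median_time_to_first_result_s": median(t.seconds_to_first_result for t in tasks),
        "cost_per_task_aud": sum(t.tokens_used for t in tasks) / n
                             * aud_per_million_tokens / 1_000_000,
        "acceptance_rate": sum(t.accepted_without_major_edits for t in tasks) / n,
    }
```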
A Sydney engineering platform team running this measurement discipline found the dashboard itself drove around $120,000 per year in additional savings. One or two flows were burning 40% of the token budget while delivering 10% of the value. Retiring them was a one-day decision once the data existed.
The second discipline is a quarterly review of the agent surface. Patterns that earned their keep stay. Patterns that did not get retired. Australian engineering teams that run this review treat their Claude Code tooling as a product rather than a project. Products get measured, iterated, and sometimes deprecated. Projects get built and quietly abandoned.
Pick one workflow this week. Apply the researcher-plus-implementer pattern. Measure the three numbers for two weeks. If the acceptance rate is not there, the brief needs work before the architecture does. The AI Readiness Assessment is where most teams start when they're unsure which workflow to pick first.