Claude Code Harness Engineering: Make Your Repo Legible, Executable and Verifiable for Agents

A recent community workshop on Claude Code put a name to a pattern Australian engineering teams keep rediscovering: the agent is rarely the bottleneck. The repository is. Harness engineering, as the workshop framed it, is the practice of building the environment around your agent so it can handle long-horizon, multi-session and multi-agent work reliably. The teams getting the most from Claude Code treat that environment as a first-class engineering project, not an afterthought.

The evolution reported in the session maps onto what we see inside client codebases. In 2023 the craft was prompt engineering: single tasks against 4,000-token context windows. In 2024 it became context engineering: agentic loops and compaction strategies that kept one session coherent. From 2025 onward the frontier moved to harness engineering: cross-session work, tasks that run for three hours or more, and handoff logic between agents. Each step shifts effort away from the prompt itself and into the scaffolding around it.

What follows is our commentary on the ideas reported from that workshop, filtered through what we see when Australian teams ask why their Claude Code pilots plateau. The specifics below are workshop-reported rather than independently benchmarked, but they line up closely with field experience.

Three properties every codebase needs

Before running serious agentic workloads, the workshop argued, a repo needs to be legible, executable and verifiable. The framework is useful precisely because it is checkable. You can audit a codebase against it in an afternoon and know where a Claude Code rollout will stall.

Legible: the agent can find your rules and structure without reading half the repo.
Executable: the agent can start the system and put it into known states on its own.
Verifiable: the agent can prove a change works without a human watching over its shoulder.

Legible: a short index, not a long rulebook

The reported guidance is to keep CLAUDE.md under 100 lines and treat it as an index, with detailed documentation living in a /docs folder the agent pulls in when a task demands it. One example shared in the workshop was a 52-line CLAUDE.md serving a production monorepo. Brevity is the feature: a long rules file gets skimmed, by humans and agents alike.
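
A budget like this is easy to check mechanically. The sketch below is our own illustration, not something shown in the workshop: it assumes the 100-line limit from the guidance above and assumes detailed docs are referenced by paths beginning with docs/. The function name audit_index is hypothetical.

```python
import re

MAX_LINES = 100  # budget taken from the reported guidance; adjust to taste

# Assumption: detailed docs are referenced by a path such as docs/testing.md
DOC_REF = re.compile(r"(?<![\w-])/?docs/[\w/-]+(?:\.[\w-]+)*")


def audit_index(text: str, max_lines: int = MAX_LINES) -> dict:
    """Summarise an index file: its length and the docs it points to."""
    line_count = len(text.splitlines())
    return {
        "line_count": line_count,
        "within_budget": line_count <= max_lines,
        "doc_refs": sorted(set(DOC_REF.findall(text))),
    }


sample = "# Project\nSee docs/testing.md for test rules.\n"
print(audit_index(sample))
```

Run against the real file in CI, a check like this fails the build when the index grows past its budget, so the limit does not depend on anyone remembering it.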

The sharper move is encoding constraints as custom lint rules rather than prose. The workshop example used custom ESLint rules to enforce monorepo import boundaries, which means the agent catches a violation in its own logs the moment it runs the linter. A rule the toolchain enforces does not depend on the agent reading, remembering or agreeing with a document.
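
The workshop example used ESLint, which we do not reproduce here. The sketch below shows the same kind of boundary check in plain Python, to make the logic concrete. The @acme scope, the package names and the allow-list are all hypothetical.

```python
import re

# Hypothetical monorepo layout: which packages each package may import from.
ALLOWED_IMPORTS = {
    "web": {"ui", "shared"},
    "api": {"shared"},
    "ui": {"shared"},
    "shared": set(),
}

# Matches imports such as: import { x } from "@acme/ui/button"
IMPORT_RE = re.compile(r"""from\s+["']@acme/([\w-]+)""")


def find_violations(package: str, source: str) -> list[str]:
    """Return one message per import that crosses a forbidden boundary."""
    allowed = ALLOWED_IMPORTS[package]
    messages = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for target in IMPORT_RE.findall(line):
            if target != package and target not in allowed:
                messages.append(
                    f"line {line_no}: '{package}' must not import from '{target}'"
                )
    return messages


source = 'import { db } from "@acme/api/db"\nimport { Button } from "@acme/ui/button"\n'
for message in find_violations("web", source):
    print(message)
```

The property that matters is the output: a line number and a plain statement of the rule that was broken, which the agent can act on without consulting any document.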

Executable: one script to bring everything up

Agents do their best work when verification is cheap. The workshop recommendation is a single script that spins up the full dev server, with the port configurable as a CLI argument so several agents can run in parallel worktrees without colliding. The reported setup used a tmux-based script for exactly this.
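
The reported setup was built on tmux, and its details were not shared. As a minimal sketch of the port-as-argument idea only, assuming a project that starts with npm run dev and reads its port from a PORT environment variable (both assumptions), a launcher might look like this:

```python
import argparse
import os
import socket


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Start the dev stack on a given port.")
    parser.add_argument("--port", type=int, default=3000,
                        help="port for this worktree's dev server")
    return parser.parse_args(argv)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failed bind means something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def build_launch(port: int) -> tuple[list[str], dict[str, str]]:
    """Return the command and environment for one dev server instance."""
    env = dict(os.environ, PORT=str(port))
    # Assumption: the project starts with `npm run dev` and reads PORT.
    return ["npm", "run", "dev"], env


args = parse_args(["--port", "4001"])
command, env = build_launch(args.port)
print(command, env["PORT"])
```

In practice the command and environment would be handed to subprocess.Popen, after port_is_free confirms no other worktree already holds the port. Two agents in two worktrees then differ by a single argument.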

State-switching helpers matter just as much. If the agent can flip between authenticated and unauthenticated states, or load seeded data with one command, it can verify its own work without manual setup. Every state a human has to arrange by hand is a state the agent will quietly decline to test.
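
A state helper can be very small. The sketch below assumes the dev server loads its starting state from a JSON file, which is our assumption rather than anything the workshop described. The state names and fixture contents are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical named states; a real project would map these to its own fixtures.
STATES = {
    "logged-out": {"session": None, "records": []},
    "logged-in": {"session": {"user": "test@example.com"}, "records": []},
    "seeded": {
        "session": {"user": "test@example.com"},
        "records": [{"id": 1, "title": "Sample order"}],
    },
}


def switch_state(name: str, state_file: Path) -> dict:
    """Write the named state to disk so the dev server can load it."""
    if name not in STATES:
        known = ", ".join(sorted(STATES))
        raise ValueError(f"unknown state '{name}'; expected one of: {known}")
    state_file.write_text(json.dumps(STATES[name], indent=2), encoding="utf-8")
    return STATES[name]
```

The error message lists the valid names on purpose: an agent that mistypes a state gets the correction in the same output, with no need to go looking for documentation.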

Verifiable: make the agent prove it

The strongest pattern reported was a PR skill that spawns a sub-agent to verify the work, returns a structured failure report when tests fail, and loops until every acceptance criterion passes before a pull request is opened. Alongside it, Playwright's CLI records test videos the agent attaches to the PR, so a reviewer in Sydney can watch the feature working before reading a single line of diff.
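
The control flow of such a loop is simple enough to sketch. The version below is our own illustration, not the workshop's skill: the sub-agent's checks and the fix step are stand-in callables, and every name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    criterion: str
    passed: bool
    detail: str = ""


@dataclass
class VerificationReport:
    rounds: int
    passed: bool
    failures: list[CheckResult] = field(default_factory=list)


def verify_until_green(
    run_checks: Callable[[], list[CheckResult]],
    attempt_fix: Callable[[list[CheckResult]], None],
    max_rounds: int = 5,
) -> VerificationReport:
    """Re-run checks, handing failures back for a fix, until all pass or rounds run out."""
    failures: list[CheckResult] = []
    for round_no in range(1, max_rounds + 1):
        failures = [result for result in run_checks() if not result.passed]
        if not failures:
            return VerificationReport(rounds=round_no, passed=True)
        if round_no < max_rounds:
            attempt_fix(failures)  # the structured failures drive the next attempt
    return VerificationReport(rounds=max_rounds, passed=False, failures=failures)
```

The pull request step would run only when the returned report has passed set to True. The max_rounds cap is a design choice of this sketch: without it, a criterion the agent cannot satisfy would keep the loop running indefinitely.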

The same idea extends beyond UI work. One team described a backend migration harness built on A/B comparison scripts and structured diff reports: the old and new code paths run side by side, and the agent gets a machine-readable answer to whether behaviour changed.
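
A comparison harness of this shape fits in a few dozen lines. The sketch below is a minimal example under our own assumptions: the two code paths are plain functions, and the rounding migration is a made-up case chosen because the two versions disagree on exactly one input.

```python
import json
from typing import Any, Callable, Iterable


def _capture(fn: Callable[[Any], Any], value: Any) -> dict:
    """Treat a raised exception as an output, so error behaviour is compared too."""
    try:
        return {"ok": fn(value)}
    except Exception as exc:
        return {"error": type(exc).__name__}


def compare_paths(
    old_path: Callable[[Any], Any],
    new_path: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> dict:
    """Run both implementations on each input and record every mismatch."""
    mismatches = []
    total = 0
    for value in inputs:
        total += 1
        old_out = _capture(old_path, value)
        new_out = _capture(new_path, value)
        if old_out != new_out:
            mismatches.append({"input": value, "old": old_out, "new": new_out})
    return {"total": total, "mismatched": len(mismatches), "mismatches": mismatches}


# Hypothetical migration: rounding cents to the nearest ten-cent unit.
def old_round(cents: int) -> int:
    return (cents + 5) // 10  # rounds halves up


def new_round(cents: int) -> int:
    return round(cents / 10)  # Python rounds halves to the nearest even number


report = compare_paths(old_round, new_round, [24, 25, 35])
print(json.dumps(report, indent=2))
```

Here the report flags the input 25, where the old path returns 3 and the new path returns 2. That is the kind of silent behaviour change a migration review tends to miss and a structured diff makes obvious.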

Goal prompts, schedules and atomic tools

Harness engineering also reaches into how tasks are written. The workshop's guidance for goal prompts in Claude Code: state the scope, name the verification method, set guardrails, and define stop conditions, so the agent knows what finished looks like and when to halt rather than improvise.
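
One way to make those four parts hard to forget is to give them a fixed shape. The sketch below is our own template, not a Claude Code format: the class, its field names and the example task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GoalPrompt:
    scope: str
    verification: str
    guardrails: list[str]
    stop_conditions: list[str]

    def render(self) -> str:
        """Lay the four parts out as plain text the agent can read top to bottom."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            f"Scope:\n{self.scope}\n\n"
            f"Verification:\n{self.verification}\n\n"
            f"Guardrails:\n{bullets(self.guardrails)}\n\n"
            f"Stop when:\n{bullets(self.stop_conditions)}"
        )


prompt = GoalPrompt(
    scope="Migrate the invoices module from callbacks to async/await.",
    verification="All invoice tests pass and the A/B report shows zero mismatches.",
    guardrails=["Do not edit files outside the invoices module.",
                "Do not change public function signatures."],
    stop_conditions=["Every acceptance criterion passes.",
                     "The same test fails three times in a row."],
)
print(prompt.render())
```

Because every field is required, a prompt with no stop conditions fails at construction time rather than three hours into a run.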

Scheduled autonomous tasks suit recurring work, such as monitoring support tickets every 30 minutes.
Keep agent tools atomic: small, general-purpose primitives the model can compose. As models improve, one general batch tool tends to beat five specialised ones.
Write stop conditions explicitly. An agent that knows when to halt is safer than one that guesses.

What this means for Australian engineering teams

The economics are direct. A senior engineer in Sydney or Melbourne costs upwards of $160,000 a year, and the difference between an agent that completes a three-hour task and one that stalls after twenty minutes is rarely the model. It is whether the repo let the agent see the rules, run the system and check the result. Readiness work of this kind is exactly what determines whether a Claude Code rollout sticks or quietly gets abandoned after the pilot.

Most codebases we look at fail at least one of the three properties, and the fix is usually days of work, not months: a trimmed CLAUDE.md, a dev-server script, a verification loop on the PR path. Model capability improves every quarter on someone else's schedule. The harness is the part you control.

Talk to a Claude specialist

Automata AI audits and prepares Australian codebases for agentic work, from CLAUDE.md structure through to verification loops. If you want a clear read on whether your repo is ready for Claude Code, book a short brainstorm and we will map the gaps with you.
