Claude Code Skills in Production: Lessons from Hundreds of Skills at Anthropic

June 2026 · 5 min read · Technical

Anthropic published a post on 3 June 2026 distilling what it learned from building and scaling hundreds of Claude Code skills in daily internal use. It is the most useful first-party guidance yet on the questions every team building with Claude eventually asks: which skills are worth making, how they should be structured, and when they should be shared across the team.

The timing matters for Australian engineering teams. Skills have moved from a power-user trick to the main extension point in Claude Code, and the teams that build a disciplined skill library early compound the benefit on every ticket. A fully loaded senior engineer in Sydney costs a business about $150,000 a year. At that rate, a skill that removes one hour of repeated setup or review work per engineer per week can pay back its build cost inside a month.

Skills are folders, not markdown files

The first lesson in Anthropic's catalogue is structural. Most teams start by treating a skill as a single SKILL.md file full of instructions. The skills that survive production use are folders: instructions plus scripts the agent can run, reference data it can read, validators that check its output, and configuration such as dynamic hooks that fire at the right moment in a session.

The difference is what a skill can guarantee. An instruction file hopes the model behaves. A folder with a bundled validator proves that it did, because the agent can run the check itself and fix what fails before a human ever reviews it. In regulated Australian environments, where APRA-supervised firms need to show how automated work was checked, that bundled evidence is the difference between a tool the risk team tolerates and one it signs off.
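As a rough illustration of what a bundled validator might look like, here is a minimal Python sketch. The script name, the required headings, and the placeholder rule are our own assumptions for the example and do not come from Anthropic's post. The point is only that the check is a script the agent can run, read the failures from, and act on.

```python
"""Hypothetical validator shipped inside a skill folder, e.g. scripts/validate_notes.py.

The file name, required headings, and placeholder rule are illustrative assumptions.
"""
import re

REQUIRED_HEADINGS = ("Summary", "Changes", "Rollback")  # assumed team standard
PLACEHOLDER = re.compile(r"\b(TODO|TBD)\b")


def validate_release_notes(text: str) -> list[str]:
    """Return a list of readable problems; an empty list means the draft passes."""
    problems = []
    # Collect heading titles from lines that start with '#'.
    headings = {
        line.lstrip("#").strip()
        for line in text.splitlines()
        if line.startswith("#")
    }
    for required in REQUIRED_HEADINGS:
        if required not in headings:
            problems.append(f"missing section: {required}")
    # Flag any line that still carries an unresolved placeholder.
    for number, line in enumerate(text.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            problems.append(f"line {number}: unresolved placeholder")
    return problems


draft = "# Summary\nFaster login.\n\n# Changes\nTODO list the commits\n"
for problem in validate_release_notes(draft):
    print(problem)
```

Because the function returns specific, line-level problems rather than a bare pass or fail, the agent has something concrete to fix before it hands the draft to a reviewer.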

Hooks deserve a special mention because they are the newest and least-used feature in the set. A skill can register a hook that runs automatically at defined points, which means checks stop depending on the agent remembering to invoke them. For long-running agentic sessions, that shift from invoked-when-remembered to fires-every-time is what makes a skill dependable enough to build a process on.
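To make the fires-every-time idea concrete, here is a small sketch of the kind of check a hook could run after each file edit. Treat the details as assumptions to verify against the Claude Code hooks reference: we assume the event arrives as JSON with the edited path at tool_input.file_path, and that exit code 2 signals a blocking failure. The team rule itself is hypothetical.

```python
import json

# Assumed convention: exit code 0 lets the session continue, exit code 2 blocks
# and hands the message back to the agent. Verify against the hooks reference.
ALLOW, BLOCK = 0, 2


def check_edit_event(raw_event: str) -> tuple[int, str]:
    """Decide whether one file-edit event passes the team rule.

    Assumes the event is JSON with the edited path at tool_input.file_path.
    """
    event = json.loads(raw_event)
    file_path = event.get("tool_input", {}).get("file_path", "")
    # Hypothetical team rule: release notes must live under docs/releases/.
    is_release_note = file_path.endswith("release-notes.md")
    if is_release_note and "docs/releases/" not in file_path:
        return BLOCK, f"{file_path}: release notes belong under docs/releases/"
    return ALLOW, ""


sample = json.dumps({"tool_input": {"file_path": "tmp/release-notes.md"}})
code, message = check_edit_event(sample)
print(code, message)
```

A thin wrapper would pass sys.stdin.read() to check_edit_event, write the message to stderr, and exit with the returned code, so the rule is applied on every edit without anyone asking for it.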

The nine-category lesson

After cataloguing its internal library, Anthropic found that hundreds of skills cluster into nine categories. The taxonomy itself matters less than the boundary rule that fell out of it: the best skills fit cleanly into one category. Skills that try to do too much straddle several, and a straddling skill confuses the agent that has to decide when to invoke it. The result is a skill that fires when it should not and stays silent when it should fire.

That matches what we see in client work. The skills that stick are narrow, verifiable, and shipped with their own tooling. The ones that rot are the kitchen-sink skills that mix release notes, infrastructure checks, and Slack etiquette in a single file.

Anthropic's guidance, condensed for a team starting its own library:

  • Catalogue the repeated work first, then write skills against the clusters you actually find rather than the ones you imagine.

  • Keep each skill single-purpose. If a draft straddles two jobs, split it into two skills.

  • Use the whole folder. Bundle scripts, reference data, and validators the agent can execute, not just prose instructions.

  • Promote a skill from personal to team-wide only after it has survived real use on real tickets.

What this means for Australian engineering teams

Run the numbers on a 20-engineer team in Sydney or Melbourne. If each engineer loses three hours a week to work a skill could absorb (environment setup, review checklists, test scaffolding, release notes), that is roughly $200,000 of engineering time a year at typical fully loaded rates. A working skill library that claws back even half of that is one of the cheapest productivity investments available to an Australian engineering organisation in 2026.
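The arithmetic behind that estimate can be sketched in a few lines of Python. The team size, hours lost, and $150,000 fully loaded cost come from the figures above. The 38-hour working week is our assumption, and the result moves with it.

```python
def annual_cost_of_lost_time(
    engineers: int,
    hours_lost_per_week: float,
    fully_loaded_cost: float,
    hours_per_week: float = 38.0,  # assumed standard full-time week
) -> float:
    """Cost of the share of each engineer's week spent on absorbable work."""
    # Each engineer's lost share of the week is hours_lost / hours_per_week;
    # multiply that share by their annual cost and by the head count.
    return engineers * fully_loaded_cost * hours_lost_per_week / hours_per_week


lost = annual_cost_of_lost_time(engineers=20, hours_lost_per_week=3, fully_loaded_cost=150_000)
print(f"Lost time per year: ${lost:,.0f}")
print(f"Recovered if skills absorb half: ${lost / 2:,.0f}")
```

Under these assumptions the lost time comes to about $237,000 a year, or $225,000 on a 40-hour week, so the rounded $200,000 figure sits on the conservative side.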

There is also a quality argument the cost figure undersells. A validator the agent runs on every invocation applies the team's standard every single time, including at 5pm on a Friday. Human reviewers do not. Teams that encode their review checklist as a skill see the baseline rise on every pull request, not just the ones a senior happens to read closely.

Data handling belongs in the plan too. A skill that touches customer records needs the same Privacy Act thinking as any other internal tool, and bundling the handling rules into the skill itself is easier than policing them after the fact.

How to start a skill library this quarter

Start with an audit, not a build. Spend a week logging the prompts your engineers repeat, the setup steps they re-explain, and the checks reviewers apply by hand. The three largest clusters that emerge are your first three skills. Build those, ship them to the people who felt the pain, and measure invocation counts and time saved before writing the next batch.
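The audit does not need tooling beyond a shared log and a tally. As a minimal sketch, assuming engineers record each repeated task as a short agreed label (the labels and thresholds below are made up for illustration), the ranking could look like this:

```python
from collections import Counter


def top_repeated_tasks(log_entries, limit=3, min_count=2):
    """Rank logged tasks by how often they recur, ignoring case and stray spaces."""
    counts = Counter(
        entry.strip().lower() for entry in log_entries if entry.strip()
    )
    # Keep only tasks that actually repeat, most frequent first.
    repeated = [(task, n) for task, n in counts.most_common() if n >= min_count]
    return repeated[:limit]


# Hypothetical week of log entries.
week_log = [
    "set up local env",
    "Set up local env",
    "write release notes",
    "review checklist",
    "write release notes",
    "set up local env ",
    "rename a branch",
]
for task, n in top_repeated_tasks(week_log):
    print(f"{n}x {task}")
```

Exact-label matching is deliberately crude. Free-text prompts would need tagging or manual grouping first, which is a reasonable job for the person running the audit.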

Resist the urge to write ten skills in a weekend. Anthropic's own experience with hundreds of skills points the other way: a small number of single-purpose, validated skills beats a large library of vague ones, because the agent picks the right tool reliably and the team learns to trust the output.

Automata AI designs and builds custom Claude skills for Australian engineering teams as a core service, from the initial workload audit through to validators and team-wide rollout. If you want a skill library shaped by your team's actual repeated work rather than a generic template, book a short scoping chat.
