Blog

Claude Banking Pilot to Production: A 90-Day Roadmap

May 2026 · 6 min read · AI Strategy

Printed APRA compliance document with handwritten margin notes, a pen, and a coffee mug on a meeting room table
← Back to all posts

Your Claude pilot worked. The compliance team loved the demo. The risk committee saw the numbers.

Then legal asked for an audit trail. IT raised failover to a secondary provider. APRA's third-party risk governance requirements under CPS 230 surfaced as a board-level concern. Six months on, the pilot is still running in a sandbox.

This is a sequencing problem, not a technology problem. Banks that move from pilot to production in 90 days do one thing differently: they treat regulatory alignment as the long pole in the tent, not a parallel workstream. Technology is fast. The APRA CPS 230 third-party risk process, the Privacy Act mapping, the board-level sign-off: none of that runs on your timeline.

The three-phase banking production loop

The framework below reflects the shape we have run for Australian financial services firms navigating their first Claude deployment into a regulated production environment. Every phase has one primary output. Running phases concurrently is the single most reliable way to restart the clock at day 45.

Three-phase framework: regulatory alignment days 1 to 30, operational hardening days 31 to 60, shadow and assisted deployment days 61 to 90

Phase 1 (Days 1-30): Regulatory and security alignment

This phase produces one deliverable: a signed-off production deployment shape. That means APRA CPS 230 mapping, a CPS 234 audit trail design, a Privacy Act (1988) review, and a third-party risk assessment for Anthropic and the underlying compute infrastructure.

The third-party risk assessment is where most banks hit unexpected complexity. Mapping Anthropic's sub-processors, data residency positions, and contractual commitments to APRA's outsourcing requirements under CPS 230 requires legal, risk, and technology to work in the same room. Compressing this into two weeks requires all three functions to be available from day one.

Every engineering decision made before this phase is signed off is provisional. Banks that bolt the audit trail on at day 75 get told to start over. That is not a cautionary tale — it is the modal outcome.

Phase 2 (Days 31-60): Operational hardening

Phase 2 rewrites the pilot codebase as production-grade software with proper test coverage. Four checkpoints gate the phase exit:

  • Failover tested. Secondary provider failover exercised under load, not just documented in the architecture diagram.

  • SIEM integration verified. Every agent action is visible in the bank's security event monitoring before any production traffic flows.

  • Permissions framework shipped. Role-based access controls, not ad hoc API keys granted to whoever needed access during the pilot.

  • Audit logging live. The CPS 234-compliant log exists and is complete before the first production transaction.

Phase 3 (Days 61-90): Shadow and assisted deployment

Two weeks in shadow mode: the agent runs in parallel with the human workflow, both outputs are compared, and discrepancies are logged. Two weeks in assisted mode: the agent recommends, the human executes and can override. Then gradual production cutover, measured in volume percentage, not calendar dates.

Skipping shadow mode and going straight to assisted deployment will fail at least one risk committee meeting. This is not a soft preference. Shadow mode is the evidence your risk committee needs to see before authorising live traffic.

Operations team training runs concurrently through phase three. The goal is not general AI literacy. It is the specific capability to monitor agent outputs, identify drift, and escalate override patterns to the engineering team before they compound into compliance incidents.

When the 90-day shape is the wrong call

The 90-day structure is not the right size for every workstream. If your first Claude use case is a low-volume internal process that does not touch regulated data, customer decisions, or anything that attracts APRA or ASIC scrutiny, you are over-engineering the first deployment.

For those workstreams, a four-to-eight week engagement at a significantly lower investment is usually the right starting point. The AI Automation Services page covers the Accelerator and Discovery tier options for lighter-touch deployments. The full 90-day structure is warranted when:

  • The workstream touches regulated customer data or material operational risk decisions.

  • Failure has a direct path to a regulator, an audit, or a customer complaint.

  • The intent is to build shared platform infrastructure (compliance, SIEM, and audit foundations) that will be reused across multiple future workstreams.

The investment and the platform tax

A 90-day pilot-to-production engagement for a single high-value banking workstream costs around $1.4 million in capable hands, covering platform engineering, regulatory work, and operations training. Annual run cost is approximately $400,000.

The $400,000 annual run cost breaks down as platform maintenance, model usage at production volumes, and ongoing compliance overhead. This is roughly equivalent to what three to four senior analyst hours per week costs fully loaded. For most high-value banking workstreams, it is less than the manual version of the same process.

The more important number is what the second workstream costs. The audit trail, the SIEM integration, the permissions framework, the multi-provider failover: all of it was built in workstream one. A tier-two Australian bank that ran this 90-day shape on their first Claude workstream landed three more in the following six months at a fraction of the original cost. The platform tax is paid once.

Three statistics: $1.4M typical 90-day engagement, $400K annual run cost, $200K-plus per year compounding optionality value

Three signals to watch at month six

No 12-month AI plan survives the Australian market unchanged. The hyperscaler availability story, model capabilities, and regulatory clarity have all shifted materially over the last 18 months. Build a formal checkpoint at month six. The signals that should trigger a plan revision:

  • Anthropic pricing changes. A material shift in API pricing resets the cost model for every workstream queued behind your first.

  • Model capability changes. New Claude releases can make workstreams viable that were not 12 months ago, or change the ROI assumption for existing ones.

  • Regulatory positions from APRA and the OAIC. New guidance on AI in credit decisions or automated advice changes the compliance scope for several common banking use cases.

The $80,000 to $200,000 per year of compounding optionality is the strategic return the platform layer generates: the ability to shift workloads between providers, absorb model upgrades, and respond to pricing changes without a rebuild. Run the payback model in our ROI Calculator if you want to stress-test the assumption against your specific workstream.

Pick the workstream that is already bleeding the most time and revenue. Run the AI Readiness Assessment against it. If the payback clears six months, the 90-day investment is straightforward to justify to the board. If it does not, you have found a better starting point before spending the $1.4 million to discover it.

Ready to move from AI pilot to production?

We help mid-market Australian businesses deploy AI automations that actually reach production and deliver measurable ROI.