On 1 June 2026, MiniMax released M3, the first open-weight model to land within a rounding error of Claude Opus on the coding benchmark that matters most to working engineers. For Australian teams already building on Claude, the launch raises a fair question: is a model that costs a fraction of Opus and ships with open weights a reason to switch? The short answer is that M3 is a real competitor, not hype. The longer answer is that the benchmark number is the least interesting part of the decision.
What MiniMax M3 actually is
MiniMax M3 packages three frontier capabilities that most open-weight models ship separately. It is built for agentic and coding work, and the weights are scheduled for public release around 11 June, which means teams with the infrastructure to self-host can run it inside their own environment.
One-million-token context. Native rather than stitched together from chunked retrieval, so the whole codebase can sit in context.
Native multimodal input. Text and vision in the base model instead of a bolted-on vision pipeline.
Switchable thinking mode. A faster default mode or a slower reasoning mode, toggled on demand.
Open weights. Releasing around 11 June, enabling self-hosting for teams that want full control of where inference runs.
The benchmark that has everyone's attention
On Terminal-Bench 2.1, the test built around real agentic terminal tasks rather than toy puzzles, MiniMax M3 scored 66.0 against Claude Opus 4.7 at 66.1. That is not a gap an engineer would feel day to day. During its launch week MiniMax priced M3 at $0.30 per million input tokens and $1.20 per million output tokens, close to DeepSeek territory and well below the premium tier where Claude Opus sits. For a high-volume code-generation pipeline, that difference is real money.
Why the price gap is smaller than it looks
Token price is the most visible cost and rarely the largest one. Consider a 20-engineer Sydney team running coding agents across a busy month. The headline saving from routing every request to the cheaper model might look like $40,000 a year. Once you add the work of standing up self-hosted inference, the GPU capacity to serve a one-million-token context at production latency, and the engineering hours to monitor and patch it, a chunk of that saving is spent before the first line of code ships. A managed Claude deployment folds those costs into the price.
Self-hosting open weights means provisioning and paying for GPU capacity that sits idle between bursts.
A one-million-token context is expensive to serve at low latency, so real throughput costs more than the promotional token price suggests.
Time to monitor, evaluate, and patch a self-hosted model is an ongoing line item, often $120K or more in loaded salary for a single platform engineer.
Switching costs compound if you later move workloads back, so the year-one saving rarely repeats.
Where Claude still wins for Australian enterprises
Benchmarks measure capability on a fixed test. Production measures something else: predictability under edge cases, accountability when something breaks, and a documentation trail that a procurement or risk team can actually read. This is where the decision tilts back toward Claude for most regulated Australian work.
Instruction following at the edges. Claude's Constitutional AI training produces more predictable refusals and safer outputs in compliance-sensitive scenarios, which matters when an agent touches customer data covered by the Privacy Act.
Enterprise support and SLA. An open-weight model has no partnership layer and no formal service agreement behind it, so when a production incident hits, you own the whole stack.
Local presence. Anthropic has an Australia and New Zealand GM and a published APAC compliance roadmap; MiniMax has neither, which is a live concern for APRA-regulated firms and any buyer weighing AUSTRAC obligations.
Audit trail. Claude ships model cards, published evals, and safety documentation that satisfy legal and procurement reviews in financial services, healthcare, and government.
The hybrid architecture worth considering
The most interesting use of M3 is not as a replacement but as a component. Because MiniMax exposes an Anthropic-compatible API endpoint, you can keep Claude Opus as the orchestrator that handles planning, reasoning, and any sensitive tool calls, and route bulk, repetitive code generation to M3 as a cheaper sub-agent. For a cost-sensitive, low-regulatory pipeline, that split can cut spend without giving up Claude's judgment where judgment counts. We would not point regulated workloads at it, but for internal tooling and non-sensitive batch work it is worth a controlled trial.
What we tell Australian clients
MiniMax M3 is the first open-weight model that genuinely competes with Opus on coding, and pretending otherwise would not help anyone. For a business choosing the foundation of an automation stack it will ship to customers, the question is not which model wins a single benchmark this month. It is which platform gives your team predictable behaviour, real support, and a compliance story your auditors accept. For most Australian enterprises, that remains Claude, with M3 as a tool to deploy deliberately rather than a default to switch to. If you are weighing the two for a real workload, we can help you map the right architecture: book a brainstorm via our contact page.



