OpenAI has previewed a new family of models, and the launch slide name-checks Claude. For Australian businesses already building on Claude, that is the most useful detail in the announcement. When a competitor picks Claude Mythos 5 as the score to beat, it tells you where the frontier sits and which model is the reference point. Here is a measured read on the GPT-5.6 preview, and what it should and should not change for your team.
What OpenAI actually announced
OpenAI previewed three models under the GPT-5.6 name: Sol, positioned as the flagship for developers and enterprises; Terra, a balanced option for everyday work; and Luna, a faster and cheaper tier. The company says Sol slightly outperforms Claude Mythos 5 on a cyber-security benchmark called ExploitBench while using roughly 80% fewer output tokens. Access is a limited preview through the API and Codex, with wider availability promised in the coming weeks.
The vendor-stated prices, in US dollars per million tokens, are Sol at $5 input and $30 output, Terra at $2.50 input and $15 output, and Luna at $1 input and $6 output. Read those as indicative only. They are preview prices quoted in USD, and your real bill in Australian dollars depends on the exchange rate on the day and on how many tokens your workloads actually use. At recent exchange rates, Luna's $1 input price works out to roughly $1.50 in AUD, but that figure moves with the market.
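To see how per-million-token prices turn into an Australian-dollar figure, here is a minimal Python sketch. It uses OpenAI's vendor-stated preview prices quoted above. The monthly token volumes and the rate of 1.50 AUD per USD are hypothetical placeholders, not forecasts, so swap in your own usage logs and the rate on the day.

```python
# Vendor-stated GPT-5.6 preview prices, USD per million tokens: (input, output).
PREVIEW_PRICES_USD = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}

def monthly_cost_aud(tier, input_tokens, output_tokens, usd_to_aud):
    """Estimate a monthly bill in AUD for one tier.

    usd_to_aud is AUD per 1 USD on the day you run the estimate. It is an
    argument rather than a constant because the exchange rate moves.
    """
    price_in, price_out = PREVIEW_PRICES_USD[tier]
    usd = (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out
    return usd * usd_to_aud

# Hypothetical workload: 2M input and 0.5M output tokens a month,
# at an assumed 1.50 AUD per USD.
for tier in PREVIEW_PRICES_USD:
    print(tier, round(monthly_cost_aud(tier, 2_000_000, 500_000, 1.50), 2))
```

Keeping input and output prices separate matters because output tokens cost six times as much as input tokens at every tier, so the mix of your workload drives the bill as much as the tier does.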
Two caveats sit under all of this. The benchmark result and the pricing are reported by OpenAI, not independently verified. And the models are in limited preview, so most teams cannot run their own tests yet. Treat the headline as a claim to check later, not a result to act on now.
It also helps to know what ExploitBench measures. It scores a model on offensive cyber-security tasks, which is one narrow slice of capability and a long way from the work most Australian businesses run day to day. A strong score there tells you little about how a model drafts a contract summary, reconciles a month of invoices, or runs a customer support workflow without going off-script. It is a useful signal about raw capability at the frontier, but it is one benchmark on one kind of task, and your evaluation should weigh the jobs you actually care about.
The Claude-first read
A launch slide is a marketing artefact, not a migration plan. Here is how we read it for clients standardised on Claude:
The headline is a benchmark, not your bill. Using roughly 80% fewer output tokens on one cyber benchmark says little about total cost of ownership across your prompts, guardrails and real tasks.
Claude is the reference point. OpenAI chose Claude Mythos 5 as the baseline to beat, which is a useful signal about where the frontier actually is.
Your switching cost is the integration, not the model name. If you run Cowork, Claude Code and MCP-based agents, the cost of moving is the whole surface you have built, not a line in a pricing table.
Capability is what you get day to day. Weigh any raw token-efficiency edge against the Skills and MCP ecosystem, predictable safety behaviour, and an agent stack your team already trusts.
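The gap between a token-efficiency headline and a bill is easy to show with arithmetic. The Python sketch below assumes, hypothetically, that the 80% output-token reduction carries over unchanged to your tasks (nothing in the announcement establishes that) and prices both tasks at Sol's stated preview rates. The two token profiles are illustrative, not measured.

```python
def output_saving_share(input_tokens, output_tokens, price_in, price_out, output_cut):
    """Fraction of a task's total cost removed by cutting output tokens by output_cut.

    Prices are USD per million tokens; output_cut is a fraction, e.g. 0.80.
    """
    input_cost = input_tokens / 1_000_000 * price_in
    output_cost = output_tokens / 1_000_000 * price_out
    return (output_cost * output_cut) / (input_cost + output_cost)

SOL_IN, SOL_OUT = 5.00, 30.00  # vendor-stated preview prices, USD per million tokens

# Hypothetical task profiles; substitute token counts from your own logs.
long_document_summary = output_saving_share(50_000, 1_000, SOL_IN, SOL_OUT, 0.80)
long_form_drafting = output_saving_share(2_000, 10_000, SOL_IN, SOL_OUT, 0.80)

print(f"input-heavy task:  {long_document_summary:.1%} cheaper")
print(f"output-heavy task: {long_form_drafting:.1%} cheaper")
```

On these made-up numbers the same 80% cut trims under 9% off the input-heavy task and over 77% off the output-heavy one. Workloads that send long documents in and get short answers back sit closer to the first case.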
What this changes for an Australian business (for now, not much)
If you are running Claude in production today, the honest answer is that a limited-preview competitor model changes nothing this week. You cannot buy most of it yet, and the one public number is a single benchmark. The right response is a measured evaluation when access opens, not a reaction to a launch.
When you do evaluate, run a like-for-like test on your own tasks, with your own prompts and your own AUD cost profile. A proper eval across a few real workflows costs a fraction of a rushed migration. Budgeting $15,000 to test a model change on your data is cheap next to re-platforming an agent stack that already works and then discovering the benchmark did not hold on your jobs.
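A like-for-like evaluation mostly comes down to disciplined bookkeeping: the same tasks, scored against the same acceptance criteria, with token usage converted into AUD. The Python sketch below shows one way to tally that. The TaskResult structure, the task names, the token counts and both price schedules are hypothetical placeholders. In practice the records would come from running your own prompts through each model and checking the outputs against your criteria.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    task_id: str
    passed: bool        # did the output meet your acceptance criteria?
    input_tokens: int
    output_tokens: int

def summarise(results, price_in_usd, price_out_usd, usd_to_aud):
    """Pass rate, total AUD cost and AUD cost per passed task for one model.

    Prices are USD per million tokens; usd_to_aud is AUD per 1 USD.
    """
    if not results:
        raise ValueError("no results to summarise")
    total_usd = sum(
        r.input_tokens / 1_000_000 * price_in_usd
        + r.output_tokens / 1_000_000 * price_out_usd
        for r in results
    )
    total_aud = total_usd * usd_to_aud
    passed = sum(1 for r in results if r.passed)
    return {
        "pass_rate": passed / len(results),
        "total_aud": total_aud,
        "aud_per_pass": total_aud / passed if passed else float("inf"),
    }

# Hypothetical results for the same three tasks run through two models.
model_a = [
    TaskResult("invoice-reconciliation", True, 40_000, 2_000),
    TaskResult("contract-summary", True, 60_000, 1_500),
    TaskResult("support-reply", False, 5_000, 800),
]
model_b = [
    TaskResult("invoice-reconciliation", True, 40_000, 600),
    TaskResult("contract-summary", False, 60_000, 400),
    TaskResult("support-reply", True, 5_000, 300),
]

# Like-for-like means identical task sets; refuse to compare otherwise.
if {r.task_id for r in model_a} != {r.task_id for r in model_b}:
    raise ValueError("task sets differ; this is not a like-for-like comparison")

# Placeholder prices (USD per million tokens) and exchange rate; use your own.
report_a = summarise(model_a, 3.00, 15.00, 1.50)
report_b = summarise(model_b, 5.00, 30.00, 1.50)
print("model A:", report_a)
print("model B:", report_b)
```

Reporting cost per passed task, rather than cost per token, stops a model that is cheaper per token from looking better than it is when it fails more of your jobs.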
For regulated Australian businesses there is a second test beyond cost and capability. Any model you put near customer data has to fit your obligations under the Privacy Act and, for APRA-regulated firms, your outsourcing and data-handling controls. A model being cheaper per token does not make it compliant in your environment. That assessment is part of the eval, not an afterthought.
How we would handle the question
When a client in Sydney asks whether they should move to GPT-5.6, we do not answer from the launch post. We answer with evidence: a scoped evaluation on their workloads, their cost profile in Australian dollars and their compliance posture. We lead with Claude because it is the product we build on and trust, and we use moments like this to pressure-test that choice honestly rather than defend it on reflex.
If your team is weighing a model change and wants a clear-eyed evaluation rather than a benchmark headline, we can help you design one. You can book a brainstorm session and we will scope it against your actual workloads.



