The race to bigger context windows reached a milestone in June 2026 when MiniMax M3 shipped a one-million-token window in an open-weight model. A million tokens is roughly a stack of books fed into a single prompt, and the capability is genuine. What surprises most Australian teams is the bill that arrives with it. A bigger window is not automatically a better deal, and the businesses that stay in control are the ones that treat context size as a cost decision rather than a default setting.
Why a bigger window costs more
Models charge by the token, and the input you send counts just as much as the answer you get back. A prompt that pushes an entire document library into the window can cost many times more than a focused prompt that carries only the pages that matter. Three points are worth holding onto before you reach for the biggest window on offer:
Cost scales with how much you put into the window, not only with the length of the reply.
A one-million-token prompt is usually slower and far pricier than a well-targeted ten-thousand-token one.
More context does not always mean a better answer, because a model can lose track of a detail buried deep inside a huge prompt.
For an Australian team, the easy habit of pasting everything in can quietly inflate a monthly invoice without improving a single result. The window is there to be used with intent, not filled to the brim on every call.
A worked example from a Melbourne firm
Take a Melbourne firm summarising long contracts. Sending each 200-page contract as one giant prompt might cost a few dollars per document. Across 1,000 documents a month, that adds up to real money. A retrieval approach, where the system finds and sends only the relevant clauses, can cut the same job to a fraction of the input size and bring the monthly cost from around $6,000 down toward $1,500, with no loss of quality.
The lesson is not to avoid long context. It is to use it on purpose:
Use a large window when the task genuinely needs the whole document in one pass.
Use retrieval to send only what matters when the question is targeted.
Measure the token cost of each pattern on a small sample before rolling it out across the business.
Where the spend really adds up
Long-context habits scale badly. A workload that sends large prompts on every call can cost several times more than a retrieval-based design doing identical work. At higher volumes the gap widens fast. A team processing tens of thousands of documents a month can watch long-context costs climb past $50,000 a year, when a disciplined design would have held the same workload closer to $15,000. The model is rarely the problem. The prompt pattern is.
There is a governance angle as well. Under the Privacy Act, an Australian business is responsible for the personal information it sends to any model, and a habit of pasting whole document sets into a prompt widens the surface area of data leaving your systems. Sending less per request is cheaper and lowers that exposure at the same time, which is a rare case of the careful choice also being the frugal one.
Two questions before you send a big prompt
Most overspending comes from sending more than the task requires. Before a workflow goes live, two questions catch the worst of it:
Does this task actually need the whole document, or just a section of it?
Would a retrieval step answer the same question with a tenth of the input?
If the honest answer is that a smaller prompt would do, the smaller prompt is almost always the better build. It is cheaper to run, faster to return, and easier to test, because there is less going into it that could push the model off course.
How we build it on Claude
We design client systems on Claude, the model family from Anthropic, with long context available but used by exception rather than by default. Claude offers a large context window when a task genuinely needs it, and we pair it with retrieval so the common case stays small, fast, and affordable. A whole-document question gets the whole document. A routine query gets only the few passages it actually needs.
Open-weight long-context models like MiniMax M3 have a place too, particularly where data must stay onshore for a Sydney or regional client with a strict residency rule. The cost discipline is the same whichever model sits underneath. The saving comes from the design of the system, not the badge on the model.
The takeaway for budgets
A one-million-token window is a tool, not a switch you leave on. The Australian teams that keep their AI costs predictable are the ones that match the context size to the task and watch the bill as they go. A short design review before launch, costing around $3,500, can save many times that across a year of running prompts, and it pays for itself the first month a routine job stops being sent at full-document prices.
If you want to size the long-context cost for your own workload before it lands on an invoice, book a brainstorm and we will model it against your real volumes.



