What Claude Users Should Know About Gemini 3.5

On 19 May 2026, Google announced Gemini 3.5, with the Flash variant shipping first and the Pro variant flagged for the following month. The marketing is loud, the benchmarks are sharp, and a handful of clients in Sydney and Melbourne have already pinged us asking whether their Claude rollouts need to pause. The short answer is no. The longer answer is more interesting, because Gemini 3.5 does signal something real about where agentic AI is heading, and Australian enterprises that bet on Claude should understand exactly which parts of the launch matter and which parts can be safely ignored.

What Google actually shipped

Gemini 3.5 is a frontier-model family combining reasoning with action. Flash is in general availability today through the Gemini app, AI Mode in Google Search, the Gemini API in Google AI Studio, Antigravity, Android Studio, and the Gemini Enterprise Agent Platform. Pro is in internal use and is scheduled for broader release next month. The pitch is that Flash now rivals last-generation flagship models on agent-style tasks while running at Flash-series throughput, which Google quotes at roughly four times the output tokens per second of comparable frontier models.

Google's published numbers position Gemini 3.5 Flash above Gemini 3.1 Pro on a small set of benchmarks. These are the headline claims worth knowing before any meeting about whether to add Gemini to your AI vendor list:

Terminal-Bench 2.1 at 76.2 percent, an agentic coding evaluation
GDPval-AA at 1656 Elo, a productivity tasks rating across knowledge work
MCP Atlas at 83.6 percent, a Model Context Protocol tool-use evaluation
CharXiv Reasoning at 84.2 percent, a multimodal chart-understanding benchmark
Roughly four times the output tokens per second versus other frontier models

Two things to keep in mind. First, these are Google's own numbers, not independently reproduced by Stanford, AI2, or the Australian government's AI Risk Group, so treat them the way you would treat any vendor's first-party benchmark sheet. Second, Terminal-Bench, GDPval, and MCP Atlas are all relatively young benchmarks, which means the variance from prompt format and harness choice is high. A six-point gap on Terminal-Bench may or may not survive a real-world Australian engineering rollout.

Where Claude sits in the comparison today

Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5 are not standing still. They are already in production across PwC's global rollout to 30,000 professionals, inside Anthropic's own engineering and cybersecurity teams, and across the Cowork, Code, and Platform surfaces that Australian builders use every day. The capability story you can ship to an Australian client this quarter is not a benchmark printout. It is a stack: Claude as the reasoning core, Skills for domain-specific behaviours, Cowork for the desktop workflow, MCP servers for the rest of the customer's software, and Claude Code in repositories at Australian banks, insurers, and law firms today.

Gemini 3.5 Flash is, today, available through Google AI Studio and the Gemini Enterprise Agent Platform. Those are credible surfaces. They are also less mature than what Australian buyers see on the Claude side, where Skills, Cowork, and MCP are already wired through the production tooling that Sydney and Melbourne teams have spent months training their staff to use.

A practical lens for Australian enterprises

If you are running a Claude-first stack in Sydney, Melbourne, or Brisbane, the Gemini 3.5 launch should change roughly nothing about your next 90 days. Switching base models in a production agent is not a benchmark exercise. It is a re-platforming project. For a typical Australian mid-market deployment with 200 active seats, the combined work of retraining staff, rebuilding the prompt library, re-running your APRA CPS 230 operational risk assessment, and re-certifying any Privacy Act 1988 data-handling controls comes in at roughly $85,000 of internal and consulting cost. That is the real cost of model-vendor churn, and it is one of the strongest reasons to think carefully before chasing the latest leaderboard delta.

There are categories where Gemini 3.5 is genuinely worth a pilot. Multimodal chart reasoning, where Google has consistently shown strong numbers, is one. Video-grounded reasoning, where Gemini's training data and pipeline have a head start, is another. For Australian organisations under AUSTRAC obligations or in scope of ASIC market integrity rules, the bar to add a second AI vendor is high, and the right framing is rarely the word 'switch'. It is 'where does a Gemini call slot inside an otherwise Claude-shaped architecture'.

When it makes sense to run Gemini alongside Claude

For most Australian teams the answer is: rarely, and only when the workload is narrow enough to justify a second compliance review. A Claude-centric architecture covers the majority of knowledge-work agents, coding assistants, customer support copilots, and legal or financial document workflows that Australian mid-market firms actually run. A second model adds a second vendor risk profile, a second contract, a second set of usage caps, and a second incident response surface for the security team to manage.

That said, there are workloads where Gemini's strengths in multimodal video and chart understanding may justify a focused pilot:

Long-form video summarisation, where Gemini ingests raw video and Claude does the narrative layer downstream
Chart-heavy report parsing in finance and analyst workflows, where the multimodal benchmark scores are most relevant
Hedging workloads on regulated Australian government contracts where a single-vendor dependency is itself a procurement risk

In every one of these cases, the architectural pattern is Claude as the orchestration core and Gemini called as a tool, not Gemini as a substitute. The MCP layer makes that pattern straightforward. Claude is already MCP-native, and the same Skills and Cowork plugins continue to work whichever upstream tool gets invoked.
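The orchestrator-plus-tool pattern described above can be sketched in a few lines. This is a minimal illustration, not a real integration: the Orchestrator class, the chart_reader tool name, and gemini_chart_reader_stub are all hypothetical names introduced here, and no vendor SDK or MCP server is actually called. The point it shows is that the Claude side owns the workflow and the audit trail, while the second model is one registered step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a tool takes a text payload and returns text.
Tool = Callable[[str], str]


@dataclass
class Orchestrator:
    """Stand-in for the Claude-side agent loop; no real model is called."""
    tools: Dict[str, Tool] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def register(self, name: str, tool: Tool) -> None:
        self.tools[name] = tool

    def call_tool(self, name: str, payload: str) -> str:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        # Record every outbound call so security can see which vendor ran.
        self.audit_log.append(name)
        return self.tools[name](payload)


def gemini_chart_reader_stub(payload: str) -> str:
    # Placeholder (assumption): a real deployment would call the second
    # vendor here, most likely behind an MCP server.
    return f"[chart summary for {payload}]"


def summarise_report(orch: Orchestrator, chart_ref: str) -> str:
    # The orchestrator owns the workflow; the second model is one step in it.
    chart_text = orch.call_tool("chart_reader", chart_ref)
    return f"Narrative draft based on: {chart_text}"
```

Swapping the stub for a different upstream tool changes one registration call and leaves summarise_report untouched, which is the property the pattern is meant to buy.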

What to actually do this week

If you are an Australian CIO, head of AI, or engineering lead reading this from a Claude-shaped roadmap, here is what we tell our clients. Hold the roadmap. Note the Gemini 3.5 launch in your vendor risk register so that procurement and security have it on file. Schedule one engineering spike in the next sprint to run a single agentic benchmark on a representative workload from your business, side by side, on Gemini 3.5 Flash and Claude Sonnet 4.6. Report the result, the latency, and the price per task to your steering committee. Then keep shipping. Vendor benchmark cycles run on six-week cadences and Australian enterprise AI rollouts run on twelve-month cadences. The two do not need to match.
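For the engineering spike, a small harness along the following lines is usually enough to produce the three numbers the steering committee needs. It is a sketch under stated assumptions: run_spike, SpikeResult, and the Runner signature are hypothetical names, the runner is whatever wrapper your team writes around each vendor's API, and the per-million-token prices are parameters you fill in from the vendors' current price sheets rather than values given here.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical signature: a runner takes a task prompt and returns
# (passed, input_tokens, output_tokens) for that task.
Runner = Callable[[str], Tuple[bool, int, int]]


@dataclass
class SpikeResult:
    model: str
    pass_rate: float       # fraction of tasks that passed
    mean_latency_s: float  # mean wall-clock seconds per task
    cost_per_task: float   # mean cost per task, in the price sheet's currency


def run_spike(
    model: str,
    runner: Runner,
    tasks: List[str],
    price_in_per_mtok: float,
    price_out_per_mtok: float,
) -> SpikeResult:
    if not tasks:
        raise ValueError("tasks must not be empty")
    passes: List[bool] = []
    latencies: List[float] = []
    costs: List[float] = []
    for task in tasks:
        start = time.perf_counter()
        passed, tokens_in, tokens_out = runner(task)
        latencies.append(time.perf_counter() - start)
        passes.append(passed)
        # Prices are quoted per million tokens, so scale back down.
        costs.append(
            (tokens_in * price_in_per_mtok + tokens_out * price_out_per_mtok)
            / 1_000_000
        )
    return SpikeResult(
        model=model,
        pass_rate=sum(passes) / len(tasks),
        mean_latency_s=mean(latencies),
        cost_per_task=mean(costs),
    )
```

Run it once per model with the same task list and that model's own runner and prices, then put the two SpikeResult rows side by side in the report.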

If you want a structured read on what a Claude-first Australian stack should look like in light of the Gemini 3.5 launch, or you want a second opinion on a multi-model pilot, you can book a 30-minute working session with us through our contact page. We are an Australian Claude specialist consultancy and the work we do is independent of any single vendor cycle.
