Why We Still Build on Claude After Testing Gemini 3.5 Flash

We put Gemini 3.5 Flash through real client work before writing a word about it. The short version is that it is a genuinely strong model, and we now use it in the places where it earns its keep. The longer version is that most of the work our Australian clients pay us for still runs on Claude, and this post sets out that decision in plain terms rather than as brand loyalty. We work with Australian SMBs every week, and the model choice is one of the first questions they ask us, so it is worth answering carefully rather than reaching for a slogan.

We tested Gemini 3.5 Flash properly

Google launched Gemini 3.5 Flash at I/O 2026 as a fast, low-cost model that is strong on agentic and multimodal tasks. We did not take that at face value. We ran it against the kind of jobs we hand to a model every day: triaging inbound email, drafting first passes of client documents, reading long contracts, and stepping through multi-part automations. On speed and price it delivered, and on several of those tasks it was the right tool for the job.

Fast output that makes high-volume work cheap to run
Strong reading of images, screenshots and mixed media
Capable agentic behaviour across multi-step tasks

Why Claude stays our default

For the careful, judgement-heavy work that sits at the centre of most engagements, Claude stayed more dependable. The gap shows up less on a single prompt and more across long sessions, where instructions stack up and a small drift early on turns into a real problem later. When a client is paying us to draft a board paper, review a supplier agreement, or refactor a fragile codebase, that steadiness counts for more than raw speed. Claude also holds a consistent tone across a long document, which matters when the output goes straight to a customer or a regulator.

Steadier across long Australian documents and briefs
More predictable tone for client-facing writing
Stronger on review-heavy coding where mistakes are costly

How we actually choose a model

We are not attached to a single vendor. We pick per task, and we revisit the decision whenever a major release lands, which now happens every few weeks. Claude is the default for work that needs judgement, and a faster, cheaper model handles safe, high-volume jobs where a wrong answer is easy to spot and easy to fix. That split lets us pass the savings on for routine work without putting careful work at risk. It also keeps us honest, because the moment a cheaper model starts handling judgement work well, we will move it there and tell the client why.

Default to Claude for judgement-heavy work
Use a faster model for safe, repetitive, high-volume jobs
Re-test the choice whenever a significant release ships
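
The split above can be pictured as a small routing rule. This is a minimal sketch under assumptions, not a description of any production system: the `Task` fields, the `choose_model` function and the two model labels are hypothetical names chosen for illustration, and the labels are plain strings rather than real API model identifiers.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only, not real API model identifiers.
DEFAULT_MODEL = "claude"
FAST_MODEL = "fast-cheap-model"


@dataclass
class Task:
    name: str
    needs_judgement: bool      # e.g. a board paper, a supplier agreement review
    high_volume: bool          # repetitive work that runs many times a day
    errors_easy_to_spot: bool  # a wrong answer is obvious and cheap to fix


def choose_model(task: Task) -> str:
    """Send a task to the fast model only when it is safe and high volume."""
    if task.needs_judgement:
        return DEFAULT_MODEL
    if task.high_volume and task.errors_easy_to_spot:
        return FAST_MODEL
    # Anything that does not clearly qualify stays on the default.
    return DEFAULT_MODEL


if __name__ == "__main__":
    # Illustrative tasks; the flags are assumptions a team would set itself.
    tasks = [
        Task("review supplier agreement", True, False, False),
        Task("bulk tagging of routine items", False, True, True),
    ]
    for task in tasks:
        print(f"{task.name}: {choose_model(task)}")
```

The rule is deliberately conservative: a task has to be both high volume and easy to check before it leaves the default, so an unclassified job never lands on the cheaper model by accident.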

What this costs an Australian business to get wrong

Standardising on the wrong model is an expensive mistake to make quietly. A Sydney team that builds six months of automation on a model that does not suit its real work can burn well over $60,000 in rebuilds and lost time before anyone names the problem. The cost is rarely the licence fee. It is the rework, the broken trust with staff who quietly stopped using a tool that let them down, and the client deliverables that needed a second pass. None of that shows up on an invoice, which is exactly why the problem runs on for so long.

Rebuild costs when a default model does not fit the work
Lost staff confidence after an unreliable rollout
Client deliverables that need expensive rework

How to make the call yourself

You do not need our opinion to choose well. You need a short, honest trial on your own work. Pick two or three tasks that represent what your team actually does, run both models on them, and judge the output the way a paying client would. Write down the decision and the reasons behind it, set a review date, and treat the choice as something you revisit rather than a position you defend. The teams that do this calmly outlast the ones chasing every launch.

Test both models on your real tasks, not vendor demos
Judge the output the way a paying client would
Record the decision and set a date to review it
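
For teams that want the written record to be more than a note in a chat thread, the decision and its review date can be kept as a small structured record. The sketch below is one possible shape, assuming a hypothetical `ModelDecision` class and a 90-day review interval that is purely illustrative; pick whatever interval suits your own release cadence.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ModelDecision:
    task: str
    chosen_model: str
    reasons: list[str]
    decided_on: date
    review_after_days: int = 90  # assumed interval, not a recommendation

    @property
    def review_date(self) -> date:
        """The date on which the choice should be re-tested."""
        return self.decided_on + timedelta(days=self.review_after_days)

    def is_due(self, today: date) -> bool:
        """True once the review date has arrived or passed."""
        return today >= self.review_date


if __name__ == "__main__":
    # Illustrative entry; the task, model label and reason are placeholders.
    decision = ModelDecision(
        task="contract review",
        chosen_model="claude",
        reasons=["output judged closer to what a paying client would accept"],
        decided_on=date(2026, 1, 1),
    )
    print(decision.review_date)
    print(decision.is_due(date.today()))
```

Keeping the reasons next to the date matters as much as the date itself: when the review comes round, the team re-tests against what it wrote down rather than against memory.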

Talk to a Claude specialist

Automata AI is a Sydney-based consultancy that helps Australian businesses put Claude to work safely, and pick a different model when that is the smarter call. If you are weighing Claude against Gemini for your own team, book a brainstorm and we will map the fastest path to value for the work you actually do.

