Claude Computer and Browser Use: Production Agent Best Practices for Australian Teams

May 2026 · 7 min read · Technical

Most Australian engineering teams shipping their first browser agent on Claude run into the same wall about three weeks in. The agent looks great in demos. Then it gets pointed at a real internal tool and starts misclicking. Buttons get missed by 40 pixels. Forms partially fill. The team blames the model and starts shopping for a workaround. Almost always, the root cause is screenshot scaling, not the model.

Anthropic published a Claude Computer and Browser Use best-practices guide on 13 May 2026 that puts the canonical answer in writing for the first time. The substance is unglamorous: pre-downscale your screenshots, match coordinate spaces, and verify clicks against the image the model actually sees. It is also the single highest-impact change a Sydney or Melbourne team can make to its agent reliability before reaching for a more expensive model.

This is the playbook our agent-build engagements have been reverse-engineering by trial. The official guidance now lets Australian teams skip the resolution-tuning phase entirely on every new browser-agent project. Below is the version we hand to clients in the first week of an engagement.

Why click accuracy decides whether your agent ships

Click accuracy is the foundation of any Claude Computer Use integration. If clicks do not land where they should, nothing downstream works. Forms do not get filled. Buttons do not get pressed. Multi-step workflows fail halfway through and the agent enters a doom loop of retries. For an Australian back-office automation project, that translates directly into a stalled rollout, a frustrated sponsor, and a hard conversation about whether autonomous agents are ready for production at all.

The conversation is usually unnecessary. In our experience across recent Australian client builds, more than 70 percent of reported "model accuracy" issues on browser agents resolve as soon as the harness sends Claude images that match the coordinate space the model is reasoning in. The model is not wrong. The harness is asking it to click on a degraded preview of a high-resolution screenshot while expecting coordinates aligned to the original image.

If you only remember one rule from the official guide, remember this: pre-downscale screenshots before sending them to the API, and tell Claude the resolution you actually sent.

The resolution rule that fixes 70 percent of issues

When you send a screenshot to Claude's Computer Use API, the model returns click coordinates in the display_width_px and display_height_px coordinate space you specified. The API has internal processing limits on image size. Images that exceed those limits are downscaled before the model sees them. If your harness is still mapping clicks back to the original full-resolution screenshot, every coordinate is off.

The hard limits to design against are model-family specific. An image has to satisfy both the long-edge limit and the megapixel limit, so treat whichever is more restrictive for your aspect ratio as the ceiling per request:

  • Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5: max long edge 1568 px, max total 1.15 megapixels.

  • Claude Opus 4.7: max long edge 2576 px, max total 3.75 megapixels.

  • Claude Opus 4.8 (released 28 May 2026): follow the Opus 4.7 limits until Anthropic publishes a 4.8-specific update; the 84 percent Online-Mind2Web score reported at the Opus 4.8 launch was generated under those scaling assumptions.

  • Every model: after downscaling, set the display_width_px and display_height_px fields to the dimensions you actually sent, not the original screen.

A working harness pattern: capture the native screenshot, downscale once on the client side to fit inside the active model's long-edge and megapixel ceilings, send that image, and store the scale factor. Every click coordinate returned by Claude is multiplied by the inverse of that scale factor before being dispatched to the browser or operating system. This is roughly 30 lines of Python, and it is the single change that turns a flaky proof-of-concept into something a Sydney operations team will actually trust.
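A minimal sketch of that pattern, under stated assumptions: the names plan_downscale and DownscalePlan are ours for illustration, one megapixel is taken as 1,000,000 pixels, and the limits are the figures quoted in the list above. It only does the arithmetic; the actual resize would happen in whatever image library your harness already uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DownscalePlan:
    """Hypothetical helper: remembers both coordinate spaces for one screenshot."""
    native_width: int
    native_height: int
    sent_width: int
    sent_height: int

    def to_native(self, x: int, y: int) -> tuple[int, int]:
        """Map a coordinate from the sent image back to native screen pixels."""
        return (
            round(x * self.native_width / self.sent_width),
            round(y * self.native_height / self.sent_height),
        )


def plan_downscale(width: int, height: int,
                   max_long_edge: int, max_megapixels: float) -> DownscalePlan:
    """Pick the largest size that satisfies both ceilings; never upscale."""
    edge_scale = max_long_edge / max(width, height)
    # Assumption: 1 megapixel = 1,000,000 pixels.
    area_scale = math.sqrt(max_megapixels * 1_000_000 / (width * height))
    scale = min(1.0, edge_scale, area_scale)
    # Floor so rounding can never push the image back over a ceiling.
    sent_w = max(1, math.floor(width * scale))
    sent_h = max(1, math.floor(height * scale))
    return DownscalePlan(width, height, sent_w, sent_h)


# Example: a 2560x1440 capture under the 1568 px / 1.15 MP figures quoted above.
plan = plan_downscale(2560, 1440, max_long_edge=1568, max_megapixels=1.15)
print(plan.sent_width, plan.sent_height)
```

In a harness built this way, plan.sent_width and plan.sent_height are the values to pass as display_width_px and display_height_px, and every coordinate Claude returns goes through plan.to_native before it reaches the browser or operating system.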

Verification, sandboxing, and the agent feedback loop

Resolution is the foundation. Three habits sit on top of it.

Verify every click visually

After the agent executes a click, take a fresh screenshot and feed it back. Ask Claude to confirm that the expected outcome happened. Do not rely on DOM state alone if you are operating in a browser, and do not rely on a process returncode alone if you are operating on the desktop. The cheap loop of click, screenshot, confirm is what allows the model to catch its own mistakes mid-task instead of failing at the end of a 12-step procedure.
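The loop can be sketched in a few lines. Everything here is illustrative: click, take_screenshot, and confirm are callables you would supply from your own harness (confirm being the step that sends the fresh screenshot to Claude and asks whether the expected outcome is visible). The sketch deliberately does not retry the click itself, since repeating a click that may have half-worked is a decision better left to the model with the new screenshot in hand.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    confirmed: bool      # did the expected outcome show up on screen?
    screenshot: bytes    # the post-click image, kept for the next model turn


def click_and_verify(
    x: int,
    y: int,
    expected: str,
    click: Callable[[int, int], None],
    take_screenshot: Callable[[], bytes],
    confirm: Callable[[bytes, str], bool],
) -> StepResult:
    """Click once, capture a fresh screenshot, and ask for visual confirmation."""
    click(x, y)
    shot = take_screenshot()
    return StepResult(confirmed=confirm(shot, expected), screenshot=shot)


# Usage with stand-in callables (a real harness would call the browser and the API).
clicked = []
result = click_and_verify(
    120, 340, "Invoice saved banner is visible",
    click=lambda x, y: clicked.append((x, y)),
    take_screenshot=lambda: b"fake-png-with-saved-banner",
    confirm=lambda shot, expected: b"saved" in shot,
)
print(result.confirmed)
```

When confirmed comes back false, the stored screenshot is what goes into the next model turn so Claude can replan from what is actually on screen.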

Sandbox before you scope up

Every Australian client agent we build runs first inside an isolated browser context (Playwright with an ephemeral profile) or a sandboxed VM. The sandbox lets you point the agent at production-equivalent data without giving it the ability to do anything irreversible. Once the agent is reliable on dry runs in the sandbox, you can graduate it to a real user account with scoped permissions. Trying to skip this step is one of the most common ways teams turn an agent project into an incident report.

Define stop conditions before you ship

Production-grade agents need explicit stop conditions: a maximum number of steps, a maximum elapsed wall time, a maximum number of consecutive identical actions, and a human-confirmation gate on any irreversible operation. Without these, an agent that hits an edge case will burn tokens and clicks indefinitely. With them, the worst-case outcome is a logged failure that a human reviews the next morning.
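One way to express those four conditions in code, as a sketch rather than a prescription. StopConditions, RunState, and stop_reason are hypothetical names, and the default numbers are placeholders to be tuned per workflow, not recommendations from the official guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StopConditions:
    # Placeholder defaults; load real values from config for each workflow.
    max_steps: int = 40
    max_wall_seconds: float = 600.0
    max_identical_actions: int = 3


@dataclass
class RunState:
    started_at: float
    steps: int = 0
    last_action: Optional[str] = None
    identical_run: int = 0

    def record(self, action: str) -> None:
        """Count the step and track how many times in a row this action repeated."""
        self.steps += 1
        if action == self.last_action:
            self.identical_run += 1
        else:
            self.last_action = action
            self.identical_run = 1


def stop_reason(state: RunState, limits: StopConditions, now: float,
                next_action_irreversible: bool = False,
                human_approved: bool = False) -> Optional[str]:
    """Return why the run must stop, or None if the next action may proceed."""
    if state.steps >= limits.max_steps:
        return "max_steps"
    if now - state.started_at >= limits.max_wall_seconds:
        return "max_wall_time"
    if state.identical_run >= limits.max_identical_actions:
        return "repeated_action"
    if next_action_irreversible and not human_approved:
        return "needs_human_confirmation"
    return None
```

The harness would call stop_reason before dispatching each action (passing time.monotonic() as now) and log the returned reason when it is not None, which gives the reviewer the next morning something concrete to read.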

What it costs Australian teams to do this properly

The Australian browser-agent build market is now mature enough that you can size projects against real comparables. For a single internal workflow (think submission triage at an insurer, AP invoice keying at a manufacturer, or onboarding-form completion at a Melbourne professional services firm), the rough envelope we see is $45,000 to $90,000 for the first agent, including discovery, harness build, sandboxing, evaluation harness, and a four-week supervised pilot.

Inference costs sit on top, but they are smaller than most buyers expect. A well-scoped browser agent running on Claude Sonnet 4.6 with downscaled screenshots typically costs $0.08 to $0.25 per completed task, depending on the number of screenshots in the trajectory. A workflow running 5,000 tasks per month therefore lands between $4,800 and $15,000 in annual model spend (5,000 tasks × 12 months × the per-task cost). The honest answer to the AUD ROI question for most Australian mid-market clients: payback inside the first two quarters, then a long tail of recovered capacity.
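The annual figure is simple enough to recompute for your own volumes. A small sketch, using the per-task range and monthly volume quoted above; annual_model_spend is an illustrative name.

```python
def annual_model_spend(tasks_per_month: int, cost_per_task: float) -> float:
    """Annual inference spend for a workflow with a steady monthly volume."""
    return tasks_per_month * 12 * cost_per_task


low = annual_model_spend(5_000, 0.08)
high = annual_model_spend(5_000, 0.25)
print(f"${low:,.0f} to ${high:,.0f} per year")  # $4,800 to $15,000 per year
```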

The teams that get burned are the ones that skip the resolution discipline, skip the sandbox, and skip the evaluation harness. They spend the same $45,000, ship something that misclicks 15 percent of the time in production, and conclude that browser agents are not ready. The agents are ready. The build practice has to be.

The Australian context: APRA, Privacy Act, and where this fits

Browser agents that operate on internal systems touch every part of the Australian regulatory stack worth caring about. APRA CPS 230 (operational risk management) and CPS 234 (information security) do not name autonomous agents, but an agent that supports a critical operation is the kind of operational dependency both standards expect to be covered by documented controls, change governance, and incident response playbooks. The Privacy Act applies any time the agent touches personal information, including in the screenshots themselves. For ASIC-regulated entities, audit trails of agent decisions need to be retrievable for the standard seven-year retention horizon.

The Claude Computer and Browser Use guidance does not solve any of this for you. What it does solve is the technical foundation that compliance sits on top of. A reliable, deterministic, auditable agent is a precondition for a compliant one. An agent that misclicks 15 percent of the time cannot be audited because nobody can tell which outputs are intended and which are slips. Get the resolution and verification loops right first, then layer governance on top.

What to do this week

If your Australian team is already running a Claude browser-agent pilot, three small actions this week will outperform any model upgrade:

  • Audit your screenshot pipeline. Confirm every image sent to the API fits inside the active model's long-edge and megapixel ceilings, and confirm display_width_px and display_height_px match the dimensions you sent.

  • Add a post-click verification screenshot to every action loop. Have Claude confirm the expected outcome before moving on.

  • Write your stop conditions into config, not into the prompt. Maximum steps, maximum wall time, and an explicit human-confirmation gate on irreversible operations.
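For the first action, a check along these lines can run over logged requests or sit in the harness as a pre-send guard. It is a sketch: audit_screenshot is a hypothetical name, the limits are passed in rather than hard-coded so they can follow whatever the official guide currently states, and one megapixel is assumed to be 1,000,000 pixels.

```python
def audit_screenshot(sent_w: int, sent_h: int,
                     display_w: int, display_h: int,
                     max_long_edge: int, max_megapixels: float) -> list[str]:
    """Return a list of problems with one request; an empty list means it passes."""
    problems = []
    if max(sent_w, sent_h) > max_long_edge:
        problems.append(f"long edge {max(sent_w, sent_h)} px exceeds {max_long_edge} px")
    # Assumption: 1 megapixel = 1,000,000 pixels.
    if sent_w * sent_h > max_megapixels * 1_000_000:
        problems.append(f"{sent_w * sent_h} pixels exceeds {max_megapixels} MP")
    if (sent_w, sent_h) != (display_w, display_h):
        problems.append(
            f"display_width_px/display_height_px {display_w}x{display_h} "
            f"do not match sent image {sent_w}x{sent_h}"
        )
    return problems


# A native 2560x1440 capture sent unscaled, checked against the 1568 px / 1.15 MP figures.
for problem in audit_screenshot(2560, 1440, 2560, 1440, 1568, 1.15):
    print(problem)
```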

If the team is earlier in the build, the official Anthropic guide is now the right starting point. Read it before the harness design review, not after the first production incident.

Automata AI builds Claude browser and computer-use agents for Australian businesses across financial services, manufacturing, and professional services. If you are scoping a build or trying to recover a stalled pilot, book a 30-minute brainstorm and we will walk through the harness, the sandbox, and the evaluation plan together.