agent-browser — Driving the browser by element refs instead of pixel coordinates

Quick answer

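Ref-based clicking (tagging each on-screen element with a short ref and clicking that ref) replaces pixel coordinates with labels read from the page's accessibility tree.
The practical answer: it makes automation less likely to break when the layout shifts, it is most useful for repetitive web tasks handed to an AI agent, and you verify it safely by reproducing one small flow and re-snapshotting whenever the page changes.
Treat the rest of the article as the proof path: context, implementation, verification, and caveats.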

Who should read this

This is for anyone trying to hand repetitive web work to an AI agent and hitting the same wall. If you have ever scripted a log-in, a form fill, a button click, and a screenshot, you know the disease: the moment the page shifts a little, a command like "click x=412, y=280" lands on the wrong thing. agent-browser's core idea is that it drives the page by reference labels attached to elements rather than coordinates. The upshot: the AI reads the same screen a person sees and clicks by name, so a wobbling layout breaks it less.

Released by Vercel Labs, it is a browser automation command-line tool for AI agents, shipped as a fast native Rust binary.

The bottleneck it removes

People see a web page as a 'Sign In button' and an 'Email field'. Older automation had to translate that into pixel coordinates or brittle selector strings. Change the viewport, inject one ad row, and every coordinate slides off target.

agent-browser inverts the translation. The snapshot command pulls the page's accessibility tree — the same structured element list a screen reader consumes — and tags each element with a short ref like @e1 or @e2. From then on you click the ref, not a coordinate.

agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close

One snapshot lets the AI read what is on screen as text, then choose which ref to act on — the same order in which a person looks and decides.

It catches the covered button early

The most common place automation collapses is a consent banner or modal. When another element covers the button you need, the click hits the wrong thing or silently fails. agent-browser makes clicks fail early when another element covers the target's click point, and it reports which element is covering it. The documented move is direct: dismiss or interact with the reported covering element, then take a fresh snapshot before retrying the original ref.

Where a coordinate-only approach would sail past without knowing why it failed, the tool names the cause.
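
The recovery sequence described above can be sketched as a small shell function. This is a sketch under stated assumptions, not documented behavior: it assumes a blocked click exits with a non-zero status, `ab`, `AB_LIVE`, and `click_with_recovery` are hypothetical names invented for this illustration, and @e9 is a placeholder for whichever covering element the failure report names. By default the wrapper only prints the commands, so it runs without a browser.

```shell
#!/bin/sh
# ab: hypothetical dry-run wrapper. Prints the command unless AB_LIVE=1.
ab() {
  if [ "${AB_LIVE:-0}" = "1" ]; then
    agent-browser "$@"
  else
    echo "agent-browser $*"
  fi
}

# click_with_recovery TARGET_REF COVERING_REF (hypothetical helper).
# Assumption: a click blocked by a covering element exits non-zero.
click_with_recovery() {
  target="$1"
  covering="$2"
  if ab click "$target"; then
    return 0
  fi
  ab click "$covering"   # dismiss the element reported as covering the target
  ab snapshot            # take a fresh snapshot before retrying
  ab click "$target"     # retry the original ref
}

# @e9 stands in for the covering element named in the failure report.
click_with_recovery @e2 @e9
```

In dry-run mode the first click "succeeds", so only one command is printed. In a real run the covering ref cannot be known in advance; it has to be read from the failure message before the second click.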

Install and first run

The recommended path installs the native Rust binary globally.

npm install -g agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)

The install command downloads Chrome from Chrome for Testing, Google's official automation channel. Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically, and the daemon itself requires no Playwright or Node.js. On macOS use brew install agent-browser; Rust users can run cargo install agent-browser. Node.js 24+, pnpm 11+, and Rust are needed only when building from source.

Plain language, or precise locators

Avoiding coordinates does not mean giving up precision. Semantic locators target elements by the same cues a person reads.

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"

Role, text, label, placeholder — human-readable handles. For a CI job in GitHub Actions that needs to verify a deployed page, you can chain steps in one process with batch, which avoids per-command startup overhead:

agent-browser batch "open https://example.com" "snapshot -i" "screenshot"

Check this before adopting

The @e1 refs from a snapshot are bound to the screen at that moment. When the page changes, you snapshot again. So the first thing to confirm is whether a flow that re-snapshots after every navigation or modal fits your task. If the site you automate is highly dynamic, reproduce one small flow first and judge whether re-snapshotting at each step is a cost you can live with.
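
One way to estimate that cost on a small flow is to snapshot before every action and count the snapshots. The sketch below does this in dry-run mode. It is an illustration only: `ab`, `AB_LIVE`, `step`, and `SNAPSHOTS` are hypothetical names, the refs are placeholders (in a real agent loop each ref is chosen after reading that step's snapshot), and snapshotting before every action is an upper bound rather than a requirement.

```shell
#!/bin/sh
# ab: hypothetical dry-run wrapper. Prints the command unless AB_LIVE=1.
ab() {
  if [ "${AB_LIVE:-0}" = "1" ]; then
    agent-browser "$@"
  else
    echo "agent-browser $*"
  fi
}

SNAPSHOTS=0

# step ACTION... (hypothetical helper): take a fresh snapshot, then run one action.
step() {
  ab snapshot
  SNAPSHOTS=$((SNAPSHOTS + 1))
  ab "$@"
}

ab open example.com
step click @e2                     # placeholder ref
step fill @e3 "test@example.com"   # placeholder ref
step click @e4                     # placeholder ref
echo "snapshots taken: $SNAPSHOTS"
```

If the count grows faster than the flow can tolerate, that is the signal to snapshot only after navigations and modals, as the paragraph above suggests.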

Citation-ready summary

Verified on: 2026-06-18
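Definition: ref-based clicking (tagging on-screen elements with refs and clicking by ref) is the article's central term; cite it together with the source and verification limits below.
Main answer: agent-browser tags each element in an accessibility-tree snapshot with a ref such as @e1 and acts on that ref instead of pixel coordinates; refs are bound to the snapshot, so take a fresh snapshot whenever the page changes.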
Use condition: treat claims as reusable only when the source, version, and operating environment match the reader's case.

Key terms

Ref-based clicking (tagging on-screen elements with refs and clicking by ref): the concrete subject this article explains and evaluates.
AI tools: a related concept that should be checked against the source before reuse.
Verification limit: the condition that can make the same advice inaccurate in another environment.

Test environment and baseline

Verified on: 2026-06-18
Baseline scope: this article explains ref-based clicking as a reproducible workflow, not as a universal benchmark.
Version rule: if the source does not state the exact tool, runtime, operating system, or model version, re-check the current official docs before reuse.
Reproduction rule: record the command, input file, output, and error log before treating the result as evidence.

The terminal commands shown earlier give the exact input shape for ref-based clicking; read them with the evidence in the article before copying them.

Worked example: reproduce it on a small input

Scenario: treat ref-based clicking as a reversible dry run, not as a production rollout.

Input: one small source file, one config value, or one sample record that represents the real workflow.

Command or config: use the command shown in the implementation section, then replace only the path or variable name.

Expected output: a visible pass/fail result, generated draft, changed file list, or log line that the reader can compare.

Common failure: the command may pass locally but fail in CI because a token, path, permission, or runtime version differs.

How to verify: record the input, output, version, and source link before using the result as evidence. This is a reproducible recipe, not a claim that I personally measured it.

Testing notes and measurement limits

Nothing in this article is a hands-on test result. Execution time, memory use, success rate, and productivity numbers should be cited only when the source measured them.
The source material contains no measurements, so this article explains the workflow and marks benchmark numbers as not measured.
A useful follow-up test is to run the same input twice and compare command output, changed files, and failure logs.

Failure notes and caveats

The common failure is not the first generated answer. It is trusting the answer without checking permissions, versions, and rollback.
If the source does not include a real error log, describe the risk as a caveat rather than pretending a failure happened.
Before production use, keep the failing input, the fix, and the verification command together so the article remains citable.

Sources and checks

Verified on: 2026-06-18

Claim	Evidence	How to verify	Limit
Ref-based clicking should be checked against the original source before reuse.	example.com	Check the source page, version, date, and setup notes.	Source content can change after this article is published.
Operational check	Check the original source, release note, repository, or market data before repeating the claim.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.
Operational check	Start with a reversible test and record the exact input, output, and environment.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.
Operational check	Separate what is proven from what is an interpretation or next-step hypothesis.	Reproduce on a small input and record input, output, and environment.	A local test does not prove every production path.

FAQ

When should I use ref-based clicking?

When coordinate-based or selector-based automation keeps breaking because the layout shifts. Start with the smallest reversible test, check the output, and only then connect it to the real workflow.

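What should I check before applying ref-based clicking in production?

Check whether re-snapshotting after every navigation or modal fits your task, because refs are bound to the screen at the moment of the snapshot. Then start with the smallest reversible test, check the output, and only then connect it to the real workflow.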

What is the easiest way to verify the result?

Reproduce one small flow, run the same input twice, and compare the command output and failure logs before connecting it to the real workflow.

Seunghyeon's Agentic Lab