The browser tool lets you control a real web browser to navigate websites, interact with forms, extract data, and take screenshots. Browser sessions are sticky per user - state (cookies, page, tabs) persists across tool calls.
Core Workflow: Snapshot → Interact → Verify
- Open a page: action: "open", url: "https://example.com"
- Snapshot to see the page: action: "snapshot" - returns the accessibility tree with element refs like @e1, @e2
- Interact using refs: action: "click", selector: "@e5" or action: "fill", selector: "@e3", text: "hello"
- Verify with another snapshot or get specific data
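The workflow above can be sketched as a sequence of tool-call payloads. This is a minimal illustration, not the tool's actual client API: `workflow_steps` is a hypothetical helper, and the only thing taken from the text is the key names (action, url, selector, text).

```python
import json

# Hypothetical helper: each step is one browser tool call expressed as a plain dict.
# Key names (action, url, selector, text) follow the examples in the text above.
def workflow_steps(url, field_ref, value, submit_ref):
    return [
        {"action": "open", "url": url},
        {"action": "snapshot"},                                     # see the page, collect @refs
        {"action": "fill", "selector": field_ref, "text": value},   # interact
        {"action": "click", "selector": submit_ref},
        {"action": "snapshot"},                                     # verify the result
    ]

for step in workflow_steps("https://example.com", "@e3", "hello", "@e5"):
    print(json.dumps(step))
```

In practice the refs passed to fill and click would be read from the first snapshot's output rather than known in advance.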
Element Selectors
- Accessibility refs (preferred): @e1, @e2, etc. - from the snapshot output. Token-efficient and reliable.
- CSS selectors: #login-btn, .nav-item, input[name="email"] - use when refs aren't available.
- Semantic find: action: "find", locator: "text", find_value: "Submit", find_action: "click" - find by role, text, label, placeholder, alt, title, or testid.
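One way to apply the "refs first, CSS as fallback" preference is a small chooser like the one below. It is a sketch under an assumption: refs are taken to have the form @e followed by digits, which matches the examples but is not stated as a rule. `choose_selector` is a hypothetical name.

```python
import re

# Assumption: accessibility refs look like "@e" + digits, as in the examples (@e1, @e2).
REF_PATTERN = re.compile(r"@e\d+")

def choose_selector(ref=None, css=None):
    """Return the ref when it looks valid, otherwise fall back to the CSS selector."""
    if ref and REF_PATTERN.fullmatch(ref):
        return ref
    if css:
        return css
    raise ValueError("need a snapshot ref or a CSS selector")

print(choose_selector(ref="@e5", css="#login-btn"))
print(choose_selector(css='input[name="email"]'))
```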
Common Patterns
Form Fill
1. open url → snapshot (see form fields and their @refs)
2. fill @e3 "user@example.com"
3. fill @e5 "password123"
4. click @e7 (submit button)
- or: press Enter
5. snapshot (verify result)
Data Extraction
1. open url → snapshot
2. get text @e12 → specific element text
3. get title → page title
4. get url → current URL
5. get attr @e8 href → link href
6. get count ".item" → count matching elements
7. eval "Array.from(document.querySelectorAll('.price')).map(e => e.textContent)"
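When building an eval expression for an arbitrary selector, note that querySelectorAll returns a NodeList, which has no map method, so the result needs converting with Array.from first. The sketch below assembles such a payload; `text_extraction_call` is a hypothetical helper, and only the action and expression keys come from the actions table.

```python
import json

def text_extraction_call(css_selector):
    """Build an eval payload that returns the text of every matching element."""
    # json.dumps yields a double-quoted string literal that is also valid JavaScript,
    # so quotes inside the selector are escaped safely.
    quoted = json.dumps(css_selector)
    expression = f"Array.from(document.querySelectorAll({quoted})).map(e => e.textContent)"
    return {"action": "eval", "expression": expression}

print(text_extraction_call(".price")["expression"])
```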
Multi-Page Navigation
1. open page → snapshot → click link
2. wait condition:"load" value:"networkidle"
3. snapshot new page → extract data
4. back → snapshot → click next link → repeat
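The navigation loop above could be planned as a flat list of calls, as in this sketch. `crawl_plan` is a hypothetical helper; the wait parameters (condition: "load", value: "networkidle") are the ones shown in step 2. As a simplification it takes the link refs up front, whereas a real run would re-read them from the snapshot taken after each back, since refs are only stable within a page state.

```python
def crawl_plan(start_url, link_refs):
    """Return the ordered tool calls for visiting each link from a listing page."""
    plan = [{"action": "open", "url": start_url}, {"action": "snapshot"}]
    for ref in link_refs:
        plan += [
            {"action": "click", "selector": ref},
            {"action": "wait", "condition": "load", "value": "networkidle"},
            {"action": "snapshot"},   # extract data from the new page here
            {"action": "back"},
            {"action": "snapshot"},   # refresh refs before the next click
        ]
    return plan

for call in crawl_plan("https://example.com", ["@e4", "@e6"]):
    print(call)
```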
Screenshot
action: "screenshot" → viewport screenshot
action: "screenshot", full: true → full page screenshot
action: "screenshot", filename: "result.png" → custom filename
Screenshots are saved to the project workspace and displayed as images.
Handle Dialogs (alert/confirm/prompt)
action: "dialog", dialog_action: "accept"
action: "dialog", dialog_action: "accept", text: "my input"
action: "dialog", dialog_action: "dismiss"
All Actions
| Action | Required Params | Description |
|---|---|---|
| open | url | Navigate to URL |
| snapshot | - | Get accessibility tree (primary "see page" action) |
| screenshot | - | Visual capture → saved to workspace |
| click | selector | Click element |
| dblclick | selector | Double-click element |
| type | selector, text | Type into element (appends to existing) |
| fill | selector, text | Clear + fill input |
| press | key | Press key combo: "Enter", "Tab", "Control+a", "Escape" |
| select | selector, value | Select dropdown option |
| check/uncheck | selector | Toggle checkbox |
| scroll | - | Scroll page (direction, amount in px, default 300) |
| hover | selector | Hover element |
| get | attribute | Extract data: text, html, value, attr, title, url, count |
| find | locator, find_value | Semantic find by role/text/label/placeholder/alt/title/testid |
| wait | condition, value | Wait for selector/text/url/load/time/function |
| eval | expression | Run JS on page |
| console | - | Retrieve captured console errors/warnings |
| upload | selector, file | Upload file to input |
| dialog | - | Accept or dismiss browser dialogs |
| back/forward/reload | - | Navigation |
| close | - | Close browser session |
| tab_new/tab_switch/tab_list/tab_close | - | Tab management |
| cookies_get/cookies_set/cookies_clear | - | Cookie management |
Snapshot Options
- interactive: true - show only interactive elements (buttons, links, inputs)
- compact: true - compact output for fewer tokens
- selector: "#main" - scope snapshot to a specific CSS selector
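A small sketch of combining these options into one snapshot call follows. `snapshot_call` is a hypothetical helper; it assumes the options are passed as top-level keys next to action, in the same way as the other examples in this document.

```python
def snapshot_call(interactive=False, compact=False, selector=None):
    """Build a snapshot payload, including only the options that were requested."""
    call = {"action": "snapshot"}
    if interactive:
        call["interactive"] = True
    if compact:
        call["compact"] = True
    if selector is not None:
        call["selector"] = selector
    return call

print(snapshot_call(interactive=True, selector="#main"))
```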
Tips
- Start with snapshot, not screenshot. Snapshots are much more token-efficient than images.
- Use @refs from snapshots for all interactions - they're stable within a page state.
- After interactions (click, fill, submit), take another snapshot to see the updated page.
- Use press "Enter" to submit forms instead of finding the submit button.
- Use the find action for semantic locators when you know the element by its label or role.
- Use interactive: true on snapshot to reduce output to just interactive elements.
- Rate limit: 30 actions per minute. Plan multi-step flows efficiently.
- Timeouts: Default 30s per action, max 120s. Use the timeout param for slow pages.
- Pages have full internet access - you can browse any public website.
- Browser state persists across calls (cookies, tabs, page state) until the session idles out (5 min).
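For long flows, the 30-actions-per-minute limit can be respected on the caller's side with a sliding-window throttle. The class below is a sketch, not part of the tool: `ActionThrottle` is a hypothetical name, and how the tool itself enforces or reports the limit is not described here.

```python
import time
from collections import deque

class ActionThrottle:
    """Sliding-window limiter: at most max_actions calls in any window_s seconds."""

    def __init__(self, max_actions=30, window_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_actions = max_actions
        self.window_s = window_s
        self._clock = clock      # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._stamps = deque()   # timestamps of recent actions, oldest first

    def _purge(self):
        # Drop timestamps that have fallen out of the window.
        now = self._clock()
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def wait_for_slot(self):
        """Block until another action is allowed, then record it."""
        self._purge()
        while len(self._stamps) >= self.max_actions:
            remaining = self.window_s - (self._clock() - self._stamps[0])
            self._sleep(max(0.0, remaining))
            self._purge()
        self._stamps.append(self._clock())
```

Calling wait_for_slot() before each tool call keeps a multi-step flow under the limit without hand-counting actions.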
Console Error Capture
When you open a page, a console interceptor is automatically injected. It captures:
- console.error() and console.warn() calls
- Uncaught exceptions (window.onerror)
- Unhandled promise rejections
Use action: "console" to retrieve captured messages. Output is capped at 3000 characters.
This is especially useful after deploying an app - open it, then check console to catch any JS errors without needing to manually eval anything.
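A post-deploy check along these lines might look like the sketch below. Both helpers are hypothetical. The length test is only a heuristic based on the 3000-character cap: how truncated output is actually marked is not specified, so treat a positive result as "there may be more messages than shown".

```python
CONSOLE_CAP = 3000  # output cap stated in the text above

def deploy_check_calls(url):
    """Open the deployed app, then retrieve whatever the console interceptor captured."""
    return [{"action": "open", "url": url}, {"action": "console"}]

def maybe_truncated(console_output):
    # Heuristic only: output at or above the cap has probably been cut off.
    return len(console_output) >= CONSOLE_CAP

print(deploy_check_calls("https://example.com"))
print(maybe_truncated("TypeError: x is undefined"))
```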
Limitations
- No file downloads from the browser (use web_fetch for direct file downloads)
- JavaScript-heavy SPAs may need wait actions after navigation
- Session expires after 5 minutes of inactivity