The browser tool lets you control a real web browser to navigate websites, interact with forms, extract data, and take screenshots. Browser sessions are sticky per user: state (cookies, current page, tabs) persists across tool calls.

Core Workflow: Snapshot → Interact → Verify

  1. Open a page: action: "open", url: "https://example.com"
  2. Snapshot to see the page: action: "snapshot" - returns the accessibility tree with element refs like @e1, @e2
  3. Interact using refs: action: "click", selector: "@e5" or action: "fill", selector: "@e3", text: "hello"
  4. Verify with another snapshot or get specific data
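The loop above can be written out as the sequence of tool-call payloads an agent would send. This is a minimal Python sketch, assuming each call is a plain dict of the parameters shown above (action, url, selector, text). The helper name `core_workflow` and the example refs are hypothetical, since real refs only come from an actual snapshot, and how the dicts reach the tool is out of scope here.

```python
from typing import Any


def core_workflow(url: str, field_ref: str, value: str, button_ref: str) -> list[dict[str, Any]]:
    """Hypothetical helper: build the snapshot -> interact -> verify call sequence."""
    return [
        {"action": "open", "url": url},                            # 1. open the page
        {"action": "snapshot"},                                    # 2. read refs such as @e1, @e2
        {"action": "fill", "selector": field_ref, "text": value},  # 3. interact via the refs
        {"action": "click", "selector": button_ref},
        {"action": "snapshot"},                                    # 4. verify the new state
    ]


for call in core_workflow("https://example.com", "@e3", "hello", "@e5"):
    print(call)
```

In a real session the refs are read from the first snapshot before the interaction calls are chosen, so the sequence is built in two steps rather than all at once.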

Element Selectors

Common Patterns

Form Fill

1. open url → snapshot (see form fields and their @refs)
2. fill @e3 "user@example.com"
3. fill @e5 "password123"
4. click @e7 (submit button)
   - or: press Enter
5. snapshot (verify result)
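The form-fill steps can be sketched as a small payload builder, assuming the page is already open and snapshotted. It uses `fill` rather than `type` because `fill` clears the field first. The function name and the ref-to-value mapping are hypothetical illustrations, not part of the tool.

```python
from typing import Any, Optional


def form_fill_calls(field_values: dict[str, str], submit_ref: Optional[str] = None) -> list[dict[str, Any]]:
    """Hypothetical helper: fill each field by ref, submit, then re-snapshot.

    field_values maps snapshot refs (e.g. "@e3") to the text to enter.
    If submit_ref is None, the form is submitted by pressing Enter instead.
    """
    calls: list[dict[str, Any]] = [
        {"action": "fill", "selector": ref, "text": text}  # fill clears the input first
        for ref, text in field_values.items()
    ]
    if submit_ref is not None:
        calls.append({"action": "click", "selector": submit_ref})  # submit button
    else:
        calls.append({"action": "press", "key": "Enter"})          # the "or: press Enter" variant
    calls.append({"action": "snapshot"})                           # verify the result
    return calls


login = form_fill_calls({"@e3": "user@example.com", "@e5": "password123"}, submit_ref="@e7")
```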

Data Extraction

1. open url → snapshot
2. get text @e12    → specific element text
3. get title        → page title
4. get url          → current URL
5. get attr @e8 href → link href
6. get count ".item" → count matching elements
7. eval "Array.from(document.querySelectorAll('.price'), e => e.textContent)" → run JS, e.g. collect all prices
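The extraction steps can be expressed as payloads too. This sketch assumes that `get` receives the kind of data in its `attribute` parameter (as the Required Params column of the action table suggests) and the target element in `selector`; `get attr` and `get count` are left out because the parameter carrying the attribute name is not documented here. The helper name is hypothetical.

```python
from typing import Any


def extraction_calls(url: str, text_ref: str) -> list[dict[str, Any]]:
    """Hypothetical helper: open a page, then pull several pieces of data."""
    return [
        {"action": "open", "url": url},
        {"action": "snapshot"},
        # Assumption: `attribute` names what to extract; element-level data also needs a selector
        {"action": "get", "attribute": "text", "selector": text_ref},
        {"action": "get", "attribute": "title"},
        {"action": "get", "attribute": "url"},
        # querySelectorAll returns a NodeList, which has no .map(), so convert it with Array.from
        {"action": "eval",
         "expression": "Array.from(document.querySelectorAll('.price'), e => e.textContent)"},
    ]
```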

Multi-Page Navigation

1. open page → snapshot → click link
2. wait condition:"load" value:"networkidle"
3. snapshot new page → extract data
4. back → snapshot → click next link → repeat
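The navigation loop above maps to a repeating block of calls. In this sketch `link_refs` is a hypothetical list gathered from the first snapshot; because the pattern takes a fresh snapshot after `back`, a real run should confirm each ref against that snapshot rather than trusting the list blindly. The helper name is also hypothetical.

```python
from typing import Any


def crawl_calls(start_url: str, link_refs: list[str]) -> list[dict[str, Any]]:
    """Hypothetical helper: visit each linked page in turn, returning to the start page."""
    calls: list[dict[str, Any]] = [
        {"action": "open", "url": start_url},
        {"action": "snapshot"},
    ]
    for ref in link_refs:
        calls += [
            {"action": "click", "selector": ref},
            {"action": "wait", "condition": "load", "value": "networkidle"},
            {"action": "snapshot"},  # read and extract data from the new page
            {"action": "back"},
            {"action": "snapshot"},  # re-snapshot before clicking the next link
        ]
    return calls
```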

Screenshot

action: "screenshot"                               → viewport screenshot
action: "screenshot", full: true                   → full page screenshot
action: "screenshot", filename: "result.png"       → custom filename

Screenshots are saved to the project workspace and displayed as images.
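A minimal sketch of building the screenshot call, adding only the options that are set. The helper name is hypothetical, and passing `full` and `filename` together is an assumption, since the examples above only show them separately.

```python
from typing import Any, Optional


def screenshot_call(full: bool = False, filename: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: build a screenshot call with only the options that were set."""
    call: dict[str, Any] = {"action": "screenshot"}  # default: viewport only
    if full:
        call["full"] = True                          # capture the whole page
    if filename is not None:
        call["filename"] = filename                  # custom filename in the workspace
    return call
```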

Handle Dialogs (alert/confirm/prompt)

action: "dialog", dialog_action: "accept"
action: "dialog", dialog_action: "accept", text: "my input"
action: "dialog", dialog_action: "dismiss"
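The three dialog variants can be sketched as one builder. The helper name is hypothetical, and rejecting `text` on dismiss is an assumption drawn from the examples above, where input text only appears alongside `accept` (as it would for a `prompt()`).

```python
from typing import Any, Optional


def dialog_call(accept: bool, text: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: answer an alert/confirm/prompt dialog."""
    if not accept and text is not None:
        # Assumption: input text only makes sense when accepting a prompt
        raise ValueError("text can only be supplied when accepting a dialog")
    call: dict[str, Any] = {"action": "dialog", "dialog_action": "accept" if accept else "dismiss"}
    if text is not None:
        call["text"] = text  # value entered into the prompt before accepting
    return call
```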

All Actions

Action                                 Required Params      Description
open                                   url                  Navigate to URL
snapshot                               -                    Get accessibility tree (primary "see page" action)
screenshot                             -                    Visual capture → saved to workspace
click                                  selector             Click element
dblclick                               selector             Double-click element
type                                   selector, text       Type into element (appends to existing)
fill                                   selector, text       Clear + fill input
press                                  key                  Press key combo: "Enter", "Tab", "Control+a", "Escape"
select                                 selector, value      Select dropdown option
check/uncheck                          selector             Toggle checkbox
scroll                                 -                    Scroll page (direction, amount in px, default 300)
hover                                  selector             Hover element
get                                    attribute            Extract data: text, html, value, attr, title, url, count
find                                   locator, find_value  Semantic find by role/text/label/placeholder/alt/title/testid
wait                                   condition, value     Wait for selector/text/url/load/time/function
eval                                   expression           Run JS on page
console                                -                    Retrieve captured console errors/warnings
upload                                 selector, file       Upload file to input
dialog                                 -                    Accept or dismiss browser dialogs
back/forward/reload                    -                    Navigation
close                                  -                    Close browser session
tab_new/tab_switch/tab_list/tab_close  -                    Tab management
cookies_get/cookies_set/cookies_clear  -                    Cookie management
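The Required Params column lends itself to a pre-flight check before a call is sent. This is a hypothetical sketch transcribed from the table above: it only knows what the table lists, so, for example, `dialog` maps to no required parameters even though the dialog examples pass `dialog_action`. The names `REQUIRED` and `missing_params` are illustrative, not part of the tool.

```python
from typing import Any

# Required parameters per action, transcribed from the action table.
# Actions whose Required Params column is "-" map to an empty tuple.
REQUIRED: dict[str, tuple[str, ...]] = {
    "open": ("url",),
    "snapshot": (),
    "screenshot": (),
    "click": ("selector",),
    "dblclick": ("selector",),
    "type": ("selector", "text"),
    "fill": ("selector", "text"),
    "press": ("key",),
    "select": ("selector", "value"),
    "check": ("selector",),
    "uncheck": ("selector",),
    "scroll": (),
    "hover": ("selector",),
    "get": ("attribute",),
    "find": ("locator", "find_value"),
    "wait": ("condition", "value"),
    "eval": ("expression",),
    "console": (),
    "upload": ("selector", "file"),
    "dialog": (),
    **{name: () for name in ("back", "forward", "reload", "close",
                             "tab_new", "tab_switch", "tab_list", "tab_close",
                             "cookies_get", "cookies_set", "cookies_clear")},
}


def missing_params(call: dict[str, Any]) -> list[str]:
    """Return the required parameters absent from a call (hypothetical pre-flight check)."""
    action = call.get("action")
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action!r}")
    return [p for p in REQUIRED[action] if p not in call]
```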

Snapshot Options

Tips

Console Error Capture

When you open a page, a console interceptor is automatically injected. It captures the page's console errors and warnings.

Use action: "console" to retrieve captured messages. Output is capped at 3000 characters.

This is especially useful after deploying an app: open it, then check the console to catch any JS errors without having to run eval manually.
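A post-deploy smoke check is then just a short call sequence. In this sketch the helper name is hypothetical, and the `wait` step is an assumption added to give startup scripts time to run before the captured messages are read.

```python
from typing import Any


def smoke_check_calls(app_url: str) -> list[dict[str, Any]]:
    """Hypothetical helper: open a freshly deployed app and read its console."""
    return [
        {"action": "open", "url": app_url},  # opening the page injects the console interceptor
        # Assumption: wait for the network to settle so startup errors have been logged
        {"action": "wait", "condition": "load", "value": "networkidle"},
        {"action": "console"},               # captured errors/warnings, capped at 3000 characters
    ]
```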

Limitations