# Browser Automation

The `browser` tool lets you control a real web browser to navigate websites, interact with forms, extract data, and take screenshots. Browser sessions are sticky per user - state (cookies, page, tabs) persists across tool calls.

## Core Workflow: Snapshot → Interact → Verify

1. **Open** a page: `action: "open", url: "https://example.com"`
2. **Snapshot** to see the page: `action: "snapshot"` - returns the accessibility tree with element refs like `@e1`, `@e2`
3. **Interact** using refs: `action: "click", selector: "@e5"` or `action: "fill", selector: "@e3", text: "hello"`
4. **Verify** with another snapshot or get specific data
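
The four steps above can be written out as a sequence of action payloads. This is a minimal sketch, not the tool's client API: the helper name `workflow_payloads` is hypothetical, and how the payloads are actually sent is not covered by this document. The refs passed in are placeholders; in a real run they come from the snapshot taken in step 2.

```python
from typing import Any


def workflow_payloads(
    url: str, field_ref: str, text: str, submit_ref: str
) -> list[dict[str, Any]]:
    """Build the open -> snapshot -> interact -> verify sequence as plain dicts.

    Hypothetical helper: it only assembles parameters shown in this guide.
    """
    return [
        {"action": "open", "url": url},                            # 1. open
        {"action": "snapshot"},                                    # 2. see the page
        {"action": "fill", "selector": field_ref, "text": text},   # 3. interact
        {"action": "click", "selector": submit_ref},               # 3. interact
        {"action": "snapshot"},                                    # 4. verify
    ]


steps = workflow_payloads("https://example.com", "@e3", "hello", "@e5")
```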

## Element Selectors

- **Accessibility refs** (preferred): `@e1`, `@e2`, etc. - from the snapshot output. Token-efficient and reliable.
- **CSS selectors**: `#login-btn`, `.nav-item`, `input[name="email"]` - use when refs aren't available.
- **Semantic find**: `action: "find", locator: "text", find_value: "Submit", find_action: "click"` - find by role, text, label, placeholder, alt, title, or testid.
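
To make the three selector styles concrete, here is a small sketch that tells refs from CSS selectors and builds a `find` payload. The helper names are hypothetical, and the `@e<number>` pattern is inferred from the examples above rather than from a formal grammar.

```python
import re

# Locator kinds listed for the `find` action in this guide.
FIND_LOCATORS = {"role", "text", "label", "placeholder", "alt", "title", "testid"}

# Assumption: refs always look like "@e" followed by digits.
_REF_PATTERN = re.compile(r"@e\d+")


def selector_kind(selector: str) -> str:
    """Return "ref" for snapshot refs such as "@e5", otherwise "css"."""
    return "ref" if _REF_PATTERN.fullmatch(selector) else "css"


def find_payload(locator: str, find_value: str, find_action: str) -> dict[str, str]:
    """Build a semantic `find` payload, rejecting locator kinds not listed above."""
    if locator not in FIND_LOCATORS:
        raise ValueError(f"unsupported locator: {locator!r}")
    return {
        "action": "find",
        "locator": locator,
        "find_value": find_value,
        "find_action": find_action,
    }
```

For example, `find_payload("text", "Submit", "click")` reproduces the semantic find shown in the list above.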

## Common Patterns

### Form Fill
```
1. open url → snapshot (see form fields and their @refs)
2. fill @e3 "user@example.com"
3. fill @e5 "password123"
4. click @e7 (submit button)
   - or: press Enter
5. snapshot (verify result)
```

### Data Extraction
```
1. open url → snapshot
2. get text @e12    → specific element text
3. get title        → page title
4. get url          → current URL
5. get attr @e8 href → link href
6. get count ".item" → count matching elements
7. eval "Array.from(document.querySelectorAll('.price')).map(e => e.textContent)"
```

### Multi-Page Navigation
```
1. open page → snapshot → click link
2. wait condition:"load" value:"networkidle"
3. snapshot new page → extract data
4. back → snapshot → click next link → repeat
```
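
One iteration of that loop can be sketched as a payload list. The helper name `visit_link_payloads` is hypothetical. It covers a single link on purpose: refs may change once the page changes, so the next link's ref should be read from the final snapshot rather than computed up front.

```python
def visit_link_payloads(link_ref: str) -> list[dict[str, str]]:
    """Payloads for: click a link, wait for load, read the page, go back.

    Hypothetical helper; `link_ref` must come from the most recent snapshot.
    """
    return [
        {"action": "click", "selector": link_ref},
        {"action": "wait", "condition": "load", "value": "networkidle"},
        {"action": "snapshot"},   # extract data from the new page here
        {"action": "back"},
        {"action": "snapshot"},   # fresh refs for choosing the next link
    ]
```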

### Screenshot
```
action: "screenshot"                               → viewport screenshot
action: "screenshot", full: true                   → full page screenshot
action: "screenshot", filename: "result.png"       → custom filename
```
Screenshots are saved to the project workspace and displayed as images.

### Handle Dialogs (alert/confirm/prompt)
```
action: "dialog", dialog_action: "accept"
action: "dialog", dialog_action: "accept", text: "my input"
action: "dialog", dialog_action: "dismiss"
```
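
The three dialog variants can be produced by one small builder. This is a sketch with a hypothetical helper name. Rejecting `text` on dismiss is a design choice made here, based only on the fact that the examples pair `text` with `accept`.

```python
from typing import Optional


def dialog_payload(accept: bool, text: Optional[str] = None) -> dict[str, str]:
    """Build a `dialog` payload; `text` is the input for a prompt dialog."""
    payload = {"action": "dialog", "dialog_action": "accept" if accept else "dismiss"}
    if text is not None:
        if not accept:
            # Assumption: prompt input is only meaningful when accepting.
            raise ValueError("text only applies when accepting a prompt")
        payload["text"] = text
    return payload
```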

## All Actions

| Action | Required Params | Description |
|--------|----------------|-------------|
| open | url | Navigate to URL |
| snapshot | - | Get accessibility tree (primary "see page" action) |
| screenshot | - | Visual capture → saved to workspace |
| click | selector | Click element |
| dblclick | selector | Double-click element |
| type | selector, text | Type into element (appends to existing) |
| fill | selector, text | Clear + fill input |
| press | key | Press key combo: "Enter", "Tab", "Control+a", "Escape" |
| select | selector, value | Select dropdown option |
| check/uncheck | selector | Toggle checkbox |
| scroll | - | Scroll page (direction, amount in px, default 300) |
| hover | selector | Hover element |
| get | attribute | Extract data: text, html, value, attr, title, url, count |
| find | locator, find_value | Semantic find by role/text/label/placeholder/alt/title/testid |
| wait | condition, value | Wait for selector/text/url/load/time/function |
| eval | expression | Run JS on page |
| console | - | Retrieve captured console errors/warnings |
| upload | selector, file | Upload file to input |
| dialog | - | Accept or dismiss browser dialogs |
| back/forward/reload | - | Navigation |
| close | - | Close browser session |
| tab_new/tab_switch/tab_list/tab_close | - | Tab management |
| cookies_get/cookies_set/cookies_clear | - | Cookie management |

## Snapshot Options

- `interactive: true` - show only interactive elements (buttons, links, inputs)
- `compact: true` - compact output that uses fewer tokens
- `selector: "#main"` - scope snapshot to a specific CSS selector
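
A snapshot payload with these options might be assembled as follows. The helper name is hypothetical, and the sketch assumes the options can be combined in a single call, which this guide does not state explicitly.

```python
from typing import Any, Optional


def snapshot_payload(
    interactive: bool = False,
    compact: bool = False,
    selector: Optional[str] = None,
) -> dict[str, Any]:
    """Build a `snapshot` payload, including only the options that are set."""
    payload: dict[str, Any] = {"action": "snapshot"}
    if interactive:
        payload["interactive"] = True   # only buttons, links, inputs
    if compact:
        payload["compact"] = True       # shorter output
    if selector is not None:
        payload["selector"] = selector  # scope to a CSS selector
    return payload
```

Calling `snapshot_payload(interactive=True, selector="#main")` would request only the interactive elements inside `#main`.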

## Tips

- **Start with snapshot**, not screenshot. Snapshots are much more token-efficient than images.
- **Use @refs** from snapshots for all interactions - they're stable within a page state, so take a fresh snapshot once the page changes before reusing them.
- **After interactions** (click, fill, submit), take another snapshot to see the updated page.
- **Use `press "Enter"`** to submit forms instead of finding the submit button.
- **Use `find`** for semantic locators when you know the element by its label or role.
- **Use `interactive: true`** on snapshot to reduce output to just interactive elements.
- **Rate limit**: 30 actions per minute. Plan multi-step flows efficiently.
- **Timeouts**: Default 30s per action, max 120s. Use `timeout` param for slow pages.
- Pages have full internet access - you can browse any public website.
- Browser state persists across calls (cookies, tabs, page state) until the session idles out (5 min).
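
To stay under the 30-actions-per-minute limit in a long flow, a caller could pace itself client-side. The sketch below uses a sliding 60-second window. That window model is an assumption, since this guide does not say how the limit is measured, and the class name `ActionBudget` is hypothetical.

```python
from collections import deque


class ActionBudget:
    """Client-side pacing for a fixed number of actions per time window."""

    def __init__(self, limit: int = 30, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._stamps: deque[float] = deque()  # send times still inside the window

    def wait_needed(self, now: float) -> float:
        """Seconds to wait at time `now` before another action fits the budget."""
        # Drop send times that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        # Budget is full: wait until the oldest send leaves the window.
        return self.window - (now - self._stamps[0])

    def record(self, now: float) -> None:
        """Note that an action was sent at time `now`."""
        self._stamps.append(now)
```

In use, the caller would pass `time.monotonic()` as `now`, sleep for the returned duration if it is non-zero, then call `record` right before sending the action.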

## Console Error Capture

When you `open` a page, a console interceptor is automatically injected. It captures:
- `console.error()` and `console.warn()` calls
- Uncaught exceptions (`window.onerror`)
- Unhandled promise rejections

Use `action: "console"` to retrieve captured messages. Output is capped at 3000 characters.

This is especially useful after deploying an app - open it, then check `console` to catch any JS errors without needing to manually eval anything.
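
The post-deploy check described above comes down to two actions. The sketch below builds them and adds a simple guard for the output cap. The helper names are hypothetical, and it assumes the console output arrives as a single string that is cut off at the cap.

```python
CONSOLE_CAP = 3000  # character cap stated in this section


def deploy_check_payloads(url: str) -> list[dict[str, str]]:
    """Open the deployed app, then retrieve captured console messages."""
    return [
        {"action": "open", "url": url},
        {"action": "console"},
    ]


def may_be_truncated(console_output: str) -> bool:
    """True when the output is long enough that the cap may have cut it off."""
    return len(console_output) >= CONSOLE_CAP
```

If `may_be_truncated` returns true, later messages may be missing, so it is worth fixing the first errors and running the check again.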

## Limitations

- No file downloads from the browser (use `web_fetch` for direct file downloads)
- JavaScript-heavy SPAs may need `wait` actions after navigation
- Session expires after 5 minutes of inactivity