{
  "name": "browser-automation",
  "title": "Browser Automation",
  "description": "Browse websites, fill forms, extract data, and take screenshots with a real browser",
  "guid": "sk_plat_brow",
  "category": "Agent Tools",
  "requiredTools": [
    "browser"
  ],
  "content": "# Browser Automation\n\nThe `browser` tool lets you control a real web browser to navigate websites, interact with forms, extract data, and take screenshots. Browser sessions are sticky per user - state (cookies, page, tabs) persists across tool calls.\n\n## Core Workflow: Snapshot → Interact → Verify\n\n1. **Open** a page: `action: \"open\", url: \"https://example.com\"`\n2. **Snapshot** to see the page: `action: \"snapshot\"` - returns the accessibility tree with element refs like `@e1`, `@e2`\n3. **Interact** using refs: `action: \"click\", selector: \"@e5\"` or `action: \"fill\", selector: \"@e3\", text: \"hello\"`\n4. **Verify** with another snapshot or get specific data\n\n## Element Selectors\n\n- **Accessibility refs** (preferred): `@e1`, `@e2`, etc. - from the snapshot output. Token-efficient and reliable.\n- **CSS selectors**: `#login-btn`, `.nav-item`, `input[name=\"email\"]` - use when refs aren't available.\n- **Semantic find**: `action: \"find\", locator: \"text\", find_value: \"Submit\", find_action: \"click\"` - find by role, text, label, placeholder, alt, title, or testid.\n\n## Common Patterns\n\n### Form Fill\n```\n1. open url → snapshot (see form fields and their @refs)\n2. fill @e3 \"user@example.com\"\n3. fill @e5 \"password123\"\n4. click @e7 (submit button)\n   - or: press Enter\n5. snapshot (verify result)\n```\n\n### Data Extraction\n```\n1. open url → snapshot\n2. get text @e12    → specific element text\n3. get title        → page title\n4. get url          → current URL\n5. get attr @e8 href → link href\n6. get count \".item\" → count matching elements\n7. eval \"document.querySelectorAll('.price').map(e => e.textContent)\"\n```\n\n### Multi-Page Navigation\n```\n1. open page → snapshot → click link\n2. wait condition:\"load\" value:\"networkidle\"\n3. snapshot new page → extract data\n4. back → snapshot → click next link → repeat\n```\n\n### Screenshot\n```\naction: \"screenshot\"                               → viewport screenshot\naction: \"screenshot\", full: true                   → full page screenshot\naction: \"screenshot\", filename: \"result.png\"       → custom filename\n```\nScreenshots are saved to the project workspace and displayed as images.\n\n### Handle Dialogs (alert/confirm/prompt)\n```\naction: \"dialog\", dialog_action: \"accept\"\naction: \"dialog\", dialog_action: \"accept\", text: \"my input\"\naction: \"dialog\", dialog_action: \"dismiss\"\n```\n\n## All Actions\n\n| Action | Required Params | Description |\n|--------|----------------|-------------|\n| open | url | Navigate to URL |\n| snapshot | - | Get accessibility tree (primary \"see page\" action) |\n| screenshot | - | Visual capture → saved to workspace |\n| click | selector | Click element |\n| dblclick | selector | Double-click element |\n| type | selector, text | Type into element (appends to existing) |\n| fill | selector, text | Clear + fill input |\n| press | key | Press key combo: \"Enter\", \"Tab\", \"Control+a\", \"Escape\" |\n| select | selector, value | Select dropdown option |\n| check/uncheck | selector | Toggle checkbox |\n| scroll | - | Scroll page (direction, amount in px, default 300) |\n| hover | selector | Hover element |\n| get | attribute | Extract data: text, html, value, attr, title, url, count |\n| find | locator, find_value | Semantic find by role/text/label/placeholder/alt/title/testid |\n| wait | condition, value | Wait for selector/text/url/load/time/function |\n| eval | expression | Run JS on page |\n| console | - | Retrieve captured console errors/warnings |\n| upload | selector, file | Upload file to input |\n| dialog | - | Accept or dismiss browser dialogs |\n| back/forward/reload | - | Navigation |\n| close | - | Close browser session |\n| tab_new/tab_switch/tab_list/tab_close | - | Tab management |\n| cookies_get/cookies_set/cookies_clear | - | Cookie management |\n\n## Snapshot Options\n\n- `interactive: true` - show only interactive elements (buttons, links, inputs)\n- `compact: true` - compact output for less tokens\n- `selector: \"#main\"` - scope snapshot to a specific CSS selector\n\n## Tips\n\n- **Start with snapshot**, not screenshot. Snapshots are much more token-efficient than images.\n- **Use @refs** from snapshots for all interactions - they're stable within a page state.\n- **After interactions** (click, fill, submit), take another snapshot to see the updated page.\n- **Use `press \"Enter\"`** to submit forms instead of finding the submit button.\n- **Use `find`** for semantic locators when you know the element by its label or role.\n- **Use `interactive: true`** on snapshot to reduce output to just interactive elements.\n- **Rate limit**: 30 actions per minute. Plan multi-step flows efficiently.\n- **Timeouts**: Default 30s per action, max 120s. Use `timeout` param for slow pages.\n- Pages have full internet access - you can browse any public website.\n- Browser state persists across calls (cookies, tabs, page state) until the session idles out (5 min).\n\n## Console Error Capture\n\nWhen you `open` a page, a console interceptor is automatically injected. It captures:\n- `console.error()` and `console.warn()` calls\n- Uncaught exceptions (`window.onerror`)\n- Unhandled promise rejections\n\nUse `action: \"console\"` to retrieve captured messages. Output is capped at 3000 characters.\n\nThis is especially useful after deploying an app - open it, then check `console` to catch any JS errors without needing to manually eval anything.\n\n## Limitations\n\n- No file downloads from the browser (use `web_fetch` for direct file downloads)\n- JavaScript-heavy SPAs may need `wait` actions after navigation\n- Session expires after 5 minutes of inactivity"
}
