{
  "name": "web-vision-mediapipe",
  "title": "Browser Vision (MediaPipe)",
  "description": "On-device camera vision for web apps: gesture recognition, body pose, object detection - the web-vision-mediapipe kit",
  "guid": "sk_plat_wvmp",
  "category": "Templates",
  "requiredTools": [
    "add",
    "file_write",
    "project_deploy"
  ],
  "content": "# Browser Vision (MediaPipe)\n\n`web-vision-mediapipe` is a **kit** - a reusable building block added into an existing web app. It wraps Google [MediaPipe Tasks](https://ai.google.dev/edge/mediapipe) so an app can read the camera and run **gesture recognition**, **body pose**, or **object detection** entirely in the browser.\n\nOn-device: no server, no upload, the camera stream never leaves the device. Inference is WASM + WebGL-accelerated. **Web only** - it needs `getUserMedia`, WASM, and a canvas, so it runs only on HTTPS or `localhost`.\n\n## Two ways in\n\n**Start a fresh camera app** - add the `web-vision-cam` starter. It is a fullscreen camera app that already switches between gesture, object, and pose detection with a live FPS readout, and ships the kit pre-installed:\n\n```\nadd name=web-vision-cam title=\"...\"\n```\n\n**Add vision to an existing web app** - install the kit into it:\n\n```\nadd name=web-vision-mediapipe\n```\n\nThis copies the kit to `src/packages/web-vision-mediapipe/` and wires the import map in `src/index.html` (the kit specifier plus `@mediapipe/tasks-vision`). 
There is no deploy phase - it is pure client-side, so a plain static app needs nothing else.\n\n## Using the kit\n\nThe whole job is two elements - a `<video>` for the camera and a `<canvas>` overlaying it - plus one call:\n\n```js\nimport { mountVision } from '@gipity/web-vision-mediapipe';\n\nconst vision = await mountVision({\n  video:  document.querySelector('video'),\n  canvas: document.querySelector('canvas'),\n  kind:   'gesture',                          // 'gesture' | 'detect' | 'pose'\n  camera: { facingMode: 'user' },             // 'user' (front) | 'environment' (rear)\n  onFps:  (fps) => { hud.textContent = `${fps} FPS`; },\n  onResult: (result, kind) => { /* app logic - see result shapes below */ },\n});\n\nawait vision.switchTask('pose');   // swap model, camera keeps running\nvision.stop();                     // release camera + free GPU memory\n```\n\n`mountVision` runs the camera, the inference loop, and the overlay drawing. For a custom loop, compose the low-level exports instead: `createTask`, `startCamera`, `createLoop`, `draw`, `fitCanvas`, `clearCanvas`. See `src/packages/web-vision-mediapipe/examples/` and its `README.md`.\n\n## Tasks and result shapes\n\n`kind` selects the model. 
Each `onResult` / `task.detect()` value is the native MediaPipe result:\n\n| `kind`    | Detects | Key fields |\n|-----------|---------|------------|\n| `gesture` | Hands + recognised gesture | `result.gestures[hand][0]` → `{ categoryName, score }`; `result.landmarks[hand]` → 21 points |\n| `detect`  | The 80 COCO object classes | `result.detections[]` → `{ boundingBox, categories: [{ categoryName, score }] }` |\n| `pose`    | Body skeleton | `result.landmarks[person]` → 33 points `{ x, y, z, visibility }` |\n\nRecognised gestures: `Thumb_Up`, `Thumb_Down`, `Open_Palm`, `Closed_Fist`, `Victory`, `Pointing_Up`, `ILoveYou` (and `None`).\n\n## Notes and common mistakes\n\n- **Gesture recognition is the strongest task.** Object detection uses EfficientDet-Lite - fast, but with modest accuracy. Good for a demo; do not promise production-grade detection. If a project needs high-accuracy detection, say so rather than over-selling this kit.\n- **The canvas must overlay the video** at the same on-screen size. The kit sizes the canvas backing store to the camera frame; CSS `object-fit: cover` on *both* keeps the overlay aligned. A front-camera view reads more naturally when mirrored with `transform: scaleX(-1)` on both.\n- **Camera needs a user gesture and a secure origin.** Call `mountVision` from a click handler, not on page load, and deploy over HTTPS - apart from `localhost`, `getUserMedia` fails on plain HTTP.\n- **One `detect()` per frame.** Frame timestamps must strictly increase; `mountVision`/`createLoop` already handle this. Do not call `task.detect()` twice for the same frame.\n- **First use downloads the model** (~3-8 MB) from Google's CDN, after which the browser caches it. Expect a short delay on the first frame of each task.\n- **License:** MediaPipe and its default models are Apache-2.0 - free for commercial use, no copyleft obligation on the app.\n"
}