{
  "name": "app-video",
  "title": "App API - Video Generation & Understanding",
  "description": "Video generation (Veo 3.1) and video understanding (Gemini) for deployed apps: generate short AI videos, analyze video content",
  "guid": "sk_plat_avid",
  "category": "App services",
  "requiredTools": [
    "project_settings"
  ],
  "content": "# App API - Video Generation & Understanding\n\n## Video Generation (Veo 3.1)\n\nGenerate short AI videos (up to 8 seconds) with audio using Google Veo.\n\n### Models\n- `veo-3.1-generate-preview`: best quality, ~$0.40/sec\n- `veo-3.1-fast-generate-preview`: faster, ~$0.15/sec\n- `veo-3.1-lite-generate-preview`: budget, ~$0.07/sec\n\n### Agent Tool: `video_generate`\n```json\n{\n  \"prompt\": \"A bird flying over a mountain lake at sunset, slow camera pan\",\n  \"model\": \"veo-3.1-generate-preview\",\n  \"aspect_ratio\": \"16:9\",\n  \"resolution\": \"1080p\"\n}\n```\n\nParameters:\n- `prompt` (required): Describe the scene, action, camera movement, and dialogue\n- `model`: One of the models above (default: `veo-3.1-generate-preview`)\n- `aspect_ratio`: `16:9` (landscape, default), `9:16` (portrait/vertical), `1:1` (square)\n- `resolution`: `720p` (default), `1080p`, `4k`\n\nGeneration usually takes 30-120 seconds. Videos include AI-generated audio.\n\n### App API Endpoint\n`POST /api/<PROJECT_GUID>/services/video`\n\nRequest:\n```json\n{\n  \"prompt\": \"Close-up of coffee being poured into a cup\",\n  \"model\": \"veo-3.1-fast-generate-preview\",\n  \"aspect_ratio\": \"1:1\"\n}\n```\n\nResponse:\n```json\n{\n  \"url\": \"https://media.gipity.ai/med_abc12345.mp4\",\n  \"content_type\": \"video/mp4\",\n  \"model\": \"veo-3.1-fast-generate-preview\",\n  \"provider\": \"gemini\",\n  \"size_bytes\": 2048000,\n  \"credits_used\": 120\n}\n```\n\n### CLI\n```bash\ngipity generate video \"a cat playing piano\" --model veo-3.1-fast-generate-preview\ngipity generate video \"vertical dance\" --aspect 9:16 --resolution 1080p -o dance.mp4\n```\n\n### Tips\n- Be specific: describe lighting, camera angle, movement, and mood\n- Put dialogue in quotes so Veo generates matching speech\n- Use 9:16 for social media and mobile content\n- `veo-3.1-lite-generate-preview` is good enough for previews and iterations\n\n## Video Understanding\n\nAnalyze video content using Gemini's multimodal AI.\n\n### Agent Tool: `video_understand`\n```json\n{\n  \"path\": \"video.mp4\",\n  \"prompt\": \"Describe what happens in this video\"\n}\n```\n\nParameters:\n- `path` (required): Path to the video file in the workspace\n- `prompt` (required): What to analyze (describe, count, transcribe, summarize, etc.)\n\nSupported formats: mp4, mov, avi, mkv, webm (up to 100 MB). Uses Gemini 2.5 Flash.\n\n### Example Prompts\n- \"Describe what happens in each scene\"\n- \"Count the number of people visible\"\n- \"Transcribe all dialogue and text on screen\"\n- \"What brand logos appear in this video?\"\n- \"Summarize the key points of this presentation\""
}
