# App API - Video Generation & Understanding

## Video Generation (Veo 3.1)

Generate short AI videos (up to 8 seconds) with audio using Google Veo.

### Models
- `veo-3.1-generate-preview`: best quality, ~$0.40/sec
- `veo-3.1-fast-generate-preview`: faster, ~$0.15/sec
- `veo-3.1-lite-generate-preview`: budget, ~$0.07/sec
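
To compare the models before generating, the per-second prices can be turned into a rough per-clip estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, not part of the API, and the prices are the approximate figures listed here, so actual billing may differ.

```python
from decimal import Decimal

# Approximate per-second prices in USD, copied from the model list.
PRICE_PER_SECOND = {
    "veo-3.1-generate-preview": Decimal("0.40"),
    "veo-3.1-fast-generate-preview": Decimal("0.15"),
    "veo-3.1-lite-generate-preview": Decimal("0.07"),
}

MAX_SECONDS = 8  # clips are limited to 8 seconds


def estimate_cost_usd(model: str, seconds: int = MAX_SECONDS) -> Decimal:
    """Return the approximate cost of one clip (hypothetical helper)."""
    if model not in PRICE_PER_SECOND:
        raise ValueError(f"unknown model: {model}")
    if not 1 <= seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be between 1 and {MAX_SECONDS}")
    # Decimal avoids binary floating-point noise in money arithmetic.
    return PRICE_PER_SECOND[model] * seconds


if __name__ == "__main__":
    for name in PRICE_PER_SECOND:
        print(f"{name}: ~${estimate_cost_usd(name)} for {MAX_SECONDS}s")
```

By these figures, a full 8-second clip costs roughly $3.20 on the best-quality model and about $0.56 on the lite model.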

### Agent Tool: `video_generate`
```json
{
  "prompt": "A bird flying over a mountain lake at sunset, slow camera pan",
  "model": "veo-3.1-generate-preview",
  "aspect_ratio": "16:9",
  "resolution": "1080p"
}
```

Parameters:
- `prompt` (required): Describe scene, action, camera movement, dialogue
- `model`: See models above (default: veo-3.1-generate-preview)
- `aspect_ratio`: 16:9 (landscape, default), 9:16 (portrait/vertical), 1:1 (square)
- `resolution`: 720p (default), 1080p, 4k

Generation takes 30-120 seconds. Videos include AI-generated audio.
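
Because a failed request can cost a minute or more, it may help to check arguments locally before calling the tool. The following is a minimal sketch, assuming the allowed values are exactly those listed in the parameters; `build_video_args` is a hypothetical helper and the server may apply rules not documented here.

```python
# Allowed values as documented in the parameter list.
ALLOWED_MODELS = {
    "veo-3.1-generate-preview",
    "veo-3.1-fast-generate-preview",
    "veo-3.1-lite-generate-preview",
}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}


def build_video_args(
    prompt: str,
    model: str = "veo-3.1-generate-preview",
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
) -> dict:
    """Validate arguments and return a video_generate payload (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return {
        "prompt": prompt.strip(),
        "model": model,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }


if __name__ == "__main__":
    print(build_video_args("A bird flying over a mountain lake at sunset"))
```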

### App API Endpoint
`POST /api/<PROJECT_GUID>/services/video`

Request:
```json
{
  "prompt": "Close-up of coffee being poured into a cup",
  "model": "veo-3.1-fast-generate-preview",
  "aspect_ratio": "1:1"
}
```

Response:
```json
{
  "url": "https://media.gipity.ai/med_abc12345.mp4",
  "content_type": "video/mp4",
  "model": "veo-3.1-fast-generate-preview",
  "provider": "gemini",
  "size_bytes": 2048000,
  "credits_used": 120
}
```

### CLI
```bash
gipity generate video "a cat playing piano" --model veo-3.1-fast-generate-preview
gipity generate video "vertical dance" --aspect 9:16 --resolution 1080p -o dance.mp4
```

### Tips
- Be specific: describe lighting, camera angle, movement, and mood
- Put dialogue in quotation marks so Veo generates matching speech
- Use 9:16 for social media and mobile content
- `veo-3.1-lite-generate-preview` is good enough for previews and iterations
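
The prompt tips above can be applied consistently with a small template. This is only one possible way to structure a prompt: `build_prompt` and its field names are hypothetical, and Veo does not require any particular format.

```python
from typing import Optional, Tuple


def build_prompt(
    scene: str,
    camera: str = "",
    lighting: str = "",
    mood: str = "",
    dialogue: Optional[Tuple[str, str]] = None,
) -> str:
    """Combine scene details into a single prompt string (hypothetical helper)."""
    parts = [scene.strip()]
    # Append only the details that were actually provided.
    for detail in (camera, lighting, mood):
        if detail.strip():
            parts.append(detail.strip())
    prompt = ", ".join(parts)
    if dialogue:
        speaker, line = dialogue
        # Dialogue goes in quotation marks, as the tips recommend.
        prompt += f'. {speaker} says: "{line}"'
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "A barista pours coffee",
        camera="close-up",
        lighting="warm morning light",
        dialogue=("The barista", "Enjoy!"),
    ))
```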

## Video Understanding

Analyze video content using Gemini's multimodal AI.

### Agent Tool: `video_understand`
```json
{
  "path": "video.mp4",
  "prompt": "Describe what happens in this video"
}
```

Parameters:
- `path` (required): Path to video file in workspace
- `prompt` (required): What to analyze, for example describe, count, transcribe, or summarize

Supported formats: mp4, mov, avi, mkv, webm (up to 100 MB per file). Analysis uses Gemini 2.5 Flash.
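
The format and size limits can be checked before invoking the tool. The sketch below is a hypothetical pre-flight check, not part of the API. It assumes the format is identified by file extension and reads "100MB" as 100 × 1024 × 1024 bytes; the service may count megabytes differently.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
# Assumption: "100MB" interpreted as 100 MiB.
MAX_BYTES = 100 * 1024 * 1024


def validate_video(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems


def validate_video_path(path: str) -> list[str]:
    """Check a file on disk using its name and actual size."""
    file = Path(path)
    if not file.is_file():
        return [f"file not found: {path}"]
    return validate_video(file.name, file.stat().st_size)


if __name__ == "__main__":
    print(validate_video_path("video.mp4"))
```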

### Example Prompts
- "Describe what happens in each scene"
- "Count the number of people visible"
- "Transcribe all dialogue and text on screen"
- "What brand logos appear in this video?"
- "Summarize the key points of this presentation"
