Video Generation (Veo 3.1)
Generate short AI videos (up to 8 seconds) with audio using Google Veo.
Models
- veo-3.1-generate-preview: best quality, ~$0.40/sec
- veo-3.1-fast-generate-preview: faster, ~$0.15/sec
- veo-3.1-lite-generate-preview: budget, ~$0.07/sec
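Because pricing is per second of output, the cost of a clip is simply the per-second rate multiplied by its length. A minimal sketch of that arithmetic follows; `estimate_cost` is a hypothetical helper, not part of any Gipity or Google SDK, and the prices are the approximate figures quoted above. How these dollar figures map to the `credits_used` value returned by the API is not covered here.

```python
# Approximate per-second prices (USD), copied from the model list above.
PRICE_PER_SECOND = {
    "veo-3.1-generate-preview": 0.40,
    "veo-3.1-fast-generate-preview": 0.15,
    "veo-3.1-lite-generate-preview": 0.07,
}

MAX_SECONDS = 8  # maximum clip length stated in this document


def estimate_cost(model: str, seconds: float) -> float:
    """Return the approximate USD cost of one clip (hypothetical helper)."""
    if model not in PRICE_PER_SECOND:
        raise ValueError(f"unknown model: {model}")
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be greater than 0 and at most {MAX_SECONDS}")
    return round(PRICE_PER_SECOND[model] * seconds, 2)


if __name__ == "__main__":
    for name in PRICE_PER_SECOND:
        print(f"{name}: ~${estimate_cost(name, MAX_SECONDS):.2f} for {MAX_SECONDS} s")
```

At the full 8 seconds this works out to roughly $3.20, $1.20, and $0.56 per clip for the three models respectively.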
Agent Tool: video_generate
{
"prompt": "A bird flying over a mountain lake at sunset, slow camera pan",
"model": "veo-3.1-generate-preview",
"aspect_ratio": "16:9",
"resolution": "1080p"
}
Parameters:
- prompt (required): Describe the scene, action, camera movement, and dialogue
- model: See models above (default: veo-3.1-generate-preview)
- aspect_ratio: 16:9 (landscape, default), 9:16 (portrait/vertical), 1:1 (square)
- resolution: 720p (default), 1080p, 4k
Generation takes 30-120 seconds. Videos include AI-generated audio.
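Since each generation takes up to a couple of minutes and is billed, it can be worth validating arguments locally before calling the tool. The sketch below builds a `video_generate` payload using only the parameter values and defaults documented above; `build_payload` is a hypothetical helper, and the tool itself may accept or reject values differently.

```python
# Allowed values and defaults as listed in the Parameters section above.
ALLOWED_MODELS = {
    "veo-3.1-generate-preview",
    "veo-3.1-fast-generate-preview",
    "veo-3.1-lite-generate-preview",
}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}


def build_payload(
    prompt: str,
    model: str = "veo-3.1-generate-preview",
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
) -> dict:
    """Validate arguments and return a video_generate payload (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return {
        "prompt": prompt.strip(),
        "model": model,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }


if __name__ == "__main__":
    print(build_payload("A bird flying over a mountain lake at sunset", resolution="1080p"))
```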
App API Endpoint
POST /api/<PROJECT_GUID>/services/video
Request:
{
"prompt": "Close-up of coffee being poured into a cup",
"model": "veo-3.1-fast-generate-preview",
"aspect_ratio": "1:1"
}
Response:
{
"url": "https://media.gipity.ai/med_abc12345.mp4",
"content_type": "video/mp4",
"model": "veo-3.1-fast-generate-preview",
"provider": "gemini",
"size_bytes": 2048000,
"credits_used": 120
}
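A minimal sketch of calling this endpoint from Python with only the standard library is shown below. The base URL (`https://example.invalid`), the helper names, and the choice of required response fields are assumptions for illustration; this document does not state the API host or how requests are authenticated, so add whatever headers your deployment requires.

```python
import json
import urllib.request


def build_video_request(base_url: str, project_guid: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/<PROJECT_GUID>/services/video."""
    url = f"{base_url.rstrip('/')}/api/{project_guid}/services/video"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # auth headers are not documented here
        method="POST",
    )


def parse_video_response(raw: bytes) -> dict:
    """Decode the JSON response and check the fields this sketch relies on."""
    data = json.loads(raw)
    missing = {"url", "content_type"} - data.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return data


if __name__ == "__main__":
    request = build_video_request(
        "https://example.invalid",  # placeholder host, an assumption
        "<PROJECT_GUID>",
        {
            "prompt": "Close-up of coffee being poured into a cup",
            "model": "veo-3.1-fast-generate-preview",
            "aspect_ratio": "1:1",
        },
    )
    print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request, timeout=180)` and feed the response body to `parse_video_response`. The 180-second timeout assumes the endpoint responds only once generation has finished, which is a guess based on the 30-120 second generation time.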
CLI
gipity generate video "a cat playing piano" --model veo-3.1-fast-generate-preview
gipity generate video "vertical dance" --aspect 9:16 --resolution 1080p -o dance.mp4
Tips
- Be specific: describe lighting, camera angle, movement, and mood
- Include dialogue in quotes for Veo to generate matching speech
- Use 9:16 for social media and mobile content
- veo-3.1-lite-generate-preview is good enough for previews and iterations
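One way to apply the first two tips consistently is to assemble prompts from named parts, so lighting, camera, and mood are never forgotten and dialogue is always quoted. The helper below is a hypothetical sketch; the field names and the "A voice says" phrasing are illustrative choices, not a format Veo requires.

```python
def build_prompt(
    scene: str,
    lighting: str = "",
    camera: str = "",
    mood: str = "",
    dialogue: str = "",
) -> str:
    """Join scene details into one prompt and put any dialogue in quotes."""
    parts = [scene.strip()]
    for detail in (lighting, camera, mood):
        if detail.strip():
            parts.append(detail.strip())
    prompt = ", ".join(parts)
    if dialogue.strip():
        # Quoted dialogue, per the tip above about matching speech.
        prompt += f'. A voice says: "{dialogue.strip()}"'
    return prompt


if __name__ == "__main__":
    print(
        build_prompt(
            "A barista pours coffee into a cup",
            lighting="warm morning light",
            camera="close-up with a slow push-in",
            mood="calm",
            dialogue="Good morning!",
        )
    )
```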
Video Understanding
Analyze video content using Gemini's multimodal AI.
Agent Tool: video_understand
{
"path": "video.mp4",
"prompt": "Describe what happens in this video"
}
Parameters:
- path (required): Path to the video file in the workspace
- prompt (required): What to analyze: describe, count, transcribe, summarize, etc.
Supports: mp4, mov, avi, mkv, webm (up to 100MB). Uses Gemini 2.5 Flash.
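The format and size limits above can be checked locally before invoking the tool. The sketch below is a hypothetical pre-flight helper; it assumes "100MB" means 100 × 1024 × 1024 bytes and that format is determined by file extension, neither of which this document confirms.

```python
from pathlib import Path

# Extensions listed as supported above.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
# Assumption: "100MB" is interpreted as 100 MiB.
MAX_BYTES = 100 * 1024 * 1024


def check_video_file(path: str) -> Path:
    """Raise if the file looks unsuitable for video_understand; else return its Path."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {p.suffix or '(no extension)'}")
    if not p.is_file():
        raise FileNotFoundError(f"no such file: {p}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
    return p


if __name__ == "__main__":
    try:
        print("ok:", check_video_file("video.mp4"))
    except (ValueError, FileNotFoundError) as err:
        print("cannot analyze:", err)
```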
Example Prompts
- "Describe what happens in each scene"
- "Count the number of people visible"
- "Transcribe all dialogue and text on screen"
- "What brand logos appear in this video?"
- "Summarize the key points of this presentation"