App API - Audio Services (Sound Effects, Music, Transcription)

Three audio services are available on every project with no setup. Each defaults to the user_pays billing mode.

Plan availability (Free plan): Music generation is Pro-only and blocked on Free. Sound effects share a 3-uses-per-month audio allowance with text-to-speech (see app-tts). Transcription is not gated. When a Free-plan owner hits a block, the endpoint returns 403 FORBIDDEN with an upgrade message; handle it in the app's UI. All limits lift on Pro.
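
One way to handle the 403 in the app's UI is sketched below. Only the 403 status and the presence of an upgrade message come from the text above. The error body shape (a top-level message string) and the names describeAudioFailure and callAudioService are assumptions made for illustration, so the code falls back to generic wording when the field is missing.

```javascript
// Map a failed audio-service response to something the UI can display.
// ASSUMPTION: the error body carries the upgrade text in a `message` field.
function describeAudioFailure(status, body) {
  if (status === 403) {
    const detail = body && typeof body.message === 'string' ? body.message : '';
    return {
      kind: 'upgrade_required',
      message: detail || 'This audio feature is not available on the current plan.',
    };
  }
  return { kind: 'error', message: `Audio request failed (HTTP ${status}).` };
}

// Hypothetical wrapper usable with any of the three endpoints.
async function callAudioService(url, init) {
  const res = await fetch(url, init);
  if (res.ok) return { ok: true, data: await res.json() };
  // The error body may not be JSON, so a parse failure is tolerated.
  const body = await res.json().catch(() => null);
  return { ok: false, failure: describeAudioFailure(res.status, body) };
}
```

A caller can show failure.message in an upgrade prompt when failure.kind is 'upgrade_required' and in a generic error toast otherwise.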

Optionally, use project_settings to customize each service independently.

To make a billing-mode choice reproducible, declare it in a services deploy phase in gipity.yaml rather than only via project_settings - e.g. { service: transcribe, billing_mode: owner_pays } (sound and music work the same way). The choice then ships with the app instead of living as out-of-band server state. See deploy and app-llm.

Billing modes only govern the deployed app's runtime calls. Direct generation during development - gipity generate sound|music, gipity service call, or the agent's own generation tools - always bills the caller (you), whatever the service's billing_mode says. Never flip a service to owner_pays just to generate assets: it's unnecessary, and while flipped the live app accepts anonymous generation on your credits.

Sound Effects

Generate sound effects from text descriptions.

Endpoint

POST /api/<PROJECT_GUID>/services/sound

Request

{
  "text": "thunder rumbling in the distance",
  "duration_seconds": 5,
  "prompt_influence": 0.5
}

text (required): Description of the sound, max 1,000 chars
duration_seconds: 0.5-30 (optional, provider decides if omitted)
prompt_influence: 0-1, how closely to follow the prompt (optional)
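
The field limits above can be checked in the client before the request is sent, which avoids spending a round trip on an input the service would reject. This is a minimal sketch: validateSoundRequest is a hypothetical helper, and it assumes the 1,000-character cap can be approximated with JavaScript's string length (the server may count characters differently).

```javascript
// Return a list of problems with a sound request body; empty means it looks valid.
// ASSUMPTION: `text.length` (UTF-16 code units) approximates the server's count.
function validateSoundRequest({ text, duration_seconds, prompt_influence } = {}) {
  const errors = [];
  if (typeof text !== 'string' || text.trim() === '') {
    errors.push('text is required');
  } else if (text.length > 1000) {
    errors.push('text must be at most 1,000 characters');
  }
  // Both remaining fields are optional, so only validate them when present.
  if (duration_seconds !== undefined &&
      !(Number.isFinite(duration_seconds) && duration_seconds >= 0.5 && duration_seconds <= 30)) {
    errors.push('duration_seconds must be between 0.5 and 30');
  }
  if (prompt_influence !== undefined &&
      !(Number.isFinite(prompt_influence) && prompt_influence >= 0 && prompt_influence <= 1)) {
    errors.push('prompt_influence must be between 0 and 1');
  }
  return errors;
}
```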

Response

{
  "url": "https://media.gipity.ai/med_abc12345.mp3",
  "duration_seconds": 5,
  "credits_used": 2
}

CLI

For one-off sound effects during development (game SFX, UI sounds, voice-like cries), skip the HTTP call and use gipity generate sound - it bills you directly and ignores the service's billing mode. It writes to ./sound.mp3 by default - pass -o <path> to land the clip in your source tree so it deploys.

gipity generate sound "cartoon character saying oof" -o src/assets/sounds/oof.mp3
gipity generate sound "thunder rolling in the distance" --duration 5 -o src/assets/sounds/thunder.mp3

Music Generation

Generate music from text prompts.

Endpoint

POST /api/<PROJECT_GUID>/services/music

Request

{
  "prompt": "upbeat lo-fi hip hop beat with piano and soft drums",
  "duration_seconds": 30,
  "instrumental": true
}

prompt (required): Music description, max 2,000 chars
duration_seconds: 3-600 (optional, default ~30s; each model has its own max)
instrumental: true to force no vocals (optional)
model: which music model to use (optional; omit for the default). Get the list from the models endpoint below - don't hardcode ids.

Response

{
  "url": "https://media.gipity.ai/med_abc12345.mp3",
  "duration_seconds": 30,
  "model": "music-v1",
  "credits_used": 3
}

Picking a model

GET /api/<PROJECT_GUID>/services/music/models lists what's available:

{
  "data": {
    "models": [
      { "id": "music-v1", "label": "Music v1", "description": "Fast, polished songs and loops.", "max_duration_s": 600 }
    ],
    "default_model": "music-v1"
  }
}

Use the id as the model value on the generate call. For a model picker in the UI, render label + description and send the chosen id. Models differ in sound and cost; the platform handles where each one runs.
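
The picker flow described above might look like the following sketch. It relies only on the fields shown in the models response (id, label, description, max_duration_s, default_model). The helper names toPickerOptions and buildMusicBody are hypothetical, and clamping the duration to the model's max_duration_s is an illustrative choice, not documented platform behavior.

```javascript
// Turn the models response into plain option objects a UI can render.
function toPickerOptions(modelsResponse) {
  const { models, default_model } = modelsResponse.data;
  return models.map((m) => ({
    value: m.id,
    label: m.label,
    description: m.description,
    selected: m.id === default_model,
  }));
}

// Build the generate-call body for the chosen model id.
// Falls back to default_model when the chosen id is not in the list.
function buildMusicBody(modelsResponse, chosenId, prompt, durationSeconds) {
  const { models, default_model } = modelsResponse.data;
  const model = models.find((m) => m.id === chosenId) ||
    models.find((m) => m.id === default_model);
  const body = { prompt };
  if (model) body.model = model.id;
  if (durationSeconds !== undefined) {
    // ASSUMPTION: clamp client-side rather than let the request fail.
    body.duration_seconds = model
      ? Math.min(durationSeconds, model.max_duration_s)
      : durationSeconds;
  }
  return body;
}
```

Because the ids come from the response at runtime, nothing in this code hardcodes a model id.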

CLI

For one-off music during development, skip the HTTP call and use gipity generate music, which saves the clip to a local file. It writes to ./music.mp3 by default - pass -o <path> to land the clip in your source tree so it deploys.

gipity generate music "chill lo-fi beat for studying" -o src/assets/audio/lofi.mp3
gipity generate music "epic orchestral battle theme" --duration 60 -o src/assets/audio/theme.mp3
gipity service call music/models --get   # list available models

Audio Transcription

Transcribe audio files to text, with word-level timestamps and optional speaker diarization.

Video or a file over 100MB? The cap is on the audio you send here, not on what you start from. Extract the audio first with ffmpeg in the sandbox (code_execute) - e.g. ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -b:a 64k audio.mp3 turns hours of video into a few MB - then send that audio to this endpoint. The sandbox accepts inputs up to several GB, so the original file size is never the constraint.

Endpoint

POST /api/<PROJECT_GUID>/services/transcribe (multipart/form-data)

Request

Send as multipart/form-data:

audio (required): Audio file (MP3, WAV, M4A, etc.), max 100MB
provider: "elevenlabs" (default) or "openai" (optional)
language: Language code (e.g., "en", "es") - optional, auto-detected if omitted
diarize: "true" to identify speakers (optional)
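
Before uploading, a client can check whether a file should go through the ffmpeg extraction step first. The sketch below assumes a browser File-like object with size and type properties. checkTranscriptionInput is a hypothetical helper, and reading "100MB" as 100,000,000 bytes is a conservative assumption, since the text does not say whether the cap is decimal or binary.

```javascript
// ASSUMPTION: treat the 100MB cap as decimal megabytes (the stricter reading).
const MAX_AUDIO_BYTES = 100 * 1000 * 1000;

// Decide whether a file can be sent to the transcribe endpoint as-is.
function checkTranscriptionInput(file) {
  if (typeof file.type === 'string' && file.type.startsWith('video/')) {
    return { ok: false, reason: 'video', hint: 'Extract the audio track first.' };
  }
  if (file.size > MAX_AUDIO_BYTES) {
    return { ok: false, reason: 'too_large', hint: 'Re-encode to a smaller audio file first.' };
  }
  return { ok: true };
}
```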

Response

{
  "text": "Hello, this is a transcription test.",
  "words": [{ "text": "Hello", "start": 0.0, "end": 0.5, "type": "word" }],
  "language": "en",
  "duration_seconds": 12.5,
  "provider": "elevenlabs",
  "credits_used": 5
}
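
The word-level timestamps make it straightforward to build caption lines. The following is a minimal sketch that uses only the text, start, end, and type fields shown in the response above. wordsToCaptions is a hypothetical helper, and filtering on type === "word" assumes that other entry types (for example spacing) may appear in the array.

```javascript
// Group timed words into caption lines of at most `maxWordsPerLine` words.
// Each caption spans from its first word's start to its last word's end.
function wordsToCaptions(words, maxWordsPerLine = 8) {
  // ASSUMPTION: entries whose type is not "word" carry no caption text.
  const spoken = words.filter((w) => w.type === 'word');
  const captions = [];
  for (let i = 0; i < spoken.length; i += maxWordsPerLine) {
    const chunk = spoken.slice(i, i + maxWordsPerLine);
    captions.push({
      start: chunk[0].start,
      end: chunk[chunk.length - 1].end,
      text: chunk.map((w) => w.text).join(' '),
    });
  }
  return captions;
}
```

Calling wordsToCaptions(transcript.words) on a parsed response yields objects that can drive a subtitle overlay or be serialized to a caption format.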

Client Code Examples

// Get token first
const tokenRes = await fetch('https://a.gipity.ai/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ app: '<PROJECT_GUID>' })
});
const { data: { token } } = await tokenRes.json();

// Sound effect
const soundRes = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/sound', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({ text: 'door creaking open slowly' })
});
// A Free-plan block returns 403; surface failures in the UI rather than playing
if (!soundRes.ok) throw new Error(`Sound request failed: ${soundRes.status}`);
const sound = await soundRes.json();
new Audio(sound.url).play();

// Music
const musicRes = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/music', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({ prompt: 'calm ambient piano', duration_seconds: 60 })
});
// Music is Pro-only, so a Free-plan project gets 403 here
if (!musicRes.ok) throw new Error(`Music request failed: ${musicRes.status}`);
const music = await musicRes.json();
new Audio(music.url).play();

// Transcription (from file input)
const formData = new FormData();
formData.append('audio', fileInput.files[0]);
formData.append('diarize', 'true');
// Do not set Content-Type here; the browser adds the multipart boundary itself
const transRes = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/transcribe', {
  method: 'POST',
  headers: { 'X-App-Token': token },
  body: formData
});
if (!transRes.ok) throw new Error(`Transcription failed: ${transRes.status}`);
const transcript = await transRes.json();
console.log(transcript.text);

Limits

Rate limit: 600 requests per 5-minute window (per IP, all audio endpoints)
Sound text: max 1,000 chars, duration 0.5-30s, timeout 60s
Music prompt: max 2,000 chars, duration 3-600s, timeout 120s
Transcription: max 100MB file, timeout 120s
Standard RateLimit-* headers included in responses
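
A client can read the RateLimit-* headers to pace itself. The sketch below assumes the headers follow the IETF RateLimit header draft, where RateLimit-Remaining is the number of requests left in the window and RateLimit-Reset is the number of seconds until the window resets. The text above only says the headers are standard, so treat both the header names and their semantics as assumptions. The helper names are hypothetical.

```javascript
// Milliseconds to wait before retrying, based on RateLimit-Reset.
// `headers` is anything with a get(name) method, such as a fetch Headers object.
// ASSUMPTION: RateLimit-Reset is a delta in seconds, per the IETF draft.
function retryDelayMs(headers, fallbackMs = 1000) {
  const raw = headers.get('RateLimit-Reset');
  if (raw == null || raw === '') return fallbackMs;
  const seconds = Number(raw);
  if (!Number.isFinite(seconds) || seconds < 0) return fallbackMs;
  return Math.ceil(seconds * 1000);
}

// True when the current window has no requests left.
// A missing or unparseable header is treated as "not exhausted".
function isWindowExhausted(headers) {
  const raw = headers.get('RateLimit-Remaining');
  if (raw == null || raw === '') return false;
  const remaining = Number(raw);
  return Number.isFinite(remaining) && remaining <= 0;
}
```

For example, after any audio call an app could check isWindowExhausted(res.headers) and, if true, wait retryDelayMs(res.headers) before sending the next request.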