# App API - Text-to-Speech Service

> This is the **app service** a deployed app calls over HTTPS. For the agent generating speech during a chat, see [tts](tts.md).

TTS is enabled for every project with no setup required. Billing defaults to `owner_pays`.

Use `project_settings` to customize (optional):
- Switch billing mode (owner_pays ↔ user_pays)
- Set default provider and voice

## Providers
- **elevenlabs**: ElevenLabs (many voices; use voice_set list or the voices endpoint below to discover them)
- **openai**: OpenAI (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse)
- **gemini**: Gemini (30 voices: Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede, and 22 more). Supports multi-speaker output (up to 2 speakers) and 60+ languages

## Endpoints
- `GET /api/<PROJECT_GUID>/services/tts/voices` - list available voices
- `POST /api/<PROJECT_GUID>/services/tts` - generate speech audio

## Listing Voices
```
GET /api/<PROJECT_GUID>/services/tts/voices?provider=elevenlabs
```
Returns `{ data: { voices: [...], provider, available_providers } }`
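A minimal sketch of calling the voices endpoint from client code. The helper names `voicesUrl` and `listVoices` are hypothetical, the host is taken from the client example in this document, and sending the `X-App-Token` header here is an assumption (this page only shows that header on the POST endpoint).

```js
// Host copied from the client example in this document.
const API_BASE = 'https://a.gipity.ai';

// Hypothetical helper: builds the voices URL, optionally filtered by provider.
function voicesUrl(projectGuid, provider) {
  const url = new URL(
    `/api/${encodeURIComponent(projectGuid)}/services/tts/voices`,
    API_BASE
  );
  if (provider) url.searchParams.set('provider', provider);
  return url.toString();
}

// Hypothetical helper: fetches the voice list and unwraps the `data` envelope.
// Assumption: this endpoint accepts the same X-App-Token header as POST /tts.
async function listVoices(projectGuid, token, provider) {
  const res = await fetch(voicesUrl(projectGuid, provider), {
    headers: { 'X-App-Token': token }
  });
  if (!res.ok) throw new Error(`Voice listing failed: HTTP ${res.status}`);
  const { data } = await res.json();
  return data; // { voices, provider, available_providers }
}
```

Calling `listVoices('<PROJECT_GUID>', token, 'elevenlabs')` would then resolve to the object described above, assuming the request is authorized.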

## Request Format (POST /tts)
```json
{
  "text": "Hello, welcome to our app!",
  "voice_id": "JBFqnCBsd6RMkjVDRZzb",
  "provider": "elevenlabs",
  "model": "eleven_multilingual_v2"
}
```

Fields:
- `text` (required): Text to speak, max 5,000 chars
- `voice_id`: Voice to use (default: JBFqnCBsd6RMkjVDRZzb / George)
- `provider`: "elevenlabs", "openai", or "gemini" (default: elevenlabs)
- `model`: Provider-specific model ID (optional)
- `language`: BCP-47 language code (Gemini only, e.g. "ja-JP", "es-ES"). 60+ languages are supported
- `speakers`: Multi-speaker config (Gemini only, up to 2 speakers). Array of `{ name, voice }`. Each line of `text` must use the "Name: dialogue" format
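The field rules above can be checked on the client before a request is sent. The following is a sketch only: `validateTtsRequest` is a hypothetical helper, the server remains the source of truth, and counting the 5,000-character limit with JavaScript's `String.length` (UTF-16 code units) is an assumption.

```js
const MAX_TEXT_CHARS = 5000; // limit stated in the field list
const PROVIDERS = ['elevenlabs', 'openai', 'gemini'];
const DEFAULT_PROVIDER = 'elevenlabs';

// Hypothetical helper: returns a list of problems found in a request body.
// An empty array means the body passes these client-side checks.
function validateTtsRequest(body) {
  const errors = [];
  const provider = body.provider ?? DEFAULT_PROVIDER;

  if (typeof body.text !== 'string' || body.text.length === 0) {
    errors.push('text is required');
  } else if (body.text.length > MAX_TEXT_CHARS) {
    // Assumption: the limit is counted the same way as String.length.
    errors.push(`text exceeds ${MAX_TEXT_CHARS} chars`);
  }

  if (!PROVIDERS.includes(provider)) {
    errors.push(`unknown provider: ${provider}`);
  }

  // language and speakers are documented as Gemini-only fields.
  for (const field of ['language', 'speakers']) {
    if (body[field] !== undefined && provider !== 'gemini') {
      errors.push(`${field} requires provider "gemini"`);
    }
  }

  if (body.speakers !== undefined &&
      (!Array.isArray(body.speakers) || body.speakers.length > 2)) {
    errors.push('speakers must be an array of at most 2 entries');
  }

  return errors;
}
```

Returning a list of messages instead of throwing lets a form show every problem at once.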

## Gemini TTS Details

**30 voices**: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat

**Multi-speaker example** (up to 2 speakers):
```json
{
  "text": "Joe: Hey, how are you?\nJane: Great, thanks!",
  "provider": "gemini",
  "speakers": [
    { "name": "Joe", "voice": "Charon" },
    { "name": "Jane", "voice": "Leda" }
  ]
}
```
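When dialogue is stored as structured turns, the "Name: dialogue" text can be generated rather than written by hand. This is a sketch under that assumption; `buildDialogueRequest` and the `{ name, line }` turn shape are hypothetical and not part of the API.

```js
// Hypothetical helper: turns structured dialogue into a Gemini multi-speaker
// request body in the "Name: dialogue" per-line format.
function buildDialogueRequest(speakers, turns) {
  if (speakers.length > 2) {
    throw new Error('Gemini multi-speaker supports up to 2 speakers');
  }
  const names = new Set(speakers.map((s) => s.name));

  const text = turns
    .map(({ name, line }) => {
      // Every turn must belong to a declared speaker.
      if (!names.has(name)) throw new Error(`Unknown speaker: ${name}`);
      return `${name}: ${line}`;
    })
    .join('\n');

  return { text, provider: 'gemini', speakers };
}

// Usage: reproduces the request body from the example above.
const dialogueBody = buildDialogueRequest(
  [
    { name: 'Joe', voice: 'Charon' },
    { name: 'Jane', voice: 'Leda' }
  ],
  [
    { name: 'Joe', line: 'Hey, how are you?' },
    { name: 'Jane', line: 'Great, thanks!' }
  ]
);
```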

**Language example** (Japanese):
```json
{
  "text": "こんにちは世界",
  "provider": "gemini",
  "voice_id": "Kore",
  "language": "ja-JP"
}
```

Gemini's native output is raw PCM audio (audio/L16, 24 kHz). The platform converts it and serves it as MP3, like the other providers.

## Response Format
```json
{
  "url": "https://media.gipity.ai/med_abc12345.mp3",
  "voice_id": "JBFqnCBsd6RMkjVDRZzb",
  "provider": "elevenlabs",
  "credits_used": 5
}
```

The `url` is a permanent public CDN URL to an MP3 file.

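## Client Code Example
```js
// Uses top-level await: run inside an ES module or an async function.

// Request an app token for this project
const tokenRes = await fetch('https://a.gipity.ai/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ app: '<PROJECT_GUID>' })
});
if (!tokenRes.ok) throw new Error(`Token request failed: HTTP ${tokenRes.status}`);
const { data: { token } } = await tokenRes.json();

// Generate speech with the default provider and voice
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({ text: 'Welcome to the future of AI!' })
});
if (!res.ok) throw new Error(`TTS request failed: HTTP ${res.status}`);
const data = await res.json();

// Play audio. Browsers may reject play() unless it follows a user gesture.
const audio = new Audio(data.url);
audio.play().catch((err) => console.error('Playback failed:', err));
```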

## Limits
- **Rate limit**: 600 requests per 5-minute window (per IP)
- **Max text length**: 5,000 chars
- **Timeout**: 60s
- Standard `RateLimit-*` headers included in responses
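A client can use these headers to decide how long to wait before retrying. The sketch below assumes the header names from the IETF RateLimit header draft (`RateLimit-Remaining`, and `RateLimit-Reset` as seconds until the window resets); this page does not spell out the exact names, so confirm them against a real response. `retryDelayMs` is a hypothetical helper meant to be called after a rate-limited response.

```js
// Reads a numeric header; returns null when it is missing or not a number.
// `headers` can be a fetch Headers object or anything with a get(name) method.
function readNumber(headers, name) {
  const raw = headers.get(name);
  if (raw === null || raw === undefined || raw === '') return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Hypothetical helper: how long to wait (ms) before the next attempt.
// Assumption: RateLimit-Reset is a number of seconds until the window resets.
function retryDelayMs(headers, fallbackMs = 1000) {
  const remaining = readNumber(headers, 'RateLimit-Remaining');
  if (remaining !== null && remaining > 0) return 0; // quota left, no wait

  const resetSeconds = readNumber(headers, 'RateLimit-Reset');
  return resetSeconds !== null ? Math.max(0, resetSeconds) * 1000 : fallbackMs;
}
```

For example, after a rate-limited `res`, awaiting `new Promise((r) => setTimeout(r, retryDelayMs(res.headers)))` before retrying avoids hammering the endpoint while the window is exhausted.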