This is the app service a deployed app calls over HTTPS. For the agent generating speech during a chat, see tts.
TTS is available for every project with no setup needed (owner_pays billing by default).
Use project_settings to customize (optional):
- Switch billing mode (owner_pays ↔ user_pays)
- Set default provider and voice
Providers
- elevenlabs: ElevenLabs (many voices - use voice_set list to discover)
- openai: OpenAI (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse)
- gemini: Gemini (30 voices: Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede, and 22 more). Multi-speaker (up to 2) and 60+ languages
Endpoints
- GET /api/<PROJECT_GUID>/services/tts/voices - list available voices
- POST /api/<PROJECT_GUID>/services/tts - generate speech audio
Listing Voices
GET /api/<PROJECT_GUID>/services/tts/voices?provider=elevenlabs
Returns { data: { voices: [...], provider, available_providers } }
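A minimal sketch of calling the voices endpoint from client code. The base URL is taken from the client example in this document. Two things here are assumptions rather than documented behavior: that the provider query parameter may be omitted, and that this endpoint accepts the same X-App-Token header as POST /tts. The helper names are hypothetical.

```javascript
// Base URL as used in the client example in this document.
const BASE_URL = 'https://a.gipity.ai';

// Build the voices URL. Assumption: `provider` is optional.
function buildVoicesUrl(projectGuid, provider) {
  const url = new URL(
    `/api/${encodeURIComponent(projectGuid)}/services/tts/voices`,
    BASE_URL
  );
  if (provider) url.searchParams.set('provider', provider);
  return url.toString();
}

// Fetch the voice list and unwrap the documented { data: { voices } } envelope.
// Assumption: the endpoint takes the same X-App-Token header as POST /tts.
async function listVoices(projectGuid, token, provider) {
  const res = await fetch(buildVoicesUrl(projectGuid, provider), {
    headers: { 'X-App-Token': token }
  });
  if (!res.ok) throw new Error(`Voice listing failed: HTTP ${res.status}`);
  const { data } = await res.json();
  return data.voices;
}
```

Usage would look like `const voices = await listVoices('<PROJECT_GUID>', token, 'elevenlabs');` inside an async function.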
Request Format (POST /tts)
{
  "text": "Hello, welcome to our app!",
  "voice_id": "JBFqnCBsd6RMkjVDRZzb",
  "provider": "elevenlabs",
  "model": "eleven_multilingual_v2"
}
Fields:
- text (required): Text to speak, max 5,000 chars
- voice_id: Voice to use (default: JBFqnCBsd6RMkjVDRZzb / George)
- provider: "elevenlabs", "openai", or "gemini" (default: elevenlabs)
- model: Provider-specific model ID (optional)
- language: BCP-47 language code (Gemini only, e.g. "ja-JP", "es-ES"). 60+ languages
- speakers: Multi-speaker config (Gemini only, up to 2). Array of { name, voice }. Text must use "Name: dialogue" format per line
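The field rules above can be checked on the client before a request is sent. This is a sketch, not the service's own validation: checkTtsRequest is a hypothetical helper, the server remains the authority, and counting the 5,000-char limit with String.length is an assumption. The second argument covers projects whose default provider was changed through project_settings.

```javascript
const PROVIDERS = ['elevenlabs', 'openai', 'gemini'];
const MAX_TEXT_CHARS = 5000;

// Returns a list of problems; an empty list means the body looks sendable.
// `defaultProvider` is what the service would use when `provider` is omitted.
function checkTtsRequest(body, defaultProvider = 'elevenlabs') {
  const problems = [];
  const provider = body.provider ?? defaultProvider;

  if (typeof body.text !== 'string' || body.text.length === 0) {
    problems.push('text is required');
  } else if (body.text.length > MAX_TEXT_CHARS) {
    // Assumption: the limit is counted like String.length (UTF-16 code units).
    problems.push(`text exceeds ${MAX_TEXT_CHARS} chars`);
  }

  if (!PROVIDERS.includes(provider)) {
    problems.push(`unknown provider: ${provider}`);
  }

  if (provider !== 'gemini') {
    // language and speakers are documented as Gemini only.
    if (body.language !== undefined) problems.push('language is Gemini only');
    if (body.speakers !== undefined) problems.push('speakers is Gemini only');
  } else if (Array.isArray(body.speakers) && body.speakers.length > 2) {
    problems.push('at most 2 speakers are supported');
  }

  return problems;
}
```

Returning a list rather than throwing on the first problem lets a form show every issue at once.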
Gemini TTS Details
30 voices: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat
Multi-speaker example (up to 2 speakers):
{
  "text": "Joe: Hey, how are you?\nJane: Great, thanks!",
  "provider": "gemini",
  "speakers": [
    { "name": "Joe", "voice": "Charon" },
    { "name": "Jane", "voice": "Leda" }
  ]
}
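When the dialogue comes from structured data, the "Name: dialogue" text and the speakers array can be built together so they cannot drift apart. The sketch below assumes a hypothetical input shape (a name-to-voice map plus a list of turns); buildMultiSpeakerRequest is not part of the service.

```javascript
// voices: { [speakerName]: geminiVoiceName }
// turns:  [{ name, line }] in speaking order
function buildMultiSpeakerRequest(voices, turns) {
  const names = Object.keys(voices);
  if (names.length === 0 || names.length > 2) {
    throw new Error('Gemini multi-speaker supports 1 or 2 speakers');
  }

  const text = turns
    .map(({ name, line }) => {
      if (!names.includes(name)) {
        throw new Error(`No voice configured for speaker "${name}"`);
      }
      // Each turn must stay on one line, so collapse embedded newlines.
      return `${name}: ${line.replace(/\s*\n\s*/g, ' ')}`;
    })
    .join('\n');

  return {
    text,
    provider: 'gemini',
    speakers: names.map((name) => ({ name, voice: voices[name] }))
  };
}
```

Calling it with `{ Joe: 'Charon', Jane: 'Leda' }` and the two turns from the example yields the same request body as the multi-speaker example.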
Language example (Japanese):
{
  "text": "こんにちは世界",
  "provider": "gemini",
  "voice_id": "Kore",
  "language": "ja-JP"
}
Gemini outputs raw PCM audio (audio/L16, 24 kHz). The platform converts it and serves it as MP3.
Response Format
{
  "url": "https://media.gipity.ai/med_abc12345.mp3",
  "voice_id": "JBFqnCBsd6RMkjVDRZzb",
  "provider": "elevenlabs",
  "credits_used": 5
}
The url is a permanent public CDN URL to an MP3 file.
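Because the URL is permanent, an app that speaks the same phrases repeatedly can keep the URL and skip regenerating the audio. The following is a sketch of an in-memory cache under that idea. createTtsCache and the generate callback are hypothetical names, and whether a repeated identical request would otherwise be charged again is an assumption.

```javascript
// Wraps any function that performs the TTS request and resolves to the
// documented response shape ({ url, ... }). Returns a function that
// resolves to the URL, reusing earlier results for identical requests.
function createTtsCache(generate) {
  const cache = new Map();

  return function getSpeechUrl(request) {
    // Key on every field that affects the generated audio.
    const key = JSON.stringify([
      request.text,
      request.provider,
      request.voice_id,
      request.model,
      request.language,
      request.speakers
    ]);

    if (!cache.has(key)) {
      // Store the promise so concurrent callers share one request.
      const pending = generate(request).then((response) => response.url);
      cache.set(key, pending);
      // Drop failed attempts so a later call can retry.
      pending.catch(() => cache.delete(key));
    }
    return cache.get(key);
  };
}
```

The cache lives only as long as the page or process; persisting the URLs (for example in the app's own storage) would follow the same keying idea.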
Client Code Example
// Run inside an async function or an ES module (uses top-level await).

// 1. Get an app token for the project
const tokenRes = await fetch('https://a.gipity.ai/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ app: '<PROJECT_GUID>' })
});
const { data: { token } } = await tokenRes.json();

// 2. Generate speech (provider and voice fall back to the defaults)
const res = await fetch('https://a.gipity.ai/api/<PROJECT_GUID>/services/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-App-Token': token },
  body: JSON.stringify({ text: 'Welcome to the future of AI!' })
});
if (!res.ok) throw new Error(`TTS request failed: HTTP ${res.status}`);
const data = await res.json();

// 3. Play the returned MP3
const audio = new Audio(data.url);
audio.play();
Limits
- Rate limit: 600 requests per 5-minute window (per IP)
- Max text length: 5,000 chars
- Timeout: 60s
- Standard RateLimit-* headers are included in responses
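Two of the limits above can be handled on the client: text longer than 5,000 chars can be split into several requests, and the rate-limit headers can tell the app when to pause. The sketch below shows both. The header names RateLimit-Remaining and RateLimit-Reset, and Reset being a number of seconds, are assumptions based on the IETF RateLimit header draft; this document only says that standard RateLimit-* headers are sent. Both function names are hypothetical.

```javascript
const MAX_CHARS = 5000;

// Split text into chunks of at most maxLen chars, preferring sentence ends.
// Joining the chunks reproduces the original text exactly.
function splitForTts(text, maxLen = MAX_CHARS) {
  // Sentence-like pieces ending in . ! or ?, plus any unpunctuated tail.
  const pieces = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = '';

  for (let piece of pieces) {
    // A single piece longer than the limit is cut at the limit.
    while (piece.length > maxLen) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(piece.slice(0, maxLen));
      piece = piece.slice(maxLen);
    }
    if (current.length + piece.length > maxLen) {
      chunks.push(current);
      current = '';
    }
    current += piece;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Works with a fetch Headers object or anything else with get(name).
function readNumber(headers, name) {
  const raw = headers.get(name);
  return raw === null || raw === undefined ? NaN : Number(raw);
}

// Milliseconds to wait before the next request; 0 when quota remains
// or when the headers are missing.
function msUntilNextRequest(headers) {
  const remaining = readNumber(headers, 'RateLimit-Remaining');
  const resetSeconds = readNumber(headers, 'RateLimit-Reset');
  if (!(remaining <= 0)) return 0;
  return Number.isFinite(resetSeconds) ? resetSeconds * 1000 : 0;
}
```

Each chunk produces its own audio URL, so the app would play the resulting files in sequence.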