{
  "name": "tts",
  "title": "Voice & Audio Guide",
  "description": "Voice selection, text-to-speech, sound effects, music generation, and audio tools",
  "guid": "sk_plat_ttsg",
  "category": "Agent Tools",
  "requiredTools": [
    "voice_set",
    "speech_generate",
    "sound_generate",
    "music_generate",
    "audio_transcribe",
    "audio_isolate"
  ],
  "content": "# Voice & Audio Guide\n\n> Agent-side voice & audio - the agent speaking and generating audio during a chat. For text-to-speech *inside a deployed app*, see [app-tts](app-tts.md).\n\n## Voice Setup (Streaming TTS)\nWhen the user wants the agent to speak during chat, set a default voice:\n1. List voices: `voice_set action=\"list\" provider=\"elevenlabs\"` (or `provider=\"openai\"`)\n2. Present options to the user with descriptions - help them pick based on tone, accent, gender, and use case\n3. Set the chosen voice: `voice_set action=\"set\" provider=\"elevenlabs\" voice_id=\"...\"`\n4. The user enables speech via the S toggle in the status bar\n\nTo disable: `voice_set action=\"clear\"`\n\n### Providers\n- **ElevenLabs** (default) - hundreds of voices, real-time streaming, highest quality. Each voice has a description plus labels (accent, gender, age, use case).\n- **OpenAI** - 11 built-in voices, batch generation only (no streaming). Good for file generation.\n\n### Choosing a Voice\nWhen helping users pick a voice:\n- Ask about their preference: tone (warm, professional, energetic), gender, accent\n- List voices from the matching provider and highlight relevant descriptions/labels\n- Suggest 3-5 options that fit, don't dump the full list\n- Offer to generate a short sample with `speech_generate` so they can compare\n\n## Audio File Generation\nUse `speech_generate` to create audio files (saved to workspace with inline player):\n- Default provider: ElevenLabs (agent's configured voice)\n- Override with `provider` and `voice_id` for one-off voices\n- Max 5000 characters per call\n- OpenAI models: gpt-4o-mini-tts (default, fast), tts-1, tts-1-hd (higher quality)\n\n## Sound Effects & Music\n- `sound_generate` - generate sound effects from descriptions (e.g. \"thunder and rain\", \"sci-fi laser\")\n- `music_generate` - generate music from prompts (e.g. \"chill lo-fi beat\", \"epic orchestral theme\")\nBoth save to workspace with inline playback. Do not call audio_play after - the card is already shown.\n\n## Transcription & Audio Processing\n- `audio_transcribe` - speech-to-text from audio files\n- `audio_isolate` - extract vocals from audio (remove background music/noise)\n"
}
