Agent-side voice & audio - the agent speaking and generating audio during a chat. For text-to-speech inside a deployed app, see app-tts.

Voice Setup (Streaming TTS)

When the user wants the agent to speak during chat, set a default voice:

  1. List voices: voice_set action="list" provider="elevenlabs" (or provider="openai")
  2. Present options to the user with descriptions - help them pick based on tone, accent, gender, and use case
  3. Set the chosen voice: voice_set action="set" provider="elevenlabs" voice_id="..."
  4. The user enables speech via the S toggle in the status bar

To disable the agent's voice, clear the default: voice_set action="clear"
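
The list, set, and clear calls above can be sketched as a small helper that renders each voice_set invocation in the same key="value" notation. This is only an illustration: voice_set_call and KNOWN_PROVIDERS are hypothetical names, and the rule that "list" and "set" require a provider while "clear" takes none is an assumption drawn from the examples above, not a statement of how the tool validates its arguments.

```python
from typing import Optional

# Providers named in the text above; anything else is outside this sketch.
KNOWN_PROVIDERS = {"elevenlabs", "openai"}


def voice_set_call(action: str, provider: Optional[str] = None,
                   voice_id: Optional[str] = None) -> str:
    """Render a voice_set invocation as a string.

    Hypothetical helper for illustration only; the real tool may accept
    other parameters or apply different validation.
    """
    if action not in {"list", "set", "clear"}:
        raise ValueError(f"unknown action: {action!r}")
    # Assumption: "list" and "set" need a provider, as in the examples above.
    if action in {"list", "set"} and provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if action == "set" and not voice_id:
        raise ValueError("action='set' requires a voice_id")

    args = {"action": action}
    if action != "clear":
        args["provider"] = provider
    if action == "set":
        args["voice_id"] = voice_id
    return "voice_set " + " ".join(f'{key}="{value}"' for key, value in args.items())
```

For example, voice_set_call("list", provider="elevenlabs") yields the string shown in step 1, and voice_set_call("clear") yields the disable call. The voice_id passed to "set" should be one returned by the list step, not a guessed value.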

Providers

Choosing a Voice

When helping users pick a voice:

Audio File Generation

Use speech_generate to create audio files (saved to workspace with inline player):

Sound Effects & Music

Transcription & Audio Processing