When to use
- Low-latency voice-to-voice with natural barge-in.
- Prompts, voice, and turn detection configured inline per session via an ephemeral OpenAI client secret.
- You don’t want to run a separate agent runtime.
Prerequisites
- OpenAI API key with Realtime API access. Only the GA version is supported — the beta is not.
- Avaturn API key (dashboard).
1. Mint a client secret
On your backend, exchange your OpenAI API key for a short-lived client secret. Avaturn uses this secret to open the Realtime WebSocket on the user’s behalf.
2. Create an Avaturn session
- session_id: backend handle (terminate, telemetry)
- token: short-lived credential for the Web SDK
Session options include avatar_id, background, render_model (avatar render preset, not the LLM), user_absent_timeout (default 60s, min 10s), and max_duration (default 3600s, max 86400s). See the API reference.
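The limits above can be checked on the backend before the create call. The following is a minimal sketch, assuming the options are sent as a flat JSON object and that background and render_model are strings. `build_session_params` is a hypothetical helper, not part of any Avaturn SDK, and the real request body shape should be confirmed against the API reference.

```python
from typing import Any, Dict, Optional

# Defaults and limits as stated in this guide (seconds).
USER_ABSENT_TIMEOUT_DEFAULT = 60
USER_ABSENT_TIMEOUT_MIN = 10
MAX_DURATION_DEFAULT = 3600
MAX_DURATION_LIMIT = 86400


def build_session_params(
    avatar_id: str,
    user_absent_timeout: int = USER_ABSENT_TIMEOUT_DEFAULT,
    max_duration: int = MAX_DURATION_DEFAULT,
    background: Optional[str] = None,
    render_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate session options and return them as a dict.

    Hypothetical helper: the flat dict layout is an assumption,
    not the documented request schema.
    """
    if user_absent_timeout < USER_ABSENT_TIMEOUT_MIN:
        raise ValueError(
            f"user_absent_timeout must be >= {USER_ABSENT_TIMEOUT_MIN}s"
        )
    if not 0 < max_duration <= MAX_DURATION_LIMIT:
        raise ValueError(f"max_duration must be in 1..{MAX_DURATION_LIMIT}s")

    params: Dict[str, Any] = {
        "avatar_id": avatar_id,
        "user_absent_timeout": user_absent_timeout,
        "max_duration": max_duration,
    }
    # Only include optional presentation settings when they are set.
    if background is not None:
        params["background"] = background
    if render_model is not None:
        params["render_model"] = render_model
    return params
```

Rejecting out-of-range values locally gives a clearer error than waiting for the API to refuse the request.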
3. Connect from the frontend
Configuring the agent
The session object you pass to client_secrets.create() is applied to the WebSocket Avaturn opens on the user’s behalf, giving you full control over instructions, voice, VAD, and transcription.
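As an illustration of such a session object, the sketch below builds a GA-style config and mints the secret through OpenAI's REST endpoint using only the standard library. The model names (`gpt-realtime`, `gpt-4o-mini-transcribe`) and the helper names are assumptions chosen for the example; check the OpenAI Realtime reference for the current schema.

```python
import json
import urllib.request
from typing import Any, Dict

# OpenAI endpoint for minting Realtime client secrets.
CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets"


def build_session_config(instructions: str, voice: str = "marin") -> Dict[str, Any]:
    """Build a GA-style Realtime session config (hypothetical helper)."""
    return {
        "type": "realtime",
        "model": "gpt-realtime",  # assumption: substitute the model you use
        "instructions": instructions,
        "audio": {
            "input": {
                # Server VAD drives turn detection and barge-in.
                "turn_detection": {"type": "server_vad"},
                # Needed for user transcripts; the model name is an assumption.
                "transcription": {"model": "gpt-4o-mini-transcribe"},
            },
            "output": {"voice": voice},
        },
    }


def mint_client_secret(api_key: str, session: Dict[str, Any]) -> str:
    """Exchange the API key for a short-lived client secret (backend only)."""
    body = json.dumps({"session": session}).encode("utf-8")
    request = urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # The secret is expected in the top-level "value" field of the response.
    return payload["value"]
```

Keeping the config builder separate from the network call lets you unit-test the session shape without touching the OpenAI API.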
Instructions and voice
Use the marin or cedar voice for best quality. Other supported values: alloy, ash, ballad, coral, echo, sage, shimmer, verse.
User transcripts require audio.input.transcription. Without it, OpenAI doesn’t emit transcription events and Avaturn has nothing to forward to the SDK. Avatar response transcripts (assistant side) flow regardless.
Stored prompts
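For stored prompts, a minimal sketch follows. It assumes the session config accepts OpenAI's `prompt` reference object (`id`, optional `version`, optional `variables`) and that the reference passes through to the Realtime session unchanged. The prompt ID is a placeholder and `with_stored_prompt` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping, Optional


def with_stored_prompt(
    session: Mapping[str, Any],
    prompt_id: str,
    version: Optional[str] = None,
    variables: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of the session config that references a stored prompt."""
    prompt: Dict[str, Any] = {"id": prompt_id}
    if version is not None:
        prompt["version"] = version
    if variables:
        prompt["variables"] = dict(variables)

    # Copy the top level so the caller's config is not mutated.
    updated = dict(session)
    updated["prompt"] = prompt
    return updated


# Placeholder ID: replace with a real stored prompt ID from your account.
session = with_stored_prompt(
    {"type": "realtime", "model": "gpt-realtime"},
    prompt_id="pmpt_example",
    variables={"customer_name": "Ada"},
)
```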
Engine behavior
- Audio. 24 kHz mono PCM in both directions.
- Interruptions. OpenAI server VAD (or semantic VAD, if configured). When the user starts speaking, Avaturn discards in-flight avatar audio.
- Transcripts. Assistant transcripts (response.output_audio_transcript.done) flow by default. User transcripts (conversation.item.input_audio_transcription.completed) flow only when audio.input.transcription is configured. Both are forwarded to the SDK via ce_events.realtime.*.
- Tools. Tool definitions sent in the session config are parsed by OpenAI, but Avaturn doesn’t surface response.function_call_arguments.* events to the Web SDK or relay function results back. Tool calls won’t execute end-to-end, so avoid them at this layer until proper support lands.
- GA only. Beta or mixed beta/GA usage causes a session_lifecycle_error with code openai-realtime-version-mismatch. See beta-to-GA migration.
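The two transcript event types named in the list above can be told apart with a small classifier. This sketch assumes forwarded events keep OpenAI's shape, a dict with a `type` string and a `transcript` field. How the SDK actually delivers them is not shown here, and `extract_transcript` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Tuple

ASSISTANT_TRANSCRIPT = "response.output_audio_transcript.done"
USER_TRANSCRIPT = "conversation.item.input_audio_transcription.completed"


def extract_transcript(event: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (speaker, text) for transcript events, or None for other events."""
    event_type = event.get("type")
    if event_type == ASSISTANT_TRANSCRIPT:
        speaker = "assistant"
    elif event_type == USER_TRANSCRIPT:
        speaker = "user"
    else:
        return None
    return speaker, event.get("transcript", "")
```

If user lines never appear, check that audio.input.transcription is set in the session config before debugging the event handler.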
Session lifecycle
A session ends on any of:
- Explicit DELETE /api/v1/sessions/{session_id}
- user_absent_timeout elapses with the user disconnected (default 60s)
- max_duration cap reached (default 3600s, max 86400s)
Call avatar.dispose() on the frontend to tear down the local SDK state. The backend session terminates as described above; dispose() does not directly close it. Don’t try to resume a session after it ends; mint a new client secret and create a fresh session.
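Because dispose() leaves the backend session running, a backend may want to issue the explicit DELETE itself. A minimal sketch, assuming a Bearer-token Authorization header and a base URL you supply. Neither is confirmed by this page, so check the API reference. `build_terminate_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import quote


def build_terminate_request(
    base_url: str, session_id: str, api_key: str
) -> urllib.request.Request:
    """Build the DELETE /api/v1/sessions/{session_id} request.

    Assumptions: base_url is your Avaturn API host, and the API key
    is sent as a Bearer token.
    """
    url = f"{base_url.rstrip('/')}/api/v1/sessions/{quote(session_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Send the request with `urllib.request.urlopen(request, timeout=30)` from backend code that holds the session_id returned at creation time.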