When to use
- Your conversational agent runs on Cartesia Line.
- Prompts, tools, voice, and the LLM live inside your Cartesia agent — not in the Avaturn session payload.
- You want natural barge-in detected server-side.
Prerequisites
- Deployed Cartesia Line agent and its agent_id (quickstart)
- Cartesia API key (sk_car_...)
- Avaturn API key (dashboard)
1. Mint a Cartesia access token
Cartesia Line uses short-lived access tokens for agent connections. Mint the token server-side from your Cartesia API key; a lifetime of a few minutes is enough.
2. Create the Avaturn session
The response includes:
- session_id: backend handle
- token: short-lived credential for the Web SDK
Session parameters: avatar_id, background, render_model (avatar render preset, not the LLM), user_absent_timeout (default 60s, min 10s), max_duration (default 3600s, max 86400s).
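The limits listed above can be checked on your backend before the session request is sent. The following is a minimal sketch, not the Avaturn client: SessionOptions and build_session_payload are hypothetical names, and the flat dictionary layout is an assumption (the real request body may nest the Cartesia fields differently). Only the field names, defaults, and bounds come from this page.

```python
from dataclasses import dataclass
from typing import Any

# Bounds taken from the parameter list above.
USER_ABSENT_TIMEOUT_MIN_S = 10
MAX_DURATION_CAP_S = 86400


@dataclass
class SessionOptions:
    """Hypothetical container for the session fields described above."""
    agent_id: str
    access_token: str
    user_absent_timeout: int = 60   # documented default, in seconds
    max_duration: int = 3600        # documented default, in seconds


def build_session_payload(opts: SessionOptions) -> dict[str, Any]:
    """Validate the documented limits and return a JSON-serializable dict.

    The key layout is an assumption for illustration only.
    """
    if not opts.agent_id or not opts.access_token:
        raise ValueError("agent_id and access_token are both required")
    if opts.user_absent_timeout < USER_ABSENT_TIMEOUT_MIN_S:
        raise ValueError(
            f"user_absent_timeout must be >= {USER_ABSENT_TIMEOUT_MIN_S}s"
        )
    if not 0 < opts.max_duration <= MAX_DURATION_CAP_S:
        raise ValueError(
            f"max_duration must be in 1..{MAX_DURATION_CAP_S}s"
        )
    return {
        "agent_id": opts.agent_id,
        "access_token": opts.access_token,
        "user_absent_timeout": opts.user_absent_timeout,
        "max_duration": opts.max_duration,
    }
```

Calling build_session_payload(SessionOptions(agent_id="agent_123", access_token="tok")) yields a payload carrying the default timeouts; out-of-range values raise ValueError before any network call is made.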
3. Connect from the frontend
Configuring the agent
Cartesia Line is a deployed agent platform: prompts, tools, voice, and the LLM live in your Cartesia agent. To change agent behavior, update and redeploy in Cartesia.
No per-session variables. The Avaturn payload accepts only agent_id and access_token. There is no pass-through for variables, context, or metadata. For per-user variation, deploy multiple agents and select the right agent_id at session creation.
Engine behavior
- Audio. Avaturn streams the user’s microphone to Cartesia as 24 kHz base64-encoded PCM, matching the Cartesia Calls API input format.
- Interruptions. Cartesia detects barge-in server-side and emits a clear signal; Avaturn drops in-flight avatar audio so the next response starts cleanly.
- No turn boundaries. Cartesia doesn’t emit explicit turn-start / turn-end markers. Avaturn opens a new segment on the first audio chunk and closes it on buffer drain or clear.
- Tools and LLM. Both execute inside Cartesia’s runtime; Avaturn doesn’t observe or proxy them. Configure tools in your Cartesia agent.
- Transcripts. Cartesia transcripts are not forwarded to the Web SDK. If you need transcripts in your app, capture them inside the Cartesia agent and ship them via your own backend.
- Text input is not played. The Avaturn POST /sessions/{id}/tasks endpoint accepts the request and returns a task_id, but the Cartesia engine ignores text-echo commands; the avatar is driven by user voice and your agent logic only.
- Call transfer is not supported. If your agent emits a transfer_call action, Avaturn logs a warning and ignores it. The avatar continues in the existing session.
- Server-initiated end. If your agent invokes the end_call tool (or otherwise ends the conversation), Cartesia closes the WebSocket gracefully and the Avaturn session ends as a normal termination.
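The segment rule in the list above (open on the first audio chunk, close on buffer drain or clear) can be pictured as a small state machine. This is a toy model for illustration only, not Avaturn's implementation: SegmentTracker and its method names are hypothetical, and the buffer is reduced to a simple chunk counter.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentTracker:
    """Toy model of segment boundaries when no turn markers are available."""
    open_segment: bool = False
    buffered_chunks: int = 0
    events: list[str] = field(default_factory=list)

    def on_audio_chunk(self) -> None:
        # The first chunk received while idle opens a new segment.
        if not self.open_segment:
            self.open_segment = True
            self.events.append("open")
        self.buffered_chunks += 1

    def on_chunk_played(self) -> None:
        # Playback consumed one chunk; an empty buffer closes the segment.
        if self.buffered_chunks > 0:
            self.buffered_chunks -= 1
            if self.buffered_chunks == 0 and self.open_segment:
                self._close("close:drain")

    def on_clear(self) -> None:
        # Barge-in: drop in-flight audio and close the segment immediately.
        self.buffered_chunks = 0
        if self.open_segment:
            self._close("close:clear")

    def _close(self, reason: str) -> None:
        self.open_segment = False
        self.events.append(reason)


tracker = SegmentTracker()
tracker.on_audio_chunk()    # agent starts speaking
tracker.on_audio_chunk()
tracker.on_clear()          # user interrupts
tracker.on_audio_chunk()    # next response begins
tracker.on_chunk_played()   # buffer drains
print(tracker.events)       # ['open', 'close:clear', 'open', 'close:drain']
```

A real pipeline would likely need to tolerate brief buffer underruns between chunks rather than closing on the first empty read; the sketch ignores that to keep the two closing conditions visible.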
Session lifecycle
A session ends on any of:
- Explicit DELETE /api/v1/sessions/{session_id}
- Your Cartesia agent ending the call (e.g. via the end_call tool)
- user_absent_timeout elapses with the user disconnected (default 60s)
- max_duration cap reached (default 3600s, max 86400s)
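For the explicit DELETE case, a backend might build the request as sketched below. Only the path /api/v1/sessions/{session_id} comes from this page. The helper name build_end_session_request, the placeholder base URL, and the Bearer authorization header are assumptions; check the Avaturn API reference for the actual host and auth scheme.

```python
import urllib.request
from urllib.parse import quote


def build_end_session_request(
    base_url: str, session_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request that ends a session."""
    # Percent-encode the id so it cannot alter the path.
    url = f"{base_url.rstrip('/')}/api/v1/sessions/{quote(session_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumed auth scheme; the real header may differ.
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder host for illustration only.
req = build_end_session_request("https://api.example.invalid", "abc", "key")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of urllib.request.urlopen(req, timeout=10) inside your own error handling.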
Call avatar.dispose() on the frontend to tear down the local SDK state. The backend session terminates as described above; dispose() does not directly close it.