When to use

  • Your conversational agent runs on Cartesia Line.
  • Prompts, tools, voice, and the LLM live inside your Cartesia agent — not in the Avaturn session payload.
  • You want natural barge-in detected server-side.
For inline per-session config of prompts/voice/VAD, see OpenAI Realtime.

Prerequisites

  • Deployed Cartesia Line agent and its agent_id (quickstart)
  • Cartesia API key (sk_car_...)
  • Avaturn API key (dashboard)

1. Mint a Cartesia access token

Cartesia Line uses short-lived access tokens for agent connections. Mint one server-side from your Cartesia API key so the key never reaches the browser. A lifetime of a few minutes is enough; the example below requests 300 seconds.
import httpx

# Run inside an async function (or an async REPL); access_token is reused in step 2.
async with httpx.AsyncClient(timeout=10.0) as http:
    r = await http.post(
        "https://api.cartesia.ai/access-token",
        headers={
            "Authorization": "Bearer <CARTESIA_API_KEY>",
            "Cartesia-Version": "2025-04-16",
        },
        json={"grants": {"agent": True}, "expires_in": 300},
    )
    r.raise_for_status()
    access_token = r.json()["token"]
Mint a fresh token for each session and do not cache it. See the Cartesia authentication guide for scopes.
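Because a token is minted for every session, it can help to factor the request into a small helper. The following is a minimal sketch rather than part of either API: the name `build_token_request` and the positive-lifetime check are illustrative assumptions, while the URL, headers, and body mirror the request shown above. It only assembles the arguments, so it can be unit-tested without network access.

```python
from typing import Any

CARTESIA_TOKEN_URL = "https://api.cartesia.ai/access-token"
CARTESIA_VERSION = "2025-04-16"


def build_token_request(api_key: str, ttl_seconds: int = 300) -> dict[str, Any]:
    """Assemble keyword arguments for an HTTP POST (hypothetical helper)."""
    if ttl_seconds <= 0:
        # Assumption: a non-positive lifetime is never useful, so fail early.
        raise ValueError("ttl_seconds must be positive")
    return {
        "url": CARTESIA_TOKEN_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Cartesia-Version": CARTESIA_VERSION,
        },
        # Same body as the example above: agent grant plus a short lifetime.
        "json": {"grants": {"agent": True}, "expires_in": ttl_seconds},
    }
```

With an `httpx.AsyncClient` named `http`, the result could be passed as `await http.post(**build_token_request(api_key))`.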

2. Create the Avaturn session

import httpx

async with httpx.AsyncClient() as http:
    r = await http.post(
        "https://api.avaturn.live/api/v1/sessions",
        headers={"Authorization": "Bearer <AVATURN_API_KEY>"},
        json={
            "conversation_engine": {
                "type": "cartesia",
                "access_token": access_token,
                "agent_id": "<your-agent-id>",
            },
        },
    )
    r.raise_for_status()
    session = r.json()  # { "session_id": "...", "token": "..." }
Response:
  • session_id — backend handle
  • token — short-lived credential for the Web SDK
Optional session fields: avatar_id, background, render_model (avatar render preset, not the LLM), user_absent_timeout (default 60s, min 10s), max_duration (default 3600s, max 86400s).
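The documented limits on the optional fields can be checked before the request is sent. This is a sketch under stated assumptions: the helper name `build_session_payload` is hypothetical, and placing the optional fields at the top level of the body, next to `conversation_engine`, is an assumption that the text above does not confirm. Only the limits themselves (minimum 10 seconds, maximum 86400 seconds) come from this page.

```python
from typing import Any, Optional

MIN_USER_ABSENT_TIMEOUT = 10  # seconds, documented minimum
MAX_SESSION_DURATION = 86400  # seconds, documented maximum


def build_session_payload(
    access_token: str,
    agent_id: str,
    user_absent_timeout: Optional[int] = None,
    max_duration: Optional[int] = None,
) -> dict[str, Any]:
    """Build the session-creation body and validate optional limits."""
    payload: dict[str, Any] = {
        "conversation_engine": {
            "type": "cartesia",
            "access_token": access_token,
            "agent_id": agent_id,
        },
    }
    # Assumption: optional fields are top-level keys of the request body.
    if user_absent_timeout is not None:
        if user_absent_timeout < MIN_USER_ABSENT_TIMEOUT:
            raise ValueError("user_absent_timeout must be at least 10 seconds")
        payload["user_absent_timeout"] = user_absent_timeout
    if max_duration is not None:
        if not 0 < max_duration <= MAX_SESSION_DURATION:
            raise ValueError("max_duration must be between 1 and 86400 seconds")
        payload["max_duration"] = max_duration
    return payload
```

Omitted fields are left out of the payload entirely, so the server-side defaults (60s and 3600s) apply.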

3. Connect from the frontend

import { AvaturnHead } from "@avaturn-live/web-sdk";

const root = document.querySelector<HTMLDivElement>("#avaturn-video")!;
const avatar = new AvaturnHead(root, {
  sessionToken: session.token,
  audioSource: true, // required — engine is voice-to-voice
});

await avatar.init();

Configuring the agent

Cartesia Line is a deployed agent platform: prompts, tools, voice, and the LLM live in your Cartesia agent. To change agent behavior, update and redeploy in Cartesia.
No per-session variables. The Avaturn payload accepts only agent_id and access_token. There is no pass-through for variables, context, or metadata. For per-user variation, deploy multiple agents and select the right agent_id at session creation.
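Selecting an agent per user can be as simple as a lookup performed before the session is created. The sketch below is illustrative only: the tier names, the placeholder agent IDs, and the function `select_agent_id` are assumptions, not part of the Avaturn or Cartesia APIs.

```python
# Hypothetical mapping from a user attribute to a deployed Cartesia agent.
AGENT_IDS = {
    "free": "<free-tier-agent-id>",
    "pro": "<pro-tier-agent-id>",
}
DEFAULT_TIER = "free"


def select_agent_id(user_tier: str) -> str:
    """Return the agent_id for a tier, falling back to the default agent."""
    return AGENT_IDS.get(user_tier, AGENT_IDS[DEFAULT_TIER])
```

The returned value goes into the `agent_id` field of the `conversation_engine` object when the session is created.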

Engine behavior

  • Audio. Avaturn streams the user’s microphone to Cartesia as 24 kHz base64-encoded PCM, matching the Cartesia Calls API input format.
  • Interruptions. Cartesia detects barge-in server-side and emits a clear signal; Avaturn drops in-flight avatar audio so the next response starts cleanly.
  • No turn boundaries. Cartesia doesn’t emit explicit turn-start / turn-end markers. Avaturn opens a new segment on the first audio chunk and closes it on buffer drain or clear.
  • Tools and LLM. Both execute inside Cartesia’s runtime — Avaturn doesn’t observe or proxy them. Configure tools in your Cartesia agent.
  • Transcripts. Cartesia transcripts are not forwarded to the Web SDK. If you need transcripts in your app, capture them inside the Cartesia agent and ship via your own backend.
  • Text input is not played. The Avaturn POST /sessions/{id}/tasks endpoint accepts the request and returns a task_id, but the Cartesia engine ignores text-echo commands — the avatar is driven by user voice and your agent logic only.
  • Call transfer is not supported. If your agent emits a transfer_call action, Avaturn logs a warning and ignores it. The avatar continues in the existing session.
  • Server-initiated end. If your agent invokes the end_call tool (or otherwise ends the conversation), Cartesia closes the WebSocket gracefully and the Avaturn session ends as a normal termination.
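
The segment rule described under "No turn boundaries" can be modeled as a small state machine. This is an illustrative model of the described behavior, not Avaturn's implementation: the class `SegmentTracker` and its method names are hypothetical, and counting buffered chunks is an assumed stand-in for whatever the real playback buffer tracks.

```python
from dataclasses import dataclass


@dataclass
class SegmentTracker:
    """Opens a segment on the first audio chunk; closes it on drain or clear."""

    is_open: bool = False
    buffered: int = 0   # chunks received but not yet played
    completed: int = 0  # number of segments closed so far

    def on_audio_chunk(self) -> None:
        # The first chunk after a closed segment implicitly starts a new one.
        self.is_open = True
        self.buffered += 1

    def on_chunk_played(self) -> None:
        if self.buffered > 0:
            self.buffered -= 1
        # Buffer drain: nothing left to play, so the segment ends.
        if self.is_open and self.buffered == 0:
            self._close()

    def on_clear(self) -> None:
        # Barge-in: drop in-flight audio and end the segment immediately.
        self.buffered = 0
        if self.is_open:
            self._close()

    def _close(self) -> None:
        self.is_open = False
        self.completed += 1
```

In this model a clear signal and a drained buffer are the only two ways a segment ends, which matches the absence of explicit turn-end markers.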

Session lifecycle

A session ends on any of:
  • Explicit DELETE /api/v1/sessions/{session_id}
  • Your Cartesia agent ending the call (e.g. via the end_call tool)
  • user_absent_timeout elapses with the user disconnected (default 60s)
  • max_duration cap reached (default 3600s, max 86400s)
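async with httpx.AsyncClient() as http:
    # session_id is the value returned when the session was created in step 2.
    r = await http.delete(
        f"https://api.avaturn.live/api/v1/sessions/{session_id}",
        headers={"Authorization": "Bearer <AVATURN_API_KEY>"},
    )
    r.raise_for_status()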
Call avatar.dispose() on the frontend to tear down the local SDK state. The backend session terminates as described above — dispose() does not directly close it.
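Since dispose() does not close the backend session, server code may want to guarantee the explicit DELETE even when something fails mid-session. One way to sketch this is an async context manager. The names `managed_session`, `create`, and `delete` are hypothetical; the two callables stand in for the HTTP requests shown earlier, and the demo uses in-memory fakes rather than real network calls.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable


@asynccontextmanager
async def managed_session(
    create: Callable[[], Awaitable[dict]],
    delete: Callable[[str], Awaitable[None]],
) -> AsyncIterator[dict]:
    """Create a session and always delete it on exit, even after an error."""
    session = await create()
    try:
        yield session
    finally:
        await delete(session["session_id"])


def demo() -> list:
    """Run the helper with fakes and return the session IDs that were deleted."""
    deleted: list = []

    async def fake_create() -> dict:
        return {"session_id": "s1", "token": "t1"}

    async def fake_delete(session_id: str) -> None:
        deleted.append(session_id)

    async def run() -> None:
        try:
            async with managed_session(fake_create, fake_delete):
                raise RuntimeError("simulated failure during the session")
        except RuntimeError:
            pass  # cleanup has already run by the time we get here

    asyncio.run(run())
    return deleted
```

In real use, `create` and `delete` would wrap the POST and DELETE requests to the Avaturn API. The timeout-based terminations listed above still apply if the server process itself dies before cleanup runs.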

Reference