You’ll mint an ephemeral OpenAI client secret on your backend, create an Avaturn session bound to it, and connect from the browser. The conversation engine is OpenAI Realtime.
Prerequisites
Avaturn API key (dashboard)
OpenAI API key with Realtime API access
Backend in Node.js or Python; any JS framework on the frontend
1. Mint an OpenAI client secret
On your backend, exchange your OpenAI API key for a short-lived client secret.
from openai import AsyncOpenAI

openai = AsyncOpenAI(api_key="<OPENAI_API_KEY>")
secret = await openai.realtime.client_secrets.create(
    expires_after={"seconds": 600, "anchor": "created_at"},
    session={
        "type": "realtime",
        "model": "gpt-realtime",
        "instructions": "You are a friendly assistant. Keep replies brief.",
        "audio": {"output": {"voice": "marin"}},
    },
)
client_secret = secret.value  # ek_...
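Because the secret is short-lived (600 seconds in the call above), it can help to confirm that it still has some lifetime left before passing it to the next step. The following is a minimal sketch, assuming the response exposes a Unix `expires_at` timestamp alongside `value`; the helper names `seconds_remaining` and `is_usable` and the 30-second margin are illustrative and not part of either API.

```python
import time
from typing import Optional


def seconds_remaining(expires_at: int, now: Optional[float] = None) -> float:
    """Return the seconds left before a Unix `expires_at` timestamp."""
    current = time.time() if now is None else now
    return expires_at - current


def is_usable(expires_at: int, margin: float = 30.0, now: Optional[float] = None) -> bool:
    """True if the secret outlives `margin` seconds (hypothetical safety buffer)."""
    return seconds_remaining(expires_at, now) > margin
```

If `is_usable(...)` returns False, minting a fresh secret before creating the session is likely the simplest recovery.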
2. Create an Avaturn session
import httpx

async with httpx.AsyncClient() as http:
    r = await http.post(
        "https://api.avaturn.live/api/v1/sessions",
        headers={"Authorization": "Bearer <AVATURN_API_KEY>"},
        json={
            "conversation_engine": {
                "type": "openai-realtime",
                "client_secret": client_secret,
            }
        },
    )
    r.raise_for_status()
    session = r.json()  # { "session_id": "...", "token": "..." }
Never expose <AVATURN_API_KEY> or <OPENAI_API_KEY> to the browser. Only the per-session token belongs there.
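One way to enforce this on the backend is to allowlist the fields that leave your server instead of forwarding the whole response. The sketch below assumes the session response is a JSON object containing a `token` field, as in the comment in step 2; `browser_payload` and `BROWSER_SAFE_FIELDS` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict

# Fields considered safe to send to the browser. Only the per-session token
# is listed, following the guidance above (the exact response shape is an
# assumption based on the comment in step 2).
BROWSER_SAFE_FIELDS = ("token",)


def browser_payload(session: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields from the Avaturn session response."""
    missing = [name for name in BROWSER_SAFE_FIELDS if name not in session]
    if missing:
        raise KeyError(f"session response is missing: {', '.join(missing)}")
    return {name: session[name] for name in BROWSER_SAFE_FIELDS}
```

Returning `browser_payload(session)` from your endpoint means any field added to the response later stays on the server unless you opt it in.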
3. Connect from the frontend
npm install @avaturn-live/web-sdk
import { AvaturnHead } from "@avaturn-live/web-sdk";

const root = document.querySelector<HTMLDivElement>("#avaturn-video")!;
const avatar = new AvaturnHead(root, {
  sessionToken: session.token,
  audioSource: true,
});
await avatar.init();
The avatar joins the room, requests microphone access, and starts conversing. Speak into the mic — the avatar responds.
4. Clean up
dispose() tears down the local SDK state. The backend session expires shortly after the user disconnects (default 60s). To terminate immediately, call DELETE /api/v1/sessions/{id} from your backend.
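The immediate-termination call could be sketched as follows. It uses `urllib` from the standard library rather than `httpx` so that it has no dependencies. It assumes the same host and Bearer authorization as session creation, which the text above does not confirm; `build_delete_request` and `terminate_session` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote

AVATURN_API_BASE = "https://api.avaturn.live"  # host taken from step 2


def build_delete_request(session_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /api/v1/sessions/{id}."""
    url = f"{AVATURN_API_BASE}/api/v1/sessions/{quote(session_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumption: same Bearer scheme as the session-creation call.
        headers={"Authorization": f"Bearer {api_key}"},
    )


def terminate_session(session_id: str, api_key: str, timeout: float = 10.0) -> int:
    """Send the DELETE request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_delete_request(session_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Call `terminate_session(session["session_id"], "<AVATURN_API_KEY>")` from backend code only, since it needs the API key.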
Next steps
OpenAI Realtime engine: Tools, custom prompts, turn detection, transcripts.
Cartesia engine: Drive the avatar from a Cartesia Line agent.
SDK methods: Devices, mute, attach to a new DOM node.
SDK events: Speech start/end, lifecycle, transcripts.