When this still makes sense
Use text echo only if you need:
- Fully scripted, deterministic speech with no STT
- Backend-driven LLM output where you stream sentences to the avatar yourself
- Compatibility with existing integrations that already depend on this flow
How it works
Create a session
POST /api/v1/sessions with an explicit text-echo engine config, or omit conversation_engine to inherit the avatar’s default text-echo voice. The avatar stays silent until you push text.
Push text
From your backend: POST /api/v1/sessions/{id}/tasks. From the frontend: avatar.task(text) on the Web SDK instance.
Terminate
DELETE /api/v1/sessions/{id}, or wait for user_absent_timeout.
Engine config
To override the default voice, pass an explicit text-echo engine config when creating the session.
elevenlabs is currently accepted as the TTS engine. tts is required when type is text-echo. The voice_id is validated against ElevenLabs at session creation; an invalid id returns HTTP 400.
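As a rough illustration of the rules above, the sketch below builds a session-creation body with an explicit text-echo engine config. The names conversation_engine, type, tts, voice_id, and elevenlabs come from this page. The exact nesting and the `engine` key inside `tts` are assumptions, so check them against the API reference before relying on them.

```python
import json


def text_echo_session_payload(voice_id: str) -> dict:
    """Build a session-creation body with an explicit text-echo engine config.

    Assumption: the key name "engine" and the nesting of voice_id under
    "tts" are illustrative guesses, not confirmed by this page.
    """
    if not voice_id:
        # Catch obviously empty ids locally; the API still validates the id
        # against ElevenLabs and returns HTTP 400 for an invalid one.
        raise ValueError("voice_id must be a non-empty string")
    return {
        "conversation_engine": {
            "type": "text-echo",
            # tts is required when type is text-echo.
            "tts": {
                "engine": "elevenlabs",  # assumed key name
                "voice_id": voice_id,
            },
        }
    }


print(json.dumps(text_echo_session_payload("YOUR_ELEVENLABS_VOICE_ID"), indent=2))
```

Because an invalid voice_id is rejected with HTTP 400 at session creation, it may be worth handling that status explicitly where the session is created.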
Backend example
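A minimal sketch of the backend flow (create a session, push text, terminate), using only the Python standard library. The three endpoint paths are the ones named on this page. Everything else is an assumption made for illustration: the host in BASE_URL, the Bearer-token header, the {"text": ...} task body, and the "id" field in the create-session response. Other fields the create call may require, such as which avatar to use, are not shown.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host; replace with the real one
API_KEY = "YOUR_API_KEY"              # hypothetical auth scheme (Bearer token)


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare an HTTP request without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode a JSON response, if there is one."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def create_session(body: Optional[dict] = None) -> urllib.request.Request:
    # Leaving conversation_engine out of the body inherits the avatar's
    # default text-echo voice.
    return build_request("POST", "/api/v1/sessions", body or {})


def push_text(session_id: str, text: str) -> urllib.request.Request:
    # Assumed body shape: {"text": "..."}.
    return build_request("POST", f"/api/v1/sessions/{session_id}/tasks", {"text": text})


def terminate(session_id: str) -> urllib.request.Request:
    return build_request("DELETE", f"/api/v1/sessions/{session_id}")


def run_demo() -> None:
    """Full lifecycle; performs real network calls when invoked."""
    session = send(create_session())
    session_id = session["id"]  # assumed response field
    try:
        send(push_text(session_id, "Hello, and welcome."))
    finally:
        # Terminate explicitly rather than waiting for user_absent_timeout.
        send(terminate(session_id))
```

Keeping request construction separate from sending makes the paths and bodies easy to inspect or unit-test without touching the network.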
Frontend example
task(), cancelAllTasks(), changeVoice().
Streaming LLM output
To feed an LLM stream through task(), see Streaming LLM output.
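For the backend-driven case, where you stream sentences to the avatar yourself, one possible approach is to buffer LLM chunks and push each sentence as soon as it is complete. The sketch below is not the method from the linked guide. The helper names sentences_from_stream and speak_stream are hypothetical, and `push` stands in for whatever function sends text to the tasks endpoint. The sentence splitter is deliberately naive: it breaks on ., ! or ? followed by whitespace, so abbreviations such as "Dr. Smith" will be split incorrectly.

```python
import re
from typing import Callable, Iterable, Iterator

# A sentence boundary: whitespace that directly follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from an iterable of arbitrary text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is known to be a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing; keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


def speak_stream(chunks: Iterable[str], push: Callable[[str], None]) -> int:
    """Push each complete sentence via `push`; return how many were sent."""
    count = 0
    for sentence in sentences_from_stream(chunks):
        push(sentence)
        count += 1
    return count
```

Pushing whole sentences rather than raw tokens keeps each task a speakable unit, at the cost of waiting for the sentence to finish before speech starts.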