OpenAI Realtime Conversation Engine

Overview

The OpenAI Realtime conversation engine enables natural, low-latency voice conversations between users and avatars. Unlike text-based conversation engines, it handles both speech-to-text and text-to-speech natively through OpenAI’s Realtime API, providing a seamless conversational experience.

Key Features

Bidirectional Audio: Users can speak directly to the avatar and receive spoken responses
Natural Interruptions: Users can interrupt the avatar mid-sentence, just like in real conversations
Low Latency: Minimal delay between user speech and avatar response
Built-in Speech Processing: No separate TTS/STT configuration needed

When to Use

Use the OpenAI Realtime engine when you need:

Natural, conversational interactions with interruption support
Real-time voice-to-voice communication
Low-latency responses for interactive experiences

For scripted content with precise timing control, consider using the text-echo conversation engine instead.

Prerequisites

Before getting started, ensure you have:

An OpenAI API key with access to the Realtime API
An Avaturn API key for creating sessions
Familiarity with OpenAI’s Realtime API basics

How It Works

Create OpenAI Client Secret

Your backend creates an ephemeral client secret from OpenAI’s API

Create Avaturn Session

Your backend creates an Avaturn session configured with the OpenAI Realtime conversation engine, passing the ephemeral client secret

Connect with Web SDK

Your frontend uses the Avaturn Web SDK to initialize and connect to the session
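
The three steps above can be sketched as one backend function. This is a minimal illustration of the data flow, not Avaturn's implementation: `start_avatar_session` and the two injected callables are hypothetical names standing in for the OpenAI and Avaturn calls shown later in this guide.

```python
from typing import Callable, Dict


def start_avatar_session(
    create_client_secret: Callable[[], str],
    create_avaturn_session: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Run steps 1 and 2 on the backend and return the session details.

    Both callables are hypothetical stand-ins for the real API calls.
    """
    # Step 1: create an ephemeral client secret via OpenAI (server-side only).
    client_secret = create_client_secret()

    # Step 2: create the Avaturn session, handing it the ephemeral secret.
    session = create_avaturn_session(client_secret)

    # Step 3 happens in the browser. Only the token needs to go there;
    # the API keys and the client secret are not part of the returned payload.
    return {"session_id": session["session_id"], "token": session["token"]}


# Usage with fakes in place of the real network calls.
fake_result = start_avatar_session(
    create_client_secret=lambda: "ephemeral-secret",
    create_avaturn_session=lambda secret: {
        "session_id": "sess_123",
        "token": "tok_456",
        "echoed_secret": secret,
    },
)
```

Injecting the two calls keeps the flow easy to test without network access and makes it explicit that secrets never leave the backend.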

Architecture Overview

Creating OpenAI Client Secrets

OpenAI’s Realtime API uses ephemeral client secrets for secure, temporary access. These secrets are created server-side and passed to Avaturn when creating a session.

Never expose your OpenAI API key to the frontend. Always create ephemeral client secrets on your backend server.
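
One common way to keep the key out of frontend code and source control is to read it from an environment variable on the server. The sketch below assumes the key is stored in `OPENAI_API_KEY` (the variable the OpenAI Python SDK also reads by default); `load_api_key` is a hypothetical helper, not part of either SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(name: str = "OPENAI_API_KEY", env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is missing or blank, so a
    misconfigured server fails at startup rather than on the first request.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key


# Example with an explicit mapping instead of the real environment.
example_key = load_api_key(env={"OPENAI_API_KEY": "sk-example"})
```

On a real server you would call `load_api_key()` with no arguments and pass the result as `api_key` when constructing the OpenAI client.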

Use only the GA version of the OpenAI Realtime API. Beta or mixed API version usage is not supported and will prevent the session from starting properly. Refer to OpenAI's documentation to understand the difference between the beta and GA versions.

Code Examples

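import asyncio

from openai import AsyncOpenAI

# Initialize OpenAI client with your API key
client = AsyncOpenAI(api_key="your-openai-api-key")


async def create_client_secret() -> str:
    # Create an ephemeral client secret (valid for 2 hours)
    session = await client.realtime.client_secrets.create(
        expires_after={"seconds": 7200, "anchor": "created_at"},
        session={"type": "realtime", "model": "gpt-realtime"}
    )
    # session.value is the client_secret to pass to Avaturn
    return session.value


# `await` needs a running event loop; asyncio.run provides one in a plain script
client_secret = asyncio.run(create_client_secret())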

Customizing Session Configuration

When creating an ephemeral client secret, you can customize the OpenAI Realtime session by configuring prompts, tools, voice settings, and more. Avaturn passes this configuration through to OpenAI.

You have full control over the OpenAI session configuration. Configure instructions, tools, audio settings, and other parameters when creating the ephemeral client secret.

Available Configuration Options

Top-level options:

instructions - Custom system prompt to guide the AI’s behavior
model - Model ID (e.g., gpt-realtime, gpt-4o-realtime-preview-2024-12-17)
tools - Function calling tools for extended capabilities
tool_choice - Tool selection mode: "auto", "none", "required", or specific tool
prompt - Reference to a stored prompt by ID (see below)
truncation - Context truncation: "auto", "disabled", or retention ratio config
tracing - Tracing configuration for debugging

Audio input options (audio.input):

transcription - Transcription settings with model (whisper-1, gpt-4o-transcribe, etc.), language, and prompt
turn_detection - Voice activity detection (see below)

Audio output options (audio.output):

voice - Voice selection: alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar
speed - Speech speed: 0.25 to 1.5 (default 1.0)
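
As a sketch of how these options fit into the session payload, the hypothetical helper below builds the `audio.output` object and checks its values against the voice list and speed range given above. The validation is a local convenience and an assumption of this example, not part of either API.

```python
from typing import Any, Dict

# Voices and speed range as listed in this section.
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar"}
MIN_SPEED, MAX_SPEED = 0.25, 1.5


def build_audio_output(voice: str = "alloy", speed: float = 1.0) -> Dict[str, Any]:
    """Return the value for session["audio"]["output"], validating locally first."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {"voice": voice, "speed": speed}


session_config = {
    "type": "realtime",
    "model": "gpt-realtime",
    "audio": {"output": build_audio_output(voice="marin", speed=1.1)},
}
```

Checking values before the request surfaces a typo in a voice name locally instead of as an API error.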

Turn detection options (audio.input.turn_detection):

Server VAD (voice activity detection):

{
  "type": "server_vad",
  "threshold": 0.5,
  "prefix_padding_ms": 300,
  "silence_duration_ms": 500,
  "create_response": true,
  "interrupt_response": true
}

Semantic VAD (AI-based turn detection):

{
  "type": "semantic_vad",
  "eagerness": "medium",
  "create_response": true,
  "interrupt_response": true
}
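
Either object goes under `audio.input.turn_detection` in the session configuration. A minimal sketch of that nesting follows; `with_turn_detection` is a hypothetical helper, and the field values are the ones shown above.

```python
import copy
from typing import Any, Dict


def with_turn_detection(session: Dict[str, Any], turn_detection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `session` with turn detection set under audio.input."""
    updated = copy.deepcopy(session)  # leave the caller's dict untouched
    audio_input = updated.setdefault("audio", {}).setdefault("input", {})
    audio_input["turn_detection"] = turn_detection
    return updated


base_session = {"type": "realtime", "model": "gpt-realtime"}

semantic_vad = {
    "type": "semantic_vad",
    "eagerness": "medium",
    "create_response": True,
    "interrupt_response": True,
}

session_with_vad = with_turn_detection(base_session, semantic_vad)
```

The resulting dictionary can be passed as the `session` argument when creating the ephemeral client secret.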

For complete details, see OpenAI’s session configuration reference.

Example: Custom Instructions and Tools

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="your-openai-api-key")

# Create ephemeral client secret with custom configuration
# (`await` must run inside an async function)
session = await client.realtime.client_secrets.create(
    expires_after={"seconds": 7200, "anchor": "created_at"},
    session={
        "type": "realtime",
        "model": "gpt-realtime",
        "instructions": "You are a helpful AI assistant representing a company. Be professional and friendly.",
        "audio": {
            "output": {
                "voice": "alloy"
            }
        },
        "tools": [
            {
                "type": "function",
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"}
                    },
                    "required": ["location"]
                }
            }
        ]
    }
)

client_secret = session.value

Using Stored Prompts

OpenAI allows you to save and reuse prompts across sessions. Instead of passing instructions inline, you can reference a stored prompt by its ID (format: pmpt_xxx).

Stored prompts can include instructions, tools, variables, and example messages. This helps maintain consistency and simplifies prompt management.

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="your-openai-api-key")

# Create ephemeral client secret using a stored prompt
# (`await` must run inside an async function)
session = await client.realtime.client_secrets.create(
    expires_after={"seconds": 7200, "anchor": "created_at"},
    session={
        "type": "realtime",
        "model": "gpt-realtime",
        "prompt": {
            "id": "pmpt_abc123",  # Your stored prompt ID
            "version": "6",       # Optional: pin to specific version
            "variables": {        # Optional: pass variables to prompt
                "company_name": "Acme Corp",
                "tone": "professional"
            }
        }
    }
)

client_secret = session.value

Learn More:

OpenAI Realtime API Guide - Comprehensive guide to the Realtime API
Session Configuration Reference - Complete list of configuration options

Client Secret Expiration

Ephemeral client secrets expire after the specified duration (7200 seconds = 2 hours in the example above). Plan your session lifecycle accordingly:

Create a new ephemeral client secret for each user session
Handle expiration by creating new sessions
Don’t reuse expired client secrets
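
One way to follow these guidelines is to record when each secret was created and check it before use. The sketch below does its own bookkeeping with a hypothetical `SecretRecord` class; the 60-second safety margin is an arbitrary assumption, not a value from either API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecretRecord:
    """Hypothetical bookkeeping for one ephemeral client secret."""

    value: str
    created_at: float        # Unix timestamp when the secret was created
    ttl_seconds: int = 7200  # matches expires_after["seconds"] in the examples

    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl_seconds

    def is_usable(self, now: Optional[float] = None, margin_seconds: int = 60) -> bool:
        """True while the secret has more than `margin_seconds` left."""
        current = time.time() if now is None else now
        return current < self.expires_at - margin_seconds


record = SecretRecord(value="ephemeral-secret", created_at=1_000_000.0)
```

When `is_usable()` returns False, create a fresh client secret and a new Avaturn session instead of reusing the old one.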

Configuring the Conversation Engine

When creating an Avaturn session, configure the conversation engine with type "openai-realtime" and pass the OpenAI client secret.

Configuration Schema

The conversation engine configuration requires two fields:

type: Must be "openai-realtime"
client_secret: The ephemeral client secret from OpenAI (see above)
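
The two fields map directly onto a small dictionary. The helper below is a hypothetical convenience (`build_conversation_engine` is not part of any SDK) that builds the object and rejects an empty secret before any request is made.

```python
from typing import Dict


def build_conversation_engine(client_secret: str) -> Dict[str, str]:
    """Return the conversation_engine object for the session creation request."""
    if not client_secret:
        raise ValueError("client_secret must be a non-empty string")
    return {"type": "openai-realtime", "client_secret": client_secret}


engine_config = build_conversation_engine("ephemeral-secret")
```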

Session Creation Examples

import requests

# After creating the OpenAI client secret (see above)
response = requests.post(
    "https://api.avaturn.live/api/v1/sessions",
    headers={
        "Authorization": f"Bearer {avaturn_api_key}",
        "Content-Type": "application/json"
    },
    json={
        "conversation_engine": {
            "type": "openai-realtime",
            "client_secret": client_secret  # From OpenAI
        }
    }
)

response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
data = response.json()
session_id = data["session_id"]
session_token = data["token"]

Response

The session creation endpoint returns:

session_id: Unique identifier for the session
token: Session token to pass to your frontend for SDK initialization

Pass the token (stored as session_token in the example above) to your frontend to initialize the Avaturn Web SDK. See the Web SDK documentation for details on connecting and managing the session from the client side.

Session Management

Session Lifecycle

Created: Session is initialized but not yet active
Active: User has connected and conversation is ongoing
Terminated: Session has been explicitly ended or expired
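
If your backend tracks sessions, the three states can be modeled as a small state machine. This is a sketch under assumptions: the transition table is inferred from the descriptions above (including that a session can end before anyone connects), and `SessionState` and `can_transition` are hypothetical names, not part of the Avaturn API.

```python
from enum import Enum


class SessionState(Enum):
    CREATED = "created"        # initialized, no user connected yet
    ACTIVE = "active"          # user connected, conversation ongoing
    TERMINATED = "terminated"  # explicitly ended or expired


# Assumed transitions: a session may also end before any user connects.
ALLOWED_TRANSITIONS = {
    SessionState.CREATED: {SessionState.ACTIVE, SessionState.TERMINATED},
    SessionState.ACTIVE: {SessionState.TERMINATED},
    SessionState.TERMINATED: set(),
}


def can_transition(current: SessionState, target: SessionState) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS[current]
```

Treating Terminated as final mirrors the guidance elsewhere in this guide: an ended or expired session is replaced by a new one, not revived.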

Terminating Sessions

To end a session programmatically:

import requests

requests.delete(
    f"https://api.avaturn.live/api/v1/sessions/{session_id}",
    headers={"Authorization": f"Bearer {avaturn_api_key}"}
)

Sessions automatically terminate when the ephemeral client secret expires. Always handle expiration gracefully in your application.

Additional Resources

OpenAI Realtime API Documentation
Avaturn Web SDK Documentation
Avaturn API Reference (for complete session creation parameters)
