docs: document OpenAI realtime voices

This commit is contained in:
Peter Steinberger
2026-05-08 01:07:46 +01:00
parent 63ec912786
commit b75e5c50bf
4 changed files with 22 additions and 4 deletions


@@ -1184,7 +1184,10 @@ Auto-join example:
 reconnectGraceMs: 15000,
 tts: {
   provider: "openai",
-  openai: { voice: "onyx" },
+  openai: {
+    model: "gpt-4o-mini-tts",
+    voice: "cedar",
+  },
 },
 },
 },
@@ -1195,8 +1198,9 @@ Auto-join example:
 Notes:
 - `voice.tts` overrides `messages.tts` for voice playback only.
-- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model.
+- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model. Do not set this to `gpt-realtime-2`; Discord voice channels use STT plus TTS playback, not the OpenAI Realtime session transport.
 - STT uses `tools.media.audio`; `voice.model` does not affect transcription.
+- For an OpenAI voice on Discord playback, set `voice.tts.provider: "openai"` and choose a Text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
 - Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
 - Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
 - Discord voice is opt-in for text-only configs; set `channels.discord.voice.enabled=true` (or keep an existing `channels.discord.voice` block) to enable `/vc` commands, the voice runtime, and the `GuildVoiceStates` gateway intent.
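
Taken together, these notes correspond to a config shaped roughly like the following. This is a sketch only: the nesting under `channels.discord.voice` is assumed from the key names above, and the values are illustrative.

```json5
channels: {
  discord: {
    voice: {
      enabled: true, // opt-in for text-only configs
      tts: {
        provider: "openai",
        openai: {
          model: "gpt-4o-mini-tts",
          voice: "cedar", // a TTS voice, not a Realtime voice id
        },
      },
      // voice.model left unset: inherits the routed agent model
    },
  },
},
```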


@@ -1389,7 +1389,7 @@ Defaults for Talk mode (macOS/iOS/Android).
 providers: {
   openai: {
     model: "gpt-realtime-2",
-    voice: "alloy",
+    voice: "cedar",
   },
 },
 mode: "realtime",


@@ -82,7 +82,7 @@ Supported keys:
 openai: {
   apiKey: "openai_api_key",
   model: "gpt-realtime-2",
-  voice: "alloy",
+  voice: "cedar",
 },
 },
 mode: "realtime",
@@ -104,6 +104,7 @@ Defaults:
 - `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
 - `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for provider WebSocket, or a bridge-only provider through Gateway relay.
 - `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
+- `realtime.providers.openai.voice`: built-in OpenAI Realtime voice id. Current `gpt-realtime-2` voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`; `marin` and `cedar` are recommended for best quality.
 - `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is owner-only compatibility behavior; `none` is for transcription or external orchestration.
 - `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
 - Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.
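
A minimal `realtime` block combining the defaults above might look like this (a sketch; only keys documented above are used, and the values are illustrative):

```json5
realtime: {
  provider: "openai", // WebRTC transport
  brain: "agent-consult", // route realtime tool calls through Gateway policy
  providers: {
    openai: {
      apiKey: "openai_api_key", // falls back to OPENAI_API_KEY
      model: "gpt-realtime-2",
      voice: "cedar", // or "marin": one of the documented gpt-realtime-2 voices
    },
  },
  mode: "realtime",
},
```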


@@ -648,10 +648,23 @@ Legacy `plugins.entries.openai.config.personality` is still read as a compatibil
| Silence duration | `...openai.silenceDurationMs` | `500` |
| API key | `...openai.apiKey` | Falls back to `OPENAI_API_KEY` |
Available built-in Realtime voices for `gpt-realtime-2`: `alloy`, `ash`,
`ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`.
OpenAI recommends `marin` and `cedar` for the best Realtime quality. This
is a separate set from the Text-to-speech voices above; do not assume a TTS
voice such as `fable`, `nova`, or `onyx` is valid for Realtime sessions.
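
The split between the two voice sets can be made concrete with a small guard. This is illustrative TypeScript only; `REALTIME_VOICES` and `isRealtimeVoice` are hypothetical helpers, not part of OpenClaw or the OpenAI SDK.

```typescript
// Built-in gpt-realtime-2 Realtime voice ids, as listed above.
const REALTIME_VOICES = new Set([
  "alloy", "ash", "ballad", "coral", "echo",
  "sage", "shimmer", "verse", "marin", "cedar",
]);

// Hypothetical guard: reject TTS-only voices before opening a Realtime session.
function isRealtimeVoice(voice: string): boolean {
  return REALTIME_VOICES.has(voice);
}

console.log(isRealtimeVoice("cedar")); // true
console.log(isRealtimeVoice("onyx")); // false ("onyx" is TTS-only)
```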
<Note>
Backend OpenAI realtime bridges use the GA Realtime WebSocket session shape, which does not accept `session.temperature`. Azure OpenAI deployments remain available via `azureEndpoint` and `azureDeployment` and keep the deployment-compatible session shape. These bridges support bidirectional tool calling and G.711 u-law audio.
</Note>
<Note>
Realtime voice is selected when the session is created. OpenAI allows most
session fields to change later, but the voice cannot be changed after the
model has emitted audio in that session. OpenClaw currently exposes the
built-in Realtime voice ids as strings.
</Note>
<Note>
Control UI Talk uses OpenAI browser realtime sessions with a Gateway-minted
ephemeral client secret and a direct browser WebRTC SDP exchange against the