docs: document OpenAI realtime voices

This commit is contained in:
Peter Steinberger
2026-05-08 01:07:46 +01:00
parent 63ec912786
commit b75e5c50bf
4 changed files with 22 additions and 4 deletions


@@ -1184,7 +1184,10 @@ Auto-join example:
 reconnectGraceMs: 15000,
 tts: {
   provider: "openai",
-  openai: { voice: "onyx" },
+  openai: {
+    model: "gpt-4o-mini-tts",
+    voice: "cedar",
+  },
 },
 },
 },
@@ -1195,8 +1198,9 @@ Auto-join example:
 Notes:
 - `voice.tts` overrides `messages.tts` for voice playback only.
-- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model.
+- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model. Do not set this to `gpt-realtime-2`; Discord voice channels use STT plus TTS playback, not the OpenAI Realtime session transport.
 - STT uses `tools.media.audio`; `voice.model` does not affect transcription.
+- For an OpenAI voice on Discord playback, set `voice.tts.provider: "openai"` and choose a Text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
 - Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
 - Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
 - Discord voice is opt-in for text-only configs; set `channels.discord.voice.enabled=true` (or keep an existing `channels.discord.voice` block) to enable `/vc` commands, the voice runtime, and the `GuildVoiceStates` gateway intent.
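
Taken together, these notes correspond to a config shaped roughly like the following. This is a sketch only: the nesting under `channels.discord.voice` is assumed from the key names above, and the values are illustrative.

```json5
channels: {
  discord: {
    voice: {
      enabled: true, // opt-in for text-only configs
      tts: {
        provider: "openai",
        openai: {
          model: "gpt-4o-mini-tts",
          voice: "cedar", // a TTS voice, not a Realtime voice id
        },
      },
      // voice.model left unset: inherits the routed agent model
    },
  },
},
```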


@@ -1389,7 +1389,7 @@ Defaults for Talk mode (macOS/iOS/Android).
 providers: {
   openai: {
     model: "gpt-realtime-2",
-    voice: "alloy",
+    voice: "cedar",
   },
 },
 mode: "realtime",


@@ -82,7 +82,7 @@ Supported keys:
 openai: {
   apiKey: "openai_api_key",
   model: "gpt-realtime-2",
-  voice: "alloy",
+  voice: "cedar",
 },
 },
 mode: "realtime",
@@ -104,6 +104,7 @@ Defaults:
 - `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
 - `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for provider WebSocket, or a bridge-only provider through Gateway relay.
 - `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
+- `realtime.providers.openai.voice`: built-in OpenAI Realtime voice id. Current `gpt-realtime-2` voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`; `marin` and `cedar` are recommended for best quality.
 - `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is owner-only compatibility behavior; `none` is for transcription or external orchestration.
 - `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
 - Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.
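
A minimal `realtime` block combining the defaults above might look like this (a sketch; only keys documented above are used, and the values are illustrative):

```json5
realtime: {
  provider: "openai", // WebRTC transport
  brain: "agent-consult", // route realtime tool calls through Gateway policy
  providers: {
    openai: {
      apiKey: "openai_api_key", // falls back to OPENAI_API_KEY
      model: "gpt-realtime-2",
      voice: "cedar", // or "marin": one of the documented gpt-realtime-2 voices
    },
  },
  mode: "realtime",
},
```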


@@ -648,10 +648,23 @@ Legacy `plugins.entries.openai.config.personality` is still read as a compatibil
| Silence duration | `...openai.silenceDurationMs` | `500` |
| API key | `...openai.apiKey` | Falls back to `OPENAI_API_KEY` |
Available built-in Realtime voices for `gpt-realtime-2`: `alloy`, `ash`,
`ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`.
OpenAI recommends `marin` and `cedar` for the best Realtime quality. This
is a separate set from the Text-to-speech voices above; do not assume a TTS
voice such as `fable`, `nova`, or `onyx` is valid for Realtime sessions.
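
The split between the two voice sets can be made concrete with a small guard. This is illustrative TypeScript only; `REALTIME_VOICES` and `isRealtimeVoice` are hypothetical helpers, not part of OpenClaw or the OpenAI SDK.

```typescript
// Built-in gpt-realtime-2 Realtime voice ids, as listed above.
const REALTIME_VOICES = new Set([
  "alloy", "ash", "ballad", "coral", "echo",
  "sage", "shimmer", "verse", "marin", "cedar",
]);

// Hypothetical guard: reject TTS-only voices before opening a Realtime session.
function isRealtimeVoice(voice: string): boolean {
  return REALTIME_VOICES.has(voice);
}

console.log(isRealtimeVoice("cedar")); // true
console.log(isRealtimeVoice("onyx")); // false ("onyx" is TTS-only)
```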
<Note>
Backend OpenAI realtime bridges use the GA Realtime WebSocket session shape, which does not accept `session.temperature`. Azure OpenAI deployments remain available via `azureEndpoint` and `azureDeployment` and keep the deployment-compatible session shape. These bridges support bidirectional tool calling and G.711 u-law audio.
</Note>
<Note>
Realtime voice is selected when the session is created. OpenAI allows most
session fields to change later, but the voice cannot be changed after the
model has emitted audio in that session. OpenClaw currently exposes the
built-in Realtime voice ids as strings.
</Note>
<Note>
Control UI Talk uses OpenAI browser realtime sessions with a Gateway-minted
ephemeral client secret and a direct browser WebRTC SDP exchange against the