docs: document OpenAI realtime voices
@@ -1184,7 +1184,10 @@ Auto-join example:
    reconnectGraceMs: 15000,
    tts: {
      provider: "openai",
      openai: { voice: "onyx" },
      openai: {
        model: "gpt-4o-mini-tts",
        voice: "cedar",
      },
    },
  },
},
@@ -1195,8 +1198,9 @@ Auto-join example:
Notes:

- `voice.tts` overrides `messages.tts` for voice playback only.
- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model.
- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model. Do not set this to `gpt-realtime-2`; Discord voice channels use STT plus TTS playback, not the OpenAI Realtime session transport.
- STT uses `tools.media.audio`; `voice.model` does not affect transcription.
- For an OpenAI voice on Discord playback, set `voice.tts.provider: "openai"` and choose a Text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
- Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
- Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
- Discord voice is opt-in for text-only configs; set `channels.discord.voice.enabled=true` (or keep an existing `channels.discord.voice` block) to enable `/vc` commands, the voice runtime, and the `GuildVoiceStates` gateway intent.

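Putting those notes together, here is a minimal sketch of a Discord voice block with an OpenAI TTS voice. It assumes the `channels.discord.voice.*` nesting named in the notes above; the surrounding keys are illustrative, not exhaustive.

```json5
// Sketch only: nesting follows the channels.discord.voice.* paths in the notes above.
channels: {
  discord: {
    voice: {
      enabled: true,              // opt-in for previously text-only configs
      reconnectGraceMs: 15000,
      // voice.model left unset so voice responses inherit the routed agent model
      tts: {
        provider: "openai",
        openai: {
          model: "gpt-4o-mini-tts",
          voice: "cedar",         // OpenAI text-to-speech voice used for playback
        },
      },
    },
  },
},
```
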
@@ -1389,7 +1389,7 @@ Defaults for Talk mode (macOS/iOS/Android).
  providers: {
    openai: {
      model: "gpt-realtime-2",
      voice: "alloy",
      voice: "cedar",
    },
  },
  mode: "realtime",

@@ -82,7 +82,7 @@ Supported keys:
    openai: {
      apiKey: "openai_api_key",
      model: "gpt-realtime-2",
      voice: "alloy",
      voice: "cedar",
    },
  },
  mode: "realtime",
@@ -104,6 +104,7 @@ Defaults:
- `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
- `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for the provider WebSocket, or a bridge-only provider through the Gateway relay.
- `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
- `realtime.providers.openai.voice`: built-in OpenAI Realtime voice id. Current `gpt-realtime-2` voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`; `marin` and `cedar` are recommended for best quality.
- `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is owner-only compatibility behavior; `none` is for transcription or external orchestration.
- `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
- Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.

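As a rough illustration of how these defaults fit together, the sketch below shows a Talk realtime block. Key placement is inferred from the `realtime.*` paths above and from the earlier excerpts, so treat it as an assumption rather than the exact config surface.

```json5
// Sketch only: placement inferred from the realtime.* paths listed above.
realtime: {
  provider: "openai",            // browser WebRTC transport
  brain: "agent-consult",        // route realtime tool calls through Gateway policy
  providers: {
    openai: {
      apiKey: "openai_api_key",  // example placeholder
      model: "gpt-realtime-2",
      voice: "cedar",            // or "marin"; must be a Realtime voice id
    },
  },
},
```
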
@@ -648,10 +648,23 @@ Legacy `plugins.entries.openai.config.personality` is still read as a compatibil
| Silence duration | `...openai.silenceDurationMs` | `500` |
| API key | `...openai.apiKey` | Falls back to `OPENAI_API_KEY` |

Available built-in Realtime voices for `gpt-realtime-2`: `alloy`, `ash`,
`ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`.
OpenAI recommends `marin` and `cedar` for the best Realtime quality. This
is a separate set from the Text-to-speech voices above; do not assume a TTS
voice such as `fable`, `nova`, or `onyx` is valid for Realtime sessions.

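For instance, a hedged sketch of the `openai` block those table rows point at (the elided `...` path prefix is intentionally left out) could pick one of the listed Realtime voices; the `model` key here mirrors the earlier excerpts and is an assumption for this plugin context.

```json5
// Sketch only: the table's "..." path prefix is elided in the docs and left out here.
openai: {
  apiKey: "openai_api_key",     // falls back to OPENAI_API_KEY
  model: "gpt-realtime-2",      // assumption: mirrors the realtime excerpts above
  voice: "marin",               // must be a Realtime voice; TTS ids like "onyx" are invalid here
  silenceDurationMs: 500,
},
```
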
<Note>
Backend OpenAI realtime bridges use the GA Realtime WebSocket session shape, which does not accept `session.temperature`. Azure OpenAI deployments remain available via `azureEndpoint` and `azureDeployment` and keep the deployment-compatible session shape. Supports bidirectional tool calling and G.711 u-law audio.
</Note>

<Note>
Realtime voice is selected when the session is created. OpenAI allows most
session fields to change later, but the voice cannot be changed after the
model has emitted audio in that session. OpenClaw currently exposes the
built-in Realtime voice ids as strings.
</Note>

<Note>
Control UI Talk uses OpenAI browser realtime sessions with a Gateway-minted
ephemeral client secret and a direct browser WebRTC SDP exchange against the