mirror of
https://github.com/moltbot/moltbot.git
synced 2026-05-13 15:47:28 +00:00
fix(discord): make realtime barge-in guard tunable
@@ -46,6 +46,7 @@ Docs: https://docs.openclaw.ai
- Discord/voice: make duplicate same-guild auto-join entries resolve to the last configured channel so moving an agent between voice channels does not keep joining the stale channel.
- Discord/voice: add realtime `/vc` modes so Discord voice channels can run as STT/TTS, a realtime talk buffer with the OpenClaw agent brain, or a bidi realtime session with `openclaw_agent_consult`.
- Discord/voice: add bounded realtime gateway logs for voice channel joins, realtime model/voice selection, transcripts, consult routing/answers, and playback start; allow OpenAI realtime Discord sessions to disable input-triggered response interruption in echo-heavy rooms while keeping explicit Discord barge-in available for new and already-active speakers; and allow voice turns to target an existing Discord channel agent session.
- Discord/voice: add `voice.realtime.minBargeInAudioEndMs` and let the realtime provider own playback clearing, so speaker echo no longer cuts OpenAI realtime model audio at `audioEndMs=0` while low-echo rooms can opt back into immediate barge-in with `0`.
- Discord/voice: include a bounded one-line STT transcript preview in verbose voice logs so live voice debugging shows what speakers said before the agent reply.
- Codex app-server: pin the managed Codex harness and Codex CLI smoke package to `@openai/codex@0.129.0`, defer OpenClaw integration dynamic tools behind Codex tool search by default, and accept current Codex service-tier values so legacy `fast` settings survive the stable harness upgrade as `priority`.
- Codex app-server: annotate message-tool-only direct chat turns in the dynamic `message` tool spec so visible replies are sent through `message(action="send")` instead of staying private. (#79704)
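The last-wins auto-join resolution in the first bullet can be sketched as follows (a minimal illustration; the type and function names here are hypothetical, not OpenClaw's actual code):

```typescript
// Hypothetical sketch: duplicate same-guild auto-join entries resolve to the
// last configured channel (later entries win).
type AutoJoinEntry = { guildId: string; channelId: string };

function resolveAutoJoins(entries: AutoJoinEntry[]): Map<string, string> {
  const byGuild = new Map<string, string>();
  for (const entry of entries) {
    // Map.set overwrites, so the last configured channel per guild wins.
    byGuild.set(entry.guildId, entry.channelId);
  }
  return byGuild;
}
```

Moving an agent between voice channels then updates the single entry for that guild instead of leaving a stale join target behind.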

@@ -1,4 +1,4 @@
216abd8ed137e6e92d615d88afb0d9fe0b1428cddee292439b39138ee03f9a10 config-baseline.json
632c00a35e0ed2413604ff28f5b4df0718131492208863c5d39576d76a9b7c88 config-baseline.json
7ac9eadabe0119deba4418dbaadc478092fa32617fab3f9618e0a14210720e4b config-baseline.core.json
c3e8742922d4e5ece408dd3590382285927ef86252d1a2f6f922566ea21531bb config-baseline.channel.json
42264b147fb29e0ba7017b4ec018a0793bb9cd23e58bf5fb796d6b33bf9ca829 config-baseline.channel.json
df93bfde8e3de8d6f80dbf1b0ae43ad250f216f2fc0244c5d9a19afca50806f6 config-baseline.plugin.json
@@ -1206,6 +1206,7 @@ Notes:
- In `stt-tts` mode, STT uses `tools.media.audio`; `voice.model` does not affect transcription.
- In realtime modes, `voice.realtime.provider`, `voice.realtime.model`, and `voice.realtime.voice` configure the realtime audio session. For OpenAI Realtime 2 plus the Codex brain, use `voice.realtime.model: "gpt-realtime-2"` and `voice.model: "openai-codex/gpt-5.5"`.
- `voice.realtime.bargeIn` controls whether Discord speaker-start events interrupt active realtime playback. If unset, it follows the realtime provider's input-audio interruption setting.
- `voice.realtime.minBargeInAudioEndMs` controls the minimum assistant playback duration before an OpenAI realtime barge-in truncates audio. Default: `250`. Set `0` for immediate interruption in low-echo rooms, or raise it for echo-heavy speaker setups.
- For an OpenAI voice on Discord playback, set `voice.tts.provider: "openai"` and choose a text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
- Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
- Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
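Pulling the notes above together, a minimal realtime session config might look like this (illustrative values only; the full echo-heavy example appears later in this document):

```
voice: {
  mode: "bidi",
  model: "openai-codex/gpt-5.5",
  realtime: {
    provider: "openai",
    model: "gpt-realtime-2",
    voice: "cedar",
    bargeIn: true,
    minBargeInAudioEndMs: 250,
  },
},
```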
@@ -1217,7 +1218,7 @@ Notes:
- `voice.connectTimeoutMs` controls the initial `@discordjs/voice` Ready wait for `/vc join` and auto-join attempts. Default: `30000`.
- `voice.reconnectGraceMs` controls how long OpenClaw waits for a disconnected voice session to begin reconnecting before destroying it. Default: `15000`.
- In `stt-tts` mode, voice playback does not stop just because another user starts speaking. To avoid feedback loops, OpenClaw ignores new voice capture while TTS is playing; speak after playback finishes for the next turn. Realtime modes forward speaker starts as barge-in signals to the realtime provider.
- In realtime modes, echo from speakers into an open mic can look like barge-in and interrupt playback. For echo-heavy Discord rooms, set `voice.realtime.providers.openai.interruptResponseOnInputAudio: false` to keep OpenAI from auto-interrupting on input audio. Add `voice.realtime.bargeIn: true` if you still want Discord speaker-start events to interrupt active playback.
- In realtime modes, echo from speakers into an open mic can look like barge-in and interrupt playback. For echo-heavy Discord rooms, set `voice.realtime.providers.openai.interruptResponseOnInputAudio: false` to keep OpenAI from auto-interrupting on input audio. Add `voice.realtime.bargeIn: true` if you still want Discord speaker-start events to interrupt active playback. The OpenAI realtime bridge ignores playback truncations shorter than `voice.realtime.minBargeInAudioEndMs` as likely echo/noise and logs them as skipped instead of clearing Discord playback.
- `voice.captureSilenceGraceMs` controls how long OpenClaw waits after Discord reports a speaker has stopped before finalizing that audio segment for STT. Default: `2500`; raise this if Discord splits normal pauses into choppy partial transcripts.
- When ElevenLabs is the selected TTS provider, Discord voice playback uses streaming TTS and starts from the provider response stream. Providers without streaming support fall back to the synthesized temp-file path.
- OpenClaw also watches receive decrypt failures and auto-recovers by leaving/rejoining the voice channel after repeated failures in a short window.
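The echo-guard decision described in these notes can be sketched as follows (an illustrative reduction, not the actual bridge implementation; the function name is hypothetical):

```typescript
// Sketch of the min barge-in window: truncations below the threshold are
// treated as likely echo/noise and skipped instead of clearing playback.
const DEFAULT_MIN_BARGE_IN_AUDIO_END_MS = 250;

function shouldTruncatePlayback(
  audioEndMs: number,
  minBargeInAudioEndMs?: number,
): boolean {
  const threshold = minBargeInAudioEndMs ?? DEFAULT_MIN_BARGE_IN_AUDIO_END_MS;
  return audioEndMs >= threshold;
}
```

With the default window, a barge-in at `audioEndMs=0` (typically the model hearing its own playback start) is ignored, while setting the window to `0` restores immediate interruption for low-echo rooms.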
@@ -1345,6 +1346,7 @@ Echo-heavy OpenAI Realtime example:
model: "gpt-realtime-2",
voice: "cedar",
bargeIn: true,
minBargeInAudioEndMs: 500,
consultPolicy: "always",
providers: {
openai: {
@@ -1358,16 +1360,17 @@ Echo-heavy OpenAI Realtime example:
}
```

Use this when the model hears its own Discord playback through an open mic, but you still want to interrupt it by speaking. OpenClaw keeps OpenAI from auto-interrupting on raw input audio, while `bargeIn: true` lets Discord speaker-start events and already-active speaker audio cancel active realtime responses before the next captured turn reaches OpenAI.
Use this when the model hears its own Discord playback through an open mic, but you still want to interrupt it by speaking. OpenClaw keeps OpenAI from auto-interrupting on raw input audio, while `bargeIn: true` lets Discord speaker-start events and already-active speaker audio cancel active realtime responses before the next captured turn reaches OpenAI. Very early barge-in signals with `audioEndMs` below `minBargeInAudioEndMs` are treated as likely echo/noise and ignored so the model does not cut off at the first playback frame.

Expected voice logs:

- On join: `discord voice: joining ... voiceSession=... supervisorSession=... agentSessionMode=... voiceModel=... realtimeModel=...`
- On realtime start: `discord voice: realtime bridge starting ... interruptResponse=false bargeIn=true`
- On realtime start: `discord voice: realtime bridge starting ... interruptResponse=false bargeIn=true minBargeInAudioEndMs=...`
- On realtime consult: `discord voice: realtime consult requested ... voiceSession=... supervisorSession=... question=...`
- On agent answer: `discord voice: agent turn answer ...`
- On same-speaker interruption: `discord voice: realtime barge-in from active speaker audio ...`
- On realtime interruption: `discord voice: realtime model interrupt requested client:response.cancel reason=barge-in`, followed by either `discord voice: realtime model audio truncated client:conversation.item.truncate reason=barge-in audioEndMs=...` or `discord voice: realtime model interrupt confirmed server:response.done status=cancelled ...`
- On ignored echo/noise: `discord voice: realtime model interrupt ignored client:conversation.item.truncate.skipped reason=barge-in audioEndMs=0 minAudioEndMs=250`
- On disabled barge-in: `discord voice: realtime capture ignored during playback (barge-in disabled) ...`

Credentials are resolved per component: LLM route auth for `voice.model`, STT auth for `tools.media.audio`, TTS auth for `messages.tts`/`voice.tts`, and realtime provider auth for `voice.realtime.providers` or the provider's normal auth config.
@@ -175,6 +175,7 @@ describe("discord config schema", () => {
toolPolicy: "safe-read-only",
consultPolicy: "always",
bargeIn: true,
minBargeInAudioEndMs: 500,
providers: {
openai: {
apiKey: "sk-test",
@@ -193,6 +194,7 @@ describe("discord config schema", () => {
expect(cfg.voice?.realtime?.toolPolicy).toBe("safe-read-only");
expect(cfg.voice?.realtime?.consultPolicy).toBe("always");
expect(cfg.voice?.realtime?.bargeIn).toBe(true);
expect(cfg.voice?.realtime?.minBargeInAudioEndMs).toBe(500);
});

it("rejects invalid Discord realtime voice modes", () => {
@@ -201,6 +203,8 @@ describe("discord config schema", () => {
{ mode: "bidi", realtime: { toolPolicy: "dangerous" } },
{ mode: "talk-buffer", realtime: { consultPolicy: "substantive" } },
{ mode: "talk-buffer", realtime: { debounceMs: 10_001 } },
{ mode: "talk-buffer", realtime: { minBargeInAudioEndMs: -1 } },
{ mode: "talk-buffer", realtime: { minBargeInAudioEndMs: 10_001 } },
{ agentSession: { mode: "target" } },
]) {
expectInvalidDiscordConfig({ voice });
@@ -217,6 +217,10 @@ export const discordChannelConfigUiHints = {
label: "Discord Realtime Barge-In",
help: "Allow Discord speaker-start events to interrupt active realtime playback. Set true to keep manual interruption when provider input-audio interruption is disabled for echo control.",
},
"voice.realtime.minBargeInAudioEndMs": {
label: "Discord Realtime Minimum Barge-In Audio (ms)",
help: "Minimum assistant playback duration before a Discord barge-in truncates realtime audio. Default: 250; set 0 for immediate interruption in low-echo rooms.",
},
"voice.realtime.providers": {
label: "Discord Realtime Provider Settings",
help: "Provider-specific realtime voice settings keyed by provider id.",
@@ -587,7 +587,7 @@ describe("DiscordVoiceManager", () => {
).handleSpeakingStart(entry, "u1");

expect(realtimeSessionMock.handleBargeIn).toHaveBeenCalled();
expect(player.stop).toHaveBeenCalledWith(true);
expect(player.stop).not.toHaveBeenCalled();
expect(connection.receiver.subscribe).toHaveBeenCalledWith(
"u1",
expect.objectContaining({ end: expect.any(Object) }),
@@ -642,7 +642,7 @@ describe("DiscordVoiceManager", () => {
turn?.sendInputAudio(Buffer.alloc(3840));

expect(realtimeSessionMock.handleBargeIn).toHaveBeenCalled();
expect(player.stop).toHaveBeenCalledWith(true);
expect(player.stop).not.toHaveBeenCalled();
expect(realtimeSessionMock.sendAudio).toHaveBeenCalled();
});
@@ -684,7 +684,7 @@ describe("DiscordVoiceManager", () => {
turn?.sendInputAudio(Buffer.alloc(3840));

expect(realtimeSessionMock.handleBargeIn).toHaveBeenCalled();
expect(player.stop).toHaveBeenCalledWith(true);
expect(player.stop).not.toHaveBeenCalled();
expect(realtimeSessionMock.sendAudio).toHaveBeenCalled();
});
@@ -964,6 +964,7 @@ describe("DiscordVoiceManager", () => {
realtime: {
model: "gpt-realtime-2",
voice: "cedar",
minBargeInAudioEndMs: 500,
providers: {
openai: { model: "provider-default", voice: "marin" },
},
@@ -981,7 +982,11 @@ describe("DiscordVoiceManager", () => {
providerConfigs: expect.objectContaining({
openai: { model: "provider-default", voice: "marin" },
}),
providerConfigOverrides: { model: "gpt-realtime-2", voice: "cedar" },
providerConfigOverrides: {
model: "gpt-realtime-2",
voice: "cedar",
minBargeInAudioEndMs: 500,
},
}),
);
});
@@ -41,6 +41,7 @@ const DISCORD_REALTIME_TALKBACK_DEBOUNCE_MS = 350;
const DISCORD_REALTIME_FALLBACK_TEXT = "I hit an error while checking that. Please try again.";
const DISCORD_REALTIME_PENDING_SPEAKER_CONTEXT_LIMIT = 32;
const DISCORD_REALTIME_LOG_PREVIEW_CHARS = 500;
const DISCORD_REALTIME_DEFAULT_MIN_BARGE_IN_AUDIO_END_MS = 250;

export type DiscordVoiceMode = "stt-tts" | "talk-buffer" | "bidi";
@@ -69,6 +70,9 @@ function formatRealtimeInterruptionLog(event: RealtimeVoiceBridgeEvent): string
if (event.type === "response.cancel") {
return `discord voice: realtime model interrupt requested ${event.direction}:${event.type}${detail}`;
}
if (event.type === "conversation.item.truncate.skipped") {
return `discord voice: realtime model interrupt ignored ${event.direction}:${event.type}${detail}`;
}
if (event.type === "conversation.item.truncate") {
return `discord voice: realtime model audio truncated ${event.direction}:${event.type}${detail}`;
}
@@ -260,7 +264,7 @@ export class DiscordRealtimeVoiceSession implements VoiceRealtimeSession {
realtimeConfig: this.realtimeConfig,
providerId: resolved.provider.id,
},
)}`,
)} minBargeInAudioEndMs=${resolveDiscordRealtimeMinBargeInAudioEndMs(this.realtimeConfig)}`,
);
const voiceSdk = loadDiscordVoiceSdk();
this.params.entry.player.on(voiceSdk.AudioPlayerStatus.Idle, this.playerIdleHandler);
@@ -323,7 +327,6 @@ export class DiscordRealtimeVoiceSession implements VoiceRealtimeSession {
return;
}
this.bridge?.handleBargeIn({ audioPlaybackActive: true });
this.clearOutputAudio();
}

isBargeInEnabled(): boolean {
@@ -516,10 +519,21 @@ function buildProviderConfigOverrides(
const overrides = {
...(realtimeConfig?.model ? { model: realtimeConfig.model } : {}),
...(realtimeConfig?.voice ? { voice: realtimeConfig.voice } : {}),
...(typeof realtimeConfig?.minBargeInAudioEndMs === "number"
? { minBargeInAudioEndMs: realtimeConfig.minBargeInAudioEndMs }
: {}),
};
return Object.keys(overrides).length > 0 ? overrides : undefined;
}

function resolveDiscordRealtimeMinBargeInAudioEndMs(
realtimeConfig: DiscordRealtimeVoiceConfig,
): number {
return typeof realtimeConfig?.minBargeInAudioEndMs === "number"
? realtimeConfig.minBargeInAudioEndMs
: DISCORD_REALTIME_DEFAULT_MIN_BARGE_IN_AUDIO_END_MS;
}

function buildDiscordRealtimeInstructions(params: {
mode: Exclude<DiscordVoiceMode, "stt-tts">;
instructions?: string;
@@ -703,7 +703,7 @@ describe("buildOpenAIRealtimeVoiceProvider", () => {
}),
),
);
bridge.setMediaTimestamp(1240);
bridge.setMediaTimestamp(1300);

bridge.handleBargeIn?.({ audioPlaybackActive: true });
@@ -714,7 +714,7 @@ describe("buildOpenAIRealtimeVoiceProvider", () => {
type: "conversation.item.truncate",
item_id: "item_1",
content_index: 0,
audio_end_ms: 240,
audio_end_ms: 300,
});
});
@@ -904,6 +904,7 @@ describe("buildOpenAIRealtimeVoiceProvider", () => {
"message",
Buffer.from(JSON.stringify({ type: "response.created", response: { id: "resp_1" } })),
);
bridge.setMediaTimestamp(1000);
socket.emit(
"message",
Buffer.from(
@@ -914,6 +915,7 @@ describe("buildOpenAIRealtimeVoiceProvider", () => {
}),
),
);
bridge.setMediaTimestamp(1300);

bridge.handleBargeIn?.({ audioPlaybackActive: true });
bridge.handleBargeIn?.({ audioPlaybackActive: true });
@@ -927,7 +929,106 @@ describe("buildOpenAIRealtimeVoiceProvider", () => {
expect(onEvent).toHaveBeenCalledWith({
direction: "client",
type: "conversation.item.truncate",
detail: "reason=barge-in audioEndMs=0",
detail: "reason=barge-in audioEndMs=300",
});
});

it("ignores zero-length playback barge-in without clearing audio", async () => {
const provider = buildOpenAIRealtimeVoiceProvider();
const onClearAudio = vi.fn();
const onEvent = vi.fn();
const bridge = provider.createBridge({
providerConfig: { apiKey: "sk-test" }, // pragma: allowlist secret
onAudio: vi.fn(),
onClearAudio,
onEvent,
});
const connecting = bridge.connect();
const socket = FakeWebSocket.instances[0];
if (!socket) {
throw new Error("expected bridge to create a websocket");
}

socket.readyState = FakeWebSocket.OPEN;
socket.emit("open");
socket.emit("message", Buffer.from(JSON.stringify({ type: "session.updated" })));
await connecting;
bridge.setMediaTimestamp(1000);
socket.emit(
"message",
Buffer.from(JSON.stringify({ type: "response.created", response: { id: "resp_1" } })),
);
socket.emit(
"message",
Buffer.from(
JSON.stringify({
type: "response.audio.delta",
item_id: "item_1",
delta: Buffer.from("assistant audio").toString("base64"),
}),
),
);

bridge.handleBargeIn?.({ audioPlaybackActive: true });

expect(onClearAudio).not.toHaveBeenCalled();
expect(parseSent(socket)).not.toContainEqual({ type: "response.cancel" });
expect(parseSent(socket).some((event) => event.type === "conversation.item.truncate")).toBe(
false,
);
expect(onEvent).toHaveBeenCalledWith({
direction: "client",
type: "conversation.item.truncate.skipped",
detail: "reason=barge-in audioEndMs=0 minAudioEndMs=250",
});
});

it("allows immediate playback barge-in when the minimum audio window is zero", async () => {
const provider = buildOpenAIRealtimeVoiceProvider();
const onClearAudio = vi.fn();
const bridge = provider.createBridge({
providerConfig: {
apiKey: "sk-test", // pragma: allowlist secret
minBargeInAudioEndMs: 0,
},
onAudio: vi.fn(),
onClearAudio,
});
const connecting = bridge.connect();
const socket = FakeWebSocket.instances[0];
if (!socket) {
throw new Error("expected bridge to create a websocket");
}

socket.readyState = FakeWebSocket.OPEN;
socket.emit("open");
socket.emit("message", Buffer.from(JSON.stringify({ type: "session.updated" })));
await connecting;
bridge.setMediaTimestamp(1000);
socket.emit(
"message",
Buffer.from(JSON.stringify({ type: "response.created", response: { id: "resp_1" } })),
);
socket.emit(
"message",
Buffer.from(
JSON.stringify({
type: "response.audio.delta",
item_id: "item_1",
delta: Buffer.from("assistant audio").toString("base64"),
}),
),
);

bridge.handleBargeIn?.({ audioPlaybackActive: true });

expect(onClearAudio).toHaveBeenCalledTimes(1);
expect(parseSent(socket)).toContainEqual({ type: "response.cancel" });
expect(parseSent(socket)).toContainEqual({
type: "conversation.item.truncate",
item_id: "item_1",
content_index: 0,
audio_end_ms: 0,
});
});
@@ -59,6 +59,7 @@ type OpenAIRealtimeVoiceProviderConfig = {
silenceDurationMs?: number;
prefixPaddingMs?: number;
interruptResponseOnInputAudio?: boolean;
minBargeInAudioEndMs?: number;
azureEndpoint?: string;
azureDeployment?: string;
azureApiVersion?: string;
@@ -73,6 +74,7 @@ type OpenAIRealtimeVoiceBridgeConfig = RealtimeVoiceBridgeCreateRequest & {
silenceDurationMs?: number;
prefixPaddingMs?: number;
interruptResponseOnInputAudio?: boolean;
minBargeInAudioEndMs?: number;
azureEndpoint?: string;
azureDeployment?: string;
azureApiVersion?: string;
@@ -84,6 +86,7 @@ const OPENAI_REALTIME_ACTIVE_RESPONSE_ERROR_PREFIX =
"Conversation already has an active response in progress:";
const OPENAI_REALTIME_NO_ACTIVE_RESPONSE_CANCEL_ERROR =
"Cancellation failed: no active response found";
const OPENAI_REALTIME_DEFAULT_MIN_BARGE_IN_AUDIO_END_MS = 250;

type RealtimeEvent = {
type: string;
@@ -177,12 +180,18 @@ function normalizeProviderConfig(
typeof raw?.interruptResponseOnInputAudio === "boolean"
? raw.interruptResponseOnInputAudio
: undefined,
minBargeInAudioEndMs: asNonNegativeInteger(raw?.minBargeInAudioEndMs),
azureEndpoint: trimToUndefined(raw?.azureEndpoint),
azureDeployment: trimToUndefined(raw?.azureDeployment),
azureApiVersion: trimToUndefined(raw?.azureApiVersion),
};
}

function asNonNegativeInteger(value: unknown): number | undefined {
const number = asFiniteNumber(value);
return number === undefined || number < 0 ? undefined : Math.floor(number);
}

type OpenAIRealtimeApiKeyResolution =
| { status: "available"; value: string }
| { status: "missing" };
@@ -815,6 +824,19 @@ class OpenAIRealtimeVoiceBridge implements RealtimeVoiceBridge {
responseStartTimestamp !== null &&
assistantItemId !== null &&
(this.markQueue.length > 0 || options?.audioPlaybackActive === true);
const audioEndMs = shouldInterruptProvider
? Math.max(0, this.latestMediaTimestamp - responseStartTimestamp)
: null;
const minBargeInAudioEndMs =
this.config.minBargeInAudioEndMs ?? OPENAI_REALTIME_DEFAULT_MIN_BARGE_IN_AUDIO_END_MS;
if (audioEndMs !== null && audioEndMs < minBargeInAudioEndMs) {
this.config.onEvent?.({
direction: "client",
type: "conversation.item.truncate.skipped",
detail: `reason=barge-in audioEndMs=${audioEndMs} minAudioEndMs=${minBargeInAudioEndMs}`,
});
return;
}
if (
options?.audioPlaybackActive === true &&
this.responseActive &&
@@ -824,8 +846,6 @@ class OpenAIRealtimeVoiceBridge implements RealtimeVoiceBridge {
this.responseCancelInFlight = true;
}
if (shouldInterruptProvider) {
const elapsedMs = this.latestMediaTimestamp - responseStartTimestamp;
const audioEndMs = Math.max(0, elapsedMs);
this.sendEvent(
{
type: "conversation.item.truncate",
@@ -1074,6 +1094,7 @@ export function buildOpenAIRealtimeVoiceProvider(): RealtimeVoiceProviderPlugin
prefixPaddingMs: config.prefixPaddingMs,
interruptResponseOnInputAudio:
req.interruptResponseOnInputAudio ?? config.interruptResponseOnInputAudio,
minBargeInAudioEndMs: config.minBargeInAudioEndMs,
azureEndpoint: config.azureEndpoint,
azureDeployment: config.azureDeployment,
azureApiVersion: config.azureApiVersion,
File diff suppressed because one or more lines are too long
@@ -150,6 +150,8 @@ export type DiscordVoiceRealtimeConfig = {
consultPolicy?: DiscordVoiceRealtimeConsultPolicy;
/** Allow Discord speaker-start events to interrupt active realtime playback. */
bargeIn?: boolean;
/** Minimum assistant playback duration before a barge-in truncates audio. Default: 250ms; set 0 for immediate interruption. */
minBargeInAudioEndMs?: number;
/** Debounce window before buffered transcripts are sent to the OpenClaw agent. */
debounceMs?: number;
/** Provider-specific realtime voice config keyed by provider id. */
@@ -551,6 +551,7 @@ const DiscordVoiceRealtimeSchema = z
toolPolicy: DiscordVoiceRealtimeToolPolicySchema.optional(),
consultPolicy: DiscordVoiceRealtimeConsultPolicySchema.optional(),
bargeIn: z.boolean().optional(),
minBargeInAudioEndMs: z.number().int().min(0).max(10_000).optional(),
debounceMs: z.number().int().positive().max(10_000).optional(),
providers: z.record(z.string(), z.record(z.string(), z.unknown()).optional()).optional(),
})