fix(qa-lab): refresh parity models and approval timeout

Summary:
- refresh QA parity workflow model refs to Opus 4.7 / Sonnet 4.7 (baseline lane) and GPT-5.5-alt (candidate alt model)
- raise approval-turn-tool-followthrough mock fallback timeouts from 20s/30s to 60s
- credit the original contributor in the changelog

Verification:
- OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm build
- mock-openai approval-turn scenario passed 1/1 for openai/gpt-5.5 + openai/gpt-5.5-alt
- mock-openai approval-turn scenario passed 1/1 for anthropic/claude-opus-4-7 + anthropic/claude-sonnet-4-7
- pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/qa-gateway-config.test.ts extensions/qa-lab/src/suite-planning.test.ts extensions/qa-lab/src/cli.runtime.test.ts
- pnpm check:workflows
- pnpm check:test-types
- pnpm exec oxfmt --check --threads=1 .github/workflows/openclaw-release-checks.yml .github/workflows/qa-live-transports-convex.yml CHANGELOG.md qa/scenarios/runtime/approval-turn-tool-followthrough.md
- git diff --check origin/main...HEAD
2026-05-13 15:47:28 +00:00 · 2026-05-09 03:22:55 -04:00
parent 63b8013b44
commit 44d7d6fd52
4 changed files with 15 additions and 13 deletions
--- a/.github/workflows/openclaw-release-checks.yml
+++ b/.github/workflows/openclaw-release-checks.yml
@@ -705,11 +705,11 @@ jobs:
          case "${QA_PARITY_LANE}" in
            candidate)
              model="${OPENCLAW_CI_OPENAI_MODEL}"
-              alt_model="openai/gpt-5.4-alt"
+              alt_model="openai/gpt-5.5-alt"
              ;;
            baseline)
-              model="anthropic/claude-opus-4-6"
-              alt_model="anthropic/claude-sonnet-4-6"
+              model="anthropic/claude-opus-4-7"
+              alt_model="anthropic/claude-sonnet-4-7"
              ;;
            *)
              echo "Unknown QA parity lane: ${QA_PARITY_LANE}" >&2
@@ -779,7 +779,7 @@ jobs:
            --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
            --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
            --candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --baseline-label anthropic/claude-opus-4-6 \
+            --baseline-label anthropic/claude-opus-4-7 \
            --output-dir .artifacts/qa-e2e/parity

      - name: Upload parity artifacts
--- a/.github/workflows/qa-live-transports-convex.yml
+++ b/.github/workflows/qa-live-transports-convex.yml
@@ -187,17 +187,17 @@ jobs:
            --parity-pack agentic \
            --concurrency "${QA_PARITY_CONCURRENCY}" \
            --model "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --alt-model openai/gpt-5.4-alt \
+            --alt-model openai/gpt-5.5-alt \
            --output-dir .artifacts/qa-e2e/gpt54

-      - name: Run Opus 4.6 lane
+      - name: Run Opus 4.7 lane
        run: |
          pnpm openclaw qa suite \
            --provider-mode mock-openai \
            --parity-pack agentic \
            --concurrency "${QA_PARITY_CONCURRENCY}" \
-            --model anthropic/claude-opus-4-6 \
-            --alt-model anthropic/claude-sonnet-4-6 \
+            --model anthropic/claude-opus-4-7 \
+            --alt-model anthropic/claude-sonnet-4-7 \
            --output-dir .artifacts/qa-e2e/opus46

      - name: Generate parity report
@@ -207,7 +207,7 @@ jobs:
            --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
            --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
            --candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --baseline-label anthropic/claude-opus-4-6 \
+            --baseline-label anthropic/claude-opus-4-7 \
            --output-dir .artifacts/qa-e2e/parity

      - name: Upload parity artifacts
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -147,6 +147,8 @@ Docs: https://docs.openclaw.ai
 - Cron/agents: recognize same-target `edit`↔`write` recovery in `isSameToolMutationAction`, so a successful `write` to a path clears an earlier failed `edit` on the same path. Stops cron from reporting fatal failures when an agent self-heals across `edit` and `write`, while preserving same-tool fingerprint matching, blocking different-target writes, and excluding tools (including `apply_patch`) whose real call args do not produce a stable `path` fingerprint segment. Fixes #79024. Thanks @RenzoMXD.
 - Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
 - CLI/router: when `openclaw <name>` does not match a CLI subcommand, check plugin tool manifests first so names like `lcm_recent` get an agent-tool diagnostic instead of the misleading suggestion to add the tool name to `plugins.allow`. Fixes #77214. Thanks @100yenadmin.
+- QA-lab/parity: bump the live mock-openai parity baseline from `claude-opus-4-6`/`claude-sonnet-4-6` to `claude-opus-4-7`/`claude-sonnet-4-7` and the candidate alt from `gpt-5.4-alt` to `gpt-5.5-alt` in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml`, matching the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main. Carries forward the surface-bump portion of #74290. Thanks @100yenadmin.
+- QA-lab/scenarios: raise the `approval-turn-tool-followthrough` per-turn fallback timeouts from 20s/30s to 60s so cold mock-gateway parity runs do not flake on the approval-turn chain. Carries forward the timeout-bump portion of #74290. Thanks @100yenadmin.
 - Agents/compaction: keep the recent tail after manual `/compact` when Pi returns an empty or no-op compaction summary, preventing blank checkpoints from replacing the live context.
 - Native commands: handle slash commands before workspace and agent-reply bootstrap so Telegram `/status` and other command-only native replies do not wait behind full agent turn setup.
 - Plugins/Nix: allow externally configured plugin roots under `/nix/store` to load in `OPENCLAW_NIX_MODE=1` while keeping normal external plugin hardlink rejection unchanged. Thanks @joshp123.
--- a/qa/scenarios/runtime/approval-turn-tool-followthrough.md
+++ b/qa/scenarios/runtime/approval-turn-tool-followthrough.md
@@ -54,14 +54,14 @@ steps:
            message:
              expr: config.preActionPrompt
            timeoutMs:
-              expr: liveTurnTimeoutMs(env, 20000)
+              expr: liveTurnTimeoutMs(env, 60000)
      - call: waitForOutboundMessage
        args:
          - ref: state
          - lambda:
              params: [candidate]
              expr: "candidate.conversation.id === 'qa-operator'"
-          - expr: liveTurnTimeoutMs(env, 20000)
+          - expr: liveTurnTimeoutMs(env, 60000)
      - set: beforeApprovalCursor
        value:
          expr: state.getSnapshot().messages.length
@@ -72,7 +72,7 @@ steps:
            message:
              expr: config.approvalPrompt
            timeoutMs:
-              expr: liveTurnTimeoutMs(env, 30000)
+              expr: liveTurnTimeoutMs(env, 60000)
      - set: expectedReplyAny
        value:
          expr: config.expectedReplyAny.map(normalizeLowercaseStringOrEmpty)
@@ -81,7 +81,7 @@ steps:
        args:
          - lambda:
              expr: "state.getSnapshot().messages.slice(beforeApprovalCursor).filter((candidate) => candidate.direction === 'outbound' && candidate.conversation.id === 'qa-operator' && expectedReplyAny.some((needle) => normalizeLowercaseStringOrEmpty(candidate.text).includes(needle))).at(-1)"
-          - expr: liveTurnTimeoutMs(env, 20000)
+          - expr: liveTurnTimeoutMs(env, 60000)
          - expr: "env.providerMode === 'mock-openai' ? 100 : 250"
    detailsExpr: outbound.text
 ```
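For reviewers who want to check the updated lane mapping in `openclaw-release-checks.yml` outside CI, the standalone sketch below copies the case arms from the diff into a function. It is a sketch under stated assumptions: the function name `select_parity_models` is hypothetical, and it prints the model pair instead of assigning job-step variables the way the real workflow does.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the QA_PARITY_LANE case block after this change.
# Prints "<model> <alt_model>" for a lane, or fails for an unknown lane.
select_parity_models() {
  local lane="$1"
  local model alt_model
  case "${lane}" in
    candidate)
      # The candidate primary model comes from CI config; only the alt is pinned.
      model="${OPENCLAW_CI_OPENAI_MODEL:?OPENCLAW_CI_OPENAI_MODEL must be set}"
      alt_model="openai/gpt-5.5-alt"
      ;;
    baseline)
      # Baseline lane now pins Opus 4.7 with Sonnet 4.7 as the alt model.
      model="anthropic/claude-opus-4-7"
      alt_model="anthropic/claude-sonnet-4-7"
      ;;
    *)
      echo "Unknown QA parity lane: ${lane}" >&2
      return 1
      ;;
  esac
  printf '%s %s\n' "${model}" "${alt_model}"
}
```

For example, `OPENCLAW_CI_OPENAI_MODEL=openai/gpt-5.5 select_parity_models candidate` prints `openai/gpt-5.5 openai/gpt-5.5-alt`, and `select_parity_models baseline` prints `anthropic/claude-opus-4-7 anthropic/claude-sonnet-4-7`.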