mirror of
https://github.com/moltbot/moltbot.git
synced 2026-05-13 15:47:28 +00:00
fix(qa-lab): refresh parity models and approval timeout
Summary: - refresh QA parity workflow model refs to Opus 4.7 / GPT-5.5-alt - raise approval-turn-tool-followthrough mock fallback timeouts to 60s - credit the original contributor in the changelog Verification: - OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm build - mock-openai approval-turn scenario passed 1/1 for openai/gpt-5.5 + openai/gpt-5.5-alt - mock-openai approval-turn scenario passed 1/1 for anthropic/claude-opus-4-7 + anthropic/claude-sonnet-4-7 - pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/qa-gateway-config.test.ts extensions/qa-lab/src/suite-planning.test.ts extensions/qa-lab/src/cli.runtime.test.ts - pnpm check:workflows - pnpm check:test-types - pnpm exec oxfmt --check --threads=1 .github/workflows/openclaw-release-checks.yml .github/workflows/qa-live-transports-convex.yml CHANGELOG.md qa/scenarios/runtime/approval-turn-tool-followthrough.md - git diff --check origin/main...HEAD
This commit is contained in:
committed by
GitHub
parent
63b8013b44
commit
44d7d6fd52
@@ -705,11 +705,11 @@ jobs:
|
||||
case "${QA_PARITY_LANE}" in
|
||||
candidate)
|
||||
model="${OPENCLAW_CI_OPENAI_MODEL}"
|
||||
alt_model="openai/gpt-5.4-alt"
|
||||
alt_model="openai/gpt-5.5-alt"
|
||||
;;
|
||||
baseline)
|
||||
model="anthropic/claude-opus-4-6"
|
||||
alt_model="anthropic/claude-sonnet-4-6"
|
||||
model="anthropic/claude-opus-4-7"
|
||||
alt_model="anthropic/claude-sonnet-4-7"
|
||||
;;
|
||||
*)
|
||||
echo "Unknown QA parity lane: ${QA_PARITY_LANE}" >&2
|
||||
@@ -779,7 +779,7 @@ jobs:
|
||||
--candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
|
||||
--baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
|
||||
--candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
|
||||
--baseline-label anthropic/claude-opus-4-6 \
|
||||
--baseline-label anthropic/claude-opus-4-7 \
|
||||
--output-dir .artifacts/qa-e2e/parity
|
||||
|
||||
- name: Upload parity artifacts
|
||||
|
||||
10
.github/workflows/qa-live-transports-convex.yml
vendored
10
.github/workflows/qa-live-transports-convex.yml
vendored
@@ -187,17 +187,17 @@ jobs:
|
||||
--parity-pack agentic \
|
||||
--concurrency "${QA_PARITY_CONCURRENCY}" \
|
||||
--model "${OPENCLAW_CI_OPENAI_MODEL}" \
|
||||
--alt-model openai/gpt-5.4-alt \
|
||||
--alt-model openai/gpt-5.5-alt \
|
||||
--output-dir .artifacts/qa-e2e/gpt54
|
||||
|
||||
- name: Run Opus 4.6 lane
|
||||
- name: Run Opus 4.7 lane
|
||||
run: |
|
||||
pnpm openclaw qa suite \
|
||||
--provider-mode mock-openai \
|
||||
--parity-pack agentic \
|
||||
--concurrency "${QA_PARITY_CONCURRENCY}" \
|
||||
--model anthropic/claude-opus-4-6 \
|
||||
--alt-model anthropic/claude-sonnet-4-6 \
|
||||
--model anthropic/claude-opus-4-7 \
|
||||
--alt-model anthropic/claude-sonnet-4-7 \
|
||||
--output-dir .artifacts/qa-e2e/opus46
|
||||
|
||||
- name: Generate parity report
|
||||
@@ -207,7 +207,7 @@ jobs:
|
||||
--candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
|
||||
--baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
|
||||
--candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
|
||||
--baseline-label anthropic/claude-opus-4-6 \
|
||||
--baseline-label anthropic/claude-opus-4-7 \
|
||||
--output-dir .artifacts/qa-e2e/parity
|
||||
|
||||
- name: Upload parity artifacts
|
||||
|
||||
@@ -147,6 +147,8 @@ Docs: https://docs.openclaw.ai
|
||||
- Cron/agents: recognize same-target `edit`↔`write` recovery in `isSameToolMutationAction`, so a successful `write` to a path clears an earlier failed `edit` on the same path. Stops cron from reporting fatal failures when an agent self-heals across `edit` and `write`, while preserving same-tool fingerprint matching, blocking different-target writes, and excluding tools (including `apply_patch`) whose real call args do not produce a stable `path` fingerprint segment. Fixes #79024. Thanks @RenzoMXD.
|
||||
- Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
|
||||
- CLI/router: when `openclaw <name>` does not match a CLI subcommand, check plugin tool manifests first so names like `lcm_recent` get an agent-tool diagnostic instead of the misleading suggestion to add the tool name to `plugins.allow`. Fixes #77214. Thanks @100yenadmin.
|
||||
- QA-lab/parity: bump the live mock-openai parity baseline from `claude-opus-4-6`/`claude-sonnet-4-6` to `claude-opus-4-7`/`claude-sonnet-4-7` and the candidate alt from `gpt-5.4-alt` to `gpt-5.5-alt` in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml`, matching the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main. Carries forward the surface-bump portion of #74290. Thanks @100yenadmin.
|
||||
- QA-lab/scenarios: raise the `approval-turn-tool-followthrough` per-turn fallback timeouts from 20s/30s to 60s so cold mock-gateway parity runs do not flake on the approval-turn chain. Carries forward the timeout-bump portion of #74290. Thanks @100yenadmin.
|
||||
- Agents/compaction: keep the recent tail after manual `/compact` when Pi returns an empty or no-op compaction summary, preventing blank checkpoints from replacing the live context.
|
||||
- Native commands: handle slash commands before workspace and agent-reply bootstrap so Telegram `/status` and other command-only native replies do not wait behind full agent turn setup.
|
||||
- Plugins/Nix: allow externally configured plugin roots under `/nix/store` to load in `OPENCLAW_NIX_MODE=1` while keeping normal external plugin hardlink rejection unchanged. Thanks @joshp123.
|
||||
|
||||
@@ -54,14 +54,14 @@ steps:
|
||||
message:
|
||||
expr: config.preActionPrompt
|
||||
timeoutMs:
|
||||
expr: liveTurnTimeoutMs(env, 20000)
|
||||
expr: liveTurnTimeoutMs(env, 60000)
|
||||
- call: waitForOutboundMessage
|
||||
args:
|
||||
- ref: state
|
||||
- lambda:
|
||||
params: [candidate]
|
||||
expr: "candidate.conversation.id === 'qa-operator'"
|
||||
- expr: liveTurnTimeoutMs(env, 20000)
|
||||
- expr: liveTurnTimeoutMs(env, 60000)
|
||||
- set: beforeApprovalCursor
|
||||
value:
|
||||
expr: state.getSnapshot().messages.length
|
||||
@@ -72,7 +72,7 @@ steps:
|
||||
message:
|
||||
expr: config.approvalPrompt
|
||||
timeoutMs:
|
||||
expr: liveTurnTimeoutMs(env, 30000)
|
||||
expr: liveTurnTimeoutMs(env, 60000)
|
||||
- set: expectedReplyAny
|
||||
value:
|
||||
expr: config.expectedReplyAny.map(normalizeLowercaseStringOrEmpty)
|
||||
@@ -81,7 +81,7 @@ steps:
|
||||
args:
|
||||
- lambda:
|
||||
expr: "state.getSnapshot().messages.slice(beforeApprovalCursor).filter((candidate) => candidate.direction === 'outbound' && candidate.conversation.id === 'qa-operator' && expectedReplyAny.some((needle) => normalizeLowercaseStringOrEmpty(candidate.text).includes(needle))).at(-1)"
|
||||
- expr: liveTurnTimeoutMs(env, 20000)
|
||||
- expr: liveTurnTimeoutMs(env, 60000)
|
||||
- expr: "env.providerMode === 'mock-openai' ? 100 : 250"
|
||||
detailsExpr: outbound.text
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user