Mirror of https://github.com/moltbot/moltbot.git, synced 2026-05-13 23:56:07 +00:00
test(docker): add observability smoke
Add Docker aggregate observability coverage for QA-lab OTEL and Prometheus diagnostics.
@@ -19,6 +19,7 @@ Docs: https://docs.openclaw.ai
 - Providers/Ollama: honor `/api/show` capabilities when registering local models so non-tool Ollama models no longer receive the agent tool surface, and keep native Ollama thinking opt-in instead of enabling it by default. Fixes #64710 and duplicate #65343. Thanks @yuan-b, @netherby, @xilopaint, and @Diyforfun2026.
 - Providers/Ollama: expose native Ollama thinking effort levels so `/think max` is accepted for reasoning-capable Ollama models and maps to Ollama's highest supported `think` effort. Fixes #71584. Thanks @g0st1n.
 - Agents/Ollama: validate explicit `--thinking max` against catalog-discovered Ollama reasoning metadata so local agent runs accept the same native thinking levels shown in the model catalog. Fixes #71584. Thanks @g0st1n.
+- Docker/QA: add observability coverage to the normal Docker aggregate so QA-lab OTEL and Prometheus diagnostics run inside Docker. Thanks @vincentkoc.
 - Auto-reply: poison inbound message dedupe after replay-unsafe provider/runtime failures so retries stay safe before visible progress but cannot duplicate messages after block output, tool side effects, or session progress. Fixes #69303; keeps #58549 and #64606 as duplicate validation. Thanks @martingarramon, @NikolaFC, and @zeroth-blip.
 - Agents/model fallback: jump directly to a known later live-session model redirect instead of walking unrelated fallback candidates, while preserving the already-landed live-session/fallback loop guard. Fixes #57471; related loop family already closed via #58496. Thanks @yuxiaoyang2007-prog.
 - Gateway/Bonjour: keep @homebridge/ciao cancellation handlers registered across advertiser restarts so late probing cancellations cannot crash Linux and other mDNS-churned gateways. Thanks @codex.
@@ -65,6 +65,14 @@ model calls must not export `StreamAbandoned` on successful turns; raw diagnosti
 `openclaw.content.*` attributes must stay out of the trace. It writes
 `otel-smoke-summary.json` next to the QA suite artifacts.

+The normal Docker aggregate also runs an observability lane. It builds or
+reuses a source-backed Docker observability image, runs the OTEL trace smoke
+inside the container, then runs the `docker-prometheus-smoke` QA scenario with the
+`diagnostics-prometheus` plugin enabled. Set
+`OPENCLAW_DOCKER_OBSERVABILITY_LOOPS=<count>` to repeat both checks inside one
+Docker run while preserving per-loop artifacts under
+`.artifacts/docker-observability/...`.
+
 For a transport-real Matrix smoke lane, run:

 ```bash
@@ -617,6 +617,7 @@ The live-model Docker runners also bind-mount only the needed CLI auth homes (or
 - CLI backend smoke: `pnpm test:docker:live-cli-backend` (script: `scripts/test-live-cli-backend-docker.sh`)
 - Codex app-server harness smoke: `pnpm test:docker:live-codex-harness` (script: `scripts/test-live-codex-harness-docker.sh`)
 - Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)
+- Docker observability smoke: included in `pnpm test:docker:all` and `pnpm test:docker:local:all` (script: `scripts/e2e/docker-observability-smoke.sh`). It runs QA-lab OTEL and Prometheus diagnostics checks inside a source-backed Docker image. Set `OPENCLAW_DOCKER_OBSERVABILITY_LOOPS=<count>` to repeat both checks in one container run.
 - Open WebUI live smoke: `pnpm test:docker:openwebui` (script: `scripts/e2e/openwebui-docker.sh`)
 - Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)
 - Npm tarball onboarding/channel/agent smoke: `pnpm test:docker:npm-onboard-channel-agent` installs the packed OpenClaw tarball globally in Docker, configures OpenAI via env-ref onboarding plus Telegram by default, verifies doctor repairs activated plugin runtime deps, and runs one mocked OpenAI agent turn. Reuse a prebuilt tarball with `OPENCLAW_CURRENT_PACKAGE_TGZ=/path/to/openclaw-*.tgz`, skip the host rebuild with `OPENCLAW_NPM_ONBOARD_HOST_BUILD=0`, or switch channel with `OPENCLAW_NPM_ONBOARD_CHANNEL=discord`.
156
qa/scenarios/runtime/docker-prometheus-smoke.md
Normal file
@@ -0,0 +1,156 @@
# Docker Prometheus smoke

```yaml qa-scenario
id: docker-prometheus-smoke
title: Docker Prometheus smoke
surface: telemetry
coverage:
  primary:
    - telemetry.prometheus
  secondary:
    - harness.qa-lab
    - docker.e2e
objective: Verify a QA-lab gateway run emits protected, bounded Prometheus diagnostics metrics through the diagnostics-prometheus plugin.
successCriteria:
  - The diagnostics-prometheus plugin exposes the protected scrape route.
  - An unauthenticated scrape is rejected.
  - A minimal QA-channel agent turn completes.
  - The authenticated scrape includes release-critical diagnostics metric families.
  - Prometheus output omits prompt content, session keys, auth tokens, raw ids, and file paths.
plugins:
  - diagnostics-prometheus
gatewayConfigPatch:
  diagnostics:
    enabled: true
docsRefs:
  - docs/gateway/prometheus.md
  - docs/concepts/qa-e2e-automation.md
codeRefs:
  - extensions/diagnostics-prometheus/src/service.ts
  - src/diagnostics/internal-diagnostics.ts
  - extensions/qa-lab/src/suite.ts
execution:
  kind: flow
  summary: Complete a minimal QA-lab turn and scrape the protected Prometheus route.
  config:
    prompt: Reply exactly DOCKER-PROMETHEUS-OK. Do not repeat DOCKER-PROMETHEUS-SECRET.
    secretNeedle: DOCKER-PROMETHEUS-SECRET
```

```yaml qa-flow
steps:
  - name: emits protected low-cardinality prometheus metrics
    actions:
      - call: waitForGatewayHealthy
        args:
          - ref: env
          - 60000
      - call: waitForQaChannelReady
        args:
          - ref: env
          - 60000
      - call: reset
      - set: startCursor
        value:
          expr: state.getSnapshot().messages.length
      - call: runAgentPrompt
        args:
          - ref: env
          - sessionKey: agent:qa:docker-prometheus-smoke
            message:
              expr: config.prompt
            timeoutMs:
              expr: liveTurnTimeoutMs(env, 30000)
      - call: waitForCondition
        saveAs: outbound
        args:
          - lambda:
              expr: "state.getSnapshot().messages.slice(startCursor).filter((candidate) => candidate.direction === 'outbound' && candidate.conversation.id === 'qa-operator' && String(candidate.text ?? '').trim().length > 0).at(-1)"
          - expr: liveTurnTimeoutMs(env, 30000)
          - expr: "env.providerMode === 'mock-openai' ? 100 : 250"
      - assert:
          expr: "String(outbound.text ?? '').trim().length > 0"
          message: "expected non-empty qa output before scraping metrics"
      - set: prometheusUrl
        value:
          expr: "`${env.gateway.baseUrl}/api/diagnostics/prometheus`"
      - set: gatewayToken
        value:
          expr: "String(env.gateway.token ?? env.gateway.runtimeEnv.OPENCLAW_GATEWAY_TOKEN ?? '')"
      - assert:
          expr: "gatewayToken.length > 0"
          message: "expected QA gateway token to be available for protected scrape"
      - set: unauthenticatedScrape
        value:
          expr: |-
            (async () => {
              const response = await fetch(prometheusUrl);
              await response.text().catch(() => "");
              return { status: response.status };
            })()
      - assert:
          expr: "unauthenticatedScrape.status === 401 || unauthenticatedScrape.status === 403"
          message:
            expr: "`expected unauthenticated prometheus scrape to be rejected, got ${unauthenticatedScrape.status}`"
      - set: authenticatedScrape
        value:
          expr: |-
            (async () => {
              const response = await fetch(prometheusUrl, {
                headers: { authorization: `Bearer ${gatewayToken}` },
              });
              const text = await response.text();
              return {
                status: response.status,
                contentType: response.headers.get("content-type") ?? "",
                text,
              };
            })()
      - assert:
          expr: "authenticatedScrape.status === 200"
          message:
            expr: "`expected authenticated prometheus scrape to return 200, got ${authenticatedScrape.status}`"
      - assert:
          expr: "authenticatedScrape.contentType.includes('text/plain')"
          message:
            expr: "`expected prometheus text content type, got ${authenticatedScrape.contentType}`"
      - set: prometheusText
        value:
          expr: "String(authenticatedScrape.text ?? '')"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_run_completed_total counter')"
          message: "missing run completion counter"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_run_duration_seconds histogram')"
          message: "missing run duration histogram"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_model_call_total counter')"
          message: "missing model call counter"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_harness_run_total counter')"
          message: "missing harness run counter"
      - assert:
          expr: "!prometheusText.includes(config.secretNeedle)"
          message: "prometheus output leaked prompt sentinel"
      - assert:
          expr: "!prometheusText.includes('DOCKER-PROMETHEUS-OK')"
          message: "prometheus output leaked response content"
      - assert:
          expr: "!prometheusText.includes('agent:qa:docker-prometheus-smoke')"
          message: "prometheus output leaked the session key"
      - assert:
          expr: "!prometheusText.includes(gatewayToken)"
          message: "prometheus output leaked the gateway token"
      - assert:
          expr: "!/runId|sessionId|sessionKey|callId|toolCallId|messageId|providerRequestId/.test(prometheusText)"
          message: "prometheus output leaked raw diagnostic identifiers"
      - assert:
          expr: "!/\\/tmp\\/|\\/private\\/tmp\\/|\\/app\\//.test(prometheusText)"
          message: "prometheus output leaked a local file path"
      - assert:
          expr: "!prometheusText.includes('openclaw.content.')"
          message: "prometheus output leaked content attributes"
      - assert:
          expr: "!/openclaw_prometheus_series_dropped_total(?:\\{[^}]*\\})?\\s+(?!0(?:\\.0+)?(?:\\s|$))/.test(prometheusText)"
          message: "prometheus dropped series during the smoke"
```
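The leak and dropped-series assertions at the end of the flow can be tried outside the QA harness. The following is a minimal sketch, not code from the commit: `checkScrape` is a hypothetical helper, the sample exposition text (including the `outcome="completed"` label) is made up for illustration, and only the two regular expressions are copied from the scenario (unescaped from their YAML string form).

```javascript
// Pattern copied from the scenario's last assertion.
const droppedSeriesPattern =
  /openclaw_prometheus_series_dropped_total(?:\{[^}]*\})?\s+(?!0(?:\.0+)?(?:\s|$))/;

// Identifier names the scenario treats as leaks when they appear anywhere in the scrape.
const rawIdPattern =
  /runId|sessionId|sessionKey|callId|toolCallId|messageId|providerRequestId/;

// Hypothetical helper (not part of the repo): returns the names of the failed checks
// for one scrape body. `secrets` are literal strings that must not appear in the text.
function checkScrape(text, secrets) {
  const failures = [];
  for (const secret of secrets) {
    if (text.includes(secret)) failures.push(`leaked:${secret}`);
  }
  if (rawIdPattern.test(text)) failures.push("raw-identifier");
  if (droppedSeriesPattern.test(text)) failures.push("dropped-series");
  return failures;
}

// Made-up exposition text, for illustration only.
const clean = [
  "# TYPE openclaw_run_completed_total counter",
  'openclaw_run_completed_total{outcome="completed"} 1',
  "openclaw_prometheus_series_dropped_total 0",
].join("\n");
const dropped = clean.replace("dropped_total 0", "dropped_total 3");
const withTypeLine = "# TYPE openclaw_prometheus_series_dropped_total counter\n" + clean;

console.log(checkScrape(clean, ["DOCKER-PROMETHEUS-SECRET"])); // no failures
console.log(checkScrape(dropped, ["DOCKER-PROMETHEUS-SECRET"])); // dropped-series
console.log(checkScrape(withTypeLine, ["DOCKER-PROMETHEUS-SECRET"])); // dropped-series
```

The third call shows a property of the pattern as written: it also matches a `# TYPE` (or `# HELP`) comment line for `openclaw_prometheus_series_dropped_total`, because the token after the metric name is not a zero value. Whether that matters depends on how the exporter emits the family when nothing has been dropped, which this sketch does not attempt to model.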
55
scripts/e2e/Dockerfile.observability
Normal file
@@ -0,0 +1,55 @@
# syntax=docker/dockerfile:1.7

FROM node:24-bookworm-slim@sha256:e8e2e91b1378f83c5b2dd15f0247f34110e2fe895f6ca7719dbb780f929368eb AS observability-runner

RUN apt-get update \
  && apt-get install -y --no-install-recommends ca-certificates git \
  && rm -rf /var/lib/apt/lists/*

RUN corepack enable

RUN useradd --create-home --shell /bin/bash appuser \
  && mkdir -p /app \
  && chown appuser:appuser /app

ENV HOME="/home/appuser"
ENV NODE_OPTIONS="--disable-warning=ExperimentalWarning"
ENV OPENCLAW_DISABLE_BONJOUR="1"

USER appuser
WORKDIR /app

COPY --chown=appuser:appuser package.json pnpm-lock.yaml pnpm-workspace.yaml .npmrc ./
COPY --chown=appuser:appuser ui/package.json ./ui/package.json
COPY --chown=appuser:appuser patches ./patches
COPY --chown=appuser:appuser scripts/postinstall-bundled-plugins.mjs scripts/preinstall-package-manager-warning.mjs scripts/npm-runner.mjs scripts/windows-cmd-helpers.mjs ./scripts/
RUN --mount=type=bind,source=extensions,target=/tmp/extensions,readonly \
  find /tmp/extensions -mindepth 2 -maxdepth 2 -name package.json -print | \
  while IFS= read -r manifest; do \
    dest="${manifest#/tmp/}"; \
    mkdir -p "$(dirname "$dest")"; \
    cp "$manifest" "$dest"; \
  done

RUN --mount=type=cache,id=openclaw-pnpm-store,target=/home/appuser/.local/share/pnpm/store,sharing=locked \
  pnpm install --frozen-lockfile

COPY --chown=appuser:appuser .oxlintrc.json tsconfig.json tsconfig.plugin-sdk.dts.json tsconfig.oxlint*.json tsdown.config.ts vitest.config.ts openclaw.mjs ./
COPY --chown=appuser:appuser src ./src
COPY --chown=appuser:appuser test ./test
COPY --chown=appuser:appuser scripts ./scripts
COPY --chown=appuser:appuser docs ./docs
COPY --chown=appuser:appuser packages ./packages
COPY --chown=appuser:appuser qa ./qa
COPY --chown=appuser:appuser skills ./skills
COPY --chown=appuser:appuser ui ./ui
COPY --chown=appuser:appuser extensions ./extensions
COPY --chown=appuser:appuser vendor/a2ui/renderers/lit ./vendor/a2ui/renderers/lit
COPY --chown=appuser:appuser apps/shared/OpenClawKit/Sources/OpenClawKit/Resources ./apps/shared/OpenClawKit/Sources/OpenClawKit/Resources
COPY --chown=appuser:appuser apps/shared/OpenClawKit/Tools/CanvasA2UI ./apps/shared/OpenClawKit/Tools/CanvasA2UI

RUN pnpm build
RUN mkdir -p dist/control-ui \
  && printf '%s\n' '<!doctype html><title>OpenClaw Control UI</title>' > dist/control-ui/index.html

CMD ["bash"]
52
scripts/e2e/docker-observability-smoke.sh
Normal file
@@ -0,0 +1,52 @@
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
source "$ROOT_DIR/scripts/lib/docker-e2e-image.sh"

IMAGE_NAME="$(docker_e2e_resolve_image "openclaw-docker-observability-e2e:local" OPENCLAW_DOCKER_OBSERVABILITY_E2E_IMAGE)"
SKIP_BUILD="${OPENCLAW_DOCKER_OBSERVABILITY_E2E_SKIP_BUILD:-0}"
LOOPS="${OPENCLAW_DOCKER_OBSERVABILITY_LOOPS:-1}"
OUTPUT_DIR="${OPENCLAW_DOCKER_OBSERVABILITY_OUTPUT_DIR:-$ROOT_DIR/.artifacts/docker-observability/$(date +%Y%m%d-%H%M%S)}"

if ! [[ "$LOOPS" =~ ^[1-9][0-9]*$ ]]; then
  echo "OPENCLAW_DOCKER_OBSERVABILITY_LOOPS must be a positive integer, got: $LOOPS" >&2
  exit 1
fi

mkdir -p "$OUTPUT_DIR"

docker_e2e_build_or_reuse "$IMAGE_NAME" docker-observability "$ROOT_DIR/scripts/e2e/Dockerfile.observability" "$ROOT_DIR" "" "$SKIP_BUILD"

echo "Running Docker observability smoke with $LOOPS loop(s)..."
run_logged docker-observability docker run --rm \
  -e "OPENCLAW_DOCKER_OBSERVABILITY_LOOPS=$LOOPS" \
  -v "$OUTPUT_DIR:/app/.artifacts/docker-observability-current" \
  "$IMAGE_NAME" \
  bash -lc '
    set -euo pipefail

    loops="${OPENCLAW_DOCKER_OBSERVABILITY_LOOPS:-1}"
    artifact_root=".artifacts/docker-observability-current"
    mkdir -p "$artifact_root"

    for i in $(seq 1 "$loops"); do
      iteration_dir="$artifact_root/loop-$i"
      mkdir -p "$iteration_dir"

      echo "== docker observability loop $i/$loops: otel =="
      pnpm qa:otel:smoke \
        --provider-mode mock-openai \
        --output-dir "$iteration_dir/otel"

      echo "== docker observability loop $i/$loops: prometheus =="
      pnpm openclaw qa suite \
        --provider-mode mock-openai \
        --scenario docker-prometheus-smoke \
        --concurrency 1 \
        --fast \
        --output-dir "$iteration_dir/prometheus"
    done
  '

echo "Docker observability smoke passed. Artifacts: $OUTPUT_DIR"
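The loop-count validation and per-loop artifact layout in the script above can be summarized in a few lines. This is an illustrative sketch only: `parseLoops` and `artifactDirs` are hypothetical names that do not exist in the repository, and the output directory in the usage line is a made-up path.

```javascript
// Mirrors the script's guard: [[ "$LOOPS" =~ ^[1-9][0-9]*$ ]].
// Like ${VAR:-1} in the script, an unset or empty value falls back to "1".
function parseLoops(raw) {
  const value = raw === undefined || raw === "" ? "1" : raw;
  if (!/^[1-9][0-9]*$/.test(value)) {
    throw new Error(
      `OPENCLAW_DOCKER_OBSERVABILITY_LOOPS must be a positive integer, got: ${value}`,
    );
  }
  return Number(value);
}

// Hypothetical helper: lists the directories the in-container loop passes to
// --output-dir, relative to the host output directory that is bind-mounted in.
function artifactDirs(outputDir, loops) {
  const dirs = [];
  for (let i = 1; i <= loops; i += 1) {
    dirs.push(`${outputDir}/loop-${i}/otel`, `${outputDir}/loop-${i}/prometheus`);
  }
  return dirs;
}

// Example with a made-up output directory.
console.log(artifactDirs(".artifacts/docker-observability/example", parseLoops("2")));
```

Values such as `0`, `03`, or `-1` are rejected by the same pattern the script uses, so a typo in the environment variable fails before any image is built or run.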
@@ -25,7 +25,10 @@ function lane(name, command, options = {}) {
   return {
     cacheKey: options.cacheKey,
     command,
-    e2eImageKind: options.e2eImageKind ?? (options.live ? undefined : "functional"),
+    e2eImageKind:
+      options.e2eImageKind === false
+        ? undefined
+        : (options.e2eImageKind ?? (options.live ? undefined : "functional")),
     estimateSeconds: options.estimateSeconds,
     live: options.live === true,
     name,
@@ -181,6 +184,10 @@ export const mainLanes = [
     { resources: ["service"], weight: 3 },
   ),
   serviceLane("gateway-network", "OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:gateway-network"),
+  serviceLane("observability", "bash scripts/e2e/docker-observability-smoke.sh", {
+    e2eImageKind: false,
+    weight: 3,
+  }),
   serviceLane(
     "agents-delete-shared-workspace",
     "OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:agents-delete-shared-workspace",
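The `e2eImageKind` change in `lane()` adds an explicit opt-out value. The sketch below isolates that expression so its cases can be checked directly; `resolveE2eImageKind` is a hypothetical standalone name (the commit keeps the expression inline), and `"bare"` is a made-up kind used only to show pass-through.

```javascript
// Hypothetical standalone copy of the new expression from lane().
function resolveE2eImageKind(options = {}) {
  return options.e2eImageKind === false
    ? undefined // explicit opt-out: the lane does not use a shared e2e image
    : (options.e2eImageKind ?? (options.live ? undefined : "functional"));
}

// The previous expression, for comparison: `false` was passed through unchanged
// because `??` only falls back on null or undefined.
function previousE2eImageKind(options = {}) {
  return options.e2eImageKind ?? (options.live ? undefined : "functional");
}

console.log(resolveE2eImageKind({})); // default for non-live lanes
console.log(resolveE2eImageKind({ live: true })); // live lanes get no image kind
console.log(resolveE2eImageKind({ e2eImageKind: false })); // the observability lane's case
console.log(resolveE2eImageKind({ e2eImageKind: "bare" })); // explicit kind passes through
```

This is presumably why the new `observability` lane passes `e2eImageKind: false`: its script builds or reuses its own image from `Dockerfile.observability`, so it should not be assigned the default `"functional"` kind.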