BrowserOS

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-18 11:06:19 +00:00

Author	SHA1	Message	Date
Nikhil	a59f96f657	test(agent): tighten claude acp command assertion (#1022)	2026-05-17 11:28:59 -07:00
Nikhil	4f3d83c1ff	chore: server update (#1021)	2026-05-17 10:19:23 -07:00
Nikhil	4b2e887fbb	fix(agent): resolve claude codex acp commands (#1020)	2026-05-17 10:17:42 -07:00
Nikhil	d097aa5648	chore: bump version (#1012)	2026-05-15 17:23:34 -07:00
Nikhil	2bd6c18dca	feat: add window visibility tool (#1011) * chore: update protocol json * feat: add window visibility tool * fix: address window visibility review comments	2026-05-15 17:20:12 -07:00
Nikhil	628586ce51	fix: remove unsupported CDP window visibility tool (#1007)	2026-05-15 11:27:47 -07:00
shivammittal274	0026e7f41f	feat(mcp): X-BrowserOS-Default-Window-Id header + set_window_visibility tool (#1004 ) * feat(mcp): X-BrowserOS-Default-Window-Id header + set_window_visibility tool Two related additions for host applications that want to deterministically bind every browser tool call in an MCP session to a specific window. 1. Header `X-BrowserOS-Default-Window-Id` is honored in `register-mcp.ts`: before `executeTool` runs, the handler wrap reads the tool's zod input schema and, when it has a `windowId` field and `args.windowId` is undefined, injects the header value. Schema-driven — covers today's new_page / new_hidden_page / show_page / move_page, and any future tool with a windowId field, with no per-tool list to maintain. The agent's explicit `args.windowId` still wins when set. 2. New tool `set_window_visibility {windowId, visible}` — symmetric show/hide on existing windows, delegating to `cdp.Browser.showWindow`/`hideWindow`. Note: those CDP methods are in the inspector_protocol_config include list and the generated client types but the C++ implementation isn't wired yet (`is_hidden_` on Browser is const). The tool returns the runtime error verbatim so the host can degrade gracefully; once the Chromium patch lands, the tool starts working with no changes here. Companion changes on the agent-company side (separate PR) use the header to per-thread-bind windows + PATCH /threads/:id browserVisibility calls set_window_visibility on toggle. * fix(mcp): require integer in X-BrowserOS-Default-Window-Id parser Greptile flagged: `Number.isFinite` lets non-integer floats through (e.g. header value `"1.5"` parses to `1.5`), and that float would be injected as `args.windowId` into tools that forward it to CDP — which expects an integer windowId and rejects with an opaque protocol error. Swap to `Number.isInteger` to reject at the parse boundary instead.	2026-05-15 21:57:29 +05:30
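The header handling described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in `register-mcp.ts`: the function names are hypothetical, and the tool's zod input schema is stood in for by a plain list of field names. Only the header semantics (integer-only parsing via `Number.isInteger`, injection only when the tool has a `windowId` field and the caller did not set one) come from the commit message.

```typescript
// Hypothetical helper names; only the "windowId" field and the
// Number.isInteger check come from the commit message above.

/** Parse the X-BrowserOS-Default-Window-Id value, accepting integers only. */
function parseDefaultWindowId(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  // Number.isFinite would let "1.5" through; Number.isInteger rejects it here.
  return Number.isInteger(parsed) ? parsed : undefined;
}

/**
 * Inject the header value as args.windowId when the tool accepts a windowId
 * and the caller left it undefined. `schemaFields` is an assumed stand-in
 * for the field names of the tool's input schema.
 */
function injectDefaultWindowId(
  schemaFields: readonly string[],
  args: Record<string, unknown>,
  headerValue: string | undefined,
): Record<string, unknown> {
  if (!schemaFields.includes("windowId")) return args;
  // An explicit windowId from the agent always wins.
  if (args["windowId"] !== undefined) return args;
  const windowId = parseDefaultWindowId(headerValue);
  return windowId === undefined ? args : { ...args, windowId };
}
```

Because the check is driven by the schema's field list rather than a per-tool allowlist, a tool that later gains a `windowId` field would pick up the default without further changes in this sketch.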
Nikhil	0be59dccdd	fix: revert recent agent/server changes (#995) * Revert "fix(server): tolerate existing workspace dirs" This reverts commit `d7e1125db3`. * Revert "fix(server): support Gemini computer use requests" This reverts commit `8b6483a633`. * Revert "feat(agent): add reset controls for sessions and memory" This reverts commit `f54eff4543`. * Revert "fix: add cloud sync sign-in disclosure" This reverts commit `f1ebfa5232`. * Revert "fix: allow pasted images in agent text box" This reverts commit `b89ea201fa`. * fix(server): stabilize Hermes harness state paths * fix: address review feedback for PR #995	2026-05-11 14:26:56 -07:00
shivammittal274	dad2331448	refactor(agent): clean up hermes adapter structure (#994)	2026-05-11 22:57:59 +05:30
Nikhil	d7e1125db3	fix(server): tolerate existing workspace dirs Fixes #974	2026-05-08 19:17:29 -07:00
Nikhil	8b6483a633	fix(server): support Gemini computer use requests Fixes #148	2026-05-08 19:12:07 -07:00
Nikhil	f54eff4543	feat(agent): add reset controls for sessions and memory Fixes #418	2026-05-08 19:06:44 -07:00
Dani Akash	4e405681a7	feat(container): richen ManagedContainer — isImageCurrent + logs + sibling-exec (#968 ) * feat(container): add isImageCurrent + getLogs + tailLogs + runOneShot to ManagedContainer Four base-class additions ahead of the OpenClaw runtime migration so the upcoming subclass doesn't have to re-implement them: - isImageCurrent() — pure predicate comparing the existing container's image ref to descriptor.defaultImage. Treats SHA-pinned variants as matches. start() is unchanged; subclasses + service layers compose the predicate where they want short-circuit behaviour. - getLogs(tail) and tailLogs(onLine) — generic log primitives, thin pass-throughs to ContainerCli. - runOneShot(argv, opts) — sibling-container helper that spawns a <name>-setup container with the same image+mounts+env (no ports/ health/restart), runs argv, force-removes after. Includes the retry-on-name-collision behaviour previously bespoke to OpenClaw. Hermes inherits unused surface only — no behavioural change. The in-flight base-class tests cover all four primitives. * fix(container): tighten getLogs error path + close runOneShot timeout-onLog leak; trim docstrings - getLogs now distinguishes a missing container (returns []) from other CLI failures (throws). Previously nerdctl's stderr ("Error: no such container: …") leaked into the lines array as if it were log output. isNoSuchContainer is exported from container-cli to share the predicate. - runWithOptionalTimeout wraps the caller's onLog so post-timeout lines from the abandoned runCommand promise become no-ops; before this, callers could see onLog fire after runOneShot had already rejected, hitting state the caller may have torn down on the timeout error. - Tightens the new docstrings to one short line per the project convention; drops a restating comment in the test file.	2026-05-08 15:58:05 +05:30
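The `getLogs` error-path fix in the commit above (a missing container yields `[]`, any other CLI failure throws) might look something like this sketch. The `CliResult` shape, the `logLinesFromResult` name, and the regular expression are assumptions for illustration; the commit only states that `isNoSuchContainer` exists and quotes nerdctl's "no such container" message.

```typescript
// Assumed shape of a finished CLI invocation; not the project's actual type.
interface CliResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Assumed matcher, based only on the stderr text quoted in the commit message.
function isNoSuchContainer(stderr: string): boolean {
  return /no such container/i.test(stderr);
}

/** Turn a `logs` invocation result into log lines (hypothetical helper). */
function logLinesFromResult(result: CliResult): string[] {
  if (result.exitCode === 0) {
    return result.stdout.split("\n").filter((line) => line.length > 0);
  }
  // A missing container is an expected condition: report "no logs".
  if (isNoSuchContainer(result.stderr)) return [];
  // Anything else is a real failure and must not be returned as log output.
  throw new Error(`container logs failed: ${result.stderr.trim()}`);
}
```

Keeping the stderr text out of the returned array is the point of the fix: callers can treat every returned line as genuine log output.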
Dani Akash	b445615d61	refactor(claude+codex): migrate onto HostProcessAgentRuntime; collapse adapter-health (#967 ) * feat(runtime): add ClaudeRuntime + CodexRuntime + factories * refactor(host-adapters): switch wire-up + dispatch + health to runtime registry main.ts registers ClaudeRuntime + CodexRuntime alongside Hermes. ACP runtime resolves all three via the registry; legacy host-process spawn is preserved as a fallback so unit tests that don't bootstrap runtimes keep working. AdapterHealthChecker now reads runtime snapshots through the registry — the embedded execAsync probe, ADAPTER_HEALTH_COMMANDS table, and friendlyProbeFailure mapper delete. As a side-effect this also fixes the Hermes "Unavailable" chip (Hermes was missing from ADAPTER_HEALTH_COMMANDS). Drops the standalone claude-code/prepare.ts and codex/prepare.ts modules (their bodies are exported from the runtime files now). * test(runtime): cover ClaudeRuntime + CodexRuntime descriptor + prep + factory * fix(runtime): coalesce concurrent host-process probes; expose probedAt on snapshot * fix(runtime): preserve acpx-core npx-wrapped spawn for claude + codex The host-process runtimes were resolving the ACP spawn command through their own getAcpExecSpec, which returned argv [claude] / [codex] — bare binaries. acpx-core's built-in registry actually resolves these adapters to npx wrappers around the official ACP-aware packages (claude-agent-acp, codex-acp), and the package version range is owned by acpx-core. The bare-binary spawn would fail because either the binary is missing or doesn't speak ACP. Spawn dispatch now goes through registry.resolve() + wrapCommandWithEnv for claude/codex (matching pre-#967 behaviour). The runtime registrations still drive health probing and per-turn prep — only the spawn-command source-of-truth stays in acpx-core. Drops the misleading getAcpExecSpec from the host-process runtime classes. 
Regression test asserts the spawn command contains the npx package name (claude-agent-acp / codex-acp) for each adapter.	2026-05-08 13:02:19 +05:30
Dani Akash	d68e8905fe	refactor(hermes): migrate Hermes onto ContainerAgentRuntime (#965 ) * feat(runtime): add HermesContainerRuntime + factory * refactor(hermes): switch wire-up + dispatch to runtime registry main.ts and the agent route stack now resolve Hermes through `AgentRuntimeRegistry`. Drops the `hermesGateway` plumbing chain (server.ts → routes → harness → AcpxRuntime), the `HermesGatewayAccessor` interface, and `resolveHermesAcpCommand`. Removes `HermesContainerService`, `HermesContainer`, and `prepareHermesContext`'s standalone module — their behaviour is now owned by `HermesContainerRuntime`. * test(runtime): cover HermesContainerRuntime descriptor + lifecycle + factory * test(runtime): move registry reset to afterEach to survive assertion failures	2026-05-08 11:32:19 +05:30
Dani Akash	e89fccd997	feat(runtime): introduce AgentRuntime abstraction (types, interface, registry, abstract bases) (#964 ) * feat(runtime): introduce AgentRuntime types + interface + registry Foundation for the unified agent-runtime abstraction. No adapter migrates yet; the existing acpx-runtime, per-adapter prepare modules, OpenClawService, HermesContainerService, and adapter-health.ts all keep working unchanged. This commit adds the data layer of the abstraction: - `RuntimeDescriptor` discriminates the two kinds we ship today (`'container'` \| `'host-process'`). UI components route on this. - `RuntimeState` is the union of both kinds' states — container flow `not_installed → installing → installed → starting → running → stopped`, host flow `cli_missing \| cli_present \| cli_unhealthy`, plus the shared `errored` and `unsupported_platform` terminals. - `RuntimeStatusSnapshot` carries a single `isReady: boolean` so the harness has one bit to read before spawning turns. - `RuntimeAction` is a typed discriminated union — required args (e.g. `agentId` for `'reset-wipe-agent'`) are compile-time enforced, removing the previous footgun of optional args on a string-keyed dispatch. - `RuntimeCapability` lists every action a runtime can advertise; `getCapabilities()` is the single switchboard the UI uses to decide which buttons to render. `AgentRuntime` interface declares the contract every runtime implements: status snapshot + subscriber, capability list, `executeAction(action)`, `buildExecArgv(spec)`, and per-agent home dir. `prepareTurnContext` is intentionally absent until the first adapter migrates so callers can't depend on a method that has no implementation. `AgentRuntimeRegistry` is a small class + module-level singleton — adapters register themselves at boot, the harness/UI look up by `adapterId`. `resetAgentRuntimeRegistry()` is for tests only. 
Two error classes round it out: `ActionNotSupportedError` (capability gate, mapped to HTTP 405 in a later phase) and `RuntimeNotReadyError` (state gate at the runtime layer, distinct from the container-layer's `ContainerNotReadyError`). * feat(runtime): add ContainerAgentRuntime + HostProcessAgentRuntime abstract bases * test(runtime): cover state translation, action dispatch, registry * fix(runtime): gate host-process executeAction on capabilities; only stamp probe cache after probe resolves	2026-05-08 09:47:38 +05:30
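To illustrate why a typed discriminated union removes the optional-argument footgun, here is a small sketch of the `RuntimeAction` / capability-gate / registry idea from the commit above. Only the names `RuntimeAction`, `RuntimeCapability`, `ActionNotSupportedError`, `AgentRuntimeRegistry`, `getCapabilities`, `executeAction`, and the `'reset-wipe-agent'` action with its `agentId` come from the commit message; the `start` and `stop` variants, `DemoRuntime`, and the string return values are invented for the example.

```typescript
// "start" and "stop" are assumed variants; only "reset-wipe-agent" (with its
// required agentId) is named in the commit message.
type RuntimeAction =
  | { type: "start" }
  | { type: "stop" }
  | { type: "reset-wipe-agent"; agentId: string };

type RuntimeCapability = RuntimeAction["type"];

class ActionNotSupportedError extends Error {
  constructor(action: RuntimeCapability) {
    super(`action not supported: ${action}`);
    this.name = "ActionNotSupportedError";
  }
}

interface RuntimeLike {
  getCapabilities(): RuntimeCapability[];
  executeAction(action: RuntimeAction): string;
}

// Hypothetical runtime that advertises two of the three actions.
class DemoRuntime implements RuntimeLike {
  getCapabilities(): RuntimeCapability[] {
    return ["start", "reset-wipe-agent"];
  }

  executeAction(action: RuntimeAction): string {
    // Capability gate: refuse anything the runtime does not advertise.
    if (!this.getCapabilities().includes(action.type)) {
      throw new ActionNotSupportedError(action.type);
    }
    switch (action.type) {
      case "start":
        return "started";
      case "stop":
        return "stopped";
      case "reset-wipe-agent":
        // agentId is guaranteed present by the type, not checked at runtime.
        return `wiped ${action.agentId}`;
    }
  }
}

// Adapters register at boot; callers look up by adapterId.
class AgentRuntimeRegistry {
  private readonly runtimes = new Map<string, RuntimeLike>();

  register(adapterId: string, runtime: RuntimeLike): void {
    this.runtimes.set(adapterId, runtime);
  }

  get(adapterId: string): RuntimeLike | undefined {
    return this.runtimes.get(adapterId);
  }
}
```

With this shape, `executeAction({ type: "reset-wipe-agent" })` without an `agentId` fails to compile, which is the compile-time enforcement the commit describes.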
Dani Akash	805ae8e607	feat(server): ManagedContainer abstraction — Hermes readiness gate + ACP layering fix (#962 ) * feat(container): add waitForContainerRunning primitive + typed error Adds `ContainerCli.waitForContainerRunning(name, opts)` polling `inspectContainer().running === true` until either the container reports running or the timeout expires. Distinct from the existing `waitForContainerNameRelease` (which waits for deletion). Used by the upcoming managed-container layer between `nerdctl create + start` and "container is ready for exec" so the harness never spawns a turn against a half-started container — which is the root cause of the silent first-turn failure on Hermes today (`hermes-container.ts:130-160` returns immediately after start). Defaults sized for cold-start: 30s budget at 500ms cadence. Throws `ContainerNotRunningError` (new, in `lib/vm/errors.ts`) on timeout — distinct from `ContainerNameReleaseTimeoutError` so callers can branch on "didn't come up" vs "didn't get cleaned up". * feat(container): add ManagedContainer abstract base + state machine Introduces the abstract base every container-backed agent adapter will subclass. Owns the canonical state machine (not_installed \| installing \| installed \| starting \| running \| stopped \| errored), the lifecycle lock (per-process promise chain + cross-process file lock), the gated `execute` family, and the host↔container path translator. 
Subclasses provide only what's actually adapter-specific:
- `descriptor` (image, container name, supported platforms)
- `buildContainerSpec()` for the `nerdctl create` args
- `readinessProbe()` after the container reaches running
- `mountRoots()` for the path translator
Three execute methods, all sharing one invariant (every entry point gates on state == running):
- `execProcess(spec)` spawns a long-lived child process via Bun, waits through `starting` up to 60s, throws typed `ContainerNotReadyError` if the container is not_installed / stopped / errored / timed out.
- `execOneShot(spec)` is a buffered convenience wrapper.
- `buildExecArgv(spec)` is the pure builder for callers (acpx-core) that need a shell-command string. Single source of truth for the `env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` chain that today's ACP runtime hand-rolls in two places (`acpx-runtime.ts:780-820` and `:823-870`).
`reset(level)` is on the API surface but throws `ResetNotSupportedError` so the next PR can wire soft / wipe-agent / hard without revving the abstract class. Path translator uses lexical containment against declared mount roots; the realpath-based symlink-escape check lives one layer up (in the file-attribution code that already shipped) since the translator itself never reads from disk.
feat(container): HermesContainer subclass + wrapper-service bridge
`HermesContainer` (lib/container/managed/) is the first concrete adapter on the new `ManagedContainer` base. Provides the four bits that are actually adapter-specific:
- `descriptor`: image, container name, supported platforms, readiness-probe tuning.
- `mountRoots()`: host↔container path mapping for the harness dir.
- `buildContainerSpec()`: nerdctl create args (env, mounts, add-hosts, entrypoint override).
- `readinessProbe()`: execs `hermes --version` inside the freshly-started container; bypasses the state gate via `cli.exec` since we're in `starting`, not `running`, when the probe runs.
`HermesContainerService` (api/services/hermes/) is rewritten as a thin wrapper that delegates `prewarm` / `start` / `stop` / `restart` / `shutdown` to the underlying `HermesContainer`. Public surface is preserved so `main.ts`, `server.ts`, and `agent-harness-service` compile unchanged in this PR; `getAccessor()` still returns the structural `HermesAccessor` the ACP runtime expects today (the runtime swap is the next commit). The wrapper also exposes `getContainer(): HermesContainer \| null` for callers that want the richer surface. The user-visible bug — Hermes silent first-turn failure — is fixed as a side effect: `start()` now waits through `cli.waitForContainerRunning` and runs the `hermes --version` readiness probe before transitioning to `running`. Subsequent chat turns are gated on the container actually being ready, not just on `nerdctl create + start` having returned. * feat(agent): ACP runtime spawns Hermes via ManagedContainer.buildExecArgv `resolveHermesAcpCommand` no longer hand-rolls the `env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` chain. It now delegates to `gateway.buildExecArgv`, which the wrapper service routes to the underlying `ManagedContainer.buildExecArgv`. The structural `HermesGatewayAccessor` type gains one method (`buildExecArgv`) — keeps the existing four getters so any test/legacy caller still works. The wrapper's `getAccessor()` delegates `buildExecArgv` to its `HermesContainer`. Net effect: the `limactl shell ... -- nerdctl exec ...` argv chain has exactly one owner (`ManagedContainer.buildExecArgv` in the container layer) instead of being duplicated across `acpx-runtime` and the now-deleted hand-built chain. The OpenClaw branch (`resolveOpenclawAcpCommand`) is untouched — its migration to ManagedContainer is a separate, larger PR that also has to model the gateway / control-plane surfaces. 
Tests: the existing acpx-runtime test suite expected the four old getters; updated the Hermes-container fixture to also provide `buildExecArgv` (mirrors the production builder inline so the test stays independent of the production class wiring). All 320 server tests pass. * test(container): managed-container + hermes-container coverage 20 cases across two files in `tests/lib/container/managed/`. ManagedContainer base (14 cases): - State machine: start() walks installing → starting → running; probe-false lands errored with lastError populated; stop() force-transitions to stopped even from errored. - execProcess gating: rejects ContainerNotReadyError with reason='not_installed' when never started; reason='errored' when in errored state (preserving lastError); resolves once state flips to running while waiting; reason='timeout' when starting never resolves. - buildExecArgv: snapshot test pinning the exact canonical `env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` string for the Hermes-shaped invocation; -e flags omitted when env is empty. - reset(level): throws ResetNotSupportedError for all three levels (Phase 1 stub). - Path translation: round-trip host ↔ container under a declared mount; mount-root itself translates without suffix; rejects PathOutsideMountsError for /etc/passwd / /proc/cpuinfo. - subscribeState fires every transition, stops after unsubscribe. HermesContainer subclass (6 cases): - Descriptor declares adapterId='hermes', the canonical container name, image, and darwin platform support. - start() happy path reaches running + invokes the `hermes --version` probe via cli.exec. - Probe-non-zero start() lands errored with the right error. - ContainerSpec built with idle entrypoint, harness bind-mount (source = /mnt/browseros/vm/hermes/harness, target = HERMES_CONTAINER_HARNESS_DIR), and host.containers.internal add-host pointing at the VM gateway. - toContainerPath maps host harness paths to /data/agents/harness. 
- buildExecArgv produces the canonical Hermes ACP spawn string with LIMA_HOME, container name, hermes binary path, and -e env. Pre-existing test in tests/lib/container/container-cli.test.ts (`waits until a container name is no longer resolvable`) flakes under parallel test load on dev; passes solo. Last touched in `fd5aba24`, well before this branch. * chore: tidy comments * fix(hermes): use provider:custom for openai + openai-compatible Hermes (v2026.4.x) does not have a provider key called "openai" — its `PROVIDER_REGISTRY` enumerates 33 named providers (anthropic, deepseek, gemini, kimi-coding, etc.) and "openai" is not one of them. Per the upstream docs, the canonical shape for any OpenAI-compatible endpoint with an API key is: model: provider: custom base_url: "<endpoint>" When `base_url` is set, Hermes ignores provider lookup and calls the URL directly using OPENAI_API_KEY (or the configured api_key). Today's mapping wrote `provider: "openai"` for both BrowserOS provider types — Hermes' main-model loader rejected that with `unknown provider 'openai'`, and the harness surfaced an opaque "Internal error" on every first chat for any Hermes agent backed by a Fireworks / Together / Groq / OpenAI provider. Fix: - `openai` and `openai-compatible` BrowserOS types now both map to `hermesProvider: 'custom'`. - HermesProviderMapping gains an optional `defaultBaseUrl` field used when `provider: 'custom'` is set with no caller-supplied baseUrl (BrowserOS' `openai` type doesn't require base_url at the API edge, but Hermes' `custom` always does — so we fall back to https://api.openai.com/v1). - writeHermesPerAgentProvider rejects `provider: 'custom'` with no base_url so a future regression fails loudly instead of silently writing an unusable config.yaml. Tests updated: the existing openai-compatible case now asserts `provider: "custom"` instead of `"openai"`, plus a new case covering the openai-default-base-url fallback path. 
Note: the `openrouter` mapping is left untouched because its fix is unverified (Hermes' PROVIDER_REGISTRY doesn't appear to contain "openrouter" either, but the auxiliary fallback chain recognises it). Worth a separate follow-up — out of scope for this fix which targets the user-reported reproduction. * fix(container): install() must ensure VM is ready before image pull Image operations run inside the Lima VM, so `nerdctl pull` fails on a cold-boot run if the VM hasn't been started yet. `HermesContainerService.prewarm()` (the original wrapper) always called `vm.ensureReady()` before `ensureImageLoaded()` — the wrapper-bridge introduced earlier in this PR delegated `prewarm()` to `container.install()` and dropped the VM-ensure step. `start()` does ensure VM, but on cold boot `prewarm()` and `start()` race for the lifecycle lock and there is no guarantee which one wins. When `prewarm()` lands first, the image pull crashes against an unstarted VM and Hermes never comes up. Fix: `install()` now awaits `deps.vm.ensureReady()` before transitioning to `installing`. Errors land in `errored` exactly as before. New regression test pins the call order (`vm.ensureReady` → `loader.ensureImageLoaded`) so a future edit can't silently re-introduce the gap.	2026-05-08 08:14:45 +05:30
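The "lexical containment against declared mount roots" used by the path translator in the commit above can be sketched as below. This is an illustration under assumptions, not the `ManagedContainer` code: the `MountRoot` shape, the `rebase` and `normalizeAbsolute` helpers, and the host path in the usage note are hypothetical. `PathOutsideMountsError`, `toContainerPath`, and the `/data/agents/harness` target are named in the commit message. As the commit notes, a purely lexical check never touches the disk, so symlink escapes have to be caught elsewhere.

```typescript
// Assumed shape of a declared mount; not the project's actual type.
interface MountRoot {
  hostPath: string;
  containerPath: string;
}

class PathOutsideMountsError extends Error {
  constructor(path: string) {
    super(`path is outside declared mounts: ${path}`);
    this.name = "PathOutsideMountsError";
  }
}

// Lexically normalise an absolute POSIX path: drop "." and empty segments
// and resolve ".." without reading from disk.
function normalizeAbsolute(path: string): string {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// Re-root `path` from `from` to `to`, or return undefined when `path` is not
// lexically inside `from`. (A mount root of "/" is not handled in this sketch.)
function rebase(path: string, from: string, to: string): string | undefined {
  const p = normalizeAbsolute(path);
  const root = normalizeAbsolute(from);
  if (p === root) return normalizeAbsolute(to);
  if (p.startsWith(root + "/")) return normalizeAbsolute(to) + p.slice(root.length);
  return undefined;
}

function toContainerPath(hostPath: string, mounts: readonly MountRoot[]): string {
  for (const m of mounts) {
    const hit = rebase(hostPath, m.hostPath, m.containerPath);
    if (hit !== undefined) return hit;
  }
  throw new PathOutsideMountsError(hostPath);
}

function toHostPath(containerPath: string, mounts: readonly MountRoot[]): string {
  for (const m of mounts) {
    const hit = rebase(containerPath, m.containerPath, m.hostPath);
    if (hit !== undefined) return hit;
  }
  throw new PathOutsideMountsError(containerPath);
}
```

For example, with a single mount from an assumed host directory `/home/u/.browseros/vm/hermes/harness` onto `/data/agents/harness`, a host path ending in `/harness/agent-1/home` maps to `/data/agents/harness/agent-1/home`, while `/etc/passwd` or a path that climbs out with `..` raises `PathOutsideMountsError`.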
shivammittal274	7a2a8e09bc	feat(agent): add Hermes as 4th ACPX adapter (in-VM container, BrowserOS-managed providers) (#956 ) * feat(agent): add Hermes as a 4th ACPX adapter (Phase A) Adds Hermes Agent (NousResearch/hermes-agent) as a host-process ACPX adapter, mirroring the Claude Code pattern. - agent-types.ts: extend AgentAdapter union with 'hermes' - agent-catalog.ts: add Hermes catalog entry - lib/agents/hermes/prepare.ts (new): minimal prepare using prepareBrowserosManagedContext - acpx-agent-adapter.ts: register the adapter - acpx-runtime.ts: add 'hermes' branch returning 'hermes acp' (host) - AdapterIcon.tsx: add Hermes icon - db schema + supporting frontend types/literals updated for the new adapter Phase A scope: host-process only. Phase A.5 swaps to nerdctl exec into a Hermes container. OpenClaw is untouched. Verified by all 6 POC spikes (plans/features/claude-browseros-hermes-poc/findings.md). * fix(agent): address Hermes adapter review issues - NewAgentDialog: add 'hermes' to onValueChange guard so the dropdown option actually wires through onRuntimeChange/onHarnessAdapterChange (was a no-op before — selecting Hermes silently kept previous value) - tests/acpx-runtime: add coverage for the new 'hermes' registry branch - tests/acpx-agent-adapter: fold hermes prepare test into existing file, matching the pattern used for claude/codex/openclaw - Delete tests/lib/agents/hermes-prepare.test.ts (now redundant) - Reconcile install-mechanism comment between acpx-runtime.ts and agent-catalog.ts * fix(agent): make Hermes adapter actually work end-to-end Two surgical fixes uncovered while running the Phase A smoke test through the BrowserOS chat HTTP API: 1. lib/agents/hermes/prepare.ts — seed per-agent HERMES_HOME from the user's global ~/.hermes/ on first use. ensureAgentHome only writes SOUL.md and MEMORY.md; without seeding config.yaml, .env, and auth.json, hermes acp comes up unconfigured and either hangs or errors with "No LLM provider configured." 
Copy is idempotent (skip if dest exists) so subsequent prepare calls don't clobber per-agent edits. 2. lib/agents/acpx-runtime.ts — wrap the hermes spawn in `bash -c "exec hermes acp \| tee /dev/null"` to bridge Bun's socketpair-based child stdio with Python's asyncio.connect_write_pipe (which only drains correctly to a real pipe(2)). Without it, hermes' stdout never reaches the harness — verified by inspecting hermes process FDs: Bun gives the child unix sockets, asyncio queues writes that never become readable on Bun's end. With tee in the middle, hermes writes to a real pipe and tee bridges the bytes through the socket. Verified 2026-05-06 against hermes-agent 0.12.0 on macOS arm64 + Bun 1.3.6. Smoke-test result with both fixes: - ACP session created end-to-end - BrowserOS MCP wired (96 browser tools registered with hermes) - Reasoning + text streamed back through /agents/:id/sidepanel/chat - Final stream: text-delta "PONG", finishReason "stop" Updates the existing acpx-runtime test to assert the new spawn shape (bash -c, tee /dev/null bridge) so the workaround can't silently regress. * feat(agent): run Hermes adapter in Lima container (Phase A.5) Move Hermes ACPX adapter from host-process spawn to running inside docker.io/nousresearch/hermes-agent:v2026.4.30 in the existing BrowserOS Lima VM, mirroring the OpenClaw container pattern. 
Container lifecycle (api/services/hermes/hermes-container.ts): - prewarm: ensure VM ready, pull image (or skip if already in containerd), start an idle container with /bin/sh -c "exec sleep infinity" so the harness can nerdctl exec into it per turn - Tini bypassed — tini 0.19.0 in upstream image getopt-parses any -x token even after PROGRAM, breaking /bin/sh -c - --add-host host.containers.internal:<vm-gateway> so hermes inside the container can reach the BrowserOS HTTP MCP endpoint - Bind-mount <browserosDir>/vm/hermes/harness onto /data/agents/harness so per-agent HERMES_HOME dirs are visible to the container Spawn (acpx-runtime.ts): - HermesGatewayAccessor interface (mirrors OpenclawGatewayAccessor) - resolveHermesAcpCommand builds: env LIMA_HOME=... limactl shell --workdir / browseros-vm -- nerdctl exec -i -e PYTHONUNBUFFERED=1 -e HERMES_HOME=... <container> /opt/hermes/.venv/bin/hermes acp - Absolute path /opt/hermes/.venv/bin/hermes (not bare "hermes") since upstream image's PATH is set by its entrypoint script which we override to keep the container idle - Falls back to host-process spawn when no HermesGatewayAccessor wired (test path / dev fallback) - Drops the host-mode bash+tee workaround — limactl/SSH/nerdctl pipe chain is sufficient for asyncio's pipe writer MCP wiring: - New PreparedAcpxAgentContext.browserosMcpHost field threads through prepare → getRuntime → createBrowserosMcpServers - Hermes prepare sets browserosMcpHost='host.containers.internal' so the URL injected into newSession.mcpServers resolves from inside the container; other adapters keep '127.0.0.1' default Per-agent home (lib/agents/hermes/prepare.ts): - HERMES_HOME points at /data/agents/harness/<agentId>/home (in-container) - Host-side seedHermesHomeFromGlobal still copies ~/.hermes/{config.yaml, .env, auth.json} into the per-agent home; the volume mount makes them visible inside the container - New api/services/hermes/hermes-paths.ts holds host/container path helpers End-to-end smoke 
tests against the dev server (clean Lima state): - Plain text: PONG round-trip via /sidepanel/chat ✓ - Multi-turn context: RUBY-7421 stored + recalled ✓ - Multi-agent isolation: agent 2 doesn't see agent 1's secret ✓ - MCP tool execution: mcp_browseros_browseros_info fires ✓ - Image attachment via /chat: model identifies "Red" from a 128x128 PNG ✓ - Concurrent turns + 409 attachUrl: full attach streams the in-flight Pacific Ocean essay turn cleanly ✓ - Cancel midstream + recovery turn: ALIVE response ✓ - Persistence across server restart: agents survive ✓ Companion knowledge doc: plans/features/claude-browseros-hermes-acp-knowledge.md * feat(agent): per-agent provider/key for Hermes adapter Lets users create multiple Hermes agents each with its own provider, model, and API key. NewAgentDialog now shows provider/model/key fields inline when 'Hermes' is selected. On submit, the harness writes the per-agent <browserosDir>/vm/hermes/harness/<agentId>/home/{config.yaml, .env} directly so the agent has the right config from turn 1 — no dependency on the user having run `hermes setup` outside BrowserOS. The existing seedHermesHomeFromGlobal flow remains as a fallback for agents created without provider fields (e.g. via direct API or with an existing ~/.hermes/ install). 
Backend: - shared/constants/hermes.ts: HERMES_SUPPORTED_PROVIDERS registry (openrouter, anthropic, openai, custom — bedrock follow-up) - api/services/hermes/hermes-paths.ts: writeHermesPerAgentProvider - agent-harness-service: writes per-agent config.yaml + .env in createAgent when adapter=hermes and apiKey present - routes/agents.ts: relax modelId catalog validation for adapter=hermes (catalog has empty models[] by design; per-agent modelId is free-form) - tests/agent-harness-service: cover write + skip paths Frontend: - HermesProviderFields.tsx (new): provider dropdown, model field, API key + optional baseUrl when provider=custom - NewAgentDialog: render the new fields when adapter=hermes - agents-page-actions: thread fields through createHarnessAgent - AgentsPage / agent-harness-types: minor pass-through edits Smoke-tested end-to-end against the dev server (clean Hermes per-agent home, no ~/.hermes/ seed): create agent with apiKey + modelId, files written at the per-agent path with mode 0600, first chat returns the expected response, all without touching ~/.hermes/. * feat(agent): source Hermes provider config from BrowserOS LLM providers Replace the Hermes-specific provider/model/API-key form in New Agent with a chooser that pulls from the same global LLM providers OpenClaw uses (Settings → BrowserOS AI). Backend rejects creation with a 400 when the selected provider is missing required fields (apiKey, modelId, plus baseUrl for openai-compatible) or is not in the Hermes-supported set; the ~/.hermes/ fallback is removed so Hermes agents always carry their own per-agent config.	2026-05-07 21:54:36 +05:30
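The creation-time check described at the end of the commit above (reject with a 400 when the chosen provider lacks `apiKey`, `modelId`, or, for `openai-compatible`, `baseUrl`, or is outside the Hermes-supported set) could be sketched like this. The `LlmProvider` shape, the function name, the error strings, and in particular the contents of `ASSUMED_SUPPORTED_TYPES` are assumptions; the commit message does not list the supported BrowserOS provider types.

```typescript
// Assumed provider shape; field names follow the commit message.
interface LlmProvider {
  type: string;
  apiKey?: string;
  modelId?: string;
  baseUrl?: string;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; status: 400; error: string };

// Assumption: the real supported set lives in the project and may differ.
const ASSUMED_SUPPORTED_TYPES: readonly string[] = [
  "openrouter",
  "anthropic",
  "openai",
  "openai-compatible",
];

/** Hypothetical validator mirroring the rules stated in the commit message. */
function validateHermesProvider(provider: LlmProvider): ValidationResult {
  if (!ASSUMED_SUPPORTED_TYPES.includes(provider.type)) {
    return { ok: false, status: 400, error: `unsupported provider type: ${provider.type}` };
  }
  const missing: string[] = [];
  if (!provider.apiKey) missing.push("apiKey");
  if (!provider.modelId) missing.push("modelId");
  // baseUrl is only required for OpenAI-compatible endpoints.
  if (provider.type === "openai-compatible" && !provider.baseUrl) missing.push("baseUrl");
  if (missing.length > 0) {
    return { ok: false, status: 400, error: `missing required fields: ${missing.join(", ")}` };
  }
  return { ok: true };
}
```

Failing at creation time means a Hermes agent never reaches its first turn with an unusable per-agent config.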
shivammittal274	6f8da5b7fb	refactor(openclaw): TKT-788 cleanup (relanded, openclaw-only) — bump image, lock no-auth, delete observer + image bypass (#954 ) * refactor(openclaw): TKT-788 cleanup — bump image, lock no-auth, delete observer + image bypass Re-lands the openclaw-only changes from #934 (reverted in #953 because the original PR's working tree had stale rollback content for `packages/browseros/tools/patch/`). This commit is the same openclaw diff with zero changes outside `packages/browseros-agent/`. What changes (TKT-788 work-streams A + B + C): WS-A — bundled gateway no-auth: - Bump image from `ghcr.io/openclaw/openclaw:2026.4.12` to `ghcr.io/browseros-ai/openclaw:2026.5.2-browseros.1` (BrowserOS- pinned variant with the no-auth contract baked in). - Configure gateway with `auth.mode: 'none'`; remove the device-auth bootstrap dance that the older binary required. - Delete the per-call token plumbing the http-client / observer / chat- client carried (340 LOC). The harness still passes a stable token in headers for backwards-compat with code that hasn't been re-pointed yet, but it is no longer required by the gateway. WS-C — delete the image-attachment bypass: - The HTTP `/v1/chat/completions` carve-out for OpenClaw image turns is gone. Image attachments now ride through ACP as image content blocks (which acpx 0.6.x supports natively for openclaw, claude, codex). - Delete `openclaw-gateway-chat-client.ts` (211 LOC) and `image-turn.ts` (219 LOC). - Drop `maybeHandleTurn` from the `AcpxAgentAdapter` interface and the openclaw entry. `AcpxAdapterTurnInput` removed. - Drop the corresponding 'diverts OpenClaw image turns to the gateway chat client' test from `acpx-runtime.test.ts`. WS-B — replace the WS observer with harness events: - Delete `openclaw-observer.ts` (276 LOC) — no more parallel WS subscription, no more `new OpenClawObserver`, no more `ensureObserverConnected` / `observer.disconnect()` plumbing. 
- Wire `AgentHarnessService` to receive turn-lifecycle events from the runtime stream itself (`turnLifecycleListeners`) and feed ClawSession from those, preserving the dashboard SSE shape. Net: 314 insertions / 1144 deletions, all under `packages/browseros-agent/`. Typecheck clean across all 6 packages. 946 server tests pass (1 unrelated CDP-dependent test skipped — same state as origin/dev). Reference: TKT-788. The patch-CLI rollback that was in the squash of #934 is intentionally NOT in this commit. * fix(openclaw): handle 2026.5.4 acp-cli envelope shapes (media + injected timestamp) + bump image OpenClaw 2026.5.4 (the BrowserOS-pinned image variant with the no-auth handshake bypass needed for cron tool calls from inside ACP) introduced two new envelope prefix shapes that the post-bypass-deletion path now surfaces in user-message text: [media attached: <internal-path> (<mime>)] [<weekday> <YYYY-MM-DD HH:MM> <TZ>] [Working directory: <path>] <BrowserOS role envelope> The previous cleaner only matched a leading [Working directory: ...] \n\n line. With media + timestamp prefixes ahead of it the anchor no longer matched, so image-attachment user turns rendered with 8+ lines of envelope leak in the chat panel. Replaces the single OPENCLAW_WORKDIR_PREFIX with three content-shape- anchored patterns chained through stripOpenClawAcpCliEnvelope(): 1. [media attached: <path> (<mime>)] ← repeats per attachment 2. [<weekday> <YYYY-MM-DD HH:MM> <TZ>] ← injectTimestamp 3. [Working directory: <path>] ← acp-cli prefixCwd Each is anchored on its content shape (media attached:, weekday abbrev + ISO date, Working directory:) rather than just '[…]', so user-typed lines that happen to start with brackets are not eaten. Also bumps OPENCLAW_IMAGE from 2026.5.2-browseros.1 to 2026.5.4-browseros.1. 
The 5.2 image refused tool-side WS connections with 'device identity required' even though gateway auth.mode=none — PR #6 in browseros-ai/openclaw added the OPENCLAW_GATEWAY_PRIVATE_INGRESS_NO_AUTH bypass that ships in 5.4. Without 5.4, the cron tool (and any other tool that opens a fresh gateway WS from inside the embedded runner) fails with 1008. Verified end-to-end with the BrowserOS chat endpoint: - Plain text turn: clean - Image attachment turn: clean (was leaking 8 envelope lines pre-fix) - One-shot kind:at cron fires, PING fire renders clean - Second openclaw agent creates, runs, history isolated 15/15 history-mapper unit tests pass; typecheck clean across all packages.	2026-05-07 02:26:25 +05:30
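The three-pattern strip described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the function name `stripOpenClawAcpCliEnvelope` and the three prefix shapes come from the commit message. The regex bodies, the tolerance for one or more newlines after each prefix, and the assumption that paths contain no `]` are all guesses.

```typescript
// Hypothetical patterns: each is anchored on its content shape, not just "[...]".

// 1. "[media attached: <path> (<mime>)]", may repeat once per attachment.
const MEDIA_LINE = /^\[media attached: [^\]\n]+\]\n+/;
// 2. "[<weekday> <YYYY-MM-DD HH:MM> <TZ>]", the injected timestamp.
const TIMESTAMP_LINE =
  /^\[(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{4}-\d{2}-\d{2} \d{2}:\d{2} [^\]\n]+\]\n+/;
// 3. "[Working directory: <path>]", the acp-cli cwd prefix.
const WORKDIR_LINE = /^\[Working directory: [^\]\n]+\]\n+/;

function stripOpenClawAcpCliEnvelope(text: string): string {
  let rest = text;
  // Media lines repeat, so strip them until none is left at the front.
  while (MEDIA_LINE.test(rest)) {
    rest = rest.replace(MEDIA_LINE, "");
  }
  // Timestamp and workdir prefixes appear at most once each, in this order.
  rest = rest.replace(TIMESTAMP_LINE, "");
  rest = rest.replace(WORKDIR_LINE, "");
  return rest;
}
```

Because every pattern requires its specific content after the opening bracket, a user-typed line such as `[note] hello` passes through unchanged.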
shivammittal274	50cbe48558	Revert "refactor(openclaw): lock no-auth gateway, bump image, delete token pl…" (#953 ) This reverts commit `d81b99c8e3`.	2026-05-07 01:49:50 +05:30
shivammittal274	d81b99c8e3	refactor(openclaw): lock no-auth gateway, bump image, delete token plumbing (TKT-788 WS-A) (#934 ) * fix: disable bundled OpenClaw gateway auth * refactor(openclaw): delete token plumbing now that auth is locked off Builds on the cherry-picked spike (#933). With gateway.auth.mode=none locked in as the only path the bundled gateway runs, the BrowserOS-side token machinery becomes dead weight. This commit deletes: - OpenClawService: token field, tokenLoaded, gatewayAuthMode state machine, getGatewayToken(), getGatewayHttpToken(), ensureTokenLoaded(), refreshGatewayAuthToken(), loadTokenFromConfig() and all six lifecycle call sites. - OpenclawGatewayAccessor.getGatewayToken interface field. - OpenClawHttpClient / OpenClawGatewayChatClient: optional getToken constructor arg and authHeaders() helpers. - OpenClawObserver: gatewayToken field/parameter and the auth.token branch in the connect frame. - GatewayContainerSpec.gatewayToken and the OPENCLAW_GATEWAY_TOKEN env wiring; the OPENCLAW_GATEWAY_PRIVATE_INGRESS_NO_AUTH=1 env is now always set rather than conditional. Test suites: dropped bearer-token assertions and the two persisted-token tests in openclaw-service that asserted deleted behavior. Net: -310 LOC across src + tests, with 118 openclaw + acpx tests still green. Typecheck and biome clean. Reference: TKT-788 (move OpenClaw integration to ACPX runtime), WS-A. * refactor(openclaw): delete gateway image bypass, route image turns via ACP (TKT-788 WS-C) (#935) * refactor(openclaw): delete gateway image bypass, route image turns through ACP The browseros-ai/openclaw ACP bridge accepts image content blocks natively (extractAttachmentsFromPrompt at openclaw/src/acp/event-mapper.ts:92, forwarded via chat.send attachments at translator.ts:295), so the BrowserOS-side carve-out that diverted image-bearing turns to the gateway HTTP /v1/chat/completions endpoint is no longer needed. 
Deletes: - apps/server/src/api/services/openclaw/openclaw-gateway-chat-client.ts - The corresponding test file - AcpxRuntime.sendOpenclawViaGateway, persistGatewayTurn, recordToOpenAIMessages helpers - The image-attachment carve-out branch in AcpxRuntime.send - openclawGatewayChat option from AcpxRuntime + AgentHarnessService + agent routes ctor wiring - The randomUUID import (only the deleted helper used it) - The acpx-runtime test for the deleted carve-out Net: 614 LOC removed, 0 added, all 142 openclaw + acpx + agent tests still green. Reference: TKT-788, WS-C. Stacked on WS-A (#934). * refactor(openclaw): delete WS observer, feed ClawSession from harness events (#936) The openclaw-observer.ts WebSocket observer was a second tap on the same gateway events the AcpxRuntime already sees as ACP session/update notifications. Replace it with a pull from the AgentHarnessService's turn lifecycle stream — keeping ClawSession and the /openclaw/dashboard SSE endpoint shape unchanged for the BrowserOS UI. Changes: - AgentHarnessService: emit `turn_started` / `turn_event` / `turn_ended` to subscribers via a new `onTurnLifecycle(listener)` API. Wired around the existing `notifyTurnStarted/Ended` calls and inside the per-event read loop. - agents route: forward an optional `onTurnLifecycle` dep into the service it constructs. - server.ts: subscribe and route OpenClaw-adapter events to `OpenClawService.recordAgentTurnEvent(agentId, sessionKey, event)`. - OpenClawService: new `recordAgentTurnEvent` method that maps stream events to ClawSession transitions (working/idle/error + currentTool from `tool_call` events). Keeps the existing `onAgentStatusChange` / `getAgentState` / `getDashboard` API. - Delete `openclaw-observer.ts` (276 LOC) and all observer wiring (`new OpenClawObserver`, `ensureObserverConnected`, three `observer.disconnect()` call sites, the import). Net: 276 LOC removed from the observer; ~130 LOC added across harness event plumbing + recorder method. 
-146 LOC overall, all 141 tests still green, typecheck clean, biome clean. Reference: TKT-788, WS-B (Path 1: keep ClawSession + dashboard SSE shape). Independent of WS-A (#934) and WS-C (#935); will rebase on top of whichever lands first. --------- Co-authored-by: Nikhil Sonti <nikhilsv92@gmail.com>	2026-05-07 01:40:37 +05:30
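The observer replacement described in the commit above (subscribe to turn lifecycle events, then map them to session state) might look something like the sketch below. The event type names, `onTurnLifecycle`, and `recordAgentTurnEvent` are taken from the commit message. The event payload fields, the `TurnLifecycleHub` class, and the pure-reducer shape are hypothetical simplifications, and the real method takes `(agentId, sessionKey, event)`.

```typescript
// Hypothetical event shapes; only the three event type names come from the commit.
type TurnLifecycleEvent =
  | { type: "turn_started"; agentId: string }
  | { type: "turn_event"; agentId: string; event: { type: string; toolName?: string } }
  | { type: "turn_ended"; agentId: string; error?: string };

type TurnLifecycleListener = (event: TurnLifecycleEvent) => void;

interface SessionState {
  status: "idle" | "working" | "error";
  currentTool: string | null;
}

// Stand-in for the subscription side of the harness service.
class TurnLifecycleHub {
  private listeners: TurnLifecycleListener[] = [];

  // Registers a listener and returns an unsubscribe function.
  onTurnLifecycle(listener: TurnLifecycleListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(event: TurnLifecycleEvent): void {
    // Iterate over a copy so a listener may unsubscribe during dispatch.
    this.listeners.slice().forEach((l) => l(event));
  }
}

// Maps one lifecycle event to the next session state (working/idle/error + currentTool).
function recordAgentTurnEvent(state: SessionState, event: TurnLifecycleEvent): SessionState {
  if (event.type === "turn_started") {
    return { status: "working", currentTool: null };
  }
  if (event.type === "turn_event") {
    if (event.event.type === "tool_call") {
      return { status: state.status, currentTool: event.event.toolName ?? null };
    }
    return state;
  }
  // turn_ended: an error string marks the session as errored, otherwise idle.
  return { status: event.error ? "error" : "idle", currentTool: null };
}
```

Keeping the mapping a pure function of (state, event) makes the transitions easy to unit-test without a gateway connection.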
shivammittal274	86cb03a1fc	fix(openclaw): drop BrowserOS-envelope regexes in history mapper (#952 ) * fix(openclaw): drop BrowserOS-envelope regexes in history mapper Replace the four BrowserOS-side regex strips (`<role>`, `<user_request>`, `<system-reminder>`, `[Working directory:]`) in history-mapper with a single call to `unwrapBrowserosAcpUserMessage`. That helper is the same exact-string matcher acpx-runtime already uses for non-OpenClaw history paths (chat history endpoint, listing's `lastUserMessage`); it anchors on the exact constants `buildBrowserosAcpPrompt` writes, so matcher and wrapper travel together. Also drops two patterns that were defensive-only with no emit site in the codebase (`[Working directory:]` prefix and trailing `<system-reminder>` block), and updates the corresponding tests to use the realistic envelope shape `buildBrowserosAcpPrompt` actually produces. The OpenClaw-injected scaffolding patterns (cron prefix, queued- marker, subagent context) stay in place for now — replacing those needs either a side-channel cache keyed on cron job id or a structured `trigger` field on the gateway's history schema, tracked as a follow-up. * fix(openclaw): strip acp-cli's [Working directory:] prefix before BrowserOS unwrap The previous commit incorrectly removed the workdir-prefix strip on the assumption it was speculative defensive code with no live emit site. Actually emitted by OpenClaw's acp-cli (`/app/dist/acp-cli-.js` line 1361, `prefixCwd ? \`[Working directory: ${displayCwd}]\\n\\n...` style), so live history rendering regressed: every user message surfaced with a `[Working directory: /Users/...]\\n\\n<role>...` envelope intact. Restore the strip as an exact-shape line match (`^\\[Working directory: [^\\]]\\]\\n\\n`) anchored on the closing bracket + double-newline so path content is consumed without a content-shape regex. 
Apply it ahead of `unwrapBrowserosAcpUserMessage` so the BrowserOS unwrap's `^<role>` anchor can match the now-leading envelope. Also fix the test fixture: the BrowserOS unwrap performs exact-prefix match against the full `BROWSEROS_ACP_AGENT_INSTRUCTIONS` constant — truncated `<role>...` test bodies didn't match. Tests now use the verbatim constant text via a shared `ROLE_BLOCK` helper. Verified live: 8/8 history entries render with no envelope leaks.	2026-05-06 23:54:09 +05:30
shivammittal274	7765d99c73	feat(openclaw): aggregate sub-session history into agent main session (#939 ) * feat(openclaw): aggregate sub-session history into agent's main session Cron-triggered (and hook/channel-triggered) runs land in their own ephemeral session files under the parent agent's directory: /home/node/.openclaw/agents/<agentId>/sessions/<runId>.jsonl The chat panel reads agent:<id>:main, so autonomous runs were invisible in history even though they fired and persisted on disk. This change makes `getSessionHistory(agent:<id>:main)` enumerate every session under that agent (via the existing `sessions.list` gateway RPC) and merge their messages into one chronological response. Each merged message is tagged with `source` (main / cron / hook / channel) and the sub-session's key, so the UI can render section markers without re-parsing. Filesystem isolation is enforced upstream — `sessions.list({ agentId })` resolves to that agent's directory only (browseros-ai/openclaw src/config/sessions/combined-store-gateway.ts:90), so no cross-agent leakage is possible. Behavior: - Main session keys (`^agent:[^:]+:main$`) → aggregate - Any other key → existing single-session behavior - Sub-session fetch failures → logged + dropped (partial timeline preferable to a hard failure that hides main) - `limit` applied post-merge across the unified timeline - Streaming variant (`Accept: text/event-stream`) unchanged for now Reuses the pre-existing `cliClient.listSessions` and `httpClient.getSessionHistory` — no new gateway integration. Validation: - bun typecheck clean - bunx biome check clean - 44 openclaw service + route tests pass * feat(openclaw): wire chat panel history through gateway aggregation Adds the missing seam between the chat panel's history fetch and OpenClawService's aggregated history. 
Before this change: - Chat panel calls GET /agents/<id>/sessions/main/history - AgentHarnessService.getHistory delegates to AcpxRuntime.getHistory - AcpxRuntime reads ~/.browseros-dev/agents/acpx/sessions/<key>.json - That local file is only written by AcpxRuntime.send (user turns) - Cron / hook / channel turns persist on the gateway side instead - Panel sees user turns only; autonomous turns are invisible After this change: - OpenClawProvisioner gains optional getAgentHistory(agentId) method - AgentHarnessService.getHistory branches on adapter — for openclaw, routes through the provisioner instead of the runtime - server.ts wires the provisioner method to call OpenClawService.getSessionHistory("agent:<id>:main") which already aggregates main + every sub-session - New history-mapper.ts converts OpenClaw rich content blocks (text/thinking/toolCall/toolResult) into AgentHistoryEntry shape the chat panel consumes Layering preserved: - AcpxRuntime untouched, still generic, zero services/openclaw imports - AgentHarnessService still talks only to abstract OpenClawProvisioner - server.ts is the single concrete-binding seam (same place that wires createAgent, removeAgent, getStatus) - Other adapters (claude, codex) keep their existing local-file history path — no behavior change for them Tool-call pairing: assistant `toolCall` blocks are stored by toolCallId; subsequent `toolResult` (role: 'tool') messages mutate the same AgentHistoryToolCall reference to attach output / error, so the UI renders complete tool entries instead of orphan inputs. Net: +240 LOC, 1 new file, AcpxRuntime untouched, 117 tests still pass. * feat(openclaw): paginate aggregated history + strip prompt scaffolding Two follow-ups on the aggregation work, both required for the chat panel to render OpenClaw history cleanly. 1. 
Compound-cursor pagination across sub-sessions The previous aggregation always returned the full merged window with cursor=null/hasMore=false, which broke "load more" in the chat panel once an agent's history grew beyond a single page (every cron job spawns a sub-session, so this hits quickly). Per-session cursor support already exists on the gateway HTTP endpoint (`session-history-state.ts:paginateSessionMessages`). The aggregator now threads each session's cursor through and emits a compound cursor encoding `{<sessionKey>: messageSeq | null}`, base64url JSON. A `null` slot means the session is exhausted; subsequent pages skip it. The gateway records the per-session monotonic seq inside the `__openclaw.seq` extension envelope rather than the top-level `messageSeq` field; the cursor reads from there. The wire-shape type gains an optional `__openclaw?: { id?, seq? }` field reflecting that. 2. Strip OpenClaw + BrowserOS scaffolding from history user messages Cron-fired user messages on the gateway side carry an OpenClaw template: [cron:<uuid> <name>] <payload> Current time: ... Use the message tool if you need to notify the user directly with an explicit target. ... BrowserOS-initiated turns carry the ACP system prefix: [Working directory: ...] <role>...</role> <user_request> <actual user text> </user_request> <system-reminder>...</system-reminder> Both surface verbatim in the chat panel today. Add `cleanHistoryUserText` (in history-mapper) which extracts: - the cron payload (and drops the trailer) - the user_request body (and drops the role / working-dir / system-reminder envelopes) Non-matching text falls through unchanged so future patterns we don't recognize stay visible rather than getting silently dropped. Verified end-to-end: - /agents history endpoint now returns clean text per item - Pagination cursor advances across pages with correct seq ordering - Chat panel renders messages as `print('hello')`, `hey`, etc. 
(no leaked envelopes or trailers) - 8 new unit tests for cleanHistoryUserText + the converter, + 86 existing openclaw tests still pass * feat(openclaw): handle queued-marker concatenation in history cleaner When multiple cron prompts (or any prompts) arrive while a turn is still active, BrowserOS's harness queue concatenates them into a single user message joined by a marker line: [Queued user message that arrived while the previous turn was still active] That blob renders as one wall of text in the chat panel — and worse, the cron-prompt cleaner doesn't fire because the message no longer starts with `[cron:...]`. cleanHistoryUserText now splits on the queued-marker line and runs each chunk through the per-message cleaner (cron-prompt extraction or BrowserOS-prefix unwrap), then joins the non-empty results with single newlines so each prompt renders as its own visually distinct line. Verified live: a 6926-char queued blob containing five concatenated [cron:...] prompts now renders as five short `print('hello')` lines. + 2 unit tests covering split + leading-marker edge case. * feat(openclaw): drop subagent context + reasoning-only assistant turns Two new patterns surfaced during e2e cron testing. 1. [Subagent Context] prefix: when an OpenClaw agent invokes a nested subagent, the subagent's session is seeded with a user message: [Subagent Context] You are running as a subagent (depth N/M). ... Begin. Your assigned task is in the system prompt under Your Role. The actual task lives in the subagent's system prompt; the user message body is pure scaffolding. cleanHistoryUserText now returns empty for these so the converter drops the entry — no empty bubble. 2. Reasoning-only assistant turns: MiniMax with thinking:minimal often returns content with only `thinking` blocks and no `text` block on trivial prompts ("Print hello"). The empty text bubble plus dangling reasoning collapsible reads as a broken UI. 
The converter now skips any entry where text is empty AND there are no tool calls (regardless of reasoning). Trade-off: reasoning-only turns lose their reasoning collapsible. The alternative (empty-bubble cards) is worse. If we want to preserve the reasoning, surface it as the bubble's text — separate UI decision for later. + 3 unit tests covering both patterns.	2026-05-06 00:15:57 +05:30
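The compound cursor from the pagination follow-up above (a map of session key to last seen seq, with `null` for exhausted sessions, serialized as base64url JSON) could be encoded and decoded along these lines. The helper names `encodeCursor`, `decodeCursor`, and `activeSessionKeys` are hypothetical, as is the choice to return `null` for a malformed token. The sketch relies on the `btoa`/`atob` globals, which exist in browsers, Bun, and recent Node versions.

```typescript
// Session key -> last delivered seq; a null slot marks the session as exhausted.
type CompoundCursor = Record<string, number | null>;

// Escape non-ASCII so btoa (Latin-1 only) never throws; JSON.parse undoes the escapes.
function toAsciiJson(value: unknown): string {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4));
}

function toBase64Url(ascii: string): string {
  return btoa(ascii).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromBase64Url(token: string): string {
  const b64 = token.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the padding that toBase64Url stripped.
  return atob(b64 + "===".slice((b64.length + 3) % 4));
}

function encodeCursor(cursor: CompoundCursor): string {
  return toBase64Url(toAsciiJson(cursor));
}

// Returns null for anything that is not a well-formed compound cursor.
function decodeCursor(token: string): CompoundCursor | null {
  try {
    const parsed: unknown = JSON.parse(fromBase64Url(token));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    const record = parsed as Record<string, unknown>;
    const out: CompoundCursor = {};
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (value !== null && typeof value !== "number") return null;
      out[key] = value as number | null;
    }
    return out;
  } catch {
    return null;
  }
}

// Sessions that still have pages left; exhausted (null) slots are skipped.
function activeSessionKeys(cursor: CompoundCursor): string[] {
  return Object.keys(cursor).filter((key) => cursor[key] !== null);
}
```

A caller would decode the incoming token, fetch the next page only for `activeSessionKeys(...)`, and encode the updated map as the next cursor.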
Dani Akash	db5e55a174	feat(agent-files): expose openclaw produced files inline + outputs rail (#946 ) * feat(server): foundation for OpenClaw agent file-output attribution Phase 1 of TKT-762 — surface files OpenClaw agents produce as artifacts inline in chat + a per-agent Outputs rail. This commit lays the storage + I/O foundation only; turn-lifecycle wiring, HTTP routes, and UI follow in subsequent phases. - New `produced_files` Drizzle table (FK→agent_definitions with cascade, unique on (agent, path) so re-modifications upsert). Migration 0002_chemical_whirlwind.sql. Adapter-agnostic schema — V1 only enables the watcher for openclaw, V2 can plug Claude / Codex into the same table without migrating. - `ProducedFilesStore` — snapshot/finalize-turn diff API plus by-turn / by-agent queries and a path-resolver that enforces workspace-root containment for the download / preview routes. - `walkWorkspace` — bounded recursive workspace walker; skips symlinks (no host-fs smuggling), excludes node_modules / .git / .cache, hard-capped at 50k entries / depth 16. - `file-preview` helper — extension + magic-byte MIME detection, bounded text-snippet reader (1 MB cap), inline image base64 reader (4 MB cap). Streaming download path lives in the route layer (next phase) — this module only handles the small in-memory reads the preview UX needs. * feat(server): attribute openclaw turn outputs to the harness layer Phase 2 of TKT-762 — wire the per-turn workspace diff into the single dispatch path that owns every turn's lifecycle. Two prior wiring points the original plan named (the OpenClaw HTTP chat route + OutboundQueueService.tryDispatch) were collapsed in dev into agent-harness-service.runDetachedTurn — both direct sends and queued sends route through it now, so a single hook covers both. The old `OutboundQueueService` is gone; its successor `message-queue.ts` re-enters runDetachedTurn for the queued case, so we still only need to bracket once. 
Changes: - New `produced_files` variant on `AgentStreamEvent` so the inline artifact card has a wire-format hook independent of the REST API. - `ProducedFilesStore` gains `resolveAgentDefinitionId` to bridge gateway-side openclaw agent names to the harness's `agent_definitions.id`, handling both the reconciled-row shape (id == openclaw name) and the BrowserOS-created shape (id = oc-<uuid>, name = openclaw display name). - `AgentHarnessService.runDetachedTurn`: snapshot the openclaw workspace before `runtime.send(...)`, finalize the diff in the outer finally, push the resulting rows as a `produced_files` event. Adapter-gated to openclaw only — Claude / Codex agents write to the user's own filesystem and don't need attribution. - Skip attribution on user-cancel (`abort.signal.aborted`) so the side effects of an aborted turn don't get surfaced as "outputs you asked for." On runtime errors we still attribute, because partial outputs are what the user is most likely to want to recover. - Lazy-init the store via `tryGetProducedFilesStore()` so tests that swap in a fake `agentStore` don't trip the process-wide `getDb()` initialisation guard. - File attribution extracted into `attributeTurnFiles` helper to keep `runDetachedTurn`'s cognitive complexity under the lint ceiling. Verifications: - Server tsgo --noEmit clean for changed files. - 162/162 server-api tests pass. - Biome lint clean on all three changed files. * feat(server): expose produced-files HTTP API for /agents Phase 3 of TKT-762 — surface the rows Phase 2 attributes via four read-only endpoints under the existing `/agents` router. Mounted where the agents page already polls so the rail UI doesn't add a second router/origin to its trust boundary. Routes: - GET /agents/:agentId/files Outputs-rail data, grouped by the assistant turn that produced each batch, newest first. `?limit=` clamps to N rows server-side (default 200). 
- GET /agents/:agentId/files/turn/:turnId Per-turn refresh — used by the inline-card consumer to rebuild metadata after the SSE `produced_files` event lands, and by direct fetches that missed the live event. - GET /agents/files/:fileId/preview Discriminated `FilePreview` JSON: text snippet (≤1MB), base64 image (≤4MB), pdf metadata, or `binary` placeholder when neither preview path applies. 404 when the file id is unknown OR the on-disk file disappeared after attribution. - GET /agents/files/:fileId/download Streams raw bytes via `Bun.file().stream()` with `Content-Disposition: attachment` and the detected MIME type. The fileId is opaque — the server resolves the agent and on-disk path; the client never sees a path, so traversal is impossible by construction. Service layer: - `AgentHarnessService` gains `listAgentFiles`, `listAgentFilesForTurn`, `previewProducedFile`, and `resolveProducedFileForDownload`. All four are no-ops for claude / codex adapters (they return null/[]) so the route contract stays uniform across adapters even though only openclaw produces rows in v1. - New `ProducedFileEntry` and `ProducedFilesRailGroup` DTOs — trimmed wire shapes that strip `agentDefinitionId` and `sessionKey` from the on-disk row. Verifications: - Server tsgo --noEmit clean for changed files (only pre- existing `Bun` global warning). - 162/162 server-api tests pass. - Biome clean on both changed files. Smoke-test instructions for the route shape live in the plan under §6 and §8; full end-to-end smoke happens in Phase 6. * feat(agent): client-side hooks + types for agent file outputs Phase 4 of TKT-762 — frontend foundation for the inline artifact card and the per-agent Outputs rail. UI components themselves land in Phase 5; this commit only adds types, hooks, and shared helpers so the wiring is in place when the components arrive. 
New module: `apps/agent/lib/agent-files/` - `types.ts` — `ProducedFile`, `ProducedFilesRailGroup`, and the discriminated `FilePreview` union, mirrored from the server-side DTOs in `apps/server/src/api/services/agents/agent-harness-service.ts`. The `agentDefinitionId` / `sessionKey` columns on the on-disk rows deliberately do NOT exist at the type boundary — clients refer to files by opaque `id`. - `file-helpers.ts` — pure helpers: `inferFileKind` (icon routing), `formatFileSize`, `extensionOf`, `basenameOf`, `buildFileDownloadUrl`. No React, no fetch, no DOM — anything stateful belongs in the hooks. - `useAgentOutputs.ts` — `useAgentOutputs(agentId)` for the rail, `useAgentTurnFiles(agentId, turnId)` for the inline card, `useInvalidateAgentOutputs()` for the chat-stream-completion hook (Phase 5 will plumb this), and `useRefreshAgentOutputs()` for the rail's manual refresh button. - `useFilePreview.ts` — `useFilePreview(fileId)` with `staleTime: Infinity` (previews are immutable for a given id; no point refetching on focus). Always opt-in (`enabled`) — the preview only loads when the user clicks a row. - `index.ts` — barrel re-export so consumers import from one path. Touched in `apps/agent/entrypoints/app/agents/`: - `agent-harness-types.ts` — added `produced_files` variant + the `HarnessProducedFile` type to `AgentHarnessStreamEvent`. Mirrors the server-side change from Phase 2 so the client SSE consumer type-narrows correctly. - `useAgents.ts` — exported the previously-private `agentsFetch` helper and the `AGENT_QUERY_KEYS` registry so the agent-files hooks reuse them without duplicating fetch / key conventions. Three new keys added: `agentOutputs`, `agentTurnFiles`, `filePreview`. Verifications: - Agent tsgo --noEmit clean. - Biome clean on all touched files. 
* feat(agent): inline artifact card + per-agent outputs rail Wires the chat surface to the produced-files API shipped earlier: - Inline artifact card under each assistant turn that produced files, populated by the live `produced_files` SSE event (resumes also stamp `turnId` so a missed live event can fall back to the per-turn fetch). - Collapsible right-side Outputs rail on the agent conversation page, grouped by turn, with Refresh + per-agent open/close persistence in localStorage. Gated to openclaw adapters in v1. - Shared file preview Sheet branches on the FilePreview union: text snippet (markdown for `.md`/`.mdx`, otherwise pre+code), image data URL, and download-only fallback for pdf/binary/missing. - Conversation hook invalidates the rail's React Query cache from its finally block so newly attributed files appear without a manual refresh. * feat(agent-files): polish — symlink-safe paths + toast on failures - `resolveFilePath` now rejects symlink-escapes from the workspace by realpath-resolving both endpoints and re-checking containment. Lexical traversal (`..` segments) still fails fast without touching the filesystem. - Added `produced-files-store.test.ts` with 6 path-resolution cases including a symlink whose target lives outside the workspace root — the prior string-only check would have allowed this. - File preview Sheet: surfaces preview-load failures in a toast (in addition to the inline error block, which is easy to miss when the body has scrolled). Download button now intercepts the click so a missing baseUrl shows a toast instead of silently hiding the button. - Outputs rail: refresh failures fire `toast.error` with the underlying message. * fix(agent-files): drop duplicate `/agents` prefix from client paths `agentsFetch` / `buildAgentApiUrl` already prepend `/agents`, but the file-output hooks were passing fully-qualified paths (`/agents/<id>/files`, `/agents/files/<id>/preview`, etc.) which resolved to `/agents/agents/...` and 404'd. 
Fixed the four call sites to pass paths relative to the `/agents` root. * fix(agents): strip openclaw role envelope from chat history PR #924 introduced a second `<role>…</role>` prefix for openclaw turns — a single-line block distinct from the multi-line BrowserOS role TKT-774 wired the unwrap against. Because TKT-774's `stripOuterRoleEnvelope` matched the BrowserOS prefix exactly, the openclaw envelope sailed through unstripped and user messages on openclaw agents rendered the full preamble in /sessions/main/history responses. Make the strip adapter-agnostic: any `<role …>…</role>\n\n<user_request>\n…\n</user_request>` shape gets unwrapped. Drops the now-unused BROWSEROS_ACP_AGENT_INSTRUCTIONS constant and adds a regression test that uses the openclaw form verbatim. * feat(agent-files): inline file-card strip with rail deep-link Replaces Phase 5's row-list ArtifactCard with a horizontal strip of small file cards under any assistant turn that produced files. Click a card → opens the FilePreviewSheet directly (preview + download). Click View / +N → opens the per-agent Outputs rail and scrolls / expands the matching turn group. The card strip: - Caps at 4 visible cards; remainder collapses into a +N pill that shares the View handler. - Owns its own FilePreviewSheet instance (parallel to the deprecated ArtifactCard) so the per-card preview path doesn't fight with the rail's Sheet. - Hidden during streaming and absent when producedFiles is empty. - Adapter-gated upstream: AgentCommandConversation only passes the open-rail callback when adapter==='openclaw', so claude / codex agents render no rail-opening affordance. Rail changes: - Accepts focusTurnId + onFocusTurnConsumed; the matching RailTurnGroup expands and scrollIntoView's on focus, then fires the consumed callback so the parent can drop the URL state. - ?outputsTurn=<turnId> deep-links work: external nav opens the rail, sets focusTurnId, and clears the param after consumption. 
ArtifactCard is marked @deprecated; remove in a follow-up once nothing imports it. * fix(agent-files): keep file-card strip visible after history reload After Phase 7 the inline FileCardStrip vanished as soon as a turn finished: `filterTurnsPersistedInHistory` dropped the optimistic turn once history reloaded, and history items don't carry `producedFiles`. So the user could see a file produced inside an assistant message but no card to open it. Two fixes in tandem so the strip survives both the just-finished case AND a fresh page load: - New `selectStripOnlyTurns` keeps persisted turns that still carry `producedFiles`. `ConversationMessage` learns a `stripOnly` mode that renders only the trailing strip (no duplicate user/assistant bubbles, since those are rendered by `ClawChatMessage`). - `AgentCommandConversation` now also calls `useAgentOutputs` and passes `tailStripGroups` to `ClawChat`. Each rail group not already covered by a live or strip-only turn renders as its own tail `FileCardStrip` after history. Dedup keys on `turnId` so the same turn never doubles up. Adapter-gated upstream — claude / codex agents skip the useAgentOutputs fetch entirely. The card click still opens the preview Sheet directly; View / +N still deep-link to the rail at the matching turn group. * fix(agent-files): per-turn association + cache invalidation Two fixes for the inline file-card strip: 1. Strips were stacking at the conversation tail because every produced-files group rendered as a tail strip after history. New `mapHistoryToProducedFilesGroups` matches each group to the assistant history message that came from its turn — by `group.turnPrompt` vs the first non-blank line of the preceding user message — and ClawChat renders the strip directly under that bubble. Groups that don't match any history pair (orphans) still fall through to the tail. 2. 
`useInvalidateAgentOutputs` was passing `undefined` as the baseUrl placeholder to `invalidateQueries({ queryKey })` — react-query's positional partial-match doesn't treat undefined as a wildcard, so the cache stayed stale until the query refetched on its own (e.g. window focus). Switched to predicate-based invalidation that matches by [agentOutputs marker, agentId] regardless of baseUrl. Same for the per-turn files key. Net effect: send a turn that produces files → strip appears under the just-finished assistant message; reload the page → strips still appear under the right bubbles, not bunched at the bottom. * fix(agent-files): review feedback — name guard, RFC 5987, limit cap Three review-flagged issues: 1. Path traversal via agent display name — `getHostWorkspaceDir` accepted any string and `path.join`'d it, so a name like `../../tmp` escaped `.openclaw`. The pre-turn snapshot would then walk that escaped directory and attribute every file to the new turn; resolveSafeWorkspacePath's containment check is relative to the same escaped root so it would later serve arbitrary host paths. Added `isAgentWorkspaceNameSafe` (rejects `..`, separators, control chars, leading dots, empty); the builder now throws on unsafe names plus a defensive realpath-style containment check after the join. Harness wraps the call so the path-traversal trip just disables file attribution for the turn instead of failing the whole send. Six-case regression test pinned. 2. `encodeRfc6266Filename` JSDoc claimed an RFC 5987 `filename=UTF-8''<percent-encoded>` fallback but the impl only stripped CRLFs/quotes. Now actually emits the fallback when non-ASCII is present; helper returns the full `filename="…"; filename=UTF-8''…` attribute pair so the call site doesn't have to wrap in quotes. 3. `/agents/:agentId/files` `?limit=` was forwarded to the DB uncapped — extracted `parseAgentFilesLimit` that clamps to [1, 500] before forwarding. 
Also extracted `resolveSafeWorkspaceDir` + `snapshotWorkspaceForTurn` helpers off `runDetachedTurn` so the new safety branch doesn't push it past biome's cognitive-complexity cap.	2026-05-05 19:48:28 +05:30
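Two of the review fixes above, the workspace-name guard and the `?limit=` clamp, are small enough to sketch. The function names `isAgentWorkspaceNameSafe` and `parseAgentFilesLimit`, the rejection rules, the `[1, 500]` range, and the default of 200 all come from the commit message. The bodies below are assumptions, in particular the fallback to the default for empty or non-integer input.

```typescript
// Rejects names that could escape the workspace root when joined onto it.
function isAgentWorkspaceNameSafe(name: string): boolean {
  if (name.length === 0) return false;                  // empty
  if (name.charAt(0) === ".") return false;             // leading dot (also "." and "..")
  if (name.indexOf("..") !== -1) return false;          // parent-directory segments
  if (/[\/\\]/.test(name)) return false;                // path separators
  if (/[\u0000-\u001f\u007f]/.test(name)) return false; // control characters
  return true;
}

// Clamps the raw query value to [1, 500]; assumed to fall back to 200 when
// the value is missing, empty, or not an integer.
function parseAgentFilesLimit(raw: string | undefined, fallback: number = 200): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!isFinite(parsed) || Math.floor(parsed) !== parsed) return fallback;
  return Math.min(500, Math.max(1, parsed));
}
```

The name guard is a lexical check only; the commit also mentions a containment check after the join, which this sketch leaves out.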
Nikhil	d61d6fc8a9	feat: add ACPX agent runtime adapters (#924 ) * feat: add acpx claude runtime paths * feat: add acpx adapter preparation * refactor: use acpx adapter preparation * refactor: move openclaw image turns to adapter * fix: keep openclaw independent of host cwd * fix: address acpx review feedback * fix: preserve claude host auth in acpx	2026-05-04 11:04:24 -07:00
shivammittal274	d383b5e344	feat(eval): add claude-generated run report artifact (#892 ) * feat(eval): add claude-generated run report artifact * fix(eval): install claude code cli for CI evals * fix(eval): bypass claude code tool permissions * Eval metrics configs (#932) * feat(eval): add agisdk comparison metrics configs * fix(eval): keep cdp crashes from aborting run	2026-05-04 21:09:06 +05:30
Nikhil	0d56815cba	fix: store server database under BrowserOS dir (#923 ) * fix: store server database under browseros dir * fix: address PR review feedback for 923	2026-05-02 16:03:41 -07:00
Nikhil	c07d3d95d4	feat: add sqlite drizzle persistence (#919 ) * feat: add drizzle agent schema * feat: run sqlite drizzle migrations * refactor: remove old sql identity dependency * feat: store harness agents in sqlite * build: package db migrations * refactor: remove sqlite oauth token store * feat: restore oauth token storage * fix: handle empty install id * chore: ignore server runtime state * fix: address review feedback for PR 919	2026-05-02 15:19:57 -07:00
Nikhil	1d42a973ea	refactor: extract acpx runtime templates (#918 )	2026-05-02 14:03:15 -07:00
Nikhil	921a797c5b	feat: add ACPX agent soul and memory support (#917 ) * feat: add acpx agent runtime context helpers * feat: add acpx runtime state store * feat: prepare acpx agent runtime context * feat: inject acpx agent command environment * feat: forward acpx agent chat cwd * fix: normalize acpx session record fallback * feat: improve acpx agent soul and memory prompts * fix: address PR review comments for memory-soul-acp * fix: satisfy acpx runtime deepscan checks	2026-05-02 13:45:40 -07:00
Nikhil	d94597bbf9	fix(agent): add CLI model catalog entries (#915 ) * fix(agent): add CLI model catalog entries * fix: address PR review comments for acpx-models	2026-05-02 13:06:41 -07:00
Dani Akash	974e7e9b86	fix(agents): hide BrowserOS ACP envelope from chat history payloads (TKT-774) (#907 ) * fix(agents): hide BrowserOS ACP envelope from chat history payloads (TKT-774) The user-message text persisted on the wire carried two nested envelopes — the outer `<role>You are BrowserOS…</role>` + `<user_request>…</user_request>` block from buildBrowserosAcpPrompt and the inner `## Browser Context` + `<selected_text>` + `<USER_QUERY>` block from formatUserMessage. PR #856 had unwrapped only the outer envelope on history reads, so the user bubble in the agent rail still rendered the inner envelope, and the LLM chat-service path leaked the wrapper all the way back to the sidepanel client through AI SDK's stream sync. Two surgical fixes, both server-only: 1) ACP path (acpx-runtime.ts) — replace unwrapBrowserosAcpPrompt with a comprehensive unwrapBrowserosAcpUserMessage that strips both layers and decodes the </>/& escapes the server applied via escapePromptTagText. Each step is independently defensive (anchors that don't match are skipped) so the helper is idempotent and tolerates partial / older / future-shape envelopes. Applied in userContentToText (history mapper) and inherited by extractLastUserMessage (listing's lastUserMessage). 2) LLM chat path (chat-service.ts) — split the persisted user message from the prompt-time copy. session.agent.appendUserMessage now stores the raw user text; a transient promptUiMessages array is built with the wrapped (formatUserMessage + context-change prefix) form and passed to createAgentUIStreamResponse for the model. onFinish restores the raw form before persisting, so the user-visible message and any future history reads see only the user's typed text. 
Tests: - acpx-runtime.test.ts: new dedicated unwrapBrowserosAcpUserMessage suite covering fully-wrapped messages, only-outer / only-inner inputs, selected_text blocks with attribute strings, idempotency, literal user-typed angle-bracket round-trip, and an integration test that round-trips the real formatUserMessage output through the unwrap to pin the writer/reader contract. - chat-service.test.ts: existing 'rebuilds a managed-app session' test updated for the new behaviour — asserts the persisted user message is the raw text and the prompt copy passed to the agent carries the Klavis context-change notice. * fix(agents): decode entity escapes before stripping inner envelope (TKT-774) The unwrap was running its inner-envelope strips against the literal-tag form (`<USER_QUERY>`, `<selected_text>`) but the persisted payload has those tags entity-escaped (`&lt;USER_QUERY&gt;`, `&lt;selected_text&gt;`) — buildBrowserosAcpPrompt runs escapePromptTagText over the entire formatUserMessage payload before adding the outer <role>+<user_request> envelope, so the inner anchors never matched against the on-disk text and the user was still seeing `&lt;USER_QUERY&gt;` in /agents/:id/sessions/main/history responses. Reorder unwrapBrowserosAcpUserMessage to: outer-strip → decode entities → inner-strips. Test fixtures updated to reflect the actual on-wire form (escaped inner tags); the round-trip test duplicates the escape rule inline so the contract between buildBrowserosAcpPrompt and the unwrap is pinned end-to-end.	2026-05-01 19:42:48 +05:30
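The reordering described in the commit above (outer-strip, then decode entities, then inner-strips) can be illustrated with a small round trip. Everything here is a simplified stand-in: the helper names, the envelope text, and the escape rule (`&`, `<`, `>` to named entities) are assumptions, and the real unwrapBrowserosAcpUserMessage handles more shapes than this.

```typescript
// Hypothetical writer/reader pair; not the repository's implementation.

// Assumed escape rule: "&" must be escaped first so it is not double-encoded.
const escapeTagText = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Decode in the opposite order: "&amp;" last, so "&amp;lt;" becomes "&lt;", not "<".
const decodeTagText = (s: string): string =>
  s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");

// Writer: inner envelope is escaped as a whole, then wrapped in the outer one.
function wrapUserMessage(userText: string): string {
  const inner = `## Browser Context\n<USER_QUERY>\n${userText}\n</USER_QUERY>`;
  return `<role>You are BrowserOS</role>\n<user_request>\n${escapeTagText(inner)}\n</user_request>`;
}

// Reader: outer-strip -> decode entities -> inner-strip.
function unwrapUserMessage(stored: string): string {
  const outer = /<user_request>([\s\S]*?)<\/user_request>/.exec(stored);
  if (outer === null) return stored; // not wrapped: leave the text alone
  const decoded = decodeTagText(outer[1] ?? "");
  const inner = /<USER_QUERY>([\s\S]*?)<\/USER_QUERY>/.exec(decoded);
  return (inner === null ? decoded : inner[1] ?? "").trim();
}
```

In this sketch the stored text never contains a literal `<USER_QUERY>` tag, which is why an inner strip that runs before the decode step finds nothing to remove.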
Nikhil	fd5aba249b	fix: stabilize OpenClaw gateway startup (#888 ) * feat(server): add shared process lock helper * feat(container): add container name reconciliation helpers * feat(openclaw): serialize lifecycle across processes * fix(openclaw): reconcile fixed gateway container startup * test(openclaw): cover lifecycle race recovery * fix(server): satisfy process lock error override * fix(openclaw): address review feedback * test(openclaw): align serialization mock with image check	2026-04-30 11:31:40 -07:00
Nikhil	492f3fcdf2	feat(openclaw): prewarm ghcr image in vm (#887 ) * feat(openclaw): add gateway image inspection * feat(openclaw): pull gateway image from registry * refactor(vm): decouple readiness from image cache * refactor(openclaw): remove vm cache from runtime factory * feat(openclaw): detect current gateway image * feat(openclaw): prewarm vm runtime and reuse current gateway * feat(openclaw): prewarm runtime on server startup * refactor(vm): remove browseros image cache runtime * refactor(build-tools): remove openclaw tarball pipeline * chore: self-review fixes * fix(openclaw): suppress prewarm pull progress logs * fix(openclaw): address review feedback * fix(openclaw): resolve review findings * fix(dev): stop stale watch supervisors	2026-04-30 11:18:11 -07:00
Dani Akash	8712f89f18	feat(agents): durable per-agent chat message queue + composer Stop (#880 ) * feat(agents): durable per-agent chat message queue + composer Stop button * fix(agents): tighten queue UI — smaller Stop, drop empty indicator, live drain attach User feedback round 1 on the message-queue UX: 1) The Stop button matched the send/voice mics at h-10 w-10 with a solid destructive fill, which read as alarming. Shrunk to h-8 w-8, ghost variant with a soft destructive/10 background, smaller filled square glyph. Reads as a calm 'stop' affordance instead of a panic button. 2) The QueueItem's leading <QueueItemIndicator> dot was decorative only — no state, no interaction. Dropped it from QueuePanel along with the import; queue items now render as a clean preview line with the trailing X remove action. 3) When the server drained the queue and started the next turn, the chat panel didn't pick up the live stream until the user navigated away and back. The hook's resume effect previously only fired on agent change, not on listing-observed activeTurnId change. Surface activeTurnId from useHarnessAgents into useAgentConversation; effect now re-runs when the id changes, calls /chat/active, and attaches to the new turn — so a queued message starts streaming the moment the server drain pops it. * fix(agents): don't reset streaming state from the resume effect's no-op paths The Stop button was disappearing while the agent was actively streaming, even though events were still flowing into the chat. Root cause: the resume effect's `finally` block reset `streaming`, `turnIdRef`, and `lastSeqRef` unconditionally — including on the early-return paths (no active turn, or another mechanism already owns the stream). Sequence that triggered it: 1) User sends a message → send() sets streamAbortRef + streaming=true and starts consuming the SSE. 2) User enqueues another message → enqueue mutation invalidates the listing query. 
3) Listing refetches with the live activeTurnId → the resume effect re-fires (deps include activeTurnIdDep). 4) attemptResume hits `if (streamAbortRef.current) return` because send() owns it. 5) The finally clause fires anyway and calls setStreaming(false), clobbering the live state set by send(). The SSE consumer keeps running (refs are intact) so text keeps streaming, but the React flag is wrong, so the Stop button gates off. Fix: track whether this run actually started a stream (`weStartedStream`). The finally only resets state when it does. Early-return / no-active-turn paths now leave streaming/turnIdRef/lastSeqRef alone for whoever does own them. Also widens the Stop button's visibility (`canStop` prop on ConversationInput) so it stays steady across the brief gap between turns when a queue drain is mid-flight; the parent computes `streaming || activeTurnId !== null || queue.length > 0`. The visibility widening is independent of the streaming-state fix above — both are now in place. * revert: drop canStop widening — Stop only shows while streaming Reverts the canStop prop on ConversationInput and the OR-with-queue visibility from AgentCommandConversation. Stop is gated solely on `streaming` again. Between turns (queue draining) the button stays hidden — only the actively-streaming turn is interruptible from the composer, which matches what the user actually expects. * fix(agents): persist the kicking-off prompt on active turns so the resume placeholder isn't empty When a queued message drained and started a new turn, the chat panel's resume effect staged a placeholder turn with userText: '' because the hook had no way to know what message kicked off the turn — only the agent-side stream was visible, and the user bubble above it was blank until the user navigated away and back (at which point the session record's history loaded normally). 
Fix: ActiveTurnRegistry.register now accepts an optional `prompt` that's stashed on the turn and surfaced via describe() / the ActiveTurnInfo response. AgentHarnessService.startTurn passes the incoming message into register. /chat/active returns it. The chat hook's resume effect uses active.prompt as the placeholder turn's userText, so the user bubble shows the queued message text the moment streaming begins. Falls back to '' for older clients that haven't been refetched yet. * fix(agents): always release streamAbortRef on resume cleanup, even when cancelled Greptile P1 follow-up. The previous `weStartedStream` guard correctly stopped the resume effect's no-op early-returns from clobbering an in-flight `send()` stream — but it also stopped a cancelled mid-stream resume from clearing its own `streamAbortRef`. When the cleanup fires (e.g. the 5s listing poll captures a new queue-drain turn id while the SSE for the prior turn is still finishing), the next effect run hits the `if (streamAbortRef.current) return` guard against the now-aborted controller and never reattaches, leaving `streaming === true` with no live stream until the user navigates away. Split the finally block: always release `streamAbortRef` when we owned the controller (so the next run can take over), but only reset the streaming flag / turn id / lastSeq on a clean exit (the new run will set those itself, so resetting on cancel would just flicker).	2026-04-30 18:26:56 +05:30
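The final shape of the resume cleanup in the commit above (no-op when this run never started a stream, always release the controller it owned, reset the streaming flags only on a clean exit) can be modelled as a plain function. This is a sketch under assumptions: the state object stands in for the hook's refs and React state, `StreamHandle` stands in for an AbortController, and `finishResume` is a hypothetical name.

```typescript
// Hypothetical model of the split finally block; not the hook's actual code.

interface StreamHandle {
  aborted: boolean;
}

interface ResumeState {
  streamAbort: StreamHandle | null; // stands in for streamAbortRef.current
  streaming: boolean;
  turnId: string | null; // stands in for turnIdRef.current
  lastSeq: number; // stands in for lastSeqRef.current
}

function finishResume(
  state: ResumeState,
  owned: StreamHandle | null, // null when this run hit an early return
  cancelled: boolean, // true when the effect cleanup fired mid-stream
): void {
  // Early-return paths never started a stream, so they touch nothing.
  if (owned === null) return;
  // Always release the controller we owned so the next run can attach.
  if (state.streamAbort === owned) state.streamAbort = null;
  // On cancel, the next run sets these itself; resetting would only flicker.
  if (cancelled) return;
  state.streaming = false;
  state.turnId = null;
  state.lastSeq = 0;
}
```

Keeping the ownership check separate from the cancel check is what lets a cancelled resume hand over cleanly without clobbering a stream that `send()` started.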
Dani Akash	ba60bf466f	feat(agents): rich command-center rows + home grid + dead-code sweep (#879 ) * feat(agents): rich-info command center rows + pin/PATCH/adapter-health backbone Splits AgentRowCard from a 271-line monolith into a shallow tree of single-responsibility sub-components under `agent-row/`: AgentTile, AdapterHealthDot, PinToggle, AgentTitleRow, AgentSparkline, AgentSummaryChips, AgentLastMessage, CwdChip, AgentTokenSummary, AgentMetaRow, AgentErrorPanel, AgentActions Adds the data each row consumes: - pinned: boolean field on AgentDefinition + FileAgentStore.update + new PATCH /agents/:id route. useUpdateHarnessAgent mutation optimistically updates the listing cache so the star flips instantly; rolls back on error. - Listing payload extended with lastUserMessage, cwd, tokens (cumulative + last7d shape — last7d zero-filled until the activity ledger lands), turnsByDay/failedByDay (zero-filled), lastError/lastErrorAt, activeTurnId. AcpxRuntime grows a getRowSnapshot() that reads cwd + cumulative tokens + last user message from the session record in one pass. - Adapter health: in-memory AdapterHealthChecker probes `claude --version` / `codex --version` with a 2s timeout and caches results for 5 min. /adapters response carries { healthy, reason?, checkedAt }. Tile-corner dot exposes the state via HoverCard; openclaw inherits health from the gateway snapshot already on the page. Sub-components are pure: card itself owns no state. Sort order becomes pinned-first, then recency. HoverCard is the workhorse for keeping rows compact while exposing depth (full message, token breakdown, daily turn list, error stack, adapter reason). * refactor(agents): tighten command-center row design + cut redundant affordances User feedback round 1: 1) Two green dots on the tile (health + liveness) was confusing. 
Health moves out of the tile entirely and surfaces as an inline 'Unavailable' chip in the model line — silent when the adapter is healthy, with a warning amber chip + HoverCard reason when not. The tile now shows one signal: liveness. 2) The last-user-message HoverCard wasn't telegraphing intent. Drop the HoverCard. The line is informational, italic, with a leading quote glyph so the row reads like a conversation snippet. To see the full message the user opens the chat (which is the action they want next anyway). 3) Resume + Chat were duplicate CTAs. Single primary action per row: Resume (filled, accent-orange, with a pulsing dot) replaces Chat when there's an active turn. Both navigate to /agents/:id but the row tells the user which action they're taking. 4) Tokens weren't visible because the row gated on last7d.requestCount, which is zero until the activity ledger ships. Switch to lifetime tokens (which we have today). Drop the '7d stats:' framing — talking about a window we can't compute would be misleading. The HoverCard surfaces input/output split + a footnote that per-window stats land in a follow-up. 5) CWD was rendering the server's own running directory, which is meaningless to users. Hide it from the row entirely. The cwd field still rides in the listing payload for future surfaces (chat panel, debug view) — only the row stops rendering it. Aesthetic refinements while we're here: - Whole card carries state, not just the tile: working rows get an accent-orange tinted border with a soft glow, error rows tint destructive, idle rows lift on hover. - Pin star fades in on hover (group-hover) when unpinned and stays solid amber when pinned — keeps the rail calm by default. - Tabular-nums on token figures so columns visually align across rows. - Drop CwdChip and AdapterHealthDot files: no callers left. 
* fix(agents): align row title flush-left whether pinned or not Pin star moved from leading the title to trailing the badges, and hidden from layout entirely (`hidden group-hover:inline-flex`) when unpinned. The previous `opacity-0` rule kept the star reserving its `size-6` slot, which left every unpinned title indented relative to the model / preview / meta lines underneath it. Title now flushes left in both states; pinned star stays solid amber so the signal isn't hidden, and unpinned reveals an outline star on row hover for the toggle affordance. * fix(agents): keep pin-toggle slot reserved so row height is constant Switching the unpinned star from `hidden group-hover:inline-flex` to `opacity-0 group-hover:opacity-100`. The hidden/show variant was collapsing the title row's height when the star wasn't rendered, which made every card below visibly shift on hover. Always rendering the button (with opacity-only visibility) keeps the row's vertical metrics constant; the title still flushes left because the slot is trailing, not leading. Card hover effect (-translate-y + shadow-md) restored — the layout shift wasn't coming from the card hover; it was the pin slot appearing and disappearing. * fix(agents): quieten row hover — border-tint only, no lift, no shadow Drop the `-translate-y-px` and `hover:shadow-md` from the row card plus the working-state inner ring. The translate + shadow grow combination was visibly noisy as the cursor moved through the rail — each row 'lifted' as you passed over it. Hover now just tints the border in accent-orange/30; working and error states keep their distinct border colours but no inner ring. Card height and shadow stay constant in every state, so the rail reads as a calm vertical list of cards. * feat(home): rich Recent Agents grid + dead-code sweep The /home Recent Agents grid was a placeholder shell. 
Every 'rich' field on the card (lastMessage, lastMessageTimestamp, activitySummary, currentTool, costUsd) was wired to undefined because AgentCommandHome called `buildAgentCardData(agents, status?.status, undefined)` — the dashboard arg has been hard-coded undefined since the harness migration. Repointing the grid at `useHarnessAgents` + `useAgentAdapters` gives every card the same enriched data the rail uses. What the new card shows per agent: • Adapter glyph tile + liveness dot (working pulses; asleep is hollow; error is red) • Name + Working pill (when active) • Adapter · model · reasoning summary line, with an inline Unavailable chip + HoverCard reason when the adapter binary isn't on $PATH • Italic last-user-message preview (line-clamp-2, leading quote glyph) — same visual language as the rail • Footer: 'X ago' + state chip (Asleep / Attention) OR a Resume button (orange, with pulsing dot) when activeTurnId is non-null Sort on the home grid is active-turn → recency. Pinning is NOT a sort key here (and there's no pin indicator on the card) — pinning belongs to the rail at /agents; the home page is action-oriented and trusts active-turn + recency to surface the right agent. Dead code removed: • useAgentDashboard.ts (96 lines, no callers; subscribed to the dead /claw/dashboard/stream from the OpenClaw-only era) • useAgentCardData.ts (the dashboard-merge shim; passed undefined every call so all enriched fields landed as undefined) • AgentCard.tsx (AgentCardExpanded replaced by HomeAgentCard; AgentCardCompact had no callers — the dock's compact mode was never used) • AgentCardData interface dropped from lib/agent-conversations/ types.ts; the new card consumes HarnessAgent directly Visual language stays continuous between rail and grid: same <AgentTile>, same <LivenessDot>, same italic-quote message preview, same orange Resume button with a pulsing dot.	2026-04-30 16:36:22 +05:30
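The two sort orders described in the commit above (rail: pinned first, then recency; home grid: active turn first, then recency) could be expressed as comparators along these lines. The `AgentRow` shape and the function names are assumptions for illustration; only the ordering rules come from the commit message.

```typescript
// Hypothetical comparators; field names are a reduced, assumed row shape.

interface AgentRow {
  id: string;
  pinned: boolean;
  lastUsedAt: number | null; // epoch ms; null for never-used agents
  activeTurnId: string | null;
}

// Most recently used first; never-used agents fall to the bottom.
function byRecency(a: AgentRow, b: AgentRow): number {
  if (a.lastUsedAt === b.lastUsedAt) return 0;
  if (a.lastUsedAt === null) return 1;
  if (b.lastUsedAt === null) return -1;
  return b.lastUsedAt - a.lastUsedAt;
}

// Rail at /agents: pinned first, then recency.
function railOrder(a: AgentRow, b: AgentRow): number {
  return Number(b.pinned) - Number(a.pinned) || byRecency(a, b);
}

// Home grid: agents with an active turn first, then recency. Pinning is ignored.
function homeOrder(a: AgentRow, b: AgentRow): number {
  const aActive = Number(a.activeTurnId !== null);
  const bActive = Number(b.activeTurnId !== null);
  return bActive - aActive || byRecency(a, b);
}
```

Sharing `byRecency` between the two keeps the tie-breaking identical, so the same agent does not jump around between the rail and the grid for unrelated reasons.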
shivammittal274	df0f45dd29	Feat: eval debug dev ci (#869 ) * chore(eval): instrument server startup to root-cause dev CI health-check timeouts Three diagnostics + one config swap to investigate why the eval-weekly workflow has been failing on dev since 2026-04-25 with "Server health check timed out" (every worker, every retry). Background: - Last successful weekly eval on dev: 2026-04-18 (sha `f5a2b73`) - Since then, ~30 server commits landed including Lima/VM runtime, OpenClaw service, ACL system, ACP SDK — 108 server files changed, ~13K LOC added. - Server process spawns cleanly in CI (PID logged) but never binds /health within the 30s eval-side timeout. Static analysis finds no obvious blocker; we need runtime evidence. Changes: 1. apps/server/package.json — add `start:ci` script (no `--watch`). The default `start` uses `bun --watch` which forks a child process that watches every file in the import graph. Dev's graph is ~108 files larger than main's; on a cold CI runner the watcher setup is a plausible source of multi-second startup overhead. 2. apps/eval/src/runner/browseros-app-manager.ts: - Use `start:ci` when `process.env.CI` is set (true on GitHub-hosted runners by default), else `start`. - Capture per-worker server stderr to /tmp/browseros-server-logs/ instead of ignoring it. Without this we have no visibility into why the server is hung pre-/health. - Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module graph may simply need more cold-start time on CI. 3. .github/workflows/eval-weekly.yml — upload the server logs dir as a workflow artifact (always, not just on success) so we can post-mortem any startup failure on the next run. 4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter -> Fireworks (bypasses the OpenRouter per-key spend cap that has been eating recent runs) and drop num_workers 10 -> 4 (well below the Fireworks per-account TPM threshold that overwhelmed the original 2026-04-23 run). 
Plan: trigger the eval-weekly workflow on this branch with the agisdk config and observe (a) whether it gets past server startup, and (b) if it doesn't, what the captured server stderr says. * fix(eval): capture stdout too — pino logger writes to stdout, not stderr Previous diagnostic patch only redirected stderr; the captured per-worker log files came back as 0 bytes because the server uses pino which writes all log output to stdout (fd 1), not stderr (fd 2). Capture both into the same file. * fix(server): catch sync throw from OpenClaw constructor on Linux The container runtime constructor in OpenClawService throws synchronously on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing .catch() on tryAutoStart() only handles async throws inside auto-start — the sync throw from configureOpenClawService(...) itself propagates up through Application.start() and crashes the process via index.ts:48 (process.exit(EXIT_CODES.GENERAL_ERROR)). This is what's been killing dev's eval-weekly CI: the server crashes in milliseconds, the eval client polls /health, gets nothing, times out. Fix: wrap the configureOpenClawService call in try/catch matching the existing .catch() intent (best-effort, don't crash). Server continues without OpenClaw on platforms where it can't initialize. Verified by reading captured server stdout from run 25123195126: Failed to start server: error: browseros-vm currently supports macOS only at buildContainerRuntime (container-runtime-factory.ts:54:11) at new OpenClawService (openclaw-service.ts:652:15) at configureOpenClawService (openclaw-service.ts:1527:19) at start (main.ts:127:5) * fix(server): defer OpenClaw chat client port lookup to request time apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort() synchronously when constructing the OpenClawGatewayChatClient inside the createHttpServer object literal. 
On non-darwin platforms this throws via the OpenClawService constructor → buildContainerRuntime, escaping the try/catch added in `5cf7b765` (which only protected the configureOpenClawService call further down in main.ts). Every other getOpenClawService() reference in server.ts is already wrapped in an arrow function. This was the lone holdout. Make it lazy too: change the chat client constructor to take getHostPort: () => number instead of hostPort: number, evaluate it inside streamTurn at request time. Behavior on darwin is unchanged. This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't available — the chat endpoint isn't exercised by the eval, so a deferred throw is acceptable. * fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1 Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't unblock dev's Linux CI — same throw kept reproducing. Whether this is bun caching stale stack frames or a missed eager call site, the safer move is to fix it at the root: make buildContainerRuntime never throw on Linux when the runner has explicitly opted out. Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test escape hatch in container-runtime-factory.ts. When set, returns the existing UnsupportedPlatformTestRuntime stub — server boots normally, /health binds, any actual OpenClaw API call still fails loudly at request time. eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and non-CI Linux behavior unchanged (without the flag they still throw). * feat(eval): align Clado action executor with new endpoint contract David Shan shared the updated Clado BrowserOS Action Model spec. Changes to match it: - Bump endpoint URL + model id to the 000159-merged checkpoint (clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef) in browseros-oe-clado-weekly.json and the README example. - CLADO_REQUEST_TIMEOUT_MS 120s → 360s. 
Cold start can take ~5 min; the 2-min ceiling was failing every cold-start request. - Treat HTTP 200 with action=null / parse_error as an INVALID step instead of aborting the executor loop. The model can self-correct on the next call. Cap consecutive parse failures at 3 to avoid infinite loops. - Capture final_answer from end actions. Surface it in the observation back to the orchestrator so its task answer can use the model's declared result. - Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X). - Switch screenshot format from webp → png to match the documented "PNG or JPEG" contract. * chore(eval): refresh test-clado-api script for new Clado contract Updated the local smoke-test to match the new Clado endpoint and response contract: - New action + health URLs (000159-merged checkpoint). - Drop the grounding-model branch (orchestrator-executor doesn't use it; the README David shared only documents the action model). - Health-check waits up to 6 minutes for cold start with a 30s warning so the operator knows it's spinning up. - Print every documented response field (action, x/y, text, key, direction, amount, drag start/end, time, final_answer, thinking, parse_error, inference_time_seconds). - Three-step run that exercises a click, a typing continuation with formatted history, and an end+final_answer probe. * chore(eval): point clado weekly config at agisdk-real Switches the orchestrator-executor + Clado weekly config to run on the AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff grader. Matches the orchestrator-executor smoke target (Fireworks K2.5 orchestrator + Clado action executor) we want to track week-over-week. * chore(eval): run clado weekly headless Default to headless so the weekly job (and local repros) don't pop ten visible Chrome windows. Set headless=false locally if you need to watch a worker. 
* fix(eval): address Greptile P1+P2 on server log fd handling P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the log directory missing and crash the server spawn with ENOENT. Move openSync into the same try block; fall back to /dev/null so spawn always succeeds. P2: the log fd was opened on every server start but never closed. Each restart attempt leaked one fd across all workers — over a long eval run that could exhaust the process fd limit. Track the fd on the manager and closeSync it in killApp() right after the server process exits (the child's dup keeps the file open until it exits, so we don't truncate output).	2026-04-30 01:33:49 +05:30
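The "defer the port lookup to request time" fix in the commit above amounts to passing a getter instead of a value. A minimal sketch, assuming a hypothetical `LazyPortChatClient` class, a loopback host, and an illustrative URL path (none of which are taken from the repository):

```typescript
// Hypothetical client; only the getHostPort: () => number idea comes from the commit.

class LazyPortChatClient {
  private readonly getHostPort: () => number;

  constructor(getHostPort: () => number) {
    // Store the getter only. Nothing is evaluated during construction,
    // so a platform where the lookup throws can still boot the server.
    this.getHostPort = getHostPort;
  }

  // The host and path are placeholders; the lookup happens per request.
  turnUrl(agentId: string): string {
    const port = this.getHostPort();
    return `http://127.0.0.1:${port}/agents/${agentId}/chat`;
  }
}
```

With this shape, construction succeeds everywhere and a failing lookup surfaces only when a chat request is actually made, which is the trade-off the commit message accepts for Linux CI.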
Nikhil	edfc5c751c	fix: align OpenClaw gateway image with VM cache (#868 ) * fix: load OpenClaw gateway image from VM cache * fix: use container port for OpenClaw ACP bridge * fix: address review feedback for PR #868	2026-04-29 12:11:00 -07:00
Nikhil	471256f31c	fix: stop passing native permission flags to ACP adapters (#867 )	2026-04-29 11:07:51 -07:00
Nikhil	4c90ca696b	fix(agents): connect OpenClaw ACP inside gateway container (#866 )	2026-04-29 11:07:29 -07:00
Nikhil	f2ac87d7c3	feat: show created agents in sidepanel (#865 ) * feat(agent): list created agents in sidepanel target catalog * feat(agent): show created agents in sidepanel selector * feat(server): add sidepanel chat route for created agents * feat(agent): route sidepanel agent sends by agent id * chore(agent): retire virtual sidepanel acp targets * fix: address review feedback for PR #865	2026-04-29 10:15:58 -07:00
Dani Akash	a228c278c6	feat(agents): background-resilient chat — turns survive tab disconnect (#863 ) * feat(agents): decouple chat turn lifecycle from SSE response Introduce a per-process ActiveTurnRegistry that owns each agent turn's lifecycle and a ring-buffered event stream, so chat tabs that close, refresh, or navigate away no longer cancel the in-flight turn. New endpoints: POST /agents/:id/chat starts a turn (now returns 409 when one is already running, with the active turnId for attaching) GET /agents/:id/chat/active reports the running turn for a UI that just mounted GET /agents/:id/chat/stream subscribes to a turn; supports Last-Event-ID resume via per-event seq ids POST /agents/:id/chat/cancel explicit cancel — fetch abort no longer affects the underlying turn The chat hook now captures X-Turn-Id, tracks lastSeq from SSE id lines, re-attaches on mount when the server still has an active turn, and routes Stop through the cancel endpoint. The runtime call uses the registry's per-turn AbortController instead of the HTTP request signal, which is the core decoupling that lets turns outlive their initiator. * feat(agents): add ActiveTurnRegistry primitive backing the new chat lifecycle The previous commit referenced these files in tests and the harness service but global gitignore swallowed them on the first add. The registry owns the per-turn ring buffer (drop-oldest, terminal frame preserved), the per-turn AbortController, and subscriber fan-out used by /chat/stream resume.	2026-04-29 21:01:06 +05:30
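The per-turn ring buffer described in the commit above (drop-oldest, terminal frame preserved, resume by sequence id) might look roughly like this. The class name, event shape, and method names are assumptions; the real ActiveTurnRegistry also owns the AbortController and subscriber fan-out, which are left out here.

```typescript
// Hypothetical buffer for one turn's events; a sketch, not the registry itself.

interface TurnEvent {
  seq: number; // would map to the SSE "id:" line
  type: string;
  terminal: boolean; // true for the final frame of a turn
}

class TurnEventBuffer {
  private readonly capacity: number;
  private events: TurnEvent[] = [];
  private nextSeq = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(type: string, terminal = false): TurnEvent {
    const event: TurnEvent = { seq: this.nextSeq++, type, terminal };
    this.events.push(event);
    while (this.events.length > this.capacity) {
      // Drop the oldest frame that is not terminal, so a late subscriber
      // can still learn how the turn ended.
      const oldest = this.events.findIndex((e) => !e.terminal);
      if (oldest === -1) break;
      this.events.splice(oldest, 1);
    }
    return event;
  }

  // Frames a client still needs after reconnecting with Last-Event-ID = lastSeq.
  replayAfter(lastSeq: number): TurnEvent[] {
    return this.events.filter((e) => e.seq > lastSeq);
  }
}
```

A reconnecting tab would send the last `seq` it saw and receive only the frames after it, provided they have not already been evicted from the buffer.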
Dani Akash	e2ec1991cf	feat(agents): redesign the agent command center for multi-adapter use (#861 ) * feat(agents): redesign agent rail to match the rest of the app Reshape the `/agents` page so it reads as a sibling of `/scheduled` and `/soul` and adapts to the multi-adapter world (OpenClaw, Claude Code, Codex). Visual scaffolding only in this commit — per-agent liveness state ships as `unknown` until the server-side activity tracker lands. - New `AgentsHeader` mirrors `SoulHeader`/`ScheduledTasksHeader`: accent bot tile, title, descriptive subtitle, "+ New Agent" button. Replaces the loose top toolbar that mixed page-level and OpenClaw-lifecycle controls. - New `GatewayStatusBar` collects the OpenClaw lifecycle pills (running, control plane connected) plus the Terminal/Refresh affordances into a single labeled bar that only renders when the gateway is running AND there is at least one OpenClaw agent in the merged list. - New `AgentRowCard` per agent: adapter tile with liveness dot, name + status badge, adapter/model/reasoning chips, last-used relative time + truncated workspace path, primary "Chat" button, overflow menu (Copy id / Rename* / Reset history* / Delete). Rename + Reset are disabled with "coming soon" tooltips until the corresponding endpoints ship; Delete is hidden for the protected `main` agent. - New `AgentsEmptyState` mirrors the scheduled-tasks empty card. - New `AdapterIcon` + `LivenessDot` + `agent-display.helpers.ts` keep the row card focused on layout; helpers cover display name fallbacks for legacy `oc-<uuid>` titles, workspace label rules, and a tiny relative-time formatter. - `AgentList` now sorts by `lastUsedAt` desc with `null`s falling to the bottom; the gateway's `main` agent is pinned to the top only while it has zero turns so a fresh install has an obvious starting point. The list also threads a per-agent activity map so future commits can light up working/idle/asleep without reshuffling the API. 
- `AgentsPage` swaps to the standard `fade-in slide-in-from-bottom-5 animate-in space-y-6 duration-500` shell and threads a `harnessAgentLookup` Map down to the row card so adapter chips and reasoning effort render correctly without a re-fetch. * feat(agents): wire per-agent liveness end-to-end into the rail Closes the placeholder `unknown` dot from the redesign's first commit. The rail now shows real working / idle / asleep / error states per agent, with `lastUsedAt` driving the recency sort. Server side: - `AgentHarnessService` keeps an in-memory activity tracker keyed by agentId. `notifyTurnStarted` flips an entry to `working`, `notifyTurnEnded({ok})` either drops it (success) or pins it to `error` (failure / error event). - `send()` wraps the runtime stream so the lifecycle hook fires exactly once on natural close, error event, downstream cancel, or thrown setup. The runtime itself stays unchanged — fork is contained at the harness layer. - New `listAgentsWithActivity()` method enriches every agent with `{ status, lastUsedAt }`. lastUsedAt is read from the acpx session record's last persisted item via `runtime.getHistory`, so it survives server restart even though the activity map doesn't. - Status derivation: `working`/`error` take precedence; otherwise timestamp-based — `idle` until 15 min of silence, then `asleep`. Never-used agents resolve to `idle` (asleep implies "was active, went quiet"). - `GET /agents` returns the enriched shape. Client side: - `HarnessAgent` UI type extended with optional `status` + `lastUsedAt` so older deployments still typecheck. - `useHarnessAgents` flips on `refetchInterval: 5_000` (with `refetchIntervalInBackground: false` so hidden tabs go quiet) so the per-row dots and last-used copy stay fresh without a websocket. - `AgentsPage` builds an activity map from the harness listing response and threads it into `AgentList` → `AgentRowCard`. The sort by `lastUsedAt` desc (already in the row card) now has real data to operate on. 
Tests: - New `marks an agent working while a turn streams and idle once it ends` exercises the wrap; uses a held upstream stream so the in-flight `working` state is observable. - New `flips to error when a turn emits an error event`. * fix(agents): dedupe agent rail when /claw/agents and /agents share an id The agents page was rendering every OpenClaw agent twice — once from the legacy `/claw/agents` listing (`useOpenClawAgents`) and once from the harness `/agents` listing (`useHarnessAgents`). Post Step 9 backfill the harness store contains every gateway agent, so the overlap is the rule, not the exception. Mirror the dedup the chat-panel layout already does: when a gateway agent's id appears in the harness listing, drop the legacy entry and keep the harness one (it has adapter/model/reasoning/status/lastUsedAt the chat path actually consumes). * feat(agents): swap GatewayStatusBar refresh icon for a Restart Gateway button + tooltips The manual refresh became redundant once `useHarnessAgents` and `useOpenClawStatus` started polling on a 5s interval — every visible field self-refreshes within seconds. The previous AgentsPageHeader had a real Restart action that the redesign dropped; reinstate it on the bar so a wedged gateway is one click away again. - GatewayStatusBar: dropped the `RotateCcw` refresh icon and the `onRefresh` prop. Added `onRestart` + `actionInProgress` props; the button shows a spinner while a gateway lifecycle mutation is in flight. - Both Terminal and Restart Gateway buttons get tooltips explaining what they do — Terminal as a power-user shell escape hatch, Restart for unsticking a wedged gateway or after manual config edits. - AgentsPage: drop the now-unused `refreshAll` helper and the `refetchStatus`/`refetchAdapters`/`refetchOpenClawAgents` destructures it depended on. Wire `restartOpenClaw` (already pulled from `useOpenClawMutations`) through `runWithPageErrorHandling` like the legacy header did. 
* feat(agents): consolidate gateway status into the /agents listing Folds the gateway lifecycle snapshot into the harness listing so the agents page polls one endpoint instead of two. Drops the dead `/claw/status` call from the command center while keeping every UI affordance the page already shipped (Running / Control plane connected pills, GatewayStateCards setup/start prompts, ControlPlaneAlert for degraded states). Server side: - `OpenClawProvisioner.getStatus()` (optional) — when wired, returns the same `GatewayStatusSnapshot` shape `/claw/status` does. - `AgentHarnessService.getGatewayStatus()` — best-effort wrapper around the provisioner method; logs and swallows errors so a transient gateway issue doesn't 500 the listing endpoint. - `GET /agents` now returns `{agents, gateway}` in a single `Promise.all`. Both fields are independent — agents enrichment succeeds even if the gateway snapshot is null. - `server.ts` wires `getOpenClawService().getStatus()` into the provisioner accessor object alongside `createAgent` / `removeAgent` / `listAgents`. Client side: - `useHarnessAgents` returns `{harnessAgents, gateway}` (plus the legacy `agents` mapping). Same 5s `refetchInterval` as before — one round-trip drives the per-row liveness AND the gateway pills. - `AgentsPage` drops `useOpenClawStatus` entirely; `status` comes from the harness query. Loader + error/lifecycle plumbing rewired around the harness query's loading/error. - `agents-page-utils.getInlineError` and `getAgentsLoading` lose the now-redundant `statusError` / `statusLoading` / `openClawAgentsEnabled` params. The chat-panel layout (`agent-command-layout.tsx`) still consumes `useOpenClawStatus(5000)` for now — left intact per the user's "only the command center" scope. Folding that one in is a separate, smaller pass once we're sure no regression slipped here. 
* test(agents): teach the route fake service about the new listing shape PR #861 CI surfaced two failures in tests/api/routes/agents.test.ts: both call `GET /agents` and the route handler now invokes `service.listAgentsWithActivity()` + `service.getGatewayStatus()` which the fake created here didn't implement. Add both methods to the fake (returning idle / null) and update the empty-list assertion to expect the new `{agents, gateway}` envelope.	2026-04-29 19:03:29 +05:30
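The status derivation and recency sort described in commit e2ec1991cf above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: `ActivityEntry`, `deriveStatus`, `sortByLastUsed`, and `ASLEEP_AFTER_MS` are hypothetical names, timestamps are assumed to be epoch milliseconds, and the behavior at exactly 15 minutes is an assumption (treated here as `asleep`).

```typescript
type AgentStatus = "working" | "idle" | "asleep" | "error";

// Hypothetical shape of an in-memory tracker entry. Per the commit message,
// a successful turn drops the entry and a failed turn pins it to "error".
interface ActivityEntry {
  status: "working" | "error";
}

// "idle until 15 min of silence, then asleep"
const ASLEEP_AFTER_MS = 15 * 60 * 1000;

function deriveStatus(
  entry: ActivityEntry | undefined,
  lastUsedAt: number | null,
  now: number,
): AgentStatus {
  // working / error take precedence over anything timestamp-based.
  if (entry) return entry.status;
  // Never-used agents resolve to idle (asleep implies "was active, went quiet").
  if (lastUsedAt === null) return "idle";
  // Assumption: the 15-minute boundary itself counts as asleep.
  return now - lastUsedAt >= ASLEEP_AFTER_MS ? "asleep" : "idle";
}

// Sort by lastUsedAt descending, with nulls falling to the bottom.
// Returns a new array and leaves the input untouched.
function sortByLastUsed<T extends { lastUsedAt: number | null }>(agents: T[]): T[] {
  return [...agents].sort((a, b) => {
    if (a.lastUsedAt === null && b.lastUsedAt === null) return 0;
    if (a.lastUsedAt === null) return 1;
    if (b.lastUsedAt === null) return -1;
    return b.lastUsedAt - a.lastUsedAt;
  });
}
```

Keeping the derivation a pure function of the tracker entry, the persisted timestamp, and the current time fits the commit's point that `lastUsedAt` survives a server restart while the activity map does not: after a restart the entry is simply `undefined` and the timestamp branch takes over.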
Dani Akash	0c84547e8f	feat(agents): migrate OpenClaw chat onto the unified harness/ACP path (#859 ) * chore(acp): smoke-test ACP capabilities against running gateway Adds apps/server/scripts/acp-smoke.ts which spawns `openclaw acp` inside the gateway container and exercises every method we plan to depend on: initialize, newSession, prompt (text + image), cancel, listSessions, loadSession. SDK pinned to 0.19.1 (Bun's minimum-release-age policy blocks 0.20+ which were released < 7 days ago). Findings (full notes in plan outcomes): - promptCapabilities advertises image:true but the model does NOT see image bytes — silently dropped at the bridge. - sessionCapabilities advertises {list:{}} but session/list throws "Method not found": stale capability advertising. - loadSession works; replays user/assistant/thought text and session_info/usage/commands updates. No tool_call replay, as documented. - cancel works end-to-end: stopReason=cancelled. - closeSession/resumeSession are not on ClientSideConnection in 0.19.1; kill child to close, use loadSession for rebind. Plan revisions triggered by spike are recorded in plans/browseros-ai/BrowserOS/features/2026-04-28-2310-claude-code-acp-implementation-roadmap.md. * chore(acp): re-run smoke on SDK 0.21.0 and add mode/config/auth scenarios After bypassing Bun's minimum-release-age and upgrading the SDK to 0.21.0, restore the previously-skipped resume/close paths and add three new scenarios: mode (setSessionMode), config (setSessionConfigOption, correct configId field), and auth (authenticate noop). Findings, all bridge-side (independent of SDK): - session/list, session/resume, session/close all throw -32601 on OpenClaw 2026.4.12 — capability advertising is stale. - Image content blocks silently dropped; model never sees the bytes. - setSessionMode and setSessionConfigOption work; latter requires `configId` (not `optionId`) per the schema. 
- loadSession replays user/assistant/thought text + session_info + usage + available_commands; no tool_call replay (documented). - authenticate is a noop on OpenClaw (no authMethods advertised). Plan outcomes updated with full method-support matrix. * chore(deps): promote @agentclientprotocol/sdk to a runtime dependency The smoke script in apps/server/scripts/acp-smoke.ts used the SDK as devDependency. The upcoming ACP bridge (apps/server/src/api/services/acp/) needs it at runtime, not just for tooling. Move the entry from devDependencies to dependencies, alphabetically first under @a. Pinned to 0.21.0 — same version the smoke script validated against. README gains a small Dependencies note pointing at the future bridge location. No code changes yet. The bridge wiring lands in subsequent commits. fix(openclaw): wire LlmProvider.supportsImages through to OpenClaw model config When BrowserOS sets up a custom OpenAI-compat provider on the gateway, the agent UI's "Supports Image" flag (LlmProviderConfig.supportsImages) was being dropped on the floor. As a result the persisted model entry had no `input` field, OpenClaw defaulted it to ['text'], and image_url content parts were silently stripped before the model saw them. Fix: - Extend OpenClawSetupInput / OpenClawAgentMutationInput on the agent side (useOpenClaw.ts) and the route body schema + SetupInput + createAgent input on the server side with `supportsImages?: boolean`. - AgentsPage forwards `llmOption?.supportsImages` from the selected LlmProviderConfig in both handleSetup and handleCreate. - provider-map.resolveSupportedOpenClawProvider emits `input: ['text', 'image']` on the model entry when the flag is truthy; otherwise emits the explicit `['text']` so the value is always pinned (avoids relying on OpenClaw's implicit default). 
- applyBrowserosConfig adds `tools.media.image.enabled = true` to the bootstrap batch so the gateway's image-understanding pipeline is always wired up — per-model `input` still gates which models see images, this just enables the global path. ACP image content blocks are still dropped by the OpenClaw bridge — that's a separate bridge bug, not addressed here. This commit restores image support for the OpenAI-compat /v1/chat/completions path that the upcoming ACP chat panel will use as a carve-out for image-bearing prompts. Existing custom-provider configs are NOT auto-migrated; users will re-acquire image support either by re-running setup or by editing their model entries' `input` field manually. A migration pass for legacy installs is not in scope for this commit because the "supportsImages" intent isn't recoverable from the persisted config alone — the source of truth is the LlmProvider record on the agent side. * feat(agents): add OpenClaw to AgentAdapter union and catalog Extends AgentAdapter to 'claude' | 'codex' | 'openclaw' and adds the OpenClaw entry to AGENT_ADAPTER_CATALOG. The new entry has: - defaultModelId: 'default' — OpenClaw's ACP bridge does not surface per-session model selection (verified during the ACP spike), so models live in the OpenClawService config, not in the adapter catalog. AgentDefinition.modelId carries the gateway-side model name for display only. - models: [] — empty list signals "no per-session model picker" in the UI; isSupportedAgentModel('openclaw', undefined|'default') returns true via the existing fallback path. - reasoningEfforts mirror OpenClaw's session-level `thought_level` config option (off / minimal / low / medium / high / adaptive). 
Also extends: - isAgentAdapter type guard recognizes 'openclaw' - HarnessAgentAdapter union on the extension side - agents.test.ts createAgent fake type - agent-catalog.test.ts asserts on the new entry, empty model list passthrough behavior, and OpenClaw's reasoning effort set Lockfile delta is the workspace SDK pin reconciling 0.20.0 (taken from dev's lock) up to our package.json's 0.21.0 (added in `c1d987ea`). acpx still uses 0.20.0 transitively — both are present. No runtime wiring yet — the registry override and AcpxRuntime plumbing land in subsequent commits. * feat(agents): plumb OpenClaw gateway accessors into AcpxRuntime Adds an optional `openclawGateway` accessor to AcpxRuntime so the upcoming registry override (Step 4) can spawn `openclaw acp` inside the gateway container with the right port, token, and container/VM identity. All accessors are getter-shaped so values stay live across gateway restarts (port can change, token can rotate). The accessor is threaded: server.ts → createAgentRoutes → AgentHarnessService → AcpxRuntime ↘ sidepanel lazy AcpxRuntime Also adds OpenClawService.getGatewayToken() returning the in-memory token string. We pass it via OPENCLAW_GATEWAY_TOKEN env var on the spawn (per OpenClaw's documented env-var precedence) instead of via `--token` flag (which leaks to ps aux) or `--token-file` path (no discrete token file lives inside the container — the token is nested inside openclaw.json). Wiring is dormant — the registry override that consumes these accessors lands in Step 4. Typecheck + existing acpx/harness/routes tests pass unchanged. * refactor(agents): scrub local plan-step references from code comments Replaces forward-looking comments that referenced internal plan steps (e.g. "Step 4 wires this into…") with comments that justify the code on its own merits. Plan files live locally on the contributor's machine, so cross-references are noise to the rest of the project. No behavior change. 
* feat(agents): spawn openclaw ACP adapter inside the gateway container When the harness resolves the `openclaw` adapter, it now returns a command that runs `openclaw acp` inside the bundled gateway container via `limactl shell <vm> -- nerdctl exec -i ... openclaw acp --url ws://127.0.0.1:<port>`. This reuses the openclaw binary already installed alongside the gateway — no host-side openclaw install is required. Auth: the gateway token is injected via OPENCLAW_GATEWAY_TOKEN on the container exec rather than `--token` on the openclaw CLI, so the secret never appears in `ps aux`. Banner output: OPENCLAW_HIDE_BANNER=1 and OPENCLAW_SUPPRESS_NOTES=1 keep stdout JSON-RPC-clean. LIMA_HOME: prefixed via `env LIMA_HOME=<path>` on the resolved command so the spawned limactl finds the BrowserOS-owned VM (the server doesn't set LIMA_HOME on its own process env). When the gateway accessor is absent, falls through to acpx's built-in openclaw adapter which assumes a host-side install — that branch will fail at spawn time with a descriptive error. Verified end-to-end via the existing acp-smoke script during the Step 0 spike. * feat(agents): dual-create OpenClaw harness agents on the gateway When the harness creates an `openclaw` adapter agent, it now also provisions a matching agent on the OpenClaw gateway via the existing CLI path (OpenClawService.createAgent). Symmetric on delete: gateway removeAgent runs alongside the harness-store delete. - Adds an OpenClawProvisioner interface (decoupled from OpenClawService for testability) and injects it through AgentHarnessService. - createAgent rolls back the harness record if gateway provisioning fails; deleteAgent tolerates gateway-side failures so harness identity stays consistent with the user-facing UI. - New OpenClawProvisionerUnavailableError surfaces as a 503 when an openclaw create request lands on a harness with no provisioner wired in (instead of a generic 500). 
- FileAgentStore mints openclaw agent ids with an 'oc-' prefix so the id satisfies the gateway's `^[a-z][a-z0-9-]$` agent name pattern. Other adapters keep raw UUIDs to preserve compatibility. - POST /agents body schema accepts providerType / providerName / baseUrl / apiKey / supportsImages, forwarded to the provisioner when adapter='openclaw'. The agents-page UI still routes openclaw create through the legacy /claw/agents flow; switching that path to the harness is a separate UI cutover. Tests cover: gateway dual-create on success, rollback on gateway failure, 503 when provisioner is missing, and tolerant delete on gateway-side failure. fix(agents): skip catalog model validation for OpenClaw adapter OpenClaw agents resolve their model from the gateway-side provider config (set at agent-create time via OpenClawService) rather than from the harness catalog, which has an empty `models: []` entry by design. Without this carve-out, every OpenClaw create body fails parsing with "Invalid modelId" because no concrete model id can satisfy isSupportedAgentModel('openclaw', ...). The reasoning-effort check still runs against the catalog (those values map directly to OpenClaw's session `thought_level` config option). * fix(agents): pass --session to openclaw bridge so newSession routes correctly acpx's AcpClient.createSession calls connection.newSession with cwd and mcpServers but never forwards the sessionKey. Without it, the openclaw bridge falls back to a synthetic acp:<uuid> session that doesn't resolve to any provisioned gateway agent — every harness chat returns a generic "Internal error" from -32603. Fix: bake `--session <key>` into the resolved spawn command. The bridge then uses that as the default session key for any newSession the bridge receives, routing the turn to the matching gateway agent. Per-session keying means each openclaw agent gets its own AcpxCoreRuntime instance (cached by sessionKey on top of the existing cwd/permissionMode key). 
This adds one extra runtime per active openclaw session — claude/codex are unaffected. Test asserts the resolved command includes the right --session arg. * fix(agents): suppress BrowserOS MCP for openclaw bridge The openclaw ACP bridge rejects newSession when mcpServers is non-empty because its provider tooling comes from the gateway, not from ACP-side MCP servers. Forwarding the BrowserOS HTTP MCP made every harness chat fail with a JSON-RPC -32603 "Internal error" before the session was even opened. Claude/codex still need the BrowserOS MCP for browser tooling, so the carve-out is keyed off whether the runtime is for an openclaw session. * feat(agents): route OpenClaw chat through the harness behind a flag Adds the `feature.useAcpxForOpenClaw` extension storage flag. When on, OpenClaw agents in the agent-command chat panel use the harness /agents/<id>/chat SSE and harness history hook instead of the legacy /claw/agents/<id>/chat. When off, behavior is unchanged. Also dedupes the agent rail when the same id appears in both stores (dual-created agents from /claw/agents and /agents) by preferring the harness entry — without this, every dual-created OpenClaw agent shows up twice after Step 5. Image attachments are temporarily disabled when the harness path is active; the carve-out lands in the next commit. * fix(agents): keep legacy OpenClaw agents on ClawChat The previous commit's flag-gated branch routed every `source='openclaw'` agent through `/agents/<id>/chat` when the flag was on, but the layout dedup means the only agents that ever reach that branch are legacy gateway-only entries (`main`, orphan agents from rolled-back creates) — which by definition have no harness record, so the harness path 404s and chat is unusable. Source is the only routing signal again: harness agents go through the harness, legacy agents stay on ClawChat. The storage flag stays for Step 9/10's migration story. 
* feat(agents): expose OpenClaw in sidepanel and route through gateway main `buildSidepanelChatTargets` now emits a single default ACP target for adapters with no per-session model picker (OpenClaw, whose model is configured on the gateway-side agent). Without this, OpenClaw never appeared in the sidepanel target picker because the catalog entry has `models: []`. Sidepanel sessions don't have a dedicated provisioned gateway agent. The openclaw bridge `--session` flag previously got the raw sidepanel key (`sidepanel:<convId>:openclaw:...`), which doesn't match any gateway agent — newSession was accepted but every prompt hung forever. The bridge command now rewrites non-harness session keys onto the always-present `main` gateway agent, encoding the original key as a channel suffix to keep state segregated per conversation. Verified end-to-end via curl: sidepanel openclaw chat streams `text-delta` + `finish: stop`. * feat(agents): backfill harness records for legacy gateway agents Reframes Step 9 of the OpenClaw-on-acpx migration. The plan's literal Step 9 (route OpenClaw history through the harness when the flag is on) was already a no-op after the Step 6 walkback — history is routed by source today. The actual blocker for Steps 10–13 was that legacy gateway-only agents (e.g. `main`, orphans from rolled-back creates) had no harness record, so they could never migrate to the harness path without breaking chat. `AgentHarnessService.reconcileWithGateway()` now lists every gateway agent and upserts a matching harness record for any that are missing. The pass runs lazily on first `listAgents()` call (memoized on success, retried on failure so a gateway-down boot doesn't permanently disable backfill). Verified end-to-end: the legacy `agent` agent now streams `text_delta` + `done(end_turn)` through `/agents/agent/chat`, with the bridge resolving to the gateway's `agent` record via the existing `agent:<name>:main` session-key format. 
After this, every OpenClaw agent surfaces as `source='agent-harness'` post-dedup, the legacy `useClawChatHistory` hook becomes unreachable for OpenClaw, and Steps 11–13 (delete legacy chat/history paths) are unblocked. * fix(agents): drop duplicate OpenClaw entry from NewAgentDialog adapter list The adapter Select hardcoded an `<SelectItem value="openclaw">OpenClaw</SelectItem>` on top of iterating `adapters`, which now includes OpenClaw post the catalog change. The dropdown rendered "OpenClaw" twice — once at the top, once at the bottom of the list. The literal was a pre-catalog artifact; removing it leaves a single OpenClaw entry sourced from the catalog. Routing into `handleOpenClawCreate` is unchanged because the value (`'openclaw'`) is identical either way. * fix(agents): always reconcile harness with gateway on list, just dedupe concurrent calls Memoizing the first successful reconcile meant new gateway agents (created via the legacy /claw/agents path or out-of-band CLI) never appeared in the harness until server restart. The Promise now serves as a concurrent-call dedupe only — cleared on settle — so every listAgents call picks up the current gateway state. Reconcile is one cheap idempotent CLI call. * chore(agents): remove dormant useAcpxForOpenClaw flag The flag was scaffolded in Step 6 but its routing effect was walked back the same day after it broke chat for legacy gateway-only agents. After the Step 9 backfill, every OpenClaw agent has a harness record and routes through the harness path purely from `source='agent-harness'` — no flag is consulted anywhere. Remove the dead storage item, hook, and stale comment. * refactor(agents): drop legacy /claw/agents/:id/history endpoint The harness /agents/:id/sessions/main/history endpoint replaced this once every OpenClaw agent got a harness record (Step 9 backfill). 
Routing is fully source-driven now, so the UI's useClawChatHistory hook is never enabled today — verified live: legacy URL returns 404, harness history hydrates correctly for the same agent. Removes the GET /claw/agents/:id/history route, OpenClawService's getAgentHistoryPage method plus its cursor/limit helpers and the history-only types it owned (BrowserOSOpenClawHistoryPageResponse, HistoryPageInput, normalizeHistoryLimit, encodeHistoryCursor, decodeHistoryCursor, jsonlEventsToHistoryItems), and the route + service tests that covered the dropped endpoint. OpenClawJsonlReader stays alive — still feeds /claw/dashboard, /claw/agents/:id/sessions, and the boot-time clawSession seed. Removing those is its own follow-up since the dashboard would need a harness-side replacement first. * feat(agents): wire image attachments through the harness ACP path Composer attachments now flow into the ACP `prompt` request as spec-compliant `image` content blocks alongside the user's text. End to end: composer → chatWithHarnessAgent({attachments}) → POST /agents/:id/chat {message, attachments} → parseChatBody decodes data: URLs to {mediaType, base64} → AgentHarnessService.send forwards → AcpxRuntime.send forwards → acpx startTurn({attachments}) → ACP image blocks UI no longer disables the attach button on harness agents — the gating was just a placeholder before this commit landed. Verified end to end with a 1×1 red PNG against a Claude harness agent: model replies "Red." correctly. OpenClaw's `acp` bridge still drops image content blocks upstream (verified by the same probe — Kimi-k2p5 reports "I don't see an image"). That's an upstream openclaw limitation, not a harness-side gap; Claude/Codex agents work as advertised today. 
* chore(openclaw): delete OpenClawJsonlReader and JSONL-backed routes * chore(openclaw): remove legacy /claw/agents/:id/chat and /queue routes * chore(agents): collapse chat panel to harness-only path * feat(agents): route OpenClaw image turns through the gateway HTTP client The OpenClaw `acp` bridge silently drops ACP `image` content blocks (verified during dogfood — model says "I don't see an image"). When the user attaches images to an OpenClaw agent, the harness now diverts that turn to the gateway's HTTP `/v1/chat/completions` endpoint, which accepts OpenAI-style `image_url` parts and forwards them natively to the provider. - New `OpenClawGatewayChatClient` translates an OpenAI streaming response into the same `AgentStreamEvent` shape the rest of the harness already consumes, so the chat panel renders identically whether a turn went through ACP or the gateway carve-out. - `AcpxRuntime.send` forks at the top: openclaw + any image attachment + a wired gateway client → `sendOpenclawViaGateway`. Other turns (text-only openclaw, claude, codex) take the existing ACP path unchanged. - The diverted path reads the prior turn history from the acpx session record so context is preserved, builds the OpenAI multimodal user message with text + image_url parts, and pumps the gateway SSE back to the caller through a tee that accumulates the assistant text. On natural completion, persists a synthetic user+assistant message pair to the acpx session record so reload shows the image turn in history. - Wired `OpenClawGatewayChatClient` into `AgentHarnessService` via `server.ts` (gateway port + token accessor, just like the existing `openclawGateway`). Persistence note: the acpx record requires User messages to carry an `id` and Agent messages to carry `tool_results` — without them the record fails to round-trip through `parseSessionRecord`. The persist helper now sets both. Limitation by design: image recognition only works if the OpenClaw agent's provider supports vision (e.g. 
Claude-via-OpenClaw, GPT-4o). The pipeline routes images correctly to the provider regardless; text-only providers like Kimi-k2p5 will reply "I don't see an image" because the model itself has no vision capability — that's a provider config issue, not a routing bug. The unit test asserts the image_url part is present in the OpenAI request the gateway client sends. The wider plan (background-resilient chat, queue, replay) remains in `plans/.../2026-04-29-1527-...-background-resilient-chat-and-image-uploads.md` as Phases 3–12; this commit ships only Phases 1–2. * feat(agents): validate inbound image attachments on /agents/:id/chat The harness chat body parser was accepting any mediaType and any dataUrl length. The composer enforces these caps client-side but the endpoint also serves direct curl/script callers, so the server has to defend itself. Restores the same caps the legacy /claw/agents/:id/chat parser had before it was deleted in the migration: - 10 attachments per message - 5 MB raw image bytes (≈ 6.7 MB once base64-encoded plus prefix) - PNG / JPEG / WebP / GIF only - Must start with `data:` Each violation returns 400 with a specific error message instead of silently dropping or forwarding the payload.	2026-04-29 16:37:03 +05:30
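The attachment caps listed in the last bullet of commit 0c84547e8f above (10 attachments per message, 5 MB of raw image bytes, PNG / JPEG / WebP / GIF only, must start with `data:`) could be enforced along these lines. This is a sketch under assumptions, not the repository's `parseChatBody`: the function and type names are hypothetical, "5 MB" is taken to mean 5 × 1024 × 1024 bytes, only base64-encoded data URLs are accepted, and the error strings are illustrative.

```typescript
const MAX_ATTACHMENTS = 10;
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // assumption: MB read as MiB
const ALLOWED_MEDIA_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/gif",
]);

interface ParsedAttachment {
  mediaType: string;
  base64: string;
}

type ValidationResult =
  | { ok: true; attachments: ParsedAttachment[] }
  | { ok: false; error: string }; // a caller would map this to a 400

function validateAttachments(dataUrls: string[]): ValidationResult {
  if (dataUrls.length > MAX_ATTACHMENTS) {
    return { ok: false, error: `At most ${MAX_ATTACHMENTS} attachments per message` };
  }
  const attachments: ParsedAttachment[] = [];
  for (const url of dataUrls) {
    if (!url.startsWith("data:")) {
      return { ok: false, error: "Attachment must be a data: URL" };
    }
    const comma = url.indexOf(",");
    const header = comma === -1 ? "" : url.slice("data:".length, comma);
    if (!header.endsWith(";base64")) {
      return { ok: false, error: "Attachment must be base64-encoded" };
    }
    const mediaType = header.slice(0, -";base64".length);
    if (!ALLOWED_MEDIA_TYPES.has(mediaType)) {
      return { ok: false, error: `Unsupported media type: ${mediaType}` };
    }
    const base64 = url.slice(comma + 1);
    // Estimate the decoded size from the base64 length without decoding:
    // every 4 characters carry 3 bytes, minus any '=' padding.
    const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
    const rawBytes = Math.floor((base64.length * 3) / 4) - padding;
    if (rawBytes > MAX_IMAGE_BYTES) {
      return { ok: false, error: "Image exceeds the 5 MB limit" };
    }
    attachments.push({ mediaType, base64 });
  }
  return { ok: true, attachments };
}
```

Checking the size from the encoded length keeps the check cheap for oversized payloads, since the server can reject them before allocating a decoded buffer.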
Nikhil	2ff5c12840	feat: add sidepanel ACP chat targets (#857 ) * feat(agent): add sidepanel chat target catalog * feat(agent): show acp models in sidepanel selector * feat(server): adapt acp events to ui message streams * feat(server): add sidepanel acp chat route * feat(agent): route sidepanel chat through acp targets * chore: self-review fixes * fix: address review feedback for PR #857	2026-04-28 18:23:38 -07:00
Nikhil	d87422eea1	fix: hide BrowserOS ACP wrapper in history (#856 )	2026-04-28 17:31:11 -07:00
Nikhil	1946ca0cf8	chore: clean up unused agent sdk (#855 )	2026-04-28 17:21:46 -07:00
Nikhil	754f7d0e1d	test: cover terminal limactl resolver errors (#854 )	2026-04-28 17:12:08 -07:00
Nikhil	85bb3f7b42	fix: avoid eager limactl resolution in server tests (#853 )	2026-04-28 16:56:41 -07:00
Nikhil	cb32b8191d	fix: show rich ACP harness history from ACPX (#852 ) * fix: load ACP harness history from ACPX * fix: address ACP history review comments	2026-04-28 16:40:22 -07:00

164 Commits