Commit Graph

2494 Commits

Author SHA1 Message Date
DaniAkash
e77031025b fix(container): poll readiness probe within descriptor budget on start
ManagedContainer.start was firing the subclass `readinessProbe()`
exactly once, the moment containerd reported the container as Up.
For OpenClaw this raced the Node.js gateway's HTTP listener bind —
containerd flips status as soon as the entrypoint process spawns, but
the Express server takes a few hundred ms to start serving /readyz.
Single-shot probe → unlucky → state='errored' with
"Readiness probe failed after container reached running state".

Pre-refactor (dev branch) didn't hit this because openclaw used a
two-phase flow: `runtime.startGateway` (no probe) then
`service.waitForReady` (polled /readyz for 30s). When the new
runtime architecture folded openclaw under ManagedContainer, the
polling was lost.

Bring it into the base class: `ManagedContainer.start` now polls
`readinessProbe()` within `descriptor.readinessProbe.timeoutMs` at
`intervalMs` cadence. Deterministic probes (Hermes' `--version` exec)
succeed on the first call and exit immediately — no extra latency.
HTTP probes get the full budget they need.

Also stops misapplying `descriptor.readinessProbe` to the containerd
"Up" wait (which only takes ~50ms anyway — defaults are fine).
2026-05-11 20:41:43 +05:30
DaniAkash
b6172a4109 feat(agents): per-runtime install/start controls via RuntimesSection
The agents page only surfaced OpenClaw's lifecycle controls — Hermes
auto-installed silently at boot with no UI visibility or manual handle.
Adds a generic section that iterates over container-kind runtimes from
/runtimes and renders a control panel + status bar per adapter.

- new useRuntimes() hook hits GET /runtimes
- new RuntimesSection renders one card per container runtime, with an
  adapter-keyed extras registry for adapter-specific affordances
  (panel extras + status-bar pill / actions)
- AgentsPage replaces its hand-rolled openclaw panel + bar with the
  section, plugging Configure-provider + Terminal into the openclaw
  slot via the registry
- the section becomes adapter-agnostic: new container runtimes show up
  on the page automatically (filtered by descriptor.kind === 'container')
2026-05-11 20:15:00 +05:30
DaniAkash
9632b60425 refactor(openclaw): delete legacy UI helpers + types
The runtime state machine is now the single source of truth in the UI;
the old OpenClawStatus surface (controlPlaneStatus, lastGatewayError,
lastRecoveryReason, the status enum) and its consumers are all dead
weight after Chunks 1-4. Drop them.

UI:
- OpenClawControls.tsx: delete StatusBadge, ControlPlaneBadge,
  AgentsPageHeader, LifecycleAlert, ControlPlaneAlert, GatewayStateCards.
  Keep ProviderSelector + InlineErrorAlert — still used by the setup
  dialog and AgentsPage's inline error surface.
- agents-page-utils.ts: delete getControlPlaneCopy, getRecoveryDetail,
  getGatewayUiState, getLifecycleBanner, canManageOpenClawAgents,
  shouldShowControlPlaneDegraded, getControlPlaneCopyForStatus.
- agents-page-types.ts: delete GatewayUiState, LIFECYCLE_BANNER_COPY,
  CONTROL_PLANE_COPY, FALLBACK_CONTROL_PLANE_COPY, RECOVERY_REASON_COPY.
- useOpenClaw.ts: delete OpenClawStatus + GatewayLifecycleAction.
2026-05-11 17:39:48 +05:30
DaniAkash
fdc6b80395 refactor(openclaw): move gateway port ownership into the runtime
Port persistence + reconciliation now lives entirely on the runtime
side. Service keeps a lazy httpClient getter that always reads the
current port from runtime.getHostPort(), so a port change (via
syncState drift detection) propagates everywhere automatically.

Server:
- OpenClawContainerRuntime seeds hostPort from runtime-state.json at
  construction (readPersistedGatewayPortSync) and writes back via
  syncState when the live container's mapping drifts
- OpenClawService.hostPort, setPort, adoptRuntimeHostPort,
  ensureGatewayPortAllocated, isCurrentGatewayAvailable,
  isGatewayAvailable, isGatewayAuthenticated, isGatewayPortReady,
  the httpClient field, and the local fetchOk all deleted
- tryAutoStart is now ~10 lines: syncState → executeAction({type:start})
  → control-plane probe; no port juggling, no auth-mismatch realloc
  (that path was driving the broken-state bug from earlier)
- internal `this.hostPort` references now go through runtime.getHostPort()

Tests:
- delete the four obsolete tryAutoStart tests (each asserted internals
  that are gone) plus the unused mockGatewayAuth helpers
- add two slim tryAutoStart tests pinning the new contract
- existing runtime tests still call setHostPort, so the method survives
  as a test-only override
2026-05-11 17:37:20 +05:30
DaniAkash
4806eb414d refactor(openclaw): drop /claw/status, getStatus, and the gateway block 2026-05-11 16:59:19 +05:30
DaniAkash
7392244574 refactor(openclaw): delete duplicated service-level lifecycle methods
Removes the start/stop/restart/reconnectControlPlane/shutdown surface on
OpenClawService — these duplicated the new AgentRuntime state machine
and were the root cause of the two views disagreeing. UI flows now go
through runtime.executeAction via the RuntimeControlPanel; server
shutdown via getOpenClawRuntime().executeAction({type:'stop'}).

Server:
- delete service.start/stop/restart/reconnectControlPlane/shutdown +
  stopGatewayLogTail (now unreferenced)
- delete /claw/start /claw/stop /claw/restart /claw/reconnect routes
- replace internal `await this.restart()` (createAgent, updateProviderKeys)
  with `runtime.restartGateway` — provider-config changes only need a
  container restart, not a control-plane re-probe
- main.ts shutdown handler uses getOpenClawRuntime().executeAction directly

UI:
- useOpenClawMutations drops startOpenClaw/stopOpenClaw/restartOpenClaw/
  reconnectOpenClaw and pendingGatewayAction; setup/create/delete remain
- AgentsPage drops the legacy LifecycleAlert + ControlPlaneAlert blocks;
  the RuntimeControlPanel already renders pending state on its own
  action buttons

Tests:
- delete tests for the removed methods
- runtime mocks in restart-side tests now expose restartGateway directly
2026-05-11 16:47:20 +05:30
DaniAkash
d6440bdccd refactor(openclaw): derive legacy gateway status from runtime state
OpenClawService.getStatus was carrying its own view of "is the gateway
alive" (running/stopped/uninitialized derived from machineStatus +
isReady probe) while the new AgentRuntime maintains the canonical state
machine. The two could disagree — most visibly after a wipe + partial
restart, where the runtime correctly read not_installed but the service
still reported running/connected from in-memory fields.

Map the legacy status surface from runtime.getStatusSnapshot().state so
both pills can't contradict each other. Clear controlPlaneStatus /
lastGatewayError / lastRecoveryReason whenever the runtime isn't
running — those signals are only meaningful for an alive gateway.

First chunk of the legacy-lifecycle removal. Lifecycle methods on the
service (restart/shutdown/tryAutoStart/etc.) and duplicated hostPort
state still exist and will be removed in follow-up commits.
2026-05-11 16:32:41 +05:30
DaniAkash
349c3743a9 fix(openclaw): seed empty .env in runtime so direct Start works on a fresh install
Starting the gateway via the new RuntimeControlPanel "Start" CTA goes
through runtime.executeAction({type:'start'}) directly, bypassing
OpenClawService.tryAutoStart and its ensureStateEnvFile() seeding step.
On a freshly-wiped .browseros-dev that left nerdctl create failing with
"failed to open env file .../.openclaw/.env: no such file or directory".

Seed the file (empty, mode 0600) inside buildContainerSpec so the
runtime is self-sufficient. Service callers continue to work — their
ensureStateEnvFile is now an idempotent no-op once the file exists.
2026-05-08 23:22:57 +05:30
DaniAkash
830eebae82 fix(openclaw): stop stale gateway before re-allocating port on auth mismatch
When a previous boot leaves a gateway running with a stale token, the
realloc-on-auth-mismatch branch was bumping the persisted port without
actually freeing the old container — ManagedContainer.start() no-ops
when state==='running', so the next start cycle never recreated the
container on the new port. The result: persisted/service/runtime drift
back into mismatch, and history requests 500 with "gateway is not ready"
even while the (stale) gateway keeps serving chat from the old port.

Stop the gateway explicitly when we decide to bump off the port, so the
upcoming start cycle goes through the full remove + create + start path
on the freshly-allocated port. The token-mismatch test still passes;
adds a new test pinning the stop-before-realloc behaviour.
2026-05-08 22:28:56 +05:30
DaniAkash
4ccb7ac0fd fix(openclaw): reconcile drifted gateway host port from live container
When a previous server boot wrote runtime-state.json after the gateway
container had already been created with a different hostPort (e.g. 18789
held at allocate-time → container started on 18790), the persisted port
disagrees with the live mapping. The runtime then probes the persisted
port forever and the UI sticks at `starting`.

`syncState` now reads `NetworkSettings.Ports` from inspect-container and
adopts the actual host port for the gateway container's published port
when it differs. The service then re-syncs `hostPort`/`httpClient` and
rewrites runtime-state.json so the next boot starts from a clean slate.

- ContainerInfo gains a flat `ports` array (parsed from
  `NetworkSettings.Ports`)
- OpenClawContainerRuntime.syncState: reconcile hostPort from live
  mapping before probing /readyz
- OpenClawService.tryAutoStart: adopt the runtime's reconciled port and
  persist it via writePersistedGatewayPort
2026-05-08 21:01:10 +05:30
DaniAkash
ab63827b69 fix(openclaw): sync runtime state from existing container at boot; render Start CTA for installed state
Two stuck-state bugs in the new RuntimeControlPanel:

1. The runtime's state machine started fresh at not_installed on every
   server boot. tryAutoStart's short-circuit branches (gateway already
   running, auth pass) never drove the state transitions, so the UI
   saw not_installed for a gateway that was actually running. Add a
   syncState() method on OpenClawContainerRuntime that probes the
   actual container via cli.inspectContainer + /readyz and sets state
   accordingly. Wire it into tryAutoStart as the first step so it
   runs regardless of which branch the rest takes.

2. RuntimeControlPanel had no case for state === 'installed', so after
   a successful Install the panel went blank instead of offering the
   next step. Treat installed the same as stopped — show the Start
   CTA with copy that reflects the difference (image is pulled vs
   container exists but stopped).

Optional-chained the syncState call so existing tests with partial
runtime mocks don't crash on the missing method.
2026-05-08 19:47:27 +05:30
DaniAkash
8f68d12339 chore: merge feat/openclaw-runtime — picks up bundled-Lima fallback fix 2026-05-08 19:27:03 +05:30
DaniAkash
af16f1cc0c fix(openclaw): tolerate missing bundled Lima at runtime construction
resolveBundledLimactl / resolveBundledLimaTemplate throw synchronously
when the host has no Lima and no bundled resources — that fired during
configureOpenClawRuntime on linux CI runners, breaking server-integration.
Wrap both calls so construction falls back to the bare 'limactl' command
name (and undefined template). Lifecycle ops still fail at spawn time
on platforms without Lima, matching how Hermes/Claude/Codex degrade.
2026-05-08 19:26:31 +05:30
DaniAkash
c099a35dee refactor(ui): wire RuntimeStatusBar + RuntimeControlPanel on AgentsPage; drop legacy lifecycle UI
AgentsPage now uses the new runtime-control components for OpenClaw
lifecycle:
- RuntimeControlPanel replaces GatewayStateCards (state-appropriate
  CTAs gated on capabilities). Provider config dialog trigger lives
  in the panel's extras slot.
- RuntimeStatusBar replaces GatewayStatusBar (running pill +
  Restart). Control-plane pill + Open Terminal live in the bar's
  extra slots — gateway specifics stay outside the runtime layer.

GatewayStatusBar.tsx deletes outright. The 'Unavailable' badge in
AgentSummaryChips.tsx deletes — capabilities-driven UI surfaces the
same signal more usefully on the new RuntimeControlPanel; the prop
stays for upstream callers but is now a no-op.

ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from
OpenClawControls remain — they're alerts for control-plane and
mid-flight lifecycle states, distinct from the runtime control
surface. They cover gateway-specific concerns the runtime layer
doesn't model. Cleanup deferred to a follow-up.
2026-05-08 19:20:40 +05:30
DaniAkash
8eb911d83f feat(ui): RuntimeStatusBar + RuntimeControlPanel components
RuntimeStatusBar — compact one-line bar with adapter name + state pill
+ optional Restart action. Reads from useRuntime(adapter); the pill
covers every container and host-process state. extraPill / extraActions
slots let openclaw add its control-plane pill and Open Terminal
button without baking gateway specifics into the runtime layer.

RuntimeControlPanel — capability-gated state-appropriate primary CTA:
not_installed → Install, stopped → Start, errored → Restart + Reset,
installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI,
running → optional Stop. extras slot for adapter-specific affordances
(e.g. openclaw provider Setup dialog trigger).
2026-05-08 19:12:47 +05:30
DaniAkash
983e433845 feat(ui): add useRuntime / useRuntimeAction / useRuntimeLogs hooks
Generic React Query hooks backed by the typed RPC client (hc<AppType>),
keyed by adapter id. useRuntime polls /runtimes/:adapter/status every
5s by default; useRuntimeAction issues a capability-gated POST to
/runtimes/:adapter/actions/:action and invalidates the status query
on success; useRuntimeLogs is opt-in (disabled by default) for
container runtimes.
2026-05-08 19:10:25 +05:30
DaniAkash
4401e30fdc feat(server): add /runtimes/* route surface
Uniform HTTP surface backed by AgentRuntimeRegistry + runtime.executeAction:
- GET /runtimes — list all registered runtimes (descriptor + status + capabilities)
- GET /runtimes/:adapter/status — single status snapshot
- GET /runtimes/:adapter/status/stream — SSE: snapshot on connect + every state transition
- POST /runtimes/:adapter/actions/:action — capability-gated dispatch through executeAction
- GET /runtimes/:adapter/logs — container-runtime logs (405 for host-process)

Routes use zValidator for path/query/body so the typed RPC client picks
up the schemas; mounted with the same requireTrustedAppOrigin
middleware as /claw/* /terminal /acl-rules /monitoring.
2026-05-08 19:08:50 +05:30
DaniAkash
5da13e54b5 test(openclaw): make persisted-port restart test deterministic on linux CI 2026-05-08 18:56:06 +05:30
DaniAkash
5ea8cff1b6 fix(openclaw): keep runtime constructable on non-darwin so service tests + linux CI work
The previous configureOpenClawRuntime short-circuited to null on
non-darwin, which caused OpenClawService's constructor to throw
"runtime is not available on platform linux" on linux CI runners
— breaking server-api tests (which build the service then mock
service.runtime) and the server-integration test (which spawns
the server on linux). The legacy ContainerRuntime constructor was
platform-agnostic; this restores that.

The runtime now constructs on every platform. descriptor.platforms:
['darwin'] is still the live signal for the UI / adapter health,
and inherited start() fails at limactl-not-found on linux if
anyone actually invokes it. Tests that override service.runtime
post-construction (the standard pattern) work uniformly.

ensureOpenClawRuntime simplifies to a one-liner. The
configureOpenClawRuntime non-darwin test retargets to assert
the runtime is still returned (instead of asserting null).
2026-05-08 18:45:37 +05:30
DaniAkash
b494bbd41c test(runtime): cover OpenClawContainerRuntime descriptor + spec + ACP exec + factory 2026-05-08 16:54:39 +05:30
DaniAkash
f313aa532d refactor(openclaw): switch service + dispatch to OpenClawContainerRuntime; delete legacy ContainerRuntime 2026-05-08 16:49:32 +05:30
DaniAkash
a23fd55934 feat(runtime): add OpenClawContainerRuntime + factory 2026-05-08 16:04:09 +05:30
Dani Akash
4e405681a7 feat(container): richen ManagedContainer — isImageCurrent + logs + sibling-exec (#968)
* feat(container): add isImageCurrent + getLogs + tailLogs + runOneShot to ManagedContainer

Four base-class additions ahead of the OpenClaw runtime migration so
the upcoming subclass doesn't have to re-implement them:

- isImageCurrent() — pure predicate comparing the existing container's
  image ref to descriptor.defaultImage. Treats SHA-pinned variants as
  matches. start() is unchanged; subclasses + service layers compose
  the predicate where they want short-circuit behaviour.
- getLogs(tail) and tailLogs(onLine) — generic log primitives, thin
  pass-throughs to ContainerCli.
- runOneShot(argv, opts) — sibling-container helper that spawns a
  <name>-setup container with the same image+mounts+env (no ports/
  health/restart), runs argv, force-removes after. Includes the
  retry-on-name-collision behaviour previously bespoke to OpenClaw.

Hermes inherits unused surface only — no behavioural change. The
in-flight base-class tests cover all four primitives.

* fix(container): tighten getLogs error path + close runOneShot timeout-onLog leak; trim docstrings

- getLogs now distinguishes a missing container (returns []) from
  other CLI failures (throws). Previously nerdctl's stderr ("Error:
  no such container: …") leaked into the lines array as if it were
  log output. isNoSuchContainer is exported from container-cli to
  share the predicate.
- runWithOptionalTimeout wraps the caller's onLog so post-timeout
  lines from the abandoned runCommand promise become no-ops; before
  this, callers could see onLog fire after runOneShot had already
  rejected, hitting state the caller may have torn down on the
  timeout error.
- Tightens the new docstrings to one short line per the project
  convention; drops a restating comment in the test file.
2026-05-08 15:58:05 +05:30
Dani Akash
b445615d61 refactor(claude+codex): migrate onto HostProcessAgentRuntime; collapse adapter-health (#967)
* feat(runtime): add ClaudeRuntime + CodexRuntime + factories

* refactor(host-adapters): switch wire-up + dispatch + health to runtime registry

main.ts registers ClaudeRuntime + CodexRuntime alongside Hermes. ACP
runtime resolves all three via the registry; legacy host-process
spawn is preserved as a fallback so unit tests that don't bootstrap
runtimes keep working. AdapterHealthChecker now reads runtime
snapshots through the registry — the embedded execAsync probe,
ADAPTER_HEALTH_COMMANDS table, and friendlyProbeFailure mapper
delete. As a side-effect this also fixes the Hermes "Unavailable"
chip (Hermes was missing from ADAPTER_HEALTH_COMMANDS).

Drops the standalone claude-code/prepare.ts and codex/prepare.ts
modules (their bodies are exported from the runtime files now).

* test(runtime): cover ClaudeRuntime + CodexRuntime descriptor + prep + factory

* fix(runtime): coalesce concurrent host-process probes; expose probedAt on snapshot

* fix(runtime): preserve acpx-core npx-wrapped spawn for claude + codex

The host-process runtimes were resolving the ACP spawn command
through their own getAcpExecSpec, which returned argv [claude] /
[codex] — bare binaries. acpx-core's built-in registry actually
resolves these adapters to npx wrappers around the official
ACP-aware packages (claude-agent-acp, codex-acp), and the package
version range is owned by acpx-core. The bare-binary spawn would
fail because either the binary is missing or doesn't speak ACP.

Spawn dispatch now goes through registry.resolve() + wrapCommandWithEnv
for claude/codex (matching pre-#967 behaviour). The runtime
registrations still drive health probing and per-turn prep — only
the spawn-command source-of-truth stays in acpx-core. Drops the
misleading getAcpExecSpec from the host-process runtime classes.

Regression test asserts the spawn command contains the npx package
name (claude-agent-acp / codex-acp) for each adapter.
2026-05-08 13:02:19 +05:30
Dani Akash
d68e8905fe refactor(hermes): migrate Hermes onto ContainerAgentRuntime (#965)
* feat(runtime): add HermesContainerRuntime + factory

* refactor(hermes): switch wire-up + dispatch to runtime registry

main.ts and the agent route stack now resolve Hermes through
`AgentRuntimeRegistry`. Drops the `hermesGateway` plumbing chain
(server.ts → routes → harness → AcpxRuntime), the
`HermesGatewayAccessor` interface, and `resolveHermesAcpCommand`.
Removes `HermesContainerService`, `HermesContainer`, and
`prepareHermesContext`'s standalone module — their behaviour is now
owned by `HermesContainerRuntime`.

* test(runtime): cover HermesContainerRuntime descriptor + lifecycle + factory

* test(runtime): move registry reset to afterEach to survive assertion failures
2026-05-08 11:32:19 +05:30
Dani Akash
e89fccd997 feat(runtime): introduce AgentRuntime abstraction (types, interface, registry, abstract bases) (#964)
* feat(runtime): introduce AgentRuntime types + interface + registry

Foundation for the unified agent-runtime abstraction. No adapter
migrates yet; the existing acpx-runtime, per-adapter prepare
modules, OpenClawService, HermesContainerService, and
adapter-health.ts all keep working unchanged.

This commit adds the data layer of the abstraction:

- `RuntimeDescriptor` discriminates the two kinds we ship today
  (`'container'` | `'host-process'`). UI components route on this.
- `RuntimeState` is the union of both kinds' states — container
  flow `not_installed → installing → installed → starting →
  running → stopped`, host flow `cli_missing | cli_present |
  cli_unhealthy`, plus the shared `errored` and
  `unsupported_platform` terminals.
- `RuntimeStatusSnapshot` carries a single `isReady: boolean` so
  the harness has one bit to read before spawning turns.
- `RuntimeAction` is a typed discriminated union — required args
  (e.g. `agentId` for `'reset-wipe-agent'`) are compile-time
  enforced, removing the previous footgun of optional args on a
  string-keyed dispatch.
- `RuntimeCapability` lists every action a runtime can advertise;
  `getCapabilities()` is the single switchboard the UI uses to
  decide which buttons to render.

`AgentRuntime` interface declares the contract every runtime
implements: status snapshot + subscriber, capability list,
`executeAction(action)`, `buildExecArgv(spec)`, and per-agent home
dir. `prepareTurnContext` is intentionally absent until the first
adapter migrates so callers can't depend on a method that has no
implementation.

`AgentRuntimeRegistry` is a small class + module-level singleton —
adapters register themselves at boot, the harness/UI look up by
`adapterId`. `resetAgentRuntimeRegistry()` is for tests only.

Two error classes round it out: `ActionNotSupportedError`
(capability gate, mapped to HTTP 405 in a later phase) and
`RuntimeNotReadyError` (state gate at the runtime layer, distinct
from the container-layer's `ContainerNotReadyError`).

* feat(runtime): add ContainerAgentRuntime + HostProcessAgentRuntime abstract bases

* test(runtime): cover state translation, action dispatch, registry

* fix(runtime): gate host-process executeAction on capabilities; only stamp probe cache after probe resolves
2026-05-08 09:47:38 +05:30
Dani Akash
805ae8e607 feat(server): ManagedContainer abstraction — Hermes readiness gate + ACP layering fix (#962)
* feat(container): add waitForContainerRunning primitive + typed error

Adds `ContainerCli.waitForContainerRunning(name, opts)` polling
`inspectContainer().running === true` until either the container
reports running or the timeout expires. Distinct from the existing
`waitForContainerNameRelease` (which waits for *deletion*).

Used by the upcoming managed-container layer between
`nerdctl create + start` and "container is ready for exec" so the
harness never spawns a turn against a half-started container —
which is the root cause of the silent first-turn failure on Hermes
today (`hermes-container.ts:130-160` returns immediately after
start).

Defaults sized for cold-start: 30s budget at 500ms cadence.
Throws `ContainerNotRunningError` (new, in `lib/vm/errors.ts`) on
timeout — distinct from `ContainerNameReleaseTimeoutError` so
callers can branch on "didn't come up" vs "didn't get cleaned up".

* feat(container): add ManagedContainer abstract base + state machine

Introduces the abstract base every container-backed agent adapter
will subclass. Owns the canonical state machine (not_installed |
installing | installed | starting | running | stopped | errored),
the lifecycle lock (per-process promise chain + cross-process file
lock), the gated `execute*` family, and the host↔container path
translator.

Subclasses provide only what's actually adapter-specific:
- `descriptor` (image, container name, supported platforms)
- `buildContainerSpec()` for the `nerdctl create` args
- `readinessProbe()` after the container reaches running
- `mountRoots()` for the path translator

Three execute methods, all sharing one invariant — every entry
point gates on state == running:

- `execProcess(spec)` spawns a long-lived child process via Bun,
  waits through `starting` up to 60s, throws typed
  `ContainerNotReadyError` if the container is not_installed /
  stopped / errored / timed out.
- `execOneShot(spec)` is a buffered convenience wrapper.
- `buildExecArgv(spec)` is the pure builder for callers (acpx-core)
  that need a shell-command string. Single source of truth for the
  `env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` chain
  that today's ACP runtime hand-rolls in two places (`acpx-runtime
  .ts:780-820` and `:823-870`).

`reset(level)` is on the API surface but throws
`ResetNotSupportedError` so the next PR can wire soft / wipe-agent
/ hard without revving the abstract class.

Path translator uses lexical containment against declared mount
roots; the realpath-based symlink-escape check lives one layer up
(in the file-attribution code that already shipped) since the
translator itself never reads from disk.

* feat(container): HermesContainer subclass + wrapper-service bridge

`HermesContainer` (lib/container/managed/) is the first concrete
adapter on the new `ManagedContainer` base. Provides the four bits
that are actually adapter-specific:

- `descriptor`: image, container name, supported platforms,
  readiness-probe tuning.
- `mountRoots()`: host↔container path mapping for the harness dir.
- `buildContainerSpec()`: nerdctl create args (env, mounts,
  add-hosts, entrypoint override).
- `readinessProbe()`: execs `hermes --version` inside the
  freshly-started container; bypasses the state gate via
  `cli.exec` since we're in `starting`, not `running`, when the
  probe runs.

`HermesContainerService` (api/services/hermes/) is rewritten as a
thin wrapper that delegates `prewarm` / `start` / `stop` /
`restart` / `shutdown` to the underlying `HermesContainer`. Public
surface is preserved so `main.ts`, `server.ts`, and
`agent-harness-service` compile unchanged in this PR; `getAccessor()`
still returns the structural `HermesAccessor` the ACP runtime
expects today (the runtime swap is the next commit). The wrapper
also exposes `getContainer(): HermesContainer | null` for callers
that want the richer surface.

The user-visible bug — Hermes silent first-turn failure — is fixed
as a side effect: `start()` now waits through
`cli.waitForContainerRunning` and runs the `hermes --version`
readiness probe before transitioning to `running`. Subsequent
chat turns are gated on the container actually being ready, not
just on `nerdctl create + start` having returned.

* feat(agent): ACP runtime spawns Hermes via ManagedContainer.buildExecArgv

`resolveHermesAcpCommand` no longer hand-rolls the
`env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` chain.
It now delegates to `gateway.buildExecArgv`, which the wrapper
service routes to the underlying `ManagedContainer.buildExecArgv`.

The structural `HermesGatewayAccessor` type gains one method
(`buildExecArgv`) — keeps the existing four getters so any
test/legacy caller still works. The wrapper's `getAccessor()`
delegates `buildExecArgv` to its `HermesContainer`. Net effect:
the `limactl shell ... -- nerdctl exec ...` argv chain has
exactly one owner (`ManagedContainer.buildExecArgv` in the
container layer) instead of being duplicated across `acpx-runtime`
and the now-deleted hand-built chain.

The OpenClaw branch (`resolveOpenclawAcpCommand`) is untouched —
its migration to ManagedContainer is a separate, larger PR that
also has to model the gateway / control-plane surfaces.

Tests: the existing acpx-runtime test suite expected the four
old getters; updated the Hermes-container fixture to also
provide `buildExecArgv` (mirrors the production builder inline so
the test stays independent of the production class wiring). All
320 server tests pass.

* test(container): managed-container + hermes-container coverage

20 cases across two files in `tests/lib/container/managed/`.

ManagedContainer base (14 cases):
- State machine: start() walks installing → starting → running;
  probe-false lands errored with lastError populated; stop()
  force-transitions to stopped even from errored.
- execProcess gating: rejects ContainerNotReadyError with
  reason='not_installed' when never started; reason='errored'
  when in errored state (preserving lastError); resolves once
  state flips to running while waiting; reason='timeout' when
  starting never resolves.
- buildExecArgv: snapshot test pinning the exact canonical
  `env LIMA_HOME=… limactl shell <vm> -- nerdctl exec -i …` string
  for the Hermes-shaped invocation; -e flags omitted when env is
  empty.
- reset(level): throws ResetNotSupportedError for all three
  levels (Phase 1 stub).
- Path translation: round-trip host ↔ container under a declared
  mount; mount-root itself translates without suffix; rejects
  PathOutsideMountsError for /etc/passwd / /proc/cpuinfo.
- subscribeState fires every transition, stops after unsubscribe.

HermesContainer subclass (6 cases):
- Descriptor declares adapterId='hermes', the canonical container
  name, image, and darwin platform support.
- start() happy path reaches running + invokes the
  `hermes --version` probe via cli.exec.
- Probe-non-zero start() lands errored with the right error.
- ContainerSpec built with idle entrypoint, harness bind-mount
  (source = /mnt/browseros/vm/hermes/harness, target =
  HERMES_CONTAINER_HARNESS_DIR), and host.containers.internal
  add-host pointing at the VM gateway.
- toContainerPath maps host harness paths to /data/agents/harness.
- buildExecArgv produces the canonical Hermes ACP spawn string
  with LIMA_HOME, container name, hermes binary path, and -e env.

Pre-existing test in tests/lib/container/container-cli.test.ts
(`waits until a container name is no longer resolvable`) flakes
under parallel test load on dev; passes solo. Last touched in
fd5aba24, well before this branch.

* chore: tidy comments

* fix(hermes): use provider:custom for openai + openai-compatible

Hermes (v2026.4.x) does not have a provider key called "openai" —
its `PROVIDER_REGISTRY` enumerates 33 named providers (anthropic,
deepseek, gemini, kimi-coding, etc.) and "openai" is not one of
them. Per the upstream docs, the canonical shape for any
OpenAI-compatible endpoint with an API key is:

    model:
      provider: custom
      base_url: "<endpoint>"

When `base_url` is set, Hermes ignores provider lookup and calls
the URL directly using OPENAI_API_KEY (or the configured api_key).
Today's mapping wrote `provider: "openai"` for both BrowserOS
provider types — Hermes' main-model loader rejected that with
`unknown provider 'openai'`, and the harness surfaced an opaque
"Internal error" on every first chat for any Hermes agent backed
by a Fireworks / Together / Groq / OpenAI provider.

Fix:
- `openai` and `openai-compatible` BrowserOS types now both map
  to `hermesProvider: 'custom'`.
- HermesProviderMapping gains an optional `defaultBaseUrl` field
  used when `provider: 'custom'` is set with no caller-supplied
  baseUrl (BrowserOS' `openai` type doesn't require base_url at
  the API edge, but Hermes' `custom` always does — so we fall
  back to https://api.openai.com/v1).
- writeHermesPerAgentProvider rejects `provider: 'custom'` with
  no base_url so a future regression fails loudly instead of
  silently writing an unusable config.yaml.

Tests updated: the existing openai-compatible case now asserts
`provider: "custom"` instead of `"openai"`, plus a new case
covering the openai-default-base-url fallback path.

Note: the `openrouter` mapping is left untouched because its
fix is unverified (Hermes' PROVIDER_REGISTRY doesn't appear to
contain "openrouter" either, but the auxiliary fallback chain
recognises it). Worth a separate follow-up — out of scope for
this fix which targets the user-reported reproduction.

* fix(container): install() must ensure VM is ready before image pull

Image operations run inside the Lima VM, so `nerdctl pull` fails
on a cold-boot run if the VM hasn't been started yet.
`HermesContainerService.prewarm()` (the original wrapper) always
called `vm.ensureReady()` before `ensureImageLoaded()` — the
wrapper-bridge introduced earlier in this PR delegated `prewarm()`
to `container.install()` and dropped the VM-ensure step.

`start()` does ensure VM, but on cold boot `prewarm()` and
`start()` race for the lifecycle lock and there is no guarantee
which one wins. When `prewarm()` lands first, the image pull
crashes against an unstarted VM and Hermes never comes up.

Fix: `install()` now awaits `deps.vm.ensureReady()` before
transitioning to `installing`. Errors land in `errored` exactly
as before. New regression test pins the call order
(`vm.ensureReady` → `loader.ensureImageLoaded`) so a future edit
can't silently re-introduce the gap.
2026-05-08 08:14:45 +05:30
Nikhil
833baec84d fix(agent): offset sidebar content to prevent overlap on narrow viewports (#960)
* fix(agent): offset main content by collapsed sidebar width to prevent overlap

Add pl-14 (56px = w-14) to both main branches in SidebarLayout so the
content is always offset to the right of the fixed overlay sidebar.
Previously, on viewports narrower than ~1300px the expanded sidebar
would visually overlap the left edge of the centered content.

* fix(agent): DRY up sidebar offset — hoist pl-14 to parent div

Move pl-14 from the two <main> branches to their shared parent div
so any future layout branch gets the rail offset automatically.
Functionally equivalent; verified NewTabChat uses absolute inset-0
relative to its own <main>, so the chat layout is unaffected.
2026-05-07 10:15:21 -07:00
shivammittal274
7a2a8e09bc feat(agent): add Hermes as 4th ACPX adapter (in-VM container, BrowserOS-managed providers) (#956)
* feat(agent): add Hermes as a 4th ACPX adapter (Phase A)

Adds Hermes Agent (NousResearch/hermes-agent) as a host-process ACPX
adapter, mirroring the Claude Code pattern.

- agent-types.ts: extend AgentAdapter union with 'hermes'
- agent-catalog.ts: add Hermes catalog entry
- lib/agents/hermes/prepare.ts (new): minimal prepare using prepareBrowserosManagedContext
- acpx-agent-adapter.ts: register the adapter
- acpx-runtime.ts: add 'hermes' branch returning 'hermes acp' (host)
- AdapterIcon.tsx: add Hermes icon
- db schema + supporting frontend types/literals updated for the new adapter

Phase A scope: host-process only. Phase A.5 swaps to nerdctl exec
into a Hermes container.

OpenClaw is untouched. Verified by all 6 POC spikes
(plans/features/claude-browseros-hermes-poc/findings.md).

* fix(agent): address Hermes adapter review issues

- NewAgentDialog: add 'hermes' to onValueChange guard so the dropdown
  option actually wires through onRuntimeChange/onHarnessAdapterChange
  (was a no-op before — selecting Hermes silently kept previous value)
- tests/acpx-runtime: add coverage for the new 'hermes' registry branch
- tests/acpx-agent-adapter: fold hermes prepare test into existing file,
  matching the pattern used for claude/codex/openclaw
- Delete tests/lib/agents/hermes-prepare.test.ts (now redundant)
- Reconcile install-mechanism comment between acpx-runtime.ts and
  agent-catalog.ts

* fix(agent): make Hermes adapter actually work end-to-end

Two surgical fixes uncovered while running the Phase A smoke test
through the BrowserOS chat HTTP API:

1. lib/agents/hermes/prepare.ts — seed per-agent HERMES_HOME from
   the user's global ~/.hermes/ on first use. ensureAgentHome only
   writes SOUL.md and MEMORY.md; without seeding config.yaml, .env,
   and auth.json, hermes acp comes up unconfigured and either hangs
   or errors with "No LLM provider configured." Copy is idempotent
   (skip if dest exists) so subsequent prepare calls don't clobber
   per-agent edits.

2. lib/agents/acpx-runtime.ts — wrap the hermes spawn in
   `bash -c "exec hermes acp | tee /dev/null"` to bridge Bun's
   socketpair-based child stdio with Python's asyncio.connect_write_pipe
   (which only drains correctly to a real pipe(2)). Without it, hermes'
   stdout never reaches the harness — verified by inspecting hermes
   process FDs: Bun gives the child unix sockets, asyncio queues writes
   that never become readable on Bun's end. With tee in the middle,
   hermes writes to a real pipe and tee bridges the bytes through the
   socket. Verified 2026-05-06 against hermes-agent 0.12.0 on macOS
   arm64 + Bun 1.3.6.

Smoke-test result with both fixes:
- ACP session created end-to-end
- BrowserOS MCP wired (96 browser tools registered with hermes)
- Reasoning + text streamed back through /agents/:id/sidepanel/chat
- Final stream: text-delta "PONG", finishReason "stop"

Updates the existing acpx-runtime test to assert the new spawn shape
(bash -c, tee /dev/null bridge) so the workaround can't silently regress.

* feat(agent): run Hermes adapter in Lima container (Phase A.5)

Move Hermes ACPX adapter from host-process spawn to running inside
docker.io/nousresearch/hermes-agent:v2026.4.30 in the existing
BrowserOS Lima VM, mirroring the OpenClaw container pattern.

Container lifecycle (api/services/hermes/hermes-container.ts):
- prewarm: ensure VM ready, pull image (or skip if already in
  containerd), start an idle container with /bin/sh -c "exec sleep
  infinity" so the harness can nerdctl exec into it per turn
- Tini bypassed — tini 0.19.0 in upstream image getopt-parses any
  -x token even after PROGRAM, breaking /bin/sh -c
- --add-host host.containers.internal:<vm-gateway> so hermes inside
  the container can reach the BrowserOS HTTP MCP endpoint
- Bind-mount <browserosDir>/vm/hermes/harness onto /data/agents/harness
  so per-agent HERMES_HOME dirs are visible to the container

Spawn (acpx-runtime.ts):
- HermesGatewayAccessor interface (mirrors OpenclawGatewayAccessor)
- resolveHermesAcpCommand builds:
  env LIMA_HOME=... limactl shell --workdir / browseros-vm --
    nerdctl exec -i -e PYTHONUNBUFFERED=1 -e HERMES_HOME=... <container>
    /opt/hermes/.venv/bin/hermes acp
- Absolute path /opt/hermes/.venv/bin/hermes (not bare "hermes") since
  upstream image's PATH is set by its entrypoint script which we
  override to keep the container idle
- Falls back to host-process spawn when no HermesGatewayAccessor wired
  (test path / dev fallback)
- Drops the host-mode bash+tee workaround — limactl/SSH/nerdctl pipe
  chain is sufficient for asyncio's pipe writer

MCP wiring:
- New PreparedAcpxAgentContext.browserosMcpHost field threads through
  prepare → getRuntime → createBrowserosMcpServers
- Hermes prepare sets browserosMcpHost='host.containers.internal' so
  the URL injected into newSession.mcpServers resolves from inside
  the container; other adapters keep '127.0.0.1' default

Per-agent home (lib/agents/hermes/prepare.ts):
- HERMES_HOME points at /data/agents/harness/<agentId>/home (in-container)
- Host-side seedHermesHomeFromGlobal still copies ~/.hermes/{config.yaml,
  .env, auth.json} into the per-agent home; the volume mount makes them
  visible inside the container
- New api/services/hermes/hermes-paths.ts holds host/container path helpers

End-to-end smoke tests against the dev server (clean Lima state):
- Plain text: PONG round-trip via /sidepanel/chat ✓
- Multi-turn context: RUBY-7421 stored + recalled ✓
- Multi-agent isolation: agent 2 doesn't see agent 1's secret ✓
- MCP tool execution: mcp_browseros_browseros_info fires ✓
- Image attachment via /chat: model identifies "Red" from a 128x128 PNG ✓
- Concurrent turns + 409 attachUrl: full attach streams the in-flight
  Pacific Ocean essay turn cleanly ✓
- Cancel midstream + recovery turn: ALIVE response ✓
- Persistence across server restart: agents survive ✓

Companion knowledge doc:
plans/features/claude-browseros-hermes-acp-knowledge.md

* feat(agent): per-agent provider/key for Hermes adapter

Lets users create multiple Hermes agents each with its own provider,
model, and API key. NewAgentDialog now shows provider/model/key fields
inline when 'Hermes' is selected. On submit, the harness writes the
per-agent <browserosDir>/vm/hermes/harness/<agentId>/home/{config.yaml,
.env} directly so the agent has the right config from turn 1 — no
dependency on the user having run `hermes setup` outside BrowserOS.

The existing seedHermesHomeFromGlobal flow remains as a fallback for
agents created without provider fields (e.g. via direct API or with
an existing ~/.hermes/ install).

Backend:
- shared/constants/hermes.ts: HERMES_SUPPORTED_PROVIDERS registry
  (openrouter, anthropic, openai, custom — bedrock follow-up)
- api/services/hermes/hermes-paths.ts: writeHermesPerAgentProvider
- agent-harness-service: writes per-agent config.yaml + .env in
  createAgent when adapter=hermes and apiKey present
- routes/agents.ts: relax modelId catalog validation for adapter=hermes
  (catalog has empty models[] by design; per-agent modelId is free-form)
- tests/agent-harness-service: cover write + skip paths

Frontend:
- HermesProviderFields.tsx (new): provider dropdown, model field, API
  key + optional baseUrl when provider=custom
- NewAgentDialog: render the new fields when adapter=hermes
- agents-page-actions: thread fields through createHarnessAgent
- AgentsPage / agent-harness-types: minor pass-through edits

Smoke-tested end-to-end against the dev server (clean Hermes per-agent
home, no ~/.hermes/ seed): create agent with apiKey + modelId, files
written at the per-agent path with mode 0600, first chat returns the
expected response, all without touching ~/.hermes/.

* feat(agent): source Hermes provider config from BrowserOS LLM providers

Replace the Hermes-specific provider/model/API-key form in New Agent
with a chooser that pulls from the same global LLM providers OpenClaw
uses (Settings → BrowserOS AI). Backend rejects creation with a 400
when the selected provider is missing required fields (apiKey, modelId,
plus baseUrl for openai-compatible) or is not in the Hermes-supported
set; the ~/.hermes/ fallback is removed so Hermes agents always carry
their own per-agent config.
2026-05-07 21:54:36 +05:30
shivammittal274
6f8da5b7fb refactor(openclaw): TKT-788 cleanup (relanded, openclaw-only) — bump image, lock no-auth, delete observer + image bypass (#954)
* refactor(openclaw): TKT-788 cleanup — bump image, lock no-auth, delete observer + image bypass

Re-lands the openclaw-only changes from #934 (reverted in #953 because the
original PR's working tree had stale rollback content for
`packages/browseros/tools/patch/`). This commit is the same openclaw
diff with zero changes outside `packages/browseros-agent/`.

What changes (TKT-788 work-streams A + B + C):

WS-A — bundled gateway no-auth:
- Bump image from `ghcr.io/openclaw/openclaw:2026.4.12` to
  `ghcr.io/browseros-ai/openclaw:2026.5.2-browseros.1` (BrowserOS-
  pinned variant with the no-auth contract baked in).
- Configure gateway with `auth.mode: 'none'`; remove the device-auth
  bootstrap dance that the older binary required.
- Delete the per-call token plumbing the http-client / observer / chat-
  client carried (340 LOC). The harness still passes a stable token in
  headers for backwards-compat with code that hasn't been re-pointed yet,
  but it is no longer required by the gateway.

WS-C — delete the image-attachment bypass:
- The HTTP `/v1/chat/completions` carve-out for OpenClaw image turns
  is gone. Image attachments now ride through ACP as image content
  blocks (which acpx 0.6.x supports natively for openclaw, claude, codex).
- Delete `openclaw-gateway-chat-client.ts` (211 LOC) and `image-turn.ts`
  (219 LOC).
- Drop `maybeHandleTurn` from the `AcpxAgentAdapter` interface and
  the openclaw entry. `AcpxAdapterTurnInput` removed.
- Drop the corresponding 'diverts OpenClaw image turns to the gateway
  chat client' test from `acpx-runtime.test.ts`.

WS-B — replace the WS observer with harness events:
- Delete `openclaw-observer.ts` (276 LOC) — no more parallel WS
  subscription, no more `new OpenClawObserver`, no more
  `ensureObserverConnected` / `observer.disconnect()` plumbing.
- Wire `AgentHarnessService` to receive turn-lifecycle events from
  the runtime stream itself (`turnLifecycleListeners`) and feed
  ClawSession from those, preserving the dashboard SSE shape.

Net: 314 insertions / 1144 deletions, all under
`packages/browseros-agent/`. Typecheck clean across all 6 packages.
946 server tests pass (1 unrelated CDP-dependent test skipped — same
state as origin/dev).

Reference: TKT-788. The patch-CLI rollback that was in the squash of
#934 is intentionally NOT in this commit.

* fix(openclaw): handle 2026.5.4 acp-cli envelope shapes (media + injected timestamp) + bump image

OpenClaw 2026.5.4 (the BrowserOS-pinned image variant with the no-auth
handshake bypass needed for cron tool calls from inside ACP) introduced
two new envelope prefix shapes that the post-bypass-deletion path now
surfaces in user-message text:

  [media attached: <internal-path> (<mime>)]
  [<weekday> <YYYY-MM-DD HH:MM> <TZ>] [Working directory: <path>]
  <BrowserOS role envelope>

The previous cleaner only matched a leading [Working directory: ...]
\n\n line. With media + timestamp prefixes ahead of it the anchor
no longer matched, so image-attachment user turns rendered with
8+ lines of envelope leak in the chat panel.

Replaces the single OPENCLAW_WORKDIR_PREFIX with three content-shape-
anchored patterns chained through stripOpenClawAcpCliEnvelope():

  1. [media attached: <path> (<mime>)]      ← repeats per attachment
  2. [<weekday> <YYYY-MM-DD HH:MM> <TZ>]    ← injectTimestamp
  3. [Working directory: <path>]            ← acp-cli prefixCwd

Each is anchored on its content shape (media attached:, weekday
abbrev + ISO date, Working directory:) rather than just '[…]', so
user-typed lines that happen to start with brackets are not eaten.

Also bumps OPENCLAW_IMAGE from 2026.5.2-browseros.1 to
2026.5.4-browseros.1. The 5.2 image refused tool-side WS connections
with 'device identity required' even though gateway auth.mode=none —
PR #6 in browseros-ai/openclaw added the OPENCLAW_GATEWAY_PRIVATE_INGRESS_NO_AUTH
bypass that ships in 5.4. Without 5.4, the cron tool (and any other
tool that opens a fresh gateway WS from inside the embedded runner)
fails with 1008.

Verified end-to-end with the BrowserOS chat endpoint:
- Plain text turn: clean
- Image attachment turn: clean (was leaking 8 envelope lines pre-fix)
- One-shot kind:at cron fires, PING fire renders clean
- Second openclaw agent creates, runs, history isolated

15/15 history-mapper unit tests pass; typecheck clean across all
packages.
2026-05-07 02:26:25 +05:30
shivammittal274
50cbe48558 Revert "refactor(openclaw): lock no-auth gateway, bump image, delete token pl…" (#953)
This reverts commit d81b99c8e3.
2026-05-07 01:49:50 +05:30
shivammittal274
d81b99c8e3 refactor(openclaw): lock no-auth gateway, bump image, delete token plumbing (TKT-788 WS-A) (#934)
* fix: disable bundled OpenClaw gateway auth

* refactor(openclaw): delete token plumbing now that auth is locked off

Builds on the cherry-picked spike (#933). With gateway.auth.mode=none
locked in as the only path the bundled gateway runs, the BrowserOS-side
token machinery becomes dead weight. This commit deletes:

- OpenClawService: token field, tokenLoaded, gatewayAuthMode state
  machine, getGatewayToken(), getGatewayHttpToken(),
  ensureTokenLoaded(), refreshGatewayAuthToken(),
  loadTokenFromConfig() and all six lifecycle call sites.
- OpenclawGatewayAccessor.getGatewayToken interface field.
- OpenClawHttpClient / OpenClawGatewayChatClient: optional getToken
  constructor arg and authHeaders() helpers.
- OpenClawObserver: gatewayToken field/parameter and the auth.token
  branch in the connect frame.
- GatewayContainerSpec.gatewayToken and the
  OPENCLAW_GATEWAY_TOKEN env wiring; the
  OPENCLAW_GATEWAY_PRIVATE_INGRESS_NO_AUTH=1 env is now always set
  rather than conditional.

Test suites: dropped bearer-token assertions and the two persisted-token
tests in openclaw-service that asserted deleted behavior.

Net: -310 LOC across src + tests, with 118 openclaw + acpx tests still
green. Typecheck and biome clean.

Reference: TKT-788 (move OpenClaw integration to ACPX runtime), WS-A.

* refactor(openclaw): delete gateway image bypass, route image turns via ACP (TKT-788 WS-C) (#935)

* refactor(openclaw): delete gateway image bypass, route image turns through ACP

The browseros-ai/openclaw ACP bridge accepts image content blocks
natively (extractAttachmentsFromPrompt at openclaw/src/acp/event-mapper.ts:92,
forwarded via chat.send attachments at translator.ts:295), so the
BrowserOS-side carve-out that diverted image-bearing turns to the
gateway HTTP /v1/chat/completions endpoint is no longer needed.

Deletes:

- apps/server/src/api/services/openclaw/openclaw-gateway-chat-client.ts
- The corresponding test file
- AcpxRuntime.sendOpenclawViaGateway, persistGatewayTurn,
  recordToOpenAIMessages helpers
- The image-attachment carve-out branch in AcpxRuntime.send
- openclawGatewayChat option from AcpxRuntime + AgentHarnessService
  + agent routes ctor wiring
- The randomUUID import (only the deleted helper used it)
- The acpx-runtime test for the deleted carve-out

Net: 614 LOC removed, 0 added, all 142 openclaw + acpx + agent tests
still green.

Reference: TKT-788, WS-C. Stacked on WS-A (#934).

* refactor(openclaw): delete WS observer, feed ClawSession from harness events (#936)

The openclaw-observer.ts WebSocket observer was a second tap on the
same gateway events the AcpxRuntime already sees as ACP session/update
notifications. Replace it with a pull from the AgentHarnessService's
turn lifecycle stream — keeping ClawSession and the /openclaw/dashboard
SSE endpoint shape unchanged for the BrowserOS UI.

Changes:

- AgentHarnessService: emit `turn_started` / `turn_event` / `turn_ended`
  to subscribers via a new `onTurnLifecycle(listener)` API. Wired around
  the existing `notifyTurnStarted/Ended` calls and inside the
  per-event read loop.
- agents route: forward an optional `onTurnLifecycle` dep into the
  service it constructs.
- server.ts: subscribe and route OpenClaw-adapter events to
  `OpenClawService.recordAgentTurnEvent(agentId, sessionKey, event)`.
- OpenClawService: new `recordAgentTurnEvent` method that maps stream
  events to ClawSession transitions (working/idle/error + currentTool
  from `tool_call` events). Keeps the existing
  `onAgentStatusChange` / `getAgentState` / `getDashboard` API.
- Delete `openclaw-observer.ts` (276 LOC) and all observer wiring
  (`new OpenClawObserver`, `ensureObserverConnected`, three
  `observer.disconnect()` call sites, the import).

Net: 276 LOC removed from the observer; ~130 LOC added across harness
event plumbing + recorder method. -146 LOC overall, all 141 tests still
green, typecheck clean, biome clean.

Reference: TKT-788, WS-B (Path 1: keep ClawSession + dashboard SSE shape).
Independent of WS-A (#934) and WS-C (#935); will rebase on top of
whichever lands first.

---------

Co-authored-by: Nikhil Sonti <nikhilsv92@gmail.com>
2026-05-07 01:40:37 +05:30
shivammittal274
86cb03a1fc fix(openclaw): drop BrowserOS-envelope regexes in history mapper (#952)
* fix(openclaw): drop BrowserOS-envelope regexes in history mapper

Replace the four BrowserOS-side regex strips (`<role>`,
`<user_request>`, `<system-reminder>`, `[Working directory:]`)
in history-mapper with a single call to
`unwrapBrowserosAcpUserMessage`. That helper is the same exact-string
matcher acpx-runtime already uses for non-OpenClaw history paths
(chat history endpoint, listing's `lastUserMessage`); it anchors on
the exact constants `buildBrowserosAcpPrompt` writes, so matcher and
wrapper travel together.

Also drops two patterns that were defensive-only with no emit site in
the codebase (`[Working directory:]` prefix and trailing
`<system-reminder>` block), and updates the corresponding tests to
use the realistic envelope shape `buildBrowserosAcpPrompt` actually
produces.

The OpenClaw-injected scaffolding patterns (cron prefix, queued-
marker, subagent context) stay in place for now — replacing those
needs either a side-channel cache keyed on cron job id or a structured
`trigger` field on the gateway's history schema, tracked as a
follow-up.

* fix(openclaw): strip acp-cli's [Working directory:] prefix before BrowserOS unwrap

The previous commit incorrectly removed the workdir-prefix strip on the
assumption it was speculative defensive code with no live emit site.
Actually emitted by OpenClaw's acp-cli (`/app/dist/acp-cli-*.js` line
1361, `prefixCwd ? \`[Working directory: ${displayCwd}]\\n\\n...` style),
so live history rendering regressed: every user message surfaced with
a `[Working directory: /Users/...]\\n\\n<role>...` envelope intact.

Restore the strip as an exact-shape line match (`^\\[Working directory:
[^\\]]*\\]\\n\\n`) anchored on the closing bracket + double-newline so
path content is consumed without a content-shape regex. Apply it
ahead of `unwrapBrowserosAcpUserMessage` so the BrowserOS unwrap's
`^<role>` anchor can match the now-leading envelope.

Also fix the test fixture: the BrowserOS unwrap performs exact-prefix
match against the full `BROWSEROS_ACP_AGENT_INSTRUCTIONS` constant —
truncated `<role>...` test bodies didn't match. Tests now use the
verbatim constant text via a shared `ROLE_BLOCK` helper.

Verified live: 8/8 history entries render with no envelope leaks.
2026-05-06 23:54:09 +05:30
shivammittal274
7765d99c73 feat(openclaw): aggregate sub-session history into agent main session (#939)
* feat(openclaw): aggregate sub-session history into agent's main session

Cron-triggered (and hook/channel-triggered) runs land in their own
ephemeral session files under the parent agent's directory:

  /home/node/.openclaw/agents/<agentId>/sessions/<runId>.jsonl

The chat panel reads agent:<id>:main, so autonomous runs were invisible
in history even though they fired and persisted on disk.

This change makes `getSessionHistory(agent:<id>:main)` enumerate every
session under that agent (via the existing `sessions.list` gateway RPC)
and merge their messages into one chronological response. Each merged
message is tagged with `source` (main / cron / hook / channel) and the
sub-session's key, so the UI can render section markers without
re-parsing.

Filesystem isolation is enforced upstream — `sessions.list({ agentId })`
resolves to that agent's directory only (browseros-ai/openclaw
src/config/sessions/combined-store-gateway.ts:90), so no cross-agent
leakage is possible.

Behavior:
- Main session keys (`^agent:[^:]+:main$`) → aggregate
- Any other key → existing single-session behavior
- Sub-session fetch failures → logged + dropped (partial timeline
  preferable to a hard failure that hides main)
- `limit` applied post-merge across the unified timeline
- Streaming variant (`Accept: text/event-stream`) unchanged for now

Reuses the pre-existing `cliClient.listSessions` and
`httpClient.getSessionHistory` — no new gateway integration.

Validation:
- bun typecheck clean
- bunx biome check clean
- 44 openclaw service + route tests pass

* feat(openclaw): wire chat panel history through gateway aggregation

Adds the missing seam between the chat panel's history fetch and
OpenClawService's aggregated history.

Before this change:
- Chat panel calls GET /agents/<id>/sessions/main/history
- AgentHarnessService.getHistory delegates to AcpxRuntime.getHistory
- AcpxRuntime reads ~/.browseros-dev/agents/acpx/sessions/<key>.json
- That local file is only written by AcpxRuntime.send (user turns)
- Cron / hook / channel turns persist on the gateway side instead
- Panel sees user turns only; autonomous turns are invisible

After this change:
- OpenClawProvisioner gains optional getAgentHistory(agentId) method
- AgentHarnessService.getHistory branches on adapter — for openclaw,
  routes through the provisioner instead of the runtime
- server.ts wires the provisioner method to call
  OpenClawService.getSessionHistory("agent:<id>:main") which already
  aggregates main + every sub-session
- New history-mapper.ts converts OpenClaw rich content blocks
  (text/thinking/toolCall/toolResult) into AgentHistoryEntry shape
  the chat panel consumes

Layering preserved:
- AcpxRuntime untouched, still generic, zero services/openclaw imports
- AgentHarnessService still talks only to abstract OpenClawProvisioner
- server.ts is the single concrete-binding seam (same place that
  wires createAgent, removeAgent, getStatus)
- Other adapters (claude, codex) keep their existing local-file
  history path — no behavior change for them

Tool-call pairing: assistant `toolCall` blocks are stored by
toolCallId; subsequent `toolResult` (role: 'tool') messages mutate the
same AgentHistoryToolCall reference to attach output / error, so the
UI renders complete tool entries instead of orphan inputs.

Net: +240 LOC, 1 new file, AcpxRuntime untouched, 117 tests still pass.

* feat(openclaw): paginate aggregated history + strip prompt scaffolding

Two follow-ups on the aggregation work, both required for the chat
panel to render OpenClaw history cleanly.

1. Compound-cursor pagination across sub-sessions

The previous aggregation always returned the full merged window with
cursor=null/hasMore=false, which broke "load more" in the chat panel
once an agent's history grew beyond a single page (every cron job
spawns a sub-session, so this hits quickly).

Per-session cursor support already exists on the gateway HTTP endpoint
(`session-history-state.ts:paginateSessionMessages`). The aggregator
now threads each session's cursor through and emits a compound cursor
encoding `{<sessionKey>: messageSeq | null}`, base64url JSON. A `null`
slot means the session is exhausted; subsequent pages skip it.

The gateway records the per-session monotonic seq inside the
`__openclaw.seq` extension envelope rather than the top-level
`messageSeq` field; the cursor reads from there. The wire-shape type
gains an optional `__openclaw?: { id?, seq? }` field reflecting that.

2. Strip OpenClaw + BrowserOS scaffolding from history user messages

Cron-fired user messages on the gateway side carry an OpenClaw
template:

  [cron:<uuid> <name>] <payload>
  Current time: ...
  Use the message tool if you need to notify the user directly with an
  explicit target. ...

BrowserOS-initiated turns carry the ACP system prefix:

  [Working directory: ...]
  <role>...</role>
  <user_request>
  <actual user text>
  </user_request>
  <system-reminder>...</system-reminder>

Both surface verbatim in the chat panel today. Add
`cleanHistoryUserText` (in history-mapper) which extracts:
- the cron payload (and drops the trailer)
- the user_request body (and drops the role / working-dir / system-
  reminder envelopes)

Non-matching text falls through unchanged so future patterns we don't
recognize stay visible rather than getting silently dropped.

Verified end-to-end:
- /agents history endpoint now returns clean text per item
- Pagination cursor advances across pages with correct seq ordering
- Chat panel renders messages as `print('hello')`, `hey`, etc.
  (no leaked envelopes or trailers)
- 8 new unit tests for cleanHistoryUserText + the converter, +
  86 existing openclaw tests still pass

* feat(openclaw): handle queued-marker concatenation in history cleaner

When multiple cron prompts (or any prompts) arrive while a turn is
still active, BrowserOS's harness queue concatenates them into a
single user message joined by a marker line:

  [Queued user message that arrived while the previous turn was still active]

That blob renders as one wall of text in the chat panel — and worse,
the cron-prompt cleaner doesn't fire because the message no longer
*starts* with `[cron:...]`. cleanHistoryUserText now splits on the
queued-marker line and runs each chunk through the per-message cleaner
(cron-prompt extraction or BrowserOS-prefix unwrap), then joins the
non-empty results with single newlines so each prompt renders as its
own visually distinct line.

Verified live: a 6926-char queued blob containing five concatenated
[cron:...] prompts now renders as five short `print('hello')` lines.

+ 2 unit tests covering split + leading-marker edge case.

* feat(openclaw): drop subagent context + reasoning-only assistant turns

Two new patterns surfaced during e2e cron testing.

1. [Subagent Context] prefix: when an OpenClaw agent invokes a nested
   subagent, the subagent's session is seeded with a user message:

     [Subagent Context] You are running as a subagent (depth N/M). ...
     Begin. Your assigned task is in the system prompt under **Your Role**.

   The actual task lives in the subagent's system prompt; the user
   message body is pure scaffolding. cleanHistoryUserText now returns
   empty for these so the converter drops the entry — no empty bubble.

2. Reasoning-only assistant turns: MiniMax with thinking:minimal often
   returns content with only `thinking` blocks and no `text` block on
   trivial prompts ("Print hello"). The empty text bubble plus dangling
   reasoning collapsible reads as a broken UI. The converter now skips
   any entry where text is empty AND there are no tool calls (regardless
   of reasoning).

Trade-off: reasoning-only turns lose their reasoning collapsible. The
alternative (empty-bubble cards) is worse. If we want to preserve the
reasoning, surface it as the bubble's text — separate UI decision for
later.

+ 3 unit tests covering both patterns.
2026-05-06 00:15:57 +05:30
Dani Akash
db5e55a174 feat(agent-files): expose openclaw produced files inline + outputs rail (#946)
* feat(server): foundation for OpenClaw agent file-output attribution

Phase 1 of TKT-762 — surface files OpenClaw agents produce as
artifacts inline in chat + a per-agent Outputs rail. This commit
lays the storage + I/O foundation only; turn-lifecycle wiring,
HTTP routes, and UI follow in subsequent phases.

- New `produced_files` Drizzle table (FK→agent_definitions with
  cascade, unique on (agent, path) so re-modifications upsert).
  Migration 0002_chemical_whirlwind.sql. Adapter-agnostic schema
  — V1 only enables the watcher for openclaw, V2 can plug Claude
  / Codex into the same table without migrating.
- `ProducedFilesStore` — snapshot/finalize-turn diff API plus
  by-turn / by-agent queries and a path-resolver that enforces
  workspace-root containment for the download / preview routes.
- `walkWorkspace` — bounded recursive workspace walker; skips
  symlinks (no host-fs smuggling), excludes node_modules / .git /
  .cache, hard-capped at 50k entries / depth 16.
- `file-preview` helper — extension + magic-byte MIME detection,
  bounded text-snippet reader (1 MB cap), inline image base64
  reader (4 MB cap). Streaming download path lives in the route
  layer (next phase) — this module only handles the small
  in-memory reads the preview UX needs.

* feat(server): attribute openclaw turn outputs to the harness layer

Phase 2 of TKT-762 — wire the per-turn workspace diff into the
single dispatch path that owns every turn's lifecycle. Two prior
wiring points the original plan named (the OpenClaw HTTP chat
route + OutboundQueueService.tryDispatch) were collapsed in dev
into agent-harness-service.runDetachedTurn — both direct sends
and queued sends route through it now, so a single hook covers
both. The old `OutboundQueueService` is gone; its successor
`message-queue.ts` re-enters runDetachedTurn for the queued
case, so we still only need to bracket once.

Changes:

- New `produced_files` variant on `AgentStreamEvent` so the
  inline artifact card has a wire-format hook independent of the
  REST API.
- `ProducedFilesStore` gains `resolveAgentDefinitionId` to bridge
  gateway-side openclaw agent names to the harness's
  `agent_definitions.id`, handling both the reconciled-row shape
  (id == openclaw name) and the BrowserOS-created shape
  (id = oc-<uuid>, name = openclaw display name).
- `AgentHarnessService.runDetachedTurn`: snapshot the openclaw
  workspace before `runtime.send(...)`, finalize the diff in the
  outer finally, push the resulting rows as a `produced_files`
  event. Adapter-gated to openclaw only — Claude / Codex agents
  write to the user's own filesystem and don't need
  attribution.
- Skip attribution on user-cancel (`abort.signal.aborted`) so
  the side effects of an aborted turn don't get surfaced as
  "outputs you asked for." On runtime errors we still attribute,
  because partial outputs are what the user is most likely to
  want to recover.
- Lazy-init the store via `tryGetProducedFilesStore()` so tests
  that swap in a fake `agentStore` don't trip the
  process-wide `getDb()` initialisation guard.
- File attribution extracted into `attributeTurnFiles` helper to
  keep `runDetachedTurn`'s cognitive complexity under the lint
  ceiling.

Verifications:
- Server tsgo --noEmit clean for changed files.
- 162/162 server-api tests pass.
- Biome lint clean on all three changed files.

* feat(server): expose produced-files HTTP API for /agents

Phase 3 of TKT-762 — surface the rows Phase 2 attributes via four
read-only endpoints under the existing `/agents` router. Mounted
where the agents page already polls so the rail UI doesn't add
a second router/origin to its trust boundary.

Routes:

- GET /agents/:agentId/files
    Outputs-rail data, grouped by the assistant turn that
    produced each batch, newest first. `?limit=` clamps to N
    rows server-side (default 200).

- GET /agents/:agentId/files/turn/:turnId
    Per-turn refresh — used by the inline-card consumer to
    rebuild metadata after the SSE `produced_files` event lands,
    and by direct fetches that missed the live event.

- GET /agents/files/:fileId/preview
    Discriminated `FilePreview` JSON: text snippet (≤1MB),
    base64 image (≤4MB), pdf metadata, or `binary` placeholder
    when neither preview path applies. 404 when the file id is
    unknown OR the on-disk file disappeared after attribution.

- GET /agents/files/:fileId/download
    Streams raw bytes via `Bun.file().stream()` with
    `Content-Disposition: attachment` and the detected MIME
    type. The fileId is opaque — the server resolves the agent
    and on-disk path; the client never sees a path, so traversal
    is impossible by construction.

Service layer:

- `AgentHarnessService` gains `listAgentFiles`,
  `listAgentFilesForTurn`, `previewProducedFile`, and
  `resolveProducedFileForDownload`. All four are no-ops for
  claude / codex adapters (they return null/[]) so the route
  contract stays uniform across adapters even though only
  openclaw produces rows in v1.
- New `ProducedFileEntry` and `ProducedFilesRailGroup` DTOs —
  trimmed wire shapes that strip `agentDefinitionId` and
  `sessionKey` from the on-disk row.

Verifications:
- Server tsgo --noEmit clean for changed files (only pre-
  existing `Bun` global warning).
- 162/162 server-api tests pass.
- Biome clean on both changed files.

Smoke-test instructions for the route shape live in the plan
under §6 and §8; full end-to-end smoke happens in Phase 6.

* feat(agent): client-side hooks + types for agent file outputs

Phase 4 of TKT-762 — frontend foundation for the inline artifact
card and the per-agent Outputs rail. UI components themselves
land in Phase 5; this commit only adds types, hooks, and shared
helpers so the wiring is in place when the components arrive.

New module: `apps/agent/lib/agent-files/`

- `types.ts` — `ProducedFile`, `ProducedFilesRailGroup`, and the
  discriminated `FilePreview` union, mirrored from the server-side
  DTOs in `apps/server/src/api/services/agents/agent-harness-service.ts`.
  The `agentDefinitionId` / `sessionKey` columns on the on-disk
  rows deliberately do NOT exist at the type boundary — clients
  refer to files by opaque `id`.
- `file-helpers.ts` — pure helpers: `inferFileKind` (icon
  routing), `formatFileSize`, `extensionOf`, `basenameOf`,
  `buildFileDownloadUrl`. No React, no fetch, no DOM — anything
  stateful belongs in the hooks.
- `useAgentOutputs.ts` — `useAgentOutputs(agentId)` for the rail,
  `useAgentTurnFiles(agentId, turnId)` for the inline card,
  `useInvalidateAgentOutputs()` for the chat-stream-completion
  hook (Phase 5 will plumb this), and `useRefreshAgentOutputs()`
  for the rail's manual refresh button.
- `useFilePreview.ts` — `useFilePreview(fileId)` with
  `staleTime: Infinity` (previews are immutable for a given id;
  no point refetching on focus). Always opt-in (`enabled`) — the
  preview only loads when the user clicks a row.
- `index.ts` — barrel re-export so consumers import from one path.

Touched in `apps/agent/entrypoints/app/agents/`:

- `agent-harness-types.ts` — added `produced_files` variant + the
  `HarnessProducedFile` type to `AgentHarnessStreamEvent`. Mirrors
  the server-side change from Phase 2 so the client SSE consumer
  type-narrows correctly.
- `useAgents.ts` — exported the previously-private `agentsFetch`
  helper and the `AGENT_QUERY_KEYS` registry so the agent-files
  hooks reuse them without duplicating fetch / key conventions.
  Three new keys added: `agentOutputs`, `agentTurnFiles`,
  `filePreview`.

Verifications:
- Agent tsgo --noEmit clean.
- Biome clean on all touched files.

* feat(agent): inline artifact card + per-agent outputs rail

Wires the chat surface to the produced-files API shipped earlier:

- Inline artifact card under each assistant turn that produced files,
  populated by the live `produced_files` SSE event (resumes also stamp
  `turnId` so a missed live event can fall back to the per-turn fetch).
- Collapsible right-side Outputs rail on the agent conversation page,
  grouped by turn, with Refresh + per-agent open/close persistence in
  localStorage. Gated to openclaw adapters in v1.
- Shared file preview Sheet branches on the FilePreview union: text
  snippet (markdown for `.md`/`.mdx`, otherwise pre+code), image data
  URL, and download-only fallback for pdf/binary/missing.
- Conversation hook invalidates the rail's React Query cache from its
  finally block so newly attributed files appear without a manual
  refresh.

* feat(agent-files): polish — symlink-safe paths + toast on failures

- `resolveFilePath` now rejects symlink-escapes from the workspace
  by realpath-resolving both endpoints and re-checking containment.
  Lexical traversal (`..` segments) still fails fast without
  touching the filesystem.
- Added `produced-files-store.test.ts` with 6 path-resolution cases
  including a symlink whose target lives outside the workspace
  root — the prior string-only check would have allowed this.
- File preview Sheet: surfaces preview-load failures in a toast
  (in addition to the inline error block, which is easy to miss
  when the body has scrolled). Download button now intercepts the
  click so a missing baseUrl shows a toast instead of silently
  hiding the button.
- Outputs rail: refresh failures fire `toast.error` with the
  underlying message.

* fix(agent-files): drop duplicate `/agents` prefix from client paths

`agentsFetch` / `buildAgentApiUrl` already prepend `/agents`, but
the file-output hooks were passing fully-qualified paths
(`/agents/<id>/files`, `/agents/files/<id>/preview`, etc.) which
resolved to `/agents/agents/...` and 404'd. Fixed the four call
sites to pass paths relative to the `/agents` root.

* fix(agents): strip openclaw role envelope from chat history

PR #924 introduced a second `<role>…</role>` prefix for openclaw
turns — a single-line block distinct from the multi-line BrowserOS
role TKT-774 wired the unwrap against. Because TKT-774's
`stripOuterRoleEnvelope` matched the BrowserOS prefix exactly, the
openclaw envelope sailed through unstripped and user messages on
openclaw agents rendered the full preamble in /sessions/main/history
responses.

Make the strip adapter-agnostic: any
`<role …>…</role>\n\n<user_request>\n…\n</user_request>` shape gets
unwrapped. Drops the now-unused BROWSEROS_ACP_AGENT_INSTRUCTIONS
constant and adds a regression test that uses the openclaw form
verbatim.

* feat(agent-files): inline file-card strip with rail deep-link

Replaces Phase 5's row-list ArtifactCard with a horizontal strip
of small file cards under any assistant turn that produced files.
Click a card → opens the FilePreviewSheet directly (preview +
download). Click View / +N → opens the per-agent Outputs rail and
scrolls / expands the matching turn group.

The card strip:
- Caps at 4 visible cards; remainder collapses into a +N pill that
  shares the View handler.
- Owns its own FilePreviewSheet instance (parallel to the
  deprecated ArtifactCard) so the per-card preview path doesn't
  fight with the rail's Sheet.
- Hidden during streaming and absent when producedFiles is empty.
- Adapter-gated upstream: AgentCommandConversation only passes the
  open-rail callback when adapter==='openclaw', so claude / codex
  agents render no rail-opening affordance.

Rail changes:
- Accepts focusTurnId + onFocusTurnConsumed; the matching
  RailTurnGroup expands and scrollIntoView's on focus, then fires
  the consumed callback so the parent can drop the URL state.
- ?outputsTurn=<turnId> deep-links work: external nav opens the
  rail, sets focusTurnId, and clears the param after consumption.

ArtifactCard is marked @deprecated; remove in a follow-up once
nothing imports it.

* fix(agent-files): keep file-card strip visible after history reload

After Phase 7 the inline FileCardStrip vanished as soon as a turn
finished: `filterTurnsPersistedInHistory` dropped the optimistic
turn once history reloaded, and history items don't carry
`producedFiles`. So the user could see a file produced inside an
assistant message but no card to open it.

Two fixes in tandem so the strip survives both the just-finished
case AND a fresh page load:

- New `selectStripOnlyTurns` keeps persisted turns that still
  carry `producedFiles`. `ConversationMessage` learns a
  `stripOnly` mode that renders only the trailing strip (no
  duplicate user/assistant bubbles, since those are rendered by
  `ClawChatMessage`).
- `AgentCommandConversation` now also calls `useAgentOutputs` and
  passes `tailStripGroups` to `ClawChat`. Each rail group not
  already covered by a live or strip-only turn renders as its own
  tail `FileCardStrip` after history. Dedup keys on `turnId` so
  the same turn never doubles up.

Adapter-gated upstream — claude / codex agents skip the
useAgentOutputs fetch entirely. The card click still opens the
preview Sheet directly; View / +N still deep-link to the rail at
the matching turn group.

* fix(agent-files): per-turn association + cache invalidation

Two fixes for the inline file-card strip:

1. Strips were stacking at the conversation tail because every
   produced-files group rendered as a tail strip after history.
   New `mapHistoryToProducedFilesGroups` matches each group to
   the assistant history message that came from its turn — by
   `group.turnPrompt` vs the first non-blank line of the
   preceding user message — and ClawChat renders the strip
   directly under that bubble. Groups that don't match any
   history pair (orphans) still fall through to the tail.

2. `useInvalidateAgentOutputs` was passing `undefined` as the
   baseUrl placeholder to `invalidateQueries({ queryKey })` —
   react-query's positional partial-match doesn't treat
   undefined as a wildcard, so the cache stayed stale until the
   query refetched on its own (e.g. window focus). Switched to
   predicate-based invalidation that matches by [agentOutputs
   marker, agentId] regardless of baseUrl. Same for the per-turn
   files key.

Net effect: send a turn that produces files → strip appears
under the just-finished assistant message; reload the page →
strips still appear under the right bubbles, not bunched at
the bottom.

* fix(agent-files): review feedback — name guard, RFC 5987, limit cap

Three review-flagged issues:

1. Path traversal via agent display name — `getHostWorkspaceDir`
   accepted any string and `path.join`'d it, so a name like
   `../../tmp` escaped `.openclaw`. The pre-turn snapshot would
   then walk that escaped directory and attribute every file to
   the new turn; resolveSafeWorkspacePath's containment check is
   relative to the same escaped root so it would later serve
   arbitrary host paths. Added `isAgentWorkspaceNameSafe` (rejects
   `..`, separators, control chars, leading dots, empty); the
   builder now throws on unsafe names plus a defensive
   realpath-style containment check after the join. Harness
   wraps the call so the path-traversal trip just disables file
   attribution for the turn instead of failing the whole send.
   Six-case regression test pinned.

2. `encodeRfc6266Filename` JSDoc claimed an RFC 5987
   `filename*=UTF-8''<percent-encoded>` fallback but the impl
   only stripped CRLFs/quotes. Now actually emits the fallback
   when non-ASCII is present; helper returns the full
   `filename="…"; filename*=UTF-8''…` attribute pair so the call
   site doesn't have to wrap in quotes.

3. `/agents/:agentId/files` `?limit=` was forwarded to the DB
   uncapped — extracted `parseAgentFilesLimit` that clamps to
   [1, 500] before forwarding.

Also extracted `resolveSafeWorkspaceDir` + `snapshotWorkspaceForTurn`
helpers off `runDetachedTurn` so the new safety branch doesn't
push it past biome's cognitive-complexity cap.
2026-05-05 19:48:28 +05:30
Dani Akash
fbae45eb97 feat(agent): calm composer + redesigned hero (#931)
* feat(agent): calm composer + redesigned hero on /home

Adopt the Variant A redesign aesthetic on /home — hero text and
composer styling only. shadcn primitives and CSS variables
unchanged; conversation-screen composer untouched.

Hero:
- Larger display title (clamp 36→56px, weight 600, tighter
  letter-spacing, balanced wrap).
- Italic muted span around "work on" — small typographic accent
  that makes the hero read as designed rather than default.

Composer (variant="home" only):
- Internal dashed divider between the typing area and the footer
  chip row. The visual cornerstone of the calm aesthetic.
- Footer chips become 24px pill-shaped (rounded-full), ghost-on-
  idle / muted-bg-on-hover. Workspace and Tabs show muted trailing
  values inline (none / 0).
- Agent selector on the far left of the footer gets a filled-pill
  trigger variant (bordered, accent/40 background, mono name) to
  visually anchor the row. AgentSelector exposes a triggerVariant
  prop (ghost | pill); chat surface keeps the existing ghost.
- Subtle 1px vertical divider between the agent pill and the rest.
- Right-aligned keyboard hint (↵ to run · ⇧↵ new line) using kbd
  elements with the existing accent/border tokens.
- Outer shell gains a soft accent-orange focus-within ring.

Out of scope (future PRs): TRY suggestion chips, eyebrow strip,
recent-agents redesign, activity log.

* fix(agent): textarea bg leaks in dark mode

* style(agent): paint hero italic span in accent orange

* feat(agent): adopt calm composer aesthetic on chat-screen too

Bring the calm-composer footer (dashed divider, pill chips,
keyboard hint) over from /home to /agents/:agentId so both
surfaces share one design language.

- Rename HomeContextControls → CalmContextControls; the agent
  selector is conditional via showAgentSelector, so chat hides it
  while home keeps the filled agent pill on the left.
- Drop the legacy ContextControls function entirely (~140 LOC) and
  collapse the variant branching at the call site to a single
  CalmContextControls render.
- Add the same focus-within accent ring to ConversationShell that
  HomeShell already has, so the focus signal is consistent.

The chat composer's Stop button (between textarea and voice mic)
is unchanged — it lives outside the footer chip row and only
surfaces while streaming.

---------

Co-authored-by: DaniAkash <DaniAkash@users.noreply.function>
2026-05-05 14:18:29 +05:30
Nikhil
554fcd7c06 fix: improve browseros-patch CLI ergonomics (#941)
* fix: make checkout detection errors actionable

* fix: clarify browseros-patch checkout terminology

* fix: add browseros-patch help examples

* fix: add browseros-patch llm quick reference

* test: cover patch CLI checkout ergonomics

* fix: address review feedback for PR #941
2026-05-04 18:37:19 -07:00
Nikhil
eed158eca0 fix(patch): handle canonical workspace paths (#940) 2026-05-04 18:09:51 -07:00
Nikhil
d61d6fc8a9 feat: add ACPX agent runtime adapters (#924)
* feat: add acpx claude runtime paths

* feat: add acpx adapter preparation

* refactor: use acpx adapter preparation

* refactor: move openclaw image turns to adapter

* fix: keep openclaw independent of host cwd

* fix: address acpx review feedback

* fix: preserve claude host auth in acpx
2026-05-04 11:04:24 -07:00
shivammittal274
d383b5e344 feat(eval): add claude-generated run report artifact (#892)
* feat(eval): add claude-generated run report artifact

* fix(eval): install claude code cli for CI evals

* fix(eval): bypass claude code tool permissions

* Eval metrics configs (#932)

* feat(eval): add agisdk comparison metrics configs

* fix(eval): keep cdp crashes from aborting run
2026-05-04 21:09:06 +05:30
Dani Akash
ce4bb44083 feat(agent): /home composer parity with image attachments (#930)
* feat(agent): /home composer parity with image attachments

The /home composer used the same ConversationInput component as the
chat screen but passed attachmentsEnabled={false}, and the home →
chat handoff was a URL search param `?q=<text>` that physically
can't carry binary attachments. Pasting a screenshot at /home did
nothing.

Add a small in-memory registry (pending-initial-message.ts) as the
rich-data side channel for the same navigation: the home composer
writes { agentId, text, attachments } there before navigating; the
chat screen consumes it on mount and replays through the existing
harness send() path that already supports attachments. URL `?q=`
stays for shareable text-only prompts; the registry wins when both
are present. Module-scope, 10s TTL, destructive consume.

Net: home is now flagged attachmentsEnabled={true}; users can paste,
drag, or pick image files at /home and they survive the navigation
into the chat screen with previews intact.

* docs(agent): clarify why initial-message ref reset is safe post-registry-fire
2026-05-04 18:02:31 +05:30
Nikhil
0d56815cba fix: store server database under BrowserOS dir (#923)
* fix: store server database under browseros dir

* fix: address PR review feedback for 923
2026-05-02 16:03:41 -07:00
Nikhil
c07d3d95d4 feat: add sqlite drizzle persistence (#919)
* feat: add drizzle agent schema

* feat: run sqlite drizzle migrations

* refactor: remove old sql identity dependency

* feat: store harness agents in sqlite

* build: package db migrations

* refactor: remove sqlite oauth token store

* feat: restore oauth token storage

* fix: handle empty install id

* chore: ignore server runtime state

* fix: address review feedback for PR 919
2026-05-02 15:19:57 -07:00
Nikhil
32530ec418 fix: default extract base to BASE_COMMIT (#922)
* fix: default extract base to BASE_COMMIT

* fix: address review feedback for PR #922
2026-05-02 15:12:17 -07:00
Nikhil
e7105ae50b fix: improve browseros-patch workspace feedback (#921)
* fix: make patch list registry-only

* feat: add patch command progress logs

* fix: address review feedback for PR #921
2026-05-02 15:09:31 -07:00
Nikhil
1d42a973ea refactor: extract acpx runtime templates (#918) 2026-05-02 14:03:15 -07:00
Nikhil
921a797c5b feat: add ACPX agent soul and memory support (#917)
* feat: add acpx agent runtime context helpers

* feat: add acpx runtime state store

* feat: prepare acpx agent runtime context

* feat: inject acpx agent command environment

* feat: forward acpx agent chat cwd

* fix: normalize acpx session record fallback

* feat: improve acpx agent soul and memory prompts

* fix: address PR review comments for memory-soul-acp

* fix: satisfy acpx runtime deepscan checks
2026-05-02 13:45:40 -07:00
Nikhil
d94597bbf9 fix(agent): add CLI model catalog entries (#915)
* fix(agent): add CLI model catalog entries

* fix: address PR review comments for acpx-models
2026-05-02 13:06:41 -07:00
github-actions[bot]
ecc6bac070 chore: sync internal-docs submodule (#911)
Co-authored-by: browseros-bot <bot@browseros.ai>
2026-05-01 20:16:26 +00:00
Dani Akash
84e2739663 feat(agent): rich rail + header on /agents/:agentId chat (#908)
* feat(agent): rich rail + header on /agents/:agentId chat

Replace the chat screen's legacy AgentEntry rail and binary READY
header with the same rich data the /agents page already exposes:
adapter glyph, liveness dot, pin star, status badge, adapter · model ·
reasoning chip line, last-used time, lifetime tokens, queue count,
and the Adapter Unavailable warning. Source of truth flips from the
merged AgentEntry list to useHarnessAgents() directly.

Sort order matches /agents (pinned → recency) — not /home
(active-first → recency) — because chat is index-shaped and shuffling
rows every 5s as turns transition would be jarring while reading.

Lift the inline pin-then-recency comparator out of /agents
AgentList.tsx into a shared agents-list-order.ts so both surfaces
stay on identical sort semantics.

* fix(agent): chat header height + composer sticking to bottom

Header was clipping descenders because the strip was vertical-content
sized at min-h-14 with tight py-2.5; bump padding and lean on natural
content height. Drop the AgentTile glyph (the rail row already shows
adapter identity) and the cwd path (too long, pushed the meta line
off-screen). Header is now name + pin star + status pill, then
adapter · model · reasoning, then last-used · tokens · queued.

Composer was floating mid-screen on short chats because the chat
grid had no grid-template-rows — the implicit auto row collapsed to
content height, so the right-column flex wrapper never received the
full container height. Add grid-rows-[minmax(0,1fr)] so the single
row claims 100% and ClawChat's flex-1 expands to push the composer
flush to the bottom.

* fix(agent): composer flush to bottom on short chats

Match the sidepanel chat's nested-flex pattern. The right-column
wrapper got h-full so it expands to the grid row; the conversation
controller's root added flex-1 so ClawChat's existing flex-1 has
something to actually fill against. Without these, the grid cell
stretched but the inner flex columns shrank to content height,
leaving the composer floating mid-screen.

* fix(agent): align rail header with chat header in shared top band

Pull the rail's "Agents" + back-button into the same horizontal strip
as the agent identity header. The two halves now sit on a single row
that spans both columns, so they can't drift in height as the chat
header gains/loses meta lines (last-used, tokens, queued).

The rail below the band keeps its scrollable list only; the chat
column below holds the conversation + composer. Border-bottom moves
from ConversationHeader to the band wrapper so we don't get a
double-rule on the boundary.

* fix(agent): reserve header height to prevent layout shift on data load

The chat header grew from a single line to three lines once the
useHarnessAgents() poll resolved (adapter chips + meta line populate
asynchronously), shoving the rail and conversation body downward.
Lock min-h-[84px] on both the band's left "Agents" cell and the
ConversationHeader root, and always render the meta line slot
(non-breaking space when empty) so the typographic frame is stable
regardless of data state.

* refactor(agent): pull status pill + meta to right side of chat header

Two-column header layout instead of three stacked rows: name + pin
star + adapter chips on the left, status pill stacked on top of the
last-used / tokens / queued meta line on the right. Drops min-h
from 84px → 60px so the band reclaims ~24px of vertical space and
the chat body starts higher on screen. Band's left "Agents" cell
matches the new height.
2026-05-01 20:19:16 +05:30