Compare commits

...

33 Commits

Author SHA1 Message Date
Nikhil Sonti
8cd7c985e3 fix: address review feedback for PR #891 2026-04-30 12:54:44 -07:00
Nikhil Sonti
548c9dd996 feat(dev): bootstrap setup from dev watch 2026-04-30 12:49:25 -07:00
Nikhil
d38b01a8c7 feat(dev): add guided cleanup and reset commands (#890)
* feat(dev): add guided cleanup and reset commands

* fix: address cleanup reset review feedback
2026-04-30 12:27:15 -07:00
Nikhil
ff36c8412b fix(dev): use run lock for watch cleanup (#889)
* fix(dev): use run lock for watch cleanup

* fix(dev): address watch lock review comments
2026-04-30 11:46:17 -07:00
Nikhil
fd5aba249b fix: stabilize OpenClaw gateway startup (#888)
* feat(server): add shared process lock helper

* feat(container): add container name reconciliation helpers

* feat(openclaw): serialize lifecycle across processes

* fix(openclaw): reconcile fixed gateway container startup

* test(openclaw): cover lifecycle race recovery

* fix(server): satisfy process lock error override

* fix(openclaw): address review feedback

* test(openclaw): align serialization mock with image check
2026-04-30 11:31:40 -07:00
Nikhil
492f3fcdf2 feat(openclaw): prewarm ghcr image in vm (#887)
* feat(openclaw): add gateway image inspection

* feat(openclaw): pull gateway image from registry

* refactor(vm): decouple readiness from image cache

* refactor(openclaw): remove vm cache from runtime factory

* feat(openclaw): detect current gateway image

* feat(openclaw): prewarm vm runtime and reuse current gateway

* feat(openclaw): prewarm runtime on server startup

* refactor(vm): remove browseros image cache runtime

* refactor(build-tools): remove openclaw tarball pipeline

* chore: self-review fixes

* fix(openclaw): suppress prewarm pull progress logs

* fix(openclaw): address review feedback

* fix(openclaw): resolve review findings

* fix(dev): stop stale watch supervisors
2026-04-30 11:18:11 -07:00
Nikhil
cb0c0dd0c1 chore: simplify root test scripts (#886)
* chore: simplify root test scripts

* fix: avoid chained root test scripts

* fix: update test workflow commands

* fix: move app test commands into packages
2026-04-30 10:58:08 -07:00
Dani Akash
8712f89f18 feat(agents): durable per-agent chat message queue + composer Stop (#880)
* feat(agents): durable per-agent chat message queue + composer Stop button

* fix(agents): tighten queue UI — smaller Stop, drop empty indicator, live drain attach

User feedback round 1 on the message-queue UX:

1) The Stop button matched the send/voice mics at h-10 w-10 with a
   solid destructive fill, which read as alarming. Shrunk to h-8 w-8,
   ghost variant with a soft destructive/10 background, smaller
   filled square glyph. Reads as a calm 'stop' affordance instead of
   a panic button.

2) The QueueItem's leading <QueueItemIndicator> dot was decorative
   only — no state, no interaction. Dropped it from QueuePanel along
   with the import; queue items now render as a clean preview line
   with the trailing X remove action.

3) When the server drained the queue and started the next turn, the
   chat panel didn't pick up the live stream until the user
   navigated away and back. The hook's resume effect previously
   only fired on agent change, not on listing-observed activeTurnId
   change. Surface activeTurnId from useHarnessAgents into
   useAgentConversation; effect now re-runs when the id changes,
   calls /chat/active, and attaches to the new turn — so a queued
   message starts streaming the moment the server drain pops it.

* fix(agents): don't reset streaming state from the resume effect's no-op paths

The Stop button was disappearing while the agent was actively
streaming, even though events were still flowing into the chat. Root
cause: the resume effect's `finally` block reset `streaming`,
`turnIdRef`, and `lastSeqRef` unconditionally — including on the
early-return paths (no active turn, or another mechanism already
owns the stream).

Sequence that triggered it:
  1) User sends a message → send() sets streamAbortRef + streaming=true
     and starts consuming the SSE.
  2) User enqueues another message → enqueue mutation invalidates the
     listing query.
  3) Listing refetches with the live activeTurnId → the resume
     effect re-fires (deps include activeTurnIdDep).
  4) attemptResume hits `if (streamAbortRef.current) return` because
     send() owns it.
  5) The finally clause fires anyway and calls setStreaming(false),
     clobbering the live state set by send(). The SSE consumer keeps
     running (refs are intact) so text keeps streaming, but the React
     flag is wrong, so the Stop button gates off.

Fix: track whether *this* run actually started a stream
(`weStartedStream`). The finally only resets state when it does.
Early-return / no-active-turn paths now leave streaming/turnIdRef/
lastSeqRef alone for whoever does own them.

Also widens the Stop button's visibility (`canStop` prop on
ConversationInput) so it stays steady across the brief gap between
turns when a queue drain is mid-flight; the parent computes
`streaming || activeTurnId !== null || queue.length > 0`. The
visibility widening is independent of the streaming-state fix above
— both are now in place.

* revert: drop canStop widening — Stop only shows while streaming

Reverts the canStop prop on ConversationInput and the OR-with-queue
visibility from AgentCommandConversation. Stop is gated solely on
`streaming` again. Between turns (queue draining) the button stays
hidden — only the actively-streaming turn is interruptible from the
composer, which matches what the user actually expects.

* fix(agents): persist the kicking-off prompt on active turns so the resume placeholder isn't empty

When a queued message drained and started a new turn, the chat
panel's resume effect staged a placeholder turn with userText: ''
because the hook had no way to know what message kicked off the
turn — only the agent-side stream was visible, and the user bubble
above it was blank until the user navigated away and back (at which
point the session record's history loaded normally).

Fix: ActiveTurnRegistry.register now accepts an optional `prompt`
that's stashed on the turn and surfaced via describe() / the
ActiveTurnInfo response. AgentHarnessService.startTurn passes the
incoming message into register. /chat/active returns it. The chat
hook's resume effect uses active.prompt as the placeholder
turn's userText, so the user bubble shows the queued message text
the moment streaming begins. Falls back to '' for older clients
that haven't been refetched yet.

* fix(agents): always release streamAbortRef on resume cleanup, even when cancelled

Greptile P1 follow-up. The previous `weStartedStream` guard correctly
stopped the resume effect's no-op early-returns from clobbering an
in-flight `send()` stream — but it also stopped a *cancelled*
mid-stream resume from clearing its own `streamAbortRef`. When the
cleanup fires (e.g. the 5s listing poll captures a new queue-drain
turn id while the SSE for the prior turn is still finishing), the
next effect run hits the `if (streamAbortRef.current) return` guard
against the now-aborted controller and never reattaches, leaving
`streaming === true` with no live stream until the user navigates
away.

Split the finally block: always release `streamAbortRef` when we
owned the controller (so the next run can take over), but only
reset the streaming flag / turn id / lastSeq on a clean exit (the
new run will set those itself, so resetting on cancel would just
flicker).
2026-04-30 18:26:56 +05:30
Dani Akash
ba60bf466f feat(agents): rich command-center rows + home grid + dead-code sweep (#879)
* feat(agents): rich-info command center rows + pin/PATCH/adapter-health backbone

Splits AgentRowCard from a 271-line monolith into a shallow tree of
single-responsibility sub-components under `agent-row/`:

  AgentTile, AdapterHealthDot, PinToggle, AgentTitleRow,
  AgentSparkline, AgentSummaryChips, AgentLastMessage, CwdChip,
  AgentTokenSummary, AgentMetaRow, AgentErrorPanel, AgentActions

Adds the data each row consumes:

- pinned: boolean field on AgentDefinition + FileAgentStore.update
  + new PATCH /agents/:id route. useUpdateHarnessAgent mutation
  optimistically updates the listing cache so the star flips
  instantly; rolls back on error.
- Listing payload extended with lastUserMessage, cwd, tokens
  (cumulative + last7d shape — last7d zero-filled until the
  activity ledger lands), turnsByDay/failedByDay (zero-filled),
  lastError/lastErrorAt, activeTurnId. AcpxRuntime grows a
  getRowSnapshot() that reads cwd + cumulative tokens + last user
  message from the session record in one pass.
- Adapter health: in-memory AdapterHealthChecker probes
  `claude --version` / `codex --version` with a 2s timeout and
  caches results for 5 min. /adapters response carries
  { healthy, reason?, checkedAt }. Tile-corner dot exposes the
  state via HoverCard; openclaw inherits health from the gateway
  snapshot already on the page.

Sub-components are pure: card itself owns no state. Sort order
becomes pinned-first, then recency. HoverCard is the workhorse for
keeping rows compact while exposing depth (full message, token
breakdown, daily turn list, error stack, adapter reason).

* refactor(agents): tighten command-center row design + cut redundant affordances

User feedback round 1:
1) Two green dots on the tile (health + liveness) was confusing. Health
   moves out of the tile entirely and surfaces as an inline 'Unavailable'
   chip in the model line — silent when the adapter is healthy, with a
   warning amber chip + HoverCard reason when not. The tile now shows
   one signal: liveness.
2) The last-user-message HoverCard wasn't telegraphing intent. Drop the
   HoverCard. The line is informational, italic, with a leading quote
   glyph so the row reads like a conversation snippet. To see the full
   message the user opens the chat (which is the action they want next
   anyway).
3) Resume + Chat were duplicate CTAs. Single primary action per row:
   Resume (filled, accent-orange, with a pulsing dot) replaces Chat
   when there's an active turn. Both navigate to /agents/:id but the
   row tells the user which action they're taking.
4) Tokens weren't visible because the row gated on last7d.requestCount,
   which is zero until the activity ledger ships. Switch to lifetime
   tokens (which we have today). Drop the '7d stats:' framing — talking
   about a window we can't compute would be misleading. The HoverCard
   surfaces input/output split + a footnote that per-window stats land
   in a follow-up.
5) CWD was rendering the server's own running directory, which is
   meaningless to users. Hide it from the row entirely. The cwd field
   still rides in the listing payload for future surfaces (chat panel,
   debug view) — only the row stops rendering it.

Aesthetic refinements while we're here:
- Whole card carries state, not just the tile: working rows get an
  accent-orange tinted border with a soft glow, error rows tint
  destructive, idle rows lift on hover.
- Pin star fades in on hover (group-hover) when unpinned and stays
  solid amber when pinned — keeps the rail calm by default.
- Tabular-nums on token figures so columns visually align across rows.
- Drop CwdChip and AdapterHealthDot files: no callers left.

* fix(agents): align row title flush-left whether pinned or not

Pin star moved from leading the title to trailing the badges, and
hidden from layout entirely (`hidden group-hover:inline-flex`) when
unpinned. The previous `opacity-0` rule kept the star reserving its
`size-6` slot, which left every unpinned title indented relative to
the model / preview / meta lines underneath it. Title now flushes
left in both states; pinned star stays solid amber so the signal
isn't hidden, and unpinned reveals an outline star on row hover for
the toggle affordance.

* fix(agents): keep pin-toggle slot reserved so row height is constant

Switching the unpinned star from `hidden group-hover:inline-flex`
to `opacity-0 group-hover:opacity-100`. The hidden/show variant was
collapsing the title row's height when the star wasn't rendered,
which made every card below visibly shift on hover. Always rendering
the button (with opacity-only visibility) keeps the row's vertical
metrics constant; the title still flushes left because the slot is
trailing, not leading.

Card hover effect (-translate-y + shadow-md) restored — the layout
shift wasn't coming from the card hover; it was the pin slot
appearing and disappearing.

* fix(agents): quieten row hover — border-tint only, no lift, no shadow

Drop the `-translate-y-px` and `hover:shadow-md` from the row card
plus the working-state inner ring. The translate + shadow grow
combination was visibly noisy as the cursor moved through the rail —
each row 'lifted' as you passed over it. Hover now just tints the
border in accent-orange/30; working and error states keep their
distinct border colours but no inner ring. Card height and shadow
stay constant in every state, so the rail reads as a calm vertical
list of cards.

* feat(home): rich Recent Agents grid + dead-code sweep

The /home Recent Agents grid was a placeholder shell. Every 'rich'
field on the card (lastMessage, lastMessageTimestamp, activitySummary,
currentTool, costUsd) was wired to undefined because AgentCommandHome
called `buildAgentCardData(agents, status?.status, undefined)` — the
dashboard arg has been hard-coded undefined since the harness
migration. Repointing the grid at `useHarnessAgents` + `useAgentAdapters`
gives every card the same enriched data the rail uses.

What the new card shows per agent:
  • Adapter glyph tile + liveness dot (working pulses; asleep is
    hollow; error is red)
  • Name + Working pill (when active)
  • Adapter · model · reasoning summary line, with an inline
    Unavailable chip + HoverCard reason when the adapter binary
    isn't on $PATH
  • Italic last-user-message preview (line-clamp-2, leading quote
    glyph) — same visual language as the rail
  • Footer: 'X ago' + state chip (Asleep / Attention) OR a Resume
    button (orange, with pulsing dot) when activeTurnId is non-null

Sort on the home grid is active-turn → recency. Pinning is NOT a
sort key here (and there's no pin indicator on the card) — pinning
belongs to the rail at /agents; the home page is action-oriented
and trusts active-turn + recency to surface the right agent.

Dead code removed:
  • useAgentDashboard.ts (96 lines, no callers; subscribed to the
    dead /claw/dashboard/stream from the OpenClaw-only era)
  • useAgentCardData.ts (the dashboard-merge shim; passed undefined
    every call so all enriched fields landed as undefined)
  • AgentCard.tsx (AgentCardExpanded replaced by HomeAgentCard;
    AgentCardCompact had no callers — the dock's compact mode was
    never used)
  • AgentCardData interface dropped from lib/agent-conversations/
    types.ts; the new card consumes HarnessAgent directly

Visual language stays continuous between rail and grid: same
<AgentTile>, same <LivenessDot>, same italic-quote message
preview, same orange Resume button with a pulsing dot.
2026-04-30 16:36:22 +05:30
Nikhil
26afb826c6 feat(eval): add viewer manifest contract (#878)
* refactor(eval): canonicalize viewer manifest contract

* refactor(eval): publish canonical viewer manifests

* feat(eval): make r2 viewer use manifest artifact paths

* fix(eval): keep weekly report compatible with viewer manifests

* docs(eval): document r2 viewer manifest contract

* chore: self-review fixes

* fix: address review feedback for PR #878
2026-04-29 20:50:35 -07:00
Nikhil
b2340c8afa refactor(eval): split orchestrated executor backends (#876)
* refactor(eval): split orchestrated executor backends

* fix(eval): address executor backend review comments
2026-04-29 18:02:32 -07:00
Felarof
790a270f47 Update README.md (#877) 2026-04-29 17:35:15 -07:00
Nikhil
84a79ba0a1 feat: refactor eval pipeline workflow (#875)
* feat(eval): add suite variant config bridge

* feat(eval): add stable run artifacts

* refactor(eval): add shared grader contract

* feat(eval): persist grader artifacts

* refactor(eval): rename runner layers

* refactor(eval): add executor backend boundary

* refactor(eval): split clado backend

* feat(eval): add workflow compatible cli

* feat(eval): add r2 publisher module

* ci(eval): migrate weekly workflow to eval cli

* docs(eval): document suite pipeline

* chore(eval): verify pipeline refactor

* fix: address review feedback for PR #875

* docs(eval): add env example

* docs(eval): explain suites and variants

* chore(eval): organize config layouts

* chore(eval): colocate grader python evaluators
2026-04-29 17:21:02 -07:00
Nikhil
6e3306f5e5 fix: make R2 uploads retryable (#874)
* fix: make R2 uploads retryable

* fix: address review feedback for PR #874
2026-04-29 16:43:33 -07:00
Nikhil
c244462b29 fix: use Node 24 GitHub actions (#872) 2026-04-29 15:31:23 -07:00
Nikhil
ebf97f74f6 fix: bound VM agent cache smoke test (#870)
* fix: bound VM agent cache smoke test

* fix: address review feedback for PR #870
2026-04-29 13:43:37 -07:00
Nikhil
561f2baf97 fix(eval): split AGISDK smoke and full configs (#871)
* fix(eval): split agisdk smoke and full configs

* fix(eval): default agisdk smoke to openrouter
2026-04-29 13:38:55 -07:00
shivammittal274
df0f45dd29 Feat: eval debug dev ci (#869)
* chore(eval): instrument server startup to root-cause dev CI health-check timeouts

Three diagnostics + one config swap to investigate why the eval-weekly
workflow has been failing on dev since 2026-04-25 with "Server health
check timed out" (every worker, every retry).

Background:
- Last successful weekly eval on dev: 2026-04-18 (sha f5a2b73)
- Since then, ~30 server commits landed including Lima/VM runtime,
  OpenClaw service, ACL system, ACP SDK — 108 server files changed,
  ~13K LOC added.
- Server process spawns cleanly in CI (PID logged) but never binds
  /health within the 30s eval-side timeout. Static analysis finds no
  obvious blocker; we need runtime evidence.

Changes:

1. apps/server/package.json — add `start:ci` script (no `--watch`).
   The default `start` uses `bun --watch` which forks a child process
   that watches every file in the import graph. Dev's graph is ~108
   files larger than main's; on a cold CI runner the watcher setup is a
   plausible source of multi-second startup overhead.

2. apps/eval/src/runner/browseros-app-manager.ts:
   - Use `start:ci` when `process.env.CI` is set (true on
     GitHub-hosted runners by default), else `start`.
   - Capture per-worker server stderr to /tmp/browseros-server-logs/
     instead of ignoring it. Without this we have no visibility into
     why the server is hung pre-/health.
   - Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module
     graph may simply need more cold-start time on CI.

3. .github/workflows/eval-weekly.yml — upload the server logs dir as a
   workflow artifact (always, not just on success) so we can post-mortem
   any startup failure on the next run.

4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter ->
   Fireworks (bypasses the OpenRouter per-key spend cap that has been
   eating recent runs) and drop num_workers 10 -> 4 (well below the
   Fireworks per-account TPM threshold that overwhelmed the original
   2026-04-23 run).

Plan: trigger the eval-weekly workflow on this branch with the agisdk
config and observe (a) whether it gets past server startup, and
(b) if it doesn't, what the captured server stderr says.

* fix(eval): capture stdout too — pino logger writes to stdout, not stderr

Previous diagnostic patch only redirected stderr; the captured per-worker
log files came back as 0 bytes because the server uses pino which writes
all log output to stdout (fd 1), not stderr (fd 2). Capture both into
the same file.

* fix(server): catch sync throw from OpenClaw constructor on Linux

The container runtime constructor in OpenClawService throws synchronously
on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing
.catch() on tryAutoStart() only handles async throws inside auto-start —
the sync throw from configureOpenClawService(...) itself propagates up
through Application.start() and crashes the process via index.ts:48
(process.exit(EXIT_CODES.GENERAL_ERROR)).

This is what's been killing dev's eval-weekly CI: the server crashes in
milliseconds, the eval client polls /health, gets nothing, times out.

Fix: wrap the configureOpenClawService call in try/catch matching the
existing .catch() intent (best-effort, don't crash). Server continues
without OpenClaw on platforms where it can't initialize.

Verified by reading captured server stdout from run 25123195126:
  Failed to start server: error: browseros-vm currently supports macOS only
    at buildContainerRuntime (container-runtime-factory.ts:54:11)
    at new OpenClawService (openclaw-service.ts:652:15)
    at configureOpenClawService (openclaw-service.ts:1527:19)
    at start (main.ts:127:5)

* fix(server): defer OpenClaw chat client port lookup to request time

apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort()
synchronously when constructing the OpenClawGatewayChatClient inside the
createHttpServer object literal. On non-darwin platforms this throws via
the OpenClawService constructor → buildContainerRuntime, escaping the
try/catch added in 5cf7b765 (which only protected the configureOpenClawService
call further down in main.ts).

Every other getOpenClawService() reference in server.ts is already wrapped
in an arrow function. This was the lone holdout. Make it lazy too: change
the chat client constructor to take getHostPort: () => number instead of
hostPort: number, evaluate it inside streamTurn at request time. Behavior
on darwin is unchanged.

This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't
available — the chat endpoint isn't exercised by the eval, so a deferred
throw is acceptable.

* fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1

Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't
unblock dev's Linux CI — same throw kept reproducing. Whether this is bun
caching stale stack frames or a missed eager call site, the safer move is
to fix it at the root: make buildContainerRuntime never throw on Linux
when the runner has explicitly opted out.

Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test
escape hatch in container-runtime-factory.ts. When set, returns the existing
UnsupportedPlatformTestRuntime stub — server boots normally, /health binds,
any actual OpenClaw API call still fails loudly at request time.

eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and
non-CI Linux behavior unchanged (without the flag they still throw).

* feat(eval): align Clado action executor with new endpoint contract

David Shan shared the updated Clado BrowserOS Action Model spec.
Changes to match it:

- Bump endpoint URL + model id to the 000159-merged checkpoint
  (clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef)
  in browseros-oe-clado-weekly.json and the README example.
- CLADO_REQUEST_TIMEOUT_MS 120s → 360s. Cold start can take ~5 min;
  the 2-min ceiling was failing every cold-start request.
- Treat HTTP 200 with action=null / parse_error as an INVALID step
  instead of aborting the executor loop. The model can self-correct
  on the next call. Cap consecutive parse failures at 3 to avoid
  infinite loops.
- Capture final_answer from end actions. Surface it in the observation
  back to the orchestrator so its task answer can use the model's
  declared result.
- Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X).
- Switch screenshot format from webp → png to match the documented
  "PNG or JPEG" contract.

* chore(eval): refresh test-clado-api script for new Clado contract

Updated the local smoke-test to match the new Clado endpoint and
response contract:

- New action + health URLs (000159-merged checkpoint).
- Drop the grounding-model branch (orchestrator-executor doesn't
  use it; the README David shared only documents the action model).
- Health-check waits up to 6 minutes for cold start with a 30s
  warning so the operator knows it's spinning up.
- Print every documented response field (action, x/y, text, key,
  direction, amount, drag start/end, time, final_answer, thinking,
  parse_error, inference_time_seconds).
- Three-step run that exercises a click, a typing continuation
  with formatted history, and an end+final_answer probe.

* chore(eval): point clado weekly config at agisdk-real

Switches the orchestrator-executor + Clado weekly config to run on the
AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff
grader. Matches the orchestrator-executor smoke target (Fireworks K2.5
orchestrator + Clado action executor) we want to track week-over-week.

* chore(eval): run clado weekly headless

Default to headless so the weekly job (and local repros) don't pop ten
visible Chrome windows. Set headless=false locally if you need to watch
a worker.

* fix(eval): address Greptile P1+P2 on server log fd handling

P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir
failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the
log directory missing and crash the server spawn with ENOENT. Move openSync
into the same try block; fall back to /dev/null so spawn always succeeds.

P2: the log fd was opened on every server start but never closed. Each
restart attempt leaked one fd across all workers — over a long eval run
that could exhaust the process fd limit. Track the fd on the manager and
closeSync it in killApp() right after the server process exits (the child's
dup keeps the file open until it exits, so we don't truncate output).
2026-04-30 01:33:49 +05:30
Nikhil
edfc5c751c fix: align OpenClaw gateway image with VM cache (#868)
* fix: load OpenClaw gateway image from VM cache

* fix: use container port for OpenClaw ACP bridge

* fix: address review feedback for PR #868
2026-04-29 12:11:00 -07:00
Nikhil
471256f31c fix: stop passing native permission flags to ACP adapters (#867) 2026-04-29 11:07:51 -07:00
Nikhil
4c90ca696b fix(agents): connect OpenClaw ACP inside gateway container (#866) 2026-04-29 11:07:29 -07:00
Nikhil
f2ac87d7c3 feat: show created agents in sidepanel (#865)
* feat(agent): list created agents in sidepanel target catalog

* feat(agent): show created agents in sidepanel selector

* feat(server): add sidepanel chat route for created agents

* feat(agent): route sidepanel agent sends by agent id

* chore(agent): retire virtual sidepanel acp targets

* fix: address review feedback for PR #865
2026-04-29 10:15:58 -07:00
shivammittal274
231bd6821d fix(eval): pin agisdk version + exclude 4 invalid tasks (Phase 2 dataset hygiene) (#844)
* chore(eval): pin agisdk version to prevent silent dataset drift

`pip install agisdk` previously fetched whatever version pip resolved at
CI time. If agisdk publishes a new version with changed task definitions
or grader behavior, the weekly eval silently shifts under our feet —
making "did the score move because of code or data?" unanswerable.

Pin to agisdk==0.3.5 (the version we currently develop against). Bump
intentionally with a documented re-baseline run.

* fix(eval): exclude 4 more tasks identified by 8-trial never-passing audit

After 8 trials across K2.5 + Opus 4.6 (Phase 1 and Phase 2), 5 tasks
never passed. Per-task root-cause investigation via parallel deep-dive
subagents flagged 4 of them as fundamentally unfixable in the eval
pipeline as it stands; the 5th (`dashdish-5`) is a prompt-rule fix
that stays in.

- gocalendar-7: goal/grader contradiction. Goal says "move event to
  July 19, 10 AM"; grader expects `eventsDiff.updated.*.start ==
  "2024-07-18T17:00Z"` (= July 18, 10 AM PDT — same day, 1 hour shift).
  Even after the Phase 2 HTML5 dnd dispatch fix correctly populates
  `eventsDiff.updated`, the values are July 19 (matching the goal),
  which the grader rejects.

- staynb-5: grader hardcodes literal `'Oct 13 2025'` and `'Oct 23 2025'`
  year strings. The staynb date picker interprets bare "Oct 13" as the
  most-recent-past instance (currently 2024 since today is 2026), not
  2025. No agent can produce a persisted date string containing 2025.

- staynb-9: under-specified task. Goal says "maximum number of guests
  supported"; grader requires the very specific string "32 Guests, 16
  Infants" — encoding UI knowledge (Adults+Children=Guests display,
  Infants render separately, per-category cap=16, Pets excluded) that
  isn't in the prompt. Even Opus 4.6 stopped at 16 across 3 trials.

- opendining-3: grader requires `contains(booking.date, '2024-07-20')`
  but the React-controlled date textbox flakily no-ops on `fill`. 3/8
  trial pass rate is essentially coin-flip noise driven by tool-fidelity
  variance rather than agent capability. Removing to reduce score noise;
  Phase 2 fill post-validate warning helps when it does work, but the
  task's signal-to-noise is too low for the eval set.

Dataset goes from 40 -> 36 tasks. Total EXCLUDED_TASKS now 11 entries.

Validated by 8-trial pass-record audit; deep-dive notes saved to
plans/audits/.
2026-04-29 22:07:53 +05:30
Dani Akash
a228c278c6 feat(agents): background-resilient chat — turns survive tab disconnect (#863)
* feat(agents): decouple chat turn lifecycle from SSE response

Introduce a per-process ActiveTurnRegistry that owns each agent turn's
lifecycle and a ring-buffered event stream, so chat tabs that close,
refresh, or navigate away no longer cancel the in-flight turn. New
endpoints:

  POST   /agents/:id/chat          starts a turn (now returns 409 when
                                   one is already running, with the
                                   active turnId for attaching)
  GET    /agents/:id/chat/active   reports the running turn for a UI
                                   that just mounted
  GET    /agents/:id/chat/stream   subscribes to a turn; supports
                                   Last-Event-ID resume via per-event
                                   seq ids
  POST   /agents/:id/chat/cancel   explicit cancel — fetch abort no
                                   longer affects the underlying turn

The chat hook now captures X-Turn-Id, tracks lastSeq from SSE id lines,
re-attaches on mount when the server still has an active turn, and
routes Stop through the cancel endpoint. The runtime call uses the
registry's per-turn AbortController instead of the HTTP request signal,
which is the core decoupling that lets turns outlive their initiator.

* feat(agents): add ActiveTurnRegistry primitive backing the new chat lifecycle

The previous commit referenced these files in tests and the harness
service but global gitignore swallowed them on the first add.

The registry owns the per-turn ring buffer (drop-oldest, terminal frame
preserved), the per-turn AbortController, and subscriber fan-out used
by /chat/stream resume.
2026-04-29 21:01:06 +05:30
Dani Akash
e2ec1991cf feat(agents): redesign the agent command center for multi-adapter use (#861)
* feat(agents): redesign agent rail to match the rest of the app

Reshape the `/agents` page so it reads as a sibling of `/scheduled`
and `/soul` and adapts to the multi-adapter world (OpenClaw, Claude
Code, Codex). Visual scaffolding only in this commit — per-agent
liveness state ships as `unknown` until the server-side activity
tracker lands.

  - New `AgentsHeader` mirrors `SoulHeader`/`ScheduledTasksHeader`:
    accent bot tile, title, descriptive subtitle, "+ New Agent"
    button. Replaces the loose top toolbar that mixed page-level and
    OpenClaw-lifecycle controls.
  - New `GatewayStatusBar` collects the OpenClaw lifecycle pills
    (running, control plane connected) plus the Terminal/Refresh
    affordances into a single labeled bar that only renders when the
    gateway is running AND there is at least one OpenClaw agent in
    the merged list.
  - New `AgentRowCard` per agent: adapter tile with liveness dot,
    name + status badge, adapter/model/reasoning chips, last-used
    relative time + truncated workspace path, primary "Chat" button,
    overflow menu (Copy id / Rename* / Reset history* / Delete).
    Rename + Reset are disabled with "coming soon" tooltips until
    the corresponding endpoints ship; Delete is hidden for the
    protected `main` agent.
  - New `AgentsEmptyState` mirrors the scheduled-tasks empty card.
  - New `AdapterIcon` + `LivenessDot` + `agent-display.helpers.ts`
    keep the row card focused on layout; helpers cover display name
    fallbacks for legacy `oc-<uuid>` titles, workspace label rules,
    and a tiny relative-time formatter.
  - `AgentList` now sorts by `lastUsedAt` desc with `null`s falling
    to the bottom; the gateway's `main` agent is pinned to the top
    only while it has zero turns so a fresh install has an obvious
    starting point. The list also threads a per-agent activity map
    so future commits can light up working/idle/asleep without
    reshuffling the API.
  - `AgentsPage` swaps to the standard `fade-in slide-in-from-bottom-5
    animate-in space-y-6 duration-500` shell and threads a
    `harnessAgentLookup` Map down to the row card so adapter chips
    and reasoning effort render correctly without a re-fetch.

* feat(agents): wire per-agent liveness end-to-end into the rail

Closes the placeholder `unknown` dot from the redesign's first
commit. The rail now shows real working / idle / asleep / error
states per agent, with `lastUsedAt` driving the recency sort.

Server side:
  - `AgentHarnessService` keeps an in-memory activity tracker keyed
    by agentId. `notifyTurnStarted` flips an entry to `working`,
    `notifyTurnEnded({ok})` either drops it (success) or pins it to
    `error` (failure / error event).
  - `send()` wraps the runtime stream so the lifecycle hook fires
    exactly once on natural close, error event, downstream cancel,
    or thrown setup. The runtime itself stays unchanged — fork is
    contained at the harness layer.
  - New `listAgentsWithActivity()` method enriches every agent with
    `{ status, lastUsedAt }`. lastUsedAt is read from the acpx
    session record's last persisted item via `runtime.getHistory`,
    so it survives server restart even though the activity map
    doesn't.
  - Status derivation: `working`/`error` take precedence; otherwise
    timestamp-based — `idle` until 15 min of silence, then `asleep`.
    Never-used agents resolve to `idle` (asleep implies "was active,
    went quiet").
  - `GET /agents` returns the enriched shape.

Client side:
  - `HarnessAgent` UI type extended with optional `status` +
    `lastUsedAt` so older deployments still typecheck.
  - `useHarnessAgents` flips on `refetchInterval: 5_000` (with
    `refetchIntervalInBackground: false` so hidden tabs go quiet)
    so the per-row dots and last-used copy stay fresh without a
    websocket.
  - `AgentsPage` builds an activity map from the harness listing
    response and threads it into `AgentList` → `AgentRowCard`. The
    sort by `lastUsedAt` desc (already in the row card) now has
    real data to operate on.

Tests:
  - New `marks an agent working while a turn streams and idle once
    it ends` exercises the wrap; uses a held upstream stream so
    the in-flight `working` state is observable.
  - New `flips to error when a turn emits an error event`.

* fix(agents): dedupe agent rail when /claw/agents and /agents share an id

The agents page was rendering every OpenClaw agent twice — once from
the legacy `/claw/agents` listing (`useOpenClawAgents`) and once from
the harness `/agents` listing (`useHarnessAgents`). Post Step 9
backfill the harness store contains every gateway agent, so the
overlap is the rule, not the exception.

Mirror the dedup the chat-panel layout already does: when a gateway
agent's id appears in the harness listing, drop the legacy entry and
keep the harness one (it has adapter/model/reasoning/status/lastUsedAt
the chat path actually consumes).

* feat(agents): swap GatewayStatusBar refresh icon for a Restart Gateway button + tooltips

The manual refresh became redundant once `useHarnessAgents` and
`useOpenClawStatus` started polling on a 5s interval — every visible
field self-refreshes within seconds. The previous AgentsPageHeader
had a real Restart action that the redesign dropped; reinstate it on
the bar so a wedged gateway is one click away again.

  - GatewayStatusBar: dropped the `RotateCcw` refresh icon and the
    `onRefresh` prop. Added `onRestart` + `actionInProgress` props;
    the button shows a spinner while a gateway lifecycle mutation is
    in flight.
  - Both Terminal and Restart Gateway buttons get tooltips explaining
    what they do — Terminal as a power-user shell escape hatch,
    Restart for unsticking a wedged gateway or after manual config
    edits.
  - AgentsPage: drop the now-unused `refreshAll` helper and the
    `refetchStatus`/`refetchAdapters`/`refetchOpenClawAgents`
    destructures it depended on. Wire `restartOpenClaw` (already
    pulled from `useOpenClawMutations`) through
    `runWithPageErrorHandling` like the legacy header did.

* feat(agents): consolidate gateway status into the /agents listing

Folds the gateway lifecycle snapshot into the harness listing so the
agents page polls one endpoint instead of two. Drops the dead
`/claw/status` call from the command center while keeping every UI
affordance the page already shipped (Running / Control plane
connected pills, GatewayStateCards setup/start prompts,
ControlPlaneAlert for degraded states).

Server side:
  - `OpenClawProvisioner.getStatus()` (optional) — when wired, returns
    the same `GatewayStatusSnapshot` shape `/claw/status` does.
  - `AgentHarnessService.getGatewayStatus()` — best-effort wrapper
    around the provisioner method; logs and swallows errors so a
    transient gateway issue doesn't 500 the listing endpoint.
  - `GET /agents` now returns `{agents, gateway}` in a single
    `Promise.all`. Both fields are independent — agents enrichment
    succeeds even if the gateway snapshot is null.
  - `server.ts` wires `getOpenClawService().getStatus()` into the
    provisioner accessor object alongside `createAgent` /
    `removeAgent` / `listAgents`.

Client side:
  - `useHarnessAgents` returns `{harnessAgents, gateway}` (plus the
    legacy `agents` mapping). Same 5s `refetchInterval` as before —
    one round-trip drives the per-row liveness AND the gateway pills.
  - `AgentsPage` drops `useOpenClawStatus` entirely; `status` comes
    from the harness query. Loader + error/lifecycle plumbing
    rewired around the harness query's loading/error.
  - `agents-page-utils.getInlineError` and `getAgentsLoading` lose
    the now-redundant `statusError` / `statusLoading` /
    `openClawAgentsEnabled` params.

The chat-panel layout (`agent-command-layout.tsx`) still consumes
`useOpenClawStatus(5000)` for now — left intact per the user's "only
the command center" scope. Folding that one in is a separate,
smaller pass once we're sure no regression slipped here.

* test(agents): teach the route fake service about the new listing shape

PR #861 CI surfaced two failures in tests/api/routes/agents.test.ts:
both call \`GET /agents\` and the route handler now invokes
\`service.listAgentsWithActivity()\` + \`service.getGatewayStatus()\`
which the fake created here didn't implement. Add both methods to
the fake (returning idle / null) and update the empty-list assertion
to expect the new \`{agents, gateway}\` envelope.
2026-04-29 19:03:29 +05:30
Dani Akash
0c84547e8f feat(agents): migrate OpenClaw chat onto the unified harness/ACP path (#859)
* chore(acp): smoke-test ACP capabilities against running gateway

Adds apps/server/scripts/acp-smoke.ts which spawns `openclaw acp`
inside the gateway container and exercises every method we plan to
depend on: initialize, newSession, prompt (text + image), cancel,
listSessions, loadSession.

SDK pinned to 0.19.1 (Bun's minimum-release-age policy blocks 0.20+
which were released < 7 days ago).

Findings (full notes in plan outcomes):
- promptCapabilities advertises image:true but the model does NOT see
  image bytes — silently dropped at the bridge.
- sessionCapabilities advertises {list:{}} but session/list throws
  "Method not found": stale capability advertising.
- loadSession works; replays user/assistant/thought text and
  session_info/usage/commands updates. No tool_call replay, as
  documented.
- cancel works end-to-end: stopReason=cancelled.
- closeSession/resumeSession are not on ClientSideConnection in
  0.19.1; kill child to close, use loadSession for rebind.

Plan revisions triggered by spike are recorded in
plans/browseros-ai/BrowserOS/features/2026-04-28-2310-claude-code-acp-implementation-roadmap.md.

* chore(acp): re-run smoke on SDK 0.21.0 and add mode/config/auth scenarios

After bypassing Bun's minimum-release-age and upgrading the SDK to
0.21.0, restore the previously-skipped resume/close paths and add
three new scenarios: mode (setSessionMode), config (setSessionConfigOption,
correct configId field), and auth (authenticate noop).

Findings, all bridge-side (independent of SDK):
- session/list, session/resume, session/close all throw -32601 on
  OpenClaw 2026.4.12 — capability advertising is stale.
- Image content blocks silently dropped; model never sees the bytes.
- setSessionMode and setSessionConfigOption work; latter requires
  `configId` (not `optionId`) per the schema.
- loadSession replays user/assistant/thought text + session_info +
  usage + available_commands; no tool_call replay (documented).
- authenticate is a noop on OpenClaw (no authMethods advertised).

Plan outcomes updated with full method-support matrix.

* chore(deps): promote @agentclientprotocol/sdk to a runtime dependency

The smoke script in apps/server/scripts/acp-smoke.ts used the SDK as
devDependency. The upcoming ACP bridge (apps/server/src/api/services/acp/)
needs it at runtime, not just for tooling. Move the entry from
devDependencies to dependencies, alphabetically first under @a*.

Pinned to 0.21.0 — same version the smoke script validated against.
README gains a small Dependencies note pointing at the future bridge
location.

No code changes yet. The bridge wiring lands in subsequent commits.

* fix(openclaw): wire LlmProvider.supportsImages through to OpenClaw model config

When BrowserOS sets up a custom OpenAI-compat provider on the gateway,
the agent UI's "Supports Image" flag (LlmProviderConfig.supportsImages)
was being dropped on the floor. As a result the persisted model entry
had no `input` field, OpenClaw defaulted it to ['text'], and image_url
content parts were silently stripped before the model saw them.

Fix:
- Extend OpenClawSetupInput / OpenClawAgentMutationInput on the agent
  side (useOpenClaw.ts) and the route body schema + SetupInput +
  createAgent input on the server side with `supportsImages?: boolean`.
- AgentsPage forwards `llmOption?.supportsImages` from the selected
  LlmProviderConfig in both handleSetup and handleCreate.
- provider-map.resolveSupportedOpenClawProvider emits
  `input: ['text', 'image']` on the model entry when the flag is
  truthy; otherwise emits the explicit `['text']` so the value is
  always pinned (avoids relying on OpenClaw's implicit default).
- applyBrowserosConfig adds `tools.media.image.enabled = true` to the
  bootstrap batch so the gateway's image-understanding pipeline is
  always wired up — per-model `input` still gates which models see
  images, this just enables the global path.

ACP image content blocks are still dropped by the OpenClaw bridge —
that's a separate bridge bug, not addressed here. This commit
restores image support for the OpenAI-compat /v1/chat/completions
path that the upcoming ACP chat panel will use as a carve-out for
image-bearing prompts.

Existing custom-provider configs are NOT auto-migrated; users will
re-acquire image support either by re-running setup or by editing
their model entries' `input` field manually. A migration pass for
legacy installs is not in scope for this commit because the
"supportsImages" intent isn't recoverable from the persisted config
alone — the source of truth is the LlmProvider record on the agent
side.

* feat(agents): add OpenClaw to AgentAdapter union and catalog

Extends AgentAdapter to 'claude' | 'codex' | 'openclaw' and adds the
OpenClaw entry to AGENT_ADAPTER_CATALOG. The new entry has:

- defaultModelId: 'default' — OpenClaw's ACP bridge does not surface
  per-session model selection (verified during the ACP spike), so
  models live in the OpenClawService config, not in the adapter
  catalog. AgentDefinition.modelId carries the gateway-side model
  name for display only.
- models: [] — empty list signals "no per-session model picker" in
  the UI; isSupportedAgentModel('openclaw', undefined|'default')
  returns true via the existing fallback path.
- reasoningEfforts mirror OpenClaw's session-level `thought_level`
  config option (off / minimal / low / medium / high / adaptive).

Also extends:
- isAgentAdapter type guard recognizes 'openclaw'
- HarnessAgentAdapter union on the extension side
- agents.test.ts createAgent fake type
- agent-catalog.test.ts asserts on the new entry, empty model list
  passthrough behavior, and OpenClaw's reasoning effort set

Lockfile delta is the workspace SDK pin reconciling 0.20.0 (taken
from dev's lock) up to our package.json's 0.21.0 (added in
c1d987ea). acpx still uses 0.20.0 transitively — both are present.

No runtime wiring yet — the registry override and AcpxRuntime
plumbing land in subsequent commits.

* feat(agents): plumb OpenClaw gateway accessors into AcpxRuntime

Adds an optional `openclawGateway` accessor to AcpxRuntime so the
upcoming registry override (Step 4) can spawn `openclaw acp` inside
the gateway container with the right port, token, and container/VM
identity. All accessors are getter-shaped so values stay live across
gateway restarts (port can change, token can rotate).

The accessor is threaded:
  server.ts → createAgentRoutes → AgentHarnessService → AcpxRuntime
                            ↘ sidepanel lazy AcpxRuntime

Also adds OpenClawService.getGatewayToken() returning the in-memory
token string. We pass it via OPENCLAW_GATEWAY_TOKEN env var on the
spawn (per OpenClaw's documented env-var precedence) instead of via
`--token` flag (which leaks to ps aux) or `--token-file` path (no
discrete token file lives inside the container — the token is nested
inside openclaw.json).

Wiring is dormant — the registry override that consumes these
accessors lands in Step 4. Typecheck + existing acpx/harness/routes
tests pass unchanged.

* refactor(agents): scrub local plan-step references from code comments

Replaces forward-looking comments that referenced internal plan
steps (e.g. "Step 4 wires this into…") with comments that justify
the code on its own merits. Plan files live locally on the
contributor's machine, so cross-references are noise to the rest of
the project.

No behavior change.

* feat(agents): spawn openclaw ACP adapter inside the gateway container

When the harness resolves the `openclaw` adapter, it now returns a
command that runs `openclaw acp` inside the bundled gateway container
via `limactl shell <vm> -- nerdctl exec -i ... openclaw acp --url
ws://127.0.0.1:<port>`. This reuses the openclaw binary already
installed alongside the gateway — no host-side openclaw install is
required.

Auth: the gateway token is injected via OPENCLAW_GATEWAY_TOKEN on
the container exec rather than `--token` on the openclaw CLI, so
the secret never appears in `ps aux`.

Banner output: OPENCLAW_HIDE_BANNER=1 and OPENCLAW_SUPPRESS_NOTES=1
keep stdout JSON-RPC-clean.

LIMA_HOME: prefixed via `env LIMA_HOME=<path>` on the resolved
command so the spawned limactl finds the BrowserOS-owned VM (the
server doesn't set LIMA_HOME on its own process env).

When the gateway accessor is absent, falls through to acpx's
built-in openclaw adapter which assumes a host-side install — that
branch will fail at spawn time with a descriptive error.

Verified end-to-end via the existing acp-smoke script during the
Step 0 spike.

* feat(agents): dual-create OpenClaw harness agents on the gateway

When the harness creates an `openclaw` adapter agent, it now also
provisions a matching agent on the OpenClaw gateway via the existing
CLI path (OpenClawService.createAgent). Symmetric on delete: gateway
removeAgent runs alongside the harness-store delete.

- Adds an OpenClawProvisioner interface (decoupled from OpenClawService
  for testability) and injects it through AgentHarnessService.
- createAgent rolls back the harness record if gateway provisioning
  fails; deleteAgent tolerates gateway-side failures so harness
  identity stays consistent with the user-facing UI.
- New OpenClawProvisionerUnavailableError surfaces as a 503 when an
  openclaw create request lands on a harness with no provisioner
  wired in (instead of a generic 500).
- FileAgentStore mints openclaw agent ids with an 'oc-' prefix so
  the id satisfies the gateway's `^[a-z][a-z0-9-]*$` agent name
  pattern. Other adapters keep raw UUIDs to preserve compatibility.
- POST /agents body schema accepts providerType / providerName /
  baseUrl / apiKey / supportsImages, forwarded to the provisioner
  when adapter='openclaw'.

The agents-page UI still routes openclaw create through the legacy
/claw/agents flow; switching that path to the harness is a separate
UI cutover.

Tests cover: gateway dual-create on success, rollback on gateway
failure, 503 when provisioner is missing, and tolerant delete on
gateway-side failure.

* fix(agents): skip catalog model validation for OpenClaw adapter

OpenClaw agents resolve their model from the gateway-side provider
config (set at agent-create time via OpenClawService) rather than
from the harness catalog, which has an empty `models: []` entry by
design. Without this carve-out, every OpenClaw create body fails
parsing with "Invalid modelId" because no concrete model id can
satisfy isSupportedAgentModel('openclaw', ...).

The reasoning-effort check still runs against the catalog (those
values map directly to OpenClaw's session `thought_level` config
option).

* fix(agents): pass --session to openclaw bridge so newSession routes correctly

acpx's AcpClient.createSession calls connection.newSession with cwd
and mcpServers but never forwards the sessionKey. Without it, the
openclaw bridge falls back to a synthetic acp:<uuid> session that
doesn't resolve to any provisioned gateway agent — every harness
chat returns a generic "Internal error" from -32603.

Fix: bake `--session <key>` into the resolved spawn command. The
bridge then uses that as the default session key for any newSession
the bridge receives, routing the turn to the matching gateway agent.

Per-session keying means each openclaw agent gets its own
AcpxCoreRuntime instance (cached by sessionKey on top of the
existing cwd/permissionMode key). This adds one extra runtime per
active openclaw session — claude/codex are unaffected.

Test asserts the resolved command includes the right --session arg.

* fix(agents): suppress BrowserOS MCP for openclaw bridge

The openclaw ACP bridge rejects newSession when mcpServers is non-empty
because its provider tooling comes from the gateway, not from ACP-side
MCP servers. Forwarding the BrowserOS HTTP MCP made every harness chat
fail with a JSON-RPC -32603 "Internal error" before the session was even
opened. Claude/codex still need the BrowserOS MCP for browser tooling,
so the carve-out is keyed off whether the runtime is for an openclaw
session.

* feat(agents): route OpenClaw chat through the harness behind a flag

Adds the `feature.useAcpxForOpenClaw` extension storage flag. When on,
OpenClaw agents in the agent-command chat panel use the harness
/agents/<id>/chat SSE and harness history hook instead of the legacy
/claw/agents/<id>/chat. When off, behavior is unchanged.

Also dedupes the agent rail when the same id appears in both stores
(dual-created agents from /claw/agents and /agents) by preferring the
harness entry — without this, every dual-created OpenClaw agent shows
up twice after Step 5.

Image attachments are temporarily disabled when the harness path is
active; the carve-out lands in the next commit.

* fix(agents): keep legacy OpenClaw agents on ClawChat

The previous commit's flag-gated branch routed every `source='openclaw'`
agent through `/agents/<id>/chat` when the flag was on, but the layout
dedup means the only agents that ever reach that branch are legacy
gateway-only entries (`main`, orphan agents from rolled-back creates) —
which by definition have no harness record, so the harness path 404s
and chat is unusable. Source is the only routing signal again: harness
agents go through the harness, legacy agents stay on ClawChat. The
storage flag stays for Step 9/10's migration story.

* feat(agents): expose OpenClaw in sidepanel and route through gateway main

`buildSidepanelChatTargets` now emits a single default ACP target for
adapters with no per-session model picker (OpenClaw, whose model is
configured on the gateway-side agent). Without this, OpenClaw never
appeared in the sidepanel target picker because the catalog entry has
`models: []`.

Sidepanel sessions don't have a dedicated provisioned gateway agent.
The openclaw bridge `--session` flag previously got the raw sidepanel
key (`sidepanel:<convId>:openclaw:...`), which doesn't match any
gateway agent — newSession was accepted but every prompt hung
forever. The bridge command now rewrites non-harness session keys
onto the always-present `main` gateway agent, encoding the original
key as a channel suffix to keep state segregated per conversation.
Verified end-to-end via curl: sidepanel openclaw chat streams
`text-delta` + `finish: stop`.

* feat(agents): backfill harness records for legacy gateway agents

Reframes Step 9 of the OpenClaw-on-acpx migration. The plan's literal
Step 9 (route OpenClaw history through the harness when the flag is on)
was already a no-op after the Step 6 walkback — history is routed by
source today. The actual blocker for Steps 10–13 was that legacy
gateway-only agents (e.g. `main`, orphans from rolled-back creates) had
no harness record, so they could never migrate to the harness path
without breaking chat.

`AgentHarnessService.reconcileWithGateway()` now lists every gateway
agent and upserts a matching harness record for any that are missing.
The pass runs lazily on first `listAgents()` call (memoized on success,
retried on failure so a gateway-down boot doesn't permanently disable
backfill). Verified end-to-end: the legacy `agent` agent now streams
`text_delta` + `done(end_turn)` through `/agents/agent/chat`, with the
bridge resolving to the gateway's `agent` record via the existing
`agent:<name>:main` session-key format.

After this, every OpenClaw agent surfaces as `source='agent-harness'`
post-dedup, the legacy `useClawChatHistory` hook becomes unreachable
for OpenClaw, and Steps 11–13 (delete legacy chat/history paths) are
unblocked.

* fix(agents): drop duplicate OpenClaw entry from NewAgentDialog adapter list

The adapter Select hardcoded an `<SelectItem value="openclaw">OpenClaw</SelectItem>`
on top of iterating `adapters`, which now includes OpenClaw post the
catalog change. The dropdown rendered "OpenClaw" twice — once at the
top, once at the bottom of the list. The literal was a pre-catalog
artifact; removing it leaves a single OpenClaw entry sourced from the
catalog. Routing into `handleOpenClawCreate` is unchanged because
the value (`'openclaw'`) is identical either way.

* fix(agents): always reconcile harness with gateway on list, just dedupe concurrent calls

Memoizing the first successful reconcile meant new gateway agents (created
via the legacy /claw/agents path or out-of-band CLI) never appeared in the
harness until server restart. The Promise now serves as a concurrent-call
dedupe only — cleared on settle — so every listAgents call picks up the
current gateway state. Reconcile is one cheap idempotent CLI call.

* chore(agents): remove dormant useAcpxForOpenClaw flag

The flag was scaffolded in Step 6 but its routing effect was walked
back the same day after it broke chat for legacy gateway-only agents.
After the Step 9 backfill, every OpenClaw agent has a harness record
and routes through the harness path purely from `source='agent-harness'`
— no flag is consulted anywhere. Remove the dead storage item, hook,
and stale comment.

* refactor(agents): drop legacy /claw/agents/:id/history endpoint

The harness /agents/:id/sessions/main/history endpoint replaced this
once every OpenClaw agent got a harness record (Step 9 backfill).
Routing is fully source-driven now, so the UI's useClawChatHistory
hook is never enabled today — verified live: legacy URL returns 404,
harness history hydrates correctly for the same agent.

Removes the GET /claw/agents/:id/history route, OpenClawService's
getAgentHistoryPage method plus its cursor/limit helpers and the
history-only types it owned (BrowserOSOpenClawHistoryPageResponse,
HistoryPageInput, normalizeHistoryLimit, encodeHistoryCursor,
decodeHistoryCursor, jsonlEventsToHistoryItems), and the route +
service tests that covered the dropped endpoint.

OpenClawJsonlReader stays alive — still feeds /claw/dashboard,
/claw/agents/:id/sessions, and the boot-time clawSession seed.
Removing those is its own follow-up since the dashboard would need
a harness-side replacement first.

* feat(agents): wire image attachments through the harness ACP path

Composer attachments now flow into the ACP `prompt` request as
spec-compliant `image` content blocks alongside the user's text. End
to end:

  composer → chatWithHarnessAgent({attachments}) →
  POST /agents/:id/chat {message, attachments} →
  parseChatBody decodes data: URLs to {mediaType, base64} →
  AgentHarnessService.send forwards →
  AcpxRuntime.send forwards →
  acpx startTurn({attachments}) → ACP image blocks

UI no longer disables the attach button on harness agents — the
gating was just a placeholder before this commit landed. Verified
end to end with a 1×1 red PNG against a Claude harness agent: model
replies "Red." correctly.

OpenClaw's `acp` bridge still drops image content blocks upstream
(verified by the same probe — Kimi-k2p5 reports "I don't see an
image"). That's an upstream openclaw limitation, not a harness-side
gap; Claude/Codex agents work as advertised today.

* chore(openclaw): delete OpenClawJsonlReader and JSONL-backed routes

* chore(openclaw): remove legacy /claw/agents/:id/chat and /queue routes

* chore(agents): collapse chat panel to harness-only path

* feat(agents): route OpenClaw image turns through the gateway HTTP client

The OpenClaw `acp` bridge silently drops ACP `image` content blocks
(verified during dogfood — model says "I don't see an image"). When
the user attaches images to an OpenClaw agent, the harness now diverts
that turn to the gateway's HTTP `/v1/chat/completions` endpoint, which
accepts OpenAI-style `image_url` parts and forwards them natively to
the provider.

  - New `OpenClawGatewayChatClient` translates an OpenAI streaming
    response into the same `AgentStreamEvent` shape the rest of the
    harness already consumes, so the chat panel renders identically
    whether a turn went through ACP or the gateway carve-out.
  - `AcpxRuntime.send` forks at the top: openclaw + any image
    attachment + a wired gateway client → `sendOpenclawViaGateway`.
    Other turns (text-only openclaw, claude, codex) take the existing
    ACP path unchanged.
  - The diverted path reads the prior turn history from the acpx
    session record so context is preserved, builds the OpenAI
    multimodal user message with text + image_url parts, and pumps
    the gateway SSE back to the caller through a tee that accumulates
    the assistant text. On natural completion, persists a synthetic
    user+assistant message pair to the acpx session record so reload
    shows the image turn in history.
  - Wired `OpenClawGatewayChatClient` into `AgentHarnessService` via
    `server.ts` (gateway port + token accessor, just like the existing
    `openclawGateway`).

Persistence note: the acpx record requires User messages to carry an
`id` and Agent messages to carry `tool_results` — without them the
record fails to round-trip through `parseSessionRecord`. The persist
helper now sets both.

Limitation by design: image recognition only works if the OpenClaw
agent's provider supports vision (e.g. Claude-via-OpenClaw, GPT-4o).
The pipeline routes images correctly to the provider regardless;
text-only providers like Kimi-k2p5 will reply "I don't see an image"
because the model itself has no vision capability — that's a provider
config issue, not a routing bug. The unit test asserts the image_url
part is present in the OpenAI request the gateway client sends.

The wider plan (background-resilient chat, queue, replay) remains in
`plans/.../2026-04-29-1527-...-background-resilient-chat-and-image-uploads.md`
as Phases 3–12; this commit ships only Phases 1–2.

* feat(agents): validate inbound image attachments on /agents/:id/chat

The harness chat body parser was accepting any mediaType and any
dataUrl length. The composer enforces these caps client-side but the
endpoint also serves direct curl/script callers, so the server has to
defend itself.

Restores the same caps the legacy /claw/agents/:id/chat parser had
before it was deleted in the migration:

  - 10 attachments per message
  - 5 MB raw image bytes (≈ 6.7 MB once base64-encoded plus prefix)
  - PNG / JPEG / WebP / GIF only
  - Must start with `data:`

Each violation returns 400 with a specific error message instead of
silently dropping or forwarding the payload.
2026-04-29 16:37:03 +05:30
Nikhil
2ff5c12840 feat: add sidepanel ACP chat targets (#857)
* feat(agent): add sidepanel chat target catalog

* feat(agent): show acp models in sidepanel selector

* feat(server): adapt acp events to ui message streams

* feat(server): add sidepanel acp chat route

* feat(agent): route sidepanel chat through acp targets

* chore: self-review fixes

* fix: address review feedback for PR #857
2026-04-28 18:23:38 -07:00
Nikhil
d87422eea1 fix: hide BrowserOS ACP wrapper in history (#856) 2026-04-28 17:31:11 -07:00
Nikhil
1946ca0cf8 chore: clean up unused agent sdk (#855) 2026-04-28 17:21:46 -07:00
Nikhil
754f7d0e1d test: cover terminal limactl resolver errors (#854) 2026-04-28 17:12:08 -07:00
Nikhil
85bb3f7b42 fix: avoid eager limactl resolution in server tests (#853) 2026-04-28 16:56:41 -07:00
Nikhil
cb32b8191d fix: show rich ACP harness history from ACPX (#852)
* fix: load ACP harness history from ACPX

* fix: address ACP history review comments
2026-04-28 16:40:22 -07:00
Nikhil
7a92654abc feat: add BrowserOS MCP to ACP agents (#851)
* feat: add BrowserOS MCP to ACP agents

* fix: bypass ACP agent permissions

* fix: address review feedback for PR #851
2026-04-28 16:30:20 -07:00
304 changed files with 20701 additions and 14731 deletions

View File

@@ -1,157 +0,0 @@
name: build-agent
on:
workflow_dispatch:
inputs:
agent:
description: "Agent name from bundle.json"
required: true
type: string
default: openclaw
publish:
description: "Upload to R2 and merge manifest slice"
required: false
default: false
type: boolean
pull_request:
paths:
- "packages/browseros-agent/packages/build-tools/**"
- ".github/workflows/build-agent.yml"
env:
BUN_VERSION: "1.3.6"
PKG_DIR: packages/browseros-agent/packages/build-tools
permissions:
contents: read
jobs:
check:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- working-directory: packages/browseros-agent
run: bun run --filter @browseros/build-tools typecheck
- working-directory: packages/browseros-agent
run: bun run --filter @browseros/build-tools test
build:
needs: check
strategy:
fail-fast: false
matrix:
include:
- arch: arm64
runner: ubuntu-24.04-arm
runs-on: ${{ matrix.runner }}
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- name: Install podman
run: |
sudo apt-get update
sudo apt-get install -y podman
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Build tarball
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
OUT: ${{ github.workspace }}/dist/images
run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
- uses: actions/upload-artifact@v4
with:
name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
path: dist/images/
retention-days: 7
smoke:
needs: build
runs-on: ubuntu-24.04-arm
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- uses: actions/download-artifact@v4
with:
name: tarball-${{ inputs.agent || 'openclaw' }}-arm64
path: dist/images
- name: Install podman
run: |
sudo apt-get update
sudo apt-get install -y podman
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Smoke test tarball
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
run: |
set -euo pipefail
tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-arm64.tar.gz" -print -quit)"
if [ -z "$tarball" ]; then
echo "missing arm64 tarball artifact for ${AGENT}" >&2
exit 1
fi
bun run smoke:tarball -- --agent "$AGENT" --arch arm64 --tarball "$tarball"
publish:
needs: [build, smoke]
if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
runs-on: ubuntu-24.04
environment: release
concurrency:
group: r2-manifest-publish
cancel-in-progress: false
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- uses: actions/download-artifact@v4
with:
pattern: tarball-*
path: dist/images
merge-multiple: true
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Upload tarballs to R2
working-directory: ${{ env.PKG_DIR }}
env:
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
R2_BUCKET: ${{ secrets.R2_BUCKET }}
run: |
set -euo pipefail
for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
base="$(basename "$file")"
bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
done
- name: Merge agent slice into manifest
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
R2_BUCKET: ${{ secrets.R2_BUCKET }}
run: |
set -euo pipefail
mkdir -p dist/images
cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
bun run emit-manifest -- \
--slice "agents:${AGENT}" \
--dist-dir dist \
--merge-from dist/baseline-manifest.json \
--out dist/manifest.json
bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"

View File

@@ -14,7 +14,7 @@ on:
config:
description: 'Eval config file (relative to apps/eval/)'
required: false
default: 'configs/browseros-agent-weekly.json'
default: 'configs/legacy/browseros-agent-weekly.json'
permissions:
contents: read
@@ -42,10 +42,12 @@ jobs:
- name: Install dependencies
working-directory: packages/browseros-agent
run: bun install --ignore-scripts && bun run build:agent-sdk
run: bun install --ignore-scripts
- name: Install Python eval dependencies
run: pip install agisdk requests
# agisdk pinned so silent upstream releases can't shift task definitions
# or grader behavior. Bump intentionally with a documented re-baseline.
run: pip install agisdk==0.3.5 requests
- name: Clone WebArena-Infinity
run: git clone --depth 1 https://github.com/web-arena-x/webarena-infinity.git /tmp/webarena-infinity
@@ -60,33 +62,27 @@ jobs:
curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
unzip -qo /tmp/nopecha.zip -d extensions/nopecha
- name: Run eval
- name: Run eval and publish to R2
working-directory: packages/browseros-agent/apps/eval
env:
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
run: |
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"
- name: Upload runs to R2
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
# OpenClaw container runtime is macOS-only; opt the Linux runner
# into the no-op stub so the server can boot and the eval can run.
BROWSEROS_SKIP_OPENCLAW: '1'
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
bun scripts/upload-run.ts "results/$CONFIG_NAME"
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2
- name: Generate trend report
if: success()
@@ -107,3 +103,11 @@ jobs:
with:
name: eval-report-${{ github.run_id }}
path: /tmp/eval-report.html
- name: Upload server stderr logs (for post-mortem on startup failures)
if: always()
uses: actions/upload-artifact@v4
with:
name: browseros-server-logs-${{ github.run_id }}
path: /tmp/browseros-server-logs/
if-no-files-found: ignore

View File

@@ -1,168 +1,11 @@
name: Release BrowserOS Agent SDK
name: Release BrowserOS Agent SDK (disabled)
on:
workflow_dispatch:
concurrency:
group: release-agent-sdk
cancel-in-progress: false
jobs:
publish:
if: github.ref == 'refs/heads/main'
disabled:
if: ${{ false }}
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
defaults:
run:
working-directory: packages/browseros-agent/packages/agent-sdk
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- uses: oven-sh/setup-bun@v2
- uses: actions/setup-node@v6
with:
node-version: "20"
registry-url: "https://registry.npmjs.org"
- name: Install dependencies
run: bun ci
working-directory: packages/browseros-agent
- name: Build
run: bun run build
- name: Test
run: bun test
- name: Get version
id: version
run: |
echo "version=$(node -p "require('./package.json').version")" >> "$GITHUB_OUTPUT"
echo "release_sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"
- name: Generate release notes
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
SDK_PATH="packages/browseros-agent/packages/agent-sdk"
CURRENT_TAG="agent-sdk-v${{ steps.version.outputs.version }}"
# Find the previous tag, excluding the current version's tag
# (which may already exist from a prior failed run)
PREV_TAG=$(git tag -l "agent-sdk-v*" --sort=-v:refname | grep -v "^${CURRENT_TAG}$" | head -n 1)
if [ -z "$PREV_TAG" ]; then
echo "Initial release" > /tmp/release-notes.md
else
# Get commits scoped to the SDK directory
COMMITS=$(git log "$PREV_TAG"..HEAD --pretty=format:"%H" -- "$SDK_PATH")
if [ -z "$COMMITS" ]; then
echo "No notable changes." > /tmp/release-notes.md
else
echo "## What's Changed" > /tmp/release-notes.md
echo "" >> /tmp/release-notes.md
# For each commit, find the associated PR and format with author
CONTRIBUTORS=""
while IFS= read -r SHA; do
# Get commit subject and author
SUBJECT=$(git log -1 --pretty=format:"%s" "$SHA")
AUTHOR=$(git log -1 --pretty=format:"%an" "$SHA")
GITHUB_USER=$(gh api "/repos/${{ github.repository }}/commits/${SHA}" --jq '.author.login // empty' 2>/dev/null)
# Find associated PR number
PR_NUM=$(gh api "/repos/${{ github.repository }}/commits/${SHA}/pulls" --jq '.[0].number // empty' 2>/dev/null)
# Format line: skip PR number if already in the commit subject
# (squash merges include "(#123)" in the subject automatically)
if [ -n "$PR_NUM" ] && ! echo "$SUBJECT" | grep -qF "(#${PR_NUM})"; then
echo "- ${SUBJECT} (#${PR_NUM})" >> /tmp/release-notes.md
else
echo "- ${SUBJECT}" >> /tmp/release-notes.md
fi
done <<< "$COMMITS"
fi
fi
working-directory: ${{ github.workspace }}
- name: Publish
run: npm publish --access public
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
- name: Create GitHub release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
TAG="agent-sdk-v${{ steps.version.outputs.version }}"
RELEASE_SHA="${{ steps.version.outputs.release_sha }}"
TITLE="BrowserOS Agent SDK - v${{ steps.version.outputs.version }}"
# Create or reuse tag (idempotent for re-runs)
if git rev-parse "$TAG" >/dev/null 2>&1; then
echo "Tag $TAG already exists, skipping tag creation"
else
git tag "$TAG" "$RELEASE_SHA"
fi
# Push tag (skip if already on remote)
if git ls-remote --tags origin "$TAG" | grep -q "$TAG"; then
echo "Tag $TAG already on remote, skipping push"
else
git push origin "$TAG"
fi
# Create or update release
if gh release view "$TAG" >/dev/null 2>&1; then
echo "Release $TAG already exists, updating"
gh release edit "$TAG" --title "$TITLE" --notes-file /tmp/release-notes.md
else
gh release create "$TAG" --title "$TITLE" --notes-file /tmp/release-notes.md
fi
working-directory: ${{ github.workspace }}
- name: Update CHANGELOG.md via PR
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
VERSION="${{ steps.version.outputs.version }}"
DATE=$(date -u +"%Y-%m-%d")
BRANCH="docs/agent-sdk-changelog-v${VERSION}"
CHANGELOG="packages/browseros-agent/packages/agent-sdk/CHANGELOG.md"
# Return to main before branching
git checkout main
# Use head/tail to safely insert without sed quoting issues
{
head -n 1 "$CHANGELOG"
echo ""
echo "## v${VERSION} (${DATE})"
echo ""
cat /tmp/release-notes.md
echo ""
tail -n +2 "$CHANGELOG"
} > /tmp/new-changelog.md
mv /tmp/new-changelog.md "$CHANGELOG"
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git checkout -b "$BRANCH"
git add "$CHANGELOG"
git commit -m "docs: update agent-sdk changelog for v${VERSION}"
git push origin "$BRANCH"
gh pr create \
--title "docs: update agent-sdk changelog for v${VERSION}" \
--body "Auto-generated changelog update for BrowserOS Agent SDK v${VERSION}." \
--base main \
--head "$BRANCH"
gh pr merge "$BRANCH" --squash --auto || true
working-directory: ${{ github.workspace }}
- run: echo "Agent SDK publishing is disabled."

View File

@@ -54,28 +54,24 @@ jobs:
command: (cd apps/server && bun run test:integration)
junit_path: test-results/server-integration.xml
needs_browser: true
- suite: server-sdk
command: (cd apps/server && bun run test:sdk)
junit_path: test-results/server-sdk.xml
needs_browser: true
- suite: server-lib
command: (cd apps/server && bun run test:lib)
junit_path: test-results/server-lib.xml
needs_browser: false
- suite: server-root
command: (cd apps/server && bun run test:root)
junit_path: test-results/server-root.xml
needs_browser: false
- suite: agent
command: bun run test:agent
command: (cd apps/agent && bun run test)
junit_path: test-results/agent.xml
needs_browser: false
- suite: eval
command: bun run test:eval
command: (cd apps/eval && bun run test)
junit_path: test-results/eval.xml
needs_browser: false
- suite: agent-sdk
command: bun run test:agent-sdk
junit_path: test-results/agent-sdk.xml
needs_browser: false
- suite: build
command: bun run test:build
command: bun run ./scripts/run-bun-test.ts ./scripts/build
junit_path: test-results/build.xml
needs_browser: false

View File

@@ -188,6 +188,21 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
- [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.
## Citation
If you use BrowserOS in your research or project, please cite:
```bibtex
@software{browseros2025,
author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
title = {BrowserOS: The open-source Agentic browser},
url = {https://github.com/browseros-ai/BrowserOS},
year = {2025},
publisher = {GitHub},
license = {AGPL-3.0},
}
```
## License
BrowserOS is open source under the [AGPL-3.0 license](LICENSE).

View File

@@ -1,6 +1,6 @@
# BrowserOS Agent
The agent platform powering [BrowserOS](https://github.com/browseros-ai/BrowserOS) — contains the MCP server, agent UI, CLI, evaluation framework, and SDK.
The agent platform powering [BrowserOS](https://github.com/browseros-ai/BrowserOS) — contains the MCP server, agent UI, CLI, and evaluation framework.
## Monorepo Structure
@@ -12,7 +12,6 @@ apps/
eval/ # Evaluation framework for benchmarking agents
packages/
agent-sdk/ # Node.js SDK (@browseros-ai/agent-sdk)
cdp-protocol/ # Type-safe Chrome DevTools Protocol bindings
shared/ # Shared constants (ports, timeouts, limits)
```
@@ -23,7 +22,6 @@ packages/
| `apps/agent` | Agent UI — Chrome extension for the chat interface |
| `apps/cli` | Go CLI — control BrowserOS from the terminal or AI coding agents |
| `apps/eval` | Benchmark framework — WebVoyager, Mind2Web evaluation |
| `packages/agent-sdk` | Node.js SDK for browser automation with natural language |
| `packages/cdp-protocol` | Auto-generated CDP type bindings used by the server |
| `packages/shared` | Shared constants used across packages |
@@ -81,14 +79,15 @@ cp apps/server/.env.example apps/server/.env.development
cp apps/agent/.env.example apps/agent/.env.development
cp apps/server/.env.production.example apps/server/.env.production
# Install deps, generate agent code, and sync the VM cache
# Install deps and generate agent code
bun run dev:setup
# Start the full dev environment
bun run dev:watch
```
`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.
`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
the server startup path and pulls the configured GHCR image on demand.
### Environment Variables
@@ -158,9 +157,14 @@ bun run build:server # Build production server resource artifacts and u
bun run build:agent # Build agent extension
# Test
bun run test # Run standard tests
bun run test:cdp # Run CDP-based tests
bun run test:integration # Run integration tests
bun run test # Run all tests
bun run test:all # Run all tests
bun run test:main # Run key server tools and integration tests
# App-specific test groups (from packages/browseros-agent)
cd apps/server && bun run test:tools
cd apps/server && bun run test:cdp
cd apps/server && bun run test:integration
# Quality
bun run lint # Check with Biome

View File

@@ -0,0 +1,50 @@
import type { Provider } from './chatComponentTypes'
export interface ProviderOptionGroup {
key: 'llm' | 'acp'
label: string
options: Provider[]
}
export function groupProviderOptions(
providers: Provider[],
): ProviderOptionGroup[] {
const llm = providers.filter((provider) => provider.kind !== 'acp')
const acp = providers.filter((provider) => provider.kind === 'acp')
return [
...(llm.length
? [{ key: 'llm' as const, label: 'AI Providers', options: llm }]
: []),
...(acp.length
? [{ key: 'acp' as const, label: 'Agents', options: acp }]
: []),
]
}
export function getProviderSearchValue(
provider: Provider,
groupLabel: string,
): string {
return [
provider.id,
provider.name,
provider.type,
groupLabel,
provider.adapterName,
provider.modelLabel,
]
.filter(Boolean)
.join(' ')
}
export function getProviderSubtitle(provider: Provider): string | undefined {
if (provider.kind !== 'acp') return undefined
return [
provider.adapterName,
provider.modelLabel,
provider.modelControl === 'best-effort' ? 'best effort' : undefined,
]
.filter(Boolean)
.join(' · ')
}

View File

@@ -0,0 +1,72 @@
import { describe, expect, it } from 'bun:test'
import {
getProviderSearchValue,
getProviderSubtitle,
groupProviderOptions,
} from './ChatProviderSelector.helpers'
import type { Provider } from './chatComponentTypes'
const options: Provider[] = [
{ kind: 'llm', id: 'browseros', name: 'BrowserOS', type: 'browseros' },
{
kind: 'llm',
id: 'anthropic-sonnet',
name: 'Anthropic Sonnet',
type: 'anthropic',
},
{
kind: 'acp',
id: 'agent-claude-review',
name: 'Review Bot',
type: 'acp',
adapterName: 'Claude Code',
modelLabel: 'Haiku',
modelControl: 'best-effort',
},
{
kind: 'acp',
id: 'agent-codex-browser',
name: 'Browser Driver',
type: 'acp',
adapterName: 'Codex',
modelLabel: 'GPT-5.5',
modelControl: 'runtime-supported',
},
]
describe('groupProviderOptions', () => {
it('groups normal providers separately from created agents', () => {
expect(groupProviderOptions(options)).toEqual([
{
key: 'llm',
label: 'AI Providers',
options: [options[0], options[1]],
},
{
key: 'acp',
label: 'Agents',
options: [options[2], options[3]],
},
])
})
})
describe('getProviderSearchValue', () => {
it('matches created-agent group labels and item labels', () => {
expect(getProviderSearchValue(options[2], 'Agents')).toContain('Agents')
expect(getProviderSearchValue(options[2], 'Agents')).toContain('Review Bot')
expect(getProviderSearchValue(options[2], 'Agents')).toContain(
'Claude Code',
)
})
})
describe('getProviderSubtitle', () => {
it('describes created-agent runtime context without model-target copy', () => {
expect(getProviderSubtitle(options[2])).toBe(
'Claude Code · Haiku · best effort',
)
expect(getProviderSubtitle(options[3])).toBe('Codex · GPT-5.5')
expect(getProviderSubtitle(options[0])).toBeUndefined()
})
})

View File

@@ -1,4 +1,4 @@
import { Check, Plus } from 'lucide-react'
import { Bot, Check, Plus } from 'lucide-react'
import type { FC, PropsWithChildren } from 'react'
import { useState } from 'react'
import {
@@ -17,6 +17,11 @@ import {
import { BrowserOSIcon, ProviderIcon } from '@/lib/llm-providers/providerIcons'
import type { ProviderType } from '@/lib/llm-providers/types'
import { cn } from '@/lib/utils'
import {
getProviderSearchValue,
getProviderSubtitle,
groupProviderOptions,
} from './ChatProviderSelector.helpers'
import type { Provider } from './chatComponentTypes'
interface ChatProviderSelectorProps {
@@ -29,54 +34,58 @@ export const ChatProviderSelector: FC<
PropsWithChildren<ChatProviderSelectorProps>
> = ({ children, providers, selectedProvider, onSelectProvider }) => {
const [open, setOpen] = useState(false)
const groups = groupProviderOptions(providers)
return (
<Popover open={open} onOpenChange={setOpen}>
<PopoverTrigger asChild>{children}</PopoverTrigger>
<PopoverContent side="bottom" align="start" className="w-48 p-0">
<PopoverContent side="bottom" align="start" className="w-64 p-0">
<Command>
<CommandInput placeholder="Search providers..." className="h-9" />
<CommandInput
placeholder="Search providers or agents..."
className="h-9"
/>
<CommandList>
<div className="my-2 px-2 font-semibold text-muted-foreground text-xs uppercase tracking-wide">
AI Provider
</div>
<CommandEmpty>No provider found</CommandEmpty>
<CommandGroup>
{providers.map((provider) => {
const isSelected = selectedProvider.id === provider.id
return (
<CommandItem
key={provider.id}
value={`${provider.id} ${provider.name}`}
onSelect={() => {
onSelectProvider(provider)
setOpen(false)
}}
className={cn(
'flex w-full items-center gap-3 rounded-md p-2 transition-colors',
isSelected && 'bg-[var(--accent-orange)]/10',
)}
>
<span className="text-muted-foreground">
{provider.type === 'browseros' ? (
<BrowserOSIcon size={18} />
) : (
<ProviderIcon
type={provider.type as ProviderType}
size={18}
/>
{groups.map((group) => (
<CommandGroup key={group.key} heading={group.label}>
{group.options.map((provider) => {
const isSelected = selectedProvider.id === provider.id
const subtitle = getProviderSubtitle(provider)
return (
<CommandItem
key={provider.id}
value={getProviderSearchValue(provider, group.label)}
onSelect={() => {
onSelectProvider(provider)
setOpen(false)
}}
className={cn(
'flex w-full items-center gap-3 rounded-md p-2 transition-colors',
isSelected && 'bg-[var(--accent-orange)]/10',
)}
</span>
<span className="flex-1 text-left text-sm">
{provider.name}
</span>
{isSelected && (
<Check className="h-3.5 w-3.5 text-[var(--accent-orange)]" />
)}
</CommandItem>
)
})}
</CommandGroup>
>
<span className="text-muted-foreground">
<ProviderOptionIcon provider={provider} />
</span>
<span className="min-w-0 flex-1 text-left">
<span className="block truncate text-sm">
{provider.name}
</span>
{subtitle && (
<span className="block truncate text-muted-foreground text-xs">
{subtitle}
</span>
)}
</span>
{isSelected && (
<Check className="h-3.5 w-3.5 text-[var(--accent-orange)]" />
)}
</CommandItem>
)
})}
</CommandGroup>
))}
<div className="border-border border-t p-1">
<button
type="button"
@@ -96,3 +105,9 @@ export const ChatProviderSelector: FC<
</Popover>
)
}
function ProviderOptionIcon({ provider }: { provider: Provider }) {
if (provider.kind === 'acp') return <Bot size={18} />
if (provider.type === 'browseros') return <BrowserOSIcon size={18} />
return <ProviderIcon type={provider.type as ProviderType} size={18} />
}

View File

@@ -1,7 +1,14 @@
import type { ProviderType } from '@/lib/llm-providers/types'
export type ChatProviderType = ProviderType | 'acp'
export interface Provider {
id: string
name: string
type: ProviderType
type: ChatProviderType
kind: 'llm' | 'acp'
agentId?: string
adapterName?: string
modelLabel?: string
modelControl?: 'runtime-supported' | 'best-effort'
}

View File

@@ -1,136 +0,0 @@
import { Bot, Loader2, Wrench } from 'lucide-react'
import type { FC } from 'react'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import { cn } from '@/lib/utils'
interface AgentCardProps {
agent: AgentCardData
onClick: () => void
active?: boolean
}
function formatTimestamp(timestamp?: number): string {
if (!timestamp) return 'No activity yet'
const diff = Date.now() - timestamp
const minutes = Math.floor(diff / 60000)
if (minutes < 1) return 'just now'
if (minutes < 60) return `${minutes}m ago`
const hours = Math.floor(minutes / 60)
if (hours < 24) return `${hours}h ago`
return `${Math.floor(hours / 24)}d ago`
}
function getStatusLabel(status: AgentCardData['status']): string {
if (status === 'working') return 'Working'
if (status === 'error') return 'Error'
return 'Ready'
}
function getStatusTone(status: AgentCardData['status']): string {
if (status === 'working') return 'bg-amber-500'
if (status === 'error') return 'bg-destructive'
return 'bg-emerald-500'
}
function formatCost(usd: number): string {
if (usd < 0.005) return `$${usd.toFixed(4)}`
return `$${usd.toFixed(2)}`
}
export const AgentCardExpanded: FC<AgentCardProps> = ({
agent,
onClick,
active,
}) => (
<button
type="button"
onClick={onClick}
className={cn(
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
active
? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
: 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
)}
>
<div className="flex items-start justify-between gap-3">
<div className="flex min-w-0 items-center gap-3">
<div
className={cn(
'flex size-10 shrink-0 items-center justify-center rounded-xl',
active
? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
: 'bg-muted text-muted-foreground',
)}
>
<Bot className="size-5" />
</div>
<div className="min-w-0">
<div className="truncate font-semibold text-sm">{agent.name}</div>
<div className="truncate text-muted-foreground text-xs">
{agent.model ?? 'OpenClaw agent'}
</div>
</div>
</div>
<div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
<span
className={cn('size-2 rounded-full', getStatusTone(agent.status))}
/>
<span>{getStatusLabel(agent.status)}</span>
</div>
</div>
<div className="mt-4 flex-1">
<p className="line-clamp-2 text-foreground/90 text-sm">
{agent.lastMessage ??
'Start a conversation to see recent work and summaries.'}
</p>
</div>
<div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
<div className="flex items-center justify-between gap-3">
<span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
{agent.costUsd ? (
<span className="tabular-nums opacity-70">
{formatCost(agent.costUsd)}
</span>
) : null}
</div>
{agent.status === 'working' && agent.currentTool ? (
<div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
<Loader2 className="size-3 shrink-0 animate-spin" />
<span className="truncate">{agent.currentTool}</span>
</div>
) : agent.activitySummary ? (
<div className="flex items-center gap-1.5 text-muted-foreground/60">
<Wrench className="size-3 shrink-0" />
<span className="truncate">{agent.activitySummary}</span>
</div>
) : null}
</div>
</button>
)
export const AgentCardCompact: FC<AgentCardProps> = ({
agent,
onClick,
active,
}) => (
<button
type="button"
onClick={onClick}
className={cn(
'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
active
? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
: 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
)}
>
<span
className={cn(
'size-2 rounded-full',
active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
)}
/>
<span className="truncate">{agent.name}</span>
</button>
)

View File

@@ -1,70 +1,71 @@
import { Plus } from 'lucide-react'
import type { FC } from 'react'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import type {
HarnessAdapterDescriptor,
HarnessAdapterHealth,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import { cn } from '@/lib/utils'
import { AgentCardCompact, AgentCardExpanded } from './AgentCard'
import { HomeAgentCard } from './HomeAgentCard'
interface AgentCardDockProps {
agents: AgentCardData[]
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
activeAgentId?: string
onSelectAgent: (agentId: string) => void
onCreateAgent?: () => void
compact?: boolean
}
function CreateAgentButton({
compact,
onCreateAgent,
}: {
compact?: boolean
onCreateAgent: () => void
}) {
function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
return (
<button
type="button"
onClick={onCreateAgent}
className={cn(
'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
compact
? 'rounded-full px-3 py-2 text-sm'
: 'min-h-32 rounded-2xl px-5 py-4',
'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
)}
>
<Plus className={compact ? 'size-3.5' : 'size-5'} />
<span>{compact ? 'New' : 'Create agent'}</span>
<Plus className="size-5" />
<span>Create agent</span>
</button>
)
}
/**
* 3-column grid of HomeAgentCards plus a trailing "Create agent"
* tile. The previous `compact` mode (rendered a horizontal pill rail)
* had no callers and was dropped along with the legacy AgentCard.
*/
export const AgentCardDock: FC<AgentCardDockProps> = ({
agents,
adapters,
activeAgentId,
onSelectAgent,
onCreateAgent,
compact,
}) => {
if (agents.length === 0 && !onCreateAgent) return null
const Card = compact ? AgentCardCompact : AgentCardExpanded
const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
for (const descriptor of adapters) {
if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
}
return (
<div
className={cn(
compact
? 'flex items-center gap-2 overflow-x-auto pb-1'
: 'grid gap-4 md:grid-cols-3',
)}
>
<div className="grid gap-4 md:grid-cols-3">
{agents.map((agent) => (
<Card
key={agent.agentId}
<HomeAgentCard
key={agent.id}
agent={agent}
active={agent.agentId === activeAgentId}
onClick={() => onSelectAgent(agent.agentId)}
adapter={agent.adapter}
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
active={agent.id === activeAgentId}
onClick={() => onSelectAgent(agent.id)}
/>
))}
{onCreateAgent ? (
<CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
<CreateAgentButton onCreateAgent={onCreateAgent} />
) : null}
</div>
)

View File

@@ -1,7 +1,13 @@
import { ArrowLeft, Bot, Home } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
import { Button } from '@/components/ui/button'
import {
cancelHarnessTurn,
useEnqueueHarnessMessage,
useHarnessAgents,
useRemoveHarnessQueuedMessage,
} from '@/entrypoints/app/agents/useAgents'
import {
type AgentEntry,
getModelDisplayName,
@@ -15,10 +21,9 @@ import {
filterTurnsPersistedInHistory,
flattenHistoryPages,
} from './claw-chat-types'
import { QueuePanel } from './QueuePanel'
import { useAgentConversation } from './useAgentConversation'
import { useClawChatHistory } from './useClawChatHistory'
import { useHarnessChatHistory } from './useHarnessChatHistory'
import { useOutboundQueue } from './useOutboundQueue'
function StatusBadge({ status }: { status: string }) {
return (
@@ -176,19 +181,10 @@ function getAgentEntryMeta(agent: AgentEntry | undefined): string {
return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
}
function getConversationStatusCopy(status: string | undefined): string {
if (status === 'running') return 'Ready'
if (status === 'starting') return 'Connecting'
if (status === 'error') return 'Attention'
if (status === 'stopped') return 'Offline'
return 'Setup'
}
function AgentConversationController({
agentId,
initialMessage,
onInitialMessageConsumed,
status,
agents,
agentPathPrefix,
createAgentPath,
@@ -196,7 +192,6 @@ function AgentConversationController({
agentId: string
initialMessage: string | null
onInitialMessageConsumed: () => void
status: ReturnType<typeof useAgentCommandData>['status']
agents: AgentEntry[]
agentPathPrefix: string
createAgentPath: string
@@ -204,121 +199,67 @@ function AgentConversationController({
const navigate = useNavigate()
const initialMessageSentRef = useRef<string | null>(null)
const onInitialMessageConsumedRef = useRef(onInitialMessageConsumed)
const [streamSessionKey, setStreamSessionKey] = useState<string | null>(null)
const agent = agents.find((entry) => entry.agentId === agentId)
const agentName = agent?.name || agentId || 'Agent'
const isAgentHarnessAgent = agent?.source === 'agent-harness'
const clawHistoryQuery = useClawChatHistory({
agentId,
sessionKey: streamSessionKey,
enabled: Boolean(agent) && !isAgentHarnessAgent,
})
const harnessHistoryQuery = useHarnessChatHistory(
agentId,
Boolean(agent) && isAgentHarnessAgent,
)
// Routing is now harness-only. Every OpenClaw agent has a harness
// record post the gateway → harness backfill, so the chat panel
// always talks to /agents/<id>/chat. The legacy ClawChat surface
// was deleted with the /claw/agents/:id/chat server route.
const harnessHistoryQuery = useHarnessChatHistory(agentId, Boolean(agent))
const historyMessages = useMemo(
() =>
flattenHistoryPages(
isAgentHarnessAgent
? harnessHistoryQuery.data
? [harnessHistoryQuery.data]
: []
: (clawHistoryQuery.data?.pages ?? []),
harnessHistoryQuery.data ? [harnessHistoryQuery.data] : [],
),
[
clawHistoryQuery.data?.pages,
harnessHistoryQuery.data,
isAgentHarnessAgent,
],
[harnessHistoryQuery.data],
)
const chatHistory = useMemo(
() => buildChatHistoryFromClawMessages(historyMessages),
[historyMessages],
)
const resolvedSessionKey =
streamSessionKey ??
(isAgentHarnessAgent
? null
: (clawHistoryQuery.data?.pages?.[0]?.sessionKey ?? null))
// Listing query feeds queue + active-turn state for this agent. We
// already poll it every 5s for the rail; reusing the same cache
// keeps cross-tab queue state in sync without a second poll.
const { harnessAgents } = useHarnessAgents()
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
const queue = harnessAgent?.queue ?? []
const activeTurnId = harnessAgent?.activeTurnId ?? null
const { turns, streaming, send } = useAgentConversation(agentId, {
runtime: isAgentHarnessAgent ? 'agent-harness' : 'openclaw',
sessionKey: resolvedSessionKey,
runtime: 'agent-harness',
sessionKey: null,
history: chatHistory,
activeTurnId,
onComplete: () => {
if (isAgentHarnessAgent) {
void harnessHistoryQuery.refetch()
}
},
onSessionKeyChange: (sessionKey) => {
setStreamSessionKey(sessionKey)
void harnessHistoryQuery.refetch()
},
onSessionKeyChange: () => {},
})
const enqueueMessage = useEnqueueHarnessMessage()
const removeQueuedMessage = useRemoveHarnessQueuedMessage()
const handleStop = () => {
void cancelHarnessTurn(agentId, {
turnId: activeTurnId ?? undefined,
reason: 'user pressed stop',
})
}
const visibleTurns = useMemo(
() =>
isAgentHarnessAgent
? filterTurnsPersistedInHistory(turns, historyMessages)
: turns,
[historyMessages, isAgentHarnessAgent, turns],
() => filterTurnsPersistedInHistory(turns, historyMessages),
[historyMessages, turns],
)
const outboundQueue = useOutboundQueue({
agentId,
sessionKey: resolvedSessionKey,
enabled: Boolean(agent) && !isAgentHarnessAgent,
})
onInitialMessageConsumedRef.current = onInitialMessageConsumed
// Refetch history whenever a server-dispatched queue item completes.
// The server worker streams the queued turn into OpenClaw directly, so
// the client never observes the live tokens — we only see the new
// assistant turn once the JSONL is updated. Watching the queue for
// any 'sending' item dropping out is the cleanest "turn finalized"
// signal we have without exposing per-turn SSE.
const previousSendingIdsRef = useRef<Set<string>>(new Set())
useEffect(() => {
if (isAgentHarnessAgent) return
const currentSending = new Set(
outboundQueue.queue
.filter((item) => item.status === 'sending')
.map((item) => item.id),
)
const dropped = [...previousSendingIdsRef.current].filter(
(id) => !currentSending.has(id),
)
previousSendingIdsRef.current = currentSending
if (dropped.length > 0) {
void clawHistoryQuery.refetch()
}
}, [clawHistoryQuery, isAgentHarnessAgent, outboundQueue.queue])
const disabled =
!agent || (!isAgentHarnessAgent && status?.status !== 'running')
// Two-part gate: cover both "still fetching" AND "just got enabled but
// hasn't started fetching yet". When `enabled` flips true (baseUrl
// resolves), there's a render frame where React Query reports
// isLoading=false but hasn't run the queryFn yet — `isFetched` is still
// false. Without this we render EmptyState during that one frame.
const isInitialLoading =
!isAgentHarnessAgent &&
(clawHistoryQuery.isLoading ||
(!clawHistoryQuery.isFetched && !clawHistoryQuery.isError))
const disabled = !agent
const historyReady =
(isAgentHarnessAgent &&
(harnessHistoryQuery.isFetched || harnessHistoryQuery.isError)) ||
(!isAgentHarnessAgent &&
(clawHistoryQuery.isFetched || clawHistoryQuery.isError))
harnessHistoryQuery.isFetched || harnessHistoryQuery.isError
const initialMessageKey = initialMessage
? `${agentId}:${initialMessage}`
: null
const error = isAgentHarnessAgent
? (harnessHistoryQuery.error ?? null)
: (clawHistoryQuery.error ?? null)
const error = harnessHistoryQuery.error ?? null
const enqueueRef = useRef(outboundQueue.enqueue)
enqueueRef.current = outboundQueue.enqueue
const sendRef = useRef(send)
sendRef.current = send
@@ -340,18 +281,8 @@ function AgentConversationController({
initialMessageSentRef.current = initialMessageKey
onInitialMessageConsumedRef.current()
if (isAgentHarnessAgent) {
void sendRef.current({ text: query })
} else {
enqueueRef.current({ text: query })
}
}, [
disabled,
historyReady,
initialMessage,
initialMessageKey,
isAgentHarnessAgent,
])
void sendRef.current({ text: query })
}, [disabled, historyReady, initialMessage, initialMessageKey])
const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
@@ -364,32 +295,26 @@ function AgentConversationController({
historyMessages={historyMessages}
turns={visibleTurns}
streaming={streaming}
isInitialLoading={
isAgentHarnessAgent ? harnessHistoryQuery.isLoading : isInitialLoading
}
isInitialLoading={harnessHistoryQuery.isLoading}
error={error}
hasNextPage={
isAgentHarnessAgent ? false : Boolean(clawHistoryQuery.hasNextPage)
}
isFetchingNextPage={
isAgentHarnessAgent ? false : clawHistoryQuery.isFetchingNextPage
}
onFetchNextPage={() => {
if (!isAgentHarnessAgent) {
void clawHistoryQuery.fetchNextPage()
}
}}
hasNextPage={false}
isFetchingNextPage={false}
onFetchNextPage={() => {}}
onRetry={() => {
if (isAgentHarnessAgent) {
void harnessHistoryQuery.refetch()
} else {
void clawHistoryQuery.refetch()
}
void harnessHistoryQuery.refetch()
}}
/>
<div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
<div className="mx-auto max-w-3xl">
<div className="mx-auto max-w-3xl space-y-3">
{queue.length > 0 ? (
<QueuePanel
queue={queue}
onRemove={(messageId) =>
removeQueuedMessage.mutate({ agentId, messageId })
}
/>
) : null}
<ConversationInput
variant="conversation"
agents={agents}
@@ -404,31 +329,30 @@ function AgentConversationController({
name: a.name,
dataUrl: a.dataUrl,
}))
if (isAgentHarnessAgent) {
void send({ text: input.text, attachments, attachmentPreviews })
} else {
outboundQueue.enqueue({
text: input.text,
// When the agent already has an in-flight turn, route
// the new message into the durable queue instead of
// starting a parallel turn. Drains automatically as
// soon as the active turn ends.
if (streaming || activeTurnId) {
enqueueMessage.mutate({
agentId,
message: input.text,
attachments,
attachmentPreviews,
history: chatHistory,
})
return
}
void send({ text: input.text, attachments, attachmentPreviews })
}}
onCreateAgent={() => navigate(createAgentPath)}
onStop={handleStop}
streaming={streaming}
disabled={disabled}
status={isAgentHarnessAgent ? 'running' : status?.status}
attachmentsEnabled={!isAgentHarnessAgent}
placeholder={`Message ${agentName}...`}
outboundQueue={
isAgentHarnessAgent ? undefined : outboundQueue.queue
}
onCancelQueued={
isAgentHarnessAgent ? undefined : outboundQueue.cancel
}
onRetryQueued={
isAgentHarnessAgent ? undefined : outboundQueue.retry
status="running"
attachmentsEnabled={true}
placeholder={
streaming
? `Type to queue another message for ${agentName}...`
: `Message ${agentName}...`
}
/>
</div>
@@ -453,7 +377,7 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
const { agentId } = useParams<{ agentId: string }>()
const [searchParams, setSearchParams] = useSearchParams()
const navigate = useNavigate()
const { status, agents } = useAgentCommandData()
const { agents } = useAgentCommandData()
const shouldRedirectHome = !agentId
const resolvedAgentId = agentId ?? ''
const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
@@ -471,10 +395,11 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
navigate(`${agentPathPrefix}/${entry.agentId}`)
}
const statusCopy =
agent?.source === 'agent-harness'
? 'Ready'
: getConversationStatusCopy(status?.status)
// Every visible agent runs through the harness now, so per-agent
// runtime status doesn't gate chat the way OpenClaw's legacy
// gateway lifecycle did. Show "Ready" once the agent record is
// resolved from the rail, "Setup" otherwise.
const statusCopy = agent ? 'Ready' : 'Setup'
return (
<div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
@@ -500,7 +425,6 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
status={status}
initialMessage={initialMessage}
onInitialMessageConsumed={() =>
setSearchParams({}, { replace: true })

View File

@@ -1,18 +1,25 @@
import { Plus } from 'lucide-react'
import { type FC, useEffect, useState } from 'react'
import { type FC, useEffect, useMemo, useState } from 'react'
import { useNavigate } from 'react-router'
import { Button } from '@/components/ui/button'
import { Card, CardContent } from '@/components/ui/card'
import { Separator } from '@/components/ui/separator'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
} from '@/entrypoints/app/agents/agent-harness-types'
import {
useAgentAdapters,
useHarnessAgents,
} from '@/entrypoints/app/agents/useAgents'
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import { AgentCardDock } from './AgentCardDock'
import { useAgentCommandData } from './agent-command-layout'
import { ConversationInput } from './ConversationInput'
import { buildAgentCardData } from './useAgentCardData'
import { orderHomeAgents } from './home-agent-card.helpers'
function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
return (
@@ -38,11 +45,13 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
function RecentThreads({
activeAgentId,
agents,
adapters,
onOpenAgents,
onSelectAgent,
}: {
activeAgentId?: string | null
agents: AgentCardData[]
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
onOpenAgents: () => void
onSelectAgent: (agentId: string) => void
}) {
@@ -68,6 +77,7 @@ function RecentThreads({
</div>
<AgentCardDock
agents={agents}
adapters={adapters}
activeAgentId={activeAgentId ?? undefined}
onSelectAgent={onSelectAgent}
onCreateAgent={onOpenAgents}
@@ -79,25 +89,32 @@ function RecentThreads({
export const AgentCommandHome: FC = () => {
const navigate = useNavigate()
const activeHint = useActiveHint()
const { agents, status } = useAgentCommandData()
// The conversation input still consumes the merged AgentEntry list
// from the layout context (handles legacy /claw/agents entries that
// haven't yet been backfilled into the harness store). The Recent
// Agents grid below reads the richer harness payload directly.
const { agents: legacyAgents, status } = useAgentCommandData()
const { harnessAgents } = useHarnessAgents()
const { adapters } = useAgentAdapters()
const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
const cardData = buildAgentCardData(agents, status?.status, undefined)
const orderedAgents = useMemo(
() => orderHomeAgents(harnessAgents),
[harnessAgents],
)
useEffect(() => {
if (agents.length === 0) {
if (selectedAgentId) {
setSelectedAgentId(null)
}
if (legacyAgents.length === 0) {
if (selectedAgentId) setSelectedAgentId(null)
return
}
if (
!selectedAgentId ||
!agents.some((agent) => agent.agentId === selectedAgentId)
!legacyAgents.some((agent) => agent.agentId === selectedAgentId)
) {
setSelectedAgentId(agents[0].agentId)
setSelectedAgentId(legacyAgents[0].agentId)
}
}, [agents, selectedAgentId])
}, [legacyAgents, selectedAgentId])
const handleSend = (input: { text: string }) => {
if (!selectedAgentId) return
@@ -110,7 +127,7 @@ export const AgentCommandHome: FC = () => {
setSelectedAgentId(agent.agentId)
}
const selectedAgent = agents.find(
const selectedAgent = legacyAgents.find(
(agent) => agent.agentId === selectedAgentId,
)
const selectedAgentReady = selectedAgent
@@ -118,13 +135,15 @@ export const AgentCommandHome: FC = () => {
: false
const selectedAgentStatus =
selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
const selectedCard =
cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]
const selectedAgentName =
selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
const hasAgents = legacyAgents.length > 0
return (
<div className="min-h-full px-4 py-6">
<div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
{cardData.length > 0 ? (
{hasAgents ? (
<>
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
<div className="space-y-3">
@@ -140,7 +159,7 @@ export const AgentCommandHome: FC = () => {
<div className="w-full max-w-3xl">
<ConversationInput
variant="home"
agents={agents}
agents={legacyAgents}
selectedAgentId={selectedAgentId}
onSelectAgent={handleSelectAgent}
onSend={handleSend}
@@ -151,7 +170,7 @@ export const AgentCommandHome: FC = () => {
attachmentsEnabled={false}
placeholder={
selectedAgentReady
? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
? `Ask ${selectedAgentName} to handle a task...`
: 'Agent runtime is not running...'
}
/>
@@ -162,7 +181,8 @@ export const AgentCommandHome: FC = () => {
<RecentThreads
activeAgentId={selectedAgentId}
agents={cardData}
agents={orderedAgents}
adapters={adapters}
onOpenAgents={() => navigate('/agents')}
onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
/>

View File

@@ -1,5 +1,4 @@
import {
AlertTriangle,
ArrowRight,
Bot,
ChevronDown,
@@ -9,7 +8,6 @@ import {
Loader2,
Mic,
Paperclip,
RefreshCw,
Square,
X,
} from 'lucide-react'
@@ -38,7 +36,6 @@ import { cn } from '@/lib/utils'
import { useVoiceInput } from '@/lib/voice/useVoiceInput'
import { useWorkspace } from '@/lib/workspace/use-workspace'
import { AgentSelector } from './AgentSelector'
import type { OutboundMessage } from './useOutboundQueue'
export interface ConversationInputSendInput {
text: string
@@ -57,34 +54,40 @@ interface ConversationInputProps {
placeholder?: string
attachmentsEnabled?: boolean
variant?: 'home' | 'conversation'
// Outbound queue: when present, the composer renders the queue strip
// above the textarea and lets the user keep sending while a previous
// turn is in flight. Optional so non-conversation variants (the home
// page) can opt out — the queue only makes sense in the conversation
// page where each enqueued message will eventually be delivered to the
// active agent.
outboundQueue?: OutboundMessage[]
onCancelQueued?: (id: string) => void
onRetryQueued?: (id: string) => void
/**
* When set, a Stop button surfaces to the left of the voice mic
* while `streaming === true`. Click cancels the active turn
* server-side via the chat-cancel endpoint. Absent → no Stop
* button (legacy behaviour for the home composer).
*/
onStop?: () => void
}
function InputActionButton({
disabled,
onClick,
streaming,
hasContent,
}: {
disabled: boolean
onClick: () => void
streaming: boolean
hasContent: boolean
}) {
// Show the spinner while streaming only when there's nothing to
// send — once the user types something, the icon flips back to the
// paper-plane so it reads as "queue this message" instead of
// "still working".
const showSpinner = streaming && !hasContent
return (
<Button
onClick={onClick}
size="icon"
disabled={disabled}
title={streaming && hasContent ? 'Queue message' : undefined}
className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
>
{streaming ? (
{showSpinner ? (
<Loader2 className="h-5 w-5 animate-spin" />
) : (
<ArrowRight className="h-5 w-5" />
@@ -93,6 +96,22 @@ function InputActionButton({
)
}
function StopButton({ onStop }: { onStop: () => void }) {
return (
<Button
type="button"
size="icon"
variant="ghost"
onClick={onStop}
title="Stop current turn — queued messages will start next."
aria-label="Stop current turn"
className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
>
<Square className="h-3.5 w-3.5 fill-current" />
</Button>
)
}
function VoiceButton({
isRecording,
isTranscribing,
@@ -311,9 +330,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
placeholder,
attachmentsEnabled = true,
variant = 'conversation',
outboundQueue,
onCancelQueued,
onRetryQueued,
onStop,
}) => {
const [input, setInput] = useState('')
const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -394,15 +411,17 @@ export const ConversationInput: FC<ConversationInputProps> = ({
}
const hasContent = input.trim().length > 0 || attachments.length > 0
const queueEnabled = outboundQueue !== undefined
// Queue-aware composers (the conversation panel passes `onStop`)
// accept input while streaming — the parent decides whether the
// submission opens a new turn or enqueues onto the active one.
// Surfaces without a Stop hook (home) keep the legacy behaviour
// and block input until the current turn finishes.
const queueAware = Boolean(onStop)
const handleSend = () => {
const text = input.trim()
// The outbound queue accepts new messages while streaming; legacy
// direct-send callers (e.g., the home composer) keep the original
// streaming-blocks-send semantic.
if (disabled || isStaging) return
if (!queueEnabled && streaming) return
if (streaming && !queueAware) return
if (!text && attachments.length === 0) return
onSend({ text, attachments })
setInput('')
@@ -494,13 +513,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
error={attachmentError}
/>
) : null}
{queueEnabled && outboundQueue && outboundQueue.length > 0 ? (
<OutboundQueueStrip
messages={outboundQueue}
onCancel={onCancelQueued}
onRetry={onRetryQueued}
/>
) : null}
<div
className={cn(
'flex gap-3',
@@ -539,6 +551,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
)}
/>
</div>
{streaming && onStop ? <StopButton onStop={onStop} /> : null}
<VoiceButton
isRecording={voice.isRecording}
isTranscribing={voice.isTranscribing}
@@ -556,15 +569,13 @@ export const ConversationInput: FC<ConversationInputProps> = ({
!!disabled ||
voice.isRecording ||
voice.isTranscribing ||
// Only block on `streaming` for the legacy direct-send path
// (no queue). With the queue active the press always
// succeeds — it just enqueues instead of dispatching.
(!queueEnabled && streaming)
(streaming && !queueAware)
}
onClick={handleSend}
// Spinner stays the user-facing "agent is busy" hint; with the
// queue active we still spin while a turn is in flight.
streaming={streaming}
hasContent={hasContent}
/>
</div>
{voice.error ? (
@@ -595,117 +606,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
)
}
function OutboundQueueStrip({
messages,
onCancel,
onRetry,
}: {
messages: OutboundMessage[]
onCancel?: (id: string) => void
onRetry?: (id: string) => void
}) {
return (
<div className="border-border/40 border-b px-4 pt-3 pb-2">
<ul className="flex flex-col gap-1">
{messages.map((message) => (
<OutboundQueueItem
key={message.id}
message={message}
onCancel={onCancel}
onRetry={onRetry}
/>
))}
</ul>
</div>
)
}
function OutboundQueueItem({
message,
onCancel,
onRetry,
}: {
message: OutboundMessage
onCancel?: (id: string) => void
onRetry?: (id: string) => void
}) {
const preview = message.text.trim() || '(attachments only)'
return (
<li className="flex items-center gap-2 rounded-md px-2 py-1 text-xs">
<OutboundQueueStatusIcon status={message.status} />
<span className="min-w-0 flex-1 truncate text-muted-foreground">
{preview}
</span>
{message.attachmentPreviews.length > 0 ? (
<span className="inline-flex items-center gap-1 text-muted-foreground/70">
<Paperclip className="size-3" />
<span className="tabular-nums">
{message.attachmentPreviews.length}
</span>
</span>
) : null}
{message.status === 'queued' && onCancel ? (
<button
type="button"
onClick={() => onCancel(message.id)}
className="ml-1 inline-flex size-5 items-center justify-center rounded-full text-muted-foreground hover:bg-accent hover:text-foreground"
aria-label="Cancel queued message"
title="Cancel"
>
<X className="size-3" />
</button>
) : null}
{message.status === 'failed' ? (
<span className="ml-1 inline-flex items-center gap-2 text-destructive">
<span className="max-w-[160px] truncate" title={message.error}>
{message.error ?? 'Failed'}
</span>
{onRetry ? (
<button
type="button"
onClick={() => onRetry(message.id)}
className="inline-flex size-5 items-center justify-center rounded-full hover:bg-accent hover:text-foreground"
aria-label="Retry failed message"
title="Retry"
>
<RefreshCw className="size-3" />
</button>
) : null}
{onCancel ? (
<button
type="button"
onClick={() => onCancel(message.id)}
className="inline-flex size-5 items-center justify-center rounded-full hover:bg-accent hover:text-foreground"
aria-label="Discard failed message"
title="Discard"
>
<X className="size-3" />
</button>
) : null}
</span>
) : null}
</li>
)
}
function OutboundQueueStatusIcon({
status,
}: {
status: OutboundMessage['status']
}) {
if (status === 'sending') {
return (
<Loader2 className="size-3.5 shrink-0 animate-spin text-muted-foreground" />
)
}
if (status === 'failed') {
return <AlertTriangle className="size-3.5 shrink-0 text-destructive" />
}
return (
<span className="inline-block size-2 shrink-0 rounded-full bg-muted-foreground/40" />
)
}
function AttachmentStrip({
attachments,
onRemove,

View File

@@ -0,0 +1,243 @@
import { Quote, TriangleAlert } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
import type {
HarnessAdapterHealth,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
import {
firstNonBlankLine,
truncate,
} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
import { cn } from '@/lib/utils'
interface HomeAgentCardProps {
agent: HarnessAgent
adapter: HarnessAgentAdapter | 'unknown'
/** Per-adapter health snapshot, shared across cards rendering the
* same adapter. `null` when the /adapters response hasn't surfaced
* health yet (we treat that as healthy until proven otherwise). */
adapterHealth: HarnessAdapterHealth | null
/** Highlights the card with an accent ring; tells the user which
* agent the conversation input is bound to. */
active?: boolean
onClick: () => void
}
const PREVIEW_CHARS = 100
/**
* Grid-shaped card for the /home Recent agents section. Composition
* mirrors the rail's `AgentRowCard` but the layout is a vertical
* column sized for a 1/3-width tile rather than a full-width row.
*
* Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
* `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
* inline `Unavailable` chip pattern so the visual language is
* continuous between rail and grid.
*/
export const HomeAgentCard: FC<HomeAgentCardProps> = ({
agent,
adapter,
adapterHealth,
active,
onClick,
}) => {
const status = agent.status ?? 'unknown'
const lastUsedAt = agent.lastUsedAt ?? null
const isWorking = status === 'working'
const isAsleep = status === 'asleep'
const isError = status === 'error'
const hasActiveTurn = Boolean(agent.activeTurnId)
return (
<button
type="button"
onClick={onClick}
className={cn(
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
active && 'ring-1 ring-[var(--accent-orange)]/30',
isWorking
? 'border-[var(--accent-orange)]/40'
: isError
? 'border-destructive/30'
: 'border-border/60 hover:border-[var(--accent-orange)]/30',
)}
>
<div className="flex items-start gap-3">
<AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
<div className="min-w-0 flex-1">
<div className="flex items-center gap-1.5">
<span className="truncate font-semibold text-sm">
{displayName(agent)}
</span>
{isWorking && (
<Badge
variant="secondary"
className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
</div>
<SummaryLine
adapter={adapter}
modelId={agent.modelId ?? null}
reasoningEffort={agent.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
<LastMessage message={agent.lastUserMessage ?? null} />
<div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
<span>{statusFootnote(status, lastUsedAt)}</span>
{hasActiveTurn ? (
<ResumeChip />
) : isAsleep ? (
<Badge variant="outline" className="text-muted-foreground">
Asleep
</Badge>
) : isError ? (
<ErrorChip lastError={agent.lastError ?? null} />
) : null}
</div>
</button>
)
}
const SummaryLine: FC<{
adapter: HarnessAgentAdapter | 'unknown'
modelId: string | null
reasoningEffort: string | null
adapterHealth: HarnessAdapterHealth | null
}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
const parts = [adapterLabel(adapter)]
if (modelId) parts.push(modelId)
if (reasoningEffort) parts.push(reasoningEffort)
const unhealthy = adapterHealth?.healthy === false
return (
<div
className={cn(
'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
unhealthy && 'text-muted-foreground/70',
)}
>
<span className="truncate">{parts.join(' · ')}</span>
{unhealthy && (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge
variant="outline"
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
>
<TriangleAlert className="size-2.5" />
<span className="font-normal">Unavailable</span>
</Badge>
</HoverCardTrigger>
<HoverCardContent side="right" className="w-72 text-sm">
<div className="font-medium">
{adapterLabel(adapter)} CLI not available
</div>
<div className="mt-1 text-muted-foreground text-xs">
{adapterHealth?.reason ??
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
</div>
</HoverCardContent>
</HoverCard>
)}
</div>
)
}
const LastMessage: FC<{ message: string | null }> = ({ message }) => {
if (!message) {
return (
<p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
No messages yet start a chat
</p>
)
}
return (
<p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
<Quote
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
aria-hidden
/>
<span className="line-clamp-2">
{truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
</span>
</p>
)
}
const ResumeChip: FC = () => (
<span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
<span className="relative flex size-1.5">
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
<span className="relative inline-flex size-1.5 rounded-full bg-white" />
</span>
Resume
</span>
)
const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
if (!lastError) {
return <Badge variant="destructive">Attention</Badge>
}
return (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge variant="destructive" className="cursor-default">
Attention
</Badge>
</HoverCardTrigger>
<HoverCardContent
side="left"
className="max-w-xs whitespace-pre-wrap font-mono text-xs"
>
{lastError}
</HoverCardContent>
</HoverCard>
)
}
/**
* Footer left side: relative time on every state EXCEPT working,
* which shows `now` (the dot is already pulsing — restating it as
* "Working" would duplicate the pill in the title row).
*/
function statusFootnote(
status: AgentLiveness,
lastUsedAt: number | null,
): string {
if (status === 'working') return 'now'
return formatRelativeTime(lastUsedAt)
}
const UUID_PATTERN =
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
const OC_UUID_PATTERN =
/^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
function displayName(agent: HarnessAgent): string {
const name = agent.name?.trim()
const id = agent.id
if (!name || name === id) {
if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
if (UUID_PATTERN.test(id)) return id.slice(0, 8)
return id
}
return name
}

View File

@@ -0,0 +1,94 @@
import { ListPlus, X } from 'lucide-react'
import type { FC } from 'react'
import {
Queue,
QueueItem,
QueueItemAction,
QueueItemActions,
QueueItemAttachment,
QueueItemContent,
QueueItemFile,
QueueItemImage,
QueueList,
QueueSection,
QueueSectionContent,
QueueSectionLabel,
QueueSectionTrigger,
} from '@/components/ai-elements/queue'
import type {
HarnessQueuedMessage,
HarnessQueuedMessageAttachment,
} from '@/entrypoints/app/agents/agent-harness-types'
import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
interface QueuePanelProps {
queue: HarnessQueuedMessage[]
onRemove: (messageId: string) => void
}
/**
* Renders the agent's pending message queue using the shared AI
* Elements `Queue` primitives. Caller is expected to gate render on
* `queue.length > 0` — when empty, this returns null so the panel
* disappears cleanly between turns.
*/
export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
if (queue.length === 0) return null
return (
<Queue>
<QueueSection>
<QueueSectionTrigger>
<QueueSectionLabel
count={queue.length}
label={queue.length === 1 ? 'queued message' : 'queued messages'}
icon={<ListPlus className="size-3.5" />}
/>
</QueueSectionTrigger>
<QueueSectionContent>
<QueueList>
{queue.map((entry) => (
<QueueItem key={entry.id}>
<div className="flex items-center gap-2">
<QueueItemContent>
{firstNonBlankLine(entry.message)}
</QueueItemContent>
<QueueItemActions>
<QueueItemAction
aria-label="Remove from queue"
onClick={() => onRemove(entry.id)}
>
<X className="size-3" />
</QueueItemAction>
</QueueItemActions>
</div>
{entry.attachments && entry.attachments.length > 0 ? (
<QueueItemAttachment>
{entry.attachments.map((attachment, idx) =>
renderAttachment(entry.id, attachment, idx),
)}
</QueueItemAttachment>
) : null}
</QueueItem>
))}
</QueueList>
</QueueSectionContent>
</QueueSection>
</Queue>
)
}
function renderAttachment(
messageId: string,
attachment: HarnessQueuedMessageAttachment,
idx: number,
) {
if (attachment.mediaType.startsWith('image/')) {
const src = `data:${attachment.mediaType};base64,${attachment.data}`
return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
}
return (
<QueueItemFile key={`${messageId}-${idx}`}>
{attachment.mediaType}
</QueueItemFile>
)
}

View File

@@ -26,7 +26,15 @@ export const AgentCommandLayout: FC = () => {
const { agents: harnessAgents, loading: harnessAgentsLoading } =
useHarnessAgents()
const visibleOpenClawAgents = openClawEnabled ? openClawAgents : []
const agents = [...visibleOpenClawAgents, ...harnessAgents]
// Dual-created OpenClaw agents appear in both `/claw/agents` (gateway
// record) and `/agents` (harness record) under the same id. Prefer the
// harness entry so the chat panel can route through the harness path
// and the rail doesn't show duplicates.
const harnessAgentIds = new Set(harnessAgents.map((entry) => entry.agentId))
const dedupedOpenClawAgents = visibleOpenClawAgents.filter(
(entry) => !harnessAgentIds.has(entry.agentId),
)
const agents = [...dedupedOpenClawAgents, ...harnessAgents]
return (
<Outlet

View File

@@ -23,9 +23,9 @@ export interface BrowserOSChatHistoryToolCall {
toolName: string
label: string
subject?: string
status: 'completed' | 'failed'
input?: Record<string, unknown>
output?: string
status: 'pending' | 'running' | 'completed' | 'failed'
input?: unknown
output?: unknown
error?: string
durationMs?: number
}

View File

@@ -0,0 +1,71 @@
import { buildToolLabel } from '../../../lib/tool-labels'
import type { HarnessAgentHistoryPage } from '../agents/agent-harness-types'
import type {
AgentHistoryPageResponse,
BrowserOSChatHistoryItem,
BrowserOSChatHistoryToolCall,
} from './claw-chat-types'
export function mapHarnessHistoryPage(
page: HarnessAgentHistoryPage,
): AgentHistoryPageResponse {
const items: BrowserOSChatHistoryItem[] = page.items.map((item, index) => {
const toolCalls = item.toolCalls?.map(
(tool): BrowserOSChatHistoryToolCall => {
const input = asRecord(tool.input)
const { label, subject } = buildToolLabel(tool.toolName, input)
return {
toolName: tool.toolName,
label,
status: tool.status,
...(tool.toolCallId ? { toolCallId: tool.toolCallId } : {}),
...(subject ? { subject } : {}),
...(tool.input !== undefined ? { input: tool.input } : {}),
...(tool.output !== undefined ? { output: tool.output } : {}),
...(tool.error ? { error: tool.error } : {}),
...(tool.durationMs != null ? { durationMs: tool.durationMs } : {}),
}
},
)
return {
id: item.id,
role: item.role,
text: item.text,
timestamp: item.createdAt,
messageSeq: index + 1,
sessionKey: 'main',
source: 'user-chat',
...(item.reasoning ? { reasoning: item.reasoning } : {}),
...(toolCalls && toolCalls.length > 0 ? { toolCalls } : {}),
}
})
const updatedAt =
page.items.length > 0
? Math.max(...page.items.map((item) => item.createdAt))
: Date.now()
return {
agentId: page.agentId,
sessionKey: 'main',
session: {
key: 'main',
updatedAt,
sessionId: 'main',
agentId: page.agentId,
kind: 'agent-harness',
source: 'user-chat',
},
items,
page: {
hasMore: false,
limit: items.length,
},
}
}
function asRecord(value: unknown): Record<string, unknown> | undefined {
return value && typeof value === 'object' && !Array.isArray(value)
? (value as Record<string, unknown>)
: undefined
}

View File

@@ -0,0 +1,69 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { orderHomeAgents } from './home-agent-card.helpers'
function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
return {
id: overrides.id ?? 'agent-x',
name: overrides.name ?? overrides.id ?? 'agent-x',
adapter: overrides.adapter ?? 'codex',
permissionMode: 'approve-all',
sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
createdAt: 1000,
updatedAt: 1000,
...overrides,
}
}
describe('orderHomeAgents', () => {
it('places active-turn agents before everyone else', () => {
const sorted = orderHomeAgents([
agent({ id: 'a', lastUsedAt: 5000 }),
agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
agent({ id: 'c', lastUsedAt: 7000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
})
it('orders non-active agents by lastUsedAt desc', () => {
const sorted = orderHomeAgents([
agent({ id: 'old', lastUsedAt: 1000 }),
agent({ id: 'new', lastUsedAt: 9000 }),
agent({ id: 'mid', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
})
it('puts the gateway `main` seed agent above other never-used agents', () => {
const sorted = orderHomeAgents([
agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
agent({ id: 'main', lastUsedAt: null }),
agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
])
expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
})
it('sends never-used agents to the bottom even when `main` is among them', () => {
const sorted = orderHomeAgents([
agent({ id: 'main', lastUsedAt: null }),
agent({ id: 'used', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
})
it('does NOT sort by pinned — pinned agents are treated like any other', () => {
const sorted = orderHomeAgents([
agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
])
expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
})
it('falls back to id-stable ordering when lastUsedAt ties', () => {
const sorted = orderHomeAgents([
agent({ id: 'b', lastUsedAt: 5000 }),
agent({ id: 'a', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
})
})

View File

@@ -0,0 +1,42 @@
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
/**
* Order for the /home Recent agents grid.
*
* 1. Active turn first — agents mid-turn float to the top so the
* Resume affordance is the first thing the user sees on /home.
* 2. The protected gateway-side `main` agent stays pinned-to-top in
* the never-used group on a fresh install (mirrors the rail).
* 3. Recency (`lastUsedAt` desc).
* 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
* every 5-second poll.
*
* Pin is NOT a sort key. The home grid is action-oriented and trusts
* recency + active-turn to surface the right agent; pinning is an
* organisation tool that lives on the rail at /agents.
*/
export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
return [...agents].sort((a, b) => {
const aActive = a.activeTurnId != null
const bActive = b.activeTurnId != null
if (aActive !== bActive) return aActive ? -1 : 1
// Recency wins outright. Never-used agents (`lastUsedAt == null`)
// both fall to the same `-Infinity` bucket and the seed/id rules
// below decide their order — but a used agent always beats any
// never-used agent regardless of id.
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
// Inside the never-used (or exact-tie) group: pin the gateway
// `main` seed to the top of the group on a fresh install, then
// fall back to id-stable order so the grid doesn't reshuffle on
// every poll.
const aSeed = a.id === 'main' && a.lastUsedAt == null
const bSeed = b.id === 'main' && b.lastUsedAt == null
if (aSeed !== bSeed) return aSeed ? -1 : 1
return a.id.localeCompare(b.id)
})
}

View File

@@ -1,53 +0,0 @@
import {
type AgentEntry,
getModelDisplayName,
type OpenClawStatus,
} from '@/entrypoints/app/agents/useOpenClaw'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import type { AgentOverview } from './useAgentDashboard'
function resolveAgentStatus(
gatewayStatus: OpenClawStatus['status'] | undefined,
liveStatus: AgentOverview['status'] | undefined,
): AgentCardData['status'] {
// Gateway-level errors take precedence
if (gatewayStatus === 'error') return 'error'
if (gatewayStatus === 'starting') return 'working'
// Per-agent live status from the WS observer
if (liveStatus === 'working') return 'working'
if (liveStatus === 'error') return 'error'
return 'idle'
}
/**
* Build agent card display data by merging the raw agent entries from
* the gateway with enriched overview data from the dashboard API.
*
* Pure function — no hooks, no IndexedDB, no async.
*/
export function buildAgentCardData(
agents: AgentEntry[],
status: OpenClawStatus['status'] | undefined,
dashboard: AgentOverview[] | undefined,
): AgentCardData[] {
return agents.map((agent) => {
const overview = dashboard?.find((d) => d.agentId === agent.agentId)
return {
agentId: agent.agentId,
name: agent.name,
model: getModelDisplayName(agent.model),
status:
agent.source === 'agent-harness'
? 'idle'
: resolveAgentStatus(status, overview?.status),
lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
activitySummary: overview?.activitySummary ?? undefined,
currentTool: overview?.currentTool ?? undefined,
costUsd: overview?.totalCostUsd ?? undefined,
}
})
}

View File

@@ -1,13 +1,12 @@
import { useEffect, useRef, useState } from 'react'
import {
type AgentHarnessStreamEvent,
attachToHarnessTurn,
cancelHarnessTurn,
chatWithHarnessAgent,
fetchActiveHarnessTurn,
} from '@/entrypoints/app/agents/useAgents'
import {
chatWithAgent,
type OpenClawChatHistoryMessage,
type OpenClawStreamEvent,
} from '@/entrypoints/app/agents/useOpenClaw'
import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpenClaw'
import type {
AgentConversationTurn,
AssistantPart,
@@ -29,11 +28,23 @@ export interface SendInput {
}
interface UseAgentConversationOptions {
runtime?: 'openclaw' | 'agent-harness'
// The hook always speaks to the harness chat path now; the OpenClaw
// legacy /claw/agents/:id/chat surface was removed in Step 12. The
// option remains for forward-compatibility.
runtime?: 'agent-harness'
sessionKey?: string | null
history?: OpenClawChatHistoryMessage[]
onComplete?: () => void
onSessionKeyChange?: (sessionKey: string) => void
/**
* Server-side active turn id, surfaced via the listing query. When
* this changes from null/<id> to a different non-null id while we
* aren't already streaming (e.g. the server just popped a queued
* message and started a new turn), the hook reattaches via
* /chat/active so the chat panel picks up the live stream without
* waiting for a remount.
*/
activeTurnId?: string | null
}
export function useAgentConversation(
@@ -49,6 +60,11 @@ export function useAgentConversation(
const streamAbortRef = useRef<AbortController | null>(null)
const onCompleteRef = useRef(options.onComplete)
const onSessionKeyChangeRef = useRef(options.onSessionKeyChange)
// Per-turn resume bookkeeping. `turnId` is captured from the response
// header; `lastSeq` advances with every SSE event so a reconnect can
// resume via Last-Event-ID.
const turnIdRef = useRef<string | null>(null)
const lastSeqRef = useRef<number | null>(null)
useEffect(() => {
sessionKeyRef.current = options.sessionKey ?? ''
@@ -72,6 +88,12 @@ export function useAgentConversation(
}
}, [])
// Indirection for the resume effect below: lets it call the latest
// event handler without re-subscribing on every render.
const processEventRef = useRef<(event: AgentHarnessStreamEvent) => void>(
() => {},
)
const updateCurrentTurnParts = (
updater: (parts: AssistantPart[]) => AssistantPart[],
) => {
@@ -82,85 +104,6 @@ export function useAgentConversation(
})
}
const processStreamEvent = (event: OpenClawStreamEvent) => {
switch (event.type) {
case 'text-delta': {
appendTextDelta((event.data.text as string) ?? '')
break
}
case 'thinking': {
appendThinkingDelta((event.data.text as string) ?? '')
break
}
case 'tool-start': {
const rawName = (event.data.toolName as string) ?? 'unknown'
const args = event.data.args as Record<string, unknown> | undefined
const { label, subject } = buildToolLabel(rawName, args)
const tool = {
id: (event.data.toolCallId as string) ?? crypto.randomUUID(),
name: rawName,
label,
subject,
status: 'running' as const,
}
updateCurrentTurnParts((parts) => {
const last = parts[parts.length - 1]
if (last?.kind === 'tool-batch') {
return [
...parts.slice(0, -1),
{ ...last, tools: [...last.tools, tool] },
]
}
return [...parts, { kind: 'tool-batch', tools: [tool] }]
})
break
}
case 'tool-end': {
const toolId = event.data.toolCallId as string
const toolStatus: 'completed' | 'error' =
(event.data.status as string) === 'error' ? 'error' : 'completed'
const durationMs = event.data.durationMs as number | undefined
updateCurrentTurnParts((parts) => {
for (let i = parts.length - 1; i >= 0; i--) {
const part = parts[i]
if (
part.kind === 'tool-batch' &&
part.tools.some((t) => t.id === toolId)
) {
const updatedTools = part.tools.map((t) =>
t.id === toolId ? { ...t, status: toolStatus, durationMs } : t,
)
return [
...parts.slice(0, i),
{ ...part, tools: updatedTools },
...parts.slice(i + 1),
]
}
}
return parts
})
break
}
case 'done': {
markCurrentTurnDone()
break
}
case 'error': {
const msg =
(event.data.message as string) ??
(event.data.error as string) ??
'Unknown error'
appendErrorText(msg)
break
}
}
}
const appendTextDelta = (delta: string) => {
textAccRef.current += delta
const text = textAccRef.current
@@ -275,6 +218,105 @@ export function useAgentConversation(
break
}
}
processEventRef.current = processAgentHarnessStreamEvent
const activeTurnIdDep = options.activeTurnId ?? null
// On mount, on agent change, and whenever the listing reports a
// *new* active turn id, check whether the server has an in-flight
// turn for this agent and reattach to it. This catches three
// cases at once: the chat resilience flow (tab close/reopen),
// navigation between agents, AND queue drain (the server starts a
// new turn from a queued message → activeTurnId flips → attach).
useEffect(() => {
let cancelled = false
const abortController = new AbortController()
// Reference the dep inside the body so biome's exhaustive-deps
// rule sees it consumed; the value is just an "any non-null
// active turn id" trigger — the actual id we attach to comes
// from the fresh fetchActiveHarnessTurn call below.
void activeTurnIdDep
const attemptResume = async () => {
// Track whether *we* started a stream in this run. When the
// early-return paths fire (no active turn, or a `send()` /
// earlier resume already owns `streamAbortRef`), the finally
// block must NOT touch streaming/turnIdRef/lastSeqRef —
// otherwise we clobber the in-flight stream's state and the
// Stop button drops out mid-turn while events keep arriving.
let weStartedStream = false
try {
const active = await fetchActiveHarnessTurn(agentId)
if (cancelled || !active || active.status !== 'running') return
if (streamAbortRef.current) return // someone else already owns the stream
// Stage a placeholder turn so the streamed events have a row
// to render into. The server now persists the kicking-off
// prompt on the active turn, so we render it as the user
// bubble immediately — no empty-bubble flicker when a queued
// message starts running.
setTurns((prev) => [
...prev,
{
id: crypto.randomUUID(),
userText: active.prompt ?? '',
parts: [],
done: false,
timestamp: active.startedAt,
},
])
textAccRef.current = ''
thinkAccRef.current = ''
turnIdRef.current = active.turnId
lastSeqRef.current = null
streamAbortRef.current = abortController
setStreaming(true)
weStartedStream = true
const response = await attachToHarnessTurn(agentId, {
turnId: active.turnId,
signal: abortController.signal,
})
if (!response.ok) return
await consumeSSEStream<AgentHarnessStreamEvent>(
response,
(event, meta) => {
if (typeof meta.seq === 'number') lastSeqRef.current = meta.seq
processEventRef.current(event)
},
abortController.signal,
)
} catch {
// Resume is best-effort; transient errors fall back to the
// user starting a new turn manually.
} finally {
// Always release `streamAbortRef` if we owned it — even when
// the effect was cancelled mid-stream (a listing poll
// captured the next queue-drain turn id, for example). If we
// don't, the next effect run hits `if (streamAbortRef.current)
// return` against our now-aborted controller and never
// reattaches, leaving `streaming === true` with no live stream.
if (weStartedStream && streamAbortRef.current === abortController) {
streamAbortRef.current = null
}
// The other state (streaming flag, turn id, lastSeq) is the
// *current run's* lifecycle: only reset it on a clean exit.
// When `cancelled` is true the next run will set these
// itself, so resetting here would only cause a brief flicker.
if (!cancelled && weStartedStream) {
turnIdRef.current = null
lastSeqRef.current = null
setStreaming(false)
}
}
}
void attemptResume()
return () => {
cancelled = true
abortController.abort()
}
}, [agentId, activeTurnIdDep])
const send = async (input: string | SendInput) => {
const normalized: SendInput =
@@ -304,17 +346,25 @@ export function useAgentConversation(
streamAbortRef.current = abortController
try {
const response =
options.runtime === 'agent-harness'
? await chatWithHarnessAgent(agentId, trimmed, abortController.signal)
: await chatWithAgent(
agentId,
trimmed,
sessionKeyRef.current || undefined,
historyRef.current,
abortController.signal,
attachments,
)
let response = await chatWithHarnessAgent(
agentId,
trimmed,
abortController.signal,
attachments,
)
// 409 means the server already has an active turn for this
// agent (e.g. a previous tab kicked one off and we're a fresh
// mount that missed the resume window). Attach to it instead of
// double-sending.
if (response.status === 409) {
const body = (await response.json()) as { turnId?: string }
if (body.turnId) {
response = await attachToHarnessTurn(agentId, {
turnId: body.turnId,
signal: abortController.signal,
})
}
}
const responseSessionKey =
response.headers.get('X-Session-Key') ??
response.headers.get('X-Session-Id')
@@ -322,6 +372,11 @@ export function useAgentConversation(
sessionKeyRef.current = responseSessionKey
onSessionKeyChangeRef.current?.(responseSessionKey)
}
const responseTurnId = response.headers.get('X-Turn-Id')
if (responseTurnId) {
turnIdRef.current = responseTurnId
lastSeqRef.current = null
}
if (!response.ok) {
const err = await response.text()
updateCurrentTurnParts((parts) => [
@@ -330,19 +385,14 @@ export function useAgentConversation(
])
return
}
if (options.runtime === 'agent-harness') {
await consumeSSEStream<AgentHarnessStreamEvent>(
response,
processAgentHarnessStreamEvent,
abortController.signal,
)
} else {
await consumeSSEStream<OpenClawStreamEvent>(
response,
processStreamEvent,
abortController.signal,
)
}
await consumeSSEStream<AgentHarnessStreamEvent>(
response,
(event, meta) => {
if (typeof meta.seq === 'number') lastSeqRef.current = meta.seq
processAgentHarnessStreamEvent(event)
},
abortController.signal,
)
} catch (err) {
if (abortController.signal.aborted) return
const msg = err instanceof Error ? err.message : String(err)
@@ -354,14 +404,35 @@ export function useAgentConversation(
if (streamAbortRef.current === abortController) {
streamAbortRef.current = null
}
turnIdRef.current = null
lastSeqRef.current = null
onCompleteRef.current?.()
setStreaming(false)
}
}
const resetConversation = () => {
/**
* Stop button. The fetch abort only detaches *this* SSE subscriber
* now — the underlying turn would otherwise keep running on the
* server. So we explicitly cancel via the new endpoint, then unwind
* the local stream.
*/
const stop = async () => {
const turnId = turnIdRef.current ?? undefined
streamAbortRef.current?.abort()
streamAbortRef.current = null
try {
await cancelHarnessTurn(agentId, {
turnId,
reason: 'user pressed stop',
})
} catch {
// Best-effort — UI already aborted.
}
}
const resetConversation = () => {
void stop()
setTurns([])
setStreaming(false)
}
@@ -371,6 +442,7 @@ export function useAgentConversation(
streaming,
sessionKey: sessionKeyRef.current,
send,
stop,
resetConversation,
}
}

View File

@@ -1,95 +0,0 @@
import { useQuery, useQueryClient } from '@tanstack/react-query'
import { useEffect } from 'react'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
export interface AgentOverview {
agentId: string
status: 'working' | 'idle' | 'error' | 'unknown'
latestMessage: string | null
latestMessageAt: number | null
activitySummary: string | null
currentTool: string | null
totalCostUsd: number
sessionCount: number
}
export interface DashboardResponse {
agents: AgentOverview[]
summary: {
totalAgents: number
totalCostUsd: number
}
}
interface StatusEvent {
agentId: string
status: AgentOverview['status']
currentTool: string | null
error: string | null
timestamp: number
}
const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
export function useAgentDashboard(enabled: boolean) {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
const ready = enabled && Boolean(baseUrl) && !urlLoading
// Initial data load + periodic refresh as fallback
const query = useQuery<DashboardResponse>({
queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
queryFn: async () => {
const url = new URL('/claw/dashboard', baseUrl as string)
const response = await fetch(url.toString())
if (!response.ok) throw new Error('Failed to fetch dashboard')
return response.json()
},
enabled: ready,
})
// SSE subscription for real-time status patches
useEffect(() => {
if (!ready || !baseUrl) return
const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
const eventSource = new EventSource(streamUrl.toString())
eventSource.addEventListener('snapshot', (event) => {
try {
const dashboard = JSON.parse(event.data) as DashboardResponse
queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
} catch {}
})
eventSource.addEventListener('status', (event) => {
try {
const status = JSON.parse(event.data) as StatusEvent
queryClient.setQueryData<DashboardResponse>(
[...DASHBOARD_QUERY_KEY, baseUrl],
(prev) => {
if (!prev) return prev
return {
...prev,
agents: prev.agents.map((agent) =>
agent.agentId === status.agentId
? {
...agent,
status: status.status,
currentTool: status.currentTool,
}
: agent,
),
}
},
)
} catch {}
})
return () => {
eventSource.close()
}
}, [ready, baseUrl, queryClient])
return query
}

View File

@@ -1,71 +0,0 @@
import { useInfiniteQuery } from '@tanstack/react-query'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type { AgentHistoryPageResponse } from './claw-chat-types'
const HISTORY_QUERY_KEY = 'claw-agent-history'
async function fetchClawJson<T>(url: string): Promise<T> {
const response = await fetch(url)
if (!response.ok) {
let message = `Request failed with status ${response.status}`
try {
const body = (await response.json()) as { error?: string }
if (body.error) message = body.error
} catch {}
throw new Error(message)
}
return response.json() as Promise<T>
}
function buildClawUrl(baseUrl: string, path: string): URL {
return new URL(`/claw${path}`, baseUrl)
}
export function useClawChatHistory({
agentId,
sessionKey,
enabled = true,
limit = 50,
}: {
agentId: string
// null lets the server resolve the most recent user-chat session for the
// agent — avoids an extra /session round-trip and the race that came with it.
sessionKey: string | null
enabled?: boolean
limit?: number
}) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()
const query = useInfiniteQuery<AgentHistoryPageResponse, Error>({
queryKey: [HISTORY_QUERY_KEY, baseUrl, agentId, sessionKey],
initialPageParam: undefined as string | undefined,
queryFn: async ({ pageParam }) => {
const url = buildClawUrl(baseUrl as string, `/agents/${agentId}/history`)
url.searchParams.set('limit', String(limit))
if (sessionKey) {
url.searchParams.set('sessionKey', sessionKey)
}
if (typeof pageParam === 'string' && pageParam) {
url.searchParams.set('cursor', pageParam)
}
return fetchClawJson<AgentHistoryPageResponse>(url.toString())
},
getNextPageParam: (lastPage) =>
lastPage.page.hasMore ? lastPage.page.cursor : undefined,
enabled: enabled && Boolean(baseUrl) && !urlLoading && Boolean(agentId),
})
return {
...query,
error: query.error ?? urlError,
isLoading: query.isLoading || urlLoading,
}
}

View File

@@ -0,0 +1,55 @@
import { describe, expect, it } from 'bun:test'
import { mapHarnessHistoryPage } from './harness-history-mapper'
describe('mapHarnessHistoryPage', () => {
it('maps rich harness history into chat history items', () => {
const page = mapHarnessHistoryPage({
agentId: 'agent-1',
sessionId: 'main',
items: [
{
id: 'agent:agent-1:main:1',
agentId: 'agent-1',
sessionId: 'main',
role: 'assistant',
text: 'Done.',
createdAt: 1000,
reasoning: { text: 'checking state' },
toolCalls: [
{
toolCallId: 'tool-1',
toolName: 'read_file',
status: 'completed',
input: { path: 'src/index.ts' },
output: 'file contents',
},
],
},
],
})
expect(page.items).toEqual([
{
id: 'agent:agent-1:main:1',
role: 'assistant',
text: 'Done.',
timestamp: 1000,
messageSeq: 1,
sessionKey: 'main',
source: 'user-chat',
reasoning: { text: 'checking state' },
toolCalls: [
{
toolCallId: 'tool-1',
toolName: 'read_file',
label: 'Read file',
subject: 'index.ts',
status: 'completed',
input: { path: 'src/index.ts' },
output: 'file contents',
},
],
},
])
})
})

View File

@@ -1,11 +1,8 @@
import { useQuery } from '@tanstack/react-query'
import type { HarnessAgentHistoryPage } from '@/entrypoints/app/agents/agent-harness-types'
import { fetchHarnessAgentHistory } from '@/entrypoints/app/agents/useAgents'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type {
AgentHistoryPageResponse,
BrowserOSChatHistoryItem,
} from './claw-chat-types'
import type { AgentHistoryPageResponse } from './claw-chat-types'
import { mapHarnessHistoryPage } from './harness-history-mapper'
const HISTORY_QUERY_KEY = 'harness-agent-history'
@@ -30,39 +27,3 @@ export function useHarnessChatHistory(agentId: string, enabled = true) {
isLoading: query.isLoading || urlLoading,
}
}
function mapHarnessHistoryPage(
page: HarnessAgentHistoryPage,
): AgentHistoryPageResponse {
const items: BrowserOSChatHistoryItem[] = page.items.map((item, index) => ({
id: item.id,
role: item.role,
text: item.text,
timestamp: item.createdAt,
messageSeq: index + 1,
sessionKey: 'main',
source: 'user-chat',
}))
const updatedAt =
page.items.length > 0
? Math.max(...page.items.map((item) => item.createdAt))
: Date.now()
return {
agentId: page.agentId,
sessionKey: 'main',
session: {
key: 'main',
updatedAt,
sessionId: 'main',
agentId: page.agentId,
kind: 'agent-harness',
source: 'user-chat',
},
items,
page: {
hasMore: false,
limit: items.length,
},
}
}

View File

@@ -1,271 +0,0 @@
import { useCallback, useEffect, useRef, useState } from 'react'
import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpenClaw'
import type { UserAttachmentPreview } from '@/lib/agent-conversations/types'
import type { ServerAttachmentPayload } from '@/lib/attachments'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
export type OutboundMessageStatus = 'queued' | 'sending' | 'failed'
export interface OutboundMessage {
id: string
text: string
attachments: ServerAttachmentPayload[]
attachmentPreviews: UserAttachmentPreview[]
status: OutboundMessageStatus
error?: string
createdAt: number
}
export interface OutboundQueueEnqueueInput {
text: string
attachments?: ServerAttachmentPayload[]
attachmentPreviews?: UserAttachmentPreview[]
history?: OpenClawChatHistoryMessage[]
}
export interface OutboundQueueApi {
queue: OutboundMessage[]
enqueue(input: OutboundQueueEnqueueInput): void
cancel(id: string): void
retry(id: string): void
}
interface UseOutboundQueueOptions {
agentId: string | null | undefined
sessionKey?: string | null
enabled?: boolean
}
interface ServerQueuedItem {
id: string
status: 'queued' | 'dispatching' | 'failed'
message: string
attachmentsPreview: Array<{
kind: 'image' | 'file'
mediaType: string
name?: string
}>
error?: string
createdAt: number
}
function makeId(): string {
if (typeof crypto !== 'undefined' && crypto.randomUUID) {
return crypto.randomUUID()
}
return `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`
}
/**
* Server-backed outbound message queue. The browser is purely a
* projection of server state — closing the tab is safe because the queue
* keeps draining server-side via the OutboundQueueService.
*
* Single id-keyed list: the client generates the queue id and hands it
* to the server in the POST body, so the optimistic row and the SSE
* snapshot reconcile on the same key from frame zero — there is no
* window in which the message renders twice.
*/
export function useOutboundQueue(
options: UseOutboundQueueOptions,
): OutboundQueueApi {
const { agentId, enabled = true, sessionKey } = options
const { baseUrl } = useAgentServerUrl()
const sessionKeyRef = useRef<string | null | undefined>(sessionKey)
sessionKeyRef.current = sessionKey
const [items, setItems] = useState<OutboundMessage[]>([])
// Track which ids the server has confirmed seeing in any SSE snapshot.
// We use this to know whether a missing-from-snapshot id is "drained
// by the server" (drop it) or "still in flight client-side" (keep
// showing the optimistic row).
const everSeenByServerRef = useRef<Set<string>>(new Set())
// Local-only attachment previews, keyed by queue id. Data URLs never
// leave the browser — the SSE feed only carries metadata, so we hold
// them here so the chip strip keeps rendering after server takeover.
const previewMapRef = useRef<Map<string, UserAttachmentPreview[]>>(new Map())
useEffect(() => {
if (!enabled || !baseUrl || !agentId) {
setItems([])
everSeenByServerRef.current = new Set()
previewMapRef.current = new Map()
return
}
let cancelled = false
const url = `${baseUrl}/claw/agents/${encodeURIComponent(agentId)}/queue/stream`
const source = new EventSource(url)
source.onmessage = (event) => {
if (cancelled) return
try {
const parsed = JSON.parse(event.data) as { items: ServerQueuedItem[] }
const snapshotIds = new Set(parsed.items.map((item) => item.id))
for (const id of snapshotIds) everSeenByServerRef.current.add(id)
setItems((prev) => {
const next: OutboundMessage[] = parsed.items.map((item) => ({
id: item.id,
text: item.message,
attachments: [],
attachmentPreviews: previewMapRef.current.get(item.id) ?? [],
status: serverStatusToClient(item.status),
error: item.error,
createdAt: item.createdAt,
}))
// Carry forward any optimistic / failed entries the server
// doesn't know about yet (POST in flight) or has finished
// dispatching but the client wants to keep visible (failed).
const carried = prev.filter((local) => {
if (snapshotIds.has(local.id)) return false
if (everSeenByServerRef.current.has(local.id)) {
// Server saw it before and it's gone now — drained.
previewMapRef.current.delete(local.id)
return false
}
return local.status !== 'failed' || Boolean(local.error)
})
return [...carried, ...next]
})
} catch {
// Malformed event — ignore; next snapshot will recover.
}
}
source.onerror = () => {
// Auto-reconnects; nothing to do here.
}
return () => {
cancelled = true
source.close()
}
}, [baseUrl, agentId, enabled])
const enqueue = useCallback(
(input: OutboundQueueEnqueueInput) => {
if (!enabled || !baseUrl || !agentId) return
const trimmed = input.text.trim()
const attachments = input.attachments ?? []
if (!trimmed && attachments.length === 0) return
const id = makeId()
const previews = input.attachmentPreviews ?? []
previewMapRef.current.set(id, previews)
setItems((prev) => [
...prev,
{
id,
text: trimmed,
attachments,
attachmentPreviews: previews,
status: 'queued',
createdAt: Date.now(),
},
])
void (async () => {
try {
const response = await fetch(
`${baseUrl}/claw/agents/${encodeURIComponent(agentId)}/queue`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
id,
message: trimmed,
attachments: attachments.length > 0 ? attachments : undefined,
sessionKey: sessionKeyRef.current ?? undefined,
history: input.history,
}),
},
)
if (!response.ok) {
const text = await response.text().catch(() => '')
previewMapRef.current.delete(id)
setItems((prev) =>
prev.map((item) =>
item.id === id
? {
...item,
status: 'failed',
error:
text || `Failed to enqueue (status ${response.status})`,
}
: item,
),
)
}
} catch (err) {
// Only mark as failed if the SSE snapshot hasn't already
// taken ownership of the entry (i.e. the request actually
// reached the server).
if (everSeenByServerRef.current.has(id)) return
previewMapRef.current.delete(id)
setItems((prev) =>
prev.map((item) =>
item.id === id
? {
...item,
status: 'failed',
error:
err instanceof Error
? err.message
: 'Failed to enqueue message',
}
: item,
),
)
}
})()
},
[baseUrl, agentId, enabled],
)
const cancel = useCallback(
(id: string) => {
// If the server has never seen this id, just drop it locally.
if (!everSeenByServerRef.current.has(id)) {
previewMapRef.current.delete(id)
setItems((prev) => prev.filter((item) => item.id !== id))
return
}
if (!enabled || !baseUrl || !agentId) return
void fetch(
`${baseUrl}/claw/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(id)}`,
{ method: 'DELETE' },
).catch(() => {})
},
[baseUrl, agentId, enabled],
)
const retry = useCallback(
(id: string) => {
if (!everSeenByServerRef.current.has(id)) {
// Optimistic-only entry, never made it to the server. Reset
// status so the user can press Send again.
setItems((prev) =>
prev.map((item) =>
item.id === id
? { ...item, status: 'queued', error: undefined }
: item,
),
)
return
}
if (!enabled || !baseUrl || !agentId) return
void fetch(
`${baseUrl}/claw/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(id)}/retry`,
{ method: 'POST' },
).catch(() => {})
},
[baseUrl, agentId, enabled],
)
return { queue: items, enqueue, cancel, retry }
}
function serverStatusToClient(
status: ServerQueuedItem['status'],
): OutboundMessageStatus {
if (status === 'dispatching') return 'sending'
if (status === 'failed') return 'failed'
return 'queued'
}

View File

@@ -0,0 +1,42 @@
import { Bot, Cpu, Sparkles } from 'lucide-react'
import type { FC } from 'react'
import type { HarnessAgentAdapter } from './agent-harness-types'
/**
* Single icon component for any adapter the agent rail can render.
* Falls back to a generic bot when the adapter is unknown so future
* adapters land without a code change at the call site.
*/
interface AdapterIconProps {
adapter: HarnessAgentAdapter | 'unknown'
className?: string
}
export const AdapterIcon: FC<AdapterIconProps> = ({ adapter, className }) => {
switch (adapter) {
case 'claude':
// Claude Code — text-based agent, sparkles to evoke the "AI assistant" feel.
return <Sparkles className={className} aria-label="Claude Code" />
case 'codex':
// Codex — code-leaning, CPU mark.
return <Cpu className={className} aria-label="Codex" />
case 'openclaw':
// OpenClaw — bot/automation framing.
return <Bot className={className} aria-label="OpenClaw" />
default:
return <Bot className={className} aria-label="Agent" />
}
}
export function adapterLabel(adapter: HarnessAgentAdapter | 'unknown'): string {
switch (adapter) {
case 'claude':
return 'Claude Code'
case 'codex':
return 'Codex'
case 'openclaw':
return 'OpenClaw'
default:
return 'Agent'
}
}

View File

@@ -1,117 +1,176 @@
import { Bot, Cpu, Loader2, MessageSquare, Plus, Trash2 } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card'
import { Loader2 } from 'lucide-react'
import { type FC, useMemo } from 'react'
import { AgentRowCard } from './AgentRowCard'
import { AgentsEmptyState } from './AgentsEmptyState'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from './agent-harness-types'
import type {
AgentAdapterHealth,
AgentRowData,
} from './agent-row/agent-row.types'
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'
interface AgentListProps {
agents: AgentListItem[]
/** Optional per-agent activity metadata, keyed by `agentId`. */
activity?: Record<
string,
{ status: AgentLiveness; lastUsedAt: number | null }
>
/** Lookup table from harness id → enriched agent record. */
harnessAgentLookup?: Map<string, HarnessAgent>
/** Adapter catalog (carries per-adapter health). */
adapters: HarnessAdapterDescriptor[]
loading: boolean
deletingAgentKey: string | null
onChatAgent: (agent: AgentListItem) => void
onCreateAgent: () => void
onDeleteAgent: (agent: AgentListItem) => void
onPinToggle: (agent: AgentListItem, next: boolean) => void
}
export const AgentList: FC<AgentListProps> = ({
agents,
activity,
harnessAgentLookup,
adapters,
loading,
deletingAgentKey,
onChatAgent,
onCreateAgent,
onDeleteAgent,
onPinToggle,
}) => {
const adapterHealth = useMemo(() => {
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
for (const adapter of adapters) {
if (adapter.health) {
map.set(adapter.id, {
healthy: adapter.health.healthy,
reason: adapter.health.reason,
})
}
}
return map
}, [adapters])
// Sort: pinned rows first, then most recently used, then never-used
// agents in id-stable order. The gateway's `main` agent stays
// pinned-to-top when never touched so a fresh install has an
// obvious starting point.
const ordered = useMemo(() => {
const withMeta = agents.map((agent) => {
const harness = harnessAgentLookup?.get(agent.agentId)
return {
agent,
pinned: harness?.pinned ?? false,
lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
}
})
return withMeta
.sort((a, b) => {
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? -Infinity
const bValue = b.lastUsedAt ?? -Infinity
if (aValue !== bValue) return bValue - aValue
return a.agent.agentId.localeCompare(b.agent.agentId)
})
.map((entry) => entry.agent)
}, [activity, agents, harnessAgentLookup])
if (loading && agents.length === 0) {
return (
<div className="flex h-36 items-center justify-center rounded-lg border border-border/70">
<div className="flex h-36 items-center justify-center rounded-xl border border-border border-dashed bg-card/50">
<Loader2 className="size-5 animate-spin text-muted-foreground" />
</div>
)
}
if (agents.length === 0) {
return (
<Card>
<CardContent className="flex h-48 flex-col items-center justify-center gap-4 text-center">
<div className="flex size-10 items-center justify-center rounded-lg bg-muted text-muted-foreground">
<Bot className="size-5" />
</div>
<div className="space-y-1">
<h2 className="font-medium text-base">No agents</h2>
<p className="text-muted-foreground text-sm">
Create an OpenClaw, Claude Code, or Codex agent.
</p>
</div>
<Button variant="outline" onClick={onCreateAgent}>
<Plus className="mr-2 size-4" />
New Agent
</Button>
</CardContent>
</Card>
)
return <AgentsEmptyState onCreateAgent={onCreateAgent} />
}
return (
<div className="grid gap-3">
{agents.map((agent) => (
<Card key={agent.key} className="rounded-lg border-border/70">
<CardHeader className="flex flex-row items-center justify-between gap-4 py-3">
<div className="flex min-w-0 items-center gap-3">
<div className="flex size-10 shrink-0 items-center justify-center rounded-lg bg-muted text-muted-foreground">
{agent.source === 'openclaw' ? (
<Cpu className="size-5" />
) : (
<Bot className="size-5" />
)}
</div>
<div className="min-w-0">
<CardTitle className="truncate text-base">
{agent.name}
</CardTitle>
<div className="mt-1 flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
<Badge variant="outline" className="rounded-md">
{agent.runtimeLabel}
</Badge>
<span>{agent.modelLabel}</span>
<Badge variant="outline" className="rounded-md">
main
</Badge>
</div>
<p className="mt-1 truncate font-mono text-muted-foreground text-xs">
{agent.detail}
</p>
</div>
</div>
<div className="flex shrink-0 items-center gap-1">
<Button
variant="ghost"
size="sm"
onClick={() => onChatAgent(agent)}
disabled={!agent.canChat}
>
<MessageSquare className="mr-1 size-4" />
Chat
</Button>
{agent.canDelete ? (
<Button
variant="ghost"
size="icon"
title="Delete agent"
onClick={() => onDeleteAgent(agent)}
disabled={deletingAgentKey === agent.key}
>
{deletingAgentKey === agent.key ? (
<Loader2 className="size-4 animate-spin" />
) : (
<Trash2 className="size-4 text-destructive" />
)}
</Button>
) : null}
</div>
</CardHeader>
</Card>
))}
{ordered.map((agent) => {
const harness = harnessAgentLookup?.get(agent.agentId)
const adapter: HarnessAgentAdapter | 'unknown' =
harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
const data = buildRowData({
agent,
adapter,
harness,
activity: activity?.[agent.agentId],
adapterHealth:
adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
})
return (
<AgentRowCard
key={agent.key}
data={data}
deleting={deletingAgentKey === agent.key}
onDelete={onDeleteAgent}
onPinToggle={onPinToggle}
/>
)
})}
</div>
)
}
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
const lower = label?.toLowerCase()
if (lower === 'claude code') return 'claude'
if (lower === 'codex') return 'codex'
if (lower === 'openclaw') return 'openclaw'
return 'unknown'
}
const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
function buildRowData(input: {
agent: AgentListItem
adapter: HarnessAgentAdapter | 'unknown'
harness: HarnessAgent | undefined
activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
adapterHealth: AgentAdapterHealth | null
}): AgentRowData {
const { agent, adapter, harness, activity, adapterHealth } = input
return {
agent,
adapter,
modelLabel: deriveModelLabel(agent, harness),
reasoningEffort: harness?.reasoningEffort ?? null,
status: activity?.status ?? 'unknown',
lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
pinned: harness?.pinned ?? false,
cwd: harness?.cwd ?? null,
lastUserMessage: harness?.lastUserMessage ?? null,
tokens: harness?.tokens ?? null,
turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
lastError: harness?.lastError ?? null,
lastErrorAt: harness?.lastErrorAt ?? null,
activeTurnId: harness?.activeTurnId ?? null,
adapterHealth,
}
}
function deriveModelLabel(
agent: AgentListItem,
harness: HarnessAgent | undefined,
): string | null {
// Prefer the agent rail's modelLabel when meaningful; harness's
// modelId is a stable identifier but the rail's `modelLabel`
// already maps to a friendly display string.
if (agent.modelLabel && agent.modelLabel !== 'default') {
return agent.modelLabel
}
return harness?.modelId ?? null
}

View File

@@ -0,0 +1,99 @@
import type { FC } from 'react'
import { cn } from '@/lib/utils'
import { AgentActions } from './agent-row/AgentActions'
import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
import { AgentLastMessage } from './agent-row/AgentLastMessage'
import { AgentMetaRow } from './agent-row/AgentMetaRow'
import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
import { AgentTile } from './agent-row/AgentTile'
import { AgentTitleRow } from './agent-row/AgentTitleRow'
import type {
AgentRowCallbacks,
AgentRowData,
} from './agent-row/agent-row.types'
interface AgentRowCardProps extends AgentRowCallbacks {
data: AgentRowData
/** Whether THIS agent is mid-delete; renders a spinner in the menu. */
deleting?: boolean
}
/**
* Composition shell for the agent rail. Owns no state; sub-components
* each handle their own micro-state (error-panel collapse, etc.) and
* emit callbacks (delete, pin/unpin) for the page to act on.
*
* The whole card carries state — not just the tile — so the row's
* border subtly tells the user what's going on at a glance:
* working → accent-orange border with a soft glow
* error → destructive border
* idle → muted border, lifts on hover
*/
export const AgentRowCard: FC<AgentRowCardProps> = ({
data,
deleting,
onDelete,
onPinToggle,
}) => {
return (
<div
className={cn(
// Layout-stable hover. No translate, no shadow change — both
// visibly perturb neighbouring rows. Only the border tint
// shifts on hover, and the rail's vertical rhythm stays
// exactly the same in every state.
'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
data.status === 'working'
? 'border-[var(--accent-orange)]/40'
: data.status === 'error'
? 'border-destructive/40'
: 'border-border hover:border-[var(--accent-orange)]/30',
)}
>
<div className="flex items-start gap-4">
<AgentTile
adapter={data.adapter}
status={data.status}
lastUsedAt={data.lastUsedAt}
/>
<div className="min-w-0 flex-1">
<AgentTitleRow
agent={data.agent}
status={data.status}
pinned={data.pinned}
turnsByDay={data.turnsByDay}
failedByDay={data.failedByDay}
onPinToggle={(next) => onPinToggle(data.agent, next)}
/>
<AgentSummaryChips
adapter={data.adapter}
modelLabel={data.modelLabel}
reasoningEffort={data.reasoningEffort}
adapterHealth={data.adapterHealth}
/>
<AgentLastMessage message={data.lastUserMessage} />
<AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
{data.status === 'error' && data.lastError && (
<AgentErrorPanel
agentId={data.agent.agentId}
message={data.lastError}
errorAt={data.lastErrorAt}
/>
)}
</div>
<AgentActions
agent={data.agent}
activeTurnId={data.activeTurnId}
deleting={deleting}
onDelete={onDelete}
/>
</div>
</div>
)
}

View File

@@ -0,0 +1,32 @@
import { Bot, Plus } from 'lucide-react'
import type { FC } from 'react'
import { Button } from '@/components/ui/button'
interface AgentsEmptyStateProps {
onCreateAgent: () => void
}
export const AgentsEmptyState: FC<AgentsEmptyStateProps> = ({
onCreateAgent,
}) => {
return (
<div className="rounded-xl border border-border border-dashed bg-card/50 p-12 text-center">
<div className="mx-auto mb-4 flex h-12 w-12 items-center justify-center rounded-xl bg-[var(--accent-orange)]/10">
<Bot className="h-6 w-6 text-[var(--accent-orange)]" />
</div>
<h3 className="mb-1 font-semibold">No agents yet</h3>
<p className="mx-auto mb-4 max-w-sm text-muted-foreground text-sm">
Spin up an OpenClaw, Claude Code, or Codex agent to chat with, schedule,
or run in the background.
</p>
<Button
onClick={onCreateAgent}
variant="outline"
className="border-[var(--accent-orange)] bg-[var(--accent-orange)]/10 text-[var(--accent-orange)] hover:bg-[var(--accent-orange)]/20 hover:text-[var(--accent-orange)]"
>
<Plus className="mr-1.5 h-4 w-4" />
Create your first agent
</Button>
</div>
)
}

View File

@@ -0,0 +1,41 @@
import { Bot, Plus } from 'lucide-react'
import type { FC } from 'react'
import { Button } from '@/components/ui/button'
interface AgentsHeaderProps {
onCreateAgent: () => void
}
/**
* Mirrors the visual shape of `SoulHeader` and `ScheduledTasksHeader`
* so the page reads as part of the same family. Loose lifecycle
* controls that used to sit next to the title moved into
* `GatewayStatusBar` — they're OpenClaw-specific and don't apply to
* Claude/Codex agents.
*/
export const AgentsHeader: FC<AgentsHeaderProps> = ({ onCreateAgent }) => {
return (
<div className="rounded-xl border border-border bg-card p-6 shadow-sm transition-all hover:shadow-md">
<div className="flex items-start gap-4">
<div className="flex h-12 w-12 shrink-0 items-center justify-center rounded-xl bg-[var(--accent-orange)]/10">
<Bot className="h-6 w-6 text-[var(--accent-orange)]" />
</div>
<div className="flex-1">
<h2 className="mb-1 font-semibold text-xl">Agents</h2>
<p className="text-muted-foreground text-sm">
OpenClaw, Claude Code, and Codex agents chat, schedule, and run
them in the background.
</p>
</div>
<Button
onClick={onCreateAgent}
className="border-[var(--accent-orange)] bg-[var(--accent-orange)]/10 text-[var(--accent-orange)] hover:bg-[var(--accent-orange)]/20 hover:text-[var(--accent-orange)]"
variant="outline"
>
<Plus className="mr-1.5 h-4 w-4" />
New Agent
</Button>
</div>
</div>
)
}

View File

@@ -3,8 +3,9 @@ import { type FC, useMemo, useState } from 'react'
import { useNavigate } from 'react-router'
import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
import { AgentList } from './AgentList'
import { AgentsHeader } from './AgentsHeader'
import { AgentTerminal } from './AgentTerminal'
import type { HarnessAgentAdapter } from './agent-harness-types'
import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
import { createAgentPageActions } from './agents-page-actions'
import {
useDefaultAgentName,
@@ -29,9 +30,9 @@ import {
toHarnessListItem,
toOpenClawListItem,
} from './agents-page-utils'
import { GatewayStatusBar } from './GatewayStatusBar'
import { NewAgentDialog } from './NewAgentDialog'
import {
AgentsPageHeader,
ControlPlaneAlert,
GatewayStateCards,
InlineErrorAlert,
@@ -43,51 +44,45 @@ import {
useCreateHarnessAgent,
useDeleteHarnessAgent,
useHarnessAgents,
useUpdateHarnessAgent,
} from './useAgents'
import {
useOpenClawAgents,
useOpenClawMutations,
useOpenClawStatus,
} from './useOpenClaw'
import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'
export const AgentsPage: FC = () => {
const navigate = useNavigate()
const {
status,
loading: statusLoading,
error: statusError,
refetch: refetchStatus,
} = useOpenClawStatus()
const { providers, defaultProviderId } = useLlmProviders()
const {
adapters,
loading: adaptersLoading,
error: adaptersError,
refetch: refetchAdapters,
} = useAgentAdapters()
// The harness listing now carries the gateway lifecycle snapshot
// alongside the agents — one polling source for everything the
// agents page renders. The legacy `/claw/status` poll is dead from
// this surface; the chat-panel layout still uses it for now.
const {
harnessAgents,
gateway: status,
loading: harnessAgentsLoading,
error: harnessAgentsError,
} = useHarnessAgents()
const openClawAgentsEnabled =
status?.status === 'running' && status.controlPlaneStatus === 'connected'
const {
agents: openClawAgents,
loading: openClawAgentsLoading,
error: openClawAgentsError,
refetch: refetchOpenClawAgents,
} = useOpenClawAgents(openClawAgentsEnabled)
const {
harnessAgents,
loading: harnessAgentsLoading,
error: harnessAgentsError,
refetch: refetchHarnessAgents,
} = useHarnessAgents()
const createHarnessAgent = useCreateHarnessAgent()
const deleteHarnessAgent = useDeleteHarnessAgent()
const updateHarnessAgent = useUpdateHarnessAgent()
const {
setupOpenClaw,
createAgent: createOpenClawAgent,
deleteAgent: deleteOpenClawAgent,
startOpenClaw,
stopOpenClaw,
restartOpenClaw,
reconnectOpenClaw,
actionInProgress,
@@ -158,42 +153,68 @@ export const AgentsPage: FC = () => {
openClawAgentsEnabled,
openClawAgents,
)
const agentListItems = useMemo(
() => [
...visibleOpenClawAgents.map((agent) =>
const agentListItems = useMemo(() => {
// Dual-created OpenClaw agents (and the backfilled `main`/orphans
// post Step 9) live in both `/claw/agents` and `/agents` under the
// same id. Prefer the harness entry — it carries adapter/model/
// reasoning/lastUsedAt/status that the chat path actually uses —
// and drop the legacy duplicate so the rail doesn't show every
// OpenClaw agent twice.
const harnessIds = new Set(harnessAgents.map((agent) => agent.id))
const dedupedOpenClawAgents = visibleOpenClawAgents.filter(
(agent) => !harnessIds.has(agent.agentId),
)
return [
...dedupedOpenClawAgents.map((agent) =>
toOpenClawListItem(agent, openClawManageable),
),
...harnessAgents.map(toHarnessListItem),
],
[harnessAgents, openClawManageable, visibleOpenClawAgents],
)
]
}, [harnessAgents, openClawManageable, visibleOpenClawAgents])
// Lookup map so AgentList can render adapter chips, reasoning, etc.
// Computed up here to keep all hooks above the early returns below.
const harnessAgentLookup = useMemo(() => {
const map = new Map<string, HarnessAgent>()
for (const agent of harnessAgents) map.set(agent.id, agent)
return map
}, [harnessAgents])
// Activity map keyed by agentId. Sourced from the harness listing's
// server-side enrichment (`status` + `lastUsedAt`). Legacy gateway
// agents that don't have a harness record yet (rare post-backfill)
// simply miss from the map and render with the default `unknown`
// dot until reconciliation picks them up.
const agentActivity = useMemo(() => {
const map: Record<
string,
{
status: 'working' | 'idle' | 'asleep' | 'error'
lastUsedAt: number | null
}
> = {}
for (const agent of harnessAgents) {
if (!agent.status) continue
map[agent.id] = {
status: agent.status,
lastUsedAt: agent.lastUsedAt ?? null,
}
}
return map
}, [harnessAgents])
const inlineError = getInlineError({
lifecyclePending,
pageError,
statusError,
openClawAgentsError,
adaptersError,
harnessAgentsError,
})
const agentsLoading = getAgentsLoading({
statusLoading,
adaptersLoading,
harnessAgentsLoading,
openClawAgentsEnabled,
openClawAgentsLoading,
})
const creatingAgent = creatingOpenClawAgent || createHarnessAgent.isPending
const deletingAgent = deletingOpenClawAgent || deleteHarnessAgent.isPending
const refreshAll = async () => {
await Promise.all([
refetchStatus(),
refetchAdapters(),
refetchHarnessAgents(),
openClawAgentsEnabled ? refetchOpenClawAgents() : Promise.resolve(),
])
}
const handleHarnessAdapterChange = (adapter: HarnessAgentAdapter) => {
const descriptor = adapters.find((entry) => entry.id === adapter)
setHarnessAdapterId(adapter)
@@ -239,7 +260,9 @@ export const AgentsPage: FC = () => {
)
}
if (statusLoading && !status) {
// First-paint loader: until the harness listing has resolved at
// least once we don't know which adapters / agents to render.
if (harnessAgentsLoading && !status) {
return (
<div className="flex items-center justify-center py-20">
<Loader2 className="size-6 animate-spin text-muted-foreground" />
@@ -255,27 +278,18 @@ export const AgentsPage: FC = () => {
const recoveryDetail = status ? getRecoveryDetail(status) : null
const controlPlaneCopy = getControlPlaneCopyForStatus(status)
// Bar only makes sense when the gateway is meaningfully alive AND
// there's at least one OpenClaw agent in the merged list. Hide it
// for Claude/Codex-only setups so the page stays uncluttered.
const showGatewayStatusBar =
status?.status === 'running' &&
(visibleOpenClawAgents.length > 0 ||
harnessAgents.some((agent) => agent.adapter === 'openclaw'))
return (
<div className="min-h-full bg-background px-6 py-8">
<div className="mx-auto flex w-full max-w-5xl flex-col gap-6">
<AgentsPageHeader
actionInProgress={actionInProgress}
controlPlaneBusy={gatewayUiState.controlPlaneBusy}
reconnecting={reconnecting}
status={status}
onCreateAgent={() => setCreateOpen(true)}
onOpenTerminal={() => setShowTerminal(true)}
onReconnect={() => {
void runWithPageErrorHandling(reconnectOpenClaw)
}}
onRefresh={() => void refreshAll()}
onRestart={() => {
void runWithPageErrorHandling(restartOpenClaw)
}}
onStop={() => {
void runWithPageErrorHandling(stopOpenClaw)
}}
/>
<div className="fade-in slide-in-from-bottom-5 mx-auto flex w-full max-w-5xl animate-in flex-col gap-6 duration-500">
<AgentsHeader onCreateAgent={() => setCreateOpen(true)} />
{lifecycleBanner ? <LifecycleAlert message={lifecycleBanner} /> : null}
@@ -315,15 +329,39 @@ export const AgentsPage: FC = () => {
}}
/>
{showGatewayStatusBar ? (
<GatewayStatusBar
status={status}
actionInProgress={actionInProgress}
onOpenTerminal={() => setShowTerminal(true)}
onRestart={() => {
void runWithPageErrorHandling(restartOpenClaw)
}}
/>
) : null}
<AgentList
agents={agentListItems}
activity={agentActivity}
harnessAgentLookup={harnessAgentLookup}
adapters={adapters}
loading={agentsLoading}
deletingAgentKey={deletingAgent ? deletingAgentKey : null}
onChatAgent={(agent) => navigate(`/agents/${agent.agentId}`)}
onCreateAgent={() => setCreateOpen(true)}
onDeleteAgent={(agent) => {
void handleDelete(agent)
}}
onPinToggle={(agent, next) => {
// Optimistic mutation; harness-only — gateway-original
// OpenClaw entries are gated server-side via the harness
// backfill, so we only fire when the row maps to a
// harness agent record.
if (!harnessAgentLookup.has(agent.agentId)) return
updateHarnessAgent.mutate({
agentId: agent.agentId,
patch: { pinned: next },
})
}}
/>
<SetupOpenClawDialog

View File

@@ -0,0 +1,206 @@
import { Loader2, RotateCcw, Terminal } from 'lucide-react'
import type { FC, ReactNode } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { Separator } from '@/components/ui/separator'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'
import type { OpenClawStatus } from './useOpenClaw'
interface GatewayStatusBarProps {
status: OpenClawStatus | null
/** Disabled while a gateway lifecycle mutation is mid-flight. */
actionInProgress: boolean
onOpenTerminal: () => void
onRestart: () => void
}
/**
* Compact one-line status bar for the OpenClaw gateway. Renders the
* lifecycle pills (Running / Control plane connected) plus a Terminal
* escape hatch and a Restart Gateway action. Lives between the page
* header and the agent list when at least one OpenClaw agent is in
* the merged list; collapses to nothing for Claude/Codex-only setups.
*
* Status is sourced from `GET /agents`'s `gateway` field — the agents
* page no longer polls `/claw/status` directly. One endpoint, one
* 5s interval, no duplicate state.
*/
export const GatewayStatusBar: FC<GatewayStatusBarProps> = ({
status,
actionInProgress,
onOpenTerminal,
onRestart,
}) => {
if (!status) return null
const runningPill = pillForRuntimeStatus(status.status)
const controlPlanePill = pillForControlPlane(status.controlPlaneStatus)
return (
<div className="rounded-xl border border-border bg-card px-4 py-3 shadow-sm">
<div className="flex items-center gap-3 text-sm">
<span className="font-medium text-muted-foreground">
OpenClaw gateway
</span>
<Badge
variant={runningPill.variant}
className={cn('gap-1.5', runningPill.className)}
>
<span
className={cn(
'inline-block h-1.5 w-1.5 rounded-full',
runningPill.dot,
)}
/>
{runningPill.label}
</Badge>
<Badge
variant={controlPlanePill.variant}
className={cn('gap-1.5', controlPlanePill.className)}
>
<span
className={cn(
'inline-block h-1.5 w-1.5 rounded-full',
controlPlanePill.dot,
)}
/>
{controlPlanePill.label}
</Badge>
<Separator orientation="vertical" className="h-4" />
<WithTooltip label="Open a shell into the OpenClaw gateway container for raw CLI access (config edits, session inspection).">
<Button variant="ghost" size="sm" onClick={onOpenTerminal}>
<Terminal className="mr-1.5 h-3.5 w-3.5" />
Terminal
</Button>
</WithTooltip>
<WithTooltip label="Restart the OpenClaw gateway. Useful when the gateway is stuck or after editing provider config.">
<Button
variant="ghost"
size="sm"
onClick={onRestart}
disabled={actionInProgress}
className="ml-auto"
>
{actionInProgress ? (
<Loader2 className="mr-1.5 h-3.5 w-3.5 animate-spin" />
) : (
<RotateCcw className="mr-1.5 h-3.5 w-3.5" />
)}
Restart Gateway
</Button>
</WithTooltip>
</div>
</div>
)
}
const WithTooltip: FC<{ label: string; children: ReactNode }> = ({
label,
children,
}) => (
<TooltipProvider delayDuration={250}>
<Tooltip>
<TooltipTrigger asChild>{children}</TooltipTrigger>
<TooltipContent side="bottom" className="max-w-xs text-xs">
{label}
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
type PillKind = {
variant: 'default' | 'secondary' | 'outline' | 'destructive'
label: string
dot: string
className?: string
}
function pillForRuntimeStatus(status: OpenClawStatus['status']): PillKind {
switch (status) {
case 'running':
return {
variant: 'secondary',
label: 'Running',
dot: 'bg-emerald-500',
className: 'bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
}
case 'starting':
return {
variant: 'secondary',
label: 'Starting',
dot: 'bg-amber-500 animate-pulse',
className: 'bg-amber-50 text-amber-900 hover:bg-amber-50',
}
case 'stopped':
return {
variant: 'outline',
label: 'Stopped',
dot: 'bg-muted-foreground/40',
}
case 'error':
return {
variant: 'destructive',
label: 'Error',
dot: 'bg-destructive-foreground',
}
default:
return {
variant: 'outline',
label: 'Unknown',
dot: 'bg-muted-foreground/40',
}
}
}
function pillForControlPlane(
status: OpenClawStatus['controlPlaneStatus'],
): PillKind {
switch (status) {
case 'connected':
return {
variant: 'secondary',
label: 'Control plane connected',
dot: 'bg-emerald-500',
className: 'bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
}
case 'connecting':
return {
variant: 'secondary',
label: 'Connecting',
dot: 'bg-amber-500 animate-pulse',
className: 'bg-amber-50 text-amber-900 hover:bg-amber-50',
}
case 'reconnecting':
return {
variant: 'secondary',
label: 'Reconnecting',
dot: 'bg-amber-500 animate-pulse',
className: 'bg-amber-50 text-amber-900 hover:bg-amber-50',
}
case 'recovering':
return {
variant: 'secondary',
label: 'Recovering',
dot: 'bg-amber-500 animate-pulse',
className: 'bg-amber-50 text-amber-900 hover:bg-amber-50',
}
case 'failed':
return {
variant: 'destructive',
label: 'Needs attention',
dot: 'bg-destructive-foreground',
}
default:
return {
variant: 'outline',
label: 'Disconnected',
dot: 'bg-muted-foreground/40',
}
}
}

View File

@@ -0,0 +1,83 @@
import type { FC } from 'react'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'
export type AgentLiveness = 'working' | 'idle' | 'asleep' | 'error' | 'unknown'
interface LivenessDotProps {
status: AgentLiveness
/**
* Optional human-friendly secondary line, e.g. "Idle for 4 min" or
* "Asleep — no activity for 22 min". When absent the tooltip just
* reads the status label.
*/
detail?: string
className?: string
}
const VARIANT: Record<
AgentLiveness,
{ dot: string; ring: string; label: string }
> = {
working: {
// Animated amber pulse + soft halo so the eye catches an active
// agent in a long list without the dot screaming for attention.
dot: 'bg-amber-500 animate-pulse',
ring: 'ring-2 ring-amber-200',
label: 'Working on a turn',
},
idle: {
dot: 'bg-emerald-500',
ring: 'ring-2 ring-emerald-100',
label: 'Idle',
},
asleep: {
dot: 'bg-muted-foreground/40',
ring: 'ring-2 ring-muted',
label: 'Asleep',
},
error: {
dot: 'bg-destructive',
ring: 'ring-2 ring-destructive/30',
label: 'Attention',
},
unknown: {
dot: 'bg-muted-foreground/30',
ring: 'ring-2 ring-muted',
label: 'Status unknown',
},
}
export const LivenessDot: FC<LivenessDotProps> = ({
status,
detail,
className,
}) => {
const variant = VARIANT[status]
return (
<TooltipProvider delayDuration={150}>
<Tooltip>
<TooltipTrigger asChild>
<span
role="img"
aria-label={detail ?? variant.label}
className={cn(
'inline-block h-3 w-3 rounded-full',
variant.dot,
variant.ring,
className,
)}
/>
</TooltipTrigger>
<TooltipContent side="right" className="text-xs">
{detail ?? variant.label}
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
}

View File

@@ -154,7 +154,6 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="openclaw">OpenClaw</SelectItem>
{adapters.map((adapter) => (
<SelectItem key={adapter.id} value={adapter.id}>
{adapter.name}

View File

@@ -0,0 +1,107 @@
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'
/**
* Display rules for the redesigned agent rows. Pure helpers — no React,
* no API calls — so they're trivial to unit-test and the row card stays
* focused on layout.
*/
const UUID_PATTERN =
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
const OC_UUID_PATTERN =
/^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
/**
* The agent rail used to render whatever the gateway returned for `name`.
* Post-migration that's frequently the agent's UUID — readable to nobody.
* Prefer the explicit `name` when it differs meaningfully from the id;
* otherwise fall back to a short prefix users can recognize on second
* glance.
*/
export function displayName(agent: AgentListItem): string {
const name = agent.name?.trim()
const id = agent.agentId
if (!name || name === id) {
if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11) // "oc-XXXXXXXX"
if (UUID_PATTERN.test(id)) return id.slice(0, 8)
return id
}
return name
}
export function canDelete(agent: AgentListItem): boolean {
// The gateway's protected `main` agent must not be deletable. The
// server enforces this too, but disabling the menu item avoids users
// hitting an opaque 400.
if (agent.agentId === 'main') return false
return agent.canDelete
}
/**
* Rename will be wired to a future `PATCH /agents/:id` endpoint. The
* legacy `/claw/agents` create flow named the agent on the gateway via
* the `name` field but the field isn't editable post-create today.
*/
export function canRename(_agent: AgentListItem): boolean {
return false
}
/**
* The detail line carries the agent's workspace path. The `detail`
* field on AgentListItem already holds it for OpenClaw entries
* (`/home/node/.openclaw/workspace-...`); for harness agents it's the
* synthetic `<adapter>:main` marker that's not informative — hide it.
*/
export function workspaceLabel(agent: AgentListItem): string | null {
if (!agent.detail) return null
if (/^(claude|codex|openclaw):main$/.test(agent.detail)) return null
return agent.detail
}
const ONE_MINUTE = 60_000
const ONE_HOUR = 60 * ONE_MINUTE
const ONE_DAY = 24 * ONE_HOUR
/**
* Lightweight relative-time formatter. We don't want to drag in
* `dayjs/relativeTime` just for a few labels.
*/
export function formatRelativeTime(epochMs: number | null): string {
if (epochMs === null || !Number.isFinite(epochMs)) return 'never'
const diff = Math.max(0, Date.now() - epochMs)
if (diff < ONE_MINUTE) return 'just now'
if (diff < ONE_HOUR) {
const m = Math.floor(diff / ONE_MINUTE)
return `${m} min ago`
}
if (diff < ONE_DAY) {
const h = Math.floor(diff / ONE_HOUR)
return h === 1 ? '1 hr ago' : `${h} hr ago`
}
const d = Math.floor(diff / ONE_DAY)
return d === 1 ? '1 day ago' : `${d} days ago`
}
/**
* Tooltip-friendly description of a row's current liveness state.
* Returns `undefined` when the state has nothing extra to add (e.g.
* `unknown` with no timestamp).
*/
export function livenessDetail(
status: AgentLiveness,
lastUsedAt: number | null | undefined,
): string | undefined {
if (lastUsedAt == null) return undefined
const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
if (status === 'asleep') {
if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
const hr = Math.floor(diffMin / 60)
return `Asleep — quiet for ${hr} hr`
}
if (status === 'working') return 'Working on a turn'
if (status === 'error') return 'Attention — last turn failed'
return undefined
}

View File

@@ -1,6 +1,6 @@
import type { AgentEntry } from './useOpenClaw'
export type HarnessAgentAdapter = 'claude' | 'codex'
export type HarnessAgentAdapter = 'claude' | 'codex' | 'openclaw'
export type AgentHarnessStreamEvent =
| {
@@ -33,6 +33,8 @@ export type AgentHarnessStreamEvent =
code?: string
}
export type HarnessAgentLiveness = 'working' | 'idle' | 'asleep' | 'error'
export interface HarnessAgent {
id: string
name: string
@@ -43,6 +45,54 @@ export interface HarnessAgent {
sessionKey: string
createdAt: number
updatedAt: number
/**
* Server-derived liveness state. When the listing endpoint hasn't
* been enriched yet (older deployments) this is undefined and the UI
* falls back to `unknown`.
*/
status?: HarnessAgentLiveness
/**
* Wall-clock ms of the last persisted turn. `null` for never-used
* agents. Drives the recency sort and the "Last used X min ago" copy.
*/
lastUsedAt?: number | null
/** Pinned agents float to the top of the list. Defaults to `false`. */
pinned?: boolean
/** First non-blank line of the most recent user message; null if none. */
lastUserMessage?: string | null
/** Working directory the agent runs in; null when no session record yet. */
cwd?: string | null
/** Cumulative + 7-day rolling token usage; null when no record. */
tokens?: {
last7d: { input: number; output: number; requestCount: number }
cumulative: { input: number; output: number }
} | null
turnsByDay?: number[]
failedByDay?: number[]
lastError?: string | null
lastErrorAt?: number | null
/** When non-null, an in-flight turn this row can be resumed from. */
activeTurnId?: string | null
/** Persistent FIFO queue of messages waiting for this agent. */
queue?: HarnessQueuedMessage[]
}
export interface HarnessQueuedMessageAttachment {
mediaType: string
data: string
}
export interface HarnessQueuedMessage {
id: string
createdAt: number
message: string
attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
}
export interface HarnessAdapterHealth {
healthy: boolean
reason?: string
checkedAt: number
}
export interface HarnessAdapterDescriptor {
@@ -53,6 +103,7 @@ export interface HarnessAdapterDescriptor {
modelControl: 'runtime-supported' | 'best-effort'
models: Array<{ id: string; label: string; recommended?: boolean }>
reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
health?: HarnessAdapterHealth
}
export interface CreateHarnessAgentInput {
@@ -62,19 +113,36 @@ export interface CreateHarnessAgentInput {
reasoningEffort?: string
}
export interface HarnessTranscriptEntry {
export interface HarnessHistoryReasoning {
text: string
durationMs?: number
}
export interface HarnessHistoryToolCall {
toolCallId?: string
toolName: string
status: 'pending' | 'running' | 'completed' | 'failed'
input?: unknown
output?: unknown
error?: string
durationMs?: number
}
export interface HarnessHistoryEntry {
id: string
agentId: string
sessionId: 'main'
role: 'user' | 'assistant'
text: string
createdAt: number
reasoning?: HarnessHistoryReasoning
toolCalls?: HarnessHistoryToolCall[]
}
export interface HarnessAgentHistoryPage {
agentId: string
sessionId: 'main'
items: HarnessTranscriptEntry[]
items: HarnessHistoryEntry[]
}
export function mapHarnessAgentToEntry(agent: HarnessAgent): AgentEntry {

View File

@@ -0,0 +1,160 @@
import {
Copy,
Loader2,
MessageSquare,
MoreHorizontal,
Pencil,
RotateCcw,
Trash2,
} from 'lucide-react'
import type { FC } from 'react'
import { useNavigate } from 'react-router'
import { toast } from 'sonner'
import { Button } from '@/components/ui/button'
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuSeparator,
DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import {
canDelete as canDeleteAgent,
canRename as canRenameAgent,
displayName,
} from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'
interface AgentActionsProps {
agent: AgentListItem
activeTurnId: string | null
deleting?: boolean
onDelete: (agent: AgentListItem) => void
}
/**
* Single primary CTA per row: `Resume` (filled, accent-orange, with a
* pulsing dot) when an active turn exists; otherwise `Chat` (outline).
* Both navigate to the same place — the chat hook auto-attaches via
* `/chat/active` when there's a live turn — but the row signals which
* action the user is actually taking.
*/
export const AgentActions: FC<AgentActionsProps> = ({
agent,
activeTurnId,
deleting,
onDelete,
}) => {
const navigate = useNavigate()
const allowDelete = canDeleteAgent(agent)
const allowRename = canRenameAgent(agent)
const handleChat = () => navigate(`/agents/${agent.agentId}`)
const handleCopyId = async () => {
try {
await navigator.clipboard.writeText(agent.agentId)
toast.success('Agent id copied')
} catch {
toast.error('Could not copy agent id')
}
}
return (
<div className="flex shrink-0 items-center gap-1.5">
{activeTurnId ? (
<Button
variant="default"
size="sm"
onClick={handleChat}
className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
>
<span className="relative flex size-2">
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
<span className="relative inline-flex size-2 rounded-full bg-white" />
</span>
Resume
</Button>
) : (
<Button variant="outline" size="sm" onClick={handleChat}>
<MessageSquare className="mr-1.5 size-3" />
Chat
</Button>
)}
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
variant="ghost"
size="icon"
aria-label={`More actions for ${displayName(agent)}`}
className="size-8 text-muted-foreground hover:text-foreground"
>
<MoreHorizontal className="size-4" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-44">
<DropdownMenuItem onSelect={() => void handleCopyId()}>
<Copy className="mr-2 size-3.5" />
Copy id
</DropdownMenuItem>
<ComingSoonItem
icon={Pencil}
label="Rename"
disabled={!allowRename}
/>
<ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
<DropdownMenuSeparator />
<DropdownMenuItem
onSelect={() => onDelete(agent)}
disabled={!allowDelete || deleting}
className="text-destructive focus:text-destructive"
>
{deleting ? (
<Loader2 className="mr-2 size-3.5 animate-spin" />
) : (
<Trash2 className="mr-2 size-3.5" />
)}
Delete
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
</div>
)
}
interface ComingSoonItemProps {
icon: typeof Pencil
label: string
disabled: boolean
}
const ComingSoonItem: FC<ComingSoonItemProps> = ({
icon: Icon,
label,
disabled,
}) => {
const item = (
<DropdownMenuItem disabled className="text-muted-foreground">
<Icon className="mr-2 size-3.5" />
{label}
</DropdownMenuItem>
)
if (!disabled) return item
return (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<span className="block w-full">{item}</span>
</TooltipTrigger>
<TooltipContent side="left" className="text-xs">
{label} coming soon
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
}

View File

@@ -0,0 +1,96 @@
import { AlertTriangle, ChevronDown } from 'lucide-react'
import { type FC, useEffect, useState } from 'react'
import { Button } from '@/components/ui/button'
import {
Collapsible,
CollapsibleContent,
CollapsibleTrigger,
} from '@/components/ui/collapsible'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { truncate } from './agent-row.helpers'
interface AgentErrorPanelProps {
agentId: string
message: string
errorAt: number | null
}
const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
const PREVIEW_CHARS = 200
export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
agentId,
message,
errorAt,
}) => {
const storageKey = `${STORAGE_PREFIX}${agentId}`
// Open if we've never seen this `errorAt` for this agent. Once the
// user collapses the panel (or refreshes after seeing it), we mark
// it seen so it doesn't re-pop on every poll.
const [open, setOpen] = useState<boolean>(() => {
if (typeof window === 'undefined' || !errorAt) return true
const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
return !Number.isFinite(seen) || errorAt > seen
})
useEffect(() => {
if (!open && errorAt && typeof window !== 'undefined') {
window.localStorage.setItem(storageKey, String(errorAt))
}
}, [open, errorAt, storageKey])
const preview = truncate(message, PREVIEW_CHARS)
const truncated = preview.length < message.length
return (
<Collapsible open={open} onOpenChange={setOpen} className="mt-3">
<div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
<div className="flex items-center gap-2 font-medium text-destructive text-xs">
<AlertTriangle className="size-3.5" />
Last error
</div>
<CollapsibleTrigger asChild>
<Button
variant="ghost"
size="sm"
className="h-6 px-2 text-muted-foreground"
>
<span className="text-xs">{open ? 'hide' : 'show'}</span>
<ChevronDown
className={cn(
'ml-1 size-3 transition-transform',
open && 'rotate-180',
)}
/>
</Button>
</CollapsibleTrigger>
</div>
<CollapsibleContent>
<div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
{truncated ? (
<HoverCard openDelay={300}>
<HoverCardTrigger asChild>
<span className="cursor-default font-mono text-foreground/80">
{preview}
</span>
</HoverCardTrigger>
<HoverCardContent
side="bottom"
className="max-w-md whitespace-pre-wrap font-mono text-xs"
>
{message}
</HoverCardContent>
</HoverCard>
) : (
<span className="font-mono text-foreground/80">{message}</span>
)}
</div>
</CollapsibleContent>
</Collapsible>
)
}

View File

@@ -0,0 +1,35 @@
import { Quote } from 'lucide-react'
import type { FC } from 'react'
import { firstNonBlankLine, truncate } from './agent-row.helpers'
interface AgentLastMessageProps {
message: string | null
}
const PREVIEW_CHARS = 110
/**
* Inline preview of the most recent user message. Renders as a quoted,
* italic line so the row reads like a conversation snippet rather than
* a label-and-value pair. No hover-card — opening the agent's chat is
* the canonical way to read the full message.
*/
export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
if (!message) {
return (
<p className="mt-1 text-muted-foreground/70 text-xs italic">
No messages yet start a chat
</p>
)
}
const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
return (
<p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
<Quote
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
aria-hidden
/>
<span className="truncate">{preview}</span>
</p>
)
}

View File

@@ -0,0 +1,37 @@
import type { FC } from 'react'
import { formatRelativeTime } from '../agent-display.helpers'
import { AgentTokenSummary } from './AgentTokenSummary'
import type { AgentTokenUsage } from './agent-row.types'
interface AgentMetaRowProps {
lastUsedAt: number | null
tokens: AgentTokenUsage | null
}
/**
* Bottom-of-row meta line. Intentionally sparse — last activity time
* and lifetime tokens. CWD is no longer surfaced here because the path
* the server happens to be running from isn't actionable; if a future
* surface needs the cwd (chat panel, debug view) it reads from the
* listing payload directly.
*/
export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
const lastUsedLabel = formatRelativeTime(lastUsedAt)
const tokensTotal =
(tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
const showTokens = tokensTotal > 0
return (
<div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
<span>{lastUsedLabel}</span>
{showTokens && (
<>
<span aria-hidden className="text-muted-foreground/50">
·
</span>
<AgentTokenSummary tokens={tokens} />
</>
)}
</div>
)
}

View File

@@ -0,0 +1,92 @@
import type { FC } from 'react'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
interface AgentSparklineProps {
/** 14 entries, oldest → newest. Today's bucket is the last index. */
turnsByDay: number[]
/** Same length, same order. Failed turns counted separately. */
failedByDay: number[]
className?: string
}
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18
export const AgentSparkline: FC<AgentSparklineProps> = ({
turnsByDay,
failedByDay,
className,
}) => {
if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
const max = Math.max(1, ...turnsByDay)
return (
<HoverCard openDelay={250}>
<HoverCardTrigger asChild>
<div
role="img"
aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
className={cn('flex h-5 items-end gap-px', className)}
>
{turnsByDay.map((count, idx) => {
const ratio = count / max
const height = Math.max(
MIN_BAR_HEIGHT_PX,
Math.round(ratio * MAX_BAR_HEIGHT_PX),
)
const isToday = idx === ROW_BAR_COUNT - 1
const failed = failedByDay[idx] ?? 0
return (
<div
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
key={`bar-${idx}`}
className={cn(
'w-1.5 rounded-sm',
count === 0
? 'bg-muted-foreground/15'
: failed > 0
? 'bg-destructive/50'
: 'bg-[var(--accent-orange)]/50',
isToday && 'ring-1 ring-foreground/30',
)}
style={{ height }}
/>
)
})}
</div>
</HoverCardTrigger>
<HoverCardContent side="left" className="w-56 text-xs">
<div className="mb-2 font-medium text-sm">Last 14 days</div>
<ul className="space-y-0.5">
{turnsByDay.map((count, idx) => {
const failed = failedByDay[idx] ?? 0
const dayLabel = formatLocalDate(idx)
return (
<li
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
key={`day-${idx}`}
className="flex items-center justify-between text-muted-foreground"
>
<span>{dayLabel}</span>
<span>
{count}
{failed > 0 && (
<span className="ml-1 text-destructive">
({failed} failed)
</span>
)}
</span>
</li>
)
})}
</ul>
</HoverCardContent>
</HoverCard>
)
}

View File

@@ -0,0 +1,71 @@
import { TriangleAlert } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { adapterLabel } from '../AdapterIcon'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentAdapterHealth } from './agent-row.types'
interface AgentSummaryChipsProps {
adapter: HarnessAgentAdapter | 'unknown'
modelLabel: string | null
reasoningEffort: string | null
/** When unhealthy, the adapter label dims and a warning chip appears. */
adapterHealth: AgentAdapterHealth | null
}
/**
* Adapter / model / reasoning summary line. Always rendered (so OpenClaw
* rows that fall back to defaults still expose what they're set up to do)
* and surfaces adapter-health *only when unhealthy* — keeping the calm
* default state silent and reserving visual noise for things the user
* needs to act on.
*/
export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
adapter,
modelLabel,
reasoningEffort,
adapterHealth,
}) => {
const parts = [adapterLabel(adapter)]
if (modelLabel) parts.push(modelLabel)
if (reasoningEffort) parts.push(reasoningEffort)
const unhealthy = adapterHealth?.healthy === false
return (
<div
className={cn(
'flex items-center gap-1.5 text-muted-foreground text-xs',
unhealthy && 'text-muted-foreground/70',
)}
>
<span className="truncate">{parts.join(' · ')}</span>
{unhealthy && adapterHealth && (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge
variant="outline"
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
>
<TriangleAlert className="size-2.5" />
<span className="font-normal">Unavailable</span>
</Badge>
</HoverCardTrigger>
<HoverCardContent side="right" className="w-72 text-sm">
<div className="font-medium">
{adapterLabel(adapter)} CLI not available
</div>
<div className="mt-1 text-muted-foreground text-xs">
{adapterHealth.reason ??
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
</div>
</HoverCardContent>
</HoverCard>
)}
</div>
)
}

View File

@@ -0,0 +1,37 @@
import type { FC } from 'react'
import { cn } from '@/lib/utils'
import { AdapterIcon } from '../AdapterIcon'
import { livenessDetail } from '../agent-display.helpers'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import { type AgentLiveness, LivenessDot } from '../LivenessDot'
export interface AgentTileProps {
adapter: HarnessAgentAdapter | 'unknown'
status: AgentLiveness
lastUsedAt: number | null
}
/**
* Adapter glyph + a single liveness dot. Adapter health is no longer
* surfaced here — it lives as an inline pill inside `AgentSummaryChips`
* so the user isn't asked to disambiguate two dots on the same tile.
*/
export const AgentTile: FC<AgentTileProps> = ({
adapter,
status,
lastUsedAt,
}) => (
<div className="relative shrink-0">
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
<AdapterIcon adapter={adapter} className="h-6 w-6" />
</div>
<LivenessDot
status={status}
detail={livenessDetail(status, lastUsedAt)}
className={cn(
'absolute -right-0.5 -bottom-0.5',
status === 'working' && 'animate-pulse',
)}
/>
</div>
)

View File

@@ -0,0 +1,55 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { displayName } from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'
import { AgentSparkline } from './AgentSparkline'
import { PinToggle } from './PinToggle'
interface AgentTitleRowProps {
agent: AgentListItem
status: AgentLiveness
pinned: boolean
turnsByDay: number[]
failedByDay: number[]
onPinToggle: (next: boolean) => void
}
/**
* Title strip: name + status badge + (right-aligned) sparkline. The
* pin toggle sits trailing the title so the title always flushes left
* regardless of pin state — moving the star left of the title indents
* the row's first line off-axis from the model/preview/meta lines
* below it. When unpinned and not hovered, the toggle is removed from
* layout entirely so it reserves no space at all.
*/
export const AgentTitleRow: FC<AgentTitleRowProps> = ({
agent,
status,
pinned,
turnsByDay,
failedByDay,
onPinToggle,
}) => (
<div className="mb-1 flex items-center gap-2">
<span className="truncate font-semibold">{displayName(agent)}</span>
{status === 'working' && (
<Badge
variant="secondary"
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge variant="outline" className="text-muted-foreground">
Asleep
</Badge>
)}
{status === 'error' && <Badge variant="destructive">Attention</Badge>}
<PinToggle pinned={pinned} onToggle={onPinToggle} />
<div className="ml-auto">
<AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
</div>
</div>
)

View File

@@ -0,0 +1,63 @@
import type { FC } from 'react'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { Progress } from '@/components/ui/progress'
import { formatTokens } from './agent-row.helpers'
import type { AgentTokenUsage } from './agent-row.types'
interface AgentTokenSummaryProps {
tokens: AgentTokenUsage | null
}
/**
* Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
* (the only window we can compute reliably from the session record).
* Per-window stats land in a follow-up once the activity ledger ships.
*/
export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
if (!tokens) return null
const { input, output } = tokens.cumulative
const total = input + output
if (total === 0) return null
const inputPct = (input / total) * 100
return (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
{formatTokens(total)} tokens
</span>
</HoverCardTrigger>
<HoverCardContent side="top" align="end" className="w-72 text-sm">
<div className="mb-3 flex items-center justify-between">
<span className="font-medium">Lifetime tokens</span>
<span className="text-muted-foreground text-xs tabular-nums">
{formatTokens(total)} total
</span>
</div>
<div className="space-y-2">
<div className="flex items-center justify-between text-xs">
<span className="text-muted-foreground">Input</span>
<span className="tabular-nums">{formatTokens(input)}</span>
</div>
<Progress value={inputPct} className="h-1.5" />
<div className="mt-2 flex items-center justify-between text-xs">
<span className="text-muted-foreground">Output</span>
<span className="tabular-nums">{formatTokens(output)}</span>
</div>
<Progress value={100 - inputPct} className="h-1.5" />
</div>
<p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
Cumulative across every turn this agent has run. Per-window stats
arrive in a future release.
</p>
</HoverCardContent>
</HoverCard>
)
}

View File

@@ -0,0 +1,60 @@
import { Star } from 'lucide-react'
import type { FC } from 'react'
import { Button } from '@/components/ui/button'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'
interface PinToggleProps {
pinned: boolean
onToggle: (next: boolean) => void
}
/**
* Trailing star toggle. The button is *always rendered* — only its
* opacity changes between pinned/unpinned/hover states — so the title
* row's height is constant. Hiding the slot via `display: none` would
* collapse the row's vertical metrics on hover and shift every card
* below in the rail.
*
* Placement is trailing the title (after the status badge) so the
* title itself flushes left regardless of pin state — leading the
* row with the star would indent the title relative to the model /
* preview / meta lines beneath it.
*/
export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<Button
variant="ghost"
size="icon"
className={cn(
'size-6 text-muted-foreground transition-opacity hover:text-foreground',
pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
)}
aria-pressed={pinned}
aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
onClick={(event) => {
event.stopPropagation()
onToggle(!pinned)
}}
>
<Star
className={cn(
'size-3.5',
pinned && 'fill-amber-400 text-amber-500',
)}
/>
</Button>
</TooltipTrigger>
<TooltipContent side="top" className="text-xs">
{pinned ? 'Unpin' : 'Pin to top'}
</TooltipContent>
</Tooltip>
</TooltipProvider>
)

View File

@@ -0,0 +1,73 @@
import { describe, expect, it } from 'bun:test'
import {
firstNonBlankLine,
formatLocalDate,
formatTokens,
ROW_BAR_COUNT,
truncate,
} from './agent-row.helpers'
describe('formatTokens', () => {
it('renders zero / NaN as "0"', () => {
expect(formatTokens(0)).toBe('0')
expect(formatTokens(Number.NaN)).toBe('0')
})
it('renders sub-1K as integer', () => {
expect(formatTokens(142)).toBe('142')
})
it('renders K with one decimal under 10', () => {
expect(formatTokens(8_400)).toBe('8.4K')
})
it('drops the decimal at >=10K', () => {
expect(formatTokens(120_000)).toBe('120K')
})
it('renders M with one decimal under 10', () => {
expect(formatTokens(1_200_000)).toBe('1.2M')
})
})
describe('firstNonBlankLine', () => {
it('returns the first non-blank line', () => {
expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
})
it('skips USER_QUERY envelope tags', () => {
expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
'fix tests',
)
})
it('falls back to the trimmed input when nothing matches', () => {
expect(firstNonBlankLine(' single ')).toBe('single')
})
})
describe('truncate', () => {
it('returns input unchanged when within limit', () => {
expect(truncate('hello', 10)).toBe('hello')
})
it('appends an ellipsis when over limit', () => {
expect(truncate('hello world', 6)).toBe('hello…')
})
})
describe('formatLocalDate', () => {
const today = new Date('2026-04-30T12:00:00Z')
it('labels today and yesterday explicitly', () => {
expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
})
it('returns a "Mon D" format for older days', () => {
const label = formatLocalDate(0, today)
// "Apr 17" or "Apr 17," depending on locale; just assert it
// contains a month abbreviation and a day number.
expect(label).toMatch(/[A-Za-z]+ \d+/)
})
})

View File

@@ -0,0 +1,64 @@
/**
* Pure formatters consumed by row sub-components. Kept distinct from
* `agent-display.helpers.ts` (page-level helpers) so the row internals
* have an obvious single home.
*/
const TOKEN_THRESHOLDS: Array<[number, string]> = [
[1_000_000, 'M'],
[1_000, 'K'],
]
/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
export function formatTokens(n: number): string {
if (!Number.isFinite(n) || n <= 0) return '0'
for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
if (n >= threshold) {
const value = n / threshold
const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
return `${decimal}${suffix}`
}
}
return String(Math.round(n))
}
const USER_QUERY_OPEN = /^<USER_QUERY>$/i
const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
/**
* First non-blank line, with the BrowserOS user-system-prompt
* `<USER_QUERY>` envelope tags stripped so previews don't show
* structural noise.
*/
export function firstNonBlankLine(text: string): string {
const lines = text.split('\n').map((line) => line.trim())
for (const line of lines) {
if (!line) continue
if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
return line
}
return text.trim()
}
export function truncate(text: string, max: number): string {
if (text.length <= max) return text
return `${text.slice(0, max - 1).trimEnd()}`
}
const SPARKLINE_DAYS = 14
/**
* "today" / "yesterday" / "Apr 17" — given an index 0..13 from
* oldest → newest. `today` defaults to `new Date()` so callers don't
* have to thread a clock through.
*/
export function formatLocalDate(idx: number, today: Date = new Date()): string {
if (idx === SPARKLINE_DAYS - 1) return 'today'
if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
const offset = SPARKLINE_DAYS - 1 - idx
const date = new Date(today)
date.setDate(date.getDate() - offset)
return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
}
export const ROW_BAR_COUNT = SPARKLINE_DAYS

View File

@@ -0,0 +1,51 @@
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'
/**
* Window-bounded token usage. Server returns `null` when no session
* record exists yet for the agent.
*/
export interface AgentTokenUsage {
last7d: { input: number; output: number; requestCount: number }
cumulative: { input: number; output: number }
}
export interface AgentAdapterHealth {
healthy: boolean
reason?: string
}
/**
* Everything an `AgentRowCard` needs to render. Mirrors the shape
* `useHarnessAgents` exposes; the page assembles one entry per row in
* `AgentList` and passes it down. Sub-components only see slices of
* this object — no prop drilling beyond two levels.
*/
export interface AgentRowData {
agent: AgentListItem
adapter: HarnessAgentAdapter | 'unknown'
modelLabel: string | null
reasoningEffort: string | null
status: AgentLiveness
lastUsedAt: number | null
pinned: boolean
cwd: string | null
lastUserMessage: string | null
tokens: AgentTokenUsage | null
/** 14 entries, oldest → newest. Today is the last index. */
turnsByDay: number[]
/** Same length and ordering as `turnsByDay`. */
failedByDay: number[]
lastError: string | null
lastErrorAt: number | null
/** When non-null, an in-flight turn this row can be resumed from. */
activeTurnId: string | null
/** Adapter-level health, shared across rows for the same adapter. */
adapterHealth: AgentAdapterHealth | null
}
export interface AgentRowCallbacks {
onDelete: (agent: AgentListItem) => void
onPinToggle: (agent: AgentListItem, next: boolean) => void
}

View File

@@ -138,24 +138,20 @@ export function getVisibleOpenClawAgents(
}
export function getAgentsLoading(input: {
statusLoading: boolean
adaptersLoading: boolean
harnessAgentsLoading: boolean
openClawAgentsEnabled: boolean
openClawAgentsLoading: boolean
}): boolean {
return (
input.statusLoading ||
input.adaptersLoading ||
input.harnessAgentsLoading ||
(input.openClawAgentsEnabled && input.openClawAgentsLoading)
input.openClawAgentsLoading
)
}
export function getInlineError(input: {
lifecyclePending: boolean
pageError: string | null
statusError: Error | null
openClawAgentsError: Error | null
adaptersError: Error | null
harnessAgentsError: Error | null
@@ -163,7 +159,6 @@ export function getInlineError(input: {
if (input.lifecyclePending) return null
return (
input.pageError ??
input.statusError?.message ??
input.openClawAgentsError?.message ??
input.adaptersError?.message ??
input.harnessAgentsError?.message ??

View File

@@ -8,8 +8,20 @@ import {
type HarnessAdapterDescriptor,
type HarnessAgent,
type HarnessAgentHistoryPage,
type HarnessQueuedMessage,
mapHarnessAgentToEntry,
} from './agent-harness-types'
import type { OpenClawStatus } from './useOpenClaw'
/**
* Combined response shape of `GET /agents`. The page polls this once
* and consumes both fields, replacing the dedicated `/claw/status`
* poll the previous design carried.
*/
interface HarnessAgentsResponse {
agents: HarnessAgent[]
gateway: OpenClawStatus | null
}
export type { AgentHarnessStreamEvent }
@@ -69,21 +81,31 @@ export function useHarnessAgents(enabled = true) {
error: urlError,
} = useAgentServerUrl()
const query = useQuery<HarnessAgent[], Error>({
const query = useQuery<HarnessAgentsResponse, Error>({
queryKey: [AGENT_QUERY_KEYS.agents, baseUrl],
queryFn: async () => {
const data = await agentsFetch<{ agents: HarnessAgent[] }>(
const data = await agentsFetch<HarnessAgentsResponse>(
baseUrl as string,
'/',
)
return data.agents ?? []
return {
agents: data.agents ?? [],
gateway: data.gateway ?? null,
}
},
enabled: Boolean(baseUrl) && !urlLoading && enabled,
// Poll every 5s so the per-agent liveness state (working / idle /
// asleep / error) and last-used timestamps stay fresh without a
// websocket. `refetchIntervalInBackground: false` lets a hidden
// tab go quiet — react-query's default, made explicit.
refetchInterval: 5_000,
refetchIntervalInBackground: false,
})
return {
agents: (query.data ?? []).map(mapHarnessAgentToEntry),
harnessAgents: query.data ?? [],
agents: (query.data?.agents ?? []).map(mapHarnessAgentToEntry),
harnessAgents: query.data?.agents ?? [],
gateway: query.data?.gateway ?? null,
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
@@ -114,6 +136,63 @@ export function useCreateHarnessAgent() {
})
}
/**
* Apply a partial update to a harness agent. Used by the pin-toggle
* star and (eventually) the inline rename UI. Optimistically writes
* the patch into the listing query cache so the row updates instantly,
* then rolls back if the server rejects the change.
*/
export function useUpdateHarnessAgent() {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: {
agentId: string
patch: { name?: string; pinned?: boolean }
}) => {
if (!baseUrl || urlLoading) {
throw new Error('BrowserOS agent server URL is not ready')
}
const data = await agentsFetch<{ agent: HarnessAgent }>(
baseUrl,
`/${encodeURIComponent(input.agentId)}`,
{
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(input.patch),
},
)
return data.agent
},
onMutate: async ({ agentId, patch }) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === agentId ? { ...agent, ...patch } : agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}
export function useDeleteHarnessAgent() {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
@@ -141,16 +220,97 @@ export async function chatWithHarnessAgent(
agentId: string,
message: string,
signal?: AbortSignal,
attachments?: ReadonlyArray<unknown>,
): Promise<Response> {
const baseUrl = await getAgentServerUrl()
return fetch(`${baseUrl}/agents/${encodeURIComponent(agentId)}/chat`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message }),
body: JSON.stringify({
message,
...(attachments && attachments.length > 0 ? { attachments } : {}),
}),
signal,
})
}
/**
* Subscribe to an existing turn (the server's `ActiveTurnRegistry`
* decoupled the turn lifecycle from POST /chat). `lastSeq` lets the
* client resume after a disconnect — the server replays buffered
* frames with seq > lastSeq, then tails new ones.
*/
export async function attachToHarnessTurn(
agentId: string,
options: { turnId?: string; lastSeq?: number; signal?: AbortSignal } = {},
): Promise<Response> {
const baseUrl = await getAgentServerUrl()
const url = new URL(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/chat/stream`,
)
if (options.turnId) url.searchParams.set('turnId', options.turnId)
const headers: Record<string, string> = {}
if (typeof options.lastSeq === 'number') {
headers['Last-Event-ID'] = String(options.lastSeq)
}
return fetch(url.toString(), { signal: options.signal, headers })
}
export interface HarnessActiveTurnInfo {
turnId: string
agentId: string
sessionId: 'main'
status: 'running' | 'done' | 'error' | 'cancelled'
lastSeq: number
startedAt: number
endedAt?: number
/** User message that kicked off the turn; null when not captured. */
prompt: string | null
}
/**
* Discover an in-flight turn for an agent. Used on chat mount so the
* UI reattaches instead of starting a new turn after a tab/refresh.
*/
export async function fetchActiveHarnessTurn(
agentId: string,
): Promise<HarnessActiveTurnInfo | null> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/chat/active`,
)
if (!response.ok) return null
const body = (await response.json()) as {
active: HarnessActiveTurnInfo | null
}
return body.active
}
/**
* Stop button. Hits the explicit cancel endpoint instead of just
* aborting the fetch (which now only detaches *this* subscriber from
* the buffer; the underlying turn would otherwise keep running).
*/
export async function cancelHarnessTurn(
agentId: string,
options: { turnId?: string; reason?: string } = {},
): Promise<{ cancelled: boolean }> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/chat/cancel`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
...(options.turnId ? { turnId: options.turnId } : {}),
...(options.reason ? { reason: options.reason } : {}),
}),
},
)
if (!response.ok) return { cancelled: false }
return (await response.json()) as { cancelled: boolean }
}
export async function fetchHarnessAgentHistory(
agentId: string,
): Promise<HarnessAgentHistoryPage> {
@@ -160,3 +320,145 @@ export async function fetchHarnessAgentHistory(
`/${encodeURIComponent(agentId)}/sessions/main/history`,
)
}
export interface EnqueueMessageInput {
message: string
attachments?: ReadonlyArray<unknown>
}
export async function enqueueHarnessMessage(
agentId: string,
input: EnqueueMessageInput,
): Promise<HarnessQueuedMessage> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: input.message,
...(input.attachments && input.attachments.length > 0
? { attachments: input.attachments }
: {}),
}),
},
)
if (!response.ok) {
let message = `Request failed with status ${response.status}`
try {
const body = (await response.json()) as { error?: string }
if (body.error) message = body.error
} catch {}
throw new Error(message)
}
const body = (await response.json()) as { queued: HarnessQueuedMessage }
return body.queued
}
export async function removeHarnessQueuedMessage(
agentId: string,
messageId: string,
): Promise<{ removed: boolean }> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
messageId,
)}`,
{ method: 'DELETE' },
)
if (!response.ok) return { removed: false }
return (await response.json()) as { removed: boolean }
}
/**
* Optimistic enqueue: writes the new queued message into the listing
* cache immediately so the queue panel reflects the change without
* waiting for the next poll. Rolls back if the server rejects.
*/
export function useEnqueueHarnessMessage() {
const { baseUrl } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
enqueueHarnessMessage(input.agentId, input),
onMutate: async (input) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
const optimistic: HarnessQueuedMessage = {
id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
createdAt: Date.now(),
message: input.message,
}
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === input.agentId
? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
: agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}
/**
* Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
*/
export function useRemoveHarnessQueuedMessage() {
const { baseUrl } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: { agentId: string; messageId: string }) =>
removeHarnessQueuedMessage(input.agentId, input.messageId),
onMutate: async (input) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === input.agentId
? {
...agent,
queue: (agent.queue ?? []).filter(
(entry) => entry.id !== input.messageId,
),
}
: agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}

View File

@@ -1,5 +1,4 @@
import { useMutation, useQuery, useQueryClient } from '@tanstack/react-query'
import { getAgentServerUrl } from '@/lib/browseros/helpers'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
export interface AgentEntry {
@@ -319,25 +318,3 @@ export function buildChatHistoryFromTurns(
return messages
}
export async function chatWithAgent(
agentId: string,
message: string,
sessionKey?: string,
history: OpenClawChatHistoryMessage[] = [],
signal?: AbortSignal,
attachments?: ReadonlyArray<unknown>,
): Promise<Response> {
const baseUrl = await getAgentServerUrl()
return fetch(`${baseUrl}/claw/agents/${agentId}/chat`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message,
sessionKey,
history,
...(attachments && attachments.length > 0 ? { attachments } : {}),
}),
signal,
})
}

View File

@@ -164,9 +164,17 @@ export const NewScheduledTaskDialog: FC<NewScheduledTaskDialogProps> = ({
const resolvedProvider: Provider | null = (() => {
const id = selectedProviderId ?? defaultProviderId
const found = providers.find((p) => p.id === id)
if (found) return { id: found.id, name: found.name, type: found.type }
if (found) {
return {
kind: 'llm' as const,
id: found.id,
name: found.name,
type: found.type,
}
}
if (providers[0])
return {
kind: 'llm' as const,
id: providers[0].id,
name: providers[0].name,
type: providers[0].type,
@@ -175,6 +183,7 @@ export const NewScheduledTaskDialog: FC<NewScheduledTaskDialogProps> = ({
})()
const providerOptions: Provider[] = providers.map((p) => ({
kind: 'llm',
id: p.id,
name: p.name,
type: p.type,

View File

@@ -1,4 +1,4 @@
import { Github, History, Plus, SettingsIcon } from 'lucide-react'
import { Bot, Github, History, Plus, SettingsIcon } from 'lucide-react'
import type { FC } from 'react'
import { Link, useLocation, useNavigate } from 'react-router'
import { ChatProviderSelector } from '@/components/chat/ChatProviderSelector'
@@ -64,7 +64,9 @@ export const ChatHeader: FC<ChatHeaderProps> = ({
className="group relative inline-flex cursor-pointer items-center gap-2 rounded-lg p-2 text-muted-foreground transition-colors hover:bg-muted/50 hover:text-foreground data-[state=open]:bg-accent"
title="Change AI Provider"
>
{selectedProvider.type === 'browseros' ? (
{selectedProvider.kind === 'acp' ? (
<Bot className="h-[18px] w-[18px]" />
) : selectedProvider.type === 'browseros' ? (
<BrowserOSIcon size={18} />
) : (
<ProviderIcon

View File

@@ -0,0 +1,258 @@
import { describe, expect, it } from 'bun:test'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import {
buildSidepanelChatTargets,
persistSidepanelChatTargetSelection,
resolveSidepanelChatTarget,
type SidepanelChatTargetSelection,
toLlmProviderConfig,
} from './sidepanel-chat-targets'
const timestamp = 1000
const providers: LlmProviderConfig[] = [
{
id: 'browseros',
type: 'browseros',
name: 'BrowserOS',
baseUrl: 'https://api.browseros.com/v1',
modelId: 'browseros-auto',
supportsImages: true,
contextWindow: 200000,
temperature: 0.2,
createdAt: timestamp,
updatedAt: timestamp,
},
{
id: 'anthropic-sonnet',
type: 'anthropic',
name: 'Anthropic Sonnet',
modelId: 'claude-sonnet-4-6',
apiKey: 'sk-ant',
supportsImages: true,
contextWindow: 200000,
temperature: 0.2,
createdAt: timestamp,
updatedAt: timestamp,
},
]
const adapters: HarnessAdapterDescriptor[] = [
{
id: 'claude',
name: 'Claude Code',
defaultModelId: 'haiku',
defaultReasoningEffort: 'medium',
modelControl: 'best-effort',
models: [
{ id: 'sonnet', label: 'Sonnet' },
{ id: 'haiku', label: 'Haiku', recommended: true },
],
reasoningEfforts: [
{ id: 'medium', label: 'Medium', recommended: true },
{ id: 'high', label: 'High' },
],
},
{
id: 'codex',
name: 'Codex',
defaultModelId: 'gpt-5.5',
defaultReasoningEffort: 'medium',
modelControl: 'runtime-supported',
models: [{ id: 'gpt-5.5', label: 'GPT-5.5', recommended: true }],
reasoningEfforts: [{ id: 'medium', label: 'Medium', recommended: true }],
},
{
id: 'openclaw',
name: 'OpenClaw',
defaultModelId: 'default',
defaultReasoningEffort: 'medium',
modelControl: 'best-effort',
models: [],
reasoningEfforts: [
{ id: 'medium', label: 'Medium', recommended: true },
{ id: 'high', label: 'High' },
],
},
]
const agents: HarnessAgent[] = [
{
id: 'agent-codex',
name: 'Review Bot',
adapter: 'codex',
modelId: 'gpt-5.5',
reasoningEffort: 'medium',
permissionMode: 'approve-all',
sessionKey: 'agent:agent-codex:main',
createdAt: timestamp,
updatedAt: timestamp,
},
{
id: 'agent-openclaw',
name: 'Research Claw',
adapter: 'openclaw',
modelId: 'default',
reasoningEffort: 'high',
permissionMode: 'approve-all',
sessionKey: 'agent:agent-openclaw:main',
createdAt: timestamp,
updatedAt: timestamp,
},
]
describe('buildSidepanelChatTargets', () => {
it('returns LLM targets plus one ACP target per persisted harness agent', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
expect(targets.map((target) => target.id)).toEqual([
'browseros',
'anthropic-sonnet',
'agent-codex',
'agent-openclaw',
])
})
it('does not emit catalog-only ACP targets without persisted agents', () => {
const targets = buildSidepanelChatTargets({
providers,
adapters,
agents: [],
})
expect(targets.map((target) => target.id)).toEqual([
'browseros',
'anthropic-sonnet',
])
})
it('uses the created OpenClaw agent name instead of a generic adapter target', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const openclaw = targets.find((target) => target.id === 'agent-openclaw')
expect(openclaw).toMatchObject({
kind: 'acp',
id: 'agent-openclaw',
agentId: 'agent-openclaw',
adapter: 'openclaw',
adapterName: 'OpenClaw',
modelId: 'default',
modelLabel: 'default',
name: 'Research Claw',
modelControl: 'best-effort',
reasoningEffort: 'high',
})
})
it('preserves adapter metadata for created agent targets', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const codex = targets.find((target) => target.id === 'agent-codex')
expect(codex).toMatchObject({
kind: 'acp',
agentId: 'agent-codex',
adapter: 'codex',
adapterName: 'Codex',
modelId: 'gpt-5.5',
modelLabel: 'GPT-5.5',
modelControl: 'runtime-supported',
recommended: true,
reasoningEffort: 'medium',
reasoningEffortLabel: 'Medium',
})
})
it('still returns LLM targets when agents and adapters are unavailable', () => {
expect(
buildSidepanelChatTargets({ providers, adapters: [], agents: [] }),
).toEqual([
{
kind: 'llm',
id: 'browseros',
name: 'BrowserOS',
type: 'browseros',
provider: providers[0],
},
{
kind: 'llm',
id: 'anthropic-sonnet',
name: 'Anthropic Sonnet',
type: 'anthropic',
provider: providers[1],
},
])
})
})
describe('resolveSidepanelChatTarget', () => {
it('resolves selected LLM targets back to their provider config', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const resolved = resolveSidepanelChatTarget({
targets,
defaultProviderId: 'browseros',
selection: { kind: 'llm', id: 'anthropic-sonnet' },
})
expect(resolved?.kind).toBe('llm')
expect(toLlmProviderConfig(resolved)?.modelId).toBe('claude-sonnet-4-6')
})
it('falls back to the current default LLM provider when a persisted ACP target is stale', () => {
const targets = buildSidepanelChatTargets({
providers,
adapters,
agents: [],
})
expect(
resolveSidepanelChatTarget({
targets,
defaultProviderId: 'anthropic-sonnet',
selection: { kind: 'acp', id: 'agent-codex' },
}),
).toMatchObject({
kind: 'llm',
id: 'anthropic-sonnet',
})
})
it('falls back when an old catalog-style ACP target id is persisted', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
expect(
resolveSidepanelChatTarget({
targets,
defaultProviderId: 'anthropic-sonnet',
selection: { kind: 'acp', id: 'acp:codex:gpt-5.5:medium' },
}),
).toMatchObject({
kind: 'llm',
id: 'anthropic-sonnet',
})
})
})
describe('persistSidepanelChatTargetSelection', () => {
it('stores only target identity and does not mutate LLM provider arrays', async () => {
let savedSelection: SidepanelChatTargetSelection | null = null
const originalProviders = providers.map((provider) => ({ ...provider }))
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const target = targets.find((candidate) => candidate.id === 'agent-codex')
await persistSidepanelChatTargetSelection(target, {
setValue: async (value) => {
savedSelection = value
},
})
expect(savedSelection as SidepanelChatTargetSelection | null).toEqual({
kind: 'acp',
id: 'agent-codex',
})
expect(providers).toEqual(originalProviders)
})
})

View File

@@ -0,0 +1,178 @@
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
export type SidepanelTargetKind = 'llm' | 'acp'
export type SidepanelChatTarget =
| {
kind: 'llm'
id: string
name: string
type: ProviderType
provider: LlmProviderConfig
}
| {
kind: 'acp'
id: string
name: string
type: 'acp'
agentId: string
adapter: HarnessAgentAdapter
adapterName: string
modelId: string
modelLabel: string
modelControl: HarnessAdapterDescriptor['modelControl']
recommended?: boolean
reasoningEffort: string
reasoningEffortLabel?: string
}
export type SidepanelChatTargetSelection = Pick<
SidepanelChatTarget,
'kind' | 'id'
>
interface BuildSidepanelChatTargetsInput {
providers: LlmProviderConfig[]
adapters: HarnessAdapterDescriptor[]
agents?: HarnessAgent[]
}
interface ResolveSidepanelChatTargetInput {
targets: SidepanelChatTarget[]
defaultProviderId: string
selection?: SidepanelChatTargetSelection | null
}
interface SidepanelChatTargetSelectionWriter {
setValue(value: SidepanelChatTargetSelection | null): Promise<void>
}
interface SidepanelChatTargetSelectionReader {
getValue(): Promise<SidepanelChatTargetSelection | null>
}
type SidepanelChatTargetSelectionStore = SidepanelChatTargetSelectionReader &
SidepanelChatTargetSelectionWriter
let sidepanelChatTargetSelectionStorage:
| SidepanelChatTargetSelectionStore
| undefined
export function buildSidepanelChatTargets({
providers,
adapters,
agents = [],
}: BuildSidepanelChatTargetsInput): SidepanelChatTarget[] {
return [
...providers.map(toLlmTarget),
...agents.map((agent) => toAcpTargetForAgent(agent, adapters)),
]
}
function toAcpTargetForAgent(
agent: HarnessAgent,
adapters: HarnessAdapterDescriptor[],
): SidepanelChatTarget {
const adapter = adapters.find((entry) => entry.id === agent.adapter)
const modelId = agent.modelId ?? adapter?.defaultModelId ?? 'default'
const reasoningEffort =
agent.reasoningEffort ?? adapter?.defaultReasoningEffort ?? 'medium'
const model = adapter?.models.find((entry) => entry.id === modelId)
const reasoning = adapter?.reasoningEfforts.find(
(effort) => effort.id === reasoningEffort,
)
return {
kind: 'acp',
id: agent.id,
name: agent.name,
type: 'acp',
agentId: agent.id,
adapter: agent.adapter,
adapterName: adapter?.name ?? formatAdapterName(agent.adapter),
modelId,
modelLabel: model?.label ?? modelId,
modelControl: adapter?.modelControl ?? 'best-effort',
recommended: model?.recommended,
reasoningEffort,
reasoningEffortLabel: reasoning?.label,
}
}
function formatAdapterName(adapter: HarnessAgentAdapter): string {
if (adapter === 'claude') return 'Claude Code'
if (adapter === 'codex') return 'Codex'
if (adapter === 'openclaw') return 'OpenClaw'
return adapter
}
export function resolveSidepanelChatTarget({
targets,
defaultProviderId,
selection,
}: ResolveSidepanelChatTargetInput): SidepanelChatTarget | undefined {
if (selection) {
const selected = targets.find(
(target) => target.kind === selection.kind && target.id === selection.id,
)
if (selected) return selected
}
return (
targets.find(
(target) => target.kind === 'llm' && target.id === defaultProviderId,
) ?? targets.find((target) => target.kind === 'llm')
)
}
export function toLlmProviderConfig(
target: SidepanelChatTarget | undefined,
): LlmProviderConfig | undefined {
return target?.kind === 'llm' ? target.provider : undefined
}
export async function persistSidepanelChatTargetSelection(
target: SidepanelChatTarget | undefined,
store?: SidepanelChatTargetSelectionWriter,
): Promise<void> {
const targetStore = store ?? (await getSidepanelChatTargetSelectionStorage())
await targetStore.setValue(
target ? { kind: target.kind, id: target.id } : null,
)
}
export async function loadSidepanelChatTargetSelection(
store?: SidepanelChatTargetSelectionReader,
): Promise<SidepanelChatTargetSelection | null> {
const targetStore = store ?? (await getSidepanelChatTargetSelectionStorage())
return targetStore.getValue()
}
function toLlmTarget(provider: LlmProviderConfig): SidepanelChatTarget {
return {
kind: 'llm',
id: provider.id,
name: provider.name,
type: provider.type,
provider,
}
}
async function getSidepanelChatTargetSelectionStorage(): Promise<SidepanelChatTargetSelectionStore> {
if (sidepanelChatTargetSelectionStorage) {
return sidepanelChatTargetSelectionStorage
}
const { storage } = await import('@wxt-dev/storage')
sidepanelChatTargetSelectionStorage =
storage.defineItem<SidepanelChatTargetSelection | null>(
'local:sidepanel-chat-target-selection',
{ fallback: null },
)
return sidepanelChatTargetSelectionStorage
}

View File

@@ -1,9 +1,21 @@
import { useEffect, useRef } from 'react'
import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
import useDeepCompareEffect from 'use-deep-compare-effect'
import {
useAgentAdapters,
useHarnessAgents,
} from '@/entrypoints/app/agents/useAgents'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
import { type McpServer, useMcpServers } from '@/lib/mcp/mcpServerStorage'
import { usePersonalization } from '@/lib/personalization/personalizationStorage'
import {
buildSidepanelChatTargets,
loadSidepanelChatTargetSelection,
persistSidepanelChatTargetSelection,
resolveSidepanelChatTarget,
type SidepanelChatTarget,
type SidepanelChatTargetSelection,
} from './sidepanel-chat-targets'
const constructMcpServers = (servers: McpServer[]) => {
return servers
@@ -23,14 +35,53 @@ const constructCustomServers = (servers: McpServer[]) => {
export const useChatRefs = () => {
const { servers: mcpServers } = useMcpServers()
const {
providers: llmProviders,
selectedProvider: selectedLlmProvider,
setDefaultProvider,
isLoading: isLoadingProviders,
} = useLlmProviders()
const { adapters, loading: isLoadingAdapters } = useAgentAdapters()
const { harnessAgents, loading: isLoadingAgents } = useHarnessAgents()
const { personalization } = usePersonalization()
const [targetSelection, setTargetSelection] =
useState<SidepanelChatTargetSelection | null>(null)
useEffect(() => {
let cancelled = false
loadSidepanelChatTargetSelection().then((selection) => {
if (!cancelled) setTargetSelection(selection)
})
return () => {
cancelled = true
}
}, [])
const chatTargets = useMemo(
() =>
buildSidepanelChatTargets({
providers: llmProviders,
adapters,
agents: harnessAgents,
}),
[llmProviders, adapters, harnessAgents],
)
const selectedChatTarget = useMemo(
() =>
resolveSidepanelChatTarget({
targets: chatTargets,
defaultProviderId: selectedLlmProvider?.id ?? llmProviders[0]?.id ?? '',
selection: targetSelection,
}),
[chatTargets, llmProviders, selectedLlmProvider, targetSelection],
)
const selectedLlmProviderRef = useRef<LlmProviderConfig | null>(
selectedLlmProvider,
)
const selectedChatTargetRef = useRef<SidepanelChatTarget | undefined>(
selectedChatTarget,
)
const enabledMcpServersRef = useRef(constructMcpServers(mcpServers))
const enabledCustomServersRef = useRef(constructCustomServers(mcpServers))
const personalizationRef = useRef(personalization)
@@ -41,16 +92,36 @@ export const useChatRefs = () => {
enabledCustomServersRef.current = constructCustomServers(mcpServers)
}, [selectedLlmProvider, mcpServers])
useEffect(() => {
selectedChatTargetRef.current = selectedChatTarget
}, [selectedChatTarget])
useEffect(() => {
personalizationRef.current = personalization
}, [personalization])
const selectChatTarget = useCallback(
async (target: SidepanelChatTarget | undefined) => {
selectedChatTargetRef.current = target
setTargetSelection(target ? { kind: target.kind, id: target.id } : null)
await persistSidepanelChatTargetSelection(target)
},
[],
)
return {
selectedLlmProviderRef,
selectedChatTargetRef,
enabledMcpServersRef,
enabledCustomServersRef,
personalizationRef,
llmProviders,
setDefaultProvider,
chatTargets,
selectedChatTarget,
selectChatTarget,
selectedLlmProvider,
isLoadingProviders,
isLoadingProviders:
isLoadingProviders || isLoadingAdapters || isLoadingAgents,
}
}

View File

@@ -0,0 +1,153 @@
import { describe, expect, it } from 'bun:test'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import type { ChatMode } from './chatTypes'
import type { SidepanelChatTarget } from './sidepanel-chat-targets'
import { buildSidepanelPreparedSendMessagesRequest } from './useChatSessionRequest'
const conversationId = '00000000-0000-4000-8000-000000000001'
describe('buildSidepanelPreparedSendMessagesRequest', () => {
it('keeps LLM targets on the existing /chat request body', () => {
const request = buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: 'http://127.0.0.1:5151',
target: llmTarget,
fallbackProvider,
message: 'Summarize this page',
...commonRequestInput(),
})
expect(request.api).toBe('http://127.0.0.1:5151/chat')
expect(request.body).toMatchObject({
message: 'Summarize this page',
conversationId,
provider: 'browseros',
providerType: 'browseros',
providerName: 'BrowserOS',
model: 'gpt-5',
mode: 'agent',
browserContext: {
activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
enabledMcpServers: ['slack'],
},
userSystemPrompt: 'Be concise',
userWorkingDir: '/tmp/work',
previousConversation: [{ role: 'assistant', content: 'Prior answer' }],
selectedText: 'selected text',
selectedTextSource: {
url: 'https://example.com',
title: 'Example',
},
})
})
it('sends created-agent targets to the agent-id sidepanel route', () => {
const request = buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: 'http://127.0.0.1:5151',
target: acpTarget,
fallbackProvider,
message: 'Inspect the current tab',
approvalResponses: [
{ approvalId: 'approval-1', approved: true, reason: 'ok' },
],
...commonRequestInput(),
})
expect(request.api).toBe(
'http://127.0.0.1:5151/agents/agent-codex/sidepanel/chat',
)
expect(request.body).toEqual({
conversationId,
message: 'Inspect the current tab',
browserContext: {
activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
enabledMcpServers: ['slack'],
},
userSystemPrompt: 'Be concise',
userWorkingDir: '/tmp/work',
selectedText: 'selected text',
selectedTextSource: {
url: 'https://example.com',
title: 'Example',
},
})
})
it('keeps tool approval retry payloads scoped to LLM chat', () => {
const request = buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: 'http://127.0.0.1:5151',
target: llmTarget,
fallbackProvider,
approvalResponses: [
{ approvalId: 'approval-1', approved: false, reason: 'no' },
],
...commonRequestInput(),
})
expect(request.api).toBe('http://127.0.0.1:5151/chat')
expect(request.body).toMatchObject({
message: '',
toolApprovalResponses: [
{ approvalId: 'approval-1', approved: false, reason: 'no' },
],
})
})
})
function commonRequestInput() {
return {
conversationId,
mode: 'agent' as ChatMode,
browserContext: {
activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
enabledMcpServers: ['slack'],
},
userSystemPrompt: 'Be concise',
userWorkingDir: '/tmp/work',
previousConversation: [
{ role: 'assistant' as const, content: 'Prior answer' },
],
declinedApps: ['gmail'],
aclRules: [{ id: 'rule-1', sitePattern: '*://*/*', enabled: true }],
selectedText: 'selected text',
selectedTextSource: {
url: 'https://example.com',
title: 'Example',
},
toolApprovalConfig: { categories: { navigation: true } },
}
}
const fallbackProvider: LlmProviderConfig = {
id: 'browseros',
type: 'browseros',
name: 'BrowserOS',
modelId: 'gpt-5',
supportsImages: true,
contextWindow: 128000,
temperature: 0.7,
createdAt: 1000,
updatedAt: 1000,
}
const llmTarget: SidepanelChatTarget = {
kind: 'llm',
id: fallbackProvider.id,
name: fallbackProvider.name,
type: fallbackProvider.type,
provider: fallbackProvider,
}
const acpTarget: SidepanelChatTarget = {
kind: 'acp',
id: 'agent-codex',
name: 'Review bot',
type: 'acp',
agentId: 'agent-codex',
adapter: 'codex',
adapterName: 'Codex',
modelId: 'gpt-5.5',
modelLabel: 'GPT-5.5',
modelControl: 'best-effort',
reasoningEffort: 'medium',
reasoningEffortLabel: 'Medium',
}

View File

@@ -26,15 +26,14 @@ import { useInvalidateCredits } from '@/lib/credits/useCredits'
import { declinedAppsStorage } from '@/lib/declined-apps/storage'
import { useGraphqlQuery } from '@/lib/graphql/useGraphqlQuery'
import { createDefaultBrowserOSProvider } from '@/lib/llm-providers/storage'
import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
import {
type ApprovalResponseData,
buildChatRequestBody,
type ChatRequestBrowserContext,
import type {
ApprovalResponseData,
ChatRequestBrowserContext,
} from '@/lib/messaging/server/buildChatRequestBody'
import { track } from '@/lib/metrics/track'
import { searchActionsStorage } from '@/lib/search-actions/searchActionsStorage'
import { selectedTextStorage } from '@/lib/selected-text/selectedTextStorage'
import { sentry } from '@/lib/sentry/sentry'
import { stopAgentStorage } from '@/lib/stop-agent/stop-agent-storage'
import {
type ApprovalResponse,
@@ -52,7 +51,12 @@ import {
import { selectedWorkspaceStorage } from '@/lib/workspace/workspace-storage'
import type { ChatMode } from './chatTypes'
import { GetConversationWithMessagesDocument } from './graphql/chatSessionDocument'
import { toLlmProviderConfig } from './sidepanel-chat-targets'
import { useChatRefs } from './useChatRefs'
import {
buildSidepanelPreparedSendMessagesRequest,
toProviderOption,
} from './useChatSessionRequest'
import { useExecutionHistoryTracker } from './useExecutionHistoryTracker'
import { useNotifyActiveTab } from './useNotifyActiveTab'
import { useRemoteConversationSave } from './useRemoteConversationSave'
@@ -186,16 +190,19 @@ const buildRequestBrowserContext = ({
export const useChatSession = (options?: ChatSessionOptions) => {
const {
selectedLlmProviderRef,
selectedChatTargetRef,
enabledMcpServersRef,
enabledCustomServersRef,
personalizationRef,
setDefaultProvider,
chatTargets,
selectedChatTarget,
selectChatTarget,
selectedLlmProvider,
isLoadingProviders,
} = useChatRefs()
const invalidateCredits = useInvalidateCredits()
const { providers: llmProviders, setDefaultProvider } = useLlmProviders()
const {
baseUrl: agentServerUrl,
isLoading: isLoadingAgentUrl,
@@ -218,11 +225,7 @@ export const useChatSession = (options?: ChatSessionOptions) => {
agentUrlRef.current = agentServerUrl
}, [agentServerUrl])
const providers: Provider[] = llmProviders.map((p) => ({
id: p.id,
name: p.name,
type: p.type,
}))
const providers: Provider[] = chatTargets.map(toProviderOption)
const [mode, setMode] = useState<ChatMode>('agent')
const [textToAction, setTextToAction] = useState<Map<string, ChatAction>>(
@@ -324,15 +327,8 @@ export const useChatSession = (options?: ChatSessionOptions) => {
textToActionRef.current = textToAction
}, [mode, textToAction])
const selectedProvider = selectedLlmProvider
? {
id: selectedLlmProvider.id,
name: selectedLlmProvider.name,
type:
selectedLlmProvider.id === 'browseros'
? ('browseros' as const)
: selectedLlmProvider.type,
}
const selectedProvider = selectedChatTarget
? toProviderOption(selectedChatTarget)
: providers[0]
const {
@@ -346,7 +342,8 @@ export const useChatSession = (options?: ChatSessionOptions) => {
} = useChat({
transport: new DefaultChatTransport({
prepareSendMessagesRequest: async ({ messages }) => {
const provider =
const target = selectedChatTargetRef.current
const fallbackProvider =
selectedLlmProviderRef.current ?? createDefaultBrowserOSProvider()
const activeTabsList = await chrome.tabs.query({
active: true,
@@ -395,51 +392,46 @@ export const useChatSession = (options?: ChatSessionOptions) => {
personalizationRef.current,
)
const approvalResponses = extractApprovalResponses(messages)
const commonRequest = {
conversationId: conversationIdRef.current,
mode: currentMode,
browserContext: requestBrowserContext,
userSystemPrompt,
userWorkingDir: workingDirRef.current,
previousConversation,
declinedApps,
aclRules: enabledAclRules,
toolApprovalConfig: approvalConfig,
}
const approvalResponses =
target?.kind === 'acp' ? null : extractApprovalResponses(messages)
if (approvalResponses) {
return {
api: `${agentUrlRef.current}/chat`,
body: buildChatRequestBody({
conversationId: conversationIdRef.current,
provider,
mode: currentMode,
browserContext: requestBrowserContext,
userSystemPrompt,
userWorkingDir: workingDirRef.current,
previousConversation,
declinedApps,
aclRules: enabledAclRules,
toolApprovalConfig: approvalConfig,
toolApprovalResponses: approvalResponses,
}),
}
return buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: agentUrlRef.current ?? undefined,
target,
fallbackProvider,
...commonRequest,
approvalResponses,
})
}
const message = getLastMessageText(messages)
const result = {
api: `${agentUrlRef.current}/chat`,
body: buildChatRequestBody({
message,
conversationId: conversationIdRef.current,
provider,
mode: currentMode,
browserContext: requestBrowserContext,
userSystemPrompt,
userWorkingDir: workingDirRef.current,
previousConversation,
declinedApps,
aclRules: enabledAclRules,
selectedText: activeTabSelection?.text,
selectedTextSource: activeTabSelection
? {
url: activeTabSelection.url,
title: activeTabSelection.title,
}
: undefined,
toolApprovalConfig: approvalConfig,
}),
}
const result = buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: agentUrlRef.current ?? undefined,
target,
fallbackProvider,
message,
...commonRequest,
selectedText: activeTabSelection?.text,
selectedTextSource: activeTabSelection
? {
url: activeTabSelection.url,
title: activeTabSelection.title,
}
: undefined,
})
// Track which tab's selection was sent so we can clear it on success
pendingSelectionTabKeyRef.current =
@@ -451,7 +443,7 @@ export const useChatSession = (options?: ChatSessionOptions) => {
sendAutomaticallyWhen: () => {
if (approvalJustRespondedRef.current) {
approvalJustRespondedRef.current = false
return true
return selectedChatTargetRef.current?.kind !== 'acp'
}
return false
},
@@ -686,10 +678,22 @@ export const useChatSession = (options?: ChatSessionOptions) => {
}, [dispatchMessage, isIntegrationsSynced])
const sendMessage = (params: { text: string; action?: ChatAction }) => {
const target = selectedChatTargetRef.current
const llmTargetProvider = toLlmProviderConfig(target)
const agentTarget = target?.kind === 'acp' ? target : undefined
track(MESSAGE_SENT_EVENT, {
mode,
provider_type: selectedLlmProvider?.type,
model: selectedLlmProvider?.modelId,
provider_id:
agentTarget?.agentId ??
llmTargetProvider?.id ??
selectedLlmProvider?.id,
provider_type: agentTarget ? 'acp' : llmTargetProvider?.type,
agent_id: agentTarget?.agentId,
adapter: agentTarget?.adapter,
model:
agentTarget?.modelId ??
llmTargetProvider?.modelId ??
selectedLlmProvider?.modelId,
})
if (!isIntegrationsSyncedRef.current) {
@@ -741,14 +745,54 @@ export const useChatSession = (options?: ChatSessionOptions) => {
addToolApprovalResponse(params)
}
const resetConversationState = () => {
stop()
void finishExecutionTask({ isAbort: true })
setConversationId(crypto.randomUUID())
setMessages([])
setTextToAction(new Map())
setLiked({})
setDisliked({})
setRestoredConversationId(null)
resetRemoteConversation()
}
const handleSelectProvider = (provider: Provider) => {
const fullProvider = llmProviders.find((p) => p.id === provider.id)
const target = chatTargets.find(
(candidate) =>
candidate.id === provider.id && candidate.kind === provider.kind,
)
if (!target) return
const previousTarget = selectedChatTargetRef.current
track(PROVIDER_SELECTED_EVENT, {
provider_id: provider.id,
provider_type: provider.type,
model_id: fullProvider?.modelId,
provider_id: target.id,
provider_type: target.kind === 'acp' ? 'acp' : target.type,
model_id:
target.kind === 'acp' ? target.modelId : target.provider.modelId,
agent_id: target.kind === 'acp' ? target.agentId : undefined,
adapter: target.kind === 'acp' ? target.adapter : undefined,
})
setDefaultProvider(provider.id)
void selectChatTarget(target).catch((error) => {
sentry.captureException(error, {
extra: {
message: 'Failed to persist sidepanel chat target selection',
targetId: target.id,
targetKind: target.kind,
},
})
})
if (target.kind === 'llm') setDefaultProvider(target.provider.id)
if (
previousTarget &&
(previousTarget.kind !== target.kind ||
previousTarget.id !== target.id) &&
messagesRef.current.length > 0
) {
resetConversationState()
}
}
const getActionForMessage = (message: UIMessage) => {
@@ -762,15 +806,7 @@ export const useChatSession = (options?: ChatSessionOptions) => {
const resetConversation = () => {
track(CONVERSATION_RESET_EVENT, { message_count: messages.length })
stop()
void finishExecutionTask({ isAbort: true })
setConversationId(crypto.randomUUID())
setMessages([])
setTextToAction(new Map())
setLiked({})
setDisliked({})
setRestoredConversationId(null)
resetRemoteConversation()
resetConversationState()
}
const isRestoringConversation =

View File

@@ -0,0 +1,74 @@
import type { Provider } from '../../../components/chat/chatComponentTypes'
import type { LlmProviderConfig } from '../../../lib/llm-providers/types'
import {
type ApprovalResponseData,
buildChatRequestBody,
} from '../../../lib/messaging/server/buildChatRequestBody'
import {
type SidepanelChatTarget,
toLlmProviderConfig,
} from './sidepanel-chat-targets'
type LlmChatRequestBodyInput = Parameters<typeof buildChatRequestBody>[0]
type CommonSidepanelRequestInput = Omit<
LlmChatRequestBodyInput,
'provider' | 'message' | 'toolApprovalResponses' | 'isScheduledTask'
>
interface BuildSidepanelPreparedSendMessagesRequestInput
extends CommonSidepanelRequestInput {
agentServerUrl: string | undefined
target: SidepanelChatTarget | undefined
fallbackProvider: LlmProviderConfig
message?: string
approvalResponses?: ApprovalResponseData[] | null
}
export function buildSidepanelPreparedSendMessagesRequest({
agentServerUrl,
target,
fallbackProvider,
message,
approvalResponses,
...common
}: BuildSidepanelPreparedSendMessagesRequestInput) {
if (target?.kind === 'acp') {
return {
api: `${agentServerUrl}/agents/${encodeURIComponent(target.agentId)}/sidepanel/chat`,
body: {
conversationId: common.conversationId,
message: message ?? '',
browserContext: common.browserContext,
userSystemPrompt: common.userSystemPrompt,
userWorkingDir: common.userWorkingDir,
selectedText: common.selectedText,
selectedTextSource: common.selectedTextSource,
},
}
}
const provider = toLlmProviderConfig(target) ?? fallbackProvider
return {
api: `${agentServerUrl}/chat`,
body: buildChatRequestBody({
...common,
provider,
message,
toolApprovalResponses: approvalResponses ?? undefined,
}),
}
}
export function toProviderOption(target: SidepanelChatTarget): Provider {
return {
id: target.id,
name: target.name,
type: target.type,
kind: target.kind,
agentId: target.kind === 'acp' ? target.agentId : undefined,
adapterName: target.kind === 'acp' ? target.adapterName : undefined,
modelLabel: target.kind === 'acp' ? target.modelLabel : undefined,
modelControl: target.kind === 'acp' ? target.modelControl : undefined,
}
}

View File

@@ -59,15 +59,3 @@ export interface AgentConversation {
createdAt: number
updatedAt: number
}
export interface AgentCardData {
agentId: string
name: string
model?: string
status: 'idle' | 'working' | 'error'
lastMessage?: string
lastMessageTimestamp?: number
activitySummary?: string
currentTool?: string
costUsd?: number
}

View File

@@ -2,29 +2,75 @@ function isAbortError(error: unknown): boolean {
return error instanceof DOMException && error.name === 'AbortError'
}
export interface ParsedSSEEvent<T> {
data: T
/** Numeric `id:` line on the same SSE event, if any. */
seq?: number
}
export function parseSSELines<T>(buffer: string): {
events: T[]
events: ParsedSSEEvent<T>[]
remainder: string
} {
// SSE events are separated by blank lines. Buffer lines until we hit
// a blank, then assemble each event. Lines we recognise: `id: <n>`
// and `data: <payload>`. Everything else is ignored.
const events: ParsedSSEEvent<T>[] = []
const lines = buffer.split('\n')
const remainder = lines.pop() ?? ''
const events: T[] = []
for (const line of lines) {
if (!line.startsWith('data: ')) continue
const payload = line.slice(6)
if (payload === '[DONE]') continue
try {
events.push(JSON.parse(payload) as T)
} catch {}
// Find the last blank-line boundary; everything after it is the
// remainder (next event partially received).
let lastBoundary = -1
for (let i = lines.length - 1; i >= 0; i--) {
if (lines[i] === '') {
lastBoundary = i
break
}
}
const completeLines = lastBoundary >= 0 ? lines.slice(0, lastBoundary) : []
const remainder =
lastBoundary >= 0 ? lines.slice(lastBoundary + 1).join('\n') : buffer
let currentSeq: number | undefined
let currentData: string | null = null
const flush = () => {
if (currentData != null && currentData !== '[DONE]') {
try {
events.push({
data: JSON.parse(currentData) as T,
seq: currentSeq,
})
} catch {
// ignore
}
}
currentSeq = undefined
currentData = null
}
for (const line of completeLines) {
if (line === '') {
flush()
continue
}
if (line.startsWith('id: ')) {
const n = Number.parseInt(line.slice(4).trim(), 10)
if (Number.isFinite(n)) currentSeq = n
continue
}
if (line.startsWith('data: ')) {
currentData = line.slice(6)
}
}
// Catch a complete trailing event with no terminating blank line —
// shouldn't happen in well-formed SSE, but be tolerant.
flush()
return { events, remainder }
}
export async function consumeSSEStream<T>(
response: Response,
onEvent: (event: T) => void,
onEvent: (event: T, meta: { seq?: number }) => void,
signal?: AbortSignal,
): Promise<void> {
const reader = response.body?.getReader()
@@ -49,7 +95,7 @@ export async function consumeSSEStream<T>(
buffer = remainder
for (const event of events) {
onEvent(event)
onEvent(event.data, { seq: event.seq })
}
}
} catch (error) {
@@ -64,7 +110,7 @@ export async function consumeSSEStream<T>(
if (buffer) {
const { events } = parseSSELines<T>(buffer)
for (const event of events) {
onEvent(event)
onEvent(event.data, { seq: event.seq })
}
}
}

View File

@@ -9,6 +9,7 @@
"build": "bun run codegen && wxt build",
"build:dev": "bun --env-file=.env.development wxt build --mode development",
"zip": "wxt zip",
"test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
"compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
"lint": "bunx biome check",
"typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",

View File

@@ -0,0 +1,51 @@
# Copy to .env.development for local eval runs.
# Provider keys used by existing config files.
OPENROUTER_API_KEY=
FIREWORKS_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_GENERATIVE_AI_API_KEY=
# Claude Agent SDK token used by performance_grader.
CLAUDE_CODE_OAUTH_TOKEN=
# Suite-mode model selection.
EVAL_VARIANT=local
EVAL_AGENT_PROVIDER=openai-compatible
EVAL_AGENT_MODEL=
EVAL_AGENT_API_KEY=
EVAL_AGENT_BASE_URL=
EVAL_AGENT_SUPPORTS_IMAGES=true
# Optional suite-mode executor override for orchestrator suites.
EVAL_EXECUTOR_MODEL=
EVAL_EXECUTOR_API_KEY=
EVAL_EXECUTOR_BASE_URL=
# Clado visual action executor.
CLADO_ACTION_MODEL=
CLADO_ACTION_API_KEY=
CLADO_ACTION_BASE_URL=
# Backward-compatible alias used by older local scripts.
CLADO_ACTION_URL=
# BrowserOS runner.
BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
BROWSEROS_SERVER_URL=http://127.0.0.1:9110
BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
BROWSEROS_CONFIG_URL=
# Captcha solver extension.
NOPECHA_API_KEY=
# WebArena-Infinity.
WEBARENA_INFINITY_DIR=
INFINITY_APP_URL=
# R2 publishing and weekly report.
EVAL_R2_ACCOUNT_ID=
EVAL_R2_ACCESS_KEY_ID=
EVAL_R2_SECRET_ACCESS_KEY=
EVAL_R2_BUCKET=browseros-eval
EVAL_R2_CDN_BASE_URL=https://eval.browseros.com

View File

@@ -14,6 +14,7 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
```bash
cd apps/eval
cp .env.example .env.development
# Edit .env.development with your keys, then:
bun run eval
```
@@ -23,11 +24,55 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
### CLI mode
```bash
bun run eval -c configs/browseros-agent-weekly.json
bun run eval -c configs/legacy/browseros-agent-weekly.json
bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
```
Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.
The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
```bash
bun run eval run --config configs/legacy/browseros-agent-weekly.json
bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
```
Config files live in two groups:
```txt
configs/legacy/ # Complete EvalConfig files used by older workflows and the dashboard
configs/suites/ # Suite definitions; model/provider comes from CLI flags or env
```
Suite mode takes model settings from CLI flags first, then env:
```bash
EVAL_VARIANT=kimi-fireworks \
EVAL_AGENT_PROVIDER=openai-compatible \
EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
```
### Suites and variants
A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
This lets us run the same suite against multiple model setups without copying the benchmark config:
```txt
agisdk-daily-10 + kimi-fireworks
agisdk-daily-10 + claude-sonnet
agisdk-daily-10 + clado-action-000159
```
For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
## Agent types
| Type | Description |
@@ -66,9 +111,9 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
},
"executor": {
"provider": "clado-action",
"model": "qwen3-vl-30b-a3b-instruct",
"model": "Qwen3.5-35B-A3B-action-000159-merged",
"apiKey": "",
"baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
"baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
}
}
}
@@ -96,6 +141,20 @@ The `apiKey` field supports two formats:
- **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
- **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)
### Environment variables
| Variable | Used for |
|----------|----------|
| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
### Supported providers
| Provider | `provider` value | Requires `baseUrl` |
@@ -110,6 +169,22 @@ The `apiKey` field supports two formats:
| Ollama | `ollama` | No |
| Clado Action (executor only) | `clado-action` | Yes |
### R2 publishing
`suite --config ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to the viewer-compatible R2 layout:
```bash
export EVAL_R2_ACCOUNT_ID=...
export EVAL_R2_ACCESS_KEY_ID=...
export EVAL_R2_SECRET_ACCESS_KEY=...
export EVAL_R2_BUCKET=browseros-eval
export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
```
`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
### BrowserOS infrastructure
```json
@@ -137,10 +212,12 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP
| File | Tasks | Description |
|------|-------|-------------|
| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
| `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
| `mind2web.jsonl` | 300 | Online-Mind2Web |
| `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
| `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
| `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
| `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
@@ -167,14 +244,47 @@ results/
browseros-agent-weekly/
2026-04-29-1430/
Amazon--0/
attempt.json # Stable attempt summary for viewer/reporting
metadata.json # Task result, timing, grader scores
grades.json # Compact grader results
messages.jsonl # Full message log
grader-artifacts/ # Grader-specific inputs/outputs/stderr
screenshots/
001.png # Step-by-step screenshots
002.png
summary.json # Aggregate pass rates
```
R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
### R2 viewer manifest
`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
```json
{
"schemaVersion": 2,
"runId": "agisdk-real-smoke-2026-04-30-0000",
"tasks": [
{
"queryId": "agisdk-dashdish-10",
"paths": {
"metadata": "tasks/agisdk-dashdish-10/metadata.json",
"messages": "tasks/agisdk-dashdish-10/messages.jsonl",
"grades": "tasks/agisdk-dashdish-10/grades.json",
"trace": "tasks/agisdk-dashdish-10/trace.jsonl",
"screenshots": "tasks/agisdk-dashdish-10/screenshots",
"graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
}
}
]
}
```
The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
## Troubleshooting
**BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.

View File

@@ -7,8 +7,8 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/agisdk-real.jsonl",
"num_workers": 10,
"dataset": "../../data/agisdk-real-smoke.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",

View File

@@ -0,0 +1,26 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "accounts/fireworks/models/kimi-k2p5",
"apiKey": "FIREWORKS_API_KEY",
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 4,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/webbench-2of4-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {

View File

@@ -14,7 +14,7 @@
"baseUrl": "https://api.fireworks.ai/inference/v1"
}
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/webbench-2of4-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {

View File

@@ -9,12 +9,12 @@
},
"executor": {
"provider": "clado-action",
"model": "qwen3-vl-30b-a3b-instruct",
"model": "Qwen3.5-35B-A3B-action-000159-merged",
"apiKey": "",
"baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
"baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
}
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
@@ -23,11 +23,11 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["performance_grader"],
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/webarena-infinity-hard-50.jsonl",
"dataset": "../../data/webarena-infinity-hard-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {

View File

@@ -5,7 +5,7 @@
"model": "openai/gpt-4.1",
"apiKey": "OPENROUTER_API_KEY"
},
"dataset": "../data/mind2web.jsonl",
"dataset": "../../data/mind2web.jsonl",
"num_workers": 5,
"restart_server_per_task": true,
"browseros": {

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../data/webvoyager.jsonl",
"dataset": "../../data/webvoyager.jsonl",
"num_workers": 3,
"restart_server_per_task": true,
"browseros": {

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-daily-10",
"dataset": "../../data/agisdk-daily-10.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-real-smoke",
"dataset": "../../data/agisdk-real-smoke.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-real",
"dataset": "../../data/agisdk-real.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,10 @@
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}

View File

@@ -0,0 +1 @@
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}

View File

@@ -32,9 +32,5 @@
{"query_id": "agisdk-networkin-10", "dataset": "agisdk-real", "query": "Generate a polite follow-up message for a previous unanswered chat, starting with \"Following up on\".", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-10", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-10", "challenge_type": "action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-gomail-3", "dataset": "agisdk-real", "query": "Compose a new email to jonathan.smith@example.com with the subject \"Meeting Notes\" and body \"Please find the meeting notes attached.\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-3", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-3", "challenge_type": "action", "difficulty": "easy", "similar_to": "Gmail"}}}
{"query_id": "agisdk-udriver-6", "dataset": "agisdk-real", "query": "Me and 4 friends need a ride from the Palace Hotel to dinner at Osha Thai leaving now", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-6", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-6", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-staynb-9", "dataset": "agisdk-real", "query": "Book a stay with the maximum number of guests supported. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-9", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-9", "challenge_type": "action", "difficulty": "hard", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-zilloft-3", "dataset": "agisdk-real", "query": "Find a home in San Diego priced under $150,000 with at least 2 bedrooms and request a tour. Use these details: Contact Name: John Doe, Email: johndoe@example.com, Phone: 555-123-4567, Tour Time: 2:00 PM, Tour Date: First available.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-3", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-3", "challenge_type": "retrieval-action", "difficulty": "easy", "similar_to": "Zillow"}}}
{"query_id": "agisdk-fly-unified-6", "dataset": "agisdk-real", "query": "Reserve me a seat for the flight from Austin to Pittsburgh departing on December 11th, 2024 at 8:00 in Basic Economy.\nPassenger: Alice Brown\nDate of Birth: 05/20/1992\nSex: Female\nSeat Selection: Yes (Aisle seat)\nPayment: Credit Card (378342143523967), Exp: 09/27, security code: 332 Address: 789 Pine St, Los Angeles, CA, 90012, USA, Phone: 555-456-7890, Email: alicebrown@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-6", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-opendining-3", "dataset": "agisdk-real", "query": "Book a table at \"The Royal Dine\" for a party of 4 on July 20, 2024, at 7 PM. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-3", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-3", "challenge_type": "action", "difficulty": "easy", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-gocalendar-7", "dataset": "agisdk-real", "query": "Reschedule the \"Morning Coffee with sister\" event from July 18, 2024, at 9 AM to July 19, 2024, at 10AM using drag-and-drop functionality", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-7", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-7", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-staynb-5", "dataset": "agisdk-real", "query": "Use the search bar to look for a stay. For the \"Where\" section, use the \"Search by region\" popover and select \"Europe\". Set the check-in date to October 13th and the check-out date to October 23rd. For the \"Who\" section, select 1 infant, 2 children, and 2 adults. Press the search button, select the first stay, and book it.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-5", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-5", "challenge_type": "action", "difficulty": "medium", "similar_to": "Airbnb"}}}

View File

@@ -5,6 +5,7 @@
"type": "module",
"scripts": {
"eval": "bun --env-file=.env.development run src/index.ts",
"test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
"typecheck": "tsc --noEmit"
},
"dependencies": {

View File

@@ -64,6 +64,37 @@ EXCLUDED_TASKS = {
# the grader's first criterion (search history contains "stanford") was
# never triggered server-side. Eval-site bug.
"networkin-9",
# Goal text instructs "move event to July 19, 10 AM" but the grader expects
# `eventsDiff.updated.*.start == "2024-07-18T17:00:00Z"` (= July 18, 10 AM
# PDT — same day, 1 hour shift). Goal contradicts grader: following the
# goal yields July 19 timestamps; satisfying the grader requires ignoring
# the explicit "to July 19" instruction. Confirmed via 8-trial deep-dive:
# never passed even after the Phase 2 HTML5 dnd dispatch fix made the drag
# actually populate `eventsDiff.updated` (now produces July 19 values, but
# grader rejects them).
"gocalendar-7",
# Grader hardcodes literal year strings `'Oct 13 2025'` / `'Oct 23 2025'`
# in checkin/checkout criteria. Today is 2026, and the staynb date picker
# interprets bare "Oct 13" as the most recent past instance — currently
# 2024, not 2025. Even a perfectly-acting agent cannot produce a booking
# whose persisted date contains "2025". Confirmed via 8 trials, 0 passes.
"staynb-5",
# Goal says "maximum number of guests supported"; grader expects the very
# specific string "32 Guests, 16 Infants" — which requires the agent to
# know that (a) Adults+Children sum into the displayed "Guests" count,
# (b) Infants render separately, (c) Pets are excluded, (d) per-category
# cap is 16 despite no UI affordance signalling it. None of this is in
# the prompt. 8 trials, 0 passes; even Opus 4.6 stopped at 16 (one
# category maxed). Task is under-specified relative to grader expectation.
"staynb-9",
# Grader requires `contains(booking.date, '2024-07-20')` but the eval-site
# date picker is a React-controlled textbox that the agent's `fill` tool
# frequently no-ops on. 3 of 8 trials passed (when fill happened to stick),
# 5 failed with `actual_value: False` (booking persisted with the eval-site
# default search date, not Jul 20). Effectively a coin-flip task that
# exercises tool-fidelity flakiness rather than agent capability —
# contributes noise, not signal. Excluding for eval reliability.
"opendining-3",
}
# Far-future replacement used by `freshen_goal_dates` when a task's hardcoded

View File

@@ -1,34 +1,73 @@
/**
* Test script for Clado API endpoints (grounding + action models)
* Smoke-test for the Clado BrowserOS Action endpoint.
*
* Health-checks the model, then runs a generate call and prints every
* field the new contract documents (action, coordinates, text, key,
* direction, scroll/drag fields, wait, end+final_answer, thinking,
* parse_error, raw_response).
*
* Usage:
* bun apps/eval/scripts/test-clado-api.ts [screenshot-path]
*
* If no screenshot provided, captures one from a running BrowserOS server.
* If no screenshot path is given, captures one over MCP from a
* running BrowserOS server (default http://127.0.0.1:9110, override
* with BROWSEROS_URL).
*
* Cold start can take ~5 minutes; the script waits up to 6.
*/
import { readFile } from 'node:fs/promises'
import { resolve } from 'node:path'
const ACTION_URL =
'https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run'
'https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run'
const ACTION_HEALTH_URL =
'https://clado-ai--clado-browseros-action-actionmodel-health.modal.run'
const GROUNDING_URL =
'https://clado-ai--clado-browseros-grounding-groundingmodel-generate.modal.run'
const GROUNDING_HEALTH_URL =
'https://clado-ai--clado-browseros-grounding-groundingmodel-health.modal.run'
'https://clado-ai--clado-browseros-action-000159-merged-actionmod-5e5033.modal.run'
async function checkHealth(name: string, url: string): Promise<boolean> {
console.log(`\n--- ${name} health check ---`)
console.log(` URL: ${url}`)
const COLD_START_BUDGET_MS = 360_000 // 6 min — Clado cold start is ~5 min
const COLD_START_WARN_MS = 30_000
interface CladoResponse {
action?: string | null
thinking?: string | null
raw_response?: string
parse_error?: string | null
inference_time_seconds?: number
x?: number
y?: number
text?: string
key?: string
direction?: string
amount?: number
startX?: number
startY?: number
endX?: number
endY?: number
time?: number
final_answer?: string | null
}
async function checkHealth(): Promise<boolean> {
console.log(`\n--- Action model health ---`)
console.log(` URL: ${ACTION_HEALTH_URL}`)
console.log(
` Note: cold start can take ~5 min; waiting up to ${COLD_START_BUDGET_MS / 1000}s.`,
)
const start = performance.now()
const warn = setTimeout(() => {
console.log(
` ...still waiting (${COLD_START_WARN_MS / 1000}s in) — model is likely cold-starting on Modal.`,
)
}, COLD_START_WARN_MS)
try {
const resp = await fetch(url, { signal: AbortSignal.timeout(30_000) })
const resp = await fetch(ACTION_HEALTH_URL, {
signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
})
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
const body = await resp.text()
console.log(` Status: ${resp.status} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 200)}`)
console.log(` Body: ${body.slice(0, 400)}`)
return resp.ok
} catch (err) {
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
@@ -36,63 +75,34 @@ async function checkHealth(name: string, url: string): Promise<boolean> {
` FAILED (${elapsed}s): ${err instanceof Error ? err.message : err}`,
)
return false
} finally {
clearTimeout(warn)
}
}
async function testGenerate(
name: string,
url: string,
async function generate(
label: string,
payload: Record<string, unknown>,
): Promise<Record<string, unknown> | null> {
console.log(`\n--- ${name} generate ---`)
console.log(` URL: ${url}`)
): Promise<CladoResponse | null> {
console.log(`\n--- ${label} ---`)
console.log(` URL: ${ACTION_URL}`)
console.log(` Instruction: ${payload.instruction}`)
console.log(
` Image size: ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
` Image size: ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
)
if (payload.history) console.log(` History: ${payload.history}`)
if (payload.history && payload.history !== 'None') {
console.log(` History: ${payload.history}`)
}
const start = performance.now()
let resp: Response
try {
const resp = await fetch(url, {
resp = await fetch(ACTION_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
signal: AbortSignal.timeout(120_000),
signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
})
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
if (!resp.ok) {
const body = await resp.text()
console.log(` FAILED: HTTP ${resp.status} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 400)}`)
return null
}
const result = (await resp.json()) as Record<string, unknown>
console.log(` Status: ${resp.status} (${elapsed}s)`)
console.log(` Action: ${result.action}`)
if (result.x !== null && result.x !== undefined)
console.log(` Coordinates: (${result.x}, ${result.y})`)
if (result.text)
console.log(` Text: ${(result.text as string).slice(0, 100)}`)
if (result.key) console.log(` Key: ${result.key}`)
if (result.inference_time_seconds)
console.log(` Inference: ${result.inference_time_seconds}s`)
// Show thinking if present
const raw = result.raw_response as string | undefined
if (raw) {
const thinkMatch = raw.match(/<thinking>([\s\S]*?)<\/thinking>/)
if (thinkMatch) {
const thinking = thinkMatch[1].trim()
console.log(
` Thinking: ${thinking.slice(0, 200)}${thinking.length > 200 ? '...' : ''}`,
)
}
}
return result
} catch (err) {
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
console.log(
@@ -100,6 +110,50 @@ async function testGenerate(
)
return null
}
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
if (!resp.ok) {
const body = await resp.text()
console.log(` HTTP ${resp.status} ${resp.statusText} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 400)}`)
return null
}
const result = (await resp.json()) as CladoResponse
console.log(` HTTP ${resp.status} (${elapsed}s)`)
console.log(` action: ${result.action ?? 'null'}`)
if (result.parse_error) {
console.log(` parse_error: ${result.parse_error}`)
}
if (result.thinking) {
const trimmed = result.thinking.replace(/\s+/g, ' ').trim()
console.log(
` thinking: ${trimmed.slice(0, 240)}${trimmed.length > 240 ? '…' : ''}`,
)
}
if (typeof result.x === 'number' || typeof result.y === 'number') {
console.log(` x, y: ${result.x}, ${result.y}`)
}
if (typeof result.text === 'string')
console.log(` text: ${result.text.slice(0, 120)}`)
if (typeof result.key === 'string')
console.log(` key: ${result.key}`)
if (typeof result.direction === 'string')
console.log(` direction: ${result.direction}`)
if (typeof result.amount === 'number')
console.log(` amount: ${result.amount}`)
if (typeof result.startX === 'number' || typeof result.endX === 'number') {
console.log(
` drag: (${result.startX}, ${result.startY}) → (${result.endX}, ${result.endY})`,
)
}
if (typeof result.time === 'number')
console.log(` time: ${result.time}s`)
if (result.final_answer)
console.log(` final_answer: ${result.final_answer.slice(0, 240)}`)
if (typeof result.inference_time_seconds === 'number')
console.log(` inference_time_seconds: ${result.inference_time_seconds}`)
return result
}
async function loadScreenshot(path?: string): Promise<string> {
@@ -110,10 +164,9 @@ async function loadScreenshot(path?: string): Promise<string> {
return data.toString('base64')
}
// Try to capture from a running BrowserOS server
const serverUrl = process.env.BROWSEROS_URL || 'http://127.0.0.1:9110'
console.log(
`No screenshot path provided. Trying to capture from ${serverUrl}...`,
`No screenshot path provided. Capturing from ${serverUrl} via MCP...`,
)
const { Client } = await import('@modelcontextprotocol/sdk/client/index.js')
@@ -134,82 +187,101 @@ async function loadScreenshot(path?: string): Promise<string> {
arguments: { format: 'png', page: 1 },
})) as { content: Array<{ type: string; data?: string }> }
const imageContent = result.content?.find((c) => c.type === 'image')
if (!imageContent?.data)
throw new Error('No image data in screenshot response')
const image = result.content?.find((c) => c.type === 'image')
if (!image?.data)
throw new Error('No image data in take_screenshot response')
console.log(
`Captured screenshot (${(imageContent.data.length / 1024).toFixed(0)} KB base64)`,
`Captured screenshot (${(image.data.length / 1024).toFixed(0)} KB base64)`,
)
return imageContent.data
return image.data
} finally {
try {
await transport.close()
} catch {}
} catch {
/* ignore */
}
}
}
function summarize(history: CladoResponse[]): string {
if (history.length === 0) return 'None'
return history
.map((h) => {
switch (h.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${h.action}(${h.x}, ${h.y})`
case 'type':
return `type(${JSON.stringify(h.text ?? '')})`
case 'press_key':
return `press_key(${JSON.stringify(h.key ?? '')})`
case 'scroll':
return `scroll(${h.direction ?? 'down'})`
case 'drag':
return `drag(${h.startX},${h.startY} -> ${h.endX},${h.endY})`
case 'wait':
return `wait(${h.time ?? 1}s)`
case 'end':
return 'end()'
default:
return h.action ?? 'invalid'
}
})
.join(' -> ')
}
async function main() {
const screenshotPath = process.argv[2]
console.log('=== Clado action endpoint smoke test ===')
console.log('=== Clado API Test ===\n')
// Health checks (parallel)
const [actionHealthy, groundingHealthy] = await Promise.all([
checkHealth('Action Model', ACTION_HEALTH_URL),
checkHealth('Grounding Model', GROUNDING_HEALTH_URL),
])
if (!actionHealthy && !groundingHealthy) {
console.log('\nBoth endpoints are down. Exiting.')
const healthy = await checkHealth()
if (!healthy) {
console.log('\nHealth check failed. Exiting.')
process.exit(1)
}
// Load screenshot
let imageBase64: string
try {
imageBase64 = await loadScreenshot(screenshotPath)
imageBase64 = await loadScreenshot(process.argv[2])
} catch (err) {
console.log(
`\nFailed to load screenshot: ${err instanceof Error ? err.message : err}`,
)
console.log(
'Provide a screenshot path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
'Pass a path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
)
process.exit(1)
}
const instruction = 'Click on the search button or search bar'
const history: CladoResponse[] = []
// Test grounding model
if (groundingHealthy) {
await testGenerate('Grounding Model', GROUNDING_URL, {
instruction,
// Step 1: open task — let the model decide what to do.
const step1 = await generate('Step 1: cold task', {
instruction: 'Find the search bar and click it',
image_base64: imageBase64,
history: 'None',
})
if (step1?.action) history.push(step1)
// Step 2: continuation with history, asks for typing.
if (step1?.action) {
const step2 = await generate('Step 2: with history', {
instruction: 'Type "hello world" into the search bar',
image_base64: imageBase64,
history: summarize(history),
})
} else {
console.log('\nSkipping grounding model (unhealthy)')
if (step2?.action) history.push(step2)
}
// Test action model (no history)
if (actionHealthy) {
const result = await testGenerate('Action Model (step 1)', ACTION_URL, {
instruction,
image_base64: imageBase64,
history: 'None',
})
// Test action model with history (simulate multi-turn)
if (result && result.action === 'click') {
await testGenerate('Action Model (step 2, with history)', ACTION_URL, {
instruction: 'Type "hello world" in the search bar',
image_base64: imageBase64,
history: `click(${result.x}, ${result.y})`,
})
}
} else {
console.log('\nSkipping action model (unhealthy)')
}
// Step 3: ask for end with a final answer to exercise that field.
await generate('Step 3: ask for end+final_answer', {
instruction:
'You have completed the task. Reply with end() and final_answer="done".',
image_base64: imageBase64,
history: summarize(history),
})
console.log('\n=== Done ===')
}

View File

@@ -1,349 +1,43 @@
#!/usr/bin/env bun
/**
* Upload eval runs to R2.
*
* Two modes:
* bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
* → uploads that specific run
*
* bun scripts/upload-run.ts results/browseros-agent-weekly
* → finds all timestamped subfolders, uploads any not yet in R2
*
* Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
* EVAL_R2_BUCKET (default: browseros-eval)
* EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
*/
import { readdir, readFile, stat } from 'node:fs/promises'
import { basename, dirname, extname, join } from 'node:path'
import {
GetObjectCommand,
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3'
loadR2ConfigFromEnv,
R2Publisher,
} from '../src/publishing/r2-publisher'
const CONCURRENCY = 20
const CONTENT_TYPES: Record<string, string> = {
'.json': 'application/json',
'.jsonl': 'application/x-ndjson',
'.png': 'image/png',
}
interface R2Config {
accountId: string
accessKeyId: string
secretAccessKey: string
bucket: string
cdnBaseUrl: string
}
function loadConfig(): R2Config {
const accountId = process.env.EVAL_R2_ACCOUNT_ID
const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
if (!accountId || !accessKeyId || !secretAccessKey) {
console.error(
'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
)
process.exit(1)
}
return {
accountId,
accessKeyId,
secretAccessKey,
bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
cdnBaseUrl: (
process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
).replace(/\/+$/, ''),
}
}
function createClient(config: R2Config): S3Client {
return new S3Client({
region: 'auto',
endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
credentials: {
accessKeyId: config.accessKeyId,
secretAccessKey: config.secretAccessKey,
},
})
}
async function upload(
client: S3Client,
bucket: string,
key: string,
body: Buffer,
contentType: string,
) {
await client.send(
new PutObjectCommand({
Bucket: bucket,
Key: key,
Body: body,
ContentType: contentType,
}),
)
}
async function collectFiles(dir: string): Promise<string[]> {
const files: string[] = []
const entries = await readdir(dir, { withFileTypes: true })
for (const entry of entries) {
const full = join(dir, entry.name)
if (entry.isDirectory()) {
files.push(...(await collectFiles(full)))
} else {
files.push(full)
}
}
return files
}
async function runPool<T>(
items: T[],
concurrency: number,
fn: (item: T) => Promise<void>,
) {
let i = 0
const workers = Array.from({ length: concurrency }, async () => {
while (i < items.length) {
const idx = i++
await fn(items[idx])
}
})
await Promise.all(workers)
}
// Check if a run has already been uploaded to R2
async function isUploaded(
client: S3Client,
bucket: string,
runId: string,
): Promise<boolean> {
try {
await client.send(
new GetObjectCommand({
Bucket: bucket,
Key: `runs/${runId}/manifest.json`,
}),
)
return true
} catch {
return false
}
}
// Detect if a directory is a run dir (has task subdirs with metadata.json)
// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
async function isRunDir(dir: string): Promise<boolean> {
const entries = await readdir(dir, { withFileTypes: true })
const subdirs = entries.filter((e) => e.isDirectory())
for (const subdir of subdirs) {
const metaPath = join(dir, subdir.name, 'metadata.json')
const metaStat = await stat(metaPath).catch(() => null)
if (metaStat?.isFile()) return true
}
return false
}
async function uploadSingleRun(
runDir: string,
runId: string,
r2Config: R2Config,
client: S3Client,
): Promise<void> {
const taskDirs = await readdir(runDir, { withFileTypes: true })
const taskEntries = taskDirs.filter((d) => d.isDirectory())
if (taskEntries.length === 0) {
console.warn(` No task subdirectories in ${runId}, skipping`)
return
}
const manifestTasks: Record<string, unknown>[] = []
const jobs: { key: string; filePath: string; contentType: string }[] = []
// Extract agent config from first task
let agentConfig: Record<string, unknown> | undefined
let dataset: string | undefined
for (const taskDir of taskEntries) {
const taskId = taskDir.name
const taskPath = join(runDir, taskId)
const metaPath = join(taskPath, 'metadata.json')
let meta: Record<string, unknown> = {}
try {
meta = JSON.parse(await readFile(metaPath, 'utf-8'))
} catch {
continue
}
if (!agentConfig && meta.agent_config)
agentConfig = meta.agent_config as Record<string, unknown>
if (!dataset && meta.dataset) dataset = meta.dataset as string
const files = await collectFiles(taskPath)
let screenshotCount = 0
for (const file of files) {
const relative = file.slice(taskPath.length + 1)
const ext = extname(file)
if (relative.startsWith('screenshots/') && ext === '.png')
screenshotCount++
jobs.push({
key: `runs/${runId}/${taskId}/${relative}`,
filePath: file,
contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
})
}
manifestTasks.push({
queryId: meta.query_id || taskId,
query: meta.query || '',
startUrl: meta.start_url || '',
status:
meta.termination_reason === 'completed'
? 'completed'
: meta.termination_reason || 'unknown',
durationMs: meta.total_duration_ms || 0,
screenshotCount: (meta.screenshot_count as number) || screenshotCount,
graderResults: meta.grader_results || {},
})
}
if (manifestTasks.length === 0) {
console.warn(` No completed tasks in ${runId}, skipping`)
return
}
console.log(
` Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
)
let uploaded = 0
await runPool(jobs, CONCURRENCY, async (job) => {
const body = await readFile(job.filePath)
await upload(client, r2Config.bucket, job.key, body, job.contentType)
uploaded++
if (uploaded % 50 === 0 || uploaded === jobs.length) {
console.log(` ${uploaded}/${jobs.length}`)
}
})
// Read summary.json if it exists
let summaryData: Record<string, unknown> | undefined
try {
summaryData = JSON.parse(
await readFile(join(runDir, 'summary.json'), 'utf-8'),
)
} catch {}
// Upload manifest
const manifest = {
runId,
uploadedAt: new Date().toISOString(),
agentConfig,
dataset,
summary: summaryData
? {
passRate: summaryData.passRate,
avgDurationMs: summaryData.avgDurationMs,
}
: undefined,
tasks: manifestTasks,
}
const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
await upload(
client,
r2Config.bucket,
`runs/${runId}/manifest.json`,
manifestBody,
'application/json',
)
// Upload viewer.html to bucket root
const viewerPath = join(
import.meta.dir,
'..',
'src',
'dashboard',
'viewer.html',
)
const viewerBody = await readFile(viewerPath)
await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
console.log(` Uploaded ${uploaded + 2} files`)
console.log(` ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
}
async function main() {
async function main(): Promise<void> {
const inputDir = process.argv[2]
if (!inputDir) {
console.error(
throw new Error(
'Usage:\n' +
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730 (specific run)\n' +
' bun scripts/upload-run.ts results/config-name (all un-uploaded runs)',
)
process.exit(1)
}
const dirStat = await stat(inputDir).catch(() => null)
if (!dirStat?.isDirectory()) {
console.error(`Not a directory: ${inputDir}`)
process.exit(1)
}
const r2Config = loadConfig()
const client = createClient(r2Config)
if (await isRunDir(inputDir)) {
// Single run: results/config-name/2026-03-21-1730
const timestamp = basename(inputDir)
const configName = basename(dirname(inputDir))
const runId = `${configName}-${timestamp}`
console.log(`Uploading run: ${runId}`)
await uploadSingleRun(inputDir, runId, r2Config, client)
} else {
// Config dir: results/config-name/ — upload all un-uploaded runs
const configName = basename(inputDir)
const entries = await readdir(inputDir, { withFileTypes: true })
const runDirs = entries
.filter((e) => e.isDirectory())
.map((e) => e.name)
.sort()
if (runDirs.length === 0) {
console.error('No run subdirectories found')
process.exit(1)
}
console.log(
`Found ${runDirs.length} runs for config "${configName}", checking R2...`,
)
let uploadedCount = 0
for (const dir of runDirs) {
const runId = `${configName}-${dir}`
const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
if (alreadyUploaded) {
console.log(` ${runId}: already uploaded, skipping`)
continue
}
console.log(` ${runId}: uploading...`)
await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
uploadedCount++
}
console.log(
`\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
' bun scripts/upload-run.ts results/config-name',
)
}
const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
const result = await publisher.publishPath(inputDir)
for (const run of result.uploadedRuns) {
console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
console.log(run.viewerUrl)
}
for (const runId of result.skippedRuns) {
console.log(`${runId}: already uploaded, skipping`)
}
console.log(
`Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
)
}
main()
main().catch((error) => {
console.error(error instanceof Error ? error.message : String(error))
process.exit(1)
})

View File

@@ -24,45 +24,11 @@ import {
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3'
interface ManifestTask {
queryId: string
query: string
status: string
durationMs: number
screenshotCount: number
graderResults: Record<string, { pass: boolean; score: number }>
}
interface Manifest {
runId: string
uploadedAt: string
agentConfig?: { type?: string; model?: string }
dataset?: string
summary?: { passRate?: number; avgDurationMs?: number }
tasks: ManifestTask[]
}
interface RunSummary {
runId: string
configName: string
date: string
avgScore: number
total: number
completed: number
failed: number
timeout: number
avgDurationMs: number
model: string
dataset: string
agentType: string
}
const PASS_FAIL_GRADER_ORDER = [
'agisdk_state_diff',
'infinity_state',
'performance_grader',
]
import {
buildRunSummaries,
type ReportManifest,
type RunSummary,
} from '../src/reporting/run-summary'
function requireEnv(name: string): string {
const value = process.env[name]
@@ -87,7 +53,7 @@ const client = new S3Client({
// Step 1: List all manifest.json files in runs/
console.log('Scanning R2 for eval runs...')
const manifests: Manifest[] = []
const manifests: ReportManifest[] = []
let continuationToken: string | undefined
do {
@@ -127,64 +93,9 @@ if (manifests.length === 0) {
}
// Step 2: Build run summaries
const runs: RunSummary[] = manifests
.map((m) => {
const total = m.tasks.length
const completed = m.tasks.filter((t) => t.status === 'completed').length
const failed = m.tasks.filter((t) => t.status === 'failed').length
const timeout = m.tasks.filter((t) => t.status === 'timeout').length
let scoredCount = 0
let scoreSum = 0
for (const task of m.tasks) {
if (!task.graderResults) continue
for (const name of PASS_FAIL_GRADER_ORDER) {
if (task.graderResults[name]) {
scoredCount++
scoreSum += task.graderResults[name].score ?? 0
break
}
}
}
const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
const durations = m.tasks
.filter((t) => t.durationMs > 0)
.map((t) => t.durationMs)
const avgDurationMs =
durations.length > 0
? durations.reduce((a, b) => a + b, 0) / durations.length
: 0
const date = m.uploadedAt
? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
: m.runId.slice(0, 15)
const model = m.agentConfig?.model || 'unknown'
const dataset = m.dataset || m.runId
const agentType = m.agentConfig?.type || 'unknown'
const configName = extractConfigName(m.runId)
return {
runId: m.runId,
configName,
date,
avgScore,
total,
completed,
failed,
timeout,
avgDurationMs,
model,
dataset,
agentType,
}
})
.sort((a, b) => a.date.localeCompare(b.date))
const runs: RunSummary[] = buildRunSummaries(manifests)
// Step 3: Identify unique config groups
// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
// Extract config name by stripping the date-time suffix pattern
function escHtml(s: string): string {
return s
.replace(/&/g, '&amp;')
@@ -193,12 +104,6 @@ function escHtml(s: string): string {
.replace(/"/g, '&quot;')
}
function extractConfigName(runId: string): string {
// "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
// "ci-weekly" → "ci-weekly" (no timestamp, old format)
return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}
const configGroups = [...new Set(runs.map((r) => r.configName))]
const defaultConfig = configGroups.includes('ci-weekly')
? 'ci-weekly'

View File

@@ -1,113 +1,67 @@
import { randomUUID } from 'node:crypto'
import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
import { sleep } from '../../../../utils/sleep'
import type {
ExecutorConfig,
ExecutorResult,
} from '../../../orchestrator-executor/types'
import type { ExecutorCallbacks } from '../../executor-backend'
import {
CLADO_REQUEST_TIMEOUT_MS,
MAX_ACTIONS_PER_DELEGATION,
} from '../../constants'
import { McpClient, type McpToolResult } from '../../utils/mcp-client'
import { sleep } from '../../utils/sleep'
import type { ExecutorCallbacks } from './executor'
import type { ExecutorConfig, ExecutorResult } from './types'
extractCladoThinking,
formatCladoHistory,
getCladoActionSignature,
parseCladoActions,
summarizeCladoPrediction,
} from './clado-actions'
import {
normalizeCladoDirection,
normalizeCladoPressKey,
normalizeCladoScrollAmount,
prepareCladoToolArgs,
resolveCladoPoint,
} from './clado-browser-driver'
import { CladoActionClient } from './clado-client'
import {
CLADO_ACTION_PROVIDER,
type CladoAction,
type CladoActionPoint,
type CladoActionResponse,
type CladoViewport,
isCladoActionProvider,
} from './types'
const CLADO_ACTION_PROVIDER = 'clado-action'
const PAGE_SCOPED_TOOLS = new Set<string>([
'take_screenshot',
'evaluate_script',
'click',
'click_at',
'hover',
'hover_at',
'clear',
'fill',
'press_key',
'type_at',
'drag',
'drag_at',
'scroll',
'handle_dialog',
'select_option',
'navigate_page',
'close_page',
'wait_for',
])
interface CladoActionResponse {
action?: string
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
inference_time_seconds?: number
raw_response?: string
}
interface Viewport {
width: number
height: number
}
interface CladoAction {
action: string
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
}
type RawActionPayload = Partial<CladoAction>
interface ActionPoint {
x: number
y: number
}
const MAX_CONSECUTIVE_PARSE_FAILURES = 3
function asErrorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error)
}
function clampNormalized(value: number): number {
return Math.min(999, Math.max(0, Math.round(value)))
}
function isCladoProvider(provider: string): boolean {
return provider === CLADO_ACTION_PROVIDER
}
export class CladoActionExecutor {
private readonly mcpClient: McpClient
private readonly cladoClient: CladoActionClient
private readonly pageId: number
private callbacks: ExecutorCallbacks = {}
private stepsUsed = 0
private viewport: Viewport | null = null
private lastPoint: ActionPoint | null = null
private viewport: CladoViewport | null = null
private lastPoint: CladoActionPoint | null = null
private currentUrl = ''
constructor(
private readonly config: ExecutorConfig,
config: ExecutorConfig,
serverUrl: string,
readonly _windowId?: number,
readonly _tabId?: number,
initialPageId?: number,
) {
if (!isCladoProvider(config.provider)) {
if (!isCladoActionProvider(config.provider)) {
throw new Error(
`CladoActionExecutor requires provider="${CLADO_ACTION_PROVIDER}"`,
)
}
this.mcpClient = new McpClient(`${serverUrl}/mcp`)
this.cladoClient = new CladoActionClient({
baseUrl: config.baseUrl,
apiKey: config.apiKey,
})
this.pageId = initialPageId ?? 1
}
@@ -135,6 +89,8 @@ export class CladoActionExecutor {
const actionHistory: CladoAction[] = []
let predictionCalls = 0
const thinkingTrace: string[] = []
let consecutiveParseFailures = 0
let finalAnswer: string | undefined
let status: ExecutorResult['status'] = 'done'
let reason = 'Goal executed.'
@@ -155,7 +111,7 @@ export class CladoActionExecutor {
break
}
const historyForPrediction = this.formatHistory(actionHistory)
const historyForPrediction = formatCladoHistory(actionHistory)
const actionToolCallId = randomUUID()
const predictionInput = {
instruction,
@@ -177,7 +133,7 @@ export class CladoActionExecutor {
signal,
)
predictionCalls++
const thinking = this.extractThinking(prediction.raw_response)
const thinking = extractCladoThinking(prediction.raw_response)
if (thinking) {
const previous = thinkingTrace[thinkingTrace.length - 1]
if (previous !== thinking) {
@@ -207,8 +163,19 @@ export class CladoActionExecutor {
break
}
const predictedActions = this.parseActions(prediction)
const predictedActions = parseCladoActions(prediction)
if (predictedActions.length === 0) {
// Per Clado contract: HTTP 200 with action=null on parse failure.
// Count as an invalid step so the model can self-correct on the
// next call instead of dropping the trajectory.
consecutiveParseFailures++
const parseError =
prediction.parse_error ?? 'no parsable <answer> in raw_response'
actionHistory.push({
action: 'invalid',
text: `parse_error: ${parseError}`,
})
this.stepsUsed++
await this.callbacks.onStepFinish?.({
toolCalls: [
{
@@ -222,16 +189,23 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: [],
parseError,
consecutiveParseFailures,
},
},
],
})
status = 'blocked'
reason = 'Clado action response did not contain a valid action.'
break
if (consecutiveParseFailures >= MAX_CONSECUTIVE_PARSE_FAILURES) {
status = 'blocked'
reason = `Clado returned ${consecutiveParseFailures} consecutive unparseable responses.`
break
}
continue
}
consecutiveParseFailures = 0
let requestedStop = false
const executionNotes: string[] = []
@@ -257,7 +231,7 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: predictedActions,
executed: executionNotes,
},
@@ -272,7 +246,12 @@ export class CladoActionExecutor {
actionHistory.push(predictedAction)
if (predictedAction.action === 'end') {
reason = 'Model requested end() and marked task complete.'
if (predictedAction.final_answer) {
finalAnswer = predictedAction.final_answer
reason = `Model requested end() with final_answer: ${predictedAction.final_answer.slice(0, 240)}`
} else {
reason = 'Model requested end() and marked task complete.'
}
requestedStop = true
break
}
@@ -293,7 +272,7 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: predictedActions,
executed: executionNotes,
},
@@ -327,6 +306,7 @@ export class CladoActionExecutor {
actions: actionHistory,
url: this.currentUrl,
thinkingTrace,
finalAnswer,
})
return {
@@ -344,121 +324,12 @@ export class CladoActionExecutor {
actionHistory: CladoAction[],
signal?: AbortSignal,
): Promise<CladoActionResponse> {
if (!this.config.baseUrl) {
throw new Error('executor.baseUrl must be set for clado-action provider')
}
const requestController = new AbortController()
const onAbort = () => requestController.abort()
signal?.addEventListener('abort', onAbort, { once: true })
const timeoutHandle = setTimeout(() => {
requestController.abort()
}, CLADO_REQUEST_TIMEOUT_MS)
try {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
}
if (this.config.apiKey) {
headers.Authorization = `Bearer ${this.config.apiKey}`
}
const response = await fetch(this.config.baseUrl, {
method: 'POST',
headers,
body: JSON.stringify({
instruction,
image_base64: imageBase64,
history: this.formatHistory(actionHistory),
}),
signal: requestController.signal,
})
if (!response.ok) {
const body = await response.text()
throw new Error(
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
)
}
return (await response.json()) as CladoActionResponse
} finally {
clearTimeout(timeoutHandle)
signal?.removeEventListener('abort', onAbort)
}
}
private parseActions(prediction: CladoActionResponse): CladoAction[] {
const actionFromField =
typeof prediction.action === 'string' ? prediction.action : null
const rawActions = this.parseActionsFromRawResponse(prediction.raw_response)
const primaryFromRaw = rawActions[0] ?? null
const mergedPrimary = {
...primaryFromRaw,
...prediction,
action: actionFromField ?? primaryFromRaw?.action,
}
const normalized: CladoAction[] = []
const primary = this.normalizeActionPayload(mergedPrimary)
if (primary) normalized.push(primary)
for (const candidate of rawActions.slice(1)) {
const parsed = this.normalizeActionPayload(candidate)
if (!parsed) continue
const prev = normalized[normalized.length - 1]
if (
!prev ||
this.getActionSignature(prev) !== this.getActionSignature(parsed)
) {
normalized.push(parsed)
}
}
return normalized
}
private normalizeActionPayload(
payload: RawActionPayload,
): CladoAction | null {
if (!payload.action || typeof payload.action !== 'string') {
return null
}
return {
action: payload.action,
x: typeof payload.x === 'number' ? payload.x : undefined,
y: typeof payload.y === 'number' ? payload.y : undefined,
text: typeof payload.text === 'string' ? payload.text : undefined,
key: typeof payload.key === 'string' ? payload.key : undefined,
direction:
typeof payload.direction === 'string' ? payload.direction : undefined,
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
time: typeof payload.time === 'number' ? payload.time : undefined,
}
}
private parseActionsFromRawResponse(
rawResponse: string | undefined,
): RawActionPayload[] {
if (!rawResponse) return []
const matches = [
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
]
const parsed: RawActionPayload[] = []
for (const match of matches) {
try {
parsed.push(JSON.parse(match[1]) as RawActionPayload)
} catch {
// ignore malformed answer blocks
}
}
return parsed
return this.cladoClient.requestActionPrediction({
instruction,
imageBase64,
actionHistory,
signal,
})
}
private async executeAction(
@@ -529,14 +400,14 @@ export class CladoActionExecutor {
}
case 'press_key': {
const key = this.normalizePressKey(action.key)
const key = normalizeCladoPressKey(action.key)
await this.runTool('press_key', { key }, signal)
return `Pressed key "${key}".`
}
case 'scroll': {
const direction = this.normalizeDirection(action.direction)
const amountPx = this.normalizeScrollAmount(action.amount)
const direction = normalizeCladoDirection(action.direction)
const amountPx = normalizeCladoScrollAmount(action.amount)
const ticks = Math.max(1, Math.round(amountPx / 120))
await this.runTool('scroll', { direction, amount: ticks }, signal)
@@ -578,7 +449,9 @@ export class CladoActionExecutor {
}
case 'end': {
return 'Model requested end().'
return action.final_answer
? `Model requested end() with final_answer: ${action.final_answer.slice(0, 240)}`
: 'Model requested end().'
}
default: {
@@ -588,9 +461,10 @@ export class CladoActionExecutor {
}
private async captureScreenshotBase64(signal?: AbortSignal): Promise<string> {
// Clado contract is PNG or JPEG; use PNG for lossless input.
const result = await this.runTool(
'take_screenshot',
{ format: 'webp', quality: 80 },
{ format: 'png' },
signal,
)
@@ -604,7 +478,7 @@ export class CladoActionExecutor {
return image.data
}
private async getViewport(signal?: AbortSignal): Promise<Viewport> {
private async getViewport(signal?: AbortSignal): Promise<CladoViewport> {
if (this.viewport) return this.viewport
try {
@@ -635,15 +509,9 @@ export class CladoActionExecutor {
normalizedX: number | undefined,
normalizedY: number | undefined,
signal?: AbortSignal,
): Promise<ActionPoint> {
): Promise<CladoActionPoint> {
const viewport = await this.getViewport(signal)
const nx = clampNormalized(normalizedX ?? 500)
const ny = clampNormalized(normalizedY ?? 500)
return {
x: Math.round((nx / 1000) * viewport.width),
y: Math.round((ny / 1000) * viewport.height),
}
return resolveCladoPoint(viewport, normalizedX, normalizedY)
}
private async getCurrentUrl(signal?: AbortSignal): Promise<string> {
@@ -670,7 +538,7 @@ export class CladoActionExecutor {
throw new Error('aborted')
}
const toolArgs = this.prepareToolArgs(toolName, args)
const toolArgs = prepareCladoToolArgs(toolName, args, this.pageId)
try {
const raw = await this.mcpClient.callTool(toolName, toolArgs)
@@ -689,211 +557,22 @@ export class CladoActionExecutor {
}
}
private prepareToolArgs(
toolName: string,
args: Record<string, unknown>,
): Record<string, unknown> {
const prepared: Record<string, unknown> = { ...args }
if (
toolName === 'evaluate_script' &&
typeof prepared.function === 'string' &&
prepared.expression === undefined
) {
prepared.expression = this.toEvaluateExpression(prepared.function)
delete prepared.function
}
if (
toolName === 'click_at' &&
typeof prepared.dblClick === 'boolean' &&
prepared.clickCount === undefined
) {
prepared.clickCount = prepared.dblClick ? 2 : 1
delete prepared.dblClick
}
// Use fixed page ID for all page-scoped tools (single-page operation)
if (PAGE_SCOPED_TOOLS.has(toolName) && typeof prepared.page !== 'number') {
prepared.page = this.pageId
}
return prepared
}
private toEvaluateExpression(rawFunction: unknown): string {
const source = String(rawFunction).trim()
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
return `(${source})()`
}
if (source.startsWith('function')) {
return `(${source})()`
}
return source
}
private normalizePressKey(key: string | undefined): string {
const raw = (key ?? '').trim()
if (!raw) throw new Error('press_key action missing key field')
const map: Record<string, string> = {
'C-a': 'Control+A',
'C-c': 'Control+C',
'C-v': 'Control+V',
'C-x': 'Control+X',
'C-z': 'Control+Z',
'C-y': 'Control+Y',
'C-s': 'Control+S',
'C-t': 'Control+T',
'C-w': 'Control+W',
'C-h': 'Control+H',
'C-f': 'Control+F',
'C-+': 'Control++',
'C--': 'Control+-',
'C-tab': 'Control+Tab',
'C-S-tab': 'Control+Shift+Tab',
'C-S-n': 'Control+Shift+N',
'C-down': 'Control+ArrowDown',
'M-f4': 'Alt+F4',
}
return map[raw] ?? raw
}
private normalizeDirection(
direction: string | undefined,
): 'up' | 'down' | 'left' | 'right' {
if (
direction === 'up' ||
direction === 'down' ||
direction === 'left' ||
direction === 'right'
) {
return direction
}
return 'down'
}
private normalizeScrollAmount(amount: number | undefined): number {
if (typeof amount !== 'number') return 500
if (amount <= 0) return 100
const clamped = Math.min(amount, 1000)
return Math.max(100, Math.round((clamped / 1000) * 900))
}
private summarizePrediction(
prediction: CladoActionResponse,
): Record<string, unknown> {
const preview =
typeof prediction.raw_response === 'string' &&
prediction.raw_response.length > 0
? prediction.raw_response.slice(0, 240)
: undefined
return {
action: prediction.action,
x: prediction.x,
y: prediction.y,
text: prediction.text,
key: prediction.key,
direction: prediction.direction,
startX: prediction.startX,
startY: prediction.startY,
endX: prediction.endX,
endY: prediction.endY,
amount: prediction.amount,
time: prediction.time,
inference_time_seconds: prediction.inference_time_seconds,
raw_response_preview: preview,
}
}
private extractThinking(rawResponse: string | undefined): string | undefined {
if (!rawResponse) return undefined
const matches = [
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
]
if (matches.length === 0) return undefined
const merged = matches
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
.filter((value) => value.length > 0)
.join(' ')
if (!merged) return undefined
return merged
}
private getActionSignature(action: CladoAction): string {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
case 'type':
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
case 'press_key':
return `${action.action}:${action.key ?? 'key'}`
case 'scroll':
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
case 'drag':
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
case 'wait':
return `${action.action}:${action.time ?? 1}`
case 'end':
return 'end()'
default:
return action.action
}
}
private formatHistory(actions: CladoAction[]): string {
if (actions.length === 0) return 'None'
const parts = actions.map((action) => {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
case 'type': {
const text = (action.text ?? '').replace(/'/g, "\\'")
return `type('${text}')`
}
case 'press_key':
return `press_key('${action.key ?? 'Enter'}')`
case 'scroll':
return `scroll(${action.direction ?? 'down'})`
case 'drag':
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
case 'wait':
return `wait(${Math.round(action.time ?? 1)}s)`
case 'end':
return 'end()'
default:
return action.action
}
})
return parts.join(' -> ')
}
private buildObservation(params: {
status: ExecutorResult['status']
reason: string
actions: CladoAction[]
url: string
thinkingTrace: string[]
finalAnswer?: string
}): string {
const { status, reason, actions, url, thinkingTrace } = params
const { status, reason, actions, url, thinkingTrace, finalAnswer } = params
const actionSummary =
actions.length === 0
? 'No actions were executed.'
: actions
.slice(-5)
.map(
(action, idx) => `${idx + 1}. ${this.getActionSignature(action)}`,
(action, idx) => `${idx + 1}. ${getCladoActionSignature(action)}`,
)
.join('\n')
const thinkingSummary =
@@ -907,6 +586,7 @@ export class CladoActionExecutor {
`Status: ${status}`,
`Reason: ${reason}`,
`URL: ${url || 'unknown'}`,
finalAnswer ? `Final answer: ${finalAnswer}` : '',
'',
'Recent actions:',
actionSummary,

View File

@@ -0,0 +1,191 @@
import type {
CladoAction,
CladoActionResponse,
RawCladoActionPayload,
} from './types'
/** Parses Clado's structured response plus any raw `<answer>` blocks into executable actions. */
export function parseCladoActions(
prediction: CladoActionResponse,
): CladoAction[] {
const actionFromField =
typeof prediction.action === 'string' ? prediction.action : null
const rawActions = parseCladoActionsFromRawResponse(prediction.raw_response)
const primaryFromRaw = rawActions[0] ?? null
const mergedPrimary = {
...primaryFromRaw,
...prediction,
action: actionFromField ?? primaryFromRaw?.action,
}
const normalized: CladoAction[] = []
const primary = normalizeCladoActionPayload(mergedPrimary)
if (primary) normalized.push(primary)
for (const candidate of rawActions.slice(1)) {
const parsed = normalizeCladoActionPayload(candidate)
if (!parsed) continue
const prev = normalized[normalized.length - 1]
if (
!prev ||
getCladoActionSignature(prev) !== getCladoActionSignature(parsed)
) {
normalized.push(parsed)
}
}
return normalized
}
export function normalizeCladoActionPayload(
payload: RawCladoActionPayload,
): CladoAction | null {
if (!payload.action || typeof payload.action !== 'string') {
return null
}
return {
action: payload.action,
x: typeof payload.x === 'number' ? payload.x : undefined,
y: typeof payload.y === 'number' ? payload.y : undefined,
text: typeof payload.text === 'string' ? payload.text : undefined,
key: typeof payload.key === 'string' ? payload.key : undefined,
direction:
typeof payload.direction === 'string' ? payload.direction : undefined,
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
time: typeof payload.time === 'number' ? payload.time : undefined,
final_answer:
typeof payload.final_answer === 'string'
? payload.final_answer
: undefined,
}
}
export function parseCladoActionsFromRawResponse(
rawResponse: string | undefined,
): RawCladoActionPayload[] {
if (!rawResponse) return []
const matches = [
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
]
const parsed: RawCladoActionPayload[] = []
for (const match of matches) {
try {
parsed.push(JSON.parse(match[1]) as RawCladoActionPayload)
} catch {
// Ignore malformed answer blocks so one bad block does not drop the whole prediction.
}
}
return parsed
}
export function extractCladoThinking(
rawResponse: string | undefined,
): string | undefined {
if (!rawResponse) return undefined
const matches = [
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
]
if (matches.length === 0) return undefined
const merged = matches
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
.filter((value) => value.length > 0)
.join(' ')
if (!merged) return undefined
return merged
}
export function summarizeCladoPrediction(
prediction: CladoActionResponse,
): Record<string, unknown> {
const preview =
typeof prediction.raw_response === 'string' &&
prediction.raw_response.length > 0
? prediction.raw_response.slice(0, 240)
: undefined
return {
action: prediction.action,
x: prediction.x,
y: prediction.y,
text: prediction.text,
key: prediction.key,
direction: prediction.direction,
startX: prediction.startX,
startY: prediction.startY,
endX: prediction.endX,
endY: prediction.endY,
amount: prediction.amount,
time: prediction.time,
inference_time_seconds: prediction.inference_time_seconds,
raw_response_preview: preview,
}
}
export function getCladoActionSignature(action: CladoAction): string {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
case 'type':
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
case 'press_key':
return `${action.action}:${action.key ?? 'key'}`
case 'scroll':
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
case 'drag':
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
case 'wait':
return `${action.action}:${action.time ?? 1}`
case 'end':
return action.final_answer
? `end(${action.final_answer.slice(0, 32)})`
: 'end()'
case 'invalid':
return `invalid(${(action.text ?? '').slice(0, 40)})`
default:
return action.action
}
}
export function formatCladoHistory(actions: CladoAction[]): string {
if (actions.length === 0) return 'None'
const parts = actions.map((action) => {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
case 'type': {
const text = (action.text ?? '').replace(/'/g, "\\'")
return `type('${text}')`
}
case 'press_key':
return `press_key('${action.key ?? 'Enter'}')`
case 'scroll':
return `scroll(${action.direction ?? 'down'})`
case 'drag':
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
case 'wait':
return `wait(${Math.round(action.time ?? 1)}s)`
case 'end':
return 'end()'
case 'invalid':
return 'invalid()'
default:
return action.action
}
})
return parts.join(' -> ')
}

View File

@@ -0,0 +1,123 @@
import {
CLADO_PAGE_SCOPED_TOOLS,
type CladoActionPoint,
type CladoViewport,
} from './types'
export function clampCladoNormalizedCoordinate(value: number): number {
return Math.min(999, Math.max(0, Math.round(value)))
}
/** Converts Clado's 0-1000 normalized coordinate space into BrowserOS viewport pixels. */
export function resolveCladoPoint(
viewport: CladoViewport,
normalizedX: number | undefined,
normalizedY: number | undefined,
): CladoActionPoint {
const nx = clampCladoNormalizedCoordinate(normalizedX ?? 500)
const ny = clampCladoNormalizedCoordinate(normalizedY ?? 500)
return {
x: Math.round((nx / 1000) * viewport.width),
y: Math.round((ny / 1000) * viewport.height),
}
}
/** Adapts Clado action tool arguments to the BrowserOS MCP tool argument contract. */
export function prepareCladoToolArgs(
toolName: string,
args: Record<string, unknown>,
pageId: number,
): Record<string, unknown> {
const prepared: Record<string, unknown> = { ...args }
if (
toolName === 'evaluate_script' &&
typeof prepared.function === 'string' &&
prepared.expression === undefined
) {
prepared.expression = toCladoEvaluateExpression(prepared.function)
delete prepared.function
}
if (
toolName === 'click_at' &&
typeof prepared.dblClick === 'boolean' &&
prepared.clickCount === undefined
) {
prepared.clickCount = prepared.dblClick ? 2 : 1
delete prepared.dblClick
}
if (
CLADO_PAGE_SCOPED_TOOLS.has(toolName) &&
typeof prepared.page !== 'number'
) {
prepared.page = pageId
}
return prepared
}
export function toCladoEvaluateExpression(rawFunction: unknown): string {
const source = String(rawFunction).trim()
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
return `(${source})()`
}
if (source.startsWith('function')) {
return `(${source})()`
}
return source
}
export function normalizeCladoPressKey(key: string | undefined): string {
const raw = (key ?? '').trim()
if (!raw) throw new Error('press_key action missing key field')
const map: Record<string, string> = {
'C-a': 'Control+A',
'C-c': 'Control+C',
'C-v': 'Control+V',
'C-x': 'Control+X',
'C-z': 'Control+Z',
'C-y': 'Control+Y',
'C-s': 'Control+S',
'C-t': 'Control+T',
'C-w': 'Control+W',
'C-h': 'Control+H',
'C-f': 'Control+F',
'C-+': 'Control++',
'C--': 'Control+-',
'C-tab': 'Control+Tab',
'C-S-tab': 'Control+Shift+Tab',
'C-S-n': 'Control+Shift+N',
'C-down': 'Control+ArrowDown',
'M-a': 'Meta+A',
'M-c': 'Meta+C',
'M-v': 'Meta+V',
'M-x': 'Meta+X',
'M-f4': 'Alt+F4',
}
return map[raw] ?? raw
}
export function normalizeCladoDirection(
direction: string | undefined,
): 'up' | 'down' | 'left' | 'right' {
if (
direction === 'up' ||
direction === 'down' ||
direction === 'left' ||
direction === 'right'
) {
return direction
}
return 'down'
}
export function normalizeCladoScrollAmount(amount: number | undefined): number {
if (typeof amount !== 'number') return 500
if (amount <= 0) return 100
const clamped = Math.min(amount, 1000)
return Math.max(100, Math.round((clamped / 1000) * 900))
}

View File

@@ -0,0 +1,68 @@
import { CLADO_REQUEST_TIMEOUT_MS } from '../../../../constants'
import { formatCladoHistory } from './clado-actions'
import type { CladoAction, CladoActionResponse } from './types'
export interface CladoActionClientOptions {
baseUrl?: string
apiKey?: string
}
export interface CladoActionPredictionInput {
instruction: string
imageBase64: string
actionHistory: CladoAction[]
signal?: AbortSignal
}
/** Calls the Clado action model without exposing credentials in process arguments or artifacts. */
export class CladoActionClient {
constructor(private readonly options: CladoActionClientOptions) {}
async requestActionPrediction(
input: CladoActionPredictionInput,
): Promise<CladoActionResponse> {
if (!this.options.baseUrl) {
throw new Error('executor.baseUrl must be set for clado-action provider')
}
const requestController = new AbortController()
const onAbort = () => requestController.abort()
input.signal?.addEventListener('abort', onAbort, { once: true })
const timeoutHandle = setTimeout(() => {
requestController.abort()
}, CLADO_REQUEST_TIMEOUT_MS)
try {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
}
if (this.options.apiKey) {
headers.Authorization = `Bearer ${this.options.apiKey}`
}
const response = await fetch(this.options.baseUrl, {
method: 'POST',
headers,
body: JSON.stringify({
instruction: input.instruction,
image_base64: input.imageBase64,
history: formatCladoHistory(input.actionHistory),
}),
signal: requestController.signal,
})
if (!response.ok) {
const body = await response.text()
throw new Error(
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
)
}
return (await response.json()) as CladoActionResponse
} finally {
clearTimeout(timeoutHandle)
input.signal?.removeEventListener('abort', onAbort)
}
}
}

View File

@@ -0,0 +1,56 @@
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type {
DelegationResult,
ExecutorBackend,
ExecutorCallbacks,
} from '../../executor-backend'
import { CladoActionExecutor } from './clado-action-executor'
export interface CladoExecutorBackendOptions {
configTemplate: ResolvedAgentConfig
serverUrl: string
initialPageId?: number
callbacks?: ExecutorCallbacks
}
/** Executes delegated goals through the Clado visual action model. */
export class CladoExecutorBackend implements ExecutorBackend {
readonly kind = 'clado'
private executor: CladoActionExecutor | null = null
constructor(private readonly options: CladoExecutorBackendOptions) {}
async execute(
instruction: string,
signal?: AbortSignal,
): Promise<DelegationResult> {
const executor = this.getExecutor()
const result = await executor.execute(instruction, signal)
return result
}
async close(): Promise<void> {
await this.executor?.close()
}
getTotalSteps(): number {
return this.executor?.getTotalSteps() ?? 0
}
private getExecutor(): CladoActionExecutor {
if (this.executor) return this.executor
this.executor = new CladoActionExecutor(
{
provider: this.options.configTemplate.provider,
model: this.options.configTemplate.model,
apiKey: this.options.configTemplate.apiKey ?? '',
baseUrl: this.options.configTemplate.baseUrl,
},
this.options.serverUrl,
this.options.initialPageId,
)
this.executor.setCallbacks(this.options.callbacks ?? {})
return this.executor
}
}

View File

@@ -0,0 +1,78 @@
export const CLADO_ACTION_PROVIDER = 'clado-action'
export const CLADO_PAGE_SCOPED_TOOLS = new Set<string>([
'take_screenshot',
'evaluate_script',
'click',
'click_at',
'hover',
'hover_at',
'clear',
'fill',
'press_key',
'type_at',
'drag',
'drag_at',
'scroll',
'handle_dialog',
'select_option',
'navigate_page',
'close_page',
'wait_for',
])
export interface CladoActionResponse {
action?: string | null
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
final_answer?: string | null
inference_time_seconds?: number
raw_response?: string
thinking?: string | null
parse_error?: string | null
}
export interface CladoViewport {
width: number
height: number
}
export interface CladoAction {
action: string
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
final_answer?: string
}
export type RawCladoActionPayload = Partial<
Omit<CladoAction, 'final_answer'>
> & {
final_answer?: string | null
}
export interface CladoActionPoint {
x: number
y: number
}
export function isCladoActionProvider(provider: string): boolean {
return provider === CLADO_ACTION_PROVIDER
}

View File

@@ -0,0 +1,60 @@
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import type {
ExecutorBackend,
ExecutorBackendKind,
ExecutorCallbacks,
} from '../executor-backend'
import { CladoExecutorBackend } from './clado/clado-executor-backend'
import { isCladoActionProvider } from './clado/types'
import { ToolLoopExecutorBackend } from './tool-loop/tool-loop-executor-backend'
export interface CreateExecutorBackendOptions {
backendKind?: ExecutorBackendKind
provider?: string
configTemplate?: ResolvedAgentConfig
browser?: Browser | null
serverUrl?: string
windowId?: number
tabId?: number
initialPageId?: number
callbacks?: ExecutorCallbacks
executor?: ExecutorBackend
}
export function backendKindForProvider(provider: string): ExecutorBackendKind {
return isCladoActionProvider(provider) ? 'clado' : 'tool-loop'
}
/** Creates the backend used for one orchestrator delegation. */
export function createExecutorBackend(
options: CreateExecutorBackendOptions,
): ExecutorBackend {
if (options.executor) return options.executor
const kind =
options.backendKind ??
backendKindForProvider(
options.provider ?? options.configTemplate?.provider ?? '',
)
if (kind === 'clado') {
return new CladoExecutorBackend({
configTemplate: required(options.configTemplate, 'configTemplate'),
serverUrl: required(options.serverUrl, 'serverUrl'),
initialPageId: options.initialPageId,
callbacks: options.callbacks,
})
}
return new ToolLoopExecutorBackend({
configTemplate: required(options.configTemplate, 'configTemplate'),
browser: options.browser ?? null,
callbacks: options.callbacks,
})
}
function required<T>(value: T | undefined, name: string): T {
if (value === undefined) throw new Error(`${name} is required`)
return value
}

View File

@@ -0,0 +1,144 @@
import { randomUUID } from 'node:crypto'
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import { registry } from '@browseros/server/tools/registry'
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
import type {
DelegationResult,
ExecutorBackend,
ExecutorCallbacks,
} from '../../executor-backend'
import { TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT } from './tool-loop-executor-prompt'
export interface ToolLoopExecutorBackendOptions {
configTemplate: ResolvedAgentConfig
browser: Browser | null
callbacks?: ExecutorCallbacks
}
/** Executes delegated goals through the BrowserOS ToolLoopAgent. */
export class ToolLoopExecutorBackend implements ExecutorBackend {
readonly kind = 'tool-loop'
private stepsUsed = 0
private currentUrl = ''
constructor(private readonly options: ToolLoopExecutorBackendOptions) {}
async execute(
instruction: string,
signal?: AbortSignal,
): Promise<DelegationResult> {
const browser = this.options.browser
if (!browser) {
throw new Error('Browser instance is required for tool-loop executor')
}
const stepsAtStart = this.stepsUsed
const toolsUsed: string[] = []
let status: DelegationResult['status'] = 'done'
let resultText = ''
const conversationId = randomUUID()
const agentConfig: ResolvedAgentConfig = {
...this.options.configTemplate,
conversationId,
userSystemPrompt: TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT,
evalMode: true,
workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
}
const browserContext = await this.browserContext(browser)
let agent: AiSdkAgent | null = null
try {
agent = await AiSdkAgent.create({
resolvedConfig: agentConfig,
browser,
registry,
browserContext,
})
await agent.toolLoopAgent.generate({
prompt: instruction,
abortSignal: signal,
experimental_onToolCallStart: ({ toolCall }) => {
const input = toolCall.input as Record<string, unknown> | undefined
if (input && typeof input.url === 'string' && input.url.length > 0) {
this.currentUrl = input.url
}
this.options.callbacks?.onToolCallStart?.({
toolCallId: toolCall.toolCallId,
toolName: toolCall.toolName,
input: toolCall.input,
})
},
experimental_onToolCallFinish: async () => {
this.stepsUsed++
await this.options.callbacks?.onToolCallFinish?.()
},
onStepFinish: async ({ toolCalls, toolResults, text }) => {
if (toolCalls) {
for (const toolCall of toolCalls) {
if (!toolsUsed.includes(toolCall.toolName)) {
toolsUsed.push(toolCall.toolName)
}
}
}
if (text) resultText = text
await this.options.callbacks?.onStepFinish?.({
toolCalls,
toolResults,
text,
})
},
})
} catch {
status = signal?.aborted ? 'timeout' : 'blocked'
} finally {
if (agent) await agent.dispose().catch(() => {})
}
if (status === 'done' && signal?.aborted) {
status = 'timeout'
}
return {
observation: resultText || 'Execution completed with no actions taken.',
status,
url: this.currentUrl,
actionsPerformed: this.stepsUsed - stepsAtStart,
toolsUsed,
}
}
async close(): Promise<void> {
// No persistent resources; AiSdkAgent is disposed at the end of each execute() call.
}
getTotalSteps(): number {
return this.stepsUsed
}
private async browserContext(
browser: Browser,
): Promise<BrowserContext | undefined> {
const pages = await browser.listPages()
const activePage = pages[0]
if (!activePage) return undefined
return {
activeTab: {
id: activePage.tabId,
pageId: activePage.pageId,
url: activePage.url,
title: activePage.title,
},
}
}
}

View File

@@ -0,0 +1,21 @@
export const TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
## Your Job
1. Execute browser actions to achieve the given goal
2. Stop as soon as the goal is accomplished -- do NOT perform extra actions
3. Write a final observation describing the result
## Final Response Format
When done, your response MUST include:
- What you accomplished (or what went wrong)
- What the page currently shows: key headings, links, data, or content visible
- The current URL from the address bar
- If you got stuck, what is blocking progress
## Rules
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
- If the goal is to click something, confirm the result of the click.
- If you cannot find what was asked for, say so clearly -- do not guess or improvise.
- Prefer browser_navigate over browser_open_tab for going to URLs.
- Do NOT call browser_group_tabs or other organizational tools.`

View File

@@ -0,0 +1,33 @@
import type { ExecutorResult } from '../orchestrator-executor/types'
export type ExecutorBackendKind = 'tool-loop' | 'clado'
export type DelegationResult = ExecutorResult
export interface ToolCallInfo {
toolCallId: string
toolName: string
input: unknown
}
export interface ToolResultInfo {
toolCallId: string
toolName: string
output: unknown
}
export interface ExecutorCallbacks {
onToolCallStart?: (toolCall: ToolCallInfo) => void
onToolCallFinish?: () => Promise<void>
onStepFinish?: (step: {
toolCalls?: ReadonlyArray<ToolCallInfo>
toolResults?: ReadonlyArray<ToolResultInfo>
text?: string
}) => Promise<void>
}
export interface ExecutorBackend {
readonly kind: ExecutorBackendKind
execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
close(): Promise<void>
getTotalSteps(): number
}

View File

@@ -1,243 +0,0 @@
/**
* Executor - Wraps AiSdkAgent for page-level browser actions (direct CDP)
*
* The executor:
* - Receives goal-level instructions from orchestrator
* - Executes browser actions until the goal is accomplished
* - Returns observation to orchestrator (not full history)
*/
import { randomUUID } from 'node:crypto'
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import { registry } from '@browseros/server/tools/registry'
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
import { CladoActionExecutor } from './clado-action-executor'
import type { ExecutorResult } from './types'
const EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
## Your Job
1. Execute browser actions to achieve the given goal
2. Stop as soon as the goal is accomplished — do NOT perform extra actions
3. Write a final observation describing the result
## Final Response Format
When done, your response MUST include:
- What you accomplished (or what went wrong)
- What the page currently shows: key headings, links, data, or content visible
- The current URL from the address bar
- If you got stuck, what is blocking progress
## Rules
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
- If the goal is to click something, confirm the result of the click.
- If you cannot find what was asked for, say so clearly — do not guess or improvise.
- Prefer browser_navigate over browser_open_tab for going to URLs.
- Do NOT call browser_group_tabs or other organizational tools.`
export interface ToolCallInfo {
toolCallId: string
toolName: string
input: unknown
}
export interface ToolResultInfo {
toolCallId: string
toolName: string
output: unknown
}
export interface ExecutorCallbacks {
onToolCallStart?: (toolCall: ToolCallInfo) => void
onToolCallFinish?: () => Promise<void>
onStepFinish?: (step: {
toolCalls?: ReadonlyArray<ToolCallInfo>
toolResults?: ReadonlyArray<ToolResultInfo>
text?: string
}) => Promise<void>
}
export class Executor {
private cladoExecutor: CladoActionExecutor | null = null
private stepsUsed = 0
private currentUrl = ''
private configTemplate: ResolvedAgentConfig
private isCladoAction: boolean
private browser: Browser | null
private serverUrl: string
private windowId?: number
private tabId?: number
private initialPageId?: number
private callbacks: ExecutorCallbacks
constructor(
configTemplate: ResolvedAgentConfig,
browser: Browser | null,
serverUrl: string,
options?: {
isCladoAction?: boolean
windowId?: number
tabId?: number
initialPageId?: number
callbacks?: ExecutorCallbacks
},
) {
this.configTemplate = configTemplate
this.isCladoAction = options?.isCladoAction ?? false
this.browser = browser
this.serverUrl = serverUrl
this.windowId = options?.windowId
this.tabId = options?.tabId
this.initialPageId = options?.initialPageId
this.callbacks = options?.callbacks ?? {}
}
async execute(
instruction: string,
signal?: AbortSignal,
): Promise<ExecutorResult> {
if (this.isCladoAction) {
if (!this.cladoExecutor) {
this.cladoExecutor = new CladoActionExecutor(
{
provider: this.configTemplate.provider,
model: this.configTemplate.model,
apiKey: this.configTemplate.apiKey ?? '',
baseUrl: this.configTemplate.baseUrl,
},
this.serverUrl,
this.windowId,
this.tabId,
this.initialPageId,
)
this.cladoExecutor.setCallbacks(this.callbacks)
}
const result = await this.cladoExecutor.execute(instruction, signal)
this.stepsUsed = this.cladoExecutor.getTotalSteps()
this.currentUrl = result.url || this.currentUrl
return result
}
if (!this.browser) {
throw new Error('Browser instance is required for standard executor path')
}
const stepsAtStart = this.stepsUsed
const toolsUsed: string[] = []
let status: 'done' | 'blocked' | 'timeout' = 'done'
let resultText = ''
const conversationId = randomUUID()
const agentConfig: ResolvedAgentConfig = {
...this.configTemplate,
conversationId,
userSystemPrompt: EXECUTOR_SYSTEM_PROMPT,
evalMode: true,
workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
}
// Build browser context so executor agent knows the correct page ID
let browserContext: BrowserContext | undefined
if (this.browser) {
const pages = await this.browser.listPages()
const activePage = pages[0]
if (activePage) {
browserContext = {
activeTab: {
id: activePage.tabId,
pageId: activePage.pageId,
url: activePage.url,
title: activePage.title,
},
}
}
}
let agent: AiSdkAgent | null = null
try {
agent = await AiSdkAgent.create({
resolvedConfig: agentConfig,
browser: this.browser,
registry,
browserContext,
})
await agent.toolLoopAgent.generate({
prompt: instruction,
abortSignal: signal,
experimental_onToolCallStart: ({ toolCall }) => {
const input = toolCall.input as Record<string, unknown> | undefined
if (input && typeof input.url === 'string' && input.url.length > 0) {
this.currentUrl = input.url
}
this.callbacks.onToolCallStart?.({
toolCallId: toolCall.toolCallId,
toolName: toolCall.toolName,
input: toolCall.input,
})
},
experimental_onToolCallFinish: async () => {
this.stepsUsed++
await this.callbacks.onToolCallFinish?.()
},
onStepFinish: async ({ toolCalls, toolResults, text }) => {
if (toolCalls) {
for (const tc of toolCalls) {
if (!toolsUsed.includes(tc.toolName)) {
toolsUsed.push(tc.toolName)
}
}
}
if (text) {
resultText = text
}
await this.callbacks.onStepFinish?.({ toolCalls, toolResults, text })
},
})
} catch {
if (signal?.aborted) {
status = 'timeout'
} else {
status = 'blocked'
}
} finally {
if (agent) await agent.dispose().catch(() => {})
}
if (status === 'done' && signal?.aborted) {
status = 'timeout'
}
const observation =
resultText || 'Execution completed with no actions taken.'
return {
observation,
status,
url: this.currentUrl,
actionsPerformed: this.stepsUsed - stepsAtStart,
toolsUsed,
}
}
async close(): Promise<void> {
await this.cladoExecutor?.close()
}
getTotalSteps(): number {
if (this.isCladoAction) {
return this.cladoExecutor?.getTotalSteps() ?? 0
}
return this.stepsUsed
}
}

View File

@@ -24,15 +24,16 @@ import {
resolveProviderConfig,
} from '../../utils/resolve-provider-config'
import { withEvalTimeout } from '../../utils/with-eval-timeout'
import { isCladoActionProvider } from '../orchestrated/backends/clado/types'
import { createExecutorBackend } from '../orchestrated/backends/create-executor-backend'
import type { ExecutorCallbacks } from '../orchestrated/executor-backend'
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
import { Executor, type ExecutorCallbacks } from './executor'
import { OrchestratorAgent } from './orchestrator-agent'
import type { ExecutorFactory, ExecutorResult } from './types'
interface ResolvedConfigs {
orchestratorConfig: ResolvedAgentConfig & { maxTurns?: number }
executorConfig: ResolvedAgentConfig
isCladoAction: boolean
}
function toResolvedAgentConfig(
@@ -67,7 +68,10 @@ async function resolveAgentConfig(
if (!executorModel) {
throw new Error('executor.model is required in config')
}
if (config.executor.provider === 'clado-action' && !config.executor.baseUrl) {
if (
isCladoActionProvider(config.executor.provider) &&
!config.executor.baseUrl
) {
throw new Error(
'executor.baseUrl is required in config for clado-action provider',
)
@@ -75,10 +79,8 @@ async function resolveAgentConfig(
const resolvedOrchestrator = await resolveProviderConfig(config.orchestrator)
const isCladoAction = config.executor.provider === 'clado-action'
let executorConfig: ResolvedAgentConfig
if (isCladoAction) {
if (isCladoActionProvider(config.executor.provider)) {
executorConfig = {
conversationId: crypto.randomUUID(),
provider: config.executor.provider as ResolvedAgentConfig['provider'],
@@ -107,7 +109,7 @@ async function resolveAgentConfig(
maxTurns: config.orchestrator.maxTurns,
}
return { orchestratorConfig, executorConfig, isCladoAction }
return { orchestratorConfig, executorConfig }
}
export class OrchestratorExecutorEvaluator implements AgentEvaluator {
@@ -127,7 +129,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
}
const agentConfig = config.agent as OrchestratorExecutorConfig
const { orchestratorConfig, executorConfig, isCladoAction } =
const { orchestratorConfig, executorConfig } =
await resolveAgentConfig(agentConfig)
// Connect to Chrome via CDP — same per-worker offset used by app-manager.
@@ -235,12 +237,12 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
await capture.messageLogger.logStreamEvent(delegateInputEvent)
capture.emitEvent(task.query_id, delegateInputEvent)
const executor = new Executor(
executorConfig,
const executor = createExecutorBackend({
configTemplate: executorConfig,
browser,
config.browseros.server_url,
{ isCladoAction, callbacks },
)
serverUrl: config.browseros.server_url,
callbacks,
})
let result: ExecutorResult
try {
result = await executor.execute(instruction, signal)
@@ -329,6 +331,5 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
}
}
export { Executor } from './executor'
export { OrchestratorAgent } from './orchestrator-agent'
export * from './types'

Some files were not shown because too many files have changed in this diff Show More