Compare commits

...

36 Commits

Author SHA1 Message Date
Nikhil Sonti
8ff97ef62d fix: address review feedback for PR #922 2026-05-02 14:44:20 -07:00
Nikhil Sonti
eba926422c fix: default extract base to BASE_COMMIT 2026-05-02 14:31:51 -07:00
Nikhil
921a797c5b feat: add ACPX agent soul and memory support (#917)
* feat: add acpx agent runtime context helpers

* feat: add acpx runtime state store

* feat: prepare acpx agent runtime context

* feat: inject acpx agent command environment

* feat: forward acpx agent chat cwd

* fix: normalize acpx session record fallback

* feat: improve acpx agent soul and memory prompts

* fix: address PR review comments for memory-soul-acp

* fix: satisfy acpx runtime deepscan checks
2026-05-02 13:45:40 -07:00
Nikhil
d94597bbf9 fix(agent): add CLI model catalog entries (#915)
* fix(agent): add CLI model catalog entries

* fix: address PR review comments for acpx-models
2026-05-02 13:06:41 -07:00
github-actions[bot]
ecc6bac070 chore: sync internal-docs submodule (#911)
Co-authored-by: browseros-bot <bot@browseros.ai>
2026-05-01 20:16:26 +00:00
Dani Akash
84e2739663 feat(agent): rich rail + header on /agents/:agentId chat (#908)
* feat(agent): rich rail + header on /agents/:agentId chat

Replace the chat screen's legacy AgentEntry rail and binary READY
header with the same rich data the /agents page already exposes:
adapter glyph, liveness dot, pin star, status badge, adapter · model ·
reasoning chip line, last-used time, lifetime tokens, queue count,
and the Adapter Unavailable warning. Source of truth flips from the
merged AgentEntry list to useHarnessAgents() directly.

Sort order matches /agents (pinned → recency) — not /home
(active-first → recency) — because chat is index-shaped and shuffling
rows every 5s as turns transition would be jarring while reading.

Lift the inline pin-then-recency comparator out of /agents
AgentList.tsx into a shared agents-list-order.ts so both surfaces
stay on identical sort semantics.

* fix(agent): chat header height + composer sticking to bottom

Header was clipping descenders because the strip was vertical-content
sized at min-h-14 with tight py-2.5; bump padding and lean on natural
content height. Drop the AgentTile glyph (the rail row already shows
adapter identity) and the cwd path (too long, pushed the meta line
off-screen). Header is now name + pin star + status pill, then
adapter · model · reasoning, then last-used · tokens · queued.

Composer was floating mid-screen on short chats because the chat
grid had no grid-template-rows — the implicit auto row collapsed to
content height, so the right-column flex wrapper never received the
full container height. Add grid-rows-[minmax(0,1fr)] so the single
row claims 100% and ClawChat's flex-1 expands to push the composer
flush to the bottom.

* fix(agent): composer flush to bottom on short chats

Match the sidepanel chat's nested-flex pattern. The right-column
wrapper got h-full so it expands to the grid row; the conversation
controller's root added flex-1 so ClawChat's existing flex-1 has
something to actually fill against. Without these, the grid cell
stretched but the inner flex columns shrank to content height,
leaving the composer floating mid-screen.

* fix(agent): align rail header with chat header in shared top band

Pull the rail's "Agents" + back-button into the same horizontal strip
as the agent identity header. The two halves now sit on a single row
that spans both columns, so they can't drift in height as the chat
header gains/loses meta lines (last-used, tokens, queued).

The rail below the band keeps its scrollable list only; the chat
column below holds the conversation + composer. Border-bottom moves
from ConversationHeader to the band wrapper so we don't get a
double-rule on the boundary.

* fix(agent): reserve header height to prevent layout shift on data load

The chat header grew from a single line to three lines once the
useHarnessAgents() poll resolved (adapter chips + meta line populate
asynchronously), shoving the rail and conversation body downward.
Lock min-h-[84px] on both the band's left "Agents" cell and the
ConversationHeader root, and always render the meta line slot
(non-breaking space when empty) so the typographic frame is stable
regardless of data state.

* refactor(agent): pull status pill + meta to right side of chat header

Two-column header layout instead of three stacked rows: name + pin
star + adapter chips on the left, status pill stacked on top of the
last-used / tokens / queued meta line on the right. Drops min-h
from 84px → 60px so the band reclaims ~24px of vertical space and
the chat body starts higher on screen. Band's left "Agents" cell
matches the new height.
2026-05-01 20:19:16 +05:30
Dani Akash
974e7e9b86 fix(agents): hide BrowserOS ACP envelope from chat history payloads (TKT-774) (#907)
* fix(agents): hide BrowserOS ACP envelope from chat history payloads (TKT-774)

The user-message text persisted on the wire carried two nested
envelopes — the outer `<role>You are BrowserOS…</role>` +
`<user_request>…</user_request>` block from buildBrowserosAcpPrompt
and the inner `## Browser Context` + `<selected_text>` +
`<USER_QUERY>` block from formatUserMessage. PR #856 had unwrapped
only the outer envelope on history reads, so the user bubble in
the agent rail still rendered the inner envelope, and the LLM
chat-service path leaked the wrapper all the way back to the
sidepanel client through AI SDK's stream sync.

Two surgical fixes, both server-only:

1) ACP path (acpx-runtime.ts) — replace unwrapBrowserosAcpPrompt
   with a comprehensive unwrapBrowserosAcpUserMessage that strips
   both layers and decodes the &lt;/&gt;/&amp; escapes the server
   applied via escapePromptTagText. Each step is independently
   defensive (anchors that don't match are skipped) so the helper
   is idempotent and tolerates partial / older / future-shape
   envelopes. Applied in userContentToText (history mapper) and
   inherited by extractLastUserMessage (listing's lastUserMessage).

2) LLM chat path (chat-service.ts) — split the persisted user
   message from the prompt-time copy. session.agent.appendUserMessage
   now stores the raw user text; a transient promptUiMessages array
   is built with the wrapped (formatUserMessage + context-change
   prefix) form and passed to createAgentUIStreamResponse for the
   model. onFinish restores the raw form before persisting, so the
   user-visible message and any future history reads see only the
   user's typed text.

Tests:

- acpx-runtime.test.ts: new dedicated unwrapBrowserosAcpUserMessage
  suite covering fully-wrapped messages, only-outer / only-inner
  inputs, selected_text blocks with attribute strings, idempotency,
  literal user-typed angle-bracket round-trip, and an integration
  test that round-trips the real formatUserMessage output through
  the unwrap to pin the writer/reader contract.
- chat-service.test.ts: existing 'rebuilds a managed-app session'
  test updated for the new behaviour — asserts the persisted user
  message is the raw text and the prompt copy passed to the agent
  carries the Klavis context-change notice.

* fix(agents): decode entity escapes before stripping inner envelope (TKT-774)

The unwrap was running its inner-envelope strips against the
literal-tag form (<USER_QUERY>, <selected_text>) but the persisted
payload has those tags entity-escaped (&lt;USER_QUERY&gt;,
&lt;selected_text&gt;) — buildBrowserosAcpPrompt runs
escapePromptTagText over the entire formatUserMessage payload
before adding the outer <role>+<user_request> envelope, so the
inner anchors never matched against the on-disk text and the user
was still seeing <USER_QUERY> in /agents/:id/sessions/main/history
responses.

Reorder unwrapBrowserosAcpUserMessage to: outer-strip → decode
entities → inner-strips. Test fixtures updated to reflect the
actual on-wire form (escaped inner tags); the round-trip test
duplicates the escape rule inline so the contract between
buildBrowserosAcpPrompt and the unwrap is pinned end-to-end.
2026-05-01 19:42:48 +05:30
github-actions[bot]
19e07c086f chore: sync internal-docs submodule (#903)
Co-authored-by: browseros-bot <bot@browseros.ai>
2026-05-01 08:36:41 +00:00
Nikhil
ab354d7dd7 fix(ci): restore PAT on actions/checkout for submodule fetch (#898)
Without a token on actions/checkout, the action falls back to
GITHUB_TOKEN, which has no access to the private internal-docs
repo. Submodule clone fails with "repository not found".

PAT is back on checkout. PR ops still use GITHUB_TOKEN via the
GH_TOKEN env var on the run step. The bot-branch git push uses
the credential helper set up by checkout (the PAT, which has
Contents: Read and write).
2026-04-30 16:23:58 -07:00
Nikhil
0e779fa344 fix(ci): switch internal-docs sync to PR + auto-merge (#897)
Direct push to dev fails the dev ruleset's "Require pull request"
rule. Open a tiny PR from a bot branch and enable auto-merge
(squash, 0 approvals required) instead. No bypass actor needed —
the rule stays strict for everyone, including the bot.

PR ops use GITHUB_TOKEN with explicit pull-requests: write
permission. The cross-repo PAT is only used to rewrite the SSH
submodule URL so internal-docs can be cloned over HTTPS.
2026-04-30 16:17:15 -07:00
Nikhil
dfbce48994 feat: remove CLI auto init discovery (#896)
* feat: remove CLI auto init discovery

* fix: address review feedback for PR #896
2026-04-30 16:03:47 -07:00
Nikhil
7c942e91ce chore: add internal-docs submodule (#895)
Mounts browseros-ai/internal-docs at .internal-docs/, tracking main.

This activates the /document-internal and /ask-internal skills (which
early-exit if the submodule is missing) and lets the sync-internal-docs
workflow start bumping the pointer on its 4-hourly schedule.

Team members: after this lands, run once from a fresh dev pull:
    git submodule update --init .internal-docs
2026-04-30 15:13:41 -07:00
Nikhil
1ff92c44b3 feat(internal-docs): scaffold private docs submodule, skills, sync action (#894)
* feat(internal-docs): scaffold private docs submodule, skills, sync action

Adds the OSS-side scaffolding for the internal-docs system:

- /document-internal skill — drafts a 1-page feature/architecture/design
  doc from the current branch's diff, asks four sharp questions, enforces
  voice rules (no em dashes, banned filler words, 60-line cap on feature
  notes), then opens a PR to browseros-ai/internal-docs via a tmp clone.
- /ask-internal skill — answers team-internal questions by greping
  internal-docs and the codebase, synthesizing with file:line citations,
  optionally executing surfaced commands with per-command confirmation,
  and drafting a new doc + PR if grep returns nothing useful.
- .github/workflows/sync-internal-docs.yml — every 4 hours, bumps the
  submodule pointer on dev directly (no PR; relies on dev branch
  protection blocking force-push). Skips silently until the submodule
  is configured. Uses url.insteadOf to rewrite the SSH submodule URL
  to HTTPS-with-token for the bot, while keeping SSH the local default.
- .claude/skills/document-internal/seeds/ — root README and three
  templates (feature-note, architecture-note, design-spec) ready to
  copy into the new internal-docs repo on rollout.

Design spec: .llm/superpowers/specs/2026-04-30-internal-docs-submodule-design.md

Manual prereqs (NOT in this PR — handled out-of-band):
1. Create private repo browseros-ai/internal-docs with branch protection on main.
2. Seed it with the contents of .claude/skills/document-internal/seeds/.
3. Create a bot account, mark as bypass actor on dev branch protection.
4. Add INTERNAL_DOCS_SYNC_TOKEN secret with repo + read access to internal-docs.
5. Once internal-docs exists, on a follow-up branch:
     git submodule add -b main git@github.com:browseros-ai/internal-docs.git .internal-docs
6. Send the team the one-time init snippet for their existing checkouts:
     git submodule update --init .internal-docs

* fix(internal-docs): address Greptile review feedback

- Workflow: rebase onto dev before push to handle non-fast-forward race;
  bump fetch-depth 1->50 so rebase has merge-base history.
- Workflow: move INTERNAL_DOCS_SYNC_TOKEN into step env: per Actions
  credential-injection pattern, instead of inlining in the script body.
- Skill (BASE bug): suppress git rev-parse stdout so SHA does not get
  captured into BASE alongside the literal 'dev'. Was breaking every
  downstream git log/diff call.
- Skill (tmp clone): trap 'rm -rf "$TMP" EXIT after mktemp so cleanup
  always runs, even if any subsequent step fails.
2026-04-30 15:04:08 -07:00
shivammittal274
c81906ecbf feat(eval): add claude code eval agent (#885) 2026-05-01 02:25:08 +05:30
Nikhil
ffc0f09c86 feat(dev): add target-aware reset cleanup (#893)
* feat(dev): add target-aware reset cleanup

* fix(dev): address cleanup reset review comments
2026-04-30 13:34:52 -07:00
Nikhil
7fb53c9921 feat(dev): bootstrap setup from dev watch (#891)
* feat(dev): bootstrap setup from dev watch

* fix: address review feedback for PR #891
2026-04-30 13:00:46 -07:00
Nikhil
d38b01a8c7 feat(dev): add guided cleanup and reset commands (#890)
* feat(dev): add guided cleanup and reset commands

* fix: address cleanup reset review feedback
2026-04-30 12:27:15 -07:00
Nikhil
ff36c8412b fix(dev): use run lock for watch cleanup (#889)
* fix(dev): use run lock for watch cleanup

* fix(dev): address watch lock review comments
2026-04-30 11:46:17 -07:00
Nikhil
fd5aba249b fix: stabilize OpenClaw gateway startup (#888)
* feat(server): add shared process lock helper

* feat(container): add container name reconciliation helpers

* feat(openclaw): serialize lifecycle across processes

* fix(openclaw): reconcile fixed gateway container startup

* test(openclaw): cover lifecycle race recovery

* fix(server): satisfy process lock error override

* fix(openclaw): address review feedback

* test(openclaw): align serialization mock with image check
2026-04-30 11:31:40 -07:00
Nikhil
492f3fcdf2 feat(openclaw): prewarm ghcr image in vm (#887)
* feat(openclaw): add gateway image inspection

* feat(openclaw): pull gateway image from registry

* refactor(vm): decouple readiness from image cache

* refactor(openclaw): remove vm cache from runtime factory

* feat(openclaw): detect current gateway image

* feat(openclaw): prewarm vm runtime and reuse current gateway

* feat(openclaw): prewarm runtime on server startup

* refactor(vm): remove browseros image cache runtime

* refactor(build-tools): remove openclaw tarball pipeline

* chore: self-review fixes

* fix(openclaw): suppress prewarm pull progress logs

* fix(openclaw): address review feedback

* fix(openclaw): resolve review findings

* fix(dev): stop stale watch supervisors
2026-04-30 11:18:11 -07:00
Nikhil
cb0c0dd0c1 chore: simplify root test scripts (#886)
* chore: simplify root test scripts

* fix: avoid chained root test scripts

* fix: update test workflow commands

* fix: move app test commands into packages
2026-04-30 10:58:08 -07:00
Dani Akash
8712f89f18 feat(agents): durable per-agent chat message queue + composer Stop (#880)
* feat(agents): durable per-agent chat message queue + composer Stop button

* fix(agents): tighten queue UI — smaller Stop, drop empty indicator, live drain attach

User feedback round 1 on the message-queue UX:

1) The Stop button matched the send/voice mics at h-10 w-10 with a
   solid destructive fill, which read as alarming. Shrunk to h-8 w-8,
   ghost variant with a soft destructive/10 background, smaller
   filled square glyph. Reads as a calm 'stop' affordance instead of
   a panic button.

2) The QueueItem's leading <QueueItemIndicator> dot was decorative
   only — no state, no interaction. Dropped it from QueuePanel along
   with the import; queue items now render as a clean preview line
   with the trailing X remove action.

3) When the server drained the queue and started the next turn, the
   chat panel didn't pick up the live stream until the user
   navigated away and back. The hook's resume effect previously
   only fired on agent change, not on listing-observed activeTurnId
   change. Surface activeTurnId from useHarnessAgents into
   useAgentConversation; effect now re-runs when the id changes,
   calls /chat/active, and attaches to the new turn — so a queued
   message starts streaming the moment the server drain pops it.

* fix(agents): don't reset streaming state from the resume effect's no-op paths

The Stop button was disappearing while the agent was actively
streaming, even though events were still flowing into the chat. Root
cause: the resume effect's `finally` block reset `streaming`,
`turnIdRef`, and `lastSeqRef` unconditionally — including on the
early-return paths (no active turn, or another mechanism already
owns the stream).

Sequence that triggered it:
  1) User sends a message → send() sets streamAbortRef + streaming=true
     and starts consuming the SSE.
  2) User enqueues another message → enqueue mutation invalidates the
     listing query.
  3) Listing refetches with the live activeTurnId → the resume
     effect re-fires (deps include activeTurnIdDep).
  4) attemptResume hits `if (streamAbortRef.current) return` because
     send() owns it.
  5) The finally clause fires anyway and calls setStreaming(false),
     clobbering the live state set by send(). The SSE consumer keeps
     running (refs are intact) so text keeps streaming, but the React
     flag is wrong, so the Stop button gates off.

Fix: track whether *this* run actually started a stream
(`weStartedStream`). The finally only resets state when it does.
Early-return / no-active-turn paths now leave streaming/turnIdRef/
lastSeqRef alone for whoever does own them.

Also widens the Stop button's visibility (`canStop` prop on
ConversationInput) so it stays steady across the brief gap between
turns when a queue drain is mid-flight; the parent computes
`streaming || activeTurnId !== null || queue.length > 0`. The
visibility widening is independent of the streaming-state fix above
— both are now in place.

* revert: drop canStop widening — Stop only shows while streaming

Reverts the canStop prop on ConversationInput and the OR-with-queue
visibility from AgentCommandConversation. Stop is gated solely on
`streaming` again. Between turns (queue draining) the button stays
hidden — only the actively-streaming turn is interruptible from the
composer, which matches what the user actually expects.

* fix(agents): persist the kicking-off prompt on active turns so the resume placeholder isn't empty

When a queued message drained and started a new turn, the chat
panel's resume effect staged a placeholder turn with userText: ''
because the hook had no way to know what message kicked off the
turn — only the agent-side stream was visible, and the user bubble
above it was blank until the user navigated away and back (at which
point the session record's history loaded normally).

Fix: ActiveTurnRegistry.register now accepts an optional `prompt`
that's stashed on the turn and surfaced via describe() / the
ActiveTurnInfo response. AgentHarnessService.startTurn passes the
incoming message into register. /chat/active returns it. The chat
hook's resume effect uses active.prompt as the placeholder
turn's userText, so the user bubble shows the queued message text
the moment streaming begins. Falls back to '' for older clients
that haven't been refetched yet.

* fix(agents): always release streamAbortRef on resume cleanup, even when cancelled

Greptile P1 follow-up. The previous `weStartedStream` guard correctly
stopped the resume effect's no-op early-returns from clobbering an
in-flight `send()` stream — but it also stopped a *cancelled*
mid-stream resume from clearing its own `streamAbortRef`. When the
cleanup fires (e.g. the 5s listing poll captures a new queue-drain
turn id while the SSE for the prior turn is still finishing), the
next effect run hits the `if (streamAbortRef.current) return` guard
against the now-aborted controller and never reattaches, leaving
`streaming === true` with no live stream until the user navigates
away.

Split the finally block: always release `streamAbortRef` when we
owned the controller (so the next run can take over), but only
reset the streaming flag / turn id / lastSeq on a clean exit (the
new run will set those itself, so resetting on cancel would just
flicker).
2026-04-30 18:26:56 +05:30
Dani Akash
ba60bf466f feat(agents): rich command-center rows + home grid + dead-code sweep (#879)
* feat(agents): rich-info command center rows + pin/PATCH/adapter-health backbone

Splits AgentRowCard from a 271-line monolith into a shallow tree of
single-responsibility sub-components under `agent-row/`:

  AgentTile, AdapterHealthDot, PinToggle, AgentTitleRow,
  AgentSparkline, AgentSummaryChips, AgentLastMessage, CwdChip,
  AgentTokenSummary, AgentMetaRow, AgentErrorPanel, AgentActions

Adds the data each row consumes:

- pinned: boolean field on AgentDefinition + FileAgentStore.update
  + new PATCH /agents/:id route. useUpdateHarnessAgent mutation
  optimistically updates the listing cache so the star flips
  instantly; rolls back on error.
- Listing payload extended with lastUserMessage, cwd, tokens
  (cumulative + last7d shape — last7d zero-filled until the
  activity ledger lands), turnsByDay/failedByDay (zero-filled),
  lastError/lastErrorAt, activeTurnId. AcpxRuntime grows a
  getRowSnapshot() that reads cwd + cumulative tokens + last user
  message from the session record in one pass.
- Adapter health: in-memory AdapterHealthChecker probes
  `claude --version` / `codex --version` with a 2s timeout and
  caches results for 5 min. /adapters response carries
  { healthy, reason?, checkedAt }. Tile-corner dot exposes the
  state via HoverCard; openclaw inherits health from the gateway
  snapshot already on the page.

Sub-components are pure: card itself owns no state. Sort order
becomes pinned-first, then recency. HoverCard is the workhorse for
keeping rows compact while exposing depth (full message, token
breakdown, daily turn list, error stack, adapter reason).

* refactor(agents): tighten command-center row design + cut redundant affordances

User feedback round 1:
1) Two green dots on the tile (health + liveness) was confusing. Health
   moves out of the tile entirely and surfaces as an inline 'Unavailable'
   chip in the model line — silent when the adapter is healthy, with a
   warning amber chip + HoverCard reason when not. The tile now shows
   one signal: liveness.
2) The last-user-message HoverCard wasn't telegraphing intent. Drop the
   HoverCard. The line is informational, italic, with a leading quote
   glyph so the row reads like a conversation snippet. To see the full
   message the user opens the chat (which is the action they want next
   anyway).
3) Resume + Chat were duplicate CTAs. Single primary action per row:
   Resume (filled, accent-orange, with a pulsing dot) replaces Chat
   when there's an active turn. Both navigate to /agents/:id but the
   row tells the user which action they're taking.
4) Tokens weren't visible because the row gated on last7d.requestCount,
   which is zero until the activity ledger ships. Switch to lifetime
   tokens (which we have today). Drop the '7d stats:' framing — talking
   about a window we can't compute would be misleading. The HoverCard
   surfaces input/output split + a footnote that per-window stats land
   in a follow-up.
5) CWD was rendering the server's own running directory, which is
   meaningless to users. Hide it from the row entirely. The cwd field
   still rides in the listing payload for future surfaces (chat panel,
   debug view) — only the row stops rendering it.

Aesthetic refinements while we're here:
- Whole card carries state, not just the tile: working rows get an
  accent-orange tinted border with a soft glow, error rows tint
  destructive, idle rows lift on hover.
- Pin star fades in on hover (group-hover) when unpinned and stays
  solid amber when pinned — keeps the rail calm by default.
- Tabular-nums on token figures so columns visually align across rows.
- Drop CwdChip and AdapterHealthDot files: no callers left.

* fix(agents): align row title flush-left whether pinned or not

Pin star moved from leading the title to trailing the badges, and
hidden from layout entirely (`hidden group-hover:inline-flex`) when
unpinned. The previous `opacity-0` rule kept the star reserving its
`size-6` slot, which left every unpinned title indented relative to
the model / preview / meta lines underneath it. Title now flushes
left in both states; pinned star stays solid amber so the signal
isn't hidden, and unpinned reveals an outline star on row hover for
the toggle affordance.

* fix(agents): keep pin-toggle slot reserved so row height is constant

Switching the unpinned star from `hidden group-hover:inline-flex`
to `opacity-0 group-hover:opacity-100`. The hidden/show variant was
collapsing the title row's height when the star wasn't rendered,
which made every card below visibly shift on hover. Always rendering
the button (with opacity-only visibility) keeps the row's vertical
metrics constant; the title still flushes left because the slot is
trailing, not leading.

Card hover effect (-translate-y + shadow-md) restored — the layout
shift wasn't coming from the card hover; it was the pin slot
appearing and disappearing.

* fix(agents): quieten row hover — border-tint only, no lift, no shadow

Drop the `-translate-y-px` and `hover:shadow-md` from the row card
plus the working-state inner ring. The translate + shadow grow
combination was visibly noisy as the cursor moved through the rail —
each row 'lifted' as you passed over it. Hover now just tints the
border in accent-orange/30; working and error states keep their
distinct border colours but no inner ring. Card height and shadow
stay constant in every state, so the rail reads as a calm vertical
list of cards.

* feat(home): rich Recent Agents grid + dead-code sweep

The /home Recent Agents grid was a placeholder shell. Every 'rich'
field on the card (lastMessage, lastMessageTimestamp, activitySummary,
currentTool, costUsd) was wired to undefined because AgentCommandHome
called `buildAgentCardData(agents, status?.status, undefined)` — the
dashboard arg has been hard-coded undefined since the harness
migration. Repointing the grid at `useHarnessAgents` + `useAgentAdapters`
gives every card the same enriched data the rail uses.

What the new card shows per agent:
  • Adapter glyph tile + liveness dot (working pulses; asleep is
    hollow; error is red)
  • Name + Working pill (when active)
  • Adapter · model · reasoning summary line, with an inline
    Unavailable chip + HoverCard reason when the adapter binary
    isn't on $PATH
  • Italic last-user-message preview (line-clamp-2, leading quote
    glyph) — same visual language as the rail
  • Footer: 'X ago' + state chip (Asleep / Attention) OR a Resume
    button (orange, with pulsing dot) when activeTurnId is non-null

Sort on the home grid is active-turn → recency. Pinning is NOT a
sort key here (and there's no pin indicator on the card) — pinning
belongs to the rail at /agents; the home page is action-oriented
and trusts active-turn + recency to surface the right agent.

Dead code removed:
  • useAgentDashboard.ts (96 lines, no callers; subscribed to the
    dead /claw/dashboard/stream from the OpenClaw-only era)
  • useAgentCardData.ts (the dashboard-merge shim; passed undefined
    every call so all enriched fields landed as undefined)
  • AgentCard.tsx (AgentCardExpanded replaced by HomeAgentCard;
    AgentCardCompact had no callers — the dock's compact mode was
    never used)
  • AgentCardData interface dropped from lib/agent-conversations/
    types.ts; the new card consumes HarnessAgent directly

Visual language stays continuous between rail and grid: same
<AgentTile>, same <LivenessDot>, same italic-quote message
preview, same orange Resume button with a pulsing dot.
2026-04-30 16:36:22 +05:30
Nikhil
26afb826c6 feat(eval): add viewer manifest contract (#878)
* refactor(eval): canonicalize viewer manifest contract

* refactor(eval): publish canonical viewer manifests

* feat(eval): make r2 viewer use manifest artifact paths

* fix(eval): keep weekly report compatible with viewer manifests

* docs(eval): document r2 viewer manifest contract

* chore: self-review fixes

* fix: address review feedback for PR #878
2026-04-29 20:50:35 -07:00
Nikhil
b2340c8afa refactor(eval): split orchestrated executor backends (#876)
* refactor(eval): split orchestrated executor backends

* fix(eval): address executor backend review comments
2026-04-29 18:02:32 -07:00
Felarof
790a270f47 Update README.md (#877) 2026-04-29 17:35:15 -07:00
Nikhil
84a79ba0a1 feat: refactor eval pipeline workflow (#875)
* feat(eval): add suite variant config bridge

* feat(eval): add stable run artifacts

* refactor(eval): add shared grader contract

* feat(eval): persist grader artifacts

* refactor(eval): rename runner layers

* refactor(eval): add executor backend boundary

* refactor(eval): split clado backend

* feat(eval): add workflow compatible cli

* feat(eval): add r2 publisher module

* ci(eval): migrate weekly workflow to eval cli

* docs(eval): document suite pipeline

* chore(eval): verify pipeline refactor

* fix: address review feedback for PR #875

* docs(eval): add env example

* docs(eval): explain suites and variants

* chore(eval): organize config layouts

* chore(eval): colocate grader python evaluators
2026-04-29 17:21:02 -07:00
Nikhil
6e3306f5e5 fix: make R2 uploads retryable (#874)
* fix: make R2 uploads retryable

* fix: address review feedback for PR #874
2026-04-29 16:43:33 -07:00
Nikhil
c244462b29 fix: use Node 24 GitHub actions (#872) 2026-04-29 15:31:23 -07:00
Nikhil
ebf97f74f6 fix: bound VM agent cache smoke test (#870)
* fix: bound VM agent cache smoke test

* fix: address review feedback for PR #870
2026-04-29 13:43:37 -07:00
Nikhil
561f2baf97 fix(eval): split AGISDK smoke and full configs (#871)
* fix(eval): split agisdk smoke and full configs

* fix(eval): default agisdk smoke to openrouter
2026-04-29 13:38:55 -07:00
shivammittal274
df0f45dd29 Feat: eval debug dev ci (#869)
* chore(eval): instrument server startup to root-cause dev CI health-check timeouts

Three diagnostics + one config swap to investigate why the eval-weekly
workflow has been failing on dev since 2026-04-25 with "Server health
check timed out" (every worker, every retry).

Background:
- Last successful weekly eval on dev: 2026-04-18 (sha f5a2b73)
- Since then, ~30 server commits landed including Lima/VM runtime,
  OpenClaw service, ACL system, ACP SDK — 108 server files changed,
  ~13K LOC added.
- Server process spawns cleanly in CI (PID logged) but never binds
  /health within the 30s eval-side timeout. Static analysis finds no
  obvious blocker; we need runtime evidence.

Changes:

1. apps/server/package.json — add `start:ci` script (no `--watch`).
   The default `start` uses `bun --watch` which forks a child process
   that watches every file in the import graph. Dev's graph is ~108
   files larger than main's; on a cold CI runner the watcher setup is a
   plausible source of multi-second startup overhead.

2. apps/eval/src/runner/browseros-app-manager.ts:
   - Use `start:ci` when `process.env.CI` is set (true on
     GitHub-hosted runners by default), else `start`.
   - Capture per-worker server stderr to /tmp/browseros-server-logs/
     instead of ignoring it. Without this we have no visibility into
     why the server is hung pre-/health.
   - Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module
     graph may simply need more cold-start time on CI.

3. .github/workflows/eval-weekly.yml — upload the server logs dir as a
   workflow artifact (always, not just on success) so we can post-mortem
   any startup failure on the next run.

4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter ->
   Fireworks (bypasses the OpenRouter per-key spend cap that has been
   eating recent runs) and drop num_workers 10 -> 4 (well below the
   Fireworks per-account TPM threshold that overwhelmed the original
   2026-04-23 run).

Plan: trigger the eval-weekly workflow on this branch with the agisdk
config and observe (a) whether it gets past server startup, and
(b) if it doesn't, what the captured server stderr says.

* fix(eval): capture stdout too — pino logger writes to stdout, not stderr

Previous diagnostic patch only redirected stderr; the captured per-worker
log files came back as 0 bytes because the server uses pino which writes
all log output to stdout (fd 1), not stderr (fd 2). Capture both into
the same file.

* fix(server): catch sync throw from OpenClaw constructor on Linux

The container runtime constructor in OpenClawService throws synchronously
on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing
.catch() on tryAutoStart() only handles async throws inside auto-start —
the sync throw from configureOpenClawService(...) itself propagates up
through Application.start() and crashes the process via index.ts:48
(process.exit(EXIT_CODES.GENERAL_ERROR)).

This is what's been killing dev's eval-weekly CI: the server crashes in
milliseconds, the eval client polls /health, gets nothing, times out.

Fix: wrap the configureOpenClawService call in try/catch matching the
existing .catch() intent (best-effort, don't crash). Server continues
without OpenClaw on platforms where it can't initialize.

Verified by reading captured server stdout from run 25123195126:
  Failed to start server: error: browseros-vm currently supports macOS only
    at buildContainerRuntime (container-runtime-factory.ts:54:11)
    at new OpenClawService (openclaw-service.ts:652:15)
    at configureOpenClawService (openclaw-service.ts:1527:19)
    at start (main.ts:127:5)

* fix(server): defer OpenClaw chat client port lookup to request time

apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort()
synchronously when constructing the OpenClawGatewayChatClient inside the
createHttpServer object literal. On non-darwin platforms this throws via
the OpenClawService constructor → buildContainerRuntime, escaping the
try/catch added in 5cf7b765 (which only protected the configureOpenClawService
call further down in main.ts).

Every other getOpenClawService() reference in server.ts is already wrapped
in an arrow function. This was the lone holdout. Make it lazy too: change
the chat client constructor to take getHostPort: () => number instead of
hostPort: number, evaluate it inside streamTurn at request time. Behavior
on darwin is unchanged.

This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't
available — the chat endpoint isn't exercised by the eval, so a deferred
throw is acceptable.

* fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1

Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't
unblock dev's Linux CI — same throw kept reproducing. Whether this is bun
caching stale stack frames or a missed eager call site, the safer move is
to fix it at the root: make buildContainerRuntime never throw on Linux
when the runner has explicitly opted out.

Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test
escape hatch in container-runtime-factory.ts. When set, returns the existing
UnsupportedPlatformTestRuntime stub — server boots normally, /health binds,
any actual OpenClaw API call still fails loudly at request time.

eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and
non-CI Linux behavior unchanged (without the flag they still throw).

* feat(eval): align Clado action executor with new endpoint contract

David Shan shared the updated Clado BrowserOS Action Model spec.
Changes to match it:

- Bump endpoint URL + model id to the 000159-merged checkpoint
  (clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef)
  in browseros-oe-clado-weekly.json and the README example.
- CLADO_REQUEST_TIMEOUT_MS 120s → 360s. Cold start can take ~5 min;
  the 2-min ceiling was failing every cold-start request.
- Treat HTTP 200 with action=null / parse_error as an INVALID step
  instead of aborting the executor loop. The model can self-correct
  on the next call. Cap consecutive parse failures at 3 to avoid
  infinite loops.
- Capture final_answer from end actions. Surface it in the observation
  back to the orchestrator so its task answer can use the model's
  declared result.
- Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X).
- Switch screenshot format from webp → png to match the documented
  "PNG or JPEG" contract.

* chore(eval): refresh test-clado-api script for new Clado contract

Updated the local smoke-test to match the new Clado endpoint and
response contract:

- New action + health URLs (000159-merged checkpoint).
- Drop the grounding-model branch (orchestrator-executor doesn't
  use it; the README David shared only documents the action model).
- Health-check waits up to 6 minutes for cold start with a 30s
  warning so the operator knows it's spinning up.
- Print every documented response field (action, x/y, text, key,
  direction, amount, drag start/end, time, final_answer, thinking,
  parse_error, inference_time_seconds).
- Three-step run that exercises a click, a typing continuation
  with formatted history, and an end+final_answer probe.

* chore(eval): point clado weekly config at agisdk-real

Switches the orchestrator-executor + Clado weekly config to run on the
AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff
grader. Matches the orchestrator-executor smoke target (Fireworks K2.5
orchestrator + Clado action executor) we want to track week-over-week.

* chore(eval): run clado weekly headless

Default to headless so the weekly job (and local repros) don't pop ten
visible Chrome windows. Set headless=false locally if you need to watch
a worker.

* fix(eval): address Greptile P1+P2 on server log fd handling

P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir
failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the
log directory missing and crash the server spawn with ENOENT. Move openSync
into the same try block; fall back to /dev/null so spawn always succeeds.

P2: the log fd was opened on every server start but never closed. Each
restart attempt leaked one fd across all workers — over a long eval run
that could exhaust the process fd limit. Track the fd on the manager and
closeSync it in killApp() right after the server process exits (the child's
dup keeps the file open until it exits, so we don't truncate output).
2026-04-30 01:33:49 +05:30
Nikhil
edfc5c751c fix: align OpenClaw gateway image with VM cache (#868)
* fix: load OpenClaw gateway image from VM cache

* fix: use container port for OpenClaw ACP bridge

* fix: address review feedback for PR #868
2026-04-29 12:11:00 -07:00
Nikhil
471256f31c fix: stop passing native permission flags to ACP adapters (#867) 2026-04-29 11:07:51 -07:00
Nikhil
4c90ca696b fix(agents): connect OpenClaw ACP inside gateway container (#866) 2026-04-29 11:07:29 -07:00
Nikhil
f2ac87d7c3 feat: show created agents in sidepanel (#865)
* feat(agent): list created agents in sidepanel target catalog

* feat(agent): show created agents in sidepanel selector

* feat(server): add sidepanel chat route for created agents

* feat(agent): route sidepanel agent sends by agent id

* chore(agent): retire virtual sidepanel acp targets

* fix: address review feedback for PR #865
2026-04-29 10:15:58 -07:00
293 changed files with 18859 additions and 7580 deletions

View File

@@ -0,0 +1,152 @@
---
name: ask-internal
description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---
# Ask Internal
Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
## When to use
- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
## Hard rules — never do these
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.
## Voice rules
Apply the same voice rules as `document-internal` to the synthesized answer:
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.
## Workflow
### Step 0: Pre-flight
```bash
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
echo ".internal-docs/ missing or empty. Submodule not configured?"
exit 0
}
```
### Step 1: Parse the question
Pull the keywords from the user's question. Drop stop words. Identify intent:
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
### Step 2: Multi-source search
Run grep in parallel across two sources.
**Internal docs:**
```bash
grep -rni --include='*.md' '<keyword>' .internal-docs/
```
Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
**Codebase (skip vendored Chromium and `node_modules`):**
```bash
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
--exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
'<keyword>' packages/ scripts/ .config/ .github/
```
Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
### Step 3: Synthesize answer
Structure the response:
1. **Direct answer.** First sentence answers the question. No preamble.
2. **Steps if applicable.** Numbered list with exact commands.
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
### Step 4: Offer execution (only if commands surfaced)
If Step 3 produced executable commands the user could run, ask:
> Run these for you? (y / n / dry-run)
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
### Step 5: Doc-not-found path
If Step 2 returned nothing useful (no doc hits AND no clear code answer):
1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
2. Ask: "Draft a new doc and open a PR to internal-docs?"
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
### Step 6: Completion status
Report one of:
- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
## Citation discipline
Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
If a doc says one thing and the code says another, surface the conflict explicitly:
> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
## Common Mistakes
**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.
## Red Flags
**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.

View File

@@ -0,0 +1,208 @@
---
name: document-internal
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---
# Document Internal
Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
## When to use
After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
## Hard rules — never do these
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
## Voice rules — enforced by Step 4
The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.
## Workflow
### Step 0: Pre-flight
Bail with a clear message on any failure.
```bash
# Submodule must be initialized
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
# Must be on a feature branch
BRANCH=$(git branch --show-current)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
echo "On $BRANCH. Run from a feature branch."
exit 0
fi
# Determine base branch (default: dev for this repo, fall back to main).
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
# Gather context
git log "$BASE..HEAD" --oneline
git diff "$BASE...HEAD" --stat
gh pr view --json body -q .body 2>/dev/null # may be empty if no PR yet
```
### Step 1: Identify the doc
Ask the user for three things in one prompt:
1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
### Step 2: Decision brief — four sharp questions
Ask one question at a time. Each answer constrains the next. These force compression before drafting.
1. "In one sentence: what can someone now DO that they could not before?"
2. "What is the one design decision a future engineer needs to know?"
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
4. "Any sharp edges or gotchas? (or 'none')"
Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
### Step 3: Draft from the template
Read the matching template from `.internal-docs/_templates/`:
- `feature` `feature-note.md`
- `architecture` `architecture-note.md`
- `design` `design-spec.md`
If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
Generate the 1-pager from the template, the four answers, and the diff context.
### Step 4: Voice self-check
Scan the draft for violations:
- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).
If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
### Step 5: Show + iterate
Print the full draft. Ask:
> Edit needed? Paste any changes, or say "looks good".
Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
### Step 6: Open PR to internal-docs
Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
```bash
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT # cleans up even if any step below fails
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
cd "$TMP"
git checkout -b "docs/<slug>"
# Write the doc
mkdir -p "<type>" # features, architecture, designs, or setup
cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
<draft content>
DOC
# Update the root README index — insert one line under the matching section
# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
git commit -m "docs(<type>): <slug>"
git push -u origin "docs/<slug>"
PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
--head "docs/<slug>" \
--title "docs(<type>): <slug>" \
--body "$(cat <<'BODY'
## Summary
<one-line of what this doc covers>
## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
BODY
)")
cd -
echo "PR opened: $PR_URL"
# trap above cleans up $TMP on EXIT
```
If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
### Step 7: Completion status
Report one of:
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
## Doc type defaults
| Branch pattern | Default doc type | Default location |
|----------------|------------------|------------------|
| `feat/*` | feature | `features/` |
| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
| `rfc/*` or `design/*` | design | `designs/` |
| Otherwise | ask | ask |
## Common Mistakes
**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.
**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.
## Red Flags
**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.
**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.

View File

@@ -0,0 +1,51 @@
# BrowserOS Internal Docs
Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
## How to find what you need
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)
Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
## How to add a doc
Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
## Index
### Setup
<!-- one line per setup runbook: -->
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
### Features
<!-- one line per shipped feature, newest first: -->
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
### Architecture
<!-- one line per cross-cutting subsystem: -->
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
### Designs
<!-- one line per design spec, newest first: -->
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
## Templates
When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
## Voice
Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.

View File

@@ -0,0 +1,31 @@
---
title: <subsystem name>
owner: <github handle>
status: current | deprecated
date: YYYY-MM-DD
related-features: [feature-slug-1, feature-slug-2]
---
# <subsystem name>
## What this subsystem does
<1-2 paragraphs. The top-level responsibility. Boundaries.>
## Architecture
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
## Constraints
<Hard rules the design enforces. "X must never call Y" type statements.>
## Decisions made
<Numbered list of non-obvious decisions and the reason for each.>
## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here
## How to evolve this
<Where to add things. Which tests to expect to update. What NOT to touch.>
## Open questions
<What is still being figured out. Empty if none.>

View File

@@ -0,0 +1,34 @@
---
title: <design name>
owner: <github handle>
status: proposed | accepted | rejected | superseded
date: YYYY-MM-DD
supersedes: <design-slug or none>
---
# <design name>
## Goal
<2-4 sentences. What this design is trying to accomplish.>
## Context
<1-2 paragraphs. The current state, what is failing, why this needs to change.>
## Selected Approach
<The chosen design at a high level. Architecture, components, data flow.>
## Alternatives Considered
### 1. <name>
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
### 2. <name>
<Same shape.>
## Out of Scope
<What this design does NOT cover. Defer references.>
## Rollout
<Numbered steps from "nothing exists" to "fully shipped".>
## Open Questions
<Resolved during design? Empty. Unresolved? List with owner.>

View File

@@ -0,0 +1,29 @@
---
title: <feature name>
owner: <github handle>
status: shipped | wip | deprecated
date: YYYY-MM-DD
prs: ["#NNN"]
tags: [agent, browser, mcp]
---
# <feature name>
## What it does
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
## Why we built it
<1-2 sentences. Motivation. What pain it removed or what unlocked.>
## How it works
<3-6 sentences. The flow at a high level. Name the key files.>
## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does
## How to run / test it locally
<bullet list of commands. Empty section if N/A do not fake.>
## Gotchas
<known sharp edges. "If you see X, that's why." Empty if N/A.>

View File

@@ -1,157 +0,0 @@
name: build-agent
on:
workflow_dispatch:
inputs:
agent:
description: "Agent name from bundle.json"
required: true
type: string
default: openclaw
publish:
description: "Upload to R2 and merge manifest slice"
required: false
default: false
type: boolean
pull_request:
paths:
- "packages/browseros-agent/packages/build-tools/**"
- ".github/workflows/build-agent.yml"
env:
BUN_VERSION: "1.3.6"
PKG_DIR: packages/browseros-agent/packages/build-tools
permissions:
contents: read
jobs:
check:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- working-directory: packages/browseros-agent
run: bun run --filter @browseros/build-tools typecheck
- working-directory: packages/browseros-agent
run: bun run --filter @browseros/build-tools test
build:
needs: check
strategy:
fail-fast: false
matrix:
include:
- arch: arm64
runner: ubuntu-24.04-arm
runs-on: ${{ matrix.runner }}
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- name: Install podman
run: |
sudo apt-get update
sudo apt-get install -y podman
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Build tarball
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
OUT: ${{ github.workspace }}/dist/images
run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
- uses: actions/upload-artifact@v4
with:
name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
path: dist/images/
retention-days: 7
smoke:
needs: build
runs-on: ubuntu-24.04-arm
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- uses: actions/download-artifact@v4
with:
name: tarball-${{ inputs.agent || 'openclaw' }}-arm64
path: dist/images
- name: Install podman
run: |
sudo apt-get update
sudo apt-get install -y podman
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Smoke test tarball
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
run: |
set -euo pipefail
tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-arm64.tar.gz" -print -quit)"
if [ -z "$tarball" ]; then
echo "missing arm64 tarball artifact for ${AGENT}" >&2
exit 1
fi
bun run smoke:tarball -- --agent "$AGENT" --arch arm64 --tarball "$tarball"
publish:
needs: [build, smoke]
if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
runs-on: ubuntu-24.04
environment: release
concurrency:
group: r2-manifest-publish
cancel-in-progress: false
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: ${{ env.BUN_VERSION }}
- uses: actions/download-artifact@v4
with:
pattern: tarball-*
path: dist/images
merge-multiple: true
- working-directory: packages/browseros-agent
run: bun install --frozen-lockfile
- name: Upload tarballs to R2
working-directory: ${{ env.PKG_DIR }}
env:
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
R2_BUCKET: ${{ secrets.R2_BUCKET }}
run: |
set -euo pipefail
for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
base="$(basename "$file")"
bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
done
- name: Merge agent slice into manifest
working-directory: ${{ env.PKG_DIR }}
env:
AGENT: ${{ inputs.agent || 'openclaw' }}
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
R2_BUCKET: ${{ secrets.R2_BUCKET }}
run: |
set -euo pipefail
mkdir -p dist/images
cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
bun run emit-manifest -- \
--slice "agents:${AGENT}" \
--dist-dir dist \
--merge-from dist/baseline-manifest.json \
--out dist/manifest.json
bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"

View File

@@ -14,7 +14,7 @@ on:
config:
description: 'Eval config file (relative to apps/eval/)'
required: false
default: 'configs/browseros-agent-weekly.json'
default: 'configs/legacy/browseros-agent-weekly.json'
permissions:
contents: read
@@ -62,33 +62,27 @@ jobs:
curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
unzip -qo /tmp/nopecha.zip -d extensions/nopecha
- name: Run eval
- name: Run eval and publish to R2
working-directory: packages/browseros-agent/apps/eval
env:
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
run: |
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"
- name: Upload runs to R2
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
# OpenClaw container runtime is macOS-only; opt the Linux runner
# into the no-op stub so the server can boot and the eval can run.
BROWSEROS_SKIP_OPENCLAW: '1'
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
bun scripts/upload-run.ts "results/$CONFIG_NAME"
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2
- name: Generate trend report
if: success()
@@ -109,3 +103,11 @@ jobs:
with:
name: eval-report-${{ github.run_id }}
path: /tmp/eval-report.html
- name: Upload server stderr logs (for post-mortem on startup failures)
if: always()
uses: actions/upload-artifact@v4
with:
name: browseros-server-logs-${{ github.run_id }}
path: /tmp/browseros-server-logs/
if-no-files-found: ignore

View File

@@ -0,0 +1,62 @@
name: Sync internal-docs submodule
on:
schedule:
- cron: '0 */4 * * *'
workflow_dispatch:
jobs:
sync:
name: Bump internal-docs submodule pointer on dev
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
steps:
- name: Rewrite SSH submodule URL to HTTPS-with-token
env:
TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
run: |
git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
- uses: actions/checkout@v4
with:
token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
submodules: true
ref: dev
fetch-depth: 50
- name: Open auto-merge PR if internal-docs has new commits
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -e
# Skip if submodule not yet configured (handoff window before someone adds it)
if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
exit 0
fi
git submodule update --remote --merge .internal-docs
if git diff --quiet .internal-docs; then
echo "No internal-docs changes to sync."
exit 0
fi
BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
git config user.name "browseros-bot"
git config user.email "bot@browseros.ai"
git checkout -b "$BRANCH"
git add .internal-docs
git commit -m "chore: sync internal-docs submodule"
git push -u origin "$BRANCH"
PR_URL=$(gh pr create \
--base dev \
--head "$BRANCH" \
--title "chore: sync internal-docs submodule" \
--body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")
gh pr merge "$PR_URL" --auto --squash --delete-branch

View File

@@ -63,15 +63,15 @@ jobs:
junit_path: test-results/server-root.xml
needs_browser: false
- suite: agent
command: bun run test:agent
command: (cd apps/agent && bun run test)
junit_path: test-results/agent.xml
needs_browser: false
- suite: eval
command: bun run test:eval
command: (cd apps/eval && bun run test)
junit_path: test-results/eval.xml
needs_browser: false
- suite: build
command: bun run test:build
command: bun run ./scripts/run-bun-test.ts ./scripts/build
junit_path: test-results/build.xml
needs_browser: false

4
.gitmodules vendored
View File

@@ -0,0 +1,4 @@
[submodule ".internal-docs"]
path = .internal-docs
url = git@github.com:browseros-ai/internal-docs.git
branch = main

1
.internal-docs Submodule

Submodule .internal-docs added at 590799ae1c

View File

@@ -188,6 +188,21 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
- [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.
## Citation
If you use BrowserOS in your research or project, please cite:
```bibtex
@software{browseros2025,
author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
title = {BrowserOS: The open-source Agentic browser},
url = {https://github.com/browseros-ai/BrowserOS},
year = {2025},
publisher = {GitHub},
license = {AGPL-3.0},
}
```
## License
BrowserOS is open source under the [AGPL-3.0 license](LICENSE).

View File

@@ -79,14 +79,15 @@ cp apps/server/.env.example apps/server/.env.development
cp apps/agent/.env.example apps/agent/.env.development
cp apps/server/.env.production.example apps/server/.env.production
# Install deps, generate agent code, and sync the VM cache
# Install deps and generate agent code
bun run dev:setup
# Start the full dev environment
bun run dev:watch
```
`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.
`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
the server startup path and pulls the configured GHCR image on demand.
### Environment Variables
@@ -156,9 +157,14 @@ bun run build:server # Build production server resource artifacts and u
bun run build:agent # Build agent extension
# Test
bun run test # Run standard tests
bun run test:cdp # Run CDP-based tests
bun run test:integration # Run integration tests
bun run test # Run all tests
bun run test:all # Run all tests
bun run test:main # Run key server tools and integration tests
# App-specific test groups (from packages/browseros-agent)
cd apps/server && bun run test:tools
cd apps/server && bun run test:cdp
cd apps/server && bun run test:integration
# Quality
bun run lint # Check with Biome

View File

@@ -17,7 +17,7 @@ export function groupProviderOptions(
? [{ key: 'llm' as const, label: 'AI Providers', options: llm }]
: []),
...(acp.length
? [{ key: 'acp' as const, label: 'ACP Models', options: acp }]
? [{ key: 'acp' as const, label: 'Agents', options: acp }]
: []),
]
}
@@ -26,14 +26,25 @@ export function getProviderSearchValue(
provider: Provider,
groupLabel: string,
): string {
return [provider.id, provider.name, provider.type, groupLabel]
return [
provider.id,
provider.name,
provider.type,
groupLabel,
provider.adapterName,
provider.modelLabel,
]
.filter(Boolean)
.join(' ')
}
export function getProviderSubtitle(provider: Provider): string | undefined {
if (provider.kind !== 'acp') return undefined
return provider.modelControl === 'best-effort'
? 'ACP model · best effort'
: 'ACP model'
return [
provider.adapterName,
provider.modelLabel,
provider.modelControl === 'best-effort' ? 'best effort' : undefined,
]
.filter(Boolean)
.join(' · ')
}

View File

@@ -16,22 +16,26 @@ const options: Provider[] = [
},
{
kind: 'acp',
id: 'acp:claude:haiku:medium',
name: 'Claude Code Haiku',
id: 'agent-claude-review',
name: 'Review Bot',
type: 'acp',
adapterName: 'Claude Code',
modelLabel: 'Haiku',
modelControl: 'best-effort',
},
{
kind: 'acp',
id: 'acp:codex:gpt-5.5:medium',
name: 'Codex GPT-5.5',
id: 'agent-codex-browser',
name: 'Browser Driver',
type: 'acp',
adapterName: 'Codex',
modelLabel: 'GPT-5.5',
modelControl: 'runtime-supported',
},
]
describe('groupProviderOptions', () => {
it('groups normal providers separately from ACP models', () => {
it('groups normal providers separately from created agents', () => {
expect(groupProviderOptions(options)).toEqual([
{
key: 'llm',
@@ -40,7 +44,7 @@ describe('groupProviderOptions', () => {
},
{
key: 'acp',
label: 'ACP Models',
label: 'Agents',
options: [options[2], options[3]],
},
])
@@ -48,20 +52,21 @@ describe('groupProviderOptions', () => {
})
describe('getProviderSearchValue', () => {
it('matches ACP group labels and item labels', () => {
expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
'ACP Models',
)
expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
'Claude Code Haiku',
it('matches created-agent group labels and item labels', () => {
expect(getProviderSearchValue(options[2], 'Agents')).toContain('Agents')
expect(getProviderSearchValue(options[2], 'Agents')).toContain('Review Bot')
expect(getProviderSearchValue(options[2], 'Agents')).toContain(
'Claude Code',
)
})
})
describe('getProviderSubtitle', () => {
it('does not present best-effort ACP models as guaranteed routing', () => {
expect(getProviderSubtitle(options[2])).toBe('ACP model · best effort')
expect(getProviderSubtitle(options[3])).toBe('ACP model')
it('describes created-agent runtime context without model-target copy', () => {
expect(getProviderSubtitle(options[2])).toBe(
'Claude Code · Haiku · best effort',
)
expect(getProviderSubtitle(options[3])).toBe('Codex · GPT-5.5')
expect(getProviderSubtitle(options[0])).toBeUndefined()
})
})

View File

@@ -41,7 +41,10 @@ export const ChatProviderSelector: FC<
<PopoverTrigger asChild>{children}</PopoverTrigger>
<PopoverContent side="bottom" align="start" className="w-64 p-0">
<Command>
<CommandInput placeholder="Search models..." className="h-9" />
<CommandInput
placeholder="Search providers or agents..."
className="h-9"
/>
<CommandList>
<CommandEmpty>No provider found</CommandEmpty>
{groups.map((group) => (

View File

@@ -7,5 +7,8 @@ export interface Provider {
name: string
type: ChatProviderType
kind: 'llm' | 'acp'
agentId?: string
adapterName?: string
modelLabel?: string
modelControl?: 'runtime-supported' | 'best-effort'
}

View File

@@ -1,136 +0,0 @@
import { Bot, Loader2, Wrench } from 'lucide-react'
import type { FC } from 'react'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import { cn } from '@/lib/utils'
interface AgentCardProps {
agent: AgentCardData
onClick: () => void
active?: boolean
}
function formatTimestamp(timestamp?: number): string {
if (!timestamp) return 'No activity yet'
const diff = Date.now() - timestamp
const minutes = Math.floor(diff / 60000)
if (minutes < 1) return 'just now'
if (minutes < 60) return `${minutes}m ago`
const hours = Math.floor(minutes / 60)
if (hours < 24) return `${hours}h ago`
return `${Math.floor(hours / 24)}d ago`
}
function getStatusLabel(status: AgentCardData['status']): string {
if (status === 'working') return 'Working'
if (status === 'error') return 'Error'
return 'Ready'
}
function getStatusTone(status: AgentCardData['status']): string {
if (status === 'working') return 'bg-amber-500'
if (status === 'error') return 'bg-destructive'
return 'bg-emerald-500'
}
function formatCost(usd: number): string {
if (usd < 0.005) return `$${usd.toFixed(4)}`
return `$${usd.toFixed(2)}`
}
export const AgentCardExpanded: FC<AgentCardProps> = ({
agent,
onClick,
active,
}) => (
<button
type="button"
onClick={onClick}
className={cn(
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
active
? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
: 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
)}
>
<div className="flex items-start justify-between gap-3">
<div className="flex min-w-0 items-center gap-3">
<div
className={cn(
'flex size-10 shrink-0 items-center justify-center rounded-xl',
active
? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
: 'bg-muted text-muted-foreground',
)}
>
<Bot className="size-5" />
</div>
<div className="min-w-0">
<div className="truncate font-semibold text-sm">{agent.name}</div>
<div className="truncate text-muted-foreground text-xs">
{agent.model ?? 'OpenClaw agent'}
</div>
</div>
</div>
<div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
<span
className={cn('size-2 rounded-full', getStatusTone(agent.status))}
/>
<span>{getStatusLabel(agent.status)}</span>
</div>
</div>
<div className="mt-4 flex-1">
<p className="line-clamp-2 text-foreground/90 text-sm">
{agent.lastMessage ??
'Start a conversation to see recent work and summaries.'}
</p>
</div>
<div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
<div className="flex items-center justify-between gap-3">
<span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
{agent.costUsd ? (
<span className="tabular-nums opacity-70">
{formatCost(agent.costUsd)}
</span>
) : null}
</div>
{agent.status === 'working' && agent.currentTool ? (
<div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
<Loader2 className="size-3 shrink-0 animate-spin" />
<span className="truncate">{agent.currentTool}</span>
</div>
) : agent.activitySummary ? (
<div className="flex items-center gap-1.5 text-muted-foreground/60">
<Wrench className="size-3 shrink-0" />
<span className="truncate">{agent.activitySummary}</span>
</div>
) : null}
</div>
</button>
)
export const AgentCardCompact: FC<AgentCardProps> = ({
agent,
onClick,
active,
}) => (
<button
type="button"
onClick={onClick}
className={cn(
'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
active
? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
: 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
)}
>
<span
className={cn(
'size-2 rounded-full',
active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
)}
/>
<span className="truncate">{agent.name}</span>
</button>
)

View File

@@ -1,70 +1,71 @@
import { Plus } from 'lucide-react'
import type { FC } from 'react'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import type {
HarnessAdapterDescriptor,
HarnessAdapterHealth,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import { cn } from '@/lib/utils'
import { AgentCardCompact, AgentCardExpanded } from './AgentCard'
import { HomeAgentCard } from './HomeAgentCard'
interface AgentCardDockProps {
agents: AgentCardData[]
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
activeAgentId?: string
onSelectAgent: (agentId: string) => void
onCreateAgent?: () => void
compact?: boolean
}
function CreateAgentButton({
compact,
onCreateAgent,
}: {
compact?: boolean
onCreateAgent: () => void
}) {
function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
return (
<button
type="button"
onClick={onCreateAgent}
className={cn(
'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
compact
? 'rounded-full px-3 py-2 text-sm'
: 'min-h-32 rounded-2xl px-5 py-4',
'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
)}
>
<Plus className={compact ? 'size-3.5' : 'size-5'} />
<span>{compact ? 'New' : 'Create agent'}</span>
<Plus className="size-5" />
<span>Create agent</span>
</button>
)
}
/**
* 3-column grid of HomeAgentCards plus a trailing "Create agent"
* tile. The previous `compact` mode (rendered a horizontal pill rail)
* had no callers and was dropped along with the legacy AgentCard.
*/
export const AgentCardDock: FC<AgentCardDockProps> = ({
agents,
adapters,
activeAgentId,
onSelectAgent,
onCreateAgent,
compact,
}) => {
if (agents.length === 0 && !onCreateAgent) return null
const Card = compact ? AgentCardCompact : AgentCardExpanded
const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
for (const descriptor of adapters) {
if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
}
return (
<div
className={cn(
compact
? 'flex items-center gap-2 overflow-x-auto pb-1'
: 'grid gap-4 md:grid-cols-3',
)}
>
<div className="grid gap-4 md:grid-cols-3">
{agents.map((agent) => (
<Card
key={agent.agentId}
<HomeAgentCard
key={agent.id}
agent={agent}
active={agent.agentId === activeAgentId}
onClick={() => onSelectAgent(agent.agentId)}
adapter={agent.adapter}
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
active={agent.id === activeAgentId}
onClick={() => onSelectAgent(agent.id)}
/>
))}
{onCreateAgent ? (
<CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
<CreateAgentButton onCreateAgent={onCreateAgent} />
) : null}
</div>
)

View File

@@ -1,179 +1,35 @@
import { ArrowLeft, Bot, Home } from 'lucide-react'
import { ArrowLeft } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
import { Button } from '@/components/ui/button'
import type {
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import {
type AgentEntry,
getModelDisplayName,
} from '@/entrypoints/app/agents/useOpenClaw'
import { cn } from '@/lib/utils'
cancelHarnessTurn,
useAgentAdapters,
useEnqueueHarnessMessage,
useHarnessAgents,
useRemoveHarnessQueuedMessage,
useUpdateHarnessAgent,
} from '@/entrypoints/app/agents/useAgents'
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
import { AgentRail } from './AgentRail'
import { useAgentCommandData } from './agent-command-layout'
import { ClawChat } from './ClawChat'
import { ConversationHeader } from './ConversationHeader'
import { ConversationInput } from './ConversationInput'
import {
buildChatHistoryFromClawMessages,
filterTurnsPersistedInHistory,
flattenHistoryPages,
} from './claw-chat-types'
import { QueuePanel } from './QueuePanel'
import { useAgentConversation } from './useAgentConversation'
import { useHarnessChatHistory } from './useHarnessChatHistory'
function StatusBadge({ status }: { status: string }) {
return (
<div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
<span
className={cn(
'size-1.5 rounded-full',
status === 'Working on your request'
? 'bg-amber-500'
: status === 'Ready'
? 'bg-emerald-500'
: status === 'Offline'
? 'bg-muted-foreground/50'
: 'bg-[var(--accent-orange)]',
)}
/>
<span>{status}</span>
</div>
)
}
function AgentIdentity({
name,
meta,
className,
}: {
name: string
meta: string
className?: string
}) {
return (
<div className={cn('min-w-0', className)}>
<div className="truncate font-semibold text-[15px] leading-5">{name}</div>
<div className="truncate text-muted-foreground text-xs leading-5">
{meta}
</div>
</div>
)
}
function ConversationHeader({
agentName,
agentMeta,
status,
backLabel,
backTarget,
onGoHome,
}: {
agentName: string
agentMeta: string
status: string
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
}) {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
return (
<div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 rounded-xl lg:hidden"
title={backLabel}
>
<BackIcon className="size-4" />
</Button>
<div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
<Bot className="size-4" />
</div>
<AgentIdentity name={agentName} meta={agentMeta} />
</div>
<StatusBadge status={status} />
</div>
)
}
function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
return (
<div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 rounded-xl"
title="Back to home"
>
<ArrowLeft className="size-4" />
</Button>
<div className="truncate font-semibold text-[15px] leading-5">
Agents
</div>
</div>
</div>
)
}
function AgentRailList({
activeAgentId,
agents,
onSelectAgent,
}: {
activeAgentId: string
agents: AgentEntry[]
onSelectAgent: (entry: AgentEntry) => void
}) {
return (
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
<div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
{agents.map((entry) => {
const active = entry.agentId === activeAgentId
const modelName = getAgentEntryMeta(entry)
return (
<button
key={entry.agentId}
type="button"
onClick={() => onSelectAgent(entry)}
className={cn(
'w-full rounded-2xl border px-3 py-3 text-left transition-all',
active
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
)}
>
<div className="flex items-center gap-3">
<div
className={cn(
'flex size-9 items-center justify-center rounded-xl',
active
? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
: 'bg-muted text-muted-foreground',
)}
>
<Bot className="size-4" />
</div>
<AgentIdentity name={entry.name} meta={modelName} />
</div>
</button>
)
})}
</div>
</aside>
)
}
function getAgentEntryMeta(agent: AgentEntry | undefined): string {
if (agent?.source === 'agent-harness') {
return getModelDisplayName(agent.model) ?? 'ACP agent'
}
return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
}
function AgentConversationController({
agentId,
initialMessage,
@@ -212,15 +68,33 @@ function AgentConversationController({
[historyMessages],
)
// Listing query feeds queue + active-turn state for this agent. We
// already poll it every 5s for the rail; reusing the same cache
// keeps cross-tab queue state in sync without a second poll.
const { harnessAgents } = useHarnessAgents()
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
const queue = harnessAgent?.queue ?? []
const activeTurnId = harnessAgent?.activeTurnId ?? null
const { turns, streaming, send } = useAgentConversation(agentId, {
runtime: 'agent-harness',
sessionKey: null,
history: chatHistory,
activeTurnId,
onComplete: () => {
void harnessHistoryQuery.refetch()
},
onSessionKeyChange: () => {},
})
const enqueueMessage = useEnqueueHarnessMessage()
const removeQueuedMessage = useRemoveHarnessQueuedMessage()
const handleStop = () => {
void cancelHarnessTurn(agentId, {
turnId: activeTurnId ?? undefined,
reason: 'user pressed stop',
})
}
const visibleTurns = useMemo(
() => filterTurnsPersistedInHistory(turns, historyMessages),
[historyMessages, turns],
@@ -264,7 +138,7 @@ function AgentConversationController({
}
return (
<div className="flex min-h-0 flex-col overflow-hidden">
<div className="flex min-h-0 flex-1 flex-col overflow-hidden">
<ClawChat
agentName={agentName}
historyMessages={historyMessages}
@@ -281,7 +155,15 @@ function AgentConversationController({
/>
<div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
<div className="mx-auto max-w-3xl">
<div className="mx-auto max-w-3xl space-y-3">
{queue.length > 0 ? (
<QueuePanel
queue={queue}
onRemove={(messageId) =>
removeQueuedMessage.mutate({ agentId, messageId })
}
/>
) : null}
<ConversationInput
variant="conversation"
agents={agents}
@@ -296,14 +178,31 @@ function AgentConversationController({
name: a.name,
dataUrl: a.dataUrl,
}))
// When the agent already has an in-flight turn, route
// the new message into the durable queue instead of
// starting a parallel turn. Drains automatically as
// soon as the active turn ends.
if (streaming || activeTurnId) {
enqueueMessage.mutate({
agentId,
message: input.text,
attachments,
})
return
}
void send({ text: input.text, attachments, attachmentPreviews })
}}
onCreateAgent={() => navigate(createAgentPath)}
onStop={handleStop}
streaming={streaming}
disabled={disabled}
status="running"
attachmentsEnabled={true}
placeholder={`Message ${agentName}...`}
placeholder={
streaming
? `Type to queue another message for ${agentName}...`
: `Message ${agentName}...`
}
/>
</div>
</div>
@@ -318,6 +217,22 @@ interface AgentCommandConversationProps {
createAgentPath?: string
}
function inferAdapterFromEntry(
entry: AgentEntry | undefined,
): HarnessAgentAdapter | 'unknown' {
if (!entry) return 'unknown'
if (entry.source === 'agent-harness') {
// Harness entries don't carry the adapter on AgentEntry; the rail
// / header read the harness record directly. This branch only runs
// before the harness query resolves, so 'unknown' is correct — the
// tile's bot fallback renders until data arrives.
return 'unknown'
}
// OpenClaw-only entries (no harness shadow) are deprecated in
// practice but the rail still tolerates them.
return 'openclaw'
}
export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
variant = 'command',
backPath = '/home',
@@ -328,60 +243,110 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
const [searchParams, setSearchParams] = useSearchParams()
const navigate = useNavigate()
const { agents } = useAgentCommandData()
const { harnessAgents } = useHarnessAgents()
const { adapters } = useAgentAdapters()
const updateAgent = useUpdateHarnessAgent()
const shouldRedirectHome = !agentId
const resolvedAgentId = agentId ?? ''
const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
const agentName = agent?.name || resolvedAgentId || 'Agent'
const agentMeta = getAgentEntryMeta(agent)
const harnessAgent = harnessAgents.find(
(entry) => entry.id === resolvedAgentId,
)
const entry = agents.find((item) => item.agentId === resolvedAgentId)
const fallbackName = entry?.name || resolvedAgentId || 'Agent'
const fallbackAdapter = inferAdapterFromEntry(entry)
const initialMessage = searchParams.get('q')
const isPageVariant = variant === 'page'
const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'
const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
const adapterId = harnessAgent?.adapter
if (!adapterId) return null
const descriptor = adapters.find((item) => item.id === adapterId)
if (!descriptor?.health) return null
return {
healthy: descriptor.health.healthy,
reason: descriptor.health.reason,
}
}, [adapters, harnessAgent?.adapter])
if (shouldRedirectHome) {
return <Navigate to="/home" replace />
}
const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
const handleSelectHarnessAgent = (target: HarnessAgent) => {
navigate(`${agentPathPrefix}/${target.id}`)
}
// Every visible agent runs through the harness now, so per-agent
// runtime status doesn't gate chat the way OpenClaw's legacy
// gateway lifecycle did. Show "Ready" once the agent record is
// resolved from the rail, "Setup" otherwise.
const statusCopy = agent ? 'Ready' : 'Setup'
const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
if (!target) return
updateAgent.mutate({
agentId: target.id,
patch: { pinned: next },
})
}
return (
<div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
<div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
<AgentRailHeader onGoHome={() => navigate(backPath)} />
<div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
{/* Shared top band — the rail's "Agents" header and the chat
header live on one row so they're aligned by construction. */}
<div className="flex shrink-0 items-stretch border-border/50 border-b">
<div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
<Button
variant="ghost"
size="icon"
onClick={() => navigate(backPath)}
className="size-8 rounded-xl"
title="Back to home"
>
<ArrowLeft className="size-4" />
</Button>
<div className="truncate font-semibold text-[15px] leading-5">
Agents
</div>
</div>
<div className="min-w-0 flex-1">
<ConversationHeader
agent={harnessAgent ?? null}
fallbackName={fallbackName}
fallbackAdapter={fallbackAdapter}
adapterHealth={adapterHealth}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
onPinToggle={(next) =>
handlePinToggle(harnessAgent ?? null, next)
}
/>
</div>
</div>
<ConversationHeader
agentName={agentName}
agentMeta={agentMeta}
status={statusCopy}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
/>
{/* Body grid: rail list + chat. Both columns share the same
top edge (the band above) so headers can never drift. */}
<div className="grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)] lg:grid-cols-[288px_minmax(0,1fr)]">
<AgentRail
agents={harnessAgents}
adapters={adapters}
activeAgentId={resolvedAgentId}
onSelectAgent={handleSelectHarnessAgent}
onPinToggle={(target, next) => handlePinToggle(target, next)}
/>
<AgentRailList
activeAgentId={resolvedAgentId}
agents={agents}
onSelectAgent={handleSelectAgent}
/>
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() =>
setSearchParams({}, { replace: true })
}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
/>
<div className="flex h-full min-h-0 flex-col overflow-hidden">
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() =>
setSearchParams({}, { replace: true })
}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
/>
</div>
</div>
</div>
</div>
)

View File

@@ -1,18 +1,25 @@
import { Plus } from 'lucide-react'
import { type FC, useEffect, useState } from 'react'
import { type FC, useEffect, useMemo, useState } from 'react'
import { useNavigate } from 'react-router'
import { Button } from '@/components/ui/button'
import { Card, CardContent } from '@/components/ui/card'
import { Separator } from '@/components/ui/separator'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
} from '@/entrypoints/app/agents/agent-harness-types'
import {
useAgentAdapters,
useHarnessAgents,
} from '@/entrypoints/app/agents/useAgents'
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import { AgentCardDock } from './AgentCardDock'
import { useAgentCommandData } from './agent-command-layout'
import { ConversationInput } from './ConversationInput'
import { buildAgentCardData } from './useAgentCardData'
import { orderHomeAgents } from './home-agent-card.helpers'
function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
return (
@@ -38,11 +45,13 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
function RecentThreads({
activeAgentId,
agents,
adapters,
onOpenAgents,
onSelectAgent,
}: {
activeAgentId?: string | null
agents: AgentCardData[]
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
onOpenAgents: () => void
onSelectAgent: (agentId: string) => void
}) {
@@ -68,6 +77,7 @@ function RecentThreads({
</div>
<AgentCardDock
agents={agents}
adapters={adapters}
activeAgentId={activeAgentId ?? undefined}
onSelectAgent={onSelectAgent}
onCreateAgent={onOpenAgents}
@@ -79,25 +89,32 @@ function RecentThreads({
export const AgentCommandHome: FC = () => {
const navigate = useNavigate()
const activeHint = useActiveHint()
const { agents, status } = useAgentCommandData()
// The conversation input still consumes the merged AgentEntry list
// from the layout context (handles legacy /claw/agents entries that
// haven't yet been backfilled into the harness store). The Recent
// Agents grid below reads the richer harness payload directly.
const { agents: legacyAgents, status } = useAgentCommandData()
const { harnessAgents } = useHarnessAgents()
const { adapters } = useAgentAdapters()
const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
const cardData = buildAgentCardData(agents, status?.status, undefined)
const orderedAgents = useMemo(
() => orderHomeAgents(harnessAgents),
[harnessAgents],
)
useEffect(() => {
if (agents.length === 0) {
if (selectedAgentId) {
setSelectedAgentId(null)
}
if (legacyAgents.length === 0) {
if (selectedAgentId) setSelectedAgentId(null)
return
}
if (
!selectedAgentId ||
!agents.some((agent) => agent.agentId === selectedAgentId)
!legacyAgents.some((agent) => agent.agentId === selectedAgentId)
) {
setSelectedAgentId(agents[0].agentId)
setSelectedAgentId(legacyAgents[0].agentId)
}
}, [agents, selectedAgentId])
}, [legacyAgents, selectedAgentId])
const handleSend = (input: { text: string }) => {
if (!selectedAgentId) return
@@ -110,7 +127,7 @@ export const AgentCommandHome: FC = () => {
setSelectedAgentId(agent.agentId)
}
const selectedAgent = agents.find(
const selectedAgent = legacyAgents.find(
(agent) => agent.agentId === selectedAgentId,
)
const selectedAgentReady = selectedAgent
@@ -118,13 +135,15 @@ export const AgentCommandHome: FC = () => {
: false
const selectedAgentStatus =
selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
const selectedCard =
cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]
const selectedAgentName =
selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
const hasAgents = legacyAgents.length > 0
return (
<div className="min-h-full px-4 py-6">
<div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
{cardData.length > 0 ? (
{hasAgents ? (
<>
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
<div className="space-y-3">
@@ -140,7 +159,7 @@ export const AgentCommandHome: FC = () => {
<div className="w-full max-w-3xl">
<ConversationInput
variant="home"
agents={agents}
agents={legacyAgents}
selectedAgentId={selectedAgentId}
onSelectAgent={handleSelectAgent}
onSend={handleSend}
@@ -151,7 +170,7 @@ export const AgentCommandHome: FC = () => {
attachmentsEnabled={false}
placeholder={
selectedAgentReady
? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
? `Ask ${selectedAgentName} to handle a task...`
: 'Agent runtime is not running...'
}
/>
@@ -162,7 +181,8 @@ export const AgentCommandHome: FC = () => {
<RecentThreads
activeAgentId={selectedAgentId}
agents={cardData}
agents={orderedAgents}
adapters={adapters}
onOpenAgents={() => navigate('/agents')}
onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
/>

View File

@@ -0,0 +1,65 @@
import { type FC, useMemo } from 'react'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
import { AgentRailRow } from './AgentRailRow'
interface AgentRailProps {
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
activeAgentId: string
onSelectAgent: (agent: HarnessAgent) => void
onPinToggle: (agent: HarnessAgent, next: boolean) => void
}
/**
* Left-column scrollable list of agents. The "Agents" label + back
* button live in the shared top band above (so the rail header and
* the chat header sit on a single aligned strip rather than as two
* separately-sized headers per column). Sort matches `/agents`:
* pinned-first → recency, so the rail doesn't reshuffle as turns
* transition every 5 s.
*/
export const AgentRail: FC<AgentRailProps> = ({
agents,
adapters,
activeAgentId,
onSelectAgent,
onPinToggle,
}) => {
const adapterHealth = useMemo(() => {
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
for (const adapter of adapters) {
if (adapter.health) {
map.set(adapter.id, {
healthy: adapter.health.healthy,
reason: adapter.health.reason,
})
}
}
return map
}, [adapters])
const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])
return (
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
<div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
{ordered.map((agent) => (
<AgentRailRow
key={agent.id}
agent={agent}
active={agent.id === activeAgentId}
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
onSelect={() => onSelectAgent(agent)}
onPinToggle={(next) => onPinToggle(agent, next)}
/>
))}
</div>
</aside>
)
}

View File

@@ -0,0 +1,102 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import { cn } from '@/lib/utils'
interface AgentRailRowProps {
agent: HarnessAgent
active: boolean
adapterHealth: AgentAdapterHealth | null
onSelect: () => void
onPinToggle: (next: boolean) => void
}
/**
* Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
* down to the essentials that fit a ~280 px rail: tile + name + status
* badge + pin star, with the adapter / model / reasoning chips on a
* second line. Token totals, sparkline, last-message preview all stay
* on the `/agents` page where rows are full-width.
*/
export const AgentRailRow: FC<AgentRailRowProps> = ({
agent,
active,
adapterHealth,
onSelect,
onPinToggle,
}) => {
const status = agent.status ?? 'unknown'
const lastUsedAt = agent.lastUsedAt ?? null
const pinned = agent.pinned ?? false
return (
<button
type="button"
onClick={onSelect}
className={cn(
'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
active
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
)}
>
<div className="flex min-w-0 items-start gap-3">
<AgentTile
adapter={agent.adapter}
status={status}
lastUsedAt={lastUsedAt}
/>
<div className="min-w-0 flex-1">
<div className="flex items-center gap-1.5">
<span className="truncate font-semibold text-[14px] leading-5">
{agent.name}
</span>
{status === 'working' && (
<Badge
variant="secondary"
className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge
variant="outline"
className="h-5 px-1.5 text-[10px] text-muted-foreground"
>
Asleep
</Badge>
)}
{status === 'error' && (
<Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
Attention
</Badge>
)}
<div className="ml-auto">
<PinToggle pinned={pinned} onToggle={onPinToggle} />
</div>
</div>
<AgentSummaryChips
adapter={agent.adapter}
modelLabel={agent.modelId ?? null}
reasoningEffort={agent.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</button>
)
}
/**
* Tooltip-only label helper kept exported in case the tile row needs to
* show "Codex agent" or similar in a future state. Inlined fallback for
* the rare `unknown` adapter rendering path.
*/
export function railRowAdapterLabel(agent: HarnessAgent): string {
return adapterLabel(agent.adapter)
}

View File

@@ -0,0 +1,179 @@
import { ArrowLeft, Home } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
import { cn } from '@/lib/utils'
interface ConversationHeaderProps {
agent: HarnessAgent | null
fallbackName: string
fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'unknown'
adapterHealth: AgentAdapterHealth | null
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
onPinToggle: (next: boolean) => void
}
/**
* Strip above the chat. Mirrors the `/agents` row card's title row +
* summary chips so the user gets adapter health, pin state, and status
* at a glance — but adds the meta line (last used · lifetime tokens ·
* queued) that's specific to this surface.
*
* The mobile `lg:hidden` Back button is preserved so the small-screen
* collapse keeps a navigable header without a sidebar.
*/
export const ConversationHeader: FC<ConversationHeaderProps> = ({
agent,
fallbackName,
fallbackAdapter,
adapterHealth,
backLabel,
backTarget,
onGoHome,
onPinToggle,
}) => {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
const adapter = agent?.adapter ?? fallbackAdapter
const status: AgentLiveness = agent?.status ?? 'unknown'
const lastUsedAt = agent?.lastUsedAt ?? null
const pinned = agent?.pinned ?? false
const queueCount = agent?.queue?.length ?? 0
const tokens = agent?.tokens ?? null
const lifetimeTotal = tokens
? tokens.cumulative.input + tokens.cumulative.output
: 0
const metaParts: string[] = []
if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
if (queueCount > 0) {
metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
}
return (
<div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 shrink-0 rounded-xl lg:hidden"
title={backLabel}
>
<BackIcon className="size-4" />
</Button>
<div className="group min-w-0 flex-1">
<div className="flex items-center gap-2">
<span className="truncate font-semibold text-[15px] leading-6">
{agent?.name || fallbackName}
</span>
{agent ? (
<PinToggle pinned={pinned} onToggle={onPinToggle} />
) : null}
</div>
<div className="mt-0.5 flex items-center gap-2">
<AgentSummaryChips
adapter={adapter}
modelLabel={agent?.modelId ?? null}
reasoningEffort={agent?.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</div>
<div className="flex shrink-0 flex-col items-end gap-1">
<StatusPill
status={status}
hasActiveTurn={Boolean(agent?.activeTurnId)}
/>
<div className="flex h-4 items-center text-[11px] text-muted-foreground">
<span className="truncate">
{metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
</span>
</div>
</div>
</div>
)
}
interface StatusPillProps {
status: AgentLiveness
hasActiveTurn: boolean
}
/**
* Working / Asleep / Attention all get distinctive styling; idle keeps
* the legacy emerald `Ready` pill so the default state is visually
* calm. Defensive working: `idle + activeTurnId` falls through to the
* working pill since the server says a turn is in flight.
*/
const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
const effective: AgentLiveness =
status === 'idle' && hasActiveTurn ? 'working' : status
const base =
'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'
if (effective === 'working') {
return (
<Badge
variant="secondary"
className={cn(
base,
'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
)}
>
<span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
Working
</Badge>
)
}
if (effective === 'asleep') {
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/50" />
Asleep
</Badge>
)
}
if (effective === 'error') {
return (
<Badge
variant="destructive"
className={cn(base, 'border-destructive/30')}
>
<span className="size-1.5 rounded-full bg-destructive-foreground" />
Attention
</Badge>
)
}
if (effective === 'idle') {
return (
<Badge
variant="outline"
className={cn(
base,
'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
)}
>
<span className="size-1.5 rounded-full bg-emerald-500" />
Ready
</Badge>
)
}
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/30" />
Setup
</Badge>
)
}

View File

@@ -54,25 +54,40 @@ interface ConversationInputProps {
placeholder?: string
attachmentsEnabled?: boolean
variant?: 'home' | 'conversation'
/**
* When set, a Stop button surfaces to the left of the voice mic
* while `streaming === true`. Click cancels the active turn
* server-side via the chat-cancel endpoint. Absent → no Stop
* button (legacy behaviour for the home composer).
*/
onStop?: () => void
}
function InputActionButton({
disabled,
onClick,
streaming,
hasContent,
}: {
disabled: boolean
onClick: () => void
streaming: boolean
hasContent: boolean
}) {
// Show the spinner while streaming only when there's nothing to
// send — once the user types something, the icon flips back to the
// paper-plane so it reads as "queue this message" instead of
// "still working".
const showSpinner = streaming && !hasContent
return (
<Button
onClick={onClick}
size="icon"
disabled={disabled}
title={streaming && hasContent ? 'Queue message' : undefined}
className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
>
{streaming ? (
{showSpinner ? (
<Loader2 className="h-5 w-5 animate-spin" />
) : (
<ArrowRight className="h-5 w-5" />
@@ -81,6 +96,22 @@ function InputActionButton({
)
}
function StopButton({ onStop }: { onStop: () => void }) {
return (
<Button
type="button"
size="icon"
variant="ghost"
onClick={onStop}
title="Stop current turn — queued messages will start next."
aria-label="Stop current turn"
className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
>
<Square className="h-3.5 w-3.5 fill-current" />
</Button>
)
}
function VoiceButton({
isRecording,
isTranscribing,
@@ -299,6 +330,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
placeholder,
attachmentsEnabled = true,
variant = 'conversation',
onStop,
}) => {
const [input, setInput] = useState('')
const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -379,10 +411,17 @@ export const ConversationInput: FC<ConversationInputProps> = ({
}
const hasContent = input.trim().length > 0 || attachments.length > 0
// Queue-aware composers (the conversation panel passes `onStop`)
// accept input while streaming — the parent decides whether the
// submission opens a new turn or enqueues onto the active one.
// Surfaces without a Stop hook (home) keep the legacy behaviour
// and block input until the current turn finishes.
const queueAware = Boolean(onStop)
const handleSend = () => {
const text = input.trim()
if (disabled || isStaging || streaming) return
if (disabled || isStaging) return
if (streaming && !queueAware) return
if (!text && attachments.length === 0) return
onSend({ text, attachments })
setInput('')
@@ -512,6 +551,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
)}
/>
</div>
{streaming && onStop ? <StopButton onStop={onStop} /> : null}
<VoiceButton
isRecording={voice.isRecording}
isTranscribing={voice.isTranscribing}
@@ -529,12 +569,13 @@ export const ConversationInput: FC<ConversationInputProps> = ({
!!disabled ||
voice.isRecording ||
voice.isTranscribing ||
streaming
(streaming && !queueAware)
}
onClick={handleSend}
// Spinner stays the user-facing "agent is busy" hint; with the
// queue active we still spin while a turn is in flight.
streaming={streaming}
hasContent={hasContent}
/>
</div>
{voice.error ? (

View File

@@ -0,0 +1,243 @@
import { Quote, TriangleAlert } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
import type {
HarnessAdapterHealth,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
import {
firstNonBlankLine,
truncate,
} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
import { cn } from '@/lib/utils'
interface HomeAgentCardProps {
agent: HarnessAgent
adapter: HarnessAgentAdapter | 'unknown'
/** Per-adapter health snapshot, shared across cards rendering the
* same adapter. `null` when the /adapters response hasn't surfaced
* health yet (we treat that as healthy until proven otherwise). */
adapterHealth: HarnessAdapterHealth | null
/** Highlights the card with an accent ring; tells the user which
* agent the conversation input is bound to. */
active?: boolean
onClick: () => void
}
const PREVIEW_CHARS = 100
/**
* Grid-shaped card for the /home Recent agents section. Composition
* mirrors the rail's `AgentRowCard` but the layout is a vertical
* column sized for a 1/3-width tile rather than a full-width row.
*
* Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
* `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
* inline `Unavailable` chip pattern so the visual language is
* continuous between rail and grid.
*/
export const HomeAgentCard: FC<HomeAgentCardProps> = ({
agent,
adapter,
adapterHealth,
active,
onClick,
}) => {
const status = agent.status ?? 'unknown'
const lastUsedAt = agent.lastUsedAt ?? null
const isWorking = status === 'working'
const isAsleep = status === 'asleep'
const isError = status === 'error'
const hasActiveTurn = Boolean(agent.activeTurnId)
return (
<button
type="button"
onClick={onClick}
className={cn(
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
active && 'ring-1 ring-[var(--accent-orange)]/30',
isWorking
? 'border-[var(--accent-orange)]/40'
: isError
? 'border-destructive/30'
: 'border-border/60 hover:border-[var(--accent-orange)]/30',
)}
>
<div className="flex items-start gap-3">
<AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
<div className="min-w-0 flex-1">
<div className="flex items-center gap-1.5">
<span className="truncate font-semibold text-sm">
{displayName(agent)}
</span>
{isWorking && (
<Badge
variant="secondary"
className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
</div>
<SummaryLine
adapter={adapter}
modelId={agent.modelId ?? null}
reasoningEffort={agent.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
<LastMessage message={agent.lastUserMessage ?? null} />
<div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
<span>{statusFootnote(status, lastUsedAt)}</span>
{hasActiveTurn ? (
<ResumeChip />
) : isAsleep ? (
<Badge variant="outline" className="text-muted-foreground">
Asleep
</Badge>
) : isError ? (
<ErrorChip lastError={agent.lastError ?? null} />
) : null}
</div>
</button>
)
}
const SummaryLine: FC<{
adapter: HarnessAgentAdapter | 'unknown'
modelId: string | null
reasoningEffort: string | null
adapterHealth: HarnessAdapterHealth | null
}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
const parts = [adapterLabel(adapter)]
if (modelId) parts.push(modelId)
if (reasoningEffort) parts.push(reasoningEffort)
const unhealthy = adapterHealth?.healthy === false
return (
<div
className={cn(
'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
unhealthy && 'text-muted-foreground/70',
)}
>
<span className="truncate">{parts.join(' · ')}</span>
{unhealthy && (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge
variant="outline"
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
>
<TriangleAlert className="size-2.5" />
<span className="font-normal">Unavailable</span>
</Badge>
</HoverCardTrigger>
<HoverCardContent side="right" className="w-72 text-sm">
<div className="font-medium">
{adapterLabel(adapter)} CLI not available
</div>
<div className="mt-1 text-muted-foreground text-xs">
{adapterHealth?.reason ??
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
</div>
</HoverCardContent>
</HoverCard>
)}
</div>
)
}
const LastMessage: FC<{ message: string | null }> = ({ message }) => {
if (!message) {
return (
<p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
No messages yet start a chat
</p>
)
}
return (
<p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
<Quote
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
aria-hidden
/>
<span className="line-clamp-2">
{truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
</span>
</p>
)
}
const ResumeChip: FC = () => (
<span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
<span className="relative flex size-1.5">
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
<span className="relative inline-flex size-1.5 rounded-full bg-white" />
</span>
Resume
</span>
)
const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
if (!lastError) {
return <Badge variant="destructive">Attention</Badge>
}
return (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge variant="destructive" className="cursor-default">
Attention
</Badge>
</HoverCardTrigger>
<HoverCardContent
side="left"
className="max-w-xs whitespace-pre-wrap font-mono text-xs"
>
{lastError}
</HoverCardContent>
</HoverCard>
)
}
/**
* Footer left side: relative time on every state EXCEPT working,
* which shows `now` (the dot is already pulsing — restating it as
* "Working" would duplicate the pill in the title row).
*/
function statusFootnote(
status: AgentLiveness,
lastUsedAt: number | null,
): string {
if (status === 'working') return 'now'
return formatRelativeTime(lastUsedAt)
}
const UUID_PATTERN =
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
const OC_UUID_PATTERN =
/^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
function displayName(agent: HarnessAgent): string {
const name = agent.name?.trim()
const id = agent.id
if (!name || name === id) {
if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
if (UUID_PATTERN.test(id)) return id.slice(0, 8)
return id
}
return name
}

View File

@@ -0,0 +1,94 @@
import { ListPlus, X } from 'lucide-react'
import type { FC } from 'react'
import {
Queue,
QueueItem,
QueueItemAction,
QueueItemActions,
QueueItemAttachment,
QueueItemContent,
QueueItemFile,
QueueItemImage,
QueueList,
QueueSection,
QueueSectionContent,
QueueSectionLabel,
QueueSectionTrigger,
} from '@/components/ai-elements/queue'
import type {
HarnessQueuedMessage,
HarnessQueuedMessageAttachment,
} from '@/entrypoints/app/agents/agent-harness-types'
import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
interface QueuePanelProps {
queue: HarnessQueuedMessage[]
onRemove: (messageId: string) => void
}
/**
* Renders the agent's pending message queue using the shared AI
* Elements `Queue` primitives. Caller is expected to gate render on
* `queue.length > 0` — when empty, this returns null so the panel
* disappears cleanly between turns.
*/
export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
if (queue.length === 0) return null
return (
<Queue>
<QueueSection>
<QueueSectionTrigger>
<QueueSectionLabel
count={queue.length}
label={queue.length === 1 ? 'queued message' : 'queued messages'}
icon={<ListPlus className="size-3.5" />}
/>
</QueueSectionTrigger>
<QueueSectionContent>
<QueueList>
{queue.map((entry) => (
<QueueItem key={entry.id}>
<div className="flex items-center gap-2">
<QueueItemContent>
{firstNonBlankLine(entry.message)}
</QueueItemContent>
<QueueItemActions>
<QueueItemAction
aria-label="Remove from queue"
onClick={() => onRemove(entry.id)}
>
<X className="size-3" />
</QueueItemAction>
</QueueItemActions>
</div>
{entry.attachments && entry.attachments.length > 0 ? (
<QueueItemAttachment>
{entry.attachments.map((attachment, idx) =>
renderAttachment(entry.id, attachment, idx),
)}
</QueueItemAttachment>
) : null}
</QueueItem>
))}
</QueueList>
</QueueSectionContent>
</QueueSection>
</Queue>
)
}
function renderAttachment(
messageId: string,
attachment: HarnessQueuedMessageAttachment,
idx: number,
) {
if (attachment.mediaType.startsWith('image/')) {
const src = `data:${attachment.mediaType};base64,${attachment.data}`
return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
}
return (
<QueueItemFile key={`${messageId}-${idx}`}>
{attachment.mediaType}
</QueueItemFile>
)
}

View File

@@ -0,0 +1,69 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { orderHomeAgents } from './home-agent-card.helpers'
function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
return {
id: overrides.id ?? 'agent-x',
name: overrides.name ?? overrides.id ?? 'agent-x',
adapter: overrides.adapter ?? 'codex',
permissionMode: 'approve-all',
sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
createdAt: 1000,
updatedAt: 1000,
...overrides,
}
}
describe('orderHomeAgents', () => {
it('places active-turn agents before everyone else', () => {
const sorted = orderHomeAgents([
agent({ id: 'a', lastUsedAt: 5000 }),
agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
agent({ id: 'c', lastUsedAt: 7000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
})
it('orders non-active agents by lastUsedAt desc', () => {
const sorted = orderHomeAgents([
agent({ id: 'old', lastUsedAt: 1000 }),
agent({ id: 'new', lastUsedAt: 9000 }),
agent({ id: 'mid', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
})
it('puts the gateway `main` seed agent above other never-used agents', () => {
const sorted = orderHomeAgents([
agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
agent({ id: 'main', lastUsedAt: null }),
agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
])
expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
})
it('sends never-used agents to the bottom even when `main` is among them', () => {
const sorted = orderHomeAgents([
agent({ id: 'main', lastUsedAt: null }),
agent({ id: 'used', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
})
it('does NOT sort by pinned — pinned agents are treated like any other', () => {
const sorted = orderHomeAgents([
agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
])
expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
})
it('falls back to id-stable ordering when lastUsedAt ties', () => {
const sorted = orderHomeAgents([
agent({ id: 'b', lastUsedAt: 5000 }),
agent({ id: 'a', lastUsedAt: 5000 }),
])
expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
})
})

View File

@@ -0,0 +1,42 @@
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
/**
* Order for the /home Recent agents grid.
*
* 1. Active turn first — agents mid-turn float to the top so the
* Resume affordance is the first thing the user sees on /home.
* 2. The protected gateway-side `main` agent stays pinned-to-top in
* the never-used group on a fresh install (mirrors the rail).
* 3. Recency (`lastUsedAt` desc).
* 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
* every 5-second poll.
*
* Pin is NOT a sort key. The home grid is action-oriented and trusts
* recency + active-turn to surface the right agent; pinning is an
* organisation tool that lives on the rail at /agents.
*/
export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
return [...agents].sort((a, b) => {
const aActive = a.activeTurnId != null
const bActive = b.activeTurnId != null
if (aActive !== bActive) return aActive ? -1 : 1
// Recency wins outright. Never-used agents (`lastUsedAt == null`)
// both fall to the same `-Infinity` bucket and the seed/id rules
// below decide their order — but a used agent always beats any
// never-used agent regardless of id.
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
// Inside the never-used (or exact-tie) group: pin the gateway
// `main` seed to the top of the group on a fresh install, then
// fall back to id-stable order so the grid doesn't reshuffle on
// every poll.
const aSeed = a.id === 'main' && a.lastUsedAt == null
const bSeed = b.id === 'main' && b.lastUsedAt == null
if (aSeed !== bSeed) return aSeed ? -1 : 1
return a.id.localeCompare(b.id)
})
}

View File

@@ -1,53 +0,0 @@
import {
type AgentEntry,
getModelDisplayName,
type OpenClawStatus,
} from '@/entrypoints/app/agents/useOpenClaw'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import type { AgentOverview } from './useAgentDashboard'
function resolveAgentStatus(
gatewayStatus: OpenClawStatus['status'] | undefined,
liveStatus: AgentOverview['status'] | undefined,
): AgentCardData['status'] {
// Gateway-level errors take precedence
if (gatewayStatus === 'error') return 'error'
if (gatewayStatus === 'starting') return 'working'
// Per-agent live status from the WS observer
if (liveStatus === 'working') return 'working'
if (liveStatus === 'error') return 'error'
return 'idle'
}
/**
* Build agent card display data by merging the raw agent entries from
* the gateway with enriched overview data from the dashboard API.
*
* Pure function — no hooks, no IndexedDB, no async.
*/
export function buildAgentCardData(
agents: AgentEntry[],
status: OpenClawStatus['status'] | undefined,
dashboard: AgentOverview[] | undefined,
): AgentCardData[] {
return agents.map((agent) => {
const overview = dashboard?.find((d) => d.agentId === agent.agentId)
return {
agentId: agent.agentId,
name: agent.name,
model: getModelDisplayName(agent.model),
status:
agent.source === 'agent-harness'
? 'idle'
: resolveAgentStatus(status, overview?.status),
lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
activitySummary: overview?.activitySummary ?? undefined,
currentTool: overview?.currentTool ?? undefined,
costUsd: overview?.totalCostUsd ?? undefined,
}
})
}

View File

@@ -36,6 +36,15 @@ interface UseAgentConversationOptions {
history?: OpenClawChatHistoryMessage[]
onComplete?: () => void
onSessionKeyChange?: (sessionKey: string) => void
/**
* Server-side active turn id, surfaced via the listing query. When
* this changes from null/<id> to a different non-null id while we
* aren't already streaming (e.g. the server just popped a queued
* message and started a new turn), the hook reattaches via
* /chat/active so the chat panel picks up the live stream without
* waiting for a remount.
*/
activeTurnId?: string | null
}
export function useAgentConversation(
@@ -211,31 +220,46 @@ export function useAgentConversation(
}
processEventRef.current = processAgentHarnessStreamEvent
// On mount (and whenever the agent changes), check whether the
// server has an in-flight turn for this agent and reattach to it.
// This is what makes the chat resilient across tab close/reopen,
// refresh, and navigation: the runtime call kept running on the
// server while we were away. Effect only depends on `agentId` —
// the event handler is read off a ref so this doesn't re-subscribe
// every render.
const activeTurnIdDep = options.activeTurnId ?? null
// On mount, on agent change, and whenever the listing reports a
// *new* active turn id, check whether the server has an in-flight
// turn for this agent and reattach to it. This catches three
// cases at once: the chat resilience flow (tab close/reopen),
// navigation between agents, AND queue drain (the server starts a
// new turn from a queued message → activeTurnId flips → attach).
useEffect(() => {
let cancelled = false
const abortController = new AbortController()
// Reference the dep inside the body so biome's exhaustive-deps
// rule sees it consumed; the value is just an "any non-null
// active turn id" trigger — the actual id we attach to comes
// from the fresh fetchActiveHarnessTurn call below.
void activeTurnIdDep
const attemptResume = async () => {
// Track whether *we* started a stream in this run. When the
// early-return paths fire (no active turn, or a `send()` /
// earlier resume already owns `streamAbortRef`), the finally
// block must NOT touch streaming/turnIdRef/lastSeqRef —
// otherwise we clobber the in-flight stream's state and the
// Stop button drops out mid-turn while events keep arriving.
let weStartedStream = false
try {
const active = await fetchActiveHarnessTurn(agentId)
if (cancelled || !active || active.status !== 'running') return
if (streamAbortRef.current) return // a fresh send already in flight
if (streamAbortRef.current) return // someone else already owns the stream
// Stage a placeholder turn so the streamed events have a row
// to render into. We don't have the user message text on
// resume; the assistant turn is what we're catching up on.
// to render into. The server now persists the kicking-off
// prompt on the active turn, so we render it as the user
// bubble immediately — no empty-bubble flicker when a queued
// message starts running.
setTurns((prev) => [
...prev,
{
id: crypto.randomUUID(),
userText: '',
userText: active.prompt ?? '',
parts: [],
done: false,
timestamp: active.startedAt,
@@ -247,6 +271,7 @@ export function useAgentConversation(
lastSeqRef.current = null
streamAbortRef.current = abortController
setStreaming(true)
weStartedStream = true
const response = await attachToHarnessTurn(agentId, {
turnId: active.turnId,
@@ -265,10 +290,20 @@ export function useAgentConversation(
// Resume is best-effort; transient errors fall back to the
// user starting a new turn manually.
} finally {
if (!cancelled) {
if (streamAbortRef.current === abortController) {
streamAbortRef.current = null
}
// Always release `streamAbortRef` if we owned it — even when
// the effect was cancelled mid-stream (a listing poll
// captured the next queue-drain turn id, for example). If we
// don't, the next effect run hits `if (streamAbortRef.current)
// return` against our now-aborted controller and never
// reattaches, leaving `streaming === true` with no live stream.
if (weStartedStream && streamAbortRef.current === abortController) {
streamAbortRef.current = null
}
// The other state (streaming flag, turn id, lastSeq) is the
// *current run's* lifecycle: only reset it on a clean exit.
// When `cancelled` is true the next run will set these
// itself, so resetting here would only cause a brief flicker.
if (!cancelled && weStartedStream) {
turnIdRef.current = null
lastSeqRef.current = null
setStreaming(false)
@@ -281,7 +316,7 @@ export function useAgentConversation(
cancelled = true
abortController.abort()
}
}, [agentId])
}, [agentId, activeTurnIdDep])
const send = async (input: string | SendInput) => {
const normalized: SendInput =

View File

@@ -1,95 +0,0 @@
import { useQuery, useQueryClient } from '@tanstack/react-query'
import { useEffect } from 'react'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
export interface AgentOverview {
agentId: string
status: 'working' | 'idle' | 'error' | 'unknown'
latestMessage: string | null
latestMessageAt: number | null
activitySummary: string | null
currentTool: string | null
totalCostUsd: number
sessionCount: number
}
export interface DashboardResponse {
agents: AgentOverview[]
summary: {
totalAgents: number
totalCostUsd: number
}
}
interface StatusEvent {
agentId: string
status: AgentOverview['status']
currentTool: string | null
error: string | null
timestamp: number
}
const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
export function useAgentDashboard(enabled: boolean) {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
const ready = enabled && Boolean(baseUrl) && !urlLoading
// Initial data load + periodic refresh as fallback
const query = useQuery<DashboardResponse>({
queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
queryFn: async () => {
const url = new URL('/claw/dashboard', baseUrl as string)
const response = await fetch(url.toString())
if (!response.ok) throw new Error('Failed to fetch dashboard')
return response.json()
},
enabled: ready,
})
// SSE subscription for real-time status patches
useEffect(() => {
if (!ready || !baseUrl) return
const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
const eventSource = new EventSource(streamUrl.toString())
eventSource.addEventListener('snapshot', (event) => {
try {
const dashboard = JSON.parse(event.data) as DashboardResponse
queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
} catch {}
})
eventSource.addEventListener('status', (event) => {
try {
const status = JSON.parse(event.data) as StatusEvent
queryClient.setQueryData<DashboardResponse>(
[...DASHBOARD_QUERY_KEY, baseUrl],
(prev) => {
if (!prev) return prev
return {
...prev,
agents: prev.agents.map((agent) =>
agent.agentId === status.agentId
? {
...agent,
status: status.status,
currentTool: status.currentTool,
}
: agent,
),
}
},
)
} catch {}
})
return () => {
eventSource.close()
}
}, [ready, baseUrl, queryClient])
return query
}

View File

@@ -2,67 +2,75 @@ import { Loader2 } from 'lucide-react'
import { type FC, useMemo } from 'react'
import { AgentRowCard } from './AgentRowCard'
import { AgentsEmptyState } from './AgentsEmptyState'
import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from './agent-harness-types'
import type {
AgentAdapterHealth,
AgentRowData,
} from './agent-row/agent-row.types'
import { compareAgentsByPinThenRecency } from './agents-list-order'
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'
interface AgentListProps {
agents: AgentListItem[]
/**
* Optional per-agent activity metadata. Keyed by `agentId`. Missing
* entries fall back to status='unknown' / lastUsedAt=null and the
* row renders an "unknown" dot. The server will populate this once
* the activity tracker ships; the page works without it.
*/
/** Optional per-agent activity metadata, keyed by `agentId`. */
activity?: Record<
string,
{ status: AgentLiveness; lastUsedAt: number | null }
>
/**
* Lookup table from harness agent id → adapter + reasoning effort,
* sourced from `useHarnessAgents`. Lets the row card render the
* correct adapter icon and chips for harness agents (legacy
* /claw/agents entries fall back to inferring from `runtimeLabel`).
*/
/** Lookup table from harness id → enriched agent record. */
harnessAgentLookup?: Map<string, HarnessAgent>
/** Adapter catalog (carries per-adapter health). */
adapters: HarnessAdapterDescriptor[]
loading: boolean
deletingAgentKey: string | null
onCreateAgent: () => void
onDeleteAgent: (agent: AgentListItem) => void
onPinToggle: (agent: AgentListItem, next: boolean) => void
}
export const AgentList: FC<AgentListProps> = ({
agents,
activity,
harnessAgentLookup,
adapters,
loading,
deletingAgentKey,
onCreateAgent,
onDeleteAgent,
onPinToggle,
}) => {
// Sort by recency: most recently used first; never-used agents drop
// to the bottom in id-stable order so the list doesn't reshuffle on
// every refresh. The pinned exception is the gateway's `main` agent
// when it's never been touched — keep it at the top so a fresh
// install has an obvious starting point.
const adapterHealth = useMemo(() => {
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
for (const adapter of adapters) {
if (adapter.health) {
map.set(adapter.id, {
healthy: adapter.health.healthy,
reason: adapter.health.reason,
})
}
}
return map
}, [adapters])
const ordered = useMemo(() => {
const withScore = agents.map((agent) => {
const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
return { agent, lastUsedAt }
const withMeta = agents.map((agent) => {
const harness = harnessAgentLookup?.get(agent.agentId)
return {
agent,
id: agent.agentId,
pinned: harness?.pinned ?? false,
lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
}
})
return withScore
.sort((a, b) => {
const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
if (aPinned && !bPinned) return -1
if (!aPinned && bPinned) return 1
const aValue = a.lastUsedAt ?? -Infinity
const bValue = b.lastUsedAt ?? -Infinity
if (aValue !== bValue) return bValue - aValue
return a.agent.agentId.localeCompare(b.agent.agentId)
})
return withMeta
.sort(compareAgentsByPinThenRecency)
.map((entry) => entry.agent)
}, [activity, agents])
}, [activity, agents, harnessAgentLookup])
if (loading && agents.length === 0) {
return (
@@ -80,18 +88,23 @@ export const AgentList: FC<AgentListProps> = ({
<div className="grid gap-3">
{ordered.map((agent) => {
const harness = harnessAgentLookup?.get(agent.agentId)
const adapter: HarnessAgentAdapter | undefined =
const adapter: HarnessAgentAdapter | 'unknown' =
harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
const data = buildRowData({
agent,
adapter,
harness,
activity: activity?.[agent.agentId],
adapterHealth:
adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
})
return (
<AgentRowCard
key={agent.key}
agent={agent}
status={activity?.[agent.agentId]?.status}
lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
adapter={adapter}
reasoningEffort={harness?.reasoningEffort ?? null}
onDelete={onDeleteAgent}
data={data}
deleting={deletingAgentKey === agent.key}
onDelete={onDeleteAgent}
onPinToggle={onPinToggle}
/>
)
})}
@@ -99,10 +112,53 @@ export const AgentList: FC<AgentListProps> = ({
)
}
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
const lower = label?.toLowerCase()
if (lower === 'claude code') return 'claude'
if (lower === 'codex') return 'codex'
if (lower === 'openclaw') return 'openclaw'
return undefined
return 'unknown'
}
const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
function buildRowData(input: {
agent: AgentListItem
adapter: HarnessAgentAdapter | 'unknown'
harness: HarnessAgent | undefined
activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
adapterHealth: AgentAdapterHealth | null
}): AgentRowData {
const { agent, adapter, harness, activity, adapterHealth } = input
return {
agent,
adapter,
modelLabel: deriveModelLabel(agent, harness),
reasoningEffort: harness?.reasoningEffort ?? null,
status: activity?.status ?? 'unknown',
lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
pinned: harness?.pinned ?? false,
cwd: harness?.cwd ?? null,
lastUserMessage: harness?.lastUserMessage ?? null,
tokens: harness?.tokens ?? null,
turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
lastError: harness?.lastError ?? null,
lastErrorAt: harness?.lastErrorAt ?? null,
activeTurnId: harness?.activeTurnId ?? null,
adapterHealth,
}
}
function deriveModelLabel(
agent: AgentListItem,
harness: HarnessAgent | undefined,
): string | null {
// Prefer the agent rail's modelLabel when meaningful; harness's
// modelId is a stable identifier but the rail's `modelLabel`
// already maps to a friendly display string.
if (agent.modelLabel && agent.modelLabel !== 'default') {
return agent.modelLabel
}
return harness?.modelId ?? null
}

View File

@@ -1,270 +1,99 @@
import {
Copy,
Loader2,
MessageSquare,
MoreHorizontal,
Pencil,
RotateCcw,
Trash2,
} from 'lucide-react'
import type { FC } from 'react'
import { useNavigate } from 'react-router'
import { toast } from 'sonner'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuSeparator,
DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'
import { AdapterIcon, adapterLabel } from './AdapterIcon'
import {
canDelete as canDeleteAgent,
canRename as canRenameAgent,
displayName,
formatRelativeTime,
workspaceLabel,
} from './agent-display.helpers'
import type { HarnessAgentAdapter } from './agent-harness-types'
import type { AgentListItem } from './agents-page-types'
import { type AgentLiveness, LivenessDot } from './LivenessDot'
import { AgentActions } from './agent-row/AgentActions'
import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
import { AgentLastMessage } from './agent-row/AgentLastMessage'
import { AgentMetaRow } from './agent-row/AgentMetaRow'
import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
import { AgentTile } from './agent-row/AgentTile'
import { AgentTitleRow } from './agent-row/AgentTitleRow'
import type {
AgentRowCallbacks,
AgentRowData,
} from './agent-row/agent-row.types'
interface AgentRowCardProps {
agent: AgentListItem
/**
* Per-agent extras the listing surface provides on top of the
* minimal `AgentListItem` shape. `lastUsedAt` survives server
* restart (sourced from acpx session record); `status` is in-memory
* server-side.
*/
status?: AgentLiveness
lastUsedAt?: number | null
/** Adapter the agent belongs to. Drives icon + label. */
adapter?: HarnessAgentAdapter
/** Reasoning effort chip (claude/codex/openclaw catalog). */
reasoningEffort?: string | null
/** Modeled directly off the inbound delete handler so the parent owns the dialog. */
onDelete: (agent: AgentListItem) => void
/** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
interface AgentRowCardProps extends AgentRowCallbacks {
data: AgentRowData
/** Whether THIS agent is mid-delete; renders a spinner in the menu. */
deleting?: boolean
}
/**
* Composition shell for the agent rail. Owns no state; sub-components
* each handle their own micro-state (error-panel collapse, etc.) and
* emit callbacks (delete, pin/unpin) for the page to act on.
*
* The whole card carries state — not just the tile — so the row's
* border subtly tells the user what's going on at a glance:
* working → accent-orange border with a soft glow
* error → destructive border
* idle → muted border, lifts on hover
*/
export const AgentRowCard: FC<AgentRowCardProps> = ({
agent,
status = 'unknown',
lastUsedAt,
adapter,
reasoningEffort,
onDelete,
data,
deleting,
onDelete,
onPinToggle,
}) => {
const navigate = useNavigate()
const adapterId = adapter ?? inferAdapterFromListItem(agent)
const workspace = workspaceLabel(agent)
const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
const allowDelete = canDeleteAgent(agent)
const allowRename = canRenameAgent(agent)
const handleChat = () => navigate(`/agents/${agent.agentId}`)
const handleCopyId = async () => {
try {
await navigator.clipboard.writeText(agent.agentId)
toast.success('Agent id copied')
} catch {
toast.error('Could not copy agent id')
}
}
return (
<div
className={cn(
'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
'hover:border-[var(--accent-orange)]/50 hover:shadow-sm',
// Layout-stable hover. No translate, no shadow change — both
// visibly perturb neighbouring rows. Only the border tint
// shifts on hover, and the rail's vertical rhythm stays
// exactly the same in every state.
'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
data.status === 'working'
? 'border-[var(--accent-orange)]/40'
: data.status === 'error'
? 'border-destructive/40'
: 'border-border hover:border-[var(--accent-orange)]/30',
)}
>
<div className="flex items-start gap-4">
{/* Adapter tile + liveness dot in the corner. */}
<div className="relative shrink-0">
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
<AdapterIcon adapter={adapterId} className="h-6 w-6" />
</div>
<LivenessDot
status={status}
detail={livenessDetail(status, lastUsedAt)}
className="absolute -right-0.5 -bottom-0.5"
/>
</div>
<AgentTile
adapter={data.adapter}
status={data.status}
lastUsedAt={data.lastUsedAt}
/>
<div className="min-w-0 flex-1">
<div className="mb-1 flex items-center gap-2">
<span className="truncate font-semibold">{displayName(agent)}</span>
{status === 'working' && (
<Badge
variant="secondary"
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge variant="outline" className="text-muted-foreground">
Asleep
</Badge>
)}
{status === 'error' && (
<Badge variant="destructive">Attention</Badge>
)}
</div>
<AgentTitleRow
agent={data.agent}
status={data.status}
pinned={data.pinned}
turnsByDay={data.turnsByDay}
failedByDay={data.failedByDay}
onPinToggle={(next) => onPinToggle(data.agent, next)}
/>
<div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
<Badge variant="secondary" className="font-normal">
{adapterLabel(adapterId)}
</Badge>
{agent.modelLabel && agent.modelLabel !== 'default' && (
<Badge variant="outline" className="font-normal">
{agent.modelLabel}
</Badge>
)}
{reasoningEffort && reasoningEffort !== 'medium' && (
<Badge variant="outline" className="font-normal">
{reasoningEffort}
</Badge>
)}
</div>
<AgentSummaryChips
adapter={data.adapter}
modelLabel={data.modelLabel}
reasoningEffort={data.reasoningEffort}
adapterHealth={data.adapterHealth}
/>
<div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
<span>Last used {lastUsedLabel}</span>
{workspace && (
<>
<span aria-hidden></span>
<span className="truncate font-mono" title={workspace}>
{workspace}
</span>
</>
)}
</div>
<AgentLastMessage message={data.lastUserMessage} />
<AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
{data.status === 'error' && data.lastError && (
<AgentErrorPanel
agentId={data.agent.agentId}
message={data.lastError}
errorAt={data.lastErrorAt}
/>
)}
</div>
<div className="flex shrink-0 items-center gap-2">
<Button variant="outline" size="sm" onClick={handleChat}>
<MessageSquare className="mr-1.5 h-3 w-3" />
Chat
</Button>
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
variant="ghost"
size="icon"
aria-label={`More actions for ${displayName(agent)}`}
className="h-8 w-8"
>
<MoreHorizontal className="h-4 w-4" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-44">
<DropdownMenuItem onSelect={() => void handleCopyId()}>
<Copy className="mr-2 h-3.5 w-3.5" />
Copy id
</DropdownMenuItem>
<RenameMenuItem disabled={!allowRename} />
<ResetHistoryMenuItem />
<DropdownMenuSeparator />
<DropdownMenuItem
onSelect={() => onDelete(agent)}
disabled={!allowDelete || deleting}
className="text-destructive focus:text-destructive"
>
{deleting ? (
<Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
) : (
<Trash2 className="mr-2 h-3.5 w-3.5" />
)}
Delete
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
</div>
<AgentActions
agent={data.agent}
activeTurnId={data.activeTurnId}
deleting={deleting}
onDelete={onDelete}
/>
</div>
</div>
)
}
const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
const item = (
<DropdownMenuItem disabled className="text-muted-foreground">
<Pencil className="mr-2 h-3.5 w-3.5" />
Rename
</DropdownMenuItem>
)
if (!disabled) return item
// Disabled but with a hint so users know it's coming, not broken.
return (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<span className="block w-full">{item}</span>
</TooltipTrigger>
<TooltipContent side="left" className="text-xs">
Rename coming soon
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
}
const ResetHistoryMenuItem: FC = () => {
const item = (
<DropdownMenuItem disabled className="text-muted-foreground">
<RotateCcw className="mr-2 h-3.5 w-3.5" />
Reset history
</DropdownMenuItem>
)
return (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<span className="block w-full">{item}</span>
</TooltipTrigger>
<TooltipContent side="left" className="text-xs">
Reset history coming soon
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
}
function inferAdapterFromListItem(
agent: AgentListItem,
): HarnessAgentAdapter | 'unknown' {
const label = agent.runtimeLabel?.toLowerCase()
if (label?.includes('claude')) return 'claude'
if (label?.includes('codex')) return 'codex'
if (label?.includes('openclaw')) return 'openclaw'
return 'unknown'
}
function livenessDetail(
status: AgentLiveness,
lastUsedAt: number | null | undefined,
): string | undefined {
if (lastUsedAt == null) return undefined
const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
if (status === 'asleep') {
if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
const hr = Math.floor(diffMin / 60)
return `Asleep — quiet for ${hr} hr`
}
if (status === 'working') return 'Working on a turn'
if (status === 'error') return 'Attention — last turn failed'
return undefined
}

View File

@@ -44,6 +44,7 @@ import {
useCreateHarnessAgent,
useDeleteHarnessAgent,
useHarnessAgents,
useUpdateHarnessAgent,
} from './useAgents'
import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'
@@ -76,6 +77,7 @@ export const AgentsPage: FC = () => {
} = useOpenClawAgents(openClawAgentsEnabled)
const createHarnessAgent = useCreateHarnessAgent()
const deleteHarnessAgent = useDeleteHarnessAgent()
const updateHarnessAgent = useUpdateHarnessAgent()
const {
setupOpenClaw,
createAgent: createOpenClawAgent,
@@ -342,12 +344,24 @@ export const AgentsPage: FC = () => {
agents={agentListItems}
activity={agentActivity}
harnessAgentLookup={harnessAgentLookup}
adapters={adapters}
loading={agentsLoading}
deletingAgentKey={deletingAgent ? deletingAgentKey : null}
onCreateAgent={() => setCreateOpen(true)}
onDeleteAgent={(agent) => {
void handleDelete(agent)
}}
onPinToggle={(agent, next) => {
// Optimistic mutation; harness-only — gateway-original
// OpenClaw entries are gated server-side via the harness
// backfill, so we only fire when the row maps to a
// harness agent record.
if (!harnessAgentLookup.has(agent.agentId)) return
updateHarnessAgent.mutate({
agentId: agent.agentId,
patch: { pinned: next },
})
}}
/>
<SetupOpenClawDialog

View File

@@ -1,4 +1,5 @@
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'
/**
* Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -82,3 +83,25 @@ export function formatRelativeTime(epochMs: number | null): string {
const d = Math.floor(diff / ONE_DAY)
return d === 1 ? '1 day ago' : `${d} days ago`
}
/**
* Tooltip-friendly description of a row's current liveness state.
* Returns `undefined` when the state has nothing extra to add (e.g.
* `unknown` with no timestamp).
*/
export function livenessDetail(
status: AgentLiveness,
lastUsedAt: number | null | undefined,
): string | undefined {
if (lastUsedAt == null) return undefined
const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
if (status === 'asleep') {
if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
const hr = Math.floor(diffMin / 60)
return `Asleep — quiet for ${hr} hr`
}
if (status === 'working') return 'Working on a turn'
if (status === 'error') return 'Attention — last turn failed'
return undefined
}

View File

@@ -56,6 +56,43 @@ export interface HarnessAgent {
* agents. Drives the recency sort and the "Last used X min ago" copy.
*/
lastUsedAt?: number | null
/** Pinned agents float to the top of the list. Defaults to `false`. */
pinned?: boolean
/** First non-blank line of the most recent user message; null if none. */
lastUserMessage?: string | null
/** Working directory the agent runs in; null when no session record yet. */
cwd?: string | null
/** Cumulative + 7-day rolling token usage; null when no record. */
tokens?: {
last7d: { input: number; output: number; requestCount: number }
cumulative: { input: number; output: number }
} | null
turnsByDay?: number[]
failedByDay?: number[]
lastError?: string | null
lastErrorAt?: number | null
/** When non-null, an in-flight turn this row can be resumed from. */
activeTurnId?: string | null
/** Persistent FIFO queue of messages waiting for this agent. */
queue?: HarnessQueuedMessage[]
}
export interface HarnessQueuedMessageAttachment {
mediaType: string
data: string
}
export interface HarnessQueuedMessage {
id: string
createdAt: number
message: string
attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
}
export interface HarnessAdapterHealth {
healthy: boolean
reason?: string
checkedAt: number
}
export interface HarnessAdapterDescriptor {
@@ -66,6 +103,7 @@ export interface HarnessAdapterDescriptor {
modelControl: 'runtime-supported' | 'best-effort'
models: Array<{ id: string; label: string; recommended?: boolean }>
reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
health?: HarnessAdapterHealth
}
export interface CreateHarnessAgentInput {

View File

@@ -0,0 +1,160 @@
import {
Copy,
Loader2,
MessageSquare,
MoreHorizontal,
Pencil,
RotateCcw,
Trash2,
} from 'lucide-react'
import type { FC } from 'react'
import { useNavigate } from 'react-router'
import { toast } from 'sonner'
import { Button } from '@/components/ui/button'
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuSeparator,
DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import {
canDelete as canDeleteAgent,
canRename as canRenameAgent,
displayName,
} from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'
interface AgentActionsProps {
agent: AgentListItem
activeTurnId: string | null
deleting?: boolean
onDelete: (agent: AgentListItem) => void
}
/**
* Single primary CTA per row: `Resume` (filled, accent-orange, with a
* pulsing dot) when an active turn exists; otherwise `Chat` (outline).
* Both navigate to the same place — the chat hook auto-attaches via
* `/chat/active` when there's a live turn — but the row signals which
* action the user is actually taking.
*/
export const AgentActions: FC<AgentActionsProps> = ({
agent,
activeTurnId,
deleting,
onDelete,
}) => {
const navigate = useNavigate()
const allowDelete = canDeleteAgent(agent)
const allowRename = canRenameAgent(agent)
const handleChat = () => navigate(`/agents/${agent.agentId}`)
const handleCopyId = async () => {
try {
await navigator.clipboard.writeText(agent.agentId)
toast.success('Agent id copied')
} catch {
toast.error('Could not copy agent id')
}
}
return (
<div className="flex shrink-0 items-center gap-1.5">
{activeTurnId ? (
<Button
variant="default"
size="sm"
onClick={handleChat}
className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
>
<span className="relative flex size-2">
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
<span className="relative inline-flex size-2 rounded-full bg-white" />
</span>
Resume
</Button>
) : (
<Button variant="outline" size="sm" onClick={handleChat}>
<MessageSquare className="mr-1.5 size-3" />
Chat
</Button>
)}
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
variant="ghost"
size="icon"
aria-label={`More actions for ${displayName(agent)}`}
className="size-8 text-muted-foreground hover:text-foreground"
>
<MoreHorizontal className="size-4" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-44">
<DropdownMenuItem onSelect={() => void handleCopyId()}>
<Copy className="mr-2 size-3.5" />
Copy id
</DropdownMenuItem>
<ComingSoonItem
icon={Pencil}
label="Rename"
disabled={!allowRename}
/>
<ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
<DropdownMenuSeparator />
<DropdownMenuItem
onSelect={() => onDelete(agent)}
disabled={!allowDelete || deleting}
className="text-destructive focus:text-destructive"
>
{deleting ? (
<Loader2 className="mr-2 size-3.5 animate-spin" />
) : (
<Trash2 className="mr-2 size-3.5" />
)}
Delete
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
</div>
)
}
interface ComingSoonItemProps {
icon: typeof Pencil
label: string
disabled: boolean
}
const ComingSoonItem: FC<ComingSoonItemProps> = ({
icon: Icon,
label,
disabled,
}) => {
const item = (
<DropdownMenuItem disabled className="text-muted-foreground">
<Icon className="mr-2 size-3.5" />
{label}
</DropdownMenuItem>
)
if (!disabled) return item
return (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<span className="block w-full">{item}</span>
</TooltipTrigger>
<TooltipContent side="left" className="text-xs">
{label} coming soon
</TooltipContent>
</Tooltip>
</TooltipProvider>
)
}

View File

@@ -0,0 +1,96 @@
import { AlertTriangle, ChevronDown } from 'lucide-react'
import { type FC, useEffect, useState } from 'react'
import { Button } from '@/components/ui/button'
import {
Collapsible,
CollapsibleContent,
CollapsibleTrigger,
} from '@/components/ui/collapsible'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { truncate } from './agent-row.helpers'
interface AgentErrorPanelProps {
agentId: string
message: string
errorAt: number | null
}
const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
const PREVIEW_CHARS = 200
export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
agentId,
message,
errorAt,
}) => {
const storageKey = `${STORAGE_PREFIX}${agentId}`
// Open if we've never seen this `errorAt` for this agent. Once the
// user collapses the panel (or refreshes after seeing it), we mark
// it seen so it doesn't re-pop on every poll.
const [open, setOpen] = useState<boolean>(() => {
if (typeof window === 'undefined' || !errorAt) return true
const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
return !Number.isFinite(seen) || errorAt > seen
})
useEffect(() => {
if (!open && errorAt && typeof window !== 'undefined') {
window.localStorage.setItem(storageKey, String(errorAt))
}
}, [open, errorAt, storageKey])
const preview = truncate(message, PREVIEW_CHARS)
const truncated = preview.length < message.length
return (
<Collapsible open={open} onOpenChange={setOpen} className="mt-3">
<div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
<div className="flex items-center gap-2 font-medium text-destructive text-xs">
<AlertTriangle className="size-3.5" />
Last error
</div>
<CollapsibleTrigger asChild>
<Button
variant="ghost"
size="sm"
className="h-6 px-2 text-muted-foreground"
>
<span className="text-xs">{open ? 'hide' : 'show'}</span>
<ChevronDown
className={cn(
'ml-1 size-3 transition-transform',
open && 'rotate-180',
)}
/>
</Button>
</CollapsibleTrigger>
</div>
<CollapsibleContent>
<div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
{truncated ? (
<HoverCard openDelay={300}>
<HoverCardTrigger asChild>
<span className="cursor-default font-mono text-foreground/80">
{preview}
</span>
</HoverCardTrigger>
<HoverCardContent
side="bottom"
className="max-w-md whitespace-pre-wrap font-mono text-xs"
>
{message}
</HoverCardContent>
</HoverCard>
) : (
<span className="font-mono text-foreground/80">{message}</span>
)}
</div>
</CollapsibleContent>
</Collapsible>
)
}

View File

@@ -0,0 +1,35 @@
import { Quote } from 'lucide-react'
import type { FC } from 'react'
import { firstNonBlankLine, truncate } from './agent-row.helpers'
interface AgentLastMessageProps {
message: string | null
}
const PREVIEW_CHARS = 110
/**
* Inline preview of the most recent user message. Renders as a quoted,
* italic line so the row reads like a conversation snippet rather than
* a label-and-value pair. No hover-card — opening the agent's chat is
* the canonical way to read the full message.
*/
export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
if (!message) {
return (
<p className="mt-1 text-muted-foreground/70 text-xs italic">
No messages yet start a chat
</p>
)
}
const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
return (
<p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
<Quote
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
aria-hidden
/>
<span className="truncate">{preview}</span>
</p>
)
}

View File

@@ -0,0 +1,37 @@
import type { FC } from 'react'
import { formatRelativeTime } from '../agent-display.helpers'
import { AgentTokenSummary } from './AgentTokenSummary'
import type { AgentTokenUsage } from './agent-row.types'
interface AgentMetaRowProps {
lastUsedAt: number | null
tokens: AgentTokenUsage | null
}
/**
* Bottom-of-row meta line. Intentionally sparse — last activity time
* and lifetime tokens. CWD is no longer surfaced here because the path
* the server happens to be running from isn't actionable; if a future
* surface needs the cwd (chat panel, debug view) it reads from the
* listing payload directly.
*/
export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
const lastUsedLabel = formatRelativeTime(lastUsedAt)
const tokensTotal =
(tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
const showTokens = tokensTotal > 0
return (
<div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
<span>{lastUsedLabel}</span>
{showTokens && (
<>
<span aria-hidden className="text-muted-foreground/50">
·
</span>
<AgentTokenSummary tokens={tokens} />
</>
)}
</div>
)
}

View File

@@ -0,0 +1,92 @@
import type { FC } from 'react'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
interface AgentSparklineProps {
/** 14 entries, oldest → newest. Today's bucket is the last index. */
turnsByDay: number[]
/** Same length, same order. Failed turns counted separately. */
failedByDay: number[]
className?: string
}
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18
export const AgentSparkline: FC<AgentSparklineProps> = ({
turnsByDay,
failedByDay,
className,
}) => {
if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
const max = Math.max(1, ...turnsByDay)
return (
<HoverCard openDelay={250}>
<HoverCardTrigger asChild>
<div
role="img"
aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
className={cn('flex h-5 items-end gap-px', className)}
>
{turnsByDay.map((count, idx) => {
const ratio = count / max
const height = Math.max(
MIN_BAR_HEIGHT_PX,
Math.round(ratio * MAX_BAR_HEIGHT_PX),
)
const isToday = idx === ROW_BAR_COUNT - 1
const failed = failedByDay[idx] ?? 0
return (
<div
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
key={`bar-${idx}`}
className={cn(
'w-1.5 rounded-sm',
count === 0
? 'bg-muted-foreground/15'
: failed > 0
? 'bg-destructive/50'
: 'bg-[var(--accent-orange)]/50',
isToday && 'ring-1 ring-foreground/30',
)}
style={{ height }}
/>
)
})}
</div>
</HoverCardTrigger>
<HoverCardContent side="left" className="w-56 text-xs">
<div className="mb-2 font-medium text-sm">Last 14 days</div>
<ul className="space-y-0.5">
{turnsByDay.map((count, idx) => {
const failed = failedByDay[idx] ?? 0
const dayLabel = formatLocalDate(idx)
return (
<li
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
key={`day-${idx}`}
className="flex items-center justify-between text-muted-foreground"
>
<span>{dayLabel}</span>
<span>
{count}
{failed > 0 && (
<span className="ml-1 text-destructive">
({failed} failed)
</span>
)}
</span>
</li>
)
})}
</ul>
</HoverCardContent>
</HoverCard>
)
}

View File

@@ -0,0 +1,71 @@
import { TriangleAlert } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { adapterLabel } from '../AdapterIcon'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentAdapterHealth } from './agent-row.types'
interface AgentSummaryChipsProps {
adapter: HarnessAgentAdapter | 'unknown'
modelLabel: string | null
reasoningEffort: string | null
/** When unhealthy, the adapter label dims and a warning chip appears. */
adapterHealth: AgentAdapterHealth | null
}
/**
* Adapter / model / reasoning summary line. Always rendered (so OpenClaw
* rows that fall back to defaults still expose what they're set up to do)
* and surfaces adapter-health *only when unhealthy* — keeping the calm
* default state silent and reserving visual noise for things the user
* needs to act on.
*/
export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
adapter,
modelLabel,
reasoningEffort,
adapterHealth,
}) => {
const parts = [adapterLabel(adapter)]
if (modelLabel) parts.push(modelLabel)
if (reasoningEffort) parts.push(reasoningEffort)
const unhealthy = adapterHealth?.healthy === false
return (
<div
className={cn(
'flex items-center gap-1.5 text-muted-foreground text-xs',
unhealthy && 'text-muted-foreground/70',
)}
>
<span className="truncate">{parts.join(' · ')}</span>
{unhealthy && adapterHealth && (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<Badge
variant="outline"
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
>
<TriangleAlert className="size-2.5" />
<span className="font-normal">Unavailable</span>
</Badge>
</HoverCardTrigger>
<HoverCardContent side="right" className="w-72 text-sm">
<div className="font-medium">
{adapterLabel(adapter)} CLI not available
</div>
<div className="mt-1 text-muted-foreground text-xs">
{adapterHealth.reason ??
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
</div>
</HoverCardContent>
</HoverCard>
)}
</div>
)
}

View File

@@ -0,0 +1,37 @@
import type { FC } from 'react'
import { cn } from '@/lib/utils'
import { AdapterIcon } from '../AdapterIcon'
import { livenessDetail } from '../agent-display.helpers'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import { type AgentLiveness, LivenessDot } from '../LivenessDot'
export interface AgentTileProps {
adapter: HarnessAgentAdapter | 'unknown'
status: AgentLiveness
lastUsedAt: number | null
}
/**
* Adapter glyph + a single liveness dot. Adapter health is no longer
* surfaced here — it lives as an inline pill inside `AgentSummaryChips`
* so the user isn't asked to disambiguate two dots on the same tile.
*/
export const AgentTile: FC<AgentTileProps> = ({
adapter,
status,
lastUsedAt,
}) => (
<div className="relative shrink-0">
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
<AdapterIcon adapter={adapter} className="h-6 w-6" />
</div>
<LivenessDot
status={status}
detail={livenessDetail(status, lastUsedAt)}
className={cn(
'absolute -right-0.5 -bottom-0.5',
status === 'working' && 'animate-pulse',
)}
/>
</div>
)

View File

@@ -0,0 +1,55 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { displayName } from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'
import { AgentSparkline } from './AgentSparkline'
import { PinToggle } from './PinToggle'
interface AgentTitleRowProps {
agent: AgentListItem
status: AgentLiveness
pinned: boolean
turnsByDay: number[]
failedByDay: number[]
onPinToggle: (next: boolean) => void
}
/**
* Title strip: name + status badge + (right-aligned) sparkline. The
* pin toggle sits trailing the title so the title always flushes left
* regardless of pin state — moving the star left of the title indents
* the row's first line off-axis from the model/preview/meta lines
* below it. When unpinned and not hovered, the toggle is removed from
* layout entirely so it reserves no space at all.
*/
export const AgentTitleRow: FC<AgentTitleRowProps> = ({
agent,
status,
pinned,
turnsByDay,
failedByDay,
onPinToggle,
}) => (
<div className="mb-1 flex items-center gap-2">
<span className="truncate font-semibold">{displayName(agent)}</span>
{status === 'working' && (
<Badge
variant="secondary"
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge variant="outline" className="text-muted-foreground">
Asleep
</Badge>
)}
{status === 'error' && <Badge variant="destructive">Attention</Badge>}
<PinToggle pinned={pinned} onToggle={onPinToggle} />
<div className="ml-auto">
<AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
</div>
</div>
)

View File

@@ -0,0 +1,63 @@
import type { FC } from 'react'
import {
HoverCard,
HoverCardContent,
HoverCardTrigger,
} from '@/components/ui/hover-card'
import { Progress } from '@/components/ui/progress'
import { formatTokens } from './agent-row.helpers'
import type { AgentTokenUsage } from './agent-row.types'
interface AgentTokenSummaryProps {
tokens: AgentTokenUsage | null
}
/**
* Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
* (the only window we can compute reliably from the session record).
* Per-window stats land in a follow-up once the activity ledger ships.
*/
export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
if (!tokens) return null
const { input, output } = tokens.cumulative
const total = input + output
if (total === 0) return null
const inputPct = (input / total) * 100
return (
<HoverCard openDelay={200}>
<HoverCardTrigger asChild>
<span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
{formatTokens(total)} tokens
</span>
</HoverCardTrigger>
<HoverCardContent side="top" align="end" className="w-72 text-sm">
<div className="mb-3 flex items-center justify-between">
<span className="font-medium">Lifetime tokens</span>
<span className="text-muted-foreground text-xs tabular-nums">
{formatTokens(total)} total
</span>
</div>
<div className="space-y-2">
<div className="flex items-center justify-between text-xs">
<span className="text-muted-foreground">Input</span>
<span className="tabular-nums">{formatTokens(input)}</span>
</div>
<Progress value={inputPct} className="h-1.5" />
<div className="mt-2 flex items-center justify-between text-xs">
<span className="text-muted-foreground">Output</span>
<span className="tabular-nums">{formatTokens(output)}</span>
</div>
<Progress value={100 - inputPct} className="h-1.5" />
</div>
<p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
Cumulative across every turn this agent has run. Per-window stats
arrive in a future release.
</p>
</HoverCardContent>
</HoverCard>
)
}

View File

@@ -0,0 +1,60 @@
import { Star } from 'lucide-react'
import type { FC } from 'react'
import { Button } from '@/components/ui/button'
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'
interface PinToggleProps {
pinned: boolean
onToggle: (next: boolean) => void
}
/**
* Trailing star toggle. The button is *always rendered* — only its
* opacity changes between pinned/unpinned/hover states — so the title
* row's height is constant. Hiding the slot via `display: none` would
* collapse the row's vertical metrics on hover and shift every card
* below in the rail.
*
* Placement is trailing the title (after the status badge) so the
* title itself flushes left regardless of pin state — leading the
* row with the star would indent the title relative to the model /
* preview / meta lines beneath it.
*/
export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
<TooltipProvider delayDuration={300}>
<Tooltip>
<TooltipTrigger asChild>
<Button
variant="ghost"
size="icon"
className={cn(
'size-6 text-muted-foreground transition-opacity hover:text-foreground',
pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
)}
aria-pressed={pinned}
aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
onClick={(event) => {
event.stopPropagation()
onToggle(!pinned)
}}
>
<Star
className={cn(
'size-3.5',
pinned && 'fill-amber-400 text-amber-500',
)}
/>
</Button>
</TooltipTrigger>
<TooltipContent side="top" className="text-xs">
{pinned ? 'Unpin' : 'Pin to top'}
</TooltipContent>
</Tooltip>
</TooltipProvider>
)

View File

@@ -0,0 +1,73 @@
import { describe, expect, it } from 'bun:test'
import {
firstNonBlankLine,
formatLocalDate,
formatTokens,
ROW_BAR_COUNT,
truncate,
} from './agent-row.helpers'
describe('formatTokens', () => {
it('renders zero / NaN as "0"', () => {
expect(formatTokens(0)).toBe('0')
expect(formatTokens(Number.NaN)).toBe('0')
})
it('renders sub-1K as integer', () => {
expect(formatTokens(142)).toBe('142')
})
it('renders K with one decimal under 10', () => {
expect(formatTokens(8_400)).toBe('8.4K')
})
it('drops the decimal at >=10K', () => {
expect(formatTokens(120_000)).toBe('120K')
})
it('renders M with one decimal under 10', () => {
expect(formatTokens(1_200_000)).toBe('1.2M')
})
})
describe('firstNonBlankLine', () => {
it('returns the first non-blank line', () => {
expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
})
it('skips USER_QUERY envelope tags', () => {
expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
'fix tests',
)
})
it('falls back to the trimmed input when nothing matches', () => {
expect(firstNonBlankLine(' single ')).toBe('single')
})
})
describe('truncate', () => {
it('returns input unchanged when within limit', () => {
expect(truncate('hello', 10)).toBe('hello')
})
it('appends an ellipsis when over limit', () => {
expect(truncate('hello world', 6)).toBe('hello…')
})
})
describe('formatLocalDate', () => {
const today = new Date('2026-04-30T12:00:00Z')
it('labels today and yesterday explicitly', () => {
expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
})
it('returns a "Mon D" format for older days', () => {
const label = formatLocalDate(0, today)
// "Apr 17" or "Apr 17," depending on locale; just assert it
// contains a month abbreviation and a day number.
expect(label).toMatch(/[A-Za-z]+ \d+/)
})
})

View File

@@ -0,0 +1,64 @@
/**
* Pure formatters consumed by row sub-components. Kept distinct from
* `agent-display.helpers.ts` (page-level helpers) so the row internals
* have an obvious single home.
*/
const TOKEN_THRESHOLDS: Array<[number, string]> = [
[1_000_000, 'M'],
[1_000, 'K'],
]
/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
export function formatTokens(n: number): string {
if (!Number.isFinite(n) || n <= 0) return '0'
for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
if (n >= threshold) {
const value = n / threshold
const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
return `${decimal}${suffix}`
}
}
return String(Math.round(n))
}
const USER_QUERY_OPEN = /^<USER_QUERY>$/i
const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
/**
* First non-blank line, with the BrowserOS user-system-prompt
* `<USER_QUERY>` envelope tags stripped so previews don't show
* structural noise.
*/
export function firstNonBlankLine(text: string): string {
const lines = text.split('\n').map((line) => line.trim())
for (const line of lines) {
if (!line) continue
if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
return line
}
return text.trim()
}
export function truncate(text: string, max: number): string {
if (text.length <= max) return text
return `${text.slice(0, max - 1).trimEnd()}`
}
const SPARKLINE_DAYS = 14
/**
* "today" / "yesterday" / "Apr 17" — given an index 0..13 from
* oldest → newest. `today` defaults to `new Date()` so callers don't
* have to thread a clock through.
*/
export function formatLocalDate(idx: number, today: Date = new Date()): string {
if (idx === SPARKLINE_DAYS - 1) return 'today'
if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
const offset = SPARKLINE_DAYS - 1 - idx
const date = new Date(today)
date.setDate(date.getDate() - offset)
return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
}
export const ROW_BAR_COUNT = SPARKLINE_DAYS

View File

@@ -0,0 +1,51 @@
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'
/**
* Window-bounded token usage. Server returns `null` when no session
* record exists yet for the agent.
*/
export interface AgentTokenUsage {
last7d: { input: number; output: number; requestCount: number }
cumulative: { input: number; output: number }
}
export interface AgentAdapterHealth {
healthy: boolean
reason?: string
}
/**
* Everything an `AgentRowCard` needs to render. Mirrors the shape
* `useHarnessAgents` exposes; the page assembles one entry per row in
* `AgentList` and passes it down. Sub-components only see slices of
* this object — no prop drilling beyond two levels.
*/
export interface AgentRowData {
agent: AgentListItem
adapter: HarnessAgentAdapter | 'unknown'
modelLabel: string | null
reasoningEffort: string | null
status: AgentLiveness
lastUsedAt: number | null
pinned: boolean
cwd: string | null
lastUserMessage: string | null
tokens: AgentTokenUsage | null
/** 14 entries, oldest → newest. Today is the last index. */
turnsByDay: number[]
/** Same length and ordering as `turnsByDay`. */
failedByDay: number[]
lastError: string | null
lastErrorAt: number | null
/** When non-null, an in-flight turn this row can be resumed from. */
activeTurnId: string | null
/** Adapter-level health, shared across rows for the same adapter. */
adapterHealth: AgentAdapterHealth | null
}
export interface AgentRowCallbacks {
onDelete: (agent: AgentListItem) => void
onPinToggle: (agent: AgentListItem, next: boolean) => void
}

View File

@@ -0,0 +1,104 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from './agent-harness-types'
import {
compareAgentsByPinThenRecency,
orderAgentsByPinThenRecency,
} from './agents-list-order'
function makeAgent(input: {
id: string
pinned?: boolean
lastUsedAt?: number | null
}): HarnessAgent {
return {
id: input.id,
name: input.id,
adapter: 'codex',
permissionMode: 'approve-all',
sessionKey: 'session',
createdAt: 0,
updatedAt: 0,
pinned: input.pinned,
lastUsedAt: input.lastUsedAt,
}
}
describe('orderAgentsByPinThenRecency', () => {
it('floats pinned agents to the top regardless of recency', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
])
expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
})
it('sorts by lastUsedAt desc within each pin group', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
])
expect(result.map((entry) => entry.id)).toEqual([
'newer-pin',
'older-pin',
'newer',
'older',
])
})
it('seed-pins the gateway main agent above other never-used agents', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
])
expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
})
it('drops the main seed-pin once the agent has been used', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
])
expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
})
it('puts never-used agents below recently-used ones', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
])
expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
})
it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
])
expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
})
})
describe('compareAgentsByPinThenRecency', () => {
it('produces the same order as the harness-shape helper', () => {
const items = [
{ id: 'older', pinned: false, lastUsedAt: 50 },
{ id: 'newer', pinned: false, lastUsedAt: 80 },
{ id: 'pinned', pinned: true, lastUsedAt: 1 },
]
const sorted = [...items].sort(compareAgentsByPinThenRecency)
expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
})
it('seeds the main agent above other never-used rows', () => {
const items = [
{ id: 'zzz', pinned: false, lastUsedAt: null },
{ id: 'main', pinned: false, lastUsedAt: null },
]
const sorted = [...items].sort(compareAgentsByPinThenRecency)
expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
})
})

View File

@@ -0,0 +1,59 @@
import type { HarnessAgent } from './agent-harness-types'
/**
* Stable ordering for index-shaped agent surfaces (the `/agents` rail
* and the chat-screen rail at `/agents/:agentId`). Pinned rows float
* to the top, then recency desc, with never-used agents falling to
* the bottom in id-stable order. The gateway's `main` agent gets
* seed-pinned to the top of the never-used group so a fresh install
* has an obvious starting point even before the user has used it.
*
* NOT the same rule as the home grid (`orderHomeAgents`): home is
* action-shaped — active-turn floats to the top — so users can
* resume what's running. The chat rail keeps recency stable so it
* doesn't reshuffle as turns transition every 5s.
*/
export function orderAgentsByPinThenRecency(
agents: HarnessAgent[],
): HarnessAgent[] {
return [...agents].sort((a, b) => {
const aPinned = a.pinned ?? false
const bPinned = b.pinned ?? false
if (aPinned !== bPinned) return aPinned ? -1 : 1
const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
return a.id.localeCompare(b.id)
})
}
/**
* Same comparator, but operates over arbitrary records that carry
* `pinned`, `lastUsedAt`, and an `id`-equivalent key. Used by the
* `/agents` `AgentList` which pivots `AgentListItem` + harness
* lookup into a sortable shape; both surfaces stay on identical
* sort semantics through this adapter.
*/
export function compareAgentsByPinThenRecency<
T extends { pinned: boolean; lastUsedAt: number | null; id: string },
>(a: T, b: T): number {
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
const aSeed = a.id === 'main' && a.lastUsedAt === null
const bSeed = b.id === 'main' && b.lastUsedAt === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
return a.id.localeCompare(b.id)
}

View File

@@ -8,6 +8,7 @@ import {
type HarnessAdapterDescriptor,
type HarnessAgent,
type HarnessAgentHistoryPage,
type HarnessQueuedMessage,
mapHarnessAgentToEntry,
} from './agent-harness-types'
import type { OpenClawStatus } from './useOpenClaw'
@@ -135,6 +136,63 @@ export function useCreateHarnessAgent() {
})
}
/**
* Apply a partial update to a harness agent. Used by the pin-toggle
* star and (eventually) the inline rename UI. Optimistically writes
* the patch into the listing query cache so the row updates instantly,
* then rolls back if the server rejects the change.
*/
export function useUpdateHarnessAgent() {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: {
agentId: string
patch: { name?: string; pinned?: boolean }
}) => {
if (!baseUrl || urlLoading) {
throw new Error('BrowserOS agent server URL is not ready')
}
const data = await agentsFetch<{ agent: HarnessAgent }>(
baseUrl,
`/${encodeURIComponent(input.agentId)}`,
{
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(input.patch),
},
)
return data.agent
},
onMutate: async ({ agentId, patch }) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === agentId ? { ...agent, ...patch } : agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}
export function useDeleteHarnessAgent() {
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
const queryClient = useQueryClient()
@@ -206,6 +264,8 @@ export interface HarnessActiveTurnInfo {
lastSeq: number
startedAt: number
endedAt?: number
/** User message that kicked off the turn; null when not captured. */
prompt: string | null
}
/**
@@ -260,3 +320,145 @@ export async function fetchHarnessAgentHistory(
`/${encodeURIComponent(agentId)}/sessions/main/history`,
)
}
export interface EnqueueMessageInput {
message: string
attachments?: ReadonlyArray<unknown>
}
export async function enqueueHarnessMessage(
agentId: string,
input: EnqueueMessageInput,
): Promise<HarnessQueuedMessage> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: input.message,
...(input.attachments && input.attachments.length > 0
? { attachments: input.attachments }
: {}),
}),
},
)
if (!response.ok) {
let message = `Request failed with status ${response.status}`
try {
const body = (await response.json()) as { error?: string }
if (body.error) message = body.error
} catch {}
throw new Error(message)
}
const body = (await response.json()) as { queued: HarnessQueuedMessage }
return body.queued
}
export async function removeHarnessQueuedMessage(
agentId: string,
messageId: string,
): Promise<{ removed: boolean }> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
messageId,
)}`,
{ method: 'DELETE' },
)
if (!response.ok) return { removed: false }
return (await response.json()) as { removed: boolean }
}
/**
* Optimistic enqueue: writes the new queued message into the listing
* cache immediately so the queue panel reflects the change without
* waiting for the next poll. Rolls back if the server rejects.
*/
export function useEnqueueHarnessMessage() {
const { baseUrl } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
enqueueHarnessMessage(input.agentId, input),
onMutate: async (input) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
const optimistic: HarnessQueuedMessage = {
id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
createdAt: Date.now(),
message: input.message,
}
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === input.agentId
? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
: agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}
/**
* Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
*/
export function useRemoveHarnessQueuedMessage() {
const { baseUrl } = useAgentServerUrl()
const queryClient = useQueryClient()
return useMutation({
mutationFn: async (input: { agentId: string; messageId: string }) =>
removeHarnessQueuedMessage(input.agentId, input.messageId),
onMutate: async (input) => {
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
await queryClient.cancelQueries({ queryKey })
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
if (!previous) return { previous: undefined }
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
...previous,
agents: previous.agents.map((agent) =>
agent.id === input.agentId
? {
...agent,
queue: (agent.queue ?? []).filter(
(entry) => entry.id !== input.messageId,
),
}
: agent,
),
})
return { previous }
},
onError: (_err, _vars, context) => {
if (!context?.previous) return
queryClient.setQueryData(
[AGENT_QUERY_KEYS.agents, baseUrl],
context.previous,
)
},
onSettled: async () => {
await queryClient.invalidateQueries({
queryKey: [AGENT_QUERY_KEYS.agents],
})
},
})
}

View File

@@ -1,5 +1,8 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAdapterDescriptor } from '@/entrypoints/app/agents/agent-harness-types'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import {
buildSidepanelChatTargets,
@@ -77,58 +80,96 @@ const adapters: HarnessAdapterDescriptor[] = [
},
]
const agents: HarnessAgent[] = [
{
id: 'agent-codex',
name: 'Review Bot',
adapter: 'codex',
modelId: 'gpt-5.5',
reasoningEffort: 'medium',
permissionMode: 'approve-all',
sessionKey: 'agent:agent-codex:main',
createdAt: timestamp,
updatedAt: timestamp,
},
{
id: 'agent-openclaw',
name: 'Research Claw',
adapter: 'openclaw',
modelId: 'default',
reasoningEffort: 'high',
permissionMode: 'approve-all',
sessionKey: 'agent:agent-openclaw:main',
createdAt: timestamp,
updatedAt: timestamp,
},
]
describe('buildSidepanelChatTargets', () => {
it('returns LLM targets plus one ACP target per adapter model', () => {
const targets = buildSidepanelChatTargets({ providers, adapters })
it('returns LLM targets plus one ACP target per persisted harness agent', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
expect(targets.map((target) => target.id)).toEqual([
'browseros',
'anthropic-sonnet',
'acp:claude:sonnet:medium',
'acp:claude:haiku:medium',
'acp:codex:gpt-5.5:medium',
'acp:openclaw:default:medium',
'agent-codex',
'agent-openclaw',
])
})
it('emits a single default ACP target for adapters with no per-session model picker', () => {
const targets = buildSidepanelChatTargets({ providers, adapters })
const openclaw = targets.find(
(target) => target.id === 'acp:openclaw:default:medium',
)
it('does not emit catalog-only ACP targets without persisted agents', () => {
const targets = buildSidepanelChatTargets({
providers,
adapters,
agents: [],
})
expect(targets.map((target) => target.id)).toEqual([
'browseros',
'anthropic-sonnet',
])
})
it('uses the created OpenClaw agent name instead of a generic adapter target', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const openclaw = targets.find((target) => target.id === 'agent-openclaw')
expect(openclaw).toMatchObject({
kind: 'acp',
id: 'agent-openclaw',
agentId: 'agent-openclaw',
adapter: 'openclaw',
adapterName: 'OpenClaw',
modelId: 'default',
modelLabel: 'default',
// Without a model picker, the target name is just the adapter
// name — the user picks the adapter, not a model under it.
name: 'OpenClaw',
name: 'Research Claw',
modelControl: 'best-effort',
reasoningEffort: 'medium',
reasoningEffort: 'high',
})
})
it('preserves ACP model-control and recommendation metadata', () => {
const targets = buildSidepanelChatTargets({ providers, adapters })
const haiku = targets.find(
(target) => target.id === 'acp:claude:haiku:medium',
)
it('preserves adapter metadata for created agent targets', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const codex = targets.find((target) => target.id === 'agent-codex')
expect(haiku).toMatchObject({
expect(codex).toMatchObject({
kind: 'acp',
adapter: 'claude',
modelId: 'haiku',
modelControl: 'best-effort',
agentId: 'agent-codex',
adapter: 'codex',
adapterName: 'Codex',
modelId: 'gpt-5.5',
modelLabel: 'GPT-5.5',
modelControl: 'runtime-supported',
recommended: true,
reasoningEffort: 'medium',
reasoningEffortLabel: 'Medium',
})
})
it('still returns LLM targets when ACP adapters are unavailable', () => {
expect(buildSidepanelChatTargets({ providers, adapters: [] })).toEqual([
it('still returns LLM targets when agents and adapters are unavailable', () => {
expect(
buildSidepanelChatTargets({ providers, adapters: [], agents: [] }),
).toEqual([
{
kind: 'llm',
id: 'browseros',
@@ -149,7 +190,7 @@ describe('buildSidepanelChatTargets', () => {
describe('resolveSidepanelChatTarget', () => {
it('resolves selected LLM targets back to their provider config', () => {
const targets = buildSidepanelChatTargets({ providers, adapters })
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const resolved = resolveSidepanelChatTarget({
targets,
defaultProviderId: 'browseros',
@@ -161,13 +202,32 @@ describe('resolveSidepanelChatTarget', () => {
})
it('falls back to the current default LLM provider when a persisted ACP target is stale', () => {
const targets = buildSidepanelChatTargets({ providers, adapters: [] })
const targets = buildSidepanelChatTargets({
providers,
adapters,
agents: [],
})
expect(
resolveSidepanelChatTarget({
targets,
defaultProviderId: 'anthropic-sonnet',
selection: { kind: 'acp', id: 'acp:claude:haiku:medium' },
selection: { kind: 'acp', id: 'agent-codex' },
}),
).toMatchObject({
kind: 'llm',
id: 'anthropic-sonnet',
})
})
it('falls back when an old catalog-style ACP target id is persisted', () => {
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
expect(
resolveSidepanelChatTarget({
targets,
defaultProviderId: 'anthropic-sonnet',
selection: { kind: 'acp', id: 'acp:codex:gpt-5.5:medium' },
}),
).toMatchObject({
kind: 'llm',
@@ -180,10 +240,8 @@ describe('persistSidepanelChatTargetSelection', () => {
it('stores only target identity and does not mutate LLM provider arrays', async () => {
let savedSelection: SidepanelChatTargetSelection | null = null
const originalProviders = providers.map((provider) => ({ ...provider }))
const targets = buildSidepanelChatTargets({ providers, adapters })
const target = targets.find(
(candidate) => candidate.id === 'acp:codex:gpt-5.5:medium',
)
const targets = buildSidepanelChatTargets({ providers, adapters, agents })
const target = targets.find((candidate) => candidate.id === 'agent-codex')
await persistSidepanelChatTargetSelection(target, {
setValue: async (value) => {
@@ -193,7 +251,7 @@ describe('persistSidepanelChatTargetSelection', () => {
expect(savedSelection as SidepanelChatTargetSelection | null).toEqual({
kind: 'acp',
id: 'acp:codex:gpt-5.5:medium',
id: 'agent-codex',
})
expect(providers).toEqual(originalProviders)
})

View File

@@ -1,5 +1,6 @@
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
@@ -19,6 +20,7 @@ export type SidepanelChatTarget =
id: string
name: string
type: 'acp'
agentId: string
adapter: HarnessAgentAdapter
adapterName: string
modelId: string
@@ -37,6 +39,7 @@ export type SidepanelChatTargetSelection = Pick<
interface BuildSidepanelChatTargetsInput {
providers: LlmProviderConfig[]
adapters: HarnessAdapterDescriptor[]
agents?: HarnessAgent[]
}
interface ResolveSidepanelChatTargetInput {
@@ -63,61 +66,49 @@ let sidepanelChatTargetSelectionStorage:
export function buildSidepanelChatTargets({
providers,
adapters,
agents = [],
}: BuildSidepanelChatTargetsInput): SidepanelChatTarget[] {
return [
...providers.map(toLlmTarget),
...adapters.flatMap(toAcpTargetsForAdapter),
...agents.map((agent) => toAcpTargetForAgent(agent, adapters)),
]
}
function toAcpTargetsForAdapter(
adapter: HarnessAdapterDescriptor,
): SidepanelChatTarget[] {
const reasoning = adapter.reasoningEfforts.find(
(effort) => effort.id === adapter.defaultReasoningEffort,
)
function toAcpTargetForAgent(
agent: HarnessAgent,
adapters: HarnessAdapterDescriptor[],
): SidepanelChatTarget {
const adapter = adapters.find((entry) => entry.id === agent.adapter)
const modelId = agent.modelId ?? adapter?.defaultModelId ?? 'default'
const reasoningEffort =
reasoning?.id ?? adapter.defaultReasoningEffort ?? 'medium'
agent.reasoningEffort ?? adapter?.defaultReasoningEffort ?? 'medium'
const model = adapter?.models.find((entry) => entry.id === modelId)
const reasoning = adapter?.reasoningEfforts.find(
(effort) => effort.id === reasoningEffort,
)
// Adapters with no per-session model picker (e.g. OpenClaw, whose
// model lives on the gateway-side agent record) still need exactly
// one sidepanel target so the user can pick the adapter at all.
if (adapter.models.length === 0) {
return [
{
kind: 'acp',
id: buildAcpTargetId(
adapter.id,
adapter.defaultModelId,
reasoningEffort,
),
name: adapter.name,
type: 'acp',
adapter: adapter.id,
adapterName: adapter.name,
modelId: adapter.defaultModelId,
modelLabel: 'default',
modelControl: adapter.modelControl,
reasoningEffort,
reasoningEffortLabel: reasoning?.label,
},
]
}
return adapter.models.map((model) => ({
kind: 'acp' as const,
id: buildAcpTargetId(adapter.id, model.id, reasoningEffort),
name: `${adapter.name} ${model.label}`,
type: 'acp' as const,
adapter: adapter.id,
adapterName: adapter.name,
modelId: model.id,
modelLabel: model.label,
modelControl: adapter.modelControl,
recommended: model.recommended,
return {
kind: 'acp',
id: agent.id,
name: agent.name,
type: 'acp',
agentId: agent.id,
adapter: agent.adapter,
adapterName: adapter?.name ?? formatAdapterName(agent.adapter),
modelId,
modelLabel: model?.label ?? modelId,
modelControl: adapter?.modelControl ?? 'best-effort',
recommended: model?.recommended,
reasoningEffort,
reasoningEffortLabel: reasoning?.label,
}))
}
}
function formatAdapterName(adapter: HarnessAgentAdapter): string {
if (adapter === 'claude') return 'Claude Code'
if (adapter === 'codex') return 'Codex'
if (adapter === 'openclaw') return 'OpenClaw'
return adapter
}
export function resolveSidepanelChatTarget({
@@ -172,14 +163,6 @@ function toLlmTarget(provider: LlmProviderConfig): SidepanelChatTarget {
}
}
export function buildAcpTargetId(
adapter: HarnessAgentAdapter,
modelId: string,
reasoningEffort: string,
): string {
return `acp:${adapter}:${modelId}:${reasoningEffort}`
}
async function getSidepanelChatTargetSelectionStorage(): Promise<SidepanelChatTargetSelectionStore> {
if (sidepanelChatTargetSelectionStorage) {
return sidepanelChatTargetSelectionStorage

View File

@@ -1,6 +1,9 @@
import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
import useDeepCompareEffect from 'use-deep-compare-effect'
import { useAgentAdapters } from '@/entrypoints/app/agents/useAgents'
import {
useAgentAdapters,
useHarnessAgents,
} from '@/entrypoints/app/agents/useAgents'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
import { type McpServer, useMcpServers } from '@/lib/mcp/mcpServerStorage'
@@ -38,6 +41,7 @@ export const useChatRefs = () => {
isLoading: isLoadingProviders,
} = useLlmProviders()
const { adapters, loading: isLoadingAdapters } = useAgentAdapters()
const { harnessAgents, loading: isLoadingAgents } = useHarnessAgents()
const { personalization } = usePersonalization()
const [targetSelection, setTargetSelection] =
useState<SidepanelChatTargetSelection | null>(null)
@@ -57,8 +61,9 @@ export const useChatRefs = () => {
buildSidepanelChatTargets({
providers: llmProviders,
adapters,
agents: harnessAgents,
}),
[llmProviders, adapters],
[llmProviders, adapters, harnessAgents],
)
const selectedChatTarget = useMemo(
@@ -116,6 +121,7 @@ export const useChatRefs = () => {
selectedChatTarget,
selectChatTarget,
selectedLlmProvider,
isLoadingProviders: isLoadingProviders || isLoadingAdapters,
isLoadingProviders:
isLoadingProviders || isLoadingAdapters || isLoadingAgents,
}
}

View File

@@ -40,7 +40,7 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
})
})
it('sends ACP targets to the sidepanel ACP route with explicit target fields', () => {
it('sends created-agent targets to the agent-id sidepanel route', () => {
const request = buildSidepanelPreparedSendMessagesRequest({
agentServerUrl: 'http://127.0.0.1:5151',
target: acpTarget,
@@ -52,12 +52,11 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
...commonRequestInput(),
})
expect(request.api).toBe('http://127.0.0.1:5151/agents/sidepanel/chat')
expect(request.api).toBe(
'http://127.0.0.1:5151/agents/agent-codex/sidepanel/chat',
)
expect(request.body).toEqual({
conversationId,
adapter: 'codex',
modelId: 'gpt-5.5',
reasoningEffort: 'medium',
message: 'Inspect the current tab',
browserContext: {
activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
@@ -140,9 +139,10 @@ const llmTarget: SidepanelChatTarget = {
const acpTarget: SidepanelChatTarget = {
kind: 'acp',
id: 'acp:codex:gpt-5.5:medium',
name: 'Codex GPT-5.5',
id: 'agent-codex',
name: 'Review bot',
type: 'acp',
agentId: 'agent-codex',
adapter: 'codex',
adapterName: 'Codex',
modelId: 'gpt-5.5',

View File

@@ -680,13 +680,20 @@ export const useChatSession = (options?: ChatSessionOptions) => {
const sendMessage = (params: { text: string; action?: ChatAction }) => {
const target = selectedChatTargetRef.current
const llmTargetProvider = toLlmProviderConfig(target)
const agentTarget = target?.kind === 'acp' ? target : undefined
track(MESSAGE_SENT_EVENT, {
mode,
provider_type: target?.kind === 'acp' ? 'acp' : llmTargetProvider?.type,
provider_id:
agentTarget?.agentId ??
llmTargetProvider?.id ??
selectedLlmProvider?.id,
provider_type: agentTarget ? 'acp' : llmTargetProvider?.type,
agent_id: agentTarget?.agentId,
adapter: agentTarget?.adapter,
model:
target?.kind === 'acp'
? target.modelId
: llmTargetProvider?.modelId || selectedLlmProvider?.modelId,
agentTarget?.modelId ??
llmTargetProvider?.modelId ??
selectedLlmProvider?.modelId,
})
if (!isIntegrationsSyncedRef.current) {
@@ -763,6 +770,8 @@ export const useChatSession = (options?: ChatSessionOptions) => {
provider_type: target.kind === 'acp' ? 'acp' : target.type,
model_id:
target.kind === 'acp' ? target.modelId : target.provider.modelId,
agent_id: target.kind === 'acp' ? target.agentId : undefined,
adapter: target.kind === 'acp' ? target.adapter : undefined,
})
void selectChatTarget(target).catch((error) => {

View File

@@ -34,15 +34,10 @@ export function buildSidepanelPreparedSendMessagesRequest({
...common
}: BuildSidepanelPreparedSendMessagesRequestInput) {
if (target?.kind === 'acp') {
// ACP session history is owned by AcpxRuntime through sessionKey, so LLM-only
// resume and approval fields are intentionally not forwarded.
return {
api: `${agentServerUrl}/agents/sidepanel/chat`,
api: `${agentServerUrl}/agents/${encodeURIComponent(target.agentId)}/sidepanel/chat`,
body: {
conversationId: common.conversationId,
adapter: target.adapter,
modelId: target.modelId,
reasoningEffort: target.reasoningEffort,
message: message ?? '',
browserContext: common.browserContext,
userSystemPrompt: common.userSystemPrompt,
@@ -71,6 +66,9 @@ export function toProviderOption(target: SidepanelChatTarget): Provider {
name: target.name,
type: target.type,
kind: target.kind,
agentId: target.kind === 'acp' ? target.agentId : undefined,
adapterName: target.kind === 'acp' ? target.adapterName : undefined,
modelLabel: target.kind === 'acp' ? target.modelLabel : undefined,
modelControl: target.kind === 'acp' ? target.modelControl : undefined,
}
}

View File

@@ -59,15 +59,3 @@ export interface AgentConversation {
createdAt: number
updatedAt: number
}
export interface AgentCardData {
agentId: string
name: string
model?: string
status: 'idle' | 'working' | 'error'
lastMessage?: string
lastMessageTimestamp?: number
activitySummary?: string
currentTool?: string
costUsd?: number
}

View File

@@ -9,6 +9,7 @@
"build": "bun run codegen && wxt build",
"build:dev": "bun --env-file=.env.development wxt build --mode development",
"zip": "wxt zip",
"test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
"compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
"lint": "bunx biome check",
"typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",

View File

@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
# If BrowserOS is installed but not running
browseros-cli launch # opens BrowserOS, waits for server
# Configure the CLI (auto-discovers running BrowserOS)
browseros-cli init --auto # detects server URL and saves config
# Configure the CLI with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Verify connection
browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
browseros-cli init # interactive — prompts for URL
```
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
### CLI updates
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
| `--debug` | `BOS_DEBUG=1` | Debug output |
| `--timeout, -t` | | Request timeout (default: 2m) |
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
## Testing
@@ -179,7 +179,7 @@ apps/cli/
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
├── cmd/
│ ├── root.go # Root command, global flags
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
│ ├── init.go # Server URL configuration (URL arg or interactive)
│ ├── install.go # install (download BrowserOS for current platform)
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
│ ├── open.go # open (new_page / new_hidden_page)

View File

@@ -17,8 +17,6 @@ import (
)
func init() {
var autoDiscover bool
cmd := &cobra.Command{
Use: "init [url]",
Short: "Configure the BrowserOS server connection",
@@ -34,9 +32,8 @@ You can provide the full URL or just the port number:
browseros-cli init http://127.0.0.1:9000/mcp
browseros-cli init 9000
Three modes:
Modes:
browseros-cli init <url> Non-interactive (full URL or port number)
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
browseros-cli init Interactive prompt`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.MaximumNArgs(1),
@@ -49,22 +46,9 @@ Three modes:
switch {
case len(args) == 1:
// Non-interactive: URL provided as argument
input = args[0]
case autoDiscover:
// Auto-discover: server.json → config → probe common ports
discovered := probeRunningServer()
if discovered == "" {
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
" If not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", 1)
}
input = discovered
fmt.Printf("Auto-discovered server at %s\n", input)
default:
// Interactive prompt (original behavior)
fmt.Println()
bold.Println("BrowserOS CLI Setup")
fmt.Println()
@@ -95,12 +79,14 @@ Three modes:
output.Errorf(1, "invalid URL: %s", input)
}
// Verify connectivity
fmt.Printf("Checking connection to %s ...\n", baseURL)
client := &http.Client{Timeout: 5 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
output.Errorf(1, "cannot connect to %s: %v\n\n"+
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
"Then run: browseros-cli init <Server URL>\n"+
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
}
resp.Body.Close()
@@ -121,6 +107,5 @@ Three modes:
},
}
cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
rootCmd.AddCommand(cmd)
}

View File

@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)
After installation:
browseros-cli launch # start BrowserOS
browseros-cli init --auto # configure the CLI`,
browseros-cli init <url> # configure the CLI with the Server URL`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
fmt.Println()
bold.Println("Next steps:")
dim.Println(" browseros-cli launch # start BrowserOS")
dim.Println(" browseros-cli init --auto # configure the CLI")
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
},
}

View File

@@ -1,6 +1,7 @@
package cmd
import (
"encoding/json"
"fmt"
"net/http"
"os"
@@ -38,6 +39,7 @@ If BrowserOS is already running, reports the server URL.`,
if url := probeRunningServer(); url != "" {
green.Printf("BrowserOS is already running at %s\n", url)
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
return
}
@@ -63,7 +65,7 @@ If BrowserOS is already running, reports the server URL.`,
green.Printf("BrowserOS is ready at %s\n", url)
fmt.Println()
dim.Println("Next: browseros-cli init --auto")
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
},
}
@@ -75,39 +77,77 @@ If BrowserOS is already running, reports the server URL.`,
// Server probing
// ---------------------------------------------------------------------------
// probeRunningServer checks server.json, config, and common ports for a running server.
var commonBrowserOSPorts = []int{9100, 9200, 9300}
// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
func probeRunningServer() string {
check := func(baseURL string) bool {
client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
client := &http.Client{Timeout: 2 * time.Second}
// 1. server.json — written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && check(url) {
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url
}
// 2. Saved config / env var
if url := defaultServerURL(); url != "" && check(url) {
if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
return url
}
// 3. Probe common BrowserOS ports as last resort
for _, port := range []int{9100, 9200, 9300} {
return probeCommonServerPorts(client)
}
func checkServerHealth(client *http.Client, baseURL string) bool {
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
func probeCommonServerPorts(client *http.Client) string {
for _, port := range commonBrowserOSPorts {
url := fmt.Sprintf("http://127.0.0.1:%d", port)
if check(url) {
if checkServerHealth(client, url) {
return url
}
}
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
//
// Normal command resolution must not call this because it can override a URL the
// user explicitly saved with `browseros-cli init <Server URL>`.
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func mcpEndpointURL(baseURL string) string {
return strings.TrimSuffix(baseURL, "/") + "/mcp"
}
// ---------------------------------------------------------------------------
// Platform-native installation detection
// ---------------------------------------------------------------------------
@@ -117,7 +157,8 @@ func probeRunningServer() string {
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
// and registry uninstall key (per-user Chromium install pattern)
//
// and registry uninstall key (per-user Chromium install pattern)
func isBrowserOSInstalled() bool {
switch runtime.GOOS {
case "darwin":
@@ -271,14 +312,11 @@ func waitForServer(maxWait time.Duration) (string, bool) {
for time.Now().Before(deadline) {
// server.json is written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" {
resp, err := client.Get(url + "/health")
if err == nil {
resp.Body.Close()
if resp.StatusCode == 200 {
return url, true
}
}
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url, true
}
if url := probeCommonServerPorts(client); url != "" {
return url, true
}
fmt.Print(".")
time.Sleep(1 * time.Second)

View File

@@ -0,0 +1,99 @@
package cmd
import (
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"os"
"path/filepath"
"strconv"
"testing"
"time"
"browseros-cli/config"
)
func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
discoveredServer := newHealthyServer(t)
configServer := newHealthyServer(t)
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := probeRunningServer()
if got != normalizeServerURL(discoveredServer.URL) {
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
}
}
func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
server := newHealthyServer(t)
port := serverPort(t, server.URL)
originalPorts := commonBrowserOSPorts
commonBrowserOSPorts = []int{port}
t.Cleanup(func() {
commonBrowserOSPorts = originalPorts
})
got, ok := waitForServer(100 * time.Millisecond)
if !ok {
t.Fatal("waitForServer() ok = false, want true")
}
if got != normalizeServerURL(server.URL) {
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
}
}
func newHealthyServer(t *testing.T) *httptest.Server {
t.Helper()
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/health" {
http.NotFound(w, r)
return
}
w.WriteHeader(http.StatusOK)
}))
t.Cleanup(server.Close)
return server
}
func serverPort(t *testing.T, rawURL string) int {
t.Helper()
parsed, err := url.Parse(rawURL)
if err != nil {
t.Fatalf("url.Parse() error = %v", err)
}
_, portText, err := net.SplitHostPort(parsed.Host)
if err != nil {
t.Fatalf("net.SplitHostPort() error = %v", err)
}
port, err := strconv.Atoi(portText)
if err != nil {
t.Fatalf("strconv.Atoi() error = %v", err)
}
return port
}

View File

@@ -2,10 +2,8 @@ package cmd
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"time"
@@ -289,18 +287,15 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
}
}
// defaultServerURL returns the implicit target from user-controlled settings only.
//
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
// ignore it so a saved URL is not silently overridden by another running server.
func defaultServerURL() string {
// 1. Explicit env var always wins
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
return env
}
// 2. Live discovery file from running BrowserOS (most current)
if url := loadBrowserosServerURL(); url != "" {
return url
}
// 3. Saved config (may be stale if port changed)
cfg, err := config.Load()
if err == nil {
if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -311,33 +306,6 @@ func defaultServerURL() string {
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func normalizeServerURL(raw string) string {
normalized := strings.TrimSpace(raw)
@@ -369,8 +337,10 @@ func validateServerURL(raw string) (string, error) {
return "", fmt.Errorf(
"BrowserOS server URL is not configured.\n\n" +
" If BrowserOS is running: browseros-cli init --auto\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
)
}

View File

@@ -1,8 +1,13 @@
package cmd
import (
"os"
"path/filepath"
"strings"
"testing"
"time"
"browseros-cli/config"
)
func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -100,6 +105,76 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
}
}
func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if got := defaultServerURL(); got != "" {
t.Fatalf("defaultServerURL() = %q, want empty", got)
}
}
func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
if got != "http://127.0.0.1:9115" {
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestValidateServerURLExplainsManualInit(t *testing.T) {
_, err := validateServerURL("")
if err == nil {
t.Fatal("validateServerURL() error = nil, want setup instructions")
}
msg := err.Error()
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
}
if strings.Contains(msg, "init --auto") {
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
}
}
func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
done := make(chan struct{})
returned := make(chan struct{})

View File

@@ -44,10 +44,7 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {
session, err := sdkClient.Connect(ctx, transport, nil)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
}
return session, nil
}
@@ -187,10 +184,7 @@ func (c *Client) Status() (map[string]any, error) {
func (c *Client) restGET(path string) (map[string]any, error) {
resp, err := c.HTTPClient.Get(c.BaseURL + path)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
}
defer resp.Body.Close()
@@ -205,3 +199,14 @@ func (c *Client) restGET(path string) (map[string]any, error) {
}
return data, nil
}
// connectionSetupInstructions explains how to recover from a stale or missing server URL.
func connectionSetupInstructions() string {
return "\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" Run once with: browseros-cli --server <Server URL> health\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install"
}

View File

@@ -31,8 +31,8 @@ browseros-cli install
# Start BrowserOS
browseros-cli launch
# Auto-configure MCP settings for your AI tools
browseros-cli init --auto
# Configure MCP settings with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Verify everything is working
browseros-cli health

View File

@@ -0,0 +1,51 @@
# Copy to .env.development for local eval runs.
# Provider keys used by existing config files.
OPENROUTER_API_KEY=
FIREWORKS_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_GENERATIVE_AI_API_KEY=
# Claude Agent SDK token used by performance_grader.
CLAUDE_CODE_OAUTH_TOKEN=
# Suite-mode model selection.
EVAL_VARIANT=local
EVAL_AGENT_PROVIDER=openai-compatible
EVAL_AGENT_MODEL=
EVAL_AGENT_API_KEY=
EVAL_AGENT_BASE_URL=
EVAL_AGENT_SUPPORTS_IMAGES=true
# Optional suite-mode executor override for orchestrator suites.
EVAL_EXECUTOR_MODEL=
EVAL_EXECUTOR_API_KEY=
EVAL_EXECUTOR_BASE_URL=
# Clado visual action executor.
CLADO_ACTION_MODEL=
CLADO_ACTION_API_KEY=
CLADO_ACTION_BASE_URL=
# Backward-compatible alias used by older local scripts.
CLADO_ACTION_URL=
# BrowserOS runner.
BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
BROWSEROS_SERVER_URL=http://127.0.0.1:9110
BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
BROWSEROS_CONFIG_URL=
# Captcha solver extension.
NOPECHA_API_KEY=
# WebArena-Infinity.
WEBARENA_INFINITY_DIR=
INFINITY_APP_URL=
# R2 publishing and weekly report.
EVAL_R2_ACCOUNT_ID=
EVAL_R2_ACCESS_KEY_ID=
EVAL_R2_SECRET_ACCESS_KEY=
EVAL_R2_BUCKET=browseros-eval
EVAL_R2_CDN_BASE_URL=https://eval.browseros.com

View File

@@ -9,11 +9,13 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
- **Bun** runtime
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.
## Quick Start
```bash
cd apps/eval
cp .env.example .env.development
# Edit .env.development with your keys, then:
bun run eval
```
@@ -23,17 +25,62 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
### CLI mode
```bash
bun run eval -c configs/browseros-agent-weekly.json
bun run eval -c configs/legacy/browseros-agent-weekly.json
bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
```
Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.
The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
```bash
bun run eval run --config configs/legacy/browseros-agent-weekly.json
bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
```
Config files live in two groups:
```txt
configs/legacy/ # Complete EvalConfig files used by older workflows and the dashboard
configs/suites/ # Suite definitions; model/provider comes from CLI flags or env
```
Suite mode takes model settings from CLI flags first, then env:
```bash
EVAL_VARIANT=kimi-fireworks \
EVAL_AGENT_PROVIDER=openai-compatible \
EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
```
### Suites and variants
A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
This lets us run the same suite against multiple model setups without copying the benchmark config:
```txt
agisdk-daily-10 + kimi-fireworks
agisdk-daily-10 + claude-opus
agisdk-daily-10 + clado-action-000159
```
For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
## Agent types
| Type | Description |
|------|-------------|
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |
### Single agent
@@ -66,14 +113,32 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
},
"executor": {
"provider": "clado-action",
"model": "qwen3-vl-30b-a3b-instruct",
"model": "Qwen3.5-35B-A3B-action-000159-merged",
"apiKey": "",
"baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
"baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
}
}
}
```
### Claude Code
Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
```json
{
"agent": {
"type": "claude-code",
"model": "opus"
}
}
```
```bash
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
```
## Graders
| Name | Description |
@@ -96,6 +161,21 @@ The `apiKey` field supports two formats:
- **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
- **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)
### Environment variables
| Variable | Used for |
|----------|----------|
| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
### Supported providers
| Provider | `provider` value | Requires `baseUrl` |
@@ -110,6 +190,22 @@ The `apiKey` field supports two formats:
| Ollama | `ollama` | No |
| Clado Action (executor only) | `clado-action` | Yes |
### R2 publishing
`suite --config ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to the viewer-compatible R2 layout:
```bash
export EVAL_R2_ACCOUNT_ID=...
export EVAL_R2_ACCESS_KEY_ID=...
export EVAL_R2_SECRET_ACCESS_KEY=...
export EVAL_R2_BUCKET=browseros-eval
export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
```
`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
### BrowserOS infrastructure
```json
@@ -119,7 +215,7 @@ The `apiKey` field supports two formats:
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": true
"headless": false
}
```
@@ -137,10 +233,12 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP
| File | Tasks | Description |
|------|-------|-------------|
| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
| `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
| `mind2web.jsonl` | 300 | Online-Mind2Web |
| `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
| `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
| `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
| `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
@@ -167,14 +265,47 @@ results/
browseros-agent-weekly/
2026-04-29-1430/
Amazon--0/
attempt.json # Stable attempt summary for viewer/reporting
metadata.json # Task result, timing, grader scores
grades.json # Compact grader results
messages.jsonl # Full message log
grader-artifacts/ # Grader-specific inputs/outputs/stderr
screenshots/
001.png # Step-by-step screenshots
002.png
summary.json # Aggregate pass rates
```
R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
### R2 viewer manifest
`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
```json
{
"schemaVersion": 2,
"runId": "agisdk-real-smoke-2026-04-30-0000",
"tasks": [
{
"queryId": "agisdk-dashdish-10",
"paths": {
"metadata": "tasks/agisdk-dashdish-10/metadata.json",
"messages": "tasks/agisdk-dashdish-10/messages.jsonl",
"grades": "tasks/agisdk-dashdish-10/grades.json",
"trace": "tasks/agisdk-dashdish-10/trace.jsonl",
"screenshots": "tasks/agisdk-dashdish-10/screenshots",
"graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
}
}
]
}
```
The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
## Troubleshooting
**BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.

View File

@@ -7,8 +7,8 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/agisdk-real.jsonl",
"num_workers": 10,
"dataset": "../../data/agisdk-real-smoke.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",

View File

@@ -0,0 +1,26 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "accounts/fireworks/models/kimi-k2p5",
"apiKey": "FIREWORKS_API_KEY",
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 4,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
@@ -21,6 +21,6 @@
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["performance_grader"],
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -14,7 +14,7 @@
"baseUrl": "https://api.fireworks.ai/inference/v1"
}
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/webbench-2of4-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {

View File

@@ -9,12 +9,12 @@
},
"executor": {
"provider": "clado-action",
"model": "qwen3-vl-30b-a3b-instruct",
"model": "Qwen3.5-35B-A3B-action-000159-merged",
"apiKey": "",
"baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
"baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
}
},
"dataset": "../data/webbench-2of4-50.jsonl",
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
@@ -28,6 +28,6 @@
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["performance_grader"],
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -0,0 +1,22 @@
{
"agent": {
"type": "claude-code",
"model": "opus"
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../data/webarena-infinity-hard-50.jsonl",
"dataset": "../../data/webarena-infinity-hard-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {

View File

@@ -5,7 +5,7 @@
"model": "openai/gpt-4.1",
"apiKey": "OPENROUTER_API_KEY"
},
"dataset": "../data/mind2web.jsonl",
"dataset": "../../data/mind2web.jsonl",
"num_workers": 5,
"restart_server_per_task": true,
"browseros": {

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../data/webvoyager.jsonl",
"dataset": "../../data/webvoyager.jsonl",
"num_workers": 3,
"restart_server_per_task": true,
"browseros": {

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-daily-10",
"dataset": "../../data/agisdk-daily-10.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-real-smoke",
"dataset": "../../data/agisdk-real-smoke.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,22 @@
{
"id": "agisdk-real",
"dataset": "../../data/agisdk-real.jsonl",
"agent": {
"type": "single"
},
"graders": ["agisdk_state_diff"],
"workers": 1,
"restartBrowserPerTask": true,
"timeoutMs": 1800000,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
}
}

View File

@@ -0,0 +1,10 @@
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}

View File

@@ -0,0 +1 @@
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}

View File

@@ -5,6 +5,7 @@
"type": "module",
"scripts": {
"eval": "bun --env-file=.env.development run src/index.ts",
"test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
"typecheck": "tsc --noEmit"
},
"dependencies": {

View File

@@ -1,34 +1,73 @@
/**
* Test script for Clado API endpoints (grounding + action models)
* Smoke-test for the Clado BrowserOS Action endpoint.
*
* Health-checks the model, then runs a generate call and prints every
* field the new contract documents (action, coordinates, text, key,
* direction, scroll/drag fields, wait, end+final_answer, thinking,
* parse_error, raw_response).
*
* Usage:
* bun apps/eval/scripts/test-clado-api.ts [screenshot-path]
*
* If no screenshot provided, captures one from a running BrowserOS server.
* If no screenshot path is given, captures one over MCP from a
* running BrowserOS server (default http://127.0.0.1:9110, override
* with BROWSEROS_URL).
*
* Cold start can take ~5 minutes; the script waits up to 6.
*/
import { readFile } from 'node:fs/promises'
import { resolve } from 'node:path'
const ACTION_URL =
'https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run'
'https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run'
const ACTION_HEALTH_URL =
'https://clado-ai--clado-browseros-action-actionmodel-health.modal.run'
const GROUNDING_URL =
'https://clado-ai--clado-browseros-grounding-groundingmodel-generate.modal.run'
const GROUNDING_HEALTH_URL =
'https://clado-ai--clado-browseros-grounding-groundingmodel-health.modal.run'
'https://clado-ai--clado-browseros-action-000159-merged-actionmod-5e5033.modal.run'
async function checkHealth(name: string, url: string): Promise<boolean> {
console.log(`\n--- ${name} health check ---`)
console.log(` URL: ${url}`)
const COLD_START_BUDGET_MS = 360_000 // 6 min — Clado cold start is ~5 min
const COLD_START_WARN_MS = 30_000
interface CladoResponse {
action?: string | null
thinking?: string | null
raw_response?: string
parse_error?: string | null
inference_time_seconds?: number
x?: number
y?: number
text?: string
key?: string
direction?: string
amount?: number
startX?: number
startY?: number
endX?: number
endY?: number
time?: number
final_answer?: string | null
}
async function checkHealth(): Promise<boolean> {
console.log(`\n--- Action model health ---`)
console.log(` URL: ${ACTION_HEALTH_URL}`)
console.log(
` Note: cold start can take ~5 min; waiting up to ${COLD_START_BUDGET_MS / 1000}s.`,
)
const start = performance.now()
const warn = setTimeout(() => {
console.log(
` ...still waiting (${COLD_START_WARN_MS / 1000}s in) — model is likely cold-starting on Modal.`,
)
}, COLD_START_WARN_MS)
try {
const resp = await fetch(url, { signal: AbortSignal.timeout(30_000) })
const resp = await fetch(ACTION_HEALTH_URL, {
signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
})
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
const body = await resp.text()
console.log(` Status: ${resp.status} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 200)}`)
console.log(` Body: ${body.slice(0, 400)}`)
return resp.ok
} catch (err) {
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
@@ -36,63 +75,34 @@ async function checkHealth(name: string, url: string): Promise<boolean> {
` FAILED (${elapsed}s): ${err instanceof Error ? err.message : err}`,
)
return false
} finally {
clearTimeout(warn)
}
}
async function testGenerate(
name: string,
url: string,
async function generate(
label: string,
payload: Record<string, unknown>,
): Promise<Record<string, unknown> | null> {
console.log(`\n--- ${name} generate ---`)
console.log(` URL: ${url}`)
): Promise<CladoResponse | null> {
console.log(`\n--- ${label} ---`)
console.log(` URL: ${ACTION_URL}`)
console.log(` Instruction: ${payload.instruction}`)
console.log(
` Image size: ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
` Image size: ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
)
if (payload.history) console.log(` History: ${payload.history}`)
if (payload.history && payload.history !== 'None') {
console.log(` History: ${payload.history}`)
}
const start = performance.now()
let resp: Response
try {
const resp = await fetch(url, {
resp = await fetch(ACTION_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
signal: AbortSignal.timeout(120_000),
signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
})
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
if (!resp.ok) {
const body = await resp.text()
console.log(` FAILED: HTTP ${resp.status} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 400)}`)
return null
}
const result = (await resp.json()) as Record<string, unknown>
console.log(` Status: ${resp.status} (${elapsed}s)`)
console.log(` Action: ${result.action}`)
if (result.x !== null && result.x !== undefined)
console.log(` Coordinates: (${result.x}, ${result.y})`)
if (result.text)
console.log(` Text: ${(result.text as string).slice(0, 100)}`)
if (result.key) console.log(` Key: ${result.key}`)
if (result.inference_time_seconds)
console.log(` Inference: ${result.inference_time_seconds}s`)
// Show thinking if present
const raw = result.raw_response as string | undefined
if (raw) {
const thinkMatch = raw.match(/<thinking>([\s\S]*?)<\/thinking>/)
if (thinkMatch) {
const thinking = thinkMatch[1].trim()
console.log(
` Thinking: ${thinking.slice(0, 200)}${thinking.length > 200 ? '...' : ''}`,
)
}
}
return result
} catch (err) {
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
console.log(
@@ -100,6 +110,50 @@ async function testGenerate(
)
return null
}
const elapsed = ((performance.now() - start) / 1000).toFixed(2)
if (!resp.ok) {
const body = await resp.text()
console.log(` HTTP ${resp.status} ${resp.statusText} (${elapsed}s)`)
console.log(` Body: ${body.slice(0, 400)}`)
return null
}
const result = (await resp.json()) as CladoResponse
console.log(` HTTP ${resp.status} (${elapsed}s)`)
console.log(` action: ${result.action ?? 'null'}`)
if (result.parse_error) {
console.log(` parse_error: ${result.parse_error}`)
}
if (result.thinking) {
const trimmed = result.thinking.replace(/\s+/g, ' ').trim()
console.log(
` thinking: ${trimmed.slice(0, 240)}${trimmed.length > 240 ? '…' : ''}`,
)
}
if (typeof result.x === 'number' || typeof result.y === 'number') {
console.log(` x, y: ${result.x}, ${result.y}`)
}
if (typeof result.text === 'string')
console.log(` text: ${result.text.slice(0, 120)}`)
if (typeof result.key === 'string')
console.log(` key: ${result.key}`)
if (typeof result.direction === 'string')
console.log(` direction: ${result.direction}`)
if (typeof result.amount === 'number')
console.log(` amount: ${result.amount}`)
if (typeof result.startX === 'number' || typeof result.endX === 'number') {
console.log(
` drag: (${result.startX}, ${result.startY}) → (${result.endX}, ${result.endY})`,
)
}
if (typeof result.time === 'number')
console.log(` time: ${result.time}s`)
if (result.final_answer)
console.log(` final_answer: ${result.final_answer.slice(0, 240)}`)
if (typeof result.inference_time_seconds === 'number')
console.log(` inference_time_seconds: ${result.inference_time_seconds}`)
return result
}
async function loadScreenshot(path?: string): Promise<string> {
@@ -110,10 +164,9 @@ async function loadScreenshot(path?: string): Promise<string> {
return data.toString('base64')
}
// Try to capture from a running BrowserOS server
const serverUrl = process.env.BROWSEROS_URL || 'http://127.0.0.1:9110'
console.log(
`No screenshot path provided. Trying to capture from ${serverUrl}...`,
`No screenshot path provided. Capturing from ${serverUrl} via MCP...`,
)
const { Client } = await import('@modelcontextprotocol/sdk/client/index.js')
@@ -134,82 +187,101 @@ async function loadScreenshot(path?: string): Promise<string> {
arguments: { format: 'png', page: 1 },
})) as { content: Array<{ type: string; data?: string }> }
const imageContent = result.content?.find((c) => c.type === 'image')
if (!imageContent?.data)
throw new Error('No image data in screenshot response')
const image = result.content?.find((c) => c.type === 'image')
if (!image?.data)
throw new Error('No image data in take_screenshot response')
console.log(
`Captured screenshot (${(imageContent.data.length / 1024).toFixed(0)} KB base64)`,
`Captured screenshot (${(image.data.length / 1024).toFixed(0)} KB base64)`,
)
return imageContent.data
return image.data
} finally {
try {
await transport.close()
} catch {}
} catch {
/* ignore */
}
}
}
function summarize(history: CladoResponse[]): string {
if (history.length === 0) return 'None'
return history
.map((h) => {
switch (h.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${h.action}(${h.x}, ${h.y})`
case 'type':
return `type(${JSON.stringify(h.text ?? '')})`
case 'press_key':
return `press_key(${JSON.stringify(h.key ?? '')})`
case 'scroll':
return `scroll(${h.direction ?? 'down'})`
case 'drag':
return `drag(${h.startX},${h.startY} -> ${h.endX},${h.endY})`
case 'wait':
return `wait(${h.time ?? 1}s)`
case 'end':
return 'end()'
default:
return h.action ?? 'invalid'
}
})
.join(' -> ')
}
async function main() {
const screenshotPath = process.argv[2]
console.log('=== Clado action endpoint smoke test ===')
console.log('=== Clado API Test ===\n')
// Health checks (parallel)
const [actionHealthy, groundingHealthy] = await Promise.all([
checkHealth('Action Model', ACTION_HEALTH_URL),
checkHealth('Grounding Model', GROUNDING_HEALTH_URL),
])
if (!actionHealthy && !groundingHealthy) {
console.log('\nBoth endpoints are down. Exiting.')
const healthy = await checkHealth()
if (!healthy) {
console.log('\nHealth check failed. Exiting.')
process.exit(1)
}
// Load screenshot
let imageBase64: string
try {
imageBase64 = await loadScreenshot(screenshotPath)
imageBase64 = await loadScreenshot(process.argv[2])
} catch (err) {
console.log(
`\nFailed to load screenshot: ${err instanceof Error ? err.message : err}`,
)
console.log(
'Provide a screenshot path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
'Pass a path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
)
process.exit(1)
}
const instruction = 'Click on the search button or search bar'
const history: CladoResponse[] = []
// Test grounding model
if (groundingHealthy) {
await testGenerate('Grounding Model', GROUNDING_URL, {
instruction,
// Step 1: open task — let the model decide what to do.
const step1 = await generate('Step 1: cold task', {
instruction: 'Find the search bar and click it',
image_base64: imageBase64,
history: 'None',
})
if (step1?.action) history.push(step1)
// Step 2: continuation with history, asks for typing.
if (step1?.action) {
const step2 = await generate('Step 2: with history', {
instruction: 'Type "hello world" into the search bar',
image_base64: imageBase64,
history: summarize(history),
})
} else {
console.log('\nSkipping grounding model (unhealthy)')
if (step2?.action) history.push(step2)
}
// Test action model (no history)
if (actionHealthy) {
const result = await testGenerate('Action Model (step 1)', ACTION_URL, {
instruction,
image_base64: imageBase64,
history: 'None',
})
// Test action model with history (simulate multi-turn)
if (result && result.action === 'click') {
await testGenerate('Action Model (step 2, with history)', ACTION_URL, {
instruction: 'Type "hello world" in the search bar',
image_base64: imageBase64,
history: `click(${result.x}, ${result.y})`,
})
}
} else {
console.log('\nSkipping action model (unhealthy)')
}
// Step 3: ask for end with a final answer to exercise that field.
await generate('Step 3: ask for end+final_answer', {
instruction:
'You have completed the task. Reply with end() and final_answer="done".',
image_base64: imageBase64,
history: summarize(history),
})
console.log('\n=== Done ===')
}

View File

@@ -1,349 +1,43 @@
#!/usr/bin/env bun
/**
* Upload eval runs to R2.
*
* Two modes:
* bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
* → uploads that specific run
*
* bun scripts/upload-run.ts results/browseros-agent-weekly
* → finds all timestamped subfolders, uploads any not yet in R2
*
* Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
* EVAL_R2_BUCKET (default: browseros-eval)
* EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
*/
import { readdir, readFile, stat } from 'node:fs/promises'
import { basename, dirname, extname, join } from 'node:path'
import {
GetObjectCommand,
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3'
loadR2ConfigFromEnv,
R2Publisher,
} from '../src/publishing/r2-publisher'
const CONCURRENCY = 20
const CONTENT_TYPES: Record<string, string> = {
'.json': 'application/json',
'.jsonl': 'application/x-ndjson',
'.png': 'image/png',
}
interface R2Config {
accountId: string
accessKeyId: string
secretAccessKey: string
bucket: string
cdnBaseUrl: string
}
function loadConfig(): R2Config {
const accountId = process.env.EVAL_R2_ACCOUNT_ID
const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
if (!accountId || !accessKeyId || !secretAccessKey) {
console.error(
'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
)
process.exit(1)
}
return {
accountId,
accessKeyId,
secretAccessKey,
bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
cdnBaseUrl: (
process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
).replace(/\/+$/, ''),
}
}
function createClient(config: R2Config): S3Client {
return new S3Client({
region: 'auto',
endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
credentials: {
accessKeyId: config.accessKeyId,
secretAccessKey: config.secretAccessKey,
},
})
}
async function upload(
client: S3Client,
bucket: string,
key: string,
body: Buffer,
contentType: string,
) {
await client.send(
new PutObjectCommand({
Bucket: bucket,
Key: key,
Body: body,
ContentType: contentType,
}),
)
}
async function collectFiles(dir: string): Promise<string[]> {
const files: string[] = []
const entries = await readdir(dir, { withFileTypes: true })
for (const entry of entries) {
const full = join(dir, entry.name)
if (entry.isDirectory()) {
files.push(...(await collectFiles(full)))
} else {
files.push(full)
}
}
return files
}
async function runPool<T>(
items: T[],
concurrency: number,
fn: (item: T) => Promise<void>,
) {
let i = 0
const workers = Array.from({ length: concurrency }, async () => {
while (i < items.length) {
const idx = i++
await fn(items[idx])
}
})
await Promise.all(workers)
}
// Check if a run has already been uploaded to R2
async function isUploaded(
client: S3Client,
bucket: string,
runId: string,
): Promise<boolean> {
try {
await client.send(
new GetObjectCommand({
Bucket: bucket,
Key: `runs/${runId}/manifest.json`,
}),
)
return true
} catch {
return false
}
}
// Detect if a directory is a run dir (has task subdirs with metadata.json)
// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
async function isRunDir(dir: string): Promise<boolean> {
const entries = await readdir(dir, { withFileTypes: true })
const subdirs = entries.filter((e) => e.isDirectory())
for (const subdir of subdirs) {
const metaPath = join(dir, subdir.name, 'metadata.json')
const metaStat = await stat(metaPath).catch(() => null)
if (metaStat?.isFile()) return true
}
return false
}
async function uploadSingleRun(
runDir: string,
runId: string,
r2Config: R2Config,
client: S3Client,
): Promise<void> {
const taskDirs = await readdir(runDir, { withFileTypes: true })
const taskEntries = taskDirs.filter((d) => d.isDirectory())
if (taskEntries.length === 0) {
console.warn(` No task subdirectories in ${runId}, skipping`)
return
}
const manifestTasks: Record<string, unknown>[] = []
const jobs: { key: string; filePath: string; contentType: string }[] = []
// Extract agent config from first task
let agentConfig: Record<string, unknown> | undefined
let dataset: string | undefined
for (const taskDir of taskEntries) {
const taskId = taskDir.name
const taskPath = join(runDir, taskId)
const metaPath = join(taskPath, 'metadata.json')
let meta: Record<string, unknown> = {}
try {
meta = JSON.parse(await readFile(metaPath, 'utf-8'))
} catch {
continue
}
if (!agentConfig && meta.agent_config)
agentConfig = meta.agent_config as Record<string, unknown>
if (!dataset && meta.dataset) dataset = meta.dataset as string
const files = await collectFiles(taskPath)
let screenshotCount = 0
for (const file of files) {
const relative = file.slice(taskPath.length + 1)
const ext = extname(file)
if (relative.startsWith('screenshots/') && ext === '.png')
screenshotCount++
jobs.push({
key: `runs/${runId}/${taskId}/${relative}`,
filePath: file,
contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
})
}
manifestTasks.push({
queryId: meta.query_id || taskId,
query: meta.query || '',
startUrl: meta.start_url || '',
status:
meta.termination_reason === 'completed'
? 'completed'
: meta.termination_reason || 'unknown',
durationMs: meta.total_duration_ms || 0,
screenshotCount: (meta.screenshot_count as number) || screenshotCount,
graderResults: meta.grader_results || {},
})
}
if (manifestTasks.length === 0) {
console.warn(` No completed tasks in ${runId}, skipping`)
return
}
console.log(
` Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
)
let uploaded = 0
await runPool(jobs, CONCURRENCY, async (job) => {
const body = await readFile(job.filePath)
await upload(client, r2Config.bucket, job.key, body, job.contentType)
uploaded++
if (uploaded % 50 === 0 || uploaded === jobs.length) {
console.log(` ${uploaded}/${jobs.length}`)
}
})
// Read summary.json if it exists
let summaryData: Record<string, unknown> | undefined
try {
summaryData = JSON.parse(
await readFile(join(runDir, 'summary.json'), 'utf-8'),
)
} catch {}
// Upload manifest
const manifest = {
runId,
uploadedAt: new Date().toISOString(),
agentConfig,
dataset,
summary: summaryData
? {
passRate: summaryData.passRate,
avgDurationMs: summaryData.avgDurationMs,
}
: undefined,
tasks: manifestTasks,
}
const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
await upload(
client,
r2Config.bucket,
`runs/${runId}/manifest.json`,
manifestBody,
'application/json',
)
// Upload viewer.html to bucket root
const viewerPath = join(
import.meta.dir,
'..',
'src',
'dashboard',
'viewer.html',
)
const viewerBody = await readFile(viewerPath)
await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
console.log(` Uploaded ${uploaded + 2} files`)
console.log(` ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
}
async function main() {
async function main(): Promise<void> {
const inputDir = process.argv[2]
if (!inputDir) {
console.error(
throw new Error(
'Usage:\n' +
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730 (specific run)\n' +
' bun scripts/upload-run.ts results/config-name (all un-uploaded runs)',
)
process.exit(1)
}
const dirStat = await stat(inputDir).catch(() => null)
if (!dirStat?.isDirectory()) {
console.error(`Not a directory: ${inputDir}`)
process.exit(1)
}
const r2Config = loadConfig()
const client = createClient(r2Config)
if (await isRunDir(inputDir)) {
// Single run: results/config-name/2026-03-21-1730
const timestamp = basename(inputDir)
const configName = basename(dirname(inputDir))
const runId = `${configName}-${timestamp}`
console.log(`Uploading run: ${runId}`)
await uploadSingleRun(inputDir, runId, r2Config, client)
} else {
// Config dir: results/config-name/ — upload all un-uploaded runs
const configName = basename(inputDir)
const entries = await readdir(inputDir, { withFileTypes: true })
const runDirs = entries
.filter((e) => e.isDirectory())
.map((e) => e.name)
.sort()
if (runDirs.length === 0) {
console.error('No run subdirectories found')
process.exit(1)
}
console.log(
`Found ${runDirs.length} runs for config "${configName}", checking R2...`,
)
let uploadedCount = 0
for (const dir of runDirs) {
const runId = `${configName}-${dir}`
const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
if (alreadyUploaded) {
console.log(` ${runId}: already uploaded, skipping`)
continue
}
console.log(` ${runId}: uploading...`)
await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
uploadedCount++
}
console.log(
`\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
' bun scripts/upload-run.ts results/config-name',
)
}
const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
const result = await publisher.publishPath(inputDir)
for (const run of result.uploadedRuns) {
console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
console.log(run.viewerUrl)
}
for (const runId of result.skippedRuns) {
console.log(`${runId}: already uploaded, skipping`)
}
console.log(
`Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
)
}
main()
main().catch((error) => {
console.error(error instanceof Error ? error.message : String(error))
process.exit(1)
})

View File

@@ -24,45 +24,11 @@ import {
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3'
interface ManifestTask {
queryId: string
query: string
status: string
durationMs: number
screenshotCount: number
graderResults: Record<string, { pass: boolean; score: number }>
}
interface Manifest {
runId: string
uploadedAt: string
agentConfig?: { type?: string; model?: string }
dataset?: string
summary?: { passRate?: number; avgDurationMs?: number }
tasks: ManifestTask[]
}
interface RunSummary {
runId: string
configName: string
date: string
avgScore: number
total: number
completed: number
failed: number
timeout: number
avgDurationMs: number
model: string
dataset: string
agentType: string
}
const PASS_FAIL_GRADER_ORDER = [
'agisdk_state_diff',
'infinity_state',
'performance_grader',
]
import {
buildRunSummaries,
type ReportManifest,
type RunSummary,
} from '../src/reporting/run-summary'
function requireEnv(name: string): string {
const value = process.env[name]
@@ -87,7 +53,7 @@ const client = new S3Client({
// Step 1: List all manifest.json files in runs/
console.log('Scanning R2 for eval runs...')
const manifests: Manifest[] = []
const manifests: ReportManifest[] = []
let continuationToken: string | undefined
do {
@@ -127,64 +93,9 @@ if (manifests.length === 0) {
}
// Step 2: Build run summaries
const runs: RunSummary[] = manifests
.map((m) => {
const total = m.tasks.length
const completed = m.tasks.filter((t) => t.status === 'completed').length
const failed = m.tasks.filter((t) => t.status === 'failed').length
const timeout = m.tasks.filter((t) => t.status === 'timeout').length
let scoredCount = 0
let scoreSum = 0
for (const task of m.tasks) {
if (!task.graderResults) continue
for (const name of PASS_FAIL_GRADER_ORDER) {
if (task.graderResults[name]) {
scoredCount++
scoreSum += task.graderResults[name].score ?? 0
break
}
}
}
const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
const durations = m.tasks
.filter((t) => t.durationMs > 0)
.map((t) => t.durationMs)
const avgDurationMs =
durations.length > 0
? durations.reduce((a, b) => a + b, 0) / durations.length
: 0
const date = m.uploadedAt
? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
: m.runId.slice(0, 15)
const model = m.agentConfig?.model || 'unknown'
const dataset = m.dataset || m.runId
const agentType = m.agentConfig?.type || 'unknown'
const configName = extractConfigName(m.runId)
return {
runId: m.runId,
configName,
date,
avgScore,
total,
completed,
failed,
timeout,
avgDurationMs,
model,
dataset,
agentType,
}
})
.sort((a, b) => a.date.localeCompare(b.date))
const runs: RunSummary[] = buildRunSummaries(manifests)
// Step 3: Identify unique config groups
// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
// Extract config name by stripping the date-time suffix pattern
function escHtml(s: string): string {
return s
.replace(/&/g, '&amp;')
@@ -193,12 +104,6 @@ function escHtml(s: string): string {
.replace(/"/g, '&quot;')
}
function extractConfigName(runId: string): string {
// "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
// "ci-weekly" → "ci-weekly" (no timestamp, old format)
return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}
const configGroups = [...new Set(runs.map((r) => r.configName))]
const defaultConfig = configGroups.includes('ci-weekly')
? 'ci-weekly'

View File

@@ -0,0 +1,238 @@
import { writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import { DEFAULT_TIMEOUT_MS } from '../../constants'
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
import { withEvalTimeout } from '../../utils/with-eval-timeout'
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
import {
type ClaudeCodeProcessRunner,
createClaudeCodeProcessRunner,
} from './process-runner'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from './stream-parser'
export interface ClaudeCodeEvaluatorDeps {
processRunner?: ClaudeCodeProcessRunner
}
export class ClaudeCodeEvaluator implements AgentEvaluator {
private processRunner: ClaudeCodeProcessRunner
constructor(
private ctx: AgentContext,
deps: ClaudeCodeEvaluatorDeps = {},
) {
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
}
async execute(): Promise<AgentResult> {
const { config, task, capture, taskOutputDir } = this.ctx
const startTime = Date.now()
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
await capture.messageLogger.logUser(task.query)
if (config.agent.type !== 'claude-code') {
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
}
const agentConfig = config.agent
const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
await writeFile(
mcpConfigPath,
JSON.stringify(
buildClaudeCodeMcpConfig(config.browseros.server_url),
null,
2,
),
)
const parser = new ClaudeCodeStreamParser()
const toolNamesById = new Map<string, string>()
const prompt = buildClaudeCodePrompt(task.query)
const args = buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config: agentConfig,
})
const { terminationReason } = await withEvalTimeout(
timeoutMs,
capture,
async (signal) => {
const runResult = await this.processRunner.run({
executable: agentConfig.claudePath,
args,
cwd: taskOutputDir,
signal,
onStdoutLine: async (line) => {
const events = parser.pushLine(line)
for (const event of events) {
await this.handleStreamEvent(event, toolNamesById)
}
},
})
if (runResult.exitCode !== 0) {
const message =
runResult.stderr.trim() ||
`Claude Code exited with status ${runResult.exitCode}`
capture.addError('agent_execution', message, {
exitCode: runResult.exitCode,
})
if (!parser.getLastText()) {
throw new Error(message)
}
}
for (const error of runResult.streamErrors ?? []) {
capture.addWarning(
'message_logging',
`Claude Code stream event processing failed: ${error}`,
)
}
return runResult
},
)
const endTime = Date.now()
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
const metadata = {
query_id: task.query_id,
dataset: task.dataset,
query: task.query,
started_at: new Date(startTime).toISOString(),
completed_at: new Date(endTime).toISOString(),
total_duration_ms: endTime - startTime,
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
termination_reason: terminationReason,
final_answer: finalAnswer,
errors: capture.getErrors(),
warnings: capture.getWarnings(),
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
agent_config: {
type: 'claude-code' as const,
model: agentConfig.model,
},
grader_results: {},
}
await capture.trajectorySaver.saveMetadata(metadata)
return {
metadata,
messages: capture.getMessages(),
finalAnswer,
}
}
private async handleStreamEvent(
event: UIMessageStreamEvent,
toolNamesById: Map<string, string>,
): Promise<void> {
const { capture, task } = this.ctx
let screenshot: number | undefined
if (event.type === 'tool-input-available') {
toolNamesById.set(event.toolCallId, event.toolName)
if (isPageInput(event.input)) {
capture.setActivePageId(event.input.page)
}
}
if (
event.type === 'tool-output-available' ||
event.type === 'tool-output-error'
) {
const toolName = toolNamesById.get(event.toolCallId)
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
screenshot = await this.captureScreenshot()
}
}
await capture.messageLogger.logStreamEvent(event, screenshot)
capture.emitEvent(task.query_id, {
...event,
...(screenshot !== undefined && { screenshot }),
})
}
private async captureScreenshot(): Promise<number | undefined> {
const { capture, task } = this.ctx
try {
const screenshot = await capture.screenshot.capture(
capture.getActivePageId(),
)
capture.emitEvent(task.query_id, {
type: 'screenshot-captured',
screenshot,
})
return screenshot
} catch {
return undefined
}
}
}
function isPageInput(input: unknown): input is { page: number } {
return (
typeof input === 'object' &&
input !== null &&
'page' in input &&
typeof input.page === 'number'
)
}
function buildClaudeCodePrompt(taskQuery: string): string {
return [
'You are running inside BrowserOS eval.',
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
'When the task is complete, respond with the final answer only.',
'If blocked, explain the blocker clearly.',
'',
`Task: ${taskQuery}`,
].join('\n')
}
function buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config,
}: {
prompt: string
mcpConfigPath: string
config: ClaudeCodeAgentConfig
}): string[] {
const args = [
'-p',
prompt,
'--mcp-config',
mcpConfigPath,
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
]
if (config.model) args.push('--model', config.model)
args.push(...config.extraArgs)
return args
}
function buildClaudeCodeMcpConfig(serverUrl: string) {
const trimmed = serverUrl.replace(/\/$/, '')
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
return {
mcpServers: {
browseros: {
type: 'http',
url,
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
},
},
}
}

View File

@@ -0,0 +1,114 @@
export interface ClaudeCodeRunOptions {
executable: string
args: string[]
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface ClaudeCodeRunResult {
exitCode: number
stderr: string
streamErrors?: string[]
}
export interface ClaudeCodeProcessRunner {
run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
}
export interface SpawnOptions {
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface CreateClaudeCodeProcessRunnerDeps {
spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
}
export function createClaudeCodeProcessRunner(
deps: CreateClaudeCodeProcessRunnerDeps = {},
): ClaudeCodeProcessRunner {
const spawn = deps.spawn ?? spawnClaudeCode
return {
run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
spawn([executable, ...args], { cwd, signal, onStdoutLine }),
}
}
async function spawnClaudeCode(
cmd: string[],
options: SpawnOptions,
): Promise<ClaudeCodeRunResult> {
const proc = Bun.spawn({
cmd,
cwd: options.cwd,
stdin: 'ignore',
stdout: 'pipe',
stderr: 'pipe',
})
const abort = () => {
try {
proc.kill('SIGTERM')
} catch {
// Process may already have exited.
}
}
options.signal?.addEventListener('abort', abort, { once: true })
try {
const streamErrors: string[] = []
const stdoutPromise = readLines(
proc.stdout,
options.onStdoutLine,
streamErrors,
)
const stderrPromise = new Response(proc.stderr).text()
const exitCode = await proc.exited
await stdoutPromise
const stderr = await stderrPromise
return { exitCode, stderr, streamErrors }
} finally {
options.signal?.removeEventListener('abort', abort)
}
}
async function readLines(
stream: ReadableStream<Uint8Array>,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
const reader = stream.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
await emitLine(line, onLine, streamErrors)
}
}
buffer += decoder.decode()
if (buffer.length > 0) {
await emitLine(buffer, onLine, streamErrors)
}
}
async function emitLine(
line: string,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
try {
await onLine(line)
} catch (error) {
streamErrors.push(error instanceof Error ? error.message : String(error))
}
}

View File

@@ -0,0 +1,142 @@
import { randomUUID } from 'node:crypto'
import type { UIMessageStreamEvent } from '../../types'
type JsonObject = Record<string, unknown>
export class ClaudeCodeStreamParser {
private lastText: string | null = null
private toolCallCount = 0
pushLine(line: string): UIMessageStreamEvent[] {
const trimmed = line.trim()
if (!trimmed) return []
let parsed: unknown
try {
parsed = JSON.parse(trimmed)
} catch {
return []
}
if (!isObject(parsed)) return []
if (parsed.type === 'assistant') {
return this.parseAssistantMessage(parsed)
}
if (parsed.type === 'user') {
return this.parseUserMessage(parsed)
}
if (parsed.type === 'result' && typeof parsed.result === 'string') {
this.lastText = parsed.result
}
return []
}
getLastText(): string | null {
return this.lastText
}
getToolCallCount(): number {
return this.toolCallCount
}
private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (block.type === 'text' && typeof block.text === 'string') {
const id = randomUUID()
this.lastText = block.text
events.push(
{ type: 'text-start', id },
{ type: 'text-delta', id, delta: block.text },
{ type: 'text-end', id },
)
} else if (
block.type === 'tool_use' &&
typeof block.id === 'string' &&
typeof block.name === 'string'
) {
this.toolCallCount++
events.push({
type: 'tool-input-available',
toolCallId: block.id,
toolName: block.name,
input: block.input,
})
}
}
return events
}
private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (
block.type !== 'tool_result' ||
typeof block.tool_use_id !== 'string'
) {
continue
}
if (block.is_error === true) {
events.push({
type: 'tool-output-error',
toolCallId: block.tool_use_id,
errorText: stringifyToolContent(block.content),
})
} else {
events.push({
type: 'tool-output-available',
toolCallId: block.tool_use_id,
output: normalizeToolContent(block.content),
})
}
}
return events
}
}
export function shouldCaptureScreenshotForTool(toolName: string): boolean {
if (!toolName.startsWith('mcp__browseros__')) return false
return !toolName.endsWith('__take_screenshot')
}
function contentBlocks(message: JsonObject): JsonObject[] {
const inner = isObject(message.message) ? message.message : message
return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
}
function isObject(value: unknown): value is JsonObject {
return typeof value === 'object' && value !== null
}
function normalizeToolContent(content: unknown): unknown {
if (!Array.isArray(content)) return content
return content.map((item) => {
if (
isObject(item) &&
item.type === 'text' &&
typeof item.text === 'string'
) {
return item.text
}
return item
})
}
function stringifyToolContent(content: unknown): string {
const normalized = normalizeToolContent(content)
if (typeof normalized === 'string') return normalized
try {
return JSON.stringify(normalized)
} catch {
return String(normalized)
}
}

View File

@@ -1,3 +1,4 @@
import { ClaudeCodeEvaluator } from './claude-code'
import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
import { SingleAgentEvaluator } from './single-agent'
import type { AgentContext, AgentEvaluator } from './types'
@@ -8,6 +9,8 @@ export function createAgent(context: AgentContext): AgentEvaluator {
return new SingleAgentEvaluator(context)
case 'orchestrator-executor':
return new OrchestratorExecutorEvaluator(context)
case 'claude-code':
return new ClaudeCodeEvaluator(context)
}
}

View File

@@ -1,113 +1,67 @@
import { randomUUID } from 'node:crypto'
import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
import { sleep } from '../../../../utils/sleep'
import type {
ExecutorConfig,
ExecutorResult,
} from '../../../orchestrator-executor/types'
import type { ExecutorCallbacks } from '../../executor-backend'
import {
CLADO_REQUEST_TIMEOUT_MS,
MAX_ACTIONS_PER_DELEGATION,
} from '../../constants'
import { McpClient, type McpToolResult } from '../../utils/mcp-client'
import { sleep } from '../../utils/sleep'
import type { ExecutorCallbacks } from './executor'
import type { ExecutorConfig, ExecutorResult } from './types'
extractCladoThinking,
formatCladoHistory,
getCladoActionSignature,
parseCladoActions,
summarizeCladoPrediction,
} from './clado-actions'
import {
normalizeCladoDirection,
normalizeCladoPressKey,
normalizeCladoScrollAmount,
prepareCladoToolArgs,
resolveCladoPoint,
} from './clado-browser-driver'
import { CladoActionClient } from './clado-client'
import {
CLADO_ACTION_PROVIDER,
type CladoAction,
type CladoActionPoint,
type CladoActionResponse,
type CladoViewport,
isCladoActionProvider,
} from './types'
const CLADO_ACTION_PROVIDER = 'clado-action'
const PAGE_SCOPED_TOOLS = new Set<string>([
'take_screenshot',
'evaluate_script',
'click',
'click_at',
'hover',
'hover_at',
'clear',
'fill',
'press_key',
'type_at',
'drag',
'drag_at',
'scroll',
'handle_dialog',
'select_option',
'navigate_page',
'close_page',
'wait_for',
])
interface CladoActionResponse {
action?: string
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
inference_time_seconds?: number
raw_response?: string
}
interface Viewport {
width: number
height: number
}
interface CladoAction {
action: string
x?: number
y?: number
text?: string
key?: string
direction?: string
startX?: number
startY?: number
endX?: number
endY?: number
amount?: number
time?: number
}
type RawActionPayload = Partial<CladoAction>
interface ActionPoint {
x: number
y: number
}
const MAX_CONSECUTIVE_PARSE_FAILURES = 3
function asErrorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error)
}
function clampNormalized(value: number): number {
return Math.min(999, Math.max(0, Math.round(value)))
}
function isCladoProvider(provider: string): boolean {
return provider === CLADO_ACTION_PROVIDER
}
export class CladoActionExecutor {
private readonly mcpClient: McpClient
private readonly cladoClient: CladoActionClient
private readonly pageId: number
private callbacks: ExecutorCallbacks = {}
private stepsUsed = 0
private viewport: Viewport | null = null
private lastPoint: ActionPoint | null = null
private viewport: CladoViewport | null = null
private lastPoint: CladoActionPoint | null = null
private currentUrl = ''
constructor(
private readonly config: ExecutorConfig,
config: ExecutorConfig,
serverUrl: string,
readonly _windowId?: number,
readonly _tabId?: number,
initialPageId?: number,
) {
if (!isCladoProvider(config.provider)) {
if (!isCladoActionProvider(config.provider)) {
throw new Error(
`CladoActionExecutor requires provider="${CLADO_ACTION_PROVIDER}"`,
)
}
this.mcpClient = new McpClient(`${serverUrl}/mcp`)
this.cladoClient = new CladoActionClient({
baseUrl: config.baseUrl,
apiKey: config.apiKey,
})
this.pageId = initialPageId ?? 1
}
@@ -135,6 +89,8 @@ export class CladoActionExecutor {
const actionHistory: CladoAction[] = []
let predictionCalls = 0
const thinkingTrace: string[] = []
let consecutiveParseFailures = 0
let finalAnswer: string | undefined
let status: ExecutorResult['status'] = 'done'
let reason = 'Goal executed.'
@@ -155,7 +111,7 @@ export class CladoActionExecutor {
break
}
const historyForPrediction = this.formatHistory(actionHistory)
const historyForPrediction = formatCladoHistory(actionHistory)
const actionToolCallId = randomUUID()
const predictionInput = {
instruction,
@@ -177,7 +133,7 @@ export class CladoActionExecutor {
signal,
)
predictionCalls++
const thinking = this.extractThinking(prediction.raw_response)
const thinking = extractCladoThinking(prediction.raw_response)
if (thinking) {
const previous = thinkingTrace[thinkingTrace.length - 1]
if (previous !== thinking) {
@@ -207,8 +163,19 @@ export class CladoActionExecutor {
break
}
const predictedActions = this.parseActions(prediction)
const predictedActions = parseCladoActions(prediction)
if (predictedActions.length === 0) {
// Per Clado contract: HTTP 200 with action=null on parse failure.
// Count as an invalid step so the model can self-correct on the
// next call instead of dropping the trajectory.
consecutiveParseFailures++
const parseError =
prediction.parse_error ?? 'no parsable <answer> in raw_response'
actionHistory.push({
action: 'invalid',
text: `parse_error: ${parseError}`,
})
this.stepsUsed++
await this.callbacks.onStepFinish?.({
toolCalls: [
{
@@ -222,16 +189,23 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: [],
parseError,
consecutiveParseFailures,
},
},
],
})
status = 'blocked'
reason = 'Clado action response did not contain a valid action.'
break
if (consecutiveParseFailures >= MAX_CONSECUTIVE_PARSE_FAILURES) {
status = 'blocked'
reason = `Clado returned ${consecutiveParseFailures} consecutive unparseable responses.`
break
}
continue
}
consecutiveParseFailures = 0
let requestedStop = false
const executionNotes: string[] = []
@@ -257,7 +231,7 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: predictedActions,
executed: executionNotes,
},
@@ -272,7 +246,12 @@ export class CladoActionExecutor {
actionHistory.push(predictedAction)
if (predictedAction.action === 'end') {
reason = 'Model requested end() and marked task complete.'
if (predictedAction.final_answer) {
finalAnswer = predictedAction.final_answer
reason = `Model requested end() with final_answer: ${predictedAction.final_answer.slice(0, 240)}`
} else {
reason = 'Model requested end() and marked task complete.'
}
requestedStop = true
break
}
@@ -293,7 +272,7 @@ export class CladoActionExecutor {
toolCallId: actionToolCallId,
toolName: 'clado_action_predict',
output: {
prediction: this.summarizePrediction(prediction),
prediction: summarizeCladoPrediction(prediction),
parsedActions: predictedActions,
executed: executionNotes,
},
@@ -327,6 +306,7 @@ export class CladoActionExecutor {
actions: actionHistory,
url: this.currentUrl,
thinkingTrace,
finalAnswer,
})
return {
@@ -344,121 +324,12 @@ export class CladoActionExecutor {
actionHistory: CladoAction[],
signal?: AbortSignal,
): Promise<CladoActionResponse> {
if (!this.config.baseUrl) {
throw new Error('executor.baseUrl must be set for clado-action provider')
}
const requestController = new AbortController()
const onAbort = () => requestController.abort()
signal?.addEventListener('abort', onAbort, { once: true })
const timeoutHandle = setTimeout(() => {
requestController.abort()
}, CLADO_REQUEST_TIMEOUT_MS)
try {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
}
if (this.config.apiKey) {
headers.Authorization = `Bearer ${this.config.apiKey}`
}
const response = await fetch(this.config.baseUrl, {
method: 'POST',
headers,
body: JSON.stringify({
instruction,
image_base64: imageBase64,
history: this.formatHistory(actionHistory),
}),
signal: requestController.signal,
})
if (!response.ok) {
const body = await response.text()
throw new Error(
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
)
}
return (await response.json()) as CladoActionResponse
} finally {
clearTimeout(timeoutHandle)
signal?.removeEventListener('abort', onAbort)
}
}
private parseActions(prediction: CladoActionResponse): CladoAction[] {
const actionFromField =
typeof prediction.action === 'string' ? prediction.action : null
const rawActions = this.parseActionsFromRawResponse(prediction.raw_response)
const primaryFromRaw = rawActions[0] ?? null
const mergedPrimary = {
...primaryFromRaw,
...prediction,
action: actionFromField ?? primaryFromRaw?.action,
}
const normalized: CladoAction[] = []
const primary = this.normalizeActionPayload(mergedPrimary)
if (primary) normalized.push(primary)
for (const candidate of rawActions.slice(1)) {
const parsed = this.normalizeActionPayload(candidate)
if (!parsed) continue
const prev = normalized[normalized.length - 1]
if (
!prev ||
this.getActionSignature(prev) !== this.getActionSignature(parsed)
) {
normalized.push(parsed)
}
}
return normalized
}
private normalizeActionPayload(
payload: RawActionPayload,
): CladoAction | null {
if (!payload.action || typeof payload.action !== 'string') {
return null
}
return {
action: payload.action,
x: typeof payload.x === 'number' ? payload.x : undefined,
y: typeof payload.y === 'number' ? payload.y : undefined,
text: typeof payload.text === 'string' ? payload.text : undefined,
key: typeof payload.key === 'string' ? payload.key : undefined,
direction:
typeof payload.direction === 'string' ? payload.direction : undefined,
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
time: typeof payload.time === 'number' ? payload.time : undefined,
}
}
private parseActionsFromRawResponse(
rawResponse: string | undefined,
): RawActionPayload[] {
if (!rawResponse) return []
const matches = [
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
]
const parsed: RawActionPayload[] = []
for (const match of matches) {
try {
parsed.push(JSON.parse(match[1]) as RawActionPayload)
} catch {
// ignore malformed answer blocks
}
}
return parsed
return this.cladoClient.requestActionPrediction({
instruction,
imageBase64,
actionHistory,
signal,
})
}
private async executeAction(
@@ -529,14 +400,14 @@ export class CladoActionExecutor {
}
case 'press_key': {
const key = this.normalizePressKey(action.key)
const key = normalizeCladoPressKey(action.key)
await this.runTool('press_key', { key }, signal)
return `Pressed key "${key}".`
}
case 'scroll': {
const direction = this.normalizeDirection(action.direction)
const amountPx = this.normalizeScrollAmount(action.amount)
const direction = normalizeCladoDirection(action.direction)
const amountPx = normalizeCladoScrollAmount(action.amount)
const ticks = Math.max(1, Math.round(amountPx / 120))
await this.runTool('scroll', { direction, amount: ticks }, signal)
@@ -578,7 +449,9 @@ export class CladoActionExecutor {
}
case 'end': {
return 'Model requested end().'
return action.final_answer
? `Model requested end() with final_answer: ${action.final_answer.slice(0, 240)}`
: 'Model requested end().'
}
default: {
@@ -588,9 +461,10 @@ export class CladoActionExecutor {
}
private async captureScreenshotBase64(signal?: AbortSignal): Promise<string> {
// Clado contract is PNG or JPEG; use PNG for lossless input.
const result = await this.runTool(
'take_screenshot',
{ format: 'webp', quality: 80 },
{ format: 'png' },
signal,
)
@@ -604,7 +478,7 @@ export class CladoActionExecutor {
return image.data
}
private async getViewport(signal?: AbortSignal): Promise<Viewport> {
private async getViewport(signal?: AbortSignal): Promise<CladoViewport> {
if (this.viewport) return this.viewport
try {
@@ -635,15 +509,9 @@ export class CladoActionExecutor {
normalizedX: number | undefined,
normalizedY: number | undefined,
signal?: AbortSignal,
): Promise<ActionPoint> {
): Promise<CladoActionPoint> {
const viewport = await this.getViewport(signal)
const nx = clampNormalized(normalizedX ?? 500)
const ny = clampNormalized(normalizedY ?? 500)
return {
x: Math.round((nx / 1000) * viewport.width),
y: Math.round((ny / 1000) * viewport.height),
}
return resolveCladoPoint(viewport, normalizedX, normalizedY)
}
private async getCurrentUrl(signal?: AbortSignal): Promise<string> {
@@ -670,7 +538,7 @@ export class CladoActionExecutor {
throw new Error('aborted')
}
const toolArgs = this.prepareToolArgs(toolName, args)
const toolArgs = prepareCladoToolArgs(toolName, args, this.pageId)
try {
const raw = await this.mcpClient.callTool(toolName, toolArgs)
@@ -689,211 +557,22 @@ export class CladoActionExecutor {
}
}
private prepareToolArgs(
toolName: string,
args: Record<string, unknown>,
): Record<string, unknown> {
const prepared: Record<string, unknown> = { ...args }
if (
toolName === 'evaluate_script' &&
typeof prepared.function === 'string' &&
prepared.expression === undefined
) {
prepared.expression = this.toEvaluateExpression(prepared.function)
delete prepared.function
}
if (
toolName === 'click_at' &&
typeof prepared.dblClick === 'boolean' &&
prepared.clickCount === undefined
) {
prepared.clickCount = prepared.dblClick ? 2 : 1
delete prepared.dblClick
}
// Use fixed page ID for all page-scoped tools (single-page operation)
if (PAGE_SCOPED_TOOLS.has(toolName) && typeof prepared.page !== 'number') {
prepared.page = this.pageId
}
return prepared
}
private toEvaluateExpression(rawFunction: unknown): string {
const source = String(rawFunction).trim()
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
return `(${source})()`
}
if (source.startsWith('function')) {
return `(${source})()`
}
return source
}
private normalizePressKey(key: string | undefined): string {
const raw = (key ?? '').trim()
if (!raw) throw new Error('press_key action missing key field')
const map: Record<string, string> = {
'C-a': 'Control+A',
'C-c': 'Control+C',
'C-v': 'Control+V',
'C-x': 'Control+X',
'C-z': 'Control+Z',
'C-y': 'Control+Y',
'C-s': 'Control+S',
'C-t': 'Control+T',
'C-w': 'Control+W',
'C-h': 'Control+H',
'C-f': 'Control+F',
'C-+': 'Control++',
'C--': 'Control+-',
'C-tab': 'Control+Tab',
'C-S-tab': 'Control+Shift+Tab',
'C-S-n': 'Control+Shift+N',
'C-down': 'Control+ArrowDown',
'M-f4': 'Alt+F4',
}
return map[raw] ?? raw
}
private normalizeDirection(
direction: string | undefined,
): 'up' | 'down' | 'left' | 'right' {
if (
direction === 'up' ||
direction === 'down' ||
direction === 'left' ||
direction === 'right'
) {
return direction
}
return 'down'
}
private normalizeScrollAmount(amount: number | undefined): number {
if (typeof amount !== 'number') return 500
if (amount <= 0) return 100
const clamped = Math.min(amount, 1000)
return Math.max(100, Math.round((clamped / 1000) * 900))
}
private summarizePrediction(
prediction: CladoActionResponse,
): Record<string, unknown> {
const preview =
typeof prediction.raw_response === 'string' &&
prediction.raw_response.length > 0
? prediction.raw_response.slice(0, 240)
: undefined
return {
action: prediction.action,
x: prediction.x,
y: prediction.y,
text: prediction.text,
key: prediction.key,
direction: prediction.direction,
startX: prediction.startX,
startY: prediction.startY,
endX: prediction.endX,
endY: prediction.endY,
amount: prediction.amount,
time: prediction.time,
inference_time_seconds: prediction.inference_time_seconds,
raw_response_preview: preview,
}
}
private extractThinking(rawResponse: string | undefined): string | undefined {
if (!rawResponse) return undefined
const matches = [
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
]
if (matches.length === 0) return undefined
const merged = matches
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
.filter((value) => value.length > 0)
.join(' ')
if (!merged) return undefined
return merged
}
private getActionSignature(action: CladoAction): string {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
case 'type':
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
case 'press_key':
return `${action.action}:${action.key ?? 'key'}`
case 'scroll':
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
case 'drag':
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
case 'wait':
return `${action.action}:${action.time ?? 1}`
case 'end':
return 'end()'
default:
return action.action
}
}
private formatHistory(actions: CladoAction[]): string {
if (actions.length === 0) return 'None'
const parts = actions.map((action) => {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
case 'type': {
const text = (action.text ?? '').replace(/'/g, "\\'")
return `type('${text}')`
}
case 'press_key':
return `press_key('${action.key ?? 'Enter'}')`
case 'scroll':
return `scroll(${action.direction ?? 'down'})`
case 'drag':
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
case 'wait':
return `wait(${Math.round(action.time ?? 1)}s)`
case 'end':
return 'end()'
default:
return action.action
}
})
return parts.join(' -> ')
}
private buildObservation(params: {
status: ExecutorResult['status']
reason: string
actions: CladoAction[]
url: string
thinkingTrace: string[]
finalAnswer?: string
}): string {
const { status, reason, actions, url, thinkingTrace } = params
const { status, reason, actions, url, thinkingTrace, finalAnswer } = params
const actionSummary =
actions.length === 0
? 'No actions were executed.'
: actions
.slice(-5)
.map(
(action, idx) => `${idx + 1}. ${this.getActionSignature(action)}`,
(action, idx) => `${idx + 1}. ${getCladoActionSignature(action)}`,
)
.join('\n')
const thinkingSummary =
@@ -907,6 +586,7 @@ export class CladoActionExecutor {
`Status: ${status}`,
`Reason: ${reason}`,
`URL: ${url || 'unknown'}`,
finalAnswer ? `Final answer: ${finalAnswer}` : '',
'',
'Recent actions:',
actionSummary,

View File

@@ -0,0 +1,191 @@
import type {
CladoAction,
CladoActionResponse,
RawCladoActionPayload,
} from './types'
/** Parses Clado's structured response plus any raw `<answer>` blocks into executable actions. */
export function parseCladoActions(
prediction: CladoActionResponse,
): CladoAction[] {
const actionFromField =
typeof prediction.action === 'string' ? prediction.action : null
const rawActions = parseCladoActionsFromRawResponse(prediction.raw_response)
const primaryFromRaw = rawActions[0] ?? null
const mergedPrimary = {
...primaryFromRaw,
...prediction,
action: actionFromField ?? primaryFromRaw?.action,
}
const normalized: CladoAction[] = []
const primary = normalizeCladoActionPayload(mergedPrimary)
if (primary) normalized.push(primary)
for (const candidate of rawActions.slice(1)) {
const parsed = normalizeCladoActionPayload(candidate)
if (!parsed) continue
const prev = normalized[normalized.length - 1]
if (
!prev ||
getCladoActionSignature(prev) !== getCladoActionSignature(parsed)
) {
normalized.push(parsed)
}
}
return normalized
}
export function normalizeCladoActionPayload(
payload: RawCladoActionPayload,
): CladoAction | null {
if (!payload.action || typeof payload.action !== 'string') {
return null
}
return {
action: payload.action,
x: typeof payload.x === 'number' ? payload.x : undefined,
y: typeof payload.y === 'number' ? payload.y : undefined,
text: typeof payload.text === 'string' ? payload.text : undefined,
key: typeof payload.key === 'string' ? payload.key : undefined,
direction:
typeof payload.direction === 'string' ? payload.direction : undefined,
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
time: typeof payload.time === 'number' ? payload.time : undefined,
final_answer:
typeof payload.final_answer === 'string'
? payload.final_answer
: undefined,
}
}
export function parseCladoActionsFromRawResponse(
rawResponse: string | undefined,
): RawCladoActionPayload[] {
if (!rawResponse) return []
const matches = [
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
]
const parsed: RawCladoActionPayload[] = []
for (const match of matches) {
try {
parsed.push(JSON.parse(match[1]) as RawCladoActionPayload)
} catch {
// Ignore malformed answer blocks so one bad block does not drop the whole prediction.
}
}
return parsed
}
export function extractCladoThinking(
rawResponse: string | undefined,
): string | undefined {
if (!rawResponse) return undefined
const matches = [
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
]
if (matches.length === 0) return undefined
const merged = matches
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
.filter((value) => value.length > 0)
.join(' ')
if (!merged) return undefined
return merged
}
export function summarizeCladoPrediction(
prediction: CladoActionResponse,
): Record<string, unknown> {
const preview =
typeof prediction.raw_response === 'string' &&
prediction.raw_response.length > 0
? prediction.raw_response.slice(0, 240)
: undefined
return {
action: prediction.action,
x: prediction.x,
y: prediction.y,
text: prediction.text,
key: prediction.key,
direction: prediction.direction,
startX: prediction.startX,
startY: prediction.startY,
endX: prediction.endX,
endY: prediction.endY,
amount: prediction.amount,
time: prediction.time,
inference_time_seconds: prediction.inference_time_seconds,
raw_response_preview: preview,
}
}
export function getCladoActionSignature(action: CladoAction): string {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
case 'type':
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
case 'press_key':
return `${action.action}:${action.key ?? 'key'}`
case 'scroll':
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
case 'drag':
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
case 'wait':
return `${action.action}:${action.time ?? 1}`
case 'end':
return action.final_answer
? `end(${action.final_answer.slice(0, 32)})`
: 'end()'
case 'invalid':
return `invalid(${(action.text ?? '').slice(0, 40)})`
default:
return action.action
}
}
export function formatCladoHistory(actions: CladoAction[]): string {
if (actions.length === 0) return 'None'
const parts = actions.map((action) => {
switch (action.action) {
case 'click':
case 'double_click':
case 'right_click':
case 'hover':
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
case 'type': {
const text = (action.text ?? '').replace(/'/g, "\\'")
return `type('${text}')`
}
case 'press_key':
return `press_key('${action.key ?? 'Enter'}')`
case 'scroll':
return `scroll(${action.direction ?? 'down'})`
case 'drag':
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
case 'wait':
return `wait(${Math.round(action.time ?? 1)}s)`
case 'end':
return 'end()'
case 'invalid':
return 'invalid()'
default:
return action.action
}
})
return parts.join(' -> ')
}

View File

@@ -0,0 +1,123 @@
import {
CLADO_PAGE_SCOPED_TOOLS,
type CladoActionPoint,
type CladoViewport,
} from './types'
export function clampCladoNormalizedCoordinate(value: number): number {
return Math.min(999, Math.max(0, Math.round(value)))
}
/** Converts Clado's 0-1000 normalized coordinate space into BrowserOS viewport pixels. */
export function resolveCladoPoint(
viewport: CladoViewport,
normalizedX: number | undefined,
normalizedY: number | undefined,
): CladoActionPoint {
const nx = clampCladoNormalizedCoordinate(normalizedX ?? 500)
const ny = clampCladoNormalizedCoordinate(normalizedY ?? 500)
return {
x: Math.round((nx / 1000) * viewport.width),
y: Math.round((ny / 1000) * viewport.height),
}
}
/** Adapts Clado action tool arguments to the BrowserOS MCP tool argument contract. */
export function prepareCladoToolArgs(
toolName: string,
args: Record<string, unknown>,
pageId: number,
): Record<string, unknown> {
const prepared: Record<string, unknown> = { ...args }
if (
toolName === 'evaluate_script' &&
typeof prepared.function === 'string' &&
prepared.expression === undefined
) {
prepared.expression = toCladoEvaluateExpression(prepared.function)
delete prepared.function
}
if (
toolName === 'click_at' &&
typeof prepared.dblClick === 'boolean' &&
prepared.clickCount === undefined
) {
prepared.clickCount = prepared.dblClick ? 2 : 1
delete prepared.dblClick
}
if (
CLADO_PAGE_SCOPED_TOOLS.has(toolName) &&
typeof prepared.page !== 'number'
) {
prepared.page = pageId
}
return prepared
}
export function toCladoEvaluateExpression(rawFunction: unknown): string {
const source = String(rawFunction).trim()
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
return `(${source})()`
}
if (source.startsWith('function')) {
return `(${source})()`
}
return source
}
export function normalizeCladoPressKey(key: string | undefined): string {
const raw = (key ?? '').trim()
if (!raw) throw new Error('press_key action missing key field')
const map: Record<string, string> = {
'C-a': 'Control+A',
'C-c': 'Control+C',
'C-v': 'Control+V',
'C-x': 'Control+X',
'C-z': 'Control+Z',
'C-y': 'Control+Y',
'C-s': 'Control+S',
'C-t': 'Control+T',
'C-w': 'Control+W',
'C-h': 'Control+H',
'C-f': 'Control+F',
'C-+': 'Control++',
'C--': 'Control+-',
'C-tab': 'Control+Tab',
'C-S-tab': 'Control+Shift+Tab',
'C-S-n': 'Control+Shift+N',
'C-down': 'Control+ArrowDown',
'M-a': 'Meta+A',
'M-c': 'Meta+C',
'M-v': 'Meta+V',
'M-x': 'Meta+X',
'M-f4': 'Alt+F4',
}
return map[raw] ?? raw
}
export function normalizeCladoDirection(
direction: string | undefined,
): 'up' | 'down' | 'left' | 'right' {
if (
direction === 'up' ||
direction === 'down' ||
direction === 'left' ||
direction === 'right'
) {
return direction
}
return 'down'
}
export function normalizeCladoScrollAmount(amount: number | undefined): number {
if (typeof amount !== 'number') return 500
if (amount <= 0) return 100
const clamped = Math.min(amount, 1000)
return Math.max(100, Math.round((clamped / 1000) * 900))
}

View File

@@ -0,0 +1,68 @@
import { CLADO_REQUEST_TIMEOUT_MS } from '../../../../constants'
import { formatCladoHistory } from './clado-actions'
import type { CladoAction, CladoActionResponse } from './types'
export interface CladoActionClientOptions {
baseUrl?: string
apiKey?: string
}
export interface CladoActionPredictionInput {
instruction: string
imageBase64: string
actionHistory: CladoAction[]
signal?: AbortSignal
}
/** Calls the Clado action model without exposing credentials in process arguments or artifacts. */
export class CladoActionClient {
constructor(private readonly options: CladoActionClientOptions) {}
async requestActionPrediction(
input: CladoActionPredictionInput,
): Promise<CladoActionResponse> {
if (!this.options.baseUrl) {
throw new Error('executor.baseUrl must be set for clado-action provider')
}
const requestController = new AbortController()
const onAbort = () => requestController.abort()
input.signal?.addEventListener('abort', onAbort, { once: true })
const timeoutHandle = setTimeout(() => {
requestController.abort()
}, CLADO_REQUEST_TIMEOUT_MS)
try {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
}
if (this.options.apiKey) {
headers.Authorization = `Bearer ${this.options.apiKey}`
}
const response = await fetch(this.options.baseUrl, {
method: 'POST',
headers,
body: JSON.stringify({
instruction: input.instruction,
image_base64: input.imageBase64,
history: formatCladoHistory(input.actionHistory),
}),
signal: requestController.signal,
})
if (!response.ok) {
const body = await response.text()
throw new Error(
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
)
}
return (await response.json()) as CladoActionResponse
} finally {
clearTimeout(timeoutHandle)
input.signal?.removeEventListener('abort', onAbort)
}
}
}

View File

@@ -0,0 +1,56 @@
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type {
DelegationResult,
ExecutorBackend,
ExecutorCallbacks,
} from '../../executor-backend'
import { CladoActionExecutor } from './clado-action-executor'
export interface CladoExecutorBackendOptions {
configTemplate: ResolvedAgentConfig
serverUrl: string
initialPageId?: number
callbacks?: ExecutorCallbacks
}
/** Executes delegated goals through the Clado visual action model. */
export class CladoExecutorBackend implements ExecutorBackend {
readonly kind = 'clado'
private executor: CladoActionExecutor | null = null
constructor(private readonly options: CladoExecutorBackendOptions) {}
async execute(
instruction: string,
signal?: AbortSignal,
): Promise<DelegationResult> {
const executor = this.getExecutor()
const result = await executor.execute(instruction, signal)
return result
}
async close(): Promise<void> {
await this.executor?.close()
}
getTotalSteps(): number {
return this.executor?.getTotalSteps() ?? 0
}
private getExecutor(): CladoActionExecutor {
if (this.executor) return this.executor
this.executor = new CladoActionExecutor(
{
provider: this.options.configTemplate.provider,
model: this.options.configTemplate.model,
apiKey: this.options.configTemplate.apiKey ?? '',
baseUrl: this.options.configTemplate.baseUrl,
},
this.options.serverUrl,
this.options.initialPageId,
)
this.executor.setCallbacks(this.options.callbacks ?? {})
return this.executor
}
}

Some files were not shown because too many files have changed in this diff Show More