fix: address review feedback for PR #922

fix: default extract base to BASE_COMMIT
feat: add ACPX agent soul and memory support (#917)
2026-05-14 08:03:58 +00:00 · 2026-05-02 14:44:20 -07:00 · 2026-05-02 14:31:51 -07:00 · 2026-05-02 13:45:40 -07:00 · 2026-05-02 13:06:41 -07:00 · 2026-05-01 20:16:26 +00:00
293 changed files with 18859 additions and 7580 deletions
--- a/.claude/skills/ask-internal/SKILL.md
+++ b/.claude/skills/ask-internal/SKILL.md
@@ -0,0 +1,152 @@
+---
+name: ask-internal
+description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
+allowed-tools: Bash, Read, Grep, Glob, Edit, Write
+---
+
+# Ask Internal
+
+Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
+
+**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
+
+## When to use
+
+- "How do I reset my dogfood profile?"
+- "What's the deal with the OpenClaw VM startup?"
+- "Where do we configure release signing?"
+- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
+
+## Hard rules — never do these
+
+- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
+- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
+- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
+- NEVER run this skill if `.internal-docs/` is missing. Stop and print the init command: `git submodule update --init .internal-docs`.
+- NEVER cite a file or line number you have not actually read.
+
+## Voice rules
+
+Apply the same voice rules as `document-internal` to the synthesized answer:
+
+- Lead with the point.
+- Concrete nouns. Name files, functions, commands.
+- Short sentences. Active voice. No em dashes.
+- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
+- No filler intros.
+
+## Workflow
+
+### Step 0: Pre-flight
+
+```bash
+if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
+  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
+  exit 0
+fi
+[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
+  echo ".internal-docs/ missing or empty. Submodule not configured?"
+  exit 0
+}
+```
+
+### Step 1: Parse the question
+
+Pull the keywords from the user's question. Drop stop words. Identify intent:
+
+- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
+- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
+- **Free-form** ("anything about Y"): search all categories.
+
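A minimal sketch of Step 1 in shell. It assumes a small hand-picked stop-word list and plain substring matching on the intent phrases listed above. `STOP_WORDS`, `extract_keywords`, and `classify_intent` are hypothetical names, not part of the skill.

```shell
# Hypothetical stop-word list; tune it to the questions the team actually asks.
STOP_WORDS="a an and are about any anything do does how i in is it my of on our s the this to we what where why with"

# Print one lowercase keyword per line, dropping stop words.
extract_keywords() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]_-' '\n' \
    | while IFS= read -r word; do
        [ -n "$word" ] || continue
        case " $STOP_WORDS " in
          *" $word "*) ;;               # stop word: skip it
          *) printf '%s\n' "$word" ;;
        esac
      done
}

# Pick the search bias from the intent phrases listed above.
classify_intent() {
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    *"how do i"*|*"how to"*|*"where do"*"configure"*) echo setup ;;
    *"what is"*|*"what's"*|*"why does"*) echo feature ;;
    *) echo free-form ;;
  esac
}
```

`classify_intent` only sets the bias. A `free-form` result still searches every category.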
+### Step 2: Multi-source search
+
+Run grep in parallel across two sources.
+
+**Internal docs:**
+
+```bash
+grep -rni --include='*.md' '<keyword>' .internal-docs/
+```
+
+Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
+
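One way to apply the relevance rule to doc hits, sketched under the assumption that "more keyword matches" means more distinct keywords per file, not raw occurrence count. `rank_doc_hits` is a hypothetical helper.

```shell
# Rank .md files under DIR by how many distinct KEYWORDS each one contains.
# grep -l prints a file once per keyword, so uniq -c gives one point per keyword.
# Output lines look like "2 path/to/doc.md", best first, at most 5.
rank_doc_hits() {
  dir=$1
  shift
  for kw in "$@"; do
    grep -rli --include='*.md' -- "$kw" "$dir" 2>/dev/null
  done | sort | uniq -c | sed 's/^ *//' | sort -k1,1nr -k2,2 | head -n 5
}
```

Usage: `rank_doc_hits .internal-docs reset dogfood profile`. Counting distinct keywords keeps a long doc that repeats one word from outranking a short doc that matches every keyword.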
+**Codebase (skip vendored Chromium and `node_modules`):**
+
+```bash
+grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
+     --include='*.yml' --include='*.yaml' \
+     --exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
+     '<keyword>' packages/ scripts/ .config/ .github/
+```
+
+Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
+
+### Step 3: Synthesize answer
+
+Structure the response:
+
+1. **Direct answer.** First sentence answers the question. No preamble.
+2. **Steps if applicable.** Numbered list with exact commands.
+3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Check the answer against the voice rules above before printing.
+
+If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
+
+### Step 4: Offer execution (only if commands surfaced)
+
+If Step 3 produced executable commands the user could run, ask:
+
+> Run these for you? (y / n / dry-run)
+
+- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
+- **n:** Skip. Done.
+- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
+
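A sketch of the confirmation rule for the `y` path. It assumes a deliberately short allowlist: only `ls`, `cat`, and `git status` with no redirect, pipe, chaining, or substitution count as read-only. Everything else is treated as mutating and gets a prompt. `is_read_only` and `run_with_confirmation` are hypothetical helpers.

```shell
# Hypothetical allowlist. A command is read-only only if it is ls, cat, or
# git status AND contains no redirect, pipe, chaining, or substitution.
is_read_only() {
  case "$1" in
    *'>'*|*'<'*|*'|'*|*';'*|*'&'*|*'$('*|*'`'*) return 1 ;;
  esac
  case "$1" in
    ls|'ls '*|'cat '*|'git status'|'git status '*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every command before running it; prompt before anything not read-only.
run_with_confirmation() {
  for cmd in "$@"; do
    printf '$ %s\n' "$cmd"
    if ! is_read_only "$cmd"; then
      printf 'run this? %s [y/N] ' "$cmd"
      read -r answer
      if [ "$answer" != y ]; then
        echo "skipped"
        continue
      fi
    fi
    sh -c "$cmd"
  done
}
```

Erring toward "mutating" costs one extra prompt. Erring the other way can run an unconfirmed write.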
+### Step 5: Doc-not-found path
+
+If Step 2 returned nothing useful (no doc hits AND no clear code answer):
+
+1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
+2. Ask: "Draft a new doc and open a PR to internal-docs?"
+3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
+
+### Step 6: Completion status
+
+Report one of:
+
+- **DONE** — answer delivered, citations verified.
+- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
+- **BLOCKED** — submodule missing or other pre-flight failure.
+- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
+
+## Citation discipline
+
+Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
+
+If a doc says one thing and the code says another, surface the conflict explicitly:
+
+> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
+
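Before citing, the skill can print the exact line it is about to cite. A minimal sketch, assuming citations use the `path:line` form shown above. `show_citation` is a hypothetical helper.

```shell
# Print the line a "path:line" citation points at. Fail if the citation is
# malformed, the file is missing, or the file is shorter than the cited line.
show_citation() {
  path=${1%:*}
  line=${1##*:}
  case "$line" in
    ''|*[!0-9]*) echo "bad citation: $1" >&2; return 1 ;;
  esac
  [ -f "$path" ] || { echo "no such file: $path" >&2; return 1; }
  total=$(wc -l < "$path" | tr -d ' ')
  if [ "$line" -lt 1 ] || [ "$line" -gt "$total" ]; then
    echo "$path has $total lines, cited $line" >&2
    return 1
  fi
  sed -n "${line}p" "$path"
}
```

If the printed line does not contain the claim, the citation is wrong even though the file exists.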
+## Common Mistakes
+
+**Skimming and then citing**
+- **Problem:** Citation points to a line that doesn't actually contain the claim.
+- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
+
+**Executing without per-command confirmation for mutations**
+- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
+- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
+
+**Searching only docs, not code**
+- **Problem:** Doc says X but code does Y; answer is wrong.
+- **Fix:** Always grep both sources in Step 2.
+
+## Red Flags
+
+**Never:**
+- Cite a file:line you haven't read.
+- Run mutations without per-command confirmation.
+- Modify BrowserOS code from this skill (use `/document-internal` for writes).
+
+**Always:**
+- Pre-flight check before any search.
+- Reconcile doc vs code conflicts in the answer, don't hide them.
+- Plain "no doc covers this" when grep is empty — never invent.
--- a/.claude/skills/document-internal/SKILL.md
+++ b/.claude/skills/document-internal/SKILL.md
@@ -0,0 +1,208 @@
+---
+name: document-internal
+description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
+allowed-tools: Bash, Read, Write, Edit, Grep, Glob
+---
+
+# Document Internal
+
+Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
+
+**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
+
+## When to use
+
+After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
+
+## Hard rules — never do these
+
+- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
+- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
+- NEVER fabricate filler content for empty template sections. Empty stays empty.
+- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
+- NEVER run this skill if `.internal-docs/` is missing. Stop and print the init command: `git submodule update --init .internal-docs`.
+- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
+
+## Voice rules — enforced by Step 4
+
+The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
+
+- Lead with the point. First sentence answers "what is this?"
+- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
+- Short sentences. Average <20 words. No deeply nested clauses.
+- Active voice. "X does Y" not "Y is done by X".
+- No em dashes. Use commas, periods, or rephrase.
+- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
+- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
+- No filler intros ("This document describes..."). Start with the substance.
+- Empty sections stay empty. Do not write "N/A" or fabricate content.
+
+## Workflow
+
+### Step 0: Pre-flight
+
+Bail with a clear message on any failure.
+
+```bash
+# Submodule must be initialized
+if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
+  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
+  exit 0
+fi
+[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
+
+# Must be on a feature branch
+BRANCH=$(git branch --show-current)
+if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
+  echo "On $BRANCH. Run from a feature branch."
+  exit 0
+fi
+
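+# Determine base branch (default: dev for this repo, fall back to main).
+# Use the remote-tracking ref that was verified, so a missing or stale local branch can't skew the diff.
+# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
+BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo origin/dev || echo origin/main)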
+
+# Gather context
+git log "$BASE..HEAD" --oneline
+git diff "$BASE...HEAD" --stat
+gh pr view --json body -q .body 2>/dev/null  # may be empty if no PR yet
+```
+
+### Step 1: Identify the doc
+
+Ask the user for three things in one prompt:
+
+1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
+2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
+3. **Owner:** GitHub handle (default = `gh api user --jq .login`; fall back to `git config user.name` only if `gh` is not authenticated, and note that it returns a display name, not a handle)
+
+### Step 2: Decision brief — four sharp questions
+
+Ask one question at a time. Each answer constrains the next. These force compression before drafting.
+
+1. "In one sentence: what can someone now DO that they could not before?"
+2. "What is the one design decision a future engineer needs to know?"
+3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
+4. "Any sharp edges or gotchas? (or 'none')"
+
+Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
+
+### Step 3: Draft from the template
+
+Read the matching template from `.internal-docs/_templates/`:
+
+- `feature` → `feature-note.md`
+- `architecture` → `architecture-note.md`
+- `design` → `design-spec.md`
+
+If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
+
+Generate the 1-pager from the template, the four answers, and the diff context.
+
+### Step 4: Voice self-check
+
+Scan the draft for violations:
+
+- Em dash present (`—`).
+- Any banned word from the list.
+- Average sentence length > 20 words.
+- Body line count > 60 (feature notes only — architecture/design have no cap).
+
+If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
+
+If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
+
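The mechanical checks in Step 4 can be sketched in shell. This assumes the body is everything after the closing `---` of the front matter, and it counts sentences by `.`, `?`, or `!` followed by whitespace or end of line, which is only an approximation. `voice_check`, `voice_body`, `BANNED`, and `EM_DASH` are hypothetical names; the word list is copied from the voice rules.

```shell
# Banned words copied from the voice rules above.
BANNED='delve|crucial|robust|comprehensive|nuanced|multifaceted|furthermore|moreover|additionally|pivotal|landscape|tapestry|underscore|foster|showcase|intricate|vibrant|fundamental|significant|leverage|utilize'
EM_DASH=$(printf '\342\200\224')   # U+2014 encoded as UTF-8 bytes

# Print the body: every line after the second '---' (end of front matter).
voice_body() {
  awk 'fm >= 2 { print } $0 == "---" && fm < 2 { fm++ }' "$1"
}

# Usage: voice_check FILE [feature|architecture|design]
# Prints each failed rule on its own line; returns 1 if any rule failed.
voice_check() {
  file=$1
  doc_type=${2:-feature}
  failed=0
  if grep -qF "$EM_DASH" "$file"; then echo "em dash present"; failed=1; fi
  if grep -qiwE "$BANNED" "$file"; then echo "banned word present"; failed=1; fi
  words=$(voice_body "$file" | wc -w | tr -d ' ')
  sentences=$(voice_body "$file" | grep -oE '[.!?]([[:space:]]|$)' | wc -l | tr -d ' ')
  [ "$sentences" -ge 1 ] || sentences=1
  # Average above 20 words per sentence, compared without integer division.
  if [ "$words" -gt $((20 * sentences)) ]; then
    echo "average sentence length over 20 words"; failed=1
  fi
  lines=$(voice_body "$file" | awk 'END { print NR }')
  if [ "$doc_type" = feature ] && [ "$lines" -gt 60 ]; then
    echo "feature note body is $lines lines (cap 60)"; failed=1
  fi
  return "$failed"
}
```

The script only reports which rules failed. Rewriting the offending sentences stays part of the drafting loop.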
+### Step 5: Show + iterate
+
+Print the full draft. Ask:
+
+> Edit needed? Paste any changes, or say "looks good".
+
+Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
+
+### Step 6: Open PR to internal-docs
+
+Use a tmp clone, never the user's `.internal-docs` checkout, so the user's submodule stays clean.
+
+```bash
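+set -euo pipefail  # stop at the first failed step instead of pushing a partial branch
+TMP=$(mktemp -d)
+trap 'rm -rf "$TMP"' EXIT  # cleans up even if any step below fails
+git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
+cd "$TMP"
+git checkout -b "docs/<slug>"
+
+# Write the doc. Compute the path once so the write and the git add agree.
+mkdir -p "<type>"  # features, architecture, designs, or setup
+DOC_PATH="<type>/$(date -u +%Y-%m)-<slug>.md"
+cat > "$DOC_PATH" <<'DOC'
+<draft content>
+DOC
+
+# Update the root README index: add one line under the matching "### <Section>" heading.
+# Do this inside the script, not with the Edit tool afterwards: the EXIT trap deletes $TMP when the script ends.
+awk -v section="### <Section>" -v entry="- [<title>]($DOC_PATH): <one-line description>" \
+  '{ print } $0 == section { print entry }' README.md > README.md.tmp
+mv README.md.tmp README.md
+
+git add "$DOC_PATH" README.md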
+git commit -m "docs(<type>): <slug>"
+git push -u origin "docs/<slug>"
+
+PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
+  --head "docs/<slug>" \
+  --title "docs(<type>): <slug>" \
+  --body "$(cat <<'BODY'
+## Summary
+<one-line of what this doc covers>
+
+## Source
+- BrowserOS branch: <branch>
+- Related PR: <#NNN if any>
+BODY
+)")
+
+cd -
+echo "PR opened: $PR_URL"
+# trap above cleans up $TMP on EXIT
+```
+
+If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
+
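A minimal sketch of that sanitization, assuming slugs should contain only `a-z`, `0-9`, and single hyphens. `sanitize_slug` is a hypothetical helper.

```shell
# Reduce any input to a kebab-case slug that is safe in branch names, file
# paths, and commit messages: lowercase, collapse every run of characters
# outside [a-z0-9] into one hyphen, then trim hyphens at both ends.
sanitize_slug() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}
```

Check for an empty result before substituting it, since input made only of punctuation sanitizes to an empty string: `SLUG=$(sanitize_slug "$RAW") && [ -n "$SLUG" ]`.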
+### Step 7: Completion status
+
+Report one of:
+
+- **DONE** — file written, branch pushed, PR opened. Print PR URL.
+- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
+- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
+
+## Doc type defaults
+
+| Branch pattern | Default doc type | Default location |
+|----------------|------------------|------------------|
+| `feat/*`       | feature          | `features/`      |
+| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
+| `rfc/*` or `design/*` | design          | `designs/`       |
+| Otherwise      | ask              | ask              |
+
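The table can be sketched as a shell function. Two assumptions the table does not state: "refactor branches" means a `refactor/*` prefix, and the caller supplies the count of changed files under `packages/`. `default_doc_type` and `doc_dir` are hypothetical helpers; `doc_dir` maps a doc type to the directory used in Step 6.

```shell
# Usage: default_doc_type BRANCH [CHANGED_FILES_UNDER_PACKAGES]
# The count can come from: git diff --name-only "$BASE...HEAD" -- packages/ | wc -l
default_doc_type() {
  branch=$1
  pkg_files=${2:-0}
  case "$branch" in
    feat/*) echo feature ;;
    arch/*) echo architecture ;;
    rfc/*|design/*) echo design ;;
    refactor/*)   # hypothetical prefix for "refactor branches"
      if [ "$pkg_files" -gt 10 ]; then echo architecture; else echo ask; fi ;;
    *) echo ask ;;
  esac
}

# Map a doc type to its directory in internal-docs.
doc_dir() {
  case "$1" in
    feature) echo features ;;
    architecture) echo architecture ;;
    design) echo designs ;;
    setup) echo setup ;;
    *) return 1 ;;
  esac
}
```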
+## Common Mistakes
+
+**Drafting before asking the four questions**
+- **Problem:** Output is generic filler that says nothing concrete.
+- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
+
+**Touching `.internal-docs/` directly**
+- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
+- **Fix:** Always use the tmp clone in Step 6.
+
+**Skipping voice check on user edits**
+- **Problem:** User pastes prose with em dashes or filler; ships as-is.
+- **Fix:** Re-run Step 4 after every user edit.
+
+## Red Flags
+
+**Never:**
+- Push to `internal-docs/main`. Always branch + PR.
+- Modify the OSS repo's `.gitmodules` or submodule pointer.
+- Fabricate content for empty template sections.
+
+**Always:**
+- Pre-flight check before doing any work.
+- One-pager rule for feature notes (60-line body cap).
+- File:line citations when referencing code.
--- a/.claude/skills/document-internal/seeds/README.md
+++ b/.claude/skills/document-internal/seeds/README.md
@@ -0,0 +1,51 @@
+# BrowserOS Internal Docs
+
+Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
+
+This submodule is for the BrowserOS internal team. If you reached it from a public clone without team access, nothing here is required to build or use BrowserOS.
+
+## How to find what you need
+
+- Setup task ("how do I X locally") → look in [`setup/`](setup/)
+- Recently shipped feature → look in [`features/`](features/)
+- Cross-cutting subsystem → look in [`architecture/`](architecture/)
+- A design decision or RFC → look in [`designs/`](designs/)
+
+Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
+
+## How to add a doc
+
+Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
+
+## Index
+
+### Setup
+<!-- one line per setup runbook: -->
+<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
+
+### Features
+<!-- one line per shipped feature, newest first: -->
+<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
+
+### Architecture
+<!-- one line per cross-cutting subsystem: -->
+<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
+
+### Designs
+<!-- one line per design spec, newest first: -->
+<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
+
+## Templates
+
+When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
+
+## Voice
+
+Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
+
+- Lead with the point.
+- Concrete nouns. Name files, functions, commands.
+- Short sentences, active voice, no em dashes.
+- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
+- Empty sections stay empty. Do not write "N/A" or fake content.
+- Feature notes target one screen, body 60 lines max.
--- a/.claude/skills/document-internal/seeds/_templates/architecture-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/architecture-note.md
@@ -0,0 +1,31 @@
+---
+title: <subsystem name>
+owner: <github handle>
+status: current | deprecated
+date: YYYY-MM-DD
+related-features: [feature-slug-1, feature-slug-2]
+---
+
+# <subsystem name>
+
+## What this subsystem does
+<1-2 paragraphs. The top-level responsibility. Boundaries.>
+
+## Architecture
+<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
+
+## Constraints
+<Hard rules the design enforces. "X must never call Y" type statements.>
+
+## Decisions made
+<Numbered list of non-obvious decisions and the reason for each.>
+
+## Key files
+- `path/to/file.ts`: role
+- `path/to/dir/`: what lives here
+
+## How to evolve this
+<Where to add things. Which tests to expect to update. What NOT to touch.>
+
+## Open questions
+<What is still being figured out. Empty if none.>
--- a/.claude/skills/document-internal/seeds/_templates/design-spec.md
+++ b/.claude/skills/document-internal/seeds/_templates/design-spec.md
@@ -0,0 +1,34 @@
+---
+title: <design name>
+owner: <github handle>
+status: proposed | accepted | rejected | superseded
+date: YYYY-MM-DD
+supersedes: <design-slug or none>
+---
+
+# <design name>
+
+## Goal
+<2-4 sentences. What this design is trying to accomplish.>
+
+## Context
+<1-2 paragraphs. The current state, what is failing, why this needs to change.>
+
+## Selected Approach
+<The chosen design at a high level. Architecture, components, data flow.>
+
+## Alternatives Considered
+### 1. <name>
+<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
+
+### 2. <name>
+<Same shape.>
+
+## Out of Scope
+<What this design does NOT cover. Defer references.>
+
+## Rollout
+<Numbered steps from "nothing exists" to "fully shipped".>
+
+## Open Questions
+<Resolved during design? Empty. Unresolved? List with owner.>
--- a/.claude/skills/document-internal/seeds/_templates/feature-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/feature-note.md
@@ -0,0 +1,29 @@
+---
+title: <feature name>
+owner: <github handle>
+status: shipped | wip | deprecated
+date: YYYY-MM-DD
+prs: ["#NNN"]
+tags: [agent, browser, mcp]
+---
+
+# <feature name>
+
+## What it does
+<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
+
+## Why we built it
+<1-2 sentences. Motivation. What pain it removed or what it unlocked.>
+
+## How it works
+<3-6 sentences. The flow at a high level. Name the key files.>
+
+## Key files
+- `path/to/file.ts`: what it does
+- `path/to/other.ts`: what it does
+
+## How to run / test it locally
+<Bullet list of commands. Leave empty if there is nothing to run; do not fake.>
+
+## Gotchas
+<Known sharp edges. "If you see X, that's why." Leave empty if none.>
--- a/.github/workflows/build-agent.yml
+++ b/.github/workflows/build-agent.yml
@@ -1,157 +0,0 @@
-name: build-agent
-
-on:
-  workflow_dispatch:
-    inputs:
-      agent:
-        description: "Agent name from bundle.json"
-        required: true
-        type: string
-        default: openclaw
-      publish:
-        description: "Upload to R2 and merge manifest slice"
-        required: false
-        default: false
-        type: boolean
-  pull_request:
-    paths:
-      - "packages/browseros-agent/packages/build-tools/**"
-      - ".github/workflows/build-agent.yml"
-
-env:
-  BUN_VERSION: "1.3.6"
-  PKG_DIR: packages/browseros-agent/packages/build-tools
-
-permissions:
-  contents: read
-
-jobs:
-  check:
-    runs-on: ubuntu-24.04
-    steps:
-      - uses: actions/checkout@v4
-      - uses: oven-sh/setup-bun@v2
-        with:
-          bun-version: ${{ env.BUN_VERSION }}
-      - working-directory: packages/browseros-agent
-        run: bun install --frozen-lockfile
-      - working-directory: packages/browseros-agent
-        run: bun run --filter @browseros/build-tools typecheck
-      - working-directory: packages/browseros-agent
-        run: bun run --filter @browseros/build-tools test
-
-  build:
-    needs: check
-    strategy:
-      fail-fast: false
-      matrix:
-        include:
-          - arch: arm64
-            runner: ubuntu-24.04-arm
-    runs-on: ${{ matrix.runner }}
-    steps:
-      - uses: actions/checkout@v4
-      - uses: oven-sh/setup-bun@v2
-        with:
-          bun-version: ${{ env.BUN_VERSION }}
-      - name: Install podman
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y podman
-      - working-directory: packages/browseros-agent
-        run: bun install --frozen-lockfile
-      - name: Build tarball
-        working-directory: ${{ env.PKG_DIR }}
-        env:
-          AGENT: ${{ inputs.agent || 'openclaw' }}
-          OUT: ${{ github.workspace }}/dist/images
-        run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
-      - uses: actions/upload-artifact@v4
-        with:
-          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
-          path: dist/images/
-          retention-days: 7
-
-  smoke:
-    needs: build
-    runs-on: ubuntu-24.04-arm
-    steps:
-      - uses: actions/checkout@v4
-      - uses: oven-sh/setup-bun@v2
-        with:
-          bun-version: ${{ env.BUN_VERSION }}
-      - uses: actions/download-artifact@v4
-        with:
-          name: tarball-${{ inputs.agent || 'openclaw' }}-arm64
-          path: dist/images
-      - name: Install podman
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y podman
-      - working-directory: packages/browseros-agent
-        run: bun install --frozen-lockfile
-      - name: Smoke test tarball
-        working-directory: ${{ env.PKG_DIR }}
-        env:
-          AGENT: ${{ inputs.agent || 'openclaw' }}
-        run: |
-          set -euo pipefail
-          tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-arm64.tar.gz" -print -quit)"
-          if [ -z "$tarball" ]; then
-            echo "missing arm64 tarball artifact for ${AGENT}" >&2
-            exit 1
-          fi
-          bun run smoke:tarball -- --agent "$AGENT" --arch arm64 --tarball "$tarball"
-
-  publish:
-    needs: [build, smoke]
-    if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
-    runs-on: ubuntu-24.04
-    environment: release
-    concurrency:
-      group: r2-manifest-publish
-      cancel-in-progress: false
-    steps:
-      - uses: actions/checkout@v4
-      - uses: oven-sh/setup-bun@v2
-        with:
-          bun-version: ${{ env.BUN_VERSION }}
-      - uses: actions/download-artifact@v4
-        with:
-          pattern: tarball-*
-          path: dist/images
-          merge-multiple: true
-      - working-directory: packages/browseros-agent
-        run: bun install --frozen-lockfile
-      - name: Upload tarballs to R2
-        working-directory: ${{ env.PKG_DIR }}
-        env:
-          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
-          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
-          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
-          R2_BUCKET: ${{ secrets.R2_BUCKET }}
-        run: |
-          set -euo pipefail
-          for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
-            base="$(basename "$file")"
-            bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
-          done
-      - name: Merge agent slice into manifest
-        working-directory: ${{ env.PKG_DIR }}
-        env:
-          AGENT: ${{ inputs.agent || 'openclaw' }}
-          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
-          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
-          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
-          R2_BUCKET: ${{ secrets.R2_BUCKET }}
-        run: |
-          set -euo pipefail
-          mkdir -p dist/images
-          cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
-          bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
-          bun run emit-manifest -- \
-            --slice "agents:${AGENT}" \
-            --dist-dir dist \
-            --merge-from dist/baseline-manifest.json \
-            --out dist/manifest.json
-          bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
--- a/.github/workflows/eval-weekly.yml
+++ b/.github/workflows/eval-weekly.yml
@@ -14,7 +14,7 @@ on:
      config:
        description: 'Eval config file (relative to apps/eval/)'
        required: false
-        default: 'configs/browseros-agent-weekly.json'
+        default: 'configs/legacy/browseros-agent-weekly.json'

 permissions:
  contents: read
@@ -62,33 +62,27 @@ jobs:
          curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
          unzip -qo /tmp/nopecha.zip -d extensions/nopecha

-      - name: Run eval
+      - name: Run eval and publish to R2
        working-directory: packages/browseros-agent/apps/eval
        env:
          FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
-          BROWSEROS_BINARY: /usr/bin/browseros
-          WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
-          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
-        run: |
-          echo "Running eval with config: $EVAL_CONFIG"
-          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"
-
-      - name: Upload runs to R2
-        if: success()
-        working-directory: packages/browseros-agent/apps/eval
-        env:
          EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
          EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
          EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
          EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
          EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
-          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
+          BROWSEROS_BINARY: /usr/bin/browseros
+          WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
+          # OpenClaw container runtime is macOS-only; opt the Linux runner
+          # into the no-op stub so the server can boot and the eval can run.
+          BROWSEROS_SKIP_OPENCLAW: '1'
+          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
        run: |
-          CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
-          bun scripts/upload-run.ts "results/$CONFIG_NAME"
+          echo "Running eval with config: $EVAL_CONFIG"
+          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2

      - name: Generate trend report
        if: success()
@@ -109,3 +103,11 @@ jobs:
        with:
          name: eval-report-${{ github.run_id }}
          path: /tmp/eval-report.html
+
+      - name: Upload server stderr logs (for post-mortem on startup failures)
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: browseros-server-logs-${{ github.run_id }}
+          path: /tmp/browseros-server-logs/
+          if-no-files-found: ignore
--- a/.github/workflows/sync-internal-docs.yml
+++ b/.github/workflows/sync-internal-docs.yml
@@ -0,0 +1,62 @@
+name: Sync internal-docs submodule
+
+on:
+  schedule:
+    - cron: '0 */4 * * *'
+  workflow_dispatch:
+
+jobs:
+  sync:
+    name: Bump internal-docs submodule pointer on dev
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+    steps:
+      - name: Rewrite SSH submodule URL to HTTPS-with-token
+        env:
+          TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
+        run: |
+          git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
+
+      - uses: actions/checkout@v4
+        with:
+          token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
+          submodules: true
+          ref: dev
+          fetch-depth: 50
+
+      - name: Open auto-merge PR if internal-docs has new commits
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          set -e
+
+          # Skip if submodule not yet configured (handoff window before someone adds it)
+          if ! git config --file .gitmodules --get-regexp '^submodule\.\.internal-docs\.path$' >/dev/null 2>&1; then
+            echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
+            exit 0
+          fi
+
+          git submodule update --remote --merge .internal-docs
+
+          if git diff --quiet .internal-docs; then
+            echo "No internal-docs changes to sync."
+            exit 0
+          fi
+
+          BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
+          git config user.name  "browseros-bot"
+          git config user.email "bot@browseros.ai"
+          git checkout -b "$BRANCH"
+          git add .internal-docs
+          git commit -m "chore: sync internal-docs submodule"
+          git push -u origin "$BRANCH"
+
+          PR_URL=$(gh pr create \
+            --base dev \
+            --head "$BRANCH" \
+            --title "chore: sync internal-docs submodule" \
+            --body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")
+
+          gh pr merge "$PR_URL" --auto --squash --delete-branch
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -63,15 +63,15 @@ jobs:
            junit_path: test-results/server-root.xml
            needs_browser: false
          - suite: agent
-            command: bun run test:agent
+            command: (cd apps/agent && bun run test)
            junit_path: test-results/agent.xml
            needs_browser: false
          - suite: eval
-            command: bun run test:eval
+            command: (cd apps/eval && bun run test)
            junit_path: test-results/eval.xml
            needs_browser: false
          - suite: build
-            command: bun run test:build
+            command: bun run ./scripts/run-bun-test.ts ./scripts/build
            junit_path: test-results/build.xml
            needs_browser: false

--- a/.gitmodules
+++ b/.gitmodules
@@ -0,0 +1,4 @@
+[submodule ".internal-docs"]
+	path = .internal-docs
+	url = git@github.com:browseros-ai/internal-docs.git
+	branch = main
--- a/.internal-docs
+++ b/.internal-docs
--- a/README.md
+++ b/README.md
@@ -188,6 +188,21 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
 - [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
 - [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.

+## Citation
+
+If you use BrowserOS in your research or project, please cite:
+
+```bibtex
+@software{browseros2025,
+  author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
+  title = {BrowserOS: The open-source Agentic browser},
+  url = {https://github.com/browseros-ai/BrowserOS},
+  year = {2025},
+  publisher = {GitHub},
+  license = {AGPL-3.0},
+}
+```
+
 ## License

 BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
--- a/packages/browseros-agent/README.md
+++ b/packages/browseros-agent/README.md
@@ -79,14 +79,15 @@ cp apps/server/.env.example apps/server/.env.development
 cp apps/agent/.env.example apps/agent/.env.development
 cp apps/server/.env.production.example apps/server/.env.production

-# Install deps, generate agent code, and sync the VM cache
+# Install deps and generate agent code
 bun run dev:setup

 # Start the full dev environment
 bun run dev:watch
 ```

-`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.
+`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
+the server startup path and pulls the configured GHCR image on demand.

 ### Environment Variables

@@ -156,9 +157,14 @@ bun run build:server          # Build production server resource artifacts and u
 bun run build:agent           # Build agent extension

 # Test
-bun run test                  # Run standard tests
-bun run test:cdp              # Run CDP-based tests
-bun run test:integration      # Run integration tests
+bun run test                  # Run all tests
+bun run test:all              # Run all tests
+bun run test:main             # Run key server tools and integration tests
+
+# App-specific test groups (from packages/browseros-agent)
+cd apps/server && bun run test:tools
+cd apps/server && bun run test:cdp
+cd apps/server && bun run test:integration

 # Quality
 bun run lint                  # Check with Biome
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
@@ -17,7 +17,7 @@ export function groupProviderOptions(
      ? [{ key: 'llm' as const, label: 'AI Providers', options: llm }]
      : []),
    ...(acp.length
-      ? [{ key: 'acp' as const, label: 'ACP Models', options: acp }]
+      ? [{ key: 'acp' as const, label: 'Agents', options: acp }]
      : []),
  ]
 }
@@ -26,14 +26,25 @@ export function getProviderSearchValue(
  provider: Provider,
  groupLabel: string,
 ): string {
-  return [provider.id, provider.name, provider.type, groupLabel]
+  return [
+    provider.id,
+    provider.name,
+    provider.type,
+    groupLabel,
+    provider.adapterName,
+    provider.modelLabel,
+  ]
    .filter(Boolean)
    .join(' ')
 }

 export function getProviderSubtitle(provider: Provider): string | undefined {
  if (provider.kind !== 'acp') return undefined
-  return provider.modelControl === 'best-effort'
-    ? 'ACP model · best effort'
-    : 'ACP model'
+  return [
+    provider.adapterName,
+    provider.modelLabel,
+    provider.modelControl === 'best-effort' ? 'best effort' : undefined,
+  ]
+    .filter(Boolean)
+    .join(' · ')
 }
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
@@ -16,22 +16,26 @@ const options: Provider[] = [
  },
  {
    kind: 'acp',
-    id: 'acp:claude:haiku:medium',
-    name: 'Claude Code Haiku',
+    id: 'agent-claude-review',
+    name: 'Review Bot',
    type: 'acp',
+    adapterName: 'Claude Code',
+    modelLabel: 'Haiku',
    modelControl: 'best-effort',
  },
  {
    kind: 'acp',
-    id: 'acp:codex:gpt-5.5:medium',
-    name: 'Codex GPT-5.5',
+    id: 'agent-codex-browser',
+    name: 'Browser Driver',
    type: 'acp',
+    adapterName: 'Codex',
+    modelLabel: 'GPT-5.5',
    modelControl: 'runtime-supported',
  },
 ]

 describe('groupProviderOptions', () => {
-  it('groups normal providers separately from ACP models', () => {
+  it('groups normal providers separately from created agents', () => {
    expect(groupProviderOptions(options)).toEqual([
      {
        key: 'llm',
@@ -40,7 +44,7 @@ describe('groupProviderOptions', () => {
      },
      {
        key: 'acp',
-        label: 'ACP Models',
+        label: 'Agents',
        options: [options[2], options[3]],
      },
    ])
@@ -48,20 +52,21 @@ describe('groupProviderOptions', () => {
 })

 describe('getProviderSearchValue', () => {
-  it('matches ACP group labels and item labels', () => {
-    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
-      'ACP Models',
-    )
-    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
-      'Claude Code Haiku',
+  it('matches created-agent group labels and item labels', () => {
+    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Agents')
+    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Review Bot')
+    expect(getProviderSearchValue(options[2], 'Agents')).toContain(
+      'Claude Code',
    )
  })
 })

 describe('getProviderSubtitle', () => {
-  it('does not present best-effort ACP models as guaranteed routing', () => {
-    expect(getProviderSubtitle(options[2])).toBe('ACP model · best effort')
-    expect(getProviderSubtitle(options[3])).toBe('ACP model')
+  it('describes created-agent runtime context without model-target copy', () => {
+    expect(getProviderSubtitle(options[2])).toBe(
+      'Claude Code · Haiku · best effort',
+    )
+    expect(getProviderSubtitle(options[3])).toBe('Codex · GPT-5.5')
    expect(getProviderSubtitle(options[0])).toBeUndefined()
  })
 })
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
@@ -41,7 +41,10 @@ export const ChatProviderSelector: FC<
      <PopoverTrigger asChild>{children}</PopoverTrigger>
      <PopoverContent side="bottom" align="start" className="w-64 p-0">
        <Command>
-          <CommandInput placeholder="Search models..." className="h-9" />
+          <CommandInput
+            placeholder="Search providers or agents..."
+            className="h-9"
+          />
          <CommandList>
            <CommandEmpty>No provider found</CommandEmpty>
            {groups.map((group) => (
--- a/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
@@ -7,5 +7,8 @@ export interface Provider {
  name: string
  type: ChatProviderType
  kind: 'llm' | 'acp'
+  agentId?: string
+  adapterName?: string
+  modelLabel?: string
  modelControl?: 'runtime-supported' | 'best-effort'
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
@@ -1,136 +0,0 @@
-import { Bot, Loader2, Wrench } from 'lucide-react'
-import type { FC } from 'react'
-import type { AgentCardData } from '@/lib/agent-conversations/types'
-import { cn } from '@/lib/utils'
-
-interface AgentCardProps {
-  agent: AgentCardData
-  onClick: () => void
-  active?: boolean
-}
-
-function formatTimestamp(timestamp?: number): string {
-  if (!timestamp) return 'No activity yet'
-  const diff = Date.now() - timestamp
-  const minutes = Math.floor(diff / 60000)
-  if (minutes < 1) return 'just now'
-  if (minutes < 60) return `${minutes}m ago`
-  const hours = Math.floor(minutes / 60)
-  if (hours < 24) return `${hours}h ago`
-  return `${Math.floor(hours / 24)}d ago`
-}
-
-function getStatusLabel(status: AgentCardData['status']): string {
-  if (status === 'working') return 'Working'
-  if (status === 'error') return 'Error'
-  return 'Ready'
-}
-
-function getStatusTone(status: AgentCardData['status']): string {
-  if (status === 'working') return 'bg-amber-500'
-  if (status === 'error') return 'bg-destructive'
-  return 'bg-emerald-500'
-}
-
-function formatCost(usd: number): string {
-  if (usd < 0.005) return `$${usd.toFixed(4)}`
-  return `$${usd.toFixed(2)}`
-}
-
-export const AgentCardExpanded: FC<AgentCardProps> = ({
-  agent,
-  onClick,
-  active,
-}) => (
-  <button
-    type="button"
-    onClick={onClick}
-    className={cn(
-      'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
-      active
-        ? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
-        : 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
-    )}
-  >
-    <div className="flex items-start justify-between gap-3">
-      <div className="flex min-w-0 items-center gap-3">
-        <div
-          className={cn(
-            'flex size-10 shrink-0 items-center justify-center rounded-xl',
-            active
-              ? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
-              : 'bg-muted text-muted-foreground',
-          )}
-        >
-          <Bot className="size-5" />
-        </div>
-        <div className="min-w-0">
-          <div className="truncate font-semibold text-sm">{agent.name}</div>
-          <div className="truncate text-muted-foreground text-xs">
-            {agent.model ?? 'OpenClaw agent'}
-          </div>
-        </div>
-      </div>
-      <div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
-        <span
-          className={cn('size-2 rounded-full', getStatusTone(agent.status))}
-        />
-        <span>{getStatusLabel(agent.status)}</span>
-      </div>
-    </div>
-
-    <div className="mt-4 flex-1">
-      <p className="line-clamp-2 text-foreground/90 text-sm">
-        {agent.lastMessage ??
-          'Start a conversation to see recent work and summaries.'}
-      </p>
-    </div>
-
-    <div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
-      <div className="flex items-center justify-between gap-3">
-        <span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
-        {agent.costUsd ? (
-          <span className="tabular-nums opacity-70">
-            {formatCost(agent.costUsd)}
-          </span>
-        ) : null}
-      </div>
-      {agent.status === 'working' && agent.currentTool ? (
-        <div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
-          <Loader2 className="size-3 shrink-0 animate-spin" />
-          <span className="truncate">{agent.currentTool}</span>
-        </div>
-      ) : agent.activitySummary ? (
-        <div className="flex items-center gap-1.5 text-muted-foreground/60">
-          <Wrench className="size-3 shrink-0" />
-          <span className="truncate">{agent.activitySummary}</span>
-        </div>
-      ) : null}
-    </div>
-  </button>
-)
-
-export const AgentCardCompact: FC<AgentCardProps> = ({
-  agent,
-  onClick,
-  active,
-}) => (
-  <button
-    type="button"
-    onClick={onClick}
-    className={cn(
-      'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
-      active
-        ? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
-        : 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
-    )}
-  >
-    <span
-      className={cn(
-        'size-2 rounded-full',
-        active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
-      )}
-    />
-    <span className="truncate">{agent.name}</span>
-  </button>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
@@ -1,70 +1,71 @@
 import { Plus } from 'lucide-react'
 import type { FC } from 'react'
-import type { AgentCardData } from '@/lib/agent-conversations/types'
+import type {
+  HarnessAdapterDescriptor,
+  HarnessAdapterHealth,
+  HarnessAgent,
+  HarnessAgentAdapter,
+} from '@/entrypoints/app/agents/agent-harness-types'
 import { cn } from '@/lib/utils'
-import { AgentCardCompact, AgentCardExpanded } from './AgentCard'
+import { HomeAgentCard } from './HomeAgentCard'

 interface AgentCardDockProps {
-  agents: AgentCardData[]
+  agents: HarnessAgent[]
+  adapters: HarnessAdapterDescriptor[]
  activeAgentId?: string
  onSelectAgent: (agentId: string) => void
  onCreateAgent?: () => void
-  compact?: boolean
 }

-function CreateAgentButton({
-  compact,
-  onCreateAgent,
-}: {
-  compact?: boolean
-  onCreateAgent: () => void
-}) {
+function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
  return (
    <button
      type="button"
      onClick={onCreateAgent}
      className={cn(
-        'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
-        compact
-          ? 'rounded-full px-3 py-2 text-sm'
-          : 'min-h-32 rounded-2xl px-5 py-4',
+        'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
+        'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
      )}
    >
-      <Plus className={compact ? 'size-3.5' : 'size-5'} />
-      <span>{compact ? 'New' : 'Create agent'}</span>
+      <Plus className="size-5" />
+      <span>Create agent</span>
    </button>
  )
 }

+/**
+ * 3-column grid of HomeAgentCards plus a trailing "Create agent"
+ * tile. The previous `compact` mode (which rendered a horizontal pill
+ * rail) had no callers and was dropped along with the legacy AgentCard.
+ */
 export const AgentCardDock: FC<AgentCardDockProps> = ({
  agents,
+  adapters,
  activeAgentId,
  onSelectAgent,
  onCreateAgent,
-  compact,
 }) => {
  if (agents.length === 0 && !onCreateAgent) return null

-  const Card = compact ? AgentCardCompact : AgentCardExpanded
+  const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
+  for (const descriptor of adapters) {
+    if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
+  }

  return (
-    <div
-      className={cn(
-        compact
-          ? 'flex items-center gap-2 overflow-x-auto pb-1'
-          : 'grid gap-4 md:grid-cols-3',
-      )}
-    >
+    <div className="grid gap-4 md:grid-cols-3">
      {agents.map((agent) => (
-        <Card
-          key={agent.agentId}
+        <HomeAgentCard
+          key={agent.id}
          agent={agent}
-          active={agent.agentId === activeAgentId}
-          onClick={() => onSelectAgent(agent.agentId)}
+          adapter={agent.adapter}
+          adapterHealth={adapterHealth.get(agent.adapter) ?? null}
+          active={agent.id === activeAgentId}
+          onClick={() => onSelectAgent(agent.id)}
        />
      ))}
      {onCreateAgent ? (
-        <CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
+        <CreateAgentButton onCreateAgent={onCreateAgent} />
      ) : null}
    </div>
  )
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
@@ -1,179 +1,35 @@
-import { ArrowLeft, Bot, Home } from 'lucide-react'
+import { ArrowLeft } from 'lucide-react'
 import { type FC, useEffect, useMemo, useRef } from 'react'
 import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
 import { Button } from '@/components/ui/button'
+import type {
+  HarnessAgent,
+  HarnessAgentAdapter,
+} from '@/entrypoints/app/agents/agent-harness-types'
+import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
 import {
-  type AgentEntry,
-  getModelDisplayName,
-} from '@/entrypoints/app/agents/useOpenClaw'
-import { cn } from '@/lib/utils'
+  cancelHarnessTurn,
+  useAgentAdapters,
+  useEnqueueHarnessMessage,
+  useHarnessAgents,
+  useRemoveHarnessQueuedMessage,
+  useUpdateHarnessAgent,
+} from '@/entrypoints/app/agents/useAgents'
+import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
+import { AgentRail } from './AgentRail'
 import { useAgentCommandData } from './agent-command-layout'
 import { ClawChat } from './ClawChat'
+import { ConversationHeader } from './ConversationHeader'
 import { ConversationInput } from './ConversationInput'
 import {
  buildChatHistoryFromClawMessages,
  filterTurnsPersistedInHistory,
  flattenHistoryPages,
 } from './claw-chat-types'
+import { QueuePanel } from './QueuePanel'
 import { useAgentConversation } from './useAgentConversation'
 import { useHarnessChatHistory } from './useHarnessChatHistory'

-function StatusBadge({ status }: { status: string }) {
-  return (
-    <div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
-      <span
-        className={cn(
-          'size-1.5 rounded-full',
-          status === 'Working on your request'
-            ? 'bg-amber-500'
-            : status === 'Ready'
-              ? 'bg-emerald-500'
-              : status === 'Offline'
-                ? 'bg-muted-foreground/50'
-                : 'bg-[var(--accent-orange)]',
-        )}
-      />
-      <span>{status}</span>
-    </div>
-  )
-}
-
-function AgentIdentity({
-  name,
-  meta,
-  className,
-}: {
-  name: string
-  meta: string
-  className?: string
-}) {
-  return (
-    <div className={cn('min-w-0', className)}>
-      <div className="truncate font-semibold text-[15px] leading-5">{name}</div>
-      <div className="truncate text-muted-foreground text-xs leading-5">
-        {meta}
-      </div>
-    </div>
-  )
-}
-
-function ConversationHeader({
-  agentName,
-  agentMeta,
-  status,
-  backLabel,
-  backTarget,
-  onGoHome,
-}: {
-  agentName: string
-  agentMeta: string
-  status: string
-  backLabel: string
-  backTarget: 'home' | 'page'
-  onGoHome: () => void
-}) {
-  const BackIcon = backTarget === 'home' ? Home : ArrowLeft
-
-  return (
-    <div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
-      <div className="flex min-w-0 items-center gap-3">
-        <Button
-          variant="ghost"
-          size="icon"
-          onClick={onGoHome}
-          className="size-8 rounded-xl lg:hidden"
-          title={backLabel}
-        >
-          <BackIcon className="size-4" />
-        </Button>
-        <div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
-          <Bot className="size-4" />
-        </div>
-        <AgentIdentity name={agentName} meta={agentMeta} />
-      </div>
-
-      <StatusBadge status={status} />
-    </div>
-  )
-}
-
-function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
-  return (
-    <div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
-      <div className="flex min-w-0 items-center gap-3">
-        <Button
-          variant="ghost"
-          size="icon"
-          onClick={onGoHome}
-          className="size-8 rounded-xl"
-          title="Back to home"
-        >
-          <ArrowLeft className="size-4" />
-        </Button>
-        <div className="truncate font-semibold text-[15px] leading-5">
-          Agents
-        </div>
-      </div>
-    </div>
-  )
-}
-
-function AgentRailList({
-  activeAgentId,
-  agents,
-  onSelectAgent,
-}: {
-  activeAgentId: string
-  agents: AgentEntry[]
-  onSelectAgent: (entry: AgentEntry) => void
-}) {
-  return (
-    <aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
-      <div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
-        {agents.map((entry) => {
-          const active = entry.agentId === activeAgentId
-          const modelName = getAgentEntryMeta(entry)
-
-          return (
-            <button
-              key={entry.agentId}
-              type="button"
-              onClick={() => onSelectAgent(entry)}
-              className={cn(
-                'w-full rounded-2xl border px-3 py-3 text-left transition-all',
-                active
-                  ? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
-                  : 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
-              )}
-            >
-              <div className="flex items-center gap-3">
-                <div
-                  className={cn(
-                    'flex size-9 items-center justify-center rounded-xl',
-                    active
-                      ? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
-                      : 'bg-muted text-muted-foreground',
-                  )}
-                >
-                  <Bot className="size-4" />
-                </div>
-                <AgentIdentity name={entry.name} meta={modelName} />
-              </div>
-            </button>
-          )
-        })}
-      </div>
-    </aside>
-  )
-}
-
-function getAgentEntryMeta(agent: AgentEntry | undefined): string {
-  if (agent?.source === 'agent-harness') {
-    return getModelDisplayName(agent.model) ?? 'ACP agent'
-  }
-  return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
-}
-
 function AgentConversationController({
  agentId,
  initialMessage,
@@ -212,15 +68,33 @@ function AgentConversationController({
    [historyMessages],
  )

+  // The listing query supplies queue and active-turn state for this
+  // agent. We already poll it every 5s for the rail; reusing the same
+  // cache keeps cross-tab queue state in sync without a second poll.
+  const { harnessAgents } = useHarnessAgents()
+  const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
+  const queue = harnessAgent?.queue ?? []
+  const activeTurnId = harnessAgent?.activeTurnId ?? null
+
  const { turns, streaming, send } = useAgentConversation(agentId, {
    runtime: 'agent-harness',
    sessionKey: null,
    history: chatHistory,
+    activeTurnId,
    onComplete: () => {
      void harnessHistoryQuery.refetch()
    },
    onSessionKeyChange: () => {},
  })
+  const enqueueMessage = useEnqueueHarnessMessage()
+  const removeQueuedMessage = useRemoveHarnessQueuedMessage()
+
+  const handleStop = () => {
+    void cancelHarnessTurn(agentId, {
+      turnId: activeTurnId ?? undefined,
+      reason: 'user pressed stop',
+    })
+  }
  const visibleTurns = useMemo(
    () => filterTurnsPersistedInHistory(turns, historyMessages),
    [historyMessages, turns],
@@ -264,7 +138,7 @@ function AgentConversationController({
  }

  return (
-    <div className="flex min-h-0 flex-col overflow-hidden">
+    <div className="flex min-h-0 flex-1 flex-col overflow-hidden">
      <ClawChat
        agentName={agentName}
        historyMessages={historyMessages}
@@ -281,7 +155,15 @@ function AgentConversationController({
      />

      <div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
-        <div className="mx-auto max-w-3xl">
+        <div className="mx-auto max-w-3xl space-y-3">
+          {queue.length > 0 ? (
+            <QueuePanel
+              queue={queue}
+              onRemove={(messageId) =>
+                removeQueuedMessage.mutate({ agentId, messageId })
+              }
+            />
+          ) : null}
          <ConversationInput
            variant="conversation"
            agents={agents}
@@ -296,14 +178,31 @@ function AgentConversationController({
                name: a.name,
                dataUrl: a.dataUrl,
              }))
+              // When the agent already has an in-flight turn, route
+              // the new message into the durable queue instead of
+              // starting a parallel turn. The queue drains on its own
+              // as soon as the active turn ends.
+              if (streaming || activeTurnId) {
+                enqueueMessage.mutate({
+                  agentId,
+                  message: input.text,
+                  attachments,
+                })
+                return
+              }
              void send({ text: input.text, attachments, attachmentPreviews })
            }}
            onCreateAgent={() => navigate(createAgentPath)}
+            onStop={handleStop}
            streaming={streaming}
            disabled={disabled}
            status="running"
            attachmentsEnabled={true}
-            placeholder={`Message ${agentName}...`}
+            placeholder={
+              streaming
+                ? `Type to queue another message for ${agentName}...`
+                : `Message ${agentName}...`
+            }
          />
        </div>
      </div>
@@ -318,6 +217,22 @@ interface AgentCommandConversationProps {
  createAgentPath?: string
 }

+function inferAdapterFromEntry(
+  entry: AgentEntry | undefined,
+): HarnessAgentAdapter | 'unknown' {
+  if (!entry) return 'unknown'
+  if (entry.source === 'agent-harness') {
+    // Harness entries don't carry the adapter on AgentEntry; the rail
+    // / header read the harness record directly. This branch only runs
+    // before the harness query resolves, so 'unknown' is correct — the
+    // tile's bot fallback renders until data arrives.
+    return 'unknown'
+  }
+  // OpenClaw-only entries (no harness shadow) are deprecated in
+  // practice but the rail still tolerates them.
+  return 'openclaw'
+}
+
 export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
  variant = 'command',
  backPath = '/home',
@@ -328,60 +243,110 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
  const [searchParams, setSearchParams] = useSearchParams()
  const navigate = useNavigate()
  const { agents } = useAgentCommandData()
+  const { harnessAgents } = useHarnessAgents()
+  const { adapters } = useAgentAdapters()
+  const updateAgent = useUpdateHarnessAgent()
+
  const shouldRedirectHome = !agentId
  const resolvedAgentId = agentId ?? ''
-  const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
-  const agentName = agent?.name || resolvedAgentId || 'Agent'
-  const agentMeta = getAgentEntryMeta(agent)
+  const harnessAgent = harnessAgents.find(
+    (entry) => entry.id === resolvedAgentId,
+  )
+  const entry = agents.find((item) => item.agentId === resolvedAgentId)
+  const fallbackName = entry?.name || resolvedAgentId || 'Agent'
+  const fallbackAdapter = inferAdapterFromEntry(entry)
  const initialMessage = searchParams.get('q')
  const isPageVariant = variant === 'page'
  const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'

+  const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
+    const adapterId = harnessAgent?.adapter
+    if (!adapterId) return null
+    const descriptor = adapters.find((item) => item.id === adapterId)
+    if (!descriptor?.health) return null
+    return {
+      healthy: descriptor.health.healthy,
+      reason: descriptor.health.reason,
+    }
+  }, [adapters, harnessAgent?.adapter])
+
  if (shouldRedirectHome) {
    return <Navigate to="/home" replace />
  }

-  const handleSelectAgent = (entry: AgentEntry) => {
-    navigate(`${agentPathPrefix}/${entry.agentId}`)
+  const handleSelectHarnessAgent = (target: HarnessAgent) => {
+    navigate(`${agentPathPrefix}/${target.id}`)
  }

-  // Every visible agent runs through the harness now, so per-agent
-  // runtime status doesn't gate chat the way OpenClaw's legacy
-  // gateway lifecycle did. Show "Ready" once the agent record is
-  // resolved from the rail, "Setup" otherwise.
-  const statusCopy = agent ? 'Ready' : 'Setup'
+  const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
+    if (!target) return
+    updateAgent.mutate({
+      agentId: target.id,
+      patch: { pinned: next },
+    })
+  }

  return (
    <div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
-      <div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
-        <AgentRailHeader onGoHome={() => navigate(backPath)} />
+      <div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
+        {/* Shared top band — the rail's "Agents" header and the chat
+            header live on one row so they're aligned by construction. */}
+        <div className="flex shrink-0 items-stretch border-border/50 border-b">
+          <div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
+            <Button
+              variant="ghost"
+              size="icon"
+              onClick={() => navigate(backPath)}
+              className="size-8 rounded-xl"
+              title="Back to home"
+            >
+              <ArrowLeft className="size-4" />
+            </Button>
+            <div className="truncate font-semibold text-[15px] leading-5">
+              Agents
+            </div>
+          </div>
+          <div className="min-w-0 flex-1">
+            <ConversationHeader
+              agent={harnessAgent ?? null}
+              fallbackName={fallbackName}
+              fallbackAdapter={fallbackAdapter}
+              adapterHealth={adapterHealth}
+              backLabel={backLabel}
+              backTarget={isPageVariant ? 'page' : 'home'}
+              onGoHome={() => navigate(backPath)}
+              onPinToggle={(next) =>
+                handlePinToggle(harnessAgent ?? null, next)
+              }
+            />
+          </div>
+        </div>

-        <ConversationHeader
-          agentName={agentName}
-          agentMeta={agentMeta}
-          status={statusCopy}
-          backLabel={backLabel}
-          backTarget={isPageVariant ? 'page' : 'home'}
-          onGoHome={() => navigate(backPath)}
-        />
+        {/* Body grid: rail list + chat. Both columns share the same
+            top edge (the band above) so headers can never drift. */}
+        <div className="grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)] lg:grid-cols-[288px_minmax(0,1fr)]">
+          <AgentRail
+            agents={harnessAgents}
+            adapters={adapters}
+            activeAgentId={resolvedAgentId}
+            onSelectAgent={handleSelectHarnessAgent}
+            onPinToggle={(target, next) => handlePinToggle(target, next)}
+          />

-        <AgentRailList
-          activeAgentId={resolvedAgentId}
-          agents={agents}
-          onSelectAgent={handleSelectAgent}
-        />
-
-        <AgentConversationController
-          key={resolvedAgentId}
-          agentId={resolvedAgentId}
-          agents={agents}
-          initialMessage={initialMessage}
-          onInitialMessageConsumed={() =>
-            setSearchParams({}, { replace: true })
-          }
-          agentPathPrefix={agentPathPrefix}
-          createAgentPath={createAgentPath}
-        />
+          <div className="flex h-full min-h-0 flex-col overflow-hidden">
+            <AgentConversationController
+              key={resolvedAgentId}
+              agentId={resolvedAgentId}
+              agents={agents}
+              initialMessage={initialMessage}
+              onInitialMessageConsumed={() =>
+                setSearchParams({}, { replace: true })
+              }
+              agentPathPrefix={agentPathPrefix}
+              createAgentPath={createAgentPath}
+            />
+          </div>
+        </div>
      </div>
    </div>
  )
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
@@ -1,18 +1,25 @@
 import { Plus } from 'lucide-react'
-import { type FC, useEffect, useState } from 'react'
+import { type FC, useEffect, useMemo, useState } from 'react'
 import { useNavigate } from 'react-router'
 import { Button } from '@/components/ui/button'
 import { Card, CardContent } from '@/components/ui/card'
 import { Separator } from '@/components/ui/separator'
+import type {
+  HarnessAdapterDescriptor,
+  HarnessAgent,
+} from '@/entrypoints/app/agents/agent-harness-types'
+import {
+  useAgentAdapters,
+  useHarnessAgents,
+} from '@/entrypoints/app/agents/useAgents'
 import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
 import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
 import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
 import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
-import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { AgentCardDock } from './AgentCardDock'
 import { useAgentCommandData } from './agent-command-layout'
 import { ConversationInput } from './ConversationInput'
-import { buildAgentCardData } from './useAgentCardData'
+import { orderHomeAgents } from './home-agent-card.helpers'

 function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
  return (
@@ -38,11 +45,13 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
 function RecentThreads({
  activeAgentId,
  agents,
+  adapters,
  onOpenAgents,
  onSelectAgent,
 }: {
  activeAgentId?: string | null
-  agents: AgentCardData[]
+  agents: HarnessAgent[]
+  adapters: HarnessAdapterDescriptor[]
  onOpenAgents: () => void
  onSelectAgent: (agentId: string) => void
 }) {
@@ -68,6 +77,7 @@ function RecentThreads({
      </div>
      <AgentCardDock
        agents={agents}
+        adapters={adapters}
        activeAgentId={activeAgentId ?? undefined}
        onSelectAgent={onSelectAgent}
        onCreateAgent={onOpenAgents}
@@ -79,25 +89,32 @@ function RecentThreads({
 export const AgentCommandHome: FC = () => {
  const navigate = useNavigate()
  const activeHint = useActiveHint()
-  const { agents, status } = useAgentCommandData()
+  // The conversation input still consumes the merged AgentEntry list
+  // from the layout context (handles legacy /claw/agents entries that
+  // haven't yet been backfilled into the harness store). The Recent
+  // Agents grid below reads the richer harness payload directly.
+  const { agents: legacyAgents, status } = useAgentCommandData()
+  const { harnessAgents } = useHarnessAgents()
+  const { adapters } = useAgentAdapters()
  const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
-  const cardData = buildAgentCardData(agents, status?.status, undefined)
+
+  const orderedAgents = useMemo(
+    () => orderHomeAgents(harnessAgents),
+    [harnessAgents],
+  )

  useEffect(() => {
-    if (agents.length === 0) {
-      if (selectedAgentId) {
-        setSelectedAgentId(null)
-      }
+    if (legacyAgents.length === 0) {
+      if (selectedAgentId) setSelectedAgentId(null)
      return
    }
-
    if (
      !selectedAgentId ||
-      !agents.some((agent) => agent.agentId === selectedAgentId)
+      !legacyAgents.some((agent) => agent.agentId === selectedAgentId)
    ) {
-      setSelectedAgentId(agents[0].agentId)
+      setSelectedAgentId(legacyAgents[0].agentId)
    }
-  }, [agents, selectedAgentId])
+  }, [legacyAgents, selectedAgentId])

  const handleSend = (input: { text: string }) => {
    if (!selectedAgentId) return
@@ -110,7 +127,7 @@ export const AgentCommandHome: FC = () => {
    setSelectedAgentId(agent.agentId)
  }

-  const selectedAgent = agents.find(
+  const selectedAgent = legacyAgents.find(
    (agent) => agent.agentId === selectedAgentId,
  )
  const selectedAgentReady = selectedAgent
@@ -118,13 +135,15 @@ export const AgentCommandHome: FC = () => {
    : false
  const selectedAgentStatus =
    selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
-  const selectedCard =
-    cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]
+  const selectedAgentName =
+    selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
+
+  const hasAgents = legacyAgents.length > 0

  return (
    <div className="min-h-full px-4 py-6">
      <div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
-        {cardData.length > 0 ? (
+        {hasAgents ? (
          <>
            <div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
              <div className="space-y-3">
@@ -140,7 +159,7 @@ export const AgentCommandHome: FC = () => {
              <div className="w-full max-w-3xl">
                <ConversationInput
                  variant="home"
-                  agents={agents}
+                  agents={legacyAgents}
                  selectedAgentId={selectedAgentId}
                  onSelectAgent={handleSelectAgent}
                  onSend={handleSend}
@@ -151,7 +170,7 @@ export const AgentCommandHome: FC = () => {
                  attachmentsEnabled={false}
                  placeholder={
                    selectedAgentReady
-                      ? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
+                      ? `Ask ${selectedAgentName} to handle a task...`
                      : 'Agent runtime is not running...'
                  }
                />
@@ -162,7 +181,8 @@ export const AgentCommandHome: FC = () => {

            <RecentThreads
              activeAgentId={selectedAgentId}
-              agents={cardData}
+              agents={orderedAgents}
+              adapters={adapters}
              onOpenAgents={() => navigate('/agents')}
              onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
            />
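The selection effect in `AgentCommandHome` reduces to a small pure rule: clear the selection when there are no agents, keep it while it still points at a listed agent, and otherwise fall back to the first entry. A minimal sketch, assuming only that each legacy entry exposes an `agentId`. The helper name `reconcileSelectedAgentId` is hypothetical and is not part of this PR:

```typescript
// Hypothetical helper: a pure restatement of the useEffect in
// AgentCommandHome that keeps the home composer bound to a live agent.
interface LegacyAgentRef {
  agentId: string
}

function reconcileSelectedAgentId(
  agents: readonly LegacyAgentRef[],
  selectedAgentId: string | null,
): string | null {
  // No agents: clear any stale selection.
  if (agents.length === 0) return null
  // Keep the current selection while it still points at a listed agent.
  if (
    selectedAgentId &&
    agents.some((agent) => agent.agentId === selectedAgentId)
  ) {
    return selectedAgentId
  }
  // Otherwise fall back to the first agent in list order.
  return agents[0].agentId
}
```

With a helper like this, the effect body could become `setSelectedAgentId(reconcileSelectedAgentId(legacyAgents, selectedAgentId))`. React bails out of the update when the returned id is unchanged.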
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRail.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRail.tsx
@@ -0,0 +1,65 @@
+import { type FC, useMemo } from 'react'
+import type {
+  HarnessAdapterDescriptor,
+  HarnessAgent,
+  HarnessAgentAdapter,
+} from '@/entrypoints/app/agents/agent-harness-types'
+import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
+import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
+import { AgentRailRow } from './AgentRailRow'
+
+interface AgentRailProps {
+  agents: HarnessAgent[]
+  adapters: HarnessAdapterDescriptor[]
+  activeAgentId: string
+  onSelectAgent: (agent: HarnessAgent) => void
+  onPinToggle: (agent: HarnessAgent, next: boolean) => void
+}
+
+/**
+ * Left-column scrollable list of agents. The "Agents" label + back
+ * button live in the shared top band above (so the rail header and
+ * the chat header sit on a single aligned strip rather than as two
+ * separately-sized headers per column). Sort matches `/agents`:
+ * pinned-first → recency, so the rail doesn't reshuffle as turns
+ * transition every 5 s.
+ */
+export const AgentRail: FC<AgentRailProps> = ({
+  agents,
+  adapters,
+  activeAgentId,
+  onSelectAgent,
+  onPinToggle,
+}) => {
+  const adapterHealth = useMemo(() => {
+    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
+    for (const adapter of adapters) {
+      if (adapter.health) {
+        map.set(adapter.id, {
+          healthy: adapter.health.healthy,
+          reason: adapter.health.reason,
+        })
+      }
+    }
+    return map
+  }, [adapters])
+
+  const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])
+
+  return (
+    <aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
+      <div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
+        {ordered.map((agent) => (
+          <AgentRailRow
+            key={agent.id}
+            agent={agent}
+            active={agent.id === activeAgentId}
+            adapterHealth={adapterHealth.get(agent.adapter) ?? null}
+            onSelect={() => onSelectAgent(agent)}
+            onPinToggle={(next) => onPinToggle(agent, next)}
+          />
+        ))}
+      </div>
+    </aside>
+  )
+}
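`orderAgentsByPinThenRecency` is imported from `agents-list-order`, and its body is not part of this diff. Going only by the docstring (pinned first, then most recent), a comparator of that shape might look like the following. The field names `pinned` and `lastUsedAt` match how `HarnessAgent` is read elsewhere in this PR. Treating a missing timestamp as oldest is an assumption, and `orderByPinThenRecency` is a hypothetical name:

```typescript
// Hypothetical sketch of a pinned-first, then most-recent-first ordering.
// It is NOT the actual orderAgentsByPinThenRecency implementation.
interface OrderableAgent {
  id: string
  pinned?: boolean
  lastUsedAt?: number | null
}

function orderByPinThenRecency<T extends OrderableAgent>(
  agents: readonly T[],
): T[] {
  // Copy first: Array.prototype.sort mutates, and the input is React state.
  return [...agents].sort((a, b) => {
    // Pinned agents (true -> 1) sort ahead of unpinned ones (false -> 0).
    const pinDelta = Number(b.pinned ?? false) - Number(a.pinned ?? false)
    if (pinDelta !== 0) return pinDelta
    // Within a group, newest lastUsedAt first; missing timestamps sort last.
    return (b.lastUsedAt ?? 0) - (a.lastUsedAt ?? 0)
  })
}
```

Because the sort key ignores `status`, a Working/Idle flip does not move a row. That is the stability property the `AgentRail` docstring asks for. Ties keep their input order because `Array.prototype.sort` is stable as of ES2019.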
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRailRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRailRow.tsx
@@ -0,0 +1,102 @@
+import type { FC } from 'react'
+import { Badge } from '@/components/ui/badge'
+import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
+import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
+import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
+import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
+import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
+import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
+import { cn } from '@/lib/utils'
+
+interface AgentRailRowProps {
+  agent: HarnessAgent
+  active: boolean
+  adapterHealth: AgentAdapterHealth | null
+  onSelect: () => void
+  onPinToggle: (next: boolean) => void
+}
+
+/**
+ * Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
+ * down to the essentials that fit a ~280 px rail: tile + name + status
+ * badge + pin star, with the adapter / model / reasoning chips on a
+ * second line. Token totals, sparkline, last-message preview all stay
+ * on the `/agents` page where rows are full-width.
+ */
+export const AgentRailRow: FC<AgentRailRowProps> = ({
+  agent,
+  active,
+  adapterHealth,
+  onSelect,
+  onPinToggle,
+}) => {
+  const status = agent.status ?? 'unknown'
+  const lastUsedAt = agent.lastUsedAt ?? null
+  const pinned = agent.pinned ?? false
+  return (
+    <button
+      type="button"
+      onClick={onSelect}
+      className={cn(
+        'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
+        active
+          ? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
+          : 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
+      )}
+    >
+      <div className="flex min-w-0 items-start gap-3">
+        <AgentTile
+          adapter={agent.adapter}
+          status={status}
+          lastUsedAt={lastUsedAt}
+        />
+        <div className="min-w-0 flex-1">
+          <div className="flex items-center gap-1.5">
+            <span className="truncate font-semibold text-[14px] leading-5">
+              {agent.name}
+            </span>
+            {status === 'working' && (
+              <Badge
+                variant="secondary"
+                className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
+              >
+                Working
+              </Badge>
+            )}
+            {status === 'asleep' && (
+              <Badge
+                variant="outline"
+                className="h-5 px-1.5 text-[10px] text-muted-foreground"
+              >
+                Asleep
+              </Badge>
+            )}
+            {status === 'error' && (
+              <Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
+                Attention
+              </Badge>
+            )}
+            <div className="ml-auto">
+              <PinToggle pinned={pinned} onToggle={onPinToggle} />
+            </div>
+          </div>
+          <AgentSummaryChips
+            adapter={agent.adapter}
+            modelLabel={agent.modelId ?? null}
+            reasoningEffort={agent.reasoningEffort ?? null}
+            adapterHealth={adapterHealth}
+          />
+        </div>
+      </div>
+    </button>
+  )
+}
+
+/**
+ * Tooltip-only label helper kept exported in case the tile row needs to
+ * show "Codex agent" or similar in a future state. Inlined fallback for
+ * the rare `unknown` adapter rendering path.
+ */
+export function railRowAdapterLabel(agent: HarnessAgent): string {
+  return adapterLabel(agent.adapter)
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationHeader.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationHeader.tsx
@@ -0,0 +1,179 @@
+import { ArrowLeft, Home } from 'lucide-react'
+import type { FC } from 'react'
+import { Badge } from '@/components/ui/badge'
+import { Button } from '@/components/ui/button'
+import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
+import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
+import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
+import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
+import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
+import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
+import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
+import { cn } from '@/lib/utils'
+
+interface ConversationHeaderProps {
+  agent: HarnessAgent | null
+  fallbackName: string
+  fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'unknown'
+  adapterHealth: AgentAdapterHealth | null
+  backLabel: string
+  backTarget: 'home' | 'page'
+  onGoHome: () => void
+  onPinToggle: (next: boolean) => void
+}
+
+/**
+ * Strip above the chat. Mirrors the `/agents` row card's title row +
+ * summary chips so the user gets adapter health, pin state, and status
+ * at a glance — but adds the meta line (last used · lifetime tokens ·
+ * queued) that's specific to this surface.
+ *
+ * The mobile `lg:hidden` Back button is preserved so the small-screen
+ * collapse keeps a navigable header without a sidebar.
+ */
+export const ConversationHeader: FC<ConversationHeaderProps> = ({
+  agent,
+  fallbackName,
+  fallbackAdapter,
+  adapterHealth,
+  backLabel,
+  backTarget,
+  onGoHome,
+  onPinToggle,
+}) => {
+  const BackIcon = backTarget === 'home' ? Home : ArrowLeft
+  const adapter = agent?.adapter ?? fallbackAdapter
+  const status: AgentLiveness = agent?.status ?? 'unknown'
+  const lastUsedAt = agent?.lastUsedAt ?? null
+  const pinned = agent?.pinned ?? false
+  const queueCount = agent?.queue?.length ?? 0
+  const tokens = agent?.tokens ?? null
+  const lifetimeTotal = tokens
+    ? tokens.cumulative.input + tokens.cumulative.output
+    : 0
+
+  const metaParts: string[] = []
+  if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
+  if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
+  if (queueCount > 0) {
+    metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
+  }
+
+  return (
+    <div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
+      <div className="flex min-w-0 items-center gap-3">
+        <Button
+          variant="ghost"
+          size="icon"
+          onClick={onGoHome}
+          className="size-8 shrink-0 rounded-xl lg:hidden"
+          title={backLabel}
+        >
+          <BackIcon className="size-4" />
+        </Button>
+        <div className="group min-w-0 flex-1">
+          <div className="flex items-center gap-2">
+            <span className="truncate font-semibold text-[15px] leading-6">
+              {agent?.name || fallbackName}
+            </span>
+            {agent ? (
+              <PinToggle pinned={pinned} onToggle={onPinToggle} />
+            ) : null}
+          </div>
+          <div className="mt-0.5 flex items-center gap-2">
+            <AgentSummaryChips
+              adapter={adapter}
+              modelLabel={agent?.modelId ?? null}
+              reasoningEffort={agent?.reasoningEffort ?? null}
+              adapterHealth={adapterHealth}
+            />
+          </div>
+        </div>
+      </div>
+      <div className="flex shrink-0 flex-col items-end gap-1">
+        <StatusPill
+          status={status}
+          hasActiveTurn={Boolean(agent?.activeTurnId)}
+        />
+        <div className="flex h-4 items-center text-[11px] text-muted-foreground">
+          <span className="truncate">
+            {metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
+          </span>
+        </div>
+      </div>
+    </div>
+  )
+}
+
+interface StatusPillProps {
+  status: AgentLiveness
+  hasActiveTurn: boolean
+}
+
+/**
+ * Working / Asleep / Attention each get distinctive styling; idle keeps
+ * the legacy emerald `Ready` pill so the default state stays calm, and
+ * any other status falls back to a muted `Setup` pill. Defensively, an
+ * `idle` agent with an `activeTurnId` renders as Working (turn in flight).
+ */
+const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
+  const effective: AgentLiveness =
+    status === 'idle' && hasActiveTurn ? 'working' : status
+
+  const base =
+    'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'
+
+  if (effective === 'working') {
+    return (
+      <Badge
+        variant="secondary"
+        className={cn(
+          base,
+          'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
+        )}
+      >
+        <span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
+        Working
+      </Badge>
+    )
+  }
+  if (effective === 'asleep') {
+    return (
+      <Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
+        <span className="size-1.5 rounded-full bg-muted-foreground/50" />
+        Asleep
+      </Badge>
+    )
+  }
+  if (effective === 'error') {
+    return (
+      <Badge
+        variant="destructive"
+        className={cn(base, 'border-destructive/30')}
+      >
+        <span className="size-1.5 rounded-full bg-destructive-foreground" />
+        Attention
+      </Badge>
+    )
+  }
+  if (effective === 'idle') {
+    return (
+      <Badge
+        variant="outline"
+        className={cn(
+          base,
+          'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
+        )}
+      >
+        <span className="size-1.5 rounded-full bg-emerald-500" />
+        Ready
+      </Badge>
+    )
+  }
+  return (
+    <Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
+      <span className="size-1.5 rounded-full bg-muted-foreground/30" />
+      Setup
+    </Badge>
+  )
+}
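`StatusPill` first derives an effective status and then picks one of five labels. The same decision table can be written as data. A small sketch follows, assuming `AgentLiveness` is exactly the five values this file branches on; the real union is exported from `LivenessDot` and is not shown here. `effectiveLiveness` and `pillLabel` are hypothetical names:

```typescript
// Assumed shape of AgentLiveness; the real type is exported from LivenessDot.
type AgentLiveness = 'idle' | 'working' | 'asleep' | 'error' | 'unknown'

// Mirrors StatusPill: an idle agent with a turn in flight is shown as working.
function effectiveLiveness(
  status: AgentLiveness,
  hasActiveTurn: boolean,
): AgentLiveness {
  return status === 'idle' && hasActiveTurn ? 'working' : status
}

// Label per effective status; 'unknown' falls through to the Setup pill.
const PILL_LABEL: Record<AgentLiveness, string> = {
  working: 'Working',
  asleep: 'Asleep',
  error: 'Attention',
  idle: 'Ready',
  unknown: 'Setup',
}

function pillLabel(status: AgentLiveness, hasActiveTurn: boolean): string {
  return PILL_LABEL[effectiveLiveness(status, hasActiveTurn)]
}
```

Only `idle` is upgraded. An `asleep` or `error` agent with an active turn keeps its own pill, which matches the component's branch order.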
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
@@ -54,25 +54,40 @@ interface ConversationInputProps {
  placeholder?: string
  attachmentsEnabled?: boolean
  variant?: 'home' | 'conversation'
+  /**
+   * When set, a Stop button appears to the left of the voice mic
+   * while `streaming === true`; clicking it cancels the active turn
+   * server-side via the chat-cancel endpoint, and the composer becomes
+   * queue-aware. Absent → no Stop button (legacy home-composer behaviour).
+   */
+  onStop?: () => void
 }

 function InputActionButton({
  disabled,
  onClick,
  streaming,
+  hasContent,
 }: {
  disabled: boolean
  onClick: () => void
  streaming: boolean
+  hasContent: boolean
 }) {
+  // Show the spinner while streaming only when there's nothing to
+  // send. Once the user types something, the icon flips back to the
+  // send arrow so it reads as "queue this message" instead of
+  // "still working".
+  const showSpinner = streaming && !hasContent
  return (
    <Button
      onClick={onClick}
      size="icon"
      disabled={disabled}
+      title={streaming && hasContent ? 'Queue message' : undefined}
      className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
    >
-      {streaming ? (
+      {showSpinner ? (
        <Loader2 className="h-5 w-5 animate-spin" />
      ) : (
        <ArrowRight className="h-5 w-5" />
@@ -81,6 +96,22 @@ function InputActionButton({
  )
 }

+function StopButton({ onStop }: { onStop: () => void }) {
+  return (
+    <Button
+      type="button"
+      size="icon"
+      variant="ghost"
+      onClick={onStop}
+      title="Stop current turn — queued messages will start next."
+      aria-label="Stop current turn"
+      className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
+    >
+      <Square className="h-3.5 w-3.5 fill-current" />
+    </Button>
+  )
+}
+
 function VoiceButton({
  isRecording,
  isTranscribing,
@@ -299,6 +330,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  placeholder,
  attachmentsEnabled = true,
  variant = 'conversation',
+  onStop,
 }) => {
  const [input, setInput] = useState('')
  const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -379,10 +411,17 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  }

  const hasContent = input.trim().length > 0 || attachments.length > 0
+  // Queue-aware composers (the conversation panel passes `onStop`)
+  // accept input while streaming — the parent decides whether the
+  // submission opens a new turn or enqueues onto the active one.
+  // Surfaces without a Stop hook (home) keep the legacy behaviour
+  // and block input until the current turn finishes.
+  const queueAware = Boolean(onStop)

  const handleSend = () => {
    const text = input.trim()
-    if (disabled || isStaging || streaming) return
+    if (disabled || isStaging) return
+    if (streaming && !queueAware) return
    if (!text && attachments.length === 0) return
    onSend({ text, attachments })
    setInput('')
@@ -512,6 +551,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              )}
            />
          </div>
+          {streaming && onStop ? <StopButton onStop={onStop} /> : null}
          <VoiceButton
            isRecording={voice.isRecording}
            isTranscribing={voice.isTranscribing}
@@ -529,12 +569,13 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              !!disabled ||
              voice.isRecording ||
              voice.isTranscribing ||
-              streaming
+              (streaming && !queueAware)
            }
            onClick={handleSend}
            // Spinner stays the user-facing "agent is busy" hint; with the
            // queue active we still spin while a turn is in flight.
            streaming={streaming}
+            hasContent={hasContent}
          />
        </div>
        {voice.error ? (
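The composer now combines several flags to decide whether a submit goes through and which icon the action button shows. The sketch below models those rules as pure functions. It reuses the component's own names (`disabled`, `isStaging`, `streaming`, `hasContent`) and adds a hypothetical `queueAware` flag that stands in for `Boolean(onStop)`. Voice recording state also disables the button, but it is left out for brevity:

```typescript
// Hypothetical pure model of the ConversationInput send rules.
interface ComposerState {
  disabled: boolean
  isStaging: boolean
  streaming: boolean
  // True when the parent passed onStop (the conversation panel).
  queueAware: boolean
  // input.trim().length > 0 || attachments.length > 0
  hasContent: boolean
}

// Mirrors handleSend's early returns.
function canSubmit(state: ComposerState): boolean {
  if (state.disabled || state.isStaging) return false
  // Without a Stop hook (home composer), input is blocked mid-turn.
  if (state.streaming && !state.queueAware) return false
  return state.hasContent
}

// Mirrors InputActionButton: spin only while streaming with nothing typed.
function showSpinner(
  state: Pick<ComposerState, 'streaming' | 'hasContent'>,
): boolean {
  return state.streaming && !state.hasContent
}
```

In the conversation panel, a message typed mid-turn is therefore submitted and handed to the parent, which decides whether to enqueue it. On the home composer the same submit is ignored until the turn ends.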
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
@@ -0,0 +1,243 @@
+import { Quote, TriangleAlert } from 'lucide-react'
+import type { FC } from 'react'
+import { Badge } from '@/components/ui/badge'
+import {
+  HoverCard,
+  HoverCardContent,
+  HoverCardTrigger,
+} from '@/components/ui/hover-card'
+import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
+import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
+import type {
+  HarnessAdapterHealth,
+  HarnessAgent,
+  HarnessAgentAdapter,
+} from '@/entrypoints/app/agents/agent-harness-types'
+import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
+import {
+  firstNonBlankLine,
+  truncate,
+} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
+import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
+import { cn } from '@/lib/utils'
+
+interface HomeAgentCardProps {
+  agent: HarnessAgent
+  adapter: HarnessAgentAdapter | 'unknown'
+  /** Per-adapter health snapshot, shared across cards rendering the
+   *  same adapter. `null` when the /adapters response hasn't surfaced
+   *  health yet (we treat that as healthy until proven otherwise). */
+  adapterHealth: HarnessAdapterHealth | null
+  /** Highlights the card with an accent ring; tells the user which
+   *  agent the conversation input is bound to. */
+  active?: boolean
+  onClick: () => void
+}
+
+const PREVIEW_CHARS = 100
+
+/**
+ * Grid-shaped card for the /home Recent agents section. Composition
+ * mirrors the `/agents` page's `AgentRowCard`, but the layout is a
+ * vertical column sized for a 1/3-width tile rather than a full-width
+ * row.
+ *
+ * Reuses `<AgentTile>`, `formatRelativeTime`, `firstNonBlankLine`,
+ * `truncate`, and the inline `Unavailable` chip pattern so the visual
+ * language stays continuous between the agent list and this grid.
+ */
+export const HomeAgentCard: FC<HomeAgentCardProps> = ({
+  agent,
+  adapter,
+  adapterHealth,
+  active,
+  onClick,
+}) => {
+  const status = agent.status ?? 'unknown'
+  const lastUsedAt = agent.lastUsedAt ?? null
+  const isWorking = status === 'working'
+  const isAsleep = status === 'asleep'
+  const isError = status === 'error'
+  const hasActiveTurn = Boolean(agent.activeTurnId)
+
+  return (
+    <button
+      type="button"
+      onClick={onClick}
+      className={cn(
+        'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
+        active && 'ring-1 ring-[var(--accent-orange)]/30',
+        isWorking
+          ? 'border-[var(--accent-orange)]/40'
+          : isError
+            ? 'border-destructive/30'
+            : 'border-border/60 hover:border-[var(--accent-orange)]/30',
+      )}
+    >
+      <div className="flex items-start gap-3">
+        <AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
+        <div className="min-w-0 flex-1">
+          <div className="flex items-center gap-1.5">
+            <span className="truncate font-semibold text-sm">
+              {displayName(agent)}
+            </span>
+            {isWorking && (
+              <Badge
+                variant="secondary"
+                className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
+              >
+                Working
+              </Badge>
+            )}
+          </div>
+          <SummaryLine
+            adapter={adapter}
+            modelId={agent.modelId ?? null}
+            reasoningEffort={agent.reasoningEffort ?? null}
+            adapterHealth={adapterHealth}
+          />
+        </div>
+      </div>
+
+      <LastMessage message={agent.lastUserMessage ?? null} />
+
+      <div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
+        <span>{statusFootnote(status, lastUsedAt)}</span>
+        {hasActiveTurn ? (
+          <ResumeChip />
+        ) : isAsleep ? (
+          <Badge variant="outline" className="text-muted-foreground">
+            Asleep
+          </Badge>
+        ) : isError ? (
+          <ErrorChip lastError={agent.lastError ?? null} />
+        ) : null}
+      </div>
+    </button>
+  )
+}
+
+const SummaryLine: FC<{
+  adapter: HarnessAgentAdapter | 'unknown'
+  modelId: string | null
+  reasoningEffort: string | null
+  adapterHealth: HarnessAdapterHealth | null
+}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
+  const parts = [adapterLabel(adapter)]
+  if (modelId) parts.push(modelId)
+  if (reasoningEffort) parts.push(reasoningEffort)
+  const unhealthy = adapterHealth?.healthy === false
+  return (
+    <div
+      className={cn(
+        'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
+        unhealthy && 'text-muted-foreground/70',
+      )}
+    >
+      <span className="truncate">{parts.join(' · ')}</span>
+      {unhealthy && (
+        <HoverCard openDelay={200}>
+          <HoverCardTrigger asChild>
+            <Badge
+              variant="outline"
+              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
+            >
+              <TriangleAlert className="size-2.5" />
+              <span className="font-normal">Unavailable</span>
+            </Badge>
+          </HoverCardTrigger>
+          <HoverCardContent side="right" className="w-72 text-sm">
+            <div className="font-medium">
+              {adapterLabel(adapter)} CLI not available
+            </div>
+            <div className="mt-1 text-muted-foreground text-xs">
+              {adapterHealth?.reason ??
+                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
+            </div>
+          </HoverCardContent>
+        </HoverCard>
+      )}
+    </div>
+  )
+}
+
+const LastMessage: FC<{ message: string | null }> = ({ message }) => {
+  if (!message) {
+    return (
+      <p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
+        No messages yet — start a chat
+      </p>
+    )
+  }
+  return (
+    <p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
+      <Quote
+        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
+        aria-hidden
+      />
+      <span className="line-clamp-2">
+        {truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
+      </span>
+    </p>
+  )
+}
+
+const ResumeChip: FC = () => (
+  <span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
+    <span className="relative flex size-1.5">
+      <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
+      <span className="relative inline-flex size-1.5 rounded-full bg-white" />
+    </span>
+    Resume
+  </span>
+)
+
+const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
+  if (!lastError) {
+    return <Badge variant="destructive">Attention</Badge>
+  }
+  return (
+    <HoverCard openDelay={200}>
+      <HoverCardTrigger asChild>
+        <Badge variant="destructive" className="cursor-default">
+          Attention
+        </Badge>
+      </HoverCardTrigger>
+      <HoverCardContent
+        side="left"
+        className="max-w-xs whitespace-pre-wrap font-mono text-xs"
+      >
+        {lastError}
+      </HoverCardContent>
+    </HoverCard>
+  )
+}
+
+/**
+ * Footer left side: relative time on every state EXCEPT working,
+ * which shows `now` (the dot is already pulsing — restating it as
+ * "Working" would duplicate the pill in the title row).
+ */
+function statusFootnote(
+  status: AgentLiveness,
+  lastUsedAt: number | null,
+): string {
+  if (status === 'working') return 'now'
+  return formatRelativeTime(lastUsedAt)
+}
+
+const UUID_PATTERN =
+  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
+const OC_UUID_PATTERN =
+  /^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
+
+function displayName(agent: HarnessAgent): string {
+  const name = agent.name?.trim()
+  const id = agent.id
+  if (!name || name === id) {
+    if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
+    if (UUID_PATTERN.test(id)) return id.slice(0, 8)
+    return id
+  }
+  return name
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
@@ -0,0 +1,94 @@
+import { ListPlus, X } from 'lucide-react'
+import type { FC } from 'react'
+import {
+  Queue,
+  QueueItem,
+  QueueItemAction,
+  QueueItemActions,
+  QueueItemAttachment,
+  QueueItemContent,
+  QueueItemFile,
+  QueueItemImage,
+  QueueList,
+  QueueSection,
+  QueueSectionContent,
+  QueueSectionLabel,
+  QueueSectionTrigger,
+} from '@/components/ai-elements/queue'
+import type {
+  HarnessQueuedMessage,
+  HarnessQueuedMessageAttachment,
+} from '@/entrypoints/app/agents/agent-harness-types'
+import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
+
+interface QueuePanelProps {
+  queue: HarnessQueuedMessage[]
+  onRemove: (messageId: string) => void
+}
+
+/**
+ * Renders the agent's pending message queue using the shared AI
+ * Elements `Queue` primitives. Returns null when the queue is empty,
+ * so callers don't need to gate render on `queue.length > 0`; the
+ * panel simply disappears between turns.
+ */
+export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
+  if (queue.length === 0) return null
+  return (
+    <Queue>
+      <QueueSection>
+        <QueueSectionTrigger>
+          <QueueSectionLabel
+            count={queue.length}
+            label={queue.length === 1 ? 'queued message' : 'queued messages'}
+            icon={<ListPlus className="size-3.5" />}
+          />
+        </QueueSectionTrigger>
+        <QueueSectionContent>
+          <QueueList>
+            {queue.map((entry) => (
+              <QueueItem key={entry.id}>
+                <div className="flex items-center gap-2">
+                  <QueueItemContent>
+                    {firstNonBlankLine(entry.message)}
+                  </QueueItemContent>
+                  <QueueItemActions>
+                    <QueueItemAction
+                      aria-label="Remove from queue"
+                      onClick={() => onRemove(entry.id)}
+                    >
+                      <X className="size-3" />
+                    </QueueItemAction>
+                  </QueueItemActions>
+                </div>
+                {entry.attachments && entry.attachments.length > 0 ? (
+                  <QueueItemAttachment>
+                    {entry.attachments.map((attachment, idx) =>
+                      renderAttachment(entry.id, attachment, idx),
+                    )}
+                  </QueueItemAttachment>
+                ) : null}
+              </QueueItem>
+            ))}
+          </QueueList>
+        </QueueSectionContent>
+      </QueueSection>
+    </Queue>
+  )
+}
+
+function renderAttachment(
+  messageId: string,
+  attachment: HarnessQueuedMessageAttachment,
+  idx: number,
+) {
+  if (attachment.mediaType.startsWith('image/')) {
+    const src = `data:${attachment.mediaType};base64,${attachment.data}`
+    return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
+  }
+  return (
+    <QueueItemFile key={`${messageId}-${idx}`}>
+      {attachment.mediaType}
+    </QueueItemFile>
+  )
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
@@ -0,0 +1,69 @@
+import { describe, expect, it } from 'bun:test'
+import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
+import { orderHomeAgents } from './home-agent-card.helpers'
+
+function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
+  return {
+    id: overrides.id ?? 'agent-x',
+    name: overrides.name ?? overrides.id ?? 'agent-x',
+    adapter: overrides.adapter ?? 'codex',
+    permissionMode: 'approve-all',
+    sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
+    createdAt: 1000,
+    updatedAt: 1000,
+    ...overrides,
+  }
+}
+
+describe('orderHomeAgents', () => {
+  it('places active-turn agents before everyone else', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'a', lastUsedAt: 5000 }),
+      agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
+      agent({ id: 'c', lastUsedAt: 7000 }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
+  })
+
+  it('orders non-active agents by lastUsedAt desc', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'old', lastUsedAt: 1000 }),
+      agent({ id: 'new', lastUsedAt: 9000 }),
+      agent({ id: 'mid', lastUsedAt: 5000 }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
+  })
+
+  it('puts the gateway `main` seed agent above other never-used agents', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
+      agent({ id: 'main', lastUsedAt: null }),
+      agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
+  })
+
+  it('sends never-used agents to the bottom even when `main` is among them', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'main', lastUsedAt: null }),
+      agent({ id: 'used', lastUsedAt: 5000 }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
+  })
+
+  it('does NOT sort by pinned — pinned agents are treated like any other', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
+      agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
+  })
+
+  it('falls back to id-stable ordering when lastUsedAt ties', () => {
+    const sorted = orderHomeAgents([
+      agent({ id: 'b', lastUsedAt: 5000 }),
+      agent({ id: 'a', lastUsedAt: 5000 }),
+    ])
+    expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
+  })
+})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
@@ -0,0 +1,42 @@
+import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
+
+/**
+ * Order for the /home Recent agents grid.
+ *
+ * 1. Active turn first — agents mid-turn float to the top so the
+ *    Resume affordance is the first thing the user sees on /home.
+ * 2. Recency (`lastUsedAt` desc); any used agent beats every never-used one.
+ * 3. Within the never-used group, the protected gateway-side `main` seed
+ *    agent sorts first on a fresh install (mirrors the rail).
+ * 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
+ *    every 5-second poll.
+ *
+ * Pin is NOT a sort key. The home grid is action-oriented and trusts
+ * recency + active-turn to surface the right agent; pinning is an
+ * organisation tool that lives on the rail at /agents.
+ */
+export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
+  return [...agents].sort((a, b) => {
+    const aActive = a.activeTurnId != null
+    const bActive = b.activeTurnId != null
+    if (aActive !== bActive) return aActive ? -1 : 1
+
+    // Recency wins outright. Never-used agents (`lastUsedAt == null`)
+    // both fall to the same `-Infinity` bucket and the seed/id rules
+    // below decide their order — but a used agent always beats any
+    // never-used agent regardless of id.
+    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
+    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
+    if (aValue !== bValue) return bValue - aValue
+
+    // Inside the never-used (or exact-tie) group: pin the gateway
+    // `main` seed to the top of the group on a fresh install, then
+    // fall back to id-stable order so the grid doesn't reshuffle on
+    // every poll.
+    const aSeed = a.id === 'main' && a.lastUsedAt == null
+    const bSeed = b.id === 'main' && b.lastUsedAt == null
+    if (aSeed !== bSeed) return aSeed ? -1 : 1
+
+    return a.id.localeCompare(b.id)
+  })
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
@@ -1,53 +0,0 @@
-import {
-  type AgentEntry,
-  getModelDisplayName,
-  type OpenClawStatus,
-} from '@/entrypoints/app/agents/useOpenClaw'
-import type { AgentCardData } from '@/lib/agent-conversations/types'
-import type { AgentOverview } from './useAgentDashboard'
-
-function resolveAgentStatus(
-  gatewayStatus: OpenClawStatus['status'] | undefined,
-  liveStatus: AgentOverview['status'] | undefined,
-): AgentCardData['status'] {
-  // Gateway-level errors take precedence
-  if (gatewayStatus === 'error') return 'error'
-  if (gatewayStatus === 'starting') return 'working'
-
-  // Per-agent live status from the WS observer
-  if (liveStatus === 'working') return 'working'
-  if (liveStatus === 'error') return 'error'
-
-  return 'idle'
-}
-
-/**
- * Build agent card display data by merging the raw agent entries from
- * the gateway with enriched overview data from the dashboard API.
- *
- * Pure function — no hooks, no IndexedDB, no async.
- */
-export function buildAgentCardData(
-  agents: AgentEntry[],
-  status: OpenClawStatus['status'] | undefined,
-  dashboard: AgentOverview[] | undefined,
-): AgentCardData[] {
-  return agents.map((agent) => {
-    const overview = dashboard?.find((d) => d.agentId === agent.agentId)
-
-    return {
-      agentId: agent.agentId,
-      name: agent.name,
-      model: getModelDisplayName(agent.model),
-      status:
-        agent.source === 'agent-harness'
-          ? 'idle'
-          : resolveAgentStatus(status, overview?.status),
-      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
-      lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
-      activitySummary: overview?.activitySummary ?? undefined,
-      currentTool: overview?.currentTool ?? undefined,
-      costUsd: overview?.totalCostUsd ?? undefined,
-    }
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
@@ -36,6 +36,15 @@ interface UseAgentConversationOptions {
  history?: OpenClawChatHistoryMessage[]
  onComplete?: () => void
  onSessionKeyChange?: (sessionKey: string) => void
+  /**
+   * Server-side active turn id, surfaced via the listing query. When
+   * this changes from null/<id> to a different non-null id while we
+   * aren't already streaming (e.g. the server just popped a queued
+   * message and started a new turn), the hook reattaches via
+   * /chat/active so the chat panel picks up the live stream without
+   * waiting for a remount.
+   */
+  activeTurnId?: string | null
 }

 export function useAgentConversation(
@@ -211,31 +220,46 @@ export function useAgentConversation(
  }
  processEventRef.current = processAgentHarnessStreamEvent

-  // On mount (and whenever the agent changes), check whether the
-  // server has an in-flight turn for this agent and reattach to it.
-  // This is what makes the chat resilient across tab close/reopen,
-  // refresh, and navigation: the runtime call kept running on the
-  // server while we were away. Effect only depends on `agentId` —
-  // the event handler is read off a ref so this doesn't re-subscribe
-  // every render.
+  const activeTurnIdDep = options.activeTurnId ?? null
+
+  // On mount, on agent change, and whenever the listing reports a
+  // *new* active turn id, check whether the server has an in-flight
+  // turn for this agent and reattach to it. This catches three
+  // cases at once: the chat resilience flow (tab close/reopen),
+  // navigation between agents, AND queue drain (the server starts a
+  // new turn from a queued message → activeTurnId flips → attach).
  useEffect(() => {
    let cancelled = false
    const abortController = new AbortController()
+    // Reference the dep inside the body so biome's exhaustive-deps
+    // rule sees it consumed; the value is just an "any non-null
+    // active turn id" trigger — the actual id we attach to comes
+    // from the fresh fetchActiveHarnessTurn call below.
+    void activeTurnIdDep

    const attemptResume = async () => {
+      // Track whether *we* started a stream in this run. When the
+      // early-return paths fire (no active turn, or a `send()` /
+      // earlier resume already owns `streamAbortRef`), the finally
+      // block must NOT touch streaming/turnIdRef/lastSeqRef —
+      // otherwise we clobber the in-flight stream's state and the
+      // Stop button drops out mid-turn while events keep arriving.
+      let weStartedStream = false
      try {
        const active = await fetchActiveHarnessTurn(agentId)
        if (cancelled || !active || active.status !== 'running') return
-        if (streamAbortRef.current) return // a fresh send already in flight
+        if (streamAbortRef.current) return // someone else already owns the stream

        // Stage a placeholder turn so the streamed events have a row
-        // to render into. We don't have the user message text on
-        // resume; the assistant turn is what we're catching up on.
+        // to render into. The server now persists the kicking-off
+        // prompt on the active turn, so we render it as the user
+        // bubble immediately — no empty-bubble flicker when a queued
+        // message starts running.
        setTurns((prev) => [
          ...prev,
          {
            id: crypto.randomUUID(),
-            userText: '',
+            userText: active.prompt ?? '',
            parts: [],
            done: false,
            timestamp: active.startedAt,
@@ -247,6 +271,7 @@ export function useAgentConversation(
        lastSeqRef.current = null
        streamAbortRef.current = abortController
        setStreaming(true)
+        weStartedStream = true

        const response = await attachToHarnessTurn(agentId, {
          turnId: active.turnId,
@@ -265,10 +290,20 @@ export function useAgentConversation(
        // Resume is best-effort; transient errors fall back to the
        // user starting a new turn manually.
      } finally {
-        if (!cancelled) {
-          if (streamAbortRef.current === abortController) {
-            streamAbortRef.current = null
-          }
+        // Always release `streamAbortRef` if we owned it — even when
+        // the effect was cancelled mid-stream (a listing poll
+        // captured the next queue-drain turn id, for example). If we
+        // don't, the next effect run hits `if (streamAbortRef.current)
+        // return` against our now-aborted controller and never
+        // reattaches, leaving `streaming === true` with no live stream.
+        if (weStartedStream && streamAbortRef.current === abortController) {
+          streamAbortRef.current = null
+        }
+        // The other state (streaming flag, turn id, lastSeq) is the
+        // *current run's* lifecycle: only reset it on a clean exit.
+        // When `cancelled` is true the next run will set these
+        // itself, so resetting here would only cause a brief flicker.
+        if (!cancelled && weStartedStream) {
          turnIdRef.current = null
          lastSeqRef.current = null
          setStreaming(false)
@@ -281,7 +316,7 @@ export function useAgentConversation(
      cancelled = true
      abortController.abort()
    }
-  }, [agentId])
+  }, [agentId, activeTurnIdDep])

  const send = async (input: string | SendInput) => {
    const normalized: SendInput =
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
@@ -1,95 +0,0 @@
-import { useQuery, useQueryClient } from '@tanstack/react-query'
-import { useEffect } from 'react'
-import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
-
-export interface AgentOverview {
-  agentId: string
-  status: 'working' | 'idle' | 'error' | 'unknown'
-  latestMessage: string | null
-  latestMessageAt: number | null
-  activitySummary: string | null
-  currentTool: string | null
-  totalCostUsd: number
-  sessionCount: number
-}
-
-export interface DashboardResponse {
-  agents: AgentOverview[]
-  summary: {
-    totalAgents: number
-    totalCostUsd: number
-  }
-}
-
-interface StatusEvent {
-  agentId: string
-  status: AgentOverview['status']
-  currentTool: string | null
-  error: string | null
-  timestamp: number
-}
-
-const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
-
-export function useAgentDashboard(enabled: boolean) {
-  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-  const ready = enabled && Boolean(baseUrl) && !urlLoading
-
-  // Initial data load + periodic refresh as fallback
-  const query = useQuery<DashboardResponse>({
-    queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
-    queryFn: async () => {
-      const url = new URL('/claw/dashboard', baseUrl as string)
-      const response = await fetch(url.toString())
-      if (!response.ok) throw new Error('Failed to fetch dashboard')
-      return response.json()
-    },
-    enabled: ready,
-  })
-
-  // SSE subscription for real-time status patches
-  useEffect(() => {
-    if (!ready || !baseUrl) return
-
-    const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
-    const eventSource = new EventSource(streamUrl.toString())
-
-    eventSource.addEventListener('snapshot', (event) => {
-      try {
-        const dashboard = JSON.parse(event.data) as DashboardResponse
-        queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
-      } catch {}
-    })
-
-    eventSource.addEventListener('status', (event) => {
-      try {
-        const status = JSON.parse(event.data) as StatusEvent
-        queryClient.setQueryData<DashboardResponse>(
-          [...DASHBOARD_QUERY_KEY, baseUrl],
-          (prev) => {
-            if (!prev) return prev
-            return {
-              ...prev,
-              agents: prev.agents.map((agent) =>
-                agent.agentId === status.agentId
-                  ? {
-                      ...agent,
-                      status: status.status,
-                      currentTool: status.currentTool,
-                    }
-                  : agent,
-              ),
-            }
-          },
-        )
-      } catch {}
-    })
-
-    return () => {
-      eventSource.close()
-    }
-  }, [ready, baseUrl, queryClient])
-
-  return query
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
@@ -2,67 +2,75 @@ import { Loader2 } from 'lucide-react'
 import { type FC, useMemo } from 'react'
 import { AgentRowCard } from './AgentRowCard'
 import { AgentsEmptyState } from './AgentsEmptyState'
-import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
+import type {
+  HarnessAdapterDescriptor,
+  HarnessAgent,
+  HarnessAgentAdapter,
+} from './agent-harness-types'
+import type {
+  AgentAdapterHealth,
+  AgentRowData,
+} from './agent-row/agent-row.types'
+import { compareAgentsByPinThenRecency } from './agents-list-order'
 import type { AgentListItem } from './agents-page-types'
 import type { AgentLiveness } from './LivenessDot'

 interface AgentListProps {
  agents: AgentListItem[]
-  /**
-   * Optional per-agent activity metadata. Keyed by `agentId`. Missing
-   * entries fall back to status='unknown' / lastUsedAt=null and the
-   * row renders an "unknown" dot. The server will populate this once
-   * the activity tracker ships; the page works without it.
-   */
+  /** Optional per-agent activity metadata, keyed by `agentId`. */
  activity?: Record<
    string,
    { status: AgentLiveness; lastUsedAt: number | null }
  >
-  /**
-   * Lookup table from harness agent id → adapter + reasoning effort,
-   * sourced from `useHarnessAgents`. Lets the row card render the
-   * correct adapter icon and chips for harness agents (legacy
-   * /claw/agents entries fall back to inferring from `runtimeLabel`).
-   */
+  /** Lookup table from harness id → enriched agent record. */
  harnessAgentLookup?: Map<string, HarnessAgent>
+  /** Adapter catalog (carries per-adapter health). */
+  adapters: HarnessAdapterDescriptor[]
  loading: boolean
  deletingAgentKey: string | null
  onCreateAgent: () => void
  onDeleteAgent: (agent: AgentListItem) => void
+  onPinToggle: (agent: AgentListItem, next: boolean) => void
 }

 export const AgentList: FC<AgentListProps> = ({
  agents,
  activity,
  harnessAgentLookup,
+  adapters,
  loading,
  deletingAgentKey,
  onCreateAgent,
  onDeleteAgent,
+  onPinToggle,
 }) => {
-  // Sort by recency: most recently used first; never-used agents drop
-  // to the bottom in id-stable order so the list doesn't reshuffle on
-  // every refresh. The pinned exception is the gateway's `main` agent
-  // when it's never been touched — keep it at the top so a fresh
-  // install has an obvious starting point.
+  const adapterHealth = useMemo(() => {
+    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
+    for (const adapter of adapters) {
+      if (adapter.health) {
+        map.set(adapter.id, {
+          healthy: adapter.health.healthy,
+          reason: adapter.health.reason,
+        })
+      }
+    }
+    return map
+  }, [adapters])
+
  const ordered = useMemo(() => {
-    const withScore = agents.map((agent) => {
-      const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
-      return { agent, lastUsedAt }
+    const withMeta = agents.map((agent) => {
+      const harness = harnessAgentLookup?.get(agent.agentId)
+      return {
+        agent,
+        id: agent.agentId,
+        pinned: harness?.pinned ?? false,
+        lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
+      }
    })
-    return withScore
-      .sort((a, b) => {
-        const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
-        const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
-        if (aPinned && !bPinned) return -1
-        if (!aPinned && bPinned) return 1
-        const aValue = a.lastUsedAt ?? -Infinity
-        const bValue = b.lastUsedAt ?? -Infinity
-        if (aValue !== bValue) return bValue - aValue
-        return a.agent.agentId.localeCompare(b.agent.agentId)
-      })
+    return withMeta
+      .sort(compareAgentsByPinThenRecency)
      .map((entry) => entry.agent)
-  }, [activity, agents])
+  }, [activity, agents, harnessAgentLookup])

  if (loading && agents.length === 0) {
    return (
@@ -80,18 +88,23 @@ export const AgentList: FC<AgentListProps> = ({
    <div className="grid gap-3">
      {ordered.map((agent) => {
        const harness = harnessAgentLookup?.get(agent.agentId)
-        const adapter: HarnessAgentAdapter | undefined =
+        const adapter: HarnessAgentAdapter | 'unknown' =
          harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
+        const data = buildRowData({
+          agent,
+          adapter,
+          harness,
+          activity: activity?.[agent.agentId],
+          adapterHealth:
+            adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
+        })
        return (
          <AgentRowCard
            key={agent.key}
-            agent={agent}
-            status={activity?.[agent.agentId]?.status}
-            lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
-            adapter={adapter}
-            reasoningEffort={harness?.reasoningEffort ?? null}
-            onDelete={onDeleteAgent}
+            data={data}
            deleting={deletingAgentKey === agent.key}
+            onDelete={onDeleteAgent}
+            onPinToggle={onPinToggle}
          />
        )
      })}
@@ -99,10 +112,53 @@ export const AgentList: FC<AgentListProps> = ({
  )
 }

-function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
+function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
  const lower = label?.toLowerCase()
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'openclaw') return 'openclaw'
-  return undefined
+  return 'unknown'
+}
+
+const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
+
+function buildRowData(input: {
+  agent: AgentListItem
+  adapter: HarnessAgentAdapter | 'unknown'
+  harness: HarnessAgent | undefined
+  activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
+  adapterHealth: AgentAdapterHealth | null
+}): AgentRowData {
+  const { agent, adapter, harness, activity, adapterHealth } = input
+  return {
+    agent,
+    adapter,
+    modelLabel: deriveModelLabel(agent, harness),
+    reasoningEffort: harness?.reasoningEffort ?? null,
+    status: activity?.status ?? 'unknown',
+    lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
+    pinned: harness?.pinned ?? false,
+    cwd: harness?.cwd ?? null,
+    lastUserMessage: harness?.lastUserMessage ?? null,
+    tokens: harness?.tokens ?? null,
+    turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
+    failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
+    lastError: harness?.lastError ?? null,
+    lastErrorAt: harness?.lastErrorAt ?? null,
+    activeTurnId: harness?.activeTurnId ?? null,
+    adapterHealth,
+  }
+}
+
+function deriveModelLabel(
+  agent: AgentListItem,
+  harness: HarnessAgent | undefined,
+): string | null {
+  // Prefer the agent rail's modelLabel when meaningful; harness's
+  // modelId is a stable identifier but the rail's `modelLabel`
+  // already maps to a friendly display string.
+  if (agent.modelLabel && agent.modelLabel !== 'default') {
+    return agent.modelLabel
+  }
+  return harness?.modelId ?? null
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
@@ -1,270 +1,99 @@
-import {
-  Copy,
-  Loader2,
-  MessageSquare,
-  MoreHorizontal,
-  Pencil,
-  RotateCcw,
-  Trash2,
-} from 'lucide-react'
 import type { FC } from 'react'
-import { useNavigate } from 'react-router'
-import { toast } from 'sonner'
-import { Badge } from '@/components/ui/badge'
-import { Button } from '@/components/ui/button'
-import {
-  DropdownMenu,
-  DropdownMenuContent,
-  DropdownMenuItem,
-  DropdownMenuSeparator,
-  DropdownMenuTrigger,
-} from '@/components/ui/dropdown-menu'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
 import { cn } from '@/lib/utils'
-import { AdapterIcon, adapterLabel } from './AdapterIcon'
-import {
-  canDelete as canDeleteAgent,
-  canRename as canRenameAgent,
-  displayName,
-  formatRelativeTime,
-  workspaceLabel,
-} from './agent-display.helpers'
-import type { HarnessAgentAdapter } from './agent-harness-types'
-import type { AgentListItem } from './agents-page-types'
-import { type AgentLiveness, LivenessDot } from './LivenessDot'
+import { AgentActions } from './agent-row/AgentActions'
+import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
+import { AgentLastMessage } from './agent-row/AgentLastMessage'
+import { AgentMetaRow } from './agent-row/AgentMetaRow'
+import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
+import { AgentTile } from './agent-row/AgentTile'
+import { AgentTitleRow } from './agent-row/AgentTitleRow'
+import type {
+  AgentRowCallbacks,
+  AgentRowData,
+} from './agent-row/agent-row.types'

-interface AgentRowCardProps {
-  agent: AgentListItem
-  /**
-   * Per-agent extras the listing surface provides on top of the
-   * minimal `AgentListItem` shape. `lastUsedAt` survives server
-   * restart (sourced from acpx session record); `status` is in-memory
-   * server-side.
-   */
-  status?: AgentLiveness
-  lastUsedAt?: number | null
-  /** Adapter the agent belongs to. Drives icon + label. */
-  adapter?: HarnessAgentAdapter
-  /** Reasoning effort chip (claude/codex/openclaw catalog). */
-  reasoningEffort?: string | null
-  /** Modeled directly off the inbound delete handler so the parent owns the dialog. */
-  onDelete: (agent: AgentListItem) => void
-  /** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
+interface AgentRowCardProps extends AgentRowCallbacks {
+  data: AgentRowData
+  /** Whether THIS agent is mid-delete; renders a spinner in the menu. */
  deleting?: boolean
 }

+/**
+ * Composition shell for the agent rail. Owns no state; sub-components
+ * each handle their own micro-state (error-panel collapse, etc.) and
+ * emit callbacks (delete, pin/unpin) for the page to act on.
+ *
+ * The whole card carries state — not just the tile — so the row's
+ * border subtly tells the user what's going on at a glance:
+ *   working → accent-orange border
+ *   error   → destructive border
+ *   idle    → muted border, tinted orange on hover
+ */
 export const AgentRowCard: FC<AgentRowCardProps> = ({
-  agent,
-  status = 'unknown',
-  lastUsedAt,
-  adapter,
-  reasoningEffort,
-  onDelete,
+  data,
  deleting,
+  onDelete,
+  onPinToggle,
 }) => {
-  const navigate = useNavigate()
-  const adapterId = adapter ?? inferAdapterFromListItem(agent)
-  const workspace = workspaceLabel(agent)
-  const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
-  const allowDelete = canDeleteAgent(agent)
-  const allowRename = canRenameAgent(agent)
-
-  const handleChat = () => navigate(`/agents/${agent.agentId}`)
-  const handleCopyId = async () => {
-    try {
-      await navigator.clipboard.writeText(agent.agentId)
-      toast.success('Agent id copied')
-    } catch {
-      toast.error('Could not copy agent id')
-    }
-  }
-
  return (
    <div
      className={cn(
-        'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
-        'hover:border-[var(--accent-orange)]/50 hover:shadow-sm',
+        // Layout-stable hover. No translate, no shadow change — both
+        // visibly perturb neighbouring rows. Only the border tint
+        // shifts on hover, and the rail's vertical rhythm stays
+        // exactly the same in every state.
+        'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
+        data.status === 'working'
+          ? 'border-[var(--accent-orange)]/40'
+          : data.status === 'error'
+            ? 'border-destructive/40'
+            : 'border-border hover:border-[var(--accent-orange)]/30',
      )}
    >
      <div className="flex items-start gap-4">
-        {/* Adapter tile + liveness dot in the corner. */}
-        <div className="relative shrink-0">
-          <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
-            <AdapterIcon adapter={adapterId} className="h-6 w-6" />
-          </div>
-          <LivenessDot
-            status={status}
-            detail={livenessDetail(status, lastUsedAt)}
-            className="absolute -right-0.5 -bottom-0.5"
-          />
-        </div>
+        <AgentTile
+          adapter={data.adapter}
+          status={data.status}
+          lastUsedAt={data.lastUsedAt}
+        />

        <div className="min-w-0 flex-1">
-          <div className="mb-1 flex items-center gap-2">
-            <span className="truncate font-semibold">{displayName(agent)}</span>
-            {status === 'working' && (
-              <Badge
-                variant="secondary"
-                className="bg-amber-50 text-amber-900 hover:bg-amber-50"
-              >
-                Working
-              </Badge>
-            )}
-            {status === 'asleep' && (
-              <Badge variant="outline" className="text-muted-foreground">
-                Asleep
-              </Badge>
-            )}
-            {status === 'error' && (
-              <Badge variant="destructive">Attention</Badge>
-            )}
-          </div>
+          <AgentTitleRow
+            agent={data.agent}
+            status={data.status}
+            pinned={data.pinned}
+            turnsByDay={data.turnsByDay}
+            failedByDay={data.failedByDay}
+            onPinToggle={(next) => onPinToggle(data.agent, next)}
+          />

-          <div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
-            <Badge variant="secondary" className="font-normal">
-              {adapterLabel(adapterId)}
-            </Badge>
-            {agent.modelLabel && agent.modelLabel !== 'default' && (
-              <Badge variant="outline" className="font-normal">
-                {agent.modelLabel}
-              </Badge>
-            )}
-            {reasoningEffort && reasoningEffort !== 'medium' && (
-              <Badge variant="outline" className="font-normal">
-                {reasoningEffort}
-              </Badge>
-            )}
-          </div>
+          <AgentSummaryChips
+            adapter={data.adapter}
+            modelLabel={data.modelLabel}
+            reasoningEffort={data.reasoningEffort}
+            adapterHealth={data.adapterHealth}
+          />

-          <div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
-            <span>Last used {lastUsedLabel}</span>
-            {workspace && (
-              <>
-                <span aria-hidden>•</span>
-                <span className="truncate font-mono" title={workspace}>
-                  {workspace}
-                </span>
-              </>
-            )}
-          </div>
+          <AgentLastMessage message={data.lastUserMessage} />
+
+          <AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
+
+          {data.status === 'error' && data.lastError && (
+            <AgentErrorPanel
+              agentId={data.agent.agentId}
+              message={data.lastError}
+              errorAt={data.lastErrorAt}
+            />
+          )}
        </div>

-        <div className="flex shrink-0 items-center gap-2">
-          <Button variant="outline" size="sm" onClick={handleChat}>
-            <MessageSquare className="mr-1.5 h-3 w-3" />
-            Chat
-          </Button>
-          <DropdownMenu>
-            <DropdownMenuTrigger asChild>
-              <Button
-                variant="ghost"
-                size="icon"
-                aria-label={`More actions for ${displayName(agent)}`}
-                className="h-8 w-8"
-              >
-                <MoreHorizontal className="h-4 w-4" />
-              </Button>
-            </DropdownMenuTrigger>
-            <DropdownMenuContent align="end" className="w-44">
-              <DropdownMenuItem onSelect={() => void handleCopyId()}>
-                <Copy className="mr-2 h-3.5 w-3.5" />
-                Copy id
-              </DropdownMenuItem>
-              <RenameMenuItem disabled={!allowRename} />
-              <ResetHistoryMenuItem />
-              <DropdownMenuSeparator />
-              <DropdownMenuItem
-                onSelect={() => onDelete(agent)}
-                disabled={!allowDelete || deleting}
-                className="text-destructive focus:text-destructive"
-              >
-                {deleting ? (
-                  <Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
-                ) : (
-                  <Trash2 className="mr-2 h-3.5 w-3.5" />
-                )}
-                Delete
-              </DropdownMenuItem>
-            </DropdownMenuContent>
-          </DropdownMenu>
-        </div>
+        <AgentActions
+          agent={data.agent}
+          activeTurnId={data.activeTurnId}
+          deleting={deleting}
+          onDelete={onDelete}
+        />
      </div>
    </div>
  )
 }
-
-const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
-  const item = (
-    <DropdownMenuItem disabled className="text-muted-foreground">
-      <Pencil className="mr-2 h-3.5 w-3.5" />
-      Rename
-    </DropdownMenuItem>
-  )
-  if (!disabled) return item
-  // Disabled but with a hint so users know it's coming, not broken.
-  return (
-    <TooltipProvider delayDuration={300}>
-      <Tooltip>
-        <TooltipTrigger asChild>
-          <span className="block w-full">{item}</span>
-        </TooltipTrigger>
-        <TooltipContent side="left" className="text-xs">
-          Rename coming soon
-        </TooltipContent>
-      </Tooltip>
-    </TooltipProvider>
-  )
-}
-
-const ResetHistoryMenuItem: FC = () => {
-  const item = (
-    <DropdownMenuItem disabled className="text-muted-foreground">
-      <RotateCcw className="mr-2 h-3.5 w-3.5" />
-      Reset history
-    </DropdownMenuItem>
-  )
-  return (
-    <TooltipProvider delayDuration={300}>
-      <Tooltip>
-        <TooltipTrigger asChild>
-          <span className="block w-full">{item}</span>
-        </TooltipTrigger>
-        <TooltipContent side="left" className="text-xs">
-          Reset history coming soon
-        </TooltipContent>
-      </Tooltip>
-    </TooltipProvider>
-  )
-}
-
-function inferAdapterFromListItem(
-  agent: AgentListItem,
-): HarnessAgentAdapter | 'unknown' {
-  const label = agent.runtimeLabel?.toLowerCase()
-  if (label?.includes('claude')) return 'claude'
-  if (label?.includes('codex')) return 'codex'
-  if (label?.includes('openclaw')) return 'openclaw'
-  return 'unknown'
-}
-
-function livenessDetail(
-  status: AgentLiveness,
-  lastUsedAt: number | null | undefined,
-): string | undefined {
-  if (lastUsedAt == null) return undefined
-  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
-  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
-  if (status === 'asleep') {
-    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
-    const hr = Math.floor(diffMin / 60)
-    return `Asleep — quiet for ${hr} hr`
-  }
-  if (status === 'working') return 'Working on a turn'
-  if (status === 'error') return 'Attention — last turn failed'
-  return undefined
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
@@ -44,6 +44,7 @@ import {
  useCreateHarnessAgent,
  useDeleteHarnessAgent,
  useHarnessAgents,
+  useUpdateHarnessAgent,
 } from './useAgents'
 import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'

@@ -76,6 +77,7 @@ export const AgentsPage: FC = () => {
  } = useOpenClawAgents(openClawAgentsEnabled)
  const createHarnessAgent = useCreateHarnessAgent()
  const deleteHarnessAgent = useDeleteHarnessAgent()
+  const updateHarnessAgent = useUpdateHarnessAgent()
  const {
    setupOpenClaw,
    createAgent: createOpenClawAgent,
@@ -342,12 +344,24 @@ export const AgentsPage: FC = () => {
          agents={agentListItems}
          activity={agentActivity}
          harnessAgentLookup={harnessAgentLookup}
+          adapters={adapters}
          loading={agentsLoading}
          deletingAgentKey={deletingAgent ? deletingAgentKey : null}
          onCreateAgent={() => setCreateOpen(true)}
          onDeleteAgent={(agent) => {
            void handleDelete(agent)
          }}
+          onPinToggle={(agent, next) => {
+            // Optimistic, harness-only mutation. Gateway-originated
+            // OpenClaw entries are gated server-side via the harness
+            // backfill, so fire only when the row maps to a harness
+            // agent record.
+            if (!harnessAgentLookup.has(agent.agentId)) return
+            updateHarnessAgent.mutate({
+              agentId: agent.agentId,
+              patch: { pinned: next },
+            })
+          }}
        />

        <SetupOpenClawDialog
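The `onPinToggle` handler above relies on `useUpdateHarnessAgent` for an optimistic update. The sketch below shows roughly what such an update does to a cached list. It assumes the hook patches the client-side list and rolls back on failure; this diff does not show the hook's internals. `applyPinOptimistically`, `PinnableAgent` and `IdLookup` are hypothetical names, not part of this PR.

```typescript
// Hypothetical minimal shape; the real HarnessAgent carries many more fields.
interface PinnableAgent {
  agentId: string
  pinned?: boolean
}

// Anything with `has`, e.g. a Map or Set like the one behind `harnessAgentLookup`.
interface IdLookup {
  has(agentId: string): boolean
}

interface OptimisticPin {
  next: PinnableAgent[]
  rollback: () => PinnableAgent[]
}

/**
 * Returns the list with `pinned` set for one agent, plus a rollback that
 * restores the pre-mutation snapshot if the server rejects the patch.
 * Returns null for rows with no harness record, like the page-level guard.
 */
function applyPinOptimistically(
  agents: readonly PinnableAgent[],
  lookup: IdLookup,
  agentId: string,
  pinned: boolean,
): OptimisticPin | null {
  if (!lookup.has(agentId)) return null
  const snapshot = [...agents]
  // Copy-on-write: neither the input array nor its objects are mutated.
  const next = agents.map((agent) =>
    agent.agentId === agentId ? { ...agent, pinned } : agent,
  )
  return { next, rollback: () => snapshot }
}
```

Returning the snapshot from a closure keeps rollback trivial. It is correct only as long as nothing else writes to the list between the optimistic write and the server response.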
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
@@ -1,4 +1,5 @@
 import type { AgentListItem } from './agents-page-types'
+import type { AgentLiveness } from './LivenessDot'

 /**
 * Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -82,3 +83,25 @@ export function formatRelativeTime(epochMs: number | null): string {
  const d = Math.floor(diff / ONE_DAY)
  return d === 1 ? '1 day ago' : `${d} days ago`
 }
+
+/**
+ * Tooltip-friendly description of a row's current liveness state.
+ * Returns `undefined` when the state has nothing extra to add (e.g.
+ * `unknown` with no timestamp).
+ */
+export function livenessDetail(
+  status: AgentLiveness,
+  lastUsedAt: number | null | undefined,
+): string | undefined {
+  // Working/error describe the current turn and need no timestamp.
+  if (status === 'working') return 'Working on a turn'
+  if (status === 'error') return 'Attention — last turn failed'
+  if (lastUsedAt == null) return undefined
+  const diffMin = Math.max(0, Math.floor((Date.now() - lastUsedAt) / 60_000))
+  if (status === 'idle') return `Idle for ${diffMin} min`
+  if (status === 'asleep') {
+    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
+    return `Asleep — quiet for ${Math.floor(diffMin / 60)} hr`
+  }
+  return undefined
+}
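`livenessDetail` reads `Date.now()` directly, so its output depends on wall-clock time in tests. One option is to pass the clock in. This is an assumption, not what this PR does. The hypothetical `livenessDurationAt` below covers only the two time-dependent branches and clamps both at zero.

```typescript
type TimedLiveness = 'idle' | 'asleep'

/**
 * Time-dependent part of `livenessDetail`, with the clock passed in.
 * Idle rows always report minutes. Asleep rows switch to whole hours
 * once the agent has been quiet for 60 minutes or more.
 */
function livenessDurationAt(
  status: TimedLiveness,
  lastUsedAt: number,
  now: number,
): string {
  // Clamped at zero so a client/server clock skew never prints "-1 min".
  const diffMin = Math.max(0, Math.floor((now - lastUsedAt) / 60_000))
  if (status === 'idle' || diffMin < 60) return `${diffMin} min`
  return `${Math.floor(diffMin / 60)} hr`
}
```

A caller would then build the tooltip copy, for example ``Idle for ${livenessDurationAt('idle', lastUsedAt, Date.now())}``. Tests can pass a fixed `now` instead of mocking timers.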
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
@@ -56,6 +56,43 @@ export interface HarnessAgent {
   * agents. Drives the recency sort and the "Last used X min ago" copy.
   */
  lastUsedAt?: number | null
+  /** Pinned agents float to the top of the list. Defaults to `false`. */
+  pinned?: boolean
+  /** First non-blank line of the most recent user message; null if none. */
+  lastUserMessage?: string | null
+  /** Working directory the agent runs in; null when no session record yet. */
+  cwd?: string | null
+  /** Cumulative + 7-day rolling token usage; null when no record. */
+  tokens?: {
+    last7d: { input: number; output: number; requestCount: number }
+    cumulative: { input: number; output: number }
+  } | null
+  turnsByDay?: number[]
+  failedByDay?: number[]
+  lastError?: string | null
+  lastErrorAt?: number | null
+  /** When non-null, an in-flight turn this row can be resumed from. */
+  activeTurnId?: string | null
+  /** Persistent FIFO queue of messages waiting for this agent. */
+  queue?: HarnessQueuedMessage[]
+}
+
+export interface HarnessQueuedMessageAttachment {
+  mediaType: string
+  data: string
+}
+
+export interface HarnessQueuedMessage {
+  id: string
+  createdAt: number
+  message: string
+  attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
+}
+
+export interface HarnessAdapterHealth {
+  healthy: boolean
+  reason?: string
+  checkedAt: number
 }

 export interface HarnessAdapterDescriptor {
@@ -66,6 +103,7 @@ export interface HarnessAdapterDescriptor {
  modelControl: 'runtime-supported' | 'best-effort'
  models: Array<{ id: string; label: string; recommended?: boolean }>
  reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
+  health?: HarnessAdapterHealth
 }

 export interface CreateHarnessAgentInput {
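The new `pinned` field floats pinned agents to the top of the list, and `lastUsedAt` drives the recency sort. The sketch below shows a minimal comparator that combines the two. `sortForList` is hypothetical, and so is the choice to sink never-used agents to the bottom; neither is the PR's actual sort.

```typescript
// Hypothetical subset of HarnessAgent used for ordering.
interface SortableAgent {
  agentId: string
  pinned?: boolean
  lastUsedAt?: number | null
}

/** Pinned first, then most recently used; returns a new array. */
function sortForList<T extends SortableAgent>(agents: readonly T[]): T[] {
  return [...agents].sort((a, b) => {
    const pinDiff = Number(b.pinned ?? false) - Number(a.pinned ?? false)
    if (pinDiff !== 0) return pinDiff
    // Epoch ms are positive, so treating "never used" as 0 sorts it last
    // without ever producing NaN in the comparator.
    return (b.lastUsedAt ?? 0) - (a.lastUsedAt ?? 0)
  })
}
```

`Array.prototype.sort` is stable in current engines, so agents with equal keys keep their incoming order.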
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
@@ -0,0 +1,160 @@
+import {
+  Copy,
+  Loader2,
+  MessageSquare,
+  MoreHorizontal,
+  Pencil,
+  RotateCcw,
+  Trash2,
+} from 'lucide-react'
+import type { FC } from 'react'
+import { useNavigate } from 'react-router'
+import { toast } from 'sonner'
+import { Button } from '@/components/ui/button'
+import {
+  DropdownMenu,
+  DropdownMenuContent,
+  DropdownMenuItem,
+  DropdownMenuSeparator,
+  DropdownMenuTrigger,
+} from '@/components/ui/dropdown-menu'
+import {
+  Tooltip,
+  TooltipContent,
+  TooltipProvider,
+  TooltipTrigger,
+} from '@/components/ui/tooltip'
+import {
+  canDelete as canDeleteAgent,
+  canRename as canRenameAgent,
+  displayName,
+} from '../agent-display.helpers'
+import type { AgentListItem } from '../agents-page-types'
+
+interface AgentActionsProps {
+  agent: AgentListItem
+  activeTurnId: string | null
+  deleting?: boolean
+  onDelete: (agent: AgentListItem) => void
+}
+
+/**
+ * Single primary CTA per row: `Resume` (filled, accent-orange, with a
+ * pulsing dot) when an active turn exists; otherwise `Chat` (outline).
+ * Both navigate to the same place — the chat hook auto-attaches via
+ * `/chat/active` when there's a live turn — but the row signals which
+ * action the user is actually taking.
+ */
+export const AgentActions: FC<AgentActionsProps> = ({
+  agent,
+  activeTurnId,
+  deleting,
+  onDelete,
+}) => {
+  const navigate = useNavigate()
+  const allowDelete = canDeleteAgent(agent)
+  const allowRename = canRenameAgent(agent)
+
+  const handleChat = () => navigate(`/agents/${agent.agentId}`)
+  const handleCopyId = async () => {
+    try {
+      await navigator.clipboard.writeText(agent.agentId)
+      toast.success('Agent id copied')
+    } catch {
+      toast.error('Could not copy agent id')
+    }
+  }
+
+  return (
+    <div className="flex shrink-0 items-center gap-1.5">
+      {activeTurnId ? (
+        <Button
+          variant="default"
+          size="sm"
+          onClick={handleChat}
+          className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
+        >
+          <span className="relative flex size-2">
+            <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
+            <span className="relative inline-flex size-2 rounded-full bg-white" />
+          </span>
+          Resume
+        </Button>
+      ) : (
+        <Button variant="outline" size="sm" onClick={handleChat}>
+          <MessageSquare className="mr-1.5 size-3" />
+          Chat
+        </Button>
+      )}
+      <DropdownMenu>
+        <DropdownMenuTrigger asChild>
+          <Button
+            variant="ghost"
+            size="icon"
+            aria-label={`More actions for ${displayName(agent)}`}
+            className="size-8 text-muted-foreground hover:text-foreground"
+          >
+            <MoreHorizontal className="size-4" />
+          </Button>
+        </DropdownMenuTrigger>
+        <DropdownMenuContent align="end" className="w-44">
+          <DropdownMenuItem onSelect={() => void handleCopyId()}>
+            <Copy className="mr-2 size-3.5" />
+            Copy id
+          </DropdownMenuItem>
+          <ComingSoonItem
+            icon={Pencil}
+            label="Rename"
+            disabled={!allowRename}
+          />
+          <ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
+          <DropdownMenuSeparator />
+          <DropdownMenuItem
+            onSelect={() => onDelete(agent)}
+            disabled={!allowDelete || deleting}
+            className="text-destructive focus:text-destructive"
+          >
+            {deleting ? (
+              <Loader2 className="mr-2 size-3.5 animate-spin" />
+            ) : (
+              <Trash2 className="mr-2 size-3.5" />
+            )}
+            Delete
+          </DropdownMenuItem>
+        </DropdownMenuContent>
+      </DropdownMenu>
+    </div>
+  )
+}
+
+interface ComingSoonItemProps {
+  icon: typeof Pencil
+  label: string
+  disabled: boolean
+}
+
+const ComingSoonItem: FC<ComingSoonItemProps> = ({
+  icon: Icon,
+  label,
+  disabled,
+}) => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <Icon className="mr-2 size-3.5" />
+      {label}
+    </DropdownMenuItem>
+  )
+  if (!disabled) return item
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          {label} coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
@@ -0,0 +1,96 @@
+import { AlertTriangle, ChevronDown } from 'lucide-react'
+import { type FC, useEffect, useState } from 'react'
+import { Button } from '@/components/ui/button'
+import {
+  Collapsible,
+  CollapsibleContent,
+  CollapsibleTrigger,
+} from '@/components/ui/collapsible'
+import {
+  HoverCard,
+  HoverCardContent,
+  HoverCardTrigger,
+} from '@/components/ui/hover-card'
+import { cn } from '@/lib/utils'
+import { truncate } from './agent-row.helpers'
+
+interface AgentErrorPanelProps {
+  agentId: string
+  message: string
+  errorAt: number | null
+}
+
+const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
+const PREVIEW_CHARS = 200
+
+export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
+  agentId,
+  message,
+  errorAt,
+}) => {
+  const storageKey = `${STORAGE_PREFIX}${agentId}`
+  // Open unless this agent's current `errorAt` was already seen.
+  // Collapsing the panel records `errorAt` in localStorage, so the
+  // same error doesn't re-open on every poll or page reload.
+  const [open, setOpen] = useState<boolean>(() => {
+    if (typeof window === 'undefined' || !errorAt) return true
+    const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
+    return !Number.isFinite(seen) || errorAt > seen
+  })
+
+  useEffect(() => {
+    if (!open && errorAt && typeof window !== 'undefined') {
+      window.localStorage.setItem(storageKey, String(errorAt))
+    }
+  }, [open, errorAt, storageKey])
+
+  const preview = truncate(message, PREVIEW_CHARS)
+  const truncated = preview.length < message.length
+
+  return (
+    <Collapsible open={open} onOpenChange={setOpen} className="mt-3">
+      <div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
+        <div className="flex items-center gap-2 font-medium text-destructive text-xs">
+          <AlertTriangle className="size-3.5" />
+          Last error
+        </div>
+        <CollapsibleTrigger asChild>
+          <Button
+            variant="ghost"
+            size="sm"
+            className="h-6 px-2 text-muted-foreground"
+          >
+            <span className="text-xs">{open ? 'hide' : 'show'}</span>
+            <ChevronDown
+              className={cn(
+                'ml-1 size-3 transition-transform',
+                open && 'rotate-180',
+              )}
+            />
+          </Button>
+        </CollapsibleTrigger>
+      </div>
+      <CollapsibleContent>
+        <div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
+          {truncated ? (
+            <HoverCard openDelay={300}>
+              <HoverCardTrigger asChild>
+                <span className="cursor-default font-mono text-foreground/80">
+                  {preview}…
+                </span>
+              </HoverCardTrigger>
+              <HoverCardContent
+                side="bottom"
+                className="max-w-md whitespace-pre-wrap font-mono text-xs"
+              >
+                {message}
+              </HoverCardContent>
+            </HoverCard>
+          ) : (
+            <span className="font-mono text-foreground/80">{message}</span>
+          )}
+        </div>
+      </CollapsibleContent>
+    </Collapsible>
+  )
+}
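Apart from `localStorage`, the open/closed decision in `AgentErrorPanel` is pure. The sketch below moves it behind a small storage interface so it can be tested without a browser. This is a hypothetical refactor: `KeyValueStore`, `shouldOpenErrorPanel`, `markErrorSeen` and `MemoryStore` are illustrative names. The logic mirrors the lazy `useState` initializer and the effect above.

```typescript
// Subset of the Web Storage API; `window.localStorage` satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'

/** True when this `errorAt` is newer than the last one the user dismissed. */
function shouldOpenErrorPanel(
  store: KeyValueStore,
  agentId: string,
  errorAt: number | null,
): boolean {
  // Without a timestamp, old and new errors look the same, so show it.
  if (!errorAt) return true
  const seen = Number(store.getItem(`${STORAGE_PREFIX}${agentId}`) ?? 0)
  // A corrupt stored value parses to NaN; treat it as "never seen".
  return !Number.isFinite(seen) || errorAt > seen
}

/** Called when the user collapses the panel. */
function markErrorSeen(store: KeyValueStore, agentId: string, errorAt: number): void {
  store.setItem(`${STORAGE_PREFIX}${agentId}`, String(errorAt))
}

// In-memory store for tests or non-browser environments.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>()
  getItem(key: string): string | null {
    return this.data.get(key) ?? null
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value)
  }
}
```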
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
@@ -0,0 +1,35 @@
+import { Quote } from 'lucide-react'
+import type { FC } from 'react'
+import { firstNonBlankLine, truncate } from './agent-row.helpers'
+
+interface AgentLastMessageProps {
+  message: string | null
+}
+
+const PREVIEW_CHARS = 110
+
+/**
+ * Inline preview of the most recent user message. Renders as a quoted,
+ * italic line so the row reads like a conversation snippet rather than
+ * a label-and-value pair. No hover-card — opening the agent's chat is
+ * the canonical way to read the full message.
+ */
+export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
+  if (!message) {
+    return (
+      <p className="mt-1 text-muted-foreground/70 text-xs italic">
+        No messages yet — start a chat
+      </p>
+    )
+  }
+  const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
+  return (
+    <p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
+      <Quote
+        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
+        aria-hidden
+      />
+      <span className="truncate">{preview}</span>
+    </p>
+  )
+}
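`AgentLastMessage` and `AgentErrorPanel` both depend on `firstNonBlankLine` and `truncate` from `agent-row.helpers`, which this diff does not show. The versions below are guesses at plausible behaviour, not the real helpers. The no-ellipsis contract for `truncate` is inferred from `AgentErrorPanel`, which compares lengths and appends "…" itself.

```typescript
// Assumed behaviour only: the real helpers live in `agent-row.helpers`,
// which this diff does not show.

/** First line with non-whitespace content, trimmed; '' if there is none. */
function firstNonBlankLine(text: string): string {
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim()
    if (trimmed !== '') return trimmed
  }
  return ''
}

/**
 * Caps `text` at `maxChars` code points without appending an ellipsis.
 * Slicing by code point keeps emoji and other astral characters intact.
 */
function truncate(text: string, maxChars: number): string {
  const chars = Array.from(text)
  return chars.length <= maxChars ? text : chars.slice(0, maxChars).join('')
}
```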
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
@@ -0,0 +1,37 @@
+import type { FC } from 'react'
+import { formatRelativeTime } from '../agent-display.helpers'
+import { AgentTokenSummary } from './AgentTokenSummary'
+import type { AgentTokenUsage } from './agent-row.types'
+
+interface AgentMetaRowProps {
+  lastUsedAt: number | null
+  tokens: AgentTokenUsage | null
+}
+
+/**
+ * Bottom-of-row meta line. Intentionally sparse — last activity time
+ * and lifetime tokens. CWD is no longer surfaced here because the path
+ * the server happens to be running from isn't actionable; if a future
+ * surface needs the cwd (chat panel, debug view) it reads from the
+ * listing payload directly.
+ */
+export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
+  const lastUsedLabel = formatRelativeTime(lastUsedAt)
+  const tokensTotal =
+    (tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
+  const showTokens = tokensTotal > 0
+
+  return (
+    <div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
+      <span>{lastUsedLabel}</span>
+      {showTokens && (
+        <>
+          <span aria-hidden className="text-muted-foreground/50">
+            ·
+          </span>
+          <AgentTokenSummary tokens={tokens} />
+        </>
+      )}
+    </div>
+  )
+}
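`AgentMetaRow` and `AgentTokenSummary` both derive a total from `tokens.cumulative`. The summary then formats it with `formatTokens`, whose implementation is not in this diff. The sketch below shows the arithmetic. `formatTokensCompact` is a hypothetical stand-in formatter, and its one-decimal `k`/`M` output is an assumption. `AgentTokenUsage` is assumed to match the `tokens` shape in `agent-harness-types.ts`.

```typescript
// Assumed to match the `tokens` shape declared on HarnessAgent.
interface AgentTokenUsage {
  last7d: { input: number; output: number; requestCount: number }
  cumulative: { input: number; output: number }
}

/** Lifetime total and input share, or null when there is nothing to show. */
function tokenSplit(
  tokens: AgentTokenUsage | null,
): { total: number; inputPct: number } | null {
  if (!tokens) return null
  const { input, output } = tokens.cumulative
  const total = input + output
  // Bailing out at zero also avoids 0 / 0 = NaN for the percentage.
  if (total === 0) return null
  return { total, inputPct: (input / total) * 100 }
}

/** Hypothetical compact formatter: 950, 1.2k, 1.5M. */
function formatTokensCompact(n: number): string {
  if (n < 1000) return String(n)
  const k = Math.round(n / 100) / 10
  // Values that round up to 1000k are shown as millions instead.
  if (k < 1000) return `${k}k`
  return `${Math.round(n / 100_000) / 10}M`
}
```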
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
@@ -0,0 +1,92 @@
+import type { FC } from 'react'
+import {
+  HoverCard,
+  HoverCardContent,
+  HoverCardTrigger,
+} from '@/components/ui/hover-card'
+import { cn } from '@/lib/utils'
+import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
+
+interface AgentSparklineProps {
+  /** 14 entries, oldest → newest. Today's bucket is the last index. */
+  turnsByDay: number[]
+  /** Same length, same order. Failed turns counted separately. */
+  failedByDay: number[]
+  className?: string
+}
+
+const MIN_BAR_HEIGHT_PX = 2
+const MAX_BAR_HEIGHT_PX = 18
+
+export const AgentSparkline: FC<AgentSparklineProps> = ({
+  turnsByDay,
+  failedByDay,
+  className,
+}) => {
+  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
+  const max = Math.max(1, ...turnsByDay)
+
+  return (
+    <HoverCard openDelay={250}>
+      <HoverCardTrigger asChild>
+        <div
+          role="img"
+          aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
+          className={cn('flex h-5 items-end gap-px', className)}
+        >
+          {turnsByDay.map((count, idx) => {
+            const ratio = count / max
+            const height = Math.max(
+              MIN_BAR_HEIGHT_PX,
+              Math.round(ratio * MAX_BAR_HEIGHT_PX),
+            )
+            const isToday = idx === turnsByDay.length - 1
+            const failed = failedByDay[idx] ?? 0
+            return (
+              <div
+                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
+                key={`bar-${idx}`}
+                className={cn(
+                  'w-1.5 rounded-sm',
+                  count === 0
+                    ? 'bg-muted-foreground/15'
+                    : failed > 0
+                      ? 'bg-destructive/50'
+                      : 'bg-[var(--accent-orange)]/50',
+                  isToday && 'ring-1 ring-foreground/30',
+                )}
+                style={{ height }}
+              />
+            )
+          })}
+        </div>
+      </HoverCardTrigger>
+      <HoverCardContent side="left" className="w-56 text-xs">
+        <div className="mb-2 font-medium text-sm">Last {ROW_BAR_COUNT} days</div>
+        <ul className="space-y-0.5">
+          {turnsByDay.map((count, idx) => {
+            const failed = failedByDay[idx] ?? 0
+            const dayLabel = formatLocalDate(idx)
+            return (
+              <li
+                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
+                key={`day-${idx}`}
+                className="flex items-center justify-between text-muted-foreground"
+              >
+                <span>{dayLabel}</span>
+                <span>
+                  {count}
+                  {failed > 0 && (
+                    <span className="ml-1 text-destructive">
+                      ({failed} failed)
+                    </span>
+                  )}
+                </span>
+              </li>
+            )
+          })}
+        </ul>
+      </HoverCardContent>
+    </HoverCard>
+  )
+}
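The sparkline scales each day linearly against the busiest day in the window and applies a floor so that quiet days still render a stub. The sketch below extracts that mapping as a pure function. `barHeights` is a hypothetical name; the constants and early return match the component above.

```typescript
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18

/** Pixel height per day, or null when the whole window is empty. */
function barHeights(turnsByDay: readonly number[]): number[] | null {
  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
  // Math.max(1, ...) guards the division even though an all-zero window already returned.
  const max = Math.max(1, ...turnsByDay)
  return turnsByDay.map((count) =>
    Math.max(MIN_BAR_HEIGHT_PX, Math.round((count / max) * MAX_BAR_HEIGHT_PX)),
  )
}
```

Because the scale is linear, a single busy day flattens the rest of the window toward the 2 px floor. A square-root or log scale would keep small days distinguishable, at the cost of making bar heights harder to compare.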
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
@@ -0,0 +1,71 @@
+import { TriangleAlert } from 'lucide-react'
+import type { FC } from 'react'
+import { Badge } from '@/components/ui/badge'
+import {
+  HoverCard,
+  HoverCardContent,
+  HoverCardTrigger,
+} from '@/components/ui/hover-card'
+import { cn } from '@/lib/utils'
+import { adapterLabel } from '../AdapterIcon'
+import type { HarnessAgentAdapter } from '../agent-harness-types'
+import type { AgentAdapterHealth } from './agent-row.types'
+
+interface AgentSummaryChipsProps {
+  adapter: HarnessAgentAdapter | 'unknown'
+  modelLabel: string | null
+  reasoningEffort: string | null
+  /** When unhealthy, the adapter label dims and a warning chip appears. */
+  adapterHealth: AgentAdapterHealth | null
+}
+
+/**
+ * Adapter / model / reasoning summary line. Always rendered (so OpenClaw
+ * rows that fall back to defaults still expose what they're set up to do)
+ * and surfaces adapter-health *only when unhealthy* — keeping the calm
+ * default state silent and reserving visual noise for things the user
+ * needs to act on.
+ */
+export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
+  adapter,
+  modelLabel,
+  reasoningEffort,
+  adapterHealth,
+}) => {
+  const parts = [adapterLabel(adapter)]
+  if (modelLabel) parts.push(modelLabel)
+  if (reasoningEffort) parts.push(reasoningEffort)
+  const unhealthy = adapterHealth?.healthy === false
+  return (
+    <div
+      className={cn(
+        'flex items-center gap-1.5 text-muted-foreground text-xs',
+        unhealthy && 'text-muted-foreground/70',
+      )}
+    >
+      <span className="truncate">{parts.join(' · ')}</span>
+      {unhealthy && adapterHealth && (
+        <HoverCard openDelay={200}>
+          <HoverCardTrigger asChild>
+            <Badge
+              variant="outline"
+              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
+            >
+              <TriangleAlert className="size-2.5" />
+              <span className="font-normal">Unavailable</span>
+            </Badge>
+          </HoverCardTrigger>
+          <HoverCardContent side="right" className="w-72 text-sm">
+            <div className="font-medium">
+              {adapterLabel(adapter)} CLI not available
+            </div>
+            <div className="mt-1 text-muted-foreground text-xs">
+              {adapterHealth.reason ??
+                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
+            </div>
+          </HoverCardContent>
+        </HoverCard>
+      )}
+    </div>
+  )
+}
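`AgentSummaryChips` joins whatever parts are present and shows the warning chip only for an explicit `healthy: false`. The sketch below reproduces those two rules as pure functions. `summaryLine` and `isUnhealthy` are hypothetical names. The adapter's display text is passed in rather than guessed, because the `adapterLabel` mapping is not shown here.

```typescript
// Mirrors HarnessAdapterHealth from agent-harness-types.ts.
interface AdapterHealth {
  healthy: boolean
  reason?: string
  checkedAt: number
}

/** "Adapter · model · effort", skipping parts that are null or empty. */
function summaryLine(
  adapterText: string,
  modelLabel: string | null,
  reasoningEffort: string | null,
): string {
  return [adapterText, modelLabel, reasoningEffort]
    .filter((part): part is string => Boolean(part))
    .join(' · ')
}

/** Only an explicit `healthy: false` counts; a missing probe stays silent. */
function isUnhealthy(health: AdapterHealth | null | undefined): boolean {
  return health?.healthy === false
}
```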
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
@@ -0,0 +1,37 @@
+import type { FC } from 'react'
+import { cn } from '@/lib/utils'
+import { AdapterIcon } from '../AdapterIcon'
+import { livenessDetail } from '../agent-display.helpers'
+import type { HarnessAgentAdapter } from '../agent-harness-types'
+import { type AgentLiveness, LivenessDot } from '../LivenessDot'
+
+export interface AgentTileProps {
+  adapter: HarnessAgentAdapter | 'unknown'
+  status: AgentLiveness
+  lastUsedAt: number | null
+}
+
+/**
+ * Adapter glyph + a single liveness dot. Adapter health is no longer
+ * surfaced here — it lives as an inline pill inside `AgentSummaryChips`
+ * so the user isn't asked to disambiguate two dots on the same tile.
+ */
+export const AgentTile: FC<AgentTileProps> = ({
+  adapter,
+  status,
+  lastUsedAt,
+}) => (
+  <div className="relative shrink-0">
+    <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
+      <AdapterIcon adapter={adapter} className="h-6 w-6" />
+    </div>
+    <LivenessDot
+      status={status}
+      detail={livenessDetail(status, lastUsedAt)}
+      className={cn(
+        'absolute -right-0.5 -bottom-0.5',
+        status === 'working' && 'animate-pulse',
+      )}
+    />
+  </div>
+)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
@@ -0,0 +1,55 @@
+import type { FC } from 'react'
+import { Badge } from '@/components/ui/badge'
+import { displayName } from '../agent-display.helpers'
+import type { AgentListItem } from '../agents-page-types'
+import type { AgentLiveness } from '../LivenessDot'
+import { AgentSparkline } from './AgentSparkline'
+import { PinToggle } from './PinToggle'
+
+interface AgentTitleRowProps {
+  agent: AgentListItem
+  status: AgentLiveness
+  pinned: boolean
+  turnsByDay: number[]
+  failedByDay: number[]
+  onPinToggle: (next: boolean) => void
+}
+
+/**
+ * Title strip: name + status badge + (right-aligned) sparkline. The
+ * pin toggle follows the title so the title always sits flush left
+ * regardless of pin state — moving the star left of the title indents
+ * the row's first line off-axis from the model/preview/meta lines
+ * below it. When unpinned and not hovered, the toggle is removed from
+ * layout entirely so it reserves no space at all.
+ */
+export const AgentTitleRow: FC<AgentTitleRowProps> = ({
+  agent,
+  status,
+  pinned,
+  turnsByDay,
+  failedByDay,
+  onPinToggle,
+}) => (
+  <div className="mb-1 flex items-center gap-2">
+    <span className="truncate font-semibold">{displayName(agent)}</span>
+    {status === 'working' && (
+      <Badge
+        variant="secondary"
+        className="bg-amber-50 text-amber-900 hover:bg-amber-50"
+      >
+        Working
+      </Badge>
+    )}
+    {status === 'asleep' && (
+      <Badge variant="outline" className="text-muted-foreground">
+        Asleep
+      </Badge>
+    )}
+    {status === 'error' && <Badge variant="destructive">Attention</Badge>}
+    <PinToggle pinned={pinned} onToggle={onPinToggle} />
+    <div className="ml-auto">
+      <AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
+    </div>
+  </div>
+)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
@@ -0,0 +1,63 @@
+import type { FC } from 'react'
+import {
+  HoverCard,
+  HoverCardContent,
+  HoverCardTrigger,
+} from '@/components/ui/hover-card'
+import { Progress } from '@/components/ui/progress'
+import { formatTokens } from './agent-row.helpers'
+import type { AgentTokenUsage } from './agent-row.types'
+
+interface AgentTokenSummaryProps {
+  tokens: AgentTokenUsage | null
+}
+
+/**
+ * Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
+ * (the only window we can compute reliably from the session record).
+ * Per-window stats land in a follow-up once the activity ledger ships.
+ */
+export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
+  if (!tokens) return null
+  const { input, output } = tokens.cumulative
+  const total = input + output
+  if (total === 0) return null
+  const inputPct = (input / total) * 100
+
+  return (
+    <HoverCard openDelay={200}>
+      <HoverCardTrigger asChild>
+        <span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
+          {formatTokens(total)} tokens
+        </span>
+      </HoverCardTrigger>
+      <HoverCardContent side="top" align="end" className="w-72 text-sm">
+        <div className="mb-3 flex items-center justify-between">
+          <span className="font-medium">Lifetime tokens</span>
+          <span className="text-muted-foreground text-xs tabular-nums">
+            {formatTokens(total)} total
+          </span>
+        </div>
+
+        <div className="space-y-2">
+          <div className="flex items-center justify-between text-xs">
+            <span className="text-muted-foreground">Input</span>
+            <span className="tabular-nums">{formatTokens(input)}</span>
+          </div>
+          <Progress value={inputPct} className="h-1.5" />
+
+          <div className="mt-2 flex items-center justify-between text-xs">
+            <span className="text-muted-foreground">Output</span>
+            <span className="tabular-nums">{formatTokens(output)}</span>
+          </div>
+          <Progress value={100 - inputPct} className="h-1.5" />
+        </div>
+
+        <p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
+          Cumulative across every turn this agent has run. Per-window stats
+          arrive in a future release.
+        </p>
+      </HoverCardContent>
+    </HoverCard>
+  )
+}
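The percentage arithmetic that `AgentTokenSummary` does inline can be pulled into a pure function and unit-tested without rendering the HoverCard. This is a minimal sketch. `splitTokens` and `TokenSplit` are hypothetical names and are not part of this PR.

```typescript
// Hypothetical pure version of the split AgentTokenSummary computes inline.
interface TokenSplit {
  total: number
  inputPct: number
  outputPct: number
}

function splitTokens(input: number, output: number): TokenSplit | null {
  const total = input + output
  // Same early return as the component: zero usage renders nothing.
  if (total === 0) return null
  const inputPct = (input / total) * 100
  // Derive the output share from the input share so the two bars are complementary.
  return { total, inputPct, outputPct: 100 - inputPct }
}

console.log(splitTokens(6_000, 2_000)) // { total: 8000, inputPct: 75, outputPct: 25 }
console.log(splitTokens(0, 0)) // null
```

The component already computes the second bar as `100 - inputPct`. That choice means the input and output bars complement each other by construction instead of being rounded independently.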
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
@@ -0,0 +1,60 @@
+import { Star } from 'lucide-react'
+import type { FC } from 'react'
+import { Button } from '@/components/ui/button'
+import {
+  Tooltip,
+  TooltipContent,
+  TooltipProvider,
+  TooltipTrigger,
+} from '@/components/ui/tooltip'
+import { cn } from '@/lib/utils'
+
+interface PinToggleProps {
+  pinned: boolean
+  onToggle: (next: boolean) => void
+}
+
+/**
+ * Trailing star toggle. The button is *always rendered* — only its
+ * opacity changes between pinned/unpinned/hover states — so the title
+ * row's height is constant. Hiding the slot via `display: none` would
+ * collapse the row's vertical metrics on hover and shift every card
+ * below in the rail.
+ *
+ * Placement is trailing the title (after the status badge) so the
+ * title itself flushes left regardless of pin state — leading the
+ * row with the star would indent the title relative to the model /
+ * preview / meta lines beneath it.
+ */
+export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
+  <TooltipProvider delayDuration={300}>
+    <Tooltip>
+      <TooltipTrigger asChild>
+        <Button
+          variant="ghost"
+          size="icon"
+          className={cn(
+            'size-6 text-muted-foreground transition-opacity hover:text-foreground',
+            pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
+          )}
+          aria-pressed={pinned}
+          aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
+          onClick={(event) => {
+            event.stopPropagation()
+            onToggle(!pinned)
+          }}
+        >
+          <Star
+            className={cn(
+              'size-3.5',
+              pinned && 'fill-amber-400 text-amber-500',
+            )}
+          />
+        </Button>
+      </TooltipTrigger>
+      <TooltipContent side="top" className="text-xs">
+        {pinned ? 'Unpin' : 'Pin to top'}
+      </TooltipContent>
+    </Tooltip>
+  </TooltipProvider>
+)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
@@ -0,0 +1,73 @@
+import { describe, expect, it } from 'bun:test'
+import {
+  firstNonBlankLine,
+  formatLocalDate,
+  formatTokens,
+  ROW_BAR_COUNT,
+  truncate,
+} from './agent-row.helpers'
+
+describe('formatTokens', () => {
+  it('renders zero / NaN as "0"', () => {
+    expect(formatTokens(0)).toBe('0')
+    expect(formatTokens(Number.NaN)).toBe('0')
+  })
+
+  it('renders sub-1K as integer', () => {
+    expect(formatTokens(142)).toBe('142')
+  })
+
+  it('renders K with one decimal under 10', () => {
+    expect(formatTokens(8_400)).toBe('8.4K')
+  })
+
+  it('drops the decimal at >=10K', () => {
+    expect(formatTokens(120_000)).toBe('120K')
+  })
+
+  it('renders M with one decimal under 10', () => {
+    expect(formatTokens(1_200_000)).toBe('1.2M')
+  })
+})
+
+describe('firstNonBlankLine', () => {
+  it('returns the first non-blank line', () => {
+    expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
+  })
+
+  it('skips USER_QUERY envelope tags', () => {
+    expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
+      'fix tests',
+    )
+  })
+
+  it('trims surrounding whitespace from a single-line input', () => {
+    expect(firstNonBlankLine('   single   ')).toBe('single')
+  })
+})
+
+describe('truncate', () => {
+  it('returns input unchanged when within limit', () => {
+    expect(truncate('hello', 10)).toBe('hello')
+  })
+
+  it('appends an ellipsis when over limit', () => {
+    expect(truncate('hello world', 6)).toBe('hello…')
+  })
+})
+
+describe('formatLocalDate', () => {
+  const today = new Date('2026-04-30T12:00:00Z')
+
+  it('labels today and yesterday explicitly', () => {
+    expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
+    expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
+  })
+
+  it('returns a "Mon D" format for older days', () => {
+    const label = formatLocalDate(0, today)
+    // Exact text is locale-dependent ("Apr 17" in en-US); the regex
+    // expects a month abbreviation followed by a day number.
+    expect(label).toMatch(/[A-Za-z]+ \d+/)
+  })
+})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
@@ -0,0 +1,64 @@
+/**
+ * Pure formatters consumed by row sub-components. Kept distinct from
+ * `agent-display.helpers.ts` (page-level helpers) so the row internals
+ * have an obvious single home.
+ */
+
+const TOKEN_THRESHOLDS: Array<[number, string]> = [
+  [1_000_000, 'M'],
+  [1_000, 'K'],
+]
+
+/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
+export function formatTokens(n: number): string {
+  if (!Number.isFinite(n) || n <= 0) return '0'
+  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
+    if (n >= threshold) {
+      const value = n / threshold
+      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
+      return `${decimal}${suffix}`
+    }
+  }
+  return String(Math.round(n))
+}
+
+const USER_QUERY_OPEN = /^<USER_QUERY>$/i
+const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
+
+/**
+ * First non-blank line, with the BrowserOS user-system-prompt
+ * `<USER_QUERY>` envelope tags stripped so previews don't show
+ * structural noise.
+ */
+export function firstNonBlankLine(text: string): string {
+  const lines = text.split('\n').map((line) => line.trim())
+  for (const line of lines) {
+    if (!line) continue
+    if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
+    return line
+  }
+  return text.trim()
+}
+
+export function truncate(text: string, max: number): string {
+  if (text.length <= max) return text
+  return `${text.slice(0, max - 1).trimEnd()}…`
+}
+
+const SPARKLINE_DAYS = 14
+
+/**
+ * "today" / "yesterday" / "Apr 17" — given an index 0..13 from
+ * oldest → newest. `today` defaults to `new Date()` so callers don't
+ * have to thread a clock through.
+ */
+export function formatLocalDate(idx: number, today: Date = new Date()): string {
+  if (idx === SPARKLINE_DAYS - 1) return 'today'
+  if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
+  const offset = SPARKLINE_DAYS - 1 - idx
+  const date = new Date(today)
+  date.setDate(date.getDate() - offset)
+  return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
+}
+
+export const ROW_BAR_COUNT = SPARKLINE_DAYS
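`formatTokens` has an edge case. It picks the precision before rounding, so `formatTokens(9_999)` returns `"10.0K"` and `formatTokens(999_999)` returns `"1000K"`. The sketch below shows a variant that rounds first and promotes a value to the next unit when rounding crosses a boundary. `formatTokensRounded` is a hypothetical name, and the promotion rule is an assumption about the desired display.

```typescript
const UNITS: Array<[number, string]> = [
  [1_000_000, 'M'],
  [1_000, 'K'],
]

// Hypothetical variant of formatTokens: round first, then choose precision.
function formatTokensRounded(n: number): string {
  if (!Number.isFinite(n) || n <= 0) return '0'
  for (let i = 0; i < UNITS.length; i++) {
    const unit = UNITS[i]
    if (!unit || n < unit[0]) continue
    const [threshold, suffix] = unit
    const value = n / threshold
    // Keep one decimal only if it still reads below 10 after rounding.
    const oneDecimal = Math.round(value * 10) / 10
    if (oneDecimal < 10) return `${oneDecimal.toFixed(1)}${suffix}`
    const whole = Math.round(value)
    // Values that would print as "1000K" move up to the next larger unit.
    if (whole >= 1000 && i > 0) return formatTokensRounded(whole * threshold)
    return `${whole}${suffix}`
  }
  const rounded = Math.round(n)
  // 999.6 rounds to 1000, which belongs in the K range.
  return rounded >= 1000 ? formatTokensRounded(rounded) : String(rounded)
}

console.log(formatTokensRounded(9_999)) // 10K
console.log(formatTokensRounded(999_999)) // 1.0M
console.log(formatTokensRounded(8_400)) // 8.4K
```

The inputs exercised by the existing tests (`8_400`, `120_000`, `1_200_000`, `142`, `0`) produce the same strings as `formatTokens`.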
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
@@ -0,0 +1,51 @@
+import type { HarnessAgentAdapter } from '../agent-harness-types'
+import type { AgentListItem } from '../agents-page-types'
+import type { AgentLiveness } from '../LivenessDot'
+
+/**
+ * Trailing-7-day and lifetime token usage. Server returns `null` when
+ * no session record exists yet for the agent.
+ */
+export interface AgentTokenUsage {
+  last7d: { input: number; output: number; requestCount: number }
+  cumulative: { input: number; output: number }
+}
+
+export interface AgentAdapterHealth {
+  healthy: boolean
+  reason?: string
+}
+
+/**
+ * Everything an `AgentRowCard` needs to render. Mirrors the shape
+ * `useHarnessAgents` exposes; the page assembles one entry per row in
+ * `AgentList` and passes it down. Sub-components only see slices of
+ * this object — no prop drilling beyond two levels.
+ */
+export interface AgentRowData {
+  agent: AgentListItem
+  adapter: HarnessAgentAdapter | 'unknown'
+  modelLabel: string | null
+  reasoningEffort: string | null
+  status: AgentLiveness
+  lastUsedAt: number | null
+  pinned: boolean
+  cwd: string | null
+  lastUserMessage: string | null
+  tokens: AgentTokenUsage | null
+  /** 14 entries, oldest → newest. Today is the last index. */
+  turnsByDay: number[]
+  /** Same length and ordering as `turnsByDay`. */
+  failedByDay: number[]
+  lastError: string | null
+  lastErrorAt: number | null
+  /** When non-null, an in-flight turn this row can be resumed from. */
+  activeTurnId: string | null
+  /** Adapter-level health, shared across rows for the same adapter. */
+  adapterHealth: AgentAdapterHealth | null
+}
+
+export interface AgentRowCallbacks {
+  onDelete: (agent: AgentListItem) => void
+  onPinToggle: (agent: AgentListItem, next: boolean) => void
+}
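`turnsByDay` and `failedByDay` are 14-entry arrays ordered oldest → newest, with today in the last slot. This diff does not show how the server fills them. The sketch below shows one possible bucketing, under two assumptions: per-turn timestamps are epoch milliseconds, and days break at local midnight, which matches the local labels from `formatLocalDate`. `bucketTurnsByDay` and `startOfLocalDay` are hypothetical helpers.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// Local midnight for the day containing `ms`.
function startOfLocalDay(ms: number): number {
  const date = new Date(ms)
  date.setHours(0, 0, 0, 0)
  return date.getTime()
}

// Hypothetical: count timestamps per local day, oldest → newest, today last.
function bucketTurnsByDay(
  timestamps: number[],
  now: number = Date.now(),
  days = 14,
): number[] {
  const counts = new Array<number>(days).fill(0)
  const today = startOfLocalDay(now)
  for (const ts of timestamps) {
    // Math.round absorbs the 23h/25h days around DST transitions.
    const daysAgo = Math.round((today - startOfLocalDay(ts)) / DAY_MS)
    const idx = days - 1 - daysAgo
    // Drop anything older than the window or in the future.
    if (idx >= 0 && idx < days) counts[idx] = (counts[idx] ?? 0) + 1
  }
  return counts
}

const now = new Date(2026, 3, 30, 12).getTime() // Apr 30, local noon
const counts = bucketTurnsByDay(
  [now, new Date(2026, 3, 30, 9).getTime(), new Date(2026, 3, 29, 18).getTime()],
  now,
)
console.log(counts.slice(-2)) // [ 1, 2 ]
```

Running the same function over failed-turn timestamps yields `failedByDay` with the same length and ordering.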
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.test.ts
@@ -0,0 +1,104 @@
+import { describe, expect, it } from 'bun:test'
+import type { HarnessAgent } from './agent-harness-types'
+import {
+  compareAgentsByPinThenRecency,
+  orderAgentsByPinThenRecency,
+} from './agents-list-order'
+
+function makeAgent(input: {
+  id: string
+  pinned?: boolean
+  lastUsedAt?: number | null
+}): HarnessAgent {
+  return {
+    id: input.id,
+    name: input.id,
+    adapter: 'codex',
+    permissionMode: 'approve-all',
+    sessionKey: 'session',
+    createdAt: 0,
+    updatedAt: 0,
+    pinned: input.pinned,
+    lastUsedAt: input.lastUsedAt,
+  }
+}
+
+describe('orderAgentsByPinThenRecency', () => {
+  it('floats pinned agents to the top regardless of recency', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
+      makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
+      makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
+  })
+
+  it('sorts by lastUsedAt desc within each pin group', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
+      makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
+      makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
+      makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual([
+      'newer-pin',
+      'older-pin',
+      'newer',
+      'older',
+    ])
+  })
+
+  it('seed-pins the gateway main agent above other never-used agents', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
+      makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
+      makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
+  })
+
+  it('drops the main seed-pin once the agent has been used', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
+      makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
+  })
+
+  it('puts never-used agents below recently-used ones', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
+      makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
+  })
+
+  it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
+    const result = orderAgentsByPinThenRecency([
+      makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
+      makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
+    ])
+    expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
+  })
+})
+
+describe('compareAgentsByPinThenRecency', () => {
+  it('produces the same order as the harness-shape helper', () => {
+    const items = [
+      { id: 'older', pinned: false, lastUsedAt: 50 },
+      { id: 'newer', pinned: false, lastUsedAt: 80 },
+      { id: 'pinned', pinned: true, lastUsedAt: 1 },
+    ]
+    const sorted = [...items].sort(compareAgentsByPinThenRecency)
+    expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
+  })
+
+  it('seeds the main agent above other never-used rows', () => {
+    const items = [
+      { id: 'zzz', pinned: false, lastUsedAt: null },
+      { id: 'main', pinned: false, lastUsedAt: null },
+    ]
+    const sorted = [...items].sort(compareAgentsByPinThenRecency)
+    expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
+  })
+})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.ts
@@ -0,0 +1,59 @@
+import type { HarnessAgent } from './agent-harness-types'
+
+/**
+ * Stable ordering for index-shaped agent surfaces (the `/agents` rail
+ * and the chat-screen rail at `/agents/:agentId`). Pinned rows float
+ * to the top, then recency desc, with never-used agents falling to
+ * the bottom in id-stable order. While the gateway's `main` agent is
+ * still unused, it is seed-pinned just below the pinned rows, above
+ * every other unpinned agent, so a fresh install has an obvious start.
+ *
+ * NOT the same rule as the home grid (`orderHomeAgents`): home is
+ * action-shaped — active-turn floats to the top — so users can
+ * resume what's running. The chat rail keeps recency stable so it
+ * doesn't reshuffle as turns transition every 5s.
+ */
+export function orderAgentsByPinThenRecency(
+  agents: HarnessAgent[],
+): HarnessAgent[] {
+  return [...agents].sort((a, b) => {
+    const aPinned = a.pinned ?? false
+    const bPinned = b.pinned ?? false
+    if (aPinned !== bPinned) return aPinned ? -1 : 1
+
+    const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
+    const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
+    if (aSeed && !bSeed) return -1
+    if (!aSeed && bSeed) return 1
+
+    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
+    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
+    if (aValue !== bValue) return bValue - aValue
+
+    return a.id.localeCompare(b.id)
+  })
+}
+
+/**
+ * Same comparator, but operates over arbitrary records that carry
+ * `pinned`, `lastUsedAt`, and `id` fields (all required). Used by the
+ * `/agents` `AgentList` which pivots `AgentListItem` + harness
+ * lookup into a sortable shape; both surfaces stay on identical
+ * sort semantics through this adapter.
+ */
+export function compareAgentsByPinThenRecency<
+  T extends { pinned: boolean; lastUsedAt: number | null; id: string },
+>(a: T, b: T): number {
+  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
+
+  const aSeed = a.id === 'main' && a.lastUsedAt === null
+  const bSeed = b.id === 'main' && b.lastUsedAt === null
+  if (aSeed && !bSeed) return -1
+  if (!aSeed && bSeed) return 1
+
+  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
+  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
+  if (aValue !== bValue) return bValue - aValue
+
+  return a.id.localeCompare(b.id)
+}
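`orderAgentsByPinThenRecency` and `compareAgentsByPinThenRecency` restate the same rule. They differ only in how they treat a missing `pinned` or `lastUsedAt`. One way to keep them from drifting apart is to normalize those optional fields once and delegate to a single comparator. This is a minimal sketch. `AgentLike` is a simplified stand-in for `HarnessAgent` that keeps only the fields the sort reads, and `orderByPinThenRecency` and `SortKey` are illustrative names.

```typescript
interface SortKey {
  id: string
  pinned: boolean
  lastUsedAt: number | null
}

// Simplified stand-in for HarnessAgent, where both fields are optional.
interface AgentLike {
  id: string
  pinned?: boolean
  lastUsedAt?: number | null
}

// Same rule as compareAgentsByPinThenRecency.
function compareByPinThenRecency(a: SortKey, b: SortKey): number {
  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
  // An unused `main` sorts above every other unpinned agent.
  const aSeed = a.id === 'main' && a.lastUsedAt === null
  const bSeed = b.id === 'main' && b.lastUsedAt === null
  if (aSeed !== bSeed) return aSeed ? -1 : 1
  // Recency desc; never-used agents sink to the bottom.
  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  if (aValue !== bValue) return bValue - aValue
  return a.id.localeCompare(b.id)
}

// Normalize optional fields once, then delegate to the shared comparator.
function orderByPinThenRecency<T extends AgentLike>(agents: T[]): T[] {
  const key = (agent: T): SortKey => ({
    id: agent.id,
    pinned: agent.pinned ?? false,
    lastUsedAt: agent.lastUsedAt ?? null,
  })
  return [...agents].sort((a, b) => compareByPinThenRecency(key(a), key(b)))
}

console.log(
  orderByPinThenRecency([
    { id: 'aaa' },
    { id: 'main' },
    { id: 'b', pinned: true, lastUsedAt: 5 },
    { id: 'c', lastUsedAt: 10 },
  ]).map((agent) => agent.id),
) // [ 'b', 'main', 'c', 'aaa' ]
```

With this shape, `orderAgentsByPinThenRecency` becomes a thin normalizing wrapper, and the seed-pin and tiebreak rules exist in exactly one place.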
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
@@ -8,6 +8,7 @@ import {
  type HarnessAdapterDescriptor,
  type HarnessAgent,
  type HarnessAgentHistoryPage,
+  type HarnessQueuedMessage,
  mapHarnessAgentToEntry,
 } from './agent-harness-types'
 import type { OpenClawStatus } from './useOpenClaw'
@@ -135,6 +136,63 @@ export function useCreateHarnessAgent() {
  })
 }

+/**
+ * Apply a partial update to a harness agent. Used by the pin-toggle
+ * star and (eventually) the inline rename UI. Optimistically writes
+ * the patch into the listing query cache so the row updates instantly,
+ * then rolls back if the server rejects the change.
+ */
+export function useUpdateHarnessAgent() {
+  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+
+  return useMutation({
+    mutationFn: async (input: {
+      agentId: string
+      patch: { name?: string; pinned?: boolean }
+    }) => {
+      if (!baseUrl || urlLoading) {
+        throw new Error('BrowserOS agent server URL is not ready')
+      }
+      const data = await agentsFetch<{ agent: HarnessAgent }>(
+        baseUrl,
+        `/${encodeURIComponent(input.agentId)}`,
+        {
+          method: 'PATCH',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify(input.patch),
+        },
+      )
+      return data.agent
+    },
+    onMutate: async ({ agentId, patch }) => {
+      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
+      await queryClient.cancelQueries({ queryKey })
+      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
+      if (!previous) return { previous: undefined }
+      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
+        ...previous,
+        agents: previous.agents.map((agent) =>
+          agent.id === agentId ? { ...agent, ...patch } : agent,
+        ),
+      })
+      return { previous }
+    },
+    onError: (_err, _vars, context) => {
+      if (!context?.previous) return
+      queryClient.setQueryData(
+        [AGENT_QUERY_KEYS.agents, baseUrl],
+        context.previous,
+      )
+    },
+    onSettled: async () => {
+      await queryClient.invalidateQueries({
+        queryKey: [AGENT_QUERY_KEYS.agents],
+      })
+    },
+  })
+}
+
 export function useDeleteHarnessAgent() {
  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
  const queryClient = useQueryClient()
@@ -206,6 +264,8 @@ export interface HarnessActiveTurnInfo {
  lastSeq: number
  startedAt: number
  endedAt?: number
+  /** User message that kicked off the turn; null when not captured. */
+  prompt: string | null
 }

 /**
@@ -260,3 +320,145 @@ export async function fetchHarnessAgentHistory(
    `/${encodeURIComponent(agentId)}/sessions/main/history`,
  )
 }
+
+export interface EnqueueMessageInput {
+  message: string
+  attachments?: ReadonlyArray<unknown>
+}
+
+export async function enqueueHarnessMessage(
+  agentId: string,
+  input: EnqueueMessageInput,
+): Promise<HarnessQueuedMessage> {
+  const baseUrl = await getAgentServerUrl()
+  const response = await fetch(
+    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
+    {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify({
+        message: input.message,
+        ...(input.attachments && input.attachments.length > 0
+          ? { attachments: input.attachments }
+          : {}),
+      }),
+    },
+  )
+  if (!response.ok) {
+    let message = `Request failed with status ${response.status}`
+    try {
+      const body = (await response.json()) as { error?: string }
+      if (body.error) message = body.error
+    } catch {}
+    throw new Error(message)
+  }
+  const body = (await response.json()) as { queued: HarnessQueuedMessage }
+  return body.queued
+}
+
+export async function removeHarnessQueuedMessage(
+  agentId: string,
+  messageId: string,
+): Promise<{ removed: boolean }> {
+  const baseUrl = await getAgentServerUrl()
+  const response = await fetch(
+    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
+      messageId,
+    )}`,
+    { method: 'DELETE' },
+  )
+  if (!response.ok) return { removed: false }
+  return (await response.json()) as { removed: boolean }
+}
+
+/**
+ * Optimistic enqueue: writes the new queued message into the listing
+ * cache immediately so the queue panel reflects the change without
+ * waiting for the next poll. Rolls back if the server rejects.
+ */
+export function useEnqueueHarnessMessage() {
+  const { baseUrl } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+
+  return useMutation({
+    mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
+      enqueueHarnessMessage(input.agentId, input),
+    onMutate: async (input) => {
+      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
+      await queryClient.cancelQueries({ queryKey })
+      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
+      if (!previous) return { previous: undefined }
+      const optimistic: HarnessQueuedMessage = {
+        id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
+        createdAt: Date.now(),
+        message: input.message,
+      }
+      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
+        ...previous,
+        agents: previous.agents.map((agent) =>
+          agent.id === input.agentId
+            ? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
+            : agent,
+        ),
+      })
+      return { previous }
+    },
+    onError: (_err, _vars, context) => {
+      if (!context?.previous) return
+      queryClient.setQueryData(
+        [AGENT_QUERY_KEYS.agents, baseUrl],
+        context.previous,
+      )
+    },
+    onSettled: async () => {
+      await queryClient.invalidateQueries({
+        queryKey: [AGENT_QUERY_KEYS.agents],
+      })
+    },
+  })
+}
+
+/**
+ * Optimistic queue removal; mirrors `useEnqueueHarnessMessage`.
+ */
+export function useRemoveHarnessQueuedMessage() {
+  const { baseUrl } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+
+  return useMutation({
+    mutationFn: async (input: { agentId: string; messageId: string }) =>
+      removeHarnessQueuedMessage(input.agentId, input.messageId),
+    onMutate: async (input) => {
+      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
+      await queryClient.cancelQueries({ queryKey })
+      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
+      if (!previous) return { previous: undefined }
+      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
+        ...previous,
+        agents: previous.agents.map((agent) =>
+          agent.id === input.agentId
+            ? {
+                ...agent,
+                queue: (agent.queue ?? []).filter(
+                  (entry) => entry.id !== input.messageId,
+                ),
+              }
+            : agent,
+        ),
+      })
+      return { previous }
+    },
+    onError: (_err, _vars, context) => {
+      if (!context?.previous) return
+      queryClient.setQueryData(
+        [AGENT_QUERY_KEYS.agents, baseUrl],
+        context.previous,
+      )
+    },
+    onSettled: async () => {
+      await queryClient.invalidateQueries({
+        queryKey: [AGENT_QUERY_KEYS.agents],
+      })
+    },
+  })
+}
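The three optimistic hooks (`useUpdateHarnessAgent`, `useEnqueueHarnessMessage`, `useRemoveHarnessQueuedMessage`) share the same TanStack Query shape:

- `onMutate` cancels in-flight fetches, snapshots the listing, and writes an immutable update.
- `onError` restores the snapshot.
- `onSettled` invalidates the listing.

The rollback is only correct because the cache writes never mutate the snapshot. The sketch below expresses those cache transforms as pure functions. The stand-in types are an assumption and keep only the fields read here; the real listing types may carry more.

```typescript
// Simplified stand-ins for the listing types; only the fields used here.
interface QueuedMessage {
  id: string
  createdAt: number
  message: string
}

interface Agent {
  id: string
  name: string
  pinned?: boolean
  queue?: QueuedMessage[]
}

interface AgentsListing {
  agents: Agent[]
}

// Each transform returns a new listing and leaves `previous` untouched,
// which is what makes the onError rollback safe.
function patchAgent(
  previous: AgentsListing,
  agentId: string,
  patch: Partial<Pick<Agent, 'name' | 'pinned'>>,
): AgentsListing {
  return {
    ...previous,
    agents: previous.agents.map((agent) =>
      agent.id === agentId ? { ...agent, ...patch } : agent,
    ),
  }
}

function appendQueued(previous: AgentsListing, agentId: string, entry: QueuedMessage): AgentsListing {
  return {
    ...previous,
    agents: previous.agents.map((agent) =>
      agent.id === agentId ? { ...agent, queue: [...(agent.queue ?? []), entry] } : agent,
    ),
  }
}

function removeQueued(previous: AgentsListing, agentId: string, messageId: string): AgentsListing {
  return {
    ...previous,
    agents: previous.agents.map((agent) =>
      agent.id === agentId
        ? { ...agent, queue: (agent.queue ?? []).filter((entry) => entry.id !== messageId) }
        : agent,
    ),
  }
}

// onMutate: keep the snapshot, write the optimistic listing.
const previous: AgentsListing = { agents: [{ id: 'a1', name: 'Review Bot', queue: [] }] }
const optimistic = appendQueued(patchAgent(previous, 'a1', { pinned: true }), 'a1', {
  id: 'optimistic-1',
  createdAt: 0,
  message: 'next task',
})
console.log(optimistic.agents[0]?.pinned, optimistic.agents[0]?.queue?.length) // true 1
// onError would put `previous` back; it was never mutated.
console.log(previous.agents[0]?.pinned, previous.agents[0]?.queue?.length) // undefined 0
```

`removeHarnessQueuedMessage` resolves `{ removed: false }` on a non-OK response instead of throwing. As a result, `onError` in `useRemoveHarnessQueuedMessage` only fires when the request itself throws (for example, on a network failure). A removal the server rejects is corrected by the `onSettled` invalidation instead.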
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
@@ -1,5 +1,8 @@
 import { describe, expect, it } from 'bun:test'
-import type { HarnessAdapterDescriptor } from '@/entrypoints/app/agents/agent-harness-types'
+import type {
+  HarnessAdapterDescriptor,
+  HarnessAgent,
+} from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import {
  buildSidepanelChatTargets,
@@ -77,58 +80,96 @@ const adapters: HarnessAdapterDescriptor[] = [
  },
 ]

+const agents: HarnessAgent[] = [
+  {
+    id: 'agent-codex',
+    name: 'Review Bot',
+    adapter: 'codex',
+    modelId: 'gpt-5.5',
+    reasoningEffort: 'medium',
+    permissionMode: 'approve-all',
+    sessionKey: 'agent:agent-codex:main',
+    createdAt: timestamp,
+    updatedAt: timestamp,
+  },
+  {
+    id: 'agent-openclaw',
+    name: 'Research Claw',
+    adapter: 'openclaw',
+    modelId: 'default',
+    reasoningEffort: 'high',
+    permissionMode: 'approve-all',
+    sessionKey: 'agent:agent-openclaw:main',
+    createdAt: timestamp,
+    updatedAt: timestamp,
+  },
+]
+
 describe('buildSidepanelChatTargets', () => {
-  it('returns LLM targets plus one ACP target per adapter model', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters })
+  it('returns LLM targets plus one ACP target per persisted harness agent', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })

    expect(targets.map((target) => target.id)).toEqual([
      'browseros',
      'anthropic-sonnet',
-      'acp:claude:sonnet:medium',
-      'acp:claude:haiku:medium',
-      'acp:codex:gpt-5.5:medium',
-      'acp:openclaw:default:medium',
+      'agent-codex',
+      'agent-openclaw',
    ])
  })

-  it('emits a single default ACP target for adapters with no per-session model picker', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters })
-    const openclaw = targets.find(
-      (target) => target.id === 'acp:openclaw:default:medium',
-    )
+  it('does not emit catalog-only ACP targets without persisted agents', () => {
+    const targets = buildSidepanelChatTargets({
+      providers,
+      adapters,
+      agents: [],
+    })
+
+    expect(targets.map((target) => target.id)).toEqual([
+      'browseros',
+      'anthropic-sonnet',
+    ])
+  })
+
+  it('uses the created OpenClaw agent name instead of a generic adapter target', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+    const openclaw = targets.find((target) => target.id === 'agent-openclaw')

    expect(openclaw).toMatchObject({
      kind: 'acp',
+      id: 'agent-openclaw',
+      agentId: 'agent-openclaw',
      adapter: 'openclaw',
      adapterName: 'OpenClaw',
      modelId: 'default',
      modelLabel: 'default',
-      // Without a model picker, the target name is just the adapter
-      // name — the user picks the adapter, not a model under it.
-      name: 'OpenClaw',
+      name: 'Research Claw',
      modelControl: 'best-effort',
-      reasoningEffort: 'medium',
+      reasoningEffort: 'high',
    })
  })

-  it('preserves ACP model-control and recommendation metadata', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters })
-    const haiku = targets.find(
-      (target) => target.id === 'acp:claude:haiku:medium',
-    )
+  it('preserves adapter metadata for created agent targets', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+    const codex = targets.find((target) => target.id === 'agent-codex')

-    expect(haiku).toMatchObject({
+    expect(codex).toMatchObject({
      kind: 'acp',
-      adapter: 'claude',
-      modelId: 'haiku',
-      modelControl: 'best-effort',
+      agentId: 'agent-codex',
+      adapter: 'codex',
+      adapterName: 'Codex',
+      modelId: 'gpt-5.5',
+      modelLabel: 'GPT-5.5',
+      modelControl: 'runtime-supported',
      recommended: true,
      reasoningEffort: 'medium',
+      reasoningEffortLabel: 'Medium',
    })
  })

-  it('still returns LLM targets when ACP adapters are unavailable', () => {
-    expect(buildSidepanelChatTargets({ providers, adapters: [] })).toEqual([
+  it('still returns LLM targets when agents and adapters are unavailable', () => {
+    expect(
+      buildSidepanelChatTargets({ providers, adapters: [], agents: [] }),
+    ).toEqual([
      {
        kind: 'llm',
        id: 'browseros',
@@ -149,7 +190,7 @@ describe('buildSidepanelChatTargets', () => {

 describe('resolveSidepanelChatTarget', () => {
  it('resolves selected LLM targets back to their provider config', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
    const resolved = resolveSidepanelChatTarget({
      targets,
      defaultProviderId: 'browseros',
@@ -161,13 +202,32 @@ describe('resolveSidepanelChatTarget', () => {
  })

  it('falls back to the current default LLM provider when a persisted ACP target is stale', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters: [] })
+    const targets = buildSidepanelChatTargets({
+      providers,
+      adapters,
+      agents: [],
+    })

    expect(
      resolveSidepanelChatTarget({
        targets,
        defaultProviderId: 'anthropic-sonnet',
-        selection: { kind: 'acp', id: 'acp:claude:haiku:medium' },
+        selection: { kind: 'acp', id: 'agent-codex' },
+      }),
+    ).toMatchObject({
+      kind: 'llm',
+      id: 'anthropic-sonnet',
+    })
+  })
+
+  it('falls back when an old catalog-style ACP target id is persisted', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+
+    expect(
+      resolveSidepanelChatTarget({
+        targets,
+        defaultProviderId: 'anthropic-sonnet',
+        selection: { kind: 'acp', id: 'acp:codex:gpt-5.5:medium' },
      }),
    ).toMatchObject({
      kind: 'llm',
@@ -180,10 +240,8 @@ describe('persistSidepanelChatTargetSelection', () => {
  it('stores only target identity and does not mutate LLM provider arrays', async () => {
    let savedSelection: SidepanelChatTargetSelection | null = null
    const originalProviders = providers.map((provider) => ({ ...provider }))
-    const targets = buildSidepanelChatTargets({ providers, adapters })
-    const target = targets.find(
-      (candidate) => candidate.id === 'acp:codex:gpt-5.5:medium',
-    )
+    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+    const target = targets.find((candidate) => candidate.id === 'agent-codex')

    await persistSidepanelChatTargetSelection(target, {
      setValue: async (value) => {
@@ -193,7 +251,7 @@ describe('persistSidepanelChatTargetSelection', () => {

    expect(savedSelection as SidepanelChatTargetSelection | null).toEqual({
      kind: 'acp',
-      id: 'acp:codex:gpt-5.5:medium',
+      id: 'agent-codex',
    })
    expect(providers).toEqual(originalProviders)
  })
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
@@ -1,5 +1,6 @@
 import type {
  HarnessAdapterDescriptor,
+  HarnessAgent,
  HarnessAgentAdapter,
 } from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
@@ -19,6 +20,7 @@ export type SidepanelChatTarget =
      id: string
      name: string
      type: 'acp'
+      agentId: string
      adapter: HarnessAgentAdapter
      adapterName: string
      modelId: string
@@ -37,6 +39,7 @@ export type SidepanelChatTargetSelection = Pick<
 interface BuildSidepanelChatTargetsInput {
  providers: LlmProviderConfig[]
  adapters: HarnessAdapterDescriptor[]
+  agents?: HarnessAgent[]
 }

 interface ResolveSidepanelChatTargetInput {
@@ -63,61 +66,49 @@ let sidepanelChatTargetSelectionStorage:
 export function buildSidepanelChatTargets({
  providers,
  adapters,
+  agents = [],
 }: BuildSidepanelChatTargetsInput): SidepanelChatTarget[] {
  return [
    ...providers.map(toLlmTarget),
-    ...adapters.flatMap(toAcpTargetsForAdapter),
+    ...agents.map((agent) => toAcpTargetForAgent(agent, adapters)),
  ]
 }

-function toAcpTargetsForAdapter(
-  adapter: HarnessAdapterDescriptor,
-): SidepanelChatTarget[] {
-  const reasoning = adapter.reasoningEfforts.find(
-    (effort) => effort.id === adapter.defaultReasoningEffort,
-  )
+function toAcpTargetForAgent(
+  agent: HarnessAgent,
+  adapters: HarnessAdapterDescriptor[],
+): SidepanelChatTarget {
+  const adapter = adapters.find((entry) => entry.id === agent.adapter)
+  const modelId = agent.modelId ?? adapter?.defaultModelId ?? 'default'
  const reasoningEffort =
-    reasoning?.id ?? adapter.defaultReasoningEffort ?? 'medium'
+    agent.reasoningEffort ?? adapter?.defaultReasoningEffort ?? 'medium'
+  const model = adapter?.models.find((entry) => entry.id === modelId)
+  const reasoning = adapter?.reasoningEfforts.find(
+    (effort) => effort.id === reasoningEffort,
+  )

-  // Adapters with no per-session model picker (e.g. OpenClaw, whose
-  // model lives on the gateway-side agent record) still need exactly
-  // one sidepanel target so the user can pick the adapter at all.
-  if (adapter.models.length === 0) {
-    return [
-      {
-        kind: 'acp',
-        id: buildAcpTargetId(
-          adapter.id,
-          adapter.defaultModelId,
-          reasoningEffort,
-        ),
-        name: adapter.name,
-        type: 'acp',
-        adapter: adapter.id,
-        adapterName: adapter.name,
-        modelId: adapter.defaultModelId,
-        modelLabel: 'default',
-        modelControl: adapter.modelControl,
-        reasoningEffort,
-        reasoningEffortLabel: reasoning?.label,
-      },
-    ]
-  }
-
-  return adapter.models.map((model) => ({
-    kind: 'acp' as const,
-    id: buildAcpTargetId(adapter.id, model.id, reasoningEffort),
-    name: `${adapter.name} ${model.label}`,
-    type: 'acp' as const,
-    adapter: adapter.id,
-    adapterName: adapter.name,
-    modelId: model.id,
-    modelLabel: model.label,
-    modelControl: adapter.modelControl,
-    recommended: model.recommended,
+  return {
+    kind: 'acp',
+    id: agent.id,
+    name: agent.name,
+    type: 'acp',
+    agentId: agent.id,
+    adapter: agent.adapter,
+    adapterName: adapter?.name ?? formatAdapterName(agent.adapter),
+    modelId,
+    modelLabel: model?.label ?? modelId,
+    modelControl: adapter?.modelControl ?? 'best-effort',
+    recommended: model?.recommended,
    reasoningEffort,
    reasoningEffortLabel: reasoning?.label,
-  }))
+  }
+}
+
+function formatAdapterName(adapter: HarnessAgentAdapter): string {
+  if (adapter === 'claude') return 'Claude Code'
+  if (adapter === 'codex') return 'Codex'
+  if (adapter === 'openclaw') return 'OpenClaw'
+  return adapter
 }

 export function resolveSidepanelChatTarget({
@@ -172,14 +163,6 @@ function toLlmTarget(provider: LlmProviderConfig): SidepanelChatTarget {
  }
 }

-export function buildAcpTargetId(
-  adapter: HarnessAgentAdapter,
-  modelId: string,
-  reasoningEffort: string,
-): string {
-  return `acp:${adapter}:${modelId}:${reasoningEffort}`
-}
-
 async function getSidepanelChatTargetSelectionStorage(): Promise<SidepanelChatTargetSelectionStore> {
  if (sidepanelChatTargetSelectionStorage) {
    return sidepanelChatTargetSelectionStorage
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
@@ -1,6 +1,9 @@
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
 import useDeepCompareEffect from 'use-deep-compare-effect'
-import { useAgentAdapters } from '@/entrypoints/app/agents/useAgents'
+import {
+  useAgentAdapters,
+  useHarnessAgents,
+} from '@/entrypoints/app/agents/useAgents'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
 import { type McpServer, useMcpServers } from '@/lib/mcp/mcpServerStorage'
@@ -38,6 +41,7 @@ export const useChatRefs = () => {
    isLoading: isLoadingProviders,
  } = useLlmProviders()
  const { adapters, loading: isLoadingAdapters } = useAgentAdapters()
+  const { harnessAgents, loading: isLoadingAgents } = useHarnessAgents()
  const { personalization } = usePersonalization()
  const [targetSelection, setTargetSelection] =
    useState<SidepanelChatTargetSelection | null>(null)
@@ -57,8 +61,9 @@ export const useChatRefs = () => {
      buildSidepanelChatTargets({
        providers: llmProviders,
        adapters,
+        agents: harnessAgents,
      }),
-    [llmProviders, adapters],
+    [llmProviders, adapters, harnessAgents],
  )

  const selectedChatTarget = useMemo(
@@ -116,6 +121,7 @@ export const useChatRefs = () => {
    selectedChatTarget,
    selectChatTarget,
    selectedLlmProvider,
-    isLoadingProviders: isLoadingProviders || isLoadingAdapters,
+    isLoadingProviders:
+      isLoadingProviders || isLoadingAdapters || isLoadingAgents,
  }
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
@@ -40,7 +40,7 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
    })
  })

-  it('sends ACP targets to the sidepanel ACP route with explicit target fields', () => {
+  it('sends created-agent targets to the agent-id sidepanel route', () => {
    const request = buildSidepanelPreparedSendMessagesRequest({
      agentServerUrl: 'http://127.0.0.1:5151',
      target: acpTarget,
@@ -52,12 +52,11 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
      ...commonRequestInput(),
    })

-    expect(request.api).toBe('http://127.0.0.1:5151/agents/sidepanel/chat')
+    expect(request.api).toBe(
+      'http://127.0.0.1:5151/agents/agent-codex/sidepanel/chat',
+    )
    expect(request.body).toEqual({
      conversationId,
-      adapter: 'codex',
-      modelId: 'gpt-5.5',
-      reasoningEffort: 'medium',
      message: 'Inspect the current tab',
      browserContext: {
        activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
@@ -140,9 +139,10 @@ const llmTarget: SidepanelChatTarget = {

 const acpTarget: SidepanelChatTarget = {
  kind: 'acp',
-  id: 'acp:codex:gpt-5.5:medium',
-  name: 'Codex GPT-5.5',
+  id: 'agent-codex',
+  name: 'Review bot',
  type: 'acp',
+  agentId: 'agent-codex',
  adapter: 'codex',
  adapterName: 'Codex',
  modelId: 'gpt-5.5',
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
@@ -680,13 +680,20 @@ export const useChatSession = (options?: ChatSessionOptions) => {
  const sendMessage = (params: { text: string; action?: ChatAction }) => {
    const target = selectedChatTargetRef.current
    const llmTargetProvider = toLlmProviderConfig(target)
+    const agentTarget = target?.kind === 'acp' ? target : undefined
    track(MESSAGE_SENT_EVENT, {
      mode,
-      provider_type: target?.kind === 'acp' ? 'acp' : llmTargetProvider?.type,
+      provider_id:
+        agentTarget?.agentId ??
+        llmTargetProvider?.id ??
+        selectedLlmProvider?.id,
+      provider_type: agentTarget ? 'acp' : llmTargetProvider?.type,
+      agent_id: agentTarget?.agentId,
+      adapter: agentTarget?.adapter,
      model:
-        target?.kind === 'acp'
-          ? target.modelId
-          : llmTargetProvider?.modelId || selectedLlmProvider?.modelId,
+        agentTarget?.modelId ??
+        llmTargetProvider?.modelId ??
+        selectedLlmProvider?.modelId,
    })

    if (!isIntegrationsSyncedRef.current) {
@@ -763,6 +770,8 @@ export const useChatSession = (options?: ChatSessionOptions) => {
      provider_type: target.kind === 'acp' ? 'acp' : target.type,
      model_id:
        target.kind === 'acp' ? target.modelId : target.provider.modelId,
+      agent_id: target.kind === 'acp' ? target.agentId : undefined,
+      adapter: target.kind === 'acp' ? target.adapter : undefined,
    })

    void selectChatTarget(target).catch((error) => {
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
@@ -34,15 +34,10 @@ export function buildSidepanelPreparedSendMessagesRequest({
  ...common
 }: BuildSidepanelPreparedSendMessagesRequestInput) {
  if (target?.kind === 'acp') {
-    // ACP session history is owned by AcpxRuntime through sessionKey, so LLM-only
-    // resume and approval fields are intentionally not forwarded.
    return {
-      api: `${agentServerUrl}/agents/sidepanel/chat`,
+      api: `${agentServerUrl}/agents/${encodeURIComponent(target.agentId)}/sidepanel/chat`,
      body: {
        conversationId: common.conversationId,
-        adapter: target.adapter,
-        modelId: target.modelId,
-        reasoningEffort: target.reasoningEffort,
        message: message ?? '',
        browserContext: common.browserContext,
        userSystemPrompt: common.userSystemPrompt,
@@ -71,6 +66,9 @@ export function toProviderOption(target: SidepanelChatTarget): Provider {
    name: target.name,
    type: target.type,
    kind: target.kind,
+    agentId: target.kind === 'acp' ? target.agentId : undefined,
+    adapterName: target.kind === 'acp' ? target.adapterName : undefined,
+    modelLabel: target.kind === 'acp' ? target.modelLabel : undefined,
    modelControl: target.kind === 'acp' ? target.modelControl : undefined,
  }
 }
--- a/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
+++ b/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
@@ -59,15 +59,3 @@ export interface AgentConversation {
  createdAt: number
  updatedAt: number
 }
-
-export interface AgentCardData {
-  agentId: string
-  name: string
-  model?: string
-  status: 'idle' | 'working' | 'error'
-  lastMessage?: string
-  lastMessageTimestamp?: number
-  activitySummary?: string
-  currentTool?: string
-  costUsd?: number
-}
--- a/packages/browseros-agent/apps/agent/package.json
+++ b/packages/browseros-agent/apps/agent/package.json
@@ -9,6 +9,7 @@
    "build": "bun run codegen && wxt build",
    "build:dev": "bun --env-file=.env.development wxt build --mode development",
    "zip": "wxt zip",
+    "test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
    "compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
    "lint": "bunx biome check",
    "typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
--- a/packages/browseros-agent/apps/cli/README.md
+++ b/packages/browseros-agent/apps/cli/README.md
@@ -38,8 +38,8 @@ browseros-cli install                # downloads BrowserOS for your platform
 # If BrowserOS is installed but not running
 browseros-cli launch                 # opens BrowserOS, waits for server

-# Configure the CLI (auto-discovers running BrowserOS)
-browseros-cli init --auto            # detects server URL and saves config
+# Configure the CLI with the Server URL from BrowserOS settings
+browseros-cli init http://127.0.0.1:9000/mcp

 # Verify connection
 browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url>             # non-interactive — pass URL directly
 browseros-cli init                   # interactive — prompts for URL
 ```

-Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
+Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.

 ### CLI updates

@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
 | `--debug` | `BOS_DEBUG=1` | Debug output |
 | `--timeout, -t` | | Request timeout (default: 2m) |

-Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
+Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file

-If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
+If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.

 ## Testing

@@ -179,7 +179,7 @@ apps/cli/
 │   └── config.go       # Config file (~/.config/browseros-cli/config.yaml)
 ├── cmd/
 │   ├── root.go         # Root command, global flags
-│   ├── init.go         # Server URL configuration (URL arg, --auto, interactive)
+│   ├── init.go         # Server URL configuration (URL arg or interactive)
 │   ├── install.go      # install (download BrowserOS for current platform)
 │   ├── launch.go       # launch (find and start BrowserOS, wait for server)
 │   ├── open.go         # open (new_page / new_hidden_page)
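The README's "Priority for server URL" rule works as a first-non-empty chain. The diff removes `~/.browseros/server.json` from that chain, so only user-controlled sources remain. The Go sketch below shows the idea. `resolveServerURL` and its parameters are hypothetical names, not the CLI's actual functions, and trimming whitespace is the only normalization it assumes.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveServerURL is a hypothetical helper for the documented priority:
// --server flag, then BROWSEROS_URL, then the saved config file.
// The runtime discovery file (~/.browseros/server.json) is deliberately
// not a candidate, so a saved URL is never silently overridden.
func resolveServerURL(flag, env, saved string) string {
	for _, candidate := range []string{flag, env, saved} {
		if c := strings.TrimSpace(candidate); c != "" {
			return c
		}
	}
	// Empty result: the CLI prints setup instructions instead.
	return ""
}

func main() {
	fmt.Println(resolveServerURL("", "http://127.0.0.1:9115", "http://127.0.0.1:9000"))
}
```

An empty flag and a set `BROWSEROS_URL` make the env value win over the saved config, which is the case `TestDefaultServerURLUsesEnvBeforeConfig` checks.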
--- a/packages/browseros-agent/apps/cli/cmd/init.go
+++ b/packages/browseros-agent/apps/cli/cmd/init.go
@@ -17,8 +17,6 @@ import (
 )

 func init() {
-	var autoDiscover bool
-
 	cmd := &cobra.Command{
 		Use:   "init [url]",
 		Short: "Configure the BrowserOS server connection",
@@ -34,9 +32,8 @@ You can provide the full URL or just the port number:
  browseros-cli init http://127.0.0.1:9000/mcp
  browseros-cli init 9000

-Three modes:
+Modes:
  browseros-cli init <url>    Non-interactive (full URL or port number)
-  browseros-cli init --auto   Auto-discover from ~/.browseros/server.json
  browseros-cli init          Interactive prompt`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.MaximumNArgs(1),
@@ -49,22 +46,9 @@ Three modes:

 			switch {
 			case len(args) == 1:
-				// Non-interactive: URL provided as argument
 				input = args[0]

-			case autoDiscover:
-				// Auto-discover: server.json → config → probe common ports
-				discovered := probeRunningServer()
-				if discovered == "" {
-					output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
-						"  If not running:    browseros-cli launch\n"+
-						"  If not installed:  browseros-cli install", 1)
-				}
-				input = discovered
-				fmt.Printf("Auto-discovered server at %s\n", input)
-
 			default:
-				// Interactive prompt (original behavior)
 				fmt.Println()
 				bold.Println("BrowserOS CLI Setup")
 				fmt.Println()
@@ -95,12 +79,14 @@ Three modes:
 				output.Errorf(1, "invalid URL: %s", input)
 			}

-			// Verify connectivity
 			fmt.Printf("Checking connection to %s ...\n", baseURL)
 			client := &http.Client{Timeout: 5 * time.Second}
 			resp, err := client.Get(baseURL + "/health")
 			if err != nil {
-				output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
+				output.Errorf(1, "cannot connect to %s: %v\n\n"+
+					"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
+					"Then run: browseros-cli init <Server URL>\n"+
+					"Example:  browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
 			}
 			resp.Body.Close()

@@ -121,6 +107,5 @@ Three modes:
 		},
 	}

-	cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
 	rootCmd.AddCommand(cmd)
 }
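`browseros-cli init` accepts either a full URL or a bare port number. The tests expect an MCP endpoint such as `http://127.0.0.1:9115/mcp` to normalize to the base URL `http://127.0.0.1:9115`. The sketch below is one way that normalization could work. `normalizeInitInput` is hypothetical, the real `normalizeServerURL` may handle more cases, and the loopback default for a bare port is an assumption. `mcpEndpointURL` mirrors the helper added to launch.go and shows the round trip back to the URL a user pastes into `init`.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// normalizeInitInput turns "9000" or "http://127.0.0.1:9000/mcp" into a
// base URL such as "http://127.0.0.1:9000". Hypothetical sketch.
func normalizeInitInput(raw string) (string, error) {
	input := strings.TrimSpace(raw)
	if input == "" {
		return "", fmt.Errorf("empty server URL")
	}
	// Assumption: a bare port number means the local loopback address.
	if port, err := strconv.Atoi(input); err == nil {
		if port < 1 || port > 65535 {
			return "", fmt.Errorf("port out of range: %d", port)
		}
		return fmt.Sprintf("http://127.0.0.1:%d", port), nil
	}
	u, err := url.Parse(input)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("invalid URL: %s", input)
	}
	// Accept the MCP endpoint copied from settings and keep only the base.
	path := strings.TrimSuffix(strings.TrimSuffix(u.Path, "/"), "/mcp")
	return u.Scheme + "://" + u.Host + path, nil
}

// mcpEndpointURL matches the helper added to launch.go.
func mcpEndpointURL(baseURL string) string {
	return strings.TrimSuffix(baseURL, "/") + "/mcp"
}

func main() {
	base, err := normalizeInitInput(" http://127.0.0.1:9000/mcp ")
	if err != nil {
		panic(err)
	}
	fmt.Println(base)                 // http://127.0.0.1:9000
	fmt.Println(mcpEndpointURL(base)) // http://127.0.0.1:9000/mcp
}
```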
--- a/packages/browseros-agent/apps/cli/cmd/install.go
+++ b/packages/browseros-agent/apps/cli/cmd/install.go
@@ -28,7 +28,7 @@ Linux:   Downloads AppImage (or .deb with --deb flag)

 After installation:
  browseros-cli launch        # start BrowserOS
-  browseros-cli init --auto   # configure the CLI`,
+  browseros-cli init <url>    # configure the CLI with the Server URL`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.NoArgs,
 		Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
 			fmt.Println()
 			bold.Println("Next steps:")
 			dim.Println("  browseros-cli launch        # start BrowserOS")
-			dim.Println("  browseros-cli init --auto   # configure the CLI")
+			dim.Println("  browseros-cli init <url>    # use the Server URL from BrowserOS settings")
 		},
 	}

--- a/packages/browseros-agent/apps/cli/cmd/launch.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch.go
@@ -1,6 +1,7 @@
 package cmd

 import (
+	"encoding/json"
 	"fmt"
 	"net/http"
 	"os"
@@ -38,6 +39,7 @@ If BrowserOS is already running, reports the server URL.`,

 			if url := probeRunningServer(); url != "" {
 				green.Printf("BrowserOS is already running at %s\n", url)
+				dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
 				return
 			}

@@ -63,7 +65,7 @@ If BrowserOS is already running, reports the server URL.`,

 			green.Printf("BrowserOS is ready at %s\n", url)
 			fmt.Println()
-			dim.Println("Next: browseros-cli init --auto")
+			dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
 		},
 	}

@@ -75,39 +77,77 @@ If BrowserOS is already running, reports the server URL.`,
 // Server probing
 // ---------------------------------------------------------------------------

-// probeRunningServer checks server.json, config, and common ports for a running server.
+var commonBrowserOSPorts = []int{9100, 9200, 9300}
+
+// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
 func probeRunningServer() string {
-	check := func(baseURL string) bool {
-		client := &http.Client{Timeout: 2 * time.Second}
-		resp, err := client.Get(baseURL + "/health")
-		if err != nil {
-			return false
-		}
-		resp.Body.Close()
-		return resp.StatusCode == 200
-	}
+	client := &http.Client{Timeout: 2 * time.Second}

-	// 1. server.json — written by BrowserOS on startup with the actual port
-	if url := loadBrowserosServerURL(); url != "" && check(url) {
+	if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
 		return url
 	}

-	// 2. Saved config / env var
-	if url := defaultServerURL(); url != "" && check(url) {
+	if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
 		return url
 	}

-	// 3. Probe common BrowserOS ports as last resort
-	for _, port := range []int{9100, 9200, 9300} {
+	return probeCommonServerPorts(client)
+}
+
+func checkServerHealth(client *http.Client, baseURL string) bool {
+	resp, err := client.Get(baseURL + "/health")
+	if err != nil {
+		return false
+	}
+	resp.Body.Close()
+	return resp.StatusCode == 200
+}
+
+func probeCommonServerPorts(client *http.Client) string {
+	for _, port := range commonBrowserOSPorts {
 		url := fmt.Sprintf("http://127.0.0.1:%d", port)
-		if check(url) {
+		if checkServerHealth(client, url) {
 			return url
 		}
 	}
-
 	return ""
 }

+type serverDiscoveryConfig struct {
+	ServerPort       int    `json:"server_port"`
+	URL              string `json:"url"`
+	ServerVersion    string `json:"server_version"`
+	BrowserOSVersion string `json:"browseros_version,omitempty"`
+	ChromiumVersion  string `json:"chromium_version,omitempty"`
+}
+
+// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
+//
+// Normal command resolution must not call this because it can override a URL the
+// user explicitly saved with `browseros-cli init <Server URL>`.
+func loadBrowserosServerURL() string {
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return ""
+	}
+
+	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
+	if err != nil {
+		return ""
+	}
+
+	var sc serverDiscoveryConfig
+	if err := json.Unmarshal(data, &sc); err != nil {
+		return ""
+	}
+
+	return normalizeServerURL(sc.URL)
+}
+
+func mcpEndpointURL(baseURL string) string {
+	return strings.TrimSuffix(baseURL, "/") + "/mcp"
+}
+
 // ---------------------------------------------------------------------------
 // Platform-native installation detection
 // ---------------------------------------------------------------------------
@@ -117,7 +157,8 @@ func probeRunningServer() string {
 // macOS:   `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
 // Linux:   checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
 // Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
-//          and registry uninstall key (per-user Chromium install pattern)
+//
+//	and registry uninstall key (per-user Chromium install pattern)
 func isBrowserOSInstalled() bool {
 	switch runtime.GOOS {
 	case "darwin":
@@ -271,14 +312,11 @@ func waitForServer(maxWait time.Duration) (string, bool) {

 	for time.Now().Before(deadline) {
 		// server.json is written by BrowserOS on startup with the actual port
-		if url := loadBrowserosServerURL(); url != "" {
-			resp, err := client.Get(url + "/health")
-			if err == nil {
-				resp.Body.Close()
-				if resp.StatusCode == 200 {
-					return url, true
-				}
-			}
+		if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
+			return url, true
+		}
+		if url := probeCommonServerPorts(client); url != "" {
+			return url, true
 		}
 		fmt.Print(".")
 		time.Sleep(1 * time.Second)
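After this change, launch.go still reads the discovery file, but only for launch readiness. `probeRunningServer` tries `server.json` first, then the saved config or env URL, then the fixed ports 9100, 9200, and 9300. The sketch below models that ordering with an injected health check so it runs without a live server. `firstHealthy` and the `isHealthy` callback are hypothetical; the candidate order and port list come from the diff.

```go
package main

import "fmt"

// firstHealthy returns the first candidate URL that passes the health check,
// in the same order as probeRunningServer: discovery file, then saved
// config/env, then common ports. Empty candidates are skipped.
// Hypothetical sketch; the real code issues GET <url>/health over HTTP.
func firstHealthy(discovered, configured string, ports []int, isHealthy func(string) bool) string {
	candidates := []string{discovered, configured}
	for _, port := range ports {
		candidates = append(candidates, fmt.Sprintf("http://127.0.0.1:%d", port))
	}
	for _, c := range candidates {
		if c != "" && isHealthy(c) {
			return c
		}
	}
	return ""
}

func main() {
	up := map[string]bool{"http://127.0.0.1:9200": true}
	healthy := func(u string) bool { return up[u] }
	// The discovery file points at a dead server, so the port scan wins.
	fmt.Println(firstHealthy("http://127.0.0.1:9555", "", []int{9100, 9200, 9300}, healthy))
}
```

Injecting the check is the same trick `launch_test.go` uses when it swaps `commonBrowserOSPorts` for an `httptest` port. It keeps the ordering testable without depending on what is listening on the machine.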
--- a/packages/browseros-agent/apps/cli/cmd/launch_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch_test.go
@@ -0,0 +1,99 @@
+package cmd
+
+import (
+	"fmt"
+	"net"
+	"net/http"
+	"net/http/httptest"
+	"net/url"
+	"os"
+	"path/filepath"
+	"strconv"
+	"testing"
+	"time"
+
+	"browseros-cli/config"
+)
+
+func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
+	home := t.TempDir()
+	t.Setenv("HOME", home)
+	t.Setenv("USERPROFILE", home)
+	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
+	t.Setenv("BROWSEROS_URL", "")
+
+	discoveredServer := newHealthyServer(t)
+	configServer := newHealthyServer(t)
+
+	serverDir := filepath.Join(home, ".browseros")
+	if err := os.MkdirAll(serverDir, 0755); err != nil {
+		t.Fatalf("os.MkdirAll() error = %v", err)
+	}
+	data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
+	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
+		t.Fatalf("os.WriteFile() error = %v", err)
+	}
+	if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
+		t.Fatalf("config.Save() error = %v", err)
+	}
+
+	got := probeRunningServer()
+	if got != normalizeServerURL(discoveredServer.URL) {
+		t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
+	}
+}
+
+func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
+	home := t.TempDir()
+	t.Setenv("HOME", home)
+	t.Setenv("USERPROFILE", home)
+
+	server := newHealthyServer(t)
+	port := serverPort(t, server.URL)
+
+	originalPorts := commonBrowserOSPorts
+	commonBrowserOSPorts = []int{port}
+	t.Cleanup(func() {
+		commonBrowserOSPorts = originalPorts
+	})
+
+	got, ok := waitForServer(100 * time.Millisecond)
+	if !ok {
+		t.Fatal("waitForServer() ok = false, want true")
+	}
+	if got != normalizeServerURL(server.URL) {
+		t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
+	}
+}
+
+func newHealthyServer(t *testing.T) *httptest.Server {
+	t.Helper()
+
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		if r.URL.Path != "/health" {
+			http.NotFound(w, r)
+			return
+		}
+		w.WriteHeader(http.StatusOK)
+	}))
+	t.Cleanup(server.Close)
+	return server
+}
+
+func serverPort(t *testing.T, rawURL string) int {
+	t.Helper()
+
+	parsed, err := url.Parse(rawURL)
+	if err != nil {
+		t.Fatalf("url.Parse() error = %v", err)
+	}
+	_, portText, err := net.SplitHostPort(parsed.Host)
+	if err != nil {
+		t.Fatalf("net.SplitHostPort() error = %v", err)
+	}
+	port, err := strconv.Atoi(portText)
+	if err != nil {
+		t.Fatalf("strconv.Atoi() error = %v", err)
+	}
+	return port
+}
--- a/packages/browseros-agent/apps/cli/cmd/root.go
+++ b/packages/browseros-agent/apps/cli/cmd/root.go
@@ -2,10 +2,8 @@ package cmd

 import (
 	"context"
-	"encoding/json"
 	"fmt"
 	"os"
-	"path/filepath"
 	"strconv"
 	"strings"
 	"time"
@@ -289,18 +287,15 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
 	}
 }

+// defaultServerURL returns the implicit target from user-controlled settings only.
+//
+// BrowserOS writes a discovery file at runtime, but normal commands intentionally
+// ignore it so a saved URL is not silently overridden by another running server.
 func defaultServerURL() string {
-	// 1. Explicit env var always wins
 	if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
 		return env
 	}

-	// 2. Live discovery file from running BrowserOS (most current)
-	if url := loadBrowserosServerURL(); url != "" {
-		return url
-	}
-
-	// 3. Saved config (may be stale if port changed)
 	cfg, err := config.Load()
 	if err == nil {
 		if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -311,33 +306,6 @@ func defaultServerURL() string {
 	return ""
 }

-type serverDiscoveryConfig struct {
-	ServerPort       int    `json:"server_port"`
-	URL              string `json:"url"`
-	ServerVersion    string `json:"server_version"`
-	BrowserOSVersion string `json:"browseros_version,omitempty"`
-	ChromiumVersion  string `json:"chromium_version,omitempty"`
-}
-
-func loadBrowserosServerURL() string {
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return ""
-	}
-
-	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
-	if err != nil {
-		return ""
-	}
-
-	var sc serverDiscoveryConfig
-	if err := json.Unmarshal(data, &sc); err != nil {
-		return ""
-	}
-
-	return normalizeServerURL(sc.URL)
-}
-
 func normalizeServerURL(raw string) string {
 	normalized := strings.TrimSpace(raw)

@@ -369,8 +337,10 @@ func validateServerURL(raw string) (string, error) {

 	return "", fmt.Errorf(
 		"BrowserOS server URL is not configured.\n\n" +
-			"  If BrowserOS is running:  browseros-cli init --auto\n" +
-			"  If BrowserOS is closed:   browseros-cli launch\n" +
-			"  If not installed:         browseros-cli install",
+			"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
+			"  Save it with:       browseros-cli init <Server URL>\n" +
+			"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
+			"  If BrowserOS is closed:  browseros-cli launch\n" +
+			"  If not installed:        browseros-cli install",
 	)
 }
--- a/packages/browseros-agent/apps/cli/cmd/root_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/root_test.go
@@ -1,8 +1,13 @@
 package cmd

 import (
+	"os"
+	"path/filepath"
+	"strings"
 	"testing"
 	"time"
+
+	"browseros-cli/config"
 )

 func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -100,6 +105,76 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
 	}
 }

+func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
+	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
+	t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
+
+	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
+		t.Fatalf("config.Save() error = %v", err)
+	}
+
+	got := defaultServerURL()
+	if got != "http://127.0.0.1:9115" {
+		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
+	}
+}
+
+func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
+	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
+	t.Setenv("BROWSEROS_URL", "")
+
+	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
+		t.Fatalf("config.Save() error = %v", err)
+	}
+
+	got := defaultServerURL()
+	if got != "http://127.0.0.1:9115" {
+		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
+	}
+}
+
+func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
+	home := t.TempDir()
+	t.Setenv("HOME", home)
+	t.Setenv("USERPROFILE", home)
+	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
+	t.Setenv("BROWSEROS_URL", "")
+
+	serverDir := filepath.Join(home, ".browseros")
+	if err := os.MkdirAll(serverDir, 0755); err != nil {
+		t.Fatalf("os.MkdirAll() error = %v", err)
+	}
+	data := []byte(`{"url":"http://127.0.0.1:9999"}`)
+	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
+		t.Fatalf("os.WriteFile() error = %v", err)
+	}
+
+	if got := defaultServerURL(); got != "" {
+		t.Fatalf("defaultServerURL() = %q, want empty", got)
+	}
+}
+
+func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
+	got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
+	if got != "http://127.0.0.1:9115" {
+		t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
+	}
+}
+
+func TestValidateServerURLExplainsManualInit(t *testing.T) {
+	_, err := validateServerURL("")
+	if err == nil {
+		t.Fatal("validateServerURL() error = nil, want setup instructions")
+	}
+	msg := err.Error()
+	if !strings.Contains(msg, "browseros-cli init <Server URL>") {
+		t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
+	}
+	if strings.Contains(msg, "init --auto") {
+		t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
+	}
+}
+
 func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
 	done := make(chan struct{})
 	returned := make(chan struct{})
--- a/packages/browseros-agent/apps/cli/mcp/client.go
+++ b/packages/browseros-agent/apps/cli/mcp/client.go
@@ -44,10 +44,7 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {

 	session, err := sdkClient.Connect(ctx, transport, nil)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
-			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
-			"  If BrowserOS is not running:                  browseros-cli launch\n"+
-			"  If not installed:                             browseros-cli install", c.BaseURL, err)
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
 	}
 	return session, nil
 }
@@ -187,10 +184,7 @@ func (c *Client) Status() (map[string]any, error) {
 func (c *Client) restGET(path string) (map[string]any, error) {
 	resp, err := c.HTTPClient.Get(c.BaseURL + path)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
-			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
-			"  If BrowserOS is not running:                  browseros-cli launch\n"+
-			"  If not installed:                             browseros-cli install", c.BaseURL, err)
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
 	}
 	defer resp.Body.Close()

@@ -205,3 +199,14 @@ func (c *Client) restGET(path string) (map[string]any, error) {
 	}
 	return data, nil
 }
+
+// connectionSetupInstructions explains how to recover from a stale or missing server URL.
+func connectionSetupInstructions() string {
+	return "\n\n" +
+		"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
+		"  Save it with:       browseros-cli init <Server URL>\n" +
+		"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
+		"  Run once with:      browseros-cli --server <Server URL> health\n" +
+		"  If BrowserOS is closed:  browseros-cli launch\n" +
+		"  If not installed:        browseros-cli install"
+}
--- a/packages/browseros-agent/apps/cli/npm/README.md
+++ b/packages/browseros-agent/apps/cli/npm/README.md
@@ -31,8 +31,8 @@ browseros-cli install
 # Start BrowserOS
 browseros-cli launch

-# Auto-configure MCP settings for your AI tools
-browseros-cli init --auto
+# Save the Server URL shown in BrowserOS Settings > BrowserOS MCP
+browseros-cli init http://127.0.0.1:9000/mcp

 # Verify everything is working
 browseros-cli health
--- a/packages/browseros-agent/apps/eval/.env.example
+++ b/packages/browseros-agent/apps/eval/.env.example
@@ -0,0 +1,51 @@
+# Copy to .env.development for local eval runs.
+
+# Provider keys used by existing config files.
+OPENROUTER_API_KEY=
+FIREWORKS_API_KEY=
+ANTHROPIC_API_KEY=
+OPENAI_API_KEY=
+GOOGLE_GENERATIVE_AI_API_KEY=
+
+# Claude Agent SDK token used by performance_grader.
+CLAUDE_CODE_OAUTH_TOKEN=
+
+# Suite-mode model selection.
+EVAL_VARIANT=local
+EVAL_AGENT_PROVIDER=openai-compatible
+EVAL_AGENT_MODEL=
+EVAL_AGENT_API_KEY=
+EVAL_AGENT_BASE_URL=
+EVAL_AGENT_SUPPORTS_IMAGES=true
+
+# Optional suite-mode executor override for orchestrator suites.
+EVAL_EXECUTOR_MODEL=
+EVAL_EXECUTOR_API_KEY=
+EVAL_EXECUTOR_BASE_URL=
+
+# Clado visual action executor.
+CLADO_ACTION_MODEL=
+CLADO_ACTION_API_KEY=
+CLADO_ACTION_BASE_URL=
+# Backward-compatible alias used by older local scripts.
+CLADO_ACTION_URL=
+
+# BrowserOS runner.
+BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
+BROWSEROS_SERVER_URL=http://127.0.0.1:9110
+BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
+BROWSEROS_CONFIG_URL=
+
+# Captcha solver extension.
+NOPECHA_API_KEY=
+
+# WebArena-Infinity.
+WEBARENA_INFINITY_DIR=
+INFINITY_APP_URL=
+
+# R2 publishing and weekly report.
+EVAL_R2_ACCOUNT_ID=
+EVAL_R2_ACCESS_KEY_ID=
+EVAL_R2_SECRET_ACCESS_KEY=
+EVAL_R2_BUCKET=browseros-eval
+EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
--- a/packages/browseros-agent/apps/eval/README.md
+++ b/packages/browseros-agent/apps/eval/README.md
@@ -9,11 +9,13 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
 - **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
 - **Bun** runtime
 - **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
+- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` to a suitable interpreter if your default `python3` is older than 3.10 or lacks `agisdk`.

 ## Quick Start

 ```bash
 cd apps/eval
+cp .env.example .env.development
 # Edit .env.development with your keys, then:
 bun run eval
 ```
@@ -23,17 +25,62 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
 ### CLI mode

 ```bash
-bun run eval -c configs/browseros-agent-weekly.json
+bun run eval -c configs/legacy/browseros-agent-weekly.json
+bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
 ```

 Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.

+The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
+
+```bash
+bun run eval run --config configs/legacy/browseros-agent-weekly.json
+bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
+bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
+bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
+```
+
+Config files live in two groups:
+
+```txt
+configs/legacy/  # Complete EvalConfig files used by older workflows and the dashboard
+configs/suites/  # Suite definitions; model/provider comes from CLI flags or env
+```
+
+Suite mode takes model settings from CLI flags first, then env:
+
+```bash
+EVAL_VARIANT=kimi-fireworks \
+EVAL_AGENT_PROVIDER=openai-compatible \
+EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
+EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
+EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
+bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
+```
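The flag-then-env precedence can be sketched as follows. This is a minimal illustration, not the eval app's resolver. `resolveAgentSettings`, `pick`, and every flag field except `--variant` are hypothetical. The sketch also assumes an empty env value counts as unset.

```typescript
// Hypothetical sketch: CLI flags take precedence over env vars, field by field.
interface AgentSettings {
  variant?: string
  provider?: string
  model?: string
  apiKey?: string
  baseUrl?: string
  supportsImages?: boolean
}

type Env = Record<string, string | undefined>

// Return the first value that is set. Empty strings count as unset, so blank
// entries copied from .env.example do not mask anything.
function pick(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v !== '')
}

function parseBool(value: string | undefined): boolean | undefined {
  if (value === undefined || value === '') return undefined
  return value.toLowerCase() === 'true'
}

function resolveAgentSettings(flags: AgentSettings, env: Env): AgentSettings {
  return {
    variant: pick(flags.variant, env.EVAL_VARIANT),
    provider: pick(flags.provider, env.EVAL_AGENT_PROVIDER),
    model: pick(flags.model, env.EVAL_AGENT_MODEL),
    apiKey: pick(flags.apiKey, env.EVAL_AGENT_API_KEY),
    baseUrl: pick(flags.baseUrl, env.EVAL_AGENT_BASE_URL),
    supportsImages:
      flags.supportsImages ?? parseBool(env.EVAL_AGENT_SUPPORTS_IMAGES),
  }
}

// Example: --variant overrides EVAL_VARIANT, and the model still comes from env.
const settings = resolveAgentSettings(
  { variant: 'kimi-fireworks' },
  {
    EVAL_VARIANT: 'local',
    EVAL_AGENT_MODEL: 'accounts/fireworks/models/kimi-k2p5',
    EVAL_AGENT_BASE_URL: '',
  },
)
console.log(settings.variant, settings.model)
```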
+
+### Suites and variants
+
+A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
+
+A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
+
+This lets us run the same suite against multiple model setups without copying the benchmark config:
+
+```txt
+agisdk-daily-10 + kimi-fireworks
+agisdk-daily-10 + claude-opus
+agisdk-daily-10 + clado-action-000159
+```
+
+For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
+
 ## Agent types

 | Type | Description |
 |------|-------------|
 | `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
 | `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
+| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |

 ### Single agent

@@ -66,14 +113,32 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
    },
    "executor": {
      "provider": "clado-action",
-      "model": "qwen3-vl-30b-a3b-instruct",
+      "model": "Qwen3.5-35B-A3B-action-000159-merged",
      "apiKey": "",
-      "baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
+      "baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
    }
  }
 }
 ```

+### Claude Code
+
+Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
+
+```json
+{
+  "agent": {
+    "type": "claude-code",
+    "model": "opus"
+  }
+}
+```
+
+```bash
+BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
+bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
+```
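Here is a rough sketch of a task-scoped MCP config. It writes one config file per task and builds `claude -p` arguments that point at that file. Several details are assumptions made for illustration:

- the port scheme (`base_server_port + N`)
- the `/mcp` path
- the `browseros` server name
- all helper names

`-p`, `--model`, `--mcp-config`, and `--strict-mcp-config` are Claude Code CLI options. Confirm them with `claude --help` for your installed version.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Assumed port scheme: worker N serves BrowserOS MCP on base_server_port + N.
function workerMcpUrl(baseServerPort: number, workerIndex: number): string {
  return `http://127.0.0.1:${baseServerPort + workerIndex}/mcp`
}

// Write the config into a fresh temp directory so tasks never share a file.
function writeTaskMcpConfig(taskId: string, mcpUrl: string): string {
  const dir = mkdtempSync(join(tmpdir(), `eval-${taskId}-`))
  const file = join(dir, 'mcp.json')
  const config = { mcpServers: { browseros: { type: 'http', url: mcpUrl } } }
  writeFileSync(file, JSON.stringify(config, null, 2))
  return file
}

// Arguments for a one-shot `claude -p` run limited to the task's MCP config.
function claudeArgs(prompt: string, model: string, mcpConfigPath: string): string[] {
  return [
    '-p',
    prompt,
    '--model',
    model,
    '--mcp-config',
    mcpConfigPath,
    '--strict-mcp-config',
  ]
}

const configPath = writeTaskMcpConfig('agisdk-networkin-5', workerMcpUrl(9110, 0))
console.log(readFileSync(configPath, 'utf8'))
console.log(
  ['claude', ...claudeArgs('Send a connection request to John Smith.', 'opus', configPath)].join(' '),
)
```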
+
 ## Graders

 | Name | Description |
@@ -96,6 +161,21 @@ The `apiKey` field supports two formats:
 - **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
 - **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)
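One plausible way to tell the two formats apart is sketched below. It assumes an env var name is any all-caps identifier, which may not match the eval app's actual rule. `resolveApiKey` is hypothetical.

```typescript
// Assumed rule: an all-caps identifier such as OPENAI_API_KEY names an env var;
// anything else is used verbatim.
const ENV_VAR_NAME = /^[A-Z][A-Z0-9_]*$/

function resolveApiKey(
  apiKey: string,
  env: Record<string, string | undefined>,
): string {
  if (!ENV_VAR_NAME.test(apiKey)) {
    return apiKey // direct value such as "sk-xxxxx", or an empty string
  }
  const value = env[apiKey]
  if (!value) {
    throw new Error(`apiKey names ${apiKey}, but it is not set in .env.development`)
  }
  return value
}

console.log(resolveApiKey('sk-xxxxx', {})) // a direct value passes through
```

Under `bun --env-file=.env.development`, Bun loads that file into `process.env`. A call like `resolveApiKey(config.agent.apiKey, process.env)` would therefore return the stored key. An empty string, as in the `clado-action` executor example, passes through unchanged.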

+### Environment variables
+
+| Variable | Used for |
+|----------|----------|
+| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
+| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
+| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
+| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
+| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
+| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
+| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
+| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
+| `NOPECHA_API_KEY` | CAPTCHA solver extension |
+| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
+
 ### Supported providers

 | Provider | `provider` value | Requires `baseUrl` |
@@ -110,6 +190,22 @@ The `apiKey` field supports two formats:
 | Ollama | `ollama` | No |
 | Clado Action (executor only) | `clado-action` | Yes |

+### R2 publishing
+
+`suite ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to R2, using the layout the viewer expects. Both need these settings:
+
+```bash
+export EVAL_R2_ACCOUNT_ID=...
+export EVAL_R2_ACCESS_KEY_ID=...
+export EVAL_R2_SECRET_ACCESS_KEY=...
+export EVAL_R2_BUCKET=browseros-eval
+export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
+```
+
+`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
+
+Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
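The following small sketch builds that link and enforces the public-URL rule. `viewerUrl` is a hypothetical helper. The hostname check assumes the private S3 API endpoint always ends in `.r2.cloudflarestorage.com`.

```typescript
// Hypothetical helper: build the public viewer link for a published run and
// reject the private R2 S3 API endpoint.
function viewerUrl(cdnBaseUrl: string, runId: string): string {
  const base = new URL(cdnBaseUrl)
  if (base.hostname.endsWith('.r2.cloudflarestorage.com')) {
    throw new Error(
      'EVAL_R2_CDN_BASE_URL must be a public custom domain, r2.dev URL, or Worker URL',
    )
  }
  // Keep any path prefix on the base (e.g. a Worker route) and append viewer.html.
  const prefix = base.pathname.endsWith('/') ? base.pathname : `${base.pathname}/`
  const url = new URL(`${prefix}viewer.html`, base.origin)
  url.searchParams.set('run', runId)
  return url.toString()
}

console.log(viewerUrl('https://eval.browseros.com', 'agisdk-real-smoke-2026-04-30-0000'))
```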
+
 ### BrowserOS infrastructure

 ```json
@@ -119,7 +215,7 @@ The `apiKey` field supports two formats:
  "base_server_port": 9110,
  "base_extension_port": 9310,
  "load_extensions": false,
-  "headless": true
+  "headless": false
 }
 ```

@@ -137,10 +233,12 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP

 | File | Tasks | Description |
 |------|-------|-------------|
+| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
 | `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
 | `mind2web.jsonl` | 300 | Online-Mind2Web |
 | `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
-| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
+| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
+| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
 | `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
 | `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
 | `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
@@ -167,14 +265,47 @@ results/
  browseros-agent-weekly/
    2026-04-29-1430/
      Amazon--0/
+        attempt.json          # Stable attempt summary for viewer/reporting
        metadata.json         # Task result, timing, grader scores
+        grades.json           # Compact grader results
        messages.jsonl         # Full message log
+        grader-artifacts/      # Grader-specific inputs/outputs/stderr
        screenshots/
          001.png              # Step-by-step screenshots
          002.png
      summary.json             # Aggregate pass rates
 ```

+R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root.
+
+### R2 viewer manifest
+
+`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
+
+```json
+{
+  "schemaVersion": 2,
+  "runId": "agisdk-real-smoke-2026-04-30-0000",
+  "tasks": [
+    {
+      "queryId": "agisdk-dashdish-10",
+      "paths": {
+        "metadata": "tasks/agisdk-dashdish-10/metadata.json",
+        "messages": "tasks/agisdk-dashdish-10/messages.jsonl",
+        "grades": "tasks/agisdk-dashdish-10/grades.json",
+        "trace": "tasks/agisdk-dashdish-10/trace.jsonl",
+        "screenshots": "tasks/agisdk-dashdish-10/screenshots",
+        "graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
+      }
+    }
+  ]
+}
+```
+
+The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
+
+Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
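The lookup order could be sketched like this. The types mirror the example manifest, and `resolveTaskPaths` is hypothetical. The sketch makes two assumptions:

- `task.paths` entries are relative to `runs/<run-id>/`.
- The legacy `<task-id>` directory is the task's `queryId`.

A returned path is only a location, so callers still have to handle a missing file.

```typescript
// Shapes mirroring the schemaVersion 2 manifest example.
interface TaskPaths {
  metadata?: string
  messages?: string
  grades?: string
  trace?: string
  screenshots?: string
  graderArtifacts?: string
}

interface ManifestTask {
  queryId: string
  paths?: TaskPaths
}

interface Manifest {
  schemaVersion?: number
  runId: string
  tasks: ManifestTask[]
}

interface ResolvedPaths {
  metadata?: string
  messages?: string
  grades?: string
  screenshotsDir?: string
}

function resolveTaskPaths(manifest: Manifest, task: ManifestTask): ResolvedPaths {
  const runRoot = `runs/${manifest.runId}`
  const under = (rel?: string): string | undefined =>
    rel ? `${runRoot}/${rel}` : undefined

  // New manifests: trust the explicit paths; absent entries stay undefined.
  if (task.paths) {
    return {
      metadata: under(task.paths.metadata),
      messages: under(task.paths.messages),
      grades: under(task.paths.grades),
      screenshotsDir: under(task.paths.screenshots),
    }
  }

  // Older runs: infer runs/<run-id>/<task-id>/... (metadata, messages, screenshots only).
  const legacyRoot = `${runRoot}/${task.queryId}`
  return {
    metadata: `${legacyRoot}/metadata.json`,
    messages: `${legacyRoot}/messages.jsonl`,
    screenshotsDir: `${legacyRoot}/screenshots`,
  }
}
```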
+
 ## Troubleshooting

 **BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
@@ -7,8 +7,8 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../data/agisdk-real.jsonl",
-  "num_workers": 10,
+  "dataset": "../../data/agisdk-real-smoke.jsonl",
+  "num_workers": 1,
  "restart_server_per_task": true,
  "browseros": {
    "server_url": "http://127.0.0.1:9110",
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
@@ -0,0 +1,26 @@
+{
+  "agent": {
+    "type": "single",
+    "provider": "openai-compatible",
+    "model": "accounts/fireworks/models/kimi-k2p5",
+    "apiKey": "FIREWORKS_API_KEY",
+    "baseUrl": "https://api.fireworks.ai/inference/v1",
+    "supportsImages": true
+  },
+  "dataset": "../../data/agisdk-real.jsonl",
+  "num_workers": 4,
+  "restart_server_per_task": true,
+  "browseros": {
+    "server_url": "http://127.0.0.1:9110",
+    "base_cdp_port": 9010,
+    "base_server_port": 9110,
+    "base_extension_port": 9310,
+    "load_extensions": false,
+    "headless": false
+  },
+  "captcha": {
+    "api_key_env": "NOPECHA_API_KEY"
+  },
+  "graders": ["agisdk_state_diff"],
+  "timeout_ms": 1800000
+}
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../data/webbench-2of4-50.jsonl",
+  "dataset": "../../data/agisdk-real.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
@@ -21,6 +21,6 @@
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
  },
-  "graders": ["performance_grader"],
+  "graders": ["agisdk_state_diff"],
  "timeout_ms": 1800000
 }
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
@@ -14,7 +14,7 @@
      "baseUrl": "https://api.fireworks.ai/inference/v1"
    }
  },
-  "dataset": "../data/webbench-2of4-50.jsonl",
+  "dataset": "../../data/webbench-2of4-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
@@ -9,12 +9,12 @@
    },
    "executor": {
      "provider": "clado-action",
-      "model": "qwen3-vl-30b-a3b-instruct",
+      "model": "Qwen3.5-35B-A3B-action-000159-merged",
      "apiKey": "",
-      "baseUrl": "https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run"
+      "baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
    }
  },
-  "dataset": "../data/webbench-2of4-50.jsonl",
+  "dataset": "../../data/agisdk-real.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
@@ -28,6 +28,6 @@
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
  },
-  "graders": ["performance_grader"],
+  "graders": ["agisdk_state_diff"],
  "timeout_ms": 1800000
 }
--- a/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
@@ -0,0 +1,22 @@
+{
+  "agent": {
+    "type": "claude-code",
+    "model": "opus"
+  },
+  "dataset": "../../data/agisdk-real.jsonl",
+  "num_workers": 1,
+  "restart_server_per_task": true,
+  "browseros": {
+    "server_url": "http://127.0.0.1:9110",
+    "base_cdp_port": 9010,
+    "base_server_port": 9110,
+    "base_extension_port": 9310,
+    "load_extensions": false,
+    "headless": false
+  },
+  "captcha": {
+    "api_key_env": "NOPECHA_API_KEY"
+  },
+  "graders": ["agisdk_state_diff"],
+  "timeout_ms": 1800000
+}
--- a/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../data/webarena-infinity-hard-50.jsonl",
+  "dataset": "../../data/webarena-infinity-hard-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
@@ -5,7 +5,7 @@
    "model": "openai/gpt-4.1",
    "apiKey": "OPENROUTER_API_KEY"
  },
-  "dataset": "../data/mind2web.jsonl",
+  "dataset": "../../data/mind2web.jsonl",
  "num_workers": 5,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://api.fireworks.ai/inference/v1",
    "supportsImages": true
  },
-  "dataset": "../data/webvoyager.jsonl",
+  "dataset": "../../data/webvoyager.jsonl",
  "num_workers": 3,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
@@ -0,0 +1,22 @@
+{
+  "id": "agisdk-daily-10",
+  "dataset": "../../data/agisdk-daily-10.jsonl",
+  "agent": {
+    "type": "single"
+  },
+  "graders": ["agisdk_state_diff"],
+  "workers": 1,
+  "restartBrowserPerTask": true,
+  "timeoutMs": 1800000,
+  "browseros": {
+    "server_url": "http://127.0.0.1:9110",
+    "base_cdp_port": 9010,
+    "base_server_port": 9110,
+    "base_extension_port": 9310,
+    "load_extensions": false,
+    "headless": false
+  },
+  "captcha": {
+    "api_key_env": "NOPECHA_API_KEY"
+  }
+}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
@@ -0,0 +1,22 @@
+{
+  "id": "agisdk-real-smoke",
+  "dataset": "../../data/agisdk-real-smoke.jsonl",
+  "agent": {
+    "type": "single"
+  },
+  "graders": ["agisdk_state_diff"],
+  "workers": 1,
+  "restartBrowserPerTask": true,
+  "timeoutMs": 1800000,
+  "browseros": {
+    "server_url": "http://127.0.0.1:9110",
+    "base_cdp_port": 9010,
+    "base_server_port": 9110,
+    "base_extension_port": 9310,
+    "load_extensions": false,
+    "headless": false
+  },
+  "captcha": {
+    "api_key_env": "NOPECHA_API_KEY"
+  }
+}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
@@ -0,0 +1,22 @@
+{
+  "id": "agisdk-real",
+  "dataset": "../../data/agisdk-real.jsonl",
+  "agent": {
+    "type": "single"
+  },
+  "graders": ["agisdk_state_diff"],
+  "workers": 1,
+  "restartBrowserPerTask": true,
+  "timeoutMs": 1800000,
+  "browseros": {
+    "server_url": "http://127.0.0.1:9110",
+    "base_cdp_port": 9010,
+    "base_server_port": 9110,
+    "base_extension_port": 9310,
+    "load_extensions": false,
+    "headless": false
+  },
+  "captcha": {
+    "api_key_env": "NOPECHA_API_KEY"
+  }
+}
--- a/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
@@ -0,0 +1,10 @@
+{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
+{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
+{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
+{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
+{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
+{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
+{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
+{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
+{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
+{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}
--- a/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
@@ -0,0 +1 @@
+{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
--- a/packages/browseros-agent/apps/eval/package.json
+++ b/packages/browseros-agent/apps/eval/package.json
@@ -5,6 +5,7 @@
  "type": "module",
  "scripts": {
    "eval": "bun --env-file=.env.development run src/index.ts",
+    "test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
--- a/packages/browseros-agent/apps/eval/scripts/test-clado-api.ts
+++ b/packages/browseros-agent/apps/eval/scripts/test-clado-api.ts
@@ -1,34 +1,73 @@
 /**
- * Test script for Clado API endpoints (grounding + action models)
+ * Smoke-test for the Clado BrowserOS Action endpoint.
+ *
+ * Health-checks the model, then runs a generate call and prints every
+ * field documented in the endpoint's response contract (action,
+ * coordinates, text, key, direction, scroll/drag fields, wait,
+ * end+final_answer, thinking, parse_error, raw_response).
 *
 * Usage:
 *   bun apps/eval/scripts/test-clado-api.ts [screenshot-path]
 *
- * If no screenshot provided, captures one from a running BrowserOS server.
+ * If no screenshot path is given, captures one over MCP from a
+ * running BrowserOS server (default http://127.0.0.1:9110, override
+ * with BROWSEROS_URL).
+ *
+ * Cold start can take ~5 minutes; the script waits up to 6.
 */

 import { readFile } from 'node:fs/promises'
 import { resolve } from 'node:path'

 const ACTION_URL =
-  'https://clado-ai--clado-browseros-action-actionmodel-generate.modal.run'
+  'https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run'
 const ACTION_HEALTH_URL =
-  'https://clado-ai--clado-browseros-action-actionmodel-health.modal.run'
-const GROUNDING_URL =
-  'https://clado-ai--clado-browseros-grounding-groundingmodel-generate.modal.run'
-const GROUNDING_HEALTH_URL =
-  'https://clado-ai--clado-browseros-grounding-groundingmodel-health.modal.run'
+  'https://clado-ai--clado-browseros-action-000159-merged-actionmod-5e5033.modal.run'

-async function checkHealth(name: string, url: string): Promise<boolean> {
-  console.log(`\n--- ${name} health check ---`)
-  console.log(`  URL: ${url}`)
+const COLD_START_BUDGET_MS = 360_000 // 6 min — Clado cold start is ~5 min
+const COLD_START_WARN_MS = 30_000
+
+interface CladoResponse {
+  action?: string | null
+  thinking?: string | null
+  raw_response?: string
+  parse_error?: string | null
+  inference_time_seconds?: number
+  x?: number
+  y?: number
+  text?: string
+  key?: string
+  direction?: string
+  amount?: number
+  startX?: number
+  startY?: number
+  endX?: number
+  endY?: number
+  time?: number
+  final_answer?: string | null
+}
+
+async function checkHealth(): Promise<boolean> {
+  console.log(`\n--- Action model health ---`)
+  console.log(`  URL:   ${ACTION_HEALTH_URL}`)
+  console.log(
+    `  Note:  cold start can take ~5 min; waiting up to ${COLD_START_BUDGET_MS / 1000}s.`,
+  )
  const start = performance.now()
+  const warn = setTimeout(() => {
+    console.log(
+      `  ...still waiting (${COLD_START_WARN_MS / 1000}s in) — model is likely cold-starting on Modal.`,
+    )
+  }, COLD_START_WARN_MS)
+
  try {
-    const resp = await fetch(url, { signal: AbortSignal.timeout(30_000) })
+    const resp = await fetch(ACTION_HEALTH_URL, {
+      signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
+    })
    const elapsed = ((performance.now() - start) / 1000).toFixed(2)
    const body = await resp.text()
    console.log(`  Status: ${resp.status} (${elapsed}s)`)
-    console.log(`  Body: ${body.slice(0, 200)}`)
+    console.log(`  Body:   ${body.slice(0, 400)}`)
    return resp.ok
  } catch (err) {
    const elapsed = ((performance.now() - start) / 1000).toFixed(2)
@@ -36,63 +75,34 @@ async function checkHealth(name: string, url: string): Promise<boolean> {
      `  FAILED (${elapsed}s): ${err instanceof Error ? err.message : err}`,
    )
    return false
+  } finally {
+    clearTimeout(warn)
  }
 }

-async function testGenerate(
-  name: string,
-  url: string,
+async function generate(
+  label: string,
  payload: Record<string, unknown>,
-): Promise<Record<string, unknown> | null> {
-  console.log(`\n--- ${name} generate ---`)
-  console.log(`  URL: ${url}`)
+): Promise<CladoResponse | null> {
+  console.log(`\n--- ${label} ---`)
+  console.log(`  URL:         ${ACTION_URL}`)
  console.log(`  Instruction: ${payload.instruction}`)
  console.log(
-    `  Image size: ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
+    `  Image size:  ${((payload.image_base64 as string).length / 1024).toFixed(0)} KB (base64)`,
  )
-  if (payload.history) console.log(`  History: ${payload.history}`)
+  if (payload.history && payload.history !== 'None') {
+    console.log(`  History:     ${payload.history}`)
+  }

  const start = performance.now()
+  let resp: Response
  try {
-    const resp = await fetch(url, {
+    resp = await fetch(ACTION_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
-      signal: AbortSignal.timeout(120_000),
+      signal: AbortSignal.timeout(COLD_START_BUDGET_MS),
    })
-    const elapsed = ((performance.now() - start) / 1000).toFixed(2)
-
-    if (!resp.ok) {
-      const body = await resp.text()
-      console.log(`  FAILED: HTTP ${resp.status} (${elapsed}s)`)
-      console.log(`  Body: ${body.slice(0, 400)}`)
-      return null
-    }
-
-    const result = (await resp.json()) as Record<string, unknown>
-    console.log(`  Status: ${resp.status} (${elapsed}s)`)
-    console.log(`  Action: ${result.action}`)
-    if (result.x !== null && result.x !== undefined)
-      console.log(`  Coordinates: (${result.x}, ${result.y})`)
-    if (result.text)
-      console.log(`  Text: ${(result.text as string).slice(0, 100)}`)
-    if (result.key) console.log(`  Key: ${result.key}`)
-    if (result.inference_time_seconds)
-      console.log(`  Inference: ${result.inference_time_seconds}s`)
-
-    // Show thinking if present
-    const raw = result.raw_response as string | undefined
-    if (raw) {
-      const thinkMatch = raw.match(/<thinking>([\s\S]*?)<\/thinking>/)
-      if (thinkMatch) {
-        const thinking = thinkMatch[1].trim()
-        console.log(
-          `  Thinking: ${thinking.slice(0, 200)}${thinking.length > 200 ? '...' : ''}`,
-        )
-      }
-    }
-
-    return result
  } catch (err) {
    const elapsed = ((performance.now() - start) / 1000).toFixed(2)
    console.log(
@@ -100,6 +110,50 @@ async function testGenerate(
    )
    return null
  }
+  const elapsed = ((performance.now() - start) / 1000).toFixed(2)
+
+  if (!resp.ok) {
+    const body = await resp.text()
+    console.log(`  HTTP ${resp.status} ${resp.statusText} (${elapsed}s)`)
+    console.log(`  Body: ${body.slice(0, 400)}`)
+    return null
+  }
+
+  const result = (await resp.json()) as CladoResponse
+  console.log(`  HTTP ${resp.status} (${elapsed}s)`)
+  console.log(`  action:                ${result.action ?? 'null'}`)
+  if (result.parse_error) {
+    console.log(`  parse_error:           ${result.parse_error}`)
+  }
+  if (result.thinking) {
+    const trimmed = result.thinking.replace(/\s+/g, ' ').trim()
+    console.log(
+      `  thinking:              ${trimmed.slice(0, 240)}${trimmed.length > 240 ? '…' : ''}`,
+    )
+  }
+  if (typeof result.x === 'number' || typeof result.y === 'number') {
+    console.log(`  x, y:                  ${result.x}, ${result.y}`)
+  }
+  if (typeof result.text === 'string')
+    console.log(`  text:                  ${result.text.slice(0, 120)}`)
+  if (typeof result.key === 'string')
+    console.log(`  key:                   ${result.key}`)
+  if (typeof result.direction === 'string')
+    console.log(`  direction:             ${result.direction}`)
+  if (typeof result.amount === 'number')
+    console.log(`  amount:                ${result.amount}`)
+  if (typeof result.startX === 'number' || typeof result.endX === 'number') {
+    console.log(
+      `  drag:                  (${result.startX}, ${result.startY}) → (${result.endX}, ${result.endY})`,
+    )
+  }
+  if (typeof result.time === 'number')
+    console.log(`  time:                  ${result.time}s`)
+  if (result.final_answer)
+    console.log(`  final_answer:          ${result.final_answer.slice(0, 240)}`)
+  if (typeof result.inference_time_seconds === 'number')
+    console.log(`  inference_time_seconds: ${result.inference_time_seconds}`)
+  return result
 }

 async function loadScreenshot(path?: string): Promise<string> {
@@ -110,10 +164,9 @@ async function loadScreenshot(path?: string): Promise<string> {
    return data.toString('base64')
  }

-  // Try to capture from a running BrowserOS server
  const serverUrl = process.env.BROWSEROS_URL || 'http://127.0.0.1:9110'
  console.log(
-    `No screenshot path provided. Trying to capture from ${serverUrl}...`,
+    `No screenshot path provided. Capturing from ${serverUrl} via MCP...`,
  )

  const { Client } = await import('@modelcontextprotocol/sdk/client/index.js')
@@ -134,82 +187,101 @@ async function loadScreenshot(path?: string): Promise<string> {
      arguments: { format: 'png', page: 1 },
    })) as { content: Array<{ type: string; data?: string }> }

-    const imageContent = result.content?.find((c) => c.type === 'image')
-    if (!imageContent?.data)
-      throw new Error('No image data in screenshot response')
+    const image = result.content?.find((c) => c.type === 'image')
+    if (!image?.data)
+      throw new Error('No image data in take_screenshot response')

    console.log(
-      `Captured screenshot (${(imageContent.data.length / 1024).toFixed(0)} KB base64)`,
+      `Captured screenshot (${(image.data.length / 1024).toFixed(0)} KB base64)`,
    )
-    return imageContent.data
+    return image.data
  } finally {
    try {
      await transport.close()
-    } catch {}
+    } catch {
+      /* ignore */
+    }
  }
 }

+function summarize(history: CladoResponse[]): string {
+  if (history.length === 0) return 'None'
+  return history
+    .map((h) => {
+      switch (h.action) {
+        case 'click':
+        case 'double_click':
+        case 'right_click':
+        case 'hover':
+          return `${h.action}(${h.x}, ${h.y})`
+        case 'type':
+          return `type(${JSON.stringify(h.text ?? '')})`
+        case 'press_key':
+          return `press_key(${JSON.stringify(h.key ?? '')})`
+        case 'scroll':
+          return `scroll(${h.direction ?? 'down'})`
+        case 'drag':
+          return `drag(${h.startX},${h.startY} -> ${h.endX},${h.endY})`
+        case 'wait':
+          return `wait(${h.time ?? 1}s)`
+        case 'end':
+          return 'end()'
+        default:
+          return h.action ?? 'invalid'
+      }
+    })
+    .join(' -> ')
+}
+
 async function main() {
-  const screenshotPath = process.argv[2]
+  console.log('=== Clado action endpoint smoke test ===')

-  console.log('=== Clado API Test ===\n')
-
-  // Health checks (parallel)
-  const [actionHealthy, groundingHealthy] = await Promise.all([
-    checkHealth('Action Model', ACTION_HEALTH_URL),
-    checkHealth('Grounding Model', GROUNDING_HEALTH_URL),
-  ])
-
-  if (!actionHealthy && !groundingHealthy) {
-    console.log('\nBoth endpoints are down. Exiting.')
+  const healthy = await checkHealth()
+  if (!healthy) {
+    console.log('\nHealth check failed. Exiting.')
    process.exit(1)
  }

-  // Load screenshot
  let imageBase64: string
  try {
-    imageBase64 = await loadScreenshot(screenshotPath)
+    imageBase64 = await loadScreenshot(process.argv[2])
  } catch (err) {
    console.log(
      `\nFailed to load screenshot: ${err instanceof Error ? err.message : err}`,
    )
    console.log(
-      'Provide a screenshot path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
+      'Pass a path: bun apps/eval/scripts/test-clado-api.ts path/to/screenshot.png',
    )
    process.exit(1)
  }

-  const instruction = 'Click on the search button or search bar'
+  const history: CladoResponse[] = []

-  // Test grounding model
-  if (groundingHealthy) {
-    await testGenerate('Grounding Model', GROUNDING_URL, {
-      instruction,
+  // Step 1: open task — let the model decide what to do.
+  const step1 = await generate('Step 1: cold task', {
+    instruction: 'Find the search bar and click it',
+    image_base64: imageBase64,
+    history: 'None',
+  })
+  if (step1?.action) history.push(step1)
+
+  // Step 2: continuation with history, asks for typing.
+  if (step1?.action) {
+    const step2 = await generate('Step 2: with history', {
+      instruction: 'Type "hello world" into the search bar',
      image_base64: imageBase64,
+      history: summarize(history),
    })
-  } else {
-    console.log('\nSkipping grounding model (unhealthy)')
+    if (step2?.action) history.push(step2)
  }

-  // Test action model (no history)
-  if (actionHealthy) {
-    const result = await testGenerate('Action Model (step 1)', ACTION_URL, {
-      instruction,
-      image_base64: imageBase64,
-      history: 'None',
-    })
-
-    // Test action model with history (simulate multi-turn)
-    if (result && result.action === 'click') {
-      await testGenerate('Action Model (step 2, with history)', ACTION_URL, {
-        instruction: 'Type "hello world" in the search bar',
-        image_base64: imageBase64,
-        history: `click(${result.x}, ${result.y})`,
-      })
-    }
-  } else {
-    console.log('\nSkipping action model (unhealthy)')
-  }
+  // Step 3: ask for end with a final answer to exercise that field.
+  await generate('Step 3: ask for end+final_answer', {
+    instruction:
+      'You have completed the task. Reply with end() and final_answer="done".',
+    image_base64: imageBase64,
+    history: summarize(history),
+  })

  console.log('\n=== Done ===')
 }
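The cold-start handling in `checkHealth` combines two timers. A soft `setTimeout` only logs a warning, a hard `AbortSignal.timeout` budget cancels the request, and `finally` clears the soft timer. Below is a minimal, runtime-agnostic sketch of that pattern. `withSlowWarning` is a hypothetical helper that is not part of this PR.

```typescript
// Hypothetical helper (not in this PR): run an abortable task with a soft
// "still waiting" warning and a hard time budget.
export async function withSlowWarning<T>(
  task: (signal: AbortSignal) => Promise<T>,
  opts: { warnAfterMs: number; budgetMs: number; onWarn: () => void },
): Promise<T> {
  // Soft timer: only reports slowness, never cancels the task.
  const warn = setTimeout(opts.onWarn, opts.warnAfterMs)
  try {
    // Hard budget: the signal aborts once budgetMs elapses; a fetch bound to
    // it rejects with a TimeoutError.
    return await task(AbortSignal.timeout(opts.budgetMs))
  } finally {
    // Runs on success, failure, or timeout, so the warning cannot fire late.
    clearTimeout(warn)
  }
}
```

In the smoke test, a call would look like `withSlowWarning((signal) => fetch(ACTION_HEALTH_URL, { signal }), { warnAfterMs: COLD_START_WARN_MS, budgetMs: COLD_START_BUDGET_MS, onWarn: () => console.log('  ...still waiting') })`. Passing the signal into the task keeps the helper independent of `fetch`.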
--- a/packages/browseros-agent/apps/eval/scripts/upload-run.ts
+++ b/packages/browseros-agent/apps/eval/scripts/upload-run.ts
@@ -1,349 +1,43 @@
+#!/usr/bin/env bun
+
 /**
 * Upload eval runs to R2.
 *
 * Two modes:
 *   bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
- *       → uploads that specific run
- *
 *   bun scripts/upload-run.ts results/browseros-agent-weekly
- *       → finds all timestamped subfolders, uploads any not yet in R2
- *
- * Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
- *           EVAL_R2_BUCKET (default: browseros-eval)
- *           EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
 */

-import { readdir, readFile, stat } from 'node:fs/promises'
-import { basename, dirname, extname, join } from 'node:path'
 import {
-  GetObjectCommand,
-  PutObjectCommand,
-  S3Client,
-} from '@aws-sdk/client-s3'
+  loadR2ConfigFromEnv,
+  R2Publisher,
+} from '../src/publishing/r2-publisher'

-const CONCURRENCY = 20
-
-const CONTENT_TYPES: Record<string, string> = {
-  '.json': 'application/json',
-  '.jsonl': 'application/x-ndjson',
-  '.png': 'image/png',
-}
-
-interface R2Config {
-  accountId: string
-  accessKeyId: string
-  secretAccessKey: string
-  bucket: string
-  cdnBaseUrl: string
-}
-
-function loadConfig(): R2Config {
-  const accountId = process.env.EVAL_R2_ACCOUNT_ID
-  const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
-  const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
-
-  if (!accountId || !accessKeyId || !secretAccessKey) {
-    console.error(
-      'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
-    )
-    process.exit(1)
-  }
-
-  return {
-    accountId,
-    accessKeyId,
-    secretAccessKey,
-    bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
-    cdnBaseUrl: (
-      process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
-    ).replace(/\/+$/, ''),
-  }
-}
-
-function createClient(config: R2Config): S3Client {
-  return new S3Client({
-    region: 'auto',
-    endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
-    credentials: {
-      accessKeyId: config.accessKeyId,
-      secretAccessKey: config.secretAccessKey,
-    },
-  })
-}
-
-async function upload(
-  client: S3Client,
-  bucket: string,
-  key: string,
-  body: Buffer,
-  contentType: string,
-) {
-  await client.send(
-    new PutObjectCommand({
-      Bucket: bucket,
-      Key: key,
-      Body: body,
-      ContentType: contentType,
-    }),
-  )
-}
-
-async function collectFiles(dir: string): Promise<string[]> {
-  const files: string[] = []
-  const entries = await readdir(dir, { withFileTypes: true })
-  for (const entry of entries) {
-    const full = join(dir, entry.name)
-    if (entry.isDirectory()) {
-      files.push(...(await collectFiles(full)))
-    } else {
-      files.push(full)
-    }
-  }
-  return files
-}
-
-async function runPool<T>(
-  items: T[],
-  concurrency: number,
-  fn: (item: T) => Promise<void>,
-) {
-  let i = 0
-  const workers = Array.from({ length: concurrency }, async () => {
-    while (i < items.length) {
-      const idx = i++
-      await fn(items[idx])
-    }
-  })
-  await Promise.all(workers)
-}
-
-// Check if a run has already been uploaded to R2
-async function isUploaded(
-  client: S3Client,
-  bucket: string,
-  runId: string,
-): Promise<boolean> {
-  try {
-    await client.send(
-      new GetObjectCommand({
-        Bucket: bucket,
-        Key: `runs/${runId}/manifest.json`,
-      }),
-    )
-    return true
-  } catch {
-    return false
-  }
-}
-
-// Detect if a directory is a run dir (has task subdirs with metadata.json)
-// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
-async function isRunDir(dir: string): Promise<boolean> {
-  const entries = await readdir(dir, { withFileTypes: true })
-  const subdirs = entries.filter((e) => e.isDirectory())
-  for (const subdir of subdirs) {
-    const metaPath = join(dir, subdir.name, 'metadata.json')
-    const metaStat = await stat(metaPath).catch(() => null)
-    if (metaStat?.isFile()) return true
-  }
-  return false
-}
-
-async function uploadSingleRun(
-  runDir: string,
-  runId: string,
-  r2Config: R2Config,
-  client: S3Client,
-): Promise<void> {
-  const taskDirs = await readdir(runDir, { withFileTypes: true })
-  const taskEntries = taskDirs.filter((d) => d.isDirectory())
-
-  if (taskEntries.length === 0) {
-    console.warn(`  No task subdirectories in ${runId}, skipping`)
-    return
-  }
-
-  const manifestTasks: Record<string, unknown>[] = []
-  const jobs: { key: string; filePath: string; contentType: string }[] = []
-
-  // Extract agent config from first task
-  let agentConfig: Record<string, unknown> | undefined
-  let dataset: string | undefined
-
-  for (const taskDir of taskEntries) {
-    const taskId = taskDir.name
-    const taskPath = join(runDir, taskId)
-    const metaPath = join(taskPath, 'metadata.json')
-
-    let meta: Record<string, unknown> = {}
-    try {
-      meta = JSON.parse(await readFile(metaPath, 'utf-8'))
-    } catch {
-      continue
-    }
-
-    if (!agentConfig && meta.agent_config)
-      agentConfig = meta.agent_config as Record<string, unknown>
-    if (!dataset && meta.dataset) dataset = meta.dataset as string
-
-    const files = await collectFiles(taskPath)
-    let screenshotCount = 0
-
-    for (const file of files) {
-      const relative = file.slice(taskPath.length + 1)
-      const ext = extname(file)
-      if (relative.startsWith('screenshots/') && ext === '.png')
-        screenshotCount++
-
-      jobs.push({
-        key: `runs/${runId}/${taskId}/${relative}`,
-        filePath: file,
-        contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
-      })
-    }
-
-    manifestTasks.push({
-      queryId: meta.query_id || taskId,
-      query: meta.query || '',
-      startUrl: meta.start_url || '',
-      status:
-        meta.termination_reason === 'completed'
-          ? 'completed'
-          : meta.termination_reason || 'unknown',
-      durationMs: meta.total_duration_ms || 0,
-      screenshotCount: (meta.screenshot_count as number) || screenshotCount,
-      graderResults: meta.grader_results || {},
-    })
-  }
-
-  if (manifestTasks.length === 0) {
-    console.warn(`  No completed tasks in ${runId}, skipping`)
-    return
-  }
-
-  console.log(
-    `  Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
-  )
-
-  let uploaded = 0
-  await runPool(jobs, CONCURRENCY, async (job) => {
-    const body = await readFile(job.filePath)
-    await upload(client, r2Config.bucket, job.key, body, job.contentType)
-    uploaded++
-    if (uploaded % 50 === 0 || uploaded === jobs.length) {
-      console.log(`    ${uploaded}/${jobs.length}`)
-    }
-  })
-
-  // Read summary.json if it exists
-  let summaryData: Record<string, unknown> | undefined
-  try {
-    summaryData = JSON.parse(
-      await readFile(join(runDir, 'summary.json'), 'utf-8'),
-    )
-  } catch {}
-
-  // Upload manifest
-  const manifest = {
-    runId,
-    uploadedAt: new Date().toISOString(),
-    agentConfig,
-    dataset,
-    summary: summaryData
-      ? {
-          passRate: summaryData.passRate,
-          avgDurationMs: summaryData.avgDurationMs,
-        }
-      : undefined,
-    tasks: manifestTasks,
-  }
-  const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
-  await upload(
-    client,
-    r2Config.bucket,
-    `runs/${runId}/manifest.json`,
-    manifestBody,
-    'application/json',
-  )
-
-  // Upload viewer.html to bucket root
-  const viewerPath = join(
-    import.meta.dir,
-    '..',
-    'src',
-    'dashboard',
-    'viewer.html',
-  )
-  const viewerBody = await readFile(viewerPath)
-  await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
-
-  console.log(`  Uploaded ${uploaded + 2} files`)
-  console.log(`  ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
-}
-
-async function main() {
+async function main(): Promise<void> {
  const inputDir = process.argv[2]
  if (!inputDir) {
-    console.error(
+    throw new Error(
      'Usage:\n' +
-        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730  (specific run)\n' +
-        '  bun scripts/upload-run.ts results/config-name                   (all un-uploaded runs)',
-    )
-    process.exit(1)
-  }
-
-  const dirStat = await stat(inputDir).catch(() => null)
-  if (!dirStat?.isDirectory()) {
-    console.error(`Not a directory: ${inputDir}`)
-    process.exit(1)
-  }
-
-  const r2Config = loadConfig()
-  const client = createClient(r2Config)
-
-  if (await isRunDir(inputDir)) {
-    // Single run: results/config-name/2026-03-21-1730
-    const timestamp = basename(inputDir)
-    const configName = basename(dirname(inputDir))
-    const runId = `${configName}-${timestamp}`
-    console.log(`Uploading run: ${runId}`)
-    await uploadSingleRun(inputDir, runId, r2Config, client)
-  } else {
-    // Config dir: results/config-name/ — upload all un-uploaded runs
-    const configName = basename(inputDir)
-    const entries = await readdir(inputDir, { withFileTypes: true })
-    const runDirs = entries
-      .filter((e) => e.isDirectory())
-      .map((e) => e.name)
-      .sort()
-
-    if (runDirs.length === 0) {
-      console.error('No run subdirectories found')
-      process.exit(1)
-    }
-
-    console.log(
-      `Found ${runDirs.length} runs for config "${configName}", checking R2...`,
-    )
-
-    let uploadedCount = 0
-    for (const dir of runDirs) {
-      const runId = `${configName}-${dir}`
-      const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
-      if (alreadyUploaded) {
-        console.log(`  ${runId}: already uploaded, skipping`)
-        continue
-      }
-
-      console.log(`  ${runId}: uploading...`)
-      await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
-      uploadedCount++
-    }
-
-    console.log(
-      `\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
+        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
+        '  bun scripts/upload-run.ts results/config-name',
    )
  }
+
+  const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
+  const result = await publisher.publishPath(inputDir)
+  for (const run of result.uploadedRuns) {
+    console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
+    console.log(run.viewerUrl)
+  }
+  for (const runId of result.skippedRuns) {
+    console.log(`${runId}: already uploaded, skipping`)
+  }
+  console.log(
+    `Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
+  )
 }

-main()
+main().catch((error) => {
+  console.error(error instanceof Error ? error.message : String(error))
+  process.exit(1)
+})
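`upload-run.ts` now hands directory handling to `R2Publisher.publishPath`, and the publisher's source is not shown in this diff. The removed script built run IDs from the layout `results/<config>/<timestamp>`. The sketch below reproduces that removed convention, which is useful for checking that the publisher still emits the same IDs. Both function names are hypothetical.

```typescript
import { basename, dirname, join } from 'node:path'

// Mirrors the removed convention:
// results/<config>/<timestamp>  ->  "<config>-<timestamp>"
export function runIdForRunDir(runDir: string): string {
  const timestamp = basename(runDir)
  const configName = basename(dirname(runDir))
  return `${configName}-${timestamp}`
}

// Config-dir mode: every timestamped subfolder becomes one run ID.
// Subfolders are processed in sorted order, as the old loop did.
export function runIdsForConfigDir(
  configDir: string,
  subdirs: string[],
): string[] {
  return [...subdirs]
    .sort()
    .map((dir) => runIdForRunDir(join(configDir, dir)))
}
```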
--- a/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
+++ b/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
@@ -24,45 +24,11 @@ import {
  PutObjectCommand,
  S3Client,
 } from '@aws-sdk/client-s3'
-
-interface ManifestTask {
-  queryId: string
-  query: string
-  status: string
-  durationMs: number
-  screenshotCount: number
-  graderResults: Record<string, { pass: boolean; score: number }>
-}
-
-interface Manifest {
-  runId: string
-  uploadedAt: string
-  agentConfig?: { type?: string; model?: string }
-  dataset?: string
-  summary?: { passRate?: number; avgDurationMs?: number }
-  tasks: ManifestTask[]
-}
-
-interface RunSummary {
-  runId: string
-  configName: string
-  date: string
-  avgScore: number
-  total: number
-  completed: number
-  failed: number
-  timeout: number
-  avgDurationMs: number
-  model: string
-  dataset: string
-  agentType: string
-}
-
-const PASS_FAIL_GRADER_ORDER = [
-  'agisdk_state_diff',
-  'infinity_state',
-  'performance_grader',
-]
+import {
+  buildRunSummaries,
+  type ReportManifest,
+  type RunSummary,
+} from '../src/reporting/run-summary'

 function requireEnv(name: string): string {
  const value = process.env[name]
@@ -87,7 +53,7 @@ const client = new S3Client({
 // Step 1: List all manifest.json files in runs/
 console.log('Scanning R2 for eval runs...')

-const manifests: Manifest[] = []
+const manifests: ReportManifest[] = []
 let continuationToken: string | undefined

 do {
@@ -127,64 +93,9 @@ if (manifests.length === 0) {
 }

 // Step 2: Build run summaries
-const runs: RunSummary[] = manifests
-  .map((m) => {
-    const total = m.tasks.length
-    const completed = m.tasks.filter((t) => t.status === 'completed').length
-    const failed = m.tasks.filter((t) => t.status === 'failed').length
-    const timeout = m.tasks.filter((t) => t.status === 'timeout').length
-
-    let scoredCount = 0
-    let scoreSum = 0
-    for (const task of m.tasks) {
-      if (!task.graderResults) continue
-      for (const name of PASS_FAIL_GRADER_ORDER) {
-        if (task.graderResults[name]) {
-          scoredCount++
-          scoreSum += task.graderResults[name].score ?? 0
-          break
-        }
-      }
-    }
-
-    const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
-    const durations = m.tasks
-      .filter((t) => t.durationMs > 0)
-      .map((t) => t.durationMs)
-    const avgDurationMs =
-      durations.length > 0
-        ? durations.reduce((a, b) => a + b, 0) / durations.length
-        : 0
-
-    const date = m.uploadedAt
-      ? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
-      : m.runId.slice(0, 15)
-
-    const model = m.agentConfig?.model || 'unknown'
-    const dataset = m.dataset || m.runId
-    const agentType = m.agentConfig?.type || 'unknown'
-
-    const configName = extractConfigName(m.runId)
-    return {
-      runId: m.runId,
-      configName,
-      date,
-      avgScore,
-      total,
-      completed,
-      failed,
-      timeout,
-      avgDurationMs,
-      model,
-      dataset,
-      agentType,
-    }
-  })
-  .sort((a, b) => a.date.localeCompare(b.date))
+const runs: RunSummary[] = buildRunSummaries(manifests)

 // Step 3: Identify unique config groups
-// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
-// Extract config name by stripping the date-time suffix pattern
 function escHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
@@ -193,12 +104,6 @@ function escHtml(s: string): string {
    .replace(/"/g, '&quot;')
 }

-function extractConfigName(runId: string): string {
-  // "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
-  // "ci-weekly" → "ci-weekly" (no timestamp, old format)
-  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
-}
-
 const configGroups = [...new Set(runs.map((r) => r.configName))]
 const defaultConfig = configGroups.includes('ci-weekly')
  ? 'ci-weekly'
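The summary logic removed from `weekly-report.ts` now sits behind `buildRunSummaries` in `src/reporting/run-summary`, and that module is not shown here. Assuming the move preserved behavior, the removed inline version reduces to two rules. First, each task contributes the score of the first grader it has from `PASS_FAIL_GRADER_ORDER`. Second, config names come from stripping a trailing `-YYYY-MM-DD-HHMM` suffix.

```typescript
// Grader precedence copied from the removed PASS_FAIL_GRADER_ORDER constant.
const PASS_FAIL_GRADER_ORDER = [
  'agisdk_state_diff',
  'infinity_state',
  'performance_grader',
]

interface GraderResult {
  pass: boolean
  score: number
}

interface TaskLike {
  graderResults?: Record<string, GraderResult>
}

// "browseros-agent-weekly-2026-03-21-1730" -> "browseros-agent-weekly";
// legacy IDs without a timestamp (e.g. "ci-weekly") pass through unchanged.
export function extractConfigName(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

// Average score as a percentage. Each task uses the first grader present in
// the precedence list; tasks with none of those graders are skipped.
export function averageScore(tasks: TaskLike[]): number {
  let scored = 0
  let sum = 0
  for (const task of tasks) {
    const name = PASS_FAIL_GRADER_ORDER.find(
      (n) => task.graderResults?.[n] !== undefined,
    )
    if (!name) continue
    scored++
    sum += task.graderResults?.[name]?.score ?? 0
  }
  return scored > 0 ? (sum / scored) * 100 : 0
}
```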
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
@@ -0,0 +1,238 @@
+import { writeFile } from 'node:fs/promises'
+import { join } from 'node:path'
+import { DEFAULT_TIMEOUT_MS } from '../../constants'
+import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
+import { withEvalTimeout } from '../../utils/with-eval-timeout'
+import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
+import {
+  type ClaudeCodeProcessRunner,
+  createClaudeCodeProcessRunner,
+} from './process-runner'
+import {
+  ClaudeCodeStreamParser,
+  shouldCaptureScreenshotForTool,
+} from './stream-parser'
+
+export interface ClaudeCodeEvaluatorDeps {
+  processRunner?: ClaudeCodeProcessRunner
+}
+
+export class ClaudeCodeEvaluator implements AgentEvaluator {
+  private processRunner: ClaudeCodeProcessRunner
+
+  constructor(
+    private ctx: AgentContext,
+    deps: ClaudeCodeEvaluatorDeps = {},
+  ) {
+    this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
+  }
+
+  async execute(): Promise<AgentResult> {
+    const { config, task, capture, taskOutputDir } = this.ctx
+    const startTime = Date.now()
+    const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
+
+    await capture.messageLogger.logUser(task.query)
+
+    if (config.agent.type !== 'claude-code') {
+      throw new Error('ClaudeCodeEvaluator only supports claude-code config')
+    }
+    const agentConfig = config.agent
+
+    const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
+    await writeFile(
+      mcpConfigPath,
+      JSON.stringify(
+        buildClaudeCodeMcpConfig(config.browseros.server_url),
+        null,
+        2,
+      ),
+    )
+
+    const parser = new ClaudeCodeStreamParser()
+    const toolNamesById = new Map<string, string>()
+    const prompt = buildClaudeCodePrompt(task.query)
+    const args = buildClaudeCodeArgs({
+      prompt,
+      mcpConfigPath,
+      config: agentConfig,
+    })
+
+    const { terminationReason } = await withEvalTimeout(
+      timeoutMs,
+      capture,
+      async (signal) => {
+        const runResult = await this.processRunner.run({
+          executable: agentConfig.claudePath,
+          args,
+          cwd: taskOutputDir,
+          signal,
+          onStdoutLine: async (line) => {
+            const events = parser.pushLine(line)
+            for (const event of events) {
+              await this.handleStreamEvent(event, toolNamesById)
+            }
+          },
+        })
+
+        if (runResult.exitCode !== 0) {
+          const message =
+            runResult.stderr.trim() ||
+            `Claude Code exited with status ${runResult.exitCode}`
+          capture.addError('agent_execution', message, {
+            exitCode: runResult.exitCode,
+          })
+          if (!parser.getLastText()) {
+            throw new Error(message)
+          }
+        }
+
+        for (const error of runResult.streamErrors ?? []) {
+          capture.addWarning(
+            'message_logging',
+            `Claude Code stream event processing failed: ${error}`,
+          )
+        }
+
+        return runResult
+      },
+    )
+
+    const endTime = Date.now()
+    const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
+    const metadata = {
+      query_id: task.query_id,
+      dataset: task.dataset,
+      query: task.query,
+      started_at: new Date(startTime).toISOString(),
+      completed_at: new Date(endTime).toISOString(),
+      total_duration_ms: endTime - startTime,
+      total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
+      termination_reason: terminationReason,
+      final_answer: finalAnswer,
+      errors: capture.getErrors(),
+      warnings: capture.getWarnings(),
+      device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
+      agent_config: {
+        type: 'claude-code' as const,
+        model: agentConfig.model,
+      },
+      grader_results: {},
+    }
+
+    await capture.trajectorySaver.saveMetadata(metadata)
+
+    return {
+      metadata,
+      messages: capture.getMessages(),
+      finalAnswer,
+    }
+  }
+
+  private async handleStreamEvent(
+    event: UIMessageStreamEvent,
+    toolNamesById: Map<string, string>,
+  ): Promise<void> {
+    const { capture, task } = this.ctx
+    let screenshot: number | undefined
+
+    if (event.type === 'tool-input-available') {
+      toolNamesById.set(event.toolCallId, event.toolName)
+      if (isPageInput(event.input)) {
+        capture.setActivePageId(event.input.page)
+      }
+    }
+
+    if (
+      event.type === 'tool-output-available' ||
+      event.type === 'tool-output-error'
+    ) {
+      const toolName = toolNamesById.get(event.toolCallId)
+      if (toolName && shouldCaptureScreenshotForTool(toolName)) {
+        screenshot = await this.captureScreenshot()
+      }
+    }
+
+    await capture.messageLogger.logStreamEvent(event, screenshot)
+    capture.emitEvent(task.query_id, {
+      ...event,
+      ...(screenshot !== undefined && { screenshot }),
+    })
+  }
+
+  private async captureScreenshot(): Promise<number | undefined> {
+    const { capture, task } = this.ctx
+    try {
+      const screenshot = await capture.screenshot.capture(
+        capture.getActivePageId(),
+      )
+      capture.emitEvent(task.query_id, {
+        type: 'screenshot-captured',
+        screenshot,
+      })
+      return screenshot
+    } catch {
+      return undefined
+    }
+  }
+}
+
+function isPageInput(input: unknown): input is { page: number } {
+  return (
+    typeof input === 'object' &&
+    input !== null &&
+    'page' in input &&
+    typeof input.page === 'number'
+  )
+}
+
+function buildClaudeCodePrompt(taskQuery: string): string {
+  return [
+    'You are running inside BrowserOS eval.',
+    'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
+    'When the task is complete, respond with the final answer only.',
+    'If blocked, explain the blocker clearly.',
+    '',
+    `Task: ${taskQuery}`,
+  ].join('\n')
+}
+
+function buildClaudeCodeArgs({
+  prompt,
+  mcpConfigPath,
+  config,
+}: {
+  prompt: string
+  mcpConfigPath: string
+  config: ClaudeCodeAgentConfig
+}): string[] {
+  const args = [
+    '-p',
+    prompt,
+    '--mcp-config',
+    mcpConfigPath,
+    '--strict-mcp-config',
+    '--output-format',
+    'stream-json',
+    '--verbose',
+  ]
+
+  if (config.model) args.push('--model', config.model)
+  args.push(...config.extraArgs)
+
+  return args
+}
+
+function buildClaudeCodeMcpConfig(serverUrl: string) {
+  const trimmed = serverUrl.replace(/\/+$/, '')
+  const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
+  return {
+    mcpServers: {
+      browseros: {
+        type: 'http',
+        url,
+        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
+      },
+    },
+  }
+}
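`buildClaudeCodeMcpConfig` writes a per-task `claude-code-mcp.json` that points Claude Code at the BrowserOS `/mcp` endpoint. Below is a standalone sketch of the URL normalization and the config shape, using hypothetical names `toMcpUrl` and `buildMcpConfig`. This variant strips every trailing slash, so an input such as `http://127.0.0.1:9110//` still ends in exactly one `/mcp`.

```typescript
// Hypothetical standalone version of the URL handling in buildClaudeCodeMcpConfig.
export function toMcpUrl(serverUrl: string): string {
  // Remove all trailing slashes before checking for an existing /mcp suffix.
  const trimmed = serverUrl.replace(/\/+$/, '')
  return trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
}

// Same object shape as the one written to claude-code-mcp.json in this PR.
export function buildMcpConfig(serverUrl: string) {
  return {
    mcpServers: {
      browseros: {
        type: 'http',
        url: toMcpUrl(serverUrl),
        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
      },
    },
  }
}
```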
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
@@ -0,0 +1,114 @@
+export interface ClaudeCodeRunOptions {
+  executable: string
+  args: string[]
+  cwd: string
+  signal?: AbortSignal
+  onStdoutLine: (line: string) => Promise<void>
+}
+
+export interface ClaudeCodeRunResult {
+  exitCode: number
+  stderr: string
+  streamErrors?: string[]
+}
+
+export interface ClaudeCodeProcessRunner {
+  run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
+}
+
+export interface SpawnOptions {
+  cwd: string
+  signal?: AbortSignal
+  onStdoutLine: (line: string) => Promise<void>
+}
+
+export interface CreateClaudeCodeProcessRunnerDeps {
+  spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
+}
+
+export function createClaudeCodeProcessRunner(
+  deps: CreateClaudeCodeProcessRunnerDeps = {},
+): ClaudeCodeProcessRunner {
+  const spawn = deps.spawn ?? spawnClaudeCode
+  return {
+    run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
+      spawn([executable, ...args], { cwd, signal, onStdoutLine }),
+  }
+}
+
+async function spawnClaudeCode(
+  cmd: string[],
+  options: SpawnOptions,
+): Promise<ClaudeCodeRunResult> {
+  const proc = Bun.spawn({
+    cmd,
+    cwd: options.cwd,
+    stdin: 'ignore',
+    stdout: 'pipe',
+    stderr: 'pipe',
+  })
+
+  const abort = () => {
+    try {
+      proc.kill('SIGTERM')
+    } catch {
+      // Process may already have exited.
+    }
+  }
+  options.signal?.addEventListener('abort', abort, { once: true })
+  if (options.signal?.aborted) abort()
+  try {
+    const streamErrors: string[] = []
+    const stdoutPromise = readLines(
+      proc.stdout,
+      options.onStdoutLine,
+      streamErrors,
+    )
+    const stderrPromise = new Response(proc.stderr).text()
+    const exitCode = await proc.exited
+    await stdoutPromise
+    const stderr = await stderrPromise
+    return { exitCode, stderr, streamErrors }
+  } finally {
+    options.signal?.removeEventListener('abort', abort)
+  }
+}
+
+async function readLines(
+  stream: ReadableStream<Uint8Array>,
+  onLine: (line: string) => Promise<void>,
+  streamErrors: string[],
+): Promise<void> {
+  const reader = stream.getReader()
+  const decoder = new TextDecoder()
+  let buffer = ''
+
+  while (true) {
+    const { done, value } = await reader.read()
+    if (done) break
+
+    buffer += decoder.decode(value, { stream: true })
+    const lines = buffer.split('\n')
+    buffer = lines.pop() ?? ''
+    for (const line of lines) {
+      await emitLine(line, onLine, streamErrors)
+    }
+  }
+
+  buffer += decoder.decode()
+  if (buffer.length > 0) {
+    await emitLine(buffer, onLine, streamErrors)
+  }
+}
+
+async function emitLine(
+  line: string,
+  onLine: (line: string) => Promise<void>,
+  streamErrors: string[],
+): Promise<void> {
+  try {
+    await onLine(line)
+  } catch (error) {
+    streamErrors.push(error instanceof Error ? error.message : String(error))
+  }
+}
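`readLines` has to handle chunks that end partway through a line or partway through a multi-byte UTF-8 character. The same buffering strategy can be shown synchronously with a hypothetical `LineSplitter` class, which is not part of this PR. This lets it be tested without spawning `claude`. `TextDecoder` with `{ stream: true }` holds back an incomplete byte sequence, and the last element of each split stays buffered until a newline or the end of the stream arrives.

```typescript
// Hypothetical synchronous equivalent of the buffering in readLines.
class LineSplitter {
  private readonly decoder = new TextDecoder()
  private buffer = ''

  // Feed one chunk; returns only the lines completed by this chunk.
  push(chunk: Uint8Array): string[] {
    // stream: true keeps a split multi-byte character pending for the next chunk.
    this.buffer += this.decoder.decode(chunk, { stream: true })
    const lines = this.buffer.split('\n')
    // The last element is either '' or an unterminated partial line.
    this.buffer = lines.pop() ?? ''
    return lines
  }

  // Call at end of stream; returns a trailing line that lacked '\n', if any.
  flush(): string[] {
    this.buffer += this.decoder.decode()
    const rest = this.buffer
    this.buffer = ''
    return rest.length > 0 ? [rest] : []
  }
}

const bytes = new TextEncoder().encode('{"a":1}\n{"b":"é"}\n{"c":3}')
const splitter = new LineSplitter()
// Cut between the two bytes of "é" (0xC3 0xA9) to exercise the streaming decoder.
const cut = bytes.indexOf(0xc3) + 1
const out = [
  ...splitter.push(bytes.slice(0, cut)),
  ...splitter.push(bytes.slice(cut)),
  ...splitter.flush(),
]
console.log(out.length) // 3
```

As in `readLines`, `flush()` still emits a final line that has no trailing newline. This matters if the CLI's last JSON line is not newline-terminated.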
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
@@ -0,0 +1,142 @@
+import { randomUUID } from 'node:crypto'
+import type { UIMessageStreamEvent } from '../../types'
+
+type JsonObject = Record<string, unknown>
+
+export class ClaudeCodeStreamParser {
+  private lastText: string | null = null
+  private toolCallCount = 0
+
+  pushLine(line: string): UIMessageStreamEvent[] {
+    const trimmed = line.trim()
+    if (!trimmed) return []
+
+    let parsed: unknown
+    try {
+      parsed = JSON.parse(trimmed)
+    } catch {
+      return []
+    }
+
+    if (!isObject(parsed)) return []
+
+    if (parsed.type === 'assistant') {
+      return this.parseAssistantMessage(parsed)
+    }
+    if (parsed.type === 'user') {
+      return this.parseUserMessage(parsed)
+    }
+    if (parsed.type === 'result' && typeof parsed.result === 'string') {
+      this.lastText = parsed.result
+    }
+
+    return []
+  }
+
+  getLastText(): string | null {
+    return this.lastText
+  }
+
+  getToolCallCount(): number {
+    return this.toolCallCount
+  }
+
+  private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
+    const content = contentBlocks(message)
+    const events: UIMessageStreamEvent[] = []
+
+    for (const block of content) {
+      if (block.type === 'text' && typeof block.text === 'string') {
+        const id = randomUUID()
+        this.lastText = block.text
+        events.push(
+          { type: 'text-start', id },
+          { type: 'text-delta', id, delta: block.text },
+          { type: 'text-end', id },
+        )
+      } else if (
+        block.type === 'tool_use' &&
+        typeof block.id === 'string' &&
+        typeof block.name === 'string'
+      ) {
+        this.toolCallCount++
+        events.push({
+          type: 'tool-input-available',
+          toolCallId: block.id,
+          toolName: block.name,
+          input: block.input,
+        })
+      }
+    }
+
+    return events
+  }
+
+  private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
+    const content = contentBlocks(message)
+    const events: UIMessageStreamEvent[] = []
+
+    for (const block of content) {
+      if (
+        block.type !== 'tool_result' ||
+        typeof block.tool_use_id !== 'string'
+      ) {
+        continue
+      }
+
+      if (block.is_error === true) {
+        events.push({
+          type: 'tool-output-error',
+          toolCallId: block.tool_use_id,
+          errorText: stringifyToolContent(block.content),
+        })
+      } else {
+        events.push({
+          type: 'tool-output-available',
+          toolCallId: block.tool_use_id,
+          output: normalizeToolContent(block.content),
+        })
+      }
+    }
+
+    return events
+  }
+}
+
+export function shouldCaptureScreenshotForTool(toolName: string): boolean {
+  if (!toolName.startsWith('mcp__browseros__')) return false
+  return !toolName.endsWith('__take_screenshot')
+}
+
+function contentBlocks(message: JsonObject): JsonObject[] {
+  const inner = isObject(message.message) ? message.message : message
+  return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
+}
+
+function isObject(value: unknown): value is JsonObject {
+  return typeof value === 'object' && value !== null
+}
+
+function normalizeToolContent(content: unknown): unknown {
+  if (!Array.isArray(content)) return content
+  return content.map((item) => {
+    if (
+      isObject(item) &&
+      item.type === 'text' &&
+      typeof item.text === 'string'
+    ) {
+      return item.text
+    }
+    return item
+  })
+}
+
+function stringifyToolContent(content: unknown): string {
+  const normalized = normalizeToolContent(content)
+  if (typeof normalized === 'string') return normalized
+  try {
+    return JSON.stringify(normalized)
+  } catch {
+    return String(normalized)
+  }
+}
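The sketch below shows what `ClaudeCodeStreamParser` extracts. It is a reduced, self-contained reducer that folds a few `stream-json` lines into the values the evaluator reads back: the last text, the tool-call count, and the tool errors. The `foldStreamJson` name is hypothetical. The sample lines are hand-written and contain only the fields the parser in this diff inspects (`type`, `message.content[]`, `tool_use`, `tool_result`, `is_error`, `result`). Real Claude Code output carries more fields.

```typescript
type Json = Record<string, unknown>

interface StreamSummary {
  lastText: string | null
  toolCalls: number
  toolErrors: string[]
}

const isObj = (v: unknown): v is Json => typeof v === 'object' && v !== null

// Hypothetical reducer over stream-json lines; follows the parser's branches.
function foldStreamJson(lines: string[]): StreamSummary {
  const summary: StreamSummary = { lastText: null, toolCalls: 0, toolErrors: [] }
  for (const line of lines) {
    let parsed: unknown
    try {
      parsed = JSON.parse(line.trim())
    } catch {
      continue // Non-JSON lines are skipped, as in pushLine().
    }
    if (!isObj(parsed)) continue
    // Content blocks may sit under parsed.message or directly on parsed.
    const inner = isObj(parsed.message) ? parsed.message : parsed
    const blocks = Array.isArray(inner.content) ? inner.content.filter(isObj) : []
    for (const block of blocks) {
      if (parsed.type === 'assistant' && block.type === 'text' && typeof block.text === 'string') {
        summary.lastText = block.text
      } else if (
        parsed.type === 'assistant' &&
        block.type === 'tool_use' &&
        typeof block.id === 'string' &&
        typeof block.name === 'string'
      ) {
        summary.toolCalls++
      } else if (parsed.type === 'user' && block.type === 'tool_result' && block.is_error === true) {
        summary.toolErrors.push(
          typeof block.content === 'string' ? block.content : JSON.stringify(block.content),
        )
      }
    }
    // The final 'result' line overrides any earlier assistant text.
    if (parsed.type === 'result' && typeof parsed.result === 'string') {
      summary.lastText = parsed.result
    }
  }
  return summary
}

const sample = [
  '{"type":"assistant","message":{"content":[{"type":"tool_use","id":"t1","name":"mcp__browseros__navigate_page","input":{"url":"https://example.com"}}]}}',
  '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t1","is_error":true,"content":"timeout"}]}}',
  '{"type":"assistant","message":{"content":[{"type":"text","text":"Retrying..."}]}}',
  'not json',
  '{"type":"result","result":"Example Domain"}',
]
const s = foldStreamJson(sample)
console.log(s.toolCalls, s.lastText)
```

Unlike the real parser, this sketch emits no UI events. It only shows which lines affect `getLastText()` and `getToolCallCount()`, and that a `result` line has the final say on the answer text.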
--- a/packages/browseros-agent/apps/eval/src/agents/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/index.ts
@@ -1,3 +1,4 @@
+import { ClaudeCodeEvaluator } from './claude-code'
 import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
 import { SingleAgentEvaluator } from './single-agent'
 import type { AgentContext, AgentEvaluator } from './types'
@@ -8,6 +9,8 @@ export function createAgent(context: AgentContext): AgentEvaluator {
      return new SingleAgentEvaluator(context)
    case 'orchestrator-executor':
      return new OrchestratorExecutorEvaluator(context)
+    case 'claude-code':
+      return new ClaudeCodeEvaluator(context)
  }
 }

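`createAgent` has no `default` branch. Under `strictNullChecks`, TypeScript accepts it only while the `switch` covers every member of the agent-type union. A common way to make that guarantee explicit is an `assertNever` helper. With it, adding a new agent type without a matching `case` fails to compile. An unexpected runtime value, such as one from an untyped config file, throws a clear error instead of returning `undefined`. The sketch below uses a hypothetical `AgentType` union: `'single-agent'` is a guessed literal, because that `case` is not shown in the diff. It returns stand-in labels instead of evaluator instances.

```typescript
// Hypothetical stand-in for the eval config's agent type union.
type AgentType = 'single-agent' | 'orchestrator-executor' | 'claude-code'

function assertNever(value: never): never {
  throw new Error(`Unhandled agent type: ${String(value)}`)
}

// Returns a label instead of an evaluator instance to stay self-contained.
function describeAgent(type: AgentType): string {
  switch (type) {
    case 'single-agent':
      return 'SingleAgentEvaluator'
    case 'orchestrator-executor':
      return 'OrchestratorExecutorEvaluator'
    case 'claude-code':
      return 'ClaudeCodeEvaluator'
    default:
      // `type` is narrowed to never here; a missing case becomes a compile error.
      return assertNever(type)
  }
}

console.log(describeAgent('claude-code')) // ClaudeCodeEvaluator
```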
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
@@ -1,113 +1,67 @@
 import { randomUUID } from 'node:crypto'
+import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
+import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
+import { sleep } from '../../../../utils/sleep'
+import type {
+  ExecutorConfig,
+  ExecutorResult,
+} from '../../../orchestrator-executor/types'
+import type { ExecutorCallbacks } from '../../executor-backend'
 import {
-  CLADO_REQUEST_TIMEOUT_MS,
-  MAX_ACTIONS_PER_DELEGATION,
-} from '../../constants'
-import { McpClient, type McpToolResult } from '../../utils/mcp-client'
-import { sleep } from '../../utils/sleep'
-import type { ExecutorCallbacks } from './executor'
-import type { ExecutorConfig, ExecutorResult } from './types'
+  extractCladoThinking,
+  formatCladoHistory,
+  getCladoActionSignature,
+  parseCladoActions,
+  summarizeCladoPrediction,
+} from './clado-actions'
+import {
+  normalizeCladoDirection,
+  normalizeCladoPressKey,
+  normalizeCladoScrollAmount,
+  prepareCladoToolArgs,
+  resolveCladoPoint,
+} from './clado-browser-driver'
+import { CladoActionClient } from './clado-client'
+import {
+  CLADO_ACTION_PROVIDER,
+  type CladoAction,
+  type CladoActionPoint,
+  type CladoActionResponse,
+  type CladoViewport,
+  isCladoActionProvider,
+} from './types'

-const CLADO_ACTION_PROVIDER = 'clado-action'
-const PAGE_SCOPED_TOOLS = new Set<string>([
-  'take_screenshot',
-  'evaluate_script',
-  'click',
-  'click_at',
-  'hover',
-  'hover_at',
-  'clear',
-  'fill',
-  'press_key',
-  'type_at',
-  'drag',
-  'drag_at',
-  'scroll',
-  'handle_dialog',
-  'select_option',
-  'navigate_page',
-  'close_page',
-  'wait_for',
-])
-
-interface CladoActionResponse {
-  action?: string
-  x?: number
-  y?: number
-  text?: string
-  key?: string
-  direction?: string
-  startX?: number
-  startY?: number
-  endX?: number
-  endY?: number
-  amount?: number
-  time?: number
-  inference_time_seconds?: number
-  raw_response?: string
-}
-
-interface Viewport {
-  width: number
-  height: number
-}
-
-interface CladoAction {
-  action: string
-  x?: number
-  y?: number
-  text?: string
-  key?: string
-  direction?: string
-  startX?: number
-  startY?: number
-  endX?: number
-  endY?: number
-  amount?: number
-  time?: number
-}
-
-type RawActionPayload = Partial<CladoAction>
-
-interface ActionPoint {
-  x: number
-  y: number
-}
+const MAX_CONSECUTIVE_PARSE_FAILURES = 3

 function asErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error)
 }

-function clampNormalized(value: number): number {
-  return Math.min(999, Math.max(0, Math.round(value)))
-}
-
-function isCladoProvider(provider: string): boolean {
-  return provider === CLADO_ACTION_PROVIDER
-}
-
 export class CladoActionExecutor {
  private readonly mcpClient: McpClient
+  private readonly cladoClient: CladoActionClient
  private readonly pageId: number
  private callbacks: ExecutorCallbacks = {}
  private stepsUsed = 0
-  private viewport: Viewport | null = null
-  private lastPoint: ActionPoint | null = null
+  private viewport: CladoViewport | null = null
+  private lastPoint: CladoActionPoint | null = null
  private currentUrl = ''

  constructor(
-    private readonly config: ExecutorConfig,
+    config: ExecutorConfig,
    serverUrl: string,
-    readonly _windowId?: number,
-    readonly _tabId?: number,
    initialPageId?: number,
  ) {
-    if (!isCladoProvider(config.provider)) {
+    if (!isCladoActionProvider(config.provider)) {
      throw new Error(
        `CladoActionExecutor requires provider="${CLADO_ACTION_PROVIDER}"`,
      )
    }
    this.mcpClient = new McpClient(`${serverUrl}/mcp`)
+    this.cladoClient = new CladoActionClient({
+      baseUrl: config.baseUrl,
+      apiKey: config.apiKey,
+    })
    this.pageId = initialPageId ?? 1
  }

@@ -135,6 +89,8 @@ export class CladoActionExecutor {
    const actionHistory: CladoAction[] = []
    let predictionCalls = 0
    const thinkingTrace: string[] = []
+    let consecutiveParseFailures = 0
+    let finalAnswer: string | undefined

    let status: ExecutorResult['status'] = 'done'
    let reason = 'Goal executed.'
@@ -155,7 +111,7 @@ export class CladoActionExecutor {
        break
      }

-      const historyForPrediction = this.formatHistory(actionHistory)
+      const historyForPrediction = formatCladoHistory(actionHistory)
      const actionToolCallId = randomUUID()
      const predictionInput = {
        instruction,
@@ -177,7 +133,7 @@ export class CladoActionExecutor {
          signal,
        )
        predictionCalls++
-        const thinking = this.extractThinking(prediction.raw_response)
+        const thinking = extractCladoThinking(prediction.raw_response)
        if (thinking) {
          const previous = thinkingTrace[thinkingTrace.length - 1]
          if (previous !== thinking) {
@@ -207,8 +163,19 @@ export class CladoActionExecutor {
        break
      }

-      const predictedActions = this.parseActions(prediction)
+      const predictedActions = parseCladoActions(prediction)
      if (predictedActions.length === 0) {
+        // Per Clado contract: HTTP 200 with action=null on parse failure.
+        // Count as an invalid step so the model can self-correct on the
+        // next call instead of dropping the trajectory.
+        consecutiveParseFailures++
+        const parseError =
+          prediction.parse_error ?? 'no parsable <answer> in raw_response'
+        actionHistory.push({
+          action: 'invalid',
+          text: `parse_error: ${parseError}`,
+        })
+        this.stepsUsed++
        await this.callbacks.onStepFinish?.({
          toolCalls: [
            {
@@ -222,16 +189,23 @@ export class CladoActionExecutor {
              toolCallId: actionToolCallId,
              toolName: 'clado_action_predict',
              output: {
-                prediction: this.summarizePrediction(prediction),
+                prediction: summarizeCladoPrediction(prediction),
                parsedActions: [],
+                parseError,
+                consecutiveParseFailures,
              },
            },
          ],
        })
-        status = 'blocked'
-        reason = 'Clado action response did not contain a valid action.'
-        break
+
+        if (consecutiveParseFailures >= MAX_CONSECUTIVE_PARSE_FAILURES) {
+          status = 'blocked'
+          reason = `Clado returned ${consecutiveParseFailures} consecutive unparseable responses.`
+          break
+        }
+        continue
      }
+      consecutiveParseFailures = 0

      let requestedStop = false
      const executionNotes: string[] = []
@@ -257,7 +231,7 @@ export class CladoActionExecutor {
                toolCallId: actionToolCallId,
                toolName: 'clado_action_predict',
                output: {
-                  prediction: this.summarizePrediction(prediction),
+                  prediction: summarizeCladoPrediction(prediction),
                  parsedActions: predictedActions,
                  executed: executionNotes,
                },
@@ -272,7 +246,12 @@ export class CladoActionExecutor {

        actionHistory.push(predictedAction)
        if (predictedAction.action === 'end') {
-          reason = 'Model requested end() and marked task complete.'
+          if (predictedAction.final_answer) {
+            finalAnswer = predictedAction.final_answer
+            reason = `Model requested end() with final_answer: ${predictedAction.final_answer.slice(0, 240)}`
+          } else {
+            reason = 'Model requested end() and marked task complete.'
+          }
          requestedStop = true
          break
        }
@@ -293,7 +272,7 @@ export class CladoActionExecutor {
              toolCallId: actionToolCallId,
              toolName: 'clado_action_predict',
              output: {
-                prediction: this.summarizePrediction(prediction),
+                prediction: summarizeCladoPrediction(prediction),
                parsedActions: predictedActions,
                executed: executionNotes,
              },
@@ -327,6 +306,7 @@ export class CladoActionExecutor {
      actions: actionHistory,
      url: this.currentUrl,
      thinkingTrace,
+      finalAnswer,
    })

    return {
@@ -344,121 +324,12 @@ export class CladoActionExecutor {
    actionHistory: CladoAction[],
    signal?: AbortSignal,
  ): Promise<CladoActionResponse> {
-    if (!this.config.baseUrl) {
-      throw new Error('executor.baseUrl must be set for clado-action provider')
-    }
-
-    const requestController = new AbortController()
-    const onAbort = () => requestController.abort()
-    signal?.addEventListener('abort', onAbort, { once: true })
-
-    const timeoutHandle = setTimeout(() => {
-      requestController.abort()
-    }, CLADO_REQUEST_TIMEOUT_MS)
-
-    try {
-      const headers: Record<string, string> = {
-        'Content-Type': 'application/json',
-      }
-      if (this.config.apiKey) {
-        headers.Authorization = `Bearer ${this.config.apiKey}`
-      }
-
-      const response = await fetch(this.config.baseUrl, {
-        method: 'POST',
-        headers,
-        body: JSON.stringify({
-          instruction,
-          image_base64: imageBase64,
-          history: this.formatHistory(actionHistory),
-        }),
-        signal: requestController.signal,
-      })
-
-      if (!response.ok) {
-        const body = await response.text()
-        throw new Error(
-          `HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
-        )
-      }
-
-      return (await response.json()) as CladoActionResponse
-    } finally {
-      clearTimeout(timeoutHandle)
-      signal?.removeEventListener('abort', onAbort)
-    }
-  }
-
-  private parseActions(prediction: CladoActionResponse): CladoAction[] {
-    const actionFromField =
-      typeof prediction.action === 'string' ? prediction.action : null
-
-    const rawActions = this.parseActionsFromRawResponse(prediction.raw_response)
-    const primaryFromRaw = rawActions[0] ?? null
-    const mergedPrimary = {
-      ...primaryFromRaw,
-      ...prediction,
-      action: actionFromField ?? primaryFromRaw?.action,
-    }
-
-    const normalized: CladoAction[] = []
-    const primary = this.normalizeActionPayload(mergedPrimary)
-    if (primary) normalized.push(primary)
-
-    for (const candidate of rawActions.slice(1)) {
-      const parsed = this.normalizeActionPayload(candidate)
-      if (!parsed) continue
-      const prev = normalized[normalized.length - 1]
-      if (
-        !prev ||
-        this.getActionSignature(prev) !== this.getActionSignature(parsed)
-      ) {
-        normalized.push(parsed)
-      }
-    }
-
-    return normalized
-  }
-
-  private normalizeActionPayload(
-    payload: RawActionPayload,
-  ): CladoAction | null {
-    if (!payload.action || typeof payload.action !== 'string') {
-      return null
-    }
-    return {
-      action: payload.action,
-      x: typeof payload.x === 'number' ? payload.x : undefined,
-      y: typeof payload.y === 'number' ? payload.y : undefined,
-      text: typeof payload.text === 'string' ? payload.text : undefined,
-      key: typeof payload.key === 'string' ? payload.key : undefined,
-      direction:
-        typeof payload.direction === 'string' ? payload.direction : undefined,
-      startX: typeof payload.startX === 'number' ? payload.startX : undefined,
-      startY: typeof payload.startY === 'number' ? payload.startY : undefined,
-      endX: typeof payload.endX === 'number' ? payload.endX : undefined,
-      endY: typeof payload.endY === 'number' ? payload.endY : undefined,
-      amount: typeof payload.amount === 'number' ? payload.amount : undefined,
-      time: typeof payload.time === 'number' ? payload.time : undefined,
-    }
-  }
-
-  private parseActionsFromRawResponse(
-    rawResponse: string | undefined,
-  ): RawActionPayload[] {
-    if (!rawResponse) return []
-    const matches = [
-      ...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
-    ]
-    const parsed: RawActionPayload[] = []
-    for (const match of matches) {
-      try {
-        parsed.push(JSON.parse(match[1]) as RawActionPayload)
-      } catch {
-        // ignore malformed answer blocks
-      }
-    }
-    return parsed
+    return this.cladoClient.requestActionPrediction({
+      instruction,
+      imageBase64,
+      actionHistory,
+      signal,
+    })
  }

  private async executeAction(
@@ -529,14 +400,14 @@ export class CladoActionExecutor {
      }

      case 'press_key': {
-        const key = this.normalizePressKey(action.key)
+        const key = normalizeCladoPressKey(action.key)
        await this.runTool('press_key', { key }, signal)
        return `Pressed key "${key}".`
      }

      case 'scroll': {
-        const direction = this.normalizeDirection(action.direction)
-        const amountPx = this.normalizeScrollAmount(action.amount)
+        const direction = normalizeCladoDirection(action.direction)
+        const amountPx = normalizeCladoScrollAmount(action.amount)
        const ticks = Math.max(1, Math.round(amountPx / 120))

        await this.runTool('scroll', { direction, amount: ticks }, signal)
@@ -578,7 +449,9 @@ export class CladoActionExecutor {
      }

      case 'end': {
-        return 'Model requested end().'
+        return action.final_answer
+          ? `Model requested end() with final_answer: ${action.final_answer.slice(0, 240)}`
+          : 'Model requested end().'
      }

      default: {
@@ -588,9 +461,10 @@ export class CladoActionExecutor {
  }

  private async captureScreenshotBase64(signal?: AbortSignal): Promise<string> {
+    // The Clado contract accepts PNG or JPEG; PNG keeps the input lossless.
    const result = await this.runTool(
      'take_screenshot',
-      { format: 'webp', quality: 80 },
+      { format: 'png' },
      signal,
    )

@@ -604,7 +478,7 @@ export class CladoActionExecutor {
    return image.data
  }

-  private async getViewport(signal?: AbortSignal): Promise<Viewport> {
+  private async getViewport(signal?: AbortSignal): Promise<CladoViewport> {
    if (this.viewport) return this.viewport

    try {
@@ -635,15 +509,9 @@ export class CladoActionExecutor {
    normalizedX: number | undefined,
    normalizedY: number | undefined,
    signal?: AbortSignal,
-  ): Promise<ActionPoint> {
+  ): Promise<CladoActionPoint> {
    const viewport = await this.getViewport(signal)
-    const nx = clampNormalized(normalizedX ?? 500)
-    const ny = clampNormalized(normalizedY ?? 500)
-
-    return {
-      x: Math.round((nx / 1000) * viewport.width),
-      y: Math.round((ny / 1000) * viewport.height),
-    }
+    return resolveCladoPoint(viewport, normalizedX, normalizedY)
  }

  private async getCurrentUrl(signal?: AbortSignal): Promise<string> {
@@ -670,7 +538,7 @@ export class CladoActionExecutor {
      throw new Error('aborted')
    }

-    const toolArgs = this.prepareToolArgs(toolName, args)
+    const toolArgs = prepareCladoToolArgs(toolName, args, this.pageId)

    try {
      const raw = await this.mcpClient.callTool(toolName, toolArgs)
@@ -689,211 +557,22 @@ export class CladoActionExecutor {
    }
  }

-  private prepareToolArgs(
-    toolName: string,
-    args: Record<string, unknown>,
-  ): Record<string, unknown> {
-    const prepared: Record<string, unknown> = { ...args }
-
-    if (
-      toolName === 'evaluate_script' &&
-      typeof prepared.function === 'string' &&
-      prepared.expression === undefined
-    ) {
-      prepared.expression = this.toEvaluateExpression(prepared.function)
-      delete prepared.function
-    }
-
-    if (
-      toolName === 'click_at' &&
-      typeof prepared.dblClick === 'boolean' &&
-      prepared.clickCount === undefined
-    ) {
-      prepared.clickCount = prepared.dblClick ? 2 : 1
-      delete prepared.dblClick
-    }
-
-    // Use fixed page ID for all page-scoped tools (single-page operation)
-    if (PAGE_SCOPED_TOOLS.has(toolName) && typeof prepared.page !== 'number') {
-      prepared.page = this.pageId
-    }
-
-    return prepared
-  }
-
-  private toEvaluateExpression(rawFunction: unknown): string {
-    const source = String(rawFunction).trim()
-    if (source.startsWith('() =>') || source.startsWith('async () =>')) {
-      return `(${source})()`
-    }
-    if (source.startsWith('function')) {
-      return `(${source})()`
-    }
-    return source
-  }
-
-  private normalizePressKey(key: string | undefined): string {
-    const raw = (key ?? '').trim()
-    if (!raw) throw new Error('press_key action missing key field')
-
-    const map: Record<string, string> = {
-      'C-a': 'Control+A',
-      'C-c': 'Control+C',
-      'C-v': 'Control+V',
-      'C-x': 'Control+X',
-      'C-z': 'Control+Z',
-      'C-y': 'Control+Y',
-      'C-s': 'Control+S',
-      'C-t': 'Control+T',
-      'C-w': 'Control+W',
-      'C-h': 'Control+H',
-      'C-f': 'Control+F',
-      'C-+': 'Control++',
-      'C--': 'Control+-',
-      'C-tab': 'Control+Tab',
-      'C-S-tab': 'Control+Shift+Tab',
-      'C-S-n': 'Control+Shift+N',
-      'C-down': 'Control+ArrowDown',
-      'M-f4': 'Alt+F4',
-    }
-    return map[raw] ?? raw
-  }
-
-  private normalizeDirection(
-    direction: string | undefined,
-  ): 'up' | 'down' | 'left' | 'right' {
-    if (
-      direction === 'up' ||
-      direction === 'down' ||
-      direction === 'left' ||
-      direction === 'right'
-    ) {
-      return direction
-    }
-    return 'down'
-  }
-
-  private normalizeScrollAmount(amount: number | undefined): number {
-    if (typeof amount !== 'number') return 500
-    if (amount <= 0) return 100
-    const clamped = Math.min(amount, 1000)
-    return Math.max(100, Math.round((clamped / 1000) * 900))
-  }
-
-  private summarizePrediction(
-    prediction: CladoActionResponse,
-  ): Record<string, unknown> {
-    const preview =
-      typeof prediction.raw_response === 'string' &&
-      prediction.raw_response.length > 0
-        ? prediction.raw_response.slice(0, 240)
-        : undefined
-
-    return {
-      action: prediction.action,
-      x: prediction.x,
-      y: prediction.y,
-      text: prediction.text,
-      key: prediction.key,
-      direction: prediction.direction,
-      startX: prediction.startX,
-      startY: prediction.startY,
-      endX: prediction.endX,
-      endY: prediction.endY,
-      amount: prediction.amount,
-      time: prediction.time,
-      inference_time_seconds: prediction.inference_time_seconds,
-      raw_response_preview: preview,
-    }
-  }
-
-  private extractThinking(rawResponse: string | undefined): string | undefined {
-    if (!rawResponse) return undefined
-    const matches = [
-      ...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
-    ]
-    if (matches.length === 0) return undefined
-
-    const merged = matches
-      .map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
-      .filter((value) => value.length > 0)
-      .join(' ')
-
-    if (!merged) return undefined
-    return merged
-  }
-
-  private getActionSignature(action: CladoAction): string {
-    switch (action.action) {
-      case 'click':
-      case 'double_click':
-      case 'right_click':
-      case 'hover':
-        return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
-      case 'type':
-        return `${action.action}:${(action.text ?? '').slice(0, 16)}`
-      case 'press_key':
-        return `${action.action}:${action.key ?? 'key'}`
-      case 'scroll':
-        return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
-      case 'drag':
-        return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
-      case 'wait':
-        return `${action.action}:${action.time ?? 1}`
-      case 'end':
-        return 'end()'
-      default:
-        return action.action
-    }
-  }
-
-  private formatHistory(actions: CladoAction[]): string {
-    if (actions.length === 0) return 'None'
-
-    const parts = actions.map((action) => {
-      switch (action.action) {
-        case 'click':
-        case 'double_click':
-        case 'right_click':
-        case 'hover':
-          return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
-        case 'type': {
-          const text = (action.text ?? '').replace(/'/g, "\\'")
-          return `type('${text}')`
-        }
-        case 'press_key':
-          return `press_key('${action.key ?? 'Enter'}')`
-        case 'scroll':
-          return `scroll(${action.direction ?? 'down'})`
-        case 'drag':
-          return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
-        case 'wait':
-          return `wait(${Math.round(action.time ?? 1)}s)`
-        case 'end':
-          return 'end()'
-        default:
-          return action.action
-      }
-    })
-
-    return parts.join(' -> ')
-  }
-
  private buildObservation(params: {
    status: ExecutorResult['status']
    reason: string
    actions: CladoAction[]
    url: string
    thinkingTrace: string[]
+    finalAnswer?: string
  }): string {
-    const { status, reason, actions, url, thinkingTrace } = params
+    const { status, reason, actions, url, thinkingTrace, finalAnswer } = params
    const actionSummary =
      actions.length === 0
        ? 'No actions were executed.'
        : actions
            .slice(-5)
            .map(
-              (action, idx) => `${idx + 1}. ${this.getActionSignature(action)}`,
+              (action, idx) => `${idx + 1}. ${getCladoActionSignature(action)}`,
            )
            .join('\n')
    const thinkingSummary =
@@ -907,6 +586,7 @@ export class CladoActionExecutor {
      `Status: ${status}`,
      `Reason: ${reason}`,
      `URL: ${url || 'unknown'}`,
+      ...(finalAnswer ? [`Final answer: ${finalAnswer}`] : []),
      '',
      'Recent actions:',
      actionSummary,
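The executor now tolerates up to `MAX_CONSECUTIVE_PARSE_FAILURES` unparseable Clado responses in a row. Each failure is recorded as an `invalid` history entry and consumes a step. Any parsable response resets the counter, and only the third consecutive failure marks the run `blocked`. The control flow can be sketched in isolation with a hypothetical `runWithParseBudget` function. A scripted list of outcomes stands in for real predictions. Screenshots, tool execution, and step limits are omitted, and counting one step per parsed action is an assumption.

```typescript
// Mirrors the constant introduced in clado-action-executor.ts.
const MAX_CONSECUTIVE_PARSE_FAILURES = 3

interface BudgetResult {
  status: 'done' | 'blocked'
  stepsUsed: number
  history: string[]
}

// Hypothetical driver: each outcome is a parsed action name, or null for a
// response with no parsable action.
function runWithParseBudget(outcomes: (string | null)[]): BudgetResult {
  const history: string[] = []
  let consecutiveParseFailures = 0
  let stepsUsed = 0

  for (const outcome of outcomes) {
    if (outcome === null) {
      // Record the failure so the next prediction sees it in its history.
      consecutiveParseFailures++
      history.push('invalid')
      stepsUsed++
      if (consecutiveParseFailures >= MAX_CONSECUTIVE_PARSE_FAILURES) {
        return { status: 'blocked', stepsUsed, history }
      }
      continue
    }
    // Any parsable response resets the failure budget.
    consecutiveParseFailures = 0
    history.push(outcome)
    stepsUsed++
    if (outcome === 'end') break
  }
  return { status: 'done', stepsUsed, history }
}

// Two failures, a success, two more failures, then end(): never blocked.
console.log(runWithParseBudget([null, null, 'click', null, null, 'end']).status) // done
// Three failures in a row after one action: blocked.
console.log(runWithParseBudget(['click', null, null, null, 'end']).status) // blocked
```

Resetting the counter on success means the budget limits runs of failures, not total failures. A model that alternates good and bad responses keeps going until the step limit stops it.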
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
@@ -0,0 +1,191 @@
+import type {
+  CladoAction,
+  CladoActionResponse,
+  RawCladoActionPayload,
+} from './types'
+
+/** Parses Clado's structured response plus any raw `<answer>` blocks into executable actions. */
+export function parseCladoActions(
+  prediction: CladoActionResponse,
+): CladoAction[] {
+  const actionFromField =
+    typeof prediction.action === 'string' ? prediction.action : null
+
+  const rawActions = parseCladoActionsFromRawResponse(prediction.raw_response)
+  const primaryFromRaw = rawActions[0] ?? null
+  const mergedPrimary = {
+    ...primaryFromRaw,
+    ...prediction,
+    action: actionFromField ?? primaryFromRaw?.action,
+  }
+
+  const normalized: CladoAction[] = []
+  const primary = normalizeCladoActionPayload(mergedPrimary)
+  if (primary) normalized.push(primary)
+
+  for (const candidate of rawActions.slice(1)) {
+    const parsed = normalizeCladoActionPayload(candidate)
+    if (!parsed) continue
+    const prev = normalized[normalized.length - 1]
+    if (
+      !prev ||
+      getCladoActionSignature(prev) !== getCladoActionSignature(parsed)
+    ) {
+      normalized.push(parsed)
+    }
+  }
+
+  return normalized
+}
+
+export function normalizeCladoActionPayload(
+  payload: RawCladoActionPayload,
+): CladoAction | null {
+  if (!payload.action || typeof payload.action !== 'string') {
+    return null
+  }
+  return {
+    action: payload.action,
+    x: typeof payload.x === 'number' ? payload.x : undefined,
+    y: typeof payload.y === 'number' ? payload.y : undefined,
+    text: typeof payload.text === 'string' ? payload.text : undefined,
+    key: typeof payload.key === 'string' ? payload.key : undefined,
+    direction:
+      typeof payload.direction === 'string' ? payload.direction : undefined,
+    startX: typeof payload.startX === 'number' ? payload.startX : undefined,
+    startY: typeof payload.startY === 'number' ? payload.startY : undefined,
+    endX: typeof payload.endX === 'number' ? payload.endX : undefined,
+    endY: typeof payload.endY === 'number' ? payload.endY : undefined,
+    amount: typeof payload.amount === 'number' ? payload.amount : undefined,
+    time: typeof payload.time === 'number' ? payload.time : undefined,
+    final_answer:
+      typeof payload.final_answer === 'string'
+        ? payload.final_answer
+        : undefined,
+  }
+}
+
+export function parseCladoActionsFromRawResponse(
+  rawResponse: string | undefined,
+): RawCladoActionPayload[] {
+  if (!rawResponse) return []
+  const matches = [
+    ...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
+  ]
+  const parsed: RawCladoActionPayload[] = []
+  for (const match of matches) {
+    try {
+      parsed.push(JSON.parse(match[1]) as RawCladoActionPayload)
+    } catch {
+      // Ignore malformed answer blocks so one bad block does not drop the whole prediction.
+    }
+  }
+  return parsed
+}
+
+export function extractCladoThinking(
+  rawResponse: string | undefined,
+): string | undefined {
+  if (!rawResponse) return undefined
+  const matches = [
+    ...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
+  ]
+  if (matches.length === 0) return undefined
+
+  const merged = matches
+    .map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
+    .filter((value) => value.length > 0)
+    .join(' ')
+
+  if (!merged) return undefined
+  return merged
+}
+
+export function summarizeCladoPrediction(
+  prediction: CladoActionResponse,
+): Record<string, unknown> {
+  const preview =
+    typeof prediction.raw_response === 'string' &&
+    prediction.raw_response.length > 0
+      ? prediction.raw_response.slice(0, 240)
+      : undefined
+
+  return {
+    action: prediction.action,
+    x: prediction.x,
+    y: prediction.y,
+    text: prediction.text,
+    key: prediction.key,
+    direction: prediction.direction,
+    startX: prediction.startX,
+    startY: prediction.startY,
+    endX: prediction.endX,
+    endY: prediction.endY,
+    amount: prediction.amount,
+    time: prediction.time,
+    inference_time_seconds: prediction.inference_time_seconds,
+    raw_response_preview: preview,
+  }
+}
+
+export function getCladoActionSignature(action: CladoAction): string {
+  switch (action.action) {
+    case 'click':
+    case 'double_click':
+    case 'right_click':
+    case 'hover':
+      return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
+    case 'type':
+      return `${action.action}:${(action.text ?? '').slice(0, 16)}`
+    case 'press_key':
+      return `${action.action}:${action.key ?? 'key'}`
+    case 'scroll':
+      return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
+    case 'drag':
+      return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
+    case 'wait':
+      return `${action.action}:${action.time ?? 1}`
+    case 'end':
+      return action.final_answer
+        ? `end(${action.final_answer.slice(0, 32)})`
+        : 'end()'
+    case 'invalid':
+      return `invalid(${(action.text ?? '').slice(0, 40)})`
+    default:
+      return action.action
+  }
+}
+
+export function formatCladoHistory(actions: CladoAction[]): string {
+  if (actions.length === 0) return 'None'
+
+  const parts = actions.map((action) => {
+    switch (action.action) {
+      case 'click':
+      case 'double_click':
+      case 'right_click':
+      case 'hover':
+        return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
+      case 'type': {
+        const text = (action.text ?? '').replace(/'/g, "\\'")
+        return `type('${text}')`
+      }
+      case 'press_key':
+        return `press_key('${action.key ?? 'Enter'}')`
+      case 'scroll':
+        return `scroll(${action.direction ?? 'down'})`
+      case 'drag':
+        return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
+      case 'wait':
+        return `wait(${Math.round(action.time ?? 1)}s)`
+      case 'end':
+        return 'end()'
+      case 'invalid':
+        return 'invalid()'
+      default:
+        return action.action
+    }
+  })
+
+  return parts.join(' -> ')
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
@@ -0,0 +1,123 @@
+import {
+  CLADO_PAGE_SCOPED_TOOLS,
+  type CladoActionPoint,
+  type CladoViewport,
+} from './types'
+
+export function clampCladoNormalizedCoordinate(value: number): number {
+  return Math.min(999, Math.max(0, Math.round(value)))
+}
+
+/** Converts Clado's 0-1000 normalized coordinate space into BrowserOS viewport pixels. */
+export function resolveCladoPoint(
+  viewport: CladoViewport,
+  normalizedX: number | undefined,
+  normalizedY: number | undefined,
+): CladoActionPoint {
+  const nx = clampCladoNormalizedCoordinate(normalizedX ?? 500)
+  const ny = clampCladoNormalizedCoordinate(normalizedY ?? 500)
+
+  return {
+    x: Math.round((nx / 1000) * viewport.width),
+    y: Math.round((ny / 1000) * viewport.height),
+  }
+}
+
+/** Adapts Clado action tool arguments to the BrowserOS MCP tool argument contract. */
+export function prepareCladoToolArgs(
+  toolName: string,
+  args: Record<string, unknown>,
+  pageId: number,
+): Record<string, unknown> {
+  const prepared: Record<string, unknown> = { ...args }
+
+  if (
+    toolName === 'evaluate_script' &&
+    typeof prepared.function === 'string' &&
+    prepared.expression === undefined
+  ) {
+    prepared.expression = toCladoEvaluateExpression(prepared.function)
+    delete prepared.function
+  }
+
+  if (
+    toolName === 'click_at' &&
+    typeof prepared.dblClick === 'boolean' &&
+    prepared.clickCount === undefined
+  ) {
+    prepared.clickCount = prepared.dblClick ? 2 : 1
+    delete prepared.dblClick
+  }
+
+  if (
+    CLADO_PAGE_SCOPED_TOOLS.has(toolName) &&
+    typeof prepared.page !== 'number'
+  ) {
+    prepared.page = pageId
+  }
+
+  return prepared
+}
+
+export function toCladoEvaluateExpression(rawFunction: unknown): string {
+  const source = String(rawFunction).trim()
+  if (source.startsWith('() =>') || source.startsWith('async () =>')) {
+    return `(${source})()`
+  }
+  if (source.startsWith('function')) {
+    return `(${source})()`
+  }
+  return source
+}
+
+export function normalizeCladoPressKey(key: string | undefined): string {
+  const raw = (key ?? '').trim()
+  if (!raw) throw new Error('press_key action missing key field')
+
+  const map: Record<string, string> = {
+    'C-a': 'Control+A',
+    'C-c': 'Control+C',
+    'C-v': 'Control+V',
+    'C-x': 'Control+X',
+    'C-z': 'Control+Z',
+    'C-y': 'Control+Y',
+    'C-s': 'Control+S',
+    'C-t': 'Control+T',
+    'C-w': 'Control+W',
+    'C-h': 'Control+H',
+    'C-f': 'Control+F',
+    'C-+': 'Control++',
+    'C--': 'Control+-',
+    'C-tab': 'Control+Tab',
+    'C-S-tab': 'Control+Shift+Tab',
+    'C-S-n': 'Control+Shift+N',
+    'C-down': 'Control+ArrowDown',
+    'M-a': 'Meta+A',
+    'M-c': 'Meta+C',
+    'M-v': 'Meta+V',
+    'M-x': 'Meta+X',
+    'M-f4': 'Alt+F4',
+  }
+  return map[raw] ?? raw
+}
+
+export function normalizeCladoDirection(
+  direction: string | undefined,
+): 'up' | 'down' | 'left' | 'right' {
+  if (
+    direction === 'up' ||
+    direction === 'down' ||
+    direction === 'left' ||
+    direction === 'right'
+  ) {
+    return direction
+  }
+  return 'down'
+}
+
+export function normalizeCladoScrollAmount(amount: number | undefined): number {
+  if (typeof amount !== 'number') return 500
+  if (amount <= 0) return 100
+  const clamped = Math.min(amount, 1000)
+  return Math.max(100, Math.round((clamped / 1000) * 900))
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
@@ -0,0 +1,68 @@
+import { CLADO_REQUEST_TIMEOUT_MS } from '../../../../constants'
+import { formatCladoHistory } from './clado-actions'
+import type { CladoAction, CladoActionResponse } from './types'
+
+export interface CladoActionClientOptions {
+  baseUrl?: string
+  apiKey?: string
+}
+
+export interface CladoActionPredictionInput {
+  instruction: string
+  imageBase64: string
+  actionHistory: CladoAction[]
+  signal?: AbortSignal
+}
+
+/** Calls the Clado action model without exposing credentials in process arguments or artifacts. */
+export class CladoActionClient {
+  constructor(private readonly options: CladoActionClientOptions) {}
+
+  async requestActionPrediction(
+    input: CladoActionPredictionInput,
+  ): Promise<CladoActionResponse> {
+    if (!this.options.baseUrl) {
+      throw new Error('executor.baseUrl must be set for clado-action provider')
+    }
+
+    const requestController = new AbortController()
+    const onAbort = () => requestController.abort()
+    input.signal?.addEventListener('abort', onAbort, { once: true })
+
+    const timeoutHandle = setTimeout(() => {
+      requestController.abort()
+    }, CLADO_REQUEST_TIMEOUT_MS)
+
+    try {
+      const headers: Record<string, string> = {
+        'Content-Type': 'application/json',
+      }
+      if (this.options.apiKey) {
+        headers.Authorization = `Bearer ${this.options.apiKey}`
+      }
+
+      const response = await fetch(this.options.baseUrl, {
+        method: 'POST',
+        headers,
+        body: JSON.stringify({
+          instruction: input.instruction,
+          image_base64: input.imageBase64,
+          history: formatCladoHistory(input.actionHistory),
+        }),
+        signal: requestController.signal,
+      })
+
+      if (!response.ok) {
+        const body = await response.text()
+        throw new Error(
+          `HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
+        )
+      }
+
+      return (await response.json()) as CladoActionResponse
+    } finally {
+      clearTimeout(timeoutHandle)
+      input.signal?.removeEventListener('abort', onAbort)
+    }
+  }
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
@@ -0,0 +1,56 @@
+import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
+import type {
+  DelegationResult,
+  ExecutorBackend,
+  ExecutorCallbacks,
+} from '../../executor-backend'
+import { CladoActionExecutor } from './clado-action-executor'
+
+export interface CladoExecutorBackendOptions {
+  configTemplate: ResolvedAgentConfig
+  serverUrl: string
+  initialPageId?: number
+  callbacks?: ExecutorCallbacks
+}
+
+/** Executes delegated goals through the Clado visual action model. */
+export class CladoExecutorBackend implements ExecutorBackend {
+  readonly kind = 'clado'
+  private executor: CladoActionExecutor | null = null
+
+  constructor(private readonly options: CladoExecutorBackendOptions) {}
+
+  async execute(
+    instruction: string,
+    signal?: AbortSignal,
+  ): Promise<DelegationResult> {
+    const executor = this.getExecutor()
+    const result = await executor.execute(instruction, signal)
+    return result
+  }
+
+  async close(): Promise<void> {
+    await this.executor?.close()
+  }
+
+  getTotalSteps(): number {
+    return this.executor?.getTotalSteps() ?? 0
+  }
+
+  private getExecutor(): CladoActionExecutor {
+    if (this.executor) return this.executor
+
+    this.executor = new CladoActionExecutor(
+      {
+        provider: this.options.configTemplate.provider,
+        model: this.options.configTemplate.model,
+        apiKey: this.options.configTemplate.apiKey ?? '',
+        baseUrl: this.options.configTemplate.baseUrl,
+      },
+      this.options.serverUrl,
+      this.options.initialPageId,
+    )
+    this.executor.setCallbacks(this.options.callbacks ?? {})
+    return this.executor
+  }
+}
--- a/Show More
+++ b/Show More
				`@@ -0,0 +1 @@`
				{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}