chore(eval): drop the 60-char truncation on grader expected/actual values

Some criteria check long strings (job descriptions, post bodies, etc.), and truncating to 60 chars hides exactly the bytes you need to diff. The viewer's reasoning area already sets max-height, scroll, and word-break, so long content scrolls and nothing renders worse for being full-length.
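
A minimal sketch of why the 60-char cap gets in the way, assuming the old message clipped each value before printing. The grader's real formatting code is not part of this diff, so `truncate`, `firstDiffIndex`, and the sample strings are hypothetical names used only for illustration.

```typescript
// Hypothetical helpers; the grader's actual formatting code is not shown in this diff.
const LIMIT = 60;

// Assumed old behavior: clip a value to LIMIT characters before printing it.
function truncate(value: string, limit: number = LIMIT): string {
  return value.length > limit ? value.slice(0, limit) + "..." : value;
}

// Index of the first differing character, or -1 when the strings are identical.
function firstDiffIndex(a: string, b: string): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : shared;
}

// Two long values that agree on their first 70 characters and differ only after that.
const common = "Job description: ".padEnd(70, ".");
const expected = common + "remote";
const actual = common + "onsite";

// Clipped, the two values print identically, so the message cannot show the mismatch.
console.log(truncate(expected) === truncate(actual)); // true

// At full length, the first differing byte is visible (index 70, past the old cap).
console.log(firstDiffIndex(expected, actual)); // 70
```

Any mismatch past the cap looks like this in a clipped message: expected and actual render as the same string while the criterion still fails.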
chore(eval): show every criterion in agisdk grader message, not just failures
2026-05-14 16:14:28 +00:00 · 2026-04-30 02:08:30 +05:30 · 2026-04-30 02:08:07 +05:30 · 2026-04-30 02:06:51 +05:30 · 2026-04-30 01:16:20 +05:30 · 2026-04-30 00:37:45 +05:30
368 changed files with 8310 additions and 23245 deletions
--- a/.claude/skills/ask-internal/SKILL.md
+++ b/.claude/skills/ask-internal/SKILL.md
@@ -1,152 +0,0 @@
---
-name: ask-internal
-description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
-allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---
-
-# Ask Internal
-
-Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
-
-**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
-
-## When to use
-
- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
-
-## Hard rules — never do these
-
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.
-
-## Voice rules
-
-Apply the same voice rules as `document-internal` to the synthesized answer:
-
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.
-
-## Workflow
-
-### Step 0: Pre-flight
-
-```bash
-if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
-  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
-  exit 0
-fi
-[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
-  echo ".internal-docs/ missing or empty. Submodule not configured?"
-  exit 0
-}
-```
-
-### Step 1: Parse the question
-
-Pull the keywords from the user's question. Drop stop words. Identify intent:
-
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
-
-### Step 2: Multi-source search
-
-Run grep in parallel across two sources.
-
-**Internal docs:**
-
-```bash
-grep -rni --include='*.md' '<keyword>' .internal-docs/
-```
-
-Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
-
-**Codebase (skip vendored Chromium and `node_modules`):**
-
-```bash
-grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
-     --exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
-     '<keyword>' packages/ scripts/ .config/ .github/
-```
-
-Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
-
-### Step 3: Synthesize answer
-
-Structure the response:
-
-1. **Direct answer.** First sentence answers the question. No preamble.
-2. **Steps if applicable.** Numbered list with exact commands.
-3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
-
-If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
-
-### Step 4: Offer execution (only if commands surfaced)
-
-If Step 3 produced executable commands the user could run, ask:
-
-> Run these for you? (y / n / dry-run)
-
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
-
-### Step 5: Doc-not-found path
-
-If Step 2 returned nothing useful (no doc hits AND no clear code answer):
-
-1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
-2. Ask: "Draft a new doc and open a PR to internal-docs?"
-3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
-
-### Step 6: Completion status
-
-Report one of:
-
- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
-
-## Citation discipline
-
-Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
-
-If a doc says one thing and the code says another, surface the conflict explicitly:
-
-> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
-
-## Common Mistakes
-
-**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
-
-**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
-
-**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.
-
-## Red Flags
-
-**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
-
-**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.
--- a/.claude/skills/document-internal/SKILL.md
+++ b/.claude/skills/document-internal/SKILL.md
@@ -1,208 +0,0 @@
---
-name: document-internal
-description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
-allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---
-
-# Document Internal
-
-Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
-
-**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
-
-## When to use
-
-After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
-
-## Hard rules — never do these
-
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
-
-## Voice rules — enforced by Step 4
-
-The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
-
- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.
-
-## Workflow
-
-### Step 0: Pre-flight
-
-Bail with a clear message on any failure.
-
-```bash
-# Submodule must be initialized
-if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
-  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
-  exit 0
-fi
-[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
-
-# Must be on a feature branch
-BRANCH=$(git branch --show-current)
-if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
-  echo "On $BRANCH. Run from a feature branch."
-  exit 0
-fi
-
-# Determine base branch (default: dev for this repo, fall back to main).
-# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
-BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
-
-# Gather context
-git log "$BASE..HEAD" --oneline
-git diff "$BASE...HEAD" --stat
-gh pr view --json body -q .body 2>/dev/null  # may be empty if no PR yet
-```
-
-### Step 1: Identify the doc
-
-Ask the user for three things in one prompt:
-
-1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
-2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
-3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
-
-### Step 2: Decision brief — four sharp questions
-
-Ask one question at a time. Each answer constrains the next. These force compression before drafting.
-
-1. "In one sentence: what can someone now DO that they could not before?"
-2. "What is the one design decision a future engineer needs to know?"
-3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
-4. "Any sharp edges or gotchas? (or 'none')"
-
-Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
-
-### Step 3: Draft from the template
-
-Read the matching template from `.internal-docs/_templates/`:
-
- `feature` → `feature-note.md`
- `architecture` → `architecture-note.md`
- `design` → `design-spec.md`
-
-If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
-
-Generate the 1-pager from the template, the four answers, and the diff context.
-
-### Step 4: Voice self-check
-
-Scan the draft for violations:
-
- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).
-
-If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
-
-If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
-
-### Step 5: Show + iterate
-
-Print the full draft. Ask:
-
-> Edit needed? Paste any changes, or say "looks good".
-
-Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
-
-### Step 6: Open PR to internal-docs
-
-Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
-
-```bash
-TMP=$(mktemp -d)
-trap 'rm -rf "$TMP"' EXIT  # cleans up even if any step below fails
-git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
-cd "$TMP"
-git checkout -b "docs/<slug>"
-
-# Write the doc
-mkdir -p "<type>"  # features, architecture, designs, or setup
-cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
-<draft content>
-DOC
-
-# Update the root README index — insert one line under the matching section
-# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
-
-git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
-git commit -m "docs(<type>): <slug>"
-git push -u origin "docs/<slug>"
-
-PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
-  --head "docs/<slug>" \
-  --title "docs(<type>): <slug>" \
-  --body "$(cat <<'BODY'
-## Summary
-<one-line of what this doc covers>
-
-## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
-BODY
-)")
-
-cd -
-echo "PR opened: $PR_URL"
-# trap above cleans up $TMP on EXIT
-```
-
-If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
-
-### Step 7: Completion status
-
-Report one of:
-
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
-
-## Doc type defaults
-
-| Branch pattern | Default doc type | Default location |
-|----------------|------------------|------------------|
-| `feat/*`       | feature          | `features/`      |
-| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
-| `rfc/*` or `design/*` | design          | `designs/`       |
-| Otherwise      | ask              | ask              |
-
-## Common Mistakes
-
-**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
-
-**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.
-
-**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.
-
-## Red Flags
-
-**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.
-
-**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.
--- a/.claude/skills/document-internal/seeds/README.md
+++ b/.claude/skills/document-internal/seeds/README.md
@@ -1,51 +0,0 @@
-# BrowserOS Internal Docs
-
-Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
-
-If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
-
-## How to find what you need
-
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)
-
-Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
-
-## How to add a doc
-
-Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
-
-## Index
-
-### Setup
-<!-- one line per setup runbook: -->
-<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
-
-### Features
-<!-- one line per shipped feature, newest first: -->
-<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
-
-### Architecture
-<!-- one line per cross-cutting subsystem: -->
-<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
-
-### Designs
-<!-- one line per design spec, newest first: -->
-<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
-
-## Templates
-
-When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
-
-## Voice
-
-Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
-
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.
--- a/.claude/skills/document-internal/seeds/_templates/architecture-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/architecture-note.md
@@ -1,31 +0,0 @@
---
-title: <subsystem name>
-owner: <github handle>
-status: current | deprecated
-date: YYYY-MM-DD
-related-features: [feature-slug-1, feature-slug-2]
---
-
-# <subsystem name>
-
-## What this subsystem does
-<1-2 paragraphs. The top-level responsibility. Boundaries.>
-
-## Architecture
-<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
-
-## Constraints
-<Hard rules the design enforces. "X must never call Y" type statements.>
-
-## Decisions made
-<Numbered list of non-obvious decisions and the reason for each.>
-
-## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here
-
-## How to evolve this
-<Where to add things. Which tests to expect to update. What NOT to touch.>
-
-## Open questions
-<What is still being figured out. Empty if none.>
--- a/.claude/skills/document-internal/seeds/_templates/design-spec.md
+++ b/.claude/skills/document-internal/seeds/_templates/design-spec.md
@@ -1,34 +0,0 @@
---
-title: <design name>
-owner: <github handle>
-status: proposed | accepted | rejected | superseded
-date: YYYY-MM-DD
-supersedes: <design-slug or none>
---
-
-# <design name>
-
-## Goal
-<2-4 sentences. What this design is trying to accomplish.>
-
-## Context
-<1-2 paragraphs. The current state, what is failing, why this needs to change.>
-
-## Selected Approach
-<The chosen design at a high level. Architecture, components, data flow.>
-
-## Alternatives Considered
-### 1. <name>
-<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
-
-### 2. <name>
-<Same shape.>
-
-## Out of Scope
-<What this design does NOT cover. Defer references.>
-
-## Rollout
-<Numbered steps from "nothing exists" to "fully shipped".>
-
-## Open Questions
-<Resolved during design? Empty. Unresolved? List with owner.>
--- a/.claude/skills/document-internal/seeds/_templates/feature-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/feature-note.md
@@ -1,29 +0,0 @@
---
-title: <feature name>
-owner: <github handle>
-status: shipped | wip | deprecated
-date: YYYY-MM-DD
-prs: ["#NNN"]
-tags: [agent, browser, mcp]
---
-
-# <feature name>
-
-## What it does
-<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
-
-## Why we built it
-<1-2 sentences. Motivation. What pain it removed or what unlocked.>
-
-## How it works
-<3-6 sentences. The flow at a high level. Name the key files.>
-
-## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does
-
-## How to run / test it locally
-<bullet list of commands. Empty section if N/A — do not fake.>
-
-## Gotchas
-<known sharp edges. "If you see X, that's why." Empty if N/A.>
--- a/.github/workflows/build-agent.yml
+++ b/.github/workflows/build-agent.yml
@@ -0,0 +1,157 @@
+name: build-agent
+
+on:
+  workflow_dispatch:
+    inputs:
+      agent:
+        description: "Agent name from bundle.json"
+        required: true
+        type: string
+        default: openclaw
+      publish:
+        description: "Upload to R2 and merge manifest slice"
+        required: false
+        default: false
+        type: boolean
+  pull_request:
+    paths:
+      - "packages/browseros-agent/packages/build-tools/**"
+      - ".github/workflows/build-agent.yml"
+
+env:
+  BUN_VERSION: "1.3.6"
+  PKG_DIR: packages/browseros-agent/packages/build-tools
+
+permissions:
+  contents: read
+
+jobs:
+  check:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools typecheck
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools test
+
+  build:
+    needs: check
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - arch: arm64
+            runner: ubuntu-24.04-arm
+    runs-on: ${{ matrix.runner }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Build tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          OUT: ${{ github.workspace }}/dist/images
+        run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
+      - uses: actions/upload-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
+          path: dist/images/
+          retention-days: 7
+
+  smoke:
+    needs: build
+    runs-on: ubuntu-24.04-arm
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-arm64
+          path: dist/images
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Smoke test tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+        run: |
+          set -euo pipefail
+          tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-arm64.tar.gz" -print -quit)"
+          if [ -z "$tarball" ]; then
+            echo "missing arm64 tarball artifact for ${AGENT}" >&2
+            exit 1
+          fi
+          bun run smoke:tarball -- --agent "$AGENT" --arch arm64 --tarball "$tarball"
+
+  publish:
+    needs: [build, smoke]
+    if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
+    runs-on: ubuntu-24.04
+    environment: release
+    concurrency:
+      group: r2-manifest-publish
+      cancel-in-progress: false
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          pattern: tarball-*
+          path: dist/images
+          merge-multiple: true
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Upload tarballs to R2
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
+            base="$(basename "$file")"
+            bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
+          done
+      - name: Merge agent slice into manifest
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          mkdir -p dist/images
+          cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
+          bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
+          bun run emit-manifest -- \
+            --slice "agents:${AGENT}" \
+            --dist-dir dist \
+            --merge-from dist/baseline-manifest.json \
+            --out dist/manifest.json
+          bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
--- a/.github/workflows/eval-weekly.yml
+++ b/.github/workflows/eval-weekly.yml
@@ -14,7 +14,7 @@ on:
      config:
        description: 'Eval config file (relative to apps/eval/)'
        required: false
-        default: 'configs/legacy/browseros-agent-weekly.json'
+        default: 'configs/browseros-agent-weekly.json'

 permissions:
  contents: read
@@ -44,19 +44,6 @@ jobs:
        working-directory: packages/browseros-agent
        run: bun install --ignore-scripts

-      - name: Install Claude Code CLI
-        working-directory: packages/browseros-agent/apps/eval
-        env:
-          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
-        run: |
-          if bun -e "const config = await Bun.file(process.env.EVAL_CONFIG).json(); process.exit(config.agent?.type === 'claude-code' ? 0 : 1)"; then
-            npm install -g @anthropic-ai/claude-code@2.1.119
-            echo "Claude Code CLI installed at $(command -v claude)"
-            claude --version
-          else
-            echo "Eval config does not use Claude Code; skipping Claude Code CLI install"
-          fi
-
      - name: Install Python eval dependencies
        # agisdk pinned so silent upstream releases can't shift task definitions
        # or grader behavior. Bump intentionally with a documented re-baseline.
@@ -75,14 +62,11 @@ jobs:
          curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
          unzip -qo /tmp/nopecha.zip -d extensions/nopecha

-      - name: Run eval and publish to R2
+      - name: Run eval
        working-directory: packages/browseros-agent/apps/eval
        env:
          FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
-          AWS_REGION: ${{ secrets.AWS_REGION || 'us-west-2' }}
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
          BROWSEROS_BINARY: /usr/bin/browseros
@@ -90,29 +74,12 @@ jobs:
          # OpenClaw container runtime is macOS-only; opt the Linux runner
          # into the no-op stub so the server can boot and the eval can run.
          BROWSEROS_SKIP_OPENCLAW: '1'
-          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
+          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
        run: |
          echo "Running eval with config: $EVAL_CONFIG"
-          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG"
-          # Capture the run directory so report.html can be generated before the R2 publish step.
-          SUMMARY_PATH="$(find results -name summary.json -type f -print | sort | tail -n 1)"
-          if [ -z "$SUMMARY_PATH" ]; then
-            echo "No eval run summary found"
-            exit 1
-          fi
-          RUN_DIR="$(dirname "$SUMMARY_PATH")"
-          echo "EVAL_RUN_DIR=$RUN_DIR" >> "$GITHUB_ENV"
+          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"

-      - name: Generate run analysis report
-        if: success()
-        working-directory: packages/browseros-agent/apps/eval
-        env:
-          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
-        run: |
-          echo "Generating run report for $EVAL_RUN_DIR"
-          bun scripts/generate-report.ts --input "$EVAL_RUN_DIR" --output "$EVAL_RUN_DIR/report.html"
-
-      - name: Publish eval run to R2
+      - name: Upload runs to R2
        if: success()
        working-directory: packages/browseros-agent/apps/eval
        env:
@@ -121,7 +88,10 @@ jobs:
          EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
          EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
          EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
-        run: bun run src/index.ts publish --run "$EVAL_RUN_DIR" --target r2
+          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
+        run: |
+          CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
+          bun scripts/upload-run.ts "results/$CONFIG_NAME"

      - name: Generate trend report
        if: success()
@@ -136,7 +106,7 @@ jobs:
          EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
        run: bun apps/eval/scripts/weekly-report.ts /tmp/eval-report.html

-      - name: Upload trend report as artifact
+      - name: Upload report as artifact
        if: success()
        uses: actions/upload-artifact@v4
        with:
--- a/.github/workflows/sync-internal-docs.yml
+++ b/.github/workflows/sync-internal-docs.yml
@@ -1,62 +0,0 @@
-name: Sync internal-docs submodule
-
-on:
-  schedule:
-    - cron: '0 */4 * * *'
-  workflow_dispatch:
-
-jobs:
-  sync:
-    name: Bump internal-docs submodule pointer on dev
-    runs-on: ubuntu-latest
-    permissions:
-      contents: write
-      pull-requests: write
-    steps:
-      - name: Rewrite SSH submodule URL to HTTPS-with-token
-        env:
-          TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
-        run: |
-          git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
-
-      - uses: actions/checkout@v4
-        with:
-          token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
-          submodules: true
-          ref: dev
-          fetch-depth: 50
-
-      - name: Open auto-merge PR if internal-docs has new commits
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          set -e
-
-          # Skip if submodule not yet configured (handoff window before someone adds it)
-          if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
-            echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
-            exit 0
-          fi
-
-          git submodule update --remote --merge .internal-docs
-
-          if git diff --quiet .internal-docs; then
-            echo "No internal-docs changes to sync."
-            exit 0
-          fi
-
-          BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
-          git config user.name  "browseros-bot"
-          git config user.email "bot@browseros.ai"
-          git checkout -b "$BRANCH"
-          git add .internal-docs
-          git commit -m "chore: sync internal-docs submodule"
-          git push -u origin "$BRANCH"
-
-          PR_URL=$(gh pr create \
-            --base dev \
-            --head "$BRANCH" \
-            --title "chore: sync internal-docs submodule" \
-            --body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")
-
-          gh pr merge "$PR_URL" --auto --squash --delete-branch
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -63,15 +63,15 @@ jobs:
            junit_path: test-results/server-root.xml
            needs_browser: false
          - suite: agent
-            command: (cd apps/agent && bun run test)
+            command: bun run test:agent
            junit_path: test-results/agent.xml
            needs_browser: false
          - suite: eval
-            command: (cd apps/eval && bun run test)
+            command: bun run test:eval
            junit_path: test-results/eval.xml
            needs_browser: false
          - suite: build
-            command: bun run ./scripts/run-bun-test.ts ./scripts/build
+            command: bun run test:build
            junit_path: test-results/build.xml
            needs_browser: false

--- a/.gitmodules
+++ b/.gitmodules
@@ -1,4 +0,0 @@
-[submodule ".internal-docs"]
-	path = .internal-docs
-	url = git@github.com:browseros-ai/internal-docs.git
-	branch = main
--- a/.internal-docs
+++ b/.internal-docs
--- a/README.md
+++ b/README.md
@@ -188,21 +188,6 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
 - [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
 - [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.

-## Citation
-
-If you use BrowserOS in your research or project, please cite:
-
-```bibtex
-@software{browseros2025,
-  author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
-  title = {BrowserOS: The open-source Agentic browser},
-  url = {https://github.com/browseros-ai/BrowserOS},
-  year = {2025},
-  publisher = {GitHub},
-  license = {AGPL-3.0},
-}
-```
-
 ## License

 BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
--- a/packages/browseros-agent/README.md
+++ b/packages/browseros-agent/README.md
@@ -79,15 +79,14 @@ cp apps/server/.env.example apps/server/.env.development
 cp apps/agent/.env.example apps/agent/.env.development
 cp apps/server/.env.production.example apps/server/.env.production

-# Install deps and generate agent code
+# Install deps, generate agent code, and sync the VM cache
 bun run dev:setup

 # Start the full dev environment
 bun run dev:watch
 ```

-`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
-the server startup path and pulls the configured GHCR image on demand.
+`dev:watch` exits if the VM cache manifest is missing; it does not run setup itself, so run `dev:setup` first.

 ### Environment Variables

@@ -157,14 +156,9 @@ bun run build:server          # Build production server resource artifacts and u
 bun run build:agent           # Build agent extension

 # Test
-bun run test                  # Run all tests
-bun run test:all              # Run all tests
-bun run test:main             # Run key server tools and integration tests
-
-# App-specific test groups (from packages/browseros-agent)
-cd apps/server && bun run test:tools
-cd apps/server && bun run test:cdp
-cd apps/server && bun run test:integration
+bun run test                  # Run standard tests
+bun run test:cdp              # Run CDP-based tests
+bun run test:integration      # Run integration tests

 # Quality
 bun run lint                  # Check with Biome
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
@@ -17,7 +17,7 @@ export function groupProviderOptions(
      ? [{ key: 'llm' as const, label: 'AI Providers', options: llm }]
      : []),
    ...(acp.length
-      ? [{ key: 'acp' as const, label: 'Agents', options: acp }]
+      ? [{ key: 'acp' as const, label: 'ACP Models', options: acp }]
      : []),
  ]
 }
@@ -26,25 +26,14 @@ export function getProviderSearchValue(
  provider: Provider,
  groupLabel: string,
 ): string {
-  return [
-    provider.id,
-    provider.name,
-    provider.type,
-    groupLabel,
-    provider.adapterName,
-    provider.modelLabel,
-  ]
+  return [provider.id, provider.name, provider.type, groupLabel]
    .filter(Boolean)
    .join(' ')
 }

 export function getProviderSubtitle(provider: Provider): string | undefined {
  if (provider.kind !== 'acp') return undefined
-  return [
-    provider.adapterName,
-    provider.modelLabel,
-    provider.modelControl === 'best-effort' ? 'best effort' : undefined,
-  ]
-    .filter(Boolean)
-    .join(' · ')
+  return provider.modelControl === 'best-effort'
+    ? 'ACP model · best effort'
+    : 'ACP model'
 }
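Reviewer note: the sketch below illustrates how the simplified helpers behave after this change. It is a self-contained approximation, not the project file. The `Provider` interface is a trimmed stand-in (with `ChatProviderType` reduced to `string`, which is an assumption), and the sample provider objects are hypothetical values modelled on the test fixtures in this diff.

```typescript
// Trimmed stand-in for the Provider shape after adapterName/modelLabel were removed.
// Assumption: ChatProviderType is simplified to `string` for this sketch.
interface Provider {
  id: string
  name: string
  type: string
  kind: 'llm' | 'acp'
  modelControl?: 'runtime-supported' | 'best-effort'
}

// Search text now uses only id, name, type, and the group label.
function getProviderSearchValue(provider: Provider, groupLabel: string): string {
  return [provider.id, provider.name, provider.type, groupLabel]
    .filter(Boolean)
    .join(' ')
}

// Subtitle no longer names the adapter or model; it only flags best-effort routing.
function getProviderSubtitle(provider: Provider): string | undefined {
  if (provider.kind !== 'acp') return undefined
  return provider.modelControl === 'best-effort'
    ? 'ACP model · best effort'
    : 'ACP model'
}

// Hypothetical sample data, modelled on the fixtures in the accompanying test file.
const haiku: Provider = {
  kind: 'acp',
  id: 'acp:claude:haiku:medium',
  name: 'Claude Code Haiku',
  type: 'acp',
  modelControl: 'best-effort',
}
const codex: Provider = {
  kind: 'acp',
  id: 'acp:codex:gpt-5.5:medium',
  name: 'Codex GPT-5.5',
  type: 'acp',
  modelControl: 'runtime-supported',
}
const plainLlm: Provider = {
  kind: 'llm',
  id: 'example-llm',
  name: 'Example LLM',
  type: 'example',
}

console.log(getProviderSubtitle(haiku))
console.log(getProviderSubtitle(codex))
console.log(getProviderSubtitle(plainLlm))
console.log(getProviderSearchValue(haiku, 'ACP Models'))
```

One consequence worth noting: because the adapter name and model label are no longer part of the search value, a query only matches an ACP entry through its id, name, type, or group label.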
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
@@ -16,26 +16,22 @@ const options: Provider[] = [
  },
  {
    kind: 'acp',
-    id: 'agent-claude-review',
-    name: 'Review Bot',
+    id: 'acp:claude:haiku:medium',
+    name: 'Claude Code Haiku',
    type: 'acp',
-    adapterName: 'Claude Code',
-    modelLabel: 'Haiku',
    modelControl: 'best-effort',
  },
  {
    kind: 'acp',
-    id: 'agent-codex-browser',
-    name: 'Browser Driver',
+    id: 'acp:codex:gpt-5.5:medium',
+    name: 'Codex GPT-5.5',
    type: 'acp',
-    adapterName: 'Codex',
-    modelLabel: 'GPT-5.5',
    modelControl: 'runtime-supported',
  },
 ]

 describe('groupProviderOptions', () => {
-  it('groups normal providers separately from created agents', () => {
+  it('groups normal providers separately from ACP models', () => {
    expect(groupProviderOptions(options)).toEqual([
      {
        key: 'llm',
@@ -44,7 +40,7 @@ describe('groupProviderOptions', () => {
      },
      {
        key: 'acp',
-        label: 'Agents',
+        label: 'ACP Models',
        options: [options[2], options[3]],
      },
    ])
@@ -52,21 +48,20 @@ describe('groupProviderOptions', () => {
 })

 describe('getProviderSearchValue', () => {
-  it('matches created-agent group labels and item labels', () => {
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Agents')
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Review Bot')
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain(
-      'Claude Code',
+  it('matches ACP group labels and item labels', () => {
+    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
+      'ACP Models',
+    )
+    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
+      'Claude Code Haiku',
    )
  })
 })

 describe('getProviderSubtitle', () => {
-  it('describes created-agent runtime context without model-target copy', () => {
-    expect(getProviderSubtitle(options[2])).toBe(
-      'Claude Code · Haiku · best effort',
-    )
-    expect(getProviderSubtitle(options[3])).toBe('Codex · GPT-5.5')
+  it('does not present best-effort ACP models as guaranteed routing', () => {
+    expect(getProviderSubtitle(options[2])).toBe('ACP model · best effort')
+    expect(getProviderSubtitle(options[3])).toBe('ACP model')
    expect(getProviderSubtitle(options[0])).toBeUndefined()
  })
 })
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
@@ -41,10 +41,7 @@ export const ChatProviderSelector: FC<
      <PopoverTrigger asChild>{children}</PopoverTrigger>
      <PopoverContent side="bottom" align="start" className="w-64 p-0">
        <Command>
-          <CommandInput
-            placeholder="Search providers or agents..."
-            className="h-9"
-          />
+          <CommandInput placeholder="Search models..." className="h-9" />
          <CommandList>
            <CommandEmpty>No provider found</CommandEmpty>
            {groups.map((group) => (
--- a/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
@@ -7,8 +7,5 @@ export interface Provider {
  name: string
  type: ChatProviderType
  kind: 'llm' | 'acp'
-  agentId?: string
-  adapterName?: string
-  modelLabel?: string
  modelControl?: 'runtime-supported' | 'best-effort'
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
@@ -0,0 +1,136 @@
+import { Bot, Loader2, Wrench } from 'lucide-react'
+import type { FC } from 'react'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import { cn } from '@/lib/utils'
+
+interface AgentCardProps {
+  agent: AgentCardData
+  onClick: () => void
+  active?: boolean
+}
+
+function formatTimestamp(timestamp?: number): string {
+  if (!timestamp) return 'No activity yet'
+  const diff = Date.now() - timestamp
+  const minutes = Math.floor(diff / 60000)
+  if (minutes < 1) return 'just now'
+  if (minutes < 60) return `${minutes}m ago`
+  const hours = Math.floor(minutes / 60)
+  if (hours < 24) return `${hours}h ago`
+  return `${Math.floor(hours / 24)}d ago`
+}
+
+function getStatusLabel(status: AgentCardData['status']): string {
+  if (status === 'working') return 'Working'
+  if (status === 'error') return 'Error'
+  return 'Ready'
+}
+
+function getStatusTone(status: AgentCardData['status']): string {
+  if (status === 'working') return 'bg-amber-500'
+  if (status === 'error') return 'bg-destructive'
+  return 'bg-emerald-500'
+}
+
+function formatCost(usd: number): string {
+  if (usd < 0.005) return `$${usd.toFixed(4)}`
+  return `$${usd.toFixed(2)}`
+}
+
+export const AgentCardExpanded: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
+      active
+        ? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
+    )}
+  >
+    <div className="flex items-start justify-between gap-3">
+      <div className="flex min-w-0 items-center gap-3">
+        <div
+          className={cn(
+            'flex size-10 shrink-0 items-center justify-center rounded-xl',
+            active
+              ? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
+              : 'bg-muted text-muted-foreground',
+          )}
+        >
+          <Bot className="size-5" />
+        </div>
+        <div className="min-w-0">
+          <div className="truncate font-semibold text-sm">{agent.name}</div>
+          <div className="truncate text-muted-foreground text-xs">
+            {agent.model ?? 'OpenClaw agent'}
+          </div>
+        </div>
+      </div>
+      <div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
+        <span
+          className={cn('size-2 rounded-full', getStatusTone(agent.status))}
+        />
+        <span>{getStatusLabel(agent.status)}</span>
+      </div>
+    </div>
+
+    <div className="mt-4 flex-1">
+      <p className="line-clamp-2 text-foreground/90 text-sm">
+        {agent.lastMessage ??
+          'Start a conversation to see recent work and summaries.'}
+      </p>
+    </div>
+
+    <div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
+      <div className="flex items-center justify-between gap-3">
+        <span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
+        {agent.costUsd ? (
+          <span className="tabular-nums opacity-70">
+            {formatCost(agent.costUsd)}
+          </span>
+        ) : null}
+      </div>
+      {agent.status === 'working' && agent.currentTool ? (
+        <div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
+          <Loader2 className="size-3 shrink-0 animate-spin" />
+          <span className="truncate">{agent.currentTool}</span>
+        </div>
+      ) : agent.activitySummary ? (
+        <div className="flex items-center gap-1.5 text-muted-foreground/60">
+          <Wrench className="size-3 shrink-0" />
+          <span className="truncate">{agent.activitySummary}</span>
+        </div>
+      ) : null}
+    </div>
+  </button>
+)
+
+export const AgentCardCompact: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
+      active
+        ? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
+    )}
+  >
+    <span
+      className={cn(
+        'size-2 rounded-full',
+        active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
+      )}
+    />
+    <span className="truncate">{agent.name}</span>
+  </button>
+)
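Reviewer note: the two formatting helpers in the new `AgentCard.tsx` can be exercised in isolation. The sketch below copies their logic with one deliberate deviation, labelled as an assumption: `formatTimestamp` takes an explicit `now` argument instead of calling `Date.now()` internally, so the results are deterministic. The sample values are hypothetical.

```typescript
// Same thresholds as the component helper, but `now` is injected
// (assumption for testability; the component reads Date.now() directly).
function formatTimestamp(timestamp: number | undefined, now: number): string {
  if (!timestamp) return 'No activity yet'
  const diff = now - timestamp
  const minutes = Math.floor(diff / 60000)
  if (minutes < 1) return 'just now'
  if (minutes < 60) return `${minutes}m ago`
  const hours = Math.floor(minutes / 60)
  if (hours < 24) return `${hours}h ago`
  return `${Math.floor(hours / 24)}d ago`
}

// Sub-half-cent costs keep four decimals so they do not round to $0.00.
function formatCost(usd: number): string {
  if (usd < 0.005) return `$${usd.toFixed(4)}`
  return `$${usd.toFixed(2)}`
}

// Hypothetical fixed reference time for the examples.
const now = 1_700_000_000_000

console.log(formatTimestamp(undefined, now))
console.log(formatTimestamp(now - 30_000, now))
console.log(formatTimestamp(now - 5 * 60_000, now))
console.log(formatTimestamp(now - 3 * 3_600_000, now))
console.log(formatTimestamp(now - 2 * 86_400_000, now))
console.log(formatCost(0.0012))
console.log(formatCost(1.5))
```

Two edge cases follow from the truthiness checks: a timestamp of `0` is reported as 'No activity yet', and in the component a `costUsd` of `0` hides the cost label entirely.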
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
@@ -1,71 +1,70 @@
 import { Plus } from 'lucide-react'
 import type { FC } from 'react'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { cn } from '@/lib/utils'
-import { HomeAgentCard } from './HomeAgentCard'
+import { AgentCardCompact, AgentCardExpanded } from './AgentCard'

 interface AgentCardDockProps {
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  activeAgentId?: string
  onSelectAgent: (agentId: string) => void
  onCreateAgent?: () => void
+  compact?: boolean
 }

-function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
+function CreateAgentButton({
+  compact,
+  onCreateAgent,
+}: {
+  compact?: boolean
+  onCreateAgent: () => void
+}) {
  return (
    <button
      type="button"
      onClick={onCreateAgent}
      className={cn(
-        'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
-        'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        compact
+          ? 'rounded-full px-3 py-2 text-sm'
+          : 'min-h-32 rounded-2xl px-5 py-4',
      )}
    >
-      <Plus className="size-5" />
-      <span>Create agent</span>
+      <Plus className={compact ? 'size-3.5' : 'size-5'} />
+      <span>{compact ? 'New' : 'Create agent'}</span>
    </button>
  )
 }

-/**
- * 3-column grid of HomeAgentCards plus a trailing "Create agent"
- * tile. The previous `compact` mode (rendered a horizontal pill rail)
- * had no callers and was dropped along with the legacy AgentCard.
- */
 export const AgentCardDock: FC<AgentCardDockProps> = ({
  agents,
-  adapters,
  activeAgentId,
  onSelectAgent,
  onCreateAgent,
+  compact,
 }) => {
  if (agents.length === 0 && !onCreateAgent) return null

-  const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
-  for (const descriptor of adapters) {
-    if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
-  }
+  const Card = compact ? AgentCardCompact : AgentCardExpanded

  return (
-    <div className="grid gap-4 md:grid-cols-3">
+    <div
+      className={cn(
+        compact
+          ? 'flex items-center gap-2 overflow-x-auto pb-1'
+          : 'grid gap-4 md:grid-cols-3',
+      )}
+    >
      {agents.map((agent) => (
-        <HomeAgentCard
-          key={agent.id}
+        <Card
+          key={agent.agentId}
          agent={agent}
-          adapter={agent.adapter}
-          adapterHealth={adapterHealth.get(agent.adapter) ?? null}
-          active={agent.id === activeAgentId}
-          onClick={() => onSelectAgent(agent.id)}
+          active={agent.agentId === activeAgentId}
+          onClick={() => onSelectAgent(agent.agentId)}
        />
      ))}
      {onCreateAgent ? (
-        <CreateAgentButton onCreateAgent={onCreateAgent} />
+        <CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
      ) : null}
    </div>
  )
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
@@ -1,36 +1,179 @@
-import { ArrowLeft } from 'lucide-react'
+import { ArrowLeft, Bot, Home } from 'lucide-react'
 import { type FC, useEffect, useMemo, useRef } from 'react'
 import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
 import { Button } from '@/components/ui/button'
-import type {
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
 import {
-  cancelHarnessTurn,
-  useAgentAdapters,
-  useEnqueueHarnessMessage,
-  useHarnessAgents,
-  useRemoveHarnessQueuedMessage,
-  useUpdateHarnessAgent,
-} from '@/entrypoints/app/agents/useAgents'
-import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
-import { AgentRail } from './AgentRail'
+  type AgentEntry,
+  getModelDisplayName,
+} from '@/entrypoints/app/agents/useOpenClaw'
+import { cn } from '@/lib/utils'
 import { useAgentCommandData } from './agent-command-layout'
 import { ClawChat } from './ClawChat'
-import { ConversationHeader } from './ConversationHeader'
 import { ConversationInput } from './ConversationInput'
 import {
  buildChatHistoryFromClawMessages,
  filterTurnsPersistedInHistory,
  flattenHistoryPages,
 } from './claw-chat-types'
-import { consumePendingInitialMessage } from './pending-initial-message'
-import { QueuePanel } from './QueuePanel'
 import { useAgentConversation } from './useAgentConversation'
 import { useHarnessChatHistory } from './useHarnessChatHistory'

+function StatusBadge({ status }: { status: string }) {
+  return (
+    <div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
+      <span
+        className={cn(
+          'size-1.5 rounded-full',
+          status === 'Working on your request'
+            ? 'bg-amber-500'
+            : status === 'Ready'
+              ? 'bg-emerald-500'
+              : status === 'Offline'
+                ? 'bg-muted-foreground/50'
+                : 'bg-[var(--accent-orange)]',
+        )}
+      />
+      <span>{status}</span>
+    </div>
+  )
+}
+
+function AgentIdentity({
+  name,
+  meta,
+  className,
+}: {
+  name: string
+  meta: string
+  className?: string
+}) {
+  return (
+    <div className={cn('min-w-0', className)}>
+      <div className="truncate font-semibold text-[15px] leading-5">{name}</div>
+      <div className="truncate text-muted-foreground text-xs leading-5">
+        {meta}
+      </div>
+    </div>
+  )
+}
+
+function ConversationHeader({
+  agentName,
+  agentMeta,
+  status,
+  backLabel,
+  backTarget,
+  onGoHome,
+}: {
+  agentName: string
+  agentMeta: string
+  status: string
+  backLabel: string
+  backTarget: 'home' | 'page'
+  onGoHome: () => void
+}) {
+  const BackIcon = backTarget === 'home' ? Home : ArrowLeft
+
+  return (
+    <div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
+      <div className="flex min-w-0 items-center gap-3">
+        <Button
+          variant="ghost"
+          size="icon"
+          onClick={onGoHome}
+          className="size-8 rounded-xl lg:hidden"
+          title={backLabel}
+        >
+          <BackIcon className="size-4" />
+        </Button>
+        <div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
+          <Bot className="size-4" />
+        </div>
+        <AgentIdentity name={agentName} meta={agentMeta} />
+      </div>
+
+      <StatusBadge status={status} />
+    </div>
+  )
+}
+
+function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
+  return (
+    <div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
+      <div className="flex min-w-0 items-center gap-3">
+        <Button
+          variant="ghost"
+          size="icon"
+          onClick={onGoHome}
+          className="size-8 rounded-xl"
+          title="Back to home"
+        >
+          <ArrowLeft className="size-4" />
+        </Button>
+        <div className="truncate font-semibold text-[15px] leading-5">
+          Agents
+        </div>
+      </div>
+    </div>
+  )
+}
+
+function AgentRailList({
+  activeAgentId,
+  agents,
+  onSelectAgent,
+}: {
+  activeAgentId: string
+  agents: AgentEntry[]
+  onSelectAgent: (entry: AgentEntry) => void
+}) {
+  return (
+    <aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
+      <div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
+        {agents.map((entry) => {
+          const active = entry.agentId === activeAgentId
+          const modelName = getAgentEntryMeta(entry)
+
+          return (
+            <button
+              key={entry.agentId}
+              type="button"
+              onClick={() => onSelectAgent(entry)}
+              className={cn(
+                'w-full rounded-2xl border px-3 py-3 text-left transition-all',
+                active
+                  ? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
+                  : 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
+              )}
+            >
+              <div className="flex items-center gap-3">
+                <div
+                  className={cn(
+                    'flex size-9 items-center justify-center rounded-xl',
+                    active
+                      ? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
+                      : 'bg-muted text-muted-foreground',
+                  )}
+                >
+                  <Bot className="size-4" />
+                </div>
+                <AgentIdentity name={entry.name} meta={modelName} />
+              </div>
+            </button>
+          )
+        })}
+      </div>
+    </aside>
+  )
+}
+
+function getAgentEntryMeta(agent: AgentEntry | undefined): string {
+  if (agent?.source === 'agent-harness') {
+    return getModelDisplayName(agent.model) ?? 'ACP agent'
+  }
+  return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
+}
+
 function AgentConversationController({
  agentId,
  initialMessage,
@@ -69,33 +212,15 @@ function AgentConversationController({
    [historyMessages],
  )

-  // Listing query feeds queue + active-turn state for this agent. We
-  // already poll it every 5s for the rail; reusing the same cache
-  // keeps cross-tab queue state in sync without a second poll.
-  const { harnessAgents } = useHarnessAgents()
-  const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
-  const queue = harnessAgent?.queue ?? []
-  const activeTurnId = harnessAgent?.activeTurnId ?? null
-
  const { turns, streaming, send } = useAgentConversation(agentId, {
    runtime: 'agent-harness',
    sessionKey: null,
    history: chatHistory,
-    activeTurnId,
    onComplete: () => {
      void harnessHistoryQuery.refetch()
    },
    onSessionKeyChange: () => {},
  })
-  const enqueueMessage = useEnqueueHarnessMessage()
-  const removeQueuedMessage = useRemoveHarnessQueuedMessage()
-
-  const handleStop = () => {
-    void cancelHarnessTurn(agentId, {
-      turnId: activeTurnId ?? undefined,
-      reason: 'user pressed stop',
-    })
-  }
  const visibleTurns = useMemo(
    () => filterTurnsPersistedInHistory(turns, historyMessages),
    [historyMessages, turns],
@@ -114,59 +239,32 @@ function AgentConversationController({
  sendRef.current = send

  useEffect(() => {
-    if (disabled || !historyReady) return
-
-    // Registry-first: when the user submitted at /home with
-    // attachments, the rich payload is here. URL `?q=` may also be
-    // present and is the text-only fallback path; the registry wins
-    // when both exist because it carries the binary attachments
-    // alongside the text.
-    const pending = consumePendingInitialMessage(agentId)
-    if (pending) {
-      // Mark the dedup ref so the text-only branch below doesn't
-      // re-fire on the same render.
-      if (initialMessageKey) {
-        initialMessageSentRef.current = initialMessageKey
-      }
-      onInitialMessageConsumedRef.current()
-      void sendRef.current({
-        text: pending.text,
-        attachments: pending.attachments.map((a) => a.payload),
-        attachmentPreviews: pending.attachments.map((a) => ({
-          id: a.id,
-          kind: a.kind,
-          mediaType: a.mediaType,
-          name: a.name,
-          dataUrl: a.dataUrl,
-        })),
-      })
-      return
-    }
-
    const query = initialMessage?.trim()
    if (!initialMessageKey) {
-      // Reset is safe even on the post-registry-fire re-run: consume
-      // is destructive, so the registry is already drained — there's
-      // nothing left for a third run to re-send.
      initialMessageSentRef.current = null
      return
    }

-    if (!query || initialMessageSentRef.current === initialMessageKey) {
+    if (
+      !query ||
+      initialMessageSentRef.current === initialMessageKey ||
+      disabled ||
+      !historyReady
+    ) {
      return
    }

    initialMessageSentRef.current = initialMessageKey
    onInitialMessageConsumedRef.current()
    void sendRef.current({ text: query })
-  }, [agentId, disabled, historyReady, initialMessage, initialMessageKey])
+  }, [disabled, historyReady, initialMessage, initialMessageKey])

  const handleSelectAgent = (entry: AgentEntry) => {
    navigate(`${agentPathPrefix}/${entry.agentId}`)
  }

  return (
-    <div className="flex min-h-0 flex-1 flex-col overflow-hidden">
+    <div className="flex min-h-0 flex-col overflow-hidden">
      <ClawChat
        agentName={agentName}
        historyMessages={historyMessages}
@@ -183,15 +281,7 @@ function AgentConversationController({
      />

      <div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
-        <div className="mx-auto max-w-3xl space-y-3">
-          {queue.length > 0 ? (
-            <QueuePanel
-              queue={queue}
-              onRemove={(messageId) =>
-                removeQueuedMessage.mutate({ agentId, messageId })
-              }
-            />
-          ) : null}
+        <div className="mx-auto max-w-3xl">
          <ConversationInput
            variant="conversation"
            agents={agents}
@@ -206,31 +296,14 @@ function AgentConversationController({
                name: a.name,
                dataUrl: a.dataUrl,
              }))
-              // When the agent already has an in-flight turn, route
-              // the new message into the durable queue instead of
-              // starting a parallel turn. Drains automatically as
-              // soon as the active turn ends.
-              if (streaming || activeTurnId) {
-                enqueueMessage.mutate({
-                  agentId,
-                  message: input.text,
-                  attachments,
-                })
-                return
-              }
              void send({ text: input.text, attachments, attachmentPreviews })
            }}
            onCreateAgent={() => navigate(createAgentPath)}
-            onStop={handleStop}
            streaming={streaming}
            disabled={disabled}
            status="running"
            attachmentsEnabled={true}
-            placeholder={
-              streaming
-                ? `Type to queue another message for ${agentName}...`
-                : `Message ${agentName}...`
-            }
+            placeholder={`Message ${agentName}...`}
          />
        </div>
      </div>
@@ -245,22 +318,6 @@ interface AgentCommandConversationProps {
  createAgentPath?: string
 }

-function inferAdapterFromEntry(
-  entry: AgentEntry | undefined,
-): HarnessAgentAdapter | 'unknown' {
-  if (!entry) return 'unknown'
-  if (entry.source === 'agent-harness') {
-    // Harness entries don't carry the adapter on AgentEntry; the rail
-    // / header read the harness record directly. This branch only runs
-    // before the harness query resolves, so 'unknown' is correct — the
-    // tile's bot fallback renders until data arrives.
-    return 'unknown'
-  }
-  // OpenClaw-only entries (no harness shadow) are deprecated in
-  // practice but the rail still tolerates them.
-  return 'openclaw'
-}
-
 export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
  variant = 'command',
  backPath = '/home',
@@ -271,110 +328,60 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
  const [searchParams, setSearchParams] = useSearchParams()
  const navigate = useNavigate()
  const { agents } = useAgentCommandData()
-  const { harnessAgents } = useHarnessAgents()
-  const { adapters } = useAgentAdapters()
-  const updateAgent = useUpdateHarnessAgent()
-
  const shouldRedirectHome = !agentId
  const resolvedAgentId = agentId ?? ''
-  const harnessAgent = harnessAgents.find(
-    (entry) => entry.id === resolvedAgentId,
-  )
-  const entry = agents.find((item) => item.agentId === resolvedAgentId)
-  const fallbackName = entry?.name || resolvedAgentId || 'Agent'
-  const fallbackAdapter = inferAdapterFromEntry(entry)
+  const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
+  const agentName = agent?.name || resolvedAgentId || 'Agent'
+  const agentMeta = getAgentEntryMeta(agent)
  const initialMessage = searchParams.get('q')
  const isPageVariant = variant === 'page'
  const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'

-  const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
-    const adapterId = harnessAgent?.adapter
-    if (!adapterId) return null
-    const descriptor = adapters.find((item) => item.id === adapterId)
-    if (!descriptor?.health) return null
-    return {
-      healthy: descriptor.health.healthy,
-      reason: descriptor.health.reason,
-    }
-  }, [adapters, harnessAgent?.adapter])
-
  if (shouldRedirectHome) {
    return <Navigate to="/home" replace />
  }

-  const handleSelectHarnessAgent = (target: HarnessAgent) => {
-    navigate(`${agentPathPrefix}/${target.id}`)
+  const handleSelectAgent = (entry: AgentEntry) => {
+    navigate(`${agentPathPrefix}/${entry.agentId}`)
  }

-  const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
-    if (!target) return
-    updateAgent.mutate({
-      agentId: target.id,
-      patch: { pinned: next },
-    })
-  }
+  // Every visible agent runs through the harness now, so per-agent
+  // runtime status doesn't gate chat the way OpenClaw's legacy
+  // gateway lifecycle did. Show "Ready" once the agent record is
+  // resolved from the rail, "Setup" otherwise.
+  const statusCopy = agent ? 'Ready' : 'Setup'

  return (
    <div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
-      <div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
-        {/* Shared top band — the rail's "Agents" header and the chat
-            header live on one row so they're aligned by construction. */}
-        <div className="flex shrink-0 items-stretch border-border/50 border-b">
-          <div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
-            <Button
-              variant="ghost"
-              size="icon"
-              onClick={() => navigate(backPath)}
-              className="size-8 rounded-xl"
-              title="Back to home"
-            >
-              <ArrowLeft className="size-4" />
-            </Button>
-            <div className="truncate font-semibold text-[15px] leading-5">
-              Agents
-            </div>
-          </div>
-          <div className="min-w-0 flex-1">
-            <ConversationHeader
-              agent={harnessAgent ?? null}
-              fallbackName={fallbackName}
-              fallbackAdapter={fallbackAdapter}
-              adapterHealth={adapterHealth}
-              backLabel={backLabel}
-              backTarget={isPageVariant ? 'page' : 'home'}
-              onGoHome={() => navigate(backPath)}
-              onPinToggle={(next) =>
-                handlePinToggle(harnessAgent ?? null, next)
-              }
-            />
-          </div>
-        </div>
+      <div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
+        <AgentRailHeader onGoHome={() => navigate(backPath)} />

-        {/* Body grid: rail list + chat. Both columns share the same
-            top edge (the band above) so headers can never drift. */}
-        <div className="grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)] lg:grid-cols-[288px_minmax(0,1fr)]">
-          <AgentRail
-            agents={harnessAgents}
-            adapters={adapters}
-            activeAgentId={resolvedAgentId}
-            onSelectAgent={handleSelectHarnessAgent}
-            onPinToggle={(target, next) => handlePinToggle(target, next)}
-          />
+        <ConversationHeader
+          agentName={agentName}
+          agentMeta={agentMeta}
+          status={statusCopy}
+          backLabel={backLabel}
+          backTarget={isPageVariant ? 'page' : 'home'}
+          onGoHome={() => navigate(backPath)}
+        />

-          <div className="flex h-full min-h-0 flex-col overflow-hidden">
-            <AgentConversationController
-              key={resolvedAgentId}
-              agentId={resolvedAgentId}
-              agents={agents}
-              initialMessage={initialMessage}
-              onInitialMessageConsumed={() =>
-                setSearchParams({}, { replace: true })
-              }
-              agentPathPrefix={agentPathPrefix}
-              createAgentPath={createAgentPath}
-            />
-          </div>
-        </div>
+        <AgentRailList
+          activeAgentId={resolvedAgentId}
+          agents={agents}
+          onSelectAgent={handleSelectAgent}
+        />
+
+        <AgentConversationController
+          key={resolvedAgentId}
+          agentId={resolvedAgentId}
+          agents={agents}
+          initialMessage={initialMessage}
+          onInitialMessageConsumed={() =>
+            setSearchParams({}, { replace: true })
+          }
+          agentPathPrefix={agentPathPrefix}
+          createAgentPath={createAgentPath}
+        />
      </div>
    </div>
  )
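
Review note: with the harness record, adapter health and pin state removed, the header data in this file comes down to two small fallbacks. The sketch below is a minimal illustration under stated assumptions, not the component code: `AgentEntryLike`, `resolveAgentName` and `resolveStatusCopy` are hypothetical names, and only the fallback expressions themselves are taken from the diff.

```typescript
// Hypothetical, trimmed-down stand-in for AgentEntry (assumption: only these
// two fields matter for the name/status fallbacks shown in the diff).
interface AgentEntryLike {
  agentId: string
  name?: string
}

// Mirrors `agent?.name || resolvedAgentId || 'Agent'`.
// `||` rather than `??` means an empty-string name also falls through.
function resolveAgentName(
  agents: AgentEntryLike[],
  resolvedAgentId: string,
): string {
  const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
  return agent?.name || resolvedAgentId || 'Agent'
}

// Mirrors `agent ? 'Ready' : 'Setup'`: the status only reflects whether
// the agent record has been resolved, not any runtime liveness.
function resolveStatusCopy(
  agents: AgentEntryLike[],
  resolvedAgentId: string,
): 'Ready' | 'Setup' {
  const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
  return agent ? 'Ready' : 'Setup'
}
```

One consequence worth checking in review: an agent that is known but unnamed shows its id as the title, while an unknown id still shows the id but with the "Setup" status.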
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
@@ -1,29 +1,18 @@
 import { Plus } from 'lucide-react'
-import { type FC, useEffect, useMemo, useState } from 'react'
+import { type FC, useEffect, useState } from 'react'
 import { useNavigate } from 'react-router'
 import { Button } from '@/components/ui/button'
 import { Card, CardContent } from '@/components/ui/card'
 import { Separator } from '@/components/ui/separator'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import {
-  useAgentAdapters,
-  useHarnessAgents,
-} from '@/entrypoints/app/agents/useAgents'
 import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
 import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
 import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
 import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { AgentCardDock } from './AgentCardDock'
 import { useAgentCommandData } from './agent-command-layout'
-import {
-  ConversationInput,
-  type ConversationInputSendInput,
-} from './ConversationInput'
-import { orderHomeAgents } from './home-agent-card.helpers'
-import { setPendingInitialMessage } from './pending-initial-message'
+import { ConversationInput } from './ConversationInput'
+import { buildAgentCardData } from './useAgentCardData'

 function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
  return (
@@ -49,13 +38,11 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
 function RecentThreads({
  activeAgentId,
  agents,
-  adapters,
  onOpenAgents,
  onSelectAgent,
 }: {
  activeAgentId?: string | null
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  onOpenAgents: () => void
  onSelectAgent: (agentId: string) => void
 }) {
@@ -81,7 +68,6 @@ function RecentThreads({
      </div>
      <AgentCardDock
        agents={agents}
-        adapters={adapters}
        activeAgentId={activeAgentId ?? undefined}
        onSelectAgent={onSelectAgent}
        onCreateAgent={onOpenAgents}
@@ -93,46 +79,28 @@ function RecentThreads({
 export const AgentCommandHome: FC = () => {
  const navigate = useNavigate()
  const activeHint = useActiveHint()
-  // The conversation input still consumes the merged AgentEntry list
-  // from the layout context (handles legacy /claw/agents entries that
-  // haven't yet been backfilled into the harness store). The Recent
-  // Agents grid below reads the richer harness payload directly.
-  const { agents: legacyAgents, status } = useAgentCommandData()
-  const { harnessAgents } = useHarnessAgents()
-  const { adapters } = useAgentAdapters()
+  const { agents, status } = useAgentCommandData()
  const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
-
-  const orderedAgents = useMemo(
-    () => orderHomeAgents(harnessAgents),
-    [harnessAgents],
-  )
+  const cardData = buildAgentCardData(agents, status?.status, undefined)

  useEffect(() => {
-    if (legacyAgents.length === 0) {
-      if (selectedAgentId) setSelectedAgentId(null)
+    if (agents.length === 0) {
+      if (selectedAgentId) {
+        setSelectedAgentId(null)
+      }
      return
    }
+
    if (
      !selectedAgentId ||
-      !legacyAgents.some((agent) => agent.agentId === selectedAgentId)
+      !agents.some((agent) => agent.agentId === selectedAgentId)
    ) {
-      setSelectedAgentId(legacyAgents[0].agentId)
+      setSelectedAgentId(agents[0].agentId)
    }
-  }, [legacyAgents, selectedAgentId])
+  }, [agents, selectedAgentId])

-  const handleSend = (input: ConversationInputSendInput) => {
+  const handleSend = (input: { text: string }) => {
    if (!selectedAgentId) return
-    // Stash text + attachments in the in-memory registry. Text also
-    // travels in `?q=` so a hard refresh / shareable URL still works
-    // for text-only prompts; attachments are registry-only because a
-    // multi-megabyte dataUrl can't ride a URL search param. The chat
-    // screen prefers the registry when both are present.
-    setPendingInitialMessage({
-      agentId: selectedAgentId,
-      text: input.text,
-      attachments: input.attachments,
-      createdAt: Date.now(),
-    })
    navigate(
      `/home/agents/${selectedAgentId}?q=${encodeURIComponent(input.text)}`,
    )
@@ -142,7 +110,7 @@ export const AgentCommandHome: FC = () => {
    setSelectedAgentId(agent.agentId)
  }

-  const selectedAgent = legacyAgents.find(
+  const selectedAgent = agents.find(
    (agent) => agent.agentId === selectedAgentId,
  )
  const selectedAgentReady = selectedAgent
@@ -150,15 +118,13 @@ export const AgentCommandHome: FC = () => {
    : false
  const selectedAgentStatus =
    selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
-  const selectedAgentName =
-    selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
-
-  const hasAgents = legacyAgents.length > 0
+  const selectedCard =
+    cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]

  return (
    <div className="min-h-full px-4 py-6">
      <div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
-        {hasAgents ? (
+        {cardData.length > 0 ? (
          <>
            <div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
              <div className="space-y-3">
@@ -174,7 +140,7 @@ export const AgentCommandHome: FC = () => {
              <div className="w-full max-w-3xl">
                <ConversationInput
                  variant="home"
-                  agents={legacyAgents}
+                  agents={agents}
                  selectedAgentId={selectedAgentId}
                  onSelectAgent={handleSelectAgent}
                  onSend={handleSend}
@@ -182,10 +148,10 @@ export const AgentCommandHome: FC = () => {
                  streaming={false}
                  disabled={!selectedAgentReady}
                  status={selectedAgentStatus}
-                  attachmentsEnabled={true}
+                  attachmentsEnabled={false}
                  placeholder={
                    selectedAgentReady
-                      ? `Ask ${selectedAgentName} to handle a task...`
+                      ? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
                      : 'Agent runtime is not running...'
                  }
                />
@@ -196,8 +162,7 @@ export const AgentCommandHome: FC = () => {

            <RecentThreads
              activeAgentId={selectedAgentId}
-              agents={orderedAgents}
-              adapters={adapters}
+              agents={cardData}
              onOpenAgents={() => navigate('/agents')}
              onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
            />
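
Review note: after this change the home composer hands off text only, through the `?q=` search param, and the in-memory pending-message registry is gone (the home input also sets `attachmentsEnabled={false}`). The following is a minimal sketch of that round trip, assuming hypothetical helper names `buildAgentChatPath` and `readInitialMessage`. In the diff the path is built inline in `handleSend` and read with `searchParams.get('q')`.

```typescript
// Hypothetical helper: same template string as handleSend in the diff.
function buildAgentChatPath(agentId: string, text: string): string {
  return `/home/agents/${agentId}?q=${encodeURIComponent(text)}`
}

// Hypothetical helper: stands in for `searchParams.get('q')` on the chat
// screen. URLSearchParams percent-decodes the value on read.
function readInitialMessage(path: string): string | null {
  const queryStart = path.indexOf('?')
  if (queryStart === -1) return null
  return new URLSearchParams(path.slice(queryStart + 1)).get('q')
}

const path = buildAgentChatPath('agent-1', 'summarize A & B?')
const restored = readInitialMessage(path)
console.log(path, restored)
```

Because `encodeURIComponent` escapes `&`, `?`, `+` and spaces, the prompt should survive the navigation unchanged. Very long prompts are still bounded by whatever URL length the router and browser tolerate.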
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRail.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRail.tsx
@@ -1,65 +0,0 @@
-import { type FC, useMemo } from 'react'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
-import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
-import { AgentRailRow } from './AgentRailRow'
-
-interface AgentRailProps {
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
-  activeAgentId: string
-  onSelectAgent: (agent: HarnessAgent) => void
-  onPinToggle: (agent: HarnessAgent, next: boolean) => void
-}
-
-/**
- * Left-column scrollable list of agents. The "Agents" label + back
- * button live in the shared top band above (so the rail header and
- * the chat header sit on a single aligned strip rather than as two
- * separately-sized headers per column). Sort matches `/agents`:
- * pinned-first → recency, so the rail doesn't reshuffle as turns
- * transition every 5 s.
- */
-export const AgentRail: FC<AgentRailProps> = ({
-  agents,
-  adapters,
-  activeAgentId,
-  onSelectAgent,
-  onPinToggle,
-}) => {
-  const adapterHealth = useMemo(() => {
-    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
-    for (const adapter of adapters) {
-      if (adapter.health) {
-        map.set(adapter.id, {
-          healthy: adapter.health.healthy,
-          reason: adapter.health.reason,
-        })
-      }
-    }
-    return map
-  }, [adapters])
-
-  const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])
-
-  return (
-    <aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
-      <div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
-        {ordered.map((agent) => (
-          <AgentRailRow
-            key={agent.id}
-            agent={agent}
-            active={agent.id === activeAgentId}
-            adapterHealth={adapterHealth.get(agent.adapter) ?? null}
-            onSelect={() => onSelectAgent(agent)}
-            onPinToggle={(next) => onPinToggle(agent, next)}
-          />
-        ))}
-      </div>
-    </aside>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRailRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentRailRow.tsx
@@ -1,102 +0,0 @@
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
-import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
-import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
-import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
-import { cn } from '@/lib/utils'
-
-interface AgentRailRowProps {
-  agent: HarnessAgent
-  active: boolean
-  adapterHealth: AgentAdapterHealth | null
-  onSelect: () => void
-  onPinToggle: (next: boolean) => void
-}
-
-/**
- * Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
- * down to the essentials that fit a ~280 px rail: tile + name + status
- * badge + pin star, with the adapter / model / reasoning chips on a
- * second line. Token totals, sparkline, last-message preview all stay
- * on the `/agents` page where rows are full-width.
- */
-export const AgentRailRow: FC<AgentRailRowProps> = ({
-  agent,
-  active,
-  adapterHealth,
-  onSelect,
-  onPinToggle,
-}) => {
-  const status = agent.status ?? 'unknown'
-  const lastUsedAt = agent.lastUsedAt ?? null
-  const pinned = agent.pinned ?? false
-  return (
-    <button
-      type="button"
-      onClick={onSelect}
-      className={cn(
-        'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
-        active
-          ? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
-          : 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
-      )}
-    >
-      <div className="flex min-w-0 items-start gap-3">
-        <AgentTile
-          adapter={agent.adapter}
-          status={status}
-          lastUsedAt={lastUsedAt}
-        />
-        <div className="min-w-0 flex-1">
-          <div className="flex items-center gap-1.5">
-            <span className="truncate font-semibold text-[14px] leading-5">
-              {agent.name}
-            </span>
-            {status === 'working' && (
-              <Badge
-                variant="secondary"
-                className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
-              >
-                Working
-              </Badge>
-            )}
-            {status === 'asleep' && (
-              <Badge
-                variant="outline"
-                className="h-5 px-1.5 text-[10px] text-muted-foreground"
-              >
-                Asleep
-              </Badge>
-            )}
-            {status === 'error' && (
-              <Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
-                Attention
-              </Badge>
-            )}
-            <div className="ml-auto">
-              <PinToggle pinned={pinned} onToggle={onPinToggle} />
-            </div>
-          </div>
-          <AgentSummaryChips
-            adapter={agent.adapter}
-            modelLabel={agent.modelId ?? null}
-            reasoningEffort={agent.reasoningEffort ?? null}
-            adapterHealth={adapterHealth}
-          />
-        </div>
-      </div>
-    </button>
-  )
-}
-
-/**
- * Tooltip-only label helper kept exported in case the tile row needs to
- * show "Codex agent" or similar in a future state. Inlined fallback for
- * the rare `unknown` adapter rendering path.
- */
-export function railRowAdapterLabel(agent: HarnessAgent): string {
-  return adapterLabel(agent.adapter)
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationHeader.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationHeader.tsx
@@ -1,179 +0,0 @@
-import { ArrowLeft, Home } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import { Button } from '@/components/ui/button'
-import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
-import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
-import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
-import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
-import { cn } from '@/lib/utils'
-
-interface ConversationHeaderProps {
-  agent: HarnessAgent | null
-  fallbackName: string
-  fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'unknown'
-  adapterHealth: AgentAdapterHealth | null
-  backLabel: string
-  backTarget: 'home' | 'page'
-  onGoHome: () => void
-  onPinToggle: (next: boolean) => void
-}
-
-/**
- * Strip above the chat. Mirrors the `/agents` row card's title row +
- * summary chips so the user gets adapter health, pin state, and status
- * at a glance — but adds the meta line (last used · lifetime tokens ·
- * queued) that's specific to this surface.
- *
- * The mobile `lg:hidden` Back button is preserved so the small-screen
- * collapse keeps a navigable header without a sidebar.
- */
-export const ConversationHeader: FC<ConversationHeaderProps> = ({
-  agent,
-  fallbackName,
-  fallbackAdapter,
-  adapterHealth,
-  backLabel,
-  backTarget,
-  onGoHome,
-  onPinToggle,
-}) => {
-  const BackIcon = backTarget === 'home' ? Home : ArrowLeft
-  const adapter = agent?.adapter ?? fallbackAdapter
-  const status: AgentLiveness = agent?.status ?? 'unknown'
-  const lastUsedAt = agent?.lastUsedAt ?? null
-  const pinned = agent?.pinned ?? false
-  const queueCount = agent?.queue?.length ?? 0
-  const tokens = agent?.tokens ?? null
-  const lifetimeTotal = tokens
-    ? tokens.cumulative.input + tokens.cumulative.output
-    : 0
-
-  const metaParts: string[] = []
-  if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
-  if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
-  if (queueCount > 0) {
-    metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
-  }
-
-  return (
-    <div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
-      <div className="flex min-w-0 items-center gap-3">
-        <Button
-          variant="ghost"
-          size="icon"
-          onClick={onGoHome}
-          className="size-8 shrink-0 rounded-xl lg:hidden"
-          title={backLabel}
-        >
-          <BackIcon className="size-4" />
-        </Button>
-        <div className="group min-w-0 flex-1">
-          <div className="flex items-center gap-2">
-            <span className="truncate font-semibold text-[15px] leading-6">
-              {agent?.name || fallbackName}
-            </span>
-            {agent ? (
-              <PinToggle pinned={pinned} onToggle={onPinToggle} />
-            ) : null}
-          </div>
-          <div className="mt-0.5 flex items-center gap-2">
-            <AgentSummaryChips
-              adapter={adapter}
-              modelLabel={agent?.modelId ?? null}
-              reasoningEffort={agent?.reasoningEffort ?? null}
-              adapterHealth={adapterHealth}
-            />
-          </div>
-        </div>
-      </div>
-      <div className="flex shrink-0 flex-col items-end gap-1">
-        <StatusPill
-          status={status}
-          hasActiveTurn={Boolean(agent?.activeTurnId)}
-        />
-        <div className="flex h-4 items-center text-[11px] text-muted-foreground">
-          <span className="truncate">
-            {metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
-          </span>
-        </div>
-      </div>
-    </div>
-  )
-}
-
-interface StatusPillProps {
-  status: AgentLiveness
-  hasActiveTurn: boolean
-}
-
-/**
- * Working / Asleep / Attention all get distinctive styling; idle keeps
- * the legacy emerald `Ready` pill so the default state is visually
- * calm. Defensive working: `idle + activeTurnId` falls through to the
- * working pill since the server says a turn is in flight.
- */
-const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
-  const effective: AgentLiveness =
-    status === 'idle' && hasActiveTurn ? 'working' : status
-
-  const base =
-    'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'
-
-  if (effective === 'working') {
-    return (
-      <Badge
-        variant="secondary"
-        className={cn(
-          base,
-          'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
-        )}
-      >
-        <span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
-        Working
-      </Badge>
-    )
-  }
-  if (effective === 'asleep') {
-    return (
-      <Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
-        <span className="size-1.5 rounded-full bg-muted-foreground/50" />
-        Asleep
-      </Badge>
-    )
-  }
-  if (effective === 'error') {
-    return (
-      <Badge
-        variant="destructive"
-        className={cn(base, 'border-destructive/30')}
-      >
-        <span className="size-1.5 rounded-full bg-destructive-foreground" />
-        Attention
-      </Badge>
-    )
-  }
-  if (effective === 'idle') {
-    return (
-      <Badge
-        variant="outline"
-        className={cn(
-          base,
-          'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
-        )}
-      >
-        <span className="size-1.5 rounded-full bg-emerald-500" />
-        Ready
-      </Badge>
-    )
-  }
-  return (
-    <Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
-      <span className="size-1.5 rounded-full bg-muted-foreground/30" />
-      Setup
-    </Badge>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
@@ -54,40 +54,25 @@ interface ConversationInputProps {
  placeholder?: string
  attachmentsEnabled?: boolean
  variant?: 'home' | 'conversation'
-  /**
-   * When set, a Stop button surfaces to the left of the voice mic
-   * while `streaming === true`. Click cancels the active turn
-   * server-side via the chat-cancel endpoint. Absent → no Stop
-   * button (legacy behaviour for the home composer).
-   */
-  onStop?: () => void
 }

 function InputActionButton({
  disabled,
  onClick,
  streaming,
-  hasContent,
 }: {
  disabled: boolean
  onClick: () => void
  streaming: boolean
-  hasContent: boolean
 }) {
-  // Show the spinner while streaming only when there's nothing to
-  // send — once the user types something, the icon flips back to the
-  // paper-plane so it reads as "queue this message" instead of
-  // "still working".
-  const showSpinner = streaming && !hasContent
  return (
    <Button
      onClick={onClick}
      size="icon"
      disabled={disabled}
-      title={streaming && hasContent ? 'Queue message' : undefined}
      className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
    >
-      {showSpinner ? (
+      {streaming ? (
        <Loader2 className="h-5 w-5 animate-spin" />
      ) : (
        <ArrowRight className="h-5 w-5" />
@@ -96,22 +81,6 @@ function InputActionButton({
  )
 }

-function StopButton({ onStop }: { onStop: () => void }) {
-  return (
-    <Button
-      type="button"
-      size="icon"
-      variant="ghost"
-      onClick={onStop}
-      title="Stop current turn — queued messages will start next."
-      aria-label="Stop current turn"
-      className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
-    >
-      <Square className="h-3.5 w-3.5 fill-current" />
-    </Button>
-  )
-}
-
 function VoiceButton({
  isRecording,
  isTranscribing,
@@ -330,7 +299,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  placeholder,
  attachmentsEnabled = true,
  variant = 'conversation',
-  onStop,
 }) => {
  const [input, setInput] = useState('')
  const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -411,17 +379,10 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  }

  const hasContent = input.trim().length > 0 || attachments.length > 0
-  // Queue-aware composers (the conversation panel passes `onStop`)
-  // accept input while streaming — the parent decides whether the
-  // submission opens a new turn or enqueues onto the active one.
-  // Surfaces without a Stop hook (home) keep the legacy behaviour
-  // and block input until the current turn finishes.
-  const queueAware = Boolean(onStop)

  const handleSend = () => {
    const text = input.trim()
-    if (disabled || isStaging) return
-    if (streaming && !queueAware) return
+    if (disabled || isStaging || streaming) return
    if (!text && attachments.length === 0) return
    onSend({ text, attachments })
    setInput('')
@@ -551,7 +512,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              )}
            />
          </div>
-          {streaming && onStop ? <StopButton onStop={onStop} /> : null}
          <VoiceButton
            isRecording={voice.isRecording}
            isTranscribing={voice.isTranscribing}
@@ -569,13 +529,12 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              !!disabled ||
              voice.isRecording ||
              voice.isTranscribing ||
-              (streaming && !queueAware)
+              streaming
            }
            onClick={handleSend}
            // Spinner stays the user-facing "agent is busy" hint; with the
            // queue active we still spin while a turn is in flight.
            streaming={streaming}
-            hasContent={hasContent}
          />
        </div>
        {voice.error ? (
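
Review note: with `onStop` and the queue-aware path removed, the composer blocks sends on every surface while a turn is streaming. Below is a minimal sketch of the resulting guard, assuming a hypothetical `SendGuardState` shape and `canSend` name. The conditions mirror the two early returns in `handleSend` after this change.

```typescript
// Hypothetical state bag; in the component these are props and local state.
interface SendGuardState {
  disabled: boolean
  isStaging: boolean
  streaming: boolean
  text: string
  attachmentCount: number
}

// Returns true only when handleSend would reach `onSend(...)`.
function canSend(state: SendGuardState): boolean {
  // Any of these blocks the send outright, including an in-flight turn.
  if (state.disabled || state.isStaging || state.streaming) return false
  // Otherwise something must be present: trimmed text or an attachment.
  return state.text.trim().length > 0 || state.attachmentCount > 0
}

const idle: SendGuardState = {
  disabled: false,
  isStaging: false,
  streaming: false,
  text: 'hello',
  attachmentCount: 0,
}
console.log(canSend(idle), canSend({ ...idle, streaming: true }))
```

Typing while a turn streams is therefore no longer a way to queue a follow-up. The text stays in the input until streaming ends.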
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
@@ -1,243 +0,0 @@
-import { Quote, TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
-import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
-import type {
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
-import {
-  firstNonBlankLine,
-  truncate,
-} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
-import { cn } from '@/lib/utils'
-
-interface HomeAgentCardProps {
-  agent: HarnessAgent
-  adapter: HarnessAgentAdapter | 'unknown'
-  /** Per-adapter health snapshot, shared across cards rendering the
-   *  same adapter. `null` when the /adapters response hasn't surfaced
-   *  health yet (we treat that as healthy until proven otherwise). */
-  adapterHealth: HarnessAdapterHealth | null
-  /** Highlights the card with an accent ring; tells the user which
-   *  agent the conversation input is bound to. */
-  active?: boolean
-  onClick: () => void
-}
-
-const PREVIEW_CHARS = 100
-
-/**
- * Grid-shaped card for the /home Recent agents section. Composition
- * mirrors the rail's `AgentRowCard` but the layout is a vertical
- * column sized for a 1/3-width tile rather than a full-width row.
- *
- * Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
- * `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
- * inline `Unavailable` chip pattern so the visual language is
- * continuous between rail and grid.
- */
-export const HomeAgentCard: FC<HomeAgentCardProps> = ({
-  agent,
-  adapter,
-  adapterHealth,
-  active,
-  onClick,
-}) => {
-  const status = agent.status ?? 'unknown'
-  const lastUsedAt = agent.lastUsedAt ?? null
-  const isWorking = status === 'working'
-  const isAsleep = status === 'asleep'
-  const isError = status === 'error'
-  const hasActiveTurn = Boolean(agent.activeTurnId)
-
-  return (
-    <button
-      type="button"
-      onClick={onClick}
-      className={cn(
-        'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
-        active && 'ring-1 ring-[var(--accent-orange)]/30',
-        isWorking
-          ? 'border-[var(--accent-orange)]/40'
-          : isError
-            ? 'border-destructive/30'
-            : 'border-border/60 hover:border-[var(--accent-orange)]/30',
-      )}
-    >
-      <div className="flex items-start gap-3">
-        <AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
-        <div className="min-w-0 flex-1">
-          <div className="flex items-center gap-1.5">
-            <span className="truncate font-semibold text-sm">
-              {displayName(agent)}
-            </span>
-            {isWorking && (
-              <Badge
-                variant="secondary"
-                className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
-              >
-                Working
-              </Badge>
-            )}
-          </div>
-          <SummaryLine
-            adapter={adapter}
-            modelId={agent.modelId ?? null}
-            reasoningEffort={agent.reasoningEffort ?? null}
-            adapterHealth={adapterHealth}
-          />
-        </div>
-      </div>
-
-      <LastMessage message={agent.lastUserMessage ?? null} />
-
-      <div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
-        <span>{statusFootnote(status, lastUsedAt)}</span>
-        {hasActiveTurn ? (
-          <ResumeChip />
-        ) : isAsleep ? (
-          <Badge variant="outline" className="text-muted-foreground">
-            Asleep
-          </Badge>
-        ) : isError ? (
-          <ErrorChip lastError={agent.lastError ?? null} />
-        ) : null}
-      </div>
-    </button>
-  )
-}
-
-const SummaryLine: FC<{
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelId: string | null
-  reasoningEffort: string | null
-  adapterHealth: HarnessAdapterHealth | null
-}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelId) parts.push(modelId)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth?.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
-
-const LastMessage: FC<{ message: string | null }> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  return (
-    <p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="line-clamp-2">
-        {truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
-      </span>
-    </p>
-  )
-}
-
-const ResumeChip: FC = () => (
-  <span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
-    <span className="relative flex size-1.5">
-      <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-      <span className="relative inline-flex size-1.5 rounded-full bg-white" />
-    </span>
-    Resume
-  </span>
-)
-
-const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
-  if (!lastError) {
-    return <Badge variant="destructive">Attention</Badge>
-  }
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <Badge variant="destructive" className="cursor-default">
-          Attention
-        </Badge>
-      </HoverCardTrigger>
-      <HoverCardContent
-        side="left"
-        className="max-w-xs whitespace-pre-wrap font-mono text-xs"
-      >
-        {lastError}
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
-
-/**
- * Footer left side: relative time on every state EXCEPT working,
- * which shows `now` (the dot is already pulsing — restating it as
- * "Working" would duplicate the pill in the title row).
- */
-function statusFootnote(
-  status: AgentLiveness,
-  lastUsedAt: number | null,
-): string {
-  if (status === 'working') return 'now'
-  return formatRelativeTime(lastUsedAt)
-}
-
-const UUID_PATTERN =
-  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-const OC_UUID_PATTERN =
-  /^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-
-function displayName(agent: HarnessAgent): string {
-  const name = agent.name?.trim()
-  const id = agent.id
-  if (!name || name === id) {
-    if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
-    if (UUID_PATTERN.test(id)) return id.slice(0, 8)
-    return id
-  }
-  return name
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
@@ -1,94 +0,0 @@
-import { ListPlus, X } from 'lucide-react'
-import type { FC } from 'react'
-import {
-  Queue,
-  QueueItem,
-  QueueItemAction,
-  QueueItemActions,
-  QueueItemAttachment,
-  QueueItemContent,
-  QueueItemFile,
-  QueueItemImage,
-  QueueList,
-  QueueSection,
-  QueueSectionContent,
-  QueueSectionLabel,
-  QueueSectionTrigger,
-} from '@/components/ai-elements/queue'
-import type {
-  HarnessQueuedMessage,
-  HarnessQueuedMessageAttachment,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-
-interface QueuePanelProps {
-  queue: HarnessQueuedMessage[]
-  onRemove: (messageId: string) => void
-}
-
-/**
- * Renders the agent's pending message queue using the shared AI
- * Elements `Queue` primitives. Callers may gate render on
- * `queue.length > 0`, but an empty queue also returns null here, so
- * the panel disappears cleanly between turns.
- */
-export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
-  if (queue.length === 0) return null
-  return (
-    <Queue>
-      <QueueSection>
-        <QueueSectionTrigger>
-          <QueueSectionLabel
-            count={queue.length}
-            label={queue.length === 1 ? 'queued message' : 'queued messages'}
-            icon={<ListPlus className="size-3.5" />}
-          />
-        </QueueSectionTrigger>
-        <QueueSectionContent>
-          <QueueList>
-            {queue.map((entry) => (
-              <QueueItem key={entry.id}>
-                <div className="flex items-center gap-2">
-                  <QueueItemContent>
-                    {firstNonBlankLine(entry.message)}
-                  </QueueItemContent>
-                  <QueueItemActions>
-                    <QueueItemAction
-                      aria-label="Remove from queue"
-                      onClick={() => onRemove(entry.id)}
-                    >
-                      <X className="size-3" />
-                    </QueueItemAction>
-                  </QueueItemActions>
-                </div>
-                {entry.attachments && entry.attachments.length > 0 ? (
-                  <QueueItemAttachment>
-                    {entry.attachments.map((attachment, idx) =>
-                      renderAttachment(entry.id, attachment, idx),
-                    )}
-                  </QueueItemAttachment>
-                ) : null}
-              </QueueItem>
-            ))}
-          </QueueList>
-        </QueueSectionContent>
-      </QueueSection>
-    </Queue>
-  )
-}
-
-function renderAttachment(
-  messageId: string,
-  attachment: HarnessQueuedMessageAttachment,
-  idx: number,
-) {
-  if (attachment.mediaType.startsWith('image/')) {
-    const src = `data:${attachment.mediaType};base64,${attachment.data}`
-    return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
-  }
-  return (
-    <QueueItemFile key={`${messageId}-${idx}`}>
-      {attachment.mediaType}
-    </QueueItemFile>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
@@ -1,69 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-import { orderHomeAgents } from './home-agent-card.helpers'
-
-function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
-  return {
-    id: overrides.id ?? 'agent-x',
-    name: overrides.name ?? overrides.id ?? 'agent-x',
-    adapter: overrides.adapter ?? 'codex',
-    permissionMode: 'approve-all',
-    sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
-    createdAt: 1000,
-    updatedAt: 1000,
-    ...overrides,
-  }
-}
-
-describe('orderHomeAgents', () => {
-  it('places active-turn agents before everyone else', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'a', lastUsedAt: 5000 }),
-      agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
-      agent({ id: 'c', lastUsedAt: 7000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
-  })
-
-  it('orders non-active agents by lastUsedAt desc', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'old', lastUsedAt: 1000 }),
-      agent({ id: 'new', lastUsedAt: 9000 }),
-      agent({ id: 'mid', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
-  })
-
-  it('puts the gateway `main` seed agent above other never-used agents', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
-  })
-
-  it('sends never-used agents to the bottom even when `main` is among them', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'used', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
-  })
-
-  it('does NOT sort by pinned — pinned agents are treated like any other', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
-      agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
-  })
-
-  it('falls back to id-stable ordering when lastUsedAt ties', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'b', lastUsedAt: 5000 }),
-      agent({ id: 'a', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
@@ -1,42 +0,0 @@
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-
-/**
- * Order for the /home Recent agents grid.
- *
- * 1. Active turn first — agents mid-turn float to the top so the
- *    Resume affordance is the first thing the user sees on /home.
- * 2. Recency (`lastUsedAt` desc); never-used agents sort last.
- * 3. Within the never-used group, the protected gateway-side `main`
- *    agent goes first on a fresh install (mirrors the rail).
- * 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
- *    every 5-second poll.
- *
- * Pin is NOT a sort key. The home grid is action-oriented and trusts
- * recency + active-turn to surface the right agent; pinning is an
- * organisation tool that lives on the rail at /agents.
- */
-export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
-  return [...agents].sort((a, b) => {
-    const aActive = a.activeTurnId != null
-    const bActive = b.activeTurnId != null
-    if (aActive !== bActive) return aActive ? -1 : 1
-
-    // Recency wins outright. Never-used agents (`lastUsedAt == null`)
-    // both fall to the same `-Infinity` bucket and the seed/id rules
-    // below decide their order — but a used agent always beats any
-    // never-used agent regardless of id.
-    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    if (aValue !== bValue) return bValue - aValue
-
-    // Inside the never-used (or exact-tie) group: pin the gateway
-    // `main` seed to the top of the group on a fresh install, then
-    // fall back to id-stable order so the grid doesn't reshuffle on
-    // every poll.
-    const aSeed = a.id === 'main' && a.lastUsedAt == null
-    const bSeed = b.id === 'main' && b.lastUsedAt == null
-    if (aSeed !== bSeed) return aSeed ? -1 : 1
-
-    return a.id.localeCompare(b.id)
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/pending-initial-message.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/pending-initial-message.test.ts
@@ -1,109 +0,0 @@
-import { afterEach, describe, expect, it } from 'bun:test'
-import type { StagedAttachment } from '@/lib/attachments'
-import {
-  consumePendingInitialMessage,
-  peekPendingInitialMessage,
-  setPendingInitialMessage,
-} from './pending-initial-message'
-
-function makeAttachment(id: string): StagedAttachment {
-  return {
-    id,
-    kind: 'image',
-    mediaType: 'image/png',
-    name: `${id}.png`,
-    dataUrl: `data:image/png;base64,${id}`,
-    payload: {
-      kind: 'image',
-      mediaType: 'image/png',
-      name: `${id}.png`,
-      dataUrl: `data:image/png;base64,${id}`,
-    },
-  }
-}
-
-afterEach(() => {
-  // Module-scope state survives across `it` blocks, so clear it here.
-  // This first call only clears an entry whose agentId is 'drain'.
-  consumePendingInitialMessage('drain')
-  // Any other leftover is cleared by consuming with its matching id.
-  const leftover = peekPendingInitialMessage()
-  if (leftover) consumePendingInitialMessage(leftover.agentId)
-})
-
-describe('pending-initial-message', () => {
-  it('consume returns the payload set for the same agentId', () => {
-    setPendingInitialMessage({
-      agentId: 'agent-a',
-      text: 'hello',
-      attachments: [makeAttachment('one')],
-      createdAt: Date.now(),
-    })
-    const result = consumePendingInitialMessage('agent-a')
-    expect(result?.text).toBe('hello')
-    expect(result?.attachments).toHaveLength(1)
-    expect(result?.attachments[0]?.id).toBe('one')
-  })
-
-  it('consume is destructive — second call returns null', () => {
-    setPendingInitialMessage({
-      agentId: 'agent-a',
-      text: 'hello',
-      attachments: [],
-      createdAt: Date.now(),
-    })
-    expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
-    expect(consumePendingInitialMessage('agent-a')).toBeNull()
-  })
-
-  it('consume returns null and preserves entry when agentId differs', () => {
-    setPendingInitialMessage({
-      agentId: 'agent-a',
-      text: 'hello',
-      attachments: [],
-      createdAt: Date.now(),
-    })
-    expect(consumePendingInitialMessage('agent-b')).toBeNull()
-    expect(peekPendingInitialMessage()?.agentId).toBe('agent-a')
-    expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
-  })
-
-  it('returns null for entries older than the TTL', () => {
-    setPendingInitialMessage({
-      agentId: 'agent-a',
-      text: 'old',
-      attachments: [],
-      createdAt: Date.now() - 11_000, // older than 10 s TTL
-    })
-    expect(consumePendingInitialMessage('agent-a')).toBeNull()
-  })
-
-  it('replaces a previous pending entry when set is called again', () => {
-    setPendingInitialMessage({
-      agentId: 'agent-a',
-      text: 'first',
-      attachments: [],
-      createdAt: Date.now(),
-    })
-    setPendingInitialMessage({
-      agentId: 'agent-b',
-      text: 'second',
-      attachments: [makeAttachment('two')],
-      createdAt: Date.now(),
-    })
-    expect(consumePendingInitialMessage('agent-a')).toBeNull()
-    const result = consumePendingInitialMessage('agent-b')
-    expect(result?.text).toBe('second')
-    expect(result?.attachments[0]?.id).toBe('two')
-  })
-
-  it('no-ops when set is called with empty agentId', () => {
-    setPendingInitialMessage({
-      agentId: '',
-      text: 'oops',
-      attachments: [],
-      createdAt: Date.now(),
-    })
-    expect(peekPendingInitialMessage()).toBeNull()
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/pending-initial-message.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/pending-initial-message.ts
@@ -1,81 +0,0 @@
-import type { StagedAttachment } from '@/lib/attachments'
-
-/**
- * Same-tab in-memory handoff between the `/home` composer and the
- * chat screen at `/home/agents/:agentId`. URL search params (`?q=`)
- * carry the text fine, but cannot carry binary attachments — a multi-
- * megabyte image dataUrl would explode URL length limits and round-
- * trip badly. This module is the rich-data side channel for the same
- * navigation: the composer writes here, the chat screen reads here on
- * mount.
- *
- * Intentionally module-scope. Same render tree, same tab — no need
- * for sessionStorage (which would force JSON-serialising the dataUrls
- * and re-parsing on the read side). Cross-tab handoff is out of
- * scope: the user typing at home in tab A and switching to tab B's
- * chat would surface an empty registry there, which is the correct
- * behaviour.
- */
-
-export interface PendingInitialMessage {
-  agentId: string
-  text: string
-  attachments: StagedAttachment[]
-  createdAt: number
-}
-
-/**
- * 10s TTL on the entry. A stale entry from a back-button journey
- * shouldn't fire on a future visit; if real-world latency makes 10s
- * too tight under slow harness boot, bump but never make it
- * indefinite.
- */
-const PENDING_TTL_MS = 10_000
-
-let pending: PendingInitialMessage | null = null
-let pendingTimer: ReturnType<typeof setTimeout> | null = null
-
-function clearPending(): void {
-  pending = null
-  if (pendingTimer !== null) {
-    clearTimeout(pendingTimer)
-    pendingTimer = null
-  }
-}
-
-export function setPendingInitialMessage(payload: PendingInitialMessage): void {
-  // Defensive: the home composer should never call this without an
-  // agent selected. If it somehow does, no-op rather than holding a
-  // payload we can't route.
-  if (!payload.agentId) return
-  clearPending()
-  pending = payload
-  pendingTimer = setTimeout(clearPending, PENDING_TTL_MS)
-}
-
-/**
- * Destructive read. Returns the entry only if `agentId` matches and
- * the entry is fresh; clears the entry on success so Strict-Mode
- * double-invokes can't double-send.
- */
-export function consumePendingInitialMessage(
-  agentId: string,
-): PendingInitialMessage | null {
-  if (!pending) return null
-  if (pending.agentId !== agentId) return null
-  if (Date.now() - pending.createdAt >= PENDING_TTL_MS) {
-    clearPending()
-    return null
-  }
-  const entry = pending
-  clearPending()
-  return entry
-}
-
-/**
- * Non-mutating read for tests. Production code should never need this
- * — use `consume` and own the lifecycle.
- */
-export function peekPendingInitialMessage(): PendingInitialMessage | null {
-  return pending
-}
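The handoff contract in the removed module above (destructive read, agent-id match, TTL) can be sketched in isolation. This is a minimal illustration, not the module itself: `createOneShotSlot`, `Handoff`, and the injectable `now` clock are hypothetical names introduced here so that expiry can be shown without real timers. Unlike the module, the sketch also empties the slot when the entry turns out to be stale on read, and it has no background `setTimeout`.

```typescript
// Hypothetical sketch: a one-shot, TTL-bound handoff slot with an
// injectable clock. Names are illustrative, not taken from the module.
interface Handoff<T> {
  agentId: string
  payload: T
  createdAt: number
}

function createOneShotSlot<T>(ttlMs: number, now: () => number = Date.now) {
  let pending: Handoff<T> | null = null
  return {
    set(agentId: string, payload: T): void {
      if (!agentId) return // nothing to route to, so hold nothing
      pending = { agentId, payload, createdAt: now() }
    },
    consume(agentId: string): T | null {
      // A mismatched id leaves the entry in place for its real owner.
      if (!pending || pending.agentId !== agentId) return null
      const entry = pending
      pending = null // destructive read: fresh or stale, the slot empties
      return now() - entry.createdAt >= ttlMs ? null : entry.payload
    },
  }
}

// Usage with a fake clock so expiry is deterministic.
let clock = 0
const slot = createOneShotSlot<string>(10_000, () => clock)
slot.set('agent-a', 'hello')
const wrongAgent = slot.consume('agent-b') // null, entry preserved
const firstRead = slot.consume('agent-a') // 'hello'
const secondRead = slot.consume('agent-a') // null, already consumed
slot.set('agent-a', 'late')
clock = 10_000
const expired = slot.consume('agent-a') // null, TTL elapsed
```

Emptying the slot on the first matching read is what keeps a double-invoked mount effect from sending the same message twice.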
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
@@ -0,0 +1,53 @@
+import {
+  type AgentEntry,
+  getModelDisplayName,
+  type OpenClawStatus,
+} from '@/entrypoints/app/agents/useOpenClaw'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import type { AgentOverview } from './useAgentDashboard'
+
+function resolveAgentStatus(
+  gatewayStatus: OpenClawStatus['status'] | undefined,
+  liveStatus: AgentOverview['status'] | undefined,
+): AgentCardData['status'] {
+  // Gateway-level errors take precedence
+  if (gatewayStatus === 'error') return 'error'
+  if (gatewayStatus === 'starting') return 'working'
+
+  // Per-agent live status from the WS observer
+  if (liveStatus === 'working') return 'working'
+  if (liveStatus === 'error') return 'error'
+
+  return 'idle'
+}
+
+/**
+ * Build agent card display data by merging the raw agent entries from
+ * the gateway with enriched overview data from the dashboard API.
+ *
+ * Pure function — no hooks, no IndexedDB, no async.
+ */
+export function buildAgentCardData(
+  agents: AgentEntry[],
+  status: OpenClawStatus['status'] | undefined,
+  dashboard: AgentOverview[] | undefined,
+): AgentCardData[] {
+  return agents.map((agent) => {
+    const overview = dashboard?.find((d) => d.agentId === agent.agentId)
+
+    return {
+      agentId: agent.agentId,
+      name: agent.name,
+      model: getModelDisplayName(agent.model),
+      status:
+        agent.source === 'agent-harness'
+          ? 'idle'
+          : resolveAgentStatus(status, overview?.status),
+      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
+      lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
+      activitySummary: overview?.activitySummary ?? undefined,
+      currentTool: overview?.currentTool ?? undefined,
+      costUsd: overview?.totalCostUsd ?? undefined,
+    }
+  })
+}
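The precedence in `resolveAgentStatus` above has one non-obvious consequence: a gateway that is still `starting` masks a per-agent `error`. The sketch below restates the same branch order with simplified types to make that visible. The `GatewayStatus` and `CardStatus` unions are assumptions standing in for `OpenClawStatus['status']` and `AgentCardData['status']`, whose real members are not shown in this diff, and `resolveStatus` is a hypothetical name.

```typescript
// Assumed stand-ins for the imported types; the real unions may differ.
type GatewayStatus = 'starting' | 'running' | 'error'
type LiveStatus = 'working' | 'idle' | 'error' | 'unknown'
type CardStatus = 'working' | 'idle' | 'error'

// Same branch order as the function in the diff: gateway first, then
// the per-agent live status, then the `idle` fallback.
function resolveStatus(
  gateway: GatewayStatus | undefined,
  live: LiveStatus | undefined,
): CardStatus {
  if (gateway === 'error') return 'error'
  if (gateway === 'starting') return 'working'
  if (live === 'working') return 'working'
  if (live === 'error') return 'error'
  return 'idle'
}

const gatewayDown = resolveStatus('error', 'working') // 'error'
const booting = resolveStatus('starting', 'error') // 'working'
const liveError = resolveStatus('running', 'error') // 'error'
const unknownLive = resolveStatus('running', 'unknown') // 'idle'
const noData = resolveStatus(undefined, undefined) // 'idle'
```

Note that `unknown` collapses to `idle` on the card, so a card cannot distinguish "no live data yet" from a genuinely idle agent.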
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
@@ -36,15 +36,6 @@ interface UseAgentConversationOptions {
  history?: OpenClawChatHistoryMessage[]
  onComplete?: () => void
  onSessionKeyChange?: (sessionKey: string) => void
-  /**
-   * Server-side active turn id, surfaced via the listing query. When
-   * this changes from null/<id> to a different non-null id while we
-   * aren't already streaming (e.g. the server just popped a queued
-   * message and started a new turn), the hook reattaches via
-   * /chat/active so the chat panel picks up the live stream without
-   * waiting for a remount.
-   */
-  activeTurnId?: string | null
 }

 export function useAgentConversation(
@@ -220,46 +211,31 @@ export function useAgentConversation(
  }
  processEventRef.current = processAgentHarnessStreamEvent

-  const activeTurnIdDep = options.activeTurnId ?? null
-
-  // On mount, on agent change, and whenever the listing reports a
-  // *new* active turn id, check whether the server has an in-flight
-  // turn for this agent and reattach to it. This catches three
-  // cases at once: the chat resilience flow (tab close/reopen),
-  // navigation between agents, AND queue drain (the server starts a
-  // new turn from a queued message → activeTurnId flips → attach).
+  // On mount (and whenever the agent changes), check whether the
+  // server has an in-flight turn for this agent and reattach to it.
+  // This is what makes the chat resilient across tab close/reopen,
+  // refresh, and navigation: the runtime call kept running on the
+  // server while we were away. Effect only depends on `agentId` —
+  // the event handler is read off a ref so this doesn't re-subscribe
+  // every render.
  useEffect(() => {
    let cancelled = false
    const abortController = new AbortController()
-    // Reference the dep inside the body so biome's exhaustive-deps
-    // rule sees it consumed; the value is just an "any non-null
-    // active turn id" trigger — the actual id we attach to comes
-    // from the fresh fetchActiveHarnessTurn call below.
-    void activeTurnIdDep

    const attemptResume = async () => {
-      // Track whether *we* started a stream in this run. When the
-      // early-return paths fire (no active turn, or a `send()` /
-      // earlier resume already owns `streamAbortRef`), the finally
-      // block must NOT touch streaming/turnIdRef/lastSeqRef —
-      // otherwise we clobber the in-flight stream's state and the
-      // Stop button drops out mid-turn while events keep arriving.
-      let weStartedStream = false
      try {
        const active = await fetchActiveHarnessTurn(agentId)
        if (cancelled || !active || active.status !== 'running') return
-        if (streamAbortRef.current) return // someone else already owns the stream
+        if (streamAbortRef.current) return // a fresh send already in flight

        // Stage a placeholder turn so the streamed events have a row
-        // to render into. The server now persists the kicking-off
-        // prompt on the active turn, so we render it as the user
-        // bubble immediately — no empty-bubble flicker when a queued
-        // message starts running.
+        // to render into. We don't have the user message text on
+        // resume; the assistant turn is what we're catching up on.
        setTurns((prev) => [
          ...prev,
          {
            id: crypto.randomUUID(),
-            userText: active.prompt ?? '',
+            userText: '',
            parts: [],
            done: false,
            timestamp: active.startedAt,
@@ -271,7 +247,6 @@ export function useAgentConversation(
        lastSeqRef.current = null
        streamAbortRef.current = abortController
        setStreaming(true)
-        weStartedStream = true

        const response = await attachToHarnessTurn(agentId, {
          turnId: active.turnId,
@@ -290,20 +265,10 @@ export function useAgentConversation(
        // Resume is best-effort; transient errors fall back to the
        // user starting a new turn manually.
      } finally {
-        // Always release `streamAbortRef` if we owned it — even when
-        // the effect was cancelled mid-stream (a listing poll
-        // captured the next queue-drain turn id, for example). If we
-        // don't, the next effect run hits `if (streamAbortRef.current)
-        // return` against our now-aborted controller and never
-        // reattaches, leaving `streaming === true` with no live stream.
-        if (weStartedStream && streamAbortRef.current === abortController) {
-          streamAbortRef.current = null
-        }
-        // The other state (streaming flag, turn id, lastSeq) is the
-        // *current run's* lifecycle: only reset it on a clean exit.
-        // When `cancelled` is true the next run will set these
-        // itself, so resetting here would only cause a brief flicker.
-        if (!cancelled && weStartedStream) {
+        if (!cancelled) {
+          if (streamAbortRef.current === abortController) {
+            streamAbortRef.current = null
+          }
          turnIdRef.current = null
          lastSeqRef.current = null
          setStreaming(false)
@@ -316,7 +281,7 @@ export function useAgentConversation(
      cancelled = true
      abortController.abort()
    }
-  }, [agentId, activeTurnIdDep])
+  }, [agentId])

  const send = async (input: string | SendInput) => {
    const normalized: SendInput =
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
@@ -0,0 +1,95 @@
+import { useQuery, useQueryClient } from '@tanstack/react-query'
+import { useEffect } from 'react'
+import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
+
+export interface AgentOverview {
+  agentId: string
+  status: 'working' | 'idle' | 'error' | 'unknown'
+  latestMessage: string | null
+  latestMessageAt: number | null
+  activitySummary: string | null
+  currentTool: string | null
+  totalCostUsd: number
+  sessionCount: number
+}
+
+export interface DashboardResponse {
+  agents: AgentOverview[]
+  summary: {
+    totalAgents: number
+    totalCostUsd: number
+  }
+}
+
+interface StatusEvent {
+  agentId: string
+  status: AgentOverview['status']
+  currentTool: string | null
+  error: string | null
+  timestamp: number
+}
+
+const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
+
+export function useAgentDashboard(enabled: boolean) {
+  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+  const ready = enabled && Boolean(baseUrl) && !urlLoading
+
+  // Initial data load. No `refetchInterval` is set here, so later refreshes rely on the QueryClient's refetch settings.
+  const query = useQuery<DashboardResponse>({
+    queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
+    queryFn: async () => {
+      const url = new URL('/claw/dashboard', baseUrl as string)
+      const response = await fetch(url.toString())
+      if (!response.ok) throw new Error('Failed to fetch dashboard')
+      return response.json()
+    },
+    enabled: ready,
+  })
+
+  // SSE subscription for real-time status patches
+  useEffect(() => {
+    if (!ready || !baseUrl) return
+
+    const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
+    const eventSource = new EventSource(streamUrl.toString())
+
+    eventSource.addEventListener('snapshot', (event) => {
+      try {
+        const dashboard = JSON.parse(event.data) as DashboardResponse
+        queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
+      } catch {} // Malformed snapshot payload: skip it and keep the cached dashboard.
+    })
+
+    eventSource.addEventListener('status', (event) => {
+      try {
+        const status = JSON.parse(event.data) as StatusEvent
+        queryClient.setQueryData<DashboardResponse>(
+          [...DASHBOARD_QUERY_KEY, baseUrl],
+          (prev) => {
+            if (!prev) return prev
+            return {
+              ...prev,
+              agents: prev.agents.map((agent) =>
+                agent.agentId === status.agentId
+                  ? {
+                      ...agent,
+                      status: status.status,
+                      currentTool: status.currentTool,
+                    }
+                  : agent,
+              ),
+            }
+          },
+        )
+      } catch {} // Malformed status payload: skip it and keep the cached dashboard.
+    })
+
+    return () => {
+      eventSource.close()
+    }
+  }, [ready, baseUrl, queryClient])
+
+  return query
+}
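The `status` listener in the hook above patches a single agent inside the cached dashboard. One way to make that logic testable without an `EventSource` or a query client is to pull the updater out as a pure function. The sketch below assumes trimmed copies of the hook's types, and `applyStatusEvent`, `OverviewLite`, `DashboardLite`, and `StatusPatch` are hypothetical names, not part of the diff.

```typescript
// Sketch only: trimmed copies of the hook's types.
type LiveStatus = 'working' | 'idle' | 'error' | 'unknown'

interface OverviewLite {
  agentId: string
  status: LiveStatus
  currentTool: string | null
}

interface DashboardLite {
  agents: OverviewLite[]
}

interface StatusPatch {
  agentId: string
  status: LiveStatus
  currentTool: string | null
}

// Returns a new dashboard with only the matching agent replaced.
function applyStatusEvent(
  prev: DashboardLite | undefined,
  patch: StatusPatch,
): DashboardLite | undefined {
  if (!prev) return prev // no snapshot cached yet, nothing to patch
  return {
    ...prev,
    agents: prev.agents.map((agent) =>
      agent.agentId === patch.agentId
        ? { ...agent, status: patch.status, currentTool: patch.currentTool }
        : agent,
    ),
  }
}

const before: DashboardLite = {
  agents: [
    { agentId: 'a', status: 'idle', currentTool: null },
    { agentId: 'b', status: 'idle', currentTool: null },
  ],
}
const after = applyStatusEvent(before, {
  agentId: 'b',
  status: 'working',
  currentTool: 'browser',
})
const withoutSnapshot = applyStatusEvent(undefined, {
  agentId: 'b',
  status: 'working',
  currentTool: null,
})
```

Untouched agents keep their object identity, so memoized rows for those agents have no reason to re-render, and a status event that arrives before the first snapshot is dropped rather than creating a partial dashboard.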
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
@@ -2,75 +2,67 @@ import { Loader2 } from 'lucide-react'
 import { type FC, useMemo } from 'react'
 import { AgentRowCard } from './AgentRowCard'
 import { AgentsEmptyState } from './AgentsEmptyState'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from './agent-harness-types'
-import type {
-  AgentAdapterHealth,
-  AgentRowData,
-} from './agent-row/agent-row.types'
-import { compareAgentsByPinThenRecency } from './agents-list-order'
+import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
 import type { AgentListItem } from './agents-page-types'
 import type { AgentLiveness } from './LivenessDot'

 interface AgentListProps {
  agents: AgentListItem[]
-  /** Optional per-agent activity metadata, keyed by `agentId`. */
+  /**
+   * Optional per-agent activity metadata. Keyed by `agentId`. Missing
+   * entries fall back to status='unknown' / lastUsedAt=null and the
+   * row renders an "unknown" dot. The server will populate this once
+   * the activity tracker ships; the page works without it.
+   */
  activity?: Record<
    string,
    { status: AgentLiveness; lastUsedAt: number | null }
  >
-  /** Lookup table from harness id → enriched agent record. */
+  /**
+   * Lookup table from harness agent id → adapter + reasoning effort,
+   * sourced from `useHarnessAgents`. Lets the row card render the
+   * correct adapter icon and chips for harness agents (legacy
+   * /claw/agents entries fall back to inferring from `runtimeLabel`).
+   */
  harnessAgentLookup?: Map<string, HarnessAgent>
-  /** Adapter catalog (carries per-adapter health). */
-  adapters: HarnessAdapterDescriptor[]
  loading: boolean
  deletingAgentKey: string | null
  onCreateAgent: () => void
  onDeleteAgent: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
 }

 export const AgentList: FC<AgentListProps> = ({
  agents,
  activity,
  harnessAgentLookup,
-  adapters,
  loading,
  deletingAgentKey,
  onCreateAgent,
  onDeleteAgent,
-  onPinToggle,
 }) => {
-  const adapterHealth = useMemo(() => {
-    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
-    for (const adapter of adapters) {
-      if (adapter.health) {
-        map.set(adapter.id, {
-          healthy: adapter.health.healthy,
-          reason: adapter.health.reason,
-        })
-      }
-    }
-    return map
-  }, [adapters])
-
+  // Sort by recency: most recently used first; never-used agents drop
+  // to the bottom in id-stable order so the list doesn't reshuffle on
+  // every refresh. The pinned exception is the gateway's `main` agent
+  // when it's never been touched — keep it at the top so a fresh
+  // install has an obvious starting point.
  const ordered = useMemo(() => {
-    const withMeta = agents.map((agent) => {
-      const harness = harnessAgentLookup?.get(agent.agentId)
-      return {
-        agent,
-        id: agent.agentId,
-        pinned: harness?.pinned ?? false,
-        lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
-      }
+    const withScore = agents.map((agent) => {
+      const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
+      return { agent, lastUsedAt }
    })
-    return withMeta
-      .sort(compareAgentsByPinThenRecency)
+    return withScore
+      .sort((a, b) => {
+        const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
+        const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
+        if (aPinned && !bPinned) return -1
+        if (!aPinned && bPinned) return 1
+        const aValue = a.lastUsedAt ?? -Infinity
+        const bValue = b.lastUsedAt ?? -Infinity
+        if (aValue !== bValue) return bValue - aValue
+        return a.agent.agentId.localeCompare(b.agent.agentId)
+      })
      .map((entry) => entry.agent)
-  }, [activity, agents, harnessAgentLookup])
+  }, [activity, agents])

  if (loading && agents.length === 0) {
    return (
@@ -88,23 +80,18 @@ export const AgentList: FC<AgentListProps> = ({
    <div className="grid gap-3">
      {ordered.map((agent) => {
        const harness = harnessAgentLookup?.get(agent.agentId)
-        const adapter: HarnessAgentAdapter | 'unknown' =
+        const adapter: HarnessAgentAdapter | undefined =
          harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
-        const data = buildRowData({
-          agent,
-          adapter,
-          harness,
-          activity: activity?.[agent.agentId],
-          adapterHealth:
-            adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
-        })
        return (
          <AgentRowCard
            key={agent.key}
-            data={data}
-            deleting={deletingAgentKey === agent.key}
+            agent={agent}
+            status={activity?.[agent.agentId]?.status}
+            lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
+            adapter={adapter}
+            reasoningEffort={harness?.reasoningEffort ?? null}
            onDelete={onDeleteAgent}
-            onPinToggle={onPinToggle}
+            deleting={deletingAgentKey === agent.key}
          />
        )
      })}
@@ -112,53 +99,10 @@ export const AgentList: FC<AgentListProps> = ({
  )
 }

-function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
+function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
  const lower = label?.toLowerCase()
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'openclaw') return 'openclaw'
-  return 'unknown'
-}
-
-const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
-
-function buildRowData(input: {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  harness: HarnessAgent | undefined
-  activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
-  adapterHealth: AgentAdapterHealth | null
-}): AgentRowData {
-  const { agent, adapter, harness, activity, adapterHealth } = input
-  return {
-    agent,
-    adapter,
-    modelLabel: deriveModelLabel(agent, harness),
-    reasoningEffort: harness?.reasoningEffort ?? null,
-    status: activity?.status ?? 'unknown',
-    lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
-    pinned: harness?.pinned ?? false,
-    cwd: harness?.cwd ?? null,
-    lastUserMessage: harness?.lastUserMessage ?? null,
-    tokens: harness?.tokens ?? null,
-    turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
-    failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
-    lastError: harness?.lastError ?? null,
-    lastErrorAt: harness?.lastErrorAt ?? null,
-    activeTurnId: harness?.activeTurnId ?? null,
-    adapterHealth,
-  }
-}
-
-function deriveModelLabel(
-  agent: AgentListItem,
-  harness: HarnessAgent | undefined,
-): string | null {
-  // Prefer the agent rail's modelLabel when meaningful; harness's
-  // modelId is a stable identifier but the rail's `modelLabel`
-  // already maps to a friendly display string.
-  if (agent.modelLabel && agent.modelLabel !== 'default') {
-    return agent.modelLabel
-  }
-  return harness?.modelId ?? null
+  return undefined
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
@@ -1,99 +1,270 @@
+import {
+  Copy,
+  Loader2,
+  MessageSquare,
+  MoreHorizontal,
+  Pencil,
+  RotateCcw,
+  Trash2,
+} from 'lucide-react'
 import type { FC } from 'react'
+import { useNavigate } from 'react-router'
+import { toast } from 'sonner'
+import { Badge } from '@/components/ui/badge'
+import { Button } from '@/components/ui/button'
+import {
+  DropdownMenu,
+  DropdownMenuContent,
+  DropdownMenuItem,
+  DropdownMenuSeparator,
+  DropdownMenuTrigger,
+} from '@/components/ui/dropdown-menu'
+import {
+  Tooltip,
+  TooltipContent,
+  TooltipProvider,
+  TooltipTrigger,
+} from '@/components/ui/tooltip'
 import { cn } from '@/lib/utils'
-import { AgentActions } from './agent-row/AgentActions'
-import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
-import { AgentLastMessage } from './agent-row/AgentLastMessage'
-import { AgentMetaRow } from './agent-row/AgentMetaRow'
-import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
-import { AgentTile } from './agent-row/AgentTile'
-import { AgentTitleRow } from './agent-row/AgentTitleRow'
-import type {
-  AgentRowCallbacks,
-  AgentRowData,
-} from './agent-row/agent-row.types'
+import { AdapterIcon, adapterLabel } from './AdapterIcon'
+import {
+  canDelete as canDeleteAgent,
+  canRename as canRenameAgent,
+  displayName,
+  formatRelativeTime,
+  workspaceLabel,
+} from './agent-display.helpers'
+import type { HarnessAgentAdapter } from './agent-harness-types'
+import type { AgentListItem } from './agents-page-types'
+import { type AgentLiveness, LivenessDot } from './LivenessDot'

-interface AgentRowCardProps extends AgentRowCallbacks {
-  data: AgentRowData
-  /** Whether THIS agent is mid-delete; renders a spinner in the menu. */
+interface AgentRowCardProps {
+  agent: AgentListItem
+  /**
+   * Per-agent extras the listing surface provides on top of the
+   * minimal `AgentListItem` shape. `lastUsedAt` survives server
+   * restart (sourced from acpx session record); `status` is in-memory
+   * server-side.
+   */
+  status?: AgentLiveness
+  lastUsedAt?: number | null
+  /** Adapter the agent belongs to. Drives icon + label. */
+  adapter?: HarnessAgentAdapter
+  /** Reasoning effort chip (claude/codex/openclaw catalog). */
+  reasoningEffort?: string | null
+  /** Called with this row's agent; the parent's delete handler owns the dialog. */
+  onDelete: (agent: AgentListItem) => void
+  /** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
  deleting?: boolean
 }

-/**
- * Composition shell for the agent rail. Owns no state; sub-components
- * each handle their own micro-state (error-panel collapse, etc.) and
- * emit callbacks (delete, pin/unpin) for the page to act on.
- *
- * The whole card carries state — not just the tile — so the row's
- * border subtly tells the user what's going on at a glance:
- *   working → accent-orange border with a soft glow
- *   error   → destructive border
- *   idle    → muted border, lifts on hover
- */
 export const AgentRowCard: FC<AgentRowCardProps> = ({
-  data,
-  deleting,
+  agent,
+  status = 'unknown',
+  lastUsedAt,
+  adapter,
+  reasoningEffort,
  onDelete,
-  onPinToggle,
+  deleting,
 }) => {
+  const navigate = useNavigate()
+  const adapterId = adapter ?? inferAdapterFromListItem(agent)
+  const workspace = workspaceLabel(agent)
+  const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
+  const allowDelete = canDeleteAgent(agent)
+  const allowRename = canRenameAgent(agent)
+
+  const handleChat = () => navigate(`/agents/${agent.agentId}`)
+  const handleCopyId = async () => {
+    try {
+      await navigator.clipboard.writeText(agent.agentId)
+      toast.success('Agent id copied')
+    } catch {
+      toast.error('Could not copy agent id')
+    }
+  }
+
  return (
    <div
      className={cn(
-        // Layout-stable hover. No translate, no shadow change — both
-        // visibly perturb neighbouring rows. Only the border tint
-        // shifts on hover, and the rail's vertical rhythm stays
-        // exactly the same in every state.
-        'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
-        data.status === 'working'
-          ? 'border-[var(--accent-orange)]/40'
-          : data.status === 'error'
-            ? 'border-destructive/40'
-            : 'border-border hover:border-[var(--accent-orange)]/30',
+        'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
+        'hover:border-[var(--accent-orange)]/50',
      )}
    >
      <div className="flex items-start gap-4">
-        <AgentTile
-          adapter={data.adapter}
-          status={data.status}
-          lastUsedAt={data.lastUsedAt}
-        />
-
-        <div className="min-w-0 flex-1">
-          <AgentTitleRow
-            agent={data.agent}
-            status={data.status}
-            pinned={data.pinned}
-            turnsByDay={data.turnsByDay}
-            failedByDay={data.failedByDay}
-            onPinToggle={(next) => onPinToggle(data.agent, next)}
+        {/* Adapter tile + liveness dot in the corner. */}
+        <div className="relative shrink-0">
+          <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
+            <AdapterIcon adapter={adapterId} className="h-6 w-6" />
+          </div>
+          <LivenessDot
+            status={status}
+            detail={livenessDetail(status, lastUsedAt)}
+            className="absolute -right-0.5 -bottom-0.5"
          />
-
-          <AgentSummaryChips
-            adapter={data.adapter}
-            modelLabel={data.modelLabel}
-            reasoningEffort={data.reasoningEffort}
-            adapterHealth={data.adapterHealth}
-          />
-
-          <AgentLastMessage message={data.lastUserMessage} />
-
-          <AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
-
-          {data.status === 'error' && data.lastError && (
-            <AgentErrorPanel
-              agentId={data.agent.agentId}
-              message={data.lastError}
-              errorAt={data.lastErrorAt}
-            />
-          )}
        </div>

-        <AgentActions
-          agent={data.agent}
-          activeTurnId={data.activeTurnId}
-          deleting={deleting}
-          onDelete={onDelete}
-        />
+        <div className="min-w-0 flex-1">
+          <div className="mb-1 flex items-center gap-2">
+            <span className="truncate font-semibold">{displayName(agent)}</span>
+            {status === 'working' && (
+              <Badge
+                variant="secondary"
+                className="bg-amber-50 text-amber-900 hover:bg-amber-50"
+              >
+                Working
+              </Badge>
+            )}
+            {status === 'asleep' && (
+              <Badge variant="outline" className="text-muted-foreground">
+                Asleep
+              </Badge>
+            )}
+            {status === 'error' && (
+              <Badge variant="destructive">Attention</Badge>
+            )}
+          </div>
+
+          <div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
+            <Badge variant="secondary" className="font-normal">
+              {adapterLabel(adapterId)}
+            </Badge>
+            {agent.modelLabel && agent.modelLabel !== 'default' && (
+              <Badge variant="outline" className="font-normal">
+                {agent.modelLabel}
+              </Badge>
+            )}
+            {reasoningEffort && reasoningEffort !== 'medium' && (
+              <Badge variant="outline" className="font-normal">
+                {reasoningEffort}
+              </Badge>
+            )}
+          </div>
+
+          <div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
+            <span>Last used {lastUsedLabel}</span>
+            {workspace && (
+              <>
+                <span aria-hidden>•</span>
+                <span className="truncate font-mono" title={workspace}>
+                  {workspace}
+                </span>
+              </>
+            )}
+          </div>
+        </div>
+
+        <div className="flex shrink-0 items-center gap-2">
+          <Button variant="outline" size="sm" onClick={handleChat}>
+            <MessageSquare className="mr-1.5 h-3 w-3" />
+            Chat
+          </Button>
+          <DropdownMenu>
+            <DropdownMenuTrigger asChild>
+              <Button
+                variant="ghost"
+                size="icon"
+                aria-label={`More actions for ${displayName(agent)}`}
+                className="h-8 w-8"
+              >
+                <MoreHorizontal className="h-4 w-4" />
+              </Button>
+            </DropdownMenuTrigger>
+            <DropdownMenuContent align="end" className="w-44">
+              <DropdownMenuItem onSelect={() => void handleCopyId()}>
+                <Copy className="mr-2 h-3.5 w-3.5" />
+                Copy id
+              </DropdownMenuItem>
+              <RenameMenuItem disabled={!allowRename} />
+              <ResetHistoryMenuItem />
+              <DropdownMenuSeparator />
+              <DropdownMenuItem
+                onSelect={() => onDelete(agent)}
+                disabled={!allowDelete || deleting}
+                className="text-destructive focus:text-destructive"
+              >
+                {deleting ? (
+                  <Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
+                ) : (
+                  <Trash2 className="mr-2 h-3.5 w-3.5" />
+                )}
+                Delete
+              </DropdownMenuItem>
+            </DropdownMenuContent>
+          </DropdownMenu>
+        </div>
      </div>
    </div>
  )
 }
+
+const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <Pencil className="mr-2 h-3.5 w-3.5" />
+      Rename
+    </DropdownMenuItem>
+  )
+  if (!disabled) return item
+  // The item itself always renders disabled; the tooltip hint is only added when `disabled` is true.
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Rename coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+const ResetHistoryMenuItem: FC = () => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <RotateCcw className="mr-2 h-3.5 w-3.5" />
+      Reset history
+    </DropdownMenuItem>
+  )
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Reset history coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+function inferAdapterFromListItem(
+  agent: AgentListItem,
+): HarnessAgentAdapter | 'unknown' {
+  const label = agent.runtimeLabel?.toLowerCase()
+  if (label?.includes('claude')) return 'claude'
+  if (label?.includes('codex')) return 'codex'
+  if (label?.includes('openclaw')) return 'openclaw'
+  return 'unknown'
+}
+
+function livenessDetail(
+  status: AgentLiveness,
+  lastUsedAt: number | null | undefined,
+): string | undefined {
+  if (lastUsedAt == null) return undefined
+  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
+  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
+  if (status === 'asleep') {
+    if (diffMin < 60) return `Asleep — quiet for ${Math.max(0, diffMin)} min`
+    const hr = Math.floor(diffMin / 60)
+    return `Asleep — quiet for ${hr} hr`
+  }
+  if (status === 'working') return 'Working on a turn'
+  if (status === 'error') return 'Attention — last turn failed'
+  return undefined
+}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
@@ -44,7 +44,6 @@ import {
  useCreateHarnessAgent,
  useDeleteHarnessAgent,
  useHarnessAgents,
-  useUpdateHarnessAgent,
 } from './useAgents'
 import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'

@@ -77,7 +76,6 @@ export const AgentsPage: FC = () => {
  } = useOpenClawAgents(openClawAgentsEnabled)
  const createHarnessAgent = useCreateHarnessAgent()
  const deleteHarnessAgent = useDeleteHarnessAgent()
-  const updateHarnessAgent = useUpdateHarnessAgent()
  const {
    setupOpenClaw,
    createAgent: createOpenClawAgent,
@@ -344,24 +342,12 @@ export const AgentsPage: FC = () => {
          agents={agentListItems}
          activity={agentActivity}
          harnessAgentLookup={harnessAgentLookup}
-          adapters={adapters}
          loading={agentsLoading}
          deletingAgentKey={deletingAgent ? deletingAgentKey : null}
          onCreateAgent={() => setCreateOpen(true)}
          onDeleteAgent={(agent) => {
            void handleDelete(agent)
          }}
-          onPinToggle={(agent, next) => {
-            // Optimistic mutation; harness-only — gateway-original
-            // OpenClaw entries are gated server-side via the harness
-            // backfill, so we only fire when the row maps to a
-            // harness agent record.
-            if (!harnessAgentLookup.has(agent.agentId)) return
-            updateHarnessAgent.mutate({
-              agentId: agent.agentId,
-              patch: { pinned: next },
-            })
-          }}
        />

        <SetupOpenClawDialog
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
@@ -1,5 +1,4 @@
 import type { AgentListItem } from './agents-page-types'
-import type { AgentLiveness } from './LivenessDot'

 /**
 * Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -83,25 +82,3 @@ export function formatRelativeTime(epochMs: number | null): string {
  const d = Math.floor(diff / ONE_DAY)
  return d === 1 ? '1 day ago' : `${d} days ago`
 }
-
-/**
- * Tooltip-friendly description of a row's current liveness state.
- * Returns `undefined` when the state has nothing extra to add (e.g.
- * `unknown` with no timestamp).
- */
-export function livenessDetail(
-  status: AgentLiveness,
-  lastUsedAt: number | null | undefined,
-): string | undefined {
-  if (lastUsedAt == null) return undefined
-  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
-  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
-  if (status === 'asleep') {
-    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
-    const hr = Math.floor(diffMin / 60)
-    return `Asleep — quiet for ${hr} hr`
-  }
-  if (status === 'working') return 'Working on a turn'
-  if (status === 'error') return 'Attention — last turn failed'
-  return undefined
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
@@ -56,43 +56,6 @@ export interface HarnessAgent {
   * agents. Drives the recency sort and the "Last used X min ago" copy.
   */
  lastUsedAt?: number | null
-  /** Pinned agents float to the top of the list. Defaults to `false`. */
-  pinned?: boolean
-  /** First non-blank line of the most recent user message; null if none. */
-  lastUserMessage?: string | null
-  /** Working directory the agent runs in; null when no session record yet. */
-  cwd?: string | null
-  /** Cumulative + 7-day rolling token usage; null when no record. */
-  tokens?: {
-    last7d: { input: number; output: number; requestCount: number }
-    cumulative: { input: number; output: number }
-  } | null
-  turnsByDay?: number[]
-  failedByDay?: number[]
-  lastError?: string | null
-  lastErrorAt?: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId?: string | null
-  /** Persistent FIFO queue of messages waiting for this agent. */
-  queue?: HarnessQueuedMessage[]
-}
-
-export interface HarnessQueuedMessageAttachment {
-  mediaType: string
-  data: string
-}
-
-export interface HarnessQueuedMessage {
-  id: string
-  createdAt: number
-  message: string
-  attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
-}
-
-export interface HarnessAdapterHealth {
-  healthy: boolean
-  reason?: string
-  checkedAt: number
 }

 export interface HarnessAdapterDescriptor {
@@ -103,7 +66,6 @@ export interface HarnessAdapterDescriptor {
  modelControl: 'runtime-supported' | 'best-effort'
  models: Array<{ id: string; label: string; recommended?: boolean }>
  reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
-  health?: HarnessAdapterHealth
 }

 export interface CreateHarnessAgentInput {
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
@@ -1,160 +0,0 @@
-import {
-  Copy,
-  Loader2,
-  MessageSquare,
-  MoreHorizontal,
-  Pencil,
-  RotateCcw,
-  Trash2,
-} from 'lucide-react'
-import type { FC } from 'react'
-import { useNavigate } from 'react-router'
-import { toast } from 'sonner'
-import { Button } from '@/components/ui/button'
-import {
-  DropdownMenu,
-  DropdownMenuContent,
-  DropdownMenuItem,
-  DropdownMenuSeparator,
-  DropdownMenuTrigger,
-} from '@/components/ui/dropdown-menu'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import {
-  canDelete as canDeleteAgent,
-  canRename as canRenameAgent,
-  displayName,
-} from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-
-interface AgentActionsProps {
-  agent: AgentListItem
-  activeTurnId: string | null
-  deleting?: boolean
-  onDelete: (agent: AgentListItem) => void
-}
-
-/**
- * Single primary CTA per row: `Resume` (filled, accent-orange, with a
- * pulsing dot) when an active turn exists; otherwise `Chat` (outline).
- * Both navigate to the same place — the chat hook auto-attaches via
- * `/chat/active` when there's a live turn — but the row signals which
- * action the user is actually taking.
- */
-export const AgentActions: FC<AgentActionsProps> = ({
-  agent,
-  activeTurnId,
-  deleting,
-  onDelete,
-}) => {
-  const navigate = useNavigate()
-  const allowDelete = canDeleteAgent(agent)
-  const allowRename = canRenameAgent(agent)
-
-  const handleChat = () => navigate(`/agents/${agent.agentId}`)
-  const handleCopyId = async () => {
-    try {
-      await navigator.clipboard.writeText(agent.agentId)
-      toast.success('Agent id copied')
-    } catch {
-      toast.error('Could not copy agent id')
-    }
-  }
-
-  return (
-    <div className="flex shrink-0 items-center gap-1.5">
-      {activeTurnId ? (
-        <Button
-          variant="default"
-          size="sm"
-          onClick={handleChat}
-          className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
-        >
-          <span className="relative flex size-2">
-            <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-            <span className="relative inline-flex size-2 rounded-full bg-white" />
-          </span>
-          Resume
-        </Button>
-      ) : (
-        <Button variant="outline" size="sm" onClick={handleChat}>
-          <MessageSquare className="mr-1.5 size-3" />
-          Chat
-        </Button>
-      )}
-      <DropdownMenu>
-        <DropdownMenuTrigger asChild>
-          <Button
-            variant="ghost"
-            size="icon"
-            aria-label={`More actions for ${displayName(agent)}`}
-            className="size-8 text-muted-foreground hover:text-foreground"
-          >
-            <MoreHorizontal className="size-4" />
-          </Button>
-        </DropdownMenuTrigger>
-        <DropdownMenuContent align="end" className="w-44">
-          <DropdownMenuItem onSelect={() => void handleCopyId()}>
-            <Copy className="mr-2 size-3.5" />
-            Copy id
-          </DropdownMenuItem>
-          <ComingSoonItem
-            icon={Pencil}
-            label="Rename"
-            disabled={!allowRename}
-          />
-          <ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
-          <DropdownMenuSeparator />
-          <DropdownMenuItem
-            onSelect={() => onDelete(agent)}
-            disabled={!allowDelete || deleting}
-            className="text-destructive focus:text-destructive"
-          >
-            {deleting ? (
-              <Loader2 className="mr-2 size-3.5 animate-spin" />
-            ) : (
-              <Trash2 className="mr-2 size-3.5" />
-            )}
-            Delete
-          </DropdownMenuItem>
-        </DropdownMenuContent>
-      </DropdownMenu>
-    </div>
-  )
-}
-
-interface ComingSoonItemProps {
-  icon: typeof Pencil
-  label: string
-  disabled: boolean
-}
-
-const ComingSoonItem: FC<ComingSoonItemProps> = ({
-  icon: Icon,
-  label,
-  disabled,
-}) => {
-  const item = (
-    <DropdownMenuItem disabled className="text-muted-foreground">
-      <Icon className="mr-2 size-3.5" />
-      {label}
-    </DropdownMenuItem>
-  )
-  if (!disabled) return item
-  return (
-    <TooltipProvider delayDuration={300}>
-      <Tooltip>
-        <TooltipTrigger asChild>
-          <span className="block w-full">{item}</span>
-        </TooltipTrigger>
-        <TooltipContent side="left" className="text-xs">
-          {label} coming soon
-        </TooltipContent>
-      </Tooltip>
-    </TooltipProvider>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
@@ -1,96 +0,0 @@
-import { AlertTriangle, ChevronDown } from 'lucide-react'
-import { type FC, useEffect, useState } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Collapsible,
-  CollapsibleContent,
-  CollapsibleTrigger,
-} from '@/components/ui/collapsible'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { truncate } from './agent-row.helpers'
-
-interface AgentErrorPanelProps {
-  agentId: string
-  message: string
-  errorAt: number | null
-}
-
-const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
-const PREVIEW_CHARS = 200
-
-export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
-  agentId,
-  message,
-  errorAt,
-}) => {
-  const storageKey = `${STORAGE_PREFIX}${agentId}`
-  // Open if we've never seen this `errorAt` for this agent. Once the
-  // user collapses the panel (or refreshes after seeing it), we mark
-  // it seen so it doesn't re-pop on every poll.
-  const [open, setOpen] = useState<boolean>(() => {
-    if (typeof window === 'undefined' || !errorAt) return true
-    const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
-    return !Number.isFinite(seen) || errorAt > seen
-  })
-
-  useEffect(() => {
-    if (!open && errorAt && typeof window !== 'undefined') {
-      window.localStorage.setItem(storageKey, String(errorAt))
-    }
-  }, [open, errorAt, storageKey])
-
-  const preview = truncate(message, PREVIEW_CHARS)
-  const truncated = preview.length < message.length
-
-  return (
-    <Collapsible open={open} onOpenChange={setOpen} className="mt-3">
-      <div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
-        <div className="flex items-center gap-2 font-medium text-destructive text-xs">
-          <AlertTriangle className="size-3.5" />
-          Last error
-        </div>
-        <CollapsibleTrigger asChild>
-          <Button
-            variant="ghost"
-            size="sm"
-            className="h-6 px-2 text-muted-foreground"
-          >
-            <span className="text-xs">{open ? 'hide' : 'show'}</span>
-            <ChevronDown
-              className={cn(
-                'ml-1 size-3 transition-transform',
-                open && 'rotate-180',
-              )}
-            />
-          </Button>
-        </CollapsibleTrigger>
-      </div>
-      <CollapsibleContent>
-        <div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
-          {truncated ? (
-            <HoverCard openDelay={300}>
-              <HoverCardTrigger asChild>
-                <span className="cursor-default font-mono text-foreground/80">
-                  {preview}…
-                </span>
-              </HoverCardTrigger>
-              <HoverCardContent
-                side="bottom"
-                className="max-w-md whitespace-pre-wrap font-mono text-xs"
-              >
-                {message}
-              </HoverCardContent>
-            </HoverCard>
-          ) : (
-            <span className="font-mono text-foreground/80">{message}</span>
-          )}
-        </div>
-      </CollapsibleContent>
-    </Collapsible>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
@@ -1,35 +0,0 @@
-import { Quote } from 'lucide-react'
-import type { FC } from 'react'
-import { firstNonBlankLine, truncate } from './agent-row.helpers'
-
-interface AgentLastMessageProps {
-  message: string | null
-}
-
-const PREVIEW_CHARS = 110
-
-/**
- * Inline preview of the most recent user message. Renders as a quoted,
- * italic line so the row reads like a conversation snippet rather than
- * a label-and-value pair. No hover-card — opening the agent's chat is
- * the canonical way to read the full message.
- */
-export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
-  return (
-    <p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="truncate">{preview}</span>
-    </p>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { formatRelativeTime } from '../agent-display.helpers'
-import { AgentTokenSummary } from './AgentTokenSummary'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentMetaRowProps {
-  lastUsedAt: number | null
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Bottom-of-row meta line. Intentionally sparse — last activity time
- * and lifetime tokens. CWD is no longer surfaced here because the path
- * the server happens to be running from isn't actionable; if a future
- * surface needs the cwd (chat panel, debug view) it reads from the
- * listing payload directly.
- */
-export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
-  const lastUsedLabel = formatRelativeTime(lastUsedAt)
-  const tokensTotal =
-    (tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
-  const showTokens = tokensTotal > 0
-
-  return (
-    <div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
-      <span>{lastUsedLabel}</span>
-      {showTokens && (
-        <>
-          <span aria-hidden className="text-muted-foreground/50">
-            ·
-          </span>
-          <AgentTokenSummary tokens={tokens} />
-        </>
-      )}
-    </div>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
@@ -1,92 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
-
-interface AgentSparklineProps {
-  /** 14 entries, oldest → newest. Today's bucket is the last index. */
-  turnsByDay: number[]
-  /** Same length, same order. Failed turns counted separately. */
-  failedByDay: number[]
-  className?: string
-}
-
-const MIN_BAR_HEIGHT_PX = 2
-const MAX_BAR_HEIGHT_PX = 18
-
-export const AgentSparkline: FC<AgentSparklineProps> = ({
-  turnsByDay,
-  failedByDay,
-  className,
-}) => {
-  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
-  const max = Math.max(1, ...turnsByDay)
-
-  return (
-    <HoverCard openDelay={250}>
-      <HoverCardTrigger asChild>
-        <div
-          role="img"
-          aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
-          className={cn('flex h-5 items-end gap-px', className)}
-        >
-          {turnsByDay.map((count, idx) => {
-            const ratio = count / max
-            const height = Math.max(
-              MIN_BAR_HEIGHT_PX,
-              Math.round(ratio * MAX_BAR_HEIGHT_PX),
-            )
-            const isToday = idx === ROW_BAR_COUNT - 1
-            const failed = failedByDay[idx] ?? 0
-            return (
-              <div
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
-                key={`bar-${idx}`}
-                className={cn(
-                  'w-1.5 rounded-sm',
-                  count === 0
-                    ? 'bg-muted-foreground/15'
-                    : failed > 0
-                      ? 'bg-destructive/50'
-                      : 'bg-[var(--accent-orange)]/50',
-                  isToday && 'ring-1 ring-foreground/30',
-                )}
-                style={{ height }}
-              />
-            )
-          })}
-        </div>
-      </HoverCardTrigger>
-      <HoverCardContent side="left" className="w-56 text-xs">
-        <div className="mb-2 font-medium text-sm">Last 14 days</div>
-        <ul className="space-y-0.5">
-          {turnsByDay.map((count, idx) => {
-            const failed = failedByDay[idx] ?? 0
-            const dayLabel = formatLocalDate(idx)
-            return (
-              <li
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
-                key={`day-${idx}`}
-                className="flex items-center justify-between text-muted-foreground"
-              >
-                <span>{dayLabel}</span>
-                <span>
-                  {count}
-                  {failed > 0 && (
-                    <span className="ml-1 text-destructive">
-                      ({failed} failed)
-                    </span>
-                  )}
-                </span>
-              </li>
-            )
-          })}
-        </ul>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
@@ -1,71 +0,0 @@
-import { TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { adapterLabel } from '../AdapterIcon'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentAdapterHealth } from './agent-row.types'
-
-interface AgentSummaryChipsProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  /** When unhealthy, the adapter label dims and a warning chip appears. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-/**
- * Adapter / model / reasoning summary line. Always rendered (so OpenClaw
- * rows that fall back to defaults still expose what they're set up to do)
- * and surfaces adapter-health *only when unhealthy* — keeping the calm
- * default state silent and reserving visual noise for things the user
- * needs to act on.
- */
-export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
-  adapter,
-  modelLabel,
-  reasoningEffort,
-  adapterHealth,
-}) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelLabel) parts.push(modelLabel)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && adapterHealth && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { cn } from '@/lib/utils'
-import { AdapterIcon } from '../AdapterIcon'
-import { livenessDetail } from '../agent-display.helpers'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import { type AgentLiveness, LivenessDot } from '../LivenessDot'
-
-export interface AgentTileProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  status: AgentLiveness
-  lastUsedAt: number | null
-}
-
-/**
- * Adapter glyph + a single liveness dot. Adapter health is no longer
- * surfaced here — it lives as an inline pill inside `AgentSummaryChips`
- * so the user isn't asked to disambiguate two dots on the same tile.
- */
-export const AgentTile: FC<AgentTileProps> = ({
-  adapter,
-  status,
-  lastUsedAt,
-}) => (
-  <div className="relative shrink-0">
-    <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
-      <AdapterIcon adapter={adapter} className="h-6 w-6" />
-    </div>
-    <LivenessDot
-      status={status}
-      detail={livenessDetail(status, lastUsedAt)}
-      className={cn(
-        'absolute -right-0.5 -bottom-0.5',
-        status === 'working' && 'animate-pulse',
-      )}
-    />
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
@@ -1,55 +0,0 @@
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import { displayName } from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-import { AgentSparkline } from './AgentSparkline'
-import { PinToggle } from './PinToggle'
-
-interface AgentTitleRowProps {
-  agent: AgentListItem
-  status: AgentLiveness
-  pinned: boolean
-  turnsByDay: number[]
-  failedByDay: number[]
-  onPinToggle: (next: boolean) => void
-}
-
-/**
- * Title strip: name + status badge + (right-aligned) sparkline. The
- * pin toggle sits trailing the title so the title always flushes left
- * regardless of pin state — moving the star left of the title indents
- * the row's first line off-axis from the model/preview/meta lines
- * below it. When unpinned and not hovered, the toggle is hidden via
- * opacity only, so it keeps its slot and the row height stays constant.
- */
-export const AgentTitleRow: FC<AgentTitleRowProps> = ({
-  agent,
-  status,
-  pinned,
-  turnsByDay,
-  failedByDay,
-  onPinToggle,
-}) => (
-  <div className="mb-1 flex items-center gap-2">
-    <span className="truncate font-semibold">{displayName(agent)}</span>
-    {status === 'working' && (
-      <Badge
-        variant="secondary"
-        className="bg-amber-50 text-amber-900 hover:bg-amber-50"
-      >
-        Working
-      </Badge>
-    )}
-    {status === 'asleep' && (
-      <Badge variant="outline" className="text-muted-foreground">
-        Asleep
-      </Badge>
-    )}
-    {status === 'error' && <Badge variant="destructive">Attention</Badge>}
-    <PinToggle pinned={pinned} onToggle={onPinToggle} />
-    <div className="ml-auto">
-      <AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
-    </div>
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
@@ -1,63 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { Progress } from '@/components/ui/progress'
-import { formatTokens } from './agent-row.helpers'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentTokenSummaryProps {
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
- * (the only window we can compute reliably from the session record).
- * Per-window stats land in a follow-up once the activity ledger ships.
- */
-export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
-  if (!tokens) return null
-  const { input, output } = tokens.cumulative
-  const total = input + output
-  if (total === 0) return null
-  const inputPct = (input / total) * 100
-
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
-          {formatTokens(total)} tokens
-        </span>
-      </HoverCardTrigger>
-      <HoverCardContent side="top" align="end" className="w-72 text-sm">
-        <div className="mb-3 flex items-center justify-between">
-          <span className="font-medium">Lifetime tokens</span>
-          <span className="text-muted-foreground text-xs tabular-nums">
-            {formatTokens(total)} total
-          </span>
-        </div>
-
-        <div className="space-y-2">
-          <div className="flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Input</span>
-            <span className="tabular-nums">{formatTokens(input)}</span>
-          </div>
-          <Progress value={inputPct} className="h-1.5" />
-
-          <div className="mt-2 flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Output</span>
-            <span className="tabular-nums">{formatTokens(output)}</span>
-          </div>
-          <Progress value={100 - inputPct} className="h-1.5" />
-        </div>
-
-        <p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
-          Cumulative across every turn this agent has run. Per-window stats
-          arrive in a future release.
-        </p>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
@@ -1,60 +0,0 @@
-import { Star } from 'lucide-react'
-import type { FC } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import { cn } from '@/lib/utils'
-
-interface PinToggleProps {
-  pinned: boolean
-  onToggle: (next: boolean) => void
-}
-
-/**
- * Trailing star toggle. The button is *always rendered* — only its
- * opacity changes between pinned/unpinned/hover states — so the title
- * row's height is constant. Hiding the slot via `display: none` would
- * collapse the row's vertical metrics on hover and shift every card
- * below in the rail.
- *
- * Placement is trailing the title (after the status badge) so the
- * title itself flushes left regardless of pin state — leading the
- * row with the star would indent the title relative to the model /
- * preview / meta lines beneath it.
- */
-export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
-  <TooltipProvider delayDuration={300}>
-    <Tooltip>
-      <TooltipTrigger asChild>
-        <Button
-          variant="ghost"
-          size="icon"
-          className={cn(
-            'size-6 text-muted-foreground transition-opacity hover:text-foreground',
-            pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
-          )}
-          aria-pressed={pinned}
-          aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
-          onClick={(event) => {
-            event.stopPropagation()
-            onToggle(!pinned)
-          }}
-        >
-          <Star
-            className={cn(
-              'size-3.5',
-              pinned && 'fill-amber-400 text-amber-500',
-            )}
-          />
-        </Button>
-      </TooltipTrigger>
-      <TooltipContent side="top" className="text-xs">
-        {pinned ? 'Unpin' : 'Pin to top'}
-      </TooltipContent>
-    </Tooltip>
-  </TooltipProvider>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
@@ -1,73 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  firstNonBlankLine,
-  formatLocalDate,
-  formatTokens,
-  ROW_BAR_COUNT,
-  truncate,
-} from './agent-row.helpers'
-
-describe('formatTokens', () => {
-  it('renders zero / NaN as "0"', () => {
-    expect(formatTokens(0)).toBe('0')
-    expect(formatTokens(Number.NaN)).toBe('0')
-  })
-
-  it('renders sub-1K as integer', () => {
-    expect(formatTokens(142)).toBe('142')
-  })
-
-  it('renders K with one decimal under 10', () => {
-    expect(formatTokens(8_400)).toBe('8.4K')
-  })
-
-  it('drops the decimal at >=10K', () => {
-    expect(formatTokens(120_000)).toBe('120K')
-  })
-
-  it('renders M with one decimal under 10', () => {
-    expect(formatTokens(1_200_000)).toBe('1.2M')
-  })
-})
-
-describe('firstNonBlankLine', () => {
-  it('returns the first non-blank line', () => {
-    expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
-  })
-
-  it('skips USER_QUERY envelope tags', () => {
-    expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
-      'fix tests',
-    )
-  })
-
-  it('falls back to the trimmed input when nothing matches', () => {
-    expect(firstNonBlankLine('   single   ')).toBe('single')
-  })
-})
-
-describe('truncate', () => {
-  it('returns input unchanged when within limit', () => {
-    expect(truncate('hello', 10)).toBe('hello')
-  })
-
-  it('appends an ellipsis when over limit', () => {
-    expect(truncate('hello world', 6)).toBe('hello…')
-  })
-})
-
-describe('formatLocalDate', () => {
-  const today = new Date('2026-04-30T12:00:00Z')
-
-  it('labels today and yesterday explicitly', () => {
-    expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
-    expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
-  })
-
-  it('returns a "Mon D" format for older days', () => {
-    const label = formatLocalDate(0, today)
-    // "Apr 17" or "Apr 17," depending on locale; just assert it
-    // contains a month abbreviation and a day number.
-    expect(label).toMatch(/[A-Za-z]+ \d+/)
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
@@ -1,64 +0,0 @@
-/**
- * Pure formatters consumed by row sub-components. Kept distinct from
- * `agent-display.helpers.ts` (page-level helpers) so the row internals
- * have an obvious single home.
- */
-
-const TOKEN_THRESHOLDS: Array<[number, string]> = [
-  [1_000_000, 'M'],
-  [1_000, 'K'],
-]
-
-/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
-export function formatTokens(n: number): string {
-  if (!Number.isFinite(n) || n <= 0) return '0'
-  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
-    if (n >= threshold) {
-      const value = n / threshold
-      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
-      return `${decimal}${suffix}`
-    }
-  }
-  return String(Math.round(n))
-}
-
-const USER_QUERY_OPEN = /^<USER_QUERY>$/i
-const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
-
-/**
- * First non-blank line, with the BrowserOS user-system-prompt
- * `<USER_QUERY>` envelope tags stripped so previews don't show
- * structural noise.
- */
-export function firstNonBlankLine(text: string): string {
-  const lines = text.split('\n').map((line) => line.trim())
-  for (const line of lines) {
-    if (!line) continue
-    if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
-    return line
-  }
-  return text.trim()
-}
-
-export function truncate(text: string, max: number): string {
-  if (text.length <= max) return text
-  return `${text.slice(0, max - 1).trimEnd()}…`
-}
-
-const SPARKLINE_DAYS = 14
-
-/**
- * "today" / "yesterday" / "Apr 17" — given an index 0..13 from
- * oldest → newest. `today` defaults to `new Date()` so callers don't
- * have to thread a clock through.
- */
-export function formatLocalDate(idx: number, today: Date = new Date()): string {
-  if (idx === SPARKLINE_DAYS - 1) return 'today'
-  if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
-  const offset = SPARKLINE_DAYS - 1 - idx
-  const date = new Date(today)
-  date.setDate(date.getDate() - offset)
-  return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
-}
-
-export const ROW_BAR_COUNT = SPARKLINE_DAYS
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
@@ -1,51 +0,0 @@
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-
-/**
- * Window-bounded token usage. Server returns `null` when no session
- * record exists yet for the agent.
- */
-export interface AgentTokenUsage {
-  last7d: { input: number; output: number; requestCount: number }
-  cumulative: { input: number; output: number }
-}
-
-export interface AgentAdapterHealth {
-  healthy: boolean
-  reason?: string
-}
-
-/**
- * Everything an `AgentRowCard` needs to render. Mirrors the shape
- * `useHarnessAgents` exposes; the page assembles one entry per row in
- * `AgentList` and passes it down. Sub-components only see slices of
- * this object — no prop drilling beyond two levels.
- */
-export interface AgentRowData {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  status: AgentLiveness
-  lastUsedAt: number | null
-  pinned: boolean
-  cwd: string | null
-  lastUserMessage: string | null
-  tokens: AgentTokenUsage | null
-  /** 14 entries, oldest → newest. Today is the last index. */
-  turnsByDay: number[]
-  /** Same length and ordering as `turnsByDay`. */
-  failedByDay: number[]
-  lastError: string | null
-  lastErrorAt: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId: string | null
-  /** Adapter-level health, shared across rows for the same adapter. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-export interface AgentRowCallbacks {
-  onDelete: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.test.ts
@@ -1,104 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import type { HarnessAgent } from './agent-harness-types'
-import {
-  compareAgentsByPinThenRecency,
-  orderAgentsByPinThenRecency,
-} from './agents-list-order'
-
-function makeAgent(input: {
-  id: string
-  pinned?: boolean
-  lastUsedAt?: number | null
-}): HarnessAgent {
-  return {
-    id: input.id,
-    name: input.id,
-    adapter: 'codex',
-    permissionMode: 'approve-all',
-    sessionKey: 'session',
-    createdAt: 0,
-    updatedAt: 0,
-    pinned: input.pinned,
-    lastUsedAt: input.lastUsedAt,
-  }
-}
-
-describe('orderAgentsByPinThenRecency', () => {
-  it('floats pinned agents to the top regardless of recency', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
-      makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
-      makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
-  })
-
-  it('sorts by lastUsedAt desc within each pin group', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
-      makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
-      makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
-      makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual([
-      'newer-pin',
-      'older-pin',
-      'newer',
-      'older',
-    ])
-  })
-
-  it('seed-pins the gateway main agent above other never-used agents', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
-      makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
-      makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
-  })
-
-  it('drops the main seed-pin once the agent has been used', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
-      makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
-  })
-
-  it('puts never-used agents below recently-used ones', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
-      makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
-  })
-
-  it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
-    const result = orderAgentsByPinThenRecency([
-      makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
-      makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
-    ])
-    expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
-  })
-})
-
-describe('compareAgentsByPinThenRecency', () => {
-  it('produces the same order as the harness-shape helper', () => {
-    const items = [
-      { id: 'older', pinned: false, lastUsedAt: 50 },
-      { id: 'newer', pinned: false, lastUsedAt: 80 },
-      { id: 'pinned', pinned: true, lastUsedAt: 1 },
-    ]
-    const sorted = [...items].sort(compareAgentsByPinThenRecency)
-    expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
-  })
-
-  it('seeds the main agent above other never-used rows', () => {
-    const items = [
-      { id: 'zzz', pinned: false, lastUsedAt: null },
-      { id: 'main', pinned: false, lastUsedAt: null },
-    ]
-    const sorted = [...items].sort(compareAgentsByPinThenRecency)
-    expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agents-list-order.ts
@@ -1,59 +0,0 @@
-import type { HarnessAgent } from './agent-harness-types'
-
-/**
- * Stable ordering for index-shaped agent surfaces (the `/agents` rail
- * and the chat-screen rail at `/agents/:agentId`). Pinned rows float
- * to the top, then recency desc, with never-used agents falling to
- * the bottom in id-stable order. The gateway's `main` agent gets
- * seed-pinned to the top of the never-used group so a fresh install
- * has an obvious starting point even before the user has used it.
- *
- * NOT the same rule as the home grid (`orderHomeAgents`): home is
- * action-shaped — active-turn floats to the top — so users can
- * resume what's running. The chat rail keeps recency stable so it
- * doesn't reshuffle as turns transition every 5s.
- */
-export function orderAgentsByPinThenRecency(
-  agents: HarnessAgent[],
-): HarnessAgent[] {
-  return [...agents].sort((a, b) => {
-    const aPinned = a.pinned ?? false
-    const bPinned = b.pinned ?? false
-    if (aPinned !== bPinned) return aPinned ? -1 : 1
-
-    const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
-    const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
-    if (aSeed && !bSeed) return -1
-    if (!aSeed && bSeed) return 1
-
-    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    if (aValue !== bValue) return bValue - aValue
-
-    return a.id.localeCompare(b.id)
-  })
-}
-
-/**
- * Same comparator, but operates over arbitrary records that carry
- * `pinned`, `lastUsedAt`, and an `id`-equivalent key. Used by the
- * `/agents` `AgentList` which pivots `AgentListItem` + harness
- * lookup into a sortable shape; both surfaces stay on identical
- * sort semantics through this adapter.
- */
-export function compareAgentsByPinThenRecency<
-  T extends { pinned: boolean; lastUsedAt: number | null; id: string },
->(a: T, b: T): number {
-  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
-
-  const aSeed = a.id === 'main' && a.lastUsedAt === null
-  const bSeed = b.id === 'main' && b.lastUsedAt === null
-  if (aSeed && !bSeed) return -1
-  if (!aSeed && bSeed) return 1
-
-  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
-  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
-  if (aValue !== bValue) return bValue - aValue
-
-  return a.id.localeCompare(b.id)
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
@@ -8,7 +8,6 @@ import {
  type HarnessAdapterDescriptor,
  type HarnessAgent,
  type HarnessAgentHistoryPage,
-  type HarnessQueuedMessage,
  mapHarnessAgentToEntry,
 } from './agent-harness-types'
 import type { OpenClawStatus } from './useOpenClaw'
@@ -136,63 +135,6 @@ export function useCreateHarnessAgent() {
  })
 }

-/**
- * Apply a partial update to a harness agent. Used by the pin-toggle
- * star and (eventually) the inline rename UI. Optimistically writes
- * the patch into the listing query cache so the row updates instantly,
- * then rolls back if the server rejects the change.
- */
-export function useUpdateHarnessAgent() {
-  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: {
-      agentId: string
-      patch: { name?: string; pinned?: boolean }
-    }) => {
-      if (!baseUrl || urlLoading) {
-        throw new Error('BrowserOS agent server URL is not ready')
-      }
-      const data = await agentsFetch<{ agent: HarnessAgent }>(
-        baseUrl,
-        `/${encodeURIComponent(input.agentId)}`,
-        {
-          method: 'PATCH',
-          headers: { 'Content-Type': 'application/json' },
-          body: JSON.stringify(input.patch),
-        },
-      )
-      return data.agent
-    },
-    onMutate: async ({ agentId, patch }) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === agentId ? { ...agent, ...patch } : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
 export function useDeleteHarnessAgent() {
  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
  const queryClient = useQueryClient()
@@ -264,8 +206,6 @@ export interface HarnessActiveTurnInfo {
  lastSeq: number
  startedAt: number
  endedAt?: number
-  /** User message that kicked off the turn; null when not captured. */
-  prompt: string | null
 }

 /**
@@ -320,145 +260,3 @@ export async function fetchHarnessAgentHistory(
    `/${encodeURIComponent(agentId)}/sessions/main/history`,
  )
 }
-
-export interface EnqueueMessageInput {
-  message: string
-  attachments?: ReadonlyArray<unknown>
-}
-
-export async function enqueueHarnessMessage(
-  agentId: string,
-  input: EnqueueMessageInput,
-): Promise<HarnessQueuedMessage> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
-    {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        message: input.message,
-        ...(input.attachments && input.attachments.length > 0
-          ? { attachments: input.attachments }
-          : {}),
-      }),
-    },
-  )
-  if (!response.ok) {
-    let message = `Request failed with status ${response.status}`
-    try {
-      const body = (await response.json()) as { error?: string }
-      if (body.error) message = body.error
-    } catch {}
-    throw new Error(message)
-  }
-  const body = (await response.json()) as { queued: HarnessQueuedMessage }
-  return body.queued
-}
-
-export async function removeHarnessQueuedMessage(
-  agentId: string,
-  messageId: string,
-): Promise<{ removed: boolean }> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
-      messageId,
-    )}`,
-    { method: 'DELETE' },
-  )
-  if (!response.ok) return { removed: false }
-  return (await response.json()) as { removed: boolean }
-}
-
-/**
- * Optimistic enqueue: writes the new queued message into the listing
- * cache immediately so the queue panel reflects the change without
- * waiting for the next poll. Rolls back if the server rejects.
- */
-export function useEnqueueHarnessMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
-      enqueueHarnessMessage(input.agentId, input),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      const optimistic: HarnessQueuedMessage = {
-        id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
-        createdAt: Date.now(),
-        message: input.message,
-      }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
-/**
- * Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
- */
-export function useRemoveHarnessQueuedMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string; messageId: string }) =>
-      removeHarnessQueuedMessage(input.agentId, input.messageId),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? {
-                ...agent,
-                queue: (agent.queue ?? []).filter(
-                  (entry) => entry.id !== input.messageId,
-                ),
-              }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
@@ -1,8 +1,5 @@
 import { describe, expect, it } from 'bun:test'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-} from '@/entrypoints/app/agents/agent-harness-types'
+import type { HarnessAdapterDescriptor } from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import {
  buildSidepanelChatTargets,
@@ -80,96 +77,58 @@ const adapters: HarnessAdapterDescriptor[] = [
  },
 ]

-const agents: HarnessAgent[] = [
-  {
-    id: 'agent-codex',
-    name: 'Review Bot',
-    adapter: 'codex',
-    modelId: 'gpt-5.5',
-    reasoningEffort: 'medium',
-    permissionMode: 'approve-all',
-    sessionKey: 'agent:agent-codex:main',
-    createdAt: timestamp,
-    updatedAt: timestamp,
-  },
-  {
-    id: 'agent-openclaw',
-    name: 'Research Claw',
-    adapter: 'openclaw',
-    modelId: 'default',
-    reasoningEffort: 'high',
-    permissionMode: 'approve-all',
-    sessionKey: 'agent:agent-openclaw:main',
-    createdAt: timestamp,
-    updatedAt: timestamp,
-  },
-]
-
 describe('buildSidepanelChatTargets', () => {
-  it('returns LLM targets plus one ACP target per persisted harness agent', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+  it('returns LLM targets plus one ACP target per adapter model', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })

    expect(targets.map((target) => target.id)).toEqual([
      'browseros',
      'anthropic-sonnet',
-      'agent-codex',
-      'agent-openclaw',
+      'acp:claude:sonnet:medium',
+      'acp:claude:haiku:medium',
+      'acp:codex:gpt-5.5:medium',
+      'acp:openclaw:default:medium',
    ])
  })

-  it('does not emit catalog-only ACP targets without persisted agents', () => {
-    const targets = buildSidepanelChatTargets({
-      providers,
-      adapters,
-      agents: [],
-    })
-
-    expect(targets.map((target) => target.id)).toEqual([
-      'browseros',
-      'anthropic-sonnet',
-    ])
-  })
-
-  it('uses the created OpenClaw agent name instead of a generic adapter target', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const openclaw = targets.find((target) => target.id === 'agent-openclaw')
+  it('emits a single default ACP target for adapters with no per-session model picker', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const openclaw = targets.find(
+      (target) => target.id === 'acp:openclaw:default:medium',
+    )

    expect(openclaw).toMatchObject({
      kind: 'acp',
-      id: 'agent-openclaw',
-      agentId: 'agent-openclaw',
      adapter: 'openclaw',
      adapterName: 'OpenClaw',
      modelId: 'default',
      modelLabel: 'default',
-      name: 'Research Claw',
+      // Without a model picker, the target name is just the adapter
+      // name — the user picks the adapter, not a model under it.
+      name: 'OpenClaw',
      modelControl: 'best-effort',
-      reasoningEffort: 'high',
+      reasoningEffort: 'medium',
    })
  })

-  it('preserves adapter metadata for created agent targets', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const codex = targets.find((target) => target.id === 'agent-codex')
+  it('preserves ACP model-control and recommendation metadata', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const haiku = targets.find(
+      (target) => target.id === 'acp:claude:haiku:medium',
+    )

-    expect(codex).toMatchObject({
+    expect(haiku).toMatchObject({
      kind: 'acp',
-      agentId: 'agent-codex',
-      adapter: 'codex',
-      adapterName: 'Codex',
-      modelId: 'gpt-5.5',
-      modelLabel: 'GPT-5.5',
-      modelControl: 'runtime-supported',
+      adapter: 'claude',
+      modelId: 'haiku',
+      modelControl: 'best-effort',
      recommended: true,
      reasoningEffort: 'medium',
-      reasoningEffortLabel: 'Medium',
    })
  })

-  it('still returns LLM targets when agents and adapters are unavailable', () => {
-    expect(
-      buildSidepanelChatTargets({ providers, adapters: [], agents: [] }),
-    ).toEqual([
+  it('still returns LLM targets when ACP adapters are unavailable', () => {
+    expect(buildSidepanelChatTargets({ providers, adapters: [] })).toEqual([
      {
        kind: 'llm',
        id: 'browseros',
@@ -190,7 +149,7 @@ describe('buildSidepanelChatTargets', () => {

 describe('resolveSidepanelChatTarget', () => {
  it('resolves selected LLM targets back to their provider config', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+    const targets = buildSidepanelChatTargets({ providers, adapters })
    const resolved = resolveSidepanelChatTarget({
      targets,
      defaultProviderId: 'browseros',
@@ -202,32 +161,13 @@ describe('resolveSidepanelChatTarget', () => {
  })

  it('falls back to the current default LLM provider when a persisted ACP target is stale', () => {
-    const targets = buildSidepanelChatTargets({
-      providers,
-      adapters,
-      agents: [],
-    })
+    const targets = buildSidepanelChatTargets({ providers, adapters: [] })

    expect(
      resolveSidepanelChatTarget({
        targets,
        defaultProviderId: 'anthropic-sonnet',
-        selection: { kind: 'acp', id: 'agent-codex' },
-      }),
-    ).toMatchObject({
-      kind: 'llm',
-      id: 'anthropic-sonnet',
-    })
-  })
-
-  it('falls back when an old catalog-style ACP target id is persisted', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-
-    expect(
-      resolveSidepanelChatTarget({
-        targets,
-        defaultProviderId: 'anthropic-sonnet',
-        selection: { kind: 'acp', id: 'acp:codex:gpt-5.5:medium' },
+        selection: { kind: 'acp', id: 'acp:claude:haiku:medium' },
      }),
    ).toMatchObject({
      kind: 'llm',
@@ -240,8 +180,10 @@ describe('persistSidepanelChatTargetSelection', () => {
  it('stores only target identity and does not mutate LLM provider arrays', async () => {
    let savedSelection: SidepanelChatTargetSelection | null = null
    const originalProviders = providers.map((provider) => ({ ...provider }))
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const target = targets.find((candidate) => candidate.id === 'agent-codex')
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const target = targets.find(
+      (candidate) => candidate.id === 'acp:codex:gpt-5.5:medium',
+    )

    await persistSidepanelChatTargetSelection(target, {
      setValue: async (value) => {
@@ -251,7 +193,7 @@ describe('persistSidepanelChatTargetSelection', () => {

    expect(savedSelection as SidepanelChatTargetSelection | null).toEqual({
      kind: 'acp',
-      id: 'agent-codex',
+      id: 'acp:codex:gpt-5.5:medium',
    })
    expect(providers).toEqual(originalProviders)
  })
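Review note: the updated tests above expect a persisted ACP selection that no longer appears in the target list to fall back to the default LLM provider. The implementation of `resolveSidepanelChatTarget` is not part of this hunk, so the following is only a minimal sketch of that fallback under assumed, simplified types. `TargetSketch` and `resolveTargetSketch` are hypothetical names, and returning `undefined` when the default is also missing is an assumption.

```typescript
// Hypothetical, reduced shape of a sidepanel chat target: identity only.
type TargetSketch = { kind: 'llm' | 'acp'; id: string }

// Returns the persisted selection if it still exists, otherwise the default
// LLM provider target. filter()[0] is used instead of find() so the sketch
// compiles against older lib targets as well.
function resolveTargetSketch(
  targets: TargetSketch[],
  defaultProviderId: string,
  selection: TargetSketch | null,
): TargetSketch | undefined {
  if (selection) {
    const match = targets.filter(
      (target) => target.kind === selection.kind && target.id === selection.id,
    )[0]
    if (match) return match
  }
  // Stale or missing selection: fall back to the default LLM provider.
  return targets.filter(
    (target) => target.kind === 'llm' && target.id === defaultProviderId,
  )[0]
}

// With no ACP adapters available, only LLM targets are listed.
const llmOnlyTargets: TargetSketch[] = [
  { kind: 'llm', id: 'browseros' },
  { kind: 'llm', id: 'anthropic-sonnet' },
]

const resolvedSketch = resolveTargetSketch(llmOnlyTargets, 'anthropic-sonnet', {
  kind: 'acp',
  id: 'acp:claude:haiku:medium',
})
// resolvedSketch is { kind: 'llm', id: 'anthropic-sonnet' }
```

Matching on both `kind` and `id` keeps an ACP id from accidentally resolving to an LLM provider that happens to share the same string.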
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
@@ -1,6 +1,5 @@
 import type {
  HarnessAdapterDescriptor,
-  HarnessAgent,
  HarnessAgentAdapter,
 } from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
@@ -20,7 +19,6 @@ export type SidepanelChatTarget =
      id: string
      name: string
      type: 'acp'
-      agentId: string
      adapter: HarnessAgentAdapter
      adapterName: string
      modelId: string
@@ -39,7 +37,6 @@ export type SidepanelChatTargetSelection = Pick<
 interface BuildSidepanelChatTargetsInput {
  providers: LlmProviderConfig[]
  adapters: HarnessAdapterDescriptor[]
-  agents?: HarnessAgent[]
 }

 interface ResolveSidepanelChatTargetInput {
@@ -66,49 +63,61 @@ let sidepanelChatTargetSelectionStorage:
 export function buildSidepanelChatTargets({
  providers,
  adapters,
-  agents = [],
 }: BuildSidepanelChatTargetsInput): SidepanelChatTarget[] {
  return [
    ...providers.map(toLlmTarget),
-    ...agents.map((agent) => toAcpTargetForAgent(agent, adapters)),
+    ...adapters.flatMap(toAcpTargetsForAdapter),
  ]
 }

-function toAcpTargetForAgent(
-  agent: HarnessAgent,
-  adapters: HarnessAdapterDescriptor[],
-): SidepanelChatTarget {
-  const adapter = adapters.find((entry) => entry.id === agent.adapter)
-  const modelId = agent.modelId ?? adapter?.defaultModelId ?? 'default'
-  const reasoningEffort =
-    agent.reasoningEffort ?? adapter?.defaultReasoningEffort ?? 'medium'
-  const model = adapter?.models.find((entry) => entry.id === modelId)
-  const reasoning = adapter?.reasoningEfforts.find(
-    (effort) => effort.id === reasoningEffort,
+function toAcpTargetsForAdapter(
+  adapter: HarnessAdapterDescriptor,
+): SidepanelChatTarget[] {
+  const reasoning = adapter.reasoningEfforts.find(
+    (effort) => effort.id === adapter.defaultReasoningEffort,
  )
+  const reasoningEffort =
+    reasoning?.id ?? adapter.defaultReasoningEffort ?? 'medium'

-  return {
-    kind: 'acp',
-    id: agent.id,
-    name: agent.name,
-    type: 'acp',
-    agentId: agent.id,
-    adapter: agent.adapter,
-    adapterName: adapter?.name ?? formatAdapterName(agent.adapter),
-    modelId,
-    modelLabel: model?.label ?? modelId,
-    modelControl: adapter?.modelControl ?? 'best-effort',
-    recommended: model?.recommended,
+  // Adapters with no per-session model picker (e.g. OpenClaw, whose
+  // model lives on the gateway-side agent record) still need exactly
+  // one sidepanel target so the user can pick the adapter at all.
+  if (adapter.models.length === 0) {
+    return [
+      {
+        kind: 'acp',
+        id: buildAcpTargetId(
+          adapter.id,
+          adapter.defaultModelId,
+          reasoningEffort,
+        ),
+        name: adapter.name,
+        type: 'acp',
+        adapter: adapter.id,
+        adapterName: adapter.name,
+        modelId: adapter.defaultModelId,
+        modelLabel: 'default',
+        modelControl: adapter.modelControl,
+        reasoningEffort,
+        reasoningEffortLabel: reasoning?.label,
+      },
+    ]
+  }
+
+  return adapter.models.map((model) => ({
+    kind: 'acp' as const,
+    id: buildAcpTargetId(adapter.id, model.id, reasoningEffort),
+    name: `${adapter.name} ${model.label}`,
+    type: 'acp' as const,
+    adapter: adapter.id,
+    adapterName: adapter.name,
+    modelId: model.id,
+    modelLabel: model.label,
+    modelControl: adapter.modelControl,
+    recommended: model.recommended,
    reasoningEffort,
    reasoningEffortLabel: reasoning?.label,
-  }
-}
-
-function formatAdapterName(adapter: HarnessAgentAdapter): string {
-  if (adapter === 'claude') return 'Claude Code'
-  if (adapter === 'codex') return 'Codex'
-  if (adapter === 'openclaw') return 'OpenClaw'
-  return adapter
+  }))
 }

 export function resolveSidepanelChatTarget({
@@ -163,6 +172,14 @@ function toLlmTarget(provider: LlmProviderConfig): SidepanelChatTarget {
  }
 }

+export function buildAcpTargetId(
+  adapter: HarnessAgentAdapter,
+  modelId: string,
+  reasoningEffort: string,
+): string {
+  return `acp:${adapter}:${modelId}:${reasoningEffort}`
+}
+
 async function getSidepanelChatTargetSelectionStorage(): Promise<SidepanelChatTargetSelectionStore> {
  if (sidepanelChatTargetSelectionStorage) {
    return sidepanelChatTargetSelectionStorage
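Review note: the hunks above replace per-agent target ids with ids derived from the adapter catalog. The sketch below restates that id scheme in isolation, assuming a reduced adapter shape. `AdapterSketch` and `targetIdsForAdapter` are hypothetical names used for illustration; only `buildAcpTargetId` and its template come from the diff.

```typescript
// Hypothetical stand-in for HarnessAdapterDescriptor, reduced to the fields
// the id scheme reads. Field names follow the diff; the type is an assumption.
interface AdapterSketch {
  id: string
  defaultModelId: string
  defaultReasoningEffort?: string
  models: { id: string }[]
}

// Same template as buildAcpTargetId in the diff.
function buildAcpTargetId(
  adapter: string,
  modelId: string,
  reasoningEffort: string,
): string {
  return `acp:${adapter}:${modelId}:${reasoningEffort}`
}

// One id per catalog model, or a single default id when the adapter
// exposes no per-session model picker (models is empty).
function targetIdsForAdapter(adapter: AdapterSketch): string[] {
  const effort = adapter.defaultReasoningEffort ?? 'medium'
  if (adapter.models.length === 0) {
    return [buildAcpTargetId(adapter.id, adapter.defaultModelId, effort)]
  }
  return adapter.models.map((model) =>
    buildAcpTargetId(adapter.id, model.id, effort),
  )
}

const sampleAdapters: AdapterSketch[] = [
  {
    id: 'claude',
    defaultModelId: 'sonnet',
    defaultReasoningEffort: 'medium',
    models: [{ id: 'sonnet' }, { id: 'haiku' }],
  },
  { id: 'openclaw', defaultModelId: 'default', models: [] },
]

// Equivalent to sampleAdapters.flatMap(targetIdsForAdapter).
const sampleIds: string[] = []
for (const adapter of sampleAdapters) {
  sampleIds.push(...targetIdsForAdapter(adapter))
}
// sampleIds:
// ['acp:claude:sonnet:medium', 'acp:claude:haiku:medium', 'acp:openclaw:default:medium']
```

In the tests shown in this diff the id is only compared and persisted as an opaque string, so the sketch does not attempt to parse it back into its parts.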
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
@@ -1,9 +1,6 @@
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
 import useDeepCompareEffect from 'use-deep-compare-effect'
-import {
-  useAgentAdapters,
-  useHarnessAgents,
-} from '@/entrypoints/app/agents/useAgents'
+import { useAgentAdapters } from '@/entrypoints/app/agents/useAgents'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
 import { type McpServer, useMcpServers } from '@/lib/mcp/mcpServerStorage'
@@ -41,7 +38,6 @@ export const useChatRefs = () => {
    isLoading: isLoadingProviders,
  } = useLlmProviders()
  const { adapters, loading: isLoadingAdapters } = useAgentAdapters()
-  const { harnessAgents, loading: isLoadingAgents } = useHarnessAgents()
  const { personalization } = usePersonalization()
  const [targetSelection, setTargetSelection] =
    useState<SidepanelChatTargetSelection | null>(null)
@@ -61,9 +57,8 @@ export const useChatRefs = () => {
      buildSidepanelChatTargets({
        providers: llmProviders,
        adapters,
-        agents: harnessAgents,
      }),
-    [llmProviders, adapters, harnessAgents],
+    [llmProviders, adapters],
  )

  const selectedChatTarget = useMemo(
@@ -121,7 +116,6 @@ export const useChatRefs = () => {
    selectedChatTarget,
    selectChatTarget,
    selectedLlmProvider,
-    isLoadingProviders:
-      isLoadingProviders || isLoadingAdapters || isLoadingAgents,
+    isLoadingProviders: isLoadingProviders || isLoadingAdapters,
  }
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
@@ -40,7 +40,7 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
    })
  })

-  it('sends created-agent targets to the agent-id sidepanel route', () => {
+  it('sends ACP targets to the sidepanel ACP route with explicit target fields', () => {
    const request = buildSidepanelPreparedSendMessagesRequest({
      agentServerUrl: 'http://127.0.0.1:5151',
      target: acpTarget,
@@ -52,11 +52,12 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
      ...commonRequestInput(),
    })

-    expect(request.api).toBe(
-      'http://127.0.0.1:5151/agents/agent-codex/sidepanel/chat',
-    )
+    expect(request.api).toBe('http://127.0.0.1:5151/agents/sidepanel/chat')
    expect(request.body).toEqual({
      conversationId,
+      adapter: 'codex',
+      modelId: 'gpt-5.5',
+      reasoningEffort: 'medium',
      message: 'Inspect the current tab',
      browserContext: {
        activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
@@ -139,10 +140,9 @@ const llmTarget: SidepanelChatTarget = {

 const acpTarget: SidepanelChatTarget = {
  kind: 'acp',
-  id: 'agent-codex',
-  name: 'Review bot',
+  id: 'acp:codex:gpt-5.5:medium',
+  name: 'Codex GPT-5.5',
  type: 'acp',
-  agentId: 'agent-codex',
  adapter: 'codex',
  adapterName: 'Codex',
  modelId: 'gpt-5.5',
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
@@ -680,20 +680,13 @@ export const useChatSession = (options?: ChatSessionOptions) => {
  const sendMessage = (params: { text: string; action?: ChatAction }) => {
    const target = selectedChatTargetRef.current
    const llmTargetProvider = toLlmProviderConfig(target)
-    const agentTarget = target?.kind === 'acp' ? target : undefined
    track(MESSAGE_SENT_EVENT, {
      mode,
-      provider_id:
-        agentTarget?.agentId ??
-        llmTargetProvider?.id ??
-        selectedLlmProvider?.id,
-      provider_type: agentTarget ? 'acp' : llmTargetProvider?.type,
-      agent_id: agentTarget?.agentId,
-      adapter: agentTarget?.adapter,
+      provider_type: target?.kind === 'acp' ? 'acp' : llmTargetProvider?.type,
      model:
-        agentTarget?.modelId ??
-        llmTargetProvider?.modelId ??
-        selectedLlmProvider?.modelId,
+        target?.kind === 'acp'
+          ? target.modelId
+          : llmTargetProvider?.modelId || selectedLlmProvider?.modelId,
    })

    if (!isIntegrationsSyncedRef.current) {
@@ -770,8 +763,6 @@ export const useChatSession = (options?: ChatSessionOptions) => {
      provider_type: target.kind === 'acp' ? 'acp' : target.type,
      model_id:
        target.kind === 'acp' ? target.modelId : target.provider.modelId,
-      agent_id: target.kind === 'acp' ? target.agentId : undefined,
-      adapter: target.kind === 'acp' ? target.adapter : undefined,
    })

    void selectChatTarget(target).catch((error) => {
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
@@ -34,10 +34,15 @@ export function buildSidepanelPreparedSendMessagesRequest({
  ...common
 }: BuildSidepanelPreparedSendMessagesRequestInput) {
  if (target?.kind === 'acp') {
+    // ACP session history is owned by AcpxRuntime through sessionKey, so LLM-only
+    // resume and approval fields are intentionally not forwarded.
    return {
-      api: `${agentServerUrl}/agents/${encodeURIComponent(target.agentId)}/sidepanel/chat`,
+      api: `${agentServerUrl}/agents/sidepanel/chat`,
      body: {
        conversationId: common.conversationId,
+        adapter: target.adapter,
+        modelId: target.modelId,
+        reasoningEffort: target.reasoningEffort,
        message: message ?? '',
        browserContext: common.browserContext,
        userSystemPrompt: common.userSystemPrompt,
@@ -66,9 +71,6 @@ export function toProviderOption(target: SidepanelChatTarget): Provider {
    name: target.name,
    type: target.type,
    kind: target.kind,
-    agentId: target.kind === 'acp' ? target.agentId : undefined,
-    adapterName: target.kind === 'acp' ? target.adapterName : undefined,
-    modelLabel: target.kind === 'acp' ? target.modelLabel : undefined,
    modelControl: target.kind === 'acp' ? target.modelControl : undefined,
  }
 }
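Review note: with `agentId` gone, the ACP branch above posts to a single route and identifies the target through explicit body fields. A minimal sketch of that request shape follows, assuming a reduced target type and omitting the browser context and system prompt fields that the real builder also forwards. `AcpTargetSketch` and `buildAcpChatRequestSketch` are hypothetical names.

```typescript
// Hypothetical, reduced ACP target: only the fields forwarded in the body.
interface AcpTargetSketch {
  kind: 'acp'
  adapter: string
  modelId: string
  reasoningEffort: string
}

// Builds the route and body for an ACP sidepanel chat request. The route no
// longer embeds an agent id; adapter, model, and effort travel in the body.
function buildAcpChatRequestSketch(
  agentServerUrl: string,
  target: AcpTargetSketch,
  conversationId: string,
  message?: string,
) {
  return {
    api: `${agentServerUrl}/agents/sidepanel/chat`,
    body: {
      conversationId,
      adapter: target.adapter,
      modelId: target.modelId,
      reasoningEffort: target.reasoningEffort,
      // An absent message is sent as an empty string, as in the diff.
      message: message ?? '',
    },
  }
}

const sampleRequest = buildAcpChatRequestSketch(
  'http://127.0.0.1:5151',
  { kind: 'acp', adapter: 'codex', modelId: 'gpt-5.5', reasoningEffort: 'medium' },
  'conversation-1', // placeholder conversation id for illustration
  'Inspect the current tab',
)
// sampleRequest.api is 'http://127.0.0.1:5151/agents/sidepanel/chat'
```

Because the server can no longer look the agent up by id, every field it needs to pick a runtime has to be present in the body on each request.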
--- a/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
+++ b/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
@@ -59,3 +59,15 @@ export interface AgentConversation {
  createdAt: number
  updatedAt: number
 }
+
+export interface AgentCardData {
+  agentId: string
+  name: string
+  model?: string
+  status: 'idle' | 'working' | 'error'
+  lastMessage?: string
+  lastMessageTimestamp?: number
+  activitySummary?: string
+  currentTool?: string
+  costUsd?: number
+}
--- a/packages/browseros-agent/apps/agent/package.json
+++ b/packages/browseros-agent/apps/agent/package.json
@@ -9,7 +9,6 @@
    "build": "bun run codegen && wxt build",
    "build:dev": "bun --env-file=.env.development wxt build --mode development",
    "zip": "wxt zip",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
    "compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
    "lint": "bunx biome check",
    "typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
--- a/packages/browseros-agent/apps/cli/README.md
+++ b/packages/browseros-agent/apps/cli/README.md
@@ -38,8 +38,8 @@ browseros-cli install                # downloads BrowserOS for your platform
 # If BrowserOS is installed but not running
 browseros-cli launch                 # opens BrowserOS, waits for server

-# Configure the CLI with the Server URL from BrowserOS settings
-browseros-cli init http://127.0.0.1:9000/mcp
+# Configure the CLI (auto-discovers running BrowserOS)
+browseros-cli init --auto            # detects server URL and saves config

 # Verify connection
 browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url>             # non-interactive — pass URL directly
 browseros-cli init                   # interactive — prompts for URL
 ```

-Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
+Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).

 ### CLI updates

@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
 | `--debug` | `BOS_DEBUG=1` | Debug output |
 | `--timeout, -t` | | Request timeout (default: 2m) |

-Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
+Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file

-If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
+If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.

 ## Testing

@@ -179,7 +179,7 @@ apps/cli/
 │   └── config.go       # Config file (~/.config/browseros-cli/config.yaml)
 ├── cmd/
 │   ├── root.go         # Root command, global flags
-│   ├── init.go         # Server URL configuration (URL arg or interactive)
+│   ├── init.go         # Server URL configuration (URL arg, --auto, interactive)
 │   ├── install.go      # install (download BrowserOS for current platform)
 │   ├── launch.go       # launch (find and start BrowserOS, wait for server)
 │   ├── open.go         # open (new_page / new_hidden_page)
--- a/packages/browseros-agent/apps/cli/cmd/init.go
+++ b/packages/browseros-agent/apps/cli/cmd/init.go
@@ -17,6 +17,8 @@ import (
 )

 func init() {
+	var autoDiscover bool
+
 	cmd := &cobra.Command{
 		Use:   "init [url]",
 		Short: "Configure the BrowserOS server connection",
@@ -32,8 +34,9 @@ You can provide the full URL or just the port number:
  browseros-cli init http://127.0.0.1:9000/mcp
  browseros-cli init 9000

-Modes:
+Three modes:
  browseros-cli init <url>    Non-interactive (full URL or port number)
+  browseros-cli init --auto   Auto-discover a running server (server.json, config, common ports)
  browseros-cli init          Interactive prompt`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.MaximumNArgs(1),
@@ -46,9 +49,22 @@ Modes:

 			switch {
 			case len(args) == 1:
+				// Non-interactive: URL provided as argument
 				input = args[0]

+			case autoDiscover:
+				// Auto-discover: server.json → config → probe common ports
+				discovered := probeRunningServer()
+				if discovered == "" {
+					output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
+						"  If not running:    browseros-cli launch\n"+
+						"  If not installed:  browseros-cli install", 1)
+				}
+				input = discovered
+				fmt.Printf("Auto-discovered server at %s\n", input)
+
 			default:
+				// Interactive prompt (original behavior)
 				fmt.Println()
 				bold.Println("BrowserOS CLI Setup")
 				fmt.Println()
@@ -79,14 +95,12 @@ Modes:
 				output.Errorf(1, "invalid URL: %s", input)
 			}

+			// Verify connectivity
 			fmt.Printf("Checking connection to %s ...\n", baseURL)
 			client := &http.Client{Timeout: 5 * time.Second}
 			resp, err := client.Get(baseURL + "/health")
 			if err != nil {
-				output.Errorf(1, "cannot connect to %s: %v\n\n"+
-					"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
-					"Then run: browseros-cli init <Server URL>\n"+
-					"Example:  browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
+				output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
 			}
 			resp.Body.Close()

@@ -107,5 +121,6 @@ Modes:
 		},
 	}

+	cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover a running BrowserOS server (server.json, saved config, common ports)")
 	rootCmd.AddCommand(cmd)
 }
--- a/packages/browseros-agent/apps/cli/cmd/install.go
+++ b/packages/browseros-agent/apps/cli/cmd/install.go
@@ -28,7 +28,7 @@ Linux:   Downloads AppImage (or .deb with --deb flag)

 After installation:
  browseros-cli launch        # start BrowserOS
-  browseros-cli init <url>    # configure the CLI with the Server URL`,
+  browseros-cli init --auto   # configure the CLI`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.NoArgs,
 		Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
 			fmt.Println()
 			bold.Println("Next steps:")
 			dim.Println("  browseros-cli launch        # start BrowserOS")
-			dim.Println("  browseros-cli init <url>    # use the Server URL from BrowserOS settings")
+			dim.Println("  browseros-cli init --auto   # configure the CLI")
 		},
 	}

--- a/packages/browseros-agent/apps/cli/cmd/launch.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch.go
@@ -1,7 +1,6 @@
 package cmd

 import (
-	"encoding/json"
 	"fmt"
 	"net/http"
 	"os"
@@ -39,7 +38,6 @@ If BrowserOS is already running, reports the server URL.`,

 			if url := probeRunningServer(); url != "" {
 				green.Printf("BrowserOS is already running at %s\n", url)
-				dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
 				return
 			}

@@ -65,7 +63,7 @@ If BrowserOS is already running, reports the server URL.`,

 			green.Printf("BrowserOS is ready at %s\n", url)
 			fmt.Println()
-			dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
+			dim.Println("Next: browseros-cli init --auto")
 		},
 	}

@@ -77,77 +75,39 @@ If BrowserOS is already running, reports the server URL.`,
 // Server probing
 // ---------------------------------------------------------------------------

-var commonBrowserOSPorts = []int{9100, 9200, 9300}
-
-// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
+// probeRunningServer checks server.json, config, and common ports for a running server.
 func probeRunningServer() string {
-	client := &http.Client{Timeout: 2 * time.Second}
+	client := &http.Client{Timeout: 2 * time.Second}
+	check := func(baseURL string) bool {
+		resp, err := client.Get(baseURL + "/health")
+		if err != nil {
+			return false
+		}
+		resp.Body.Close()
+		return resp.StatusCode == http.StatusOK
+	}

-	if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
+	// 1. server.json — written by BrowserOS on startup with the actual port
+	if url := loadBrowserosServerURL(); url != "" && check(url) {
 		return url
 	}

-	if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
+	// 2. Env var, server.json, or saved config (defaultServerURL's own order)
+	if url := defaultServerURL(); url != "" && check(url) {
 		return url
 	}

-	return probeCommonServerPorts(client)
-}
-
-func checkServerHealth(client *http.Client, baseURL string) bool {
-	resp, err := client.Get(baseURL + "/health")
-	if err != nil {
-		return false
-	}
-	resp.Body.Close()
-	return resp.StatusCode == 200
-}
-
-func probeCommonServerPorts(client *http.Client) string {
-	for _, port := range commonBrowserOSPorts {
+	// 3. Probe common BrowserOS ports as last resort
+	for _, port := range []int{9100, 9200, 9300} {
 		url := fmt.Sprintf("http://127.0.0.1:%d", port)
-		if checkServerHealth(client, url) {
+		if check(url) {
 			return url
 		}
 	}
+
 	return ""
 }

-type serverDiscoveryConfig struct {
-	ServerPort       int    `json:"server_port"`
-	URL              string `json:"url"`
-	ServerVersion    string `json:"server_version"`
-	BrowserOSVersion string `json:"browseros_version,omitempty"`
-	ChromiumVersion  string `json:"chromium_version,omitempty"`
-}
-
-// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
-//
-// Normal command resolution must not call this because it can override a URL the
-// user explicitly saved with `browseros-cli init <Server URL>`.
-func loadBrowserosServerURL() string {
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return ""
-	}
-
-	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
-	if err != nil {
-		return ""
-	}
-
-	var sc serverDiscoveryConfig
-	if err := json.Unmarshal(data, &sc); err != nil {
-		return ""
-	}
-
-	return normalizeServerURL(sc.URL)
-}
-
-func mcpEndpointURL(baseURL string) string {
-	return strings.TrimSuffix(baseURL, "/") + "/mcp"
-}
-
 // ---------------------------------------------------------------------------
 // Platform-native installation detection
 // ---------------------------------------------------------------------------
@@ -157,8 +117,7 @@ func mcpEndpointURL(baseURL string) string {
 // macOS:   `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
 // Linux:   checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
 // Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
-//
-//	and registry uninstall key (per-user Chromium install pattern)
+//          and registry uninstall key (per-user Chromium install pattern)
 func isBrowserOSInstalled() bool {
 	switch runtime.GOOS {
 	case "darwin":
@@ -312,11 +271,14 @@ func waitForServer(maxWait time.Duration) (string, bool) {

 	for time.Now().Before(deadline) {
 		// server.json is written by BrowserOS on startup with the actual port
-		if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
-			return url, true
-		}
-		if url := probeCommonServerPorts(client); url != "" {
-			return url, true
+		if url := loadBrowserosServerURL(); url != "" {
+			resp, err := client.Get(url + "/health")
+			if err == nil {
+				resp.Body.Close()
+				if resp.StatusCode == 200 {
+					return url, true
+				}
+			}
 		}
 		fmt.Print(".")
 		time.Sleep(1 * time.Second)
--- a/packages/browseros-agent/apps/cli/cmd/launch_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch_test.go
@@ -1,99 +0,0 @@
-package cmd
-
-import (
-	"fmt"
-	"net"
-	"net/http"
-	"net/http/httptest"
-	"net/url"
-	"os"
-	"path/filepath"
-	"strconv"
-	"testing"
-	"time"
-
-	"browseros-cli/config"
-)
-
-func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	discoveredServer := newHealthyServer(t)
-	configServer := newHealthyServer(t)
-
-	serverDir := filepath.Join(home, ".browseros")
-	if err := os.MkdirAll(serverDir, 0755); err != nil {
-		t.Fatalf("os.MkdirAll() error = %v", err)
-	}
-	data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
-	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
-		t.Fatalf("os.WriteFile() error = %v", err)
-	}
-	if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := probeRunningServer()
-	if got != normalizeServerURL(discoveredServer.URL) {
-		t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
-	}
-}
-
-func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-
-	server := newHealthyServer(t)
-	port := serverPort(t, server.URL)
-
-	originalPorts := commonBrowserOSPorts
-	commonBrowserOSPorts = []int{port}
-	t.Cleanup(func() {
-		commonBrowserOSPorts = originalPorts
-	})
-
-	got, ok := waitForServer(100 * time.Millisecond)
-	if !ok {
-		t.Fatal("waitForServer() ok = false, want true")
-	}
-	if got != normalizeServerURL(server.URL) {
-		t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
-	}
-}
-
-func newHealthyServer(t *testing.T) *httptest.Server {
-	t.Helper()
-
-	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		if r.URL.Path != "/health" {
-			http.NotFound(w, r)
-			return
-		}
-		w.WriteHeader(http.StatusOK)
-	}))
-	t.Cleanup(server.Close)
-	return server
-}
-
-func serverPort(t *testing.T, rawURL string) int {
-	t.Helper()
-
-	parsed, err := url.Parse(rawURL)
-	if err != nil {
-		t.Fatalf("url.Parse() error = %v", err)
-	}
-	_, portText, err := net.SplitHostPort(parsed.Host)
-	if err != nil {
-		t.Fatalf("net.SplitHostPort() error = %v", err)
-	}
-	port, err := strconv.Atoi(portText)
-	if err != nil {
-		t.Fatalf("strconv.Atoi() error = %v", err)
-	}
-	return port
-}
--- a/packages/browseros-agent/apps/cli/cmd/root.go
+++ b/packages/browseros-agent/apps/cli/cmd/root.go
@@ -2,8 +2,10 @@ package cmd

 import (
 	"context"
+	"encoding/json"
 	"fmt"
 	"os"
+	"path/filepath"
 	"strconv"
 	"strings"
 	"time"
@@ -287,15 +289,18 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
 	}
 }

-// defaultServerURL returns the implicit target from user-controlled settings only.
-//
-// BrowserOS writes a discovery file at runtime, but normal commands intentionally
-// ignore it so a saved URL is not silently overridden by another running server.
 func defaultServerURL() string {
+	// 1. Explicit env var always wins
 	if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
 		return env
 	}

+	// 2. Live discovery file from running BrowserOS (most current)
+	if url := loadBrowserosServerURL(); url != "" {
+		return url
+	}
+
+	// 3. Saved config (may be stale if port changed)
 	cfg, err := config.Load()
 	if err == nil {
 		if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -306,6 +311,33 @@ func defaultServerURL() string {
 	return ""
 }

+type serverDiscoveryConfig struct {
+	ServerPort       int    `json:"server_port"`
+	URL              string `json:"url"`
+	ServerVersion    string `json:"server_version"`
+	BrowserOSVersion string `json:"browseros_version,omitempty"`
+	ChromiumVersion  string `json:"chromium_version,omitempty"`
+}
+
+func loadBrowserosServerURL() string {
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return ""
+	}
+
+	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
+	if err != nil {
+		return ""
+	}
+
+	var sc serverDiscoveryConfig
+	if err := json.Unmarshal(data, &sc); err != nil {
+		return ""
+	}
+
+	return normalizeServerURL(sc.URL)
+}
+
 func normalizeServerURL(raw string) string {
 	normalized := strings.TrimSpace(raw)

@@ -337,10 +369,8 @@ func validateServerURL(raw string) (string, error) {

 	return "", fmt.Errorf(
 		"BrowserOS server URL is not configured.\n\n" +
-			"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
-			"  Save it with:       browseros-cli init <Server URL>\n" +
-			"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
-			"  If BrowserOS is closed:  browseros-cli launch\n" +
-			"  If not installed:        browseros-cli install",
+			"  If BrowserOS is running:  browseros-cli init --auto\n" +
+			"  If BrowserOS is closed:   browseros-cli launch\n" +
+			"  If not installed:         browseros-cli install",
 	)
 }
--- a/packages/browseros-agent/apps/cli/cmd/root_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/root_test.go
@@ -1,13 +1,8 @@
 package cmd

 import (
-	"os"
-	"path/filepath"
-	"strings"
 	"testing"
 	"time"
-
-	"browseros-cli/config"
 )

 func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -105,76 +100,6 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
 	}
 }

-func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
-
-	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := defaultServerURL()
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := defaultServerURL()
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	serverDir := filepath.Join(home, ".browseros")
-	if err := os.MkdirAll(serverDir, 0755); err != nil {
-		t.Fatalf("os.MkdirAll() error = %v", err)
-	}
-	data := []byte(`{"url":"http://127.0.0.1:9999"}`)
-	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
-		t.Fatalf("os.WriteFile() error = %v", err)
-	}
-
-	if got := defaultServerURL(); got != "" {
-		t.Fatalf("defaultServerURL() = %q, want empty", got)
-	}
-}
-
-func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
-	got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestValidateServerURLExplainsManualInit(t *testing.T) {
-	_, err := validateServerURL("")
-	if err == nil {
-		t.Fatal("validateServerURL() error = nil, want setup instructions")
-	}
-	msg := err.Error()
-	if !strings.Contains(msg, "browseros-cli init <Server URL>") {
-		t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
-	}
-	if strings.Contains(msg, "init --auto") {
-		t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
-	}
-}
-
 func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
 	done := make(chan struct{})
 	returned := make(chan struct{})
--- a/packages/browseros-agent/apps/cli/mcp/client.go
+++ b/packages/browseros-agent/apps/cli/mcp/client.go
@@ -44,7 +44,10 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {

 	session, err := sdkClient.Connect(ctx, transport, nil)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
+			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
+			"  If BrowserOS is not running:                  browseros-cli launch\n"+
+			"  If not installed:                             browseros-cli install", c.BaseURL, err)
 	}
 	return session, nil
 }
@@ -184,7 +187,10 @@ func (c *Client) Status() (map[string]any, error) {
 func (c *Client) restGET(path string) (map[string]any, error) {
 	resp, err := c.HTTPClient.Get(c.BaseURL + path)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
+			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
+			"  If BrowserOS is not running:                  browseros-cli launch\n"+
+			"  If not installed:                             browseros-cli install", c.BaseURL, err)
 	}
 	defer resp.Body.Close()

@@ -199,14 +205,3 @@ func (c *Client) restGET(path string) (map[string]any, error) {
 	}
 	return data, nil
 }
-
-// connectionSetupInstructions explains how to recover from a stale or missing server URL.
-func connectionSetupInstructions() string {
-	return "\n\n" +
-		"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
-		"  Save it with:       browseros-cli init <Server URL>\n" +
-		"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
-		"  Run once with:      browseros-cli --server <Server URL> health\n" +
-		"  If BrowserOS is closed:  browseros-cli launch\n" +
-		"  If not installed:        browseros-cli install"
-}
--- a/packages/browseros-agent/apps/cli/npm/README.md
+++ b/packages/browseros-agent/apps/cli/npm/README.md
@@ -31,8 +31,8 @@ browseros-cli install
 # Start BrowserOS
 browseros-cli launch

-# Configure MCP settings with the Server URL from BrowserOS settings
-browseros-cli init http://127.0.0.1:9000/mcp
+# Auto-configure MCP settings for your AI tools
+browseros-cli init --auto

 # Verify everything is working
 browseros-cli health
--- a/packages/browseros-agent/apps/eval/.env.example
+++ b/packages/browseros-agent/apps/eval/.env.example
@@ -1,51 +0,0 @@
-# Copy to .env.development for local eval runs.
-
-# Provider keys used by existing config files.
-OPENROUTER_API_KEY=
-FIREWORKS_API_KEY=
-ANTHROPIC_API_KEY=
-OPENAI_API_KEY=
-GOOGLE_GENERATIVE_AI_API_KEY=
-
-# Claude Agent SDK token used by performance_grader.
-CLAUDE_CODE_OAUTH_TOKEN=
-
-# Suite-mode model selection.
-EVAL_VARIANT=local
-EVAL_AGENT_PROVIDER=openai-compatible
-EVAL_AGENT_MODEL=
-EVAL_AGENT_API_KEY=
-EVAL_AGENT_BASE_URL=
-EVAL_AGENT_SUPPORTS_IMAGES=true
-
-# Optional suite-mode executor override for orchestrator suites.
-EVAL_EXECUTOR_MODEL=
-EVAL_EXECUTOR_API_KEY=
-EVAL_EXECUTOR_BASE_URL=
-
-# Clado visual action executor.
-CLADO_ACTION_MODEL=
-CLADO_ACTION_API_KEY=
-CLADO_ACTION_BASE_URL=
-# Backward-compatible alias used by older local scripts.
-CLADO_ACTION_URL=
-
-# BrowserOS runner.
-BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
-BROWSEROS_SERVER_URL=http://127.0.0.1:9110
-BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
-BROWSEROS_CONFIG_URL=
-
-# Captcha solver extension.
-NOPECHA_API_KEY=
-
-# WebArena-Infinity.
-WEBARENA_INFINITY_DIR=
-INFINITY_APP_URL=
-
-# R2 publishing and weekly report.
-EVAL_R2_ACCOUNT_ID=
-EVAL_R2_ACCESS_KEY_ID=
-EVAL_R2_SECRET_ACCESS_KEY=
-EVAL_R2_BUCKET=browseros-eval
-EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
--- a/packages/browseros-agent/apps/eval/README.md
+++ b/packages/browseros-agent/apps/eval/README.md
@@ -9,13 +9,11 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
 - **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
 - **Bun** runtime
 - **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
-- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.

 ## Quick Start

 ```bash
 cd apps/eval
-cp .env.example .env.development
 # Edit .env.development with your keys, then:
 bun run eval
 ```
@@ -25,62 +23,17 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
 ### CLI mode

 ```bash
-bun run eval -c configs/legacy/browseros-agent-weekly.json
-bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
+bun run eval -c configs/browseros-agent-weekly.json
 ```

 Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.

-The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
-
-```bash
-bun run eval run --config configs/legacy/browseros-agent-weekly.json
-bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
-bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
-bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
-```
-
-Config files live in two groups:
-
-```txt
-configs/legacy/  # Complete EvalConfig files used by older workflows and the dashboard
-configs/suites/  # Suite definitions; model/provider comes from CLI flags or env
-```
-
-Suite mode takes model settings from CLI flags first, then env:
-
-```bash
-EVAL_VARIANT=kimi-fireworks \
-EVAL_AGENT_PROVIDER=openai-compatible \
-EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
-EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
-EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
-bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
-```
-
-### Suites and variants
-
-A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
-
-A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
-
-This lets us run the same suite against multiple model setups without copying the benchmark config:
-
-```txt
-agisdk-daily-10 + kimi-fireworks
-agisdk-daily-10 + claude-opus
-agisdk-daily-10 + clado-action-000159
-```
-
-For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
-
 ## Agent types

 | Type | Description |
 |------|-------------|
 | `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
 | `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
-| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |

 ### Single agent

@@ -121,24 +74,6 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
 }
 ```

-### Claude Code
-
-Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
-
-```json
-{
-  "agent": {
-    "type": "claude-code",
-    "model": "opus"
-  }
-}
-```
-
-```bash
-BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
-bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
-```
-
 ## Graders

 | Name | Description |
@@ -161,21 +96,6 @@ The `apiKey` field supports two formats:
 - **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
 - **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)

-### Environment variables
-
-| Variable | Used for |
-|----------|----------|
-| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
-| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
-| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
-| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
-| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
-| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
-| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
-| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
-| `NOPECHA_API_KEY` | CAPTCHA solver extension |
-| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
-
 ### Supported providers

 | Provider | `provider` value | Requires `baseUrl` |
@@ -190,22 +110,6 @@ The `apiKey` field supports two formats:
 | Ollama | `ollama` | No |
 | Clado Action (executor only) | `clado-action` | Yes |

-### R2 publishing
-
-`suite --config ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to the viewer-compatible R2 layout:
-
-```bash
-export EVAL_R2_ACCOUNT_ID=...
-export EVAL_R2_ACCESS_KEY_ID=...
-export EVAL_R2_SECRET_ACCESS_KEY=...
-export EVAL_R2_BUCKET=browseros-eval
-export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
-```
-
-`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
-
-Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
-
 ### BrowserOS infrastructure

 ```json
@@ -215,7 +119,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
  "base_server_port": 9110,
  "base_extension_port": 9310,
  "load_extensions": false,
-  "headless": false
+  "headless": true
 }
 ```

@@ -233,12 +137,10 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP

 | File | Tasks | Description |
 |------|-------|-------------|
-| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
 | `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
 | `mind2web.jsonl` | 300 | Online-Mind2Web |
 | `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
-| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
-| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
+| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
 | `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
 | `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
 | `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
@@ -265,47 +167,14 @@ results/
  browseros-agent-weekly/
    2026-04-29-1430/
      Amazon--0/
-        attempt.json          # Stable attempt summary for viewer/reporting
        metadata.json         # Task result, timing, grader scores
-        grades.json           # Compact grader results
        messages.jsonl         # Full message log
-        grader-artifacts/      # Grader-specific inputs/outputs/stderr
        screenshots/
          001.png              # Step-by-step screenshots
          002.png
      summary.json             # Aggregate pass rates
 ```

-R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
-
-### R2 viewer manifest
-
-`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
-
-```json
-{
-  "schemaVersion": 2,
-  "runId": "agisdk-real-smoke-2026-04-30-0000",
-  "tasks": [
-    {
-      "queryId": "agisdk-dashdish-10",
-      "paths": {
-        "metadata": "tasks/agisdk-dashdish-10/metadata.json",
-        "messages": "tasks/agisdk-dashdish-10/messages.jsonl",
-        "grades": "tasks/agisdk-dashdish-10/grades.json",
-        "trace": "tasks/agisdk-dashdish-10/trace.jsonl",
-        "screenshots": "tasks/agisdk-dashdish-10/screenshots",
-        "graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
-      }
-    }
-  ]
-}
-```
-
-The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
-
-Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
-
 ## Troubleshooting

 **BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://api.fireworks.ai/inference/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/agisdk-real.jsonl",
+  "dataset": "../data/agisdk-real.jsonl",
  "num_workers": 4,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
@@ -7,8 +7,8 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/agisdk-real.jsonl",
-  "num_workers": 3,
+  "dataset": "../data/webbench-2of4-50.jsonl",
+  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
    "server_url": "http://127.0.0.1:9110",
@@ -21,6 +21,6 @@
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
  },
-  "graders": ["agisdk_state_diff"],
+  "graders": ["performance_grader"],
  "timeout_ms": 1800000
 }
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
@@ -14,7 +14,7 @@
      "baseUrl": "https://api.fireworks.ai/inference/v1"
    }
  },
-  "dataset": "../../data/webbench-2of4-50.jsonl",
+  "dataset": "../data/webbench-2of4-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
@@ -14,7 +14,7 @@
      "baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
    }
  },
-  "dataset": "../../data/agisdk-real.jsonl",
+  "dataset": "../data/agisdk-real.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
@@ -23,7 +23,7 @@
    "base_server_port": 9110,
    "base_extension_port": 9310,
    "load_extensions": false,
-    "headless": false
+    "headless": true
  },
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
--- a/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/webarena-infinity-hard-50.jsonl",
+  "dataset": "../data/webarena-infinity-hard-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
@@ -1,26 +0,0 @@
-{
-  "agent": {
-    "type": "single",
-    "provider": "openai-compatible",
-    "model": "moonshotai/kimi-k2.5",
-    "apiKey": "OPENROUTER_API_KEY",
-    "baseUrl": "https://openrouter.ai/api/v1",
-    "supportsImages": true
-  },
-  "dataset": "../../data/agisdk-real-smoke.jsonl",
-  "num_workers": 1,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-kimi-k2-5-agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-kimi-k2-5-agisdk-real.json
@@ -1,26 +0,0 @@
-{
-  "agent": {
-    "type": "single",
-    "provider": "openai-compatible",
-    "model": "moonshotai/kimi-k2.5",
-    "apiKey": "OPENROUTER_API_KEY",
-    "baseUrl": "https://openrouter.ai/api/v1",
-    "supportsImages": true
-  },
-  "dataset": "../../data/agisdk-real.jsonl",
-  "num_workers": 3,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-opus-4-6-agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-opus-4-6-agisdk-real.json
@@ -1,27 +0,0 @@
-{
-  "agent": {
-    "type": "single",
-    "provider": "bedrock",
-    "model": "global.anthropic.claude-opus-4-6-v1",
-    "region": "AWS_REGION",
-    "accessKeyId": "AWS_ACCESS_KEY_ID",
-    "secretAccessKey": "AWS_SECRET_ACCESS_KEY",
-    "supportsImages": true
-  },
-  "dataset": "../../data/agisdk-real.jsonl",
-  "num_workers": 2,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
@@ -1,23 +0,0 @@
-{
-  "agent": {
-    "type": "claude-code",
-    "model": "opus",
-    "extraArgs": ["--permission-mode", "bypassPermissions"]
-  },
-  "dataset": "../../data/agisdk-real.jsonl",
-  "num_workers": 1,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-daily-10",
-  "dataset": "../../data/agisdk-daily-10.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-real-smoke",
-  "dataset": "../../data/agisdk-real-smoke.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-real",
-  "dataset": "../../data/agisdk-real.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
@@ -5,7 +5,7 @@
    "model": "openai/gpt-4.1",
    "apiKey": "OPENROUTER_API_KEY"
  },
-  "dataset": "../../data/mind2web.jsonl",
+  "dataset": "../data/mind2web.jsonl",
  "num_workers": 5,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://api.fireworks.ai/inference/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/webvoyager.jsonl",
+  "dataset": "../data/webvoyager.jsonl",
  "num_workers": 3,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
@@ -1,10 +0,0 @@
-{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
-{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
-{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
-{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
-{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
-{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
-{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}
--- a/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
@@ -1 +0,0 @@
-{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
--- a/packages/browseros-agent/apps/eval/package.json
+++ b/packages/browseros-agent/apps/eval/package.json
@@ -5,7 +5,6 @@
  "type": "module",
  "scripts": {
    "eval": "bun --env-file=.env.development run src/index.ts",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
--- a/packages/browseros-agent/apps/eval/src/graders/python/agisdk-evaluate.py
+++ b/packages/browseros-agent/apps/eval/src/graders/python/agisdk-evaluate.py
@@ -81,13 +81,30 @@ def main():

        reward_val = float(reward_val) if reward_val is not None else 0.0
        results = info.get("results", [])
+        # `info["results"]` aligns 1:1 with `tc.task.evals`, so pair them by index
+        # to surface the human-readable description and JMESPath query alongside
+        # the pass/fail. Without this, the only feedback was a stringified dict.
+        evals = list(getattr(tc.task, "evals", []))

        per_criterion = []
        softened_count = 0
-        for r in results:
+        for idx, r in enumerate(results):
            passed = bool(r[0])
-            detail = r[1] if len(r) > 1 else ""
-            entry: dict = {"passed": passed, "detail": str(detail)}
+            detail = r[1] if len(r) > 1 else {}
+            ev = evals[idx] if idx < len(evals) else None
+
+            actual_value = expected_value = None
+            if isinstance(detail, dict):
+                actual_value = detail.get("actual_value")
+                expected_value = detail.get("expected_value")
+
+            entry: dict = {
+                "passed": passed,
+                "description": getattr(ev, "description", "") or "",
+                "query": getattr(ev, "query", "") or "",
+                "expected_value": expected_value,
+                "actual_value": actual_value,
+            }
            if not _STRICT and not passed and _soft_string_match(detail):
                entry["passed"] = True
                entry["softened"] = True
@@ -100,9 +117,43 @@ def main():
        if all_pass and reward_val != 1.0:
            reward_val = 1.0

-        out_message = str(message)
-        if softened_count and all_pass:
-            out_message = f"Task passed (with {softened_count} softened string criterion/criteria)."
+        # Build a useful message: list every criterion with a pass/fail icon
+        # so the viewer's grader pill shows the full check-list, not just
+        # failures. This becomes the `reasoning` shown in the viewer.
+        if not per_criterion:
+            # Defensive: agisdk returned no criteria — fall back to its message.
+            out_message = str(message)
+        else:
+            failures = [c for c in per_criterion if not c["passed"]]
+            if all_pass:
+                header = (
+                    f"All {len(per_criterion)} criteria passed"
+                    + (
+                        f" ({softened_count} softened)."
+                        if softened_count
+                        else "."
+                    )
+                )
+            else:
+                header = (
+                    f"{len(failures)} of {len(per_criterion)} criteria failed:"
+                )
+
+            lines = []
+            for c in per_criterion:
+                icon = "✓" if c["passed"] else "✗"
+                desc = c["description"] or c["query"] or "<unknown>"
+                soft = " (softened)" if c.get("softened") else ""
+                if c["passed"]:
+                    lines.append(f"{icon} {desc}{soft}")
+                else:
+                    exp_s = repr(c["expected_value"])
+                    act_s = repr(c["actual_value"])
+                    lines.append(
+                        f"{icon} {desc}: expected {exp_s}, got {act_s}"
+                    )
+
+            out_message = header + "\n" + "\n".join(lines)

        print(
            json.dumps(
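Reviewer note: the message-building logic in the hunk above can be exercised on its own. The following is a minimal sketch, not the grader's actual code: `format_grader_message` is a hypothetical helper name, and the sample criteria are made up for illustration. It assumes each criterion is a dict with the keys the hunk writes (`passed`, `description`, `query`, `expected_value`, `actual_value`, and optionally `softened`).

```python
from typing import Any


def format_grader_message(criteria: list[dict[str, Any]], fallback: str = "") -> str:
    """Render a header plus one line per criterion (hypothetical helper)."""
    if not criteria:
        # No criteria were returned: fall back to the upstream message.
        return fallback

    failures = [c for c in criteria if not c["passed"]]
    softened = sum(1 for c in criteria if c.get("softened"))

    if not failures:
        header = f"All {len(criteria)} criteria passed"
        header += f" ({softened} softened)." if softened else "."
    else:
        header = f"{len(failures)} of {len(criteria)} criteria failed:"

    lines = []
    for c in criteria:
        icon = "✓" if c["passed"] else "✗"
        # Prefer the human-readable description, then the query.
        desc = c.get("description") or c.get("query") or "<unknown>"
        if c["passed"]:
            soft = " (softened)" if c.get("softened") else ""
            lines.append(f"{icon} {desc}{soft}")
        else:
            # repr() keeps quotes and escapes visible, and nothing is truncated.
            expected = repr(c.get("expected_value"))
            actual = repr(c.get("actual_value"))
            lines.append(f"{icon} {desc}: expected {expected}, got {actual}")

    return header + "\n" + "\n".join(lines)


# Illustrative input only; these criteria are not taken from a real run.
sample = [
    {"passed": True, "description": "Order placed", "query": "orders[0].id"},
    {
        "passed": False,
        "description": "Delivery method",
        "query": "orders[0].method",
        "expected_value": "Standard",
        "actual_value": "Express",
    },
]
print(format_grader_message(sample))
```

Because failed lines use `repr()` without truncation, long expected and actual strings appear in full, which is what the follow-up commit dropping the 60-character limit relies on.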
--- a/packages/browseros-agent/apps/eval/scripts/generate-report.ts
+++ b/packages/browseros-agent/apps/eval/scripts/generate-report.ts
@@ -1,191 +0,0 @@
-#!/usr/bin/env bun
-
-import { mkdir, stat } from 'node:fs/promises'
-import { dirname, resolve } from 'node:path'
-import { query as claudeQuery } from '@anthropic-ai/claude-agent-sdk'
-import { readRunMetricSummary } from '../src/reporting/task-metrics'
-
-export const DEFAULT_REPORT_MODEL = 'claude-opus-4-6'
-export const DEFAULT_REPORT_MAX_TURNS = 300
-
-type Env = Record<string, string | undefined>
-type ClaudeQuery = (input: unknown) => AsyncIterable<Record<string, unknown>>
-
-export interface ReportAgentInvocation {
-  inputDir: string
-  outputPath: string
-  prompt: string
-}
-
-export interface GenerateEvalReportOptions {
-  inputDir: string
-  outputPath: string
-  runAgent?: (invocation: ReportAgentInvocation) => Promise<void>
-}
-
-interface ClaudeReportAgentDeps {
-  query?: ClaudeQuery
-  env?: Env
-}
-
-function usage(): string {
-  return `Usage: bun scripts/generate-report.ts --input <run-dir> --output <report.html>`
-}
-
-function parseArgs(
-  argv: string[],
-): Pick<GenerateEvalReportOptions, 'inputDir' | 'outputPath'> {
-  let inputDir = ''
-  let outputPath = ''
-  for (let i = 0; i < argv.length; i++) {
-    const arg = argv[i]
-    if (arg === '--input' || arg === '--run') {
-      inputDir = argv[++i] ?? ''
-    } else if (arg === '--output' || arg === '--out') {
-      outputPath = argv[++i] ?? ''
-    } else if (arg === '--help' || arg === '-h') {
-      console.log(usage())
-      process.exit(0)
-    }
-  }
-  if (!inputDir || !outputPath) {
-    throw new Error(usage())
-  }
-  return { inputDir, outputPath }
-}
-
-function claudeCodeEnv(env: Env): Env {
-  return {
-    CLAUDE_CODE_OAUTH_TOKEN: env.CLAUDE_CODE_OAUTH_TOKEN,
-    ANTHROPIC_API_KEY: env.ANTHROPIC_API_KEY,
-    HOME: env.HOME,
-    PATH: env.PATH,
-    SHELL: env.SHELL,
-    TMPDIR: env.TMPDIR,
-    TMP: env.TMP,
-    TEMP: env.TEMP,
-    USER: env.USER,
-    CLAUDECODE: '',
-  }
-}
-
-async function buildReportPrompt(
-  inputDir: string,
-  outputPath: string,
-): Promise<string> {
-  const metrics = await readRunMetricSummary(inputDir)
-
-  return `Analyze this BrowserOS eval run and write a shareable HTML report.
-
-Run directory: ${inputDir}
-Output file to write: ${outputPath}
-
-You are running with the run directory as cwd. Inspect the local artifacts:
- summary.json for run totals and pass rate
- each task directory's metadata.json for query, final answer, timing, screenshots, and grader results
- each task directory's messages.jsonl for tool calls, tool errors, and recent trajectory
- screenshots/ for visual evidence
- grader-artifacts/ when present for grader-specific context
-
-Write the final report directly to the output file path above. Do not print the
-report instead of writing it. Do not modify any input artifacts. The only file
-you should create or overwrite is the requested report.html.
-
-The report should follow the style and density of the Shadowfax AGI SDK report:
- Title like "AGI SDK Random-10 Failure Report" or a run-specific equivalent
- Run directory and note that screenshots are embedded as data URIs
- Summary cards for total tasks, passed, failed, pass rate, average duration, average steps, and average tool calls
- A Metrics section with compact charts for Duration by task, Steps by task, Tool calls by task, and Tool errors by task
- Task Summary table with task id, status, score, duration, steps, and prompt
- Include tool calls and tool errors in the Task Summary table
- Failure sections with stable anchors using each task id, for example <section id="agisdk-networkin-10">
- For each failed task: Diagnosis, Evidence, Next Check, final screenshot, AGI SDK / grader criteria, final answer, and recent trajectory events
- Make failure links in the summary table point to the task anchors
- Keep the HTML self-contained: inline CSS and embedded final screenshots as data:image/png;base64 URIs
- Escape user/model text correctly so task outputs cannot break the page
-
-Analysis guidance:
- Focus on why the model failed: task understanding, browser/tool usage, missing verification, tool errors, max-step/timeout, bad final answer, or grader ambiguity
- Use messages.jsonl strategically. Do not paste huge DOM outputs into the report. Summarize only the relevant recent trajectory and evidence.
- Limit trajectory analysis to the most relevant 200-300 events/calls across the run. Prefer failed tasks and the final/key actions for each failure.
- If a grader criterion is boolean-only or ambiguous, say so and identify what additional artifact would make it debuggable.
-
-Deterministic run metrics computed from metadata.json and messages.jsonl:
-\`\`\`json
-${JSON.stringify(metrics, null, 2)}
-\`\`\`
-
-After writing the file, verify that ${outputPath} exists and is non-empty.`
-}
-
-async function assertRunDir(inputDir: string): Promise<void> {
-  const inputStat = await stat(inputDir).catch(() => null)
-  if (!inputStat?.isDirectory()) {
-    throw new Error(`Not a run directory: ${inputDir}`)
-  }
-}
-
-async function assertReportWritten(outputPath: string): Promise<void> {
-  const outputStat = await stat(outputPath).catch(() => null)
-  if (!outputStat?.isFile() || outputStat.size === 0) {
-    throw new Error(`Report was not written: ${outputPath}`)
-  }
-}
-
-export async function runClaudeCodeReportAgent(
-  invocation: ReportAgentInvocation,
-  deps: ClaudeReportAgentDeps = {},
-): Promise<void> {
-  const query = deps.query ?? (claudeQuery as unknown as ClaudeQuery)
-  let resultSubtype: string | undefined
-
-  for await (const message of query({
-    prompt: invocation.prompt,
-    options: {
-      cwd: invocation.inputDir,
-      model: DEFAULT_REPORT_MODEL,
-      systemPrompt:
-        'You are an eval failure analyst. Produce a concise, evidence-backed, self-contained HTML report from local run artifacts.',
-      permissionMode: 'bypassPermissions',
-      allowDangerouslySkipPermissions: true,
-      maxTurns: DEFAULT_REPORT_MAX_TURNS,
-      env: claudeCodeEnv(deps.env ?? process.env),
-    },
-  })) {
-    if (message.type === 'result') {
-      resultSubtype =
-        typeof message.subtype === 'string' ? message.subtype : undefined
-    }
-  }
-
-  if (resultSubtype && resultSubtype !== 'success') {
-    throw new Error(`Claude Code report agent failed: ${resultSubtype}`)
-  }
-}
-
-export async function generateEvalReport(
-  options: GenerateEvalReportOptions,
-): Promise<void> {
-  const inputDir = resolve(options.inputDir)
-  const outputPath = resolve(options.outputPath)
-
-  await assertRunDir(inputDir)
-  await mkdir(dirname(outputPath), { recursive: true })
-
-  const invocation = {
-    inputDir,
-    outputPath,
-    prompt: await buildReportPrompt(inputDir, outputPath),
-  }
-  await (options.runAgent ?? runClaudeCodeReportAgent)(invocation)
-  await assertReportWritten(outputPath)
-}
-
-if (import.meta.main) {
-  try {
-    await generateEvalReport(parseArgs(Bun.argv.slice(2)))
-  } catch (error) {
-    console.error(error instanceof Error ? error.message : String(error))
-    process.exit(1)
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/graders/python/infinity-evaluate.py
+++ b/packages/browseros-agent/apps/eval/src/graders/python/infinity-evaluate.py
--- a/packages/browseros-agent/apps/eval/scripts/upload-run.ts
+++ b/packages/browseros-agent/apps/eval/scripts/upload-run.ts
@@ -1,43 +1,349 @@
-#!/usr/bin/env bun
-
 /**
 * Upload eval runs to R2.
 *
 * Two modes:
 *   bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
+ *       → uploads that specific run
+ *
 *   bun scripts/upload-run.ts results/browseros-agent-weekly
+ *       → finds all timestamped subfolders, uploads any not yet in R2
+ *
+ * Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
+ *           EVAL_R2_BUCKET (default: browseros-eval)
+ *           EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
 */

+import { readdir, readFile, stat } from 'node:fs/promises'
+import { basename, dirname, extname, join } from 'node:path'
 import {
-  loadR2ConfigFromEnv,
-  R2Publisher,
-} from '../src/publishing/r2-publisher'
+  GetObjectCommand,
+  PutObjectCommand,
+  S3Client,
+} from '@aws-sdk/client-s3'

-async function main(): Promise<void> {
-  const inputDir = process.argv[2]
-  if (!inputDir) {
-    throw new Error(
-      'Usage:\n' +
-        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
-        '  bun scripts/upload-run.ts results/config-name',
+const CONCURRENCY = 20
+
+const CONTENT_TYPES: Record<string, string> = {
+  '.json': 'application/json',
+  '.jsonl': 'application/x-ndjson',
+  '.png': 'image/png',
+}
+
+interface R2Config {
+  accountId: string
+  accessKeyId: string
+  secretAccessKey: string
+  bucket: string
+  cdnBaseUrl: string
+}
+
+function loadConfig(): R2Config {
+  const accountId = process.env.EVAL_R2_ACCOUNT_ID
+  const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
+  const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
+
+  if (!accountId || !accessKeyId || !secretAccessKey) {
+    console.error(
+      'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
    )
+    process.exit(1)
  }

-  const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
-  const result = await publisher.publishPath(inputDir)
-  for (const run of result.uploadedRuns) {
-    console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
-    console.log(run.viewerUrl)
+  return {
+    accountId,
+    accessKeyId,
+    secretAccessKey,
+    bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
+    cdnBaseUrl: (
+      process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
+    ).replace(/\/+$/, ''),
  }
-  for (const runId of result.skippedRuns) {
-    console.log(`${runId}: already uploaded, skipping`)
-  }
-  console.log(
-    `Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
+}
+
+function createClient(config: R2Config): S3Client {
+  return new S3Client({
+    region: 'auto',
+    endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
+    credentials: {
+      accessKeyId: config.accessKeyId,
+      secretAccessKey: config.secretAccessKey,
+    },
+  })
+}
+
+async function upload(
+  client: S3Client,
+  bucket: string,
+  key: string,
+  body: Buffer,
+  contentType: string,
+) {
+  await client.send(
+    new PutObjectCommand({
+      Bucket: bucket,
+      Key: key,
+      Body: body,
+      ContentType: contentType,
+    }),
  )
 }

-main().catch((error) => {
-  console.error(error instanceof Error ? error.message : String(error))
-  process.exit(1)
-})
+async function collectFiles(dir: string): Promise<string[]> {
+  const files: string[] = []
+  const entries = await readdir(dir, { withFileTypes: true })
+  for (const entry of entries) {
+    const full = join(dir, entry.name)
+    if (entry.isDirectory()) {
+      files.push(...(await collectFiles(full)))
+    } else {
+      files.push(full)
+    }
+  }
+  return files
+}
+
+async function runPool<T>(
+  items: T[],
+  concurrency: number,
+  fn: (item: T) => Promise<void>,
+) {
+  let i = 0
+  const workers = Array.from({ length: concurrency }, async () => {
+    while (i < items.length) {
+      const idx = i++
+      await fn(items[idx])
+    }
+  })
+  await Promise.all(workers)
+}
+
+// Check if a run has already been uploaded to R2 by probing for its manifest.json; any error counts as "not uploaded"
+async function isUploaded(
+  client: S3Client,
+  bucket: string,
+  runId: string,
+): Promise<boolean> {
+  try {
+    await client.send(
+      new GetObjectCommand({
+        Bucket: bucket,
+        Key: `runs/${runId}/manifest.json`,
+      }),
+    )
+    return true
+  } catch {
+    return false
+  }
+}
+
+// Detect if a directory is a run dir (has task subdirs with metadata.json)
+// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
+async function isRunDir(dir: string): Promise<boolean> {
+  const entries = await readdir(dir, { withFileTypes: true })
+  const subdirs = entries.filter((e) => e.isDirectory())
+  for (const subdir of subdirs) {
+    const metaPath = join(dir, subdir.name, 'metadata.json')
+    const metaStat = await stat(metaPath).catch(() => null)
+    if (metaStat?.isFile()) return true
+  }
+  return false
+}
+
+async function uploadSingleRun(
+  runDir: string,
+  runId: string,
+  r2Config: R2Config,
+  client: S3Client,
+): Promise<void> {
+  const taskDirs = await readdir(runDir, { withFileTypes: true })
+  const taskEntries = taskDirs.filter((d) => d.isDirectory())
+
+  if (taskEntries.length === 0) {
+    console.warn(`  No task subdirectories in ${runId}, skipping`)
+    return
+  }
+
+  const manifestTasks: Record<string, unknown>[] = []
+  const jobs: { key: string; filePath: string; contentType: string }[] = []
+
+  // Take agent config and dataset from the first task whose metadata provides them
+  let agentConfig: Record<string, unknown> | undefined
+  let dataset: string | undefined
+
+  for (const taskDir of taskEntries) {
+    const taskId = taskDir.name
+    const taskPath = join(runDir, taskId)
+    const metaPath = join(taskPath, 'metadata.json')
+
+    let meta: Record<string, unknown> = {}
+    try {
+      meta = JSON.parse(await readFile(metaPath, 'utf-8'))
+    } catch {
+      continue
+    }
+
+    if (!agentConfig && meta.agent_config)
+      agentConfig = meta.agent_config as Record<string, unknown>
+    if (!dataset && meta.dataset) dataset = meta.dataset as string
+
+    const files = await collectFiles(taskPath)
+    let screenshotCount = 0
+
+    for (const file of files) {
+      const relative = file.slice(taskPath.length + 1)
+      const ext = extname(file)
+      if (relative.startsWith('screenshots/') && ext === '.png')
+        screenshotCount++
+
+      jobs.push({
+        key: `runs/${runId}/${taskId}/${relative}`,
+        filePath: file,
+        contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
+      })
+    }
+
+    manifestTasks.push({
+      queryId: meta.query_id || taskId,
+      query: meta.query || '',
+      startUrl: meta.start_url || '',
+      status:
+        meta.termination_reason === 'completed'
+          ? 'completed'
+          : meta.termination_reason || 'unknown',
+      durationMs: meta.total_duration_ms || 0,
+      screenshotCount: (meta.screenshot_count as number) || screenshotCount,
+      graderResults: meta.grader_results || {},
+    })
+  }
+
+  if (manifestTasks.length === 0) {
+    console.warn(`  No tasks with readable metadata.json in ${runId}, skipping`)
+    return
+  }
+
+  console.log(
+    `  Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
+  )
+
+  let uploaded = 0
+  await runPool(jobs, CONCURRENCY, async (job) => {
+    const body = await readFile(job.filePath)
+    await upload(client, r2Config.bucket, job.key, body, job.contentType)
+    uploaded++
+    if (uploaded % 50 === 0 || uploaded === jobs.length) {
+      console.log(`    ${uploaded}/${jobs.length}`)
+    }
+  })
+
+  // Read summary.json if it exists
+  let summaryData: Record<string, unknown> | undefined
+  try {
+    summaryData = JSON.parse(
+      await readFile(join(runDir, 'summary.json'), 'utf-8'),
+    )
+  } catch {}
+
+  // Upload manifest
+  const manifest = {
+    runId,
+    uploadedAt: new Date().toISOString(),
+    agentConfig,
+    dataset,
+    summary: summaryData
+      ? {
+          passRate: summaryData.passRate,
+          avgDurationMs: summaryData.avgDurationMs,
+        }
+      : undefined,
+    tasks: manifestTasks,
+  }
+  const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
+  await upload(
+    client,
+    r2Config.bucket,
+    `runs/${runId}/manifest.json`,
+    manifestBody,
+    'application/json',
+  )
+
+  // Upload viewer.html to bucket root
+  const viewerPath = join(
+    import.meta.dir,
+    '..',
+    'src',
+    'dashboard',
+    'viewer.html',
+  )
+  const viewerBody = await readFile(viewerPath)
+  await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
+
+  console.log(`  Uploaded ${uploaded + 2} files`)
+  console.log(`  ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
+}
+
+async function main() {
+  const inputDir = process.argv[2]
+  if (!inputDir) {
+    console.error(
+      'Usage:\n' +
+        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730  (specific run)\n' +
+        '  bun scripts/upload-run.ts results/config-name                   (all un-uploaded runs)',
+    )
+    process.exit(1)
+  }
+
+  const dirStat = await stat(inputDir).catch(() => null)
+  if (!dirStat?.isDirectory()) {
+    console.error(`Not a directory: ${inputDir}`)
+    process.exit(1)
+  }
+
+  const r2Config = loadConfig()
+  const client = createClient(r2Config)
+
+  if (await isRunDir(inputDir)) {
+    // Single run: results/config-name/2026-03-21-1730
+    const timestamp = basename(inputDir)
+    const configName = basename(dirname(inputDir))
+    const runId = `${configName}-${timestamp}`
+    console.log(`Uploading run: ${runId}`)
+    await uploadSingleRun(inputDir, runId, r2Config, client)
+  } else {
+    // Config dir: results/config-name/ — upload all un-uploaded runs
+    const configName = basename(inputDir)
+    const entries = await readdir(inputDir, { withFileTypes: true })
+    const runDirs = entries
+      .filter((e) => e.isDirectory())
+      .map((e) => e.name)
+      .sort()
+
+    if (runDirs.length === 0) {
+      console.error('No run subdirectories found')
+      process.exit(1)
+    }
+
+    console.log(
+      `Found ${runDirs.length} runs for config "${configName}", checking R2...`,
+    )
+
+    let uploadedCount = 0
+    for (const dir of runDirs) {
+      const runId = `${configName}-${dir}`
+      const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
+      if (alreadyUploaded) {
+        console.log(`  ${runId}: already uploaded, skipping`)
+        continue
+      }
+
+      console.log(`  ${runId}: uploading...`)
+      await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
+      uploadedCount++
+    }
+
+    console.log(
+      `\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
+    )
+  }
+}
+
+main()
--- a/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
+++ b/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
@@ -24,11 +24,45 @@ import {
  PutObjectCommand,
  S3Client,
 } from '@aws-sdk/client-s3'
-import {
-  buildRunSummaries,
-  type ReportManifest,
-  type RunSummary,
-} from '../src/reporting/run-summary'
+
+interface ManifestTask {
+  queryId: string
+  query: string
+  status: string
+  durationMs: number
+  screenshotCount: number
+  graderResults: Record<string, { pass: boolean; score: number }>
+}
+
+interface Manifest {
+  runId: string
+  uploadedAt: string
+  agentConfig?: { type?: string; model?: string }
+  dataset?: string
+  summary?: { passRate?: number; avgDurationMs?: number }
+  tasks: ManifestTask[]
+}
+
+interface RunSummary {
+  runId: string
+  configName: string
+  date: string
+  avgScore: number
+  total: number
+  completed: number
+  failed: number
+  timeout: number
+  avgDurationMs: number
+  model: string
+  dataset: string
+  agentType: string
+}
+
+const PASS_FAIL_GRADER_ORDER = [
+  'agisdk_state_diff',
+  'infinity_state',
+  'performance_grader',
+]

 function requireEnv(name: string): string {
  const value = process.env[name]
@@ -53,7 +87,7 @@ const client = new S3Client({
 // Step 1: List all manifest.json files in runs/
 console.log('Scanning R2 for eval runs...')

-const manifests: ReportManifest[] = []
+const manifests: Manifest[] = []
 let continuationToken: string | undefined

 do {
@@ -93,9 +127,64 @@ if (manifests.length === 0) {
 }

 // Step 2: Build run summaries
-const runs: RunSummary[] = buildRunSummaries(manifests)
+const runs: RunSummary[] = manifests
+  .map((m) => {
+    const total = m.tasks.length
+    const completed = m.tasks.filter((t) => t.status === 'completed').length
+    const failed = m.tasks.filter((t) => t.status === 'failed').length
+    const timeout = m.tasks.filter((t) => t.status === 'timeout').length
+
+    let scoredCount = 0
+    let scoreSum = 0
+    for (const task of m.tasks) {
+      if (!task.graderResults) continue
+      for (const name of PASS_FAIL_GRADER_ORDER) {
+        if (task.graderResults[name]) {
+          scoredCount++
+          scoreSum += task.graderResults[name].score ?? 0
+          break
+        }
+      }
+    }
+
+    const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
+    const durations = m.tasks
+      .filter((t) => t.durationMs > 0)
+      .map((t) => t.durationMs)
+    const avgDurationMs =
+      durations.length > 0
+        ? durations.reduce((a, b) => a + b, 0) / durations.length
+        : 0
+
+    const date = m.uploadedAt
+      ? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
+      : m.runId.slice(0, 15)
+
+    const model = m.agentConfig?.model || 'unknown'
+    const dataset = m.dataset || m.runId
+    const agentType = m.agentConfig?.type || 'unknown'
+
+    const configName = extractConfigName(m.runId)
+    return {
+      runId: m.runId,
+      configName,
+      date,
+      avgScore,
+      total,
+      completed,
+      failed,
+      timeout,
+      avgDurationMs,
+      model,
+      dataset,
+      agentType,
+    }
+  })
+  .sort((a, b) => a.date.localeCompare(b.date))

 // Step 3: Identify unique config groups
+// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
+// Extract config name by stripping the date-time suffix pattern
 function escHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
@@ -104,6 +193,12 @@ function escHtml(s: string): string {
    .replace(/"/g, '&quot;')
 }

+function extractConfigName(runId: string): string {
+  // "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
+  // "ci-weekly" → "ci-weekly" (no timestamp, old format)
+  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
+}
+
 const configGroups = [...new Set(runs.map((r) => r.configName))]
 const defaultConfig = configGroups.includes('ci-weekly')
  ? 'ci-weekly'
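The run-ID convention noted in the comments of this hunk (old untimestamped IDs such as "ci-weekly" versus timestamped IDs such as "ci-weekly-2026-03-21-1730") can be checked in isolation. The following is a minimal sketch that reuses the regex from `extractConfigName` as it appears in the diff; the `sampleRunIds` list is made up for illustration and is not taken from real R2 data.

```typescript
// Same regex as extractConfigName in the hunk above: strips a trailing
// "-YYYY-MM-DD-HHMM" suffix and leaves untimestamped IDs untouched.
function extractConfigName(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

// Hypothetical run IDs, for illustration only.
const sampleRunIds: string[] = [
  'ci-weekly',
  'ci-weekly-2026-03-21-1730',
  'browseros-agent-weekly-2026-03-21-1730',
]

// Mirrors the configGroups line in the diff: one entry per distinct config,
// in first-seen order.
const sampleGroups: string[] = [
  ...new Set(sampleRunIds.map(extractConfigName)),
]

console.log(sampleGroups) // [ 'ci-weekly', 'browseros-agent-weekly' ]
```

Because the pattern is anchored at the end of the string and requires exactly four digits for the time, an ID with a differently shaped suffix (for example a date without a time) would pass through unchanged and form its own group.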
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
@@ -1,238 +0,0 @@
-import { writeFile } from 'node:fs/promises'
-import { join } from 'node:path'
-import { DEFAULT_TIMEOUT_MS } from '../../constants'
-import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
-import { withEvalTimeout } from '../../utils/with-eval-timeout'
-import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
-import {
-  type ClaudeCodeProcessRunner,
-  createClaudeCodeProcessRunner,
-} from './process-runner'
-import {
-  ClaudeCodeStreamParser,
-  shouldCaptureScreenshotForTool,
-} from './stream-parser'
-
-export interface ClaudeCodeEvaluatorDeps {
-  processRunner?: ClaudeCodeProcessRunner
-}
-
-export class ClaudeCodeEvaluator implements AgentEvaluator {
-  private processRunner: ClaudeCodeProcessRunner
-
-  constructor(
-    private ctx: AgentContext,
-    deps: ClaudeCodeEvaluatorDeps = {},
-  ) {
-    this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
-  }
-
-  async execute(): Promise<AgentResult> {
-    const { config, task, capture, taskOutputDir } = this.ctx
-    const startTime = Date.now()
-    const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
-
-    await capture.messageLogger.logUser(task.query)
-
-    if (config.agent.type !== 'claude-code') {
-      throw new Error('ClaudeCodeEvaluator only supports claude-code config')
-    }
-    const agentConfig = config.agent
-
-    const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
-    await writeFile(
-      mcpConfigPath,
-      JSON.stringify(
-        buildClaudeCodeMcpConfig(config.browseros.server_url),
-        null,
-        2,
-      ),
-    )
-
-    const parser = new ClaudeCodeStreamParser()
-    const toolNamesById = new Map<string, string>()
-    const prompt = buildClaudeCodePrompt(task.query)
-    const args = buildClaudeCodeArgs({
-      prompt,
-      mcpConfigPath,
-      config: agentConfig,
-    })
-
-    const { terminationReason } = await withEvalTimeout(
-      timeoutMs,
-      capture,
-      async (signal) => {
-        const runResult = await this.processRunner.run({
-          executable: agentConfig.claudePath,
-          args,
-          cwd: taskOutputDir,
-          signal,
-          onStdoutLine: async (line) => {
-            const events = parser.pushLine(line)
-            for (const event of events) {
-              await this.handleStreamEvent(event, toolNamesById)
-            }
-          },
-        })
-
-        if (runResult.exitCode !== 0) {
-          const message =
-            runResult.stderr.trim() ||
-            `Claude Code exited with status ${runResult.exitCode}`
-          capture.addError('agent_execution', message, {
-            exitCode: runResult.exitCode,
-          })
-          if (!parser.getLastText()) {
-            throw new Error(message)
-          }
-        }
-
-        for (const error of runResult.streamErrors ?? []) {
-          capture.addWarning(
-            'message_logging',
-            `Claude Code stream event processing failed: ${error}`,
-          )
-        }
-
-        return runResult
-      },
-    )
-
-    const endTime = Date.now()
-    const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
-    const metadata = {
-      query_id: task.query_id,
-      dataset: task.dataset,
-      query: task.query,
-      started_at: new Date(startTime).toISOString(),
-      completed_at: new Date(endTime).toISOString(),
-      total_duration_ms: endTime - startTime,
-      total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
-      termination_reason: terminationReason,
-      final_answer: finalAnswer,
-      errors: capture.getErrors(),
-      warnings: capture.getWarnings(),
-      device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
-      agent_config: {
-        type: 'claude-code' as const,
-        model: agentConfig.model,
-      },
-      grader_results: {},
-    }
-
-    await capture.trajectorySaver.saveMetadata(metadata)
-
-    return {
-      metadata,
-      messages: capture.getMessages(),
-      finalAnswer,
-    }
-  }
-
-  private async handleStreamEvent(
-    event: UIMessageStreamEvent,
-    toolNamesById: Map<string, string>,
-  ): Promise<void> {
-    const { capture, task } = this.ctx
-    let screenshot: number | undefined
-
-    if (event.type === 'tool-input-available') {
-      toolNamesById.set(event.toolCallId, event.toolName)
-      if (isPageInput(event.input)) {
-        capture.setActivePageId(event.input.page)
-      }
-    }
-
-    if (
-      event.type === 'tool-output-available' ||
-      event.type === 'tool-output-error'
-    ) {
-      const toolName = toolNamesById.get(event.toolCallId)
-      if (toolName && shouldCaptureScreenshotForTool(toolName)) {
-        screenshot = await this.captureScreenshot()
-      }
-    }
-
-    await capture.messageLogger.logStreamEvent(event, screenshot)
-    capture.emitEvent(task.query_id, {
-      ...event,
-      ...(screenshot !== undefined && { screenshot }),
-    })
-  }
-
-  private async captureScreenshot(): Promise<number | undefined> {
-    const { capture, task } = this.ctx
-    try {
-      const screenshot = await capture.screenshot.capture(
-        capture.getActivePageId(),
-      )
-      capture.emitEvent(task.query_id, {
-        type: 'screenshot-captured',
-        screenshot,
-      })
-      return screenshot
-    } catch {
-      return undefined
-    }
-  }
-}
-
-function isPageInput(input: unknown): input is { page: number } {
-  return (
-    typeof input === 'object' &&
-    input !== null &&
-    'page' in input &&
-    typeof input.page === 'number'
-  )
-}
-
-function buildClaudeCodePrompt(taskQuery: string): string {
-  return [
-    'You are running inside BrowserOS eval.',
-    'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
-    'When the task is complete, respond with the final answer only.',
-    'If blocked, explain the blocker clearly.',
-    '',
-    `Task: ${taskQuery}`,
-  ].join('\n')
-}
-
-function buildClaudeCodeArgs({
-  prompt,
-  mcpConfigPath,
-  config,
-}: {
-  prompt: string
-  mcpConfigPath: string
-  config: ClaudeCodeAgentConfig
-}): string[] {
-  const args = [
-    '-p',
-    prompt,
-    '--mcp-config',
-    mcpConfigPath,
-    '--strict-mcp-config',
-    '--output-format',
-    'stream-json',
-    '--verbose',
-  ]
-
-  if (config.model) args.push('--model', config.model)
-  args.push(...config.extraArgs)
-
-  return args
-}
-
-function buildClaudeCodeMcpConfig(serverUrl: string) {
-  const trimmed = serverUrl.replace(/\/$/, '')
-  const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
-  return {
-    mcpServers: {
-      browseros: {
-        type: 'http',
-        url,
-        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
-      },
-    },
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
@@ -1,114 +0,0 @@
-export interface ClaudeCodeRunOptions {
-  executable: string
-  args: string[]
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface ClaudeCodeRunResult {
-  exitCode: number
-  stderr: string
-  streamErrors?: string[]
-}
-
-export interface ClaudeCodeProcessRunner {
-  run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
-}
-
-export interface SpawnOptions {
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface CreateClaudeCodeProcessRunnerDeps {
-  spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
-}
-
-export function createClaudeCodeProcessRunner(
-  deps: CreateClaudeCodeProcessRunnerDeps = {},
-): ClaudeCodeProcessRunner {
-  const spawn = deps.spawn ?? spawnClaudeCode
-  return {
-    run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
-      spawn([executable, ...args], { cwd, signal, onStdoutLine }),
-  }
-}
-
-async function spawnClaudeCode(
-  cmd: string[],
-  options: SpawnOptions,
-): Promise<ClaudeCodeRunResult> {
-  const proc = Bun.spawn({
-    cmd,
-    cwd: options.cwd,
-    stdin: 'ignore',
-    stdout: 'pipe',
-    stderr: 'pipe',
-  })
-
-  const abort = () => {
-    try {
-      proc.kill('SIGTERM')
-    } catch {
-      // Process may already have exited.
-    }
-  }
-  options.signal?.addEventListener('abort', abort, { once: true })
-
-  try {
-    const streamErrors: string[] = []
-    const stdoutPromise = readLines(
-      proc.stdout,
-      options.onStdoutLine,
-      streamErrors,
-    )
-    const stderrPromise = new Response(proc.stderr).text()
-    const exitCode = await proc.exited
-    await stdoutPromise
-    const stderr = await stderrPromise
-    return { exitCode, stderr, streamErrors }
-  } finally {
-    options.signal?.removeEventListener('abort', abort)
-  }
-}
-
-async function readLines(
-  stream: ReadableStream<Uint8Array>,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  const reader = stream.getReader()
-  const decoder = new TextDecoder()
-  let buffer = ''
-
-  while (true) {
-    const { done, value } = await reader.read()
-    if (done) break
-
-    buffer += decoder.decode(value, { stream: true })
-    const lines = buffer.split('\n')
-    buffer = lines.pop() ?? ''
-    for (const line of lines) {
-      await emitLine(line, onLine, streamErrors)
-    }
-  }
-
-  buffer += decoder.decode()
-  if (buffer.length > 0) {
-    await emitLine(buffer, onLine, streamErrors)
-  }
-}
-
-async function emitLine(
-  line: string,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  try {
-    await onLine(line)
-  } catch (error) {
-    streamErrors.push(error instanceof Error ? error.message : String(error))
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
@@ -1,142 +0,0 @@
-import { randomUUID } from 'node:crypto'
-import type { UIMessageStreamEvent } from '../../types'
-
-type JsonObject = Record<string, unknown>
-
-export class ClaudeCodeStreamParser {
-  private lastText: string | null = null
-  private toolCallCount = 0
-
-  pushLine(line: string): UIMessageStreamEvent[] {
-    const trimmed = line.trim()
-    if (!trimmed) return []
-
-    let parsed: unknown
-    try {
-      parsed = JSON.parse(trimmed)
-    } catch {
-      return []
-    }
-
-    if (!isObject(parsed)) return []
-
-    if (parsed.type === 'assistant') {
-      return this.parseAssistantMessage(parsed)
-    }
-    if (parsed.type === 'user') {
-      return this.parseUserMessage(parsed)
-    }
-    if (parsed.type === 'result' && typeof parsed.result === 'string') {
-      this.lastText = parsed.result
-    }
-
-    return []
-  }
-
-  getLastText(): string | null {
-    return this.lastText
-  }
-
-  getToolCallCount(): number {
-    return this.toolCallCount
-  }
-
-  private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (block.type === 'text' && typeof block.text === 'string') {
-        const id = randomUUID()
-        this.lastText = block.text
-        events.push(
-          { type: 'text-start', id },
-          { type: 'text-delta', id, delta: block.text },
-          { type: 'text-end', id },
-        )
-      } else if (
-        block.type === 'tool_use' &&
-        typeof block.id === 'string' &&
-        typeof block.name === 'string'
-      ) {
-        this.toolCallCount++
-        events.push({
-          type: 'tool-input-available',
-          toolCallId: block.id,
-          toolName: block.name,
-          input: block.input,
-        })
-      }
-    }
-
-    return events
-  }
-
-  private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (
-        block.type !== 'tool_result' ||
-        typeof block.tool_use_id !== 'string'
-      ) {
-        continue
-      }
-
-      if (block.is_error === true) {
-        events.push({
-          type: 'tool-output-error',
-          toolCallId: block.tool_use_id,
-          errorText: stringifyToolContent(block.content),
-        })
-      } else {
-        events.push({
-          type: 'tool-output-available',
-          toolCallId: block.tool_use_id,
-          output: normalizeToolContent(block.content),
-        })
-      }
-    }
-
-    return events
-  }
-}
-
-export function shouldCaptureScreenshotForTool(toolName: string): boolean {
-  if (!toolName.startsWith('mcp__browseros__')) return false
-  return !toolName.endsWith('__take_screenshot')
-}
-
-function contentBlocks(message: JsonObject): JsonObject[] {
-  const inner = isObject(message.message) ? message.message : message
-  return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
-}
-
-function isObject(value: unknown): value is JsonObject {
-  return typeof value === 'object' && value !== null
-}
-
-function normalizeToolContent(content: unknown): unknown {
-  if (!Array.isArray(content)) return content
-  return content.map((item) => {
-    if (
-      isObject(item) &&
-      item.type === 'text' &&
-      typeof item.text === 'string'
-    ) {
-      return item.text
-    }
-    return item
-  })
-}
-
-function stringifyToolContent(content: unknown): string {
-  const normalized = normalizeToolContent(content)
-  if (typeof normalized === 'string') return normalized
-  try {
-    return JSON.stringify(normalized)
-  } catch {
-    return String(normalized)
-  }
-}
Author	SHA1	Message	Date
shivammittal274	7ee8dedd53	chore(eval): drop the 60-char truncation on grader expected/actual values Some criteria check long strings (job descriptions, post bodies, etc.) — truncating to 60 chars hides exactly the bytes you need to diff. The viewer's reasoning area already has max-height + scroll + word-break so long content scrolls; nothing renders worse for being full-length.	2026-04-30 02:08:30 +05:30
shivammittal274	a3b5ef4da3	chore(eval): show every criterion in agisdk grader message, not just failures Listing only failures hid the bigger picture — when 1 of 4 criteria fails you still want to know which 3 passed and what was checked. Now the message is the full checklist, ✓/✗ per criterion, with expected vs actual on the failing lines. Examples: All 4 criteria passed. ✓ correct job title ✓ includes Java skill ✓ includes Spring Boot skill ✓ includes Angular skill 2 of 4 criteria failed: ✓ correct job title (softened) ✓ includes Java skill ✗ includes Spring Boot skill: expected True, got False ✗ includes Angular skill: expected True, got False	2026-04-30 02:08:07 +05:30
shivammittal274	3333728e4e	fix(eval): surface per-criterion descriptions in agisdk grader output The viewer's grader-reasoning pill was showing "Task not completed successfully." for every agisdk_state_diff failure. The rich data was actually available — agisdk's TaskConfig exposes a 'description' (e.g. "includes Spring Boot skill") and the JMESPath 'query' for each criterion, zip-aligned 1:1 with info['results'] — we just weren't extracting it. Now agisdk-evaluate.py emits per-criterion entries with description, query, expected_value, actual_value, and builds the message as a useful multi-line summary: 2 of 4 criteria failed: • includes Spring Boot skill: expected True, got False • includes Angular skill: expected True, got False The viewer's grader-reasoning area already has white-space: pre-wrap so the multi-line message renders correctly. The structured per_criterion fields are also stored under details.per_criterion in metadata.json for anyone who wants to grep R2 artifacts directly.	2026-04-30 02:06:51 +05:30
shivammittal274	5c6fd34d3e	fix(eval): address Greptile P1+P2 on server log fd handling P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the log directory missing and crash the server spawn with ENOENT. Move openSync into the same try block; fall back to /dev/null so spawn always succeeds. P2: the log fd was opened on every server start but never closed. Each restart attempt leaked one fd across all workers — over a long eval run that could exhaust the process fd limit. Track the fd on the manager and closeSync it in killApp() right after the server process exits (the child's dup keeps the file open until it exits, so we don't truncate output).	2026-04-30 01:16:20 +05:30
shivammittal274	1a1220dff5	chore(eval): run clado weekly headless Default to headless so the weekly job (and local repros) don't pop ten visible Chrome windows. Set headless=false locally if you need to watch a worker.	2026-04-30 00:37:45 +05:30
shivammittal274	dc98858cc3	chore(eval): point clado weekly config at agisdk-real Switches the orchestrator-executor + Clado weekly config to run on the AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff grader. Matches the orchestrator-executor smoke target (Fireworks K2.5 orchestrator + Clado action executor) we want to track week-over-week.	2026-04-30 00:37:45 +05:30
shivammittal274	72cbffe2bb	chore(eval): refresh test-clado-api script for new Clado contract Updated the local smoke-test to match the new Clado endpoint and response contract: - New action + health URLs (000159-merged checkpoint). - Drop the grounding-model branch (orchestrator-executor doesn't use it; the README David shared only documents the action model). - Health-check waits up to 6 minutes for cold start with a 30s warning so the operator knows it's spinning up. - Print every documented response field (action, x/y, text, key, direction, amount, drag start/end, time, final_answer, thinking, parse_error, inference_time_seconds). - Three-step run that exercises a click, a typing continuation with formatted history, and an end+final_answer probe.	2026-04-30 00:37:44 +05:30
shivammittal274	34fdf08521	feat(eval): align Clado action executor with new endpoint contract David Shan shared the updated Clado BrowserOS Action Model spec. Changes to match it: - Bump endpoint URL + model id to the 000159-merged checkpoint (clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef) in browseros-oe-clado-weekly.json and the README example. - CLADO_REQUEST_TIMEOUT_MS 120s → 360s. Cold start can take ~5 min; the 2-min ceiling was failing every cold-start request. - Treat HTTP 200 with action=null / parse_error as an INVALID step instead of aborting the executor loop. The model can self-correct on the next call. Cap consecutive parse failures at 3 to avoid infinite loops. - Capture final_answer from end actions. Surface it in the observation back to the orchestrator so its task answer can use the model's declared result. - Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X). - Switch screenshot format from webp → png to match the documented "PNG or JPEG" contract.	2026-04-30 00:37:44 +05:30
shivammittal274	be6858d589	fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1 Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't unblock dev's Linux CI — same throw kept reproducing. Whether this is bun caching stale stack frames or a missed eager call site, the safer move is to fix it at the root: make buildContainerRuntime never throw on Linux when the runner has explicitly opted out. Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test escape hatch in container-runtime-factory.ts. When set, returns the existing UnsupportedPlatformTestRuntime stub — server boots normally, /health binds, any actual OpenClaw API call still fails loudly at request time. eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and non-CI Linux behavior unchanged (without the flag they still throw).	2026-04-29 23:18:59 +05:30
shivammittal274	33f68a0d74	fix(server): defer OpenClaw chat client port lookup to request time apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort() synchronously when constructing the OpenClawGatewayChatClient inside the createHttpServer object literal. On non-darwin platforms this throws via the OpenClawService constructor → buildContainerRuntime, escaping the try/catch added in `5cf7b765` (which only protected the configureOpenClawService call further down in main.ts). Every other getOpenClawService() reference in server.ts is already wrapped in an arrow function. This was the lone holdout. Make it lazy too: change the chat client constructor to take getHostPort: () => number instead of hostPort: number, evaluate it inside streamTurn at request time. Behavior on darwin is unchanged. This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't available — the chat endpoint isn't exercised by the eval, so a deferred throw is acceptable.	2026-04-29 23:10:48 +05:30
shivammittal274	5cf7b765d0	fix(server): catch sync throw from OpenClaw constructor on Linux The container runtime constructor in OpenClawService throws synchronously on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing .catch() on tryAutoStart() only handles async throws inside auto-start — the sync throw from configureOpenClawService(...) itself propagates up through Application.start() and crashes the process via index.ts:48 (process.exit(EXIT_CODES.GENERAL_ERROR)). This is what's been killing dev's eval-weekly CI: the server crashes in milliseconds, the eval client polls /health, gets nothing, times out. Fix: wrap the configureOpenClawService call in try/catch matching the existing .catch() intent (best-effort, don't crash). Server continues without OpenClaw on platforms where it can't initialize. Verified by reading captured server stdout from run 25123195126: Failed to start server: error: browseros-vm currently supports macOS only at buildContainerRuntime (container-runtime-factory.ts:54:11) at new OpenClawService (openclaw-service.ts:652:15) at configureOpenClawService (openclaw-service.ts:1527:19) at start (main.ts:127:5)	2026-04-29 22:57:03 +05:30
shivammittal274	5ed0879d31	fix(eval): capture stdout too — pino logger writes to stdout, not stderr Previous diagnostic patch only redirected stderr; the captured per-worker log files came back as 0 bytes because the server uses pino which writes all log output to stdout (fd 1), not stderr (fd 2). Capture both into the same file.	2026-04-29 22:44:07 +05:30
shivammittal274	e136094305	chore(eval): instrument server startup to root-cause dev CI health-check timeouts Three diagnostics + one config swap to investigate why the eval-weekly workflow has been failing on dev since 2026-04-25 with "Server health check timed out" (every worker, every retry). Background: - Last successful weekly eval on dev: 2026-04-18 (sha `f5a2b73`) - Since then, ~30 server commits landed including Lima/VM runtime, OpenClaw service, ACL system, ACP SDK — 108 server files changed, ~13K LOC added. - Server process spawns cleanly in CI (PID logged) but never binds /health within the 30s eval-side timeout. Static analysis finds no obvious blocker; we need runtime evidence. Changes: 1. apps/server/package.json — add `start:ci` script (no `--watch`). The default `start` uses `bun --watch` which forks a child process that watches every file in the import graph. Dev's graph is ~108 files larger than main's; on a cold CI runner the watcher setup is a plausible source of multi-second startup overhead. 2. apps/eval/src/runner/browseros-app-manager.ts: - Use `start:ci` when `process.env.CI` is set (true on GitHub-hosted runners by default), else `start`. - Capture per-worker server stderr to /tmp/browseros-server-logs/ instead of ignoring it. Without this we have no visibility into why the server is hung pre-/health. - Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module graph may simply need more cold-start time on CI. 3. .github/workflows/eval-weekly.yml — upload the server logs dir as a workflow artifact (always, not just on success) so we can post-mortem any startup failure on the next run. 4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter -> Fireworks (bypasses the OpenRouter per-key spend cap that has been eating recent runs) and drop num_workers 10 -> 4 (well below the Fireworks per-account TPM threshold that overwhelmed the original 2026-04-23 run). 
Plan: trigger the eval-weekly workflow on this branch with the agisdk config and observe (a) whether it gets past server startup, and (b) if it doesn't, what the captured server stderr says.	2026-04-29 22:34:32 +05:30
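The parse-failure handling described in commit 34fdf08521 above (treat `action=null` / `parse_error` as an INVALID step, let the model retry, and cap consecutive failures at 3) can be sketched roughly as follows. This is not the executor code from the repository: `ActionResponse`, `LoopResult`, and `runExecutorLoop` are hypothetical names, the responses are passed in as an array rather than fetched over HTTP, and the `'end'` action value is assumed from the commit's mention of "end actions".

```typescript
// Hypothetical response shape. Only `action`, `parse_error`, and
// `final_answer` are named in the commit message; everything else is assumed.
interface ActionResponse {
  action: string | null
  parse_error?: string
  final_answer?: string
}

interface LoopResult {
  status: 'done' | 'aborted' | 'exhausted'
  invalidSteps: number
  finalAnswer?: string
}

const MAX_CONSECUTIVE_PARSE_FAILURES = 3

function runExecutorLoop(responses: ActionResponse[]): LoopResult {
  let consecutiveFailures = 0
  let invalidSteps = 0

  for (const res of responses) {
    if (res.action === null || res.parse_error) {
      // INVALID step: record it and give the model another chance,
      // unless the consecutive-failure cap has been reached.
      invalidSteps++
      consecutiveFailures++
      if (consecutiveFailures >= MAX_CONSECUTIVE_PARSE_FAILURES) {
        return { status: 'aborted', invalidSteps }
      }
      continue
    }

    // Any valid action resets the consecutive-failure counter.
    consecutiveFailures = 0

    if (res.action === 'end') {
      // Surface the model's declared result back to the caller.
      return { status: 'done', invalidSteps, finalAnswer: res.final_answer }
    }
    // A real executor would dispatch the action to the browser here.
  }

  return { status: 'exhausted', invalidSteps }
}

const recovered = runExecutorLoop([
  { action: null, parse_error: 'unparseable output' },
  { action: 'click' },
  { action: 'end', final_answer: 'Order placed' },
])
console.log(recovered.status, recovered.invalidSteps) // done 1
```

Resetting the counter on every valid action is what makes the cap apply to consecutive failures only, so occasional malformed outputs spread across a long run do not abort it.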
				`@@ -1 +0,0 @@`
				{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}