chore(eval): colocate grader python evaluators

chore(eval): organize config layouts
docs(eval): explain suites and variants
2026-05-14 08:03:58 +00:00 · 2026-04-29 17:16:58 -07:00 · 2026-04-29 17:01:25 -07:00 · 2026-04-29 16:38:54 -07:00 · 2026-04-29 16:10:27 -07:00 · 2026-04-29 16:00:56 -07:00
202 changed files with 5008 additions and 10383 deletions
--- a/.claude/skills/ask-internal/SKILL.md
+++ b/.claude/skills/ask-internal/SKILL.md
@@ -1,152 +0,0 @@
---
-name: ask-internal
-description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
-allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---
-
-# Ask Internal
-
-Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
-
-**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
-
-## When to use
-
- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
-
-## Hard rules — never do these
-
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.
-
-## Voice rules
-
-Apply the same voice rules as `document-internal` to the synthesized answer:
-
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.
-
-## Workflow
-
-### Step 0: Pre-flight
-
-```bash
-if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
-  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
-  exit 0
-fi
-[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
-  echo ".internal-docs/ missing or empty. Submodule not configured?"
-  exit 0
-}
-```
-
-### Step 1: Parse the question
-
-Pull the keywords from the user's question. Drop stop words. Identify intent:
-
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
-
-### Step 2: Multi-source search
-
-Run grep in parallel across two sources.
-
-**Internal docs:**
-
-```bash
-grep -rni --include='*.md' '<keyword>' .internal-docs/
-```
-
-Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
-
-**Codebase (skip vendored Chromium and `node_modules`):**
-
-```bash
-grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
-     --exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
-     '<keyword>' packages/ scripts/ .config/ .github/
-```
-
-Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
-
-### Step 3: Synthesize answer
-
-Structure the response:
-
-1. **Direct answer.** First sentence answers the question. No preamble.
-2. **Steps if applicable.** Numbered list with exact commands.
-3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
-
-If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
-
-### Step 4: Offer execution (only if commands surfaced)
-
-If Step 3 produced executable commands the user could run, ask:
-
-> Run these for you? (y / n / dry-run)
-
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
-
-### Step 5: Doc-not-found path
-
-If Step 2 returned nothing useful (no doc hits AND no clear code answer):
-
-1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
-2. Ask: "Draft a new doc and open a PR to internal-docs?"
-3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
-
-### Step 6: Completion status
-
-Report one of:
-
- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
-
-## Citation discipline
-
-Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
-
-If a doc says one thing and the code says another, surface the conflict explicitly:
-
-> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
-
-## Common Mistakes
-
-**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
-
-**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
-
-**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.
-
-## Red Flags
-
-**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
-
-**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.
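Step 2 of the removed ask-internal skill ranks hits by relevance ("more keyword matches = higher"). Below is a minimal TypeScript sketch of that rule. It is not code from the skill. The `Hit` record and the `rankHits` helper are hypothetical, and the sketch assumes the grep output has already been parsed into path, line, and text.

```typescript
// Hypothetical record for one parsed grep hit ("path:line:text").
interface Hit {
  path: string
  line: number
  text: string
}

// Score each file by how many DISTINCT keywords matched anywhere in it,
// then return the top `limit` paths (ties broken alphabetically).
function rankHits(hits: Hit[], keywords: string[], limit = 5): string[] {
  const matched = new Map<string, Set<string>>()
  for (const hit of hits) {
    const lower = hit.text.toLowerCase()
    for (const kw of keywords) {
      const needle = kw.toLowerCase()
      if (lower.includes(needle)) {
        const seen = matched.get(hit.path) ?? new Set<string>()
        seen.add(needle)
        matched.set(hit.path, seen)
      }
    }
  }
  return Array.from(matched.entries())
    .sort((a, b) => b[1].size - a[1].size || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([path]) => path)
}
```

Counting distinct keywords, not raw matching lines, keeps a long file that repeats one keyword from outranking a short doc that mentions every keyword once.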
--- a/.claude/skills/document-internal/SKILL.md
+++ b/.claude/skills/document-internal/SKILL.md
@@ -1,208 +0,0 @@
---
-name: document-internal
-description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
-allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---
-
-# Document Internal
-
-Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
-
-**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
-
-## When to use
-
-After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
-
-## Hard rules — never do these
-
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
-
-## Voice rules — enforced by Step 4
-
-The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
-
- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.
-
-## Workflow
-
-### Step 0: Pre-flight
-
-Bail with a clear message on any failure.
-
-```bash
-# Submodule must be initialized
-if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
-  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
-  exit 0
-fi
-[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
-
-# Must be on a feature branch
-BRANCH=$(git branch --show-current)
-if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
-  echo "On $BRANCH. Run from a feature branch."
-  exit 0
-fi
-
-# Determine base branch (default: dev for this repo, fall back to main).
-# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
-BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
-
-# Gather context
-git log "$BASE..HEAD" --oneline
-git diff "$BASE...HEAD" --stat
-gh pr view --json body -q .body 2>/dev/null  # may be empty if no PR yet
-```
-
-### Step 1: Identify the doc
-
-Ask the user for three things in one prompt:
-
-1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
-2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
-3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
-
-### Step 2: Decision brief — four sharp questions
-
-Ask one question at a time. Each answer constrains the next. These force compression before drafting.
-
-1. "In one sentence: what can someone now DO that they could not before?"
-2. "What is the one design decision a future engineer needs to know?"
-3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
-4. "Any sharp edges or gotchas? (or 'none')"
-
-Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
-
-### Step 3: Draft from the template
-
-Read the matching template from `.internal-docs/_templates/`:
-
- `feature` → `feature-note.md`
- `architecture` → `architecture-note.md`
- `design` → `design-spec.md`
-
-If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
-
-Generate the 1-pager from the template, the four answers, and the diff context.
-
-### Step 4: Voice self-check
-
-Scan the draft for violations:
-
- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).
-
-If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
-
-If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
-
-### Step 5: Show + iterate
-
-Print the full draft. Ask:
-
-> Edit needed? Paste any changes, or say "looks good".
-
-Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
-
-### Step 6: Open PR to internal-docs
-
-Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
-
-```bash
-TMP=$(mktemp -d)
-trap 'rm -rf "$TMP"' EXIT  # cleans up even if any step below fails
-git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
-cd "$TMP"
-git checkout -b "docs/<slug>"
-
-# Write the doc
-mkdir -p "<type>"  # features, architecture, designs, or setup
-cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
-<draft content>
-DOC
-
-# Update the root README index — insert one line under the matching section
-# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
-
-git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
-git commit -m "docs(<type>): <slug>"
-git push -u origin "docs/<slug>"
-
-PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
-  --head "docs/<slug>" \
-  --title "docs(<type>): <slug>" \
-  --body "$(cat <<'BODY'
-## Summary
-<one-line of what this doc covers>
-
-## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
-BODY
-)")
-
-cd -
-echo "PR opened: $PR_URL"
-# trap above cleans up $TMP on EXIT
-```
-
-If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
-
-### Step 7: Completion status
-
-Report one of:
-
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
-
-## Doc type defaults
-
-| Branch pattern | Default doc type | Default location |
-|----------------|------------------|------------------|
-| `feat/*`       | feature          | `features/`      |
-| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
-| `rfc/*` or `design/*` | design          | `designs/`       |
-| Otherwise      | ask              | ask              |
-
-## Common Mistakes
-
-**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
-
-**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.
-
-**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.
-
-## Red Flags
-
-**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.
-
-**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.
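Step 4 of the removed document-internal skill lists four mechanical voice checks. The TypeScript sketch below shows one way to run them. `checkVoice` is a hypothetical helper, not part of the repo. The sentence splitter (on `.`, `!`, `?`, and newlines) is an assumption, because the skill text does not define how sentences are counted.

```typescript
const BANNED_WORDS = [
  'delve', 'crucial', 'robust', 'comprehensive', 'nuanced', 'multifaceted',
  'furthermore', 'moreover', 'additionally', 'pivotal', 'landscape', 'tapestry',
  'underscore', 'foster', 'showcase', 'intricate', 'vibrant', 'fundamental',
  'significant', 'leverage', 'utilize',
]

// Hypothetical helper mirroring the Step 4 checks; returns one message per violation.
function checkVoice(body: string, docType: string): string[] {
  const violations: string[] = []

  // Check 1: no em dashes (U+2014).
  if (body.includes('\u2014')) violations.push('em dash')

  // Check 2: banned words, matched as whole lowercase tokens.
  const tokens = new Set(body.toLowerCase().match(/[a-z]+/g) ?? [])
  for (const word of BANNED_WORDS) {
    if (tokens.has(word)) violations.push(`banned word: ${word}`)
  }

  // Check 3: average sentence length must stay at or under 20 words.
  const sentences = body
    .split(/(?:[.!?]+(?:\s+|$))|\n+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
  const words = sentences.reduce((n, s) => n + s.split(/\s+/).length, 0)
  const avg = sentences.length === 0 ? 0 : words / sentences.length
  if (avg > 20) violations.push(`average sentence length ${avg.toFixed(1)} > 20`)

  // Check 4: feature notes cap the body at 60 lines; other doc types have no cap.
  const lines = body.split('\n').length
  if (docType === 'feature' && lines > 60) violations.push(`body is ${lines} lines > 60`)

  return violations
}
```

An empty result means the draft passes. A non-empty result is what Step 4 would feed back into regeneration, up to three attempts.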
--- a/.claude/skills/document-internal/seeds/README.md
+++ b/.claude/skills/document-internal/seeds/README.md
@@ -1,51 +0,0 @@
-# BrowserOS Internal Docs
-
-Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
-
-If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
-
-## How to find what you need
-
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)
-
-Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
-
-## How to add a doc
-
-Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
-
-## Index
-
-### Setup
-<!-- one line per setup runbook: -->
-<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
-
-### Features
-<!-- one line per shipped feature, newest first: -->
-<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
-
-### Architecture
-<!-- one line per cross-cutting subsystem: -->
-<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
-
-### Designs
-<!-- one line per design spec, newest first: -->
-<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
-
-## Templates
-
-When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
-
-## Voice
-
-Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
-
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.
--- a/.claude/skills/document-internal/seeds/_templates/architecture-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/architecture-note.md
@@ -1,31 +0,0 @@
---
-title: <subsystem name>
-owner: <github handle>
-status: current | deprecated
-date: YYYY-MM-DD
-related-features: [feature-slug-1, feature-slug-2]
---
-
-# <subsystem name>
-
-## What this subsystem does
-<1-2 paragraphs. The top-level responsibility. Boundaries.>
-
-## Architecture
-<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
-
-## Constraints
-<Hard rules the design enforces. "X must never call Y" type statements.>
-
-## Decisions made
-<Numbered list of non-obvious decisions and the reason for each.>
-
-## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here
-
-## How to evolve this
-<Where to add things. Which tests to expect to update. What NOT to touch.>
-
-## Open questions
-<What is still being figured out. Empty if none.>
--- a/.claude/skills/document-internal/seeds/_templates/design-spec.md
+++ b/.claude/skills/document-internal/seeds/_templates/design-spec.md
@@ -1,34 +0,0 @@
---
-title: <design name>
-owner: <github handle>
-status: proposed | accepted | rejected | superseded
-date: YYYY-MM-DD
-supersedes: <design-slug or none>
---
-
-# <design name>
-
-## Goal
-<2-4 sentences. What this design is trying to accomplish.>
-
-## Context
-<1-2 paragraphs. The current state, what is failing, why this needs to change.>
-
-## Selected Approach
-<The chosen design at a high level. Architecture, components, data flow.>
-
-## Alternatives Considered
-### 1. <name>
-<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
-
-### 2. <name>
-<Same shape.>
-
-## Out of Scope
-<What this design does NOT cover. Defer references.>
-
-## Rollout
-<Numbered steps from "nothing exists" to "fully shipped".>
-
-## Open Questions
-<Resolved during design? Empty. Unresolved? List with owner.>
--- a/.claude/skills/document-internal/seeds/_templates/feature-note.md
+++ b/.claude/skills/document-internal/seeds/_templates/feature-note.md
@@ -1,29 +0,0 @@
---
-title: <feature name>
-owner: <github handle>
-status: shipped | wip | deprecated
-date: YYYY-MM-DD
-prs: ["#NNN"]
-tags: [agent, browser, mcp]
---
-
-# <feature name>
-
-## What it does
-<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
-
-## Why we built it
-<1-2 sentences. Motivation. What pain it removed or what unlocked.>
-
-## How it works
-<3-6 sentences. The flow at a high level. Name the key files.>
-
-## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does
-
-## How to run / test it locally
-<bullet list of commands. Empty section if N/A — do not fake.>
-
-## Gotchas
-<known sharp edges. "If you see X, that's why." Empty if N/A.>
--- a/.github/workflows/publish-vm-agent-cache.yml
+++ b/.github/workflows/publish-vm-agent-cache.yml
@@ -0,0 +1,167 @@
+name: Publish VM Agent Cache
+
+on:
+  workflow_dispatch:
+    inputs:
+      agent:
+        description: "Agent name from bundle.json"
+        required: true
+        type: string
+        default: openclaw
+      publish:
+        description: "Upload to R2 and merge manifest slice"
+        required: false
+        default: false
+        type: boolean
+  pull_request:
+    paths:
+      - "packages/browseros-agent/packages/build-tools/**"
+      - ".github/workflows/publish-vm-agent-cache.yml"
+
+env:
+  BUN_VERSION: "1.3.6"
+  PKG_DIR: packages/browseros-agent/packages/build-tools
+
+permissions:
+  contents: read
+
+jobs:
+  check:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools typecheck
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools test
+
+  build:
+    needs: check
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - arch: arm64
+            runner: ubuntu-24.04-arm
+          - arch: x64
+            runner: ubuntu-24.04
+    runs-on: ${{ matrix.runner }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Build tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          OUT: ${{ github.workspace }}/dist/images
+        run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
+      - uses: actions/upload-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
+          path: dist/images/
+          retention-days: 7
+
+  smoke:
+    needs: build
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - arch: arm64
+            runner: ubuntu-24.04-arm
+          - arch: x64
+            runner: ubuntu-24.04
+    runs-on: ${{ matrix.runner }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
+          path: dist/images
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Smoke test tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+        run: |
+          set -euo pipefail
+          tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-${{ matrix.arch }}.tar.gz" -print -quit)"
+          if [ -z "$tarball" ]; then
+            echo "missing ${{ matrix.arch }} tarball artifact for ${AGENT}" >&2
+            exit 1
+          fi
+          bun run smoke:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --tarball "$tarball"
+
+  publish:
+    needs: [build, smoke]
+    if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
+    runs-on: ubuntu-24.04
+    environment: release
+    concurrency:
+      group: r2-manifest-publish
+      cancel-in-progress: false
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          pattern: tarball-*
+          path: dist/images
+          merge-multiple: true
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Upload tarballs to R2
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
+            base="$(basename "$file")"
+            bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
+          done
+      - name: Merge agent slice into manifest
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          mkdir -p dist/images
+          cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
+          bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
+          bun run emit-manifest -- \
+            --slice "agents:${AGENT}" \
+            --dist-dir dist \
+            --merge-from dist/baseline-manifest.json \
+            --out dist/manifest.json
+          bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
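The `Merge agent slice into manifest` step downloads the live `vm/manifest.json` and emits a new one with `--slice "agents:${AGENT}"`, so publishing one agent should not drop the others. The manifest schema is not shown in this diff. The sketch below assumes a hypothetical shape, `{ agents: Record<string, ...> }`, and a hypothetical `mergeAgentSlice` helper, to illustrate the read-modify-write idea only.

```typescript
// Hypothetical manifest shape; the real emit-manifest schema may differ.
interface ArtifactEntry {
  key: string
  sha256: string
}
interface Manifest {
  agents: Record<string, Record<string, ArtifactEntry>>
}

// Replace only manifest.agents[agent]; every other agent entry is carried
// over from the baseline unchanged. The baseline object is not mutated.
function mergeAgentSlice(
  baseline: Manifest,
  agent: string,
  slice: Record<string, ArtifactEntry>,
): Manifest {
  return {
    ...baseline,
    agents: { ...baseline.agents, [agent]: slice },
  }
}
```

Because this is read-modify-write against one shared object, two publish runs that start from the same baseline would each upload a manifest missing the other's slice. The `r2-manifest-publish` concurrency group with `cancel-in-progress: false` serializes those runs.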
--- a/.github/workflows/sync-internal-docs.yml
+++ b/.github/workflows/sync-internal-docs.yml
@@ -1,53 +0,0 @@
-name: Sync internal-docs submodule
-
-on:
-  schedule:
-    - cron: '0 */4 * * *'
-  workflow_dispatch:
-
-jobs:
-  sync:
-    name: Bump internal-docs submodule pointer on dev
-    runs-on: ubuntu-latest
-    steps:
-      - name: Rewrite SSH submodule URL to HTTPS-with-token
-        env:
-          TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
-        run: |
-          git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
-
-      - uses: actions/checkout@v4
-        with:
-          token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
-          submodules: true
-          ref: dev
-          fetch-depth: 50
-
-      - name: Bump submodule pointer if internal-docs has new commits
-        env:
-          GH_TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
-        run: |
-          set -e
-
-          # Skip if submodule not yet configured (handoff window before someone adds it)
-          if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
-            echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
-            exit 0
-          fi
-
-          git submodule update --remote --merge .internal-docs
-
-          if git diff --quiet .internal-docs; then
-            echo "No internal-docs changes to sync."
-            exit 0
-          fi
-
-          git config user.name  "browseros-bot"
-          git config user.email "bot@browseros.ai"
-          git add .internal-docs
-          git commit -m "chore: sync internal-docs submodule"
-
-          # Rebase onto latest dev to absorb any commits that landed during the run,
-          # then push. set -e takes care of failing the run on rebase conflict.
-          git pull --rebase origin dev
-          git push origin dev
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -63,15 +63,15 @@ jobs:
            junit_path: test-results/server-root.xml
            needs_browser: false
          - suite: agent
-            command: (cd apps/agent && bun run test)
+            command: bun run test:agent
            junit_path: test-results/agent.xml
            needs_browser: false
          - suite: eval
-            command: (cd apps/eval && bun run test)
+            command: bun run test:eval
            junit_path: test-results/eval.xml
            needs_browser: false
          - suite: build
-            command: bun run ./scripts/run-bun-test.ts ./scripts/build
+            command: bun run test:build
            junit_path: test-results/build.xml
            needs_browser: false

--- a/.gitmodules
+++ b/.gitmodules
@@ -1,4 +0,0 @@
-[submodule ".internal-docs"]
-	path = .internal-docs
-	url = git@github.com:browseros-ai/internal-docs.git
-	branch = main
--- a/.internal-docs
+++ b/.internal-docs
--- a/README.md
+++ b/README.md
@@ -188,21 +188,6 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
 - [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
 - [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.

-## Citation
-
-If you use BrowserOS in your research or project, please cite:
-
-```bibtex
-@software{browseros2025,
-  author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
-  title = {BrowserOS: The open-source Agentic browser},
-  url = {https://github.com/browseros-ai/BrowserOS},
-  year = {2025},
-  publisher = {GitHub},
-  license = {AGPL-3.0},
-}
-```
-
 ## License

 BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
--- a/packages/browseros-agent/README.md
+++ b/packages/browseros-agent/README.md
@@ -79,15 +79,14 @@ cp apps/server/.env.example apps/server/.env.development
 cp apps/agent/.env.example apps/agent/.env.development
 cp apps/server/.env.production.example apps/server/.env.production

-# Install deps and generate agent code
+# Install deps, generate agent code, and sync the VM cache
 bun run dev:setup

 # Start the full dev environment
 bun run dev:watch
 ```

-`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
-the server startup path and pulls the configured GHCR image on demand.
+`dev:watch` exits if the VM cache manifest is missing. It does not sync the cache itself, so run `bun run dev:setup` first.

 ### Environment Variables

@@ -157,14 +156,9 @@ bun run build:server          # Build production server resource artifacts and u
 bun run build:agent           # Build agent extension

 # Test
-bun run test                  # Run all tests
-bun run test:all              # Run all tests
-bun run test:main             # Run key server tools and integration tests
-
-# App-specific test groups (from packages/browseros-agent)
-cd apps/server && bun run test:tools
-cd apps/server && bun run test:cdp
-cd apps/server && bun run test:integration
+bun run test                  # Run standard tests
+bun run test:cdp              # Run CDP-based tests
+bun run test:integration      # Run integration tests

 # Quality
 bun run lint                  # Check with Biome
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
@@ -0,0 +1,136 @@
+import { Bot, Loader2, Wrench } from 'lucide-react'
+import type { FC } from 'react'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import { cn } from '@/lib/utils'
+
+interface AgentCardProps {
+  agent: AgentCardData
+  onClick: () => void
+  active?: boolean
+}
+
+function formatTimestamp(timestamp?: number): string {
+  if (!timestamp) return 'No activity yet'
+  const diff = Date.now() - timestamp
+  const minutes = Math.floor(diff / 60000)
+  if (minutes < 1) return 'just now'
+  if (minutes < 60) return `${minutes}m ago`
+  const hours = Math.floor(minutes / 60)
+  if (hours < 24) return `${hours}h ago`
+  return `${Math.floor(hours / 24)}d ago`
+}
+
+function getStatusLabel(status: AgentCardData['status']): string {
+  if (status === 'working') return 'Working'
+  if (status === 'error') return 'Error'
+  return 'Ready'
+}
+
+function getStatusTone(status: AgentCardData['status']): string {
+  if (status === 'working') return 'bg-amber-500'
+  if (status === 'error') return 'bg-destructive'
+  return 'bg-emerald-500'
+}
+
+function formatCost(usd: number): string {
+  if (usd < 0.005) return `$${usd.toFixed(4)}`
+  return `$${usd.toFixed(2)}`
+}
+
+export const AgentCardExpanded: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
+      active
+        ? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
+    )}
+  >
+    <div className="flex items-start justify-between gap-3">
+      <div className="flex min-w-0 items-center gap-3">
+        <div
+          className={cn(
+            'flex size-10 shrink-0 items-center justify-center rounded-xl',
+            active
+              ? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
+              : 'bg-muted text-muted-foreground',
+          )}
+        >
+          <Bot className="size-5" />
+        </div>
+        <div className="min-w-0">
+          <div className="truncate font-semibold text-sm">{agent.name}</div>
+          <div className="truncate text-muted-foreground text-xs">
+            {agent.model ?? 'OpenClaw agent'}
+          </div>
+        </div>
+      </div>
+      <div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
+        <span
+          className={cn('size-2 rounded-full', getStatusTone(agent.status))}
+        />
+        <span>{getStatusLabel(agent.status)}</span>
+      </div>
+    </div>
+
+    <div className="mt-4 flex-1">
+      <p className="line-clamp-2 text-foreground/90 text-sm">
+        {agent.lastMessage ??
+          'Start a conversation to see recent work and summaries.'}
+      </p>
+    </div>
+
+    <div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
+      <div className="flex items-center justify-between gap-3">
+        <span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
+        {agent.costUsd ? (
+          <span className="tabular-nums opacity-70">
+            {formatCost(agent.costUsd)}
+          </span>
+        ) : null}
+      </div>
+      {agent.status === 'working' && agent.currentTool ? (
+        <div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
+          <Loader2 className="size-3 shrink-0 animate-spin" />
+          <span className="truncate">{agent.currentTool}</span>
+        </div>
+      ) : agent.activitySummary ? (
+        <div className="flex items-center gap-1.5 text-muted-foreground/60">
+          <Wrench className="size-3 shrink-0" />
+          <span className="truncate">{agent.activitySummary}</span>
+        </div>
+      ) : null}
+    </div>
+  </button>
+)
+
+export const AgentCardCompact: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
+      active
+        ? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
+    )}
+  >
+    <span
+      className={cn(
+        'size-2 rounded-full',
+        active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
+      )}
+    />
+    <span className="truncate">{agent.name}</span>
+  </button>
+)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
@@ -1,71 +1,70 @@
 import { Plus } from 'lucide-react'
 import type { FC } from 'react'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { cn } from '@/lib/utils'
-import { HomeAgentCard } from './HomeAgentCard'
+import { AgentCardCompact, AgentCardExpanded } from './AgentCard'

 interface AgentCardDockProps {
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  activeAgentId?: string
  onSelectAgent: (agentId: string) => void
  onCreateAgent?: () => void
+  compact?: boolean
 }

-function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
+function CreateAgentButton({
+  compact,
+  onCreateAgent,
+}: {
+  compact?: boolean
+  onCreateAgent: () => void
+}) {
  return (
    <button
      type="button"
      onClick={onCreateAgent}
      className={cn(
-        'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
-        'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        compact
+          ? 'rounded-full px-3 py-2 text-sm'
+          : 'min-h-32 rounded-2xl px-5 py-4',
      )}
    >
-      <Plus className="size-5" />
-      <span>Create agent</span>
+      <Plus className={compact ? 'size-3.5' : 'size-5'} />
+      <span>{compact ? 'New' : 'Create agent'}</span>
    </button>
  )
 }

-/**
- * 3-column grid of HomeAgentCards plus a trailing "Create agent"
- * tile. The previous `compact` mode (rendered a horizontal pill rail)
- * had no callers and was dropped along with the legacy AgentCard.
- */
 export const AgentCardDock: FC<AgentCardDockProps> = ({
  agents,
-  adapters,
  activeAgentId,
  onSelectAgent,
  onCreateAgent,
+  compact,
 }) => {
  if (agents.length === 0 && !onCreateAgent) return null

-  const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
-  for (const descriptor of adapters) {
-    if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
-  }
+  const Card = compact ? AgentCardCompact : AgentCardExpanded

  return (
-    <div className="grid gap-4 md:grid-cols-3">
+    <div
+      className={cn(
+        compact
+          ? 'flex items-center gap-2 overflow-x-auto pb-1'
+          : 'grid gap-4 md:grid-cols-3',
+      )}
+    >
      {agents.map((agent) => (
-        <HomeAgentCard
-          key={agent.id}
+        <Card
+          key={agent.agentId}
          agent={agent}
-          adapter={agent.adapter}
-          adapterHealth={adapterHealth.get(agent.adapter) ?? null}
-          active={agent.id === activeAgentId}
-          onClick={() => onSelectAgent(agent.id)}
+          active={agent.agentId === activeAgentId}
+          onClick={() => onSelectAgent(agent.agentId)}
        />
      ))}
      {onCreateAgent ? (
-        <CreateAgentButton onCreateAgent={onCreateAgent} />
+        <CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
      ) : null}
    </div>
  )
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
@@ -2,12 +2,6 @@ import { ArrowLeft, Bot, Home } from 'lucide-react'
 import { type FC, useEffect, useMemo, useRef } from 'react'
 import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
 import { Button } from '@/components/ui/button'
-import {
-  cancelHarnessTurn,
-  useEnqueueHarnessMessage,
-  useHarnessAgents,
-  useRemoveHarnessQueuedMessage,
-} from '@/entrypoints/app/agents/useAgents'
 import {
  type AgentEntry,
  getModelDisplayName,
@@ -21,7 +15,6 @@ import {
  filterTurnsPersistedInHistory,
  flattenHistoryPages,
 } from './claw-chat-types'
-import { QueuePanel } from './QueuePanel'
 import { useAgentConversation } from './useAgentConversation'
 import { useHarnessChatHistory } from './useHarnessChatHistory'

@@ -219,33 +212,15 @@ function AgentConversationController({
    [historyMessages],
  )

-  // Listing query feeds queue + active-turn state for this agent. We
-  // already poll it every 5s for the rail; reusing the same cache
-  // keeps cross-tab queue state in sync without a second poll.
-  const { harnessAgents } = useHarnessAgents()
-  const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
-  const queue = harnessAgent?.queue ?? []
-  const activeTurnId = harnessAgent?.activeTurnId ?? null
-
  const { turns, streaming, send } = useAgentConversation(agentId, {
    runtime: 'agent-harness',
    sessionKey: null,
    history: chatHistory,
-    activeTurnId,
    onComplete: () => {
      void harnessHistoryQuery.refetch()
    },
    onSessionKeyChange: () => {},
  })
-  const enqueueMessage = useEnqueueHarnessMessage()
-  const removeQueuedMessage = useRemoveHarnessQueuedMessage()
-
-  const handleStop = () => {
-    void cancelHarnessTurn(agentId, {
-      turnId: activeTurnId ?? undefined,
-      reason: 'user pressed stop',
-    })
-  }
  const visibleTurns = useMemo(
    () => filterTurnsPersistedInHistory(turns, historyMessages),
    [historyMessages, turns],
@@ -306,15 +281,7 @@ function AgentConversationController({
      />

      <div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
-        <div className="mx-auto max-w-3xl space-y-3">
-          {queue.length > 0 ? (
-            <QueuePanel
-              queue={queue}
-              onRemove={(messageId) =>
-                removeQueuedMessage.mutate({ agentId, messageId })
-              }
-            />
-          ) : null}
+        <div className="mx-auto max-w-3xl">
          <ConversationInput
            variant="conversation"
            agents={agents}
@@ -329,31 +296,14 @@ function AgentConversationController({
                name: a.name,
                dataUrl: a.dataUrl,
              }))
-              // When the agent already has an in-flight turn, route
-              // the new message into the durable queue instead of
-              // starting a parallel turn. Drains automatically as
-              // soon as the active turn ends.
-              if (streaming || activeTurnId) {
-                enqueueMessage.mutate({
-                  agentId,
-                  message: input.text,
-                  attachments,
-                })
-                return
-              }
              void send({ text: input.text, attachments, attachmentPreviews })
            }}
            onCreateAgent={() => navigate(createAgentPath)}
-            onStop={handleStop}
            streaming={streaming}
            disabled={disabled}
            status="running"
            attachmentsEnabled={true}
-            placeholder={
-              streaming
-                ? `Type to queue another message for ${agentName}...`
-                : `Message ${agentName}...`
-            }
+            placeholder={`Message ${agentName}...`}
          />
        </div>
      </div>
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
@@ -1,25 +1,18 @@
 import { Plus } from 'lucide-react'
-import { type FC, useEffect, useMemo, useState } from 'react'
+import { type FC, useEffect, useState } from 'react'
 import { useNavigate } from 'react-router'
 import { Button } from '@/components/ui/button'
 import { Card, CardContent } from '@/components/ui/card'
 import { Separator } from '@/components/ui/separator'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import {
-  useAgentAdapters,
-  useHarnessAgents,
-} from '@/entrypoints/app/agents/useAgents'
 import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
 import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
 import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
 import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { AgentCardDock } from './AgentCardDock'
 import { useAgentCommandData } from './agent-command-layout'
 import { ConversationInput } from './ConversationInput'
-import { orderHomeAgents } from './home-agent-card.helpers'
+import { buildAgentCardData } from './useAgentCardData'

 function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
  return (
@@ -45,13 +38,11 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
 function RecentThreads({
  activeAgentId,
  agents,
-  adapters,
  onOpenAgents,
  onSelectAgent,
 }: {
  activeAgentId?: string | null
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  onOpenAgents: () => void
  onSelectAgent: (agentId: string) => void
 }) {
@@ -77,7 +68,6 @@ function RecentThreads({
      </div>
      <AgentCardDock
        agents={agents}
-        adapters={adapters}
        activeAgentId={activeAgentId ?? undefined}
        onSelectAgent={onSelectAgent}
        onCreateAgent={onOpenAgents}
@@ -89,32 +79,25 @@ function RecentThreads({
 export const AgentCommandHome: FC = () => {
  const navigate = useNavigate()
  const activeHint = useActiveHint()
-  // The conversation input still consumes the merged AgentEntry list
-  // from the layout context (handles legacy /claw/agents entries that
-  // haven't yet been backfilled into the harness store). The Recent
-  // Agents grid below reads the richer harness payload directly.
-  const { agents: legacyAgents, status } = useAgentCommandData()
-  const { harnessAgents } = useHarnessAgents()
-  const { adapters } = useAgentAdapters()
+  const { agents, status } = useAgentCommandData()
  const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
-
-  const orderedAgents = useMemo(
-    () => orderHomeAgents(harnessAgents),
-    [harnessAgents],
-  )
+  const cardData = buildAgentCardData(agents, status?.status, undefined)

  useEffect(() => {
-    if (legacyAgents.length === 0) {
-      if (selectedAgentId) setSelectedAgentId(null)
+    if (agents.length === 0) {
+      if (selectedAgentId) {
+        setSelectedAgentId(null)
+      }
      return
    }
+
    if (
      !selectedAgentId ||
-      !legacyAgents.some((agent) => agent.agentId === selectedAgentId)
+      !agents.some((agent) => agent.agentId === selectedAgentId)
    ) {
-      setSelectedAgentId(legacyAgents[0].agentId)
+      setSelectedAgentId(agents[0].agentId)
    }
-  }, [legacyAgents, selectedAgentId])
+  }, [agents, selectedAgentId])

  const handleSend = (input: { text: string }) => {
    if (!selectedAgentId) return
@@ -127,7 +110,7 @@ export const AgentCommandHome: FC = () => {
    setSelectedAgentId(agent.agentId)
  }

-  const selectedAgent = legacyAgents.find(
+  const selectedAgent = agents.find(
    (agent) => agent.agentId === selectedAgentId,
  )
  const selectedAgentReady = selectedAgent
@@ -135,15 +118,13 @@ export const AgentCommandHome: FC = () => {
    : false
  const selectedAgentStatus =
    selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
-  const selectedAgentName =
-    selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
-
-  const hasAgents = legacyAgents.length > 0
+  const selectedCard =
+    cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]

  return (
    <div className="min-h-full px-4 py-6">
      <div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
-        {hasAgents ? (
+        {cardData.length > 0 ? (
          <>
            <div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
              <div className="space-y-3">
@@ -159,7 +140,7 @@ export const AgentCommandHome: FC = () => {
              <div className="w-full max-w-3xl">
                <ConversationInput
                  variant="home"
-                  agents={legacyAgents}
+                  agents={agents}
                  selectedAgentId={selectedAgentId}
                  onSelectAgent={handleSelectAgent}
                  onSend={handleSend}
@@ -170,7 +151,7 @@ export const AgentCommandHome: FC = () => {
                  attachmentsEnabled={false}
                  placeholder={
                    selectedAgentReady
-                      ? `Ask ${selectedAgentName} to handle a task...`
+                      ? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
                      : 'Agent runtime is not running...'
                  }
                />
@@ -181,8 +162,7 @@ export const AgentCommandHome: FC = () => {

            <RecentThreads
              activeAgentId={selectedAgentId}
-              agents={orderedAgents}
-              adapters={adapters}
+              agents={cardData}
              onOpenAgents={() => navigate('/agents')}
              onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
            />
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
@@ -54,40 +54,25 @@ interface ConversationInputProps {
  placeholder?: string
  attachmentsEnabled?: boolean
  variant?: 'home' | 'conversation'
-  /**
-   * When set, a Stop button surfaces to the left of the voice mic
-   * while `streaming === true`. Click cancels the active turn
-   * server-side via the chat-cancel endpoint. Absent → no Stop
-   * button (legacy behaviour for the home composer).
-   */
-  onStop?: () => void
 }

 function InputActionButton({
  disabled,
  onClick,
  streaming,
-  hasContent,
 }: {
  disabled: boolean
  onClick: () => void
  streaming: boolean
-  hasContent: boolean
 }) {
-  // Show the spinner while streaming only when there's nothing to
-  // send — once the user types something, the icon flips back to the
-  // paper-plane so it reads as "queue this message" instead of
-  // "still working".
-  const showSpinner = streaming && !hasContent
  return (
    <Button
      onClick={onClick}
      size="icon"
      disabled={disabled}
-      title={streaming && hasContent ? 'Queue message' : undefined}
      className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
    >
-      {showSpinner ? (
+      {streaming ? (
        <Loader2 className="h-5 w-5 animate-spin" />
      ) : (
        <ArrowRight className="h-5 w-5" />
@@ -96,22 +81,6 @@ function InputActionButton({
  )
 }

-function StopButton({ onStop }: { onStop: () => void }) {
-  return (
-    <Button
-      type="button"
-      size="icon"
-      variant="ghost"
-      onClick={onStop}
-      title="Stop current turn — queued messages will start next."
-      aria-label="Stop current turn"
-      className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
-    >
-      <Square className="h-3.5 w-3.5 fill-current" />
-    </Button>
-  )
-}
-
 function VoiceButton({
  isRecording,
  isTranscribing,
@@ -330,7 +299,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  placeholder,
  attachmentsEnabled = true,
  variant = 'conversation',
-  onStop,
 }) => {
  const [input, setInput] = useState('')
  const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -411,17 +379,10 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  }

  const hasContent = input.trim().length > 0 || attachments.length > 0
-  // Queue-aware composers (the conversation panel passes `onStop`)
-  // accept input while streaming — the parent decides whether the
-  // submission opens a new turn or enqueues onto the active one.
-  // Surfaces without a Stop hook (home) keep the legacy behaviour
-  // and block input until the current turn finishes.
-  const queueAware = Boolean(onStop)

  const handleSend = () => {
    const text = input.trim()
-    if (disabled || isStaging) return
-    if (streaming && !queueAware) return
+    if (disabled || isStaging || streaming) return
    if (!text && attachments.length === 0) return
    onSend({ text, attachments })
    setInput('')
@@ -551,7 +512,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              )}
            />
          </div>
-          {streaming && onStop ? <StopButton onStop={onStop} /> : null}
          <VoiceButton
            isRecording={voice.isRecording}
            isTranscribing={voice.isTranscribing}
@@ -569,13 +529,11 @@ export const ConversationInput: FC<ConversationInputProps> = ({
               !!disabled ||
               voice.isRecording ||
               voice.isTranscribing ||
-              (streaming && !queueAware)
+              streaming
             }
             onClick={handleSend}
-            // Spinner stays the user-facing "agent is busy" hint; with the
-            // queue active we still spin while a turn is in flight.
+            // Spinner is the user-facing "agent is busy" hint while a turn is in flight.
            streaming={streaming}
-            hasContent={hasContent}
          />
        </div>
        {voice.error ? (
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
@@ -1,243 +0,0 @@
-import { Quote, TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
-import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
-import type {
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
-import {
-  firstNonBlankLine,
-  truncate,
-} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
-import { cn } from '@/lib/utils'
-
-interface HomeAgentCardProps {
-  agent: HarnessAgent
-  adapter: HarnessAgentAdapter | 'unknown'
-  /** Per-adapter health snapshot, shared across cards rendering the
-   *  same adapter. `null` when the /adapters response hasn't surfaced
-   *  health yet (we treat that as healthy until proven otherwise). */
-  adapterHealth: HarnessAdapterHealth | null
-  /** Highlights the card with an accent ring; tells the user which
-   *  agent the conversation input is bound to. */
-  active?: boolean
-  onClick: () => void
-}
-
-const PREVIEW_CHARS = 100
-
-/**
- * Grid-shaped card for the /home Recent agents section. Composition
- * mirrors the rail's `AgentRowCard` but the layout is a vertical
- * column sized for a 1/3-width tile rather than a full-width row.
- *
- * Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
- * `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
- * inline `Unavailable` chip pattern so the visual language is
- * continuous between rail and grid.
- */
-export const HomeAgentCard: FC<HomeAgentCardProps> = ({
-  agent,
-  adapter,
-  adapterHealth,
-  active,
-  onClick,
-}) => {
-  const status = agent.status ?? 'unknown'
-  const lastUsedAt = agent.lastUsedAt ?? null
-  const isWorking = status === 'working'
-  const isAsleep = status === 'asleep'
-  const isError = status === 'error'
-  const hasActiveTurn = Boolean(agent.activeTurnId)
-
-  return (
-    <button
-      type="button"
-      onClick={onClick}
-      className={cn(
-        'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
-        active && 'ring-1 ring-[var(--accent-orange)]/30',
-        isWorking
-          ? 'border-[var(--accent-orange)]/40'
-          : isError
-            ? 'border-destructive/30'
-            : 'border-border/60 hover:border-[var(--accent-orange)]/30',
-      )}
-    >
-      <div className="flex items-start gap-3">
-        <AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
-        <div className="min-w-0 flex-1">
-          <div className="flex items-center gap-1.5">
-            <span className="truncate font-semibold text-sm">
-              {displayName(agent)}
-            </span>
-            {isWorking && (
-              <Badge
-                variant="secondary"
-                className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
-              >
-                Working
-              </Badge>
-            )}
-          </div>
-          <SummaryLine
-            adapter={adapter}
-            modelId={agent.modelId ?? null}
-            reasoningEffort={agent.reasoningEffort ?? null}
-            adapterHealth={adapterHealth}
-          />
-        </div>
-      </div>
-
-      <LastMessage message={agent.lastUserMessage ?? null} />
-
-      <div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
-        <span>{statusFootnote(status, lastUsedAt)}</span>
-        {hasActiveTurn ? (
-          <ResumeChip />
-        ) : isAsleep ? (
-          <Badge variant="outline" className="text-muted-foreground">
-            Asleep
-          </Badge>
-        ) : isError ? (
-          <ErrorChip lastError={agent.lastError ?? null} />
-        ) : null}
-      </div>
-    </button>
-  )
-}
-
-const SummaryLine: FC<{
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelId: string | null
-  reasoningEffort: string | null
-  adapterHealth: HarnessAdapterHealth | null
-}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelId) parts.push(modelId)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth?.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
-
-const LastMessage: FC<{ message: string | null }> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  return (
-    <p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="line-clamp-2">
-        {truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
-      </span>
-    </p>
-  )
-}
-
-const ResumeChip: FC = () => (
-  <span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
-    <span className="relative flex size-1.5">
-      <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-      <span className="relative inline-flex size-1.5 rounded-full bg-white" />
-    </span>
-    Resume
-  </span>
-)
-
-const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
-  if (!lastError) {
-    return <Badge variant="destructive">Attention</Badge>
-  }
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <Badge variant="destructive" className="cursor-default">
-          Attention
-        </Badge>
-      </HoverCardTrigger>
-      <HoverCardContent
-        side="left"
-        className="max-w-xs whitespace-pre-wrap font-mono text-xs"
-      >
-        {lastError}
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
-
-/**
- * Footer left side: relative time on every state EXCEPT working,
- * which shows `now` (the dot is already pulsing — restating it as
- * "Working" would duplicate the pill in the title row).
- */
-function statusFootnote(
-  status: AgentLiveness,
-  lastUsedAt: number | null,
-): string {
-  if (status === 'working') return 'now'
-  return formatRelativeTime(lastUsedAt)
-}
-
-const UUID_PATTERN =
-  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-const OC_UUID_PATTERN =
-  /^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-
-function displayName(agent: HarnessAgent): string {
-  const name = agent.name?.trim()
-  const id = agent.id
-  if (!name || name === id) {
-    if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
-    if (UUID_PATTERN.test(id)) return id.slice(0, 8)
-    return id
-  }
-  return name
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
@@ -1,94 +0,0 @@
-import { ListPlus, X } from 'lucide-react'
-import type { FC } from 'react'
-import {
-  Queue,
-  QueueItem,
-  QueueItemAction,
-  QueueItemActions,
-  QueueItemAttachment,
-  QueueItemContent,
-  QueueItemFile,
-  QueueItemImage,
-  QueueList,
-  QueueSection,
-  QueueSectionContent,
-  QueueSectionLabel,
-  QueueSectionTrigger,
-} from '@/components/ai-elements/queue'
-import type {
-  HarnessQueuedMessage,
-  HarnessQueuedMessageAttachment,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-
-interface QueuePanelProps {
-  queue: HarnessQueuedMessage[]
-  onRemove: (messageId: string) => void
-}
-
-/**
- * Renders the agent's pending message queue using the shared AI
- * Elements `Queue` primitives. Caller is expected to gate render on
- * `queue.length > 0` — when empty, this returns null so the panel
- * disappears cleanly between turns.
- */
-export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
-  if (queue.length === 0) return null
-  return (
-    <Queue>
-      <QueueSection>
-        <QueueSectionTrigger>
-          <QueueSectionLabel
-            count={queue.length}
-            label={queue.length === 1 ? 'queued message' : 'queued messages'}
-            icon={<ListPlus className="size-3.5" />}
-          />
-        </QueueSectionTrigger>
-        <QueueSectionContent>
-          <QueueList>
-            {queue.map((entry) => (
-              <QueueItem key={entry.id}>
-                <div className="flex items-center gap-2">
-                  <QueueItemContent>
-                    {firstNonBlankLine(entry.message)}
-                  </QueueItemContent>
-                  <QueueItemActions>
-                    <QueueItemAction
-                      aria-label="Remove from queue"
-                      onClick={() => onRemove(entry.id)}
-                    >
-                      <X className="size-3" />
-                    </QueueItemAction>
-                  </QueueItemActions>
-                </div>
-                {entry.attachments && entry.attachments.length > 0 ? (
-                  <QueueItemAttachment>
-                    {entry.attachments.map((attachment, idx) =>
-                      renderAttachment(entry.id, attachment, idx),
-                    )}
-                  </QueueItemAttachment>
-                ) : null}
-              </QueueItem>
-            ))}
-          </QueueList>
-        </QueueSectionContent>
-      </QueueSection>
-    </Queue>
-  )
-}
-
-function renderAttachment(
-  messageId: string,
-  attachment: HarnessQueuedMessageAttachment,
-  idx: number,
-) {
-  if (attachment.mediaType.startsWith('image/')) {
-    const src = `data:${attachment.mediaType};base64,${attachment.data}`
-    return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
-  }
-  return (
-    <QueueItemFile key={`${messageId}-${idx}`}>
-      {attachment.mediaType}
-    </QueueItemFile>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
@@ -1,69 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-import { orderHomeAgents } from './home-agent-card.helpers'
-
-function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
-  return {
-    id: overrides.id ?? 'agent-x',
-    name: overrides.name ?? overrides.id ?? 'agent-x',
-    adapter: overrides.adapter ?? 'codex',
-    permissionMode: 'approve-all',
-    sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
-    createdAt: 1000,
-    updatedAt: 1000,
-    ...overrides,
-  }
-}
-
-describe('orderHomeAgents', () => {
-  it('places active-turn agents before everyone else', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'a', lastUsedAt: 5000 }),
-      agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
-      agent({ id: 'c', lastUsedAt: 7000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
-  })
-
-  it('orders non-active agents by lastUsedAt desc', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'old', lastUsedAt: 1000 }),
-      agent({ id: 'new', lastUsedAt: 9000 }),
-      agent({ id: 'mid', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
-  })
-
-  it('puts the gateway `main` seed agent above other never-used agents', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
-  })
-
-  it('sends never-used agents to the bottom even when `main` is among them', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'used', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
-  })
-
-  it('does NOT sort by pinned — pinned agents are treated like any other', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
-      agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
-  })
-
-  it('falls back to id-stable ordering when lastUsedAt ties', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'b', lastUsedAt: 5000 }),
-      agent({ id: 'a', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
@@ -1,42 +0,0 @@
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-
-/**
- * Order for the /home Recent agents grid.
- *
- * 1. Active turn first — agents mid-turn float to the top so the
- *    Resume affordance is the first thing the user sees on /home.
- * 2. The protected gateway-side `main` agent stays pinned-to-top in
- *    the never-used group on a fresh install (mirrors the rail).
- * 3. Recency (`lastUsedAt` desc).
- * 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
- *    every 5-second poll.
- *
- * Pin is NOT a sort key. The home grid is action-oriented and trusts
- * recency + active-turn to surface the right agent; pinning is an
- * organisation tool that lives on the rail at /agents.
- */
-export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
-  return [...agents].sort((a, b) => {
-    const aActive = a.activeTurnId != null
-    const bActive = b.activeTurnId != null
-    if (aActive !== bActive) return aActive ? -1 : 1
-
-    // Recency wins outright. Never-used agents (`lastUsedAt == null`)
-    // both fall to the same `-Infinity` bucket and the seed/id rules
-    // below decide their order — but a used agent always beats any
-    // never-used agent regardless of id.
-    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    if (aValue !== bValue) return bValue - aValue
-
-    // Inside the never-used (or exact-tie) group: pin the gateway
-    // `main` seed to the top of the group on a fresh install, then
-    // fall back to id-stable order so the grid doesn't reshuffle on
-    // every poll.
-    const aSeed = a.id === 'main' && a.lastUsedAt == null
-    const bSeed = b.id === 'main' && b.lastUsedAt == null
-    if (aSeed !== bSeed) return aSeed ? -1 : 1
-
-    return a.id.localeCompare(b.id)
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
@@ -0,0 +1,53 @@
+import {
+  type AgentEntry,
+  getModelDisplayName,
+  type OpenClawStatus,
+} from '@/entrypoints/app/agents/useOpenClaw'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import type { AgentOverview } from './useAgentDashboard'
+
+function resolveAgentStatus(
+  gatewayStatus: OpenClawStatus['status'] | undefined,
+  liveStatus: AgentOverview['status'] | undefined,
+): AgentCardData['status'] {
+  // Gateway-level errors take precedence
+  if (gatewayStatus === 'error') return 'error'
+  if (gatewayStatus === 'starting') return 'working'
+
+  // Per-agent live status from the WS observer
+  if (liveStatus === 'working') return 'working'
+  if (liveStatus === 'error') return 'error'
+
+  return 'idle'
+}
+
+/**
+ * Build agent card display data by merging the raw agent entries from
+ * the gateway with enriched overview data from the dashboard API.
+ *
+ * Pure function — no hooks, no IndexedDB, no async.
+ */
+export function buildAgentCardData(
+  agents: AgentEntry[],
+  status: OpenClawStatus['status'] | undefined,
+  dashboard: AgentOverview[] | undefined,
+): AgentCardData[] {
+  return agents.map((agent) => {
+    const overview = dashboard?.find((d) => d.agentId === agent.agentId)
+
+    return {
+      agentId: agent.agentId,
+      name: agent.name,
+      model: getModelDisplayName(agent.model),
+      status:
+        agent.source === 'agent-harness'
+          ? 'idle'
+          : resolveAgentStatus(status, overview?.status),
+      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
+      lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
+      activitySummary: overview?.activitySummary ?? undefined,
+      currentTool: overview?.currentTool ?? undefined,
+      costUsd: overview?.totalCostUsd ?? undefined,
+    }
+  })
+}
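`buildAgentCardData` calls `dashboard?.find(...)` once per agent, so the merge costs O(agents × overviews). Below is a minimal sketch of an index-first variant. It assumes `agentId` is unique within the dashboard list; with duplicates, a `Map` keeps the last entry while `find` returns the first. The types are trimmed, hypothetical stand-ins for `AgentEntry`, `AgentOverview`, and `AgentCardData`. `getModelDisplayName` and most card fields are omitted. The `running` and `stopped` gateway states are assumptions; the diff only shows `error` and `starting`. The status precedence is copied from `resolveAgentStatus`.

```typescript
// Trimmed, hypothetical stand-ins for the real types; 'running' | 'stopped' are assumed.
type GatewayStatus = 'running' | 'starting' | 'error' | 'stopped'
type LiveStatus = 'working' | 'idle' | 'error' | 'unknown'
type CardStatus = 'working' | 'idle' | 'error'

interface AgentEntryLike {
  agentId: string
  name: string
  source?: string
}

interface OverviewLike {
  agentId: string
  status: LiveStatus
  latestMessage: string | null
}

interface CardLike {
  agentId: string
  name: string
  status: CardStatus
  lastMessage?: string
}

// Same precedence as resolveAgentStatus: gateway errors/startup first, then live status.
function resolveStatus(
  gateway: GatewayStatus | undefined,
  live: LiveStatus | undefined,
): CardStatus {
  if (gateway === 'error') return 'error'
  if (gateway === 'starting') return 'working'
  if (live === 'working') return 'working'
  if (live === 'error') return 'error'
  return 'idle'
}

function buildCards(
  agents: AgentEntryLike[],
  gateway: GatewayStatus | undefined,
  dashboard: OverviewLike[] | undefined,
): CardLike[] {
  // Index once so each per-agent lookup is O(1) instead of a linear find().
  const byId = new Map((dashboard ?? []).map((o) => [o.agentId, o] as const))
  return agents.map((agent): CardLike => {
    const overview = byId.get(agent.agentId)
    return {
      agentId: agent.agentId,
      name: agent.name,
      // Harness agents bypass gateway/live status, as in the diff.
      status:
        agent.source === 'agent-harness'
          ? 'idle'
          : resolveStatus(gateway, overview?.status),
      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
    }
  })
}
```

The result stays a pure function of its inputs, so it can be recomputed on every dashboard refresh without extra memoization.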
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
@@ -36,15 +36,6 @@ interface UseAgentConversationOptions {
  history?: OpenClawChatHistoryMessage[]
  onComplete?: () => void
  onSessionKeyChange?: (sessionKey: string) => void
-  /**
-   * Server-side active turn id, surfaced via the listing query. When
-   * this changes from null/<id> to a different non-null id while we
-   * aren't already streaming (e.g. the server just popped a queued
-   * message and started a new turn), the hook reattaches via
-   * /chat/active so the chat panel picks up the live stream without
-   * waiting for a remount.
-   */
-  activeTurnId?: string | null
 }

 export function useAgentConversation(
@@ -220,46 +211,31 @@ export function useAgentConversation(
  }
  processEventRef.current = processAgentHarnessStreamEvent

-  const activeTurnIdDep = options.activeTurnId ?? null
-
-  // On mount, on agent change, and whenever the listing reports a
-  // *new* active turn id, check whether the server has an in-flight
-  // turn for this agent and reattach to it. This catches three
-  // cases at once: the chat resilience flow (tab close/reopen),
-  // navigation between agents, AND queue drain (the server starts a
-  // new turn from a queued message → activeTurnId flips → attach).
+  // On mount (and whenever the agent changes), check whether the
+  // server has an in-flight turn for this agent and reattach to it.
+  // This is what makes the chat resilient across tab close/reopen,
+  // refresh, and navigation: the runtime call kept running on the
+  // server while we were away. Effect only depends on `agentId` —
+  // the event handler is read off a ref so this doesn't re-subscribe
+  // every render.
  useEffect(() => {
    let cancelled = false
    const abortController = new AbortController()
-    // Reference the dep inside the body so biome's exhaustive-deps
-    // rule sees it consumed; the value is just an "any non-null
-    // active turn id" trigger — the actual id we attach to comes
-    // from the fresh fetchActiveHarnessTurn call below.
-    void activeTurnIdDep

    const attemptResume = async () => {
-      // Track whether *we* started a stream in this run. When the
-      // early-return paths fire (no active turn, or a `send()` /
-      // earlier resume already owns `streamAbortRef`), the finally
-      // block must NOT touch streaming/turnIdRef/lastSeqRef —
-      // otherwise we clobber the in-flight stream's state and the
-      // Stop button drops out mid-turn while events keep arriving.
-      let weStartedStream = false
      try {
        const active = await fetchActiveHarnessTurn(agentId)
        if (cancelled || !active || active.status !== 'running') return
-        if (streamAbortRef.current) return // someone else already owns the stream
+        if (streamAbortRef.current) return // a fresh send already in flight

        // Stage a placeholder turn so the streamed events have a row
-        // to render into. The server now persists the kicking-off
-        // prompt on the active turn, so we render it as the user
-        // bubble immediately — no empty-bubble flicker when a queued
-        // message starts running.
+        // to render into. We don't have the user message text on
+        // resume; the assistant turn is what we're catching up on.
        setTurns((prev) => [
          ...prev,
          {
            id: crypto.randomUUID(),
-            userText: active.prompt ?? '',
+            userText: '',
            parts: [],
            done: false,
            timestamp: active.startedAt,
@@ -271,7 +247,6 @@ export function useAgentConversation(
        lastSeqRef.current = null
        streamAbortRef.current = abortController
        setStreaming(true)
-        weStartedStream = true

        const response = await attachToHarnessTurn(agentId, {
          turnId: active.turnId,
@@ -290,20 +265,10 @@ export function useAgentConversation(
        // Resume is best-effort; transient errors fall back to the
        // user starting a new turn manually.
      } finally {
-        // Always release `streamAbortRef` if we owned it — even when
-        // the effect was cancelled mid-stream (a listing poll
-        // captured the next queue-drain turn id, for example). If we
-        // don't, the next effect run hits `if (streamAbortRef.current)
-        // return` against our now-aborted controller and never
-        // reattaches, leaving `streaming === true` with no live stream.
-        if (weStartedStream && streamAbortRef.current === abortController) {
-          streamAbortRef.current = null
-        }
-        // The other state (streaming flag, turn id, lastSeq) is the
-        // *current run's* lifecycle: only reset it on a clean exit.
-        // When `cancelled` is true the next run will set these
-        // itself, so resetting here would only cause a brief flicker.
-        if (!cancelled && weStartedStream) {
+        if (!cancelled) {
+          if (streamAbortRef.current === abortController) {
+            streamAbortRef.current = null
+          }
          turnIdRef.current = null
          lastSeqRef.current = null
          setStreaming(false)
@@ -316,7 +281,7 @@ export function useAgentConversation(
      cancelled = true
      abortController.abort()
    }
-  }, [agentId, activeTurnIdDep])
+  }, [agentId])

  const send = async (input: string | SendInput) => {
    const normalized: SendInput =
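After this change, the `finally` block resets `turnIdRef`, `lastSeqRef`, and `streaming` on every non-cancelled exit. That includes the early `if (streamAbortRef.current) return` path. The removed `weStartedStream` comment describes that path as clobbering an in-flight `send()`. Below is a minimal sketch of the ownership guard that comment relied on. React refs and state are replaced by a plain, hypothetical `StreamState` object so the sketch runs outside React. `fetchActiveTurnId` and `isCancelled` are illustrative stand-ins for `fetchActiveHarnessTurn` and the effect's `cancelled` flag.

```typescript
// Hypothetical, framework-free model of the refs/state in useAgentConversation.
interface StreamState {
  owner: AbortController | null // role of streamAbortRef.current
  turnId: string | null // role of turnIdRef.current
  streaming: boolean // role of the `streaming` React state
}

// Mirrors attemptResume's control flow with the `weStartedStream` guard restored.
async function attemptResume(
  state: StreamState,
  controller: AbortController,
  fetchActiveTurnId: () => Promise<string | null>,
  isCancelled: () => boolean,
): Promise<void> {
  let weStartedStream = false
  try {
    const turnId = await fetchActiveTurnId()
    if (isCancelled() || !turnId) return
    if (state.owner) return // someone else (e.g. send()) already owns the stream
    state.owner = controller
    state.turnId = turnId
    state.streaming = true
    weStartedStream = true
    // ...attach to the turn and consume events here...
  } finally {
    // Release the controller only if this run acquired it.
    if (weStartedStream && state.owner === controller) state.owner = null
    // Reset per-run state only on a clean exit from a run that started the stream.
    if (!isCancelled() && weStartedStream) {
      state.turnId = null
      state.streaming = false
    }
  }
}
```

With the guard in place, a resume that loses the race to `send()` exits without touching any shared state. A resume that owned the stream still cleans up after itself.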
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
@@ -0,0 +1,95 @@
+import { useQuery, useQueryClient } from '@tanstack/react-query'
+import { useEffect } from 'react'
+import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
+
+export interface AgentOverview {
+  agentId: string
+  status: 'working' | 'idle' | 'error' | 'unknown'
+  latestMessage: string | null
+  latestMessageAt: number | null
+  activitySummary: string | null
+  currentTool: string | null
+  totalCostUsd: number
+  sessionCount: number
+}
+
+export interface DashboardResponse {
+  agents: AgentOverview[]
+  summary: {
+    totalAgents: number
+    totalCostUsd: number
+  }
+}
+
+interface StatusEvent {
+  agentId: string
+  status: AgentOverview['status']
+  currentTool: string | null
+  error: string | null
+  timestamp: number
+}
+
+const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
+
+export function useAgentDashboard(enabled: boolean) {
+  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+  const ready = enabled && Boolean(baseUrl) && !urlLoading
+
+  // Initial data load + periodic refresh as fallback
+  const query = useQuery<DashboardResponse>({
+    queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
+    queryFn: async () => {
+      const url = new URL('/claw/dashboard', baseUrl as string)
+      const response = await fetch(url.toString())
+      if (!response.ok) throw new Error('Failed to fetch dashboard')
+      return response.json()
+    },
+    enabled: ready,
+  })
+
+  // SSE subscription for real-time status patches
+  useEffect(() => {
+    if (!ready || !baseUrl) return
+
+    const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
+    const eventSource = new EventSource(streamUrl.toString())
+
+    eventSource.addEventListener('snapshot', (event) => {
+      try {
+        const dashboard = JSON.parse(event.data) as DashboardResponse
+        queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
+      } catch {}
+    })
+
+    eventSource.addEventListener('status', (event) => {
+      try {
+        const status = JSON.parse(event.data) as StatusEvent
+        queryClient.setQueryData<DashboardResponse>(
+          [...DASHBOARD_QUERY_KEY, baseUrl],
+          (prev) => {
+            if (!prev) return prev
+            return {
+              ...prev,
+              agents: prev.agents.map((agent) =>
+                agent.agentId === status.agentId
+                  ? {
+                      ...agent,
+                      status: status.status,
+                      currentTool: status.currentTool,
+                    }
+                  : agent,
+              ),
+            }
+          },
+        )
+      } catch {}
+    })
+
+    return () => {
+      eventSource.close()
+    }
+  }, [ready, baseUrl, queryClient])
+
+  return query
+}
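The `status` listener patches one agent inside the cached dashboard and leaves everything else alone. It returns `prev` unchanged when no snapshot has been cached yet. Below is a minimal sketch that pulls that updater out as a pure function, which makes it unit-testable without an `EventSource`. The interfaces are trimmed copies of the ones declared above. `applyStatusEvent` is a hypothetical name, not something the diff defines. As in the listener, the event's `error` field is ignored.

```typescript
// Trimmed copies of the shapes declared in useAgentDashboard.ts.
interface AgentOverview {
  agentId: string
  status: 'working' | 'idle' | 'error' | 'unknown'
  currentTool: string | null
  totalCostUsd: number
}

interface DashboardResponse {
  agents: AgentOverview[]
  summary: { totalAgents: number; totalCostUsd: number }
}

interface StatusEvent {
  agentId: string
  status: AgentOverview['status']
  currentTool: string | null
}

// Pure form of the `status` listener's updater: patch the matching agent,
// keep other agents and the summary by reference, and pass `undefined` through.
function applyStatusEvent(
  prev: DashboardResponse | undefined,
  event: StatusEvent,
): DashboardResponse | undefined {
  if (!prev) return prev
  return {
    ...prev,
    agents: prev.agents.map((agent) =>
      agent.agentId === event.agentId
        ? { ...agent, status: event.status, currentTool: event.currentTool }
        : agent,
    ),
  }
}
```

In the hook, this could be passed as the existing updater: `(prev) => applyStatusEvent(prev, status)`. Untouched agents keep their object identity, so memoized consumers of those rows need not re-render.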
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
@@ -2,87 +2,67 @@ import { Loader2 } from 'lucide-react'
 import { type FC, useMemo } from 'react'
 import { AgentRowCard } from './AgentRowCard'
 import { AgentsEmptyState } from './AgentsEmptyState'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from './agent-harness-types'
-import type {
-  AgentAdapterHealth,
-  AgentRowData,
-} from './agent-row/agent-row.types'
+import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
 import type { AgentListItem } from './agents-page-types'
 import type { AgentLiveness } from './LivenessDot'

 interface AgentListProps {
  agents: AgentListItem[]
-  /** Optional per-agent activity metadata, keyed by `agentId`. */
+  /**
+   * Optional per-agent activity metadata. Keyed by `agentId`. Missing
+   * entries fall back to status='unknown' / lastUsedAt=null and the
+   * row renders an "unknown" dot. The server will populate this once
+   * the activity tracker ships; the page works without it.
+   */
  activity?: Record<
    string,
    { status: AgentLiveness; lastUsedAt: number | null }
  >
-  /** Lookup table from harness id → enriched agent record. */
+  /**
+   * Lookup table from harness agent id → adapter + reasoning effort,
+   * sourced from `useHarnessAgents`. Lets the row card render the
+   * correct adapter icon and chips for harness agents (legacy
+   * /claw/agents entries fall back to inferring from `runtimeLabel`).
+   */
  harnessAgentLookup?: Map<string, HarnessAgent>
-  /** Adapter catalog (carries per-adapter health). */
-  adapters: HarnessAdapterDescriptor[]
  loading: boolean
  deletingAgentKey: string | null
  onCreateAgent: () => void
  onDeleteAgent: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
 }

 export const AgentList: FC<AgentListProps> = ({
  agents,
  activity,
  harnessAgentLookup,
-  adapters,
  loading,
  deletingAgentKey,
  onCreateAgent,
  onDeleteAgent,
-  onPinToggle,
 }) => {
-  const adapterHealth = useMemo(() => {
-    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
-    for (const adapter of adapters) {
-      if (adapter.health) {
-        map.set(adapter.id, {
-          healthy: adapter.health.healthy,
-          reason: adapter.health.reason,
-        })
-      }
-    }
-    return map
-  }, [adapters])
-
-  // Sort: pinned rows first, then most recently used, then never-used
-  // agents in id-stable order. The gateway's `main` agent stays
-  // pinned-to-top when never touched so a fresh install has an
-  // obvious starting point.
+  // Sort by recency: most recently used first; never-used agents drop
+  // to the bottom in id-stable order so the list doesn't reshuffle on
+  // every refresh. The pinned exception is the gateway's `main` agent
+  // when it's never been touched — keep it at the top so a fresh
+  // install has an obvious starting point.
  const ordered = useMemo(() => {
-    const withMeta = agents.map((agent) => {
-      const harness = harnessAgentLookup?.get(agent.agentId)
-      return {
-        agent,
-        pinned: harness?.pinned ?? false,
-        lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
-      }
+    const withScore = agents.map((agent) => {
+      const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
+      return { agent, lastUsedAt }
    })
-    return withMeta
+    return withScore
      .sort((a, b) => {
-        if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
-        const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
-        const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
-        if (aSeed && !bSeed) return -1
-        if (!aSeed && bSeed) return 1
+        const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
+        const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
+        if (aPinned && !bPinned) return -1
+        if (!aPinned && bPinned) return 1
        const aValue = a.lastUsedAt ?? -Infinity
        const bValue = b.lastUsedAt ?? -Infinity
        if (aValue !== bValue) return bValue - aValue
        return a.agent.agentId.localeCompare(b.agent.agentId)
      })
      .map((entry) => entry.agent)
-  }, [activity, agents, harnessAgentLookup])
+  }, [activity, agents])

  if (loading && agents.length === 0) {
    return (
@@ -100,23 +80,18 @@ export const AgentList: FC<AgentListProps> = ({
    <div className="grid gap-3">
      {ordered.map((agent) => {
        const harness = harnessAgentLookup?.get(agent.agentId)
-        const adapter: HarnessAgentAdapter | 'unknown' =
+        const adapter: HarnessAgentAdapter | undefined =
          harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
-        const data = buildRowData({
-          agent,
-          adapter,
-          harness,
-          activity: activity?.[agent.agentId],
-          adapterHealth:
-            adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
-        })
        return (
          <AgentRowCard
            key={agent.key}
-            data={data}
-            deleting={deletingAgentKey === agent.key}
+            agent={agent}
+            status={activity?.[agent.agentId]?.status}
+            lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
+            adapter={adapter}
+            reasoningEffort={harness?.reasoningEffort ?? null}
            onDelete={onDeleteAgent}
-            onPinToggle={onPinToggle}
+            deleting={deletingAgentKey === agent.key}
          />
        )
      })}
@@ -124,53 +99,10 @@ export const AgentList: FC<AgentListProps> = ({
  )
 }

-function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
+function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
  const lower = label?.toLowerCase()
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'openclaw') return 'openclaw'
-  return 'unknown'
-}
-
-const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
-
-function buildRowData(input: {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  harness: HarnessAgent | undefined
-  activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
-  adapterHealth: AgentAdapterHealth | null
-}): AgentRowData {
-  const { agent, adapter, harness, activity, adapterHealth } = input
-  return {
-    agent,
-    adapter,
-    modelLabel: deriveModelLabel(agent, harness),
-    reasoningEffort: harness?.reasoningEffort ?? null,
-    status: activity?.status ?? 'unknown',
-    lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
-    pinned: harness?.pinned ?? false,
-    cwd: harness?.cwd ?? null,
-    lastUserMessage: harness?.lastUserMessage ?? null,
-    tokens: harness?.tokens ?? null,
-    turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
-    failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
-    lastError: harness?.lastError ?? null,
-    lastErrorAt: harness?.lastErrorAt ?? null,
-    activeTurnId: harness?.activeTurnId ?? null,
-    adapterHealth,
-  }
-}
-
-function deriveModelLabel(
-  agent: AgentListItem,
-  harness: HarnessAgent | undefined,
-): string | null {
-  // Prefer the agent rail's modelLabel when meaningful; harness's
-  // modelId is a stable identifier but the rail's `modelLabel`
-  // already maps to a friendly display string.
-  if (agent.modelLabel && agent.modelLabel !== 'default') {
-    return agent.modelLabel
-  }
-  return harness?.modelId ?? null
+  return undefined
 }
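The rail's comparator tests the never-used `main` seed before recency. On /agents, an untouched `main` therefore sorts above agents that have been used. `orderHomeAgents` ranks recency first, and its test "sends never-used agents to the bottom even when `main` is among them" expects the opposite order for the same input. Below is a minimal standalone copy of the rail comparator that makes the difference checkable. `RailRow` and `orderRail` are hypothetical names reduced to the two fields the comparator reads.

```typescript
// Hypothetical row shape: just the two fields the rail comparator reads.
interface RailRow {
  agentId: string
  lastUsedAt: number | null
}

// Same rules as the `ordered` memo in AgentList.tsx:
// 1. never-used `main` seed first, 2. lastUsedAt desc, 3. id tiebreaker.
function orderRail(rows: RailRow[]): RailRow[] {
  return [...rows].sort((a, b) => {
    const aSeed = a.agentId === 'main' && a.lastUsedAt === null
    const bSeed = b.agentId === 'main' && b.lastUsedAt === null
    if (aSeed !== bSeed) return aSeed ? -1 : 1
    // Never-used agents share the -Infinity bucket and fall to the bottom.
    const aValue = a.lastUsedAt ?? -Infinity
    const bValue = b.lastUsedAt ?? -Infinity
    if (aValue !== bValue) return bValue - aValue
    return a.agentId.localeCompare(b.agentId)
  })
}
```

For `[used (5000), main (never used)]`, this returns `main` first, while `orderHomeAgents` returns `used` first. Whether the two surfaces should agree is a product call the diff does not settle.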
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
@@ -1,99 +1,270 @@
+import {
+  Copy,
+  Loader2,
+  MessageSquare,
+  MoreHorizontal,
+  Pencil,
+  RotateCcw,
+  Trash2,
+} from 'lucide-react'
 import type { FC } from 'react'
+import { useNavigate } from 'react-router'
+import { toast } from 'sonner'
+import { Badge } from '@/components/ui/badge'
+import { Button } from '@/components/ui/button'
+import {
+  DropdownMenu,
+  DropdownMenuContent,
+  DropdownMenuItem,
+  DropdownMenuSeparator,
+  DropdownMenuTrigger,
+} from '@/components/ui/dropdown-menu'
+import {
+  Tooltip,
+  TooltipContent,
+  TooltipProvider,
+  TooltipTrigger,
+} from '@/components/ui/tooltip'
 import { cn } from '@/lib/utils'
-import { AgentActions } from './agent-row/AgentActions'
-import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
-import { AgentLastMessage } from './agent-row/AgentLastMessage'
-import { AgentMetaRow } from './agent-row/AgentMetaRow'
-import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
-import { AgentTile } from './agent-row/AgentTile'
-import { AgentTitleRow } from './agent-row/AgentTitleRow'
-import type {
-  AgentRowCallbacks,
-  AgentRowData,
-} from './agent-row/agent-row.types'
+import { AdapterIcon, adapterLabel } from './AdapterIcon'
+import {
+  canDelete as canDeleteAgent,
+  canRename as canRenameAgent,
+  displayName,
+  formatRelativeTime,
+  workspaceLabel,
+} from './agent-display.helpers'
+import type { HarnessAgentAdapter } from './agent-harness-types'
+import type { AgentListItem } from './agents-page-types'
+import { type AgentLiveness, LivenessDot } from './LivenessDot'

-interface AgentRowCardProps extends AgentRowCallbacks {
-  data: AgentRowData
-  /** Whether THIS agent is mid-delete; renders a spinner in the menu. */
+interface AgentRowCardProps {
+  agent: AgentListItem
+  /**
+   * Per-agent extras the listing surface provides on top of the
+   * minimal `AgentListItem` shape. `lastUsedAt` survives server
+   * restart (sourced from acpx session record); `status` is in-memory
+   * server-side.
+   */
+  status?: AgentLiveness
+  lastUsedAt?: number | null
+  /** Adapter the agent belongs to. Drives icon + label. */
+  adapter?: HarnessAgentAdapter
+  /** Reasoning effort chip (claude/codex/openclaw catalog). */
+  reasoningEffort?: string | null
+  /** Modeled directly off the inbound delete handler so the parent owns the dialog. */
+  onDelete: (agent: AgentListItem) => void
+  /** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
  deleting?: boolean
 }

-/**
- * Composition shell for the agent rail. Owns no state; sub-components
- * each handle their own micro-state (error-panel collapse, etc.) and
- * emit callbacks (delete, pin/unpin) for the page to act on.
- *
- * The whole card carries state — not just the tile — so the row's
- * border subtly tells the user what's going on at a glance:
- *   working → accent-orange border with a soft glow
- *   error   → destructive border
- *   idle    → muted border, lifts on hover
- */
 export const AgentRowCard: FC<AgentRowCardProps> = ({
-  data,
-  deleting,
+  agent,
+  status = 'unknown',
+  lastUsedAt,
+  adapter,
+  reasoningEffort,
  onDelete,
-  onPinToggle,
+  deleting,
 }) => {
+  const navigate = useNavigate()
+  const adapterId = adapter ?? inferAdapterFromListItem(agent)
+  const workspace = workspaceLabel(agent)
+  const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
+  const allowDelete = canDeleteAgent(agent)
+  const allowRename = canRenameAgent(agent)
+
+  const handleChat = () => navigate(`/agents/${agent.agentId}`)
+  const handleCopyId = async () => {
+    try {
+      await navigator.clipboard.writeText(agent.agentId)
+      toast.success('Agent id copied')
+    } catch {
+      toast.error('Could not copy agent id')
+    }
+  }
+
  return (
    <div
      className={cn(
-        // Layout-stable hover. No translate, no shadow change — both
-        // visibly perturb neighbouring rows. Only the border tint
-        // shifts on hover, and the rail's vertical rhythm stays
-        // exactly the same in every state.
-        'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
-        data.status === 'working'
-          ? 'border-[var(--accent-orange)]/40'
-          : data.status === 'error'
-            ? 'border-destructive/40'
-            : 'border-border hover:border-[var(--accent-orange)]/30',
+        'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
+        'hover:border-[var(--accent-orange)]/50 hover:shadow-sm',
      )}
    >
      <div className="flex items-start gap-4">
-        <AgentTile
-          adapter={data.adapter}
-          status={data.status}
-          lastUsedAt={data.lastUsedAt}
-        />
-
-        <div className="min-w-0 flex-1">
-          <AgentTitleRow
-            agent={data.agent}
-            status={data.status}
-            pinned={data.pinned}
-            turnsByDay={data.turnsByDay}
-            failedByDay={data.failedByDay}
-            onPinToggle={(next) => onPinToggle(data.agent, next)}
+        {/* Adapter tile + liveness dot in the corner. */}
+        <div className="relative shrink-0">
+          <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
+            <AdapterIcon adapter={adapterId} className="h-6 w-6" />
+          </div>
+          <LivenessDot
+            status={status}
+            detail={livenessDetail(status, lastUsedAt)}
+            className="absolute -right-0.5 -bottom-0.5"
          />
-
-          <AgentSummaryChips
-            adapter={data.adapter}
-            modelLabel={data.modelLabel}
-            reasoningEffort={data.reasoningEffort}
-            adapterHealth={data.adapterHealth}
-          />
-
-          <AgentLastMessage message={data.lastUserMessage} />
-
-          <AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
-
-          {data.status === 'error' && data.lastError && (
-            <AgentErrorPanel
-              agentId={data.agent.agentId}
-              message={data.lastError}
-              errorAt={data.lastErrorAt}
-            />
-          )}
        </div>

-        <AgentActions
-          agent={data.agent}
-          activeTurnId={data.activeTurnId}
-          deleting={deleting}
-          onDelete={onDelete}
-        />
+        <div className="min-w-0 flex-1">
+          <div className="mb-1 flex items-center gap-2">
+            <span className="truncate font-semibold">{displayName(agent)}</span>
+            {status === 'working' && (
+              <Badge
+                variant="secondary"
+                className="bg-amber-50 text-amber-900 hover:bg-amber-50"
+              >
+                Working
+              </Badge>
+            )}
+            {status === 'asleep' && (
+              <Badge variant="outline" className="text-muted-foreground">
+                Asleep
+              </Badge>
+            )}
+            {status === 'error' && (
+              <Badge variant="destructive">Attention</Badge>
+            )}
+          </div>
+
+          <div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
+            <Badge variant="secondary" className="font-normal">
+              {adapterLabel(adapterId)}
+            </Badge>
+            {agent.modelLabel && agent.modelLabel !== 'default' && (
+              <Badge variant="outline" className="font-normal">
+                {agent.modelLabel}
+              </Badge>
+            )}
+            {reasoningEffort && reasoningEffort !== 'medium' && (
+              <Badge variant="outline" className="font-normal">
+                {reasoningEffort}
+              </Badge>
+            )}
+          </div>
+
+          <div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
+            <span>Last used {lastUsedLabel}</span>
+            {workspace && (
+              <>
+                <span aria-hidden>•</span>
+                <span className="truncate font-mono" title={workspace}>
+                  {workspace}
+                </span>
+              </>
+            )}
+          </div>
+        </div>
+
+        <div className="flex shrink-0 items-center gap-2">
+          <Button variant="outline" size="sm" onClick={handleChat}>
+            <MessageSquare className="mr-1.5 h-3 w-3" />
+            Chat
+          </Button>
+          <DropdownMenu>
+            <DropdownMenuTrigger asChild>
+              <Button
+                variant="ghost"
+                size="icon"
+                aria-label={`More actions for ${displayName(agent)}`}
+                className="h-8 w-8"
+              >
+                <MoreHorizontal className="h-4 w-4" />
+              </Button>
+            </DropdownMenuTrigger>
+            <DropdownMenuContent align="end" className="w-44">
+              <DropdownMenuItem onSelect={() => void handleCopyId()}>
+                <Copy className="mr-2 h-3.5 w-3.5" />
+                Copy id
+              </DropdownMenuItem>
+              <RenameMenuItem disabled={!allowRename} />
+              <ResetHistoryMenuItem />
+              <DropdownMenuSeparator />
+              <DropdownMenuItem
+                onSelect={() => onDelete(agent)}
+                disabled={!allowDelete || deleting}
+                className="text-destructive focus:text-destructive"
+              >
+                {deleting ? (
+                  <Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
+                ) : (
+                  <Trash2 className="mr-2 h-3.5 w-3.5" />
+                )}
+                Delete
+              </DropdownMenuItem>
+            </DropdownMenuContent>
+          </DropdownMenu>
+        </div>
      </div>
    </div>
  )
 }
+
+const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <Pencil className="mr-2 h-3.5 w-3.5" />
+      Rename
+    </DropdownMenuItem>
+  )
+  if (!disabled) return item
+  // Disabled but with a hint so users know it's coming, not broken.
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Rename coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+const ResetHistoryMenuItem: FC = () => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <RotateCcw className="mr-2 h-3.5 w-3.5" />
+      Reset history
+    </DropdownMenuItem>
+  )
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Reset history coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+function inferAdapterFromListItem(
+  agent: AgentListItem,
+): HarnessAgentAdapter | 'unknown' {
+  const label = agent.runtimeLabel?.toLowerCase()
+  if (label?.includes('claude')) return 'claude'
+  if (label?.includes('codex')) return 'codex'
+  if (label?.includes('openclaw')) return 'openclaw'
+  return 'unknown'
+}
+
+function livenessDetail(
+  status: AgentLiveness,
+  lastUsedAt: number | null | undefined,
+): string | undefined {
+  if (lastUsedAt == null) return undefined
+  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
+  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
+  if (status === 'asleep') {
+    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
+    const hr = Math.floor(diffMin / 60)
+    return `Asleep — quiet for ${hr} hr`
+  }
+  if (status === 'working') return 'Working on a turn'
+  if (status === 'error') return 'Attention — last turn failed'
+  return undefined
+}
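The inlined livenessDetail above reads Date.now() internally, so its output depends on the wall clock and is awkward to unit-test. The sketch below shows one way to make it deterministic by passing the clock in. It is not part of this commit. The name livenessDetailAt is hypothetical, and the AgentLiveness union is an assumed shape (the real one is exported from LivenessDot and may have more members). The branch logic and user-facing strings are copied from the function in the diff.

```typescript
// Assumed shape of the AgentLiveness union exported by LivenessDot;
// the real union may carry additional members (handled by `default`).
type AgentLiveness = 'working' | 'idle' | 'asleep' | 'error' | 'unknown'

// The row copy uses an em dash; the escape keeps the output identical.
const DASH = '\u2014'

// Same branches as livenessDetail in the diff, but "now" is a parameter
// instead of a hidden Date.now() call, so the result is deterministic.
function livenessDetailAt(
  status: AgentLiveness,
  lastUsedAt: number | null | undefined,
  now: number,
): string | undefined {
  if (lastUsedAt == null) return undefined
  const diffMin = Math.floor((now - lastUsedAt) / 60_000)
  switch (status) {
    case 'idle':
      // Clamp so a small clock skew never renders "Idle for -1 min".
      return `Idle for ${Math.max(0, diffMin)} min`
    case 'asleep':
      return diffMin < 60
        ? `Asleep ${DASH} quiet for ${diffMin} min`
        : `Asleep ${DASH} quiet for ${Math.floor(diffMin / 60)} hr`
    case 'working':
      return 'Working on a turn'
    case 'error':
      return `Attention ${DASH} last turn failed`
    default:
      return undefined
  }
}

// Thin wrapper that keeps the two-argument call site in AgentRow unchanged.
function livenessDetail(
  status: AgentLiveness,
  lastUsedAt: number | null | undefined,
): string | undefined {
  return livenessDetailAt(status, lastUsedAt, Date.now())
}
```

With this split, AgentRow can keep calling livenessDetail(status, lastUsedAt), and tests can pin the clock to a fixed epoch value.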
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
@@ -44,7 +44,6 @@ import {
  useCreateHarnessAgent,
  useDeleteHarnessAgent,
  useHarnessAgents,
-  useUpdateHarnessAgent,
 } from './useAgents'
 import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'

@@ -77,7 +76,6 @@ export const AgentsPage: FC = () => {
  } = useOpenClawAgents(openClawAgentsEnabled)
  const createHarnessAgent = useCreateHarnessAgent()
  const deleteHarnessAgent = useDeleteHarnessAgent()
-  const updateHarnessAgent = useUpdateHarnessAgent()
  const {
    setupOpenClaw,
    createAgent: createOpenClawAgent,
@@ -344,24 +342,12 @@ export const AgentsPage: FC = () => {
          agents={agentListItems}
          activity={agentActivity}
          harnessAgentLookup={harnessAgentLookup}
-          adapters={adapters}
          loading={agentsLoading}
          deletingAgentKey={deletingAgent ? deletingAgentKey : null}
          onCreateAgent={() => setCreateOpen(true)}
          onDeleteAgent={(agent) => {
            void handleDelete(agent)
          }}
-          onPinToggle={(agent, next) => {
-            // Optimistic mutation; harness-only — gateway-original
-            // OpenClaw entries are gated server-side via the harness
-            // backfill, so we only fire when the row maps to a
-            // harness agent record.
-            if (!harnessAgentLookup.has(agent.agentId)) return
-            updateHarnessAgent.mutate({
-              agentId: agent.agentId,
-              patch: { pinned: next },
-            })
-          }}
        />

        <SetupOpenClawDialog
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
@@ -1,5 +1,4 @@
 import type { AgentListItem } from './agents-page-types'
-import type { AgentLiveness } from './LivenessDot'

 /**
 * Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -83,25 +82,3 @@ export function formatRelativeTime(epochMs: number | null): string {
  const d = Math.floor(diff / ONE_DAY)
  return d === 1 ? '1 day ago' : `${d} days ago`
 }
-
-/**
- * Tooltip-friendly description of a row's current liveness state.
- * Returns `undefined` when the state has nothing extra to add (e.g.
- * `unknown` with no timestamp).
- */
-export function livenessDetail(
-  status: AgentLiveness,
-  lastUsedAt: number | null | undefined,
-): string | undefined {
-  if (lastUsedAt == null) return undefined
-  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
-  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
-  if (status === 'asleep') {
-    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
-    const hr = Math.floor(diffMin / 60)
-    return `Asleep — quiet for ${hr} hr`
-  }
-  if (status === 'working') return 'Working on a turn'
-  if (status === 'error') return 'Attention — last turn failed'
-  return undefined
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
@@ -56,43 +56,6 @@ export interface HarnessAgent {
   * agents. Drives the recency sort and the "Last used X min ago" copy.
   */
  lastUsedAt?: number | null
-  /** Pinned agents float to the top of the list. Defaults to `false`. */
-  pinned?: boolean
-  /** First non-blank line of the most recent user message; null if none. */
-  lastUserMessage?: string | null
-  /** Working directory the agent runs in; null when no session record yet. */
-  cwd?: string | null
-  /** Cumulative + 7-day rolling token usage; null when no record. */
-  tokens?: {
-    last7d: { input: number; output: number; requestCount: number }
-    cumulative: { input: number; output: number }
-  } | null
-  turnsByDay?: number[]
-  failedByDay?: number[]
-  lastError?: string | null
-  lastErrorAt?: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId?: string | null
-  /** Persistent FIFO queue of messages waiting for this agent. */
-  queue?: HarnessQueuedMessage[]
-}
-
-export interface HarnessQueuedMessageAttachment {
-  mediaType: string
-  data: string
-}
-
-export interface HarnessQueuedMessage {
-  id: string
-  createdAt: number
-  message: string
-  attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
-}
-
-export interface HarnessAdapterHealth {
-  healthy: boolean
-  reason?: string
-  checkedAt: number
 }

 export interface HarnessAdapterDescriptor {
@@ -103,7 +66,6 @@ export interface HarnessAdapterDescriptor {
  modelControl: 'runtime-supported' | 'best-effort'
  models: Array<{ id: string; label: string; recommended?: boolean }>
  reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
-  health?: HarnessAdapterHealth
 }

 export interface CreateHarnessAgentInput {
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
@@ -1,160 +0,0 @@
-import {
-  Copy,
-  Loader2,
-  MessageSquare,
-  MoreHorizontal,
-  Pencil,
-  RotateCcw,
-  Trash2,
-} from 'lucide-react'
-import type { FC } from 'react'
-import { useNavigate } from 'react-router'
-import { toast } from 'sonner'
-import { Button } from '@/components/ui/button'
-import {
-  DropdownMenu,
-  DropdownMenuContent,
-  DropdownMenuItem,
-  DropdownMenuSeparator,
-  DropdownMenuTrigger,
-} from '@/components/ui/dropdown-menu'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import {
-  canDelete as canDeleteAgent,
-  canRename as canRenameAgent,
-  displayName,
-} from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-
-interface AgentActionsProps {
-  agent: AgentListItem
-  activeTurnId: string | null
-  deleting?: boolean
-  onDelete: (agent: AgentListItem) => void
-}
-
-/**
- * Single primary CTA per row: `Resume` (filled, accent-orange, with a
- * pulsing dot) when an active turn exists; otherwise `Chat` (outline).
- * Both navigate to the same place — the chat hook auto-attaches via
- * `/chat/active` when there's a live turn — but the row signals which
- * action the user is actually taking.
- */
-export const AgentActions: FC<AgentActionsProps> = ({
-  agent,
-  activeTurnId,
-  deleting,
-  onDelete,
-}) => {
-  const navigate = useNavigate()
-  const allowDelete = canDeleteAgent(agent)
-  const allowRename = canRenameAgent(agent)
-
-  const handleChat = () => navigate(`/agents/${agent.agentId}`)
-  const handleCopyId = async () => {
-    try {
-      await navigator.clipboard.writeText(agent.agentId)
-      toast.success('Agent id copied')
-    } catch {
-      toast.error('Could not copy agent id')
-    }
-  }
-
-  return (
-    <div className="flex shrink-0 items-center gap-1.5">
-      {activeTurnId ? (
-        <Button
-          variant="default"
-          size="sm"
-          onClick={handleChat}
-          className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
-        >
-          <span className="relative flex size-2">
-            <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-            <span className="relative inline-flex size-2 rounded-full bg-white" />
-          </span>
-          Resume
-        </Button>
-      ) : (
-        <Button variant="outline" size="sm" onClick={handleChat}>
-          <MessageSquare className="mr-1.5 size-3" />
-          Chat
-        </Button>
-      )}
-      <DropdownMenu>
-        <DropdownMenuTrigger asChild>
-          <Button
-            variant="ghost"
-            size="icon"
-            aria-label={`More actions for ${displayName(agent)}`}
-            className="size-8 text-muted-foreground hover:text-foreground"
-          >
-            <MoreHorizontal className="size-4" />
-          </Button>
-        </DropdownMenuTrigger>
-        <DropdownMenuContent align="end" className="w-44">
-          <DropdownMenuItem onSelect={() => void handleCopyId()}>
-            <Copy className="mr-2 size-3.5" />
-            Copy id
-          </DropdownMenuItem>
-          <ComingSoonItem
-            icon={Pencil}
-            label="Rename"
-            disabled={!allowRename}
-          />
-          <ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
-          <DropdownMenuSeparator />
-          <DropdownMenuItem
-            onSelect={() => onDelete(agent)}
-            disabled={!allowDelete || deleting}
-            className="text-destructive focus:text-destructive"
-          >
-            {deleting ? (
-              <Loader2 className="mr-2 size-3.5 animate-spin" />
-            ) : (
-              <Trash2 className="mr-2 size-3.5" />
-            )}
-            Delete
-          </DropdownMenuItem>
-        </DropdownMenuContent>
-      </DropdownMenu>
-    </div>
-  )
-}
-
-interface ComingSoonItemProps {
-  icon: typeof Pencil
-  label: string
-  disabled: boolean
-}
-
-const ComingSoonItem: FC<ComingSoonItemProps> = ({
-  icon: Icon,
-  label,
-  disabled,
-}) => {
-  const item = (
-    <DropdownMenuItem disabled className="text-muted-foreground">
-      <Icon className="mr-2 size-3.5" />
-      {label}
-    </DropdownMenuItem>
-  )
-  if (!disabled) return item
-  return (
-    <TooltipProvider delayDuration={300}>
-      <Tooltip>
-        <TooltipTrigger asChild>
-          <span className="block w-full">{item}</span>
-        </TooltipTrigger>
-        <TooltipContent side="left" className="text-xs">
-          {label} coming soon
-        </TooltipContent>
-      </Tooltip>
-    </TooltipProvider>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
@@ -1,96 +0,0 @@
-import { AlertTriangle, ChevronDown } from 'lucide-react'
-import { type FC, useEffect, useState } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Collapsible,
-  CollapsibleContent,
-  CollapsibleTrigger,
-} from '@/components/ui/collapsible'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { truncate } from './agent-row.helpers'
-
-interface AgentErrorPanelProps {
-  agentId: string
-  message: string
-  errorAt: number | null
-}
-
-const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
-const PREVIEW_CHARS = 200
-
-export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
-  agentId,
-  message,
-  errorAt,
-}) => {
-  const storageKey = `${STORAGE_PREFIX}${agentId}`
-  // Open if we've never seen this `errorAt` for this agent. Once the
-  // user collapses the panel (or refreshes after seeing it), we mark
-  // it seen so it doesn't re-pop on every poll.
-  const [open, setOpen] = useState<boolean>(() => {
-    if (typeof window === 'undefined' || !errorAt) return true
-    const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
-    return !Number.isFinite(seen) || errorAt > seen
-  })
-
-  useEffect(() => {
-    if (!open && errorAt && typeof window !== 'undefined') {
-      window.localStorage.setItem(storageKey, String(errorAt))
-    }
-  }, [open, errorAt, storageKey])
-
-  const preview = truncate(message, PREVIEW_CHARS)
-  const truncated = preview.length < message.length
-
-  return (
-    <Collapsible open={open} onOpenChange={setOpen} className="mt-3">
-      <div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
-        <div className="flex items-center gap-2 font-medium text-destructive text-xs">
-          <AlertTriangle className="size-3.5" />
-          Last error
-        </div>
-        <CollapsibleTrigger asChild>
-          <Button
-            variant="ghost"
-            size="sm"
-            className="h-6 px-2 text-muted-foreground"
-          >
-            <span className="text-xs">{open ? 'hide' : 'show'}</span>
-            <ChevronDown
-              className={cn(
-                'ml-1 size-3 transition-transform',
-                open && 'rotate-180',
-              )}
-            />
-          </Button>
-        </CollapsibleTrigger>
-      </div>
-      <CollapsibleContent>
-        <div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
-          {truncated ? (
-            <HoverCard openDelay={300}>
-              <HoverCardTrigger asChild>
-                <span className="cursor-default font-mono text-foreground/80">
-                  {preview}…
-                </span>
-              </HoverCardTrigger>
-              <HoverCardContent
-                side="bottom"
-                className="max-w-md whitespace-pre-wrap font-mono text-xs"
-              >
-                {message}
-              </HoverCardContent>
-            </HoverCard>
-          ) : (
-            <span className="font-mono text-foreground/80">{message}</span>
-          )}
-        </div>
-      </CollapsibleContent>
-    </Collapsible>
-  )
-}
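The removed AgentErrorPanel decided whether to auto-expand by comparing the error's timestamp with a per-agent "last seen" value in localStorage. The following is a minimal sketch of that rule with React and window removed, so it can be checked in isolation. The names KeyValueStore, shouldAutoOpen, markSeen and MemoryStore are hypothetical. The storage key prefix and the comparison are copied from the deleted useState initializer and useEffect.

```typescript
// The subset of the Web Storage API the panel touches; window.localStorage
// satisfies it in the browser, and an in-memory map satisfies it in tests.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'

// Open when storage is unavailable, the error has no timestamp, the stored
// value is unparseable (NaN), or this errorAt is newer than the last one seen.
function shouldAutoOpen(
  store: KeyValueStore | undefined,
  agentId: string,
  errorAt: number | null,
): boolean {
  if (!store || !errorAt) return true
  const seen = Number(store.getItem(`${STORAGE_PREFIX}${agentId}`) ?? 0)
  return !Number.isFinite(seen) || errorAt > seen
}

// Called when the user collapses the panel, mirroring the original effect.
function markSeen(
  store: KeyValueStore | undefined,
  agentId: string,
  errorAt: number | null,
): void {
  if (store && errorAt) {
    store.setItem(`${STORAGE_PREFIX}${agentId}`, String(errorAt))
  }
}

// In-memory stand-in for localStorage.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>()
  getItem(key: string): string | null {
    return this.data.get(key) ?? null
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value)
  }
}
```

Keying the "seen" marker on errorAt rather than on the message text means a repeated failure with identical text still re-opens the panel, because it carries a newer timestamp.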
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
@@ -1,35 +0,0 @@
-import { Quote } from 'lucide-react'
-import type { FC } from 'react'
-import { firstNonBlankLine, truncate } from './agent-row.helpers'
-
-interface AgentLastMessageProps {
-  message: string | null
-}
-
-const PREVIEW_CHARS = 110
-
-/**
- * Inline preview of the most recent user message. Renders as a quoted,
- * italic line so the row reads like a conversation snippet rather than
- * a label-and-value pair. No hover-card — opening the agent's chat is
- * the canonical way to read the full message.
- */
-export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
-  return (
-    <p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="truncate">{preview}</span>
-    </p>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { formatRelativeTime } from '../agent-display.helpers'
-import { AgentTokenSummary } from './AgentTokenSummary'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentMetaRowProps {
-  lastUsedAt: number | null
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Bottom-of-row meta line. Intentionally sparse — last activity time
- * and lifetime tokens. CWD is no longer surfaced here because the path
- * the server happens to be running from isn't actionable; if a future
- * surface needs the cwd (chat panel, debug view) it reads from the
- * listing payload directly.
- */
-export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
-  const lastUsedLabel = formatRelativeTime(lastUsedAt)
-  const tokensTotal =
-    (tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
-  const showTokens = tokensTotal > 0
-
-  return (
-    <div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
-      <span>{lastUsedLabel}</span>
-      {showTokens && (
-        <>
-          <span aria-hidden className="text-muted-foreground/50">
-            ·
-          </span>
-          <AgentTokenSummary tokens={tokens} />
-        </>
-      )}
-    </div>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
@@ -1,92 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
-
-interface AgentSparklineProps {
-  /** 14 entries, oldest → newest. Today's bucket is the last index. */
-  turnsByDay: number[]
-  /** Same length, same order. Failed turns counted separately. */
-  failedByDay: number[]
-  className?: string
-}
-
-const MIN_BAR_HEIGHT_PX = 2
-const MAX_BAR_HEIGHT_PX = 18
-
-export const AgentSparkline: FC<AgentSparklineProps> = ({
-  turnsByDay,
-  failedByDay,
-  className,
-}) => {
-  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
-  const max = Math.max(1, ...turnsByDay)
-
-  return (
-    <HoverCard openDelay={250}>
-      <HoverCardTrigger asChild>
-        <div
-          role="img"
-          aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
-          className={cn('flex h-5 items-end gap-px', className)}
-        >
-          {turnsByDay.map((count, idx) => {
-            const ratio = count / max
-            const height = Math.max(
-              MIN_BAR_HEIGHT_PX,
-              Math.round(ratio * MAX_BAR_HEIGHT_PX),
-            )
-            const isToday = idx === ROW_BAR_COUNT - 1
-            const failed = failedByDay[idx] ?? 0
-            return (
-              <div
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
-                key={`bar-${idx}`}
-                className={cn(
-                  'w-1.5 rounded-sm',
-                  count === 0
-                    ? 'bg-muted-foreground/15'
-                    : failed > 0
-                      ? 'bg-destructive/50'
-                      : 'bg-[var(--accent-orange)]/50',
-                  isToday && 'ring-1 ring-foreground/30',
-                )}
-                style={{ height }}
-              />
-            )
-          })}
-        </div>
-      </HoverCardTrigger>
-      <HoverCardContent side="left" className="w-56 text-xs">
-        <div className="mb-2 font-medium text-sm">Last 14 days</div>
-        <ul className="space-y-0.5">
-          {turnsByDay.map((count, idx) => {
-            const failed = failedByDay[idx] ?? 0
-            const dayLabel = formatLocalDate(idx)
-            return (
-              <li
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
-                key={`day-${idx}`}
-                className="flex items-center justify-between text-muted-foreground"
-              >
-                <span>{dayLabel}</span>
-                <span>
-                  {count}
-                  {failed > 0 && (
-                    <span className="ml-1 text-destructive">
-                      ({failed} failed)
-                    </span>
-                  )}
-                </span>
-              </li>
-            )
-          })}
-        </ul>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
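The deleted AgentSparkline computed each bar's height and colour inline in JSX. The sketch below lifts those two rules into a pure function, assuming the same 2 px minimum and 18 px maximum constants. sparklineBars, Bar and BarTone are hypothetical names. The tone values stand in for the three Tailwind classes the component chose between.

```typescript
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18

// Stand-ins for the muted / destructive / accent-orange bar classes.
type BarTone = 'empty' | 'failed' | 'ok'

interface Bar {
  heightPx: number
  tone: BarTone
}

// Returns null when there is nothing to draw, matching the component's
// early return for an empty or all-zero series.
function sparklineBars(
  turnsByDay: readonly number[],
  failedByDay: readonly number[],
): Bar[] | null {
  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
  // Floor the divisor at 1 (defensive; the all-zero case already returned).
  const max = Math.max(1, ...turnsByDay)
  return turnsByDay.map((count, idx): Bar => {
    const failed = failedByDay[idx] ?? 0
    return {
      // Scale linearly to the busiest day, but never below the minimum so
      // zero-activity days still render as a visible stub.
      heightPx: Math.max(
        MIN_BAR_HEIGHT_PX,
        Math.round((count / max) * MAX_BAR_HEIGHT_PX),
      ),
      // Any failure on an active day colours the whole bar as failed.
      tone: count === 0 ? 'empty' : failed > 0 ? 'failed' : 'ok',
    }
  })
}
```

For example, a series of [0, 1, 2, 4] turns maps to heights of 2, 5, 9 and 18 px. The tallest day always fills the full 18 px regardless of its absolute count.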
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
@@ -1,71 +0,0 @@
-import { TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { adapterLabel } from '../AdapterIcon'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentAdapterHealth } from './agent-row.types'
-
-interface AgentSummaryChipsProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  /** When unhealthy, the adapter label dims and a warning chip appears. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-/**
- * Adapter / model / reasoning summary line. Always rendered (so OpenClaw
- * rows that fall back to defaults still expose what they're set up to do)
- * and surfaces adapter-health *only when unhealthy* — keeping the calm
- * default state silent and reserving visual noise for things the user
- * needs to act on.
- */
-export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
-  adapter,
-  modelLabel,
-  reasoningEffort,
-  adapterHealth,
-}) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelLabel) parts.push(modelLabel)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && adapterHealth && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
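This commit also changes how the model line is summarised. The removed AgentSummaryChips joined every non-empty value with a middle dot. The new AgentRow renders separate badges, hiding the model when it is 'default' and the reasoning effort when it is 'medium', and it no longer surfaces adapter health. The sketch below contrasts the two rules side by side. ModelSummary, legacySummaryLine and visibleBadges are hypothetical names, and the adapter label is taken as a plain string rather than calling the real adapterLabel helper.

```typescript
interface ModelSummary {
  adapterLabel: string
  modelLabel: string | null | undefined
  reasoningEffort: string | null | undefined
}

// Removed AgentSummaryChips: every non-empty value, joined with " · ".
function legacySummaryLine(s: ModelSummary): string {
  const parts = [s.adapterLabel]
  if (s.modelLabel) parts.push(s.modelLabel)
  if (s.reasoningEffort) parts.push(s.reasoningEffort)
  return parts.join(' · ')
}

// New AgentRow badges: the adapter is always shown; the model is hidden
// when it is 'default' and the effort is hidden when it is 'medium'.
function visibleBadges(s: ModelSummary): string[] {
  const badges = [s.adapterLabel]
  if (s.modelLabel && s.modelLabel !== 'default') badges.push(s.modelLabel)
  if (s.reasoningEffort && s.reasoningEffort !== 'medium') {
    badges.push(s.reasoningEffort)
  }
  return badges
}
```

Under the new rule, an agent on default settings shows only its adapter badge, so any extra badge signals a deliberate configuration change.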
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { cn } from '@/lib/utils'
-import { AdapterIcon } from '../AdapterIcon'
-import { livenessDetail } from '../agent-display.helpers'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import { type AgentLiveness, LivenessDot } from '../LivenessDot'
-
-export interface AgentTileProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  status: AgentLiveness
-  lastUsedAt: number | null
-}
-
-/**
- * Adapter glyph + a single liveness dot. Adapter health is no longer
- * surfaced here — it lives as an inline pill inside `AgentSummaryChips`
- * so the user isn't asked to disambiguate two dots on the same tile.
- */
-export const AgentTile: FC<AgentTileProps> = ({
-  adapter,
-  status,
-  lastUsedAt,
-}) => (
-  <div className="relative shrink-0">
-    <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
-      <AdapterIcon adapter={adapter} className="h-6 w-6" />
-    </div>
-    <LivenessDot
-      status={status}
-      detail={livenessDetail(status, lastUsedAt)}
-      className={cn(
-        'absolute -right-0.5 -bottom-0.5',
-        status === 'working' && 'animate-pulse',
-      )}
-    />
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
@@ -1,55 +0,0 @@
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import { displayName } from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-import { AgentSparkline } from './AgentSparkline'
-import { PinToggle } from './PinToggle'
-
-interface AgentTitleRowProps {
-  agent: AgentListItem
-  status: AgentLiveness
-  pinned: boolean
-  turnsByDay: number[]
-  failedByDay: number[]
-  onPinToggle: (next: boolean) => void
-}
-
-/**
- * Title strip: name + status badge + (right-aligned) sparkline. The
- * pin toggle sits trailing the title so the title always flushes left
- * regardless of pin state — moving the star left of the title indents
- * the row's first line off-axis from the model/preview/meta lines
- * below it. When unpinned and not hovered, the toggle is removed from
- * layout entirely so it reserves no space at all.
- */
-export const AgentTitleRow: FC<AgentTitleRowProps> = ({
-  agent,
-  status,
-  pinned,
-  turnsByDay,
-  failedByDay,
-  onPinToggle,
-}) => (
-  <div className="mb-1 flex items-center gap-2">
-    <span className="truncate font-semibold">{displayName(agent)}</span>
-    {status === 'working' && (
-      <Badge
-        variant="secondary"
-        className="bg-amber-50 text-amber-900 hover:bg-amber-50"
-      >
-        Working
-      </Badge>
-    )}
-    {status === 'asleep' && (
-      <Badge variant="outline" className="text-muted-foreground">
-        Asleep
-      </Badge>
-    )}
-    {status === 'error' && <Badge variant="destructive">Attention</Badge>}
-    <PinToggle pinned={pinned} onToggle={onPinToggle} />
-    <div className="ml-auto">
-      <AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
-    </div>
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
@@ -1,63 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { Progress } from '@/components/ui/progress'
-import { formatTokens } from './agent-row.helpers'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentTokenSummaryProps {
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
- * (the only window we can compute reliably from the session record).
- * Per-window stats land in a follow-up once the activity ledger ships.
- */
-export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
-  if (!tokens) return null
-  const { input, output } = tokens.cumulative
-  const total = input + output
-  if (total === 0) return null
-  const inputPct = (input / total) * 100
-
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
-          {formatTokens(total)} tokens
-        </span>
-      </HoverCardTrigger>
-      <HoverCardContent side="top" align="end" className="w-72 text-sm">
-        <div className="mb-3 flex items-center justify-between">
-          <span className="font-medium">Lifetime tokens</span>
-          <span className="text-muted-foreground text-xs tabular-nums">
-            {formatTokens(total)} total
-          </span>
-        </div>
-
-        <div className="space-y-2">
-          <div className="flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Input</span>
-            <span className="tabular-nums">{formatTokens(input)}</span>
-          </div>
-          <Progress value={inputPct} className="h-1.5" />
-
-          <div className="mt-2 flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Output</span>
-            <span className="tabular-nums">{formatTokens(output)}</span>
-          </div>
-          <Progress value={100 - inputPct} className="h-1.5" />
-        </div>
-
-        <p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
-          Cumulative across every turn this agent has run. Per-window stats
-          arrive in a future release.
-        </p>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
@@ -1,60 +0,0 @@
-import { Star } from 'lucide-react'
-import type { FC } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import { cn } from '@/lib/utils'
-
-interface PinToggleProps {
-  pinned: boolean
-  onToggle: (next: boolean) => void
-}
-
-/**
- * Trailing star toggle. The button is *always rendered* — only its
- * opacity changes between pinned/unpinned/hover states — so the title
- * row's height is constant. Hiding the slot via `display: none` would
- * collapse the row's vertical metrics on hover and shift every card
- * below in the rail.
- *
- * Placement is trailing the title (after the status badge) so the
- * title itself flushes left regardless of pin state — leading the
- * row with the star would indent the title relative to the model /
- * preview / meta lines beneath it.
- */
-export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
-  <TooltipProvider delayDuration={300}>
-    <Tooltip>
-      <TooltipTrigger asChild>
-        <Button
-          variant="ghost"
-          size="icon"
-          className={cn(
-            'size-6 text-muted-foreground transition-opacity hover:text-foreground',
-            pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
-          )}
-          aria-pressed={pinned}
-          aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
-          onClick={(event) => {
-            event.stopPropagation()
-            onToggle(!pinned)
-          }}
-        >
-          <Star
-            className={cn(
-              'size-3.5',
-              pinned && 'fill-amber-400 text-amber-500',
-            )}
-          />
-        </Button>
-      </TooltipTrigger>
-      <TooltipContent side="top" className="text-xs">
-        {pinned ? 'Unpin' : 'Pin to top'}
-      </TooltipContent>
-    </Tooltip>
-  </TooltipProvider>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
@@ -1,73 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  firstNonBlankLine,
-  formatLocalDate,
-  formatTokens,
-  ROW_BAR_COUNT,
-  truncate,
-} from './agent-row.helpers'
-
-describe('formatTokens', () => {
-  it('renders zero / NaN as "0"', () => {
-    expect(formatTokens(0)).toBe('0')
-    expect(formatTokens(Number.NaN)).toBe('0')
-  })
-
-  it('renders sub-1K as integer', () => {
-    expect(formatTokens(142)).toBe('142')
-  })
-
-  it('renders K with one decimal under 10', () => {
-    expect(formatTokens(8_400)).toBe('8.4K')
-  })
-
-  it('drops the decimal at >=10K', () => {
-    expect(formatTokens(120_000)).toBe('120K')
-  })
-
-  it('renders M with one decimal under 10', () => {
-    expect(formatTokens(1_200_000)).toBe('1.2M')
-  })
-})
-
-describe('firstNonBlankLine', () => {
-  it('returns the first non-blank line', () => {
-    expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
-  })
-
-  it('skips USER_QUERY envelope tags', () => {
-    expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
-      'fix tests',
-    )
-  })
-
-  it('falls back to the trimmed input when nothing matches', () => {
-    expect(firstNonBlankLine('   single   ')).toBe('single')
-  })
-})
-
-describe('truncate', () => {
-  it('returns input unchanged when within limit', () => {
-    expect(truncate('hello', 10)).toBe('hello')
-  })
-
-  it('appends an ellipsis when over limit', () => {
-    expect(truncate('hello world', 6)).toBe('hello…')
-  })
-})
-
-describe('formatLocalDate', () => {
-  const today = new Date('2026-04-30T12:00:00Z')
-
-  it('labels today and yesterday explicitly', () => {
-    expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
-    expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
-  })
-
-  it('returns a "Mon D" format for older days', () => {
-    const label = formatLocalDate(0, today)
-    // "Apr 17" or "Apr 17," depending on locale; just assert it
-    // contains a month abbreviation and a day number.
-    expect(label).toMatch(/[A-Za-z]+ \d+/)
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
@@ -1,64 +0,0 @@
-/**
- * Pure formatters consumed by row sub-components. Kept distinct from
- * `agent-display.helpers.ts` (page-level helpers) so the row internals
- * have an obvious single home.
- */
-
-const TOKEN_THRESHOLDS: Array<[number, string]> = [
-  [1_000_000, 'M'],
-  [1_000, 'K'],
-]
-
-/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
-export function formatTokens(n: number): string {
-  if (!Number.isFinite(n) || n <= 0) return '0'
-  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
-    if (n >= threshold) {
-      const value = n / threshold
-      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
-      return `${decimal}${suffix}`
-    }
-  }
-  return String(Math.round(n))
-}
-
-const USER_QUERY_OPEN = /^<USER_QUERY>$/i
-const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
-
-/**
- * First non-blank line, with the BrowserOS user-system-prompt
- * `<USER_QUERY>` envelope tags stripped so previews don't show
- * structural noise.
- */
-export function firstNonBlankLine(text: string): string {
-  const lines = text.split('\n').map((line) => line.trim())
-  for (const line of lines) {
-    if (!line) continue
-    if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
-    return line
-  }
-  return text.trim()
-}
-
-export function truncate(text: string, max: number): string {
-  if (text.length <= max) return text
-  return `${text.slice(0, max - 1).trimEnd()}…`
-}
-
-const SPARKLINE_DAYS = 14
-
-/**
- * "today" / "yesterday" / "Apr 17" — given an index 0..13 from
- * oldest → newest. `today` defaults to `new Date()` so callers don't
- * have to thread a clock through.
- */
-export function formatLocalDate(idx: number, today: Date = new Date()): string {
-  if (idx === SPARKLINE_DAYS - 1) return 'today'
-  if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
-  const offset = SPARKLINE_DAYS - 1 - idx
-  const date = new Date(today)
-  date.setDate(date.getDate() - offset)
-  return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
-}
-
-export const ROW_BAR_COUNT = SPARKLINE_DAYS
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
@@ -1,51 +0,0 @@
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-
-/**
- * Window-bounded token usage. Server returns `null` when no session
- * record exists yet for the agent.
- */
-export interface AgentTokenUsage {
-  last7d: { input: number; output: number; requestCount: number }
-  cumulative: { input: number; output: number }
-}
-
-export interface AgentAdapterHealth {
-  healthy: boolean
-  reason?: string
-}
-
-/**
- * Everything an `AgentRowCard` needs to render. Mirrors the shape
- * `useHarnessAgents` exposes; the page assembles one entry per row in
- * `AgentList` and passes it down. Sub-components only see slices of
- * this object — no prop drilling beyond two levels.
- */
-export interface AgentRowData {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  status: AgentLiveness
-  lastUsedAt: number | null
-  pinned: boolean
-  cwd: string | null
-  lastUserMessage: string | null
-  tokens: AgentTokenUsage | null
-  /** 14 entries, oldest → newest. Today is the last index. */
-  turnsByDay: number[]
-  /** Same length and ordering as `turnsByDay`. */
-  failedByDay: number[]
-  lastError: string | null
-  lastErrorAt: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId: string | null
-  /** Adapter-level health, shared across rows for the same adapter. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-export interface AgentRowCallbacks {
-  onDelete: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
@@ -8,7 +8,6 @@ import {
  type HarnessAdapterDescriptor,
  type HarnessAgent,
  type HarnessAgentHistoryPage,
-  type HarnessQueuedMessage,
  mapHarnessAgentToEntry,
 } from './agent-harness-types'
 import type { OpenClawStatus } from './useOpenClaw'
@@ -136,63 +135,6 @@ export function useCreateHarnessAgent() {
  })
 }

-/**
- * Apply a partial update to a harness agent. Used by the pin-toggle
- * star and (eventually) the inline rename UI. Optimistically writes
- * the patch into the listing query cache so the row updates instantly,
- * then rolls back if the server rejects the change.
- */
-export function useUpdateHarnessAgent() {
-  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: {
-      agentId: string
-      patch: { name?: string; pinned?: boolean }
-    }) => {
-      if (!baseUrl || urlLoading) {
-        throw new Error('BrowserOS agent server URL is not ready')
-      }
-      const data = await agentsFetch<{ agent: HarnessAgent }>(
-        baseUrl,
-        `/${encodeURIComponent(input.agentId)}`,
-        {
-          method: 'PATCH',
-          headers: { 'Content-Type': 'application/json' },
-          body: JSON.stringify(input.patch),
-        },
-      )
-      return data.agent
-    },
-    onMutate: async ({ agentId, patch }) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === agentId ? { ...agent, ...patch } : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
 export function useDeleteHarnessAgent() {
  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
  const queryClient = useQueryClient()
@@ -264,8 +206,6 @@ export interface HarnessActiveTurnInfo {
  lastSeq: number
  startedAt: number
  endedAt?: number
-  /** User message that kicked off the turn; null when not captured. */
-  prompt: string | null
 }

 /**
@@ -320,145 +260,3 @@ export async function fetchHarnessAgentHistory(
    `/${encodeURIComponent(agentId)}/sessions/main/history`,
  )
 }
-
-export interface EnqueueMessageInput {
-  message: string
-  attachments?: ReadonlyArray<unknown>
-}
-
-export async function enqueueHarnessMessage(
-  agentId: string,
-  input: EnqueueMessageInput,
-): Promise<HarnessQueuedMessage> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
-    {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        message: input.message,
-        ...(input.attachments && input.attachments.length > 0
-          ? { attachments: input.attachments }
-          : {}),
-      }),
-    },
-  )
-  if (!response.ok) {
-    let message = `Request failed with status ${response.status}`
-    try {
-      const body = (await response.json()) as { error?: string }
-      if (body.error) message = body.error
-    } catch {}
-    throw new Error(message)
-  }
-  const body = (await response.json()) as { queued: HarnessQueuedMessage }
-  return body.queued
-}
-
-export async function removeHarnessQueuedMessage(
-  agentId: string,
-  messageId: string,
-): Promise<{ removed: boolean }> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
-      messageId,
-    )}`,
-    { method: 'DELETE' },
-  )
-  if (!response.ok) return { removed: false }
-  return (await response.json()) as { removed: boolean }
-}
-
-/**
- * Optimistic enqueue: writes the new queued message into the listing
- * cache immediately so the queue panel reflects the change without
- * waiting for the next poll. Rolls back if the server rejects.
- */
-export function useEnqueueHarnessMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
-      enqueueHarnessMessage(input.agentId, input),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      const optimistic: HarnessQueuedMessage = {
-        id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
-        createdAt: Date.now(),
-        message: input.message,
-      }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
-/**
- * Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
- */
-export function useRemoveHarnessQueuedMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string; messageId: string }) =>
-      removeHarnessQueuedMessage(input.agentId, input.messageId),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? {
-                ...agent,
-                queue: (agent.queue ?? []).filter(
-                  (entry) => entry.id !== input.messageId,
-                ),
-              }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
--- a/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
+++ b/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
@@ -59,3 +59,15 @@ export interface AgentConversation {
  createdAt: number
  updatedAt: number
 }
+
+export interface AgentCardData {
+  agentId: string
+  name: string
+  model?: string
+  status: 'idle' | 'working' | 'error'
+  lastMessage?: string
+  lastMessageTimestamp?: number
+  activitySummary?: string
+  currentTool?: string
+  costUsd?: number
+}
--- a/packages/browseros-agent/apps/agent/package.json
+++ b/packages/browseros-agent/apps/agent/package.json
@@ -9,7 +9,6 @@
    "build": "bun run codegen && wxt build",
    "build:dev": "bun --env-file=.env.development wxt build --mode development",
    "zip": "wxt zip",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
    "compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
    "lint": "bunx biome check",
    "typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
--- a/packages/browseros-agent/apps/cli/README.md
+++ b/packages/browseros-agent/apps/cli/README.md
@@ -38,8 +38,8 @@ browseros-cli install                # downloads BrowserOS for your platform
 # If BrowserOS is installed but not running
 browseros-cli launch                 # opens BrowserOS, waits for server

-# Configure the CLI with the Server URL from BrowserOS settings
-browseros-cli init http://127.0.0.1:9000/mcp
+# Configure the CLI (auto-discovers a running BrowserOS)
+browseros-cli init --auto            # detects server URL and saves config

 # Verify connection
 browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url>             # non-interactive — pass URL directly
 browseros-cli init                   # interactive — prompts for URL
 ```

-Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
+Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).

 ### CLI updates

@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
 | `--debug` | `BOS_DEBUG=1` | Debug output |
 | `--timeout, -t` | | Request timeout (default: 2m) |

-Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
+Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file

-If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
+If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.

 ## Testing

@@ -179,7 +179,7 @@ apps/cli/
 │   └── config.go       # Config file (~/.config/browseros-cli/config.yaml)
 ├── cmd/
 │   ├── root.go         # Root command, global flags
-│   ├── init.go         # Server URL configuration (URL arg or interactive)
+│   ├── init.go         # Server URL configuration (URL arg, --auto, interactive)
 │   ├── install.go      # install (download BrowserOS for current platform)
 │   ├── launch.go       # launch (find and start BrowserOS, wait for server)
 │   ├── open.go         # open (new_page / new_hidden_page)
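The README's precedence rule (`--server` flag, then `BROWSEROS_URL`, then `~/.browseros/server.json`, then the config file) reduces to "the first non-empty source wins". The following is a minimal Go sketch under that reading. `resolveServerURL` is a hypothetical helper, not the CLI's API. Trimming whitespace and a trailing slash is an assumed stand-in for the CLI's `normalizeServerURL`, whose exact behavior is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveServerURL is a hypothetical helper illustrating the documented
// precedence: --server flag > BROWSEROS_URL env > server.json > config file.
// Each argument is the raw value from that source ("" when absent).
func resolveServerURL(flag, env, discovered, saved string) string {
	for _, candidate := range []string{flag, env, discovered, saved} {
		// Assumed normalization: trim whitespace and any trailing slash.
		if url := strings.TrimSuffix(strings.TrimSpace(candidate), "/"); url != "" {
			return url
		}
	}
	// No source configured: the CLI would print setup instructions instead.
	return ""
}

func main() {
	// No flag or env var; server.json beats the (possibly stale) saved config.
	fmt.Println(resolveServerURL("", "", "http://127.0.0.1:9100/", "http://127.0.0.1:9000"))
}
```

Because `server.json` outranks the saved config, a stale port in `config.yaml` is masked whenever BrowserOS has written a fresh discovery file.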
--- a/packages/browseros-agent/apps/cli/cmd/init.go
+++ b/packages/browseros-agent/apps/cli/cmd/init.go
@@ -17,6 +17,8 @@ import (
 )

 func init() {
+	var autoDiscover bool
+
 	cmd := &cobra.Command{
 		Use:   "init [url]",
 		Short: "Configure the BrowserOS server connection",
@@ -32,8 +34,9 @@ You can provide the full URL or just the port number:
  browseros-cli init http://127.0.0.1:9000/mcp
  browseros-cli init 9000

-Modes:
+Three modes:
  browseros-cli init <url>    Non-interactive (full URL or port number)
+  browseros-cli init --auto   Auto-discover a running server (server.json, config, common ports)
  browseros-cli init          Interactive prompt`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.MaximumNArgs(1),
@@ -46,9 +49,22 @@ Modes:

 			switch {
 			case len(args) == 1:
+				// Non-interactive: URL provided as argument
 				input = args[0]

+			case autoDiscover:
+				// Auto-discover: server.json → config → probe common ports
+				discovered := probeRunningServer()
+				if discovered == "" {
+					output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
+						"  If not running:    browseros-cli launch\n"+
+						"  If not installed:  browseros-cli install", 1)
+				}
+				input = discovered
+				fmt.Printf("Auto-discovered server at %s\n", input)
+
 			default:
+				// Interactive: prompt for the server URL
 				fmt.Println()
 				bold.Println("BrowserOS CLI Setup")
 				fmt.Println()
@@ -79,14 +95,12 @@ Modes:
 				output.Errorf(1, "invalid URL: %s", input)
 			}

+			// Verify connectivity
 			fmt.Printf("Checking connection to %s ...\n", baseURL)
 			client := &http.Client{Timeout: 5 * time.Second}
 			resp, err := client.Get(baseURL + "/health")
 			if err != nil {
-				output.Errorf(1, "cannot connect to %s: %v\n\n"+
-					"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
-					"Then run: browseros-cli init <Server URL>\n"+
-					"Example:  browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
+				output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
 			}
 			resp.Body.Close()

@@ -107,5 +121,6 @@ Modes:
 		},
 	}

+	cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover the server URL (server.json, saved config, then common ports)")
 	rootCmd.AddCommand(cmd)
 }
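`init` now selects its input in a fixed order. A positional URL or port wins even when `--auto` is also passed. After that comes `--auto` discovery, and then the interactive prompt. The sketch below isolates that ordering as a pure function so it can be unit-tested. `resolveInitInput` and its `discover`/`prompt` callbacks are hypothetical stand-ins for `probeRunningServer` and the interactive prompt. Where the real command calls `output.Error` and exits, the sketch returns an error instead.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveInitInput is a hypothetical sketch of how `init` picks its input:
// a positional argument wins, then --auto discovery, then an interactive
// prompt. discover and prompt stand in for probeRunningServer and stdin.
func resolveInitInput(args []string, auto bool, discover, prompt func() string) (string, error) {
	switch {
	case len(args) == 1:
		// Non-interactive: full URL or bare port given on the command line.
		return args[0], nil
	case auto:
		if url := discover(); url != "" {
			return url, nil
		}
		return "", errors.New("auto-discovery failed: no running BrowserOS found")
	default:
		return prompt(), nil
	}
}

func main() {
	discover := func() string { return "http://127.0.0.1:9100" }
	prompt := func() string { return "9000" }
	input, err := resolveInitInput(nil, true, discover, prompt)
	fmt.Println(input, err)
}
```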
--- a/packages/browseros-agent/apps/cli/cmd/install.go
+++ b/packages/browseros-agent/apps/cli/cmd/install.go
@@ -28,7 +28,7 @@ Linux:   Downloads AppImage (or .deb with --deb flag)

 After installation:
  browseros-cli launch        # start BrowserOS
-  browseros-cli init <url>    # configure the CLI with the Server URL`,
+  browseros-cli init --auto   # configure the CLI`,
 		Annotations: map[string]string{"group": "Setup:"},
 		Args:        cobra.NoArgs,
 		Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
 			fmt.Println()
 			bold.Println("Next steps:")
 			dim.Println("  browseros-cli launch        # start BrowserOS")
-			dim.Println("  browseros-cli init <url>    # use the Server URL from BrowserOS settings")
+			dim.Println("  browseros-cli init --auto   # configure the CLI")
 		},
 	}

--- a/packages/browseros-agent/apps/cli/cmd/launch.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch.go
@@ -1,7 +1,6 @@
 package cmd

 import (
-	"encoding/json"
 	"fmt"
 	"net/http"
 	"os"
@@ -39,7 +38,6 @@ If BrowserOS is already running, reports the server URL.`,

 			if url := probeRunningServer(); url != "" {
 				green.Printf("BrowserOS is already running at %s\n", url)
-				dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
 				return
 			}

@@ -65,7 +63,7 @@ If BrowserOS is already running, reports the server URL.`,

 			green.Printf("BrowserOS is ready at %s\n", url)
 			fmt.Println()
-			dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
+			dim.Println("Next: browseros-cli init --auto")
 		},
 	}

@@ -77,77 +75,39 @@ If BrowserOS is already running, reports the server URL.`,
 // Server probing
 // ---------------------------------------------------------------------------

-var commonBrowserOSPorts = []int{9100, 9200, 9300}
-
-// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
+// probeRunningServer checks server.json, config, and common ports for a running server.
 func probeRunningServer() string {
 	client := &http.Client{Timeout: 2 * time.Second}
+	check := func(baseURL string) bool {
+		resp, err := client.Get(baseURL + "/health")
+		if err != nil {
+			return false
+		}
+		resp.Body.Close()
+		return resp.StatusCode == 200
+	}

-	if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
+	// 1. server.json — written by BrowserOS on startup with the actual port
+	if url := loadBrowserosServerURL(); url != "" && check(url) {
 		return url
 	}

-	if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
+	// 2. Saved config / env var
+	if url := defaultServerURL(); url != "" && check(url) {
 		return url
 	}

-	return probeCommonServerPorts(client)
-}
-
-func checkServerHealth(client *http.Client, baseURL string) bool {
-	resp, err := client.Get(baseURL + "/health")
-	if err != nil {
-		return false
-	}
-	resp.Body.Close()
-	return resp.StatusCode == 200
-}
-
-func probeCommonServerPorts(client *http.Client) string {
-	for _, port := range commonBrowserOSPorts {
+	// 3. Probe common BrowserOS ports as last resort
+	for _, port := range []int{9100, 9200, 9300} {
 		url := fmt.Sprintf("http://127.0.0.1:%d", port)
-		if checkServerHealth(client, url) {
+		if check(url) {
 			return url
 		}
 	}
+
 	return ""
 }

-type serverDiscoveryConfig struct {
-	ServerPort       int    `json:"server_port"`
-	URL              string `json:"url"`
-	ServerVersion    string `json:"server_version"`
-	BrowserOSVersion string `json:"browseros_version,omitempty"`
-	ChromiumVersion  string `json:"chromium_version,omitempty"`
-}
-
-// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
-//
-// Normal command resolution must not call this because it can override a URL the
-// user explicitly saved with `browseros-cli init <Server URL>`.
-func loadBrowserosServerURL() string {
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return ""
-	}
-
-	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
-	if err != nil {
-		return ""
-	}
-
-	var sc serverDiscoveryConfig
-	if err := json.Unmarshal(data, &sc); err != nil {
-		return ""
-	}
-
-	return normalizeServerURL(sc.URL)
-}
-
-func mcpEndpointURL(baseURL string) string {
-	return strings.TrimSuffix(baseURL, "/") + "/mcp"
-}
-
 // ---------------------------------------------------------------------------
 // Platform-native installation detection
 // ---------------------------------------------------------------------------
@@ -157,8 +117,7 @@ func mcpEndpointURL(baseURL string) string {
 // macOS:   `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
 // Linux:   checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
 // Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
-//
-//	and registry uninstall key (per-user Chromium install pattern)
+// and the registry uninstall key (per-user Chromium install pattern)
 func isBrowserOSInstalled() bool {
 	switch runtime.GOOS {
 	case "darwin":
@@ -312,11 +271,14 @@ func waitForServer(maxWait time.Duration) (string, bool) {

 	for time.Now().Before(deadline) {
 		// server.json is written by BrowserOS on startup with the actual port
-		if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
-			return url, true
-		}
-		if url := probeCommonServerPorts(client); url != "" {
-			return url, true
+		if url := loadBrowserosServerURL(); url != "" {
+			resp, err := client.Get(url + "/health")
+			if err == nil {
+				resp.Body.Close()
+				if resp.StatusCode == 200 {
+					return url, true
+				}
+			}
 		}
 		fmt.Print(".")
 		time.Sleep(1 * time.Second)
--- a/packages/browseros-agent/apps/cli/cmd/launch_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/launch_test.go
@@ -1,99 +0,0 @@
-package cmd
-
-import (
-	"fmt"
-	"net"
-	"net/http"
-	"net/http/httptest"
-	"net/url"
-	"os"
-	"path/filepath"
-	"strconv"
-	"testing"
-	"time"
-
-	"browseros-cli/config"
-)
-
-func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	discoveredServer := newHealthyServer(t)
-	configServer := newHealthyServer(t)
-
-	serverDir := filepath.Join(home, ".browseros")
-	if err := os.MkdirAll(serverDir, 0755); err != nil {
-		t.Fatalf("os.MkdirAll() error = %v", err)
-	}
-	data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
-	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
-		t.Fatalf("os.WriteFile() error = %v", err)
-	}
-	if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := probeRunningServer()
-	if got != normalizeServerURL(discoveredServer.URL) {
-		t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
-	}
-}
-
-func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-
-	server := newHealthyServer(t)
-	port := serverPort(t, server.URL)
-
-	originalPorts := commonBrowserOSPorts
-	commonBrowserOSPorts = []int{port}
-	t.Cleanup(func() {
-		commonBrowserOSPorts = originalPorts
-	})
-
-	got, ok := waitForServer(100 * time.Millisecond)
-	if !ok {
-		t.Fatal("waitForServer() ok = false, want true")
-	}
-	if got != normalizeServerURL(server.URL) {
-		t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
-	}
-}
-
-func newHealthyServer(t *testing.T) *httptest.Server {
-	t.Helper()
-
-	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		if r.URL.Path != "/health" {
-			http.NotFound(w, r)
-			return
-		}
-		w.WriteHeader(http.StatusOK)
-	}))
-	t.Cleanup(server.Close)
-	return server
-}
-
-func serverPort(t *testing.T, rawURL string) int {
-	t.Helper()
-
-	parsed, err := url.Parse(rawURL)
-	if err != nil {
-		t.Fatalf("url.Parse() error = %v", err)
-	}
-	_, portText, err := net.SplitHostPort(parsed.Host)
-	if err != nil {
-		t.Fatalf("net.SplitHostPort() error = %v", err)
-	}
-	port, err := strconv.Atoi(portText)
-	if err != nil {
-		t.Fatalf("strconv.Atoi() error = %v", err)
-	}
-	return port
-}
--- a/packages/browseros-agent/apps/cli/cmd/root.go
+++ b/packages/browseros-agent/apps/cli/cmd/root.go
@@ -2,8 +2,10 @@ package cmd

 import (
 	"context"
+	"encoding/json"
 	"fmt"
 	"os"
+	"path/filepath"
 	"strconv"
 	"strings"
 	"time"
@@ -287,15 +289,18 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
 	}
 }

-// defaultServerURL returns the implicit target from user-controlled settings only.
-//
-// BrowserOS writes a discovery file at runtime, but normal commands intentionally
-// ignore it so a saved URL is not silently overridden by another running server.
 func defaultServerURL() string {
+	// 1. Explicit env var always wins
 	if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
 		return env
 	}

+	// 2. Live discovery file from running BrowserOS (most current)
+	if url := loadBrowserosServerURL(); url != "" {
+		return url
+	}
+
+	// 3. Saved config (may be stale if port changed)
 	cfg, err := config.Load()
 	if err == nil {
 		if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -306,6 +311,33 @@ func defaultServerURL() string {
 	return ""
 }

+type serverDiscoveryConfig struct {
+	ServerPort       int    `json:"server_port"`
+	URL              string `json:"url"`
+	ServerVersion    string `json:"server_version"`
+	BrowserOSVersion string `json:"browseros_version,omitempty"`
+	ChromiumVersion  string `json:"chromium_version,omitempty"`
+}
+
+func loadBrowserosServerURL() string {
+	home, err := os.UserHomeDir()
+	if err != nil {
+		return ""
+	}
+
+	data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
+	if err != nil {
+		return ""
+	}
+
+	var sc serverDiscoveryConfig
+	if err := json.Unmarshal(data, &sc); err != nil {
+		return ""
+	}
+
+	return normalizeServerURL(sc.URL)
+}
+
 func normalizeServerURL(raw string) string {
 	normalized := strings.TrimSpace(raw)

@@ -337,10 +369,8 @@ func validateServerURL(raw string) (string, error) {

 	return "", fmt.Errorf(
 		"BrowserOS server URL is not configured.\n\n" +
-			"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
-			"  Save it with:       browseros-cli init <Server URL>\n" +
-			"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
-			"  If BrowserOS is closed:  browseros-cli launch\n" +
-			"  If not installed:        browseros-cli install",
+			"  If BrowserOS is running:  browseros-cli init --auto\n" +
+			"  If BrowserOS is closed:   browseros-cli launch\n" +
+			"  If not installed:         browseros-cli install",
 	)
 }
--- a/packages/browseros-agent/apps/cli/cmd/root_test.go
+++ b/packages/browseros-agent/apps/cli/cmd/root_test.go
@@ -1,13 +1,8 @@
 package cmd

 import (
-	"os"
-	"path/filepath"
-	"strings"
 	"testing"
 	"time"
-
-	"browseros-cli/config"
 )

 func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -105,76 +100,6 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
 	}
 }

-func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
-
-	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := defaultServerURL()
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
-		t.Fatalf("config.Save() error = %v", err)
-	}
-
-	got := defaultServerURL()
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
-	home := t.TempDir()
-	t.Setenv("HOME", home)
-	t.Setenv("USERPROFILE", home)
-	t.Setenv("XDG_CONFIG_HOME", t.TempDir())
-	t.Setenv("BROWSEROS_URL", "")
-
-	serverDir := filepath.Join(home, ".browseros")
-	if err := os.MkdirAll(serverDir, 0755); err != nil {
-		t.Fatalf("os.MkdirAll() error = %v", err)
-	}
-	data := []byte(`{"url":"http://127.0.0.1:9999"}`)
-	if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
-		t.Fatalf("os.WriteFile() error = %v", err)
-	}
-
-	if got := defaultServerURL(); got != "" {
-		t.Fatalf("defaultServerURL() = %q, want empty", got)
-	}
-}
-
-func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
-	got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
-	if got != "http://127.0.0.1:9115" {
-		t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
-	}
-}
-
-func TestValidateServerURLExplainsManualInit(t *testing.T) {
-	_, err := validateServerURL("")
-	if err == nil {
-		t.Fatal("validateServerURL() error = nil, want setup instructions")
-	}
-	msg := err.Error()
-	if !strings.Contains(msg, "browseros-cli init <Server URL>") {
-		t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
-	}
-	if strings.Contains(msg, "init --auto") {
-		t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
-	}
-}
-
 func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
 	done := make(chan struct{})
 	returned := make(chan struct{})
--- a/packages/browseros-agent/apps/cli/mcp/client.go
+++ b/packages/browseros-agent/apps/cli/mcp/client.go
@@ -44,7 +44,10 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {

 	session, err := sdkClient.Connect(ctx, transport, nil)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
+			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
+			"  If BrowserOS is not running:                  browseros-cli launch\n"+
+			"  If not installed:                             browseros-cli install", c.BaseURL, err)
 	}
 	return session, nil
 }
@@ -184,7 +187,10 @@ func (c *Client) Status() (map[string]any, error) {
 func (c *Client) restGET(path string) (map[string]any, error) {
 	resp, err := c.HTTPClient.Get(c.BaseURL + path)
 	if err != nil {
-		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
+		return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
+			"  If BrowserOS is running on a different port:  browseros-cli init --auto\n"+
+			"  If BrowserOS is not running:                  browseros-cli launch\n"+
+			"  If not installed:                             browseros-cli install", c.BaseURL, err)
 	}
 	defer resp.Body.Close()

@@ -199,14 +205,3 @@ func (c *Client) restGET(path string) (map[string]any, error) {
 	}
 	return data, nil
 }
-
-// connectionSetupInstructions explains how to recover from a stale or missing server URL.
-func connectionSetupInstructions() string {
-	return "\n\n" +
-		"  Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
-		"  Save it with:       browseros-cli init <Server URL>\n" +
-		"  Example:            browseros-cli init http://127.0.0.1:9000/mcp\n" +
-		"  Run once with:      browseros-cli --server <Server URL> health\n" +
-		"  If BrowserOS is closed:  browseros-cli launch\n" +
-		"  If not installed:        browseros-cli install"
-}
--- a/packages/browseros-agent/apps/cli/npm/README.md
+++ b/packages/browseros-agent/apps/cli/npm/README.md
@@ -31,8 +31,8 @@ browseros-cli install
 # Start BrowserOS
 browseros-cli launch

-# Configure MCP settings with the Server URL from BrowserOS settings
-browseros-cli init http://127.0.0.1:9000/mcp
+# Auto-configure MCP settings for your AI tools
+browseros-cli init --auto

 # Verify everything is working
 browseros-cli health
--- a/packages/browseros-agent/apps/eval/README.md
+++ b/packages/browseros-agent/apps/eval/README.md
@@ -9,7 +9,6 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
 - **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
 - **Bun** runtime
 - **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
-- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.

 ## Quick Start

@@ -68,7 +67,7 @@ This lets us run the same suite against multiple model setups without copying th

 ```txt
 agisdk-daily-10 + kimi-fireworks
-agisdk-daily-10 + claude-opus
+agisdk-daily-10 + claude-sonnet
 agisdk-daily-10 + clado-action-000159
 ```

@@ -80,7 +79,6 @@ For `orchestrator-executor` suites, there can also be an executor model/backend.
 |------|-------------|
 | `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
 | `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
-| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |

 ### Single agent

@@ -121,24 +119,6 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
 }
 ```

-### Claude Code
-
-Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
-
-```json
-{
-  "agent": {
-    "type": "claude-code",
-    "model": "opus"
-  }
-}
-```
-
-```bash
-BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
-bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
-```
-
 ## Graders

 | Name | Description |
@@ -171,7 +151,6 @@ The `apiKey` field supports two formats:
 | `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
 | `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
 | `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
-| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
 | `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
 | `NOPECHA_API_KEY` | CAPTCHA solver extension |
 | `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
@@ -202,8 +181,6 @@ export EVAL_R2_BUCKET=browseros-eval
 export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
 ```

-`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
-
 Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.

 ### BrowserOS infrastructure
@@ -215,7 +192,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
  "base_server_port": 9110,
  "base_extension_port": 9310,
  "load_extensions": false,
-  "headless": false
+  "headless": true
 }
 ```

@@ -276,35 +253,7 @@ results/
      summary.json             # Aggregate pass rates
 ```

-R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
-
-### R2 viewer manifest
-
-`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
-
-```json
-{
-  "schemaVersion": 2,
-  "runId": "agisdk-real-smoke-2026-04-30-0000",
-  "tasks": [
-    {
-      "queryId": "agisdk-dashdish-10",
-      "paths": {
-        "metadata": "tasks/agisdk-dashdish-10/metadata.json",
-        "messages": "tasks/agisdk-dashdish-10/messages.jsonl",
-        "grades": "tasks/agisdk-dashdish-10/grades.json",
-        "trace": "tasks/agisdk-dashdish-10/trace.jsonl",
-        "screenshots": "tasks/agisdk-dashdish-10/screenshots",
-        "graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
-      }
-    }
-  ]
-}
-```
-
-The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
-
-Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
+R2 publishing preserves the same task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.

 ## Troubleshooting

--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/agisdk-real.jsonl",
+  "dataset": "../../data/webbench-2of4-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
@@ -21,6 +21,6 @@
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
  },
-  "graders": ["agisdk_state_diff"],
+  "graders": ["performance_grader"],
  "timeout_ms": 1800000
 }
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
@@ -23,7 +23,7 @@
    "base_server_port": 9110,
    "base_extension_port": 9310,
    "load_extensions": false,
-    "headless": false
+    "headless": true
  },
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
--- a/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
@@ -1,22 +0,0 @@
-{
-  "agent": {
-    "type": "claude-code",
-    "model": "opus"
-  },
-  "dataset": "../../data/agisdk-real.jsonl",
-  "num_workers": 1,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
@@ -14,7 +14,7 @@
    "base_server_port": 9110,
    "base_extension_port": 9310,
    "load_extensions": false,
-    "headless": false
+    "headless": true
  },
  "captcha": {
    "api_key_env": "NOPECHA_API_KEY"
--- a/packages/browseros-agent/apps/eval/package.json
+++ b/packages/browseros-agent/apps/eval/package.json
@@ -5,7 +5,6 @@
  "type": "module",
  "scripts": {
    "eval": "bun --env-file=.env.development run src/index.ts",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
--- a/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
+++ b/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
@@ -24,11 +24,45 @@ import {
  PutObjectCommand,
  S3Client,
 } from '@aws-sdk/client-s3'
-import {
-  buildRunSummaries,
-  type ReportManifest,
-  type RunSummary,
-} from '../src/reporting/run-summary'
+
+interface ManifestTask {
+  queryId: string
+  query: string
+  status: string
+  durationMs: number
+  screenshotCount: number
+  graderResults: Record<string, { pass: boolean; score: number }>
+}
+
+interface Manifest {
+  runId: string
+  uploadedAt: string
+  agentConfig?: { type?: string; model?: string }
+  dataset?: string
+  summary?: { passRate?: number; avgDurationMs?: number }
+  tasks: ManifestTask[]
+}
+
+interface RunSummary {
+  runId: string
+  configName: string
+  date: string
+  avgScore: number
+  total: number
+  completed: number
+  failed: number
+  timeout: number
+  avgDurationMs: number
+  model: string
+  dataset: string
+  agentType: string
+}
+
+const PASS_FAIL_GRADER_ORDER = [
+  'agisdk_state_diff',
+  'infinity_state',
+  'performance_grader',
+]

 function requireEnv(name: string): string {
  const value = process.env[name]
@@ -53,7 +87,7 @@ const client = new S3Client({
 // Step 1: List all manifest.json files in runs/
 console.log('Scanning R2 for eval runs...')

-const manifests: ReportManifest[] = []
+const manifests: Manifest[] = []
 let continuationToken: string | undefined

 do {
@@ -93,9 +127,64 @@ if (manifests.length === 0) {
 }

 // Step 2: Build run summaries
-const runs: RunSummary[] = buildRunSummaries(manifests)
+const runs: RunSummary[] = manifests
+  .map((m) => {
+    const total = m.tasks.length
+    const completed = m.tasks.filter((t) => t.status === 'completed').length
+    const failed = m.tasks.filter((t) => t.status === 'failed').length
+    const timeout = m.tasks.filter((t) => t.status === 'timeout').length
+
+    let scoredCount = 0
+    let scoreSum = 0
+    for (const task of m.tasks) {
+      if (!task.graderResults) continue
+      for (const name of PASS_FAIL_GRADER_ORDER) {
+        if (task.graderResults[name]) {
+          scoredCount++
+          scoreSum += task.graderResults[name].score ?? 0
+          break
+        }
+      }
+    }
+
+    const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
+    const durations = m.tasks
+      .filter((t) => t.durationMs > 0)
+      .map((t) => t.durationMs)
+    const avgDurationMs =
+      durations.length > 0
+        ? durations.reduce((a, b) => a + b, 0) / durations.length
+        : 0
+
+    const date = m.uploadedAt
+      ? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
+      : m.runId.slice(0, 15)
+
+    const model = m.agentConfig?.model || 'unknown'
+    const dataset = m.dataset || m.runId
+    const agentType = m.agentConfig?.type || 'unknown'
+
+    const configName = extractConfigName(m.runId)
+    return {
+      runId: m.runId,
+      configName,
+      date,
+      avgScore,
+      total,
+      completed,
+      failed,
+      timeout,
+      avgDurationMs,
+      model,
+      dataset,
+      agentType,
+    }
+  })
+  .sort((a, b) => a.date.localeCompare(b.date))

 // Step 3: Identify unique config groups
+// runIds are either plain ("ci-weekly", old format) or end in a
+// "-YYYY-MM-DD-HHMM" suffix; extractConfigName() below strips that suffix.
 function escHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
@@ -104,6 +193,12 @@ function escHtml(s: string): string {
    .replace(/"/g, '&quot;')
 }

+function extractConfigName(runId: string): string {
+  // "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
+  // "ci-weekly" → "ci-weekly" (no timestamp, old format)
+  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
+}
+
 const configGroups = [...new Set(runs.map((r) => r.configName))]
 const defaultConfig = configGroups.includes('ci-weekly')
  ? 'ci-weekly'
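In `weekly-report.ts`, a run's `avgScore` counts each task once. It uses the first grader in `PASS_FAIL_GRADER_ORDER` that has a result for that task and does not average across graders. The standalone sketch below restates that rule alongside `extractConfigName`. `avgScorePercent` and `TaskLike` are hypothetical names; the script itself inlines this logic inside `manifests.map`.

```typescript
// Hypothetical restatement of the per-run scoring rule in weekly-report.ts.
type GraderResults = Record<string, { pass: boolean; score: number }>

interface TaskLike {
  graderResults?: GraderResults
}

const PASS_FAIL_GRADER_ORDER = [
  'agisdk_state_diff',
  'infinity_state',
  'performance_grader',
]

function avgScorePercent(tasks: TaskLike[]): number {
  let scoredCount = 0
  let scoreSum = 0
  for (const task of tasks) {
    if (!task.graderResults) continue
    // Only the highest-priority grader present contributes for this task.
    const name = PASS_FAIL_GRADER_ORDER.find((n) => task.graderResults?.[n])
    if (name === undefined) continue
    scoredCount++
    scoreSum += task.graderResults[name].score ?? 0
  }
  // Tasks without any known grader are excluded from the denominator.
  return scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
}

function extractConfigName(runId: string): string {
  // Strip a trailing "-YYYY-MM-DD-HHMM" timestamp, if there is one.
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}
```

Because the lookup stops at the first match, a task graded by both `agisdk_state_diff` and `performance_grader` is scored only by `agisdk_state_diff`.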
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
@@ -1,238 +0,0 @@
-import { writeFile } from 'node:fs/promises'
-import { join } from 'node:path'
-import { DEFAULT_TIMEOUT_MS } from '../../constants'
-import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
-import { withEvalTimeout } from '../../utils/with-eval-timeout'
-import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
-import {
-  type ClaudeCodeProcessRunner,
-  createClaudeCodeProcessRunner,
-} from './process-runner'
-import {
-  ClaudeCodeStreamParser,
-  shouldCaptureScreenshotForTool,
-} from './stream-parser'
-
-export interface ClaudeCodeEvaluatorDeps {
-  processRunner?: ClaudeCodeProcessRunner
-}
-
-export class ClaudeCodeEvaluator implements AgentEvaluator {
-  private processRunner: ClaudeCodeProcessRunner
-
-  constructor(
-    private ctx: AgentContext,
-    deps: ClaudeCodeEvaluatorDeps = {},
-  ) {
-    this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
-  }
-
-  async execute(): Promise<AgentResult> {
-    const { config, task, capture, taskOutputDir } = this.ctx
-    const startTime = Date.now()
-    const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
-
-    await capture.messageLogger.logUser(task.query)
-
-    if (config.agent.type !== 'claude-code') {
-      throw new Error('ClaudeCodeEvaluator only supports claude-code config')
-    }
-    const agentConfig = config.agent
-
-    const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
-    await writeFile(
-      mcpConfigPath,
-      JSON.stringify(
-        buildClaudeCodeMcpConfig(config.browseros.server_url),
-        null,
-        2,
-      ),
-    )
-
-    const parser = new ClaudeCodeStreamParser()
-    const toolNamesById = new Map<string, string>()
-    const prompt = buildClaudeCodePrompt(task.query)
-    const args = buildClaudeCodeArgs({
-      prompt,
-      mcpConfigPath,
-      config: agentConfig,
-    })
-
-    const { terminationReason } = await withEvalTimeout(
-      timeoutMs,
-      capture,
-      async (signal) => {
-        const runResult = await this.processRunner.run({
-          executable: agentConfig.claudePath,
-          args,
-          cwd: taskOutputDir,
-          signal,
-          onStdoutLine: async (line) => {
-            const events = parser.pushLine(line)
-            for (const event of events) {
-              await this.handleStreamEvent(event, toolNamesById)
-            }
-          },
-        })
-
-        if (runResult.exitCode !== 0) {
-          const message =
-            runResult.stderr.trim() ||
-            `Claude Code exited with status ${runResult.exitCode}`
-          capture.addError('agent_execution', message, {
-            exitCode: runResult.exitCode,
-          })
-          if (!parser.getLastText()) {
-            throw new Error(message)
-          }
-        }
-
-        for (const error of runResult.streamErrors ?? []) {
-          capture.addWarning(
-            'message_logging',
-            `Claude Code stream event processing failed: ${error}`,
-          )
-        }
-
-        return runResult
-      },
-    )
-
-    const endTime = Date.now()
-    const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
-    const metadata = {
-      query_id: task.query_id,
-      dataset: task.dataset,
-      query: task.query,
-      started_at: new Date(startTime).toISOString(),
-      completed_at: new Date(endTime).toISOString(),
-      total_duration_ms: endTime - startTime,
-      total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
-      termination_reason: terminationReason,
-      final_answer: finalAnswer,
-      errors: capture.getErrors(),
-      warnings: capture.getWarnings(),
-      device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
-      agent_config: {
-        type: 'claude-code' as const,
-        model: agentConfig.model,
-      },
-      grader_results: {},
-    }
-
-    await capture.trajectorySaver.saveMetadata(metadata)
-
-    return {
-      metadata,
-      messages: capture.getMessages(),
-      finalAnswer,
-    }
-  }
-
-  private async handleStreamEvent(
-    event: UIMessageStreamEvent,
-    toolNamesById: Map<string, string>,
-  ): Promise<void> {
-    const { capture, task } = this.ctx
-    let screenshot: number | undefined
-
-    if (event.type === 'tool-input-available') {
-      toolNamesById.set(event.toolCallId, event.toolName)
-      if (isPageInput(event.input)) {
-        capture.setActivePageId(event.input.page)
-      }
-    }
-
-    if (
-      event.type === 'tool-output-available' ||
-      event.type === 'tool-output-error'
-    ) {
-      const toolName = toolNamesById.get(event.toolCallId)
-      if (toolName && shouldCaptureScreenshotForTool(toolName)) {
-        screenshot = await this.captureScreenshot()
-      }
-    }
-
-    await capture.messageLogger.logStreamEvent(event, screenshot)
-    capture.emitEvent(task.query_id, {
-      ...event,
-      ...(screenshot !== undefined && { screenshot }),
-    })
-  }
-
-  private async captureScreenshot(): Promise<number | undefined> {
-    const { capture, task } = this.ctx
-    try {
-      const screenshot = await capture.screenshot.capture(
-        capture.getActivePageId(),
-      )
-      capture.emitEvent(task.query_id, {
-        type: 'screenshot-captured',
-        screenshot,
-      })
-      return screenshot
-    } catch {
-      return undefined
-    }
-  }
-}
-
-function isPageInput(input: unknown): input is { page: number } {
-  return (
-    typeof input === 'object' &&
-    input !== null &&
-    'page' in input &&
-    typeof input.page === 'number'
-  )
-}
-
-function buildClaudeCodePrompt(taskQuery: string): string {
-  return [
-    'You are running inside BrowserOS eval.',
-    'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
-    'When the task is complete, respond with the final answer only.',
-    'If blocked, explain the blocker clearly.',
-    '',
-    `Task: ${taskQuery}`,
-  ].join('\n')
-}
-
-function buildClaudeCodeArgs({
-  prompt,
-  mcpConfigPath,
-  config,
-}: {
-  prompt: string
-  mcpConfigPath: string
-  config: ClaudeCodeAgentConfig
-}): string[] {
-  const args = [
-    '-p',
-    prompt,
-    '--mcp-config',
-    mcpConfigPath,
-    '--strict-mcp-config',
-    '--output-format',
-    'stream-json',
-    '--verbose',
-  ]
-
-  if (config.model) args.push('--model', config.model)
-  args.push(...config.extraArgs)
-
-  return args
-}
-
-function buildClaudeCodeMcpConfig(serverUrl: string) {
-  const trimmed = serverUrl.replace(/\/$/, '')
-  const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
-  return {
-    mcpServers: {
-      browseros: {
-        type: 'http',
-        url,
-        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
-      },
-    },
-  }
-}
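The removed `buildClaudeCodeMcpConfig` accepted a BrowserOS server URL with or without a trailing slash or `/mcp` suffix and always produced a single `/mcp` endpoint. Below is a minimal sketch of that normalization, assuming the same `mcpServers` shape that was written to `claude-code-mcp.json`. `toMcpEndpoint`, `buildMcpConfig`, and `McpHttpServer` are hypothetical names.

```typescript
// Hypothetical restatement of the URL handling in buildClaudeCodeMcpConfig.
interface McpHttpServer {
  type: 'http'
  url: string
  headers: Record<string, string>
}

function toMcpEndpoint(serverUrl: string): string {
  // Drop at most one trailing slash, then append /mcp unless already present.
  const trimmed = serverUrl.replace(/\/$/, '')
  return trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
}

function buildMcpConfig(serverUrl: string): {
  mcpServers: Record<string, McpHttpServer>
} {
  return {
    mcpServers: {
      browseros: {
        type: 'http',
        url: toMcpEndpoint(serverUrl),
        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
      },
    },
  }
}
```

Only one trailing slash is stripped. An input such as `http://host//` would therefore still yield `http://host//mcp`.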
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
@@ -1,114 +0,0 @@
-export interface ClaudeCodeRunOptions {
-  executable: string
-  args: string[]
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface ClaudeCodeRunResult {
-  exitCode: number
-  stderr: string
-  streamErrors?: string[]
-}
-
-export interface ClaudeCodeProcessRunner {
-  run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
-}
-
-export interface SpawnOptions {
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface CreateClaudeCodeProcessRunnerDeps {
-  spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
-}
-
-export function createClaudeCodeProcessRunner(
-  deps: CreateClaudeCodeProcessRunnerDeps = {},
-): ClaudeCodeProcessRunner {
-  const spawn = deps.spawn ?? spawnClaudeCode
-  return {
-    run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
-      spawn([executable, ...args], { cwd, signal, onStdoutLine }),
-  }
-}
-
-async function spawnClaudeCode(
-  cmd: string[],
-  options: SpawnOptions,
-): Promise<ClaudeCodeRunResult> {
-  const proc = Bun.spawn({
-    cmd,
-    cwd: options.cwd,
-    stdin: 'ignore',
-    stdout: 'pipe',
-    stderr: 'pipe',
-  })
-
-  const abort = () => {
-    try {
-      proc.kill('SIGTERM')
-    } catch {
-      // Process may already have exited.
-    }
-  }
-  options.signal?.addEventListener('abort', abort, { once: true })
-
-  try {
-    const streamErrors: string[] = []
-    const stdoutPromise = readLines(
-      proc.stdout,
-      options.onStdoutLine,
-      streamErrors,
-    )
-    const stderrPromise = new Response(proc.stderr).text()
-    const exitCode = await proc.exited
-    await stdoutPromise
-    const stderr = await stderrPromise
-    return { exitCode, stderr, streamErrors }
-  } finally {
-    options.signal?.removeEventListener('abort', abort)
-  }
-}
-
-async function readLines(
-  stream: ReadableStream<Uint8Array>,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  const reader = stream.getReader()
-  const decoder = new TextDecoder()
-  let buffer = ''
-
-  while (true) {
-    const { done, value } = await reader.read()
-    if (done) break
-
-    buffer += decoder.decode(value, { stream: true })
-    const lines = buffer.split('\n')
-    buffer = lines.pop() ?? ''
-    for (const line of lines) {
-      await emitLine(line, onLine, streamErrors)
-    }
-  }
-
-  buffer += decoder.decode()
-  if (buffer.length > 0) {
-    await emitLine(buffer, onLine, streamErrors)
-  }
-}
-
-async function emitLine(
-  line: string,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  try {
-    await onLine(line)
-  } catch (error) {
-    streamErrors.push(error instanceof Error ? error.message : String(error))
-  }
-}
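`readLines` in the removed `process-runner.ts` splits Claude Code's `stream-json` stdout into lines. It tolerates chunk boundaries that split a line or a multi-byte UTF-8 character. Below is a minimal sketch of the same buffering, assuming a Web `ReadableStream<Uint8Array>` (available globally in Bun and Node 18+). `collectLines` is a hypothetical name; it returns the lines rather than invoking a per-line callback.

```typescript
// Hypothetical restatement of the newline framing used by readLines().
async function collectLines(
  stream: ReadableStream<Uint8Array>,
): Promise<string[]> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  const lines: string[] = []
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // { stream: true } holds back an incomplete trailing UTF-8 sequence.
    buffer += decoder.decode(value, { stream: true })
    const parts = buffer.split('\n')
    // The last element is an unterminated tail; keep it for the next chunk.
    buffer = parts.pop() ?? ''
    lines.push(...parts)
  }

  // Flush any bytes the decoder was still holding, then emit the final tail.
  buffer += decoder.decode()
  if (buffer.length > 0) lines.push(buffer)
  return lines
}
```

The final argument-less `decode()` call matters. Without it, a stream that ends on a held-back byte sequence would lose those bytes, which is why both this sketch and the original decode once more after the loop.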
--- a/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
@@ -1,142 +0,0 @@
-import { randomUUID } from 'node:crypto'
-import type { UIMessageStreamEvent } from '../../types'
-
-type JsonObject = Record<string, unknown>
-
-export class ClaudeCodeStreamParser {
-  private lastText: string | null = null
-  private toolCallCount = 0
-
-  pushLine(line: string): UIMessageStreamEvent[] {
-    const trimmed = line.trim()
-    if (!trimmed) return []
-
-    let parsed: unknown
-    try {
-      parsed = JSON.parse(trimmed)
-    } catch {
-      return []
-    }
-
-    if (!isObject(parsed)) return []
-
-    if (parsed.type === 'assistant') {
-      return this.parseAssistantMessage(parsed)
-    }
-    if (parsed.type === 'user') {
-      return this.parseUserMessage(parsed)
-    }
-    if (parsed.type === 'result' && typeof parsed.result === 'string') {
-      this.lastText = parsed.result
-    }
-
-    return []
-  }
-
-  getLastText(): string | null {
-    return this.lastText
-  }
-
-  getToolCallCount(): number {
-    return this.toolCallCount
-  }
-
-  private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (block.type === 'text' && typeof block.text === 'string') {
-        const id = randomUUID()
-        this.lastText = block.text
-        events.push(
-          { type: 'text-start', id },
-          { type: 'text-delta', id, delta: block.text },
-          { type: 'text-end', id },
-        )
-      } else if (
-        block.type === 'tool_use' &&
-        typeof block.id === 'string' &&
-        typeof block.name === 'string'
-      ) {
-        this.toolCallCount++
-        events.push({
-          type: 'tool-input-available',
-          toolCallId: block.id,
-          toolName: block.name,
-          input: block.input,
-        })
-      }
-    }
-
-    return events
-  }
-
-  private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (
-        block.type !== 'tool_result' ||
-        typeof block.tool_use_id !== 'string'
-      ) {
-        continue
-      }
-
-      if (block.is_error === true) {
-        events.push({
-          type: 'tool-output-error',
-          toolCallId: block.tool_use_id,
-          errorText: stringifyToolContent(block.content),
-        })
-      } else {
-        events.push({
-          type: 'tool-output-available',
-          toolCallId: block.tool_use_id,
-          output: normalizeToolContent(block.content),
-        })
-      }
-    }
-
-    return events
-  }
-}
-
-export function shouldCaptureScreenshotForTool(toolName: string): boolean {
-  if (!toolName.startsWith('mcp__browseros__')) return false
-  return !toolName.endsWith('__take_screenshot')
-}
-
-function contentBlocks(message: JsonObject): JsonObject[] {
-  const inner = isObject(message.message) ? message.message : message
-  return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
-}
-
-function isObject(value: unknown): value is JsonObject {
-  return typeof value === 'object' && value !== null
-}
-
-function normalizeToolContent(content: unknown): unknown {
-  if (!Array.isArray(content)) return content
-  return content.map((item) => {
-    if (
-      isObject(item) &&
-      item.type === 'text' &&
-      typeof item.text === 'string'
-    ) {
-      return item.text
-    }
-    return item
-  })
-}
-
-function stringifyToolContent(content: unknown): string {
-  const normalized = normalizeToolContent(content)
-  if (typeof normalized === 'string') return normalized
-  try {
-    return JSON.stringify(normalized)
-  } catch {
-    return String(normalized)
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/index.ts
@@ -1,4 +1,3 @@
-import { ClaudeCodeEvaluator } from './claude-code'
 import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
 import { SingleAgentEvaluator } from './single-agent'
 import type { AgentContext, AgentEvaluator } from './types'
@@ -9,8 +8,6 @@ export function createAgent(context: AgentContext): AgentEvaluator {
      return new SingleAgentEvaluator(context)
    case 'orchestrator-executor':
      return new OrchestratorExecutorEvaluator(context)
-    case 'claude-code':
-      return new ClaudeCodeEvaluator(context)
  }
 }

--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
@@ -1,56 +0,0 @@
-import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
-import type {
-  DelegationResult,
-  ExecutorBackend,
-  ExecutorCallbacks,
-} from '../../executor-backend'
-import { CladoActionExecutor } from './clado-action-executor'
-
-export interface CladoExecutorBackendOptions {
-  configTemplate: ResolvedAgentConfig
-  serverUrl: string
-  initialPageId?: number
-  callbacks?: ExecutorCallbacks
-}
-
-/** Executes delegated goals through the Clado visual action model. */
-export class CladoExecutorBackend implements ExecutorBackend {
-  readonly kind = 'clado'
-  private executor: CladoActionExecutor | null = null
-
-  constructor(private readonly options: CladoExecutorBackendOptions) {}
-
-  async execute(
-    instruction: string,
-    signal?: AbortSignal,
-  ): Promise<DelegationResult> {
-    const executor = this.getExecutor()
-    const result = await executor.execute(instruction, signal)
-    return result
-  }
-
-  async close(): Promise<void> {
-    await this.executor?.close()
-  }
-
-  getTotalSteps(): number {
-    return this.executor?.getTotalSteps() ?? 0
-  }
-
-  private getExecutor(): CladoActionExecutor {
-    if (this.executor) return this.executor
-
-    this.executor = new CladoActionExecutor(
-      {
-        provider: this.options.configTemplate.provider,
-        model: this.options.configTemplate.model,
-        apiKey: this.options.configTemplate.apiKey ?? '',
-        baseUrl: this.options.configTemplate.baseUrl,
-      },
-      this.options.serverUrl,
-      this.options.initialPageId,
-    )
-    this.executor.setCallbacks(this.options.callbacks ?? {})
-    return this.executor
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
@@ -1,13 +1,8 @@
 import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
 import type { Browser } from '@browseros/server/browser'
-import type {
-  ExecutorBackend,
-  ExecutorBackendKind,
-  ExecutorCallbacks,
-} from '../executor-backend'
-import { CladoExecutorBackend } from './clado/clado-executor-backend'
-import { isCladoActionProvider } from './clado/types'
-import { ToolLoopExecutorBackend } from './tool-loop/tool-loop-executor-backend'
+import type { ExecutorCallbacks } from '../../orchestrator-executor/executor'
+import type { ExecutorBackend, ExecutorBackendKind } from '../executor-backend'
+import { ExecutorAdapterBackend } from './tool-loop-backend'

 export interface CreateExecutorBackendOptions {
  backendKind?: ExecutorBackendKind
@@ -23,38 +18,28 @@ export interface CreateExecutorBackendOptions {
 }

 export function backendKindForProvider(provider: string): ExecutorBackendKind {
-  return isCladoActionProvider(provider) ? 'clado' : 'tool-loop'
+  return provider === 'clado-action' ? 'clado' : 'tool-loop'
 }

 /** Creates the backend used for one orchestrator delegation. */
 export function createExecutorBackend(
  options: CreateExecutorBackendOptions,
 ): ExecutorBackend {
-  if (options.executor) return options.executor
-
  const kind =
    options.backendKind ??
    backendKindForProvider(
      options.provider ?? options.configTemplate?.provider ?? '',
    )

-  if (kind === 'clado') {
-    return new CladoExecutorBackend({
-      configTemplate: required(options.configTemplate, 'configTemplate'),
-      serverUrl: required(options.serverUrl, 'serverUrl'),
-      initialPageId: options.initialPageId,
-      callbacks: options.callbacks,
-    })
-  }
-
-  return new ToolLoopExecutorBackend({
-    configTemplate: required(options.configTemplate, 'configTemplate'),
-    browser: options.browser ?? null,
+  return new ExecutorAdapterBackend({
+    kind,
+    configTemplate: options.configTemplate,
+    browser: options.browser,
+    serverUrl: options.serverUrl,
+    windowId: options.windowId,
+    tabId: options.tabId,
+    initialPageId: options.initialPageId,
    callbacks: options.callbacks,
+    executor: options.executor,
  })
 }
-
-function required<T>(value: T | undefined, name: string): T {
-  if (value === undefined) throw new Error(`${name} is required`)
-  return value
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop-backend.ts
@@ -0,0 +1,72 @@
+import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
+import type { Browser } from '@browseros/server/browser'
+import {
+  Executor,
+  type ExecutorCallbacks,
+} from '../../orchestrator-executor/executor'
+import type {
+  DelegationResult,
+  ExecutorBackend,
+  ExecutorBackendKind,
+} from '../executor-backend'
+
+interface ExecutorRunner {
+  execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
+  close(): Promise<void>
+  getTotalSteps(): number
+}
+
+export interface ExecutorAdapterBackendOptions {
+  kind: ExecutorBackendKind
+  configTemplate?: ResolvedAgentConfig
+  browser?: Browser | null
+  serverUrl?: string
+  windowId?: number
+  tabId?: number
+  initialPageId?: number
+  callbacks?: ExecutorCallbacks
+  executor?: ExecutorRunner
+}
+
+export class ExecutorAdapterBackend implements ExecutorBackend {
+  readonly kind: ExecutorBackendKind
+  private readonly executor: ExecutorRunner
+
+  constructor(options: ExecutorAdapterBackendOptions) {
+    this.kind = options.kind
+    this.executor =
+      options.executor ??
+      new Executor(
+        required(options.configTemplate, 'configTemplate'),
+        options.browser ?? null,
+        required(options.serverUrl, 'serverUrl'),
+        {
+          isCladoAction: options.kind === 'clado',
+          windowId: options.windowId,
+          tabId: options.tabId,
+          initialPageId: options.initialPageId,
+          callbacks: options.callbacks,
+        },
+      )
+  }
+
+  execute(
+    instruction: string,
+    signal?: AbortSignal,
+  ): Promise<DelegationResult> {
+    return this.executor.execute(instruction, signal)
+  }
+
+  close(): Promise<void> {
+    return this.executor.close()
+  }
+
+  getTotalSteps(): number {
+    return this.executor.getTotalSteps()
+  }
+}
+
+function required<T>(value: T | undefined, name: string): T {
+  if (value === undefined) throw new Error(`${name} is required`)
+  return value
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend.ts
@@ -1,144 +0,0 @@
-import { randomUUID } from 'node:crypto'
-import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
-import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
-import type { Browser } from '@browseros/server/browser'
-import { registry } from '@browseros/server/tools/registry'
-import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
-import type {
-  DelegationResult,
-  ExecutorBackend,
-  ExecutorCallbacks,
-} from '../../executor-backend'
-import { TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT } from './tool-loop-executor-prompt'
-
-export interface ToolLoopExecutorBackendOptions {
-  configTemplate: ResolvedAgentConfig
-  browser: Browser | null
-  callbacks?: ExecutorCallbacks
-}
-
-/** Executes delegated goals through the BrowserOS ToolLoopAgent. */
-export class ToolLoopExecutorBackend implements ExecutorBackend {
-  readonly kind = 'tool-loop'
-  private stepsUsed = 0
-  private currentUrl = ''
-
-  constructor(private readonly options: ToolLoopExecutorBackendOptions) {}
-
-  async execute(
-    instruction: string,
-    signal?: AbortSignal,
-  ): Promise<DelegationResult> {
-    const browser = this.options.browser
-    if (!browser) {
-      throw new Error('Browser instance is required for tool-loop executor')
-    }
-
-    const stepsAtStart = this.stepsUsed
-    const toolsUsed: string[] = []
-    let status: DelegationResult['status'] = 'done'
-    let resultText = ''
-
-    const conversationId = randomUUID()
-    const agentConfig: ResolvedAgentConfig = {
-      ...this.options.configTemplate,
-      conversationId,
-      userSystemPrompt: TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT,
-      evalMode: true,
-      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
-    }
-
-    const browserContext = await this.browserContext(browser)
-    let agent: AiSdkAgent | null = null
-
-    try {
-      agent = await AiSdkAgent.create({
-        resolvedConfig: agentConfig,
-        browser,
-        registry,
-        browserContext,
-      })
-
-      await agent.toolLoopAgent.generate({
-        prompt: instruction,
-        abortSignal: signal,
-
-        experimental_onToolCallStart: ({ toolCall }) => {
-          const input = toolCall.input as Record<string, unknown> | undefined
-          if (input && typeof input.url === 'string' && input.url.length > 0) {
-            this.currentUrl = input.url
-          }
-          this.options.callbacks?.onToolCallStart?.({
-            toolCallId: toolCall.toolCallId,
-            toolName: toolCall.toolName,
-            input: toolCall.input,
-          })
-        },
-
-        experimental_onToolCallFinish: async () => {
-          this.stepsUsed++
-          await this.options.callbacks?.onToolCallFinish?.()
-        },
-
-        onStepFinish: async ({ toolCalls, toolResults, text }) => {
-          if (toolCalls) {
-            for (const toolCall of toolCalls) {
-              if (!toolsUsed.includes(toolCall.toolName)) {
-                toolsUsed.push(toolCall.toolName)
-              }
-            }
-          }
-
-          if (text) resultText = text
-
-          await this.options.callbacks?.onStepFinish?.({
-            toolCalls,
-            toolResults,
-            text,
-          })
-        },
-      })
-    } catch {
-      status = signal?.aborted ? 'timeout' : 'blocked'
-    } finally {
-      if (agent) await agent.dispose().catch(() => {})
-    }
-
-    if (status === 'done' && signal?.aborted) {
-      status = 'timeout'
-    }
-
-    return {
-      observation: resultText || 'Execution completed with no actions taken.',
-      status,
-      url: this.currentUrl,
-      actionsPerformed: this.stepsUsed - stepsAtStart,
-      toolsUsed,
-    }
-  }
-
-  async close(): Promise<void> {
-    // No persistent resources; AiSdkAgent is disposed at the end of each execute() call.
-  }
-
-  getTotalSteps(): number {
-    return this.stepsUsed
-  }
-
-  private async browserContext(
-    browser: Browser,
-  ): Promise<BrowserContext | undefined> {
-    const pages = await browser.listPages()
-    const activePage = pages[0]
-    if (!activePage) return undefined
-
-    return {
-      activeTab: {
-        id: activePage.tabId,
-        pageId: activePage.pageId,
-        url: activePage.url,
-        title: activePage.title,
-      },
-    }
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-prompt.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-prompt.ts
@@ -1,21 +0,0 @@
-export const TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
-
-## Your Job
-1. Execute browser actions to achieve the given goal
-2. Stop as soon as the goal is accomplished -- do NOT perform extra actions
-3. Write a final observation describing the result
-
-## Final Response Format
-When done, your response MUST include:
- What you accomplished (or what went wrong)
- What the page currently shows: key headings, links, data, or content visible
- The current URL from the address bar
- If you got stuck, what is blocking progress
-
-## Rules
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
- If the goal is to click something, confirm the result of the click.
- If you cannot find what was asked for, say so clearly -- do not guess or improvise.
- Prefer browser_navigate over browser_open_tab for going to URLs.
- Do NOT call browser_group_tabs or other organizational tools.`
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
@@ -3,28 +3,6 @@ import type { ExecutorResult } from '../orchestrator-executor/types'
 export type ExecutorBackendKind = 'tool-loop' | 'clado'
 export type DelegationResult = ExecutorResult

-export interface ToolCallInfo {
-  toolCallId: string
-  toolName: string
-  input: unknown
-}
-
-export interface ToolResultInfo {
-  toolCallId: string
-  toolName: string
-  output: unknown
-}
-
-export interface ExecutorCallbacks {
-  onToolCallStart?: (toolCall: ToolCallInfo) => void
-  onToolCallFinish?: () => Promise<void>
-  onStepFinish?: (step: {
-    toolCalls?: ReadonlyArray<ToolCallInfo>
-    toolResults?: ReadonlyArray<ToolResultInfo>
-    text?: string
-  }) => Promise<void>
-}
-
 export interface ExecutorBackend {
  readonly kind: ExecutorBackendKind
  execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
@@ -1,27 +1,22 @@
 import { randomUUID } from 'node:crypto'
-import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
-import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
-import { sleep } from '../../../../utils/sleep'
-import type {
-  ExecutorConfig,
-  ExecutorResult,
-} from '../../../orchestrator-executor/types'
-import type { ExecutorCallbacks } from '../../executor-backend'
+import { MAX_ACTIONS_PER_DELEGATION } from '../../constants'
+import { McpClient, type McpToolResult } from '../../utils/mcp-client'
+import { sleep } from '../../utils/sleep'
 import {
  extractCladoThinking,
  formatCladoHistory,
  getCladoActionSignature,
  parseCladoActions,
  summarizeCladoPrediction,
-} from './clado-actions'
+} from '../orchestrated/backends/clado/clado-actions'
 import {
  normalizeCladoDirection,
  normalizeCladoPressKey,
  normalizeCladoScrollAmount,
  prepareCladoToolArgs,
  resolveCladoPoint,
-} from './clado-browser-driver'
-import { CladoActionClient } from './clado-client'
+} from '../orchestrated/backends/clado/clado-browser-driver'
+import { CladoActionClient } from '../orchestrated/backends/clado/clado-client'
 import {
  CLADO_ACTION_PROVIDER,
  type CladoAction,
@@ -29,7 +24,9 @@ import {
  type CladoActionResponse,
  type CladoViewport,
  isCladoActionProvider,
-} from './types'
+} from '../orchestrated/backends/clado/types'
+import type { ExecutorCallbacks } from './executor'
+import type { ExecutorConfig, ExecutorResult } from './types'

 const MAX_CONSECUTIVE_PARSE_FAILURES = 3

@@ -48,8 +45,10 @@ export class CladoActionExecutor {
  private currentUrl = ''

  constructor(
-    config: ExecutorConfig,
+    private readonly config: ExecutorConfig,
    serverUrl: string,
+    readonly _windowId?: number,
+    readonly _tabId?: number,
    initialPageId?: number,
  ) {
    if (!isCladoActionProvider(config.provider)) {
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/executor.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/executor.ts
@@ -0,0 +1,243 @@
+/**
+ * Executor - Wraps AiSdkAgent for page-level browser actions (direct CDP)
+ *
+ * The executor:
+ * - Receives goal-level instructions from orchestrator
+ * - Executes browser actions until the goal is accomplished
+ * - Returns observation to orchestrator (not full history)
+ */
+
+import { randomUUID } from 'node:crypto'
+import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
+import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
+import type { Browser } from '@browseros/server/browser'
+import { registry } from '@browseros/server/tools/registry'
+import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
+import { CladoActionExecutor } from './clado-action-executor'
+import type { ExecutorResult } from './types'
+
+const EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
+
+## Your Job
+1. Execute browser actions to achieve the given goal
+2. Stop as soon as the goal is accomplished — do NOT perform extra actions
+3. Write a final observation describing the result
+
+## Final Response Format
+When done, your response MUST include:
+- What you accomplished (or what went wrong)
+- What the page currently shows: key headings, links, data, or content visible
+- The current URL from the address bar
+- If you got stuck, what is blocking progress
+
+## Rules
+- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
+- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
+- If the goal is to click something, confirm the result of the click.
+- If you cannot find what was asked for, say so clearly — do not guess or improvise.
+- Prefer browser_navigate over browser_open_tab for going to URLs.
+- Do NOT call browser_group_tabs or other organizational tools.`
+
+export interface ToolCallInfo {
+  toolCallId: string
+  toolName: string
+  input: unknown
+}
+
+export interface ToolResultInfo {
+  toolCallId: string
+  toolName: string
+  output: unknown
+}
+
+export interface ExecutorCallbacks {
+  onToolCallStart?: (toolCall: ToolCallInfo) => void
+  onToolCallFinish?: () => Promise<void>
+  onStepFinish?: (step: {
+    toolCalls?: ReadonlyArray<ToolCallInfo>
+    toolResults?: ReadonlyArray<ToolResultInfo>
+    text?: string
+  }) => Promise<void>
+}
+
+export class Executor {
+  private cladoExecutor: CladoActionExecutor | null = null
+  private stepsUsed = 0
+  private currentUrl = ''
+  private configTemplate: ResolvedAgentConfig
+  private isCladoAction: boolean
+  private browser: Browser | null
+  private serverUrl: string
+  private windowId?: number
+  private tabId?: number
+  private initialPageId?: number
+  private callbacks: ExecutorCallbacks
+
+  constructor(
+    configTemplate: ResolvedAgentConfig,
+    browser: Browser | null,
+    serverUrl: string,
+    options?: {
+      isCladoAction?: boolean
+      windowId?: number
+      tabId?: number
+      initialPageId?: number
+      callbacks?: ExecutorCallbacks
+    },
+  ) {
+    this.configTemplate = configTemplate
+    this.isCladoAction = options?.isCladoAction ?? false
+    this.browser = browser
+    this.serverUrl = serverUrl
+    this.windowId = options?.windowId
+    this.tabId = options?.tabId
+    this.initialPageId = options?.initialPageId
+    this.callbacks = options?.callbacks ?? {}
+  }
+
+  async execute(
+    instruction: string,
+    signal?: AbortSignal,
+  ): Promise<ExecutorResult> {
+    if (this.isCladoAction) {
+      if (!this.cladoExecutor) {
+        this.cladoExecutor = new CladoActionExecutor(
+          {
+            provider: this.configTemplate.provider,
+            model: this.configTemplate.model,
+            apiKey: this.configTemplate.apiKey ?? '',
+            baseUrl: this.configTemplate.baseUrl,
+          },
+          this.serverUrl,
+          this.windowId,
+          this.tabId,
+          this.initialPageId,
+        )
+        this.cladoExecutor.setCallbacks(this.callbacks)
+      }
+
+      const result = await this.cladoExecutor.execute(instruction, signal)
+      this.stepsUsed = this.cladoExecutor.getTotalSteps()
+      this.currentUrl = result.url || this.currentUrl
+      return result
+    }
+
+    if (!this.browser) {
+      throw new Error('Browser instance is required for standard executor path')
+    }
+
+    const stepsAtStart = this.stepsUsed
+    const toolsUsed: string[] = []
+    let status: 'done' | 'blocked' | 'timeout' = 'done'
+    let resultText = ''
+
+    const conversationId = randomUUID()
+    const agentConfig: ResolvedAgentConfig = {
+      ...this.configTemplate,
+      conversationId,
+      userSystemPrompt: EXECUTOR_SYSTEM_PROMPT,
+      evalMode: true,
+      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
+    }
+
+    // Build browser context so executor agent knows the correct page ID
+    let browserContext: BrowserContext | undefined
+    if (this.browser) {
+      const pages = await this.browser.listPages()
+      const activePage = pages[0]
+      if (activePage) {
+        browserContext = {
+          activeTab: {
+            id: activePage.tabId,
+            pageId: activePage.pageId,
+            url: activePage.url,
+            title: activePage.title,
+          },
+        }
+      }
+    }
+
+    let agent: AiSdkAgent | null = null
+
+    try {
+      agent = await AiSdkAgent.create({
+        resolvedConfig: agentConfig,
+        browser: this.browser,
+        registry,
+        browserContext,
+      })
+
+      await agent.toolLoopAgent.generate({
+        prompt: instruction,
+        abortSignal: signal,
+
+        experimental_onToolCallStart: ({ toolCall }) => {
+          const input = toolCall.input as Record<string, unknown> | undefined
+          if (input && typeof input.url === 'string' && input.url.length > 0) {
+            this.currentUrl = input.url
+          }
+          this.callbacks.onToolCallStart?.({
+            toolCallId: toolCall.toolCallId,
+            toolName: toolCall.toolName,
+            input: toolCall.input,
+          })
+        },
+
+        experimental_onToolCallFinish: async () => {
+          this.stepsUsed++
+          await this.callbacks.onToolCallFinish?.()
+        },
+
+        onStepFinish: async ({ toolCalls, toolResults, text }) => {
+          if (toolCalls) {
+            for (const tc of toolCalls) {
+              if (!toolsUsed.includes(tc.toolName)) {
+                toolsUsed.push(tc.toolName)
+              }
+            }
+          }
+
+          if (text) {
+            resultText = text
+          }
+
+          await this.callbacks.onStepFinish?.({ toolCalls, toolResults, text })
+        },
+      })
+    } catch {
+      if (signal?.aborted) {
+        status = 'timeout'
+      } else {
+        status = 'blocked'
+      }
+    } finally {
+      if (agent) await agent.dispose().catch(() => {})
+    }
+
+    if (status === 'done' && signal?.aborted) {
+      status = 'timeout'
+    }
+
+    const observation =
+      resultText || 'Execution completed with no actions taken.'
+
+    return {
+      observation,
+      status,
+      url: this.currentUrl,
+      actionsPerformed: this.stepsUsed - stepsAtStart,
+      toolsUsed,
+    }
+  }
+
+  async close(): Promise<void> {
+    await this.cladoExecutor?.close()
+  }
+
+  getTotalSteps(): number {
+    if (this.isCladoAction) {
+      return this.cladoExecutor?.getTotalSteps() ?? 0
+    }
+    return this.stepsUsed
+  }
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/index.ts
@@ -24,16 +24,16 @@ import {
  resolveProviderConfig,
 } from '../../utils/resolve-provider-config'
 import { withEvalTimeout } from '../../utils/with-eval-timeout'
-import { isCladoActionProvider } from '../orchestrated/backends/clado/types'
 import { createExecutorBackend } from '../orchestrated/backends/create-executor-backend'
-import type { ExecutorCallbacks } from '../orchestrated/executor-backend'
 import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
+import type { ExecutorCallbacks } from './executor'
 import { OrchestratorAgent } from './orchestrator-agent'
 import type { ExecutorFactory, ExecutorResult } from './types'

 interface ResolvedConfigs {
  orchestratorConfig: ResolvedAgentConfig & { maxTurns?: number }
  executorConfig: ResolvedAgentConfig
+  isCladoAction: boolean
 }

 function toResolvedAgentConfig(
@@ -68,10 +68,7 @@ async function resolveAgentConfig(
  if (!executorModel) {
    throw new Error('executor.model is required in config')
  }
-  if (
-    isCladoActionProvider(config.executor.provider) &&
-    !config.executor.baseUrl
-  ) {
+  if (config.executor.provider === 'clado-action' && !config.executor.baseUrl) {
    throw new Error(
      'executor.baseUrl is required in config for clado-action provider',
    )
@@ -79,8 +76,10 @@ async function resolveAgentConfig(

  const resolvedOrchestrator = await resolveProviderConfig(config.orchestrator)

+  const isCladoAction = config.executor.provider === 'clado-action'
+
  let executorConfig: ResolvedAgentConfig
-  if (isCladoActionProvider(config.executor.provider)) {
+  if (isCladoAction) {
    executorConfig = {
      conversationId: crypto.randomUUID(),
      provider: config.executor.provider as ResolvedAgentConfig['provider'],
@@ -109,7 +108,7 @@ async function resolveAgentConfig(
    maxTurns: config.orchestrator.maxTurns,
  }

-  return { orchestratorConfig, executorConfig }
+  return { orchestratorConfig, executorConfig, isCladoAction }
 }

 export class OrchestratorExecutorEvaluator implements AgentEvaluator {
@@ -129,7 +128,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
    }

    const agentConfig = config.agent as OrchestratorExecutorConfig
-    const { orchestratorConfig, executorConfig } =
+    const { orchestratorConfig, executorConfig, isCladoAction } =
      await resolveAgentConfig(agentConfig)

    // Connect to Chrome via CDP — same per-worker offset used by app-manager.
@@ -238,6 +237,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
        capture.emitEvent(task.query_id, delegateInputEvent)

        const executor = createExecutorBackend({
+          backendKind: isCladoAction ? 'clado' : 'tool-loop',
          configTemplate: executorConfig,
          browser,
          serverUrl: config.browseros.server_url,
@@ -331,5 +331,6 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
  }
 }

+export { Executor } from './executor'
 export { OrchestratorAgent } from './orchestrator-agent'
 export * from './types'
--- a/packages/browseros-agent/apps/eval/src/capture/trajectory-saver.ts
+++ b/packages/browseros-agent/apps/eval/src/capture/trajectory-saver.ts
@@ -105,10 +105,7 @@ export class TrajectorySaver {
      errors: [],
      warnings: [],
      agent_config: {
-        type: agentConfig.type as
-          | 'single'
-          | 'orchestrator-executor'
-          | 'claude-code',
+        type: agentConfig.type as 'single' | 'orchestrator-executor',
        model: agentConfig.model,
      },
      grader_results: {},
--- a/packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
@@ -82,16 +82,6 @@ function suiteToEvalConfig(
    })
  }

-  if (suite.agent.type === 'claude-code') {
-    return EvalConfigSchema.parse({
-      ...base,
-      agent: {
-        type: 'claude-code',
-        ...(variant.agent.model && { model: variant.agent.model }),
-      },
-    })
-  }
-
  const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
  const executor =
    executorBackend === 'clado'
@@ -145,10 +135,7 @@ export async function resolveSuiteCommand(
  const loaded = await loadSuite(options.suitePath)
  const variant = resolveVariant({
    variantId: options.variantId,
-    provider:
-      loaded.suite.agent.type === 'claude-code'
-        ? 'claude-code'
-        : options.provider,
+    provider: options.provider,
    model: options.model,
    apiKey: options.apiKey,
    baseUrl: options.baseUrl,
--- a/packages/browseros-agent/apps/eval/src/dashboard/viewer.html
+++ b/packages/browseros-agent/apps/eval/src/dashboard/viewer.html
@@ -685,59 +685,6 @@
    });
  }

-  // Test harness note: these ASCII section markers are used by r2-viewer-compat.test.ts.
-  // -- Artifact path resolution
-  function taskKey(task) {
-    return task.queryId || task.id || 'unknown-task';
-  }
-
-  function legacyArtifactPath(task, artifact) {
-    const id = taskKey(task);
-    switch (artifact) {
-      case 'attempt':
-        return `${id}/attempt.json`;
-      case 'metadata':
-        return `${id}/metadata.json`;
-      case 'messages':
-        return `${id}/messages.jsonl`;
-      case 'trace':
-        return `${id}/trace.jsonl`;
-      case 'grades':
-        return `${id}/grades.json`;
-      case 'screenshots':
-        return `${id}/screenshots`;
-      case 'graderArtifacts':
-        return `${id}/grader-artifacts`;
-      default:
-        return `${id}/${artifact}`;
-    }
-  }
-
-  function artifactPath(task, artifact) {
-    const manifestPath = task.paths && task.paths[artifact];
-    if (typeof manifestPath === 'string' && manifestPath.length > 0) {
-      return manifestPath.replace(/^\/+/, '');
-    }
-    return legacyArtifactPath(task, artifact);
-  }
-
-  function artifactUrl(task, artifact) {
-    return `${basePath}/${artifactPath(task, artifact)}`;
-  }
-
-  function metadataUrl(task) {
-    return artifactUrl(task, 'metadata');
-  }
-
-  function messagesUrl(task) {
-    return artifactUrl(task, 'messages');
-  }
-
-  function screenshotUrl(task, n) {
-    return `${artifactUrl(task, 'screenshots')}/${n}.png`;
-  }
-
-  // -- Task selection
  // ── Task selection ─────────────────────────────────────────────
  function selectTask(task) {
    stopAutoplay();
@@ -769,7 +716,6 @@
    }
  }

-  // -- Center panel
  // ── Center panel: screenshot viewer ────────────────────────────
  function renderCenterPanel(task) {
    const panel = document.getElementById('center-panel');
@@ -817,6 +763,10 @@
    updateControls();
  }

+  function screenshotUrl(task, n) {
+    return `${basePath}/${task.queryId || task.id}/screenshots/${n}.png`;
+  }
+
  function goToStep(n) {
    if (!selectedTask || n < 1 || n > totalSteps) return;
    currentStep = n;
@@ -964,7 +914,7 @@
    body.innerHTML = '<div class="placeholder"><div class="ph-text" style="color: #6e7681;">Loading messages...</div></div>';
    countEl.textContent = '';

-    const msgUrl = messagesUrl(task);
+    const msgUrl = `${basePath}/${task.queryId || task.id}/messages.jsonl`;

    fetch(msgUrl)
      .then((res) => {
@@ -1125,7 +1075,7 @@

  // ── Load task metadata for rich grader details ──────────────────
  function loadTaskMetadata(task) {
-    const metaUrl = metadataUrl(task);
+    const metaUrl = `${basePath}/${task.queryId || task.id}/metadata.json`;
    fetch(metaUrl)
      .then((res) => res.ok ? res.json() : null)
      .then((meta) => {
--- a/packages/browseros-agent/apps/eval/src/grading/python-evaluator.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/python-evaluator.ts
@@ -2,7 +2,6 @@ export interface PythonEvaluatorOptions {
  scriptPath: string
  input: unknown
  timeoutMs: number
-  pythonPath?: string
 }

 export interface PythonEvaluatorResult<T> {
@@ -16,9 +15,7 @@ export interface PythonEvaluatorResult<T> {
 export async function runPythonJsonEvaluator<T>(
  options: PythonEvaluatorOptions,
 ): Promise<PythonEvaluatorResult<T>> {
-  const pythonPath =
-    options.pythonPath || process.env.BROWSEROS_EVAL_PYTHON || 'python3'
-  const proc = Bun.spawn([pythonPath, options.scriptPath], {
+  const proc = Bun.spawn(['python3', options.scriptPath], {
    stdin: 'pipe',
    stdout: 'pipe',
    stderr: 'pipe',
--- a/packages/browseros-agent/apps/eval/src/publishing/r2-manifest.ts
+++ b/packages/browseros-agent/apps/eval/src/publishing/r2-manifest.ts
@@ -1,8 +1,3 @@
-import type {
-  ViewerManifest,
-  ViewerManifestTask,
-} from '../viewer/viewer-manifest'
-
 export interface R2UploadConfig {
  accountId: string
  accessKeyId: string
@@ -11,9 +6,27 @@ export interface R2UploadConfig {
  cdnBaseUrl: string
 }

-export type R2ManifestTask = ViewerManifestTask
+export interface R2ManifestTask {
+  queryId: string
+  query: string
+  startUrl: string
+  status: string
+  durationMs: number
+  screenshotCount: number
+  graderResults: Record<string, unknown>
+}

-export type R2RunManifest = ViewerManifest
+export interface R2RunManifest {
+  runId: string
+  uploadedAt: string
+  agentConfig?: Record<string, unknown>
+  dataset?: string
+  summary?: {
+    passRate?: unknown
+    avgDurationMs?: unknown
+  }
+  tasks: R2ManifestTask[]
+}

 export interface R2PublishRunResult {
  runId: string
--- a/packages/browseros-agent/apps/eval/src/publishing/r2-publisher.ts
+++ b/packages/browseros-agent/apps/eval/src/publishing/r2-publisher.ts
@@ -5,11 +5,8 @@ import {
  PutObjectCommand,
  S3Client,
 } from '@aws-sdk/client-s3'
-import {
-  buildViewerManifest,
-  type ViewerManifestTaskInput,
-} from '../viewer/viewer-manifest'
 import type {
+  R2ManifestTask,
  R2PublishPathResult,
  R2PublishRunResult,
  R2RunManifest,
@@ -46,6 +43,7 @@ interface UploadJob {
 interface TaskDirEntry {
  taskId: string
  taskPath: string
+  canonicalLayout: boolean
 }

 export function contentTypeForPath(filePath: string): string {
@@ -131,6 +129,7 @@ async function findTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
      legacyTasks.push({
        taskId: entry.name,
        taskPath,
+        canonicalLayout: false,
      })
    }
  }
@@ -147,6 +146,7 @@ async function findTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
      canonicalTasks.push({
        taskId: entry.name,
        taskPath,
+        canonicalLayout: true,
      })
    }
  }
@@ -262,7 +262,7 @@ export class R2Publisher {
      throw new Error(`No task subdirectories in ${runId}`)
    }

-    const manifestTasks: ViewerManifestTaskInput[] = []
+    const manifestTasks: R2ManifestTask[] = []
    const jobs: UploadJob[] = (await collectRunRootFiles(runDir)).map(
      (job) => ({
        ...job,
@@ -289,23 +289,22 @@ export class R2Publisher {
        if (relative.startsWith('screenshots/') && extname(file) === '.png') {
          screenshotCount++
        }
-        // Keep legacy keys during the manifest v2 rollout so cached viewers and
-        // old manifests can still resolve task artifacts.
        jobs.push({
          key: `runs/${runId}/${taskId}/${relative}`,
          filePath: file,
          contentType: contentTypeForPath(file),
        })
-        jobs.push({
-          key: `runs/${runId}/tasks/${taskId}/${relative}`,
-          filePath: file,
-          contentType: contentTypeForPath(file),
-        })
+        if (taskDirEntry.canonicalLayout) {
+          jobs.push({
+            key: `runs/${runId}/tasks/${taskId}/${relative}`,
+            filePath: file,
+            contentType: contentTypeForPath(file),
+          })
+        }
      }

      manifestTasks.push({
        queryId: (meta.query_id as string | undefined) || taskId,
-        artifactId: taskId,
        query: (meta.query as string | undefined) || '',
        startUrl: (meta.start_url as string | undefined) || '',
        status: statusFromMetadata(meta),
@@ -313,8 +312,7 @@ export class R2Publisher {
        screenshotCount:
          (meta.screenshot_count as number | undefined) || screenshotCount,
        graderResults:
-          (meta.grader_results as ViewerManifestTaskInput['graderResults']) ||
-          {},
+          (meta.grader_results as Record<string, unknown> | undefined) || {},
      })
    }

@@ -349,7 +347,7 @@ export class R2Publisher {
    return {
      runId,
      uploadedFiles: uploaded + 2,
-      viewerUrl: `${this.config.cdnBaseUrl}/viewer.html?run=${encodeURIComponent(runId)}`,
+      viewerUrl: `${this.config.cdnBaseUrl}/viewer.html?run=${runId}`,
      manifest,
    }
  }
@@ -371,7 +369,7 @@ export class R2Publisher {
    runId: string,
    agentConfig: Record<string, unknown> | undefined,
    dataset: string | undefined,
-    tasks: ViewerManifestTaskInput[],
+    tasks: R2ManifestTask[],
  ): Promise<R2RunManifest> {
    let summaryData: Record<string, unknown> | undefined
    try {
@@ -380,7 +378,7 @@ export class R2Publisher {
      ) as Record<string, unknown>
    } catch {}

-    return buildViewerManifest({
+    return {
      runId,
      uploadedAt: this.now().toISOString(),
      agentConfig,
@@ -392,7 +390,7 @@ export class R2Publisher {
          }
        : undefined,
      tasks,
-    })
+    }
  }

  private async uploadFile(job: UploadJob): Promise<void> {
--- a/packages/browseros-agent/apps/eval/src/reporting/run-summary.ts
+++ b/packages/browseros-agent/apps/eval/src/reporting/run-summary.ts
@@ -1,104 +0,0 @@
-export interface ReportManifestTask {
-  queryId: string
-  query?: string
-  status: string
-  durationMs: number
-  screenshotCount?: number
-  paths?: Record<string, string>
-  graderResults?: Record<string, { pass?: boolean; score?: number }>
-}
-
-export interface ReportManifest {
-  schemaVersion?: number
-  runId: string
-  uploadedAt?: string
-  agentConfig?: { type?: string; model?: string }
-  dataset?: string
-  summary?: { passRate?: number; avgDurationMs?: number }
-  tasks?: ReportManifestTask[]
-}
-
-export interface RunSummary {
-  runId: string
-  configName: string
-  date: string
-  avgScore: number
-  total: number
-  completed: number
-  failed: number
-  timeout: number
-  avgDurationMs: number
-  model: string
-  dataset: string
-  agentType: string
-}
-
-// Report score uses the primary pass/fail grader so mixed-grader runs keep
-// the same precedence as the eval summary.
-const PASS_FAIL_GRADER_ORDER = [
-  'agisdk_state_diff',
-  'infinity_state',
-  'performance_grader',
-]
-
-export function extractConfigName(runId: string): string {
-  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
-}
-
-function reportDate(manifest: ReportManifest): string {
-  if (!manifest.uploadedAt) return 'unknown'
-  const [date, time] = manifest.uploadedAt.split('T')
-  return `${date} ${time?.slice(0, 5) || ''}`
-}
-
-function primaryScore(task: ReportManifestTask): number | null {
-  if (!task.graderResults) return null
-  for (const name of PASS_FAIL_GRADER_ORDER) {
-    const result = task.graderResults[name]
-    if (result) return result.score ?? 0
-  }
-  return null
-}
-
-export function buildRunSummaries(manifests: ReportManifest[]): RunSummary[] {
-  return manifests
-    .map((manifest) => {
-      const tasks = Array.isArray(manifest.tasks) ? manifest.tasks : []
-      const total = tasks.length
-      const completed = tasks.filter((t) => t.status === 'completed').length
-      const failed = tasks.filter((t) => t.status === 'failed').length
-      const timeout = tasks.filter((t) => t.status === 'timeout').length
-
-      let scoredCount = 0
-      let scoreSum = 0
-      for (const task of tasks) {
-        const score = primaryScore(task)
-        if (score === null) continue
-        scoredCount++
-        scoreSum += score
-      }
-
-      const durations = tasks
-        .filter((t) => t.durationMs > 0)
-        .map((t) => t.durationMs)
-
-      return {
-        runId: manifest.runId,
-        configName: extractConfigName(manifest.runId),
-        date: reportDate(manifest),
-        avgScore: scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0,
-        total,
-        completed,
-        failed,
-        timeout,
-        avgDurationMs:
-          durations.length > 0
-            ? durations.reduce((a, b) => a + b, 0) / durations.length
-            : 0,
-        model: manifest.agentConfig?.model || 'unknown',
-        dataset: manifest.dataset || manifest.runId,
-        agentType: manifest.agentConfig?.type || 'unknown',
-      }
-    })
-    .sort((a, b) => a.date.localeCompare(b.date))
-}
--- a/packages/browseros-agent/apps/eval/src/suites/config-adapter.ts
+++ b/packages/browseros-agent/apps/eval/src/suites/config-adapter.ts
@@ -33,13 +33,6 @@ function variantSource(config: EvalConfig): {
  baseUrl?: string
  supportsImages?: boolean
 } {
-  if (config.agent.type === 'claude-code') {
-    return {
-      provider: 'claude-code',
-      model: config.agent.model ?? 'default',
-    }
-  }
-
  const agent =
    config.agent.type === 'single' ? config.agent : config.agent.orchestrator
  if (!agent.model) {
@@ -83,7 +76,10 @@ export async function adaptEvalConfigFile(
    suite: {
      id,
      dataset: evalConfig.dataset,
-      agent: suiteAgent(evalConfig, backend),
+      agent:
+        evalConfig.agent.type === 'single'
+          ? { type: 'tool-loop' }
+          : { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' },
      graders: evalConfig.graders ?? [],
      workers: evalConfig.num_workers,
      restartBrowserPerTask: evalConfig.restart_server_per_task,
@@ -103,17 +99,3 @@ export async function adaptEvalConfigFile(
    }),
  }
 }
-
-function suiteAgent(
-  config: EvalConfig,
-  backend: ReturnType<typeof executorBackend>,
-): EvalSuite['agent'] {
-  switch (config.agent.type) {
-    case 'single':
-      return { type: 'tool-loop' }
-    case 'orchestrator-executor':
-      return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
-    case 'claude-code':
-      return { type: 'claude-code' }
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/suites/resolve-variant.ts
+++ b/packages/browseros-agent/apps/eval/src/suites/resolve-variant.ts
@@ -57,30 +57,10 @@ export function resolveVariant(
  options: ResolveVariantOptions = {},
 ): EvalVariant {
  const env = options.env ?? process.env
+  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const provider =
    options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible'
  const model = options.model ?? env.EVAL_AGENT_MODEL
-
-  if (provider === 'claude-code') {
-    const id = options.variantId ?? env.EVAL_VARIANT ?? 'claude-code'
-    return {
-      id,
-      agent: {
-        provider,
-        model: model ?? '',
-      },
-      publicMetadata: {
-        id,
-        agent: {
-          provider,
-          model: model || 'default',
-          apiKeyConfigured: false,
-        },
-      },
-    }
-  }
-
-  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const apiKey = options.apiKey ?? env.EVAL_AGENT_API_KEY
  const apiKeyEnv =
    options.apiKeyEnv ?? (options.apiKey ? undefined : 'EVAL_AGENT_API_KEY')
--- a/packages/browseros-agent/apps/eval/src/suites/schema.ts
+++ b/packages/browseros-agent/apps/eval/src/suites/schema.ts
@@ -8,7 +8,6 @@ export const SuiteAgentSchema = z
      'single',
      'orchestrated',
      'orchestrator-executor',
-      'claude-code',
    ]),
    executorBackend: z.enum(['tool-loop', 'clado']).optional(),
  })
--- a/packages/browseros-agent/apps/eval/src/types/config.ts
+++ b/packages/browseros-agent/apps/eval/src/types/config.ts
@@ -19,19 +19,9 @@ export const OrchestratorExecutorConfigSchema = z.object({
  }),
 })

-export const ClaudeCodeAgentConfigSchema = z
-  .object({
-    type: z.literal('claude-code'),
-    model: z.string().min(1).optional(),
-    claudePath: z.string().min(1).default('claude'),
-    extraArgs: z.array(z.string()).default([]),
-  })
-  .strict()
-
 export const AgentConfigSchema = z.discriminatedUnion('type', [
  SingleAgentConfigSchema,
  OrchestratorExecutorConfigSchema,
-  ClaudeCodeAgentConfigSchema,
 ])

 export const EvalConfigSchema = z.object({
@@ -63,6 +53,5 @@ export type SingleAgentConfig = z.infer<typeof SingleAgentConfigSchema>
 export type OrchestratorExecutorConfig = z.infer<
  typeof OrchestratorExecutorConfigSchema
 >
-export type ClaudeCodeAgentConfig = z.infer<typeof ClaudeCodeAgentConfigSchema>
 export type AgentConfig = z.infer<typeof AgentConfigSchema>
 export type EvalConfig = z.infer<typeof EvalConfigSchema>
--- a/packages/browseros-agent/apps/eval/src/types/index.ts
+++ b/packages/browseros-agent/apps/eval/src/types/index.ts
@@ -2,8 +2,6 @@
 export {
  type AgentConfig,
  AgentConfigSchema,
-  type ClaudeCodeAgentConfig,
-  ClaudeCodeAgentConfigSchema,
  type EvalConfig,
  EvalConfigSchema,
  type OrchestratorExecutorConfig,
--- a/packages/browseros-agent/apps/eval/src/types/result.ts
+++ b/packages/browseros-agent/apps/eval/src/types/result.ts
@@ -13,7 +13,7 @@ export const GraderResultSchema = z.object({
 // Agent config in metadata
 const AgentConfigMetaSchema = z
  .object({
-    type: z.enum(['single', 'orchestrator-executor', 'claude-code']),
+    type: z.enum(['single', 'orchestrator-executor']),
    model: z.string().optional(),
  })
  .passthrough()
--- a/packages/browseros-agent/apps/eval/src/utils/config-validator.ts
+++ b/packages/browseros-agent/apps/eval/src/utils/config-validator.ts
@@ -59,7 +59,7 @@ export async function validateConfig(
    ) {
      envVarsToCheck.push(config.agent.apiKey)
    }
-  } else if (config.agent.type === 'orchestrator-executor') {
+  } else {
    const { orchestrator, executor } = config.agent
    if (orchestrator.apiKey && isEnvVarName(orchestrator.apiKey)) {
      envVarsToCheck.push(orchestrator.apiKey)
--- a/packages/browseros-agent/apps/eval/src/viewer/viewer-manifest.ts
+++ b/packages/browseros-agent/apps/eval/src/viewer/viewer-manifest.ts
@@ -1,20 +1,7 @@
 import type { GraderResult } from '../types'

-export const VIEWER_MANIFEST_SCHEMA_VERSION = 2
-
-export interface ViewerManifestTaskPaths {
-  attempt: string
-  metadata: string
-  messages: string
-  trace: string
-  grades: string
-  screenshots: string
-  graderArtifacts: string
-}
-
 export interface ViewerManifestTaskInput {
  queryId: string
-  artifactId?: string
  query: string
  startUrl?: string
  status: string
@@ -23,67 +10,57 @@ export interface ViewerManifestTaskInput {
  graderResults: Record<string, GraderResult>
 }

-export interface ViewerManifestTask
-  extends Omit<ViewerManifestTaskInput, 'artifactId'> {
-  startUrl: string
-  paths: ViewerManifestTaskPaths
+export interface ViewerManifestTask extends ViewerManifestTaskInput {
+  paths: {
+    attempt: string
+    metadata: string
+    messages: string
+    trace: string
+    grades: string
+    screenshots: string
+    graderArtifacts: string
+  }
 }

 export interface ViewerManifest {
-  schemaVersion: typeof VIEWER_MANIFEST_SCHEMA_VERSION
  runId: string
-  suiteId?: string
-  variantId?: string
+  suiteId: string
+  variantId: string
  uploadedAt?: string
-  agentConfig?: Record<string, unknown>
-  dataset?: string
-  summary?: Record<string, unknown>
+  summary: Record<string, unknown>
  tasks: ViewerManifestTask[]
 }

 export interface BuildViewerManifestInput {
  runId: string
-  suiteId?: string
-  variantId?: string
+  suiteId: string
+  variantId: string
  uploadedAt?: string
-  agentConfig?: Record<string, unknown>
-  dataset?: string
-  summary?: Record<string, unknown>
+  summary: Record<string, unknown>
  tasks: ViewerManifestTaskInput[]
 }

-function taskPaths(queryId: string): ViewerManifestTaskPaths {
-  return {
-    attempt: `tasks/${queryId}/attempt.json`,
-    metadata: `tasks/${queryId}/metadata.json`,
-    messages: `tasks/${queryId}/messages.jsonl`,
-    trace: `tasks/${queryId}/trace.jsonl`,
-    grades: `tasks/${queryId}/grades.json`,
-    screenshots: `tasks/${queryId}/screenshots`,
-    graderArtifacts: `tasks/${queryId}/grader-artifacts`,
-  }
-}
-
 /** Builds the compact JSON index consumed by the static R2 viewer. */
 export function buildViewerManifest(
  input: BuildViewerManifestInput,
 ): ViewerManifest {
  return {
-    schemaVersion: VIEWER_MANIFEST_SCHEMA_VERSION,
    runId: input.runId,
-    ...(input.suiteId ? { suiteId: input.suiteId } : {}),
-    ...(input.variantId ? { variantId: input.variantId } : {}),
-    ...(input.uploadedAt ? { uploadedAt: input.uploadedAt } : {}),
-    ...(input.agentConfig ? { agentConfig: input.agentConfig } : {}),
-    ...(input.dataset ? { dataset: input.dataset } : {}),
-    ...(input.summary ? { summary: input.summary } : {}),
-    tasks: input.tasks.map((task) => {
-      const { artifactId, ...publicTask } = task
-      return {
-        ...publicTask,
-        startUrl: publicTask.startUrl ?? '',
-        paths: taskPaths(artifactId ?? publicTask.queryId),
-      }
-    }),
+    suiteId: input.suiteId,
+    variantId: input.variantId,
+    uploadedAt: input.uploadedAt,
+    summary: input.summary,
+    tasks: input.tasks.map((task) => ({
+      ...task,
+      paths: {
+        attempt: `tasks/${task.queryId}/attempt.json`,
+        metadata: `tasks/${task.queryId}/metadata.json`,
+        messages: `tasks/${task.queryId}/messages.jsonl`,
+        trace: `tasks/${task.queryId}/trace.jsonl`,
+        grades: `tasks/${task.queryId}/grades.json`,
+        screenshots: `tasks/${task.queryId}/screenshots`,
+        graderArtifacts: `tasks/${task.queryId}/grader-artifacts`,
+      },
+    })),
  }
 }
--- a/packages/browseros-agent/apps/eval/tests/agents/claude-code-evaluator.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/agents/claude-code-evaluator.test.ts
@@ -1,268 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import { mkdtemp, readFile } from 'node:fs/promises'
-import { tmpdir } from 'node:os'
-import { join } from 'node:path'
-import { createAgent } from '../../src/agents'
-import { ClaudeCodeEvaluator } from '../../src/agents/claude-code'
-import { CaptureContext } from '../../src/capture/context'
-import {
-  AgentConfigSchema,
-  type EvalConfig,
-  EvalConfigSchema,
-  type Task,
-  TaskMetadataSchema,
-} from '../../src/types'
-
-function config(): EvalConfig {
-  return {
-    agent: {
-      type: 'claude-code',
-      model: 'opus',
-      claudePath: 'claude',
-      extraArgs: [],
-    },
-    dataset: 'data/test.jsonl',
-    num_workers: 1,
-    restart_server_per_task: false,
-    browseros: {
-      server_url: 'http://127.0.0.1:9110',
-      base_cdp_port: 9010,
-      base_server_port: 9110,
-      base_extension_port: 9310,
-      load_extensions: false,
-      headless: false,
-    },
-    graders: [],
-  }
-}
-
-const task: Task = {
-  query_id: 'task-1',
-  dataset: 'test',
-  query: 'Find the title',
-  graders: [],
-  metadata: {
-    original_task_id: 'task-1',
-  },
-}
-
-describe('ClaudeCodeEvaluator', () => {
-  it('accepts claude-code config defaults without permission mode', () => {
-    const agent = AgentConfigSchema.parse({ type: 'claude-code' })
-
-    expect(agent).toEqual({
-      type: 'claude-code',
-      claudePath: 'claude',
-      extraArgs: [],
-    })
-  })
-
-  it('accepts claude-code as a runnable eval agent', () => {
-    const parsed = EvalConfigSchema.parse({
-      agent: {
-        type: 'claude-code',
-        model: 'opus',
-      },
-      dataset: 'data/test-set.jsonl',
-      browseros: {
-        server_url: 'http://127.0.0.1:9110',
-      },
-    })
-
-    expect(parsed.agent.type).toBe('claude-code')
-    expect(parsed.agent.model).toBe('opus')
-  })
-
-  it('rejects unsupported claude-code settings instead of silently ignoring them', () => {
-    expect(
-      AgentConfigSchema.safeParse({
-        type: 'claude-code',
-        permissionMode: 'bypassPermissions',
-      }).success,
-    ).toBe(false)
-    expect(
-      AgentConfigSchema.safeParse({
-        type: 'claude-code',
-        maxTurns: 3,
-      }).success,
-    ).toBe(false)
-  })
-
-  it('allows claude-code in task metadata', () => {
-    const metadata = TaskMetadataSchema.parse({
-      query_id: 'task-1',
-      dataset: 'test',
-      query: 'Do the thing',
-      started_at: new Date().toISOString(),
-      completed_at: new Date().toISOString(),
-      total_duration_ms: 100,
-      total_steps: 1,
-      termination_reason: 'completed',
-      final_answer: 'done',
-      errors: [],
-      warnings: [],
-      agent_config: {
-        type: 'claude-code',
-        model: 'opus',
-      },
-      grader_results: {},
-    })
-
-    expect(metadata.agent_config.type).toBe('claude-code')
-  })
-
-  it('is created by the agent factory', async () => {
-    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
-    const { capture, taskOutputDir } = await CaptureContext.create({
-      serverUrl: 'http://127.0.0.1:9110',
-      outputDir,
-      taskId: task.query_id,
-      initialPageId: 1,
-    })
-
-    const agent = createAgent({
-      config: config(),
-      task,
-      workerIndex: 0,
-      initialPageId: 1,
-      outputDir,
-      taskOutputDir,
-      capture,
-    })
-
-    expect(agent).toBeInstanceOf(ClaudeCodeEvaluator)
-  })
-
-  it('runs claude code, logs messages, writes MCP config, and saves metadata', async () => {
-    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
-    const { capture, taskOutputDir } = await CaptureContext.create({
-      serverUrl: 'http://127.0.0.1:9110',
-      outputDir,
-      taskId: task.query_id,
-      initialPageId: 1,
-    })
-    const calls: Array<{ executable: string; args: string[]; cwd: string }> = []
-    const evaluator = new ClaudeCodeEvaluator(
-      {
-        config: config(),
-        task,
-        workerIndex: 0,
-        initialPageId: 1,
-        outputDir,
-        taskOutputDir,
-        capture,
-      },
-      {
-        processRunner: {
-          async run(options) {
-            calls.push(options)
-            await options.onStdoutLine(
-              JSON.stringify({
-                type: 'assistant',
-                message: {
-                  content: [{ type: 'text', text: 'The title is Example' }],
-                },
-              }),
-            )
-            await options.onStdoutLine(
-              JSON.stringify({
-                type: 'result',
-                subtype: 'success',
-                result: 'The title is Example',
-              }),
-            )
-            return { exitCode: 0, stderr: '' }
-          },
-        },
-      },
-    )
-
-    const result = await evaluator.execute()
-
-    expect(result.finalAnswer).toBe('The title is Example')
-    expect(result.metadata.agent_config).toMatchObject({
-      type: 'claude-code',
-      model: 'opus',
-    })
-    expect(result.messages.some((msg) => msg.type === 'user')).toBe(true)
-    expect(result.messages.some((msg) => msg.type === 'text-delta')).toBe(true)
-    const mcpConfig = JSON.parse(
-      await readFile(join(taskOutputDir, 'claude-code-mcp.json'), 'utf-8'),
-    )
-    expect(mcpConfig.mcpServers.browseros).toMatchObject({
-      type: 'http',
-      url: 'http://127.0.0.1:9110/mcp',
-      headers: {
-        'X-BrowserOS-Source': 'sdk-internal',
-      },
-    })
-    expect(calls).toEqual([
-      expect.objectContaining({
-        executable: 'claude',
-        cwd: taskOutputDir,
-        args: [
-          '-p',
-          expect.stringContaining('Task: Find the title'),
-          '--mcp-config',
-          join(taskOutputDir, 'claude-code-mcp.json'),
-          '--strict-mcp-config',
-          '--output-format',
-          'stream-json',
-          '--verbose',
-          '--model',
-          'opus',
-        ],
-      }),
-    ])
-    expect(calls[0].args).not.toContain('--permission-mode')
-  })
-
-  it('records non-fatal stream processing errors as warnings', async () => {
-    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
-    const { capture, taskOutputDir } = await CaptureContext.create({
-      serverUrl: 'http://127.0.0.1:9110',
-      outputDir,
-      taskId: task.query_id,
-      initialPageId: 1,
-    })
-    const evaluator = new ClaudeCodeEvaluator(
-      {
-        config: config(),
-        task,
-        workerIndex: 0,
-        initialPageId: 1,
-        outputDir,
-        taskOutputDir,
-        capture,
-      },
-      {
-        processRunner: {
-          async run(options) {
-            await options.onStdoutLine(
-              JSON.stringify({
-                type: 'result',
-                subtype: 'success',
-                result: 'done',
-              }),
-            )
-            return {
-              exitCode: 0,
-              stderr: '',
-              streamErrors: ['bad stream line'],
-            }
-          },
-        },
-      },
-    )
-
-    const result = await evaluator.execute()
-
-    expect(result.finalAnswer).toBe('done')
-    expect(result.metadata.warnings).toEqual([
-      expect.objectContaining({
-        source: 'message_logging',
-        message: 'Claude Code stream event processing failed: bad stream line',
-      }),
-    ])
-  })
-})
--- a/packages/browseros-agent/apps/eval/tests/agents/claude-code-process-runner.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/agents/claude-code-process-runner.test.ts
@@ -1,78 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
-import { tmpdir } from 'node:os'
-import { join } from 'node:path'
-import { createClaudeCodeProcessRunner } from '../../src/agents/claude-code/process-runner'
-
-async function writeStdoutScript(): Promise<string> {
-  const dir = await mkdtemp(join(tmpdir(), 'claude-code-runner-'))
-  const script = join(dir, 'stdout-lines')
-  await writeFile(script, '#!/bin/sh\nprintf "first\\nbad\\nlast\\n"\n')
-  await chmod(script, 0o755)
-  return script
-}
-
-describe('createClaudeCodeProcessRunner', () => {
-  it('passes executable and args to the spawn dependency', async () => {
-    const calls: unknown[] = []
-    const runner = createClaudeCodeProcessRunner({
-      spawn: async (cmd, options) => {
-        calls.push({ cmd, options })
-        await options.onStdoutLine('{"type":"result","result":"done"}')
-        return { exitCode: 0, stderr: '' }
-      },
-    })
-
-    const result = await runner.run({
-      executable: 'claude',
-      args: ['-p', 'hello'],
-      cwd: '/tmp',
-      signal: new AbortController().signal,
-      onStdoutLine: async () => {},
-    })
-
-    expect(result.exitCode).toBe(0)
-    expect(calls).toEqual([
-      {
-        cmd: ['claude', '-p', 'hello'],
-        options: expect.objectContaining({ cwd: '/tmp' }),
-      },
-    ])
-  })
-
-  it('returns stderr and non-zero exit codes', async () => {
-    const runner = createClaudeCodeProcessRunner({
-      spawn: async () => ({ exitCode: 2, stderr: 'bad auth' }),
-    })
-
-    const result = await runner.run({
-      executable: 'claude',
-      args: [],
-      cwd: '/tmp',
-      signal: new AbortController().signal,
-      onStdoutLine: async () => {},
-    })
-
-    expect(result).toEqual({ exitCode: 2, stderr: 'bad auth' })
-  })
-
-  it('continues reading stdout after a line handler error', async () => {
-    const script = await writeStdoutScript()
-    const lines: string[] = []
-    const runner = createClaudeCodeProcessRunner()
-
-    const result = await runner.run({
-      executable: script,
-      args: [],
-      cwd: '/tmp',
-      onStdoutLine: async (line) => {
-        lines.push(line)
-        if (line === 'bad') throw new Error('bad line')
-      },
-    })
-
-    expect(result.exitCode).toBe(0)
-    expect(result.streamErrors).toEqual(['bad line'])
-    expect(lines).toEqual(['first', 'bad', 'last'])
-  })
-})
--- a/packages/browseros-agent/apps/eval/tests/agents/claude-code-stream-parser.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/agents/claude-code-stream-parser.test.ts
@@ -1,102 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  ClaudeCodeStreamParser,
-  shouldCaptureScreenshotForTool,
-} from '../../src/agents/claude-code/stream-parser'
-
-describe('ClaudeCodeStreamParser', () => {
-  it('maps assistant text and MCP tool use into eval stream events', () => {
-    const parser = new ClaudeCodeStreamParser()
-    const events = parser.pushLine(
-      JSON.stringify({
-        type: 'assistant',
-        message: {
-          content: [
-            { type: 'text', text: 'I will navigate.' },
-            {
-              type: 'tool_use',
-              id: 'toolu_1',
-              name: 'mcp__browseros__navigate_page',
-              input: { page: 2, url: 'https://example.com' },
-            },
-          ],
-        },
-      }),
-    )
-
-    expect(events).toEqual([
-      { type: 'text-start', id: expect.any(String) },
-      {
-        type: 'text-delta',
-        id: expect.any(String),
-        delta: 'I will navigate.',
-      },
-      { type: 'text-end', id: expect.any(String) },
-      {
-        type: 'tool-input-available',
-        toolCallId: 'toolu_1',
-        toolName: 'mcp__browseros__navigate_page',
-        input: { page: 2, url: 'https://example.com' },
-      },
-    ])
-    expect(parser.getLastText()).toBe('I will navigate.')
-    expect(parser.getToolCallCount()).toBe(1)
-  })
-
-  it('maps Claude Code tool results into eval output events', () => {
-    const parser = new ClaudeCodeStreamParser()
-    const events = parser.pushLine(
-      JSON.stringify({
-        type: 'user',
-        message: {
-          content: [
-            {
-              type: 'tool_result',
-              tool_use_id: 'toolu_1',
-              content: 'Navigated successfully',
-            },
-          ],
-        },
-      }),
-    )
-
-    expect(events).toEqual([
-      {
-        type: 'tool-output-available',
-        toolCallId: 'toolu_1',
-        output: 'Navigated successfully',
-      },
-    ])
-  })
-
-  it('uses result messages as the authoritative final text', () => {
-    const parser = new ClaudeCodeStreamParser()
-    parser.pushLine(
-      JSON.stringify({
-        type: 'assistant',
-        message: {
-          content: [{ type: 'text', text: 'I will complete the task.' }],
-        },
-      }),
-    )
-    parser.pushLine(
-      JSON.stringify({
-        type: 'result',
-        subtype: 'success',
-        result: 'Final answer',
-      }),
-    )
-
-    expect(parser.getLastText()).toBe('Final answer')
-  })
-
-  it('identifies BrowserOS MCP tools that should trigger screenshots', () => {
-    expect(
-      shouldCaptureScreenshotForTool('mcp__browseros__navigate_page'),
-    ).toBe(true)
-    expect(
-      shouldCaptureScreenshotForTool('mcp__browseros__take_screenshot'),
-    ).toBe(false)
-    expect(shouldCaptureScreenshotForTool('Read')).toBe(false)
-  })
-})
--- a/packages/browseros-agent/apps/eval/tests/agents/executor-backend.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/agents/executor-backend.test.ts
@@ -1,10 +1,8 @@
 import { describe, expect, it } from 'bun:test'
-import { CladoExecutorBackend } from '../../src/agents/orchestrated/backends/clado/clado-executor-backend'
 import {
  backendKindForProvider,
  createExecutorBackend,
 } from '../../src/agents/orchestrated/backends/create-executor-backend'
-import { ToolLoopExecutorBackend } from '../../src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend'
 import type { ExecutorBackend } from '../../src/agents/orchestrated/executor-backend'

 describe('executor backend boundary', () => {
@@ -13,32 +11,6 @@ describe('executor backend boundary', () => {
    expect(backendKindForProvider('openai-compatible')).toBe('tool-loop')
  })

-  it('creates concrete backend classes for each executor path', () => {
-    expect(
-      createExecutorBackend({
-        backendKind: 'tool-loop',
-        configTemplate: {
-          provider: 'openai-compatible',
-          model: 'tool-loop-model',
-        },
-        browser: null,
-        serverUrl: 'http://127.0.0.1:9110',
-      }),
-    ).toBeInstanceOf(ToolLoopExecutorBackend)
-
-    expect(
-      createExecutorBackend({
-        backendKind: 'clado',
-        configTemplate: {
-          provider: 'clado-action',
-          model: 'clado-model',
-          baseUrl: 'https://clado.example.test',
-        },
-        serverUrl: 'http://127.0.0.1:9110',
-      }),
-    ).toBeInstanceOf(CladoExecutorBackend)
-  })
-
  it('forwards execution and step state through the backend interface', async () => {
    const signal = new AbortController().signal
    const fakeBackend: ExecutorBackend = {
@@ -61,6 +33,7 @@ describe('executor backend boundary', () => {
    }

    const backend = createExecutorBackend({
+      backendKind: 'tool-loop',
      executor: fakeBackend,
    })
    const result = await backend.execute('Click checkout', signal)
--- a/packages/browseros-agent/apps/eval/tests/cli/suite-command.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/cli/suite-command.test.ts
@@ -7,11 +7,8 @@ import {
  runSuiteCommand,
 } from '../../src/cli/commands/suite'
 import type { RunEvalOptions } from '../../src/runner/types'
-import type { EvalSuite } from '../../src/suites/schema'

-async function writeTempSuite(
-  overrides: Partial<EvalSuite> = {},
-): Promise<{ dir: string; suitePath: string }> {
+async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
  const dir = await mkdtemp(join(tmpdir(), 'eval-suite-cli-'))
  const suitePath = join(dir, 'agisdk-daily-10.json')
  await writeFile(
@@ -26,9 +23,8 @@ async function writeTempSuite(
        restartBrowserPerTask: true,
        browseros: {
          server_url: 'http://127.0.0.1:9110',
-          headless: false,
+          headless: true,
        },
-        ...overrides,
      },
      null,
      2,
@@ -47,7 +43,9 @@ describe('suite command', () => {

    expect(resolved.kind).toBe('config')
    expect(resolved.suite.id).toBe('browseros-agent-weekly')
-    expect(resolved.evalConfig.dataset).toBe('../../data/agisdk-real.jsonl')
+    expect(resolved.evalConfig.dataset).toBe(
+      '../../data/webbench-2of4-50.jsonl',
+    )
    expect(resolved.variant.publicMetadata.agent.apiKeyConfigured).toBe(true)
  })

@@ -77,25 +75,6 @@ describe('suite command', () => {
    expect(resolved.evalConfig.num_workers).toBe(2)
  })

-  it('resolves claude-code suites without provider API credentials', async () => {
-    const { dir, suitePath } = await writeTempSuite({
-      agent: { type: 'claude-code' },
-    })
-
-    const resolved = await resolveSuiteCommand({
-      suitePath,
-      model: 'opus',
-      env: {},
-    })
-
-    expect(resolved.kind).toBe('suite')
-    expect(resolved.evalConfig.agent).toMatchObject({
-      type: 'claude-code',
-      model: 'opus',
-    })
-    expect(resolved.datasetPath).toBe(join(dir, 'tasks.jsonl'))
-  })
-
  it('runs config and suite commands through the runner dependency', async () => {
    const calls: RunEvalOptions[] = []
    await runSuiteCommand(
--- a/packages/browseros-agent/apps/eval/tests/grading/python-evaluator.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/grading/python-evaluator.test.ts
@@ -1,5 +1,5 @@
 import { describe, expect, it } from 'bun:test'
-import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
+import { mkdtemp, writeFile } from 'node:fs/promises'
 import { tmpdir } from 'node:os'
 import { join } from 'node:path'
 import { runPythonJsonEvaluator } from '../../src/grading/python-evaluator'
@@ -11,17 +11,6 @@ async function writeScript(source: string): Promise<string> {
  return script
 }

-async function writePythonWrapper(): Promise<string> {
-  const dir = await mkdtemp(join(tmpdir(), 'eval-python-wrapper-'))
-  const wrapper = join(dir, 'python-wrapper')
-  await writeFile(
-    wrapper,
-    '#!/bin/sh\necho custom-python >&2\nexec python3 "$@"\n',
-  )
-  await chmod(wrapper, 0o755)
-  return wrapper
-}
-
 describe('runPythonJsonEvaluator', () => {
  it('sends JSON on stdin, captures stderr, and parses stdout JSON', async () => {
    const script = await writeScript(`
@@ -60,34 +49,6 @@ sys.exit(3)
    ).rejects.toThrow('bad verifier')
  })

-  it('uses BROWSEROS_EVAL_PYTHON when provided', async () => {
-    const script = await writeScript(`
-import json, sys
-data = json.loads(sys.stdin.read())
-print(json.dumps({"ok": data["ok"]}))
-`)
-    const wrapper = await writePythonWrapper()
-    const previousPythonPath = process.env.BROWSEROS_EVAL_PYTHON
-    process.env.BROWSEROS_EVAL_PYTHON = wrapper
-
-    try {
-      const result = await runPythonJsonEvaluator<{ ok: boolean }>({
-        scriptPath: script,
-        input: { ok: true },
-        timeoutMs: 5_000,
-      })
-
-      expect(result.output).toEqual({ ok: true })
-      expect(result.stderr).toContain('custom-python')
-    } finally {
-      if (previousPythonPath === undefined) {
-        delete process.env.BROWSEROS_EVAL_PYTHON
-      } else {
-        process.env.BROWSEROS_EVAL_PYTHON = previousPythonPath
-      }
-    }
-  })
-
  it('enforces timeouts', async () => {
    const script = await writeScript(`
 import time
--- a/packages/browseros-agent/apps/eval/tests/publishing/r2-publisher.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/publishing/r2-publisher.test.ts
@@ -26,7 +26,6 @@ async function writeRunFixture(
  root: string,
  configName = 'browseros-agent-weekly',
  timestamp = '2026-04-29-1200',
-  options: { queryId?: string } = {},
 ): Promise<{ runDir: string; runId: string }> {
  const runDir = join(root, configName, timestamp)
  const taskDir = join(runDir, 'task-1')
@@ -34,7 +33,7 @@ async function writeRunFixture(
  await writeFile(
    join(taskDir, 'metadata.json'),
    JSON.stringify({
-      query_id: options.queryId ?? 'task-1',
+      query_id: 'task-1',
      dataset: 'webbench',
      query: 'Find pricing',
      start_url: 'https://example.test',
@@ -95,15 +94,6 @@ describe('R2Publisher', () => {
    expect(
      byKey.get(`runs/${runId}/task-1/screenshots/1.png`)?.ContentType,
    ).toBe('image/png')
-    expect(
-      byKey.get(`runs/${runId}/tasks/task-1/metadata.json`)?.ContentType,
-    ).toBe('application/json')
-    expect(
-      byKey.get(`runs/${runId}/tasks/task-1/messages.jsonl`)?.ContentType,
-    ).toBe('application/x-ndjson')
-    expect(
-      byKey.get(`runs/${runId}/tasks/task-1/screenshots/1.png`)?.ContentType,
-    ).toBe('image/png')
    expect(byKey.get(`runs/${runId}/manifest.json`)?.ContentType).toBe(
      'application/json',
    )
@@ -121,10 +111,8 @@ describe('R2Publisher', () => {
      ).toString('utf-8'),
    )
    expect(manifest).toMatchObject({
-      schemaVersion: 2,
      runId,
      uploadedAt: '2026-04-29T12:00:00.000Z',
-      agentConfig: { type: 'single', model: 'kimi' },
      dataset: 'webbench',
      summary: { passRate: 1, avgDurationMs: 1200 },
      tasks: [
@@ -132,86 +120,11 @@ describe('R2Publisher', () => {
          queryId: 'task-1',
          status: 'completed',
          screenshotCount: 1,
-          paths: {
-            attempt: 'tasks/task-1/attempt.json',
-            metadata: 'tasks/task-1/metadata.json',
-            messages: 'tasks/task-1/messages.jsonl',
-            trace: 'tasks/task-1/trace.jsonl',
-            grades: 'tasks/task-1/grades.json',
-            screenshots: 'tasks/task-1/screenshots',
-            graderArtifacts: 'tasks/task-1/grader-artifacts',
-          },
        },
      ],
    })
  })

-  it('uses task directory ids for canonical paths when metadata query ids differ', async () => {
-    const dir = await mkdtemp(join(tmpdir(), 'eval-r2-path-id-'))
-    const { runDir, runId } = await writeRunFixture(
-      dir,
-      'weekly',
-      '2026-04-29-1200',
-      { queryId: 'query-id-from-metadata' },
-    )
-    const viewerPath = join(dir, 'viewer.html')
-    await writeFile(viewerPath, '<html>viewer</html>')
-    const client = new FakeR2Client()
-
-    await new R2Publisher({
-      client,
-      viewerPath,
-      config: {
-        accountId: 'acct',
-        accessKeyId: 'key',
-        secretAccessKey: 'secret',
-        bucket: 'bucket',
-        cdnBaseUrl: 'https://eval.example.test',
-      },
-      now: () => new Date('2026-04-29T12:00:00.000Z'),
-    }).publishRun(runDir, runId)
-
-    const byKey = new Map(client.puts.map((put) => [put.Key, put]))
-    const manifest = JSON.parse(
-      Buffer.from(
-        byKey.get(`runs/${runId}/manifest.json`)?.Body as Buffer,
-      ).toString('utf-8'),
-    )
-
-    expect(byKey.has(`runs/${runId}/tasks/task-1/metadata.json`)).toBe(true)
-    expect(manifest.tasks[0]).toMatchObject({
-      queryId: 'query-id-from-metadata',
-      paths: {
-        metadata: 'tasks/task-1/metadata.json',
-        screenshots: 'tasks/task-1/screenshots',
-      },
-    })
-  })
-
-  it('encodes run ids in returned viewer urls', async () => {
-    const dir = await mkdtemp(join(tmpdir(), 'eval-r2-viewer-url-'))
-    const { runDir } = await writeRunFixture(dir)
-    const viewerPath = join(dir, 'viewer.html')
-    await writeFile(viewerPath, '<html>viewer</html>')
-    const client = new FakeR2Client()
-
-    const result = await new R2Publisher({
-      client,
-      viewerPath,
-      config: {
-        accountId: 'acct',
-        accessKeyId: 'key',
-        secretAccessKey: 'secret',
-        bucket: 'bucket',
-        cdnBaseUrl: 'https://eval.example.test',
-      },
-    }).publishRun(runDir, 'run with spaces')
-
-    expect(result.viewerUrl).toBe(
-      'https://eval.example.test/viewer.html?run=run%20with%20spaces',
-    )
-  })
-
  it('publishes unuploaded runs from a config results directory', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'eval-r2-config-'))
    const first = await writeRunFixture(dir, 'weekly', '2026-04-29-1200')
@@ -273,27 +186,8 @@ describe('R2Publisher', () => {
    }).publishPath(runDir)

    const keys = client.puts.map((put) => put.Key)
-    const byKey = new Map(client.puts.map((put) => [put.Key, put]))
-    const manifest = JSON.parse(
-      Buffer.from(
-        byKey.get(`runs/${runId}/manifest.json`)?.Body as Buffer,
-      ).toString('utf-8'),
-    )
-
    expect(result.uploadedRuns.map((run) => run.runId)).toEqual([runId])
    expect(keys).toContain(`runs/${runId}/task-1/metadata.json`)
    expect(keys).toContain(`runs/${runId}/tasks/task-1/metadata.json`)
-    expect(manifest).toMatchObject({
-      schemaVersion: 2,
-      tasks: [
-        {
-          queryId: 'task-1',
-          paths: {
-            metadata: 'tasks/task-1/metadata.json',
-            screenshots: 'tasks/task-1/screenshots',
-          },
-        },
-      ],
-    })
  })
 })
--- a/packages/browseros-agent/apps/eval/tests/publishing/r2-viewer-compat.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/publishing/r2-viewer-compat.test.ts
@@ -1,130 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import { readFile } from 'node:fs/promises'
-import { join } from 'node:path'
-
-interface ViewerPathResolvers {
-  artifactUrl(task: Record<string, unknown>, artifact: string): string
-  metadataUrl(task: Record<string, unknown>): string
-  messagesUrl(task: Record<string, unknown>): string
-  screenshotUrl(task: Record<string, unknown>, step: number): string
-}
-
-async function loadViewerPathResolvers(): Promise<ViewerPathResolvers> {
-  const html = await readFile(
-    join(import.meta.dir, '..', '..', 'src', 'dashboard', 'viewer.html'),
-    'utf-8',
-  )
-  const start = html.indexOf('// -- Artifact path resolution')
-  const end = html.indexOf('// -- Task selection', start)
-  expect(start).toBeGreaterThan(-1)
-  expect(end).toBeGreaterThan(start)
-
-  const block = html.slice(start, end)
-  const createResolvers = new Function(
-    `
-      const basePath = 'runs/run-1';
-      ${block}
-      return { artifactUrl, metadataUrl, messagesUrl, screenshotUrl };
-    `,
-  ) as () => ViewerPathResolvers
-  return createResolvers()
-}
-
-async function runAutoSelectFromHash(hash: string): Promise<unknown> {
-  const html = await readFile(
-    join(import.meta.dir, '..', '..', 'src', 'dashboard', 'viewer.html'),
-    'utf-8',
-  )
-  const start = html.indexOf('function autoSelectFromHash()')
-  const end = html.indexOf('// -- Center panel', start)
-  expect(start).toBeGreaterThan(-1)
-  expect(end).toBeGreaterThan(start)
-
-  const block = html.slice(start, end)
-  const runAutoSelect = new Function(
-    `
-      const window = { location: { hash: ${JSON.stringify(hash)} } };
-      const manifest = {
-        tasks: [
-          { queryId: 'legacy-task' },
-          { queryId: 'new-task', paths: { metadata: 'tasks/new-task/metadata.json' } },
-        ],
-      };
-      let selected = null;
-      function selectTask(task) { selected = task; }
-      ${block}
-      autoSelectFromHash();
-      return selected;
-    `,
-  ) as () => unknown
-  return runAutoSelect()
-}
-
-describe('R2 viewer artifact path compatibility', () => {
-  it('uses explicit manifest paths for new uploaded runs', async () => {
-    const resolvers = await loadViewerPathResolvers()
-    const task = {
-      queryId: 'task-1',
-      paths: {
-        metadata: 'tasks/task-1/metadata.json',
-        messages: 'tasks/task-1/messages.jsonl',
-        grades: 'tasks/task-1/grades.json',
-        trace: 'tasks/task-1/trace.jsonl',
-        screenshots: 'tasks/task-1/screenshots',
-        graderArtifacts: 'tasks/task-1/grader-artifacts',
-      },
-    }
-
-    expect(resolvers.metadataUrl(task)).toBe(
-      'runs/run-1/tasks/task-1/metadata.json',
-    )
-    expect(resolvers.messagesUrl(task)).toBe(
-      'runs/run-1/tasks/task-1/messages.jsonl',
-    )
-    expect(resolvers.artifactUrl(task, 'grades')).toBe(
-      'runs/run-1/tasks/task-1/grades.json',
-    )
-    expect(resolvers.artifactUrl(task, 'trace')).toBe(
-      'runs/run-1/tasks/task-1/trace.jsonl',
-    )
-    expect(resolvers.artifactUrl(task, 'graderArtifacts')).toBe(
-      'runs/run-1/tasks/task-1/grader-artifacts',
-    )
-    expect(resolvers.screenshotUrl(task, 7)).toBe(
-      'runs/run-1/tasks/task-1/screenshots/7.png',
-    )
-  })
-
-  it('falls back to legacy inferred paths for old uploaded runs', async () => {
-    const resolvers = await loadViewerPathResolvers()
-    const task = { queryId: 'legacy-task' }
-
-    expect(resolvers.metadataUrl(task)).toBe(
-      'runs/run-1/legacy-task/metadata.json',
-    )
-    expect(resolvers.messagesUrl(task)).toBe(
-      'runs/run-1/legacy-task/messages.jsonl',
-    )
-    expect(resolvers.artifactUrl(task, 'grades')).toBe(
-      'runs/run-1/legacy-task/grades.json',
-    )
-    expect(resolvers.artifactUrl(task, 'trace')).toBe(
-      'runs/run-1/legacy-task/trace.jsonl',
-    )
-    expect(resolvers.artifactUrl(task, 'graderArtifacts')).toBe(
-      'runs/run-1/legacy-task/grader-artifacts',
-    )
-    expect(resolvers.screenshotUrl(task, 3)).toBe(
-      'runs/run-1/legacy-task/screenshots/3.png',
-    )
-  })
-
-  it('keeps hash-based task selection independent of artifact layout', async () => {
-    expect(await runAutoSelectFromHash('#new-task')).toMatchObject({
-      queryId: 'new-task',
-    })
-    expect(await runAutoSelectFromHash('#legacy-task')).toMatchObject({
-      queryId: 'legacy-task',
-    })
-  })
-})
--- a/packages/browseros-agent/apps/eval/tests/reporting/run-summary.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/reporting/run-summary.test.ts
@@ -1,105 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  buildRunSummaries,
-  extractConfigName,
-} from '../../src/reporting/run-summary'
-
-describe('report run summaries', () => {
-  it('summarizes schema v2 manifests without depending on artifact paths', () => {
-    const [summary] = buildRunSummaries([
-      {
-        schemaVersion: 2,
-        runId: 'agisdk-real-smoke-2026-04-30-0000',
-        uploadedAt: '2026-04-30T01:03:59.663Z',
-        agentConfig: { type: 'single', model: 'moonshotai/kimi-k2.5' },
-        dataset: 'agisdk-real',
-        tasks: [
-          {
-            queryId: 'task-1',
-            query: 'Do task 1',
-            status: 'completed',
-            durationMs: 1000,
-            screenshotCount: 1,
-            paths: { metadata: 'tasks/task-1/metadata.json' },
-            graderResults: {
-              agisdk_state_diff: { score: 1, pass: true },
-            },
-          },
-          {
-            queryId: 'task-2',
-            query: 'Do task 2',
-            status: 'timeout',
-            durationMs: 3000,
-            screenshotCount: 0,
-            paths: { metadata: 'tasks/task-2/metadata.json' },
-            graderResults: {
-              agisdk_state_diff: { score: 0, pass: false },
-            },
-          },
-        ],
-      },
-    ])
-
-    expect(summary).toMatchObject({
-      runId: 'agisdk-real-smoke-2026-04-30-0000',
-      configName: 'agisdk-real-smoke',
-      date: '2026-04-30 01:03',
-      avgScore: 50,
-      total: 2,
-      completed: 1,
-      timeout: 1,
-      avgDurationMs: 2000,
-      model: 'moonshotai/kimi-k2.5',
-      dataset: 'agisdk-real',
-      agentType: 'single',
-    })
-  })
-
-  it('summarizes legacy manifests without schema version or paths', () => {
-    const [summary] = buildRunSummaries([
-      {
-        runId: 'browseros-agent-weekly-2026-04-29-1430',
-        uploadedAt: '2026-04-29T14:30:00.000Z',
-        agentConfig: { type: 'orchestrator-executor', model: 'kimi' },
-        dataset: 'webbench',
-        tasks: [
-          {
-            queryId: 'legacy-task',
-            query: 'Do the old task',
-            status: 'failed',
-            durationMs: 0,
-            screenshotCount: 0,
-            graderResults: {
-              performance_grader: { score: 0.25, pass: false },
-            },
-          },
-        ],
-      },
-    ])
-
-    expect(summary).toMatchObject({
-      runId: 'browseros-agent-weekly-2026-04-29-1430',
-      configName: 'browseros-agent-weekly',
-      avgScore: 25,
-      total: 1,
-      completed: 0,
-      failed: 1,
-      avgDurationMs: 0,
-    })
-  })
-
-  it('keeps legacy config names when run ids have no timestamp suffix', () => {
-    expect(extractConfigName('ci-weekly')).toBe('ci-weekly')
-  })
-
-  it('uses an explicit unknown date when uploadedAt is missing', () => {
-    const [summary] = buildRunSummaries([
-      {
-        runId: 'ci-weekly',
-        tasks: [],
-      },
-    ])
-
-    expect(summary.date).toBe('unknown')
-  })
-})
--- a/packages/browseros-agent/apps/eval/tests/suites/config-adapter.test.ts
+++ b/packages/browseros-agent/apps/eval/tests/suites/config-adapter.test.ts
@@ -1,18 +1,15 @@
 import { describe, expect, it } from 'bun:test'
-import { mkdtemp, writeFile } from 'node:fs/promises'
-import { tmpdir } from 'node:os'
-import { join } from 'node:path'
 import { adaptEvalConfigFile } from '../../src/suites/config-adapter'

 describe('adaptEvalConfigFile', () => {
-  it('preserves browseros-agent-weekly AGI SDK config semantics', async () => {
+  it('preserves browseros-agent-weekly config semantics', async () => {
    const adapted = await adaptEvalConfigFile(
      'apps/eval/configs/legacy/browseros-agent-weekly.json',
    )

    expect(adapted.suite.id).toBe('browseros-agent-weekly')
-    expect(adapted.suite.dataset).toBe('../../data/agisdk-real.jsonl')
-    expect(adapted.suite.graders).toEqual(['agisdk_state_diff'])
+    expect(adapted.suite.dataset).toBe('../../data/webbench-2of4-50.jsonl')
+    expect(adapted.suite.graders).toEqual(['performance_grader'])
    expect(adapted.suite.workers).toBe(10)
    expect(adapted.suite.restartBrowserPerTask).toBe(true)
    expect(adapted.suite.timeoutMs).toBe(1_800_000)
@@ -37,33 +34,4 @@ describe('adaptEvalConfigFile', () => {
      'secret-openrouter-value',
    )
  })
-
-  it('adapts claude-code configs without provider credentials', async () => {
-    const dir = await mkdtemp(join(tmpdir(), 'claude-code-config-'))
-    const configPath = join(dir, 'claude-code-agisdk.json')
-    await writeFile(
-      configPath,
-      JSON.stringify({
-        agent: {
-          type: 'claude-code',
-          model: 'opus',
-        },
-        dataset: 'tasks.jsonl',
-        num_workers: 1,
-        restart_server_per_task: false,
-        browseros: {
-          server_url: 'http://127.0.0.1:9110',
-          headless: false,
-        },
-      }),
-    )
-
-    const adapted = await adaptEvalConfigFile(configPath, { env: {} })
-
-    expect(adapted.suite.agent).toEqual({ type: 'claude-code' })
-    expect(adapted.variant.agent).toMatchObject({
-      provider: 'claude-code',
-      model: 'opus',
-    })
-  })
 })
Author	SHA1	Message	Date
Nikhil Sonti	6ee306236e	chore(eval): colocate grader python evaluators	2026-04-29 17:16:58 -07:00
Nikhil Sonti	0afc59cda1	chore(eval): organize config layouts	2026-04-29 17:01:25 -07:00
Nikhil Sonti	eb8faa931a	docs(eval): explain suites and variants	2026-04-29 16:38:54 -07:00
Nikhil Sonti	be70170313	docs(eval): add env example	2026-04-29 16:10:27 -07:00
Nikhil Sonti	0661197f5b	fix: address review feedback for PR #875	2026-04-29 16:00:56 -07:00
Nikhil Sonti	c4e7824266	chore(eval): verify pipeline refactor	2026-04-29 15:47:09 -07:00
Nikhil Sonti	22f71a36c5	docs(eval): document suite pipeline	2026-04-29 15:45:27 -07:00
Nikhil Sonti	d49986d0b3	ci(eval): migrate weekly workflow to eval cli	2026-04-29 15:43:56 -07:00
Nikhil Sonti	acdd394585	feat(eval): add r2 publisher module	2026-04-29 15:42:58 -07:00
Nikhil Sonti	219fdf1e28	feat(eval): add workflow compatible cli	2026-04-29 15:40:05 -07:00
Nikhil Sonti	014f71d227	refactor(eval): split clado backend	2026-04-29 15:34:09 -07:00
Nikhil Sonti	876dea4d56	refactor(eval): add executor backend boundary	2026-04-29 15:28:56 -07:00
Nikhil Sonti	fca7d4cbcb	refactor(eval): rename runner layers	2026-04-29 15:27:12 -07:00
Nikhil Sonti	e1bfadb075	feat(eval): persist grader artifacts	2026-04-29 15:25:42 -07:00
Nikhil Sonti	aa0d9b96ef	refactor(eval): add shared grader contract	2026-04-29 15:23:41 -07:00
Nikhil Sonti	1c9604b5fa	feat(eval): add stable run artifacts	2026-04-29 15:22:10 -07:00
Nikhil Sonti	685266a1d8	feat(eval): add suite variant config bridge	2026-04-29 15:20:45 -07:00