chore(eval): drop the 60-char truncation on grader expected/actual values

Some criteria check long strings (job descriptions, post bodies, etc.), and truncating to 60 chars hides exactly the bytes you need to diff. The viewer's reasoning area already has max-height, scroll, and word-break, so long content scrolls and nothing renders worse for being full-length.
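
To illustrate why the truncation hurt, here is a minimal sketch. It assumes a hypothetical `truncate` helper and made-up sample strings; the grader's actual truncation code is not shown in this section. Two values that differ only after the 60th character look identical once truncated:

```typescript
// Hypothetical helper mimicking a 60-char cutoff (not the grader's real code).
function truncate(value: string, max = 60): string {
  return value.length > max ? `${value.slice(0, max)}...` : value
}

// Made-up sample data: the shared prefix alone is longer than 60 characters.
const shared = 'Senior backend engineer responsible for the payments platform, '
const expected = `${shared}based in Berlin`
const actual = `${shared}based in Munich`

// After truncation both strings render the same, so the mismatch is invisible.
const truncatedMatch = truncate(expected) === truncate(actual)
// The full values still differ, which is what a reader needs to see.
const fullMatch = expected === actual

console.log({ truncatedMatch, fullMatch })
```

With the full values in the message, the differing suffix is visible directly in the viewer.
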
chore(eval): show every criterion in agisdk grader message, not just failures
2026-05-14 16:14:28 +00:00 · 2026-04-30 02:08:30 +05:30 · 2026-04-30 02:08:07 +05:30 · 2026-04-30 02:06:51 +05:30 · 2026-04-30 01:16:20 +05:30 · 2026-04-30 00:37:45 +05:30
232 changed files with 6994 additions and 12864 deletions
--- a/.github/workflows/build-agent.yml
+++ b/.github/workflows/build-agent.yml
@@ -0,0 +1,157 @@
+name: build-agent
+
+on:
+  workflow_dispatch:
+    inputs:
+      agent:
+        description: "Agent name from bundle.json"
+        required: true
+        type: string
+        default: openclaw
+      publish:
+        description: "Upload to R2 and merge manifest slice"
+        required: false
+        default: false
+        type: boolean
+  pull_request:
+    paths:
+      - "packages/browseros-agent/packages/build-tools/**"
+      - ".github/workflows/build-agent.yml"
+
+env:
+  BUN_VERSION: "1.3.6"
+  PKG_DIR: packages/browseros-agent/packages/build-tools
+
+permissions:
+  contents: read
+
+jobs:
+  check:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools typecheck
+      - working-directory: packages/browseros-agent
+        run: bun run --filter @browseros/build-tools test
+
+  build:
+    needs: check
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - arch: arm64
+            runner: ubuntu-24.04-arm
+    runs-on: ${{ matrix.runner }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Build tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          OUT: ${{ github.workspace }}/dist/images
+        run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
+      - uses: actions/upload-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
+          path: dist/images/
+          retention-days: 7
+
+  smoke:
+    needs: build
+    runs-on: ubuntu-24.04-arm
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          name: tarball-${{ inputs.agent || 'openclaw' }}-arm64
+          path: dist/images
+      - name: Install podman
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y podman
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Smoke test tarball
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+        run: |
+          set -euo pipefail
+          tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-arm64.tar.gz" -print -quit)"
+          if [ -z "$tarball" ]; then
+            echo "missing arm64 tarball artifact for ${AGENT}" >&2
+            exit 1
+          fi
+          bun run smoke:tarball -- --agent "$AGENT" --arch arm64 --tarball "$tarball"
+
+  publish:
+    needs: [build, smoke]
+    if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
+    runs-on: ubuntu-24.04
+    environment: release
+    concurrency:
+      group: r2-manifest-publish
+      cancel-in-progress: false
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: ${{ env.BUN_VERSION }}
+      - uses: actions/download-artifact@v4
+        with:
+          pattern: tarball-*
+          path: dist/images
+          merge-multiple: true
+      - working-directory: packages/browseros-agent
+        run: bun install --frozen-lockfile
+      - name: Upload tarballs to R2
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
+            base="$(basename "$file")"
+            bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
+          done
+      - name: Merge agent slice into manifest
+        working-directory: ${{ env.PKG_DIR }}
+        env:
+          AGENT: ${{ inputs.agent || 'openclaw' }}
+          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
+          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_BUCKET: ${{ secrets.R2_BUCKET }}
+        run: |
+          set -euo pipefail
+          mkdir -p dist/images
+          cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
+          bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
+          bun run emit-manifest -- \
+            --slice "agents:${AGENT}" \
+            --dist-dir dist \
+            --merge-from dist/baseline-manifest.json \
+            --out dist/manifest.json
+          bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
--- a/.github/workflows/eval-weekly.yml
+++ b/.github/workflows/eval-weekly.yml
@@ -14,7 +14,7 @@ on:
      config:
        description: 'Eval config file (relative to apps/eval/)'
        required: false
-        default: 'configs/legacy/browseros-agent-weekly.json'
+        default: 'configs/browseros-agent-weekly.json'

 permissions:
  contents: read
@@ -62,27 +62,36 @@ jobs:
          curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
          unzip -qo /tmp/nopecha.zip -d extensions/nopecha

-      - name: Run eval and publish to R2
+      - name: Run eval
        working-directory: packages/browseros-agent/apps/eval
        env:
          FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
-          EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
-          EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
-          EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
-          EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
-          EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
          BROWSEROS_BINARY: /usr/bin/browseros
          WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
          # OpenClaw container runtime is macOS-only; opt the Linux runner
          # into the no-op stub so the server can boot and the eval can run.
          BROWSEROS_SKIP_OPENCLAW: '1'
-          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
+          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
        run: |
          echo "Running eval with config: $EVAL_CONFIG"
-          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2
+          xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"
+
+      - name: Upload runs to R2
+        if: success()
+        working-directory: packages/browseros-agent/apps/eval
+        env:
+          EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
+          EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
+          EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
+          EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
+          EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
+          EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
+        run: |
+          CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
+          bun scripts/upload-run.ts "results/$CONFIG_NAME"

      - name: Generate trend report
        if: success()
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -63,15 +63,15 @@ jobs:
            junit_path: test-results/server-root.xml
            needs_browser: false
          - suite: agent
-            command: (cd apps/agent && bun run test)
+            command: bun run test:agent
            junit_path: test-results/agent.xml
            needs_browser: false
          - suite: eval
-            command: (cd apps/eval && bun run test)
+            command: bun run test:eval
            junit_path: test-results/eval.xml
            needs_browser: false
          - suite: build
-            command: bun run ./scripts/run-bun-test.ts ./scripts/build
+            command: bun run test:build
            junit_path: test-results/build.xml
            needs_browser: false

--- a/README.md
+++ b/README.md
@@ -188,21 +188,6 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
 - [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
 - [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.

-## Citation
-
-If you use BrowserOS in your research or project, please cite:
-
-```bibtex
-@software{browseros2025,
-  author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
-  title = {BrowserOS: The open-source Agentic browser},
-  url = {https://github.com/browseros-ai/BrowserOS},
-  year = {2025},
-  publisher = {GitHub},
-  license = {AGPL-3.0},
-}
-```
-
 ## License

 BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
--- a/packages/browseros-agent/README.md
+++ b/packages/browseros-agent/README.md
@@ -79,15 +79,14 @@ cp apps/server/.env.example apps/server/.env.development
 cp apps/agent/.env.example apps/agent/.env.development
 cp apps/server/.env.production.example apps/server/.env.production

-# Install deps and generate agent code
+# Install deps, generate agent code, and sync the VM cache
 bun run dev:setup

 # Start the full dev environment
 bun run dev:watch
 ```

-`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
-the server startup path and pulls the configured GHCR image on demand.
+`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.

 ### Environment Variables

@@ -157,14 +156,9 @@ bun run build:server          # Build production server resource artifacts and u
 bun run build:agent           # Build agent extension

 # Test
-bun run test                  # Run all tests
-bun run test:all              # Run all tests
-bun run test:main             # Run key server tools and integration tests
-
-# App-specific test groups (from packages/browseros-agent)
-cd apps/server && bun run test:tools
-cd apps/server && bun run test:cdp
-cd apps/server && bun run test:integration
+bun run test                  # Run standard tests
+bun run test:cdp              # Run CDP-based tests
+bun run test:integration      # Run integration tests

 # Quality
 bun run lint                  # Check with Biome
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.helpers.ts
@@ -17,7 +17,7 @@ export function groupProviderOptions(
      ? [{ key: 'llm' as const, label: 'AI Providers', options: llm }]
      : []),
    ...(acp.length
-      ? [{ key: 'acp' as const, label: 'Agents', options: acp }]
+      ? [{ key: 'acp' as const, label: 'ACP Models', options: acp }]
      : []),
  ]
 }
@@ -26,25 +26,14 @@ export function getProviderSearchValue(
  provider: Provider,
  groupLabel: string,
 ): string {
-  return [
-    provider.id,
-    provider.name,
-    provider.type,
-    groupLabel,
-    provider.adapterName,
-    provider.modelLabel,
-  ]
+  return [provider.id, provider.name, provider.type, groupLabel]
    .filter(Boolean)
    .join(' ')
 }

 export function getProviderSubtitle(provider: Provider): string | undefined {
  if (provider.kind !== 'acp') return undefined
-  return [
-    provider.adapterName,
-    provider.modelLabel,
-    provider.modelControl === 'best-effort' ? 'best effort' : undefined,
-  ]
-    .filter(Boolean)
-    .join(' · ')
+  return provider.modelControl === 'best-effort'
+    ? 'ACP model · best effort'
+    : 'ACP model'
 }
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.test.tsx
@@ -16,26 +16,22 @@ const options: Provider[] = [
  },
  {
    kind: 'acp',
-    id: 'agent-claude-review',
-    name: 'Review Bot',
+    id: 'acp:claude:haiku:medium',
+    name: 'Claude Code Haiku',
    type: 'acp',
-    adapterName: 'Claude Code',
-    modelLabel: 'Haiku',
    modelControl: 'best-effort',
  },
  {
    kind: 'acp',
-    id: 'agent-codex-browser',
-    name: 'Browser Driver',
+    id: 'acp:codex:gpt-5.5:medium',
+    name: 'Codex GPT-5.5',
    type: 'acp',
-    adapterName: 'Codex',
-    modelLabel: 'GPT-5.5',
    modelControl: 'runtime-supported',
  },
 ]

 describe('groupProviderOptions', () => {
-  it('groups normal providers separately from created agents', () => {
+  it('groups normal providers separately from ACP models', () => {
    expect(groupProviderOptions(options)).toEqual([
      {
        key: 'llm',
@@ -44,7 +40,7 @@ describe('groupProviderOptions', () => {
      },
      {
        key: 'acp',
-        label: 'Agents',
+        label: 'ACP Models',
        options: [options[2], options[3]],
      },
    ])
@@ -52,21 +48,20 @@ describe('groupProviderOptions', () => {
 })

 describe('getProviderSearchValue', () => {
-  it('matches created-agent group labels and item labels', () => {
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Agents')
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain('Review Bot')
-    expect(getProviderSearchValue(options[2], 'Agents')).toContain(
-      'Claude Code',
+  it('matches ACP group labels and item labels', () => {
+    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
+      'ACP Models',
+    )
+    expect(getProviderSearchValue(options[2], 'ACP Models')).toContain(
+      'Claude Code Haiku',
    )
  })
 })

 describe('getProviderSubtitle', () => {
-  it('describes created-agent runtime context without model-target copy', () => {
-    expect(getProviderSubtitle(options[2])).toBe(
-      'Claude Code · Haiku · best effort',
-    )
-    expect(getProviderSubtitle(options[3])).toBe('Codex · GPT-5.5')
+  it('does not present best-effort ACP models as guaranteed routing', () => {
+    expect(getProviderSubtitle(options[2])).toBe('ACP model · best effort')
+    expect(getProviderSubtitle(options[3])).toBe('ACP model')
    expect(getProviderSubtitle(options[0])).toBeUndefined()
  })
 })
--- a/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
+++ b/packages/browseros-agent/apps/agent/components/chat/ChatProviderSelector.tsx
@@ -41,10 +41,7 @@ export const ChatProviderSelector: FC<
      <PopoverTrigger asChild>{children}</PopoverTrigger>
      <PopoverContent side="bottom" align="start" className="w-64 p-0">
        <Command>
-          <CommandInput
-            placeholder="Search providers or agents..."
-            className="h-9"
-          />
+          <CommandInput placeholder="Search models..." className="h-9" />
          <CommandList>
            <CommandEmpty>No provider found</CommandEmpty>
            {groups.map((group) => (
--- a/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
+++ b/packages/browseros-agent/apps/agent/components/chat/chatComponentTypes.ts
@@ -7,8 +7,5 @@ export interface Provider {
  name: string
  type: ChatProviderType
  kind: 'llm' | 'acp'
-  agentId?: string
-  adapterName?: string
-  modelLabel?: string
  modelControl?: 'runtime-supported' | 'best-effort'
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCard.tsx
@@ -0,0 +1,136 @@
+import { Bot, Loader2, Wrench } from 'lucide-react'
+import type { FC } from 'react'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import { cn } from '@/lib/utils'
+
+interface AgentCardProps {
+  agent: AgentCardData
+  onClick: () => void
+  active?: boolean
+}
+
+function formatTimestamp(timestamp?: number): string {
+  if (!timestamp) return 'No activity yet'
+  const diff = Date.now() - timestamp
+  const minutes = Math.floor(diff / 60000)
+  if (minutes < 1) return 'just now'
+  if (minutes < 60) return `${minutes}m ago`
+  const hours = Math.floor(minutes / 60)
+  if (hours < 24) return `${hours}h ago`
+  return `${Math.floor(hours / 24)}d ago`
+}
+
+function getStatusLabel(status: AgentCardData['status']): string {
+  if (status === 'working') return 'Working'
+  if (status === 'error') return 'Error'
+  return 'Ready'
+}
+
+function getStatusTone(status: AgentCardData['status']): string {
+  if (status === 'working') return 'bg-amber-500'
+  if (status === 'error') return 'bg-destructive'
+  return 'bg-emerald-500'
+}
+
+function formatCost(usd: number): string {
+  if (usd < 0.005) return `$${usd.toFixed(4)}`
+  return `$${usd.toFixed(2)}`
+}
+
+export const AgentCardExpanded: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
+      active
+        ? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
+    )}
+  >
+    <div className="flex items-start justify-between gap-3">
+      <div className="flex min-w-0 items-center gap-3">
+        <div
+          className={cn(
+            'flex size-10 shrink-0 items-center justify-center rounded-xl',
+            active
+              ? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
+              : 'bg-muted text-muted-foreground',
+          )}
+        >
+          <Bot className="size-5" />
+        </div>
+        <div className="min-w-0">
+          <div className="truncate font-semibold text-sm">{agent.name}</div>
+          <div className="truncate text-muted-foreground text-xs">
+            {agent.model ?? 'OpenClaw agent'}
+          </div>
+        </div>
+      </div>
+      <div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
+        <span
+          className={cn('size-2 rounded-full', getStatusTone(agent.status))}
+        />
+        <span>{getStatusLabel(agent.status)}</span>
+      </div>
+    </div>
+
+    <div className="mt-4 flex-1">
+      <p className="line-clamp-2 text-foreground/90 text-sm">
+        {agent.lastMessage ??
+          'Start a conversation to see recent work and summaries.'}
+      </p>
+    </div>
+
+    <div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
+      <div className="flex items-center justify-between gap-3">
+        <span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
+        {agent.costUsd ? (
+          <span className="tabular-nums opacity-70">
+            {formatCost(agent.costUsd)}
+          </span>
+        ) : null}
+      </div>
+      {agent.status === 'working' && agent.currentTool ? (
+        <div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
+          <Loader2 className="size-3 shrink-0 animate-spin" />
+          <span className="truncate">{agent.currentTool}</span>
+        </div>
+      ) : agent.activitySummary ? (
+        <div className="flex items-center gap-1.5 text-muted-foreground/60">
+          <Wrench className="size-3 shrink-0" />
+          <span className="truncate">{agent.activitySummary}</span>
+        </div>
+      ) : null}
+    </div>
+  </button>
+)
+
+export const AgentCardCompact: FC<AgentCardProps> = ({
+  agent,
+  onClick,
+  active,
+}) => (
+  <button
+    type="button"
+    onClick={onClick}
+    className={cn(
+      'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
+      active
+        ? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
+        : 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
+    )}
+  >
+    <span
+      className={cn(
+        'size-2 rounded-full',
+        active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
+      )}
+    />
+    <span className="truncate">{agent.name}</span>
+  </button>
+)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCardDock.tsx
@@ -1,71 +1,70 @@
 import { Plus } from 'lucide-react'
 import type { FC } from 'react'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { cn } from '@/lib/utils'
-import { HomeAgentCard } from './HomeAgentCard'
+import { AgentCardCompact, AgentCardExpanded } from './AgentCard'

 interface AgentCardDockProps {
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  activeAgentId?: string
  onSelectAgent: (agentId: string) => void
  onCreateAgent?: () => void
+  compact?: boolean
 }

-function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
+function CreateAgentButton({
+  compact,
+  onCreateAgent,
+}: {
+  compact?: boolean
+  onCreateAgent: () => void
+}) {
  return (
    <button
      type="button"
      onClick={onCreateAgent}
      className={cn(
-        'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
-        'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
+        compact
+          ? 'rounded-full px-3 py-2 text-sm'
+          : 'min-h-32 rounded-2xl px-5 py-4',
      )}
    >
-      <Plus className="size-5" />
-      <span>Create agent</span>
+      <Plus className={compact ? 'size-3.5' : 'size-5'} />
+      <span>{compact ? 'New' : 'Create agent'}</span>
    </button>
  )
 }

-/**
- * 3-column grid of HomeAgentCards plus a trailing "Create agent"
- * tile. The previous `compact` mode (rendered a horizontal pill rail)
- * had no callers and was dropped along with the legacy AgentCard.
- */
 export const AgentCardDock: FC<AgentCardDockProps> = ({
  agents,
-  adapters,
  activeAgentId,
  onSelectAgent,
  onCreateAgent,
+  compact,
 }) => {
  if (agents.length === 0 && !onCreateAgent) return null

-  const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
-  for (const descriptor of adapters) {
-    if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
-  }
+  const Card = compact ? AgentCardCompact : AgentCardExpanded

  return (
-    <div className="grid gap-4 md:grid-cols-3">
+    <div
+      className={cn(
+        compact
+          ? 'flex items-center gap-2 overflow-x-auto pb-1'
+          : 'grid gap-4 md:grid-cols-3',
+      )}
+    >
      {agents.map((agent) => (
-        <HomeAgentCard
-          key={agent.id}
+        <Card
+          key={agent.agentId}
          agent={agent}
-          adapter={agent.adapter}
-          adapterHealth={adapterHealth.get(agent.adapter) ?? null}
-          active={agent.id === activeAgentId}
-          onClick={() => onSelectAgent(agent.id)}
+          active={agent.agentId === activeAgentId}
+          onClick={() => onSelectAgent(agent.agentId)}
        />
      ))}
      {onCreateAgent ? (
-        <CreateAgentButton onCreateAgent={onCreateAgent} />
+        <CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
      ) : null}
    </div>
  )
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandConversation.tsx
@@ -2,12 +2,6 @@ import { ArrowLeft, Bot, Home } from 'lucide-react'
 import { type FC, useEffect, useMemo, useRef } from 'react'
 import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
 import { Button } from '@/components/ui/button'
-import {
-  cancelHarnessTurn,
-  useEnqueueHarnessMessage,
-  useHarnessAgents,
-  useRemoveHarnessQueuedMessage,
-} from '@/entrypoints/app/agents/useAgents'
 import {
  type AgentEntry,
  getModelDisplayName,
@@ -21,7 +15,6 @@ import {
  filterTurnsPersistedInHistory,
  flattenHistoryPages,
 } from './claw-chat-types'
-import { QueuePanel } from './QueuePanel'
 import { useAgentConversation } from './useAgentConversation'
 import { useHarnessChatHistory } from './useHarnessChatHistory'

@@ -219,33 +212,15 @@ function AgentConversationController({
    [historyMessages],
  )

-  // Listing query feeds queue + active-turn state for this agent. We
-  // already poll it every 5s for the rail; reusing the same cache
-  // keeps cross-tab queue state in sync without a second poll.
-  const { harnessAgents } = useHarnessAgents()
-  const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
-  const queue = harnessAgent?.queue ?? []
-  const activeTurnId = harnessAgent?.activeTurnId ?? null
-
  const { turns, streaming, send } = useAgentConversation(agentId, {
    runtime: 'agent-harness',
    sessionKey: null,
    history: chatHistory,
-    activeTurnId,
    onComplete: () => {
      void harnessHistoryQuery.refetch()
    },
    onSessionKeyChange: () => {},
  })
-  const enqueueMessage = useEnqueueHarnessMessage()
-  const removeQueuedMessage = useRemoveHarnessQueuedMessage()
-
-  const handleStop = () => {
-    void cancelHarnessTurn(agentId, {
-      turnId: activeTurnId ?? undefined,
-      reason: 'user pressed stop',
-    })
-  }
  const visibleTurns = useMemo(
    () => filterTurnsPersistedInHistory(turns, historyMessages),
    [historyMessages, turns],
@@ -306,15 +281,7 @@ function AgentConversationController({
      />

      <div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
-        <div className="mx-auto max-w-3xl space-y-3">
-          {queue.length > 0 ? (
-            <QueuePanel
-              queue={queue}
-              onRemove={(messageId) =>
-                removeQueuedMessage.mutate({ agentId, messageId })
-              }
-            />
-          ) : null}
+        <div className="mx-auto max-w-3xl">
          <ConversationInput
            variant="conversation"
            agents={agents}
@@ -329,31 +296,14 @@ function AgentConversationController({
                name: a.name,
                dataUrl: a.dataUrl,
              }))
-              // When the agent already has an in-flight turn, route
-              // the new message into the durable queue instead of
-              // starting a parallel turn. Drains automatically as
-              // soon as the active turn ends.
-              if (streaming || activeTurnId) {
-                enqueueMessage.mutate({
-                  agentId,
-                  message: input.text,
-                  attachments,
-                })
-                return
-              }
              void send({ text: input.text, attachments, attachmentPreviews })
            }}
            onCreateAgent={() => navigate(createAgentPath)}
-            onStop={handleStop}
            streaming={streaming}
            disabled={disabled}
            status="running"
            attachmentsEnabled={true}
-            placeholder={
-              streaming
-                ? `Type to queue another message for ${agentName}...`
-                : `Message ${agentName}...`
-            }
+            placeholder={`Message ${agentName}...`}
          />
        </div>
      </div>
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/AgentCommandHome.tsx
@@ -1,25 +1,18 @@
 import { Plus } from 'lucide-react'
-import { type FC, useEffect, useMemo, useState } from 'react'
+import { type FC, useEffect, useState } from 'react'
 import { useNavigate } from 'react-router'
 import { Button } from '@/components/ui/button'
 import { Card, CardContent } from '@/components/ui/card'
 import { Separator } from '@/components/ui/separator'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import {
-  useAgentAdapters,
-  useHarnessAgents,
-} from '@/entrypoints/app/agents/useAgents'
 import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
 import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
 import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
 import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
 import { AgentCardDock } from './AgentCardDock'
 import { useAgentCommandData } from './agent-command-layout'
 import { ConversationInput } from './ConversationInput'
-import { orderHomeAgents } from './home-agent-card.helpers'
+import { buildAgentCardData } from './useAgentCardData'

 function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
  return (
@@ -45,13 +38,11 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
 function RecentThreads({
  activeAgentId,
  agents,
-  adapters,
  onOpenAgents,
  onSelectAgent,
 }: {
  activeAgentId?: string | null
-  agents: HarnessAgent[]
-  adapters: HarnessAdapterDescriptor[]
+  agents: AgentCardData[]
  onOpenAgents: () => void
  onSelectAgent: (agentId: string) => void
 }) {
@@ -77,7 +68,6 @@ function RecentThreads({
      </div>
      <AgentCardDock
        agents={agents}
-        adapters={adapters}
        activeAgentId={activeAgentId ?? undefined}
        onSelectAgent={onSelectAgent}
        onCreateAgent={onOpenAgents}
@@ -89,32 +79,25 @@ function RecentThreads({
 export const AgentCommandHome: FC = () => {
  const navigate = useNavigate()
  const activeHint = useActiveHint()
-  // The conversation input still consumes the merged AgentEntry list
-  // from the layout context (handles legacy /claw/agents entries that
-  // haven't yet been backfilled into the harness store). The Recent
-  // Agents grid below reads the richer harness payload directly.
-  const { agents: legacyAgents, status } = useAgentCommandData()
-  const { harnessAgents } = useHarnessAgents()
-  const { adapters } = useAgentAdapters()
+  const { agents, status } = useAgentCommandData()
  const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
-
-  const orderedAgents = useMemo(
-    () => orderHomeAgents(harnessAgents),
-    [harnessAgents],
-  )
+  const cardData = buildAgentCardData(agents, status?.status, undefined)

  useEffect(() => {
-    if (legacyAgents.length === 0) {
-      if (selectedAgentId) setSelectedAgentId(null)
+    if (agents.length === 0) {
+      if (selectedAgentId) {
+        setSelectedAgentId(null)
+      }
      return
    }
+
    if (
      !selectedAgentId ||
-      !legacyAgents.some((agent) => agent.agentId === selectedAgentId)
+      !agents.some((agent) => agent.agentId === selectedAgentId)
    ) {
-      setSelectedAgentId(legacyAgents[0].agentId)
+      setSelectedAgentId(agents[0].agentId)
    }
-  }, [legacyAgents, selectedAgentId])
+  }, [agents, selectedAgentId])

  const handleSend = (input: { text: string }) => {
    if (!selectedAgentId) return
@@ -127,7 +110,7 @@ export const AgentCommandHome: FC = () => {
    setSelectedAgentId(agent.agentId)
  }

-  const selectedAgent = legacyAgents.find(
+  const selectedAgent = agents.find(
    (agent) => agent.agentId === selectedAgentId,
  )
  const selectedAgentReady = selectedAgent
@@ -135,15 +118,13 @@ export const AgentCommandHome: FC = () => {
    : false
  const selectedAgentStatus =
    selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
-  const selectedAgentName =
-    selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
-
-  const hasAgents = legacyAgents.length > 0
+  const selectedCard =
+    cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]

  return (
    <div className="min-h-full px-4 py-6">
      <div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
-        {hasAgents ? (
+        {cardData.length > 0 ? (
          <>
            <div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
              <div className="space-y-3">
@@ -159,7 +140,7 @@ export const AgentCommandHome: FC = () => {
              <div className="w-full max-w-3xl">
                <ConversationInput
                  variant="home"
-                  agents={legacyAgents}
+                  agents={agents}
                  selectedAgentId={selectedAgentId}
                  onSelectAgent={handleSelectAgent}
                  onSend={handleSend}
@@ -170,7 +151,7 @@ export const AgentCommandHome: FC = () => {
                  attachmentsEnabled={false}
                  placeholder={
                    selectedAgentReady
-                      ? `Ask ${selectedAgentName} to handle a task...`
+                      ? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
                      : 'Agent runtime is not running...'
                  }
                />
@@ -181,8 +162,7 @@ export const AgentCommandHome: FC = () => {

            <RecentThreads
              activeAgentId={selectedAgentId}
-              agents={orderedAgents}
-              adapters={adapters}
+              agents={cardData}
              onOpenAgents={() => navigate('/agents')}
              onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
            />
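Review note: the selection effect in `AgentCommandHome` above reduces to a small pure rule, which may be easier to reason about (and unit test) on its own. The sketch below is an illustration only; `reconcileSelection` and `AgentLike` are hypothetical names that do not exist in the codebase, and the real component keeps this logic inside `useEffect`.

```typescript
// Hypothetical stand-in for AgentEntry; only the field the rule needs.
interface AgentLike {
  agentId: string
}

// Mirrors the effect body: clear the selection when there are no agents,
// fall back to the first agent when the selection is missing or stale,
// otherwise keep the current selection.
function reconcileSelection(
  agents: AgentLike[],
  selectedAgentId: string | null,
): string | null {
  const first = agents[0]
  if (!first) return null
  if (
    !selectedAgentId ||
    !agents.some((agent) => agent.agentId === selectedAgentId)
  ) {
    return first.agentId
  }
  return selectedAgentId
}
```

Because the rule returns the same id when nothing needs to change, calling a state setter with its result would not trigger an extra render in the steady state.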
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/ConversationInput.tsx
@@ -54,40 +54,25 @@ interface ConversationInputProps {
  placeholder?: string
  attachmentsEnabled?: boolean
  variant?: 'home' | 'conversation'
-  /**
-   * When set, a Stop button surfaces to the left of the voice mic
-   * while `streaming === true`. Click cancels the active turn
-   * server-side via the chat-cancel endpoint. Absent → no Stop
-   * button (legacy behaviour for the home composer).
-   */
-  onStop?: () => void
 }

 function InputActionButton({
  disabled,
  onClick,
  streaming,
-  hasContent,
 }: {
  disabled: boolean
  onClick: () => void
  streaming: boolean
-  hasContent: boolean
 }) {
-  // Show the spinner while streaming only when there's nothing to
-  // send — once the user types something, the icon flips back to the
-  // paper-plane so it reads as "queue this message" instead of
-  // "still working".
-  const showSpinner = streaming && !hasContent
  return (
    <Button
      onClick={onClick}
      size="icon"
      disabled={disabled}
-      title={streaming && hasContent ? 'Queue message' : undefined}
      className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
    >
-      {showSpinner ? (
+      {streaming ? (
        <Loader2 className="h-5 w-5 animate-spin" />
      ) : (
        <ArrowRight className="h-5 w-5" />
@@ -96,22 +81,6 @@ function InputActionButton({
  )
 }

-function StopButton({ onStop }: { onStop: () => void }) {
-  return (
-    <Button
-      type="button"
-      size="icon"
-      variant="ghost"
-      onClick={onStop}
-      title="Stop current turn — queued messages will start next."
-      aria-label="Stop current turn"
-      className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
-    >
-      <Square className="h-3.5 w-3.5 fill-current" />
-    </Button>
-  )
-}
-
 function VoiceButton({
  isRecording,
  isTranscribing,
@@ -330,7 +299,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  placeholder,
  attachmentsEnabled = true,
  variant = 'conversation',
-  onStop,
 }) => {
  const [input, setInput] = useState('')
  const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
@@ -411,17 +379,10 @@ export const ConversationInput: FC<ConversationInputProps> = ({
  }

  const hasContent = input.trim().length > 0 || attachments.length > 0
-  // Queue-aware composers (the conversation panel passes `onStop`)
-  // accept input while streaming — the parent decides whether the
-  // submission opens a new turn or enqueues onto the active one.
-  // Surfaces without a Stop hook (home) keep the legacy behaviour
-  // and block input until the current turn finishes.
-  const queueAware = Boolean(onStop)

  const handleSend = () => {
    const text = input.trim()
-    if (disabled || isStaging) return
-    if (streaming && !queueAware) return
+    if (disabled || isStaging || streaming) return
    if (!text && attachments.length === 0) return
    onSend({ text, attachments })
    setInput('')
@@ -551,7 +512,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              )}
            />
          </div>
-          {streaming && onStop ? <StopButton onStop={onStop} /> : null}
          <VoiceButton
            isRecording={voice.isRecording}
            isTranscribing={voice.isTranscribing}
@@ -569,13 +529,12 @@ export const ConversationInput: FC<ConversationInputProps> = ({
              !!disabled ||
              voice.isRecording ||
              voice.isTranscribing ||
-              (streaming && !queueAware)
+              streaming
            }
            onClick={handleSend}
            // Spinner stays the user-facing "agent is busy" hint; with the
            // queue active we still spin while a turn is in flight.
            streaming={streaming}
-            hasContent={hasContent}
          />
        </div>
        {voice.error ? (
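Review note: after this change `handleSend` blocks every submission while a turn is streaming, on both the home and conversation composers. A minimal sketch of the resulting guard, written as a standalone predicate, is shown below. `canSend` and `SendState` are hypothetical names for illustration, and `attachmentCount` stands in for `attachments.length`.

```typescript
// Hypothetical snapshot of the composer state that handleSend inspects.
interface SendState {
  disabled?: boolean
  isStaging: boolean
  streaming: boolean
  text: string
  attachmentCount: number
}

// True only when nothing blocks the composer and there is something to send.
function canSend(state: SendState): boolean {
  if (state.disabled || state.isStaging || state.streaming) return false
  return state.text.trim().length > 0 || state.attachmentCount > 0
}
```

The same three flags also feed the action button's `disabled` prop in the hunk above (together with the voice states), so the button and the handler stay consistent.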
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/HomeAgentCard.tsx
@@ -1,243 +0,0 @@
-import { Quote, TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
-import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
-import type {
-  HarnessAdapterHealth,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
-import {
-  firstNonBlankLine,
-  truncate,
-} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
-import { cn } from '@/lib/utils'
-
-interface HomeAgentCardProps {
-  agent: HarnessAgent
-  adapter: HarnessAgentAdapter | 'unknown'
-  /** Per-adapter health snapshot, shared across cards rendering the
-   *  same adapter. `null` when the /adapters response hasn't surfaced
-   *  health yet (we treat that as healthy until proven otherwise). */
-  adapterHealth: HarnessAdapterHealth | null
-  /** Highlights the card with an accent ring; tells the user which
-   *  agent the conversation input is bound to. */
-  active?: boolean
-  onClick: () => void
-}
-
-const PREVIEW_CHARS = 100
-
-/**
- * Grid-shaped card for the /home Recent agents section. Composition
- * mirrors the rail's `AgentRowCard` but the layout is a vertical
- * column sized for a 1/3-width tile rather than a full-width row.
- *
- * Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
- * `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
- * inline `Unavailable` chip pattern so the visual language is
- * continuous between rail and grid.
- */
-export const HomeAgentCard: FC<HomeAgentCardProps> = ({
-  agent,
-  adapter,
-  adapterHealth,
-  active,
-  onClick,
-}) => {
-  const status = agent.status ?? 'unknown'
-  const lastUsedAt = agent.lastUsedAt ?? null
-  const isWorking = status === 'working'
-  const isAsleep = status === 'asleep'
-  const isError = status === 'error'
-  const hasActiveTurn = Boolean(agent.activeTurnId)
-
-  return (
-    <button
-      type="button"
-      onClick={onClick}
-      className={cn(
-        'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
-        active && 'ring-1 ring-[var(--accent-orange)]/30',
-        isWorking
-          ? 'border-[var(--accent-orange)]/40'
-          : isError
-            ? 'border-destructive/30'
-            : 'border-border/60 hover:border-[var(--accent-orange)]/30',
-      )}
-    >
-      <div className="flex items-start gap-3">
-        <AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
-        <div className="min-w-0 flex-1">
-          <div className="flex items-center gap-1.5">
-            <span className="truncate font-semibold text-sm">
-              {displayName(agent)}
-            </span>
-            {isWorking && (
-              <Badge
-                variant="secondary"
-                className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
-              >
-                Working
-              </Badge>
-            )}
-          </div>
-          <SummaryLine
-            adapter={adapter}
-            modelId={agent.modelId ?? null}
-            reasoningEffort={agent.reasoningEffort ?? null}
-            adapterHealth={adapterHealth}
-          />
-        </div>
-      </div>
-
-      <LastMessage message={agent.lastUserMessage ?? null} />
-
-      <div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
-        <span>{statusFootnote(status, lastUsedAt)}</span>
-        {hasActiveTurn ? (
-          <ResumeChip />
-        ) : isAsleep ? (
-          <Badge variant="outline" className="text-muted-foreground">
-            Asleep
-          </Badge>
-        ) : isError ? (
-          <ErrorChip lastError={agent.lastError ?? null} />
-        ) : null}
-      </div>
-    </button>
-  )
-}
-
-const SummaryLine: FC<{
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelId: string | null
-  reasoningEffort: string | null
-  adapterHealth: HarnessAdapterHealth | null
-}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelId) parts.push(modelId)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth?.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
-
-const LastMessage: FC<{ message: string | null }> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  return (
-    <p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="line-clamp-2">
-        {truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
-      </span>
-    </p>
-  )
-}
-
-const ResumeChip: FC = () => (
-  <span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
-    <span className="relative flex size-1.5">
-      <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-      <span className="relative inline-flex size-1.5 rounded-full bg-white" />
-    </span>
-    Resume
-  </span>
-)
-
-const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
-  if (!lastError) {
-    return <Badge variant="destructive">Attention</Badge>
-  }
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <Badge variant="destructive" className="cursor-default">
-          Attention
-        </Badge>
-      </HoverCardTrigger>
-      <HoverCardContent
-        side="left"
-        className="max-w-xs whitespace-pre-wrap font-mono text-xs"
-      >
-        {lastError}
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
-
-/**
- * Footer left side: relative time on every state EXCEPT working,
- * which shows `now` (the dot is already pulsing — restating it as
- * "Working" would duplicate the pill in the title row).
- */
-function statusFootnote(
-  status: AgentLiveness,
-  lastUsedAt: number | null,
-): string {
-  if (status === 'working') return 'now'
-  return formatRelativeTime(lastUsedAt)
-}
-
-const UUID_PATTERN =
-  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-const OC_UUID_PATTERN =
-  /^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
-
-function displayName(agent: HarnessAgent): string {
-  const name = agent.name?.trim()
-  const id = agent.id
-  if (!name || name === id) {
-    if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
-    if (UUID_PATTERN.test(id)) return id.slice(0, 8)
-    return id
-  }
-  return name
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/QueuePanel.tsx
@@ -1,94 +0,0 @@
-import { ListPlus, X } from 'lucide-react'
-import type { FC } from 'react'
-import {
-  Queue,
-  QueueItem,
-  QueueItemAction,
-  QueueItemActions,
-  QueueItemAttachment,
-  QueueItemContent,
-  QueueItemFile,
-  QueueItemImage,
-  QueueList,
-  QueueSection,
-  QueueSectionContent,
-  QueueSectionLabel,
-  QueueSectionTrigger,
-} from '@/components/ai-elements/queue'
-import type {
-  HarnessQueuedMessage,
-  HarnessQueuedMessageAttachment,
-} from '@/entrypoints/app/agents/agent-harness-types'
-import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
-
-interface QueuePanelProps {
-  queue: HarnessQueuedMessage[]
-  onRemove: (messageId: string) => void
-}
-
-/**
- * Renders the agent's pending message queue using the shared AI
- * Elements `Queue` primitives. Caller is expected to gate render on
- * `queue.length > 0` — when empty, this returns null so the panel
- * disappears cleanly between turns.
- */
-export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
-  if (queue.length === 0) return null
-  return (
-    <Queue>
-      <QueueSection>
-        <QueueSectionTrigger>
-          <QueueSectionLabel
-            count={queue.length}
-            label={queue.length === 1 ? 'queued message' : 'queued messages'}
-            icon={<ListPlus className="size-3.5" />}
-          />
-        </QueueSectionTrigger>
-        <QueueSectionContent>
-          <QueueList>
-            {queue.map((entry) => (
-              <QueueItem key={entry.id}>
-                <div className="flex items-center gap-2">
-                  <QueueItemContent>
-                    {firstNonBlankLine(entry.message)}
-                  </QueueItemContent>
-                  <QueueItemActions>
-                    <QueueItemAction
-                      aria-label="Remove from queue"
-                      onClick={() => onRemove(entry.id)}
-                    >
-                      <X className="size-3" />
-                    </QueueItemAction>
-                  </QueueItemActions>
-                </div>
-                {entry.attachments && entry.attachments.length > 0 ? (
-                  <QueueItemAttachment>
-                    {entry.attachments.map((attachment, idx) =>
-                      renderAttachment(entry.id, attachment, idx),
-                    )}
-                  </QueueItemAttachment>
-                ) : null}
-              </QueueItem>
-            ))}
-          </QueueList>
-        </QueueSectionContent>
-      </QueueSection>
-    </Queue>
-  )
-}
-
-function renderAttachment(
-  messageId: string,
-  attachment: HarnessQueuedMessageAttachment,
-  idx: number,
-) {
-  if (attachment.mediaType.startsWith('image/')) {
-    const src = `data:${attachment.mediaType};base64,${attachment.data}`
-    return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
-  }
-  return (
-    <QueueItemFile key={`${messageId}-${idx}`}>
-      {attachment.mediaType}
-    </QueueItemFile>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.test.ts
@@ -1,69 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-import { orderHomeAgents } from './home-agent-card.helpers'
-
-function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
-  return {
-    id: overrides.id ?? 'agent-x',
-    name: overrides.name ?? overrides.id ?? 'agent-x',
-    adapter: overrides.adapter ?? 'codex',
-    permissionMode: 'approve-all',
-    sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
-    createdAt: 1000,
-    updatedAt: 1000,
-    ...overrides,
-  }
-}
-
-describe('orderHomeAgents', () => {
-  it('places active-turn agents before everyone else', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'a', lastUsedAt: 5000 }),
-      agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
-      agent({ id: 'c', lastUsedAt: 7000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
-  })
-
-  it('orders non-active agents by lastUsedAt desc', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'old', lastUsedAt: 1000 }),
-      agent({ id: 'new', lastUsedAt: 9000 }),
-      agent({ id: 'mid', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
-  })
-
-  it('puts the gateway `main` seed agent above other never-used agents', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
-  })
-
-  it('sends never-used agents to the bottom even when `main` is among them', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'main', lastUsedAt: null }),
-      agent({ id: 'used', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
-  })
-
-  it('does NOT sort by pinned — pinned agents are treated like any other', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
-      agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
-  })
-
-  it('falls back to id-stable ordering when lastUsedAt ties', () => {
-    const sorted = orderHomeAgents([
-      agent({ id: 'b', lastUsedAt: 5000 }),
-      agent({ id: 'a', lastUsedAt: 5000 }),
-    ])
-    expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/home-agent-card.helpers.ts
@@ -1,42 +0,0 @@
-import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
-
-/**
- * Order for the /home Recent agents grid.
- *
- * 1. Active turn first — agents mid-turn float to the top so the
- *    Resume affordance is the first thing the user sees on /home.
- * 2. The protected gateway-side `main` agent stays pinned-to-top in
- *    the never-used group on a fresh install (mirrors the rail).
- * 3. Recency (`lastUsedAt` desc).
- * 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
- *    every 5-second poll.
- *
- * Pin is NOT a sort key. The home grid is action-oriented and trusts
- * recency + active-turn to surface the right agent; pinning is an
- * organisation tool that lives on the rail at /agents.
- */
-export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
-  return [...agents].sort((a, b) => {
-    const aActive = a.activeTurnId != null
-    const bActive = b.activeTurnId != null
-    if (aActive !== bActive) return aActive ? -1 : 1
-
-    // Recency wins outright. Never-used agents (`lastUsedAt == null`)
-    // both fall to the same `-Infinity` bucket and the seed/id rules
-    // below decide their order — but a used agent always beats any
-    // never-used agent regardless of id.
-    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
-    if (aValue !== bValue) return bValue - aValue
-
-    // Inside the never-used (or exact-tie) group: pin the gateway
-    // `main` seed to the top of the group on a fresh install, then
-    // fall back to id-stable order so the grid doesn't reshuffle on
-    // every poll.
-    const aSeed = a.id === 'main' && a.lastUsedAt == null
-    const bSeed = b.id === 'main' && b.lastUsedAt == null
-    if (aSeed !== bSeed) return aSeed ? -1 : 1
-
-    return a.id.localeCompare(b.id)
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentCardData.ts
@@ -0,0 +1,53 @@
+import {
+  type AgentEntry,
+  getModelDisplayName,
+  type OpenClawStatus,
+} from '@/entrypoints/app/agents/useOpenClaw'
+import type { AgentCardData } from '@/lib/agent-conversations/types'
+import type { AgentOverview } from './useAgentDashboard'
+
+function resolveAgentStatus(
+  gatewayStatus: OpenClawStatus['status'] | undefined,
+  liveStatus: AgentOverview['status'] | undefined,
+): AgentCardData['status'] {
+  // Gateway-level errors take precedence
+  if (gatewayStatus === 'error') return 'error'
+  if (gatewayStatus === 'starting') return 'working'
+
+  // Per-agent live status from the WS observer
+  if (liveStatus === 'working') return 'working'
+  if (liveStatus === 'error') return 'error'
+
+  return 'idle'
+}
+
+/**
+ * Build agent card display data by merging the raw agent entries from
+ * the gateway with enriched overview data from the dashboard API.
+ *
+ * Pure function — no hooks, no IndexedDB, no async.
+ */
+export function buildAgentCardData(
+  agents: AgentEntry[],
+  status: OpenClawStatus['status'] | undefined,
+  dashboard: AgentOverview[] | undefined,
+): AgentCardData[] {
+  return agents.map((agent) => {
+    const overview = dashboard?.find((d) => d.agentId === agent.agentId)
+
+    return {
+      agentId: agent.agentId,
+      name: agent.name,
+      model: getModelDisplayName(agent.model),
+      status:
+        agent.source === 'agent-harness'
+          ? 'idle'
+          : resolveAgentStatus(status, overview?.status),
+      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
+      lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
+      activitySummary: overview?.activitySummary ?? undefined,
+      currentTool: overview?.currentTool ?? undefined,
+      costUsd: overview?.totalCostUsd ?? undefined,
+    }
+  })
+}
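Review note: the status precedence in `resolveAgentStatus` and the `agent-harness` short-circuit in `buildAgentCardData` can be checked in isolation. The sketch below restates them with local stand-in types, since the real `OpenClawStatus` and `AgentCardData` unions are not visible in this diff. `CardStatus`, `LiveStatus`, `resolveStatus`, and `cardStatus` are hypothetical names, and the gateway status is typed as a plain string as an assumption.

```typescript
// Assumed card status union, inferred from the values returned in the diff.
type CardStatus = 'working' | 'idle' | 'error'
// Matches AgentOverview['status'] as declared in useAgentDashboard.ts.
type LiveStatus = 'working' | 'idle' | 'error' | 'unknown'

// Gateway-level state wins; per-agent live state is consulted only afterwards.
function resolveStatus(
  gatewayStatus: string | undefined,
  liveStatus: LiveStatus | undefined,
): CardStatus {
  if (gatewayStatus === 'error') return 'error'
  if (gatewayStatus === 'starting') return 'working'
  if (liveStatus === 'working') return 'working'
  if (liveStatus === 'error') return 'error'
  return 'idle'
}

// Harness-sourced agents bypass the gateway entirely and always report idle.
function cardStatus(
  source: string | undefined,
  gatewayStatus: string | undefined,
  liveStatus: LiveStatus | undefined,
): CardStatus {
  return source === 'agent-harness'
    ? 'idle'
    : resolveStatus(gatewayStatus, liveStatus)
}
```

Note that `AgentCommandHome` passes `undefined` for `dashboard` in this commit, so `liveStatus` is always `undefined` there and only the gateway branches can produce `working` or `error` on the home cards.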
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentConversation.ts
@@ -36,15 +36,6 @@ interface UseAgentConversationOptions {
  history?: OpenClawChatHistoryMessage[]
  onComplete?: () => void
  onSessionKeyChange?: (sessionKey: string) => void
-  /**
-   * Server-side active turn id, surfaced via the listing query. When
-   * this changes from null/<id> to a different non-null id while we
-   * aren't already streaming (e.g. the server just popped a queued
-   * message and started a new turn), the hook reattaches via
-   * /chat/active so the chat panel picks up the live stream without
-   * waiting for a remount.
-   */
-  activeTurnId?: string | null
 }

 export function useAgentConversation(
@@ -220,46 +211,31 @@ export function useAgentConversation(
  }
  processEventRef.current = processAgentHarnessStreamEvent

-  const activeTurnIdDep = options.activeTurnId ?? null
-
-  // On mount, on agent change, and whenever the listing reports a
-  // *new* active turn id, check whether the server has an in-flight
-  // turn for this agent and reattach to it. This catches three
-  // cases at once: the chat resilience flow (tab close/reopen),
-  // navigation between agents, AND queue drain (the server starts a
-  // new turn from a queued message → activeTurnId flips → attach).
+  // On mount (and whenever the agent changes), check whether the
+  // server has an in-flight turn for this agent and reattach to it.
+  // This is what makes the chat resilient across tab close/reopen,
+  // refresh, and navigation: the runtime call kept running on the
+  // server while we were away. Effect only depends on `agentId` —
+  // the event handler is read off a ref so this doesn't re-subscribe
+  // every render.
  useEffect(() => {
    let cancelled = false
    const abortController = new AbortController()
-    // Reference the dep inside the body so biome's exhaustive-deps
-    // rule sees it consumed; the value is just an "any non-null
-    // active turn id" trigger — the actual id we attach to comes
-    // from the fresh fetchActiveHarnessTurn call below.
-    void activeTurnIdDep

    const attemptResume = async () => {
-      // Track whether *we* started a stream in this run. When the
-      // early-return paths fire (no active turn, or a `send()` /
-      // earlier resume already owns `streamAbortRef`), the finally
-      // block must NOT touch streaming/turnIdRef/lastSeqRef —
-      // otherwise we clobber the in-flight stream's state and the
-      // Stop button drops out mid-turn while events keep arriving.
-      let weStartedStream = false
      try {
        const active = await fetchActiveHarnessTurn(agentId)
        if (cancelled || !active || active.status !== 'running') return
-        if (streamAbortRef.current) return // someone else already owns the stream
+        if (streamAbortRef.current) return // a fresh send already in flight

        // Stage a placeholder turn so the streamed events have a row
-        // to render into. The server now persists the kicking-off
-        // prompt on the active turn, so we render it as the user
-        // bubble immediately — no empty-bubble flicker when a queued
-        // message starts running.
+        // to render into. We don't have the user message text on
+        // resume; the assistant turn is what we're catching up on.
        setTurns((prev) => [
          ...prev,
          {
            id: crypto.randomUUID(),
-            userText: active.prompt ?? '',
+            userText: '',
            parts: [],
            done: false,
            timestamp: active.startedAt,
@@ -271,7 +247,6 @@ export function useAgentConversation(
        lastSeqRef.current = null
        streamAbortRef.current = abortController
        setStreaming(true)
-        weStartedStream = true

        const response = await attachToHarnessTurn(agentId, {
          turnId: active.turnId,
@@ -290,20 +265,10 @@ export function useAgentConversation(
        // Resume is best-effort; transient errors fall back to the
        // user starting a new turn manually.
      } finally {
-        // Always release `streamAbortRef` if we owned it — even when
-        // the effect was cancelled mid-stream (a listing poll
-        // captured the next queue-drain turn id, for example). If we
-        // don't, the next effect run hits `if (streamAbortRef.current)
-        // return` against our now-aborted controller and never
-        // reattaches, leaving `streaming === true` with no live stream.
-        if (weStartedStream && streamAbortRef.current === abortController) {
-          streamAbortRef.current = null
-        }
-        // The other state (streaming flag, turn id, lastSeq) is the
-        // *current run's* lifecycle: only reset it on a clean exit.
-        // When `cancelled` is true the next run will set these
-        // itself, so resetting here would only cause a brief flicker.
-        if (!cancelled && weStartedStream) {
+        if (!cancelled) {
+          if (streamAbortRef.current === abortController) {
+            streamAbortRef.current = null
+          }
          turnIdRef.current = null
          lastSeqRef.current = null
          setStreaming(false)
@@ -316,7 +281,7 @@ export function useAgentConversation(
      cancelled = true
      abortController.abort()
    }
-  }, [agentId, activeTurnIdDep])
+  }, [agentId])

  const send = async (input: string | SendInput) => {
    const normalized: SendInput =
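Review note: the simplified resume effect in `useAgentConversation` follows the usual "cancelled flag plus `AbortController`" cleanup pattern. The sketch below restates that control flow outside React so it can be exercised directly. Every name in it (`startResume`, `ResumeDeps`, `fetchActive`, `attach`, `owner`, `onStreaming`) is a hypothetical stand-in for the hook's refs and setters, not the real API.

```typescript
interface ActiveTurn {
  turnId: string
  status: string
}

interface ResumeDeps {
  fetchActive: () => Promise<ActiveTurn | null>
  attach: (turnId: string, signal: AbortSignal) => Promise<void>
  // Stand-in for streamAbortRef: whoever holds a controller here owns the stream.
  owner: { current: AbortController | null }
  // Stand-in for setStreaming.
  onStreaming: (value: boolean) => void
}

function startResume(deps: ResumeDeps): {
  done: Promise<void>
  cleanup: () => void
} {
  let cancelled = false
  const controller = new AbortController()

  const run = async (): Promise<void> => {
    try {
      const active = await deps.fetchActive()
      if (cancelled || !active || active.status !== 'running') return
      if (deps.owner.current) return // a fresh send is already in flight
      deps.owner.current = controller
      deps.onStreaming(true)
      await deps.attach(active.turnId, controller.signal)
    } catch {
      // Resume is best-effort; errors are swallowed as in the hook.
    } finally {
      // As in the diff, this also runs after the early returns above.
      if (!cancelled) {
        if (deps.owner.current === controller) deps.owner.current = null
        deps.onStreaming(false)
      }
    }
  }

  return {
    done: run(),
    cleanup: () => {
      cancelled = true
      controller.abort()
    },
  }
}
```

One behaviour worth keeping in mind: because the `finally` block is reached from the early-return paths too, the reset runs even when this run never started a stream. The removed `weStartedStream` flag guarded against exactly that case.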
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agent-command/useAgentDashboard.ts
@@ -0,0 +1,95 @@
+import { useQuery, useQueryClient } from '@tanstack/react-query'
+import { useEffect } from 'react'
+import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
+
+export interface AgentOverview {
+  agentId: string
+  status: 'working' | 'idle' | 'error' | 'unknown'
+  latestMessage: string | null
+  latestMessageAt: number | null
+  activitySummary: string | null
+  currentTool: string | null
+  totalCostUsd: number
+  sessionCount: number
+}
+
+export interface DashboardResponse {
+  agents: AgentOverview[]
+  summary: {
+    totalAgents: number
+    totalCostUsd: number
+  }
+}
+
+interface StatusEvent {
+  agentId: string
+  status: AgentOverview['status']
+  currentTool: string | null
+  error: string | null
+  timestamp: number
+}
+
+const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
+
+export function useAgentDashboard(enabled: boolean) {
+  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
+  const queryClient = useQueryClient()
+  const ready = enabled && Boolean(baseUrl) && !urlLoading
+
+  // Initial data load. No refetchInterval is set, so live updates come from the SSE stream below.
+  const query = useQuery<DashboardResponse>({
+    queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
+    queryFn: async () => {
+      const url = new URL('/claw/dashboard', baseUrl as string)
+      const response = await fetch(url.toString())
+      if (!response.ok) throw new Error('Failed to fetch dashboard')
+      return response.json()
+    },
+    enabled: ready,
+  })
+
+  // SSE subscription for real-time status patches
+  useEffect(() => {
+    if (!ready || !baseUrl) return
+
+    const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
+    const eventSource = new EventSource(streamUrl.toString())
+
+    eventSource.addEventListener('snapshot', (event) => {
+      try {
+        const dashboard = JSON.parse(event.data) as DashboardResponse
+        queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
+      } catch {}
+    })
+
+    eventSource.addEventListener('status', (event) => {
+      try {
+        const status = JSON.parse(event.data) as StatusEvent
+        queryClient.setQueryData<DashboardResponse>(
+          [...DASHBOARD_QUERY_KEY, baseUrl],
+          (prev) => {
+            if (!prev) return prev
+            return {
+              ...prev,
+              agents: prev.agents.map((agent) =>
+                agent.agentId === status.agentId
+                  ? {
+                      ...agent,
+                      status: status.status,
+                      currentTool: status.currentTool,
+                    }
+                  : agent,
+              ),
+            }
+          },
+        )
+      } catch {}
+    })
+
+    return () => {
+      eventSource.close()
+    }
+  }, [ready, baseUrl, queryClient])
+
+  return query
+}
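The `status` listener in `useAgentDashboard` patches a single agent in the cached dashboard instead of refetching. A minimal sketch of that updater as a standalone pure function, which makes the patch logic checkable outside React; the name `applyStatusEvent` and the trimmed-down types are assumptions for illustration, not part of this change:

```typescript
// Trimmed-down copies of the shapes in useAgentDashboard.ts (assumption:
// only the fields the patch touches are needed for this sketch).
type AgentStatus = 'working' | 'idle' | 'error' | 'unknown'

interface AgentOverview {
  agentId: string
  status: AgentStatus
  currentTool: string | null
}

interface DashboardResponse {
  agents: AgentOverview[]
  summary: { totalAgents: number; totalCostUsd: number }
}

interface StatusEvent {
  agentId: string
  status: AgentStatus
  currentTool: string | null
}

// Hypothetical helper: returns a new dashboard object with the matching
// agent patched. Returns `prev` untouched when the cache is empty,
// mirroring the `if (!prev) return prev` guard in the hook.
function applyStatusEvent(
  prev: DashboardResponse | undefined,
  event: StatusEvent,
): DashboardResponse | undefined {
  if (!prev) return prev
  return {
    ...prev,
    agents: prev.agents.map((agent) =>
      agent.agentId === event.agentId
        ? { ...agent, status: event.status, currentTool: event.currentTool }
        : agent,
    ),
  }
}
```

Because the function builds new objects rather than mutating `prev`, it can be passed directly as the updater argument of `queryClient.setQueryData`.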
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentList.tsx
@@ -2,87 +2,67 @@ import { Loader2 } from 'lucide-react'
 import { type FC, useMemo } from 'react'
 import { AgentRowCard } from './AgentRowCard'
 import { AgentsEmptyState } from './AgentsEmptyState'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-  HarnessAgentAdapter,
-} from './agent-harness-types'
-import type {
-  AgentAdapterHealth,
-  AgentRowData,
-} from './agent-row/agent-row.types'
+import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
 import type { AgentListItem } from './agents-page-types'
 import type { AgentLiveness } from './LivenessDot'

 interface AgentListProps {
  agents: AgentListItem[]
-  /** Optional per-agent activity metadata, keyed by `agentId`. */
+  /**
+   * Optional per-agent activity metadata. Keyed by `agentId`. Missing
+   * entries fall back to status='unknown' / lastUsedAt=null and the
+   * row renders an "unknown" dot. The server will populate this once
+   * the activity tracker ships; the page works without it.
+   */
  activity?: Record<
    string,
    { status: AgentLiveness; lastUsedAt: number | null }
  >
-  /** Lookup table from harness id → enriched agent record. */
+  /**
+   * Lookup table from harness agent id → adapter + reasoning effort,
+   * sourced from `useHarnessAgents`. Lets the row card render the
+   * correct adapter icon and chips for harness agents (legacy
+   * /claw/agents entries fall back to inferring from `runtimeLabel`).
+   */
  harnessAgentLookup?: Map<string, HarnessAgent>
-  /** Adapter catalog (carries per-adapter health). */
-  adapters: HarnessAdapterDescriptor[]
  loading: boolean
  deletingAgentKey: string | null
  onCreateAgent: () => void
  onDeleteAgent: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
 }

 export const AgentList: FC<AgentListProps> = ({
  agents,
  activity,
  harnessAgentLookup,
-  adapters,
  loading,
  deletingAgentKey,
  onCreateAgent,
  onDeleteAgent,
-  onPinToggle,
 }) => {
-  const adapterHealth = useMemo(() => {
-    const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
-    for (const adapter of adapters) {
-      if (adapter.health) {
-        map.set(adapter.id, {
-          healthy: adapter.health.healthy,
-          reason: adapter.health.reason,
-        })
-      }
-    }
-    return map
-  }, [adapters])
-
-  // Sort: pinned rows first, then most recently used, then never-used
-  // agents in id-stable order. The gateway's `main` agent stays
-  // pinned-to-top when never touched so a fresh install has an
-  // obvious starting point.
+  // Sort by recency: most recently used first; never-used agents drop
+  // to the bottom in id-stable order so the list doesn't reshuffle on
+  // every refresh. The pinned exception is the gateway's `main` agent
+  // when it's never been touched — keep it at the top so a fresh
+  // install has an obvious starting point.
  const ordered = useMemo(() => {
-    const withMeta = agents.map((agent) => {
-      const harness = harnessAgentLookup?.get(agent.agentId)
-      return {
-        agent,
-        pinned: harness?.pinned ?? false,
-        lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
-      }
+    const withScore = agents.map((agent) => {
+      const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
+      return { agent, lastUsedAt }
    })
-    return withMeta
+    return withScore
      .sort((a, b) => {
-        if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
-        const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
-        const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
-        if (aSeed && !bSeed) return -1
-        if (!aSeed && bSeed) return 1
+        const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
+        const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
+        if (aPinned && !bPinned) return -1
+        if (!aPinned && bPinned) return 1
        const aValue = a.lastUsedAt ?? -Infinity
        const bValue = b.lastUsedAt ?? -Infinity
        if (aValue !== bValue) return bValue - aValue
        return a.agent.agentId.localeCompare(b.agent.agentId)
      })
      .map((entry) => entry.agent)
-  }, [activity, agents, harnessAgentLookup])
+  }, [activity, agents])

  if (loading && agents.length === 0) {
    return (
@@ -100,23 +80,18 @@ export const AgentList: FC<AgentListProps> = ({
    <div className="grid gap-3">
      {ordered.map((agent) => {
        const harness = harnessAgentLookup?.get(agent.agentId)
-        const adapter: HarnessAgentAdapter | 'unknown' =
+        const adapter: HarnessAgentAdapter | undefined =
          harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
-        const data = buildRowData({
-          agent,
-          adapter,
-          harness,
-          activity: activity?.[agent.agentId],
-          adapterHealth:
-            adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
-        })
        return (
          <AgentRowCard
            key={agent.key}
-            data={data}
-            deleting={deletingAgentKey === agent.key}
+            agent={agent}
+            status={activity?.[agent.agentId]?.status}
+            lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
+            adapter={adapter}
+            reasoningEffort={harness?.reasoningEffort ?? null}
            onDelete={onDeleteAgent}
-            onPinToggle={onPinToggle}
+            deleting={deletingAgentKey === agent.key}
          />
        )
      })}
@@ -124,53 +99,10 @@ export const AgentList: FC<AgentListProps> = ({
  )
 }

-function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
+function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
  const lower = label?.toLowerCase()
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'openclaw') return 'openclaw'
-  return 'unknown'
-}
-
-const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
-
-function buildRowData(input: {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  harness: HarnessAgent | undefined
-  activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
-  adapterHealth: AgentAdapterHealth | null
-}): AgentRowData {
-  const { agent, adapter, harness, activity, adapterHealth } = input
-  return {
-    agent,
-    adapter,
-    modelLabel: deriveModelLabel(agent, harness),
-    reasoningEffort: harness?.reasoningEffort ?? null,
-    status: activity?.status ?? 'unknown',
-    lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
-    pinned: harness?.pinned ?? false,
-    cwd: harness?.cwd ?? null,
-    lastUserMessage: harness?.lastUserMessage ?? null,
-    tokens: harness?.tokens ?? null,
-    turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
-    failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
-    lastError: harness?.lastError ?? null,
-    lastErrorAt: harness?.lastErrorAt ?? null,
-    activeTurnId: harness?.activeTurnId ?? null,
-    adapterHealth,
-  }
-}
-
-function deriveModelLabel(
-  agent: AgentListItem,
-  harness: HarnessAgent | undefined,
-): string | null {
-  // Prefer the agent rail's modelLabel when meaningful; harness's
-  // modelId is a stable identifier but the rail's `modelLabel`
-  // already maps to a friendly display string.
-  if (agent.modelLabel && agent.modelLabel !== 'default') {
-    return agent.modelLabel
-  }
-  return harness?.modelId ?? null
+  return undefined
 }
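The sort in `AgentList` combines three rules: a never-used `main` agent goes first, then agents by most recent use, then never-used agents in id order. A small sketch of the same comparator on a reduced row shape; `SortableAgent`, `compareAgents`, and the sample data are hypothetical names introduced here for illustration:

```typescript
// Hypothetical minimal row shape; the real list pairs AgentListItem with
// an activity lookup.
interface SortableAgent {
  agentId: string
  lastUsedAt: number | null
}

function compareAgents(a: SortableAgent, b: SortableAgent): number {
  // A never-used `main` agent is pinned to the top.
  const aPinned = a.agentId === 'main' && a.lastUsedAt === null
  const bPinned = b.agentId === 'main' && b.lastUsedAt === null
  if (aPinned && !bPinned) return -1
  if (!aPinned && bPinned) return 1
  // Most recently used first; never-used agents sort last.
  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  // The inequality check also avoids computing -Infinity - -Infinity (NaN).
  if (aValue !== bValue) return bValue - aValue
  // Id tie-break keeps the order stable between refreshes.
  return a.agentId.localeCompare(b.agentId)
}

const sample: SortableAgent[] = [
  { agentId: 'zeta', lastUsedAt: null },
  { agentId: 'beta', lastUsedAt: 200 },
  { agentId: 'main', lastUsedAt: null },
  { agentId: 'alpha', lastUsedAt: 100 },
  { agentId: 'gamma', lastUsedAt: null },
]

// Copy before sorting so the input array is left as-is.
const ordered = sample.slice().sort(compareAgents).map((a) => a.agentId)
// ordered: main, beta, alpha, gamma, zeta
```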
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentRowCard.tsx
@@ -1,99 +1,270 @@
+import {
+  Copy,
+  Loader2,
+  MessageSquare,
+  MoreHorizontal,
+  Pencil,
+  RotateCcw,
+  Trash2,
+} from 'lucide-react'
 import type { FC } from 'react'
+import { useNavigate } from 'react-router'
+import { toast } from 'sonner'
+import { Badge } from '@/components/ui/badge'
+import { Button } from '@/components/ui/button'
+import {
+  DropdownMenu,
+  DropdownMenuContent,
+  DropdownMenuItem,
+  DropdownMenuSeparator,
+  DropdownMenuTrigger,
+} from '@/components/ui/dropdown-menu'
+import {
+  Tooltip,
+  TooltipContent,
+  TooltipProvider,
+  TooltipTrigger,
+} from '@/components/ui/tooltip'
 import { cn } from '@/lib/utils'
-import { AgentActions } from './agent-row/AgentActions'
-import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
-import { AgentLastMessage } from './agent-row/AgentLastMessage'
-import { AgentMetaRow } from './agent-row/AgentMetaRow'
-import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
-import { AgentTile } from './agent-row/AgentTile'
-import { AgentTitleRow } from './agent-row/AgentTitleRow'
-import type {
-  AgentRowCallbacks,
-  AgentRowData,
-} from './agent-row/agent-row.types'
+import { AdapterIcon, adapterLabel } from './AdapterIcon'
+import {
+  canDelete as canDeleteAgent,
+  canRename as canRenameAgent,
+  displayName,
+  formatRelativeTime,
+  workspaceLabel,
+} from './agent-display.helpers'
+import type { HarnessAgentAdapter } from './agent-harness-types'
+import type { AgentListItem } from './agents-page-types'
+import { type AgentLiveness, LivenessDot } from './LivenessDot'

-interface AgentRowCardProps extends AgentRowCallbacks {
-  data: AgentRowData
-  /** Whether THIS agent is mid-delete; renders a spinner in the menu. */
+interface AgentRowCardProps {
+  agent: AgentListItem
+  /**
+   * Per-agent extras the listing surface provides on top of the
+   * minimal `AgentListItem` shape. `lastUsedAt` survives server
+   * restart (sourced from acpx session record); `status` is in-memory
+   * server-side.
+   */
+  status?: AgentLiveness
+  lastUsedAt?: number | null
+  /** Adapter the agent belongs to. Drives icon + label. */
+  adapter?: HarnessAgentAdapter
+  /** Reasoning effort chip (claude/codex/openclaw catalog). */
+  reasoningEffort?: string | null
+  /** Called when the user picks Delete; the parent owns the dialog. */
+  onDelete: (agent: AgentListItem) => void
+  /** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
  deleting?: boolean
 }

-/**
- * Composition shell for the agent rail. Owns no state; sub-components
- * each handle their own micro-state (error-panel collapse, etc.) and
- * emit callbacks (delete, pin/unpin) for the page to act on.
- *
- * The whole card carries state — not just the tile — so the row's
- * border subtly tells the user what's going on at a glance:
- *   working → accent-orange border with a soft glow
- *   error   → destructive border
- *   idle    → muted border, lifts on hover
- */
 export const AgentRowCard: FC<AgentRowCardProps> = ({
-  data,
-  deleting,
+  agent,
+  status = 'unknown',
+  lastUsedAt,
+  adapter,
+  reasoningEffort,
  onDelete,
-  onPinToggle,
+  deleting,
 }) => {
+  const navigate = useNavigate()
+  const adapterId = adapter ?? inferAdapterFromListItem(agent)
+  const workspace = workspaceLabel(agent)
+  const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
+  const allowDelete = canDeleteAgent(agent)
+  const allowRename = canRenameAgent(agent)
+
+  const handleChat = () => navigate(`/agents/${agent.agentId}`)
+  const handleCopyId = async () => {
+    try {
+      await navigator.clipboard.writeText(agent.agentId)
+      toast.success('Agent id copied')
+    } catch {
+      toast.error('Could not copy agent id')
+    }
+  }
+
  return (
    <div
      className={cn(
-        // Layout-stable hover. No translate, no shadow change — both
-        // visibly perturb neighbouring rows. Only the border tint
-        // shifts on hover, and the rail's vertical rhythm stays
-        // exactly the same in every state.
-        'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
-        data.status === 'working'
-          ? 'border-[var(--accent-orange)]/40'
-          : data.status === 'error'
-            ? 'border-destructive/40'
-            : 'border-border hover:border-[var(--accent-orange)]/30',
+        'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
+        'hover:border-[var(--accent-orange)]/50',
      )}
    >
      <div className="flex items-start gap-4">
-        <AgentTile
-          adapter={data.adapter}
-          status={data.status}
-          lastUsedAt={data.lastUsedAt}
-        />
-
-        <div className="min-w-0 flex-1">
-          <AgentTitleRow
-            agent={data.agent}
-            status={data.status}
-            pinned={data.pinned}
-            turnsByDay={data.turnsByDay}
-            failedByDay={data.failedByDay}
-            onPinToggle={(next) => onPinToggle(data.agent, next)}
+        {/* Adapter tile + liveness dot in the corner. */}
+        <div className="relative shrink-0">
+          <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
+            <AdapterIcon adapter={adapterId} className="h-6 w-6" />
+          </div>
+          <LivenessDot
+            status={status}
+            detail={livenessDetail(status, lastUsedAt)}
+            className="absolute -right-0.5 -bottom-0.5"
          />
-
-          <AgentSummaryChips
-            adapter={data.adapter}
-            modelLabel={data.modelLabel}
-            reasoningEffort={data.reasoningEffort}
-            adapterHealth={data.adapterHealth}
-          />
-
-          <AgentLastMessage message={data.lastUserMessage} />
-
-          <AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
-
-          {data.status === 'error' && data.lastError && (
-            <AgentErrorPanel
-              agentId={data.agent.agentId}
-              message={data.lastError}
-              errorAt={data.lastErrorAt}
-            />
-          )}
        </div>

-        <AgentActions
-          agent={data.agent}
-          activeTurnId={data.activeTurnId}
-          deleting={deleting}
-          onDelete={onDelete}
-        />
+        <div className="min-w-0 flex-1">
+          <div className="mb-1 flex items-center gap-2">
+            <span className="truncate font-semibold">{displayName(agent)}</span>
+            {status === 'working' && (
+              <Badge
+                variant="secondary"
+                className="bg-amber-50 text-amber-900 hover:bg-amber-50"
+              >
+                Working
+              </Badge>
+            )}
+            {status === 'asleep' && (
+              <Badge variant="outline" className="text-muted-foreground">
+                Asleep
+              </Badge>
+            )}
+            {status === 'error' && (
+              <Badge variant="destructive">Attention</Badge>
+            )}
+          </div>
+
+          <div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
+            <Badge variant="secondary" className="font-normal">
+              {adapterLabel(adapterId)}
+            </Badge>
+            {agent.modelLabel && agent.modelLabel !== 'default' && (
+              <Badge variant="outline" className="font-normal">
+                {agent.modelLabel}
+              </Badge>
+            )}
+            {reasoningEffort && reasoningEffort !== 'medium' && (
+              <Badge variant="outline" className="font-normal">
+                {reasoningEffort}
+              </Badge>
+            )}
+          </div>
+
+          <div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
+            <span>Last used {lastUsedLabel}</span>
+            {workspace && (
+              <>
+                <span aria-hidden>•</span>
+                <span className="truncate font-mono" title={workspace}>
+                  {workspace}
+                </span>
+              </>
+            )}
+          </div>
+        </div>
+
+        <div className="flex shrink-0 items-center gap-2">
+          <Button variant="outline" size="sm" onClick={handleChat}>
+            <MessageSquare className="mr-1.5 h-3 w-3" />
+            Chat
+          </Button>
+          <DropdownMenu>
+            <DropdownMenuTrigger asChild>
+              <Button
+                variant="ghost"
+                size="icon"
+                aria-label={`More actions for ${displayName(agent)}`}
+                className="h-8 w-8"
+              >
+                <MoreHorizontal className="h-4 w-4" />
+              </Button>
+            </DropdownMenuTrigger>
+            <DropdownMenuContent align="end" className="w-44">
+              <DropdownMenuItem onSelect={() => void handleCopyId()}>
+                <Copy className="mr-2 h-3.5 w-3.5" />
+                Copy id
+              </DropdownMenuItem>
+              <RenameMenuItem disabled={!allowRename} />
+              <ResetHistoryMenuItem />
+              <DropdownMenuSeparator />
+              <DropdownMenuItem
+                onSelect={() => onDelete(agent)}
+                disabled={!allowDelete || deleting}
+                className="text-destructive focus:text-destructive"
+              >
+                {deleting ? (
+                  <Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
+                ) : (
+                  <Trash2 className="mr-2 h-3.5 w-3.5" />
+                )}
+                Delete
+              </DropdownMenuItem>
+            </DropdownMenuContent>
+          </DropdownMenu>
+        </div>
      </div>
    </div>
  )
 }
+
+const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <Pencil className="mr-2 h-3.5 w-3.5" />
+      Rename
+    </DropdownMenuItem>
+  )
+  if (!disabled) return item
+  // Rename is not wired up yet, so the item always renders disabled; the tooltip hint is added only when `disabled` is true.
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Rename coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+const ResetHistoryMenuItem: FC = () => {
+  const item = (
+    <DropdownMenuItem disabled className="text-muted-foreground">
+      <RotateCcw className="mr-2 h-3.5 w-3.5" />
+      Reset history
+    </DropdownMenuItem>
+  )
+  return (
+    <TooltipProvider delayDuration={300}>
+      <Tooltip>
+        <TooltipTrigger asChild>
+          <span className="block w-full">{item}</span>
+        </TooltipTrigger>
+        <TooltipContent side="left" className="text-xs">
+          Reset history coming soon
+        </TooltipContent>
+      </Tooltip>
+    </TooltipProvider>
+  )
+}
+
+function inferAdapterFromListItem(
+  agent: AgentListItem,
+): HarnessAgentAdapter | 'unknown' {
+  const label = agent.runtimeLabel?.toLowerCase()
+  if (label?.includes('claude')) return 'claude'
+  if (label?.includes('codex')) return 'codex'
+  if (label?.includes('openclaw')) return 'openclaw'
+  return 'unknown'
+}
+
+function livenessDetail(
+  status: AgentLiveness,
+  lastUsedAt: number | null | undefined,
+): string | undefined {
+  if (lastUsedAt == null) return undefined
+  const diffMin = Math.max(0, Math.floor((Date.now() - lastUsedAt) / 60_000))
+  if (status === 'idle') return `Idle for ${diffMin} min`
+  if (status === 'asleep') {
+    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
+    const hr = Math.floor(diffMin / 60)
+    return `Asleep — quiet for ${hr} hr`
+  }
+  if (status === 'working') return 'Working on a turn'
+  if (status === 'error') return 'Attention — last turn failed'
+  return undefined
+}
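`livenessDetail` reads `Date.now()` directly, which makes its minute and hour bucketing awkward to check in isolation. A sketch of the duration part with the clock passed in as a parameter; `formatQuietDuration` and its `now` argument are assumptions for illustration and do not exist in this change:

```typescript
const ONE_MINUTE_MS = 60_000

// Hypothetical helper: formats the gap between `lastUsedAt` and `now`
// as whole minutes below one hour, and whole hours from one hour up.
// Negative gaps (for example from clock skew) are clamped to zero.
function formatQuietDuration(lastUsedAt: number, now: number): string {
  const diffMin = Math.max(0, Math.floor((now - lastUsedAt) / ONE_MINUTE_MS))
  if (diffMin < 60) return `${diffMin} min`
  return `${Math.floor(diffMin / 60)} hr`
}
```

A caller could pass `Date.now()` in production and a fixed timestamp in tests, keeping the bucketing logic deterministic.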
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
@@ -44,7 +44,6 @@ import {
  useCreateHarnessAgent,
  useDeleteHarnessAgent,
  useHarnessAgents,
-  useUpdateHarnessAgent,
 } from './useAgents'
 import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'

@@ -77,7 +76,6 @@ export const AgentsPage: FC = () => {
  } = useOpenClawAgents(openClawAgentsEnabled)
  const createHarnessAgent = useCreateHarnessAgent()
  const deleteHarnessAgent = useDeleteHarnessAgent()
-  const updateHarnessAgent = useUpdateHarnessAgent()
  const {
    setupOpenClaw,
    createAgent: createOpenClawAgent,
@@ -344,24 +342,12 @@ export const AgentsPage: FC = () => {
          agents={agentListItems}
          activity={agentActivity}
          harnessAgentLookup={harnessAgentLookup}
-          adapters={adapters}
          loading={agentsLoading}
          deletingAgentKey={deletingAgent ? deletingAgentKey : null}
          onCreateAgent={() => setCreateOpen(true)}
          onDeleteAgent={(agent) => {
            void handleDelete(agent)
          }}
-          onPinToggle={(agent, next) => {
-            // Optimistic mutation; harness-only — gateway-original
-            // OpenClaw entries are gated server-side via the harness
-            // backfill, so we only fire when the row maps to a
-            // harness agent record.
-            if (!harnessAgentLookup.has(agent.agentId)) return
-            updateHarnessAgent.mutate({
-              agentId: agent.agentId,
-              patch: { pinned: next },
-            })
-          }}
        />

        <SetupOpenClawDialog
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-display.helpers.ts
@@ -1,5 +1,4 @@
 import type { AgentListItem } from './agents-page-types'
-import type { AgentLiveness } from './LivenessDot'

 /**
 * Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -83,25 +82,3 @@ export function formatRelativeTime(epochMs: number | null): string {
  const d = Math.floor(diff / ONE_DAY)
  return d === 1 ? '1 day ago' : `${d} days ago`
 }
-
-/**
- * Tooltip-friendly description of a row's current liveness state.
- * Returns `undefined` when the state has nothing extra to add (e.g.
- * `unknown` with no timestamp).
- */
-export function livenessDetail(
-  status: AgentLiveness,
-  lastUsedAt: number | null | undefined,
-): string | undefined {
-  if (lastUsedAt == null) return undefined
-  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
-  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
-  if (status === 'asleep') {
-    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
-    const hr = Math.floor(diffMin / 60)
-    return `Asleep — quiet for ${hr} hr`
-  }
-  if (status === 'working') return 'Working on a turn'
-  if (status === 'error') return 'Attention — last turn failed'
-  return undefined
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-harness-types.ts
@@ -56,43 +56,6 @@ export interface HarnessAgent {
   * agents. Drives the recency sort and the "Last used X min ago" copy.
   */
  lastUsedAt?: number | null
-  /** Pinned agents float to the top of the list. Defaults to `false`. */
-  pinned?: boolean
-  /** First non-blank line of the most recent user message; null if none. */
-  lastUserMessage?: string | null
-  /** Working directory the agent runs in; null when no session record yet. */
-  cwd?: string | null
-  /** Cumulative + 7-day rolling token usage; null when no record. */
-  tokens?: {
-    last7d: { input: number; output: number; requestCount: number }
-    cumulative: { input: number; output: number }
-  } | null
-  turnsByDay?: number[]
-  failedByDay?: number[]
-  lastError?: string | null
-  lastErrorAt?: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId?: string | null
-  /** Persistent FIFO queue of messages waiting for this agent. */
-  queue?: HarnessQueuedMessage[]
-}
-
-export interface HarnessQueuedMessageAttachment {
-  mediaType: string
-  data: string
-}
-
-export interface HarnessQueuedMessage {
-  id: string
-  createdAt: number
-  message: string
-  attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
-}
-
-export interface HarnessAdapterHealth {
-  healthy: boolean
-  reason?: string
-  checkedAt: number
 }

 export interface HarnessAdapterDescriptor {
@@ -103,7 +66,6 @@ export interface HarnessAdapterDescriptor {
  modelControl: 'runtime-supported' | 'best-effort'
  models: Array<{ id: string; label: string; recommended?: boolean }>
  reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
-  health?: HarnessAdapterHealth
 }

 export interface CreateHarnessAgentInput {
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentActions.tsx
@@ -1,160 +0,0 @@
-import {
-  Copy,
-  Loader2,
-  MessageSquare,
-  MoreHorizontal,
-  Pencil,
-  RotateCcw,
-  Trash2,
-} from 'lucide-react'
-import type { FC } from 'react'
-import { useNavigate } from 'react-router'
-import { toast } from 'sonner'
-import { Button } from '@/components/ui/button'
-import {
-  DropdownMenu,
-  DropdownMenuContent,
-  DropdownMenuItem,
-  DropdownMenuSeparator,
-  DropdownMenuTrigger,
-} from '@/components/ui/dropdown-menu'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import {
-  canDelete as canDeleteAgent,
-  canRename as canRenameAgent,
-  displayName,
-} from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-
-interface AgentActionsProps {
-  agent: AgentListItem
-  activeTurnId: string | null
-  deleting?: boolean
-  onDelete: (agent: AgentListItem) => void
-}
-
-/**
- * Single primary CTA per row: `Resume` (filled, accent-orange, with a
- * pulsing dot) when an active turn exists; otherwise `Chat` (outline).
- * Both navigate to the same place — the chat hook auto-attaches via
- * `/chat/active` when there's a live turn — but the row signals which
- * action the user is actually taking.
- */
-export const AgentActions: FC<AgentActionsProps> = ({
-  agent,
-  activeTurnId,
-  deleting,
-  onDelete,
-}) => {
-  const navigate = useNavigate()
-  const allowDelete = canDeleteAgent(agent)
-  const allowRename = canRenameAgent(agent)
-
-  const handleChat = () => navigate(`/agents/${agent.agentId}`)
-  const handleCopyId = async () => {
-    try {
-      await navigator.clipboard.writeText(agent.agentId)
-      toast.success('Agent id copied')
-    } catch {
-      toast.error('Could not copy agent id')
-    }
-  }
-
-  return (
-    <div className="flex shrink-0 items-center gap-1.5">
-      {activeTurnId ? (
-        <Button
-          variant="default"
-          size="sm"
-          onClick={handleChat}
-          className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
-        >
-          <span className="relative flex size-2">
-            <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
-            <span className="relative inline-flex size-2 rounded-full bg-white" />
-          </span>
-          Resume
-        </Button>
-      ) : (
-        <Button variant="outline" size="sm" onClick={handleChat}>
-          <MessageSquare className="mr-1.5 size-3" />
-          Chat
-        </Button>
-      )}
-      <DropdownMenu>
-        <DropdownMenuTrigger asChild>
-          <Button
-            variant="ghost"
-            size="icon"
-            aria-label={`More actions for ${displayName(agent)}`}
-            className="size-8 text-muted-foreground hover:text-foreground"
-          >
-            <MoreHorizontal className="size-4" />
-          </Button>
-        </DropdownMenuTrigger>
-        <DropdownMenuContent align="end" className="w-44">
-          <DropdownMenuItem onSelect={() => void handleCopyId()}>
-            <Copy className="mr-2 size-3.5" />
-            Copy id
-          </DropdownMenuItem>
-          <ComingSoonItem
-            icon={Pencil}
-            label="Rename"
-            disabled={!allowRename}
-          />
-          <ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
-          <DropdownMenuSeparator />
-          <DropdownMenuItem
-            onSelect={() => onDelete(agent)}
-            disabled={!allowDelete || deleting}
-            className="text-destructive focus:text-destructive"
-          >
-            {deleting ? (
-              <Loader2 className="mr-2 size-3.5 animate-spin" />
-            ) : (
-              <Trash2 className="mr-2 size-3.5" />
-            )}
-            Delete
-          </DropdownMenuItem>
-        </DropdownMenuContent>
-      </DropdownMenu>
-    </div>
-  )
-}
-
-interface ComingSoonItemProps {
-  icon: typeof Pencil
-  label: string
-  disabled: boolean
-}
-
-const ComingSoonItem: FC<ComingSoonItemProps> = ({
-  icon: Icon,
-  label,
-  disabled,
-}) => {
-  const item = (
-    <DropdownMenuItem disabled className="text-muted-foreground">
-      <Icon className="mr-2 size-3.5" />
-      {label}
-    </DropdownMenuItem>
-  )
-  if (!disabled) return item
-  return (
-    <TooltipProvider delayDuration={300}>
-      <Tooltip>
-        <TooltipTrigger asChild>
-          <span className="block w-full">{item}</span>
-        </TooltipTrigger>
-        <TooltipContent side="left" className="text-xs">
-          {label} coming soon
-        </TooltipContent>
-      </Tooltip>
-    </TooltipProvider>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentErrorPanel.tsx
@@ -1,96 +0,0 @@
-import { AlertTriangle, ChevronDown } from 'lucide-react'
-import { type FC, useEffect, useState } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Collapsible,
-  CollapsibleContent,
-  CollapsibleTrigger,
-} from '@/components/ui/collapsible'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { truncate } from './agent-row.helpers'
-
-interface AgentErrorPanelProps {
-  agentId: string
-  message: string
-  errorAt: number | null
-}
-
-const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
-const PREVIEW_CHARS = 200
-
-export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
-  agentId,
-  message,
-  errorAt,
-}) => {
-  const storageKey = `${STORAGE_PREFIX}${agentId}`
-  // Open if we've never seen this `errorAt` for this agent. Once the
-  // user collapses the panel (or refreshes after seeing it), we mark
-  // it seen so it doesn't re-pop on every poll.
-  const [open, setOpen] = useState<boolean>(() => {
-    if (typeof window === 'undefined' || !errorAt) return true
-    const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
-    return !Number.isFinite(seen) || errorAt > seen
-  })
-
-  useEffect(() => {
-    if (!open && errorAt && typeof window !== 'undefined') {
-      window.localStorage.setItem(storageKey, String(errorAt))
-    }
-  }, [open, errorAt, storageKey])
-
-  const preview = truncate(message, PREVIEW_CHARS)
-  const truncated = preview.length < message.length
-
-  return (
-    <Collapsible open={open} onOpenChange={setOpen} className="mt-3">
-      <div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
-        <div className="flex items-center gap-2 font-medium text-destructive text-xs">
-          <AlertTriangle className="size-3.5" />
-          Last error
-        </div>
-        <CollapsibleTrigger asChild>
-          <Button
-            variant="ghost"
-            size="sm"
-            className="h-6 px-2 text-muted-foreground"
-          >
-            <span className="text-xs">{open ? 'hide' : 'show'}</span>
-            <ChevronDown
-              className={cn(
-                'ml-1 size-3 transition-transform',
-                open && 'rotate-180',
-              )}
-            />
-          </Button>
-        </CollapsibleTrigger>
-      </div>
-      <CollapsibleContent>
-        <div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
-          {truncated ? (
-            <HoverCard openDelay={300}>
-              <HoverCardTrigger asChild>
-                <span className="cursor-default font-mono text-foreground/80">
-                  {preview}…
-                </span>
-              </HoverCardTrigger>
-              <HoverCardContent
-                side="bottom"
-                className="max-w-md whitespace-pre-wrap font-mono text-xs"
-              >
-                {message}
-              </HoverCardContent>
-            </HoverCard>
-          ) : (
-            <span className="font-mono text-foreground/80">{message}</span>
-          )}
-        </div>
-      </CollapsibleContent>
-    </Collapsible>
-  )
-}
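Review note: the initial open/closed decision in the removed `AgentErrorPanel` can be read as a pure function of the error timestamp and the stored "seen" value. The sketch below is a minimal illustration only; the helper name `shouldOpenErrorPanel` is hypothetical (it is not in the removed file), and it takes the stored string as an argument instead of reading `window.localStorage` or checking for `window`.

```typescript
// Hypothetical helper mirroring the useState initializer in the removed component.
function shouldOpenErrorPanel(
  errorAt: number | null,
  storedSeenAt: string | null,
): boolean {
  // No error timestamp to compare against: default to open.
  if (!errorAt) return true
  // A missing key falls back to 0, so any positive errorAt counts as unseen.
  const seen = Number(storedSeenAt ?? 0)
  // An unparseable stored value (NaN) is also treated as unseen.
  return !Number.isFinite(seen) || errorAt > seen
}

// Usage: a newer error re-opens the panel, an already-seen one stays collapsed.
const reopens = shouldOpenErrorPanel(101, '100')
const staysClosed = shouldOpenErrorPanel(100, '100')
```

Under this reading the panel re-opens only when `errorAt` is strictly newer than the stored value, which matches the comment in the removed code about not re-popping on every poll.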
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentLastMessage.tsx
@@ -1,35 +0,0 @@
-import { Quote } from 'lucide-react'
-import type { FC } from 'react'
-import { firstNonBlankLine, truncate } from './agent-row.helpers'
-
-interface AgentLastMessageProps {
-  message: string | null
-}
-
-const PREVIEW_CHARS = 110
-
-/**
- * Inline preview of the most recent user message. Renders as a quoted,
- * italic line so the row reads like a conversation snippet rather than
- * a label-and-value pair. No hover-card — opening the agent's chat is
- * the canonical way to read the full message.
- */
-export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
-  if (!message) {
-    return (
-      <p className="mt-1 text-muted-foreground/70 text-xs italic">
-        No messages yet — start a chat
-      </p>
-    )
-  }
-  const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
-  return (
-    <p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
-      <Quote
-        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
-        aria-hidden
-      />
-      <span className="truncate">{preview}</span>
-    </p>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentMetaRow.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { formatRelativeTime } from '../agent-display.helpers'
-import { AgentTokenSummary } from './AgentTokenSummary'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentMetaRowProps {
-  lastUsedAt: number | null
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Bottom-of-row meta line. Intentionally sparse — last activity time
- * and lifetime tokens. CWD is no longer surfaced here because the path
- * the server happens to be running from isn't actionable; if a future
- * surface needs the cwd (chat panel, debug view) it reads from the
- * listing payload directly.
- */
-export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
-  const lastUsedLabel = formatRelativeTime(lastUsedAt)
-  const tokensTotal =
-    (tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
-  const showTokens = tokensTotal > 0
-
-  return (
-    <div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
-      <span>{lastUsedLabel}</span>
-      {showTokens && (
-        <>
-          <span aria-hidden className="text-muted-foreground/50">
-            ·
-          </span>
-          <AgentTokenSummary tokens={tokens} />
-        </>
-      )}
-    </div>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSparkline.tsx
@@ -1,92 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
-
-interface AgentSparklineProps {
-  /** 14 entries, oldest → newest. Today's bucket is the last index. */
-  turnsByDay: number[]
-  /** Same length, same order. Failed turns counted separately. */
-  failedByDay: number[]
-  className?: string
-}
-
-const MIN_BAR_HEIGHT_PX = 2
-const MAX_BAR_HEIGHT_PX = 18
-
-export const AgentSparkline: FC<AgentSparklineProps> = ({
-  turnsByDay,
-  failedByDay,
-  className,
-}) => {
-  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
-  const max = Math.max(1, ...turnsByDay)
-
-  return (
-    <HoverCard openDelay={250}>
-      <HoverCardTrigger asChild>
-        <div
-          role="img"
-          aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
-          className={cn('flex h-5 items-end gap-px', className)}
-        >
-          {turnsByDay.map((count, idx) => {
-            const ratio = count / max
-            const height = Math.max(
-              MIN_BAR_HEIGHT_PX,
-              Math.round(ratio * MAX_BAR_HEIGHT_PX),
-            )
-            const isToday = idx === ROW_BAR_COUNT - 1
-            const failed = failedByDay[idx] ?? 0
-            return (
-              <div
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
-                key={`bar-${idx}`}
-                className={cn(
-                  'w-1.5 rounded-sm',
-                  count === 0
-                    ? 'bg-muted-foreground/15'
-                    : failed > 0
-                      ? 'bg-destructive/50'
-                      : 'bg-[var(--accent-orange)]/50',
-                  isToday && 'ring-1 ring-foreground/30',
-                )}
-                style={{ height }}
-              />
-            )
-          })}
-        </div>
-      </HoverCardTrigger>
-      <HoverCardContent side="left" className="w-56 text-xs">
-        <div className="mb-2 font-medium text-sm">Last 14 days</div>
-        <ul className="space-y-0.5">
-          {turnsByDay.map((count, idx) => {
-            const failed = failedByDay[idx] ?? 0
-            const dayLabel = formatLocalDate(idx)
-            return (
-              <li
-                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
-                key={`day-${idx}`}
-                className="flex items-center justify-between text-muted-foreground"
-              >
-                <span>{dayLabel}</span>
-                <span>
-                  {count}
-                  {failed > 0 && (
-                    <span className="ml-1 text-destructive">
-                      ({failed} failed)
-                    </span>
-                  )}
-                </span>
-              </li>
-            )
-          })}
-        </ul>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
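Review note: the bar sizing in the removed `AgentSparkline` scales each day's count against the busiest day and clamps to a minimum height. A minimal sketch of that arithmetic, assuming a hypothetical helper name `barHeights` (the removed component computes this inline inside `map`):

```typescript
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18

// Hypothetical helper: per-bar pixel heights for a series of daily turn counts.
function barHeights(turnsByDay: number[]): number[] {
  // Math.max(1, ...) avoids dividing by zero when every day is empty.
  const max = Math.max(1, ...turnsByDay)
  return turnsByDay.map((count) =>
    Math.max(MIN_BAR_HEIGHT_PX, Math.round((count / max) * MAX_BAR_HEIGHT_PX)),
  )
}

// Usage: the busiest day reaches the full 18px, empty days keep a 2px stub.
const heights = barHeights([0, 1, 9, 18])
```

Because of the minimum clamp, a day with zero turns still draws a 2px bar; the removed component distinguishes it by colour (`bg-muted-foreground/15`) rather than by height.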
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx
@@ -1,71 +0,0 @@
-import { TriangleAlert } from 'lucide-react'
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { cn } from '@/lib/utils'
-import { adapterLabel } from '../AdapterIcon'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentAdapterHealth } from './agent-row.types'
-
-interface AgentSummaryChipsProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  /** When unhealthy, the adapter label dims and a warning chip appears. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-/**
- * Adapter / model / reasoning summary line. Always rendered (so OpenClaw
- * rows that fall back to defaults still expose what they're set up to do)
- * and surfaces adapter-health *only when unhealthy* — keeping the calm
- * default state silent and reserving visual noise for things the user
- * needs to act on.
- */
-export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
-  adapter,
-  modelLabel,
-  reasoningEffort,
-  adapterHealth,
-}) => {
-  const parts = [adapterLabel(adapter)]
-  if (modelLabel) parts.push(modelLabel)
-  if (reasoningEffort) parts.push(reasoningEffort)
-  const unhealthy = adapterHealth?.healthy === false
-  return (
-    <div
-      className={cn(
-        'flex items-center gap-1.5 text-muted-foreground text-xs',
-        unhealthy && 'text-muted-foreground/70',
-      )}
-    >
-      <span className="truncate">{parts.join(' · ')}</span>
-      {unhealthy && adapterHealth && (
-        <HoverCard openDelay={200}>
-          <HoverCardTrigger asChild>
-            <Badge
-              variant="outline"
-              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
-            >
-              <TriangleAlert className="size-2.5" />
-              <span className="font-normal">Unavailable</span>
-            </Badge>
-          </HoverCardTrigger>
-          <HoverCardContent side="right" className="w-72 text-sm">
-            <div className="font-medium">
-              {adapterLabel(adapter)} CLI not available
-            </div>
-            <div className="mt-1 text-muted-foreground text-xs">
-              {adapterHealth.reason ??
-                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
-            </div>
-          </HoverCardContent>
-        </HoverCard>
-      )}
-    </div>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTile.tsx
@@ -1,37 +0,0 @@
-import type { FC } from 'react'
-import { cn } from '@/lib/utils'
-import { AdapterIcon } from '../AdapterIcon'
-import { livenessDetail } from '../agent-display.helpers'
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import { type AgentLiveness, LivenessDot } from '../LivenessDot'
-
-export interface AgentTileProps {
-  adapter: HarnessAgentAdapter | 'unknown'
-  status: AgentLiveness
-  lastUsedAt: number | null
-}
-
-/**
- * Adapter glyph + a single liveness dot. Adapter health is no longer
- * surfaced here — it lives as an inline pill inside `AgentSummaryChips`
- * so the user isn't asked to disambiguate two dots on the same tile.
- */
-export const AgentTile: FC<AgentTileProps> = ({
-  adapter,
-  status,
-  lastUsedAt,
-}) => (
-  <div className="relative shrink-0">
-    <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
-      <AdapterIcon adapter={adapter} className="h-6 w-6" />
-    </div>
-    <LivenessDot
-      status={status}
-      detail={livenessDetail(status, lastUsedAt)}
-      className={cn(
-        'absolute -right-0.5 -bottom-0.5',
-        status === 'working' && 'animate-pulse',
-      )}
-    />
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTitleRow.tsx
@@ -1,55 +0,0 @@
-import type { FC } from 'react'
-import { Badge } from '@/components/ui/badge'
-import { displayName } from '../agent-display.helpers'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-import { AgentSparkline } from './AgentSparkline'
-import { PinToggle } from './PinToggle'
-
-interface AgentTitleRowProps {
-  agent: AgentListItem
-  status: AgentLiveness
-  pinned: boolean
-  turnsByDay: number[]
-  failedByDay: number[]
-  onPinToggle: (next: boolean) => void
-}
-
-/**
- * Title strip: name + status badge + (right-aligned) sparkline. The
- * pin toggle sits trailing the title so the title always flushes left
- * regardless of pin state — moving the star left of the title indents
- * the row's first line off-axis from the model/preview/meta lines
- * below it. When unpinned and not hovered, the toggle is removed from
- * layout entirely so it reserves no space at all.
- */
-export const AgentTitleRow: FC<AgentTitleRowProps> = ({
-  agent,
-  status,
-  pinned,
-  turnsByDay,
-  failedByDay,
-  onPinToggle,
-}) => (
-  <div className="mb-1 flex items-center gap-2">
-    <span className="truncate font-semibold">{displayName(agent)}</span>
-    {status === 'working' && (
-      <Badge
-        variant="secondary"
-        className="bg-amber-50 text-amber-900 hover:bg-amber-50"
-      >
-        Working
-      </Badge>
-    )}
-    {status === 'asleep' && (
-      <Badge variant="outline" className="text-muted-foreground">
-        Asleep
-      </Badge>
-    )}
-    {status === 'error' && <Badge variant="destructive">Attention</Badge>}
-    <PinToggle pinned={pinned} onToggle={onPinToggle} />
-    <div className="ml-auto">
-      <AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
-    </div>
-  </div>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentTokenSummary.tsx
@@ -1,63 +0,0 @@
-import type { FC } from 'react'
-import {
-  HoverCard,
-  HoverCardContent,
-  HoverCardTrigger,
-} from '@/components/ui/hover-card'
-import { Progress } from '@/components/ui/progress'
-import { formatTokens } from './agent-row.helpers'
-import type { AgentTokenUsage } from './agent-row.types'
-
-interface AgentTokenSummaryProps {
-  tokens: AgentTokenUsage | null
-}
-
-/**
- * Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
- * (the only window we can compute reliably from the session record).
- * Per-window stats land in a follow-up once the activity ledger ships.
- */
-export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
-  if (!tokens) return null
-  const { input, output } = tokens.cumulative
-  const total = input + output
-  if (total === 0) return null
-  const inputPct = (input / total) * 100
-
-  return (
-    <HoverCard openDelay={200}>
-      <HoverCardTrigger asChild>
-        <span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
-          {formatTokens(total)} tokens
-        </span>
-      </HoverCardTrigger>
-      <HoverCardContent side="top" align="end" className="w-72 text-sm">
-        <div className="mb-3 flex items-center justify-between">
-          <span className="font-medium">Lifetime tokens</span>
-          <span className="text-muted-foreground text-xs tabular-nums">
-            {formatTokens(total)} total
-          </span>
-        </div>
-
-        <div className="space-y-2">
-          <div className="flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Input</span>
-            <span className="tabular-nums">{formatTokens(input)}</span>
-          </div>
-          <Progress value={inputPct} className="h-1.5" />
-
-          <div className="mt-2 flex items-center justify-between text-xs">
-            <span className="text-muted-foreground">Output</span>
-            <span className="tabular-nums">{formatTokens(output)}</span>
-          </div>
-          <Progress value={100 - inputPct} className="h-1.5" />
-        </div>
-
-        <p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
-          Cumulative across every turn this agent has run. Per-window stats
-          arrive in a future release.
-        </p>
-      </HoverCardContent>
-    </HoverCard>
-  )
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/PinToggle.tsx
@@ -1,60 +0,0 @@
-import { Star } from 'lucide-react'
-import type { FC } from 'react'
-import { Button } from '@/components/ui/button'
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from '@/components/ui/tooltip'
-import { cn } from '@/lib/utils'
-
-interface PinToggleProps {
-  pinned: boolean
-  onToggle: (next: boolean) => void
-}
-
-/**
- * Trailing star toggle. The button is *always rendered* — only its
- * opacity changes between pinned/unpinned/hover states — so the title
- * row's height is constant. Hiding the slot via `display: none` would
- * collapse the row's vertical metrics on hover and shift every card
- * below in the rail.
- *
- * Placement is trailing the title (after the status badge) so the
- * title itself flushes left regardless of pin state — leading the
- * row with the star would indent the title relative to the model /
- * preview / meta lines beneath it.
- */
-export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
-  <TooltipProvider delayDuration={300}>
-    <Tooltip>
-      <TooltipTrigger asChild>
-        <Button
-          variant="ghost"
-          size="icon"
-          className={cn(
-            'size-6 text-muted-foreground transition-opacity hover:text-foreground',
-            pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
-          )}
-          aria-pressed={pinned}
-          aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
-          onClick={(event) => {
-            event.stopPropagation()
-            onToggle(!pinned)
-          }}
-        >
-          <Star
-            className={cn(
-              'size-3.5',
-              pinned && 'fill-amber-400 text-amber-500',
-            )}
-          />
-        </Button>
-      </TooltipTrigger>
-      <TooltipContent side="top" className="text-xs">
-        {pinned ? 'Unpin' : 'Pin to top'}
-      </TooltipContent>
-    </Tooltip>
-  </TooltipProvider>
-)
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.test.ts
@@ -1,73 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  firstNonBlankLine,
-  formatLocalDate,
-  formatTokens,
-  ROW_BAR_COUNT,
-  truncate,
-} from './agent-row.helpers'
-
-describe('formatTokens', () => {
-  it('renders zero / NaN as "0"', () => {
-    expect(formatTokens(0)).toBe('0')
-    expect(formatTokens(Number.NaN)).toBe('0')
-  })
-
-  it('renders sub-1K as integer', () => {
-    expect(formatTokens(142)).toBe('142')
-  })
-
-  it('renders K with one decimal under 10', () => {
-    expect(formatTokens(8_400)).toBe('8.4K')
-  })
-
-  it('drops the decimal at >=10K', () => {
-    expect(formatTokens(120_000)).toBe('120K')
-  })
-
-  it('renders M with one decimal under 10', () => {
-    expect(formatTokens(1_200_000)).toBe('1.2M')
-  })
-})
-
-describe('firstNonBlankLine', () => {
-  it('returns the first non-blank line', () => {
-    expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
-  })
-
-  it('skips USER_QUERY envelope tags', () => {
-    expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
-      'fix tests',
-    )
-  })
-
-  it('falls back to the trimmed input when nothing matches', () => {
-    expect(firstNonBlankLine('   single   ')).toBe('single')
-  })
-})
-
-describe('truncate', () => {
-  it('returns input unchanged when within limit', () => {
-    expect(truncate('hello', 10)).toBe('hello')
-  })
-
-  it('appends an ellipsis when over limit', () => {
-    expect(truncate('hello world', 6)).toBe('hello…')
-  })
-})
-
-describe('formatLocalDate', () => {
-  const today = new Date('2026-04-30T12:00:00Z')
-
-  it('labels today and yesterday explicitly', () => {
-    expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
-    expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
-  })
-
-  it('returns a "Mon D" format for older days', () => {
-    const label = formatLocalDate(0, today)
-    // "Apr 17" or "Apr 17," depending on locale; just assert it
-    // contains a month abbreviation and a day number.
-    expect(label).toMatch(/[A-Za-z]+ \d+/)
-  })
-})
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.helpers.ts
@@ -1,64 +0,0 @@
-/**
- * Pure formatters consumed by row sub-components. Kept distinct from
- * `agent-display.helpers.ts` (page-level helpers) so the row internals
- * have an obvious single home.
- */
-
-const TOKEN_THRESHOLDS: Array<[number, string]> = [
-  [1_000_000, 'M'],
-  [1_000, 'K'],
-]
-
-/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
-export function formatTokens(n: number): string {
-  if (!Number.isFinite(n) || n <= 0) return '0'
-  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
-    if (n >= threshold) {
-      const value = n / threshold
-      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
-      return `${decimal}${suffix}`
-    }
-  }
-  return String(Math.round(n))
-}
-
-const USER_QUERY_OPEN = /^<USER_QUERY>$/i
-const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
-
-/**
- * First non-blank line, with the BrowserOS user-system-prompt
- * `<USER_QUERY>` envelope tags stripped so previews don't show
- * structural noise.
- */
-export function firstNonBlankLine(text: string): string {
-  const lines = text.split('\n').map((line) => line.trim())
-  for (const line of lines) {
-    if (!line) continue
-    if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
-    return line
-  }
-  return text.trim()
-}
-
-export function truncate(text: string, max: number): string {
-  if (text.length <= max) return text
-  return `${text.slice(0, max - 1).trimEnd()}…`
-}
-
-const SPARKLINE_DAYS = 14
-
-/**
- * "today" / "yesterday" / "Apr 17" — given an index 0..13 from
- * oldest → newest. `today` defaults to `new Date()` so callers don't
- * have to thread a clock through.
- */
-export function formatLocalDate(idx: number, today: Date = new Date()): string {
-  if (idx === SPARKLINE_DAYS - 1) return 'today'
-  if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
-  const offset = SPARKLINE_DAYS - 1 - idx
-  const date = new Date(today)
-  date.setDate(date.getDate() - offset)
-  return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
-}
-
-export const ROW_BAR_COUNT = SPARKLINE_DAYS
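Review note: the removed tests for `formatTokens` cover mid-range values but not the rounding boundaries. The sketch below copies the removed function unchanged and probes two boundary inputs; the input values are chosen here for illustration and do not come from the original test file.

```typescript
const TOKEN_THRESHOLDS: Array<[number, string]> = [
  [1_000_000, 'M'],
  [1_000, 'K'],
]

// Copied from the removed agent-row.helpers.ts.
function formatTokens(n: number): string {
  if (!Number.isFinite(n) || n <= 0) return '0'
  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
    if (n >= threshold) {
      const value = n / threshold
      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
      return `${decimal}${suffix}`
    }
  }
  return String(Math.round(n))
}

// 9.96 is below 10, so it takes the one-decimal branch and rounds up.
const justUnderTenK = formatTokens(9_960) // "10.0K"
// 999.999 rounds to 1000 without rolling over to the "M" suffix.
const justUnderOneM = formatTokens(999_999) // "1000K"
```

Both results follow from `toFixed` rounding after the branch has already been chosen. Whether they matter depends on how often counts land near a threshold; if the helper is reintroduced elsewhere, rounding before picking the suffix would be one way to avoid them.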
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/agent-row.types.ts
@@ -1,51 +0,0 @@
-import type { HarnessAgentAdapter } from '../agent-harness-types'
-import type { AgentListItem } from '../agents-page-types'
-import type { AgentLiveness } from '../LivenessDot'
-
-/**
- * Window-bounded token usage. Server returns `null` when no session
- * record exists yet for the agent.
- */
-export interface AgentTokenUsage {
-  last7d: { input: number; output: number; requestCount: number }
-  cumulative: { input: number; output: number }
-}
-
-export interface AgentAdapterHealth {
-  healthy: boolean
-  reason?: string
-}
-
-/**
- * Everything an `AgentRowCard` needs to render. Mirrors the shape
- * `useHarnessAgents` exposes; the page assembles one entry per row in
- * `AgentList` and passes it down. Sub-components only see slices of
- * this object — no prop drilling beyond two levels.
- */
-export interface AgentRowData {
-  agent: AgentListItem
-  adapter: HarnessAgentAdapter | 'unknown'
-  modelLabel: string | null
-  reasoningEffort: string | null
-  status: AgentLiveness
-  lastUsedAt: number | null
-  pinned: boolean
-  cwd: string | null
-  lastUserMessage: string | null
-  tokens: AgentTokenUsage | null
-  /** 14 entries, oldest → newest. Today is the last index. */
-  turnsByDay: number[]
-  /** Same length and ordering as `turnsByDay`. */
-  failedByDay: number[]
-  lastError: string | null
-  lastErrorAt: number | null
-  /** When non-null, an in-flight turn this row can be resumed from. */
-  activeTurnId: string | null
-  /** Adapter-level health, shared across rows for the same adapter. */
-  adapterHealth: AgentAdapterHealth | null
-}
-
-export interface AgentRowCallbacks {
-  onDelete: (agent: AgentListItem) => void
-  onPinToggle: (agent: AgentListItem, next: boolean) => void
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts
@@ -8,7 +8,6 @@ import {
  type HarnessAdapterDescriptor,
  type HarnessAgent,
  type HarnessAgentHistoryPage,
-  type HarnessQueuedMessage,
  mapHarnessAgentToEntry,
 } from './agent-harness-types'
 import type { OpenClawStatus } from './useOpenClaw'
@@ -136,63 +135,6 @@ export function useCreateHarnessAgent() {
  })
 }

-/**
- * Apply a partial update to a harness agent. Used by the pin-toggle
- * star and (eventually) the inline rename UI. Optimistically writes
- * the patch into the listing query cache so the row updates instantly,
- * then rolls back if the server rejects the change.
- */
-export function useUpdateHarnessAgent() {
-  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: {
-      agentId: string
-      patch: { name?: string; pinned?: boolean }
-    }) => {
-      if (!baseUrl || urlLoading) {
-        throw new Error('BrowserOS agent server URL is not ready')
-      }
-      const data = await agentsFetch<{ agent: HarnessAgent }>(
-        baseUrl,
-        `/${encodeURIComponent(input.agentId)}`,
-        {
-          method: 'PATCH',
-          headers: { 'Content-Type': 'application/json' },
-          body: JSON.stringify(input.patch),
-        },
-      )
-      return data.agent
-    },
-    onMutate: async ({ agentId, patch }) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === agentId ? { ...agent, ...patch } : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
 export function useDeleteHarnessAgent() {
  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
  const queryClient = useQueryClient()
@@ -264,8 +206,6 @@ export interface HarnessActiveTurnInfo {
  lastSeq: number
  startedAt: number
  endedAt?: number
-  /** User message that kicked off the turn; null when not captured. */
-  prompt: string | null
 }

 /**
@@ -320,145 +260,3 @@ export async function fetchHarnessAgentHistory(
    `/${encodeURIComponent(agentId)}/sessions/main/history`,
  )
 }
-
-export interface EnqueueMessageInput {
-  message: string
-  attachments?: ReadonlyArray<unknown>
-}
-
-export async function enqueueHarnessMessage(
-  agentId: string,
-  input: EnqueueMessageInput,
-): Promise<HarnessQueuedMessage> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
-    {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        message: input.message,
-        ...(input.attachments && input.attachments.length > 0
-          ? { attachments: input.attachments }
-          : {}),
-      }),
-    },
-  )
-  if (!response.ok) {
-    let message = `Request failed with status ${response.status}`
-    try {
-      const body = (await response.json()) as { error?: string }
-      if (body.error) message = body.error
-    } catch {}
-    throw new Error(message)
-  }
-  const body = (await response.json()) as { queued: HarnessQueuedMessage }
-  return body.queued
-}
-
-export async function removeHarnessQueuedMessage(
-  agentId: string,
-  messageId: string,
-): Promise<{ removed: boolean }> {
-  const baseUrl = await getAgentServerUrl()
-  const response = await fetch(
-    `${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
-      messageId,
-    )}`,
-    { method: 'DELETE' },
-  )
-  if (!response.ok) return { removed: false }
-  return (await response.json()) as { removed: boolean }
-}
-
-/**
- * Optimistic enqueue: writes the new queued message into the listing
- * cache immediately so the queue panel reflects the change without
- * waiting for the next poll. Rolls back if the server rejects.
- */
-export function useEnqueueHarnessMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
-      enqueueHarnessMessage(input.agentId, input),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      const optimistic: HarnessQueuedMessage = {
-        id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
-        createdAt: Date.now(),
-        message: input.message,
-      }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
-
-/**
- * Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
- */
-export function useRemoveHarnessQueuedMessage() {
-  const { baseUrl } = useAgentServerUrl()
-  const queryClient = useQueryClient()
-
-  return useMutation({
-    mutationFn: async (input: { agentId: string; messageId: string }) =>
-      removeHarnessQueuedMessage(input.agentId, input.messageId),
-    onMutate: async (input) => {
-      const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
-      await queryClient.cancelQueries({ queryKey })
-      const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
-      if (!previous) return { previous: undefined }
-      queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
-        ...previous,
-        agents: previous.agents.map((agent) =>
-          agent.id === input.agentId
-            ? {
-                ...agent,
-                queue: (agent.queue ?? []).filter(
-                  (entry) => entry.id !== input.messageId,
-                ),
-              }
-            : agent,
-        ),
-      })
-      return { previous }
-    },
-    onError: (_err, _vars, context) => {
-      if (!context?.previous) return
-      queryClient.setQueryData(
-        [AGENT_QUERY_KEYS.agents, baseUrl],
-        context.previous,
-      )
-    },
-    onSettled: async () => {
-      await queryClient.invalidateQueries({
-        queryKey: [AGENT_QUERY_KEYS.agents],
-      })
-    },
-  })
-}
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.test.ts
@@ -1,8 +1,5 @@
 import { describe, expect, it } from 'bun:test'
-import type {
-  HarnessAdapterDescriptor,
-  HarnessAgent,
-} from '@/entrypoints/app/agents/agent-harness-types'
+import type { HarnessAdapterDescriptor } from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import {
  buildSidepanelChatTargets,
@@ -80,96 +77,58 @@ const adapters: HarnessAdapterDescriptor[] = [
  },
 ]

-const agents: HarnessAgent[] = [
-  {
-    id: 'agent-codex',
-    name: 'Review Bot',
-    adapter: 'codex',
-    modelId: 'gpt-5.5',
-    reasoningEffort: 'medium',
-    permissionMode: 'approve-all',
-    sessionKey: 'agent:agent-codex:main',
-    createdAt: timestamp,
-    updatedAt: timestamp,
-  },
-  {
-    id: 'agent-openclaw',
-    name: 'Research Claw',
-    adapter: 'openclaw',
-    modelId: 'default',
-    reasoningEffort: 'high',
-    permissionMode: 'approve-all',
-    sessionKey: 'agent:agent-openclaw:main',
-    createdAt: timestamp,
-    updatedAt: timestamp,
-  },
-]
-
 describe('buildSidepanelChatTargets', () => {
-  it('returns LLM targets plus one ACP target per persisted harness agent', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+  it('returns LLM targets plus one ACP target per adapter model', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })

    expect(targets.map((target) => target.id)).toEqual([
      'browseros',
      'anthropic-sonnet',
-      'agent-codex',
-      'agent-openclaw',
+      'acp:claude:sonnet:medium',
+      'acp:claude:haiku:medium',
+      'acp:codex:gpt-5.5:medium',
+      'acp:openclaw:default:medium',
    ])
  })

-  it('does not emit catalog-only ACP targets without persisted agents', () => {
-    const targets = buildSidepanelChatTargets({
-      providers,
-      adapters,
-      agents: [],
-    })
-
-    expect(targets.map((target) => target.id)).toEqual([
-      'browseros',
-      'anthropic-sonnet',
-    ])
-  })
-
-  it('uses the created OpenClaw agent name instead of a generic adapter target', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const openclaw = targets.find((target) => target.id === 'agent-openclaw')
+  it('emits a single default ACP target for adapters with no per-session model picker', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const openclaw = targets.find(
+      (target) => target.id === 'acp:openclaw:default:medium',
+    )

    expect(openclaw).toMatchObject({
      kind: 'acp',
-      id: 'agent-openclaw',
-      agentId: 'agent-openclaw',
      adapter: 'openclaw',
      adapterName: 'OpenClaw',
      modelId: 'default',
      modelLabel: 'default',
-      name: 'Research Claw',
+      // Without a model picker, the target name is just the adapter
+      // name — the user picks the adapter, not a model under it.
+      name: 'OpenClaw',
      modelControl: 'best-effort',
-      reasoningEffort: 'high',
+      reasoningEffort: 'medium',
    })
  })

-  it('preserves adapter metadata for created agent targets', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const codex = targets.find((target) => target.id === 'agent-codex')
+  it('preserves ACP model-control and recommendation metadata', () => {
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const haiku = targets.find(
+      (target) => target.id === 'acp:claude:haiku:medium',
+    )

-    expect(codex).toMatchObject({
+    expect(haiku).toMatchObject({
      kind: 'acp',
-      agentId: 'agent-codex',
-      adapter: 'codex',
-      adapterName: 'Codex',
-      modelId: 'gpt-5.5',
-      modelLabel: 'GPT-5.5',
-      modelControl: 'runtime-supported',
+      adapter: 'claude',
+      modelId: 'haiku',
+      modelControl: 'best-effort',
      recommended: true,
      reasoningEffort: 'medium',
-      reasoningEffortLabel: 'Medium',
    })
  })

-  it('still returns LLM targets when agents and adapters are unavailable', () => {
-    expect(
-      buildSidepanelChatTargets({ providers, adapters: [], agents: [] }),
-    ).toEqual([
+  it('still returns LLM targets when ACP adapters are unavailable', () => {
+    expect(buildSidepanelChatTargets({ providers, adapters: [] })).toEqual([
      {
        kind: 'llm',
        id: 'browseros',
@@ -190,7 +149,7 @@ describe('buildSidepanelChatTargets', () => {

 describe('resolveSidepanelChatTarget', () => {
  it('resolves selected LLM targets back to their provider config', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
+    const targets = buildSidepanelChatTargets({ providers, adapters })
    const resolved = resolveSidepanelChatTarget({
      targets,
      defaultProviderId: 'browseros',
@@ -202,32 +161,13 @@ describe('resolveSidepanelChatTarget', () => {
  })

  it('falls back to the current default LLM provider when a persisted ACP target is stale', () => {
-    const targets = buildSidepanelChatTargets({
-      providers,
-      adapters,
-      agents: [],
-    })
+    const targets = buildSidepanelChatTargets({ providers, adapters: [] })

    expect(
      resolveSidepanelChatTarget({
        targets,
        defaultProviderId: 'anthropic-sonnet',
-        selection: { kind: 'acp', id: 'agent-codex' },
-      }),
-    ).toMatchObject({
-      kind: 'llm',
-      id: 'anthropic-sonnet',
-    })
-  })
-
-  it('falls back when an old catalog-style ACP target id is persisted', () => {
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-
-    expect(
-      resolveSidepanelChatTarget({
-        targets,
-        defaultProviderId: 'anthropic-sonnet',
-        selection: { kind: 'acp', id: 'acp:codex:gpt-5.5:medium' },
+        selection: { kind: 'acp', id: 'acp:claude:haiku:medium' },
      }),
    ).toMatchObject({
      kind: 'llm',
@@ -240,8 +180,10 @@ describe('persistSidepanelChatTargetSelection', () => {
  it('stores only target identity and does not mutate LLM provider arrays', async () => {
    let savedSelection: SidepanelChatTargetSelection | null = null
    const originalProviders = providers.map((provider) => ({ ...provider }))
-    const targets = buildSidepanelChatTargets({ providers, adapters, agents })
-    const target = targets.find((candidate) => candidate.id === 'agent-codex')
+    const targets = buildSidepanelChatTargets({ providers, adapters })
+    const target = targets.find(
+      (candidate) => candidate.id === 'acp:codex:gpt-5.5:medium',
+    )

    await persistSidepanelChatTargetSelection(target, {
      setValue: async (value) => {
@@ -251,7 +193,7 @@ describe('persistSidepanelChatTargetSelection', () => {

    expect(savedSelection as SidepanelChatTargetSelection | null).toEqual({
      kind: 'acp',
-      id: 'agent-codex',
+      id: 'acp:codex:gpt-5.5:medium',
    })
    expect(providers).toEqual(originalProviders)
  })
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/sidepanel-chat-targets.ts
@@ -1,6 +1,5 @@
 import type {
  HarnessAdapterDescriptor,
-  HarnessAgent,
  HarnessAgentAdapter,
 } from '@/entrypoints/app/agents/agent-harness-types'
 import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
@@ -20,7 +19,6 @@ export type SidepanelChatTarget =
      id: string
      name: string
      type: 'acp'
-      agentId: string
      adapter: HarnessAgentAdapter
      adapterName: string
      modelId: string
@@ -39,7 +37,6 @@ export type SidepanelChatTargetSelection = Pick<
 interface BuildSidepanelChatTargetsInput {
  providers: LlmProviderConfig[]
  adapters: HarnessAdapterDescriptor[]
-  agents?: HarnessAgent[]
 }

 interface ResolveSidepanelChatTargetInput {
@@ -66,49 +63,61 @@ let sidepanelChatTargetSelectionStorage:
 export function buildSidepanelChatTargets({
  providers,
  adapters,
-  agents = [],
 }: BuildSidepanelChatTargetsInput): SidepanelChatTarget[] {
  return [
    ...providers.map(toLlmTarget),
-    ...agents.map((agent) => toAcpTargetForAgent(agent, adapters)),
+    ...adapters.flatMap(toAcpTargetsForAdapter),
  ]
 }

-function toAcpTargetForAgent(
-  agent: HarnessAgent,
-  adapters: HarnessAdapterDescriptor[],
-): SidepanelChatTarget {
-  const adapter = adapters.find((entry) => entry.id === agent.adapter)
-  const modelId = agent.modelId ?? adapter?.defaultModelId ?? 'default'
-  const reasoningEffort =
-    agent.reasoningEffort ?? adapter?.defaultReasoningEffort ?? 'medium'
-  const model = adapter?.models.find((entry) => entry.id === modelId)
-  const reasoning = adapter?.reasoningEfforts.find(
-    (effort) => effort.id === reasoningEffort,
+function toAcpTargetsForAdapter(
+  adapter: HarnessAdapterDescriptor,
+): SidepanelChatTarget[] {
+  const reasoning = adapter.reasoningEfforts.find(
+    (effort) => effort.id === adapter.defaultReasoningEffort,
  )
+  const reasoningEffort =
+    reasoning?.id ?? adapter.defaultReasoningEffort ?? 'medium'

-  return {
-    kind: 'acp',
-    id: agent.id,
-    name: agent.name,
-    type: 'acp',
-    agentId: agent.id,
-    adapter: agent.adapter,
-    adapterName: adapter?.name ?? formatAdapterName(agent.adapter),
-    modelId,
-    modelLabel: model?.label ?? modelId,
-    modelControl: adapter?.modelControl ?? 'best-effort',
-    recommended: model?.recommended,
+  // Adapters with no per-session model picker (e.g. OpenClaw, whose
+  // model lives on the gateway-side agent record) still need exactly
+  // one sidepanel target so the user can pick the adapter at all.
+  if (adapter.models.length === 0) {
+    return [
+      {
+        kind: 'acp',
+        id: buildAcpTargetId(
+          adapter.id,
+          adapter.defaultModelId,
+          reasoningEffort,
+        ),
+        name: adapter.name,
+        type: 'acp',
+        adapter: adapter.id,
+        adapterName: adapter.name,
+        modelId: adapter.defaultModelId,
+        modelLabel: 'default',
+        modelControl: adapter.modelControl,
+        reasoningEffort,
+        reasoningEffortLabel: reasoning?.label,
+      },
+    ]
+  }
+
+  return adapter.models.map((model) => ({
+    kind: 'acp' as const,
+    id: buildAcpTargetId(adapter.id, model.id, reasoningEffort),
+    name: `${adapter.name} ${model.label}`,
+    type: 'acp' as const,
+    adapter: adapter.id,
+    adapterName: adapter.name,
+    modelId: model.id,
+    modelLabel: model.label,
+    modelControl: adapter.modelControl,
+    recommended: model.recommended,
    reasoningEffort,
    reasoningEffortLabel: reasoning?.label,
-  }
-}
-
-function formatAdapterName(adapter: HarnessAgentAdapter): string {
-  if (adapter === 'claude') return 'Claude Code'
-  if (adapter === 'codex') return 'Codex'
-  if (adapter === 'openclaw') return 'OpenClaw'
-  return adapter
+  }))
 }

 export function resolveSidepanelChatTarget({
@@ -163,6 +172,14 @@ function toLlmTarget(provider: LlmProviderConfig): SidepanelChatTarget {
  }
 }

+export function buildAcpTargetId(
+  adapter: HarnessAgentAdapter,
+  modelId: string,
+  reasoningEffort: string,
+): string {
+  return `acp:${adapter}:${modelId}:${reasoningEffort}`
+}
+
 async function getSidepanelChatTargetSelectionStorage(): Promise<SidepanelChatTargetSelectionStore> {
  if (sidepanelChatTargetSelectionStorage) {
    return sidepanelChatTargetSelectionStorage
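
Review note on the hunk above: ACP targets are now derived from the adapter catalog rather than from persisted agents, and each target id follows the `acp:<adapter>:<modelId>:<reasoningEffort>` pattern. The sketch below is a trimmed, self-contained illustration of that derivation, not the project's implementation. `AdapterSketch`, `AcpTargetSketch`, and `toTargetsSketch` are hypothetical names, and the fixture data is assumed from the ids and names asserted in `sidepanel-chat-targets.test.ts`.

```typescript
// Hypothetical, reduced shapes; the real HarnessAdapterDescriptor and
// SidepanelChatTarget carry more fields (modelControl, labels, etc.).
type AdapterId = 'claude' | 'codex' | 'openclaw'

interface AdapterSketch {
  id: AdapterId
  name: string
  defaultModelId: string
  defaultReasoningEffort: string
  models: { id: string; label: string }[]
}

interface AcpTargetSketch {
  id: string
  name: string
  modelId: string
  reasoningEffort: string
}

// Same id format as buildAcpTargetId in the diff.
function buildAcpTargetId(
  adapter: AdapterId,
  modelId: string,
  reasoningEffort: string,
): string {
  return `acp:${adapter}:${modelId}:${reasoningEffort}`
}

function toTargetsSketch(adapter: AdapterSketch): AcpTargetSketch[] {
  const reasoningEffort = adapter.defaultReasoningEffort

  // No per-session model picker: emit exactly one target named after the adapter.
  if (adapter.models.length === 0) {
    return [
      {
        id: buildAcpTargetId(adapter.id, adapter.defaultModelId, reasoningEffort),
        name: adapter.name,
        modelId: adapter.defaultModelId,
        reasoningEffort,
      },
    ]
  }

  // Otherwise emit one target per catalog model.
  return adapter.models.map((model) => ({
    id: buildAcpTargetId(adapter.id, model.id, reasoningEffort),
    name: `${adapter.name} ${model.label}`,
    modelId: model.id,
    reasoningEffort,
  }))
}

// Assumed fixture data, mirroring values that appear in the tests.
const sketchAdapters: AdapterSketch[] = [
  {
    id: 'codex',
    name: 'Codex',
    defaultModelId: 'gpt-5.5',
    defaultReasoningEffort: 'medium',
    models: [{ id: 'gpt-5.5', label: 'GPT-5.5' }],
  },
  {
    id: 'openclaw',
    name: 'OpenClaw',
    defaultModelId: 'default',
    defaultReasoningEffort: 'medium',
    models: [],
  },
]

const sketchTargets: AcpTargetSketch[] = []
for (const adapter of sketchAdapters) {
  sketchTargets.push(...toTargetsSketch(adapter))
}

const sketchIds = sketchTargets.map((target) => target.id)
console.log(sketchIds) // ['acp:codex:gpt-5.5:medium', 'acp:openclaw:default:medium']
```

One consequence worth keeping in mind: a previously persisted selection such as `agent-codex` no longer matches any generated id, so it presumably goes through the stale-target fallback to the default LLM provider that `resolveSidepanelChatTarget` is tested for.
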
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatRefs.ts
@@ -1,9 +1,6 @@
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
 import useDeepCompareEffect from 'use-deep-compare-effect'
-import {
-  useAgentAdapters,
-  useHarnessAgents,
-} from '@/entrypoints/app/agents/useAgents'
+import { useAgentAdapters } from '@/entrypoints/app/agents/useAgents'
 import type { LlmProviderConfig } from '@/lib/llm-providers/types'
 import { useLlmProviders } from '@/lib/llm-providers/useLlmProviders'
 import { type McpServer, useMcpServers } from '@/lib/mcp/mcpServerStorage'
@@ -41,7 +38,6 @@ export const useChatRefs = () => {
    isLoading: isLoadingProviders,
  } = useLlmProviders()
  const { adapters, loading: isLoadingAdapters } = useAgentAdapters()
-  const { harnessAgents, loading: isLoadingAgents } = useHarnessAgents()
  const { personalization } = usePersonalization()
  const [targetSelection, setTargetSelection] =
    useState<SidepanelChatTargetSelection | null>(null)
@@ -61,9 +57,8 @@ export const useChatRefs = () => {
      buildSidepanelChatTargets({
        providers: llmProviders,
        adapters,
-        agents: harnessAgents,
      }),
-    [llmProviders, adapters, harnessAgents],
+    [llmProviders, adapters],
  )

  const selectedChatTarget = useMemo(
@@ -121,7 +116,6 @@ export const useChatRefs = () => {
    selectedChatTarget,
    selectChatTarget,
    selectedLlmProvider,
-    isLoadingProviders:
-      isLoadingProviders || isLoadingAdapters || isLoadingAgents,
+    isLoadingProviders: isLoadingProviders || isLoadingAdapters,
  }
 }
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.test.ts
@@ -40,7 +40,7 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
    })
  })

-  it('sends created-agent targets to the agent-id sidepanel route', () => {
+  it('sends ACP targets to the sidepanel ACP route with explicit target fields', () => {
    const request = buildSidepanelPreparedSendMessagesRequest({
      agentServerUrl: 'http://127.0.0.1:5151',
      target: acpTarget,
@@ -52,11 +52,12 @@ describe('buildSidepanelPreparedSendMessagesRequest', () => {
      ...commonRequestInput(),
    })

-    expect(request.api).toBe(
-      'http://127.0.0.1:5151/agents/agent-codex/sidepanel/chat',
-    )
+    expect(request.api).toBe('http://127.0.0.1:5151/agents/sidepanel/chat')
    expect(request.body).toEqual({
      conversationId,
+      adapter: 'codex',
+      modelId: 'gpt-5.5',
+      reasoningEffort: 'medium',
      message: 'Inspect the current tab',
      browserContext: {
        activeTab: { id: 10, url: 'https://example.com', title: 'Example' },
@@ -139,10 +140,9 @@ const llmTarget: SidepanelChatTarget = {

 const acpTarget: SidepanelChatTarget = {
  kind: 'acp',
-  id: 'agent-codex',
-  name: 'Review bot',
+  id: 'acp:codex:gpt-5.5:medium',
+  name: 'Codex GPT-5.5',
  type: 'acp',
-  agentId: 'agent-codex',
  adapter: 'codex',
  adapterName: 'Codex',
  modelId: 'gpt-5.5',
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSession.ts
@@ -680,20 +680,13 @@ export const useChatSession = (options?: ChatSessionOptions) => {
  const sendMessage = (params: { text: string; action?: ChatAction }) => {
    const target = selectedChatTargetRef.current
    const llmTargetProvider = toLlmProviderConfig(target)
-    const agentTarget = target?.kind === 'acp' ? target : undefined
    track(MESSAGE_SENT_EVENT, {
      mode,
-      provider_id:
-        agentTarget?.agentId ??
-        llmTargetProvider?.id ??
-        selectedLlmProvider?.id,
-      provider_type: agentTarget ? 'acp' : llmTargetProvider?.type,
-      agent_id: agentTarget?.agentId,
-      adapter: agentTarget?.adapter,
+      provider_type: target?.kind === 'acp' ? 'acp' : llmTargetProvider?.type,
      model:
-        agentTarget?.modelId ??
-        llmTargetProvider?.modelId ??
-        selectedLlmProvider?.modelId,
+        target?.kind === 'acp'
+          ? target.modelId
+          : llmTargetProvider?.modelId || selectedLlmProvider?.modelId,
    })

    if (!isIntegrationsSyncedRef.current) {
@@ -770,8 +763,6 @@ export const useChatSession = (options?: ChatSessionOptions) => {
      provider_type: target.kind === 'acp' ? 'acp' : target.type,
      model_id:
        target.kind === 'acp' ? target.modelId : target.provider.modelId,
-      agent_id: target.kind === 'acp' ? target.agentId : undefined,
-      adapter: target.kind === 'acp' ? target.adapter : undefined,
    })

    void selectChatTarget(target).catch((error) => {
--- a/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
+++ b/packages/browseros-agent/apps/agent/entrypoints/sidepanel/index/useChatSessionRequest.ts
@@ -34,10 +34,15 @@ export function buildSidepanelPreparedSendMessagesRequest({
  ...common
 }: BuildSidepanelPreparedSendMessagesRequestInput) {
  if (target?.kind === 'acp') {
+    // ACP session history is owned by AcpxRuntime through sessionKey, so LLM-only
+    // resume and approval fields are intentionally not forwarded.
    return {
-      api: `${agentServerUrl}/agents/${encodeURIComponent(target.agentId)}/sidepanel/chat`,
+      api: `${agentServerUrl}/agents/sidepanel/chat`,
      body: {
        conversationId: common.conversationId,
+        adapter: target.adapter,
+        modelId: target.modelId,
+        reasoningEffort: target.reasoningEffort,
        message: message ?? '',
        browserContext: common.browserContext,
        userSystemPrompt: common.userSystemPrompt,
@@ -66,9 +71,6 @@ export function toProviderOption(target: SidepanelChatTarget): Provider {
    name: target.name,
    type: target.type,
    kind: target.kind,
-    agentId: target.kind === 'acp' ? target.agentId : undefined,
-    adapterName: target.kind === 'acp' ? target.adapterName : undefined,
-    modelLabel: target.kind === 'acp' ? target.modelLabel : undefined,
    modelControl: target.kind === 'acp' ? target.modelControl : undefined,
  }
 }
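
Review note on the hunk above: the ACP request no longer encodes an agent id in the URL. It posts to a single `/agents/sidepanel/chat` route and carries `adapter`, `modelId`, and `reasoningEffort` in the body. A minimal sketch of that wire shape follows. `buildAcpChatRequestSketch` and `AcpTargetFields` are hypothetical names, the conversation id is a placeholder, and the real builder also forwards `browserContext`, `userSystemPrompt`, and other common fields.

```typescript
// Hypothetical subset of the ACP target fields that the request body needs.
interface AcpTargetFields {
  adapter: string
  modelId: string
  reasoningEffort: string
}

function buildAcpChatRequestSketch(
  agentServerUrl: string,
  target: AcpTargetFields,
  conversationId: string,
  message?: string,
) {
  return {
    // Single route for all ACP targets; no per-agent path segment.
    api: `${agentServerUrl}/agents/sidepanel/chat`,
    body: {
      conversationId,
      adapter: target.adapter,
      modelId: target.modelId,
      reasoningEffort: target.reasoningEffort,
      // An absent message is sent as an empty string, as in the diff.
      message: message ?? '',
    },
  }
}

const sketchRequest = buildAcpChatRequestSketch(
  'http://127.0.0.1:5151',
  { adapter: 'codex', modelId: 'gpt-5.5', reasoningEffort: 'medium' },
  'conversation-placeholder',
  'Inspect the current tab',
)
console.log(sketchRequest.api) // http://127.0.0.1:5151/agents/sidepanel/chat
```
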
--- a/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
+++ b/packages/browseros-agent/apps/agent/lib/agent-conversations/types.ts
@@ -59,3 +59,15 @@ export interface AgentConversation {
  createdAt: number
  updatedAt: number
 }
+
+export interface AgentCardData {
+  agentId: string
+  name: string
+  model?: string
+  status: 'idle' | 'working' | 'error'
+  lastMessage?: string
+  lastMessageTimestamp?: number
+  activitySummary?: string
+  currentTool?: string
+  costUsd?: number
+}
--- a/packages/browseros-agent/apps/agent/package.json
+++ b/packages/browseros-agent/apps/agent/package.json
@@ -9,7 +9,6 @@
    "build": "bun run codegen && wxt build",
    "build:dev": "bun --env-file=.env.development wxt build --mode development",
    "zip": "wxt zip",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
    "compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
    "lint": "bunx biome check",
    "typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
--- a/packages/browseros-agent/apps/eval/.env.example
+++ b/packages/browseros-agent/apps/eval/.env.example
@@ -1,51 +0,0 @@
-# Copy to .env.development for local eval runs.
-
-# Provider keys used by existing config files.
-OPENROUTER_API_KEY=
-FIREWORKS_API_KEY=
-ANTHROPIC_API_KEY=
-OPENAI_API_KEY=
-GOOGLE_GENERATIVE_AI_API_KEY=
-
-# Claude Agent SDK token used by performance_grader.
-CLAUDE_CODE_OAUTH_TOKEN=
-
-# Suite-mode model selection.
-EVAL_VARIANT=local
-EVAL_AGENT_PROVIDER=openai-compatible
-EVAL_AGENT_MODEL=
-EVAL_AGENT_API_KEY=
-EVAL_AGENT_BASE_URL=
-EVAL_AGENT_SUPPORTS_IMAGES=true
-
-# Optional suite-mode executor override for orchestrator suites.
-EVAL_EXECUTOR_MODEL=
-EVAL_EXECUTOR_API_KEY=
-EVAL_EXECUTOR_BASE_URL=
-
-# Clado visual action executor.
-CLADO_ACTION_MODEL=
-CLADO_ACTION_API_KEY=
-CLADO_ACTION_BASE_URL=
-# Backward-compatible alias used by older local scripts.
-CLADO_ACTION_URL=
-
-# BrowserOS runner.
-BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
-BROWSEROS_SERVER_URL=http://127.0.0.1:9110
-BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
-BROWSEROS_CONFIG_URL=
-
-# Captcha solver extension.
-NOPECHA_API_KEY=
-
-# WebArena-Infinity.
-WEBARENA_INFINITY_DIR=
-INFINITY_APP_URL=
-
-# R2 publishing and weekly report.
-EVAL_R2_ACCOUNT_ID=
-EVAL_R2_ACCESS_KEY_ID=
-EVAL_R2_SECRET_ACCESS_KEY=
-EVAL_R2_BUCKET=browseros-eval
-EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
--- a/packages/browseros-agent/apps/eval/README.md
+++ b/packages/browseros-agent/apps/eval/README.md
@@ -14,7 +14,6 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st

 ```bash
 cd apps/eval
-cp .env.example .env.development
 # Edit .env.development with your keys, then:
 bun run eval
 ```
@@ -24,55 +23,11 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
 ### CLI mode

 ```bash
-bun run eval -c configs/legacy/browseros-agent-weekly.json
-bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
+bun run eval -c configs/browseros-agent-weekly.json
 ```

 Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.

-The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
-
-```bash
-bun run eval run --config configs/legacy/browseros-agent-weekly.json
-bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
-bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
-bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
-```
-
-Config files live in two groups:
-
-```txt
-configs/legacy/  # Complete EvalConfig files used by older workflows and the dashboard
-configs/suites/  # Suite definitions; model/provider comes from CLI flags or env
-```
-
-Suite mode takes model settings from CLI flags first, then env:
-
-```bash
-EVAL_VARIANT=kimi-fireworks \
-EVAL_AGENT_PROVIDER=openai-compatible \
-EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
-EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
-EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
-bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
-```
-
-### Suites and variants
-
-A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
-
-A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
-
-This lets us run the same suite against multiple model setups without copying the benchmark config:
-
-```txt
-agisdk-daily-10 + kimi-fireworks
-agisdk-daily-10 + claude-sonnet
-agisdk-daily-10 + clado-action-000159
-```
-
-For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
-
 ## Agent types

 | Type | Description |
@@ -141,20 +96,6 @@ The `apiKey` field supports two formats:
 - **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
 - **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)

-### Environment variables
-
-| Variable | Used for |
-|----------|----------|
-| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
-| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
-| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
-| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
-| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
-| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
-| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
-| `NOPECHA_API_KEY` | CAPTCHA solver extension |
-| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
-
 ### Supported providers

 | Provider | `provider` value | Requires `baseUrl` |
@@ -169,22 +110,6 @@ The `apiKey` field supports two formats:
 | Ollama | `ollama` | No |
 | Clado Action (executor only) | `clado-action` | Yes |

-### R2 publishing
-
-`suite --config ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to the viewer-compatible R2 layout:
-
-```bash
-export EVAL_R2_ACCOUNT_ID=...
-export EVAL_R2_ACCESS_KEY_ID=...
-export EVAL_R2_SECRET_ACCESS_KEY=...
-export EVAL_R2_BUCKET=browseros-eval
-export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
-```
-
-`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
-
-Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
-
 ### BrowserOS infrastructure

 ```json
@@ -212,12 +137,10 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP

 | File | Tasks | Description |
 |------|-------|-------------|
-| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
 | `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
 | `mind2web.jsonl` | 300 | Online-Mind2Web |
 | `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
-| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
-| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
+| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
 | `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
 | `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
 | `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
@@ -244,47 +167,14 @@ results/
  browseros-agent-weekly/
    2026-04-29-1430/
      Amazon--0/
-        attempt.json          # Stable attempt summary for viewer/reporting
        metadata.json         # Task result, timing, grader scores
-        grades.json           # Compact grader results
        messages.jsonl         # Full message log
-        grader-artifacts/      # Grader-specific inputs/outputs/stderr
        screenshots/
          001.png              # Step-by-step screenshots
          002.png
      summary.json             # Aggregate pass rates
 ```

-R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
-
-### R2 viewer manifest
-
-`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
-
-```json
-{
-  "schemaVersion": 2,
-  "runId": "agisdk-real-smoke-2026-04-30-0000",
-  "tasks": [
-    {
-      "queryId": "agisdk-dashdish-10",
-      "paths": {
-        "metadata": "tasks/agisdk-dashdish-10/metadata.json",
-        "messages": "tasks/agisdk-dashdish-10/messages.jsonl",
-        "grades": "tasks/agisdk-dashdish-10/grades.json",
-        "trace": "tasks/agisdk-dashdish-10/trace.jsonl",
-        "screenshots": "tasks/agisdk-dashdish-10/screenshots",
-        "graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
-      }
-    }
-  ]
-}
-```
-
-The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
-
-Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
-
 ## Troubleshooting

 **BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://api.fireworks.ai/inference/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/agisdk-real.jsonl",
+  "dataset": "../data/agisdk-real.jsonl",
  "num_workers": 4,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-agent-weekly.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/webbench-2of4-50.jsonl",
+  "dataset": "../data/webbench-2of4-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-agent-weekly.json
@@ -14,7 +14,7 @@
      "baseUrl": "https://api.fireworks.ai/inference/v1"
    }
  },
-  "dataset": "../../data/webbench-2of4-50.jsonl",
+  "dataset": "../data/webbench-2of4-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/browseros-oe-clado-weekly.json
@@ -14,7 +14,7 @@
      "baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
    }
  },
-  "dataset": "../../data/agisdk-real.jsonl",
+  "dataset": "../data/agisdk-real.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/infinity-hard-50.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://openrouter.ai/api/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/webarena-infinity-hard-50.jsonl",
+  "dataset": "../data/webarena-infinity-hard-50.jsonl",
  "num_workers": 10,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
@@ -1,26 +0,0 @@
-{
-  "agent": {
-    "type": "single",
-    "provider": "openai-compatible",
-    "model": "moonshotai/kimi-k2.5",
-    "apiKey": "OPENROUTER_API_KEY",
-    "baseUrl": "https://openrouter.ai/api/v1",
-    "supportsImages": true
-  },
-  "dataset": "../../data/agisdk-real-smoke.jsonl",
-  "num_workers": 1,
-  "restart_server_per_task": true,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  },
-  "graders": ["agisdk_state_diff"],
-  "timeout_ms": 1800000
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-daily-10",
-  "dataset": "../../data/agisdk-daily-10.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": true
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-real-smoke",
-  "dataset": "../../data/agisdk-real-smoke.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
+++ b/packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
@@ -1,22 +0,0 @@
-{
-  "id": "agisdk-real",
-  "dataset": "../../data/agisdk-real.jsonl",
-  "agent": {
-    "type": "single"
-  },
-  "graders": ["agisdk_state_diff"],
-  "workers": 1,
-  "restartBrowserPerTask": true,
-  "timeoutMs": 1800000,
-  "browseros": {
-    "server_url": "http://127.0.0.1:9110",
-    "base_cdp_port": 9010,
-    "base_server_port": 9110,
-    "base_extension_port": 9310,
-    "load_extensions": false,
-    "headless": false
-  },
-  "captcha": {
-    "api_key_env": "NOPECHA_API_KEY"
-  }
-}
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-mind2web.json
@@ -5,7 +5,7 @@
    "model": "openai/gpt-4.1",
    "apiKey": "OPENROUTER_API_KEY"
  },
-  "dataset": "../../data/mind2web.jsonl",
+  "dataset": "../data/mind2web.jsonl",
  "num_workers": 5,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
+++ b/packages/browseros-agent/apps/eval/configs/legacy/test-webvoyager.json
@@ -7,7 +7,7 @@
    "baseUrl": "https://api.fireworks.ai/inference/v1",
    "supportsImages": true
  },
-  "dataset": "../../data/webvoyager.jsonl",
+  "dataset": "../data/webvoyager.jsonl",
  "num_workers": 3,
  "restart_server_per_task": true,
  "browseros": {
--- a/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
@@ -1,10 +0,0 @@
-{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
-{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
-{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
-{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
-{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
-{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
-{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
-{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}
--- a/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
+++ b/packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
@@ -1 +0,0 @@
-{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
--- a/packages/browseros-agent/apps/eval/package.json
+++ b/packages/browseros-agent/apps/eval/package.json
@@ -5,7 +5,6 @@
  "type": "module",
  "scripts": {
    "eval": "bun --env-file=.env.development run src/index.ts",
-    "test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
--- a/packages/browseros-agent/apps/eval/src/graders/python/agisdk-evaluate.py
+++ b/packages/browseros-agent/apps/eval/src/graders/python/agisdk-evaluate.py
@@ -81,13 +81,30 @@ def main():

        reward_val = float(reward_val) if reward_val is not None else 0.0
        results = info.get("results", [])
+        # `info["results"]` aligns 1:1 with `tc.task.evals` — zip them so we can
+        # surface the human-readable description and JMESPath query alongside
+        # the pass/fail. Without this the only feedback was a stringified dict.
+        evals = list(getattr(tc.task, "evals", []))

        per_criterion = []
        softened_count = 0
-        for r in results:
+        for idx, r in enumerate(results):
            passed = bool(r[0])
-            detail = r[1] if len(r) > 1 else ""
-            entry: dict = {"passed": passed, "detail": str(detail)}
+            detail = r[1] if len(r) > 1 else {}
+            ev = evals[idx] if idx < len(evals) else None
+
+            actual_value = expected_value = None
+            if isinstance(detail, dict):
+                actual_value = detail.get("actual_value")
+                expected_value = detail.get("expected_value")
+
+            entry: dict = {
+                "passed": passed,
+                "description": getattr(ev, "description", "") or "",
+                "query": getattr(ev, "query", "") or "",
+                "expected_value": expected_value,
+                "actual_value": actual_value,
+            }
            if not _STRICT and not passed and _soft_string_match(detail):
                entry["passed"] = True
                entry["softened"] = True
@@ -100,9 +117,43 @@ def main():
        if all_pass and reward_val != 1.0:
            reward_val = 1.0

-        out_message = str(message)
-        if softened_count and all_pass:
-            out_message = f"Task passed (with {softened_count} softened string criterion/criteria)."
+        # Build a useful message: list every criterion with a pass/fail icon
+        # so the viewer's grader pill shows the full check-list, not just
+        # failures. This becomes the `reasoning` shown in the viewer.
+        if not per_criterion:
+            # Defensive: agisdk returned no criteria — fall back to its message.
+            out_message = str(message)
+        else:
+            failures = [c for c in per_criterion if not c["passed"]]
+            if all_pass:
+                header = (
+                    f"All {len(per_criterion)} criteria passed"
+                    + (
+                        f" ({softened_count} softened)."
+                        if softened_count
+                        else "."
+                    )
+                )
+            else:
+                header = (
+                    f"{len(failures)} of {len(per_criterion)} criteria failed:"
+                )
+
+            lines = []
+            for c in per_criterion:
+                icon = "✓" if c["passed"] else "✗"
+                desc = c["description"] or c["query"] or "<unknown>"
+                soft = " (softened)" if c.get("softened") else ""
+                if c["passed"]:
+                    lines.append(f"{icon} {desc}{soft}")
+                else:
+                    exp_s = repr(c["expected_value"])
+                    act_s = repr(c["actual_value"])
+                    lines.append(
+                        f"{icon} {desc}: expected {exp_s}, got {act_s}"
+                    )
+
+            out_message = header + "\n" + "\n".join(lines)

        print(
            json.dumps(
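
Review note: the message format built in the hunk above can be restated as a small pure function. The sketch below is a TypeScript port for illustration only, not code from this change. `Criterion` and `buildGraderMessage` are hypothetical names, and `JSON.stringify` stands in for Python's `repr()`, so quoting of strings differs slightly from what the grader actually prints.

```typescript
// Hypothetical TypeScript port of the message-building logic in
// agisdk-evaluate.py. The Criterion shape mirrors the `entry` dict there.
interface Criterion {
  passed: boolean
  description: string
  query: string
  expected_value: unknown
  actual_value: unknown
  softened?: boolean
}

function buildGraderMessage(criteria: Criterion[], fallback: string): string {
  // No criteria reported: fall back to the grader's own message.
  if (criteria.length === 0) return fallback

  const failures = criteria.filter((c) => !c.passed)
  const softened = criteria.filter((c) => c.softened).length
  const header =
    failures.length === 0
      ? `All ${criteria.length} criteria passed${softened ? ` (${softened} softened)` : ''}.`
      : `${failures.length} of ${criteria.length} criteria failed:`

  // Every criterion gets a line; only failures carry expected/actual values.
  const lines = criteria.map((c) => {
    const desc = c.description || c.query || '<unknown>'
    if (c.passed) return `✓ ${desc}${c.softened ? ' (softened)' : ''}`
    // JSON.stringify is an assumption standing in for Python's repr().
    return `✗ ${desc}: expected ${JSON.stringify(c.expected_value)}, got ${JSON.stringify(c.actual_value)}`
  })

  return `${header}\n${lines.join('\n')}`
}
```

Values are emitted at full length, which matches the intent of dropping the 60-character truncation: long expected/actual strings stay diffable in the viewer.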
--- a/packages/browseros-agent/apps/eval/src/graders/python/infinity-evaluate.py
+++ b/packages/browseros-agent/apps/eval/src/graders/python/infinity-evaluate.py
--- a/packages/browseros-agent/apps/eval/scripts/upload-run.ts
+++ b/packages/browseros-agent/apps/eval/scripts/upload-run.ts
@@ -1,43 +1,349 @@
-#!/usr/bin/env bun
-
 /**
 * Upload eval runs to R2.
 *
 * Two modes:
 *   bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
+ *       → uploads that specific run
+ *
 *   bun scripts/upload-run.ts results/browseros-agent-weekly
+ *       → finds all timestamped subfolders, uploads any not yet in R2
+ *
+ * Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
+ *           EVAL_R2_BUCKET (default: browseros-eval)
+ *           EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
 */

+import { readdir, readFile, stat } from 'node:fs/promises'
+import { basename, dirname, extname, join } from 'node:path'
 import {
-  loadR2ConfigFromEnv,
-  R2Publisher,
-} from '../src/publishing/r2-publisher'
+  GetObjectCommand,
+  PutObjectCommand,
+  S3Client,
+} from '@aws-sdk/client-s3'

-async function main(): Promise<void> {
-  const inputDir = process.argv[2]
-  if (!inputDir) {
-    throw new Error(
-      'Usage:\n' +
-        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
-        '  bun scripts/upload-run.ts results/config-name',
+const CONCURRENCY = 20
+
+const CONTENT_TYPES: Record<string, string> = {
+  '.json': 'application/json',
+  '.jsonl': 'application/x-ndjson',
+  '.png': 'image/png',
+}
+
+interface R2Config {
+  accountId: string
+  accessKeyId: string
+  secretAccessKey: string
+  bucket: string
+  cdnBaseUrl: string
+}
+
+function loadConfig(): R2Config {
+  const accountId = process.env.EVAL_R2_ACCOUNT_ID
+  const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
+  const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
+
+  if (!accountId || !accessKeyId || !secretAccessKey) {
+    console.error(
+      'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
    )
+    process.exit(1)
  }

-  const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
-  const result = await publisher.publishPath(inputDir)
-  for (const run of result.uploadedRuns) {
-    console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
-    console.log(run.viewerUrl)
+  return {
+    accountId,
+    accessKeyId,
+    secretAccessKey,
+    bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
+    cdnBaseUrl: (
+      process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
+    ).replace(/\/+$/, ''),
  }
-  for (const runId of result.skippedRuns) {
-    console.log(`${runId}: already uploaded, skipping`)
-  }
-  console.log(
-    `Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
+}
+
+function createClient(config: R2Config): S3Client {
+  return new S3Client({
+    region: 'auto',
+    endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
+    credentials: {
+      accessKeyId: config.accessKeyId,
+      secretAccessKey: config.secretAccessKey,
+    },
+  })
+}
+
+async function upload(
+  client: S3Client,
+  bucket: string,
+  key: string,
+  body: Buffer,
+  contentType: string,
+) {
+  await client.send(
+    new PutObjectCommand({
+      Bucket: bucket,
+      Key: key,
+      Body: body,
+      ContentType: contentType,
+    }),
  )
 }

-main().catch((error) => {
-  console.error(error instanceof Error ? error.message : String(error))
-  process.exit(1)
-})
+async function collectFiles(dir: string): Promise<string[]> {
+  const files: string[] = []
+  const entries = await readdir(dir, { withFileTypes: true })
+  for (const entry of entries) {
+    const full = join(dir, entry.name)
+    if (entry.isDirectory()) {
+      files.push(...(await collectFiles(full)))
+    } else {
+      files.push(full)
+    }
+  }
+  return files
+}
+
+async function runPool<T>(
+  items: T[],
+  concurrency: number,
+  fn: (item: T) => Promise<void>,
+) {
+  let i = 0
+  const workers = Array.from({ length: concurrency }, async () => {
+    while (i < items.length) {
+      const idx = i++
+      await fn(items[idx])
+    }
+  })
+  await Promise.all(workers)
+}
+
+// A run counts as uploaded when its manifest.json exists in R2; any fetch error is treated as "not uploaded"
+async function isUploaded(
+  client: S3Client,
+  bucket: string,
+  runId: string,
+): Promise<boolean> {
+  try {
+    await client.send(
+      new GetObjectCommand({
+        Bucket: bucket,
+        Key: `runs/${runId}/manifest.json`,
+      }),
+    )
+    return true
+  } catch {
+    return false
+  }
+}
+
+// Detect if a directory is a run dir (has task subdirs with metadata.json)
+// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
+async function isRunDir(dir: string): Promise<boolean> {
+  const entries = await readdir(dir, { withFileTypes: true })
+  const subdirs = entries.filter((e) => e.isDirectory())
+  for (const subdir of subdirs) {
+    const metaPath = join(dir, subdir.name, 'metadata.json')
+    const metaStat = await stat(metaPath).catch(() => null)
+    if (metaStat?.isFile()) return true
+  }
+  return false
+}
+
+async function uploadSingleRun(
+  runDir: string,
+  runId: string,
+  r2Config: R2Config,
+  client: S3Client,
+): Promise<void> {
+  const taskDirs = await readdir(runDir, { withFileTypes: true })
+  const taskEntries = taskDirs.filter((d) => d.isDirectory())
+
+  if (taskEntries.length === 0) {
+    console.warn(`  No task subdirectories in ${runId}, skipping`)
+    return
+  }
+
+  const manifestTasks: Record<string, unknown>[] = []
+  const jobs: { key: string; filePath: string; contentType: string }[] = []
+
+  // Take agentConfig and dataset from the first task whose metadata records them
+  let agentConfig: Record<string, unknown> | undefined
+  let dataset: string | undefined
+
+  for (const taskDir of taskEntries) {
+    const taskId = taskDir.name
+    const taskPath = join(runDir, taskId)
+    const metaPath = join(taskPath, 'metadata.json')
+
+    let meta: Record<string, unknown> = {}
+    try {
+      meta = JSON.parse(await readFile(metaPath, 'utf-8'))
+    } catch {
+      continue
+    }
+
+    if (!agentConfig && meta.agent_config)
+      agentConfig = meta.agent_config as Record<string, unknown>
+    if (!dataset && meta.dataset) dataset = meta.dataset as string
+
+    const files = await collectFiles(taskPath)
+    let screenshotCount = 0
+
+    for (const file of files) {
+      const relative = file.slice(taskPath.length + 1)
+      const ext = extname(file)
+      if (relative.startsWith('screenshots/') && ext === '.png')
+        screenshotCount++
+
+      jobs.push({
+        key: `runs/${runId}/${taskId}/${relative}`,
+        filePath: file,
+        contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
+      })
+    }
+
+    manifestTasks.push({
+      queryId: meta.query_id || taskId,
+      query: meta.query || '',
+      startUrl: meta.start_url || '',
+      status:
+        meta.termination_reason === 'completed'
+          ? 'completed'
+          : meta.termination_reason || 'unknown',
+      durationMs: meta.total_duration_ms || 0,
+      screenshotCount: (meta.screenshot_count as number) || screenshotCount,
+      graderResults: meta.grader_results || {},
+    })
+  }
+
+  if (manifestTasks.length === 0) {
+    console.warn(`  No tasks with a readable metadata.json in ${runId}, skipping`)
+    return
+  }
+
+  console.log(
+    `  Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
+  )
+
+  let uploaded = 0
+  await runPool(jobs, CONCURRENCY, async (job) => {
+    const body = await readFile(job.filePath)
+    await upload(client, r2Config.bucket, job.key, body, job.contentType)
+    uploaded++
+    if (uploaded % 50 === 0 || uploaded === jobs.length) {
+      console.log(`    ${uploaded}/${jobs.length}`)
+    }
+  })
+
+  // Read summary.json if it exists
+  let summaryData: Record<string, unknown> | undefined
+  try {
+    summaryData = JSON.parse(
+      await readFile(join(runDir, 'summary.json'), 'utf-8'),
+    )
+  } catch {}
+
+  // Upload manifest
+  const manifest = {
+    runId,
+    uploadedAt: new Date().toISOString(),
+    agentConfig,
+    dataset,
+    summary: summaryData
+      ? {
+          passRate: summaryData.passRate,
+          avgDurationMs: summaryData.avgDurationMs,
+        }
+      : undefined,
+    tasks: manifestTasks,
+  }
+  const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
+  await upload(
+    client,
+    r2Config.bucket,
+    `runs/${runId}/manifest.json`,
+    manifestBody,
+    'application/json',
+  )
+
+  // Upload viewer.html to bucket root
+  const viewerPath = join(
+    import.meta.dir,
+    '..',
+    'src',
+    'dashboard',
+    'viewer.html',
+  )
+  const viewerBody = await readFile(viewerPath)
+  await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
+
+  console.log(`  Uploaded ${uploaded + 2} files`)
+  console.log(`  ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
+}
+
+async function main() {
+  const inputDir = process.argv[2]
+  if (!inputDir) {
+    console.error(
+      'Usage:\n' +
+        '  bun scripts/upload-run.ts results/config-name/2026-03-21-1730  (specific run)\n' +
+        '  bun scripts/upload-run.ts results/config-name                   (all un-uploaded runs)',
+    )
+    process.exit(1)
+  }
+
+  const dirStat = await stat(inputDir).catch(() => null)
+  if (!dirStat?.isDirectory()) {
+    console.error(`Not a directory: ${inputDir}`)
+    process.exit(1)
+  }
+
+  const r2Config = loadConfig()
+  const client = createClient(r2Config)
+
+  if (await isRunDir(inputDir)) {
+    // Single run: results/config-name/2026-03-21-1730
+    const timestamp = basename(inputDir)
+    const configName = basename(dirname(inputDir))
+    const runId = `${configName}-${timestamp}`
+    console.log(`Uploading run: ${runId}`)
+    await uploadSingleRun(inputDir, runId, r2Config, client)
+  } else {
+    // Config dir: results/config-name/ — upload all un-uploaded runs
+    const configName = basename(inputDir)
+    const entries = await readdir(inputDir, { withFileTypes: true })
+    const runDirs = entries
+      .filter((e) => e.isDirectory())
+      .map((e) => e.name)
+      .sort()
+
+    if (runDirs.length === 0) {
+      console.error('No run subdirectories found')
+      process.exit(1)
+    }
+
+    console.log(
+      `Found ${runDirs.length} runs for config "${configName}", checking R2...`,
+    )
+
+    let uploadedCount = 0
+    for (const dir of runDirs) {
+      const runId = `${configName}-${dir}`
+      const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
+      if (alreadyUploaded) {
+        console.log(`  ${runId}: already uploaded, skipping`)
+        continue
+      }
+
+      console.log(`  ${runId}: uploading...`)
+      await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
+      uploadedCount++
+    }
+
+    console.log(
+      `\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
+    )
+  }
+}
+
+main()
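
Review note: `upload-run.ts` builds the run ID as `<config-name>-<timestamp>` and stores every file under `runs/<run-id>/<task-id>/...`, while `weekly-report.ts` recovers the config name by stripping the timestamp suffix. A minimal sketch of that round trip follows. `buildRunId` and `objectKey` are hypothetical helper names introduced here for illustration; the string formats and the regex are the ones that appear in this diff.

```typescript
// Sketch only: buildRunId and objectKey are hypothetical helper names.
// The formats mirror the template strings used in upload-run.ts.
function buildRunId(configName: string, timestamp: string): string {
  return `${configName}-${timestamp}`
}

function objectKey(runId: string, taskId: string, relative: string): string {
  return `runs/${runId}/${taskId}/${relative}`
}

// Same regex as extractConfigName in weekly-report.ts.
function extractConfigName(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

const runId = buildRunId('browseros-agent-weekly', '2026-03-21-1730')
const key = objectKey(runId, 'agisdk-dashdish-10', 'screenshots/001.png')

console.log(runId) // browseros-agent-weekly-2026-03-21-1730
console.log(key)
console.log(extractConfigName(runId)) // browseros-agent-weekly
console.log(extractConfigName('ci-weekly')) // ci-weekly (old format, unchanged)
```

The round trip only holds when the run directory name has the `YYYY-MM-DD-HHMM` shape; a differently named directory would keep its full name as the config name in the report.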
--- a/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
+++ b/packages/browseros-agent/apps/eval/scripts/weekly-report.ts
@@ -24,11 +24,45 @@ import {
  PutObjectCommand,
  S3Client,
 } from '@aws-sdk/client-s3'
-import {
-  buildRunSummaries,
-  type ReportManifest,
-  type RunSummary,
-} from '../src/reporting/run-summary'
+
+interface ManifestTask {
+  queryId: string
+  query: string
+  status: string
+  durationMs: number
+  screenshotCount: number
+  graderResults: Record<string, { pass: boolean; score: number }>
+}
+
+interface Manifest {
+  runId: string
+  uploadedAt: string
+  agentConfig?: { type?: string; model?: string }
+  dataset?: string
+  summary?: { passRate?: number; avgDurationMs?: number }
+  tasks: ManifestTask[]
+}
+
+interface RunSummary {
+  runId: string
+  configName: string
+  date: string
+  avgScore: number
+  total: number
+  completed: number
+  failed: number
+  timeout: number
+  avgDurationMs: number
+  model: string
+  dataset: string
+  agentType: string
+}
+
+const PASS_FAIL_GRADER_ORDER = [
+  'agisdk_state_diff',
+  'infinity_state',
+  'performance_grader',
+]

 function requireEnv(name: string): string {
  const value = process.env[name]
@@ -53,7 +87,7 @@ const client = new S3Client({
 // Step 1: List all manifest.json files in runs/
 console.log('Scanning R2 for eval runs...')

-const manifests: ReportManifest[] = []
+const manifests: Manifest[] = []
 let continuationToken: string | undefined

 do {
@@ -93,9 +127,64 @@ if (manifests.length === 0) {
 }

 // Step 2: Build run summaries
-const runs: RunSummary[] = buildRunSummaries(manifests)
+const runs: RunSummary[] = manifests
+  .map((m) => {
+    const total = m.tasks.length
+    const completed = m.tasks.filter((t) => t.status === 'completed').length
+    const failed = m.tasks.filter((t) => t.status === 'failed').length
+    const timeout = m.tasks.filter((t) => t.status === 'timeout').length
+
+    let scoredCount = 0
+    let scoreSum = 0
+    for (const task of m.tasks) {
+      if (!task.graderResults) continue
+      for (const name of PASS_FAIL_GRADER_ORDER) {
+        if (task.graderResults[name]) {
+          scoredCount++
+          scoreSum += task.graderResults[name].score ?? 0
+          break
+        }
+      }
+    }
+
+    const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
+    const durations = m.tasks
+      .filter((t) => t.durationMs > 0)
+      .map((t) => t.durationMs)
+    const avgDurationMs =
+      durations.length > 0
+        ? durations.reduce((a, b) => a + b, 0) / durations.length
+        : 0
+
+    const date = m.uploadedAt
+      ? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
+      : m.runId.slice(0, 15)
+
+    const model = m.agentConfig?.model || 'unknown'
+    const dataset = m.dataset || m.runId
+    const agentType = m.agentConfig?.type || 'unknown'
+
+    const configName = extractConfigName(m.runId)
+    return {
+      runId: m.runId,
+      configName,
+      date,
+      avgScore,
+      total,
+      completed,
+      failed,
+      timeout,
+      avgDurationMs,
+      model,
+      dataset,
+      agentType,
+    }
+  })
+  .sort((a, b) => a.date.localeCompare(b.date))

 // Step 3: Identify unique config groups
+// runId is either "ci-weekly" (old format) or "ci-weekly-2026-03-21-1730" (timestamped).
+// extractConfigName (defined below) strips the date-time suffix to get the config name.
 function escHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
@@ -104,6 +193,12 @@ function escHtml(s: string): string {
    .replace(/"/g, '&quot;')
 }

+function extractConfigName(runId: string): string {
+  // "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
+  // "ci-weekly" → "ci-weekly" (no timestamp, old format)
+  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
+}
+
 const configGroups = [...new Set(runs.map((r) => r.configName))]
 const defaultConfig = configGroups.includes('ci-weekly')
  ? 'ci-weekly'
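
Review note: the inlined summary code scores each task with the first grader it finds in `PASS_FAIL_GRADER_ORDER` and leaves tasks with no recognized grader out of the denominator. The sketch below isolates that rule so the arithmetic is easy to check. `averageScore` and the sample tasks are hypothetical; the grader order and the first-match behavior are taken from the hunk above.

```typescript
// Sketch only: averageScore is a hypothetical name. The grader order and the
// first-match rule are copied from the run-summary code in weekly-report.ts.
type GraderResults = Record<string, { pass: boolean; score: number }>

const PASS_FAIL_GRADER_ORDER = [
  'agisdk_state_diff',
  'infinity_state',
  'performance_grader',
]

function averageScore(tasks: { graderResults?: GraderResults }[]): number {
  let scoredCount = 0
  let scoreSum = 0
  for (const task of tasks) {
    const results = task.graderResults
    if (!results) continue
    // Only the first grader present, in priority order, counts for a task.
    const name = PASS_FAIL_GRADER_ORDER.find((n) => results[n])
    if (!name) continue
    scoredCount++
    scoreSum += results[name]?.score ?? 0
  }
  // Percentage over scored tasks only; unscored tasks do not lower it.
  return scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
}

const sampleTasks = [
  { graderResults: { agisdk_state_diff: { pass: true, score: 1 } } },
  // infinity_state wins over performance_grader, so this task scores 0.
  {
    graderResults: {
      infinity_state: { pass: false, score: 0 },
      performance_grader: { pass: true, score: 1 },
    },
  },
  { graderResults: {} }, // no recognized grader: excluded
  {}, // no grader results at all: excluded
]

console.log(averageScore(sampleTasks)) // 50
```

One consequence worth keeping in mind when reading the report: a run where most tasks were never graded can still show a high average, because only graded tasks enter the denominator.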
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
@@ -1,191 +0,0 @@
-import type {
-  CladoAction,
-  CladoActionResponse,
-  RawCladoActionPayload,
-} from './types'
-
-/** Parses Clado's structured response plus any raw `<answer>` blocks into executable actions. */
-export function parseCladoActions(
-  prediction: CladoActionResponse,
-): CladoAction[] {
-  const actionFromField =
-    typeof prediction.action === 'string' ? prediction.action : null
-
-  const rawActions = parseCladoActionsFromRawResponse(prediction.raw_response)
-  const primaryFromRaw = rawActions[0] ?? null
-  const mergedPrimary = {
-    ...primaryFromRaw,
-    ...prediction,
-    action: actionFromField ?? primaryFromRaw?.action,
-  }
-
-  const normalized: CladoAction[] = []
-  const primary = normalizeCladoActionPayload(mergedPrimary)
-  if (primary) normalized.push(primary)
-
-  for (const candidate of rawActions.slice(1)) {
-    const parsed = normalizeCladoActionPayload(candidate)
-    if (!parsed) continue
-    const prev = normalized[normalized.length - 1]
-    if (
-      !prev ||
-      getCladoActionSignature(prev) !== getCladoActionSignature(parsed)
-    ) {
-      normalized.push(parsed)
-    }
-  }
-
-  return normalized
-}
-
-export function normalizeCladoActionPayload(
-  payload: RawCladoActionPayload,
-): CladoAction | null {
-  if (!payload.action || typeof payload.action !== 'string') {
-    return null
-  }
-  return {
-    action: payload.action,
-    x: typeof payload.x === 'number' ? payload.x : undefined,
-    y: typeof payload.y === 'number' ? payload.y : undefined,
-    text: typeof payload.text === 'string' ? payload.text : undefined,
-    key: typeof payload.key === 'string' ? payload.key : undefined,
-    direction:
-      typeof payload.direction === 'string' ? payload.direction : undefined,
-    startX: typeof payload.startX === 'number' ? payload.startX : undefined,
-    startY: typeof payload.startY === 'number' ? payload.startY : undefined,
-    endX: typeof payload.endX === 'number' ? payload.endX : undefined,
-    endY: typeof payload.endY === 'number' ? payload.endY : undefined,
-    amount: typeof payload.amount === 'number' ? payload.amount : undefined,
-    time: typeof payload.time === 'number' ? payload.time : undefined,
-    final_answer:
-      typeof payload.final_answer === 'string'
-        ? payload.final_answer
-        : undefined,
-  }
-}
-
-export function parseCladoActionsFromRawResponse(
-  rawResponse: string | undefined,
-): RawCladoActionPayload[] {
-  if (!rawResponse) return []
-  const matches = [
-    ...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
-  ]
-  const parsed: RawCladoActionPayload[] = []
-  for (const match of matches) {
-    try {
-      parsed.push(JSON.parse(match[1]) as RawCladoActionPayload)
-    } catch {
-      // Ignore malformed answer blocks so one bad block does not drop the whole prediction.
-    }
-  }
-  return parsed
-}
-
-export function extractCladoThinking(
-  rawResponse: string | undefined,
-): string | undefined {
-  if (!rawResponse) return undefined
-  const matches = [
-    ...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
-  ]
-  if (matches.length === 0) return undefined
-
-  const merged = matches
-    .map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
-    .filter((value) => value.length > 0)
-    .join(' ')
-
-  if (!merged) return undefined
-  return merged
-}
-
-export function summarizeCladoPrediction(
-  prediction: CladoActionResponse,
-): Record<string, unknown> {
-  const preview =
-    typeof prediction.raw_response === 'string' &&
-    prediction.raw_response.length > 0
-      ? prediction.raw_response.slice(0, 240)
-      : undefined
-
-  return {
-    action: prediction.action,
-    x: prediction.x,
-    y: prediction.y,
-    text: prediction.text,
-    key: prediction.key,
-    direction: prediction.direction,
-    startX: prediction.startX,
-    startY: prediction.startY,
-    endX: prediction.endX,
-    endY: prediction.endY,
-    amount: prediction.amount,
-    time: prediction.time,
-    inference_time_seconds: prediction.inference_time_seconds,
-    raw_response_preview: preview,
-  }
-}
-
-export function getCladoActionSignature(action: CladoAction): string {
-  switch (action.action) {
-    case 'click':
-    case 'double_click':
-    case 'right_click':
-    case 'hover':
-      return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
-    case 'type':
-      return `${action.action}:${(action.text ?? '').slice(0, 16)}`
-    case 'press_key':
-      return `${action.action}:${action.key ?? 'key'}`
-    case 'scroll':
-      return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
-    case 'drag':
-      return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
-    case 'wait':
-      return `${action.action}:${action.time ?? 1}`
-    case 'end':
-      return action.final_answer
-        ? `end(${action.final_answer.slice(0, 32)})`
-        : 'end()'
-    case 'invalid':
-      return `invalid(${(action.text ?? '').slice(0, 40)})`
-    default:
-      return action.action
-  }
-}
-
-export function formatCladoHistory(actions: CladoAction[]): string {
-  if (actions.length === 0) return 'None'
-
-  const parts = actions.map((action) => {
-    switch (action.action) {
-      case 'click':
-      case 'double_click':
-      case 'right_click':
-      case 'hover':
-        return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
-      case 'type': {
-        const text = (action.text ?? '').replace(/'/g, "\\'")
-        return `type('${text}')`
-      }
-      case 'press_key':
-        return `press_key('${action.key ?? 'Enter'}')`
-      case 'scroll':
-        return `scroll(${action.direction ?? 'down'})`
-      case 'drag':
-        return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
-      case 'wait':
-        return `wait(${Math.round(action.time ?? 1)}s)`
-      case 'end':
-        return 'end()'
-      case 'invalid':
-        return 'invalid()'
-      default:
-        return action.action
-    }
-  })
-
-  return parts.join(' -> ')
-}
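Review note: the removed `parseCladoActionsFromRawResponse` tolerates malformed `<answer>` blocks by skipping them. A minimal sketch of that behavior follows, assuming each block holds one JSON object; `AnswerPayload` and `parseAnswerBlocks` are hypothetical names used only for illustration.

```typescript
// Hypothetical payload shape; the real type in this diff is RawCladoActionPayload.
interface AnswerPayload {
  action?: string
  [key: string]: unknown
}

// Extracts every <answer>...</answer> block and JSON-parses its body.
// A block that fails to parse is skipped so the remaining blocks survive.
function parseAnswerBlocks(raw: string | undefined): AnswerPayload[] {
  if (!raw) return []
  const matches = [...raw.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi)]
  const parsed: AnswerPayload[] = []
  for (const match of matches) {
    try {
      parsed.push(JSON.parse(match[1]) as AnswerPayload)
    } catch {
      // Malformed block: ignore it and keep going.
    }
  }
  return parsed
}
```

The lazy `[\s\S]*?` quantifier keeps each match confined to a single block even when the response contains several.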
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
@@ -1,123 +0,0 @@
-import {
-  CLADO_PAGE_SCOPED_TOOLS,
-  type CladoActionPoint,
-  type CladoViewport,
-} from './types'
-
-export function clampCladoNormalizedCoordinate(value: number): number {
-  return Math.min(999, Math.max(0, Math.round(value)))
-}
-
-/** Converts Clado's 0-1000 normalized coordinate space into BrowserOS viewport pixels. */
-export function resolveCladoPoint(
-  viewport: CladoViewport,
-  normalizedX: number | undefined,
-  normalizedY: number | undefined,
-): CladoActionPoint {
-  const nx = clampCladoNormalizedCoordinate(normalizedX ?? 500)
-  const ny = clampCladoNormalizedCoordinate(normalizedY ?? 500)
-
-  return {
-    x: Math.round((nx / 1000) * viewport.width),
-    y: Math.round((ny / 1000) * viewport.height),
-  }
-}
-
-/** Adapts Clado action tool arguments to the BrowserOS MCP tool argument contract. */
-export function prepareCladoToolArgs(
-  toolName: string,
-  args: Record<string, unknown>,
-  pageId: number,
-): Record<string, unknown> {
-  const prepared: Record<string, unknown> = { ...args }
-
-  if (
-    toolName === 'evaluate_script' &&
-    typeof prepared.function === 'string' &&
-    prepared.expression === undefined
-  ) {
-    prepared.expression = toCladoEvaluateExpression(prepared.function)
-    delete prepared.function
-  }
-
-  if (
-    toolName === 'click_at' &&
-    typeof prepared.dblClick === 'boolean' &&
-    prepared.clickCount === undefined
-  ) {
-    prepared.clickCount = prepared.dblClick ? 2 : 1
-    delete prepared.dblClick
-  }
-
-  if (
-    CLADO_PAGE_SCOPED_TOOLS.has(toolName) &&
-    typeof prepared.page !== 'number'
-  ) {
-    prepared.page = pageId
-  }
-
-  return prepared
-}
-
-export function toCladoEvaluateExpression(rawFunction: unknown): string {
-  const source = String(rawFunction).trim()
-  if (source.startsWith('() =>') || source.startsWith('async () =>')) {
-    return `(${source})()`
-  }
-  if (source.startsWith('function')) {
-    return `(${source})()`
-  }
-  return source
-}
-
-export function normalizeCladoPressKey(key: string | undefined): string {
-  const raw = (key ?? '').trim()
-  if (!raw) throw new Error('press_key action missing key field')
-
-  const map: Record<string, string> = {
-    'C-a': 'Control+A',
-    'C-c': 'Control+C',
-    'C-v': 'Control+V',
-    'C-x': 'Control+X',
-    'C-z': 'Control+Z',
-    'C-y': 'Control+Y',
-    'C-s': 'Control+S',
-    'C-t': 'Control+T',
-    'C-w': 'Control+W',
-    'C-h': 'Control+H',
-    'C-f': 'Control+F',
-    'C-+': 'Control++',
-    'C--': 'Control+-',
-    'C-tab': 'Control+Tab',
-    'C-S-tab': 'Control+Shift+Tab',
-    'C-S-n': 'Control+Shift+N',
-    'C-down': 'Control+ArrowDown',
-    'M-a': 'Meta+A',
-    'M-c': 'Meta+C',
-    'M-v': 'Meta+V',
-    'M-x': 'Meta+X',
-    'M-f4': 'Alt+F4',
-  }
-  return map[raw] ?? raw
-}
-
-export function normalizeCladoDirection(
-  direction: string | undefined,
-): 'up' | 'down' | 'left' | 'right' {
-  if (
-    direction === 'up' ||
-    direction === 'down' ||
-    direction === 'left' ||
-    direction === 'right'
-  ) {
-    return direction
-  }
-  return 'down'
-}
-
-export function normalizeCladoScrollAmount(amount: number | undefined): number {
-  if (typeof amount !== 'number') return 500
-  if (amount <= 0) return 100
-  const clamped = Math.min(amount, 1000)
-  return Math.max(100, Math.round((clamped / 1000) * 900))
-}
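Review note: the coordinate mapping in the removed `resolveCladoPoint` is easy to get subtly wrong, so here is a small sketch of the same arithmetic. It assumes, as the removed doc comment states, a 0-1000 normalized space clamped to 0..999 with 500 as the fallback; `ViewportSize` and `toViewportPixels` are hypothetical names.

```typescript
// Hypothetical stand-in for CladoViewport.
interface ViewportSize {
  width: number
  height: number
}

// Rounds and clamps a normalized coordinate into the 0..999 range.
function clampNormalized(value: number): number {
  return Math.min(999, Math.max(0, Math.round(value)))
}

// Maps normalized coordinates to viewport pixels; a missing coordinate
// falls back to 500, i.e. the middle of the viewport.
function toViewportPixels(
  viewport: ViewportSize,
  normalizedX?: number,
  normalizedY?: number,
): { x: number; y: number } {
  const nx = clampNormalized(normalizedX ?? 500)
  const ny = clampNormalized(normalizedY ?? 500)
  return {
    x: Math.round((nx / 1000) * viewport.width),
    y: Math.round((ny / 1000) * viewport.height),
  }
}
```

Clamping to 999 rather than 1000 keeps the resulting pixel strictly inside the viewport for typical sizes.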
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
@@ -1,68 +0,0 @@
-import { CLADO_REQUEST_TIMEOUT_MS } from '../../../../constants'
-import { formatCladoHistory } from './clado-actions'
-import type { CladoAction, CladoActionResponse } from './types'
-
-export interface CladoActionClientOptions {
-  baseUrl?: string
-  apiKey?: string
-}
-
-export interface CladoActionPredictionInput {
-  instruction: string
-  imageBase64: string
-  actionHistory: CladoAction[]
-  signal?: AbortSignal
-}
-
-/** Calls the Clado action model without exposing credentials in process arguments or artifacts. */
-export class CladoActionClient {
-  constructor(private readonly options: CladoActionClientOptions) {}
-
-  async requestActionPrediction(
-    input: CladoActionPredictionInput,
-  ): Promise<CladoActionResponse> {
-    if (!this.options.baseUrl) {
-      throw new Error('executor.baseUrl must be set for clado-action provider')
-    }
-
-    const requestController = new AbortController()
-    const onAbort = () => requestController.abort()
-    input.signal?.addEventListener('abort', onAbort, { once: true })
-
-    const timeoutHandle = setTimeout(() => {
-      requestController.abort()
-    }, CLADO_REQUEST_TIMEOUT_MS)
-
-    try {
-      const headers: Record<string, string> = {
-        'Content-Type': 'application/json',
-      }
-      if (this.options.apiKey) {
-        headers.Authorization = `Bearer ${this.options.apiKey}`
-      }
-
-      const response = await fetch(this.options.baseUrl, {
-        method: 'POST',
-        headers,
-        body: JSON.stringify({
-          instruction: input.instruction,
-          image_base64: input.imageBase64,
-          history: formatCladoHistory(input.actionHistory),
-        }),
-        signal: requestController.signal,
-      })
-
-      if (!response.ok) {
-        const body = await response.text()
-        throw new Error(
-          `HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
-        )
-      }
-
-      return (await response.json()) as CladoActionResponse
-    } finally {
-      clearTimeout(timeoutHandle)
-      input.signal?.removeEventListener('abort', onAbort)
-    }
-  }
-}
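Review note: the request path combines a caller-supplied `AbortSignal` with a timeout through a private `AbortController`. The sketch below isolates that pattern. `linkAbort` and `LinkedAbort` are hypothetical names, and the early check for an already-aborted external signal is an addition of this sketch, not something the code in this diff does.

```typescript
// Hypothetical return shape: the combined signal plus a cleanup function.
interface LinkedAbort {
  signal: AbortSignal
  dispose: () => void
}

// Aborts the returned signal when either the external signal aborts or
// the timeout elapses. Call dispose() in a finally block to release the
// timer and the event listener.
function linkAbort(
  external: AbortSignal | undefined,
  timeoutMs: number,
): LinkedAbort {
  const controller = new AbortController()
  const onAbort = () => controller.abort()

  if (external?.aborted) {
    // Assumption of this sketch: propagate an abort that already happened.
    controller.abort()
  } else {
    external?.addEventListener('abort', onAbort, { once: true })
  }

  const timer = setTimeout(onAbort, timeoutMs)
  return {
    signal: controller.signal,
    dispose: () => {
      clearTimeout(timer)
      external?.removeEventListener('abort', onAbort)
    },
  }
}
```

A caller would pass `linked.signal` to `fetch` and call `linked.dispose()` in `finally`, mirroring the `clearTimeout` and `removeEventListener` calls in the diff.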
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-executor-backend.ts
@@ -1,56 +0,0 @@
-import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
-import type {
-  DelegationResult,
-  ExecutorBackend,
-  ExecutorCallbacks,
-} from '../../executor-backend'
-import { CladoActionExecutor } from './clado-action-executor'
-
-export interface CladoExecutorBackendOptions {
-  configTemplate: ResolvedAgentConfig
-  serverUrl: string
-  initialPageId?: number
-  callbacks?: ExecutorCallbacks
-}
-
-/** Executes delegated goals through the Clado visual action model. */
-export class CladoExecutorBackend implements ExecutorBackend {
-  readonly kind = 'clado'
-  private executor: CladoActionExecutor | null = null
-
-  constructor(private readonly options: CladoExecutorBackendOptions) {}
-
-  async execute(
-    instruction: string,
-    signal?: AbortSignal,
-  ): Promise<DelegationResult> {
-    const executor = this.getExecutor()
-    const result = await executor.execute(instruction, signal)
-    return result
-  }
-
-  async close(): Promise<void> {
-    await this.executor?.close()
-  }
-
-  getTotalSteps(): number {
-    return this.executor?.getTotalSteps() ?? 0
-  }
-
-  private getExecutor(): CladoActionExecutor {
-    if (this.executor) return this.executor
-
-    this.executor = new CladoActionExecutor(
-      {
-        provider: this.options.configTemplate.provider,
-        model: this.options.configTemplate.model,
-        apiKey: this.options.configTemplate.apiKey ?? '',
-        baseUrl: this.options.configTemplate.baseUrl,
-      },
-      this.options.serverUrl,
-      this.options.initialPageId,
-    )
-    this.executor.setCallbacks(this.options.callbacks ?? {})
-    return this.executor
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/types.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/types.ts
@@ -1,78 +0,0 @@
-export const CLADO_ACTION_PROVIDER = 'clado-action'
-
-export const CLADO_PAGE_SCOPED_TOOLS = new Set<string>([
-  'take_screenshot',
-  'evaluate_script',
-  'click',
-  'click_at',
-  'hover',
-  'hover_at',
-  'clear',
-  'fill',
-  'press_key',
-  'type_at',
-  'drag',
-  'drag_at',
-  'scroll',
-  'handle_dialog',
-  'select_option',
-  'navigate_page',
-  'close_page',
-  'wait_for',
-])
-
-export interface CladoActionResponse {
-  action?: string | null
-  x?: number
-  y?: number
-  text?: string
-  key?: string
-  direction?: string
-  startX?: number
-  startY?: number
-  endX?: number
-  endY?: number
-  amount?: number
-  time?: number
-  final_answer?: string | null
-  inference_time_seconds?: number
-  raw_response?: string
-  thinking?: string | null
-  parse_error?: string | null
-}
-
-export interface CladoViewport {
-  width: number
-  height: number
-}
-
-export interface CladoAction {
-  action: string
-  x?: number
-  y?: number
-  text?: string
-  key?: string
-  direction?: string
-  startX?: number
-  startY?: number
-  endX?: number
-  endY?: number
-  amount?: number
-  time?: number
-  final_answer?: string
-}
-
-export type RawCladoActionPayload = Partial<
-  Omit<CladoAction, 'final_answer'>
-> & {
-  final_answer?: string | null
-}
-
-export interface CladoActionPoint {
-  x: number
-  y: number
-}
-
-export function isCladoActionProvider(provider: string): boolean {
-  return provider === CLADO_ACTION_PROVIDER
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
@@ -1,60 +0,0 @@
-import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
-import type { Browser } from '@browseros/server/browser'
-import type {
-  ExecutorBackend,
-  ExecutorBackendKind,
-  ExecutorCallbacks,
-} from '../executor-backend'
-import { CladoExecutorBackend } from './clado/clado-executor-backend'
-import { isCladoActionProvider } from './clado/types'
-import { ToolLoopExecutorBackend } from './tool-loop/tool-loop-executor-backend'
-
-export interface CreateExecutorBackendOptions {
-  backendKind?: ExecutorBackendKind
-  provider?: string
-  configTemplate?: ResolvedAgentConfig
-  browser?: Browser | null
-  serverUrl?: string
-  windowId?: number
-  tabId?: number
-  initialPageId?: number
-  callbacks?: ExecutorCallbacks
-  executor?: ExecutorBackend
-}
-
-export function backendKindForProvider(provider: string): ExecutorBackendKind {
-  return isCladoActionProvider(provider) ? 'clado' : 'tool-loop'
-}
-
-/** Creates the backend used for one orchestrator delegation. */
-export function createExecutorBackend(
-  options: CreateExecutorBackendOptions,
-): ExecutorBackend {
-  if (options.executor) return options.executor
-
-  const kind =
-    options.backendKind ??
-    backendKindForProvider(
-      options.provider ?? options.configTemplate?.provider ?? '',
-    )
-
-  if (kind === 'clado') {
-    return new CladoExecutorBackend({
-      configTemplate: required(options.configTemplate, 'configTemplate'),
-      serverUrl: required(options.serverUrl, 'serverUrl'),
-      initialPageId: options.initialPageId,
-      callbacks: options.callbacks,
-    })
-  }
-
-  return new ToolLoopExecutorBackend({
-    configTemplate: required(options.configTemplate, 'configTemplate'),
-    browser: options.browser ?? null,
-    callbacks: options.callbacks,
-  })
-}
-
-function required<T>(value: T | undefined, name: string): T {
-  if (value === undefined) throw new Error(`${name} is required`)
-  return value
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend.ts
@@ -1,144 +0,0 @@
-import { randomUUID } from 'node:crypto'
-import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
-import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
-import type { Browser } from '@browseros/server/browser'
-import { registry } from '@browseros/server/tools/registry'
-import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
-import type {
-  DelegationResult,
-  ExecutorBackend,
-  ExecutorCallbacks,
-} from '../../executor-backend'
-import { TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT } from './tool-loop-executor-prompt'
-
-export interface ToolLoopExecutorBackendOptions {
-  configTemplate: ResolvedAgentConfig
-  browser: Browser | null
-  callbacks?: ExecutorCallbacks
-}
-
-/** Executes delegated goals through the BrowserOS ToolLoopAgent. */
-export class ToolLoopExecutorBackend implements ExecutorBackend {
-  readonly kind = 'tool-loop'
-  private stepsUsed = 0
-  private currentUrl = ''
-
-  constructor(private readonly options: ToolLoopExecutorBackendOptions) {}
-
-  async execute(
-    instruction: string,
-    signal?: AbortSignal,
-  ): Promise<DelegationResult> {
-    const browser = this.options.browser
-    if (!browser) {
-      throw new Error('Browser instance is required for tool-loop executor')
-    }
-
-    const stepsAtStart = this.stepsUsed
-    const toolsUsed: string[] = []
-    let status: DelegationResult['status'] = 'done'
-    let resultText = ''
-
-    const conversationId = randomUUID()
-    const agentConfig: ResolvedAgentConfig = {
-      ...this.options.configTemplate,
-      conversationId,
-      userSystemPrompt: TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT,
-      evalMode: true,
-      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
-    }
-
-    const browserContext = await this.browserContext(browser)
-    let agent: AiSdkAgent | null = null
-
-    try {
-      agent = await AiSdkAgent.create({
-        resolvedConfig: agentConfig,
-        browser,
-        registry,
-        browserContext,
-      })
-
-      await agent.toolLoopAgent.generate({
-        prompt: instruction,
-        abortSignal: signal,
-
-        experimental_onToolCallStart: ({ toolCall }) => {
-          const input = toolCall.input as Record<string, unknown> | undefined
-          if (input && typeof input.url === 'string' && input.url.length > 0) {
-            this.currentUrl = input.url
-          }
-          this.options.callbacks?.onToolCallStart?.({
-            toolCallId: toolCall.toolCallId,
-            toolName: toolCall.toolName,
-            input: toolCall.input,
-          })
-        },
-
-        experimental_onToolCallFinish: async () => {
-          this.stepsUsed++
-          await this.options.callbacks?.onToolCallFinish?.()
-        },
-
-        onStepFinish: async ({ toolCalls, toolResults, text }) => {
-          if (toolCalls) {
-            for (const toolCall of toolCalls) {
-              if (!toolsUsed.includes(toolCall.toolName)) {
-                toolsUsed.push(toolCall.toolName)
-              }
-            }
-          }
-
-          if (text) resultText = text
-
-          await this.options.callbacks?.onStepFinish?.({
-            toolCalls,
-            toolResults,
-            text,
-          })
-        },
-      })
-    } catch {
-      status = signal?.aborted ? 'timeout' : 'blocked'
-    } finally {
-      if (agent) await agent.dispose().catch(() => {})
-    }
-
-    if (status === 'done' && signal?.aborted) {
-      status = 'timeout'
-    }
-
-    return {
-      observation: resultText || 'Execution completed with no actions taken.',
-      status,
-      url: this.currentUrl,
-      actionsPerformed: this.stepsUsed - stepsAtStart,
-      toolsUsed,
-    }
-  }
-
-  async close(): Promise<void> {
-    // No persistent resources; AiSdkAgent is disposed at the end of each execute() call.
-  }
-
-  getTotalSteps(): number {
-    return this.stepsUsed
-  }
-
-  private async browserContext(
-    browser: Browser,
-  ): Promise<BrowserContext | undefined> {
-    const pages = await browser.listPages()
-    const activePage = pages[0]
-    if (!activePage) return undefined
-
-    return {
-      activeTab: {
-        id: activePage.tabId,
-        pageId: activePage.pageId,
-        url: activePage.url,
-        title: activePage.title,
-      },
-    }
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-prompt.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-prompt.ts
@@ -1,21 +0,0 @@
-export const TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
-
-## Your Job
-1. Execute browser actions to achieve the given goal
-2. Stop as soon as the goal is accomplished -- do NOT perform extra actions
-3. Write a final observation describing the result
-
-## Final Response Format
-When done, your response MUST include:
-- What you accomplished (or what went wrong)
-- What the page currently shows: key headings, links, data, or content visible
-- The current URL from the address bar
-- If you got stuck, what is blocking progress
-
-## Rules
-- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
-- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
-- If the goal is to click something, confirm the result of the click.
-- If you cannot find what was asked for, say so clearly -- do not guess or improvise.
-- Prefer browser_navigate over browser_open_tab for going to URLs.
-- Do NOT call browser_group_tabs or other organizational tools.`
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
@@ -1,33 +0,0 @@
-import type { ExecutorResult } from '../orchestrator-executor/types'
-
-export type ExecutorBackendKind = 'tool-loop' | 'clado'
-export type DelegationResult = ExecutorResult
-
-export interface ToolCallInfo {
-  toolCallId: string
-  toolName: string
-  input: unknown
-}
-
-export interface ToolResultInfo {
-  toolCallId: string
-  toolName: string
-  output: unknown
-}
-
-export interface ExecutorCallbacks {
-  onToolCallStart?: (toolCall: ToolCallInfo) => void
-  onToolCallFinish?: () => Promise<void>
-  onStepFinish?: (step: {
-    toolCalls?: ReadonlyArray<ToolCallInfo>
-    toolResults?: ReadonlyArray<ToolResultInfo>
-    text?: string
-  }) => Promise<void>
-}
-
-export interface ExecutorBackend {
-  readonly kind: ExecutorBackendKind
-  execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
-  close(): Promise<void>
-  getTotalSteps(): number
-}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-action-executor.ts
@@ -1,67 +1,121 @@
 import { randomUUID } from 'node:crypto'
-import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
-import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
-import { sleep } from '../../../../utils/sleep'
-import type {
-  ExecutorConfig,
-  ExecutorResult,
-} from '../../../orchestrator-executor/types'
-import type { ExecutorCallbacks } from '../../executor-backend'
 import {
-  extractCladoThinking,
-  formatCladoHistory,
-  getCladoActionSignature,
-  parseCladoActions,
-  summarizeCladoPrediction,
-} from './clado-actions'
-import {
-  normalizeCladoDirection,
-  normalizeCladoPressKey,
-  normalizeCladoScrollAmount,
-  prepareCladoToolArgs,
-  resolveCladoPoint,
-} from './clado-browser-driver'
-import { CladoActionClient } from './clado-client'
-import {
-  CLADO_ACTION_PROVIDER,
-  type CladoAction,
-  type CladoActionPoint,
-  type CladoActionResponse,
-  type CladoViewport,
-  isCladoActionProvider,
-} from './types'
+  CLADO_REQUEST_TIMEOUT_MS,
+  MAX_ACTIONS_PER_DELEGATION,
+} from '../../constants'
+import { McpClient, type McpToolResult } from '../../utils/mcp-client'
+import { sleep } from '../../utils/sleep'
+import type { ExecutorCallbacks } from './executor'
+import type { ExecutorConfig, ExecutorResult } from './types'
+
+const CLADO_ACTION_PROVIDER = 'clado-action'
+const PAGE_SCOPED_TOOLS = new Set<string>([
+  'take_screenshot',
+  'evaluate_script',
+  'click',
+  'click_at',
+  'hover',
+  'hover_at',
+  'clear',
+  'fill',
+  'press_key',
+  'type_at',
+  'drag',
+  'drag_at',
+  'scroll',
+  'handle_dialog',
+  'select_option',
+  'navigate_page',
+  'close_page',
+  'wait_for',
+])
+
+interface CladoActionResponse {
+  action?: string | null
+  x?: number
+  y?: number
+  text?: string
+  key?: string
+  direction?: string
+  startX?: number
+  startY?: number
+  endX?: number
+  endY?: number
+  amount?: number
+  time?: number
+  final_answer?: string | null
+  inference_time_seconds?: number
+  raw_response?: string
+  thinking?: string | null
+  parse_error?: string | null
+}
+
+interface Viewport {
+  width: number
+  height: number
+}
+
+interface CladoAction {
+  action: string
+  x?: number
+  y?: number
+  text?: string
+  key?: string
+  direction?: string
+  startX?: number
+  startY?: number
+  endX?: number
+  endY?: number
+  amount?: number
+  time?: number
+  final_answer?: string
+}
+
+type RawActionPayload = Partial<Omit<CladoAction, 'final_answer'>> & {
+  final_answer?: string | null
+}

 const MAX_CONSECUTIVE_PARSE_FAILURES = 3

+interface ActionPoint {
+  x: number
+  y: number
+}
+
 function asErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error)
 }

+function clampNormalized(value: number): number {
+  return Math.min(999, Math.max(0, Math.round(value)))
+}
+
+function isCladoProvider(provider: string): boolean {
+  return provider === CLADO_ACTION_PROVIDER
+}
+
 export class CladoActionExecutor {
  private readonly mcpClient: McpClient
-  private readonly cladoClient: CladoActionClient
  private readonly pageId: number
  private callbacks: ExecutorCallbacks = {}
  private stepsUsed = 0
-  private viewport: CladoViewport | null = null
-  private lastPoint: CladoActionPoint | null = null
+  private viewport: Viewport | null = null
+  private lastPoint: ActionPoint | null = null
  private currentUrl = ''

  constructor(
-    config: ExecutorConfig,
+    private readonly config: ExecutorConfig,
    serverUrl: string,
+    readonly _windowId?: number,
+    readonly _tabId?: number,
    initialPageId?: number,
  ) {
-    if (!isCladoActionProvider(config.provider)) {
+    if (!isCladoProvider(config.provider)) {
      throw new Error(
        `CladoActionExecutor requires provider="${CLADO_ACTION_PROVIDER}"`,
      )
    }
    this.mcpClient = new McpClient(`${serverUrl}/mcp`)
-    this.cladoClient = new CladoActionClient({
-      baseUrl: config.baseUrl,
-      apiKey: config.apiKey,
-    })
    this.pageId = initialPageId ?? 1
  }

@@ -111,7 +165,7 @@ export class CladoActionExecutor {
        break
      }

-      const historyForPrediction = formatCladoHistory(actionHistory)
+      const historyForPrediction = this.formatHistory(actionHistory)
      const actionToolCallId = randomUUID()
      const predictionInput = {
        instruction,
@@ -133,7 +187,7 @@ export class CladoActionExecutor {
          signal,
        )
        predictionCalls++
-        const thinking = extractCladoThinking(prediction.raw_response)
+        const thinking = this.extractThinking(prediction.raw_response)
        if (thinking) {
          const previous = thinkingTrace[thinkingTrace.length - 1]
          if (previous !== thinking) {
@@ -163,7 +217,7 @@ export class CladoActionExecutor {
        break
      }

-      const predictedActions = parseCladoActions(prediction)
+      const predictedActions = this.parseActions(prediction)
      if (predictedActions.length === 0) {
        // Per Clado contract: HTTP 200 with action=null on parse failure.
        // Count as an invalid step so the model can self-correct on the
@@ -189,7 +243,7 @@ export class CladoActionExecutor {
              toolCallId: actionToolCallId,
              toolName: 'clado_action_predict',
              output: {
-                prediction: summarizeCladoPrediction(prediction),
+                prediction: this.summarizePrediction(prediction),
                parsedActions: [],
                parseError,
                consecutiveParseFailures,
@@ -231,7 +285,7 @@ export class CladoActionExecutor {
                toolCallId: actionToolCallId,
                toolName: 'clado_action_predict',
                output: {
-                  prediction: summarizeCladoPrediction(prediction),
+                  prediction: this.summarizePrediction(prediction),
                  parsedActions: predictedActions,
                  executed: executionNotes,
                },
@@ -272,7 +326,7 @@ export class CladoActionExecutor {
              toolCallId: actionToolCallId,
              toolName: 'clado_action_predict',
              output: {
-                prediction: summarizeCladoPrediction(prediction),
+                prediction: this.summarizePrediction(prediction),
                parsedActions: predictedActions,
                executed: executionNotes,
              },
@@ -324,12 +378,125 @@ export class CladoActionExecutor {
    actionHistory: CladoAction[],
    signal?: AbortSignal,
  ): Promise<CladoActionResponse> {
-    return this.cladoClient.requestActionPrediction({
-      instruction,
-      imageBase64,
-      actionHistory,
-      signal,
-    })
+    if (!this.config.baseUrl) {
+      throw new Error('executor.baseUrl must be set for clado-action provider')
+    }
+
+    const requestController = new AbortController()
+    const onAbort = () => requestController.abort()
+    signal?.addEventListener('abort', onAbort, { once: true })
+
+    const timeoutHandle = setTimeout(() => {
+      requestController.abort()
+    }, CLADO_REQUEST_TIMEOUT_MS)
+
+    try {
+      const headers: Record<string, string> = {
+        'Content-Type': 'application/json',
+      }
+      if (this.config.apiKey) {
+        headers.Authorization = `Bearer ${this.config.apiKey}`
+      }
+
+      const response = await fetch(this.config.baseUrl, {
+        method: 'POST',
+        headers,
+        body: JSON.stringify({
+          instruction,
+          image_base64: imageBase64,
+          history: this.formatHistory(actionHistory),
+        }),
+        signal: requestController.signal,
+      })
+
+      if (!response.ok) {
+        const body = await response.text()
+        throw new Error(
+          `HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
+        )
+      }
+
+      return (await response.json()) as CladoActionResponse
+    } finally {
+      clearTimeout(timeoutHandle)
+      signal?.removeEventListener('abort', onAbort)
+    }
+  }
+
+  private parseActions(prediction: CladoActionResponse): CladoAction[] {
+    const actionFromField =
+      typeof prediction.action === 'string' ? prediction.action : null
+
+    const rawActions = this.parseActionsFromRawResponse(prediction.raw_response)
+    const primaryFromRaw = rawActions[0] ?? null
+    const mergedPrimary = {
+      ...primaryFromRaw,
+      ...prediction,
+      action: actionFromField ?? primaryFromRaw?.action,
+    }
+
+    const normalized: CladoAction[] = []
+    const primary = this.normalizeActionPayload(mergedPrimary)
+    if (primary) normalized.push(primary)
+
+    for (const candidate of rawActions.slice(1)) {
+      const parsed = this.normalizeActionPayload(candidate)
+      if (!parsed) continue
+      const prev = normalized[normalized.length - 1]
+      if (
+        !prev ||
+        this.getActionSignature(prev) !== this.getActionSignature(parsed)
+      ) {
+        normalized.push(parsed)
+      }
+    }
+
+    return normalized
+  }
+
+  private normalizeActionPayload(
+    payload: RawActionPayload,
+  ): CladoAction | null {
+    if (!payload.action || typeof payload.action !== 'string') {
+      return null
+    }
+    return {
+      action: payload.action,
+      x: typeof payload.x === 'number' ? payload.x : undefined,
+      y: typeof payload.y === 'number' ? payload.y : undefined,
+      text: typeof payload.text === 'string' ? payload.text : undefined,
+      key: typeof payload.key === 'string' ? payload.key : undefined,
+      direction:
+        typeof payload.direction === 'string' ? payload.direction : undefined,
+      startX: typeof payload.startX === 'number' ? payload.startX : undefined,
+      startY: typeof payload.startY === 'number' ? payload.startY : undefined,
+      endX: typeof payload.endX === 'number' ? payload.endX : undefined,
+      endY: typeof payload.endY === 'number' ? payload.endY : undefined,
+      amount: typeof payload.amount === 'number' ? payload.amount : undefined,
+      time: typeof payload.time === 'number' ? payload.time : undefined,
+      final_answer:
+        typeof payload.final_answer === 'string'
+          ? payload.final_answer
+          : undefined,
+    }
+  }
+
+  private parseActionsFromRawResponse(
+    rawResponse: string | undefined,
+  ): RawActionPayload[] {
+    if (!rawResponse) return []
+    const matches = [
+      ...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
+    ]
+    const parsed: RawActionPayload[] = []
+    for (const match of matches) {
+      try {
+        parsed.push(JSON.parse(match[1]) as RawActionPayload)
+      } catch {
+        // ignore malformed answer blocks
+      }
+    }
+    return parsed
  }

  private async executeAction(
@@ -400,14 +567,14 @@ export class CladoActionExecutor {
      }

      case 'press_key': {
-        const key = normalizeCladoPressKey(action.key)
+        const key = this.normalizePressKey(action.key)
        await this.runTool('press_key', { key }, signal)
        return `Pressed key "${key}".`
      }

      case 'scroll': {
-        const direction = normalizeCladoDirection(action.direction)
-        const amountPx = normalizeCladoScrollAmount(action.amount)
+        const direction = this.normalizeDirection(action.direction)
+        const amountPx = this.normalizeScrollAmount(action.amount)
        const ticks = Math.max(1, Math.round(amountPx / 120))

        await this.runTool('scroll', { direction, amount: ticks }, signal)
@@ -478,7 +645,7 @@ export class CladoActionExecutor {
    return image.data
  }

-  private async getViewport(signal?: AbortSignal): Promise<CladoViewport> {
+  private async getViewport(signal?: AbortSignal): Promise<Viewport> {
    if (this.viewport) return this.viewport

    try {
@@ -509,9 +676,15 @@ export class CladoActionExecutor {
    normalizedX: number | undefined,
    normalizedY: number | undefined,
    signal?: AbortSignal,
-  ): Promise<CladoActionPoint> {
+  ): Promise<ActionPoint> {
    const viewport = await this.getViewport(signal)
-    return resolveCladoPoint(viewport, normalizedX, normalizedY)
+    const nx = clampNormalized(normalizedX ?? 500)
+    const ny = clampNormalized(normalizedY ?? 500)
+
+    return {
+      x: Math.round((nx / 1000) * viewport.width),
+      y: Math.round((ny / 1000) * viewport.height),
+    }
  }

  private async getCurrentUrl(signal?: AbortSignal): Promise<string> {
@@ -538,7 +711,7 @@ export class CladoActionExecutor {
      throw new Error('aborted')
    }

-    const toolArgs = prepareCladoToolArgs(toolName, args, this.pageId)
+    const toolArgs = this.prepareToolArgs(toolName, args)

    try {
      const raw = await this.mcpClient.callTool(toolName, toolArgs)
@@ -557,6 +730,207 @@ export class CladoActionExecutor {
    }
  }

+  private prepareToolArgs(
+    toolName: string,
+    args: Record<string, unknown>,
+  ): Record<string, unknown> {
+    const prepared: Record<string, unknown> = { ...args }
+
+    if (
+      toolName === 'evaluate_script' &&
+      typeof prepared.function === 'string' &&
+      prepared.expression === undefined
+    ) {
+      prepared.expression = this.toEvaluateExpression(prepared.function)
+      delete prepared.function
+    }
+
+    if (
+      toolName === 'click_at' &&
+      typeof prepared.dblClick === 'boolean' &&
+      prepared.clickCount === undefined
+    ) {
+      prepared.clickCount = prepared.dblClick ? 2 : 1
+      delete prepared.dblClick
+    }
+
+    // Page-scoped tools default to the fixed page ID unless `page` is given
+    if (PAGE_SCOPED_TOOLS.has(toolName) && typeof prepared.page !== 'number') {
+      prepared.page = this.pageId
+    }
+
+    return prepared
+  }
+
+  private toEvaluateExpression(rawFunction: unknown): string {
+    const source = String(rawFunction).trim()
+    if (source.startsWith('() =>') || source.startsWith('async () =>')) {
+      return `(${source})()`
+    }
+    if (source.startsWith('function')) {
+      return `(${source})()`
+    }
+    return source
+  }
+
+  private normalizePressKey(key: string | undefined): string {
+    const raw = (key ?? '').trim()
+    if (!raw) throw new Error('press_key action missing key field')
+
+    const map: Record<string, string> = {
+      'C-a': 'Control+A',
+      'C-c': 'Control+C',
+      'C-v': 'Control+V',
+      'C-x': 'Control+X',
+      'C-z': 'Control+Z',
+      'C-y': 'Control+Y',
+      'C-s': 'Control+S',
+      'C-t': 'Control+T',
+      'C-w': 'Control+W',
+      'C-h': 'Control+H',
+      'C-f': 'Control+F',
+      'C-+': 'Control++',
+      'C--': 'Control+-',
+      'C-tab': 'Control+Tab',
+      'C-S-tab': 'Control+Shift+Tab',
+      'C-S-n': 'Control+Shift+N',
+      'C-down': 'Control+ArrowDown',
+      // macOS Cmd shortcuts (Meta in CDP); M-f4 is the exception (Alt+F4).
+      'M-a': 'Meta+A',
+      'M-c': 'Meta+C',
+      'M-v': 'Meta+V',
+      'M-x': 'Meta+X',
+      'M-f4': 'Alt+F4',
+    }
+    return map[raw] ?? raw
+  }
+
+  private normalizeDirection(
+    direction: string | undefined,
+  ): 'up' | 'down' | 'left' | 'right' {
+    if (
+      direction === 'up' ||
+      direction === 'down' ||
+      direction === 'left' ||
+      direction === 'right'
+    ) {
+      return direction
+    }
+    return 'down'
+  }
+
+  private normalizeScrollAmount(amount: number | undefined): number {
+    if (typeof amount !== 'number') return 500
+    if (amount <= 0) return 100
+    const clamped = Math.min(amount, 1000)
+    return Math.max(100, Math.round((clamped / 1000) * 900))
+  }
+
+  private summarizePrediction(
+    prediction: CladoActionResponse,
+  ): Record<string, unknown> {
+    const preview =
+      typeof prediction.raw_response === 'string' &&
+      prediction.raw_response.length > 0
+        ? prediction.raw_response.slice(0, 240)
+        : undefined
+
+    return {
+      action: prediction.action,
+      x: prediction.x,
+      y: prediction.y,
+      text: prediction.text,
+      key: prediction.key,
+      direction: prediction.direction,
+      startX: prediction.startX,
+      startY: prediction.startY,
+      endX: prediction.endX,
+      endY: prediction.endY,
+      amount: prediction.amount,
+      time: prediction.time,
+      inference_time_seconds: prediction.inference_time_seconds,
+      raw_response_preview: preview,
+    }
+  }
+
+  private extractThinking(rawResponse: string | undefined): string | undefined {
+    if (!rawResponse) return undefined
+    const matches = [
+      ...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
+    ]
+    if (matches.length === 0) return undefined
+
+    const merged = matches
+      .map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
+      .filter((value) => value.length > 0)
+      .join(' ')
+
+    if (!merged) return undefined
+    return merged
+  }
+
+  private getActionSignature(action: CladoAction): string {
+    switch (action.action) {
+      case 'click':
+      case 'double_click':
+      case 'right_click':
+      case 'hover':
+        return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
+      case 'type':
+        return `${action.action}:${(action.text ?? '').slice(0, 16)}`
+      case 'press_key':
+        return `${action.action}:${action.key ?? 'key'}`
+      case 'scroll':
+        return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
+      case 'drag':
+        return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
+      case 'wait':
+        return `${action.action}:${action.time ?? 1}`
+      case 'end':
+        return action.final_answer
+          ? `end(${action.final_answer.slice(0, 32)})`
+          : 'end()'
+      case 'invalid':
+        return `invalid(${(action.text ?? '').slice(0, 40)})`
+      default:
+        return action.action
+    }
+  }
+
+  private formatHistory(actions: CladoAction[]): string {
+    if (actions.length === 0) return 'None'
+
+    const parts = actions.map((action) => {
+      switch (action.action) {
+        case 'click':
+        case 'double_click':
+        case 'right_click':
+        case 'hover':
+          return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
+        case 'type': {
+          const text = (action.text ?? '').replace(/'/g, "\\'")
+          return `type('${text}')`
+        }
+        case 'press_key':
+          return `press_key('${action.key ?? 'Enter'}')`
+        case 'scroll':
+          return `scroll(${action.direction ?? 'down'})`
+        case 'drag':
+          return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
+        case 'wait':
+          return `wait(${Math.round(action.time ?? 1)}s)`
+        case 'end':
+          return 'end()'
+        case 'invalid':
+          return 'invalid()'
+        default:
+          return action.action
+      }
+    })
+
+    return parts.join(' -> ')
+  }
+
  private buildObservation(params: {
    status: ExecutorResult['status']
    reason: string
@@ -572,7 +946,7 @@ export class CladoActionExecutor {
        : actions
            .slice(-5)
            .map(
-              (action, idx) => `${idx + 1}. ${getCladoActionSignature(action)}`,
+              (action, idx) => `${idx + 1}. ${this.getActionSignature(action)}`,
            )
            .join('\n')
    const thinkingSummary =
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/executor.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/executor.ts
@@ -0,0 +1,243 @@
+/**
+ * Executor - Wraps AiSdkAgent (direct CDP) or CladoActionExecutor (clado-action)
+ *
+ * The executor:
+ * - Receives goal-level instructions from orchestrator
+ * - Executes browser actions until the goal is accomplished
+ * - Returns observation to orchestrator (not full history)
+ */
+
+import { randomUUID } from 'node:crypto'
+import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
+import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
+import type { Browser } from '@browseros/server/browser'
+import { registry } from '@browseros/server/tools/registry'
+import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
+import { CladoActionExecutor } from './clado-action-executor'
+import type { ExecutorResult } from './types'
+
+const EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
+
+## Your Job
+1. Execute browser actions to achieve the given goal
+2. Stop as soon as the goal is accomplished — do NOT perform extra actions
+3. Write a final observation describing the result
+
+## Final Response Format
+When done, your response MUST include:
+- What you accomplished (or what went wrong)
+- What the page currently shows: key headings, links, data, or content visible
+- The current URL from the address bar
+- If you got stuck, what is blocking progress
+
+## Rules
+- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
+- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
+- If the goal is to click something, confirm the result of the click.
+- If you cannot find what was asked for, say so clearly — do not guess or improvise.
+- Prefer browser_navigate over browser_open_tab for going to URLs.
+- Do NOT call browser_group_tabs or other organizational tools.`
+
+export interface ToolCallInfo {
+  toolCallId: string
+  toolName: string
+  input: unknown
+}
+
+export interface ToolResultInfo {
+  toolCallId: string
+  toolName: string
+  output: unknown
+}
+
+export interface ExecutorCallbacks {
+  onToolCallStart?: (toolCall: ToolCallInfo) => void
+  onToolCallFinish?: () => Promise<void>
+  onStepFinish?: (step: {
+    toolCalls?: ReadonlyArray<ToolCallInfo>
+    toolResults?: ReadonlyArray<ToolResultInfo>
+    text?: string
+  }) => Promise<void>
+}
+
+export class Executor {
+  private cladoExecutor: CladoActionExecutor | null = null
+  private stepsUsed = 0
+  private currentUrl = ''
+  private configTemplate: ResolvedAgentConfig
+  private isCladoAction: boolean
+  private browser: Browser | null
+  private serverUrl: string
+  private windowId?: number
+  private tabId?: number
+  private initialPageId?: number
+  private callbacks: ExecutorCallbacks
+
+  constructor(
+    configTemplate: ResolvedAgentConfig,
+    browser: Browser | null,
+    serverUrl: string,
+    options?: {
+      isCladoAction?: boolean
+      windowId?: number
+      tabId?: number
+      initialPageId?: number
+      callbacks?: ExecutorCallbacks
+    },
+  ) {
+    this.configTemplate = configTemplate
+    this.isCladoAction = options?.isCladoAction ?? false
+    this.browser = browser
+    this.serverUrl = serverUrl
+    this.windowId = options?.windowId
+    this.tabId = options?.tabId
+    this.initialPageId = options?.initialPageId
+    this.callbacks = options?.callbacks ?? {}
+  }
+
+  async execute(
+    instruction: string,
+    signal?: AbortSignal,
+  ): Promise<ExecutorResult> {
+    if (this.isCladoAction) {
+      if (!this.cladoExecutor) {
+        this.cladoExecutor = new CladoActionExecutor(
+          {
+            provider: this.configTemplate.provider,
+            model: this.configTemplate.model,
+            apiKey: this.configTemplate.apiKey ?? '',
+            baseUrl: this.configTemplate.baseUrl,
+          },
+          this.serverUrl,
+          this.windowId,
+          this.tabId,
+          this.initialPageId,
+        )
+        this.cladoExecutor.setCallbacks(this.callbacks)
+      }
+
+      const result = await this.cladoExecutor.execute(instruction, signal)
+      this.stepsUsed = this.cladoExecutor.getTotalSteps()
+      this.currentUrl = result.url || this.currentUrl
+      return result
+    }
+
+    if (!this.browser) {
+      throw new Error('Browser instance is required for standard executor path')
+    }
+
+    const stepsAtStart = this.stepsUsed
+    const toolsUsed: string[] = []
+    let status: 'done' | 'blocked' | 'timeout' = 'done'
+    let resultText = ''
+
+    const conversationId = randomUUID()
+    const agentConfig: ResolvedAgentConfig = {
+      ...this.configTemplate,
+      conversationId,
+      userSystemPrompt: EXECUTOR_SYSTEM_PROMPT,
+      evalMode: true,
+      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
+    }
+
+    // Build browser context so executor agent knows the correct page ID
+    let browserContext: BrowserContext | undefined
+    if (this.browser) {
+      const pages = await this.browser.listPages()
+      const activePage = pages[0]
+      if (activePage) {
+        browserContext = {
+          activeTab: {
+            id: activePage.tabId,
+            pageId: activePage.pageId,
+            url: activePage.url,
+            title: activePage.title,
+          },
+        }
+      }
+    }
+
+    let agent: AiSdkAgent | null = null
+
+    try {
+      agent = await AiSdkAgent.create({
+        resolvedConfig: agentConfig,
+        browser: this.browser,
+        registry,
+        browserContext,
+      })
+
+      await agent.toolLoopAgent.generate({
+        prompt: instruction,
+        abortSignal: signal,
+
+        experimental_onToolCallStart: ({ toolCall }) => {
+          const input = toolCall.input as Record<string, unknown> | undefined
+          if (input && typeof input.url === 'string' && input.url.length > 0) {
+            this.currentUrl = input.url
+          }
+          this.callbacks.onToolCallStart?.({
+            toolCallId: toolCall.toolCallId,
+            toolName: toolCall.toolName,
+            input: toolCall.input,
+          })
+        },
+
+        experimental_onToolCallFinish: async () => {
+          this.stepsUsed++
+          await this.callbacks.onToolCallFinish?.()
+        },
+
+        onStepFinish: async ({ toolCalls, toolResults, text }) => {
+          if (toolCalls) {
+            for (const tc of toolCalls) {
+              if (!toolsUsed.includes(tc.toolName)) {
+                toolsUsed.push(tc.toolName)
+              }
+            }
+          }
+
+          if (text) {
+            resultText = text
+          }
+
+          await this.callbacks.onStepFinish?.({ toolCalls, toolResults, text })
+        },
+      })
+    } catch {
+      if (signal?.aborted) {
+        status = 'timeout'
+      } else {
+        status = 'blocked'
+      }
+    } finally {
+      if (agent) await agent.dispose().catch(() => {})
+    }
+
+    if (status === 'done' && signal?.aborted) {
+      status = 'timeout'
+    }
+
+    const observation =
+      resultText || 'Execution completed with no actions taken.'
+
+    return {
+      observation,
+      status,
+      url: this.currentUrl,
+      actionsPerformed: this.stepsUsed - stepsAtStart,
+      toolsUsed,
+    }
+  }
+
+  async close(): Promise<void> {
+    await this.cladoExecutor?.close()
+  }
+
+  getTotalSteps(): number {
+    if (this.isCladoAction) {
+      return this.cladoExecutor?.getTotalSteps() ?? 0
+    }
+    return this.stepsUsed
+  }
+}
--- a/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/index.ts
+++ b/packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/index.ts
@@ -24,16 +24,15 @@ import {
  resolveProviderConfig,
 } from '../../utils/resolve-provider-config'
 import { withEvalTimeout } from '../../utils/with-eval-timeout'
-import { isCladoActionProvider } from '../orchestrated/backends/clado/types'
-import { createExecutorBackend } from '../orchestrated/backends/create-executor-backend'
-import type { ExecutorCallbacks } from '../orchestrated/executor-backend'
 import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
+import { Executor, type ExecutorCallbacks } from './executor'
 import { OrchestratorAgent } from './orchestrator-agent'
 import type { ExecutorFactory, ExecutorResult } from './types'

 interface ResolvedConfigs {
  orchestratorConfig: ResolvedAgentConfig & { maxTurns?: number }
  executorConfig: ResolvedAgentConfig
+  isCladoAction: boolean
 }

 function toResolvedAgentConfig(
@@ -68,10 +67,7 @@ async function resolveAgentConfig(
  if (!executorModel) {
    throw new Error('executor.model is required in config')
  }
-  if (
-    isCladoActionProvider(config.executor.provider) &&
-    !config.executor.baseUrl
-  ) {
+  if (config.executor.provider === 'clado-action' && !config.executor.baseUrl) {
    throw new Error(
      'executor.baseUrl is required in config for clado-action provider',
    )
@@ -79,8 +75,10 @@ async function resolveAgentConfig(

  const resolvedOrchestrator = await resolveProviderConfig(config.orchestrator)

+  const isCladoAction = config.executor.provider === 'clado-action'
+
  let executorConfig: ResolvedAgentConfig
-  if (isCladoActionProvider(config.executor.provider)) {
+  if (isCladoAction) {
    executorConfig = {
      conversationId: crypto.randomUUID(),
      provider: config.executor.provider as ResolvedAgentConfig['provider'],
@@ -109,7 +107,7 @@ async function resolveAgentConfig(
    maxTurns: config.orchestrator.maxTurns,
  }

-  return { orchestratorConfig, executorConfig }
+  return { orchestratorConfig, executorConfig, isCladoAction }
 }

 export class OrchestratorExecutorEvaluator implements AgentEvaluator {
@@ -129,7 +127,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
    }

    const agentConfig = config.agent as OrchestratorExecutorConfig
-    const { orchestratorConfig, executorConfig } =
+    const { orchestratorConfig, executorConfig, isCladoAction } =
      await resolveAgentConfig(agentConfig)

    // Connect to Chrome via CDP — same per-worker offset used by app-manager.
@@ -237,12 +235,12 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
        await capture.messageLogger.logStreamEvent(delegateInputEvent)
        capture.emitEvent(task.query_id, delegateInputEvent)

-        const executor = createExecutorBackend({
-          configTemplate: executorConfig,
+        const executor = new Executor(
+          executorConfig,
          browser,
-          serverUrl: config.browseros.server_url,
-          callbacks,
-        })
+          config.browseros.server_url,
+          { isCladoAction, callbacks },
+        )
        let result: ExecutorResult
        try {
          result = await executor.execute(instruction, signal)
@@ -331,5 +329,6 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
  }
 }

+export { Executor } from './executor'
 export { OrchestratorAgent } from './orchestrator-agent'
 export * from './types'
--- a/packages/browseros-agent/apps/eval/src/capture/trajectory-saver.ts
+++ b/packages/browseros-agent/apps/eval/src/capture/trajectory-saver.ts
@@ -57,20 +57,6 @@ export class TrajectorySaver {
    )
  }

-  async saveAttempt(attempt: Record<string, unknown>): Promise<void> {
-    await writeFile(
-      join(this.outputDir, 'attempt.json'),
-      JSON.stringify(attempt, null, 2),
-    )
-  }
-
-  async saveGrades(graderResults: Record<string, GraderResult>): Promise<void> {
-    await writeFile(
-      join(this.outputDir, 'grades.json'),
-      JSON.stringify(graderResults, null, 2),
-    )
-  }
-
  async loadMetadata(): Promise<TaskMetadata> {
    const content = await readFile(
      join(this.outputDir, 'metadata.json'),
@@ -84,7 +70,6 @@ export class TrajectorySaver {
  ): Promise<void> {
    const metadata = await this.loadMetadata()
    metadata.grader_results = graderResults
-    await this.saveGrades(graderResults)
    await this.saveMetadata(metadata)
  }

--- a/packages/browseros-agent/apps/eval/src/cli/args.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/args.ts
@@ -1,170 +0,0 @@
-import { parseArgs } from 'node:util'
-
-export type PublishTarget = 'r2'
-
-export interface LegacyCliArgs {
-  command: 'legacy'
-  configPath?: string
-  help?: boolean
-}
-
-export interface SuiteCliArgs {
-  command: 'suite'
-  configPath?: string
-  suitePath?: string
-  variantId?: string
-  provider?: string
-  model?: string
-  apiKey?: string
-  baseUrl?: string
-  publishTarget?: PublishTarget
-}
-
-export interface RunCliArgs
-  extends Omit<SuiteCliArgs, 'command' | 'publishTarget'> {
-  command: 'run'
-}
-
-export interface GradeCliArgs {
-  command: 'grade'
-  runDir: string
-}
-
-export interface PublishCliArgs {
-  command: 'publish'
-  runDir: string
-  target: PublishTarget
-}
-
-export type EvalCliArgs =
-  | LegacyCliArgs
-  | SuiteCliArgs
-  | RunCliArgs
-  | GradeCliArgs
-  | PublishCliArgs
-
-const COMMANDS = new Set(['suite', 'run', 'grade', 'publish'])
-
-function stringValue(value: string | boolean | undefined): string | undefined {
-  return typeof value === 'string' && value.length > 0 ? value : undefined
-}
-
-function publishTarget(value: string | undefined): PublishTarget | undefined {
-  if (value === undefined) return undefined
-  if (value === 'r2') return 'r2'
-  throw new Error(`Unsupported publish target: ${value}`)
-}
-
-function requireOne(
-  command: string,
-  configPath: string | undefined,
-  suitePath: string | undefined,
-): void {
-  if (!configPath && !suitePath) {
-    throw new Error(`${command} requires --config or --suite`)
-  }
-  if (configPath && suitePath) {
-    throw new Error(`${command} accepts either --config or --suite, not both`)
-  }
-}
-
-function parseSuiteLikeArgs(
-  command: 'suite' | 'run',
-  argv: string[],
-): SuiteCliArgs | RunCliArgs {
-  const { values } = parseArgs({
-    args: argv,
-    options: {
-      config: { type: 'string' },
-      suite: { type: 'string' },
-      variant: { type: 'string' },
-      provider: { type: 'string' },
-      model: { type: 'string' },
-      'api-key': { type: 'string' },
-      'base-url': { type: 'string' },
-      publish: { type: 'string' },
-    },
-  })
-
-  const configPath = stringValue(values.config)
-  const suitePath = stringValue(values.suite)
-  requireOne(command, configPath, suitePath)
-
-  const parsed: SuiteCliArgs | RunCliArgs =
-    command === 'suite' ? { command: 'suite' } : { command: 'run' }
-  if (configPath) parsed.configPath = configPath
-  if (suitePath) parsed.suitePath = suitePath
-  const variantId = stringValue(values.variant)
-  if (variantId) parsed.variantId = variantId
-  const provider = stringValue(values.provider)
-  if (provider) parsed.provider = provider
-  const model = stringValue(values.model)
-  if (model) parsed.model = model
-  const apiKey = stringValue(values['api-key'])
-  if (apiKey) parsed.apiKey = apiKey
-  const baseUrl = stringValue(values['base-url'])
-  if (baseUrl) parsed.baseUrl = baseUrl
-
-  if (command === 'suite') {
-    const target = publishTarget(stringValue(values.publish))
-    if (target) {
-      const suiteArgs = parsed as SuiteCliArgs
-      suiteArgs.publishTarget = target
-    }
-  }
-
-  return parsed
-}
-
-function parseLegacyArgs(argv: string[]): LegacyCliArgs {
-  const { values } = parseArgs({
-    args: argv,
-    options: {
-      config: { type: 'string', short: 'c' },
-      help: { type: 'boolean', short: 'h' },
-    },
-  })
-
-  const parsed: LegacyCliArgs = { command: 'legacy' }
-  const configPath = stringValue(values.config)
-  if (configPath) parsed.configPath = configPath
-  if (values.help) parsed.help = true
-  return parsed
-}
-
-/** Parses the eval CLI command without running browser or publishing side effects. */
-export function parseEvalCliArgs(argv: string[]): EvalCliArgs {
-  const [command, ...rest] = argv
-  if (!COMMANDS.has(command ?? '')) {
-    return parseLegacyArgs(argv)
-  }
-
-  switch (command) {
-    case 'suite':
-      return parseSuiteLikeArgs('suite', rest)
-    case 'run':
-      return parseSuiteLikeArgs('run', rest)
-    case 'grade': {
-      const { values } = parseArgs({
-        args: rest,
-        options: { run: { type: 'string' } },
-      })
-      const runDir = stringValue(values.run)
-      if (!runDir) throw new Error('grade requires --run')
-      return { command: 'grade', runDir }
-    }
-    case 'publish': {
-      const { values } = parseArgs({
-        args: rest,
-        options: { run: { type: 'string' }, target: { type: 'string' } },
-      })
-      const runDir = stringValue(values.run)
-      if (!runDir) throw new Error('publish requires --run')
-      const target = publishTarget(stringValue(values.target))
-      if (!target) throw new Error('publish requires --target')
-      return { command: 'publish', runDir, target }
-    }
-    default:
-      return parseLegacyArgs(argv)
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/cli/commands/grade.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/commands/grade.ts
@@ -1,84 +0,0 @@
-import { readdir, readFile, stat } from 'node:fs/promises'
-import { join } from 'node:path'
-import { TrajectorySaver } from '../../capture/trajectory-saver'
-import { runGraders } from '../../grading/grader-runner'
-import { type Message, MessageSchema, TaskMetadataSchema } from '../../types'
-import type { GradeCliArgs } from '../args'
-
-async function loadMessages(taskDir: string): Promise<Message[]> {
-  const content = await readFile(
-    join(taskDir, 'messages.jsonl'),
-    'utf-8',
-  ).catch(() => '')
-  return content
-    .split('\n')
-    .filter((line) => line.trim().length > 0)
-    .map((line) => MessageSchema.parse(JSON.parse(line)))
-}
-
-async function findTaskDirs(runDir: string): Promise<string[]> {
-  const entries = await readdir(runDir, { withFileTypes: true })
-  const taskDirs: string[] = []
-  for (const entry of entries) {
-    if (!entry.isDirectory()) continue
-    const taskDir = join(runDir, entry.name)
-    const metadata = await stat(join(taskDir, 'metadata.json')).catch(
-      () => null,
-    )
-    if (metadata?.isFile()) taskDirs.push(taskDir)
-  }
-  return taskDirs
-}
-
-/** Re-runs graders for task artifacts that already contain metadata and messages. */
-export async function runGradeCommand(args: GradeCliArgs): Promise<void> {
-  const runStat = await stat(args.runDir).catch(() => null)
-  if (!runStat?.isDirectory()) {
-    throw new Error(`Not a run directory: ${args.runDir}`)
-  }
-
-  const taskDirs = await findTaskDirs(args.runDir)
-  if (taskDirs.length === 0) {
-    throw new Error(`No task metadata found under ${args.runDir}`)
-  }
-
-  let graded = 0
-  for (const taskDir of taskDirs) {
-    const metadata = TaskMetadataSchema.parse(
-      JSON.parse(await readFile(join(taskDir, 'metadata.json'), 'utf-8')),
-    )
-    const graderNames = Object.keys(metadata.grader_results ?? {})
-    if (graderNames.length === 0) {
-      console.warn(`Skipping ${metadata.query_id}: no existing grader names`)
-      continue
-    }
-
-    const messages = await loadMessages(taskDir)
-    const graderResults = await runGraders(graderNames, {
-      task: {
-        query_id: metadata.query_id,
-        query: metadata.query,
-        dataset: metadata.dataset,
-      },
-      messages,
-      screenshotCount: metadata.screenshot_count ?? metadata.total_steps,
-      finalAnswer: metadata.final_answer,
-      taskArtifactDir: taskDir,
-      outputDir: taskDir,
-      mcpUrl: `${process.env.BROWSEROS_SERVER_URL || 'http://127.0.0.1:9110'}/mcp`,
-    })
-
-    await new TrajectorySaver(
-      args.runDir,
-      metadata.query_id,
-    ).updateGraderResults(graderResults)
-    graded++
-  }
-
-  if (graded === 0) {
-    throw new Error(
-      `No tasks with existing grader names found under ${args.runDir}`,
-    )
-  }
-  console.log(`Re-graded ${graded} task(s) in ${args.runDir}`)
-}
--- a/packages/browseros-agent/apps/eval/src/cli/commands/publish.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/commands/publish.ts
@@ -1,25 +0,0 @@
-import { publishPathToR2 } from '../../publishing/r2-publisher'
-import type { PublishCliArgs, PublishTarget } from '../args'
-
-export interface PublishRunOptions {
-  runDir: string
-  target: PublishTarget
-}
-
-/** Publishes run artifacts through the R2 viewer upload path. */
-export async function publishRun(options: PublishRunOptions): Promise<void> {
-  if (options.target !== 'r2') {
-    throw new Error(`Unsupported publish target: ${options.target}`)
-  }
-  const result = await publishPathToR2(options.runDir)
-  for (const run of result.uploadedRuns) {
-    console.log(run.viewerUrl)
-  }
-  for (const runId of result.skippedRuns) {
-    console.log(`${runId}: already uploaded, skipping`)
-  }
-}
-
-export async function runPublishCommand(args: PublishCliArgs): Promise<void> {
-  await publishRun({ runDir: args.runDir, target: args.target })
-}
--- a/packages/browseros-agent/apps/eval/src/cli/commands/run.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/commands/run.ts
@@ -1,21 +0,0 @@
-import type { RunCliArgs } from '../args'
-import { runSuiteCommand, type SuiteCommandDeps } from './suite'
-
-/** Executes tasks from a config or suite without publishing artifacts. */
-export async function runRunCommand(
-  args: RunCliArgs,
-  deps: SuiteCommandDeps = {},
-): Promise<void> {
-  await runSuiteCommand(
-    {
-      configPath: args.configPath,
-      suitePath: args.suitePath,
-      variantId: args.variantId,
-      provider: args.provider,
-      model: args.model,
-      apiKey: args.apiKey,
-      baseUrl: args.baseUrl,
-    },
-    deps,
-  )
-}
--- a/packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
@@ -1,187 +0,0 @@
-import type { RunEvalOptions, RunEvalResult } from '../../runner/types'
-import { runEval as defaultRunEval } from '../../runs/eval-runner'
-import {
-  type AdaptedEvalConfig,
-  adaptEvalConfigFile,
-} from '../../suites/config-adapter'
-import { loadSuite } from '../../suites/load-suite'
-import { type EvalVariant, resolveVariant } from '../../suites/resolve-variant'
-import type { EvalSuite } from '../../suites/schema'
-import { type EvalConfig, EvalConfigSchema } from '../../types'
-import type { PublishTarget } from '../args'
-
-type Env = Record<string, string | undefined>
-
-export interface SuiteCommandOptions {
-  configPath?: string
-  suitePath?: string
-  variantId?: string
-  provider?: string
-  model?: string
-  apiKey?: string
-  baseUrl?: string
-  publishTarget?: PublishTarget
-  env?: Env
-}
-
-export type ResolvedSuiteCommand =
-  | (AdaptedEvalConfig & { kind: 'config'; datasetPath?: undefined })
-  | {
-      kind: 'suite'
-      suitePath: string
-      suite: EvalSuite
-      variant: EvalVariant
-      datasetPath: string
-      evalConfig: EvalConfig
-    }
-
-export interface SuiteCommandDeps {
-  runEval?: (options: RunEvalOptions) => Promise<RunEvalResult | undefined>
-  publishRun?: (options: {
-    runDir: string
-    target: PublishTarget
-  }) => Promise<void>
-}
-
-function ensureRunnableSuite(suite: EvalSuite): void {
-  if (!suite.browseros) {
-    throw new Error('suite browseros config is required to run suite commands')
-  }
-}
-
-function suiteToEvalConfig(
-  suite: EvalSuite,
-  datasetPath: string,
-  variant: EvalVariant,
-  env: Env,
-): EvalConfig {
-  ensureRunnableSuite(suite)
-
-  const base = {
-    dataset: datasetPath,
-    num_workers: suite.workers,
-    restart_server_per_task: suite.restartBrowserPerTask,
-    browseros: suite.browseros,
-    graders: suite.graders,
-    timeout_ms: suite.timeoutMs,
-    captcha: suite.captcha,
-  }
-
-  if (suite.agent.type === 'single' || suite.agent.type === 'tool-loop') {
-    // The legacy runner names the BrowserOS tool-loop agent "single".
-    return EvalConfigSchema.parse({
-      ...base,
-      agent: {
-        type: 'single',
-        provider: variant.agent.provider,
-        model: variant.agent.model,
-        apiKey: variant.agent.apiKey,
-        baseUrl: variant.agent.baseUrl,
-        supportsImages: variant.agent.supportsImages,
-      },
-    })
-  }
-
-  const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
-  const executor =
-    executorBackend === 'clado'
-      ? {
-          provider: 'clado-action' as const,
-          model:
-            env.EVAL_EXECUTOR_MODEL ?? env.CLADO_ACTION_MODEL ?? 'clado-action',
-          apiKey: env.EVAL_EXECUTOR_API_KEY ?? env.CLADO_ACTION_API_KEY ?? '',
-          baseUrl:
-            env.EVAL_EXECUTOR_BASE_URL ??
-            env.CLADO_ACTION_BASE_URL ??
-            env.CLADO_ACTION_URL,
-        }
-      : {
-          provider: variant.agent.provider,
-          model: variant.agent.model,
-          apiKey: variant.agent.apiKey,
-          baseUrl: variant.agent.baseUrl,
-        }
-
-  return EvalConfigSchema.parse({
-    ...base,
-    agent: {
-      type: 'orchestrator-executor',
-      orchestrator: {
-        provider: variant.agent.provider,
-        model: variant.agent.model,
-        apiKey: variant.agent.apiKey,
-        baseUrl: variant.agent.baseUrl,
-      },
-      executor,
-    },
-  })
-}
-
-/** Resolves config-backed or suite-backed CLI input into the run shape used by the runner. */
-export async function resolveSuiteCommand(
-  options: SuiteCommandOptions,
-): Promise<ResolvedSuiteCommand> {
-  const env = options.env ?? process.env
-  if (options.configPath) {
-    return {
-      kind: 'config',
-      ...(await adaptEvalConfigFile(options.configPath, { env })),
-    }
-  }
-  if (!options.suitePath) {
-    throw new Error('suite requires --config or --suite')
-  }
-
-  const loaded = await loadSuite(options.suitePath)
-  const variant = resolveVariant({
-    variantId: options.variantId,
-    provider: options.provider,
-    model: options.model,
-    apiKey: options.apiKey,
-    baseUrl: options.baseUrl,
-    env,
-  })
-
-  return {
-    kind: 'suite',
-    suitePath: loaded.suitePath,
-    suite: loaded.suite,
-    variant,
-    datasetPath: loaded.datasetPath,
-    evalConfig: suiteToEvalConfig(
-      loaded.suite,
-      loaded.datasetPath,
-      variant,
-      env,
-    ),
-  }
-}
-
-/** Runs the full suite loop: resolve input, execute tasks, then optionally publish the run. */
-export async function runSuiteCommand(
-  options: SuiteCommandOptions,
-  deps: SuiteCommandDeps = {},
-): Promise<void> {
-  const runEval = deps.runEval ?? defaultRunEval
-  const resolved = await resolveSuiteCommand(options)
-  const runOptions: RunEvalOptions =
-    resolved.kind === 'config'
-      ? { configPath: resolved.configPath }
-      : {
-          configPath: resolved.suitePath,
-          dataPath: resolved.datasetPath,
-          config: resolved.evalConfig,
-        }
-
-  const result = await runEval(runOptions)
-  if (!options.publishTarget) return
-
-  const outputDir = result?.outputDir
-  if (!outputDir) {
-    throw new Error('publish requested but runner did not return an outputDir')
-  }
-  if (!deps.publishRun) {
-    throw new Error('publish requested before the publisher is configured')
-  }
-  await deps.publishRun({ runDir: outputDir, target: options.publishTarget })
-}
--- a/packages/browseros-agent/apps/eval/src/cli/index.ts
+++ b/packages/browseros-agent/apps/eval/src/cli/index.ts
@@ -1,70 +0,0 @@
-import { startDashboard } from '../dashboard/server'
-import { runEval } from '../runs/eval-runner'
-import { type EvalCliArgs, parseEvalCliArgs } from './args'
-import { runGradeCommand } from './commands/grade'
-import { publishRun, runPublishCommand } from './commands/publish'
-import { runRunCommand } from './commands/run'
-import { runSuiteCommand } from './commands/suite'
-
-export function usage(): string {
-  return `
-BrowserOS Eval
-
-Usage:
-  bun run eval suite --config <config.json> [--publish r2]
-  bun run eval suite --suite <suite.json> --variant <id> [--publish r2]
-  bun run eval run --config <config.json>
-  bun run eval run --suite <suite.json> --variant <id>
-  bun run eval grade --run <results/run-dir>
-  bun run eval publish --run <results/run-dir> --target r2
-  bun run eval -c <config.json>
-`
-}
-
-async function runLegacyCommand(args: EvalCliArgs): Promise<void> {
-  if (args.command !== 'legacy') return
-  if (args.help) {
-    console.log(usage())
-    return
-  }
-  if (args.configPath) {
-    await runEval({ configPath: args.configPath })
-    return
-  }
-
-  startDashboard({
-    tasks: [],
-    configName: '',
-    agentType: '',
-    outputDir: '',
-    configMode: true,
-  })
-  console.log(
-    'Dashboard running at http://localhost:9900 — configure and run from the UI',
-  )
-  await new Promise(() => {})
-}
-
-/** Dispatches the eval CLI while preserving the old config/dashboard entry points. */
-export async function runCli(
-  argv: string[] = Bun.argv.slice(2),
-): Promise<void> {
-  const args = parseEvalCliArgs(argv)
-  switch (args.command) {
-    case 'legacy':
-      await runLegacyCommand(args)
-      break
-    case 'suite':
-      await runSuiteCommand(args, { publishRun })
-      break
-    case 'run':
-      await runRunCommand(args)
-      break
-    case 'grade':
-      await runGradeCommand(args)
-      break
-    case 'publish':
-      await runPublishCommand(args)
-      break
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/dashboard/server.ts
+++ b/packages/browseros-agent/apps/eval/src/dashboard/server.ts
@@ -1,5 +1,5 @@
 import { mkdir, readdir, readFile, stat } from 'node:fs/promises'
-import { dirname, join, resolve, sep } from 'node:path'
+import { join, resolve } from 'node:path'
 import { Hono } from 'hono'
 import { streamSSE } from 'hono/streaming'
 import { ParallelExecutor } from '../runner/parallel-executor'
@@ -128,35 +128,6 @@ let dashboardConfigMode = false
 const configsDir = join(import.meta.dir, '..', '..', 'configs')
 const projectRoot = resolve(import.meta.dir, '..', '..', '..', '..')

-async function listConfigFiles(dir: string, prefix = ''): Promise<string[]> {
-  const entries = await readdir(join(dir, prefix), { withFileTypes: true })
-  const files: string[] = []
-  for (const entry of entries) {
-    const relativePath = prefix ? join(prefix, entry.name) : entry.name
-    if (entry.isDirectory()) {
-      files.push(...(await listConfigFiles(dir, relativePath)))
-    } else if (entry.isFile() && entry.name.endsWith('.json')) {
-      files.push(relativePath.split(sep).join('/'))
-    }
-  }
-  return files.sort()
-}
-
-function resolveConfigPath(name: string): string | null {
-  if (!name.endsWith('.json')) return null
-  if (name.split('/').some((part) => !part || part === '.' || part === '..')) {
-    return null
-  }
-
-  const resolvedPath = resolve(configsDir, name)
-  const resolvedConfigsDir = resolve(configsDir)
-  const configRootPrefix = resolvedConfigsDir.endsWith(sep)
-    ? resolvedConfigsDir
-    : `${resolvedConfigsDir}${sep}`
-  if (!resolvedPath.startsWith(configRootPrefix)) return null
-  return resolvedPath
-}
-
 // ============================================================================
 // Hono App
 // ============================================================================
@@ -368,21 +339,21 @@ app.get('/api/mode', (c) => {
 // List saved config files
 app.get('/api/configs', async (c) => {
  try {
-    return c.json(await listConfigFiles(configsDir))
+    const files = await readdir(configsDir)
+    return c.json(files.filter((f) => f.endsWith('.json')))
  } catch {
    return c.json([])
  }
 })

 // Read a specific config file
-app.get('/api/config/*', async (c) => {
-  const name = decodeURIComponent(c.req.path.slice('/api/config/'.length))
-  const configPath = resolveConfigPath(name)
-  if (!configPath) {
+app.get('/api/config/:name', async (c) => {
+  const name = c.req.param('name')
+  if (name.includes('/') || name.includes('..')) {
    return c.json({ error: 'Invalid config name' }, 400)
  }
  try {
-    const content = await readFile(configPath, 'utf-8')
+    const content = await readFile(join(configsDir, name), 'utf-8')
    return c.json(JSON.parse(content))
  } catch {
    return c.notFound()
@@ -411,17 +382,8 @@ app.post('/api/run', async (c) => {

  const config = parseResult.data

-  let baseDir = configsDir
-  if (body.configName) {
-    const configPath = resolveConfigPath(body.configName)
-    if (!configPath) {
-      return c.json({ error: 'Invalid config name' }, 400)
-    }
-    baseDir = dirname(configPath)
-  }
-
-  // Resolve relative paths from the loaded config location. Unsaved dashboard
-  // configs keep using apps/eval/configs as their base for dropdown values.
+  // Resolve relative paths from configs/ dir (dataset dropdown values are relative to it)
+  const baseDir = configsDir
  const datasetPath = resolve(
    config.dataset.startsWith('/')
      ? config.dataset
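
Review note on the `/api/config/:name` change above: the handler now rejects names containing `/` or `..` and then joins the name onto `configsDir`, where the removed `resolveConfigPath` resolved the path first and compared it against the configs root. A minimal sketch of that resolve-then-compare pattern follows, in case nested config paths are needed again. The name `resolveInside` and the sample root used below are hypothetical and not part of this change.

```typescript
import { resolve, sep } from 'node:path'

/**
 * Hypothetical helper (not in this diff): resolves `name` under `root` and
 * returns the absolute path only when it stays strictly inside `root`.
 * Returns null for non-JSON names and for anything that escapes the root.
 */
export function resolveInside(root: string, name: string): string | null {
  if (!name.endsWith('.json')) return null

  const resolvedRoot = resolve(root)
  const resolvedPath = resolve(resolvedRoot, name)

  // Compare against "root + separator" so that a sibling directory such as
  // "configs-evil" is not accepted as being inside "configs".
  const prefix = resolvedRoot.endsWith(sep)
    ? resolvedRoot
    : `${resolvedRoot}${sep}`

  return resolvedPath.startsWith(prefix) ? resolvedPath : null
}
```

Comparing resolved absolute paths, rather than scanning the raw string for `..`, keeps the check independent of how the name was spelled.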
--- a/packages/browseros-agent/apps/eval/src/dashboard/viewer.html
+++ b/packages/browseros-agent/apps/eval/src/dashboard/viewer.html
@@ -685,59 +685,6 @@
    });
  }

-  // Test harness note: these ASCII section markers are used by r2-viewer-compat.test.ts.
-  // -- Artifact path resolution
-  function taskKey(task) {
-    return task.queryId || task.id || 'unknown-task';
-  }
-
-  function legacyArtifactPath(task, artifact) {
-    const id = taskKey(task);
-    switch (artifact) {
-      case 'attempt':
-        return `${id}/attempt.json`;
-      case 'metadata':
-        return `${id}/metadata.json`;
-      case 'messages':
-        return `${id}/messages.jsonl`;
-      case 'trace':
-        return `${id}/trace.jsonl`;
-      case 'grades':
-        return `${id}/grades.json`;
-      case 'screenshots':
-        return `${id}/screenshots`;
-      case 'graderArtifacts':
-        return `${id}/grader-artifacts`;
-      default:
-        return `${id}/${artifact}`;
-    }
-  }
-
-  function artifactPath(task, artifact) {
-    const manifestPath = task.paths && task.paths[artifact];
-    if (typeof manifestPath === 'string' && manifestPath.length > 0) {
-      return manifestPath.replace(/^\/+/, '');
-    }
-    return legacyArtifactPath(task, artifact);
-  }
-
-  function artifactUrl(task, artifact) {
-    return `${basePath}/${artifactPath(task, artifact)}`;
-  }
-
-  function metadataUrl(task) {
-    return artifactUrl(task, 'metadata');
-  }
-
-  function messagesUrl(task) {
-    return artifactUrl(task, 'messages');
-  }
-
-  function screenshotUrl(task, n) {
-    return `${artifactUrl(task, 'screenshots')}/${n}.png`;
-  }
-
-  // -- Task selection
  // ── Task selection ─────────────────────────────────────────────
  function selectTask(task) {
    stopAutoplay();
@@ -769,7 +716,6 @@
    }
  }

-  // -- Center panel
  // ── Center panel: screenshot viewer ────────────────────────────
  function renderCenterPanel(task) {
    const panel = document.getElementById('center-panel');
@@ -817,6 +763,10 @@
    updateControls();
  }

+  function screenshotUrl(task, n) {
+    return `${basePath}/${task.queryId || task.id}/screenshots/${n}.png`;
+  }
+
  function goToStep(n) {
    if (!selectedTask || n < 1 || n > totalSteps) return;
    currentStep = n;
@@ -964,7 +914,7 @@
    body.innerHTML = '<div class="placeholder"><div class="ph-text" style="color: #6e7681;">Loading messages...</div></div>';
    countEl.textContent = '';

-    const msgUrl = messagesUrl(task);
+    const msgUrl = `${basePath}/${task.queryId || task.id}/messages.jsonl`;

    fetch(msgUrl)
      .then((res) => {
@@ -1125,7 +1075,7 @@

  // ── Load task metadata for rich grader details ──────────────────
  function loadTaskMetadata(task) {
-    const metaUrl = metadataUrl(task);
+    const metaUrl = `${basePath}/${task.queryId || task.id}/metadata.json`;
    fetch(metaUrl)
      .then((res) => res.ok ? res.json() : null)
      .then((meta) => {
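
Review note on the viewer change above: after this diff the screenshot, messages, and metadata URLs are each built inline from `basePath` and `task.queryId || task.id`. A small sketch of one shared helper for that pattern is shown here in TypeScript. The name `taskArtifactUrl`, the `ViewerTask` shape, and the `'unknown-task'` fallback (borrowed from the removed `taskKey`) are assumptions for illustration only.

```typescript
// Hypothetical shape: only the two identifier fields the viewer reads.
interface ViewerTask {
  queryId?: string
  id?: string
}

/**
 * Hypothetical helper (not in this diff): builds the URL of a per-task
 * artifact the same way the inline template strings above do.
 */
export function taskArtifactUrl(
  basePath: string,
  task: ViewerTask,
  relativePath: string,
): string {
  // Fallback mirrors the removed taskKey(); the new inline code has none.
  const taskId = task.queryId || task.id || 'unknown-task'
  // Strip leading slashes so the result never contains "//".
  const cleanPath = relativePath.replace(/^\/+/, '')
  return `${basePath}/${taskId}/${cleanPath}`
}
```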
--- a/packages/browseros-agent/apps/eval/src/graders/benchmark/agisdk-state-diff.ts
+++ b/packages/browseros-agent/apps/eval/src/graders/benchmark/agisdk-state-diff.ts
@@ -1,12 +1,5 @@
+import { spawn } from 'node:child_process'
 import { join } from 'node:path'
-import {
-  writeGraderJsonArtifact,
-  writeGraderTextArtifact,
-} from '../../grading/artifacts'
-import {
-  type PythonEvaluatorResult,
-  runPythonJsonEvaluator,
-} from '../../grading/python-evaluator'
 import type { GraderResult } from '../../types'
 import { callMcpTool } from '../../utils/mcp-client'
 import type { Grader, GraderInput } from '../types'
@@ -14,23 +7,12 @@ import type { Grader, GraderInput } from '../types'
 const EVAL_SCRIPT = join(
  import.meta.dirname,
  '..',
-  'python',
+  '..',
+  '..',
+  'scripts',
  'agisdk-evaluate.py',
 )

-interface AgisdkEvaluatorInput {
-  task_id: string
-  env_state: Record<string, unknown>
-  model_response: string
-}
-
-interface AgisdkEvaluatorOutput {
-  reward: number
-  pass: boolean
-  message: string
-  per_criterion: unknown[]
-}
-
 export class AgisdkStateDiffGrader implements Grader {
  name = 'agisdk_state_diff'

@@ -54,16 +36,6 @@ export class AgisdkStateDiffGrader implements Grader {
    let envState: Record<string, unknown>
    try {
      envState = await this.fetchFinishState(origin, mcpEndpoint)
-      await writeGraderJsonArtifact(
-        input,
-        this.name,
-        'finish-state.json',
-        envState,
-      )
-      await writeGraderJsonArtifact(input, this.name, 'context.json', {
-        origin,
-        agisdk_task_id: taskId,
-      })
    } catch (error) {
      return {
        score: 0,
@@ -74,30 +46,10 @@ export class AgisdkStateDiffGrader implements Grader {
    }

    try {
-      const evaluatorInput: AgisdkEvaluatorInput = {
-        task_id: taskId,
-        env_state: envState,
-        model_response: input.finalAnswer || '',
-      }
-      await writeGraderJsonArtifact(
-        input,
-        this.name,
-        'evaluator-input.json',
-        evaluatorInput,
-      )
-      const evaluation = await this.runPythonEvaluator(evaluatorInput)
-      const result = evaluation.output
-      await writeGraderJsonArtifact(
-        input,
-        this.name,
-        'evaluator-output.json',
-        result,
-      )
-      await writeGraderTextArtifact(
-        input,
-        this.name,
-        'stderr.txt',
-        evaluation.stderr,
+      const result = await this.runPythonEvaluator(
+        taskId,
+        envState,
+        input.finalAnswer || '',
      )
      return {
        score: result.reward,
@@ -192,12 +144,59 @@ export class AgisdkStateDiffGrader implements Grader {
  }

  private runPythonEvaluator(
-    evalInput: AgisdkEvaluatorInput,
-  ): Promise<PythonEvaluatorResult<AgisdkEvaluatorOutput>> {
-    return runPythonJsonEvaluator<AgisdkEvaluatorOutput>({
-      scriptPath: EVAL_SCRIPT,
-      input: evalInput,
-      timeoutMs: 300_000,
+    taskId: string,
+    envState: Record<string, unknown>,
+    modelResponse: string,
+  ): Promise<{
+    reward: number
+    pass: boolean
+    message: string
+    per_criterion: unknown[]
+  }> {
+    return new Promise((resolve, reject) => {
+      const proc = spawn('python3', [EVAL_SCRIPT], {
+        stdio: ['pipe', 'pipe', 'pipe'],
+      })
+
+      const inputData = JSON.stringify({
+        task_id: taskId,
+        env_state: envState,
+        model_response: modelResponse,
+      })
+
+      let stdout = ''
+      let stderr = ''
+
+      proc.stdout.on('data', (data: Buffer) => {
+        stdout += data.toString()
+      })
+
+      proc.stderr.on('data', (data: Buffer) => {
+        stderr += data.toString()
+      })
+
+      proc.on('close', (code) => {
+        if (code !== 0) {
+          reject(
+            new Error(`Python evaluator exited with code ${code}: ${stderr}`),
+          )
+          return
+        }
+
+        try {
+          const result = JSON.parse(stdout.trim())
+          resolve(result)
+        } catch {
+          reject(new Error(`Failed to parse evaluator output: ${stdout}`))
+        }
+      })
+
+      proc.on('error', (err) => {
+        reject(new Error(`Failed to spawn Python evaluator: ${err.message}`))
+      })
+
+      proc.stdin.write(inputData)
+      proc.stdin.end()
    })
  }
 }
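
Review note on `runPythonEvaluator` above: the new implementation resolves with whatever `JSON.parse(stdout.trim())` returns, so a malformed payload only surfaces later when `result.reward` is read. A minimal sketch of a shape check that could sit between the parse and the `resolve` call is shown below. The function name `parseEvaluatorOutput` is hypothetical; the field names are taken from the return type declared in this diff.

```typescript
interface EvaluatorOutput {
  reward: number
  pass: boolean
  message: string
  per_criterion: unknown[]
}

/**
 * Hypothetical helper (not in this diff): parses evaluator stdout and checks
 * that the fields declared in the grader's return type are present.
 */
export function parseEvaluatorOutput(stdout: string): EvaluatorOutput {
  let parsed: unknown
  try {
    parsed = JSON.parse(stdout.trim())
  } catch {
    throw new Error(`Failed to parse evaluator output: ${stdout}`)
  }
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Evaluator output is not a JSON object')
  }

  const { reward, pass, message, per_criterion } = parsed as Record<
    string,
    unknown
  >
  if (
    typeof reward !== 'number' ||
    typeof pass !== 'boolean' ||
    typeof message !== 'string' ||
    !Array.isArray(per_criterion)
  ) {
    throw new Error('Evaluator output is missing expected fields')
  }
  return { reward, pass, message, per_criterion }
}
```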
--- a/packages/browseros-agent/apps/eval/src/graders/benchmark/infinity-state.ts
+++ b/packages/browseros-agent/apps/eval/src/graders/benchmark/infinity-state.ts
@@ -1,12 +1,4 @@
 import { join, resolve } from 'node:path'
-import {
-  writeGraderJsonArtifact,
-  writeGraderTextArtifact,
-} from '../../grading/artifacts'
-import {
-  type PythonEvaluatorResult,
-  runPythonJsonEvaluator,
-} from '../../grading/python-evaluator'
 import type { GraderResult } from '../../types'
 import type { Grader, GraderInput } from '../types'

@@ -22,7 +14,10 @@ interface InfinityEvalOutput {
  message: string
 }

-const EVAL_SCRIPT = resolve(import.meta.dir, '../python/infinity-evaluate.py')
+const EVAL_SCRIPT = resolve(
+  import.meta.dir,
+  '../../../scripts/infinity-evaluate.py',
+)

 export class InfinityStateGrader implements Grader {
  name = 'infinity_state'
@@ -71,32 +66,7 @@ export class InfinityStateGrader implements Grader {
    }

    try {
-      await writeGraderJsonArtifact(input, this.name, 'verifier.json', {
-        appName: parsed.appName,
-        taskId: parsed.taskId,
-        verifierPath,
-        appServerUrl,
-      })
-      await writeGraderJsonArtifact(
-        input,
-        this.name,
-        'evaluator-input.json',
-        evalInput,
-      )
-      const evaluation = await this.runPythonEvaluator(evalInput)
-      const result = evaluation.output
-      await writeGraderJsonArtifact(
-        input,
-        this.name,
-        'evaluator-output.json',
-        result,
-      )
-      await writeGraderTextArtifact(
-        input,
-        this.name,
-        'stderr.txt',
-        evaluation.stderr,
-      )
+      const result = await this.runPythonEvaluator(evalInput)
      return {
        score: result.pass ? 1 : 0,
        pass: result.pass,
@@ -138,11 +108,27 @@ export class InfinityStateGrader implements Grader {

  private async runPythonEvaluator(
    evalInput: InfinityEvalInput,
-  ): Promise<PythonEvaluatorResult<InfinityEvalOutput>> {
-    return runPythonJsonEvaluator<InfinityEvalOutput>({
-      scriptPath: EVAL_SCRIPT,
-      input: evalInput,
-      timeoutMs: 300_000,
+  ): Promise<InfinityEvalOutput> {
+    const proc = Bun.spawn(['python3', EVAL_SCRIPT], {
+      stdin: 'pipe',
+      stdout: 'pipe',
+      stderr: 'pipe',
    })
+
+    const inputJson = JSON.stringify(evalInput)
+    proc.stdin.write(inputJson)
+    proc.stdin.end()
+
+    const stdout = await new Response(proc.stdout).text()
+    const stderr = await new Response(proc.stderr).text()
+    const exitCode = await proc.exited
+
+    if (exitCode !== 0) {
+      throw new Error(
+        `Python evaluator exited with code ${exitCode}: ${stderr || stdout}`,
+      )
+    }
+
+    return JSON.parse(stdout.trim()) as InfinityEvalOutput
  }
 }
--- a/packages/browseros-agent/apps/eval/src/graders/performance/performance-grader.ts
+++ b/packages/browseros-agent/apps/eval/src/graders/performance/performance-grader.ts
@@ -1,7 +1,6 @@
 import { readFile } from 'node:fs/promises'
 import { join } from 'node:path'
 import { query } from '@anthropic-ai/claude-agent-sdk'
-import { writeGraderJsonArtifact } from '../../grading/artifacts'
 import type { GraderResult } from '../../types'
 import type { Grader, GraderInput } from '../types'
 import {
@@ -64,7 +63,6 @@ export class PerformanceGrader implements Grader {
        input.screenshotCount,
        terminationReason,
      )
-      await writeGraderJsonArtifact(input, this.name, 'metrics.json', metrics)

      const systemPrompt = PERFORMANCE_SYSTEM_PROMPT.replace(
        /\{screenshot_count\}/g,
@@ -84,14 +82,6 @@ export class PerformanceGrader implements Grader {
        userPrompt,
        input.outputDir,
      )
-      if (response) {
-        await writeGraderJsonArtifact(
-          input,
-          this.name,
-          'agent-output.json',
-          response,
-        )
-      }

      if (!response) {
        return {
@@ -150,7 +140,6 @@ export class PerformanceGrader implements Grader {
          `Perf grader: LLM returned ${returnedAxes.size}/${expectedAxes.size} axes, missing: ${missingAxes.join(', ')}`,
        )
      }
-      await writeGraderJsonArtifact(input, this.name, 'axes.json', axisResults)

      return {
        score: compositeScore / 100,
--- a/packages/browseros-agent/apps/eval/src/graders/registry.ts
+++ b/packages/browseros-agent/apps/eval/src/graders/registry.ts
@@ -1,2 +1,51 @@
-export * from '../grading/grader-registry'
-export { runConfiguredGraders, runGraders } from '../grading/grader-runner'
+import type { GraderResult } from '../types'
+import { AgisdkStateDiffGrader } from './benchmark/agisdk-state-diff'
+import { InfinityStateGrader } from './benchmark/infinity-state'
+import { PerformanceGrader } from './performance/performance-grader'
+import type { Grader, GraderInput } from './types'
+
+export const PASS_FAIL_GRADER_ORDER = [
+  'agisdk_state_diff',
+  'infinity_state',
+  'performance_grader',
+] as const
+
+export function createGrader(name: string): Grader | null {
+  switch (name) {
+    case 'agisdk_state_diff':
+      return new AgisdkStateDiffGrader()
+    case 'infinity_state':
+      return new InfinityStateGrader()
+    case 'performance_grader':
+      return new PerformanceGrader()
+    default:
+      console.warn(`Unknown grader: ${name}`)
+      return null
+  }
+}
+
+export async function runGraders(
+  graderNames: string[],
+  input: GraderInput,
+): Promise<Record<string, GraderResult>> {
+  const results: Record<string, GraderResult> = {}
+
+  for (const name of graderNames) {
+    const grader = createGrader(name)
+    if (!grader) continue
+    try {
+      console.log(`  Running grader: ${name}`)
+      results[name] = await grader.grade(input)
+    } catch (error) {
+      results[name] = {
+        score: 0,
+        pass: false,
+        reasoning: `Error running grader: ${error}`,
+      }
+    }
+  }
+
+  return results
+}
+
+export { AgisdkStateDiffGrader, InfinityStateGrader, PerformanceGrader }
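
Review note on `runGraders` above: the restored catch block formats failures with `${error}`, while the removed `grader-runner.ts` used `error instanceof Error ? error.message : String(error)`. The two differ only in the `Error: ` prefix that string interpolation adds. A tiny sketch of the removed formatting as a standalone function, under the hypothetical name `describeError`:

```typescript
/**
 * Hypothetical helper (not in this diff): returns the message of an Error,
 * or the string form of any other thrown value.
 */
export function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error)
}

// Interpolating an Error directly goes through Error.prototype.toString,
// which prepends the error name.
export const interpolated = `${new Error('boom')}`
export const described = describeError(new Error('boom'))
```

Here `interpolated` is `Error: boom` and `described` is `boom`, so the grader `reasoning` strings produced by this change carry the extra prefix.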
--- a/packages/browseros-agent/apps/eval/src/graders/types.ts
+++ b/packages/browseros-agent/apps/eval/src/graders/types.ts
@@ -1 +1,21 @@
-export type { Grader, GraderInput } from '../grading/types'
+import type { GraderResult, Message } from '../types'
+
+export interface GraderInput {
+  task: {
+    query_id: string
+    query: string
+    dataset: string
+  }
+  messages: Message[]
+  screenshotCount: number
+  finalAnswer: string | null
+  expectedAnswer?: string | null
+  outputDir: string
+  mcpUrl?: string
+  infinityAppUrl?: string
+}
+
+export interface Grader {
+  name: string
+  grade(input: GraderInput): Promise<GraderResult>
+}
--- a/packages/browseros-agent/apps/eval/src/grading/artifacts.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/artifacts.ts
@@ -1,34 +0,0 @@
-import { mkdir, writeFile } from 'node:fs/promises'
-import { join } from 'node:path'
-import type { GraderInput } from './types'
-
-function artifactDir(input: GraderInput, graderName: string): string {
-  return join(
-    input.taskArtifactDir || input.outputDir,
-    'grader-artifacts',
-    graderName,
-  )
-}
-
-/** Writes a JSON artifact for a grader under the task artifact directory. */
-export async function writeGraderJsonArtifact(
-  input: GraderInput,
-  graderName: string,
-  filename: string,
-  value: unknown,
-): Promise<void> {
-  const dir = artifactDir(input, graderName)
-  await mkdir(dir, { recursive: true })
-  await writeFile(join(dir, filename), JSON.stringify(value, null, 2))
-}
-
-export async function writeGraderTextArtifact(
-  input: GraderInput,
-  graderName: string,
-  filename: string,
-  value: string,
-): Promise<void> {
-  const dir = artifactDir(input, graderName)
-  await mkdir(dir, { recursive: true })
-  await writeFile(join(dir, filename), value)
-}
--- a/packages/browseros-agent/apps/eval/src/grading/grader-registry.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/grader-registry.ts
@@ -1,26 +0,0 @@
-import { AgisdkStateDiffGrader } from '../graders/benchmark/agisdk-state-diff'
-import { InfinityStateGrader } from '../graders/benchmark/infinity-state'
-import { PerformanceGrader } from '../graders/performance/performance-grader'
-import type { Grader } from './types'
-
-export const PASS_FAIL_GRADER_ORDER = [
-  'agisdk_state_diff',
-  'infinity_state',
-  'performance_grader',
-] as const
-
-export function createGrader(name: string): Grader | null {
-  switch (name) {
-    case 'agisdk_state_diff':
-      return new AgisdkStateDiffGrader()
-    case 'infinity_state':
-      return new InfinityStateGrader()
-    case 'performance_grader':
-      return new PerformanceGrader()
-    default:
-      console.warn(`Unknown grader: ${name}`)
-      return null
-  }
-}
-
-export { AgisdkStateDiffGrader, InfinityStateGrader, PerformanceGrader }
--- a/packages/browseros-agent/apps/eval/src/grading/grader-runner.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/grader-runner.ts
@@ -1,36 +0,0 @@
-import type { GraderResult } from '../types'
-import { createGrader as defaultCreateGrader } from './grader-registry'
-import type { Grader, GraderInput } from './types'
-
-export interface GraderRunnerDeps {
-  createGrader?: (name: string) => Grader | null
-}
-
-/** Runs configured graders independently so one failure does not hide others. */
-export async function runConfiguredGraders(
-  graderNames: string[],
-  input: GraderInput,
-  deps: GraderRunnerDeps = {},
-): Promise<Record<string, GraderResult>> {
-  const create = deps.createGrader ?? defaultCreateGrader
-  const results: Record<string, GraderResult> = {}
-
-  for (const name of graderNames) {
-    const grader = create(name)
-    if (!grader) continue
-    try {
-      console.log(`  Running grader: ${name}`)
-      results[name] = await grader.grade(input)
-    } catch (error) {
-      results[name] = {
-        score: 0,
-        pass: false,
-        reasoning: `Error running grader: ${error instanceof Error ? error.message : String(error)}`,
-      }
-    }
-  }
-
-  return results
-}
-
-export const runGraders = runConfiguredGraders
--- a/packages/browseros-agent/apps/eval/src/grading/python-evaluator.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/python-evaluator.ts
@@ -1,65 +0,0 @@
-export interface PythonEvaluatorOptions {
-  scriptPath: string
-  input: unknown
-  timeoutMs: number
-}
-
-export interface PythonEvaluatorResult<T> {
-  output: T
-  stdout: string
-  stderr: string
-  exitCode: number
-}
-
-/** Runs a Python evaluator that accepts stdin JSON and emits stdout JSON. */
-export async function runPythonJsonEvaluator<T>(
-  options: PythonEvaluatorOptions,
-): Promise<PythonEvaluatorResult<T>> {
-  const proc = Bun.spawn(['python3', options.scriptPath], {
-    stdin: 'pipe',
-    stdout: 'pipe',
-    stderr: 'pipe',
-  })
-
-  proc.stdin.write(JSON.stringify(options.input))
-  proc.stdin.end()
-
-  let timeoutHandle: ReturnType<typeof setTimeout> | undefined
-  const timeout = new Promise<never>((_, reject) => {
-    timeoutHandle = setTimeout(() => {
-      proc.kill('SIGKILL')
-      reject(
-        new Error(`Python evaluator timed out after ${options.timeoutMs}ms`),
-      )
-    }, options.timeoutMs)
-  })
-
-  const completed = (async (): Promise<PythonEvaluatorResult<T>> => {
-    const stdout = await new Response(proc.stdout).text()
-    const stderr = await new Response(proc.stderr).text()
-    const exitCode = await proc.exited
-
-    if (exitCode !== 0) {
-      throw new Error(
-        `Python evaluator exited with code ${exitCode}: ${stderr || stdout}`,
-      )
-    }
-
-    try {
-      return {
-        output: JSON.parse(stdout.trim()) as T,
-        stdout,
-        stderr,
-        exitCode,
-      }
-    } catch {
-      throw new Error(`Failed to parse Python evaluator output: ${stdout}`)
-    }
-  })()
-
-  try {
-    return await Promise.race([completed, timeout])
-  } finally {
-    clearTimeout(timeoutHandle)
-  }
-}
--- a/packages/browseros-agent/apps/eval/src/grading/types.ts
+++ b/packages/browseros-agent/apps/eval/src/grading/types.ts
@@ -1,22 +0,0 @@
-import type { GraderResult, Message } from '../types'
-
-export interface GraderInput {
-  task: {
-    query_id: string
-    query: string
-    dataset: string
-  }
-  messages: Message[]
-  screenshotCount: number
-  finalAnswer: string | null
-  expectedAnswer?: string | null
-  taskArtifactDir: string
-  outputDir: string
-  mcpUrl?: string
-  infinityAppUrl?: string
-}
-
-export interface Grader {
-  name: string
-  grade(input: GraderInput): Promise<GraderResult>
-}
--- a/packages/browseros-agent/apps/eval/src/index.ts
+++ b/packages/browseros-agent/apps/eval/src/index.ts
@@ -1,10 +1,72 @@
 #!/usr/bin/env bun

-import { runCli } from './cli'
+import { parseArgs } from 'node:util'
+import { runEval } from './runner/eval-runner'

-try {
-  await runCli(Bun.argv.slice(2))
-} catch (error) {
-  console.error(error instanceof Error ? error.message : String(error))
-  process.exit(1)
+const { values } = parseArgs({
+  args: Bun.argv.slice(2),
+  options: {
+    config: { type: 'string', short: 'c' },
+    help: { type: 'boolean', short: 'h' },
+  },
+})
+
+if (values.help) {
+  console.log(`
+BrowserOS Eval
+
+Usage:
+  bun run eval                          # Opens dashboard in config mode
+  bun run eval --config <config.json>   # Runs eval with config file
+
+Available agent types:
+  - single                  Single LLM agent driven by the BrowserOS tool loop
+  - orchestrator-executor   High-level planner + visual/text executor
+
+Available graders:
+  - performance_grader      Multi-axis grader using Claude Agent SDK
+  - agisdk_state_diff       AGI SDK / REAL Bench state-diff grader
+  - infinity_state          WebArena-Infinity verifier-script grader
+
+Preset configs in configs/:
+  - browseros-agent-weekly.json       Weekly eval (single agent)
+  - browseros-oe-agent-weekly.json    Weekly eval (orchestrator + LLM executor)
+  - browseros-oe-clado-weekly.json    Weekly eval (orchestrator + Clado executor)
+  - agisdk-real-smoke.json            AGI SDK smoke run
+  - infinity-hard-50.json             WebArena-Infinity hard-50 set
+  - test-webvoyager.json              WebVoyager test
+  - test-mind2web.json                Mind2Web test
+
+Examples:
+  bun run eval                                       # Dashboard config mode
+  bun run eval -c configs/browseros-agent-weekly.json
+  bun run eval -c configs/test-webvoyager.json
+`)
+  process.exit(0)
+}
+
+if (values.config) {
+  try {
+    await runEval({ configPath: values.config })
+  } catch (error) {
+    console.error(error instanceof Error ? error.message : String(error))
+    process.exit(1)
+  }
+  process.exit(0)
+} else {
+  // No config — start dashboard in config mode, wait for user to configure and run
+  const { startDashboard } = await import('./dashboard/server')
+  startDashboard({
+    tasks: [],
+    configName: '',
+    agentType: '',
+    outputDir: '',
+    configMode: true,
+  })
+  console.log(
+    'Dashboard running at http://localhost:9900 — configure and run from the UI',
+  )
+
+  // Keep process alive until SIGINT
+  await new Promise(() => {})
 }
Author	SHA1	Message	Date
shivammittal274	7ee8dedd53	chore(eval): drop the 60-char truncation on grader expected/actual values Some criteria check long strings (job descriptions, post bodies, etc.) — truncating to 60 chars hides exactly the bytes you need to diff. The viewer's reasoning area already has max-height + scroll + word-break so long content scrolls; nothing renders worse for being full-length.	2026-04-30 02:08:30 +05:30
shivammittal274	a3b5ef4da3	chore(eval): show every criterion in agisdk grader message, not just failures Listing only failures hid the bigger picture — when 1 of 4 criteria fails you still want to know which 3 passed and what was checked. Now the message is the full checklist, ✓/✗ per criterion, with expected vs actual on the failing lines. Examples: All 4 criteria passed. ✓ correct job title ✓ includes Java skill ✓ includes Spring Boot skill ✓ includes Angular skill 2 of 4 criteria failed: ✓ correct job title (softened) ✓ includes Java skill ✗ includes Spring Boot skill: expected True, got False ✗ includes Angular skill: expected True, got False	2026-04-30 02:08:07 +05:30
shivammittal274	3333728e4e	fix(eval): surface per-criterion descriptions in agisdk grader output The viewer's grader-reasoning pill was showing "Task not completed successfully." for every agisdk_state_diff failure. The rich data was actually available — agisdk's TaskConfig exposes a 'description' (e.g. "includes Spring Boot skill") and the JMESPath 'query' for each criterion, zip-aligned 1:1 with info['results'] — we just weren't extracting it. Now agisdk-evaluate.py emits per-criterion entries with description, query, expected_value, actual_value, and builds the message as a useful multi-line summary: 2 of 4 criteria failed: • includes Spring Boot skill: expected True, got False • includes Angular skill: expected True, got False The viewer's grader-reasoning area already has white-space: pre-wrap so the multi-line message renders correctly. The structured per_criterion fields are also stored under details.per_criterion in metadata.json for anyone who wants to grep R2 artifacts directly.	2026-04-30 02:06:51 +05:30
shivammittal274	5c6fd34d3e	fix(eval): address Greptile P1+P2 on server log fd handling P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the log directory missing and crash the server spawn with ENOENT. Move openSync into the same try block; fall back to /dev/null so spawn always succeeds. P2: the log fd was opened on every server start but never closed. Each restart attempt leaked one fd across all workers — over a long eval run that could exhaust the process fd limit. Track the fd on the manager and closeSync it in killApp() right after the server process exits (the child's dup keeps the file open until it exits, so we don't truncate output).	2026-04-30 01:16:20 +05:30
shivammittal274	1a1220dff5	chore(eval): run clado weekly headless Default to headless so the weekly job (and local repros) don't pop ten visible Chrome windows. Set headless=false locally if you need to watch a worker.	2026-04-30 00:37:45 +05:30
shivammittal274	dc98858cc3	chore(eval): point clado weekly config at agisdk-real Switches the orchestrator-executor + Clado weekly config to run on the AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff grader. Matches the orchestrator-executor smoke target (Fireworks K2.5 orchestrator + Clado action executor) we want to track week-over-week.	2026-04-30 00:37:45 +05:30
shivammittal274	72cbffe2bb	chore(eval): refresh test-clado-api script for new Clado contract Updated the local smoke-test to match the new Clado endpoint and response contract: - New action + health URLs (000159-merged checkpoint). - Drop the grounding-model branch (orchestrator-executor doesn't use it; the README David shared only documents the action model). - Health-check waits up to 6 minutes for cold start with a 30s warning so the operator knows it's spinning up. - Print every documented response field (action, x/y, text, key, direction, amount, drag start/end, time, final_answer, thinking, parse_error, inference_time_seconds). - Three-step run that exercises a click, a typing continuation with formatted history, and an end+final_answer probe.	2026-04-30 00:37:44 +05:30
shivammittal274	34fdf08521	feat(eval): align Clado action executor with new endpoint contract David Shan shared the updated Clado BrowserOS Action Model spec. Changes to match it: - Bump endpoint URL + model id to the 000159-merged checkpoint (clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef) in browseros-oe-clado-weekly.json and the README example. - CLADO_REQUEST_TIMEOUT_MS 120s → 360s. Cold start can take ~5 min; the 2-min ceiling was failing every cold-start request. - Treat HTTP 200 with action=null / parse_error as an INVALID step instead of aborting the executor loop. The model can self-correct on the next call. Cap consecutive parse failures at 3 to avoid infinite loops. - Capture final_answer from end actions. Surface it in the observation back to the orchestrator so its task answer can use the model's declared result. - Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X). - Switch screenshot format from webp → png to match the documented "PNG or JPEG" contract.	2026-04-30 00:37:44 +05:30
shivammittal274	be6858d589	fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1 Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't unblock dev's Linux CI — same throw kept reproducing. Whether this is bun caching stale stack frames or a missed eager call site, the safer move is to fix it at the root: make buildContainerRuntime never throw on Linux when the runner has explicitly opted out. Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test escape hatch in container-runtime-factory.ts. When set, returns the existing UnsupportedPlatformTestRuntime stub — server boots normally, /health binds, any actual OpenClaw API call still fails loudly at request time. eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and non-CI Linux behavior unchanged (without the flag they still throw).	2026-04-29 23:18:59 +05:30
shivammittal274	33f68a0d74	fix(server): defer OpenClaw chat client port lookup to request time apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort() synchronously when constructing the OpenClawGatewayChatClient inside the createHttpServer object literal. On non-darwin platforms this throws via the OpenClawService constructor → buildContainerRuntime, escaping the try/catch added in `5cf7b765` (which only protected the configureOpenClawService call further down in main.ts). Every other getOpenClawService() reference in server.ts is already wrapped in an arrow function. This was the lone holdout. Make it lazy too: change the chat client constructor to take getHostPort: () => number instead of hostPort: number, evaluate it inside streamTurn at request time. Behavior on darwin is unchanged. This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't available — the chat endpoint isn't exercised by the eval, so a deferred throw is acceptable.	2026-04-29 23:10:48 +05:30
shivammittal274	5cf7b765d0	fix(server): catch sync throw from OpenClaw constructor on Linux The container runtime constructor in OpenClawService throws synchronously on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing .catch() on tryAutoStart() only handles async throws inside auto-start — the sync throw from configureOpenClawService(...) itself propagates up through Application.start() and crashes the process via index.ts:48 (process.exit(EXIT_CODES.GENERAL_ERROR)). This is what's been killing dev's eval-weekly CI: the server crashes in milliseconds, the eval client polls /health, gets nothing, times out. Fix: wrap the configureOpenClawService call in try/catch matching the existing .catch() intent (best-effort, don't crash). Server continues without OpenClaw on platforms where it can't initialize. Verified by reading captured server stdout from run 25123195126: Failed to start server: error: browseros-vm currently supports macOS only at buildContainerRuntime (container-runtime-factory.ts:54:11) at new OpenClawService (openclaw-service.ts:652:15) at configureOpenClawService (openclaw-service.ts:1527:19) at start (main.ts:127:5)	2026-04-29 22:57:03 +05:30
shivammittal274	5ed0879d31	fix(eval): capture stdout too — pino logger writes to stdout, not stderr Previous diagnostic patch only redirected stderr; the captured per-worker log files came back as 0 bytes because the server uses pino which writes all log output to stdout (fd 1), not stderr (fd 2). Capture both into the same file.	2026-04-29 22:44:07 +05:30
shivammittal274	e136094305	chore(eval): instrument server startup to root-cause dev CI health-check timeouts Three diagnostics + one config swap to investigate why the eval-weekly workflow has been failing on dev since 2026-04-25 with "Server health check timed out" (every worker, every retry). Background: - Last successful weekly eval on dev: 2026-04-18 (sha `f5a2b73`) - Since then, ~30 server commits landed including Lima/VM runtime, OpenClaw service, ACL system, ACP SDK — 108 server files changed, ~13K LOC added. - Server process spawns cleanly in CI (PID logged) but never binds /health within the 30s eval-side timeout. Static analysis finds no obvious blocker; we need runtime evidence. Changes: 1. apps/server/package.json — add `start:ci` script (no `--watch`). The default `start` uses `bun --watch` which forks a child process that watches every file in the import graph. Dev's graph is ~108 files larger than main's; on a cold CI runner the watcher setup is a plausible source of multi-second startup overhead. 2. apps/eval/src/runner/browseros-app-manager.ts: - Use `start:ci` when `process.env.CI` is set (true on GitHub-hosted runners by default), else `start`. - Capture per-worker server stderr to /tmp/browseros-server-logs/ instead of ignoring it. Without this we have no visibility into why the server is hung pre-/health. - Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module graph may simply need more cold-start time on CI. 3. .github/workflows/eval-weekly.yml — upload the server logs dir as a workflow artifact (always, not just on success) so we can post-mortem any startup failure on the next run. 4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter -> Fireworks (bypasses the OpenRouter per-key spend cap that has been eating recent runs) and drop num_workers 10 -> 4 (well below the Fireworks per-account TPM threshold that overwhelmed the original 2026-04-23 run). 
Plan: trigger the eval-weekly workflow on this branch with the agisdk config and observe (a) whether it gets past server startup, and (b) if it doesn't, what the captured server stderr says.	2026-04-29 22:34:32 +05:30
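
The checklist message described in commits a3b5ef4da3 and 7ee8dedd53 can be sketched as follows. This is a minimal TypeScript illustration, not the project's code: per commit 3333728e4e the real message is built in agisdk-evaluate.py, so the `CriterionResult` interface, its field names, and `formatCriteriaMessage` are all assumptions made for this example. Booleans print as `true`/`false` here, whereas the Python script prints `True`/`False`.

```typescript
// Hypothetical shape for one graded criterion. The fields loosely mirror the
// description / expected_value / actual_value entries named in 3333728e4e.
interface CriterionResult {
  description: string
  passed: boolean
  expectedValue: unknown
  actualValue: unknown
}

/** Builds a full ✓/✗ checklist, adding expected vs actual on failing lines only. */
function formatCriteriaMessage(criteria: CriterionResult[]): string {
  const failed = criteria.filter((c) => !c.passed).length
  const header =
    failed === 0
      ? `All ${criteria.length} criteria passed.`
      : `${failed} of ${criteria.length} criteria failed:`

  const lines = criteria.map((c) =>
    c.passed
      ? `✓ ${c.description}`
      : // Values are printed at full length; nothing is cut to 60 chars.
        `✗ ${c.description}: expected ${String(c.expectedValue)}, got ${String(c.actualValue)}`,
  )

  return [header, ...lines].join('\n')
}

// Example: one passing and one failing criterion.
const exampleMessage = formatCriteriaMessage([
  { description: 'correct job title', passed: true, expectedValue: true, actualValue: true },
  { description: 'includes Angular skill', passed: false, expectedValue: true, actualValue: false },
])
console.log(exampleMessage)
```

Every criterion stays in the output, so a reader can see which checks passed alongside the ones that failed, and long expected or actual strings remain diffable because they are not shortened.
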
@@ -1 +0,0 @@
-{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}