Compare commits

...

7 Commits

Author SHA1 Message Date
Nikhil Sonti
58cb43ec7f fix: address review feedback for PR #896 2026-04-30 15:36:54 -07:00
Nikhil Sonti
eb90fcb6b3 feat: remove CLI auto init discovery 2026-04-30 15:24:41 -07:00
Nikhil
7c942e91ce chore: add internal-docs submodule (#895)
Mounts browseros-ai/internal-docs at .internal-docs/, tracking main.

This activates the /document-internal and /ask-internal skills (which
early-exit if the submodule is missing) and lets the sync-internal-docs
workflow start bumping the pointer on its 4-hourly schedule.

Team members: after this lands, run once from a fresh dev pull:
    git submodule update --init .internal-docs
2026-04-30 15:13:41 -07:00
Nikhil
1ff92c44b3 feat(internal-docs): scaffold private docs submodule, skills, sync action (#894)
* feat(internal-docs): scaffold private docs submodule, skills, sync action

Adds the OSS-side scaffolding for the internal-docs system:

- /document-internal skill — drafts a 1-page feature/architecture/design
  doc from the current branch's diff, asks four sharp questions, enforces
  voice rules (no em dashes, banned filler words, 60-line cap on feature
  notes), then opens a PR to browseros-ai/internal-docs via a tmp clone.
- /ask-internal skill — answers team-internal questions by grepping
  internal-docs and the codebase, synthesizing with file:line citations,
  optionally executing surfaced commands with per-command confirmation,
  and drafting a new doc + PR if grep returns nothing useful.
- .github/workflows/sync-internal-docs.yml — every 4 hours, bumps the
  submodule pointer on dev directly (no PR; relies on dev branch
  protection blocking force-push). Skips silently until the submodule
  is configured. Uses url.insteadOf to rewrite the SSH submodule URL
  to HTTPS-with-token for the bot, while keeping SSH the local default.
- .claude/skills/document-internal/seeds/ — root README and three
  templates (feature-note, architecture-note, design-spec) ready to
  copy into the new internal-docs repo on rollout.

Design spec: .llm/superpowers/specs/2026-04-30-internal-docs-submodule-design.md

Manual prereqs (NOT in this PR — handled out-of-band):
1. Create private repo browseros-ai/internal-docs with branch protection on main.
2. Seed it with the contents of .claude/skills/document-internal/seeds/.
3. Create a bot account, mark as bypass actor on dev branch protection.
4. Add INTERNAL_DOCS_SYNC_TOKEN secret with repo + read access to internal-docs.
5. Once internal-docs exists, on a follow-up branch:
     git submodule add -b main git@github.com:browseros-ai/internal-docs.git .internal-docs
6. Send the team the one-time init snippet for their existing checkouts:
     git submodule update --init .internal-docs

* fix(internal-docs): address Greptile review feedback

- Workflow: rebase onto dev before push to handle non-fast-forward race;
  bump fetch-depth 1->50 so rebase has merge-base history.
- Workflow: move INTERNAL_DOCS_SYNC_TOKEN into step env: per Actions
  credential-injection pattern, instead of inlining in the script body.
- Skill (BASE bug): suppress git rev-parse stdout so SHA does not get
  captured into BASE alongside the literal 'dev'. Was breaking every
  downstream git log/diff call.
- Skill (tmp clone): trap 'rm -rf "$TMP"' EXIT after mktemp so cleanup
  always runs, even if any subsequent step fails.
2026-04-30 15:04:08 -07:00
shivammittal274
c81906ecbf feat(eval): add claude code eval agent (#885) 2026-05-01 02:25:08 +05:30
Nikhil
ffc0f09c86 feat(dev): add target-aware reset cleanup (#893)
* feat(dev): add target-aware reset cleanup

* fix(dev): address cleanup reset review comments
2026-04-30 13:34:52 -07:00
Nikhil
7fb53c9921 feat(dev): bootstrap setup from dev watch (#891)
* feat(dev): bootstrap setup from dev watch

* fix: address review feedback for PR #891
2026-04-30 13:00:46 -07:00
62 changed files with 3165 additions and 239 deletions

View File

@@ -0,0 +1,152 @@
---
name: ask-internal
description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---
# Ask Internal
Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
## When to use
- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
## Hard rules — never do these
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.
## Voice rules
Apply the same voice rules as `document-internal` to the synthesized answer:
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.
## Workflow
### Step 0: Pre-flight
```bash
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
echo ".internal-docs/ missing or empty. Submodule not configured?"
exit 0
}
```
### Step 1: Parse the question
Pull the keywords from the user's question. Drop stop words. Identify intent:
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
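The intent triage above can be sketched as a small classifier. This is an illustration, not the skill's actual logic; the `classify_intent` name and the exact phrase patterns are assumptions:

```shell
#!/bin/sh
# Hypothetical intent triage: map question phrasing to a search bias.
classify_intent() {
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    'how do i'*|'how to'*|'where do i configure'*) echo "setup" ;;
    'what is'*|'why does'*)                        echo "feature" ;;
    *)                                             echo "free-form" ;;
  esac
}

classify_intent "How do I reset my dogfood profile?"   # prints: setup
```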
### Step 2: Multi-source search
Run grep in parallel across two sources.
**Internal docs:**
```bash
grep -rni --include='*.md' '<keyword>' .internal-docs/
```
Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
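A sketch of that ranking step, assuming `grep -c` per file and a top-5 cutoff (the `rank_doc_hits` helper is illustrative, not part of the skill):

```shell
#!/bin/sh
# Hypothetical ranking: count keyword matches per doc, sort descending, keep top 5.
rank_doc_hits() {
  keyword=$1
  dir=${2:-.internal-docs}
  grep -ric --include='*.md' "$keyword" "$dir" 2>/dev/null |
    awk -F: '$NF > 0' |   # drop files with zero matches
    sort -t: -k2,2nr |    # most matches first
    head -5
}

rank_doc_hits "dogfood"
```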
**Codebase (skip vendored Chromium and `node_modules`):**
```bash
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
--exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
'<keyword>' packages/ scripts/ .config/ .github/
```
Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
### Step 3: Synthesize answer
Structure the response:
1. **Direct answer.** First sentence answers the question. No preamble.
2. **Steps if applicable.** Numbered list with exact commands.
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
### Step 4: Offer execution (only if commands surfaced)
If Step 3 produced executable commands the user could run, ask:
> Run these for you? (y / n / dry-run)
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
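The per-command gate can be sketched like this. The `is_mutation` heuristic is an illustration; the real skill reasons about each command rather than pattern-matching:

```shell
#!/bin/sh
# Hypothetical execution gate: print every command, require a per-command
# "y" only for commands that look state-mutating.
is_mutation() {
  case "$1" in
    'ls'|'ls '*|'cat '*|'git status'*) return 1 ;;  # read-only
    *) return 0 ;;                                  # assume mutation
  esac
}

run_gated() {
  cmd=$1
  echo "+ $cmd"                       # always print before running
  if is_mutation "$cmd"; then
    printf 'run this? %s [y/N] ' "$cmd"
    read -r answer
    [ "$answer" = "y" ] || { echo "skipped"; return 0; }
  fi
  eval "$cmd"
}
```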
### Step 5: Doc-not-found path
If Step 2 returned nothing useful (no doc hits AND no clear code answer):
1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
2. Ask: "Draft a new doc and open a PR to internal-docs?"
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
### Step 6: Completion status
Report one of:
- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
## Citation discipline
Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
If a doc says one thing and the code says another, surface the conflict explicitly:
> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
## Common Mistakes
**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.
## Red Flags
**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.

View File

@@ -0,0 +1,208 @@
---
name: document-internal
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---
# Document Internal
Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
## When to use
After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
## Hard rules — never do these
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
## Voice rules — enforced by Step 4
The skill MUST follow these rules; if it cannot, it refuses to draft. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.
## Workflow
### Step 0: Pre-flight
Bail with a clear message on any failure.
```bash
# Submodule must be initialized
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
# Must be on a feature branch
BRANCH=$(git branch --show-current)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
echo "On $BRANCH. Run from a feature branch."
exit 0
fi
# Determine base branch (default: dev for this repo, fall back to main).
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
# Gather context
git log "$BASE..HEAD" --oneline
git diff "$BASE...HEAD" --stat
gh pr view --json body -q .body 2>/dev/null # may be empty if no PR yet
```
### Step 1: Identify the doc
Ask the user for three things in one prompt:
1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
### Step 2: Decision brief — four sharp questions
Ask one question at a time. Each answer constrains the next. These force compression before drafting.
1. "In one sentence: what can someone now DO that they could not before?"
2. "What is the one design decision a future engineer needs to know?"
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
4. "Any sharp edges or gotchas? (or 'none')"
Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
### Step 3: Draft from the template
Read the matching template from `.internal-docs/_templates/`:
- `feature` → `feature-note.md`
- `architecture` → `architecture-note.md`
- `design` → `design-spec.md`
If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
Generate the 1-pager from the template, the four answers, and the diff context.
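The template lookup with seed fallback might look like the sketch below. Filenames follow the mapping above; the `template_for` helper is an assumption, not the skill's literal code:

```shell
#!/bin/sh
# Hypothetical template resolution: prefer the live submodule copy,
# fall back to the seeds bundled with the skill.
template_for() {
  type=$1
  case "$type" in
    feature)      name="feature-note.md" ;;
    architecture) name="architecture-note.md" ;;
    design)       name="design-spec.md" ;;
    *) echo "unknown doc type: $type" >&2; return 1 ;;
  esac
  for dir in .internal-docs/_templates \
             .claude/skills/document-internal/seeds/_templates; do
    if [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1
}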
### Step 4: Voice self-check
Scan the draft for violations:
- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).
If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
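A rough shell version of the mechanical parts of the check (em dash and banned-word scan; the sentence-length average is omitted, and the word list here is truncated):

```shell
#!/bin/sh
# Hypothetical voice lint: report em dashes and banned words in a draft file.
voice_violations() {
  file=$1
  bad=0
  if grep -n '—' "$file"; then
    echo "violation: em dash"; bad=1
  fi
  for w in delve crucial robust comprehensive nuanced multifaceted \
           furthermore moreover additionally pivotal leverage utilize; do
    if grep -qiw "$w" "$file"; then
      echo "violation: banned word '$w'"; bad=1
    fi
  done
  return $bad
}
```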
### Step 5: Show + iterate
Print the full draft. Ask:
> Edit needed? Paste any changes, or say "looks good".
Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
### Step 6: Open PR to internal-docs
Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
```bash
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT # cleans up even if any step below fails
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
cd "$TMP"
git checkout -b "docs/<slug>"
# Write the doc
mkdir -p "<type>" # features, architecture, designs, or setup
cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
<draft content>
DOC
# Update the root README index — insert one line under the matching section
# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
git commit -m "docs(<type>): <slug>"
git push -u origin "docs/<slug>"
PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
--head "docs/<slug>" \
--title "docs(<type>): <slug>" \
--body "$(cat <<'BODY'
## Summary
<one-line of what this doc covers>
## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
BODY
)")
cd -
echo "PR opened: $PR_URL"
# trap above cleans up $TMP on EXIT
```
If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
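A sanitizer sketch, assuming kebab-case with only `[a-z0-9-]` allowed (the exact character policy is an assumption):

```shell
#!/bin/sh
# Hypothetical slug sanitizer: lowercase, replace everything outside
# [a-z0-9] with '-', collapse repeats, trim leading/trailing dashes.
sanitize_slug() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -c 'a-z0-9' '-' |
    tr -s '-' |
    sed 's/^-//; s/-$//'
}

sanitize_slug 'Cowork MCP!'   # prints: cowork-mcp
```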
### Step 7: Completion status
Report one of:
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
## Doc type defaults
| Branch pattern | Default doc type | Default location |
|----------------|------------------|------------------|
| `feat/*` | feature | `features/` |
| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
| `rfc/*` or `design/*` | design | `designs/` |
| Otherwise | ask | ask |
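The branch-pattern defaults above, sketched as a case statement (the ">10 files in `packages/`" heuristic for refactor branches is left out of this sketch):

```shell
#!/bin/sh
# Hypothetical default resolver for doc type from the branch name.
default_doc_type() {
  case "$1" in
    feat/*)         echo "feature" ;;
    arch/*)         echo "architecture" ;;
    rfc/*|design/*) echo "design" ;;
    *)              echo "ask" ;;
  esac
}

default_doc_type "feat/cowork-mcp"   # prints: feature
```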
## Common Mistakes
**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.
**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.
## Red Flags
**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.
**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.

View File

@@ -0,0 +1,51 @@
# BrowserOS Internal Docs
Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
If you are reading this from a public clone of BrowserOS without team access: this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
## How to find what you need
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)
Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
## How to add a doc
Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
## Index
### Setup
<!-- one line per setup runbook: -->
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
### Features
<!-- one line per shipped feature, newest first: -->
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
### Architecture
<!-- one line per cross-cutting subsystem: -->
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
### Designs
<!-- one line per design spec, newest first: -->
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
## Templates
When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
## Voice
Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.

View File

@@ -0,0 +1,31 @@
---
title: <subsystem name>
owner: <github handle>
status: current | deprecated
date: YYYY-MM-DD
related-features: [feature-slug-1, feature-slug-2]
---
# <subsystem name>
## What this subsystem does
<1-2 paragraphs. The top-level responsibility. Boundaries.>
## Architecture
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
## Constraints
<Hard rules the design enforces. "X must never call Y" type statements.>
## Decisions made
<Numbered list of non-obvious decisions and the reason for each.>
## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here
## How to evolve this
<Where to add things. Which tests to expect to update. What NOT to touch.>
## Open questions
<What is still being figured out. Empty if none.>

View File

@@ -0,0 +1,34 @@
---
title: <design name>
owner: <github handle>
status: proposed | accepted | rejected | superseded
date: YYYY-MM-DD
supersedes: <design-slug or none>
---
# <design name>
## Goal
<2-4 sentences. What this design is trying to accomplish.>
## Context
<1-2 paragraphs. The current state, what is failing, why this needs to change.>
## Selected Approach
<The chosen design at a high level. Architecture, components, data flow.>
## Alternatives Considered
### 1. <name>
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
### 2. <name>
<Same shape.>
## Out of Scope
<What this design does NOT cover. Defer references.>
## Rollout
<Numbered steps from "nothing exists" to "fully shipped".>
## Open Questions
<Resolved during design? Empty. Unresolved? List with owner.>

View File

@@ -0,0 +1,29 @@
---
title: <feature name>
owner: <github handle>
status: shipped | wip | deprecated
date: YYYY-MM-DD
prs: ["#NNN"]
tags: [agent, browser, mcp]
---
# <feature name>
## What it does
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
## Why we built it
<1-2 sentences. Motivation. What pain it removed or what unlocked.>
## How it works
<3-6 sentences. The flow at a high level. Name the key files.>
## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does
## How to run / test it locally
<Bullet list of commands. Empty section if N/A; do not fake.>
## Gotchas
<Known sharp edges. "If you see X, that's why." Empty if N/A.>

View File

@@ -0,0 +1,53 @@
name: Sync internal-docs submodule
on:
schedule:
- cron: '0 */4 * * *'
workflow_dispatch:
jobs:
sync:
name: Bump internal-docs submodule pointer on dev
runs-on: ubuntu-latest
steps:
- name: Rewrite SSH submodule URL to HTTPS-with-token
env:
TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
run: |
git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
- uses: actions/checkout@v4
with:
token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
submodules: true
ref: dev
fetch-depth: 50
- name: Bump submodule pointer if internal-docs has new commits
env:
GH_TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
run: |
set -e
# Skip if submodule not yet configured (handoff window before someone adds it)
if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
exit 0
fi
git submodule update --remote --merge .internal-docs
if git diff --quiet .internal-docs; then
echo "No internal-docs changes to sync."
exit 0
fi
git config user.name "browseros-bot"
git config user.email "bot@browseros.ai"
git add .internal-docs
git commit -m "chore: sync internal-docs submodule"
# Rebase onto latest dev to absorb any commits that landed during the run,
# then push. set -e takes care of failing the run on rebase conflict.
git pull --rebase origin dev
git push origin dev
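The `url.insteadOf` rewrite the workflow sets up can be reproduced locally to see what git will do with the SSH submodule URL. The token value below is a placeholder; the workflow uses `secrets.INTERNAL_DOCS_SYNC_TOKEN`:

```shell
#!/bin/sh
# Reproduce the workflow's URL rewrite in a scratch HOME so the
# --global config stays isolated, then show the effective mapping.
HOME="$(mktemp -d)"
export HOME
TOKEN="placeholder-token"   # placeholder, not a real secret
git config --global \
  "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" \
  "git@github.com:"

# Any fetch of git@github.com:... now resolves to the tokenized HTTPS URL.
git config --global --get-regexp '^url\.'
```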

.gitmodules vendored
View File

@@ -0,0 +1,4 @@
[submodule ".internal-docs"]
path = .internal-docs
url = git@github.com:browseros-ai/internal-docs.git
branch = main

.internal-docs Submodule

Submodule .internal-docs added at 01085a4ef5

View File

@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
# If BrowserOS is installed but not running
browseros-cli launch # opens BrowserOS, waits for server
# Configure the CLI (auto-discovers running BrowserOS)
browseros-cli init --auto # detects server URL and saves config
# Configure the CLI with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Verify connection
browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
browseros-cli init # interactive — prompts for URL
```
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
### CLI updates
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
| `--debug` | `BOS_DEBUG=1` | Debug output |
| `--timeout, -t` | | Request timeout (default: 2m) |
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
## Testing
@@ -179,7 +179,7 @@ apps/cli/
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
├── cmd/
│ ├── root.go # Root command, global flags
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
│ ├── init.go # Server URL configuration (URL arg or interactive)
│ ├── install.go # install (download BrowserOS for current platform)
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
│ ├── open.go # open (new_page / new_hidden_page)

View File

@@ -17,8 +17,6 @@ import (
)
func init() {
var autoDiscover bool
cmd := &cobra.Command{
Use: "init [url]",
Short: "Configure the BrowserOS server connection",
@@ -34,9 +32,8 @@ You can provide the full URL or just the port number:
browseros-cli init http://127.0.0.1:9000/mcp
browseros-cli init 9000
Three modes:
Modes:
browseros-cli init <url> Non-interactive (full URL or port number)
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
browseros-cli init Interactive prompt`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.MaximumNArgs(1),
@@ -49,22 +46,9 @@ Three modes:
switch {
case len(args) == 1:
// Non-interactive: URL provided as argument
input = args[0]
case autoDiscover:
// Auto-discover: server.json → config → probe common ports
discovered := probeRunningServer()
if discovered == "" {
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
" If not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", 1)
}
input = discovered
fmt.Printf("Auto-discovered server at %s\n", input)
default:
// Interactive prompt (original behavior)
fmt.Println()
bold.Println("BrowserOS CLI Setup")
fmt.Println()
@@ -95,12 +79,14 @@ Three modes:
output.Errorf(1, "invalid URL: %s", input)
}
// Verify connectivity
fmt.Printf("Checking connection to %s ...\n", baseURL)
client := &http.Client{Timeout: 5 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
output.Errorf(1, "cannot connect to %s: %v\n\n"+
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
"Then run: browseros-cli init <Server URL>\n"+
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
}
resp.Body.Close()
@@ -121,6 +107,5 @@ Three modes:
},
}
cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
rootCmd.AddCommand(cmd)
}

View File

@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)
After installation:
browseros-cli launch # start BrowserOS
browseros-cli init --auto # configure the CLI`,
browseros-cli init <url> # configure the CLI with the Server URL`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
fmt.Println()
bold.Println("Next steps:")
dim.Println(" browseros-cli launch # start BrowserOS")
dim.Println(" browseros-cli init --auto # configure the CLI")
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
},
}

View File

@@ -1,6 +1,7 @@
package cmd
import (
"encoding/json"
"fmt"
"net/http"
"os"
@@ -38,6 +39,7 @@ If BrowserOS is already running, reports the server URL.`,
if url := probeRunningServer(); url != "" {
green.Printf("BrowserOS is already running at %s\n", url)
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
return
}
@@ -63,7 +65,7 @@ If BrowserOS is already running, reports the server URL.`,
green.Printf("BrowserOS is ready at %s\n", url)
fmt.Println()
dim.Println("Next: browseros-cli init --auto")
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
},
}
@@ -75,39 +77,77 @@ If BrowserOS is already running, reports the server URL.`,
// Server probing
// ---------------------------------------------------------------------------
// probeRunningServer checks server.json, config, and common ports for a running server.
var commonBrowserOSPorts = []int{9100, 9200, 9300}
// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
func probeRunningServer() string {
check := func(baseURL string) bool {
client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
client := &http.Client{Timeout: 2 * time.Second}
// 1. server.json — written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && check(url) {
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url
}
// 2. Saved config / env var
if url := defaultServerURL(); url != "" && check(url) {
if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
return url
}
// 3. Probe common BrowserOS ports as last resort
for _, port := range []int{9100, 9200, 9300} {
return probeCommonServerPorts(client)
}
func checkServerHealth(client *http.Client, baseURL string) bool {
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
func probeCommonServerPorts(client *http.Client) string {
for _, port := range commonBrowserOSPorts {
url := fmt.Sprintf("http://127.0.0.1:%d", port)
if check(url) {
if checkServerHealth(client, url) {
return url
}
}
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
//
// Normal command resolution must not call this because it can override a URL the
// user explicitly saved with `browseros-cli init <Server URL>`.
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func mcpEndpointURL(baseURL string) string {
return strings.TrimSuffix(baseURL, "/") + "/mcp"
}
// ---------------------------------------------------------------------------
// Platform-native installation detection
// ---------------------------------------------------------------------------
@@ -117,7 +157,8 @@ func probeRunningServer() string {
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
// and registry uninstall key (per-user Chromium install pattern)
//
// and registry uninstall key (per-user Chromium install pattern)
func isBrowserOSInstalled() bool {
switch runtime.GOOS {
case "darwin":
@@ -271,14 +312,11 @@ func waitForServer(maxWait time.Duration) (string, bool) {
for time.Now().Before(deadline) {
// server.json is written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" {
resp, err := client.Get(url + "/health")
if err == nil {
resp.Body.Close()
if resp.StatusCode == 200 {
return url, true
}
}
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url, true
}
if url := probeCommonServerPorts(client); url != "" {
return url, true
}
fmt.Print(".")
time.Sleep(1 * time.Second)
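The launch flow above pairs `mcpEndpointURL` (base URL to MCP endpoint) with a `normalizeServerURL` that accepts a pasted endpoint back. A minimal round-trip sketch; `normalizeServerURL`'s real body is mostly elided in this diff, so this version is a guess constrained only by the tests (trimmed whitespace, `/mcp` suffix stripped):

```go
package main

import (
	"fmt"
	"strings"
)

// Mirrors mcpEndpointURL from the diff: base URL -> MCP endpoint.
func mcpEndpointURL(baseURL string) string {
	return strings.TrimSuffix(baseURL, "/") + "/mcp"
}

// Hypothetical sketch of normalizeServerURL: the tests require that a pasted
// Server URL like " http://127.0.0.1:9115/mcp " normalizes to the bare base URL.
func normalizeServerURL(raw string) string {
	s := strings.TrimSpace(raw)
	s = strings.TrimSuffix(s, "/")
	s = strings.TrimSuffix(s, "/mcp")
	return s
}

func main() {
	fmt.Println(mcpEndpointURL("http://127.0.0.1:9115"))           // http://127.0.0.1:9115/mcp
	fmt.Println(normalizeServerURL(" http://127.0.0.1:9115/mcp ")) // http://127.0.0.1:9115
}
```

The round-trip property (normalize after append is a no-op) is what lets `init <Server URL>` accept either form.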

View File

@@ -0,0 +1,99 @@
package cmd
import (
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"os"
"path/filepath"
"strconv"
"testing"
"time"
"browseros-cli/config"
)
func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
discoveredServer := newHealthyServer(t)
configServer := newHealthyServer(t)
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := probeRunningServer()
if got != normalizeServerURL(discoveredServer.URL) {
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
}
}
func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
server := newHealthyServer(t)
port := serverPort(t, server.URL)
originalPorts := commonBrowserOSPorts
commonBrowserOSPorts = []int{port}
t.Cleanup(func() {
commonBrowserOSPorts = originalPorts
})
got, ok := waitForServer(100 * time.Millisecond)
if !ok {
t.Fatal("waitForServer() ok = false, want true")
}
if got != normalizeServerURL(server.URL) {
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
}
}
func newHealthyServer(t *testing.T) *httptest.Server {
t.Helper()
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/health" {
http.NotFound(w, r)
return
}
w.WriteHeader(http.StatusOK)
}))
t.Cleanup(server.Close)
return server
}
func serverPort(t *testing.T, rawURL string) int {
t.Helper()
parsed, err := url.Parse(rawURL)
if err != nil {
t.Fatalf("url.Parse() error = %v", err)
}
_, portText, err := net.SplitHostPort(parsed.Host)
if err != nil {
t.Fatalf("net.SplitHostPort() error = %v", err)
}
port, err := strconv.Atoi(portText)
if err != nil {
t.Fatalf("strconv.Atoi() error = %v", err)
}
return port
}

View File

@@ -2,10 +2,8 @@ package cmd
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"time"
@@ -289,18 +287,15 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
}
}
// defaultServerURL returns the implicit target from user-controlled settings only.
//
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
// ignore it so a saved URL is not silently overridden by another running server.
func defaultServerURL() string {
// 1. Explicit env var always wins
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
return env
}
// 2. Live discovery file from running BrowserOS (most current)
if url := loadBrowserosServerURL(); url != "" {
return url
}
// 3. Saved config (may be stale if port changed)
cfg, err := config.Load()
if err == nil {
if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -311,33 +306,6 @@ func defaultServerURL() string {
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func normalizeServerURL(raw string) string {
normalized := strings.TrimSpace(raw)
@@ -369,8 +337,10 @@ func validateServerURL(raw string) (string, error) {
return "", fmt.Errorf(
"BrowserOS server URL is not configured.\n\n" +
" If BrowserOS is running: browseros-cli init --auto\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
)
}
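The `defaultServerURL` change above narrows resolution to user-controlled settings only: the env var wins, then the saved config, and the BrowserOS-written discovery file is deliberately skipped. A toy sketch of that precedence (parameter names are illustrative, not the CLI's):

```go
package main

import "fmt"

// resolveServerURL sketches the new precedence: explicit env var first,
// saved config second, and no third fallback to the runtime discovery file,
// so a URL the user saved is never silently overridden.
func resolveServerURL(envURL, savedURL string) string {
	if envURL != "" {
		return envURL
	}
	return savedURL
}

func main() {
	fmt.Println(resolveServerURL("http://127.0.0.1:9115", "http://127.0.0.1:9000")) // env wins
	fmt.Println(resolveServerURL("", "http://127.0.0.1:9000"))                      // config fallback
}
```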

View File

@@ -1,8 +1,13 @@
package cmd
import (
"os"
"path/filepath"
"strings"
"testing"
"time"
"browseros-cli/config"
)
func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -100,6 +105,76 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
}
}
func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if got := defaultServerURL(); got != "" {
t.Fatalf("defaultServerURL() = %q, want empty", got)
}
}
func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
if got != "http://127.0.0.1:9115" {
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestValidateServerURLExplainsManualInit(t *testing.T) {
_, err := validateServerURL("")
if err == nil {
t.Fatal("validateServerURL() error = nil, want setup instructions")
}
msg := err.Error()
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
}
if strings.Contains(msg, "init --auto") {
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
}
}
func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
done := make(chan struct{})
returned := make(chan struct{})

View File

@@ -44,10 +44,7 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {
session, err := sdkClient.Connect(ctx, transport, nil)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
}
return session, nil
}
@@ -187,10 +184,7 @@ func (c *Client) Status() (map[string]any, error) {
func (c *Client) restGET(path string) (map[string]any, error) {
resp, err := c.HTTPClient.Get(c.BaseURL + path)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
}
defer resp.Body.Close()
@@ -205,3 +199,14 @@ func (c *Client) restGET(path string) (map[string]any, error) {
}
return data, nil
}
// connectionSetupInstructions explains how to recover from a stale or missing server URL.
func connectionSetupInstructions() string {
return "\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" Run once with: browseros-cli --server <Server URL> health\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install"
}

View File

@@ -31,8 +31,8 @@ browseros-cli install
# Start BrowserOS
browseros-cli launch
# Auto-configure MCP settings for your AI tools
browseros-cli init --auto
# Configure MCP settings with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Verify everything is working
browseros-cli health

View File

@@ -9,6 +9,7 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
- **Bun** runtime
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.
## Quick Start
@@ -67,7 +68,7 @@ This lets us run the same suite against multiple model setups without copying th
```txt
agisdk-daily-10 + kimi-fireworks
agisdk-daily-10 + claude-sonnet
agisdk-daily-10 + claude-opus
agisdk-daily-10 + clado-action-000159
```
@@ -79,6 +80,7 @@ For `orchestrator-executor` suites, there can also be an executor model/backend.
|------|-------------|
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |
### Single agent
@@ -119,6 +121,24 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
}
```
### Claude Code
Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
```json
{
"agent": {
"type": "claude-code",
"model": "opus"
}
}
```
```bash
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
```
## Graders
| Name | Description |
@@ -151,6 +171,7 @@ The `apiKey` field supports two formats:
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
@@ -194,7 +215,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": true
"headless": false
}
```

View File

@@ -7,7 +7,7 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../../data/webbench-2of4-50.jsonl",
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
@@ -21,6 +21,6 @@
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["performance_grader"],
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -23,7 +23,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": true
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

View File

@@ -0,0 +1,22 @@
{
"agent": {
"type": "claude-code",
"model": "opus"
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -14,7 +14,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": true
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

View File

@@ -0,0 +1,238 @@
import { writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import { DEFAULT_TIMEOUT_MS } from '../../constants'
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
import { withEvalTimeout } from '../../utils/with-eval-timeout'
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
import {
type ClaudeCodeProcessRunner,
createClaudeCodeProcessRunner,
} from './process-runner'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from './stream-parser'
export interface ClaudeCodeEvaluatorDeps {
processRunner?: ClaudeCodeProcessRunner
}
export class ClaudeCodeEvaluator implements AgentEvaluator {
private processRunner: ClaudeCodeProcessRunner
constructor(
private ctx: AgentContext,
deps: ClaudeCodeEvaluatorDeps = {},
) {
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
}
async execute(): Promise<AgentResult> {
const { config, task, capture, taskOutputDir } = this.ctx
const startTime = Date.now()
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
await capture.messageLogger.logUser(task.query)
if (config.agent.type !== 'claude-code') {
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
}
const agentConfig = config.agent
const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
await writeFile(
mcpConfigPath,
JSON.stringify(
buildClaudeCodeMcpConfig(config.browseros.server_url),
null,
2,
),
)
const parser = new ClaudeCodeStreamParser()
const toolNamesById = new Map<string, string>()
const prompt = buildClaudeCodePrompt(task.query)
const args = buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config: agentConfig,
})
const { terminationReason } = await withEvalTimeout(
timeoutMs,
capture,
async (signal) => {
const runResult = await this.processRunner.run({
executable: agentConfig.claudePath,
args,
cwd: taskOutputDir,
signal,
onStdoutLine: async (line) => {
const events = parser.pushLine(line)
for (const event of events) {
await this.handleStreamEvent(event, toolNamesById)
}
},
})
if (runResult.exitCode !== 0) {
const message =
runResult.stderr.trim() ||
`Claude Code exited with status ${runResult.exitCode}`
capture.addError('agent_execution', message, {
exitCode: runResult.exitCode,
})
if (!parser.getLastText()) {
throw new Error(message)
}
}
for (const error of runResult.streamErrors ?? []) {
capture.addWarning(
'message_logging',
`Claude Code stream event processing failed: ${error}`,
)
}
return runResult
},
)
const endTime = Date.now()
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
const metadata = {
query_id: task.query_id,
dataset: task.dataset,
query: task.query,
started_at: new Date(startTime).toISOString(),
completed_at: new Date(endTime).toISOString(),
total_duration_ms: endTime - startTime,
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
termination_reason: terminationReason,
final_answer: finalAnswer,
errors: capture.getErrors(),
warnings: capture.getWarnings(),
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
agent_config: {
type: 'claude-code' as const,
model: agentConfig.model,
},
grader_results: {},
}
await capture.trajectorySaver.saveMetadata(metadata)
return {
metadata,
messages: capture.getMessages(),
finalAnswer,
}
}
private async handleStreamEvent(
event: UIMessageStreamEvent,
toolNamesById: Map<string, string>,
): Promise<void> {
const { capture, task } = this.ctx
let screenshot: number | undefined
if (event.type === 'tool-input-available') {
toolNamesById.set(event.toolCallId, event.toolName)
if (isPageInput(event.input)) {
capture.setActivePageId(event.input.page)
}
}
if (
event.type === 'tool-output-available' ||
event.type === 'tool-output-error'
) {
const toolName = toolNamesById.get(event.toolCallId)
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
screenshot = await this.captureScreenshot()
}
}
await capture.messageLogger.logStreamEvent(event, screenshot)
capture.emitEvent(task.query_id, {
...event,
...(screenshot !== undefined && { screenshot }),
})
}
private async captureScreenshot(): Promise<number | undefined> {
const { capture, task } = this.ctx
try {
const screenshot = await capture.screenshot.capture(
capture.getActivePageId(),
)
capture.emitEvent(task.query_id, {
type: 'screenshot-captured',
screenshot,
})
return screenshot
} catch {
return undefined
}
}
}
function isPageInput(input: unknown): input is { page: number } {
return (
typeof input === 'object' &&
input !== null &&
'page' in input &&
typeof input.page === 'number'
)
}
function buildClaudeCodePrompt(taskQuery: string): string {
return [
'You are running inside BrowserOS eval.',
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
'When the task is complete, respond with the final answer only.',
'If blocked, explain the blocker clearly.',
'',
`Task: ${taskQuery}`,
].join('\n')
}
function buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config,
}: {
prompt: string
mcpConfigPath: string
config: ClaudeCodeAgentConfig
}): string[] {
const args = [
'-p',
prompt,
'--mcp-config',
mcpConfigPath,
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
]
if (config.model) args.push('--model', config.model)
args.push(...config.extraArgs)
return args
}
function buildClaudeCodeMcpConfig(serverUrl: string) {
const trimmed = serverUrl.replace(/\/$/, '')
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
return {
mcpServers: {
browseros: {
type: 'http',
url,
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
},
},
}
}
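`buildClaudeCodeMcpConfig` appends `/mcp` idempotently so either a bare server URL or a full endpoint works. The same URL handling translated to Go as a sketch (the function name here is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// mcpConfigURL sketches the evaluator's endpoint handling: drop one trailing
// slash, then append /mcp only if it is not already present.
func mcpConfigURL(serverURL string) string {
	trimmed := strings.TrimSuffix(serverURL, "/")
	if strings.HasSuffix(trimmed, "/mcp") {
		return trimmed
	}
	return trimmed + "/mcp"
}

func main() {
	fmt.Println(mcpConfigURL("http://127.0.0.1:9110"))     // http://127.0.0.1:9110/mcp
	fmt.Println(mcpConfigURL("http://127.0.0.1:9110/"))    // http://127.0.0.1:9110/mcp
	fmt.Println(mcpConfigURL("http://127.0.0.1:9110/mcp")) // http://127.0.0.1:9110/mcp
}
```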

View File

@@ -0,0 +1,114 @@
export interface ClaudeCodeRunOptions {
executable: string
args: string[]
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface ClaudeCodeRunResult {
exitCode: number
stderr: string
streamErrors?: string[]
}
export interface ClaudeCodeProcessRunner {
run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
}
export interface SpawnOptions {
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface CreateClaudeCodeProcessRunnerDeps {
spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
}
export function createClaudeCodeProcessRunner(
deps: CreateClaudeCodeProcessRunnerDeps = {},
): ClaudeCodeProcessRunner {
const spawn = deps.spawn ?? spawnClaudeCode
return {
run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
spawn([executable, ...args], { cwd, signal, onStdoutLine }),
}
}
async function spawnClaudeCode(
cmd: string[],
options: SpawnOptions,
): Promise<ClaudeCodeRunResult> {
const proc = Bun.spawn({
cmd,
cwd: options.cwd,
stdin: 'ignore',
stdout: 'pipe',
stderr: 'pipe',
})
const abort = () => {
try {
proc.kill('SIGTERM')
} catch {
// Process may already have exited.
}
}
options.signal?.addEventListener('abort', abort, { once: true })
try {
const streamErrors: string[] = []
const stdoutPromise = readLines(
proc.stdout,
options.onStdoutLine,
streamErrors,
)
const stderrPromise = new Response(proc.stderr).text()
const exitCode = await proc.exited
await stdoutPromise
const stderr = await stderrPromise
return { exitCode, stderr, streamErrors }
} finally {
options.signal?.removeEventListener('abort', abort)
}
}
async function readLines(
stream: ReadableStream<Uint8Array>,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
const reader = stream.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
await emitLine(line, onLine, streamErrors)
}
}
buffer += decoder.decode()
if (buffer.length > 0) {
await emitLine(buffer, onLine, streamErrors)
}
}
async function emitLine(
line: string,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
try {
await onLine(line)
} catch (error) {
streamErrors.push(error instanceof Error ? error.message : String(error))
}
}
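`readLines` above does manual newline buffering: each stdout chunk is appended to a buffer, complete lines are emitted, and the trailing partial line waits for the next chunk. The same technique in Go as a small sketch (not the CLI's code):

```go
package main

import (
	"fmt"
	"strings"
)

// feed appends a chunk to the buffer and returns every complete line,
// keeping the trailing partial line buffered until more data arrives.
func feed(buffer *string, chunk string) []string {
	*buffer += chunk
	parts := strings.Split(*buffer, "\n")
	*buffer = parts[len(parts)-1] // partial last line stays buffered
	return parts[:len(parts)-1]
}

func main() {
	var buf string
	// A JSON object split across two chunks: nothing emitted until the newline.
	fmt.Println(len(feed(&buf, `{"type":"assi`))) // 0
	lines := feed(&buf, "stant\"}\n{\"type\":")
	fmt.Println(lines[0]) // the reassembled first line
	fmt.Println(buf)      // the buffered start of the second line
}
```

This is why the evaluator can hand each complete `stream-json` line to the parser even though the subprocess writes in arbitrary chunk sizes.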

View File

@@ -0,0 +1,142 @@
import { randomUUID } from 'node:crypto'
import type { UIMessageStreamEvent } from '../../types'
type JsonObject = Record<string, unknown>
export class ClaudeCodeStreamParser {
private lastText: string | null = null
private toolCallCount = 0
pushLine(line: string): UIMessageStreamEvent[] {
const trimmed = line.trim()
if (!trimmed) return []
let parsed: unknown
try {
parsed = JSON.parse(trimmed)
} catch {
return []
}
if (!isObject(parsed)) return []
if (parsed.type === 'assistant') {
return this.parseAssistantMessage(parsed)
}
if (parsed.type === 'user') {
return this.parseUserMessage(parsed)
}
if (parsed.type === 'result' && typeof parsed.result === 'string') {
this.lastText = parsed.result
}
return []
}
getLastText(): string | null {
return this.lastText
}
getToolCallCount(): number {
return this.toolCallCount
}
private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (block.type === 'text' && typeof block.text === 'string') {
const id = randomUUID()
this.lastText = block.text
events.push(
{ type: 'text-start', id },
{ type: 'text-delta', id, delta: block.text },
{ type: 'text-end', id },
)
} else if (
block.type === 'tool_use' &&
typeof block.id === 'string' &&
typeof block.name === 'string'
) {
this.toolCallCount++
events.push({
type: 'tool-input-available',
toolCallId: block.id,
toolName: block.name,
input: block.input,
})
}
}
return events
}
private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (
block.type !== 'tool_result' ||
typeof block.tool_use_id !== 'string'
) {
continue
}
if (block.is_error === true) {
events.push({
type: 'tool-output-error',
toolCallId: block.tool_use_id,
errorText: stringifyToolContent(block.content),
})
} else {
events.push({
type: 'tool-output-available',
toolCallId: block.tool_use_id,
output: normalizeToolContent(block.content),
})
}
}
return events
}
}
export function shouldCaptureScreenshotForTool(toolName: string): boolean {
if (!toolName.startsWith('mcp__browseros__')) return false
return !toolName.endsWith('__take_screenshot')
}
function contentBlocks(message: JsonObject): JsonObject[] {
const inner = isObject(message.message) ? message.message : message
return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
}
function isObject(value: unknown): value is JsonObject {
return typeof value === 'object' && value !== null
}
function normalizeToolContent(content: unknown): unknown {
if (!Array.isArray(content)) return content
return content.map((item) => {
if (
isObject(item) &&
item.type === 'text' &&
typeof item.text === 'string'
) {
return item.text
}
return item
})
}
function stringifyToolContent(content: unknown): string {
const normalized = normalizeToolContent(content)
if (typeof normalized === 'string') return normalized
try {
return JSON.stringify(normalized)
} catch {
return String(normalized)
}
}
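The screenshot gating rule in `shouldCaptureScreenshotForTool` is a plain prefix/suffix check: only BrowserOS MCP tools trigger a capture, and `take_screenshot` itself is excluded (presumably since that call already produces one). Translated to Go as a sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldCaptureScreenshotForTool mirrors the TypeScript helper: capture after
// any mcp__browseros__ tool except take_screenshot itself.
func shouldCaptureScreenshotForTool(toolName string) bool {
	if !strings.HasPrefix(toolName, "mcp__browseros__") {
		return false
	}
	return !strings.HasSuffix(toolName, "__take_screenshot")
}

func main() {
	fmt.Println(shouldCaptureScreenshotForTool("mcp__browseros__click"))           // true
	fmt.Println(shouldCaptureScreenshotForTool("mcp__browseros__take_screenshot")) // false
	fmt.Println(shouldCaptureScreenshotForTool("Bash"))                            // false
}
```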

View File

@@ -1,3 +1,4 @@
import { ClaudeCodeEvaluator } from './claude-code'
import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
import { SingleAgentEvaluator } from './single-agent'
import type { AgentContext, AgentEvaluator } from './types'
@@ -8,6 +9,8 @@ export function createAgent(context: AgentContext): AgentEvaluator {
return new SingleAgentEvaluator(context)
case 'orchestrator-executor':
return new OrchestratorExecutorEvaluator(context)
case 'claude-code':
return new ClaudeCodeEvaluator(context)
}
}

View File

@@ -105,7 +105,10 @@ export class TrajectorySaver {
errors: [],
warnings: [],
agent_config: {
type: agentConfig.type as 'single' | 'orchestrator-executor',
type: agentConfig.type as
| 'single'
| 'orchestrator-executor'
| 'claude-code',
model: agentConfig.model,
},
grader_results: {},

View File

@@ -82,6 +82,16 @@ function suiteToEvalConfig(
})
}
if (suite.agent.type === 'claude-code') {
return EvalConfigSchema.parse({
...base,
agent: {
type: 'claude-code',
...(variant.agent.model && { model: variant.agent.model }),
},
})
}
const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
const executor =
executorBackend === 'clado'
@@ -135,7 +145,10 @@ export async function resolveSuiteCommand(
const loaded = await loadSuite(options.suitePath)
const variant = resolveVariant({
variantId: options.variantId,
provider: options.provider,
provider:
loaded.suite.agent.type === 'claude-code'
? 'claude-code'
: options.provider,
model: options.model,
apiKey: options.apiKey,
baseUrl: options.baseUrl,

View File

@@ -2,6 +2,7 @@ export interface PythonEvaluatorOptions {
scriptPath: string
input: unknown
timeoutMs: number
pythonPath?: string
}
export interface PythonEvaluatorResult<T> {
@@ -15,7 +16,9 @@ export interface PythonEvaluatorResult<T> {
export async function runPythonJsonEvaluator<T>(
options: PythonEvaluatorOptions,
): Promise<PythonEvaluatorResult<T>> {
const proc = Bun.spawn(['python3', options.scriptPath], {
const pythonPath =
options.pythonPath || process.env.BROWSEROS_EVAL_PYTHON || 'python3'
const proc = Bun.spawn([pythonPath, options.scriptPath], {
stdin: 'pipe',
stdout: 'pipe',
stderr: 'pipe',

View File

@@ -33,6 +33,13 @@ function variantSource(config: EvalConfig): {
baseUrl?: string
supportsImages?: boolean
} {
if (config.agent.type === 'claude-code') {
return {
provider: 'claude-code',
model: config.agent.model ?? 'default',
}
}
const agent =
config.agent.type === 'single' ? config.agent : config.agent.orchestrator
if (!agent.model) {
@@ -76,10 +83,7 @@ export async function adaptEvalConfigFile(
suite: {
id,
dataset: evalConfig.dataset,
agent:
evalConfig.agent.type === 'single'
? { type: 'tool-loop' }
: { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' },
agent: suiteAgent(evalConfig, backend),
graders: evalConfig.graders ?? [],
workers: evalConfig.num_workers,
restartBrowserPerTask: evalConfig.restart_server_per_task,
@@ -99,3 +103,17 @@ export async function adaptEvalConfigFile(
}),
}
}
function suiteAgent(
config: EvalConfig,
backend: ReturnType<typeof executorBackend>,
): EvalSuite['agent'] {
switch (config.agent.type) {
case 'single':
return { type: 'tool-loop' }
case 'orchestrator-executor':
return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
case 'claude-code':
return { type: 'claude-code' }
}
}

View File

@@ -57,10 +57,30 @@ export function resolveVariant(
options: ResolveVariantOptions = {},
): EvalVariant {
const env = options.env ?? process.env
const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
const provider =
options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible'
const model = options.model ?? env.EVAL_AGENT_MODEL
if (provider === 'claude-code') {
const id = options.variantId ?? env.EVAL_VARIANT ?? 'claude-code'
return {
id,
agent: {
provider,
model: model ?? '',
},
publicMetadata: {
id,
agent: {
provider,
model: model || 'default',
apiKeyConfigured: false,
},
},
}
}
const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
const apiKey = options.apiKey ?? env.EVAL_AGENT_API_KEY
const apiKeyEnv =
options.apiKeyEnv ?? (options.apiKey ? undefined : 'EVAL_AGENT_API_KEY')

View File

@@ -8,6 +8,7 @@ export const SuiteAgentSchema = z
'single',
'orchestrated',
'orchestrator-executor',
'claude-code',
]),
executorBackend: z.enum(['tool-loop', 'clado']).optional(),
})

View File

@@ -19,9 +19,19 @@ export const OrchestratorExecutorConfigSchema = z.object({
}),
})
export const ClaudeCodeAgentConfigSchema = z
.object({
type: z.literal('claude-code'),
model: z.string().min(1).optional(),
claudePath: z.string().min(1).default('claude'),
extraArgs: z.array(z.string()).default([]),
})
.strict()
export const AgentConfigSchema = z.discriminatedUnion('type', [
SingleAgentConfigSchema,
OrchestratorExecutorConfigSchema,
ClaudeCodeAgentConfigSchema,
])
export const EvalConfigSchema = z.object({
@@ -53,5 +63,6 @@ export type SingleAgentConfig = z.infer<typeof SingleAgentConfigSchema>
export type OrchestratorExecutorConfig = z.infer<
typeof OrchestratorExecutorConfigSchema
>
export type ClaudeCodeAgentConfig = z.infer<typeof ClaudeCodeAgentConfigSchema>
export type AgentConfig = z.infer<typeof AgentConfigSchema>
export type EvalConfig = z.infer<typeof EvalConfigSchema>

View File

@@ -2,6 +2,8 @@
export {
type AgentConfig,
AgentConfigSchema,
type ClaudeCodeAgentConfig,
ClaudeCodeAgentConfigSchema,
type EvalConfig,
EvalConfigSchema,
type OrchestratorExecutorConfig,

View File

@@ -13,7 +13,7 @@ export const GraderResultSchema = z.object({
// Agent config in metadata
const AgentConfigMetaSchema = z
.object({
type: z.enum(['single', 'orchestrator-executor']),
type: z.enum(['single', 'orchestrator-executor', 'claude-code']),
model: z.string().optional(),
})
.passthrough()

View File

@@ -59,7 +59,7 @@ export async function validateConfig(
) {
envVarsToCheck.push(config.agent.apiKey)
}
} else {
} else if (config.agent.type === 'orchestrator-executor') {
const { orchestrator, executor } = config.agent
if (orchestrator.apiKey && isEnvVarName(orchestrator.apiKey)) {
envVarsToCheck.push(orchestrator.apiKey)

View File

@@ -0,0 +1,268 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, readFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createAgent } from '../../src/agents'
import { ClaudeCodeEvaluator } from '../../src/agents/claude-code'
import { CaptureContext } from '../../src/capture/context'
import {
AgentConfigSchema,
type EvalConfig,
EvalConfigSchema,
type Task,
TaskMetadataSchema,
} from '../../src/types'
function config(): EvalConfig {
return {
agent: {
type: 'claude-code',
model: 'opus',
claudePath: 'claude',
extraArgs: [],
},
dataset: 'data/test.jsonl',
num_workers: 1,
restart_server_per_task: false,
browseros: {
server_url: 'http://127.0.0.1:9110',
base_cdp_port: 9010,
base_server_port: 9110,
base_extension_port: 9310,
load_extensions: false,
headless: false,
},
graders: [],
}
}
const task: Task = {
query_id: 'task-1',
dataset: 'test',
query: 'Find the title',
graders: [],
metadata: {
original_task_id: 'task-1',
},
}
describe('ClaudeCodeEvaluator', () => {
it('accepts claude-code config defaults without permission mode', () => {
const agent = AgentConfigSchema.parse({ type: 'claude-code' })
expect(agent).toEqual({
type: 'claude-code',
claudePath: 'claude',
extraArgs: [],
})
})
it('accepts claude-code as a runnable eval agent', () => {
const parsed = EvalConfigSchema.parse({
agent: {
type: 'claude-code',
model: 'opus',
},
dataset: 'data/test-set.jsonl',
browseros: {
server_url: 'http://127.0.0.1:9110',
},
})
expect(parsed.agent.type).toBe('claude-code')
expect(parsed.agent.model).toBe('opus')
})
it('rejects unsupported claude-code settings instead of silently ignoring them', () => {
expect(
AgentConfigSchema.safeParse({
type: 'claude-code',
permissionMode: 'bypassPermissions',
}).success,
).toBe(false)
expect(
AgentConfigSchema.safeParse({
type: 'claude-code',
maxTurns: 3,
}).success,
).toBe(false)
})
it('allows claude-code in task metadata', () => {
const metadata = TaskMetadataSchema.parse({
query_id: 'task-1',
dataset: 'test',
query: 'Do the thing',
started_at: new Date().toISOString(),
completed_at: new Date().toISOString(),
total_duration_ms: 100,
total_steps: 1,
termination_reason: 'completed',
final_answer: 'done',
errors: [],
warnings: [],
agent_config: {
type: 'claude-code',
model: 'opus',
},
grader_results: {},
})
expect(metadata.agent_config.type).toBe('claude-code')
})
it('is created by the agent factory', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const agent = createAgent({
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
})
expect(agent).toBeInstanceOf(ClaudeCodeEvaluator)
})
it('runs claude code, logs messages, writes MCP config, and saves metadata', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const calls: Array<{ executable: string; args: string[]; cwd: string }> = []
const evaluator = new ClaudeCodeEvaluator(
{
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
},
{
processRunner: {
async run(options) {
calls.push(options)
await options.onStdoutLine(
JSON.stringify({
type: 'assistant',
message: {
content: [{ type: 'text', text: 'The title is Example' }],
},
}),
)
await options.onStdoutLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'The title is Example',
}),
)
return { exitCode: 0, stderr: '' }
},
},
},
)
const result = await evaluator.execute()
expect(result.finalAnswer).toBe('The title is Example')
expect(result.metadata.agent_config).toMatchObject({
type: 'claude-code',
model: 'opus',
})
expect(result.messages.some((msg) => msg.type === 'user')).toBe(true)
expect(result.messages.some((msg) => msg.type === 'text-delta')).toBe(true)
const mcpConfig = JSON.parse(
await readFile(join(taskOutputDir, 'claude-code-mcp.json'), 'utf-8'),
)
expect(mcpConfig.mcpServers.browseros).toMatchObject({
type: 'http',
url: 'http://127.0.0.1:9110/mcp',
headers: {
'X-BrowserOS-Source': 'sdk-internal',
},
})
expect(calls).toEqual([
expect.objectContaining({
executable: 'claude',
cwd: taskOutputDir,
args: [
'-p',
expect.stringContaining('Task: Find the title'),
'--mcp-config',
join(taskOutputDir, 'claude-code-mcp.json'),
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
'--model',
'opus',
],
}),
])
expect(calls[0].args).not.toContain('--permission-mode')
})
it('records non-fatal stream processing errors as warnings', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const evaluator = new ClaudeCodeEvaluator(
{
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
},
{
processRunner: {
async run(options) {
await options.onStdoutLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'done',
}),
)
return {
exitCode: 0,
stderr: '',
streamErrors: ['bad stream line'],
}
},
},
},
)
const result = await evaluator.execute()
expect(result.finalAnswer).toBe('done')
expect(result.metadata.warnings).toEqual([
expect.objectContaining({
source: 'message_logging',
message: 'Claude Code stream event processing failed: bad stream line',
}),
])
})
})

View File

@@ -0,0 +1,78 @@
import { describe, expect, it } from 'bun:test'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createClaudeCodeProcessRunner } from '../../src/agents/claude-code/process-runner'
async function writeStdoutScript(): Promise<string> {
const dir = await mkdtemp(join(tmpdir(), 'claude-code-runner-'))
const script = join(dir, 'stdout-lines')
await writeFile(script, '#!/bin/sh\nprintf "first\\nbad\\nlast\\n"\n')
await chmod(script, 0o755)
return script
}
describe('createClaudeCodeProcessRunner', () => {
it('passes executable and args to the spawn dependency', async () => {
const calls: unknown[] = []
const runner = createClaudeCodeProcessRunner({
spawn: async (cmd, options) => {
calls.push({ cmd, options })
await options.onStdoutLine('{"type":"result","result":"done"}')
return { exitCode: 0, stderr: '' }
},
})
const result = await runner.run({
executable: 'claude',
args: ['-p', 'hello'],
cwd: '/tmp',
signal: new AbortController().signal,
onStdoutLine: async () => {},
})
expect(result.exitCode).toBe(0)
expect(calls).toEqual([
{
cmd: ['claude', '-p', 'hello'],
options: expect.objectContaining({ cwd: '/tmp' }),
},
])
})
it('returns stderr and non-zero exit codes', async () => {
const runner = createClaudeCodeProcessRunner({
spawn: async () => ({ exitCode: 2, stderr: 'bad auth' }),
})
const result = await runner.run({
executable: 'claude',
args: [],
cwd: '/tmp',
signal: new AbortController().signal,
onStdoutLine: async () => {},
})
expect(result).toEqual({ exitCode: 2, stderr: 'bad auth' })
})
it('continues reading stdout after a line handler error', async () => {
const script = await writeStdoutScript()
const lines: string[] = []
const runner = createClaudeCodeProcessRunner()
const result = await runner.run({
executable: script,
args: [],
cwd: '/tmp',
onStdoutLine: async (line) => {
lines.push(line)
if (line === 'bad') throw new Error('bad line')
},
})
expect(result.exitCode).toBe(0)
expect(result.streamErrors).toEqual(['bad line'])
expect(lines).toEqual(['first', 'bad', 'last'])
})
})

View File

@@ -0,0 +1,102 @@
import { describe, expect, it } from 'bun:test'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from '../../src/agents/claude-code/stream-parser'
describe('ClaudeCodeStreamParser', () => {
it('maps assistant text and MCP tool use into eval stream events', () => {
const parser = new ClaudeCodeStreamParser()
const events = parser.pushLine(
JSON.stringify({
type: 'assistant',
message: {
content: [
{ type: 'text', text: 'I will navigate.' },
{
type: 'tool_use',
id: 'toolu_1',
name: 'mcp__browseros__navigate_page',
input: { page: 2, url: 'https://example.com' },
},
],
},
}),
)
expect(events).toEqual([
{ type: 'text-start', id: expect.any(String) },
{
type: 'text-delta',
id: expect.any(String),
delta: 'I will navigate.',
},
{ type: 'text-end', id: expect.any(String) },
{
type: 'tool-input-available',
toolCallId: 'toolu_1',
toolName: 'mcp__browseros__navigate_page',
input: { page: 2, url: 'https://example.com' },
},
])
expect(parser.getLastText()).toBe('I will navigate.')
expect(parser.getToolCallCount()).toBe(1)
})
it('maps Claude Code tool results into eval output events', () => {
const parser = new ClaudeCodeStreamParser()
const events = parser.pushLine(
JSON.stringify({
type: 'user',
message: {
content: [
{
type: 'tool_result',
tool_use_id: 'toolu_1',
content: 'Navigated successfully',
},
],
},
}),
)
expect(events).toEqual([
{
type: 'tool-output-available',
toolCallId: 'toolu_1',
output: 'Navigated successfully',
},
])
})
it('uses result messages as the authoritative final text', () => {
const parser = new ClaudeCodeStreamParser()
parser.pushLine(
JSON.stringify({
type: 'assistant',
message: {
content: [{ type: 'text', text: 'I will complete the task.' }],
},
}),
)
parser.pushLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'Final answer',
}),
)
expect(parser.getLastText()).toBe('Final answer')
})
it('identifies BrowserOS MCP tools that should trigger screenshots', () => {
expect(
shouldCaptureScreenshotForTool('mcp__browseros__navigate_page'),
).toBe(true)
expect(
shouldCaptureScreenshotForTool('mcp__browseros__take_screenshot'),
).toBe(false)
expect(shouldCaptureScreenshotForTool('Read')).toBe(false)
})
})

View File

@@ -7,8 +7,11 @@ import {
runSuiteCommand,
} from '../../src/cli/commands/suite'
import type { RunEvalOptions } from '../../src/runner/types'
import type { EvalSuite } from '../../src/suites/schema'
async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
async function writeTempSuite(
overrides: Partial<EvalSuite> = {},
): Promise<{ dir: string; suitePath: string }> {
const dir = await mkdtemp(join(tmpdir(), 'eval-suite-cli-'))
const suitePath = join(dir, 'agisdk-daily-10.json')
await writeFile(
@@ -23,8 +26,9 @@ async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
restartBrowserPerTask: true,
browseros: {
server_url: 'http://127.0.0.1:9110',
headless: true,
headless: false,
},
...overrides,
},
null,
2,
@@ -43,9 +47,7 @@ describe('suite command', () => {
expect(resolved.kind).toBe('config')
expect(resolved.suite.id).toBe('browseros-agent-weekly')
expect(resolved.evalConfig.dataset).toBe(
'../../data/webbench-2of4-50.jsonl',
)
expect(resolved.evalConfig.dataset).toBe('../../data/agisdk-real.jsonl')
expect(resolved.variant.publicMetadata.agent.apiKeyConfigured).toBe(true)
})
@@ -75,6 +77,25 @@ describe('suite command', () => {
expect(resolved.evalConfig.num_workers).toBe(2)
})
it('resolves claude-code suites without provider API credentials', async () => {
const { dir, suitePath } = await writeTempSuite({
agent: { type: 'claude-code' },
})
const resolved = await resolveSuiteCommand({
suitePath,
model: 'opus',
env: {},
})
expect(resolved.kind).toBe('suite')
expect(resolved.evalConfig.agent).toMatchObject({
type: 'claude-code',
model: 'opus',
})
expect(resolved.datasetPath).toBe(join(dir, 'tasks.jsonl'))
})
it('runs config and suite commands through the runner dependency', async () => {
const calls: RunEvalOptions[] = []
await runSuiteCommand(

View File

@@ -1,5 +1,5 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { runPythonJsonEvaluator } from '../../src/grading/python-evaluator'
@@ -11,6 +11,17 @@ async function writeScript(source: string): Promise<string> {
return script
}
async function writePythonWrapper(): Promise<string> {
const dir = await mkdtemp(join(tmpdir(), 'eval-python-wrapper-'))
const wrapper = join(dir, 'python-wrapper')
await writeFile(
wrapper,
'#!/bin/sh\necho custom-python >&2\nexec python3 "$@"\n',
)
await chmod(wrapper, 0o755)
return wrapper
}
describe('runPythonJsonEvaluator', () => {
it('sends JSON on stdin, captures stderr, and parses stdout JSON', async () => {
const script = await writeScript(`
@@ -49,6 +60,34 @@ sys.exit(3)
).rejects.toThrow('bad verifier')
})
it('uses BROWSEROS_EVAL_PYTHON when provided', async () => {
const script = await writeScript(`
import json, sys
data = json.loads(sys.stdin.read())
print(json.dumps({"ok": data["ok"]}))
`)
const wrapper = await writePythonWrapper()
const previousPythonPath = process.env.BROWSEROS_EVAL_PYTHON
process.env.BROWSEROS_EVAL_PYTHON = wrapper
try {
const result = await runPythonJsonEvaluator<{ ok: boolean }>({
scriptPath: script,
input: { ok: true },
timeoutMs: 5_000,
})
expect(result.output).toEqual({ ok: true })
expect(result.stderr).toContain('custom-python')
} finally {
if (previousPythonPath === undefined) {
delete process.env.BROWSEROS_EVAL_PYTHON
} else {
process.env.BROWSEROS_EVAL_PYTHON = previousPythonPath
}
}
})
it('enforces timeouts', async () => {
const script = await writeScript(`
import time

View File

@@ -1,15 +1,18 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { adaptEvalConfigFile } from '../../src/suites/config-adapter'
describe('adaptEvalConfigFile', () => {
it('preserves browseros-agent-weekly config semantics', async () => {
it('preserves browseros-agent-weekly AGI SDK config semantics', async () => {
const adapted = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-weekly.json',
)
expect(adapted.suite.id).toBe('browseros-agent-weekly')
expect(adapted.suite.dataset).toBe('../../data/webbench-2of4-50.jsonl')
expect(adapted.suite.graders).toEqual(['performance_grader'])
expect(adapted.suite.dataset).toBe('../../data/agisdk-real.jsonl')
expect(adapted.suite.graders).toEqual(['agisdk_state_diff'])
expect(adapted.suite.workers).toBe(10)
expect(adapted.suite.restartBrowserPerTask).toBe(true)
expect(adapted.suite.timeoutMs).toBe(1_800_000)
@@ -34,4 +37,33 @@ describe('adaptEvalConfigFile', () => {
'secret-openrouter-value',
)
})
it('adapts claude-code configs without provider credentials', async () => {
const dir = await mkdtemp(join(tmpdir(), 'claude-code-config-'))
const configPath = join(dir, 'claude-code-agisdk.json')
await writeFile(
configPath,
JSON.stringify({
agent: {
type: 'claude-code',
model: 'opus',
},
dataset: 'tasks.jsonl',
num_workers: 1,
restart_server_per_task: false,
browseros: {
server_url: 'http://127.0.0.1:9110',
headless: false,
},
}),
)
const adapted = await adaptEvalConfigFile(configPath, { env: {} })
expect(adapted.suite.agent).toEqual({ type: 'claude-code' })
expect(adapted.variant.agent).toMatchObject({
provider: 'claude-code',
model: 'opus',
})
})
})

View File

@@ -35,6 +35,16 @@ describe('EvalSuiteSchema', () => {
expect(parsed.success).toBe(false)
})
it('validates claude-code suites', () => {
const suite = EvalSuiteSchema.parse({
id: 'claude-code-agisdk',
dataset: 'data/agisdk-real.jsonl',
agent: { type: 'claude-code' },
})
expect(suite.agent.type).toBe('claude-code')
})
it('validates the daily AGISDK 10-task suite', async () => {
const loaded = await loadSuite(
'apps/eval/configs/suites/agisdk-daily-10.json',
@@ -89,4 +99,40 @@ describe('resolveVariant', () => {
}),
).toThrow('EVAL_AGENT_API_KEY')
})
it('resolves claude-code variants without model or API key requirements', () => {
const variant = resolveVariant({
variantId: 'claude-opus',
provider: 'claude-code',
model: 'opus',
env: {},
})
expect(variant.id).toBe('claude-opus')
expect(variant.agent).toEqual({
provider: 'claude-code',
model: 'opus',
})
expect(variant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'opus',
apiKeyConfigured: false,
})
const defaultVariant = resolveVariant({
provider: 'claude-code',
env: {},
})
expect(defaultVariant.id).toBe('claude-code')
expect(defaultVariant.agent).toEqual({
provider: 'claude-code',
model: '',
})
expect(defaultVariant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'default',
apiKeyConfigured: false,
})
})
})

View File

@@ -12,9 +12,13 @@
"dev:watch": "./tools/dev/run.sh watch",
"dev:watch:new": "./tools/dev/run.sh watch --new",
"dev:manual": "./tools/dev/run.sh watch --manual",
"dev:setup": "./tools/dev/setup.sh",
"dev:cleanup": "./tools/dev/run.sh cleanup",
"dev:reset": "./tools/dev/run.sh reset",
"dev:setup": "./tools/dev/run.sh setup",
"dev:cleanup": "./tools/dev/run.sh cleanup --target dev",
"dev:reset": "./tools/dev/run.sh reset --target dev",
"dev:cleanup:dogfood": "./tools/dev/run.sh cleanup --target dogfood",
"dev:reset:dogfood": "./tools/dev/run.sh reset --target dogfood",
"dev:cleanup:prod": "./tools/dev/run.sh cleanup --target prod",
"dev:reset:prod": "./tools/dev/run.sh reset --target prod",
"install:browseros-dogfood": "make -C tools/dogfood install",
"test:env": "./tools/dev/run.sh test",
"test:cleanup": "./tools/dev/run.sh cleanup --quick --yes",

View File

@@ -14,16 +14,20 @@ import (
var cleanupCmd = &cobra.Command{
Use: "cleanup",
Short: "Kill port processes and remove orphaned temp directories",
Long: "Stops old dev watch processes, clears dev/test ports, and removes orphaned browseros-* temp directories.",
Short: "Kill target processes and remove orphaned temp directories",
Long: "Stops target BrowserOS processes, clears target ports, and removes target temp directories.",
RunE: runCleanup,
}
var (
cleanupPorts bool
cleanupTemps bool
cleanupQuick bool
cleanupYes bool
cleanupOnlyPorts bool
cleanupOnlyTemps bool
cleanupQuick bool
cleanupYes bool
cleanupTarget string
cleanupBrowserOSDir string
cleanupPortsValue string
cleanupBrowserUserDataDir string
)
type safeCleanupOptions struct {
@@ -32,8 +36,12 @@ type safeCleanupOptions struct {
}
func init() {
cleanupCmd.Flags().BoolVar(&cleanupPorts, "ports", false, "Only kill port processes")
cleanupCmd.Flags().BoolVar(&cleanupTemps, "temps", false, "Only remove temp directories")
cleanupCmd.Flags().StringVar(&cleanupTarget, "target", targetDev, "Cleanup target: dev, dogfood, or prod")
cleanupCmd.Flags().StringVar(&cleanupBrowserOSDir, "browseros-dir", "", "Override target BrowserOS state directory")
cleanupCmd.Flags().StringVar(&cleanupPortsValue, "ports", "", "Override ports as cdp,server,extension")
cleanupCmd.Flags().StringVar(&cleanupBrowserUserDataDir, "browser-user-data-dir", "", "Override BrowserOS user-data dir to stop")
cleanupCmd.Flags().BoolVar(&cleanupOnlyPorts, "only-ports", false, "Only kill port processes")
cleanupCmd.Flags().BoolVar(&cleanupOnlyTemps, "only-temps", false, "Only remove temp directories")
cleanupCmd.Flags().BoolVar(&cleanupQuick, "quick", false, "Run safe cleanup only")
cleanupCmd.Flags().BoolVar(&cleanupYes, "yes", false, "Answer yes to the safe cleanup prompt")
rootCmd.AddCommand(cleanupCmd)
@@ -42,11 +50,24 @@ func init() {
// runCleanup performs the non-destructive daily cleanup path for local dev.
func runCleanup(cmd *cobra.Command, args []string) error {
out := cmd.OutOrStdout()
root, err := proc.FindMonorepoRoot()
if err != nil {
return err
}
target, err := resolveResetTarget(root, resetTargetOptions{
Target: cleanupTarget,
BrowserOSDir: cleanupBrowserOSDir,
Ports: cleanupPortsValue,
BrowserUserDataDir: cleanupBrowserUserDataDir,
})
if err != nil {
return err
}
if !cleanupYes && !cleanupQuick {
ok, err := confirmYesNo(out, bufio.NewReader(os.Stdin), resetPrompt{
Title: "Run safe cleanup?",
Body: "Stops old dev watch processes, clears dev ports, and removes temporary /tmp browser profiles. This does not touch ~/.browseros-dev, Lima, containers, images, or saved dev data.",
Action: "Run safe cleanup",
Body: fmt.Sprintf("Stops %s processes, clears target ports, and removes target temp profiles. This does not touch saved BrowserOS data, Lima, containers, or images.", target.Name),
Action: "Run safe cleanup for " + target.Name,
})
if err != nil {
return err
@@ -56,42 +77,51 @@ func runCleanup(cmd *cobra.Command, args []string) error {
return nil
}
}
return runSafeCleanup(out, safeCleanupOptions{
ports: !cleanupTemps || cleanupPorts,
temps: !cleanupPorts || cleanupTemps,
if err := ensureTargetStopped(out, target); err != nil {
return err
}
return runSafeCleanup(out, target, safeCleanupOptions{
ports: !cleanupOnlyTemps || cleanupOnlyPorts,
temps: !cleanupOnlyPorts || cleanupOnlyTemps,
})
}
// runSafeCleanup is shared by cleanup and reset before any destructive repair steps.
func runSafeCleanup(out io.Writer, opts safeCleanupOptions) error {
func runSafeCleanup(out io.Writer, target resetTarget, opts safeCleanupOptions) error {
if opts.ports {
ports := proc.DefaultLocalPorts()
stopped, err := proc.StopAllWatchProcesses(3 * time.Second)
if err != nil {
return err
if target.WatchRunStateDir != "" {
stopped, err := proc.StopAllWatchProcessesInDir(target.WatchRunStateDir, 3*time.Second)
if err != nil {
return err
}
if stopped > 0 {
fmt.Fprintf(out, "%s stopped %d old %s watch process group(s)\n", successStyle.Sprint("Stopped:"), stopped, target.Name)
}
}
if stopped > 0 {
fmt.Fprintf(out, "%s stopped %d old dev watch process group(s)\n", successStyle.Sprint("Stopped:"), stopped)
if len(target.BrowserUserDataDirs) > 0 {
killedBrowsers, err := proc.KillBrowserProcessesForUserDataDirs(target.BrowserUserDataDirs, 3*time.Second)
if err != nil {
return err
}
if killedBrowsers > 0 {
fmt.Fprintf(out, "%s stopped %d BrowserOS %s profile process(es)\n", successStyle.Sprint("Stopped:"), killedBrowsers, target.Name)
}
}
killedBrowsers, err := proc.KillBrowserProcessesForDevProfiles(3 * time.Second)
if err != nil {
return err
if target.Ports != nil {
ports := *target.Ports
fmt.Fprintf(out, "%s ports %d, %d, %d\n", labelStyle.Sprint("Clearing:"), ports.CDP, ports.Server, ports.Extension)
if err := proc.KillPortsAndWait(ports, 3*time.Second); err != nil {
return err
}
fmt.Fprintln(out, successStyle.Sprint("Ports cleared."))
}
if killedBrowsers > 0 {
fmt.Fprintf(out, "%s stopped %d BrowserOS dev/test profile process(es)\n", successStyle.Sprint("Stopped:"), killedBrowsers)
}
fmt.Fprintf(out, "%s ports %d, %d, %d\n", labelStyle.Sprint("Clearing:"), ports.CDP, ports.Server, ports.Extension)
if err := proc.KillPortsAndWait(ports, 3*time.Second); err != nil {
return err
}
fmt.Fprintln(out, successStyle.Sprint("Ports cleared."))
}
if opts.temps {
n := proc.CleanupTempDirs("browseros-test-", "browseros-dev-")
n := proc.CleanupTempDirs(target.TempPrefixes...)
if n > 0 {
fmt.Fprintf(out, "%s removed %d temp directories\n", successStyle.Sprint("Removed:"), n)
} else {
} else if len(target.TempPrefixes) > 0 {
fmt.Fprintln(out, dimStyle.Sprint("No orphaned temp directories found."))
}
}

View File

@@ -64,7 +64,11 @@ func TestConfirmTypedRequiresExactToken(t *testing.T) {
func TestResetOverviewTellsUserToUseSmallestReset(t *testing.T) {
var out bytes.Buffer
printResetOverview(&out, devPaths{Root: "/Users/me/.browseros-dev"})
printResetOverview(&out, resetTarget{
Title: "BrowserOS dev reset",
BrowserOSDir: "/Users/me/.browseros-dev",
DeleteRootLabel: "Delete dev profile:",
})
text := out.String()
for _, want := range []string{

View File

@@ -0,0 +1,197 @@
package cmd
import (
"bufio"
"encoding/json"
"errors"
"fmt"
"io"
"net"
"os"
"syscall"
"time"
)
const dogfoodStopTimeout = 10 * time.Second
type dogfoodRunState struct {
PID int `json:"pid"`
Mode string `json:"mode"`
SocketPath string `json:"socket_path"`
LogPath string `json:"log_path"`
}
type dogfoodIPCRequest struct {
Command string `json:"command"`
}
type dogfoodIPCResponse struct {
OK bool `json:"ok"`
Error string `json:"error,omitempty"`
}
func ensureTargetStopped(out io.Writer, target resetTarget) error {
if target.Dogfood == nil {
return nil
}
return stopDogfoodRun(out, *target.Dogfood, dogfoodStopTimeout)
}
func stopDogfoodRun(out io.Writer, target dogfoodRuntimeTarget, timeout time.Duration) error {
active, err := dogfoodRunActive(target.LockPath)
if err != nil {
return err
}
if !active {
cleanupDogfoodRunFilesWithWarning(out, target)
return nil
}
fmt.Fprintln(out, labelStyle.Sprint("Stopping dogfood run first."))
if err := stopDogfoodDaemon(target); err == nil {
if stopped, err := waitForDogfoodStopped(out, target, timeout); err != nil {
return err
} else if stopped {
fmt.Fprintln(out, successStyle.Sprint("Dogfood stopped."))
return nil
}
}
state, err := readDogfoodRunState(target.StatePath)
if err != nil {
return fmt.Errorf("dogfood is running but state is unreadable at %s: %w", target.StatePath, err)
}
if state.PID <= 0 {
return fmt.Errorf("dogfood is running but state has no pid at %s", target.StatePath)
}
if err := signalDogfoodPID(state.PID, syscall.SIGTERM); err != nil {
return err
}
if stopped, err := waitForDogfoodStopped(out, target, timeout); err != nil {
return err
} else if stopped {
fmt.Fprintln(out, successStyle.Sprint("Dogfood stopped."))
return nil
}
if err := signalDogfoodPID(state.PID, syscall.SIGKILL); err != nil {
return err
}
if stopped, err := waitForDogfoodStopped(out, target, time.Second); err != nil {
return err
} else if stopped {
fmt.Fprintln(out, successStyle.Sprint("Dogfood force-stopped."))
return nil
}
return fmt.Errorf("dogfood is still running; stop it manually before cleanup/reset")
}
func stopDogfoodDaemon(target dogfoodRuntimeTarget) error {
socketPath := target.SocketPath
if state, err := readDogfoodRunState(target.StatePath); err == nil && state.SocketPath != "" {
socketPath = state.SocketPath
}
conn, err := net.DialTimeout("unix", socketPath, 700*time.Millisecond)
if err != nil {
return err
}
defer conn.Close()
data, err := json.Marshal(dogfoodIPCRequest{Command: "stop"})
if err != nil {
return err
}
data = append(data, '\n')
if _, err := conn.Write(data); err != nil {
return err
}
_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))
scanner := bufio.NewScanner(conn)
if !scanner.Scan() {
if err := scanner.Err(); err != nil {
return err
}
return errors.New("dogfood daemon closed connection without response")
}
var response dogfoodIPCResponse
if err := json.Unmarshal(scanner.Bytes(), &response); err != nil {
return err
}
if response.Error != "" {
return errors.New(response.Error)
}
if !response.OK {
return errors.New("dogfood daemon did not accept stop request")
}
return nil
}
func waitForDogfoodStopped(out io.Writer, target dogfoodRuntimeTarget, timeout time.Duration) (bool, error) {
deadline := time.Now().Add(timeout)
for {
active, err := dogfoodRunActive(target.LockPath)
if err != nil {
return false, err
}
if !active {
cleanupDogfoodRunFilesWithWarning(out, target)
return true, nil
}
if time.Now().After(deadline) {
return false, nil
}
time.Sleep(100 * time.Millisecond)
}
}
func dogfoodRunActive(lockPath string) (bool, error) {
file, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o644)
if err != nil {
return false, err
}
defer file.Close()
if err := syscall.Flock(int(file.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
if errors.Is(err, syscall.EWOULDBLOCK) || errors.Is(err, syscall.EAGAIN) {
return true, nil
}
return false, err
}
return false, syscall.Flock(int(file.Fd()), syscall.LOCK_UN)
}
func readDogfoodRunState(path string) (dogfoodRunState, error) {
data, err := os.ReadFile(path)
if err != nil {
return dogfoodRunState{}, err
}
var state dogfoodRunState
if err := json.Unmarshal(data, &state); err != nil {
return dogfoodRunState{}, err
}
return state, nil
}
func signalDogfoodPID(pid int, sig syscall.Signal) error {
if pid <= 0 {
return fmt.Errorf("invalid dogfood pid %d", pid)
}
if err := syscall.Kill(pid, sig); err != nil && err != syscall.ESRCH {
return err
}
return nil
}
func cleanupDogfoodRunFilesWithWarning(out io.Writer, target dogfoodRuntimeTarget) {
if err := cleanupDogfoodRunFiles(target); err != nil {
fmt.Fprintf(out, "%s could not remove dogfood run files: %v\n", warnStyle.Sprint("Warning:"), err)
}
}
func cleanupDogfoodRunFiles(target dogfoodRuntimeTarget) error {
if err := os.Remove(target.SocketPath); err != nil && !os.IsNotExist(err) {
return err
}
if err := os.Remove(target.StatePath); err != nil && !os.IsNotExist(err) {
return err
}
return nil
}
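The liveness probe in `dogfoodRunActive` above uses a standard flock idiom: attempt a non-blocking exclusive lock, and treat EWOULDBLOCK/EAGAIN as proof that the lock owner is still alive. A minimal standalone sketch (hypothetical path, Unix-only):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// lockHeld reports whether another process currently holds an
// exclusive flock on path. A failed non-blocking LOCK_EX attempt
// means the owner is alive; acquiring the lock means nobody held it,
// in which case we release it immediately.
func lockHeld(path string) (bool, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return false, err
	}
	defer f.Close()
	err = syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	if err == syscall.EWOULDBLOCK {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return false, syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
}

func main() {
	held, err := lockHeld("/tmp/example-run.lock")
	fmt.Println(held, err)
}
```

Because the kernel drops flock locks when the holder exits, this check never reports a stale run as active — which is why the cleanup path can safely remove the socket and state files as soon as the probe succeeds.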

View File

@@ -0,0 +1,37 @@
package cmd
import (
"bytes"
"os"
"path/filepath"
"strings"
"testing"
"time"
)
func TestWaitForDogfoodStoppedWarnsWhenRunFileCleanupFails(t *testing.T) {
root := t.TempDir()
socketPath := filepath.Join(root, "dogfood.sock")
if err := os.Mkdir(socketPath, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(socketPath, "child"), []byte("x"), 0o644); err != nil {
t.Fatal(err)
}
var out bytes.Buffer
stopped, err := waitForDogfoodStopped(&out, dogfoodRuntimeTarget{
LockPath: filepath.Join(root, "run.lock"),
SocketPath: socketPath,
StatePath: filepath.Join(root, "state.json"),
}, time.Millisecond)
if err != nil {
t.Fatal(err)
}
if !stopped {
t.Fatal("expected inactive dogfood run to be treated as stopped")
}
if !strings.Contains(out.String(), "Warning:") {
t.Fatalf("missing cleanup warning:\n%s", out.String())
}
}

View File

@@ -10,11 +10,12 @@ import (
"path/filepath"
"strings"
"browseros-dev/proc"
"github.com/spf13/cobra"
)
const (
devDirName = ".browseros-dev"
limaVMName = "browseros-vm"
openClawImage = "ghcr.io/openclaw/openclaw:2026.4.12"
openClawContainerName = "browseros-openclaw-openclaw-gateway-1"
@@ -23,16 +24,11 @@ const (
var resetCmd = &cobra.Command{
Use: "reset",
Short: "Guide destructive BrowserOS dev profile and VM resets",
Long: "Walks through safe cleanup, VM shutdown/deletion, OpenClaw container/image removal, and full ~/.browseros-dev reset.",
Short: "Guide destructive BrowserOS profile and VM resets",
Long: "Walks through safe cleanup, VM shutdown/deletion, OpenClaw container/image removal, and target BrowserOS state reset.",
RunE: runReset,
}
type devPaths struct {
Root string
LimaHome string
}
type resetPrompt struct {
Title string
Body string
@@ -49,7 +45,18 @@ type podmanMachineEntry struct {
Running bool `json:"Running"`
}
var (
resetTargetName string
resetBrowserOSDir string
resetPortsValue string
resetBrowserUserDataDir string
)
func init() {
resetCmd.Flags().StringVar(&resetTargetName, "target", targetDev, "Reset target: dev, dogfood, or prod")
resetCmd.Flags().StringVar(&resetBrowserOSDir, "browseros-dir", "", "Override target BrowserOS state directory")
resetCmd.Flags().StringVar(&resetPortsValue, "ports", "", "Override ports as cdp,server,extension")
resetCmd.Flags().StringVar(&resetBrowserUserDataDir, "browser-user-data-dir", "", "Override the BrowserOS user-data dir whose browser processes are stopped")
rootCmd.AddCommand(resetCmd)
}
@@ -57,21 +64,34 @@ func init() {
func runReset(cmd *cobra.Command, args []string) error {
out := cmd.OutOrStdout()
reader := bufio.NewReader(os.Stdin)
paths, err := resolveDevPaths()
root, err := proc.FindMonorepoRoot()
if err != nil {
return err
}
target, err := resolveResetTarget(root, resetTargetOptions{
Target: resetTargetName,
BrowserOSDir: resetBrowserOSDir,
Ports: resetPortsValue,
BrowserUserDataDir: resetBrowserUserDataDir,
})
if err != nil {
return err
}
printResetOverview(out, paths)
printResetOverview(out, target)
if err := ensureTargetStopped(out, target); err != nil {
return err
}
if ok, err := confirmYesNo(out, reader, resetPrompt{
Title: "Run safe cleanup first?",
Body: "This stops old dev watch processes, clears dev ports, and removes temporary /tmp browser profiles. It does not touch saved dev data.",
Action: "Run safe cleanup",
Body: fmt.Sprintf("This stops %s processes, clears target ports, and removes target temp profiles. It does not touch saved BrowserOS data.", target.Name),
Action: "Run safe cleanup for " + target.Name,
}); err != nil {
return err
} else if ok {
if err := runSafeCleanup(out, safeCleanupOptions{ports: true, temps: true}); err != nil {
if err := runSafeCleanup(out, target, safeCleanupOptions{ports: true, temps: true}); err != nil {
return err
}
}
@@ -82,28 +102,28 @@ func runReset(cmd *cobra.Command, args []string) error {
if err := maybeResetLegacyPodman(out, reader); err != nil {
return err
}
return maybeDeleteDevProfile(out, reader, paths)
return maybeDeleteTargetRoot(out, reader, target)
}
vm, err := findVM(limactlPath, paths.LimaHome)
vm, err := findVM(limactlPath, target.LimaHome)
if err != nil {
fmt.Fprintf(out, "%s could not inspect Lima VMs: %v\n", warnStyle.Sprint("Warning:"), err)
if err := maybeResetLegacyPodman(out, reader); err != nil {
return err
}
return maybeDeleteDevProfile(out, reader, paths)
return maybeDeleteTargetRoot(out, reader, target)
}
if vm == nil {
fmt.Fprintf(out, "%s %s was not found in %s.\n", dimStyle.Sprint("Not found:"), limaVMName, pathStyle.Sprint(paths.LimaHome))
fmt.Fprintf(out, "%s %s was not found in %s.\n", dimStyle.Sprint("Not found:"), limaVMName, pathStyle.Sprint(target.LimaHome))
if err := maybeResetLegacyPodman(out, reader); err != nil {
return err
}
return maybeDeleteDevProfile(out, reader, paths)
return maybeDeleteTargetRoot(out, reader, target)
}
fmt.Fprintf(out, "%s %s %s\n", labelStyle.Sprint("Found VM:"), commandStyle.Sprint(vm.Name), dimStyle.Sprintf("(%s)", vm.Status))
if strings.EqualFold(vm.Status, "Running") {
if err := maybeResetOpenClaw(out, reader, limactlPath, paths.LimaHome); err != nil {
if err := maybeResetOpenClaw(out, reader, limactlPath, target.LimaHome); err != nil {
return err
}
if ok, err := confirmYesNo(out, reader, resetPrompt{
@@ -113,7 +133,7 @@ func runReset(cmd *cobra.Command, args []string) error {
}); err != nil {
return err
} else if ok {
if err := runLimactl(out, limactlPath, paths.LimaHome, "stop", limaVMName); err != nil {
if err := runLimactl(out, limactlPath, target.LimaHome, "stop", limaVMName); err != nil {
return err
}
fmt.Fprintln(out, successStyle.Sprint("VM stopped."))
@@ -125,12 +145,12 @@ func runReset(cmd *cobra.Command, args []string) error {
if ok, err := confirmYesNo(out, reader, resetPrompt{
Title: "Delete VM?",
Body: "This deletes the Lima VM and its container store. ~/.browseros-dev remains. OpenClaw will be pulled again next time.",
Body: fmt.Sprintf("This deletes the Lima VM and its container store. %s remains. OpenClaw will be pulled again next time.", target.BrowserOSDir),
Action: "Delete browseros-vm",
}); err != nil {
return err
} else if ok {
if err := runLimactl(out, limactlPath, paths.LimaHome, "delete", "--force", limaVMName); err != nil {
if err := runLimactl(out, limactlPath, target.LimaHome, "delete", "--force", limaVMName); err != nil {
return err
}
fmt.Fprintln(out, successStyle.Sprint("VM deleted."))
@@ -140,35 +160,19 @@ func runReset(cmd *cobra.Command, args []string) error {
return err
}
return maybeDeleteDevProfile(out, reader, paths)
return maybeDeleteTargetRoot(out, reader, target)
}
func resolveDevPaths() (devPaths, error) {
if override := strings.TrimSpace(os.Getenv("BROWSEROS_DIR")); override != "" {
root, err := filepath.Abs(override)
if err != nil {
return devPaths{}, err
}
return devPaths{Root: root, LimaHome: filepath.Join(root, "lima")}, nil
}
home, err := os.UserHomeDir()
if err != nil {
return devPaths{}, err
}
root := filepath.Join(home, devDirName)
return devPaths{Root: root, LimaHome: filepath.Join(root, "lima")}, nil
}
func printResetOverview(out io.Writer, paths devPaths) {
fmt.Fprintln(out, headerStyle.Sprint("BrowserOS dev reset"))
func printResetOverview(out io.Writer, target resetTarget) {
fmt.Fprintln(out, headerStyle.Sprint(target.Title))
fmt.Fprintln(out)
fmt.Fprintf(out, "This can reset parts of %s. Pick the smallest reset that matches the problem.\n", pathStyle.Sprint(paths.Root))
fmt.Fprintf(out, "This can reset parts of %s. Pick the smallest reset that matches the problem.\n", pathStyle.Sprint(target.BrowserOSDir))
fmt.Fprintln(out)
fmt.Fprintf(out, " %s %s\n", labelStyle.Sprint("Stop VM:"), dimStyle.Sprint("Shuts down browseros-vm. Keeps data."))
fmt.Fprintf(out, " %s %s\n", labelStyle.Sprint("Delete VM:"), dimStyle.Sprint("Removes Lima/container state. Keeps the dev profile."))
fmt.Fprintf(out, " %s %s\n", labelStyle.Sprint("Delete VM:"), dimStyle.Sprint("Removes Lima/container state. Keeps the target state root."))
fmt.Fprintf(out, " %s %s\n", labelStyle.Sprint("Remove OpenClaw container:"), dimStyle.Sprint("Keeps the downloaded OpenClaw image."))
fmt.Fprintf(out, " %s %s\n", labelStyle.Sprint("Remove OpenClaw image:"), dimStyle.Sprint("Next startup pulls it again."))
fmt.Fprintf(out, " %s %s\n", warnStyle.Sprint("Delete dev profile:"), dimStyle.Sprint("Deletes the dev profile root and dev-local BrowserOS data."))
fmt.Fprintf(out, " %s %s\n", warnStyle.Sprint(target.DeleteRootLabel), dimStyle.Sprint("Deletes the target BrowserOS state root."))
fmt.Fprintln(out)
}
@@ -244,24 +248,24 @@ func maybeResetOpenClaw(out io.Writer, reader *bufio.Reader, limactlPath string,
return nil
}
func maybeDeleteDevProfile(out io.Writer, reader *bufio.Reader, paths devPaths) error {
func maybeDeleteTargetRoot(out io.Writer, reader *bufio.Reader, target resetTarget) error {
ok, err := confirmTyped(
out,
reader,
"Delete dev profile?",
fmt.Sprintf("This deletes %s. It removes BrowserOS dev data plus VM/OpenClaw state.", pathStyle.Sprint(paths.Root)),
target.DeleteRootLabel,
fmt.Sprintf("This deletes %s. %s", pathStyle.Sprint(target.BrowserOSDir), target.DeleteRootBody),
"DELETE",
)
if err != nil || !ok {
return err
}
if err := validateDevProfileRootForDeletion(paths.Root); err != nil {
if err := validateDevProfileRootForDeletion(target.BrowserOSDir); err != nil {
return err
}
if err := os.RemoveAll(paths.Root); err != nil {
if err := os.RemoveAll(target.BrowserOSDir); err != nil {
return err
}
fmt.Fprintf(out, "%s %s\n", successStyle.Sprint("Deleted:"), pathStyle.Sprint(paths.Root))
fmt.Fprintf(out, "%s %s\n", successStyle.Sprint("Deleted:"), pathStyle.Sprint(target.BrowserOSDir))
return nil
}

View File

@@ -0,0 +1,81 @@
package cmd
import (
"context"
"fmt"
"os"
"path/filepath"
"browseros-dev/proc"
"github.com/spf13/cobra"
)
var setupIfNeeded bool
const setupModeIfNeeded = true
var setupCmd = &cobra.Command{
Use: "setup",
Short: "Install dev dependencies and generate required code",
Long: "Installs Bun dependencies and generates agent GraphQL code needed by the dev environment.",
RunE: func(cmd *cobra.Command, args []string) error {
root, err := proc.FindMonorepoRoot()
if err != nil {
return err
}
return runDevSetup(cmd.Context(), root, setupIfNeeded)
},
}
type setupPlan struct {
RunInstall bool
RunCodegen bool
}
func init() {
setupCmd.Flags().BoolVar(&setupIfNeeded, "if-needed", false, "Skip generated code refresh when it already exists")
rootCmd.AddCommand(setupCmd)
}
func buildSetupPlan(root string, ifNeeded bool) setupPlan {
return setupPlan{
RunInstall: true,
RunCodegen: !ifNeeded || !generatedGraphQLExists(root),
}
}
func generatedGraphQLExists(root string) bool {
for _, file := range []string{"gql.ts", "graphql.ts", "schema.graphql"} {
info, err := os.Stat(filepath.Join(root, "apps/agent/generated/graphql", file))
if err != nil || info.IsDir() {
return false
}
}
return true
}
// runDevSetup prepares the repo for local development. Dependency install always
// runs because Bun is fast and this keeps watch resilient after branch changes.
func runDevSetup(ctx context.Context, root string, ifNeeded bool) error {
plan := buildSetupPlan(root, ifNeeded)
if plan.RunInstall {
proc.LogMsg(proc.TagSetup, "Installing dependencies...")
if err := proc.RunBlocking(ctx, root, proc.TagSetup, "bun", "install", "--frozen-lockfile"); err != nil {
return fmt.Errorf("installing dependencies: %w", err)
}
}
if plan.RunCodegen {
proc.LogMsg(proc.TagSetup, "Generating agent code...")
if err := proc.RunBlocking(ctx, root, proc.TagSetup, "bun", "run", "codegen:agent"); err != nil {
return fmt.Errorf("generating agent code: %w", err)
}
} else {
proc.LogMsg(proc.TagSetup, "Agent code already generated")
}
proc.LogMsg(proc.TagSetup, "Setup ready")
return nil
}
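The gating in `buildSetupPlan` reduces to a small truth table: install always runs, and codegen is skipped only when `--if-needed` is set and the generated GraphQL output already exists. A self-contained sketch of that decision, with `setupDecision` standing in for `buildSetupPlan` and a boolean standing in for the on-disk `generatedGraphQLExists` check:

```go
package main

import "fmt"

// setupDecision mirrors the shape of buildSetupPlan's gating: dependency
// install always runs; codegen is skipped only when ifNeeded is set AND the
// generated output already exists on disk.
func setupDecision(ifNeeded, generatedExists bool) (runInstall, runCodegen bool) {
	return true, !ifNeeded || !generatedExists
}

func main() {
	cases := []struct{ ifNeeded, exists bool }{
		{false, true}, // explicit setup: always refresh codegen
		{true, true},  // --if-needed with output present: skip codegen
		{true, false}, // --if-needed with output missing: run codegen
	}
	for _, c := range cases {
		install, codegen := setupDecision(c.ifNeeded, c.exists)
		fmt.Printf("ifNeeded=%v exists=%v -> install=%v codegen=%v\n",
			c.ifNeeded, c.exists, install, codegen)
	}
}
```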

View File

@@ -0,0 +1,76 @@
package cmd
import (
"os"
"path/filepath"
"testing"
)
func TestBuildSetupPlanAlwaysInstallsDependencies(t *testing.T) {
root := t.TempDir()
plan := buildSetupPlan(root, true)
if !plan.RunInstall {
t.Fatal("expected dependency install to always run")
}
}
func TestBuildSetupPlanIfNeededSkipsExistingGeneratedGraphQL(t *testing.T) {
root := t.TempDir()
writeGeneratedGraphQLSentinels(t, root)
plan := buildSetupPlan(root, true)
if plan.RunCodegen {
t.Fatal("expected --if-needed setup to skip codegen when generated GraphQL exists")
}
}
func TestBuildSetupPlanIfNeededRunsCodegenWhenGeneratedGraphQLEmpty(t *testing.T) {
root := t.TempDir()
generatedDir := filepath.Join(root, "apps/agent/generated/graphql")
if err := os.MkdirAll(generatedDir, 0o755); err != nil {
t.Fatal(err)
}
plan := buildSetupPlan(root, true)
if !plan.RunCodegen {
t.Fatal("expected --if-needed setup to run codegen when generated GraphQL is empty")
}
}
func TestBuildSetupPlanIfNeededRunsCodegenWhenGeneratedGraphQLMissing(t *testing.T) {
root := t.TempDir()
plan := buildSetupPlan(root, true)
if !plan.RunCodegen {
t.Fatal("expected --if-needed setup to run codegen when generated GraphQL is missing")
}
}
func TestBuildSetupPlanExplicitSetupRunsCodegen(t *testing.T) {
root := t.TempDir()
writeGeneratedGraphQLSentinels(t, root)
plan := buildSetupPlan(root, false)
if !plan.RunCodegen {
t.Fatal("expected explicit setup to refresh codegen")
}
}
func writeGeneratedGraphQLSentinels(t *testing.T, root string) {
t.Helper()
generatedDir := filepath.Join(root, "apps/agent/generated/graphql")
if err := os.MkdirAll(generatedDir, 0o755); err != nil {
t.Fatal(err)
}
for _, file := range []string{"gql.ts", "graphql.ts", "schema.graphql"} {
if err := os.WriteFile(filepath.Join(generatedDir, file), []byte("generated"), 0o644); err != nil {
t.Fatal(err)
}
}
}

View File

@@ -0,0 +1,365 @@
package cmd
import (
"bufio"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"browseros-dev/proc"
"gopkg.in/yaml.v3"
)
const (
targetDev = "dev"
targetDogfood = "dogfood"
targetProd = "prod"
devDirName = ".browseros-dev"
prodDirName = ".browseros"
)
type resetTargetOptions struct {
Target string
BrowserOSDir string
Ports string
BrowserUserDataDir string
}
type resetTarget struct {
Name string
Title string
BrowserOSDir string
LimaHome string
Ports *proc.Ports
BrowserUserDataDirs []string
TempPrefixes []string
WatchRunStateDir string
DeleteRootLabel string
DeleteRootBody string
Dogfood *dogfoodRuntimeTarget
}
type dogfoodRuntimeTarget struct {
ConfigDir string
LockPath string
StatePath string
SocketPath string
}
type dogfoodConfigFile struct {
BrowserOSDir string `yaml:"browseros_dir"`
DevUserDataDir string `yaml:"dev_user_data_dir"`
Ports struct {
CDP int `yaml:"cdp"`
Server int `yaml:"server"`
Extension int `yaml:"extension"`
} `yaml:"ports"`
}
func resolveResetTarget(root string, opts resetTargetOptions) (resetTarget, error) {
target := strings.TrimSpace(opts.Target)
if target == "" {
target = targetDev
}
switch target {
case targetDev:
return resolveDevTarget(root, opts)
case targetDogfood:
return resolveDogfoodTarget(opts)
case targetProd:
return resolveProdTarget(opts)
default:
return resetTarget{}, fmt.Errorf("unsupported reset target %q", target)
}
}
func resolveDevTarget(root string, opts resetTargetOptions) (resetTarget, error) {
browserosDir, err := resolveBrowserOSDir(opts.BrowserOSDir, devDirName)
if err != nil {
return resetTarget{}, err
}
ports, err := resolveTargetPorts(root, opts.Ports)
if err != nil {
return resetTarget{}, err
}
return resetTarget{
Name: targetDev,
Title: "BrowserOS dev reset",
BrowserOSDir: browserosDir,
LimaHome: filepath.Join(browserosDir, "lima"),
Ports: &ports,
BrowserUserDataDirs: []string{"/tmp/browseros-dev"},
TempPrefixes: []string{"browseros-test-", "browseros-dev-"},
WatchRunStateDir: filepath.Join(browserosDir, "runs"),
DeleteRootLabel: "Delete dev profile?",
DeleteRootBody: "It removes BrowserOS dev data plus VM/OpenClaw state.",
}, nil
}
func resolveDogfoodTarget(opts resetTargetOptions) (resetTarget, error) {
cfgDir, err := dogfoodConfigDir()
if err != nil {
return resetTarget{}, err
}
cfg, err := loadDogfoodConfig(filepath.Join(cfgDir, "config.yaml"))
if err != nil {
return resetTarget{}, err
}
applyDogfoodDefaults(&cfg, cfgDir)
browserosDir := firstNonEmpty(opts.BrowserOSDir, cfg.BrowserOSDir)
if browserosDir == "" {
return resetTarget{}, fmt.Errorf("dogfood browseros_dir is empty")
}
browserosDir, err = filepath.Abs(expandTilde(browserosDir))
if err != nil {
return resetTarget{}, err
}
ports, err := parsePorts(firstNonEmpty(opts.Ports, formatPorts(proc.Ports{
CDP: cfg.Ports.CDP,
Server: cfg.Ports.Server,
Extension: cfg.Ports.Extension,
})))
if err != nil {
return resetTarget{}, err
}
browserUserDataDir := firstNonEmpty(opts.BrowserUserDataDir, cfg.DevUserDataDir)
if browserUserDataDir == "" {
return resetTarget{}, fmt.Errorf("dogfood dev_user_data_dir is empty")
}
browserUserDataDir, err = filepath.Abs(expandTilde(browserUserDataDir))
if err != nil {
return resetTarget{}, err
}
return resetTarget{
Name: targetDogfood,
Title: "BrowserOS dogfood reset",
BrowserOSDir: browserosDir,
LimaHome: filepath.Join(browserosDir, "lima"),
Ports: &ports,
BrowserUserDataDirs: []string{browserUserDataDir},
DeleteRootLabel: "Delete dogfood BrowserOS state?",
DeleteRootBody: "It removes dogfood-local BrowserOS server data plus VM/OpenClaw state. It does not touch your source BrowserOS browser profile.",
Dogfood: &dogfoodRuntimeTarget{
ConfigDir: cfgDir,
LockPath: filepath.Join(cfgDir, "run.lock"),
StatePath: filepath.Join(cfgDir, "state.json"),
SocketPath: filepath.Join(cfgDir, "daemon.sock"),
},
}, nil
}
func applyDogfoodDefaults(cfg *dogfoodConfigFile, cfgDir string) {
if cfg.BrowserOSDir == "" {
if home, err := os.UserHomeDir(); err == nil {
cfg.BrowserOSDir = filepath.Join(home, ".browseros-dogfood")
}
}
if cfg.DevUserDataDir == "" {
cfg.DevUserDataDir = filepath.Join(cfgDir, "profile")
}
if cfg.Ports.CDP == 0 {
cfg.Ports.CDP = 9015
}
if cfg.Ports.Server == 0 {
cfg.Ports.Server = 9115
}
if cfg.Ports.Extension == 0 {
cfg.Ports.Extension = 9315
}
}
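Together, `applyDogfoodDefaults` and `firstNonEmpty` give each dogfood setting a three-level precedence: explicit flag, then config-file value, then built-in default. A minimal sketch of that layering (the `pick` helper is illustrative, not a name from this PR):

```go
package main

import (
	"fmt"
	"strings"
)

// pick mirrors the firstNonEmpty + applyDogfoodDefaults layering used for the
// dogfood target: an explicit flag wins, then the config-file value, then a
// built-in default.
func pick(flag, config, fallback string) string {
	for _, v := range []string{flag, config, fallback} {
		if strings.TrimSpace(v) != "" {
			return strings.TrimSpace(v)
		}
	}
	return ""
}

func main() {
	fmt.Println(pick("", "~/.browseros-dogfood", "/default")) // config wins when no flag
	fmt.Println(pick("/override", "~/.browseros-dogfood", "/default")) // flag wins
}
```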
func resolveProdTarget(opts resetTargetOptions) (resetTarget, error) {
browserosDir, err := resolveBrowserOSDir(opts.BrowserOSDir, prodDirName)
if err != nil {
return resetTarget{}, err
}
return resetTarget{
Name: targetProd,
Title: "BrowserOS prod reset",
BrowserOSDir: browserosDir,
LimaHome: filepath.Join(browserosDir, "lima"),
DeleteRootLabel: "Delete prod BrowserOS state?",
DeleteRootBody: "It removes ~/.browseros server data plus VM/OpenClaw state. It does not delete your BrowserOS browser profile.",
}, nil
}
func resolveBrowserOSDir(override string, dirName string) (string, error) {
if strings.TrimSpace(override) != "" {
return filepath.Abs(expandTilde(strings.TrimSpace(override)))
}
if dirName == devDirName {
if env := strings.TrimSpace(os.Getenv("BROWSEROS_DIR")); env != "" {
return filepath.Abs(expandTilde(env))
}
}
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
return filepath.Join(home, dirName), nil
}
func resolveTargetPorts(root string, explicit string) (proc.Ports, error) {
if strings.TrimSpace(explicit) != "" {
return parsePorts(explicit)
}
for _, path := range []string{
filepath.Join(root, "apps/server/.env.development"),
filepath.Join(root, "apps/server/.env.example"),
} {
ports, ok, err := readPortsFromEnvFile(path)
if err != nil {
return proc.Ports{}, err
}
if ok {
return ports, nil
}
}
return proc.DefaultLocalPorts(), nil
}
func readPortsFromEnvFile(path string) (proc.Ports, bool, error) {
file, err := os.Open(path)
if os.IsNotExist(err) {
return proc.Ports{}, false, nil
}
if err != nil {
return proc.Ports{}, false, err
}
defer file.Close()
values := map[string]int{}
scanner := bufio.NewScanner(file)
for scanner.Scan() {
key, value, ok := parseEnvLine(scanner.Text())
if !ok {
continue
}
switch key {
case "BROWSEROS_CDP_PORT", "BROWSEROS_SERVER_PORT", "BROWSEROS_EXTENSION_PORT":
port, err := strconv.Atoi(value)
if err != nil {
return proc.Ports{}, false, fmt.Errorf("parse %s in %s: %w", key, path, err)
}
values[key] = port
}
}
if err := scanner.Err(); err != nil {
return proc.Ports{}, false, err
}
if len(values) != 3 {
return proc.Ports{}, false, nil
}
return proc.Ports{
CDP: values["BROWSEROS_CDP_PORT"],
Server: values["BROWSEROS_SERVER_PORT"],
Extension: values["BROWSEROS_EXTENSION_PORT"],
}, true, nil
}
func parseEnvLine(line string) (string, string, bool) {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
return "", "", false
}
key, value, ok := strings.Cut(line, "=")
if !ok {
return "", "", false
}
key = strings.TrimSpace(key)
value = strings.TrimSpace(stripInlineComment(value))
value = strings.Trim(value, `"'`)
return key, value, key != "" && value != ""
}
func stripInlineComment(value string) string {
quote := byte(0)
for index := 0; index < len(value); index++ {
switch value[index] {
case '\'', '"':
if quote == 0 {
quote = value[index]
} else if quote == value[index] {
quote = 0
}
case '#':
if quote == 0 {
return value[:index]
}
}
}
return value
}
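The quote-aware comment stripping above is easy to exercise in isolation. This sketch copies the same scanning logic into a standalone program: an unquoted `#` starts a comment, while `#` inside single or double quotes is preserved.

```go
package main

import "fmt"

// stripInlineComment drops an unquoted trailing "# comment" from an env
// value, leaving '#' characters inside single or double quotes untouched
// (same logic as the helper in target.go above).
func stripInlineComment(value string) string {
	quote := byte(0)
	for i := 0; i < len(value); i++ {
		switch value[i] {
		case '\'', '"':
			if quote == 0 {
				quote = value[i]
			} else if quote == value[i] {
				quote = 0
			}
		case '#':
			if quote == 0 {
				return value[:i]
			}
		}
	}
	return value
}

func main() {
	fmt.Println(stripInlineComment(`9105 # dev server port`)) // "9105 "
	fmt.Println(stripInlineComment(`"a # not a comment"`))    // quoted '#' survives
}
```

Callers like `parseEnvLine` then trim whitespace and surrounding quotes from the result.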
func parsePorts(value string) (proc.Ports, error) {
parts := strings.Split(value, ",")
if len(parts) != 3 {
return proc.Ports{}, fmt.Errorf("ports must be cdp,server,extension")
}
parsed := [3]int{}
for i, part := range parts {
port, err := strconv.Atoi(strings.TrimSpace(part))
if err != nil {
return proc.Ports{}, fmt.Errorf("parse port %q: %w", part, err)
}
if port <= 0 || port > 65535 {
return proc.Ports{}, fmt.Errorf("port %d out of range", port)
}
parsed[i] = port
}
return proc.Ports{CDP: parsed[0], Server: parsed[1], Extension: parsed[2]}, nil
}
func formatPorts(ports proc.Ports) string {
return fmt.Sprintf("%d,%d,%d", ports.CDP, ports.Server, ports.Extension)
}
func dogfoodConfigDir() (string, error) {
if xdg := strings.TrimSpace(os.Getenv("XDG_CONFIG_HOME")); xdg != "" {
return filepath.Join(expandTilde(xdg), "browseros-dogfood"), nil
}
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
return filepath.Join(home, ".config", "browseros-dogfood"), nil
}
func loadDogfoodConfig(path string) (dogfoodConfigFile, error) {
data, err := os.ReadFile(path)
if err != nil {
return dogfoodConfigFile{}, fmt.Errorf("read dogfood config at %s: %w", path, err)
}
var cfg dogfoodConfigFile
if err := yaml.Unmarshal(data, &cfg); err != nil {
return dogfoodConfigFile{}, fmt.Errorf("parse dogfood config: %w", err)
}
return cfg, nil
}
func expandTilde(path string) string {
if path == "~" {
if home, err := os.UserHomeDir(); err == nil {
return home
}
}
if strings.HasPrefix(path, "~/") {
if home, err := os.UserHomeDir(); err == nil {
return filepath.Join(home, path[2:])
}
}
return path
}
func firstNonEmpty(values ...string) string {
for _, value := range values {
if strings.TrimSpace(value) != "" {
return strings.TrimSpace(value)
}
}
return ""
}
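The `--ports` flag uses a strict "cdp,server,extension" convention: exactly three comma-separated ports, each in 1..65535, whitespace tolerated. A standalone sketch of the same parsing rule, with a local `Ports` struct standing in for `proc.Ports` (an assumption; the real type lives in the browseros-dev/proc package):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Ports stands in for proc.Ports in this standalone sketch.
type Ports struct{ CDP, Server, Extension int }

// parsePortsTriple follows the same "cdp,server,extension" convention as
// parsePorts in target.go: exactly three comma-separated ports, each in
// the range 1..65535.
func parsePortsTriple(value string) (Ports, error) {
	parts := strings.Split(value, ",")
	if len(parts) != 3 {
		return Ports{}, fmt.Errorf("ports must be cdp,server,extension")
	}
	var parsed [3]int
	for i, part := range parts {
		port, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return Ports{}, fmt.Errorf("parse port %q: %w", part, err)
		}
		if port <= 0 || port > 65535 {
			return Ports{}, fmt.Errorf("port %d out of range", port)
		}
		parsed[i] = port
	}
	return Ports{CDP: parsed[0], Server: parsed[1], Extension: parsed[2]}, nil
}

func main() {
	p, err := parsePortsTriple(" 9015, 9115, 9315 ")
	fmt.Println(p, err) // {9015 9115 9315} <nil>
}
```

The strict three-value shape is what lets `formatPorts` round-trip a config's ports back through the same parser.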

View File

@@ -0,0 +1,166 @@
package cmd
import (
"os"
"path/filepath"
"testing"
)
func TestResolveDevTargetReadsDevelopmentEnvPorts(t *testing.T) {
root := t.TempDir()
serverDir := filepath.Join(root, "apps/server")
if err := os.MkdirAll(serverDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(serverDir, ".env.development"), []byte(
"BROWSEROS_CDP_PORT=9101\nBROWSEROS_SERVER_PORT=9201\nBROWSEROS_EXTENSION_PORT=9301\n",
), 0o644); err != nil {
t.Fatal(err)
}
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("BROWSEROS_DIR", "")
target, err := resolveResetTarget(root, resetTargetOptions{Target: "dev"})
if err != nil {
t.Fatal(err)
}
if target.Ports == nil || target.Ports.CDP != 9101 || target.Ports.Server != 9201 || target.Ports.Extension != 9301 {
t.Fatalf("unexpected ports: %#v", target.Ports)
}
if target.BrowserOSDir != filepath.Join(home, ".browseros-dev") {
t.Fatalf("unexpected browseros dir: %s", target.BrowserOSDir)
}
}
func TestResolveDevTargetFallsBackToExampleEnvPorts(t *testing.T) {
root := t.TempDir()
serverDir := filepath.Join(root, "apps/server")
if err := os.MkdirAll(serverDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(serverDir, ".env.example"), []byte(
"BROWSEROS_CDP_PORT=9000\nBROWSEROS_SERVER_PORT=9100\nBROWSEROS_EXTENSION_PORT=9300\n",
), 0o644); err != nil {
t.Fatal(err)
}
t.Setenv("HOME", t.TempDir())
t.Setenv("BROWSEROS_DIR", "")
target, err := resolveResetTarget(root, resetTargetOptions{Target: "dev"})
if err != nil {
t.Fatal(err)
}
if target.Ports == nil || target.Ports.CDP != 9000 || target.Ports.Server != 9100 || target.Ports.Extension != 9300 {
t.Fatalf("unexpected ports: %#v", target.Ports)
}
}
func TestReadPortsFromEnvFileStripsHashComments(t *testing.T) {
path := filepath.Join(t.TempDir(), ".env")
if err := os.WriteFile(path, []byte(
"BROWSEROS_CDP_PORT=9005#comment\nBROWSEROS_SERVER_PORT=9105 # comment\nBROWSEROS_EXTENSION_PORT=9305\n",
), 0o644); err != nil {
t.Fatal(err)
}
ports, ok, err := readPortsFromEnvFile(path)
if err != nil {
t.Fatal(err)
}
if !ok {
t.Fatal("expected ports to be found")
}
if ports.CDP != 9005 || ports.Server != 9105 || ports.Extension != 9305 {
t.Fatalf("unexpected ports: %#v", ports)
}
}
func TestResolveDogfoodTargetReadsDogfoodConfig(t *testing.T) {
root := t.TempDir()
xdgConfig := t.TempDir()
t.Setenv("XDG_CONFIG_HOME", xdgConfig)
cfgDir := filepath.Join(xdgConfig, "browseros-dogfood")
if err := os.MkdirAll(cfgDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(cfgDir, "config.yaml"), []byte(`
browseros_dir: /tmp/browseros-dogfood-state
dev_user_data_dir: /tmp/browseros-dogfood-profile
ports:
cdp: 9015
server: 9115
extension: 9315
`), 0o644); err != nil {
t.Fatal(err)
}
target, err := resolveResetTarget(root, resetTargetOptions{Target: "dogfood"})
if err != nil {
t.Fatal(err)
}
if target.BrowserOSDir != "/tmp/browseros-dogfood-state" {
t.Fatalf("unexpected browseros dir: %s", target.BrowserOSDir)
}
if target.Ports == nil || target.Ports.CDP != 9015 || target.Ports.Server != 9115 || target.Ports.Extension != 9315 {
t.Fatalf("unexpected ports: %#v", target.Ports)
}
if len(target.BrowserUserDataDirs) != 1 || target.BrowserUserDataDirs[0] != "/tmp/browseros-dogfood-profile" {
t.Fatalf("unexpected browser user data dirs: %#v", target.BrowserUserDataDirs)
}
if target.Dogfood == nil || target.Dogfood.StatePath != filepath.Join(cfgDir, "state.json") {
t.Fatalf("unexpected dogfood runtime paths: %#v", target.Dogfood)
}
}
func TestResolveDogfoodTargetAppliesDogfoodDefaults(t *testing.T) {
root := t.TempDir()
home := t.TempDir()
xdgConfig := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("XDG_CONFIG_HOME", xdgConfig)
cfgDir := filepath.Join(xdgConfig, "browseros-dogfood")
if err := os.MkdirAll(cfgDir, 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(cfgDir, "config.yaml"), []byte("{}\n"), 0o644); err != nil {
t.Fatal(err)
}
target, err := resolveResetTarget(root, resetTargetOptions{Target: "dogfood"})
if err != nil {
t.Fatal(err)
}
if target.BrowserOSDir != filepath.Join(home, ".browseros-dogfood") {
t.Fatalf("unexpected browseros dir: %s", target.BrowserOSDir)
}
if target.Ports == nil || target.Ports.CDP != 9015 || target.Ports.Server != 9115 || target.Ports.Extension != 9315 {
t.Fatalf("unexpected ports: %#v", target.Ports)
}
if len(target.BrowserUserDataDirs) != 1 || target.BrowserUserDataDirs[0] != filepath.Join(cfgDir, "profile") {
t.Fatalf("unexpected browser user data dirs: %#v", target.BrowserUserDataDirs)
}
}
func TestResolveProdTargetUsesBrowserosStateRoot(t *testing.T) {
root := t.TempDir()
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("BROWSEROS_DIR", "")
target, err := resolveResetTarget(root, resetTargetOptions{Target: "prod"})
if err != nil {
t.Fatal(err)
}
if target.BrowserOSDir != filepath.Join(home, ".browseros") {
t.Fatalf("unexpected browseros dir: %s", target.BrowserOSDir)
}
if target.Ports != nil {
t.Fatalf("expected prod target to leave ports unset by default: %#v", target.Ports)
}
}

View File

@@ -41,7 +41,10 @@ func runTest(cmd *cobra.Command, args []string) error {
return err
}
p := proc.DefaultLocalPorts()
p, err := resolveTargetPorts(root, "")
if err != nil {
return err
}
proc.LogMsg(proc.TagInfo, "Killing processes on test ports...")
proc.KillPorts(p)

View File

@@ -44,7 +44,10 @@ func runWatch(cmd *cobra.Command, args []string) error {
return err
}
defaultPorts := proc.DefaultLocalPorts()
defaultPorts, err := resolveTargetPorts(root, "")
if err != nil {
return err
}
p := defaultPorts
var reservations *proc.PortReservations
userDataDir := "/tmp/browseros-dev"
@@ -115,6 +118,10 @@ func runWatch(cmd *cobra.Command, args []string) error {
}()
defer reservations.ReleaseAll()
if err := runDevSetup(cmd.Context(), root, setupModeIfNeeded); err != nil {
return err
}
fmt.Println()
proc.LogMsgf(proc.TagInfo, "Mode: %s", proc.BoldColor.Sprint(mode))
proc.LogMsgf(proc.TagInfo, "Ports: CDP=%d Server=%d Extension=%d", p.CDP, p.Server, p.Extension)

View File

@@ -5,6 +5,7 @@ go 1.25.7
require (
github.com/fatih/color v1.18.0
github.com/spf13/cobra v1.10.2
gopkg.in/yaml.v3 v3.0.1
)
require (

View File

@@ -18,4 +18,7 @@ golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.25.0 h1:r+8e+loiHxRqhXVl6ML1nO3l1+oFoWbnlu2Ehimmi34=
golang.org/x/sys v0.25.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

View File

@@ -14,6 +14,7 @@ type Tag struct {
var (
TagBuild = Tag{"build", color.New(color.FgYellow)}
TagSetup = Tag{"setup", color.New(color.FgHiYellow)}
TagAgent = Tag{"agent", color.New(color.FgMagenta)}
TagServer = Tag{"server", color.New(color.FgCyan)}
TagBrowser = Tag{"browser", color.New(color.FgBlue)}

View File

@@ -27,7 +27,7 @@ const (
randomPortMax = 9999
)
var defaultLocalPorts = Ports{CDP: 9005, Server: 9105, Extension: 9305}
var defaultLocalPorts = Ports{CDP: 9000, Server: 9100, Extension: 9300}
func DefaultLocalPorts() Ports {
return defaultLocalPorts

View File

@@ -169,7 +169,16 @@ func StopAllWatchProcessesInDir(baseDir string, timeout time.Duration) (int, err
// KillBrowserProcessesForDevProfiles kills BrowserOS instances using temporary dev/test profiles.
func KillBrowserProcessesForDevProfiles(timeout time.Duration) (int, error) {
pids, err := currentBrowserProfilePIDs()
return killBrowserProcesses([]string{"/tmp/browseros-dev"}, true, timeout)
}
// KillBrowserProcessesForUserDataDirs kills BrowserOS instances using the given user-data dirs.
func KillBrowserProcessesForUserDataDirs(userDataDirs []string, timeout time.Duration) (int, error) {
return killBrowserProcesses(userDataDirs, false, timeout)
}
func killBrowserProcesses(userDataDirs []string, includeDevTempProfiles bool, timeout time.Duration) (int, error) {
pids, err := currentBrowserProfilePIDs(userDataDirs, includeDevTempProfiles)
if err != nil {
return 0, err
}
@@ -184,7 +193,7 @@ func KillBrowserProcessesForDevProfiles(timeout time.Duration) (int, error) {
deadline := time.Now().Add(timeout)
for {
remaining, err := currentBrowserProfilePIDs()
remaining, err := currentBrowserProfilePIDs(userDataDirs, includeDevTempProfiles)
if err != nil {
return 0, err
}
@@ -292,15 +301,19 @@ func processGroupLive(pgid int) bool {
return err == nil || err == syscall.EPERM
}
func currentBrowserProfilePIDs() ([]int, error) {
func currentBrowserProfilePIDs(userDataDirs []string, includeDevTempProfiles bool) ([]int, error) {
output, err := exec.Command("ps", "-axo", "pid=,pgid=,command=").Output()
if err != nil {
return nil, fmt.Errorf("listing processes: %w", err)
}
return browserProfilePIDsFromPS(string(output)), nil
return browserProfilePIDsFromPSForUserDataDirs(string(output), userDataDirs, includeDevTempProfiles), nil
}
func browserProfilePIDsFromPS(output string) []int {
return browserProfilePIDsFromPSForUserDataDirs(output, []string{"/tmp/browseros-dev"}, true)
}
func browserProfilePIDsFromPSForUserDataDirs(output string, userDataDirs []string, includeDevTempProfiles bool) []int {
var pids []int
for _, line := range strings.Split(output, "\n") {
fields := strings.Fields(line)
@@ -312,7 +325,7 @@ func browserProfilePIDsFromPS(output string) []int {
continue
}
command := strings.Join(fields[2:], " ")
if isDevBrowserProcess(command) {
if isBrowserProcessForUserDataDir(command, userDataDirs, includeDevTempProfiles) {
pids = append(pids, pid)
}
}
@@ -321,12 +334,24 @@ func browserProfilePIDsFromPS(output string) []int {
}
func isDevBrowserProcess(command string) bool {
return isBrowserProcessForUserDataDir(command, []string{"/tmp/browseros-dev"}, true)
}
func isBrowserProcessForUserDataDir(command string, userDataDirs []string, includeDevTempProfiles bool) bool {
if !strings.Contains(command, "BrowserOS.app/Contents/MacOS/BrowserOS") {
return false
}
return strings.Contains(command, "--user-data-dir=/tmp/browseros-dev") ||
strings.Contains(command, "browseros-dev-") ||
strings.Contains(command, "browseros-test-")
for _, dir := range userDataDirs {
if dir == "" {
continue
}
if strings.Contains(command, "--user-data-dir="+dir) {
return true
}
}
return includeDevTempProfiles &&
(strings.Contains(command, "browseros-dev-") ||
strings.Contains(command, "browseros-test-"))
}
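The matching rule above can be sketched as a single predicate over one `ps` command line: the line must be a BrowserOS binary, and it must either use one of the given `--user-data-dir` values or, when dev temp profiles are included, reference a `browseros-dev-`/`browseros-test-` temp profile. The helper name `matchesBrowserProcess` and the sample command line below are illustrative, not names from this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesBrowserProcess reproduces the isBrowserProcessForUserDataDir rule
// for a single ps command line.
func matchesBrowserProcess(command string, userDataDirs []string, includeDevTemp bool) bool {
	if !strings.Contains(command, "BrowserOS.app/Contents/MacOS/BrowserOS") {
		return false
	}
	for _, dir := range userDataDirs {
		if dir != "" && strings.Contains(command, "--user-data-dir="+dir) {
			return true
		}
	}
	return includeDevTemp &&
		(strings.Contains(command, "browseros-dev-") ||
			strings.Contains(command, "browseros-test-"))
}

func main() {
	cmd := "/Applications/BrowserOS.app/Contents/MacOS/BrowserOS" +
		" --user-data-dir=/tmp/browseros-dogfood-profile"
	// Dogfood reset matches only its own user-data dir, no temp profiles.
	fmt.Println(matchesBrowserProcess(cmd, []string{"/tmp/browseros-dogfood-profile"}, false)) // true
}
```

Passing `includeDevTemp=false` for non-dev targets is what keeps a dogfood or prod reset from killing unrelated dev/test browser instances.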
func watchRunPaths(baseDir string, identity WatchRunIdentity) watchRunPathsResult {

View File

@@ -2,14 +2,4 @@
set -euo pipefail
DIR="$(cd "$(dirname "$0")" && pwd)"
ROOT="$(cd "$DIR/../.." && pwd)"
cd "$ROOT"
echo "[setup] Installing dependencies..."
bun install --frozen-lockfile
echo "[setup] Generating agent code..."
bun run codegen:agent
echo "[setup] Ready"
exec "$DIR/run.sh" setup "$@"