Compare commits

..

2 Commits

Author SHA1 Message Date
Nikhil Sonti
c60ac9496a fix: address cleanup reset review feedback 2026-04-30 12:25:38 -07:00
Nikhil Sonti
8acf4f54fd feat(dev): add guided cleanup and reset commands 2026-04-30 12:25:38 -07:00
269 changed files with 2951 additions and 20971 deletions

View File

@@ -1,152 +0,0 @@
---
name: ask-internal
description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---
# Ask Internal
Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
## When to use
- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
## Hard rules — never do these
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.
## Voice rules
Apply the same voice rules as `document-internal` to the synthesized answer:
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.
## Workflow
### Step 0: Pre-flight
```bash
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
echo ".internal-docs/ missing or empty. Submodule not configured?"
exit 0
}
```
### Step 1: Parse the question
Pull the keywords from the user's question. Drop stop words. Identify intent:
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
### Step 2: Multi-source search
Run grep in parallel across two sources.
**Internal docs:**
```bash
grep -rni --include='*.md' '<keyword>' .internal-docs/
```
Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
**Codebase (skip vendored Chromium and `node_modules`):**
```bash
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
--exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
'<keyword>' packages/ scripts/ .config/ .github/
```
Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
### Step 3: Synthesize answer
Structure the response:
1. **Direct answer.** First sentence answers the question. No preamble.
2. **Steps if applicable.** Numbered list with exact commands.
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
### Step 4: Offer execution (only if commands surfaced)
If Step 3 produced executable commands the user could run, ask:
> Run these for you? (y / n / dry-run)
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
### Step 5: Doc-not-found path
If Step 2 returned nothing useful (no doc hits AND no clear code answer):
1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
2. Ask: "Draft a new doc and open a PR to internal-docs?"
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
### Step 6: Completion status
Report one of:
- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
## Citation discipline
Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
If a doc says one thing and the code says another, surface the conflict explicitly:
> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
## Common Mistakes
**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.
## Red Flags
**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.

View File

@@ -1,208 +0,0 @@
---
name: document-internal
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---
# Document Internal
Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
## When to use
After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
## Hard rules — never do these
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
## Voice rules — enforced by Step 4
The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.
## Workflow
### Step 0: Pre-flight
Bail with a clear message on any failure.
```bash
# Submodule must be initialized
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
exit 0
fi
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
# Must be on a feature branch
BRANCH=$(git branch --show-current)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
echo "On $BRANCH. Run from a feature branch."
exit 0
fi
# Determine base branch (default: dev for this repo, fall back to main).
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
# Gather context
git log "$BASE..HEAD" --oneline
git diff "$BASE...HEAD" --stat
gh pr view --json body -q .body 2>/dev/null # may be empty if no PR yet
```
### Step 1: Identify the doc
Ask the user for three things in one prompt:
1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
### Step 2: Decision brief — four sharp questions
Ask one question at a time. Each answer constrains the next. These force compression before drafting.
1. "In one sentence: what can someone now DO that they could not before?"
2. "What is the one design decision a future engineer needs to know?"
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
4. "Any sharp edges or gotchas? (or 'none')"
Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
### Step 3: Draft from the template
Read the matching template from `.internal-docs/_templates/`:
- `feature` `feature-note.md`
- `architecture` `architecture-note.md`
- `design` `design-spec.md`
If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
Generate the 1-pager from the template, the four answers, and the diff context.
### Step 4: Voice self-check
Scan the draft for violations:
- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).
If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
### Step 5: Show + iterate
Print the full draft. Ask:
> Edit needed? Paste any changes, or say "looks good".
Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
### Step 6: Open PR to internal-docs
Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
```bash
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT # cleans up even if any step below fails
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
cd "$TMP"
git checkout -b "docs/<slug>"
# Write the doc
mkdir -p "<type>" # features, architecture, designs, or setup
cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
<draft content>
DOC
# Update the root README index — insert one line under the matching section
# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
git commit -m "docs(<type>): <slug>"
git push -u origin "docs/<slug>"
PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
--head "docs/<slug>" \
--title "docs(<type>): <slug>" \
--body "$(cat <<'BODY'
## Summary
<one-line of what this doc covers>
## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
BODY
)")
cd -
echo "PR opened: $PR_URL"
# trap above cleans up $TMP on EXIT
```
If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
### Step 7: Completion status
Report one of:
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
## Doc type defaults
| Branch pattern | Default doc type | Default location |
|----------------|------------------|------------------|
| `feat/*` | feature | `features/` |
| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
| `rfc/*` or `design/*` | design | `designs/` |
| Otherwise | ask | ask |
## Common Mistakes
**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.
**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.
## Red Flags
**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.
**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.

View File

@@ -1,51 +0,0 @@
# BrowserOS Internal Docs
Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
## How to find what you need
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)
Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
## How to add a doc
Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
## Index
### Setup
<!-- one line per setup runbook: -->
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
### Features
<!-- one line per shipped feature, newest first: -->
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
### Architecture
<!-- one line per cross-cutting subsystem: -->
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
### Designs
<!-- one line per design spec, newest first: -->
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
## Templates
When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
## Voice
Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.

View File

@@ -1,31 +0,0 @@
---
title: <subsystem name>
owner: <github handle>
status: current | deprecated
date: YYYY-MM-DD
related-features: [feature-slug-1, feature-slug-2]
---
# <subsystem name>
## What this subsystem does
<1-2 paragraphs. The top-level responsibility. Boundaries.>
## Architecture
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
## Constraints
<Hard rules the design enforces. "X must never call Y" type statements.>
## Decisions made
<Numbered list of non-obvious decisions and the reason for each.>
## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here
## How to evolve this
<Where to add things. Which tests to expect to update. What NOT to touch.>
## Open questions
<What is still being figured out. Empty if none.>

View File

@@ -1,34 +0,0 @@
---
title: <design name>
owner: <github handle>
status: proposed | accepted | rejected | superseded
date: YYYY-MM-DD
supersedes: <design-slug or none>
---
# <design name>
## Goal
<2-4 sentences. What this design is trying to accomplish.>
## Context
<1-2 paragraphs. The current state, what is failing, why this needs to change.>
## Selected Approach
<The chosen design at a high level. Architecture, components, data flow.>
## Alternatives Considered
### 1. <name>
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
### 2. <name>
<Same shape.>
## Out of Scope
<What this design does NOT cover. Defer references.>
## Rollout
<Numbered steps from "nothing exists" to "fully shipped".>
## Open Questions
<Resolved during design? Empty. Unresolved? List with owner.>

View File

@@ -1,29 +0,0 @@
---
title: <feature name>
owner: <github handle>
status: shipped | wip | deprecated
date: YYYY-MM-DD
prs: ["#NNN"]
tags: [agent, browser, mcp]
---
# <feature name>
## What it does
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
## Why we built it
<1-2 sentences. Motivation. What pain it removed or what unlocked.>
## How it works
<3-6 sentences. The flow at a high level. Name the key files.>
## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does
## How to run / test it locally
<bullet list of commands. Empty section if N/A do not fake.>
## Gotchas
<known sharp edges. "If you see X, that's why." Empty if N/A.>

View File

@@ -44,19 +44,6 @@ jobs:
working-directory: packages/browseros-agent
run: bun install --ignore-scripts
- name: Install Claude Code CLI
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
if bun -e "const config = await Bun.file(process.env.EVAL_CONFIG).json(); process.exit(config.agent?.type === 'claude-code' ? 0 : 1)"; then
npm install -g @anthropic-ai/claude-code@2.1.119
echo "Claude Code CLI installed at $(command -v claude)"
claude --version
else
echo "Eval config does not use Claude Code; skipping Claude Code CLI install"
fi
- name: Install Python eval dependencies
# agisdk pinned so silent upstream releases can't shift task definitions
# or grader behavior. Bump intentionally with a documented re-baseline.
@@ -80,11 +67,13 @@ jobs:
env:
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION || 'us-west-2' }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
# OpenClaw container runtime is macOS-only; opt the Linux runner
@@ -93,35 +82,7 @@ jobs:
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG"
# Capture the run directory so report.html can be generated before the R2 publish step.
SUMMARY_PATH="$(find results -name summary.json -type f -print | sort | tail -n 1)"
if [ -z "$SUMMARY_PATH" ]; then
echo "No eval run summary found"
exit 1
fi
RUN_DIR="$(dirname "$SUMMARY_PATH")"
echo "EVAL_RUN_DIR=$RUN_DIR" >> "$GITHUB_ENV"
- name: Generate run analysis report
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
run: |
echo "Generating run report for $EVAL_RUN_DIR"
bun scripts/generate-report.ts --input "$EVAL_RUN_DIR" --output "$EVAL_RUN_DIR/report.html"
- name: Publish eval run to R2
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
run: bun run src/index.ts publish --run "$EVAL_RUN_DIR" --target r2
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2
- name: Generate trend report
if: success()
@@ -136,7 +97,7 @@ jobs:
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
run: bun apps/eval/scripts/weekly-report.ts /tmp/eval-report.html
- name: Upload trend report as artifact
- name: Upload report as artifact
if: success()
uses: actions/upload-artifact@v4
with:

View File

@@ -1,62 +0,0 @@
name: Sync internal-docs submodule
on:
schedule:
- cron: '0 */4 * * *'
workflow_dispatch:
jobs:
sync:
name: Bump internal-docs submodule pointer on dev
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
steps:
- name: Rewrite SSH submodule URL to HTTPS-with-token
env:
TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
run: |
git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
- uses: actions/checkout@v4
with:
token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
submodules: true
ref: dev
fetch-depth: 50
- name: Open auto-merge PR if internal-docs has new commits
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -e
# Skip if submodule not yet configured (handoff window before someone adds it)
if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
exit 0
fi
git submodule update --remote --merge .internal-docs
if git diff --quiet .internal-docs; then
echo "No internal-docs changes to sync."
exit 0
fi
BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
git config user.name "browseros-bot"
git config user.email "bot@browseros.ai"
git checkout -b "$BRANCH"
git add .internal-docs
git commit -m "chore: sync internal-docs submodule"
git push -u origin "$BRANCH"
PR_URL=$(gh pr create \
--base dev \
--head "$BRANCH" \
--title "chore: sync internal-docs submodule" \
--body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")
gh pr merge "$PR_URL" --auto --squash --delete-branch

4
.gitmodules vendored
View File

@@ -1,4 +0,0 @@
[submodule ".internal-docs"]
path = .internal-docs
url = git@github.com:browseros-ai/internal-docs.git
branch = main

Submodule .internal-docs deleted from 590799ae1c

View File

@@ -1,44 +1,186 @@
import { ArrowLeft, PanelRight } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { ArrowLeft, Bot, Home } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
import { Button } from '@/components/ui/button'
import type {
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import {
cancelHarnessTurn,
useAgentAdapters,
useEnqueueHarnessMessage,
useHarnessAgents,
useRemoveHarnessQueuedMessage,
useUpdateHarnessAgent,
} from '@/entrypoints/app/agents/useAgents'
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
import { type ProducedFilesRailGroup, useAgentOutputs } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { AgentRail } from './AgentRail'
import { useAgentCommandData } from './agent-command-layout'
import {
OutputsRail,
useOutputsRailOpen,
} from './agent-conversation.outputs-rail'
type AgentEntry,
getModelDisplayName,
} from '@/entrypoints/app/agents/useOpenClaw'
import { cn } from '@/lib/utils'
import { useAgentCommandData } from './agent-command-layout'
import { ClawChat } from './ClawChat'
import { ConversationHeader } from './ConversationHeader'
import { ConversationInput } from './ConversationInput'
import {
buildChatHistoryFromClawMessages,
filterTurnsPersistedInHistory,
flattenHistoryPages,
mapHistoryToProducedFilesGroups,
selectStripOnlyTurns,
} from './claw-chat-types'
import { consumePendingInitialMessage } from './pending-initial-message'
import { QueuePanel } from './QueuePanel'
import { useAgentConversation } from './useAgentConversation'
import { useHarnessChatHistory } from './useHarnessChatHistory'
function StatusBadge({ status }: { status: string }) {
return (
<div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
<span
className={cn(
'size-1.5 rounded-full',
status === 'Working on your request'
? 'bg-amber-500'
: status === 'Ready'
? 'bg-emerald-500'
: status === 'Offline'
? 'bg-muted-foreground/50'
: 'bg-[var(--accent-orange)]',
)}
/>
<span>{status}</span>
</div>
)
}
function AgentIdentity({
name,
meta,
className,
}: {
name: string
meta: string
className?: string
}) {
return (
<div className={cn('min-w-0', className)}>
<div className="truncate font-semibold text-[15px] leading-5">{name}</div>
<div className="truncate text-muted-foreground text-xs leading-5">
{meta}
</div>
</div>
)
}
function ConversationHeader({
agentName,
agentMeta,
status,
backLabel,
backTarget,
onGoHome,
}: {
agentName: string
agentMeta: string
status: string
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
}) {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
return (
<div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 rounded-xl lg:hidden"
title={backLabel}
>
<BackIcon className="size-4" />
</Button>
<div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
<Bot className="size-4" />
</div>
<AgentIdentity name={agentName} meta={agentMeta} />
</div>
<StatusBadge status={status} />
</div>
)
}
function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
return (
<div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 rounded-xl"
title="Back to home"
>
<ArrowLeft className="size-4" />
</Button>
<div className="truncate font-semibold text-[15px] leading-5">
Agents
</div>
</div>
</div>
)
}
function AgentRailList({
activeAgentId,
agents,
onSelectAgent,
}: {
activeAgentId: string
agents: AgentEntry[]
onSelectAgent: (entry: AgentEntry) => void
}) {
return (
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
<div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
{agents.map((entry) => {
const active = entry.agentId === activeAgentId
const modelName = getAgentEntryMeta(entry)
return (
<button
key={entry.agentId}
type="button"
onClick={() => onSelectAgent(entry)}
className={cn(
'w-full rounded-2xl border px-3 py-3 text-left transition-all',
active
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
)}
>
<div className="flex items-center gap-3">
<div
className={cn(
'flex size-9 items-center justify-center rounded-xl',
active
? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
: 'bg-muted text-muted-foreground',
)}
>
<Bot className="size-4" />
</div>
<AgentIdentity name={entry.name} meta={modelName} />
</div>
</button>
)
})}
</div>
</aside>
)
}
function getAgentEntryMeta(agent: AgentEntry | undefined): string {
if (agent?.source === 'agent-harness') {
return getModelDisplayName(agent.model) ?? 'ACP agent'
}
return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
}
function AgentConversationController({
agentId,
initialMessage,
@@ -46,7 +188,6 @@ function AgentConversationController({
agents,
agentPathPrefix,
createAgentPath,
onOpenOutputsRail,
}: {
agentId: string
initialMessage: string | null
@@ -54,7 +195,6 @@ function AgentConversationController({
agents: AgentEntry[]
agentPathPrefix: string
createAgentPath: string
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
}) {
const navigate = useNavigate()
const initialMessageSentRef = useRef<string | null>(null)
@@ -86,15 +226,6 @@ function AgentConversationController({
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
const queue = harnessAgent?.queue ?? []
const activeTurnId = harnessAgent?.activeTurnId ?? null
const isOpenClawAgent = harnessAgent?.adapter === 'openclaw'
// Used to surface produced-files strips on a fresh page load
// when there's no optimistic turn to carry the data. Disabled
// for non-openclaw adapters since they don't attribute files.
const { groups: agentOutputGroups } = useAgentOutputs(
agentId,
isOpenClawAgent,
)
const { turns, streaming, send } = useAgentConversation(agentId, {
runtime: 'agent-harness',
@@ -119,44 +250,6 @@ function AgentConversationController({
() => filterTurnsPersistedInHistory(turns, historyMessages),
[historyMessages, turns],
)
// Persisted turns that still need to surface their FileCardStrip
// — history items don't carry produced-files data, so without
// these the strip would vanish on history reload.
const stripOnlyTurns = useMemo(
() => selectStripOnlyTurns(turns, historyMessages),
[historyMessages, turns],
)
// Two outputs from the per-turn matcher:
// - filesByAssistantId → strip rendered directly under the
// matching assistant history bubble.
// - tailUnmatched → groups with no history pair (orphans);
// rendered at the conversation tail.
// Both are filtered to exclude turnIds already covered by a
// live or strip-only optimistic turn (those carry their own
// strip and history hasn't reloaded yet).
const { filesByAssistantId, tailStripGroups } = useMemo(() => {
if (!isOpenClawAgent) {
return {
filesByAssistantId: new Map<string, ProducedFilesRailGroup>(),
tailStripGroups: [] as ProducedFilesRailGroup[],
}
}
const coveredTurnIds = new Set<string>()
for (const turn of turns) {
if (turn.turnId) coveredTurnIds.add(turn.turnId)
}
const eligibleGroups = agentOutputGroups.filter(
(group) => !coveredTurnIds.has(group.turnId),
)
const { byAssistantMessageId, unmatched } = mapHistoryToProducedFilesGroups(
historyMessages,
eligibleGroups,
)
return {
filesByAssistantId: byAssistantMessageId,
tailStripGroups: unmatched,
}
}, [agentOutputGroups, isOpenClawAgent, historyMessages, turns])
onInitialMessageConsumedRef.current = onInitialMessageConsumed
const disabled = !agent
@@ -171,73 +264,42 @@ function AgentConversationController({
sendRef.current = send
useEffect(() => {
if (disabled || !historyReady) return
// Registry-first: when the user submitted at /home with
// attachments, the rich payload is here. URL `?q=` may also be
// present and is the text-only fallback path; the registry wins
// when both exist because it carries the binary attachments
// alongside the text.
const pending = consumePendingInitialMessage(agentId)
if (pending) {
// Mark the dedup ref so the text-only branch below doesn't
// re-fire on the same render.
if (initialMessageKey) {
initialMessageSentRef.current = initialMessageKey
}
onInitialMessageConsumedRef.current()
void sendRef.current({
text: pending.text,
attachments: pending.attachments.map((a) => a.payload),
attachmentPreviews: pending.attachments.map((a) => ({
id: a.id,
kind: a.kind,
mediaType: a.mediaType,
name: a.name,
dataUrl: a.dataUrl,
})),
})
return
}
const query = initialMessage?.trim()
if (!initialMessageKey) {
// Reset is safe even on the post-registry-fire re-run: consume
// is destructive, so the registry is already drained — there's
// nothing left for a third run to re-send.
initialMessageSentRef.current = null
return
}
if (!query || initialMessageSentRef.current === initialMessageKey) {
if (
!query ||
initialMessageSentRef.current === initialMessageKey ||
disabled ||
!historyReady
) {
return
}
initialMessageSentRef.current = initialMessageKey
onInitialMessageConsumedRef.current()
void sendRef.current({ text: query })
}, [agentId, disabled, historyReady, initialMessage, initialMessageKey])
}, [disabled, historyReady, initialMessage, initialMessageKey])
const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
}
return (
<div className="flex min-h-0 flex-1 flex-col overflow-hidden">
<div className="flex min-h-0 flex-col overflow-hidden">
<ClawChat
agentName={agentName}
historyMessages={historyMessages}
turns={visibleTurns}
stripOnlyTurns={stripOnlyTurns}
filesByAssistantId={filesByAssistantId}
tailStripGroups={tailStripGroups}
streaming={streaming}
isInitialLoading={harnessHistoryQuery.isLoading}
error={error}
hasNextPage={false}
isFetchingNextPage={false}
onFetchNextPage={() => {}}
onOpenOutputsRail={onOpenOutputsRail}
onRetry={() => {
void harnessHistoryQuery.refetch()
}}
@@ -306,22 +368,6 @@ interface AgentCommandConversationProps {
createAgentPath?: string
}
function inferAdapterFromEntry(
entry: AgentEntry | undefined,
): HarnessAgentAdapter | 'unknown' {
if (!entry) return 'unknown'
if (entry.source === 'agent-harness') {
// Harness entries don't carry the adapter on AgentEntry; the rail
// / header read the harness record directly. This branch only runs
// before the harness query resolves, so 'unknown' is correct — the
// tile's bot fallback renders until data arrives.
return 'unknown'
}
// OpenClaw-only entries (no harness shadow) are deprecated in
// practice but the rail still tolerates them.
return 'openclaw'
}
export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
variant = 'command',
backPath = '/home',
@@ -332,191 +378,60 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
const [searchParams, setSearchParams] = useSearchParams()
const navigate = useNavigate()
const { agents } = useAgentCommandData()
const { harnessAgents } = useHarnessAgents()
const { adapters } = useAgentAdapters()
const updateAgent = useUpdateHarnessAgent()
const shouldRedirectHome = !agentId
const resolvedAgentId = agentId ?? ''
const harnessAgent = harnessAgents.find(
(entry) => entry.id === resolvedAgentId,
)
const entry = agents.find((item) => item.agentId === resolvedAgentId)
const fallbackName = entry?.name || resolvedAgentId || 'Agent'
const fallbackAdapter = inferAdapterFromEntry(entry)
const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
const agentName = agent?.name || resolvedAgentId || 'Agent'
const agentMeta = getAgentEntryMeta(agent)
const initialMessage = searchParams.get('q')
const isPageVariant = variant === 'page'
const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'
const isOpenClawAgent = harnessAgent?.adapter === 'openclaw'
const [outputsRailOpen, setOutputsRailOpen] =
useOutputsRailOpen(resolvedAgentId)
const railVisible = isOpenClawAgent && outputsRailOpen
// Deep-link target for the rail. Set when (a) the user clicks
// View / +N on an inline file-card strip, or (b) an external nav
// arrived with `?outputsTurn=<turnId>`. Cleared by the rail
// itself once it has scrolled to + expanded the matching group.
const urlOutputsTurn = searchParams.get('outputsTurn')
const [focusTurnId, setFocusTurnId] = useState<string | null>(urlOutputsTurn)
// If the URL param flips while we're already on this agent, sync.
useEffect(() => {
if (!urlOutputsTurn) return
setFocusTurnId(urlOutputsTurn)
if (isOpenClawAgent) setOutputsRailOpen(true)
}, [urlOutputsTurn, isOpenClawAgent, setOutputsRailOpen])
const handleOpenOutputsRail = (turnId?: string | null) => {
if (!isOpenClawAgent) return
setOutputsRailOpen(true)
setFocusTurnId(turnId ?? null)
}
const handleFocusTurnConsumed = () => {
setFocusTurnId(null)
if (urlOutputsTurn) {
// Drop the URL param so a back-nav doesn't re-trigger the
// scroll. `replace: true` keeps history clean.
setSearchParams(
(prev) => {
const next = new URLSearchParams(prev)
next.delete('outputsTurn')
return next
},
{ replace: true },
)
}
}
const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
const adapterId = harnessAgent?.adapter
if (!adapterId) return null
const descriptor = adapters.find((item) => item.id === adapterId)
if (!descriptor?.health) return null
return {
healthy: descriptor.health.healthy,
reason: descriptor.health.reason,
}
}, [adapters, harnessAgent?.adapter])
if (shouldRedirectHome) {
return <Navigate to="/home" replace />
}
const handleSelectHarnessAgent = (target: HarnessAgent) => {
navigate(`${agentPathPrefix}/${target.id}`)
const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
}
const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
if (!target) return
updateAgent.mutate({
agentId: target.id,
patch: { pinned: next },
})
}
// Every visible agent runs through the harness now, so per-agent
// runtime status doesn't gate chat the way OpenClaw's legacy
// gateway lifecycle did. Show "Ready" once the agent record is
// resolved from the rail, "Setup" otherwise.
const statusCopy = agent ? 'Ready' : 'Setup'
return (
<div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
<div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
{/* Shared top band — the rail's "Agents" header and the chat
header live on one row so they're aligned by construction. */}
<div className="flex shrink-0 items-stretch border-border/50 border-b">
<div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
<Button
variant="ghost"
size="icon"
onClick={() => navigate(backPath)}
className="size-8 rounded-xl"
title="Back to home"
>
<ArrowLeft className="size-4" />
</Button>
<div className="truncate font-semibold text-[15px] leading-5">
Agents
</div>
</div>
<div className="min-w-0 flex-1">
<ConversationHeader
agent={harnessAgent ?? null}
fallbackName={fallbackName}
fallbackAdapter={fallbackAdapter}
adapterHealth={adapterHealth}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
onPinToggle={(next) =>
handlePinToggle(harnessAgent ?? null, next)
}
headerExtra={
isOpenClawAgent ? (
<Button
variant={railVisible ? 'secondary' : 'ghost'}
size="icon"
className="size-8 rounded-xl"
onClick={() => setOutputsRailOpen(!railVisible)}
title={railVisible ? 'Hide outputs' : 'Show outputs'}
>
<PanelRight className="size-4" />
</Button>
) : undefined
}
/>
</div>
</div>
<div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
<AgentRailHeader onGoHome={() => navigate(backPath)} />
{/* Body grid: rail list + chat (+ outputs rail when an
openclaw agent has it open). Columns share the same top
edge as the band above so headers can never drift. */}
<div
className={cn(
'grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)]',
railVisible
? 'lg:grid-cols-[288px_minmax(0,1fr)_320px]'
: 'lg:grid-cols-[288px_minmax(0,1fr)]',
)}
>
<AgentRail
agents={harnessAgents}
adapters={adapters}
activeAgentId={resolvedAgentId}
onSelectAgent={handleSelectHarnessAgent}
onPinToggle={(target, next) => handlePinToggle(target, next)}
/>
<ConversationHeader
agentName={agentName}
agentMeta={agentMeta}
status={statusCopy}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
/>
<div className="flex h-full min-h-0 flex-col overflow-hidden">
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() => {
// Preserve the outputsTurn deep-link if present —
// dropping all params would erase the rail focus
// before it had a chance to consume.
setSearchParams(
(prev) => {
const next = new URLSearchParams()
const turn = prev.get('outputsTurn')
if (turn) next.set('outputsTurn', turn)
return next
},
{ replace: true },
)
}}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
onOpenOutputsRail={isOpenClawAgent ? handleOpenOutputsRail : null}
/>
</div>
<AgentRailList
activeAgentId={resolvedAgentId}
agents={agents}
onSelectAgent={handleSelectAgent}
/>
{railVisible ? (
<OutputsRail
agentId={resolvedAgentId}
onClose={() => setOutputsRailOpen(false)}
focusTurnId={focusTurnId}
onFocusTurnConsumed={handleFocusTurnConsumed}
/>
) : null}
</div>
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() =>
setSearchParams({}, { replace: true })
}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
/>
</div>
</div>
)

View File

@@ -18,12 +18,8 @@ import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
import { AgentCardDock } from './AgentCardDock'
import { useAgentCommandData } from './agent-command-layout'
import {
ConversationInput,
type ConversationInputSendInput,
} from './ConversationInput'
import { ConversationInput } from './ConversationInput'
import { orderHomeAgents } from './home-agent-card.helpers'
import { setPendingInitialMessage } from './pending-initial-message'
function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
return (
@@ -120,19 +116,8 @@ export const AgentCommandHome: FC = () => {
}
}, [legacyAgents, selectedAgentId])
const handleSend = (input: ConversationInputSendInput) => {
const handleSend = (input: { text: string }) => {
if (!selectedAgentId) return
// Stash text + attachments in the in-memory registry. Text also
// travels in `?q=` so a hard refresh / shareable URL still works
// for text-only prompts; attachments are registry-only because a
// multi-megabyte dataUrl can't ride a URL search param. The chat
// screen prefers the registry when both are present.
setPendingInitialMessage({
agentId: selectedAgentId,
text: input.text,
attachments: input.attachments,
createdAt: Date.now(),
})
navigate(
`/home/agents/${selectedAgentId}?q=${encodeURIComponent(input.text)}`,
)
@@ -162,16 +147,12 @@ export const AgentCommandHome: FC = () => {
<>
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
<div className="space-y-3">
<h1 className="font-semibold text-[clamp(2.25rem,4.5vw,3.5rem)] leading-[1.08] tracking-[-0.025em] [text-wrap:balance]">
What should your agent{' '}
<span className="font-medium text-[var(--accent-orange)] italic">
work on
</span>{' '}
next?
<h1 className="font-semibold text-[clamp(2rem,4vw,3.25rem)] leading-tight tracking-tight">
What should your agent work on next?
</h1>
<p className="mx-auto max-w-2xl text-muted-foreground text-sm leading-6 [text-wrap:pretty]">
Start a task, continue a thread, or hand off to a different
agent all without leaving this tab.
<p className="mx-auto max-w-2xl text-muted-foreground text-sm leading-6">
Start with a task, continue a thread, or switch to another
agent without leaving the new tab.
</p>
</div>
@@ -186,7 +167,7 @@ export const AgentCommandHome: FC = () => {
streaming={false}
disabled={!selectedAgentReady}
status={selectedAgentStatus}
attachmentsEnabled={true}
attachmentsEnabled={false}
placeholder={
selectedAgentReady
? `Ask ${selectedAgentName} to handle a task...`

View File

@@ -1,65 +0,0 @@
import { type FC, useMemo } from 'react'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
import { AgentRailRow } from './AgentRailRow'
interface AgentRailProps {
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
activeAgentId: string
onSelectAgent: (agent: HarnessAgent) => void
onPinToggle: (agent: HarnessAgent, next: boolean) => void
}
/**
* Left-column scrollable list of agents. The "Agents" label + back
* button live in the shared top band above (so the rail header and
* the chat header sit on a single aligned strip rather than as two
* separately-sized headers per column). Sort matches `/agents`:
* pinned-first → recency, so the rail doesn't reshuffle as turns
* transition every 5 s.
*/
export const AgentRail: FC<AgentRailProps> = ({
agents,
adapters,
activeAgentId,
onSelectAgent,
onPinToggle,
}) => {
const adapterHealth = useMemo(() => {
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
for (const adapter of adapters) {
if (adapter.health) {
map.set(adapter.id, {
healthy: adapter.health.healthy,
reason: adapter.health.reason,
})
}
}
return map
}, [adapters])
const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])
return (
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
<div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
{ordered.map((agent) => (
<AgentRailRow
key={agent.id}
agent={agent}
active={agent.id === activeAgentId}
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
onSelect={() => onSelectAgent(agent)}
onPinToggle={(next) => onPinToggle(agent, next)}
/>
))}
</div>
</aside>
)
}

View File

@@ -1,102 +0,0 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import { cn } from '@/lib/utils'
interface AgentRailRowProps {
agent: HarnessAgent
active: boolean
adapterHealth: AgentAdapterHealth | null
onSelect: () => void
onPinToggle: (next: boolean) => void
}
/**
* Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
* down to the essentials that fit a ~280 px rail: tile + name + status
* badge + pin star, with the adapter / model / reasoning chips on a
* second line. Token totals, sparkline, last-message preview all stay
* on the `/agents` page where rows are full-width.
*/
export const AgentRailRow: FC<AgentRailRowProps> = ({
agent,
active,
adapterHealth,
onSelect,
onPinToggle,
}) => {
const status = agent.status ?? 'unknown'
const lastUsedAt = agent.lastUsedAt ?? null
const pinned = agent.pinned ?? false
return (
<button
type="button"
onClick={onSelect}
className={cn(
'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
active
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
)}
>
<div className="flex min-w-0 items-start gap-3">
<AgentTile
adapter={agent.adapter}
status={status}
lastUsedAt={lastUsedAt}
/>
<div className="min-w-0 flex-1">
<div className="flex items-center gap-1.5">
<span className="truncate font-semibold text-[14px] leading-5">
{agent.name}
</span>
{status === 'working' && (
<Badge
variant="secondary"
className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge
variant="outline"
className="h-5 px-1.5 text-[10px] text-muted-foreground"
>
Asleep
</Badge>
)}
{status === 'error' && (
<Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
Attention
</Badge>
)}
<div className="ml-auto">
<PinToggle pinned={pinned} onToggle={onPinToggle} />
</div>
</div>
<AgentSummaryChips
adapter={agent.adapter}
modelLabel={agent.modelId ?? null}
reasoningEffort={agent.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</button>
)
}
/**
* Tooltip-only label helper kept exported in case the tile row needs to
* show "Codex agent" or similar in a future state. Inlined fallback for
* the rare `unknown` adapter rendering path.
*/
export function railRowAdapterLabel(agent: HarnessAgent): string {
return adapterLabel(agent.adapter)
}

View File

@@ -27,14 +27,6 @@ interface AgentSelectorProps {
onSelectAgent: (agent: AgentEntry) => void
onCreateAgent?: () => void
status?: string
/**
* `'pill'` renders the filled-pill variant used by the calm
* composer on `/home` — bordered, slightly elevated background,
* mono agent name, used as the visual anchor on the left of the
* footer chip row. Default `'ghost'` keeps the existing flat
* shadcn ghost-button trigger used by the chat surface.
*/
triggerVariant?: 'ghost' | 'pill'
}
function getStatusDot(status?: string) {
@@ -50,49 +42,31 @@ export const AgentSelector: FC<AgentSelectorProps> = ({
onSelectAgent,
onCreateAgent,
status,
triggerVariant = 'ghost',
}) => {
const [open, setOpen] = useState(false)
const selectedAgent = agents.find(
(agent) => agent.agentId === selectedAgentId,
)
const triggerNode =
triggerVariant === 'pill' ? (
<button
type="button"
className={cn(
'inline-flex h-6 max-w-[180px] items-center gap-1.5 rounded-full border border-border bg-accent/40 pr-2 pl-2.5 text-[11.5px] text-foreground transition-colors',
'hover:border-border hover:bg-accent/70 data-[state=open]:border-border data-[state=open]:bg-accent/70',
)}
>
<span className={cn('size-1.5 rounded-full', getStatusDot(status))} />
<span className="truncate font-medium font-mono text-[11.5px] tracking-[-0.01em]">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="size-3 shrink-0 text-muted-foreground" />
</button>
) : (
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Bot className="h-4 w-4" />
<span className={cn('size-2 rounded-full', getStatusDot(status))} />
<span className="max-w-32 truncate">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="h-3 w-3" />
</Button>
)
return (
<Popover open={open} onOpenChange={setOpen}>
<PopoverTrigger asChild>{triggerNode}</PopoverTrigger>
<PopoverTrigger asChild>
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Bot className="h-4 w-4" />
<span className={cn('size-2 rounded-full', getStatusDot(status))} />
<span className="max-w-32 truncate">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="h-3 w-3" />
</Button>
</PopoverTrigger>
<PopoverContent side="bottom" align="start" className="w-72 p-0">
<Command>
<CommandInput placeholder="Search agents..." className="h-9" />

View File

@@ -1,14 +1,12 @@
import { Bot, Loader2, RefreshCw } from 'lucide-react'
import { type FC, Fragment, useEffect, useRef } from 'react'
import { type FC, useEffect, useRef } from 'react'
import {
Conversation,
ConversationContent,
ConversationScrollButton,
} from '@/components/ai-elements/conversation'
import type { AgentConversationTurn } from '@/lib/agent-conversations/types'
import type { ProducedFilesRailGroup } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FileCardStrip } from './agent-conversation.file-card-strip'
import { ClawChatMessage } from './ClawChatMessage'
import { ConversationMessage } from './ConversationMessage'
import type { ClawChatMessage as ClawChatMessageModel } from './claw-chat-types'
@@ -17,29 +15,6 @@ interface ClawChatProps {
agentName: string
historyMessages: ClawChatMessageModel[]
turns: AgentConversationTurn[]
/**
* Persisted turns that still need to render their FileCardStrip
* because the history items they were filtered against don't
* carry produced-files data. Rendered between history and the
* live `turns` so the strip lands at the bottom of the
* corresponding assistant turn.
*/
stripOnlyTurns?: AgentConversationTurn[]
/**
* Maps each assistant history message id → the produced-files
* group that came from its turn. Built by
* `mapHistoryToProducedFilesGroups` upstream so the strip
* renders directly under the matching message instead of
* stacking at the conversation tail.
*/
filesByAssistantId?: Map<string, ProducedFilesRailGroup>
/**
* Produced-files groups that didn't match any persisted history
* pair (e.g. orphaned turns where history loaded after the
* group was attributed). Rendered at the conversation tail as
* a fallback so the user can still see them.
*/
tailStripGroups?: ReadonlyArray<ProducedFilesRailGroup>
streaming: boolean
isInitialLoading: boolean
error: Error | null
@@ -47,8 +22,6 @@ interface ClawChatProps {
isFetchingNextPage: boolean
onFetchNextPage: () => void
onRetry: () => void
/** Wired through to the inline file-card strip on each assistant turn. */
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
className?: string
}
@@ -105,9 +78,6 @@ export const ClawChat: FC<ClawChatProps> = ({
agentName,
historyMessages,
turns,
stripOnlyTurns,
filesByAssistantId,
tailStripGroups,
streaming,
isInitialLoading,
error,
@@ -115,7 +85,6 @@ export const ClawChat: FC<ClawChatProps> = ({
isFetchingNextPage,
onFetchNextPage,
onRetry,
onOpenOutputsRail,
className,
}) => {
const topSentinelRef = useRef<HTMLDivElement>(null)
@@ -178,44 +147,14 @@ export const ClawChat: FC<ClawChatProps> = ({
Start of conversation
</div>
) : null}
{historyMessages.map((message) => {
const matched = filesByAssistantId?.get(message.id)
return (
<Fragment key={message.id}>
<ClawChatMessage message={message} />
{matched ? (
<FileCardStrip
turnId={matched.turnId}
files={matched.files}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
) : null}
</Fragment>
)
})}
{(tailStripGroups ?? []).map((group) => (
<FileCardStrip
key={`tail-strip-${group.turnId}`}
turnId={group.turnId}
files={group.files}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
))}
{(stripOnlyTurns ?? []).map((turn) => (
<ConversationMessage
key={`strip-${turn.id}`}
turn={turn}
streaming={false}
stripOnly
onOpenOutputsRail={onOpenOutputsRail}
/>
{historyMessages.map((message) => (
<ClawChatMessage key={message.id} message={message} />
))}
{turns.map((turn, index) => (
<ConversationMessage
key={turn.id}
turn={turn}
streaming={streaming && index === turns.length - 1}
onOpenOutputsRail={onOpenOutputsRail}
/>
))}
{error ? (

View File

@@ -1,187 +0,0 @@
import { ArrowLeft, Home } from 'lucide-react'
import type { FC, ReactNode } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
import { cn } from '@/lib/utils'
interface ConversationHeaderProps {
agent: HarnessAgent | null
fallbackName: string
fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'hermes' | 'unknown'
adapterHealth: AgentAdapterHealth | null
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
onPinToggle: (next: boolean) => void
/** Optional trailing slot — currently used for the Outputs rail toggle. */
headerExtra?: ReactNode
}
/**
* Strip above the chat. Mirrors the `/agents` row card's title row +
* summary chips so the user gets adapter health, pin state, and status
* at a glance — but adds the meta line (last used · lifetime tokens ·
* queued) that's specific to this surface.
*
* The mobile `lg:hidden` Back button is preserved so the small-screen
* collapse keeps a navigable header without a sidebar.
*/
export const ConversationHeader: FC<ConversationHeaderProps> = ({
agent,
fallbackName,
fallbackAdapter,
adapterHealth,
backLabel,
backTarget,
onGoHome,
onPinToggle,
headerExtra,
}) => {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
const adapter = agent?.adapter ?? fallbackAdapter
const status: AgentLiveness = agent?.status ?? 'unknown'
const lastUsedAt = agent?.lastUsedAt ?? null
const pinned = agent?.pinned ?? false
const queueCount = agent?.queue?.length ?? 0
const tokens = agent?.tokens ?? null
const lifetimeTotal = tokens
? tokens.cumulative.input + tokens.cumulative.output
: 0
const metaParts: string[] = []
if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
if (queueCount > 0) {
metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
}
return (
<div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 shrink-0 rounded-xl lg:hidden"
title={backLabel}
>
<BackIcon className="size-4" />
</Button>
<div className="group min-w-0 flex-1">
<div className="flex items-center gap-2">
<span className="truncate font-semibold text-[15px] leading-6">
{agent?.name || fallbackName}
</span>
{agent ? (
<PinToggle pinned={pinned} onToggle={onPinToggle} />
) : null}
</div>
<div className="mt-0.5 flex items-center gap-2">
<AgentSummaryChips
adapter={adapter}
modelLabel={agent?.modelId ?? null}
reasoningEffort={agent?.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</div>
<div className="flex shrink-0 items-center gap-3">
<div className="flex shrink-0 flex-col items-end gap-1">
<StatusPill
status={status}
hasActiveTurn={Boolean(agent?.activeTurnId)}
/>
<div className="flex h-4 items-center text-[11px] text-muted-foreground">
<span className="truncate">
{metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
</span>
</div>
</div>
{headerExtra ? (
<div className="flex shrink-0 items-center">{headerExtra}</div>
) : null}
</div>
</div>
)
}
interface StatusPillProps {
status: AgentLiveness
hasActiveTurn: boolean
}
/**
* Working / Asleep / Attention all get distinctive styling; idle keeps
* the legacy emerald `Ready` pill so the default state is visually
* calm. Defensive working: `idle + activeTurnId` falls through to the
* working pill since the server says a turn is in flight.
*/
const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
const effective: AgentLiveness =
status === 'idle' && hasActiveTurn ? 'working' : status
const base =
'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'
if (effective === 'working') {
return (
<Badge
variant="secondary"
className={cn(
base,
'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
)}
>
<span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
Working
</Badge>
)
}
if (effective === 'asleep') {
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/50" />
Asleep
</Badge>
)
}
if (effective === 'error') {
return (
<Badge
variant="destructive"
className={cn(base, 'border-destructive/30')}
>
<span className="size-1.5 rounded-full bg-destructive-foreground" />
Attention
</Badge>
)
}
if (effective === 'idle') {
return (
<Badge
variant="outline"
className={cn(
base,
'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
)}
>
<span className="size-1.5 rounded-full bg-emerald-500" />
Ready
</Badge>
)
}
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/30" />
Setup
</Badge>
)
}

View File

@@ -164,16 +164,7 @@ function VoiceButton({
)
}
/**
* Calm-composer footer shared by both `/home` (`variant="home"`) and
* the chat surface at `/agents/:agentId` (`variant="conversation"`).
* Pill-shaped chips on an internal dashed divider, with a right-
* aligned keyboard hint. The agent selector is conditional via
* `showAgentSelector`: home shows it as a filled pill on the left,
* the chat surface hides it (the agent is locked once you're in the
* conversation).
*/
function CalmContextControls({
function ContextControls({
agents,
onCreateAgent,
onSelectAgent,
@@ -210,128 +201,110 @@ function CalmContextControls({
)?.is_authenticated
})
const showApps = supports(Feature.MANAGED_MCP_SUPPORT)
const showWorkspace = supports(Feature.WORKSPACE_FOLDER_SUPPORT)
return (
<div className="mx-3 flex items-center gap-1 border-border/60 border-t border-dashed py-2">
{showAgentSelector ? (
<>
<div className="flex items-center justify-between border-border/40 border-t px-4 py-2.5">
<div className="flex items-center gap-1">
{showAgentSelector ? (
<AgentSelector
agents={agents}
selectedAgentId={selectedAgentId}
onSelectAgent={onSelectAgent}
onCreateAgent={onCreateAgent}
status={status}
triggerVariant="pill"
/>
<span
aria-hidden="true"
className="mx-1 inline-block h-3.5 w-px shrink-0 bg-border"
/>
</>
) : null}
{showWorkspace ? (
<WorkspaceSelector>
<button
type="button"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground data-[state=open]:bg-accent data-[state=open]:text-foreground"
>
<Folder className="size-3" />
<span>Workspace</span>
<span className="font-mono text-[10.5px] text-muted-foreground/70">
{selectedFolder?.name ?? 'none'}
</span>
</button>
</WorkspaceSelector>
) : null}
<TabPickerPopover
variant="selector"
selectedTabs={selectedTabs}
onToggleTab={onToggleTab}
>
<button
type="button"
className={cn(
'inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] transition-colors data-[state=open]:bg-accent data-[state=open]:text-foreground',
selectedTabs.length > 0
? 'bg-[var(--accent-orange)] text-white hover:bg-[var(--accent-orange)]/90'
: 'text-muted-foreground hover:bg-accent hover:text-foreground',
)}
) : null}
{supports(Feature.WORKSPACE_FOLDER_SUPPORT) ? (
<WorkspaceSelector>
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Folder className="h-4 w-4" />
<span>{selectedFolder?.name || 'Add workspace'}</span>
<ChevronDown className="h-3 w-3" />
</Button>
</WorkspaceSelector>
) : null}
<TabPickerPopover
variant="selector"
selectedTabs={selectedTabs}
onToggleTab={onToggleTab}
>
<Layers className="size-3" />
<span>Tabs</span>
<span
<Button
className={cn(
'font-mono text-[10.5px]',
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
selectedTabs.length > 0
? 'text-white/80'
: 'text-muted-foreground/70',
? 'bg-[var(--accent-orange)]! text-white shadow-sm'
: 'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
{selectedTabs.length}
</span>
</button>
</TabPickerPopover>
<button
type="button"
onClick={onAttachClick}
disabled={attachDisabled || !attachmentsEnabled}
title="Attach files"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground disabled:cursor-not-allowed disabled:opacity-50"
>
<Paperclip className="size-3" />
<span>Attach</span>
</button>
{showApps ? (
<AppSelector side="bottom">
<button
type="button"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground data-[state=open]:bg-accent data-[state=open]:text-foreground"
>
{connectedManagedServers.length > 0 ? (
<span className="flex items-center -space-x-1.5">
<Layers className="h-4 w-4" />
<span>Tabs</span>
</Button>
</TabPickerPopover>
<Button
type="button"
variant="ghost"
onClick={onAttachClick}
disabled={attachDisabled || !attachmentsEnabled}
title="Attach files"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
)}
>
<Paperclip className="h-4 w-4" />
<span>Attach</span>
</Button>
</div>
{supports(Feature.MANAGED_MCP_SUPPORT) ? (
<div className="ml-auto flex items-center gap-1.5">
<AppSelector side="bottom">
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<div className="flex items-center -space-x-1.5">
{connectedManagedServers.slice(0, 4).map((server) => (
<span
<div
key={server.id}
className="rounded-full ring-2 ring-card"
>
<McpServerIcon
serverName={server.managedServerName ?? ''}
size={12}
size={16}
/>
</span>
</div>
))}
</span>
) : (
<FileText className="size-3" />
)}
<span>Apps</span>
<ChevronDown className="size-3" />
</button>
</AppSelector>
</div>
{connectedManagedServers.length > 4 ? (
<span className="text-xs">
+{connectedManagedServers.length - 4}
</span>
) : null}
<span>Apps</span>
<ChevronDown className="h-3 w-3" />
</Button>
</AppSelector>
</div>
) : null}
<div className="ml-auto inline-flex shrink-0 items-center gap-1.5 text-[11px] text-muted-foreground/70">
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
</kbd>
<span>to run</span>
<span className="text-muted-foreground/40">·</span>
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
</kbd>
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
</kbd>
<span>new line</span>
</div>
</div>
)
}
function HomeShell({ children }: { children: ReactNode }) {
return (
<div className="overflow-hidden rounded-[1.55rem] border border-border/60 bg-card/95 shadow-sm transition-[border-color,box-shadow] duration-150 focus-within:border-[var(--accent-orange)]/40 focus-within:shadow-[0_0_0_4px_color-mix(in_oklch,var(--accent-orange)_15%,transparent),0_1px_2px_rgba(15,23,42,0.04)]">
<div className="overflow-hidden rounded-[1.55rem] border border-border/60 bg-card/95 shadow-sm">
{children}
</div>
)
@@ -339,7 +312,7 @@ function HomeShell({ children }: { children: ReactNode }) {
function ConversationShell({ children }: { children: ReactNode }) {
return (
<div className="overflow-hidden rounded-[1.35rem] border border-border/50 bg-background/95 shadow-[0_10px_30px_rgba(15,23,42,0.06)] backdrop-blur-md transition-[border-color,box-shadow] duration-150 focus-within:border-[var(--accent-orange)]/40 focus-within:shadow-[0_0_0_4px_color-mix(in_oklch,var(--accent-orange)_15%,transparent),0_10px_30px_rgba(15,23,42,0.06)]">
<div className="overflow-hidden rounded-[1.35rem] border border-border/50 bg-background/95 shadow-[0_10px_30px_rgba(15,23,42,0.06)] backdrop-blur-md">
{children}
</div>
)
@@ -569,7 +542,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
}
disabled={disabled || voice.isTranscribing}
className={cn(
'resize-none border-none bg-transparent px-0 text-[15px] shadow-none focus-visible:ring-0 dark:bg-transparent',
'resize-none border-none bg-transparent px-0 text-[15px] shadow-none focus-visible:ring-0',
'[field-sizing:fixed]',
variant === 'home'
? 'min-h-[40px] py-2 leading-6'
@@ -610,7 +583,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
{voice.error}
</div>
) : null}
<CalmContextControls
<ContextControls
agents={agents}
onCreateAgent={onCreateAgent}
onSelectAgent={onSelectAgent}

View File

@@ -22,26 +22,10 @@ import type {
AgentConversationTurn,
ToolEntry,
} from '@/lib/agent-conversations/types'
import { FileCardStrip } from './agent-conversation.file-card-strip'
interface ConversationMessageProps {
turn: AgentConversationTurn
streaming: boolean
/**
* Forwarded to the inline file-card strip's "View" / "+N"
* button. Wired up by AgentCommandConversation so the strip can
* deep-link straight into the Outputs rail at the matching turn
* group. `null` here disables the strip's deep-link affordance
* — the cards still open the preview Sheet directly.
*/
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
/**
* Render only the trailing FileCardStrip for this turn — used
* when the turn's user / assistant text is already rendered
* elsewhere (e.g. by `ClawChatMessage` from persisted history)
* but the produced-files affordance would otherwise be lost.
*/
stripOnly?: boolean
}
interface RenderEntry {
@@ -104,22 +88,9 @@ function ToolStatusIcon({ status }: { status: ToolEntry['status'] }) {
export const ConversationMessage: FC<ConversationMessageProps> = ({
turn,
streaming,
onOpenOutputsRail,
stripOnly,
}) => {
const entries = useMemo(() => buildRenderEntries(turn), [turn])
if (stripOnly) {
if (!turn.producedFiles || turn.producedFiles.length === 0) return null
return (
<FileCardStrip
turnId={turn.turnId ?? null}
files={turn.producedFiles}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
)
}
return (
<div className="space-y-3">
<Message from="user">
@@ -214,14 +185,6 @@ export const ConversationMessage: FC<ConversationMessageProps> = ({
</Message>
)}
{turn.producedFiles && turn.producedFiles.length > 0 ? (
<FileCardStrip
turnId={turn.turnId ?? null}
files={turn.producedFiles}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
) : null}
{!turn.done && turn.parts.length === 0 && streaming && (
<div className="flex gap-2">
<div className="flex size-7 shrink-0 items-center justify-center rounded-full bg-[var(--accent-orange)] text-white">

View File

@@ -1,124 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* @deprecated Replaced by `FileCardStrip` in
* `agent-conversation.file-card-strip.tsx`. Kept temporarily so
* any in-flight callers don't fail to import; remove in a
* follow-up once nothing external references it.
*
* Compact "Files produced" card rendered under an assistant turn.
*/
import { FileText, Image as ImageIcon, Paperclip } from 'lucide-react'
import { type FC, useMemo, useState } from 'react'
import { Button } from '@/components/ui/button'
import { basenameOf, formatFileSize, inferFileKind } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'
export interface ProducedFileLike {
id: string
path: string
size: number
}
interface ArtifactCardProps {
files: ReadonlyArray<ProducedFileLike>
className?: string
}
const MAX_INLINE_ROWS = 4
export const ArtifactCard: FC<ArtifactCardProps> = ({ files, className }) => {
const [openFileId, setOpenFileId] = useState<string | null>(null)
const [expanded, setExpanded] = useState(false)
const sortedFiles = useMemo(
() => [...files].sort((a, b) => a.path.localeCompare(b.path)),
[files],
)
if (sortedFiles.length === 0) return null
const visible = expanded ? sortedFiles : sortedFiles.slice(0, MAX_INLINE_ROWS)
const hiddenCount = sortedFiles.length - visible.length
const openFile = sortedFiles.find((file) => file.id === openFileId) ?? null
return (
<div
className={cn(
'rounded-xl border border-border/60 bg-card/50 px-3 py-2.5',
className,
)}
>
<div className="mb-2 flex items-center gap-2 text-muted-foreground text-xs">
<Paperclip className="size-3.5" />
<span className="font-medium text-foreground">
{sortedFiles.length === 1
? '1 file produced'
: `${sortedFiles.length} files produced`}
</span>
</div>
<ul className="flex flex-col gap-1">
{visible.map((file) => (
<li key={file.id}>
<ArtifactRow file={file} onOpen={() => setOpenFileId(file.id)} />
</li>
))}
</ul>
{hiddenCount > 0 ? (
<Button
type="button"
variant="ghost"
size="sm"
className="mt-1.5 h-7 px-2 text-xs"
onClick={() => setExpanded(true)}
>
Show {hiddenCount} more
</Button>
) : null}
<FilePreviewSheet
fileId={openFile?.id ?? null}
filePath={openFile?.path ?? null}
open={Boolean(openFileId)}
onOpenChange={(next) => {
if (!next) setOpenFileId(null)
}}
/>
</div>
)
}
function ArtifactRow({
file,
onOpen,
}: {
file: ProducedFileLike
onOpen: () => void
}) {
const name = basenameOf(file.path)
const kind = inferFileKind(file.path)
const Icon = kind === 'image' ? ImageIcon : FileText
return (
<button
type="button"
onClick={onOpen}
className={cn(
'flex w-full items-center gap-2 rounded-md px-2 py-1.5 text-left text-sm transition-colors',
'hover:bg-accent/60 focus:bg-accent/60 focus:outline-hidden',
)}
>
<Icon className="size-3.5 shrink-0 text-muted-foreground" />
<span className="min-w-0 flex-1 truncate font-medium">{name}</span>
<span className="shrink-0 text-muted-foreground text-xs tabular-nums">
{formatFileSize(file.size)}
</span>
</button>
)
}

View File

@@ -1,163 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* "Files produced" strip rendered at the bottom of any assistant
* turn that produced files (openclaw only). Replaces Phase 5.3's
* row-list ArtifactCard with small horizontal cards for a lighter
* visual treatment.
*
* Click semantics:
* - Card → opens FilePreviewSheet directly (preview + download).
* - View → emits onOpenRail(turnId); the parent opens the rail
* and scrolls to the matching turn group.
* - +N → same as View (the user is asking to see what was
* overflowed).
*/
import { ChevronRight, FileText, Image as ImageIcon } from 'lucide-react'
import { type FC, useMemo, useState } from 'react'
import { Button } from '@/components/ui/button'
import { basenameOf, formatFileSize, inferFileKind } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'
export interface CardStripFile {
id: string
path: string
size: number
}
interface FileCardStripProps {
/**
* The turn id that produced these files. Forwarded to
* `onOpenRail` so the rail can scroll/expand the matching group.
* Optional because the live `produced_files` event lands before
* the harness has stamped a server-issued turn id on the
* optimistic turn — in that brief window, View falls back to
* just opening the rail at the top.
*/
turnId?: string | null
files: ReadonlyArray<CardStripFile>
/** Caller wires this to `setOutputsRailOpen(true)` + deep-link. */
onOpenRail: (turnId?: string | null) => void
className?: string
}
const MAX_VISIBLE = 4
export const FileCardStrip: FC<FileCardStripProps> = ({
turnId,
files,
onOpenRail,
className,
}) => {
const [openFileId, setOpenFileId] = useState<string | null>(null)
const sortedFiles = useMemo(
() => [...files].sort((a, b) => a.path.localeCompare(b.path)),
[files],
)
if (sortedFiles.length === 0) return null
const visible = sortedFiles.slice(0, MAX_VISIBLE)
const hiddenCount = sortedFiles.length - visible.length
const openFile = sortedFiles.find((file) => file.id === openFileId) ?? null
return (
<div
className={cn(
'rounded-xl border border-border/60 bg-card/50 px-3 py-2.5',
className,
)}
>
<div className="mb-2 flex items-center gap-2">
<span className="text-muted-foreground text-xs">
{sortedFiles.length === 1
? 'File produced'
: `Files produced (${sortedFiles.length})`}
</span>
<Button
type="button"
variant="ghost"
size="sm"
className="ml-auto h-7 gap-1 px-2 text-xs"
onClick={() => onOpenRail(turnId ?? null)}
>
View
<ChevronRight className="size-3" />
</Button>
</div>
<div className="flex flex-wrap gap-2">
{visible.map((file) => (
<FileCard
key={file.id}
file={file}
onOpen={() => setOpenFileId(file.id)}
/>
))}
{hiddenCount > 0 ? (
<button
type="button"
onClick={() => onOpenRail(turnId ?? null)}
className={cn(
'flex h-[56px] min-w-[56px] shrink-0 items-center justify-center rounded-lg border border-border/60 px-3 text-muted-foreground text-xs',
'transition-colors hover:border-border hover:bg-accent/40 hover:text-foreground',
'focus:outline-hidden focus-visible:ring-2 focus-visible:ring-[var(--accent-orange)]',
)}
title={`See ${hiddenCount} more in the Outputs rail`}
>
+{hiddenCount}
</button>
) : null}
</div>
<FilePreviewSheet
fileId={openFile?.id ?? null}
filePath={openFile?.path ?? null}
open={Boolean(openFileId)}
onOpenChange={(next) => {
if (!next) setOpenFileId(null)
}}
/>
</div>
)
}
function FileCard({
file,
onOpen,
}: {
file: CardStripFile
onOpen: () => void
}) {
const name = basenameOf(file.path)
const kind = inferFileKind(file.path)
const Icon = kind === 'image' ? ImageIcon : FileText
return (
<button
type="button"
onClick={onOpen}
title={file.path}
className={cn(
'flex h-[56px] w-[140px] shrink-0 flex-col justify-between rounded-lg border border-border/60 bg-background px-2.5 py-1.5 text-left',
'transition-colors hover:border-border hover:bg-accent/40',
'focus:outline-hidden focus-visible:ring-2 focus-visible:ring-[var(--accent-orange)]',
)}
>
<div className="flex min-w-0 items-center gap-1.5">
<Icon className="size-3.5 shrink-0 text-muted-foreground" />
<span className="min-w-0 flex-1 truncate font-medium text-xs">
{name}
</span>
</div>
<span className="text-[11px] text-muted-foreground tabular-nums">
{formatFileSize(file.size)}
</span>
</button>
)
}

View File

@@ -1,283 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Shared preview drawer used by the inline artifact card AND the
* Outputs rail. Branches on the FilePreview discriminated union and
* renders the appropriate body. Always opens via a controlled
* `open`/`onOpenChange` pair so the parent owns the selected file.
*/
import { Download, FileWarning, Loader2 } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { toast } from 'sonner'
import { MessageResponse } from '@/components/ai-elements/message'
import { Button } from '@/components/ui/button'
import { ScrollArea } from '@/components/ui/scroll-area'
import {
Sheet,
SheetContent,
SheetDescription,
SheetHeader,
SheetTitle,
} from '@/components/ui/sheet'
import { Skeleton } from '@/components/ui/skeleton'
import {
basenameOf,
buildFileDownloadUrl,
extensionOf,
type FilePreview,
formatFileSize,
useFilePreview,
} from '@/lib/agent-files'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import { cn } from '@/lib/utils'
interface FilePreviewSheetProps {
fileId: string | null
filePath: string | null
open: boolean
onOpenChange: (open: boolean) => void
}
const MARKDOWN_EXTENSIONS = new Set(['md', 'markdown', 'mdx'])
export const FilePreviewSheet: FC<FilePreviewSheetProps> = ({
fileId,
filePath,
open,
onOpenChange,
}) => {
const { baseUrl } = useAgentServerUrl()
const { preview, loading, error } = useFilePreview(fileId, open)
const fileName = filePath ? basenameOf(filePath) : 'File preview'
const downloadUrl = useMemo(() => {
if (!baseUrl || !fileId) return null
return buildFileDownloadUrl(baseUrl, fileId)
}, [baseUrl, fileId])
// Surface preview-load failures in a toast in addition to the
// inline error block — the inline UI lives at the bottom of the
// sheet and is easy to miss when scrolled into the body.
const lastToastedFileIdRef = useRef<string | null>(null)
useEffect(() => {
if (!open) {
lastToastedFileIdRef.current = null
return
}
if (!error || !fileId) return
if (lastToastedFileIdRef.current === fileId) return
lastToastedFileIdRef.current = fileId
toast.error('Could not load preview', { description: error.message })
}, [open, error, fileId])
const handleDownload = () => {
if (!downloadUrl) {
toast.error("Couldn't reach the agent server", {
description: 'Reconnect to BrowserOS and try again.',
})
return
}
// Manually trigger the download so any future failure (e.g. the
// server returns 404 because the file was removed) can be
// surfaced via toast — the bare <a download> path swallows
// these errors silently.
const link = document.createElement('a')
link.href = downloadUrl
link.download = fileName
link.rel = 'noopener'
document.body.appendChild(link)
link.click()
link.remove()
}
return (
<Sheet open={open} onOpenChange={onOpenChange}>
<SheetContent
side="right"
className="flex w-full flex-col gap-0 p-0 sm:max-w-xl"
>
<SheetHeader className="border-border/60 border-b px-5 py-4">
<SheetTitle className="truncate pr-8">{fileName}</SheetTitle>
<SheetDescription className="truncate">
{filePath ?? ''}
</SheetDescription>
</SheetHeader>
<ScrollArea className="min-h-0 flex-1">
<div className="px-5 py-4">
{loading ? (
<PreviewSkeleton />
) : error ? (
<PreviewError message={error.message} />
) : preview ? (
<PreviewBody
preview={preview}
filePath={filePath}
downloadUrl={downloadUrl}
/>
) : null}
</div>
</ScrollArea>
{fileId ? (
<div className="border-border/60 border-t bg-background/90 px-5 py-3 backdrop-blur">
<Button
type="button"
size="sm"
className="w-full gap-2"
onClick={handleDownload}
>
<Download className="size-3.5" />
Download
</Button>
</div>
) : null}
</SheetContent>
</Sheet>
)
}
function PreviewSkeleton() {
return (
<div className="flex flex-col gap-2">
<div className="flex items-center gap-2 text-muted-foreground text-xs">
<Loader2 className="size-3.5 animate-spin" />
Loading preview...
</div>
<Skeleton className="h-4 w-3/4" />
<Skeleton className="h-4 w-full" />
<Skeleton className="h-4 w-5/6" />
<Skeleton className="h-4 w-2/3" />
</div>
)
}
function PreviewError({ message }: { message: string }) {
return (
<div className="flex flex-col items-start gap-2 rounded-lg border border-destructive/30 bg-destructive/5 px-3 py-2 text-destructive text-sm">
<div className="flex items-center gap-2 font-medium">
<FileWarning className="size-4" />
Could not load preview
</div>
<p className="text-destructive/80 text-xs">{message}</p>
</div>
)
}
function PreviewBody({
preview,
filePath,
downloadUrl,
}: {
preview: FilePreview
filePath: string | null
downloadUrl: string | null
}) {
if (preview.kind === 'missing') {
return (
<div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
This file is no longer in the workspace. The agent may have moved or
deleted it after the turn finished.
</div>
)
}
if (preview.kind === 'image') {
return (
<div className="flex flex-col gap-3">
<PreviewMeta preview={preview} />
<div className="overflow-hidden rounded-lg border border-border/60 bg-muted/30">
<img
src={preview.dataUrl}
alt={filePath ?? 'preview'}
className="block max-h-[60vh] w-full object-contain"
/>
</div>
</div>
)
}
if (preview.kind === 'pdf') {
return (
<div className="flex flex-col gap-3">
<PreviewMeta preview={preview} />
<div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
PDF previews aren't supported inline yet. Use Download to open this
file in your default PDF viewer.
</div>
</div>
)
}
if (preview.kind === 'binary') {
return (
<div className="flex flex-col gap-3">
<PreviewMeta preview={preview} />
<div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
No inline preview for this file type.
{downloadUrl ? ' Use Download to save it locally.' : null}
</div>
</div>
)
}
return <TextPreviewBody preview={preview} filePath={filePath} />
}
function TextPreviewBody({
preview,
filePath,
}: {
preview: Extract<FilePreview, { kind: 'text' }>
filePath: string | null
}) {
const ext = filePath ? extensionOf(filePath).toLowerCase() : ''
const renderAsMarkdown = MARKDOWN_EXTENSIONS.has(ext)
return (
<div className="flex flex-col gap-3">
<PreviewMeta preview={preview} />
{renderAsMarkdown ? (
<div
className={cn(
'prose prose-sm dark:prose-invert max-w-none break-words rounded-lg border border-border/60 bg-muted/30 px-4 py-3',
"[&_[data-streamdown='code-block']]:!w-full [&_[data-streamdown='code-block']]:overflow-x-auto",
)}
>
<MessageResponse mode="static" parseIncompleteMarkdown={false}>
{preview.snippet}
</MessageResponse>
</div>
) : (
<pre className="overflow-x-auto rounded-lg border border-border/60 bg-muted/30 px-3 py-2 text-xs leading-relaxed">
<code className="font-mono text-foreground">{preview.snippet}</code>
</pre>
)}
{preview.truncated ? (
<div className="text-muted-foreground text-xs">
Showing the first part of this file. Download to see the full
contents.
</div>
) : null}
</div>
)
}
function PreviewMeta({
preview,
}: {
preview: Exclude<FilePreview, { kind: 'missing' }>
}) {
return (
<div className="flex flex-wrap items-center gap-x-3 gap-y-1 text-muted-foreground text-xs">
<span className="font-medium text-foreground">
{formatFileSize(preview.size)}
</span>
<span>·</span>
<span className="font-mono">{preview.mimeType || 'unknown'}</span>
</div>
)
}

View File

@@ -1,338 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Per-agent right-side "Outputs" panel. Lists every file the harness
* has attributed to this agent, grouped by the turn that produced
* them. Click a row to open the shared preview Sheet.
*
* Lifecycle:
* - Open/closed state is controlled by the parent and persisted via
* `useOutputsRailOpen(agentId)` so each agent remembers its
* preference independently.
* - Data refreshes whenever a turn finishes (the conversation hook
* fires `useInvalidateAgentOutputs` from its finally block).
* - Manual "Refresh" button is wired to `useRefreshAgentOutputs`
* for users who navigate in mid-turn.
*/
import {
ChevronDown,
ChevronRight,
FileText,
Image as ImageIcon,
Inbox,
Loader2,
PanelRightClose,
RefreshCw,
} from 'lucide-react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { toast } from 'sonner'
import { Button } from '@/components/ui/button'
import {
Collapsible,
CollapsibleContent,
CollapsibleTrigger,
} from '@/components/ui/collapsible'
import { ScrollArea } from '@/components/ui/scroll-area'
import { Skeleton } from '@/components/ui/skeleton'
import {
basenameOf,
formatFileSize,
inferFileKind,
type ProducedFilesRailGroup,
useAgentOutputs,
useRefreshAgentOutputs,
} from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'
interface OutputsRailProps {
agentId: string
onClose: () => void
/**
* When set, the rail scrolls the matching `RailTurnGroup` into
* view and force-opens its `Collapsible`. Used by the inline
* file-card strip's "View" / "+N" deep-link path. Cleared by
* the parent (via `onFocusTurnConsumed`) once the rail has
* acknowledged the deep-link so subsequent renders don't keep
* re-scrolling the same group.
*/
focusTurnId?: string | null
onFocusTurnConsumed?: () => void
}
const RAIL_LOCAL_STORAGE_PREFIX = 'browseros:outputs-rail:'
/**
* Controlled open/close state with per-agent localStorage memory.
* Returns a tuple compatible with React's useState shape so the
* parent can pass it straight into the rail without an extra effect.
*/
export function useOutputsRailOpen(
agentId: string,
): [boolean, (next: boolean) => void] {
const [open, setOpen] = useState(false)
useEffect(() => {
if (typeof window === 'undefined' || !agentId) return
try {
const stored = window.localStorage.getItem(
`${RAIL_LOCAL_STORAGE_PREFIX}${agentId}`,
)
setOpen(stored === '1')
} catch {
// localStorage may be unavailable (private mode, locked-down
// contexts) — fall back to closed.
}
}, [agentId])
const update = (next: boolean) => {
setOpen(next)
if (typeof window === 'undefined' || !agentId) return
try {
window.localStorage.setItem(
`${RAIL_LOCAL_STORAGE_PREFIX}${agentId}`,
next ? '1' : '0',
)
} catch {
// Best-effort persistence.
}
}
return [open, update]
}
export const OutputsRail: FC<OutputsRailProps> = ({
agentId,
onClose,
focusTurnId,
onFocusTurnConsumed,
}) => {
const { groups, loading, error } = useAgentOutputs(agentId)
const refresh = useRefreshAgentOutputs(agentId)
const [openFile, setOpenFile] = useState<{
id: string
path: string
} | null>(null)
const totalFiles = useMemo(
() => groups.reduce((sum, group) => sum + group.files.length, 0),
[groups],
)
return (
<aside className="flex h-full min-h-0 w-full flex-col border-border/50 border-l bg-background">
<header className="flex shrink-0 items-center gap-2 border-border/50 border-b px-3 py-3">
<span className="font-semibold text-[13px] uppercase tracking-wide">
Outputs
</span>
{totalFiles > 0 ? (
<span className="text-muted-foreground text-xs tabular-nums">
{totalFiles}
</span>
) : null}
<div className="ml-auto flex items-center gap-1">
<Button
type="button"
variant="ghost"
size="icon"
className="size-7"
onClick={() =>
refresh.mutate(undefined, {
onError: (err) =>
toast.error('Refresh failed', {
description:
err instanceof Error ? err.message : String(err),
}),
})
}
disabled={refresh.isPending}
title="Refresh"
>
{refresh.isPending ? (
<Loader2 className="size-3.5 animate-spin" />
) : (
<RefreshCw className="size-3.5" />
)}
</Button>
<Button
type="button"
variant="ghost"
size="icon"
className="size-7"
onClick={onClose}
title="Hide outputs"
>
<PanelRightClose className="size-3.5" />
</Button>
</div>
</header>
<ScrollArea className="min-h-0 flex-1">
<div className="px-2 py-2">
{loading && groups.length === 0 ? (
<RailSkeleton />
) : error ? (
<RailError message={error.message} />
) : groups.length === 0 ? (
<RailEmpty />
) : (
<ul className="flex flex-col gap-2">
{groups.map((group) => (
<li key={group.turnId}>
<RailTurnGroup
group={group}
focused={
Boolean(focusTurnId) && focusTurnId === group.turnId
}
onFocusConsumed={onFocusTurnConsumed}
onOpenFile={(file) =>
setOpenFile({ id: file.id, path: file.path })
}
/>
</li>
))}
</ul>
)}
</div>
</ScrollArea>
<FilePreviewSheet
fileId={openFile?.id ?? null}
filePath={openFile?.path ?? null}
open={Boolean(openFile)}
onOpenChange={(next) => {
if (!next) setOpenFile(null)
}}
/>
</aside>
)
}
function RailTurnGroup({
group,
focused,
onFocusConsumed,
onOpenFile,
}: {
group: ProducedFilesRailGroup
focused: boolean
onFocusConsumed?: () => void
onOpenFile: (file: { id: string; path: string }) => void
}) {
const [open, setOpen] = useState(true)
const headerLabel = group.turnPrompt.trim() || 'Turn'
const containerRef = useRef<HTMLDivElement>(null)
// Deep-link consumption: when the parent passes `focused=true`,
// expand the collapsible (in case the user had collapsed it
// earlier) and scroll into view. Fire `onFocusConsumed` so the
// parent can drop the URL param and we don't re-scroll on every
// render after that.
useEffect(() => {
if (!focused) return
setOpen(true)
containerRef.current?.scrollIntoView({
behavior: 'smooth',
block: 'nearest',
})
onFocusConsumed?.()
}, [focused, onFocusConsumed])
return (
<div ref={containerRef}>
<Collapsible open={open} onOpenChange={setOpen}>
<CollapsibleTrigger
className={cn(
'flex w-full items-center gap-1.5 rounded-md px-1.5 py-1 text-left text-muted-foreground text-xs',
'transition-colors hover:bg-accent/40 hover:text-foreground',
)}
>
{open ? (
<ChevronDown className="size-3 shrink-0" />
) : (
<ChevronRight className="size-3 shrink-0" />
)}
<span className="min-w-0 flex-1 truncate font-medium">
{headerLabel}
</span>
<span className="shrink-0 tabular-nums">{group.files.length}</span>
</CollapsibleTrigger>
<CollapsibleContent>
<ul className="mt-1 ml-1 flex flex-col gap-0.5 border-border/40 border-l pl-2">
{group.files.map((file) => (
<li key={file.id}>
<RailFileRow file={file} onOpen={() => onOpenFile(file)} />
</li>
))}
</ul>
</CollapsibleContent>
</Collapsible>
</div>
)
}
function RailFileRow({
file,
onOpen,
}: {
file: ProducedFilesRailGroup['files'][number]
onOpen: () => void
}) {
const name = basenameOf(file.path)
const kind = inferFileKind(file.path)
const Icon = kind === 'image' ? ImageIcon : FileText
return (
<button
type="button"
onClick={onOpen}
className={cn(
'flex w-full items-center gap-2 rounded-md px-1.5 py-1 text-left text-xs transition-colors',
'hover:bg-accent/60 focus:bg-accent/60 focus:outline-hidden',
)}
title={file.path}
>
<Icon className="size-3 shrink-0 text-muted-foreground" />
<span className="min-w-0 flex-1 truncate">{name}</span>
<span className="shrink-0 text-muted-foreground tabular-nums">
{formatFileSize(file.size)}
</span>
</button>
)
}
function RailSkeleton() {
return (
<div className="flex flex-col gap-2 px-1.5 py-1">
<Skeleton className="h-4 w-1/2" />
<Skeleton className="h-4 w-3/4" />
<Skeleton className="h-4 w-2/3" />
<Skeleton className="h-4 w-5/6" />
</div>
)
}
function RailEmpty() {
return (
<div className="mx-2 my-3 flex flex-col items-center gap-1.5 rounded-lg border border-border/60 border-dashed bg-muted/20 px-3 py-6 text-center text-muted-foreground text-xs">
<Inbox className="size-4" />
<p className="font-medium">No outputs yet</p>
<p className="text-[11px] text-muted-foreground/70 leading-snug">
Files this agent creates will appear here, grouped by the turn that made
them.
</p>
</div>
)
}
function RailError({ message }: { message: string }) {
return (
<div className="mx-2 my-3 rounded-lg border border-destructive/30 bg-destructive/5 px-3 py-2 text-destructive text-xs">
{message}
</div>
)
}

View File

@@ -1,6 +1,5 @@
import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpenClaw'
import type { AgentConversationTurn } from '@/lib/agent-conversations/types'
import type { ProducedFilesRailGroup } from '@/lib/agent-files'
export type ClawChatRole = 'user' | 'assistant'
@@ -235,30 +234,6 @@ export function filterTurnsPersistedInHistory(
)
}
/**
* Persisted turns that still carry `producedFiles` — once history
* reloads, the assistant text is rendered by `ClawChatMessage` and
* the optimistic turn is filtered out by
* `filterTurnsPersistedInHistory`. The historical message has no
* `producedFiles` field (history items don't carry that), so the
* inline file-card strip would vanish on history reload.
*
* Returning these here lets the caller render a strip-only entry
* after the corresponding history bubble — full message stays as
* the persisted history pair, but the produced-files affordance
* survives.
*/
export function selectStripOnlyTurns(
turns: AgentConversationTurn[],
historyMessages: ClawChatMessage[],
): AgentConversationTurn[] {
return turns.filter(
(turn) =>
Boolean(turn.producedFiles && turn.producedFiles.length > 0) &&
isTurnPersistedInHistory(turn, historyMessages),
)
}
function isTurnPersistedInHistory(
turn: AgentConversationTurn,
historyMessages: ClawChatMessage[],
@@ -310,59 +285,3 @@ function getClawMessageText(message: ClawChatMessage): string {
.join('')
.trim()
}
function firstNonBlankLine(value: string): string {
for (const raw of value.split('\n')) {
const trimmed = raw.trim()
if (trimmed) return trimmed
}
return ''
}
/**
* Map each assistant history message to the produced-files group
* that came from its turn. Match key is `group.turnPrompt` (first
* non-blank line of the user prompt that initiated the turn) vs.
* the first non-blank line of the user message that immediately
* preceded this assistant message — the same shape the server
* emits when storing turnPrompt.
*
* Walks history forward (oldest-first per `flattenHistoryPages`)
* and consumes groups in chronological order. A group can only
* match once — if two turns share the same prompt the earlier
* one wins, and the later assistant message stays unassociated
* (those land back in `tailStripGroups` at the conversation tail).
*/
export function mapHistoryToProducedFilesGroups(
historyMessages: ClawChatMessage[],
groups: ReadonlyArray<ProducedFilesRailGroup>,
): {
byAssistantMessageId: Map<string, ProducedFilesRailGroup>
unmatched: ProducedFilesRailGroup[]
} {
const byAssistantMessageId = new Map<string, ProducedFilesRailGroup>()
if (groups.length === 0) {
return { byAssistantMessageId, unmatched: [] }
}
// Oldest-first so the iteration order matches history.
const remaining = [...groups].sort((a, b) => a.createdAt - b.createdAt)
let pendingPrompt: string | null = null
for (const message of historyMessages) {
if (message.role === 'user') {
pendingPrompt = firstNonBlankLine(getClawMessageText(message))
continue
}
if (message.role !== 'assistant' || !pendingPrompt) continue
const matchIndex = remaining.findIndex(
(group) => group.turnPrompt === pendingPrompt,
)
if (matchIndex >= 0) {
const [match] = remaining.splice(matchIndex, 1)
byAssistantMessageId.set(message.id, match)
}
pendingPrompt = null
}
return { byAssistantMessageId, unmatched: remaining }
}

View File

@@ -1,109 +0,0 @@
import { afterEach, describe, expect, it } from 'bun:test'
import type { StagedAttachment } from '@/lib/attachments'
import {
consumePendingInitialMessage,
peekPendingInitialMessage,
setPendingInitialMessage,
} from './pending-initial-message'
function makeAttachment(id: string): StagedAttachment {
return {
id,
kind: 'image',
mediaType: 'image/png',
name: `${id}.png`,
dataUrl: `data:image/png;base64,${id}`,
payload: {
kind: 'image',
mediaType: 'image/png',
name: `${id}.png`,
dataUrl: `data:image/png;base64,${id}`,
},
}
}
afterEach(() => {
// Drain any leftover pending entry so tests don't leak into each
// other (the module-scope state survives across `it` blocks).
consumePendingInitialMessage('drain')
// If still set, clear by consuming with the matching id.
const leftover = peekPendingInitialMessage()
if (leftover) consumePendingInitialMessage(leftover.agentId)
})
describe('pending-initial-message', () => {
it('consume returns the payload set for the same agentId', () => {
setPendingInitialMessage({
agentId: 'agent-a',
text: 'hello',
attachments: [makeAttachment('one')],
createdAt: Date.now(),
})
const result = consumePendingInitialMessage('agent-a')
expect(result?.text).toBe('hello')
expect(result?.attachments).toHaveLength(1)
expect(result?.attachments[0]?.id).toBe('one')
})
it('consume is destructive — second call returns null', () => {
setPendingInitialMessage({
agentId: 'agent-a',
text: 'hello',
attachments: [],
createdAt: Date.now(),
})
expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
expect(consumePendingInitialMessage('agent-a')).toBeNull()
})
it('consume returns null and preserves entry when agentId differs', () => {
setPendingInitialMessage({
agentId: 'agent-a',
text: 'hello',
attachments: [],
createdAt: Date.now(),
})
expect(consumePendingInitialMessage('agent-b')).toBeNull()
expect(peekPendingInitialMessage()?.agentId).toBe('agent-a')
expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
})
it('returns null for entries older than the TTL', () => {
setPendingInitialMessage({
agentId: 'agent-a',
text: 'old',
attachments: [],
createdAt: Date.now() - 11_000, // older than 10 s TTL
})
expect(consumePendingInitialMessage('agent-a')).toBeNull()
})
it('replaces a previous pending entry when set is called again', () => {
setPendingInitialMessage({
agentId: 'agent-a',
text: 'first',
attachments: [],
createdAt: Date.now(),
})
setPendingInitialMessage({
agentId: 'agent-b',
text: 'second',
attachments: [makeAttachment('two')],
createdAt: Date.now(),
})
expect(consumePendingInitialMessage('agent-a')).toBeNull()
const result = consumePendingInitialMessage('agent-b')
expect(result?.text).toBe('second')
expect(result?.attachments[0]?.id).toBe('two')
})
it('no-ops when set is called with empty agentId', () => {
setPendingInitialMessage({
agentId: '',
text: 'oops',
attachments: [],
createdAt: Date.now(),
})
expect(peekPendingInitialMessage()).toBeNull()
})
})

View File

@@ -1,81 +0,0 @@
import type { StagedAttachment } from '@/lib/attachments'
/**
* Same-tab in-memory handoff between the `/home` composer and the
* chat screen at `/home/agents/:agentId`. URL search params (`?q=`)
* carry the text fine, but cannot carry binary attachments — a multi-
* megabyte image dataUrl would explode URL length limits and round-
* trip badly. This module is the rich-data side channel for the same
* navigation: the composer writes here, the chat screen reads here on
* mount.
*
* Intentionally module-scope. Same render tree, same tab — no need
* for sessionStorage (which would force JSON-serialising the dataUrls
* and re-parsing on the read side). Cross-tab handoff is out of
* scope: the user typing at home in tab A and switching to tab B's
* chat would surface an empty registry there, which is the correct
* behaviour.
*/
export interface PendingInitialMessage {
agentId: string
text: string
attachments: StagedAttachment[]
createdAt: number
}
/**
* 10s TTL on the entry. A stale entry from a back-button journey
* shouldn't fire on a future visit; if real-world latency makes 10s
* too tight under slow harness boot, bump but never make it
* indefinite.
*/
const PENDING_TTL_MS = 10_000
let pending: PendingInitialMessage | null = null
let pendingTimer: ReturnType<typeof setTimeout> | null = null
function clearPending(): void {
pending = null
if (pendingTimer !== null) {
clearTimeout(pendingTimer)
pendingTimer = null
}
}
export function setPendingInitialMessage(payload: PendingInitialMessage): void {
// Defensive: the home composer should never call this without an
// agent selected. If it somehow does, no-op rather than holding a
// payload we can't route.
if (!payload.agentId) return
clearPending()
pending = payload
pendingTimer = setTimeout(clearPending, PENDING_TTL_MS)
}
/**
* Destructive read. Returns the entry only if `agentId` matches and
* the entry is fresh; clears the entry on success so Strict-Mode
* double-invokes can't double-send.
*/
export function consumePendingInitialMessage(
agentId: string,
): PendingInitialMessage | null {
if (!pending) return null
if (pending.agentId !== agentId) return null
if (Date.now() - pending.createdAt >= PENDING_TTL_MS) {
clearPending()
return null
}
const entry = pending
clearPending()
return entry
}
/**
* Non-mutating read for tests. Production code should never need this
* — use `consume` and own the lifecycle.
*/
export function peekPendingInitialMessage(): PendingInitialMessage | null {
return pending
}

View File

@@ -10,11 +10,9 @@ import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpe
import type {
AgentConversationTurn,
AssistantPart,
ConversationTurnFile,
ToolEntry,
UserAttachmentPreview,
} from '@/lib/agent-conversations/types'
import { useInvalidateAgentOutputs } from '@/lib/agent-files'
import type { ServerAttachmentPayload } from '@/lib/attachments'
import { consumeSSEStream } from '@/lib/sse'
import { buildToolLabel } from '@/lib/tool-labels'
@@ -55,12 +53,6 @@ export function useAgentConversation(
) {
const [turns, setTurns] = useState<AgentConversationTurn[]>([])
const [streaming, setStreaming] = useState(false)
const invalidateAgentOutputs = useInvalidateAgentOutputs()
// Stable ref so the resume effect doesn't re-subscribe on every
// render (the hook's returned callable is freshly closured each
// time, but the underlying queryClient is stable).
const invalidateAgentOutputsRef = useRef(invalidateAgentOutputs)
invalidateAgentOutputsRef.current = invalidateAgentOutputs
const sessionKeyRef = useRef(options.sessionKey ?? '')
const historyRef = useRef<OpenClawChatHistoryMessage[]>(options.history ?? [])
const textAccRef = useRef('')
@@ -160,17 +152,6 @@ export function useAgentConversation(
})
}
const setProducedFilesOnCurrentTurn = (files: ConversationTurnFile[]) => {
setTurns((prev) => {
const last = prev[prev.length - 1]
if (!last) return prev
// Replace, don't merge: the server's diff is authoritative for
// the just-completed turn — duplicate events shouldn't grow the
// list, and a re-attribution should overwrite an earlier one.
return [...prev.slice(0, -1), { ...last, producedFiles: files }]
})
}
const upsertAgentHarnessTool = (event: AgentHarnessStreamEvent) => {
if (event.type !== 'tool_call') return
const rawName = event.title || event.rawType || 'tool call'
@@ -227,9 +208,6 @@ export function useAgentConversation(
case 'tool_call':
upsertAgentHarnessTool(event)
break
case 'produced_files':
setProducedFilesOnCurrentTurn(event.files)
break
case 'done':
markCurrentTurnDone()
break
@@ -281,7 +259,6 @@ export function useAgentConversation(
...prev,
{
id: crypto.randomUUID(),
turnId: active.turnId,
userText: active.prompt ?? '',
parts: [],
done: false,
@@ -327,14 +304,9 @@ export function useAgentConversation(
// When `cancelled` is true the next run will set these
// itself, so resetting here would only cause a brief flicker.
if (!cancelled && weStartedStream) {
const finishedTurnId = turnIdRef.current
turnIdRef.current = null
lastSeqRef.current = null
setStreaming(false)
void invalidateAgentOutputsRef.current(
agentId,
finishedTurnId ?? undefined,
)
}
}
}
@@ -346,60 +318,6 @@ export function useAgentConversation(
}
}, [agentId, activeTurnIdDep])
/**
* Send the chat request and follow the 409-active-turn redirect
* once. Pulled out of `send` to keep its cognitive complexity in
* check — the retry adds a branch that biome counts heavily.
*/
const openSendStream = async (
targetAgentId: string,
text: string,
attachments: ServerAttachmentPayload[],
signal: AbortSignal,
): Promise<Response> => {
const initial = await chatWithHarnessAgent(
targetAgentId,
text,
signal,
attachments,
)
if (initial.status !== 409) return initial
// 409 means the server already has an active turn for this agent
// (a previous tab kicked one off and we're a fresh mount that
// missed the resume window). Attach to it instead of double-sending.
const body = (await initial.json()) as { turnId?: string }
if (!body.turnId) return initial
return attachToHarnessTurn(targetAgentId, {
turnId: body.turnId,
signal,
})
}
/**
* Pull session-key / turn-id off response headers and propagate to
* refs + the optimistic turn. Stamping `turnId` here lets the
* inline artifact card fall back to /files/turn/<id> on a resumed
* mount that missed the live `produced_files` event.
*/
const applyResponseHeadersToTurn = (response: Response) => {
const responseSessionKey =
response.headers.get('X-Session-Key') ??
response.headers.get('X-Session-Id')
if (responseSessionKey) {
sessionKeyRef.current = responseSessionKey
onSessionKeyChangeRef.current?.(responseSessionKey)
}
const responseTurnId = response.headers.get('X-Turn-Id')
if (!responseTurnId) return
turnIdRef.current = responseTurnId
lastSeqRef.current = null
setTurns((prev) => {
const last = prev[prev.length - 1]
if (!last) return prev
return [...prev.slice(0, -1), { ...last, turnId: responseTurnId }]
})
}
const send = async (input: string | SendInput) => {
const normalized: SendInput =
typeof input === 'string' ? { text: input } : input
@@ -428,13 +346,37 @@ export function useAgentConversation(
streamAbortRef.current = abortController
try {
const response = await openSendStream(
let response = await chatWithHarnessAgent(
agentId,
trimmed,
attachments,
abortController.signal,
attachments,
)
applyResponseHeadersToTurn(response)
// 409 means the server already has an active turn for this
// agent (e.g. a previous tab kicked one off and we're a fresh
// mount that missed the resume window). Attach to it instead of
// double-sending.
if (response.status === 409) {
const body = (await response.json()) as { turnId?: string }
if (body.turnId) {
response = await attachToHarnessTurn(agentId, {
turnId: body.turnId,
signal: abortController.signal,
})
}
}
const responseSessionKey =
response.headers.get('X-Session-Key') ??
response.headers.get('X-Session-Id')
if (responseSessionKey) {
sessionKeyRef.current = responseSessionKey
onSessionKeyChangeRef.current?.(responseSessionKey)
}
const responseTurnId = response.headers.get('X-Turn-Id')
if (responseTurnId) {
turnIdRef.current = responseTurnId
lastSeqRef.current = null
}
if (!response.ok) {
const err = await response.text()
updateCurrentTurnParts((parts) => [
@@ -462,15 +404,10 @@ export function useAgentConversation(
if (streamAbortRef.current === abortController) {
streamAbortRef.current = null
}
// Capture before nulling — the invalidation needs the turn id so
// useAgentTurnFiles consumers also flush, not just the agent-wide
// rail query.
const finishedTurnId = turnIdRef.current
turnIdRef.current = null
lastSeqRef.current = null
onCompleteRef.current?.()
setStreaming(false)
void invalidateAgentOutputs(agentId, finishedTurnId ?? undefined)
}
}

View File

@@ -1,4 +1,4 @@
import { Bot, Cpu, Sparkles, Wand2 } from 'lucide-react'
import { Bot, Cpu, Sparkles } from 'lucide-react'
import type { FC } from 'react'
import type { HarnessAgentAdapter } from './agent-harness-types'
@@ -23,9 +23,6 @@ export const AdapterIcon: FC<AdapterIconProps> = ({ adapter, className }) => {
case 'openclaw':
// OpenClaw — bot/automation framing.
return <Bot className={className} aria-label="OpenClaw" />
case 'hermes':
// Hermes — messenger god framing, wand evokes the agentic conjuring.
return <Wand2 className={className} aria-label="Hermes" />
default:
return <Bot className={className} aria-label="Agent" />
}
@@ -39,8 +36,6 @@ export function adapterLabel(adapter: HarnessAgentAdapter | 'unknown'): string {
return 'Codex'
case 'openclaw':
return 'OpenClaw'
case 'hermes':
return 'Hermes'
default:
return 'Agent'
}

View File

@@ -11,7 +11,6 @@ import type {
AgentAdapterHealth,
AgentRowData,
} from './agent-row/agent-row.types'
import { compareAgentsByPinThenRecency } from './agents-list-order'
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'
@@ -57,18 +56,31 @@ export const AgentList: FC<AgentListProps> = ({
return map
}, [adapters])
// Sort: pinned rows first, then most recently used, then never-used
// agents in id-stable order. The gateway's `main` agent stays
// pinned-to-top when never touched so a fresh install has an
// obvious starting point.
const ordered = useMemo(() => {
const withMeta = agents.map((agent) => {
const harness = harnessAgentLookup?.get(agent.agentId)
return {
agent,
id: agent.agentId,
pinned: harness?.pinned ?? false,
lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
}
})
return withMeta
.sort(compareAgentsByPinThenRecency)
.sort((a, b) => {
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? -Infinity
const bValue = b.lastUsedAt ?? -Infinity
if (aValue !== bValue) return bValue - aValue
return a.agent.agentId.localeCompare(b.agent.agentId)
})
.map((entry) => entry.agent)
}, [activity, agents, harnessAgentLookup])
@@ -117,7 +129,6 @@ function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
if (lower === 'claude code') return 'claude'
if (lower === 'codex') return 'codex'
if (lower === 'openclaw') return 'openclaw'
if (lower === 'hermes') return 'hermes'
return 'unknown'
}

View File

@@ -10,7 +10,6 @@ import { createAgentPageActions } from './agents-page-actions'
import {
useDefaultAgentName,
useHarnessAgentDefaults,
useHermesProviderSelection,
useOpenClawProviderSelection,
} from './agents-page-hooks'
import {
@@ -107,7 +106,6 @@ export const AgentsPage: FC = () => {
)
const [harnessModelId, setHarnessModelId] = useState('')
const [harnessReasoningEffort, setHarnessReasoningEffort] = useState('')
const [createHermesProviderId, setCreateHermesProviderId] = useState('')
const [showTerminal, setShowTerminal] = useState(false)
const [cliAuthModalOpen, setCliAuthModalOpen] = useState(false)
const [pageError, setPageError] = useState<string | null>(null)
@@ -135,14 +133,6 @@ export const AgentsPage: FC = () => {
cliAuthModalOpen,
setCliAuthModalOpen,
})
const { selectableHermesProviders } = useHermesProviderSelection({
providers,
defaultProviderId,
createOpen,
createRuntime,
createHermesProviderId,
setCreateHermesProviderId,
})
useDefaultAgentName(createOpen, setNewName)
useHarnessAgentDefaults({
adapters,
@@ -236,13 +226,11 @@ export const AgentsPage: FC = () => {
createAgentPageActions({
createProviderId,
createRuntime,
createHermesProviderId,
harnessModelId,
harnessReasoningEffort,
navigate,
newName,
selectableOpenClawProviders,
selectableHermesProviders,
setupProviderId,
createHarnessAgent: createHarnessAgent.mutateAsync,
createOpenClawAgent,
@@ -398,8 +386,6 @@ export const AgentsPage: FC = () => {
harnessAdapterId={harnessAdapterId}
harnessModelId={harnessModelId}
harnessReasoningEffort={harnessReasoningEffort}
hermesProviders={selectableHermesProviders}
hermesSelectedProviderId={createHermesProviderId}
name={newName}
open={createOpen}
providers={selectableOpenClawProviders}
@@ -415,14 +401,12 @@ export const AgentsPage: FC = () => {
if (!open) {
setCreateError(null)
createHarnessAgent.reset()
setCreateHermesProviderId('')
}
}}
onRuntimeChange={setCreateRuntime}
onHarnessAdapterChange={handleHarnessAdapterChange}
onHarnessModelChange={setHarnessModelId}
onHarnessReasoningChange={setHarnessReasoningEffort}
onHermesProviderChange={setCreateHermesProviderId}
onNameChange={setNewName}
onProviderChange={setCreateProviderId}
/>

View File

@@ -40,8 +40,6 @@ interface NewAgentDialogProps {
harnessAdapterId: HarnessAgentAdapter
harnessModelId: string
harnessReasoningEffort: string
hermesProviders: ProviderOption[]
hermesSelectedProviderId: string
name: string
open: boolean
providers: ProviderOption[]
@@ -57,7 +55,6 @@ interface NewAgentDialogProps {
onHarnessAdapterChange: (adapter: HarnessAgentAdapter) => void
onHarnessModelChange: (modelId: string) => void
onHarnessReasoningChange: (reasoningEffort: string) => void
onHermesProviderChange: (providerId: string) => void
onNameChange: (name: string) => void
onProviderChange: (providerId: string) => void
}
@@ -72,8 +69,6 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
harnessAdapterId,
harnessModelId,
harnessReasoningEffort,
hermesProviders,
hermesSelectedProviderId,
name,
open,
providers,
@@ -89,29 +84,22 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
onHarnessAdapterChange,
onHarnessModelChange,
onHarnessReasoningChange,
onHermesProviderChange,
onNameChange,
onProviderChange,
}) => {
const selectedHarnessAdapter =
adapters.find((adapter) => adapter.id === harnessAdapterId) ?? adapters[0]
const isHarnessRuntime = createRuntime !== 'openclaw'
const isHermesRuntime = createRuntime === 'hermes'
const isClassicHarnessRuntime = isHarnessRuntime && !isHermesRuntime
const openClawBlocked = createRuntime === 'openclaw' && !canManageOpenClaw
const cliBlocked =
createRuntime === 'openclaw' &&
!!selectedCliProvider &&
!cliAuthStatus?.loggedIn
const hermesBlocked =
isHermesRuntime &&
(hermesProviders.length === 0 || !hermesSelectedProviderId)
const canCreate =
Boolean(name.trim()) &&
!creating &&
!openClawBlocked &&
!cliBlocked &&
!hermesBlocked &&
(createRuntime === 'openclaw'
? providers.length > 0
: Boolean(selectedHarnessAdapter))
@@ -155,8 +143,7 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
if (
value === 'openclaw' ||
value === 'claude' ||
value === 'codex' ||
value === 'hermes'
value === 'codex'
) {
onRuntimeChange(value)
if (value !== 'openclaw') onHarnessAdapterChange(value)
@@ -209,16 +196,7 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
</>
) : null}
{isHermesRuntime ? (
<ProviderSelector
providers={hermesProviders}
defaultProviderId={defaultProviderId}
selectedId={hermesSelectedProviderId}
onSelect={onHermesProviderChange}
/>
) : null}
{isClassicHarnessRuntime ? (
{isHarnessRuntime ? (
<>
<div className="grid gap-2">
<Label htmlFor="harness-model">Model</Label>

View File

@@ -1,21 +1,6 @@
import type { AgentEntry } from './useOpenClaw'
export type HarnessAgentAdapter = 'claude' | 'codex' | 'openclaw' | 'hermes'
/**
* One file the harness attributed to the assistant turn that just
* finished. Mirrors the server-side `ProducedFileEventEntry` shape so
* the inline artifact card can render alongside the streamed text the
* user just watched complete. Only present for openclaw adapter
* turns; claude / codex don't produce these events in v1.
*/
export interface HarnessProducedFile {
id: string
/** Workspace-relative POSIX path. */
path: string
size: number
mtimeMs: number
}
export type HarnessAgentAdapter = 'claude' | 'codex' | 'openclaw'
export type AgentHarnessStreamEvent =
| {
@@ -37,10 +22,6 @@ export type AgentHarnessStreamEvent =
text: string
rawType?: string
}
| {
type: 'produced_files'
files: HarnessProducedFile[]
}
| {
type: 'done'
text?: string
@@ -130,16 +111,6 @@ export interface CreateHarnessAgentInput {
adapter: HarnessAgentAdapter
modelId?: string
reasoningEffort?: string
/**
* Adapter provider id from the user's BrowserOS AI Settings entry.
* Provider-backed adapters use this with `apiKey`/`baseUrl` to write
* or provision their runtime-specific provider config.
*/
providerType?: string
/** API key paired with `providerType` when the selected adapter needs one. */
apiKey?: string
/** Base URL for OpenAI-compatible/custom provider entries. */
baseUrl?: string
}
export interface HarnessHistoryReasoning {

View File

@@ -1,104 +0,0 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from './agent-harness-types'
import {
compareAgentsByPinThenRecency,
orderAgentsByPinThenRecency,
} from './agents-list-order'
function makeAgent(input: {
id: string
pinned?: boolean
lastUsedAt?: number | null
}): HarnessAgent {
return {
id: input.id,
name: input.id,
adapter: 'codex',
permissionMode: 'approve-all',
sessionKey: 'session',
createdAt: 0,
updatedAt: 0,
pinned: input.pinned,
lastUsedAt: input.lastUsedAt,
}
}
describe('orderAgentsByPinThenRecency', () => {
it('floats pinned agents to the top regardless of recency', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
])
expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
})
it('sorts by lastUsedAt desc within each pin group', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
])
expect(result.map((entry) => entry.id)).toEqual([
'newer-pin',
'older-pin',
'newer',
'older',
])
})
it('seed-pins the gateway main agent above other never-used agents', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
])
expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
})
it('drops the main seed-pin once the agent has been used', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
])
expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
})
it('puts never-used agents below recently-used ones', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
])
expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
})
it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
const result = orderAgentsByPinThenRecency([
makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
])
expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
})
})
describe('compareAgentsByPinThenRecency', () => {
it('produces the same order as the harness-shape helper', () => {
const items = [
{ id: 'older', pinned: false, lastUsedAt: 50 },
{ id: 'newer', pinned: false, lastUsedAt: 80 },
{ id: 'pinned', pinned: true, lastUsedAt: 1 },
]
const sorted = [...items].sort(compareAgentsByPinThenRecency)
expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
})
it('seeds the main agent above other never-used rows', () => {
const items = [
{ id: 'zzz', pinned: false, lastUsedAt: null },
{ id: 'main', pinned: false, lastUsedAt: null },
]
const sorted = [...items].sort(compareAgentsByPinThenRecency)
expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
})
})

View File

@@ -1,59 +0,0 @@
import type { HarnessAgent } from './agent-harness-types'
/**
* Stable ordering for index-shaped agent surfaces (the `/agents` rail
* and the chat-screen rail at `/agents/:agentId`). Pinned rows float
* to the top, then recency desc, with never-used agents falling to
* the bottom in id-stable order. The gateway's `main` agent gets
* seed-pinned to the top of the never-used group so a fresh install
* has an obvious starting point even before the user has used it.
*
* NOT the same rule as the home grid (`orderHomeAgents`): home is
* action-shaped — active-turn floats to the top — so users can
* resume what's running. The chat rail keeps recency stable so it
* doesn't reshuffle as turns transition every 5s.
*/
export function orderAgentsByPinThenRecency(
agents: HarnessAgent[],
): HarnessAgent[] {
return [...agents].sort((a, b) => {
const aPinned = a.pinned ?? false
const bPinned = b.pinned ?? false
if (aPinned !== bPinned) return aPinned ? -1 : 1
const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
return a.id.localeCompare(b.id)
})
}
/**
* Same comparator, but operates over arbitrary records that carry
* `pinned`, `lastUsedAt`, and an `id`-equivalent key. Used by the
* `/agents` `AgentList` which pivots `AgentListItem` + harness
* lookup into a sortable shape; both surfaces stay on identical
* sort semantics through this adapter.
*/
export function compareAgentsByPinThenRecency<
T extends { pinned: boolean; lastUsedAt: number | null; id: string },
>(a: T, b: T): number {
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
const aSeed = a.id === 'main' && a.lastUsedAt === null
const bSeed = b.id === 'main' && b.lastUsedAt === null
if (aSeed && !bSeed) return -1
if (!aSeed && bSeed) return 1
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
if (aValue !== bValue) return bValue - aValue
return a.id.localeCompare(b.id)
}

View File

@@ -20,22 +20,17 @@ import type {
export interface AgentPageActionInput {
createProviderId: string
createRuntime: CreateAgentRuntime
createHermesProviderId: string
harnessModelId: string
harnessReasoningEffort: string
navigate: NavigateFunction
newName: string
selectableOpenClawProviders: ProviderOption[]
selectableHermesProviders: ProviderOption[]
setupProviderId: string
createHarnessAgent: (input: {
name: string
adapter: HarnessAgentAdapter
modelId?: string
reasoningEffort?: string
providerType?: string
apiKey?: string
baseUrl?: string
}) => Promise<HarnessAgent>
createOpenClawAgent: (
input: OpenClawAgentMutationInput,
@@ -119,37 +114,20 @@ export function createAgentPageActions(input: AgentPageActionInput) {
const handleHarnessCreate = async () => {
if (!input.newName.trim()) return
const isHermes = input.createRuntime === 'hermes'
// Hermes pulls every provider field from the user's selected entry
// in the global LLM-providers list (managed under AI Settings). The
// backend rejects creation if any required field is missing.
const hermesProvider = isHermes
? input.selectableHermesProviders.find(
(option) => option.id === input.createHermesProviderId,
)
: undefined
const effectiveModelId = isHermes
? hermesProvider?.modelId
: input.harnessModelId || undefined
input.setCreateError(null)
try {
const agent = await input.createHarnessAgent({
name: input.newName.trim(),
adapter: input.createRuntime as HarnessAgentAdapter,
modelId: effectiveModelId,
modelId: input.harnessModelId || undefined,
reasoningEffort: input.harnessReasoningEffort || undefined,
providerType: hermesProvider?.type,
apiKey: hermesProvider?.apiKey,
baseUrl: hermesProvider?.baseUrl,
})
input.setCreateOpen(false)
input.setNewName('')
track(AGENT_CREATED_EVENT, {
runtime: input.createRuntime,
model_id: effectiveModelId,
model_id: input.harnessModelId || undefined,
reasoning_effort: input.harnessReasoningEffort || undefined,
provider_type: hermesProvider?.type,
})
input.navigate(`/agents/${agent.id}`)
} catch (err) {
@@ -162,7 +140,6 @@ export function createAgentPageActions(input: AgentPageActionInput) {
openclaw: handleOpenClawCreate,
claude: handleHarnessCreate,
codex: handleHarnessCreate,
hermes: handleHarnessCreate,
}
void createByRuntime[input.createRuntime]()
}

View File

@@ -4,9 +4,8 @@ import type {
HarnessAdapterDescriptor,
HarnessAgentAdapter,
} from './agent-harness-types'
import type { CreateAgentRuntime, ProviderOption } from './agents-page-types'
import type { CreateAgentRuntime } from './agents-page-types'
import { toProviderOptions } from './agents-page-utils'
import { getHermesSupportedProviders } from './hermes-supported-providers'
import {
buildOpenClawCliProviderOptions,
findOpenClawCliProviderById,
@@ -172,60 +171,3 @@ export function useOpenClawProviderSelection(input: {
cliAuthError,
}
}
/**
* Mirror of useOpenClawProviderSelection but for Hermes. Hermes only
* needs the create-dialog flow (no setup dialog, no CLI providers), so
* this hook is much smaller — it just filters the global provider list
* to ones Hermes can drive and seeds the selected id when the dialog
* opens.
*/
export function useHermesProviderSelection(input: {
providers: LlmProviderConfig[]
defaultProviderId: string
createOpen: boolean
createRuntime: CreateAgentRuntime
createHermesProviderId: string
setCreateHermesProviderId: Dispatch<SetStateAction<string>>
}) {
const {
providers,
defaultProviderId,
createOpen,
createRuntime,
createHermesProviderId,
setCreateHermesProviderId,
} = input
const selectableHermesProviders = useMemo<ProviderOption[]>(
() =>
getHermesSupportedProviders(providers).map((provider) => ({
id: provider.id,
type: provider.type,
name: provider.name,
modelId: provider.modelId,
baseUrl: provider.baseUrl,
apiKey: provider.apiKey,
})),
[providers],
)
useEffect(() => {
if (selectableHermesProviders.length === 0) return
if (!createOpen || createRuntime !== 'hermes') return
if (createHermesProviderId) return
const fallbackId =
selectableHermesProviders.find((p) => p.id === defaultProviderId)?.id ??
selectableHermesProviders[0].id
setCreateHermesProviderId(fallbackId)
}, [
createHermesProviderId,
createOpen,
createRuntime,
defaultProviderId,
selectableHermesProviders,
setCreateHermesProviderId,
])
return { selectableHermesProviders }
}

View File

@@ -1,30 +0,0 @@
import {
HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES,
type HermesSupportedBrowserosProviderType,
} from '@browseros/shared/constants/hermes'
import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'
export function isHermesSupportedProviderType(
providerType: ProviderType,
): providerType is HermesSupportedBrowserosProviderType {
return (
HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES as readonly ProviderType[]
).includes(providerType)
}
/**
* Filters the user's global LLM providers down to ones Hermes can use.
* A provider qualifies when its type is in the Hermes-supported set
* AND it has an API key wired up. CLI-style providers (chatgpt-pro,
* github-copilot, qwen-code) and other unsupported types (browseros,
* ollama, lmstudio, bedrock, azure, google, moonshot) are filtered
* out — Hermes can't drive them today.
*/
export function getHermesSupportedProviders(
providers: LlmProviderConfig[],
): LlmProviderConfig[] {
return providers.filter(
(provider) =>
!!provider.apiKey && isHermesSupportedProviderType(provider.type),
)
}

View File

@@ -25,18 +25,12 @@ interface HarnessAgentsResponse {
export type { AgentHarnessStreamEvent }
export const AGENT_QUERY_KEYS = {
const AGENT_QUERY_KEYS = {
adapters: 'agent-harness-adapters',
agents: 'agent-harness-agents',
/** Outputs-rail data for one agent — `[agentOutputs, baseUrl, agentId]`. */
agentOutputs: 'agent-harness-agent-outputs',
/** Per-turn artifact-card files — `[agentTurnFiles, baseUrl, agentId, turnId]`. */
agentTurnFiles: 'agent-harness-agent-turn-files',
/** Single-file preview payload — `[filePreview, baseUrl, fileId]`. */
filePreview: 'agent-harness-file-preview',
} as const
export async function agentsFetch<T>(
async function agentsFetch<T>(
baseUrl: string,
path: string,
init?: RequestInit,

View File

@@ -85,8 +85,7 @@ export const SidebarLayout: FC = () => {
return (
<RpcClientProvider>
{/* pl-14 offsets all content by the collapsed sidebar width (w-14 = 56px) so it never sits under the rail */}
<div className="relative min-h-screen bg-background pl-14">
<div className="relative min-h-screen bg-background">
{/* Sidebar - fixed overlay */}
{/* biome-ignore lint/a11y/noStaticElementInteractions: hover interactions needed */}
<div
@@ -97,6 +96,7 @@ export const SidebarLayout: FC = () => {
<AppSidebar expanded={sidebarOpen} onOpenShortcuts={openShortcuts} />
</div>
{/* Main content - full width, centered */}
{location.pathname === '/home/chat' ? (
<main className="relative h-dvh overflow-hidden">
<Outlet />

View File

@@ -108,7 +108,6 @@ function formatAdapterName(adapter: HarnessAgentAdapter): string {
if (adapter === 'claude') return 'Claude Code'
if (adapter === 'codex') return 'Codex'
if (adapter === 'openclaw') return 'OpenClaw'
if (adapter === 'hermes') return 'Hermes'
return adapter
}

View File

@@ -42,34 +42,11 @@ export interface UserAttachmentPreview {
dataUrl?: string
}
/**
* Files attributed to this turn by the harness's per-turn workspace
* diff. Populated either via the live `produced_files` SSE event or
* (on resume) the `useAgentTurnFiles` fallback. Mirrors the wire
* shape from `agent-harness-types.HarnessProducedFile` minus the
* stream-only fields the inline card doesn't need.
*/
export interface ConversationTurnFile {
id: string
path: string
size: number
mtimeMs: number
}
export interface AgentConversationTurn {
id: string
/**
* Server-issued turn id, set as soon as the response headers arrive
* (`X-Turn-Id`) for fresh sends, or from the active-turn payload on
* resume. Required for the historic-files fallback fetch; absent on
* the brief optimistic window before the first header.
*/
turnId?: string | null
userText: string
userAttachments?: UserAttachmentPreview[]
parts: AssistantPart[]
/** Files produced during this turn (openclaw only in v1). */
producedFiles?: ConversationTurnFile[]
done: boolean
timestamp: number
}

View File

@@ -1,126 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Pure helpers used by the artifact card and the Outputs rail.
* Display formatting only — no React, no fetch, no DOM. Anything
* stateful belongs in `./useAgentOutputs` or `./useFilePreview`.
*/
import { buildAgentApiUrl } from '@/entrypoints/app/agents/agent-api-url'
/**
* Coarse classification of a file's intended preview / icon path.
* Mirrors the server-side `FilePreviewKind` minus `missing` — the
* client only ever computes a kind for a row it already has.
*/
export type FileKind = 'text' | 'image' | 'pdf' | 'binary'
const TEXT_EXTENSIONS = new Set([
'txt',
'md',
'markdown',
'json',
'jsonl',
'csv',
'tsv',
'xml',
'yaml',
'yml',
'toml',
'ini',
'log',
'html',
'htm',
'css',
'js',
'mjs',
'cjs',
'ts',
'tsx',
'jsx',
'py',
'rb',
'go',
'rs',
'java',
'kt',
'swift',
'c',
'h',
'cpp',
'hpp',
'sh',
'zsh',
'bash',
'sql',
'svg',
])
const IMAGE_EXTENSIONS = new Set([
'png',
'jpg',
'jpeg',
'gif',
'webp',
'bmp',
'ico',
'heic',
'heif',
])
/** Best-effort kind based on extension only. Server's preview API
* is the source of truth for actual rendering — this is just for
* picking an icon / sort hint without a network round-trip. */
export function inferFileKind(path: string): FileKind {
const ext = extensionOf(path).toLowerCase()
if (ext === 'pdf') return 'pdf'
if (IMAGE_EXTENSIONS.has(ext)) return 'image'
if (TEXT_EXTENSIONS.has(ext)) return 'text'
return 'binary'
}
/** Plain extension without the leading dot. Empty string when none. */
export function extensionOf(path: string): string {
const dot = path.lastIndexOf('.')
if (dot === -1) return ''
const slash = path.lastIndexOf('/')
if (dot < slash) return ''
return path.slice(dot + 1)
}
/** File name (final path segment), no directory prefix. */
export function basenameOf(path: string): string {
const slash = path.lastIndexOf('/')
return slash === -1 ? path : path.slice(slash + 1)
}
const SIZE_UNITS = ['B', 'KB', 'MB', 'GB', 'TB'] as const
/** "2.4 MB" / "340 KB" / "78 B" — for the artifact card's right-side
* metadata. Not localised; the rail uses one space + the unit. */
export function formatFileSize(bytes: number): string {
if (!Number.isFinite(bytes) || bytes < 0) return '—'
if (bytes < 1024) return `${bytes} ${SIZE_UNITS[0]}`
let value = bytes
let unit = 0
while (value >= 1024 && unit < SIZE_UNITS.length - 1) {
value /= 1024
unit += 1
}
// 1-digit precision below 10, integer above — feels less noisy.
const formatted = value < 10 ? value.toFixed(1) : Math.round(value).toString()
return `${formatted} ${SIZE_UNITS[unit]}`
}
/**
* Build the per-file download URL using the same agent-api root the
* rest of the harness hits. Returned URL is already absolute.
*/
export function buildFileDownloadUrl(baseUrl: string, fileId: string): string {
return buildAgentApiUrl(
baseUrl,
`/files/${encodeURIComponent(fileId)}/download`,
)
}

View File

@@ -1,32 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
export {
basenameOf,
buildFileDownloadUrl,
extensionOf,
type FileKind,
formatFileSize,
inferFileKind,
} from './file-helpers'
export type {
BinaryFilePreview,
FilePreview,
FilePreviewKind,
ImageFilePreview,
MissingFilePreview,
PdfFilePreview,
ProducedFile,
ProducedFilesRailGroup,
TextFilePreview,
} from './types'
export {
useAgentOutputs,
useAgentTurnFiles,
useInvalidateAgentOutputs,
useRefreshAgentOutputs,
} from './useAgentOutputs'
export { useFilePreview } from './useFilePreview'

View File

@@ -1,75 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Wire types shared by the inline artifact card and the per-agent
* Outputs rail. These mirror `ProducedFileEntry` /
* `ProducedFilesRailGroup` on the server and the `FilePreview`
* discriminated union from `apps/server/src/api/services/openclaw/file-preview.ts`.
*
* The schema mirror is deliberate (vs sharing a workspace package)
* because the server keeps the on-disk row shape — `agentDefinitionId`,
* `sessionKey` — out of the wire payload. Dropping those columns at the
* type boundary keeps the client honest about what it can refer to.
*/
export interface ProducedFile {
id: string
/** Workspace-relative POSIX path. */
path: string
size: number
mtimeMs: number
/** Server clock when the file was first attributed to its turn. */
createdAt: number
detectedBy: 'diff' | 'tool'
}
export interface ProducedFilesRailGroup {
turnId: string
/** First non-blank line of the user prompt that initiated this turn. */
turnPrompt: string
createdAt: number
files: ProducedFile[]
}
export type FilePreviewKind = 'text' | 'image' | 'pdf' | 'binary' | 'missing'
interface BasePreview {
kind: FilePreviewKind
mimeType: string
size: number
mtimeMs: number
}
export interface TextFilePreview extends BasePreview {
kind: 'text'
snippet: string
/** True when the on-disk file is larger than the server's snippet cap. */
truncated: boolean
}
export interface ImageFilePreview extends BasePreview {
kind: 'image'
/** Base64 data URL (incl. `data:` prefix). Suitable for `<img src>`. */
dataUrl: string
}
export interface PdfFilePreview extends BasePreview {
kind: 'pdf'
}
export interface BinaryFilePreview extends BasePreview {
kind: 'binary'
}
export interface MissingFilePreview {
kind: 'missing'
}
export type FilePreview =
| TextFilePreview
| ImageFilePreview
| PdfFilePreview
| BinaryFilePreview
| MissingFilePreview

View File

@@ -1,166 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* React Query hooks backing the per-agent Outputs rail and the
* inline artifact card.
*
* Live updates: the consumer of `useAgentConversation` (see Phase 5)
* is expected to call `useInvalidateAgentOutputs(agentId)` whenever
* an assistant turn completes, so the rail picks up the new
* `produced_files` rows the server attributed during that turn.
* No SSE channel here — invalidation off the existing chat-stream
* completion is enough for v1.
*/
import { useMutation, useQuery, useQueryClient } from '@tanstack/react-query'
import {
AGENT_QUERY_KEYS,
agentsFetch,
} from '@/entrypoints/app/agents/useAgents'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type { ProducedFile, ProducedFilesRailGroup } from './types'
interface OutputsResponse {
groups: ProducedFilesRailGroup[]
}
interface TurnFilesResponse {
files: ProducedFile[]
}
export function useAgentOutputs(agentId: string, enabled = true) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()
const query = useQuery<ProducedFilesRailGroup[], Error>({
queryKey: [AGENT_QUERY_KEYS.agentOutputs, baseUrl, agentId],
queryFn: async () => {
const data = await agentsFetch<OutputsResponse>(
baseUrl as string,
`/${encodeURIComponent(agentId)}/files`,
)
return data.groups ?? []
},
enabled: Boolean(baseUrl) && !urlLoading && enabled && Boolean(agentId),
})
return {
groups: query.data ?? [],
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}
/**
* Per-turn fetch for the inline artifact card. Used both as the
* fallback when an SSE `produced_files` event was missed, and to
* rehydrate a turn the user scrolled back to.
*/
export function useAgentTurnFiles(
agentId: string,
turnId: string | null,
enabled = true,
) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()
const query = useQuery<ProducedFile[], Error>({
queryKey: [AGENT_QUERY_KEYS.agentTurnFiles, baseUrl, agentId, turnId],
queryFn: async () => {
const data = await agentsFetch<TurnFilesResponse>(
baseUrl as string,
`/${encodeURIComponent(agentId)}/files/turn/${encodeURIComponent(
turnId as string,
)}`,
)
return data.files ?? []
},
enabled:
Boolean(baseUrl) &&
!urlLoading &&
enabled &&
Boolean(agentId) &&
Boolean(turnId),
})
return {
files: query.data ?? [],
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}
/**
* Returns a callable that invalidates outputs / turn-files queries
* for one agent across any baseUrl. Call after an assistant turn
* completes so the rail (and the inline file-card strip) pick up
* the new attributed rows. Cheap when the queries aren't mounted
* — react-query just marks the cached value stale.
*
* Implementation note: react-query's `invalidateQueries({ queryKey })`
* does positional partial-match, so passing `undefined` as the
* baseUrl placeholder does NOT match a cached `[…, baseUrl, …]`
* key — the cache stayed stale. Use a predicate so we ignore the
* baseUrl position entirely.
*/
export function useInvalidateAgentOutputs() {
const queryClient = useQueryClient()
return async (agentId: string, turnId?: string) => {
await Promise.all([
queryClient.invalidateQueries({
predicate: (query) => {
const key = query.queryKey
return (
Array.isArray(key) &&
key[0] === AGENT_QUERY_KEYS.agentOutputs &&
key[2] === agentId
)
},
}),
queryClient.invalidateQueries({
predicate: (query) => {
const key = query.queryKey
if (
!Array.isArray(key) ||
key[0] !== AGENT_QUERY_KEYS.agentTurnFiles ||
key[2] !== agentId
) {
return false
}
// When a turnId was supplied, scope to just that turn's
// entry. Otherwise flush every cached turn for this agent.
return turnId ? key[3] === turnId : true
},
}),
])
}
}
/**
* Tiny mutation wrapper so the Outputs rail's "Refresh" button can
* surface an `isPending` indicator while the new query is in flight.
* No body — just triggers `refetch` on the rail's query for this
* agent and resolves when it settles.
*/
export function useRefreshAgentOutputs(agentId: string) {
const queryClient = useQueryClient()
const { baseUrl } = useAgentServerUrl()
return useMutation({
mutationFn: async () => {
await queryClient.refetchQueries({
queryKey: [AGENT_QUERY_KEYS.agentOutputs, baseUrl, agentId],
exact: true,
})
},
})
}

View File

@@ -1,49 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Single-file preview hook used by the inline artifact card and the
* Outputs rail's preview Sheet. Always opt-in (`enabled`) — the
* preview is fetched only when the user clicks a row, never
* eagerly.
*/
import { useQuery } from '@tanstack/react-query'
import {
AGENT_QUERY_KEYS,
agentsFetch,
} from '@/entrypoints/app/agents/useAgents'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type { FilePreview } from './types'
export function useFilePreview(fileId: string | null, enabled = true) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()
const query = useQuery<FilePreview, Error>({
queryKey: [AGENT_QUERY_KEYS.filePreview, baseUrl, fileId],
queryFn: async () => {
return agentsFetch<FilePreview>(
baseUrl as string,
`/files/${encodeURIComponent(fileId as string)}/preview`,
)
},
enabled: Boolean(baseUrl) && !urlLoading && enabled && Boolean(fileId),
// Previews are immutable for a given fileId — once loaded, never
// refetch on focus / reconnect. They go stale only when the
// underlying file is removed (rare in v1; no rename / delete).
staleTime: Infinity,
gcTime: 5 * 60 * 1000,
})
return {
preview: query.data ?? null,
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}

View File

@@ -1,76 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { stageAttachment } from './attachments'
function restoreGlobal(name: string, value: unknown) {
if (value === undefined) {
Reflect.deleteProperty(globalThis, name)
return
}
Reflect.set(globalThis, name, value)
}
describe('stageAttachment', () => {
it('uses the recompressed blob media type for large images', async () => {
const originalCreateImageBitmap = Reflect.get(
globalThis,
'createImageBitmap',
)
const originalOffscreenCanvas = Reflect.get(globalThis, 'OffscreenCanvas')
const originalHTMLCanvasElement = Reflect.get(
globalThis,
'HTMLCanvasElement',
)
class FakeOffscreenCanvas {
width: number
height: number
constructor(width: number, height: number) {
this.width = width
this.height = height
}
getContext() {
return {
drawImage() {},
}
}
async convertToBlob(options: { type?: string }) {
return new Blob([new Uint8Array([9, 8, 7])], {
type: options.type ?? 'image/jpeg',
})
}
}
try {
Reflect.set(globalThis, 'createImageBitmap', async () => ({
width: 4096,
height: 2048,
close() {},
}))
Reflect.set(globalThis, 'OffscreenCanvas', FakeOffscreenCanvas)
Reflect.set(globalThis, 'HTMLCanvasElement', class HTMLCanvasElement {})
const file = new File([new Uint8Array(2 * 1024 * 1024)], 'shot.png', {
type: 'image/png',
})
const result = await stageAttachment(file)
expect(result.ok).toBe(true)
if (!result.ok) throw new Error(result.error.message)
expect(result.attachment.mediaType).toBe('image/jpeg')
expect(result.attachment.dataUrl).toStartWith('data:image/jpeg;base64,')
expect(result.attachment.payload).toMatchObject({
kind: 'image',
mediaType: 'image/jpeg',
dataUrl: result.attachment.dataUrl,
})
} finally {
restoreGlobal('createImageBitmap', originalCreateImageBitmap)
restoreGlobal('OffscreenCanvas', originalOffscreenCanvas)
restoreGlobal('HTMLCanvasElement', originalHTMLCanvasElement)
}
})
})

View File

@@ -100,7 +100,6 @@ export async function stageAttachment(
try {
const compressed = await compressImageIfNeeded(file)
const dataUrl = await readAsDataUrl(compressed)
const encodedMediaType = compressed.type || mediaType
// Rough byte ceiling — `data:image/png;base64,...` doubles size with
// base64. Reject early so we never POST something the route will 400.
if (dataUrl.length > MAX_IMAGE_BYTES * 2) {
@@ -119,12 +118,12 @@ export async function stageAttachment(
attachment: {
id: makeId(),
kind: 'image',
mediaType: encodedMediaType,
mediaType,
name: file.name || 'image',
dataUrl,
payload: {
kind: 'image',
mediaType: encodedMediaType,
mediaType,
dataUrl,
name: file.name || undefined,
},

View File

@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
# If BrowserOS is installed but not running
browseros-cli launch # opens BrowserOS, waits for server
# Configure the CLI with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Configure the CLI (auto-discovers running BrowserOS)
browseros-cli init --auto # detects server URL and saves config
# Verify connection
browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
browseros-cli init # interactive — prompts for URL
```
Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
### CLI updates
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
| `--debug` | `BOS_DEBUG=1` | Debug output |
| `--timeout, -t` | | Request timeout (default: 2m) |
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
## Testing
@@ -179,7 +179,7 @@ apps/cli/
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
├── cmd/
│ ├── root.go # Root command, global flags
│ ├── init.go # Server URL configuration (URL arg or interactive)
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
│ ├── install.go # install (download BrowserOS for current platform)
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
│ ├── open.go # open (new_page / new_hidden_page)

View File

@@ -17,6 +17,8 @@ import (
)
func init() {
var autoDiscover bool
cmd := &cobra.Command{
Use: "init [url]",
Short: "Configure the BrowserOS server connection",
@@ -32,8 +34,9 @@ You can provide the full URL or just the port number:
browseros-cli init http://127.0.0.1:9000/mcp
browseros-cli init 9000
Modes:
Three modes:
browseros-cli init <url> Non-interactive (full URL or port number)
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
browseros-cli init Interactive prompt`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.MaximumNArgs(1),
@@ -46,9 +49,22 @@ Modes:
switch {
case len(args) == 1:
// Non-interactive: URL provided as argument
input = args[0]
case autoDiscover:
// Auto-discover: server.json → config → probe common ports
discovered := probeRunningServer()
if discovered == "" {
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
" If not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", 1)
}
input = discovered
fmt.Printf("Auto-discovered server at %s\n", input)
default:
// Interactive prompt (original behavior)
fmt.Println()
bold.Println("BrowserOS CLI Setup")
fmt.Println()
@@ -79,14 +95,12 @@ Modes:
output.Errorf(1, "invalid URL: %s", input)
}
// Verify connectivity
fmt.Printf("Checking connection to %s ...\n", baseURL)
client := &http.Client{Timeout: 5 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
output.Errorf(1, "cannot connect to %s: %v\n\n"+
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
"Then run: browseros-cli init <Server URL>\n"+
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
}
resp.Body.Close()
@@ -107,5 +121,6 @@ Modes:
},
}
cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
rootCmd.AddCommand(cmd)
}

View File

@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)
After installation:
browseros-cli launch # start BrowserOS
browseros-cli init <url> # configure the CLI with the Server URL`,
browseros-cli init --auto # configure the CLI`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
fmt.Println()
bold.Println("Next steps:")
dim.Println(" browseros-cli launch # start BrowserOS")
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
dim.Println(" browseros-cli init --auto # configure the CLI")
},
}

View File

@@ -1,7 +1,6 @@
package cmd
import (
"encoding/json"
"fmt"
"net/http"
"os"
@@ -39,7 +38,6 @@ If BrowserOS is already running, reports the server URL.`,
if url := probeRunningServer(); url != "" {
green.Printf("BrowserOS is already running at %s\n", url)
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
return
}
@@ -65,7 +63,7 @@ If BrowserOS is already running, reports the server URL.`,
green.Printf("BrowserOS is ready at %s\n", url)
fmt.Println()
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
dim.Println("Next: browseros-cli init --auto")
},
}
@@ -77,77 +75,39 @@ If BrowserOS is already running, reports the server URL.`,
// Server probing
// ---------------------------------------------------------------------------
var commonBrowserOSPorts = []int{9100, 9200, 9300}
// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
// probeRunningServer checks server.json, config, and common ports for a running server.
func probeRunningServer() string {
client := &http.Client{Timeout: 2 * time.Second}
check := func(baseURL string) bool {
client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
// 1. server.json — written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && check(url) {
return url
}
if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
// 2. Saved config / env var
if url := defaultServerURL(); url != "" && check(url) {
return url
}
return probeCommonServerPorts(client)
}
func checkServerHealth(client *http.Client, baseURL string) bool {
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}
func probeCommonServerPorts(client *http.Client) string {
for _, port := range commonBrowserOSPorts {
// 3. Probe common BrowserOS ports as last resort
for _, port := range []int{9100, 9200, 9300} {
url := fmt.Sprintf("http://127.0.0.1:%d", port)
if checkServerHealth(client, url) {
if check(url) {
return url
}
}
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
//
// Normal command resolution must not call this because it can override a URL the
// user explicitly saved with `browseros-cli init <Server URL>`.
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func mcpEndpointURL(baseURL string) string {
return strings.TrimSuffix(baseURL, "/") + "/mcp"
}
// ---------------------------------------------------------------------------
// Platform-native installation detection
// ---------------------------------------------------------------------------
@@ -157,8 +117,7 @@ func mcpEndpointURL(baseURL string) string {
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
//
// and registry uninstall key (per-user Chromium install pattern)
// and registry uninstall key (per-user Chromium install pattern)
func isBrowserOSInstalled() bool {
switch runtime.GOOS {
case "darwin":
@@ -312,11 +271,14 @@ func waitForServer(maxWait time.Duration) (string, bool) {
for time.Now().Before(deadline) {
// server.json is written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url, true
}
if url := probeCommonServerPorts(client); url != "" {
return url, true
if url := loadBrowserosServerURL(); url != "" {
resp, err := client.Get(url + "/health")
if err == nil {
resp.Body.Close()
if resp.StatusCode == 200 {
return url, true
}
}
}
fmt.Print(".")
time.Sleep(1 * time.Second)

View File

@@ -1,99 +0,0 @@
package cmd
import (
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"os"
"path/filepath"
"strconv"
"testing"
"time"
"browseros-cli/config"
)
func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
discoveredServer := newHealthyServer(t)
configServer := newHealthyServer(t)
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := probeRunningServer()
if got != normalizeServerURL(discoveredServer.URL) {
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
}
}
func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
server := newHealthyServer(t)
port := serverPort(t, server.URL)
originalPorts := commonBrowserOSPorts
commonBrowserOSPorts = []int{port}
t.Cleanup(func() {
commonBrowserOSPorts = originalPorts
})
got, ok := waitForServer(100 * time.Millisecond)
if !ok {
t.Fatal("waitForServer() ok = false, want true")
}
if got != normalizeServerURL(server.URL) {
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
}
}
func newHealthyServer(t *testing.T) *httptest.Server {
t.Helper()
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/health" {
http.NotFound(w, r)
return
}
w.WriteHeader(http.StatusOK)
}))
t.Cleanup(server.Close)
return server
}
func serverPort(t *testing.T, rawURL string) int {
t.Helper()
parsed, err := url.Parse(rawURL)
if err != nil {
t.Fatalf("url.Parse() error = %v", err)
}
_, portText, err := net.SplitHostPort(parsed.Host)
if err != nil {
t.Fatalf("net.SplitHostPort() error = %v", err)
}
port, err := strconv.Atoi(portText)
if err != nil {
t.Fatalf("strconv.Atoi() error = %v", err)
}
return port
}

View File

@@ -2,8 +2,10 @@ package cmd
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"time"
@@ -287,15 +289,18 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
}
}
// defaultServerURL returns the implicit target from user-controlled settings only.
//
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
// ignore it so a saved URL is not silently overridden by another running server.
func defaultServerURL() string {
// 1. Explicit env var always wins
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
return env
}
// 2. Live discovery file from running BrowserOS (most current)
if url := loadBrowserosServerURL(); url != "" {
return url
}
// 3. Saved config (may be stale if port changed)
cfg, err := config.Load()
if err == nil {
if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -306,6 +311,33 @@ func defaultServerURL() string {
return ""
}
type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}
var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}
return normalizeServerURL(sc.URL)
}
func normalizeServerURL(raw string) string {
normalized := strings.TrimSpace(raw)
@@ -337,10 +369,8 @@ func validateServerURL(raw string) (string, error) {
return "", fmt.Errorf(
"BrowserOS server URL is not configured.\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
" If BrowserOS is running: browseros-cli init --auto\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
)
}

View File

@@ -1,13 +1,8 @@
package cmd
import (
"os"
"path/filepath"
"strings"
"testing"
"time"
"browseros-cli/config"
)
func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -105,76 +100,6 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
}
}
func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}
got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")
serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if got := defaultServerURL(); got != "" {
t.Fatalf("defaultServerURL() = %q, want empty", got)
}
}
func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
if got != "http://127.0.0.1:9115" {
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}
func TestValidateServerURLExplainsManualInit(t *testing.T) {
_, err := validateServerURL("")
if err == nil {
t.Fatal("validateServerURL() error = nil, want setup instructions")
}
msg := err.Error()
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
}
if strings.Contains(msg, "init --auto") {
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
}
}
func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
done := make(chan struct{})
returned := make(chan struct{})

View File

@@ -44,7 +44,10 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {
session, err := sdkClient.Connect(ctx, transport, nil)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
}
return session, nil
}
@@ -184,7 +187,10 @@ func (c *Client) Status() (map[string]any, error) {
func (c *Client) restGET(path string) (map[string]any, error) {
resp, err := c.HTTPClient.Get(c.BaseURL + path)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
}
defer resp.Body.Close()
@@ -199,14 +205,3 @@ func (c *Client) restGET(path string) (map[string]any, error) {
}
return data, nil
}
// connectionSetupInstructions explains how to recover from a stale or missing server URL.
func connectionSetupInstructions() string {
return "\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" Run once with: browseros-cli --server <Server URL> health\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install"
}

View File

@@ -31,8 +31,8 @@ browseros-cli install
# Start BrowserOS
browseros-cli launch
# Configure MCP settings with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Auto-configure MCP settings for your AI tools
browseros-cli init --auto
# Verify everything is working
browseros-cli health

View File

@@ -9,7 +9,6 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
- **Bun** runtime
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.
## Quick Start
@@ -68,7 +67,7 @@ This lets us run the same suite against multiple model setups without copying th
```txt
agisdk-daily-10 + kimi-fireworks
agisdk-daily-10 + claude-opus
agisdk-daily-10 + claude-sonnet
agisdk-daily-10 + clado-action-000159
```
@@ -80,7 +79,6 @@ For `orchestrator-executor` suites, there can also be an executor model/backend.
|------|-------------|
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |
### Single agent
@@ -121,24 +119,6 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
}
```
### Claude Code
Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
```json
{
"agent": {
"type": "claude-code",
"model": "opus"
}
}
```
```bash
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
```
## Graders
| Name | Description |
@@ -171,7 +151,6 @@ The `apiKey` field supports two formats:
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
@@ -215,7 +194,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
}
```

View File

@@ -1,26 +0,0 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "moonshotai/kimi-k2.5",
"apiKey": "OPENROUTER_API_KEY",
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 3,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -1,27 +0,0 @@
{
"agent": {
"type": "single",
"provider": "bedrock",
"model": "global.anthropic.claude-opus-4-6-v1",
"region": "AWS_REGION",
"accessKeyId": "AWS_ACCESS_KEY_ID",
"secretAccessKey": "AWS_SECRET_ACCESS_KEY",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 2,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -7,8 +7,8 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 3,
"dataset": "../../data/webbench-2of4-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
@@ -21,6 +21,6 @@
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"graders": ["performance_grader"],
"timeout_ms": 1800000
}

View File

@@ -23,7 +23,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

View File

@@ -1,23 +0,0 @@
{
"agent": {
"type": "claude-code",
"model": "opus",
"extraArgs": ["--permission-mode", "bypassPermissions"]
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -14,7 +14,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

View File

@@ -1,191 +0,0 @@
#!/usr/bin/env bun
import { mkdir, stat } from 'node:fs/promises'
import { dirname, resolve } from 'node:path'
import { query as claudeQuery } from '@anthropic-ai/claude-agent-sdk'
import { readRunMetricSummary } from '../src/reporting/task-metrics'
export const DEFAULT_REPORT_MODEL = 'claude-opus-4-6'
export const DEFAULT_REPORT_MAX_TURNS = 300
type Env = Record<string, string | undefined>
type ClaudeQuery = (input: unknown) => AsyncIterable<Record<string, unknown>>
export interface ReportAgentInvocation {
inputDir: string
outputPath: string
prompt: string
}
export interface GenerateEvalReportOptions {
inputDir: string
outputPath: string
runAgent?: (invocation: ReportAgentInvocation) => Promise<void>
}
interface ClaudeReportAgentDeps {
query?: ClaudeQuery
env?: Env
}
function usage(): string {
return `Usage: bun scripts/generate-report.ts --input <run-dir> --output <report.html>`
}
function parseArgs(
argv: string[],
): Pick<GenerateEvalReportOptions, 'inputDir' | 'outputPath'> {
let inputDir = ''
let outputPath = ''
for (let i = 0; i < argv.length; i++) {
const arg = argv[i]
if (arg === '--input' || arg === '--run') {
inputDir = argv[++i] ?? ''
} else if (arg === '--output' || arg === '--out') {
outputPath = argv[++i] ?? ''
} else if (arg === '--help' || arg === '-h') {
console.log(usage())
process.exit(0)
}
}
if (!inputDir || !outputPath) {
throw new Error(usage())
}
return { inputDir, outputPath }
}
function claudeCodeEnv(env: Env): Env {
return {
CLAUDE_CODE_OAUTH_TOKEN: env.CLAUDE_CODE_OAUTH_TOKEN,
ANTHROPIC_API_KEY: env.ANTHROPIC_API_KEY,
HOME: env.HOME,
PATH: env.PATH,
SHELL: env.SHELL,
TMPDIR: env.TMPDIR,
TMP: env.TMP,
TEMP: env.TEMP,
USER: env.USER,
CLAUDECODE: '',
}
}
async function buildReportPrompt(
inputDir: string,
outputPath: string,
): Promise<string> {
const metrics = await readRunMetricSummary(inputDir)
return `Analyze this BrowserOS eval run and write a shareable HTML report.
Run directory: ${inputDir}
Output file to write: ${outputPath}
You are running with the run directory as cwd. Inspect the local artifacts:
- summary.json for run totals and pass rate
- each task directory's metadata.json for query, final answer, timing, screenshots, and grader results
- each task directory's messages.jsonl for tool calls, tool errors, and recent trajectory
- screenshots/ for visual evidence
- grader-artifacts/ when present for grader-specific context
Write the final report directly to the output file path above. Do not print the
report instead of writing it. Do not modify any input artifacts. The only file
you should create or overwrite is the requested report.html.
The report should follow the style and density of the Shadowfax AGI SDK report:
- Title like "AGI SDK Random-10 Failure Report" or a run-specific equivalent
- Run directory and note that screenshots are embedded as data URIs
- Summary cards for total tasks, passed, failed, pass rate, average duration, average steps, and average tool calls
- A Metrics section with compact charts for Duration by task, Steps by task, Tool calls by task, and Tool errors by task
- Task Summary table with task id, status, score, duration, steps, and prompt
- Include tool calls and tool errors in the Task Summary table
- Failure sections with stable anchors using each task id, for example <section id="agisdk-networkin-10">
- For each failed task: Diagnosis, Evidence, Next Check, final screenshot, AGI SDK / grader criteria, final answer, and recent trajectory events
- Make failure links in the summary table point to the task anchors
- Keep the HTML self-contained: inline CSS and embedded final screenshots as data:image/png;base64 URIs
- Escape user/model text correctly so task outputs cannot break the page
Analysis guidance:
- Focus on why the model failed: task understanding, browser/tool usage, missing verification, tool errors, max-step/timeout, bad final answer, or grader ambiguity
- Use messages.jsonl strategically. Do not paste huge DOM outputs into the report. Summarize only the relevant recent trajectory and evidence.
- Limit trajectory analysis to the most relevant 200-300 events/calls across the run. Prefer failed tasks and the final/key actions for each failure.
- If a grader criterion is boolean-only or ambiguous, say so and identify what additional artifact would make it debuggable.
Deterministic run metrics computed from metadata.json and messages.jsonl:
\`\`\`json
${JSON.stringify(metrics, null, 2)}
\`\`\`
After writing the file, verify that ${outputPath} exists and is non-empty.`
}
async function assertRunDir(inputDir: string): Promise<void> {
const inputStat = await stat(inputDir).catch(() => null)
if (!inputStat?.isDirectory()) {
throw new Error(`Not a run directory: ${inputDir}`)
}
}
async function assertReportWritten(outputPath: string): Promise<void> {
const outputStat = await stat(outputPath).catch(() => null)
if (!outputStat?.isFile() || outputStat.size === 0) {
throw new Error(`Report was not written: ${outputPath}`)
}
}
export async function runClaudeCodeReportAgent(
invocation: ReportAgentInvocation,
deps: ClaudeReportAgentDeps = {},
): Promise<void> {
const query = deps.query ?? (claudeQuery as unknown as ClaudeQuery)
let resultSubtype: string | undefined
for await (const message of query({
prompt: invocation.prompt,
options: {
cwd: invocation.inputDir,
model: DEFAULT_REPORT_MODEL,
systemPrompt:
'You are an eval failure analyst. Produce a concise, evidence-backed, self-contained HTML report from local run artifacts.',
permissionMode: 'bypassPermissions',
allowDangerouslySkipPermissions: true,
maxTurns: DEFAULT_REPORT_MAX_TURNS,
env: claudeCodeEnv(deps.env ?? process.env),
},
})) {
if (message.type === 'result') {
resultSubtype =
typeof message.subtype === 'string' ? message.subtype : undefined
}
}
if (resultSubtype && resultSubtype !== 'success') {
throw new Error(`Claude Code report agent failed: ${resultSubtype}`)
}
}
export async function generateEvalReport(
options: GenerateEvalReportOptions,
): Promise<void> {
const inputDir = resolve(options.inputDir)
const outputPath = resolve(options.outputPath)
await assertRunDir(inputDir)
await mkdir(dirname(outputPath), { recursive: true })
const invocation = {
inputDir,
outputPath,
prompt: await buildReportPrompt(inputDir, outputPath),
}
await (options.runAgent ?? runClaudeCodeReportAgent)(invocation)
await assertReportWritten(outputPath)
}
if (import.meta.main) {
try {
await generateEvalReport(parseArgs(Bun.argv.slice(2)))
} catch (error) {
console.error(error instanceof Error ? error.message : String(error))
process.exit(1)
}
}

View File

@@ -1,238 +0,0 @@
import { writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import { DEFAULT_TIMEOUT_MS } from '../../constants'
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
import { withEvalTimeout } from '../../utils/with-eval-timeout'
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
import {
type ClaudeCodeProcessRunner,
createClaudeCodeProcessRunner,
} from './process-runner'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from './stream-parser'
export interface ClaudeCodeEvaluatorDeps {
processRunner?: ClaudeCodeProcessRunner
}
export class ClaudeCodeEvaluator implements AgentEvaluator {
private processRunner: ClaudeCodeProcessRunner
constructor(
private ctx: AgentContext,
deps: ClaudeCodeEvaluatorDeps = {},
) {
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
}
async execute(): Promise<AgentResult> {
const { config, task, capture, taskOutputDir } = this.ctx
const startTime = Date.now()
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
await capture.messageLogger.logUser(task.query)
if (config.agent.type !== 'claude-code') {
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
}
const agentConfig = config.agent
const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
await writeFile(
mcpConfigPath,
JSON.stringify(
buildClaudeCodeMcpConfig(config.browseros.server_url),
null,
2,
),
)
const parser = new ClaudeCodeStreamParser()
const toolNamesById = new Map<string, string>()
const prompt = buildClaudeCodePrompt(task.query)
const args = buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config: agentConfig,
})
const { terminationReason } = await withEvalTimeout(
timeoutMs,
capture,
async (signal) => {
const runResult = await this.processRunner.run({
executable: agentConfig.claudePath,
args,
cwd: taskOutputDir,
signal,
onStdoutLine: async (line) => {
const events = parser.pushLine(line)
for (const event of events) {
await this.handleStreamEvent(event, toolNamesById)
}
},
})
if (runResult.exitCode !== 0) {
const message =
runResult.stderr.trim() ||
`Claude Code exited with status ${runResult.exitCode}`
capture.addError('agent_execution', message, {
exitCode: runResult.exitCode,
})
if (!parser.getLastText()) {
throw new Error(message)
}
}
for (const error of runResult.streamErrors ?? []) {
capture.addWarning(
'message_logging',
`Claude Code stream event processing failed: ${error}`,
)
}
return runResult
},
)
const endTime = Date.now()
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
const metadata = {
query_id: task.query_id,
dataset: task.dataset,
query: task.query,
started_at: new Date(startTime).toISOString(),
completed_at: new Date(endTime).toISOString(),
total_duration_ms: endTime - startTime,
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
termination_reason: terminationReason,
final_answer: finalAnswer,
errors: capture.getErrors(),
warnings: capture.getWarnings(),
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
agent_config: {
type: 'claude-code' as const,
model: agentConfig.model,
},
grader_results: {},
}
await capture.trajectorySaver.saveMetadata(metadata)
return {
metadata,
messages: capture.getMessages(),
finalAnswer,
}
}
private async handleStreamEvent(
event: UIMessageStreamEvent,
toolNamesById: Map<string, string>,
): Promise<void> {
const { capture, task } = this.ctx
let screenshot: number | undefined
if (event.type === 'tool-input-available') {
toolNamesById.set(event.toolCallId, event.toolName)
if (isPageInput(event.input)) {
capture.setActivePageId(event.input.page)
}
}
if (
event.type === 'tool-output-available' ||
event.type === 'tool-output-error'
) {
const toolName = toolNamesById.get(event.toolCallId)
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
screenshot = await this.captureScreenshot()
}
}
await capture.messageLogger.logStreamEvent(event, screenshot)
capture.emitEvent(task.query_id, {
...event,
...(screenshot !== undefined && { screenshot }),
})
}
private async captureScreenshot(): Promise<number | undefined> {
const { capture, task } = this.ctx
try {
const screenshot = await capture.screenshot.capture(
capture.getActivePageId(),
)
capture.emitEvent(task.query_id, {
type: 'screenshot-captured',
screenshot,
})
return screenshot
} catch {
return undefined
}
}
}
function isPageInput(input: unknown): input is { page: number } {
return (
typeof input === 'object' &&
input !== null &&
'page' in input &&
typeof input.page === 'number'
)
}
function buildClaudeCodePrompt(taskQuery: string): string {
return [
'You are running inside BrowserOS eval.',
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
'When the task is complete, respond with the final answer only.',
'If blocked, explain the blocker clearly.',
'',
`Task: ${taskQuery}`,
].join('\n')
}
function buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config,
}: {
prompt: string
mcpConfigPath: string
config: ClaudeCodeAgentConfig
}): string[] {
const args = [
'-p',
prompt,
'--mcp-config',
mcpConfigPath,
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
]
if (config.model) args.push('--model', config.model)
args.push(...config.extraArgs)
return args
}
function buildClaudeCodeMcpConfig(serverUrl: string) {
const trimmed = serverUrl.replace(/\/$/, '')
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
return {
mcpServers: {
browseros: {
type: 'http',
url,
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
},
},
}
}

View File

@@ -1,114 +0,0 @@
export interface ClaudeCodeRunOptions {
executable: string
args: string[]
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface ClaudeCodeRunResult {
exitCode: number
stderr: string
streamErrors?: string[]
}
export interface ClaudeCodeProcessRunner {
run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
}
export interface SpawnOptions {
cwd: string
signal?: AbortSignal
onStdoutLine: (line: string) => Promise<void>
}
export interface CreateClaudeCodeProcessRunnerDeps {
spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
}
export function createClaudeCodeProcessRunner(
deps: CreateClaudeCodeProcessRunnerDeps = {},
): ClaudeCodeProcessRunner {
const spawn = deps.spawn ?? spawnClaudeCode
return {
run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
spawn([executable, ...args], { cwd, signal, onStdoutLine }),
}
}
async function spawnClaudeCode(
cmd: string[],
options: SpawnOptions,
): Promise<ClaudeCodeRunResult> {
const proc = Bun.spawn({
cmd,
cwd: options.cwd,
stdin: 'ignore',
stdout: 'pipe',
stderr: 'pipe',
})
const abort = () => {
try {
proc.kill('SIGTERM')
} catch {
// Process may already have exited.
}
}
options.signal?.addEventListener('abort', abort, { once: true })
try {
const streamErrors: string[] = []
const stdoutPromise = readLines(
proc.stdout,
options.onStdoutLine,
streamErrors,
)
const stderrPromise = new Response(proc.stderr).text()
const exitCode = await proc.exited
await stdoutPromise
const stderr = await stderrPromise
return { exitCode, stderr, streamErrors }
} finally {
options.signal?.removeEventListener('abort', abort)
}
}
async function readLines(
stream: ReadableStream<Uint8Array>,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
const reader = stream.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
await emitLine(line, onLine, streamErrors)
}
}
buffer += decoder.decode()
if (buffer.length > 0) {
await emitLine(buffer, onLine, streamErrors)
}
}
async function emitLine(
line: string,
onLine: (line: string) => Promise<void>,
streamErrors: string[],
): Promise<void> {
try {
await onLine(line)
} catch (error) {
streamErrors.push(error instanceof Error ? error.message : String(error))
}
}

View File

@@ -1,142 +0,0 @@
import { randomUUID } from 'node:crypto'
import type { UIMessageStreamEvent } from '../../types'
type JsonObject = Record<string, unknown>
export class ClaudeCodeStreamParser {
private lastText: string | null = null
private toolCallCount = 0
pushLine(line: string): UIMessageStreamEvent[] {
const trimmed = line.trim()
if (!trimmed) return []
let parsed: unknown
try {
parsed = JSON.parse(trimmed)
} catch {
return []
}
if (!isObject(parsed)) return []
if (parsed.type === 'assistant') {
return this.parseAssistantMessage(parsed)
}
if (parsed.type === 'user') {
return this.parseUserMessage(parsed)
}
if (parsed.type === 'result' && typeof parsed.result === 'string') {
this.lastText = parsed.result
}
return []
}
getLastText(): string | null {
return this.lastText
}
getToolCallCount(): number {
return this.toolCallCount
}
private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (block.type === 'text' && typeof block.text === 'string') {
const id = randomUUID()
this.lastText = block.text
events.push(
{ type: 'text-start', id },
{ type: 'text-delta', id, delta: block.text },
{ type: 'text-end', id },
)
} else if (
block.type === 'tool_use' &&
typeof block.id === 'string' &&
typeof block.name === 'string'
) {
this.toolCallCount++
events.push({
type: 'tool-input-available',
toolCallId: block.id,
toolName: block.name,
input: block.input,
})
}
}
return events
}
private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
const content = contentBlocks(message)
const events: UIMessageStreamEvent[] = []
for (const block of content) {
if (
block.type !== 'tool_result' ||
typeof block.tool_use_id !== 'string'
) {
continue
}
if (block.is_error === true) {
events.push({
type: 'tool-output-error',
toolCallId: block.tool_use_id,
errorText: stringifyToolContent(block.content),
})
} else {
events.push({
type: 'tool-output-available',
toolCallId: block.tool_use_id,
output: normalizeToolContent(block.content),
})
}
}
return events
}
}
export function shouldCaptureScreenshotForTool(toolName: string): boolean {
if (!toolName.startsWith('mcp__browseros__')) return false
return !toolName.endsWith('__take_screenshot')
}
function contentBlocks(message: JsonObject): JsonObject[] {
const inner = isObject(message.message) ? message.message : message
return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
}
function isObject(value: unknown): value is JsonObject {
return typeof value === 'object' && value !== null
}
function normalizeToolContent(content: unknown): unknown {
if (!Array.isArray(content)) return content
return content.map((item) => {
if (
isObject(item) &&
item.type === 'text' &&
typeof item.text === 'string'
) {
return item.text
}
return item
})
}
function stringifyToolContent(content: unknown): string {
const normalized = normalizeToolContent(content)
if (typeof normalized === 'string') return normalized
try {
return JSON.stringify(normalized)
} catch {
return String(normalized)
}
}

View File

@@ -1,4 +1,3 @@
import { ClaudeCodeEvaluator } from './claude-code'
import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
import { SingleAgentEvaluator } from './single-agent'
import type { AgentContext, AgentEvaluator } from './types'
@@ -9,8 +8,6 @@ export function createAgent(context: AgentContext): AgentEvaluator {
return new SingleAgentEvaluator(context)
case 'orchestrator-executor':
return new OrchestratorExecutorEvaluator(context)
case 'claude-code':
return new ClaudeCodeEvaluator(context)
}
}

View File

@@ -134,10 +134,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
// Connect to Chrome via CDP — same per-worker offset used by app-manager.
const cdpPort = config.browseros.base_cdp_port + workerIndex
const cdp = new CdpBackend({
port: cdpPort,
exitOnReconnectFailure: false,
})
const cdp = new CdpBackend({ port: cdpPort })
await cdp.connect()
const browser = new Browser(cdp)
capture.screenshot.setBrowser(browser)

View File

@@ -43,10 +43,7 @@ export class SingleAgentEvaluator implements AgentEvaluator {
// Connect to Chrome via CDP — same per-worker offset used by app-manager.
const cdpPort = config.browseros.base_cdp_port + workerIndex
const cdp = new CdpBackend({
port: cdpPort,
exitOnReconnectFailure: false,
})
const cdp = new CdpBackend({ port: cdpPort })
await cdp.connect()
const browser = new Browser(cdp)

View File

@@ -105,10 +105,7 @@ export class TrajectorySaver {
errors: [],
warnings: [],
agent_config: {
type: agentConfig.type as
| 'single'
| 'orchestrator-executor'
| 'claude-code',
type: agentConfig.type as 'single' | 'orchestrator-executor',
model: agentConfig.model,
},
grader_results: {},

View File

@@ -82,16 +82,6 @@ function suiteToEvalConfig(
})
}
if (suite.agent.type === 'claude-code') {
return EvalConfigSchema.parse({
...base,
agent: {
type: 'claude-code',
...(variant.agent.model && { model: variant.agent.model }),
},
})
}
const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
const executor =
executorBackend === 'clado'
@@ -145,10 +135,7 @@ export async function resolveSuiteCommand(
const loaded = await loadSuite(options.suitePath)
const variant = resolveVariant({
variantId: options.variantId,
provider:
loaded.suite.agent.type === 'claude-code'
? 'claude-code'
: options.provider,
provider: options.provider,
model: options.model,
apiKey: options.apiKey,
baseUrl: options.baseUrl,

View File

@@ -536,12 +536,6 @@ export interface DashboardConfig {
configMode?: boolean
}
export function shouldAutoOpenDashboard(
env: Record<string, string | undefined> = process.env,
): boolean {
return env.CI !== 'true'
}
export function startDashboard(config: DashboardConfig) {
const port = config.port ?? 9900
dashboardConfigMode = config.configMode ?? false
@@ -564,12 +558,10 @@ export function startDashboard(config: DashboardConfig) {
console.log(` Dashboard: ${url}`)
// Auto-open browser
if (shouldAutoOpenDashboard()) {
try {
Bun.spawn(['open', url], { stdout: 'ignore', stderr: 'ignore' })
} catch {
/* ignore if open command fails */
}
try {
Bun.spawn(['open', url], { stdout: 'ignore', stderr: 'ignore' })
} catch {
/* ignore if open command fails */
}
return { url, port }

View File

@@ -61,17 +61,6 @@
.header-stats .stat-pass { color: #3fb950; }
.header-stats .stat-fail { color: #f85149; }
.header-stats .stat-score { color: #f0883e; }
.header-report {
color: #58a6ff;
text-decoration: none;
font-size: 12px;
font-weight: 600;
border: 1px solid #30363d;
border-radius: 6px;
padding: 5px 9px;
white-space: nowrap;
}
.header-report:hover { border-color: #58a6ff; background: #1c2333; }
/* ── 3-column layout ─────────────────────────────────────────── */
.layout {
@@ -95,7 +84,6 @@
background: #161b22;
border-bottom: 1px solid #30363d;
display: flex;
flex-wrap: wrap;
gap: 12px;
font-size: 11px;
font-weight: 600;
@@ -105,80 +93,6 @@
}
.sidebar-stats .s-pass { color: #3fb950; }
.sidebar-stats .s-fail { color: #f85149; }
.sidebar-metrics {
padding: 12px 16px;
background: #0d1117;
border-bottom: 1px solid #21262d;
}
.metric-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 8px;
margin-bottom: 12px;
}
.metric-cell {
min-width: 0;
}
.metric-label {
display: block;
font-size: 9px;
font-weight: 600;
color: #6e7681;
text-transform: uppercase;
letter-spacing: 0.04em;
white-space: nowrap;
}
.metric-value {
display: block;
font-size: 13px;
font-weight: 700;
color: #e6edf3;
margin-top: 2px;
overflow: hidden;
text-overflow: ellipsis;
}
.mini-chart {
display: flex;
flex-direction: column;
gap: 6px;
}
.mini-chart-title {
font-size: 10px;
font-weight: 700;
color: #8b949e;
text-transform: uppercase;
letter-spacing: 0.04em;
}
.mini-bar-row {
display: grid;
grid-template-columns: minmax(60px, 1fr) 70px 28px;
gap: 8px;
align-items: center;
font-size: 10px;
color: #8b949e;
}
.mini-bar-name {
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
}
.mini-bar-track {
height: 6px;
background: #21262d;
border-radius: 999px;
overflow: hidden;
}
.mini-bar-fill {
height: 100%;
background: #58a6ff;
border-radius: 999px;
}
.mini-bar-value {
color: #e6edf3;
font-variant-numeric: tabular-nums;
text-align: right;
}
.sidebar-filter {
padding: 8px 12px;
border-bottom: 1px solid #21262d;
@@ -612,7 +526,6 @@
<div class="header-sep"></div>
<span class="header-run" id="header-run"></span>
<span class="header-date" id="header-date"></span>
<a class="header-report" id="header-report" target="_blank" rel="noopener" style="display: none;">Run Report</a>
<div class="header-stats" id="header-stats"></div>
</div>
@@ -620,7 +533,6 @@
<!-- Left sidebar -->
<div class="sidebar" id="sidebar">
<div class="sidebar-stats" id="sidebar-stats"></div>
<div class="sidebar-metrics" id="sidebar-metrics"></div>
<div class="sidebar-filter">
<input type="text" id="filter-input" placeholder="Search tasks..." autocomplete="off" spellcheck="false" />
</div>
@@ -715,23 +627,7 @@
if (stats.avgScore !== null) {
parts.push(`<span class="stat-score">avg ${stats.avgScore}%</span>`);
}
if (stats.avgDurationMs !== null) {
parts.push(`<span>${fmtDuration(stats.avgDurationMs)} avg</span>`);
}
if (stats.avgToolCalls !== null) {
parts.push(`<span>${fmtCompact(stats.avgToolCalls)} tools/task</span>`);
}
el.innerHTML = parts.join('');
const reportLink = document.getElementById('header-report');
const url = reportUrl(manifest);
if (url) {
reportLink.href = url;
reportLink.style.display = '';
} else {
reportLink.removeAttribute('href');
reportLink.style.display = 'none';
}
}
// ── Sidebar rendering ─────────────────────────────────────────
@@ -743,49 +639,11 @@
statsEl.innerHTML =
'<span>' + stats.total + ' total</span>' +
'<span class="s-pass">' + stats.passed + ' pass</span>' +
'<span class="s-fail">' + stats.failed + ' fail</span>' +
(stats.avgSteps !== null ? '<span>' + fmtCompact(stats.avgSteps) + ' steps/task</span>' : '') +
(stats.avgToolCalls !== null ? '<span>' + fmtCompact(stats.avgToolCalls) + ' tools/task</span>' : '');
renderSidebarMetrics(tasks, stats);
'<span class="s-fail">' + stats.failed + ' fail</span>';
renderTaskList('');
}
function renderSidebarMetrics(tasks, stats) {
const el = document.getElementById('sidebar-metrics');
if (!el) return;
const chartTasks = tasks
.slice()
.sort((a, b) => taskMetrics(b).toolCalls - taskMetrics(a).toolCalls)
.slice(0, 5);
const maxCalls = Math.max(1, ...chartTasks.map((task) => taskMetrics(task).toolCalls));
const bars = chartTasks.map((task) => {
const calls = taskMetrics(task).toolCalls;
const width = Math.max(4, Math.round((calls / maxCalls) * 100));
return (
'<div class="mini-bar-row">' +
'<span class="mini-bar-name" title="' + escAttr(task.queryId || task.id || 'Untitled') + '">' + esc(task.queryId || task.id || 'Untitled') + '</span>' +
'<span class="mini-bar-track"><span class="mini-bar-fill" style="width: ' + width + '%"></span></span>' +
'<span class="mini-bar-value">' + fmtCompact(calls) + '</span>' +
'</div>'
);
}).join('');
el.innerHTML =
'<div class="metric-grid">' +
'<div class="metric-cell"><span class="metric-label">Avg Time</span><span class="metric-value">' + (stats.avgDurationMs !== null ? fmtDuration(stats.avgDurationMs) : '-') + '</span></div>' +
'<div class="metric-cell"><span class="metric-label">Avg Steps</span><span class="metric-value">' + (stats.avgSteps !== null ? fmtCompact(stats.avgSteps) : '-') + '</span></div>' +
'<div class="metric-cell"><span class="metric-label">Avg Tools</span><span class="metric-value">' + (stats.avgToolCalls !== null ? fmtCompact(stats.avgToolCalls) : '-') + '</span></div>' +
'</div>' +
'<div class="mini-chart">' +
'<div class="mini-chart-title">Tool Calls by Task</div>' +
(bars || '<div class="task-meta-line"><span>No tool calls recorded</span></div>') +
'</div>';
}
function renderTaskList(filter) {
const list = document.getElementById('task-list');
list.innerHTML = '';
@@ -810,11 +668,8 @@
}
const metaParts = [];
const metrics = taskMetrics(task);
if (metrics.durationMs) metaParts.push(fmtDuration(metrics.durationMs));
if (metrics.steps) metaParts.push(`${fmtCompact(metrics.steps)} steps`);
if (metrics.toolCalls) metaParts.push(`${fmtCompact(metrics.toolCalls)} tools`);
if (metrics.toolErrors) metaParts.push(`${fmtCompact(metrics.toolErrors)} errors`);
if (task.durationMs) metaParts.push(fmtDuration(task.durationMs));
if (task.screenshotCount) metaParts.push(`${task.screenshotCount} steps`);
item.innerHTML =
'<div class="task-row">' +
@@ -859,7 +714,7 @@
}
function artifactPath(task, artifact) {
const manifestPath = task.paths?.[artifact];
const manifestPath = task.paths && task.paths[artifact];
if (typeof manifestPath === 'string' && manifestPath.length > 0) {
return manifestPath.replace(/^\/+/, '');
}
@@ -870,17 +725,6 @@
return `${basePath}/${artifactPath(task, artifact)}`;
}
function runArtifactUrl(path) {
if (typeof path !== 'string' || path.length === 0) return null;
return `${basePath}/${path.replace(/^\/+/, '')}`;
}
function reportUrl(manifest, task) {
const url = runArtifactUrl(manifest?.reportPath);
if (!url || !task) return url;
return `${url}#${encodeURIComponent(task.queryId || task.id || '')}`;
}
function metadataUrl(task) {
return artifactUrl(task, 'metadata');
}
@@ -1061,38 +905,10 @@
}
// Duration
const metrics = taskMetrics(task);
if (metrics.durationMs) {
if (task.durationMs) {
html += '<div class="db-section">';
html += '<span class="db-label">Duration</span>';
html += `<span class="db-value">${fmtDuration(metrics.durationMs)}</span>`;
html += '</div>';
}
if (metrics.steps) {
html += '<div class="db-section">';
html += '<span class="db-label">Steps</span>';
html += `<span class="db-value">${fmtCompact(metrics.steps)}</span>`;
html += '</div>';
}
html += '<div class="db-section">';
html += '<span class="db-label">Tool Calls</span>';
html += `<span class="db-value">${fmtCompact(metrics.toolCalls)}</span>`;
html += '</div>';
if (metrics.toolErrors) {
html += '<div class="db-section">';
html += '<span class="db-label">Tool Errors</span>';
html += `<span class="db-value">${fmtCompact(metrics.toolErrors)}</span>`;
html += '</div>';
}
const reportLink = reportUrl(manifest, task);
if (reportLink) {
html += '<div class="db-section">';
html += '<span class="db-label">Report</span>';
html += `<span class="db-value"><a href="${escAttr(reportLink)}" target="_blank" rel="noopener">Open task analysis</a></span>`;
html += `<span class="db-value">${fmtDuration(task.durationMs)}</span>`;
html += '</div>';
}
@@ -1418,25 +1234,8 @@
function computeStats(tasks) {
const total = tasks.length;
let passed = 0, failed = 0, totalScore = 0, scoredCount = 0;
let totalDurationMs = 0, durationCount = 0;
let totalSteps = 0, stepsCount = 0;
let totalToolCalls = 0, toolCount = 0;
let totalToolErrors = 0;
tasks.forEach((t) => {
const metrics = taskMetrics(t);
if (metrics.durationMs > 0) {
totalDurationMs += metrics.durationMs;
durationCount++;
}
if (metrics.steps > 0) {
totalSteps += metrics.steps;
stepsCount++;
}
totalToolCalls += metrics.toolCalls;
totalToolErrors += metrics.toolErrors;
toolCount++;
const graders = t.graderResults || {};
const keys = Object.keys(graders);
if (keys.length > 0) {
@@ -1455,34 +1254,7 @@
total: total,
passed: passed,
failed: failed,
avgScore: scoredCount > 0 ? Math.round((totalScore / scoredCount) * 100) : null,
avgDurationMs: durationCount > 0 ? totalDurationMs / durationCount : null,
avgSteps: stepsCount > 0 ? totalSteps / stepsCount : null,
avgToolCalls: toolCount > 0 ? totalToolCalls / toolCount : null,
totalToolCalls: totalToolCalls,
totalToolErrors: totalToolErrors
};
}
function taskMetrics(task) {
const metrics = task.metrics || {};
const screenshots = Number.isFinite(Number(metrics.screenshots))
? Number(metrics.screenshots)
: Number(task.screenshotCount || 0);
return {
durationMs: Number.isFinite(Number(metrics.durationMs))
? Number(metrics.durationMs)
: Number(task.durationMs || 0),
steps: Number.isFinite(Number(metrics.steps))
? Number(metrics.steps)
: screenshots,
screenshots: screenshots,
toolCalls: Number.isFinite(Number(metrics.toolCalls))
? Number(metrics.toolCalls)
: 0,
toolErrors: Number.isFinite(Number(metrics.toolErrors))
? Number(metrics.toolErrors)
: 0
avgScore: scoredCount > 0 ? Math.round((totalScore / scoredCount) * 100) : null
};
}
@@ -1538,13 +1310,6 @@
return `${h}h ${remM}m`;
}
function fmtCompact(value) {
const num = Number(value);
if (!Number.isFinite(num)) return '0';
if (Number.isInteger(num)) return String(num);
return num.toFixed(1);
}
function showFatalError(msgHtml) {
document.getElementById('center-panel').innerHTML =
'<div class="placeholder error">' +

View File

@@ -2,7 +2,6 @@ export interface PythonEvaluatorOptions {
scriptPath: string
input: unknown
timeoutMs: number
pythonPath?: string
}
export interface PythonEvaluatorResult<T> {
@@ -16,9 +15,7 @@ export interface PythonEvaluatorResult<T> {
export async function runPythonJsonEvaluator<T>(
options: PythonEvaluatorOptions,
): Promise<PythonEvaluatorResult<T>> {
const pythonPath =
options.pythonPath || process.env.BROWSEROS_EVAL_PYTHON || 'python3'
const proc = Bun.spawn([pythonPath, options.scriptPath], {
const proc = Bun.spawn(['python3', options.scriptPath], {
stdin: 'pipe',
stdout: 'pipe',
stderr: 'pipe',

View File

@@ -5,7 +5,6 @@ import {
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3'
import { readTaskMetrics } from '../reporting/task-metrics'
import {
buildViewerManifest,
type ViewerManifestTaskInput,
@@ -316,7 +315,6 @@ export class R2Publisher {
graderResults:
(meta.grader_results as ViewerManifestTaskInput['graderResults']) ||
{},
metrics: await readTaskMetrics(taskPath, meta, screenshotCount),
})
}
@@ -381,12 +379,10 @@ export class R2Publisher {
await readFile(join(runDir, 'summary.json'), 'utf-8'),
) as Record<string, unknown>
} catch {}
const reportStat = await stat(join(runDir, 'report.html')).catch(() => null)
return buildViewerManifest({
runId,
uploadedAt: this.now().toISOString(),
reportPath: reportStat?.isFile() ? 'report.html' : undefined,
agentConfig,
dataset,
summary: summaryData

View File

@@ -1,188 +0,0 @@
import { readdir, readFile, stat } from 'node:fs/promises'
import { join } from 'node:path'
export interface EvalTaskMetrics {
durationMs: number
steps: number
screenshots: number
toolCalls: number
toolErrors: number
}
export interface EvalRunMetrics {
taskCount: number
totalDurationMs: number
avgDurationMs: number
totalSteps: number
avgSteps: number
totalToolCalls: number
avgToolCalls: number
totalToolErrors: number
avgToolErrors: number
}
export interface EvalTaskMetricSummary {
queryId: string
status: string
score?: number
pass?: boolean
metrics: EvalTaskMetrics
}
export interface EvalRunMetricSummary {
run: EvalRunMetrics
tasks: EvalTaskMetricSummary[]
}
interface TaskDirEntry {
taskId: string
taskPath: string
}
function numberValue(value: unknown): number {
return typeof value === 'number' && Number.isFinite(value) ? value : 0
}
export function countMessageMetrics(messagesJsonl: string): {
toolCalls: number
toolErrors: number
} {
let toolCalls = 0
let toolErrors = 0
for (const line of messagesJsonl.split('\n')) {
const trimmed = line.trim()
if (!trimmed) continue
try {
const event = JSON.parse(trimmed) as { type?: unknown }
if (event.type === 'tool-input-available') toolCalls++
if (event.type === 'tool-output-error') toolErrors++
} catch {
// Ignore malformed telemetry lines; the raw artifact is still uploaded.
}
}
return { toolCalls, toolErrors }
}
export function buildTaskMetrics(
metadata: Record<string, unknown>,
messageMetrics: { toolCalls: number; toolErrors: number },
screenshotCount = 0,
): EvalTaskMetrics {
const screenshots = numberValue(metadata.screenshot_count) || screenshotCount
return {
durationMs: numberValue(metadata.total_duration_ms),
steps: numberValue(metadata.total_steps) || screenshots,
screenshots,
toolCalls: messageMetrics.toolCalls,
toolErrors: messageMetrics.toolErrors,
}
}
export function buildRunMetrics(metrics: EvalTaskMetrics[]): EvalRunMetrics {
const taskCount = metrics.length
const totalDurationMs = metrics.reduce((sum, metric) => {
return sum + metric.durationMs
}, 0)
const totalSteps = metrics.reduce((sum, metric) => sum + metric.steps, 0)
const totalToolCalls = metrics.reduce((sum, metric) => {
return sum + metric.toolCalls
}, 0)
const totalToolErrors = metrics.reduce((sum, metric) => {
return sum + metric.toolErrors
}, 0)
return {
taskCount,
totalDurationMs,
avgDurationMs: taskCount > 0 ? totalDurationMs / taskCount : 0,
totalSteps,
avgSteps: taskCount > 0 ? totalSteps / taskCount : 0,
totalToolCalls,
avgToolCalls: taskCount > 0 ? totalToolCalls / taskCount : 0,
totalToolErrors,
avgToolErrors: taskCount > 0 ? totalToolErrors / taskCount : 0,
}
}
export async function readTaskMetrics(
taskPath: string,
metadata: Record<string, unknown>,
screenshotCount = 0,
): Promise<EvalTaskMetrics> {
const messages = await readFile(join(taskPath, 'messages.jsonl'), 'utf-8')
.then(countMessageMetrics)
.catch(() => ({ toolCalls: 0, toolErrors: 0 }))
return buildTaskMetrics(metadata, messages, screenshotCount)
}
function statusFromMetadata(metadata: Record<string, unknown>): string {
const termination = metadata.termination_reason
if (termination === 'timeout') return 'timeout'
if (Array.isArray(metadata.errors) && metadata.errors.length > 0) {
return 'failed'
}
return 'completed'
}
function primaryGrade(metadata: Record<string, unknown>): {
score?: number
pass?: boolean
} {
const graders = metadata.grader_results as
| Record<string, { score?: unknown; pass?: unknown }>
| undefined
const first = graders ? Object.values(graders)[0] : undefined
return {
...(typeof first?.score === 'number' ? { score: first.score } : {}),
...(typeof first?.pass === 'boolean' ? { pass: first.pass } : {}),
}
}
async function readTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
const canonicalTasksDir = join(runDir, 'tasks')
const canonicalStat = await stat(canonicalTasksDir).catch(() => null)
const baseDir = canonicalStat?.isDirectory() ? canonicalTasksDir : runDir
const entries = await readdir(baseDir, { withFileTypes: true }).catch(
() => [],
)
return entries
.filter((entry) => entry.isDirectory())
.filter((entry) => entry.name !== 'screenshots')
.filter((entry) => entry.name !== 'tasks')
.map((entry) => ({
taskId: entry.name,
taskPath: join(baseDir, entry.name),
}))
}
export async function readRunMetricSummary(
runDir: string,
): Promise<EvalRunMetricSummary> {
const tasks: EvalTaskMetricSummary[] = []
for (const entry of await readTaskDirs(runDir)) {
const metadata = await readFile(
join(entry.taskPath, 'metadata.json'),
'utf-8',
)
.then((text) => JSON.parse(text) as Record<string, unknown>)
.catch(() => null)
if (!metadata) continue
const metrics = await readTaskMetrics(entry.taskPath, metadata)
tasks.push({
queryId: (metadata.query_id as string | undefined) || entry.taskId,
status: statusFromMetadata(metadata),
...primaryGrade(metadata),
metrics,
})
}
return {
run: buildRunMetrics(tasks.map((task) => task.metrics)),
tasks,
}
}

View File

@@ -33,13 +33,6 @@ function variantSource(config: EvalConfig): {
baseUrl?: string
supportsImages?: boolean
} {
if (config.agent.type === 'claude-code') {
return {
provider: 'claude-code',
model: config.agent.model ?? 'default',
}
}
const agent =
config.agent.type === 'single' ? config.agent : config.agent.orchestrator
if (!agent.model) {
@@ -83,7 +76,10 @@ export async function adaptEvalConfigFile(
suite: {
id,
dataset: evalConfig.dataset,
agent: suiteAgent(evalConfig, backend),
agent:
evalConfig.agent.type === 'single'
? { type: 'tool-loop' }
: { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' },
graders: evalConfig.graders ?? [],
workers: evalConfig.num_workers,
restartBrowserPerTask: evalConfig.restart_server_per_task,
@@ -103,17 +99,3 @@ export async function adaptEvalConfigFile(
}),
}
}
function suiteAgent(
config: EvalConfig,
backend: ReturnType<typeof executorBackend>,
): EvalSuite['agent'] {
switch (config.agent.type) {
case 'single':
return { type: 'tool-loop' }
case 'orchestrator-executor':
return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
case 'claude-code':
return { type: 'claude-code' }
}
}

View File

@@ -57,30 +57,10 @@ export function resolveVariant(
options: ResolveVariantOptions = {},
): EvalVariant {
const env = options.env ?? process.env
const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
const provider =
options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible'
const model = options.model ?? env.EVAL_AGENT_MODEL
if (provider === 'claude-code') {
const id = options.variantId ?? env.EVAL_VARIANT ?? 'claude-code'
return {
id,
agent: {
provider,
model: model ?? '',
},
publicMetadata: {
id,
agent: {
provider,
model: model || 'default',
apiKeyConfigured: false,
},
},
}
}
const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
const apiKey = options.apiKey ?? env.EVAL_AGENT_API_KEY
const apiKeyEnv =
options.apiKeyEnv ?? (options.apiKey ? undefined : 'EVAL_AGENT_API_KEY')

View File

@@ -8,7 +8,6 @@ export const SuiteAgentSchema = z
'single',
'orchestrated',
'orchestrator-executor',
'claude-code',
]),
executorBackend: z.enum(['tool-loop', 'clado']).optional(),
})

View File

@@ -19,19 +19,9 @@ export const OrchestratorExecutorConfigSchema = z.object({
}),
})
export const ClaudeCodeAgentConfigSchema = z
.object({
type: z.literal('claude-code'),
model: z.string().min(1).optional(),
claudePath: z.string().min(1).default('claude'),
extraArgs: z.array(z.string()).default([]),
})
.strict()
export const AgentConfigSchema = z.discriminatedUnion('type', [
SingleAgentConfigSchema,
OrchestratorExecutorConfigSchema,
ClaudeCodeAgentConfigSchema,
])
export const EvalConfigSchema = z.object({
@@ -63,6 +53,5 @@ export type SingleAgentConfig = z.infer<typeof SingleAgentConfigSchema>
export type OrchestratorExecutorConfig = z.infer<
typeof OrchestratorExecutorConfigSchema
>
export type ClaudeCodeAgentConfig = z.infer<typeof ClaudeCodeAgentConfigSchema>
export type AgentConfig = z.infer<typeof AgentConfigSchema>
export type EvalConfig = z.infer<typeof EvalConfigSchema>

View File

@@ -2,8 +2,6 @@
export {
type AgentConfig,
AgentConfigSchema,
type ClaudeCodeAgentConfig,
ClaudeCodeAgentConfigSchema,
type EvalConfig,
EvalConfigSchema,
type OrchestratorExecutorConfig,

View File

@@ -13,7 +13,7 @@ export const GraderResultSchema = z.object({
// Agent config in metadata
const AgentConfigMetaSchema = z
.object({
type: z.enum(['single', 'orchestrator-executor', 'claude-code']),
type: z.enum(['single', 'orchestrator-executor']),
model: z.string().optional(),
})
.passthrough()

View File

@@ -59,7 +59,7 @@ export async function validateConfig(
) {
envVarsToCheck.push(config.agent.apiKey)
}
} else if (config.agent.type === 'orchestrator-executor') {
} else {
const { orchestrator, executor } = config.agent
if (orchestrator.apiKey && isEnvVarName(orchestrator.apiKey)) {
envVarsToCheck.push(orchestrator.apiKey)

View File

@@ -36,6 +36,5 @@ export async function resolveProviderConfig(
accessKeyId: resolveEnvValue(agent.accessKeyId),
secretAccessKey: resolveEnvValue(agent.secretAccessKey),
sessionToken: resolveEnvValue(agent.sessionToken),
region: resolveEnvValue(agent.region),
}
}

View File

@@ -1,8 +1,3 @@
import {
buildRunMetrics,
type EvalRunMetrics,
type EvalTaskMetrics,
} from '../reporting/task-metrics'
import type { GraderResult } from '../types'
export const VIEWER_MANIFEST_SCHEMA_VERSION = 2
@@ -25,7 +20,6 @@ export interface ViewerManifestTaskInput {
status: string
durationMs: number
screenshotCount: number
metrics?: EvalTaskMetrics
graderResults: Record<string, GraderResult>
}
@@ -41,11 +35,9 @@ export interface ViewerManifest {
suiteId?: string
variantId?: string
uploadedAt?: string
reportPath?: string
agentConfig?: Record<string, unknown>
dataset?: string
summary?: Record<string, unknown>
metrics?: EvalRunMetrics
tasks: ViewerManifestTask[]
}
@@ -54,7 +46,6 @@ export interface BuildViewerManifestInput {
suiteId?: string
variantId?: string
uploadedAt?: string
reportPath?: string
agentConfig?: Record<string, unknown>
dataset?: string
summary?: Record<string, unknown>
@@ -77,37 +68,22 @@ function taskPaths(queryId: string): ViewerManifestTaskPaths {
export function buildViewerManifest(
input: BuildViewerManifestInput,
): ViewerManifest {
const tasks = input.tasks.map((task) => {
const { artifactId, ...publicTask } = task
const metrics =
publicTask.metrics ??
({
durationMs: publicTask.durationMs,
steps: publicTask.screenshotCount,
screenshots: publicTask.screenshotCount,
toolCalls: 0,
toolErrors: 0,
} satisfies EvalTaskMetrics)
return {
...publicTask,
metrics,
startUrl: publicTask.startUrl ?? '',
paths: taskPaths(artifactId ?? publicTask.queryId),
}
})
return {
schemaVersion: VIEWER_MANIFEST_SCHEMA_VERSION,
runId: input.runId,
...(input.suiteId ? { suiteId: input.suiteId } : {}),
...(input.variantId ? { variantId: input.variantId } : {}),
...(input.uploadedAt ? { uploadedAt: input.uploadedAt } : {}),
...(input.reportPath ? { reportPath: input.reportPath } : {}),
...(input.agentConfig ? { agentConfig: input.agentConfig } : {}),
...(input.dataset ? { dataset: input.dataset } : {}),
...(input.summary ? { summary: input.summary } : {}),
metrics: buildRunMetrics(tasks.map((task) => task.metrics)),
tasks,
tasks: input.tasks.map((task) => {
const { artifactId, ...publicTask } = task
return {
...publicTask,
startUrl: publicTask.startUrl ?? '',
paths: taskPaths(artifactId ?? publicTask.queryId),
}
}),
}
}

View File

@@ -1,268 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, readFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createAgent } from '../../src/agents'
import { ClaudeCodeEvaluator } from '../../src/agents/claude-code'
import { CaptureContext } from '../../src/capture/context'
import {
AgentConfigSchema,
type EvalConfig,
EvalConfigSchema,
type Task,
TaskMetadataSchema,
} from '../../src/types'
function config(): EvalConfig {
return {
agent: {
type: 'claude-code',
model: 'opus',
claudePath: 'claude',
extraArgs: [],
},
dataset: 'data/test.jsonl',
num_workers: 1,
restart_server_per_task: false,
browseros: {
server_url: 'http://127.0.0.1:9110',
base_cdp_port: 9010,
base_server_port: 9110,
base_extension_port: 9310,
load_extensions: false,
headless: false,
},
graders: [],
}
}
const task: Task = {
query_id: 'task-1',
dataset: 'test',
query: 'Find the title',
graders: [],
metadata: {
original_task_id: 'task-1',
},
}
describe('ClaudeCodeEvaluator', () => {
it('accepts claude-code config defaults without permission mode', () => {
const agent = AgentConfigSchema.parse({ type: 'claude-code' })
expect(agent).toEqual({
type: 'claude-code',
claudePath: 'claude',
extraArgs: [],
})
})
it('accepts claude-code as a runnable eval agent', () => {
const parsed = EvalConfigSchema.parse({
agent: {
type: 'claude-code',
model: 'opus',
},
dataset: 'data/test-set.jsonl',
browseros: {
server_url: 'http://127.0.0.1:9110',
},
})
expect(parsed.agent.type).toBe('claude-code')
expect(parsed.agent.model).toBe('opus')
})
it('rejects unsupported claude-code settings instead of silently ignoring them', () => {
expect(
AgentConfigSchema.safeParse({
type: 'claude-code',
permissionMode: 'bypassPermissions',
}).success,
).toBe(false)
expect(
AgentConfigSchema.safeParse({
type: 'claude-code',
maxTurns: 3,
}).success,
).toBe(false)
})
it('allows claude-code in task metadata', () => {
const metadata = TaskMetadataSchema.parse({
query_id: 'task-1',
dataset: 'test',
query: 'Do the thing',
started_at: new Date().toISOString(),
completed_at: new Date().toISOString(),
total_duration_ms: 100,
total_steps: 1,
termination_reason: 'completed',
final_answer: 'done',
errors: [],
warnings: [],
agent_config: {
type: 'claude-code',
model: 'opus',
},
grader_results: {},
})
expect(metadata.agent_config.type).toBe('claude-code')
})
it('is created by the agent factory', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const agent = createAgent({
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
})
expect(agent).toBeInstanceOf(ClaudeCodeEvaluator)
})
it('runs claude code, logs messages, writes MCP config, and saves metadata', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const calls: Array<{ executable: string; args: string[]; cwd: string }> = []
const evaluator = new ClaudeCodeEvaluator(
{
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
},
{
processRunner: {
async run(options) {
calls.push(options)
await options.onStdoutLine(
JSON.stringify({
type: 'assistant',
message: {
content: [{ type: 'text', text: 'The title is Example' }],
},
}),
)
await options.onStdoutLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'The title is Example',
}),
)
return { exitCode: 0, stderr: '' }
},
},
},
)
const result = await evaluator.execute()
expect(result.finalAnswer).toBe('The title is Example')
expect(result.metadata.agent_config).toMatchObject({
type: 'claude-code',
model: 'opus',
})
expect(result.messages.some((msg) => msg.type === 'user')).toBe(true)
expect(result.messages.some((msg) => msg.type === 'text-delta')).toBe(true)
const mcpConfig = JSON.parse(
await readFile(join(taskOutputDir, 'claude-code-mcp.json'), 'utf-8'),
)
expect(mcpConfig.mcpServers.browseros).toMatchObject({
type: 'http',
url: 'http://127.0.0.1:9110/mcp',
headers: {
'X-BrowserOS-Source': 'sdk-internal',
},
})
expect(calls).toEqual([
expect.objectContaining({
executable: 'claude',
cwd: taskOutputDir,
args: [
'-p',
expect.stringContaining('Task: Find the title'),
'--mcp-config',
join(taskOutputDir, 'claude-code-mcp.json'),
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
'--model',
'opus',
],
}),
])
expect(calls[0].args).not.toContain('--permission-mode')
})
it('records non-fatal stream processing errors as warnings', async () => {
const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
const { capture, taskOutputDir } = await CaptureContext.create({
serverUrl: 'http://127.0.0.1:9110',
outputDir,
taskId: task.query_id,
initialPageId: 1,
})
const evaluator = new ClaudeCodeEvaluator(
{
config: config(),
task,
workerIndex: 0,
initialPageId: 1,
outputDir,
taskOutputDir,
capture,
},
{
processRunner: {
async run(options) {
await options.onStdoutLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'done',
}),
)
return {
exitCode: 0,
stderr: '',
streamErrors: ['bad stream line'],
}
},
},
},
)
const result = await evaluator.execute()
expect(result.finalAnswer).toBe('done')
expect(result.metadata.warnings).toEqual([
expect.objectContaining({
source: 'message_logging',
message: 'Claude Code stream event processing failed: bad stream line',
}),
])
})
})

View File

@@ -1,78 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createClaudeCodeProcessRunner } from '../../src/agents/claude-code/process-runner'
async function writeStdoutScript(): Promise<string> {
const dir = await mkdtemp(join(tmpdir(), 'claude-code-runner-'))
const script = join(dir, 'stdout-lines')
await writeFile(script, '#!/bin/sh\nprintf "first\\nbad\\nlast\\n"\n')
await chmod(script, 0o755)
return script
}
describe('createClaudeCodeProcessRunner', () => {
it('passes executable and args to the spawn dependency', async () => {
const calls: unknown[] = []
const runner = createClaudeCodeProcessRunner({
spawn: async (cmd, options) => {
calls.push({ cmd, options })
await options.onStdoutLine('{"type":"result","result":"done"}')
return { exitCode: 0, stderr: '' }
},
})
const result = await runner.run({
executable: 'claude',
args: ['-p', 'hello'],
cwd: '/tmp',
signal: new AbortController().signal,
onStdoutLine: async () => {},
})
expect(result.exitCode).toBe(0)
expect(calls).toEqual([
{
cmd: ['claude', '-p', 'hello'],
options: expect.objectContaining({ cwd: '/tmp' }),
},
])
})
it('returns stderr and non-zero exit codes', async () => {
const runner = createClaudeCodeProcessRunner({
spawn: async () => ({ exitCode: 2, stderr: 'bad auth' }),
})
const result = await runner.run({
executable: 'claude',
args: [],
cwd: '/tmp',
signal: new AbortController().signal,
onStdoutLine: async () => {},
})
expect(result).toEqual({ exitCode: 2, stderr: 'bad auth' })
})
it('continues reading stdout after a line handler error', async () => {
const script = await writeStdoutScript()
const lines: string[] = []
const runner = createClaudeCodeProcessRunner()
const result = await runner.run({
executable: script,
args: [],
cwd: '/tmp',
onStdoutLine: async (line) => {
lines.push(line)
if (line === 'bad') throw new Error('bad line')
},
})
expect(result.exitCode).toBe(0)
expect(result.streamErrors).toEqual(['bad line'])
expect(lines).toEqual(['first', 'bad', 'last'])
})
})

View File

@@ -1,102 +0,0 @@
import { describe, expect, it } from 'bun:test'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from '../../src/agents/claude-code/stream-parser'
describe('ClaudeCodeStreamParser', () => {
it('maps assistant text and MCP tool use into eval stream events', () => {
const parser = new ClaudeCodeStreamParser()
const events = parser.pushLine(
JSON.stringify({
type: 'assistant',
message: {
content: [
{ type: 'text', text: 'I will navigate.' },
{
type: 'tool_use',
id: 'toolu_1',
name: 'mcp__browseros__navigate_page',
input: { page: 2, url: 'https://example.com' },
},
],
},
}),
)
expect(events).toEqual([
{ type: 'text-start', id: expect.any(String) },
{
type: 'text-delta',
id: expect.any(String),
delta: 'I will navigate.',
},
{ type: 'text-end', id: expect.any(String) },
{
type: 'tool-input-available',
toolCallId: 'toolu_1',
toolName: 'mcp__browseros__navigate_page',
input: { page: 2, url: 'https://example.com' },
},
])
expect(parser.getLastText()).toBe('I will navigate.')
expect(parser.getToolCallCount()).toBe(1)
})
it('maps Claude Code tool results into eval output events', () => {
const parser = new ClaudeCodeStreamParser()
const events = parser.pushLine(
JSON.stringify({
type: 'user',
message: {
content: [
{
type: 'tool_result',
tool_use_id: 'toolu_1',
content: 'Navigated successfully',
},
],
},
}),
)
expect(events).toEqual([
{
type: 'tool-output-available',
toolCallId: 'toolu_1',
output: 'Navigated successfully',
},
])
})
it('uses result messages as the authoritative final text', () => {
const parser = new ClaudeCodeStreamParser()
parser.pushLine(
JSON.stringify({
type: 'assistant',
message: {
content: [{ type: 'text', text: 'I will complete the task.' }],
},
}),
)
parser.pushLine(
JSON.stringify({
type: 'result',
subtype: 'success',
result: 'Final answer',
}),
)
expect(parser.getLastText()).toBe('Final answer')
})
it('identifies BrowserOS MCP tools that should trigger screenshots', () => {
expect(
shouldCaptureScreenshotForTool('mcp__browseros__navigate_page'),
).toBe(true)
expect(
shouldCaptureScreenshotForTool('mcp__browseros__take_screenshot'),
).toBe(false)
expect(shouldCaptureScreenshotForTool('Read')).toBe(false)
})
})

View File

@@ -7,11 +7,8 @@ import {
runSuiteCommand,
} from '../../src/cli/commands/suite'
import type { RunEvalOptions } from '../../src/runner/types'
import type { EvalSuite } from '../../src/suites/schema'
async function writeTempSuite(
overrides: Partial<EvalSuite> = {},
): Promise<{ dir: string; suitePath: string }> {
async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
const dir = await mkdtemp(join(tmpdir(), 'eval-suite-cli-'))
const suitePath = join(dir, 'agisdk-daily-10.json')
await writeFile(
@@ -26,9 +23,8 @@ async function writeTempSuite(
restartBrowserPerTask: true,
browseros: {
server_url: 'http://127.0.0.1:9110',
headless: false,
headless: true,
},
...overrides,
},
null,
2,
@@ -47,7 +43,9 @@ describe('suite command', () => {
expect(resolved.kind).toBe('config')
expect(resolved.suite.id).toBe('browseros-agent-weekly')
expect(resolved.evalConfig.dataset).toBe('../../data/agisdk-real.jsonl')
expect(resolved.evalConfig.dataset).toBe(
'../../data/webbench-2of4-50.jsonl',
)
expect(resolved.variant.publicMetadata.agent.apiKeyConfigured).toBe(true)
})
@@ -77,25 +75,6 @@ describe('suite command', () => {
expect(resolved.evalConfig.num_workers).toBe(2)
})
it('resolves claude-code suites without provider API credentials', async () => {
const { dir, suitePath } = await writeTempSuite({
agent: { type: 'claude-code' },
})
const resolved = await resolveSuiteCommand({
suitePath,
model: 'opus',
env: {},
})
expect(resolved.kind).toBe('suite')
expect(resolved.evalConfig.agent).toMatchObject({
type: 'claude-code',
model: 'opus',
})
expect(resolved.datasetPath).toBe(join(dir, 'tasks.jsonl'))
})
it('runs config and suite commands through the runner dependency', async () => {
const calls: RunEvalOptions[] = []
await runSuiteCommand(

View File

@@ -1,12 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { shouldAutoOpenDashboard } from '../../src/dashboard/server'
describe('dashboard server', () => {
it('does not auto-open the dashboard in CI', () => {
expect(shouldAutoOpenDashboard({ CI: 'true' })).toBe(false)
})
it('auto-opens the dashboard outside CI by default', () => {
expect(shouldAutoOpenDashboard({})).toBe(true)
})
})

View File

@@ -1,5 +1,5 @@
import { describe, expect, it } from 'bun:test'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { runPythonJsonEvaluator } from '../../src/grading/python-evaluator'
@@ -11,17 +11,6 @@ async function writeScript(source: string): Promise<string> {
return script
}
async function writePythonWrapper(): Promise<string> {
const dir = await mkdtemp(join(tmpdir(), 'eval-python-wrapper-'))
const wrapper = join(dir, 'python-wrapper')
await writeFile(
wrapper,
'#!/bin/sh\necho custom-python >&2\nexec python3 "$@"\n',
)
await chmod(wrapper, 0o755)
return wrapper
}
describe('runPythonJsonEvaluator', () => {
it('sends JSON on stdin, captures stderr, and parses stdout JSON', async () => {
const script = await writeScript(`
@@ -60,34 +49,6 @@ sys.exit(3)
).rejects.toThrow('bad verifier')
})
it('uses BROWSEROS_EVAL_PYTHON when provided', async () => {
const script = await writeScript(`
import json, sys
data = json.loads(sys.stdin.read())
print(json.dumps({"ok": data["ok"]}))
`)
const wrapper = await writePythonWrapper()
const previousPythonPath = process.env.BROWSEROS_EVAL_PYTHON
process.env.BROWSEROS_EVAL_PYTHON = wrapper
try {
const result = await runPythonJsonEvaluator<{ ok: boolean }>({
scriptPath: script,
input: { ok: true },
timeoutMs: 5_000,
})
expect(result.output).toEqual({ ok: true })
expect(result.stderr).toContain('custom-python')
} finally {
if (previousPythonPath === undefined) {
delete process.env.BROWSEROS_EVAL_PYTHON
} else {
process.env.BROWSEROS_EVAL_PYTHON = previousPythonPath
}
}
})
it('enforces timeouts', async () => {
const script = await writeScript(`
import time

View File

@@ -40,7 +40,6 @@ async function writeRunFixture(
start_url: 'https://example.test',
termination_reason: 'completed',
total_duration_ms: 1200,
total_steps: 4,
screenshot_count: 1,
agent_config: { type: 'single', model: 'kimi' },
grader_results: {
@@ -48,22 +47,13 @@ async function writeRunFixture(
},
}),
)
await writeFile(
join(taskDir, 'messages.jsonl'),
[
'{"type":"user"}',
'{"type":"tool-input-available","toolName":"click"}',
'{"type":"tool-input-available","toolName":"take_snapshot"}',
'{"type":"tool-output-error","toolName":"click"}',
].join('\n'),
)
await writeFile(join(taskDir, 'messages.jsonl'), '{"type":"user"}\n')
await writeFile(join(taskDir, 'grades.json'), '{"ok":true}')
await writeFile(join(taskDir, 'screenshots', '1.png'), 'png')
await writeFile(
join(runDir, 'summary.json'),
JSON.stringify({ passRate: 1, avgDurationMs: 1200 }),
)
await writeFile(join(runDir, 'report.html'), '<html>report</html>')
return { runDir, runId: `${configName}-${timestamp}` }
}
@@ -120,9 +110,6 @@ describe('R2Publisher', () => {
expect(byKey.get(`runs/${runId}/summary.json`)?.ContentType).toBe(
'application/json',
)
expect(byKey.get(`runs/${runId}/report.html`)?.ContentType).toBe(
'text/html',
)
expect(byKey.get('viewer.html')?.ContentType).toBe('text/html')
expect(result.viewerUrl).toBe(
`https://eval.example.test/viewer.html?run=${runId}`,
@@ -139,28 +126,12 @@ describe('R2Publisher', () => {
uploadedAt: '2026-04-29T12:00:00.000Z',
agentConfig: { type: 'single', model: 'kimi' },
dataset: 'webbench',
reportPath: 'report.html',
summary: { passRate: 1, avgDurationMs: 1200 },
metrics: {
taskCount: 1,
avgDurationMs: 1200,
avgSteps: 4,
avgToolCalls: 2,
totalToolCalls: 2,
totalToolErrors: 1,
},
tasks: [
{
queryId: 'task-1',
status: 'completed',
screenshotCount: 1,
metrics: {
durationMs: 1200,
steps: 4,
screenshots: 1,
toolCalls: 2,
toolErrors: 1,
},
paths: {
attempt: 'tasks/task-1/attempt.json',
metadata: 'tasks/task-1/metadata.json',

View File

@@ -6,7 +6,6 @@ interface ViewerPathResolvers {
artifactUrl(task: Record<string, unknown>, artifact: string): string
metadataUrl(task: Record<string, unknown>): string
messagesUrl(task: Record<string, unknown>): string
reportUrl(manifest: Record<string, unknown>): string | null
screenshotUrl(task: Record<string, unknown>, step: number): string
}
@@ -25,7 +24,7 @@ async function loadViewerPathResolvers(): Promise<ViewerPathResolvers> {
`
const basePath = 'runs/run-1';
${block}
return { artifactUrl, metadataUrl, messagesUrl, reportUrl, screenshotUrl };
return { artifactUrl, metadataUrl, messagesUrl, screenshotUrl };
`,
) as () => ViewerPathResolvers
return createResolvers()
@@ -61,35 +60,6 @@ async function runAutoSelectFromHash(hash: string): Promise<unknown> {
return runAutoSelect()
}
async function runComputeStats(): Promise<unknown> {
const html = await readFile(
join(import.meta.dir, '..', '..', 'src', 'dashboard', 'viewer.html'),
'utf-8',
)
const start = html.indexOf('function computeStats(tasks)')
const end = html.indexOf('function resolveStatus(task)', start)
expect(start).toBeGreaterThan(-1)
expect(end).toBeGreaterThan(start)
const block = html.slice(start, end)
const compute = new Function(
`
${block}
return computeStats([
{
graderResults: { agisdk_state_diff: { pass: true, score: 1 } },
metrics: { durationMs: 1000, steps: 4, toolCalls: 3, toolErrors: 0 }
},
{
graderResults: { agisdk_state_diff: { pass: false, score: 0 } },
metrics: { durationMs: 3000, steps: 8, toolCalls: 5, toolErrors: 2 }
}
]);
`,
) as () => unknown
return compute()
}
describe('R2 viewer artifact path compatibility', () => {
it('uses explicit manifest paths for new uploaded runs', async () => {
const resolvers = await loadViewerPathResolvers()
@@ -125,15 +95,6 @@ describe('R2 viewer artifact path compatibility', () => {
)
})
it('resolves manifest-level run report links', async () => {
const resolvers = await loadViewerPathResolvers()
expect(resolvers.reportUrl({ reportPath: 'report.html' })).toBe(
'runs/run-1/report.html',
)
expect(resolvers.reportUrl({})).toBe(null)
})
it('falls back to legacy inferred paths for old uploaded runs', async () => {
const resolvers = await loadViewerPathResolvers()
const task = { queryId: 'legacy-task' }
@@ -166,17 +127,4 @@ describe('R2 viewer artifact path compatibility', () => {
queryId: 'legacy-task',
})
})
it('computes run-level timing and tool metrics for the viewer', async () => {
expect(await runComputeStats()).toMatchObject({
total: 2,
passed: 1,
failed: 1,
avgDurationMs: 2000,
avgSteps: 6,
avgToolCalls: 4,
totalToolCalls: 8,
totalToolErrors: 2,
})
})
})

View File

@@ -1,159 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import {
DEFAULT_REPORT_MAX_TURNS,
DEFAULT_REPORT_MODEL,
generateEvalReport,
runClaudeCodeReportAgent,
} from '../../scripts/generate-report'
async function writeRunFixture(): Promise<string> {
const runDir = await mkdtemp(join(tmpdir(), 'eval-report-script-'))
const taskDir = join(runDir, 'agisdk-networkin-10')
await mkdir(join(taskDir, 'screenshots'), { recursive: true })
await writeFile(
join(runDir, 'summary.json'),
JSON.stringify({
total: 1,
completed: 1,
passRate: 0,
avgDurationMs: 1234,
}),
)
await writeFile(
join(taskDir, 'metadata.json'),
JSON.stringify({
query_id: 'agisdk-networkin-10',
dataset: 'agisdk-real',
query: 'Send a follow-up message starting with "Following up on".',
termination_reason: 'completed',
total_duration_ms: 1234,
total_steps: 2,
screenshot_count: 1,
final_answer: 'No app action was taken.',
errors: [],
warnings: [],
agent_config: { type: 'single', model: 'kimi' },
grader_results: {
agisdk_state_diff: {
score: 0,
pass: false,
reasoning: 'Some criteria failed',
details: {
per_criterion: [
{ passed: true, detail: 'message starts correctly' },
{ passed: false, detail: 'message was not sent' },
],
},
},
},
}),
)
await writeFile(
join(taskDir, 'messages.jsonl'),
[
JSON.stringify({
type: 'tool-input-available',
timestamp: '2026-04-30T00:00:00.000Z',
toolCallId: 'call-1',
toolName: 'memory_search',
input: { q: 'chat' },
}),
JSON.stringify({
type: 'tool-output-error',
timestamp: '2026-04-30T00:00:01.000Z',
toolCallId: 'call-1',
errorText: 'memory unavailable',
}),
].join('\n'),
)
await writeFile(join(taskDir, 'screenshots', '1.png'), 'png')
return runDir
}
describe('generate-report script', () => {
it('delegates report.html creation to Claude Code', async () => {
const runDir = await writeRunFixture()
const outputPath = join(runDir, 'report.html')
let prompt = ''
await generateEvalReport({
inputDir: runDir,
outputPath,
runAgent: async (invocation) => {
prompt = invocation.prompt
await writeFile(
invocation.outputPath,
'<!doctype html><h1>Claude-written report</h1>',
)
},
})
expect(await readFile(outputPath, 'utf-8')).toContain(
'Claude-written report',
)
expect(prompt).toContain('AGI SDK Random-10 Failure Report')
expect(prompt).toContain('summary.json')
expect(prompt).toContain('messages.jsonl')
expect(prompt).toContain('screenshots')
expect(prompt).toContain('Deterministic run metrics')
expect(prompt).toContain('"queryId": "agisdk-networkin-10"')
expect(prompt).toContain('"toolCalls": 1')
expect(prompt).toContain('"toolErrors": 1')
expect(prompt).toContain('Duration by task')
expect(prompt).toContain('Tool calls by task')
expect(prompt).toContain(outputPath)
})
it('fails when the Claude Code agent does not write the report', async () => {
const runDir = await writeRunFixture()
await expect(
generateEvalReport({
inputDir: runDir,
outputPath: join(runDir, 'missing-report.html'),
runAgent: async () => {},
}),
).rejects.toThrow('Report was not written')
})
it('runs Claude Code with Opus 4.6, full bypass, and bounded turns', async () => {
const runDir = await writeRunFixture()
const calls: unknown[] = []
await runClaudeCodeReportAgent(
{
inputDir: runDir,
outputPath: join(runDir, 'report.html'),
prompt: 'write the report',
},
{
query: async function* (call: unknown) {
calls.push(call)
yield { type: 'result', subtype: 'success', result: 'done' }
},
env: {
CLAUDE_CODE_OAUTH_TOKEN: 'token',
EVAL_R2_SECRET_ACCESS_KEY: 'secret',
HOME: '/tmp/home',
PATH: '/bin',
},
},
)
expect(calls).toHaveLength(1)
expect(calls[0]).toMatchObject({
prompt: 'write the report',
options: {
cwd: runDir,
model: DEFAULT_REPORT_MODEL,
maxTurns: DEFAULT_REPORT_MAX_TURNS,
permissionMode: 'bypassPermissions',
allowDangerouslySkipPermissions: true,
},
})
expect(JSON.stringify(calls[0])).not.toContain('secret')
})
})

View File

@@ -1,22 +1,19 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { adaptEvalConfigFile } from '../../src/suites/config-adapter'
describe('adaptEvalConfigFile', () => {
it('preserves browseros-agent-weekly AGI SDK config semantics', async () => {
it('preserves browseros-agent-weekly config semantics', async () => {
const adapted = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-weekly.json',
)
expect(adapted.suite.id).toBe('browseros-agent-weekly')
expect(adapted.suite.dataset).toBe('../../data/agisdk-real.jsonl')
expect(adapted.suite.graders).toEqual(['agisdk_state_diff'])
expect(adapted.suite.workers).toBe(3)
expect(adapted.suite.dataset).toBe('../../data/webbench-2of4-50.jsonl')
expect(adapted.suite.graders).toEqual(['performance_grader'])
expect(adapted.suite.workers).toBe(10)
expect(adapted.suite.restartBrowserPerTask).toBe(true)
expect(adapted.suite.timeoutMs).toBe(1_800_000)
expect(adapted.evalConfig.num_workers).toBe(3)
expect(adapted.evalConfig.num_workers).toBe(10)
expect(adapted.evalConfig.browseros.server_url).toBe(
'http://127.0.0.1:9110',
)
@@ -37,61 +34,4 @@ describe('adaptEvalConfigFile', () => {
'secret-openrouter-value',
)
})
it('adapts BrowserOS AGI SDK comparison configs', async () => {
const kimi = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-kimi-k2-5-agisdk-real.json',
)
const opus = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-opus-4-6-agisdk-real.json',
)
expect(kimi.suite.id).toBe('browseros-agent-kimi-k2-5-agisdk-real')
expect(kimi.evalConfig.agent).toMatchObject({
type: 'single',
provider: 'openai-compatible',
model: 'moonshotai/kimi-k2.5',
})
expect(kimi.evalConfig.num_workers).toBe(3)
expect(opus.suite.id).toBe('browseros-agent-opus-4-6-agisdk-real')
expect(opus.evalConfig.agent).toMatchObject({
type: 'single',
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: 'AWS_REGION',
accessKeyId: 'AWS_ACCESS_KEY_ID',
secretAccessKey: 'AWS_SECRET_ACCESS_KEY',
})
expect(opus.evalConfig.num_workers).toBe(2)
})
it('adapts claude-code configs without provider credentials', async () => {
const dir = await mkdtemp(join(tmpdir(), 'claude-code-config-'))
const configPath = join(dir, 'claude-code-agisdk.json')
await writeFile(
configPath,
JSON.stringify({
agent: {
type: 'claude-code',
model: 'opus',
},
dataset: 'tasks.jsonl',
num_workers: 1,
restart_server_per_task: false,
browseros: {
server_url: 'http://127.0.0.1:9110',
headless: false,
},
}),
)
const adapted = await adaptEvalConfigFile(configPath, { env: {} })
expect(adapted.suite.agent).toEqual({ type: 'claude-code' })
expect(adapted.variant.agent).toMatchObject({
provider: 'claude-code',
model: 'opus',
})
})
})

View File

@@ -35,16 +35,6 @@ describe('EvalSuiteSchema', () => {
expect(parsed.success).toBe(false)
})
it('validates claude-code suites', () => {
const suite = EvalSuiteSchema.parse({
id: 'claude-code-agisdk',
dataset: 'data/agisdk-real.jsonl',
agent: { type: 'claude-code' },
})
expect(suite.agent.type).toBe('claude-code')
})
it('validates the daily AGISDK 10-task suite', async () => {
const loaded = await loadSuite(
'apps/eval/configs/suites/agisdk-daily-10.json',
@@ -99,40 +89,4 @@ describe('resolveVariant', () => {
}),
).toThrow('EVAL_AGENT_API_KEY')
})
it('resolves claude-code variants without model or API key requirements', () => {
const variant = resolveVariant({
variantId: 'claude-opus',
provider: 'claude-code',
model: 'opus',
env: {},
})
expect(variant.id).toBe('claude-opus')
expect(variant.agent).toEqual({
provider: 'claude-code',
model: 'opus',
})
expect(variant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'opus',
apiKeyConfigured: false,
})
const defaultVariant = resolveVariant({
provider: 'claude-code',
env: {},
})
expect(defaultVariant.id).toBe('claude-code')
expect(defaultVariant.agent).toEqual({
provider: 'claude-code',
model: '',
})
expect(defaultVariant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'default',
apiKeyConfigured: false,
})
})
})

View File

@@ -1,38 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { resolveProviderConfig } from '../../src/utils/resolve-provider-config'
describe('resolveProviderConfig', () => {
it('resolves Bedrock region from environment variables', async () => {
const previous = {
AWS_REGION: process.env.AWS_REGION,
AWS_ACCESS_KEY_ID: process.env.AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY: process.env.AWS_SECRET_ACCESS_KEY,
}
process.env.AWS_REGION = 'us-west-2'
process.env.AWS_ACCESS_KEY_ID = 'test-access-key'
process.env.AWS_SECRET_ACCESS_KEY = 'test-secret-key'
try {
const resolved = await resolveProviderConfig({
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: 'AWS_REGION',
accessKeyId: 'AWS_ACCESS_KEY_ID',
secretAccessKey: 'AWS_SECRET_ACCESS_KEY',
})
expect(resolved).toMatchObject({
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: process.env.AWS_REGION,
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
})
} finally {
for (const [key, value] of Object.entries(previous)) {
if (value === undefined) delete process.env[key]
else process.env[key] = value
}
}
})
})

View File

@@ -9,7 +9,6 @@ describe('buildViewerManifest', () => {
suiteId: 'agisdk-daily-10',
variantId: 'kimi',
uploadedAt: '2026-04-29T06:00:00.000Z',
reportPath: 'report.html',
summary: { total: 1, passRate: 0 },
tasks: [
{
@@ -19,13 +18,6 @@ describe('buildViewerManifest', () => {
status: 'completed',
durationMs: 353_000,
screenshotCount: 42,
metrics: {
durationMs: 353_000,
steps: 47,
screenshots: 42,
toolCalls: 19,
toolErrors: 2,
},
graderResults: {
agisdk_state_diff: {
score: 0,
@@ -40,7 +32,6 @@ describe('buildViewerManifest', () => {
const publishManifest: R2RunManifest = manifest
expect(publishManifest.schemaVersion).toBe(2)
expect(manifest.reportPath).toBe('report.html')
expect(manifest.tasks[0].paths.messages).toBe(
'tasks/agisdk-dashdish-4/messages.jsonl',
)
@@ -50,21 +41,6 @@ describe('buildViewerManifest', () => {
expect(manifest.tasks[0].paths.graderArtifacts).toBe(
'tasks/agisdk-dashdish-4/grader-artifacts',
)
expect(manifest.metrics).toMatchObject({
taskCount: 1,
avgDurationMs: 353_000,
avgSteps: 47,
avgToolCalls: 19,
totalToolCalls: 19,
totalToolErrors: 2,
})
expect(manifest.tasks[0].metrics).toEqual({
durationMs: 353_000,
steps: 47,
screenshots: 42,
toolCalls: 19,
toolErrors: 2,
})
expect(manifest.tasks[0].graderResults.agisdk_state_diff.details).toEqual({
missing: ['checkout item'],
})

Some files were not shown because too many files have changed in this diff Show More