Mirror of https://github.com/browseros-ai/BrowserOS.git, synced 2026-05-14 08:03:58 +00:00.

Compare commits: dev...fix/proces (2 commits)

| SHA1 |
|---|
| ad716fca78 |
| df8ff02b8f |
@@ -1,152 +0,0 @@
---
name: ask-internal
description: Answer questions about BrowserOS internals (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---

# Ask Internal

Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.

**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."

## When to use

- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.

## Hard rules: never do these

- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop and print the init command.
- NEVER cite a file or line number you have not actually read.

## Voice rules

Apply the same voice rules as `document-internal` to the synthesized answer:

- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.

## Workflow

### Step 0: Pre-flight

```bash
# A leading '-' in `git submodule status` means the submodule is registered but not initialized.
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
  exit 0
fi
# Also catch a missing or empty checkout.
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
  echo ".internal-docs/ missing or empty. Submodule not configured?"
  exit 0
}
```

### Step 1: Parse the question

Pull the keywords from the user's question. Drop stop words. Identify intent:

- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
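A minimal bash sketch of Step 1. `extract_keywords` and `detect_intent` are hypothetical helpers, and the stop-word list and intent phrases are assumptions picked for illustration. The skill itself leaves this judgment to the agent.

```bash
# Hypothetical helpers: the stop words and intent phrases below are illustrative, not canonical.
STOP_WORDS=" a an and are deal do does for how i in is it my of on our s the to we what when where which who why with "

# Print one keyword per line: lowercase, split on anything outside [a-z0-9-], drop stop words.
extract_keywords() {
  local word
  for word in $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9-' ' '); do
    case "$STOP_WORDS" in
      *" $word "*) ;;               # stop word: skip it
      *) printf '%s\n' "$word" ;;
    esac
  done
}

# Map question phrasing to the category the search should favor.
detect_intent() {
  local q
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    *"how do i"*|*"how to"*|*"where do "*configure*) echo setup ;;   # favor setup/
    *"what is"*|*"what's"*|*"why does"*) echo feature ;;              # favor features/ and architecture/
    *) echo free-form ;;                                              # search every category
  esac
}
```

For "How do I reset my dogfood profile?", this yields the keywords `reset`, `dogfood`, `profile` and the intent `setup`. Each keyword then feeds a separate grep in Step 2.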

### Step 2: Multi-source search

Run grep in parallel across two sources.

**Internal docs:**

```bash
grep -rni --include='*.md' '<keyword>' .internal-docs/
```

Search each keyword separately. Rank files by relevance: a file that matches more distinct keywords ranks higher.

**Codebase (skip vendored Chromium and `node_modules`):**

```bash
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
  --exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
  '<keyword>' packages/ scripts/ .config/ .github/
```

Read the top 3-5 doc hits and top 3-5 code hits. Do not skim. Read the relevant section fully so citations are accurate.
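One way to apply the "more keyword matches ranks higher" rule, sketched in bash. `rank_by_keywords` is a hypothetical helper. It assumes a `grep` that supports `-r`, `-l`, `-F`, and `--include` (GNU and BSD grep both do), and it counts distinct keywords per file, not total occurrences.

```bash
# Hypothetical helper: rank markdown files by how many distinct keywords they contain.
# Usage:  rank_by_keywords <dir> <keyword>...
# Output: "<count> <path>" lines, highest count first.
rank_by_keywords() {
  local dir=$1
  shift
  local kw
  for kw in "$@"; do
    # -l lists each matching file once per keyword, so the number of times a path
    # appears across the loop equals the number of distinct keywords it matched.
    grep -rliF --include='*.md' -- "$kw" "$dir"
  done | sort | uniq -c | sed 's/^ *//' | sort -k1,1nr -k2
}
```

Usage: `rank_by_keywords .internal-docs/ reset dogfood profile | head -n 5` gives the doc candidates to read in full. For the code search, swap the `--include` globs for the ones in the codebase grep.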

### Step 3: Synthesize answer

Structure the response:

1. **Direct answer.** First sentence answers the question. No preamble.
2. **Steps if applicable.** Numbered list with exact commands.
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Check the answer against the voice rules above before printing.

If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.

### Step 4: Offer execution (only if commands surfaced)

If Step 3 produced executable commands the user could run, ask:

> Run these for you? (y / n / dry-run)

- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
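A sketch of the confirmation gate, assuming a small allowlist of read-only commands. `is_read_only` and `run_with_confirmation` are hypothetical names, not part of the skill. Anything outside the allowlist, or containing a redirect, pipe, `;`, or `&`, counts as mutating and needs an explicit `y`.

```bash
# Hypothetical helpers. The read-only allowlist is an assumption; extend it deliberately.
is_read_only() {
  # Redirects, pipes, and chaining can hide a write, so treat them as mutating.
  case "$1" in
    *'>'*|*'|'*|*';'*|*'&'*) return 1 ;;
  esac
  case "$1" in
    ls|'ls '*|'cat '*|pwd|'git status'|'git status '*|'git log'|'git log '*|'git diff'|'git diff '*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every command; ask "run this?" only for commands that may mutate state.
run_with_confirmation() {
  local cmd=$1 reply
  printf '$ %s\n' "$cmd"
  if ! is_read_only "$cmd"; then
    printf 'run this? %s [y/N] ' "$cmd"
    read -r reply || reply=
    if [ "$reply" != y ]; then
      echo "skipped"
      return 0
    fi
  fi
  sh -c "$cmd"
}
```

Defaulting unknown commands to "mutating" keeps the failure mode safe: a missing allowlist entry costs one extra prompt, not an unconfirmed write.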

### Step 5: Doc-not-found path

If Step 2 returned nothing useful (no doc hits AND no clear code answer):

1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
2. Ask: "Draft a new doc and open a PR to internal-docs?"
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.

### Step 6: Completion status

Report one of:

- **DONE:** answer delivered, citations verified.
- **DONE_WITH_CONCERNS:** answered, but flag uncertainty (e.g., docs and code disagreed; the user should reconcile).
- **BLOCKED:** submodule missing or another pre-flight failure.
- **NEEDS_CONTEXT:** question too vague to search effectively. Ask one clarifying question.

## Citation discipline

Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.

If a doc says one thing and the code says another, surface the conflict explicitly:

> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
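A quick way to check a citation before printing it. `check_citation` is a hypothetical helper, not part of the skill. It confirms the file exists and that the cited line contains the quoted text.

```bash
# Hypothetical helper: confirm that <path>:<line> contains the quoted text.
# Exit status: 0 match, 1 no match, 2 bad citation (missing file or bad line number).
check_citation() {
  local cite=$1 needle=$2
  local file=${cite%:*} line=${cite##*:}
  case "$line" in
    ''|*[!0-9]*) echo "bad line number in $cite" >&2; return 2 ;;
  esac
  if [ "$line" -lt 1 ]; then
    echo "bad line number in $cite" >&2
    return 2
  fi
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 2
  fi
  # Print only the cited line, then search it for the exact text.
  sed -n "${line}p" "$file" | grep -qF -- "$needle"
}
```

Before an answer like the example above quotes `packages/cli/src/cleanup.ts:47`, a call such as `check_citation packages/cli/src/cleanup.ts:47 '.local/share/browseros/dogfood'` would need to succeed.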

## Common Mistakes

**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.

**Executing without per-command confirmation for mutations**
- **Problem:** User answers "y" to "run all", and the skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.

**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.

## Red Flags

**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).

**Always:**
- Pre-flight check before any search.
- Surface doc-vs-code conflicts in the answer. Do not hide them.
- Say "no doc covers this" plainly when grep is empty. Never invent.
@@ -1,208 +0,0 @@
---
name: document-internal
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---

# Document Internal

Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.

**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."

## When to use

After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).

## Hard rules: never do these

- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer. The sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop and print the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.

## Voice rules (enforced by Step 4)

The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).

- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average under 20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.

## Workflow

### Step 0: Pre-flight

Bail with a clear message on any failure.

```bash
# Submodule must be initialized
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
  exit 0
fi
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }

# Must be on a feature branch (an empty name means detached HEAD)
BRANCH=$(git branch --show-current)
if [ -z "$BRANCH" ] || [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
  echo "On ${BRANCH:-a detached HEAD}. Run from a feature branch."
  exit 0
fi

# Determine base ref (default: origin/dev for this repo, fall back to origin/main).
# Use the remote-tracking ref so a stale or missing local branch does not skew the diff.
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo origin/dev || echo origin/main)

# Gather context
git log "$BASE..HEAD" --oneline
git diff "$BASE...HEAD" --stat
gh pr view --json body -q .body 2>/dev/null  # may be empty if no PR yet
```

### Step 1: Identify the doc

Ask the user for three things in one prompt:

1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
3. **Owner:** GitHub handle (default: `gh api user --jq .login`; fall back to `git config user.name`, which is a display name and may not match the handle)

### Step 2: Decision brief (four sharp questions)

Ask one question at a time. Each answer constrains the next. These force compression before drafting.

1. "In one sentence: what can someone now DO that they could not before?"
2. "What is the one design decision a future engineer needs to know?"
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
4. "Any sharp edges or gotchas? (or 'none')"

Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.

### Step 3: Draft from the template

Read the matching template from `.internal-docs/_templates/`:

- `feature` → `feature-note.md`
- `architecture` → `architecture-note.md`
- `design` → `design-spec.md`

If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.

Generate the 1-pager from the template, the four answers, and the diff context.

### Step 4: Voice self-check

Scan the draft for violations:

- Em dash (U+2014) present.
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only; architecture and design docs have no cap).

If any violation is found, regenerate the offending sentences in place. Max 3 attempts. If the draft still fails after 3 attempts, stop and report which rules are violated.

If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
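A rough bash sketch of the Step 4 scan. `voice_check` is a hypothetical helper; the banned-word list is copied from the voice rules. Sentence counting splits on `.`, `?`, and `!` followed by whitespace, so headings and abbreviations skew the average. Treat a pass as a hint, not proof.

```bash
# Hypothetical helper. Word list copied from the voice rules above.
BANNED='delve|crucial|robust|comprehensive|nuanced|multifaceted|furthermore|moreover|additionally|pivotal|landscape|tapestry|underscore|foster|showcase|intricate|vibrant|fundamental|significant|leverage|utilize'
EM_DASH=$(printf '\342\200\224')   # U+2014 as raw UTF-8 bytes

# Usage: voice_check <file> [feature|architecture|design]
# Prints one line per violated rule; returns non-zero if any rule fails.
voice_check() {
  local file=$1 doc_type=${2:-feature} failures=0 hits stats body words sentences

  if grep -qF -- "$EM_DASH" "$file"; then
    echo "em dash present"
    failures=$((failures + 1))
  fi

  # Whole-word, case-insensitive matches only; inflections like "leveraged" slip through.
  hits=$(grep -oiwE "$BANNED" "$file" | tr '[:upper:]' '[:lower:]' | sort -u | tr '\n' ' ')
  if [ -n "$hits" ]; then
    echo "banned words: $hits"
    failures=$((failures + 1))
  fi

  # Skip YAML frontmatter, then print: body line count, word count, sentence count.
  stats=$(awk '
    NR == 1 && $0 == "---" { fm = 1; next }
    fm && $0 == "---"      { fm = 0; next }
    fm                     { next }
    { body++; text = text " " $0 }
    END {
      words = split(text, parts)
      t = text " "
      sentences = gsub(/[.!?]+[ \t]/, "", t)
      if (sentences == 0) sentences = 1
      print body + 0, words + 0, sentences
    }' "$file")
  set -- $stats
  body=$1 words=$2 sentences=$3

  # Integer form of "average sentence length > 20 words".
  if [ "$words" -gt $((20 * sentences)) ]; then
    echo "average sentence length is over 20 words"
    failures=$((failures + 1))
  fi
  if [ "$doc_type" = feature ] && [ "$body" -gt 60 ]; then
    echo "body is $body lines, target is 60"
    failures=$((failures + 1))
  fi
  [ "$failures" -eq 0 ]
}
```

Comparing `words` against `20 * sentences` avoids floating point in the shell while matching the "> 20 words" rule exactly.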

### Step 5: Show + iterate

Print the full draft. Ask:

> Edit needed? Paste any changes, or say "looks good".

Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.

### Step 6: Open PR to internal-docs

Use a tmp clone, never the user's `.internal-docs` checkout. This keeps the user's submodule clean.

```bash
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT  # cleans up even if any step below fails
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
cd "$TMP"
git checkout -b "docs/<slug>"

# Write the doc. Compute the dated path once so `cat >` and `git add` agree.
DOC_PATH="<type>/$(date -u +%Y-%m)-<slug>.md"
mkdir -p "<type>"  # features, architecture, designs, or setup
cat > "$DOC_PATH" <<'DOC'
<draft content>
DOC

# Update the root README index: insert one line under the matching section.
# Do it before `git add`, in this same shell session (the trap deletes $TMP when the shell exits).
# Match the README's existing entry format:
#   "- [<title>](<type>/YYYY-MM-<slug>.md): <one-line description>"

git add "$DOC_PATH" README.md
git commit -m "docs(<type>): <slug>"
git push -u origin "docs/<slug>"

PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
  --head "docs/<slug>" \
  --title "docs(<type>): <slug>" \
  --body "$(cat <<'BODY'
## Summary
<one-line of what this doc covers>

## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
BODY
)")

cd - >/dev/null
echo "PR opened: $PR_URL"
# trap above cleans up $TMP on EXIT
```
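A sketch for the README index step, assuming the seed README layout: a `### <Section>` heading followed by HTML comment lines. `add_index_line` is a hypothetical helper. It inserts the entry right after those comments, which keeps each list newest first. Run it inside `$TMP` before `git add`.

```bash
# Hypothetical helper: add an index entry under "### <Section>" in README.md.
# Assumes the seed layout: heading, then HTML comment lines, then entries (newest first).
add_index_line() {
  local readme=$1 section=$2 entry=$3 tmp
  tmp=$(mktemp) || return 1
  if awk -v section="### $section" -v entry="$entry" '
      # First non-comment line after the heading: insert the entry above it.
      pending && !/^<!--/ { print entry; pending = 0; done = 1 }
      { print }
      !done && $0 == section { pending = 1 }
      END {
        if (pending) { print entry; done = 1 }   # heading block ran to end of file
        if (!done) exit 1                        # section heading not found
      }' "$readme" > "$tmp"; then
    mv "$tmp" "$readme"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage: `add_index_line README.md Features "- [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent"`. A missing heading returns non-zero and leaves the README untouched.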

If the slug contains characters that won't shell-escape cleanly, sanitize it before substitution.
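A minimal sanitizer, assuming the slug should match the kebab-case shape from Step 1. `sanitize_slug` and `doc_path` are hypothetical helpers. `doc_path` also computes the dated filename once, so the file written and the file staged always match.

```bash
# Hypothetical helpers: normalize a slug, then build the dated doc path once.
sanitize_slug() {
  # Lowercase, collapse every run of characters outside [a-z0-9] into one "-", trim edge dashes.
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

doc_path() {
  local dir=$1 slug
  slug=$(sanitize_slug "$2")
  [ -n "$slug" ] || return 1   # nothing usable left after sanitizing
  printf '%s/%s-%s.md\n' "$dir" "$(date -u +%Y-%m)" "$slug"
}
```

Usage, with `RAW_SLUG` as a hypothetical variable holding the user's answer: `DOC_PATH=$(doc_path features "$RAW_SLUG") || exit 1`, then reuse `$(sanitize_slug "$RAW_SLUG")` in the branch name and PR title.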

### Step 7: Completion status

Report one of:

- **DONE:** file written, branch pushed, PR opened. Print the PR URL.
- **DONE_WITH_CONCERNS:** same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED:** submodule missing, auth failure, or template missing. State exactly what is needed.

## Doc type defaults

| Branch pattern | Default doc type | Default location |
|----------------|------------------|------------------|
| `feat/*` | feature | `features/` |
| `arch/*` or refactor branches with more than 10 changed files in `packages/` | architecture | `architecture/` |
| `rfc/*` or `design/*` | design | `designs/` |
| Otherwise | ask | ask |
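The table as a bash function, sketched under two assumptions: refactor branches use a `refactor/*` prefix, and the caller passes the count of changed files under `packages/`. `default_doc_type` is a hypothetical helper.

```bash
# Hypothetical helper mirroring the doc type defaults table.
# Usage: default_doc_type <branch> [changed-files-under-packages]
# e.g.   default_doc_type "$(git branch --show-current)" \
#          "$(git diff --name-only origin/dev...HEAD -- packages/ | wc -l | tr -d ' ')"
default_doc_type() {
  local branch=$1 changed_in_packages=${2:-0}
  case "$branch" in
    feat/*) echo feature ;;
    arch/*) echo architecture ;;
    rfc/*|design/*) echo design ;;
    refactor/*)   # assumed prefix for "refactor branches"
      if [ "$changed_in_packages" -gt 10 ]; then echo architecture; else echo ask; fi ;;
    *) echo ask ;;
  esac
}
```

An `ask` result means the skill falls back to the Step 1 prompt with no preselected type.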

## Common Mistakes

**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".

**Touching `.internal-docs/` directly**
- **Problem:** The user's submodule HEAD moves and the parent repo shows a dirty state.
- **Fix:** Always use the tmp clone in Step 6.

**Skipping voice check on user edits**
- **Problem:** The user pastes prose with em dashes or filler, and it ships as-is.
- **Fix:** Re-run Step 4 after every user edit.

## Red Flags

**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.

**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.
@@ -1,51 +0,0 @@
# BrowserOS Internal Docs

Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.

If you are reading this from a public clone of BrowserOS without team access: this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.

## How to find what you need

- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)

Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.

## How to add a doc

Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.

## Index

### Setup
<!-- one line per setup runbook: -->
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->

### Features
<!-- one line per shipped feature, newest first: -->
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->

### Architecture
<!-- one line per cross-cutting subsystem: -->
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->

### Designs
<!-- one line per design spec, newest first: -->
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->

## Templates

When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.

## Voice

Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.

- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.
@@ -1,31 +0,0 @@
---
title: <subsystem name>
owner: <github handle>
status: current | deprecated
date: YYYY-MM-DD
related-features: [feature-slug-1, feature-slug-2]
---

# <subsystem name>

## What this subsystem does
<1-2 paragraphs. The top-level responsibility. Boundaries.>

## Architecture
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>

## Constraints
<Hard rules the design enforces. "X must never call Y" type statements.>

## Decisions made
<Numbered list of non-obvious decisions and the reason for each.>

## Key files
- `path/to/file.ts`: role
- `path/to/dir/`: what lives here

## How to evolve this
<Where to add things. Which tests to expect to update. What NOT to touch.>

## Open questions
<What is still being figured out. Empty if none.>
@@ -1,34 +0,0 @@
---
title: <design name>
owner: <github handle>
status: proposed | accepted | rejected | superseded
date: YYYY-MM-DD
supersedes: <design-slug or none>
---

# <design name>

## Goal
<2-4 sentences. What this design is trying to accomplish.>

## Context
<1-2 paragraphs. The current state, what is failing, why this needs to change.>

## Selected Approach
<The chosen design at a high level. Architecture, components, data flow.>

## Alternatives Considered
### 1. <name>
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>

### 2. <name>
<Same shape.>

## Out of Scope
<What this design does NOT cover. Defer references.>

## Rollout
<Numbered steps from "nothing exists" to "fully shipped".>

## Open Questions
<Resolved during design? Empty. Unresolved? List with owner.>
@@ -1,29 +0,0 @@
---
title: <feature name>
owner: <github handle>
status: shipped | wip | deprecated
date: YYYY-MM-DD
prs: ["#NNN"]
tags: [agent, browser, mcp]
---

# <feature name>

## What it does
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>

## Why we built it
<1-2 sentences. Motivation. What pain it removed or what it unlocked.>

## How it works
<3-6 sentences. The flow at a high level. Name the key files.>

## Key files
- `path/to/file.ts`: what it does
- `path/to/other.ts`: what it does

## How to run / test it locally
<Bullet list of commands. Leave the section empty if N/A. Do not fake.>

## Gotchas
<Known sharp edges. "If you see X, that's why." Empty if N/A.>
53 .github/workflows/eval-weekly.yml
@@ -44,19 +44,6 @@ jobs:
working-directory: packages/browseros-agent
run: bun install --ignore-scripts

- name: Install Claude Code CLI
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
if bun -e "const config = await Bun.file(process.env.EVAL_CONFIG).json(); process.exit(config.agent?.type === 'claude-code' ? 0 : 1)"; then
npm install -g @anthropic-ai/claude-code@2.1.119
echo "Claude Code CLI installed at $(command -v claude)"
claude --version
else
echo "Eval config does not use Claude Code; skipping Claude Code CLI install"
fi

- name: Install Python eval dependencies
# agisdk pinned so silent upstream releases can't shift task definitions
# or grader behavior. Bump intentionally with a documented re-baseline.
@@ -80,11 +67,13 @@ jobs:
env:
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION || 'us-west-2' }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
# OpenClaw container runtime is macOS-only; opt the Linux runner
@@ -93,35 +82,7 @@ jobs:
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
run: |
echo "Running eval with config: $EVAL_CONFIG"
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG"
# Capture the run directory so report.html can be generated before the R2 publish step.
SUMMARY_PATH="$(find results -name summary.json -type f -print | sort | tail -n 1)"
if [ -z "$SUMMARY_PATH" ]; then
echo "No eval run summary found"
exit 1
fi
RUN_DIR="$(dirname "$SUMMARY_PATH")"
echo "EVAL_RUN_DIR=$RUN_DIR" >> "$GITHUB_ENV"

- name: Generate run analysis report
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
run: |
echo "Generating run report for $EVAL_RUN_DIR"
bun scripts/generate-report.ts --input "$EVAL_RUN_DIR" --output "$EVAL_RUN_DIR/report.html"

- name: Publish eval run to R2
if: success()
working-directory: packages/browseros-agent/apps/eval
env:
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
run: bun run src/index.ts publish --run "$EVAL_RUN_DIR" --target r2
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2

- name: Generate trend report
if: success()
@@ -136,7 +97,7 @@ jobs:
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
run: bun apps/eval/scripts/weekly-report.ts /tmp/eval-report.html

- name: Upload trend report as artifact
- name: Upload report as artifact
if: success()
uses: actions/upload-artifact@v4
with:
62 .github/workflows/sync-internal-docs.yml
@@ -1,62 +0,0 @@
name: Sync internal-docs submodule

on:
  schedule:
    - cron: '0 */4 * * *'
  workflow_dispatch:

jobs:
  sync:
    name: Bump internal-docs submodule pointer on dev
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Rewrite SSH submodule URL to HTTPS-with-token
        env:
          TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
        run: |
          git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"

      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
          submodules: true
          ref: dev
          fetch-depth: 50

      - name: Open auto-merge PR if internal-docs has new commits
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -e

          # Skip if submodule not yet configured (handoff window before someone adds it)
          if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
            echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
            exit 0
          fi

          git submodule update --remote --merge .internal-docs

          if git diff --quiet .internal-docs; then
            echo "No internal-docs changes to sync."
            exit 0
          fi

          BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
          git config user.name "browseros-bot"
          git config user.email "bot@browseros.ai"
          git checkout -b "$BRANCH"
          git add .internal-docs
          git commit -m "chore: sync internal-docs submodule"
          git push -u origin "$BRANCH"

          PR_URL=$(gh pr create \
            --base dev \
            --head "$BRANCH" \
            --title "chore: sync internal-docs submodule" \
            --body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")

          gh pr merge "$PR_URL" --auto --squash --delete-branch
4 .gitmodules
@@ -1,4 +0,0 @@
[submodule ".internal-docs"]
path = .internal-docs
url = git@github.com:browseros-ai/internal-docs.git
branch = main

Submodule .internal-docs deleted from 590799ae1c
@@ -1,44 +1,186 @@
import { ArrowLeft, PanelRight } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { ArrowLeft, Bot, Home } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
import { Button } from '@/components/ui/button'
import type {
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import {
cancelHarnessTurn,
useAgentAdapters,
useEnqueueHarnessMessage,
useHarnessAgents,
useRemoveHarnessQueuedMessage,
useUpdateHarnessAgent,
} from '@/entrypoints/app/agents/useAgents'
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
import { type ProducedFilesRailGroup, useAgentOutputs } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { AgentRail } from './AgentRail'
import { useAgentCommandData } from './agent-command-layout'
import {
OutputsRail,
useOutputsRailOpen,
} from './agent-conversation.outputs-rail'
type AgentEntry,
getModelDisplayName,
} from '@/entrypoints/app/agents/useOpenClaw'
import { cn } from '@/lib/utils'
import { useAgentCommandData } from './agent-command-layout'
import { ClawChat } from './ClawChat'
import { ConversationHeader } from './ConversationHeader'
import { ConversationInput } from './ConversationInput'
import {
buildChatHistoryFromClawMessages,
filterTurnsPersistedInHistory,
flattenHistoryPages,
mapHistoryToProducedFilesGroups,
selectStripOnlyTurns,
} from './claw-chat-types'
import { consumePendingInitialMessage } from './pending-initial-message'
import { QueuePanel } from './QueuePanel'
import { useAgentConversation } from './useAgentConversation'
import { useHarnessChatHistory } from './useHarnessChatHistory'

function StatusBadge({ status }: { status: string }) {
return (
<div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
<span
className={cn(
'size-1.5 rounded-full',
status === 'Working on your request'
? 'bg-amber-500'
: status === 'Ready'
? 'bg-emerald-500'
: status === 'Offline'
? 'bg-muted-foreground/50'
: 'bg-[var(--accent-orange)]',
)}
/>
<span>{status}</span>
</div>
)
}

function AgentIdentity({
name,
meta,
className,
}: {
name: string
meta: string
className?: string
}) {
return (
<div className={cn('min-w-0', className)}>
<div className="truncate font-semibold text-[15px] leading-5">{name}</div>
<div className="truncate text-muted-foreground text-xs leading-5">
{meta}
</div>
</div>
)
}

function ConversationHeader({
agentName,
agentMeta,
status,
backLabel,
backTarget,
onGoHome,
}: {
agentName: string
agentMeta: string
status: string
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
}) {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft

return (
<div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 rounded-xl lg:hidden"
title={backLabel}
|
||||
>
|
||||
<BackIcon className="size-4" />
|
||||
</Button>
|
||||
<div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
|
||||
<Bot className="size-4" />
|
||||
</div>
|
||||
<AgentIdentity name={agentName} meta={agentMeta} />
|
||||
</div>
|
||||
|
||||
<StatusBadge status={status} />
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
|
||||
return (
|
||||
<div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
onClick={onGoHome}
|
||||
className="size-8 rounded-xl"
|
||||
title="Back to home"
|
||||
>
|
||||
<ArrowLeft className="size-4" />
|
||||
</Button>
|
||||
<div className="truncate font-semibold text-[15px] leading-5">
|
||||
Agents
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function AgentRailList({
|
||||
activeAgentId,
|
||||
agents,
|
||||
onSelectAgent,
|
||||
}: {
|
||||
activeAgentId: string
|
||||
agents: AgentEntry[]
|
||||
onSelectAgent: (entry: AgentEntry) => void
|
||||
}) {
|
||||
return (
|
||||
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
|
||||
<div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
|
||||
{agents.map((entry) => {
|
||||
const active = entry.agentId === activeAgentId
|
||||
const modelName = getAgentEntryMeta(entry)
|
||||
|
||||
return (
|
||||
<button
|
||||
key={entry.agentId}
|
||||
type="button"
|
||||
onClick={() => onSelectAgent(entry)}
|
||||
className={cn(
|
||||
'w-full rounded-2xl border px-3 py-3 text-left transition-all',
|
||||
active
|
||||
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
|
||||
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-center gap-3">
|
||||
<div
|
||||
className={cn(
|
||||
'flex size-9 items-center justify-center rounded-xl',
|
||||
active
|
||||
? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
|
||||
: 'bg-muted text-muted-foreground',
|
||||
)}
|
||||
>
|
||||
<Bot className="size-4" />
|
||||
</div>
|
||||
<AgentIdentity name={entry.name} meta={modelName} />
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
</aside>
|
||||
)
|
||||
}
|
||||
|
||||
function getAgentEntryMeta(agent: AgentEntry | undefined): string {
|
||||
if (agent?.source === 'agent-harness') {
|
||||
return getModelDisplayName(agent.model) ?? 'ACP agent'
|
||||
}
|
||||
return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
|
||||
}
|
||||
|
||||
function AgentConversationController({
|
||||
agentId,
|
||||
initialMessage,
|
||||
@@ -46,7 +188,6 @@ function AgentConversationController({
agents,
agentPathPrefix,
createAgentPath,
onOpenOutputsRail,
}: {
agentId: string
initialMessage: string | null
@@ -54,7 +195,6 @@ function AgentConversationController({
agents: AgentEntry[]
agentPathPrefix: string
createAgentPath: string
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
}) {
const navigate = useNavigate()
const initialMessageSentRef = useRef<string | null>(null)
@@ -86,15 +226,6 @@ function AgentConversationController({
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
const queue = harnessAgent?.queue ?? []
const activeTurnId = harnessAgent?.activeTurnId ?? null
const isOpenClawAgent = harnessAgent?.adapter === 'openclaw'

// Used to surface produced-files strips on a fresh page load
// when there's no optimistic turn to carry the data. Disabled
// for non-openclaw adapters since they don't attribute files.
const { groups: agentOutputGroups } = useAgentOutputs(
agentId,
isOpenClawAgent,
)

const { turns, streaming, send } = useAgentConversation(agentId, {
runtime: 'agent-harness',
@@ -119,44 +250,6 @@ function AgentConversationController({
() => filterTurnsPersistedInHistory(turns, historyMessages),
[historyMessages, turns],
)
// Persisted turns that still need to surface their FileCardStrip
// — history items don't carry produced-files data, so without
// these the strip would vanish on history reload.
const stripOnlyTurns = useMemo(
() => selectStripOnlyTurns(turns, historyMessages),
[historyMessages, turns],
)
// Two outputs from the per-turn matcher:
// - filesByAssistantId → strip rendered directly under the
// matching assistant history bubble.
// - tailUnmatched → groups with no history pair (orphans);
// rendered at the conversation tail.
// Both are filtered to exclude turnIds already covered by a
// live or strip-only optimistic turn (those carry their own
// strip and history hasn't reloaded yet).
const { filesByAssistantId, tailStripGroups } = useMemo(() => {
if (!isOpenClawAgent) {
return {
filesByAssistantId: new Map<string, ProducedFilesRailGroup>(),
tailStripGroups: [] as ProducedFilesRailGroup[],
}
}
const coveredTurnIds = new Set<string>()
for (const turn of turns) {
if (turn.turnId) coveredTurnIds.add(turn.turnId)
}
const eligibleGroups = agentOutputGroups.filter(
(group) => !coveredTurnIds.has(group.turnId),
)
const { byAssistantMessageId, unmatched } = mapHistoryToProducedFilesGroups(
historyMessages,
eligibleGroups,
)
return {
filesByAssistantId: byAssistantMessageId,
tailStripGroups: unmatched,
}
}, [agentOutputGroups, isOpenClawAgent, historyMessages, turns])
onInitialMessageConsumedRef.current = onInitialMessageConsumed

const disabled = !agent
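The produced-files matcher comments in the hunk above describe a two-step split. First, drop any group whose `turnId` is already carried by an optimistic turn. Then pair each remaining group with an assistant history bubble, or leave it as a tail orphan. The sketch below shows that split in TypeScript. It assumes a hypothetical `turnId` field on assistant history messages as the pairing key; the real `mapHistoryToProducedFilesGroups` in `./claw-chat-types` may match on something else.

```typescript
// Hypothetical shapes; the real types live in '@/lib/agent-files' and './claw-chat-types'.
interface FilesGroup {
  turnId: string
  files: string[]
}

interface HistoryMessage {
  id: string
  role: 'user' | 'assistant'
  // Assumption: assistant history items can be paired to a turn by this id.
  turnId?: string
}

interface Turn {
  turnId?: string | null
}

function partitionFileGroups(
  turns: Turn[],
  history: HistoryMessage[],
  groups: FilesGroup[],
): { filesByAssistantId: Map<string, FilesGroup>; tailStripGroups: FilesGroup[] } {
  // Groups already carried by a live or strip-only optimistic turn render their own strip.
  const covered = new Set<string>()
  for (const turn of turns) {
    if (turn.turnId) covered.add(turn.turnId)
  }
  const eligible = groups.filter((group) => !covered.has(group.turnId))

  // Index assistant history bubbles by the turn they answered.
  const assistantIdByTurn = new Map<string, string>()
  for (const message of history) {
    if (message.role === 'assistant' && message.turnId) {
      assistantIdByTurn.set(message.turnId, message.id)
    }
  }

  const filesByAssistantId = new Map<string, FilesGroup>()
  const tailStripGroups: FilesGroup[] = []
  for (const group of eligible) {
    const assistantId = assistantIdByTurn.get(group.turnId)
    if (assistantId) filesByAssistantId.set(assistantId, group)
    else tailStripGroups.push(group) // orphan: no history pair yet
  }
  return { filesByAssistantId, tailStripGroups }
}
```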
@@ -171,73 +264,42 @@ function AgentConversationController({
sendRef.current = send

useEffect(() => {
if (disabled || !historyReady) return

// Registry-first: when the user submitted at /home with
// attachments, the rich payload is here. URL `?q=` may also be
// present and is the text-only fallback path; the registry wins
// when both exist because it carries the binary attachments
// alongside the text.
const pending = consumePendingInitialMessage(agentId)
if (pending) {
// Mark the dedup ref so the text-only branch below doesn't
// re-fire on the same render.
if (initialMessageKey) {
initialMessageSentRef.current = initialMessageKey
}
onInitialMessageConsumedRef.current()
void sendRef.current({
text: pending.text,
attachments: pending.attachments.map((a) => a.payload),
attachmentPreviews: pending.attachments.map((a) => ({
id: a.id,
kind: a.kind,
mediaType: a.mediaType,
name: a.name,
dataUrl: a.dataUrl,
})),
})
return
}

const query = initialMessage?.trim()
if (!initialMessageKey) {
// Reset is safe even on the post-registry-fire re-run: consume
// is destructive, so the registry is already drained — there's
// nothing left for a third run to re-send.
initialMessageSentRef.current = null
return
}

if (!query || initialMessageSentRef.current === initialMessageKey) {
if (
!query ||
initialMessageSentRef.current === initialMessageKey ||
disabled ||
!historyReady
) {
return
}

initialMessageSentRef.current = initialMessageKey
onInitialMessageConsumedRef.current()
void sendRef.current({ text: query })
}, [agentId, disabled, historyReady, initialMessage, initialMessageKey])
}, [disabled, historyReady, initialMessage, initialMessageKey])

const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
}

return (
<div className="flex min-h-0 flex-1 flex-col overflow-hidden">
<div className="flex min-h-0 flex-col overflow-hidden">
<ClawChat
agentName={agentName}
historyMessages={historyMessages}
turns={visibleTurns}
stripOnlyTurns={stripOnlyTurns}
filesByAssistantId={filesByAssistantId}
tailStripGroups={tailStripGroups}
streaming={streaming}
isInitialLoading={harnessHistoryQuery.isLoading}
error={error}
hasNextPage={false}
isFetchingNextPage={false}
onFetchNextPage={() => {}}
onOpenOutputsRail={onOpenOutputsRail}
onRetry={() => {
void harnessHistoryQuery.refetch()
}}
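The initial-message effect above resolves two sources in a fixed order. The in-memory registry comes first because it carries attachments. The `?q=` text comes second and is deduplicated by message key. The sketch below captures that decision as a pure function with hypothetical names (`resolveInitialSend`, `InitialSend`). In the real effect, recording the dedup key and calling `onInitialMessageConsumed` are side effects, and this sketch leaves both to the caller.

```typescript
// Hypothetical shapes mirroring the pending-message payload used in the effect.
interface PendingAttachment {
  id: string
  payload: unknown
}

interface PendingMessage {
  text: string
  attachments: PendingAttachment[]
}

type InitialSend =
  | { source: 'registry'; text: string; attachments: unknown[] }
  | { source: 'url'; text: string }
  | null

// The registry wins because it carries attachments; `?q=` is the
// text-only fallback and is skipped when blank or already sent.
function resolveInitialSend(
  pending: PendingMessage | null,
  query: string | null,
  messageKey: string | null,
  alreadySentKey: string | null,
): InitialSend {
  if (pending) {
    return {
      source: 'registry',
      text: pending.text,
      attachments: pending.attachments.map((a) => a.payload),
    }
  }
  const text = query?.trim()
  if (!messageKey || !text || alreadySentKey === messageKey) return null
  return { source: 'url', text }
}
```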
@@ -306,22 +368,6 @@ interface AgentCommandConversationProps {
createAgentPath?: string
}

function inferAdapterFromEntry(
entry: AgentEntry | undefined,
): HarnessAgentAdapter | 'unknown' {
if (!entry) return 'unknown'
if (entry.source === 'agent-harness') {
// Harness entries don't carry the adapter on AgentEntry; the rail
// / header read the harness record directly. This branch only runs
// before the harness query resolves, so 'unknown' is correct — the
// tile's bot fallback renders until data arrives.
return 'unknown'
}
// OpenClaw-only entries (no harness shadow) are deprecated in
// practice but the rail still tolerates them.
return 'openclaw'
}

export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
variant = 'command',
backPath = '/home',
@@ -332,191 +378,60 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
const [searchParams, setSearchParams] = useSearchParams()
const navigate = useNavigate()
const { agents } = useAgentCommandData()
const { harnessAgents } = useHarnessAgents()
const { adapters } = useAgentAdapters()
const updateAgent = useUpdateHarnessAgent()

const shouldRedirectHome = !agentId
const resolvedAgentId = agentId ?? ''
const harnessAgent = harnessAgents.find(
(entry) => entry.id === resolvedAgentId,
)
const entry = agents.find((item) => item.agentId === resolvedAgentId)
const fallbackName = entry?.name || resolvedAgentId || 'Agent'
const fallbackAdapter = inferAdapterFromEntry(entry)
const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
const agentName = agent?.name || resolvedAgentId || 'Agent'
const agentMeta = getAgentEntryMeta(agent)
const initialMessage = searchParams.get('q')
const isPageVariant = variant === 'page'
const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'

const isOpenClawAgent = harnessAgent?.adapter === 'openclaw'
const [outputsRailOpen, setOutputsRailOpen] =
useOutputsRailOpen(resolvedAgentId)
const railVisible = isOpenClawAgent && outputsRailOpen

// Deep-link target for the rail. Set when (a) the user clicks
// View / +N on an inline file-card strip, or (b) an external nav
// arrived with `?outputsTurn=<turnId>`. Cleared by the rail
// itself once it has scrolled to + expanded the matching group.
const urlOutputsTurn = searchParams.get('outputsTurn')
const [focusTurnId, setFocusTurnId] = useState<string | null>(urlOutputsTurn)
// If the URL param flips while we're already on this agent, sync.
useEffect(() => {
if (!urlOutputsTurn) return
setFocusTurnId(urlOutputsTurn)
if (isOpenClawAgent) setOutputsRailOpen(true)
}, [urlOutputsTurn, isOpenClawAgent, setOutputsRailOpen])

const handleOpenOutputsRail = (turnId?: string | null) => {
if (!isOpenClawAgent) return
setOutputsRailOpen(true)
setFocusTurnId(turnId ?? null)
}
const handleFocusTurnConsumed = () => {
setFocusTurnId(null)
if (urlOutputsTurn) {
// Drop the URL param so a back-nav doesn't re-trigger the
// scroll. `replace: true` keeps history clean.
setSearchParams(
(prev) => {
const next = new URLSearchParams(prev)
next.delete('outputsTurn')
return next
},
{ replace: true },
)
}
}

const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
const adapterId = harnessAgent?.adapter
if (!adapterId) return null
const descriptor = adapters.find((item) => item.id === adapterId)
if (!descriptor?.health) return null
return {
healthy: descriptor.health.healthy,
reason: descriptor.health.reason,
}
}, [adapters, harnessAgent?.adapter])

if (shouldRedirectHome) {
return <Navigate to="/home" replace />
}

const handleSelectHarnessAgent = (target: HarnessAgent) => {
navigate(`${agentPathPrefix}/${target.id}`)
const handleSelectAgent = (entry: AgentEntry) => {
navigate(`${agentPathPrefix}/${entry.agentId}`)
}

const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
if (!target) return
updateAgent.mutate({
agentId: target.id,
patch: { pinned: next },
})
}
// Every visible agent runs through the harness now, so per-agent
// runtime status doesn't gate chat the way OpenClaw's legacy
// gateway lifecycle did. Show "Ready" once the agent record is
// resolved from the rail, "Setup" otherwise.
const statusCopy = agent ? 'Ready' : 'Setup'

return (
<div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
<div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
{/* Shared top band — the rail's "Agents" header and the chat
header live on one row so they're aligned by construction. */}
<div className="flex shrink-0 items-stretch border-border/50 border-b">
<div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
<Button
variant="ghost"
size="icon"
onClick={() => navigate(backPath)}
className="size-8 rounded-xl"
title="Back to home"
>
<ArrowLeft className="size-4" />
</Button>
<div className="truncate font-semibold text-[15px] leading-5">
Agents
</div>
</div>
<div className="min-w-0 flex-1">
<ConversationHeader
agent={harnessAgent ?? null}
fallbackName={fallbackName}
fallbackAdapter={fallbackAdapter}
adapterHealth={adapterHealth}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
onPinToggle={(next) =>
handlePinToggle(harnessAgent ?? null, next)
}
headerExtra={
isOpenClawAgent ? (
<Button
variant={railVisible ? 'secondary' : 'ghost'}
size="icon"
className="size-8 rounded-xl"
onClick={() => setOutputsRailOpen(!railVisible)}
title={railVisible ? 'Hide outputs' : 'Show outputs'}
>
<PanelRight className="size-4" />
</Button>
) : undefined
}
/>
</div>
</div>
<div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
<AgentRailHeader onGoHome={() => navigate(backPath)} />

{/* Body grid: rail list + chat (+ outputs rail when an
openclaw agent has it open). Columns share the same top
edge as the band above so headers can never drift. */}
<div
className={cn(
'grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)]',
railVisible
? 'lg:grid-cols-[288px_minmax(0,1fr)_320px]'
: 'lg:grid-cols-[288px_minmax(0,1fr)]',
)}
>
<AgentRail
agents={harnessAgents}
adapters={adapters}
activeAgentId={resolvedAgentId}
onSelectAgent={handleSelectHarnessAgent}
onPinToggle={(target, next) => handlePinToggle(target, next)}
/>
<ConversationHeader
agentName={agentName}
agentMeta={agentMeta}
status={statusCopy}
backLabel={backLabel}
backTarget={isPageVariant ? 'page' : 'home'}
onGoHome={() => navigate(backPath)}
/>

<div className="flex h-full min-h-0 flex-col overflow-hidden">
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() => {
// Preserve the outputsTurn deep-link if present —
// dropping all params would erase the rail focus
// before it had a chance to consume.
setSearchParams(
(prev) => {
const next = new URLSearchParams()
const turn = prev.get('outputsTurn')
if (turn) next.set('outputsTurn', turn)
return next
},
{ replace: true },
)
}}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
onOpenOutputsRail={isOpenClawAgent ? handleOpenOutputsRail : null}
/>
</div>
<AgentRailList
activeAgentId={resolvedAgentId}
agents={agents}
onSelectAgent={handleSelectAgent}
/>

{railVisible ? (
<OutputsRail
agentId={resolvedAgentId}
onClose={() => setOutputsRailOpen(false)}
focusTurnId={focusTurnId}
onFocusTurnConsumed={handleFocusTurnConsumed}
/>
) : null}
</div>
<AgentConversationController
key={resolvedAgentId}
agentId={resolvedAgentId}
agents={agents}
initialMessage={initialMessage}
onInitialMessageConsumed={() =>
setSearchParams({}, { replace: true })
}
agentPathPrefix={agentPathPrefix}
createAgentPath={createAgentPath}
/>
</div>
</div>
)
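Both `setSearchParams` updaters in the hunk above are pure functions of the previous params. One drops `outputsTurn` after the rail consumes it. The other clears everything except `outputsTurn` once the initial prompt is sent. Pulled out as standalone helpers, they can be checked without React Router. The helper names are hypothetical; the diffed file inlines these as arrow functions.

```typescript
// Remove a single param, keeping the rest (mirrors handleFocusTurnConsumed).
function dropParam(prev: URLSearchParams, name: string): URLSearchParams {
  const next = new URLSearchParams(prev)
  next.delete(name)
  return next
}

// After `?q=` is consumed, keep only the `outputsTurn` deep-link so the
// outputs rail can still focus it (mirrors onInitialMessageConsumed).
function clearConsumedParams(prev: URLSearchParams): URLSearchParams {
  const next = new URLSearchParams()
  const turn = prev.get('outputsTurn')
  if (turn) next.set('outputsTurn', turn)
  return next
}
```

Passing either helper to `setSearchParams` with `{ replace: true }` keeps the browser history free of intermediate entries, which is the behavior the original comments describe.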
@@ -18,12 +18,8 @@ import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
import { AgentCardDock } from './AgentCardDock'
import { useAgentCommandData } from './agent-command-layout'
import {
ConversationInput,
type ConversationInputSendInput,
} from './ConversationInput'
import { ConversationInput } from './ConversationInput'
import { orderHomeAgents } from './home-agent-card.helpers'
import { setPendingInitialMessage } from './pending-initial-message'

function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
return (
@@ -120,19 +116,8 @@ export const AgentCommandHome: FC = () => {
}
}, [legacyAgents, selectedAgentId])

const handleSend = (input: ConversationInputSendInput) => {
const handleSend = (input: { text: string }) => {
if (!selectedAgentId) return
// Stash text + attachments in the in-memory registry. Text also
// travels in `?q=` so a hard refresh / shareable URL still works
// for text-only prompts; attachments are registry-only because a
// multi-megabyte dataUrl can't ride a URL search param. The chat
// screen prefers the registry when both are present.
setPendingInitialMessage({
agentId: selectedAgentId,
text: input.text,
attachments: input.attachments,
createdAt: Date.now(),
})
navigate(
`/home/agents/${selectedAgentId}?q=${encodeURIComponent(input.text)}`,
)
@@ -162,16 +147,12 @@ export const AgentCommandHome: FC = () => {
<>
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
<div className="space-y-3">
<h1 className="font-semibold text-[clamp(2.25rem,4.5vw,3.5rem)] leading-[1.08] tracking-[-0.025em] [text-wrap:balance]">
What should your agent{' '}
<span className="font-medium text-[var(--accent-orange)] italic">
work on
</span>{' '}
next?
<h1 className="font-semibold text-[clamp(2rem,4vw,3.25rem)] leading-tight tracking-tight">
What should your agent work on next?
</h1>
<p className="mx-auto max-w-2xl text-muted-foreground text-sm leading-6 [text-wrap:pretty]">
Start a task, continue a thread, or hand off to a different
agent — all without leaving this tab.
<p className="mx-auto max-w-2xl text-muted-foreground text-sm leading-6">
Start with a task, continue a thread, or switch to another
agent without leaving the new tab.
</p>
</div>

@@ -186,7 +167,7 @@ export const AgentCommandHome: FC = () => {
streaming={false}
disabled={!selectedAgentReady}
status={selectedAgentStatus}
attachmentsEnabled={true}
attachmentsEnabled={false}
placeholder={
selectedAgentReady
? `Ask ${selectedAgentName} to handle a task...`
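The `handleSend` comment describes a hand-off. Text and attachments go into an in-memory registry keyed by agent, and the chat screen later consumes the entry exactly once. The sketch below assumes the registry is a module-level `Map` keyed by `agentId`. `setPendingMessage` and `consumePendingMessage` are hypothetical stand-ins; they are not the contents of `./pending-initial-message`.

```typescript
// Hypothetical payload shape, modeled on the setPendingInitialMessage call above.
interface PendingInitialMessage {
  agentId: string
  text: string
  attachments: unknown[]
  createdAt: number
}

// Module-level, in-memory only: survives client-side navigation but not a reload,
// which is why the text is also duplicated into `?q=`.
const pendingByAgent = new Map<string, PendingInitialMessage>()

function setPendingMessage(message: PendingInitialMessage): void {
  pendingByAgent.set(message.agentId, message)
}

// Destructive read: a second call for the same agent returns null, so an
// effect that re-runs cannot send the same payload twice.
function consumePendingMessage(agentId: string): PendingInitialMessage | null {
  const message = pendingByAgent.get(agentId) ?? null
  pendingByAgent.delete(agentId)
  return message
}
```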
@@ -1,65 +0,0 @@
import { type FC, useMemo } from 'react'
import type {
HarnessAdapterDescriptor,
HarnessAgent,
HarnessAgentAdapter,
} from '@/entrypoints/app/agents/agent-harness-types'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
import { AgentRailRow } from './AgentRailRow'

interface AgentRailProps {
agents: HarnessAgent[]
adapters: HarnessAdapterDescriptor[]
activeAgentId: string
onSelectAgent: (agent: HarnessAgent) => void
onPinToggle: (agent: HarnessAgent, next: boolean) => void
}

/**
* Left-column scrollable list of agents. The "Agents" label + back
* button live in the shared top band above (so the rail header and
* the chat header sit on a single aligned strip rather than as two
* separately-sized headers per column). Sort matches `/agents`:
* pinned-first → recency, so the rail doesn't reshuffle as turns
* transition every 5 s.
*/
export const AgentRail: FC<AgentRailProps> = ({
agents,
adapters,
activeAgentId,
onSelectAgent,
onPinToggle,
}) => {
const adapterHealth = useMemo(() => {
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
for (const adapter of adapters) {
if (adapter.health) {
map.set(adapter.id, {
healthy: adapter.health.healthy,
reason: adapter.health.reason,
})
}
}
return map
}, [adapters])

const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])

return (
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
<div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
{ordered.map((agent) => (
<AgentRailRow
key={agent.id}
agent={agent}
active={agent.id === activeAgentId}
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
onSelect={() => onSelectAgent(agent)}
onPinToggle={(next) => onPinToggle(agent, next)}
/>
))}
</div>
</aside>
)
}
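The `AgentRail` doc comment says the rail sorts pinned agents first and then by recency, via `orderAgentsByPinThenRecency`. That helper's source is not in this diff, so the following is only a sketch of the ordering. It assumes `lastUsedAt` is an epoch-millisecond number and treats a missing value as oldest; `RailAgent` and `orderByPinThenRecency` are hypothetical names.

```typescript
// Hypothetical minimal agent shape; the real HarnessAgent has more fields.
interface RailAgent {
  id: string
  pinned?: boolean
  lastUsedAt?: number | null
}

// Pinned agents first, then most recently used. Ties keep input order
// because Array.prototype.sort is stable (guaranteed since ES2019).
function orderByPinThenRecency<T extends RailAgent>(agents: T[]): T[] {
  return [...agents].sort((a, b) => {
    const pinDiff = Number(b.pinned ?? false) - Number(a.pinned ?? false)
    if (pinDiff !== 0) return pinDiff
    return (b.lastUsedAt ?? 0) - (a.lastUsedAt ?? 0)
  })
}
```

Sorting a copy leaves the caller's array untouched, which matters when the input comes straight from a query cache.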
@@ -1,102 +0,0 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import { cn } from '@/lib/utils'

interface AgentRailRowProps {
agent: HarnessAgent
active: boolean
adapterHealth: AgentAdapterHealth | null
onSelect: () => void
onPinToggle: (next: boolean) => void
}

/**
* Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
* down to the essentials that fit a ~280 px rail: tile + name + status
* badge + pin star, with the adapter / model / reasoning chips on a
* second line. Token totals, sparkline, last-message preview all stay
* on the `/agents` page where rows are full-width.
*/
export const AgentRailRow: FC<AgentRailRowProps> = ({
agent,
active,
adapterHealth,
onSelect,
onPinToggle,
}) => {
const status = agent.status ?? 'unknown'
const lastUsedAt = agent.lastUsedAt ?? null
const pinned = agent.pinned ?? false
return (
<button
type="button"
onClick={onSelect}
className={cn(
'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
active
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
)}
>
<div className="flex min-w-0 items-start gap-3">
<AgentTile
adapter={agent.adapter}
status={status}
lastUsedAt={lastUsedAt}
/>
<div className="min-w-0 flex-1">
<div className="flex items-center gap-1.5">
<span className="truncate font-semibold text-[14px] leading-5">
{agent.name}
</span>
{status === 'working' && (
<Badge
variant="secondary"
className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
>
Working
</Badge>
)}
{status === 'asleep' && (
<Badge
variant="outline"
className="h-5 px-1.5 text-[10px] text-muted-foreground"
>
Asleep
</Badge>
)}
{status === 'error' && (
<Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
Attention
</Badge>
)}
<div className="ml-auto">
<PinToggle pinned={pinned} onToggle={onPinToggle} />
</div>
</div>
<AgentSummaryChips
adapter={agent.adapter}
modelLabel={agent.modelId ?? null}
reasoningEffort={agent.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</button>
)
}

/**
* Tooltip-only label helper kept exported in case the tile row needs to
* show "Codex agent" or similar in a future state. Inlined fallback for
* the rare `unknown` adapter rendering path.
*/
export function railRowAdapterLabel(agent: HarnessAgent): string {
return adapterLabel(agent.adapter)
}
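`AgentRailRow` renders at most one status badge through three `status === ...` branches. The same mapping can be written as data. The sketch below assumes a hypothetical `RailStatus` union; its `'idle'` member is illustrative and does not appear in the diff. The Tailwind classes are left out.

```typescript
// Hypothetical status union; the real HarnessAgent status type may differ.
type RailStatus = 'working' | 'asleep' | 'error' | 'idle' | 'unknown'

interface BadgeSpec {
  label: string
  variant: 'secondary' | 'outline' | 'destructive'
}

// Statuses without an entry render no badge, matching the three
// conditional branches in AgentRailRow.
const STATUS_BADGES: Partial<Record<RailStatus, BadgeSpec>> = {
  working: { label: 'Working', variant: 'secondary' },
  asleep: { label: 'Asleep', variant: 'outline' },
  error: { label: 'Attention', variant: 'destructive' },
}

function badgeForStatus(status: RailStatus): BadgeSpec | null {
  return STATUS_BADGES[status] ?? null
}
```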
@@ -27,14 +27,6 @@ interface AgentSelectorProps {
onSelectAgent: (agent: AgentEntry) => void
onCreateAgent?: () => void
status?: string
/**
* `'pill'` renders the filled-pill variant used by the calm
* composer on `/home` — bordered, slightly elevated background,
* mono agent name, used as the visual anchor on the left of the
* footer chip row. Default `'ghost'` keeps the existing flat
* shadcn ghost-button trigger used by the chat surface.
*/
triggerVariant?: 'ghost' | 'pill'
}

function getStatusDot(status?: string) {
@@ -50,49 +42,31 @@ export const AgentSelector: FC<AgentSelectorProps> = ({
onSelectAgent,
onCreateAgent,
status,
triggerVariant = 'ghost',
}) => {
const [open, setOpen] = useState(false)
const selectedAgent = agents.find(
(agent) => agent.agentId === selectedAgentId,
)

const triggerNode =
triggerVariant === 'pill' ? (
<button
type="button"
className={cn(
'inline-flex h-6 max-w-[180px] items-center gap-1.5 rounded-full border border-border bg-accent/40 pr-2 pl-2.5 text-[11.5px] text-foreground transition-colors',
'hover:border-border hover:bg-accent/70 data-[state=open]:border-border data-[state=open]:bg-accent/70',
)}
>
<span className={cn('size-1.5 rounded-full', getStatusDot(status))} />
<span className="truncate font-medium font-mono text-[11.5px] tracking-[-0.01em]">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="size-3 shrink-0 text-muted-foreground" />
</button>
) : (
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Bot className="h-4 w-4" />
<span className={cn('size-2 rounded-full', getStatusDot(status))} />
<span className="max-w-32 truncate">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="h-3 w-3" />
</Button>
)

return (
<Popover open={open} onOpenChange={setOpen}>
<PopoverTrigger asChild>{triggerNode}</PopoverTrigger>
<PopoverTrigger asChild>
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Bot className="h-4 w-4" />
<span className={cn('size-2 rounded-full', getStatusDot(status))} />
<span className="max-w-32 truncate">
{selectedAgent?.name ?? 'Select agent'}
</span>
<ChevronDown className="h-3 w-3" />
</Button>
</PopoverTrigger>
<PopoverContent side="bottom" align="start" className="w-72 p-0">
<Command>
<CommandInput placeholder="Search agents..." className="h-9" />
@@ -1,14 +1,12 @@
import { Bot, Loader2, RefreshCw } from 'lucide-react'
import { type FC, Fragment, useEffect, useRef } from 'react'
import { type FC, useEffect, useRef } from 'react'
import {
Conversation,
ConversationContent,
ConversationScrollButton,
} from '@/components/ai-elements/conversation'
import type { AgentConversationTurn } from '@/lib/agent-conversations/types'
import type { ProducedFilesRailGroup } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FileCardStrip } from './agent-conversation.file-card-strip'
import { ClawChatMessage } from './ClawChatMessage'
import { ConversationMessage } from './ConversationMessage'
import type { ClawChatMessage as ClawChatMessageModel } from './claw-chat-types'
@@ -17,29 +15,6 @@ interface ClawChatProps {
agentName: string
historyMessages: ClawChatMessageModel[]
turns: AgentConversationTurn[]
/**
* Persisted turns that still need to render their FileCardStrip
* because the history items they were filtered against don't
* carry produced-files data. Rendered between history and the
* live `turns` so the strip lands at the bottom of the
* corresponding assistant turn.
*/
stripOnlyTurns?: AgentConversationTurn[]
/**
* Maps each assistant history message id → the produced-files
* group that came from its turn. Built by
* `mapHistoryToProducedFilesGroups` upstream so the strip
* renders directly under the matching message instead of
* stacking at the conversation tail.
*/
filesByAssistantId?: Map<string, ProducedFilesRailGroup>
/**
* Produced-files groups that didn't match any persisted history
* pair (e.g. orphaned turns where history loaded after the
* group was attributed). Rendered at the conversation tail as
* a fallback so the user can still see them.
*/
tailStripGroups?: ReadonlyArray<ProducedFilesRailGroup>
streaming: boolean
isInitialLoading: boolean
error: Error | null
@@ -47,8 +22,6 @@ interface ClawChatProps {
isFetchingNextPage: boolean
onFetchNextPage: () => void
onRetry: () => void
/** Wired through to the inline file-card strip on each assistant turn. */
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
className?: string
}

@@ -105,9 +78,6 @@ export const ClawChat: FC<ClawChatProps> = ({
agentName,
historyMessages,
turns,
stripOnlyTurns,
filesByAssistantId,
tailStripGroups,
streaming,
isInitialLoading,
error,
@@ -115,7 +85,6 @@ export const ClawChat: FC<ClawChatProps> = ({
isFetchingNextPage,
onFetchNextPage,
onRetry,
onOpenOutputsRail,
className,
}) => {
const topSentinelRef = useRef<HTMLDivElement>(null)
@@ -178,44 +147,14 @@ export const ClawChat: FC<ClawChatProps> = ({
Start of conversation
</div>
) : null}
{historyMessages.map((message) => {
const matched = filesByAssistantId?.get(message.id)
return (
<Fragment key={message.id}>
<ClawChatMessage message={message} />
{matched ? (
<FileCardStrip
turnId={matched.turnId}
files={matched.files}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
) : null}
</Fragment>
)
})}
{(tailStripGroups ?? []).map((group) => (
<FileCardStrip
key={`tail-strip-${group.turnId}`}
turnId={group.turnId}
files={group.files}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
))}
{(stripOnlyTurns ?? []).map((turn) => (
<ConversationMessage
key={`strip-${turn.id}`}
turn={turn}
streaming={false}
stripOnly
onOpenOutputsRail={onOpenOutputsRail}
/>
{historyMessages.map((message) => (
<ClawChatMessage key={message.id} message={message} />
))}
{turns.map((turn, index) => (
<ConversationMessage
key={turn.id}
turn={turn}
streaming={streaming && index === turns.length - 1}
onOpenOutputsRail={onOpenOutputsRail}
/>
))}
{error ? (

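The `filesByAssistantId` and `tailStripGroups` props in the hunk above describe a split of produced-files groups: each group either renders directly under the assistant message from its own turn, or falls back to the conversation tail. Below is a minimal sketch of that split, assuming each group carries a `turnId` and that a turn-id to assistant-message-id lookup is available. `placeFileStrips`, `assistantIdByTurnId`, and the local types are hypothetical stand-ins for illustration, not the upstream `mapHistoryToProducedFilesGroups`.

```typescript
// Local stand-ins for illustration only; the real ProducedFilesRailGroup
// lives in '@/lib/agent-files' and may carry more fields.
interface ProducedFile {
  id: string
  path: string
  size: number
}

interface FilesGroup {
  turnId: string
  files: ProducedFile[]
}

interface StripPlacement {
  // assistant message id -> group rendered directly under that message
  byAssistantId: Map<string, FilesGroup>
  // groups with no matching history message, rendered at the tail
  tail: FilesGroup[]
}

// Hypothetical helper: place each group under the assistant message its
// turn produced; fall back to the tail when no message matches or the
// message already has a group attached.
function placeFileStrips(
  groups: ReadonlyArray<FilesGroup>,
  assistantIdByTurnId: ReadonlyMap<string, string>,
): StripPlacement {
  const byAssistantId = new Map<string, FilesGroup>()
  const tail: FilesGroup[] = []
  for (const group of groups) {
    const assistantId = assistantIdByTurnId.get(group.turnId)
    if (assistantId !== undefined && !byAssistantId.has(assistantId)) {
      byAssistantId.set(assistantId, group)
    } else {
      tail.push(group)
    }
  }
  return { byAssistantId, tail }
}
```

Sending a second group for an already-claimed message to the tail is a choice made by this sketch; the real helper may resolve such collisions differently.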
@@ -1,187 +0,0 @@
import { ArrowLeft, Home } from 'lucide-react'
import type { FC, ReactNode } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
import { cn } from '@/lib/utils'

interface ConversationHeaderProps {
agent: HarnessAgent | null
fallbackName: string
fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'hermes' | 'unknown'
adapterHealth: AgentAdapterHealth | null
backLabel: string
backTarget: 'home' | 'page'
onGoHome: () => void
onPinToggle: (next: boolean) => void
/** Optional trailing slot — currently used for the Outputs rail toggle. */
headerExtra?: ReactNode
}

/**
* Strip above the chat. Mirrors the `/agents` row card's title row +
* summary chips so the user gets adapter health, pin state, and status
* at a glance — but adds the meta line (last used · lifetime tokens ·
* queued) that's specific to this surface.
*
* The mobile `lg:hidden` Back button is preserved so the small-screen
* collapse keeps a navigable header without a sidebar.
*/
export const ConversationHeader: FC<ConversationHeaderProps> = ({
agent,
fallbackName,
fallbackAdapter,
adapterHealth,
backLabel,
backTarget,
onGoHome,
onPinToggle,
headerExtra,
}) => {
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
const adapter = agent?.adapter ?? fallbackAdapter
const status: AgentLiveness = agent?.status ?? 'unknown'
const lastUsedAt = agent?.lastUsedAt ?? null
const pinned = agent?.pinned ?? false
const queueCount = agent?.queue?.length ?? 0
const tokens = agent?.tokens ?? null
const lifetimeTotal = tokens
? tokens.cumulative.input + tokens.cumulative.output
: 0

const metaParts: string[] = []
if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
if (queueCount > 0) {
metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
}

return (
<div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
<div className="flex min-w-0 items-center gap-3">
<Button
variant="ghost"
size="icon"
onClick={onGoHome}
className="size-8 shrink-0 rounded-xl lg:hidden"
title={backLabel}
>
<BackIcon className="size-4" />
</Button>
<div className="group min-w-0 flex-1">
<div className="flex items-center gap-2">
<span className="truncate font-semibold text-[15px] leading-6">
{agent?.name || fallbackName}
</span>
{agent ? (
<PinToggle pinned={pinned} onToggle={onPinToggle} />
) : null}
</div>
<div className="mt-0.5 flex items-center gap-2">
<AgentSummaryChips
adapter={adapter}
modelLabel={agent?.modelId ?? null}
reasoningEffort={agent?.reasoningEffort ?? null}
adapterHealth={adapterHealth}
/>
</div>
</div>
</div>
<div className="flex shrink-0 items-center gap-3">
<div className="flex shrink-0 flex-col items-end gap-1">
<StatusPill
status={status}
hasActiveTurn={Boolean(agent?.activeTurnId)}
/>
<div className="flex h-4 items-center text-[11px] text-muted-foreground">
<span className="truncate">
{metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
</span>
</div>
</div>
{headerExtra ? (
<div className="flex shrink-0 items-center">{headerExtra}</div>
) : null}
</div>
</div>
)
}

interface StatusPillProps {
status: AgentLiveness
hasActiveTurn: boolean
}

/**
* Working / Asleep / Attention all get distinctive styling; idle keeps
* the legacy emerald `Ready` pill so the default state is visually
* calm. Defensive working: `idle + activeTurnId` falls through to the
* working pill since the server says a turn is in flight.
*/
const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
const effective: AgentLiveness =
status === 'idle' && hasActiveTurn ? 'working' : status

const base =
'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'

if (effective === 'working') {
return (
<Badge
variant="secondary"
className={cn(
base,
'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
)}
>
<span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
Working
</Badge>
)
}
if (effective === 'asleep') {
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/50" />
Asleep
</Badge>
)
}
if (effective === 'error') {
return (
<Badge
variant="destructive"
className={cn(base, 'border-destructive/30')}
>
<span className="size-1.5 rounded-full bg-destructive-foreground" />
Attention
</Badge>
)
}
if (effective === 'idle') {
return (
<Badge
variant="outline"
className={cn(
base,
'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
)}
>
<span className="size-1.5 rounded-full bg-emerald-500" />
Ready
</Badge>
)
}
return (
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
<span className="size-1.5 rounded-full bg-muted-foreground/30" />
Setup
</Badge>
)
}
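
`StatusPill` and the `metaParts` block in `ConversationHeader` are pure decisions wrapped in JSX. The sketch below restates both as plain functions, assuming the liveness union contains at least `'idle' | 'working' | 'asleep' | 'error' | 'unknown'` (the real `AgentLiveness` type is not shown in this diff). `statusPillLabel` and `headerMetaLine` are hypothetical names, and pre-formatted strings stand in for `formatRelativeTime` and `formatTokens`, whose exact output is not shown here.

```typescript
// Local stand-in; the real AgentLiveness union is exported from
// '@/entrypoints/app/agents/LivenessDot' and may contain other members.
type Liveness = 'idle' | 'working' | 'asleep' | 'error' | 'unknown'

// Mirrors StatusPill's branching: idle with an in-flight turn reads as working,
// and anything unrecognised falls through to the "Setup" pill.
function statusPillLabel(status: Liveness, hasActiveTurn: boolean): string {
  const effective: Liveness =
    status === 'idle' && hasActiveTurn ? 'working' : status
  switch (effective) {
    case 'working':
      return 'Working'
    case 'asleep':
      return 'Asleep'
    case 'error':
      return 'Attention'
    case 'idle':
      return 'Ready'
    default:
      return 'Setup'
  }
}

// Mirrors the metaParts assembly. Pass null to skip a part (the source skips
// the token part when the lifetime total is 0).
function headerMetaLine(
  lastUsed: string | null,
  lifetimeTokens: string | null,
  queueCount: number,
): string {
  const parts: string[] = []
  if (lastUsed !== null) parts.push(lastUsed)
  if (lifetimeTokens !== null) parts.push(`${lifetimeTokens} tokens`)
  if (queueCount > 0) parts.push(`${queueCount} queued`)
  // As in the source, an empty meta line renders a non-breaking space.
  return parts.length > 0 ? parts.join(' · ') : '\u00A0'
}
```

The source's `queueCount === 1 ? '1 queued' : ...` produces the same string as the single template used here, so the sketch collapses the two branches.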
@@ -164,16 +164,7 @@ function VoiceButton({
)
}

/**
* Calm-composer footer shared by both `/home` (`variant="home"`) and
* the chat surface at `/agents/:agentId` (`variant="conversation"`).
* Pill-shaped chips on an internal dashed divider, with a right-
* aligned keyboard hint. The agent selector is conditional via
* `showAgentSelector`: home shows it as a filled pill on the left,
* the chat surface hides it (the agent is locked once you're in the
* conversation).
*/
function CalmContextControls({
function ContextControls({
agents,
onCreateAgent,
onSelectAgent,
@@ -210,128 +201,110 @@ function CalmContextControls({
)?.is_authenticated
})

const showApps = supports(Feature.MANAGED_MCP_SUPPORT)
const showWorkspace = supports(Feature.WORKSPACE_FOLDER_SUPPORT)

return (
<div className="mx-3 flex items-center gap-1 border-border/60 border-t border-dashed py-2">
{showAgentSelector ? (
<>
<div className="flex items-center justify-between border-border/40 border-t px-4 py-2.5">
<div className="flex items-center gap-1">
{showAgentSelector ? (
<AgentSelector
agents={agents}
selectedAgentId={selectedAgentId}
onSelectAgent={onSelectAgent}
onCreateAgent={onCreateAgent}
status={status}
triggerVariant="pill"
/>
<span
aria-hidden="true"
className="mx-1 inline-block h-3.5 w-px shrink-0 bg-border"
/>
</>
) : null}
{showWorkspace ? (
<WorkspaceSelector>
<button
type="button"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground data-[state=open]:bg-accent data-[state=open]:text-foreground"
>
<Folder className="size-3" />
<span>Workspace</span>
<span className="font-mono text-[10.5px] text-muted-foreground/70">
{selectedFolder?.name ?? 'none'}
</span>
</button>
</WorkspaceSelector>
) : null}
<TabPickerPopover
variant="selector"
selectedTabs={selectedTabs}
onToggleTab={onToggleTab}
>
<button
type="button"
className={cn(
'inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] transition-colors data-[state=open]:bg-accent data-[state=open]:text-foreground',
selectedTabs.length > 0
? 'bg-[var(--accent-orange)] text-white hover:bg-[var(--accent-orange)]/90'
: 'text-muted-foreground hover:bg-accent hover:text-foreground',
)}
) : null}
{supports(Feature.WORKSPACE_FOLDER_SUPPORT) ? (
<WorkspaceSelector>
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<Folder className="h-4 w-4" />
<span>{selectedFolder?.name || 'Add workspace'}</span>
<ChevronDown className="h-3 w-3" />
</Button>
</WorkspaceSelector>
) : null}
<TabPickerPopover
variant="selector"
selectedTabs={selectedTabs}
onToggleTab={onToggleTab}
>
<Layers className="size-3" />
<span>Tabs</span>
<span
<Button
className={cn(
'font-mono text-[10.5px]',
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
selectedTabs.length > 0
? 'text-white/80'
: 'text-muted-foreground/70',
? 'bg-[var(--accent-orange)]! text-white shadow-sm'
: 'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
{selectedTabs.length}
</span>
</button>
</TabPickerPopover>
<button
type="button"
onClick={onAttachClick}
disabled={attachDisabled || !attachmentsEnabled}
title="Attach files"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground disabled:cursor-not-allowed disabled:opacity-50"
>
<Paperclip className="size-3" />
<span>Attach</span>
</button>
{showApps ? (
<AppSelector side="bottom">
<button
type="button"
className="inline-flex h-6 items-center gap-1.5 rounded-full px-2.5 text-[11.5px] text-muted-foreground transition-colors hover:bg-accent hover:text-foreground data-[state=open]:bg-accent data-[state=open]:text-foreground"
>
{connectedManagedServers.length > 0 ? (
<span className="flex items-center -space-x-1.5">
<Layers className="h-4 w-4" />
<span>Tabs</span>
</Button>
</TabPickerPopover>
<Button
type="button"
variant="ghost"
onClick={onAttachClick}
disabled={attachDisabled || !attachmentsEnabled}
title="Attach files"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
)}
>
<Paperclip className="h-4 w-4" />
<span>Attach</span>
</Button>
</div>

{supports(Feature.MANAGED_MCP_SUPPORT) ? (
<div className="ml-auto flex items-center gap-1.5">
<AppSelector side="bottom">
<Button
variant="ghost"
className={cn(
'flex items-center gap-2 rounded-lg px-3 py-1.5 font-medium text-sm transition-all',
'bg-transparent text-muted-foreground hover:bg-accent hover:text-accent-foreground',
'data-[state=open]:bg-accent',
)}
>
<div className="flex items-center -space-x-1.5">
{connectedManagedServers.slice(0, 4).map((server) => (
<span
<div
key={server.id}
className="rounded-full ring-2 ring-card"
>
<McpServerIcon
serverName={server.managedServerName ?? ''}
size={12}
size={16}
/>
</span>
</div>
))}
</span>
) : (
<FileText className="size-3" />
)}
<span>Apps</span>
<ChevronDown className="size-3" />
</button>
</AppSelector>
</div>
{connectedManagedServers.length > 4 ? (
<span className="text-xs">
+{connectedManagedServers.length - 4}
</span>
) : null}
<span>Apps</span>
<ChevronDown className="h-3 w-3" />
</Button>
</AppSelector>
</div>
) : null}
<div className="ml-auto inline-flex shrink-0 items-center gap-1.5 text-[11px] text-muted-foreground/70">
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
↵
</kbd>
<span>to run</span>
<span className="text-muted-foreground/40">·</span>
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
⇧
</kbd>
<kbd className="inline-flex h-4 min-w-4 items-center justify-center rounded border border-border bg-accent/30 px-1 font-mono text-[10px] text-muted-foreground">
↵
</kbd>
<span>new line</span>
</div>
</div>
)
}

function HomeShell({ children }: { children: ReactNode }) {
return (
<div className="overflow-hidden rounded-[1.55rem] border border-border/60 bg-card/95 shadow-sm transition-[border-color,box-shadow] duration-150 focus-within:border-[var(--accent-orange)]/40 focus-within:shadow-[0_0_0_4px_color-mix(in_oklch,var(--accent-orange)_15%,transparent),0_1px_2px_rgba(15,23,42,0.04)]">
<div className="overflow-hidden rounded-[1.55rem] border border-border/60 bg-card/95 shadow-sm">
{children}
</div>
)
@@ -339,7 +312,7 @@ function HomeShell({ children }: { children: ReactNode }) {

function ConversationShell({ children }: { children: ReactNode }) {
return (
<div className="overflow-hidden rounded-[1.35rem] border border-border/50 bg-background/95 shadow-[0_10px_30px_rgba(15,23,42,0.06)] backdrop-blur-md transition-[border-color,box-shadow] duration-150 focus-within:border-[var(--accent-orange)]/40 focus-within:shadow-[0_0_0_4px_color-mix(in_oklch,var(--accent-orange)_15%,transparent),0_10px_30px_rgba(15,23,42,0.06)]">
<div className="overflow-hidden rounded-[1.35rem] border border-border/50 bg-background/95 shadow-[0_10px_30px_rgba(15,23,42,0.06)] backdrop-blur-md">
{children}
</div>
)
@@ -569,7 +542,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
}
disabled={disabled || voice.isTranscribing}
className={cn(
'resize-none border-none bg-transparent px-0 text-[15px] shadow-none focus-visible:ring-0 dark:bg-transparent',
'resize-none border-none bg-transparent px-0 text-[15px] shadow-none focus-visible:ring-0',
'[field-sizing:fixed]',
variant === 'home'
? 'min-h-[40px] py-2 leading-6'
@@ -610,7 +583,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
{voice.error}
</div>
) : null}
<CalmContextControls
<ContextControls
agents={agents}
onCreateAgent={onCreateAgent}
onSelectAgent={onSelectAgent}

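The calm-composer footer advertises "↵ to run" and "⇧ ↵ new line", but the textarea's key handler itself is not part of this diff. A minimal sketch of the predicate such a handler might use, assuming plain Enter submits and Shift+Enter falls through to the browser's default newline. `KeyPress` and `shouldSubmitOnKeyDown` are hypothetical names, and skipping Enter during IME composition is an added assumption, not something the source states.

```typescript
// Hypothetical shape; a DOM KeyboardEvent satisfies it structurally.
interface KeyPress {
  key: string
  shiftKey: boolean
  isComposing?: boolean
}

// Enter submits ("↵ to run"); Shift+Enter is left to the textarea so it
// inserts a newline ("⇧ ↵ new line").
function shouldSubmitOnKeyDown(e: KeyPress): boolean {
  // Assumption: never submit while an IME composition is still in progress.
  if (e.isComposing) return false
  return e.key === 'Enter' && !e.shiftKey
}
```

In an `onKeyDown` handler, a `true` result would typically be paired with `preventDefault()` so the Enter keystroke does not also add a line break before submitting.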
@@ -22,26 +22,10 @@ import type {
AgentConversationTurn,
ToolEntry,
} from '@/lib/agent-conversations/types'
import { FileCardStrip } from './agent-conversation.file-card-strip'

interface ConversationMessageProps {
turn: AgentConversationTurn
streaming: boolean
/**
* Forwarded to the inline file-card strip's "View" / "+N"
* button. Wired up by AgentCommandConversation so the strip can
* deep-link straight into the Outputs rail at the matching turn
* group. `null` here disables the strip's deep-link affordance
* — the cards still open the preview Sheet directly.
*/
onOpenOutputsRail?: ((turnId?: string | null) => void) | null
/**
* Render only the trailing FileCardStrip for this turn — used
* when the turn's user / assistant text is already rendered
* elsewhere (e.g. by `ClawChatMessage` from persisted history)
* but the produced-files affordance would otherwise be lost.
*/
stripOnly?: boolean
}

interface RenderEntry {
@@ -104,22 +88,9 @@ function ToolStatusIcon({ status }: { status: ToolEntry['status'] }) {
export const ConversationMessage: FC<ConversationMessageProps> = ({
turn,
streaming,
onOpenOutputsRail,
stripOnly,
}) => {
const entries = useMemo(() => buildRenderEntries(turn), [turn])

if (stripOnly) {
if (!turn.producedFiles || turn.producedFiles.length === 0) return null
return (
<FileCardStrip
turnId={turn.turnId ?? null}
files={turn.producedFiles}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
)
}

return (
<div className="space-y-3">
<Message from="user">
@@ -214,14 +185,6 @@ export const ConversationMessage: FC<ConversationMessageProps> = ({
</Message>
)}

{turn.producedFiles && turn.producedFiles.length > 0 ? (
<FileCardStrip
turnId={turn.turnId ?? null}
files={turn.producedFiles}
onOpenRail={onOpenOutputsRail ?? (() => {})}
/>
) : null}

{!turn.done && turn.parts.length === 0 && streaming && (
<div className="flex gap-2">
<div className="flex size-7 shrink-0 items-center justify-center rounded-full bg-[var(--accent-orange)] text-white">

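`ConversationMessage` has three outcomes depending on `stripOnly` and whether the turn produced files. The sketch below reduces the early-return logic in the hunk above to a plain function, assuming only the `producedFiles` field matters for the decision. `TurnLike`, `RenderPlan`, and `planConversationMessage` are hypothetical names for illustration.

```typescript
// Local stand-in for the parts of AgentConversationTurn this sketch reads.
interface TurnLike {
  turnId?: string | null
  producedFiles?: ReadonlyArray<{ id: string; path: string; size: number }>
}

interface RenderPlan {
  showBody: boolean // user message, assistant parts, tool entries
  showStrip: boolean // trailing FileCardStrip
}

// stripOnly: render only the strip, or nothing when there are no files.
// Otherwise: render the full turn, with the strip appended when files exist.
function planConversationMessage(
  turn: TurnLike,
  stripOnly: boolean,
): RenderPlan | null {
  const hasFiles = (turn.producedFiles?.length ?? 0) > 0
  if (stripOnly) return hasFiles ? { showBody: false, showStrip: true } : null
  return { showBody: true, showStrip: hasFiles }
}
```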
@@ -1,124 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* @deprecated Replaced by `FileCardStrip` in
* `agent-conversation.file-card-strip.tsx`. Kept temporarily so
* any in-flight callers don't fail to import; remove in a
* follow-up once nothing external references it.
*
* Compact "Files produced" card rendered under an assistant turn.
*/

import { FileText, Image as ImageIcon, Paperclip } from 'lucide-react'
import { type FC, useMemo, useState } from 'react'
import { Button } from '@/components/ui/button'
import { basenameOf, formatFileSize, inferFileKind } from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'

export interface ProducedFileLike {
id: string
path: string
size: number
}

interface ArtifactCardProps {
files: ReadonlyArray<ProducedFileLike>
className?: string
}

const MAX_INLINE_ROWS = 4

export const ArtifactCard: FC<ArtifactCardProps> = ({ files, className }) => {
const [openFileId, setOpenFileId] = useState<string | null>(null)
const [expanded, setExpanded] = useState(false)

const sortedFiles = useMemo(
() => [...files].sort((a, b) => a.path.localeCompare(b.path)),
[files],
)

if (sortedFiles.length === 0) return null

const visible = expanded ? sortedFiles : sortedFiles.slice(0, MAX_INLINE_ROWS)
const hiddenCount = sortedFiles.length - visible.length
const openFile = sortedFiles.find((file) => file.id === openFileId) ?? null

return (
<div
className={cn(
'rounded-xl border border-border/60 bg-card/50 px-3 py-2.5',
className,
)}
>
<div className="mb-2 flex items-center gap-2 text-muted-foreground text-xs">
<Paperclip className="size-3.5" />
<span className="font-medium text-foreground">
{sortedFiles.length === 1
? '1 file produced'
: `${sortedFiles.length} files produced`}
</span>
</div>

<ul className="flex flex-col gap-1">
{visible.map((file) => (
<li key={file.id}>
<ArtifactRow file={file} onOpen={() => setOpenFileId(file.id)} />
</li>
))}
</ul>

{hiddenCount > 0 ? (
<Button
type="button"
variant="ghost"
size="sm"
className="mt-1.5 h-7 px-2 text-xs"
onClick={() => setExpanded(true)}
>
Show {hiddenCount} more
</Button>
) : null}

<FilePreviewSheet
fileId={openFile?.id ?? null}
filePath={openFile?.path ?? null}
open={Boolean(openFileId)}
onOpenChange={(next) => {
if (!next) setOpenFileId(null)
}}
/>
</div>
)
}

function ArtifactRow({
file,
onOpen,
}: {
file: ProducedFileLike
onOpen: () => void
}) {
const name = basenameOf(file.path)
const kind = inferFileKind(file.path)
const Icon = kind === 'image' ? ImageIcon : FileText

return (
<button
type="button"
onClick={onOpen}
className={cn(
'flex w-full items-center gap-2 rounded-md px-2 py-1.5 text-left text-sm transition-colors',
'hover:bg-accent/60 focus:bg-accent/60 focus:outline-hidden',
)}
>
<Icon className="size-3.5 shrink-0 text-muted-foreground" />
<span className="min-w-0 flex-1 truncate font-medium">{name}</span>
<span className="shrink-0 text-muted-foreground text-xs tabular-nums">
{formatFileSize(file.size)}
</span>
</button>
)
}
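
The deprecated `ArtifactCard` derives everything it renders from `files` and one `expanded` flag: a path-sorted copy, the first `MAX_INLINE_ROWS` rows unless expanded, the hidden count for "Show N more", and a pluralised heading. A sketch of that derived state as a pure function, assuming the same `ProducedFileLike` shape; `artifactCardView` is a hypothetical name, not part of the component.

```typescript
interface ProducedFileLike {
  id: string
  path: string
  size: number
}

const MAX_INLINE_ROWS = 4

// Pure restatement of ArtifactCard's derived state (not the component).
function artifactCardView(
  files: ReadonlyArray<ProducedFileLike>,
  expanded: boolean,
) {
  // Copy first: Array.prototype.sort sorts in place.
  const sorted = [...files].sort((a, b) => a.path.localeCompare(b.path))
  const visible = expanded ? sorted : sorted.slice(0, MAX_INLINE_ROWS)
  const hiddenCount = sorted.length - visible.length
  const heading =
    sorted.length === 1 ? '1 file produced' : `${sorted.length} files produced`
  return { visible, hiddenCount, heading }
}
```

Once expanded, `hiddenCount` is 0, which is why the component's "Show N more" button disappears after one click and there is no collapse path.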
@@ -1,163 +0,0 @@
|
||||
/**
|
||||
* @license
|
||||
* Copyright 2025 BrowserOS
|
||||
* SPDX-License-Identifier: AGPL-3.0-or-later
|
||||
*
|
||||
* "Files produced" strip rendered at the bottom of any assistant
|
||||
* turn that produced files (openclaw only). Replaces Phase 5.3's
|
||||
* row-list ArtifactCard with small horizontal cards for a lighter
|
||||
* visual treatment.
|
||||
*
|
||||
* Click semantics:
|
||||
* - Card → opens FilePreviewSheet directly (preview + download).
|
||||
* - View → emits onOpenRail(turnId); the parent opens the rail
|
||||
* and scrolls to the matching turn group.
|
||||
* - +N → same as View (the user is asking to see what was
|
||||
* overflowed).
|
||||
*/
|
||||
|
||||
import { ChevronRight, FileText, Image as ImageIcon } from 'lucide-react'
|
||||
import { type FC, useMemo, useState } from 'react'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import { basenameOf, formatFileSize, inferFileKind } from '@/lib/agent-files'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'
|
||||
|
||||
export interface CardStripFile {
|
||||
id: string
|
||||
path: string
|
||||
size: number
|
||||
}
|
||||
|
||||
interface FileCardStripProps {
|
||||
/**
|
||||
* The turn id that produced these files. Forwarded to
|
||||
* `onOpenRail` so the rail can scroll/expand the matching group.
|
||||
* Optional because the live `produced_files` event lands before
|
||||
* the harness has stamped a server-issued turn id on the
|
||||
   * optimistic turn — in that brief window, View falls back to
   * just opening the rail at the top.
   */
  turnId?: string | null
  files: ReadonlyArray<CardStripFile>
  /** Caller wires this to `setOutputsRailOpen(true)` + deep-link. */
  onOpenRail: (turnId?: string | null) => void
  className?: string
}

const MAX_VISIBLE = 4

export const FileCardStrip: FC<FileCardStripProps> = ({
  turnId,
  files,
  onOpenRail,
  className,
}) => {
  const [openFileId, setOpenFileId] = useState<string | null>(null)

  const sortedFiles = useMemo(
    () => [...files].sort((a, b) => a.path.localeCompare(b.path)),
    [files],
  )

  if (sortedFiles.length === 0) return null

  const visible = sortedFiles.slice(0, MAX_VISIBLE)
  const hiddenCount = sortedFiles.length - visible.length
  const openFile = sortedFiles.find((file) => file.id === openFileId) ?? null

  return (
    <div
      className={cn(
        'rounded-xl border border-border/60 bg-card/50 px-3 py-2.5',
        className,
      )}
    >
      <div className="mb-2 flex items-center gap-2">
        <span className="text-muted-foreground text-xs">
          {sortedFiles.length === 1
            ? 'File produced'
            : `Files produced (${sortedFiles.length})`}
        </span>
        <Button
          type="button"
          variant="ghost"
          size="sm"
          className="ml-auto h-7 gap-1 px-2 text-xs"
          onClick={() => onOpenRail(turnId ?? null)}
        >
          View
          <ChevronRight className="size-3" />
        </Button>
      </div>

      <div className="flex flex-wrap gap-2">
        {visible.map((file) => (
          <FileCard
            key={file.id}
            file={file}
            onOpen={() => setOpenFileId(file.id)}
          />
        ))}
        {hiddenCount > 0 ? (
          <button
            type="button"
            onClick={() => onOpenRail(turnId ?? null)}
            className={cn(
              'flex h-[56px] min-w-[56px] shrink-0 items-center justify-center rounded-lg border border-border/60 px-3 text-muted-foreground text-xs',
              'transition-colors hover:border-border hover:bg-accent/40 hover:text-foreground',
              'focus:outline-hidden focus-visible:ring-2 focus-visible:ring-[var(--accent-orange)]',
            )}
            title={`See ${hiddenCount} more in the Outputs rail`}
          >
            +{hiddenCount}
          </button>
        ) : null}
      </div>

      <FilePreviewSheet
        fileId={openFile?.id ?? null}
        filePath={openFile?.path ?? null}
        open={Boolean(openFileId)}
        onOpenChange={(next) => {
          if (!next) setOpenFileId(null)
        }}
      />
    </div>
  )
}

function FileCard({
  file,
  onOpen,
}: {
  file: CardStripFile
  onOpen: () => void
}) {
  const name = basenameOf(file.path)
  const kind = inferFileKind(file.path)
  const Icon = kind === 'image' ? ImageIcon : FileText

  return (
    <button
      type="button"
      onClick={onOpen}
      title={file.path}
      className={cn(
        'flex h-[56px] w-[140px] shrink-0 flex-col justify-between rounded-lg border border-border/60 bg-background px-2.5 py-1.5 text-left',
        'transition-colors hover:border-border hover:bg-accent/40',
        'focus:outline-hidden focus-visible:ring-2 focus-visible:ring-[var(--accent-orange)]',
      )}
    >
      <div className="flex min-w-0 items-center gap-1.5">
        <Icon className="size-3.5 shrink-0 text-muted-foreground" />
        <span className="min-w-0 flex-1 truncate font-medium text-xs">
          {name}
        </span>
      </div>
      <span className="text-[11px] text-muted-foreground tabular-nums">
        {formatFileSize(file.size)}
      </span>
    </button>
  )
}
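The strip's overflow rule (sort by path, show at most `MAX_VISIBLE` cards, fold the rest into a "+N" button) does not depend on React. A minimal sketch of that split as a pure function, assuming a simplified file shape with only `id` and `path`; `StripFile` and `splitForStrip` are hypothetical names used for illustration, not exports of the codebase.

```typescript
// Hypothetical, simplified stand-in for CardStripFile: only the
// fields the split logic reads.
interface StripFile {
  id: string
  path: string
}

interface StripSplit {
  visible: StripFile[]
  hiddenCount: number
}

// Copy before sorting (Array.prototype.sort works in place), order by
// path, then cap the number of visible cards.
function splitForStrip(
  files: ReadonlyArray<StripFile>,
  maxVisible: number,
): StripSplit {
  const sorted = [...files].sort((a, b) => a.path.localeCompare(b.path))
  const visible = sorted.slice(0, maxVisible)
  return { visible, hiddenCount: sorted.length - visible.length }
}

const files: StripFile[] = [
  { id: '3', path: 'out/c.md' },
  { id: '1', path: 'out/a.md' },
  { id: '5', path: 'out/e.png' },
  { id: '2', path: 'out/b.csv' },
  { id: '4', path: 'out/d.txt' },
]

const split = splitForStrip(files, 4)
// Four cards ('1' to '4') plus a "+1" overflow button for 'out/e.png'.
console.log(split.visible.map((f) => f.id).join(','), split.hiddenCount)
```

Copying before sorting matters because the `files` prop belongs to the parent; sorting it in place would reorder the caller's data as a side effect of rendering.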
@@ -1,283 +0,0 @@
/**
 * @license
 * Copyright 2025 BrowserOS
 * SPDX-License-Identifier: AGPL-3.0-or-later
 *
 * Shared preview drawer used by the inline artifact card AND the
 * Outputs rail. Branches on the FilePreview discriminated union and
 * renders the appropriate body. Always opens via a controlled
 * `open`/`onOpenChange` pair so the parent owns the selected file.
 */

import { Download, FileWarning, Loader2 } from 'lucide-react'
import { type FC, useEffect, useMemo, useRef } from 'react'
import { toast } from 'sonner'
import { MessageResponse } from '@/components/ai-elements/message'
import { Button } from '@/components/ui/button'
import { ScrollArea } from '@/components/ui/scroll-area'
import {
  Sheet,
  SheetContent,
  SheetDescription,
  SheetHeader,
  SheetTitle,
} from '@/components/ui/sheet'
import { Skeleton } from '@/components/ui/skeleton'
import {
  basenameOf,
  buildFileDownloadUrl,
  extensionOf,
  type FilePreview,
  formatFileSize,
  useFilePreview,
} from '@/lib/agent-files'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import { cn } from '@/lib/utils'

interface FilePreviewSheetProps {
  fileId: string | null
  filePath: string | null
  open: boolean
  onOpenChange: (open: boolean) => void
}

const MARKDOWN_EXTENSIONS = new Set(['md', 'markdown', 'mdx'])

export const FilePreviewSheet: FC<FilePreviewSheetProps> = ({
  fileId,
  filePath,
  open,
  onOpenChange,
}) => {
  const { baseUrl } = useAgentServerUrl()
  const { preview, loading, error } = useFilePreview(fileId, open)

  const fileName = filePath ? basenameOf(filePath) : 'File preview'
  const downloadUrl = useMemo(() => {
    if (!baseUrl || !fileId) return null
    return buildFileDownloadUrl(baseUrl, fileId)
  }, [baseUrl, fileId])

  // Surface preview-load failures in a toast in addition to the
  // inline error block — the inline UI lives at the bottom of the
  // sheet and is easy to miss when scrolled into the body.
  const lastToastedFileIdRef = useRef<string | null>(null)
  useEffect(() => {
    if (!open) {
      lastToastedFileIdRef.current = null
      return
    }
    if (!error || !fileId) return
    if (lastToastedFileIdRef.current === fileId) return
    lastToastedFileIdRef.current = fileId
    toast.error('Could not load preview', { description: error.message })
  }, [open, error, fileId])

  const handleDownload = () => {
    if (!downloadUrl) {
      toast.error("Couldn't reach the agent server", {
        description: 'Reconnect to BrowserOS and try again.',
      })
      return
    }
    // Manually trigger the download so any future failure (e.g. the
    // server returns 404 because the file was removed) can be
    // surfaced via toast — the bare <a download> path swallows
    // these errors silently.
    const link = document.createElement('a')
    link.href = downloadUrl
    link.download = fileName
    link.rel = 'noopener'
    document.body.appendChild(link)
    link.click()
    link.remove()
  }

  return (
    <Sheet open={open} onOpenChange={onOpenChange}>
      <SheetContent
        side="right"
        className="flex w-full flex-col gap-0 p-0 sm:max-w-xl"
      >
        <SheetHeader className="border-border/60 border-b px-5 py-4">
          <SheetTitle className="truncate pr-8">{fileName}</SheetTitle>
          <SheetDescription className="truncate">
            {filePath ?? ''}
          </SheetDescription>
        </SheetHeader>

        <ScrollArea className="min-h-0 flex-1">
          <div className="px-5 py-4">
            {loading ? (
              <PreviewSkeleton />
            ) : error ? (
              <PreviewError message={error.message} />
            ) : preview ? (
              <PreviewBody
                preview={preview}
                filePath={filePath}
                downloadUrl={downloadUrl}
              />
            ) : null}
          </div>
        </ScrollArea>

        {fileId ? (
          <div className="border-border/60 border-t bg-background/90 px-5 py-3 backdrop-blur">
            <Button
              type="button"
              size="sm"
              className="w-full gap-2"
              onClick={handleDownload}
            >
              <Download className="size-3.5" />
              Download
            </Button>
          </div>
        ) : null}
      </SheetContent>
    </Sheet>
  )
}

function PreviewSkeleton() {
  return (
    <div className="flex flex-col gap-2">
      <div className="flex items-center gap-2 text-muted-foreground text-xs">
        <Loader2 className="size-3.5 animate-spin" />
        Loading preview...
      </div>
      <Skeleton className="h-4 w-3/4" />
      <Skeleton className="h-4 w-full" />
      <Skeleton className="h-4 w-5/6" />
      <Skeleton className="h-4 w-2/3" />
    </div>
  )
}

function PreviewError({ message }: { message: string }) {
  return (
    <div className="flex flex-col items-start gap-2 rounded-lg border border-destructive/30 bg-destructive/5 px-3 py-2 text-destructive text-sm">
      <div className="flex items-center gap-2 font-medium">
        <FileWarning className="size-4" />
        Could not load preview
      </div>
      <p className="text-destructive/80 text-xs">{message}</p>
    </div>
  )
}

function PreviewBody({
  preview,
  filePath,
  downloadUrl,
}: {
  preview: FilePreview
  filePath: string | null
  downloadUrl: string | null
}) {
  if (preview.kind === 'missing') {
    return (
      <div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
        This file is no longer in the workspace. The agent may have moved or
        deleted it after the turn finished.
      </div>
    )
  }

  if (preview.kind === 'image') {
    return (
      <div className="flex flex-col gap-3">
        <PreviewMeta preview={preview} />
        <div className="overflow-hidden rounded-lg border border-border/60 bg-muted/30">
          <img
            src={preview.dataUrl}
            alt={filePath ?? 'preview'}
            className="block max-h-[60vh] w-full object-contain"
          />
        </div>
      </div>
    )
  }

  if (preview.kind === 'pdf') {
    return (
      <div className="flex flex-col gap-3">
        <PreviewMeta preview={preview} />
        <div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
          PDF previews aren't supported inline yet. Use Download to open this
          file in your default PDF viewer.
        </div>
      </div>
    )
  }

  if (preview.kind === 'binary') {
    return (
      <div className="flex flex-col gap-3">
        <PreviewMeta preview={preview} />
        <div className="rounded-lg border border-border/60 bg-muted/40 px-4 py-6 text-center text-muted-foreground text-sm">
          No inline preview for this file type.
          {downloadUrl ? ' Use Download to save it locally.' : null}
        </div>
      </div>
    )
  }

  return <TextPreviewBody preview={preview} filePath={filePath} />
}

function TextPreviewBody({
  preview,
  filePath,
}: {
  preview: Extract<FilePreview, { kind: 'text' }>
  filePath: string | null
}) {
  const ext = filePath ? extensionOf(filePath).toLowerCase() : ''
  const renderAsMarkdown = MARKDOWN_EXTENSIONS.has(ext)

  return (
    <div className="flex flex-col gap-3">
      <PreviewMeta preview={preview} />
      {renderAsMarkdown ? (
        <div
          className={cn(
            'prose prose-sm dark:prose-invert max-w-none break-words rounded-lg border border-border/60 bg-muted/30 px-4 py-3',
            "[&_[data-streamdown='code-block']]:!w-full [&_[data-streamdown='code-block']]:overflow-x-auto",
          )}
        >
          <MessageResponse mode="static" parseIncompleteMarkdown={false}>
            {preview.snippet}
          </MessageResponse>
        </div>
      ) : (
        <pre className="overflow-x-auto rounded-lg border border-border/60 bg-muted/30 px-3 py-2 text-xs leading-relaxed">
          <code className="font-mono text-foreground">{preview.snippet}</code>
        </pre>
      )}
      {preview.truncated ? (
        <div className="text-muted-foreground text-xs">
          Showing the first part of this file. Download to see the full
          contents.
        </div>
      ) : null}
    </div>
  )
}

function PreviewMeta({
  preview,
}: {
  preview: Exclude<FilePreview, { kind: 'missing' }>
}) {
  return (
    <div className="flex flex-wrap items-center gap-x-3 gap-y-1 text-muted-foreground text-xs">
      <span className="font-medium text-foreground">
        {formatFileSize(preview.size)}
      </span>
      <span>·</span>
      <span className="font-mono">{preview.mimeType || 'unknown'}</span>
    </div>
  )
}
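The toast effect in `FilePreviewSheet` is a small state machine over `open`, `error`, and `fileId`: it fires at most once per file per open session, so an effect that re-runs for the same failing file does not stack toasts. A framework-free sketch of the same rule, assuming the notification is injected as a callback in place of `toast.error`; `ToastOncePerFile` is a hypothetical name used only for illustration.

```typescript
// Hypothetical framework-free model of the toast effect: report a
// preview error at most once per file id while the sheet is open, and
// forget the last reported id when the sheet closes.
class ToastOncePerFile {
  private lastToastedFileId: string | null = null
  private readonly notify: (message: string) => void

  constructor(notify: (message: string) => void) {
    this.notify = notify
  }

  // Called with the same inputs the effect depends on.
  update(open: boolean, error: Error | null, fileId: string | null): void {
    if (!open) {
      this.lastToastedFileId = null
      return
    }
    if (!error || !fileId) return
    if (this.lastToastedFileId === fileId) return
    this.lastToastedFileId = fileId
    this.notify(error.message)
  }
}

const shown: string[] = []
const gate = new ToastOncePerFile((message) => {
  shown.push(message)
})
const failure = new Error('Preview request failed')

gate.update(true, failure, 'file-1') // first failure for file-1: toast
gate.update(true, failure, 'file-1') // effect re-runs for file-1: suppressed
gate.update(true, failure, 'file-2') // different file: toast
gate.update(false, null, null) // sheet closed: memory reset
gate.update(true, failure, 'file-2') // reopened and failing again: toast
console.log(shown.length)
```

Keeping the "last toasted" id in a ref (here, a private field) rather than in state means recording it never triggers another render.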
@@ -1,338 +0,0 @@
/**
 * @license
 * Copyright 2025 BrowserOS
 * SPDX-License-Identifier: AGPL-3.0-or-later
 *
 * Per-agent right-side "Outputs" panel. Lists every file the harness
 * has attributed to this agent, grouped by the turn that produced
 * them. Click a row to open the shared preview Sheet.
 *
 * Lifecycle:
 * - Open/closed state is controlled by the parent and persisted via
 *   `useOutputsRailOpen(agentId)` so each agent remembers its
 *   preference independently.
 * - Data refreshes whenever a turn finishes (the conversation hook
 *   fires `useInvalidateAgentOutputs` from its finally block).
 * - Manual "Refresh" button is wired to `useRefreshAgentOutputs`
 *   for users who navigate in mid-turn.
 */

import {
  ChevronDown,
  ChevronRight,
  FileText,
  Image as ImageIcon,
  Inbox,
  Loader2,
  PanelRightClose,
  RefreshCw,
} from 'lucide-react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { toast } from 'sonner'
import { Button } from '@/components/ui/button'
import {
  Collapsible,
  CollapsibleContent,
  CollapsibleTrigger,
} from '@/components/ui/collapsible'
import { ScrollArea } from '@/components/ui/scroll-area'
import { Skeleton } from '@/components/ui/skeleton'
import {
  basenameOf,
  formatFileSize,
  inferFileKind,
  type ProducedFilesRailGroup,
  useAgentOutputs,
  useRefreshAgentOutputs,
} from '@/lib/agent-files'
import { cn } from '@/lib/utils'
import { FilePreviewSheet } from './agent-conversation.file-preview-sheet'

interface OutputsRailProps {
  agentId: string
  onClose: () => void
  /**
   * When set, the rail scrolls the matching `RailTurnGroup` into
   * view and force-opens its `Collapsible`. Used by the inline
   * file-card strip's "View" / "+N" deep-link path. Cleared by
   * the parent (via `onFocusTurnConsumed`) once the rail has
   * acknowledged the deep-link so subsequent renders don't keep
   * re-scrolling the same group.
   */
  focusTurnId?: string | null
  onFocusTurnConsumed?: () => void
}

const RAIL_LOCAL_STORAGE_PREFIX = 'browseros:outputs-rail:'

/**
 * Controlled open/close state with per-agent localStorage memory.
 * Returns a tuple compatible with React's useState shape so the
 * parent can pass it straight into the rail without an extra effect.
 */
export function useOutputsRailOpen(
  agentId: string,
): [boolean, (next: boolean) => void] {
  const [open, setOpen] = useState(false)

  useEffect(() => {
    if (typeof window === 'undefined' || !agentId) return
    try {
      const stored = window.localStorage.getItem(
        `${RAIL_LOCAL_STORAGE_PREFIX}${agentId}`,
      )
      setOpen(stored === '1')
    } catch {
      // localStorage may be unavailable (private mode, locked-down
      // contexts) — fall back to closed.
    }
  }, [agentId])

  const update = (next: boolean) => {
    setOpen(next)
    if (typeof window === 'undefined' || !agentId) return
    try {
      window.localStorage.setItem(
        `${RAIL_LOCAL_STORAGE_PREFIX}${agentId}`,
        next ? '1' : '0',
      )
    } catch {
      // Best-effort persistence.
    }
  }

  return [open, update]
}

export const OutputsRail: FC<OutputsRailProps> = ({
  agentId,
  onClose,
  focusTurnId,
  onFocusTurnConsumed,
}) => {
  const { groups, loading, error } = useAgentOutputs(agentId)
  const refresh = useRefreshAgentOutputs(agentId)

  const [openFile, setOpenFile] = useState<{
    id: string
    path: string
  } | null>(null)

  const totalFiles = useMemo(
    () => groups.reduce((sum, group) => sum + group.files.length, 0),
    [groups],
  )

  return (
    <aside className="flex h-full min-h-0 w-full flex-col border-border/50 border-l bg-background">
      <header className="flex shrink-0 items-center gap-2 border-border/50 border-b px-3 py-3">
        <span className="font-semibold text-[13px] uppercase tracking-wide">
          Outputs
        </span>
        {totalFiles > 0 ? (
          <span className="text-muted-foreground text-xs tabular-nums">
            {totalFiles}
          </span>
        ) : null}
        <div className="ml-auto flex items-center gap-1">
          <Button
            type="button"
            variant="ghost"
            size="icon"
            className="size-7"
            onClick={() =>
              refresh.mutate(undefined, {
                onError: (err) =>
                  toast.error('Refresh failed', {
                    description:
                      err instanceof Error ? err.message : String(err),
                  }),
              })
            }
            disabled={refresh.isPending}
            title="Refresh"
          >
            {refresh.isPending ? (
              <Loader2 className="size-3.5 animate-spin" />
            ) : (
              <RefreshCw className="size-3.5" />
            )}
          </Button>
          <Button
            type="button"
            variant="ghost"
            size="icon"
            className="size-7"
            onClick={onClose}
            title="Hide outputs"
          >
            <PanelRightClose className="size-3.5" />
          </Button>
        </div>
      </header>

      <ScrollArea className="min-h-0 flex-1">
        <div className="px-2 py-2">
          {loading && groups.length === 0 ? (
            <RailSkeleton />
          ) : error ? (
            <RailError message={error.message} />
          ) : groups.length === 0 ? (
            <RailEmpty />
          ) : (
            <ul className="flex flex-col gap-2">
              {groups.map((group) => (
                <li key={group.turnId}>
                  <RailTurnGroup
                    group={group}
                    focused={
                      Boolean(focusTurnId) && focusTurnId === group.turnId
                    }
                    onFocusConsumed={onFocusTurnConsumed}
                    onOpenFile={(file) =>
                      setOpenFile({ id: file.id, path: file.path })
                    }
                  />
                </li>
              ))}
            </ul>
          )}
        </div>
      </ScrollArea>

      <FilePreviewSheet
        fileId={openFile?.id ?? null}
        filePath={openFile?.path ?? null}
        open={Boolean(openFile)}
        onOpenChange={(next) => {
          if (!next) setOpenFile(null)
        }}
      />
    </aside>
  )
}

function RailTurnGroup({
  group,
  focused,
  onFocusConsumed,
  onOpenFile,
}: {
  group: ProducedFilesRailGroup
  focused: boolean
  onFocusConsumed?: () => void
  onOpenFile: (file: { id: string; path: string }) => void
}) {
  const [open, setOpen] = useState(true)
  const headerLabel = group.turnPrompt.trim() || 'Turn'
  const containerRef = useRef<HTMLDivElement>(null)

  // Deep-link consumption: when the parent passes `focused=true`,
  // expand the collapsible (in case the user had collapsed it
  // earlier) and scroll into view. Fire `onFocusConsumed` so the
  // parent can drop the URL param and we don't re-scroll on every
  // render after that.
  useEffect(() => {
    if (!focused) return
    setOpen(true)
    containerRef.current?.scrollIntoView({
      behavior: 'smooth',
      block: 'nearest',
    })
    onFocusConsumed?.()
  }, [focused, onFocusConsumed])

  return (
    <div ref={containerRef}>
      <Collapsible open={open} onOpenChange={setOpen}>
        <CollapsibleTrigger
          className={cn(
            'flex w-full items-center gap-1.5 rounded-md px-1.5 py-1 text-left text-muted-foreground text-xs',
            'transition-colors hover:bg-accent/40 hover:text-foreground',
          )}
        >
          {open ? (
            <ChevronDown className="size-3 shrink-0" />
          ) : (
            <ChevronRight className="size-3 shrink-0" />
          )}
          <span className="min-w-0 flex-1 truncate font-medium">
            {headerLabel}
          </span>
          <span className="shrink-0 tabular-nums">{group.files.length}</span>
        </CollapsibleTrigger>
        <CollapsibleContent>
          <ul className="mt-1 ml-1 flex flex-col gap-0.5 border-border/40 border-l pl-2">
            {group.files.map((file) => (
              <li key={file.id}>
                <RailFileRow file={file} onOpen={() => onOpenFile(file)} />
              </li>
            ))}
          </ul>
        </CollapsibleContent>
      </Collapsible>
    </div>
  )
}

function RailFileRow({
  file,
  onOpen,
}: {
  file: ProducedFilesRailGroup['files'][number]
  onOpen: () => void
}) {
  const name = basenameOf(file.path)
  const kind = inferFileKind(file.path)
  const Icon = kind === 'image' ? ImageIcon : FileText

  return (
    <button
      type="button"
      onClick={onOpen}
      className={cn(
        'flex w-full items-center gap-2 rounded-md px-1.5 py-1 text-left text-xs transition-colors',
        'hover:bg-accent/60 focus:bg-accent/60 focus:outline-hidden',
      )}
      title={file.path}
    >
      <Icon className="size-3 shrink-0 text-muted-foreground" />
      <span className="min-w-0 flex-1 truncate">{name}</span>
      <span className="shrink-0 text-muted-foreground tabular-nums">
        {formatFileSize(file.size)}
      </span>
    </button>
  )
}

function RailSkeleton() {
  return (
    <div className="flex flex-col gap-2 px-1.5 py-1">
      <Skeleton className="h-4 w-1/2" />
      <Skeleton className="h-4 w-3/4" />
      <Skeleton className="h-4 w-2/3" />
      <Skeleton className="h-4 w-5/6" />
    </div>
  )
}

function RailEmpty() {
  return (
    <div className="mx-2 my-3 flex flex-col items-center gap-1.5 rounded-lg border border-border/60 border-dashed bg-muted/20 px-3 py-6 text-center text-muted-foreground text-xs">
      <Inbox className="size-4" />
      <p className="font-medium">No outputs yet</p>
      <p className="text-[11px] text-muted-foreground/70 leading-snug">
        Files this agent creates will appear here, grouped by the turn that made
        them.
      </p>
    </div>
  )
}

function RailError({ message }: { message: string }) {
  return (
    <div className="mx-2 my-3 rounded-lg border border-destructive/30 bg-destructive/5 px-3 py-2 text-destructive text-xs">
      {message}
    </div>
  )
}
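`useOutputsRailOpen` stores each agent's preference under `browseros:outputs-rail:<agentId>` as `'1'` or `'0'` and treats any storage failure as "closed". A storage-agnostic sketch of that read/write pair, assuming a minimal `getItem`/`setItem` interface so it can run against an in-memory map; `KeyValueStore`, `readRailOpen`, `writeRailOpen`, and `createMemoryStore` are hypothetical names, and the sketch takes the store as a parameter instead of checking `typeof window` as the hook does.

```typescript
// Minimal subset of the Web Storage API the helpers need; in a browser,
// window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const RAIL_PREFIX = 'browseros:outputs-rail:'

// Anything other than '1' (missing key, '0', or a throwing store)
// reads as closed, matching the hook's fallback.
function readRailOpen(store: KeyValueStore, agentId: string): boolean {
  if (!agentId) return false
  try {
    return store.getItem(`${RAIL_PREFIX}${agentId}`) === '1'
  } catch {
    return false
  }
}

// Best-effort write: a failing store never breaks the toggle itself.
function writeRailOpen(
  store: KeyValueStore,
  agentId: string,
  open: boolean,
): void {
  if (!agentId) return
  try {
    store.setItem(`${RAIL_PREFIX}${agentId}`, open ? '1' : '0')
  } catch {
    // Ignore quota or access errors.
  }
}

// In-memory stand-in so the sketch runs outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>()
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value)
    },
  }
}

const store = createMemoryStore()
writeRailOpen(store, 'agent-a', true)
console.log(readRailOpen(store, 'agent-a'), readRailOpen(store, 'agent-b'))
```

In the hook, the read runs inside `useEffect`, so the first render is always closed and the stored value is applied after mount; injecting the store, as above, keeps the persistence rule testable without a DOM.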
@@ -1,6 +1,5 @@
import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpenClaw'
import type { AgentConversationTurn } from '@/lib/agent-conversations/types'
import type { ProducedFilesRailGroup } from '@/lib/agent-files'

export type ClawChatRole = 'user' | 'assistant'

@@ -235,30 +234,6 @@ export function filterTurnsPersistedInHistory(
  )
}

/**
 * Persisted turns that still carry `producedFiles` — once history
 * reloads, the assistant text is rendered by `ClawChatMessage` and
 * the optimistic turn is filtered out by
 * `filterTurnsPersistedInHistory`. The historical message has no
 * `producedFiles` field (history items don't carry that), so the
 * inline file-card strip would vanish on history reload.
 *
 * Returning these here lets the caller render a strip-only entry
 * after the corresponding history bubble — full message stays as
 * the persisted history pair, but the produced-files affordance
 * survives.
 */
export function selectStripOnlyTurns(
  turns: AgentConversationTurn[],
  historyMessages: ClawChatMessage[],
): AgentConversationTurn[] {
  return turns.filter(
    (turn) =>
      Boolean(turn.producedFiles && turn.producedFiles.length > 0) &&
      isTurnPersistedInHistory(turn, historyMessages),
  )
}

function isTurnPersistedInHistory(
  turn: AgentConversationTurn,
  historyMessages: ClawChatMessage[],
@@ -310,59 +285,3 @@ function getClawMessageText(message: ClawChatMessage): string {
    .join('')
    .trim()
}

function firstNonBlankLine(value: string): string {
  for (const raw of value.split('\n')) {
    const trimmed = raw.trim()
    if (trimmed) return trimmed
  }
  return ''
}

/**
 * Map each assistant history message to the produced-files group
 * that came from its turn. Match key is `group.turnPrompt` (first
 * non-blank line of the user prompt that initiated the turn) vs.
 * the first non-blank line of the user message that immediately
 * preceded this assistant message — the same shape the server
 * emits when storing turnPrompt.
 *
 * Walks history forward (oldest-first per `flattenHistoryPages`)
 * and consumes groups in chronological order. A group can only
 * match once — if two turns share the same prompt the earlier
 * one wins, and the later assistant message stays unassociated
 * (those land back in `tailStripGroups` at the conversation tail).
 */
export function mapHistoryToProducedFilesGroups(
  historyMessages: ClawChatMessage[],
  groups: ReadonlyArray<ProducedFilesRailGroup>,
): {
  byAssistantMessageId: Map<string, ProducedFilesRailGroup>
  unmatched: ProducedFilesRailGroup[]
} {
  const byAssistantMessageId = new Map<string, ProducedFilesRailGroup>()
  if (groups.length === 0) {
    return { byAssistantMessageId, unmatched: [] }
  }
  // Oldest-first so the iteration order matches history.
  const remaining = [...groups].sort((a, b) => a.createdAt - b.createdAt)

  let pendingPrompt: string | null = null
  for (const message of historyMessages) {
    if (message.role === 'user') {
      pendingPrompt = firstNonBlankLine(getClawMessageText(message))
      continue
    }
    if (message.role !== 'assistant' || !pendingPrompt) continue
    const matchIndex = remaining.findIndex(
      (group) => group.turnPrompt === pendingPrompt,
    )
    if (matchIndex >= 0) {
      const [match] = remaining.splice(matchIndex, 1)
      byAssistantMessageId.set(message.id, match)
    }
    pendingPrompt = null
  }

  return { byAssistantMessageId, unmatched: remaining }
}
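The matching rule in `mapHistoryToProducedFilesGroups` is a greedy, oldest-first pairing keyed on the first non-blank line of the preceding user prompt. A self-contained sketch of the same walk, assuming history messages are reduced to a plain `text` field (the real code extracts it with `getClawMessageText`); `HistoryMessage`, `FilesGroup`, and `matchGroups` are hypothetical names, and the sample data only illustrates the duplicate-prompt case the doc comment describes.

```typescript
// Simplified, hypothetical shapes: the real ClawChatMessage text is
// extracted with getClawMessageText, reduced here to a `text` field.
interface HistoryMessage {
  id: string
  role: 'user' | 'assistant'
  text: string
}

interface FilesGroup {
  turnPrompt: string
  createdAt: number
}

function firstNonBlankLine(value: string): string {
  for (const raw of value.split('\n')) {
    const trimmed = raw.trim()
    if (trimmed) return trimmed
  }
  return ''
}

// Greedy, oldest-first pairing: each assistant message claims the oldest
// unclaimed group whose turnPrompt equals the first non-blank line of
// the user message right before it.
function matchGroups(
  history: HistoryMessage[],
  groups: ReadonlyArray<FilesGroup>,
): { byAssistantId: Map<string, FilesGroup>; unmatched: FilesGroup[] } {
  const byAssistantId = new Map<string, FilesGroup>()
  const remaining = [...groups].sort((a, b) => a.createdAt - b.createdAt)
  let pendingPrompt: string | null = null

  for (const message of history) {
    if (message.role === 'user') {
      pendingPrompt = firstNonBlankLine(message.text)
      continue
    }
    if (!pendingPrompt) continue
    const prompt = pendingPrompt
    const index = remaining.findIndex((group) => group.turnPrompt === prompt)
    if (index >= 0) {
      const [match] = remaining.splice(index, 1)
      if (match) byAssistantId.set(message.id, match)
    }
    // Each user prompt is consumed by the first assistant reply after it.
    pendingPrompt = null
  }

  return { byAssistantId, unmatched: remaining }
}

const sampleHistory: HistoryMessage[] = [
  { id: 'u1', role: 'user', text: '\n  Write a report\nwith charts' },
  { id: 'a1', role: 'assistant', text: 'Done.' },
  { id: 'u2', role: 'user', text: 'Summarize it' },
  { id: 'a2', role: 'assistant', text: 'Summary.' },
  { id: 'u3', role: 'user', text: 'Write a report' },
  { id: 'a3', role: 'assistant', text: 'Done again.' },
]

const sampleGroups: FilesGroup[] = [
  { turnPrompt: 'Write a report', createdAt: 3 },
  { turnPrompt: 'Summarize it', createdAt: 2 },
  { turnPrompt: 'Write a report', createdAt: 1 },
  { turnPrompt: 'Plot data', createdAt: 4 },
]

const result = matchGroups(sampleHistory, sampleGroups)
```

Here `a1` claims the older "Write a report" group (`createdAt: 1`), `a3` gets the later one, and the "Plot data" group has no matching prompt, so it ends up in `unmatched`, which is the set the conversation renders at its tail.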
@@ -1,109 +0,0 @@
import { afterEach, describe, expect, it } from 'bun:test'
import type { StagedAttachment } from '@/lib/attachments'
import {
  consumePendingInitialMessage,
  peekPendingInitialMessage,
  setPendingInitialMessage,
} from './pending-initial-message'

function makeAttachment(id: string): StagedAttachment {
  return {
    id,
    kind: 'image',
    mediaType: 'image/png',
    name: `${id}.png`,
    dataUrl: `data:image/png;base64,${id}`,
    payload: {
      kind: 'image',
      mediaType: 'image/png',
      name: `${id}.png`,
      dataUrl: `data:image/png;base64,${id}`,
    },
  }
}

afterEach(() => {
  // Drain any leftover pending entry so tests don't leak into each
  // other (the module-scope state survives across `it` blocks).
  consumePendingInitialMessage('drain')
  // If still set, clear by consuming with the matching id.
  const leftover = peekPendingInitialMessage()
  if (leftover) consumePendingInitialMessage(leftover.agentId)
})

describe('pending-initial-message', () => {
  it('consume returns the payload set for the same agentId', () => {
    setPendingInitialMessage({
      agentId: 'agent-a',
      text: 'hello',
      attachments: [makeAttachment('one')],
      createdAt: Date.now(),
    })
    const result = consumePendingInitialMessage('agent-a')
    expect(result?.text).toBe('hello')
    expect(result?.attachments).toHaveLength(1)
    expect(result?.attachments[0]?.id).toBe('one')
  })

  it('consume is destructive — second call returns null', () => {
    setPendingInitialMessage({
      agentId: 'agent-a',
      text: 'hello',
      attachments: [],
      createdAt: Date.now(),
    })
    expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
    expect(consumePendingInitialMessage('agent-a')).toBeNull()
  })

  it('consume returns null and preserves entry when agentId differs', () => {
    setPendingInitialMessage({
      agentId: 'agent-a',
      text: 'hello',
      attachments: [],
      createdAt: Date.now(),
    })
    expect(consumePendingInitialMessage('agent-b')).toBeNull()
    expect(peekPendingInitialMessage()?.agentId).toBe('agent-a')
    expect(consumePendingInitialMessage('agent-a')).not.toBeNull()
  })

  it('returns null for entries older than the TTL', () => {
    setPendingInitialMessage({
      agentId: 'agent-a',
      text: 'old',
      attachments: [],
      createdAt: Date.now() - 11_000, // older than 10 s TTL
    })
    expect(consumePendingInitialMessage('agent-a')).toBeNull()
  })

  it('replaces a previous pending entry when set is called again', () => {
    setPendingInitialMessage({
      agentId: 'agent-a',
      text: 'first',
      attachments: [],
      createdAt: Date.now(),
    })
    setPendingInitialMessage({
      agentId: 'agent-b',
      text: 'second',
      attachments: [makeAttachment('two')],
      createdAt: Date.now(),
    })
    expect(consumePendingInitialMessage('agent-a')).toBeNull()
    const result = consumePendingInitialMessage('agent-b')
    expect(result?.text).toBe('second')
    expect(result?.attachments[0]?.id).toBe('two')
  })

  it('no-ops when set is called with empty agentId', () => {
    setPendingInitialMessage({
      agentId: '',
      text: 'oops',
      attachments: [],
      createdAt: Date.now(),
    })
    expect(peekPendingInitialMessage()).toBeNull()
  })
})
@@ -1,81 +0,0 @@
|
||||
import type { StagedAttachment } from '@/lib/attachments'
|
||||
|
||||
/**
|
||||
* Same-tab in-memory handoff between the `/home` composer and the
|
||||
* chat screen at `/home/agents/:agentId`. URL search params (`?q=`)
|
||||
* carry the text fine, but cannot carry binary attachments — a multi-
|
||||
 * megabyte image dataUrl would explode URL length limits and round-
 * trip badly. This module is the rich-data side channel for the same
 * navigation: the composer writes here, the chat screen reads here on
 * mount.
 *
 * Intentionally module-scope. Same render tree, same tab — no need
 * for sessionStorage (which would force JSON-serialising the dataUrls
 * and re-parsing on the read side). Cross-tab handoff is out of
 * scope: the user typing at home in tab A and switching to tab B's
 * chat would surface an empty registry there, which is the correct
 * behaviour.
 */

export interface PendingInitialMessage {
  agentId: string
  text: string
  attachments: StagedAttachment[]
  createdAt: number
}

/**
 * 10s TTL on the entry. A stale entry from a back-button journey
 * shouldn't fire on a future visit; if real-world latency makes 10s
 * too tight under slow harness boot, bump but never make it
 * indefinite.
 */
const PENDING_TTL_MS = 10_000

let pending: PendingInitialMessage | null = null
let pendingTimer: ReturnType<typeof setTimeout> | null = null

function clearPending(): void {
  pending = null
  if (pendingTimer !== null) {
    clearTimeout(pendingTimer)
    pendingTimer = null
  }
}

export function setPendingInitialMessage(payload: PendingInitialMessage): void {
  // Defensive: the home composer should never call this without an
  // agent selected. If it somehow does, no-op rather than holding a
  // payload we can't route.
  if (!payload.agentId) return
  clearPending()
  pending = payload
  pendingTimer = setTimeout(clearPending, PENDING_TTL_MS)
}

/**
 * Destructive read. Returns the entry only if `agentId` matches and
 * the entry is fresh; clears the entry on success so Strict-Mode
 * double-invokes can't double-send.
 */
export function consumePendingInitialMessage(
  agentId: string,
): PendingInitialMessage | null {
  if (!pending) return null
  if (pending.agentId !== agentId) return null
  if (Date.now() - pending.createdAt >= PENDING_TTL_MS) {
    clearPending()
    return null
  }
  const entry = pending
  clearPending()
  return entry
}

/**
 * Non-mutating read for tests. Production code should never need this
 * — use `consume` and own the lifecycle.
 */
export function peekPendingInitialMessage(): PendingInitialMessage | null {
  return pending
}
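The handoff contract above (write once from the composer, destructive read on mount, TTL as a backstop) can be exercised in isolation. The sketch below is a condensed, self-contained copy of the registry, not the module itself; `StagedAttachment` is reduced to a hypothetical `{ name, dataUrl }` shape, and the agent ids and prompt text are made up for illustration.

```typescript
// Condensed copy of the registry so the handoff can run on its own.
// Assumption: StagedAttachment is simplified to a hypothetical shape.
interface StagedAttachment {
  name: string
  dataUrl: string
}

interface PendingInitialMessage {
  agentId: string
  text: string
  attachments: StagedAttachment[]
  createdAt: number
}

const PENDING_TTL_MS = 10_000
let pending: PendingInitialMessage | null = null
let pendingTimer: ReturnType<typeof setTimeout> | null = null

function clearPending(): void {
  pending = null
  if (pendingTimer !== null) {
    clearTimeout(pendingTimer)
    pendingTimer = null
  }
}

function setPendingInitialMessage(payload: PendingInitialMessage): void {
  if (!payload.agentId) return
  clearPending()
  pending = payload
  pendingTimer = setTimeout(clearPending, PENDING_TTL_MS)
}

function consumePendingInitialMessage(agentId: string): PendingInitialMessage | null {
  if (!pending || pending.agentId !== agentId) return null
  if (Date.now() - pending.createdAt >= PENDING_TTL_MS) {
    clearPending()
    return null
  }
  const entry = pending
  clearPending() // also cancels the TTL timer
  return entry
}

// Composer side: stash the rich payload; only the agent id goes in the URL.
setPendingInitialMessage({
  agentId: 'agent-1',
  text: 'Summarise this screenshot',
  attachments: [{ name: 'shot.png', dataUrl: 'data:image/png;base64,AAAA' }],
  createdAt: Date.now(),
})

// Chat screen side: a mount for another agent sees nothing...
const wrongAgent = consumePendingInitialMessage('agent-2')
// ...the matching mount gets the entry exactly once...
const first = consumePendingInitialMessage('agent-1')
// ...and a Strict-Mode re-run of the mount effect finds the registry empty.
const second = consumePendingInitialMessage('agent-1')

console.log(wrongAgent, first?.text, second) // prints: null Summarise this screenshot null
```

Because a successful `consume` clears both the entry and its timer, the second call behaves like a React Strict-Mode double-invoke: it gets `null` and cannot send the prompt twice.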
@@ -10,11 +10,9 @@ import type { OpenClawChatHistoryMessage } from '@/entrypoints/app/agents/useOpe
import type {
  AgentConversationTurn,
  AssistantPart,
  ConversationTurnFile,
  ToolEntry,
  UserAttachmentPreview,
} from '@/lib/agent-conversations/types'
import { useInvalidateAgentOutputs } from '@/lib/agent-files'
import type { ServerAttachmentPayload } from '@/lib/attachments'
import { consumeSSEStream } from '@/lib/sse'
import { buildToolLabel } from '@/lib/tool-labels'
@@ -55,12 +53,6 @@ export function useAgentConversation(
) {
  const [turns, setTurns] = useState<AgentConversationTurn[]>([])
  const [streaming, setStreaming] = useState(false)
  const invalidateAgentOutputs = useInvalidateAgentOutputs()
  // Stable ref so the resume effect doesn't re-subscribe on every
  // render (the hook's returned callable is freshly closured each
  // time, but the underlying queryClient is stable).
  const invalidateAgentOutputsRef = useRef(invalidateAgentOutputs)
  invalidateAgentOutputsRef.current = invalidateAgentOutputs
  const sessionKeyRef = useRef(options.sessionKey ?? '')
  const historyRef = useRef<OpenClawChatHistoryMessage[]>(options.history ?? [])
  const textAccRef = useRef('')
@@ -160,17 +152,6 @@ export function useAgentConversation(
    })
  }

  const setProducedFilesOnCurrentTurn = (files: ConversationTurnFile[]) => {
    setTurns((prev) => {
      const last = prev[prev.length - 1]
      if (!last) return prev
      // Replace, don't merge: the server's diff is authoritative for
      // the just-completed turn — duplicate events shouldn't grow the
      // list, and a re-attribution should overwrite an earlier one.
      return [...prev.slice(0, -1), { ...last, producedFiles: files }]
    })
  }

  const upsertAgentHarnessTool = (event: AgentHarnessStreamEvent) => {
    if (event.type !== 'tool_call') return
    const rawName = event.title || event.rawType || 'tool call'
@@ -227,9 +208,6 @@ export function useAgentConversation(
        case 'tool_call':
          upsertAgentHarnessTool(event)
          break
        case 'produced_files':
          setProducedFilesOnCurrentTurn(event.files)
          break
        case 'done':
          markCurrentTurnDone()
          break
@@ -281,7 +259,6 @@ export function useAgentConversation(
          ...prev,
          {
            id: crypto.randomUUID(),
            turnId: active.turnId,
            userText: active.prompt ?? '',
            parts: [],
            done: false,
@@ -327,14 +304,9 @@ export function useAgentConversation(
        // When `cancelled` is true the next run will set these
        // itself, so resetting here would only cause a brief flicker.
        if (!cancelled && weStartedStream) {
          const finishedTurnId = turnIdRef.current
          turnIdRef.current = null
          lastSeqRef.current = null
          setStreaming(false)
          void invalidateAgentOutputsRef.current(
            agentId,
            finishedTurnId ?? undefined,
          )
        }
      }
    }
@@ -346,60 +318,6 @@ export function useAgentConversation(
    }
  }, [agentId, activeTurnIdDep])

  /**
   * Send the chat request and follow the 409-active-turn redirect
   * once. Pulled out of `send` to keep its cognitive complexity in
   * check — the retry adds a branch that biome counts heavily.
   */
  const openSendStream = async (
    targetAgentId: string,
    text: string,
    attachments: ServerAttachmentPayload[],
    signal: AbortSignal,
  ): Promise<Response> => {
    const initial = await chatWithHarnessAgent(
      targetAgentId,
      text,
      signal,
      attachments,
    )
    if (initial.status !== 409) return initial
    // 409 means the server already has an active turn for this agent
    // (a previous tab kicked one off and we're a fresh mount that
    // missed the resume window). Attach to it instead of double-sending.
    const body = (await initial.json()) as { turnId?: string }
    if (!body.turnId) return initial
    return attachToHarnessTurn(targetAgentId, {
      turnId: body.turnId,
      signal,
    })
  }

  /**
   * Pull session-key / turn-id off response headers and propagate to
   * refs + the optimistic turn. Stamping `turnId` here lets the
   * inline artifact card fall back to /files/turn/<id> on a resumed
   * mount that missed the live `produced_files` event.
   */
  const applyResponseHeadersToTurn = (response: Response) => {
    const responseSessionKey =
      response.headers.get('X-Session-Key') ??
      response.headers.get('X-Session-Id')
    if (responseSessionKey) {
      sessionKeyRef.current = responseSessionKey
      onSessionKeyChangeRef.current?.(responseSessionKey)
    }
    const responseTurnId = response.headers.get('X-Turn-Id')
    if (!responseTurnId) return
    turnIdRef.current = responseTurnId
    lastSeqRef.current = null
    setTurns((prev) => {
      const last = prev[prev.length - 1]
      if (!last) return prev
      return [...prev.slice(0, -1), { ...last, turnId: responseTurnId }]
    })
  }

  const send = async (input: string | SendInput) => {
    const normalized: SendInput =
      typeof input === 'string' ? { text: input } : input
@@ -428,13 +346,37 @@ export function useAgentConversation(
    streamAbortRef.current = abortController

    try {
      const response = await openSendStream(
      let response = await chatWithHarnessAgent(
        agentId,
        trimmed,
        attachments,
        abortController.signal,
        attachments,
      )
      applyResponseHeadersToTurn(response)
      // 409 means the server already has an active turn for this
      // agent (e.g. a previous tab kicked one off and we're a fresh
      // mount that missed the resume window). Attach to it instead of
      // double-sending.
      if (response.status === 409) {
        const body = (await response.json()) as { turnId?: string }
        if (body.turnId) {
          response = await attachToHarnessTurn(agentId, {
            turnId: body.turnId,
            signal: abortController.signal,
          })
        }
      }
      const responseSessionKey =
        response.headers.get('X-Session-Key') ??
        response.headers.get('X-Session-Id')
      if (responseSessionKey) {
        sessionKeyRef.current = responseSessionKey
        onSessionKeyChangeRef.current?.(responseSessionKey)
      }
      const responseTurnId = response.headers.get('X-Turn-Id')
      if (responseTurnId) {
        turnIdRef.current = responseTurnId
        lastSeqRef.current = null
      }
      if (!response.ok) {
        const err = await response.text()
        updateCurrentTurnParts((parts) => [
@@ -462,15 +404,10 @@ export function useAgentConversation(
      if (streamAbortRef.current === abortController) {
        streamAbortRef.current = null
      }
      // Capture before nulling — the invalidation needs the turn id so
      // useAgentTurnFiles consumers also flush, not just the agent-wide
      // rail query.
      const finishedTurnId = turnIdRef.current
      turnIdRef.current = null
      lastSeqRef.current = null
      onCompleteRef.current?.()
      setStreaming(false)
      void invalidateAgentOutputs(agentId, finishedTurnId ?? undefined)
    }
}
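The 409 redirect in `openSendStream` (and in the inline branch of `send`) follows one pattern: send once, and if the server reports an already-active turn, attach to that turn instead of re-sending the prompt. A minimal sketch under stated assumptions: `SendFn` and `AttachFn` are hypothetical stand-ins for `chatWithHarnessAgent` and `attachToHarnessTurn`, and the fake transports build `Response` objects directly. One deliberate difference: the sketch reads the 409 body from `initial.clone()`. A fetch `Response` body can be consumed only once, so returning `initial` after `initial.json()` (the no-`turnId` case) would leave a later `response.text()` on the error path unable to read the body.

```typescript
// Hypothetical transport signatures; only the 409 handling is modelled on the hook.
type SendFn = (agentId: string, text: string, signal: AbortSignal) => Promise<Response>
type AttachFn = (
  agentId: string,
  opts: { turnId: string; signal: AbortSignal },
) => Promise<Response>

// Send once; on 409 with a turnId, attach to the running turn instead.
async function openSendStream(
  send: SendFn,
  attach: AttachFn,
  agentId: string,
  text: string,
  signal: AbortSignal,
): Promise<Response> {
  const initial = await send(agentId, text, signal)
  if (initial.status !== 409) return initial
  // Read from a clone so `initial` is still readable if we fall back to it.
  const body = (await initial.clone().json()) as { turnId?: string }
  if (!body.turnId) return initial
  return attach(agentId, { turnId: body.turnId, signal })
}

async function demo(): Promise<{ calls: string[]; turnId: string | null; fallbackText: string }> {
  const calls: string[] = []
  const signal = new AbortController().signal
  const attach: AttachFn = async (_agentId, { turnId }) => {
    calls.push(`attach:${turnId}`)
    return new Response('stream', { status: 200, headers: { 'X-Turn-Id': turnId } })
  }
  // Case 1: another tab already started a turn, so the 409 redirects to it.
  const busy: SendFn = async () => {
    calls.push('send')
    return new Response(JSON.stringify({ turnId: 'turn-42' }), { status: 409 })
  }
  const attached = await openSendStream(busy, attach, 'agent-1', 'hi', signal)
  // Case 2: a 409 without a turnId falls back to the original response.
  const noTurn: SendFn = async () =>
    new Response(JSON.stringify({ error: 'busy' }), { status: 409 })
  const fallback = await openSendStream(noTurn, attach, 'agent-1', 'hi', signal)
  return {
    calls,
    turnId: attached.headers.get('X-Turn-Id'),
    fallbackText: await fallback.text(), // still readable thanks to clone()
  }
}

demo().then((result) => console.log(result))
```

The redirect is followed at most once: `attach` is never re-checked for a 409, which keeps a misbehaving server from looping the client.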
@@ -1,4 +1,4 @@
import { Bot, Cpu, Sparkles, Wand2 } from 'lucide-react'
import { Bot, Cpu, Sparkles } from 'lucide-react'
import type { FC } from 'react'
import type { HarnessAgentAdapter } from './agent-harness-types'

@@ -23,9 +23,6 @@ export const AdapterIcon: FC<AdapterIconProps> = ({ adapter, className }) => {
    case 'openclaw':
      // OpenClaw — bot/automation framing.
      return <Bot className={className} aria-label="OpenClaw" />
    case 'hermes':
      // Hermes — messenger god framing, wand evokes the agentic conjuring.
      return <Wand2 className={className} aria-label="Hermes" />
    default:
      return <Bot className={className} aria-label="Agent" />
  }
@@ -39,8 +36,6 @@ export function adapterLabel(adapter: HarnessAgentAdapter | 'unknown'): string {
      return 'Codex'
    case 'openclaw':
      return 'OpenClaw'
    case 'hermes':
      return 'Hermes'
    default:
      return 'Agent'
  }
@@ -11,7 +11,6 @@ import type {
  AgentAdapterHealth,
  AgentRowData,
} from './agent-row/agent-row.types'
import { compareAgentsByPinThenRecency } from './agents-list-order'
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'

@@ -57,18 +56,31 @@ export const AgentList: FC<AgentListProps> = ({
    return map
  }, [adapters])

  // Sort: pinned rows first, then most recently used, then never-used
  // agents in id-stable order. The gateway's `main` agent stays
  // pinned-to-top when never touched so a fresh install has an
  // obvious starting point.
  const ordered = useMemo(() => {
    const withMeta = agents.map((agent) => {
      const harness = harnessAgentLookup?.get(agent.agentId)
      return {
        agent,
        id: agent.agentId,
        pinned: harness?.pinned ?? false,
        lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
      }
    })
    return withMeta
      .sort(compareAgentsByPinThenRecency)
      .sort((a, b) => {
        if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
        const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
        const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
        if (aSeed && !bSeed) return -1
        if (!aSeed && bSeed) return 1
        const aValue = a.lastUsedAt ?? -Infinity
        const bValue = b.lastUsedAt ?? -Infinity
        if (aValue !== bValue) return bValue - aValue
        return a.agent.agentId.localeCompare(b.agent.agentId)
      })
      .map((entry) => entry.agent)
  }, [activity, agents, harnessAgentLookup])

@@ -117,7 +129,6 @@ function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'openclaw') return 'openclaw'
  if (lower === 'hermes') return 'hermes'
  return 'unknown'
}
@@ -10,7 +10,6 @@ import { createAgentPageActions } from './agents-page-actions'
import {
  useDefaultAgentName,
  useHarnessAgentDefaults,
  useHermesProviderSelection,
  useOpenClawProviderSelection,
} from './agents-page-hooks'
import {
@@ -107,7 +106,6 @@ export const AgentsPage: FC = () => {
  )
  const [harnessModelId, setHarnessModelId] = useState('')
  const [harnessReasoningEffort, setHarnessReasoningEffort] = useState('')
  const [createHermesProviderId, setCreateHermesProviderId] = useState('')
  const [showTerminal, setShowTerminal] = useState(false)
  const [cliAuthModalOpen, setCliAuthModalOpen] = useState(false)
  const [pageError, setPageError] = useState<string | null>(null)
@@ -135,14 +133,6 @@ export const AgentsPage: FC = () => {
    cliAuthModalOpen,
    setCliAuthModalOpen,
  })
  const { selectableHermesProviders } = useHermesProviderSelection({
    providers,
    defaultProviderId,
    createOpen,
    createRuntime,
    createHermesProviderId,
    setCreateHermesProviderId,
  })
  useDefaultAgentName(createOpen, setNewName)
  useHarnessAgentDefaults({
    adapters,
@@ -236,13 +226,11 @@ export const AgentsPage: FC = () => {
    createAgentPageActions({
      createProviderId,
      createRuntime,
      createHermesProviderId,
      harnessModelId,
      harnessReasoningEffort,
      navigate,
      newName,
      selectableOpenClawProviders,
      selectableHermesProviders,
      setupProviderId,
      createHarnessAgent: createHarnessAgent.mutateAsync,
      createOpenClawAgent,
@@ -398,8 +386,6 @@ export const AgentsPage: FC = () => {
        harnessAdapterId={harnessAdapterId}
        harnessModelId={harnessModelId}
        harnessReasoningEffort={harnessReasoningEffort}
        hermesProviders={selectableHermesProviders}
        hermesSelectedProviderId={createHermesProviderId}
        name={newName}
        open={createOpen}
        providers={selectableOpenClawProviders}
@@ -415,14 +401,12 @@ export const AgentsPage: FC = () => {
          if (!open) {
            setCreateError(null)
            createHarnessAgent.reset()
            setCreateHermesProviderId('')
          }
        }}
        onRuntimeChange={setCreateRuntime}
        onHarnessAdapterChange={handleHarnessAdapterChange}
        onHarnessModelChange={setHarnessModelId}
        onHarnessReasoningChange={setHarnessReasoningEffort}
        onHermesProviderChange={setCreateHermesProviderId}
        onNameChange={setNewName}
        onProviderChange={setCreateProviderId}
      />
@@ -40,8 +40,6 @@ interface NewAgentDialogProps {
  harnessAdapterId: HarnessAgentAdapter
  harnessModelId: string
  harnessReasoningEffort: string
  hermesProviders: ProviderOption[]
  hermesSelectedProviderId: string
  name: string
  open: boolean
  providers: ProviderOption[]
@@ -57,7 +55,6 @@ interface NewAgentDialogProps {
  onHarnessAdapterChange: (adapter: HarnessAgentAdapter) => void
  onHarnessModelChange: (modelId: string) => void
  onHarnessReasoningChange: (reasoningEffort: string) => void
  onHermesProviderChange: (providerId: string) => void
  onNameChange: (name: string) => void
  onProviderChange: (providerId: string) => void
}
@@ -72,8 +69,6 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
  harnessAdapterId,
  harnessModelId,
  harnessReasoningEffort,
  hermesProviders,
  hermesSelectedProviderId,
  name,
  open,
  providers,
@@ -89,29 +84,22 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
  onHarnessAdapterChange,
  onHarnessModelChange,
  onHarnessReasoningChange,
  onHermesProviderChange,
  onNameChange,
  onProviderChange,
}) => {
  const selectedHarnessAdapter =
    adapters.find((adapter) => adapter.id === harnessAdapterId) ?? adapters[0]
  const isHarnessRuntime = createRuntime !== 'openclaw'
  const isHermesRuntime = createRuntime === 'hermes'
  const isClassicHarnessRuntime = isHarnessRuntime && !isHermesRuntime
  const openClawBlocked = createRuntime === 'openclaw' && !canManageOpenClaw
  const cliBlocked =
    createRuntime === 'openclaw' &&
    !!selectedCliProvider &&
    !cliAuthStatus?.loggedIn
  const hermesBlocked =
    isHermesRuntime &&
    (hermesProviders.length === 0 || !hermesSelectedProviderId)
  const canCreate =
    Boolean(name.trim()) &&
    !creating &&
    !openClawBlocked &&
    !cliBlocked &&
    !hermesBlocked &&
    (createRuntime === 'openclaw'
      ? providers.length > 0
      : Boolean(selectedHarnessAdapter))
@@ -155,8 +143,7 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
          if (
            value === 'openclaw' ||
            value === 'claude' ||
            value === 'codex' ||
            value === 'hermes'
            value === 'codex'
          ) {
            onRuntimeChange(value)
            if (value !== 'openclaw') onHarnessAdapterChange(value)
@@ -209,16 +196,7 @@ export const NewAgentDialog: FC<NewAgentDialogProps> = ({
          </>
        ) : null}

        {isHermesRuntime ? (
          <ProviderSelector
            providers={hermesProviders}
            defaultProviderId={defaultProviderId}
            selectedId={hermesSelectedProviderId}
            onSelect={onHermesProviderChange}
          />
        ) : null}

        {isClassicHarnessRuntime ? (
        {isHarnessRuntime ? (
          <>
            <div className="grid gap-2">
              <Label htmlFor="harness-model">Model</Label>
@@ -1,21 +1,6 @@
import type { AgentEntry } from './useOpenClaw'

export type HarnessAgentAdapter = 'claude' | 'codex' | 'openclaw' | 'hermes'

/**
 * One file the harness attributed to the assistant turn that just
 * finished. Mirrors the server-side `ProducedFileEventEntry` shape so
 * the inline artifact card can render alongside the streamed text the
 * user just watched complete. Only present for openclaw adapter
 * turns; claude / codex don't produce these events in v1.
 */
export interface HarnessProducedFile {
  id: string
  /** Workspace-relative POSIX path. */
  path: string
  size: number
  mtimeMs: number
}
export type HarnessAgentAdapter = 'claude' | 'codex' | 'openclaw'

export type AgentHarnessStreamEvent =
  | {
@@ -37,10 +22,6 @@ export type AgentHarnessStreamEvent =
      text: string
      rawType?: string
    }
  | {
      type: 'produced_files'
      files: HarnessProducedFile[]
    }
  | {
      type: 'done'
      text?: string
@@ -130,16 +111,6 @@ export interface CreateHarnessAgentInput {
  adapter: HarnessAgentAdapter
  modelId?: string
  reasoningEffort?: string
  /**
   * Adapter provider id from the user's BrowserOS AI Settings entry.
   * Provider-backed adapters use this with `apiKey`/`baseUrl` to write
   * or provision their runtime-specific provider config.
   */
  providerType?: string
  /** API key paired with `providerType` when the selected adapter needs one. */
  apiKey?: string
  /** Base URL for OpenAI-compatible/custom provider entries. */
  baseUrl?: string
}

export interface HarnessHistoryReasoning {
@@ -1,104 +0,0 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from './agent-harness-types'
import {
  compareAgentsByPinThenRecency,
  orderAgentsByPinThenRecency,
} from './agents-list-order'

function makeAgent(input: {
  id: string
  pinned?: boolean
  lastUsedAt?: number | null
}): HarnessAgent {
  return {
    id: input.id,
    name: input.id,
    adapter: 'codex',
    permissionMode: 'approve-all',
    sessionKey: 'session',
    createdAt: 0,
    updatedAt: 0,
    pinned: input.pinned,
    lastUsedAt: input.lastUsedAt,
  }
}

describe('orderAgentsByPinThenRecency', () => {
  it('floats pinned agents to the top regardless of recency', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
      makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
      makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
    ])
    expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
  })

  it('sorts by lastUsedAt desc within each pin group', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
      makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
      makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
      makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
    ])
    expect(result.map((entry) => entry.id)).toEqual([
      'newer-pin',
      'older-pin',
      'newer',
      'older',
    ])
  })

  it('seed-pins the gateway main agent above other never-used agents', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
      makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
      makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
    ])
    expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
  })

  it('drops the main seed-pin once the agent has been used', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
      makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
    ])
    expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
  })

  it('puts never-used agents below recently-used ones', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
      makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
    ])
    expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
  })

  it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
    const result = orderAgentsByPinThenRecency([
      makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
      makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
    ])
    expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
  })
})

describe('compareAgentsByPinThenRecency', () => {
  it('produces the same order as the harness-shape helper', () => {
    const items = [
      { id: 'older', pinned: false, lastUsedAt: 50 },
      { id: 'newer', pinned: false, lastUsedAt: 80 },
      { id: 'pinned', pinned: true, lastUsedAt: 1 },
    ]
    const sorted = [...items].sort(compareAgentsByPinThenRecency)
    expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
  })

  it('seeds the main agent above other never-used rows', () => {
    const items = [
      { id: 'zzz', pinned: false, lastUsedAt: null },
      { id: 'main', pinned: false, lastUsedAt: null },
    ]
    const sorted = [...items].sort(compareAgentsByPinThenRecency)
    expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
  })
})
@@ -1,59 +0,0 @@
import type { HarnessAgent } from './agent-harness-types'

/**
 * Stable ordering for index-shaped agent surfaces (the `/agents` rail
 * and the chat-screen rail at `/agents/:agentId`). Pinned rows float
 * to the top, then recency desc, with never-used agents falling to
 * the bottom in id-stable order. The gateway's `main` agent gets
 * seed-pinned to the top of the never-used group so a fresh install
 * has an obvious starting point even before the user has used it.
 *
 * NOT the same rule as the home grid (`orderHomeAgents`): home is
 * action-shaped — active-turn floats to the top — so users can
 * resume what's running. The chat rail keeps recency stable so it
 * doesn't reshuffle as turns transition every 5s.
 */
export function orderAgentsByPinThenRecency(
  agents: HarnessAgent[],
): HarnessAgent[] {
  return [...agents].sort((a, b) => {
    const aPinned = a.pinned ?? false
    const bPinned = b.pinned ?? false
    if (aPinned !== bPinned) return aPinned ? -1 : 1

    const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
    const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
    if (aSeed && !bSeed) return -1
    if (!aSeed && bSeed) return 1

    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
    if (aValue !== bValue) return bValue - aValue

    return a.id.localeCompare(b.id)
  })
}

/**
 * Same comparator, but operates over arbitrary records that carry
 * `pinned`, `lastUsedAt`, and an `id`-equivalent key. Used by the
 * `/agents` `AgentList` which pivots `AgentListItem` + harness
 * lookup into a sortable shape; both surfaces stay on identical
 * sort semantics through this adapter.
 */
export function compareAgentsByPinThenRecency<
  T extends { pinned: boolean; lastUsedAt: number | null; id: string },
>(a: T, b: T): number {
  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1

  const aSeed = a.id === 'main' && a.lastUsedAt === null
  const bSeed = b.id === 'main' && b.lastUsedAt === null
  if (aSeed && !bSeed) return -1
  if (!aSeed && bSeed) return 1

  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  if (aValue !== bValue) return bValue - aValue

  return a.id.localeCompare(b.id)
}
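The pin-then-recency ordering can be checked outside React with a plain comparator. This sketch restates the `compareAgentsByPinThenRecency` logic over a hypothetical `SortableAgentRow` shape; `compareByPinThenRecency` is a local name for the sketch, not an export of the module. One detail the output makes visible: the `main` seed check runs before the recency check, so a never-used `main` sorts above every unpinned row, including rows that have been used. That matches the `AgentList` comment ("pinned-to-top when never touched") more closely than the "top of the never-used group" wording in the doc comment.

```typescript
// Hypothetical flat row shape; the real surfaces pivot HarnessAgent /
// AgentListItem into something similar before sorting.
interface SortableAgentRow {
  id: string
  pinned: boolean
  lastUsedAt: number | null
}

// Pinned rows first, then a never-used `main`, then most recent first;
// never-used rows fall to the bottom and ties break on id.
function compareByPinThenRecency(a: SortableAgentRow, b: SortableAgentRow): number {
  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
  const aSeed = a.id === 'main' && a.lastUsedAt === null
  const bSeed = b.id === 'main' && b.lastUsedAt === null
  if (aSeed !== bSeed) return aSeed ? -1 : 1
  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  if (aValue !== bValue) return bValue - aValue
  return a.id.localeCompare(b.id)
}

const rows: SortableAgentRow[] = [
  { id: 'zzz', pinned: false, lastUsedAt: null },
  { id: 'recent', pinned: false, lastUsedAt: 200 },
  { id: 'main', pinned: false, lastUsedAt: null },
  { id: 'older', pinned: false, lastUsedAt: 100 },
  { id: 'pinned', pinned: true, lastUsedAt: 1 },
]

// Copy before sorting: Array.prototype.sort mutates in place.
const ordered = [...rows].sort(compareByPinThenRecency).map((row) => row.id)
console.log(ordered) // [ 'pinned', 'main', 'recent', 'older', 'zzz' ]
```

Copying with `[...rows]` mirrors `[...agents].sort(...)` in `orderAgentsByPinThenRecency`. `Array.prototype.sort` is stable in current engines, but the explicit `localeCompare` tiebreak makes the result independent of input order regardless.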
@@ -20,22 +20,17 @@ import type {
export interface AgentPageActionInput {
  createProviderId: string
  createRuntime: CreateAgentRuntime
  createHermesProviderId: string
  harnessModelId: string
  harnessReasoningEffort: string
  navigate: NavigateFunction
  newName: string
  selectableOpenClawProviders: ProviderOption[]
  selectableHermesProviders: ProviderOption[]
  setupProviderId: string
  createHarnessAgent: (input: {
    name: string
    adapter: HarnessAgentAdapter
    modelId?: string
    reasoningEffort?: string
    providerType?: string
    apiKey?: string
    baseUrl?: string
  }) => Promise<HarnessAgent>
  createOpenClawAgent: (
    input: OpenClawAgentMutationInput,
@@ -119,37 +114,20 @@ export function createAgentPageActions(input: AgentPageActionInput) {
  const handleHarnessCreate = async () => {
    if (!input.newName.trim()) return

    const isHermes = input.createRuntime === 'hermes'
    // Hermes pulls every provider field from the user's selected entry
    // in the global LLM-providers list (managed under AI Settings). The
    // backend rejects creation if any required field is missing.
    const hermesProvider = isHermes
      ? input.selectableHermesProviders.find(
          (option) => option.id === input.createHermesProviderId,
        )
      : undefined
    const effectiveModelId = isHermes
      ? hermesProvider?.modelId
      : input.harnessModelId || undefined

    input.setCreateError(null)
    try {
      const agent = await input.createHarnessAgent({
        name: input.newName.trim(),
        adapter: input.createRuntime as HarnessAgentAdapter,
        modelId: effectiveModelId,
        modelId: input.harnessModelId || undefined,
        reasoningEffort: input.harnessReasoningEffort || undefined,
        providerType: hermesProvider?.type,
        apiKey: hermesProvider?.apiKey,
        baseUrl: hermesProvider?.baseUrl,
      })
      input.setCreateOpen(false)
      input.setNewName('')
      track(AGENT_CREATED_EVENT, {
        runtime: input.createRuntime,
        model_id: effectiveModelId,
        model_id: input.harnessModelId || undefined,
        reasoning_effort: input.harnessReasoningEffort || undefined,
        provider_type: hermesProvider?.type,
      })
      input.navigate(`/agents/${agent.id}`)
    } catch (err) {
@@ -162,7 +140,6 @@ export function createAgentPageActions(input: AgentPageActionInput) {
      openclaw: handleOpenClawCreate,
      claude: handleHarnessCreate,
      codex: handleHarnessCreate,
      hermes: handleHarnessCreate,
    }
    void createByRuntime[input.createRuntime]()
  }
@@ -4,9 +4,8 @@ import type {
  HarnessAdapterDescriptor,
  HarnessAgentAdapter,
} from './agent-harness-types'
import type { CreateAgentRuntime, ProviderOption } from './agents-page-types'
import type { CreateAgentRuntime } from './agents-page-types'
import { toProviderOptions } from './agents-page-utils'
import { getHermesSupportedProviders } from './hermes-supported-providers'
import {
  buildOpenClawCliProviderOptions,
  findOpenClawCliProviderById,
@@ -172,60 +171,3 @@ export function useOpenClawProviderSelection(input: {
    cliAuthError,
  }
}

/**
 * Mirror of useOpenClawProviderSelection but for Hermes. Hermes only
 * needs the create-dialog flow (no setup dialog, no CLI providers), so
 * this hook is much smaller — it just filters the global provider list
 * to ones Hermes can drive and seeds the selected id when the dialog
 * opens.
 */
export function useHermesProviderSelection(input: {
  providers: LlmProviderConfig[]
  defaultProviderId: string
  createOpen: boolean
  createRuntime: CreateAgentRuntime
  createHermesProviderId: string
  setCreateHermesProviderId: Dispatch<SetStateAction<string>>
}) {
  const {
    providers,
    defaultProviderId,
    createOpen,
    createRuntime,
    createHermesProviderId,
    setCreateHermesProviderId,
  } = input

  const selectableHermesProviders = useMemo<ProviderOption[]>(
    () =>
      getHermesSupportedProviders(providers).map((provider) => ({
        id: provider.id,
        type: provider.type,
        name: provider.name,
        modelId: provider.modelId,
        baseUrl: provider.baseUrl,
        apiKey: provider.apiKey,
      })),
    [providers],
  )

  useEffect(() => {
    if (selectableHermesProviders.length === 0) return
    if (!createOpen || createRuntime !== 'hermes') return
    if (createHermesProviderId) return
    const fallbackId =
      selectableHermesProviders.find((p) => p.id === defaultProviderId)?.id ??
      selectableHermesProviders[0].id
    setCreateHermesProviderId(fallbackId)
  }, [
    createHermesProviderId,
    createOpen,
    createRuntime,
    defaultProviderId,
    selectableHermesProviders,
    setCreateHermesProviderId,
  ])

  return { selectableHermesProviders }
}
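The seeding effect in `useHermesProviderSelection` reduces to a pure decision: do nothing when there are no selectable providers, the Hermes create flow is not open, or an id is already chosen; otherwise prefer the global default if it survived filtering, else the first option. A sketch of that decision outside React. `pickHermesProviderSeed`, `ProviderOptionLike`, and the boolean `active` flag (standing in for `createOpen && createRuntime === 'hermes'`) are hypothetical names, and the provider ids are made up.

```typescript
// Hypothetical minimal option shape for this sketch.
interface ProviderOptionLike {
  id: string
}

// Returns the id the effect would seed, or null when it would bail out.
function pickHermesProviderSeed(
  options: ProviderOptionLike[],
  defaultProviderId: string,
  currentId: string,
  active: boolean, // stands in for `createOpen && createRuntime === 'hermes'`
): string | null {
  if (options.length === 0) return null
  if (!active) return null
  if (currentId) return null // never override an existing choice
  return options.find((option) => option.id === defaultProviderId)?.id ?? options[0].id
}

const options: ProviderOptionLike[] = [{ id: 'openrouter-1' }, { id: 'anthropic-1' }]
const seeds = [
  pickHermesProviderSeed(options, 'anthropic-1', '', true), // default is selectable
  pickHermesProviderSeed(options, 'ollama-1', '', true), // default filtered out: first option
  pickHermesProviderSeed(options, 'anthropic-1', 'openrouter-1', true), // already chosen
  pickHermesProviderSeed([], 'anthropic-1', '', true), // nothing selectable
]
console.log(seeds) // [ 'anthropic-1', 'openrouter-1', null, null ]
```

Keeping the decision pure would let the effect shrink to a single guarded setter call, and the "already chosen" bail-out is what stops the effect from overwriting a provider the user picked in the dialog.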
@@ -1,30 +0,0 @@
import {
  HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES,
  type HermesSupportedBrowserosProviderType,
} from '@browseros/shared/constants/hermes'
import type { LlmProviderConfig, ProviderType } from '@/lib/llm-providers/types'

export function isHermesSupportedProviderType(
  providerType: ProviderType,
): providerType is HermesSupportedBrowserosProviderType {
  return (
    HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES as readonly ProviderType[]
  ).includes(providerType)
}

/**
 * Filters the user's global LLM providers down to ones Hermes can use.
 * A provider qualifies when its type is in the Hermes-supported set
 * AND it has an API key wired up. CLI-style providers (chatgpt-pro,
 * github-copilot, qwen-code) and other unsupported types (browseros,
 * ollama, lmstudio, bedrock, azure, google, moonshot) are filtered
 * out — Hermes can't drive them today.
 */
export function getHermesSupportedProviders(
  providers: LlmProviderConfig[],
): LlmProviderConfig[] {
  return providers.filter(
    (provider) =>
      !!provider.apiKey && isHermesSupportedProviderType(provider.type),
  )
}
|
||||
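Two pieces of logic work together here. The hook at the top of this section picks a fallback provider, and `getHermesSupportedProviders` filters the list that fallback chooses from. The sketch below restates both as plain functions so they can be exercised without React. It is a sketch under stated assumptions: `SUPPORTED` is a hypothetical stand-in for `HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES`, whose real contents this diff does not show. The `Provider` shape is trimmed down, and `pickHermesProviderId` is an illustrative name.

```typescript
// Hypothetical stand-in for HERMES_SUPPORTED_BROWSEROS_PROVIDER_TYPES;
// the real list lives in @browseros/shared/constants/hermes.
const SUPPORTED = ['openai', 'anthropic'] as const
type SupportedType = (typeof SUPPORTED)[number]

// Trimmed-down provider shape (assumption, not the real LlmProviderConfig).
interface Provider {
  id: string
  type: string
  apiKey?: string
}

// Same widening trick as isHermesSupportedProviderType: calling `includes` on a
// readonly tuple of literals would otherwise reject a plain string argument.
function isSupportedType(type: string): type is SupportedType {
  return (SUPPORTED as readonly string[]).includes(type)
}

// Keep providers that have an API key AND a supported type.
function filterSupported(providers: Provider[]): Provider[] {
  return providers.filter((p) => !!p.apiKey && isSupportedType(p.type))
}

// Mirrors the effect's fallback: prefer the global default when it survived
// the filter, else the first selectable provider, else nothing.
function pickHermesProviderId(
  providers: Provider[],
  defaultProviderId: string | undefined,
): string | undefined {
  const selectable = filterSupported(providers)
  if (selectable.length === 0) return undefined
  return selectable.find((p) => p.id === defaultProviderId)?.id ?? selectable[0].id
}
```

If the user's default is an Ollama entry, the filter drops it. The fallback then moves to the first supported provider instead of leaving the picker empty.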
@@ -25,18 +25,12 @@ interface HarnessAgentsResponse {

export type { AgentHarnessStreamEvent }

export const AGENT_QUERY_KEYS = {
const AGENT_QUERY_KEYS = {
adapters: 'agent-harness-adapters',
agents: 'agent-harness-agents',
/** Outputs-rail data for one agent — `[agentOutputs, baseUrl, agentId]`. */
agentOutputs: 'agent-harness-agent-outputs',
/** Per-turn artifact-card files — `[agentTurnFiles, baseUrl, agentId, turnId]`. */
agentTurnFiles: 'agent-harness-agent-turn-files',
/** Single-file preview payload — `[filePreview, baseUrl, fileId]`. */
filePreview: 'agent-harness-file-preview',
} as const

export async function agentsFetch<T>(
async function agentsFetch<T>(
baseUrl: string,
path: string,
init?: RequestInit,
@@ -85,8 +85,7 @@ export const SidebarLayout: FC = () => {

return (
<RpcClientProvider>
{/* pl-14 offsets all content by the collapsed sidebar width (w-14 = 56px) so it never sits under the rail */}
<div className="relative min-h-screen bg-background pl-14">
<div className="relative min-h-screen bg-background">
{/* Sidebar - fixed overlay */}
{/* biome-ignore lint/a11y/noStaticElementInteractions: hover interactions needed */}
<div
@@ -97,6 +96,7 @@ export const SidebarLayout: FC = () => {
<AppSidebar expanded={sidebarOpen} onOpenShortcuts={openShortcuts} />
</div>

{/* Main content - full width, centered */}
{location.pathname === '/home/chat' ? (
<main className="relative h-dvh overflow-hidden">
<Outlet />
@@ -108,7 +108,6 @@ function formatAdapterName(adapter: HarnessAgentAdapter): string {
if (adapter === 'claude') return 'Claude Code'
if (adapter === 'codex') return 'Codex'
if (adapter === 'openclaw') return 'OpenClaw'
if (adapter === 'hermes') return 'Hermes'
return adapter
}
@@ -42,34 +42,11 @@ export interface UserAttachmentPreview {
dataUrl?: string
}

/**
* Files attributed to this turn by the harness's per-turn workspace
* diff. Populated either via the live `produced_files` SSE event or
* (on resume) the `useAgentTurnFiles` fallback. Mirrors the wire
* shape from `agent-harness-types.HarnessProducedFile` minus the
* stream-only fields the inline card doesn't need.
*/
export interface ConversationTurnFile {
id: string
path: string
size: number
mtimeMs: number
}

export interface AgentConversationTurn {
id: string
/**
* Server-issued turn id, set as soon as the response headers arrive
* (`X-Turn-Id`) for fresh sends, or from the active-turn payload on
* resume. Required for the historic-files fallback fetch; absent on
* the brief optimistic window before the first header.
*/
turnId?: string | null
userText: string
userAttachments?: UserAttachmentPreview[]
parts: AssistantPart[]
/** Files produced during this turn (openclaw only in v1). */
producedFiles?: ConversationTurnFile[]
done: boolean
timestamp: number
}
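According to the comment on `ConversationTurnFile`, it mirrors the harness wire shape minus the stream-only fields. Here is a minimal sketch of that narrowing. It assumes the extra wire fields resemble `createdAt` and `detectedBy` from `ProducedFile` in the Outputs types file of this diff; the actual fields of `HarnessProducedFile` are not shown. `WireProducedFile` and `toTurnFile` are hypothetical names.

```typescript
// Local copy so the snippet stands alone.
interface ConversationTurnFile {
  id: string
  path: string
  size: number
  mtimeMs: number
}

// Assumed wire shape: the card fields plus extras, modeled on ProducedFile.
interface WireProducedFile extends ConversationTurnFile {
  createdAt: number
  detectedBy: 'diff' | 'tool'
}

// Copy only the four fields the inline card reads. Listing them explicitly,
// rather than spreading, keeps extra wire fields out of client state.
function toTurnFile(f: WireProducedFile): ConversationTurnFile {
  return { id: f.id, path: f.path, size: f.size, mtimeMs: f.mtimeMs }
}
```

The structural typing in TypeScript would also accept a `WireProducedFile` wherever a `ConversationTurnFile` is expected. The explicit copy matters only if the object is later serialized or compared field-by-field.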
@@ -1,126 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Pure helpers used by the artifact card and the Outputs rail.
* Display formatting only — no React, no fetch, no DOM. Anything
* stateful belongs in `./useAgentOutputs` or `./useFilePreview`.
*/

import { buildAgentApiUrl } from '@/entrypoints/app/agents/agent-api-url'

/**
* Coarse classification of a file's intended preview / icon path.
* Mirrors the server-side `FilePreviewKind` minus `missing` — the
* client only ever computes a kind for a row it already has.
*/
export type FileKind = 'text' | 'image' | 'pdf' | 'binary'

const TEXT_EXTENSIONS = new Set([
'txt',
'md',
'markdown',
'json',
'jsonl',
'csv',
'tsv',
'xml',
'yaml',
'yml',
'toml',
'ini',
'log',
'html',
'htm',
'css',
'js',
'mjs',
'cjs',
'ts',
'tsx',
'jsx',
'py',
'rb',
'go',
'rs',
'java',
'kt',
'swift',
'c',
'h',
'cpp',
'hpp',
'sh',
'zsh',
'bash',
'sql',
'svg',
])

const IMAGE_EXTENSIONS = new Set([
'png',
'jpg',
'jpeg',
'gif',
'webp',
'bmp',
'ico',
'heic',
'heif',
])

/** Best-effort kind based on extension only. Server's preview API
* is the source of truth for actual rendering — this is just for
* picking an icon / sort hint without a network round-trip. */
export function inferFileKind(path: string): FileKind {
const ext = extensionOf(path).toLowerCase()
if (ext === 'pdf') return 'pdf'
if (IMAGE_EXTENSIONS.has(ext)) return 'image'
if (TEXT_EXTENSIONS.has(ext)) return 'text'
return 'binary'
}

/** Plain extension without the leading dot. Empty string when none. */
export function extensionOf(path: string): string {
const dot = path.lastIndexOf('.')
if (dot === -1) return ''
const slash = path.lastIndexOf('/')
if (dot < slash) return ''
return path.slice(dot + 1)
}

/** File name (final path segment), no directory prefix. */
export function basenameOf(path: string): string {
const slash = path.lastIndexOf('/')
return slash === -1 ? path : path.slice(slash + 1)
}

const SIZE_UNITS = ['B', 'KB', 'MB', 'GB', 'TB'] as const

/** "2.4 MB" / "340 KB" / "78 B" — for the artifact card's right-side
* metadata. Not localised; the rail uses one space + the unit. */
export function formatFileSize(bytes: number): string {
if (!Number.isFinite(bytes) || bytes < 0) return '—'
if (bytes < 1024) return `${bytes} ${SIZE_UNITS[0]}`
let value = bytes
let unit = 0
while (value >= 1024 && unit < SIZE_UNITS.length - 1) {
value /= 1024
unit += 1
}
// 1-digit precision below 10, integer above — feels less noisy.
const formatted = value < 10 ? value.toFixed(1) : Math.round(value).toString()
return `${formatted} ${SIZE_UNITS[unit]}`
}

/**
* Build the per-file download URL using the same agent-api root the
* rest of the harness hits. Returned URL is already absolute.
*/
export function buildFileDownloadUrl(baseUrl: string, fileId: string): string {
return buildAgentApiUrl(
baseUrl,
`/files/${encodeURIComponent(fileId)}/download`,
)
}
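The doc comment on `formatFileSize` gives "2.4 MB", "340 KB" and "78 B" as sample outputs. The snippet below copies the function from the hunk above so it runs on its own and checks those samples. The only change is that the placeholder is written as the escape `'\u2014'`, which is the same character.

```typescript
const SIZE_UNITS = ['B', 'KB', 'MB', 'GB', 'TB'] as const

// Standalone copy of formatFileSize from the hunk above.
function formatFileSize(bytes: number): string {
  if (!Number.isFinite(bytes) || bytes < 0) return '\u2014' // placeholder for invalid input
  if (bytes < 1024) return `${bytes} ${SIZE_UNITS[0]}`
  let value = bytes
  let unit = 0
  // Divide down until the value fits the unit, capping at TB.
  while (value >= 1024 && unit < SIZE_UNITS.length - 1) {
    value /= 1024
    unit += 1
  }
  // One decimal below 10, whole numbers from 10 up.
  const formatted = value < 10 ? value.toFixed(1) : Math.round(value).toString()
  return `${formatted} ${SIZE_UNITS[unit]}`
}

console.log(formatFileSize(78)) // "78 B"
console.log(formatFileSize(340 * 1024)) // "340 KB"
console.log(formatFileSize(Math.round(2.4 * 1024 * 1024))) // "2.4 MB"
console.log(formatFileSize(10239)) // "10.0 KB"
```

The last line shows an edge of the "one decimal below 10" rule. 10239 bytes is about 9.999 KB, so it takes the `toFixed(1)` branch, and that rounding prints `10.0 KB` rather than `10 KB`.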
@@ -1,32 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*/

export {
basenameOf,
buildFileDownloadUrl,
extensionOf,
type FileKind,
formatFileSize,
inferFileKind,
} from './file-helpers'
export type {
BinaryFilePreview,
FilePreview,
FilePreviewKind,
ImageFilePreview,
MissingFilePreview,
PdfFilePreview,
ProducedFile,
ProducedFilesRailGroup,
TextFilePreview,
} from './types'
export {
useAgentOutputs,
useAgentTurnFiles,
useInvalidateAgentOutputs,
useRefreshAgentOutputs,
} from './useAgentOutputs'
export { useFilePreview } from './useFilePreview'
@@ -1,75 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Wire types shared by the inline artifact card and the per-agent
* Outputs rail. These mirror `ProducedFileEntry` /
* `ProducedFilesRailGroup` on the server and the `FilePreview`
* discriminated union from `apps/server/src/api/services/openclaw/file-preview.ts`.
*
* The schema mirror is deliberate (vs sharing a workspace package)
* because the server keeps the on-disk row shape — `agentDefinitionId`,
* `sessionKey` — out of the wire payload. Dropping those columns at the
* type boundary keeps the client honest about what it can refer to.
*/

export interface ProducedFile {
id: string
/** Workspace-relative POSIX path. */
path: string
size: number
mtimeMs: number
/** Server clock when the file was first attributed to its turn. */
createdAt: number
detectedBy: 'diff' | 'tool'
}

export interface ProducedFilesRailGroup {
turnId: string
/** First non-blank line of the user prompt that initiated this turn. */
turnPrompt: string
createdAt: number
files: ProducedFile[]
}

export type FilePreviewKind = 'text' | 'image' | 'pdf' | 'binary' | 'missing'

interface BasePreview {
kind: FilePreviewKind
mimeType: string
size: number
mtimeMs: number
}

export interface TextFilePreview extends BasePreview {
kind: 'text'
snippet: string
/** True when the on-disk file is larger than the server's snippet cap. */
truncated: boolean
}

export interface ImageFilePreview extends BasePreview {
kind: 'image'
/** Base64 data URL (incl. `data:` prefix). Suitable for `<img src>`. */
dataUrl: string
}

export interface PdfFilePreview extends BasePreview {
kind: 'pdf'
}

export interface BinaryFilePreview extends BasePreview {
kind: 'binary'
}

export interface MissingFilePreview {
kind: 'missing'
}

export type FilePreview =
| TextFilePreview
| ImageFilePreview
| PdfFilePreview
| BinaryFilePreview
| MissingFilePreview
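`FilePreview` is a discriminated union keyed on `kind`, so a consumer can switch on that field and get a narrowed type in each branch. The sketch below shows that pattern with a compile-time exhaustiveness check. The interfaces are trimmed copies in which `BasePreview` is reduced to `size`. `previewSummary` and its display strings are hypothetical, not taken from the card or rail components.

```typescript
// Trimmed copies of the preview types (BasePreview reduced to `size`).
interface TextFilePreview { kind: 'text'; size: number; snippet: string; truncated: boolean }
interface ImageFilePreview { kind: 'image'; size: number; dataUrl: string }
interface PdfFilePreview { kind: 'pdf'; size: number }
interface BinaryFilePreview { kind: 'binary'; size: number }
interface MissingFilePreview { kind: 'missing' }

type FilePreview =
  | TextFilePreview
  | ImageFilePreview
  | PdfFilePreview
  | BinaryFilePreview
  | MissingFilePreview

// Hypothetical helper: each branch sees only the fields its kind carries, so
// `snippet` is reachable only for text and `size` is never read on 'missing'.
function previewSummary(p: FilePreview): string {
  switch (p.kind) {
    case 'text':
      return p.truncated ? `${p.snippet} [truncated]` : p.snippet
    case 'image':
      return `image (${p.size} bytes)`
    case 'pdf':
    case 'binary':
      return `${p.kind} (${p.size} bytes), download to view`
    case 'missing':
      return 'file not found'
    default: {
      // Exhaustiveness check: a new kind added to the union without a case
      // here makes this assignment a compile error.
      const unreachable: never = p
      return unreachable
    }
  }
}
```

Because `MissingFilePreview` does not extend `BasePreview`, reading `p.size` before checking `kind` is a type error. The switch is what makes the size fields safe to access.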
@@ -1,166 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* React Query hooks backing the per-agent Outputs rail and the
* inline artifact card.
*
* Live updates: the consumer of `useAgentConversation` (see Phase 5)
* is expected to call `useInvalidateAgentOutputs(agentId)` whenever
* an assistant turn completes, so the rail picks up the new
* `produced_files` rows the server attributed during that turn.
* No SSE channel here — invalidation off the existing chat-stream
* completion is enough for v1.
*/

import { useMutation, useQuery, useQueryClient } from '@tanstack/react-query'
import {
AGENT_QUERY_KEYS,
agentsFetch,
} from '@/entrypoints/app/agents/useAgents'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type { ProducedFile, ProducedFilesRailGroup } from './types'

interface OutputsResponse {
groups: ProducedFilesRailGroup[]
}

interface TurnFilesResponse {
files: ProducedFile[]
}

export function useAgentOutputs(agentId: string, enabled = true) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()

const query = useQuery<ProducedFilesRailGroup[], Error>({
queryKey: [AGENT_QUERY_KEYS.agentOutputs, baseUrl, agentId],
queryFn: async () => {
const data = await agentsFetch<OutputsResponse>(
baseUrl as string,
`/${encodeURIComponent(agentId)}/files`,
)
return data.groups ?? []
},
enabled: Boolean(baseUrl) && !urlLoading && enabled && Boolean(agentId),
})

return {
groups: query.data ?? [],
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}

/**
* Per-turn fetch for the inline artifact card. Used both as the
* fallback when an SSE `produced_files` event was missed, and to
* rehydrate a turn the user scrolled back to.
*/
export function useAgentTurnFiles(
agentId: string,
turnId: string | null,
enabled = true,
) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()

const query = useQuery<ProducedFile[], Error>({
queryKey: [AGENT_QUERY_KEYS.agentTurnFiles, baseUrl, agentId, turnId],
queryFn: async () => {
const data = await agentsFetch<TurnFilesResponse>(
baseUrl as string,
`/${encodeURIComponent(agentId)}/files/turn/${encodeURIComponent(
turnId as string,
)}`,
)
return data.files ?? []
},
enabled:
Boolean(baseUrl) &&
!urlLoading &&
enabled &&
Boolean(agentId) &&
Boolean(turnId),
})

return {
files: query.data ?? [],
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}

/**
* Returns a callable that invalidates outputs / turn-files queries
* for one agent across any baseUrl. Call after an assistant turn
* completes so the rail (and the inline file-card strip) pick up
* the new attributed rows. Cheap when the queries aren't mounted
* — react-query just marks the cached value stale.
*
* Implementation note: react-query's `invalidateQueries({ queryKey })`
* does positional partial-match, so passing `undefined` as the
* baseUrl placeholder does NOT match a cached `[…, baseUrl, …]`
* key — the cache stayed stale. Use a predicate so we ignore the
* baseUrl position entirely.
*/
export function useInvalidateAgentOutputs() {
const queryClient = useQueryClient()
return async (agentId: string, turnId?: string) => {
await Promise.all([
queryClient.invalidateQueries({
predicate: (query) => {
const key = query.queryKey
return (
Array.isArray(key) &&
key[0] === AGENT_QUERY_KEYS.agentOutputs &&
key[2] === agentId
)
},
}),
queryClient.invalidateQueries({
predicate: (query) => {
const key = query.queryKey
if (
!Array.isArray(key) ||
key[0] !== AGENT_QUERY_KEYS.agentTurnFiles ||
key[2] !== agentId
) {
return false
}
// When a turnId was supplied, scope to just that turn's
// entry. Otherwise flush every cached turn for this agent.
return turnId ? key[3] === turnId : true
},
}),
])
}
}

/**
* Tiny mutation wrapper so the Outputs rail's "Refresh" button can
* surface an `isPending` indicator while the new query is in flight.
* No body — just triggers `refetch` on the rail's query for this
* agent and resolves when it settles.
*/
export function useRefreshAgentOutputs(agentId: string) {
const queryClient = useQueryClient()
const { baseUrl } = useAgentServerUrl()
return useMutation({
mutationFn: async () => {
await queryClient.refetchQueries({
queryKey: [AGENT_QUERY_KEYS.agentOutputs, baseUrl, agentId],
exact: true,
})
},
})
}
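The implementation note explains why `useInvalidateAgentOutputs` uses predicates instead of a key prefix. As written, the hook takes no arguments. The agent id (and optional turn id) is passed to the callback it returns, even though the file header writes the call as `useInvalidateAgentOutputs(agentId)`. The sketch below models that matching without React Query. `positionalMatch` is a deliberately simplified model of positional partial matching, not TanStack Query's own matcher. The two `matches*` functions restate the predicates from the hook. The key strings are copied from `AGENT_QUERY_KEYS`, and the base URL is a made-up example.

```typescript
type QueryKey = readonly unknown[]

// Values copied from AGENT_QUERY_KEYS.
const OUTPUTS = 'agent-harness-agent-outputs'
const TURN_FILES = 'agent-harness-agent-turn-files'

// Simplified model of positional partial matching: every element of the
// filter must strictly equal the key element at the same index.
function positionalMatch(key: QueryKey, filter: QueryKey): boolean {
  return filter.every((part, i) => key[i] === part)
}

// Same checks as the hook's predicates: key[0] is the family and key[2] the
// agentId. key[1] (baseUrl) is never inspected.
function matchesOutputs(key: QueryKey, agentId: string): boolean {
  return key[0] === OUTPUTS && key[2] === agentId
}

function matchesTurnFiles(key: QueryKey, agentId: string, turnId?: string): boolean {
  if (key[0] !== TURN_FILES || key[2] !== agentId) return false
  return turnId ? key[3] === turnId : true
}

// Hypothetical cached key for an agent whose server runs on port 9100.
const cached: QueryKey = [OUTPUTS, 'http://127.0.0.1:9100', 'agent-1']
console.log(positionalMatch(cached, [OUTPUTS, undefined, 'agent-1'])) // false
console.log(matchesOutputs(cached, 'agent-1')) // true
```

The `undefined` placeholder fails because the comparison is strict equality against the real base URL string. The predicate form avoids the problem by never looking at index 1.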
@@ -1,49 +0,0 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*
* Single-file preview hook used by the inline artifact card and the
* Outputs rail's preview Sheet. Always opt-in (`enabled`) — the
* preview is fetched only when the user clicks a row, never
* eagerly.
*/

import { useQuery } from '@tanstack/react-query'
import {
AGENT_QUERY_KEYS,
agentsFetch,
} from '@/entrypoints/app/agents/useAgents'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
import type { FilePreview } from './types'

export function useFilePreview(fileId: string | null, enabled = true) {
const {
baseUrl,
isLoading: urlLoading,
error: urlError,
} = useAgentServerUrl()

const query = useQuery<FilePreview, Error>({
queryKey: [AGENT_QUERY_KEYS.filePreview, baseUrl, fileId],
queryFn: async () => {
return agentsFetch<FilePreview>(
baseUrl as string,
`/files/${encodeURIComponent(fileId as string)}/preview`,
)
},
enabled: Boolean(baseUrl) && !urlLoading && enabled && Boolean(fileId),
// Previews are immutable for a given fileId — once loaded, never
// refetch on focus / reconnect. They go stale only when the
// underlying file is removed (rare in v1; no rename / delete).
staleTime: Infinity,
gcTime: 5 * 60 * 1000,
})

return {
preview: query.data ?? null,
loading: query.isLoading || urlLoading,
error: query.error ?? urlError,
refetch: query.refetch,
}
}
@@ -1,76 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { stageAttachment } from './attachments'

function restoreGlobal(name: string, value: unknown) {
if (value === undefined) {
Reflect.deleteProperty(globalThis, name)
return
}
Reflect.set(globalThis, name, value)
}

describe('stageAttachment', () => {
it('uses the recompressed blob media type for large images', async () => {
const originalCreateImageBitmap = Reflect.get(
globalThis,
'createImageBitmap',
)
const originalOffscreenCanvas = Reflect.get(globalThis, 'OffscreenCanvas')
const originalHTMLCanvasElement = Reflect.get(
globalThis,
'HTMLCanvasElement',
)

class FakeOffscreenCanvas {
width: number
height: number

constructor(width: number, height: number) {
this.width = width
this.height = height
}

getContext() {
return {
drawImage() {},
}
}

async convertToBlob(options: { type?: string }) {
return new Blob([new Uint8Array([9, 8, 7])], {
type: options.type ?? 'image/jpeg',
})
}
}

try {
Reflect.set(globalThis, 'createImageBitmap', async () => ({
width: 4096,
height: 2048,
close() {},
}))
Reflect.set(globalThis, 'OffscreenCanvas', FakeOffscreenCanvas)
Reflect.set(globalThis, 'HTMLCanvasElement', class HTMLCanvasElement {})

const file = new File([new Uint8Array(2 * 1024 * 1024)], 'shot.png', {
type: 'image/png',
})

const result = await stageAttachment(file)

expect(result.ok).toBe(true)
if (!result.ok) throw new Error(result.error.message)
expect(result.attachment.mediaType).toBe('image/jpeg')
expect(result.attachment.dataUrl).toStartWith('data:image/jpeg;base64,')
expect(result.attachment.payload).toMatchObject({
kind: 'image',
mediaType: 'image/jpeg',
dataUrl: result.attachment.dataUrl,
})
} finally {
restoreGlobal('createImageBitmap', originalCreateImageBitmap)
restoreGlobal('OffscreenCanvas', originalOffscreenCanvas)
restoreGlobal('HTMLCanvasElement', originalHTMLCanvasElement)
}
})
})
@@ -100,7 +100,6 @@ export async function stageAttachment(
try {
const compressed = await compressImageIfNeeded(file)
const dataUrl = await readAsDataUrl(compressed)
const encodedMediaType = compressed.type || mediaType
// Rough byte ceiling — `data:image/png;base64,...` doubles size with
// base64. Reject early so we never POST something the route will 400.
if (dataUrl.length > MAX_IMAGE_BYTES * 2) {
@@ -119,12 +118,12 @@ export async function stageAttachment(
attachment: {
id: makeId(),
kind: 'image',
mediaType: encodedMediaType,
mediaType,
name: file.name || 'image',
dataUrl,
payload: {
kind: 'image',
mediaType: encodedMediaType,
mediaType,
dataUrl,
name: file.name || undefined,
},
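The size check in `stageAttachment` compares `dataUrl.length` with `MAX_IMAGE_BYTES * 2`, and its comment says base64 "doubles" the size. Standard base64 actually emits 4 characters for every 3 input bytes, with the last group padded. It also adds the `data:<type>;base64,` prefix, so the 2x multiplier is a loose ceiling rather than an exact doubling. The sketch below computes the exact length. `dataUrlLength` is a hypothetical helper. The real value of `MAX_IMAGE_BYTES` is not shown in this diff, so the 5 MiB figure is only an assumption.

```typescript
// Exact length of a base64 data URL carrying `byteLength` bytes of `mediaType`.
// Base64 emits 4 characters per 3-byte group and pads the final group to 4.
function dataUrlLength(byteLength: number, mediaType: string): number {
  const prefix = `data:${mediaType};base64,`
  return prefix.length + 4 * Math.ceil(byteLength / 3)
}

// Hypothetical limit; the real MAX_IMAGE_BYTES is defined elsewhere.
const ASSUMED_MAX_IMAGE_BYTES = 5 * 1024 * 1024

// A payload exactly at the byte limit stays well under the 2x character ceiling.
const atLimit = dataUrlLength(ASSUMED_MAX_IMAGE_BYTES, 'image/jpeg')
console.log(atLimit <= ASSUMED_MAX_IMAGE_BYTES * 2) // true
```

With these numbers, a 5 MiB JPEG encodes to 6,990,531 characters against a ceiling of 10,485,760. So under this assumed limit, the check would also accept images somewhat larger than `MAX_IMAGE_BYTES` itself.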
@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
# If BrowserOS is installed but not running
browseros-cli launch # opens BrowserOS, waits for server

# Configure the CLI with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Configure the CLI (auto-discovers running BrowserOS)
browseros-cli init --auto # detects server URL and saves config

# Verify connection
browseros-cli health
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
browseros-cli init # interactive — prompts for URL
```

Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).

### CLI updates
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
| `--debug` | `BOS_DEBUG=1` | Debug output |
| `--timeout, -t` | | Request timeout (default: 2m) |

Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file

If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.

## Testing
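The two "Priority for server URL" lines in this hunk disagree on whether `~/.browseros/server.json` takes part, but both follow the same rule: the first source that yields a URL wins. The sketch below expresses that rule with the ordering passed in, so either chain can be plugged in. It is a TypeScript illustration under that assumption, not the Go CLI's code. `UrlSource` and `resolveServerUrl` are hypothetical names, and the URLs are made-up examples.

```typescript
// One candidate source for the server URL; `read` returns '' when unset.
interface UrlSource {
  name: string
  read: () => string
}

// Walk the sources in priority order and return the first non-empty URL,
// plus the name of the source that supplied it. null means nothing is configured.
function resolveServerUrl(sources: UrlSource[]): { url: string; from: string } | null {
  for (const source of sources) {
    const url = source.read().trim()
    if (url !== '') return { url, from: source.name }
  }
  return null
}

// Example: no --server flag, env var set, config file also set.
const flag = ''
const env = 'http://127.0.0.1:9200'
const resolved = resolveServerUrl([
  { name: '--server', read: () => flag },
  { name: 'BROWSEROS_URL', read: () => env },
  { name: 'config file', read: () => 'http://127.0.0.1:9000' },
])
console.log(resolved?.from) // BROWSEROS_URL
```

The `null` result corresponds to the README's "no server URL is configured" case, where the CLI prints setup instructions instead of guessing.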
@@ -179,7 +179,7 @@ apps/cli/
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
├── cmd/
│ ├── root.go # Root command, global flags
│ ├── init.go # Server URL configuration (URL arg or interactive)
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
│ ├── install.go # install (download BrowserOS for current platform)
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
│ ├── open.go # open (new_page / new_hidden_page)
@@ -17,6 +17,8 @@ import (
)

func init() {
var autoDiscover bool

cmd := &cobra.Command{
Use: "init [url]",
Short: "Configure the BrowserOS server connection",
@@ -32,8 +34,9 @@ You can provide the full URL or just the port number:
browseros-cli init http://127.0.0.1:9000/mcp
browseros-cli init 9000

Modes:
Three modes:
browseros-cli init <url> Non-interactive (full URL or port number)
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
browseros-cli init Interactive prompt`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.MaximumNArgs(1),
@@ -46,9 +49,22 @@ Modes:

switch {
case len(args) == 1:
// Non-interactive: URL provided as argument
input = args[0]

case autoDiscover:
// Auto-discover: server.json → config → probe common ports
discovered := probeRunningServer()
if discovered == "" {
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
" If not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", 1)
}
input = discovered
fmt.Printf("Auto-discovered server at %s\n", input)

default:
// Interactive prompt (original behavior)
fmt.Println()
bold.Println("BrowserOS CLI Setup")
fmt.Println()
@@ -79,14 +95,12 @@ Modes:
output.Errorf(1, "invalid URL: %s", input)
}

// Verify connectivity
fmt.Printf("Checking connection to %s ...\n", baseURL)
client := &http.Client{Timeout: 5 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
output.Errorf(1, "cannot connect to %s: %v\n\n"+
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
"Then run: browseros-cli init <Server URL>\n"+
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
}
resp.Body.Close()

@@ -107,5 +121,6 @@ Modes:
},
}

cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
rootCmd.AddCommand(cmd)
}
@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)

After installation:
browseros-cli launch # start BrowserOS
browseros-cli init <url> # configure the CLI with the Server URL`,
browseros-cli init --auto # configure the CLI`,
Annotations: map[string]string{"group": "Setup:"},
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
@@ -81,7 +81,7 @@ After installation:
fmt.Println()
bold.Println("Next steps:")
dim.Println(" browseros-cli launch # start BrowserOS")
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
dim.Println(" browseros-cli init --auto # configure the CLI")
},
}
@@ -1,7 +1,6 @@
package cmd

import (
"encoding/json"
"fmt"
"net/http"
"os"
@@ -39,7 +38,6 @@ If BrowserOS is already running, reports the server URL.`,

if url := probeRunningServer(); url != "" {
green.Printf("BrowserOS is already running at %s\n", url)
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
return
}

@@ -65,7 +63,7 @@ If BrowserOS is already running, reports the server URL.`,

green.Printf("BrowserOS is ready at %s\n", url)
fmt.Println()
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
dim.Println("Next: browseros-cli init --auto")
},
}

@@ -77,77 +75,39 @@ If BrowserOS is already running, reports the server URL.`,
// Server probing
// ---------------------------------------------------------------------------

var commonBrowserOSPorts = []int{9100, 9200, 9300}

// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
// probeRunningServer checks server.json, config, and common ports for a running server.
func probeRunningServer() string {
client := &http.Client{Timeout: 2 * time.Second}
check := func(baseURL string) bool {
client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}

if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
// 1. server.json — written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && check(url) {
return url
}

if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
// 2. Saved config / env var
if url := defaultServerURL(); url != "" && check(url) {
return url
}

return probeCommonServerPorts(client)
}

func checkServerHealth(client *http.Client, baseURL string) bool {
resp, err := client.Get(baseURL + "/health")
if err != nil {
return false
}
resp.Body.Close()
return resp.StatusCode == 200
}

func probeCommonServerPorts(client *http.Client) string {
for _, port := range commonBrowserOSPorts {
// 3. Probe common BrowserOS ports as last resort
for _, port := range []int{9100, 9200, 9300} {
url := fmt.Sprintf("http://127.0.0.1:%d", port)
if checkServerHealth(client, url) {
if check(url) {
return url
}
}

return ""
}

type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}

// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
//
// Normal command resolution must not call this because it can override a URL the
// user explicitly saved with `browseros-cli init <Server URL>`.
func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}

data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}

var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}

return normalizeServerURL(sc.URL)
}

func mcpEndpointURL(baseURL string) string {
return strings.TrimSuffix(baseURL, "/") + "/mcp"
}

// ---------------------------------------------------------------------------
// Platform-native installation detection
// ---------------------------------------------------------------------------
@@ -157,8 +117,7 @@ func mcpEndpointURL(baseURL string) string {
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
//
// and registry uninstall key (per-user Chromium install pattern)
// and registry uninstall key (per-user Chromium install pattern)
func isBrowserOSInstalled() bool {
switch runtime.GOOS {
case "darwin":
@@ -312,11 +271,14 @@ func waitForServer(maxWait time.Duration) (string, bool) {

for time.Now().Before(deadline) {
// server.json is written by BrowserOS on startup with the actual port
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
return url, true
}
if url := probeCommonServerPorts(client); url != "" {
return url, true
if url := loadBrowserosServerURL(); url != "" {
resp, err := client.Get(url + "/health")
if err == nil {
resp.Body.Close()
if resp.StatusCode == 200 {
return url, true
}
}
}
fmt.Print(".")
time.Sleep(1 * time.Second)
@@ -1,99 +0,0 @@
package cmd

import (
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"os"
"path/filepath"
"strconv"
"testing"
"time"

"browseros-cli/config"
)

func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")

discoveredServer := newHealthyServer(t)
configServer := newHealthyServer(t)

serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}

got := probeRunningServer()
if got != normalizeServerURL(discoveredServer.URL) {
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
}
}

func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)

server := newHealthyServer(t)
port := serverPort(t, server.URL)

originalPorts := commonBrowserOSPorts
commonBrowserOSPorts = []int{port}
t.Cleanup(func() {
commonBrowserOSPorts = originalPorts
})

got, ok := waitForServer(100 * time.Millisecond)
if !ok {
t.Fatal("waitForServer() ok = false, want true")
}
if got != normalizeServerURL(server.URL) {
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
}
}

func newHealthyServer(t *testing.T) *httptest.Server {
t.Helper()

server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/health" {
http.NotFound(w, r)
return
}
w.WriteHeader(http.StatusOK)
}))
t.Cleanup(server.Close)
return server
}

func serverPort(t *testing.T, rawURL string) int {
t.Helper()

parsed, err := url.Parse(rawURL)
if err != nil {
t.Fatalf("url.Parse() error = %v", err)
}
_, portText, err := net.SplitHostPort(parsed.Host)
if err != nil {
t.Fatalf("net.SplitHostPort() error = %v", err)
}
port, err := strconv.Atoi(portText)
if err != nil {
t.Fatalf("strconv.Atoi() error = %v", err)
}
return port
}
@@ -2,8 +2,10 @@ package cmd

import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"time"
@@ -287,15 +289,18 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
}
}

// defaultServerURL returns the implicit target from user-controlled settings only.
//
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
// ignore it so a saved URL is not silently overridden by another running server.
func defaultServerURL() string {
// 1. Explicit env var always wins
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
return env
}

// 2. Live discovery file from running BrowserOS (most current)
if url := loadBrowserosServerURL(); url != "" {
return url
}

// 3. Saved config (may be stale if port changed)
cfg, err := config.Load()
if err == nil {
if url := normalizeServerURL(cfg.ServerURL); url != "" {
@@ -306,6 +311,33 @@ func defaultServerURL() string {
return ""
}

type serverDiscoveryConfig struct {
ServerPort int `json:"server_port"`
URL string `json:"url"`
ServerVersion string `json:"server_version"`
BrowserOSVersion string `json:"browseros_version,omitempty"`
ChromiumVersion string `json:"chromium_version,omitempty"`
}

func loadBrowserosServerURL() string {
home, err := os.UserHomeDir()
if err != nil {
return ""
}

data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
if err != nil {
return ""
}

var sc serverDiscoveryConfig
if err := json.Unmarshal(data, &sc); err != nil {
return ""
}

return normalizeServerURL(sc.URL)
}

func normalizeServerURL(raw string) string {
normalized := strings.TrimSpace(raw)

@@ -337,10 +369,8 @@ func validateServerURL(raw string) (string, error) {

return "", fmt.Errorf(
"BrowserOS server URL is not configured.\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
" If BrowserOS is running: browseros-cli init --auto\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install",
)
}

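The two sides of the `defaultServerURL` hunk disagree on precedence. One side resolves `BROWSEROS_URL`, then the runtime discovery file, then the saved config. The other side, which carries the updated doc comment, skips the discovery file so that a URL saved via `browseros-cli init` is never overridden by another running server; `TestDefaultServerURLIgnoresBrowserOSServerJSON` asserts that behavior. The sketch below models both orders with injected inputs in place of the environment, the filesystem, and `config.Load()`. `sources`, `normalize`, and `resolve` are hypothetical, and `normalize` assumes the same trimming rules as the earlier discovery sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// sources bundles the inputs defaultServerURL consults. Each field is a
// hypothetical stand-in: os.Getenv("BROWSEROS_URL"), the server.json URL,
// and config.Load().ServerURL.
type sources struct {
	Env       string
	Discovery string
	Saved     string
}

// normalize assumes normalizeServerURL trims whitespace, "/" and "/mcp".
func normalize(raw string) string {
	s := strings.TrimSuffix(strings.TrimSpace(raw), "/")
	return strings.TrimSuffix(strings.TrimSuffix(s, "/mcp"), "/")
}

// resolve returns the first non-empty normalized candidate.
// useDiscovery=false models the "user-controlled settings only" variant;
// useDiscovery=true models the variant that checks server.json between the
// env var and the saved config.
func resolve(src sources, useDiscovery bool) string {
	candidates := []string{src.Env}
	if useDiscovery {
		candidates = append(candidates, src.Discovery)
	}
	candidates = append(candidates, src.Saved)
	for _, c := range candidates {
		if u := normalize(c); u != "" {
			return u
		}
	}
	return ""
}

func main() {
	src := sources{Discovery: "http://127.0.0.1:9999", Saved: "http://127.0.0.1:9115/mcp"}
	fmt.Println(resolve(src, false)) // http://127.0.0.1:9115
	fmt.Println(resolve(src, true))  // http://127.0.0.1:9999
}
```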
@@ -1,13 +1,8 @@
package cmd

import (
"os"
"path/filepath"
"strings"
"testing"
"time"

"browseros-cli/config"
)

func TestSetVersionUpdatesRootCommand(t *testing.T) {
@@ -105,76 +100,6 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
}
}

func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")

if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}

got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}

func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")

if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
t.Fatalf("config.Save() error = %v", err)
}

got := defaultServerURL()
if got != "http://127.0.0.1:9115" {
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}

func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
home := t.TempDir()
t.Setenv("HOME", home)
t.Setenv("USERPROFILE", home)
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
t.Setenv("BROWSEROS_URL", "")

serverDir := filepath.Join(home, ".browseros")
if err := os.MkdirAll(serverDir, 0755); err != nil {
t.Fatalf("os.MkdirAll() error = %v", err)
}
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
t.Fatalf("os.WriteFile() error = %v", err)
}

if got := defaultServerURL(); got != "" {
t.Fatalf("defaultServerURL() = %q, want empty", got)
}
}

func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
if got != "http://127.0.0.1:9115" {
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
}
}

func TestValidateServerURLExplainsManualInit(t *testing.T) {
_, err := validateServerURL("")
if err == nil {
t.Fatal("validateServerURL() error = nil, want setup instructions")
}
msg := err.Error()
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
}
if strings.Contains(msg, "init --auto") {
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
}
}

func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
done := make(chan struct{})
returned := make(chan struct{})

@@ -44,7 +44,10 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {

session, err := sdkClient.Connect(ctx, transport, nil)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
}
return session, nil
}
@@ -184,7 +187,10 @@ func (c *Client) Status() (map[string]any, error) {
func (c *Client) restGET(path string) (map[string]any, error) {
resp, err := c.HTTPClient.Get(c.BaseURL + path)
if err != nil {
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
" If BrowserOS is not running: browseros-cli launch\n"+
" If not installed: browseros-cli install", c.BaseURL, err)
}
defer resp.Body.Close()

@@ -199,14 +205,3 @@ func (c *Client) restGET(path string) (map[string]any, error) {
}
return data, nil
}

// connectionSetupInstructions explains how to recover from a stale or missing server URL.
func connectionSetupInstructions() string {
return "\n\n" +
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
" Save it with: browseros-cli init <Server URL>\n" +
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
" Run once with: browseros-cli --server <Server URL> health\n" +
" If BrowserOS is closed: browseros-cli launch\n" +
" If not installed: browseros-cli install"
}

@@ -31,8 +31,8 @@ browseros-cli install
# Start BrowserOS
browseros-cli launch

# Configure MCP settings with the Server URL from BrowserOS settings
browseros-cli init http://127.0.0.1:9000/mcp
# Auto-configure MCP settings for your AI tools
browseros-cli init --auto

# Verify everything is working
browseros-cli health

packages/browseros-agent/apps/eval/README.md
@@ -9,7 +9,6 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
- **Bun** runtime
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.

## Quick Start

@@ -68,7 +67,7 @@ This lets us run the same suite against multiple model setups without copying th

```txt
agisdk-daily-10 + kimi-fireworks
agisdk-daily-10 + claude-opus
agisdk-daily-10 + claude-sonnet
agisdk-daily-10 + clado-action-000159
```

@@ -80,7 +79,6 @@ For `orchestrator-executor` suites, there can also be an executor model/backend.
|------|-------------|
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |

### Single agent

@@ -121,24 +119,6 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
}
```

### Claude Code

Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.

```json
{
"agent": {
"type": "claude-code",
"model": "opus"
}
}
```

```bash
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
```

## Graders

| Name | Description |
@@ -171,7 +151,6 @@ The `apiKey` field supports two formats:
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
@@ -215,7 +194,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
}
```

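The README's Claude Code section says the runner hands `claude` a task-scoped MCP config that points at the active worker's BrowserOS endpoint. The sketch below reproduces the JSON shape that `buildClaudeCodeMcpConfig` (in the evaluator source later in this diff) writes: a single `browseros` server of type `http`, the URL with `/mcp` appended at most once, and an `X-BrowserOS-Source: sdk-internal` header. It is written in Go only for consistency with the other sketches here; the harness itself is TypeScript. `buildMCPConfig` and the struct names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type mcpServer struct {
	Type    string            `json:"type"`
	URL     string            `json:"url"`
	Headers map[string]string `json:"headers"`
}

type mcpConfig struct {
	MCPServers map[string]mcpServer `json:"mcpServers"`
}

// buildMCPConfig mirrors buildClaudeCodeMcpConfig: strip one trailing slash,
// append "/mcp" unless it is already there, and tag requests with a source header.
func buildMCPConfig(serverURL string) mcpConfig {
	trimmed := strings.TrimSuffix(serverURL, "/")
	url := trimmed
	if !strings.HasSuffix(trimmed, "/mcp") {
		url = trimmed + "/mcp"
	}
	return mcpConfig{MCPServers: map[string]mcpServer{
		"browseros": {
			Type:    "http",
			URL:     url,
			Headers: map[string]string{"X-BrowserOS-Source": "sdk-internal"},
		},
	}}
}

func main() {
	// Write this to a per-task file and pass it via --mcp-config.
	out, err := json.MarshalIndent(buildMCPConfig("http://127.0.0.1:9110"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```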
@@ -1,26 +0,0 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "moonshotai/kimi-k2.5",
"apiKey": "OPENROUTER_API_KEY",
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 3,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}
@@ -1,27 +0,0 @@
{
"agent": {
"type": "single",
"provider": "bedrock",
"model": "global.anthropic.claude-opus-4-6-v1",
"region": "AWS_REGION",
"accessKeyId": "AWS_ACCESS_KEY_ID",
"secretAccessKey": "AWS_SECRET_ACCESS_KEY",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 2,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}
@@ -7,8 +7,8 @@
"baseUrl": "https://openrouter.ai/api/v1",
"supportsImages": true
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 3,
"dataset": "../../data/webbench-2of4-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
@@ -21,6 +21,6 @@
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"graders": ["performance_grader"],
"timeout_ms": 1800000
}

@@ -23,7 +23,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

@@ -1,23 +0,0 @@
{
"agent": {
"type": "claude-code",
"model": "opus",
"extraArgs": ["--permission-mode", "bypassPermissions"]
},
"dataset": "../../data/agisdk-real.jsonl",
"num_workers": 1,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}
@@ -14,7 +14,7 @@
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
"headless": true
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"

@@ -1,191 +0,0 @@
#!/usr/bin/env bun

import { mkdir, stat } from 'node:fs/promises'
import { dirname, resolve } from 'node:path'
import { query as claudeQuery } from '@anthropic-ai/claude-agent-sdk'
import { readRunMetricSummary } from '../src/reporting/task-metrics'

export const DEFAULT_REPORT_MODEL = 'claude-opus-4-6'
export const DEFAULT_REPORT_MAX_TURNS = 300

type Env = Record<string, string | undefined>
type ClaudeQuery = (input: unknown) => AsyncIterable<Record<string, unknown>>

export interface ReportAgentInvocation {
inputDir: string
outputPath: string
prompt: string
}

export interface GenerateEvalReportOptions {
inputDir: string
outputPath: string
runAgent?: (invocation: ReportAgentInvocation) => Promise<void>
}

interface ClaudeReportAgentDeps {
query?: ClaudeQuery
env?: Env
}

function usage(): string {
return `Usage: bun scripts/generate-report.ts --input <run-dir> --output <report.html>`
}

function parseArgs(
argv: string[],
): Pick<GenerateEvalReportOptions, 'inputDir' | 'outputPath'> {
let inputDir = ''
let outputPath = ''
for (let i = 0; i < argv.length; i++) {
const arg = argv[i]
if (arg === '--input' || arg === '--run') {
inputDir = argv[++i] ?? ''
} else if (arg === '--output' || arg === '--out') {
outputPath = argv[++i] ?? ''
} else if (arg === '--help' || arg === '-h') {
console.log(usage())
process.exit(0)
}
}
if (!inputDir || !outputPath) {
throw new Error(usage())
}
return { inputDir, outputPath }
}

function claudeCodeEnv(env: Env): Env {
return {
CLAUDE_CODE_OAUTH_TOKEN: env.CLAUDE_CODE_OAUTH_TOKEN,
ANTHROPIC_API_KEY: env.ANTHROPIC_API_KEY,
HOME: env.HOME,
PATH: env.PATH,
SHELL: env.SHELL,
TMPDIR: env.TMPDIR,
TMP: env.TMP,
TEMP: env.TEMP,
USER: env.USER,
CLAUDECODE: '',
}
}

async function buildReportPrompt(
inputDir: string,
outputPath: string,
): Promise<string> {
const metrics = await readRunMetricSummary(inputDir)

return `Analyze this BrowserOS eval run and write a shareable HTML report.

Run directory: ${inputDir}
Output file to write: ${outputPath}

You are running with the run directory as cwd. Inspect the local artifacts:
- summary.json for run totals and pass rate
- each task directory's metadata.json for query, final answer, timing, screenshots, and grader results
- each task directory's messages.jsonl for tool calls, tool errors, and recent trajectory
- screenshots/ for visual evidence
- grader-artifacts/ when present for grader-specific context

Write the final report directly to the output file path above. Do not print the
report instead of writing it. Do not modify any input artifacts. The only file
you should create or overwrite is the requested report.html.

The report should follow the style and density of the Shadowfax AGI SDK report:
- Title like "AGI SDK Random-10 Failure Report" or a run-specific equivalent
- Run directory and note that screenshots are embedded as data URIs
- Summary cards for total tasks, passed, failed, pass rate, average duration, average steps, and average tool calls
- A Metrics section with compact charts for Duration by task, Steps by task, Tool calls by task, and Tool errors by task
- Task Summary table with task id, status, score, duration, steps, and prompt
- Include tool calls and tool errors in the Task Summary table
- Failure sections with stable anchors using each task id, for example <section id="agisdk-networkin-10">
- For each failed task: Diagnosis, Evidence, Next Check, final screenshot, AGI SDK / grader criteria, final answer, and recent trajectory events
- Make failure links in the summary table point to the task anchors
- Keep the HTML self-contained: inline CSS and embedded final screenshots as data:image/png;base64 URIs
- Escape user/model text correctly so task outputs cannot break the page

Analysis guidance:
- Focus on why the model failed: task understanding, browser/tool usage, missing verification, tool errors, max-step/timeout, bad final answer, or grader ambiguity
- Use messages.jsonl strategically. Do not paste huge DOM outputs into the report. Summarize only the relevant recent trajectory and evidence.
- Limit trajectory analysis to the most relevant 200-300 events/calls across the run. Prefer failed tasks and the final/key actions for each failure.
- If a grader criterion is boolean-only or ambiguous, say so and identify what additional artifact would make it debuggable.

Deterministic run metrics computed from metadata.json and messages.jsonl:
\`\`\`json
${JSON.stringify(metrics, null, 2)}
\`\`\`

After writing the file, verify that ${outputPath} exists and is non-empty.`
}

async function assertRunDir(inputDir: string): Promise<void> {
const inputStat = await stat(inputDir).catch(() => null)
if (!inputStat?.isDirectory()) {
throw new Error(`Not a run directory: ${inputDir}`)
}
}

async function assertReportWritten(outputPath: string): Promise<void> {
const outputStat = await stat(outputPath).catch(() => null)
if (!outputStat?.isFile() || outputStat.size === 0) {
throw new Error(`Report was not written: ${outputPath}`)
}
}

export async function runClaudeCodeReportAgent(
invocation: ReportAgentInvocation,
deps: ClaudeReportAgentDeps = {},
): Promise<void> {
const query = deps.query ?? (claudeQuery as unknown as ClaudeQuery)
let resultSubtype: string | undefined

for await (const message of query({
prompt: invocation.prompt,
options: {
cwd: invocation.inputDir,
model: DEFAULT_REPORT_MODEL,
systemPrompt:
'You are an eval failure analyst. Produce a concise, evidence-backed, self-contained HTML report from local run artifacts.',
permissionMode: 'bypassPermissions',
allowDangerouslySkipPermissions: true,
maxTurns: DEFAULT_REPORT_MAX_TURNS,
env: claudeCodeEnv(deps.env ?? process.env),
},
})) {
if (message.type === 'result') {
resultSubtype =
typeof message.subtype === 'string' ? message.subtype : undefined
}
}

if (resultSubtype && resultSubtype !== 'success') {
throw new Error(`Claude Code report agent failed: ${resultSubtype}`)
}
}

export async function generateEvalReport(
options: GenerateEvalReportOptions,
): Promise<void> {
const inputDir = resolve(options.inputDir)
const outputPath = resolve(options.outputPath)

await assertRunDir(inputDir)
await mkdir(dirname(outputPath), { recursive: true })

const invocation = {
inputDir,
outputPath,
prompt: await buildReportPrompt(inputDir, outputPath),
}
await (options.runAgent ?? runClaudeCodeReportAgent)(invocation)
await assertReportWritten(outputPath)
}

if (import.meta.main) {
try {
await generateEvalReport(parseArgs(Bun.argv.slice(2)))
} catch (error) {
console.error(error instanceof Error ? error.message : String(error))
process.exit(1)
}
}
@@ -1,238 +0,0 @@
import { writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import { DEFAULT_TIMEOUT_MS } from '../../constants'
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
import { withEvalTimeout } from '../../utils/with-eval-timeout'
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
import {
type ClaudeCodeProcessRunner,
createClaudeCodeProcessRunner,
} from './process-runner'
import {
ClaudeCodeStreamParser,
shouldCaptureScreenshotForTool,
} from './stream-parser'

export interface ClaudeCodeEvaluatorDeps {
processRunner?: ClaudeCodeProcessRunner
}

export class ClaudeCodeEvaluator implements AgentEvaluator {
private processRunner: ClaudeCodeProcessRunner

constructor(
private ctx: AgentContext,
deps: ClaudeCodeEvaluatorDeps = {},
) {
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
}

async execute(): Promise<AgentResult> {
const { config, task, capture, taskOutputDir } = this.ctx
const startTime = Date.now()
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS

await capture.messageLogger.logUser(task.query)

if (config.agent.type !== 'claude-code') {
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
}
const agentConfig = config.agent

const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
await writeFile(
mcpConfigPath,
JSON.stringify(
buildClaudeCodeMcpConfig(config.browseros.server_url),
null,
2,
),
)

const parser = new ClaudeCodeStreamParser()
const toolNamesById = new Map<string, string>()
const prompt = buildClaudeCodePrompt(task.query)
const args = buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config: agentConfig,
})

const { terminationReason } = await withEvalTimeout(
timeoutMs,
capture,
async (signal) => {
const runResult = await this.processRunner.run({
executable: agentConfig.claudePath,
args,
cwd: taskOutputDir,
signal,
onStdoutLine: async (line) => {
const events = parser.pushLine(line)
for (const event of events) {
await this.handleStreamEvent(event, toolNamesById)
}
},
})

if (runResult.exitCode !== 0) {
const message =
runResult.stderr.trim() ||
`Claude Code exited with status ${runResult.exitCode}`
capture.addError('agent_execution', message, {
exitCode: runResult.exitCode,
})
if (!parser.getLastText()) {
throw new Error(message)
}
}

for (const error of runResult.streamErrors ?? []) {
capture.addWarning(
'message_logging',
`Claude Code stream event processing failed: ${error}`,
)
}

return runResult
},
)

const endTime = Date.now()
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
const metadata = {
query_id: task.query_id,
dataset: task.dataset,
query: task.query,
started_at: new Date(startTime).toISOString(),
completed_at: new Date(endTime).toISOString(),
total_duration_ms: endTime - startTime,
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
termination_reason: terminationReason,
final_answer: finalAnswer,
errors: capture.getErrors(),
warnings: capture.getWarnings(),
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
agent_config: {
type: 'claude-code' as const,
model: agentConfig.model,
},
grader_results: {},
}

await capture.trajectorySaver.saveMetadata(metadata)

return {
metadata,
messages: capture.getMessages(),
finalAnswer,
}
}

private async handleStreamEvent(
event: UIMessageStreamEvent,
toolNamesById: Map<string, string>,
): Promise<void> {
const { capture, task } = this.ctx
let screenshot: number | undefined

if (event.type === 'tool-input-available') {
toolNamesById.set(event.toolCallId, event.toolName)
if (isPageInput(event.input)) {
capture.setActivePageId(event.input.page)
}
}

if (
event.type === 'tool-output-available' ||
event.type === 'tool-output-error'
) {
const toolName = toolNamesById.get(event.toolCallId)
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
screenshot = await this.captureScreenshot()
}
}

await capture.messageLogger.logStreamEvent(event, screenshot)
capture.emitEvent(task.query_id, {
...event,
...(screenshot !== undefined && { screenshot }),
})
}

private async captureScreenshot(): Promise<number | undefined> {
const { capture, task } = this.ctx
try {
const screenshot = await capture.screenshot.capture(
capture.getActivePageId(),
)
capture.emitEvent(task.query_id, {
type: 'screenshot-captured',
screenshot,
})
return screenshot
} catch {
return undefined
}
}
}

function isPageInput(input: unknown): input is { page: number } {
return (
typeof input === 'object' &&
input !== null &&
'page' in input &&
typeof input.page === 'number'
)
}

function buildClaudeCodePrompt(taskQuery: string): string {
return [
'You are running inside BrowserOS eval.',
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
'When the task is complete, respond with the final answer only.',
'If blocked, explain the blocker clearly.',
'',
`Task: ${taskQuery}`,
].join('\n')
}

function buildClaudeCodeArgs({
prompt,
mcpConfigPath,
config,
}: {
prompt: string
mcpConfigPath: string
config: ClaudeCodeAgentConfig
}): string[] {
const args = [
'-p',
prompt,
'--mcp-config',
mcpConfigPath,
'--strict-mcp-config',
'--output-format',
'stream-json',
'--verbose',
]

if (config.model) args.push('--model', config.model)
args.push(...config.extraArgs)

return args
}
|
||||
function buildClaudeCodeMcpConfig(serverUrl: string) {
|
||||
const trimmed = serverUrl.replace(/\/$/, '')
|
||||
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
|
||||
return {
|
||||
mcpServers: {
|
||||
browseros: {
|
||||
type: 'http',
|
||||
url,
|
||||
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
|
||||
},
|
||||
},
|
||||
}
|
||||
}
|
||||
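The `buildClaudeCodeMcpConfig` helper above removes at most one trailing slash from the server URL. It then appends `/mcp` only when the path does not already end with it. Below is a minimal sketch of that rule as a standalone function. `normalizeMcpUrl` is a hypothetical name, not part of BrowserOS, and the localhost port is a placeholder.

```typescript
// Hypothetical standalone version of the URL rule inside buildClaudeCodeMcpConfig.
function normalizeMcpUrl(serverUrl: string): string {
  // /\/$/ strips exactly one trailing slash, not a run of them.
  const trimmed = serverUrl.replace(/\/$/, '')
  return trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
}

console.log(normalizeMcpUrl('http://localhost:9100')) // http://localhost:9100/mcp
console.log(normalizeMcpUrl('http://localhost:9100/')) // http://localhost:9100/mcp
console.log(normalizeMcpUrl('http://localhost:9100/mcp/')) // http://localhost:9100/mcp
```

Only one slash is stripped, so an input such as `http://localhost:9100//` becomes `http://localhost:9100//mcp` rather than a fully cleaned URL.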
@@ -1,114 +0,0 @@
-export interface ClaudeCodeRunOptions {
-  executable: string
-  args: string[]
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface ClaudeCodeRunResult {
-  exitCode: number
-  stderr: string
-  streamErrors?: string[]
-}
-
-export interface ClaudeCodeProcessRunner {
-  run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
-}
-
-export interface SpawnOptions {
-  cwd: string
-  signal?: AbortSignal
-  onStdoutLine: (line: string) => Promise<void>
-}
-
-export interface CreateClaudeCodeProcessRunnerDeps {
-  spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
-}
-
-export function createClaudeCodeProcessRunner(
-  deps: CreateClaudeCodeProcessRunnerDeps = {},
-): ClaudeCodeProcessRunner {
-  const spawn = deps.spawn ?? spawnClaudeCode
-  return {
-    run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
-      spawn([executable, ...args], { cwd, signal, onStdoutLine }),
-  }
-}
-
-async function spawnClaudeCode(
-  cmd: string[],
-  options: SpawnOptions,
-): Promise<ClaudeCodeRunResult> {
-  const proc = Bun.spawn({
-    cmd,
-    cwd: options.cwd,
-    stdin: 'ignore',
-    stdout: 'pipe',
-    stderr: 'pipe',
-  })
-
-  const abort = () => {
-    try {
-      proc.kill('SIGTERM')
-    } catch {
-      // Process may already have exited.
-    }
-  }
-  options.signal?.addEventListener('abort', abort, { once: true })
-
-  try {
-    const streamErrors: string[] = []
-    const stdoutPromise = readLines(
-      proc.stdout,
-      options.onStdoutLine,
-      streamErrors,
-    )
-    const stderrPromise = new Response(proc.stderr).text()
-    const exitCode = await proc.exited
-    await stdoutPromise
-    const stderr = await stderrPromise
-    return { exitCode, stderr, streamErrors }
-  } finally {
-    options.signal?.removeEventListener('abort', abort)
-  }
-}
-
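`spawnClaudeCode` arms a one-shot `abort` listener that sends `SIGTERM` to the child process. It always removes that listener in `finally`, so a shared `AbortSignal` does not keep a reference to a finished process. Below is a runtime-neutral sketch of the same pattern. `withAbortHandler` is a hypothetical helper, and a counter stands in for `proc.kill('SIGTERM')`.

```typescript
// Hypothetical helper: arm `onAbort` for the duration of `work`, then always
// detach it, mirroring the addEventListener / finally removeEventListener pair.
function withAbortHandler<T>(
  signal: AbortSignal | undefined,
  onAbort: () => void,
  work: () => T,
): T {
  signal?.addEventListener('abort', onAbort, { once: true })
  try {
    return work()
  } finally {
    signal?.removeEventListener('abort', onAbort)
  }
}

let kills = 0
const kill = () => {
  kills++ // stands in for proc.kill('SIGTERM')
}

// Aborting while work runs fires the handler once; a repeat abort() is a no-op.
const running = new AbortController()
withAbortHandler(running.signal, kill, () => {
  running.abort()
  running.abort()
})

// Aborting after work finished does nothing: the listener was removed in finally.
const finished = new AbortController()
withAbortHandler(finished.signal, kill, () => 'done')
finished.abort()

console.log(kills) // 1
```

With `EventTarget`, a listener added after the signal has already fired is never called. A run that starts with an already-aborted signal would therefore need an explicit `signal.aborted` check to be cancelled immediately.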
-async function readLines(
-  stream: ReadableStream<Uint8Array>,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  const reader = stream.getReader()
-  const decoder = new TextDecoder()
-  let buffer = ''
-
-  while (true) {
-    const { done, value } = await reader.read()
-    if (done) break
-
-    buffer += decoder.decode(value, { stream: true })
-    const lines = buffer.split('\n')
-    buffer = lines.pop() ?? ''
-    for (const line of lines) {
-      await emitLine(line, onLine, streamErrors)
-    }
-  }
-
-  buffer += decoder.decode()
-  if (buffer.length > 0) {
-    await emitLine(buffer, onLine, streamErrors)
-  }
-}
-
-async function emitLine(
-  line: string,
-  onLine: (line: string) => Promise<void>,
-  streamErrors: string[],
-): Promise<void> {
-  try {
-    await onLine(line)
-  } catch (error) {
-    streamErrors.push(error instanceof Error ? error.message : String(error))
-  }
-}
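`readLines` decodes chunks incrementally, splits on `\n`, and carries the trailing partial line into the next read. `emitLine` records handler exceptions in `streamErrors` instead of aborting the stream. Below is a synchronous sketch of the same buffering technique over string chunks, so it runs without a child process. `LineBuffer` is a hypothetical class, not part of the codebase.

```typescript
// Hypothetical synchronous analogue of readLines/emitLine: text arrives in
// arbitrary chunks, complete lines go to a callback, and the trailing partial
// line waits in the buffer until more text (or the end) arrives.
class LineBuffer {
  private buffer = ''
  private readonly onLine: (line: string) => void
  readonly errors: string[] = []

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine
  }

  push(chunk: string): void {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    // The last element is '' (chunk ended on a newline) or a partial line.
    this.buffer = lines.pop() ?? ''
    for (const line of lines) this.emit(line)
  }

  // Call once the source is exhausted so a final unterminated line is not lost.
  end(): void {
    if (this.buffer.length > 0) this.emit(this.buffer)
    this.buffer = ''
  }

  private emit(line: string): void {
    try {
      this.onLine(line)
    } catch (error) {
      // Like emitLine: record the failure and keep consuming later lines.
      this.errors.push(error instanceof Error ? error.message : String(error))
    }
  }
}

const seen: string[] = []
const lb = new LineBuffer((line) => {
  if (line === 'bad') throw new Error('cannot parse line')
  seen.push(line)
})
lb.push('{"a":1}\n{"b"')
lb.push(':2}\nbad\ntail')
lb.end()
console.log(seen.join(' | ')) // {"a":1} | {"b":2} | tail
console.log(lb.errors.length) // 1
```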
@@ -1,142 +0,0 @@
-import { randomUUID } from 'node:crypto'
-import type { UIMessageStreamEvent } from '../../types'
-
-type JsonObject = Record<string, unknown>
-
-export class ClaudeCodeStreamParser {
-  private lastText: string | null = null
-  private toolCallCount = 0
-
-  pushLine(line: string): UIMessageStreamEvent[] {
-    const trimmed = line.trim()
-    if (!trimmed) return []
-
-    let parsed: unknown
-    try {
-      parsed = JSON.parse(trimmed)
-    } catch {
-      return []
-    }
-
-    if (!isObject(parsed)) return []
-
-    if (parsed.type === 'assistant') {
-      return this.parseAssistantMessage(parsed)
-    }
-    if (parsed.type === 'user') {
-      return this.parseUserMessage(parsed)
-    }
-    if (parsed.type === 'result' && typeof parsed.result === 'string') {
-      this.lastText = parsed.result
-    }
-
-    return []
-  }
-
-  getLastText(): string | null {
-    return this.lastText
-  }
-
-  getToolCallCount(): number {
-    return this.toolCallCount
-  }
-
-  private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (block.type === 'text' && typeof block.text === 'string') {
-        const id = randomUUID()
-        this.lastText = block.text
-        events.push(
-          { type: 'text-start', id },
-          { type: 'text-delta', id, delta: block.text },
-          { type: 'text-end', id },
-        )
-      } else if (
-        block.type === 'tool_use' &&
-        typeof block.id === 'string' &&
-        typeof block.name === 'string'
-      ) {
-        this.toolCallCount++
-        events.push({
-          type: 'tool-input-available',
-          toolCallId: block.id,
-          toolName: block.name,
-          input: block.input,
-        })
-      }
-    }
-
-    return events
-  }
-
-  private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
-    const content = contentBlocks(message)
-    const events: UIMessageStreamEvent[] = []
-
-    for (const block of content) {
-      if (
-        block.type !== 'tool_result' ||
-        typeof block.tool_use_id !== 'string'
-      ) {
-        continue
-      }
-
-      if (block.is_error === true) {
-        events.push({
-          type: 'tool-output-error',
-          toolCallId: block.tool_use_id,
-          errorText: stringifyToolContent(block.content),
-        })
-      } else {
-        events.push({
-          type: 'tool-output-available',
-          toolCallId: block.tool_use_id,
-          output: normalizeToolContent(block.content),
-        })
-      }
-    }
-
-    return events
-  }
-}
-
-export function shouldCaptureScreenshotForTool(toolName: string): boolean {
-  if (!toolName.startsWith('mcp__browseros__')) return false
-  return !toolName.endsWith('__take_screenshot')
-}
-
-function contentBlocks(message: JsonObject): JsonObject[] {
-  const inner = isObject(message.message) ? message.message : message
-  return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
-}
-
-function isObject(value: unknown): value is JsonObject {
-  return typeof value === 'object' && value !== null
-}
-
-function normalizeToolContent(content: unknown): unknown {
-  if (!Array.isArray(content)) return content
-  return content.map((item) => {
-    if (
-      isObject(item) &&
-      item.type === 'text' &&
-      typeof item.text === 'string'
-    ) {
-      return item.text
-    }
-    return item
-  })
-}
-
-function stringifyToolContent(content: unknown): string {
-  const normalized = normalizeToolContent(content)
-  if (typeof normalized === 'string') return normalized
-  try {
-    return JSON.stringify(normalized)
-  } catch {
-    return String(normalized)
-  }
-}
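The parser above reads the CLI's `--output-format stream-json` output as one JSON object per line:

- `assistant` lines carry `text` and `tool_use` blocks.
- `user` lines carry `tool_result` blocks.
- A `result` line carries the final answer.

Below is a reduced sketch of that mapping that only tallies tool calls, tool errors and the final text. It assumes only the field names the parser itself reads: `type`, `message.content`, `name`, `text`, `is_error` and `result`. `tallyStreamJson` is hypothetical, and the sample lines are illustrative rather than captured CLI output.

```typescript
// Field names below are the ones ClaudeCodeStreamParser reads; shapes are assumed.
type Json = Record<string, unknown>

interface StreamTally {
  toolNames: string[] // tool_use blocks seen in assistant messages
  toolErrors: number // tool_result blocks flagged with is_error: true
  finalText: string | null // last assistant text, overridden by a `result` line
}

function isJson(value: unknown): value is Json {
  return typeof value === 'object' && value !== null
}

function tallyStreamJson(lines: string[]): StreamTally {
  const tally: StreamTally = { toolNames: [], toolErrors: 0, finalText: null }
  for (const raw of lines) {
    let parsed: unknown
    try {
      parsed = JSON.parse(raw)
    } catch {
      continue // non-JSON lines are skipped, as in pushLine
    }
    if (!isJson(parsed)) continue
    // Blocks may sit under `message.content` or directly under `content`.
    const inner = isJson(parsed.message) ? parsed.message : parsed
    const blocks = Array.isArray(inner.content) ? inner.content.filter(isJson) : []
    for (const block of blocks) {
      if (parsed.type === 'assistant' && block.type === 'tool_use' && typeof block.name === 'string') {
        tally.toolNames.push(block.name)
      } else if (parsed.type === 'assistant' && block.type === 'text' && typeof block.text === 'string') {
        tally.finalText = block.text
      } else if (parsed.type === 'user' && block.type === 'tool_result' && block.is_error === true) {
        tally.toolErrors++
      }
    }
    if (parsed.type === 'result' && typeof parsed.result === 'string') {
      tally.finalText = parsed.result
    }
  }
  return tally
}

// Illustrative input; the tool name is made up apart from the mcp__browseros__ prefix.
const sample = [
  '{"type":"assistant","message":{"content":[{"type":"tool_use","id":"t1","name":"mcp__browseros__example_tool","input":{"page":1}}]}}',
  '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t1","is_error":true,"content":"timed out"}]}}',
  'not json',
  '{"type":"result","result":"Done"}',
]
const tally = tallyStreamJson(sample)
console.log(tally.toolNames.length, tally.toolErrors, tally.finalText) // 1 1 Done
```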
@@ -1,4 +1,3 @@
-import { ClaudeCodeEvaluator } from './claude-code'
 import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
 import { SingleAgentEvaluator } from './single-agent'
 import type { AgentContext, AgentEvaluator } from './types'
@@ -9,8 +8,6 @@ export function createAgent(context: AgentContext): AgentEvaluator {
       return new SingleAgentEvaluator(context)
     case 'orchestrator-executor':
       return new OrchestratorExecutorEvaluator(context)
-    case 'claude-code':
-      return new ClaudeCodeEvaluator(context)
   }
 }

@@ -134,10 +134,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {

     // Connect to Chrome via CDP — same per-worker offset used by app-manager.
     const cdpPort = config.browseros.base_cdp_port + workerIndex
-    const cdp = new CdpBackend({
-      port: cdpPort,
-      exitOnReconnectFailure: false,
-    })
+    const cdp = new CdpBackend({ port: cdpPort })
     await cdp.connect()
     const browser = new Browser(cdp)
     capture.screenshot.setBrowser(browser)
@@ -43,10 +43,7 @@ export class SingleAgentEvaluator implements AgentEvaluator {

     // Connect to Chrome via CDP — same per-worker offset used by app-manager.
     const cdpPort = config.browseros.base_cdp_port + workerIndex
-    const cdp = new CdpBackend({
-      port: cdpPort,
-      exitOnReconnectFailure: false,
-    })
+    const cdp = new CdpBackend({ port: cdpPort })
     await cdp.connect()

     const browser = new Browser(cdp)
@@ -105,10 +105,7 @@ export class TrajectorySaver {
       errors: [],
       warnings: [],
       agent_config: {
-        type: agentConfig.type as
-          | 'single'
-          | 'orchestrator-executor'
-          | 'claude-code',
+        type: agentConfig.type as 'single' | 'orchestrator-executor',
         model: agentConfig.model,
       },
       grader_results: {},
@@ -82,16 +82,6 @@ function suiteToEvalConfig(
     })
   }

-  if (suite.agent.type === 'claude-code') {
-    return EvalConfigSchema.parse({
-      ...base,
-      agent: {
-        type: 'claude-code',
-        ...(variant.agent.model && { model: variant.agent.model }),
-      },
-    })
-  }
-
   const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
   const executor =
     executorBackend === 'clado'
@@ -145,10 +135,7 @@ export async function resolveSuiteCommand(
   const loaded = await loadSuite(options.suitePath)
   const variant = resolveVariant({
     variantId: options.variantId,
-    provider:
-      loaded.suite.agent.type === 'claude-code'
-        ? 'claude-code'
-        : options.provider,
+    provider: options.provider,
     model: options.model,
     apiKey: options.apiKey,
     baseUrl: options.baseUrl,
@@ -536,12 +536,6 @@ export interface DashboardConfig {
   configMode?: boolean
 }

-export function shouldAutoOpenDashboard(
-  env: Record<string, string | undefined> = process.env,
-): boolean {
-  return env.CI !== 'true'
-}
-
 export function startDashboard(config: DashboardConfig) {
   const port = config.port ?? 9900
   dashboardConfigMode = config.configMode ?? false
@@ -564,12 +558,10 @@ export function startDashboard(config: DashboardConfig) {
   console.log(` Dashboard: ${url}`)

   // Auto-open browser
-  if (shouldAutoOpenDashboard()) {
-    try {
-      Bun.spawn(['open', url], { stdout: 'ignore', stderr: 'ignore' })
-    } catch {
-      /* ignore if open command fails */
-    }
+  try {
+    Bun.spawn(['open', url], { stdout: 'ignore', stderr: 'ignore' })
+  } catch {
+    /* ignore if open command fails */
   }

   return { url, port }
@@ -61,17 +61,6 @@
 .header-stats .stat-pass { color: #3fb950; }
 .header-stats .stat-fail { color: #f85149; }
 .header-stats .stat-score { color: #f0883e; }
-.header-report {
-  color: #58a6ff;
-  text-decoration: none;
-  font-size: 12px;
-  font-weight: 600;
-  border: 1px solid #30363d;
-  border-radius: 6px;
-  padding: 5px 9px;
-  white-space: nowrap;
-}
-.header-report:hover { border-color: #58a6ff; background: #1c2333; }

 /* ── 3-column layout ─────────────────────────────────────────── */
 .layout {
@@ -95,7 +84,6 @@
   background: #161b22;
   border-bottom: 1px solid #30363d;
   display: flex;
-  flex-wrap: wrap;
   gap: 12px;
   font-size: 11px;
   font-weight: 600;
@@ -105,80 +93,6 @@
 }
 .sidebar-stats .s-pass { color: #3fb950; }
 .sidebar-stats .s-fail { color: #f85149; }
-.sidebar-metrics {
-  padding: 12px 16px;
-  background: #0d1117;
-  border-bottom: 1px solid #21262d;
-}
-.metric-grid {
-  display: grid;
-  grid-template-columns: repeat(3, minmax(0, 1fr));
-  gap: 8px;
-  margin-bottom: 12px;
-}
-.metric-cell {
-  min-width: 0;
-}
-.metric-label {
-  display: block;
-  font-size: 9px;
-  font-weight: 600;
-  color: #6e7681;
-  text-transform: uppercase;
-  letter-spacing: 0.04em;
-  white-space: nowrap;
-}
-.metric-value {
-  display: block;
-  font-size: 13px;
-  font-weight: 700;
-  color: #e6edf3;
-  margin-top: 2px;
-  overflow: hidden;
-  text-overflow: ellipsis;
-}
-.mini-chart {
-  display: flex;
-  flex-direction: column;
-  gap: 6px;
-}
-.mini-chart-title {
-  font-size: 10px;
-  font-weight: 700;
-  color: #8b949e;
-  text-transform: uppercase;
-  letter-spacing: 0.04em;
-}
-.mini-bar-row {
-  display: grid;
-  grid-template-columns: minmax(60px, 1fr) 70px 28px;
-  gap: 8px;
-  align-items: center;
-  font-size: 10px;
-  color: #8b949e;
-}
-.mini-bar-name {
-  overflow: hidden;
-  text-overflow: ellipsis;
-  white-space: nowrap;
-  font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
-}
-.mini-bar-track {
-  height: 6px;
-  background: #21262d;
-  border-radius: 999px;
-  overflow: hidden;
-}
-.mini-bar-fill {
-  height: 100%;
-  background: #58a6ff;
-  border-radius: 999px;
-}
-.mini-bar-value {
-  color: #e6edf3;
-  font-variant-numeric: tabular-nums;
-  text-align: right;
-}
 .sidebar-filter {
   padding: 8px 12px;
   border-bottom: 1px solid #21262d;
@@ -612,7 +526,6 @@
   <div class="header-sep"></div>
   <span class="header-run" id="header-run"></span>
   <span class="header-date" id="header-date"></span>
-  <a class="header-report" id="header-report" target="_blank" rel="noopener" style="display: none;">Run Report</a>
   <div class="header-stats" id="header-stats"></div>
 </div>

@@ -620,7 +533,6 @@
 <!-- Left sidebar -->
 <div class="sidebar" id="sidebar">
   <div class="sidebar-stats" id="sidebar-stats"></div>
-  <div class="sidebar-metrics" id="sidebar-metrics"></div>
   <div class="sidebar-filter">
     <input type="text" id="filter-input" placeholder="Search tasks..." autocomplete="off" spellcheck="false" />
   </div>
@@ -715,23 +627,7 @@
   if (stats.avgScore !== null) {
     parts.push(`<span class="stat-score">avg ${stats.avgScore}%</span>`);
   }
-  if (stats.avgDurationMs !== null) {
-    parts.push(`<span>${fmtDuration(stats.avgDurationMs)} avg</span>`);
-  }
-  if (stats.avgToolCalls !== null) {
-    parts.push(`<span>${fmtCompact(stats.avgToolCalls)} tools/task</span>`);
-  }
   el.innerHTML = parts.join('');
-
-  const reportLink = document.getElementById('header-report');
-  const url = reportUrl(manifest);
-  if (url) {
-    reportLink.href = url;
-    reportLink.style.display = '';
-  } else {
-    reportLink.removeAttribute('href');
-    reportLink.style.display = 'none';
-  }
 }

 // ── Sidebar rendering ─────────────────────────────────────────
@@ -743,49 +639,11 @@
   statsEl.innerHTML =
     '<span>' + stats.total + ' total</span>' +
     '<span class="s-pass">' + stats.passed + ' pass</span>' +
-    '<span class="s-fail">' + stats.failed + ' fail</span>' +
-    (stats.avgSteps !== null ? '<span>' + fmtCompact(stats.avgSteps) + ' steps/task</span>' : '') +
-    (stats.avgToolCalls !== null ? '<span>' + fmtCompact(stats.avgToolCalls) + ' tools/task</span>' : '');
-
-  renderSidebarMetrics(tasks, stats);
+    '<span class="s-fail">' + stats.failed + ' fail</span>';

   renderTaskList('');
 }

-function renderSidebarMetrics(tasks, stats) {
-  const el = document.getElementById('sidebar-metrics');
-  if (!el) return;
-
-  const chartTasks = tasks
-    .slice()
-    .sort((a, b) => taskMetrics(b).toolCalls - taskMetrics(a).toolCalls)
-    .slice(0, 5);
-  const maxCalls = Math.max(1, ...chartTasks.map((task) => taskMetrics(task).toolCalls));
-
-  const bars = chartTasks.map((task) => {
-    const calls = taskMetrics(task).toolCalls;
-    const width = Math.max(4, Math.round((calls / maxCalls) * 100));
-    return (
-      '<div class="mini-bar-row">' +
-        '<span class="mini-bar-name" title="' + escAttr(task.queryId || task.id || 'Untitled') + '">' + esc(task.queryId || task.id || 'Untitled') + '</span>' +
-        '<span class="mini-bar-track"><span class="mini-bar-fill" style="width: ' + width + '%"></span></span>' +
-        '<span class="mini-bar-value">' + fmtCompact(calls) + '</span>' +
-      '</div>'
-    );
-  }).join('');
-
-  el.innerHTML =
-    '<div class="metric-grid">' +
-      '<div class="metric-cell"><span class="metric-label">Avg Time</span><span class="metric-value">' + (stats.avgDurationMs !== null ? fmtDuration(stats.avgDurationMs) : '-') + '</span></div>' +
-      '<div class="metric-cell"><span class="metric-label">Avg Steps</span><span class="metric-value">' + (stats.avgSteps !== null ? fmtCompact(stats.avgSteps) : '-') + '</span></div>' +
-      '<div class="metric-cell"><span class="metric-label">Avg Tools</span><span class="metric-value">' + (stats.avgToolCalls !== null ? fmtCompact(stats.avgToolCalls) : '-') + '</span></div>' +
-    '</div>' +
-    '<div class="mini-chart">' +
-      '<div class="mini-chart-title">Tool Calls by Task</div>' +
-      (bars || '<div class="task-meta-line"><span>No tool calls recorded</span></div>') +
-    '</div>';
-}
-
 function renderTaskList(filter) {
   const list = document.getElementById('task-list');
   list.innerHTML = '';
@@ -810,11 +668,8 @@
     }

     const metaParts = [];
-    const metrics = taskMetrics(task);
-    if (metrics.durationMs) metaParts.push(fmtDuration(metrics.durationMs));
-    if (metrics.steps) metaParts.push(`${fmtCompact(metrics.steps)} steps`);
-    if (metrics.toolCalls) metaParts.push(`${fmtCompact(metrics.toolCalls)} tools`);
-    if (metrics.toolErrors) metaParts.push(`${fmtCompact(metrics.toolErrors)} errors`);
+    if (task.durationMs) metaParts.push(fmtDuration(task.durationMs));
+    if (task.screenshotCount) metaParts.push(`${task.screenshotCount} steps`);

     item.innerHTML =
       '<div class="task-row">' +
@@ -859,7 +714,7 @@
 }

 function artifactPath(task, artifact) {
-  const manifestPath = task.paths?.[artifact];
+  const manifestPath = task.paths && task.paths[artifact];
   if (typeof manifestPath === 'string' && manifestPath.length > 0) {
     return manifestPath.replace(/^\/+/, '');
   }
@@ -870,17 +725,6 @@
   return `${basePath}/${artifactPath(task, artifact)}`;
 }

-function runArtifactUrl(path) {
-  if (typeof path !== 'string' || path.length === 0) return null;
-  return `${basePath}/${path.replace(/^\/+/, '')}`;
-}
-
-function reportUrl(manifest, task) {
-  const url = runArtifactUrl(manifest?.reportPath);
-  if (!url || !task) return url;
-  return `${url}#${encodeURIComponent(task.queryId || task.id || '')}`;
-}
-
 function metadataUrl(task) {
   return artifactUrl(task, 'metadata');
 }
@@ -1061,38 +905,10 @@
   }

   // Duration
-  const metrics = taskMetrics(task);
-  if (metrics.durationMs) {
+  if (task.durationMs) {
     html += '<div class="db-section">';
     html += '<span class="db-label">Duration</span>';
-    html += `<span class="db-value">${fmtDuration(metrics.durationMs)}</span>`;
-    html += '</div>';
-  }
-
-  if (metrics.steps) {
-    html += '<div class="db-section">';
-    html += '<span class="db-label">Steps</span>';
-    html += `<span class="db-value">${fmtCompact(metrics.steps)}</span>`;
-    html += '</div>';
-  }
-
-  html += '<div class="db-section">';
-  html += '<span class="db-label">Tool Calls</span>';
-  html += `<span class="db-value">${fmtCompact(metrics.toolCalls)}</span>`;
-  html += '</div>';
-
-  if (metrics.toolErrors) {
-    html += '<div class="db-section">';
-    html += '<span class="db-label">Tool Errors</span>';
-    html += `<span class="db-value">${fmtCompact(metrics.toolErrors)}</span>`;
-    html += '</div>';
-  }
-
-  const reportLink = reportUrl(manifest, task);
-  if (reportLink) {
-    html += '<div class="db-section">';
-    html += '<span class="db-label">Report</span>';
-    html += `<span class="db-value"><a href="${escAttr(reportLink)}" target="_blank" rel="noopener">Open task analysis</a></span>`;
+    html += `<span class="db-value">${fmtDuration(task.durationMs)}</span>`;
     html += '</div>';
   }

@@ -1418,25 +1234,8 @@
 function computeStats(tasks) {
   const total = tasks.length;
   let passed = 0, failed = 0, totalScore = 0, scoredCount = 0;
-  let totalDurationMs = 0, durationCount = 0;
-  let totalSteps = 0, stepsCount = 0;
-  let totalToolCalls = 0, toolCount = 0;
-  let totalToolErrors = 0;

   tasks.forEach((t) => {
-    const metrics = taskMetrics(t);
-    if (metrics.durationMs > 0) {
-      totalDurationMs += metrics.durationMs;
-      durationCount++;
-    }
-    if (metrics.steps > 0) {
-      totalSteps += metrics.steps;
-      stepsCount++;
-    }
-    totalToolCalls += metrics.toolCalls;
-    totalToolErrors += metrics.toolErrors;
-    toolCount++;
-
     const graders = t.graderResults || {};
     const keys = Object.keys(graders);
     if (keys.length > 0) {
@@ -1455,34 +1254,7 @@
     total: total,
     passed: passed,
     failed: failed,
-    avgScore: scoredCount > 0 ? Math.round((totalScore / scoredCount) * 100) : null,
-    avgDurationMs: durationCount > 0 ? totalDurationMs / durationCount : null,
-    avgSteps: stepsCount > 0 ? totalSteps / stepsCount : null,
-    avgToolCalls: toolCount > 0 ? totalToolCalls / toolCount : null,
-    totalToolCalls: totalToolCalls,
-    totalToolErrors: totalToolErrors
-  };
-}
-
-function taskMetrics(task) {
-  const metrics = task.metrics || {};
-  const screenshots = Number.isFinite(Number(metrics.screenshots))
-    ? Number(metrics.screenshots)
-    : Number(task.screenshotCount || 0);
-  return {
-    durationMs: Number.isFinite(Number(metrics.durationMs))
-      ? Number(metrics.durationMs)
-      : Number(task.durationMs || 0),
-    steps: Number.isFinite(Number(metrics.steps))
-      ? Number(metrics.steps)
-      : screenshots,
-    screenshots: screenshots,
-    toolCalls: Number.isFinite(Number(metrics.toolCalls))
-      ? Number(metrics.toolCalls)
-      : 0,
-    toolErrors: Number.isFinite(Number(metrics.toolErrors))
-      ? Number(metrics.toolErrors)
-      : 0
+    avgScore: scoredCount > 0 ? Math.round((totalScore / scoredCount) * 100) : null
   };
 }

@@ -1538,13 +1310,6 @@
   return `${h}h ${remM}m`;
 }

-function fmtCompact(value) {
-  const num = Number(value);
-  if (!Number.isFinite(num)) return '0';
-  if (Number.isInteger(num)) return String(num);
-  return num.toFixed(1);
-}
-
 function showFatalError(msgHtml) {
   document.getElementById('center-panel').innerHTML =
     '<div class="placeholder error">' +
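The removed `computeStats` code uses two averaging rules:

- Duration and steps are averaged only over tasks that reported a positive value.
- Tool calls are averaged over every task.

`fmtCompact` prints integers as-is, other finite numbers with one decimal, and `'0'` for anything non-finite. Below is a sketch of those rules in TypeScript. It assumes a simplified, hypothetical `TaskMetrics` shape, and `averageWhere` is an illustrative helper rather than a dashboard function.

```typescript
interface TaskMetrics {
  durationMs: number
  toolCalls: number
}

// Average of the values accepted by `include`; null when none qualify,
// which the dashboard renders as '-' rather than a misleading 0.
function averageWhere(values: number[], include: (v: number) => boolean): number | null {
  const kept = values.filter(include)
  return kept.length > 0 ? kept.reduce((a, b) => a + b, 0) / kept.length : null
}

// Same rules as fmtCompact: non-finite -> '0', integers verbatim, else one decimal.
function fmtCompact(value: unknown): string {
  const num = Number(value)
  if (!Number.isFinite(num)) return '0'
  if (Number.isInteger(num)) return String(num)
  return num.toFixed(1)
}

const tasks: TaskMetrics[] = [
  { durationMs: 12000, toolCalls: 5 },
  { durationMs: 0, toolCalls: 0 }, // e.g. a task that never reported a duration
  { durationMs: 18000, toolCalls: 6 },
]
const avgDuration = averageWhere(tasks.map((t) => t.durationMs), (v) => v > 0)
const avgTools = averageWhere(tasks.map((t) => t.toolCalls), () => true)
console.log(avgDuration, fmtCompact(avgTools)) // 15000 3.7
```

Under these rules, a task with zero tool calls lowers the tool-call average. A task with no recorded duration leaves the duration average unchanged.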
@@ -2,7 +2,6 @@ export interface PythonEvaluatorOptions {
   scriptPath: string
   input: unknown
   timeoutMs: number
-  pythonPath?: string
 }

 export interface PythonEvaluatorResult<T> {
@@ -16,9 +15,7 @@ export interface PythonEvaluatorResult<T> {
 export async function runPythonJsonEvaluator<T>(
   options: PythonEvaluatorOptions,
 ): Promise<PythonEvaluatorResult<T>> {
-  const pythonPath =
-    options.pythonPath || process.env.BROWSEROS_EVAL_PYTHON || 'python3'
-  const proc = Bun.spawn([pythonPath, options.scriptPath], {
+  const proc = Bun.spawn(['python3', options.scriptPath], {
     stdin: 'pipe',
     stdout: 'pipe',
     stderr: 'pipe',
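The removed lines resolve the interpreter in this order:

1. `options.pythonPath`
2. The `BROWSEROS_EVAL_PYTHON` environment variable
3. `python3`

Below is a sketch of that precedence with the environment passed in explicitly so it can be tested. `resolvePython` is a hypothetical helper, and the interpreter paths are placeholders.

```typescript
// `||` (not `??`) means an empty string also falls through to the next source.
function resolvePython(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return explicit || env.BROWSEROS_EVAL_PYTHON || 'python3'
}

console.log(resolvePython(undefined, {})) // python3
console.log(resolvePython(undefined, { BROWSEROS_EVAL_PYTHON: '/opt/venv/bin/python' })) // /opt/venv/bin/python
console.log(resolvePython('/usr/bin/python3.12', { BROWSEROS_EVAL_PYTHON: '/opt/venv/bin/python' })) // /usr/bin/python3.12
```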
@@ -5,7 +5,6 @@ import {
   PutObjectCommand,
   S3Client,
 } from '@aws-sdk/client-s3'
-import { readTaskMetrics } from '../reporting/task-metrics'
 import {
   buildViewerManifest,
   type ViewerManifestTaskInput,
@@ -316,7 +315,6 @@ export class R2Publisher {
         graderResults:
           (meta.grader_results as ViewerManifestTaskInput['graderResults']) ||
           {},
-        metrics: await readTaskMetrics(taskPath, meta, screenshotCount),
       })
     }

@@ -381,12 +379,10 @@ export class R2Publisher {
         await readFile(join(runDir, 'summary.json'), 'utf-8'),
       ) as Record<string, unknown>
     } catch {}
-    const reportStat = await stat(join(runDir, 'report.html')).catch(() => null)

     return buildViewerManifest({
       runId,
       uploadedAt: this.now().toISOString(),
-      reportPath: reportStat?.isFile() ? 'report.html' : undefined,
       agentConfig,
       dataset,
       summary: summaryData
@@ -1,188 +0,0 @@
-import { readdir, readFile, stat } from 'node:fs/promises'
-import { join } from 'node:path'
-
-export interface EvalTaskMetrics {
-  durationMs: number
-  steps: number
-  screenshots: number
-  toolCalls: number
-  toolErrors: number
-}
-
-export interface EvalRunMetrics {
-  taskCount: number
-  totalDurationMs: number
-  avgDurationMs: number
-  totalSteps: number
-  avgSteps: number
-  totalToolCalls: number
-  avgToolCalls: number
-  totalToolErrors: number
-  avgToolErrors: number
-}
-
-export interface EvalTaskMetricSummary {
-  queryId: string
-  status: string
-  score?: number
-  pass?: boolean
-  metrics: EvalTaskMetrics
-}
-
-export interface EvalRunMetricSummary {
-  run: EvalRunMetrics
-  tasks: EvalTaskMetricSummary[]
-}
-
-interface TaskDirEntry {
-  taskId: string
-  taskPath: string
-}
-
-function numberValue(value: unknown): number {
-  return typeof value === 'number' && Number.isFinite(value) ? value : 0
-}
-
-export function countMessageMetrics(messagesJsonl: string): {
-  toolCalls: number
-  toolErrors: number
-} {
-  let toolCalls = 0
-  let toolErrors = 0
-
-  for (const line of messagesJsonl.split('\n')) {
-    const trimmed = line.trim()
-    if (!trimmed) continue
-    try {
-      const event = JSON.parse(trimmed) as { type?: unknown }
-      if (event.type === 'tool-input-available') toolCalls++
-      if (event.type === 'tool-output-error') toolErrors++
-    } catch {
-      // Ignore malformed telemetry lines; the raw artifact is still uploaded.
-    }
-  }
-
-  return { toolCalls, toolErrors }
-}
-
-export function buildTaskMetrics(
-  metadata: Record<string, unknown>,
-  messageMetrics: { toolCalls: number; toolErrors: number },
-  screenshotCount = 0,
-): EvalTaskMetrics {
-  const screenshots = numberValue(metadata.screenshot_count) || screenshotCount
-  return {
-    durationMs: numberValue(metadata.total_duration_ms),
-    steps: numberValue(metadata.total_steps) || screenshots,
-    screenshots,
-    toolCalls: messageMetrics.toolCalls,
-    toolErrors: messageMetrics.toolErrors,
-  }
-}
-
-export function buildRunMetrics(metrics: EvalTaskMetrics[]): EvalRunMetrics {
-  const taskCount = metrics.length
-  const totalDurationMs = metrics.reduce((sum, metric) => {
-    return sum + metric.durationMs
-  }, 0)
-  const totalSteps = metrics.reduce((sum, metric) => sum + metric.steps, 0)
-  const totalToolCalls = metrics.reduce((sum, metric) => {
-    return sum + metric.toolCalls
-  }, 0)
-  const totalToolErrors = metrics.reduce((sum, metric) => {
-    return sum + metric.toolErrors
-  }, 0)
-
-  return {
-    taskCount,
-    totalDurationMs,
-    avgDurationMs: taskCount > 0 ? totalDurationMs / taskCount : 0,
-    totalSteps,
-    avgSteps: taskCount > 0 ? totalSteps / taskCount : 0,
-    totalToolCalls,
-    avgToolCalls: taskCount > 0 ? totalToolCalls / taskCount : 0,
-    totalToolErrors,
-    avgToolErrors: taskCount > 0 ? totalToolErrors / taskCount : 0,
-  }
-}
-
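`countMessageMetrics` counts `tool-input-available` and `tool-output-error` events in a task's `messages.jsonl` and skips malformed lines. `buildRunMetrics` divides run totals by the task count and returns 0 for an empty run instead of `NaN`. Below is a compact sketch of that pipeline on inline data. It assumes a trimmed-down `ToolCounts` shape, and `countToolEvents` and `averageCounts` are hypothetical names. The sample JSONL is invented.

```typescript
interface ToolCounts {
  toolCalls: number
  toolErrors: number
}

// Count tool events in JSONL text; blank and malformed lines are skipped.
function countToolEvents(jsonl: string): ToolCounts {
  const counts: ToolCounts = { toolCalls: 0, toolErrors: 0 }
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim()
    if (!trimmed) continue
    try {
      const event = JSON.parse(trimmed) as { type?: unknown }
      if (event.type === 'tool-input-available') counts.toolCalls++
      if (event.type === 'tool-output-error') counts.toolErrors++
    } catch {
      // ignore malformed lines
    }
  }
  return counts
}

// Per-run averages over all tasks; an empty run yields zeros rather than NaN.
function averageCounts(perTask: ToolCounts[]): ToolCounts {
  const n = perTask.length
  const sum = perTask.reduce(
    (acc, c) => ({ toolCalls: acc.toolCalls + c.toolCalls, toolErrors: acc.toolErrors + c.toolErrors }),
    { toolCalls: 0, toolErrors: 0 },
  )
  return n > 0 ? { toolCalls: sum.toolCalls / n, toolErrors: sum.toolErrors / n } : sum
}

const taskA = [
  '{"type":"tool-input-available","toolCallId":"1"}',
  '{"type":"tool-output-error","toolCallId":"1"}',
  '{"type":"tool-input-available","toolCallId":"2"}',
  '{broken',
].join('\n')
const taskB = '{"type":"tool-input-available","toolCallId":"3"}\n'
const avg = averageCounts([countToolEvents(taskA), countToolEvents(taskB)])
console.log(avg.toolCalls, avg.toolErrors) // 1.5 0.5
```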
-export async function readTaskMetrics(
-  taskPath: string,
-  metadata: Record<string, unknown>,
-  screenshotCount = 0,
-): Promise<EvalTaskMetrics> {
-  const messages = await readFile(join(taskPath, 'messages.jsonl'), 'utf-8')
-    .then(countMessageMetrics)
-    .catch(() => ({ toolCalls: 0, toolErrors: 0 }))
-  return buildTaskMetrics(metadata, messages, screenshotCount)
-}
-
-function statusFromMetadata(metadata: Record<string, unknown>): string {
-  const termination = metadata.termination_reason
-  if (termination === 'timeout') return 'timeout'
-  if (Array.isArray(metadata.errors) && metadata.errors.length > 0) {
-    return 'failed'
-  }
-  return 'completed'
-}
-
-function primaryGrade(metadata: Record<string, unknown>): {
-  score?: number
-  pass?: boolean
-} {
-  const graders = metadata.grader_results as
-    | Record<string, { score?: unknown; pass?: unknown }>
-    | undefined
-  const first = graders ? Object.values(graders)[0] : undefined
-  return {
-    ...(typeof first?.score === 'number' ? { score: first.score } : {}),
-    ...(typeof first?.pass === 'boolean' ? { pass: first.pass } : {}),
-  }
-}
-
-async function readTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
-  const canonicalTasksDir = join(runDir, 'tasks')
-  const canonicalStat = await stat(canonicalTasksDir).catch(() => null)
-  const baseDir = canonicalStat?.isDirectory() ? canonicalTasksDir : runDir
-  const entries = await readdir(baseDir, { withFileTypes: true }).catch(
-    () => [],
-  )
-
-  return entries
-    .filter((entry) => entry.isDirectory())
-    .filter((entry) => entry.name !== 'screenshots')
-    .filter((entry) => entry.name !== 'tasks')
-    .map((entry) => ({
-      taskId: entry.name,
-      taskPath: join(baseDir, entry.name),
-    }))
-}
-
-export async function readRunMetricSummary(
-  runDir: string,
-): Promise<EvalRunMetricSummary> {
-  const tasks: EvalTaskMetricSummary[] = []
-
-  for (const entry of await readTaskDirs(runDir)) {
-    const metadata = await readFile(
-      join(entry.taskPath, 'metadata.json'),
-      'utf-8',
-    )
-      .then((text) => JSON.parse(text) as Record<string, unknown>)
-      .catch(() => null)
-    if (!metadata) continue
-
-    const metrics = await readTaskMetrics(entry.taskPath, metadata)
-    tasks.push({
-      queryId: (metadata.query_id as string | undefined) || entry.taskId,
||||
status: statusFromMetadata(metadata),
|
||||
...primaryGrade(metadata),
|
||||
metrics,
|
||||
})
|
||||
}
|
||||
|
||||
return {
|
||||
run: buildRunMetrics(tasks.map((task) => task.metrics)),
|
||||
tasks,
|
||||
}
|
||||
}
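`readTaskMetrics` pipes the contents of `messages.jsonl` through `countMessageMetrics`, whose body is not part of this excerpt. The R2 publisher fixture later in this diff pairs two `tool-input-available` lines and one `tool-output-error` line with `toolCalls: 2` and `toolErrors: 1`, so a minimal sketch of such a counter, assuming one event of each type per call or error, could look like this. The name `countMessageMetricsSketch` is hypothetical, and skipping blank or unparsable lines is an assumed policy, not necessarily what the real function does.

```typescript
interface MessageMetrics {
  toolCalls: number
  toolErrors: number
}

// Hypothetical counter for a JSONL message log.
// Assumption: one `tool-input-available` event per tool call and one
// `tool-output-error` event per failed call; bad lines are skipped.
function countMessageMetricsSketch(jsonl: string): MessageMetrics {
  const metrics: MessageMetrics = { toolCalls: 0, toolErrors: 0 }
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim()
    if (!trimmed) continue
    let event: unknown
    try {
      event = JSON.parse(trimmed)
    } catch {
      continue // a truncated line should not zero out the whole task
    }
    if (typeof event !== 'object' || event === null) continue
    const type = (event as { type?: unknown }).type
    if (type === 'tool-input-available') metrics.toolCalls += 1
    if (type === 'tool-output-error') metrics.toolErrors += 1
  }
  return metrics
}

const fixture = [
  '{"type":"user"}',
  '{"type":"tool-input-available","toolName":"click"}',
  '{"type":"tool-input-available","toolName":"take_snapshot"}',
  '{"type":"tool-output-error","toolName":"click"}',
].join('\n')

console.log(countMessageMetricsSketch(fixture)) // toolCalls 2, toolErrors 1
```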
@@ -33,13 +33,6 @@ function variantSource(config: EvalConfig): {
  baseUrl?: string
  supportsImages?: boolean
} {
  if (config.agent.type === 'claude-code') {
    return {
      provider: 'claude-code',
      model: config.agent.model ?? 'default',
    }
  }

  const agent =
    config.agent.type === 'single' ? config.agent : config.agent.orchestrator
  if (!agent.model) {
@@ -83,7 +76,10 @@ export async function adaptEvalConfigFile(
    suite: {
      id,
      dataset: evalConfig.dataset,
      agent: suiteAgent(evalConfig, backend),
      agent:
        evalConfig.agent.type === 'single'
          ? { type: 'tool-loop' }
          : { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' },
      graders: evalConfig.graders ?? [],
      workers: evalConfig.num_workers,
      restartBrowserPerTask: evalConfig.restart_server_per_task,
@@ -103,17 +99,3 @@ export async function adaptEvalConfigFile(
    }),
  }
}

function suiteAgent(
  config: EvalConfig,
  backend: ReturnType<typeof executorBackend>,
): EvalSuite['agent'] {
  switch (config.agent.type) {
    case 'single':
      return { type: 'tool-loop' }
    case 'orchestrator-executor':
      return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
    case 'claude-code':
      return { type: 'claude-code' }
  }
}
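`suiteAgent` maps every `config.agent.type` to a suite agent with a `switch` that has no `default` branch. With `strictNullChecks`, TypeScript already rejects such a function if a case goes missing, because it would lack an ending return statement. A common way to make that intent explicit, and to fail loudly at runtime on untyped input, is an `assertNever` helper. The sketch below uses simplified stand-in types (`AgentTypeSketch`, `SuiteAgentSketch`); they are assumptions, not the real `EvalConfig` or `EvalSuite` definitions.

```typescript
type AgentTypeSketch = 'single' | 'orchestrator-executor' | 'claude-code'
type ExecutorBackend = 'tool-loop' | 'clado'

type SuiteAgentSketch =
  | { type: 'tool-loop' }
  | { type: 'orchestrated'; executorBackend: ExecutorBackend }
  | { type: 'claude-code' }

// Compile-time exhaustiveness check that also throws at runtime.
function assertNever(value: never): never {
  throw new Error(`Unhandled agent type: ${String(value)}`)
}

function suiteAgentSketch(
  type: AgentTypeSketch,
  backend?: ExecutorBackend,
): SuiteAgentSketch {
  switch (type) {
    case 'single':
      return { type: 'tool-loop' }
    case 'orchestrator-executor':
      return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
    case 'claude-code':
      return { type: 'claude-code' }
    default:
      // A new AgentTypeSketch member without a case makes this a type error.
      return assertNever(type)
  }
}

console.log(suiteAgentSketch('orchestrator-executor')) // orchestrated, tool-loop backend
```

Removing a member from the union (as one side of this diff does with `'claude-code'`) then forces the matching `case` to be deleted too, since comparing against a value outside the union is a type error.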
@@ -57,30 +57,10 @@ export function resolveVariant(
  options: ResolveVariantOptions = {},
): EvalVariant {
  const env = options.env ?? process.env
  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const provider =
    options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible'
  const model = options.model ?? env.EVAL_AGENT_MODEL

  if (provider === 'claude-code') {
    const id = options.variantId ?? env.EVAL_VARIANT ?? 'claude-code'
    return {
      id,
      agent: {
        provider,
        model: model ?? '',
      },
      publicMetadata: {
        id,
        agent: {
          provider,
          model: model || 'default',
          apiKeyConfigured: false,
        },
      },
    }
  }

  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const apiKey = options.apiKey ?? env.EVAL_AGENT_API_KEY
  const apiKeyEnv =
    options.apiKeyEnv ?? (options.apiKey ? undefined : 'EVAL_AGENT_API_KEY')
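`resolveVariant` resolves each setting as explicit option first, then environment variable, then a built-in default, chained with `??` so that only `undefined` or `null` falls through. One consequence is that an empty string in `EVAL_VARIANT` still wins over the default. A minimal sketch of that precedence chain follows. Only the env var names and defaults visible in the hunk come from the source; `VariantOptionsSketch` and `resolveSettingsSketch` are illustrative names, and the sketch defaults to an empty environment where the real function falls back to `process.env`.

```typescript
type Env = Record<string, string | undefined>

interface VariantOptionsSketch {
  variantId?: string
  provider?: string
  model?: string
  env?: Env
}

interface ResolvedSettingsSketch {
  id: string
  provider: string
  model: string | undefined
}

// Precedence: explicit option > environment variable > built-in default.
// `??` only falls through on undefined/null, so '' from the env still wins.
function resolveSettingsSketch(
  options: VariantOptionsSketch = {},
): ResolvedSettingsSketch {
  const env: Env = options.env ?? {} // the real code uses process.env here
  return {
    id: options.variantId ?? env.EVAL_VARIANT ?? 'default',
    provider:
      options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible',
    model: options.model ?? env.EVAL_AGENT_MODEL,
  }
}

const sampleEnv: Env = { EVAL_VARIANT: 'nightly', EVAL_AGENT_MODEL: 'kimi' }
console.log(resolveSettingsSketch({ env: sampleEnv })) // id nightly, provider openai-compatible, model kimi
console.log(resolveSettingsSketch({ env: sampleEnv, variantId: 'ad-hoc' }).id) // ad-hoc
```

The hunk mixes `??` and `||` on purpose: `model ?? ''` keeps an explicitly empty model for the agent, while `model || 'default'` shows a readable label in public metadata.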
@@ -8,7 +8,6 @@ export const SuiteAgentSchema = z
      'single',
      'orchestrated',
      'orchestrator-executor',
      'claude-code',
    ]),
    executorBackend: z.enum(['tool-loop', 'clado']).optional(),
  })
@@ -19,19 +19,9 @@ export const OrchestratorExecutorConfigSchema = z.object({
  }),
})

export const ClaudeCodeAgentConfigSchema = z
  .object({
    type: z.literal('claude-code'),
    model: z.string().min(1).optional(),
    claudePath: z.string().min(1).default('claude'),
    extraArgs: z.array(z.string()).default([]),
  })
  .strict()

export const AgentConfigSchema = z.discriminatedUnion('type', [
  SingleAgentConfigSchema,
  OrchestratorExecutorConfigSchema,
  ClaudeCodeAgentConfigSchema,
])

export const EvalConfigSchema = z.object({
@@ -63,6 +53,5 @@ export type SingleAgentConfig = z.infer<typeof SingleAgentConfigSchema>
export type OrchestratorExecutorConfig = z.infer<
  typeof OrchestratorExecutorConfigSchema
>
export type ClaudeCodeAgentConfig = z.infer<typeof ClaudeCodeAgentConfigSchema>
export type AgentConfig = z.infer<typeof AgentConfigSchema>
export type EvalConfig = z.infer<typeof EvalConfigSchema>
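`ClaudeCodeAgentConfigSchema` combines zod defaults (`claudePath` becomes `'claude'`, `extraArgs` becomes `[]`) with `.strict()`. As a result, unknown keys such as `permissionMode` or `maxTurns` make parsing fail instead of being silently dropped, which is exactly what the deleted `rejects unsupported claude-code settings` test checks. The dependency-free sketch below imitates those semantics by hand to make them concrete. It is not zod, does not reproduce zod's error objects, and `parseClaudeCodeConfigSketch` is a hypothetical name.

```typescript
interface ClaudeCodeConfigSketch {
  type: 'claude-code'
  model?: string
  claudePath: string
  extraArgs: string[]
}

type ParseResult =
  | { success: true; data: ClaudeCodeConfigSketch }
  | { success: false; error: string }

const ALLOWED_KEYS = new Set(['type', 'model', 'claudePath', 'extraArgs'])

// Hand-rolled imitation of z.object({...}).strict() with .default() fields.
function parseClaudeCodeConfigSketch(
  input: Record<string, unknown>,
): ParseResult {
  // .strict(): a key outside the declared shape is an error, not ignored.
  const unknownKeys = Object.keys(input).filter((key) => !ALLOWED_KEYS.has(key))
  if (unknownKeys.length > 0) {
    return { success: false, error: `Unrecognized key(s): ${unknownKeys.join(', ')}` }
  }
  if (input.type !== 'claude-code') {
    return { success: false, error: 'type must be "claude-code"' }
  }

  const rawModel = input.model
  let model: string | undefined
  if (rawModel !== undefined) {
    if (typeof rawModel !== 'string' || rawModel.length === 0) {
      return { success: false, error: 'model must be a non-empty string' }
    }
    model = rawModel
  }

  // .default() only applies when the key is absent (undefined).
  const claudePath = input.claudePath === undefined ? 'claude' : input.claudePath
  if (typeof claudePath !== 'string' || claudePath.length === 0) {
    return { success: false, error: 'claudePath must be a non-empty string' }
  }
  const rawArgs = input.extraArgs === undefined ? [] : input.extraArgs
  if (
    !Array.isArray(rawArgs) ||
    !rawArgs.every((arg): arg is string => typeof arg === 'string')
  ) {
    return { success: false, error: 'extraArgs must be an array of strings' }
  }

  return {
    success: true,
    data: {
      type: 'claude-code',
      ...(model !== undefined ? { model } : {}),
      claudePath,
      extraArgs: rawArgs,
    },
  }
}
```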
@@ -2,8 +2,6 @@
export {
  type AgentConfig,
  AgentConfigSchema,
  type ClaudeCodeAgentConfig,
  ClaudeCodeAgentConfigSchema,
  type EvalConfig,
  EvalConfigSchema,
  type OrchestratorExecutorConfig,
@@ -13,7 +13,7 @@ export const GraderResultSchema = z.object({
// Agent config in metadata
const AgentConfigMetaSchema = z
  .object({
    type: z.enum(['single', 'orchestrator-executor', 'claude-code']),
    type: z.enum(['single', 'orchestrator-executor']),
    model: z.string().optional(),
  })
  .passthrough()
@@ -59,7 +59,7 @@ export async function validateConfig(
    ) {
      envVarsToCheck.push(config.agent.apiKey)
    }
  } else if (config.agent.type === 'orchestrator-executor') {
  } else {
    const { orchestrator, executor } = config.agent
    if (orchestrator.apiKey && isEnvVarName(orchestrator.apiKey)) {
      envVarsToCheck.push(orchestrator.apiKey)
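`validateConfig` only queues an `apiKey` for the environment check when `isEnvVarName` says it looks like a variable name, and treats anything else as a literal key. The predicate itself is not in this diff. The sketch below assumes the conventional POSIX shape (uppercase letters, digits and underscores, not starting with a digit) and assumes the executor's key is checked the same way as the orchestrator's. `isEnvVarNameSketch`, `collectEnvVarsSketch` and the trimmed `AgentSketch` type are hypothetical.

```typescript
type AgentSketch =
  | { type: 'single'; apiKey?: string }
  | {
      type: 'orchestrator-executor'
      orchestrator: { apiKey?: string }
      executor: { apiKey?: string }
    }

// Assumed rule: uppercase letters, digits and underscores, no leading digit.
function isEnvVarNameSketch(value: string): boolean {
  return /^[A-Z_][A-Z0-9_]*$/.test(value)
}

// Collects the env var names that must be set before a run can start.
function collectEnvVarsSketch(agent: AgentSketch): string[] {
  const keys =
    agent.type === 'single'
      ? [agent.apiKey]
      : [agent.orchestrator.apiKey, agent.executor.apiKey]
  return keys.filter(
    (key): key is string => key !== undefined && isEnvVarNameSketch(key),
  )
}

console.log(
  collectEnvVarsSketch({
    type: 'orchestrator-executor',
    orchestrator: { apiKey: 'ORCH_API_KEY' },
    executor: { apiKey: 'sk-literal-value' },
  }),
) // only ORCH_API_KEY; the literal key is not treated as a variable name
```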
@@ -36,6 +36,5 @@ export async function resolveProviderConfig(
      accessKeyId: resolveEnvValue(agent.accessKeyId),
      secretAccessKey: resolveEnvValue(agent.secretAccessKey),
      sessionToken: resolveEnvValue(agent.sessionToken),
      region: resolveEnvValue(agent.region),
    }
  }
@@ -1,8 +1,3 @@
import {
  buildRunMetrics,
  type EvalRunMetrics,
  type EvalTaskMetrics,
} from '../reporting/task-metrics'
import type { GraderResult } from '../types'

export const VIEWER_MANIFEST_SCHEMA_VERSION = 2
@@ -25,7 +20,6 @@ export interface ViewerManifestTaskInput {
  status: string
  durationMs: number
  screenshotCount: number
  metrics?: EvalTaskMetrics
  graderResults: Record<string, GraderResult>
}
@@ -41,11 +35,9 @@ export interface ViewerManifest {
  suiteId?: string
  variantId?: string
  uploadedAt?: string
  reportPath?: string
  agentConfig?: Record<string, unknown>
  dataset?: string
  summary?: Record<string, unknown>
  metrics?: EvalRunMetrics
  tasks: ViewerManifestTask[]
}
@@ -54,7 +46,6 @@ export interface BuildViewerManifestInput {
  suiteId?: string
  variantId?: string
  uploadedAt?: string
  reportPath?: string
  agentConfig?: Record<string, unknown>
  dataset?: string
  summary?: Record<string, unknown>
@@ -77,37 +68,22 @@ function taskPaths(queryId: string): ViewerManifestTaskPaths {
export function buildViewerManifest(
  input: BuildViewerManifestInput,
): ViewerManifest {
  const tasks = input.tasks.map((task) => {
    const { artifactId, ...publicTask } = task
    const metrics =
      publicTask.metrics ??
      ({
        durationMs: publicTask.durationMs,
        steps: publicTask.screenshotCount,
        screenshots: publicTask.screenshotCount,
        toolCalls: 0,
        toolErrors: 0,
      } satisfies EvalTaskMetrics)

    return {
      ...publicTask,
      metrics,
      startUrl: publicTask.startUrl ?? '',
      paths: taskPaths(artifactId ?? publicTask.queryId),
    }
  })

  return {
    schemaVersion: VIEWER_MANIFEST_SCHEMA_VERSION,
    runId: input.runId,
    ...(input.suiteId ? { suiteId: input.suiteId } : {}),
    ...(input.variantId ? { variantId: input.variantId } : {}),
    ...(input.uploadedAt ? { uploadedAt: input.uploadedAt } : {}),
    ...(input.reportPath ? { reportPath: input.reportPath } : {}),
    ...(input.agentConfig ? { agentConfig: input.agentConfig } : {}),
    ...(input.dataset ? { dataset: input.dataset } : {}),
    ...(input.summary ? { summary: input.summary } : {}),
    metrics: buildRunMetrics(tasks.map((task) => task.metrics)),
    tasks,
    tasks: input.tasks.map((task) => {
      const { artifactId, ...publicTask } = task
      return {
        ...publicTask,
        startUrl: publicTask.startUrl ?? '',
        paths: taskPaths(artifactId ?? publicTask.queryId),
      }
    }),
  }
}
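In the version of `buildViewerManifest` that computes metrics, a task without recorded `metrics` gets a fallback built from `durationMs` and `screenshotCount`: the screenshot count stands in for the step count, and tool calls and errors are zero. Optional manifest fields are added with `...(value ? { key: value } : {})`, so the key is left out entirely when the value is missing or empty. A reduced sketch of both patterns follows; the trimmed types and the helper names `taskMetricsSketch` and `withOptionalSketch` are illustrative assumptions.

```typescript
interface TaskMetricsSketch {
  durationMs: number
  steps: number
  screenshots: number
  toolCalls: number
  toolErrors: number
}

interface TaskInputSketch {
  queryId: string
  durationMs: number
  screenshotCount: number
  metrics?: TaskMetricsSketch
}

// Use recorded metrics when present; otherwise approximate from legacy fields.
function taskMetricsSketch(task: TaskInputSketch): TaskMetricsSketch {
  return (
    task.metrics ?? {
      durationMs: task.durationMs,
      steps: task.screenshotCount, // assumes roughly one screenshot per step
      screenshots: task.screenshotCount,
      toolCalls: 0,
      toolErrors: 0,
    }
  )
}

// Adds `key` only when `value` is truthy, like the `...(x ? { x } : {})` spreads.
function withOptionalSketch(key: string, value: unknown): Record<string, unknown> {
  return value ? { [key]: value } : {}
}

const manifestSketch = {
  runId: 'run-1',
  ...withOptionalSketch('suiteId', 'weekly'),
  ...withOptionalSketch('dataset', ''), // '' is falsy, so no dataset key
  metrics: taskMetricsSketch({
    queryId: 'task-1',
    durationMs: 1200,
    screenshotCount: 3,
  }),
}

console.log(Object.keys(manifestSketch)) // runId, suiteId, metrics
```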
@@ -1,268 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, readFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createAgent } from '../../src/agents'
import { ClaudeCodeEvaluator } from '../../src/agents/claude-code'
import { CaptureContext } from '../../src/capture/context'
import {
  AgentConfigSchema,
  type EvalConfig,
  EvalConfigSchema,
  type Task,
  TaskMetadataSchema,
} from '../../src/types'

function config(): EvalConfig {
  return {
    agent: {
      type: 'claude-code',
      model: 'opus',
      claudePath: 'claude',
      extraArgs: [],
    },
    dataset: 'data/test.jsonl',
    num_workers: 1,
    restart_server_per_task: false,
    browseros: {
      server_url: 'http://127.0.0.1:9110',
      base_cdp_port: 9010,
      base_server_port: 9110,
      base_extension_port: 9310,
      load_extensions: false,
      headless: false,
    },
    graders: [],
  }
}

const task: Task = {
  query_id: 'task-1',
  dataset: 'test',
  query: 'Find the title',
  graders: [],
  metadata: {
    original_task_id: 'task-1',
  },
}

describe('ClaudeCodeEvaluator', () => {
  it('accepts claude-code config defaults without permission mode', () => {
    const agent = AgentConfigSchema.parse({ type: 'claude-code' })

    expect(agent).toEqual({
      type: 'claude-code',
      claudePath: 'claude',
      extraArgs: [],
    })
  })

  it('accepts claude-code as a runnable eval agent', () => {
    const parsed = EvalConfigSchema.parse({
      agent: {
        type: 'claude-code',
        model: 'opus',
      },
      dataset: 'data/test-set.jsonl',
      browseros: {
        server_url: 'http://127.0.0.1:9110',
      },
    })

    expect(parsed.agent.type).toBe('claude-code')
    expect(parsed.agent.model).toBe('opus')
  })

  it('rejects unsupported claude-code settings instead of silently ignoring them', () => {
    expect(
      AgentConfigSchema.safeParse({
        type: 'claude-code',
        permissionMode: 'bypassPermissions',
      }).success,
    ).toBe(false)
    expect(
      AgentConfigSchema.safeParse({
        type: 'claude-code',
        maxTurns: 3,
      }).success,
    ).toBe(false)
  })

  it('allows claude-code in task metadata', () => {
    const metadata = TaskMetadataSchema.parse({
      query_id: 'task-1',
      dataset: 'test',
      query: 'Do the thing',
      started_at: new Date().toISOString(),
      completed_at: new Date().toISOString(),
      total_duration_ms: 100,
      total_steps: 1,
      termination_reason: 'completed',
      final_answer: 'done',
      errors: [],
      warnings: [],
      agent_config: {
        type: 'claude-code',
        model: 'opus',
      },
      grader_results: {},
    })

    expect(metadata.agent_config.type).toBe('claude-code')
  })

  it('is created by the agent factory', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })

    const agent = createAgent({
      config: config(),
      task,
      workerIndex: 0,
      initialPageId: 1,
      outputDir,
      taskOutputDir,
      capture,
    })

    expect(agent).toBeInstanceOf(ClaudeCodeEvaluator)
  })

  it('runs claude code, logs messages, writes MCP config, and saves metadata', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })
    const calls: Array<{ executable: string; args: string[]; cwd: string }> = []
    const evaluator = new ClaudeCodeEvaluator(
      {
        config: config(),
        task,
        workerIndex: 0,
        initialPageId: 1,
        outputDir,
        taskOutputDir,
        capture,
      },
      {
        processRunner: {
          async run(options) {
            calls.push(options)
            await options.onStdoutLine(
              JSON.stringify({
                type: 'assistant',
                message: {
                  content: [{ type: 'text', text: 'The title is Example' }],
                },
              }),
            )
            await options.onStdoutLine(
              JSON.stringify({
                type: 'result',
                subtype: 'success',
                result: 'The title is Example',
              }),
            )
            return { exitCode: 0, stderr: '' }
          },
        },
      },
    )

    const result = await evaluator.execute()

    expect(result.finalAnswer).toBe('The title is Example')
    expect(result.metadata.agent_config).toMatchObject({
      type: 'claude-code',
      model: 'opus',
    })
    expect(result.messages.some((msg) => msg.type === 'user')).toBe(true)
    expect(result.messages.some((msg) => msg.type === 'text-delta')).toBe(true)
    const mcpConfig = JSON.parse(
      await readFile(join(taskOutputDir, 'claude-code-mcp.json'), 'utf-8'),
    )
    expect(mcpConfig.mcpServers.browseros).toMatchObject({
      type: 'http',
      url: 'http://127.0.0.1:9110/mcp',
      headers: {
        'X-BrowserOS-Source': 'sdk-internal',
      },
    })
    expect(calls).toEqual([
      expect.objectContaining({
        executable: 'claude',
        cwd: taskOutputDir,
        args: [
          '-p',
          expect.stringContaining('Task: Find the title'),
          '--mcp-config',
          join(taskOutputDir, 'claude-code-mcp.json'),
          '--strict-mcp-config',
          '--output-format',
          'stream-json',
          '--verbose',
          '--model',
          'opus',
        ],
      }),
    ])
    expect(calls[0].args).not.toContain('--permission-mode')
  })

  it('records non-fatal stream processing errors as warnings', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })
    const evaluator = new ClaudeCodeEvaluator(
      {
        config: config(),
        task,
        workerIndex: 0,
        initialPageId: 1,
        outputDir,
        taskOutputDir,
        capture,
      },
      {
        processRunner: {
          async run(options) {
            await options.onStdoutLine(
              JSON.stringify({
                type: 'result',
                subtype: 'success',
                result: 'done',
              }),
            )
            return {
              exitCode: 0,
              stderr: '',
              streamErrors: ['bad stream line'],
            }
          },
        },
      },
    )

    const result = await evaluator.execute()

    expect(result.finalAnswer).toBe('done')
    expect(result.metadata.warnings).toEqual([
      expect.objectContaining({
        source: 'message_logging',
        message: 'Claude Code stream event processing failed: bad stream line',
      }),
    ])
  })
})
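The deleted evaluator test pins down how the Claude Code CLI is invoked. It expects a `-p` prompt, a per-task MCP config file passed with `--mcp-config` plus `--strict-mcp-config`, `--output-format stream-json` with `--verbose`, `--model` when one is configured, and no `--permission-mode`. It also expects the config file to register BrowserOS as an HTTP MCP server at `<server_url>/mcp` with an `X-BrowserOS-Source: sdk-internal` header. The sketch below rebuilds both from those expectations only. `buildClaudeArgsSketch` and `buildMcpConfigSketch` are hypothetical helpers, the prompt is a placeholder, and appending `extraArgs` last is an assumption about where the evaluator puts them.

```typescript
interface ClaudeInvocationSketch {
  prompt: string
  mcpConfigPath: string
  model?: string
  extraArgs?: string[]
}

// Argument order copied from the expectation in the test above.
function buildClaudeArgsSketch(opts: ClaudeInvocationSketch): string[] {
  const args = [
    '-p',
    opts.prompt,
    '--mcp-config',
    opts.mcpConfigPath,
    '--strict-mcp-config', // only use servers from --mcp-config
    '--output-format',
    'stream-json',
    '--verbose',
  ]
  if (opts.model) args.push('--model', opts.model)
  args.push(...(opts.extraArgs ?? [])) // assumption: extra args go last
  return args
}

// MCP config shape asserted by the test: one HTTP server named "browseros".
function buildMcpConfigSketch(serverUrl: string) {
  return {
    mcpServers: {
      browseros: {
        type: 'http',
        url: `${serverUrl.replace(/\/+$/, '')}/mcp`,
        headers: { 'X-BrowserOS-Source': 'sdk-internal' },
      },
    },
  }
}

console.log(buildClaudeArgsSketch({
  prompt: 'Task: Find the title',
  mcpConfigPath: '/tmp/task-1/claude-code-mcp.json',
  model: 'opus',
}).join(' '))
console.log(JSON.stringify(buildMcpConfigSketch('http://127.0.0.1:9110'), null, 2))
```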
@@ -1,78 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createClaudeCodeProcessRunner } from '../../src/agents/claude-code/process-runner'

async function writeStdoutScript(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), 'claude-code-runner-'))
  const script = join(dir, 'stdout-lines')
  await writeFile(script, '#!/bin/sh\nprintf "first\\nbad\\nlast\\n"\n')
  await chmod(script, 0o755)
  return script
}

describe('createClaudeCodeProcessRunner', () => {
  it('passes executable and args to the spawn dependency', async () => {
    const calls: unknown[] = []
    const runner = createClaudeCodeProcessRunner({
      spawn: async (cmd, options) => {
        calls.push({ cmd, options })
        await options.onStdoutLine('{"type":"result","result":"done"}')
        return { exitCode: 0, stderr: '' }
      },
    })

    const result = await runner.run({
      executable: 'claude',
      args: ['-p', 'hello'],
      cwd: '/tmp',
      signal: new AbortController().signal,
      onStdoutLine: async () => {},
    })

    expect(result.exitCode).toBe(0)
    expect(calls).toEqual([
      {
        cmd: ['claude', '-p', 'hello'],
        options: expect.objectContaining({ cwd: '/tmp' }),
      },
    ])
  })

  it('returns stderr and non-zero exit codes', async () => {
    const runner = createClaudeCodeProcessRunner({
      spawn: async () => ({ exitCode: 2, stderr: 'bad auth' }),
    })

    const result = await runner.run({
      executable: 'claude',
      args: [],
      cwd: '/tmp',
      signal: new AbortController().signal,
      onStdoutLine: async () => {},
    })

    expect(result).toEqual({ exitCode: 2, stderr: 'bad auth' })
  })

  it('continues reading stdout after a line handler error', async () => {
    const script = await writeStdoutScript()
    const lines: string[] = []
    const runner = createClaudeCodeProcessRunner()

    const result = await runner.run({
      executable: script,
      args: [],
      cwd: '/tmp',
      onStdoutLine: async (line) => {
        lines.push(line)
        if (line === 'bad') throw new Error('bad line')
      },
    })

    expect(result.exitCode).toBe(0)
    expect(result.streamErrors).toEqual(['bad line'])
    expect(lines).toEqual(['first', 'bad', 'last'])
  })
})
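The `continues reading stdout after a line handler error` test expects the runner to keep consuming output after `onStdoutLine` throws and to report each failure in `streamErrors`. One way to get that behaviour is to split the stream into lines yourself, buffering any partial line between chunks, and to wrap every handler call in `try`/`catch`. This minimal sketch works over an async iterable of string chunks rather than a real child process. `readLinesSketch` is a hypothetical name, and skipping empty lines, stripping `\r`, and flushing a final unterminated line are assumed policies.

```typescript
interface LineReadResult {
  streamErrors: string[]
}

// Splits chunked text into lines; a throwing handler is recorded, not fatal.
async function readLinesSketch(
  chunks: AsyncIterable<string>,
  onLine: (line: string) => Promise<void>,
): Promise<LineReadResult> {
  const streamErrors: string[] = []
  let buffer = ''

  const emit = async (line: string) => {
    try {
      await onLine(line)
    } catch (error) {
      streamErrors.push(error instanceof Error ? error.message : String(error))
    }
  }

  for await (const chunk of chunks) {
    buffer += chunk
    let newline = buffer.indexOf('\n')
    while (newline !== -1) {
      const line = buffer.slice(0, newline).replace(/\r$/, '')
      buffer = buffer.slice(newline + 1)
      if (line) await emit(line)
      newline = buffer.indexOf('\n')
    }
  }
  if (buffer) await emit(buffer) // flush a last line with no trailing newline
  return { streamErrors }
}

// Chunk boundaries deliberately split "bad" across two reads.
async function* fakeStdout(): AsyncGenerator<string> {
  yield 'first\nba'
  yield 'd\nlast\n'
}

const seen: string[] = []
void readLinesSketch(fakeStdout(), async (line) => {
  seen.push(line)
  if (line === 'bad') throw new Error('bad line')
}).then(({ streamErrors }) => {
  console.log(seen, streamErrors) // lines first, bad, last; errors: bad line
})
```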
@@ -1,102 +0,0 @@
import { describe, expect, it } from 'bun:test'
import {
  ClaudeCodeStreamParser,
  shouldCaptureScreenshotForTool,
} from '../../src/agents/claude-code/stream-parser'

describe('ClaudeCodeStreamParser', () => {
  it('maps assistant text and MCP tool use into eval stream events', () => {
    const parser = new ClaudeCodeStreamParser()
    const events = parser.pushLine(
      JSON.stringify({
        type: 'assistant',
        message: {
          content: [
            { type: 'text', text: 'I will navigate.' },
            {
              type: 'tool_use',
              id: 'toolu_1',
              name: 'mcp__browseros__navigate_page',
              input: { page: 2, url: 'https://example.com' },
            },
          ],
        },
      }),
    )

    expect(events).toEqual([
      { type: 'text-start', id: expect.any(String) },
      {
        type: 'text-delta',
        id: expect.any(String),
        delta: 'I will navigate.',
      },
      { type: 'text-end', id: expect.any(String) },
      {
        type: 'tool-input-available',
        toolCallId: 'toolu_1',
        toolName: 'mcp__browseros__navigate_page',
        input: { page: 2, url: 'https://example.com' },
      },
    ])
    expect(parser.getLastText()).toBe('I will navigate.')
    expect(parser.getToolCallCount()).toBe(1)
  })

  it('maps Claude Code tool results into eval output events', () => {
    const parser = new ClaudeCodeStreamParser()
    const events = parser.pushLine(
      JSON.stringify({
        type: 'user',
        message: {
          content: [
            {
              type: 'tool_result',
              tool_use_id: 'toolu_1',
              content: 'Navigated successfully',
            },
          ],
        },
      }),
    )

    expect(events).toEqual([
      {
        type: 'tool-output-available',
        toolCallId: 'toolu_1',
        output: 'Navigated successfully',
      },
    ])
  })

  it('uses result messages as the authoritative final text', () => {
    const parser = new ClaudeCodeStreamParser()
    parser.pushLine(
      JSON.stringify({
        type: 'assistant',
        message: {
          content: [{ type: 'text', text: 'I will complete the task.' }],
        },
      }),
    )
    parser.pushLine(
      JSON.stringify({
        type: 'result',
        subtype: 'success',
        result: 'Final answer',
      }),
    )

    expect(parser.getLastText()).toBe('Final answer')
  })

  it('identifies BrowserOS MCP tools that should trigger screenshots', () => {
    expect(
      shouldCaptureScreenshotForTool('mcp__browseros__navigate_page'),
    ).toBe(true)
    expect(
      shouldCaptureScreenshotForTool('mcp__browseros__take_screenshot'),
    ).toBe(false)
    expect(shouldCaptureScreenshotForTool('Read')).toBe(false)
  })
})
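The deleted `ClaudeCodeStreamParser` tests describe a translation from Claude Code `stream-json` lines to the eval's message events. Assistant `text` blocks become a `text-start` / `text-delta` / `text-end` triple, `tool_use` blocks become `tool-input-available`, `tool_result` blocks inside `user` messages become `tool-output-available`, and a `result` line overrides the last text as the final answer. The compact sketch below reproduces just those mappings. The class name, the counter-based ids, and the screenshot rule (any `mcp__browseros__` tool except `take_screenshot`) are assumptions inferred from the assertions, not the parser's actual source.

```typescript
type EvalEvent =
  | { type: 'text-start'; id: string }
  | { type: 'text-delta'; id: string; delta: string }
  | { type: 'text-end'; id: string }
  | { type: 'tool-input-available'; toolCallId: string; toolName: string; input: unknown }
  | { type: 'tool-output-available'; toolCallId: string; output: unknown }

interface ContentBlock {
  type: string
  text?: string
  id?: string
  name?: string
  input?: unknown
  tool_use_id?: string
  content?: unknown
}

class StreamParserSketch {
  private lastText = ''
  private toolCalls = 0
  private nextId = 0

  pushLine(line: string): EvalEvent[] {
    const msg = JSON.parse(line) as {
      type?: string
      result?: string
      message?: { content?: ContentBlock[] }
    }
    // A `result` line carries the authoritative final answer.
    if (msg.type === 'result') {
      if (typeof msg.result === 'string') this.lastText = msg.result
      return []
    }
    const events: EvalEvent[] = []
    for (const block of msg.message?.content ?? []) {
      if (msg.type === 'assistant' && block.type === 'text' && block.text) {
        const id = `text-${this.nextId++}`
        events.push(
          { type: 'text-start', id },
          { type: 'text-delta', id, delta: block.text },
          { type: 'text-end', id },
        )
        this.lastText = block.text
      } else if (msg.type === 'assistant' && block.type === 'tool_use' && block.id && block.name) {
        this.toolCalls += 1
        events.push({ type: 'tool-input-available', toolCallId: block.id, toolName: block.name, input: block.input })
      } else if (msg.type === 'user' && block.type === 'tool_result' && block.tool_use_id) {
        events.push({ type: 'tool-output-available', toolCallId: block.tool_use_id, output: block.content })
      }
    }
    return events
  }

  getLastText(): string {
    return this.lastText
  }

  getToolCallCount(): number {
    return this.toolCalls
  }
}

// Assumed rule: BrowserOS MCP tools get a follow-up screenshot, except the screenshot tool itself.
function shouldCaptureScreenshotSketch(toolName: string): boolean {
  return toolName.startsWith('mcp__browseros__') && toolName !== 'mcp__browseros__take_screenshot'
}
```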
@@ -7,11 +7,8 @@ import {
  runSuiteCommand,
} from '../../src/cli/commands/suite'
import type { RunEvalOptions } from '../../src/runner/types'
import type { EvalSuite } from '../../src/suites/schema'

async function writeTempSuite(
  overrides: Partial<EvalSuite> = {},
): Promise<{ dir: string; suitePath: string }> {
async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
  const dir = await mkdtemp(join(tmpdir(), 'eval-suite-cli-'))
  const suitePath = join(dir, 'agisdk-daily-10.json')
  await writeFile(
@@ -26,9 +23,8 @@ async function writeTempSuite(
        restartBrowserPerTask: true,
        browseros: {
          server_url: 'http://127.0.0.1:9110',
          headless: false,
          headless: true,
        },
        ...overrides,
      },
      null,
      2,
@@ -47,7 +43,9 @@ describe('suite command', () => {

    expect(resolved.kind).toBe('config')
    expect(resolved.suite.id).toBe('browseros-agent-weekly')
    expect(resolved.evalConfig.dataset).toBe('../../data/agisdk-real.jsonl')
    expect(resolved.evalConfig.dataset).toBe(
      '../../data/webbench-2of4-50.jsonl',
    )
    expect(resolved.variant.publicMetadata.agent.apiKeyConfigured).toBe(true)
  })

@@ -77,25 +75,6 @@ describe('suite command', () => {
    expect(resolved.evalConfig.num_workers).toBe(2)
  })

  it('resolves claude-code suites without provider API credentials', async () => {
    const { dir, suitePath } = await writeTempSuite({
      agent: { type: 'claude-code' },
    })

    const resolved = await resolveSuiteCommand({
      suitePath,
      model: 'opus',
      env: {},
    })

    expect(resolved.kind).toBe('suite')
    expect(resolved.evalConfig.agent).toMatchObject({
      type: 'claude-code',
      model: 'opus',
    })
    expect(resolved.datasetPath).toBe(join(dir, 'tasks.jsonl'))
  })

  it('runs config and suite commands through the runner dependency', async () => {
    const calls: RunEvalOptions[] = []
    await runSuiteCommand(
@@ -1,12 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { shouldAutoOpenDashboard } from '../../src/dashboard/server'

describe('dashboard server', () => {
  it('does not auto-open the dashboard in CI', () => {
    expect(shouldAutoOpenDashboard({ CI: 'true' })).toBe(false)
  })

  it('auto-opens the dashboard outside CI by default', () => {
    expect(shouldAutoOpenDashboard({})).toBe(true)
  })
})
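The dashboard test fixes only two cases: `CI: 'true'` suppresses the browser auto-open, and an empty environment allows it. A minimal sketch consistent with those cases takes the environment as a parameter, as the test does, so it stays unit-testable without touching `process.env`. Treating any non-empty `CI` value other than `'false'` or `'0'` as "running in CI" is an assumption, as is the name `shouldAutoOpenDashboardSketch`.

```typescript
type EnvSketch = Record<string, string | undefined>

// Common CI services (GitHub Actions and GitLab CI, for example) set CI=true.
// Assumption: any non-empty value except "false"/"0" means CI.
function shouldAutoOpenDashboardSketch(env: EnvSketch): boolean {
  const ci = env.CI?.trim().toLowerCase()
  const inCi = ci !== undefined && ci !== '' && ci !== 'false' && ci !== '0'
  return !inCi
}

console.log(shouldAutoOpenDashboardSketch({ CI: 'true' })) // false
console.log(shouldAutoOpenDashboardSketch({})) // true
```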
@@ -1,5 +1,5 @@
import { describe, expect, it } from 'bun:test'
import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { runPythonJsonEvaluator } from '../../src/grading/python-evaluator'
@@ -11,17 +11,6 @@ async function writeScript(source: string): Promise<string> {
  return script
}

async function writePythonWrapper(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), 'eval-python-wrapper-'))
  const wrapper = join(dir, 'python-wrapper')
  await writeFile(
    wrapper,
    '#!/bin/sh\necho custom-python >&2\nexec python3 "$@"\n',
  )
  await chmod(wrapper, 0o755)
  return wrapper
}

describe('runPythonJsonEvaluator', () => {
  it('sends JSON on stdin, captures stderr, and parses stdout JSON', async () => {
    const script = await writeScript(`
@@ -60,34 +49,6 @@ sys.exit(3)
    ).rejects.toThrow('bad verifier')
  })

  it('uses BROWSEROS_EVAL_PYTHON when provided', async () => {
    const script = await writeScript(`
import json, sys
data = json.loads(sys.stdin.read())
print(json.dumps({"ok": data["ok"]}))
`)
    const wrapper = await writePythonWrapper()
    const previousPythonPath = process.env.BROWSEROS_EVAL_PYTHON
    process.env.BROWSEROS_EVAL_PYTHON = wrapper

    try {
      const result = await runPythonJsonEvaluator<{ ok: boolean }>({
        scriptPath: script,
        input: { ok: true },
        timeoutMs: 5_000,
      })

      expect(result.output).toEqual({ ok: true })
      expect(result.stderr).toContain('custom-python')
    } finally {
      if (previousPythonPath === undefined) {
        delete process.env.BROWSEROS_EVAL_PYTHON
      } else {
        process.env.BROWSEROS_EVAL_PYTHON = previousPythonPath
      }
    }
  })

  it('enforces timeouts', async () => {
    const script = await writeScript(`
import time
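The `uses BROWSEROS_EVAL_PYTHON when provided` test swaps in a wrapper script through that environment variable and checks for the wrapper's stderr marker. That implies the evaluator picks its interpreter from `BROWSEROS_EVAL_PYTHON` before falling back to a default. A sketch of that lookup follows, assuming `python3` as the fallback (the wrapper itself execs `python3`, but the evaluator's real default is not shown) and ignoring a blank value. `resolvePythonExecutableSketch` and `buildPythonCommandSketch` are hypothetical names.

```typescript
type EnvSketch = Record<string, string | undefined>

// Prefer an explicit interpreter (a virtualenv or wrapper script), else python3.
function resolvePythonExecutableSketch(env: EnvSketch): string {
  const configured = env.BROWSEROS_EVAL_PYTHON?.trim()
  return configured ? configured : 'python3'
}

// argv for running a JSON-in/JSON-out verifier script with that interpreter.
function buildPythonCommandSketch(env: EnvSketch, scriptPath: string): string[] {
  return [resolvePythonExecutableSketch(env), scriptPath]
}

console.log(buildPythonCommandSketch({}, 'verify.py')) // python3, verify.py
console.log(
  buildPythonCommandSketch({ BROWSEROS_EVAL_PYTHON: '/opt/venv/bin/python' }, 'verify.py'),
) // /opt/venv/bin/python, verify.py
```

Taking `env` as a parameter is a design choice for testability: a test could pass a plain object instead of saving and restoring `process.env` in a `try`/`finally`, as the original test has to.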
@@ -40,7 +40,6 @@ async function writeRunFixture(
      start_url: 'https://example.test',
      termination_reason: 'completed',
      total_duration_ms: 1200,
      total_steps: 4,
      screenshot_count: 1,
      agent_config: { type: 'single', model: 'kimi' },
      grader_results: {
@@ -48,22 +47,13 @@ async function writeRunFixture(
      },
    }),
  )
  await writeFile(
    join(taskDir, 'messages.jsonl'),
    [
      '{"type":"user"}',
      '{"type":"tool-input-available","toolName":"click"}',
      '{"type":"tool-input-available","toolName":"take_snapshot"}',
      '{"type":"tool-output-error","toolName":"click"}',
    ].join('\n'),
  )
  await writeFile(join(taskDir, 'messages.jsonl'), '{"type":"user"}\n')
  await writeFile(join(taskDir, 'grades.json'), '{"ok":true}')
  await writeFile(join(taskDir, 'screenshots', '1.png'), 'png')
  await writeFile(
    join(runDir, 'summary.json'),
    JSON.stringify({ passRate: 1, avgDurationMs: 1200 }),
  )
  await writeFile(join(runDir, 'report.html'), '<html>report</html>')
  return { runDir, runId: `${configName}-${timestamp}` }
}

@@ -120,9 +110,6 @@ describe('R2Publisher', () => {
    expect(byKey.get(`runs/${runId}/summary.json`)?.ContentType).toBe(
      'application/json',
    )
    expect(byKey.get(`runs/${runId}/report.html`)?.ContentType).toBe(
      'text/html',
    )
    expect(byKey.get('viewer.html')?.ContentType).toBe('text/html')
    expect(result.viewerUrl).toBe(
      `https://eval.example.test/viewer.html?run=${runId}`,
@@ -139,28 +126,12 @@ describe('R2Publisher', () => {
      uploadedAt: '2026-04-29T12:00:00.000Z',
      agentConfig: { type: 'single', model: 'kimi' },
      dataset: 'webbench',
      reportPath: 'report.html',
      summary: { passRate: 1, avgDurationMs: 1200 },
      metrics: {
        taskCount: 1,
        avgDurationMs: 1200,
        avgSteps: 4,
        avgToolCalls: 2,
        totalToolCalls: 2,
        totalToolErrors: 1,
      },
      tasks: [
        {
          queryId: 'task-1',
          status: 'completed',
          screenshotCount: 1,
          metrics: {
            durationMs: 1200,
            steps: 4,
            screenshots: 1,
            toolCalls: 2,
            toolErrors: 1,
          },
          paths: {
            attempt: 'tasks/task-1/attempt.json',
            metadata: 'tasks/task-1/metadata.json',
|
||||
|
||||
@@ -6,7 +6,6 @@ interface ViewerPathResolvers {
|
||||
artifactUrl(task: Record<string, unknown>, artifact: string): string
|
||||
metadataUrl(task: Record<string, unknown>): string
|
||||
messagesUrl(task: Record<string, unknown>): string
|
||||
reportUrl(manifest: Record<string, unknown>): string | null
|
||||
screenshotUrl(task: Record<string, unknown>, step: number): string
|
||||
}
|
||||
|
||||
@@ -25,7 +24,7 @@ async function loadViewerPathResolvers(): Promise<ViewerPathResolvers> {
|
||||
`
|
||||
const basePath = 'runs/run-1';
|
||||
${block}
|
||||
return { artifactUrl, metadataUrl, messagesUrl, reportUrl, screenshotUrl };
|
||||
return { artifactUrl, metadataUrl, messagesUrl, screenshotUrl };
|
||||
`,
|
||||
) as () => ViewerPathResolvers
|
||||
return createResolvers()
|
||||
@@ -61,35 +60,6 @@ async function runAutoSelectFromHash(hash: string): Promise<unknown> {
return runAutoSelect()
}

async function runComputeStats(): Promise<unknown> {
const html = await readFile(
join(import.meta.dir, '..', '..', 'src', 'dashboard', 'viewer.html'),
'utf-8',
)
const start = html.indexOf('function computeStats(tasks)')
const end = html.indexOf('function resolveStatus(task)', start)
expect(start).toBeGreaterThan(-1)
expect(end).toBeGreaterThan(start)

const block = html.slice(start, end)
const compute = new Function(
`
${block}
return computeStats([
{
graderResults: { agisdk_state_diff: { pass: true, score: 1 } },
metrics: { durationMs: 1000, steps: 4, toolCalls: 3, toolErrors: 0 }
},
{
graderResults: { agisdk_state_diff: { pass: false, score: 0 } },
metrics: { durationMs: 3000, steps: 8, toolCalls: 5, toolErrors: 2 }
}
]);
`,
) as () => unknown
return compute()
}

describe('R2 viewer artifact path compatibility', () => {
it('uses explicit manifest paths for new uploaded runs', async () => {
const resolvers = await loadViewerPathResolvers()
@@ -125,15 +95,6 @@ describe('R2 viewer artifact path compatibility', () => {
)
})

it('resolves manifest-level run report links', async () => {
const resolvers = await loadViewerPathResolvers()

expect(resolvers.reportUrl({ reportPath: 'report.html' })).toBe(
'runs/run-1/report.html',
)
expect(resolvers.reportUrl({})).toBe(null)
})

it('falls back to legacy inferred paths for old uploaded runs', async () => {
const resolvers = await loadViewerPathResolvers()
const task = { queryId: 'legacy-task' }
@@ -166,17 +127,4 @@ describe('R2 viewer artifact path compatibility', () => {
queryId: 'legacy-task',
})
})

it('computes run-level timing and tool metrics for the viewer', async () => {
expect(await runComputeStats()).toMatchObject({
total: 2,
passed: 1,
failed: 1,
avgDurationMs: 2000,
avgSteps: 6,
avgToolCalls: 4,
totalToolCalls: 8,
totalToolErrors: 2,
})
})
})

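The viewer test above pins down what `computeStats` in `viewer.html` must return for a two-task run. The following is a minimal TypeScript sketch consistent with those expectations, not the viewer's actual code; the interface names and the rule that a task passes only when every grader result reports `pass: true` are assumptions.

```typescript
// Hypothetical shapes, modeled on the fields the viewer test passes in.
interface GraderResult {
  pass: boolean
  score: number
}

interface TaskMetrics {
  durationMs: number
  steps: number
  toolCalls: number
  toolErrors: number
}

interface ViewerTask {
  graderResults: Record<string, GraderResult>
  metrics: TaskMetrics
}

interface RunStats {
  total: number
  passed: number
  failed: number
  avgDurationMs: number
  avgSteps: number
  avgToolCalls: number
  totalToolCalls: number
  totalToolErrors: number
}

// Assumption: a task passes only if it has at least one grader result
// and every grader result passed.
function taskPassed(task: ViewerTask): boolean {
  const results = Object.values(task.graderResults)
  return results.length > 0 && results.every((r) => r.pass)
}

function computeStats(tasks: ViewerTask[]): RunStats {
  const total = tasks.length
  const passed = tasks.filter(taskPassed).length
  // Sum one metric across all tasks.
  const sum = (key: keyof TaskMetrics): number =>
    tasks.reduce((acc, t) => acc + t.metrics[key], 0)
  // Averages fall back to 0 for an empty run instead of dividing by zero.
  const avg = (key: keyof TaskMetrics): number =>
    total === 0 ? 0 : sum(key) / total
  return {
    total,
    passed,
    failed: total - passed,
    avgDurationMs: avg('durationMs'),
    avgSteps: avg('steps'),
    avgToolCalls: avg('toolCalls'),
    totalToolCalls: sum('toolCalls'),
    totalToolErrors: sum('toolErrors'),
  }
}
```

With the test's two tasks (1000 ms and 3000 ms, 4 and 8 steps, 3 and 5 tool calls, 0 and 2 errors) this yields an average of 2000 ms, 6 steps, 4 tool calls, 8 total calls, and 2 total errors, matching the `toMatchObject` expectations.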
@@ -1,159 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import {
DEFAULT_REPORT_MAX_TURNS,
DEFAULT_REPORT_MODEL,
generateEvalReport,
runClaudeCodeReportAgent,
} from '../../scripts/generate-report'

async function writeRunFixture(): Promise<string> {
const runDir = await mkdtemp(join(tmpdir(), 'eval-report-script-'))
const taskDir = join(runDir, 'agisdk-networkin-10')
await mkdir(join(taskDir, 'screenshots'), { recursive: true })
await writeFile(
join(runDir, 'summary.json'),
JSON.stringify({
total: 1,
completed: 1,
passRate: 0,
avgDurationMs: 1234,
}),
)
await writeFile(
join(taskDir, 'metadata.json'),
JSON.stringify({
query_id: 'agisdk-networkin-10',
dataset: 'agisdk-real',
query: 'Send a follow-up message starting with "Following up on".',
termination_reason: 'completed',
total_duration_ms: 1234,
total_steps: 2,
screenshot_count: 1,
final_answer: 'No app action was taken.',
errors: [],
warnings: [],
agent_config: { type: 'single', model: 'kimi' },
grader_results: {
agisdk_state_diff: {
score: 0,
pass: false,
reasoning: 'Some criteria failed',
details: {
per_criterion: [
{ passed: true, detail: 'message starts correctly' },
{ passed: false, detail: 'message was not sent' },
],
},
},
},
}),
)
await writeFile(
join(taskDir, 'messages.jsonl'),
[
JSON.stringify({
type: 'tool-input-available',
timestamp: '2026-04-30T00:00:00.000Z',
toolCallId: 'call-1',
toolName: 'memory_search',
input: { q: 'chat' },
}),
JSON.stringify({
type: 'tool-output-error',
timestamp: '2026-04-30T00:00:01.000Z',
toolCallId: 'call-1',
errorText: 'memory unavailable',
}),
].join('\n'),
)
await writeFile(join(taskDir, 'screenshots', '1.png'), 'png')
return runDir
}
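The fixture writes one `tool-input-available` event and one `tool-output-error` event to `messages.jsonl`, and the prompt test then expects `"toolCalls": 1` and `"toolErrors": 1`. A minimal sketch of how such counts could be derived, assuming one call per `tool-input-available` event and one error per `tool-output-error` event; `countToolEvents` and its types are hypothetical, not the script's API.

```typescript
// Hypothetical message shape; only the fields needed for counting are modeled.
interface StreamMessage {
  type: string
  toolCallId?: string
}

interface ToolCounts {
  toolCalls: number
  toolErrors: number
}

// Count tool calls and tool errors in a messages.jsonl payload.
// Assumption: each 'tool-input-available' event is one call and each
// 'tool-output-error' event is one error. Blank lines are skipped.
function countToolEvents(jsonl: string): ToolCounts {
  const counts: ToolCounts = { toolCalls: 0, toolErrors: 0 }
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue
    const message = JSON.parse(line) as StreamMessage
    if (message.type === 'tool-input-available') counts.toolCalls += 1
    else if (message.type === 'tool-output-error') counts.toolErrors += 1
  }
  return counts
}
```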

describe('generate-report script', () => {
it('delegates report.html creation to Claude Code', async () => {
const runDir = await writeRunFixture()
const outputPath = join(runDir, 'report.html')
let prompt = ''

await generateEvalReport({
inputDir: runDir,
outputPath,
runAgent: async (invocation) => {
prompt = invocation.prompt
await writeFile(
invocation.outputPath,
'<!doctype html><h1>Claude-written report</h1>',
)
},
})

expect(await readFile(outputPath, 'utf-8')).toContain(
'Claude-written report',
)
expect(prompt).toContain('AGI SDK Random-10 Failure Report')
expect(prompt).toContain('summary.json')
expect(prompt).toContain('messages.jsonl')
expect(prompt).toContain('screenshots')
expect(prompt).toContain('Deterministic run metrics')
expect(prompt).toContain('"queryId": "agisdk-networkin-10"')
expect(prompt).toContain('"toolCalls": 1')
expect(prompt).toContain('"toolErrors": 1')
expect(prompt).toContain('Duration by task')
expect(prompt).toContain('Tool calls by task')
expect(prompt).toContain(outputPath)
})

it('fails when the Claude Code agent does not write the report', async () => {
const runDir = await writeRunFixture()

await expect(
generateEvalReport({
inputDir: runDir,
outputPath: join(runDir, 'missing-report.html'),
runAgent: async () => {},
}),
).rejects.toThrow('Report was not written')
})

it('runs Claude Code with Opus 4.6, full bypass, and bounded turns', async () => {
const runDir = await writeRunFixture()
const calls: unknown[] = []

await runClaudeCodeReportAgent(
{
inputDir: runDir,
outputPath: join(runDir, 'report.html'),
prompt: 'write the report',
},
{
query: async function* (call: unknown) {
calls.push(call)
yield { type: 'result', subtype: 'success', result: 'done' }
},
env: {
CLAUDE_CODE_OAUTH_TOKEN: 'token',
EVAL_R2_SECRET_ACCESS_KEY: 'secret',
HOME: '/tmp/home',
PATH: '/bin',
},
},
)

expect(calls).toHaveLength(1)
expect(calls[0]).toMatchObject({
prompt: 'write the report',
options: {
cwd: runDir,
model: DEFAULT_REPORT_MODEL,
maxTurns: DEFAULT_REPORT_MAX_TURNS,
permissionMode: 'bypassPermissions',
allowDangerouslySkipPermissions: true,
},
})
expect(JSON.stringify(calls[0])).not.toContain('secret')
})
})
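The last test asserts that the value of `EVAL_R2_SECRET_ACCESS_KEY` never reaches the Claude Code `query` call. One way to meet that requirement is an allowlist applied before the call. The sketch below is hypothetical: `pickAgentEnv` and the chosen keys are assumptions, not what `runClaudeCodeReportAgent` actually does.

```typescript
// Hypothetical allowlist: keep only what the agent plausibly needs to
// authenticate and locate binaries. Everything else, including R2
// credentials, is dropped.
const AGENT_ENV_ALLOWLIST = ['CLAUDE_CODE_OAUTH_TOKEN', 'HOME', 'PATH'] as const

function pickAgentEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const picked: Record<string, string> = {}
  for (const key of AGENT_ENV_ALLOWLIST) {
    const value = env[key]
    // Skip unset keys rather than forwarding undefined.
    if (value !== undefined) picked[key] = value
  }
  return picked
}
```

An allowlist fails closed: a new secret added to the eval environment later is excluded by default, whereas a denylist would have to be updated to keep it out.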
@@ -1,22 +1,19 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { adaptEvalConfigFile } from '../../src/suites/config-adapter'

describe('adaptEvalConfigFile', () => {
it('preserves browseros-agent-weekly AGI SDK config semantics', async () => {
it('preserves browseros-agent-weekly config semantics', async () => {
const adapted = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-weekly.json',
)

expect(adapted.suite.id).toBe('browseros-agent-weekly')
expect(adapted.suite.dataset).toBe('../../data/agisdk-real.jsonl')
expect(adapted.suite.graders).toEqual(['agisdk_state_diff'])
expect(adapted.suite.workers).toBe(3)
expect(adapted.suite.dataset).toBe('../../data/webbench-2of4-50.jsonl')
expect(adapted.suite.graders).toEqual(['performance_grader'])
expect(adapted.suite.workers).toBe(10)
expect(adapted.suite.restartBrowserPerTask).toBe(true)
expect(adapted.suite.timeoutMs).toBe(1_800_000)
expect(adapted.evalConfig.num_workers).toBe(3)
expect(adapted.evalConfig.num_workers).toBe(10)
expect(adapted.evalConfig.browseros.server_url).toBe(
'http://127.0.0.1:9110',
)
@@ -37,61 +34,4 @@ describe('adaptEvalConfigFile', () => {
'secret-openrouter-value',
)
})

it('adapts BrowserOS AGI SDK comparison configs', async () => {
const kimi = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-kimi-k2-5-agisdk-real.json',
)
const opus = await adaptEvalConfigFile(
'apps/eval/configs/legacy/browseros-agent-opus-4-6-agisdk-real.json',
)

expect(kimi.suite.id).toBe('browseros-agent-kimi-k2-5-agisdk-real')
expect(kimi.evalConfig.agent).toMatchObject({
type: 'single',
provider: 'openai-compatible',
model: 'moonshotai/kimi-k2.5',
})
expect(kimi.evalConfig.num_workers).toBe(3)

expect(opus.suite.id).toBe('browseros-agent-opus-4-6-agisdk-real')
expect(opus.evalConfig.agent).toMatchObject({
type: 'single',
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: 'AWS_REGION',
accessKeyId: 'AWS_ACCESS_KEY_ID',
secretAccessKey: 'AWS_SECRET_ACCESS_KEY',
})
expect(opus.evalConfig.num_workers).toBe(2)
})

it('adapts claude-code configs without provider credentials', async () => {
const dir = await mkdtemp(join(tmpdir(), 'claude-code-config-'))
const configPath = join(dir, 'claude-code-agisdk.json')
await writeFile(
configPath,
JSON.stringify({
agent: {
type: 'claude-code',
model: 'opus',
},
dataset: 'tasks.jsonl',
num_workers: 1,
restart_server_per_task: false,
browseros: {
server_url: 'http://127.0.0.1:9110',
headless: false,
},
}),
)

const adapted = await adaptEvalConfigFile(configPath, { env: {} })

expect(adapted.suite.agent).toEqual({ type: 'claude-code' })
expect(adapted.variant.agent).toMatchObject({
provider: 'claude-code',
model: 'opus',
})
})
})

@@ -35,16 +35,6 @@ describe('EvalSuiteSchema', () => {
expect(parsed.success).toBe(false)
})

it('validates claude-code suites', () => {
const suite = EvalSuiteSchema.parse({
id: 'claude-code-agisdk',
dataset: 'data/agisdk-real.jsonl',
agent: { type: 'claude-code' },
})

expect(suite.agent.type).toBe('claude-code')
})

it('validates the daily AGISDK 10-task suite', async () => {
const loaded = await loadSuite(
'apps/eval/configs/suites/agisdk-daily-10.json',
@@ -99,40 +89,4 @@ describe('resolveVariant', () => {
}),
).toThrow('EVAL_AGENT_API_KEY')
})

it('resolves claude-code variants without model or API key requirements', () => {
const variant = resolveVariant({
variantId: 'claude-opus',
provider: 'claude-code',
model: 'opus',
env: {},
})

expect(variant.id).toBe('claude-opus')
expect(variant.agent).toEqual({
provider: 'claude-code',
model: 'opus',
})
expect(variant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'opus',
apiKeyConfigured: false,
})

const defaultVariant = resolveVariant({
provider: 'claude-code',
env: {},
})

expect(defaultVariant.id).toBe('claude-code')
expect(defaultVariant.agent).toEqual({
provider: 'claude-code',
model: '',
})
expect(defaultVariant.publicMetadata.agent).toEqual({
provider: 'claude-code',
model: 'default',
apiKeyConfigured: false,
})
})
})

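The `resolveVariant` tests fix the claude-code defaults: the id falls back to the provider name, an omitted model becomes `''` internally and `'default'` in public metadata, and `apiKeyConfigured` is always `false`. A hypothetical sketch of just that branch; the function and type names are illustrative, and the real `resolveVariant` also handles other providers and API-key checks.

```typescript
// Hypothetical input/output types covering only the claude-code branch.
interface ClaudeCodeVariantInput {
  provider: 'claude-code'
  variantId?: string
  model?: string
}

interface ResolvedVariant {
  id: string
  agent: { provider: string; model: string }
  publicMetadata: {
    agent: { provider: string; model: string; apiKeyConfigured: boolean }
  }
}

// No model or API key is required for claude-code variants.
function resolveClaudeCodeVariant(
  input: ClaudeCodeVariantInput,
): ResolvedVariant {
  const model = input.model ?? ''
  return {
    // Fall back to the provider name when no variant id is given.
    id: input.variantId ?? input.provider,
    agent: { provider: input.provider, model },
    publicMetadata: {
      agent: {
        provider: input.provider,
        // Publish 'default' when the model choice is left to Claude Code.
        model: model === '' ? 'default' : model,
        // Claude Code authenticates on its own, so no eval API key is set.
        apiKeyConfigured: false,
      },
    },
  }
}
```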
@@ -1,38 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { resolveProviderConfig } from '../../src/utils/resolve-provider-config'

describe('resolveProviderConfig', () => {
it('resolves Bedrock region from environment variables', async () => {
const previous = {
AWS_REGION: process.env.AWS_REGION,
AWS_ACCESS_KEY_ID: process.env.AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY: process.env.AWS_SECRET_ACCESS_KEY,
}
process.env.AWS_REGION = 'us-west-2'
process.env.AWS_ACCESS_KEY_ID = 'test-access-key'
process.env.AWS_SECRET_ACCESS_KEY = 'test-secret-key'

try {
const resolved = await resolveProviderConfig({
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: 'AWS_REGION',
accessKeyId: 'AWS_ACCESS_KEY_ID',
secretAccessKey: 'AWS_SECRET_ACCESS_KEY',
})

expect(resolved).toMatchObject({
provider: 'bedrock',
model: 'global.anthropic.claude-opus-4-6-v1',
region: process.env.AWS_REGION,
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
})
} finally {
for (const [key, value] of Object.entries(previous)) {
if (value === undefined) delete process.env[key]
else process.env[key] = value
}
}
})
})
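The Bedrock test snapshots three `AWS_*` variables, overrides them, and restores them in `finally`, deleting any that were previously unset. The same save-and-restore pattern can be factored into a reusable helper; `withEnv` below is a hypothetical sketch, not part of the eval package.

```typescript
// Hypothetical helper: apply env overrides while fn runs, then restore.
async function withEnv<T>(
  overrides: Record<string, string>,
  fn: () => Promise<T>,
): Promise<T> {
  // Snapshot prior values; undefined means the variable was unset.
  const previous: Record<string, string | undefined> = {}
  for (const key of Object.keys(overrides)) previous[key] = process.env[key]
  Object.assign(process.env, overrides)
  try {
    return await fn()
  } finally {
    for (const [key, value] of Object.entries(previous)) {
      // Delete keys that were unset before, so they do not linger.
      if (value === undefined) delete process.env[key]
      else process.env[key] = value
    }
  }
}
```

Deleting unset keys matters because assigning `undefined` to a `process.env` entry stores the string `'undefined'` in Node.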
@@ -9,7 +9,6 @@ describe('buildViewerManifest', () => {
suiteId: 'agisdk-daily-10',
variantId: 'kimi',
uploadedAt: '2026-04-29T06:00:00.000Z',
reportPath: 'report.html',
summary: { total: 1, passRate: 0 },
tasks: [
{
@@ -19,13 +18,6 @@ describe('buildViewerManifest', () => {
status: 'completed',
durationMs: 353_000,
screenshotCount: 42,
metrics: {
durationMs: 353_000,
steps: 47,
screenshots: 42,
toolCalls: 19,
toolErrors: 2,
},
graderResults: {
agisdk_state_diff: {
score: 0,
@@ -40,7 +32,6 @@ describe('buildViewerManifest', () => {

const publishManifest: R2RunManifest = manifest
expect(publishManifest.schemaVersion).toBe(2)
expect(manifest.reportPath).toBe('report.html')
expect(manifest.tasks[0].paths.messages).toBe(
'tasks/agisdk-dashdish-4/messages.jsonl',
)
@@ -50,21 +41,6 @@ describe('buildViewerManifest', () => {
expect(manifest.tasks[0].paths.graderArtifacts).toBe(
'tasks/agisdk-dashdish-4/grader-artifacts',
)
expect(manifest.metrics).toMatchObject({
taskCount: 1,
avgDurationMs: 353_000,
avgSteps: 47,
avgToolCalls: 19,
totalToolCalls: 19,
totalToolErrors: 2,
})
expect(manifest.tasks[0].metrics).toEqual({
durationMs: 353_000,
steps: 47,
screenshots: 42,
toolCalls: 19,
toolErrors: 2,
})
expect(manifest.tasks[0].graderResults.agisdk_state_diff.details).toEqual({
missing: ['checkout item'],
})

Some files were not shown because too many files have changed in this diff.