mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-13 23:53:25 +00:00
Compare commits
22 Commits
fix/eval-3
...
fix/auto-i
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
58cb43ec7f | ||
|
|
eb90fcb6b3 | ||
|
|
7c942e91ce | ||
|
|
1ff92c44b3 | ||
|
|
c81906ecbf | ||
|
|
ffc0f09c86 | ||
|
|
7fb53c9921 | ||
|
|
d38b01a8c7 | ||
|
|
ff36c8412b | ||
|
|
fd5aba249b | ||
|
|
492f3fcdf2 | ||
|
|
cb0c0dd0c1 | ||
|
|
8712f89f18 | ||
|
|
ba60bf466f | ||
|
|
26afb826c6 | ||
|
|
b2340c8afa | ||
|
|
790a270f47 | ||
|
|
84a79ba0a1 | ||
|
|
6e3306f5e5 | ||
|
|
c244462b29 | ||
|
|
ebf97f74f6 | ||
|
|
561f2baf97 |
152
.claude/skills/ask-internal/SKILL.md
Normal file
152
.claude/skills/ask-internal/SKILL.md
Normal file
@@ -0,0 +1,152 @@
|
||||
---
|
||||
name: ask-internal
|
||||
description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
|
||||
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
|
||||
---
|
||||
|
||||
# Ask Internal
|
||||
|
||||
Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.
|
||||
|
||||
**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."
|
||||
|
||||
## When to use
|
||||
|
||||
- "How do I reset my dogfood profile?"
|
||||
- "What's the deal with the OpenClaw VM startup?"
|
||||
- "Where do we configure release signing?"
|
||||
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.
|
||||
|
||||
## Hard rules — never do these
|
||||
|
||||
- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
|
||||
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
|
||||
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
|
||||
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
|
||||
- NEVER cite a file or line number you have not actually read.
|
||||
|
||||
## Voice rules
|
||||
|
||||
Apply the same voice rules as `document-internal` to the synthesized answer:
|
||||
|
||||
- Lead with the point.
|
||||
- Concrete nouns. Name files, functions, commands.
|
||||
- Short sentences. Active voice. No em dashes.
|
||||
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
|
||||
- No filler intros.
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 0: Pre-flight
|
||||
|
||||
```bash
|
||||
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
|
||||
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
|
||||
exit 0
|
||||
fi
|
||||
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
|
||||
echo ".internal-docs/ missing or empty. Submodule not configured?"
|
||||
exit 0
|
||||
}
|
||||
```
|
||||
|
||||
### Step 1: Parse the question
|
||||
|
||||
Pull the keywords from the user's question. Drop stop words. Identify intent:
|
||||
|
||||
- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
|
||||
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
|
||||
- **Free-form** ("anything about Y"): search all categories.
|
||||
|
||||
### Step 2: Multi-source search
|
||||
|
||||
Run grep in parallel across two sources.
|
||||
|
||||
**Internal docs:**
|
||||
|
||||
```bash
|
||||
grep -rni --include='*.md' '<keyword>' .internal-docs/
|
||||
```
|
||||
|
||||
Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
|
||||
|
||||
**Codebase (skip vendored Chromium and `node_modules`):**
|
||||
|
||||
```bash
|
||||
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
|
||||
--exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
|
||||
'<keyword>' packages/ scripts/ .config/ .github/
|
||||
```
|
||||
|
||||
Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.
|
||||
|
||||
### Step 3: Synthesize answer
|
||||
|
||||
Structure the response:
|
||||
|
||||
1. **Direct answer.** First sentence answers the question. No preamble.
|
||||
2. **Steps if applicable.** Numbered list with exact commands.
|
||||
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.
|
||||
|
||||
If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.
|
||||
|
||||
### Step 4: Offer execution (only if commands surfaced)
|
||||
|
||||
If Step 3 produced executable commands the user could run, ask:
|
||||
|
||||
> Run these for you? (y / n / dry-run)
|
||||
|
||||
- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
|
||||
- **n:** Skip. Done.
|
||||
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
|
||||
|
||||
### Step 5: Doc-not-found path
|
||||
|
||||
If Step 2 returned nothing useful (no doc hits AND no clear code answer):
|
||||
|
||||
1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
|
||||
2. Ask: "Draft a new doc and open a PR to internal-docs?"
|
||||
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.
|
||||
|
||||
### Step 6: Completion status
|
||||
|
||||
Report one of:
|
||||
|
||||
- **DONE** — answer delivered, citations verified.
|
||||
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
|
||||
- **BLOCKED** — submodule missing or other pre-flight failure.
|
||||
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.
|
||||
|
||||
## Citation discipline
|
||||
|
||||
Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
|
||||
|
||||
If a doc says one thing and the code says another, surface the conflict explicitly:
|
||||
|
||||
> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**Skimming and then citing**
|
||||
- **Problem:** Citation points to a line that doesn't actually contain the claim.
|
||||
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.
|
||||
|
||||
**Executing without per-command confirmation for mutations**
|
||||
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
|
||||
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.
|
||||
|
||||
**Searching only docs, not code**
|
||||
- **Problem:** Doc says X but code does Y; answer is wrong.
|
||||
- **Fix:** Always grep both sources in Step 2.
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Cite a file:line you haven't read.
|
||||
- Run mutations without per-command confirmation.
|
||||
- Modify BrowserOS code from this skill (use `/document-internal` for writes).
|
||||
|
||||
**Always:**
|
||||
- Pre-flight check before any search.
|
||||
- Reconcile doc vs code conflicts in the answer, don't hide them.
|
||||
- Plain "no doc covers this" when grep is empty — never invent.
|
||||
208
.claude/skills/document-internal/SKILL.md
Normal file
208
.claude/skills/document-internal/SKILL.md
Normal file
@@ -0,0 +1,208 @@
|
||||
---
|
||||
name: document-internal
|
||||
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
|
||||
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
|
||||
---
|
||||
|
||||
# Document Internal
|
||||
|
||||
Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.
|
||||
|
||||
**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."
|
||||
|
||||
## When to use
|
||||
|
||||
After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).
|
||||
|
||||
## Hard rules — never do these
|
||||
|
||||
- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
|
||||
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
|
||||
- NEVER fabricate filler content for empty template sections. Empty stays empty.
|
||||
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
|
||||
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
|
||||
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.
|
||||
|
||||
## Voice rules — enforced by Step 4
|
||||
|
||||
The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).
|
||||
|
||||
- Lead with the point. First sentence answers "what is this?"
|
||||
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
|
||||
- Short sentences. Average <20 words. No deeply nested clauses.
|
||||
- Active voice. "X does Y" not "Y is done by X".
|
||||
- No em dashes. Use commas, periods, or rephrase.
|
||||
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
|
||||
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
|
||||
- No filler intros ("This document describes..."). Start with the substance.
|
||||
- Empty sections stay empty. Do not write "N/A" or fabricate content.
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 0: Pre-flight
|
||||
|
||||
Bail with a clear message on any failure.
|
||||
|
||||
```bash
|
||||
# Submodule must be initialized
|
||||
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
|
||||
echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
|
||||
exit 0
|
||||
fi
|
||||
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }
|
||||
|
||||
# Must be on a feature branch
|
||||
BRANCH=$(git branch --show-current)
|
||||
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
|
||||
echo "On $BRANCH. Run from a feature branch."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Determine base branch (default: dev for this repo, fall back to main).
|
||||
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
|
||||
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo dev || echo main)
|
||||
|
||||
# Gather context
|
||||
git log "$BASE..HEAD" --oneline
|
||||
git diff "$BASE...HEAD" --stat
|
||||
gh pr view --json body -q .body 2>/dev/null # may be empty if no PR yet
|
||||
```
|
||||
|
||||
### Step 1: Identify the doc
|
||||
|
||||
Ask the user for three things in one prompt:
|
||||
|
||||
1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
|
||||
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
|
||||
3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)
|
||||
|
||||
### Step 2: Decision brief — four sharp questions
|
||||
|
||||
Ask one question at a time. Each answer constrains the next. These force compression before drafting.
|
||||
|
||||
1. "In one sentence: what can someone now DO that they could not before?"
|
||||
2. "What is the one design decision a future engineer needs to know?"
|
||||
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
|
||||
4. "Any sharp edges or gotchas? (or 'none')"
|
||||
|
||||
Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.
|
||||
|
||||
### Step 3: Draft from the template
|
||||
|
||||
Read the matching template from `.internal-docs/_templates/`:
|
||||
|
||||
- `feature` → `feature-note.md`
|
||||
- `architecture` → `architecture-note.md`
|
||||
- `design` → `design-spec.md`
|
||||
|
||||
If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.
|
||||
|
||||
Generate the 1-pager from the template, the four answers, and the diff context.
|
||||
|
||||
### Step 4: Voice self-check
|
||||
|
||||
Scan the draft for violations:
|
||||
|
||||
- Em dash present (`—`).
|
||||
- Any banned word from the list.
|
||||
- Average sentence length > 20 words.
|
||||
- Body line count > 60 (feature notes only — architecture/design have no cap).
|
||||
|
||||
If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
|
||||
|
||||
If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
|
||||
|
||||
### Step 5: Show + iterate
|
||||
|
||||
Print the full draft. Ask:
|
||||
|
||||
> Edit needed? Paste any changes, or say "looks good".
|
||||
|
||||
Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.
|
||||
|
||||
### Step 6: Open PR to internal-docs
|
||||
|
||||
Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.
|
||||
|
||||
```bash
|
||||
TMP=$(mktemp -d)
|
||||
trap 'rm -rf "$TMP"' EXIT # cleans up even if any step below fails
|
||||
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
|
||||
cd "$TMP"
|
||||
git checkout -b "docs/<slug>"
|
||||
|
||||
# Write the doc
|
||||
mkdir -p "<type>" # features, architecture, designs, or setup
|
||||
cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
|
||||
<draft content>
|
||||
DOC
|
||||
|
||||
# Update the root README index — insert one line under the matching section
|
||||
# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"
|
||||
|
||||
git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
|
||||
git commit -m "docs(<type>): <slug>"
|
||||
git push -u origin "docs/<slug>"
|
||||
|
||||
PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
|
||||
--head "docs/<slug>" \
|
||||
--title "docs(<type>): <slug>" \
|
||||
--body "$(cat <<'BODY'
|
||||
## Summary
|
||||
<one-line of what this doc covers>
|
||||
|
||||
## Source
|
||||
- BrowserOS branch: <branch>
|
||||
- Related PR: <#NNN if any>
|
||||
BODY
|
||||
)")
|
||||
|
||||
cd -
|
||||
echo "PR opened: $PR_URL"
|
||||
# trap above cleans up $TMP on EXIT
|
||||
```
|
||||
|
||||
If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
|
||||
|
||||
### Step 7: Completion status
|
||||
|
||||
Report one of:
|
||||
|
||||
- **DONE** — file written, branch pushed, PR opened. Print PR URL.
|
||||
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
|
||||
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.
|
||||
|
||||
## Doc type defaults
|
||||
|
||||
| Branch pattern | Default doc type | Default location |
|
||||
|----------------|------------------|------------------|
|
||||
| `feat/*` | feature | `features/` |
|
||||
| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
|
||||
| `rfc/*` or `design/*` | design | `designs/` |
|
||||
| Otherwise | ask | ask |
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**Drafting before asking the four questions**
|
||||
- **Problem:** Output is generic filler that says nothing concrete.
|
||||
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".
|
||||
|
||||
**Touching `.internal-docs/` directly**
|
||||
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
|
||||
- **Fix:** Always use the tmp clone in Step 6.
|
||||
|
||||
**Skipping voice check on user edits**
|
||||
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
|
||||
- **Fix:** Re-run Step 4 after every user edit.
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Push to `internal-docs/main`. Always branch + PR.
|
||||
- Modify the OSS repo's `.gitmodules` or submodule pointer.
|
||||
- Fabricate content for empty template sections.
|
||||
|
||||
**Always:**
|
||||
- Pre-flight check before doing any work.
|
||||
- One-pager rule for feature notes (60-line body cap).
|
||||
- File:line citations when referencing code.
|
||||
51
.claude/skills/document-internal/seeds/README.md
Normal file
51
.claude/skills/document-internal/seeds/README.md
Normal file
@@ -0,0 +1,51 @@
|
||||
# BrowserOS Internal Docs
|
||||
|
||||
Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.
|
||||
|
||||
If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.
|
||||
|
||||
## How to find what you need
|
||||
|
||||
- Setup task ("how do I X locally") → look in [`setup/`](setup/)
|
||||
- Recently shipped feature → look in [`features/`](features/)
|
||||
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
|
||||
- A design decision or RFC → look in [`designs/`](designs/)
|
||||
|
||||
Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.
|
||||
|
||||
## How to add a doc
|
||||
|
||||
Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.
|
||||
|
||||
## Index
|
||||
|
||||
### Setup
|
||||
<!-- one line per setup runbook: -->
|
||||
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->
|
||||
|
||||
### Features
|
||||
<!-- one line per shipped feature, newest first: -->
|
||||
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->
|
||||
|
||||
### Architecture
|
||||
<!-- one line per cross-cutting subsystem: -->
|
||||
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->
|
||||
|
||||
### Designs
|
||||
<!-- one line per design spec, newest first: -->
|
||||
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->
|
||||
|
||||
## Templates
|
||||
|
||||
When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.
|
||||
|
||||
## Voice
|
||||
|
||||
Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.
|
||||
|
||||
- Lead with the point.
|
||||
- Concrete nouns. Name files, functions, commands.
|
||||
- Short sentences, active voice, no em dashes.
|
||||
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
|
||||
- Empty sections stay empty. Do not write "N/A" or fake content.
|
||||
- Feature notes target one screen, body 60 lines max.
|
||||
@@ -0,0 +1,31 @@
|
||||
---
|
||||
title: <subsystem name>
|
||||
owner: <github handle>
|
||||
status: current | deprecated
|
||||
date: YYYY-MM-DD
|
||||
related-features: [feature-slug-1, feature-slug-2]
|
||||
---
|
||||
|
||||
# <subsystem name>
|
||||
|
||||
## What this subsystem does
|
||||
<1-2 paragraphs. The top-level responsibility. Boundaries.>
|
||||
|
||||
## Architecture
|
||||
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>
|
||||
|
||||
## Constraints
|
||||
<Hard rules the design enforces. "X must never call Y" type statements.>
|
||||
|
||||
## Decisions made
|
||||
<Numbered list of non-obvious decisions and the reason for each.>
|
||||
|
||||
## Key files
|
||||
- `path/to/file.ts` — role
|
||||
- `path/to/dir/` — what lives here
|
||||
|
||||
## How to evolve this
|
||||
<Where to add things. Which tests to expect to update. What NOT to touch.>
|
||||
|
||||
## Open questions
|
||||
<What is still being figured out. Empty if none.>
|
||||
@@ -0,0 +1,34 @@
|
||||
---
|
||||
title: <design name>
|
||||
owner: <github handle>
|
||||
status: proposed | accepted | rejected | superseded
|
||||
date: YYYY-MM-DD
|
||||
supersedes: <design-slug or none>
|
||||
---
|
||||
|
||||
# <design name>
|
||||
|
||||
## Goal
|
||||
<2-4 sentences. What this design is trying to accomplish.>
|
||||
|
||||
## Context
|
||||
<1-2 paragraphs. The current state, what is failing, why this needs to change.>
|
||||
|
||||
## Selected Approach
|
||||
<The chosen design at a high level. Architecture, components, data flow.>
|
||||
|
||||
## Alternatives Considered
|
||||
### 1. <name>
|
||||
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>
|
||||
|
||||
### 2. <name>
|
||||
<Same shape.>
|
||||
|
||||
## Out of Scope
|
||||
<What this design does NOT cover. Defer references.>
|
||||
|
||||
## Rollout
|
||||
<Numbered steps from "nothing exists" to "fully shipped".>
|
||||
|
||||
## Open Questions
|
||||
<Resolved during design? Empty. Unresolved? List with owner.>
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
title: <feature name>
|
||||
owner: <github handle>
|
||||
status: shipped | wip | deprecated
|
||||
date: YYYY-MM-DD
|
||||
prs: ["#NNN"]
|
||||
tags: [agent, browser, mcp]
|
||||
---
|
||||
|
||||
# <feature name>
|
||||
|
||||
## What it does
|
||||
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>
|
||||
|
||||
## Why we built it
|
||||
<1-2 sentences. Motivation. What pain it removed or what unlocked.>
|
||||
|
||||
## How it works
|
||||
<3-6 sentences. The flow at a high level. Name the key files.>
|
||||
|
||||
## Key files
|
||||
- `path/to/file.ts` — what it does
|
||||
- `path/to/other.ts` — what it does
|
||||
|
||||
## How to run / test it locally
|
||||
<bullet list of commands. Empty section if N/A — do not fake.>
|
||||
|
||||
## Gotchas
|
||||
<known sharp edges. "If you see X, that's why." Empty if N/A.>
|
||||
29
.github/workflows/eval-weekly.yml
vendored
29
.github/workflows/eval-weekly.yml
vendored
@@ -14,7 +14,7 @@ on:
|
||||
config:
|
||||
description: 'Eval config file (relative to apps/eval/)'
|
||||
required: false
|
||||
default: 'configs/browseros-agent-weekly.json'
|
||||
default: 'configs/legacy/browseros-agent-weekly.json'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
@@ -62,36 +62,27 @@ jobs:
|
||||
curl -sL -o /tmp/nopecha.zip https://github.com/NopeCHALLC/nopecha-extension/releases/latest/download/chromium_automation.zip
|
||||
unzip -qo /tmp/nopecha.zip -d extensions/nopecha
|
||||
|
||||
- name: Run eval
|
||||
- name: Run eval and publish to R2
|
||||
working-directory: packages/browseros-agent/apps/eval
|
||||
env:
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
|
||||
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
|
||||
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
|
||||
BROWSEROS_BINARY: /usr/bin/browseros
|
||||
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
|
||||
# OpenClaw container runtime is macOS-only; opt the Linux runner
|
||||
# into the no-op stub so the server can boot and the eval can run.
|
||||
BROWSEROS_SKIP_OPENCLAW: '1'
|
||||
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
|
||||
run: |
|
||||
echo "Running eval with config: $EVAL_CONFIG"
|
||||
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts -c "$EVAL_CONFIG"
|
||||
|
||||
- name: Upload runs to R2
|
||||
if: success()
|
||||
working-directory: packages/browseros-agent/apps/eval
|
||||
env:
|
||||
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}
|
||||
EVAL_R2_ACCESS_KEY_ID: ${{ secrets.EVAL_R2_ACCESS_KEY_ID }}
|
||||
EVAL_R2_SECRET_ACCESS_KEY: ${{ secrets.EVAL_R2_SECRET_ACCESS_KEY }}
|
||||
EVAL_R2_BUCKET: ${{ secrets.EVAL_R2_BUCKET }}
|
||||
EVAL_R2_CDN_BASE_URL: ${{ secrets.EVAL_R2_CDN_BASE_URL }}
|
||||
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
|
||||
BROWSEROS_BINARY: /usr/bin/browseros
|
||||
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
|
||||
# OpenClaw container runtime is macOS-only; opt the Linux runner
|
||||
# into the no-op stub so the server can boot and the eval can run.
|
||||
BROWSEROS_SKIP_OPENCLAW: '1'
|
||||
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/legacy/browseros-agent-weekly.json' }}
|
||||
run: |
|
||||
CONFIG_NAME=$(basename "$EVAL_CONFIG" .json)
|
||||
bun scripts/upload-run.ts "results/$CONFIG_NAME"
|
||||
echo "Running eval with config: $EVAL_CONFIG"
|
||||
xvfb-run --auto-servernum --server-args="-screen 0 1440x900x24" bun run src/index.ts suite --config "$EVAL_CONFIG" --publish r2
|
||||
|
||||
- name: Generate trend report
|
||||
if: success()
|
||||
|
||||
167
.github/workflows/publish-vm-agent-cache.yml
vendored
167
.github/workflows/publish-vm-agent-cache.yml
vendored
@@ -1,167 +0,0 @@
|
||||
name: Publish VM Agent Cache
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
agent:
|
||||
description: "Agent name from bundle.json"
|
||||
required: true
|
||||
type: string
|
||||
default: openclaw
|
||||
publish:
|
||||
description: "Upload to R2 and merge manifest slice"
|
||||
required: false
|
||||
default: false
|
||||
type: boolean
|
||||
pull_request:
|
||||
paths:
|
||||
- "packages/browseros-agent/packages/build-tools/**"
|
||||
- ".github/workflows/publish-vm-agent-cache.yml"
|
||||
|
||||
env:
|
||||
BUN_VERSION: "1.3.6"
|
||||
PKG_DIR: packages/browseros-agent/packages/build-tools
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
check:
|
||||
runs-on: ubuntu-24.04
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: oven-sh/setup-bun@v2
|
||||
with:
|
||||
bun-version: ${{ env.BUN_VERSION }}
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun install --frozen-lockfile
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun run --filter @browseros/build-tools typecheck
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun run --filter @browseros/build-tools test
|
||||
|
||||
build:
|
||||
needs: check
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
include:
|
||||
- arch: arm64
|
||||
runner: ubuntu-24.04-arm
|
||||
- arch: x64
|
||||
runner: ubuntu-24.04
|
||||
runs-on: ${{ matrix.runner }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: oven-sh/setup-bun@v2
|
||||
with:
|
||||
bun-version: ${{ env.BUN_VERSION }}
|
||||
- name: Install podman
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y podman
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun install --frozen-lockfile
|
||||
- name: Build tarball
|
||||
working-directory: ${{ env.PKG_DIR }}
|
||||
env:
|
||||
AGENT: ${{ inputs.agent || 'openclaw' }}
|
||||
OUT: ${{ github.workspace }}/dist/images
|
||||
run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
|
||||
path: dist/images/
|
||||
retention-days: 7
|
||||
|
||||
smoke:
|
||||
needs: build
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
include:
|
||||
- arch: arm64
|
||||
runner: ubuntu-24.04-arm
|
||||
- arch: x64
|
||||
runner: ubuntu-24.04
|
||||
runs-on: ${{ matrix.runner }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: oven-sh/setup-bun@v2
|
||||
with:
|
||||
bun-version: ${{ env.BUN_VERSION }}
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
|
||||
path: dist/images
|
||||
- name: Install podman
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y podman
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun install --frozen-lockfile
|
||||
- name: Smoke test tarball
|
||||
working-directory: ${{ env.PKG_DIR }}
|
||||
env:
|
||||
AGENT: ${{ inputs.agent || 'openclaw' }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-${{ matrix.arch }}.tar.gz" -print -quit)"
|
||||
if [ -z "$tarball" ]; then
|
||||
echo "missing ${{ matrix.arch }} tarball artifact for ${AGENT}" >&2
|
||||
exit 1
|
||||
fi
|
||||
bun run smoke:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --tarball "$tarball"
|
||||
|
||||
publish:
|
||||
needs: [build, smoke]
|
||||
if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
|
||||
runs-on: ubuntu-24.04
|
||||
environment: release
|
||||
concurrency:
|
||||
group: r2-manifest-publish
|
||||
cancel-in-progress: false
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: oven-sh/setup-bun@v2
|
||||
with:
|
||||
bun-version: ${{ env.BUN_VERSION }}
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
pattern: tarball-*
|
||||
path: dist/images
|
||||
merge-multiple: true
|
||||
- working-directory: packages/browseros-agent
|
||||
run: bun install --frozen-lockfile
|
||||
- name: Upload tarballs to R2
|
||||
working-directory: ${{ env.PKG_DIR }}
|
||||
env:
|
||||
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
|
||||
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
|
||||
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
|
||||
R2_BUCKET: ${{ secrets.R2_BUCKET }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
|
||||
base="$(basename "$file")"
|
||||
bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
|
||||
done
|
||||
- name: Merge agent slice into manifest
|
||||
working-directory: ${{ env.PKG_DIR }}
|
||||
env:
|
||||
AGENT: ${{ inputs.agent || 'openclaw' }}
|
||||
R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
|
||||
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
|
||||
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
|
||||
R2_BUCKET: ${{ secrets.R2_BUCKET }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
mkdir -p dist/images
|
||||
cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
|
||||
bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
|
||||
bun run emit-manifest -- \
|
||||
--slice "agents:${AGENT}" \
|
||||
--dist-dir dist \
|
||||
--merge-from dist/baseline-manifest.json \
|
||||
--out dist/manifest.json
|
||||
bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
|
||||
53
.github/workflows/sync-internal-docs.yml
vendored
Normal file
53
.github/workflows/sync-internal-docs.yml
vendored
Normal file
@@ -0,0 +1,53 @@
|
||||
name: Sync internal-docs submodule
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: '0 */4 * * *'
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
sync:
|
||||
name: Bump internal-docs submodule pointer on dev
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Rewrite SSH submodule URL to HTTPS-with-token
|
||||
env:
|
||||
TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
|
||||
run: |
|
||||
git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"
|
||||
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
|
||||
submodules: true
|
||||
ref: dev
|
||||
fetch-depth: 50
|
||||
|
||||
- name: Bump submodule pointer if internal-docs has new commits
|
||||
env:
|
||||
GH_TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
|
||||
run: |
|
||||
set -e
|
||||
|
||||
# Skip if submodule not yet configured (handoff window before someone adds it)
|
||||
if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
|
||||
echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
git submodule update --remote --merge .internal-docs
|
||||
|
||||
if git diff --quiet .internal-docs; then
|
||||
echo "No internal-docs changes to sync."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
git config user.name "browseros-bot"
|
||||
git config user.email "bot@browseros.ai"
|
||||
git add .internal-docs
|
||||
git commit -m "chore: sync internal-docs submodule"
|
||||
|
||||
# Rebase onto latest dev to absorb any commits that landed during the run,
|
||||
# then push. set -e takes care of failing the run on rebase conflict.
|
||||
git pull --rebase origin dev
|
||||
git push origin dev
|
||||
6
.github/workflows/test.yml
vendored
6
.github/workflows/test.yml
vendored
@@ -63,15 +63,15 @@ jobs:
|
||||
junit_path: test-results/server-root.xml
|
||||
needs_browser: false
|
||||
- suite: agent
|
||||
command: bun run test:agent
|
||||
command: (cd apps/agent && bun run test)
|
||||
junit_path: test-results/agent.xml
|
||||
needs_browser: false
|
||||
- suite: eval
|
||||
command: bun run test:eval
|
||||
command: (cd apps/eval && bun run test)
|
||||
junit_path: test-results/eval.xml
|
||||
needs_browser: false
|
||||
- suite: build
|
||||
command: bun run test:build
|
||||
command: bun run ./scripts/run-bun-test.ts ./scripts/build
|
||||
junit_path: test-results/build.xml
|
||||
needs_browser: false
|
||||
|
||||
|
||||
4
.gitmodules
vendored
4
.gitmodules
vendored
@@ -0,0 +1,4 @@
|
||||
[submodule ".internal-docs"]
|
||||
path = .internal-docs
|
||||
url = git@github.com:browseros-ai/internal-docs.git
|
||||
branch = main
|
||||
|
||||
1
.internal-docs
Submodule
1
.internal-docs
Submodule
Submodule .internal-docs added at 01085a4ef5
15
README.md
15
README.md
@@ -188,6 +188,21 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
|
||||
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
|
||||
- [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.
|
||||
|
||||
## Citation
|
||||
|
||||
If you use BrowserOS in your research or project, please cite:
|
||||
|
||||
```bibtex
|
||||
@software{browseros2025,
|
||||
author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
|
||||
title = {BrowserOS: The open-source Agentic browser},
|
||||
url = {https://github.com/browseros-ai/BrowserOS},
|
||||
year = {2025},
|
||||
publisher = {GitHub},
|
||||
license = {AGPL-3.0},
|
||||
}
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
|
||||
|
||||
@@ -79,14 +79,15 @@ cp apps/server/.env.example apps/server/.env.development
|
||||
cp apps/agent/.env.example apps/agent/.env.development
|
||||
cp apps/server/.env.production.example apps/server/.env.production
|
||||
|
||||
# Install deps, generate agent code, and sync the VM cache
|
||||
# Install deps and generate agent code
|
||||
bun run dev:setup
|
||||
|
||||
# Start the full dev environment
|
||||
bun run dev:watch
|
||||
```
|
||||
|
||||
`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.
|
||||
`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
|
||||
the server startup path and pulls the configured GHCR image on demand.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
@@ -156,9 +157,14 @@ bun run build:server # Build production server resource artifacts and u
|
||||
bun run build:agent # Build agent extension
|
||||
|
||||
# Test
|
||||
bun run test # Run standard tests
|
||||
bun run test:cdp # Run CDP-based tests
|
||||
bun run test:integration # Run integration tests
|
||||
bun run test # Run all tests
|
||||
bun run test:all # Run all tests
|
||||
bun run test:main # Run key server tools and integration tests
|
||||
|
||||
# App-specific test groups (from packages/browseros-agent)
|
||||
cd apps/server && bun run test:tools
|
||||
cd apps/server && bun run test:cdp
|
||||
cd apps/server && bun run test:integration
|
||||
|
||||
# Quality
|
||||
bun run lint # Check with Biome
|
||||
|
||||
@@ -1,136 +0,0 @@
|
||||
import { Bot, Loader2, Wrench } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface AgentCardProps {
|
||||
agent: AgentCardData
|
||||
onClick: () => void
|
||||
active?: boolean
|
||||
}
|
||||
|
||||
function formatTimestamp(timestamp?: number): string {
|
||||
if (!timestamp) return 'No activity yet'
|
||||
const diff = Date.now() - timestamp
|
||||
const minutes = Math.floor(diff / 60000)
|
||||
if (minutes < 1) return 'just now'
|
||||
if (minutes < 60) return `${minutes}m ago`
|
||||
const hours = Math.floor(minutes / 60)
|
||||
if (hours < 24) return `${hours}h ago`
|
||||
return `${Math.floor(hours / 24)}d ago`
|
||||
}
|
||||
|
||||
function getStatusLabel(status: AgentCardData['status']): string {
|
||||
if (status === 'working') return 'Working'
|
||||
if (status === 'error') return 'Error'
|
||||
return 'Ready'
|
||||
}
|
||||
|
||||
function getStatusTone(status: AgentCardData['status']): string {
|
||||
if (status === 'working') return 'bg-amber-500'
|
||||
if (status === 'error') return 'bg-destructive'
|
||||
return 'bg-emerald-500'
|
||||
}
|
||||
|
||||
function formatCost(usd: number): string {
|
||||
if (usd < 0.005) return `$${usd.toFixed(4)}`
|
||||
return `$${usd.toFixed(2)}`
|
||||
}
|
||||
|
||||
export const AgentCardExpanded: FC<AgentCardProps> = ({
|
||||
agent,
|
||||
onClick,
|
||||
active,
|
||||
}) => (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
|
||||
active
|
||||
? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
|
||||
: 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start justify-between gap-3">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<div
|
||||
className={cn(
|
||||
'flex size-10 shrink-0 items-center justify-center rounded-xl',
|
||||
active
|
||||
? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
|
||||
: 'bg-muted text-muted-foreground',
|
||||
)}
|
||||
>
|
||||
<Bot className="size-5" />
|
||||
</div>
|
||||
<div className="min-w-0">
|
||||
<div className="truncate font-semibold text-sm">{agent.name}</div>
|
||||
<div className="truncate text-muted-foreground text-xs">
|
||||
{agent.model ?? 'OpenClaw agent'}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
|
||||
<span
|
||||
className={cn('size-2 rounded-full', getStatusTone(agent.status))}
|
||||
/>
|
||||
<span>{getStatusLabel(agent.status)}</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="mt-4 flex-1">
|
||||
<p className="line-clamp-2 text-foreground/90 text-sm">
|
||||
{agent.lastMessage ??
|
||||
'Start a conversation to see recent work and summaries.'}
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
|
||||
<div className="flex items-center justify-between gap-3">
|
||||
<span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
|
||||
{agent.costUsd ? (
|
||||
<span className="tabular-nums opacity-70">
|
||||
{formatCost(agent.costUsd)}
|
||||
</span>
|
||||
) : null}
|
||||
</div>
|
||||
{agent.status === 'working' && agent.currentTool ? (
|
||||
<div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
|
||||
<Loader2 className="size-3 shrink-0 animate-spin" />
|
||||
<span className="truncate">{agent.currentTool}</span>
|
||||
</div>
|
||||
) : agent.activitySummary ? (
|
||||
<div className="flex items-center gap-1.5 text-muted-foreground/60">
|
||||
<Wrench className="size-3 shrink-0" />
|
||||
<span className="truncate">{agent.activitySummary}</span>
|
||||
</div>
|
||||
) : null}
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
|
||||
export const AgentCardCompact: FC<AgentCardProps> = ({
|
||||
agent,
|
||||
onClick,
|
||||
active,
|
||||
}) => (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
|
||||
active
|
||||
? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
|
||||
: 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
|
||||
)}
|
||||
>
|
||||
<span
|
||||
className={cn(
|
||||
'size-2 rounded-full',
|
||||
active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
|
||||
)}
|
||||
/>
|
||||
<span className="truncate">{agent.name}</span>
|
||||
</button>
|
||||
)
|
||||
@@ -1,70 +1,71 @@
|
||||
import { Plus } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAdapterHealth,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { AgentCardCompact, AgentCardExpanded } from './AgentCard'
|
||||
import { HomeAgentCard } from './HomeAgentCard'
|
||||
|
||||
interface AgentCardDockProps {
|
||||
agents: AgentCardData[]
|
||||
agents: HarnessAgent[]
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
activeAgentId?: string
|
||||
onSelectAgent: (agentId: string) => void
|
||||
onCreateAgent?: () => void
|
||||
compact?: boolean
|
||||
}
|
||||
|
||||
function CreateAgentButton({
|
||||
compact,
|
||||
onCreateAgent,
|
||||
}: {
|
||||
compact?: boolean
|
||||
onCreateAgent: () => void
|
||||
}) {
|
||||
function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
|
||||
return (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onCreateAgent}
|
||||
className={cn(
|
||||
'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
|
||||
compact
|
||||
? 'rounded-full px-3 py-2 text-sm'
|
||||
: 'min-h-32 rounded-2xl px-5 py-4',
|
||||
'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
|
||||
'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
|
||||
)}
|
||||
>
|
||||
<Plus className={compact ? 'size-3.5' : 'size-5'} />
|
||||
<span>{compact ? 'New' : 'Create agent'}</span>
|
||||
<Plus className="size-5" />
|
||||
<span>Create agent</span>
|
||||
</button>
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* 3-column grid of HomeAgentCards plus a trailing "Create agent"
|
||||
* tile. The previous `compact` mode (rendered a horizontal pill rail)
|
||||
* had no callers and was dropped along with the legacy AgentCard.
|
||||
*/
|
||||
export const AgentCardDock: FC<AgentCardDockProps> = ({
|
||||
agents,
|
||||
adapters,
|
||||
activeAgentId,
|
||||
onSelectAgent,
|
||||
onCreateAgent,
|
||||
compact,
|
||||
}) => {
|
||||
if (agents.length === 0 && !onCreateAgent) return null
|
||||
|
||||
const Card = compact ? AgentCardCompact : AgentCardExpanded
|
||||
const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
|
||||
for (const descriptor of adapters) {
|
||||
if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
|
||||
}
|
||||
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
compact
|
||||
? 'flex items-center gap-2 overflow-x-auto pb-1'
|
||||
: 'grid gap-4 md:grid-cols-3',
|
||||
)}
|
||||
>
|
||||
<div className="grid gap-4 md:grid-cols-3">
|
||||
{agents.map((agent) => (
|
||||
<Card
|
||||
key={agent.agentId}
|
||||
<HomeAgentCard
|
||||
key={agent.id}
|
||||
agent={agent}
|
||||
active={agent.agentId === activeAgentId}
|
||||
onClick={() => onSelectAgent(agent.agentId)}
|
||||
adapter={agent.adapter}
|
||||
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
|
||||
active={agent.id === activeAgentId}
|
||||
onClick={() => onSelectAgent(agent.id)}
|
||||
/>
|
||||
))}
|
||||
{onCreateAgent ? (
|
||||
<CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
|
||||
<CreateAgentButton onCreateAgent={onCreateAgent} />
|
||||
) : null}
|
||||
</div>
|
||||
)
|
||||
|
||||
@@ -2,6 +2,12 @@ import { ArrowLeft, Bot, Home } from 'lucide-react'
|
||||
import { type FC, useEffect, useMemo, useRef } from 'react'
|
||||
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
cancelHarnessTurn,
|
||||
useEnqueueHarnessMessage,
|
||||
useHarnessAgents,
|
||||
useRemoveHarnessQueuedMessage,
|
||||
} from '@/entrypoints/app/agents/useAgents'
|
||||
import {
|
||||
type AgentEntry,
|
||||
getModelDisplayName,
|
||||
@@ -15,6 +21,7 @@ import {
|
||||
filterTurnsPersistedInHistory,
|
||||
flattenHistoryPages,
|
||||
} from './claw-chat-types'
|
||||
import { QueuePanel } from './QueuePanel'
|
||||
import { useAgentConversation } from './useAgentConversation'
|
||||
import { useHarnessChatHistory } from './useHarnessChatHistory'
|
||||
|
||||
@@ -212,15 +219,33 @@ function AgentConversationController({
|
||||
[historyMessages],
|
||||
)
|
||||
|
||||
// Listing query feeds queue + active-turn state for this agent. We
|
||||
// already poll it every 5s for the rail; reusing the same cache
|
||||
// keeps cross-tab queue state in sync without a second poll.
|
||||
const { harnessAgents } = useHarnessAgents()
|
||||
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
|
||||
const queue = harnessAgent?.queue ?? []
|
||||
const activeTurnId = harnessAgent?.activeTurnId ?? null
|
||||
|
||||
const { turns, streaming, send } = useAgentConversation(agentId, {
|
||||
runtime: 'agent-harness',
|
||||
sessionKey: null,
|
||||
history: chatHistory,
|
||||
activeTurnId,
|
||||
onComplete: () => {
|
||||
void harnessHistoryQuery.refetch()
|
||||
},
|
||||
onSessionKeyChange: () => {},
|
||||
})
|
||||
const enqueueMessage = useEnqueueHarnessMessage()
|
||||
const removeQueuedMessage = useRemoveHarnessQueuedMessage()
|
||||
|
||||
const handleStop = () => {
|
||||
void cancelHarnessTurn(agentId, {
|
||||
turnId: activeTurnId ?? undefined,
|
||||
reason: 'user pressed stop',
|
||||
})
|
||||
}
|
||||
const visibleTurns = useMemo(
|
||||
() => filterTurnsPersistedInHistory(turns, historyMessages),
|
||||
[historyMessages, turns],
|
||||
@@ -281,7 +306,15 @@ function AgentConversationController({
|
||||
/>
|
||||
|
||||
<div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
|
||||
<div className="mx-auto max-w-3xl">
|
||||
<div className="mx-auto max-w-3xl space-y-3">
|
||||
{queue.length > 0 ? (
|
||||
<QueuePanel
|
||||
queue={queue}
|
||||
onRemove={(messageId) =>
|
||||
removeQueuedMessage.mutate({ agentId, messageId })
|
||||
}
|
||||
/>
|
||||
) : null}
|
||||
<ConversationInput
|
||||
variant="conversation"
|
||||
agents={agents}
|
||||
@@ -296,14 +329,31 @@ function AgentConversationController({
|
||||
name: a.name,
|
||||
dataUrl: a.dataUrl,
|
||||
}))
|
||||
// When the agent already has an in-flight turn, route
|
||||
// the new message into the durable queue instead of
|
||||
// starting a parallel turn. Drains automatically as
|
||||
// soon as the active turn ends.
|
||||
if (streaming || activeTurnId) {
|
||||
enqueueMessage.mutate({
|
||||
agentId,
|
||||
message: input.text,
|
||||
attachments,
|
||||
})
|
||||
return
|
||||
}
|
||||
void send({ text: input.text, attachments, attachmentPreviews })
|
||||
}}
|
||||
onCreateAgent={() => navigate(createAgentPath)}
|
||||
onStop={handleStop}
|
||||
streaming={streaming}
|
||||
disabled={disabled}
|
||||
status="running"
|
||||
attachmentsEnabled={true}
|
||||
placeholder={`Message ${agentName}...`}
|
||||
placeholder={
|
||||
streaming
|
||||
? `Type to queue another message for ${agentName}...`
|
||||
: `Message ${agentName}...`
|
||||
}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -1,18 +1,25 @@
|
||||
import { Plus } from 'lucide-react'
|
||||
import { type FC, useEffect, useState } from 'react'
|
||||
import { type FC, useEffect, useMemo, useState } from 'react'
|
||||
import { useNavigate } from 'react-router'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import { Card, CardContent } from '@/components/ui/card'
|
||||
import { Separator } from '@/components/ui/separator'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAgent,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import {
|
||||
useAgentAdapters,
|
||||
useHarnessAgents,
|
||||
} from '@/entrypoints/app/agents/useAgents'
|
||||
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
|
||||
import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
|
||||
import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
|
||||
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import { AgentCardDock } from './AgentCardDock'
|
||||
import { useAgentCommandData } from './agent-command-layout'
|
||||
import { ConversationInput } from './ConversationInput'
|
||||
import { buildAgentCardData } from './useAgentCardData'
|
||||
import { orderHomeAgents } from './home-agent-card.helpers'
|
||||
|
||||
function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
|
||||
return (
|
||||
@@ -38,11 +45,13 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
|
||||
function RecentThreads({
|
||||
activeAgentId,
|
||||
agents,
|
||||
adapters,
|
||||
onOpenAgents,
|
||||
onSelectAgent,
|
||||
}: {
|
||||
activeAgentId?: string | null
|
||||
agents: AgentCardData[]
|
||||
agents: HarnessAgent[]
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
onOpenAgents: () => void
|
||||
onSelectAgent: (agentId: string) => void
|
||||
}) {
|
||||
@@ -68,6 +77,7 @@ function RecentThreads({
|
||||
</div>
|
||||
<AgentCardDock
|
||||
agents={agents}
|
||||
adapters={adapters}
|
||||
activeAgentId={activeAgentId ?? undefined}
|
||||
onSelectAgent={onSelectAgent}
|
||||
onCreateAgent={onOpenAgents}
|
||||
@@ -79,25 +89,32 @@ function RecentThreads({
|
||||
export const AgentCommandHome: FC = () => {
|
||||
const navigate = useNavigate()
|
||||
const activeHint = useActiveHint()
|
||||
const { agents, status } = useAgentCommandData()
|
||||
// The conversation input still consumes the merged AgentEntry list
|
||||
// from the layout context (handles legacy /claw/agents entries that
|
||||
// haven't yet been backfilled into the harness store). The Recent
|
||||
// Agents grid below reads the richer harness payload directly.
|
||||
const { agents: legacyAgents, status } = useAgentCommandData()
|
||||
const { harnessAgents } = useHarnessAgents()
|
||||
const { adapters } = useAgentAdapters()
|
||||
const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
|
||||
const cardData = buildAgentCardData(agents, status?.status, undefined)
|
||||
|
||||
const orderedAgents = useMemo(
|
||||
() => orderHomeAgents(harnessAgents),
|
||||
[harnessAgents],
|
||||
)
|
||||
|
||||
useEffect(() => {
|
||||
if (agents.length === 0) {
|
||||
if (selectedAgentId) {
|
||||
setSelectedAgentId(null)
|
||||
}
|
||||
if (legacyAgents.length === 0) {
|
||||
if (selectedAgentId) setSelectedAgentId(null)
|
||||
return
|
||||
}
|
||||
|
||||
if (
|
||||
!selectedAgentId ||
|
||||
!agents.some((agent) => agent.agentId === selectedAgentId)
|
||||
!legacyAgents.some((agent) => agent.agentId === selectedAgentId)
|
||||
) {
|
||||
setSelectedAgentId(agents[0].agentId)
|
||||
setSelectedAgentId(legacyAgents[0].agentId)
|
||||
}
|
||||
}, [agents, selectedAgentId])
|
||||
}, [legacyAgents, selectedAgentId])
|
||||
|
||||
const handleSend = (input: { text: string }) => {
|
||||
if (!selectedAgentId) return
|
||||
@@ -110,7 +127,7 @@ export const AgentCommandHome: FC = () => {
|
||||
setSelectedAgentId(agent.agentId)
|
||||
}
|
||||
|
||||
const selectedAgent = agents.find(
|
||||
const selectedAgent = legacyAgents.find(
|
||||
(agent) => agent.agentId === selectedAgentId,
|
||||
)
|
||||
const selectedAgentReady = selectedAgent
|
||||
@@ -118,13 +135,15 @@ export const AgentCommandHome: FC = () => {
|
||||
: false
|
||||
const selectedAgentStatus =
|
||||
selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
|
||||
const selectedCard =
|
||||
cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]
|
||||
const selectedAgentName =
|
||||
selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
|
||||
|
||||
const hasAgents = legacyAgents.length > 0
|
||||
|
||||
return (
|
||||
<div className="min-h-full px-4 py-6">
|
||||
<div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
|
||||
{cardData.length > 0 ? (
|
||||
{hasAgents ? (
|
||||
<>
|
||||
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
|
||||
<div className="space-y-3">
|
||||
@@ -140,7 +159,7 @@ export const AgentCommandHome: FC = () => {
|
||||
<div className="w-full max-w-3xl">
|
||||
<ConversationInput
|
||||
variant="home"
|
||||
agents={agents}
|
||||
agents={legacyAgents}
|
||||
selectedAgentId={selectedAgentId}
|
||||
onSelectAgent={handleSelectAgent}
|
||||
onSend={handleSend}
|
||||
@@ -151,7 +170,7 @@ export const AgentCommandHome: FC = () => {
|
||||
attachmentsEnabled={false}
|
||||
placeholder={
|
||||
selectedAgentReady
|
||||
? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
|
||||
? `Ask ${selectedAgentName} to handle a task...`
|
||||
: 'Agent runtime is not running...'
|
||||
}
|
||||
/>
|
||||
@@ -162,7 +181,8 @@ export const AgentCommandHome: FC = () => {
|
||||
|
||||
<RecentThreads
|
||||
activeAgentId={selectedAgentId}
|
||||
agents={cardData}
|
||||
agents={orderedAgents}
|
||||
adapters={adapters}
|
||||
onOpenAgents={() => navigate('/agents')}
|
||||
onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
|
||||
/>
|
||||
|
||||
@@ -54,25 +54,40 @@ interface ConversationInputProps {
|
||||
placeholder?: string
|
||||
attachmentsEnabled?: boolean
|
||||
variant?: 'home' | 'conversation'
|
||||
/**
|
||||
* When set, a Stop button surfaces to the left of the voice mic
|
||||
* while `streaming === true`. Click cancels the active turn
|
||||
* server-side via the chat-cancel endpoint. Absent → no Stop
|
||||
* button (legacy behaviour for the home composer).
|
||||
*/
|
||||
onStop?: () => void
|
||||
}
|
||||
|
||||
function InputActionButton({
|
||||
disabled,
|
||||
onClick,
|
||||
streaming,
|
||||
hasContent,
|
||||
}: {
|
||||
disabled: boolean
|
||||
onClick: () => void
|
||||
streaming: boolean
|
||||
hasContent: boolean
|
||||
}) {
|
||||
// Show the spinner while streaming only when there's nothing to
|
||||
// send — once the user types something, the icon flips back to the
|
||||
// paper-plane so it reads as "queue this message" instead of
|
||||
// "still working".
|
||||
const showSpinner = streaming && !hasContent
|
||||
return (
|
||||
<Button
|
||||
onClick={onClick}
|
||||
size="icon"
|
||||
disabled={disabled}
|
||||
title={streaming && hasContent ? 'Queue message' : undefined}
|
||||
className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
|
||||
>
|
||||
{streaming ? (
|
||||
{showSpinner ? (
|
||||
<Loader2 className="h-5 w-5 animate-spin" />
|
||||
) : (
|
||||
<ArrowRight className="h-5 w-5" />
|
||||
@@ -81,6 +96,22 @@ function InputActionButton({
|
||||
)
|
||||
}
|
||||
|
||||
function StopButton({ onStop }: { onStop: () => void }) {
|
||||
return (
|
||||
<Button
|
||||
type="button"
|
||||
size="icon"
|
||||
variant="ghost"
|
||||
onClick={onStop}
|
||||
title="Stop current turn — queued messages will start next."
|
||||
aria-label="Stop current turn"
|
||||
className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
|
||||
>
|
||||
<Square className="h-3.5 w-3.5 fill-current" />
|
||||
</Button>
|
||||
)
|
||||
}
|
||||
|
||||
function VoiceButton({
|
||||
isRecording,
|
||||
isTranscribing,
|
||||
@@ -299,6 +330,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
placeholder,
|
||||
attachmentsEnabled = true,
|
||||
variant = 'conversation',
|
||||
onStop,
|
||||
}) => {
|
||||
const [input, setInput] = useState('')
|
||||
const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
|
||||
@@ -379,10 +411,17 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
}
|
||||
|
||||
const hasContent = input.trim().length > 0 || attachments.length > 0
|
||||
// Queue-aware composers (the conversation panel passes `onStop`)
|
||||
// accept input while streaming — the parent decides whether the
|
||||
// submission opens a new turn or enqueues onto the active one.
|
||||
// Surfaces without a Stop hook (home) keep the legacy behaviour
|
||||
// and block input until the current turn finishes.
|
||||
const queueAware = Boolean(onStop)
|
||||
|
||||
const handleSend = () => {
|
||||
const text = input.trim()
|
||||
if (disabled || isStaging || streaming) return
|
||||
if (disabled || isStaging) return
|
||||
if (streaming && !queueAware) return
|
||||
if (!text && attachments.length === 0) return
|
||||
onSend({ text, attachments })
|
||||
setInput('')
|
||||
@@ -512,6 +551,7 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
)}
|
||||
/>
|
||||
</div>
|
||||
{streaming && onStop ? <StopButton onStop={onStop} /> : null}
|
||||
<VoiceButton
|
||||
isRecording={voice.isRecording}
|
||||
isTranscribing={voice.isTranscribing}
|
||||
@@ -529,12 +569,13 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
!!disabled ||
|
||||
voice.isRecording ||
|
||||
voice.isTranscribing ||
|
||||
streaming
|
||||
(streaming && !queueAware)
|
||||
}
|
||||
onClick={handleSend}
|
||||
// Spinner stays the user-facing "agent is busy" hint; with the
|
||||
// queue active we still spin while a turn is in flight.
|
||||
streaming={streaming}
|
||||
hasContent={hasContent}
|
||||
/>
|
||||
</div>
|
||||
{voice.error ? (
|
||||
|
||||
@@ -0,0 +1,243 @@
|
||||
import { Quote, TriangleAlert } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
|
||||
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
|
||||
import type {
|
||||
HarnessAdapterHealth,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
|
||||
import {
|
||||
firstNonBlankLine,
|
||||
truncate,
|
||||
} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
|
||||
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface HomeAgentCardProps {
|
||||
agent: HarnessAgent
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
/** Per-adapter health snapshot, shared across cards rendering the
|
||||
* same adapter. `null` when the /adapters response hasn't surfaced
|
||||
* health yet (we treat that as healthy until proven otherwise). */
|
||||
adapterHealth: HarnessAdapterHealth | null
|
||||
/** Highlights the card with an accent ring; tells the user which
|
||||
* agent the conversation input is bound to. */
|
||||
active?: boolean
|
||||
onClick: () => void
|
||||
}
|
||||
|
||||
const PREVIEW_CHARS = 100
|
||||
|
||||
/**
|
||||
* Grid-shaped card for the /home Recent agents section. Composition
|
||||
* mirrors the rail's `AgentRowCard` but the layout is a vertical
|
||||
* column sized for a 1/3-width tile rather than a full-width row.
|
||||
*
|
||||
* Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
|
||||
* `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
|
||||
* inline `Unavailable` chip pattern so the visual language is
|
||||
* continuous between rail and grid.
|
||||
*/
|
||||
export const HomeAgentCard: FC<HomeAgentCardProps> = ({
|
||||
agent,
|
||||
adapter,
|
||||
adapterHealth,
|
||||
active,
|
||||
onClick,
|
||||
}) => {
|
||||
const status = agent.status ?? 'unknown'
|
||||
const lastUsedAt = agent.lastUsedAt ?? null
|
||||
const isWorking = status === 'working'
|
||||
const isAsleep = status === 'asleep'
|
||||
const isError = status === 'error'
|
||||
const hasActiveTurn = Boolean(agent.activeTurnId)
|
||||
|
||||
return (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
|
||||
active && 'ring-1 ring-[var(--accent-orange)]/30',
|
||||
isWorking
|
||||
? 'border-[var(--accent-orange)]/40'
|
||||
: isError
|
||||
? 'border-destructive/30'
|
||||
: 'border-border/60 hover:border-[var(--accent-orange)]/30',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start gap-3">
|
||||
<AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="flex items-center gap-1.5">
|
||||
<span className="truncate font-semibold text-sm">
|
||||
{displayName(agent)}
|
||||
</span>
|
||||
{isWorking && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
</div>
|
||||
<SummaryLine
|
||||
adapter={adapter}
|
||||
modelId={agent.modelId ?? null}
|
||||
reasoningEffort={agent.reasoningEffort ?? null}
|
||||
adapterHealth={adapterHealth}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<LastMessage message={agent.lastUserMessage ?? null} />
|
||||
|
||||
<div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
|
||||
<span>{statusFootnote(status, lastUsedAt)}</span>
|
||||
{hasActiveTurn ? (
|
||||
<ResumeChip />
|
||||
) : isAsleep ? (
|
||||
<Badge variant="outline" className="text-muted-foreground">
|
||||
Asleep
|
||||
</Badge>
|
||||
) : isError ? (
|
||||
<ErrorChip lastError={agent.lastError ?? null} />
|
||||
) : null}
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
}
|
||||
|
||||
const SummaryLine: FC<{
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
modelId: string | null
|
||||
reasoningEffort: string | null
|
||||
adapterHealth: HarnessAdapterHealth | null
|
||||
}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
|
||||
const parts = [adapterLabel(adapter)]
|
||||
if (modelId) parts.push(modelId)
|
||||
if (reasoningEffort) parts.push(reasoningEffort)
|
||||
const unhealthy = adapterHealth?.healthy === false
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
|
||||
unhealthy && 'text-muted-foreground/70',
|
||||
)}
|
||||
>
|
||||
<span className="truncate">{parts.join(' · ')}</span>
|
||||
{unhealthy && (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<Badge
|
||||
variant="outline"
|
||||
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
<TriangleAlert className="size-2.5" />
|
||||
<span className="font-normal">Unavailable</span>
|
||||
</Badge>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent side="right" className="w-72 text-sm">
|
||||
<div className="font-medium">
|
||||
{adapterLabel(adapter)} CLI not available
|
||||
</div>
|
||||
<div className="mt-1 text-muted-foreground text-xs">
|
||||
{adapterHealth?.reason ??
|
||||
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
|
||||
</div>
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
const LastMessage: FC<{ message: string | null }> = ({ message }) => {
|
||||
if (!message) {
|
||||
return (
|
||||
<p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
|
||||
No messages yet — start a chat
|
||||
</p>
|
||||
)
|
||||
}
|
||||
return (
|
||||
<p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
|
||||
<Quote
|
||||
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
|
||||
aria-hidden
|
||||
/>
|
||||
<span className="line-clamp-2">
|
||||
{truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
|
||||
</span>
|
||||
</p>
|
||||
)
|
||||
}
|
||||
|
||||
const ResumeChip: FC = () => (
|
||||
<span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
|
||||
<span className="relative flex size-1.5">
|
||||
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
|
||||
<span className="relative inline-flex size-1.5 rounded-full bg-white" />
|
||||
</span>
|
||||
Resume
|
||||
</span>
|
||||
)
|
||||
|
||||
const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
|
||||
if (!lastError) {
|
||||
return <Badge variant="destructive">Attention</Badge>
|
||||
}
|
||||
return (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<Badge variant="destructive" className="cursor-default">
|
||||
Attention
|
||||
</Badge>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent
|
||||
side="left"
|
||||
className="max-w-xs whitespace-pre-wrap font-mono text-xs"
|
||||
>
|
||||
{lastError}
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Footer left side: relative time on every state EXCEPT working,
|
||||
* which shows `now` (the dot is already pulsing — restating it as
|
||||
* "Working" would duplicate the pill in the title row).
|
||||
*/
|
||||
function statusFootnote(
|
||||
status: AgentLiveness,
|
||||
lastUsedAt: number | null,
|
||||
): string {
|
||||
if (status === 'working') return 'now'
|
||||
return formatRelativeTime(lastUsedAt)
|
||||
}
|
||||
|
||||
const UUID_PATTERN =
|
||||
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
|
||||
const OC_UUID_PATTERN =
|
||||
/^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
|
||||
|
||||
function displayName(agent: HarnessAgent): string {
|
||||
const name = agent.name?.trim()
|
||||
const id = agent.id
|
||||
if (!name || name === id) {
|
||||
if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
|
||||
if (UUID_PATTERN.test(id)) return id.slice(0, 8)
|
||||
return id
|
||||
}
|
||||
return name
|
||||
}
|
||||
@@ -0,0 +1,94 @@
|
||||
import { ListPlus, X } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import {
|
||||
Queue,
|
||||
QueueItem,
|
||||
QueueItemAction,
|
||||
QueueItemActions,
|
||||
QueueItemAttachment,
|
||||
QueueItemContent,
|
||||
QueueItemFile,
|
||||
QueueItemImage,
|
||||
QueueList,
|
||||
QueueSection,
|
||||
QueueSectionContent,
|
||||
QueueSectionLabel,
|
||||
QueueSectionTrigger,
|
||||
} from '@/components/ai-elements/queue'
|
||||
import type {
|
||||
HarnessQueuedMessage,
|
||||
HarnessQueuedMessageAttachment,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
|
||||
|
||||
interface QueuePanelProps {
|
||||
queue: HarnessQueuedMessage[]
|
||||
onRemove: (messageId: string) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Renders the agent's pending message queue using the shared AI
|
||||
* Elements `Queue` primitives. Caller is expected to gate render on
|
||||
* `queue.length > 0` — when empty, this returns null so the panel
|
||||
* disappears cleanly between turns.
|
||||
*/
|
||||
export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
|
||||
if (queue.length === 0) return null
|
||||
return (
|
||||
<Queue>
|
||||
<QueueSection>
|
||||
<QueueSectionTrigger>
|
||||
<QueueSectionLabel
|
||||
count={queue.length}
|
||||
label={queue.length === 1 ? 'queued message' : 'queued messages'}
|
||||
icon={<ListPlus className="size-3.5" />}
|
||||
/>
|
||||
</QueueSectionTrigger>
|
||||
<QueueSectionContent>
|
||||
<QueueList>
|
||||
{queue.map((entry) => (
|
||||
<QueueItem key={entry.id}>
|
||||
<div className="flex items-center gap-2">
|
||||
<QueueItemContent>
|
||||
{firstNonBlankLine(entry.message)}
|
||||
</QueueItemContent>
|
||||
<QueueItemActions>
|
||||
<QueueItemAction
|
||||
aria-label="Remove from queue"
|
||||
onClick={() => onRemove(entry.id)}
|
||||
>
|
||||
<X className="size-3" />
|
||||
</QueueItemAction>
|
||||
</QueueItemActions>
|
||||
</div>
|
||||
{entry.attachments && entry.attachments.length > 0 ? (
|
||||
<QueueItemAttachment>
|
||||
{entry.attachments.map((attachment, idx) =>
|
||||
renderAttachment(entry.id, attachment, idx),
|
||||
)}
|
||||
</QueueItemAttachment>
|
||||
) : null}
|
||||
</QueueItem>
|
||||
))}
|
||||
</QueueList>
|
||||
</QueueSectionContent>
|
||||
</QueueSection>
|
||||
</Queue>
|
||||
)
|
||||
}
|
||||
|
||||
function renderAttachment(
|
||||
messageId: string,
|
||||
attachment: HarnessQueuedMessageAttachment,
|
||||
idx: number,
|
||||
) {
|
||||
if (attachment.mediaType.startsWith('image/')) {
|
||||
const src = `data:${attachment.mediaType};base64,${attachment.data}`
|
||||
return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
|
||||
}
|
||||
return (
|
||||
<QueueItemFile key={`${messageId}-${idx}`}>
|
||||
{attachment.mediaType}
|
||||
</QueueItemFile>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,69 @@
|
||||
import { describe, expect, it } from 'bun:test'
|
||||
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { orderHomeAgents } from './home-agent-card.helpers'
|
||||
|
||||
function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
|
||||
return {
|
||||
id: overrides.id ?? 'agent-x',
|
||||
name: overrides.name ?? overrides.id ?? 'agent-x',
|
||||
adapter: overrides.adapter ?? 'codex',
|
||||
permissionMode: 'approve-all',
|
||||
sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
|
||||
createdAt: 1000,
|
||||
updatedAt: 1000,
|
||||
...overrides,
|
||||
}
|
||||
}
|
||||
|
||||
describe('orderHomeAgents', () => {
|
||||
it('places active-turn agents before everyone else', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'a', lastUsedAt: 5000 }),
|
||||
agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
|
||||
agent({ id: 'c', lastUsedAt: 7000 }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
|
||||
})
|
||||
|
||||
it('orders non-active agents by lastUsedAt desc', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'old', lastUsedAt: 1000 }),
|
||||
agent({ id: 'new', lastUsedAt: 9000 }),
|
||||
agent({ id: 'mid', lastUsedAt: 5000 }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
|
||||
})
|
||||
|
||||
it('puts the gateway `main` seed agent above other never-used agents', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
|
||||
agent({ id: 'main', lastUsedAt: null }),
|
||||
agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
|
||||
})
|
||||
|
||||
it('sends never-used agents to the bottom even when `main` is among them', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'main', lastUsedAt: null }),
|
||||
agent({ id: 'used', lastUsedAt: 5000 }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
|
||||
})
|
||||
|
||||
it('does NOT sort by pinned — pinned agents are treated like any other', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
|
||||
agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
|
||||
})
|
||||
|
||||
it('falls back to id-stable ordering when lastUsedAt ties', () => {
|
||||
const sorted = orderHomeAgents([
|
||||
agent({ id: 'b', lastUsedAt: 5000 }),
|
||||
agent({ id: 'a', lastUsedAt: 5000 }),
|
||||
])
|
||||
expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
|
||||
})
|
||||
})
|
||||
@@ -0,0 +1,42 @@
|
||||
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
|
||||
|
||||
/**
|
||||
* Order for the /home Recent agents grid.
|
||||
*
|
||||
* 1. Active turn first — agents mid-turn float to the top so the
|
||||
* Resume affordance is the first thing the user sees on /home.
|
||||
* 2. The protected gateway-side `main` agent stays pinned-to-top in
|
||||
* the never-used group on a fresh install (mirrors the rail).
|
||||
* 3. Recency (`lastUsedAt` desc).
|
||||
* 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
|
||||
* every 5-second poll.
|
||||
*
|
||||
* Pin is NOT a sort key. The home grid is action-oriented and trusts
|
||||
* recency + active-turn to surface the right agent; pinning is an
|
||||
* organisation tool that lives on the rail at /agents.
|
||||
*/
|
||||
export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
|
||||
return [...agents].sort((a, b) => {
|
||||
const aActive = a.activeTurnId != null
|
||||
const bActive = b.activeTurnId != null
|
||||
if (aActive !== bActive) return aActive ? -1 : 1
|
||||
|
||||
// Recency wins outright. Never-used agents (`lastUsedAt == null`)
|
||||
// both fall to the same `-Infinity` bucket and the seed/id rules
|
||||
// below decide their order — but a used agent always beats any
|
||||
// never-used agent regardless of id.
|
||||
const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
|
||||
const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
|
||||
if (aValue !== bValue) return bValue - aValue
|
||||
|
||||
// Inside the never-used (or exact-tie) group: pin the gateway
|
||||
// `main` seed to the top of the group on a fresh install, then
|
||||
// fall back to id-stable order so the grid doesn't reshuffle on
|
||||
// every poll.
|
||||
const aSeed = a.id === 'main' && a.lastUsedAt == null
|
||||
const bSeed = b.id === 'main' && b.lastUsedAt == null
|
||||
if (aSeed !== bSeed) return aSeed ? -1 : 1
|
||||
|
||||
return a.id.localeCompare(b.id)
|
||||
})
|
||||
}
|
||||
@@ -1,53 +0,0 @@
|
||||
import {
|
||||
type AgentEntry,
|
||||
getModelDisplayName,
|
||||
type OpenClawStatus,
|
||||
} from '@/entrypoints/app/agents/useOpenClaw'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import type { AgentOverview } from './useAgentDashboard'
|
||||
|
||||
function resolveAgentStatus(
|
||||
gatewayStatus: OpenClawStatus['status'] | undefined,
|
||||
liveStatus: AgentOverview['status'] | undefined,
|
||||
): AgentCardData['status'] {
|
||||
// Gateway-level errors take precedence
|
||||
if (gatewayStatus === 'error') return 'error'
|
||||
if (gatewayStatus === 'starting') return 'working'
|
||||
|
||||
// Per-agent live status from the WS observer
|
||||
if (liveStatus === 'working') return 'working'
|
||||
if (liveStatus === 'error') return 'error'
|
||||
|
||||
return 'idle'
|
||||
}
|
||||
|
||||
/**
|
||||
* Build agent card display data by merging the raw agent entries from
|
||||
* the gateway with enriched overview data from the dashboard API.
|
||||
*
|
||||
* Pure function — no hooks, no IndexedDB, no async.
|
||||
*/
|
||||
export function buildAgentCardData(
|
||||
agents: AgentEntry[],
|
||||
status: OpenClawStatus['status'] | undefined,
|
||||
dashboard: AgentOverview[] | undefined,
|
||||
): AgentCardData[] {
|
||||
return agents.map((agent) => {
|
||||
const overview = dashboard?.find((d) => d.agentId === agent.agentId)
|
||||
|
||||
return {
|
||||
agentId: agent.agentId,
|
||||
name: agent.name,
|
||||
model: getModelDisplayName(agent.model),
|
||||
status:
|
||||
agent.source === 'agent-harness'
|
||||
? 'idle'
|
||||
: resolveAgentStatus(status, overview?.status),
|
||||
lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
|
||||
lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
|
||||
activitySummary: overview?.activitySummary ?? undefined,
|
||||
currentTool: overview?.currentTool ?? undefined,
|
||||
costUsd: overview?.totalCostUsd ?? undefined,
|
||||
}
|
||||
})
|
||||
}
|
||||
@@ -36,6 +36,15 @@ interface UseAgentConversationOptions {
|
||||
history?: OpenClawChatHistoryMessage[]
|
||||
onComplete?: () => void
|
||||
onSessionKeyChange?: (sessionKey: string) => void
|
||||
/**
|
||||
* Server-side active turn id, surfaced via the listing query. When
|
||||
* this changes from null/<id> to a different non-null id while we
|
||||
* aren't already streaming (e.g. the server just popped a queued
|
||||
* message and started a new turn), the hook reattaches via
|
||||
* /chat/active so the chat panel picks up the live stream without
|
||||
* waiting for a remount.
|
||||
*/
|
||||
activeTurnId?: string | null
|
||||
}
|
||||
|
||||
export function useAgentConversation(
|
||||
@@ -211,31 +220,46 @@ export function useAgentConversation(
|
||||
}
|
||||
processEventRef.current = processAgentHarnessStreamEvent
|
||||
|
||||
// On mount (and whenever the agent changes), check whether the
|
||||
// server has an in-flight turn for this agent and reattach to it.
|
||||
// This is what makes the chat resilient across tab close/reopen,
|
||||
// refresh, and navigation: the runtime call kept running on the
|
||||
// server while we were away. Effect only depends on `agentId` —
|
||||
// the event handler is read off a ref so this doesn't re-subscribe
|
||||
// every render.
|
||||
const activeTurnIdDep = options.activeTurnId ?? null
|
||||
|
||||
// On mount, on agent change, and whenever the listing reports a
|
||||
// *new* active turn id, check whether the server has an in-flight
|
||||
// turn for this agent and reattach to it. This catches three
|
||||
// cases at once: the chat resilience flow (tab close/reopen),
|
||||
// navigation between agents, AND queue drain (the server starts a
|
||||
// new turn from a queued message → activeTurnId flips → attach).
|
||||
useEffect(() => {
|
||||
let cancelled = false
|
||||
const abortController = new AbortController()
|
||||
// Reference the dep inside the body so biome's exhaustive-deps
|
||||
// rule sees it consumed; the value is just an "any non-null
|
||||
// active turn id" trigger — the actual id we attach to comes
|
||||
// from the fresh fetchActiveHarnessTurn call below.
|
||||
void activeTurnIdDep
|
||||
|
||||
const attemptResume = async () => {
|
||||
// Track whether *we* started a stream in this run. When the
|
||||
// early-return paths fire (no active turn, or a `send()` /
|
||||
// earlier resume already owns `streamAbortRef`), the finally
|
||||
// block must NOT touch streaming/turnIdRef/lastSeqRef —
|
||||
// otherwise we clobber the in-flight stream's state and the
|
||||
// Stop button drops out mid-turn while events keep arriving.
|
||||
let weStartedStream = false
|
||||
try {
|
||||
const active = await fetchActiveHarnessTurn(agentId)
|
||||
if (cancelled || !active || active.status !== 'running') return
|
||||
if (streamAbortRef.current) return // a fresh send already in flight
|
||||
if (streamAbortRef.current) return // someone else already owns the stream
|
||||
|
||||
// Stage a placeholder turn so the streamed events have a row
|
||||
// to render into. We don't have the user message text on
|
||||
// resume; the assistant turn is what we're catching up on.
|
||||
// to render into. The server now persists the kicking-off
|
||||
// prompt on the active turn, so we render it as the user
|
||||
// bubble immediately — no empty-bubble flicker when a queued
|
||||
// message starts running.
|
||||
setTurns((prev) => [
|
||||
...prev,
|
||||
{
|
||||
id: crypto.randomUUID(),
|
||||
userText: '',
|
||||
userText: active.prompt ?? '',
|
||||
parts: [],
|
||||
done: false,
|
||||
timestamp: active.startedAt,
|
||||
@@ -247,6 +271,7 @@ export function useAgentConversation(
|
||||
lastSeqRef.current = null
|
||||
streamAbortRef.current = abortController
|
||||
setStreaming(true)
|
||||
weStartedStream = true
|
||||
|
||||
const response = await attachToHarnessTurn(agentId, {
|
||||
turnId: active.turnId,
|
||||
@@ -265,10 +290,20 @@ export function useAgentConversation(
|
||||
// Resume is best-effort; transient errors fall back to the
|
||||
// user starting a new turn manually.
|
||||
} finally {
|
||||
if (!cancelled) {
|
||||
if (streamAbortRef.current === abortController) {
|
||||
streamAbortRef.current = null
|
||||
}
|
||||
// Always release `streamAbortRef` if we owned it — even when
|
||||
// the effect was cancelled mid-stream (a listing poll
|
||||
// captured the next queue-drain turn id, for example). If we
|
||||
// don't, the next effect run hits `if (streamAbortRef.current)
|
||||
// return` against our now-aborted controller and never
|
||||
// reattaches, leaving `streaming === true` with no live stream.
|
||||
if (weStartedStream && streamAbortRef.current === abortController) {
|
||||
streamAbortRef.current = null
|
||||
}
|
||||
// The other state (streaming flag, turn id, lastSeq) is the
|
||||
// *current run's* lifecycle: only reset it on a clean exit.
|
||||
// When `cancelled` is true the next run will set these
|
||||
// itself, so resetting here would only cause a brief flicker.
|
||||
if (!cancelled && weStartedStream) {
|
||||
turnIdRef.current = null
|
||||
lastSeqRef.current = null
|
||||
setStreaming(false)
|
||||
@@ -281,7 +316,7 @@ export function useAgentConversation(
|
||||
cancelled = true
|
||||
abortController.abort()
|
||||
}
|
||||
}, [agentId])
|
||||
}, [agentId, activeTurnIdDep])
|
||||
|
||||
const send = async (input: string | SendInput) => {
|
||||
const normalized: SendInput =
|
||||
|
||||
@@ -1,95 +0,0 @@
|
||||
import { useQuery, useQueryClient } from '@tanstack/react-query'
|
||||
import { useEffect } from 'react'
|
||||
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'
|
||||
|
||||
export interface AgentOverview {
|
||||
agentId: string
|
||||
status: 'working' | 'idle' | 'error' | 'unknown'
|
||||
latestMessage: string | null
|
||||
latestMessageAt: number | null
|
||||
activitySummary: string | null
|
||||
currentTool: string | null
|
||||
totalCostUsd: number
|
||||
sessionCount: number
|
||||
}
|
||||
|
||||
export interface DashboardResponse {
|
||||
agents: AgentOverview[]
|
||||
summary: {
|
||||
totalAgents: number
|
||||
totalCostUsd: number
|
||||
}
|
||||
}
|
||||
|
||||
interface StatusEvent {
|
||||
agentId: string
|
||||
status: AgentOverview['status']
|
||||
currentTool: string | null
|
||||
error: string | null
|
||||
timestamp: number
|
||||
}
|
||||
|
||||
const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']
|
||||
|
||||
export function useAgentDashboard(enabled: boolean) {
|
||||
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
const ready = enabled && Boolean(baseUrl) && !urlLoading
|
||||
|
||||
// Initial data load + periodic refresh as fallback
|
||||
const query = useQuery<DashboardResponse>({
|
||||
queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
|
||||
queryFn: async () => {
|
||||
const url = new URL('/claw/dashboard', baseUrl as string)
|
||||
const response = await fetch(url.toString())
|
||||
if (!response.ok) throw new Error('Failed to fetch dashboard')
|
||||
return response.json()
|
||||
},
|
||||
enabled: ready,
|
||||
})
|
||||
|
||||
// SSE subscription for real-time status patches
|
||||
useEffect(() => {
|
||||
if (!ready || !baseUrl) return
|
||||
|
||||
const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
|
||||
const eventSource = new EventSource(streamUrl.toString())
|
||||
|
||||
eventSource.addEventListener('snapshot', (event) => {
|
||||
try {
|
||||
const dashboard = JSON.parse(event.data) as DashboardResponse
|
||||
queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
|
||||
} catch {}
|
||||
})
|
||||
|
||||
eventSource.addEventListener('status', (event) => {
|
||||
try {
|
||||
const status = JSON.parse(event.data) as StatusEvent
|
||||
queryClient.setQueryData<DashboardResponse>(
|
||||
[...DASHBOARD_QUERY_KEY, baseUrl],
|
||||
(prev) => {
|
||||
if (!prev) return prev
|
||||
return {
|
||||
...prev,
|
||||
agents: prev.agents.map((agent) =>
|
||||
agent.agentId === status.agentId
|
||||
? {
|
||||
...agent,
|
||||
status: status.status,
|
||||
currentTool: status.currentTool,
|
||||
}
|
||||
: agent,
|
||||
),
|
||||
}
|
||||
},
|
||||
)
|
||||
} catch {}
|
||||
})
|
||||
|
||||
return () => {
|
||||
eventSource.close()
|
||||
}
|
||||
}, [ready, baseUrl, queryClient])
|
||||
|
||||
return query
|
||||
}
|
||||
@@ -2,67 +2,87 @@ import { Loader2 } from 'lucide-react'
|
||||
import { type FC, useMemo } from 'react'
|
||||
import { AgentRowCard } from './AgentRowCard'
|
||||
import { AgentsEmptyState } from './AgentsEmptyState'
|
||||
import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from './agent-harness-types'
|
||||
import type {
|
||||
AgentAdapterHealth,
|
||||
AgentRowData,
|
||||
} from './agent-row/agent-row.types'
|
||||
import type { AgentListItem } from './agents-page-types'
|
||||
import type { AgentLiveness } from './LivenessDot'
|
||||
|
||||
interface AgentListProps {
|
||||
agents: AgentListItem[]
|
||||
/**
|
||||
* Optional per-agent activity metadata. Keyed by `agentId`. Missing
|
||||
* entries fall back to status='unknown' / lastUsedAt=null and the
|
||||
* row renders an "unknown" dot. The server will populate this once
|
||||
* the activity tracker ships; the page works without it.
|
||||
*/
|
||||
/** Optional per-agent activity metadata, keyed by `agentId`. */
|
||||
activity?: Record<
|
||||
string,
|
||||
{ status: AgentLiveness; lastUsedAt: number | null }
|
||||
>
|
||||
/**
|
||||
* Lookup table from harness agent id → adapter + reasoning effort,
|
||||
* sourced from `useHarnessAgents`. Lets the row card render the
|
||||
* correct adapter icon and chips for harness agents (legacy
|
||||
* /claw/agents entries fall back to inferring from `runtimeLabel`).
|
||||
*/
|
||||
/** Lookup table from harness id → enriched agent record. */
|
||||
harnessAgentLookup?: Map<string, HarnessAgent>
|
||||
/** Adapter catalog (carries per-adapter health). */
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
loading: boolean
|
||||
deletingAgentKey: string | null
|
||||
onCreateAgent: () => void
|
||||
onDeleteAgent: (agent: AgentListItem) => void
|
||||
onPinToggle: (agent: AgentListItem, next: boolean) => void
|
||||
}
|
||||
|
||||
export const AgentList: FC<AgentListProps> = ({
|
||||
agents,
|
||||
activity,
|
||||
harnessAgentLookup,
|
||||
adapters,
|
||||
loading,
|
||||
deletingAgentKey,
|
||||
onCreateAgent,
|
||||
onDeleteAgent,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
// Sort by recency: most recently used first; never-used agents drop
|
||||
// to the bottom in id-stable order so the list doesn't reshuffle on
|
||||
// every refresh. The pinned exception is the gateway's `main` agent
|
||||
// when it's never been touched — keep it at the top so a fresh
|
||||
// install has an obvious starting point.
|
||||
const adapterHealth = useMemo(() => {
|
||||
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
|
||||
for (const adapter of adapters) {
|
||||
if (adapter.health) {
|
||||
map.set(adapter.id, {
|
||||
healthy: adapter.health.healthy,
|
||||
reason: adapter.health.reason,
|
||||
})
|
||||
}
|
||||
}
|
||||
return map
|
||||
}, [adapters])
|
||||
|
||||
// Sort: pinned rows first, then most recently used, then never-used
|
||||
// agents in id-stable order. The gateway's `main` agent stays
|
||||
// pinned-to-top when never touched so a fresh install has an
|
||||
// obvious starting point.
|
||||
const ordered = useMemo(() => {
|
||||
const withScore = agents.map((agent) => {
|
||||
const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
|
||||
return { agent, lastUsedAt }
|
||||
const withMeta = agents.map((agent) => {
|
||||
const harness = harnessAgentLookup?.get(agent.agentId)
|
||||
return {
|
||||
agent,
|
||||
pinned: harness?.pinned ?? false,
|
||||
lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
|
||||
}
|
||||
})
|
||||
return withScore
|
||||
return withMeta
|
||||
.sort((a, b) => {
|
||||
const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
|
||||
const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
|
||||
if (aPinned && !bPinned) return -1
|
||||
if (!aPinned && bPinned) return 1
|
||||
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1
|
||||
const aSeed = a.agent.agentId === 'main' && a.lastUsedAt === null
|
||||
const bSeed = b.agent.agentId === 'main' && b.lastUsedAt === null
|
||||
if (aSeed && !bSeed) return -1
|
||||
if (!aSeed && bSeed) return 1
|
||||
const aValue = a.lastUsedAt ?? -Infinity
|
||||
const bValue = b.lastUsedAt ?? -Infinity
|
||||
if (aValue !== bValue) return bValue - aValue
|
||||
return a.agent.agentId.localeCompare(b.agent.agentId)
|
||||
})
|
||||
.map((entry) => entry.agent)
|
||||
}, [activity, agents])
|
||||
}, [activity, agents, harnessAgentLookup])
|
||||
|
||||
if (loading && agents.length === 0) {
|
||||
return (
|
||||
@@ -80,18 +100,23 @@ export const AgentList: FC<AgentListProps> = ({
|
||||
<div className="grid gap-3">
|
||||
{ordered.map((agent) => {
|
||||
const harness = harnessAgentLookup?.get(agent.agentId)
|
||||
const adapter: HarnessAgentAdapter | undefined =
|
||||
const adapter: HarnessAgentAdapter | 'unknown' =
|
||||
harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
|
||||
const data = buildRowData({
|
||||
agent,
|
||||
adapter,
|
||||
harness,
|
||||
activity: activity?.[agent.agentId],
|
||||
adapterHealth:
|
||||
adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
|
||||
})
|
||||
return (
|
||||
<AgentRowCard
|
||||
key={agent.key}
|
||||
agent={agent}
|
||||
status={activity?.[agent.agentId]?.status}
|
||||
lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
|
||||
adapter={adapter}
|
||||
reasoningEffort={harness?.reasoningEffort ?? null}
|
||||
onDelete={onDeleteAgent}
|
||||
data={data}
|
||||
deleting={deletingAgentKey === agent.key}
|
||||
onDelete={onDeleteAgent}
|
||||
onPinToggle={onPinToggle}
|
||||
/>
|
||||
)
|
||||
})}
|
||||
@@ -99,10 +124,53 @@ export const AgentList: FC<AgentListProps> = ({
|
||||
)
|
||||
}
|
||||
|
||||
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
|
||||
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
|
||||
const lower = label?.toLowerCase()
|
||||
if (lower === 'claude code') return 'claude'
|
||||
if (lower === 'codex') return 'codex'
|
||||
if (lower === 'openclaw') return 'openclaw'
|
||||
return undefined
|
||||
return 'unknown'
|
||||
}
|
||||
|
||||
const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
|
||||
|
||||
function buildRowData(input: {
|
||||
agent: AgentListItem
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
harness: HarnessAgent | undefined
|
||||
activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
}): AgentRowData {
|
||||
const { agent, adapter, harness, activity, adapterHealth } = input
|
||||
return {
|
||||
agent,
|
||||
adapter,
|
||||
modelLabel: deriveModelLabel(agent, harness),
|
||||
reasoningEffort: harness?.reasoningEffort ?? null,
|
||||
status: activity?.status ?? 'unknown',
|
||||
lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
|
||||
pinned: harness?.pinned ?? false,
|
||||
cwd: harness?.cwd ?? null,
|
||||
lastUserMessage: harness?.lastUserMessage ?? null,
|
||||
tokens: harness?.tokens ?? null,
|
||||
turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
|
||||
failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
|
||||
lastError: harness?.lastError ?? null,
|
||||
lastErrorAt: harness?.lastErrorAt ?? null,
|
||||
activeTurnId: harness?.activeTurnId ?? null,
|
||||
adapterHealth,
|
||||
}
|
||||
}
|
||||
|
||||
function deriveModelLabel(
|
||||
agent: AgentListItem,
|
||||
harness: HarnessAgent | undefined,
|
||||
): string | null {
|
||||
// Prefer the agent rail's modelLabel when meaningful; harness's
|
||||
// modelId is a stable identifier but the rail's `modelLabel`
|
||||
// already maps to a friendly display string.
|
||||
if (agent.modelLabel && agent.modelLabel !== 'default') {
|
||||
return agent.modelLabel
|
||||
}
|
||||
return harness?.modelId ?? null
|
||||
}
|
||||
|
||||
@@ -1,270 +1,99 @@
|
||||
import {
|
||||
Copy,
|
||||
Loader2,
|
||||
MessageSquare,
|
||||
MoreHorizontal,
|
||||
Pencil,
|
||||
RotateCcw,
|
||||
Trash2,
|
||||
} from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { useNavigate } from 'react-router'
|
||||
import { toast } from 'sonner'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
DropdownMenu,
|
||||
DropdownMenuContent,
|
||||
DropdownMenuItem,
|
||||
DropdownMenuSeparator,
|
||||
DropdownMenuTrigger,
|
||||
} from '@/components/ui/dropdown-menu'
|
||||
import {
|
||||
Tooltip,
|
||||
TooltipContent,
|
||||
TooltipProvider,
|
||||
TooltipTrigger,
|
||||
} from '@/components/ui/tooltip'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { AdapterIcon, adapterLabel } from './AdapterIcon'
|
||||
import {
|
||||
canDelete as canDeleteAgent,
|
||||
canRename as canRenameAgent,
|
||||
displayName,
|
||||
formatRelativeTime,
|
||||
workspaceLabel,
|
||||
} from './agent-display.helpers'
|
||||
import type { HarnessAgentAdapter } from './agent-harness-types'
|
||||
import type { AgentListItem } from './agents-page-types'
|
||||
import { type AgentLiveness, LivenessDot } from './LivenessDot'
|
||||
import { AgentActions } from './agent-row/AgentActions'
|
||||
import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
|
||||
import { AgentLastMessage } from './agent-row/AgentLastMessage'
|
||||
import { AgentMetaRow } from './agent-row/AgentMetaRow'
|
||||
import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
|
||||
import { AgentTile } from './agent-row/AgentTile'
|
||||
import { AgentTitleRow } from './agent-row/AgentTitleRow'
|
||||
import type {
|
||||
AgentRowCallbacks,
|
||||
AgentRowData,
|
||||
} from './agent-row/agent-row.types'
|
||||
|
||||
interface AgentRowCardProps {
|
||||
agent: AgentListItem
|
||||
/**
|
||||
* Per-agent extras the listing surface provides on top of the
|
||||
* minimal `AgentListItem` shape. `lastUsedAt` survives server
|
||||
* restart (sourced from acpx session record); `status` is in-memory
|
||||
* server-side.
|
||||
*/
|
||||
status?: AgentLiveness
|
||||
lastUsedAt?: number | null
|
||||
/** Adapter the agent belongs to. Drives icon + label. */
|
||||
adapter?: HarnessAgentAdapter
|
||||
/** Reasoning effort chip (claude/codex/openclaw catalog). */
|
||||
reasoningEffort?: string | null
|
||||
/** Modeled directly off the inbound delete handler so the parent owns the dialog. */
|
||||
onDelete: (agent: AgentListItem) => void
|
||||
/** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
|
||||
interface AgentRowCardProps extends AgentRowCallbacks {
|
||||
data: AgentRowData
|
||||
/** Whether THIS agent is mid-delete; renders a spinner in the menu. */
|
||||
deleting?: boolean
|
||||
}
|
||||
|
||||
/**
|
||||
* Composition shell for the agent rail. Owns no state; sub-components
|
||||
* each handle their own micro-state (error-panel collapse, etc.) and
|
||||
* emit callbacks (delete, pin/unpin) for the page to act on.
|
||||
*
|
||||
* The whole card carries state — not just the tile — so the row's
|
||||
* border subtly tells the user what's going on at a glance:
|
||||
* working → accent-orange border with a soft glow
|
||||
* error → destructive border
|
||||
* idle → muted border, lifts on hover
|
||||
*/
|
||||
export const AgentRowCard: FC<AgentRowCardProps> = ({
|
||||
agent,
|
||||
status = 'unknown',
|
||||
lastUsedAt,
|
||||
adapter,
|
||||
reasoningEffort,
|
||||
onDelete,
|
||||
data,
|
||||
deleting,
|
||||
onDelete,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
const navigate = useNavigate()
|
||||
const adapterId = adapter ?? inferAdapterFromListItem(agent)
|
||||
const workspace = workspaceLabel(agent)
|
||||
const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
|
||||
const allowDelete = canDeleteAgent(agent)
|
||||
const allowRename = canRenameAgent(agent)
|
||||
|
||||
const handleChat = () => navigate(`/agents/${agent.agentId}`)
|
||||
const handleCopyId = async () => {
|
||||
try {
|
||||
await navigator.clipboard.writeText(agent.agentId)
|
||||
toast.success('Agent id copied')
|
||||
} catch {
|
||||
toast.error('Could not copy agent id')
|
||||
}
|
||||
}
|
||||
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
|
||||
'hover:border-[var(--accent-orange)]/50 hover:shadow-sm',
|
||||
// Layout-stable hover. No translate, no shadow change — both
|
||||
// visibly perturb neighbouring rows. Only the border tint
|
||||
// shifts on hover, and the rail's vertical rhythm stays
|
||||
// exactly the same in every state.
|
||||
'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
|
||||
data.status === 'working'
|
||||
? 'border-[var(--accent-orange)]/40'
|
||||
: data.status === 'error'
|
||||
? 'border-destructive/40'
|
||||
: 'border-border hover:border-[var(--accent-orange)]/30',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start gap-4">
|
||||
{/* Adapter tile + liveness dot in the corner. */}
|
||||
<div className="relative shrink-0">
|
||||
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
|
||||
<AdapterIcon adapter={adapterId} className="h-6 w-6" />
|
||||
</div>
|
||||
<LivenessDot
|
||||
status={status}
|
||||
detail={livenessDetail(status, lastUsedAt)}
|
||||
className="absolute -right-0.5 -bottom-0.5"
|
||||
/>
|
||||
</div>
|
||||
<AgentTile
|
||||
adapter={data.adapter}
|
||||
status={data.status}
|
||||
lastUsedAt={data.lastUsedAt}
|
||||
/>
|
||||
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="mb-1 flex items-center gap-2">
|
||||
<span className="truncate font-semibold">{displayName(agent)}</span>
|
||||
{status === 'working' && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'asleep' && (
|
||||
<Badge variant="outline" className="text-muted-foreground">
|
||||
Asleep
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'error' && (
|
||||
<Badge variant="destructive">Attention</Badge>
|
||||
)}
|
||||
</div>
|
||||
<AgentTitleRow
|
||||
agent={data.agent}
|
||||
status={data.status}
|
||||
pinned={data.pinned}
|
||||
turnsByDay={data.turnsByDay}
|
||||
failedByDay={data.failedByDay}
|
||||
onPinToggle={(next) => onPinToggle(data.agent, next)}
|
||||
/>
|
||||
|
||||
<div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
|
||||
<Badge variant="secondary" className="font-normal">
|
||||
{adapterLabel(adapterId)}
|
||||
</Badge>
|
||||
{agent.modelLabel && agent.modelLabel !== 'default' && (
|
||||
<Badge variant="outline" className="font-normal">
|
||||
{agent.modelLabel}
|
||||
</Badge>
|
||||
)}
|
||||
{reasoningEffort && reasoningEffort !== 'medium' && (
|
||||
<Badge variant="outline" className="font-normal">
|
||||
{reasoningEffort}
|
||||
</Badge>
|
||||
)}
|
||||
</div>
|
||||
<AgentSummaryChips
|
||||
adapter={data.adapter}
|
||||
modelLabel={data.modelLabel}
|
||||
reasoningEffort={data.reasoningEffort}
|
||||
adapterHealth={data.adapterHealth}
|
||||
/>
|
||||
|
||||
<div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
|
||||
<span>Last used {lastUsedLabel}</span>
|
||||
{workspace && (
|
||||
<>
|
||||
<span aria-hidden>•</span>
|
||||
<span className="truncate font-mono" title={workspace}>
|
||||
{workspace}
|
||||
</span>
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
<AgentLastMessage message={data.lastUserMessage} />
|
||||
|
||||
<AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
|
||||
|
||||
{data.status === 'error' && data.lastError && (
|
||||
<AgentErrorPanel
|
||||
agentId={data.agent.agentId}
|
||||
message={data.lastError}
|
||||
errorAt={data.lastErrorAt}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div className="flex shrink-0 items-center gap-2">
|
||||
<Button variant="outline" size="sm" onClick={handleChat}>
|
||||
<MessageSquare className="mr-1.5 h-3 w-3" />
|
||||
Chat
|
||||
</Button>
|
||||
<DropdownMenu>
|
||||
<DropdownMenuTrigger asChild>
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
aria-label={`More actions for ${displayName(agent)}`}
|
||||
className="h-8 w-8"
|
||||
>
|
||||
<MoreHorizontal className="h-4 w-4" />
|
||||
</Button>
|
||||
</DropdownMenuTrigger>
|
||||
<DropdownMenuContent align="end" className="w-44">
|
||||
<DropdownMenuItem onSelect={() => void handleCopyId()}>
|
||||
<Copy className="mr-2 h-3.5 w-3.5" />
|
||||
Copy id
|
||||
</DropdownMenuItem>
|
||||
<RenameMenuItem disabled={!allowRename} />
|
||||
<ResetHistoryMenuItem />
|
||||
<DropdownMenuSeparator />
|
||||
<DropdownMenuItem
|
||||
onSelect={() => onDelete(agent)}
|
||||
disabled={!allowDelete || deleting}
|
||||
className="text-destructive focus:text-destructive"
|
||||
>
|
||||
{deleting ? (
|
||||
<Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
|
||||
) : (
|
||||
<Trash2 className="mr-2 h-3.5 w-3.5" />
|
||||
)}
|
||||
Delete
|
||||
</DropdownMenuItem>
|
||||
</DropdownMenuContent>
|
||||
</DropdownMenu>
|
||||
</div>
|
||||
<AgentActions
|
||||
agent={data.agent}
|
||||
activeTurnId={data.activeTurnId}
|
||||
deleting={deleting}
|
||||
onDelete={onDelete}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
|
||||
const item = (
|
||||
<DropdownMenuItem disabled className="text-muted-foreground">
|
||||
<Pencil className="mr-2 h-3.5 w-3.5" />
|
||||
Rename
|
||||
</DropdownMenuItem>
|
||||
)
|
||||
if (!disabled) return item
|
||||
// Disabled but with a hint so users know it's coming, not broken.
|
||||
return (
|
||||
<TooltipProvider delayDuration={300}>
|
||||
<Tooltip>
|
||||
<TooltipTrigger asChild>
|
||||
<span className="block w-full">{item}</span>
|
||||
</TooltipTrigger>
|
||||
<TooltipContent side="left" className="text-xs">
|
||||
Rename coming soon
|
||||
</TooltipContent>
|
||||
</Tooltip>
|
||||
</TooltipProvider>
|
||||
)
|
||||
}
|
||||
|
||||
const ResetHistoryMenuItem: FC = () => {
|
||||
const item = (
|
||||
<DropdownMenuItem disabled className="text-muted-foreground">
|
||||
<RotateCcw className="mr-2 h-3.5 w-3.5" />
|
||||
Reset history
|
||||
</DropdownMenuItem>
|
||||
)
|
||||
return (
|
||||
<TooltipProvider delayDuration={300}>
|
||||
<Tooltip>
|
||||
<TooltipTrigger asChild>
|
||||
<span className="block w-full">{item}</span>
|
||||
</TooltipTrigger>
|
||||
<TooltipContent side="left" className="text-xs">
|
||||
Reset history coming soon
|
||||
</TooltipContent>
|
||||
</Tooltip>
|
||||
</TooltipProvider>
|
||||
)
|
||||
}
|
||||
|
||||
function inferAdapterFromListItem(
|
||||
agent: AgentListItem,
|
||||
): HarnessAgentAdapter | 'unknown' {
|
||||
const label = agent.runtimeLabel?.toLowerCase()
|
||||
if (label?.includes('claude')) return 'claude'
|
||||
if (label?.includes('codex')) return 'codex'
|
||||
if (label?.includes('openclaw')) return 'openclaw'
|
||||
return 'unknown'
|
||||
}
|
||||
|
||||
function livenessDetail(
|
||||
status: AgentLiveness,
|
||||
lastUsedAt: number | null | undefined,
|
||||
): string | undefined {
|
||||
if (lastUsedAt == null) return undefined
|
||||
const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
|
||||
if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
|
||||
if (status === 'asleep') {
|
||||
if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
|
||||
const hr = Math.floor(diffMin / 60)
|
||||
return `Asleep — quiet for ${hr} hr`
|
||||
}
|
||||
if (status === 'working') return 'Working on a turn'
|
||||
if (status === 'error') return 'Attention — last turn failed'
|
||||
return undefined
|
||||
}
|
||||
|
||||
@@ -44,6 +44,7 @@ import {
|
||||
useCreateHarnessAgent,
|
||||
useDeleteHarnessAgent,
|
||||
useHarnessAgents,
|
||||
useUpdateHarnessAgent,
|
||||
} from './useAgents'
|
||||
import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'
|
||||
|
||||
@@ -76,6 +77,7 @@ export const AgentsPage: FC = () => {
|
||||
} = useOpenClawAgents(openClawAgentsEnabled)
|
||||
const createHarnessAgent = useCreateHarnessAgent()
|
||||
const deleteHarnessAgent = useDeleteHarnessAgent()
|
||||
const updateHarnessAgent = useUpdateHarnessAgent()
|
||||
const {
|
||||
setupOpenClaw,
|
||||
createAgent: createOpenClawAgent,
|
||||
@@ -342,12 +344,24 @@ export const AgentsPage: FC = () => {
|
||||
agents={agentListItems}
|
||||
activity={agentActivity}
|
||||
harnessAgentLookup={harnessAgentLookup}
|
||||
adapters={adapters}
|
||||
loading={agentsLoading}
|
||||
deletingAgentKey={deletingAgent ? deletingAgentKey : null}
|
||||
onCreateAgent={() => setCreateOpen(true)}
|
||||
onDeleteAgent={(agent) => {
|
||||
void handleDelete(agent)
|
||||
}}
|
||||
onPinToggle={(agent, next) => {
|
||||
// Optimistic mutation; harness-only — gateway-original
|
||||
// OpenClaw entries are gated server-side via the harness
|
||||
// backfill, so we only fire when the row maps to a
|
||||
// harness agent record.
|
||||
if (!harnessAgentLookup.has(agent.agentId)) return
|
||||
updateHarnessAgent.mutate({
|
||||
agentId: agent.agentId,
|
||||
patch: { pinned: next },
|
||||
})
|
||||
}}
|
||||
/>
|
||||
|
||||
<SetupOpenClawDialog
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
import type { AgentListItem } from './agents-page-types'
|
||||
import type { AgentLiveness } from './LivenessDot'
|
||||
|
||||
/**
|
||||
* Display rules for the redesigned agent rows. Pure helpers — no React,
|
||||
@@ -82,3 +83,25 @@ export function formatRelativeTime(epochMs: number | null): string {
|
||||
const d = Math.floor(diff / ONE_DAY)
|
||||
return d === 1 ? '1 day ago' : `${d} days ago`
|
||||
}
|
||||
|
||||
/**
|
||||
* Tooltip-friendly description of a row's current liveness state.
|
||||
* Returns `undefined` when the state has nothing extra to add (e.g.
|
||||
* `unknown` with no timestamp).
|
||||
*/
|
||||
export function livenessDetail(
|
||||
status: AgentLiveness,
|
||||
lastUsedAt: number | null | undefined,
|
||||
): string | undefined {
|
||||
if (lastUsedAt == null) return undefined
|
||||
const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
|
||||
if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
|
||||
if (status === 'asleep') {
|
||||
if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
|
||||
const hr = Math.floor(diffMin / 60)
|
||||
return `Asleep — quiet for ${hr} hr`
|
||||
}
|
||||
if (status === 'working') return 'Working on a turn'
|
||||
if (status === 'error') return 'Attention — last turn failed'
|
||||
return undefined
|
||||
}
|
||||
|
||||
@@ -56,6 +56,43 @@ export interface HarnessAgent {
|
||||
* agents. Drives the recency sort and the "Last used X min ago" copy.
|
||||
*/
|
||||
lastUsedAt?: number | null
|
||||
/** Pinned agents float to the top of the list. Defaults to `false`. */
|
||||
pinned?: boolean
|
||||
/** First non-blank line of the most recent user message; null if none. */
|
||||
lastUserMessage?: string | null
|
||||
/** Working directory the agent runs in; null when no session record yet. */
|
||||
cwd?: string | null
|
||||
/** Cumulative + 7-day rolling token usage; null when no record. */
|
||||
tokens?: {
|
||||
last7d: { input: number; output: number; requestCount: number }
|
||||
cumulative: { input: number; output: number }
|
||||
} | null
|
||||
turnsByDay?: number[]
|
||||
failedByDay?: number[]
|
||||
lastError?: string | null
|
||||
lastErrorAt?: number | null
|
||||
/** When non-null, an in-flight turn this row can be resumed from. */
|
||||
activeTurnId?: string | null
|
||||
/** Persistent FIFO queue of messages waiting for this agent. */
|
||||
queue?: HarnessQueuedMessage[]
|
||||
}
|
||||
|
||||
export interface HarnessQueuedMessageAttachment {
|
||||
mediaType: string
|
||||
data: string
|
||||
}
|
||||
|
||||
export interface HarnessQueuedMessage {
|
||||
id: string
|
||||
createdAt: number
|
||||
message: string
|
||||
attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
|
||||
}
|
||||
|
||||
export interface HarnessAdapterHealth {
|
||||
healthy: boolean
|
||||
reason?: string
|
||||
checkedAt: number
|
||||
}
|
||||
|
||||
export interface HarnessAdapterDescriptor {
|
||||
@@ -66,6 +103,7 @@ export interface HarnessAdapterDescriptor {
|
||||
modelControl: 'runtime-supported' | 'best-effort'
|
||||
models: Array<{ id: string; label: string; recommended?: boolean }>
|
||||
reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
|
||||
health?: HarnessAdapterHealth
|
||||
}
|
||||
|
||||
export interface CreateHarnessAgentInput {
|
||||
|
||||
@@ -0,0 +1,160 @@
|
||||
import {
|
||||
Copy,
|
||||
Loader2,
|
||||
MessageSquare,
|
||||
MoreHorizontal,
|
||||
Pencil,
|
||||
RotateCcw,
|
||||
Trash2,
|
||||
} from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { useNavigate } from 'react-router'
|
||||
import { toast } from 'sonner'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
DropdownMenu,
|
||||
DropdownMenuContent,
|
||||
DropdownMenuItem,
|
||||
DropdownMenuSeparator,
|
||||
DropdownMenuTrigger,
|
||||
} from '@/components/ui/dropdown-menu'
|
||||
import {
|
||||
Tooltip,
|
||||
TooltipContent,
|
||||
TooltipProvider,
|
||||
TooltipTrigger,
|
||||
} from '@/components/ui/tooltip'
|
||||
import {
|
||||
canDelete as canDeleteAgent,
|
||||
canRename as canRenameAgent,
|
||||
displayName,
|
||||
} from '../agent-display.helpers'
|
||||
import type { AgentListItem } from '../agents-page-types'
|
||||
|
||||
interface AgentActionsProps {
|
||||
agent: AgentListItem
|
||||
activeTurnId: string | null
|
||||
deleting?: boolean
|
||||
onDelete: (agent: AgentListItem) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Single primary CTA per row: `Resume` (filled, accent-orange, with a
|
||||
* pulsing dot) when an active turn exists; otherwise `Chat` (outline).
|
||||
* Both navigate to the same place — the chat hook auto-attaches via
|
||||
* `/chat/active` when there's a live turn — but the row signals which
|
||||
* action the user is actually taking.
|
||||
*/
|
||||
export const AgentActions: FC<AgentActionsProps> = ({
|
||||
agent,
|
||||
activeTurnId,
|
||||
deleting,
|
||||
onDelete,
|
||||
}) => {
|
||||
const navigate = useNavigate()
|
||||
const allowDelete = canDeleteAgent(agent)
|
||||
const allowRename = canRenameAgent(agent)
|
||||
|
||||
const handleChat = () => navigate(`/agents/${agent.agentId}`)
|
||||
const handleCopyId = async () => {
|
||||
try {
|
||||
await navigator.clipboard.writeText(agent.agentId)
|
||||
toast.success('Agent id copied')
|
||||
} catch {
|
||||
toast.error('Could not copy agent id')
|
||||
}
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="flex shrink-0 items-center gap-1.5">
|
||||
{activeTurnId ? (
|
||||
<Button
|
||||
variant="default"
|
||||
size="sm"
|
||||
onClick={handleChat}
|
||||
className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
|
||||
>
|
||||
<span className="relative flex size-2">
|
||||
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
|
||||
<span className="relative inline-flex size-2 rounded-full bg-white" />
|
||||
</span>
|
||||
Resume
|
||||
</Button>
|
||||
) : (
|
||||
<Button variant="outline" size="sm" onClick={handleChat}>
|
||||
<MessageSquare className="mr-1.5 size-3" />
|
||||
Chat
|
||||
</Button>
|
||||
)}
|
||||
<DropdownMenu>
|
||||
<DropdownMenuTrigger asChild>
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
aria-label={`More actions for ${displayName(agent)}`}
|
||||
className="size-8 text-muted-foreground hover:text-foreground"
|
||||
>
|
||||
<MoreHorizontal className="size-4" />
|
||||
</Button>
|
||||
</DropdownMenuTrigger>
|
||||
<DropdownMenuContent align="end" className="w-44">
|
||||
<DropdownMenuItem onSelect={() => void handleCopyId()}>
|
||||
<Copy className="mr-2 size-3.5" />
|
||||
Copy id
|
||||
</DropdownMenuItem>
|
||||
<ComingSoonItem
|
||||
icon={Pencil}
|
||||
label="Rename"
|
||||
disabled={!allowRename}
|
||||
/>
|
||||
<ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
|
||||
<DropdownMenuSeparator />
|
||||
<DropdownMenuItem
|
||||
onSelect={() => onDelete(agent)}
|
||||
disabled={!allowDelete || deleting}
|
||||
className="text-destructive focus:text-destructive"
|
||||
>
|
||||
{deleting ? (
|
||||
<Loader2 className="mr-2 size-3.5 animate-spin" />
|
||||
) : (
|
||||
<Trash2 className="mr-2 size-3.5" />
|
||||
)}
|
||||
Delete
|
||||
</DropdownMenuItem>
|
||||
</DropdownMenuContent>
|
||||
</DropdownMenu>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
interface ComingSoonItemProps {
|
||||
icon: typeof Pencil
|
||||
label: string
|
||||
disabled: boolean
|
||||
}
|
||||
|
||||
const ComingSoonItem: FC<ComingSoonItemProps> = ({
|
||||
icon: Icon,
|
||||
label,
|
||||
disabled,
|
||||
}) => {
|
||||
const item = (
|
||||
<DropdownMenuItem disabled className="text-muted-foreground">
|
||||
<Icon className="mr-2 size-3.5" />
|
||||
{label}
|
||||
</DropdownMenuItem>
|
||||
)
|
||||
if (!disabled) return item
|
||||
return (
|
||||
<TooltipProvider delayDuration={300}>
|
||||
<Tooltip>
|
||||
<TooltipTrigger asChild>
|
||||
<span className="block w-full">{item}</span>
|
||||
</TooltipTrigger>
|
||||
<TooltipContent side="left" className="text-xs">
|
||||
{label} coming soon
|
||||
</TooltipContent>
|
||||
</Tooltip>
|
||||
</TooltipProvider>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,96 @@
|
||||
import { AlertTriangle, ChevronDown } from 'lucide-react'
|
||||
import { type FC, useEffect, useState } from 'react'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
Collapsible,
|
||||
CollapsibleContent,
|
||||
CollapsibleTrigger,
|
||||
} from '@/components/ui/collapsible'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { truncate } from './agent-row.helpers'
|
||||
|
||||
interface AgentErrorPanelProps {
|
||||
agentId: string
|
||||
message: string
|
||||
errorAt: number | null
|
||||
}
|
||||
|
||||
const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
|
||||
const PREVIEW_CHARS = 200
|
||||
|
||||
export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
|
||||
agentId,
|
||||
message,
|
||||
errorAt,
|
||||
}) => {
|
||||
const storageKey = `${STORAGE_PREFIX}${agentId}`
|
||||
// Open if we've never seen this `errorAt` for this agent. Once the
|
||||
// user collapses the panel (or refreshes after seeing it), we mark
|
||||
// it seen so it doesn't re-pop on every poll.
|
||||
const [open, setOpen] = useState<boolean>(() => {
|
||||
if (typeof window === 'undefined' || !errorAt) return true
|
||||
const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
|
||||
return !Number.isFinite(seen) || errorAt > seen
|
||||
})
|
||||
|
||||
useEffect(() => {
|
||||
if (!open && errorAt && typeof window !== 'undefined') {
|
||||
window.localStorage.setItem(storageKey, String(errorAt))
|
||||
}
|
||||
}, [open, errorAt, storageKey])
|
||||
|
||||
const preview = truncate(message, PREVIEW_CHARS)
|
||||
const truncated = preview.length < message.length
|
||||
|
||||
return (
|
||||
<Collapsible open={open} onOpenChange={setOpen} className="mt-3">
|
||||
<div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
|
||||
<div className="flex items-center gap-2 font-medium text-destructive text-xs">
|
||||
<AlertTriangle className="size-3.5" />
|
||||
Last error
|
||||
</div>
|
||||
<CollapsibleTrigger asChild>
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="sm"
|
||||
className="h-6 px-2 text-muted-foreground"
|
||||
>
|
||||
<span className="text-xs">{open ? 'hide' : 'show'}</span>
|
||||
<ChevronDown
|
||||
className={cn(
|
||||
'ml-1 size-3 transition-transform',
|
||||
open && 'rotate-180',
|
||||
)}
|
||||
/>
|
||||
</Button>
|
||||
</CollapsibleTrigger>
|
||||
</div>
|
||||
<CollapsibleContent>
|
||||
<div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
|
||||
{truncated ? (
|
||||
<HoverCard openDelay={300}>
|
||||
<HoverCardTrigger asChild>
|
||||
<span className="cursor-default font-mono text-foreground/80">
|
||||
{preview}…
|
||||
</span>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent
|
||||
side="bottom"
|
||||
className="max-w-md whitespace-pre-wrap font-mono text-xs"
|
||||
>
|
||||
{message}
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
) : (
|
||||
<span className="font-mono text-foreground/80">{message}</span>
|
||||
)}
|
||||
</div>
|
||||
</CollapsibleContent>
|
||||
</Collapsible>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,35 @@
|
||||
import { Quote } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { firstNonBlankLine, truncate } from './agent-row.helpers'
|
||||
|
||||
interface AgentLastMessageProps {
|
||||
message: string | null
|
||||
}
|
||||
|
||||
const PREVIEW_CHARS = 110
|
||||
|
||||
/**
|
||||
* Inline preview of the most recent user message. Renders as a quoted,
|
||||
* italic line so the row reads like a conversation snippet rather than
|
||||
* a label-and-value pair. No hover-card — opening the agent's chat is
|
||||
* the canonical way to read the full message.
|
||||
*/
|
||||
export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
|
||||
if (!message) {
|
||||
return (
|
||||
<p className="mt-1 text-muted-foreground/70 text-xs italic">
|
||||
No messages yet — start a chat
|
||||
</p>
|
||||
)
|
||||
}
|
||||
const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
|
||||
return (
|
||||
<p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
|
||||
<Quote
|
||||
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
|
||||
aria-hidden
|
||||
/>
|
||||
<span className="truncate">{preview}</span>
|
||||
</p>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,37 @@
|
||||
import type { FC } from 'react'
|
||||
import { formatRelativeTime } from '../agent-display.helpers'
|
||||
import { AgentTokenSummary } from './AgentTokenSummary'
|
||||
import type { AgentTokenUsage } from './agent-row.types'
|
||||
|
||||
interface AgentMetaRowProps {
|
||||
lastUsedAt: number | null
|
||||
tokens: AgentTokenUsage | null
|
||||
}
|
||||
|
||||
/**
|
||||
* Bottom-of-row meta line. Intentionally sparse — last activity time
|
||||
* and lifetime tokens. CWD is no longer surfaced here because the path
|
||||
* the server happens to be running from isn't actionable; if a future
|
||||
* surface needs the cwd (chat panel, debug view) it reads from the
|
||||
* listing payload directly.
|
||||
*/
|
||||
export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
|
||||
const lastUsedLabel = formatRelativeTime(lastUsedAt)
|
||||
const tokensTotal =
|
||||
(tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
|
||||
const showTokens = tokensTotal > 0
|
||||
|
||||
return (
|
||||
<div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
|
||||
<span>{lastUsedLabel}</span>
|
||||
{showTokens && (
|
||||
<>
|
||||
<span aria-hidden className="text-muted-foreground/50">
|
||||
·
|
||||
</span>
|
||||
<AgentTokenSummary tokens={tokens} />
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,92 @@
|
||||
import type { FC } from 'react'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'
|
||||
|
||||
interface AgentSparklineProps {
|
||||
/** 14 entries, oldest → newest. Today's bucket is the last index. */
|
||||
turnsByDay: number[]
|
||||
/** Same length, same order. Failed turns counted separately. */
|
||||
failedByDay: number[]
|
||||
className?: string
|
||||
}
|
||||
|
||||
const MIN_BAR_HEIGHT_PX = 2
|
||||
const MAX_BAR_HEIGHT_PX = 18
|
||||
|
||||
export const AgentSparkline: FC<AgentSparklineProps> = ({
|
||||
turnsByDay,
|
||||
failedByDay,
|
||||
className,
|
||||
}) => {
|
||||
if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
|
||||
const max = Math.max(1, ...turnsByDay)
|
||||
|
||||
return (
|
||||
<HoverCard openDelay={250}>
|
||||
<HoverCardTrigger asChild>
|
||||
<div
|
||||
role="img"
|
||||
aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
|
||||
className={cn('flex h-5 items-end gap-px', className)}
|
||||
>
|
||||
{turnsByDay.map((count, idx) => {
|
||||
const ratio = count / max
|
||||
const height = Math.max(
|
||||
MIN_BAR_HEIGHT_PX,
|
||||
Math.round(ratio * MAX_BAR_HEIGHT_PX),
|
||||
)
|
||||
const isToday = idx === ROW_BAR_COUNT - 1
|
||||
const failed = failedByDay[idx] ?? 0
|
||||
return (
|
||||
<div
|
||||
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
|
||||
key={`bar-${idx}`}
|
||||
className={cn(
|
||||
'w-1.5 rounded-sm',
|
||||
count === 0
|
||||
? 'bg-muted-foreground/15'
|
||||
: failed > 0
|
||||
? 'bg-destructive/50'
|
||||
: 'bg-[var(--accent-orange)]/50',
|
||||
isToday && 'ring-1 ring-foreground/30',
|
||||
)}
|
||||
style={{ height }}
|
||||
/>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent side="left" className="w-56 text-xs">
|
||||
<div className="mb-2 font-medium text-sm">Last 14 days</div>
|
||||
<ul className="space-y-0.5">
|
||||
{turnsByDay.map((count, idx) => {
|
||||
const failed = failedByDay[idx] ?? 0
|
||||
const dayLabel = formatLocalDate(idx)
|
||||
return (
|
||||
<li
|
||||
// biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
|
||||
key={`day-${idx}`}
|
||||
className="flex items-center justify-between text-muted-foreground"
|
||||
>
|
||||
<span>{dayLabel}</span>
|
||||
<span>
|
||||
{count}
|
||||
{failed > 0 && (
|
||||
<span className="ml-1 text-destructive">
|
||||
({failed} failed)
|
||||
</span>
|
||||
)}
|
||||
</span>
|
||||
</li>
|
||||
)
|
||||
})}
|
||||
</ul>
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,71 @@
|
||||
import { TriangleAlert } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { adapterLabel } from '../AdapterIcon'
|
||||
import type { HarnessAgentAdapter } from '../agent-harness-types'
|
||||
import type { AgentAdapterHealth } from './agent-row.types'
|
||||
|
||||
interface AgentSummaryChipsProps {
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
modelLabel: string | null
|
||||
reasoningEffort: string | null
|
||||
/** When unhealthy, the adapter label dims and a warning chip appears. */
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
}
|
||||
|
||||
/**
|
||||
* Adapter / model / reasoning summary line. Always rendered (so OpenClaw
|
||||
* rows that fall back to defaults still expose what they're set up to do)
|
||||
* and surfaces adapter-health *only when unhealthy* — keeping the calm
|
||||
* default state silent and reserving visual noise for things the user
|
||||
* needs to act on.
|
||||
*/
|
||||
export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
|
||||
adapter,
|
||||
modelLabel,
|
||||
reasoningEffort,
|
||||
adapterHealth,
|
||||
}) => {
|
||||
const parts = [adapterLabel(adapter)]
|
||||
if (modelLabel) parts.push(modelLabel)
|
||||
if (reasoningEffort) parts.push(reasoningEffort)
|
||||
const unhealthy = adapterHealth?.healthy === false
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
'flex items-center gap-1.5 text-muted-foreground text-xs',
|
||||
unhealthy && 'text-muted-foreground/70',
|
||||
)}
|
||||
>
|
||||
<span className="truncate">{parts.join(' · ')}</span>
|
||||
{unhealthy && adapterHealth && (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<Badge
|
||||
variant="outline"
|
||||
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
<TriangleAlert className="size-2.5" />
|
||||
<span className="font-normal">Unavailable</span>
|
||||
</Badge>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent side="right" className="w-72 text-sm">
|
||||
<div className="font-medium">
|
||||
{adapterLabel(adapter)} CLI not available
|
||||
</div>
|
||||
<div className="mt-1 text-muted-foreground text-xs">
|
||||
{adapterHealth.reason ??
|
||||
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
|
||||
</div>
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,37 @@
|
||||
import type { FC } from 'react'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { AdapterIcon } from '../AdapterIcon'
|
||||
import { livenessDetail } from '../agent-display.helpers'
|
||||
import type { HarnessAgentAdapter } from '../agent-harness-types'
|
||||
import { type AgentLiveness, LivenessDot } from '../LivenessDot'
|
||||
|
||||
export interface AgentTileProps {
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
status: AgentLiveness
|
||||
lastUsedAt: number | null
|
||||
}
|
||||
|
||||
/**
|
||||
* Adapter glyph + a single liveness dot. Adapter health is no longer
|
||||
* surfaced here — it lives as an inline pill inside `AgentSummaryChips`
|
||||
* so the user isn't asked to disambiguate two dots on the same tile.
|
||||
*/
|
||||
export const AgentTile: FC<AgentTileProps> = ({
|
||||
adapter,
|
||||
status,
|
||||
lastUsedAt,
|
||||
}) => (
|
||||
<div className="relative shrink-0">
|
||||
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
|
||||
<AdapterIcon adapter={adapter} className="h-6 w-6" />
|
||||
</div>
|
||||
<LivenessDot
|
||||
status={status}
|
||||
detail={livenessDetail(status, lastUsedAt)}
|
||||
className={cn(
|
||||
'absolute -right-0.5 -bottom-0.5',
|
||||
status === 'working' && 'animate-pulse',
|
||||
)}
|
||||
/>
|
||||
</div>
|
||||
)
|
||||
@@ -0,0 +1,55 @@
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import { displayName } from '../agent-display.helpers'
|
||||
import type { AgentListItem } from '../agents-page-types'
|
||||
import type { AgentLiveness } from '../LivenessDot'
|
||||
import { AgentSparkline } from './AgentSparkline'
|
||||
import { PinToggle } from './PinToggle'
|
||||
|
||||
interface AgentTitleRowProps {
|
||||
agent: AgentListItem
|
||||
status: AgentLiveness
|
||||
pinned: boolean
|
||||
turnsByDay: number[]
|
||||
failedByDay: number[]
|
||||
onPinToggle: (next: boolean) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Title strip: name + status badge + (right-aligned) sparkline. The
|
||||
* pin toggle sits trailing the title so the title always flushes left
|
||||
* regardless of pin state — moving the star left of the title indents
|
||||
* the row's first line off-axis from the model/preview/meta lines
|
||||
* below it. When unpinned and not hovered, the toggle is removed from
|
||||
* layout entirely so it reserves no space at all.
|
||||
*/
|
||||
export const AgentTitleRow: FC<AgentTitleRowProps> = ({
|
||||
agent,
|
||||
status,
|
||||
pinned,
|
||||
turnsByDay,
|
||||
failedByDay,
|
||||
onPinToggle,
|
||||
}) => (
|
||||
<div className="mb-1 flex items-center gap-2">
|
||||
<span className="truncate font-semibold">{displayName(agent)}</span>
|
||||
{status === 'working' && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'asleep' && (
|
||||
<Badge variant="outline" className="text-muted-foreground">
|
||||
Asleep
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'error' && <Badge variant="destructive">Attention</Badge>}
|
||||
<PinToggle pinned={pinned} onToggle={onPinToggle} />
|
||||
<div className="ml-auto">
|
||||
<AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
@@ -0,0 +1,63 @@
|
||||
import type { FC } from 'react'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { Progress } from '@/components/ui/progress'
|
||||
import { formatTokens } from './agent-row.helpers'
|
||||
import type { AgentTokenUsage } from './agent-row.types'
|
||||
|
||||
interface AgentTokenSummaryProps {
|
||||
tokens: AgentTokenUsage | null
|
||||
}
|
||||
|
||||
/**
|
||||
* Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
|
||||
* (the only window we can compute reliably from the session record).
|
||||
* Per-window stats land in a follow-up once the activity ledger ships.
|
||||
*/
|
||||
export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
|
||||
if (!tokens) return null
|
||||
const { input, output } = tokens.cumulative
|
||||
const total = input + output
|
||||
if (total === 0) return null
|
||||
const inputPct = (input / total) * 100
|
||||
|
||||
return (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
|
||||
{formatTokens(total)} tokens
|
||||
</span>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent side="top" align="end" className="w-72 text-sm">
|
||||
<div className="mb-3 flex items-center justify-between">
|
||||
<span className="font-medium">Lifetime tokens</span>
|
||||
<span className="text-muted-foreground text-xs tabular-nums">
|
||||
{formatTokens(total)} total
|
||||
</span>
|
||||
</div>
|
||||
|
||||
<div className="space-y-2">
|
||||
<div className="flex items-center justify-between text-xs">
|
||||
<span className="text-muted-foreground">Input</span>
|
||||
<span className="tabular-nums">{formatTokens(input)}</span>
|
||||
</div>
|
||||
<Progress value={inputPct} className="h-1.5" />
|
||||
|
||||
<div className="mt-2 flex items-center justify-between text-xs">
|
||||
<span className="text-muted-foreground">Output</span>
|
||||
<span className="tabular-nums">{formatTokens(output)}</span>
|
||||
</div>
|
||||
<Progress value={100 - inputPct} className="h-1.5" />
|
||||
</div>
|
||||
|
||||
<p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
|
||||
Cumulative across every turn this agent has run. Per-window stats
|
||||
arrive in a future release.
|
||||
</p>
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,60 @@
|
||||
import { Star } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
Tooltip,
|
||||
TooltipContent,
|
||||
TooltipProvider,
|
||||
TooltipTrigger,
|
||||
} from '@/components/ui/tooltip'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface PinToggleProps {
|
||||
pinned: boolean
|
||||
onToggle: (next: boolean) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Trailing star toggle. The button is *always rendered* — only its
|
||||
* opacity changes between pinned/unpinned/hover states — so the title
|
||||
* row's height is constant. Hiding the slot via `display: none` would
|
||||
* collapse the row's vertical metrics on hover and shift every card
|
||||
* below in the rail.
|
||||
*
|
||||
* Placement is trailing the title (after the status badge) so the
|
||||
* title itself flushes left regardless of pin state — leading the
|
||||
* row with the star would indent the title relative to the model /
|
||||
* preview / meta lines beneath it.
|
||||
*/
|
||||
export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
|
||||
<TooltipProvider delayDuration={300}>
|
||||
<Tooltip>
|
||||
<TooltipTrigger asChild>
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
className={cn(
|
||||
'size-6 text-muted-foreground transition-opacity hover:text-foreground',
|
||||
pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
|
||||
)}
|
||||
aria-pressed={pinned}
|
||||
aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
|
||||
onClick={(event) => {
|
||||
event.stopPropagation()
|
||||
onToggle(!pinned)
|
||||
}}
|
||||
>
|
||||
<Star
|
||||
className={cn(
|
||||
'size-3.5',
|
||||
pinned && 'fill-amber-400 text-amber-500',
|
||||
)}
|
||||
/>
|
||||
</Button>
|
||||
</TooltipTrigger>
|
||||
<TooltipContent side="top" className="text-xs">
|
||||
{pinned ? 'Unpin' : 'Pin to top'}
|
||||
</TooltipContent>
|
||||
</Tooltip>
|
||||
</TooltipProvider>
|
||||
)
|
||||
@@ -0,0 +1,73 @@
|
||||
import { describe, expect, it } from 'bun:test'
|
||||
import {
|
||||
firstNonBlankLine,
|
||||
formatLocalDate,
|
||||
formatTokens,
|
||||
ROW_BAR_COUNT,
|
||||
truncate,
|
||||
} from './agent-row.helpers'
|
||||
|
||||
describe('formatTokens', () => {
|
||||
it('renders zero / NaN as "0"', () => {
|
||||
expect(formatTokens(0)).toBe('0')
|
||||
expect(formatTokens(Number.NaN)).toBe('0')
|
||||
})
|
||||
|
||||
it('renders sub-1K as integer', () => {
|
||||
expect(formatTokens(142)).toBe('142')
|
||||
})
|
||||
|
||||
it('renders K with one decimal under 10', () => {
|
||||
expect(formatTokens(8_400)).toBe('8.4K')
|
||||
})
|
||||
|
||||
it('drops the decimal at >=10K', () => {
|
||||
expect(formatTokens(120_000)).toBe('120K')
|
||||
})
|
||||
|
||||
it('renders M with one decimal under 10', () => {
|
||||
expect(formatTokens(1_200_000)).toBe('1.2M')
|
||||
})
|
||||
})
|
||||
|
||||
describe('firstNonBlankLine', () => {
|
||||
it('returns the first non-blank line', () => {
|
||||
expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
|
||||
})
|
||||
|
||||
it('skips USER_QUERY envelope tags', () => {
|
||||
expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
|
||||
'fix tests',
|
||||
)
|
||||
})
|
||||
|
||||
it('falls back to the trimmed input when nothing matches', () => {
|
||||
expect(firstNonBlankLine(' single ')).toBe('single')
|
||||
})
|
||||
})
|
||||
|
||||
describe('truncate', () => {
|
||||
it('returns input unchanged when within limit', () => {
|
||||
expect(truncate('hello', 10)).toBe('hello')
|
||||
})
|
||||
|
||||
it('appends an ellipsis when over limit', () => {
|
||||
expect(truncate('hello world', 6)).toBe('hello…')
|
||||
})
|
||||
})
|
||||
|
||||
describe('formatLocalDate', () => {
|
||||
const today = new Date('2026-04-30T12:00:00Z')
|
||||
|
||||
it('labels today and yesterday explicitly', () => {
|
||||
expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
|
||||
expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
|
||||
})
|
||||
|
||||
it('returns a "Mon D" format for older days', () => {
|
||||
const label = formatLocalDate(0, today)
|
||||
// "Apr 17" or "Apr 17," depending on locale; just assert it
|
||||
// contains a month abbreviation and a day number.
|
||||
expect(label).toMatch(/[A-Za-z]+ \d+/)
|
||||
})
|
||||
})
|
||||
@@ -0,0 +1,64 @@
|
||||
/**
|
||||
* Pure formatters consumed by row sub-components. Kept distinct from
|
||||
* `agent-display.helpers.ts` (page-level helpers) so the row internals
|
||||
* have an obvious single home.
|
||||
*/
|
||||
|
||||
const TOKEN_THRESHOLDS: Array<[number, string]> = [
|
||||
[1_000_000, 'M'],
|
||||
[1_000, 'K'],
|
||||
]
|
||||
|
||||
/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
|
||||
export function formatTokens(n: number): string {
|
||||
if (!Number.isFinite(n) || n <= 0) return '0'
|
||||
for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
|
||||
if (n >= threshold) {
|
||||
const value = n / threshold
|
||||
const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
|
||||
return `${decimal}${suffix}`
|
||||
}
|
||||
}
|
||||
return String(Math.round(n))
|
||||
}
|
||||
|
||||
const USER_QUERY_OPEN = /^<USER_QUERY>$/i
|
||||
const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i
|
||||
|
||||
/**
|
||||
* First non-blank line, with the BrowserOS user-system-prompt
|
||||
* `<USER_QUERY>` envelope tags stripped so previews don't show
|
||||
* structural noise.
|
||||
*/
|
||||
export function firstNonBlankLine(text: string): string {
|
||||
const lines = text.split('\n').map((line) => line.trim())
|
||||
for (const line of lines) {
|
||||
if (!line) continue
|
||||
if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
|
||||
return line
|
||||
}
|
||||
return text.trim()
|
||||
}
|
||||
|
||||
export function truncate(text: string, max: number): string {
|
||||
if (text.length <= max) return text
|
||||
return `${text.slice(0, max - 1).trimEnd()}…`
|
||||
}
|
||||
|
||||
const SPARKLINE_DAYS = 14
|
||||
|
||||
/**
|
||||
* "today" / "yesterday" / "Apr 17" — given an index 0..13 from
|
||||
* oldest → newest. `today` defaults to `new Date()` so callers don't
|
||||
* have to thread a clock through.
|
||||
*/
|
||||
export function formatLocalDate(idx: number, today: Date = new Date()): string {
|
||||
if (idx === SPARKLINE_DAYS - 1) return 'today'
|
||||
if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
|
||||
const offset = SPARKLINE_DAYS - 1 - idx
|
||||
const date = new Date(today)
|
||||
date.setDate(date.getDate() - offset)
|
||||
return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
|
||||
}
|
||||
|
||||
export const ROW_BAR_COUNT = SPARKLINE_DAYS
|
||||
@@ -0,0 +1,51 @@
|
||||
import type { HarnessAgentAdapter } from '../agent-harness-types'
|
||||
import type { AgentListItem } from '../agents-page-types'
|
||||
import type { AgentLiveness } from '../LivenessDot'
|
||||
|
||||
/**
|
||||
* Window-bounded token usage. Server returns `null` when no session
|
||||
* record exists yet for the agent.
|
||||
*/
|
||||
export interface AgentTokenUsage {
|
||||
last7d: { input: number; output: number; requestCount: number }
|
||||
cumulative: { input: number; output: number }
|
||||
}
|
||||
|
||||
export interface AgentAdapterHealth {
|
||||
healthy: boolean
|
||||
reason?: string
|
||||
}
|
||||
|
||||
/**
|
||||
* Everything an `AgentRowCard` needs to render. Mirrors the shape
|
||||
* `useHarnessAgents` exposes; the page assembles one entry per row in
|
||||
* `AgentList` and passes it down. Sub-components only see slices of
|
||||
* this object — no prop drilling beyond two levels.
|
||||
*/
|
||||
export interface AgentRowData {
|
||||
agent: AgentListItem
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
modelLabel: string | null
|
||||
reasoningEffort: string | null
|
||||
status: AgentLiveness
|
||||
lastUsedAt: number | null
|
||||
pinned: boolean
|
||||
cwd: string | null
|
||||
lastUserMessage: string | null
|
||||
tokens: AgentTokenUsage | null
|
||||
/** 14 entries, oldest → newest. Today is the last index. */
|
||||
turnsByDay: number[]
|
||||
/** Same length and ordering as `turnsByDay`. */
|
||||
failedByDay: number[]
|
||||
lastError: string | null
|
||||
lastErrorAt: number | null
|
||||
/** When non-null, an in-flight turn this row can be resumed from. */
|
||||
activeTurnId: string | null
|
||||
/** Adapter-level health, shared across rows for the same adapter. */
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
}
|
||||
|
||||
export interface AgentRowCallbacks {
|
||||
onDelete: (agent: AgentListItem) => void
|
||||
onPinToggle: (agent: AgentListItem, next: boolean) => void
|
||||
}
|
||||
@@ -8,6 +8,7 @@ import {
|
||||
type HarnessAdapterDescriptor,
|
||||
type HarnessAgent,
|
||||
type HarnessAgentHistoryPage,
|
||||
type HarnessQueuedMessage,
|
||||
mapHarnessAgentToEntry,
|
||||
} from './agent-harness-types'
|
||||
import type { OpenClawStatus } from './useOpenClaw'
|
||||
@@ -135,6 +136,63 @@ export function useCreateHarnessAgent() {
|
||||
})
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply a partial update to a harness agent. Used by the pin-toggle
|
||||
* star and (eventually) the inline rename UI. Optimistically writes
|
||||
* the patch into the listing query cache so the row updates instantly,
|
||||
* then rolls back if the server rejects the change.
|
||||
*/
|
||||
export function useUpdateHarnessAgent() {
|
||||
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: {
|
||||
agentId: string
|
||||
patch: { name?: string; pinned?: boolean }
|
||||
}) => {
|
||||
if (!baseUrl || urlLoading) {
|
||||
throw new Error('BrowserOS agent server URL is not ready')
|
||||
}
|
||||
const data = await agentsFetch<{ agent: HarnessAgent }>(
|
||||
baseUrl,
|
||||
`/${encodeURIComponent(input.agentId)}`,
|
||||
{
|
||||
method: 'PATCH',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(input.patch),
|
||||
},
|
||||
)
|
||||
return data.agent
|
||||
},
|
||||
onMutate: async ({ agentId, patch }) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === agentId ? { ...agent, ...patch } : agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
export function useDeleteHarnessAgent() {
|
||||
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
@@ -206,6 +264,8 @@ export interface HarnessActiveTurnInfo {
|
||||
lastSeq: number
|
||||
startedAt: number
|
||||
endedAt?: number
|
||||
/** User message that kicked off the turn; null when not captured. */
|
||||
prompt: string | null
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -260,3 +320,145 @@ export async function fetchHarnessAgentHistory(
|
||||
`/${encodeURIComponent(agentId)}/sessions/main/history`,
|
||||
)
|
||||
}
|
||||
|
||||
export interface EnqueueMessageInput {
|
||||
message: string
|
||||
attachments?: ReadonlyArray<unknown>
|
||||
}
|
||||
|
||||
export async function enqueueHarnessMessage(
|
||||
agentId: string,
|
||||
input: EnqueueMessageInput,
|
||||
): Promise<HarnessQueuedMessage> {
|
||||
const baseUrl = await getAgentServerUrl()
|
||||
const response = await fetch(
|
||||
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
|
||||
{
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
message: input.message,
|
||||
...(input.attachments && input.attachments.length > 0
|
||||
? { attachments: input.attachments }
|
||||
: {}),
|
||||
}),
|
||||
},
|
||||
)
|
||||
if (!response.ok) {
|
||||
let message = `Request failed with status ${response.status}`
|
||||
try {
|
||||
const body = (await response.json()) as { error?: string }
|
||||
if (body.error) message = body.error
|
||||
} catch {}
|
||||
throw new Error(message)
|
||||
}
|
||||
const body = (await response.json()) as { queued: HarnessQueuedMessage }
|
||||
return body.queued
|
||||
}
|
||||
|
||||
export async function removeHarnessQueuedMessage(
|
||||
agentId: string,
|
||||
messageId: string,
|
||||
): Promise<{ removed: boolean }> {
|
||||
const baseUrl = await getAgentServerUrl()
|
||||
const response = await fetch(
|
||||
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
|
||||
messageId,
|
||||
)}`,
|
||||
{ method: 'DELETE' },
|
||||
)
|
||||
if (!response.ok) return { removed: false }
|
||||
return (await response.json()) as { removed: boolean }
|
||||
}
|
||||
|
||||
/**
|
||||
* Optimistic enqueue: writes the new queued message into the listing
|
||||
* cache immediately so the queue panel reflects the change without
|
||||
* waiting for the next poll. Rolls back if the server rejects.
|
||||
*/
|
||||
export function useEnqueueHarnessMessage() {
|
||||
const { baseUrl } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
|
||||
enqueueHarnessMessage(input.agentId, input),
|
||||
onMutate: async (input) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
const optimistic: HarnessQueuedMessage = {
|
||||
id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
|
||||
createdAt: Date.now(),
|
||||
message: input.message,
|
||||
}
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === input.agentId
|
||||
? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
|
||||
: agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
/**
|
||||
* Optimistic queue removal mirror of `useEnqueueHarnessMessage`.
|
||||
*/
|
||||
export function useRemoveHarnessQueuedMessage() {
|
||||
const { baseUrl } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: { agentId: string; messageId: string }) =>
|
||||
removeHarnessQueuedMessage(input.agentId, input.messageId),
|
||||
onMutate: async (input) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === input.agentId
|
||||
? {
|
||||
...agent,
|
||||
queue: (agent.queue ?? []).filter(
|
||||
(entry) => entry.id !== input.messageId,
|
||||
),
|
||||
}
|
||||
: agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
@@ -59,15 +59,3 @@ export interface AgentConversation {
|
||||
createdAt: number
|
||||
updatedAt: number
|
||||
}
|
||||
|
||||
export interface AgentCardData {
|
||||
agentId: string
|
||||
name: string
|
||||
model?: string
|
||||
status: 'idle' | 'working' | 'error'
|
||||
lastMessage?: string
|
||||
lastMessageTimestamp?: number
|
||||
activitySummary?: string
|
||||
currentTool?: string
|
||||
costUsd?: number
|
||||
}
|
||||
|
||||
@@ -9,6 +9,7 @@
|
||||
"build": "bun run codegen && wxt build",
|
||||
"build:dev": "bun --env-file=.env.development wxt build --mode development",
|
||||
"zip": "wxt zip",
|
||||
"test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
|
||||
"compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
|
||||
"lint": "bunx biome check",
|
||||
"typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
|
||||
|
||||
@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
|
||||
# If BrowserOS is installed but not running
|
||||
browseros-cli launch # opens BrowserOS, waits for server
|
||||
|
||||
# Configure the CLI (auto-discovers running BrowserOS)
|
||||
browseros-cli init --auto # detects server URL and saves config
|
||||
# Configure the CLI with the Server URL from BrowserOS settings
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
|
||||
# Verify connection
|
||||
browseros-cli health
|
||||
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
|
||||
browseros-cli init # interactive — prompts for URL
|
||||
```
|
||||
|
||||
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
|
||||
Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
|
||||
|
||||
### CLI updates
|
||||
|
||||
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
|
||||
| `--debug` | `BOS_DEBUG=1` | Debug output |
|
||||
| `--timeout, -t` | | Request timeout (default: 2m) |
|
||||
|
||||
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
|
||||
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
|
||||
|
||||
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
|
||||
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
|
||||
|
||||
## Testing
|
||||
|
||||
@@ -179,7 +179,7 @@ apps/cli/
|
||||
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
|
||||
├── cmd/
|
||||
│ ├── root.go # Root command, global flags
|
||||
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
|
||||
│ ├── init.go # Server URL configuration (URL arg or interactive)
|
||||
│ ├── install.go # install (download BrowserOS for current platform)
|
||||
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
|
||||
│ ├── open.go # open (new_page / new_hidden_page)
|
||||
|
||||
@@ -17,8 +17,6 @@ import (
|
||||
)
|
||||
|
||||
func init() {
|
||||
var autoDiscover bool
|
||||
|
||||
cmd := &cobra.Command{
|
||||
Use: "init [url]",
|
||||
Short: "Configure the BrowserOS server connection",
|
||||
@@ -34,9 +32,8 @@ You can provide the full URL or just the port number:
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
browseros-cli init 9000
|
||||
|
||||
Three modes:
|
||||
Modes:
|
||||
browseros-cli init <url> Non-interactive (full URL or port number)
|
||||
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
|
||||
browseros-cli init Interactive prompt`,
|
||||
Annotations: map[string]string{"group": "Setup:"},
|
||||
Args: cobra.MaximumNArgs(1),
|
||||
@@ -49,22 +46,9 @@ Three modes:
|
||||
|
||||
switch {
|
||||
case len(args) == 1:
|
||||
// Non-interactive: URL provided as argument
|
||||
input = args[0]
|
||||
|
||||
case autoDiscover:
|
||||
// Auto-discover: server.json → config → probe common ports
|
||||
discovered := probeRunningServer()
|
||||
if discovered == "" {
|
||||
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
|
||||
" If not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", 1)
|
||||
}
|
||||
input = discovered
|
||||
fmt.Printf("Auto-discovered server at %s\n", input)
|
||||
|
||||
default:
|
||||
// Interactive prompt (original behavior)
|
||||
fmt.Println()
|
||||
bold.Println("BrowserOS CLI Setup")
|
||||
fmt.Println()
|
||||
@@ -95,12 +79,14 @@ Three modes:
|
||||
output.Errorf(1, "invalid URL: %s", input)
|
||||
}
|
||||
|
||||
// Verify connectivity
|
||||
fmt.Printf("Checking connection to %s ...\n", baseURL)
|
||||
client := &http.Client{Timeout: 5 * time.Second}
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
|
||||
output.Errorf(1, "cannot connect to %s: %v\n\n"+
|
||||
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
|
||||
"Then run: browseros-cli init <Server URL>\n"+
|
||||
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
|
||||
}
|
||||
resp.Body.Close()
|
||||
|
||||
@@ -121,6 +107,5 @@ Three modes:
|
||||
},
|
||||
}
|
||||
|
||||
cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
|
||||
rootCmd.AddCommand(cmd)
|
||||
}
|
||||
|
||||
@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)
|
||||
|
||||
After installation:
|
||||
browseros-cli launch # start BrowserOS
|
||||
browseros-cli init --auto # configure the CLI`,
|
||||
browseros-cli init <url> # configure the CLI with the Server URL`,
|
||||
Annotations: map[string]string{"group": "Setup:"},
|
||||
Args: cobra.NoArgs,
|
||||
Run: func(cmd *cobra.Command, args []string) {
|
||||
@@ -81,7 +81,7 @@ After installation:
|
||||
fmt.Println()
|
||||
bold.Println("Next steps:")
|
||||
dim.Println(" browseros-cli launch # start BrowserOS")
|
||||
dim.Println(" browseros-cli init --auto # configure the CLI")
|
||||
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"os"
|
||||
@@ -38,6 +39,7 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
|
||||
if url := probeRunningServer(); url != "" {
|
||||
green.Printf("BrowserOS is already running at %s\n", url)
|
||||
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
|
||||
return
|
||||
}
|
||||
|
||||
@@ -63,7 +65,7 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
|
||||
green.Printf("BrowserOS is ready at %s\n", url)
|
||||
fmt.Println()
|
||||
dim.Println("Next: browseros-cli init --auto")
|
||||
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
|
||||
},
|
||||
}
|
||||
|
||||
@@ -75,39 +77,77 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
// Server probing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
// probeRunningServer checks server.json, config, and common ports for a running server.
|
||||
var commonBrowserOSPorts = []int{9100, 9200, 9300}
|
||||
|
||||
// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
|
||||
func probeRunningServer() string {
|
||||
check := func(baseURL string) bool {
|
||||
client := &http.Client{Timeout: 2 * time.Second}
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
return false
|
||||
}
|
||||
resp.Body.Close()
|
||||
return resp.StatusCode == 200
|
||||
}
|
||||
client := &http.Client{Timeout: 2 * time.Second}
|
||||
|
||||
// 1. server.json — written by BrowserOS on startup with the actual port
|
||||
if url := loadBrowserosServerURL(); url != "" && check(url) {
|
||||
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
return url
|
||||
}
|
||||
|
||||
// 2. Saved config / env var
|
||||
if url := defaultServerURL(); url != "" && check(url) {
|
||||
if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
return url
|
||||
}
|
||||
|
||||
// 3. Probe common BrowserOS ports as last resort
|
||||
for _, port := range []int{9100, 9200, 9300} {
|
||||
return probeCommonServerPorts(client)
|
||||
}
|
||||
|
||||
func checkServerHealth(client *http.Client, baseURL string) bool {
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
return false
|
||||
}
|
||||
resp.Body.Close()
|
||||
return resp.StatusCode == 200
|
||||
}
|
||||
|
||||
func probeCommonServerPorts(client *http.Client) string {
|
||||
for _, port := range commonBrowserOSPorts {
|
||||
url := fmt.Sprintf("http://127.0.0.1:%d", port)
|
||||
if check(url) {
|
||||
if checkServerHealth(client, url) {
|
||||
return url
|
||||
}
|
||||
}
|
||||
|
||||
return ""
|
||||
}
|
||||
|
||||
type serverDiscoveryConfig struct {
|
||||
ServerPort int `json:"server_port"`
|
||||
URL string `json:"url"`
|
||||
ServerVersion string `json:"server_version"`
|
||||
BrowserOSVersion string `json:"browseros_version,omitempty"`
|
||||
ChromiumVersion string `json:"chromium_version,omitempty"`
|
||||
}
|
||||
|
||||
// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
|
||||
//
|
||||
// Normal command resolution must not call this because it can override a URL the
|
||||
// user explicitly saved with `browseros-cli init <Server URL>`.
|
||||
func loadBrowserosServerURL() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
var sc serverDiscoveryConfig
|
||||
if err := json.Unmarshal(data, &sc); err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
return normalizeServerURL(sc.URL)
|
||||
}
|
||||
|
||||
func mcpEndpointURL(baseURL string) string {
|
||||
return strings.TrimSuffix(baseURL, "/") + "/mcp"
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Platform-native installation detection
|
||||
// ---------------------------------------------------------------------------
|
||||
@@ -117,7 +157,8 @@ func probeRunningServer() string {
|
||||
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
|
||||
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
|
||||
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
|
||||
// and registry uninstall key (per-user Chromium install pattern)
|
||||
//
|
||||
// and registry uninstall key (per-user Chromium install pattern)
|
||||
func isBrowserOSInstalled() bool {
|
||||
switch runtime.GOOS {
|
||||
case "darwin":
|
||||
@@ -271,14 +312,11 @@ func waitForServer(maxWait time.Duration) (string, bool) {
|
||||
|
||||
for time.Now().Before(deadline) {
|
||||
// server.json is written by BrowserOS on startup with the actual port
|
||||
if url := loadBrowserosServerURL(); url != "" {
|
||||
resp, err := client.Get(url + "/health")
|
||||
if err == nil {
|
||||
resp.Body.Close()
|
||||
if resp.StatusCode == 200 {
|
||||
return url, true
|
||||
}
|
||||
}
|
||||
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
return url, true
|
||||
}
|
||||
if url := probeCommonServerPorts(client); url != "" {
|
||||
return url, true
|
||||
}
|
||||
fmt.Print(".")
|
||||
time.Sleep(1 * time.Second)
|
||||
|
||||
99
packages/browseros-agent/apps/cli/cmd/launch_test.go
Normal file
99
packages/browseros-agent/apps/cli/cmd/launch_test.go
Normal file
@@ -0,0 +1,99 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"net/url"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strconv"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"browseros-cli/config"
|
||||
)
|
||||
|
||||
func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
discoveredServer := newHealthyServer(t)
|
||||
configServer := newHealthyServer(t)
|
||||
|
||||
serverDir := filepath.Join(home, ".browseros")
|
||||
if err := os.MkdirAll(serverDir, 0755); err != nil {
|
||||
t.Fatalf("os.MkdirAll() error = %v", err)
|
||||
}
|
||||
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
|
||||
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
|
||||
t.Fatalf("os.WriteFile() error = %v", err)
|
||||
}
|
||||
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := probeRunningServer()
|
||||
if got != normalizeServerURL(discoveredServer.URL) {
|
||||
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
|
||||
}
|
||||
}
|
||||
|
||||
func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
|
||||
server := newHealthyServer(t)
|
||||
port := serverPort(t, server.URL)
|
||||
|
||||
originalPorts := commonBrowserOSPorts
|
||||
commonBrowserOSPorts = []int{port}
|
||||
t.Cleanup(func() {
|
||||
commonBrowserOSPorts = originalPorts
|
||||
})
|
||||
|
||||
got, ok := waitForServer(100 * time.Millisecond)
|
||||
if !ok {
|
||||
t.Fatal("waitForServer() ok = false, want true")
|
||||
}
|
||||
if got != normalizeServerURL(server.URL) {
|
||||
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
|
||||
}
|
||||
}
|
||||
|
||||
func newHealthyServer(t *testing.T) *httptest.Server {
|
||||
t.Helper()
|
||||
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
if r.URL.Path != "/health" {
|
||||
http.NotFound(w, r)
|
||||
return
|
||||
}
|
||||
w.WriteHeader(http.StatusOK)
|
||||
}))
|
||||
t.Cleanup(server.Close)
|
||||
return server
|
||||
}
|
||||
|
||||
func serverPort(t *testing.T, rawURL string) int {
|
||||
t.Helper()
|
||||
|
||||
parsed, err := url.Parse(rawURL)
|
||||
if err != nil {
|
||||
t.Fatalf("url.Parse() error = %v", err)
|
||||
}
|
||||
_, portText, err := net.SplitHostPort(parsed.Host)
|
||||
if err != nil {
|
||||
t.Fatalf("net.SplitHostPort() error = %v", err)
|
||||
}
|
||||
port, err := strconv.Atoi(portText)
|
||||
if err != nil {
|
||||
t.Fatalf("strconv.Atoi() error = %v", err)
|
||||
}
|
||||
return port
|
||||
}
|
||||
@@ -2,10 +2,8 @@ package cmd
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
@@ -289,18 +287,15 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
|
||||
}
|
||||
}
|
||||
|
||||
// defaultServerURL returns the implicit target from user-controlled settings only.
|
||||
//
|
||||
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
|
||||
// ignore it so a saved URL is not silently overridden by another running server.
|
||||
func defaultServerURL() string {
|
||||
// 1. Explicit env var always wins
|
||||
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
|
||||
return env
|
||||
}
|
||||
|
||||
// 2. Live discovery file from running BrowserOS (most current)
|
||||
if url := loadBrowserosServerURL(); url != "" {
|
||||
return url
|
||||
}
|
||||
|
||||
// 3. Saved config (may be stale if port changed)
|
||||
cfg, err := config.Load()
|
||||
if err == nil {
|
||||
if url := normalizeServerURL(cfg.ServerURL); url != "" {
|
||||
@@ -311,33 +306,6 @@ func defaultServerURL() string {
|
||||
return ""
|
||||
}
|
||||
|
||||
type serverDiscoveryConfig struct {
|
||||
ServerPort int `json:"server_port"`
|
||||
URL string `json:"url"`
|
||||
ServerVersion string `json:"server_version"`
|
||||
BrowserOSVersion string `json:"browseros_version,omitempty"`
|
||||
ChromiumVersion string `json:"chromium_version,omitempty"`
|
||||
}
|
||||
|
||||
func loadBrowserosServerURL() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
var sc serverDiscoveryConfig
|
||||
if err := json.Unmarshal(data, &sc); err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
return normalizeServerURL(sc.URL)
|
||||
}
|
||||
|
||||
func normalizeServerURL(raw string) string {
|
||||
normalized := strings.TrimSpace(raw)
|
||||
|
||||
@@ -369,8 +337,10 @@ func validateServerURL(raw string) (string, error) {
|
||||
|
||||
return "", fmt.Errorf(
|
||||
"BrowserOS server URL is not configured.\n\n" +
|
||||
" If BrowserOS is running: browseros-cli init --auto\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install",
|
||||
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
|
||||
" Save it with: browseros-cli init <Server URL>\n" +
|
||||
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install",
|
||||
)
|
||||
}
|
||||
|
||||
@@ -1,8 +1,13 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"browseros-cli/config"
|
||||
)
|
||||
|
||||
func TestSetVersionUpdatesRootCommand(t *testing.T) {
|
||||
@@ -100,6 +105,76 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
|
||||
|
||||
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := defaultServerURL()
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := defaultServerURL()
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
serverDir := filepath.Join(home, ".browseros")
|
||||
if err := os.MkdirAll(serverDir, 0755); err != nil {
|
||||
t.Fatalf("os.MkdirAll() error = %v", err)
|
||||
}
|
||||
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
|
||||
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
|
||||
t.Fatalf("os.WriteFile() error = %v", err)
|
||||
}
|
||||
|
||||
if got := defaultServerURL(); got != "" {
|
||||
t.Fatalf("defaultServerURL() = %q, want empty", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
|
||||
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestValidateServerURLExplainsManualInit(t *testing.T) {
|
||||
_, err := validateServerURL("")
|
||||
if err == nil {
|
||||
t.Fatal("validateServerURL() error = nil, want setup instructions")
|
||||
}
|
||||
msg := err.Error()
|
||||
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
|
||||
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
|
||||
}
|
||||
if strings.Contains(msg, "init --auto") {
|
||||
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
|
||||
done := make(chan struct{})
|
||||
returned := make(chan struct{})
|
||||
|
||||
@@ -44,10 +44,7 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {
|
||||
|
||||
session, err := sdkClient.Connect(ctx, transport, nil)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
|
||||
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
|
||||
" If BrowserOS is not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", c.BaseURL, err)
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
|
||||
}
|
||||
return session, nil
|
||||
}
|
||||
@@ -187,10 +184,7 @@ func (c *Client) Status() (map[string]any, error) {
|
||||
func (c *Client) restGET(path string) (map[string]any, error) {
|
||||
resp, err := c.HTTPClient.Get(c.BaseURL + path)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
|
||||
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
|
||||
" If BrowserOS is not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", c.BaseURL, err)
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
@@ -205,3 +199,14 @@ func (c *Client) restGET(path string) (map[string]any, error) {
|
||||
}
|
||||
return data, nil
|
||||
}
|
||||
|
||||
// connectionSetupInstructions explains how to recover from a stale or missing server URL.
|
||||
func connectionSetupInstructions() string {
|
||||
return "\n\n" +
|
||||
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
|
||||
" Save it with: browseros-cli init <Server URL>\n" +
|
||||
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
|
||||
" Run once with: browseros-cli --server <Server URL> health\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install"
|
||||
}
|
||||
|
||||
@@ -31,8 +31,8 @@ browseros-cli install
|
||||
# Start BrowserOS
|
||||
browseros-cli launch
|
||||
|
||||
# Auto-configure MCP settings for your AI tools
|
||||
browseros-cli init --auto
|
||||
# Configure MCP settings with the Server URL from BrowserOS settings
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
|
||||
# Verify everything is working
|
||||
browseros-cli health
|
||||
|
||||
51
packages/browseros-agent/apps/eval/.env.example
vendored
Normal file
51
packages/browseros-agent/apps/eval/.env.example
vendored
Normal file
@@ -0,0 +1,51 @@
|
||||
# Copy to .env.development for local eval runs.
|
||||
|
||||
# Provider keys used by existing config files.
|
||||
OPENROUTER_API_KEY=
|
||||
FIREWORKS_API_KEY=
|
||||
ANTHROPIC_API_KEY=
|
||||
OPENAI_API_KEY=
|
||||
GOOGLE_GENERATIVE_AI_API_KEY=
|
||||
|
||||
# Claude Agent SDK token used by performance_grader.
|
||||
CLAUDE_CODE_OAUTH_TOKEN=
|
||||
|
||||
# Suite-mode model selection.
|
||||
EVAL_VARIANT=local
|
||||
EVAL_AGENT_PROVIDER=openai-compatible
|
||||
EVAL_AGENT_MODEL=
|
||||
EVAL_AGENT_API_KEY=
|
||||
EVAL_AGENT_BASE_URL=
|
||||
EVAL_AGENT_SUPPORTS_IMAGES=true
|
||||
|
||||
# Optional suite-mode executor override for orchestrator suites.
|
||||
EVAL_EXECUTOR_MODEL=
|
||||
EVAL_EXECUTOR_API_KEY=
|
||||
EVAL_EXECUTOR_BASE_URL=
|
||||
|
||||
# Clado visual action executor.
|
||||
CLADO_ACTION_MODEL=
|
||||
CLADO_ACTION_API_KEY=
|
||||
CLADO_ACTION_BASE_URL=
|
||||
# Backward-compatible alias used by older local scripts.
|
||||
CLADO_ACTION_URL=
|
||||
|
||||
# BrowserOS runner.
|
||||
BROWSEROS_BINARY=/Applications/BrowserOS.app/Contents/MacOS/BrowserOS
|
||||
BROWSEROS_SERVER_URL=http://127.0.0.1:9110
|
||||
BROWSEROS_SERVER_LOG_DIR=/tmp/browseros-server-logs
|
||||
BROWSEROS_CONFIG_URL=
|
||||
|
||||
# Captcha solver extension.
|
||||
NOPECHA_API_KEY=
|
||||
|
||||
# WebArena-Infinity.
|
||||
WEBARENA_INFINITY_DIR=
|
||||
INFINITY_APP_URL=
|
||||
|
||||
# R2 publishing and weekly report.
|
||||
EVAL_R2_ACCOUNT_ID=
|
||||
EVAL_R2_ACCESS_KEY_ID=
|
||||
EVAL_R2_SECRET_ACCESS_KEY=
|
||||
EVAL_R2_BUCKET=browseros-eval
|
||||
EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
|
||||
137
packages/browseros-agent/apps/eval/README.md
vendored
137
packages/browseros-agent/apps/eval/README.md
vendored
@@ -9,11 +9,13 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
|
||||
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
|
||||
- **Bun** runtime
|
||||
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
|
||||
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
cd apps/eval
|
||||
cp .env.example .env.development
|
||||
# Edit .env.development with your keys, then:
|
||||
bun run eval
|
||||
```
|
||||
@@ -23,17 +25,62 @@ Opens the eval dashboard at `http://localhost:9900` in config mode. From there:
|
||||
### CLI mode
|
||||
|
||||
```bash
|
||||
bun run eval -c configs/browseros-agent-weekly.json
|
||||
bun run eval -c configs/legacy/browseros-agent-weekly.json
|
||||
bun run eval suite --config configs/legacy/browseros-agent-weekly.json --publish r2
|
||||
```
|
||||
|
||||
Runs immediately. Dashboard still available at `http://localhost:9900` for live progress.
|
||||
|
||||
The `suite` command is the workflow-compatible full loop: execute tasks, run graders, write artifacts, and optionally publish to R2. The old `-c` form remains supported during migration.
|
||||
|
||||
```bash
|
||||
bun run eval run --config configs/legacy/browseros-agent-weekly.json
|
||||
bun run eval suite --suite configs/suites/agisdk-daily-10.json --variant kimi-fireworks --publish r2
|
||||
bun run eval grade --run results/browseros-agent-weekly/2026-04-29-1430
|
||||
bun run eval publish --run results/browseros-agent-weekly/2026-04-29-1430 --target r2
|
||||
```
|
||||
|
||||
Config files live in two groups:
|
||||
|
||||
```txt
|
||||
configs/legacy/ # Complete EvalConfig files used by older workflows and the dashboard
|
||||
configs/suites/ # Suite definitions; model/provider comes from CLI flags or env
|
||||
```
|
||||
|
||||
Suite mode takes model settings from CLI flags first, then env:
|
||||
|
||||
```bash
|
||||
EVAL_VARIANT=kimi-fireworks \
|
||||
EVAL_AGENT_PROVIDER=openai-compatible \
|
||||
EVAL_AGENT_MODEL=accounts/fireworks/models/kimi-k2p5 \
|
||||
EVAL_AGENT_API_KEY=$FIREWORKS_API_KEY \
|
||||
EVAL_AGENT_BASE_URL=https://api.fireworks.ai/inference/v1 \
|
||||
bun run eval suite --suite configs/suites/agisdk-daily-10.json --publish r2
|
||||
```
|
||||
|
||||
### Suites and variants
|
||||
|
||||
A **suite** is what we run: the task dataset, graders, worker count, timeout, and browser settings. For example, `agisdk-daily-10` means "run these 10 AGI SDK tasks and grade them with `agisdk_state_diff`."
|
||||
|
||||
A **variant** is the model setup we are testing on that suite. `EVAL_VARIANT` is just the human-readable name for that setup. The actual model connection still comes from `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, and `EVAL_AGENT_BASE_URL`.
|
||||
|
||||
This lets us run the same suite against multiple model setups without copying the benchmark config:
|
||||
|
||||
```txt
|
||||
agisdk-daily-10 + kimi-fireworks
|
||||
agisdk-daily-10 + claude-opus
|
||||
agisdk-daily-10 + clado-action-000159
|
||||
```
|
||||
|
||||
For `orchestrator-executor` suites, there can also be an executor model/backend. The `EVAL_AGENT_*` vars describe the main agent or orchestrator. The optional `EVAL_EXECUTOR_*` or `CLADO_ACTION_*` vars describe the delegated executor.
|
||||
|
||||
## Agent types
|
||||
|
||||
| Type | Description |
|
||||
|------|-------------|
|
||||
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
|
||||
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
|
||||
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |
|
||||
|
||||
### Single agent
|
||||
|
||||
@@ -74,6 +121,24 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
|
||||
}
|
||||
```
|
||||
|
||||
### Claude Code
|
||||
|
||||
Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
|
||||
|
||||
```json
|
||||
{
|
||||
"agent": {
|
||||
"type": "claude-code",
|
||||
"model": "opus"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```bash
|
||||
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
|
||||
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
|
||||
```
|
||||
|
||||
## Graders
|
||||
|
||||
| Name | Description |
|
||||
@@ -96,6 +161,21 @@ The `apiKey` field supports two formats:
|
||||
- **Env var name**: `"OPENAI_API_KEY"` — resolved from `.env.development` at runtime
|
||||
- **Direct value**: `"sk-xxxxx"` — used as-is (not recommended)
|
||||
|
||||
### Environment variables
|
||||
|
||||
| Variable | Used for |
|
||||
|----------|----------|
|
||||
| `EVAL_AGENT_PROVIDER`, `EVAL_AGENT_MODEL`, `EVAL_AGENT_API_KEY`, `EVAL_AGENT_BASE_URL`, `EVAL_AGENT_SUPPORTS_IMAGES` | Suite variant model selection |
|
||||
| `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, provider-specific keys | Config-file or provider-backed model calls |
|
||||
| `EVAL_EXECUTOR_MODEL`, `EVAL_EXECUTOR_API_KEY`, `EVAL_EXECUTOR_BASE_URL` | Suite-mode orchestrator executor override |
|
||||
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
|
||||
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
|
||||
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
|
||||
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
|
||||
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
|
||||
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
|
||||
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
|
||||
|
||||
### Supported providers
|
||||
|
||||
| Provider | `provider` value | Requires `baseUrl` |
|
||||
@@ -110,6 +190,22 @@ The `apiKey` field supports two formats:
|
||||
| Ollama | `ollama` | No |
|
||||
| Clado Action (executor only) | `clado-action` | Yes |
|
||||
|
||||
### R2 publishing
|
||||
|
||||
`suite --config ... --publish r2` and `publish --target r2` upload the run artifacts plus `viewer.html` to the viewer-compatible R2 layout:
|
||||
|
||||
```bash
|
||||
export EVAL_R2_ACCOUNT_ID=...
|
||||
export EVAL_R2_ACCESS_KEY_ID=...
|
||||
export EVAL_R2_SECRET_ACCESS_KEY=...
|
||||
export EVAL_R2_BUCKET=browseros-eval
|
||||
export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
|
||||
```
|
||||
|
||||
`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
|
||||
|
||||
Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
|
||||
### BrowserOS infrastructure
|
||||
|
||||
```json
|
||||
@@ -119,7 +215,7 @@ The `apiKey` field supports two formats:
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": true
|
||||
"headless": false
|
||||
}
|
||||
```
|
||||
|
||||
@@ -137,10 +233,12 @@ Each worker gets its own Chrome instance. Worker N uses `base_port + N` for CDP
|
||||
|
||||
| File | Tasks | Description |
|
||||
|------|-------|-------------|
|
||||
| `agisdk-daily-10.jsonl` | 10 | Daily AGI SDK / REAL Bench subset |
|
||||
| `webvoyager.jsonl` | 643 | Full WebVoyager benchmark |
|
||||
| `mind2web.jsonl` | 300 | Online-Mind2Web |
|
||||
| `webbench-{0,1,2}of4-50.jsonl` | 50 each | WebBench shards (50-task subsets) |
|
||||
| `agisdk-real.jsonl` | 40 | AGI SDK / REAL Bench (action-only tasks) |
|
||||
| `agisdk-real-smoke.jsonl` | 1 | AGI SDK / REAL Bench smoke task |
|
||||
| `agisdk-real.jsonl` | 36 | AGI SDK / REAL Bench (action-only tasks) |
|
||||
| `webarena-infinity-hard-50.jsonl` | 50 | WebArena-Infinity hard set |
|
||||
| `browsecomp-medium-hard-50.jsonl` | 50 | BrowseComp medium-hard |
|
||||
| `browsecomp-very-hard-50.jsonl` | 50 | BrowseComp very-hard |
|
||||
@@ -167,14 +265,47 @@ results/
|
||||
browseros-agent-weekly/
|
||||
2026-04-29-1430/
|
||||
Amazon--0/
|
||||
attempt.json # Stable attempt summary for viewer/reporting
|
||||
metadata.json # Task result, timing, grader scores
|
||||
grades.json # Compact grader results
|
||||
messages.jsonl # Full message log
|
||||
grader-artifacts/ # Grader-specific inputs/outputs/stderr
|
||||
screenshots/
|
||||
001.png # Step-by-step screenshots
|
||||
002.png
|
||||
summary.json # Aggregate pass rates
|
||||
```
|
||||
|
||||
R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
|
||||
### R2 viewer manifest
|
||||
|
||||
`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": 2,
|
||||
"runId": "agisdk-real-smoke-2026-04-30-0000",
|
||||
"tasks": [
|
||||
{
|
||||
"queryId": "agisdk-dashdish-10",
|
||||
"paths": {
|
||||
"metadata": "tasks/agisdk-dashdish-10/metadata.json",
|
||||
"messages": "tasks/agisdk-dashdish-10/messages.jsonl",
|
||||
"grades": "tasks/agisdk-dashdish-10/grades.json",
|
||||
"trace": "tasks/agisdk-dashdish-10/trace.jsonl",
|
||||
"screenshots": "tasks/agisdk-dashdish-10/screenshots",
|
||||
"graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
|
||||
|
||||
Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**BrowserOS not found**: Expects `/Applications/BrowserOS.app/Contents/MacOS/BrowserOS`. Set `BROWSEROS_BINARY` to override.
|
||||
|
||||
26
packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
vendored
Normal file
26
packages/browseros-agent/apps/eval/configs/legacy/agisdk-real-smoke.json
vendored
Normal file
@@ -0,0 +1,26 @@
|
||||
{
|
||||
"agent": {
|
||||
"type": "single",
|
||||
"provider": "openai-compatible",
|
||||
"model": "moonshotai/kimi-k2.5",
|
||||
"apiKey": "OPENROUTER_API_KEY",
|
||||
"baseUrl": "https://openrouter.ai/api/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../../data/agisdk-real-smoke.jsonl",
|
||||
"num_workers": 1,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"timeout_ms": 1800000
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
"baseUrl": "https://api.fireworks.ai/inference/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../data/agisdk-real.jsonl",
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"num_workers": 4,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -7,7 +7,7 @@
|
||||
"baseUrl": "https://openrouter.ai/api/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../data/webbench-2of4-50.jsonl",
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"num_workers": 10,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -21,6 +21,6 @@
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
},
|
||||
"graders": ["performance_grader"],
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"timeout_ms": 1800000
|
||||
}
|
||||
@@ -14,7 +14,7 @@
|
||||
"baseUrl": "https://api.fireworks.ai/inference/v1"
|
||||
}
|
||||
},
|
||||
"dataset": "../data/webbench-2of4-50.jsonl",
|
||||
"dataset": "../../data/webbench-2of4-50.jsonl",
|
||||
"num_workers": 10,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -14,7 +14,7 @@
|
||||
"baseUrl": "https://clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef.modal.run"
|
||||
}
|
||||
},
|
||||
"dataset": "../data/agisdk-real.jsonl",
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"num_workers": 10,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -23,7 +23,7 @@
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": true
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
22
packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
vendored
Normal file
22
packages/browseros-agent/apps/eval/configs/legacy/claude-code-agisdk-real.json
vendored
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"agent": {
|
||||
"type": "claude-code",
|
||||
"model": "opus"
|
||||
},
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"num_workers": 1,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"timeout_ms": 1800000
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
"baseUrl": "https://openrouter.ai/api/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../data/webarena-infinity-hard-50.jsonl",
|
||||
"dataset": "../../data/webarena-infinity-hard-50.jsonl",
|
||||
"num_workers": 10,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -5,7 +5,7 @@
|
||||
"model": "openai/gpt-4.1",
|
||||
"apiKey": "OPENROUTER_API_KEY"
|
||||
},
|
||||
"dataset": "../data/mind2web.jsonl",
|
||||
"dataset": "../../data/mind2web.jsonl",
|
||||
"num_workers": 5,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -7,7 +7,7 @@
|
||||
"baseUrl": "https://api.fireworks.ai/inference/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../data/webvoyager.jsonl",
|
||||
"dataset": "../../data/webvoyager.jsonl",
|
||||
"num_workers": 3,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
vendored
Normal file
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-daily-10.json
vendored
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "agisdk-daily-10",
|
||||
"dataset": "../../data/agisdk-daily-10.jsonl",
|
||||
"agent": {
|
||||
"type": "single"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"workers": 1,
|
||||
"restartBrowserPerTask": true,
|
||||
"timeoutMs": 1800000,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
}
|
||||
}
|
||||
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
vendored
Normal file
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-real-smoke.json
vendored
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "agisdk-real-smoke",
|
||||
"dataset": "../../data/agisdk-real-smoke.jsonl",
|
||||
"agent": {
|
||||
"type": "single"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"workers": 1,
|
||||
"restartBrowserPerTask": true,
|
||||
"timeoutMs": 1800000,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
}
|
||||
}
|
||||
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
vendored
Normal file
22
packages/browseros-agent/apps/eval/configs/suites/agisdk-real.json
vendored
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "agisdk-real",
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"agent": {
|
||||
"type": "single"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"workers": 1,
|
||||
"restartBrowserPerTask": true,
|
||||
"timeoutMs": 1800000,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
}
|
||||
}
|
||||
10
packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
vendored
Normal file
10
packages/browseros-agent/apps/eval/data/agisdk-daily-10.jsonl
vendored
Normal file
@@ -0,0 +1,10 @@
|
||||
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
|
||||
{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/30, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
|
||||
{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
|
||||
{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
|
||||
{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
|
||||
{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
|
||||
{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
|
||||
{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
|
||||
{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
|
||||
{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}
|
||||
1
packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
vendored
Normal file
1
packages/browseros-agent/apps/eval/data/agisdk-real-smoke.jsonl
vendored
Normal file
@@ -0,0 +1 @@
|
||||
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
|
||||
@@ -5,6 +5,7 @@
|
||||
"type": "module",
|
||||
"scripts": {
|
||||
"eval": "bun --env-file=.env.development run src/index.ts",
|
||||
"test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
|
||||
"typecheck": "tsc --noEmit"
|
||||
},
|
||||
"dependencies": {
|
||||
|
||||
@@ -1,349 +1,43 @@
|
||||
#!/usr/bin/env bun
|
||||
|
||||
/**
|
||||
* Upload eval runs to R2.
|
||||
*
|
||||
* Two modes:
|
||||
* bun scripts/upload-run.ts results/browseros-agent-weekly/2026-03-21-1730
|
||||
* → uploads that specific run
|
||||
*
|
||||
* bun scripts/upload-run.ts results/browseros-agent-weekly
|
||||
* → finds all timestamped subfolders, uploads any not yet in R2
|
||||
*
|
||||
* Env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY
|
||||
* EVAL_R2_BUCKET (default: browseros-eval)
|
||||
* EVAL_R2_CDN_BASE_URL (default: https://eval.browseros.com)
|
||||
*/
|
||||
|
||||
import { readdir, readFile, stat } from 'node:fs/promises'
|
||||
import { basename, dirname, extname, join } from 'node:path'
|
||||
import {
|
||||
GetObjectCommand,
|
||||
PutObjectCommand,
|
||||
S3Client,
|
||||
} from '@aws-sdk/client-s3'
|
||||
loadR2ConfigFromEnv,
|
||||
R2Publisher,
|
||||
} from '../src/publishing/r2-publisher'
|
||||
|
||||
const CONCURRENCY = 20
|
||||
|
||||
const CONTENT_TYPES: Record<string, string> = {
|
||||
'.json': 'application/json',
|
||||
'.jsonl': 'application/x-ndjson',
|
||||
'.png': 'image/png',
|
||||
}
|
||||
|
||||
interface R2Config {
|
||||
accountId: string
|
||||
accessKeyId: string
|
||||
secretAccessKey: string
|
||||
bucket: string
|
||||
cdnBaseUrl: string
|
||||
}
|
||||
|
||||
function loadConfig(): R2Config {
|
||||
const accountId = process.env.EVAL_R2_ACCOUNT_ID
|
||||
const accessKeyId = process.env.EVAL_R2_ACCESS_KEY_ID
|
||||
const secretAccessKey = process.env.EVAL_R2_SECRET_ACCESS_KEY
|
||||
|
||||
if (!accountId || !accessKeyId || !secretAccessKey) {
|
||||
console.error(
|
||||
'Missing required env vars: EVAL_R2_ACCOUNT_ID, EVAL_R2_ACCESS_KEY_ID, EVAL_R2_SECRET_ACCESS_KEY',
|
||||
)
|
||||
process.exit(1)
|
||||
}
|
||||
|
||||
return {
|
||||
accountId,
|
||||
accessKeyId,
|
||||
secretAccessKey,
|
||||
bucket: process.env.EVAL_R2_BUCKET || 'browseros-eval',
|
||||
cdnBaseUrl: (
|
||||
process.env.EVAL_R2_CDN_BASE_URL || 'https://eval.browseros.com'
|
||||
).replace(/\/+$/, ''),
|
||||
}
|
||||
}
|
||||
|
||||
function createClient(config: R2Config): S3Client {
|
||||
return new S3Client({
|
||||
region: 'auto',
|
||||
endpoint: `https://${config.accountId}.r2.cloudflarestorage.com`,
|
||||
credentials: {
|
||||
accessKeyId: config.accessKeyId,
|
||||
secretAccessKey: config.secretAccessKey,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
async function upload(
|
||||
client: S3Client,
|
||||
bucket: string,
|
||||
key: string,
|
||||
body: Buffer,
|
||||
contentType: string,
|
||||
) {
|
||||
await client.send(
|
||||
new PutObjectCommand({
|
||||
Bucket: bucket,
|
||||
Key: key,
|
||||
Body: body,
|
||||
ContentType: contentType,
|
||||
}),
|
||||
)
|
||||
}
|
||||
|
||||
async function collectFiles(dir: string): Promise<string[]> {
|
||||
const files: string[] = []
|
||||
const entries = await readdir(dir, { withFileTypes: true })
|
||||
for (const entry of entries) {
|
||||
const full = join(dir, entry.name)
|
||||
if (entry.isDirectory()) {
|
||||
files.push(...(await collectFiles(full)))
|
||||
} else {
|
||||
files.push(full)
|
||||
}
|
||||
}
|
||||
return files
|
||||
}
|
||||
|
||||
async function runPool<T>(
|
||||
items: T[],
|
||||
concurrency: number,
|
||||
fn: (item: T) => Promise<void>,
|
||||
) {
|
||||
let i = 0
|
||||
const workers = Array.from({ length: concurrency }, async () => {
|
||||
while (i < items.length) {
|
||||
const idx = i++
|
||||
await fn(items[idx])
|
||||
}
|
||||
})
|
||||
await Promise.all(workers)
|
||||
}
|
||||
|
||||
// Check if a run has already been uploaded to R2
|
||||
async function isUploaded(
|
||||
client: S3Client,
|
||||
bucket: string,
|
||||
runId: string,
|
||||
): Promise<boolean> {
|
||||
try {
|
||||
await client.send(
|
||||
new GetObjectCommand({
|
||||
Bucket: bucket,
|
||||
Key: `runs/${runId}/manifest.json`,
|
||||
}),
|
||||
)
|
||||
return true
|
||||
} catch {
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
// Detect if a directory is a run dir (has task subdirs with metadata.json)
|
||||
// vs a config dir (has timestamped subdirs like 2026-03-21-1730/)
|
||||
async function isRunDir(dir: string): Promise<boolean> {
|
||||
const entries = await readdir(dir, { withFileTypes: true })
|
||||
const subdirs = entries.filter((e) => e.isDirectory())
|
||||
for (const subdir of subdirs) {
|
||||
const metaPath = join(dir, subdir.name, 'metadata.json')
|
||||
const metaStat = await stat(metaPath).catch(() => null)
|
||||
if (metaStat?.isFile()) return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
async function uploadSingleRun(
|
||||
runDir: string,
|
||||
runId: string,
|
||||
r2Config: R2Config,
|
||||
client: S3Client,
|
||||
): Promise<void> {
|
||||
const taskDirs = await readdir(runDir, { withFileTypes: true })
|
||||
const taskEntries = taskDirs.filter((d) => d.isDirectory())
|
||||
|
||||
if (taskEntries.length === 0) {
|
||||
console.warn(` No task subdirectories in ${runId}, skipping`)
|
||||
return
|
||||
}
|
||||
|
||||
const manifestTasks: Record<string, unknown>[] = []
|
||||
const jobs: { key: string; filePath: string; contentType: string }[] = []
|
||||
|
||||
// Extract agent config from first task
|
||||
let agentConfig: Record<string, unknown> | undefined
|
||||
let dataset: string | undefined
|
||||
|
||||
for (const taskDir of taskEntries) {
|
||||
const taskId = taskDir.name
|
||||
const taskPath = join(runDir, taskId)
|
||||
const metaPath = join(taskPath, 'metadata.json')
|
||||
|
||||
let meta: Record<string, unknown> = {}
|
||||
try {
|
||||
meta = JSON.parse(await readFile(metaPath, 'utf-8'))
|
||||
} catch {
|
||||
continue
|
||||
}
|
||||
|
||||
if (!agentConfig && meta.agent_config)
|
||||
agentConfig = meta.agent_config as Record<string, unknown>
|
||||
if (!dataset && meta.dataset) dataset = meta.dataset as string
|
||||
|
||||
const files = await collectFiles(taskPath)
|
||||
let screenshotCount = 0
|
||||
|
||||
for (const file of files) {
|
||||
const relative = file.slice(taskPath.length + 1)
|
||||
const ext = extname(file)
|
||||
if (relative.startsWith('screenshots/') && ext === '.png')
|
||||
screenshotCount++
|
||||
|
||||
jobs.push({
|
||||
key: `runs/${runId}/${taskId}/${relative}`,
|
||||
filePath: file,
|
||||
contentType: CONTENT_TYPES[ext] || 'application/octet-stream',
|
||||
})
|
||||
}
|
||||
|
||||
manifestTasks.push({
|
||||
queryId: meta.query_id || taskId,
|
||||
query: meta.query || '',
|
||||
startUrl: meta.start_url || '',
|
||||
status:
|
||||
meta.termination_reason === 'completed'
|
||||
? 'completed'
|
||||
: meta.termination_reason || 'unknown',
|
||||
durationMs: meta.total_duration_ms || 0,
|
||||
screenshotCount: (meta.screenshot_count as number) || screenshotCount,
|
||||
graderResults: meta.grader_results || {},
|
||||
})
|
||||
}
|
||||
|
||||
if (manifestTasks.length === 0) {
|
||||
console.warn(` No completed tasks in ${runId}, skipping`)
|
||||
return
|
||||
}
|
||||
|
||||
console.log(
|
||||
` Uploading ${jobs.length} files across ${manifestTasks.length} tasks...`,
|
||||
)
|
||||
|
||||
let uploaded = 0
|
||||
await runPool(jobs, CONCURRENCY, async (job) => {
|
||||
const body = await readFile(job.filePath)
|
||||
await upload(client, r2Config.bucket, job.key, body, job.contentType)
|
||||
uploaded++
|
||||
if (uploaded % 50 === 0 || uploaded === jobs.length) {
|
||||
console.log(` ${uploaded}/${jobs.length}`)
|
||||
}
|
||||
})
|
||||
|
||||
// Read summary.json if it exists
|
||||
let summaryData: Record<string, unknown> | undefined
|
||||
try {
|
||||
summaryData = JSON.parse(
|
||||
await readFile(join(runDir, 'summary.json'), 'utf-8'),
|
||||
)
|
||||
} catch {}
|
||||
|
||||
// Upload manifest
|
||||
const manifest = {
|
||||
runId,
|
||||
uploadedAt: new Date().toISOString(),
|
||||
agentConfig,
|
||||
dataset,
|
||||
summary: summaryData
|
||||
? {
|
||||
passRate: summaryData.passRate,
|
||||
avgDurationMs: summaryData.avgDurationMs,
|
||||
}
|
||||
: undefined,
|
||||
tasks: manifestTasks,
|
||||
}
|
||||
const manifestBody = Buffer.from(JSON.stringify(manifest, null, 2))
|
||||
await upload(
|
||||
client,
|
||||
r2Config.bucket,
|
||||
`runs/${runId}/manifest.json`,
|
||||
manifestBody,
|
||||
'application/json',
|
||||
)
|
||||
|
||||
// Upload viewer.html to bucket root
|
||||
const viewerPath = join(
|
||||
import.meta.dir,
|
||||
'..',
|
||||
'src',
|
||||
'dashboard',
|
||||
'viewer.html',
|
||||
)
|
||||
const viewerBody = await readFile(viewerPath)
|
||||
await upload(client, r2Config.bucket, 'viewer.html', viewerBody, 'text/html')
|
||||
|
||||
console.log(` Uploaded ${uploaded + 2} files`)
|
||||
console.log(` ${r2Config.cdnBaseUrl}/viewer.html?run=${runId}`)
|
||||
}
|
||||
|
||||
async function main() {
|
||||
async function main(): Promise<void> {
|
||||
const inputDir = process.argv[2]
|
||||
if (!inputDir) {
|
||||
console.error(
|
||||
throw new Error(
|
||||
'Usage:\n' +
|
||||
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730 (specific run)\n' +
|
||||
' bun scripts/upload-run.ts results/config-name (all un-uploaded runs)',
|
||||
)
|
||||
process.exit(1)
|
||||
}
|
||||
|
||||
const dirStat = await stat(inputDir).catch(() => null)
|
||||
if (!dirStat?.isDirectory()) {
|
||||
console.error(`Not a directory: ${inputDir}`)
|
||||
process.exit(1)
|
||||
}
|
||||
|
||||
const r2Config = loadConfig()
|
||||
const client = createClient(r2Config)
|
||||
|
||||
if (await isRunDir(inputDir)) {
|
||||
// Single run: results/config-name/2026-03-21-1730
|
||||
const timestamp = basename(inputDir)
|
||||
const configName = basename(dirname(inputDir))
|
||||
const runId = `${configName}-${timestamp}`
|
||||
console.log(`Uploading run: ${runId}`)
|
||||
await uploadSingleRun(inputDir, runId, r2Config, client)
|
||||
} else {
|
||||
// Config dir: results/config-name/ — upload all un-uploaded runs
|
||||
const configName = basename(inputDir)
|
||||
const entries = await readdir(inputDir, { withFileTypes: true })
|
||||
const runDirs = entries
|
||||
.filter((e) => e.isDirectory())
|
||||
.map((e) => e.name)
|
||||
.sort()
|
||||
|
||||
if (runDirs.length === 0) {
|
||||
console.error('No run subdirectories found')
|
||||
process.exit(1)
|
||||
}
|
||||
|
||||
console.log(
|
||||
`Found ${runDirs.length} runs for config "${configName}", checking R2...`,
|
||||
)
|
||||
|
||||
let uploadedCount = 0
|
||||
for (const dir of runDirs) {
|
||||
const runId = `${configName}-${dir}`
|
||||
const alreadyUploaded = await isUploaded(client, r2Config.bucket, runId)
|
||||
if (alreadyUploaded) {
|
||||
console.log(` ${runId}: already uploaded, skipping`)
|
||||
continue
|
||||
}
|
||||
|
||||
console.log(` ${runId}: uploading...`)
|
||||
await uploadSingleRun(join(inputDir, dir), runId, r2Config, client)
|
||||
uploadedCount++
|
||||
}
|
||||
|
||||
console.log(
|
||||
`\nDone. Uploaded ${uploadedCount} new run(s), ${runDirs.length - uploadedCount} already in R2.`,
|
||||
' bun scripts/upload-run.ts results/config-name/2026-03-21-1730\n' +
|
||||
' bun scripts/upload-run.ts results/config-name',
|
||||
)
|
||||
}
|
||||
|
||||
const publisher = new R2Publisher({ config: loadR2ConfigFromEnv() })
|
||||
const result = await publisher.publishPath(inputDir)
|
||||
for (const run of result.uploadedRuns) {
|
||||
console.log(`Uploaded ${run.uploadedFiles} files for ${run.runId}`)
|
||||
console.log(run.viewerUrl)
|
||||
}
|
||||
for (const runId of result.skippedRuns) {
|
||||
console.log(`${runId}: already uploaded, skipping`)
|
||||
}
|
||||
console.log(
|
||||
`Done. Uploaded ${result.uploadedRuns.length} run(s), skipped ${result.skippedRuns.length}.`,
|
||||
)
|
||||
}
|
||||
|
||||
main()
|
||||
main().catch((error) => {
|
||||
console.error(error instanceof Error ? error.message : String(error))
|
||||
process.exit(1)
|
||||
})
|
||||
|
||||
@@ -24,45 +24,11 @@ import {
|
||||
PutObjectCommand,
|
||||
S3Client,
|
||||
} from '@aws-sdk/client-s3'
|
||||
|
||||
interface ManifestTask {
|
||||
queryId: string
|
||||
query: string
|
||||
status: string
|
||||
durationMs: number
|
||||
screenshotCount: number
|
||||
graderResults: Record<string, { pass: boolean; score: number }>
|
||||
}
|
||||
|
||||
interface Manifest {
|
||||
runId: string
|
||||
uploadedAt: string
|
||||
agentConfig?: { type?: string; model?: string }
|
||||
dataset?: string
|
||||
summary?: { passRate?: number; avgDurationMs?: number }
|
||||
tasks: ManifestTask[]
|
||||
}
|
||||
|
||||
interface RunSummary {
|
||||
runId: string
|
||||
configName: string
|
||||
date: string
|
||||
avgScore: number
|
||||
total: number
|
||||
completed: number
|
||||
failed: number
|
||||
timeout: number
|
||||
avgDurationMs: number
|
||||
model: string
|
||||
dataset: string
|
||||
agentType: string
|
||||
}
|
||||
|
||||
const PASS_FAIL_GRADER_ORDER = [
|
||||
'agisdk_state_diff',
|
||||
'infinity_state',
|
||||
'performance_grader',
|
||||
]
|
||||
import {
|
||||
buildRunSummaries,
|
||||
type ReportManifest,
|
||||
type RunSummary,
|
||||
} from '../src/reporting/run-summary'
|
||||
|
||||
function requireEnv(name: string): string {
|
||||
const value = process.env[name]
|
||||
@@ -87,7 +53,7 @@ const client = new S3Client({
|
||||
// Step 1: List all manifest.json files in runs/
|
||||
console.log('Scanning R2 for eval runs...')
|
||||
|
||||
const manifests: Manifest[] = []
|
||||
const manifests: ReportManifest[] = []
|
||||
let continuationToken: string | undefined
|
||||
|
||||
do {
|
||||
@@ -127,64 +93,9 @@ if (manifests.length === 0) {
|
||||
}
|
||||
|
||||
// Step 2: Build run summaries
|
||||
const runs: RunSummary[] = manifests
|
||||
.map((m) => {
|
||||
const total = m.tasks.length
|
||||
const completed = m.tasks.filter((t) => t.status === 'completed').length
|
||||
const failed = m.tasks.filter((t) => t.status === 'failed').length
|
||||
const timeout = m.tasks.filter((t) => t.status === 'timeout').length
|
||||
|
||||
let scoredCount = 0
|
||||
let scoreSum = 0
|
||||
for (const task of m.tasks) {
|
||||
if (!task.graderResults) continue
|
||||
for (const name of PASS_FAIL_GRADER_ORDER) {
|
||||
if (task.graderResults[name]) {
|
||||
scoredCount++
|
||||
scoreSum += task.graderResults[name].score ?? 0
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
|
||||
const durations = m.tasks
|
||||
.filter((t) => t.durationMs > 0)
|
||||
.map((t) => t.durationMs)
|
||||
const avgDurationMs =
|
||||
durations.length > 0
|
||||
? durations.reduce((a, b) => a + b, 0) / durations.length
|
||||
: 0
|
||||
|
||||
const date = m.uploadedAt
|
||||
? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
|
||||
: m.runId.slice(0, 15)
|
||||
|
||||
const model = m.agentConfig?.model || 'unknown'
|
||||
const dataset = m.dataset || m.runId
|
||||
const agentType = m.agentConfig?.type || 'unknown'
|
||||
|
||||
const configName = extractConfigName(m.runId)
|
||||
return {
|
||||
runId: m.runId,
|
||||
configName,
|
||||
date,
|
||||
avgScore,
|
||||
total,
|
||||
completed,
|
||||
failed,
|
||||
timeout,
|
||||
avgDurationMs,
|
||||
model,
|
||||
dataset,
|
||||
agentType,
|
||||
}
|
||||
})
|
||||
.sort((a, b) => a.date.localeCompare(b.date))
|
||||
const runs: RunSummary[] = buildRunSummaries(manifests)
|
||||
|
||||
// Step 3: Identify unique config groups
|
||||
// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
|
||||
// Extract config name by stripping the date-time suffix pattern
|
||||
function escHtml(s: string): string {
|
||||
return s
|
||||
.replace(/&/g, '&')
|
||||
@@ -193,12 +104,6 @@ function escHtml(s: string): string {
|
||||
.replace(/"/g, '"')
|
||||
}
|
||||
|
||||
function extractConfigName(runId: string): string {
|
||||
// "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
|
||||
// "ci-weekly" → "ci-weekly" (no timestamp, old format)
|
||||
return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
|
||||
}
|
||||
|
||||
const configGroups = [...new Set(runs.map((r) => r.configName))]
|
||||
const defaultConfig = configGroups.includes('ci-weekly')
|
||||
? 'ci-weekly'
|
||||
|
||||
238
packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
vendored
Normal file
238
packages/browseros-agent/apps/eval/src/agents/claude-code/index.ts
vendored
Normal file
@@ -0,0 +1,238 @@
|
||||
import { writeFile } from 'node:fs/promises'
|
||||
import { join } from 'node:path'
|
||||
import { DEFAULT_TIMEOUT_MS } from '../../constants'
|
||||
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
|
||||
import { withEvalTimeout } from '../../utils/with-eval-timeout'
|
||||
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
|
||||
import {
|
||||
type ClaudeCodeProcessRunner,
|
||||
createClaudeCodeProcessRunner,
|
||||
} from './process-runner'
|
||||
import {
|
||||
ClaudeCodeStreamParser,
|
||||
shouldCaptureScreenshotForTool,
|
||||
} from './stream-parser'
|
||||
|
||||
export interface ClaudeCodeEvaluatorDeps {
|
||||
processRunner?: ClaudeCodeProcessRunner
|
||||
}
|
||||
|
||||
export class ClaudeCodeEvaluator implements AgentEvaluator {
|
||||
private processRunner: ClaudeCodeProcessRunner
|
||||
|
||||
constructor(
|
||||
private ctx: AgentContext,
|
||||
deps: ClaudeCodeEvaluatorDeps = {},
|
||||
) {
|
||||
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
|
||||
}
|
||||
|
||||
async execute(): Promise<AgentResult> {
|
||||
const { config, task, capture, taskOutputDir } = this.ctx
|
||||
const startTime = Date.now()
|
||||
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
|
||||
|
||||
await capture.messageLogger.logUser(task.query)
|
||||
|
||||
if (config.agent.type !== 'claude-code') {
|
||||
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
|
||||
}
|
||||
const agentConfig = config.agent
|
||||
|
||||
const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
|
||||
await writeFile(
|
||||
mcpConfigPath,
|
||||
JSON.stringify(
|
||||
buildClaudeCodeMcpConfig(config.browseros.server_url),
|
||||
null,
|
||||
2,
|
||||
),
|
||||
)
|
||||
|
||||
const parser = new ClaudeCodeStreamParser()
|
||||
const toolNamesById = new Map<string, string>()
|
||||
const prompt = buildClaudeCodePrompt(task.query)
|
||||
const args = buildClaudeCodeArgs({
|
||||
prompt,
|
||||
mcpConfigPath,
|
||||
config: agentConfig,
|
||||
})
|
||||
|
||||
const { terminationReason } = await withEvalTimeout(
|
||||
timeoutMs,
|
||||
capture,
|
||||
async (signal) => {
|
||||
const runResult = await this.processRunner.run({
|
||||
executable: agentConfig.claudePath,
|
||||
args,
|
||||
cwd: taskOutputDir,
|
||||
signal,
|
||||
onStdoutLine: async (line) => {
|
||||
const events = parser.pushLine(line)
|
||||
for (const event of events) {
|
||||
await this.handleStreamEvent(event, toolNamesById)
|
||||
}
|
||||
},
|
||||
})
|
||||
|
||||
if (runResult.exitCode !== 0) {
|
||||
const message =
|
||||
runResult.stderr.trim() ||
|
||||
`Claude Code exited with status ${runResult.exitCode}`
|
||||
capture.addError('agent_execution', message, {
|
||||
exitCode: runResult.exitCode,
|
||||
})
|
||||
if (!parser.getLastText()) {
|
||||
throw new Error(message)
|
||||
}
|
||||
}
|
||||
|
||||
for (const error of runResult.streamErrors ?? []) {
|
||||
capture.addWarning(
|
||||
'message_logging',
|
||||
`Claude Code stream event processing failed: ${error}`,
|
||||
)
|
||||
}
|
||||
|
||||
return runResult
|
||||
},
|
||||
)
|
||||
|
||||
const endTime = Date.now()
|
||||
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
|
||||
const metadata = {
|
||||
query_id: task.query_id,
|
||||
dataset: task.dataset,
|
||||
query: task.query,
|
||||
started_at: new Date(startTime).toISOString(),
|
||||
completed_at: new Date(endTime).toISOString(),
|
||||
total_duration_ms: endTime - startTime,
|
||||
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
|
||||
termination_reason: terminationReason,
|
||||
final_answer: finalAnswer,
|
||||
errors: capture.getErrors(),
|
||||
warnings: capture.getWarnings(),
|
||||
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
|
||||
agent_config: {
|
||||
type: 'claude-code' as const,
|
||||
model: agentConfig.model,
|
||||
},
|
||||
grader_results: {},
|
||||
}
|
||||
|
||||
await capture.trajectorySaver.saveMetadata(metadata)
|
||||
|
||||
return {
|
||||
metadata,
|
||||
messages: capture.getMessages(),
|
||||
finalAnswer,
|
||||
}
|
||||
}
|
||||
|
||||
private async handleStreamEvent(
|
||||
event: UIMessageStreamEvent,
|
||||
toolNamesById: Map<string, string>,
|
||||
): Promise<void> {
|
||||
const { capture, task } = this.ctx
|
||||
let screenshot: number | undefined
|
||||
|
||||
if (event.type === 'tool-input-available') {
|
||||
toolNamesById.set(event.toolCallId, event.toolName)
|
||||
if (isPageInput(event.input)) {
|
||||
capture.setActivePageId(event.input.page)
|
||||
}
|
||||
}
|
||||
|
||||
if (
|
||||
event.type === 'tool-output-available' ||
|
||||
event.type === 'tool-output-error'
|
||||
) {
|
||||
const toolName = toolNamesById.get(event.toolCallId)
|
||||
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
|
||||
screenshot = await this.captureScreenshot()
|
||||
}
|
||||
}
|
||||
|
||||
await capture.messageLogger.logStreamEvent(event, screenshot)
|
||||
capture.emitEvent(task.query_id, {
|
||||
...event,
|
||||
...(screenshot !== undefined && { screenshot }),
|
||||
})
|
||||
}
|
||||
|
||||
private async captureScreenshot(): Promise<number | undefined> {
|
||||
const { capture, task } = this.ctx
|
||||
try {
|
||||
const screenshot = await capture.screenshot.capture(
|
||||
capture.getActivePageId(),
|
||||
)
|
||||
capture.emitEvent(task.query_id, {
|
||||
type: 'screenshot-captured',
|
||||
screenshot,
|
||||
})
|
||||
return screenshot
|
||||
} catch {
|
||||
return undefined
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function isPageInput(input: unknown): input is { page: number } {
|
||||
return (
|
||||
typeof input === 'object' &&
|
||||
input !== null &&
|
||||
'page' in input &&
|
||||
typeof input.page === 'number'
|
||||
)
|
||||
}
|
||||
|
||||
function buildClaudeCodePrompt(taskQuery: string): string {
|
||||
return [
|
||||
'You are running inside BrowserOS eval.',
|
||||
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
|
||||
'When the task is complete, respond with the final answer only.',
|
||||
'If blocked, explain the blocker clearly.',
|
||||
'',
|
||||
`Task: ${taskQuery}`,
|
||||
].join('\n')
|
||||
}
|
||||
|
||||
function buildClaudeCodeArgs({
|
||||
prompt,
|
||||
mcpConfigPath,
|
||||
config,
|
||||
}: {
|
||||
prompt: string
|
||||
mcpConfigPath: string
|
||||
config: ClaudeCodeAgentConfig
|
||||
}): string[] {
|
||||
const args = [
|
||||
'-p',
|
||||
prompt,
|
||||
'--mcp-config',
|
||||
mcpConfigPath,
|
||||
'--strict-mcp-config',
|
||||
'--output-format',
|
||||
'stream-json',
|
||||
'--verbose',
|
||||
]
|
||||
|
||||
if (config.model) args.push('--model', config.model)
|
||||
args.push(...config.extraArgs)
|
||||
|
||||
return args
|
||||
}
|
||||
|
||||
function buildClaudeCodeMcpConfig(serverUrl: string) {
|
||||
const trimmed = serverUrl.replace(/\/$/, '')
|
||||
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
|
||||
return {
|
||||
mcpServers: {
|
||||
browseros: {
|
||||
type: 'http',
|
||||
url,
|
||||
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
|
||||
},
|
||||
},
|
||||
}
|
||||
}
|
||||
114
packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
vendored
Normal file
114
packages/browseros-agent/apps/eval/src/agents/claude-code/process-runner.ts
vendored
Normal file
@@ -0,0 +1,114 @@
|
||||
export interface ClaudeCodeRunOptions {
|
||||
executable: string
|
||||
args: string[]
|
||||
cwd: string
|
||||
signal?: AbortSignal
|
||||
onStdoutLine: (line: string) => Promise<void>
|
||||
}
|
||||
|
||||
export interface ClaudeCodeRunResult {
|
||||
exitCode: number
|
||||
stderr: string
|
||||
streamErrors?: string[]
|
||||
}
|
||||
|
||||
export interface ClaudeCodeProcessRunner {
|
||||
run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
|
||||
}
|
||||
|
||||
export interface SpawnOptions {
|
||||
cwd: string
|
||||
signal?: AbortSignal
|
||||
onStdoutLine: (line: string) => Promise<void>
|
||||
}
|
||||
|
||||
export interface CreateClaudeCodeProcessRunnerDeps {
|
||||
spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
|
||||
}
|
||||
|
||||
export function createClaudeCodeProcessRunner(
|
||||
deps: CreateClaudeCodeProcessRunnerDeps = {},
|
||||
): ClaudeCodeProcessRunner {
|
||||
const spawn = deps.spawn ?? spawnClaudeCode
|
||||
return {
|
||||
run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
|
||||
spawn([executable, ...args], { cwd, signal, onStdoutLine }),
|
||||
}
|
||||
}
|
||||
|
||||
async function spawnClaudeCode(
|
||||
cmd: string[],
|
||||
options: SpawnOptions,
|
||||
): Promise<ClaudeCodeRunResult> {
|
||||
const proc = Bun.spawn({
|
||||
cmd,
|
||||
cwd: options.cwd,
|
||||
stdin: 'ignore',
|
||||
stdout: 'pipe',
|
||||
stderr: 'pipe',
|
||||
})
|
||||
|
||||
const abort = () => {
|
||||
try {
|
||||
proc.kill('SIGTERM')
|
||||
} catch {
|
||||
// Process may already have exited.
|
||||
}
|
||||
}
|
||||
options.signal?.addEventListener('abort', abort, { once: true })
|
||||
|
||||
try {
|
||||
const streamErrors: string[] = []
|
||||
const stdoutPromise = readLines(
|
||||
proc.stdout,
|
||||
options.onStdoutLine,
|
||||
streamErrors,
|
||||
)
|
||||
const stderrPromise = new Response(proc.stderr).text()
|
||||
const exitCode = await proc.exited
|
||||
await stdoutPromise
|
||||
const stderr = await stderrPromise
|
||||
return { exitCode, stderr, streamErrors }
|
||||
} finally {
|
||||
options.signal?.removeEventListener('abort', abort)
|
||||
}
|
||||
}
|
||||
|
||||
async function readLines(
|
||||
stream: ReadableStream<Uint8Array>,
|
||||
onLine: (line: string) => Promise<void>,
|
||||
streamErrors: string[],
|
||||
): Promise<void> {
|
||||
const reader = stream.getReader()
|
||||
const decoder = new TextDecoder()
|
||||
let buffer = ''
|
||||
|
||||
while (true) {
|
||||
const { done, value } = await reader.read()
|
||||
if (done) break
|
||||
|
||||
buffer += decoder.decode(value, { stream: true })
|
||||
const lines = buffer.split('\n')
|
||||
buffer = lines.pop() ?? ''
|
||||
for (const line of lines) {
|
||||
await emitLine(line, onLine, streamErrors)
|
||||
}
|
||||
}
|
||||
|
||||
buffer += decoder.decode()
|
||||
if (buffer.length > 0) {
|
||||
await emitLine(buffer, onLine, streamErrors)
|
||||
}
|
||||
}
|
||||
|
||||
async function emitLine(
|
||||
line: string,
|
||||
onLine: (line: string) => Promise<void>,
|
||||
streamErrors: string[],
|
||||
): Promise<void> {
|
||||
try {
|
||||
await onLine(line)
|
||||
} catch (error) {
|
||||
streamErrors.push(error instanceof Error ? error.message : String(error))
|
||||
}
|
||||
}
|
||||
142
packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
vendored
Normal file
142
packages/browseros-agent/apps/eval/src/agents/claude-code/stream-parser.ts
vendored
Normal file
@@ -0,0 +1,142 @@
|
||||
import { randomUUID } from 'node:crypto'
|
||||
import type { UIMessageStreamEvent } from '../../types'
|
||||
|
||||
type JsonObject = Record<string, unknown>
|
||||
|
||||
export class ClaudeCodeStreamParser {
|
||||
private lastText: string | null = null
|
||||
private toolCallCount = 0
|
||||
|
||||
pushLine(line: string): UIMessageStreamEvent[] {
|
||||
const trimmed = line.trim()
|
||||
if (!trimmed) return []
|
||||
|
||||
let parsed: unknown
|
||||
try {
|
||||
parsed = JSON.parse(trimmed)
|
||||
} catch {
|
||||
return []
|
||||
}
|
||||
|
||||
if (!isObject(parsed)) return []
|
||||
|
||||
if (parsed.type === 'assistant') {
|
||||
return this.parseAssistantMessage(parsed)
|
||||
}
|
||||
if (parsed.type === 'user') {
|
||||
return this.parseUserMessage(parsed)
|
||||
}
|
||||
if (parsed.type === 'result' && typeof parsed.result === 'string') {
|
||||
this.lastText = parsed.result
|
||||
}
|
||||
|
||||
return []
|
||||
}
|
||||
|
||||
getLastText(): string | null {
|
||||
return this.lastText
|
||||
}
|
||||
|
||||
getToolCallCount(): number {
|
||||
return this.toolCallCount
|
||||
}
|
||||
|
||||
private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
|
||||
const content = contentBlocks(message)
|
||||
const events: UIMessageStreamEvent[] = []
|
||||
|
||||
for (const block of content) {
|
||||
if (block.type === 'text' && typeof block.text === 'string') {
|
||||
const id = randomUUID()
|
||||
this.lastText = block.text
|
||||
events.push(
|
||||
{ type: 'text-start', id },
|
||||
{ type: 'text-delta', id, delta: block.text },
|
||||
{ type: 'text-end', id },
|
||||
)
|
||||
} else if (
|
||||
block.type === 'tool_use' &&
|
||||
typeof block.id === 'string' &&
|
||||
typeof block.name === 'string'
|
||||
) {
|
||||
this.toolCallCount++
|
||||
events.push({
|
||||
type: 'tool-input-available',
|
||||
toolCallId: block.id,
|
||||
toolName: block.name,
|
||||
input: block.input,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
return events
|
||||
}
|
||||
|
||||
private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
|
||||
const content = contentBlocks(message)
|
||||
const events: UIMessageStreamEvent[] = []
|
||||
|
||||
for (const block of content) {
|
||||
if (
|
||||
block.type !== 'tool_result' ||
|
||||
typeof block.tool_use_id !== 'string'
|
||||
) {
|
||||
continue
|
||||
}
|
||||
|
||||
if (block.is_error === true) {
|
||||
events.push({
|
||||
type: 'tool-output-error',
|
||||
toolCallId: block.tool_use_id,
|
||||
errorText: stringifyToolContent(block.content),
|
||||
})
|
||||
} else {
|
||||
events.push({
|
||||
type: 'tool-output-available',
|
||||
toolCallId: block.tool_use_id,
|
||||
output: normalizeToolContent(block.content),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
return events
|
||||
}
|
||||
}
|
||||
|
||||
export function shouldCaptureScreenshotForTool(toolName: string): boolean {
|
||||
if (!toolName.startsWith('mcp__browseros__')) return false
|
||||
return !toolName.endsWith('__take_screenshot')
|
||||
}
|
||||
|
||||
function contentBlocks(message: JsonObject): JsonObject[] {
|
||||
const inner = isObject(message.message) ? message.message : message
|
||||
return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
|
||||
}
|
||||
|
||||
function isObject(value: unknown): value is JsonObject {
|
||||
return typeof value === 'object' && value !== null
|
||||
}
|
||||
|
||||
function normalizeToolContent(content: unknown): unknown {
|
||||
if (!Array.isArray(content)) return content
|
||||
return content.map((item) => {
|
||||
if (
|
||||
isObject(item) &&
|
||||
item.type === 'text' &&
|
||||
typeof item.text === 'string'
|
||||
) {
|
||||
return item.text
|
||||
}
|
||||
return item
|
||||
})
|
||||
}
|
||||
|
||||
function stringifyToolContent(content: unknown): string {
|
||||
const normalized = normalizeToolContent(content)
|
||||
if (typeof normalized === 'string') return normalized
|
||||
try {
|
||||
return JSON.stringify(normalized)
|
||||
} catch {
|
||||
return String(normalized)
|
||||
}
|
||||
}
|
||||
@@ -1,3 +1,4 @@
|
||||
import { ClaudeCodeEvaluator } from './claude-code'
|
||||
import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
|
||||
import { SingleAgentEvaluator } from './single-agent'
|
||||
import type { AgentContext, AgentEvaluator } from './types'
|
||||
@@ -8,6 +9,8 @@ export function createAgent(context: AgentContext): AgentEvaluator {
|
||||
return new SingleAgentEvaluator(context)
|
||||
case 'orchestrator-executor':
|
||||
return new OrchestratorExecutorEvaluator(context)
|
||||
case 'claude-code':
|
||||
return new ClaudeCodeEvaluator(context)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -1,121 +1,67 @@
|
||||
import { randomUUID } from 'node:crypto'
|
||||
import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
|
||||
import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
|
||||
import { sleep } from '../../../../utils/sleep'
|
||||
import type {
|
||||
ExecutorConfig,
|
||||
ExecutorResult,
|
||||
} from '../../../orchestrator-executor/types'
|
||||
import type { ExecutorCallbacks } from '../../executor-backend'
|
||||
import {
|
||||
CLADO_REQUEST_TIMEOUT_MS,
|
||||
MAX_ACTIONS_PER_DELEGATION,
|
||||
} from '../../constants'
|
||||
import { McpClient, type McpToolResult } from '../../utils/mcp-client'
|
||||
import { sleep } from '../../utils/sleep'
|
||||
import type { ExecutorCallbacks } from './executor'
|
||||
import type { ExecutorConfig, ExecutorResult } from './types'
|
||||
|
||||
const CLADO_ACTION_PROVIDER = 'clado-action'
|
||||
const PAGE_SCOPED_TOOLS = new Set<string>([
|
||||
'take_screenshot',
|
||||
'evaluate_script',
|
||||
'click',
|
||||
'click_at',
|
||||
'hover',
|
||||
'hover_at',
|
||||
'clear',
|
||||
'fill',
|
||||
'press_key',
|
||||
'type_at',
|
||||
'drag',
|
||||
'drag_at',
|
||||
'scroll',
|
||||
'handle_dialog',
|
||||
'select_option',
|
||||
'navigate_page',
|
||||
'close_page',
|
||||
'wait_for',
|
||||
])
|
||||
|
||||
interface CladoActionResponse {
|
||||
action?: string | null
|
||||
x?: number
|
||||
y?: number
|
||||
text?: string
|
||||
key?: string
|
||||
direction?: string
|
||||
startX?: number
|
||||
startY?: number
|
||||
endX?: number
|
||||
endY?: number
|
||||
amount?: number
|
||||
time?: number
|
||||
final_answer?: string | null
|
||||
inference_time_seconds?: number
|
||||
raw_response?: string
|
||||
thinking?: string | null
|
||||
parse_error?: string | null
|
||||
}
|
||||
|
||||
interface Viewport {
|
||||
width: number
|
||||
height: number
|
||||
}
|
||||
|
||||
interface CladoAction {
|
||||
action: string
|
||||
x?: number
|
||||
y?: number
|
||||
text?: string
|
||||
key?: string
|
||||
direction?: string
|
||||
startX?: number
|
||||
startY?: number
|
||||
endX?: number
|
||||
endY?: number
|
||||
amount?: number
|
||||
time?: number
|
||||
final_answer?: string
|
||||
}
|
||||
|
||||
type RawActionPayload = Partial<Omit<CladoAction, 'final_answer'>> & {
|
||||
final_answer?: string | null
|
||||
}
|
||||
extractCladoThinking,
|
||||
formatCladoHistory,
|
||||
getCladoActionSignature,
|
||||
parseCladoActions,
|
||||
summarizeCladoPrediction,
|
||||
} from './clado-actions'
|
||||
import {
|
||||
normalizeCladoDirection,
|
||||
normalizeCladoPressKey,
|
||||
normalizeCladoScrollAmount,
|
||||
prepareCladoToolArgs,
|
||||
resolveCladoPoint,
|
||||
} from './clado-browser-driver'
|
||||
import { CladoActionClient } from './clado-client'
|
||||
import {
|
||||
CLADO_ACTION_PROVIDER,
|
||||
type CladoAction,
|
||||
type CladoActionPoint,
|
||||
type CladoActionResponse,
|
||||
type CladoViewport,
|
||||
isCladoActionProvider,
|
||||
} from './types'
|
||||
|
||||
const MAX_CONSECUTIVE_PARSE_FAILURES = 3
|
||||
|
||||
interface ActionPoint {
|
||||
x: number
|
||||
y: number
|
||||
}
|
||||
|
||||
function asErrorMessage(error: unknown): string {
|
||||
return error instanceof Error ? error.message : String(error)
|
||||
}
|
||||
|
||||
function clampNormalized(value: number): number {
|
||||
return Math.min(999, Math.max(0, Math.round(value)))
|
||||
}
|
||||
|
||||
function isCladoProvider(provider: string): boolean {
|
||||
return provider === CLADO_ACTION_PROVIDER
|
||||
}
|
||||
|
||||
export class CladoActionExecutor {
|
||||
private readonly mcpClient: McpClient
|
||||
private readonly cladoClient: CladoActionClient
|
||||
private readonly pageId: number
|
||||
private callbacks: ExecutorCallbacks = {}
|
||||
private stepsUsed = 0
|
||||
private viewport: Viewport | null = null
|
||||
private lastPoint: ActionPoint | null = null
|
||||
private viewport: CladoViewport | null = null
|
||||
private lastPoint: CladoActionPoint | null = null
|
||||
private currentUrl = ''
|
||||
|
||||
constructor(
|
||||
private readonly config: ExecutorConfig,
|
||||
config: ExecutorConfig,
|
||||
serverUrl: string,
|
||||
readonly _windowId?: number,
|
||||
readonly _tabId?: number,
|
||||
initialPageId?: number,
|
||||
) {
|
||||
if (!isCladoProvider(config.provider)) {
|
||||
if (!isCladoActionProvider(config.provider)) {
|
||||
throw new Error(
|
||||
`CladoActionExecutor requires provider="${CLADO_ACTION_PROVIDER}"`,
|
||||
)
|
||||
}
|
||||
this.mcpClient = new McpClient(`${serverUrl}/mcp`)
|
||||
this.cladoClient = new CladoActionClient({
|
||||
baseUrl: config.baseUrl,
|
||||
apiKey: config.apiKey,
|
||||
})
|
||||
this.pageId = initialPageId ?? 1
|
||||
}
|
||||
|
||||
@@ -165,7 +111,7 @@ export class CladoActionExecutor {
|
||||
break
|
||||
}
|
||||
|
||||
const historyForPrediction = this.formatHistory(actionHistory)
|
||||
const historyForPrediction = formatCladoHistory(actionHistory)
|
||||
const actionToolCallId = randomUUID()
|
||||
const predictionInput = {
|
||||
instruction,
|
||||
@@ -187,7 +133,7 @@ export class CladoActionExecutor {
|
||||
signal,
|
||||
)
|
||||
predictionCalls++
|
||||
const thinking = this.extractThinking(prediction.raw_response)
|
||||
const thinking = extractCladoThinking(prediction.raw_response)
|
||||
if (thinking) {
|
||||
const previous = thinkingTrace[thinkingTrace.length - 1]
|
||||
if (previous !== thinking) {
|
||||
@@ -217,7 +163,7 @@ export class CladoActionExecutor {
|
||||
break
|
||||
}
|
||||
|
||||
const predictedActions = this.parseActions(prediction)
|
||||
const predictedActions = parseCladoActions(prediction)
|
||||
if (predictedActions.length === 0) {
|
||||
// Per Clado contract: HTTP 200 with action=null on parse failure.
|
||||
// Count as an invalid step so the model can self-correct on the
|
||||
@@ -243,7 +189,7 @@ export class CladoActionExecutor {
|
||||
toolCallId: actionToolCallId,
|
||||
toolName: 'clado_action_predict',
|
||||
output: {
|
||||
prediction: this.summarizePrediction(prediction),
|
||||
prediction: summarizeCladoPrediction(prediction),
|
||||
parsedActions: [],
|
||||
parseError,
|
||||
consecutiveParseFailures,
|
||||
@@ -285,7 +231,7 @@ export class CladoActionExecutor {
|
||||
toolCallId: actionToolCallId,
|
||||
toolName: 'clado_action_predict',
|
||||
output: {
|
||||
prediction: this.summarizePrediction(prediction),
|
||||
prediction: summarizeCladoPrediction(prediction),
|
||||
parsedActions: predictedActions,
|
||||
executed: executionNotes,
|
||||
},
|
||||
@@ -326,7 +272,7 @@ export class CladoActionExecutor {
|
||||
toolCallId: actionToolCallId,
|
||||
toolName: 'clado_action_predict',
|
||||
output: {
|
||||
prediction: this.summarizePrediction(prediction),
|
||||
prediction: summarizeCladoPrediction(prediction),
|
||||
parsedActions: predictedActions,
|
||||
executed: executionNotes,
|
||||
},
|
||||
@@ -378,125 +324,12 @@ export class CladoActionExecutor {
|
||||
actionHistory: CladoAction[],
|
||||
signal?: AbortSignal,
|
||||
): Promise<CladoActionResponse> {
|
||||
if (!this.config.baseUrl) {
|
||||
throw new Error('executor.baseUrl must be set for clado-action provider')
|
||||
}
|
||||
|
||||
const requestController = new AbortController()
|
||||
const onAbort = () => requestController.abort()
|
||||
signal?.addEventListener('abort', onAbort, { once: true })
|
||||
|
||||
const timeoutHandle = setTimeout(() => {
|
||||
requestController.abort()
|
||||
}, CLADO_REQUEST_TIMEOUT_MS)
|
||||
|
||||
try {
|
||||
const headers: Record<string, string> = {
|
||||
'Content-Type': 'application/json',
|
||||
}
|
||||
if (this.config.apiKey) {
|
||||
headers.Authorization = `Bearer ${this.config.apiKey}`
|
||||
}
|
||||
|
||||
const response = await fetch(this.config.baseUrl, {
|
||||
method: 'POST',
|
||||
headers,
|
||||
body: JSON.stringify({
|
||||
instruction,
|
||||
image_base64: imageBase64,
|
||||
history: this.formatHistory(actionHistory),
|
||||
}),
|
||||
signal: requestController.signal,
|
||||
})
|
||||
|
||||
if (!response.ok) {
|
||||
const body = await response.text()
|
||||
throw new Error(
|
||||
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
|
||||
)
|
||||
}
|
||||
|
||||
return (await response.json()) as CladoActionResponse
|
||||
} finally {
|
||||
clearTimeout(timeoutHandle)
|
||||
signal?.removeEventListener('abort', onAbort)
|
||||
}
|
||||
}
|
||||
|
||||
private parseActions(prediction: CladoActionResponse): CladoAction[] {
|
||||
const actionFromField =
|
||||
typeof prediction.action === 'string' ? prediction.action : null
|
||||
|
||||
const rawActions = this.parseActionsFromRawResponse(prediction.raw_response)
|
||||
const primaryFromRaw = rawActions[0] ?? null
|
||||
const mergedPrimary = {
|
||||
...primaryFromRaw,
|
||||
...prediction,
|
||||
action: actionFromField ?? primaryFromRaw?.action,
|
||||
}
|
||||
|
||||
const normalized: CladoAction[] = []
|
||||
const primary = this.normalizeActionPayload(mergedPrimary)
|
||||
if (primary) normalized.push(primary)
|
||||
|
||||
for (const candidate of rawActions.slice(1)) {
|
||||
const parsed = this.normalizeActionPayload(candidate)
|
||||
if (!parsed) continue
|
||||
const prev = normalized[normalized.length - 1]
|
||||
if (
|
||||
!prev ||
|
||||
this.getActionSignature(prev) !== this.getActionSignature(parsed)
|
||||
) {
|
||||
normalized.push(parsed)
|
||||
}
|
||||
}
|
||||
|
||||
return normalized
|
||||
}
|
||||
|
||||
private normalizeActionPayload(
|
||||
payload: RawActionPayload,
|
||||
): CladoAction | null {
|
||||
if (!payload.action || typeof payload.action !== 'string') {
|
||||
return null
|
||||
}
|
||||
return {
|
||||
action: payload.action,
|
||||
x: typeof payload.x === 'number' ? payload.x : undefined,
|
||||
y: typeof payload.y === 'number' ? payload.y : undefined,
|
||||
text: typeof payload.text === 'string' ? payload.text : undefined,
|
||||
key: typeof payload.key === 'string' ? payload.key : undefined,
|
||||
direction:
|
||||
typeof payload.direction === 'string' ? payload.direction : undefined,
|
||||
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
|
||||
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
|
||||
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
|
||||
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
|
||||
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
|
||||
time: typeof payload.time === 'number' ? payload.time : undefined,
|
||||
final_answer:
|
||||
typeof payload.final_answer === 'string'
|
||||
? payload.final_answer
|
||||
: undefined,
|
||||
}
|
||||
}
|
||||
|
||||
private parseActionsFromRawResponse(
|
||||
rawResponse: string | undefined,
|
||||
): RawActionPayload[] {
|
||||
if (!rawResponse) return []
|
||||
const matches = [
|
||||
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
|
||||
]
|
||||
const parsed: RawActionPayload[] = []
|
||||
for (const match of matches) {
|
||||
try {
|
||||
parsed.push(JSON.parse(match[1]) as RawActionPayload)
|
||||
} catch {
|
||||
// ignore malformed answer blocks
|
||||
}
|
||||
}
|
||||
return parsed
|
||||
return this.cladoClient.requestActionPrediction({
|
||||
instruction,
|
||||
imageBase64,
|
||||
actionHistory,
|
||||
signal,
|
||||
})
|
||||
}
|
||||
|
||||
private async executeAction(
|
||||
@@ -567,14 +400,14 @@ export class CladoActionExecutor {
|
||||
}
|
||||
|
||||
case 'press_key': {
|
||||
const key = this.normalizePressKey(action.key)
|
||||
const key = normalizeCladoPressKey(action.key)
|
||||
await this.runTool('press_key', { key }, signal)
|
||||
return `Pressed key "${key}".`
|
||||
}
|
||||
|
||||
case 'scroll': {
|
||||
const direction = this.normalizeDirection(action.direction)
|
||||
const amountPx = this.normalizeScrollAmount(action.amount)
|
||||
const direction = normalizeCladoDirection(action.direction)
|
||||
const amountPx = normalizeCladoScrollAmount(action.amount)
|
||||
const ticks = Math.max(1, Math.round(amountPx / 120))
|
||||
|
||||
await this.runTool('scroll', { direction, amount: ticks }, signal)
|
||||
@@ -645,7 +478,7 @@ export class CladoActionExecutor {
|
||||
return image.data
|
||||
}
|
||||
|
||||
private async getViewport(signal?: AbortSignal): Promise<Viewport> {
|
||||
private async getViewport(signal?: AbortSignal): Promise<CladoViewport> {
|
||||
if (this.viewport) return this.viewport
|
||||
|
||||
try {
|
||||
@@ -676,15 +509,9 @@ export class CladoActionExecutor {
|
||||
normalizedX: number | undefined,
|
||||
normalizedY: number | undefined,
|
||||
signal?: AbortSignal,
|
||||
): Promise<ActionPoint> {
|
||||
): Promise<CladoActionPoint> {
|
||||
const viewport = await this.getViewport(signal)
|
||||
const nx = clampNormalized(normalizedX ?? 500)
|
||||
const ny = clampNormalized(normalizedY ?? 500)
|
||||
|
||||
return {
|
||||
x: Math.round((nx / 1000) * viewport.width),
|
||||
y: Math.round((ny / 1000) * viewport.height),
|
||||
}
|
||||
return resolveCladoPoint(viewport, normalizedX, normalizedY)
|
||||
}
|
||||
|
||||
private async getCurrentUrl(signal?: AbortSignal): Promise<string> {
|
||||
@@ -711,7 +538,7 @@ export class CladoActionExecutor {
|
||||
throw new Error('aborted')
|
||||
}
|
||||
|
||||
const toolArgs = this.prepareToolArgs(toolName, args)
|
||||
const toolArgs = prepareCladoToolArgs(toolName, args, this.pageId)
|
||||
|
||||
try {
|
||||
const raw = await this.mcpClient.callTool(toolName, toolArgs)
|
||||
@@ -730,207 +557,6 @@ export class CladoActionExecutor {
|
||||
}
|
||||
}
|
||||
|
||||
private prepareToolArgs(
|
||||
toolName: string,
|
||||
args: Record<string, unknown>,
|
||||
): Record<string, unknown> {
|
||||
const prepared: Record<string, unknown> = { ...args }
|
||||
|
||||
if (
|
||||
toolName === 'evaluate_script' &&
|
||||
typeof prepared.function === 'string' &&
|
||||
prepared.expression === undefined
|
||||
) {
|
||||
prepared.expression = this.toEvaluateExpression(prepared.function)
|
||||
delete prepared.function
|
||||
}
|
||||
|
||||
if (
|
||||
toolName === 'click_at' &&
|
||||
typeof prepared.dblClick === 'boolean' &&
|
||||
prepared.clickCount === undefined
|
||||
) {
|
||||
prepared.clickCount = prepared.dblClick ? 2 : 1
|
||||
delete prepared.dblClick
|
||||
}
|
||||
|
||||
// Use fixed page ID for all page-scoped tools (single-page operation)
|
||||
if (PAGE_SCOPED_TOOLS.has(toolName) && typeof prepared.page !== 'number') {
|
||||
prepared.page = this.pageId
|
||||
}
|
||||
|
||||
return prepared
|
||||
}
|
||||
|
||||
private toEvaluateExpression(rawFunction: unknown): string {
|
||||
const source = String(rawFunction).trim()
|
||||
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
|
||||
return `(${source})()`
|
||||
}
|
||||
if (source.startsWith('function')) {
|
||||
return `(${source})()`
|
||||
}
|
||||
return source
|
||||
}
|
||||
|
||||
private normalizePressKey(key: string | undefined): string {
|
||||
const raw = (key ?? '').trim()
|
||||
if (!raw) throw new Error('press_key action missing key field')
|
||||
|
||||
const map: Record<string, string> = {
|
||||
'C-a': 'Control+A',
|
||||
'C-c': 'Control+C',
|
||||
'C-v': 'Control+V',
|
||||
'C-x': 'Control+X',
|
||||
'C-z': 'Control+Z',
|
||||
'C-y': 'Control+Y',
|
||||
'C-s': 'Control+S',
|
||||
'C-t': 'Control+T',
|
||||
'C-w': 'Control+W',
|
||||
'C-h': 'Control+H',
|
||||
'C-f': 'Control+F',
|
||||
'C-+': 'Control++',
|
||||
'C--': 'Control+-',
|
||||
'C-tab': 'Control+Tab',
|
||||
'C-S-tab': 'Control+Shift+Tab',
|
||||
'C-S-n': 'Control+Shift+N',
|
||||
'C-down': 'Control+ArrowDown',
|
||||
// macOS Cmd shortcuts (Meta in CDP).
|
||||
'M-a': 'Meta+A',
|
||||
'M-c': 'Meta+C',
|
||||
'M-v': 'Meta+V',
|
||||
'M-x': 'Meta+X',
|
||||
'M-f4': 'Alt+F4',
|
||||
}
|
||||
return map[raw] ?? raw
|
||||
}
|
||||
|
||||
private normalizeDirection(
|
||||
direction: string | undefined,
|
||||
): 'up' | 'down' | 'left' | 'right' {
|
||||
if (
|
||||
direction === 'up' ||
|
||||
direction === 'down' ||
|
||||
direction === 'left' ||
|
||||
direction === 'right'
|
||||
) {
|
||||
return direction
|
||||
}
|
||||
return 'down'
|
||||
}
|
||||
|
||||
private normalizeScrollAmount(amount: number | undefined): number {
|
||||
if (typeof amount !== 'number') return 500
|
||||
if (amount <= 0) return 100
|
||||
const clamped = Math.min(amount, 1000)
|
||||
return Math.max(100, Math.round((clamped / 1000) * 900))
|
||||
}
|
||||
|
||||
private summarizePrediction(
|
||||
prediction: CladoActionResponse,
|
||||
): Record<string, unknown> {
|
||||
const preview =
|
||||
typeof prediction.raw_response === 'string' &&
|
||||
prediction.raw_response.length > 0
|
||||
? prediction.raw_response.slice(0, 240)
|
||||
: undefined
|
||||
|
||||
return {
|
||||
action: prediction.action,
|
||||
x: prediction.x,
|
||||
y: prediction.y,
|
||||
text: prediction.text,
|
||||
key: prediction.key,
|
||||
direction: prediction.direction,
|
||||
startX: prediction.startX,
|
||||
startY: prediction.startY,
|
||||
endX: prediction.endX,
|
||||
endY: prediction.endY,
|
||||
amount: prediction.amount,
|
||||
time: prediction.time,
|
||||
inference_time_seconds: prediction.inference_time_seconds,
|
||||
raw_response_preview: preview,
|
||||
}
|
||||
}
|
||||
|
||||
private extractThinking(rawResponse: string | undefined): string | undefined {
|
||||
if (!rawResponse) return undefined
|
||||
const matches = [
|
||||
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
|
||||
]
|
||||
if (matches.length === 0) return undefined
|
||||
|
||||
const merged = matches
|
||||
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
|
||||
.filter((value) => value.length > 0)
|
||||
.join(' ')
|
||||
|
||||
if (!merged) return undefined
|
||||
return merged
|
||||
}
|
||||
|
||||
private getActionSignature(action: CladoAction): string {
|
||||
switch (action.action) {
|
||||
case 'click':
|
||||
case 'double_click':
|
||||
case 'right_click':
|
||||
case 'hover':
|
||||
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
|
||||
case 'type':
|
||||
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
|
||||
case 'press_key':
|
||||
return `${action.action}:${action.key ?? 'key'}`
|
||||
case 'scroll':
|
||||
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
|
||||
case 'drag':
|
||||
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
|
||||
case 'wait':
|
||||
return `${action.action}:${action.time ?? 1}`
|
||||
case 'end':
|
||||
return action.final_answer
|
||||
? `end(${action.final_answer.slice(0, 32)})`
|
||||
: 'end()'
|
||||
case 'invalid':
|
||||
return `invalid(${(action.text ?? '').slice(0, 40)})`
|
||||
default:
|
||||
return action.action
|
||||
}
|
||||
}
|
||||
|
||||
private formatHistory(actions: CladoAction[]): string {
|
||||
if (actions.length === 0) return 'None'
|
||||
|
||||
const parts = actions.map((action) => {
|
||||
switch (action.action) {
|
||||
case 'click':
|
||||
case 'double_click':
|
||||
case 'right_click':
|
||||
case 'hover':
|
||||
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
|
||||
case 'type': {
|
||||
const text = (action.text ?? '').replace(/'/g, "\\'")
|
||||
return `type('${text}')`
|
||||
}
|
||||
case 'press_key':
|
||||
return `press_key('${action.key ?? 'Enter'}')`
|
||||
case 'scroll':
|
||||
return `scroll(${action.direction ?? 'down'})`
|
||||
case 'drag':
|
||||
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
|
||||
case 'wait':
|
||||
return `wait(${Math.round(action.time ?? 1)}s)`
|
||||
case 'end':
|
||||
return 'end()'
|
||||
case 'invalid':
|
||||
return 'invalid()'
|
||||
default:
|
||||
return action.action
|
||||
}
|
||||
})
|
||||
|
||||
return parts.join(' -> ')
|
||||
}
|
||||
|
||||
private buildObservation(params: {
|
||||
status: ExecutorResult['status']
|
||||
reason: string
|
||||
@@ -946,7 +572,7 @@ export class CladoActionExecutor {
|
||||
: actions
|
||||
.slice(-5)
|
||||
.map(
|
||||
(action, idx) => `${idx + 1}. ${this.getActionSignature(action)}`,
|
||||
(action, idx) => `${idx + 1}. ${getCladoActionSignature(action)}`,
|
||||
)
|
||||
.join('\n')
|
||||
const thinkingSummary =
|
||||
191
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
vendored
Normal file
191
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-actions.ts
vendored
Normal file
@@ -0,0 +1,191 @@
|
||||
import type {
|
||||
CladoAction,
|
||||
CladoActionResponse,
|
||||
RawCladoActionPayload,
|
||||
} from './types'
|
||||
|
||||
/** Parses Clado's structured response plus any raw `<answer>` blocks into executable actions. */
|
||||
export function parseCladoActions(
|
||||
prediction: CladoActionResponse,
|
||||
): CladoAction[] {
|
||||
const actionFromField =
|
||||
typeof prediction.action === 'string' ? prediction.action : null
|
||||
|
||||
const rawActions = parseCladoActionsFromRawResponse(prediction.raw_response)
|
||||
const primaryFromRaw = rawActions[0] ?? null
|
||||
const mergedPrimary = {
|
||||
...primaryFromRaw,
|
||||
...prediction,
|
||||
action: actionFromField ?? primaryFromRaw?.action,
|
||||
}
|
||||
|
||||
const normalized: CladoAction[] = []
|
||||
const primary = normalizeCladoActionPayload(mergedPrimary)
|
||||
if (primary) normalized.push(primary)
|
||||
|
||||
for (const candidate of rawActions.slice(1)) {
|
||||
const parsed = normalizeCladoActionPayload(candidate)
|
||||
if (!parsed) continue
|
||||
const prev = normalized[normalized.length - 1]
|
||||
if (
|
||||
!prev ||
|
||||
getCladoActionSignature(prev) !== getCladoActionSignature(parsed)
|
||||
) {
|
||||
normalized.push(parsed)
|
||||
}
|
||||
}
|
||||
|
||||
return normalized
|
||||
}
|
||||
|
||||
export function normalizeCladoActionPayload(
|
||||
payload: RawCladoActionPayload,
|
||||
): CladoAction | null {
|
||||
if (!payload.action || typeof payload.action !== 'string') {
|
||||
return null
|
||||
}
|
||||
return {
|
||||
action: payload.action,
|
||||
x: typeof payload.x === 'number' ? payload.x : undefined,
|
||||
y: typeof payload.y === 'number' ? payload.y : undefined,
|
||||
text: typeof payload.text === 'string' ? payload.text : undefined,
|
||||
key: typeof payload.key === 'string' ? payload.key : undefined,
|
||||
direction:
|
||||
typeof payload.direction === 'string' ? payload.direction : undefined,
|
||||
startX: typeof payload.startX === 'number' ? payload.startX : undefined,
|
||||
startY: typeof payload.startY === 'number' ? payload.startY : undefined,
|
||||
endX: typeof payload.endX === 'number' ? payload.endX : undefined,
|
||||
endY: typeof payload.endY === 'number' ? payload.endY : undefined,
|
||||
amount: typeof payload.amount === 'number' ? payload.amount : undefined,
|
||||
time: typeof payload.time === 'number' ? payload.time : undefined,
|
||||
final_answer:
|
||||
typeof payload.final_answer === 'string'
|
||||
? payload.final_answer
|
||||
: undefined,
|
||||
}
|
||||
}
|
||||
|
||||
export function parseCladoActionsFromRawResponse(
|
||||
rawResponse: string | undefined,
|
||||
): RawCladoActionPayload[] {
|
||||
if (!rawResponse) return []
|
||||
const matches = [
|
||||
...rawResponse.matchAll(/<answer>\s*([\s\S]*?)\s*<\/answer>/gi),
|
||||
]
|
||||
const parsed: RawCladoActionPayload[] = []
|
||||
for (const match of matches) {
|
||||
try {
|
||||
parsed.push(JSON.parse(match[1]) as RawCladoActionPayload)
|
||||
} catch {
|
||||
// Ignore malformed answer blocks so one bad block does not drop the whole prediction.
|
||||
}
|
||||
}
|
||||
return parsed
|
||||
}
|
||||
|
||||
export function extractCladoThinking(
|
||||
rawResponse: string | undefined,
|
||||
): string | undefined {
|
||||
if (!rawResponse) return undefined
|
||||
const matches = [
|
||||
...rawResponse.matchAll(/<thinking>\s*([\s\S]*?)\s*<\/thinking>/gi),
|
||||
]
|
||||
if (matches.length === 0) return undefined
|
||||
|
||||
const merged = matches
|
||||
.map((match) => match[1]?.replace(/\s+/g, ' ').trim() ?? '')
|
||||
.filter((value) => value.length > 0)
|
||||
.join(' ')
|
||||
|
||||
if (!merged) return undefined
|
||||
return merged
|
||||
}
|
||||
|
||||
export function summarizeCladoPrediction(
|
||||
prediction: CladoActionResponse,
|
||||
): Record<string, unknown> {
|
||||
const preview =
|
||||
typeof prediction.raw_response === 'string' &&
|
||||
prediction.raw_response.length > 0
|
||||
? prediction.raw_response.slice(0, 240)
|
||||
: undefined
|
||||
|
||||
return {
|
||||
action: prediction.action,
|
||||
x: prediction.x,
|
||||
y: prediction.y,
|
||||
text: prediction.text,
|
||||
key: prediction.key,
|
||||
direction: prediction.direction,
|
||||
startX: prediction.startX,
|
||||
startY: prediction.startY,
|
||||
endX: prediction.endX,
|
||||
endY: prediction.endY,
|
||||
amount: prediction.amount,
|
||||
time: prediction.time,
|
||||
inference_time_seconds: prediction.inference_time_seconds,
|
||||
raw_response_preview: preview,
|
||||
}
|
||||
}
|
||||
|
||||
export function getCladoActionSignature(action: CladoAction): string {
|
||||
switch (action.action) {
|
||||
case 'click':
|
||||
case 'double_click':
|
||||
case 'right_click':
|
||||
case 'hover':
|
||||
return `${action.action}:${action.x ?? 'x'}:${action.y ?? 'y'}`
|
||||
case 'type':
|
||||
return `${action.action}:${(action.text ?? '').slice(0, 16)}`
|
||||
case 'press_key':
|
||||
return `${action.action}:${action.key ?? 'key'}`
|
||||
case 'scroll':
|
||||
return `${action.action}:${action.direction ?? 'down'}:${action.amount ?? 500}`
|
||||
case 'drag':
|
||||
return `${action.action}:${action.startX}:${action.startY}:${action.endX}:${action.endY}`
|
||||
case 'wait':
|
||||
return `${action.action}:${action.time ?? 1}`
|
||||
case 'end':
|
||||
return action.final_answer
|
||||
? `end(${action.final_answer.slice(0, 32)})`
|
||||
: 'end()'
|
||||
case 'invalid':
|
||||
return `invalid(${(action.text ?? '').slice(0, 40)})`
|
||||
default:
|
||||
return action.action
|
||||
}
|
||||
}
|
||||
|
||||
export function formatCladoHistory(actions: CladoAction[]): string {
|
||||
if (actions.length === 0) return 'None'
|
||||
|
||||
const parts = actions.map((action) => {
|
||||
switch (action.action) {
|
||||
case 'click':
|
||||
case 'double_click':
|
||||
case 'right_click':
|
||||
case 'hover':
|
||||
return `${action.action}(${Math.round(action.x ?? 500)}, ${Math.round(action.y ?? 500)})`
|
||||
case 'type': {
|
||||
const text = (action.text ?? '').replace(/'/g, "\\'")
|
||||
return `type('${text}')`
|
||||
}
|
||||
case 'press_key':
|
||||
return `press_key('${action.key ?? 'Enter'}')`
|
||||
case 'scroll':
|
||||
return `scroll(${action.direction ?? 'down'})`
|
||||
case 'drag':
|
||||
return `drag(${Math.round(action.startX ?? 500)},${Math.round(action.startY ?? 500)} -> ${Math.round(action.endX ?? 500)},${Math.round(action.endY ?? 500)})`
|
||||
case 'wait':
|
||||
return `wait(${Math.round(action.time ?? 1)}s)`
|
||||
case 'end':
|
||||
return 'end()'
|
||||
case 'invalid':
|
||||
return 'invalid()'
|
||||
default:
|
||||
return action.action
|
||||
}
|
||||
})
|
||||
|
||||
return parts.join(' -> ')
|
||||
}
|
||||
123
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
vendored
Normal file
123
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-browser-driver.ts
vendored
Normal file
@@ -0,0 +1,123 @@
|
||||
import {
|
||||
CLADO_PAGE_SCOPED_TOOLS,
|
||||
type CladoActionPoint,
|
||||
type CladoViewport,
|
||||
} from './types'
|
||||
|
||||
export function clampCladoNormalizedCoordinate(value: number): number {
|
||||
return Math.min(999, Math.max(0, Math.round(value)))
|
||||
}
|
||||
|
||||
/** Converts Clado's 0-1000 normalized coordinate space into BrowserOS viewport pixels. */
|
||||
export function resolveCladoPoint(
|
||||
viewport: CladoViewport,
|
||||
normalizedX: number | undefined,
|
||||
normalizedY: number | undefined,
|
||||
): CladoActionPoint {
|
||||
const nx = clampCladoNormalizedCoordinate(normalizedX ?? 500)
|
||||
const ny = clampCladoNormalizedCoordinate(normalizedY ?? 500)
|
||||
|
||||
return {
|
||||
x: Math.round((nx / 1000) * viewport.width),
|
||||
y: Math.round((ny / 1000) * viewport.height),
|
||||
}
|
||||
}
|
||||
|
||||
/** Adapts Clado action tool arguments to the BrowserOS MCP tool argument contract. */
|
||||
export function prepareCladoToolArgs(
|
||||
toolName: string,
|
||||
args: Record<string, unknown>,
|
||||
pageId: number,
|
||||
): Record<string, unknown> {
|
||||
const prepared: Record<string, unknown> = { ...args }
|
||||
|
||||
if (
|
||||
toolName === 'evaluate_script' &&
|
||||
typeof prepared.function === 'string' &&
|
||||
prepared.expression === undefined
|
||||
) {
|
||||
prepared.expression = toCladoEvaluateExpression(prepared.function)
|
||||
delete prepared.function
|
||||
}
|
||||
|
||||
if (
|
||||
toolName === 'click_at' &&
|
||||
typeof prepared.dblClick === 'boolean' &&
|
||||
prepared.clickCount === undefined
|
||||
) {
|
||||
prepared.clickCount = prepared.dblClick ? 2 : 1
|
||||
delete prepared.dblClick
|
||||
}
|
||||
|
||||
if (
|
||||
CLADO_PAGE_SCOPED_TOOLS.has(toolName) &&
|
||||
typeof prepared.page !== 'number'
|
||||
) {
|
||||
prepared.page = pageId
|
||||
}
|
||||
|
||||
return prepared
|
||||
}
|
||||
|
||||
export function toCladoEvaluateExpression(rawFunction: unknown): string {
|
||||
const source = String(rawFunction).trim()
|
||||
if (source.startsWith('() =>') || source.startsWith('async () =>')) {
|
||||
return `(${source})()`
|
||||
}
|
||||
if (source.startsWith('function')) {
|
||||
return `(${source})()`
|
||||
}
|
||||
return source
|
||||
}
|
||||
|
||||
export function normalizeCladoPressKey(key: string | undefined): string {
|
||||
const raw = (key ?? '').trim()
|
||||
if (!raw) throw new Error('press_key action missing key field')
|
||||
|
||||
const map: Record<string, string> = {
|
||||
'C-a': 'Control+A',
|
||||
'C-c': 'Control+C',
|
||||
'C-v': 'Control+V',
|
||||
'C-x': 'Control+X',
|
||||
'C-z': 'Control+Z',
|
||||
'C-y': 'Control+Y',
|
||||
'C-s': 'Control+S',
|
||||
'C-t': 'Control+T',
|
||||
'C-w': 'Control+W',
|
||||
'C-h': 'Control+H',
|
||||
'C-f': 'Control+F',
|
||||
'C-+': 'Control++',
|
||||
'C--': 'Control+-',
|
||||
'C-tab': 'Control+Tab',
|
||||
'C-S-tab': 'Control+Shift+Tab',
|
||||
'C-S-n': 'Control+Shift+N',
|
||||
'C-down': 'Control+ArrowDown',
|
||||
'M-a': 'Meta+A',
|
||||
'M-c': 'Meta+C',
|
||||
'M-v': 'Meta+V',
|
||||
'M-x': 'Meta+X',
|
||||
'M-f4': 'Alt+F4',
|
||||
}
|
||||
return map[raw] ?? raw
|
||||
}
|
||||
|
||||
export function normalizeCladoDirection(
|
||||
direction: string | undefined,
|
||||
): 'up' | 'down' | 'left' | 'right' {
|
||||
if (
|
||||
direction === 'up' ||
|
||||
direction === 'down' ||
|
||||
direction === 'left' ||
|
||||
direction === 'right'
|
||||
) {
|
||||
return direction
|
||||
}
|
||||
return 'down'
|
||||
}
|
||||
|
||||
export function normalizeCladoScrollAmount(amount: number | undefined): number {
|
||||
if (typeof amount !== 'number') return 500
|
||||
if (amount <= 0) return 100
|
||||
const clamped = Math.min(amount, 1000)
|
||||
return Math.max(100, Math.round((clamped / 1000) * 900))
|
||||
}
|
||||
68
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
vendored
Normal file
68
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/clado-client.ts
vendored
Normal file
@@ -0,0 +1,68 @@
|
||||
import { CLADO_REQUEST_TIMEOUT_MS } from '../../../../constants'
|
||||
import { formatCladoHistory } from './clado-actions'
|
||||
import type { CladoAction, CladoActionResponse } from './types'
|
||||
|
||||
export interface CladoActionClientOptions {
|
||||
baseUrl?: string
|
||||
apiKey?: string
|
||||
}
|
||||
|
||||
export interface CladoActionPredictionInput {
|
||||
instruction: string
|
||||
imageBase64: string
|
||||
actionHistory: CladoAction[]
|
||||
signal?: AbortSignal
|
||||
}
|
||||
|
||||
/** Calls the Clado action model without exposing credentials in process arguments or artifacts. */
|
||||
export class CladoActionClient {
|
||||
constructor(private readonly options: CladoActionClientOptions) {}
|
||||
|
||||
async requestActionPrediction(
|
||||
input: CladoActionPredictionInput,
|
||||
): Promise<CladoActionResponse> {
|
||||
if (!this.options.baseUrl) {
|
||||
throw new Error('executor.baseUrl must be set for clado-action provider')
|
||||
}
|
||||
|
||||
const requestController = new AbortController()
|
||||
const onAbort = () => requestController.abort()
|
||||
input.signal?.addEventListener('abort', onAbort, { once: true })
|
||||
|
||||
const timeoutHandle = setTimeout(() => {
|
||||
requestController.abort()
|
||||
}, CLADO_REQUEST_TIMEOUT_MS)
|
||||
|
||||
try {
|
||||
const headers: Record<string, string> = {
|
||||
'Content-Type': 'application/json',
|
||||
}
|
||||
if (this.options.apiKey) {
|
||||
headers.Authorization = `Bearer ${this.options.apiKey}`
|
||||
}
|
||||
|
||||
const response = await fetch(this.options.baseUrl, {
|
||||
method: 'POST',
|
||||
headers,
|
||||
body: JSON.stringify({
|
||||
instruction: input.instruction,
|
||||
image_base64: input.imageBase64,
|
||||
history: formatCladoHistory(input.actionHistory),
|
||||
}),
|
||||
signal: requestController.signal,
|
||||
})
|
||||
|
||||
if (!response.ok) {
|
||||
const body = await response.text()
|
||||
throw new Error(
|
||||
`HTTP ${response.status} ${response.statusText}: ${body.slice(0, 400)}`,
|
||||
)
|
||||
}
|
||||
|
||||
return (await response.json()) as CladoActionResponse
|
||||
} finally {
|
||||
clearTimeout(timeoutHandle)
|
||||
input.signal?.removeEventListener('abort', onAbort)
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,56 @@
|
||||
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
|
||||
import type {
|
||||
DelegationResult,
|
||||
ExecutorBackend,
|
||||
ExecutorCallbacks,
|
||||
} from '../../executor-backend'
|
||||
import { CladoActionExecutor } from './clado-action-executor'
|
||||
|
||||
export interface CladoExecutorBackendOptions {
|
||||
configTemplate: ResolvedAgentConfig
|
||||
serverUrl: string
|
||||
initialPageId?: number
|
||||
callbacks?: ExecutorCallbacks
|
||||
}
|
||||
|
||||
/** Executes delegated goals through the Clado visual action model. */
|
||||
export class CladoExecutorBackend implements ExecutorBackend {
|
||||
readonly kind = 'clado'
|
||||
private executor: CladoActionExecutor | null = null
|
||||
|
||||
constructor(private readonly options: CladoExecutorBackendOptions) {}
|
||||
|
||||
async execute(
|
||||
instruction: string,
|
||||
signal?: AbortSignal,
|
||||
): Promise<DelegationResult> {
|
||||
const executor = this.getExecutor()
|
||||
const result = await executor.execute(instruction, signal)
|
||||
return result
|
||||
}
|
||||
|
||||
async close(): Promise<void> {
|
||||
await this.executor?.close()
|
||||
}
|
||||
|
||||
getTotalSteps(): number {
|
||||
return this.executor?.getTotalSteps() ?? 0
|
||||
}
|
||||
|
||||
private getExecutor(): CladoActionExecutor {
|
||||
if (this.executor) return this.executor
|
||||
|
||||
this.executor = new CladoActionExecutor(
|
||||
{
|
||||
provider: this.options.configTemplate.provider,
|
||||
model: this.options.configTemplate.model,
|
||||
apiKey: this.options.configTemplate.apiKey ?? '',
|
||||
baseUrl: this.options.configTemplate.baseUrl,
|
||||
},
|
||||
this.options.serverUrl,
|
||||
this.options.initialPageId,
|
||||
)
|
||||
this.executor.setCallbacks(this.options.callbacks ?? {})
|
||||
return this.executor
|
||||
}
|
||||
}
|
||||
78
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/types.ts
vendored
Normal file
78
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/clado/types.ts
vendored
Normal file
@@ -0,0 +1,78 @@
|
||||
export const CLADO_ACTION_PROVIDER = 'clado-action'
|
||||
|
||||
export const CLADO_PAGE_SCOPED_TOOLS = new Set<string>([
|
||||
'take_screenshot',
|
||||
'evaluate_script',
|
||||
'click',
|
||||
'click_at',
|
||||
'hover',
|
||||
'hover_at',
|
||||
'clear',
|
||||
'fill',
|
||||
'press_key',
|
||||
'type_at',
|
||||
'drag',
|
||||
'drag_at',
|
||||
'scroll',
|
||||
'handle_dialog',
|
||||
'select_option',
|
||||
'navigate_page',
|
||||
'close_page',
|
||||
'wait_for',
|
||||
])
|
||||
|
||||
export interface CladoActionResponse {
|
||||
action?: string | null
|
||||
x?: number
|
||||
y?: number
|
||||
text?: string
|
||||
key?: string
|
||||
direction?: string
|
||||
startX?: number
|
||||
startY?: number
|
||||
endX?: number
|
||||
endY?: number
|
||||
amount?: number
|
||||
time?: number
|
||||
final_answer?: string | null
|
||||
inference_time_seconds?: number
|
||||
raw_response?: string
|
||||
thinking?: string | null
|
||||
parse_error?: string | null
|
||||
}
|
||||
|
||||
export interface CladoViewport {
|
||||
width: number
|
||||
height: number
|
||||
}
|
||||
|
||||
export interface CladoAction {
|
||||
action: string
|
||||
x?: number
|
||||
y?: number
|
||||
text?: string
|
||||
key?: string
|
||||
direction?: string
|
||||
startX?: number
|
||||
startY?: number
|
||||
endX?: number
|
||||
endY?: number
|
||||
amount?: number
|
||||
time?: number
|
||||
final_answer?: string
|
||||
}
|
||||
|
||||
export type RawCladoActionPayload = Partial<
|
||||
Omit<CladoAction, 'final_answer'>
|
||||
> & {
|
||||
final_answer?: string | null
|
||||
}
|
||||
|
||||
export interface CladoActionPoint {
|
||||
x: number
|
||||
y: number
|
||||
}
|
||||
|
||||
export function isCladoActionProvider(provider: string): boolean {
|
||||
return provider === CLADO_ACTION_PROVIDER
|
||||
}
|
||||
60
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
vendored
Normal file
60
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/create-executor-backend.ts
vendored
Normal file
@@ -0,0 +1,60 @@
|
||||
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
|
||||
import type { Browser } from '@browseros/server/browser'
|
||||
import type {
|
||||
ExecutorBackend,
|
||||
ExecutorBackendKind,
|
||||
ExecutorCallbacks,
|
||||
} from '../executor-backend'
|
||||
import { CladoExecutorBackend } from './clado/clado-executor-backend'
|
||||
import { isCladoActionProvider } from './clado/types'
|
||||
import { ToolLoopExecutorBackend } from './tool-loop/tool-loop-executor-backend'
|
||||
|
||||
export interface CreateExecutorBackendOptions {
|
||||
backendKind?: ExecutorBackendKind
|
||||
provider?: string
|
||||
configTemplate?: ResolvedAgentConfig
|
||||
browser?: Browser | null
|
||||
serverUrl?: string
|
||||
windowId?: number
|
||||
tabId?: number
|
||||
initialPageId?: number
|
||||
callbacks?: ExecutorCallbacks
|
||||
executor?: ExecutorBackend
|
||||
}
|
||||
|
||||
export function backendKindForProvider(provider: string): ExecutorBackendKind {
|
||||
return isCladoActionProvider(provider) ? 'clado' : 'tool-loop'
|
||||
}
|
||||
|
||||
/** Creates the backend used for one orchestrator delegation. */
|
||||
export function createExecutorBackend(
|
||||
options: CreateExecutorBackendOptions,
|
||||
): ExecutorBackend {
|
||||
if (options.executor) return options.executor
|
||||
|
||||
const kind =
|
||||
options.backendKind ??
|
||||
backendKindForProvider(
|
||||
options.provider ?? options.configTemplate?.provider ?? '',
|
||||
)
|
||||
|
||||
if (kind === 'clado') {
|
||||
return new CladoExecutorBackend({
|
||||
configTemplate: required(options.configTemplate, 'configTemplate'),
|
||||
serverUrl: required(options.serverUrl, 'serverUrl'),
|
||||
initialPageId: options.initialPageId,
|
||||
callbacks: options.callbacks,
|
||||
})
|
||||
}
|
||||
|
||||
return new ToolLoopExecutorBackend({
|
||||
configTemplate: required(options.configTemplate, 'configTemplate'),
|
||||
browser: options.browser ?? null,
|
||||
callbacks: options.callbacks,
|
||||
})
|
||||
}
|
||||
|
||||
function required<T>(value: T | undefined, name: string): T {
|
||||
if (value === undefined) throw new Error(`${name} is required`)
|
||||
return value
|
||||
}
|
||||
@@ -0,0 +1,144 @@
|
||||
import { randomUUID } from 'node:crypto'
|
||||
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
|
||||
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
|
||||
import type { Browser } from '@browseros/server/browser'
|
||||
import { registry } from '@browseros/server/tools/registry'
|
||||
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
|
||||
import type {
|
||||
DelegationResult,
|
||||
ExecutorBackend,
|
||||
ExecutorCallbacks,
|
||||
} from '../../executor-backend'
|
||||
import { TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT } from './tool-loop-executor-prompt'
|
||||
|
||||
export interface ToolLoopExecutorBackendOptions {
|
||||
configTemplate: ResolvedAgentConfig
|
||||
browser: Browser | null
|
||||
callbacks?: ExecutorCallbacks
|
||||
}
|
||||
|
||||
/** Executes delegated goals through the BrowserOS ToolLoopAgent. */
|
||||
export class ToolLoopExecutorBackend implements ExecutorBackend {
|
||||
readonly kind = 'tool-loop'
|
||||
private stepsUsed = 0
|
||||
private currentUrl = ''
|
||||
|
||||
constructor(private readonly options: ToolLoopExecutorBackendOptions) {}
|
||||
|
||||
async execute(
|
||||
instruction: string,
|
||||
signal?: AbortSignal,
|
||||
): Promise<DelegationResult> {
|
||||
const browser = this.options.browser
|
||||
if (!browser) {
|
||||
throw new Error('Browser instance is required for tool-loop executor')
|
||||
}
|
||||
|
||||
const stepsAtStart = this.stepsUsed
|
||||
const toolsUsed: string[] = []
|
||||
let status: DelegationResult['status'] = 'done'
|
||||
let resultText = ''
|
||||
|
||||
const conversationId = randomUUID()
|
||||
const agentConfig: ResolvedAgentConfig = {
|
||||
...this.options.configTemplate,
|
||||
conversationId,
|
||||
userSystemPrompt: TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT,
|
||||
evalMode: true,
|
||||
workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
|
||||
}
|
||||
|
||||
const browserContext = await this.browserContext(browser)
|
||||
let agent: AiSdkAgent | null = null
|
||||
|
||||
try {
|
||||
agent = await AiSdkAgent.create({
|
||||
resolvedConfig: agentConfig,
|
||||
browser,
|
||||
registry,
|
||||
browserContext,
|
||||
})
|
||||
|
||||
await agent.toolLoopAgent.generate({
|
||||
prompt: instruction,
|
||||
abortSignal: signal,
|
||||
|
||||
experimental_onToolCallStart: ({ toolCall }) => {
|
||||
const input = toolCall.input as Record<string, unknown> | undefined
|
||||
if (input && typeof input.url === 'string' && input.url.length > 0) {
|
||||
this.currentUrl = input.url
|
||||
}
|
||||
this.options.callbacks?.onToolCallStart?.({
|
||||
toolCallId: toolCall.toolCallId,
|
||||
toolName: toolCall.toolName,
|
||||
input: toolCall.input,
|
||||
})
|
||||
},
|
||||
|
||||
experimental_onToolCallFinish: async () => {
|
||||
this.stepsUsed++
|
||||
await this.options.callbacks?.onToolCallFinish?.()
|
||||
},
|
||||
|
||||
onStepFinish: async ({ toolCalls, toolResults, text }) => {
|
||||
if (toolCalls) {
|
||||
for (const toolCall of toolCalls) {
|
||||
if (!toolsUsed.includes(toolCall.toolName)) {
|
||||
toolsUsed.push(toolCall.toolName)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (text) resultText = text
|
||||
|
||||
await this.options.callbacks?.onStepFinish?.({
|
||||
toolCalls,
|
||||
toolResults,
|
||||
text,
|
||||
})
|
||||
},
|
||||
})
|
||||
} catch {
|
||||
status = signal?.aborted ? 'timeout' : 'blocked'
|
||||
} finally {
|
||||
if (agent) await agent.dispose().catch(() => {})
|
||||
}
|
||||
|
||||
if (status === 'done' && signal?.aborted) {
|
||||
status = 'timeout'
|
||||
}
|
||||
|
||||
return {
|
||||
observation: resultText || 'Execution completed with no actions taken.',
|
||||
status,
|
||||
url: this.currentUrl,
|
||||
actionsPerformed: this.stepsUsed - stepsAtStart,
|
||||
toolsUsed,
|
||||
}
|
||||
}
|
||||
|
||||
async close(): Promise<void> {
|
||||
// No persistent resources; AiSdkAgent is disposed at the end of each execute() call.
|
||||
}
|
||||
|
||||
getTotalSteps(): number {
|
||||
return this.stepsUsed
|
||||
}
|
||||
|
||||
private async browserContext(
|
||||
browser: Browser,
|
||||
): Promise<BrowserContext | undefined> {
|
||||
const pages = await browser.listPages()
|
||||
const activePage = pages[0]
|
||||
if (!activePage) return undefined
|
||||
|
||||
return {
|
||||
activeTab: {
|
||||
id: activePage.tabId,
|
||||
pageId: activePage.pageId,
|
||||
url: activePage.url,
|
||||
title: activePage.title,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,21 @@
|
||||
export const TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
|
||||
|
||||
## Your Job
|
||||
1. Execute browser actions to achieve the given goal
|
||||
2. Stop as soon as the goal is accomplished -- do NOT perform extra actions
|
||||
3. Write a final observation describing the result
|
||||
|
||||
## Final Response Format
|
||||
When done, your response MUST include:
|
||||
- What you accomplished (or what went wrong)
|
||||
- What the page currently shows: key headings, links, data, or content visible
|
||||
- The current URL from the address bar
|
||||
- If you got stuck, what is blocking progress
|
||||
|
||||
## Rules
|
||||
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
|
||||
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
|
||||
- If the goal is to click something, confirm the result of the click.
|
||||
- If you cannot find what was asked for, say so clearly -- do not guess or improvise.
|
||||
- Prefer browser_navigate over browser_open_tab for going to URLs.
|
||||
- Do NOT call browser_group_tabs or other organizational tools.`
|
||||
33
packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
vendored
Normal file
33
packages/browseros-agent/apps/eval/src/agents/orchestrated/executor-backend.ts
vendored
Normal file
@@ -0,0 +1,33 @@
|
||||
import type { ExecutorResult } from '../orchestrator-executor/types'
|
||||
|
||||
export type ExecutorBackendKind = 'tool-loop' | 'clado'
|
||||
export type DelegationResult = ExecutorResult
|
||||
|
||||
export interface ToolCallInfo {
|
||||
toolCallId: string
|
||||
toolName: string
|
||||
input: unknown
|
||||
}
|
||||
|
||||
export interface ToolResultInfo {
|
||||
toolCallId: string
|
||||
toolName: string
|
||||
output: unknown
|
||||
}
|
||||
|
||||
export interface ExecutorCallbacks {
|
||||
onToolCallStart?: (toolCall: ToolCallInfo) => void
|
||||
onToolCallFinish?: () => Promise<void>
|
||||
onStepFinish?: (step: {
|
||||
toolCalls?: ReadonlyArray<ToolCallInfo>
|
||||
toolResults?: ReadonlyArray<ToolResultInfo>
|
||||
text?: string
|
||||
}) => Promise<void>
|
||||
}
|
||||
|
||||
export interface ExecutorBackend {
|
||||
readonly kind: ExecutorBackendKind
|
||||
execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
|
||||
close(): Promise<void>
|
||||
getTotalSteps(): number
|
||||
}
|
||||
@@ -1,243 +0,0 @@
|
||||
/**
|
||||
* Executor - Wraps AiSdkAgent for page-level browser actions (direct CDP)
|
||||
*
|
||||
* The executor:
|
||||
* - Receives goal-level instructions from orchestrator
|
||||
* - Executes browser actions until the goal is accomplished
|
||||
* - Returns observation to orchestrator (not full history)
|
||||
*/
|
||||
|
||||
import { randomUUID } from 'node:crypto'
|
||||
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
|
||||
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
|
||||
import type { Browser } from '@browseros/server/browser'
|
||||
import { registry } from '@browseros/server/tools/registry'
|
||||
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
|
||||
import { CladoActionExecutor } from './clado-action-executor'
|
||||
import type { ExecutorResult } from './types'
|
||||
|
||||
const EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.
|
||||
|
||||
## Your Job
|
||||
1. Execute browser actions to achieve the given goal
|
||||
2. Stop as soon as the goal is accomplished — do NOT perform extra actions
|
||||
3. Write a final observation describing the result
|
||||
|
||||
## Final Response Format
|
||||
When done, your response MUST include:
|
||||
- What you accomplished (or what went wrong)
|
||||
- What the page currently shows: key headings, links, data, or content visible
|
||||
- The current URL from the address bar
|
||||
- If you got stuck, what is blocking progress
|
||||
|
||||
## Rules
|
||||
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
|
||||
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
|
||||
- If the goal is to click something, confirm the result of the click.
|
||||
- If you cannot find what was asked for, say so clearly — do not guess or improvise.
|
||||
- Prefer browser_navigate over browser_open_tab for going to URLs.
|
||||
- Do NOT call browser_group_tabs or other organizational tools.`
|
||||
|
||||
export interface ToolCallInfo {
|
||||
toolCallId: string
|
||||
toolName: string
|
||||
input: unknown
|
||||
}
|
||||
|
||||
export interface ToolResultInfo {
|
||||
toolCallId: string
|
||||
toolName: string
|
||||
output: unknown
|
||||
}
|
||||
|
||||
export interface ExecutorCallbacks {
|
||||
onToolCallStart?: (toolCall: ToolCallInfo) => void
|
||||
onToolCallFinish?: () => Promise<void>
|
||||
onStepFinish?: (step: {
|
||||
toolCalls?: ReadonlyArray<ToolCallInfo>
|
||||
toolResults?: ReadonlyArray<ToolResultInfo>
|
||||
text?: string
|
||||
}) => Promise<void>
|
||||
}
|
||||
|
||||
export class Executor {
|
||||
private cladoExecutor: CladoActionExecutor | null = null
|
||||
private stepsUsed = 0
|
||||
private currentUrl = ''
|
||||
private configTemplate: ResolvedAgentConfig
|
||||
private isCladoAction: boolean
|
||||
private browser: Browser | null
|
||||
private serverUrl: string
|
||||
private windowId?: number
|
||||
private tabId?: number
|
||||
private initialPageId?: number
|
||||
private callbacks: ExecutorCallbacks
|
||||
|
||||
constructor(
|
||||
configTemplate: ResolvedAgentConfig,
|
||||
browser: Browser | null,
|
||||
serverUrl: string,
|
||||
options?: {
|
||||
isCladoAction?: boolean
|
||||
windowId?: number
|
||||
tabId?: number
|
||||
initialPageId?: number
|
||||
callbacks?: ExecutorCallbacks
|
||||
},
|
||||
) {
|
||||
this.configTemplate = configTemplate
|
||||
this.isCladoAction = options?.isCladoAction ?? false
|
||||
this.browser = browser
|
||||
this.serverUrl = serverUrl
|
||||
this.windowId = options?.windowId
|
||||
this.tabId = options?.tabId
|
||||
this.initialPageId = options?.initialPageId
|
||||
this.callbacks = options?.callbacks ?? {}
|
||||
}
|
||||
|
||||
async execute(
|
||||
instruction: string,
|
||||
signal?: AbortSignal,
|
||||
): Promise<ExecutorResult> {
|
||||
if (this.isCladoAction) {
|
||||
if (!this.cladoExecutor) {
|
||||
this.cladoExecutor = new CladoActionExecutor(
|
||||
{
|
||||
provider: this.configTemplate.provider,
|
||||
model: this.configTemplate.model,
|
||||
apiKey: this.configTemplate.apiKey ?? '',
|
||||
baseUrl: this.configTemplate.baseUrl,
|
||||
},
|
||||
this.serverUrl,
|
||||
this.windowId,
|
||||
this.tabId,
|
||||
this.initialPageId,
|
||||
)
|
||||
this.cladoExecutor.setCallbacks(this.callbacks)
|
||||
}
|
||||
|
||||
const result = await this.cladoExecutor.execute(instruction, signal)
|
||||
this.stepsUsed = this.cladoExecutor.getTotalSteps()
|
||||
this.currentUrl = result.url || this.currentUrl
|
||||
return result
|
||||
}
|
||||
|
||||
if (!this.browser) {
|
||||
throw new Error('Browser instance is required for standard executor path')
|
||||
}
|
||||
|
||||
const stepsAtStart = this.stepsUsed
|
||||
const toolsUsed: string[] = []
|
||||
let status: 'done' | 'blocked' | 'timeout' = 'done'
|
||||
let resultText = ''
|
||||
|
||||
const conversationId = randomUUID()
|
||||
const agentConfig: ResolvedAgentConfig = {
|
||||
...this.configTemplate,
|
||||
conversationId,
|
||||
userSystemPrompt: EXECUTOR_SYSTEM_PROMPT,
|
||||
evalMode: true,
|
||||
workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
|
||||
}
|
||||
|
||||
// Build browser context so executor agent knows the correct page ID
|
||||
let browserContext: BrowserContext | undefined
|
||||
if (this.browser) {
|
||||
const pages = await this.browser.listPages()
|
||||
const activePage = pages[0]
|
||||
if (activePage) {
|
||||
browserContext = {
|
||||
activeTab: {
|
||||
id: activePage.tabId,
|
||||
pageId: activePage.pageId,
|
||||
url: activePage.url,
|
||||
title: activePage.title,
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let agent: AiSdkAgent | null = null
|
||||
|
||||
try {
|
||||
agent = await AiSdkAgent.create({
|
||||
resolvedConfig: agentConfig,
|
||||
browser: this.browser,
|
||||
registry,
|
||||
browserContext,
|
||||
})
|
||||
|
||||
await agent.toolLoopAgent.generate({
|
||||
prompt: instruction,
|
||||
abortSignal: signal,
|
||||
|
||||
experimental_onToolCallStart: ({ toolCall }) => {
|
||||
const input = toolCall.input as Record<string, unknown> | undefined
|
||||
if (input && typeof input.url === 'string' && input.url.length > 0) {
|
||||
this.currentUrl = input.url
|
||||
}
|
||||
this.callbacks.onToolCallStart?.({
|
||||
toolCallId: toolCall.toolCallId,
|
||||
toolName: toolCall.toolName,
|
||||
input: toolCall.input,
|
||||
})
|
||||
},
|
||||
|
||||
experimental_onToolCallFinish: async () => {
|
||||
this.stepsUsed++
|
||||
await this.callbacks.onToolCallFinish?.()
|
||||
},
|
||||
|
||||
onStepFinish: async ({ toolCalls, toolResults, text }) => {
|
||||
if (toolCalls) {
|
||||
for (const tc of toolCalls) {
|
||||
if (!toolsUsed.includes(tc.toolName)) {
|
||||
toolsUsed.push(tc.toolName)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (text) {
|
||||
resultText = text
|
||||
}
|
||||
|
||||
await this.callbacks.onStepFinish?.({ toolCalls, toolResults, text })
|
||||
},
|
||||
})
|
||||
} catch {
|
||||
if (signal?.aborted) {
|
||||
status = 'timeout'
|
||||
} else {
|
||||
status = 'blocked'
|
||||
}
|
||||
} finally {
|
||||
if (agent) await agent.dispose().catch(() => {})
|
||||
}
|
||||
|
||||
if (status === 'done' && signal?.aborted) {
|
||||
status = 'timeout'
|
||||
}
|
||||
|
||||
const observation =
|
||||
resultText || 'Execution completed with no actions taken.'
|
||||
|
||||
return {
|
||||
observation,
|
||||
status,
|
||||
url: this.currentUrl,
|
||||
actionsPerformed: this.stepsUsed - stepsAtStart,
|
||||
toolsUsed,
|
||||
}
|
||||
}
|
||||
|
||||
async close(): Promise<void> {
|
||||
await this.cladoExecutor?.close()
|
||||
}
|
||||
|
||||
getTotalSteps(): number {
|
||||
if (this.isCladoAction) {
|
||||
return this.cladoExecutor?.getTotalSteps() ?? 0
|
||||
}
|
||||
return this.stepsUsed
|
||||
}
|
||||
}
|
||||
@@ -24,15 +24,16 @@ import {
|
||||
resolveProviderConfig,
|
||||
} from '../../utils/resolve-provider-config'
|
||||
import { withEvalTimeout } from '../../utils/with-eval-timeout'
|
||||
import { isCladoActionProvider } from '../orchestrated/backends/clado/types'
|
||||
import { createExecutorBackend } from '../orchestrated/backends/create-executor-backend'
|
||||
import type { ExecutorCallbacks } from '../orchestrated/executor-backend'
|
||||
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
|
||||
import { Executor, type ExecutorCallbacks } from './executor'
|
||||
import { OrchestratorAgent } from './orchestrator-agent'
|
||||
import type { ExecutorFactory, ExecutorResult } from './types'
|
||||
|
||||
interface ResolvedConfigs {
|
||||
orchestratorConfig: ResolvedAgentConfig & { maxTurns?: number }
|
||||
executorConfig: ResolvedAgentConfig
|
||||
isCladoAction: boolean
|
||||
}
|
||||
|
||||
function toResolvedAgentConfig(
|
||||
@@ -67,7 +68,10 @@ async function resolveAgentConfig(
|
||||
if (!executorModel) {
|
||||
throw new Error('executor.model is required in config')
|
||||
}
|
||||
if (config.executor.provider === 'clado-action' && !config.executor.baseUrl) {
|
||||
if (
|
||||
isCladoActionProvider(config.executor.provider) &&
|
||||
!config.executor.baseUrl
|
||||
) {
|
||||
throw new Error(
|
||||
'executor.baseUrl is required in config for clado-action provider',
|
||||
)
|
||||
@@ -75,10 +79,8 @@ async function resolveAgentConfig(
|
||||
|
||||
const resolvedOrchestrator = await resolveProviderConfig(config.orchestrator)
|
||||
|
||||
const isCladoAction = config.executor.provider === 'clado-action'
|
||||
|
||||
let executorConfig: ResolvedAgentConfig
|
||||
if (isCladoAction) {
|
||||
if (isCladoActionProvider(config.executor.provider)) {
|
||||
executorConfig = {
|
||||
conversationId: crypto.randomUUID(),
|
||||
provider: config.executor.provider as ResolvedAgentConfig['provider'],
|
||||
@@ -107,7 +109,7 @@ async function resolveAgentConfig(
|
||||
maxTurns: config.orchestrator.maxTurns,
|
||||
}
|
||||
|
||||
return { orchestratorConfig, executorConfig, isCladoAction }
|
||||
return { orchestratorConfig, executorConfig }
|
||||
}
|
||||
|
||||
export class OrchestratorExecutorEvaluator implements AgentEvaluator {
|
||||
@@ -127,7 +129,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
|
||||
}
|
||||
|
||||
const agentConfig = config.agent as OrchestratorExecutorConfig
|
||||
const { orchestratorConfig, executorConfig, isCladoAction } =
|
||||
const { orchestratorConfig, executorConfig } =
|
||||
await resolveAgentConfig(agentConfig)
|
||||
|
||||
// Connect to Chrome via CDP — same per-worker offset used by app-manager.
|
||||
@@ -235,12 +237,12 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
|
||||
await capture.messageLogger.logStreamEvent(delegateInputEvent)
|
||||
capture.emitEvent(task.query_id, delegateInputEvent)
|
||||
|
||||
const executor = new Executor(
|
||||
executorConfig,
|
||||
const executor = createExecutorBackend({
|
||||
configTemplate: executorConfig,
|
||||
browser,
|
||||
config.browseros.server_url,
|
||||
{ isCladoAction, callbacks },
|
||||
)
|
||||
serverUrl: config.browseros.server_url,
|
||||
callbacks,
|
||||
})
|
||||
let result: ExecutorResult
|
||||
try {
|
||||
result = await executor.execute(instruction, signal)
|
||||
@@ -329,6 +331,5 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
|
||||
}
|
||||
}
|
||||
|
||||
export { Executor } from './executor'
|
||||
export { OrchestratorAgent } from './orchestrator-agent'
|
||||
export * from './types'
|
||||
|
||||
@@ -57,6 +57,20 @@ export class TrajectorySaver {
|
||||
)
|
||||
}
|
||||
|
||||
async saveAttempt(attempt: Record<string, unknown>): Promise<void> {
|
||||
await writeFile(
|
||||
join(this.outputDir, 'attempt.json'),
|
||||
JSON.stringify(attempt, null, 2),
|
||||
)
|
||||
}
|
||||
|
||||
async saveGrades(graderResults: Record<string, GraderResult>): Promise<void> {
|
||||
await writeFile(
|
||||
join(this.outputDir, 'grades.json'),
|
||||
JSON.stringify(graderResults, null, 2),
|
||||
)
|
||||
}
|
||||
|
||||
async loadMetadata(): Promise<TaskMetadata> {
|
||||
const content = await readFile(
|
||||
join(this.outputDir, 'metadata.json'),
|
||||
@@ -70,6 +84,7 @@ export class TrajectorySaver {
|
||||
): Promise<void> {
|
||||
const metadata = await this.loadMetadata()
|
||||
metadata.grader_results = graderResults
|
||||
await this.saveGrades(graderResults)
|
||||
await this.saveMetadata(metadata)
|
||||
}
|
||||
|
||||
@@ -90,7 +105,10 @@ export class TrajectorySaver {
|
||||
errors: [],
|
||||
warnings: [],
|
||||
agent_config: {
|
||||
type: agentConfig.type as 'single' | 'orchestrator-executor',
|
||||
type: agentConfig.type as
|
||||
| 'single'
|
||||
| 'orchestrator-executor'
|
||||
| 'claude-code',
|
||||
model: agentConfig.model,
|
||||
},
|
||||
grader_results: {},
|
||||
|
||||
170
packages/browseros-agent/apps/eval/src/cli/args.ts
vendored
Normal file
170
packages/browseros-agent/apps/eval/src/cli/args.ts
vendored
Normal file
@@ -0,0 +1,170 @@
|
||||
import { parseArgs } from 'node:util'
|
||||
|
||||
export type PublishTarget = 'r2'
|
||||
|
||||
export interface LegacyCliArgs {
|
||||
command: 'legacy'
|
||||
configPath?: string
|
||||
help?: boolean
|
||||
}
|
||||
|
||||
export interface SuiteCliArgs {
|
||||
command: 'suite'
|
||||
configPath?: string
|
||||
suitePath?: string
|
||||
variantId?: string
|
||||
provider?: string
|
||||
model?: string
|
||||
apiKey?: string
|
||||
baseUrl?: string
|
||||
publishTarget?: PublishTarget
|
||||
}
|
||||
|
||||
export interface RunCliArgs
|
||||
extends Omit<SuiteCliArgs, 'command' | 'publishTarget'> {
|
||||
command: 'run'
|
||||
}
|
||||
|
||||
export interface GradeCliArgs {
|
||||
command: 'grade'
|
||||
runDir: string
|
||||
}
|
||||
|
||||
export interface PublishCliArgs {
|
||||
command: 'publish'
|
||||
runDir: string
|
||||
target: PublishTarget
|
||||
}
|
||||
|
||||
export type EvalCliArgs =
|
||||
| LegacyCliArgs
|
||||
| SuiteCliArgs
|
||||
| RunCliArgs
|
||||
| GradeCliArgs
|
||||
| PublishCliArgs
|
||||
|
||||
const COMMANDS = new Set(['suite', 'run', 'grade', 'publish'])
|
||||
|
||||
function stringValue(value: string | boolean | undefined): string | undefined {
|
||||
return typeof value === 'string' && value.length > 0 ? value : undefined
|
||||
}
|
||||
|
||||
function publishTarget(value: string | undefined): PublishTarget | undefined {
|
||||
if (value === undefined) return undefined
|
||||
if (value === 'r2') return 'r2'
|
||||
throw new Error(`Unsupported publish target: ${value}`)
|
||||
}
|
||||
|
||||
function requireOne(
|
||||
command: string,
|
||||
configPath: string | undefined,
|
||||
suitePath: string | undefined,
|
||||
): void {
|
||||
if (!configPath && !suitePath) {
|
||||
throw new Error(`${command} requires --config or --suite`)
|
||||
}
|
||||
if (configPath && suitePath) {
|
||||
throw new Error(`${command} accepts either --config or --suite, not both`)
|
||||
}
|
||||
}
|
||||
|
||||
function parseSuiteLikeArgs(
|
||||
command: 'suite' | 'run',
|
||||
argv: string[],
|
||||
): SuiteCliArgs | RunCliArgs {
|
||||
const { values } = parseArgs({
|
||||
args: argv,
|
||||
options: {
|
||||
config: { type: 'string' },
|
||||
suite: { type: 'string' },
|
||||
variant: { type: 'string' },
|
||||
provider: { type: 'string' },
|
||||
model: { type: 'string' },
|
||||
'api-key': { type: 'string' },
|
||||
'base-url': { type: 'string' },
|
||||
publish: { type: 'string' },
|
||||
},
|
||||
})
|
||||
|
||||
const configPath = stringValue(values.config)
|
||||
const suitePath = stringValue(values.suite)
|
||||
requireOne(command, configPath, suitePath)
|
||||
|
||||
const parsed: SuiteCliArgs | RunCliArgs =
|
||||
command === 'suite' ? { command: 'suite' } : { command: 'run' }
|
||||
if (configPath) parsed.configPath = configPath
|
||||
if (suitePath) parsed.suitePath = suitePath
|
||||
const variantId = stringValue(values.variant)
|
||||
if (variantId) parsed.variantId = variantId
|
||||
const provider = stringValue(values.provider)
|
||||
if (provider) parsed.provider = provider
|
||||
const model = stringValue(values.model)
|
||||
if (model) parsed.model = model
|
||||
const apiKey = stringValue(values['api-key'])
|
||||
if (apiKey) parsed.apiKey = apiKey
|
||||
const baseUrl = stringValue(values['base-url'])
|
||||
if (baseUrl) parsed.baseUrl = baseUrl
|
||||
|
||||
if (command === 'suite') {
|
||||
const target = publishTarget(stringValue(values.publish))
|
||||
if (target) {
|
||||
const suiteArgs = parsed as SuiteCliArgs
|
||||
suiteArgs.publishTarget = target
|
||||
}
|
||||
}
|
||||
|
||||
return parsed
|
||||
}
|
||||
|
||||
function parseLegacyArgs(argv: string[]): LegacyCliArgs {
|
||||
const { values } = parseArgs({
|
||||
args: argv,
|
||||
options: {
|
||||
config: { type: 'string', short: 'c' },
|
||||
help: { type: 'boolean', short: 'h' },
|
||||
},
|
||||
})
|
||||
|
||||
const parsed: LegacyCliArgs = { command: 'legacy' }
|
||||
const configPath = stringValue(values.config)
|
||||
if (configPath) parsed.configPath = configPath
|
||||
if (values.help) parsed.help = true
|
||||
return parsed
|
||||
}
|
||||
|
||||
/** Parses the eval CLI command without running browser or publishing side effects. */
|
||||
export function parseEvalCliArgs(argv: string[]): EvalCliArgs {
|
||||
const [command, ...rest] = argv
|
||||
if (!COMMANDS.has(command ?? '')) {
|
||||
return parseLegacyArgs(argv)
|
||||
}
|
||||
|
||||
switch (command) {
|
||||
case 'suite':
|
||||
return parseSuiteLikeArgs('suite', rest)
|
||||
case 'run':
|
||||
return parseSuiteLikeArgs('run', rest)
|
||||
case 'grade': {
|
||||
const { values } = parseArgs({
|
||||
args: rest,
|
||||
options: { run: { type: 'string' } },
|
||||
})
|
||||
const runDir = stringValue(values.run)
|
||||
if (!runDir) throw new Error('grade requires --run')
|
||||
return { command: 'grade', runDir }
|
||||
}
|
||||
case 'publish': {
|
||||
const { values } = parseArgs({
|
||||
args: rest,
|
||||
options: { run: { type: 'string' }, target: { type: 'string' } },
|
||||
})
|
||||
const runDir = stringValue(values.run)
|
||||
if (!runDir) throw new Error('publish requires --run')
|
||||
const target = publishTarget(stringValue(values.target))
|
||||
if (!target) throw new Error('publish requires --target')
|
||||
return { command: 'publish', runDir, target }
|
||||
}
|
||||
default:
|
||||
return parseLegacyArgs(argv)
|
||||
}
|
||||
}
|
||||
84
packages/browseros-agent/apps/eval/src/cli/commands/grade.ts
vendored
Normal file
84
packages/browseros-agent/apps/eval/src/cli/commands/grade.ts
vendored
Normal file
@@ -0,0 +1,84 @@
|
||||
import { readdir, readFile, stat } from 'node:fs/promises'
|
||||
import { join } from 'node:path'
|
||||
import { TrajectorySaver } from '../../capture/trajectory-saver'
|
||||
import { runGraders } from '../../grading/grader-runner'
|
||||
import { type Message, MessageSchema, TaskMetadataSchema } from '../../types'
|
||||
import type { GradeCliArgs } from '../args'
|
||||
|
||||
async function loadMessages(taskDir: string): Promise<Message[]> {
|
||||
const content = await readFile(
|
||||
join(taskDir, 'messages.jsonl'),
|
||||
'utf-8',
|
||||
).catch(() => '')
|
||||
return content
|
||||
.split('\n')
|
||||
.filter((line) => line.trim().length > 0)
|
||||
.map((line) => MessageSchema.parse(JSON.parse(line)))
|
||||
}
|
||||
|
||||
async function findTaskDirs(runDir: string): Promise<string[]> {
|
||||
const entries = await readdir(runDir, { withFileTypes: true })
|
||||
const taskDirs: string[] = []
|
||||
for (const entry of entries) {
|
||||
if (!entry.isDirectory()) continue
|
||||
const taskDir = join(runDir, entry.name)
|
||||
const metadata = await stat(join(taskDir, 'metadata.json')).catch(
|
||||
() => null,
|
||||
)
|
||||
if (metadata?.isFile()) taskDirs.push(taskDir)
|
||||
}
|
||||
return taskDirs
|
||||
}
|
||||
|
||||
/** Re-runs graders for task artifacts that already contain metadata and messages. */
|
||||
export async function runGradeCommand(args: GradeCliArgs): Promise<void> {
|
||||
const runStat = await stat(args.runDir).catch(() => null)
|
||||
if (!runStat?.isDirectory()) {
|
||||
throw new Error(`Not a run directory: ${args.runDir}`)
|
||||
}
|
||||
|
||||
const taskDirs = await findTaskDirs(args.runDir)
|
||||
if (taskDirs.length === 0) {
|
||||
throw new Error(`No task metadata found under ${args.runDir}`)
|
||||
}
|
||||
|
||||
let graded = 0
|
||||
for (const taskDir of taskDirs) {
|
||||
const metadata = TaskMetadataSchema.parse(
|
||||
JSON.parse(await readFile(join(taskDir, 'metadata.json'), 'utf-8')),
|
||||
)
|
||||
const graderNames = Object.keys(metadata.grader_results ?? {})
|
||||
if (graderNames.length === 0) {
|
||||
console.warn(`Skipping ${metadata.query_id}: no existing grader names`)
|
||||
continue
|
||||
}
|
||||
|
||||
const messages = await loadMessages(taskDir)
|
||||
const graderResults = await runGraders(graderNames, {
|
||||
task: {
|
||||
query_id: metadata.query_id,
|
||||
query: metadata.query,
|
||||
dataset: metadata.dataset,
|
||||
},
|
||||
messages,
|
||||
screenshotCount: metadata.screenshot_count ?? metadata.total_steps,
|
||||
finalAnswer: metadata.final_answer,
|
||||
taskArtifactDir: taskDir,
|
||||
outputDir: taskDir,
|
||||
mcpUrl: `${process.env.BROWSEROS_SERVER_URL || 'http://127.0.0.1:9110'}/mcp`,
|
||||
})
|
||||
|
||||
await new TrajectorySaver(
|
||||
args.runDir,
|
||||
metadata.query_id,
|
||||
).updateGraderResults(graderResults)
|
||||
graded++
|
||||
}
|
||||
|
||||
if (graded === 0) {
|
||||
throw new Error(
|
||||
`No tasks with existing grader names found under ${args.runDir}`,
|
||||
)
|
||||
}
|
||||
console.log(`Re-graded ${graded} task(s) in ${args.runDir}`)
|
||||
}
|
||||
25
packages/browseros-agent/apps/eval/src/cli/commands/publish.ts
vendored
Normal file
25
packages/browseros-agent/apps/eval/src/cli/commands/publish.ts
vendored
Normal file
@@ -0,0 +1,25 @@
|
||||
import { publishPathToR2 } from '../../publishing/r2-publisher'
|
||||
import type { PublishCliArgs, PublishTarget } from '../args'
|
||||
|
||||
export interface PublishRunOptions {
|
||||
runDir: string
|
||||
target: PublishTarget
|
||||
}
|
||||
|
||||
/** Publishes run artifacts through the R2 viewer upload path. */
|
||||
export async function publishRun(options: PublishRunOptions): Promise<void> {
|
||||
if (options.target !== 'r2') {
|
||||
throw new Error(`Unsupported publish target: ${options.target}`)
|
||||
}
|
||||
const result = await publishPathToR2(options.runDir)
|
||||
for (const run of result.uploadedRuns) {
|
||||
console.log(run.viewerUrl)
|
||||
}
|
||||
for (const runId of result.skippedRuns) {
|
||||
console.log(`${runId}: already uploaded, skipping`)
|
||||
}
|
||||
}
|
||||
|
||||
export async function runPublishCommand(args: PublishCliArgs): Promise<void> {
|
||||
await publishRun({ runDir: args.runDir, target: args.target })
|
||||
}
|
||||
21
packages/browseros-agent/apps/eval/src/cli/commands/run.ts
vendored
Normal file
21
packages/browseros-agent/apps/eval/src/cli/commands/run.ts
vendored
Normal file
@@ -0,0 +1,21 @@
|
||||
import type { RunCliArgs } from '../args'
|
||||
import { runSuiteCommand, type SuiteCommandDeps } from './suite'
|
||||
|
||||
/** Executes tasks from a config or suite without publishing artifacts. */
|
||||
export async function runRunCommand(
|
||||
args: RunCliArgs,
|
||||
deps: SuiteCommandDeps = {},
|
||||
): Promise<void> {
|
||||
await runSuiteCommand(
|
||||
{
|
||||
configPath: args.configPath,
|
||||
suitePath: args.suitePath,
|
||||
variantId: args.variantId,
|
||||
provider: args.provider,
|
||||
model: args.model,
|
||||
apiKey: args.apiKey,
|
||||
baseUrl: args.baseUrl,
|
||||
},
|
||||
deps,
|
||||
)
|
||||
}
|
||||
200
packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
vendored
Normal file
200
packages/browseros-agent/apps/eval/src/cli/commands/suite.ts
vendored
Normal file
@@ -0,0 +1,200 @@
|
||||
import type { RunEvalOptions, RunEvalResult } from '../../runner/types'
|
||||
import { runEval as defaultRunEval } from '../../runs/eval-runner'
|
||||
import {
|
||||
type AdaptedEvalConfig,
|
||||
adaptEvalConfigFile,
|
||||
} from '../../suites/config-adapter'
|
||||
import { loadSuite } from '../../suites/load-suite'
|
||||
import { type EvalVariant, resolveVariant } from '../../suites/resolve-variant'
|
||||
import type { EvalSuite } from '../../suites/schema'
|
||||
import { type EvalConfig, EvalConfigSchema } from '../../types'
|
||||
import type { PublishTarget } from '../args'
|
||||
|
||||
type Env = Record<string, string | undefined>
|
||||
|
||||
export interface SuiteCommandOptions {
|
||||
configPath?: string
|
||||
suitePath?: string
|
||||
variantId?: string
|
||||
provider?: string
|
||||
model?: string
|
||||
apiKey?: string
|
||||
baseUrl?: string
|
||||
publishTarget?: PublishTarget
|
||||
env?: Env
|
||||
}
|
||||
|
||||
export type ResolvedSuiteCommand =
|
||||
| (AdaptedEvalConfig & { kind: 'config'; datasetPath?: undefined })
|
||||
| {
|
||||
kind: 'suite'
|
||||
suitePath: string
|
||||
suite: EvalSuite
|
||||
variant: EvalVariant
|
||||
datasetPath: string
|
||||
evalConfig: EvalConfig
|
||||
}
|
||||
|
||||
export interface SuiteCommandDeps {
|
||||
runEval?: (options: RunEvalOptions) => Promise<RunEvalResult | undefined>
|
||||
publishRun?: (options: {
|
||||
runDir: string
|
||||
target: PublishTarget
|
||||
}) => Promise<void>
|
||||
}
|
||||
|
||||
function ensureRunnableSuite(suite: EvalSuite): void {
|
||||
if (!suite.browseros) {
|
||||
throw new Error('suite browseros config is required to run suite commands')
|
||||
}
|
||||
}
|
||||
|
||||
function suiteToEvalConfig(
|
||||
suite: EvalSuite,
|
||||
datasetPath: string,
|
||||
variant: EvalVariant,
|
||||
env: Env,
|
||||
): EvalConfig {
|
||||
ensureRunnableSuite(suite)
|
||||
|
||||
const base = {
|
||||
dataset: datasetPath,
|
||||
num_workers: suite.workers,
|
||||
restart_server_per_task: suite.restartBrowserPerTask,
|
||||
browseros: suite.browseros,
|
||||
graders: suite.graders,
|
||||
timeout_ms: suite.timeoutMs,
|
||||
captcha: suite.captcha,
|
||||
}
|
||||
|
||||
if (suite.agent.type === 'single' || suite.agent.type === 'tool-loop') {
|
||||
// The legacy runner names the BrowserOS tool-loop agent "single".
|
||||
return EvalConfigSchema.parse({
|
||||
...base,
|
||||
agent: {
|
||||
type: 'single',
|
||||
provider: variant.agent.provider,
|
||||
model: variant.agent.model,
|
||||
apiKey: variant.agent.apiKey,
|
||||
baseUrl: variant.agent.baseUrl,
|
||||
supportsImages: variant.agent.supportsImages,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
if (suite.agent.type === 'claude-code') {
|
||||
return EvalConfigSchema.parse({
|
||||
...base,
|
||||
agent: {
|
||||
type: 'claude-code',
|
||||
...(variant.agent.model && { model: variant.agent.model }),
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
|
||||
const executor =
|
||||
executorBackend === 'clado'
|
||||
? {
|
||||
provider: 'clado-action' as const,
|
||||
model:
|
||||
env.EVAL_EXECUTOR_MODEL ?? env.CLADO_ACTION_MODEL ?? 'clado-action',
|
||||
apiKey: env.EVAL_EXECUTOR_API_KEY ?? env.CLADO_ACTION_API_KEY ?? '',
|
||||
baseUrl:
|
||||
env.EVAL_EXECUTOR_BASE_URL ??
|
||||
env.CLADO_ACTION_BASE_URL ??
|
||||
env.CLADO_ACTION_URL,
|
||||
}
|
||||
: {
|
||||
provider: variant.agent.provider,
|
||||
model: variant.agent.model,
|
||||
apiKey: variant.agent.apiKey,
|
||||
baseUrl: variant.agent.baseUrl,
|
||||
}
|
||||
|
||||
return EvalConfigSchema.parse({
|
||||
...base,
|
||||
agent: {
|
||||
type: 'orchestrator-executor',
|
||||
orchestrator: {
|
||||
provider: variant.agent.provider,
|
||||
model: variant.agent.model,
|
||||
apiKey: variant.agent.apiKey,
|
||||
baseUrl: variant.agent.baseUrl,
|
||||
},
|
||||
executor,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
/** Resolves config-backed or suite-backed CLI input into the run shape used by the runner. */
|
||||
export async function resolveSuiteCommand(
|
||||
options: SuiteCommandOptions,
|
||||
): Promise<ResolvedSuiteCommand> {
|
||||
const env = options.env ?? process.env
|
||||
if (options.configPath) {
|
||||
return {
|
||||
kind: 'config',
|
||||
...(await adaptEvalConfigFile(options.configPath, { env })),
|
||||
}
|
||||
}
|
||||
if (!options.suitePath) {
|
||||
throw new Error('suite requires --config or --suite')
|
||||
}
|
||||
|
||||
const loaded = await loadSuite(options.suitePath)
|
||||
const variant = resolveVariant({
|
||||
variantId: options.variantId,
|
||||
provider:
|
||||
loaded.suite.agent.type === 'claude-code'
|
||||
? 'claude-code'
|
||||
: options.provider,
|
||||
model: options.model,
|
||||
apiKey: options.apiKey,
|
||||
baseUrl: options.baseUrl,
|
||||
env,
|
||||
})
|
||||
|
||||
return {
|
||||
kind: 'suite',
|
||||
suitePath: loaded.suitePath,
|
||||
suite: loaded.suite,
|
||||
variant,
|
||||
datasetPath: loaded.datasetPath,
|
||||
evalConfig: suiteToEvalConfig(
|
||||
loaded.suite,
|
||||
loaded.datasetPath,
|
||||
variant,
|
||||
env,
|
||||
),
|
||||
}
|
||||
}
|
||||
|
||||
/** Runs the full suite loop: resolve input, execute tasks, then optionally publish the run. */
|
||||
export async function runSuiteCommand(
|
||||
options: SuiteCommandOptions,
|
||||
deps: SuiteCommandDeps = {},
|
||||
): Promise<void> {
|
||||
const runEval = deps.runEval ?? defaultRunEval
|
||||
const resolved = await resolveSuiteCommand(options)
|
||||
const runOptions: RunEvalOptions =
|
||||
resolved.kind === 'config'
|
||||
? { configPath: resolved.configPath }
|
||||
: {
|
||||
configPath: resolved.suitePath,
|
||||
dataPath: resolved.datasetPath,
|
||||
config: resolved.evalConfig,
|
||||
}
|
||||
|
||||
const result = await runEval(runOptions)
|
||||
if (!options.publishTarget) return
|
||||
|
||||
const outputDir = result?.outputDir
|
||||
if (!outputDir) {
|
||||
throw new Error('publish requested but runner did not return an outputDir')
|
||||
}
|
||||
if (!deps.publishRun) {
|
||||
throw new Error('publish requested before the publisher is configured')
|
||||
}
|
||||
await deps.publishRun({ runDir: outputDir, target: options.publishTarget })
|
||||
}
|
||||
70
packages/browseros-agent/apps/eval/src/cli/index.ts
vendored
Normal file
70
packages/browseros-agent/apps/eval/src/cli/index.ts
vendored
Normal file
@@ -0,0 +1,70 @@
|
||||
import { startDashboard } from '../dashboard/server'
|
||||
import { runEval } from '../runs/eval-runner'
|
||||
import { type EvalCliArgs, parseEvalCliArgs } from './args'
|
||||
import { runGradeCommand } from './commands/grade'
|
||||
import { publishRun, runPublishCommand } from './commands/publish'
|
||||
import { runRunCommand } from './commands/run'
|
||||
import { runSuiteCommand } from './commands/suite'
|
||||
|
||||
export function usage(): string {
|
||||
return `
|
||||
BrowserOS Eval
|
||||
|
||||
Usage:
|
||||
bun run eval suite --config <config.json> [--publish r2]
|
||||
bun run eval suite --suite <suite.json> --variant <id> [--publish r2]
|
||||
bun run eval run --config <config.json>
|
||||
bun run eval run --suite <suite.json> --variant <id>
|
||||
bun run eval grade --run <results/run-dir>
|
||||
bun run eval publish --run <results/run-dir> --target r2
|
||||
bun run eval -c <config.json>
|
||||
`
|
||||
}
|
||||
|
||||
async function runLegacyCommand(args: EvalCliArgs): Promise<void> {
|
||||
if (args.command !== 'legacy') return
|
||||
if (args.help) {
|
||||
console.log(usage())
|
||||
return
|
||||
}
|
||||
if (args.configPath) {
|
||||
await runEval({ configPath: args.configPath })
|
||||
return
|
||||
}
|
||||
|
||||
startDashboard({
|
||||
tasks: [],
|
||||
configName: '',
|
||||
agentType: '',
|
||||
outputDir: '',
|
||||
configMode: true,
|
||||
})
|
||||
console.log(
|
||||
'Dashboard running at http://localhost:9900 — configure and run from the UI',
|
||||
)
|
||||
await new Promise(() => {})
|
||||
}
|
||||
|
||||
/** Dispatches the eval CLI while preserving the old config/dashboard entry points. */
|
||||
export async function runCli(
|
||||
argv: string[] = Bun.argv.slice(2),
|
||||
): Promise<void> {
|
||||
const args = parseEvalCliArgs(argv)
|
||||
switch (args.command) {
|
||||
case 'legacy':
|
||||
await runLegacyCommand(args)
|
||||
break
|
||||
case 'suite':
|
||||
await runSuiteCommand(args, { publishRun })
|
||||
break
|
||||
case 'run':
|
||||
await runRunCommand(args)
|
||||
break
|
||||
case 'grade':
|
||||
await runGradeCommand(args)
|
||||
break
|
||||
case 'publish':
|
||||
await runPublishCommand(args)
|
||||
break
|
||||
}
|
||||
}
|
||||
@@ -1,5 +1,5 @@
|
||||
import { mkdir, readdir, readFile, stat } from 'node:fs/promises'
|
||||
import { join, resolve } from 'node:path'
|
||||
import { dirname, join, resolve, sep } from 'node:path'
|
||||
import { Hono } from 'hono'
|
||||
import { streamSSE } from 'hono/streaming'
|
||||
import { ParallelExecutor } from '../runner/parallel-executor'
|
||||
@@ -128,6 +128,35 @@ let dashboardConfigMode = false
|
||||
const configsDir = join(import.meta.dir, '..', '..', 'configs')
|
||||
const projectRoot = resolve(import.meta.dir, '..', '..', '..', '..')
|
||||
|
||||
async function listConfigFiles(dir: string, prefix = ''): Promise<string[]> {
|
||||
const entries = await readdir(join(dir, prefix), { withFileTypes: true })
|
||||
const files: string[] = []
|
||||
for (const entry of entries) {
|
||||
const relativePath = prefix ? join(prefix, entry.name) : entry.name
|
||||
if (entry.isDirectory()) {
|
||||
files.push(...(await listConfigFiles(dir, relativePath)))
|
||||
} else if (entry.isFile() && entry.name.endsWith('.json')) {
|
||||
files.push(relativePath.split(sep).join('/'))
|
||||
}
|
||||
}
|
||||
return files.sort()
|
||||
}
|
||||
|
||||
function resolveConfigPath(name: string): string | null {
|
||||
if (!name.endsWith('.json')) return null
|
||||
if (name.split('/').some((part) => !part || part === '.' || part === '..')) {
|
||||
return null
|
||||
}
|
||||
|
||||
const resolvedPath = resolve(configsDir, name)
|
||||
const resolvedConfigsDir = resolve(configsDir)
|
||||
const configRootPrefix = resolvedConfigsDir.endsWith(sep)
|
||||
? resolvedConfigsDir
|
||||
: `${resolvedConfigsDir}${sep}`
|
||||
if (!resolvedPath.startsWith(configRootPrefix)) return null
|
||||
return resolvedPath
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Hono App
|
||||
// ============================================================================
|
||||
@@ -339,21 +368,21 @@ app.get('/api/mode', (c) => {
|
||||
// List saved config files
|
||||
app.get('/api/configs', async (c) => {
|
||||
try {
|
||||
const files = await readdir(configsDir)
|
||||
return c.json(files.filter((f) => f.endsWith('.json')))
|
||||
return c.json(await listConfigFiles(configsDir))
|
||||
} catch {
|
||||
return c.json([])
|
||||
}
|
||||
})
|
||||
|
||||
// Read a specific config file
|
||||
app.get('/api/config/:name', async (c) => {
|
||||
const name = c.req.param('name')
|
||||
if (name.includes('/') || name.includes('..')) {
|
||||
app.get('/api/config/*', async (c) => {
|
||||
const name = decodeURIComponent(c.req.path.slice('/api/config/'.length))
|
||||
const configPath = resolveConfigPath(name)
|
||||
if (!configPath) {
|
||||
return c.json({ error: 'Invalid config name' }, 400)
|
||||
}
|
||||
try {
|
||||
const content = await readFile(join(configsDir, name), 'utf-8')
|
||||
const content = await readFile(configPath, 'utf-8')
|
||||
return c.json(JSON.parse(content))
|
||||
} catch {
|
||||
return c.notFound()
|
||||
@@ -382,8 +411,17 @@ app.post('/api/run', async (c) => {
|
||||
|
||||
const config = parseResult.data
|
||||
|
||||
// Resolve relative paths from configs/ dir (dataset dropdown values are relative to it)
|
||||
const baseDir = configsDir
|
||||
let baseDir = configsDir
|
||||
if (body.configName) {
|
||||
const configPath = resolveConfigPath(body.configName)
|
||||
if (!configPath) {
|
||||
return c.json({ error: 'Invalid config name' }, 400)
|
||||
}
|
||||
baseDir = dirname(configPath)
|
||||
}
|
||||
|
||||
// Resolve relative paths from the loaded config location. Unsaved dashboard
|
||||
// configs keep using apps/eval/configs as their base for dropdown values.
|
||||
const datasetPath = resolve(
|
||||
config.dataset.startsWith('/')
|
||||
? config.dataset
|
||||
|
||||
@@ -685,6 +685,59 @@
|
||||
});
|
||||
}
|
||||
|
||||
// Test harness note: these ASCII section markers are used by r2-viewer-compat.test.ts.
|
||||
// -- Artifact path resolution
|
||||
function taskKey(task) {
|
||||
return task.queryId || task.id || 'unknown-task';
|
||||
}
|
||||
|
||||
function legacyArtifactPath(task, artifact) {
|
||||
const id = taskKey(task);
|
||||
switch (artifact) {
|
||||
case 'attempt':
|
||||
return `${id}/attempt.json`;
|
||||
case 'metadata':
|
||||
return `${id}/metadata.json`;
|
||||
case 'messages':
|
||||
return `${id}/messages.jsonl`;
|
||||
case 'trace':
|
||||
return `${id}/trace.jsonl`;
|
||||
case 'grades':
|
||||
return `${id}/grades.json`;
|
||||
case 'screenshots':
|
||||
return `${id}/screenshots`;
|
||||
case 'graderArtifacts':
|
||||
return `${id}/grader-artifacts`;
|
||||
default:
|
||||
return `${id}/${artifact}`;
|
||||
}
|
||||
}
|
||||
|
||||
function artifactPath(task, artifact) {
|
||||
const manifestPath = task.paths && task.paths[artifact];
|
||||
if (typeof manifestPath === 'string' && manifestPath.length > 0) {
|
||||
return manifestPath.replace(/^\/+/, '');
|
||||
}
|
||||
return legacyArtifactPath(task, artifact);
|
||||
}
|
||||
|
||||
function artifactUrl(task, artifact) {
|
||||
return `${basePath}/${artifactPath(task, artifact)}`;
|
||||
}
|
||||
|
||||
function metadataUrl(task) {
|
||||
return artifactUrl(task, 'metadata');
|
||||
}
|
||||
|
||||
function messagesUrl(task) {
|
||||
return artifactUrl(task, 'messages');
|
||||
}
|
||||
|
||||
function screenshotUrl(task, n) {
|
||||
return `${artifactUrl(task, 'screenshots')}/${n}.png`;
|
||||
}
|
||||
|
||||
// -- Task selection
|
||||
// ── Task selection ─────────────────────────────────────────────
|
||||
function selectTask(task) {
|
||||
stopAutoplay();
|
||||
@@ -716,6 +769,7 @@
|
||||
}
|
||||
}
|
||||
|
||||
// -- Center panel
|
||||
// ── Center panel: screenshot viewer ────────────────────────────
|
||||
function renderCenterPanel(task) {
|
||||
const panel = document.getElementById('center-panel');
|
||||
@@ -763,10 +817,6 @@
|
||||
updateControls();
|
||||
}
|
||||
|
||||
function screenshotUrl(task, n) {
|
||||
return `${basePath}/${task.queryId || task.id}/screenshots/${n}.png`;
|
||||
}
|
||||
|
||||
function goToStep(n) {
|
||||
if (!selectedTask || n < 1 || n > totalSteps) return;
|
||||
currentStep = n;
|
||||
@@ -914,7 +964,7 @@
|
||||
body.innerHTML = '<div class="placeholder"><div class="ph-text" style="color: #6e7681;">Loading messages...</div></div>';
|
||||
countEl.textContent = '';
|
||||
|
||||
const msgUrl = `${basePath}/${task.queryId || task.id}/messages.jsonl`;
|
||||
const msgUrl = messagesUrl(task);
|
||||
|
||||
fetch(msgUrl)
|
||||
.then((res) => {
|
||||
@@ -1075,7 +1125,7 @@
|
||||
|
||||
// ── Load task metadata for rich grader details ──────────────────
|
||||
function loadTaskMetadata(task) {
|
||||
const metaUrl = `${basePath}/${task.queryId || task.id}/metadata.json`;
|
||||
const metaUrl = metadataUrl(task);
|
||||
fetch(metaUrl)
|
||||
.then((res) => res.ok ? res.json() : null)
|
||||
.then((meta) => {
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user