Mirror of https://github.com/browseros-ai/BrowserOS.git, synced 2026-05-14 08:03:58 +00:00

Compare commits: fix/db-pat...fix/eval-4 (17 Commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 6ee306236e | | |
| | 0afc59cda1 | | |
| | eb8faa931a | | |
| | be70170313 | | |
| | 0661197f5b | | |
| | c4e7824266 | | |
| | 22f71a36c5 | | |
| | d49986d0b3 | | |
| | acdd394585 | | |
| | 219fdf1e28 | | |
| | 014f71d227 | | |
| | 876dea4d56 | | |
| | fca7d4cbcb | | |
| | e1bfadb075 | | |
| | aa0d9b96ef | | |
| | 1c9604b5fa | | |
| | 685266a1d8 | | |
@@ -1,152 +0,0 @@
---
name: ask-internal
description: Answer questions about BrowserOS internal stuff (setup, features, architecture, design decisions) by reading the private internal-docs submodule and the codebase. Use for "how do I X", "where is Y", "what is the deal with Z", or any question that mixes ops/setup knowledge with code knowledge. Can execute steps with per-command confirmation.
allowed-tools: Bash, Read, Grep, Glob, Edit, Write
---

# Ask Internal

Answer team-internal questions by reading `.internal-docs/` and the codebase, synthesizing a direct answer with file:line citations, and optionally running surfaced commands with confirmation.

**Announce at start:** "I'm using the ask-internal skill to answer this from internal-docs and the codebase."

## When to use

- "How do I reset my dogfood profile?"
- "What's the deal with the OpenClaw VM startup?"
- "Where do we configure release signing?"
- Any question whose answer lives in setup runbooks, feature notes, architecture docs, or the code that produced them.

## Hard rules — never do these

- NEVER execute a state-mutating command without per-command `y` confirmation from the user.
- NEVER edit BrowserOS code in response to an ask-internal question. The skill answers; it does not modify code. Use `/document-internal` for writes.
- NEVER guess. If grep finds nothing useful in docs or code, say so plainly.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER cite a file or line number you have not actually read.

## Voice rules

Apply the same voice rules as `document-internal` to the synthesized answer:

- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences. Active voice. No em dashes.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- No filler intros.

## Workflow

### Step 0: Pre-flight

```bash
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
  exit 0
fi
[ -d .internal-docs ] && [ -n "$(ls -A .internal-docs 2>/dev/null)" ] || {
  echo ".internal-docs/ missing or empty. Submodule not configured?"
  exit 0
}
```

### Step 1: Parse the question

Pull the keywords from the user's question. Drop stop words. Identify intent:

- **Setup-question** ("how do I", "how to", "where do I configure"): bias the search toward `setup/`.
- **Feature-question** ("what is X", "why does X work this way"): bias toward `features/` and `architecture/`.
- **Free-form** ("anything about Y"): search all categories.
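
The intent routing above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: `classify_intent` and `search_dirs_for` are hypothetical helper names, and the trigger phrases are taken only from the examples in the list.

```bash
# Hypothetical helper (not part of the skill): map a question to one of the
# three intents from Step 1 using the trigger phrases quoted in the list.
classify_intent() {
  local q
  # Lowercase the question so matching is case-insensitive.
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    *"how do i"*|*"how to"*|*"where do i configure"*) echo "setup" ;;
    *"what is"*|*"why does"*)                         echo "feature" ;;
    *)                                                echo "free-form" ;;
  esac
}

# Hypothetical helper: directories to search first for a given intent.
search_dirs_for() {
  case "$1" in
    setup)   echo ".internal-docs/setup" ;;
    feature) echo ".internal-docs/features .internal-docs/architecture" ;;
    *)       echo ".internal-docs" ;;
  esac
}
```

Anything that matches no trigger phrase falls through to a search of every category, so a misclassified question still gets searched, just without the bias.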

### Step 2: Multi-source search

Run grep in parallel across two sources.

**Internal docs:**

```bash
grep -rni --include='*.md' '<keyword>' .internal-docs/
```

Search each keyword separately. Collect top hits by relevance (more keyword matches = higher).
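
One way to read "more keyword matches = higher" is to count how many distinct keywords each file matches. The sketch below assumes that reading; `rank_hits` is a hypothetical name, and it matches keywords literally (`-F`) rather than as regular expressions.

```bash
# rank_hits DIR KEYWORD...
# Prints "<count> <file>" for each Markdown file under DIR, where <count> is
# the number of distinct keywords the file matches, highest count first.
rank_hits() {
  local dir=$1 kw
  shift
  for kw in "$@"; do
    # -l lists each matching file once per keyword; -i ignores case.
    grep -rliF --include='*.md' -- "$kw" "$dir" 2>/dev/null
  done \
    | sort | uniq -c \
    | sort -k1,1nr -k2 \
    | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c, $0 }'
}
```

For example, `rank_hits .internal-docs dogfood profile reset` would list files that mention all three words before files that mention only one.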

**Codebase (skip vendored Chromium and `node_modules`):**

```bash
grep -rni --include='*.ts' --include='*.tsx' --include='*.js' --include='*.json' --include='*.sh' \
  --exclude-dir=node_modules --exclude-dir=chromium --exclude-dir=.grove \
  '<keyword>' packages/ scripts/ .config/ .github/
```

Read the top 3-5 doc hits and top 3-5 code hits. Do not skim — read the relevant section fully so citations are accurate.

### Step 3: Synthesize answer

Structure the response:

1. **Direct answer.** First sentence answers the question. No preamble.
2. **Steps if applicable.** Numbered list with exact commands.
3. **Citations.** Every factual claim references `path/to/file.md:42` or `path/to/code.ts:117`. Run the voice self-check before printing.

If multiple docs cover the topic at different layers (e.g., a setup runbook and a feature note both mention dogfood profiles), reconcile them in the answer rather than dumping both.

### Step 4: Offer execution (only if commands surfaced)

If Step 3 produced executable commands the user could run, ask:

> Run these for you? (y / n / dry-run)

- **y:** Execute one at a time. For any command that mutates state (writes a file, modifies config, kills a process, deletes anything), ask "run this? <command>" before each. Read-only commands (`ls`, `cat`, `git status`) run without per-command confirmation but still print before running.
- **n:** Skip. Done.
- **dry-run:** Print the full sequence as a `bash` block. Do not execute.
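
The "y" path above could be gated along these lines. This is a sketch under a stated assumption: the allowlist holds only the three read-only commands the text names, and everything else counts as mutating. `is_read_only` and `run_with_confirmation` are hypothetical names.

```bash
# Hypothetical allowlist check: only ls, cat, pwd-free basics named in the
# text pass; any redirection, pipe, chaining, or substitution is treated as
# mutating because it can hide a write.
is_read_only() {
  local cmd=$1
  case "$cmd" in
    *'>'*|*'|'*|*';'*|*'&'*|*'$('*|*'`'*) return 1 ;;
  esac
  case "$cmd" in
    ls|"ls "*|"cat "*|"git status"|"git status "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every command before running it; ask first when it may mutate state.
run_with_confirmation() {
  local cmd=$1 reply
  printf '+ %s\n' "$cmd"
  if ! is_read_only "$cmd"; then
    printf 'run this? %s [y/N] ' "$cmd"
    read -r reply
    [ "$reply" = "y" ] || { echo "skipped"; return 0; }
  fi
  bash -c "$cmd"
}
```

Defaulting unknown commands to "mutating" costs an extra prompt now and then, but it never runs a write without asking.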

### Step 5: Doc-not-found path

If Step 2 returned nothing useful (no doc hits AND no clear code answer):

1. Tell the user: "No doc covers this. Tangentially relevant files: <list>."
2. Ask: "Draft a new doc and open a PR to internal-docs?"
3. On yes: invoke the full `/document-internal` flow (four sharp questions, draft, voice check, PR), forced to `setup/` doc type, with the code-grep findings handed in as initial context.

### Step 6: Completion status

Report one of:

- **DONE** — answer delivered, citations verified.
- **DONE_WITH_CONCERNS** — answered, but flag uncertainty (e.g., docs and code disagreed; user should reconcile).
- **BLOCKED** — submodule missing or other pre-flight failure.
- **NEEDS_CONTEXT** — question too vague to search effectively. Ask one clarifying question.

## Citation discipline

Every "X is at Y" claim in the answer must point to a file:line that the skill actually read. Do not approximate. If you didn't read it, don't cite it.
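
A mechanical check can back up this rule. The sketch below is an assumption about how one might verify a citation, not something the skill ships: `verify_citation` is a hypothetical helper that confirms the cited line exists and contains the text the answer attributes to it.

```bash
# verify_citation PATH:LINE TEXT
# Returns 0 if PATH exists and line LINE contains TEXT,
#         1 if the file or the text is missing,
#         2 if the citation is not in PATH:LINE form.
verify_citation() {
  local cite=$1 text=$2 file line content
  file=${cite%:*}
  line=${cite##*:}
  case "$line" in
    ''|*[!0-9]*) return 2 ;;          # no numeric line part
  esac
  [ "$line" -gt 0 ] || return 2       # line numbers start at 1
  [ -f "$file" ] || return 1
  content=$(sed -n "${line}p" "$file")
  case "$content" in
    *"$text"*) return 0 ;;
    *)         return 1 ;;
  esac
}
```

For example, `verify_citation packages/cli/src/cleanup.ts:47 'dogfood'` would fail if line 47 no longer mentions the dogfood path, which is the signal to re-read the file before citing it.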

If a doc says one thing and the code says another, surface the conflict explicitly:

> The setup runbook (`setup/dogfood-profile.md:23`) says to delete `~/.cache/browseros/dogfood`, but the actual code path in `packages/cli/src/cleanup.ts:47` removes `~/.local/share/browseros/dogfood`. The doc looks stale. Recommend updating it.

## Common Mistakes

**Skimming and then citing**
- **Problem:** Citation points to a line that doesn't actually contain the claim.
- **Fix:** Read the section fully before citing. If you didn't read line 117, don't cite line 117.

**Executing without per-command confirmation for mutations**
- **Problem:** User says "y" to "run all", skill blasts through `rm -rf`-style commands.
- **Fix:** "y" means "run this sequence with per-mutation confirmations". Per-command y is required for writes.

**Searching only docs, not code**
- **Problem:** Doc says X but code does Y; answer is wrong.
- **Fix:** Always grep both sources in Step 2.

## Red Flags

**Never:**
- Cite a file:line you haven't read.
- Run mutations without per-command confirmation.
- Modify BrowserOS code from this skill (use `/document-internal` for writes).

**Always:**
- Pre-flight check before any search.
- Reconcile doc vs code conflicts in the answer, don't hide them.
- Plain "no doc covers this" when grep is empty — never invent.
@@ -1,208 +0,0 @@
---
name: document-internal
description: Draft a 1-page internal doc (feature, architecture, or design) for the private browseros-ai/internal-docs repo. Use when wrapping up a feature on a branch, after the PR is open or about to be opened. Skill drafts from the diff, asks four sharp questions, enforces voice rules, and opens a PR to internal-docs.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob
---

# Document Internal

Draft a 1-page internal doc (feature note, architecture note, or design spec) from the current branch's diff and open a PR to `browseros-ai/internal-docs`.

**Announce at start:** "I'm using the document-internal skill to draft a doc for internal-docs."

## When to use

After finishing implementation on a feature branch, when the work is doc-worthy (a major feature, a new subsystem, a setup runbook for something internal, or a design decision that future engineers need to know).

## Hard rules — never do these

- NEVER `git add -A` or `git add .` inside the tmp clone of internal-docs. Always specific paths.
- NEVER write outside the tmp clone (no spillover into the OSS repo's working tree).
- NEVER fabricate filler content for empty template sections. Empty stays empty.
- NEVER touch the OSS repo's `.gitmodules` or submodule pointer — the sync workflow handles that.
- NEVER run this skill if `.internal-docs/` is missing. Stop with the init command.
- NEVER push to `internal-docs/main` directly. Always a feature branch + PR.

## Voice rules — enforced by Step 4

The skill MUST follow these and refuse to draft otherwise. After generation, scan for violations and regenerate offending sentences (max 3 attempts).

- Lead with the point. First sentence answers "what is this?"
- Concrete nouns. Name files, functions, commands. Not "the system" or "the component".
- Short sentences. Average <20 words. No deeply nested clauses.
- Active voice. "X does Y" not "Y is done by X".
- No em dashes. Use commas, periods, or rephrase.
- Banned words: delve, crucial, robust, comprehensive, nuanced, multifaceted, furthermore, moreover, additionally, pivotal, landscape, tapestry, underscore, foster, showcase, intricate, vibrant, fundamental, significant, leverage, utilize.
- "110 IQ" target. Write for a smart engineer who has not seen this code yet.
- No filler intros ("This document describes..."). Start with the substance.
- Empty sections stay empty. Do not write "N/A" or fabricate content.

## Workflow

### Step 0: Pre-flight

Bail with a clear message on any failure.

```bash
# Submodule must be initialized
if git submodule status .internal-docs 2>/dev/null | grep -q '^-'; then
  echo "internal-docs submodule not initialized. Run: git submodule update --init .internal-docs"
  exit 0
fi
[ -d .internal-docs ] || { echo ".internal-docs/ missing. Submodule not configured?"; exit 0; }

# Must be on a feature branch
BRANCH=$(git branch --show-current)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "dev" ]; then
  echo "On $BRANCH. Run from a feature branch."
  exit 0
fi

# Determine base branch (default: dev for this repo, fall back to main).
# Suppress rev-parse's SHA output on stdout so it doesn't get captured into BASE.
# Use the remote-tracking ref that was verified, so the log and diff below
# still work in a checkout that has no local dev or main branch.
BASE=$(git rev-parse --verify origin/dev >/dev/null 2>&1 && echo origin/dev || echo origin/main)

# Gather context
git log "$BASE..HEAD" --oneline
git diff "$BASE...HEAD" --stat
gh pr view --json body -q .body 2>/dev/null  # may be empty if no PR yet
```

### Step 1: Identify the doc

Ask the user for three things in one prompt:

1. **Doc type:** `feature` (default for `feat/*` branches), `architecture`, or `design`
2. **Slug:** kebab-case, short (e.g., `cowork-mcp`, `auto-skill-suggest`)
3. **Owner:** GitHub handle (default = `git config user.name` or current `gh api user --jq .login`)

### Step 2: Decision brief — four sharp questions

Ask one question at a time. Each answer constrains the next. These force compression before drafting.

1. "In one sentence: what can someone now DO that they could not before?"
2. "What is the one design decision a future engineer needs to know?"
3. "Which 3-5 files are the heart of this change?" (suggest candidates from the diff)
4. "Any sharp edges or gotchas? (or 'none')"

Skip any question that is N/A for the doc type. Architecture notes don't need question 1; design specs don't need question 4.

### Step 3: Draft from the template

Read the matching template from `.internal-docs/_templates/`:

- `feature` → `feature-note.md`
- `architecture` → `architecture-note.md`
- `design` → `design-spec.md`

If `.internal-docs/_templates/` does not exist (first run, before seeding), fall back to the seeds bundled with this skill at `.claude/skills/document-internal/seeds/_templates/`.

Generate the 1-pager from the template, the four answers, and the diff context.

### Step 4: Voice self-check

Scan the draft for violations:

- Em dash present (`—`).
- Any banned word from the list.
- Average sentence length > 20 words.
- Body line count > 60 (feature notes only — architecture/design have no cap).

If any violation found, regenerate the offending sentences in place. Max 3 attempts. If still failing after 3 attempts, stop and report which rules are violated.
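
The first three checks in this scan are mechanical enough to script. The sketch below is one possible approximation, not the skill's own checker: `voice_check` is a hypothetical name, sentences are counted by runs of `.`, `!`, or `?` (so a file name like `cleanup.ts` inflates the sentence count), and banned words match whole words only, so inflected forms slip through.

```bash
BANNED_WORDS='delve|crucial|robust|comprehensive|nuanced|multifaceted|furthermore|moreover|additionally|pivotal|landscape|tapestry|underscore|foster|showcase|intricate|vibrant|fundamental|significant|leverage|utilize'

# voice_check FILE
# Prints one line per violated rule and returns non-zero if any rule fails.
voice_check() {
  local file=$1 failed=0 words sentences em_dash
  em_dash=$(printf '\342\200\224')   # UTF-8 bytes of U+2014
  if grep -q "$em_dash" "$file"; then
    echo "em dash present"; failed=1
  fi
  if grep -qiwE "$BANNED_WORDS" "$file"; then
    echo "banned word present"; failed=1
  fi
  # Arithmetic expansion strips the padding some wc implementations add.
  words=$(( $(wc -w < "$file") ))
  sentences=$(( $(grep -oE '[.!?]+' "$file" | wc -l) ))
  [ "$sentences" -gt 0 ] || sentences=1
  # "average > 20" rewritten without division to avoid integer rounding.
  if [ "$words" -gt $((20 * sentences)) ]; then
    echo "average sentence length over 20 words"; failed=1
  fi
  return "$failed"
}
```

A non-zero return maps onto the "regenerate the offending sentences" branch; the printed lines tell the caller which rule to report if three attempts fail.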

If the body is over 60 lines for a feature note, ask: "This is N lines, target is 60. Trim, or promote to `architecture/` (no length cap)?"
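
The text does not say how "body" lines are counted. The sketch below assumes the body is everything after the YAML front matter and that blank lines count; `body_line_count` is a hypothetical helper.

```bash
# body_line_count FILE
# Counts lines after the closing '---' of a leading YAML front matter block.
# If the file has no front matter, every line counts as body.
body_line_count() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # front matter opens on line 1
    in_fm && $0 == "---"   { in_fm = 0; next }   # front matter closes
    !in_fm                 { n++ }
    END                    { print n + 0 }
  ' "$1"
}
```

A caller would compare the result against 60 for feature notes only and fill in the "N" of the prompt with the printed value.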

### Step 5: Show + iterate

Print the full draft. Ask:

> Edit needed? Paste any changes, or say "looks good".

Apply user edits with the Edit tool. Re-run Step 4. Loop until the user approves.

### Step 6: Open PR to internal-docs

Use a tmp clone. Never the user's `.internal-docs` checkout — keeps the user's submodule clean.

```bash
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT  # cleans up even if any step below fails
git clone -b main git@github.com:browseros-ai/internal-docs.git "$TMP"
cd "$TMP"
git checkout -b "docs/<slug>"

# Write the doc
mkdir -p "<type>"  # features, architecture, designs, or setup
cat > "<type>/$(date -u +%Y-%m)-<slug>.md" <<'DOC'
<draft content>
DOC

# Update the root README index — insert one line under the matching section
# Use Edit tool to add: "- [<title>](<type>/YYYY-MM-<slug>.md) — <one-line description>"

git add "<type>/$(date -u +%Y-%m)-<slug>.md" README.md
git commit -m "docs(<type>): <slug>"
git push -u origin "docs/<slug>"

PR_URL=$(gh pr create -R browseros-ai/internal-docs --base main \
  --head "docs/<slug>" \
  --title "docs(<type>): <slug>" \
  --body "$(cat <<'BODY'
## Summary
<one-line of what this doc covers>

## Source
- BrowserOS branch: <branch>
- Related PR: <#NNN if any>
BODY
)")

cd -
echo "PR opened: $PR_URL"
# trap above cleans up $TMP on EXIT
```

If the slug contains characters that won't shell-escape cleanly, sanitize before substitution.
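
The text does not define what "sanitize" means here. A conservative sketch, assuming the kebab-case rule from Step 1 is the target, keeps only lowercase letters and digits and joins everything else with single hyphens. `sanitize_slug` is a hypothetical helper.

```bash
# sanitize_slug RAW
# Lowercases RAW, replaces every run of characters outside [a-z0-9] with one
# hyphen, and trims leading and trailing hyphens.
sanitize_slug() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}
```

Because the output contains only `[a-z0-9-]`, it is safe to drop into the branch name, file path, and commit message in the block above. An input with no letters or digits yields an empty string, which the caller should reject.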

### Step 7: Completion status

Report one of:

- **DONE** — file written, branch pushed, PR opened. Print PR URL.
- **DONE_WITH_CONCERNS** — same as DONE but list concerns (e.g., voice check needed multiple regens, user skipped a question).
- **BLOCKED** — submodule missing, auth fail, or template missing. State exactly what's needed.

## Doc type defaults

| Branch pattern | Default doc type | Default location |
|----------------|------------------|------------------|
| `feat/*` | feature | `features/` |
| `arch/*` or refactor branches with >10 files in `packages/` | architecture | `architecture/` |
| `rfc/*` or `design/*` | design | `designs/` |
| Otherwise | ask | ask |
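
The table translates directly into a `case` statement. In the sketch below, `default_doc_type` is a hypothetical helper, and treating "refactor branches" as the `refactor/*` prefix is an assumption; the table does not name a pattern for them. The second argument is the number of changed files under `packages/`, which the caller is expected to compute.

```bash
# default_doc_type BRANCH [CHANGED_FILES_IN_PACKAGES]
# Prints "<doc type> <location>", or "ask ask" when the user must choose.
default_doc_type() {
  local branch=$1 changed=${2:-0}
  case "$branch" in
    feat/*)         echo "feature features/" ;;
    arch/*)         echo "architecture architecture/" ;;
    rfc/*|design/*) echo "design designs/" ;;
    refactor/*)     # assumed prefix for "refactor branches"
      if [ "$changed" -gt 10 ]; then
        echo "architecture architecture/"
      else
        echo "ask ask"
      fi ;;
    *)              echo "ask ask" ;;
  esac
}
```

For example, `default_doc_type feat/cowork-mcp` prints `feature features/`, while a branch such as `fix/db-pat` falls through to `ask ask`.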

## Common Mistakes

**Drafting before asking the four questions**
- **Problem:** Output is generic filler that says nothing concrete.
- **Fix:** Always ask Step 2 first, even if the diff "looks obvious".

**Touching `.internal-docs/` directly**
- **Problem:** User's submodule HEAD moves, parent repo shows dirty state.
- **Fix:** Always use the tmp clone in Step 6.

**Skipping voice check on user edits**
- **Problem:** User pastes prose with em dashes or filler; ships as-is.
- **Fix:** Re-run Step 4 after every user edit.

## Red Flags

**Never:**
- Push to `internal-docs/main`. Always branch + PR.
- Modify the OSS repo's `.gitmodules` or submodule pointer.
- Fabricate content for empty template sections.

**Always:**
- Pre-flight check before doing any work.
- One-pager rule for feature notes (60-line body cap).
- File:line citations when referencing code.
@@ -1,51 +0,0 @@
# BrowserOS Internal Docs

Private team docs for `browseros-ai`. Mounted as a submodule into the public OSS repo at `.internal-docs/`.

If you are reading this from a public clone of BrowserOS without team access — this submodule is for the BrowserOS internal team. Nothing here is required to build or use BrowserOS.

## How to find what you need

- Setup task ("how do I X locally") → look in [`setup/`](setup/)
- Recently shipped feature → look in [`features/`](features/)
- Cross-cutting subsystem → look in [`architecture/`](architecture/)
- A design decision or RFC → look in [`designs/`](designs/)

Or run `/ask-internal "<your question>"` from any BrowserOS checkout. The skill greps these docs and the codebase, then synthesizes an answer with citations.

## How to add a doc

Run `/document-internal` from a feature branch. The skill drafts a 1-pager from your branch's diff, asks four sharp questions, enforces voice rules, and opens a PR back to this repo.

## Index

### Setup
<!-- one line per setup runbook: -->
<!-- - [Dev environment](setup/dev-environment.md): first-time machine setup -->

### Features
<!-- one line per shipped feature, newest first: -->
<!-- - [Cowork MCP](features/2026-04-cowork-mcp.md): bring outside MCPs into the BrowserOS agent -->

### Architecture
<!-- one line per cross-cutting subsystem: -->
<!-- - [Chrome fork overview](architecture/chrome-fork-overview.md): what we patched and why -->

### Designs
<!-- one line per design spec, newest first: -->
<!-- - [Internal docs submodule](designs/2026-04-30-internal-docs-submodule.md): this system -->

## Templates

When `/document-internal` runs, it reads from [`_templates/`](_templates/). Edit the templates here when the team's preferred shape changes.

## Voice

Docs in this repo follow these rules. The `/document-internal` skill enforces them; humans editing by hand should match.

- Lead with the point.
- Concrete nouns. Name files, functions, commands.
- Short sentences, active voice, no em dashes.
- No filler words: delve, crucial, robust, comprehensive, nuanced, multifaceted, leverage, utilize, etc.
- Empty sections stay empty. Do not write "N/A" or fake content.
- Feature notes target one screen, body 60 lines max.
@@ -1,31 +0,0 @@
---
title: <subsystem name>
owner: <github handle>
status: current | deprecated
date: YYYY-MM-DD
related-features: [feature-slug-1, feature-slug-2]
---

# <subsystem name>

## What this subsystem does
<1-2 paragraphs. The top-level responsibility. Boundaries.>

## Architecture
<Diagram (ASCII or mermaid) plus prose. Components and how they talk.>

## Constraints
<Hard rules the design enforces. "X must never call Y" type statements.>

## Decisions made
<Numbered list of non-obvious decisions and the reason for each.>

## Key files
- `path/to/file.ts` — role
- `path/to/dir/` — what lives here

## How to evolve this
<Where to add things. Which tests to expect to update. What NOT to touch.>

## Open questions
<What is still being figured out. Empty if none.>
@@ -1,34 +0,0 @@
---
title: <design name>
owner: <github handle>
status: proposed | accepted | rejected | superseded
date: YYYY-MM-DD
supersedes: <design-slug or none>
---

# <design name>

## Goal
<2-4 sentences. What this design is trying to accomplish.>

## Context
<1-2 paragraphs. The current state, what is failing, why this needs to change.>

## Selected Approach
<The chosen design at a high level. Architecture, components, data flow.>

## Alternatives Considered
### 1. <name>
<2-3 sentences on what this would look like, then pro/con and why rejected (or deferred).>

### 2. <name>
<Same shape.>

## Out of Scope
<What this design does NOT cover. Defer references.>

## Rollout
<Numbered steps from "nothing exists" to "fully shipped".>

## Open Questions
<Resolved during design? Empty. Unresolved? List with owner.>
@@ -1,29 +0,0 @@
---
title: <feature name>
owner: <github handle>
status: shipped | wip | deprecated
date: YYYY-MM-DD
prs: ["#NNN"]
tags: [agent, browser, mcp]
---

# <feature name>

## What it does
<2-3 sentences. What can someone now do that they could not before. Lead with user-facing impact, not implementation.>

## Why we built it
<1-2 sentences. Motivation. What pain it removed or what unlocked.>

## How it works
<3-6 sentences. The flow at a high level. Name the key files.>

## Key files
- `path/to/file.ts` — what it does
- `path/to/other.ts` — what it does

## How to run / test it locally
<bullet list of commands. Empty section if N/A — do not fake.>

## Gotchas
<known sharp edges. "If you see X, that's why." Empty if N/A.>
167
.github/workflows/publish-vm-agent-cache.yml
vendored
Normal file
@@ -0,0 +1,167 @@
name: Publish VM Agent Cache

on:
  workflow_dispatch:
    inputs:
      agent:
        description: "Agent name from bundle.json"
        required: true
        type: string
        default: openclaw
      publish:
        description: "Upload to R2 and merge manifest slice"
        required: false
        default: false
        type: boolean
  pull_request:
    paths:
      - "packages/browseros-agent/packages/build-tools/**"
      - ".github/workflows/publish-vm-agent-cache.yml"

env:
  BUN_VERSION: "1.3.6"
  PKG_DIR: packages/browseros-agent/packages/build-tools

permissions:
  contents: read

jobs:
  check:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: ${{ env.BUN_VERSION }}
      - working-directory: packages/browseros-agent
        run: bun install --frozen-lockfile
      - working-directory: packages/browseros-agent
        run: bun run --filter @browseros/build-tools typecheck
      - working-directory: packages/browseros-agent
        run: bun run --filter @browseros/build-tools test

  build:
    needs: check
    strategy:
      fail-fast: false
      matrix:
        include:
          - arch: arm64
            runner: ubuntu-24.04-arm
          - arch: x64
            runner: ubuntu-24.04
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: ${{ env.BUN_VERSION }}
      - name: Install podman
        run: |
          sudo apt-get update
          sudo apt-get install -y podman
      - working-directory: packages/browseros-agent
        run: bun install --frozen-lockfile
      - name: Build tarball
        working-directory: ${{ env.PKG_DIR }}
        env:
          AGENT: ${{ inputs.agent || 'openclaw' }}
          OUT: ${{ github.workspace }}/dist/images
        run: bun run build:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --output-dir "$OUT"
      - uses: actions/upload-artifact@v4
        with:
          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
          path: dist/images/
          retention-days: 7

  smoke:
    needs: build
    strategy:
      fail-fast: false
      matrix:
        include:
          - arch: arm64
            runner: ubuntu-24.04-arm
          - arch: x64
            runner: ubuntu-24.04
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: ${{ env.BUN_VERSION }}
      - uses: actions/download-artifact@v4
        with:
          name: tarball-${{ inputs.agent || 'openclaw' }}-${{ matrix.arch }}
          path: dist/images
      - name: Install podman
        run: |
          sudo apt-get update
          sudo apt-get install -y podman
      - working-directory: packages/browseros-agent
        run: bun install --frozen-lockfile
      - name: Smoke test tarball
        working-directory: ${{ env.PKG_DIR }}
        env:
          AGENT: ${{ inputs.agent || 'openclaw' }}
        run: |
          set -euo pipefail
          tarball="$(find "$GITHUB_WORKSPACE/dist/images" -name "${AGENT}-*-${{ matrix.arch }}.tar.gz" -print -quit)"
          if [ -z "$tarball" ]; then
            echo "missing ${{ matrix.arch }} tarball artifact for ${AGENT}" >&2
            exit 1
          fi
          bun run smoke:tarball -- --agent "$AGENT" --arch "${{ matrix.arch }}" --tarball "$tarball"

  publish:
    needs: [build, smoke]
    if: ${{ github.event_name == 'workflow_dispatch' && inputs.publish == true }}
    runs-on: ubuntu-24.04
    environment: release
    concurrency:
      group: r2-manifest-publish
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: ${{ env.BUN_VERSION }}
      - uses: actions/download-artifact@v4
        with:
          pattern: tarball-*
          path: dist/images
          merge-multiple: true
      - working-directory: packages/browseros-agent
        run: bun install --frozen-lockfile
      - name: Upload tarballs to R2
        working-directory: ${{ env.PKG_DIR }}
        env:
          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          R2_BUCKET: ${{ secrets.R2_BUCKET }}
        run: |
          set -euo pipefail
          for file in "$GITHUB_WORKSPACE"/dist/images/*.tar.gz; do
            base="$(basename "$file")"
            bun run upload -- --file "$file" --key "vm/images/$base" --content-type "application/gzip" --sidecar-sha
          done
      - name: Merge agent slice into manifest
        working-directory: ${{ env.PKG_DIR }}
        env:
          AGENT: ${{ inputs.agent || 'openclaw' }}
          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          R2_BUCKET: ${{ secrets.R2_BUCKET }}
        run: |
          set -euo pipefail
          mkdir -p dist/images
          cp -R "$GITHUB_WORKSPACE"/dist/images/* dist/images/
          bun run download -- --key vm/manifest.json --out dist/baseline-manifest.json
          bun run emit-manifest -- \
            --slice "agents:${AGENT}" \
            --dist-dir dist \
            --merge-from dist/baseline-manifest.json \
            --out dist/manifest.json
          bun run upload -- --file dist/manifest.json --key vm/manifest.json --content-type "application/json"
62
.github/workflows/sync-internal-docs.yml
vendored
@@ -1,62 +0,0 @@
name: Sync internal-docs submodule

on:
  schedule:
    - cron: '0 */4 * * *'
  workflow_dispatch:

jobs:
  sync:
    name: Bump internal-docs submodule pointer on dev
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Rewrite SSH submodule URL to HTTPS-with-token
        env:
          TOKEN: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
        run: |
          git config --global "url.https://x-access-token:${TOKEN}@github.com/.insteadOf" "git@github.com:"

      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.INTERNAL_DOCS_SYNC_TOKEN }}
          submodules: true
          ref: dev
          fetch-depth: 50

      - name: Open auto-merge PR if internal-docs has new commits
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -e

          # Skip if submodule not yet configured (handoff window before someone adds it)
          if ! git config --file .gitmodules --get-regexp '^submodule\..internal-docs\.path$' >/dev/null 2>&1; then
            echo "internal-docs submodule not yet configured in .gitmodules. Skipping."
            exit 0
          fi

          git submodule update --remote --merge .internal-docs

          if git diff --quiet .internal-docs; then
            echo "No internal-docs changes to sync."
            exit 0
          fi

          BRANCH="bot/sync-internal-docs-$(date -u +%Y%m%d-%H%M%S)"
          git config user.name "browseros-bot"
          git config user.email "bot@browseros.ai"
          git checkout -b "$BRANCH"
          git add .internal-docs
          git commit -m "chore: sync internal-docs submodule"
          git push -u origin "$BRANCH"

          PR_URL=$(gh pr create \
            --base dev \
            --head "$BRANCH" \
            --title "chore: sync internal-docs submodule" \
            --body "Automated bump of the \`.internal-docs\` submodule pointer. Auto-merging.")

          gh pr merge "$PR_URL" --auto --squash --delete-branch
6
.github/workflows/test.yml
vendored
@@ -63,15 +63,15 @@ jobs:
   junit_path: test-results/server-root.xml
   needs_browser: false
 - suite: agent
-  command: (cd apps/agent && bun run test)
+  command: bun run test:agent
   junit_path: test-results/agent.xml
   needs_browser: false
 - suite: eval
-  command: (cd apps/eval && bun run test)
+  command: bun run test:eval
   junit_path: test-results/eval.xml
   needs_browser: false
 - suite: build
-  command: bun run ./scripts/run-bun-test.ts ./scripts/build
+  command: bun run test:build
   junit_path: test-results/build.xml
   needs_browser: false

4
.gitmodules
vendored
@@ -1,4 +0,0 @@
|
||||
[submodule ".internal-docs"]
|
||||
path = .internal-docs
|
||||
url = git@github.com:browseros-ai/internal-docs.git
|
||||
branch = main
|
||||
|
||||
Submodule .internal-docs deleted from 590799ae1c
15
README.md
@@ -188,21 +188,6 @@ We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRI
|
||||
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
|
||||
- [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.
|
||||
|
||||
## Citation
|
||||
|
||||
If you use BrowserOS in your research or project, please cite:
|
||||
|
||||
```bibtex
|
||||
@software{browseros2025,
|
||||
author = {Nithin Sonti and Nikhil Sonti and {BrowserOS-team}},
|
||||
title = {BrowserOS: The open-source Agentic browser},
|
||||
url = {https://github.com/browseros-ai/BrowserOS},
|
||||
year = {2025},
|
||||
publisher = {GitHub},
|
||||
license = {AGPL-3.0},
|
||||
}
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
|
||||
|
||||
@@ -79,15 +79,14 @@ cp apps/server/.env.example apps/server/.env.development
|
||||
cp apps/agent/.env.example apps/agent/.env.development
|
||||
cp apps/server/.env.production.example apps/server/.env.production
|
||||
|
||||
# Install deps and generate agent code
|
||||
# Install deps, generate agent code, and sync the VM cache
|
||||
bun run dev:setup
|
||||
|
||||
# Start the full dev environment
|
||||
bun run dev:watch
|
||||
```
|
||||
|
||||
`dev:watch` starts the server immediately. OpenClaw VM/image prewarm runs from
|
||||
the server startup path and pulls the configured GHCR image on demand.
|
||||
`dev:watch` exits when the VM cache manifest is missing, but setup stays in `dev:setup`.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
@@ -157,14 +156,9 @@ bun run build:server # Build production server resource artifacts and u
|
||||
bun run build:agent # Build agent extension
|
||||
|
||||
# Test
|
||||
bun run test # Run all tests
|
||||
bun run test:all # Run all tests
|
||||
bun run test:main # Run key server tools and integration tests
|
||||
|
||||
# App-specific test groups (from packages/browseros-agent)
|
||||
cd apps/server && bun run test:tools
|
||||
cd apps/server && bun run test:cdp
|
||||
cd apps/server && bun run test:integration
|
||||
bun run test # Run standard tests
|
||||
bun run test:cdp # Run CDP-based tests
|
||||
bun run test:integration # Run integration tests
|
||||
|
||||
# Quality
|
||||
bun run lint # Check with Biome
|
||||
|
||||
@@ -0,0 +1,136 @@
|
||||
import { Bot, Loader2, Wrench } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface AgentCardProps {
|
||||
agent: AgentCardData
|
||||
onClick: () => void
|
||||
active?: boolean
|
||||
}
|
||||
|
||||
function formatTimestamp(timestamp?: number): string {
|
||||
if (!timestamp) return 'No activity yet'
|
||||
const diff = Date.now() - timestamp
|
||||
const minutes = Math.floor(diff / 60000)
|
||||
if (minutes < 1) return 'just now'
|
||||
if (minutes < 60) return `${minutes}m ago`
|
||||
const hours = Math.floor(minutes / 60)
|
||||
if (hours < 24) return `${hours}h ago`
|
||||
return `${Math.floor(hours / 24)}d ago`
|
||||
}
|
||||
|
||||
function getStatusLabel(status: AgentCardData['status']): string {
|
||||
if (status === 'working') return 'Working'
|
||||
if (status === 'error') return 'Error'
|
||||
return 'Ready'
|
||||
}
|
||||
|
||||
function getStatusTone(status: AgentCardData['status']): string {
|
||||
if (status === 'working') return 'bg-amber-500'
|
||||
if (status === 'error') return 'bg-destructive'
|
||||
return 'bg-emerald-500'
|
||||
}
|
||||
|
||||
function formatCost(usd: number): string {
|
||||
if (usd < 0.005) return `$${usd.toFixed(4)}`
|
||||
return `$${usd.toFixed(2)}`
|
||||
}
|
||||
|
||||
export const AgentCardExpanded: FC<AgentCardProps> = ({
|
||||
agent,
|
||||
onClick,
|
||||
active,
|
||||
}) => (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border p-4 text-left shadow-sm transition-all duration-200',
|
||||
active
|
||||
? 'border-border/80 bg-card shadow-md ring-1 ring-[var(--accent-orange)]/20'
|
||||
: 'border-border/60 bg-card/85 hover:border-border hover:bg-card hover:shadow-md',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start justify-between gap-3">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<div
|
||||
className={cn(
|
||||
'flex size-10 shrink-0 items-center justify-center rounded-xl',
|
||||
active
|
||||
? 'bg-[var(--accent-orange)]/10 text-[var(--accent-orange)]'
|
||||
: 'bg-muted text-muted-foreground',
|
||||
)}
|
||||
>
|
||||
<Bot className="size-5" />
|
||||
</div>
|
||||
<div className="min-w-0">
|
||||
<div className="truncate font-semibold text-sm">{agent.name}</div>
|
||||
<div className="truncate text-muted-foreground text-xs">
|
||||
{agent.model ?? 'OpenClaw agent'}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="flex items-center gap-2 rounded-full border border-border/60 bg-background/70 px-2.5 py-1 text-[11px] text-muted-foreground">
|
||||
<span
|
||||
className={cn('size-2 rounded-full', getStatusTone(agent.status))}
|
||||
/>
|
||||
<span>{getStatusLabel(agent.status)}</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="mt-4 flex-1">
|
||||
<p className="line-clamp-2 text-foreground/90 text-sm">
|
||||
{agent.lastMessage ??
|
||||
'Start a conversation to see recent work and summaries.'}
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div className="mt-4 space-y-1.5 text-muted-foreground text-xs">
|
||||
<div className="flex items-center justify-between gap-3">
|
||||
<span>{formatTimestamp(agent.lastMessageTimestamp)}</span>
|
||||
{agent.costUsd ? (
|
||||
<span className="tabular-nums opacity-70">
|
||||
{formatCost(agent.costUsd)}
|
||||
</span>
|
||||
) : null}
|
||||
</div>
|
||||
{agent.status === 'working' && agent.currentTool ? (
|
||||
<div className="flex items-center gap-1.5 text-[var(--accent-orange)]/70">
|
||||
<Loader2 className="size-3 shrink-0 animate-spin" />
|
||||
<span className="truncate">{agent.currentTool}</span>
|
||||
</div>
|
||||
) : agent.activitySummary ? (
|
||||
<div className="flex items-center gap-1.5 text-muted-foreground/60">
|
||||
<Wrench className="size-3 shrink-0" />
|
||||
<span className="truncate">{agent.activitySummary}</span>
|
||||
</div>
|
||||
) : null}
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
|
||||
export const AgentCardCompact: FC<AgentCardProps> = ({
|
||||
agent,
|
||||
onClick,
|
||||
active,
|
||||
}) => (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'inline-flex items-center gap-2 rounded-full border px-3 py-2 text-sm transition-colors',
|
||||
active
|
||||
? 'border-border bg-card shadow-sm ring-1 ring-[var(--accent-orange)]/20'
|
||||
: 'border-border/60 bg-card/85 text-foreground hover:border-border hover:bg-card',
|
||||
)}
|
||||
>
|
||||
<span
|
||||
className={cn(
|
||||
'size-2 rounded-full',
|
||||
active ? 'bg-[var(--accent-orange)]' : getStatusTone(agent.status),
|
||||
)}
|
||||
/>
|
||||
<span className="truncate">{agent.name}</span>
|
||||
</button>
|
||||
)
|
||||
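Review note: the two formatting helpers in the new AgentCard file (`formatTimestamp` and `formatCost`) are easy to check in isolation. The sketch below restates their bucketing logic as standalone TypeScript. The `now` parameter and the name `formatRelative` are assumptions added here so the function is deterministic; the file in the diff calls `Date.now()` directly.

```typescript
// Standalone sketch of the relative-time buckets shown in the diff.
// `now` is a hypothetical parameter added for testability.
function formatRelative(timestamp: number | undefined, now: number): string {
  // A missing (or zero) timestamp is treated as "no activity", as in the diff.
  if (!timestamp) return 'No activity yet'
  const minutes = Math.floor((now - timestamp) / 60000)
  if (minutes < 1) return 'just now'
  if (minutes < 60) return `${minutes}m ago`
  const hours = Math.floor(minutes / 60)
  if (hours < 24) return `${hours}h ago`
  return `${Math.floor(hours / 24)}d ago`
}

// Costs under half a cent keep four decimals so they do not round to $0.00.
function formatCost(usd: number): string {
  return usd < 0.005 ? `$${usd.toFixed(4)}` : `$${usd.toFixed(2)}`
}
```

Passing the clock in as an argument is only a testing convenience; it does not change which bucket a given age falls into.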
@@ -1,71 +1,70 @@
|
||||
import { Plus } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAdapterHealth,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { HomeAgentCard } from './HomeAgentCard'
|
||||
import { AgentCardCompact, AgentCardExpanded } from './AgentCard'
|
||||
|
||||
interface AgentCardDockProps {
|
||||
agents: HarnessAgent[]
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
agents: AgentCardData[]
|
||||
activeAgentId?: string
|
||||
onSelectAgent: (agentId: string) => void
|
||||
onCreateAgent?: () => void
|
||||
compact?: boolean
|
||||
}
|
||||
|
||||
function CreateAgentButton({ onCreateAgent }: { onCreateAgent: () => void }) {
|
||||
function CreateAgentButton({
|
||||
compact,
|
||||
onCreateAgent,
|
||||
}: {
|
||||
compact?: boolean
|
||||
onCreateAgent: () => void
|
||||
}) {
|
||||
return (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onCreateAgent}
|
||||
className={cn(
|
||||
'flex min-h-32 shrink-0 items-center justify-center gap-2 rounded-2xl border border-dashed px-5 py-4 text-muted-foreground transition-colors',
|
||||
'hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
|
||||
'flex shrink-0 items-center justify-center gap-2 border border-dashed text-muted-foreground transition-colors hover:border-[var(--accent-orange)] hover:text-[var(--accent-orange)]',
|
||||
compact
|
||||
? 'rounded-full px-3 py-2 text-sm'
|
||||
: 'min-h-32 rounded-2xl px-5 py-4',
|
||||
)}
|
||||
>
|
||||
<Plus className="size-5" />
|
||||
<span>Create agent</span>
|
||||
<Plus className={compact ? 'size-3.5' : 'size-5'} />
|
||||
<span>{compact ? 'New' : 'Create agent'}</span>
|
||||
</button>
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* 3-column grid of HomeAgentCards plus a trailing "Create agent"
|
||||
* tile. The previous `compact` mode (rendered a horizontal pill rail)
|
||||
* had no callers and was dropped along with the legacy AgentCard.
|
||||
*/
|
||||
export const AgentCardDock: FC<AgentCardDockProps> = ({
|
||||
agents,
|
||||
adapters,
|
||||
activeAgentId,
|
||||
onSelectAgent,
|
||||
onCreateAgent,
|
||||
compact,
|
||||
}) => {
|
||||
if (agents.length === 0 && !onCreateAgent) return null
|
||||
|
||||
const adapterHealth = new Map<HarnessAgentAdapter, HarnessAdapterHealth>()
|
||||
for (const descriptor of adapters) {
|
||||
if (descriptor.health) adapterHealth.set(descriptor.id, descriptor.health)
|
||||
}
|
||||
const Card = compact ? AgentCardCompact : AgentCardExpanded
|
||||
|
||||
return (
|
||||
<div className="grid gap-4 md:grid-cols-3">
|
||||
<div
|
||||
className={cn(
|
||||
compact
|
||||
? 'flex items-center gap-2 overflow-x-auto pb-1'
|
||||
: 'grid gap-4 md:grid-cols-3',
|
||||
)}
|
||||
>
|
||||
{agents.map((agent) => (
|
||||
<HomeAgentCard
|
||||
key={agent.id}
|
||||
<Card
|
||||
key={agent.agentId}
|
||||
agent={agent}
|
||||
adapter={agent.adapter}
|
||||
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
|
||||
active={agent.id === activeAgentId}
|
||||
onClick={() => onSelectAgent(agent.id)}
|
||||
active={agent.agentId === activeAgentId}
|
||||
onClick={() => onSelectAgent(agent.agentId)}
|
||||
/>
|
||||
))}
|
||||
{onCreateAgent ? (
|
||||
<CreateAgentButton onCreateAgent={onCreateAgent} />
|
||||
<CreateAgentButton compact={compact} onCreateAgent={onCreateAgent} />
|
||||
) : null}
|
||||
</div>
|
||||
)
|
||||
|
||||
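Review note: the AgentCardDock lines that build `adapterHealth` index adapter descriptors by id so that each card can look up its adapter's health, falling back to `null`. A minimal sketch of that lookup follows. The type shapes here are simplified assumptions for illustration, not the real `HarnessAdapterDescriptor` or `HarnessAdapterHealth` definitions, and `buildAdapterHealth` is a hypothetical helper name.

```typescript
// Simplified stand-ins for the harness types referenced in the diff (assumed shapes).
interface AdapterHealth {
  healthy: boolean
  reason?: string
}

interface AdapterDescriptor {
  id: string
  health?: AdapterHealth
}

// Index descriptors by id, skipping adapters that have not reported health yet.
function buildAdapterHealth(
  adapters: AdapterDescriptor[],
): Map<string, AdapterHealth> {
  const byId = new Map<string, AdapterHealth>()
  for (const descriptor of adapters) {
    if (descriptor.health) byId.set(descriptor.id, descriptor.health)
  }
  return byId
}

// Callers fall back to null when an adapter has no recorded health.
function healthFor(
  byId: Map<string, AdapterHealth>,
  adapterId: string,
): AdapterHealth | null {
  return byId.get(adapterId) ?? null
}
```

Building the map once per render keeps the per-card lookup constant-time instead of scanning the adapter list for every agent.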
@@ -1,35 +1,179 @@
|
||||
import { ArrowLeft } from 'lucide-react'
|
||||
import { ArrowLeft, Bot, Home } from 'lucide-react'
|
||||
import { type FC, useEffect, useMemo, useRef } from 'react'
|
||||
import { Navigate, useNavigate, useParams, useSearchParams } from 'react-router'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import type {
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
|
||||
import {
|
||||
cancelHarnessTurn,
|
||||
useAgentAdapters,
|
||||
useEnqueueHarnessMessage,
|
||||
useHarnessAgents,
|
||||
useRemoveHarnessQueuedMessage,
|
||||
useUpdateHarnessAgent,
|
||||
} from '@/entrypoints/app/agents/useAgents'
|
||||
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
|
||||
import { AgentRail } from './AgentRail'
|
||||
type AgentEntry,
|
||||
getModelDisplayName,
|
||||
} from '@/entrypoints/app/agents/useOpenClaw'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { useAgentCommandData } from './agent-command-layout'
|
||||
import { ClawChat } from './ClawChat'
|
||||
import { ConversationHeader } from './ConversationHeader'
|
||||
import { ConversationInput } from './ConversationInput'
|
||||
import {
|
||||
buildChatHistoryFromClawMessages,
|
||||
filterTurnsPersistedInHistory,
|
||||
flattenHistoryPages,
|
||||
} from './claw-chat-types'
|
||||
import { QueuePanel } from './QueuePanel'
|
||||
import { useAgentConversation } from './useAgentConversation'
|
||||
import { useHarnessChatHistory } from './useHarnessChatHistory'
|
||||
|
||||
function StatusBadge({ status }: { status: string }) {
|
||||
return (
|
||||
<div className="inline-flex items-center gap-2 rounded-full border border-border/60 bg-card px-3 py-1 text-[11px] text-muted-foreground uppercase tracking-[0.18em]">
|
||||
<span
|
||||
className={cn(
|
||||
'size-1.5 rounded-full',
|
||||
status === 'Working on your request'
|
||||
? 'bg-amber-500'
|
||||
: status === 'Ready'
|
||||
? 'bg-emerald-500'
|
||||
: status === 'Offline'
|
||||
? 'bg-muted-foreground/50'
|
||||
: 'bg-[var(--accent-orange)]',
|
||||
)}
|
||||
/>
|
||||
<span>{status}</span>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function AgentIdentity({
|
||||
name,
|
||||
meta,
|
||||
className,
|
||||
}: {
|
||||
name: string
|
||||
meta: string
|
||||
className?: string
|
||||
}) {
|
||||
return (
|
||||
<div className={cn('min-w-0', className)}>
|
||||
<div className="truncate font-semibold text-[15px] leading-5">{name}</div>
|
||||
<div className="truncate text-muted-foreground text-xs leading-5">
|
||||
{meta}
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function ConversationHeader({
|
||||
agentName,
|
||||
agentMeta,
|
||||
status,
|
||||
backLabel,
|
||||
backTarget,
|
||||
onGoHome,
|
||||
}: {
|
||||
agentName: string
|
||||
agentMeta: string
|
||||
status: string
|
||||
backLabel: string
|
||||
backTarget: 'home' | 'page'
|
||||
onGoHome: () => void
|
||||
}) {
|
||||
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
|
||||
|
||||
return (
|
||||
<div className="flex h-14 items-center justify-between gap-4 border-border/50 border-b px-5">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
onClick={onGoHome}
|
||||
className="size-8 rounded-xl lg:hidden"
|
||||
title={backLabel}
|
||||
>
|
||||
<BackIcon className="size-4" />
|
||||
</Button>
|
||||
<div className="flex size-8 shrink-0 items-center justify-center rounded-xl bg-muted text-muted-foreground">
|
||||
<Bot className="size-4" />
|
||||
</div>
|
||||
<AgentIdentity name={agentName} meta={agentMeta} />
|
||||
</div>
|
||||
|
||||
<StatusBadge status={status} />
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function AgentRailHeader({ onGoHome }: { onGoHome: () => void }) {
|
||||
return (
|
||||
<div className="hidden h-14 items-center border-border/50 border-r border-b bg-background/70 px-4 lg:flex">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
onClick={onGoHome}
|
||||
className="size-8 rounded-xl"
|
||||
title="Back to home"
|
||||
>
|
||||
<ArrowLeft className="size-4" />
|
||||
</Button>
|
||||
<div className="truncate font-semibold text-[15px] leading-5">
|
||||
Agents
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
function AgentRailList({
|
||||
activeAgentId,
|
||||
agents,
|
||||
onSelectAgent,
|
||||
}: {
|
||||
activeAgentId: string
|
||||
agents: AgentEntry[]
|
||||
onSelectAgent: (entry: AgentEntry) => void
|
||||
}) {
|
||||
return (
|
||||
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
|
||||
<div className="styled-scrollbar min-h-0 flex-1 space-y-2 overflow-y-auto px-3 py-3">
|
||||
{agents.map((entry) => {
|
||||
const active = entry.agentId === activeAgentId
|
||||
const modelName = getAgentEntryMeta(entry)
|
||||
|
||||
return (
|
||||
<button
|
||||
key={entry.agentId}
|
||||
type="button"
|
||||
onClick={() => onSelectAgent(entry)}
|
||||
className={cn(
|
||||
'w-full rounded-2xl border px-3 py-3 text-left transition-all',
|
||||
active
|
||||
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8 shadow-sm'
|
||||
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-center gap-3">
|
||||
<div
|
||||
className={cn(
|
||||
'flex size-9 items-center justify-center rounded-xl',
|
||||
active
|
||||
? 'bg-[var(--accent-orange)]/12 text-[var(--accent-orange)]'
|
||||
: 'bg-muted text-muted-foreground',
|
||||
)}
|
||||
>
|
||||
<Bot className="size-4" />
|
||||
</div>
|
||||
<AgentIdentity name={entry.name} meta={modelName} />
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
</aside>
|
||||
)
|
||||
}
|
||||
|
||||
function getAgentEntryMeta(agent: AgentEntry | undefined): string {
|
||||
if (agent?.source === 'agent-harness') {
|
||||
return getModelDisplayName(agent.model) ?? 'ACP agent'
|
||||
}
|
||||
return getModelDisplayName(agent?.model) ?? 'OpenClaw agent'
|
||||
}
|
||||
|
||||
function AgentConversationController({
|
||||
agentId,
|
||||
initialMessage,
|
||||
@@ -68,33 +212,15 @@ function AgentConversationController({
|
||||
[historyMessages],
|
||||
)
|
||||
|
||||
// Listing query feeds queue + active-turn state for this agent. We
|
||||
// already poll it every 5s for the rail; reusing the same cache
|
||||
// keeps cross-tab queue state in sync without a second poll.
|
||||
const { harnessAgents } = useHarnessAgents()
|
||||
const harnessAgent = harnessAgents.find((entry) => entry.id === agentId)
|
||||
const queue = harnessAgent?.queue ?? []
|
||||
const activeTurnId = harnessAgent?.activeTurnId ?? null
|
||||
|
||||
const { turns, streaming, send } = useAgentConversation(agentId, {
|
||||
runtime: 'agent-harness',
|
||||
sessionKey: null,
|
||||
history: chatHistory,
|
||||
activeTurnId,
|
||||
onComplete: () => {
|
||||
void harnessHistoryQuery.refetch()
|
||||
},
|
||||
onSessionKeyChange: () => {},
|
||||
})
|
||||
const enqueueMessage = useEnqueueHarnessMessage()
|
||||
const removeQueuedMessage = useRemoveHarnessQueuedMessage()
|
||||
|
||||
const handleStop = () => {
|
||||
void cancelHarnessTurn(agentId, {
|
||||
turnId: activeTurnId ?? undefined,
|
||||
reason: 'user pressed stop',
|
||||
})
|
||||
}
|
||||
const visibleTurns = useMemo(
|
||||
() => filterTurnsPersistedInHistory(turns, historyMessages),
|
||||
[historyMessages, turns],
|
||||
@@ -138,7 +264,7 @@ function AgentConversationController({
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="flex min-h-0 flex-1 flex-col overflow-hidden">
|
||||
<div className="flex min-h-0 flex-col overflow-hidden">
|
||||
<ClawChat
|
||||
agentName={agentName}
|
||||
historyMessages={historyMessages}
|
||||
@@ -155,15 +281,7 @@ function AgentConversationController({
|
||||
/>
|
||||
|
||||
<div className="border-border/50 border-t bg-background/88 px-4 py-3 backdrop-blur-md">
|
||||
<div className="mx-auto max-w-3xl space-y-3">
|
||||
{queue.length > 0 ? (
|
||||
<QueuePanel
|
||||
queue={queue}
|
||||
onRemove={(messageId) =>
|
||||
removeQueuedMessage.mutate({ agentId, messageId })
|
||||
}
|
||||
/>
|
||||
) : null}
|
||||
<div className="mx-auto max-w-3xl">
|
||||
<ConversationInput
|
||||
variant="conversation"
|
||||
agents={agents}
|
||||
@@ -178,31 +296,14 @@ function AgentConversationController({
|
||||
name: a.name,
|
||||
dataUrl: a.dataUrl,
|
||||
}))
|
||||
// When the agent already has an in-flight turn, route
|
||||
// the new message into the durable queue instead of
|
||||
// starting a parallel turn. Drains automatically as
|
||||
// soon as the active turn ends.
|
||||
if (streaming || activeTurnId) {
|
||||
enqueueMessage.mutate({
|
||||
agentId,
|
||||
message: input.text,
|
||||
attachments,
|
||||
})
|
||||
return
|
||||
}
|
||||
void send({ text: input.text, attachments, attachmentPreviews })
|
||||
}}
|
||||
onCreateAgent={() => navigate(createAgentPath)}
|
||||
onStop={handleStop}
|
||||
streaming={streaming}
|
||||
disabled={disabled}
|
||||
status="running"
|
||||
attachmentsEnabled={true}
|
||||
placeholder={
|
||||
streaming
|
||||
? `Type to queue another message for ${agentName}...`
|
||||
: `Message ${agentName}...`
|
||||
}
|
||||
placeholder={`Message ${agentName}...`}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
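Review note: the hunk above touches the branch that decides whether a new message starts a turn or is queued behind the one already in flight. The sketch below isolates that decision, assuming the condition shown in the diff (`streaming || activeTurnId`). The names `routeMessage`, `TurnState`, and `Dispatch` are hypothetical and exist only for this illustration; the component itself calls `enqueueMessage.mutate` and `send`.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface OutgoingMessage {
  text: string
}

interface TurnState {
  streaming: boolean
  activeTurnId: string | null
}

interface Dispatch {
  send: (message: OutgoingMessage) => void
  enqueue: (message: OutgoingMessage) => void
}

// Queue the message when a turn is already running; otherwise send it directly.
function routeMessage(
  message: OutgoingMessage,
  state: TurnState,
  dispatch: Dispatch,
): 'queued' | 'sent' {
  if (state.streaming || state.activeTurnId) {
    dispatch.enqueue(message)
    return 'queued'
  }
  dispatch.send(message)
  return 'sent'
}
```

Checking both `streaming` and `activeTurnId` covers the case where another tab started the turn, so the local stream flag alone would not show it.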
@@ -217,22 +318,6 @@ interface AgentCommandConversationProps {
|
||||
createAgentPath?: string
|
||||
}
|
||||
|
||||
function inferAdapterFromEntry(
|
||||
entry: AgentEntry | undefined,
|
||||
): HarnessAgentAdapter | 'unknown' {
|
||||
if (!entry) return 'unknown'
|
||||
if (entry.source === 'agent-harness') {
|
||||
// Harness entries don't carry the adapter on AgentEntry; the rail
|
||||
// / header read the harness record directly. This branch only runs
|
||||
// before the harness query resolves, so 'unknown' is correct — the
|
||||
// tile's bot fallback renders until data arrives.
|
||||
return 'unknown'
|
||||
}
|
||||
// OpenClaw-only entries (no harness shadow) are deprecated in
|
||||
// practice but the rail still tolerates them.
|
||||
return 'openclaw'
|
||||
}
|
||||
|
||||
export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
|
||||
variant = 'command',
|
||||
backPath = '/home',
|
||||
@@ -243,110 +328,60 @@ export const AgentCommandConversation: FC<AgentCommandConversationProps> = ({
|
||||
const [searchParams, setSearchParams] = useSearchParams()
|
||||
const navigate = useNavigate()
|
||||
const { agents } = useAgentCommandData()
|
||||
const { harnessAgents } = useHarnessAgents()
|
||||
const { adapters } = useAgentAdapters()
|
||||
const updateAgent = useUpdateHarnessAgent()
|
||||
|
||||
const shouldRedirectHome = !agentId
|
||||
const resolvedAgentId = agentId ?? ''
|
||||
const harnessAgent = harnessAgents.find(
|
||||
(entry) => entry.id === resolvedAgentId,
|
||||
)
|
||||
const entry = agents.find((item) => item.agentId === resolvedAgentId)
|
||||
const fallbackName = entry?.name || resolvedAgentId || 'Agent'
|
||||
const fallbackAdapter = inferAdapterFromEntry(entry)
|
||||
const agent = agents.find((entry) => entry.agentId === resolvedAgentId)
|
||||
const agentName = agent?.name || resolvedAgentId || 'Agent'
|
||||
const agentMeta = getAgentEntryMeta(agent)
|
||||
const initialMessage = searchParams.get('q')
|
||||
const isPageVariant = variant === 'page'
|
||||
const backLabel = isPageVariant ? 'Back to agents' : 'Back to home'
|
||||
|
||||
const adapterHealth = useMemo<AgentAdapterHealth | null>(() => {
|
||||
const adapterId = harnessAgent?.adapter
|
||||
if (!adapterId) return null
|
||||
const descriptor = adapters.find((item) => item.id === adapterId)
|
||||
if (!descriptor?.health) return null
|
||||
return {
|
||||
healthy: descriptor.health.healthy,
|
||||
reason: descriptor.health.reason,
|
||||
}
|
||||
}, [adapters, harnessAgent?.adapter])
|
||||
|
||||
if (shouldRedirectHome) {
|
||||
return <Navigate to="/home" replace />
|
||||
}
|
||||
|
||||
const handleSelectHarnessAgent = (target: HarnessAgent) => {
|
||||
navigate(`${agentPathPrefix}/${target.id}`)
|
||||
const handleSelectAgent = (entry: AgentEntry) => {
|
||||
navigate(`${agentPathPrefix}/${entry.agentId}`)
|
||||
}
|
||||
|
||||
const handlePinToggle = (target: HarnessAgent | null, next: boolean) => {
|
||||
if (!target) return
|
||||
updateAgent.mutate({
|
||||
agentId: target.id,
|
||||
patch: { pinned: next },
|
||||
})
|
||||
}
|
||||
// Every visible agent runs through the harness now, so per-agent
|
||||
// runtime status doesn't gate chat the way OpenClaw's legacy
|
||||
// gateway lifecycle did. Show "Ready" once the agent record is
|
||||
// resolved from the rail, "Setup" otherwise.
|
||||
const statusCopy = agent ? 'Ready' : 'Setup'
|
||||
|
||||
return (
|
||||
<div className="absolute inset-0 overflow-hidden bg-background md:pl-[theme(spacing.14)]">
|
||||
<div className="mx-auto flex h-full w-full max-w-[1480px] flex-col">
|
||||
{/* Shared top band — the rail's "Agents" header and the chat
|
||||
header live on one row so they're aligned by construction. */}
|
||||
<div className="flex shrink-0 items-stretch border-border/50 border-b">
|
||||
<div className="hidden min-h-[60px] w-[288px] shrink-0 items-center gap-3 border-border/50 border-r px-4 lg:flex">
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
onClick={() => navigate(backPath)}
|
||||
className="size-8 rounded-xl"
|
||||
title="Back to home"
|
||||
>
|
||||
<ArrowLeft className="size-4" />
|
||||
</Button>
|
||||
<div className="truncate font-semibold text-[15px] leading-5">
|
||||
Agents
|
||||
</div>
|
||||
</div>
|
||||
<div className="min-w-0 flex-1">
|
||||
<ConversationHeader
|
||||
agent={harnessAgent ?? null}
|
||||
fallbackName={fallbackName}
|
||||
fallbackAdapter={fallbackAdapter}
|
||||
adapterHealth={adapterHealth}
|
||||
backLabel={backLabel}
|
||||
backTarget={isPageVariant ? 'page' : 'home'}
|
||||
onGoHome={() => navigate(backPath)}
|
||||
onPinToggle={(next) =>
|
||||
handlePinToggle(harnessAgent ?? null, next)
|
||||
}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
<div className="mx-auto grid h-full w-full max-w-[1480px] lg:grid-cols-[288px_minmax(0,1fr)] lg:grid-rows-[3.5rem_minmax(0,1fr)]">
|
||||
<AgentRailHeader onGoHome={() => navigate(backPath)} />
|
||||
|
||||
{/* Body grid: rail list + chat. Both columns share the same
|
||||
top edge (the band above) so headers can never drift. */}
|
||||
<div className="grid min-h-0 flex-1 grid-rows-[minmax(0,1fr)] lg:grid-cols-[288px_minmax(0,1fr)]">
|
||||
<AgentRail
|
||||
agents={harnessAgents}
|
||||
adapters={adapters}
|
||||
activeAgentId={resolvedAgentId}
|
||||
onSelectAgent={handleSelectHarnessAgent}
|
||||
onPinToggle={(target, next) => handlePinToggle(target, next)}
|
||||
/>
|
||||
<ConversationHeader
|
||||
agentName={agentName}
|
||||
agentMeta={agentMeta}
|
||||
status={statusCopy}
|
||||
backLabel={backLabel}
|
||||
backTarget={isPageVariant ? 'page' : 'home'}
|
||||
onGoHome={() => navigate(backPath)}
|
||||
/>
|
||||
|
||||
<div className="flex h-full min-h-0 flex-col overflow-hidden">
|
||||
<AgentConversationController
|
||||
key={resolvedAgentId}
|
||||
agentId={resolvedAgentId}
|
||||
agents={agents}
|
||||
initialMessage={initialMessage}
|
||||
onInitialMessageConsumed={() =>
|
||||
setSearchParams({}, { replace: true })
|
||||
}
|
||||
agentPathPrefix={agentPathPrefix}
|
||||
createAgentPath={createAgentPath}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
<AgentRailList
|
||||
activeAgentId={resolvedAgentId}
|
||||
agents={agents}
|
||||
onSelectAgent={handleSelectAgent}
|
||||
/>
|
||||
|
||||
<AgentConversationController
|
||||
key={resolvedAgentId}
|
||||
agentId={resolvedAgentId}
|
||||
agents={agents}
|
||||
initialMessage={initialMessage}
|
||||
onInitialMessageConsumed={() =>
|
||||
setSearchParams({}, { replace: true })
|
||||
}
|
||||
agentPathPrefix={agentPathPrefix}
|
||||
createAgentPath={createAgentPath}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
|
||||
@@ -1,25 +1,18 @@
|
||||
import { Plus } from 'lucide-react'
|
||||
import { type FC, useEffect, useMemo, useState } from 'react'
|
||||
import { type FC, useEffect, useState } from 'react'
|
||||
import { useNavigate } from 'react-router'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import { Card, CardContent } from '@/components/ui/card'
|
||||
import { Separator } from '@/components/ui/separator'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAgent,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import {
|
||||
useAgentAdapters,
|
||||
useHarnessAgents,
|
||||
} from '@/entrypoints/app/agents/useAgents'
|
||||
import type { AgentEntry } from '@/entrypoints/app/agents/useOpenClaw'
|
||||
import { ImportDataHint } from '@/entrypoints/newtab/index/ImportDataHint'
|
||||
import { SignInHint } from '@/entrypoints/newtab/index/SignInHint'
|
||||
import { useActiveHint } from '@/entrypoints/newtab/index/useActiveHint'
|
||||
import type { AgentCardData } from '@/lib/agent-conversations/types'
|
||||
import { AgentCardDock } from './AgentCardDock'
|
||||
import { useAgentCommandData } from './agent-command-layout'
|
||||
import { ConversationInput } from './ConversationInput'
|
||||
import { orderHomeAgents } from './home-agent-card.helpers'
|
||||
import { buildAgentCardData } from './useAgentCardData'
|
||||
|
||||
function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
|
||||
return (
|
||||
@@ -45,13 +38,11 @@ function EmptyAgentsState({ onOpenAgents }: { onOpenAgents: () => void }) {
|
||||
function RecentThreads({
|
||||
activeAgentId,
|
||||
agents,
|
||||
adapters,
|
||||
onOpenAgents,
|
||||
onSelectAgent,
|
||||
}: {
|
||||
activeAgentId?: string | null
|
||||
agents: HarnessAgent[]
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
agents: AgentCardData[]
|
||||
onOpenAgents: () => void
|
||||
onSelectAgent: (agentId: string) => void
|
||||
}) {
|
||||
@@ -77,7 +68,6 @@ function RecentThreads({
|
||||
</div>
|
||||
<AgentCardDock
|
||||
agents={agents}
|
||||
adapters={adapters}
|
||||
activeAgentId={activeAgentId ?? undefined}
|
||||
onSelectAgent={onSelectAgent}
|
||||
onCreateAgent={onOpenAgents}
|
||||
@@ -89,32 +79,25 @@ function RecentThreads({
|
||||
export const AgentCommandHome: FC = () => {
|
||||
const navigate = useNavigate()
|
||||
const activeHint = useActiveHint()
|
||||
// The conversation input still consumes the merged AgentEntry list
|
||||
// from the layout context (handles legacy /claw/agents entries that
|
||||
// haven't yet been backfilled into the harness store). The Recent
|
||||
// Agents grid below reads the richer harness payload directly.
|
||||
const { agents: legacyAgents, status } = useAgentCommandData()
|
||||
const { harnessAgents } = useHarnessAgents()
|
||||
const { adapters } = useAgentAdapters()
|
||||
const { agents, status } = useAgentCommandData()
|
||||
const [selectedAgentId, setSelectedAgentId] = useState<string | null>(null)
|
||||
|
||||
const orderedAgents = useMemo(
|
||||
() => orderHomeAgents(harnessAgents),
|
||||
[harnessAgents],
|
||||
)
|
||||
const cardData = buildAgentCardData(agents, status?.status, undefined)
|
||||
|
||||
useEffect(() => {
|
||||
if (legacyAgents.length === 0) {
|
||||
if (selectedAgentId) setSelectedAgentId(null)
|
||||
if (agents.length === 0) {
|
||||
if (selectedAgentId) {
|
||||
setSelectedAgentId(null)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
if (
|
||||
!selectedAgentId ||
|
||||
!legacyAgents.some((agent) => agent.agentId === selectedAgentId)
|
||||
!agents.some((agent) => agent.agentId === selectedAgentId)
|
||||
) {
|
||||
setSelectedAgentId(legacyAgents[0].agentId)
|
||||
setSelectedAgentId(agents[0].agentId)
|
||||
}
|
||||
}, [legacyAgents, selectedAgentId])
|
||||
}, [agents, selectedAgentId])
|
||||
|
||||
const handleSend = (input: { text: string }) => {
|
||||
if (!selectedAgentId) return
|
||||
@@ -127,7 +110,7 @@ export const AgentCommandHome: FC = () => {
|
||||
setSelectedAgentId(agent.agentId)
|
||||
}
|
||||
|
||||
const selectedAgent = legacyAgents.find(
|
||||
const selectedAgent = agents.find(
|
||||
(agent) => agent.agentId === selectedAgentId,
|
||||
)
|
||||
const selectedAgentReady = selectedAgent
|
||||
@@ -135,15 +118,13 @@ export const AgentCommandHome: FC = () => {
|
||||
: false
|
||||
const selectedAgentStatus =
|
||||
selectedAgent?.source === 'agent-harness' ? 'running' : status?.status
|
||||
const selectedAgentName =
|
||||
selectedAgent?.name ?? orderedAgents[0]?.name ?? 'your agent'
|
||||
|
||||
const hasAgents = legacyAgents.length > 0
|
||||
const selectedCard =
|
||||
cardData.find((agent) => agent.agentId === selectedAgentId) ?? cardData[0]
|
||||
|
||||
return (
|
||||
<div className="min-h-full px-4 py-6">
|
||||
<div className="mx-auto flex w-full max-w-5xl flex-col gap-8">
|
||||
{hasAgents ? (
|
||||
{cardData.length > 0 ? (
|
||||
<>
|
||||
<div className="flex flex-col items-center gap-5 pt-[max(10vh,24px)] text-center">
|
||||
<div className="space-y-3">
|
||||
@@ -159,7 +140,7 @@ export const AgentCommandHome: FC = () => {
|
||||
<div className="w-full max-w-3xl">
|
||||
<ConversationInput
|
||||
variant="home"
|
||||
agents={legacyAgents}
|
||||
agents={agents}
|
||||
selectedAgentId={selectedAgentId}
|
||||
onSelectAgent={handleSelectAgent}
|
||||
onSend={handleSend}
|
||||
@@ -170,7 +151,7 @@ export const AgentCommandHome: FC = () => {
|
||||
attachmentsEnabled={false}
|
||||
placeholder={
|
||||
selectedAgentReady
|
||||
? `Ask ${selectedAgentName} to handle a task...`
|
||||
? `Ask ${selectedCard?.name ?? 'your agent'} to handle a task...`
|
||||
: 'Agent runtime is not running...'
|
||||
}
|
||||
/>
|
||||
@@ -181,8 +162,7 @@ export const AgentCommandHome: FC = () => {
|
||||
|
||||
<RecentThreads
|
||||
activeAgentId={selectedAgentId}
|
||||
agents={orderedAgents}
|
||||
adapters={adapters}
|
||||
agents={cardData}
|
||||
onOpenAgents={() => navigate('/agents')}
|
||||
onSelectAgent={(agentId) => navigate(`/home/agents/${agentId}`)}
|
||||
/>
|
||||
|
||||
@@ -1,65 +0,0 @@
|
||||
import { type FC, useMemo } from 'react'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
|
||||
import { orderAgentsByPinThenRecency } from '@/entrypoints/app/agents/agents-list-order'
|
||||
import { AgentRailRow } from './AgentRailRow'
|
||||
|
||||
interface AgentRailProps {
|
||||
agents: HarnessAgent[]
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
activeAgentId: string
|
||||
onSelectAgent: (agent: HarnessAgent) => void
|
||||
onPinToggle: (agent: HarnessAgent, next: boolean) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Left-column scrollable list of agents. The "Agents" label + back
|
||||
* button live in the shared top band above (so the rail header and
|
||||
* the chat header sit on a single aligned strip rather than as two
|
||||
* separately-sized headers per column). Sort matches `/agents`:
|
||||
* pinned-first → recency, so the rail doesn't reshuffle as turns
|
||||
* transition every 5 s.
|
||||
*/
|
||||
export const AgentRail: FC<AgentRailProps> = ({
|
||||
agents,
|
||||
adapters,
|
||||
activeAgentId,
|
||||
onSelectAgent,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
const adapterHealth = useMemo(() => {
|
||||
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
|
||||
for (const adapter of adapters) {
|
||||
if (adapter.health) {
|
||||
map.set(adapter.id, {
|
||||
healthy: adapter.health.healthy,
|
||||
reason: adapter.health.reason,
|
||||
})
|
||||
}
|
||||
}
|
||||
return map
|
||||
}, [adapters])
|
||||
|
||||
const ordered = useMemo(() => orderAgentsByPinThenRecency(agents), [agents])
|
||||
|
||||
return (
|
||||
<aside className="hidden min-h-0 flex-col border-border/50 border-r bg-background/70 lg:flex">
|
||||
<div className="styled-scrollbar min-h-0 flex-1 space-y-1.5 overflow-y-auto px-3 py-3">
|
||||
{ordered.map((agent) => (
|
||||
<AgentRailRow
|
||||
key={agent.id}
|
||||
agent={agent}
|
||||
active={agent.id === activeAgentId}
|
||||
adapterHealth={adapterHealth.get(agent.adapter) ?? null}
|
||||
onSelect={() => onSelectAgent(agent)}
|
||||
onPinToggle={(next) => onPinToggle(agent, next)}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
</aside>
|
||||
)
|
||||
}
|
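Review note: the `adapterHealth` memo in `AgentRail` above indexes adapters by id so each row can look up its health in constant time. A minimal standalone sketch of that step follows. The `AdapterDescriptor` and `AdapterHealth` shapes are trimmed-down assumptions for illustration, not the real types from `agent-harness-types`, and `buildAdapterHealthMap` is a hypothetical name.

```typescript
// Hypothetical, simplified shapes; the real types live in agent-harness-types.
type AdapterId = string

interface AdapterHealth {
  healthy: boolean
  reason?: string
}

interface AdapterDescriptor {
  id: AdapterId
  health?: AdapterHealth
}

// Index adapters by id, skipping any that have not reported health yet.
function buildAdapterHealthMap(
  adapters: AdapterDescriptor[],
): Map<AdapterId, AdapterHealth> {
  const map = new Map<AdapterId, AdapterHealth>()
  for (const adapter of adapters) {
    if (adapter.health) {
      // Shallow copy so later mutation of the descriptor does not leak in.
      map.set(adapter.id, { ...adapter.health })
    }
  }
  return map
}

const health = buildAdapterHealthMap([
  { id: 'codex', health: { healthy: false, reason: 'binary missing' } },
  { id: 'claude' }, // no health reported yet
])

// A missing entry becomes null, matching the `?? null` lookup used by the rail rows.
const codexHealth = health.get('codex') ?? null
const claudeHealth = health.get('claude') ?? null
```

Adapters without a health snapshot are simply absent from the map, so the caller decides how to treat "unknown" (the rail passes `null`).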
||||
@@ -1,102 +0,0 @@
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
|
||||
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
|
||||
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
|
||||
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
|
||||
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface AgentRailRowProps {
|
||||
agent: HarnessAgent
|
||||
active: boolean
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
onSelect: () => void
|
||||
onPinToggle: (next: boolean) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Compact rail row for the chat-screen sidebar. Slims `<AgentRowCard>`
|
||||
* down to the essentials that fit a ~280 px rail: tile + name + status
|
||||
* badge + pin star, with the adapter / model / reasoning chips on a
|
||||
* second line. Token totals, sparkline, last-message preview all stay
|
||||
* on the `/agents` page where rows are full-width.
|
||||
*/
|
||||
export const AgentRailRow: FC<AgentRailRowProps> = ({
|
||||
agent,
|
||||
active,
|
||||
adapterHealth,
|
||||
onSelect,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
const status = agent.status ?? 'unknown'
|
||||
const lastUsedAt = agent.lastUsedAt ?? null
|
||||
const pinned = agent.pinned ?? false
|
||||
return (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onSelect}
|
||||
className={cn(
|
||||
'group w-full rounded-2xl border px-3 py-3 text-left transition-colors',
|
||||
active
|
||||
? 'border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/8'
|
||||
: 'border-transparent bg-transparent hover:border-border/60 hover:bg-card',
|
||||
)}
|
||||
>
|
||||
<div className="flex min-w-0 items-start gap-3">
|
||||
<AgentTile
|
||||
adapter={agent.adapter}
|
||||
status={status}
|
||||
lastUsedAt={lastUsedAt}
|
||||
/>
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="flex items-center gap-1.5">
|
||||
<span className="truncate font-semibold text-[14px] leading-5">
|
||||
{agent.name}
|
||||
</span>
|
||||
{status === 'working' && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="h-5 bg-amber-50 px-1.5 text-[10px] text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'asleep' && (
|
||||
<Badge
|
||||
variant="outline"
|
||||
className="h-5 px-1.5 text-[10px] text-muted-foreground"
|
||||
>
|
||||
Asleep
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'error' && (
|
||||
<Badge variant="destructive" className="h-5 px-1.5 text-[10px]">
|
||||
Attention
|
||||
</Badge>
|
||||
)}
|
||||
<div className="ml-auto">
|
||||
<PinToggle pinned={pinned} onToggle={onPinToggle} />
|
||||
</div>
|
||||
</div>
|
||||
<AgentSummaryChips
|
||||
adapter={agent.adapter}
|
||||
modelLabel={agent.modelId ?? null}
|
||||
reasoningEffort={agent.reasoningEffort ?? null}
|
||||
adapterHealth={adapterHealth}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Tooltip-only label helper kept exported in case the tile row needs to
|
||||
* show "Codex agent" or similar in a future state. Inlined fallback for
|
||||
* the rare `unknown` adapter rendering path.
|
||||
*/
|
||||
export function railRowAdapterLabel(agent: HarnessAgent): string {
|
||||
return adapterLabel(agent.adapter)
|
||||
}
|
||||
@@ -1,179 +0,0 @@
|
||||
import { ArrowLeft, Home } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
|
||||
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { AgentSummaryChips } from '@/entrypoints/app/agents/agent-row/AgentSummaryChips'
|
||||
import { formatTokens } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
|
||||
import type { AgentAdapterHealth } from '@/entrypoints/app/agents/agent-row/agent-row.types'
|
||||
import { PinToggle } from '@/entrypoints/app/agents/agent-row/PinToggle'
|
||||
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface ConversationHeaderProps {
|
||||
agent: HarnessAgent | null
|
||||
fallbackName: string
|
||||
fallbackAdapter: 'claude' | 'codex' | 'openclaw' | 'unknown'
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
backLabel: string
|
||||
backTarget: 'home' | 'page'
|
||||
onGoHome: () => void
|
||||
onPinToggle: (next: boolean) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Strip above the chat. Mirrors the `/agents` row card's title row +
|
||||
* summary chips so the user gets adapter health, pin state, and status
|
||||
* at a glance — but adds the meta line (last used · lifetime tokens ·
|
||||
* queued) that's specific to this surface.
|
||||
*
|
||||
* The mobile `lg:hidden` Back button is preserved so the small-screen
|
||||
* collapse keeps a navigable header without a sidebar.
|
||||
*/
|
||||
export const ConversationHeader: FC<ConversationHeaderProps> = ({
|
||||
agent,
|
||||
fallbackName,
|
||||
fallbackAdapter,
|
||||
adapterHealth,
|
||||
backLabel,
|
||||
backTarget,
|
||||
onGoHome,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
const BackIcon = backTarget === 'home' ? Home : ArrowLeft
|
||||
const adapter = agent?.adapter ?? fallbackAdapter
|
||||
const status: AgentLiveness = agent?.status ?? 'unknown'
|
||||
const lastUsedAt = agent?.lastUsedAt ?? null
|
||||
const pinned = agent?.pinned ?? false
|
||||
const queueCount = agent?.queue?.length ?? 0
|
||||
const tokens = agent?.tokens ?? null
|
||||
const lifetimeTotal = tokens
|
||||
? tokens.cumulative.input + tokens.cumulative.output
|
||||
: 0
|
||||
|
||||
const metaParts: string[] = []
|
||||
if (lastUsedAt !== null) metaParts.push(formatRelativeTime(lastUsedAt))
|
||||
if (lifetimeTotal > 0) metaParts.push(`${formatTokens(lifetimeTotal)} tokens`)
|
||||
if (queueCount > 0) {
|
||||
metaParts.push(queueCount === 1 ? '1 queued' : `${queueCount} queued`)
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="flex min-h-[60px] shrink-0 items-center justify-between gap-4 px-5 py-2.5">
|
||||
<div className="flex min-w-0 items-center gap-3">
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
onClick={onGoHome}
|
||||
className="size-8 shrink-0 rounded-xl lg:hidden"
|
||||
title={backLabel}
|
||||
>
|
||||
<BackIcon className="size-4" />
|
||||
</Button>
|
||||
<div className="group min-w-0 flex-1">
|
||||
<div className="flex items-center gap-2">
|
||||
<span className="truncate font-semibold text-[15px] leading-6">
|
||||
{agent?.name || fallbackName}
|
||||
</span>
|
||||
{agent ? (
|
||||
<PinToggle pinned={pinned} onToggle={onPinToggle} />
|
||||
) : null}
|
||||
</div>
|
||||
<div className="mt-0.5 flex items-center gap-2">
|
||||
<AgentSummaryChips
|
||||
adapter={adapter}
|
||||
modelLabel={agent?.modelId ?? null}
|
||||
reasoningEffort={agent?.reasoningEffort ?? null}
|
||||
adapterHealth={adapterHealth}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="flex shrink-0 flex-col items-end gap-1">
|
||||
<StatusPill
|
||||
status={status}
|
||||
hasActiveTurn={Boolean(agent?.activeTurnId)}
|
||||
/>
|
||||
<div className="flex h-4 items-center text-[11px] text-muted-foreground">
|
||||
<span className="truncate">
|
||||
{metaParts.length > 0 ? metaParts.join(' · ') : '\u00A0'}
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
interface StatusPillProps {
|
||||
status: AgentLiveness
|
||||
hasActiveTurn: boolean
|
||||
}
|
||||
|
||||
/**
|
||||
* Working / Asleep / Attention all get distinctive styling; idle keeps
|
||||
* the legacy emerald `Ready` pill so the default state is visually
|
||||
* calm. Defensive working: `idle + activeTurnId` falls through to the
|
||||
* working pill since the server says a turn is in flight.
|
||||
*/
|
||||
const StatusPill: FC<StatusPillProps> = ({ status, hasActiveTurn }) => {
|
||||
const effective: AgentLiveness =
|
||||
status === 'idle' && hasActiveTurn ? 'working' : status
|
||||
|
||||
const base =
|
||||
'inline-flex items-center gap-2 rounded-full border px-3 py-0.5 text-[11px] uppercase tracking-[0.18em]'
|
||||
|
||||
if (effective === 'working') {
|
||||
return (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className={cn(
|
||||
base,
|
||||
'border-amber-200 bg-amber-50 text-amber-900 hover:bg-amber-50',
|
||||
)}
|
||||
>
|
||||
<span className="size-1.5 animate-pulse rounded-full bg-amber-500" />
|
||||
Working
|
||||
</Badge>
|
||||
)
|
||||
}
|
||||
if (effective === 'asleep') {
|
||||
return (
|
||||
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
|
||||
<span className="size-1.5 rounded-full bg-muted-foreground/50" />
|
||||
Asleep
|
||||
</Badge>
|
||||
)
|
||||
}
|
||||
if (effective === 'error') {
|
||||
return (
|
||||
<Badge
|
||||
variant="destructive"
|
||||
className={cn(base, 'border-destructive/30')}
|
||||
>
|
||||
<span className="size-1.5 rounded-full bg-destructive-foreground" />
|
||||
Attention
|
||||
</Badge>
|
||||
)
|
||||
}
|
||||
if (effective === 'idle') {
|
||||
return (
|
||||
<Badge
|
||||
variant="outline"
|
||||
className={cn(
|
||||
base,
|
||||
'border-emerald-200 bg-emerald-50 text-emerald-900 hover:bg-emerald-50',
|
||||
)}
|
||||
>
|
||||
<span className="size-1.5 rounded-full bg-emerald-500" />
|
||||
Ready
|
||||
</Badge>
|
||||
)
|
||||
}
|
||||
return (
|
||||
<Badge variant="outline" className={cn(base, 'text-muted-foreground')}>
|
||||
<span className="size-1.5 rounded-full bg-muted-foreground/30" />
|
||||
Setup
|
||||
</Badge>
|
||||
)
|
||||
}
|
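Review note: the `StatusPill` above first derives an effective status (an `idle` agent with an active turn is shown as working) and then picks a label. The sketch below isolates that decision without any JSX. The `AgentLiveness` union is an assumption inferred from the branches in this diff, and `effectiveLiveness` / `pillLabel` are hypothetical helper names.

```typescript
// Assumed union, inferred from the status branches visible in this diff.
type AgentLiveness = 'working' | 'asleep' | 'error' | 'idle' | 'unknown'

// An idle agent that still has a turn in flight is treated as working.
function effectiveLiveness(
  status: AgentLiveness,
  hasActiveTurn: boolean,
): AgentLiveness {
  return status === 'idle' && hasActiveTurn ? 'working' : status
}

// Map the effective status to the pill text used in the header.
function pillLabel(status: AgentLiveness, hasActiveTurn: boolean): string {
  switch (effectiveLiveness(status, hasActiveTurn)) {
    case 'working':
      return 'Working'
    case 'asleep':
      return 'Asleep'
    case 'error':
      return 'Attention'
    case 'idle':
      return 'Ready'
    default:
      // Anything else (here: 'unknown') falls back to the setup pill.
      return 'Setup'
  }
}
```

Only the `idle` status is upgraded by an active turn; `asleep`, `error`, and `unknown` keep their own pill even if `hasActiveTurn` is true.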
||||
@@ -54,40 +54,25 @@ interface ConversationInputProps {
|
||||
placeholder?: string
|
||||
attachmentsEnabled?: boolean
|
||||
variant?: 'home' | 'conversation'
|
||||
/**
|
||||
* When set, a Stop button surfaces to the left of the voice mic
|
||||
* while `streaming === true`. Click cancels the active turn
|
||||
* server-side via the chat-cancel endpoint. Absent → no Stop
|
||||
* button (legacy behaviour for the home composer).
|
||||
*/
|
||||
onStop?: () => void
|
||||
}
|
||||
|
||||
function InputActionButton({
|
||||
disabled,
|
||||
onClick,
|
||||
streaming,
|
||||
hasContent,
|
||||
}: {
|
||||
disabled: boolean
|
||||
onClick: () => void
|
||||
streaming: boolean
|
||||
hasContent: boolean
|
||||
}) {
|
||||
// Show the spinner while streaming only when there's nothing to
|
||||
// send — once the user types something, the icon flips back to the
|
||||
// paper-plane so it reads as "queue this message" instead of
|
||||
// "still working".
|
||||
const showSpinner = streaming && !hasContent
|
||||
return (
|
||||
<Button
|
||||
onClick={onClick}
|
||||
size="icon"
|
||||
disabled={disabled}
|
||||
title={streaming && hasContent ? 'Queue message' : undefined}
|
||||
className="h-10 w-10 flex-shrink-0 rounded-xl bg-primary text-primary-foreground hover:bg-primary/90"
|
||||
>
|
||||
{showSpinner ? (
|
||||
{streaming ? (
|
||||
<Loader2 className="h-5 w-5 animate-spin" />
|
||||
) : (
|
||||
<ArrowRight className="h-5 w-5" />
|
||||
@@ -96,22 +81,6 @@ function InputActionButton({
|
||||
)
|
||||
}
|
||||
|
||||
function StopButton({ onStop }: { onStop: () => void }) {
|
||||
return (
|
||||
<Button
|
||||
type="button"
|
||||
size="icon"
|
||||
variant="ghost"
|
||||
onClick={onStop}
|
||||
title="Stop current turn — queued messages will start next."
|
||||
aria-label="Stop current turn"
|
||||
className="h-8 w-8 flex-shrink-0 rounded-lg bg-destructive/10 text-destructive transition-colors hover:bg-destructive/15 hover:text-destructive"
|
||||
>
|
||||
<Square className="h-3.5 w-3.5 fill-current" />
|
||||
</Button>
|
||||
)
|
||||
}
|
||||
|
||||
function VoiceButton({
|
||||
isRecording,
|
||||
isTranscribing,
|
||||
@@ -330,7 +299,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
placeholder,
|
||||
attachmentsEnabled = true,
|
||||
variant = 'conversation',
|
||||
onStop,
|
||||
}) => {
|
||||
const [input, setInput] = useState('')
|
||||
const [selectedTabs, setSelectedTabs] = useState<chrome.tabs.Tab[]>([])
|
||||
@@ -411,17 +379,10 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
}
|
||||
|
||||
const hasContent = input.trim().length > 0 || attachments.length > 0
|
||||
// Queue-aware composers (the conversation panel passes `onStop`)
|
||||
// accept input while streaming — the parent decides whether the
|
||||
// submission opens a new turn or enqueues onto the active one.
|
||||
// Surfaces without a Stop hook (home) keep the legacy behaviour
|
||||
// and block input until the current turn finishes.
|
||||
const queueAware = Boolean(onStop)
|
||||
|
||||
const handleSend = () => {
|
||||
const text = input.trim()
|
||||
if (disabled || isStaging) return
|
||||
if (streaming && !queueAware) return
|
||||
if (disabled || isStaging || streaming) return
|
||||
if (!text && attachments.length === 0) return
|
||||
onSend({ text, attachments })
|
||||
setInput('')
|
||||
@@ -551,7 +512,6 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
)}
|
||||
/>
|
||||
</div>
|
||||
{streaming && onStop ? <StopButton onStop={onStop} /> : null}
|
||||
<VoiceButton
|
||||
isRecording={voice.isRecording}
|
||||
isTranscribing={voice.isTranscribing}
|
||||
@@ -569,13 +529,12 @@ export const ConversationInput: FC<ConversationInputProps> = ({
|
||||
!!disabled ||
|
||||
voice.isRecording ||
|
||||
voice.isTranscribing ||
|
||||
(streaming && !queueAware)
|
||||
streaming
|
||||
}
|
||||
onClick={handleSend}
|
||||
// Spinner stays the user-facing "agent is busy" hint; with the
|
||||
// queue active we still spin while a turn is in flight.
|
||||
streaming={streaming}
|
||||
hasContent={hasContent}
|
||||
/>
|
||||
</div>
|
||||
{voice.error ? (
|
||||
|
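Review note: the `handleSend` hunks above differ only in how they treat a submission while a turn is streaming. The queue-aware variant lets it through when the parent supplied `onStop`; the other variant always blocks. A minimal sketch of that guard as a pure function follows. `SendState` and `canSend` are hypothetical names introduced for illustration, not part of the component.

```typescript
// Hypothetical snapshot of the inputs that handleSend consults.
interface SendState {
  disabled: boolean
  isStaging: boolean
  streaming: boolean
  /** True when the parent passed an onStop handler. */
  queueAware: boolean
  text: string
  attachmentCount: number
}

function canSend(state: SendState): boolean {
  if (state.disabled || state.isStaging) return false
  // While a turn is in flight, only queue-aware composers accept input.
  if (state.streaming && !state.queueAware) return false
  // Nothing to send: empty text and no attachments.
  return state.text.trim().length > 0 || state.attachmentCount > 0
}

const base: SendState = {
  disabled: false,
  isStaging: false,
  streaming: false,
  queueAware: false,
  text: 'summarise this page',
  attachmentCount: 0,
}
```

Setting `queueAware` to `false` everywhere reproduces the simpler behaviour, where `streaming` alone blocks the send.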
||||
@@ -1,243 +0,0 @@
|
||||
import { Quote, TriangleAlert } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import {
|
||||
HoverCard,
|
||||
HoverCardContent,
|
||||
HoverCardTrigger,
|
||||
} from '@/components/ui/hover-card'
|
||||
import { adapterLabel } from '@/entrypoints/app/agents/AdapterIcon'
|
||||
import { formatRelativeTime } from '@/entrypoints/app/agents/agent-display.helpers'
|
||||
import type {
|
||||
HarnessAdapterHealth,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { AgentTile } from '@/entrypoints/app/agents/agent-row/AgentTile'
|
||||
import {
|
||||
firstNonBlankLine,
|
||||
truncate,
|
||||
} from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
|
||||
import type { AgentLiveness } from '@/entrypoints/app/agents/LivenessDot'
|
||||
import { cn } from '@/lib/utils'
|
||||
|
||||
interface HomeAgentCardProps {
|
||||
agent: HarnessAgent
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
/** Per-adapter health snapshot, shared across cards rendering the
|
||||
* same adapter. `null` when the /adapters response hasn't surfaced
|
||||
* health yet (we treat that as healthy until proven otherwise). */
|
||||
adapterHealth: HarnessAdapterHealth | null
|
||||
/** Highlights the card with an accent ring; tells the user which
|
||||
* agent the conversation input is bound to. */
|
||||
active?: boolean
|
||||
onClick: () => void
|
||||
}
|
||||
|
||||
const PREVIEW_CHARS = 100
|
||||
|
||||
/**
|
||||
* Grid-shaped card for the /home Recent agents section. Composition
|
||||
* mirrors the rail's `AgentRowCard` but the layout is a vertical
|
||||
* column sized for a 1/3-width tile rather than a full-width row.
|
||||
*
|
||||
* Reuses `<AgentTile>`, `<LivenessDot>`, `livenessDetail`,
|
||||
* `formatRelativeTime`, `firstNonBlankLine`, `truncate`, and the
|
||||
* inline `Unavailable` chip pattern so the visual language is
|
||||
* continuous between rail and grid.
|
||||
*/
|
||||
export const HomeAgentCard: FC<HomeAgentCardProps> = ({
|
||||
agent,
|
||||
adapter,
|
||||
adapterHealth,
|
||||
active,
|
||||
onClick,
|
||||
}) => {
|
||||
const status = agent.status ?? 'unknown'
|
||||
const lastUsedAt = agent.lastUsedAt ?? null
|
||||
const isWorking = status === 'working'
|
||||
const isAsleep = status === 'asleep'
|
||||
const isError = status === 'error'
|
||||
const hasActiveTurn = Boolean(agent.activeTurnId)
|
||||
|
||||
return (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClick}
|
||||
className={cn(
|
||||
'group flex min-h-32 w-full min-w-0 flex-col rounded-2xl border bg-card p-4 text-left shadow-sm transition-colors',
|
||||
active && 'ring-1 ring-[var(--accent-orange)]/30',
|
||||
isWorking
|
||||
? 'border-[var(--accent-orange)]/40'
|
||||
: isError
|
||||
? 'border-destructive/30'
|
||||
: 'border-border/60 hover:border-[var(--accent-orange)]/30',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start gap-3">
|
||||
<AgentTile adapter={adapter} status={status} lastUsedAt={lastUsedAt} />
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="flex items-center gap-1.5">
|
||||
<span className="truncate font-semibold text-sm">
|
||||
{displayName(agent)}
|
||||
</span>
|
||||
{isWorking && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="ml-auto bg-amber-50 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
</div>
|
||||
<SummaryLine
|
||||
adapter={adapter}
|
||||
modelId={agent.modelId ?? null}
|
||||
reasoningEffort={agent.reasoningEffort ?? null}
|
||||
adapterHealth={adapterHealth}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<LastMessage message={agent.lastUserMessage ?? null} />
|
||||
|
||||
<div className="mt-3 flex items-center justify-between gap-2 text-muted-foreground text-xs">
|
||||
<span>{statusFootnote(status, lastUsedAt)}</span>
|
||||
{hasActiveTurn ? (
|
||||
<ResumeChip />
|
||||
) : isAsleep ? (
|
||||
<Badge variant="outline" className="text-muted-foreground">
|
||||
Asleep
|
||||
</Badge>
|
||||
) : isError ? (
|
||||
<ErrorChip lastError={agent.lastError ?? null} />
|
||||
) : null}
|
||||
</div>
|
||||
</button>
|
||||
)
|
||||
}
|
||||
|
||||
const SummaryLine: FC<{
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
modelId: string | null
|
||||
reasoningEffort: string | null
|
||||
adapterHealth: HarnessAdapterHealth | null
|
||||
}> = ({ adapter, modelId, reasoningEffort, adapterHealth }) => {
|
||||
const parts = [adapterLabel(adapter)]
|
||||
if (modelId) parts.push(modelId)
|
||||
if (reasoningEffort) parts.push(reasoningEffort)
|
||||
const unhealthy = adapterHealth?.healthy === false
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
'mt-0.5 flex items-center gap-1.5 text-muted-foreground text-xs',
|
||||
unhealthy && 'text-muted-foreground/70',
|
||||
)}
|
||||
>
|
||||
<span className="truncate">{parts.join(' · ')}</span>
|
||||
{unhealthy && (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<Badge
|
||||
variant="outline"
|
||||
className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
<TriangleAlert className="size-2.5" />
|
||||
<span className="font-normal">Unavailable</span>
|
||||
</Badge>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent side="right" className="w-72 text-sm">
|
||||
<div className="font-medium">
|
||||
{adapterLabel(adapter)} CLI not available
|
||||
</div>
|
||||
<div className="mt-1 text-muted-foreground text-xs">
|
||||
{adapterHealth?.reason ??
|
||||
'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
|
||||
</div>
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
const LastMessage: FC<{ message: string | null }> = ({ message }) => {
|
||||
if (!message) {
|
||||
return (
|
||||
<p className="mt-3 flex-1 text-muted-foreground/70 text-xs italic">
|
||||
No messages yet — start a chat
|
||||
</p>
|
||||
)
|
||||
}
|
||||
return (
|
||||
<p className="mt-3 line-clamp-2 flex flex-1 items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
|
||||
<Quote
|
||||
className="mt-1 size-3 shrink-0 text-muted-foreground/60"
|
||||
aria-hidden
|
||||
/>
|
||||
<span className="line-clamp-2">
|
||||
{truncate(firstNonBlankLine(message), PREVIEW_CHARS)}
|
||||
</span>
|
||||
</p>
|
||||
)
|
||||
}
|
||||
|
||||
const ResumeChip: FC = () => (
|
||||
<span className="inline-flex items-center gap-1.5 rounded-full bg-[var(--accent-orange)] px-2.5 py-0.5 font-medium text-[11px] text-white shadow-sm">
|
||||
<span className="relative flex size-1.5">
|
||||
<span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
|
||||
<span className="relative inline-flex size-1.5 rounded-full bg-white" />
|
||||
</span>
|
||||
Resume
|
||||
</span>
|
||||
)
|
||||
|
||||
const ErrorChip: FC<{ lastError: string | null }> = ({ lastError }) => {
|
||||
if (!lastError) {
|
||||
return <Badge variant="destructive">Attention</Badge>
|
||||
}
|
||||
return (
|
||||
<HoverCard openDelay={200}>
|
||||
<HoverCardTrigger asChild>
|
||||
<Badge variant="destructive" className="cursor-default">
|
||||
Attention
|
||||
</Badge>
|
||||
</HoverCardTrigger>
|
||||
<HoverCardContent
|
||||
side="left"
|
||||
className="max-w-xs whitespace-pre-wrap font-mono text-xs"
|
||||
>
|
||||
{lastError}
|
||||
</HoverCardContent>
|
||||
</HoverCard>
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Footer left side: relative time on every state EXCEPT working,
|
||||
* which shows `now` (the dot is already pulsing — restating it as
|
||||
* "Working" would duplicate the pill in the title row).
|
||||
*/
|
||||
function statusFootnote(
|
||||
status: AgentLiveness,
|
||||
lastUsedAt: number | null,
|
||||
): string {
|
||||
if (status === 'working') return 'now'
|
||||
return formatRelativeTime(lastUsedAt)
|
||||
}
|
||||
|
||||
const UUID_PATTERN =
|
||||
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
|
||||
const OC_UUID_PATTERN =
|
||||
/^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
|
||||
|
||||
function displayName(agent: HarnessAgent): string {
|
||||
const name = agent.name?.trim()
|
||||
const id = agent.id
|
||||
if (!name || name === id) {
|
||||
if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
|
||||
if (UUID_PATTERN.test(id)) return id.slice(0, 8)
|
||||
return id
|
||||
}
|
||||
return name
|
||||
}
|
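Review note: `displayName` above shortens UUID-style ids when an agent has no distinct name. The standalone sketch below reuses the two patterns from the diff and shows the resulting labels for a few inputs. `shortLabel` is a hypothetical name, and the sample ids are made up for illustration.

```typescript
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
const OC_UUID_PATTERN =
  /^oc-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Prefer a real name; otherwise fall back to a shortened id.
function shortLabel(id: string, name?: string | null): string {
  const trimmed = name?.trim()
  if (trimmed && trimmed !== id) return trimmed
  // "oc-" prefix (3 chars) plus the first 8 hex digits.
  if (OC_UUID_PATTERN.test(id)) return id.slice(0, 11)
  // First 8 hex digits of a bare UUID.
  if (UUID_PATTERN.test(id)) return id.slice(0, 8)
  return id
}

// Made-up sample ids.
const ocId = 'oc-0123abcd-0000-0000-0000-000000000000'
const bareId = '0123abcd-0000-0000-0000-000000000000'

const labels = [
  shortLabel(ocId), // 'oc-0123abcd'
  shortLabel(bareId, bareId), // '0123abcd' (name equal to id counts as unnamed)
  shortLabel('main'), // 'main'
  shortLabel(bareId, '  Research bot '), // 'Research bot'
]
```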
||||
@@ -1,94 +0,0 @@
|
||||
import { ListPlus, X } from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import {
|
||||
Queue,
|
||||
QueueItem,
|
||||
QueueItemAction,
|
||||
QueueItemActions,
|
||||
QueueItemAttachment,
|
||||
QueueItemContent,
|
||||
QueueItemFile,
|
||||
QueueItemImage,
|
||||
QueueList,
|
||||
QueueSection,
|
||||
QueueSectionContent,
|
||||
QueueSectionLabel,
|
||||
QueueSectionTrigger,
|
||||
} from '@/components/ai-elements/queue'
|
||||
import type {
|
||||
HarnessQueuedMessage,
|
||||
HarnessQueuedMessageAttachment,
|
||||
} from '@/entrypoints/app/agents/agent-harness-types'
|
||||
import { firstNonBlankLine } from '@/entrypoints/app/agents/agent-row/agent-row.helpers'
|
||||
|
||||
interface QueuePanelProps {
|
||||
queue: HarnessQueuedMessage[]
|
||||
onRemove: (messageId: string) => void
|
||||
}
|
||||
|
||||
/**
|
||||
* Renders the agent's pending message queue using the shared AI
|
||||
* Elements `Queue` primitives. Caller is expected to gate render on
|
||||
* `queue.length > 0` — when empty, this returns null so the panel
|
||||
* disappears cleanly between turns.
|
||||
*/
|
||||
export const QueuePanel: FC<QueuePanelProps> = ({ queue, onRemove }) => {
|
||||
if (queue.length === 0) return null
|
||||
return (
|
||||
<Queue>
|
||||
<QueueSection>
|
||||
<QueueSectionTrigger>
|
||||
<QueueSectionLabel
|
||||
count={queue.length}
|
||||
label={queue.length === 1 ? 'queued message' : 'queued messages'}
|
||||
icon={<ListPlus className="size-3.5" />}
|
||||
/>
|
||||
</QueueSectionTrigger>
|
||||
<QueueSectionContent>
|
||||
<QueueList>
|
||||
{queue.map((entry) => (
|
||||
<QueueItem key={entry.id}>
|
||||
<div className="flex items-center gap-2">
|
||||
<QueueItemContent>
|
||||
{firstNonBlankLine(entry.message)}
|
||||
</QueueItemContent>
|
||||
<QueueItemActions>
|
||||
<QueueItemAction
|
||||
aria-label="Remove from queue"
|
||||
onClick={() => onRemove(entry.id)}
|
||||
>
|
||||
<X className="size-3" />
|
||||
</QueueItemAction>
|
||||
</QueueItemActions>
|
||||
</div>
|
||||
{entry.attachments && entry.attachments.length > 0 ? (
|
||||
<QueueItemAttachment>
|
||||
{entry.attachments.map((attachment, idx) =>
|
||||
renderAttachment(entry.id, attachment, idx),
|
||||
)}
|
||||
</QueueItemAttachment>
|
||||
) : null}
|
||||
</QueueItem>
|
||||
))}
|
||||
</QueueList>
|
||||
</QueueSectionContent>
|
||||
</QueueSection>
|
||||
</Queue>
|
||||
)
|
||||
}
|
||||
|
||||
function renderAttachment(
|
||||
messageId: string,
|
||||
attachment: HarnessQueuedMessageAttachment,
|
||||
idx: number,
|
||||
) {
|
||||
if (attachment.mediaType.startsWith('image/')) {
|
||||
const src = `data:${attachment.mediaType};base64,${attachment.data}`
|
||||
return <QueueItemImage key={`${messageId}-${idx}`} src={src} />
|
||||
}
|
||||
return (
|
||||
<QueueItemFile key={`${messageId}-${idx}`}>
|
||||
{attachment.mediaType}
|
||||
</QueueItemFile>
|
||||
)
|
||||
}
|
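Review note: `renderAttachment` above branches on the media type and builds a `data:` URL for images. The sketch below captures the same branch as plain data, which may be easier to unit test than JSX. `QueuedAttachment`, `AttachmentPreview`, and `previewFor` are hypothetical names, and `data` is assumed to be a base64 string, as the `;base64,` prefix in the diff suggests.

```typescript
// Hypothetical minimal shape; `data` is assumed to hold base64 text.
interface QueuedAttachment {
  mediaType: string
  data: string
}

type AttachmentPreview =
  | { kind: 'image'; src: string }
  | { kind: 'file'; label: string }

function previewFor(attachment: QueuedAttachment): AttachmentPreview {
  if (attachment.mediaType.startsWith('image/')) {
    // Inline the payload as a data URL so it can be used as an <img> src.
    const src = `data:${attachment.mediaType};base64,${attachment.data}`
    return { kind: 'image', src }
  }
  // Non-image attachments are represented by their media type only.
  return { kind: 'file', label: attachment.mediaType }
}

const imagePreview = previewFor({ mediaType: 'image/png', data: 'AAAA' })
const filePreview = previewFor({ mediaType: 'application/pdf', data: 'AAAA' })
```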
||||
@@ -1,69 +0,0 @@
import { describe, expect, it } from 'bun:test'
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'
import { orderHomeAgents } from './home-agent-card.helpers'

function agent(overrides: Partial<HarnessAgent>): HarnessAgent {
  return {
    id: overrides.id ?? 'agent-x',
    name: overrides.name ?? overrides.id ?? 'agent-x',
    adapter: overrides.adapter ?? 'codex',
    permissionMode: 'approve-all',
    sessionKey: `agent:${overrides.id ?? 'agent-x'}:main`,
    createdAt: 1000,
    updatedAt: 1000,
    ...overrides,
  }
}

describe('orderHomeAgents', () => {
  it('places active-turn agents before everyone else', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'a', lastUsedAt: 5000 }),
      agent({ id: 'b', lastUsedAt: 9000, activeTurnId: 'turn-1' }),
      agent({ id: 'c', lastUsedAt: 7000 }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['b', 'c', 'a'])
  })

  it('orders non-active agents by lastUsedAt desc', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'old', lastUsedAt: 1000 }),
      agent({ id: 'new', lastUsedAt: 9000 }),
      agent({ id: 'mid', lastUsedAt: 5000 }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['new', 'mid', 'old'])
  })

  it('puts the gateway `main` seed agent above other never-used agents', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'oc-aaaaaa', lastUsedAt: null }),
      agent({ id: 'main', lastUsedAt: null }),
      agent({ id: 'oc-bbbbbb', lastUsedAt: null }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['main', 'oc-aaaaaa', 'oc-bbbbbb'])
  })

  it('sends never-used agents to the bottom even when `main` is among them', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'main', lastUsedAt: null }),
      agent({ id: 'used', lastUsedAt: 5000 }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['used', 'main'])
  })

  it('does NOT sort by pinned — pinned agents are treated like any other', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'unpinned-recent', lastUsedAt: 9000, pinned: false }),
      agent({ id: 'pinned-old', lastUsedAt: 1000, pinned: true }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['unpinned-recent', 'pinned-old'])
  })

  it('falls back to id-stable ordering when lastUsedAt ties', () => {
    const sorted = orderHomeAgents([
      agent({ id: 'b', lastUsedAt: 5000 }),
      agent({ id: 'a', lastUsedAt: 5000 }),
    ])
    expect(sorted.map((a) => a.id)).toEqual(['a', 'b'])
  })
})
|
||||
@@ -1,42 +0,0 @@
import type { HarnessAgent } from '@/entrypoints/app/agents/agent-harness-types'

/**
 * Order for the /home Recent agents grid.
 *
 * 1. Active turn first — agents mid-turn float to the top so the
 *    Resume affordance is the first thing the user sees on /home.
 * 2. The protected gateway-side `main` agent stays pinned-to-top in
 *    the never-used group on a fresh install (mirrors the rail).
 * 3. Recency (`lastUsedAt` desc).
 * 4. `id` tiebreaker for stability so the grid doesn't reshuffle on
 *    every 5-second poll.
 *
 * Pin is NOT a sort key. The home grid is action-oriented and trusts
 * recency + active-turn to surface the right agent; pinning is an
 * organisation tool that lives on the rail at /agents.
 */
export function orderHomeAgents(agents: HarnessAgent[]): HarnessAgent[] {
  return [...agents].sort((a, b) => {
    const aActive = a.activeTurnId != null
    const bActive = b.activeTurnId != null
    if (aActive !== bActive) return aActive ? -1 : 1

    // Recency wins outright. Never-used agents (`lastUsedAt == null`)
    // both fall to the same `-Infinity` bucket and the seed/id rules
    // below decide their order — but a used agent always beats any
    // never-used agent regardless of id.
    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
    if (aValue !== bValue) return bValue - aValue

    // Inside the never-used (or exact-tie) group: pin the gateway
    // `main` seed to the top of the group on a fresh install, then
    // fall back to id-stable order so the grid doesn't reshuffle on
    // every poll.
    const aSeed = a.id === 'main' && a.lastUsedAt == null
    const bSeed = b.id === 'main' && b.lastUsedAt == null
    if (aSeed !== bSeed) return aSeed ? -1 : 1

    return a.id.localeCompare(b.id)
  })
}
|
||||
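The recency step in `orderHomeAgents` compares the two values for inequality before subtracting them. A minimal sketch of why that guard matters, assuming a hypothetical `Stamped` type that carries only the two fields this step reads (it is not the real `HarnessAgent`):

```typescript
// Hypothetical stand-in for HarnessAgent: only the fields the recency step reads.
interface Stamped {
  id: string
  lastUsedAt: number | null
}

function byRecencyDesc(a: Stamped, b: Stamped): number {
  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  // Guard first: (-Infinity) - (-Infinity) is NaN, which is not a
  // usable comparator result. Equal values skip the subtraction.
  if (aValue !== bValue) return bValue - aValue
  // Tie (including two never-used agents): fall back to id order.
  return a.id.localeCompare(b.id)
}

// What the subtraction alone would produce for two never-used agents.
const unguarded = Number.NEGATIVE_INFINITY - Number.NEGATIVE_INFINITY

const input: Stamped[] = [
  { id: 'b', lastUsedAt: null },
  { id: 'used', lastUsedAt: 5000 },
  { id: 'a', lastUsedAt: null },
]
const sortedIds = [...input].sort(byRecencyDesc).map((x) => x.id)
```

Here the used agent sorts ahead of both never-used agents, and the two never-used agents fall back to id order instead of depending on a `NaN` comparison.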
@@ -0,0 +1,53 @@
import {
  type AgentEntry,
  getModelDisplayName,
  type OpenClawStatus,
} from '@/entrypoints/app/agents/useOpenClaw'
import type { AgentCardData } from '@/lib/agent-conversations/types'
import type { AgentOverview } from './useAgentDashboard'

function resolveAgentStatus(
  gatewayStatus: OpenClawStatus['status'] | undefined,
  liveStatus: AgentOverview['status'] | undefined,
): AgentCardData['status'] {
  // Gateway-level errors take precedence
  if (gatewayStatus === 'error') return 'error'
  if (gatewayStatus === 'starting') return 'working'

  // Per-agent live status from the WS observer
  if (liveStatus === 'working') return 'working'
  if (liveStatus === 'error') return 'error'

  return 'idle'
}

/**
 * Build agent card display data by merging the raw agent entries from
 * the gateway with enriched overview data from the dashboard API.
 *
 * Pure function — no hooks, no IndexedDB, no async.
 */
export function buildAgentCardData(
  agents: AgentEntry[],
  status: OpenClawStatus['status'] | undefined,
  dashboard: AgentOverview[] | undefined,
): AgentCardData[] {
  return agents.map((agent) => {
    const overview = dashboard?.find((d) => d.agentId === agent.agentId)

    return {
      agentId: agent.agentId,
      name: agent.name,
      model: getModelDisplayName(agent.model),
      status:
        agent.source === 'agent-harness'
          ? 'idle'
          : resolveAgentStatus(status, overview?.status),
      lastMessage: overview?.latestMessage?.slice(0, 200) ?? undefined,
      lastMessageTimestamp: overview?.latestMessageAt ?? undefined,
      activitySummary: overview?.activitySummary ?? undefined,
      currentTool: overview?.currentTool ?? undefined,
      costUsd: overview?.totalCostUsd ?? undefined,
    }
  })
}
|
||||
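The precedence in `resolveAgentStatus` can be exercised on its own. The sketch below restates it with local types. The gateway status is typed as a plain `string` because the full `OpenClawStatus['status']` union is not shown in this diff, and the `'running'` value in the sample is an assumption used only for illustration.

```typescript
type LiveStatus = 'working' | 'idle' | 'error' | 'unknown'
type CardStatus = 'working' | 'idle' | 'error'

// `gateway` is a plain string here; the real union is not visible in the diff.
function resolveStatus(
  gateway: string | undefined,
  live: LiveStatus | undefined,
): CardStatus {
  // Gateway-level state is checked first and masks per-agent state.
  if (gateway === 'error') return 'error'
  if (gateway === 'starting') return 'working'
  // Only then does the per-agent live status get a say.
  if (live === 'working') return 'working'
  if (live === 'error') return 'error'
  // 'idle', 'unknown', and undefined all collapse to 'idle'.
  return 'idle'
}

const samples = {
  gatewayErrorWins: resolveStatus('error', 'working'),
  startingMasksAgentError: resolveStatus('starting', 'error'),
  agentErrorSurfaces: resolveStatus(undefined, 'error'),
  unknownIsIdle: resolveStatus('running', 'unknown'), // 'running' is hypothetical
}
```

One consequence worth noting: while the gateway reports `'starting'`, a per-agent `'error'` is shown as `'working'`.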
@@ -36,15 +36,6 @@ interface UseAgentConversationOptions {
|
||||
history?: OpenClawChatHistoryMessage[]
|
||||
onComplete?: () => void
|
||||
onSessionKeyChange?: (sessionKey: string) => void
|
||||
/**
|
||||
* Server-side active turn id, surfaced via the listing query. When
|
||||
* this changes from null/<id> to a different non-null id while we
|
||||
* aren't already streaming (e.g. the server just popped a queued
|
||||
* message and started a new turn), the hook reattaches via
|
||||
* /chat/active so the chat panel picks up the live stream without
|
||||
* waiting for a remount.
|
||||
*/
|
||||
activeTurnId?: string | null
|
||||
}
|
||||
|
||||
export function useAgentConversation(
|
||||
@@ -220,46 +211,31 @@ export function useAgentConversation(
|
||||
}
|
||||
processEventRef.current = processAgentHarnessStreamEvent
|
||||
|
||||
const activeTurnIdDep = options.activeTurnId ?? null
|
||||
|
||||
// On mount, on agent change, and whenever the listing reports a
|
||||
// *new* active turn id, check whether the server has an in-flight
|
||||
// turn for this agent and reattach to it. This catches three
|
||||
// cases at once: the chat resilience flow (tab close/reopen),
|
||||
// navigation between agents, AND queue drain (the server starts a
|
||||
// new turn from a queued message → activeTurnId flips → attach).
|
||||
// On mount (and whenever the agent changes), check whether the
|
||||
// server has an in-flight turn for this agent and reattach to it.
|
||||
// This is what makes the chat resilient across tab close/reopen,
|
||||
// refresh, and navigation: the runtime call kept running on the
|
||||
// server while we were away. Effect only depends on `agentId` —
|
||||
// the event handler is read off a ref so this doesn't re-subscribe
|
||||
// every render.
|
||||
useEffect(() => {
|
||||
let cancelled = false
|
||||
const abortController = new AbortController()
|
||||
// Reference the dep inside the body so biome's exhaustive-deps
|
||||
// rule sees it consumed; the value is just an "any non-null
|
||||
// active turn id" trigger — the actual id we attach to comes
|
||||
// from the fresh fetchActiveHarnessTurn call below.
|
||||
void activeTurnIdDep
|
||||
|
||||
const attemptResume = async () => {
|
||||
// Track whether *we* started a stream in this run. When the
|
||||
// early-return paths fire (no active turn, or a `send()` /
|
||||
// earlier resume already owns `streamAbortRef`), the finally
|
||||
// block must NOT touch streaming/turnIdRef/lastSeqRef —
|
||||
// otherwise we clobber the in-flight stream's state and the
|
||||
// Stop button drops out mid-turn while events keep arriving.
|
||||
let weStartedStream = false
|
||||
try {
|
||||
const active = await fetchActiveHarnessTurn(agentId)
|
||||
if (cancelled || !active || active.status !== 'running') return
|
||||
if (streamAbortRef.current) return // someone else already owns the stream
|
||||
if (streamAbortRef.current) return // a fresh send already in flight
|
||||
|
||||
// Stage a placeholder turn so the streamed events have a row
|
||||
// to render into. The server now persists the kicking-off
|
||||
// prompt on the active turn, so we render it as the user
|
||||
// bubble immediately — no empty-bubble flicker when a queued
|
||||
// message starts running.
|
||||
// to render into. We don't have the user message text on
|
||||
// resume; the assistant turn is what we're catching up on.
|
||||
setTurns((prev) => [
|
||||
...prev,
|
||||
{
|
||||
id: crypto.randomUUID(),
|
||||
userText: active.prompt ?? '',
|
||||
userText: '',
|
||||
parts: [],
|
||||
done: false,
|
||||
timestamp: active.startedAt,
|
||||
@@ -271,7 +247,6 @@ export function useAgentConversation(
|
||||
lastSeqRef.current = null
|
||||
streamAbortRef.current = abortController
|
||||
setStreaming(true)
|
||||
weStartedStream = true
|
||||
|
||||
const response = await attachToHarnessTurn(agentId, {
|
||||
turnId: active.turnId,
|
||||
@@ -290,20 +265,10 @@ export function useAgentConversation(
|
||||
// Resume is best-effort; transient errors fall back to the
|
||||
// user starting a new turn manually.
|
||||
} finally {
|
||||
// Always release `streamAbortRef` if we owned it — even when
|
||||
// the effect was cancelled mid-stream (a listing poll
|
||||
// captured the next queue-drain turn id, for example). If we
|
||||
// don't, the next effect run hits `if (streamAbortRef.current)
|
||||
// return` against our now-aborted controller and never
|
||||
// reattaches, leaving `streaming === true` with no live stream.
|
||||
if (weStartedStream && streamAbortRef.current === abortController) {
|
||||
streamAbortRef.current = null
|
||||
}
|
||||
// The other state (streaming flag, turn id, lastSeq) is the
|
||||
// *current run's* lifecycle: only reset it on a clean exit.
|
||||
// When `cancelled` is true the next run will set these
|
||||
// itself, so resetting here would only cause a brief flicker.
|
||||
if (!cancelled && weStartedStream) {
|
||||
if (!cancelled) {
|
||||
if (streamAbortRef.current === abortController) {
|
||||
streamAbortRef.current = null
|
||||
}
|
||||
turnIdRef.current = null
|
||||
lastSeqRef.current = null
|
||||
setStreaming(false)
|
||||
@@ -316,7 +281,7 @@ export function useAgentConversation(
|
||||
cancelled = true
|
||||
abortController.abort()
|
||||
}
|
||||
}, [agentId, activeTurnIdDep])
|
||||
}, [agentId])
|
||||
|
||||
const send = async (input: string | SendInput) => {
|
||||
const normalized: SendInput =
|
||||
|
||||
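Both sides of the hunk above release `streamAbortRef` only when it still holds the controller created by that effect run. A minimal sketch of that ownership check, assuming a hypothetical `Handle` type in place of `AbortController` and a plain object in place of the React ref:

```typescript
// Hypothetical stand-ins: `Handle` plays the role of AbortController,
// `Slot` plays the role of a React ref ({ current: ... }).
interface Handle {
  aborted: boolean
}
interface Slot {
  current: Handle | null
}

// Claim the slot only if nobody else owns a stream.
function tryAcquire(slot: Slot, handle: Handle): boolean {
  if (slot.current) return false
  slot.current = handle
  return true
}

// Clear the slot only if it still holds *our* handle, so a late
// cleanup cannot clobber a stream started by someone else.
function release(slot: Slot, handle: Handle): void {
  if (slot.current === handle) slot.current = null
}

const slot: Slot = { current: null }
const first: Handle = { aborted: false }
const second: Handle = { aborted: false }

const firstAcquired = tryAcquire(slot, first)
const secondAcquired = tryAcquire(slot, second)
release(slot, second) // no-op: `second` never owned the slot
const stillOwnedByFirst = slot.current === first
release(slot, first)
const emptyAfterRelease = slot.current === null
```

The identity comparison is what makes the cleanup safe to run unconditionally; where the hook chooses to call it is a separate decision that the two sides of the diff make differently.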
@@ -0,0 +1,95 @@
import { useQuery, useQueryClient } from '@tanstack/react-query'
import { useEffect } from 'react'
import { useAgentServerUrl } from '@/lib/browseros/useBrowserOSProviders'

export interface AgentOverview {
  agentId: string
  status: 'working' | 'idle' | 'error' | 'unknown'
  latestMessage: string | null
  latestMessageAt: number | null
  activitySummary: string | null
  currentTool: string | null
  totalCostUsd: number
  sessionCount: number
}

export interface DashboardResponse {
  agents: AgentOverview[]
  summary: {
    totalAgents: number
    totalCostUsd: number
  }
}

interface StatusEvent {
  agentId: string
  status: AgentOverview['status']
  currentTool: string | null
  error: string | null
  timestamp: number
}

const DASHBOARD_QUERY_KEY = ['claw', 'dashboard']

export function useAgentDashboard(enabled: boolean) {
  const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
  const queryClient = useQueryClient()
  const ready = enabled && Boolean(baseUrl) && !urlLoading

  // Initial data load + periodic refresh as fallback
  const query = useQuery<DashboardResponse>({
    queryKey: [...DASHBOARD_QUERY_KEY, baseUrl],
    queryFn: async () => {
      const url = new URL('/claw/dashboard', baseUrl as string)
      const response = await fetch(url.toString())
      if (!response.ok) throw new Error('Failed to fetch dashboard')
      return response.json()
    },
    enabled: ready,
  })

  // SSE subscription for real-time status patches
  useEffect(() => {
    if (!ready || !baseUrl) return

    const streamUrl = new URL('/claw/dashboard/stream', baseUrl)
    const eventSource = new EventSource(streamUrl.toString())

    eventSource.addEventListener('snapshot', (event) => {
      try {
        const dashboard = JSON.parse(event.data) as DashboardResponse
        queryClient.setQueryData([...DASHBOARD_QUERY_KEY, baseUrl], dashboard)
      } catch {}
    })

    eventSource.addEventListener('status', (event) => {
      try {
        const status = JSON.parse(event.data) as StatusEvent
        queryClient.setQueryData<DashboardResponse>(
          [...DASHBOARD_QUERY_KEY, baseUrl],
          (prev) => {
            if (!prev) return prev
            return {
              ...prev,
              agents: prev.agents.map((agent) =>
                agent.agentId === status.agentId
                  ? {
                      ...agent,
                      status: status.status,
                      currentTool: status.currentTool,
                    }
                  : agent,
              ),
            }
          },
        )
      } catch {}
    })

    return () => {
      eventSource.close()
    }
  }, [ready, baseUrl, queryClient])

  return query
}
|
||||
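The `status` listener patches one agent inside the cached dashboard without mutating the previous value. A sketch of that updater as a standalone pure function, assuming trimmed-down local types (`Overview`, `Dashboard`, `StatusPatch` are hypothetical and carry only the fields the patch touches):

```typescript
// Hypothetical, trimmed-down shapes; the real interfaces have more fields.
interface Overview {
  agentId: string
  status: string
  currentTool: string | null
}
interface Dashboard {
  agents: Overview[]
}
interface StatusPatch {
  agentId: string
  status: string
  currentTool: string | null
}

function applyStatusPatch(
  prev: Dashboard | undefined,
  patch: StatusPatch,
): Dashboard | undefined {
  // Nothing cached yet: leave the cache untouched.
  if (!prev) return prev
  return {
    ...prev,
    // Copy only the matching agent; every other entry keeps its identity.
    agents: prev.agents.map((agent) =>
      agent.agentId === patch.agentId
        ? { ...agent, status: patch.status, currentTool: patch.currentTool }
        : agent,
    ),
  }
}

const before: Dashboard = {
  agents: [
    { agentId: 'main', status: 'idle', currentTool: null },
    { agentId: 'oc-1', status: 'idle', currentTool: null },
  ],
}
const after = applyStatusPatch(before, {
  agentId: 'main',
  status: 'working',
  currentTool: 'browser',
})
```

Returning a new object for the patched agent and reusing the others by reference means consumers that compare by identity re-render only for the agent that changed.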
@@ -2,75 +2,67 @@ import { Loader2 } from 'lucide-react'
|
||||
import { type FC, useMemo } from 'react'
|
||||
import { AgentRowCard } from './AgentRowCard'
|
||||
import { AgentsEmptyState } from './AgentsEmptyState'
|
||||
import type {
|
||||
HarnessAdapterDescriptor,
|
||||
HarnessAgent,
|
||||
HarnessAgentAdapter,
|
||||
} from './agent-harness-types'
|
||||
import type {
|
||||
AgentAdapterHealth,
|
||||
AgentRowData,
|
||||
} from './agent-row/agent-row.types'
|
||||
import { compareAgentsByPinThenRecency } from './agents-list-order'
|
||||
import type { HarnessAgent, HarnessAgentAdapter } from './agent-harness-types'
|
||||
import type { AgentListItem } from './agents-page-types'
|
||||
import type { AgentLiveness } from './LivenessDot'
|
||||
|
||||
interface AgentListProps {
|
||||
agents: AgentListItem[]
|
||||
/** Optional per-agent activity metadata, keyed by `agentId`. */
|
||||
/**
|
||||
* Optional per-agent activity metadata. Keyed by `agentId`. Missing
|
||||
* entries fall back to status='unknown' / lastUsedAt=null and the
|
||||
* row renders an "unknown" dot. The server will populate this once
|
||||
* the activity tracker ships; the page works without it.
|
||||
*/
|
||||
activity?: Record<
|
||||
string,
|
||||
{ status: AgentLiveness; lastUsedAt: number | null }
|
||||
>
|
||||
/** Lookup table from harness id → enriched agent record. */
|
||||
/**
|
||||
* Lookup table from harness agent id → adapter + reasoning effort,
|
||||
* sourced from `useHarnessAgents`. Lets the row card render the
|
||||
* correct adapter icon and chips for harness agents (legacy
|
||||
* /claw/agents entries fall back to inferring from `runtimeLabel`).
|
||||
*/
|
||||
harnessAgentLookup?: Map<string, HarnessAgent>
|
||||
/** Adapter catalog (carries per-adapter health). */
|
||||
adapters: HarnessAdapterDescriptor[]
|
||||
loading: boolean
|
||||
deletingAgentKey: string | null
|
||||
onCreateAgent: () => void
|
||||
onDeleteAgent: (agent: AgentListItem) => void
|
||||
onPinToggle: (agent: AgentListItem, next: boolean) => void
|
||||
}
|
||||
|
||||
export const AgentList: FC<AgentListProps> = ({
|
||||
agents,
|
||||
activity,
|
||||
harnessAgentLookup,
|
||||
adapters,
|
||||
loading,
|
||||
deletingAgentKey,
|
||||
onCreateAgent,
|
||||
onDeleteAgent,
|
||||
onPinToggle,
|
||||
}) => {
|
||||
const adapterHealth = useMemo(() => {
|
||||
const map = new Map<HarnessAgentAdapter, AgentAdapterHealth>()
|
||||
for (const adapter of adapters) {
|
||||
if (adapter.health) {
|
||||
map.set(adapter.id, {
|
||||
healthy: adapter.health.healthy,
|
||||
reason: adapter.health.reason,
|
||||
})
|
||||
}
|
||||
}
|
||||
return map
|
||||
}, [adapters])
|
||||
|
||||
// Sort by recency: most recently used first; never-used agents drop
|
||||
// to the bottom in id-stable order so the list doesn't reshuffle on
|
||||
// every refresh. The pinned exception is the gateway's `main` agent
|
||||
// when it's never been touched — keep it at the top so a fresh
|
||||
// install has an obvious starting point.
|
||||
const ordered = useMemo(() => {
|
||||
const withMeta = agents.map((agent) => {
|
||||
const harness = harnessAgentLookup?.get(agent.agentId)
|
||||
return {
|
||||
agent,
|
||||
id: agent.agentId,
|
||||
pinned: harness?.pinned ?? false,
|
||||
lastUsedAt: activity?.[agent.agentId]?.lastUsedAt ?? null,
|
||||
}
|
||||
const withScore = agents.map((agent) => {
|
||||
const lastUsedAt = activity?.[agent.agentId]?.lastUsedAt ?? null
|
||||
return { agent, lastUsedAt }
|
||||
})
|
||||
return withMeta
|
||||
.sort(compareAgentsByPinThenRecency)
|
||||
return withScore
|
||||
.sort((a, b) => {
|
||||
const aPinned = a.agent.agentId === 'main' && a.lastUsedAt === null
|
||||
const bPinned = b.agent.agentId === 'main' && b.lastUsedAt === null
|
||||
if (aPinned && !bPinned) return -1
|
||||
if (!aPinned && bPinned) return 1
|
||||
const aValue = a.lastUsedAt ?? -Infinity
|
||||
const bValue = b.lastUsedAt ?? -Infinity
|
||||
if (aValue !== bValue) return bValue - aValue
|
||||
return a.agent.agentId.localeCompare(b.agent.agentId)
|
||||
})
|
||||
.map((entry) => entry.agent)
|
||||
}, [activity, agents, harnessAgentLookup])
|
||||
}, [activity, agents])
|
||||
|
||||
if (loading && agents.length === 0) {
|
||||
return (
|
||||
@@ -88,23 +80,18 @@ export const AgentList: FC<AgentListProps> = ({
|
||||
<div className="grid gap-3">
|
||||
{ordered.map((agent) => {
|
||||
const harness = harnessAgentLookup?.get(agent.agentId)
|
||||
const adapter: HarnessAgentAdapter | 'unknown' =
|
||||
const adapter: HarnessAgentAdapter | undefined =
|
||||
harness?.adapter ?? inferAdapterFromLabel(agent.runtimeLabel)
|
||||
const data = buildRowData({
|
||||
agent,
|
||||
adapter,
|
||||
harness,
|
||||
activity: activity?.[agent.agentId],
|
||||
adapterHealth:
|
||||
adapterHealth.get(adapter as HarnessAgentAdapter) ?? null,
|
||||
})
|
||||
return (
|
||||
<AgentRowCard
|
||||
key={agent.key}
|
||||
data={data}
|
||||
deleting={deletingAgentKey === agent.key}
|
||||
agent={agent}
|
||||
status={activity?.[agent.agentId]?.status}
|
||||
lastUsedAt={activity?.[agent.agentId]?.lastUsedAt}
|
||||
adapter={adapter}
|
||||
reasoningEffort={harness?.reasoningEffort ?? null}
|
||||
onDelete={onDeleteAgent}
|
||||
onPinToggle={onPinToggle}
|
||||
deleting={deletingAgentKey === agent.key}
|
||||
/>
|
||||
)
|
||||
})}
|
||||
@@ -112,53 +99,10 @@ export const AgentList: FC<AgentListProps> = ({
|
||||
)
|
||||
}
|
||||
|
||||
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | 'unknown' {
|
||||
function inferAdapterFromLabel(label: string): HarnessAgentAdapter | undefined {
|
||||
const lower = label?.toLowerCase()
|
||||
if (lower === 'claude code') return 'claude'
|
||||
if (lower === 'codex') return 'codex'
|
||||
if (lower === 'openclaw') return 'openclaw'
|
||||
return 'unknown'
|
||||
}
|
||||
|
||||
const ZERO_BUCKETS = (): number[] => Array.from({ length: 14 }, () => 0)
|
||||
|
||||
function buildRowData(input: {
|
||||
agent: AgentListItem
|
||||
adapter: HarnessAgentAdapter | 'unknown'
|
||||
harness: HarnessAgent | undefined
|
||||
activity: { status: AgentLiveness; lastUsedAt: number | null } | undefined
|
||||
adapterHealth: AgentAdapterHealth | null
|
||||
}): AgentRowData {
|
||||
const { agent, adapter, harness, activity, adapterHealth } = input
|
||||
return {
|
||||
agent,
|
||||
adapter,
|
||||
modelLabel: deriveModelLabel(agent, harness),
|
||||
reasoningEffort: harness?.reasoningEffort ?? null,
|
||||
status: activity?.status ?? 'unknown',
|
||||
lastUsedAt: activity?.lastUsedAt ?? harness?.lastUsedAt ?? null,
|
||||
pinned: harness?.pinned ?? false,
|
||||
cwd: harness?.cwd ?? null,
|
||||
lastUserMessage: harness?.lastUserMessage ?? null,
|
||||
tokens: harness?.tokens ?? null,
|
||||
turnsByDay: harness?.turnsByDay ?? ZERO_BUCKETS(),
|
||||
failedByDay: harness?.failedByDay ?? ZERO_BUCKETS(),
|
||||
lastError: harness?.lastError ?? null,
|
||||
lastErrorAt: harness?.lastErrorAt ?? null,
|
||||
activeTurnId: harness?.activeTurnId ?? null,
|
||||
adapterHealth,
|
||||
}
|
||||
}
|
||||
|
||||
function deriveModelLabel(
|
||||
agent: AgentListItem,
|
||||
harness: HarnessAgent | undefined,
|
||||
): string | null {
|
||||
// Prefer the agent rail's modelLabel when meaningful; harness's
|
||||
// modelId is a stable identifier but the rail's `modelLabel`
|
||||
// already maps to a friendly display string.
|
||||
if (agent.modelLabel && agent.modelLabel !== 'default') {
|
||||
return agent.modelLabel
|
||||
}
|
||||
return harness?.modelId ?? null
|
||||
return undefined
|
||||
}
|
||||
|
||||
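One side of this hunk builds a per-adapter health lookup with `useMemo` and reads it with `?? null`. A sketch of that lookup outside React, assuming hypothetical `Descriptor` and `Health` shapes (the real `HarnessAdapterDescriptor` is not shown in this diff):

```typescript
// Adapter ids as they appear in inferAdapterFromLabel.
type AdapterId = 'claude' | 'codex' | 'openclaw'

// Hypothetical shapes for illustration only.
interface Health {
  healthy: boolean
  reason?: string
}
interface Descriptor {
  id: AdapterId
  health?: Health | null
}

function indexAdapterHealth(adapters: Descriptor[]): Map<AdapterId, Health> {
  const map = new Map<AdapterId, Health>()
  for (const adapter of adapters) {
    // Adapters without a health report are simply left out of the map.
    if (adapter.health) {
      map.set(adapter.id, {
        healthy: adapter.health.healthy,
        reason: adapter.health.reason,
      })
    }
  }
  return map
}

const healthById = indexAdapterHealth([
  { id: 'claude', health: { healthy: true } },
  { id: 'codex' },
  { id: 'openclaw', health: { healthy: false, reason: 'gateway down' } },
])

// Missing entries normalise to null at the call site.
const codexHealth = healthById.get('codex') ?? null
const openclawHealth = healthById.get('openclaw') ?? null
```

Building the map once per adapter list keeps each row's lookup constant-time instead of scanning the catalog for every agent.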
@@ -1,99 +1,270 @@
|
||||
import {
|
||||
Copy,
|
||||
Loader2,
|
||||
MessageSquare,
|
||||
MoreHorizontal,
|
||||
Pencil,
|
||||
RotateCcw,
|
||||
Trash2,
|
||||
} from 'lucide-react'
|
||||
import type { FC } from 'react'
|
||||
import { useNavigate } from 'react-router'
|
||||
import { toast } from 'sonner'
|
||||
import { Badge } from '@/components/ui/badge'
|
||||
import { Button } from '@/components/ui/button'
|
||||
import {
|
||||
DropdownMenu,
|
||||
DropdownMenuContent,
|
||||
DropdownMenuItem,
|
||||
DropdownMenuSeparator,
|
||||
DropdownMenuTrigger,
|
||||
} from '@/components/ui/dropdown-menu'
|
||||
import {
|
||||
Tooltip,
|
||||
TooltipContent,
|
||||
TooltipProvider,
|
||||
TooltipTrigger,
|
||||
} from '@/components/ui/tooltip'
|
||||
import { cn } from '@/lib/utils'
|
||||
import { AgentActions } from './agent-row/AgentActions'
|
||||
import { AgentErrorPanel } from './agent-row/AgentErrorPanel'
|
||||
import { AgentLastMessage } from './agent-row/AgentLastMessage'
|
||||
import { AgentMetaRow } from './agent-row/AgentMetaRow'
|
||||
import { AgentSummaryChips } from './agent-row/AgentSummaryChips'
|
||||
import { AgentTile } from './agent-row/AgentTile'
|
||||
import { AgentTitleRow } from './agent-row/AgentTitleRow'
|
||||
import type {
|
||||
AgentRowCallbacks,
|
||||
AgentRowData,
|
||||
} from './agent-row/agent-row.types'
|
||||
import { AdapterIcon, adapterLabel } from './AdapterIcon'
|
||||
import {
|
||||
canDelete as canDeleteAgent,
|
||||
canRename as canRenameAgent,
|
||||
displayName,
|
||||
formatRelativeTime,
|
||||
workspaceLabel,
|
||||
} from './agent-display.helpers'
|
||||
import type { HarnessAgentAdapter } from './agent-harness-types'
|
||||
import type { AgentListItem } from './agents-page-types'
|
||||
import { type AgentLiveness, LivenessDot } from './LivenessDot'
|
||||
|
||||
interface AgentRowCardProps extends AgentRowCallbacks {
|
||||
data: AgentRowData
|
||||
/** Whether THIS agent is mid-delete; renders a spinner in the menu. */
|
||||
interface AgentRowCardProps {
|
||||
agent: AgentListItem
|
||||
/**
|
||||
* Per-agent extras the listing surface provides on top of the
|
||||
* minimal `AgentListItem` shape. `lastUsedAt` survives server
|
||||
* restart (sourced from acpx session record); `status` is in-memory
|
||||
* server-side.
|
||||
*/
|
||||
status?: AgentLiveness
|
||||
lastUsedAt?: number | null
|
||||
/** Adapter the agent belongs to. Drives icon + label. */
|
||||
adapter?: HarnessAgentAdapter
|
||||
/** Reasoning effort chip (claude/codex/openclaw catalog). */
|
||||
reasoningEffort?: string | null
|
||||
/** Modeled directly off the inbound delete handler so the parent owns the dialog. */
|
||||
onDelete: (agent: AgentListItem) => void
|
||||
/** Whether THIS agent is mid-delete; renders a spinner in place of the trash icon. */
|
||||
deleting?: boolean
|
||||
}
|
||||
|
||||
/**
|
||||
* Composition shell for the agent rail. Owns no state; sub-components
|
||||
* each handle their own micro-state (error-panel collapse, etc.) and
|
||||
* emit callbacks (delete, pin/unpin) for the page to act on.
|
||||
*
|
||||
* The whole card carries state — not just the tile — so the row's
|
||||
* border subtly tells the user what's going on at a glance:
|
||||
* working → accent-orange border with a soft glow
|
||||
* error → destructive border
|
||||
* idle → muted border, lifts on hover
|
||||
*/
|
||||
export const AgentRowCard: FC<AgentRowCardProps> = ({
|
||||
data,
|
||||
deleting,
|
||||
agent,
|
||||
status = 'unknown',
|
||||
lastUsedAt,
|
||||
adapter,
|
||||
reasoningEffort,
|
||||
onDelete,
|
||||
onPinToggle,
|
||||
deleting,
|
||||
}) => {
|
||||
const navigate = useNavigate()
|
||||
const adapterId = adapter ?? inferAdapterFromListItem(agent)
|
||||
const workspace = workspaceLabel(agent)
|
||||
const lastUsedLabel = formatRelativeTime(lastUsedAt ?? null)
|
||||
const allowDelete = canDeleteAgent(agent)
|
||||
const allowRename = canRenameAgent(agent)
|
||||
|
||||
const handleChat = () => navigate(`/agents/${agent.agentId}`)
|
||||
const handleCopyId = async () => {
|
||||
try {
|
||||
await navigator.clipboard.writeText(agent.agentId)
|
||||
toast.success('Agent id copied')
|
||||
} catch {
|
||||
toast.error('Could not copy agent id')
|
||||
}
|
||||
}
|
||||
|
||||
return (
|
||||
<div
|
||||
className={cn(
|
||||
// Layout-stable hover. No translate, no shadow change — both
|
||||
// visibly perturb neighbouring rows. Only the border tint
|
||||
// shifts on hover, and the rail's vertical rhythm stays
|
||||
// exactly the same in every state.
|
||||
'group rounded-xl border bg-card p-4 shadow-sm transition-colors',
|
||||
data.status === 'working'
|
||||
? 'border-[var(--accent-orange)]/40'
|
||||
: data.status === 'error'
|
||||
? 'border-destructive/40'
|
||||
: 'border-border hover:border-[var(--accent-orange)]/30',
|
||||
'group rounded-xl border border-border bg-card p-4 shadow-sm transition-all',
|
||||
'hover:border-[var(--accent-orange)]/50 hover:shadow-sm',
|
||||
)}
|
||||
>
|
||||
<div className="flex items-start gap-4">
|
||||
<AgentTile
|
||||
adapter={data.adapter}
|
||||
status={data.status}
|
||||
lastUsedAt={data.lastUsedAt}
|
||||
/>
|
||||
|
||||
<div className="min-w-0 flex-1">
|
||||
<AgentTitleRow
|
||||
agent={data.agent}
|
||||
status={data.status}
|
||||
pinned={data.pinned}
|
||||
turnsByDay={data.turnsByDay}
|
||||
failedByDay={data.failedByDay}
|
||||
onPinToggle={(next) => onPinToggle(data.agent, next)}
|
||||
{/* Adapter tile + liveness dot in the corner. */}
|
||||
<div className="relative shrink-0">
|
||||
<div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
|
||||
<AdapterIcon adapter={adapterId} className="h-6 w-6" />
|
||||
</div>
|
||||
<LivenessDot
|
||||
status={status}
|
||||
detail={livenessDetail(status, lastUsedAt)}
|
||||
className="absolute -right-0.5 -bottom-0.5"
|
||||
/>
|
||||
|
||||
<AgentSummaryChips
|
||||
adapter={data.adapter}
|
||||
modelLabel={data.modelLabel}
|
||||
reasoningEffort={data.reasoningEffort}
|
||||
adapterHealth={data.adapterHealth}
|
||||
/>
|
||||
|
||||
<AgentLastMessage message={data.lastUserMessage} />
|
||||
|
||||
<AgentMetaRow lastUsedAt={data.lastUsedAt} tokens={data.tokens} />
|
||||
|
||||
{data.status === 'error' && data.lastError && (
|
||||
<AgentErrorPanel
|
||||
agentId={data.agent.agentId}
|
||||
message={data.lastError}
|
||||
errorAt={data.lastErrorAt}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<AgentActions
|
||||
agent={data.agent}
|
||||
activeTurnId={data.activeTurnId}
|
||||
deleting={deleting}
|
||||
onDelete={onDelete}
|
||||
/>
|
||||
<div className="min-w-0 flex-1">
|
||||
<div className="mb-1 flex items-center gap-2">
|
||||
<span className="truncate font-semibold">{displayName(agent)}</span>
|
||||
{status === 'working' && (
|
||||
<Badge
|
||||
variant="secondary"
|
||||
className="bg-amber-50 text-amber-900 hover:bg-amber-50"
|
||||
>
|
||||
Working
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'asleep' && (
|
||||
<Badge variant="outline" className="text-muted-foreground">
|
||||
Asleep
|
||||
</Badge>
|
||||
)}
|
||||
{status === 'error' && (
|
||||
<Badge variant="destructive">Attention</Badge>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div className="mb-2 flex flex-wrap items-center gap-1.5 text-xs">
|
||||
<Badge variant="secondary" className="font-normal">
|
||||
{adapterLabel(adapterId)}
|
||||
</Badge>
|
||||
{agent.modelLabel && agent.modelLabel !== 'default' && (
|
||||
<Badge variant="outline" className="font-normal">
|
||||
{agent.modelLabel}
|
||||
</Badge>
|
||||
)}
|
||||
{reasoningEffort && reasoningEffort !== 'medium' && (
|
||||
<Badge variant="outline" className="font-normal">
|
||||
{reasoningEffort}
|
||||
</Badge>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div className="flex flex-wrap items-center gap-2 text-muted-foreground text-xs">
|
||||
<span>Last used {lastUsedLabel}</span>
|
||||
{workspace && (
|
||||
<>
|
||||
<span aria-hidden>•</span>
|
||||
<span className="truncate font-mono" title={workspace}>
|
||||
{workspace}
|
||||
</span>
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="flex shrink-0 items-center gap-2">
|
||||
<Button variant="outline" size="sm" onClick={handleChat}>
|
||||
<MessageSquare className="mr-1.5 h-3 w-3" />
|
||||
Chat
|
||||
</Button>
|
||||
<DropdownMenu>
|
||||
<DropdownMenuTrigger asChild>
|
||||
<Button
|
||||
variant="ghost"
|
||||
size="icon"
|
||||
aria-label={`More actions for ${displayName(agent)}`}
|
||||
className="h-8 w-8"
|
||||
>
|
||||
<MoreHorizontal className="h-4 w-4" />
|
||||
</Button>
|
||||
</DropdownMenuTrigger>
|
||||
<DropdownMenuContent align="end" className="w-44">
|
||||
              <DropdownMenuItem onSelect={() => void handleCopyId()}>
                <Copy className="mr-2 h-3.5 w-3.5" />
                Copy id
              </DropdownMenuItem>
              <RenameMenuItem disabled={!allowRename} />
              <ResetHistoryMenuItem />
              <DropdownMenuSeparator />
              <DropdownMenuItem
                onSelect={() => onDelete(agent)}
                disabled={!allowDelete || deleting}
                className="text-destructive focus:text-destructive"
              >
                {deleting ? (
                  <Loader2 className="mr-2 h-3.5 w-3.5 animate-spin" />
                ) : (
                  <Trash2 className="mr-2 h-3.5 w-3.5" />
                )}
                Delete
              </DropdownMenuItem>
            </DropdownMenuContent>
          </DropdownMenu>
        </div>
      </div>
    </div>
  )
}

const RenameMenuItem: FC<{ disabled: boolean }> = ({ disabled }) => {
  const item = (
    <DropdownMenuItem disabled className="text-muted-foreground">
      <Pencil className="mr-2 h-3.5 w-3.5" />
      Rename
    </DropdownMenuItem>
  )
  if (!disabled) return item
  // Disabled but with a hint so users know it's coming, not broken.
  return (
    <TooltipProvider delayDuration={300}>
      <Tooltip>
        <TooltipTrigger asChild>
          <span className="block w-full">{item}</span>
        </TooltipTrigger>
        <TooltipContent side="left" className="text-xs">
          Rename coming soon
        </TooltipContent>
      </Tooltip>
    </TooltipProvider>
  )
}

const ResetHistoryMenuItem: FC = () => {
  const item = (
    <DropdownMenuItem disabled className="text-muted-foreground">
      <RotateCcw className="mr-2 h-3.5 w-3.5" />
      Reset history
    </DropdownMenuItem>
  )
  return (
    <TooltipProvider delayDuration={300}>
      <Tooltip>
        <TooltipTrigger asChild>
          <span className="block w-full">{item}</span>
        </TooltipTrigger>
        <TooltipContent side="left" className="text-xs">
          Reset history coming soon
        </TooltipContent>
      </Tooltip>
    </TooltipProvider>
  )
}

function inferAdapterFromListItem(
  agent: AgentListItem,
): HarnessAgentAdapter | 'unknown' {
  const label = agent.runtimeLabel?.toLowerCase()
  if (label?.includes('claude')) return 'claude'
  if (label?.includes('codex')) return 'codex'
  if (label?.includes('openclaw')) return 'openclaw'
  return 'unknown'
}

function livenessDetail(
  status: AgentLiveness,
  lastUsedAt: number | null | undefined,
): string | undefined {
  if (lastUsedAt == null) return undefined
  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
  if (status === 'asleep') {
    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
    const hr = Math.floor(diffMin / 60)
    return `Asleep — quiet for ${hr} hr`
  }
  if (status === 'working') return 'Working on a turn'
  if (status === 'error') return 'Attention — last turn failed'
  return undefined
}
@@ -44,7 +44,6 @@ import {
  useCreateHarnessAgent,
  useDeleteHarnessAgent,
  useHarnessAgents,
  useUpdateHarnessAgent,
} from './useAgents'
import { useOpenClawAgents, useOpenClawMutations } from './useOpenClaw'

@@ -77,7 +76,6 @@ export const AgentsPage: FC = () => {
  } = useOpenClawAgents(openClawAgentsEnabled)
  const createHarnessAgent = useCreateHarnessAgent()
  const deleteHarnessAgent = useDeleteHarnessAgent()
  const updateHarnessAgent = useUpdateHarnessAgent()
  const {
    setupOpenClaw,
    createAgent: createOpenClawAgent,
@@ -344,24 +342,12 @@ export const AgentsPage: FC = () => {
        agents={agentListItems}
        activity={agentActivity}
        harnessAgentLookup={harnessAgentLookup}
        adapters={adapters}
        loading={agentsLoading}
        deletingAgentKey={deletingAgent ? deletingAgentKey : null}
        onCreateAgent={() => setCreateOpen(true)}
        onDeleteAgent={(agent) => {
          void handleDelete(agent)
        }}
        onPinToggle={(agent, next) => {
          // Optimistic mutation; harness-only — gateway-original
          // OpenClaw entries are gated server-side via the harness
          // backfill, so we only fire when the row maps to a
          // harness agent record.
          if (!harnessAgentLookup.has(agent.agentId)) return
          updateHarnessAgent.mutate({
            agentId: agent.agentId,
            patch: { pinned: next },
          })
        }}
      />

      <SetupOpenClawDialog
@@ -1,5 +1,4 @@
import type { AgentListItem } from './agents-page-types'
import type { AgentLiveness } from './LivenessDot'

/**
 * Display rules for the redesigned agent rows. Pure helpers — no React,
@@ -83,25 +82,3 @@ export function formatRelativeTime(epochMs: number | null): string {
  const d = Math.floor(diff / ONE_DAY)
  return d === 1 ? '1 day ago' : `${d} days ago`
}

/**
 * Tooltip-friendly description of a row's current liveness state.
 * Returns `undefined` when the state has nothing extra to add (e.g.
 * `unknown` with no timestamp).
 */
export function livenessDetail(
  status: AgentLiveness,
  lastUsedAt: number | null | undefined,
): string | undefined {
  if (lastUsedAt == null) return undefined
  const diffMin = Math.floor((Date.now() - lastUsedAt) / 60_000)
  if (status === 'idle') return `Idle for ${Math.max(0, diffMin)} min`
  if (status === 'asleep') {
    if (diffMin < 60) return `Asleep — quiet for ${diffMin} min`
    const hr = Math.floor(diffMin / 60)
    return `Asleep — quiet for ${hr} hr`
  }
  if (status === 'working') return 'Working on a turn'
  if (status === 'error') return 'Attention — last turn failed'
  return undefined
}
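The elapsed-time arithmetic in `livenessDetail` is easier to check when the clock is passed in rather than read from `Date.now()`. The following is a minimal sketch, not code from this change: `quietDuration`, `MS_PER_MINUTE`, and the `now` parameter are hypothetical names, and unlike the `asleep` branch above it clamps a negative difference to zero.

```typescript
const MS_PER_MINUTE = 60_000

// Hypothetical helper (not in the diff): formats the time since `lastUsedAt`
// with the same minute/hour bucketing as livenessDetail, but takes `now` as
// an argument so the result is deterministic.
function quietDuration(lastUsedAt: number, now: number): string {
  // Clamp so a timestamp slightly in the future reads as "0 min".
  const diffMin = Math.max(0, Math.floor((now - lastUsedAt) / MS_PER_MINUTE))
  if (diffMin < 60) return `${diffMin} min`
  return `${Math.floor(diffMin / 60)} hr`
}

const sampleNow = 1_700_000_000_000
console.log(quietDuration(sampleNow - 45 * MS_PER_MINUTE, sampleNow)) // "45 min"
console.log(quietDuration(sampleNow - 150 * MS_PER_MINUTE, sampleNow)) // "2 hr"
```

Injecting the clock is one possible way to unit-test the copy without mocking `Date`.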
@@ -56,43 +56,6 @@ export interface HarnessAgent {
   * agents. Drives the recency sort and the "Last used X min ago" copy.
   */
  lastUsedAt?: number | null
  /** Pinned agents float to the top of the list. Defaults to `false`. */
  pinned?: boolean
  /** First non-blank line of the most recent user message; null if none. */
  lastUserMessage?: string | null
  /** Working directory the agent runs in; null when no session record yet. */
  cwd?: string | null
  /** Cumulative + 7-day rolling token usage; null when no record. */
  tokens?: {
    last7d: { input: number; output: number; requestCount: number }
    cumulative: { input: number; output: number }
  } | null
  turnsByDay?: number[]
  failedByDay?: number[]
  lastError?: string | null
  lastErrorAt?: number | null
  /** When non-null, an in-flight turn this row can be resumed from. */
  activeTurnId?: string | null
  /** Persistent FIFO queue of messages waiting for this agent. */
  queue?: HarnessQueuedMessage[]
}

export interface HarnessQueuedMessageAttachment {
  mediaType: string
  data: string
}

export interface HarnessQueuedMessage {
  id: string
  createdAt: number
  message: string
  attachments?: ReadonlyArray<HarnessQueuedMessageAttachment>
}

export interface HarnessAdapterHealth {
  healthy: boolean
  reason?: string
  checkedAt: number
}

export interface HarnessAdapterDescriptor {
@@ -103,7 +66,6 @@ export interface HarnessAdapterDescriptor {
  modelControl: 'runtime-supported' | 'best-effort'
  models: Array<{ id: string; label: string; recommended?: boolean }>
  reasoningEfforts: Array<{ id: string; label: string; recommended?: boolean }>
  health?: HarnessAdapterHealth
}

export interface CreateHarnessAgentInput {
@@ -1,160 +0,0 @@
import {
  Copy,
  Loader2,
  MessageSquare,
  MoreHorizontal,
  Pencil,
  RotateCcw,
  Trash2,
} from 'lucide-react'
import type { FC } from 'react'
import { useNavigate } from 'react-router'
import { toast } from 'sonner'
import { Button } from '@/components/ui/button'
import {
  DropdownMenu,
  DropdownMenuContent,
  DropdownMenuItem,
  DropdownMenuSeparator,
  DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu'
import {
  Tooltip,
  TooltipContent,
  TooltipProvider,
  TooltipTrigger,
} from '@/components/ui/tooltip'
import {
  canDelete as canDeleteAgent,
  canRename as canRenameAgent,
  displayName,
} from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'

interface AgentActionsProps {
  agent: AgentListItem
  activeTurnId: string | null
  deleting?: boolean
  onDelete: (agent: AgentListItem) => void
}

/**
 * Single primary CTA per row: `Resume` (filled, accent-orange, with a
 * pulsing dot) when an active turn exists; otherwise `Chat` (outline).
 * Both navigate to the same place — the chat hook auto-attaches via
 * `/chat/active` when there's a live turn — but the row signals which
 * action the user is actually taking.
 */
export const AgentActions: FC<AgentActionsProps> = ({
  agent,
  activeTurnId,
  deleting,
  onDelete,
}) => {
  const navigate = useNavigate()
  const allowDelete = canDeleteAgent(agent)
  const allowRename = canRenameAgent(agent)

  const handleChat = () => navigate(`/agents/${agent.agentId}`)
  const handleCopyId = async () => {
    try {
      await navigator.clipboard.writeText(agent.agentId)
      toast.success('Agent id copied')
    } catch {
      toast.error('Could not copy agent id')
    }
  }

  return (
    <div className="flex shrink-0 items-center gap-1.5">
      {activeTurnId ? (
        <Button
          variant="default"
          size="sm"
          onClick={handleChat}
          className="gap-2 bg-[var(--accent-orange)] text-white shadow-sm hover:bg-[var(--accent-orange)]/90"
        >
          <span className="relative flex size-2">
            <span className="absolute inline-flex h-full w-full animate-ping rounded-full bg-white/70 opacity-75" />
            <span className="relative inline-flex size-2 rounded-full bg-white" />
          </span>
          Resume
        </Button>
      ) : (
        <Button variant="outline" size="sm" onClick={handleChat}>
          <MessageSquare className="mr-1.5 size-3" />
          Chat
        </Button>
      )}
      <DropdownMenu>
        <DropdownMenuTrigger asChild>
          <Button
            variant="ghost"
            size="icon"
            aria-label={`More actions for ${displayName(agent)}`}
            className="size-8 text-muted-foreground hover:text-foreground"
          >
            <MoreHorizontal className="size-4" />
          </Button>
        </DropdownMenuTrigger>
        <DropdownMenuContent align="end" className="w-44">
          <DropdownMenuItem onSelect={() => void handleCopyId()}>
            <Copy className="mr-2 size-3.5" />
            Copy id
          </DropdownMenuItem>
          <ComingSoonItem
            icon={Pencil}
            label="Rename"
            disabled={!allowRename}
          />
          <ComingSoonItem icon={RotateCcw} label="Reset history" disabled />
          <DropdownMenuSeparator />
          <DropdownMenuItem
            onSelect={() => onDelete(agent)}
            disabled={!allowDelete || deleting}
            className="text-destructive focus:text-destructive"
          >
            {deleting ? (
              <Loader2 className="mr-2 size-3.5 animate-spin" />
            ) : (
              <Trash2 className="mr-2 size-3.5" />
            )}
            Delete
          </DropdownMenuItem>
        </DropdownMenuContent>
      </DropdownMenu>
    </div>
  )
}

interface ComingSoonItemProps {
  icon: typeof Pencil
  label: string
  disabled: boolean
}

const ComingSoonItem: FC<ComingSoonItemProps> = ({
  icon: Icon,
  label,
  disabled,
}) => {
  const item = (
    <DropdownMenuItem disabled className="text-muted-foreground">
      <Icon className="mr-2 size-3.5" />
      {label}
    </DropdownMenuItem>
  )
  if (!disabled) return item
  return (
    <TooltipProvider delayDuration={300}>
      <Tooltip>
        <TooltipTrigger asChild>
          <span className="block w-full">{item}</span>
        </TooltipTrigger>
        <TooltipContent side="left" className="text-xs">
          {label} coming soon
        </TooltipContent>
      </Tooltip>
    </TooltipProvider>
  )
}
@@ -1,96 +0,0 @@
import { AlertTriangle, ChevronDown } from 'lucide-react'
import { type FC, useEffect, useState } from 'react'
import { Button } from '@/components/ui/button'
import {
  Collapsible,
  CollapsibleContent,
  CollapsibleTrigger,
} from '@/components/ui/collapsible'
import {
  HoverCard,
  HoverCardContent,
  HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { truncate } from './agent-row.helpers'

interface AgentErrorPanelProps {
  agentId: string
  message: string
  errorAt: number | null
}

const STORAGE_PREFIX = 'agent-row:lastErrorSeenAt:'
const PREVIEW_CHARS = 200

export const AgentErrorPanel: FC<AgentErrorPanelProps> = ({
  agentId,
  message,
  errorAt,
}) => {
  const storageKey = `${STORAGE_PREFIX}${agentId}`
  // Open if we've never seen this `errorAt` for this agent. Once the
  // user collapses the panel, the effect below records `errorAt` as
  // seen so the panel doesn't re-open on every poll or reload.
  const [open, setOpen] = useState<boolean>(() => {
    if (typeof window === 'undefined' || !errorAt) return true
    const seen = Number(window.localStorage.getItem(storageKey) ?? 0)
    return !Number.isFinite(seen) || errorAt > seen
  })

  useEffect(() => {
    if (!open && errorAt && typeof window !== 'undefined') {
      window.localStorage.setItem(storageKey, String(errorAt))
    }
  }, [open, errorAt, storageKey])

  const preview = truncate(message, PREVIEW_CHARS)
  const truncated = preview.length < message.length

  return (
    <Collapsible open={open} onOpenChange={setOpen} className="mt-3">
      <div className="flex items-center justify-between rounded-md border border-destructive/30 bg-destructive/5 px-3 py-2">
        <div className="flex items-center gap-2 font-medium text-destructive text-xs">
          <AlertTriangle className="size-3.5" />
          Last error
        </div>
        <CollapsibleTrigger asChild>
          <Button
            variant="ghost"
            size="sm"
            className="h-6 px-2 text-muted-foreground"
          >
            <span className="text-xs">{open ? 'hide' : 'show'}</span>
            <ChevronDown
              className={cn(
                'ml-1 size-3 transition-transform',
                open && 'rotate-180',
              )}
            />
          </Button>
        </CollapsibleTrigger>
      </div>
      <CollapsibleContent>
        <div className="mt-1 rounded-md border-destructive/30 border-x border-b bg-destructive/5 px-3 pb-2 text-xs">
          {truncated ? (
            <HoverCard openDelay={300}>
              <HoverCardTrigger asChild>
                <span className="cursor-default font-mono text-foreground/80">
                  {preview}
                </span>
              </HoverCardTrigger>
              <HoverCardContent
                side="bottom"
                className="max-w-md whitespace-pre-wrap font-mono text-xs"
              >
                {message}
              </HoverCardContent>
            </HoverCard>
          ) : (
            <span className="font-mono text-foreground/80">{message}</span>
          )}
        </div>
      </CollapsibleContent>
    </Collapsible>
  )
}
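The "open unless already seen" rule in the `useState` initializer above can be exercised without React or a browser. This is a sketch under assumptions: `shouldOpenErrorPanel` is a hypothetical name, and `stored` stands in for whatever `localStorage.getItem` would return for the agent's key.

```typescript
// Hypothetical pure helper mirroring the initializer in AgentErrorPanel.
// `stored` is the raw localStorage value for this agent, or null if unset.
function shouldOpenErrorPanel(
  errorAt: number | null,
  stored: string | null,
): boolean {
  // No timestamp to compare against: default to open.
  if (!errorAt) return true
  const seen = Number(stored ?? 0)
  // A corrupt stored value (NaN) is treated as "never seen".
  return !Number.isFinite(seen) || errorAt > seen
}

console.log(shouldOpenErrorPanel(200, '100')) // true: newer error than last seen
console.log(shouldOpenErrorPanel(100, '100')) // false: this error was dismissed
console.log(shouldOpenErrorPanel(100, 'abc')) // true: unreadable marker
```

Keeping the comparison strict (`>`) means a dismissed error stays collapsed until a later `errorAt` arrives.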
@@ -1,35 +0,0 @@
import { Quote } from 'lucide-react'
import type { FC } from 'react'
import { firstNonBlankLine, truncate } from './agent-row.helpers'

interface AgentLastMessageProps {
  message: string | null
}

const PREVIEW_CHARS = 110

/**
 * Inline preview of the most recent user message. Renders as a quoted,
 * italic line so the row reads like a conversation snippet rather than
 * a label-and-value pair. No hover-card — opening the agent's chat is
 * the canonical way to read the full message.
 */
export const AgentLastMessage: FC<AgentLastMessageProps> = ({ message }) => {
  if (!message) {
    return (
      <p className="mt-1 text-muted-foreground/70 text-xs italic">
        No messages yet — start a chat
      </p>
    )
  }
  const preview = truncate(firstNonBlankLine(message), PREVIEW_CHARS)
  return (
    <p className="mt-1.5 flex items-start gap-1.5 text-foreground/85 text-sm italic leading-snug">
      <Quote
        className="mt-1 size-3 shrink-0 text-muted-foreground/60"
        aria-hidden
      />
      <span className="truncate">{preview}</span>
    </p>
  )
}
@@ -1,37 +0,0 @@
import type { FC } from 'react'
import { formatRelativeTime } from '../agent-display.helpers'
import { AgentTokenSummary } from './AgentTokenSummary'
import type { AgentTokenUsage } from './agent-row.types'

interface AgentMetaRowProps {
  lastUsedAt: number | null
  tokens: AgentTokenUsage | null
}

/**
 * Bottom-of-row meta line. Intentionally sparse — last activity time
 * and lifetime tokens. CWD is no longer surfaced here because the path
 * the server happens to be running from isn't actionable; if a future
 * surface needs the cwd (chat panel, debug view) it reads from the
 * listing payload directly.
 */
export const AgentMetaRow: FC<AgentMetaRowProps> = ({ lastUsedAt, tokens }) => {
  const lastUsedLabel = formatRelativeTime(lastUsedAt)
  const tokensTotal =
    (tokens?.cumulative.input ?? 0) + (tokens?.cumulative.output ?? 0)
  const showTokens = tokensTotal > 0

  return (
    <div className="mt-2 flex flex-wrap items-center gap-x-2 text-muted-foreground text-xs">
      <span>{lastUsedLabel}</span>
      {showTokens && (
        <>
          <span aria-hidden className="text-muted-foreground/50">
            ·
          </span>
          <AgentTokenSummary tokens={tokens} />
        </>
      )}
    </div>
  )
}
@@ -1,92 +0,0 @@
import type { FC } from 'react'
import {
  HoverCard,
  HoverCardContent,
  HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { formatLocalDate, ROW_BAR_COUNT } from './agent-row.helpers'

interface AgentSparklineProps {
  /** 14 entries, oldest → newest. Today's bucket is the last index. */
  turnsByDay: number[]
  /** Same length, same order. Failed turns counted separately. */
  failedByDay: number[]
  className?: string
}

const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18

export const AgentSparkline: FC<AgentSparklineProps> = ({
  turnsByDay,
  failedByDay,
  className,
}) => {
  if (turnsByDay.length === 0 || turnsByDay.every((n) => n === 0)) return null
  const max = Math.max(1, ...turnsByDay)

  return (
    <HoverCard openDelay={250}>
      <HoverCardTrigger asChild>
        <div
          role="img"
          aria-label={`Last ${ROW_BAR_COUNT} days of activity`}
          className={cn('flex h-5 items-end gap-px', className)}
        >
          {turnsByDay.map((count, idx) => {
            const ratio = count / max
            const height = Math.max(
              MIN_BAR_HEIGHT_PX,
              Math.round(ratio * MAX_BAR_HEIGHT_PX),
            )
            const isToday = idx === ROW_BAR_COUNT - 1
            const failed = failedByDay[idx] ?? 0
            return (
              <div
                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length sparkline buckets keyed by day position
                key={`bar-${idx}`}
                className={cn(
                  'w-1.5 rounded-sm',
                  count === 0
                    ? 'bg-muted-foreground/15'
                    : failed > 0
                      ? 'bg-destructive/50'
                      : 'bg-[var(--accent-orange)]/50',
                  isToday && 'ring-1 ring-foreground/30',
                )}
                style={{ height }}
              />
            )
          })}
        </div>
      </HoverCardTrigger>
      <HoverCardContent side="left" className="w-56 text-xs">
        <div className="mb-2 font-medium text-sm">Last 14 days</div>
        <ul className="space-y-0.5">
          {turnsByDay.map((count, idx) => {
            const failed = failedByDay[idx] ?? 0
            const dayLabel = formatLocalDate(idx)
            return (
              <li
                // biome-ignore lint/suspicious/noArrayIndexKey: fixed-length list keyed by day position
                key={`day-${idx}`}
                className="flex items-center justify-between text-muted-foreground"
              >
                <span>{dayLabel}</span>
                <span>
                  {count}
                  {failed > 0 && (
                    <span className="ml-1 text-destructive">
                      ({failed} failed)
                    </span>
                  )}
                </span>
              </li>
            )
          })}
        </ul>
      </HoverCardContent>
    </HoverCard>
  )
}
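The bar scaling in `AgentSparkline` normalizes each day against the busiest day and then clamps to a minimum height so empty days still draw a stub. A small sketch of just that arithmetic follows; `barHeights` is a hypothetical helper name, while the two constants are copied from the component.

```typescript
const MIN_BAR_HEIGHT_PX = 2
const MAX_BAR_HEIGHT_PX = 18

// Hypothetical helper: the per-bar height computation from AgentSparkline,
// lifted out of the JSX so it can be checked on its own.
function barHeights(turnsByDay: number[]): number[] {
  // Math.max(1, ...) avoids dividing by zero when every day is empty.
  const max = Math.max(1, ...turnsByDay)
  return turnsByDay.map((count) =>
    Math.max(MIN_BAR_HEIGHT_PX, Math.round((count / max) * MAX_BAR_HEIGHT_PX)),
  )
}

console.log(barHeights([0, 1, 9, 18])) // [2, 2, 9, 18]
```

Note that a day with very few turns and a day with none get the same 2px height; in the component they are told apart by colour, not size.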
@@ -1,71 +0,0 @@
import { TriangleAlert } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import {
  HoverCard,
  HoverCardContent,
  HoverCardTrigger,
} from '@/components/ui/hover-card'
import { cn } from '@/lib/utils'
import { adapterLabel } from '../AdapterIcon'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentAdapterHealth } from './agent-row.types'

interface AgentSummaryChipsProps {
  adapter: HarnessAgentAdapter | 'unknown'
  modelLabel: string | null
  reasoningEffort: string | null
  /** When unhealthy, the adapter label dims and a warning chip appears. */
  adapterHealth: AgentAdapterHealth | null
}

/**
 * Adapter / model / reasoning summary line. Always rendered (so OpenClaw
 * rows that fall back to defaults still expose what they're set up to do)
 * and surfaces adapter-health *only when unhealthy* — keeping the calm
 * default state silent and reserving visual noise for things the user
 * needs to act on.
 */
export const AgentSummaryChips: FC<AgentSummaryChipsProps> = ({
  adapter,
  modelLabel,
  reasoningEffort,
  adapterHealth,
}) => {
  const parts = [adapterLabel(adapter)]
  if (modelLabel) parts.push(modelLabel)
  if (reasoningEffort) parts.push(reasoningEffort)
  const unhealthy = adapterHealth?.healthy === false
  return (
    <div
      className={cn(
        'flex items-center gap-1.5 text-muted-foreground text-xs',
        unhealthy && 'text-muted-foreground/70',
      )}
    >
      <span className="truncate">{parts.join(' · ')}</span>
      {unhealthy && adapterHealth && (
        <HoverCard openDelay={200}>
          <HoverCardTrigger asChild>
            <Badge
              variant="outline"
              className="h-5 cursor-default gap-1 border-amber-500/40 bg-amber-50 px-1.5 text-amber-900 hover:bg-amber-50"
            >
              <TriangleAlert className="size-2.5" />
              <span className="font-normal">Unavailable</span>
            </Badge>
          </HoverCardTrigger>
          <HoverCardContent side="right" className="w-72 text-sm">
            <div className="font-medium">
              {adapterLabel(adapter)} CLI not available
            </div>
            <div className="mt-1 text-muted-foreground text-xs">
              {adapterHealth.reason ??
                'Adapter binary missing on $PATH. Install it from the adapter docs to use this agent.'}
            </div>
          </HoverCardContent>
        </HoverCard>
      )}
    </div>
  )
}
@@ -1,37 +0,0 @@
import type { FC } from 'react'
import { cn } from '@/lib/utils'
import { AdapterIcon } from '../AdapterIcon'
import { livenessDetail } from '../agent-display.helpers'
import type { HarnessAgentAdapter } from '../agent-harness-types'
import { type AgentLiveness, LivenessDot } from '../LivenessDot'

export interface AgentTileProps {
  adapter: HarnessAgentAdapter | 'unknown'
  status: AgentLiveness
  lastUsedAt: number | null
}

/**
 * Adapter glyph + a single liveness dot. Adapter health is no longer
 * surfaced here — it lives as an inline pill inside `AgentSummaryChips`
 * so the user isn't asked to disambiguate two dots on the same tile.
 */
export const AgentTile: FC<AgentTileProps> = ({
  adapter,
  status,
  lastUsedAt,
}) => (
  <div className="relative shrink-0">
    <div className="flex h-12 w-12 items-center justify-center rounded-xl bg-muted text-muted-foreground">
      <AdapterIcon adapter={adapter} className="h-6 w-6" />
    </div>
    <LivenessDot
      status={status}
      detail={livenessDetail(status, lastUsedAt)}
      className={cn(
        'absolute -right-0.5 -bottom-0.5',
        status === 'working' && 'animate-pulse',
      )}
    />
  </div>
)
@@ -1,55 +0,0 @@
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { displayName } from '../agent-display.helpers'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'
import { AgentSparkline } from './AgentSparkline'
import { PinToggle } from './PinToggle'

interface AgentTitleRowProps {
  agent: AgentListItem
  status: AgentLiveness
  pinned: boolean
  turnsByDay: number[]
  failedByDay: number[]
  onPinToggle: (next: boolean) => void
}

/**
 * Title strip: name + status badge + (right-aligned) sparkline. The
 * pin toggle sits trailing the title so the title always flushes left
 * regardless of pin state — moving the star left of the title indents
 * the row's first line off-axis from the model/preview/meta lines
 * below it. When unpinned and not hovered, the toggle is only faded
 * out (opacity 0) and stays in layout; see `PinToggle`.
 */
export const AgentTitleRow: FC<AgentTitleRowProps> = ({
  agent,
  status,
  pinned,
  turnsByDay,
  failedByDay,
  onPinToggle,
}) => (
  <div className="mb-1 flex items-center gap-2">
    <span className="truncate font-semibold">{displayName(agent)}</span>
    {status === 'working' && (
      <Badge
        variant="secondary"
        className="bg-amber-50 text-amber-900 hover:bg-amber-50"
      >
        Working
      </Badge>
    )}
    {status === 'asleep' && (
      <Badge variant="outline" className="text-muted-foreground">
        Asleep
      </Badge>
    )}
    {status === 'error' && <Badge variant="destructive">Attention</Badge>}
    <PinToggle pinned={pinned} onToggle={onPinToggle} />
    <div className="ml-auto">
      <AgentSparkline turnsByDay={turnsByDay} failedByDay={failedByDay} />
    </div>
  </div>
)
@@ -1,63 +0,0 @@
import type { FC } from 'react'
import {
  HoverCard,
  HoverCardContent,
  HoverCardTrigger,
} from '@/components/ui/hover-card'
import { Progress } from '@/components/ui/progress'
import { formatTokens } from './agent-row.helpers'
import type { AgentTokenUsage } from './agent-row.types'

interface AgentTokenSummaryProps {
  tokens: AgentTokenUsage | null
}

/**
 * Inline token total + a HoverCard breakdown. Surfaces lifetime tokens
 * (the only window we can compute reliably from the session record).
 * Per-window stats land in a follow-up once the activity ledger ships.
 */
export const AgentTokenSummary: FC<AgentTokenSummaryProps> = ({ tokens }) => {
  if (!tokens) return null
  const { input, output } = tokens.cumulative
  const total = input + output
  if (total === 0) return null
  const inputPct = (input / total) * 100

  return (
    <HoverCard openDelay={200}>
      <HoverCardTrigger asChild>
        <span className="cursor-default text-muted-foreground tabular-nums transition-colors hover:text-foreground">
          {formatTokens(total)} tokens
        </span>
      </HoverCardTrigger>
      <HoverCardContent side="top" align="end" className="w-72 text-sm">
        <div className="mb-3 flex items-center justify-between">
          <span className="font-medium">Lifetime tokens</span>
          <span className="text-muted-foreground text-xs tabular-nums">
            {formatTokens(total)} total
          </span>
        </div>

        <div className="space-y-2">
          <div className="flex items-center justify-between text-xs">
            <span className="text-muted-foreground">Input</span>
            <span className="tabular-nums">{formatTokens(input)}</span>
          </div>
          <Progress value={inputPct} className="h-1.5" />

          <div className="mt-2 flex items-center justify-between text-xs">
            <span className="text-muted-foreground">Output</span>
            <span className="tabular-nums">{formatTokens(output)}</span>
          </div>
          <Progress value={100 - inputPct} className="h-1.5" />
        </div>

        <p className="mt-3 border-t pt-2 text-muted-foreground text-xs leading-snug">
          Cumulative across every turn this agent has run. Per-window stats
          arrive in a future release.
        </p>
      </HoverCardContent>
    </HoverCard>
  )
}
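The two progress bars in `AgentTokenSummary` are driven by a single percentage and its complement. As a rough illustration (the `tokenSplit` helper and `TokenSplit` type are hypothetical, not part of this change), the split can be computed like this:

```typescript
interface TokenSplit {
  total: number
  inputPct: number
  outputPct: number
}

// Hypothetical helper: same arithmetic as AgentTokenSummary, returning null
// where the component renders nothing (no tokens recorded yet).
function tokenSplit(input: number, output: number): TokenSplit | null {
  const total = input + output
  if (total === 0) return null
  const inputPct = (input / total) * 100
  // The output bar is the complement, so the two always sum to 100.
  return { total, inputPct, outputPct: 100 - inputPct }
}

console.log(tokenSplit(750, 250)) // { total: 1000, inputPct: 75, outputPct: 25 }
```

Deriving the output share as `100 - inputPct` avoids the two bars drifting apart through independent rounding.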
@@ -1,60 +0,0 @@
import { Star } from 'lucide-react'
import type { FC } from 'react'
import { Button } from '@/components/ui/button'
import {
  Tooltip,
  TooltipContent,
  TooltipProvider,
  TooltipTrigger,
} from '@/components/ui/tooltip'
import { cn } from '@/lib/utils'

interface PinToggleProps {
  pinned: boolean
  onToggle: (next: boolean) => void
}

/**
 * Trailing star toggle. The button is *always rendered* — only its
 * opacity changes between pinned/unpinned/hover states — so the title
 * row's height is constant. Hiding the slot via `display: none` would
 * collapse the row's vertical metrics on hover and shift every card
 * below in the rail.
 *
 * Placement is trailing the title (after the status badge) so the
 * title itself flushes left regardless of pin state — leading the
 * row with the star would indent the title relative to the model /
 * preview / meta lines beneath it.
 */
export const PinToggle: FC<PinToggleProps> = ({ pinned, onToggle }) => (
  <TooltipProvider delayDuration={300}>
    <Tooltip>
      <TooltipTrigger asChild>
        <Button
          variant="ghost"
          size="icon"
          className={cn(
            'size-6 text-muted-foreground transition-opacity hover:text-foreground',
            pinned ? 'opacity-100' : 'opacity-0 group-hover:opacity-100',
          )}
          aria-pressed={pinned}
          aria-label={pinned ? 'Unpin agent' : 'Pin agent'}
          onClick={(event) => {
            event.stopPropagation()
            onToggle(!pinned)
          }}
        >
          <Star
            className={cn(
              'size-3.5',
              pinned && 'fill-amber-400 text-amber-500',
            )}
          />
        </Button>
      </TooltipTrigger>
      <TooltipContent side="top" className="text-xs">
        {pinned ? 'Unpin' : 'Pin to top'}
      </TooltipContent>
    </Tooltip>
  </TooltipProvider>
)
@@ -1,73 +0,0 @@
import { describe, expect, it } from 'bun:test'
import {
  firstNonBlankLine,
  formatLocalDate,
  formatTokens,
  ROW_BAR_COUNT,
  truncate,
} from './agent-row.helpers'

describe('formatTokens', () => {
  it('renders zero / NaN as "0"', () => {
    expect(formatTokens(0)).toBe('0')
    expect(formatTokens(Number.NaN)).toBe('0')
  })

  it('renders sub-1K as integer', () => {
    expect(formatTokens(142)).toBe('142')
  })

  it('renders K with one decimal under 10', () => {
    expect(formatTokens(8_400)).toBe('8.4K')
  })

  it('drops the decimal at >=10K', () => {
    expect(formatTokens(120_000)).toBe('120K')
  })

  it('renders M with one decimal under 10', () => {
    expect(formatTokens(1_200_000)).toBe('1.2M')
  })
})

describe('firstNonBlankLine', () => {
  it('returns the first non-blank line', () => {
    expect(firstNonBlankLine('\n\nhello\nworld')).toBe('hello')
  })

  it('skips USER_QUERY envelope tags', () => {
    expect(firstNonBlankLine('<USER_QUERY>\nfix tests\n</USER_QUERY>')).toBe(
      'fix tests',
    )
  })

  it('falls back to the trimmed input when nothing matches', () => {
    expect(firstNonBlankLine(' single ')).toBe('single')
  })
})

describe('truncate', () => {
  it('returns input unchanged when within limit', () => {
    expect(truncate('hello', 10)).toBe('hello')
  })

  it('appends an ellipsis when over limit', () => {
    expect(truncate('hello world', 6)).toBe('hello…')
  })
})

describe('formatLocalDate', () => {
  const today = new Date('2026-04-30T12:00:00Z')

  it('labels today and yesterday explicitly', () => {
    expect(formatLocalDate(ROW_BAR_COUNT - 1, today)).toBe('today')
    expect(formatLocalDate(ROW_BAR_COUNT - 2, today)).toBe('yesterday')
  })

  it('returns a "Mon D" format for older days', () => {
    const label = formatLocalDate(0, today)
    // "Apr 17" or "Apr 17," depending on locale; just assert it
    // contains a month abbreviation and a day number.
    expect(label).toMatch(/[A-Za-z]+ \d+/)
  })
})
@@ -1,64 +0,0 @@
|
||||
/**
 * Pure formatters consumed by row sub-components. Kept distinct from
 * `agent-display.helpers.ts` (page-level helpers) so the row internals
 * have an obvious single home.
 */

const TOKEN_THRESHOLDS: Array<[number, string]> = [
  [1_000_000, 'M'],
  [1_000, 'K'],
]

/** `1.2M`, `820K`, `8.4K`, `142`, `0`. */
export function formatTokens(n: number): string {
  if (!Number.isFinite(n) || n <= 0) return '0'
  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
    if (n >= threshold) {
      const value = n / threshold
      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
      return `${decimal}${suffix}`
    }
  }
  return String(Math.round(n))
}

const USER_QUERY_OPEN = /^<USER_QUERY>$/i
const USER_QUERY_CLOSE = /^<\/USER_QUERY>$/i

/**
 * First non-blank line, with the BrowserOS user-system-prompt
 * `<USER_QUERY>` envelope tags stripped so previews don't show
 * structural noise.
 */
export function firstNonBlankLine(text: string): string {
  const lines = text.split('\n').map((line) => line.trim())
  for (const line of lines) {
    if (!line) continue
    if (USER_QUERY_OPEN.test(line) || USER_QUERY_CLOSE.test(line)) continue
    return line
  }
  return text.trim()
}

export function truncate(text: string, max: number): string {
  if (text.length <= max) return text
  return `${text.slice(0, max - 1).trimEnd()}…`
}

const SPARKLINE_DAYS = 14

/**
 * "today" / "yesterday" / "Apr 17" — given an index 0..13 from
 * oldest → newest. `today` defaults to `new Date()` so callers don't
 * have to thread a clock through.
 */
export function formatLocalDate(idx: number, today: Date = new Date()): string {
  if (idx === SPARKLINE_DAYS - 1) return 'today'
  if (idx === SPARKLINE_DAYS - 2) return 'yesterday'
  const offset = SPARKLINE_DAYS - 1 - idx
  const date = new Date(today)
  date.setDate(date.getDate() - offset)
  return date.toLocaleDateString(undefined, { month: 'short', day: 'numeric' })
}

export const ROW_BAR_COUNT = SPARKLINE_DAYS
|
||||
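A short sketch of how `formatTokens` behaves right at its thresholds. The function body is copied from the helper file in this hunk so the block runs on its own; the sample inputs are illustrative assumptions and do not come from the test file. Values just under a boundary round up into labels that the doc comment does not list.

```typescript
const TOKEN_THRESHOLDS: Array<[number, string]> = [
  [1_000_000, 'M'],
  [1_000, 'K'],
]

// Copied from the helper in this hunk so the example is self-contained.
function formatTokens(n: number): string {
  if (!Number.isFinite(n) || n <= 0) return '0'
  for (const [threshold, suffix] of TOKEN_THRESHOLDS) {
    if (n >= threshold) {
      const value = n / threshold
      const decimal = value < 10 ? value.toFixed(1) : value.toFixed(0)
      return `${decimal}${suffix}`
    }
  }
  return String(Math.round(n))
}

// Illustrative inputs (assumptions, not taken from the original tests).
const samples = [999, 9_960, 999_999, -5, Number.POSITIVE_INFINITY]
const labels = samples.map(formatTokens)

// 9_960 / 1_000 = 9.96 is below 10, so toFixed(1) rounds it to "10.0".
// 999_999 / 1_000 = 999.999 is at least 10, so toFixed(0) rounds it to "1000".
console.log(labels) // [ '999', '10.0K', '1000K', '0', '0' ]
```

If labels such as `10.0K` or `1000K` are unwanted, one option would be to round before choosing the suffix; whether that matters for the row UI is a judgment call the original code leaves open.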
@@ -1,51 +0,0 @@
|
||||
import type { HarnessAgentAdapter } from '../agent-harness-types'
import type { AgentListItem } from '../agents-page-types'
import type { AgentLiveness } from '../LivenessDot'

/**
 * Window-bounded token usage. Server returns `null` when no session
 * record exists yet for the agent.
 */
export interface AgentTokenUsage {
  last7d: { input: number; output: number; requestCount: number }
  cumulative: { input: number; output: number }
}

export interface AgentAdapterHealth {
  healthy: boolean
  reason?: string
}

/**
 * Everything an `AgentRowCard` needs to render. Mirrors the shape
 * `useHarnessAgents` exposes; the page assembles one entry per row in
 * `AgentList` and passes it down. Sub-components only see slices of
 * this object — no prop drilling beyond two levels.
 */
export interface AgentRowData {
  agent: AgentListItem
  adapter: HarnessAgentAdapter | 'unknown'
  modelLabel: string | null
  reasoningEffort: string | null
  status: AgentLiveness
  lastUsedAt: number | null
  pinned: boolean
  cwd: string | null
  lastUserMessage: string | null
  tokens: AgentTokenUsage | null
  /** 14 entries, oldest → newest. Today is the last index. */
  turnsByDay: number[]
  /** Same length and ordering as `turnsByDay`. */
  failedByDay: number[]
  lastError: string | null
  lastErrorAt: number | null
  /** When non-null, an in-flight turn this row can be resumed from. */
  activeTurnId: string | null
  /** Adapter-level health, shared across rows for the same adapter. */
  adapterHealth: AgentAdapterHealth | null
}

export interface AgentRowCallbacks {
  onDelete: (agent: AgentListItem) => void
  onPinToggle: (agent: AgentListItem, next: boolean) => void
}
|
||||
@@ -1,104 +0,0 @@
|
||||
import { describe, expect, it } from 'bun:test'
|
||||
import type { HarnessAgent } from './agent-harness-types'
|
||||
import {
|
||||
compareAgentsByPinThenRecency,
|
||||
orderAgentsByPinThenRecency,
|
||||
} from './agents-list-order'
|
||||
|
||||
function makeAgent(input: {
|
||||
id: string
|
||||
pinned?: boolean
|
||||
lastUsedAt?: number | null
|
||||
}): HarnessAgent {
|
||||
return {
|
||||
id: input.id,
|
||||
name: input.id,
|
||||
adapter: 'codex',
|
||||
permissionMode: 'approve-all',
|
||||
sessionKey: 'session',
|
||||
createdAt: 0,
|
||||
updatedAt: 0,
|
||||
pinned: input.pinned,
|
||||
lastUsedAt: input.lastUsedAt,
|
||||
}
|
||||
}
|
||||
|
||||
describe('orderAgentsByPinThenRecency', () => {
|
||||
it('floats pinned agents to the top regardless of recency', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'a', pinned: false, lastUsedAt: 1_000 }),
|
||||
makeAgent({ id: 'b', pinned: true, lastUsedAt: 100 }),
|
||||
makeAgent({ id: 'c', pinned: false, lastUsedAt: 500 }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual(['b', 'a', 'c'])
|
||||
})
|
||||
|
||||
it('sorts by lastUsedAt desc within each pin group', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'older-pin', pinned: true, lastUsedAt: 100 }),
|
||||
makeAgent({ id: 'newer-pin', pinned: true, lastUsedAt: 200 }),
|
||||
makeAgent({ id: 'older', pinned: false, lastUsedAt: 50 }),
|
||||
makeAgent({ id: 'newer', pinned: false, lastUsedAt: 80 }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual([
|
||||
'newer-pin',
|
||||
'older-pin',
|
||||
'newer',
|
||||
'older',
|
||||
])
|
||||
})
|
||||
|
||||
it('seed-pins the gateway main agent above other never-used agents', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: null }),
|
||||
makeAgent({ id: 'main', pinned: false, lastUsedAt: null }),
|
||||
makeAgent({ id: 'zzz', pinned: false, lastUsedAt: null }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual(['main', 'aaa', 'zzz'])
|
||||
})
|
||||
|
||||
it('drops the main seed-pin once the agent has been used', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'aaa', pinned: false, lastUsedAt: 999 }),
|
||||
makeAgent({ id: 'main', pinned: false, lastUsedAt: 1 }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual(['aaa', 'main'])
|
||||
})
|
||||
|
||||
it('puts never-used agents below recently-used ones', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'fresh', pinned: false, lastUsedAt: null }),
|
||||
makeAgent({ id: 'used', pinned: false, lastUsedAt: 100 }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual(['used', 'fresh'])
|
||||
})
|
||||
|
||||
it('id-stable tiebreaks two agents with identical lastUsedAt', () => {
|
||||
const result = orderAgentsByPinThenRecency([
|
||||
makeAgent({ id: 'b', pinned: false, lastUsedAt: 100 }),
|
||||
makeAgent({ id: 'a', pinned: false, lastUsedAt: 100 }),
|
||||
])
|
||||
expect(result.map((entry) => entry.id)).toEqual(['a', 'b'])
|
||||
})
|
||||
})
|
||||
|
||||
describe('compareAgentsByPinThenRecency', () => {
|
||||
it('produces the same order as the harness-shape helper', () => {
|
||||
const items = [
|
||||
{ id: 'older', pinned: false, lastUsedAt: 50 },
|
||||
{ id: 'newer', pinned: false, lastUsedAt: 80 },
|
||||
{ id: 'pinned', pinned: true, lastUsedAt: 1 },
|
||||
]
|
||||
const sorted = [...items].sort(compareAgentsByPinThenRecency)
|
||||
expect(sorted.map((item) => item.id)).toEqual(['pinned', 'newer', 'older'])
|
||||
})
|
||||
|
||||
it('seeds the main agent above other never-used rows', () => {
|
||||
const items = [
|
||||
{ id: 'zzz', pinned: false, lastUsedAt: null },
|
||||
{ id: 'main', pinned: false, lastUsedAt: null },
|
||||
]
|
||||
const sorted = [...items].sort(compareAgentsByPinThenRecency)
|
||||
expect(sorted.map((item) => item.id)).toEqual(['main', 'zzz'])
|
||||
})
|
||||
})
|
||||
@@ -1,59 +0,0 @@
|
||||
import type { HarnessAgent } from './agent-harness-types'

/**
 * Stable ordering for index-shaped agent surfaces (the `/agents` rail
 * and the chat-screen rail at `/agents/:agentId`). Pinned rows float
 * to the top, then recency desc, with never-used agents falling to
 * the bottom in id-stable order. The gateway's `main` agent gets
 * seed-pinned to the top of the never-used group so a fresh install
 * has an obvious starting point even before the user has used it.
 *
 * NOT the same rule as the home grid (`orderHomeAgents`): home is
 * action-shaped — active-turn floats to the top — so users can
 * resume what's running. The chat rail keeps recency stable so it
 * doesn't reshuffle as turns transition every 5s.
 */
export function orderAgentsByPinThenRecency(
  agents: HarnessAgent[],
): HarnessAgent[] {
  return [...agents].sort((a, b) => {
    const aPinned = a.pinned ?? false
    const bPinned = b.pinned ?? false
    if (aPinned !== bPinned) return aPinned ? -1 : 1

    const aSeed = a.id === 'main' && (a.lastUsedAt ?? null) === null
    const bSeed = b.id === 'main' && (b.lastUsedAt ?? null) === null
    if (aSeed && !bSeed) return -1
    if (!aSeed && bSeed) return 1

    const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
    const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
    if (aValue !== bValue) return bValue - aValue

    return a.id.localeCompare(b.id)
  })
}

/**
 * Same comparator, but operates over arbitrary records that carry
 * `pinned`, `lastUsedAt`, and an `id`-equivalent key. Used by the
 * `/agents` `AgentList` which pivots `AgentListItem` + harness
 * lookup into a sortable shape; both surfaces stay on identical
 * sort semantics through this adapter.
 */
export function compareAgentsByPinThenRecency<
  T extends { pinned: boolean; lastUsedAt: number | null; id: string },
>(a: T, b: T): number {
  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1

  const aSeed = a.id === 'main' && a.lastUsedAt === null
  const bSeed = b.id === 'main' && b.lastUsedAt === null
  if (aSeed && !bSeed) return -1
  if (!aSeed && bSeed) return 1

  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  if (aValue !== bValue) return bValue - aValue

  return a.id.localeCompare(b.id)
}
|
||||
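A small usage sketch for the generic comparator on a row shape other than `HarnessAgent`. The comparator is copied from the hunk above so the block runs standalone; `RailRow` and the sample rows are hypothetical. Reading the comparator as written, a never-used `main` sorts ahead of every unpinned row, including recently used ones, which is a little broader than the "top of the never-used group" wording in the doc comment; the tests in this diff do not cover that case.

```typescript
// Copied from the hunk above so the example is self-contained.
function compareAgentsByPinThenRecency<
  T extends { pinned: boolean; lastUsedAt: number | null; id: string },
>(a: T, b: T): number {
  if (a.pinned !== b.pinned) return a.pinned ? -1 : 1

  const aSeed = a.id === 'main' && a.lastUsedAt === null
  const bSeed = b.id === 'main' && b.lastUsedAt === null
  if (aSeed && !bSeed) return -1
  if (!aSeed && bSeed) return 1

  const aValue = a.lastUsedAt ?? Number.NEGATIVE_INFINITY
  const bValue = b.lastUsedAt ?? Number.NEGATIVE_INFINITY
  if (aValue !== bValue) return bValue - aValue

  return a.id.localeCompare(b.id)
}

// Hypothetical row shape: only the three sort keys matter to the comparator.
interface RailRow {
  id: string
  pinned: boolean
  lastUsedAt: number | null
  label: string
}

const rows: RailRow[] = [
  { id: 'zeta', pinned: false, lastUsedAt: null, label: 'never used' },
  { id: 'main', pinned: false, lastUsedAt: null, label: 'never-used main' },
  { id: 'beta', pinned: false, lastUsedAt: 300, label: 'used' },
  { id: 'alpha', pinned: true, lastUsedAt: null, label: 'pinned, never used' },
  { id: 'gamma', pinned: false, lastUsedAt: 300, label: 'used, same timestamp' },
]

// Copy before sorting so the input array is left untouched.
const order = [...rows].sort(compareAgentsByPinThenRecency).map((row) => row.id)
console.log(order) // [ 'alpha', 'main', 'beta', 'gamma', 'zeta' ]
```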
@@ -8,7 +8,6 @@ import {
|
||||
type HarnessAdapterDescriptor,
|
||||
type HarnessAgent,
|
||||
type HarnessAgentHistoryPage,
|
||||
type HarnessQueuedMessage,
|
||||
mapHarnessAgentToEntry,
|
||||
} from './agent-harness-types'
|
||||
import type { OpenClawStatus } from './useOpenClaw'
|
||||
@@ -136,63 +135,6 @@ export function useCreateHarnessAgent() {
|
||||
})
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply a partial update to a harness agent. Used by the pin-toggle
|
||||
* star and (eventually) the inline rename UI. Optimistically writes
|
||||
* the patch into the listing query cache so the row updates instantly,
|
||||
* then rolls back if the server rejects the change.
|
||||
*/
|
||||
export function useUpdateHarnessAgent() {
|
||||
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: {
|
||||
agentId: string
|
||||
patch: { name?: string; pinned?: boolean }
|
||||
}) => {
|
||||
if (!baseUrl || urlLoading) {
|
||||
throw new Error('BrowserOS agent server URL is not ready')
|
||||
}
|
||||
const data = await agentsFetch<{ agent: HarnessAgent }>(
|
||||
baseUrl,
|
||||
`/${encodeURIComponent(input.agentId)}`,
|
||||
{
|
||||
method: 'PATCH',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(input.patch),
|
||||
},
|
||||
)
|
||||
return data.agent
|
||||
},
|
||||
onMutate: async ({ agentId, patch }) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === agentId ? { ...agent, ...patch } : agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
export function useDeleteHarnessAgent() {
|
||||
const { baseUrl, isLoading: urlLoading } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
@@ -264,8 +206,6 @@ export interface HarnessActiveTurnInfo {
|
||||
lastSeq: number
|
||||
startedAt: number
|
||||
endedAt?: number
|
||||
/** User message that kicked off the turn; null when not captured. */
|
||||
prompt: string | null
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -320,145 +260,3 @@ export async function fetchHarnessAgentHistory(
|
||||
`/${encodeURIComponent(agentId)}/sessions/main/history`,
|
||||
)
|
||||
}
|
||||
|
||||
export interface EnqueueMessageInput {
|
||||
message: string
|
||||
attachments?: ReadonlyArray<unknown>
|
||||
}
|
||||
|
||||
export async function enqueueHarnessMessage(
|
||||
agentId: string,
|
||||
input: EnqueueMessageInput,
|
||||
): Promise<HarnessQueuedMessage> {
|
||||
const baseUrl = await getAgentServerUrl()
|
||||
const response = await fetch(
|
||||
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue`,
|
||||
{
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
message: input.message,
|
||||
...(input.attachments && input.attachments.length > 0
|
||||
? { attachments: input.attachments }
|
||||
: {}),
|
||||
}),
|
||||
},
|
||||
)
|
||||
if (!response.ok) {
|
||||
let message = `Request failed with status ${response.status}`
|
||||
try {
|
||||
const body = (await response.json()) as { error?: string }
|
||||
if (body.error) message = body.error
|
||||
} catch {}
|
||||
throw new Error(message)
|
||||
}
|
||||
const body = (await response.json()) as { queued: HarnessQueuedMessage }
|
||||
return body.queued
|
||||
}
|
||||
|
||||
export async function removeHarnessQueuedMessage(
|
||||
agentId: string,
|
||||
messageId: string,
|
||||
): Promise<{ removed: boolean }> {
|
||||
const baseUrl = await getAgentServerUrl()
|
||||
const response = await fetch(
|
||||
`${baseUrl}/agents/${encodeURIComponent(agentId)}/queue/${encodeURIComponent(
|
||||
messageId,
|
||||
)}`,
|
||||
{ method: 'DELETE' },
|
||||
)
|
||||
if (!response.ok) return { removed: false }
|
||||
return (await response.json()) as { removed: boolean }
|
||||
}
|
||||
|
||||
/**
|
||||
* Optimistic enqueue: writes the new queued message into the listing
|
||||
* cache immediately so the queue panel reflects the change without
|
||||
* waiting for the next poll. Rolls back if the server rejects.
|
||||
*/
|
||||
export function useEnqueueHarnessMessage() {
|
||||
const { baseUrl } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: { agentId: string } & EnqueueMessageInput) =>
|
||||
enqueueHarnessMessage(input.agentId, input),
|
||||
onMutate: async (input) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
const optimistic: HarnessQueuedMessage = {
|
||||
id: `optimistic-${Math.random().toString(36).slice(2, 10)}`,
|
||||
createdAt: Date.now(),
|
||||
message: input.message,
|
||||
}
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === input.agentId
|
||||
? { ...agent, queue: [...(agent.queue ?? []), optimistic] }
|
||||
: agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
/**
|
||||
* Optimistic queue removal, mirroring `useEnqueueHarnessMessage`.
|
||||
*/
|
||||
export function useRemoveHarnessQueuedMessage() {
|
||||
const { baseUrl } = useAgentServerUrl()
|
||||
const queryClient = useQueryClient()
|
||||
|
||||
return useMutation({
|
||||
mutationFn: async (input: { agentId: string; messageId: string }) =>
|
||||
removeHarnessQueuedMessage(input.agentId, input.messageId),
|
||||
onMutate: async (input) => {
|
||||
const queryKey = [AGENT_QUERY_KEYS.agents, baseUrl]
|
||||
await queryClient.cancelQueries({ queryKey })
|
||||
const previous = queryClient.getQueryData<HarnessAgentsResponse>(queryKey)
|
||||
if (!previous) return { previous: undefined }
|
||||
queryClient.setQueryData<HarnessAgentsResponse>(queryKey, {
|
||||
...previous,
|
||||
agents: previous.agents.map((agent) =>
|
||||
agent.id === input.agentId
|
||||
? {
|
||||
...agent,
|
||||
queue: (agent.queue ?? []).filter(
|
||||
(entry) => entry.id !== input.messageId,
|
||||
),
|
||||
}
|
||||
: agent,
|
||||
),
|
||||
})
|
||||
return { previous }
|
||||
},
|
||||
onError: (_err, _vars, context) => {
|
||||
if (!context?.previous) return
|
||||
queryClient.setQueryData(
|
||||
[AGENT_QUERY_KEYS.agents, baseUrl],
|
||||
context.previous,
|
||||
)
|
||||
},
|
||||
onSettled: async () => {
|
||||
await queryClient.invalidateQueries({
|
||||
queryKey: [AGENT_QUERY_KEYS.agents],
|
||||
})
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
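The three mutation hooks in this file share one shape: snapshot the cached listing, write the optimistic patch, and restore the snapshot if the server rejects the change. The sketch below shows that flow without React Query. `ListingCache` and `applyOptimistic` are hypothetical stand-ins invented for illustration and are not part of the codebase; the real hooks additionally cancel in-flight queries and invalidate the listing once the mutation settles.

```typescript
interface Agent {
  id: string
  pinned: boolean
}

interface Listing {
  agents: Agent[]
}

// Hypothetical in-memory stand-in for the query cache.
class ListingCache {
  private data: Listing | undefined

  constructor(initial?: Listing) {
    this.data = initial
  }

  get(): Listing | undefined {
    return this.data
  }

  set(next: Listing | undefined): void {
    this.data = next
  }
}

// Hypothetical helper: patch the cache first, then roll back on failure.
async function applyOptimistic(
  cache: ListingCache,
  agentId: string,
  patch: Partial<Agent>,
  send: () => Promise<void>,
): Promise<boolean> {
  // 1. Snapshot the current listing so it can be restored later.
  const previous = cache.get()
  if (previous) {
    // 2. Write the optimistic value before the request resolves.
    cache.set({
      ...previous,
      agents: previous.agents.map((agent) =>
        agent.id === agentId ? { ...agent, ...patch } : agent,
      ),
    })
  }
  try {
    await send()
    return true
  } catch {
    // 3. Server rejected the change: restore the snapshot.
    if (previous) cache.set(previous)
    return false
  }
}

const cache = new ListingCache({ agents: [{ id: 'a', pinned: false }] })
const pending = applyOptimistic(cache, 'a', { pinned: true }, async () => {
  throw new Error('rejected by server')
})
// The patch is visible immediately, before the request settles.
const pinnedWhilePending = cache.get()?.agents[0]?.pinned
```

Because the snapshot is the whole listing, a rollback also discards any other optimistic writes made in the meantime; that trade-off is inherent to this pattern rather than specific to the sketch.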
@@ -59,3 +59,15 @@ export interface AgentConversation {
|
||||
createdAt: number
|
||||
updatedAt: number
|
||||
}
|
||||
|
||||
export interface AgentCardData {
|
||||
agentId: string
|
||||
name: string
|
||||
model?: string
|
||||
status: 'idle' | 'working' | 'error'
|
||||
lastMessage?: string
|
||||
lastMessageTimestamp?: number
|
||||
activitySummary?: string
|
||||
currentTool?: string
|
||||
costUsd?: number
|
||||
}
|
||||
|
||||
@@ -9,7 +9,6 @@
|
||||
"build": "bun run codegen && wxt build",
|
||||
"build:dev": "bun --env-file=.env.development wxt build --mode development",
|
||||
"zip": "wxt zip",
|
||||
"test": "bun run ../../scripts/run-bun-test.ts ./apps/agent",
|
||||
"compile": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
|
||||
"lint": "bunx biome check",
|
||||
"typecheck": "bun --env-file=.env.development wxt prepare && tsgo --noEmit",
|
||||
|
||||
@@ -38,8 +38,8 @@ browseros-cli install # downloads BrowserOS for your platform
|
||||
# If BrowserOS is installed but not running
|
||||
browseros-cli launch # opens BrowserOS, waits for server
|
||||
|
||||
# Configure the CLI with the Server URL from BrowserOS settings
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
# Configure the CLI (auto-discovers running BrowserOS)
|
||||
browseros-cli init --auto # detects server URL and saves config
|
||||
|
||||
# Verify connection
|
||||
browseros-cli health
|
||||
@@ -52,7 +52,7 @@ browseros-cli init <url> # non-interactive — pass URL directly
|
||||
browseros-cli init # interactive — prompts for URL
|
||||
```
|
||||
|
||||
Config is saved to `~/.config/browseros-cli/config.yaml`. If `browseros-cli health` cannot connect, copy the current Server URL from BrowserOS Settings > BrowserOS MCP and run `browseros-cli init <Server URL>` again.
|
||||
Config is saved to `~/.config/browseros-cli/config.yaml`. The CLI also auto-discovers the server from `~/.browseros/server.json` (written by BrowserOS on startup).
|
||||
|
||||
### CLI updates
|
||||
|
||||
@@ -126,9 +126,9 @@ To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide]
|
||||
| `--debug` | `BOS_DEBUG=1` | Debug output |
|
||||
| `--timeout, -t` | | Request timeout (default: 2m) |
|
||||
|
||||
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > config file
|
||||
Priority for server URL: `--server` flag > `BROWSEROS_URL` env > `~/.browseros/server.json` > config file
|
||||
|
||||
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init <Server URL>`.
|
||||
If no server URL is configured, the CLI exits with setup instructions pointing to `install`, `launch`, and `init`.
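The priority rule above amounts to "first non-empty source wins". A minimal sketch follows, written in TypeScript for consistency with the other examples even though the CLI itself is Go. `UrlSources`, `resolveServerUrl`, and the `useDiscovery` switch are hypothetical names; the switch exists only because this diff shows two orderings, one with `~/.browseros/server.json` and one without.

```typescript
// Hypothetical source record; the field names are assumptions for illustration.
interface UrlSources {
  flag?: string // --server
  env?: string // BROWSEROS_URL
  discovery?: string // url read from ~/.browseros/server.json
  config?: string // saved config file
}

// Return the first non-empty candidate in priority order, or null if none is set.
function resolveServerUrl(
  sources: UrlSources,
  useDiscovery: boolean,
): string | null {
  const ordered = [
    sources.flag,
    sources.env,
    useDiscovery ? sources.discovery : undefined,
    sources.config,
  ]
  for (const candidate of ordered) {
    const trimmed = candidate?.trim()
    if (trimmed) return trimmed
  }
  return null
}

const sources: UrlSources = {
  env: '', // an empty env var is treated as unset
  discovery: 'http://127.0.0.1:9100',
  config: 'http://127.0.0.1:9000',
}

const withDiscovery = resolveServerUrl(sources, true)
const withoutDiscovery = resolveServerUrl(sources, false)
const unconfigured = resolveServerUrl({}, true)
console.log(withDiscovery, withoutDiscovery, unconfigured)
```

A `null` result corresponds to the case where the CLI exits with setup instructions.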
|
||||
|
||||
## Testing
|
||||
|
||||
@@ -179,7 +179,7 @@ apps/cli/
|
||||
│ └── config.go # Config file (~/.config/browseros-cli/config.yaml)
|
||||
├── cmd/
|
||||
│ ├── root.go # Root command, global flags
|
||||
│ ├── init.go # Server URL configuration (URL arg or interactive)
|
||||
│ ├── init.go # Server URL configuration (URL arg, --auto, interactive)
|
||||
│ ├── install.go # install (download BrowserOS for current platform)
|
||||
│ ├── launch.go # launch (find and start BrowserOS, wait for server)
|
||||
│ ├── open.go # open (new_page / new_hidden_page)
|
||||
|
||||
@@ -17,6 +17,8 @@ import (
|
||||
)
|
||||
|
||||
func init() {
|
||||
var autoDiscover bool
|
||||
|
||||
cmd := &cobra.Command{
|
||||
Use: "init [url]",
|
||||
Short: "Configure the BrowserOS server connection",
|
||||
@@ -32,8 +34,9 @@ You can provide the full URL or just the port number:
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
browseros-cli init 9000
|
||||
|
||||
Modes:
|
||||
Three modes:
|
||||
browseros-cli init <url> Non-interactive (full URL or port number)
|
||||
browseros-cli init --auto Auto-discover from ~/.browseros/server.json
|
||||
browseros-cli init Interactive prompt`,
|
||||
Annotations: map[string]string{"group": "Setup:"},
|
||||
Args: cobra.MaximumNArgs(1),
|
||||
@@ -46,9 +49,22 @@ Modes:
|
||||
|
||||
switch {
|
||||
case len(args) == 1:
|
||||
// Non-interactive: URL provided as argument
|
||||
input = args[0]
|
||||
|
||||
case autoDiscover:
|
||||
// Auto-discover: server.json → config → probe common ports
|
||||
discovered := probeRunningServer()
|
||||
if discovered == "" {
|
||||
output.Error("auto-discovery failed: no running BrowserOS found.\n\n"+
|
||||
" If not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", 1)
|
||||
}
|
||||
input = discovered
|
||||
fmt.Printf("Auto-discovered server at %s\n", input)
|
||||
|
||||
default:
|
||||
// Interactive prompt (original behavior)
|
||||
fmt.Println()
|
||||
bold.Println("BrowserOS CLI Setup")
|
||||
fmt.Println()
|
||||
@@ -79,14 +95,12 @@ Modes:
|
||||
output.Errorf(1, "invalid URL: %s", input)
|
||||
}
|
||||
|
||||
// Verify connectivity
|
||||
fmt.Printf("Checking connection to %s ...\n", baseURL)
|
||||
client := &http.Client{Timeout: 5 * time.Second}
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
output.Errorf(1, "cannot connect to %s: %v\n\n"+
|
||||
"Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n"+
|
||||
"Then run: browseros-cli init <Server URL>\n"+
|
||||
"Example: browseros-cli init http://127.0.0.1:9000/mcp", baseURL, err)
|
||||
output.Errorf(1, "cannot connect to %s: %v\nIs BrowserOS running?", baseURL, err)
|
||||
}
|
||||
resp.Body.Close()
|
||||
|
||||
@@ -107,5 +121,6 @@ Modes:
|
||||
},
|
||||
}
|
||||
|
||||
cmd.Flags().BoolVar(&autoDiscover, "auto", false, "Auto-discover server URL from ~/.browseros/server.json")
|
||||
rootCmd.AddCommand(cmd)
|
||||
}
|
||||
|
||||
@@ -28,7 +28,7 @@ Linux: Downloads AppImage (or .deb with --deb flag)
|
||||
|
||||
After installation:
|
||||
browseros-cli launch # start BrowserOS
|
||||
browseros-cli init <url> # configure the CLI with the Server URL`,
|
||||
browseros-cli init --auto # configure the CLI`,
|
||||
Annotations: map[string]string{"group": "Setup:"},
|
||||
Args: cobra.NoArgs,
|
||||
Run: func(cmd *cobra.Command, args []string) {
|
||||
@@ -81,7 +81,7 @@ After installation:
|
||||
fmt.Println()
|
||||
bold.Println("Next steps:")
|
||||
dim.Println(" browseros-cli launch # start BrowserOS")
|
||||
dim.Println(" browseros-cli init <url> # use the Server URL from BrowserOS settings")
|
||||
dim.Println(" browseros-cli init --auto # configure the CLI")
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"os"
|
||||
@@ -39,7 +38,6 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
|
||||
if url := probeRunningServer(); url != "" {
|
||||
green.Printf("BrowserOS is already running at %s\n", url)
|
||||
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
|
||||
return
|
||||
}
|
||||
|
||||
@@ -65,7 +63,7 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
|
||||
green.Printf("BrowserOS is ready at %s\n", url)
|
||||
fmt.Println()
|
||||
dim.Printf("Next: browseros-cli init %s\n", mcpEndpointURL(url))
|
||||
dim.Println("Next: browseros-cli init --auto")
|
||||
},
|
||||
}
|
||||
|
||||
@@ -77,77 +75,39 @@ If BrowserOS is already running, reports the server URL.`,
|
||||
// Server probing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
var commonBrowserOSPorts = []int{9100, 9200, 9300}
|
||||
|
||||
// probeRunningServer checks launch discovery, explicit config, and common ports for a running server.
|
||||
// probeRunningServer checks server.json, config, and common ports for a running server.
|
||||
func probeRunningServer() string {
|
||||
client := &http.Client{Timeout: 2 * time.Second}
|
||||
check := func(baseURL string) bool {
|
||||
client := &http.Client{Timeout: 2 * time.Second}
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
return false
|
||||
}
|
||||
resp.Body.Close()
|
||||
return resp.StatusCode == 200
|
||||
}
|
||||
|
||||
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
// 1. server.json — written by BrowserOS on startup with the actual port
|
||||
if url := loadBrowserosServerURL(); url != "" && check(url) {
|
||||
return url
|
||||
}
|
||||
|
||||
if url := defaultServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
// 2. Saved config / env var
|
||||
if url := defaultServerURL(); url != "" && check(url) {
|
||||
return url
|
||||
}
|
||||
|
||||
return probeCommonServerPorts(client)
|
||||
}
|
||||
|
||||
func checkServerHealth(client *http.Client, baseURL string) bool {
|
||||
resp, err := client.Get(baseURL + "/health")
|
||||
if err != nil {
|
||||
return false
|
||||
}
|
||||
resp.Body.Close()
|
||||
return resp.StatusCode == 200
|
||||
}
|
||||
|
||||
func probeCommonServerPorts(client *http.Client) string {
|
||||
for _, port := range commonBrowserOSPorts {
|
||||
// 3. Probe common BrowserOS ports as last resort
|
||||
for _, port := range []int{9100, 9200, 9300} {
|
||||
url := fmt.Sprintf("http://127.0.0.1:%d", port)
|
||||
if checkServerHealth(client, url) {
|
||||
if check(url) {
|
||||
return url
|
||||
}
|
||||
}
|
||||
|
||||
return ""
|
||||
}
|
||||
|
||||
type serverDiscoveryConfig struct {
|
||||
ServerPort int `json:"server_port"`
|
||||
URL string `json:"url"`
|
||||
ServerVersion string `json:"server_version"`
|
||||
BrowserOSVersion string `json:"browseros_version,omitempty"`
|
||||
ChromiumVersion string `json:"chromium_version,omitempty"`
|
||||
}
|
||||
|
||||
// loadBrowserosServerURL reads BrowserOS's runtime discovery file for launch readiness only.
|
||||
//
|
||||
// Normal command resolution must not call this because it can override a URL the
|
||||
// user explicitly saved with `browseros-cli init <Server URL>`.
|
||||
func loadBrowserosServerURL() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
var sc serverDiscoveryConfig
|
||||
if err := json.Unmarshal(data, &sc); err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
return normalizeServerURL(sc.URL)
|
||||
}
|
||||
|
||||
func mcpEndpointURL(baseURL string) string {
|
||||
return strings.TrimSuffix(baseURL, "/") + "/mcp"
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Platform-native installation detection
|
||||
// ---------------------------------------------------------------------------
|
||||
@@ -157,8 +117,7 @@ func mcpEndpointURL(baseURL string) string {
|
||||
// macOS: `open -Ra "BrowserOS"` — queries Launch Services (finds apps anywhere)
|
||||
// Linux: checks /usr/bin/browseros (.deb), browseros.desktop, or AppImage files
|
||||
// Windows: checks executable at %LOCALAPPDATA%\BrowserOS\Application\BrowserOS.exe
|
||||
//
|
||||
// and registry uninstall key (per-user Chromium install pattern)
|
||||
// and registry uninstall key (per-user Chromium install pattern)
|
||||
func isBrowserOSInstalled() bool {
|
||||
switch runtime.GOOS {
|
||||
case "darwin":
|
||||
@@ -312,11 +271,14 @@ func waitForServer(maxWait time.Duration) (string, bool) {
|
||||
|
||||
for time.Now().Before(deadline) {
|
||||
// server.json is written by BrowserOS on startup with the actual port
|
||||
if url := loadBrowserosServerURL(); url != "" && checkServerHealth(client, url) {
|
||||
return url, true
|
||||
}
|
||||
if url := probeCommonServerPorts(client); url != "" {
|
||||
return url, true
|
||||
if url := loadBrowserosServerURL(); url != "" {
|
||||
resp, err := client.Get(url + "/health")
|
||||
if err == nil {
|
||||
resp.Body.Close()
|
||||
if resp.StatusCode == 200 {
|
||||
return url, true
|
||||
}
|
||||
}
|
||||
}
|
||||
fmt.Print(".")
|
||||
time.Sleep(1 * time.Second)
|
||||
|
||||
@@ -1,99 +0,0 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"net/url"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strconv"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"browseros-cli/config"
|
||||
)
|
||||
|
||||
func TestProbeRunningServerUsesDiscoveryBeforeConfig(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
discoveredServer := newHealthyServer(t)
|
||||
configServer := newHealthyServer(t)
|
||||
|
||||
serverDir := filepath.Join(home, ".browseros")
|
||||
if err := os.MkdirAll(serverDir, 0755); err != nil {
|
||||
t.Fatalf("os.MkdirAll() error = %v", err)
|
||||
}
|
||||
data := []byte(fmt.Sprintf(`{"url":%q}`, discoveredServer.URL))
|
||||
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
|
||||
t.Fatalf("os.WriteFile() error = %v", err)
|
||||
}
|
||||
if err := config.Save(&config.Config{ServerURL: configServer.URL}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := probeRunningServer()
|
||||
if got != normalizeServerURL(discoveredServer.URL) {
|
||||
t.Fatalf("probeRunningServer() = %q, want %q", got, normalizeServerURL(discoveredServer.URL))
|
||||
}
|
||||
}
|
||||
|
||||
func TestWaitForServerUsesCommonPortFallback(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
|
||||
server := newHealthyServer(t)
|
||||
port := serverPort(t, server.URL)
|
||||
|
||||
originalPorts := commonBrowserOSPorts
|
||||
commonBrowserOSPorts = []int{port}
|
||||
t.Cleanup(func() {
|
||||
commonBrowserOSPorts = originalPorts
|
||||
})
|
||||
|
||||
got, ok := waitForServer(100 * time.Millisecond)
|
||||
if !ok {
|
||||
t.Fatal("waitForServer() ok = false, want true")
|
||||
}
|
||||
if got != normalizeServerURL(server.URL) {
|
||||
t.Fatalf("waitForServer() = %q, want %q", got, normalizeServerURL(server.URL))
|
||||
}
|
||||
}
|
||||
|
||||
func newHealthyServer(t *testing.T) *httptest.Server {
|
||||
t.Helper()
|
||||
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
if r.URL.Path != "/health" {
|
||||
http.NotFound(w, r)
|
||||
return
|
||||
}
|
||||
w.WriteHeader(http.StatusOK)
|
||||
}))
|
||||
t.Cleanup(server.Close)
|
||||
return server
|
||||
}
|
||||
|
||||
func serverPort(t *testing.T, rawURL string) int {
|
||||
t.Helper()
|
||||
|
||||
parsed, err := url.Parse(rawURL)
|
||||
if err != nil {
|
||||
t.Fatalf("url.Parse() error = %v", err)
|
||||
}
|
||||
_, portText, err := net.SplitHostPort(parsed.Host)
|
||||
if err != nil {
|
||||
t.Fatalf("net.SplitHostPort() error = %v", err)
|
||||
}
|
||||
port, err := strconv.Atoi(portText)
|
||||
if err != nil {
|
||||
t.Fatalf("strconv.Atoi() error = %v", err)
|
||||
}
|
||||
return port
|
||||
}
|
||||
@@ -2,8 +2,10 @@ package cmd
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
@@ -287,15 +289,18 @@ func drainAutomaticUpdateCheckWithTimeout(done <-chan struct{}, timeout time.Dur
|
||||
}
|
||||
}
|
||||
|
||||
// defaultServerURL returns the implicit target from user-controlled settings only.
|
||||
//
|
||||
// BrowserOS writes a discovery file at runtime, but normal commands intentionally
|
||||
// ignore it so a saved URL is not silently overridden by another running server.
|
||||
func defaultServerURL() string {
|
||||
// 1. Explicit env var always wins
|
||||
if env := normalizeServerURL(os.Getenv("BROWSEROS_URL")); env != "" {
|
||||
return env
|
||||
}
|
||||
|
||||
// 2. Live discovery file from running BrowserOS (most current)
|
||||
if url := loadBrowserosServerURL(); url != "" {
|
||||
return url
|
||||
}
|
||||
|
||||
// 3. Saved config (may be stale if port changed)
|
||||
cfg, err := config.Load()
|
||||
if err == nil {
|
||||
if url := normalizeServerURL(cfg.ServerURL); url != "" {
|
||||
@@ -306,6 +311,33 @@ func defaultServerURL() string {
|
||||
return ""
|
||||
}
|
||||
|
||||
type serverDiscoveryConfig struct {
|
||||
ServerPort int `json:"server_port"`
|
||||
URL string `json:"url"`
|
||||
ServerVersion string `json:"server_version"`
|
||||
BrowserOSVersion string `json:"browseros_version,omitempty"`
|
||||
ChromiumVersion string `json:"chromium_version,omitempty"`
|
||||
}
|
||||
|
||||
func loadBrowserosServerURL() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
data, err := os.ReadFile(filepath.Join(home, ".browseros", "server.json"))
|
||||
if err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
var sc serverDiscoveryConfig
|
||||
if err := json.Unmarshal(data, &sc); err != nil {
|
||||
return ""
|
||||
}
|
||||
|
||||
return normalizeServerURL(sc.URL)
|
||||
}
|
||||
|
||||
func normalizeServerURL(raw string) string {
|
||||
normalized := strings.TrimSpace(raw)
|
||||
|
||||
@@ -337,10 +369,8 @@ func validateServerURL(raw string) (string, error) {
|
||||
|
||||
return "", fmt.Errorf(
|
||||
"BrowserOS server URL is not configured.\n\n" +
|
||||
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
|
||||
" Save it with: browseros-cli init <Server URL>\n" +
|
||||
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install",
|
||||
" If BrowserOS is running: browseros-cli init --auto\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install",
|
||||
)
|
||||
}
|
||||
|
||||
@@ -1,13 +1,8 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"browseros-cli/config"
|
||||
)
|
||||
|
||||
func TestSetVersionUpdatesRootCommand(t *testing.T) {
|
||||
@@ -105,76 +100,6 @@ func TestShouldSkipAutomaticUpdates(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLUsesEnvBeforeConfig(t *testing.T) {
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "http://127.0.0.1:9115/mcp")
|
||||
|
||||
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9000/mcp"}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := defaultServerURL()
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLUsesSavedConfig(t *testing.T) {
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
if err := config.Save(&config.Config{ServerURL: "http://127.0.0.1:9115/mcp"}); err != nil {
|
||||
t.Fatalf("config.Save() error = %v", err)
|
||||
}
|
||||
|
||||
got := defaultServerURL()
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("defaultServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestDefaultServerURLIgnoresBrowserOSServerJSON(t *testing.T) {
|
||||
home := t.TempDir()
|
||||
t.Setenv("HOME", home)
|
||||
t.Setenv("USERPROFILE", home)
|
||||
t.Setenv("XDG_CONFIG_HOME", t.TempDir())
|
||||
t.Setenv("BROWSEROS_URL", "")
|
||||
|
||||
serverDir := filepath.Join(home, ".browseros")
|
||||
if err := os.MkdirAll(serverDir, 0755); err != nil {
|
||||
t.Fatalf("os.MkdirAll() error = %v", err)
|
||||
}
|
||||
data := []byte(`{"url":"http://127.0.0.1:9999"}`)
|
||||
if err := os.WriteFile(filepath.Join(serverDir, "server.json"), data, 0644); err != nil {
|
||||
t.Fatalf("os.WriteFile() error = %v", err)
|
||||
}
|
||||
|
||||
if got := defaultServerURL(); got != "" {
|
||||
t.Fatalf("defaultServerURL() = %q, want empty", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestNormalizeServerURLAcceptsMCPEndpoint(t *testing.T) {
|
||||
got := normalizeServerURL(" http://127.0.0.1:9115/mcp ")
|
||||
if got != "http://127.0.0.1:9115" {
|
||||
t.Fatalf("normalizeServerURL() = %q, want %q", got, "http://127.0.0.1:9115")
|
||||
}
|
||||
}
|
||||
|
||||
func TestValidateServerURLExplainsManualInit(t *testing.T) {
|
||||
_, err := validateServerURL("")
|
||||
if err == nil {
|
||||
t.Fatal("validateServerURL() error = nil, want setup instructions")
|
||||
}
|
||||
msg := err.Error()
|
||||
if !strings.Contains(msg, "browseros-cli init <Server URL>") {
|
||||
t.Fatalf("validateServerURL() error = %q, want manual init instructions", msg)
|
||||
}
|
||||
if strings.Contains(msg, "init --auto") {
|
||||
t.Fatalf("validateServerURL() error = %q, should not mention init --auto", msg)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDrainAutomaticUpdateCheckWithTimeoutWaitsForCompletion(t *testing.T) {
|
||||
done := make(chan struct{})
|
||||
returned := make(chan struct{})
|
||||
|
||||
@@ -44,7 +44,10 @@ func (c *Client) connect(ctx context.Context) (*sdkmcp.ClientSession, error) {
|
||||
|
||||
session, err := sdkClient.Connect(ctx, transport, nil)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
|
||||
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
|
||||
" If BrowserOS is not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", c.BaseURL, err)
|
||||
}
|
||||
return session, nil
|
||||
}
|
||||
@@ -184,7 +187,10 @@ func (c *Client) Status() (map[string]any, error) {
|
||||
func (c *Client) restGET(path string) (map[string]any, error) {
|
||||
resp, err := c.HTTPClient.Get(c.BaseURL + path)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w%s", c.BaseURL, err, connectionSetupInstructions())
|
||||
return nil, fmt.Errorf("cannot connect to BrowserOS at %s: %w\n\n"+
|
||||
" If BrowserOS is running on a different port: browseros-cli init --auto\n"+
|
||||
" If BrowserOS is not running: browseros-cli launch\n"+
|
||||
" If not installed: browseros-cli install", c.BaseURL, err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
@@ -199,14 +205,3 @@ func (c *Client) restGET(path string) (map[string]any, error) {
|
||||
}
|
||||
return data, nil
|
||||
}
|
||||
|
||||
// connectionSetupInstructions explains how to recover from a stale or missing server URL.
|
||||
func connectionSetupInstructions() string {
|
||||
return "\n\n" +
|
||||
" Open BrowserOS Settings > BrowserOS MCP and copy the Server URL.\n" +
|
||||
" Save it with: browseros-cli init <Server URL>\n" +
|
||||
" Example: browseros-cli init http://127.0.0.1:9000/mcp\n" +
|
||||
" Run once with: browseros-cli --server <Server URL> health\n" +
|
||||
" If BrowserOS is closed: browseros-cli launch\n" +
|
||||
" If not installed: browseros-cli install"
|
||||
}
|
||||
|
||||
@@ -31,8 +31,8 @@ browseros-cli install
|
||||
# Start BrowserOS
|
||||
browseros-cli launch
|
||||
|
||||
# Configure MCP settings with the Server URL from BrowserOS settings
|
||||
browseros-cli init http://127.0.0.1:9000/mcp
|
||||
# Auto-configure MCP settings for your AI tools
|
||||
browseros-cli init --auto
|
||||
|
||||
# Verify everything is working
|
||||
browseros-cli health
|
||||
|
||||
57
packages/browseros-agent/apps/eval/README.md
vendored
@@ -9,7 +9,6 @@ Evaluation framework for BrowserOS browser automation agents. Runs tasks from st
|
||||
- **BrowserOS binary** at `/Applications/BrowserOS.app` (macOS) or `BROWSEROS_BINARY` pointing at it
|
||||
- **Bun** runtime
|
||||
- **API keys** for your LLM provider (and `CLAUDE_CODE_OAUTH_TOKEN` if you use `performance_grader`)
|
||||
- **Python 3.10+ with `agisdk`** for AGI SDK / REAL Bench grading. Set `BROWSEROS_EVAL_PYTHON` if your default `python3` is older.
|
||||
|
||||
## Quick Start
|
||||
|
||||
@@ -68,7 +67,7 @@ This lets us run the same suite against multiple model setups without copying th
|
||||
|
||||
```txt
|
||||
agisdk-daily-10 + kimi-fireworks
|
||||
agisdk-daily-10 + claude-opus
|
||||
agisdk-daily-10 + claude-sonnet
|
||||
agisdk-daily-10 + clado-action-000159
|
||||
```
|
||||
|
||||
@@ -80,7 +79,6 @@ For `orchestrator-executor` suites, there can also be an executor model/backend.
|
||||
|------|-------------|
|
||||
| `single` | Single LLM agent driven by the BrowserOS tool loop (CDP) |
|
||||
| `orchestrator-executor` | High-level orchestrator + per-step executor (LLM or Clado visual model) |
|
||||
| `claude-code` | External Claude Code CLI driven through BrowserOS MCP |
|
||||
|
||||
### Single agent
|
||||
|
||||
@@ -121,24 +119,6 @@ The orchestrator works with any LLM provider. The executor can be another LLM, o
|
||||
}
|
||||
```
|
||||
|
||||
### Claude Code
|
||||
|
||||
Claude Code runs as an external `claude -p` subprocess. The eval runner passes a task-scoped MCP config that points Claude Code at the active worker's BrowserOS MCP endpoint, while the eval capture layer still saves messages, screenshots, trajectory metadata, and grader outputs.
|
||||
|
||||
```json
|
||||
{
|
||||
"agent": {
|
||||
"type": "claude-code",
|
||||
"model": "opus"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```bash
|
||||
BROWSEROS_EVAL_PYTHON=/path/to/python3 bun run eval run --config configs/legacy/claude-code-agisdk-real.json
|
||||
bun run eval suite --config configs/legacy/claude-code-agisdk-real.json --publish r2
|
||||
```
|
||||
|
||||
## Graders
|
||||
|
||||
| Name | Description |
|
||||
@@ -171,7 +151,6 @@ The `apiKey` field supports two formats:
|
||||
| `CLADO_ACTION_MODEL`, `CLADO_ACTION_API_KEY`, `CLADO_ACTION_BASE_URL` | Clado executor defaults |
|
||||
| `BROWSEROS_BINARY` | BrowserOS binary path in CI/local smoke runs |
|
||||
| `BROWSEROS_SERVER_URL` | Optional grader MCP URL override |
|
||||
| `BROWSEROS_EVAL_PYTHON` | Optional Python interpreter for JSON graders such as `agisdk_state_diff` |
|
||||
| `WEBARENA_INFINITY_DIR` | Local WebArena-Infinity checkout for Infinity tasks |
|
||||
| `NOPECHA_API_KEY` | CAPTCHA solver extension |
|
||||
| `EVAL_R2_ACCOUNT_ID`, `EVAL_R2_ACCESS_KEY_ID`, `EVAL_R2_SECRET_ACCESS_KEY`, `EVAL_R2_BUCKET`, `EVAL_R2_CDN_BASE_URL` | R2 upload and viewer URL |
|
||||
@@ -202,8 +181,6 @@ export EVAL_R2_BUCKET=browseros-eval
|
||||
export EVAL_R2_CDN_BASE_URL=https://eval.browseros.com
|
||||
```
|
||||
|
||||
`EVAL_R2_CDN_BASE_URL` must be a public R2 custom domain, `r2.dev` URL, or Worker URL. Do not set it to the private `*.r2.cloudflarestorage.com` S3 API endpoint.
|
||||
|
||||
Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
|
||||
### BrowserOS infrastructure
|
||||
@@ -215,7 +192,7 @@ Published runs are available at `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
"headless": true
|
||||
}
|
||||
```
|
||||
|
||||
@@ -276,35 +253,7 @@ results/
|
||||
summary.json # Aggregate pass rates
|
||||
```
|
||||
|
||||
R2 publishing preserves the task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
|
||||
### R2 viewer manifest
|
||||
|
||||
`runs/<run-id>/manifest.json` is the source of truth for the public viewer. New manifests include `schemaVersion: 2` and each task includes explicit artifact paths:
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": 2,
|
||||
"runId": "agisdk-real-smoke-2026-04-30-0000",
|
||||
"tasks": [
|
||||
{
|
||||
"queryId": "agisdk-dashdish-10",
|
||||
"paths": {
|
||||
"metadata": "tasks/agisdk-dashdish-10/metadata.json",
|
||||
"messages": "tasks/agisdk-dashdish-10/messages.jsonl",
|
||||
"grades": "tasks/agisdk-dashdish-10/grades.json",
|
||||
"trace": "tasks/agisdk-dashdish-10/trace.jsonl",
|
||||
"screenshots": "tasks/agisdk-dashdish-10/screenshots",
|
||||
"graderArtifacts": "tasks/agisdk-dashdish-10/grader-artifacts"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The static viewer uses `task.paths` when present. Older uploaded runs without `schemaVersion` or `task.paths` still work through the legacy inferred layout: `runs/<run-id>/<task-id>/metadata.json`, `messages.jsonl`, and `screenshots/<n>.png`.
|
||||
|
||||
Manifest paths are stable artifact locations, not a guarantee that every optional artifact exists for every task. For example, `attempt.json`, `trace.jsonl`, or grader artifact directories may be absent when that artifact was not produced by the run.
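The lookup order described above (explicit `task.paths` first, legacy inferred layout otherwise) could be sketched roughly as follows. This is an illustration, not the viewer's actual code: the names `TaskPaths`, `ViewerTask`, and `resolveTaskArtifacts` are hypothetical, and the sketch assumes that manifest paths are relative to `runs/<run-id>/` and that the legacy `<task-id>` directory equals `queryId`.

```typescript
// Hypothetical types; field names mirror the manifest example above.
interface TaskPaths {
  metadata: string
  messages: string
  screenshots: string
}

interface ViewerTask {
  queryId: string
  paths?: Partial<TaskPaths>
}

// Returns artifact locations relative to the bucket root.
// Uses task.paths entries when present, otherwise the legacy inferred layout.
// Assumption: manifest paths are relative to runs/<run-id>/.
function resolveTaskArtifacts(runId: string, task: ViewerTask): TaskPaths {
  const runRoot = `runs/${runId}`
  const legacyRoot = `${runRoot}/${task.queryId}`
  const pick = (explicit: string | undefined, legacy: string): string =>
    explicit ? `${runRoot}/${explicit}` : legacy
  return {
    metadata: pick(task.paths?.metadata, `${legacyRoot}/metadata.json`),
    messages: pick(task.paths?.messages, `${legacyRoot}/messages.jsonl`),
    screenshots: pick(task.paths?.screenshots, `${legacyRoot}/screenshots`),
  }
}
```

A resolved path only says where an artifact would live. As noted above, optional artifacts may still be absent, so a caller should treat a missing object as "not produced" rather than as an error.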
|
||||
R2 publishing preserves the same task files under `runs/<run-id>/...`, writes `runs/<run-id>/manifest.json`, and uploads `viewer.html` at the bucket root. The viewer URL is `EVAL_R2_CDN_BASE_URL/viewer.html?run=<run-id>`.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
||||
@@ -7,7 +7,7 @@
|
||||
"baseUrl": "https://openrouter.ai/api/v1",
|
||||
"supportsImages": true
|
||||
},
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"dataset": "../../data/webbench-2of4-50.jsonl",
|
||||
"num_workers": 10,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
@@ -21,6 +21,6 @@
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"graders": ["performance_grader"],
|
||||
"timeout_ms": 1800000
|
||||
}
|
||||
|
||||
@@ -23,7 +23,7 @@
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
"headless": true
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
{
|
||||
"agent": {
|
||||
"type": "claude-code",
|
||||
"model": "opus"
|
||||
},
|
||||
"dataset": "../../data/agisdk-real.jsonl",
|
||||
"num_workers": 1,
|
||||
"restart_server_per_task": true,
|
||||
"browseros": {
|
||||
"server_url": "http://127.0.0.1:9110",
|
||||
"base_cdp_port": 9010,
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
},
|
||||
"graders": ["agisdk_state_diff"],
|
||||
"timeout_ms": 1800000
|
||||
}
|
||||
@@ -14,7 +14,7 @@
|
||||
"base_server_port": 9110,
|
||||
"base_extension_port": 9310,
|
||||
"load_extensions": false,
|
||||
"headless": false
|
||||
"headless": true
|
||||
},
|
||||
"captcha": {
|
||||
"api_key_env": "NOPECHA_API_KEY"
|
||||
|
||||
@@ -5,7 +5,6 @@
|
||||
"type": "module",
|
||||
"scripts": {
|
||||
"eval": "bun --env-file=.env.development run src/index.ts",
|
||||
"test": "bun run ../../scripts/run-bun-test.ts ./apps/eval/tests",
|
||||
"typecheck": "tsc --noEmit"
|
||||
},
|
||||
"dependencies": {
|
||||
|
||||
@@ -24,11 +24,45 @@ import {
|
||||
PutObjectCommand,
|
||||
S3Client,
|
||||
} from '@aws-sdk/client-s3'
|
||||
import {
|
||||
buildRunSummaries,
|
||||
type ReportManifest,
|
||||
type RunSummary,
|
||||
} from '../src/reporting/run-summary'
|
||||
|
||||
interface ManifestTask {
|
||||
queryId: string
|
||||
query: string
|
||||
status: string
|
||||
durationMs: number
|
||||
screenshotCount: number
|
||||
graderResults: Record<string, { pass: boolean; score: number }>
|
||||
}
|
||||
|
||||
interface Manifest {
|
||||
runId: string
|
||||
uploadedAt: string
|
||||
agentConfig?: { type?: string; model?: string }
|
||||
dataset?: string
|
||||
summary?: { passRate?: number; avgDurationMs?: number }
|
||||
tasks: ManifestTask[]
|
||||
}
|
||||
|
||||
interface RunSummary {
|
||||
runId: string
|
||||
configName: string
|
||||
date: string
|
||||
avgScore: number
|
||||
total: number
|
||||
completed: number
|
||||
failed: number
|
||||
timeout: number
|
||||
avgDurationMs: number
|
||||
model: string
|
||||
dataset: string
|
||||
agentType: string
|
||||
}
|
||||
|
||||
const PASS_FAIL_GRADER_ORDER = [
|
||||
'agisdk_state_diff',
|
||||
'infinity_state',
|
||||
'performance_grader',
|
||||
]
|
||||
|
||||
function requireEnv(name: string): string {
|
||||
const value = process.env[name]
|
||||
@@ -53,7 +87,7 @@ const client = new S3Client({
|
||||
// Step 1: List all manifest.json files in runs/
|
||||
console.log('Scanning R2 for eval runs...')
|
||||
|
||||
const manifests: ReportManifest[] = []
|
||||
const manifests: Manifest[] = []
|
||||
let continuationToken: string | undefined
|
||||
|
||||
do {
|
||||
@@ -93,9 +127,64 @@ if (manifests.length === 0) {
|
||||
}
|
||||
|
||||
// Step 2: Build run summaries
|
||||
const runs: RunSummary[] = buildRunSummaries(manifests)
|
||||
const runs: RunSummary[] = manifests
|
||||
.map((m) => {
|
||||
const total = m.tasks.length
|
||||
const completed = m.tasks.filter((t) => t.status === 'completed').length
|
||||
const failed = m.tasks.filter((t) => t.status === 'failed').length
|
||||
const timeout = m.tasks.filter((t) => t.status === 'timeout').length
|
||||
|
||||
let scoredCount = 0
|
||||
let scoreSum = 0
|
||||
for (const task of m.tasks) {
|
||||
if (!task.graderResults) continue
|
||||
for (const name of PASS_FAIL_GRADER_ORDER) {
|
||||
if (task.graderResults[name]) {
|
||||
scoredCount++
|
||||
scoreSum += task.graderResults[name].score ?? 0
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const avgScore = scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
|
||||
const durations = m.tasks
|
||||
.filter((t) => t.durationMs > 0)
|
||||
.map((t) => t.durationMs)
|
||||
const avgDurationMs =
|
||||
durations.length > 0
|
||||
? durations.reduce((a, b) => a + b, 0) / durations.length
|
||||
: 0
|
||||
|
||||
const date = m.uploadedAt
|
||||
? `${m.uploadedAt.split('T')[0]} ${m.uploadedAt.split('T')[1]?.slice(0, 5) || ''}`
|
||||
: m.runId.slice(0, 15)
|
||||
|
||||
const model = m.agentConfig?.model || 'unknown'
|
||||
const dataset = m.dataset || m.runId
|
||||
const agentType = m.agentConfig?.type || 'unknown'
|
||||
|
||||
const configName = extractConfigName(m.runId)
|
||||
return {
|
||||
runId: m.runId,
|
||||
configName,
|
||||
date,
|
||||
avgScore,
|
||||
total,
|
||||
completed,
|
||||
failed,
|
||||
timeout,
|
||||
avgDurationMs,
|
||||
model,
|
||||
dataset,
|
||||
agentType,
|
||||
}
|
||||
})
|
||||
.sort((a, b) => a.date.localeCompare(b.date))
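The scoring loop above counts at most one grader per task: the first entry in `PASS_FAIL_GRADER_ORDER` that has a result. Tasks without any recognized grader result are left out of the denominator. A standalone sketch of that rule, using a hypothetical `averageScore` function and made-up sample data:

```typescript
type GraderResults = Record<string, { pass: boolean; score: number }>

// Same priority order as PASS_FAIL_GRADER_ORDER in the script above.
const GRADER_ORDER = ['agisdk_state_diff', 'infinity_state', 'performance_grader']

// Averages the first available grader score per task, as a percentage.
function averageScore(tasks: Array<{ graderResults?: GraderResults }>): number {
  let scored = 0
  let sum = 0
  for (const task of tasks) {
    const results = task.graderResults
    if (!results) continue
    // Pick the highest-priority grader that produced a result for this task.
    const hit = GRADER_ORDER.map((name) => results[name]).find((r) => r !== undefined)
    if (!hit) continue
    scored++
    sum += hit.score ?? 0
  }
  return scored > 0 ? (sum / scored) * 100 : 0
}

// Made-up data: the first task has two graders, but only agisdk_state_diff counts.
const sampleAverage = averageScore([
  {
    graderResults: {
      agisdk_state_diff: { pass: true, score: 1 },
      performance_grader: { pass: false, score: 0 },
    },
  },
  { graderResults: { performance_grader: { pass: false, score: 0.5 } } },
  {}, // no grader results: excluded from the average
])
```

Here `sampleAverage` is 75: two tasks are scored (1 and 0.5), and the third is skipped.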
|
||||
|
||||
// Step 3: Identify unique config groups
|
||||
// runId can be "ci-weekly" (old) or "ci-weekly-2026-03-21-1730" (timestamped)
|
||||
// Extract config name by stripping the date-time suffix pattern
|
||||
function escHtml(s: string): string {
|
||||
return s
|
||||
.replace(/&/g, '&amp;')
|
||||
@@ -104,6 +193,12 @@ function escHtml(s: string): string {
|
||||
.replace(/"/g, '&quot;')
|
||||
}
|
||||
|
||||
function extractConfigName(runId: string): string {
|
||||
// "browseros-agent-weekly-2026-03-21-1730" → "browseros-agent-weekly"
|
||||
// "ci-weekly" → "ci-weekly" (no timestamp, old format)
|
||||
return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
|
||||
}
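As a quick check of the suffix rule in `extractConfigName`, the sketch below repeats the regex from the function above so it runs on its own, then applies it to the two run ID shapes named in the comments. The third, date-only ID is a made-up case, added to show that only a full `-YYYY-MM-DD-HHMM` suffix is stripped.

```typescript
// Same regex as extractConfigName above, repeated so the snippet is self-contained.
function extractConfigName(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

const samples = [
  'browseros-agent-weekly-2026-03-21-1730', // timestamped: suffix is stripped
  'ci-weekly', // old format: unchanged
  'ci-weekly-2026-03-21', // hypothetical date-only suffix: unchanged
]
const names = samples.map(extractConfigName)
```

Because date-only IDs pass through unchanged, a run with such an ID would form its own config group rather than joining `ci-weekly`.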
|
||||
|
||||
const configGroups = [...new Set(runs.map((r) => r.configName))]
|
||||
const defaultConfig = configGroups.includes('ci-weekly')
|
||||
? 'ci-weekly'
|
||||
|
||||
@@ -1,238 +0,0 @@
|
||||
import { writeFile } from 'node:fs/promises'
|
||||
import { join } from 'node:path'
|
||||
import { DEFAULT_TIMEOUT_MS } from '../../constants'
|
||||
import type { ClaudeCodeAgentConfig, UIMessageStreamEvent } from '../../types'
|
||||
import { withEvalTimeout } from '../../utils/with-eval-timeout'
|
||||
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
|
||||
import {
|
||||
type ClaudeCodeProcessRunner,
|
||||
createClaudeCodeProcessRunner,
|
||||
} from './process-runner'
|
||||
import {
|
||||
ClaudeCodeStreamParser,
|
||||
shouldCaptureScreenshotForTool,
|
||||
} from './stream-parser'
|
||||
|
||||
export interface ClaudeCodeEvaluatorDeps {
|
||||
processRunner?: ClaudeCodeProcessRunner
|
||||
}
|
||||
|
||||
export class ClaudeCodeEvaluator implements AgentEvaluator {
|
||||
private processRunner: ClaudeCodeProcessRunner
|
||||
|
||||
constructor(
|
||||
private ctx: AgentContext,
|
||||
deps: ClaudeCodeEvaluatorDeps = {},
|
||||
) {
|
||||
this.processRunner = deps.processRunner ?? createClaudeCodeProcessRunner()
|
||||
}
|
||||
|
||||
async execute(): Promise<AgentResult> {
|
||||
const { config, task, capture, taskOutputDir } = this.ctx
|
||||
const startTime = Date.now()
|
||||
const timeoutMs = config.timeout_ms ?? DEFAULT_TIMEOUT_MS
|
||||
|
||||
await capture.messageLogger.logUser(task.query)
|
||||
|
||||
if (config.agent.type !== 'claude-code') {
|
||||
throw new Error('ClaudeCodeEvaluator only supports claude-code config')
|
||||
}
|
||||
const agentConfig = config.agent
|
||||
|
||||
const mcpConfigPath = join(taskOutputDir, 'claude-code-mcp.json')
|
||||
await writeFile(
|
||||
mcpConfigPath,
|
||||
JSON.stringify(
|
||||
buildClaudeCodeMcpConfig(config.browseros.server_url),
|
||||
null,
|
||||
2,
|
||||
),
|
||||
)
|
||||
|
||||
const parser = new ClaudeCodeStreamParser()
|
||||
const toolNamesById = new Map<string, string>()
|
||||
const prompt = buildClaudeCodePrompt(task.query)
|
||||
const args = buildClaudeCodeArgs({
|
||||
prompt,
|
||||
mcpConfigPath,
|
||||
config: agentConfig,
|
||||
})
|
||||
|
||||
const { terminationReason } = await withEvalTimeout(
|
||||
timeoutMs,
|
||||
capture,
|
||||
async (signal) => {
|
||||
const runResult = await this.processRunner.run({
|
||||
executable: agentConfig.claudePath,
|
||||
args,
|
||||
cwd: taskOutputDir,
|
||||
signal,
|
||||
onStdoutLine: async (line) => {
|
||||
const events = parser.pushLine(line)
|
||||
for (const event of events) {
|
||||
await this.handleStreamEvent(event, toolNamesById)
|
||||
}
|
||||
},
|
||||
})
|
||||
|
||||
if (runResult.exitCode !== 0) {
|
||||
const message =
|
||||
runResult.stderr.trim() ||
|
||||
`Claude Code exited with status ${runResult.exitCode}`
|
||||
capture.addError('agent_execution', message, {
|
||||
exitCode: runResult.exitCode,
|
||||
})
|
||||
if (!parser.getLastText()) {
|
||||
throw new Error(message)
|
||||
}
|
||||
}
|
||||
|
||||
for (const error of runResult.streamErrors ?? []) {
|
||||
capture.addWarning(
|
||||
'message_logging',
|
||||
`Claude Code stream event processing failed: ${error}`,
|
||||
)
|
||||
}
|
||||
|
||||
return runResult
|
||||
},
|
||||
)
|
||||
|
||||
const endTime = Date.now()
|
||||
const finalAnswer = parser.getLastText() ?? capture.getLastAssistantText()
|
||||
const metadata = {
|
||||
query_id: task.query_id,
|
||||
dataset: task.dataset,
|
||||
query: task.query,
|
||||
started_at: new Date(startTime).toISOString(),
|
||||
completed_at: new Date(endTime).toISOString(),
|
||||
total_duration_ms: endTime - startTime,
|
||||
total_steps: parser.getToolCallCount() || capture.getScreenshotCount(),
|
||||
termination_reason: terminationReason,
|
||||
final_answer: finalAnswer,
|
||||
errors: capture.getErrors(),
|
||||
warnings: capture.getWarnings(),
|
||||
device_pixel_ratio: capture.screenshot.getDevicePixelRatio(),
|
||||
agent_config: {
|
||||
type: 'claude-code' as const,
|
||||
model: agentConfig.model,
|
||||
},
|
||||
grader_results: {},
|
||||
}
|
||||
|
||||
await capture.trajectorySaver.saveMetadata(metadata)
|
||||
|
||||
return {
|
||||
metadata,
|
||||
messages: capture.getMessages(),
|
||||
finalAnswer,
|
||||
}
|
||||
}
|
||||
|
||||
private async handleStreamEvent(
|
||||
event: UIMessageStreamEvent,
|
||||
toolNamesById: Map<string, string>,
|
||||
): Promise<void> {
|
||||
const { capture, task } = this.ctx
|
||||
let screenshot: number | undefined
|
||||
|
||||
if (event.type === 'tool-input-available') {
|
||||
toolNamesById.set(event.toolCallId, event.toolName)
|
||||
if (isPageInput(event.input)) {
|
||||
capture.setActivePageId(event.input.page)
|
||||
}
|
||||
}
|
||||
|
||||
if (
|
||||
event.type === 'tool-output-available' ||
|
||||
event.type === 'tool-output-error'
|
||||
) {
|
||||
const toolName = toolNamesById.get(event.toolCallId)
|
||||
if (toolName && shouldCaptureScreenshotForTool(toolName)) {
|
||||
screenshot = await this.captureScreenshot()
|
||||
}
|
||||
}
|
||||
|
||||
await capture.messageLogger.logStreamEvent(event, screenshot)
|
||||
capture.emitEvent(task.query_id, {
|
||||
...event,
|
||||
...(screenshot !== undefined && { screenshot }),
|
||||
})
|
||||
}
|
||||
|
||||
private async captureScreenshot(): Promise<number | undefined> {
|
||||
const { capture, task } = this.ctx
|
||||
try {
|
||||
const screenshot = await capture.screenshot.capture(
|
||||
capture.getActivePageId(),
|
||||
)
|
||||
capture.emitEvent(task.query_id, {
|
||||
type: 'screenshot-captured',
|
||||
screenshot,
|
||||
})
|
||||
return screenshot
|
||||
} catch {
|
||||
return undefined
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function isPageInput(input: unknown): input is { page: number } {
|
||||
return (
|
||||
typeof input === 'object' &&
|
||||
input !== null &&
|
||||
'page' in input &&
|
||||
typeof input.page === 'number'
|
||||
)
|
||||
}
|
||||
|
||||
function buildClaudeCodePrompt(taskQuery: string): string {
|
||||
return [
|
||||
'You are running inside BrowserOS eval.',
|
||||
'Use the BrowserOS MCP tools to interact with the already-open browser and complete the user task.',
|
||||
'When the task is complete, respond with the final answer only.',
|
||||
'If blocked, explain the blocker clearly.',
|
||||
'',
|
||||
`Task: ${taskQuery}`,
|
||||
].join('\n')
|
||||
}
|
||||
|
||||
function buildClaudeCodeArgs({
|
||||
prompt,
|
||||
mcpConfigPath,
|
||||
config,
|
||||
}: {
|
||||
prompt: string
|
||||
mcpConfigPath: string
|
||||
config: ClaudeCodeAgentConfig
|
||||
}): string[] {
|
||||
const args = [
|
||||
'-p',
|
||||
prompt,
|
||||
'--mcp-config',
|
||||
mcpConfigPath,
|
||||
'--strict-mcp-config',
|
||||
'--output-format',
|
||||
'stream-json',
|
||||
'--verbose',
|
||||
]
|
||||
|
||||
if (config.model) args.push('--model', config.model)
|
||||
args.push(...config.extraArgs)
|
||||
|
||||
return args
|
||||
}
|
||||
|
||||
function buildClaudeCodeMcpConfig(serverUrl: string) {
|
||||
const trimmed = serverUrl.replace(/\/$/, '')
|
||||
const url = trimmed.endsWith('/mcp') ? trimmed : `${trimmed}/mcp`
|
||||
return {
|
||||
mcpServers: {
|
||||
browseros: {
|
||||
type: 'http',
|
||||
url,
|
||||
headers: { 'X-BrowserOS-Source': 'sdk-internal' },
|
||||
},
|
||||
},
|
||||
}
|
||||
}
@@ -1,114 +0,0 @@
export interface ClaudeCodeRunOptions {
  executable: string
  args: string[]
  cwd: string
  signal?: AbortSignal
  onStdoutLine: (line: string) => Promise<void>
}

export interface ClaudeCodeRunResult {
  exitCode: number
  stderr: string
  streamErrors?: string[]
}

export interface ClaudeCodeProcessRunner {
  run(options: ClaudeCodeRunOptions): Promise<ClaudeCodeRunResult>
}

export interface SpawnOptions {
  cwd: string
  signal?: AbortSignal
  onStdoutLine: (line: string) => Promise<void>
}

export interface CreateClaudeCodeProcessRunnerDeps {
  spawn?: (cmd: string[], options: SpawnOptions) => Promise<ClaudeCodeRunResult>
}

export function createClaudeCodeProcessRunner(
  deps: CreateClaudeCodeProcessRunnerDeps = {},
): ClaudeCodeProcessRunner {
  const spawn = deps.spawn ?? spawnClaudeCode
  return {
    run: async ({ executable, args, cwd, signal, onStdoutLine }) =>
      spawn([executable, ...args], { cwd, signal, onStdoutLine }),
  }
}

async function spawnClaudeCode(
  cmd: string[],
  options: SpawnOptions,
): Promise<ClaudeCodeRunResult> {
  const proc = Bun.spawn({
    cmd,
    cwd: options.cwd,
    stdin: 'ignore',
    stdout: 'pipe',
    stderr: 'pipe',
  })

  const abort = () => {
    try {
      proc.kill('SIGTERM')
    } catch {
      // Process may already have exited.
    }
  }
  options.signal?.addEventListener('abort', abort, { once: true })

  try {
    const streamErrors: string[] = []
    const stdoutPromise = readLines(
      proc.stdout,
      options.onStdoutLine,
      streamErrors,
    )
    const stderrPromise = new Response(proc.stderr).text()
    const exitCode = await proc.exited
    await stdoutPromise
    const stderr = await stderrPromise
    return { exitCode, stderr, streamErrors }
  } finally {
    options.signal?.removeEventListener('abort', abort)
  }
}

async function readLines(
  stream: ReadableStream<Uint8Array>,
  onLine: (line: string) => Promise<void>,
  streamErrors: string[],
): Promise<void> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''
    for (const line of lines) {
      await emitLine(line, onLine, streamErrors)
    }
  }

  buffer += decoder.decode()
  if (buffer.length > 0) {
    await emitLine(buffer, onLine, streamErrors)
  }
}

async function emitLine(
  line: string,
  onLine: (line: string) => Promise<void>,
  streamErrors: string[],
): Promise<void> {
  try {
    await onLine(line)
  } catch (error) {
    streamErrors.push(error instanceof Error ? error.message : String(error))
  }
}
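The buffering in `readLines` can be sketched without a stream: keep the unfinished tail of each chunk in a buffer and emit only complete lines. The function `splitChunksIntoLines` below is a hypothetical, synchronous stand-in that takes already-decoded string chunks; the original reads bytes from a `ReadableStream` and awaits a callback per line.

```typescript
// Hypothetical synchronous stand-in for the buffering loop in readLines.
function splitChunksIntoLines(chunks: string[]): string[] {
  const lines: string[] = []
  let buffer = ''
  for (const chunk of chunks) {
    buffer += chunk
    const parts = buffer.split('\n')
    // The last element is an incomplete line, or '' if the chunk ended on '\n'.
    buffer = parts.pop() ?? ''
    lines.push(...parts)
  }
  // Flush the remainder when the input ends without a trailing newline.
  if (buffer.length > 0) lines.push(buffer)
  return lines
}

// Two JSON lines split across three chunks at arbitrary boundaries (made-up data).
const reassembled = splitChunksIntoLines([
  '{"type":"assi',
  'stant"}\n{"type":',
  '"result"}',
])
console.log(reassembled)
```

Chunk boundaries do not have to align with newlines: a line is emitted only once its terminating `\n` has arrived, or when the input ends.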
@@ -1,142 +0,0 @@
import { randomUUID } from 'node:crypto'
import type { UIMessageStreamEvent } from '../../types'

type JsonObject = Record<string, unknown>

export class ClaudeCodeStreamParser {
  private lastText: string | null = null
  private toolCallCount = 0

  pushLine(line: string): UIMessageStreamEvent[] {
    const trimmed = line.trim()
    if (!trimmed) return []

    let parsed: unknown
    try {
      parsed = JSON.parse(trimmed)
    } catch {
      return []
    }

    if (!isObject(parsed)) return []

    if (parsed.type === 'assistant') {
      return this.parseAssistantMessage(parsed)
    }
    if (parsed.type === 'user') {
      return this.parseUserMessage(parsed)
    }
    if (parsed.type === 'result' && typeof parsed.result === 'string') {
      this.lastText = parsed.result
    }

    return []
  }

  getLastText(): string | null {
    return this.lastText
  }

  getToolCallCount(): number {
    return this.toolCallCount
  }

  private parseAssistantMessage(message: JsonObject): UIMessageStreamEvent[] {
    const content = contentBlocks(message)
    const events: UIMessageStreamEvent[] = []

    for (const block of content) {
      if (block.type === 'text' && typeof block.text === 'string') {
        const id = randomUUID()
        this.lastText = block.text
        events.push(
          { type: 'text-start', id },
          { type: 'text-delta', id, delta: block.text },
          { type: 'text-end', id },
        )
      } else if (
        block.type === 'tool_use' &&
        typeof block.id === 'string' &&
        typeof block.name === 'string'
      ) {
        this.toolCallCount++
        events.push({
          type: 'tool-input-available',
          toolCallId: block.id,
          toolName: block.name,
          input: block.input,
        })
      }
    }

    return events
  }

  private parseUserMessage(message: JsonObject): UIMessageStreamEvent[] {
    const content = contentBlocks(message)
    const events: UIMessageStreamEvent[] = []

    for (const block of content) {
      if (
        block.type !== 'tool_result' ||
        typeof block.tool_use_id !== 'string'
      ) {
        continue
      }

      if (block.is_error === true) {
        events.push({
          type: 'tool-output-error',
          toolCallId: block.tool_use_id,
          errorText: stringifyToolContent(block.content),
        })
      } else {
        events.push({
          type: 'tool-output-available',
          toolCallId: block.tool_use_id,
          output: normalizeToolContent(block.content),
        })
      }
    }

    return events
  }
}

export function shouldCaptureScreenshotForTool(toolName: string): boolean {
  if (!toolName.startsWith('mcp__browseros__')) return false
  return !toolName.endsWith('__take_screenshot')
}

function contentBlocks(message: JsonObject): JsonObject[] {
  const inner = isObject(message.message) ? message.message : message
  return Array.isArray(inner.content) ? inner.content.filter(isObject) : []
}

function isObject(value: unknown): value is JsonObject {
  return typeof value === 'object' && value !== null
}

function normalizeToolContent(content: unknown): unknown {
  if (!Array.isArray(content)) return content
  return content.map((item) => {
    if (
      isObject(item) &&
      item.type === 'text' &&
      typeof item.text === 'string'
    ) {
      return item.text
    }
    return item
  })
}

function stringifyToolContent(content: unknown): string {
  const normalized = normalizeToolContent(content)
  if (typeof normalized === 'string') return normalized
  try {
    return JSON.stringify(normalized)
  } catch {
    return String(normalized)
  }
}
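To make the accepted input shape concrete, here is a reduced sketch of the `assistant` branch of the parser. It returns only the names of `tool_use` blocks found in one line. The function `toolNamesFromLine`, the sample line, and the tool name `mcp__browseros__navigate` are assumptions built to satisfy the checks visible in the parser, not captured from a real run.

```typescript
type JsonObject = Record<string, unknown>

function isObject(value: unknown): value is JsonObject {
  return typeof value === 'object' && value !== null
}

// Hypothetical reduced parser: names of tool_use blocks in one stdout line.
function toolNamesFromLine(line: string): string[] {
  let parsed: unknown
  try {
    parsed = JSON.parse(line.trim())
  } catch {
    return [] // Non-JSON lines are ignored, as in pushLine.
  }
  if (!isObject(parsed) || parsed.type !== 'assistant') return []
  // Content may sit under `message` or directly on the object.
  const inner = isObject(parsed.message) ? parsed.message : parsed
  const content = Array.isArray(inner.content)
    ? inner.content.filter(isObject)
    : []
  return content
    .filter((block) => block.type === 'tool_use' && typeof block.name === 'string')
    .map((block) => block.name as string)
}

// Made-up sample line with one text block and one tool_use block.
const sampleLine = JSON.stringify({
  type: 'assistant',
  message: {
    content: [
      { type: 'text', text: 'Opening the page.' },
      {
        type: 'tool_use',
        id: 'toolu_1',
        name: 'mcp__browseros__navigate',
        input: { url: 'https://example.com' },
      },
    ],
  },
})
const toolNames = toolNamesFromLine(sampleLine)
const ignoredLine = toolNamesFromLine('not json')
console.log(toolNames, ignoredLine)
```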
@@ -1,4 +1,3 @@
-import { ClaudeCodeEvaluator } from './claude-code'
 import { OrchestratorExecutorEvaluator } from './orchestrator-executor'
 import { SingleAgentEvaluator } from './single-agent'
 import type { AgentContext, AgentEvaluator } from './types'
@@ -9,8 +8,6 @@ export function createAgent(context: AgentContext): AgentEvaluator {
       return new SingleAgentEvaluator(context)
     case 'orchestrator-executor':
       return new OrchestratorExecutorEvaluator(context)
-    case 'claude-code':
-      return new ClaudeCodeEvaluator(context)
   }
 }
 
@@ -1,56 +0,0 @@
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type {
  DelegationResult,
  ExecutorBackend,
  ExecutorCallbacks,
} from '../../executor-backend'
import { CladoActionExecutor } from './clado-action-executor'

export interface CladoExecutorBackendOptions {
  configTemplate: ResolvedAgentConfig
  serverUrl: string
  initialPageId?: number
  callbacks?: ExecutorCallbacks
}

/** Executes delegated goals through the Clado visual action model. */
export class CladoExecutorBackend implements ExecutorBackend {
  readonly kind = 'clado'
  private executor: CladoActionExecutor | null = null

  constructor(private readonly options: CladoExecutorBackendOptions) {}

  async execute(
    instruction: string,
    signal?: AbortSignal,
  ): Promise<DelegationResult> {
    const executor = this.getExecutor()
    const result = await executor.execute(instruction, signal)
    return result
  }

  async close(): Promise<void> {
    await this.executor?.close()
  }

  getTotalSteps(): number {
    return this.executor?.getTotalSteps() ?? 0
  }

  private getExecutor(): CladoActionExecutor {
    if (this.executor) return this.executor

    this.executor = new CladoActionExecutor(
      {
        provider: this.options.configTemplate.provider,
        model: this.options.configTemplate.model,
        apiKey: this.options.configTemplate.apiKey ?? '',
        baseUrl: this.options.configTemplate.baseUrl,
      },
      this.options.serverUrl,
      this.options.initialPageId,
    )
    this.executor.setCallbacks(this.options.callbacks ?? {})
    return this.executor
  }
}
@@ -1,13 +1,8 @@
 import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
 import type { Browser } from '@browseros/server/browser'
-import type {
-  ExecutorBackend,
-  ExecutorBackendKind,
-  ExecutorCallbacks,
-} from '../executor-backend'
-import { CladoExecutorBackend } from './clado/clado-executor-backend'
-import { isCladoActionProvider } from './clado/types'
-import { ToolLoopExecutorBackend } from './tool-loop/tool-loop-executor-backend'
+import type { ExecutorCallbacks } from '../../orchestrator-executor/executor'
+import type { ExecutorBackend, ExecutorBackendKind } from '../executor-backend'
+import { ExecutorAdapterBackend } from './tool-loop-backend'
 
 export interface CreateExecutorBackendOptions {
   backendKind?: ExecutorBackendKind
@@ -23,38 +18,28 @@ export interface CreateExecutorBackendOptions {
 }
 
 export function backendKindForProvider(provider: string): ExecutorBackendKind {
-  return isCladoActionProvider(provider) ? 'clado' : 'tool-loop'
+  return provider === 'clado-action' ? 'clado' : 'tool-loop'
 }
 
 /** Creates the backend used for one orchestrator delegation. */
 export function createExecutorBackend(
   options: CreateExecutorBackendOptions,
 ): ExecutorBackend {
-  if (options.executor) return options.executor
-
   const kind =
     options.backendKind ??
     backendKindForProvider(
       options.provider ?? options.configTemplate?.provider ?? '',
     )
 
-  if (kind === 'clado') {
-    return new CladoExecutorBackend({
-      configTemplate: required(options.configTemplate, 'configTemplate'),
-      serverUrl: required(options.serverUrl, 'serverUrl'),
-      initialPageId: options.initialPageId,
-      callbacks: options.callbacks,
-    })
-  }
-
-  return new ToolLoopExecutorBackend({
-    configTemplate: required(options.configTemplate, 'configTemplate'),
-    browser: options.browser ?? null,
+  return new ExecutorAdapterBackend({
+    kind,
+    configTemplate: options.configTemplate,
+    browser: options.browser,
+    serverUrl: options.serverUrl,
+    windowId: options.windowId,
+    tabId: options.tabId,
+    initialPageId: options.initialPageId,
     callbacks: options.callbacks,
+    executor: options.executor,
   })
 }
-
-function required<T>(value: T | undefined, name: string): T {
-  if (value === undefined) throw new Error(`${name} is required`)
-  return value
-}
72
packages/browseros-agent/apps/eval/src/agents/orchestrated/backends/tool-loop-backend.ts
vendored
Normal file
@@ -0,0 +1,72 @@
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import {
  Executor,
  type ExecutorCallbacks,
} from '../../orchestrator-executor/executor'
import type {
  DelegationResult,
  ExecutorBackend,
  ExecutorBackendKind,
} from '../executor-backend'

interface ExecutorRunner {
  execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
  close(): Promise<void>
  getTotalSteps(): number
}

export interface ExecutorAdapterBackendOptions {
  kind: ExecutorBackendKind
  configTemplate?: ResolvedAgentConfig
  browser?: Browser | null
  serverUrl?: string
  windowId?: number
  tabId?: number
  initialPageId?: number
  callbacks?: ExecutorCallbacks
  executor?: ExecutorRunner
}

export class ExecutorAdapterBackend implements ExecutorBackend {
  readonly kind: ExecutorBackendKind
  private readonly executor: ExecutorRunner

  constructor(options: ExecutorAdapterBackendOptions) {
    this.kind = options.kind
    this.executor =
      options.executor ??
      new Executor(
        required(options.configTemplate, 'configTemplate'),
        options.browser ?? null,
        required(options.serverUrl, 'serverUrl'),
        {
          isCladoAction: options.kind === 'clado',
          windowId: options.windowId,
          tabId: options.tabId,
          initialPageId: options.initialPageId,
          callbacks: options.callbacks,
        },
      )
  }

  execute(
    instruction: string,
    signal?: AbortSignal,
  ): Promise<DelegationResult> {
    return this.executor.execute(instruction, signal)
  }

  close(): Promise<void> {
    return this.executor.close()
  }

  getTotalSteps(): number {
    return this.executor.getTotalSteps()
  }
}

function required<T>(value: T | undefined, name: string): T {
  if (value === undefined) throw new Error(`${name} is required`)
  return value
}
@@ -1,144 +0,0 @@
import { randomUUID } from 'node:crypto'
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import { registry } from '@browseros/server/tools/registry'
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
import type {
  DelegationResult,
  ExecutorBackend,
  ExecutorCallbacks,
} from '../../executor-backend'
import { TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT } from './tool-loop-executor-prompt'

export interface ToolLoopExecutorBackendOptions {
  configTemplate: ResolvedAgentConfig
  browser: Browser | null
  callbacks?: ExecutorCallbacks
}

/** Executes delegated goals through the BrowserOS ToolLoopAgent. */
export class ToolLoopExecutorBackend implements ExecutorBackend {
  readonly kind = 'tool-loop'
  private stepsUsed = 0
  private currentUrl = ''

  constructor(private readonly options: ToolLoopExecutorBackendOptions) {}

  async execute(
    instruction: string,
    signal?: AbortSignal,
  ): Promise<DelegationResult> {
    const browser = this.options.browser
    if (!browser) {
      throw new Error('Browser instance is required for tool-loop executor')
    }

    const stepsAtStart = this.stepsUsed
    const toolsUsed: string[] = []
    let status: DelegationResult['status'] = 'done'
    let resultText = ''

    const conversationId = randomUUID()
    const agentConfig: ResolvedAgentConfig = {
      ...this.options.configTemplate,
      conversationId,
      userSystemPrompt: TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT,
      evalMode: true,
      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
    }

    const browserContext = await this.browserContext(browser)
    let agent: AiSdkAgent | null = null

    try {
      agent = await AiSdkAgent.create({
        resolvedConfig: agentConfig,
        browser,
        registry,
        browserContext,
      })

      await agent.toolLoopAgent.generate({
        prompt: instruction,
        abortSignal: signal,

        experimental_onToolCallStart: ({ toolCall }) => {
          const input = toolCall.input as Record<string, unknown> | undefined
          if (input && typeof input.url === 'string' && input.url.length > 0) {
            this.currentUrl = input.url
          }
          this.options.callbacks?.onToolCallStart?.({
            toolCallId: toolCall.toolCallId,
            toolName: toolCall.toolName,
            input: toolCall.input,
          })
        },

        experimental_onToolCallFinish: async () => {
          this.stepsUsed++
          await this.options.callbacks?.onToolCallFinish?.()
        },

        onStepFinish: async ({ toolCalls, toolResults, text }) => {
          if (toolCalls) {
            for (const toolCall of toolCalls) {
              if (!toolsUsed.includes(toolCall.toolName)) {
                toolsUsed.push(toolCall.toolName)
              }
            }
          }

          if (text) resultText = text

          await this.options.callbacks?.onStepFinish?.({
            toolCalls,
            toolResults,
            text,
          })
        },
      })
    } catch {
      status = signal?.aborted ? 'timeout' : 'blocked'
    } finally {
      if (agent) await agent.dispose().catch(() => {})
    }

    if (status === 'done' && signal?.aborted) {
      status = 'timeout'
    }

    return {
      observation: resultText || 'Execution completed with no actions taken.',
      status,
      url: this.currentUrl,
      actionsPerformed: this.stepsUsed - stepsAtStart,
      toolsUsed,
    }
  }

  async close(): Promise<void> {
    // No persistent resources; AiSdkAgent is disposed at the end of each execute() call.
  }

  getTotalSteps(): number {
    return this.stepsUsed
  }

  private async browserContext(
    browser: Browser,
  ): Promise<BrowserContext | undefined> {
    const pages = await browser.listPages()
    const activePage = pages[0]
    if (!activePage) return undefined

    return {
      activeTab: {
        id: activePage.tabId,
        pageId: activePage.pageId,
        url: activePage.url,
        title: activePage.title,
      },
    }
  }
}
@@ -1,21 +0,0 @@
export const TOOL_LOOP_EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.

## Your Job
1. Execute browser actions to achieve the given goal
2. Stop as soon as the goal is accomplished -- do NOT perform extra actions
3. Write a final observation describing the result

## Final Response Format
When done, your response MUST include:
- What you accomplished (or what went wrong)
- What the page currently shows: key headings, links, data, or content visible
- The current URL from the address bar
- If you got stuck, what is blocking progress

## Rules
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
- If the goal is to click something, confirm the result of the click.
- If you cannot find what was asked for, say so clearly -- do not guess or improvise.
- Prefer browser_navigate over browser_open_tab for going to URLs.
- Do NOT call browser_group_tabs or other organizational tools.`
@@ -3,28 +3,6 @@ import type { ExecutorResult } from '../orchestrator-executor/types'
 export type ExecutorBackendKind = 'tool-loop' | 'clado'
 export type DelegationResult = ExecutorResult
 
-export interface ToolCallInfo {
-  toolCallId: string
-  toolName: string
-  input: unknown
-}
-
-export interface ToolResultInfo {
-  toolCallId: string
-  toolName: string
-  output: unknown
-}
-
-export interface ExecutorCallbacks {
-  onToolCallStart?: (toolCall: ToolCallInfo) => void
-  onToolCallFinish?: () => Promise<void>
-  onStepFinish?: (step: {
-    toolCalls?: ReadonlyArray<ToolCallInfo>
-    toolResults?: ReadonlyArray<ToolResultInfo>
-    text?: string
-  }) => Promise<void>
-}
-
 export interface ExecutorBackend {
   readonly kind: ExecutorBackendKind
   execute(instruction: string, signal?: AbortSignal): Promise<DelegationResult>
@@ -1,27 +1,22 @@
 import { randomUUID } from 'node:crypto'
-import { MAX_ACTIONS_PER_DELEGATION } from '../../../../constants'
-import { McpClient, type McpToolResult } from '../../../../utils/mcp-client'
-import { sleep } from '../../../../utils/sleep'
-import type {
-  ExecutorConfig,
-  ExecutorResult,
-} from '../../../orchestrator-executor/types'
-import type { ExecutorCallbacks } from '../../executor-backend'
+import { MAX_ACTIONS_PER_DELEGATION } from '../../constants'
+import { McpClient, type McpToolResult } from '../../utils/mcp-client'
+import { sleep } from '../../utils/sleep'
 import {
   extractCladoThinking,
   formatCladoHistory,
   getCladoActionSignature,
   parseCladoActions,
   summarizeCladoPrediction,
-} from './clado-actions'
+} from '../orchestrated/backends/clado/clado-actions'
 import {
   normalizeCladoDirection,
   normalizeCladoPressKey,
   normalizeCladoScrollAmount,
   prepareCladoToolArgs,
   resolveCladoPoint,
-} from './clado-browser-driver'
-import { CladoActionClient } from './clado-client'
+} from '../orchestrated/backends/clado/clado-browser-driver'
+import { CladoActionClient } from '../orchestrated/backends/clado/clado-client'
 import {
   CLADO_ACTION_PROVIDER,
   type CladoAction,
@@ -29,7 +24,9 @@ import {
   type CladoActionResponse,
   type CladoViewport,
   isCladoActionProvider,
-} from './types'
+} from '../orchestrated/backends/clado/types'
+import type { ExecutorCallbacks } from './executor'
+import type { ExecutorConfig, ExecutorResult } from './types'
 
 const MAX_CONSECUTIVE_PARSE_FAILURES = 3
 
@@ -48,8 +45,10 @@ export class CladoActionExecutor {
   private currentUrl = ''
 
   constructor(
-    config: ExecutorConfig,
+    private readonly config: ExecutorConfig,
     serverUrl: string,
+    readonly _windowId?: number,
+    readonly _tabId?: number,
     initialPageId?: number,
   ) {
     if (!isCladoActionProvider(config.provider)) {
243
packages/browseros-agent/apps/eval/src/agents/orchestrator-executor/executor.ts
vendored
Normal file
@@ -0,0 +1,243 @@
/**
 * Executor - Wraps AiSdkAgent for page-level browser actions (direct CDP)
 *
 * The executor:
 * - Receives goal-level instructions from orchestrator
 * - Executes browser actions until the goal is accomplished
 * - Returns observation to orchestrator (not full history)
 */

import { randomUUID } from 'node:crypto'
import { AiSdkAgent } from '@browseros/server/agent/tool-loop'
import type { ResolvedAgentConfig } from '@browseros/server/agent/types'
import type { Browser } from '@browseros/server/browser'
import { registry } from '@browseros/server/tools/registry'
import type { BrowserContext } from '@browseros/shared/schemas/browser-context'
import { CladoActionExecutor } from './clado-action-executor'
import type { ExecutorResult } from './types'

const EXECUTOR_SYSTEM_PROMPT = `You are a browser executor. You receive a single goal-level instruction and execute it using browser tools.

## Your Job
1. Execute browser actions to achieve the given goal
2. Stop as soon as the goal is accomplished — do NOT perform extra actions
3. Write a final observation describing the result

## Final Response Format
When done, your response MUST include:
- What you accomplished (or what went wrong)
- What the page currently shows: key headings, links, data, or content visible
- The current URL from the address bar
- If you got stuck, what is blocking progress

## Rules
- Only do what was asked. Do not navigate away, open extra tabs, or reorganize the browser.
- If the goal is to navigate somewhere, confirm you arrived by describing what you see.
- If the goal is to click something, confirm the result of the click.
- If you cannot find what was asked for, say so clearly — do not guess or improvise.
- Prefer browser_navigate over browser_open_tab for going to URLs.
- Do NOT call browser_group_tabs or other organizational tools.`

export interface ToolCallInfo {
  toolCallId: string
  toolName: string
  input: unknown
}

export interface ToolResultInfo {
  toolCallId: string
  toolName: string
  output: unknown
}

export interface ExecutorCallbacks {
  onToolCallStart?: (toolCall: ToolCallInfo) => void
  onToolCallFinish?: () => Promise<void>
  onStepFinish?: (step: {
    toolCalls?: ReadonlyArray<ToolCallInfo>
    toolResults?: ReadonlyArray<ToolResultInfo>
    text?: string
  }) => Promise<void>
}

export class Executor {
  private cladoExecutor: CladoActionExecutor | null = null
  private stepsUsed = 0
  private currentUrl = ''
  private configTemplate: ResolvedAgentConfig
  private isCladoAction: boolean
  private browser: Browser | null
  private serverUrl: string
  private windowId?: number
  private tabId?: number
  private initialPageId?: number
  private callbacks: ExecutorCallbacks

  constructor(
    configTemplate: ResolvedAgentConfig,
    browser: Browser | null,
    serverUrl: string,
    options?: {
      isCladoAction?: boolean
      windowId?: number
      tabId?: number
      initialPageId?: number
      callbacks?: ExecutorCallbacks
    },
  ) {
    this.configTemplate = configTemplate
    this.isCladoAction = options?.isCladoAction ?? false
    this.browser = browser
    this.serverUrl = serverUrl
    this.windowId = options?.windowId
    this.tabId = options?.tabId
    this.initialPageId = options?.initialPageId
    this.callbacks = options?.callbacks ?? {}
  }

  async execute(
    instruction: string,
    signal?: AbortSignal,
  ): Promise<ExecutorResult> {
    if (this.isCladoAction) {
      if (!this.cladoExecutor) {
        this.cladoExecutor = new CladoActionExecutor(
          {
            provider: this.configTemplate.provider,
            model: this.configTemplate.model,
            apiKey: this.configTemplate.apiKey ?? '',
            baseUrl: this.configTemplate.baseUrl,
          },
          this.serverUrl,
          this.windowId,
          this.tabId,
          this.initialPageId,
        )
        this.cladoExecutor.setCallbacks(this.callbacks)
      }

      const result = await this.cladoExecutor.execute(instruction, signal)
      this.stepsUsed = this.cladoExecutor.getTotalSteps()
      this.currentUrl = result.url || this.currentUrl
      return result
    }

    if (!this.browser) {
      throw new Error('Browser instance is required for standard executor path')
    }

    const stepsAtStart = this.stepsUsed
    const toolsUsed: string[] = []
    let status: 'done' | 'blocked' | 'timeout' = 'done'
    let resultText = ''

    const conversationId = randomUUID()
    const agentConfig: ResolvedAgentConfig = {
      ...this.configTemplate,
      conversationId,
      userSystemPrompt: EXECUTOR_SYSTEM_PROMPT,
      evalMode: true,
      workingDir: `/tmp/browseros-eval-executor-${conversationId}`,
    }

    // Build browser context so executor agent knows the correct page ID
    let browserContext: BrowserContext | undefined
    if (this.browser) {
      const pages = await this.browser.listPages()
      const activePage = pages[0]
      if (activePage) {
        browserContext = {
          activeTab: {
            id: activePage.tabId,
            pageId: activePage.pageId,
            url: activePage.url,
            title: activePage.title,
          },
        }
      }
    }

    let agent: AiSdkAgent | null = null

    try {
      agent = await AiSdkAgent.create({
        resolvedConfig: agentConfig,
        browser: this.browser,
        registry,
        browserContext,
      })

      await agent.toolLoopAgent.generate({
        prompt: instruction,
        abortSignal: signal,

        experimental_onToolCallStart: ({ toolCall }) => {
          const input = toolCall.input as Record<string, unknown> | undefined
          if (input && typeof input.url === 'string' && input.url.length > 0) {
            this.currentUrl = input.url
          }
          this.callbacks.onToolCallStart?.({
            toolCallId: toolCall.toolCallId,
            toolName: toolCall.toolName,
            input: toolCall.input,
          })
        },

        experimental_onToolCallFinish: async () => {
          this.stepsUsed++
          await this.callbacks.onToolCallFinish?.()
        },

        onStepFinish: async ({ toolCalls, toolResults, text }) => {
          if (toolCalls) {
            for (const tc of toolCalls) {
              if (!toolsUsed.includes(tc.toolName)) {
                toolsUsed.push(tc.toolName)
              }
            }
          }

          if (text) {
            resultText = text
          }

          await this.callbacks.onStepFinish?.({ toolCalls, toolResults, text })
        },
      })
    } catch {
      if (signal?.aborted) {
        status = 'timeout'
      } else {
        status = 'blocked'
      }
    } finally {
      if (agent) await agent.dispose().catch(() => {})
    }

    if (status === 'done' && signal?.aborted) {
      status = 'timeout'
    }

    const observation =
      resultText || 'Execution completed with no actions taken.'

    return {
      observation,
      status,
      url: this.currentUrl,
      actionsPerformed: this.stepsUsed - stepsAtStart,
      toolsUsed,
    }
  }

  async close(): Promise<void> {
    await this.cladoExecutor?.close()
  }

  getTotalSteps(): number {
    if (this.isCladoAction) {
      return this.cladoExecutor?.getTotalSteps() ?? 0
    }
    return this.stepsUsed
  }
}
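The status handling in the tool-loop path of `Executor.execute` reduces to a small truth table over two facts: whether the agent run threw, and whether the abort signal had fired. The function `resolveStatus` below is a hypothetical condensation for illustration; in the file the same outcome is reached through the `catch` block plus the post-run `signal?.aborted` check.

```typescript
type ExecutorStatus = 'done' | 'blocked' | 'timeout'

// Hypothetical condensation of the catch block and the post-run abort check.
function resolveStatus(threw: boolean, aborted: boolean): ExecutorStatus {
  // An aborted signal wins in both cases: a throw after abort and a normal
  // return after abort are both reported as a timeout.
  if (aborted) return 'timeout'
  // Any other failure is reported as blocked; otherwise the run is done.
  return threw ? 'blocked' : 'done'
}

// All four combinations of (threw, aborted).
const statusTable: ExecutorStatus[] = [
  resolveStatus(false, false),
  resolveStatus(true, false),
  resolveStatus(false, true),
  resolveStatus(true, true),
]
console.log(statusTable)
```

One consequence visible in the original code is that the error itself is discarded: a caller sees only `blocked` or `timeout`, never the underlying exception.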
|
||||
@@ -24,16 +24,16 @@ import {
|
||||
resolveProviderConfig,
|
||||
} from '../../utils/resolve-provider-config'
|
||||
import { withEvalTimeout } from '../../utils/with-eval-timeout'
|
||||
import { isCladoActionProvider } from '../orchestrated/backends/clado/types'
|
||||
import { createExecutorBackend } from '../orchestrated/backends/create-executor-backend'
|
||||
import type { ExecutorCallbacks } from '../orchestrated/executor-backend'
|
||||
import type { AgentContext, AgentEvaluator, AgentResult } from '../types'
|
||||
import type { ExecutorCallbacks } from './executor'
|
||||
import { OrchestratorAgent } from './orchestrator-agent'
|
||||
import type { ExecutorFactory, ExecutorResult } from './types'
|
||||
|
||||
interface ResolvedConfigs {
|
||||
orchestratorConfig: ResolvedAgentConfig & { maxTurns?: number }
|
||||
executorConfig: ResolvedAgentConfig
|
||||
isCladoAction: boolean
|
||||
}
|
||||
|
||||
function toResolvedAgentConfig(
|
||||
@@ -68,10 +68,7 @@ async function resolveAgentConfig(
|
||||
if (!executorModel) {
|
||||
throw new Error('executor.model is required in config')
|
||||
}
|
||||
if (
|
||||
isCladoActionProvider(config.executor.provider) &&
|
||||
!config.executor.baseUrl
|
||||
) {
|
||||
if (config.executor.provider === 'clado-action' && !config.executor.baseUrl) {
|
||||
throw new Error(
|
||||
'executor.baseUrl is required in config for clado-action provider',
|
||||
)
|
||||
@@ -79,8 +76,10 @@ async function resolveAgentConfig(

  const resolvedOrchestrator = await resolveProviderConfig(config.orchestrator)

  const isCladoAction = config.executor.provider === 'clado-action'

  let executorConfig: ResolvedAgentConfig
  if (isCladoActionProvider(config.executor.provider)) {
  if (isCladoAction) {
    executorConfig = {
      conversationId: crypto.randomUUID(),
      provider: config.executor.provider as ResolvedAgentConfig['provider'],
@@ -109,7 +108,7 @@ async function resolveAgentConfig(
    maxTurns: config.orchestrator.maxTurns,
  }

  return { orchestratorConfig, executorConfig }
  return { orchestratorConfig, executorConfig, isCladoAction }
}

export class OrchestratorExecutorEvaluator implements AgentEvaluator {
@@ -129,7 +128,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
    }

    const agentConfig = config.agent as OrchestratorExecutorConfig
    const { orchestratorConfig, executorConfig } =
    const { orchestratorConfig, executorConfig, isCladoAction } =
      await resolveAgentConfig(agentConfig)

    // Connect to Chrome via CDP — same per-worker offset used by app-manager.
@@ -238,6 +237,7 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
          capture.emitEvent(task.query_id, delegateInputEvent)

          const executor = createExecutorBackend({
            backendKind: isCladoAction ? 'clado' : 'tool-loop',
            configTemplate: executorConfig,
            browser,
            serverUrl: config.browseros.server_url,
@@ -331,5 +331,6 @@ export class OrchestratorExecutorEvaluator implements AgentEvaluator {
  }
}

export { Executor } from './executor'
export { OrchestratorAgent } from './orchestrator-agent'
export * from './types'

@@ -105,10 +105,7 @@ export class TrajectorySaver {
      errors: [],
      warnings: [],
      agent_config: {
        type: agentConfig.type as
          | 'single'
          | 'orchestrator-executor'
          | 'claude-code',
        type: agentConfig.type as 'single' | 'orchestrator-executor',
        model: agentConfig.model,
      },
      grader_results: {},

@@ -82,16 +82,6 @@ function suiteToEvalConfig(
    })
  }

  if (suite.agent.type === 'claude-code') {
    return EvalConfigSchema.parse({
      ...base,
      agent: {
        type: 'claude-code',
        ...(variant.agent.model && { model: variant.agent.model }),
      },
    })
  }

  const executorBackend = suite.agent.executorBackend ?? 'tool-loop'
  const executor =
    executorBackend === 'clado'
@@ -145,10 +135,7 @@ export async function resolveSuiteCommand(
  const loaded = await loadSuite(options.suitePath)
  const variant = resolveVariant({
    variantId: options.variantId,
    provider:
      loaded.suite.agent.type === 'claude-code'
        ? 'claude-code'
        : options.provider,
    provider: options.provider,
    model: options.model,
    apiKey: options.apiKey,
    baseUrl: options.baseUrl,

@@ -685,59 +685,6 @@
      });
    }

    // Test harness note: these ASCII section markers are used by r2-viewer-compat.test.ts.
    // -- Artifact path resolution
    function taskKey(task) {
      return task.queryId || task.id || 'unknown-task';
    }

    function legacyArtifactPath(task, artifact) {
      const id = taskKey(task);
      switch (artifact) {
        case 'attempt':
          return `${id}/attempt.json`;
        case 'metadata':
          return `${id}/metadata.json`;
        case 'messages':
          return `${id}/messages.jsonl`;
        case 'trace':
          return `${id}/trace.jsonl`;
        case 'grades':
          return `${id}/grades.json`;
        case 'screenshots':
          return `${id}/screenshots`;
        case 'graderArtifacts':
          return `${id}/grader-artifacts`;
        default:
          return `${id}/${artifact}`;
      }
    }

    function artifactPath(task, artifact) {
      const manifestPath = task.paths && task.paths[artifact];
      if (typeof manifestPath === 'string' && manifestPath.length > 0) {
        return manifestPath.replace(/^\/+/, '');
      }
      return legacyArtifactPath(task, artifact);
    }

    function artifactUrl(task, artifact) {
      return `${basePath}/${artifactPath(task, artifact)}`;
    }

    function metadataUrl(task) {
      return artifactUrl(task, 'metadata');
    }

    function messagesUrl(task) {
      return artifactUrl(task, 'messages');
    }

    function screenshotUrl(task, n) {
      return `${artifactUrl(task, 'screenshots')}/${n}.png`;
    }

    // -- Task selection
    // ── Task selection ─────────────────────────────────────────────
    function selectTask(task) {
      stopAutoplay();
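The artifact path helpers in the hunk above prefer a path recorded in the manifest and fall back to the legacy `<taskId>/<file>` layout when none is present. The following is a minimal TypeScript sketch of that lookup, not the viewer's actual code: the `SketchTask` shape, the `LEGACY_FILES` map, and the `resolveArtifactPath` name are assumptions introduced for illustration.

```typescript
// Hypothetical task shape for illustration; real manifest tasks carry more fields.
interface SketchTask {
  queryId?: string
  id?: string
  paths?: Record<string, string>
}

// Legacy file names per artifact kind, mirroring the switch in legacyArtifactPath.
const LEGACY_FILES = new Map<string, string>([
  ['attempt', 'attempt.json'],
  ['metadata', 'metadata.json'],
  ['messages', 'messages.jsonl'],
  ['trace', 'trace.jsonl'],
  ['grades', 'grades.json'],
  ['screenshots', 'screenshots'],
  ['graderArtifacts', 'grader-artifacts'],
])

function resolveArtifactPath(task: SketchTask, artifact: string): string {
  // Prefer the manifest-provided path, stripping leading slashes so it joins cleanly.
  const manifestPath = task.paths?.[artifact]
  if (typeof manifestPath === 'string' && manifestPath.length > 0) {
    return manifestPath.replace(/^\/+/, '')
  }
  // Otherwise fall back to the legacy `<taskId>/<file>` layout.
  const id = task.queryId || task.id || 'unknown-task'
  return `${id}/${LEGACY_FILES.get(artifact) ?? artifact}`
}
```

Keeping the fallback in one function means a viewer written this way can read both older manifests without `paths` and newer ones that carry explicit paths.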
@@ -769,7 +716,6 @@
      }
    }

    // -- Center panel
    // ── Center panel: screenshot viewer ────────────────────────────
    function renderCenterPanel(task) {
      const panel = document.getElementById('center-panel');
@@ -817,6 +763,10 @@
      updateControls();
    }

    function screenshotUrl(task, n) {
      return `${basePath}/${task.queryId || task.id}/screenshots/${n}.png`;
    }

    function goToStep(n) {
      if (!selectedTask || n < 1 || n > totalSteps) return;
      currentStep = n;
@@ -964,7 +914,7 @@
      body.innerHTML = '<div class="placeholder"><div class="ph-text" style="color: #6e7681;">Loading messages...</div></div>';
      countEl.textContent = '';

      const msgUrl = messagesUrl(task);
      const msgUrl = `${basePath}/${task.queryId || task.id}/messages.jsonl`;

      fetch(msgUrl)
        .then((res) => {
@@ -1125,7 +1075,7 @@

    // ── Load task metadata for rich grader details ──────────────────
    function loadTaskMetadata(task) {
      const metaUrl = metadataUrl(task);
      const metaUrl = `${basePath}/${task.queryId || task.id}/metadata.json`;
      fetch(metaUrl)
        .then((res) => res.ok ? res.json() : null)
        .then((meta) => {

@@ -2,7 +2,6 @@ export interface PythonEvaluatorOptions {
  scriptPath: string
  input: unknown
  timeoutMs: number
  pythonPath?: string
}

export interface PythonEvaluatorResult<T> {
@@ -16,9 +15,7 @@ export interface PythonEvaluatorResult<T> {
export async function runPythonJsonEvaluator<T>(
  options: PythonEvaluatorOptions,
): Promise<PythonEvaluatorResult<T>> {
  const pythonPath =
    options.pythonPath || process.env.BROWSEROS_EVAL_PYTHON || 'python3'
  const proc = Bun.spawn([pythonPath, options.scriptPath], {
  const proc = Bun.spawn(['python3', options.scriptPath], {
    stdin: 'pipe',
    stdout: 'pipe',
    stderr: 'pipe',

@@ -1,8 +1,3 @@
import type {
  ViewerManifest,
  ViewerManifestTask,
} from '../viewer/viewer-manifest'

export interface R2UploadConfig {
  accountId: string
  accessKeyId: string
@@ -11,9 +6,27 @@ export interface R2UploadConfig {
  cdnBaseUrl: string
}

export type R2ManifestTask = ViewerManifestTask
export interface R2ManifestTask {
  queryId: string
  query: string
  startUrl: string
  status: string
  durationMs: number
  screenshotCount: number
  graderResults: Record<string, unknown>
}

export type R2RunManifest = ViewerManifest
export interface R2RunManifest {
  runId: string
  uploadedAt: string
  agentConfig?: Record<string, unknown>
  dataset?: string
  summary?: {
    passRate?: unknown
    avgDurationMs?: unknown
  }
  tasks: R2ManifestTask[]
}

export interface R2PublishRunResult {
  runId: string

@@ -5,11 +5,8 @@ import {
  PutObjectCommand,
  S3Client,
} from '@aws-sdk/client-s3'
import {
  buildViewerManifest,
  type ViewerManifestTaskInput,
} from '../viewer/viewer-manifest'
import type {
  R2ManifestTask,
  R2PublishPathResult,
  R2PublishRunResult,
  R2RunManifest,
@@ -46,6 +43,7 @@ interface UploadJob {
interface TaskDirEntry {
  taskId: string
  taskPath: string
  canonicalLayout: boolean
}

export function contentTypeForPath(filePath: string): string {
@@ -131,6 +129,7 @@ async function findTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
      legacyTasks.push({
        taskId: entry.name,
        taskPath,
        canonicalLayout: false,
      })
    }
  }
@@ -147,6 +146,7 @@ async function findTaskDirs(runDir: string): Promise<TaskDirEntry[]> {
      canonicalTasks.push({
        taskId: entry.name,
        taskPath,
        canonicalLayout: true,
      })
    }
  }
@@ -262,7 +262,7 @@ export class R2Publisher {
      throw new Error(`No task subdirectories in ${runId}`)
    }

    const manifestTasks: ViewerManifestTaskInput[] = []
    const manifestTasks: R2ManifestTask[] = []
    const jobs: UploadJob[] = (await collectRunRootFiles(runDir)).map(
      (job) => ({
        ...job,
@@ -289,23 +289,22 @@ export class R2Publisher {
        if (relative.startsWith('screenshots/') && extname(file) === '.png') {
          screenshotCount++
        }
        // Keep legacy keys during the manifest v2 rollout so cached viewers and
        // old manifests can still resolve task artifacts.
        jobs.push({
          key: `runs/${runId}/${taskId}/${relative}`,
          filePath: file,
          contentType: contentTypeForPath(file),
        })
        jobs.push({
          key: `runs/${runId}/tasks/${taskId}/${relative}`,
          filePath: file,
          contentType: contentTypeForPath(file),
        })
        if (taskDirEntry.canonicalLayout) {
          jobs.push({
            key: `runs/${runId}/tasks/${taskId}/${relative}`,
            filePath: file,
            contentType: contentTypeForPath(file),
          })
        }
      }

      manifestTasks.push({
        queryId: (meta.query_id as string | undefined) || taskId,
        artifactId: taskId,
        query: (meta.query as string | undefined) || '',
        startUrl: (meta.start_url as string | undefined) || '',
        status: statusFromMetadata(meta),
@@ -313,8 +312,7 @@ export class R2Publisher {
        screenshotCount:
          (meta.screenshot_count as number | undefined) || screenshotCount,
        graderResults:
          (meta.grader_results as ViewerManifestTaskInput['graderResults']) ||
          {},
          (meta.grader_results as Record<string, unknown> | undefined) || {},
      })
    }

@@ -349,7 +347,7 @@ export class R2Publisher {
    return {
      runId,
      uploadedFiles: uploaded + 2,
      viewerUrl: `${this.config.cdnBaseUrl}/viewer.html?run=${encodeURIComponent(runId)}`,
      viewerUrl: `${this.config.cdnBaseUrl}/viewer.html?run=${runId}`,
      manifest,
    }
  }
@@ -371,7 +369,7 @@ export class R2Publisher {
    runId: string,
    agentConfig: Record<string, unknown> | undefined,
    dataset: string | undefined,
    tasks: ViewerManifestTaskInput[],
    tasks: R2ManifestTask[],
  ): Promise<R2RunManifest> {
    let summaryData: Record<string, unknown> | undefined
    try {
@@ -380,7 +378,7 @@ export class R2Publisher {
      ) as Record<string, unknown>
    } catch {}

    return buildViewerManifest({
    return {
      runId,
      uploadedAt: this.now().toISOString(),
      agentConfig,
@@ -392,7 +390,7 @@ export class R2Publisher {
          }
        : undefined,
      tasks,
    })
    }
  }

  private async uploadFile(job: UploadJob): Promise<void> {

@@ -1,104 +0,0 @@
export interface ReportManifestTask {
  queryId: string
  query?: string
  status: string
  durationMs: number
  screenshotCount?: number
  paths?: Record<string, string>
  graderResults?: Record<string, { pass?: boolean; score?: number }>
}

export interface ReportManifest {
  schemaVersion?: number
  runId: string
  uploadedAt?: string
  agentConfig?: { type?: string; model?: string }
  dataset?: string
  summary?: { passRate?: number; avgDurationMs?: number }
  tasks?: ReportManifestTask[]
}

export interface RunSummary {
  runId: string
  configName: string
  date: string
  avgScore: number
  total: number
  completed: number
  failed: number
  timeout: number
  avgDurationMs: number
  model: string
  dataset: string
  agentType: string
}

// Report score uses the primary pass/fail grader so mixed-grader runs keep
// the same precedence as the eval summary.
const PASS_FAIL_GRADER_ORDER = [
  'agisdk_state_diff',
  'infinity_state',
  'performance_grader',
]

export function extractConfigName(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

function reportDate(manifest: ReportManifest): string {
  if (!manifest.uploadedAt) return 'unknown'
  const [date, time] = manifest.uploadedAt.split('T')
  return `${date} ${time?.slice(0, 5) || ''}`
}

function primaryScore(task: ReportManifestTask): number | null {
  if (!task.graderResults) return null
  for (const name of PASS_FAIL_GRADER_ORDER) {
    const result = task.graderResults[name]
    if (result) return result.score ?? 0
  }
  return null
}

export function buildRunSummaries(manifests: ReportManifest[]): RunSummary[] {
  return manifests
    .map((manifest) => {
      const tasks = Array.isArray(manifest.tasks) ? manifest.tasks : []
      const total = tasks.length
      const completed = tasks.filter((t) => t.status === 'completed').length
      const failed = tasks.filter((t) => t.status === 'failed').length
      const timeout = tasks.filter((t) => t.status === 'timeout').length

      let scoredCount = 0
      let scoreSum = 0
      for (const task of tasks) {
        const score = primaryScore(task)
        if (score === null) continue
        scoredCount++
        scoreSum += score
      }

      const durations = tasks
        .filter((t) => t.durationMs > 0)
        .map((t) => t.durationMs)

      return {
        runId: manifest.runId,
        configName: extractConfigName(manifest.runId),
        date: reportDate(manifest),
        avgScore: scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0,
        total,
        completed,
        failed,
        timeout,
        avgDurationMs:
          durations.length > 0
            ? durations.reduce((a, b) => a + b, 0) / durations.length
            : 0,
        model: manifest.agentConfig?.model || 'unknown',
        dataset: manifest.dataset || manifest.runId,
        agentType: manifest.agentConfig?.type || 'unknown',
      }
    })
    .sort((a, b) => a.date.localeCompare(b.date))
}
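The report module above scores each task with the first grader found in a fixed precedence list, skips tasks with no matching grader, and strips a trailing timestamp from the run ID to get a config name. A condensed, self-contained sketch of those three steps follows. The grader order and the regular expression are taken from the hunk; the names `stripRunTimestamp`, `firstGraderScore`, and `averageScorePercent`, and the sample run ID in the usage note, are hypothetical.

```typescript
// Grader precedence as listed in the hunk; the surrounding names are illustrative.
const GRADER_ORDER = ['agisdk_state_diff', 'infinity_state', 'performance_grader']

type SketchGraderResults = Record<string, { pass?: boolean; score?: number }>

// Drops a trailing "-YYYY-MM-DD-HHMM" suffix, leaving the config name.
function stripRunTimestamp(runId: string): string {
  return runId.replace(/-\d{4}-\d{2}-\d{2}-\d{4}$/, '')
}

// Returns the score of the highest-precedence grader present, or null if none ran.
function firstGraderScore(results?: SketchGraderResults): number | null {
  if (!results) return null
  for (const name of GRADER_ORDER) {
    const result = results[name]
    if (result) return result.score ?? 0
  }
  return null
}

// Averages scores over tasks that have a primary grader, expressed as a percentage.
function averageScorePercent(tasks: SketchGraderResults[]): number {
  let scoredCount = 0
  let scoreSum = 0
  for (const results of tasks) {
    const score = firstGraderScore(results)
    if (score === null) continue
    scoredCount++
    scoreSum += score
  }
  return scoredCount > 0 ? (scoreSum / scoredCount) * 100 : 0
}
```

For instance, under these assumptions `stripRunTimestamp('weekly-clado-2026-05-14-0803')` yields `'weekly-clado'`, and a task graded only by an unlisted grader does not count toward the average.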
@@ -33,13 +33,6 @@ function variantSource(config: EvalConfig): {
  baseUrl?: string
  supportsImages?: boolean
} {
  if (config.agent.type === 'claude-code') {
    return {
      provider: 'claude-code',
      model: config.agent.model ?? 'default',
    }
  }

  const agent =
    config.agent.type === 'single' ? config.agent : config.agent.orchestrator
  if (!agent.model) {
@@ -83,7 +76,10 @@ export async function adaptEvalConfigFile(
    suite: {
      id,
      dataset: evalConfig.dataset,
      agent: suiteAgent(evalConfig, backend),
      agent:
        evalConfig.agent.type === 'single'
          ? { type: 'tool-loop' }
          : { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' },
      graders: evalConfig.graders ?? [],
      workers: evalConfig.num_workers,
      restartBrowserPerTask: evalConfig.restart_server_per_task,
@@ -103,17 +99,3 @@ export async function adaptEvalConfigFile(
    }),
  }
}

function suiteAgent(
  config: EvalConfig,
  backend: ReturnType<typeof executorBackend>,
): EvalSuite['agent'] {
  switch (config.agent.type) {
    case 'single':
      return { type: 'tool-loop' }
    case 'orchestrator-executor':
      return { type: 'orchestrated', executorBackend: backend ?? 'tool-loop' }
    case 'claude-code':
      return { type: 'claude-code' }
  }
}

@@ -57,30 +57,10 @@ export function resolveVariant(
  options: ResolveVariantOptions = {},
): EvalVariant {
  const env = options.env ?? process.env
  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const provider =
    options.provider ?? env.EVAL_AGENT_PROVIDER ?? 'openai-compatible'
  const model = options.model ?? env.EVAL_AGENT_MODEL

  if (provider === 'claude-code') {
    const id = options.variantId ?? env.EVAL_VARIANT ?? 'claude-code'
    return {
      id,
      agent: {
        provider,
        model: model ?? '',
      },
      publicMetadata: {
        id,
        agent: {
          provider,
          model: model || 'default',
          apiKeyConfigured: false,
        },
      },
    }
  }

  const id = options.variantId ?? env.EVAL_VARIANT ?? 'default'
  const apiKey = options.apiKey ?? env.EVAL_AGENT_API_KEY
  const apiKeyEnv =
    options.apiKeyEnv ?? (options.apiKey ? undefined : 'EVAL_AGENT_API_KEY')

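`resolveVariant` in the hunk above resolves each setting with the same precedence: an explicit option first, then an environment variable, then a built-in default. The sketch below isolates that pattern under stated assumptions: `SketchVariantOptions` and `resolveVariantBasics` are hypothetical names, and the environment defaults to an empty object rather than `process.env` so that the snippet stays self-contained.

```typescript
// Hypothetical, trimmed-down option shape; the real ResolveVariantOptions has more fields.
interface SketchVariantOptions {
  variantId?: string
  provider?: string
  model?: string
  env?: Record<string, string | undefined>
}

interface SketchVariantBasics {
  id: string
  provider: string
  model: string | undefined
}

function resolveVariantBasics(
  options: SketchVariantOptions = {},
): SketchVariantBasics {
  // Assumption: an empty object stands in for process.env when no env is passed.
  const env = options.env ?? {}
  return {
    // Explicit option, then environment variable, then default.
    id: options.variantId ?? env['EVAL_VARIANT'] ?? 'default',
    provider:
      options.provider ?? env['EVAL_AGENT_PROVIDER'] ?? 'openai-compatible',
    // The model has no default here and may remain undefined.
    model: options.model ?? env['EVAL_AGENT_MODEL'],
  }
}
```

Because `??` only falls through on `null` or `undefined`, an empty-string option still wins over the environment; callers that want empty strings treated as missing would need `||` instead.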
@@ -8,7 +8,6 @@ export const SuiteAgentSchema = z
      'single',
      'orchestrated',
      'orchestrator-executor',
      'claude-code',
    ]),
    executorBackend: z.enum(['tool-loop', 'clado']).optional(),
  })

@@ -19,19 +19,9 @@ export const OrchestratorExecutorConfigSchema = z.object({
  }),
})

export const ClaudeCodeAgentConfigSchema = z
  .object({
    type: z.literal('claude-code'),
    model: z.string().min(1).optional(),
    claudePath: z.string().min(1).default('claude'),
    extraArgs: z.array(z.string()).default([]),
  })
  .strict()

export const AgentConfigSchema = z.discriminatedUnion('type', [
  SingleAgentConfigSchema,
  OrchestratorExecutorConfigSchema,
  ClaudeCodeAgentConfigSchema,
])

export const EvalConfigSchema = z.object({
@@ -63,6 +53,5 @@ export type SingleAgentConfig = z.infer<typeof SingleAgentConfigSchema>
export type OrchestratorExecutorConfig = z.infer<
  typeof OrchestratorExecutorConfigSchema
>
export type ClaudeCodeAgentConfig = z.infer<typeof ClaudeCodeAgentConfigSchema>
export type AgentConfig = z.infer<typeof AgentConfigSchema>
export type EvalConfig = z.infer<typeof EvalConfigSchema>

@@ -2,8 +2,6 @@
export {
  type AgentConfig,
  AgentConfigSchema,
  type ClaudeCodeAgentConfig,
  ClaudeCodeAgentConfigSchema,
  type EvalConfig,
  EvalConfigSchema,
  type OrchestratorExecutorConfig,

@@ -13,7 +13,7 @@ export const GraderResultSchema = z.object({
// Agent config in metadata
const AgentConfigMetaSchema = z
  .object({
    type: z.enum(['single', 'orchestrator-executor', 'claude-code']),
    type: z.enum(['single', 'orchestrator-executor']),
    model: z.string().optional(),
  })
  .passthrough()

@@ -59,7 +59,7 @@ export async function validateConfig(
    ) {
      envVarsToCheck.push(config.agent.apiKey)
    }
  } else if (config.agent.type === 'orchestrator-executor') {
  } else {
    const { orchestrator, executor } = config.agent
    if (orchestrator.apiKey && isEnvVarName(orchestrator.apiKey)) {
      envVarsToCheck.push(orchestrator.apiKey)

@@ -1,20 +1,7 @@
import type { GraderResult } from '../types'

export const VIEWER_MANIFEST_SCHEMA_VERSION = 2

export interface ViewerManifestTaskPaths {
  attempt: string
  metadata: string
  messages: string
  trace: string
  grades: string
  screenshots: string
  graderArtifacts: string
}

export interface ViewerManifestTaskInput {
  queryId: string
  artifactId?: string
  query: string
  startUrl?: string
  status: string
@@ -23,67 +10,57 @@ export interface ViewerManifestTaskInput {
  graderResults: Record<string, GraderResult>
}

export interface ViewerManifestTask
  extends Omit<ViewerManifestTaskInput, 'artifactId'> {
  startUrl: string
  paths: ViewerManifestTaskPaths
export interface ViewerManifestTask extends ViewerManifestTaskInput {
  paths: {
    attempt: string
    metadata: string
    messages: string
    trace: string
    grades: string
    screenshots: string
    graderArtifacts: string
  }
}

export interface ViewerManifest {
  schemaVersion: typeof VIEWER_MANIFEST_SCHEMA_VERSION
  runId: string
  suiteId?: string
  variantId?: string
  suiteId: string
  variantId: string
  uploadedAt?: string
  agentConfig?: Record<string, unknown>
  dataset?: string
  summary?: Record<string, unknown>
  summary: Record<string, unknown>
  tasks: ViewerManifestTask[]
}

export interface BuildViewerManifestInput {
  runId: string
  suiteId?: string
  variantId?: string
  suiteId: string
  variantId: string
  uploadedAt?: string
  agentConfig?: Record<string, unknown>
  dataset?: string
  summary?: Record<string, unknown>
  summary: Record<string, unknown>
  tasks: ViewerManifestTaskInput[]
}

function taskPaths(queryId: string): ViewerManifestTaskPaths {
  return {
    attempt: `tasks/${queryId}/attempt.json`,
    metadata: `tasks/${queryId}/metadata.json`,
    messages: `tasks/${queryId}/messages.jsonl`,
    trace: `tasks/${queryId}/trace.jsonl`,
    grades: `tasks/${queryId}/grades.json`,
    screenshots: `tasks/${queryId}/screenshots`,
    graderArtifacts: `tasks/${queryId}/grader-artifacts`,
  }
}

/** Builds the compact JSON index consumed by the static R2 viewer. */
export function buildViewerManifest(
  input: BuildViewerManifestInput,
): ViewerManifest {
  return {
    schemaVersion: VIEWER_MANIFEST_SCHEMA_VERSION,
    runId: input.runId,
    ...(input.suiteId ? { suiteId: input.suiteId } : {}),
    ...(input.variantId ? { variantId: input.variantId } : {}),
    ...(input.uploadedAt ? { uploadedAt: input.uploadedAt } : {}),
    ...(input.agentConfig ? { agentConfig: input.agentConfig } : {}),
    ...(input.dataset ? { dataset: input.dataset } : {}),
    ...(input.summary ? { summary: input.summary } : {}),
    tasks: input.tasks.map((task) => {
      const { artifactId, ...publicTask } = task
      return {
        ...publicTask,
        startUrl: publicTask.startUrl ?? '',
        paths: taskPaths(artifactId ?? publicTask.queryId),
      }
    }),
    suiteId: input.suiteId,
    variantId: input.variantId,
    uploadedAt: input.uploadedAt,
    summary: input.summary,
    tasks: input.tasks.map((task) => ({
      ...task,
      paths: {
        attempt: `tasks/${task.queryId}/attempt.json`,
        metadata: `tasks/${task.queryId}/metadata.json`,
        messages: `tasks/${task.queryId}/messages.jsonl`,
        trace: `tasks/${task.queryId}/trace.jsonl`,
        grades: `tasks/${task.queryId}/grades.json`,
        screenshots: `tasks/${task.queryId}/screenshots`,
        graderArtifacts: `tasks/${task.queryId}/grader-artifacts`,
      },
    })),
  }
}

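One side of the `buildViewerManifest` change above derives each task's paths from an optional `artifactId` (the on-disk directory name), falls back to `queryId`, and keeps `artifactId` out of the published task. A reduced sketch of that mapping follows, assuming hypothetical names `SketchTaskInput`, `sketchTaskPaths`, and `toManifestTask`, and only three of the seven path entries.

```typescript
// Hypothetical input shape; the real ViewerManifestTaskInput has more fields.
interface SketchTaskInput {
  queryId: string
  artifactId?: string
  query: string
}

// A subset of the per-task artifact paths, rooted at tasks/<dirName>/.
function sketchTaskPaths(dirName: string) {
  return {
    metadata: `tasks/${dirName}/metadata.json`,
    messages: `tasks/${dirName}/messages.jsonl`,
    screenshots: `tasks/${dirName}/screenshots`,
  }
}

function toManifestTask(task: SketchTaskInput) {
  // Pull artifactId out so it is used for paths but never published.
  const { artifactId, ...publicTask } = task
  return {
    ...publicTask,
    paths: sketchTaskPaths(artifactId ?? publicTask.queryId),
  }
}
```

Separating the directory name from the query ID lets the manifest point at the right files even when a task's metadata reports a `query_id` that differs from its folder name.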
@@ -1,268 +0,0 @@
import { describe, expect, it } from 'bun:test'
import { mkdtemp, readFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { createAgent } from '../../src/agents'
import { ClaudeCodeEvaluator } from '../../src/agents/claude-code'
import { CaptureContext } from '../../src/capture/context'
import {
  AgentConfigSchema,
  type EvalConfig,
  EvalConfigSchema,
  type Task,
  TaskMetadataSchema,
} from '../../src/types'

function config(): EvalConfig {
  return {
    agent: {
      type: 'claude-code',
      model: 'opus',
      claudePath: 'claude',
      extraArgs: [],
    },
    dataset: 'data/test.jsonl',
    num_workers: 1,
    restart_server_per_task: false,
    browseros: {
      server_url: 'http://127.0.0.1:9110',
      base_cdp_port: 9010,
      base_server_port: 9110,
      base_extension_port: 9310,
      load_extensions: false,
      headless: false,
    },
    graders: [],
  }
}

const task: Task = {
  query_id: 'task-1',
  dataset: 'test',
  query: 'Find the title',
  graders: [],
  metadata: {
    original_task_id: 'task-1',
  },
}

describe('ClaudeCodeEvaluator', () => {
  it('accepts claude-code config defaults without permission mode', () => {
    const agent = AgentConfigSchema.parse({ type: 'claude-code' })

    expect(agent).toEqual({
      type: 'claude-code',
      claudePath: 'claude',
      extraArgs: [],
    })
  })

  it('accepts claude-code as a runnable eval agent', () => {
    const parsed = EvalConfigSchema.parse({
      agent: {
        type: 'claude-code',
        model: 'opus',
      },
      dataset: 'data/test-set.jsonl',
      browseros: {
        server_url: 'http://127.0.0.1:9110',
      },
    })

    expect(parsed.agent.type).toBe('claude-code')
    expect(parsed.agent.model).toBe('opus')
  })

  it('rejects unsupported claude-code settings instead of silently ignoring them', () => {
    expect(
      AgentConfigSchema.safeParse({
        type: 'claude-code',
        permissionMode: 'bypassPermissions',
      }).success,
    ).toBe(false)
    expect(
      AgentConfigSchema.safeParse({
        type: 'claude-code',
        maxTurns: 3,
      }).success,
    ).toBe(false)
  })

  it('allows claude-code in task metadata', () => {
    const metadata = TaskMetadataSchema.parse({
      query_id: 'task-1',
      dataset: 'test',
      query: 'Do the thing',
      started_at: new Date().toISOString(),
      completed_at: new Date().toISOString(),
      total_duration_ms: 100,
      total_steps: 1,
      termination_reason: 'completed',
      final_answer: 'done',
      errors: [],
      warnings: [],
      agent_config: {
        type: 'claude-code',
        model: 'opus',
      },
      grader_results: {},
    })

    expect(metadata.agent_config.type).toBe('claude-code')
  })

  it('is created by the agent factory', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })

    const agent = createAgent({
      config: config(),
      task,
      workerIndex: 0,
      initialPageId: 1,
      outputDir,
      taskOutputDir,
      capture,
    })

    expect(agent).toBeInstanceOf(ClaudeCodeEvaluator)
  })

  it('runs claude code, logs messages, writes MCP config, and saves metadata', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })
    const calls: Array<{ executable: string; args: string[]; cwd: string }> = []
    const evaluator = new ClaudeCodeEvaluator(
      {
        config: config(),
        task,
        workerIndex: 0,
        initialPageId: 1,
        outputDir,
        taskOutputDir,
        capture,
      },
      {
        processRunner: {
          async run(options) {
            calls.push(options)
            await options.onStdoutLine(
              JSON.stringify({
                type: 'assistant',
                message: {
                  content: [{ type: 'text', text: 'The title is Example' }],
                },
              }),
            )
            await options.onStdoutLine(
              JSON.stringify({
                type: 'result',
                subtype: 'success',
                result: 'The title is Example',
              }),
            )
            return { exitCode: 0, stderr: '' }
          },
        },
      },
    )

    const result = await evaluator.execute()

    expect(result.finalAnswer).toBe('The title is Example')
    expect(result.metadata.agent_config).toMatchObject({
      type: 'claude-code',
      model: 'opus',
    })
    expect(result.messages.some((msg) => msg.type === 'user')).toBe(true)
    expect(result.messages.some((msg) => msg.type === 'text-delta')).toBe(true)
    const mcpConfig = JSON.parse(
      await readFile(join(taskOutputDir, 'claude-code-mcp.json'), 'utf-8'),
    )
    expect(mcpConfig.mcpServers.browseros).toMatchObject({
      type: 'http',
      url: 'http://127.0.0.1:9110/mcp',
      headers: {
        'X-BrowserOS-Source': 'sdk-internal',
      },
    })
    expect(calls).toEqual([
      expect.objectContaining({
        executable: 'claude',
        cwd: taskOutputDir,
        args: [
          '-p',
          expect.stringContaining('Task: Find the title'),
          '--mcp-config',
          join(taskOutputDir, 'claude-code-mcp.json'),
          '--strict-mcp-config',
          '--output-format',
          'stream-json',
          '--verbose',
          '--model',
          'opus',
        ],
      }),
    ])
    expect(calls[0].args).not.toContain('--permission-mode')
  })

  it('records non-fatal stream processing errors as warnings', async () => {
    const outputDir = await mkdtemp(join(tmpdir(), 'claude-code-eval-'))
    const { capture, taskOutputDir } = await CaptureContext.create({
      serverUrl: 'http://127.0.0.1:9110',
      outputDir,
      taskId: task.query_id,
      initialPageId: 1,
    })
    const evaluator = new ClaudeCodeEvaluator(
      {
        config: config(),
        task,
        workerIndex: 0,
        initialPageId: 1,
        outputDir,
        taskOutputDir,
        capture,
      },
      {
        processRunner: {
          async run(options) {
            await options.onStdoutLine(
              JSON.stringify({
                type: 'result',
                subtype: 'success',
                result: 'done',
              }),
            )
            return {
              exitCode: 0,
              stderr: '',
              streamErrors: ['bad stream line'],
            }
          },
        },
      },
    )

    const result = await evaluator.execute()

    expect(result.finalAnswer).toBe('done')
    expect(result.metadata.warnings).toEqual([
      expect.objectContaining({
        source: 'message_logging',
        message: 'Claude Code stream event processing failed: bad stream line',
      }),
    ])
  })
})
@@ -1,78 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import { chmod, mkdtemp, writeFile } from 'node:fs/promises'
-import { tmpdir } from 'node:os'
-import { join } from 'node:path'
-import { createClaudeCodeProcessRunner } from '../../src/agents/claude-code/process-runner'
-
-async function writeStdoutScript(): Promise<string> {
-  const dir = await mkdtemp(join(tmpdir(), 'claude-code-runner-'))
-  const script = join(dir, 'stdout-lines')
-  await writeFile(script, '#!/bin/sh\nprintf "first\\nbad\\nlast\\n"\n')
-  await chmod(script, 0o755)
-  return script
-}
-
-describe('createClaudeCodeProcessRunner', () => {
-  it('passes executable and args to the spawn dependency', async () => {
-    const calls: unknown[] = []
-    const runner = createClaudeCodeProcessRunner({
-      spawn: async (cmd, options) => {
-        calls.push({ cmd, options })
-        await options.onStdoutLine('{"type":"result","result":"done"}')
-        return { exitCode: 0, stderr: '' }
-      },
-    })
-
-    const result = await runner.run({
-      executable: 'claude',
-      args: ['-p', 'hello'],
-      cwd: '/tmp',
-      signal: new AbortController().signal,
-      onStdoutLine: async () => {},
-    })
-
-    expect(result.exitCode).toBe(0)
-    expect(calls).toEqual([
-      {
-        cmd: ['claude', '-p', 'hello'],
-        options: expect.objectContaining({ cwd: '/tmp' }),
-      },
-    ])
-  })
-
-  it('returns stderr and non-zero exit codes', async () => {
-    const runner = createClaudeCodeProcessRunner({
-      spawn: async () => ({ exitCode: 2, stderr: 'bad auth' }),
-    })
-
-    const result = await runner.run({
-      executable: 'claude',
-      args: [],
-      cwd: '/tmp',
-      signal: new AbortController().signal,
-      onStdoutLine: async () => {},
-    })
-
-    expect(result).toEqual({ exitCode: 2, stderr: 'bad auth' })
-  })
-
-  it('continues reading stdout after a line handler error', async () => {
-    const script = await writeStdoutScript()
-    const lines: string[] = []
-    const runner = createClaudeCodeProcessRunner()
-
-    const result = await runner.run({
-      executable: script,
-      args: [],
-      cwd: '/tmp',
-      onStdoutLine: async (line) => {
-        lines.push(line)
-        if (line === 'bad') throw new Error('bad line')
-      },
-    })
-
-    expect(result.exitCode).toBe(0)
-    expect(result.streamErrors).toEqual(['bad line'])
-    expect(lines).toEqual(['first', 'bad', 'last'])
-  })
-})
@@ -1,102 +0,0 @@
-import { describe, expect, it } from 'bun:test'
-import {
-  ClaudeCodeStreamParser,
-  shouldCaptureScreenshotForTool,
-} from '../../src/agents/claude-code/stream-parser'
-
-describe('ClaudeCodeStreamParser', () => {
-  it('maps assistant text and MCP tool use into eval stream events', () => {
-    const parser = new ClaudeCodeStreamParser()
-    const events = parser.pushLine(
-      JSON.stringify({
-        type: 'assistant',
-        message: {
-          content: [
-            { type: 'text', text: 'I will navigate.' },
-            {
-              type: 'tool_use',
-              id: 'toolu_1',
-              name: 'mcp__browseros__navigate_page',
-              input: { page: 2, url: 'https://example.com' },
-            },
-          ],
-        },
-      }),
-    )
-
-    expect(events).toEqual([
-      { type: 'text-start', id: expect.any(String) },
-      {
-        type: 'text-delta',
-        id: expect.any(String),
-        delta: 'I will navigate.',
-      },
-      { type: 'text-end', id: expect.any(String) },
-      {
-        type: 'tool-input-available',
-        toolCallId: 'toolu_1',
-        toolName: 'mcp__browseros__navigate_page',
-        input: { page: 2, url: 'https://example.com' },
-      },
-    ])
-    expect(parser.getLastText()).toBe('I will navigate.')
-    expect(parser.getToolCallCount()).toBe(1)
-  })
-
-  it('maps Claude Code tool results into eval output events', () => {
-    const parser = new ClaudeCodeStreamParser()
-    const events = parser.pushLine(
-      JSON.stringify({
-        type: 'user',
-        message: {
-          content: [
-            {
-              type: 'tool_result',
-              tool_use_id: 'toolu_1',
-              content: 'Navigated successfully',
-            },
-          ],
-        },
-      }),
-    )
-
-    expect(events).toEqual([
-      {
-        type: 'tool-output-available',
-        toolCallId: 'toolu_1',
-        output: 'Navigated successfully',
-      },
-    ])
-  })
-
-  it('uses result messages as the authoritative final text', () => {
-    const parser = new ClaudeCodeStreamParser()
-    parser.pushLine(
-      JSON.stringify({
-        type: 'assistant',
-        message: {
-          content: [{ type: 'text', text: 'I will complete the task.' }],
-        },
-      }),
-    )
-    parser.pushLine(
-      JSON.stringify({
-        type: 'result',
-        subtype: 'success',
-        result: 'Final answer',
-      }),
-    )
-
-    expect(parser.getLastText()).toBe('Final answer')
-  })
-
-  it('identifies BrowserOS MCP tools that should trigger screenshots', () => {
-    expect(
-      shouldCaptureScreenshotForTool('mcp__browseros__navigate_page'),
-    ).toBe(true)
-    expect(
-      shouldCaptureScreenshotForTool('mcp__browseros__take_screenshot'),
-    ).toBe(false)
-    expect(shouldCaptureScreenshotForTool('Read')).toBe(false)
-  })
-})
@@ -1,10 +1,8 @@
 import { describe, expect, it } from 'bun:test'
-import { CladoExecutorBackend } from '../../src/agents/orchestrated/backends/clado/clado-executor-backend'
 import {
   backendKindForProvider,
   createExecutorBackend,
 } from '../../src/agents/orchestrated/backends/create-executor-backend'
-import { ToolLoopExecutorBackend } from '../../src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend'
 import type { ExecutorBackend } from '../../src/agents/orchestrated/executor-backend'
 
 describe('executor backend boundary', () => {
@@ -13,32 +11,6 @@ describe('executor backend boundary', () => {
     expect(backendKindForProvider('openai-compatible')).toBe('tool-loop')
   })
 
-  it('creates concrete backend classes for each executor path', () => {
-    expect(
-      createExecutorBackend({
-        backendKind: 'tool-loop',
-        configTemplate: {
-          provider: 'openai-compatible',
-          model: 'tool-loop-model',
-        },
-        browser: null,
-        serverUrl: 'http://127.0.0.1:9110',
-      }),
-    ).toBeInstanceOf(ToolLoopExecutorBackend)
-
-    expect(
-      createExecutorBackend({
-        backendKind: 'clado',
-        configTemplate: {
-          provider: 'clado-action',
-          model: 'clado-model',
-          baseUrl: 'https://clado.example.test',
-        },
-        serverUrl: 'http://127.0.0.1:9110',
-      }),
-    ).toBeInstanceOf(CladoExecutorBackend)
-  })
-
   it('forwards execution and step state through the backend interface', async () => {
     const signal = new AbortController().signal
     const fakeBackend: ExecutorBackend = {
@@ -61,6 +33,7 @@ describe('executor backend boundary', () => {
     }
 
     const backend = createExecutorBackend({
+      backendKind: 'tool-loop',
       executor: fakeBackend,
     })
     const result = await backend.execute('Click checkout', signal)
@@ -7,11 +7,8 @@ import {
   runSuiteCommand,
 } from '../../src/cli/commands/suite'
 import type { RunEvalOptions } from '../../src/runner/types'
-import type { EvalSuite } from '../../src/suites/schema'
 
-async function writeTempSuite(
-  overrides: Partial<EvalSuite> = {},
-): Promise<{ dir: string; suitePath: string }> {
+async function writeTempSuite(): Promise<{ dir: string; suitePath: string }> {
   const dir = await mkdtemp(join(tmpdir(), 'eval-suite-cli-'))
   const suitePath = join(dir, 'agisdk-daily-10.json')
   await writeFile(
@@ -26,9 +23,8 @@ async function writeTempSuite(
         restartBrowserPerTask: true,
         browseros: {
           server_url: 'http://127.0.0.1:9110',
-          headless: false,
+          headless: true,
         },
-        ...overrides,
       },
       null,
       2,
@@ -47,7 +43,9 @@ describe('suite command', () => {
 
     expect(resolved.kind).toBe('config')
     expect(resolved.suite.id).toBe('browseros-agent-weekly')
-    expect(resolved.evalConfig.dataset).toBe('../../data/agisdk-real.jsonl')
+    expect(resolved.evalConfig.dataset).toBe(
+      '../../data/webbench-2of4-50.jsonl',
+    )
     expect(resolved.variant.publicMetadata.agent.apiKeyConfigured).toBe(true)
   })
 
@@ -77,25 +75,6 @@ describe('suite command', () => {
     expect(resolved.evalConfig.num_workers).toBe(2)
   })
 
-  it('resolves claude-code suites without provider API credentials', async () => {
-    const { dir, suitePath } = await writeTempSuite({
-      agent: { type: 'claude-code' },
-    })
-
-    const resolved = await resolveSuiteCommand({
-      suitePath,
-      model: 'opus',
-      env: {},
-    })
-
-    expect(resolved.kind).toBe('suite')
-    expect(resolved.evalConfig.agent).toMatchObject({
-      type: 'claude-code',
-      model: 'opus',
-    })
-    expect(resolved.datasetPath).toBe(join(dir, 'tasks.jsonl'))
-  })
-
   it('runs config and suite commands through the runner dependency', async () => {
     const calls: RunEvalOptions[] = []
     await runSuiteCommand(
Some files were not shown because too many files have changed in this diff.