* feat(agent): /home composer parity with image attachments
The /home composer used the same ConversationInput component as the
chat screen but passed attachmentsEnabled={false}, and the home →
chat handoff was a URL search param `?q=<text>` that physically
can't carry binary attachments. Pasting a screenshot at /home did
nothing.
Add a small in-memory registry (pending-initial-message.ts) as the
rich-data side channel for the same navigation: the home composer
writes { agentId, text, attachments } there before navigating; the
chat screen consumes it on mount and replays through the existing
harness send() path that already supports attachments. URL `?q=`
stays for shareable text-only prompts; the registry wins when both
are present. Module-scope, 10s TTL, destructive consume.
Net: home is now flagged attachmentsEnabled={true}; users can paste,
drag, or pick image files at /home and they survive the navigation
into the chat screen with previews intact.
* docs(agent): clarify why initial-message ref reset is safe post-registry-fire
* feat(agent): rich rail + header on /agents/:agentId chat
Replace the chat screen's legacy AgentEntry rail and binary READY
header with the same rich data the /agents page already exposes:
adapter glyph, liveness dot, pin star, status badge, adapter · model ·
reasoning chip line, last-used time, lifetime tokens, queue count,
and the Adapter Unavailable warning. Source of truth flips from the
merged AgentEntry list to useHarnessAgents() directly.
Sort order matches /agents (pinned → recency) — not /home
(active-first → recency) — because chat is index-shaped and shuffling
rows every 5s as turns transition would be jarring while reading.
Lift the inline pin-then-recency comparator out of /agents
AgentList.tsx into a shared agents-list-order.ts so both surfaces
stay on identical sort semantics.
* fix(agent): chat header height + composer sticking to bottom
Header was clipping descenders because the strip was vertical-content
sized at min-h-14 with tight py-2.5; bump padding and lean on natural
content height. Drop the AgentTile glyph (the rail row already shows
adapter identity) and the cwd path (too long, pushed the meta line
off-screen). Header is now name + pin star + status pill, then
adapter · model · reasoning, then last-used · tokens · queued.
Composer was floating mid-screen on short chats because the chat
grid had no grid-template-rows — the implicit auto row collapsed to
content height, so the right-column flex wrapper never received the
full container height. Add grid-rows-[minmax(0,1fr)] so the single
row claims 100% and ClawChat's flex-1 expands to push the composer
flush to the bottom.
* fix(agent): composer flush to bottom on short chats
Match the sidepanel chat's nested-flex pattern. The right-column
wrapper got h-full so it expands to the grid row; the conversation
controller's root added flex-1 so ClawChat's existing flex-1 has
something to actually fill against. Without these, the grid cell
stretched but the inner flex columns shrank to content height,
leaving the composer floating mid-screen.
* fix(agent): align rail header with chat header in shared top band
Pull the rail's "Agents" + back-button into the same horizontal strip
as the agent identity header. The two halves now sit on a single row
that spans both columns, so they can't drift in height as the chat
header gains/loses meta lines (last-used, tokens, queued).
The rail below the band keeps its scrollable list only; the chat
column below holds the conversation + composer. Border-bottom moves
from ConversationHeader to the band wrapper so we don't get a
double-rule on the boundary.
* fix(agent): reserve header height to prevent layout shift on data load
The chat header grew from a single line to three lines once the
useHarnessAgents() poll resolved (adapter chips + meta line populate
asynchronously), shoving the rail and conversation body downward.
Lock min-h-[84px] on both the band's left "Agents" cell and the
ConversationHeader root, and always render the meta line slot
(non-breaking space when empty) so the typographic frame is stable
regardless of data state.
* refactor(agent): pull status pill + meta to right side of chat header
Two-column header layout instead of three stacked rows: name + pin
star + adapter chips on the left, status pill stacked on top of the
last-used / tokens / queued meta line on the right. Drops min-h
from 84px → 60px so the band reclaims ~24px of vertical space and
the chat body starts higher on screen. Band's left "Agents" cell
matches the new height.
* fix(agents): hide BrowserOS ACP envelope from chat history payloads (TKT-774)
The user-message text persisted on the wire carried two nested
envelopes — the outer `<role>You are BrowserOS…</role>` +
`<user_request>…</user_request>` block from buildBrowserosAcpPrompt
and the inner `## Browser Context` + `<selected_text>` +
`<USER_QUERY>` block from formatUserMessage. PR #856 had unwrapped
only the outer envelope on history reads, so the user bubble in
the agent rail still rendered the inner envelope, and the LLM
chat-service path leaked the wrapper all the way back to the
sidepanel client through AI SDK's stream sync.
Two surgical fixes, both server-only:
1) ACP path (acpx-runtime.ts) — replace unwrapBrowserosAcpPrompt
with a comprehensive unwrapBrowserosAcpUserMessage that strips
both layers and decodes the </>/& escapes the server
applied via escapePromptTagText. Each step is independently
defensive (anchors that don't match are skipped) so the helper
is idempotent and tolerates partial / older / future-shape
envelopes. Applied in userContentToText (history mapper) and
inherited by extractLastUserMessage (listing's lastUserMessage).
2) LLM chat path (chat-service.ts) — split the persisted user
message from the prompt-time copy. session.agent.appendUserMessage
now stores the raw user text; a transient promptUiMessages array
is built with the wrapped (formatUserMessage + context-change
prefix) form and passed to createAgentUIStreamResponse for the
model. onFinish restores the raw form before persisting, so the
user-visible message and any future history reads see only the
user's typed text.
Tests:
- acpx-runtime.test.ts: new dedicated unwrapBrowserosAcpUserMessage
suite covering fully-wrapped messages, only-outer / only-inner
inputs, selected_text blocks with attribute strings, idempotency,
literal user-typed angle-bracket round-trip, and an integration
test that round-trips the real formatUserMessage output through
the unwrap to pin the writer/reader contract.
- chat-service.test.ts: existing 'rebuilds a managed-app session'
test updated for the new behaviour — asserts the persisted user
message is the raw text and the prompt copy passed to the agent
carries the Klavis context-change notice.
* fix(agents): decode entity escapes before stripping inner envelope (TKT-774)
The unwrap was running its inner-envelope strips against the
literal-tag form (<USER_QUERY>, <selected_text>) but the persisted
payload has those tags entity-escaped (<USER_QUERY>,
<selected_text>) — buildBrowserosAcpPrompt runs
escapePromptTagText over the entire formatUserMessage payload
before adding the outer <role>+<user_request> envelope, so the
inner anchors never matched against the on-disk text and the user
was still seeing <USER_QUERY> in /agents/:id/sessions/main/history
responses.
Reorder unwrapBrowserosAcpUserMessage to: outer-strip → decode
entities → inner-strips. Test fixtures updated to reflect the
actual on-wire form (escaped inner tags); the round-trip test
duplicates the escape rule inline so the contract between
buildBrowserosAcpPrompt and the unwrap is pinned end-to-end.
Without a token on actions/checkout, the action falls back to
GITHUB_TOKEN, which has no access to the private internal-docs
repo. Submodule clone fails with "repository not found".
PAT is back on checkout. PR ops still use GITHUB_TOKEN via the
GH_TOKEN env var on the run step. The bot-branch git push uses
the credential helper set up by checkout (the PAT, which has
Contents: Read and write).
Direct push to dev fails the dev ruleset's "Require pull request"
rule. Open a tiny PR from a bot branch and enable auto-merge
(squash, 0 approvals required) instead. No bypass actor needed —
the rule stays strict for everyone, including the bot.
PR ops use GITHUB_TOKEN with explicit pull-requests: write
permission. The cross-repo PAT is only used to rewrite the SSH
submodule URL so internal-docs can be cloned over HTTPS.
Mounts browseros-ai/internal-docs at .internal-docs/, tracking main.
This activates the /document-internal and /ask-internal skills (which
early-exit if the submodule is missing) and lets the sync-internal-docs
workflow start bumping the pointer on its 4-hourly schedule.
Team members: after this lands, run once from a fresh dev pull:
git submodule update --init .internal-docs
* feat(internal-docs): scaffold private docs submodule, skills, sync action
Adds the OSS-side scaffolding for the internal-docs system:
- /document-internal skill — drafts a 1-page feature/architecture/design
doc from the current branch's diff, asks four sharp questions, enforces
voice rules (no em dashes, banned filler words, 60-line cap on feature
notes), then opens a PR to browseros-ai/internal-docs via a tmp clone.
- /ask-internal skill — answers team-internal questions by greping
internal-docs and the codebase, synthesizing with file:line citations,
optionally executing surfaced commands with per-command confirmation,
and drafting a new doc + PR if grep returns nothing useful.
- .github/workflows/sync-internal-docs.yml — every 4 hours, bumps the
submodule pointer on dev directly (no PR; relies on dev branch
protection blocking force-push). Skips silently until the submodule
is configured. Uses url.insteadOf to rewrite the SSH submodule URL
to HTTPS-with-token for the bot, while keeping SSH the local default.
- .claude/skills/document-internal/seeds/ — root README and three
templates (feature-note, architecture-note, design-spec) ready to
copy into the new internal-docs repo on rollout.
Design spec: .llm/superpowers/specs/2026-04-30-internal-docs-submodule-design.md
Manual prereqs (NOT in this PR — handled out-of-band):
1. Create private repo browseros-ai/internal-docs with branch protection on main.
2. Seed it with the contents of .claude/skills/document-internal/seeds/.
3. Create a bot account, mark as bypass actor on dev branch protection.
4. Add INTERNAL_DOCS_SYNC_TOKEN secret with repo + read access to internal-docs.
5. Once internal-docs exists, on a follow-up branch:
git submodule add -b main git@github.com:browseros-ai/internal-docs.git .internal-docs
6. Send the team the one-time init snippet for their existing checkouts:
git submodule update --init .internal-docs
* fix(internal-docs): address Greptile review feedback
- Workflow: rebase onto dev before push to handle non-fast-forward race;
bump fetch-depth 1->50 so rebase has merge-base history.
- Workflow: move INTERNAL_DOCS_SYNC_TOKEN into step env: per Actions
credential-injection pattern, instead of inlining in the script body.
- Skill (BASE bug): suppress git rev-parse stdout so SHA does not get
captured into BASE alongside the literal 'dev'. Was breaking every
downstream git log/diff call.
- Skill (tmp clone): trap 'rm -rf "$TMP" EXIT after mktemp so cleanup
always runs, even if any subsequent step fails.
* feat(agents): durable per-agent chat message queue + composer Stop button
* fix(agents): tighten queue UI — smaller Stop, drop empty indicator, live drain attach
User feedback round 1 on the message-queue UX:
1) The Stop button matched the send/voice mics at h-10 w-10 with a
solid destructive fill, which read as alarming. Shrunk to h-8 w-8,
ghost variant with a soft destructive/10 background, smaller
filled square glyph. Reads as a calm 'stop' affordance instead of
a panic button.
2) The QueueItem's leading <QueueItemIndicator> dot was decorative
only — no state, no interaction. Dropped it from QueuePanel along
with the import; queue items now render as a clean preview line
with the trailing X remove action.
3) When the server drained the queue and started the next turn, the
chat panel didn't pick up the live stream until the user
navigated away and back. The hook's resume effect previously
only fired on agent change, not on listing-observed activeTurnId
change. Surface activeTurnId from useHarnessAgents into
useAgentConversation; effect now re-runs when the id changes,
calls /chat/active, and attaches to the new turn — so a queued
message starts streaming the moment the server drain pops it.
* fix(agents): don't reset streaming state from the resume effect's no-op paths
The Stop button was disappearing while the agent was actively
streaming, even though events were still flowing into the chat. Root
cause: the resume effect's `finally` block reset `streaming`,
`turnIdRef`, and `lastSeqRef` unconditionally — including on the
early-return paths (no active turn, or another mechanism already
owns the stream).
Sequence that triggered it:
1) User sends a message → send() sets streamAbortRef + streaming=true
and starts consuming the SSE.
2) User enqueues another message → enqueue mutation invalidates the
listing query.
3) Listing refetches with the live activeTurnId → the resume
effect re-fires (deps include activeTurnIdDep).
4) attemptResume hits `if (streamAbortRef.current) return` because
send() owns it.
5) The finally clause fires anyway and calls setStreaming(false),
clobbering the live state set by send(). The SSE consumer keeps
running (refs are intact) so text keeps streaming, but the React
flag is wrong, so the Stop button gates off.
Fix: track whether *this* run actually started a stream
(`weStartedStream`). The finally only resets state when it does.
Early-return / no-active-turn paths now leave streaming/turnIdRef/
lastSeqRef alone for whoever does own them.
Also widens the Stop button's visibility (`canStop` prop on
ConversationInput) so it stays steady across the brief gap between
turns when a queue drain is mid-flight; the parent computes
`streaming || activeTurnId !== null || queue.length > 0`. The
visibility widening is independent of the streaming-state fix above
— both are now in place.
* revert: drop canStop widening — Stop only shows while streaming
Reverts the canStop prop on ConversationInput and the OR-with-queue
visibility from AgentCommandConversation. Stop is gated solely on
`streaming` again. Between turns (queue draining) the button stays
hidden — only the actively-streaming turn is interruptible from the
composer, which matches what the user actually expects.
* fix(agents): persist the kicking-off prompt on active turns so the resume placeholder isn't empty
When a queued message drained and started a new turn, the chat
panel's resume effect staged a placeholder turn with userText: ''
because the hook had no way to know what message kicked off the
turn — only the agent-side stream was visible, and the user bubble
above it was blank until the user navigated away and back (at which
point the session record's history loaded normally).
Fix: ActiveTurnRegistry.register now accepts an optional `prompt`
that's stashed on the turn and surfaced via describe() / the
ActiveTurnInfo response. AgentHarnessService.startTurn passes the
incoming message into register. /chat/active returns it. The chat
hook's resume effect uses active.prompt as the placeholder
turn's userText, so the user bubble shows the queued message text
the moment streaming begins. Falls back to '' for older clients
that haven't been refetched yet.
* fix(agents): always release streamAbortRef on resume cleanup, even when cancelled
Greptile P1 follow-up. The previous `weStartedStream` guard correctly
stopped the resume effect's no-op early-returns from clobbering an
in-flight `send()` stream — but it also stopped a *cancelled*
mid-stream resume from clearing its own `streamAbortRef`. When the
cleanup fires (e.g. the 5s listing poll captures a new queue-drain
turn id while the SSE for the prior turn is still finishing), the
next effect run hits the `if (streamAbortRef.current) return` guard
against the now-aborted controller and never reattaches, leaving
`streaming === true` with no live stream until the user navigates
away.
Split the finally block: always release `streamAbortRef` when we
owned the controller (so the next run can take over), but only
reset the streaming flag / turn id / lastSeq on a clean exit (the
new run will set those itself, so resetting on cancel would just
flicker).
* feat(agents): rich-info command center rows + pin/PATCH/adapter-health backbone
Splits AgentRowCard from a 271-line monolith into a shallow tree of
single-responsibility sub-components under `agent-row/`:
AgentTile, AdapterHealthDot, PinToggle, AgentTitleRow,
AgentSparkline, AgentSummaryChips, AgentLastMessage, CwdChip,
AgentTokenSummary, AgentMetaRow, AgentErrorPanel, AgentActions
Adds the data each row consumes:
- pinned: boolean field on AgentDefinition + FileAgentStore.update
+ new PATCH /agents/:id route. useUpdateHarnessAgent mutation
optimistically updates the listing cache so the star flips
instantly; rolls back on error.
- Listing payload extended with lastUserMessage, cwd, tokens
(cumulative + last7d shape — last7d zero-filled until the
activity ledger lands), turnsByDay/failedByDay (zero-filled),
lastError/lastErrorAt, activeTurnId. AcpxRuntime grows a
getRowSnapshot() that reads cwd + cumulative tokens + last user
message from the session record in one pass.
- Adapter health: in-memory AdapterHealthChecker probes
`claude --version` / `codex --version` with a 2s timeout and
caches results for 5 min. /adapters response carries
{ healthy, reason?, checkedAt }. Tile-corner dot exposes the
state via HoverCard; openclaw inherits health from the gateway
snapshot already on the page.
Sub-components are pure: card itself owns no state. Sort order
becomes pinned-first, then recency. HoverCard is the workhorse for
keeping rows compact while exposing depth (full message, token
breakdown, daily turn list, error stack, adapter reason).
* refactor(agents): tighten command-center row design + cut redundant affordances
User feedback round 1:
1) Two green dots on the tile (health + liveness) was confusing. Health
moves out of the tile entirely and surfaces as an inline 'Unavailable'
chip in the model line — silent when the adapter is healthy, with a
warning amber chip + HoverCard reason when not. The tile now shows
one signal: liveness.
2) The last-user-message HoverCard wasn't telegraphing intent. Drop the
HoverCard. The line is informational, italic, with a leading quote
glyph so the row reads like a conversation snippet. To see the full
message the user opens the chat (which is the action they want next
anyway).
3) Resume + Chat were duplicate CTAs. Single primary action per row:
Resume (filled, accent-orange, with a pulsing dot) replaces Chat
when there's an active turn. Both navigate to /agents/:id but the
row tells the user which action they're taking.
4) Tokens weren't visible because the row gated on last7d.requestCount,
which is zero until the activity ledger ships. Switch to lifetime
tokens (which we have today). Drop the '7d stats:' framing — talking
about a window we can't compute would be misleading. The HoverCard
surfaces input/output split + a footnote that per-window stats land
in a follow-up.
5) CWD was rendering the server's own running directory, which is
meaningless to users. Hide it from the row entirely. The cwd field
still rides in the listing payload for future surfaces (chat panel,
debug view) — only the row stops rendering it.
Aesthetic refinements while we're here:
- Whole card carries state, not just the tile: working rows get an
accent-orange tinted border with a soft glow, error rows tint
destructive, idle rows lift on hover.
- Pin star fades in on hover (group-hover) when unpinned and stays
solid amber when pinned — keeps the rail calm by default.
- Tabular-nums on token figures so columns visually align across rows.
- Drop CwdChip and AdapterHealthDot files: no callers left.
* fix(agents): align row title flush-left whether pinned or not
Pin star moved from leading the title to trailing the badges, and
hidden from layout entirely (`hidden group-hover:inline-flex`) when
unpinned. The previous `opacity-0` rule kept the star reserving its
`size-6` slot, which left every unpinned title indented relative to
the model / preview / meta lines underneath it. Title now flushes
left in both states; pinned star stays solid amber so the signal
isn't hidden, and unpinned reveals an outline star on row hover for
the toggle affordance.
* fix(agents): keep pin-toggle slot reserved so row height is constant
Switching the unpinned star from `hidden group-hover:inline-flex`
to `opacity-0 group-hover:opacity-100`. The hidden/show variant was
collapsing the title row's height when the star wasn't rendered,
which made every card below visibly shift on hover. Always rendering
the button (with opacity-only visibility) keeps the row's vertical
metrics constant; the title still flushes left because the slot is
trailing, not leading.
Card hover effect (-translate-y + shadow-md) restored — the layout
shift wasn't coming from the card hover; it was the pin slot
appearing and disappearing.
* fix(agents): quieten row hover — border-tint only, no lift, no shadow
Drop the `-translate-y-px` and `hover:shadow-md` from the row card
plus the working-state inner ring. The translate + shadow grow
combination was visibly noisy as the cursor moved through the rail —
each row 'lifted' as you passed over it. Hover now just tints the
border in accent-orange/30; working and error states keep their
distinct border colours but no inner ring. Card height and shadow
stay constant in every state, so the rail reads as a calm vertical
list of cards.
* feat(home): rich Recent Agents grid + dead-code sweep
The /home Recent Agents grid was a placeholder shell. Every 'rich'
field on the card (lastMessage, lastMessageTimestamp, activitySummary,
currentTool, costUsd) was wired to undefined because AgentCommandHome
called `buildAgentCardData(agents, status?.status, undefined)` — the
dashboard arg has been hard-coded undefined since the harness
migration. Repointing the grid at `useHarnessAgents` + `useAgentAdapters`
gives every card the same enriched data the rail uses.
What the new card shows per agent:
• Adapter glyph tile + liveness dot (working pulses; asleep is
hollow; error is red)
• Name + Working pill (when active)
• Adapter · model · reasoning summary line, with an inline
Unavailable chip + HoverCard reason when the adapter binary
isn't on $PATH
• Italic last-user-message preview (line-clamp-2, leading quote
glyph) — same visual language as the rail
• Footer: 'X ago' + state chip (Asleep / Attention) OR a Resume
button (orange, with pulsing dot) when activeTurnId is non-null
Sort on the home grid is active-turn → recency. Pinning is NOT a
sort key here (and there's no pin indicator on the card) — pinning
belongs to the rail at /agents; the home page is action-oriented
and trusts active-turn + recency to surface the right agent.
Dead code removed:
• useAgentDashboard.ts (96 lines, no callers; subscribed to the
dead /claw/dashboard/stream from the OpenClaw-only era)
• useAgentCardData.ts (the dashboard-merge shim; passed undefined
every call so all enriched fields landed as undefined)
• AgentCard.tsx (AgentCardExpanded replaced by HomeAgentCard;
AgentCardCompact had no callers — the dock's compact mode was
never used)
• AgentCardData interface dropped from lib/agent-conversations/
types.ts; the new card consumes HarnessAgent directly
Visual language stays continuous between rail and grid: same
<AgentTile>, same <LivenessDot>, same italic-quote message
preview, same orange Resume button with a pulsing dot.
* chore(eval): instrument server startup to root-cause dev CI health-check timeouts
Three diagnostics + one config swap to investigate why the eval-weekly
workflow has been failing on dev since 2026-04-25 with "Server health
check timed out" (every worker, every retry).
Background:
- Last successful weekly eval on dev: 2026-04-18 (sha f5a2b73)
- Since then, ~30 server commits landed including Lima/VM runtime,
OpenClaw service, ACL system, ACP SDK — 108 server files changed,
~13K LOC added.
- Server process spawns cleanly in CI (PID logged) but never binds
/health within the 30s eval-side timeout. Static analysis finds no
obvious blocker; we need runtime evidence.
Changes:
1. apps/server/package.json — add `start:ci` script (no `--watch`).
The default `start` uses `bun --watch` which forks a child process
that watches every file in the import graph. Dev's graph is ~108
files larger than main's; on a cold CI runner the watcher setup is a
plausible source of multi-second startup overhead.
2. apps/eval/src/runner/browseros-app-manager.ts:
- Use `start:ci` when `process.env.CI` is set (true on
GitHub-hosted runners by default), else `start`.
- Capture per-worker server stderr to /tmp/browseros-server-logs/
instead of ignoring it. Without this we have no visibility into
why the server is hung pre-/health.
- Bump SERVER_HEALTH_TIMEOUT_MS 30s -> 90s. Dev's larger module
graph may simply need more cold-start time on CI.
3. .github/workflows/eval-weekly.yml — upload the server logs dir as a
workflow artifact (always, not just on success) so we can post-mortem
any startup failure on the next run.
4. configs/agisdk-real-smoke.json — swap K2.5 from OpenRouter ->
Fireworks (bypasses the OpenRouter per-key spend cap that has been
eating recent runs) and drop num_workers 10 -> 4 (well below the
Fireworks per-account TPM threshold that overwhelmed the original
2026-04-23 run).
Plan: trigger the eval-weekly workflow on this branch with the agisdk
config and observe (a) whether it gets past server startup, and
(b) if it doesn't, what the captured server stderr says.
* fix(eval): capture stdout too — pino logger writes to stdout, not stderr
Previous diagnostic patch only redirected stderr; the captured per-worker
log files came back as 0 bytes because the server uses pino which writes
all log output to stdout (fd 1), not stderr (fd 2). Capture both into
the same file.
* fix(server): catch sync throw from OpenClaw constructor on Linux
The container runtime constructor in OpenClawService throws synchronously
on non-darwin platforms, e.g. GitHub Actions Linux runners. The existing
.catch() on tryAutoStart() only handles async throws inside auto-start —
the sync throw from configureOpenClawService(...) itself propagates up
through Application.start() and crashes the process via index.ts:48
(process.exit(EXIT_CODES.GENERAL_ERROR)).
This is what's been killing dev's eval-weekly CI: the server crashes in
milliseconds, the eval client polls /health, gets nothing, times out.
Fix: wrap the configureOpenClawService call in try/catch matching the
existing .catch() intent (best-effort, don't crash). Server continues
without OpenClaw on platforms where it can't initialize.
Verified by reading captured server stdout from run 25123195126:
Failed to start server: error: browseros-vm currently supports macOS only
at buildContainerRuntime (container-runtime-factory.ts:54:11)
at new OpenClawService (openclaw-service.ts:652:15)
at configureOpenClawService (openclaw-service.ts:1527:19)
at start (main.ts:127:5)
* fix(server): defer OpenClaw chat client port lookup to request time
apps/server/src/api/server.ts:149 was calling getOpenClawService().getPort()
synchronously when constructing the OpenClawGatewayChatClient inside the
createHttpServer object literal. On non-darwin platforms this throws via
the OpenClawService constructor → buildContainerRuntime, escaping the
try/catch added in 5cf7b765 (which only protected the configureOpenClawService
call further down in main.ts).
Every other getOpenClawService() reference in server.ts is already wrapped
in an arrow function. This was the lone holdout. Make it lazy too: change
the chat client constructor to take getHostPort: () => number instead of
hostPort: number, evaluate it inside streamTurn at request time. Behavior
on darwin is unchanged.
This unblocks dev's eval-weekly CI on Linux runners where OpenClaw isn't
available — the chat endpoint isn't exercised by the eval, so a deferred
throw is acceptable.
* fix(server): allow Linux to skip OpenClaw via BROWSEROS_SKIP_OPENCLAW=1
Earlier surgical fixes (try/catch in main.ts, lazy chat client port) didn't
unblock dev's Linux CI — same throw kept reproducing. Whether this is bun
caching stale stack frames or a missed eager call site, the safer move is
to fix it at the root: make buildContainerRuntime never throw on Linux
when the runner has explicitly opted out.
Adds BROWSEROS_SKIP_OPENCLAW env check alongside the existing NODE_ENV=test
escape hatch in container-runtime-factory.ts. When set, returns the existing
UnsupportedPlatformTestRuntime stub — server boots normally, /health binds,
any actual OpenClaw API call still fails loudly at request time.
eval-weekly.yml sets the flag for the Linux runner. Darwin behavior and
non-CI Linux behavior unchanged (without the flag they still throw).
* feat(eval): align Clado action executor with new endpoint contract
David Shan shared the updated Clado BrowserOS Action Model spec.
Changes to match it:
- Bump endpoint URL + model id to the 000159-merged checkpoint
(clado-ai--clado-browseros-action-000159-merged-actionmod-f4a6ef)
in browseros-oe-clado-weekly.json and the README example.
- CLADO_REQUEST_TIMEOUT_MS 120s → 360s. Cold start can take ~5 min;
the 2-min ceiling was failing every cold-start request.
- Treat HTTP 200 with action=null / parse_error as an INVALID step
instead of aborting the executor loop. The model can self-correct
on the next call. Cap consecutive parse failures at 3 to avoid
infinite loops.
- Capture final_answer from end actions. Surface it in the observation
back to the orchestrator so its task answer can use the model's
declared result.
- Add macOS Cmd-* key mappings (M-a, M-c, M-v, M-x → Meta+A/C/V/X).
- Switch screenshot format from webp → png to match the documented
"PNG or JPEG" contract.
* chore(eval): refresh test-clado-api script for new Clado contract
Updated the local smoke-test to match the new Clado endpoint and
response contract:
- New action + health URLs (000159-merged checkpoint).
- Drop the grounding-model branch (orchestrator-executor doesn't
use it; the README David shared only documents the action model).
- Health-check waits up to 6 minutes for cold start with a 30s
warning so the operator knows it's spinning up.
- Print every documented response field (action, x/y, text, key,
direction, amount, drag start/end, time, final_answer, thinking,
parse_error, inference_time_seconds).
- Three-step run that exercises a click, a typing continuation
with formatted history, and an end+final_answer probe.
* chore(eval): point clado weekly config at agisdk-real
Switches the orchestrator-executor + Clado weekly config to run on the
AGI SDK / REAL Bench task set with the deterministic agisdk_state_diff
grader. Matches the orchestrator-executor smoke target (Fireworks K2.5
orchestrator + Clado action executor) we want to track week-over-week.
* chore(eval): run clado weekly headless
Default to headless so the weekly job (and local repros) don't pop ten
visible Chrome windows. Set headless=false locally if you need to watch
a worker.
* fix(eval): address Greptile P1+P2 on server log fd handling
P1: openSync was outside the mkdirSync try/catch, so a swallowed mkdir
failure (e.g. unwritable custom BROWSEROS_SERVER_LOG_DIR) would leave the
log directory missing and crash the server spawn with ENOENT. Move openSync
into the same try block; fall back to /dev/null so spawn always succeeds.
P2: the log fd was opened on every server start but never closed. Each
restart attempt leaked one fd across all workers — over a long eval run
that could exhaust the process fd limit. Track the fd on the manager and
closeSync it in killApp() right after the server process exits (the child's
dup keeps the file open until it exits, so we don't truncate output).
* feat(agent): list created agents in sidepanel target catalog
* feat(agent): show created agents in sidepanel selector
* feat(server): add sidepanel chat route for created agents
* feat(agent): route sidepanel agent sends by agent id
* chore(agent): retire virtual sidepanel acp targets
* fix: address review feedback for PR #865