* fix: robust compaction with Pi-style token counting + overflow middleware
Root cause: getCurrentTokenCount() returned stale inputTokens from the
previous step, ignoring new tool results added to messages since that
step. A large tool output (DOM snapshot, page content) caused a token
jump that bypassed the compaction threshold check, leading to
context_length_exceeded errors (322K tokens sent, model max 262K).
Layer 1 — Accurate token counting (proactive):
- Adopt Pi coding agent's additive approach: base(inputTokens) +
outputTokens + estimate(trailing tool results)
- Trailing tool results are estimated by walking backwards from end of
messages array until a non-tool message is found
- Falls back to full estimation with safety multiplier when no real
usage data is available (first step of a turn)
Layer 2 — Context overflow middleware (reactive):
- LanguageModelV3Middleware that wraps doGenerate/doStream
- Catches context_length_exceeded errors at the model call level
- Truncates prompt (keeps system messages + most recent non-system
messages targeting 60% of context window)
- Retries the model call once
Verified end-to-end with real model (Gemini Flash Lite via OpenRouter)
on 16K context window: 4 compactions triggered correctly across 8
steps, no context_length_exceeded errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: adopt Pi-style overflow detection patterns + fix truncation edge case
- Replace 6 generic substring matches with 17 provider-specific regex
patterns from Pi coding agent (Anthropic, OpenAI, Google, xAI, Groq,
OpenRouter, Bedrock, Copilot, llama.cpp, LM Studio, MiniMax, Kimi,
Mistral, z.ai)
- Fix truncatePrompt edge case: when the last message alone exceeds the
target, keepFrom was never updated → empty non-system messages. Now
always keeps at least the most recent non-system message.
- Add runtime guard for LanguageModelV3 cast in ai-sdk-agent.ts
- Add tests for false-positive rejection and truncation edge case
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: return element coordinates in tool responses and DPR in screenshots
- click, hover, fill, drag now return resolved coordinates in response text
- take_screenshot returns devicePixelRatio for mapping coordinates to pixels
- Coordinates are in CSS pixels; multiply by DPR to get screenshot pixels
* fix: use Promise.allSettled in screenshot to prevent DPR eval from aborting capture
Runtime.evaluate for devicePixelRatio can fail on PDF pages or
chrome-extension pages. Using Promise.allSettled ensures the screenshot
still succeeds, falling back to DPR=1.
* fix: anchor agent to active tab page ID from browser context
Generalize the scheduled-task page anchoring instruction to all tasks.
The agent now always uses the page ID from Browser Context instead of
calling get_active_page or list_pages, preventing it from operating
on the wrong tab.
* fix: add chatMode guard and scope windowLine to scheduled tasks
- Skip page-context section in chat mode where list_pages is allowed
- Only show windowId instruction for scheduled tasks (hidden window)
* feat: integrate models.dev registry for auto-populated model defaults
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: fall back to upstream provider for model registry lookup
When the browseros meta-provider is used, the registry lookup now
also tries the upstream provider (e.g., openrouter, anthropic) so
that BrowserOS-hosted models get correct context window and image
support defaults.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add Object.hasOwn guards to prevent prototype chain lookup
Addresses Greptile review: bracket notation on the registry object
could return prototype-chain properties for keys like __proto__ or
constructor, bypassing the 404 guard in the route handler.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add server-level instructions that get injected into the LLM system
prompt when external MCP clients (Claude Desktop, Cursor, Gemini CLI)
connect. Covers browser automation workflow, Klavis integration
discovery, and auth flow guidance.
The AI SDK agent (v2) was allowing all 54 browser tools in chat mode,
while the Gemini agent correctly restricted to 6 read-only tools.
Extract CHAT_MODE_ALLOWED_TOOLS to a shared constant and filter
browser tools in AiSdkAgent.create() when chatMode is true.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: expose Klavis MCP tools to external MCP clients
Connect to Klavis Strata at server startup and register discovered tools
on each per-request McpServer instance. This lets external MCP clients
(Claude Code, Gemini CLI) access Klavis-proxied integrations (Gmail,
Slack, GitHub, etc.) alongside browser tools.
- Add register-klavis-mcp.ts with connectKlavisProxy() and registerKlavisTools()
- Wire KlavisProxyHandle through server.ts -> mcp routes -> mcp-server
- Use structured logging and proper type imports
* fix: forward Klavis tool schemas and add shutdown cleanup
- Use zod-from-json-schema to convert Strata's JSON Schema to Zod,
so MCP clients see proper parameter names, types, and required fields
- Close Klavis proxy transport on server shutdown
- Move per-request Klavis tool registration logging to debug level
- Use proper type imports instead of inline import() types
- Fix connectKlavisProxy return type (never returns null)
* fix: add timeout to Klavis MCP connect/listTools and log shutdown errors
* fix: clear timeout timer and pre-compute Klavis tool schemas at startup
* fix: use client.close() instead of transport.close() for proper cleanup
## Summary
- Add `VITE_PUBLIC_KIMI_LAUNCH` feature flag controlling Kimi partnership branding
- BrowserOS provider card shows "Powered by Kimi K2.5 from Moonshot AI" badge and "Extended usage limits for the next 2 weeks!" when flag is on
- Moonshot/Kimi highlighted as "Recommended" in provider templates
- LLM Hub defaults to Kimi, ChatGPT, Claude, Gemini (with legacy defaults migration)
- Kimi hub row shows "Powered by Moonshot AI" flare
- Model selector locked to kimi-k2.5
- "How to get a Kimi API key" link in provider dialog
- Moonshot provider fully integrated across frontend and backend
* fix: refactor SDK BrowserService to use Browser class directly
The tools system was completely rewritten with new tool names and response
formats. BrowserService was calling non-existent MCP tools (browser_get_active_tab,
browser_navigate, etc.) that returned structuredContent which no longer exists.
Replaced MCP HTTP client calls with direct Browser class method calls:
- getActiveTab → browser.getActivePage() / browser.listPages()
- getPageContent → browser.contentAsMarkdown()
- getScreenshot → browser.screenshot()
- navigate → browser.goto() with tabId/windowId resolution
- getPageLoadStatus → browser.listPages() with isLoading check
- getInteractiveElements → browser.snapshot() / browser.enhancedSnapshot()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — consistent tabId guard and remove dead PageContent type
- Change `if (tabId)` to `if (tabId !== undefined)` in navigate() to match
the guard style used for windowId and elsewhere in the file
- Remove orphaned PageContent interface no longer imported after refactor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add get_dom and search_dom tools for HTML DOM inspection
Add two new observation tools:
- get_dom: Returns raw HTML of a page or scoped element via CSS selector
- search_dom: Fuzzy searches DOM elements by text, attributes, IDs, and
class names using Fuse.js with extended search syntax support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use CDP DOM protocol instead of injected scripts for DOM tools
Replace Runtime.evaluate-based approach with native CDP DOM methods:
- get_dom uses DOM.getDocument + DOM.querySelector + DOM.getOuterHTML
- search_dom uses DOM.performSearch + DOM.getSearchResults + DOM.describeNode
- Remove fuse.js dependency (CDP performSearch handles text/CSS/XPath natively)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add comprehensive tests for get_dom and search_dom tools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve text nodes to parent elements in searchDom
CDP performSearch returns text nodes (nodeType 3) for plain text queries.
describeNode does not populate parentId, so use resolveNode + callFunctionOn
to get parentElement, then requestNode to obtain the parent's nodeId.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add limit bounds validation and searchId leak prevention
- Add .int().min(1).max(200) to search_dom limit parameter
- Wrap searchDom result processing in try/finally to ensure
discardSearchResults is always called
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Tests were passing raw Chrome tabIds to group_tabs and ungroup_tabs tools,
but the Zod schemas expect pageIds (MCP-layer page IDs). The tabIds field
was silently stripped during validation, causing both tests to fail.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add new CDP tools for links, hidden pages/windows, show/move
- get_page_links: extract deduplicated links from a page via evaluate
- new_hidden_page: open a hidden tab for background automation
- create_hidden_window: create a hidden window for background automation
- show_page: restore a hidden page back into a visible window
- move_page: move a tab to a different window or position
- Default includeLinks to false in get_page_content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: use AX tree for get_page_links, add tests, fix test scripts
- Refactor get_page_links to use accessibility tree instead of raw JS
evaluate — more reliable for role="link" elements and shadow DOM
- Add extractLinkNodes() to snapshot.ts and getPageLinks() to browser.ts
- Add tests for get_page_links (constructed HTML with dedup/filtering),
new_hidden_page, show_page, move_page, create_hidden_window
- Fix root package.json test scripts to match server's actual scripts
- Update CLAUDE.md test docs to reflect current structure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: move ChatV2Service to api/services layer and add resolvePageIds
Move ChatV2Service from agent/tool-loop/ to api/services/ where it
belongs as a service-layer concern. Add resolvePageIds() to convert
Chrome tab IDs to internal page IDs before they reach the agent,
fixing undefined pageId issues in browser automation tools.
Clean up server.ts by removing the USE_TOOL_AGENT flag, SessionManager,
and old chat route import — both /chat and /chat-v2 now directly use
createChatV2Routes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments for chat-v2-service
- Fix TOCTOU race: derive isNewSession inside the creation block
instead of separate has()/get() calls
- Log warning when resolvePageIds can't map a tab ID
- Deduplicate tab IDs with Set before resolving
- Remove redundant null check on session in onFinish
- Add license header
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update bun.lock
* fix: skip resolvePageIds for scheduled tasks to prevent pageId corruption
Scheduled tasks build browserContext with internal page IDs from
browser.newPage(), not Chrome tab IDs. The unconditional second
resolvePageIds() call was passing these internal IDs to resolveTabIds()
which expects Chrome tab IDs, causing the lookup to fail and overwrite
correct pageIds with undefined.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add biome-ignore comments for noExcessiveCognitiveComplexity on compaction.ts
and grep.ts, and noExplicitAny on filesystem test helpers.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: generalized compaction prompts with split turn handling
Replace browser-specific XML prompts with domain-agnostic markdown format.
Add split turn detection and parallel summarization for large single-turn
conversations. Switch compaction from generateText to streamText for
Fireworks API compatibility. Add comprehensive unit and E2E tests (84 total).
* fix: address code review issues for compaction (PR #391)
Enforce COMPACTION_MAX_SUMMARIZATION_INPUT cap, extract shared
callSummarizer helper, add runtime type guard for experimental_context,
move magic constants to AGENT_LIMITS, and remove dead constants.
* fix: cap truncatedTurnPrefix input to maxSummarizationInput
Apply the same sliding window cap to turn prefix messages that was
already applied to toSummarize, preventing unbounded LLM input for
long single-turn conversations with many tool calls.
* fix: reduce browseros-auto default context window to 200K
The 400K setting caused compaction to trigger at ~383K, but the actual
model limit is 262K. Conversations hit the hard limit before compaction
could kick in.
* fix: use fresh browser context for selected tabs on each message
Previously, session.browserContext (set on the first message) always
took precedence via the nullish coalescing operator. On subsequent
messages with different tab selections, the new selectedTabs from the
request were silently ignored.
Now normal messages always use request.browserContext so freshly
selected tabs are included. Scheduled tasks still use the stored
session context to preserve the hidden window's pageId/windowId.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use singleton transport for MCP route
MCP SDK 1.26.0 added a strict guard in Protocol.connect() that throws
"Already connected to a transport" if called when already connected.
The previous code created a new transport per request and called
connect() each time, causing every request after the first to fail
with -32603 Internal server error.
Move transport creation outside the request handler and add
isConnected() check per @hono/mcp docs pattern.
* fix: per-request MCP server+transport for SDK 1.26.0 compat
MCP SDK 1.26.0 patched a security vulnerability (GHSA-345p-7cg4-v4c7)
where sharing a singleton McpServer across requests could leak
cross-client response data via message ID collisions.
Create fresh McpServer + StreamableHTTPTransport per request:
no shared state, no race conditions, no ID collisions.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The agent had no knowledge of its working directory, so it couldn't
reference created files by absolute path or help users locate them.
Pass sessionExecutionDir into buildSystemPrompt for both AiSdkAgent
and GeminiAgent so the prompt includes a <workspace> section with
the resolved directory path.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Previously, session.browserContext (set on the first message) always
took precedence via the nullish coalescing operator. On subsequent
messages with different tab selections, the new selectedTabs from the
request were silently ignored.
Now normal messages always use request.browserContext so freshly
selected tabs are included. Scheduled tasks still use the stored
session context to preserve the hidden window's pageId/windowId.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore glow overlay for CDP-based tools
After migrating to CDP tools, glow broke because the hook looked for
input.tabId (controller tools) while CDP tools use input.page (pageId).
- Server: add getTabIdForPage() to Browser, include tabId in tool output
- Client: extract tabId from output, fall back to active Chrome tab
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use ToolResultMetadata for tabId resolution
Move tabId resolution from tool-adapter into the framework layer:
- response.ts: add ToolResultMetadata interface with tabId field
- framework.ts: auto-resolve pageId→tabId after tool execution
- tool-adapter.ts: just forward metadata (no domain logic)
This makes metadata available to all ToolResult consumers, not just
the AI SDK adapter, and the metadata bag is extensible for future fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add todo
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: replace pi-mono filesystem tools with native Bun/Node.js implementation
Remove @mariozechner/pi-coding-agent and @mariozechner/pi-agent-core
dependencies that caused bun compile issues (tree traversal, package.json
resolution). Reimplement all 7 filesystem tools (read, write, edit, bash,
grep, find, ls) using only Bun and Node.js built-in libraries.
- No external binary dependencies (no ripgrep, fd, etc.)
- Cross-platform: Linux, macOS, Windows
- 107 tests covering all tools and utilities
- Pure JS grep/find using Bun.Glob and async directory walking
* fix: add explicit ENOENT handling in grep tool stat() call
* feat: ensure scheduled tasks open in hidden tab
* fix: update scheduled task result in the UI
* fix: remove unnecessary useEffect
* fix: race condition with deleteSession