* fix: compaction config for small context windows (≤32K)
Raise COMPACTION_SMALL_CONTEXT_WINDOW from 16K to 32K so models like
Haiku 4.5 (30K context) use proportional 50% reserve instead of the
fixed 20K reserve. Also scale fixedOverhead for small contexts (capped
at 40% of context window) to prevent the doom loop where overhead alone
triggers compaction on every step.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add compaction tuning guidance to limits constants
Explain the relationship between SMALL_CONTEXT_WINDOW and
FIXED_OVERHEAD so devs know the 24K minimum constraint when
tweaking these values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add 2-stage pruning to compaction pipeline before LLM summarization
Add two new lightweight stages to the compaction prepareStep pipeline that
recover context tokens cheaply before falling back to expensive LLM
summarization:
- Stage 2: Use AI SDK's pruneMessages to remove old tool call/result
pairs beyond the last 6 messages entirely
- Stage 3: Replace remaining tool output values with short placeholders
("[Cleared — N chars]") while preserving tool call structure and IDs
Both stages re-estimate tokens from message content (not stale step
usage) after modifying messages. The existing LLM summarization and
sliding window fallback remain as Stage 4.
Also adds estimateTokensForThreshold() helper, clearToolOutputs()
function, and COMPACTION_PRUNE_KEEP_RECENT_MESSAGES /
COMPACTION_CLEAR_OUTPUT_MIN_CHARS constants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: reorder compaction pipeline — truncate before clear, protect recent tools
- Stage 0: Check threshold, return untouched when under (no data loss)
- Stage 1: Prune old tool call/result pairs beyond last 6 messages
- Stage 2: Truncate large tool outputs to 15K chars (keeps partial content)
- Stage 3: Clear old tool outputs with placeholders, protect last 2
- Stage 4: LLM-based compaction with sliding window fallback
clearToolOutputs now accepts keepRecentCount parameter (default 2) to
skip the N most recent tool messages from clearing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: limits fixes
* fix: address review — preserve toKeep context, derive test values from constants
- When Stage 3 (clearToolOutputs) doesn't resolve overflow, pass
truncated (not cleared) messages to Stage 4 so toKeep retains
meaningful tool outputs for the agent's immediate context
- Add comment explaining intentional conservatism in post-prune
token estimation (step usage is stale, must re-estimate safely)
- Refactor computeConfig tests to derive expected values from
AGENT_LIMITS constants instead of hardcoding magic numbers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- truncateToolOutputs: handle all output.type variants (text, json,
content) by checking output.value directly instead of branching on
type. The old code missed type 'content' (array of content parts),
causing 1M+ char tool results to pass through untouched.
- estimateTokens: change chars/4 to chars/3 — HTML/Markdown content
tokenizes at ~3.14 chars/token empirically, not 4.
- COMPACTION_FIXED_OVERHEAD: 5K → 12K to account for system prompt
(~2.5K tokens) + tool definitions as JSON Schema (~8-9K tokens).
- Apply truncateToolOutputs in prepareStep (Stage 0) before token
estimation, not just during summarization.
* fix: robust compaction with Pi-style token counting + overflow middleware
Root cause: getCurrentTokenCount() returned stale inputTokens from the
previous step, ignoring new tool results added to messages since that
step. A large tool output (DOM snapshot, page content) caused a token
jump that bypassed the compaction threshold check, leading to
context_length_exceeded errors (322K tokens sent, model max 262K).
Layer 1 — Accurate token counting (proactive):
- Adopt Pi coding agent's additive approach: base(inputTokens) +
outputTokens + estimate(trailing tool results)
- Trailing tool results are estimated by walking backwards from end of
messages array until a non-tool message is found
- Falls back to full estimation with safety multiplier when no real
usage data is available (first step of a turn)
Layer 2 — Context overflow middleware (reactive):
- LanguageModelV3Middleware that wraps doGenerate/doStream
- Catches context_length_exceeded errors at the model call level
- Truncates prompt (keeps system messages + most recent non-system
messages targeting 60% of context window)
- Retries the model call once
Verified end-to-end with real model (Gemini Flash Lite via OpenRouter)
on 16K context window: 4 compactions triggered correctly across 8
steps, no context_length_exceeded errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: adopt Pi-style overflow detection patterns + fix truncation edge case
- Replace 6 generic substring matches with 17 provider-specific regex
patterns from Pi coding agent (Anthropic, OpenAI, Google, xAI, Groq,
OpenRouter, Bedrock, Copilot, llama.cpp, LM Studio, MiniMax, Kimi,
Mistral, z.ai)
- Fix truncatePrompt edge case: when the last message alone exceeds the
target, keepFrom was never updated → empty non-system messages. Now
always keeps at least the most recent non-system message.
- Add runtime guard for LanguageModelV3 cast in ai-sdk-agent.ts
- Add tests for false-positive rejection and truncation edge case
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add get_dom and search_dom tools for HTML DOM inspection
Add two new observation tools:
- get_dom: Returns raw HTML of a page or scoped element via CSS selector
- search_dom: Fuzzy searches DOM elements by text, attributes, IDs, and
class names using Fuse.js with extended search syntax support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use CDP DOM protocol instead of injected scripts for DOM tools
Replace Runtime.evaluate-based approach with native CDP DOM methods:
- get_dom uses DOM.getDocument + DOM.querySelector + DOM.getOuterHTML
- search_dom uses DOM.performSearch + DOM.getSearchResults + DOM.describeNode
- Remove fuse.js dependency (CDP performSearch handles text/CSS/XPath natively)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add comprehensive tests for get_dom and search_dom tools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve text nodes to parent elements in searchDom
CDP performSearch returns text nodes (nodeType 3) for plain text queries.
describeNode does not populate parentId, so use resolveNode + callFunctionOn
to get parentElement, then requestNode to obtain the parent's nodeId.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add limit bounds validation and searchId leak prevention
- Add .int().min(1).max(200) to search_dom limit parameter
- Wrap searchDom result processing in try/finally to ensure
discardSearchResults is always called
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Tests were passing raw Chrome tabIds to group_tabs and ungroup_tabs tools,
but the Zod schemas expect pageIds (MCP-layer page IDs). The tabIds field
was silently stripped during validation, causing both tests to fail.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add new CDP tools for links, hidden pages/windows, show/move
- get_page_links: extract deduplicated links from a page via evaluate
- new_hidden_page: open a hidden tab for background automation
- create_hidden_window: create a hidden window for background automation
- show_page: restore a hidden page back into a visible window
- move_page: move a tab to a different window or position
- Default includeLinks to false in get_page_content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: use AX tree for get_page_links, add tests, fix test scripts
- Refactor get_page_links to use accessibility tree instead of raw JS
evaluate — more reliable for role="link" elements and shadow DOM
- Add extractLinkNodes() to snapshot.ts and getPageLinks() to browser.ts
- Add tests for get_page_links (constructed HTML with dedup/filtering),
new_hidden_page, show_page, move_page, create_hidden_window
- Fix root package.json test scripts to match server's actual scripts
- Update CLAUDE.md test docs to reflect current structure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add biome-ignore comments for noExcessiveCognitiveComplexity on compaction.ts
and grep.ts, and noExplicitAny on filesystem test helpers.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: generalized compaction prompts with split turn handling
Replace browser-specific XML prompts with domain-agnostic markdown format.
Add split turn detection and parallel summarization for large single-turn
conversations. Switch compaction from generateText to streamText for
Fireworks API compatibility. Add comprehensive unit and E2E tests (84 total).
* fix: address code review issues for compaction (PR #391)
Enforce COMPACTION_MAX_SUMMARIZATION_INPUT cap, extract shared
callSummarizer helper, add runtime type guard for experimental_context,
move magic constants to AGENT_LIMITS, and remove dead constants.
* fix: cap truncatedTurnPrefix input to maxSummarizationInput
Apply the same sliding window cap to turn prefix messages that was
already applied to toSummarize, preventing unbounded LLM input for
long single-turn conversations with many tool calls.
* fix: reduce browseros-auto default context window to 200K
The 400K setting caused compaction to trigger at ~383K, but the actual
model limit is 262K. Conversations hit the hard limit before compaction
could kick in.
* feat: replace pi-mono filesystem tools with native Bun/Node.js implementation
Remove @mariozechner/pi-coding-agent and @mariozechner/pi-agent-core
dependencies that caused bun compile issues (tree traversal, package.json
resolution). Reimplement all 7 filesystem tools (read, write, edit, bash,
grep, find, ls) using only Bun and Node.js built-in libraries.
- No external binary dependencies (no ripgrep, fd, etc.)
- Cross-platform: Linux, macOS, Windows
- 107 tests covering all tools and utilities
- Pure JS grep/find using Bun.Glob and async directory walking
* fix: add explicit ENOENT handling in grep tool stat() call
* feat: update to support more klavis MCP servers
* fix: minor icon fix
* fix: normalize klavis mcp auth flow compatibility
* feat: add API key auth flow for Klavis MCP servers
Servers that use API key authentication (Stripe, Cloudflare, Brave
Search, Exa, Mem0, Resend, Mixpanel, PostHog, Postman, Zendesk,
Intercom) were failing with "Failed to add app" because the frontend
only handled OAuth flows. This adds the complete API key auth path:
- Backend: apiKeyUrls in StrataCreateResponse, submitApiKey() method,
/servers/submit-api-key route
- Frontend: ApiKeyDialog component, useSubmitApiKey hook, ConnectMCP
updated to show dialog for API-key servers instead of opening OAuth
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove broken success check in Klavis submitApiKey
The Klavis /mcp-server/instance/set-auth endpoint returns
{ message: "Authentication updated successfully." } without a
success field. Our code checked `data.success` which was always
undefined, causing API key auth to fail even when Klavis accepted
the key. The request() method already throws on non-2xx responses,
so the explicit check was redundant and incorrect.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* chore: refactoring
* fix: return all response parts from tool execution
Previously, handleToolExecution only returned responseParts[0], causing
data loss when tools returned multiple parts. This fix:
- Changes ToolExecutionResult.part to ToolExecutionResult.parts (array)
- Returns all responseParts instead of just the first one
- Spreads all parts into toolResponseParts in processToolRequests
* feat: agent-sdk outline
* feat: unit tests for agent-sdk
* feat: implement /sdk routes
* feat: integration test for agent-sdk with server
* feat: ENV to disble headless mode for testing
* feat: act() integration test working
* chore: refactor package/shared to have constants/ and /types separately
* feat: verify() and extract() sdk APIs
* feat: extract() use remote endpoint for extraction
* feat: verify() implemented - lazy parsing to avoid strong schema checks
* fix: remove generateStructuredOutput as not models support it
* fix: clean-up LLM types and use zod schema
* fix: typecheck vitetest error
* fix: remove directly calling GeminiAgent in sdk act()
* fix: lefthook for refactor warning
* fix: refactor routes/sdk to move business logic out
* chore: fix monorepo setup
1) use single .env.development file at the root
2) update package.json to contain commands to start server and agent
3) rename "Assistant" package name to "agent"
4) rename HTTP_MCP_PORT to SERVER_PORT
* chore: update README
* chore: update .env.example
* fix: clean-up old docs
* feat: refactored test utils
* fix: clean-up dev scripts and move to scripts/dev
* fix: clean-up script
* fix: refactor tests into properly controller tests and cdp tests
* feat: import all the missing tests before refactor
* fix: biome errors for tests
* fix: few type errors and add exceptiosn
* fix: few more type errors
* fix: remove agent port from tests
* fix: exclude tests from tsconfig, bun run tests natively
* fix: mcpServer test now waits for extension connected
* feat: refactor packages into single project
* feat: created apps directory
* chore: removed duplicate packages
* fix: delete package-lock.json
since project uses bun