BrowserOS

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-20 20:39:10 +00:00

Author	SHA1	Message	Date
Nikhil	a824078f6d	fix: compaction config for small context windows (≤32K) (#466 ) * fix: compaction config for small context windows (≤32K) Raise COMPACTION_SMALL_CONTEXT_WINDOW from 16K to 32K so models like Haiku 4.5 (30K context) use proportional 50% reserve instead of the fixed 20K reserve. Also scale fixedOverhead for small contexts (capped at 40% of context window) to prevent the doom loop where overhead alone triggers compaction on every step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add compaction tuning guidance to limits constants Explain the relationship between SMALL_CONTEXT_WINDOW and FIXED_OVERHEAD so devs know the 24K minimum constraint when tweaking these values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 18:12:20 -07:00
Nikhil	2d6d08c9fe	fix: move tool-result media normalization into agent (#460 ) * fix: sanitize media during compaction * fix: normalize content outputs in compaction helpers * fix: move tool-result media normalization into agent * chore: rename compaction orchestrator file	2026-03-10 17:21:09 -07:00
Nikhil	f81e73f6a4	fix: avoid crashing on controller startup failure (#458 ) * fix: avoid crashing on controller startup failure * fix: address PR review comments for remove_controller_startup_crash	2026-03-10 11:53:11 -07:00
Nikhil	4fc68b5264	feat: use execution dir for tool temp output (#456 ) * feat: use execution dir for tool temp output * fix: harden execution dir temp staging * refactor: use temp files for transient tool output	2026-03-10 10:57:00 -07:00
Nikhil	5b27933c63	feat: add 2-stage pruning to compaction pipeline (#455 ) * feat: add 2-stage pruning to compaction pipeline before LLM summarization Add two new lightweight stages to the compaction prepareStep pipeline that recover context tokens cheaply before falling back to expensive LLM summarization: - Stage 2: Use AI SDK's pruneMessages to remove old tool call/result pairs beyond the last 6 messages entirely - Stage 3: Replace remaining tool output values with short placeholders ("[Cleared — N chars]") while preserving tool call structure and IDs Both stages re-estimate tokens from message content (not stale step usage) after modifying messages. The existing LLM summarization and sliding window fallback remain as Stage 4. Also adds estimateTokensForThreshold() helper, clearToolOutputs() function, and COMPACTION_PRUNE_KEEP_RECENT_MESSAGES / COMPACTION_CLEAR_OUTPUT_MIN_CHARS constants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: reorder compaction pipeline — truncate before clear, protect recent tools - Stage 0: Check threshold, return untouched when under (no data loss) - Stage 1: Prune old tool call/result pairs beyond last 6 messages - Stage 2: Truncate large tool outputs to 15K chars (keeps partial content) - Stage 3: Clear old tool outputs with placeholders, protect last 2 - Stage 4: LLM-based compaction with sliding window fallback clearToolOutputs now accepts keepRecentCount parameter (default 2) to skip the N most recent tool messages from clearing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: limits fixes * fix: address review — preserve toKeep context, derive test values from constants - When Stage 3 (clearToolOutputs) doesn't resolve overflow, pass truncated (not cleared) messages to Stage 4 so toKeep retains meaningful tool outputs for the agent's immediate context - Add comment explaining intentional conservatism in post-prune token estimation (step usage is stale, must re-estimate safely) - Refactor computeConfig tests to derive expected values from AGENT_LIMITS constants instead of hardcoding magic numbers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:41:34 -07:00
Nikhil	15755a84d9	feat: use execution dir in browser tool context (#453 )	2026-03-10 09:38:36 -07:00
Nikhil	7d20768d8e	feat: persist large tool outputs to disk (#452 ) * feat: persist large tool outputs to disk * fix: address PR review comments for tool output limits * chore: raise filesystem read line limit to 500	2026-03-10 09:25:19 -07:00
shivammittal274	44071cb0f4	fix: fix compaction tool output truncation and token estimation (#448 ) - truncateToolOutputs: handle all output.type variants (text, json, content) by checking output.value directly instead of branching on type. The old code missed type 'content' (array of content parts), causing 1M+ char tool results to pass through untouched. - estimateTokens: change chars/4 to chars/3 — HTML/Markdown content tokenizes at ~3.14 chars/token empirically, not 4. - COMPACTION_FIXED_OVERHEAD: 5K → 12K to account for system prompt (~2.5K tokens) + tool definitions as JSON Schema (~8-9K tokens). - Apply truncateToolOutputs in prepareStep (Stage 0) before token estimation, not just during summarization.	2026-03-10 02:39:54 +05:30
shivammittal274	3808faf94d	fix: robust compaction with Pi-style token counting + overflow middle… (#444 ) * fix: robust compaction with Pi-style token counting + overflow middleware Root cause: getCurrentTokenCount() returned stale inputTokens from the previous step, ignoring new tool results added to messages since that step. A large tool output (DOM snapshot, page content) caused a token jump that bypassed the compaction threshold check, leading to context_length_exceeded errors (322K tokens sent, model max 262K). Layer 1 — Accurate token counting (proactive): - Adopt Pi coding agent's additive approach: base(inputTokens) + outputTokens + estimate(trailing tool results) - Trailing tool results are estimated by walking backwards from end of messages array until a non-tool message is found - Falls back to full estimation with safety multiplier when no real usage data is available (first step of a turn) Layer 2 — Context overflow middleware (reactive): - LanguageModelV3Middleware that wraps doGenerate/doStream - Catches context_length_exceeded errors at the model call level - Truncates prompt (keeps system messages + most recent non-system messages targeting 60% of context window) - Retries the model call once Verified end-to-end with real model (Gemini Flash Lite via OpenRouter) on 16K context window: 4 compactions triggered correctly across 8 steps, no context_length_exceeded errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: adopt Pi-style overflow detection patterns + fix truncation edge case - Replace 6 generic substring matches with 17 provider-specific regex patterns from Pi coding agent (Anthropic, OpenAI, Google, xAI, Groq, OpenRouter, Bedrock, Copilot, llama.cpp, LM Studio, MiniMax, Kimi, Mistral, z.ai) - Fix truncatePrompt edge case: when the last message alone exceeds the target, keepFrom was never updated → empty non-system messages. Now always keeps at least the most recent non-system message. - Add runtime guard for LanguageModelV3 cast in ai-sdk-agent.ts - Add tests for false-positive rejection and truncation edge case Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 14:22:35 +05:30
Nikhil	2e79933cae	refactor: flatten server agent directory (#435 )	2026-03-06 16:07:14 -08:00
Nikhil	bc53ff52e5	feat: remove legacy /chat endpoint (#428 ) * feat: remove legacy chat endpoint alias * refactor: rename chat-v2 to chat	2026-03-06 09:29:42 -08:00
Nikhil	e37d19da51	feat: add structured MCP tool outputs and schemas (#420 ) * feat: add structured MCP outputs for browser tools * fix: address PR review comments for mcp_structured_content	2026-03-05 13:19:01 -08:00
Nikhil	52570bd6aa	feat: make server tests use dynamic browser runtime allocation (#416 ) * feat: use dynamic runtime allocation for server test browser startup * fix: address PR review comments for sdk_test_dev_runner_migration	2026-03-05 11:19:31 -08:00
Nikhil	ae2c216321	feat: add get_dom and search_dom tools (#398 ) * feat: add get_dom and search_dom tools for HTML DOM inspection Add two new observation tools: - get_dom: Returns raw HTML of a page or scoped element via CSS selector - search_dom: Fuzzy searches DOM elements by text, attributes, IDs, and class names using Fuse.js with extended search syntax support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: use CDP DOM protocol instead of injected scripts for DOM tools Replace Runtime.evaluate-based approach with native CDP DOM methods: - get_dom uses DOM.getDocument + DOM.querySelector + DOM.getOuterHTML - search_dom uses DOM.performSearch + DOM.getSearchResults + DOM.describeNode - Remove fuse.js dependency (CDP performSearch handles text/CSS/XPath natively) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add comprehensive tests for get_dom and search_dom tools Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve text nodes to parent elements in searchDom CDP performSearch returns text nodes (nodeType 3) for plain text queries. describeNode does not populate parentId, so use resolveNode + callFunctionOn to get parentElement, then requestNode to obtain the parent's nodeId. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add limit bounds validation and searchId leak prevention - Add .int().min(1).max(200) to search_dom limit parameter - Wrap searchDom result processing in try/finally to ensure discardSearchResults is always called Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 11:35:09 -08:00
Nikhil	20bb4cb21e	fix: use pageIds instead of tabIds in tab group tests (#397 ) Tests were passing raw Chrome tabIds to group_tabs and ungroup_tabs tools, but the Zod schemas expect pageIds (MCP-layer page IDs). The tabIds field was silently stripped during validation, causing both tests to fail. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:54:49 -08:00
Nikhil	14ab8fe97e	feat: add new CDP tools and improve tool ergonomics (#396 ) * feat: add new CDP tools for links, hidden pages/windows, show/move - get_page_links: extract deduplicated links from a page via evaluate - new_hidden_page: open a hidden tab for background automation - create_hidden_window: create a hidden window for background automation - show_page: restore a hidden page back into a visible window - move_page: move a tab to a different window or position - Default includeLinks to false in get_page_content Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: use AX tree for get_page_links, add tests, fix test scripts - Refactor get_page_links to use accessibility tree instead of raw JS evaluate — more reliable for role="link" elements and shadow DOM - Add extractLinkNodes() to snapshot.ts and getPageLinks() to browser.ts - Add tests for get_page_links (constructed HTML with dedup/filtering), new_hidden_page, show_page, move_page, create_hidden_window - Fix root package.json test scripts to match server's actual scripts - Update CLAUDE.md test docs to reflect current structure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:38:23 -08:00
Nikhil	f449162699	fix: suppress biome lint warnings with biome-ignore directives (#395 ) Add biome-ignore comments for noExcessiveCognitiveComplexity on compaction.ts and grep.ts, and noExplicitAny on filesystem test helpers. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:12:07 -08:00
shivammittal274	de52afbc55	feat: generalized compaction prompts with split turn handling (#391 ) * feat: generalized compaction prompts with split turn handling Replace browser-specific XML prompts with domain-agnostic markdown format. Add split turn detection and parallel summarization for large single-turn conversations. Switch compaction from generateText to streamText for Fireworks API compatibility. Add comprehensive unit and E2E tests (84 total). * fix: address code review issues for compaction (PR #391) Enforce COMPACTION_MAX_SUMMARIZATION_INPUT cap, extract shared callSummarizer helper, add runtime type guard for experimental_context, move magic constants to AGENT_LIMITS, and remove dead constants. * fix: cap truncatedTurnPrefix input to maxSummarizationInput Apply the same sliding window cap to turn prefix messages that was already applied to toSummarize, preventing unbounded LLM input for long single-turn conversations with many tool calls. * fix: reduce browseros-auto default context window to 200K The 400K setting caused compaction to trigger at ~383K, but the actual model limit is 262K. Conversations hit the hard limit before compaction could kick in.	2026-03-03 17:20:18 +05:30
Nikhil	91cb0300d4	fix: make CDP discovery resilient on localhost-only setups (#378 ) * chore: bump server version * feat: add loopback fallback for cdp discovery	2026-02-28 13:56:56 -08:00
Nikhil	47a70b43de	feat: improve scroll reliability and tool response latency (#374 ) * feat: improve scroll reliability and tool response latency * fix: address PR review comments for fix_scroll_tool	2026-02-27 09:24:29 -08:00
Nikhil	1cba45e7b7	fix: stabilize cdp connect and reconnect lifecycle (#373 ) * chore: bump server * fix: harden cdp connect and reconnect flow	2026-02-26 20:48:41 -08:00
Nikhil	e02ba395f9	feat: fix input key (#370 ) * feat: fix input key * fix: more tests	2026-02-26 18:22:53 -08:00
Nikhil	19c4175631	feat: replace pi-mono filesystem tools with native implementation (#366 ) * feat: replace pi-mono filesystem tools with native Bun/Node.js implementation Remove @mariozechner/pi-coding-agent and @mariozechner/pi-agent-core dependencies that caused bun compile issues (tree traversal, package.json resolution). Reimplement all 7 filesystem tools (read, write, edit, bash, grep, find, ls) using only Bun and Node.js built-in libraries. - No external binary dependencies (no ripgrep, fd, etc.) - Cross-platform: Linux, macOS, Windows - 107 tests covering all tools and utilities - Pure JS grep/find using Bun.Glob and async directory walking * fix: add explicit ENOENT handling in grep tool stat() call	2026-02-26 14:56:25 -08:00
Nikhil	cb8aa6c60e	feat: fix new cdp tests for tools (#358 ) * feat: new tools tests * fix: lint warnings by disabling or TODO * fix: minore update to branch cleaner	2026-02-23 16:08:34 -08:00
Nikhil	81a6d20fe8	feat: cdp tools (#353 ) * feat: unified CDP + controller tools architecture Merge CDP and controller tools into a single Browser abstraction with backend-agnostic tool definitions. Replaces old separate cdp/controller tool registries with unified registry, adds new tools (bookmarks, tab-groups, history, keyboard, mouse, snapshot, content-markdown). * feat: fix bookmarks and history, move browseros-info tool * chore: bump server version * fix: increase console truncate limit * fix: previous conversation fix * chore: bump server version * fix: tab-group cdp * fix: update types based on pdl * fix: enable tab grouping * fix: prompt enable tab grouping * chore: bump server version	2026-02-23 07:28:45 -08:00
Felarof	2e1fc2e8f9	feat: add API key auth flow for Klavis MCP servers (#343 ) * feat: update to support more klavis MCP servers * fix: minor icon fix * fix: normalize klavis mcp auth flow compatibility * feat: add API key auth flow for Klavis MCP servers Servers that use API key authentication (Stripe, Cloudflare, Brave Search, Exa, Mem0, Resend, Mixpanel, PostHog, Postman, Zendesk, Intercom) were failing with "Failed to add app" because the frontend only handled OAuth flows. This adds the complete API key auth path: - Backend: apiKeyUrls in StrataCreateResponse, submitApiKey() method, /servers/submit-api-key route - Frontend: ApiKeyDialog component, useSubmitApiKey hook, ConnectMCP updated to show dialog for API-key servers instead of opening OAuth Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove broken success check in Klavis submitApiKey The Klavis /mcp-server/instance/set-auth endpoint returns { message: "Authentication updated successfully." } without a success field. Our code checked `data.success` which was always undefined, causing API key auth to fail even when Klavis accepted the key. The request() method already throws on non-2xx responses, so the explicit check was redundant and incorrect. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 19:31:59 +05:30
Nikhil	06dd421776	feat: remove watchdog (#283 )	2026-01-30 09:57:54 -08:00
shivammittal274	ec91d69b1d	feat: added bookmarks tool and instructions in prompt (#276 ) * feat: added bookmarks tool and instructions in prompt * feat: added bookmarks tool and instructions in prompt	2026-01-27 09:56:57 -08:00
Nikhil	69e159f886	feat: new /shutodwn route + refactor (#281 ) * feat: /shutdown API * fix: rename extension status to status	2026-01-27 09:45:01 -08:00
Nikhil	ad4e391b9c	feat: health watch to self terminate process on crash (#256 ) * feat: health watch to self terminate process on crash * feat: add tests	2026-01-20 16:09:57 -08:00
Nikhil Sonti	63c89c1712	fix: import clean-up + unit test for transformCode	2026-01-20 10:36:17 -08:00
Nikhil	6f30dc748e	fix: improve graph execution (#246 ) * fix: [remove] debug logs * feat: add stateful act() support * fix: [TMP] always load tmp/current_code * feat: interactive snapshot structured content and adding that api in browseros service for sdk * fix: verify pass interactive elements * feat: refactored agent sdk with act having verify options * fix: verify uses simplified snapshot * fix: remove testing code, lint fixes * fix: remove debug logs	2026-01-19 16:58:22 -08:00
Nikhil	eacdfaf579	feat: config + codegen env handling (#242 ) * feat: better INLINE & PROD env handling * chore: bump server version * feat: refactor config ts better	2026-01-16 16:53:06 -08:00
Nikhil	752f4319b6	feat: refactor better structure for apps/server (#213 )	2026-01-12 15:47:16 -08:00
shivammittal274	940bdebaaf	chore: refactoring linting (#186 ) * chore: refactoring * fix: return all response parts from tool execution Previously, handleToolExecution only returned responseParts[0], causing data loss when tools returned multiple parts. This fix: - Changes ToolExecutionResult.part to ToolExecutionResult.parts (array) - Returns all responseParts instead of just the first one - Spreads all parts into toolResponseParts in processToolRequests	2026-01-08 09:05:50 -08:00
Nikhil	3b838d0f94	feat: remove index.ts pattern (#177 )	2026-01-07 15:57:26 -08:00
shivammittal274	8b8c81eb74	chore: agent sdk release (#172 ) * chore: agent sdk release: * chore: agent sdk release * chore: agent sdk release	2026-01-07 18:45:01 +05:30
Nikhil	afddda015a	feat: fix imports to remove .js (#170 ) * fix: remove all .js in imports * fix: update claude mode to use right import * fix: remove addition in main package.json	2026-01-06 10:54:26 -08:00
Nikhil	d1561df83c	feat: refactor for consistent file-names in apps/server (#155 ) * feat: rename files following kebab case * chore: add claude.md with filename instructions	2026-01-02 16:47:05 -08:00
Nikhil	3a370ce27d	fix: test helpers extension timeout (#154 )	2026-01-02 16:29:05 -08:00
Nikhil	f66fdae2c1	feat: fix all typescript errors and biome errors (#153 ) * feat: fix all typescript errors and biome errors * fix: address review feedback	2026-01-02 15:36:08 -08:00
Nikhil	47b9c1894d	feat: implement agent-sdk (#145 ) * feat: agent-sdk outline * feat: unit tests for agent-sdk * feat: implement /sdk routes * feat: integration test for agent-sdk with server * feat: ENV to disble headless mode for testing * feat: act() integration test working * chore: refactor package/shared to have constants/ and /types separately * feat: verify() and extract() sdk APIs * feat: extract() use remote endpoint for extraction * feat: verify() implemented - lazy parsing to avoid strong schema checks * fix: remove generateStructuredOutput as not models support it * fix: clean-up LLM types and use zod schema * fix: typecheck vitetest error * fix: remove directly calling GeminiAgent in sdk act() * fix: lefthook for refactor warning * fix: refactor routes/sdk to move business logic out	2026-01-01 17:38:40 -08:00
Felarof	1044888d9a	feat: fix mono repo setup (#139 ) * chore: fix monorepo setup 1) use single .env.development file at the root 2) update package.json to contain commands to start server and agent 3) rename "Assistant" package name to "agent" 4) rename HTTP_MCP_PORT to SERVER_PORT * chore: update README * chore: update .env.example	2025-12-30 11:39:55 -08:00
Nikhil	ee14a0841c	feat: created shared/ and move all constants to avoid magic numbers spread out (#126 ) * feat: create a shared workspace * feat: use constants from shared. No magic numbers spread out * fix: update claude.md	2025-12-25 15:22:26 -08:00
Nikhil	803ea51dbf	feat: fix tests and refactor (#125 ) * fix: clean-up old docs * feat: refactored test utils * fix: clean-up dev scripts and move to scripts/dev * fix: clean-up script * fix: refactor tests into properly controller tests and cdp tests	2025-12-25 14:32:45 -08:00
Nikhil	742c349f86	feat: import missing tests (#124 ) * feat: import all the missing tests before refactor * fix: biome errors for tests * fix: few type errors and add exceptiosn * fix: few more type errors * fix: remove agent port from tests * fix: exclude tests from tsconfig, bun run tests natively * fix: mcpServer test now waits for extension connected	2025-12-25 13:34:10 -08:00
shivammittal274	ab362d828d	chore: improve integration test (#123 )	2025-12-25 09:33:42 -08:00
Dani Akash	038056161e	feat: setup biome as the new linter (#114 ) * feat: install biome * chore: remove eslint * chore: remove prettier * chore: fix lint issues * chore: added biome precommit hook	2025-12-23 21:58:41 +05:30
Dani Akash	0fc9741a5d	refactor: streamline monorepo structure (#112 ) * feat: refactor packages into single project * feat: created apps directory * chore: removed duplicate packages * fix: delete package-lock.json since project uses bun	2025-12-22 23:39:21 +05:30

49 Commits
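The token-counting and overflow-truncation behavior described in commits 3808faf94d (#444) and 44071cb0f4 (#448) can be sketched as follows. This is a minimal, hypothetical TypeScript sketch and not the repository's code. The following details come from those commit messages:

- a chars/3 token estimate;
- an additive count of last-step input tokens, plus output tokens, plus an estimate of the trailing tool results;
- a fallback to full estimation with a safety multiplier;
- a 60% context-window target that keeps system messages and always keeps at least the most recent non-system message.

The following are assumptions made for illustration:

- the simplified `Message` and `StepUsage` shapes;
- the 1.2 multiplier value;
- charging system-message tokens against the 60% budget.

```typescript
// Hypothetical simplified shapes; the real agent uses AI SDK message types.
type Role = "system" | "user" | "assistant" | "tool";
interface Message {
  role: Role;
  content: string;
}
interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// ~3 chars per token, per the #448 note on HTML/Markdown content.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}

// Additive count: real usage from the previous step plus an estimate of the
// tool results appended since then (the trailing run of tool messages).
function getCurrentTokenCount(
  messages: Message[],
  lastUsage: StepUsage | undefined,
  safetyMultiplier = 1.2, // assumed value
): number {
  if (!lastUsage) {
    // No real usage yet (first step of a turn): estimate everything.
    const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
    return Math.ceil(total * safetyMultiplier);
  }
  let trailing = 0;
  // Walk backwards until the first non-tool message.
  for (let i = messages.length - 1; i >= 0 && messages[i].role === "tool"; i--) {
    trailing += estimateTokens(messages[i].content);
  }
  return lastUsage.inputTokens + lastUsage.outputTokens + trailing;
}

// Keep all system messages plus the newest contiguous run of non-system
// messages that fits in targetRatio of the window; always keep the last one.
function truncatePrompt(
  messages: Message[],
  contextWindow: number,
  targetRatio = 0.6,
): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  if (rest.length === 0) return system;

  const systemTokens = system.reduce((s, m) => s + estimateTokens(m.content), 0);
  let budget = Math.floor(contextWindow * targetRatio) - systemTokens;

  // Edge case from #444: start from the last message unconditionally, so an
  // oversized final message still survives instead of yielding an empty list.
  let keepFrom = rest.length - 1;
  budget -= estimateTokens(rest[keepFrom].content);
  while (keepFrom > 0) {
    const cost = estimateTokens(rest[keepFrom - 1].content);
    if (cost > budget) break;
    budget -= cost;
    keepFrom--;
  }
  return [...system, ...rest.slice(keepFrom)];
}
```

This sketch does not handle orphaned tool results. For example, a kept tool message whose originating call was dropped stays in the output. A production version would likely need to drop orphans or keep call and result pairs together.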