Dani Akash febaf58f91 fix: guard filesystem tools behind workspace selection and handle mid-conversation changes (#595)
* fix: remove filesystem tools when no workspace is selected

- Make workingDir optional on ResolvedAgentConfig
- Remove resolveSessionDir() fallback that always created a session dir,
  masking the no-workspace state and keeping filesystem tools available
- Gate buildFilesystemToolSet() on workingDir being defined
- Add workspace change detection mid-conversation — rebuilds the agent
  session when workspace is added, removed, or switched (same pattern
  as existing MCP server change detection)
- download_file falls back to tmpdir() when no workspace is set
- Memory/soul tools are unaffected — they use ~/BrowserOS/ paths

* fix: sanitize message history when session rebuilds with different tools

When a session is rebuilt due to workspace or MCP changes, the carried-over
message history may contain tool parts for tools that no longer exist in
the new session. The AI SDK validates messages against the current toolset
and rejects parts with no matching schema.

- Add toolNames getter to AiSdkAgent exposing registered tool names
- Add sanitizeMessagesForToolset() to strip tool parts referencing
  removed tools from carried-over messages
- Apply sanitization in both MCP and workspace session rebuilds

* fix: prepend tool-change context to user message on session rebuild

When workspace or MCP integrations change mid-conversation, prepend a
[Context: ...] block to the user's message explaining what changed.
This prevents the LLM from hallucinating tool usage based on patterns
in the carried-over conversation history.

Context messages vary by change type:
- Workspace removed: lists unavailable filesystem tools, suggests
  selecting a working directory
- Workspace added: confirms filesystem tools are available with path
- Workspace switched: notes the new working directory
- MCP changed: notes that some integration tools may have changed

Only fires on the first message after a rebuild. Invisible in the UI.

* fix: make MCP change context specific about which apps were added/removed

Diff the old and new MCP server keys to produce specific context like:
- "The following app integrations were disconnected: Gmail, Slack."
- "The following app integrations were connected: Linear."
instead of a generic "some tools may no longer be available" message.
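
A minimal sketch of the key diff described above, not the actual ChatService code. The helper name `describeMcpChange` and its signature are hypothetical; only the two message templates and the `[Context: ...]` wrapper come from the commit text.

```typescript
// Hypothetical helper: compares old and new MCP server keys and builds
// the context block that is prepended to the user's next message.
function describeMcpChange(oldKeys: string[], newKeys: string[]): string | null {
  const oldSet = new Set(oldKeys);
  const newSet = new Set(newKeys);

  // Keys present before but missing now were disconnected, and vice versa.
  const removed = oldKeys.filter((key) => !newSet.has(key));
  const added = newKeys.filter((key) => !oldSet.has(key));

  const sentences: string[] = [];
  if (removed.length > 0) {
    sentences.push(`The following app integrations were disconnected: ${removed.join(", ")}.`);
  }
  if (added.length > 0) {
    sentences.push(`The following app integrations were connected: ${added.join(", ")}.`);
  }

  // No difference means no context block is needed.
  return sentences.length > 0 ? `[Context: ${sentences.join(" ")}]` : null;
}
```

Returning `null` when nothing changed lets the caller skip the prefix entirely, which matches the "only fires on the first message after a rebuild" behavior.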

* refactor: extract shared rebuildSession helper in ChatService

Eliminates the duplicated 20-line dispose→create→sanitize→store flow
that existed separately in both the MCP and workspace change-detection
blocks.

Co-authored-by: Dani Akash <DaniAkash@users.noreply.github.com>

* test: add sanitizeMessagesForToolset test suite

Tests for the message sanitization that runs when a session rebuilds
with a different toolset (workspace or MCP change mid-conversation):

- Preserves messages with no tool parts
- Preserves tool parts when tool is in the toolset
- Strips tool parts when tool is NOT in the toolset
- Strips multiple removed tool parts from same message
- Keeps browser tools while removing filesystem tools
- Removes messages that become empty after stripping
- Preserves non-tool parts (reasoning, step-start, file)
- Returns same references when no filtering needed
- Handles empty message array and empty toolset
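
The behaviors listed above can be sketched roughly as follows. This is an illustration, not the repository's implementation: the `Part` and `Message` shapes are simplified stand-ins for the AI SDK message types, and the `tool-<name>` part-type convention is an assumption made for the example.

```typescript
// Simplified message shapes (assumed for this sketch; the real types are richer).
type Part = { type: string; [key: string]: unknown };
type Message = { id: string; role: "user" | "assistant" | "system"; parts: Part[] };

// Assumption: a tool part is identified by a "tool-<name>" type string.
const TOOL_PREFIX = "tool-";

function toolNameOf(part: Part): string | null {
  return part.type.startsWith(TOOL_PREFIX) ? part.type.slice(TOOL_PREFIX.length) : null;
}

function sanitizeMessagesForToolset(
  messages: Message[],
  toolNames: ReadonlySet<string>,
): Message[] {
  const result: Message[] = [];
  for (const message of messages) {
    // Keep non-tool parts and tool parts whose tool still exists.
    const kept = message.parts.filter((part) => {
      const name = toolNameOf(part);
      return name === null || toolNames.has(name);
    });
    if (kept.length === message.parts.length) {
      result.push(message); // nothing stripped: keep the same reference
    } else if (kept.length > 0) {
      result.push({ ...message, parts: kept }); // copy with stale tool parts removed
    }
    // Messages left with no parts after stripping are dropped.
  }
  return result;
}
```

Reusing the original object when nothing is stripped keeps the "returns same references" property cheap to verify with `===`.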

* style: fix biome formatting in chat-service.ts

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-03-27 18:30:25 +05:30

BrowserOS Server

MCP server and AI agent loop powering BrowserOS browser automation. This is the core backend: it connects to Chromium via CDP, exposes 53+ MCP tools, and runs the AI agent that translates natural-language instructions into browser actions.

Runtime: Bun · Framework: Hono · AI: Vercel AI SDK · License: AGPL-3.0

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         MCP Clients                                  │
│           (Agent UI, Claude Code, Gemini CLI, browseros-cli)         │
└──────────────────────────────────────────────────────────────────────┘
                                │
                                │ HTTP / SSE / StreamableHTTP
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    BrowserOS Server (Bun)                             │
│                                                                      │
│   /mcp ─────── MCP tool endpoints (53+ tools)                       │
│   /chat ────── Agent streaming (AI SDK)                              │
│   /health ─── Health check                                           │
│                                                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  Agent Loop                                                  │   │
│   │  ├── Multi-provider AI SDK (OpenAI, Anthropic, Google, ...) │   │
│   │  ├── Session & conversation management                       │   │
│   │  ├── Context overflow handling + compaction                  │   │
│   │  └── MCP client for external tool servers                    │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                      │
│   ┌────────────────────┐    ┌────────────────────────────────────┐  │
│   │  CDP Tools          │    │  Controller Tools                  │  │
│   │  (screenshots,      │    │  (tabs, bookmarks, history,        │  │
│   │   DOM, network,     │    │   navigation, tab groups)          │  │
│   │   console, input)   │    │                                    │  │
│   └────────────────────┘    └────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
          │                                         │
          │ Chrome DevTools Protocol                │ WebSocket
          ▼                                         ▼
┌─────────────────────┐              ┌─────────────────────────────────┐
│   Chromium CDP       │              │   Controller Extension          │
│  (port 9000)         │              │  (port 9300)                    │
│                      │              │                                 │
│  DOM, network,       │              │  chrome.tabs, chrome.history,   │
│  input, screenshots  │              │  chrome.bookmarks               │
└─────────────────────┘              └─────────────────────────────────┘

MCP Tools

53+ tools organized by category:

Category      Tools
Navigation    new_page, navigate, go_back, go_forward, reload
Input         click, type, press_key, hover, scroll, drag, fill, clear, focus, check, uncheck, select_option, upload_file
Observation   take_snapshot, take_enhanced_snapshot, extract_text, extract_links
Screenshots   take_screenshot, save_screenshot
Evaluation    evaluate_script
Pages         list_pages, active_page, close_page, new_hidden_page
Windows       window_list, window_create, window_close, window_activate
Bookmarks     bookmark_list, bookmark_create, bookmark_remove, bookmark_update, bookmark_move, bookmark_search
History       history_search, history_recent, history_delete, history_delete_range
Tab Groups    group_list, group_create, group_update, group_ungroup, group_close
Filesystem    ls, read, write, edit, find, grep, bash
Memory        read_core, update_core, read_soul, update_soul, search_memory, write_memory
DOM           dom, dom_search
Console       get_console_messages
Other         browseros_info, handle_dialog, wait_for, download, export_pdf, output_file, nudges
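
As a rough illustration of how a client invokes one of these tools, the sketch below builds the JSON-RPC `tools/call` payload defined by the MCP protocol. The `buildToolCall` helper is hypothetical, and the `url` argument name is an assumption about the `navigate` tool's input schema, which this README does not document.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed argument name; check the tool's published schema before relying on it.
const request = buildToolCall(1, "navigate", { url: "https://example.com" });
```

In practice an MCP client library would normally send this to the /mcp endpoint and also handle the initialization handshake, so hand-built payloads are mostly useful for debugging.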

Agent Loop

The agent loop uses the Vercel AI SDK to orchestrate multi-step browser automation:

  • Multi-provider support — OpenAI, Anthropic, Google, Azure, Bedrock, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible endpoint
  • Session management — conversations persist in a local SQLite database
  • Context overflow handling — automatic message compaction when context windows fill up
  • MCP client — connects to external MCP servers for additional tool access (40+ app integrations)
  • Tool adapter — bridges MCP tool definitions to AI SDK tool format
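
To make the context overflow bullet concrete, here is one simple compaction strategy, sketched under stated assumptions. It is not the logic in src/agent/compaction.ts: the `ChatMessage` shape, the 4-characters-per-token estimate, and the placeholder summary text are all invented for the example.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic assumed for this sketch: about 4 characters per token.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Keeps the newest messages that fit within `budget` tokens and replaces
// everything older with a single placeholder message.
function compact(messages: ChatMessage[], budget: number): ChatMessage[] {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  if (total <= budget) return messages; // still fits: nothing to do

  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest, stopping at the first message that does not fit.
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > budget) break;
    kept.unshift(message);
    used += cost;
  }

  const dropped = messages.length - kept.length;
  const summary: ChatMessage = {
    role: "system",
    content: `[${dropped} earlier messages compacted]`,
  };
  return [summary, ...kept];
}
```

A production version would likely ask the model to summarize the dropped messages and would count the summary against the budget; this sketch skips both to keep the control flow visible.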

Provider Factory

The provider factory (src/agent/provider-factory.ts) creates AI SDK providers from runtime configuration, so the active provider can be hot-swapped without a restart.
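
The general pattern can be sketched as a lookup table of builder functions keyed by provider name. Everything named here (`ProviderConfig`, `ModelHandle`, `createModel`) is hypothetical, and the handles are plain objects rather than real AI SDK providers. The base URLs are the commonly used public defaults for those services, included only for illustration.

```typescript
// Hypothetical runtime configuration for one provider.
interface ProviderConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseUrl?: string;
}

// Stand-in for an AI SDK model object.
interface ModelHandle {
  provider: string;
  model: string;
  baseUrl: string;
}

type Builder = (config: ProviderConfig) => ModelHandle;

// One builder per supported provider; only two are shown.
const builders: Record<string, Builder> = {
  openai: (c) => ({
    provider: "openai",
    model: c.model,
    baseUrl: c.baseUrl ?? "https://api.openai.com/v1",
  }),
  ollama: (c) => ({
    provider: "ollama",
    model: c.model,
    baseUrl: c.baseUrl ?? "http://localhost:11434",
  }),
};

// Resolves a model from the current configuration on every call.
function createModel(config: ProviderConfig): ModelHandle {
  const build: Builder | undefined = builders[config.provider];
  if (!build) throw new Error(`Unsupported provider: ${config.provider}`);
  return build(config);
}
```

Because the factory is called with whatever configuration is current, a provider change takes effect on the next call with no process restart.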

Skills System

Skills are custom instruction sets that shape agent behavior:

  • Catalog (src/skills/catalog.ts) — registry of available skills
  • Defaults (src/skills/defaults/) — built-in skill definitions
  • Loader (src/skills/loader.ts) — loads skills from local and remote sources
  • Remote sync (src/skills/remote-sync.ts) — syncs skills from the BrowserOS cloud
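
One plausible way the catalog, defaults, and remote sync fit together is sketched below. The `Skill` shape, the `buildCatalog` name, and the rule that later sources override earlier ones are all assumptions for illustration; the README does not specify the real precedence.

```typescript
// Hypothetical skill record.
interface Skill {
  id: string;
  name: string;
  instructions: string;
  source: "default" | "local" | "remote";
}

// Merges skill lists into a catalog keyed by id.
// Assumed precedence: a later source replaces an earlier one with the same id.
function buildCatalog(...sources: Skill[][]): Map<string, Skill> {
  const catalog = new Map<string, Skill>();
  for (const list of sources) {
    for (const skill of list) {
      catalog.set(skill.id, skill);
    }
  }
  return catalog;
}
```

With this ordering, `buildCatalog(defaults, localSkills, remoteSkills)` would let a synced remote skill replace a built-in one that shares its id.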

Graph Executor (Workflows)

The graph executor (src/graph/executor.ts) runs visual workflow graphs built in the BrowserOS workflow editor. Each node in the graph maps to agent actions, conditionals, or data transformations.
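
The three node kinds can be illustrated with a small synchronous interpreter. This is a sketch under assumptions, not the code in src/graph/executor.ts: the `GraphNode` shapes, the `runGraph` name, and the step limit are invented, and the real executor presumably runs agent actions asynchronously.

```typescript
type Ctx = Record<string, unknown>;

// Hypothetical node shapes for the three kinds named above.
type GraphNode =
  | { kind: "action"; run: (ctx: Ctx) => void; next?: string }
  | { kind: "transform"; apply: (ctx: Ctx) => Ctx; next?: string }
  | { kind: "conditional"; test: (ctx: Ctx) => boolean; ifTrue?: string; ifFalse?: string };

// Walks the graph from `start` until a node has no successor.
function runGraph(
  nodes: Record<string, GraphNode>,
  start: string,
  initial: Ctx,
  maxSteps = 100,
): Ctx {
  let ctx = initial;
  let current: string | undefined = start;
  for (let step = 0; current !== undefined; step++) {
    if (step >= maxSteps) throw new Error("Step limit exceeded (possible cycle)");
    const node: GraphNode | undefined = nodes[current];
    if (!node) throw new Error(`Unknown node: ${current}`);

    if (node.kind === "conditional") {
      // Conditionals only choose the next node.
      current = node.test(ctx) ? node.ifTrue : node.ifFalse;
    } else {
      if (node.kind === "action") node.run(ctx); // side effect only
      else ctx = node.apply(ctx); // data transformation
      current = node.next;
    }
  }
  return ctx;
}
```

The step limit is a cheap guard against cyclic graphs; a real workflow editor might validate the graph up front instead.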

Directory Structure

apps/server/
├── src/
│   ├── index.ts               # Server entry point
│   ├── main.ts                # Server initialization
│   ├── api/                   # HTTP route handlers
│   ├── agent/                 # Agent loop
│   │   ├── ai-sdk-agent.ts    # Main agent implementation
│   │   ├── provider-factory.ts # LLM provider factory
│   │   ├── session-store.ts   # Conversation persistence
│   │   ├── compaction.ts      # Context window management
│   │   ├── mcp-builder.ts     # External MCP client setup
│   │   └── tool-adapter.ts    # MCP → AI SDK tool bridge
│   ├── browser/               # Browser connection layer
│   ├── tools/                 # MCP tool implementations
│   │   ├── navigation.ts
│   │   ├── input.ts
│   │   ├── snapshot.ts
│   │   ├── memory/
│   │   ├── filesystem/
│   │   └── ...
│   ├── skills/                # Skills system
│   ├── graph/                 # Workflow graph executor
│   ├── lib/                   # Shared utilities
│   └── rpc.ts                 # JSON-RPC type definitions
├── tests/
│   ├── tools/                 # Tool-level tests
│   ├── sdk/                   # SDK integration tests
│   └── server.integration.test.ts
├── graph/                     # Workflow graph definitions
└── package.json

Development

Prerequisites

  • Bun runtime
  • A running BrowserOS instance (for CDP and controller connections)

Setup

# Copy the example environment file
cp .env.example .env.development

# Start the server (with hot reload)
bun run start

See the agent monorepo README for the full environment variable reference and process-compose setup.

Testing

bun run test:tools          # Tool-level tests
bun run test:integration    # Full integration tests (requires running BrowserOS)
bun run test:sdk            # SDK integration tests

Building

# Build cross-platform server binaries
bun run build

# Build for specific targets
bun scripts/build/server.ts --target=darwin-arm64,linux-x64

# Build without uploading to R2
bun scripts/build/server.ts --target=all --no-upload

Ports

Port   Env Variable               Purpose
9100   BROWSEROS_SERVER_PORT      HTTP server (MCP, chat, health)
9000   BROWSEROS_CDP_PORT         Chromium CDP (server connects as client)
9300   BROWSEROS_EXTENSION_PORT   WebSocket for controller extension
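
A minimal sketch of resolving these ports from the environment, assuming the values in the table are the defaults used when a variable is unset. The `resolvePorts` function and `Ports` type are hypothetical and not taken from the server source.

```typescript
interface Ports {
  server: number;
  cdp: number;
  extension: number;
}

// Reads the three port variables from an environment-like object,
// falling back to the table's values (assumed to be defaults).
function resolvePorts(env: Record<string, string | undefined>): Ports {
  const read = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const port = Number(raw);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`Invalid port in ${name}: ${raw}`);
    }
    return port;
  };

  return {
    server: read("BROWSEROS_SERVER_PORT", 9100),
    cdp: read("BROWSEROS_CDP_PORT", 9000),
    extension: read("BROWSEROS_EXTENSION_PORT", 9300),
  };
}
```

Under Bun this could be called as `resolvePorts(process.env)`; taking the environment as a parameter keeps the function easy to test.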