Files
BrowserOS/packages/browseros-agent/apps/server
Nikhil dee3086a48 feat(server): cache klavis createStrata to unblock /chat hot path (#654)
* feat(server): cache klavis createStrata to unblock /chat hot path

Conversation creation in /chat was blocking on a Worker-proxied
klavisClient.createStrata round-trip every time the user had any
managed Klavis app connected. The 5s KLAVIS_TIMEOUT_MS in the
ai-worker proxy existed specifically to bound this latency, but
the same cap also caused user-visible 504s on /klavis/servers/remove
since Strata DELETE operations routinely take >5s. Without caching
we couldn't raise the timeout without regressing chat creation.

This adds an in-process cache for Strata createStrata responses,
keyed by (browserosId, hashed sorted-server-set) and gated by a 1h
TTL. The cache stores only immutable JSON metadata (strataServerUrl,
strataId, addedServers); per-session MCP clients continue to be
opened and disposed by AiSdkAgent exactly as before, which keeps
the cache concurrency-safe by construction.

Cache invalidation has two layers: (a) the cache key embeds the
server set, so adding/removing apps naturally produces a different
key; (b) POST /klavis/servers/add and DELETE /klavis/servers/remove
explicitly call invalidate(browserosId) after their underlying
Klavis API call succeeds, as defense-in-depth.

Other changes:
- Consolidates klavis-related services into a new
  apps/server/src/api/services/klavis/ directory; moves
  register-klavis-mcp.ts -> strata-proxy.ts and adds strata-cache.ts
  there. lib/clients/klavis/ stays unchanged.
- Refactors KlavisClient.removeServer into a low-level
  deleteServersFromStrata(strataId, servers) primitive. The
  cache-lookup + delete + invalidate orchestration moves up into
  routes/klavis.ts where it belongs, eliminating the lib->api
  layering inversion the original removeServer would have introduced.
- Uses Bun.hash (xxhash64) for fixed-width 16-hex-char keys, with
  serverKey verified on read to make collision risk strictly zero.
- Dedupes concurrent fetches via in-flight Promise sharing, with
  identity-checks before delete to avoid races between invalidate()
  and a racing replacement insert.

Follow-up (separate PR): bump KLAVIS_TIMEOUT_MS to 30000 in
ai-worker/wrangler.toml so /klavis/servers/remove stops 504-ing.

* fix: address greptile review comments for klavis strata cache

- Drop dead `invalidated` field on InflightEntry. It was added to
  support a "discard post-resolution if invalidated" check that I
  later replaced with identity-checked deletes during self-review,
  but I forgot to remove the field and the misleading comment
  referencing it. Simplify Map<string, InflightEntry> to plain
  Map<string, Promise<CacheEntry>>.
- Lower cache miss log from info to debug. Misses fire on every new
  conversation; matching the existing debug-level for hits.
- Stop routing the /klavis/servers/remove handler through
  klavisStrataCache.getOrFetch. The chat hot path keys its cache by
  the user's full enabled-server set (e.g. hash('Gmail,Linear')),
  so a single-server lookup here (hash('Gmail')) is guaranteed to
  miss, write a spurious entry, and then have it immediately
  cleared by invalidate() on the next line. Call createStrata
  directly to recover the strataId, mirroring the original
  removeServer flow.
2026-04-07 11:11:41 -07:00
..
2026-03-17 19:01:10 +05:30

BrowserOS Server

MCP server and AI agent loop powering BrowserOS browser automation. This is the core backend — it connects to Chromium via CDP, exposes 53+ MCP tools, and runs the AI agent that interprets natural language into browser actions.

Runtime: Bun · Framework: Hono · AI: Vercel AI SDK · License: AGPL-3.0

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         MCP Clients                                  │
│           (Agent UI, Claude Code, Gemini CLI, browseros-cli)         │
└──────────────────────────────────────────────────────────────────────┘
                                │
                                │ HTTP / SSE / StreamableHTTP
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    BrowserOS Server (Bun)                             │
│                                                                      │
│   /mcp ─────── MCP tool endpoints (53+ tools)                       │
│   /chat ────── Agent streaming (AI SDK)                              │
│   /health ─── Health check                                           │
│                                                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  Agent Loop                                                  │   │
│   │  ├── Multi-provider AI SDK (OpenAI, Anthropic, Google, ...) │   │
│   │  ├── Session & conversation management                       │   │
│   │  ├── Context overflow handling + compaction                  │   │
│   │  └── MCP client for external tool servers                    │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  CDP-backed browser tools                                   │   │
│   │  (tabs, bookmarks, history, navigation, tab groups,         │   │
│   │   screenshots, DOM, network, console, input)                │   │
│   └─────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
                                │
                                │ Chrome DevTools Protocol
                                ▼
                     ┌─────────────────────┐
                     │   Chromium CDP      │
                     │  (port 9000)        │
                     │                     │
                     │  DOM, network,      │
                     │  input, screenshots │
                     └─────────────────────┘

MCP Tools

53+ tools organized by category:

Category Tools
Navigation new_page, navigate, go_back, go_forward, reload
Input click, type, press_key, hover, scroll, drag, fill, clear, focus, check, uncheck, select_option, upload_file
Observation take_snapshot, take_enhanced_snapshot, extract_text, extract_links
Screenshots take_screenshot, save_screenshot
Evaluation evaluate_script
Pages list_pages, active_page, close_page, new_hidden_page
Windows window_list, window_create, window_close, window_activate
Bookmarks bookmark_list, bookmark_create, bookmark_remove, bookmark_update, bookmark_move, bookmark_search
History history_search, history_recent, history_delete, history_delete_range
Tab Groups group_list, group_create, group_update, group_ungroup, group_close
Filesystem ls, read, write, edit, find, grep, bash
Memory read_core, update_core, read_soul, update_soul, search_memory, write_memory
DOM dom, dom_search
Console get_console_messages
Other browseros_info, handle_dialog, wait_for, download, export_pdf, output_file, nudges

Agent Loop

The agent loop uses the Vercel AI SDK to orchestrate multi-step browser automation:

  • Multi-provider support — OpenAI, Anthropic, Google, Azure, Bedrock, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible endpoint
  • Session management — conversations persist in a local SQLite database
  • Context overflow handling — automatic message compaction when context windows fill up
  • MCP client — connects to external MCP servers for additional tool access (40+ app integrations)
  • Tool adapter — bridges MCP tool definitions to AI SDK tool format

Provider Factory

The provider factory (src/agent/provider-factory.ts) creates AI SDK providers from runtime configuration, supporting hot-swapping between providers without restart.

Skills System

Skills are custom instruction sets that shape agent behavior:

  • Catalog (src/skills/catalog.ts) — registry of available skills
  • Defaults (src/skills/defaults/) — built-in skill definitions
  • Loader (src/skills/loader.ts) — loads skills from local and remote sources
  • Remote sync (src/skills/remote-sync.ts) — syncs skills from the BrowserOS cloud

Graph Executor (Workflows)

The graph executor (src/graph/executor.ts) runs visual workflow graphs built in the BrowserOS workflow editor. Each node in the graph maps to agent actions, conditionals, or data transformations.

Directory Structure

apps/server/
├── src/
│   ├── index.ts               # Server entry point
│   ├── main.ts                # Server initialization
│   ├── api/                   # HTTP route handlers
│   ├── agent/                 # Agent loop
│   │   ├── ai-sdk-agent.ts    # Main agent implementation
│   │   ├── provider-factory.ts# LLM provider factory
│   │   ├── session-store.ts   # Conversation persistence
│   │   ├── compaction.ts      # Context window management
│   │   ├── mcp-builder.ts     # External MCP client setup
│   │   └── tool-adapter.ts    # MCP → AI SDK tool bridge
│   ├── browser/               # Browser connection layer
│   ├── tools/                 # MCP tool implementations
│   │   ├── navigation.ts
│   │   ├── input.ts
│   │   ├── snapshot.ts
│   │   ├── memory/
│   │   ├── filesystem/
│   │   └── ...
│   ├── skills/                # Skills system
│   ├── graph/                 # Workflow graph executor
│   ├── lib/                   # Shared utilities
│   └── rpc.ts                 # JSON-RPC type definitions
├── tests/
│   ├── tools/                 # Tool-level tests
│   ├── sdk/                   # SDK integration tests
│   └── server.integration.test.ts
├── graph/                     # Workflow graph definitions
└── package.json

Development

Prerequisites

  • Bun runtime
  • A running BrowserOS instance (for CDP connectivity)

Setup

# Copy environment files
cp .env.example .env.development

# Start the server (with hot reload)
bun run start

See the agent monorepo README for full environment variable reference and process-compose setup.

Testing

bun run test:tools          # Tool-level tests
bun run test:integration    # Full integration tests (requires running BrowserOS)
bun run test:sdk            # SDK integration tests

Building

# Build cross-platform server binaries
bun run build

# Build for specific targets
bun scripts/build/server.ts --target=darwin-arm64,linux-x64

# Build without uploading to R2
bun scripts/build/server.ts --target=all --no-upload

Ports

Port Env Variable Purpose
9100 BROWSEROS_SERVER_PORT HTTP server (MCP, chat, health)
9000 BROWSEROS_CDP_PORT Chromium CDP (server connects as client)
9300 BROWSEROS_EXTENSION_PORT Legacy BrowserOS launch arg kept for compatibility