Mirror of https://github.com/browseros-ai/BrowserOS.git, synced 2026-05-13 15:46:22 +00:00
* fix(server): tighten CORS allowlist for the agent server

  Replace the permissive `origin || '*'` reflection in `defaultCorsConfig` with an explicit allowlist composed of:
  - a static list (empty by default)
  - comma-separated origins from `BROWSEROS_TRUSTED_ORIGINS`

  Add a small `requireTrustedOrigin` middleware that actively rejects (403) any request whose `Origin` header is present and not in the allowlist. The middleware is permissive when the `Origin` header is absent: CLI tools, internal Node clients, and some service-worker fetches legitimately omit it. The threat model only covers cross-origin browser fetches, which always carry `Origin` (it is a forbidden request header, so page JavaScript cannot set or suppress it).

  Mount the middleware globally in `createHttpServer` after the existing `cors()` layer. Document the new env var in `.env.example`.

  Tests cover allowlist parsing (empty, single, multi, trims, case sensitivity, port match) and middleware behaviour (missing Origin allowed, allowlisted Origin allowed, unknown Origin rejected, "null" rejected, port mismatch rejected, disallowed Origin doesn't reach the handler).

* fix(server): include published extension origin in default allowlist

  Pin the published BrowserOS extension origin in the static allowlist so the default install accepts the legitimate extension without requiring `BROWSEROS_TRUSTED_ORIGINS` to be populated. Additional origins (dev / alpha) keep working through the env override.

* chore(server): trim .env.example comments
* chore(server): drop redundant comments from cors helpers
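The allowlist behaviour described in these commit notes can be sketched without any framework. This is a minimal illustration, not the repository's code: `parseTrustedOrigins`, `buildAllowlist`, and `rejectUntrustedOrigin` are hypothetical names, the static list is left empty because the published extension origin is not stated here, and exact string comparison is an assumption about how origins are matched.

```typescript
// Hypothetical static list; the real default (the published extension origin) is not given in this document.
const STATIC_TRUSTED_ORIGINS: readonly string[] = [];

/** Split a comma-separated env value into trimmed, non-empty origins. */
function parseTrustedOrigins(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

/** Combine the static list with origins taken from BROWSEROS_TRUSTED_ORIGINS. */
function buildAllowlist(
  raw: string | undefined,
  staticOrigins: readonly string[] = STATIC_TRUSTED_ORIGINS,
): Set<string> {
  return new Set([...staticOrigins, ...parseTrustedOrigins(raw)]);
}

/**
 * Returns null when the request may proceed, or a 403 Response otherwise.
 * A missing Origin header is allowed; a present one must match exactly
 * (assumption: scheme, host, and port are compared as one string).
 */
function rejectUntrustedOrigin(
  headers: Headers,
  allowlist: ReadonlySet<string>,
): Response | null {
  const origin = headers.get("Origin");
  if (origin === null) return null; // CLI tools and server-side clients omit Origin
  if (allowlist.has(origin)) return null;
  return new Response("Forbidden origin", { status: 403 });
}
```

In a Hono app, a check like this would typically run inside a middleware mounted after `cors()`, returning the 403 response before the route handler is reached.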
BrowserOS Server
MCP server and AI agent loop powering BrowserOS browser automation. This is the core backend — it connects to Chromium via CDP, exposes 53+ MCP tools, and runs the AI agent that interprets natural language into browser actions.
Runtime: Bun · Framework: Hono · AI: Vercel AI SDK · License: AGPL-3.0
Architecture
┌──────────────────────────────────────────────────────────────────────┐
│                             MCP Clients                              │
│          (Agent UI, Claude Code, Gemini CLI, browseros-cli)          │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ HTTP / SSE / StreamableHTTP
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        BrowserOS Server (Bun)                        │
│                                                                      │
│  /mcp ─────── MCP tool endpoints (53+ tools)                         │
│  /chat ────── Agent streaming (AI SDK)                               │
│  /health ──── Health check                                           │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │                         Agent Loop                          │     │
│  │  ├── Multi-provider AI SDK (OpenAI, Anthropic, Google, ...) │     │
│  │  ├── Session & conversation management                      │     │
│  │  ├── Context overflow handling + compaction                 │     │
│  │  └── MCP client for external tool servers                   │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │                  CDP-backed browser tools                   │     │
│  │     (tabs, bookmarks, history, navigation, tab groups,      │     │
│  │      screenshots, DOM, network, console, input)             │     │
│  └─────────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ Chrome DevTools Protocol
                                   ▼
                        ┌─────────────────────┐
                        │    Chromium CDP     │
                        │     (port 9000)     │
                        │                     │
                        │    DOM, network,    │
                        │ input, screenshots  │
                        └─────────────────────┘
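As a rough illustration of how a client might address the three endpoints in the diagram, the sketch below builds endpoint URLs and probes `/health`. The function names and the `FetchLike` type are hypothetical, and because the shape of the health response is not documented here, the sketch only looks at the HTTP status.

```typescript
// Minimal structural type so a fake fetch can be injected; hypothetical, not part of the server.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

/** Build a full URL for one of the endpoints shown in the architecture diagram. */
function endpointUrl(baseUrl: string, path: "/mcp" | "/chat" | "/health"): string {
  return new URL(path, baseUrl).toString();
}

/** Resolve to true when GET /health answers with a 2xx status, false on any failure. */
async function isServerHealthy(baseUrl: string, fetchImpl: FetchLike = fetch): Promise<boolean> {
  try {
    const res = await fetchImpl(endpointUrl(baseUrl, "/health"));
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar network errors
  }
}
```

With the default server port from the Ports table, a caller would pass something like `http://127.0.0.1:9100` as `baseUrl`; the host is an assumption.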
MCP Tools
53+ tools organized by category:
| Category | Tools |
|---|---|
| Navigation | new_page, navigate, go_back, go_forward, reload |
| Input | click, type, press_key, hover, scroll, drag, fill, clear, focus, check, uncheck, select_option, upload_file |
| Observation | take_snapshot, take_enhanced_snapshot, extract_text, extract_links |
| Screenshots | take_screenshot, save_screenshot |
| Evaluation | evaluate_script |
| Pages | list_pages, active_page, close_page, new_hidden_page |
| Windows | window_list, window_create, window_close, window_activate |
| Bookmarks | bookmark_list, bookmark_create, bookmark_remove, bookmark_update, bookmark_move, bookmark_search |
| History | history_search, history_recent, history_delete, history_delete_range |
| Tab Groups | group_list, group_create, group_update, group_ungroup, group_close |
| Filesystem | ls, read, write, edit, find, grep, bash |
| Memory | read_core, update_core, read_soul, update_soul, search_memory, write_memory |
| DOM | dom, dom_search |
| Console | get_console_messages |
| Other | browseros_info, handle_dialog, wait_for, download, export_pdf, output_file, nudges |
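MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request body for one of the tools listed above. `buildToolCall` is a hypothetical helper, and the `url` argument name for `navigate` is an assumption; the actual input schema of each tool should be read from the server's `tools/list` response.

```typescript
// Shape of an MCP tools/call request as defined by JSON-RPC 2.0 and the MCP specification.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

/** Build the request body for calling a single tool (hypothetical helper). */
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed argument name: "url". Check the tool's published schema before relying on it.
const navigateRequest = buildToolCall(1, "navigate", { url: "https://example.com" });
console.log(JSON.stringify(navigateRequest, null, 2));
```

A real client would normally complete the MCP `initialize` handshake before sending tool calls, which an MCP client library handles automatically.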
Agent Loop
The agent loop uses the Vercel AI SDK to orchestrate multi-step browser automation:
- Multi-provider support — OpenAI, Anthropic, Google, Azure, Bedrock, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible endpoint
- Session management — conversations persist in a local SQLite database
- Context overflow handling — automatic message compaction when context windows fill up
- MCP client — connects to external MCP servers for additional tool access (40+ app integrations)
- Tool adapter — bridges MCP tool definitions to AI SDK tool format
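To make the idea of context compaction concrete, here is a deliberately simple sketch that keeps system messages and drops the oldest conversation turns once a token budget is exceeded. It is not the logic in `compaction.ts`: the `ChatMessage` type, the four-characters-per-token estimate, and the drop-oldest strategy are all assumptions chosen for illustration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

/** Crude token estimate (assumption: roughly four characters per token). */
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

/** Keep every system message plus the most recent messages that fit in the budget. */
function compactMessages(messages: ChatMessage[], tokenBudget: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m), 0);

  const kept: ChatMessage[] = [];
  // Walk from newest to oldest so the latest turns are preserved first.
  for (const message of [...rest].reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > tokenBudget) break;
    used += cost;
    kept.unshift(message); // restore chronological order
  }
  return [...system, ...kept];
}
```

Production compaction usually summarizes the dropped turns rather than discarding them, so that earlier decisions remain visible to the model.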
Provider Factory
The provider factory (src/agent/provider-factory.ts) creates AI SDK providers from runtime configuration, supporting hot-swapping between providers without restart.
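One common way to structure such a factory is a registry of creator functions keyed by provider name, consulted each time a request arrives so that a changed configuration takes effect without a restart. The sketch below shows that pattern only; `ProviderConfig`, `ModelHandle`, and `createModel` are hypothetical and do not reflect the contents of `provider-factory.ts`, where the creators would presumably wrap AI SDK provider constructors instead of returning plain objects.

```typescript
// Hypothetical runtime configuration and result types, for illustration only.
interface ProviderConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseUrl?: string;
}

interface ModelHandle {
  provider: string;
  model: string;
  endpoint: string;
}

type ProviderCreator = (config: ProviderConfig) => ModelHandle;

// Registry of creators; only two providers are sketched here.
const creators = new Map<string, ProviderCreator>([
  ["openai", (c) => ({ provider: "openai", model: c.model, endpoint: c.baseUrl ?? "https://api.openai.com/v1" })],
  ["ollama", (c) => ({ provider: "ollama", model: c.model, endpoint: c.baseUrl ?? "http://localhost:11434" })],
]);

/** Resolve a model handle from the current configuration; called per request, so swaps need no restart. */
function createModel(config: ProviderConfig): ModelHandle {
  const creator = creators.get(config.provider);
  if (!creator) throw new Error(`Unsupported provider: ${config.provider}`);
  return creator(config);
}
```

Because nothing is cached between calls, switching from one provider to another is just a matter of passing a different `ProviderConfig` on the next request.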
Skills System
Skills are custom instruction sets that shape agent behavior:
- Catalog (src/skills/catalog.ts): registry of available skills
- Defaults (src/skills/defaults/): built-in skill definitions
- Loader (src/skills/loader.ts): loads skills from local and remote sources
- Remote sync (src/skills/remote-sync.ts): syncs skills from the BrowserOS cloud
Directory Structure
apps/server/
├── src/
│   ├── index.ts                   # Server entry point
│   ├── main.ts                    # Server initialization
│   ├── api/                       # HTTP route handlers
│   ├── agent/                     # Agent loop
│   │   ├── ai-sdk-agent.ts        # Main agent implementation
│   │   ├── provider-factory.ts    # LLM provider factory
│   │   ├── session-store.ts       # Conversation persistence
│   │   ├── compaction.ts          # Context window management
│   │   ├── mcp-builder.ts         # External MCP client setup
│   │   └── tool-adapter.ts        # MCP → AI SDK tool bridge
│   ├── browser/                   # Browser connection layer
│   ├── tools/                     # MCP tool implementations
│   │   ├── navigation.ts
│   │   ├── input.ts
│   │   ├── snapshot.ts
│   │   ├── memory/
│   │   ├── filesystem/
│   │   └── ...
│   ├── skills/                    # Skills system
│   ├── lib/                       # Shared utilities
│   └── rpc.ts                     # JSON-RPC type definitions
├── tests/
│   ├── tools/                     # Tool-level tests
│   ├── sdk/                       # SDK integration tests
│   └── server.integration.test.ts
└── package.json
Development
Prerequisites
- Bun runtime
- A running BrowserOS instance (for CDP connectivity)
Setup
# Copy environment files
cp .env.example .env.development
# Start the server (with hot reload)
bun run start
See the agent monorepo README for the full environment variable reference and the process-compose setup.
Testing
bun run test:tools # Tool-level tests
bun run test:integration # Full integration tests (requires running BrowserOS)
bun run test:sdk # SDK integration tests
Building
# Build cross-platform server binaries
bun run build
# Build for specific targets
bun scripts/build/server.ts --target=darwin-arm64,linux-x64
# Build without uploading to R2
bun scripts/build/server.ts --target=all --no-upload
Ports
| Port | Env Variable | Purpose |
|---|---|---|
| 9100 | BROWSEROS_SERVER_PORT | HTTP server (MCP, chat, health) |
| 9000 | BROWSEROS_CDP_PORT | Chromium CDP (server connects as client) |
| 9300 | BROWSEROS_EXTENSION_PORT | Legacy BrowserOS launch arg kept for compatibility |
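The port table maps naturally onto a small configuration resolver. The following is a sketch under stated assumptions: `resolvePorts` and `parsePort` are hypothetical helpers, the defaults are taken from the table above, and rejecting invalid values with an error is a design choice of this example, not documented server behaviour.

```typescript
interface Ports {
  server: number;
  cdp: number;
  extension: number;
}

// Defaults copied from the Ports table.
const DEFAULT_PORTS: Ports = { server: 9100, cdp: 9000, extension: 9300 };

/** Parse a port from an env value, falling back when it is unset or blank. */
function parsePort(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${raw}`);
  }
  return port;
}

/** Resolve all three ports from an environment-like object. */
function resolvePorts(env: Record<string, string | undefined>): Ports {
  return {
    server: parsePort(env["BROWSEROS_SERVER_PORT"], DEFAULT_PORTS.server),
    cdp: parsePort(env["BROWSEROS_CDP_PORT"], DEFAULT_PORTS.cdp),
    extension: parsePort(env["BROWSEROS_EXTENSION_PORT"], DEFAULT_PORTS.extension),
  };
}
```

Under Bun or Node, the resolver would be called as `resolvePorts(process.env)`; taking the environment as a parameter keeps the function easy to test.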