mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-13 15:46:22 +00:00
docs: overhaul READMEs across all major packages (#594)
* docs: overhaul READMEs across all major packages
  - Root README: restructure with feature table, LLM provider table, comparison matrix, architecture map, and docs link
  - New: packages/browseros/README.md (Chromium fork build system)
  - New: apps/server/README.md (MCP server + agent loop)
  - New: packages/cdp-protocol/README.md (CDP type bindings)
  - Polish: agent-sdk (badges, prerequisites, multi-step example, links)
  - Polish: cli (badges, install section, MCP server section, links)
  - Polish: agent extension (badges, WXT mention, architecture context)
  - Polish: eval (badges, paper links)
* fix: address review (consistent tool count and correct default port)
  - CLI README: "54 MCP tools" → "53+ MCP tools" to match root and server docs
  - Agent SDK README: localhost:3000 → localhost:9100 to match documented default
* docs: add detailed comparison links to How We Compare section
* docs: update comparison table with verified competitor data
  Research all 5 competitors via official websites and docs:
  - Chrome: no AI agent, Gemini Nano only, MV3 weakening ad blocking
  - Brave: BYOM feature, local models via BYOM, Shields ad blocking, MV2+MV3
  - Dia: Skills-based AI, no BYOK, cloud AI, acquired by Atlassian
  - Comet: full cloud-based agent, built-in ad blocking, extensions on desktop
  - Atlas: standalone Chromium browser with Agent Mode, 30-day cloud memory
  Renamed Arc/Dia column to just Dia (Arc is sunset).
* docs: simplify comparison table with clean checkmarks and key differentiators
* docs: update browseros-agent README: remove submodule note, add missing packages
181
packages/browseros-agent/apps/server/README.md
Normal file
@@ -0,0 +1,181 @@
# BrowserOS Server

MCP server and AI agent loop that power BrowserOS browser automation. This is the core backend: it connects to Chromium via CDP, exposes 53+ MCP tools, and runs the AI agent that turns natural-language instructions into browser actions.

> **Runtime:** [Bun](https://bun.sh) · **Framework:** [Hono](https://hono.dev) · **AI:** [Vercel AI SDK](https://sdk.vercel.ai) · **License:** [AGPL-3.0](../../../../LICENSE)

## Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                             MCP Clients                              │
│          (Agent UI, Claude Code, Gemini CLI, browseros-cli)          │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   │ HTTP / SSE / StreamableHTTP
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        BrowserOS Server (Bun)                        │
│                                                                      │
│  /mcp ─────── MCP tool endpoints (53+ tools)                         │
│  /chat ────── Agent streaming (AI SDK)                               │
│  /health ──── Health check                                           │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │  Agent Loop                                                 │     │
│  │  ├── Multi-provider AI SDK (OpenAI, Anthropic, Google, ...) │     │
│  │  ├── Session & conversation management                      │     │
│  │  ├── Context overflow handling + compaction                 │     │
│  │  └── MCP client for external tool servers                   │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
│  ┌────────────────────┐   ┌────────────────────────────────────┐     │
│  │  CDP Tools         │   │  Controller Tools                  │     │
│  │  (screenshots,     │   │  (tabs, bookmarks, history,        │     │
│  │   DOM, network,    │   │   navigation, tab groups)          │     │
│  │   console, input)  │   │                                    │     │
│  └────────────────────┘   └────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────┘
           │                                          │
           │ Chrome DevTools Protocol                 │ WebSocket
           ▼                                          ▼
┌─────────────────────┐              ┌─────────────────────────────────┐
│  Chromium CDP       │              │  Controller Extension           │
│  (port 9000)        │              │  (port 9300)                    │
│                     │              │                                 │
│  DOM, network,      │              │  chrome.tabs, chrome.history,   │
│  input, screenshots │              │  chrome.bookmarks               │
└─────────────────────┘              └─────────────────────────────────┘
```

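As a rough illustration of the three routes in the diagram, the sketch below builds their URLs for a locally running server. The `serverEndpoints` helper and the `localhost` host are assumptions made for this example, not part of the codebase; the port is the documented default of 9100.

```typescript
// Documented default HTTP port for the BrowserOS server.
const DEFAULT_SERVER_PORT = 9100;

interface ServerEndpoints {
  mcp: string; // MCP tool endpoints
  chat: string; // agent streaming
  health: string; // health check
}

// Hypothetical helper: builds the route URLs shown in the architecture diagram.
function serverEndpoints(
  port: number = DEFAULT_SERVER_PORT,
  host: string = "localhost",
): ServerEndpoints {
  const base = `http://${host}:${port}`;
  return {
    mcp: `${base}/mcp`,
    chat: `${base}/chat`,
    health: `${base}/health`,
  };
}

const endpoints = serverEndpoints();
```

A client could then, for instance, call `fetch(endpoints.health)` and inspect the HTTP status; the response body format is not described here, so the sketch does not assume one.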
## MCP Tools

53+ tools organized by category:

| Category | Tools |
|----------|-------|
| **Navigation** | `new_page`, `navigate`, `go_back`, `go_forward`, `reload` |
| **Input** | `click`, `type`, `press_key`, `hover`, `scroll`, `drag`, `fill`, `clear`, `focus`, `check`, `uncheck`, `select_option`, `upload_file` |
| **Observation** | `take_snapshot`, `take_enhanced_snapshot`, `extract_text`, `extract_links` |
| **Screenshots** | `take_screenshot`, `save_screenshot` |
| **Evaluation** | `evaluate_script` |
| **Pages** | `list_pages`, `active_page`, `close_page`, `new_hidden_page` |
| **Windows** | `window_list`, `window_create`, `window_close`, `window_activate` |
| **Bookmarks** | `bookmark_list`, `bookmark_create`, `bookmark_remove`, `bookmark_update`, `bookmark_move`, `bookmark_search` |
| **History** | `history_search`, `history_recent`, `history_delete`, `history_delete_range` |
| **Tab Groups** | `group_list`, `group_create`, `group_update`, `group_ungroup`, `group_close` |
| **Filesystem** | `ls`, `read`, `write`, `edit`, `find`, `grep`, `bash` |
| **Memory** | `read_core`, `update_core`, `read_soul`, `update_soul`, `search_memory`, `write_memory` |
| **DOM** | `dom`, `dom_search` |
| **Console** | `get_console_messages` |
| **Other** | `browseros_info`, `handle_dialog`, `wait_for`, `download`, `export_pdf`, `output_file`, `nudges` |

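
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for the `navigate` tool. The `toolCall` helper is hypothetical, and the `url` argument name is an assumption: the tool's actual input schema is not listed in this README, so check it with a `tools/list` request first.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The "url" argument name is assumed for illustration only.
const request = toolCall("navigate", { url: "https://example.com" });
const body = JSON.stringify(request);
```

The serialized `body` is what a client would send to the `/mcp` endpoint once an MCP session has been initialized.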
## Agent Loop

The agent loop uses the [Vercel AI SDK](https://sdk.vercel.ai) to orchestrate multi-step browser automation:

- **Multi-provider support**: OpenAI, Anthropic, Google, Azure, Bedrock, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible endpoint
- **Session management**: conversations persist in a local SQLite database
- **Context overflow handling**: automatic message compaction when context windows fill up
- **MCP client**: connects to external MCP servers for additional tool access (40+ app integrations)
- **Tool adapter**: bridges MCP tool definitions to AI SDK tool format

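
To make the context overflow handling above more concrete, here is a minimal sketch of one possible compaction strategy: keep system messages, keep the newest messages that fit a token budget, and replace everything older with a placeholder. This is not the logic in `compaction.ts`; the message shape, the four-characters-per-token estimate, and the placeholder text are all assumptions, and a real implementation might summarize the dropped messages with a model instead.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough estimate (about 4 characters per token); real tokenizers differ.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Keeps system messages and the newest messages that fit within maxTokens.
// Older messages are replaced by a single placeholder note.
function compact(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m), 0);

  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i]);
    if (used + cost > maxTokens) break; // budget exhausted: drop this and all older messages
    used += cost;
    kept.unshift(rest[i]);
  }

  const dropped = rest.length - kept.length;
  if (dropped === 0) return messages.slice(); // everything fits, nothing to compact

  const note: ChatMessage = {
    role: "system",
    content: `[${dropped} earlier messages compacted]`,
  };
  return [...system, note, ...kept];
}
```

Walking backwards from the newest message keeps the most recent turns intact, which is usually what the next model call depends on most.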
### Provider Factory

The provider factory (`src/agent/provider-factory.ts`) creates AI SDK providers from runtime configuration, so providers can be hot-swapped without restarting the server.

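
The factory idea can be sketched as a registry that maps a provider name to a builder function. Everything below is illustrative: `ProviderConfig`, `ModelHandle`, and `createModel` are hypothetical names, and the real factory returns AI SDK provider objects rather than these plain records.

```typescript
// Hypothetical runtime configuration for one provider.
interface ProviderConfig {
  provider: string; // e.g. "openai" or "ollama"
  model: string;
  apiKey?: string;
  baseUrl?: string;
}

// Stand-in for an AI SDK model instance.
interface ModelHandle {
  id: string;
  baseUrl: string;
}

type ProviderBuilder = (config: ProviderConfig) => ModelHandle;

// Registry of builders; supporting a new provider means adding one entry.
const builders = new Map<string, ProviderBuilder>([
  ["openai", (c) => ({ id: `openai:${c.model}`, baseUrl: c.baseUrl ?? "https://api.openai.com/v1" })],
  ["ollama", (c) => ({ id: `ollama:${c.model}`, baseUrl: c.baseUrl ?? "http://localhost:11434" })],
]);

function createModel(config: ProviderConfig): ModelHandle {
  const build = builders.get(config.provider);
  if (!build) throw new Error(`Unsupported provider: ${config.provider}`);
  return build(config);
}

// Resolving the model per request, not once at startup, allows swapping without a restart.
const first = createModel({ provider: "openai", model: "gpt-4o" });
const second = createModel({ provider: "ollama", model: "llama3" });
```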
## Skills System

Skills are custom instruction sets that shape agent behavior:

- **Catalog** (`src/skills/catalog.ts`): registry of available skills
- **Defaults** (`src/skills/defaults/`): built-in skill definitions
- **Loader** (`src/skills/loader.ts`): loads skills from local and remote sources
- **Remote sync** (`src/skills/remote-sync.ts`): syncs skills from the BrowserOS cloud

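
One way to picture how defaults, local skills, and remotely synced skills could come together in a catalog is a merge keyed by skill id. The `Skill` shape, the `mergeSkills` helper, and the precedence rule (later sources override earlier ones) are assumptions for illustration; the actual loader may resolve conflicts differently.

```typescript
// Hypothetical skill record.
interface Skill {
  id: string;
  instructions: string;
  source: "default" | "local" | "remote";
}

// Merges skill lists into one catalog keyed by id.
// Assumed precedence: a later source replaces an earlier one with the same id.
function mergeSkills(...sources: Skill[][]): Map<string, Skill> {
  const catalog = new Map<string, Skill>();
  for (const skills of sources) {
    for (const skill of skills) {
      catalog.set(skill.id, skill);
    }
  }
  return catalog;
}

const defaults: Skill[] = [
  { id: "summarize", instructions: "Summarize the current page.", source: "default" },
  { id: "research", instructions: "Collect sources on a topic.", source: "default" },
];
const remote: Skill[] = [
  { id: "summarize", instructions: "Summarize the page in five bullets.", source: "remote" },
];

const catalog = mergeSkills(defaults, remote);
```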
## Graph Executor (Workflows)

The graph executor (`src/graph/executor.ts`) runs visual workflow graphs built in the BrowserOS workflow editor. Each node in the graph maps to an agent action, a conditional, or a data transformation.

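
The three node kinds can be illustrated with a small interpreter that walks a graph from a start node. This is a sketch under assumed types, not the executor in `src/graph/executor.ts`: `GraphNode`, `executeGraph`, and the step limit are hypothetical.

```typescript
type Context = Record<string, unknown>;

// Hypothetical node shapes for the three kinds described above.
type GraphNode =
  | { kind: "action"; run: (ctx: Context) => void; next?: string }
  | { kind: "condition"; test: (ctx: Context) => boolean; ifTrue?: string; ifFalse?: string }
  | { kind: "transform"; apply: (ctx: Context) => Context; next?: string };

// Walks the graph from `start` until a node has no successor.
function executeGraph(
  nodes: Record<string, GraphNode>,
  start: string,
  ctx: Context = {},
  maxSteps: number = 1000,
): Context {
  let current: string | undefined = start;
  let state: Context = ctx;
  for (let step = 0; current !== undefined; step++) {
    if (step >= maxSteps) throw new Error("Step limit exceeded (possible cycle)");
    const node: GraphNode | undefined = nodes[current];
    if (!node) throw new Error(`Unknown node: ${current}`);
    switch (node.kind) {
      case "action":
        node.run(state); // side effect, e.g. a browser action
        current = node.next;
        break;
      case "condition":
        current = node.test(state) ? node.ifTrue : node.ifFalse;
        break;
      case "transform":
        state = node.apply(state); // returns a new context
        current = node.next;
        break;
    }
  }
  return state;
}

const log: string[] = [];
const result = executeGraph(
  {
    start: { kind: "transform", apply: (c) => ({ ...c, items: 3 }), next: "check" },
    check: { kind: "condition", test: (c) => (c.items as number) > 0, ifTrue: "report" },
    report: { kind: "action", run: (c) => { log.push(`found ${String(c.items)} items`); } },
  },
  "start",
);
```

The step limit guards against graphs that loop forever, since nothing in this sketch prevents a node from pointing back to an earlier one.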
## Directory Structure

```
apps/server/
├── src/
│   ├── index.ts                # Server entry point
│   ├── main.ts                 # Server initialization
│   ├── api/                    # HTTP route handlers
│   ├── agent/                  # Agent loop
│   │   ├── ai-sdk-agent.ts     # Main agent implementation
│   │   ├── provider-factory.ts # LLM provider factory
│   │   ├── session-store.ts    # Conversation persistence
│   │   ├── compaction.ts       # Context window management
│   │   ├── mcp-builder.ts      # External MCP client setup
│   │   └── tool-adapter.ts     # MCP → AI SDK tool bridge
│   ├── browser/                # Browser connection layer
│   ├── tools/                  # MCP tool implementations
│   │   ├── navigation.ts
│   │   ├── input.ts
│   │   ├── snapshot.ts
│   │   ├── memory/
│   │   ├── filesystem/
│   │   └── ...
│   ├── skills/                 # Skills system
│   ├── graph/                  # Workflow graph executor
│   ├── lib/                    # Shared utilities
│   └── rpc.ts                  # JSON-RPC type definitions
├── tests/
│   ├── tools/                  # Tool-level tests
│   ├── sdk/                    # SDK integration tests
│   └── server.integration.test.ts
├── graph/                      # Workflow graph definitions
└── package.json
```

## Development

### Prerequisites

- [Bun](https://bun.sh) runtime
- A running BrowserOS instance (for CDP and controller connections)

### Setup

```bash
# Copy the example environment file
cp .env.example .env.development

# Start the server (with hot reload)
bun run start
```

See the [agent monorepo README](../../README.md) for the full environment variable reference and `process-compose` setup.

### Testing

```bash
bun run test:tools        # Tool-level tests
bun run test:integration  # Full integration tests (requires running BrowserOS)
bun run test:sdk          # SDK integration tests
```

### Building

```bash
# Build cross-platform server binaries
bun run build

# Build for specific targets
bun scripts/build/server.ts --target=darwin-arm64,linux-x64

# Build without uploading to R2
bun scripts/build/server.ts --target=all --no-upload
```

## Ports

| Port | Env Variable | Purpose |
|------|-------------|---------|
| 9100 | `BROWSEROS_SERVER_PORT` | HTTP server (MCP, chat, health) |
| 9000 | `BROWSEROS_CDP_PORT` | Chromium CDP (server connects as client) |
| 9300 | `BROWSEROS_EXTENSION_PORT` | WebSocket for controller extension |
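
For tooling that needs the same port values, the table above can be mirrored in a small resolver. This is a sketch, not the server's own configuration code: `resolvePorts` and `parsePort` are hypothetical helpers, and falling back to the listed port when a variable is unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface Ports {
  server: number; // HTTP server (MCP, chat, health)
  cdp: number; // Chromium CDP
  extension: number; // controller extension WebSocket
}

// Parses a port string, using the fallback when the variable is unset or blank.
function parsePort(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${value}`);
  }
  return port;
}

// Fallback values are the ports listed in the table (assumed to be the defaults).
function resolvePorts(env: Env): Ports {
  return {
    server: parsePort(env.BROWSEROS_SERVER_PORT, 9100),
    cdp: parsePort(env.BROWSEROS_CDP_PORT, 9000),
    extension: parsePort(env.BROWSEROS_EXTENSION_PORT, 9300),
  };
}

const ports = resolvePorts({ BROWSEROS_CDP_PORT: "9222" });
```

Under Bun or Node, `resolvePorts(process.env)` would read the real environment; the explicit `env` parameter keeps the helper easy to test.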