docs: overhaul READMEs across all major packages (#594)

* docs: overhaul READMEs across all major packages

- Root README: restructure with feature table, LLM provider table,
  comparison matrix, architecture map, and docs link
- New: packages/browseros/README.md (Chromium fork build system)
- New: apps/server/README.md (MCP server + agent loop)
- New: packages/cdp-protocol/README.md (CDP type bindings)
- Polish: agent-sdk (badges, prerequisites, multi-step example, links)
- Polish: cli (badges, install section, MCP server section, links)
- Polish: agent extension (badges, WXT mention, architecture context)
- Polish: eval (badges, paper links)

* fix: address review — consistent tool count and correct default port

- CLI README: "54 MCP tools" → "53+ MCP tools" to match root and server docs
- Agent SDK README: localhost:3000 → localhost:9100 to match documented default

* docs: add detailed comparison links to How We Compare section

* docs: update comparison table with verified competitor data

Research all 5 competitors via official websites and docs:
- Chrome: no AI agent, Gemini Nano only, MV3 weakening ad blocking
- Brave: BYOM feature, local models via BYOM, Shields ad blocking, MV2+MV3
- Dia: Skills-based AI, no BYOK, cloud AI, acquired by Atlassian
- Comet: full cloud-based agent, built-in ad blocking, extensions on desktop
- Atlas: standalone Chromium browser with Agent Mode, 30-day cloud memory

Renamed Arc/Dia column to just Dia (Arc is sunset).

* docs: simplify comparison table with clean checkmarks and key differentiators

* docs: update browseros-agent README — remove submodule note, add missing packages
This commit is contained in:
Dani Akash
2026-03-27 11:59:04 +05:30
committed by GitHub
parent aba7a10430
commit b3003542d8
9 changed files with 652 additions and 162 deletions

238
README.md
View File

@@ -6,6 +6,7 @@
[![Slack](https://img.shields.io/badge/Slack-Join%20us-4A154B?logo=slack&logoColor=white)](https://dub.sh/browserOS-slack)
[![Twitter](https://img.shields.io/twitter/follow/browserOS_ai?style=social)](https://twitter.com/browseros_ai)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/Docs-docs.browseros.com-blue)](https://docs.browseros.com)
<br></br>
<a href="https://files.browseros.com/download/BrowserOS.dmg">
<img src="https://img.shields.io/badge/Download-macOS-black?style=flat&logo=apple&logoColor=white" alt="Download for macOS (beta)" />
@@ -22,38 +23,72 @@
<br />
</div>
##
🌐 BrowserOS is an open-source Chromium fork that runs AI agents natively. **The privacy-first alternative to ChatGPT Atlas, Perplexity Comet, and Dia.**
BrowserOS is an open-source Chromium fork that runs AI agents natively. **The privacy-first alternative to ChatGPT Atlas, Perplexity Comet, and Dia.**
🔒 Use your own API keys or run local models with Ollama. Your data never leaves your machine.
Use your own API keys or run local models with Ollama. Your data never leaves your machine.
💡 Join our [Discord](https://discord.gg/YKwjt5vuKr) or [Slack](https://dub.sh/browserOS-slack) and help us build! Have feature requests? [Suggest here](https://github.com/browseros-ai/BrowserOS/issues/99).
> **[Documentation](https://docs.browseros.com)** · **[Discord](https://discord.gg/YKwjt5vuKr)** · **[Slack](https://dub.sh/browserOS-slack)** · **[Twitter](https://x.com/browserOS_ai)** · **[Feature Requests](https://github.com/browseros-ai/BrowserOS/issues/99)**
## Quick start
## Quick Start
1. Download and install BrowserOS:
- [macOS](https://files.browseros.com/download/BrowserOS.dmg)
- [Windows](https://files.browseros.com/download/BrowserOS_installer.exe)
- [Linux (AppImage)](https://files.browseros.com/download/BrowserOS.AppImage)
- [Linux (Debian)](https://cdn.browseros.com/download/BrowserOS.deb)
1. **Download and install** BrowserOS — [macOS](https://files.browseros.com/download/BrowserOS.dmg) · [Windows](https://files.browseros.com/download/BrowserOS_installer.exe) · [Linux (AppImage)](https://files.browseros.com/download/BrowserOS.AppImage) · [Linux (Debian)](https://cdn.browseros.com/download/BrowserOS.deb)
2. **Import your Chrome data** (optional) — bookmarks, passwords, extensions all carry over
3. **Connect your AI provider** — Claude, OpenAI, Gemini, ChatGPT Pro via OAuth, or local models via Ollama/LM Studio
2. Import your Chrome data (optional)
## Features
3. Connect your AI provider — use Claude, OpenAI, Gemini, or local models via Ollama and LMStudio.
| Feature | Description | Docs |
|---------|-------------|------|
| **AI Agent** | 53+ browser automation tools — navigate, click, type, extract data, all with natural language | [Guide](https://docs.browseros.com/getting-started) |
| **MCP Server** | Control the browser from Claude Code, Gemini CLI, or any MCP client | [Setup](https://docs.browseros.com/features/use-with-claude-code) |
| **Workflows** | Build repeatable browser automations with a visual graph builder | [Docs](https://docs.browseros.com/features/workflows) |
| **Cowork** | Combine browser automation with local file operations — research the web, save reports to your folder | [Docs](https://docs.browseros.com/features/cowork) |
| **Scheduled Tasks** | Run agents on autopilot — daily, hourly, or every few minutes | [Docs](https://docs.browseros.com/features/scheduled-tasks) |
| **Memory** | Persistent memory across conversations — your assistant remembers context over time | [Docs](https://docs.browseros.com/features/memory) |
| **SOUL.md** | Define your AI's personality and instructions in a single markdown file | [Docs](https://docs.browseros.com/features/soul-md) |
| **LLM Hub** | Compare Claude, ChatGPT, and Gemini responses side-by-side on any page | [Docs](https://docs.browseros.com/features/llm-chat-hub) |
| **40+ App Integrations** | Gmail, Slack, GitHub, Linear, Notion, Figma, Salesforce, and more via MCP | [Docs](https://docs.browseros.com/features/connect-apps) |
| **Vertical Tabs** | Side-panel tab management — stay organized even with 100+ tabs open | [Docs](https://docs.browseros.com/features/vertical-tabs) |
| **Ad Blocking** | uBlock Origin + Manifest V2 support — [10x more protection](https://docs.browseros.com/features/ad-blocking) than Chrome | [Docs](https://docs.browseros.com/features/ad-blocking) |
| **Cloud Sync** | Sync browser config and agent history across devices | [Docs](https://docs.browseros.com/features/sync) |
| **Skills** | Custom instruction sets that shape how your AI assistant behaves | [Docs](https://docs.browseros.com/features/skills) |
| **Smart Nudges** | Contextual suggestions to connect apps and use features at the right moment | [Docs](https://docs.browseros.com/features/smart-nudges) |
4. Start automating!
## Demos
### BrowserOS agent in action
[![BrowserOS agent in action](docs/videos/browserOS-agent-in-action.gif)](https://www.youtube.com/watch?v=SoSFev5R5dI)
<br/><br/>
### Install [BrowserOS as MCP](https://docs.browseros.com/features/use-with-claude-code) and control it from `claude-code`
https://github.com/user-attachments/assets/c725d6df-1a0d-40eb-a125-ea009bf664dc
<br/><br/>
### Use BrowserOS to chat
https://github.com/user-attachments/assets/726803c5-8e36-420e-8694-c63a2607beca
<br/><br/>
### Use BrowserOS to scrape data
https://github.com/user-attachments/assets/9f038216-bc24-4555-abf1-af2adcb7ebc0
<br/><br/>
## Install `browseros-cli`
Use `browseros-cli` to launch and control BrowserOS from the terminal or from AI coding agents like Claude Code.
### macOS / Linux
**macOS / Linux:**
```bash
curl -fsSL https://cdn.browseros.com/cli/install.sh | bash
```
### Windows
**Windows:**
```powershell
irm https://cdn.browseros.com/cli/install.ps1 | iex
@@ -61,125 +96,110 @@ irm https://cdn.browseros.com/cli/install.ps1 | iex
After install, run `browseros-cli init` to connect the CLI to your running BrowserOS instance.
## What makes BrowserOS special
- 🏠 Feels like home — same Chrome interface, all your extensions just work
- 🤖 AI agents that run on YOUR browser, not in the cloud
- 🔒 Privacy first — bring your own keys or run local models with Ollama. Your browsing history stays on your machine
- 🤝 [BrowserOS as MCP server](https://docs.browseros.com/features/use-with-claude-code) — control the browser from `claude-code`, `gemini-cli`, or any MCP client (31 tools)
- 🔄 [Workflows](https://docs.browseros.com/features/workflows) — build repeatable browser automations with a visual graph builder
- 📂 [Cowork](https://docs.browseros.com/features/cowork) — combine browser automation with local file operations. Research the web, save reports to your folder
- ⏰ [Scheduled Tasks](https://docs.browseros.com/features/scheduled-tasks) — run the agent on autopilot, daily or every few minutes
- 💬 [LLM Hub](https://docs.browseros.com/features/llm-chat-hub) — compare Claude, ChatGPT, and Gemini side-by-side on any page
- 📌 [Vertical Tabs](https://docs.browseros.com/features/vertical-tabs) — move tabs to a side panel for a cleaner layout, even with 100+ tabs open
- 🛡️ Built-in ad blocker — [10x more protection than Chrome](https://docs.browseros.com/features/ad-blocking) with uBlock Origin + Manifest V2 support
- 🚀 100% open source under AGPL-3.0
## LLM Providers
## Demos
BrowserOS works with any LLM. Bring your own keys, use OAuth, or run models locally.
### 🤖 BrowserOS agent in action
[![BrowserOS agent in action](docs/videos/browserOS-agent-in-action.gif)](https://www.youtube.com/watch?v=SoSFev5R5dI)
<br/><br/>
| Provider | Type | Auth |
|----------|------|------|
| Kimi K2.5 | Cloud (default) | Built-in |
| ChatGPT Pro/Plus | Cloud | [OAuth](https://docs.browseros.com/features/chatgpt) |
| GitHub Copilot | Cloud | [OAuth](https://docs.browseros.com/features/github-copilot) |
| Qwen Code | Cloud | [OAuth](https://docs.browseros.com/features/qwen-code) |
| Claude (Anthropic) | Cloud | API key |
| GPT-4o / o3 (OpenAI) | Cloud | API key |
| Gemini (Google) | Cloud | API key |
| Azure OpenAI | Cloud | API key |
| AWS Bedrock | Cloud | IAM credentials |
| OpenRouter | Cloud | API key |
| Ollama | Local | [Setup](https://docs.browseros.com/features/ollama) |
| LM Studio | Local | [Setup](https://docs.browseros.com/features/lm-studio) |
### 🎇 Install [BrowserOS as MCP](https://docs.browseros.com/features/use-with-claude-code) and control it from `claude-code`
## How We Compare
https://github.com/user-attachments/assets/c725d6df-1a0d-40eb-a125-ea009bf664dc
| | BrowserOS | Chrome | Brave | Dia | Comet | Atlas |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| Open Source | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| AI Agent | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ |
| MCP Server | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Visual Workflows | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Cowork (files + browser) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Scheduled Tasks | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Bring Your Own Keys | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Local Models (Ollama) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Local-first Privacy | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Ad Blocking (MV2) | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |
<br/><br/>
**Detailed comparisons:**
- [BrowserOS vs Chrome DevTools MCP](https://docs.browseros.com/comparisons/chrome-devtools-mcp) — developer-focused comparison for browser automation
- [BrowserOS vs Claude Cowork](https://docs.browseros.com/comparisons/claude-cowork) — getting real work done with AI
- [BrowserOS vs OpenClaw](https://docs.browseros.com/comparisons/openclaw) — everyday AI assistance
### 💬 Use BrowserOS to chat
## Architecture
https://github.com/user-attachments/assets/726803c5-8e36-420e-8694-c63a2607beca
BrowserOS is a monorepo with two main subsystems: the **browser** (Chromium fork) and the **agent platform** (TypeScript/Go).
<br/><br/>
```
BrowserOS/
├── packages/browseros/ # Chromium fork + build system (Python)
│ ├── chromium_patches/ # Patches applied to Chromium source
│ ├── build/ # Build CLI and modules
│ └── resources/ # Icons, entitlements, signing
├── packages/browseros-agent/ # Agent platform (TypeScript/Go)
│ ├── apps/
│ │ ├── server/ # MCP server + AI agent loop (Bun)
│ │ ├── agent/ # Browser extension UI (WXT + React)
│ │ ├── cli/ # CLI tool (Go)
│ │ ├── eval/ # Benchmark framework
│ │ └── controller-ext/ # Chrome API bridge extension
│ │
│ └── packages/
│ ├── agent-sdk/ # Node.js SDK (npm: @browseros-ai/agent-sdk)
│ ├── cdp-protocol/ # CDP type bindings
│ └── shared/ # Shared constants
```
### ⚡ Use BrowserOS to scrape data
https://github.com/user-attachments/assets/9f038216-bc24-4555-abf1-af2adcb7ebc0
<br/><br/>
## Why We're Building BrowserOS
For the first time since Netscape pioneered the web in 1994, AI gives us the chance to completely reimagine the browser. We've seen tools like Cursor deliver 10x productivity gains for developers—yet everyday browsing remains frustratingly archaic.
You're likely juggling 70+ tabs, battling your browser instead of having it assist you. Routine tasks, like ordering something from amazon or filling a form should be handled seamlessly by AI agents.
At BrowserOS, we're convinced that AI should empower you by automating tasks locally and securely—keeping your data private. We are building the best browser for this future!
## How we compare
<details>
<summary><b>vs Chrome</b></summary>
<br>
While we're grateful for Google open-sourcing Chromium, but Chrome hasn't evolved much in 10 years. No AI features, no automation, no MCP support.
</details>
<details>
<summary><b>vs Brave</b></summary>
<br>
We love what Brave started, but they've spread themselves too thin with crypto, search, VPNs. We're laser-focused on AI-powered browsing.
</details>
<details>
<summary><b>vs Arc/Dia</b></summary>
<br>
Many loved Arc, but it was closed source. When they abandoned users, there was no recourse. We're 100% open source - fork it anytime!
</details>
<details>
<summary><b>vs Perplexity Comet</b></summary>
<br>
They're a search/ad company. Your browser history becomes their product. We keep everything local.
</details>
<details>
<summary><b>vs ChatGPT Atlas</b></summary>
<br>
Your browsing data could be used for ads or to train their models. We keep your history and agent interactions strictly local.
</details>
| Package | What it does |
|---------|-------------|
| [`packages/browseros`](packages/browseros/) | Chromium fork — patches, build system, signing |
| [`apps/server`](packages/browseros-agent/apps/server/) | Bun server exposing 53+ MCP tools and running the AI agent loop |
| [`apps/agent`](packages/browseros-agent/apps/agent/) | Browser extension — new tab, side panel chat, onboarding, settings |
| [`apps/cli`](packages/browseros-agent/apps/cli/) | Go CLI — control BrowserOS from the terminal or AI coding agents |
| [`apps/eval`](packages/browseros-agent/apps/eval/) | Benchmark framework — WebVoyager, Mind2Web evaluation |
| [`agent-sdk`](packages/browseros-agent/packages/agent-sdk/) | Node.js SDK for browser automation with natural language |
| [`cdp-protocol`](packages/browseros-agent/packages/cdp-protocol/) | Type-safe Chrome DevTools Protocol bindings |
## Contributing
We'd love your help making BrowserOS better!
We'd love your help making BrowserOS better! See our [Contributing Guide](CONTRIBUTING.md) for details.
- 🐛 [Report bugs](https://github.com/browseros-ai/BrowserOS/issues)
- 💡 [Suggest features](https://github.com/browseros-ai/BrowserOS/issues/99)
- 💬 [Join Discord](https://discord.gg/YKwjt5vuKr)
- 🐦 [Follow on Twitter](https://x.com/browserOS_ai)
- [Report bugs](https://github.com/browseros-ai/BrowserOS/issues)
- [Suggest features](https://github.com/browseros-ai/BrowserOS/issues/99)
- [Join Discord](https://discord.gg/YKwjt5vuKr) · [Join Slack](https://dub.sh/browserOS-slack)
- [Follow on Twitter](https://x.com/browserOS_ai)
**Agent development** (TypeScript/Go) — see the [agent monorepo README](packages/browseros-agent/README.md) for setup instructions.
**Browser development** (C++/Python) — requires ~100GB disk space. See [`packages/browseros`](packages/browseros/) for build instructions.
## Credits
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
- [The Chromium Project](https://www.chromium.org/) — at the core of BrowserOS, making it possible to exist in the first place.
## License
BrowserOS is open source under the [AGPL-3.0 license](LICENSE).
## Credits
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) - BrowserOS uses some patches for enhanced privacy. Thanks to everyone behind this project!
- [The Chromium Project](https://www.chromium.org/) - At the core of BrowserOS, making it possible to exist in the first place.
## Citation
If you use BrowserOS in your research or project, please cite:
```bibtex
@software{browseros2025,
author = {Sonti, Nithin and Sonti, Nikhil and {BrowserOS-team}},
title = {BrowserOS: The open-source Agentic browser},
url = {https://github.com/browseros-ai/BrowserOS},
year = {2025},
publisher = {GitHub},
license = {AGPL-3.0},
}
```
Copyright &copy; 2025 Felafax, Inc.
## Stargazers
Thank you to all our supporters!
[![Star History Chart](https://api.star-history.com/svg?repos=browseros-ai/BrowserOS&type=Date)](https://www.star-history.com/#browseros-ai/BrowserOS&Date)
##
<p align="center">
Built with ❤️ from San Francisco
</p>

View File

@@ -1,8 +1,6 @@
# BrowserOS Agent
Monorepo for the BrowserOS-agent -- contains 3 packages: agent-UI, server (which contains the agent loop) and controller-extension (which is used by the tools within the agent loop).
> **⚠️ NOTE:** This is only a submodule, the main project is at -- https://github.com/browseros-ai/BrowserOS
The agent platform powering [BrowserOS](https://github.com/browseros-ai/BrowserOS) — contains the MCP server, agent UI, CLI, evaluation framework, and SDK.
## Monorepo Structure
@@ -10,17 +8,25 @@ Monorepo for the BrowserOS-agent -- contains 3 packages: agent-UI, server (which
apps/
server/ # Bun server - MCP endpoints + agent loop
agent/ # Agent UI (Chrome extension)
cli/ # Go CLI for controlling BrowserOS from the terminal
eval/ # Evaluation framework for benchmarking agents
controller-ext/ # BrowserOS Controller (Chrome extension for chrome.* APIs)
packages/
agent-sdk/ # Node.js SDK (@browseros-ai/agent-sdk)
cdp-protocol/ # Type-safe Chrome DevTools Protocol bindings
shared/ # Shared constants (ports, timeouts, limits)
```
| Package | Description |
|---------|-------------|
| `apps/server` | Bun server exposing MCP tools and running the agent loop |
| `apps/agent` | Agent UI - Chrome extension for the chat interface |
| `apps/controller-ext` | BrowserOS Controller - Chrome extension that bridges `chrome.*` APIs (tabs, bookmarks, history) to the server via WebSocket |
| `apps/agent` | Agent UI Chrome extension for the chat interface |
| `apps/cli` | Go CLI — control BrowserOS from the terminal or AI coding agents |
| `apps/eval` | Benchmark framework — WebVoyager, Mind2Web evaluation |
| `apps/controller-ext` | BrowserOS Controller — bridges `chrome.*` APIs to the server via WebSocket |
| `packages/agent-sdk` | Node.js SDK for browser automation with natural language |
| `packages/cdp-protocol` | Auto-generated CDP type bindings used by the server |
| `packages/shared` | Shared constants used across packages |
## Architecture

View File

@@ -1,16 +1,24 @@
# BrowserOS Agent Chrome Extension
# BrowserOS Agent Extension
The official Chrome extension for BrowserOS Agent, providing the UI layer for interacting with BrowserOS Core and Controllers. This extension enables intelligent browser automation, AI-powered search, and seamless integration with multiple LLM providers.
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](../../../../LICENSE)
The built-in browser extension that powers BrowserOS's AI interface — new tab with unified search, side panel chat, onboarding, and settings. Built with [WXT](https://wxt.dev) and React.
> For user-facing feature documentation, see [docs.browseros.com](https://docs.browseros.com).
## Features
- **AI-Powered New Tab**: Custom new tab page with unified search across Google and AI assistants
- **Side Panel Chat**: Full-featured chat interface for interacting with BrowserOS Core
- **Side Panel Chat**: Full-featured chat interface for interacting with BrowserOS
- **Multi-Provider Support**: Connect to various LLM providers (OpenAI, Anthropic, Azure, Bedrock, and more)
- **MCP Integration**: Model Context Protocol support for extending AI capabilities
- **Visual Feedback**: Animated glow effect on tabs during AI agent operations
- **Privacy-First**: Local data handling with configurable provider settings
## How It Connects
The extension communicates with the [BrowserOS Server](../../apps/server/) running locally. The server handles the AI agent loop, MCP tools, and CDP connections — the extension provides the UI layer.
## Project Structure
```
@@ -80,47 +88,20 @@ Settings dashboard with multiple sections:
Content script that creates a visual indicator (pulsing orange glow) around the browser viewport when an AI agent is actively working on a tab.
## How Tools Are Used
### Bun
Bun is the exclusive runtime and package manager:
- All scripts use `bun run <script>` instead of npm
- Package installation via `bun install`
- Environment files automatically loaded (no dotenv needed)
- Enforced via `engines` field in `package.json`
```bash
bun install # Install dependencies
bun run dev # Development mode
bun run build # Production build
bun run lint # Run Biome linting
```
### Biome
Unified linter and formatter configured in `biome.json`:
- **Formatting**: 2-space indentation, single quotes, no semicolons
- **Linting**: Recommended rules plus custom rules for unused imports/variables
- **CSS Support**: Tailwind directives parsing enabled
- **Import Organization**: Automatic import sorting via assist actions
```bash
bun run lint # Check for issues
bun run lint:fix # Auto-fix issues
```
## Development
### Prerequisites
- [Bun](https://bun.sh) installed
- Chrome or Chromium-based browser
- BrowserOS Core running locally (for full functionality)
- BrowserOS Server running locally (for full functionality)
### Setup
```bash
# Copy environment file
cp .env.example .env.development
# Install dependencies
bun install
@@ -153,12 +134,30 @@ SENTRY_AUTH_TOKEN=your-token
### GraphQL Schema
Codegen requires a GraphQL schema. By default it uses the bundled `schema/schema.graphql`, so no extra setup is needed. If you have access to the original API source, you can set the following environment variable
Codegen requires a GraphQL schema. By default it uses the bundled `schema/schema.graphql`, so no extra setup is needed. If you have access to the original API source, you can set the following environment variable:
```env
GRAPHQL_SCHEMA_PATH=/path/to/api-repo/.../schema.graphql
```
## Development Tooling
### Bun
Bun is the exclusive runtime and package manager:
- All scripts use `bun run <script>` instead of npm
- Package installation via `bun install`
- Environment files automatically loaded (no dotenv needed)
- Enforced via `engines` field in `package.json`
### Biome
Unified linter and formatter configured in `biome.json`:
- **Formatting**: 2-space indentation, single quotes, no semicolons
- **Linting**: Recommended rules plus custom rules for unused imports/variables
- **CSS Support**: Tailwind directives parsing enabled
- **Import Organization**: Automatic import sorting via assist actions
## Scripts
| Script | Description |
@@ -169,4 +168,5 @@ GRAPHQL_SCHEMA_PATH=/path/to/api-repo/.../schema.graphql
| `bun run lint` | Run Biome linter |
| `bun run lint:fix` | Auto-fix linting issues |
| `bun run typecheck` | Run TypeScript type checking |
| `bun run codegen` | Generate GraphQL types |
| `bun run clean:cache` | Clear build caches |

View File

@@ -1,17 +1,39 @@
# browseros-cli
Command-line interface for controlling BrowserOS via MCP. Talks to the BrowserOS MCP server over JSON-RPC 2.0 / StreamableHTTP.
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](../../../../LICENSE)
## Setup
Command-line interface for controlling BrowserOS — launch and automate the browser from the terminal or from AI coding agents like Claude Code and Gemini CLI.
Communicates with the BrowserOS MCP server over JSON-RPC 2.0 / StreamableHTTP. All 53+ MCP tools are mapped to CLI commands.
## Install
### macOS / Linux
```bash
curl -fsSL https://cdn.browseros.com/cli/install.sh | bash
```
### Windows
```powershell
irm https://cdn.browseros.com/cli/install.ps1 | iex
```
### Build from Source
Requires Go 1.25+.
```bash
# Build
make
make # Build binary
make install # Install to $GOPATH/bin
```
## Setup
```bash
# First run — configure server connection
./browseros-cli init
browseros-cli init
```
The `init` command prompts for your MCP server URL. Find it in:
@@ -67,6 +89,12 @@ browseros-cli history recent
browseros-cli group list
```
## Use as MCP Server
BrowserOS exposes an MCP server that AI coding agents can connect to directly. The CLI is the easiest way to verify the connection and interact with tools from the terminal.
To connect Claude Code, Gemini CLI, or any MCP client, see the [MCP setup guide](https://docs.browseros.com/features/use-with-claude-code).
## Global Flags
| Flag | Env Var | Description |
@@ -163,4 +191,8 @@ The CLI communicates with BrowserOS via two HTTP POST requests per command:
1. `initialize` — MCP handshake
2. `tools/call` — execute the actual tool
All 54 MCP tools are mapped to CLI commands.
## Links
- [Documentation](https://docs.browseros.com)
- [MCP Setup Guide](https://docs.browseros.com/features/use-with-claude-code)
- [Changelog](./CHANGELOG.md)

View File

@@ -1,6 +1,8 @@
# BrowserOS Eval
Evaluation framework for benchmarking BrowserOS browser automation agents. Runs tasks from standard datasets (WebVoyager, Mind2Web), captures trajectories with screenshots, and grades results automatically.
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](../../../../LICENSE)
Evaluation framework for benchmarking BrowserOS browser automation agents. Runs tasks from standard datasets ([WebVoyager](https://arxiv.org/abs/2401.13919), [Mind2Web](https://arxiv.org/abs/2306.06070)), captures trajectories with screenshots, and grades results automatically.
## Prerequisites

View File

@@ -0,0 +1,181 @@
# BrowserOS Server
MCP server and AI agent loop powering BrowserOS browser automation. This is the core backend — it connects to Chromium via CDP, exposes 53+ MCP tools, and runs the AI agent that interprets natural language into browser actions.
> **Runtime:** [Bun](https://bun.sh) · **Framework:** [Hono](https://hono.dev) · **AI:** [Vercel AI SDK](https://sdk.vercel.ai) · **License:** [AGPL-3.0](../../../../LICENSE)
## Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│ MCP Clients │
│ (Agent UI, Claude Code, Gemini CLI, browseros-cli) │
└──────────────────────────────────────────────────────────────────────┘
│ HTTP / SSE / StreamableHTTP
┌──────────────────────────────────────────────────────────────────────┐
│ BrowserOS Server (Bun) │
│ │
│ /mcp ─────── MCP tool endpoints (53+ tools) │
│ /chat ────── Agent streaming (AI SDK) │
│ /health ─── Health check │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Agent Loop │ │
│ │ ├── Multi-provider AI SDK (OpenAI, Anthropic, Google, ...) │ │
│ │ ├── Session & conversation management │ │
│ │ ├── Context overflow handling + compaction │ │
│ │ └── MCP client for external tool servers │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────┐ ┌────────────────────────────────────┐ │
│ │ CDP Tools │ │ Controller Tools │ │
│ │ (screenshots, │ │ (tabs, bookmarks, history, │ │
│ │ DOM, network, │ │ navigation, tab groups) │ │
│ │ console, input) │ │ │ │
│ └────────────────────┘ └────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│ │
│ Chrome DevTools Protocol │ WebSocket
▼ ▼
┌─────────────────────┐ ┌─────────────────────────────────┐
│ Chromium CDP │ │ Controller Extension │
│ (port 9000) │ │ (port 9300) │
│ │ │ │
│ DOM, network, │ │ chrome.tabs, chrome.history, │
│ input, screenshots │ │ chrome.bookmarks │
└─────────────────────┘ └─────────────────────────────────┘
```
## MCP Tools
53+ tools organized by category:
| Category | Tools |
|----------|-------|
| **Navigation** | `new_page`, `navigate`, `go_back`, `go_forward`, `reload` |
| **Input** | `click`, `type`, `press_key`, `hover`, `scroll`, `drag`, `fill`, `clear`, `focus`, `check`, `uncheck`, `select_option`, `upload_file` |
| **Observation** | `take_snapshot`, `take_enhanced_snapshot`, `extract_text`, `extract_links` |
| **Screenshots** | `take_screenshot`, `save_screenshot` |
| **Evaluation** | `evaluate_script` |
| **Pages** | `list_pages`, `active_page`, `close_page`, `new_hidden_page` |
| **Windows** | `window_list`, `window_create`, `window_close`, `window_activate` |
| **Bookmarks** | `bookmark_list`, `bookmark_create`, `bookmark_remove`, `bookmark_update`, `bookmark_move`, `bookmark_search` |
| **History** | `history_search`, `history_recent`, `history_delete`, `history_delete_range` |
| **Tab Groups** | `group_list`, `group_create`, `group_update`, `group_ungroup`, `group_close` |
| **Filesystem** | `ls`, `read`, `write`, `edit`, `find`, `grep`, `bash` |
| **Memory** | `read_core`, `update_core`, `read_soul`, `update_soul`, `search_memory`, `write_memory` |
| **DOM** | `dom`, `dom_search` |
| **Console** | `get_console_messages` |
| **Other** | `browseros_info`, `handle_dialog`, `wait_for`, `download`, `export_pdf`, `output_file`, `nudges` |
## Agent Loop
The agent loop uses the [Vercel AI SDK](https://sdk.vercel.ai) to orchestrate multi-step browser automation:
- **Multi-provider support** — OpenAI, Anthropic, Google, Azure, Bedrock, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible endpoint
- **Session management** — conversations persist in a local SQLite database
- **Context overflow handling** — automatic message compaction when context windows fill up
- **MCP client** — connects to external MCP servers for additional tool access (40+ app integrations)
- **Tool adapter** — bridges MCP tool definitions to AI SDK tool format
### Provider Factory
The provider factory (`src/agent/provider-factory.ts`) creates AI SDK providers from runtime configuration, supporting hot-swapping between providers without restart.
## Skills System
Skills are custom instruction sets that shape agent behavior:
- **Catalog** (`src/skills/catalog.ts`) — registry of available skills
- **Defaults** (`src/skills/defaults/`) — built-in skill definitions
- **Loader** (`src/skills/loader.ts`) — loads skills from local and remote sources
- **Remote sync** (`src/skills/remote-sync.ts`) — syncs skills from the BrowserOS cloud
## Graph Executor (Workflows)
The graph executor (`src/graph/executor.ts`) runs visual workflow graphs built in the BrowserOS workflow editor. Each node in the graph maps to agent actions, conditionals, or data transformations.
## Directory Structure
```
apps/server/
├── src/
│ ├── index.ts # Server entry point
│ ├── main.ts # Server initialization
│ ├── api/ # HTTP route handlers
│ ├── agent/ # Agent loop
│ │ ├── ai-sdk-agent.ts # Main agent implementation
│ │ ├── provider-factory.ts# LLM provider factory
│ │ ├── session-store.ts # Conversation persistence
│ │ ├── compaction.ts # Context window management
│ │ ├── mcp-builder.ts # External MCP client setup
│ │ └── tool-adapter.ts # MCP → AI SDK tool bridge
│ ├── browser/ # Browser connection layer
│ ├── tools/ # MCP tool implementations
│ │ ├── navigation.ts
│ │ ├── input.ts
│ │ ├── snapshot.ts
│ │ ├── memory/
│ │ ├── filesystem/
│ │ └── ...
│ ├── skills/ # Skills system
│ ├── graph/ # Workflow graph executor
│ ├── lib/ # Shared utilities
│ └── rpc.ts # JSON-RPC type definitions
├── tests/
│ ├── tools/ # Tool-level tests
│ ├── sdk/ # SDK integration tests
│ └── server.integration.test.ts
├── graph/ # Workflow graph definitions
└── package.json
```
## Development
### Prerequisites
- [Bun](https://bun.sh) runtime
- A running BrowserOS instance (for CDP and controller connections)
### Setup
```bash
# Copy environment files
cp .env.example .env.development
# Start the server (with hot reload)
bun run start
```
See the [agent monorepo README](../../README.md) for full environment variable reference and `process-compose` setup.
### Testing
```bash
bun run test:tools # Tool-level tests
bun run test:integration # Full integration tests (requires running BrowserOS)
bun run test:sdk # SDK integration tests
```
### Building
```bash
# Build cross-platform server binaries
bun run build
# Build for specific targets
bun scripts/build/server.ts --target=darwin-arm64,linux-x64
# Build without uploading to R2
bun scripts/build/server.ts --target=all --no-upload
```
## Ports
| Port | Env Variable | Purpose |
|------|-------------|---------|
| 9100 | `BROWSEROS_SERVER_PORT` | HTTP server (MCP, chat, health) |
| 9000 | `BROWSEROS_CDP_PORT` | Chromium CDP (server connects as client) |
| 9300 | `BROWSEROS_EXTENSION_PORT` | WebSocket for controller extension |

View File

@@ -1,7 +1,17 @@
# @browseros-ai/agent-sdk
[![npm version](https://img.shields.io/npm/v/@browseros-ai/agent-sdk)](https://www.npmjs.com/package/@browseros-ai/agent-sdk)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](../../../../LICENSE)
Browser automation SDK for BrowserOS — navigate, interact, extract data, and verify page state using natural language.
Build automations that describe *what* to do, not *how* to do it. The SDK connects to a running BrowserOS instance and translates natural language instructions into browser actions using your choice of LLM provider.
## Prerequisites
- A running [BrowserOS](https://browseros.com) instance
- An API key for at least one [supported LLM provider](#llm-providers)
## Installation
```bash
@@ -17,7 +27,7 @@ import { Agent } from '@browseros-ai/agent-sdk'
import { z } from 'zod'
const agent = new Agent({
url: 'http://localhost:3000',
url: 'http://localhost:9100',
llm: {
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
@@ -42,6 +52,40 @@ const { data } = await agent.extract('get all product names and prices', {
const { success, reason } = await agent.verify('user is logged in')
```
## Multi-Step Example
Combine navigation, actions, extraction, and verification for end-to-end automation:
```typescript
import { Agent } from '@browseros-ai/agent-sdk'
import { z } from 'zod'
const agent = new Agent({
url: 'http://localhost:9100',
llm: { provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
})
// 1. Navigate
await agent.nav('https://news.ycombinator.com')
// 2. Extract data
const { data: stories } = await agent.extract('get the top 5 stories with title, points, and link', {
schema: z.array(z.object({
title: z.string(),
points: z.number(),
link: z.string(),
})),
})
// 3. Act on extracted data
await agent.act(`click on the story titled "${stories[0].title}"`)
// 4. Verify the result
const { success } = await agent.verify('the story page or external link has loaded')
console.log({ stories, navigationSuccess: success })
```
## API Reference
### `new Agent(options)`
@@ -105,8 +149,6 @@ const { success, reason } = await agent.verify('the form was submitted successfu
## LLM Providers
Supported providers:
| Provider | Config |
|----------|--------|
| OpenAI | `{ provider: 'openai', apiKey: '...' }` |
@@ -121,11 +163,11 @@ Supported providers:
## Progress Events
Track agent operations:
Track agent operations in real time:
```typescript
const agent = new Agent({
url: 'http://localhost:3000',
url: 'http://localhost:9100',
onProgress: (event) => {
console.log(`[${event.type}] ${event.message}`)
},
@@ -154,6 +196,13 @@ try {
}
```
## Links
- [Documentation](https://docs.browseros.com)
- [GitHub](https://github.com/browseros-ai/BrowserOS)
- [Changelog](./CHANGELOG.md)
- [Discord](https://discord.gg/YKwjt5vuKr)
## License
AGPL-3.0-or-later
[AGPL-3.0-or-later](../../../../LICENSE)

View File

@@ -0,0 +1,67 @@
# @browseros/cdp-protocol
Type-safe Chrome DevTools Protocol bindings for BrowserOS.
> **Internal package** — auto-generated TypeScript types and API wrappers for all CDP domains. Used by `@browseros/server` to communicate with Chromium.
## Usage
Import domain types or domain API wrappers using subpath exports:
```typescript
// Import type definitions for a CDP domain
import type { NavigateParams, NavigateReturn } from '@browseros/cdp-protocol/domains/page'
// Import the API wrapper for a domain
import { PageAPI } from '@browseros/cdp-protocol/domain-apis/page'
// Core protocol API
import { ProtocolAPI } from '@browseros/cdp-protocol/protocol-api'
// Factory function
import { createAPI } from '@browseros/cdp-protocol/create-api'
```
## Supported Domains
All standard Chrome DevTools Protocol domains are supported:
| Category | Domains |
|----------|---------|
| **Page & DOM** | Page, DOM, DOMDebugger, DOMSnapshot, DOMStorage, CSS, Overlay |
| **Network** | Network, Fetch, IO, ServiceWorker, CacheStorage |
| **Input & Interaction** | Input, Emulation, DeviceOrientation, DeviceAccess |
| **JavaScript** | Runtime, Debugger, Console, Profiler, HeapProfiler |
| **Browser** | Browser, Target, Inspector, Extensions, PWA |
| **Performance** | Performance, PerformanceTimeline, Tracing, Memory |
| **Media** | Media, WebAudio, Cast |
| **Security** | Security, WebAuthn, FedCm |
| **Storage** | IndexedDB, Storage, FileSystem |
| **Other** | Accessibility, Animation, Audits, Autofill, BackgroundService, BluetoothEmulation, EventBreakpoints, HeadlessExperimental, LayerTree, Log, Preload, Schema, SystemInfo, Tethering |
| **BrowserOS Custom** | Bookmarks, History |
## Structure
```
src/generated/
├── domains/ # Type definitions for each CDP domain
│ ├── page.ts
│ ├── dom.ts
│ ├── network.ts
│ └── ...
├── domain-apis/ # API wrapper classes for each domain
│ ├── page.ts
│ ├── dom.ts
│ ├── network.ts
│ └── ...
├── protocol-api.ts # Unified protocol API
└── create-api.ts # API factory
```
## Regenerating Types
Types are auto-generated from the CDP protocol specification. The generated output lives in `src/generated/` and should not be edited manually.
## License
[AGPL-3.0-or-later](../../../../LICENSE)

View File

@@ -0,0 +1,133 @@
# BrowserOS Browser (Chromium Fork)
Custom Chromium build with AI agent integration, enhanced privacy patches, and native MCP support.
> Based on **Chromium 146.0.7680.31** · Built with Python 3.12+ · Licensed under [AGPL-3.0](../../LICENSE)
## What This Is
This package contains the BrowserOS browser build system — everything needed to fetch Chromium source, apply BrowserOS patches, and produce signed binaries for macOS, Windows, and Linux. The build system is a Python CLI that orchestrates the entire pipeline from source to distributable.
BrowserOS patches add:
- Native AI agent sidebar and new tab integration
- MCP server endpoints baked into the browser
- Enhanced privacy via [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) patches
- Custom branding, icons, and entitlements
- Keychain access group management (macOS)
- Sparkle auto-update framework (macOS)
## Prerequisites
| Requirement | Details |
|-------------|---------|
| **Disk space** | ~100 GB for Chromium source + build artifacts |
| **Python** | 3.12+ |
| **macOS** | Xcode + Command Line Tools |
| **Linux** | `build-essential`, `clang`, `lld`, and Chromium's [Linux deps](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md) |
| **Windows** | Visual Studio 2022, Windows SDK |
## Directory Structure
```
packages/browseros/
├── build/ # Build system (Python CLI)
│ ├── __main__.py # CLI entry point
│ ├── browseros.py # Main app definition
│ ├── modules/
│ │ ├── setup/ # Chromium source fetch and setup
│ │ ├── patches/ # Patch application logic
│ │ ├── apply/ # Apply patches to source tree
│ │ ├── extract/ # Extract patches from modified source
│ │ ├── feature/ # Feature flag management
│ │ ├── package/ # Binary packaging
│ │ ├── sign/ # Code signing (macOS, Windows)
│ │ ├── ota/ # Over-the-air update support
│ │ └── resources/ # Resource management
│ ├── config/ # Build configuration
│ └── features.yaml # Feature flag definitions
├── chromium_patches/ # BrowserOS patches applied to Chromium source
│ ├── chrome/browser/ # Browser UI and feature patches
│ ├── components/ # Component patches (e.g., os_crypt)
│ └── ... # Organized to mirror Chromium source tree
├── chromium_files/ # New files added to Chromium (not patches)
├── series_patches/ # Ordered patch series
├── resources/ # Icons, entitlements, signing resources
│ └── entitlements/ # macOS entitlements (app, helper, GPU, etc.)
├── tools/
│ └── bdev # Developer tool
├── CHROMIUM_VERSION # Pinned Chromium version (MAJOR.MINOR.BUILD.PATCH)
├── BASE_COMMIT # Base Chromium commit hash
├── pyproject.toml # Python project config
└── requirements.txt # Python dependencies
```
## Build System
The `browseros` CLI manages the full build lifecycle:
```bash
# Install the build system
pip install -e .
# Or use uv
uv pip install -e .
```
Key commands:
```bash
browseros setup # Fetch and prepare Chromium source
browseros apply # Apply all patches to Chromium source
browseros build # Build BrowserOS binary
browseros package # Package into distributable (DMG, installer, AppImage)
browseros sign # Code sign the binary (macOS/Windows)
```
## Patch System
BrowserOS applies patches on top of vanilla Chromium. Patches are organized in two directories:
- **`chromium_patches/`** — Individual file patches, organized to mirror the Chromium source tree. Each file here replaces or modifies the corresponding file in Chromium.
- **`series_patches/`** — Ordered patch series applied sequentially.
### Adding a New Patch
1. Make your changes in the Chromium source tree
2. Use `browseros extract` to pull changes back into patch format
3. Place the patch in the appropriate directory mirroring Chromium's structure
4. Test with a full `browseros apply && browseros build` cycle
### Chromium Version Pinning
The exact Chromium version is pinned in `CHROMIUM_VERSION`:
```
MAJOR=146
MINOR=0
BUILD=7680
PATCH=31
```
To update the base Chromium version, update this file and `BASE_COMMIT`, then resolve any patch conflicts.
## Signing (macOS)
macOS builds require code signing for Keychain access, Gatekeeper, and notarization:
- Entitlements are in `resources/entitlements/` (app, helper, GPU, renderer, etc.)
- Designated requirements pin to Team ID for Keychain persistence across updates
- The signing module is at `build/modules/sign/macos.py`
## Feature Flags
Feature flags are defined in `features.yaml` and control which BrowserOS-specific features are compiled into the build. The feature module (`build/modules/feature/`) manages flag resolution at build time.
## Related Resources
- [Chromium Build Instructions](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md)
- [ungoogled-chromium](https://github.com/ungoogled-software/ungoogled-chromium) — upstream privacy patches
- [BrowserOS Agent Platform](../browseros-agent/) — the TypeScript/Go agent system that runs inside the browser