BrowserOS/packages/browseros-agent/packages/agent-sdk
Dani Akash b3003542d8 docs: overhaul READMEs across all major packages (#594)
2026-03-27 11:59:04 +05:30

@browseros-ai/agent-sdk

Browser automation SDK for BrowserOS — navigate, interact, extract data, and verify page state using natural language.

Build automations that describe what to do, not how to do it. The SDK connects to a running BrowserOS instance and translates natural language instructions into browser actions using your choice of LLM provider.

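Prerequisites

A running BrowserOS instance that the SDK can reach; the examples below assume its server listens at http://localhost:9100. Hosted LLM providers also need an API key for the provider you configure.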

Installation

npm install @browseros-ai/agent-sdk
# or
bun add @browseros-ai/agent-sdk

Quick Start

import { Agent } from '@browseros-ai/agent-sdk'
import { z } from 'zod'

const agent = new Agent({
  url: 'http://localhost:9100',
  llm: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Navigate to a page
await agent.nav('https://example.com')

// Perform actions with natural language
await agent.act('click the login button')

// Extract structured data
const { data } = await agent.extract('get all product names and prices', {
  schema: z.array(z.object({
    name: z.string(),
    price: z.number(),
  })),
})

// Verify page state
const { success, reason } = await agent.verify('user is logged in')

Multi-Step Example

Combine navigation, actions, extraction, and verification for end-to-end automation:

import { Agent } from '@browseros-ai/agent-sdk'
import { z } from 'zod'

const agent = new Agent({
  url: 'http://localhost:9100',
  llm: { provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
})

// 1. Navigate
await agent.nav('https://news.ycombinator.com')

// 2. Extract data
const { data: stories } = await agent.extract('get the top 5 stories with title, points, and link', {
  schema: z.array(z.object({
    title: z.string(),
    points: z.number(),
    link: z.string(),
  })),
})

// 3. Act on extracted data
await agent.act(`click on the story titled "${stories[0].title}"`)

// 4. Verify the result
const { success } = await agent.verify('the story page or external link has loaded')

console.log({ stories, navigationSuccess: success })
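Step 3 reads `stories[0]` directly, so an empty extraction result would surface as an unhelpful `TypeError`. The sketch below shows one way to make that case explicit. `requireFirst` is a hypothetical helper written for this example, not an SDK export, and the sample array only stands in for the `stories` value from step 2.

```typescript
// Hypothetical helper (not part of the SDK): return the first element of an
// extraction result, or fail with a descriptive error when the list is empty.
function requireFirst<T>(items: T[], label: string): T {
  const head = items[0]
  if (items.length === 0 || head === undefined) {
    throw new Error(`extract returned no ${label}; nothing to act on`)
  }
  return head
}

// Stand-in for the `stories` array produced in step 2 (sample values only).
const sampleStories = [
  { title: 'Example story', points: 42, link: 'https://example.com' },
]

const firstStory = requireFirst(sampleStories, 'stories')
```

In the example above, step 3 could then build its instruction from `requireFirst(stories, 'stories').title` instead of `stories[0].title`.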

API Reference

new Agent(options)

Create a new agent instance.

const agent = new Agent({
  url: string,           // BrowserOS server URL
  llm?: LLMConfig,       // Optional LLM configuration
  onProgress?: (event) => void,  // Progress callback
})

agent.nav(url, options?)

Navigate to a URL.

const { success } = await agent.nav('https://google.com')

agent.act(instruction, options?)

Perform browser actions using natural language.

// Simple action
await agent.act('click the submit button')

// With context interpolation
await agent.act('search for {{query}}', {
  context: { query: 'browseros' },
})

// Multi-step with limit
await agent.act('fill out the form and submit', {
  maxSteps: 15,
})
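The `{{query}}` placeholder in the second call is filled from `context`. How the SDK resolves placeholders internally is not described here, so the following is only an illustration of the substitution idea. `renderInstruction` is a hypothetical helper, and leaving unknown keys untouched is an assumption of this sketch, not documented SDK behavior.

```typescript
// Hypothetical helper (not an SDK export): replace {{key}} placeholders with
// values from a context object. Unknown keys are left as-is so that a missing
// value stays visible in the rendered text instead of being dropped.
function renderInstruction(
  template: string,
  context: Record<string, string | number>,
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    return Object.prototype.hasOwnProperty.call(context, key)
      ? String(context[key])
      : match
  })
}

const rendered = renderInstruction('search for {{query}}', { query: 'browseros' })
```

Rendering the instruction yourself can be useful for logging exactly what was asked of the agent, independent of how the SDK handles `context`.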

agent.extract(instruction, options)

Extract structured data from the page.

import { z } from 'zod'

const { data } = await agent.extract('get the page title', {
  schema: z.object({ title: z.string() }),
})

agent.verify(expectation, options?)

Verify the current page state.

const { success, reason } = await agent.verify('the form was submitted successfully')
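Because `verify` reports its outcome through `success` and `reason`, a script that should stop on an unmet expectation has to check the result itself. A minimal sketch, assuming the result has the `{ success, reason }` shape destructured above; `VerifyOutcome` and `assertVerified` are hypothetical names introduced for this example.

```typescript
// Assumed result shape, based on the destructuring shown above.
interface VerifyOutcome {
  success: boolean
  reason?: string
}

// Hypothetical helper: turn a failed verification into a thrown error so
// that a multi-step script stops at the first unmet expectation.
function assertVerified(outcome: VerifyOutcome, expectation: string): void {
  if (!outcome.success) {
    const detail = outcome.reason ? `: ${outcome.reason}` : ''
    throw new Error(`Verification failed for "${expectation}"${detail}`)
  }
}
```

A call site might pass the awaited result of `agent.verify(...)` together with the same expectation string, so the error message names what was being checked.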

LLM Providers

| Provider | Config |
| --- | --- |
| OpenAI | { provider: 'openai', apiKey: '...' } |
| Anthropic | { provider: 'anthropic', apiKey: '...' } |
| Google | { provider: 'google', apiKey: '...' } |
| Azure | { provider: 'azure', apiKey: '...', resourceName: '...' } |
| OpenRouter | { provider: 'openrouter', apiKey: '...' } |
| Ollama | { provider: 'ollama', baseUrl: 'http://localhost:11434' } |
| LM Studio | { provider: 'lmstudio', baseUrl: 'http://localhost:1234' } |
| AWS Bedrock | { provider: 'bedrock', region: '...', accessKeyId: '...' } |
| OpenAI Compatible | { provider: 'openai-compatible', baseUrl: '...', apiKey: '...' } |
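The configs listed above differ only in a few fields, so choosing one at startup can be a small function. The sketch below covers three of the listed providers and picks between them based on which API key is present. `ProviderChoice`, `EnvLike`, and `chooseProvider` are hypothetical names, the priority order is an arbitrary choice for illustration, and whether the returned object satisfies the SDK's own `LLMConfig` type is an assumption.

```typescript
// Environment-like input, so the function can be exercised without a real
// process environment.
type EnvLike = Record<string, string | undefined>

// Field names are copied from the provider list above; the real configs may
// accept more options than are shown there.
type ProviderChoice =
  | { provider: 'openai' | 'anthropic'; apiKey: string }
  | { provider: 'ollama'; baseUrl: string }

function chooseProvider(env: EnvLike): ProviderChoice {
  const openaiKey = env['OPENAI_API_KEY']
  if (openaiKey) return { provider: 'openai', apiKey: openaiKey }

  const anthropicKey = env['ANTHROPIC_API_KEY']
  if (anthropicKey) return { provider: 'anthropic', apiKey: anthropicKey }

  // No hosted key found: fall back to the local Ollama endpoint listed above.
  return { provider: 'ollama', baseUrl: 'http://localhost:11434' }
}
```

In a Node or Bun script the argument would typically be `process.env`, and the result would be passed as the `llm` option of `new Agent(...)`.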

Progress Events

Track agent operations in real time:

const agent = new Agent({
  url: 'http://localhost:9100',
  onProgress: (event) => {
    console.log(`[${event.type}] ${event.message}`)
  },
})

Event types: nav, act, extract, verify, error, done
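For anything beyond console logging, it can help to collect events and summarize them afterwards. The sketch below assumes an event carries only the `type` and `message` fields used in the callback above; `ProgressEventSketch` and `createProgressRecorder` are hypothetical names, and real events may include more fields.

```typescript
// Assumed event shape: `type` and `message` are the two fields used in the
// callback above; the union lists the event types named in this section.
type ProgressType = 'nav' | 'act' | 'extract' | 'verify' | 'error' | 'done'

interface ProgressEventSketch {
  type: ProgressType
  message: string
}

// Hypothetical recorder: stores every event and offers a per-type count and
// formatted lines, e.g. for a run summary or for assertions in tests.
function createProgressRecorder() {
  const recorded: ProgressEventSketch[] = []
  return {
    onProgress(evt: ProgressEventSketch): void {
      recorded.push(evt)
    },
    count(type: ProgressType): number {
      let total = 0
      for (const evt of recorded) {
        if (evt.type === type) total++
      }
      return total
    },
    lines(): string[] {
      return recorded.map((evt) => `[${evt.type}] ${evt.message}`)
    },
  }
}
```

The recorder's `onProgress` does not rely on `this`, so it can be handed to the `onProgress` option as is.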

Error Handling

import {
  NavigationError,
  ActionError,
  ExtractionError,
  VerificationError,
  ConnectionError
} from '@browseros-ai/agent-sdk'

try {
  await agent.act('click non-existent button')
} catch (error) {
  if (error instanceof ActionError) {
    console.error('Action failed:', error.message)
  } else {
    // Not an action failure: rethrow so other errors are not silently swallowed
    throw error
  }
}
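The `instanceof` check in the try/catch above can also drive a retry decision. The following sketch keeps that decision in a pure function so it can be tested without a browser. `RetryPolicy` and `shouldRetry` are hypothetical names, and which SDK error classes are worth retrying (for example `ConnectionError`) is an assumption you would need to confirm for your setup.

```typescript
// Any error constructor, e.g. one of the SDK's error classes or a built-in.
type ErrorClass = new (...args: any[]) => Error

interface RetryPolicy {
  retryOn: ErrorClass[] // error classes treated as transient
  maxAttempts: number   // total attempts allowed, including the first
}

// Hypothetical helper: decide whether a failed attempt should be retried.
// `attempt` is 1 for the first try.
function shouldRetry(error: unknown, attempt: number, policy: RetryPolicy): boolean {
  if (attempt >= policy.maxAttempts) return false
  for (const cls of policy.retryOn) {
    if (error instanceof cls) return true
  }
  return false
}
```

A caller would loop over attempts, run the agent call inside `try`, and rethrow from `catch` whenever `shouldRetry` returns `false`.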

License

AGPL-3.0-or-later