@browseros-ai/agent-sdk

Browser automation SDK for BrowserOS — navigate, interact, extract data, and verify page state using natural language.

Installation

npm install @browseros-ai/agent-sdk
# or
bun add @browseros-ai/agent-sdk

Quick Start

import { Agent } from '@browseros-ai/agent-sdk'
import { z } from 'zod'

const agent = new Agent({
  url: 'http://localhost:3000',
  llm: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Navigate to a page
await agent.nav('https://example.com')

// Perform actions with natural language
await agent.act('click the login button')

// Extract structured data
const { data } = await agent.extract('get all product names and prices', {
  schema: z.array(z.object({
    name: z.string(),
    price: z.number(),
  })),
})

// Verify page state
const { success, reason } = await agent.verify('user is logged in')

API Reference

new Agent(options)

Create a new agent instance.

const agent = new Agent({
  url: string,           // BrowserOS server URL
  llm?: LLMConfig,       // Optional LLM configuration
  onProgress?: (event) => void,  // Progress callback
})

agent.nav(url, options?)

Navigate to a URL.

const { success } = await agent.nav('https://google.com')

agent.act(instruction, options?)

Perform browser actions using natural language.

// Simple action
await agent.act('click the submit button')

// With context interpolation
await agent.act('search for {{query}}', {
  context: { query: 'browseros' },
})

// Multi-step with limit
await agent.act('fill out the form and submit', {
  maxSteps: 15,
})
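
The `{{query}}` placeholder in the second example is filled from the `context` object. How the SDK performs that substitution is not described here, so the following is only a minimal sketch of one plausible approach. The `interpolate` helper and the `Context` type are hypothetical and are not exports of the SDK.

```typescript
// Hypothetical helper, not part of @browseros-ai/agent-sdk: shows one way
// `{{name}}` placeholders could be filled from a context object.
type Context = Record<string, string | number | boolean>

function interpolate(template: string, context: Context): string {
  // Match {{ name }}, allowing optional whitespace inside the braces.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    // Leave unknown placeholders untouched so missing keys stay visible.
    return Object.prototype.hasOwnProperty.call(context, key)
      ? String(context[key])
      : match
  })
}

console.log(interpolate('search for {{query}}', { query: 'browseros' }))
// prints "search for browseros"
```

Leaving unmatched placeholders in place is a design choice for this sketch: a typo in a context key then shows up in the instruction text instead of silently becoming an empty string.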

agent.extract(instruction, options)

Extract structured data from the page.

import { z } from 'zod'

const { data } = await agent.extract('get the page title', {
  schema: z.object({ title: z.string() }),
})

agent.verify(expectation, options?)

Verify the current page state.

const { success, reason } = await agent.verify('the form was submitted successfully')
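
Page state often settles a moment after an action, so a single `verify` call can fail on a page that is about to become correct. The sketch below re-checks an expectation a few times before giving up. `verifyWithRetry` and the `Verifier` interface are hypothetical; the `{ success, reason }` result shape follows the example above, and treating `reason` as an optional string is an assumption.

```typescript
// Structural stand-in for the part of Agent used here. The result shape
// ({ success, reason }) follows the README; the exact types are assumed.
interface Verifier {
  verify(expectation: string): Promise<{ success: boolean; reason?: string }>
}

// Hypothetical helper: re-check an expectation until it holds or the
// attempts run out. Returns the last result plus the number of tries made.
async function verifyWithRetry(
  agent: Verifier,
  expectation: string,
  attempts = 3,
  delayMs = 500,
): Promise<{ success: boolean; reason?: string; tries: number }> {
  let last: { success: boolean; reason?: string } = {
    success: false,
    reason: 'not attempted',
  }
  for (let tries = 1; tries <= attempts; tries++) {
    last = await agent.verify(expectation)
    if (last.success) return { ...last, tries }
    // Wait before the next check, but not after the final one.
    if (tries < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    }
  }
  return { ...last, tries: Math.max(attempts, 0) }
}
```

Because the helper only depends on a `verify` method, an `Agent` instance can be passed directly, for example `await verifyWithRetry(agent, 'user is logged in')`.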

LLM Providers

Supported providers:

| Provider | Config |
| --- | --- |
| OpenAI | { provider: 'openai', apiKey: '...' } |
| Anthropic | { provider: 'anthropic', apiKey: '...' } |
| Google | { provider: 'google', apiKey: '...' } |
| Azure | { provider: 'azure', apiKey: '...', resourceName: '...' } |
| OpenRouter | { provider: 'openrouter', apiKey: '...' } |
| Ollama | { provider: 'ollama', baseUrl: 'http://localhost:11434' } |
| LM Studio | { provider: 'lmstudio', baseUrl: 'http://localhost:1234' } |
| AWS Bedrock | { provider: 'bedrock', region: '...', accessKeyId: '...' } |
| OpenAI Compatible | { provider: 'openai-compatible', baseUrl: '...', apiKey: '...' } |
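
The provider list above can be read as a discriminated union keyed on `provider`. The sketch below spells that out using only the fields shown in the list. `LLMConfigSketch` and `isLocalProvider` are hypothetical names; the SDK's actual `LLMConfig` type may require or accept additional fields.

```typescript
// Assumed shapes derived only from the provider list above. The SDK's real
// LLMConfig type may include more fields than are shown here.
type LLMConfigSketch =
  | { provider: 'openai' | 'anthropic' | 'google' | 'openrouter'; apiKey: string }
  | { provider: 'azure'; apiKey: string; resourceName: string }
  | { provider: 'ollama' | 'lmstudio'; baseUrl: string }
  | { provider: 'bedrock'; region: string; accessKeyId: string }
  | { provider: 'openai-compatible'; baseUrl: string; apiKey: string }

// Hypothetical helper: true for the providers that the list configures with
// a local baseUrl and no API key.
function isLocalProvider(config: LLMConfigSketch): boolean {
  return config.provider === 'ollama' || config.provider === 'lmstudio'
}

const local: LLMConfigSketch = { provider: 'ollama', baseUrl: 'http://localhost:11434' }
const hosted: LLMConfigSketch = { provider: 'openai', apiKey: 'placeholder-key' }

console.log(isLocalProvider(local), isLocalProvider(hosted))
// prints "true false"
```

With a union like this, the compiler rejects a config that mixes fields from different providers, such as an `ollama` entry with no `baseUrl`.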

Progress Events

Track agent operations:

const agent = new Agent({
  url: 'http://localhost:3000',
  onProgress: (event) => {
    console.log(`[${event.type}] ${event.message}`)
  },
})

Event types: nav, act, extract, verify, error, done
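
For anything beyond console logging, it can help to collect events and count them by type. The following is a minimal sketch; `createProgressLog` and `ProgressEventSketch` are hypothetical names, and the event shape is assumed to contain only the `type` and `message` fields used in the example above.

```typescript
// Assumed event shape: the README only shows `type` and `message`.
type ProgressEventSketch = { type: string; message: string }

// Hypothetical helper: builds an onProgress callback that records every
// event in order and keeps a running count per event type.
function createProgressLog() {
  const events: ProgressEventSketch[] = []
  const counts = new Map<string, number>()
  const onProgress = (event: ProgressEventSketch): void => {
    events.push(event)
    counts.set(event.type, (counts.get(event.type) ?? 0) + 1)
  }
  return { events, counts, onProgress }
}

const log = createProgressLog()
log.onProgress({ type: 'nav', message: 'navigating' })
log.onProgress({ type: 'act', message: 'clicking' })
log.onProgress({ type: 'act', message: 'typing' })
console.log(log.counts.get('act'))
// prints 2
```

The returned `onProgress` function would be passed as the `onProgress` option when constructing the agent, after which `log.events` holds the full history of the run.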

Error Handling

import {
  NavigationError,
  ActionError,
  ExtractionError,
  VerificationError,
  ConnectionError
} from '@browseros-ai/agent-sdk'

try {
  await agent.act('click non-existent button')
} catch (error) {
  if (error instanceof ActionError) {
    console.error('Action failed:', error.message)
  }
}
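
Because each failure mode has its own error class, a caller can retry on one class while letting every other error propagate. The sketch below shows that pattern. `retryOn` is a hypothetical helper, not part of the SDK, and it takes the error class as a parameter so it works with any of the classes imported above.

```typescript
// Hypothetical helper: retry an async operation only when it fails with the
// given error class (for example ActionError from the SDK).
async function retryOn<T>(
  errorClass: new (...args: any[]) => Error,
  attempts: number,
  operation: () => Promise<T>,
): Promise<T> {
  if (attempts < 1) throw new RangeError('attempts must be at least 1')
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation()
    } catch (error) {
      // Anything other than the expected class is rethrown immediately.
      if (!(error instanceof errorClass)) throw error
      lastError = error
    }
  }
  // Every attempt failed with the expected class; surface the last failure.
  throw lastError
}
```

A call might look like `await retryOn(ActionError, 3, () => agent.act('click the submit button'))`. Retrying is only safe for actions that can be repeated without side effects, so a form submission is a poor candidate.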

License

AGPL-3.0-or-later