Commit Graph

2221 Commits

Author SHA1 Message Date
shivammittal274
65547c60c0 fix(eval): clean up eval configs and add test-clado-api script (#540)
Consolidate 13 configs down to 7 with uniform settings:
- 3 weekly (CI): browseros-agent, browseros-oe-agent, browseros-oe-clado
- 4 test (local): test_gemini-computer-use, test_yutori-navigator, test_webvoyager, test_mind2web
- All configs: headless=false, captcha block, full browseros ports, restart_server_per_task

Deleted: debug-test, mind2web-test, tool-loop-test, orchestrator-executor-test,
orchestrator-executor-clado-test, fireworks-minimax-m2, webvoyager-test

Added: test-clado-api.ts script, browseros-oe-agent-weekly.json (OE with AI SDK executor)
2026-03-24 01:28:05 +05:30
shivammittal274
0babc05077 feat(eval): NopeCHA CAPTCHA solver integration (#537)
* feat(eval): show mean score instead of pass/fail in report and viewer

* feat(eval): integrate NopeCHA CAPTCHA solver into eval pipeline

Add CAPTCHA detection and waiting so screenshots capture post-solve state.
Run headed with xvfb on CI since headless breaks extension content scripts.

- Add CaptchaWaiter module (detect reCAPTCHA/hCaptcha/Turnstile, poll until solved)
- Add optional `captcha` config block to EvalConfigSchema
- Wait for CAPTCHA solve before screenshot in single-agent and orchestrator-executor
- Patch NopeCHA manifest with API key before launching workers
- Fix CAPTCHA_EXT_DIR path (was pointing one level too high)
- Remove --incognito (extensions don't run in incognito; fresh user-data-dir isolates)
- CI: install xvfb, run headed via xvfb-run, pass NOPECHA_API_KEY secret
2026-03-24 00:14:16 +05:30
Nikhil
1270b5b55c feat: new manifest perms (#536)
* feat: new manifest perms

* fix: minor

* fix: minor
2026-03-23 09:31:07 -07:00
Nikhil
e97d8bc1cb fix: remove daily rate-limit middleware (#535)
* fix: remove daily rate-limit middleware

The daily conversation rate limit is no longer needed. Remove the
middleware, RateLimiter class, fetch-config, error type, shared
constants, DB schema table, and integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unused getDb() method

No longer needed after rate-limiter removal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 08:31:20 -07:00
Dani Akash
5109ca4347 feat: added scope in server error logs (#533)
* feat: added scope in server error logs

* fix: prevent double capture on chat request
2026-03-23 20:47:28 +05:30
shivammittal274
f14942c6f9 feat(eval): show mean score instead of pass/fail in report and viewer (#534) 2026-03-23 20:28:34 +05:30
Dani Akash
86ec88ed80 feat: sentry improvements (#532)
* feat: process request record from sentry locally

* feat: added analytics for logged in users
2026-03-23 19:45:28 +05:30
Dani Akash
4928b7e84b fix: no current window and sentry context (#531)
* fix: error reporting and better breadcrumbs

* fix: lint issues
2026-03-23 18:46:39 +05:30
shivammittal274
94a1a701f6 fix(eval): include browser context in agent prompt (#530)
The eval's single-agent was passing raw task.query as the prompt,
without browser context (active tab URL, title). The agent didn't
know which page it was on, causing it to ask "which website?" instead
of browsing.

Use formatUserMessage() (same as chat-service.ts) to include browser
context in the prompt. Re-export formatUserMessage from agent/tool-loop.
2026-03-23 17:42:03 +05:30
Dani Akash
ecf2efa857 fix: add unlimited storage permission to agent (#529) 2026-03-23 17:36:26 +05:30
shivammittal274
026c6a03a3 feat(eval): auto-trigger eval on agent/tools changes pushed to main (#528) 2026-03-23 16:52:30 +05:30
Nikhil
2b53daf641 fix: prevent deleted scheduled tasks from reappearing after sync (#518)
* fix: prevent deleted scheduled tasks from reappearing after sync

When a scheduled task was deleted, the sync function would see the
remote job missing locally and re-add it, undoing the delete. Fix by
tracking pending deletions in storage so the sync function deletes
them from the backend instead of re-adding them locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use read-modify-write for pending deletions to prevent concurrent clobber

Re-read pendingDeletionStorage before write-back and only remove
resolved IDs, preserving any new entries added by concurrent
removeJob calls during the sync's network I/O.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:31:57 -07:00
Nikhil
3cc946ded8 fix(ci): report test pass/fail status on PRs (#520)
The test workflow captured exit codes but never failed the job, so PR
checks always showed green even when tests failed. Exit with the
captured code in the summarize step so each suite properly reports
pass/fail. Not a required check, so failures remain non-blocking.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:31:23 -07:00
shivammittal274
70be5c5c21 fix(eval): log agent errors in task progress for CI visibility (#523) 2026-03-21 23:33:19 +05:30
shivammittal274
0f9d93058f chore(eval): remove unused env vars from workflow (OPENROUTER, OPENAI) (#522) 2026-03-21 23:22:03 +05:30
shivammittal274
cafed57832 fix(eval): use CLAUDE_CODE_OAUTH_TOKEN for performance grader auth (#521) 2026-03-21 23:14:23 +05:30
shivammittal274
f157436e7d feat(eval): switch to Linux GitHub-hosted runner (#519)
* feat(eval): switch to ubuntu-latest runner, add OE-Clado config

- Switch workflow from self-hosted Mac Studio to ubuntu-latest
- Install BrowserOS Linux .deb in CI (no self-hosted runner needed)
- Add browseros-oe-clado-weekly.json config for orchestrator-executor
- Fix report chart to show date+time (not just date)
- Make BROWSEROS_BINARY configurable via env var

* feat(eval): add NopeCHA captcha solver extension to eval runs

- Auto-load NopeCHA extension in eval Chrome instances
- Works in incognito + headless mode
- CI workflow downloads NopeCHA before eval
- extensions/ directory gitignored (downloaded at runtime)

* feat(eval): per-config concurrency — different configs run in parallel

* feat(eval): remove concurrency limit — all runs execute in parallel
2026-03-21 23:04:45 +05:30
Nikhil
ba7892322b ci: run BrowserOS test suites on PRs (#514)
* ci: run browseros tests on pull requests

* refactor: rework 0320-github_action_for_tests based on feedback

* refactor: rework 0320-github_action_for_tests based on feedback

* chore: add CI artifacts to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove mikepenz/action-junit-report to fix check suite misattribution

The JUnit report action creates check runs that GitHub associates with the
CLA check suite instead of the Tests check suite, causing test reports to
appear under "CLA Assistant" in the PR checks UI.

Remove the action and rely on job status + step summary + artifact upload
for test result visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:36 -07:00
shivammittal274
4e90b4561a feat(eval): weekly eval pipeline with R2 uploads and trend dashboard (#516)
* feat(eval): weekly eval pipeline with R2 uploads and trend dashboard

Add infrastructure for running weekly evaluations and tracking score
trends over time:

- Auto-generated output dirs: results/{config-name}/{timestamp}/
  Each eval run gets its own timestamped folder, nothing is overwritten.

- upload-run.ts: uploads eval results to Cloudflare R2. Supports
  uploading a specific run or all un-uploaded runs for a config.

- weekly-report.ts: generates an interactive HTML dashboard from R2
  data. Config dropdown, trend chart with hover tooltips, searchable
  runs table. Groups runs by config name.

- viewer.html: client-facing 3-column run viewer (task list,
  screenshots with autoplay, agent stream with messages.jsonl).
  Shows performance grader axis breakdown with per-axis scores.

- browseros-agent-weekly.json: weekly benchmark config (kimi-k2p5,
  webbench-2of4-50, 10 workers, performance grader, headless).

- eval-weekly.yml: GitHub Actions workflow with cron (Saturday 6am)
  and manual trigger. Runs on self-hosted Mac Studio runner.
  Concurrency group ensures only one eval runs at a time.

- Dashboard updates: load previous runs, messages.jsonl viewer,
  grade badges show percentages, async stream loading.

- Grader updates: timeout 30min, max turns 100, DOM content
  verification guidance for performance grader.

* fix(eval): address Greptile review — injection, nested dirs, escaping

- Fix script injection in eval-weekly.yml: pass github.event.inputs
  through env var instead of interpolating into shell
- Fix /api/runs to enumerate nested results/{config}/{timestamp}/ dirs
- Fix /api/load-run to allow single-slash run names (config/timestamp)
- Add HTML escaping for R2-sourced values in weekly-report.ts
- Escape axis names in viewer.html renderAxesBreakdown

* fix(eval): fix biome lint — non-null assertion, template literals

* fix(eval): fix biome errors — replace var with let, fix inner function declaration

* fix(eval): address Greptile P2 issues

- isRunDir: check all subdirs for metadata.json, not just first 3
- eval-runner: guard configPath for dashboard-driven runs (fallback to 'eval')
- load-run: default unknown termination_reason to 'failed' not 'completed'

* feat(eval): make BROWSEROS_BINARY configurable via env var
2026-03-21 22:12:52 +05:30
shivammittal274
86eed82350 fix: lazy OAuth callback server with cancel+retry (Codex CLI pattern) (#515)
The OAuth callback server on port 1455 was bound eagerly at startup,
crashing the server if another BrowserOS instance was already running.

Rewrite as a lazy class (OAuthCallbackServer) that:
- Only binds port 1455 when the user initiates a ChatGPT Pro login
- Sends GET /cancel to any existing server on the port first, then
  retries up to 5 times (follows Codex CLI's cancel+retry pattern)
- Exposes /cancel endpoint so other instances/tools can cancel us
- Releases the port after the OAuth callback arrives
- Device-code providers (GitHub Copilot, Qwen) never touch port 1455

This allows running eval, dev instances, and multiple BrowserOS
instances without port conflicts. OAuth login works on whichever
instance initiates it — the others continue without OAuth.
2026-03-21 16:44:03 +05:30
Nikhil
be6ed22af4 test: fix BrowserOS tool test harness regressions (#513)
* test: fix browseros tool test harness regressions

* test: align working directory naming in page action tests
v0.44.0.1
2026-03-20 12:05:39 -07:00
Nikhil
149cde118d chore: bump server version, offset and patch for release (#512) 2026-03-20 11:45:12 -07:00
Nikhil
9bc5e666c4 feat: auto-discover server port via ~/.browseros/server.json (#504)
* feat: auto-discover server port via ~/.browseros/server.json

Server writes its port to ~/.browseros/server.json on startup so the CLI
can auto-discover the server URL without requiring `browseros-cli init`.

Discovery chain: BROWSEROS_URL env > config.yaml > server.json > error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback for PR #504

- Use synchronous unlinkSync in stop() since process.exit() fires
  immediately after, abandoning any pending async operations
- Wrap writeServerConfig in try/catch so a write failure doesn't crash
  a healthy server for a convenience feature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: type server discovery config and add version metadata

Add ServerDiscoveryConfig interface to @browseros/shared and enrich
server.json with server_version, browseros_version, and chromium_version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: normalize URL from server.json for consistency

All other URL sources (env var, config.yaml) pass through
normalizeServerURL; apply the same to the server.json path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:37:00 -07:00
Nikhil
2271277b4d feat: add voice input to new tab search bar (#509)
* feat: add voice recording UI with waveform overlay to new tab search bar

Add a microphone button to the NewTab search bar that opens a fullscreen
recording overlay powered by react-voice-visualizer. The overlay shows a
real-time waveform visualization during recording, recording time, and a
stop button. On completion, the audio is transcribed via the existing
gateway endpoint and the transcript auto-navigates to inline chat.

Changes:
- Extract transcribeAudio() to shared lib/voice/transcribe-audio.ts
- Add VoiceRecordingOverlay component with react-voice-visualizer
- Add Mic button to NewTab search bar
- Track analytics via existing NEWTAB_VOICE_* events
- Handle cancel (backdrop click) vs submit (stop button) correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review comments for voice recording overlay

- Reset processingRef on transcription error to prevent stuck state
- Use stable callback refs to prevent useEffect re-runs from inline
  arrow function props (fixes timer reset and unnecessary re-processing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace voice overlay with inline sidepanel-style voice UI

Remove react-voice-visualizer dependency and VoiceRecordingOverlay.
Instead use the same inline voice pattern as the sidepanel ChatInput:
- Waveform bars replace the search input during recording
- Mic/stop/loading button states in the search bar
- Transcript populates the search input on completion
- Voice error shown inline below the search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:33:01 -07:00
Nikhil
f865d301a2 test: add build smoke test to catch compile failures (#511)
* test: add build smoke test to catch compile failures

Compiles the server binary (darwin-arm64) and verifies --version outputs
the correct version from package.json. Uses an empty resource manifest
and stub env vars so the test runs without R2 access or real secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback for PR #511

- Derive build target from process.platform/arch for CI portability
- Include binary stderr in --version assertion for better diagnostics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:16:57 -07:00
Nikhil
6f398f0b36 fix: replace sharp with jimp to fix compiled binary crash (#510)
sharp is a native C module (libvips) whose .node binaries can't be
embedded in Bun compiled executables. It was imported at the top level
in copilot-fetch.ts, crashing the entire server at startup.

Replace with jimp (pure JavaScript, zero native deps) which bundles
cleanly into compiled binaries. Same resize algorithm preserved.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:06:05 -07:00
shivammittal274
8548bcf50a feat: credit-based tracking for BrowserOS provider (#489)
* feat: add credit-based tracking for BrowserOS provider

Send X-BrowserOS-ID header on all LLM requests through the BrowserOS
gateway for per-installation credit tracking. Handle 429 CREDITS_EXHAUSTED
as non-retryable. Add GET/PUT /credits endpoints to check and manage
credit balance.

* docs: add credits tracking UI design

Design for showing credit balance in side panel chat header (color-coded
badge) and a dedicated Usage & Billing settings page. Credits refresh
after each completed message turn or on exhaustion error.

* docs: add credits tracking UI implementation plan

8-task plan covering useCredits hook, CreditBadge component, ChatHeader
integration, message completion refresh, ChatError CREDITS_EXHAUSTED
handling, Usage & Billing settings page, and route/sidebar registration.

* feat: add useCredits React Query hook

* feat: add CreditBadge component with color thresholds

* feat: show credit badge in chat header for BrowserOS provider

* feat: refresh credits after chat message completion and on error

* feat: handle CREDITS_EXHAUSTED error in chat

* feat: add Usage & Billing settings page

* feat: register usage page route and sidebar entry

* fix: lint and formatting fixes for credit tracking UI

* fix: separate credits exhausted from Kimi rate limit in ChatError, redesign Usage page

* chore: remove PUT /credits endpoint and setCredits function

* fix: extract shared credit colors, add error state to UsagePage, use dailyLimit from gateway

* fix: make dailyLimit required in CreditsInfo (gateway always returns it)

* feat: gate credits UI behind CREDITS_SUPPORT feature flag (server >= 0.0.78)
2026-03-20 22:49:00 +05:30
shivammittal274
e3601bfdc1 feat: gate Qwen Code behind server version 0.0.77 (#508) v0.44.0 2026-03-20 20:07:39 +05:30
Dani Akash
2b4fdf1aad feat: improved multi tab agent workflow (#507)
* feat: updated multitab workflow

* fix: updated prompt with fix for test cases

* fix: active agent glow

* fix: review comments
2026-03-20 18:31:36 +05:30
shivammittal274
11d15d079f feat: alibaba qwen oauth (#506)
* feat: add Qwen Code as OAuth LLM provider with refactored OAuth hooks

Add Alibaba Qwen Code as a third OAuth provider using Device Code flow
with PKCE. Free tier: 2,000 requests/day, up to 1M token context.

Refactoring:
- Extract useOAuthProviderFlow hook (eliminates ~180 lines of duplicated
  OAuth logic from AISettingsPage for ChatGPT Pro + Copilot + Qwen)
- Extract resolveOAuthConfig in config.ts (shared resolver for all OAuth
  providers, parameterized by provider name, default model, refresh flag)
- Generalize token-manager device code flow to support PKCE
  (code_challenge/code_verifier) and form-urlencoded content type

New code:
- Qwen Code provider config with PKCE + form encoding flags
- Provider factories (both provider.ts and provider-factory.ts)
- Extension UI (template card, models, analytics, dialog)

* fix: use portal.qwen.ai as API base URL for OAuth tokens

DashScope (dashscope.aliyuncs.com) expects Alibaba Cloud API keys,
not OAuth tokens from chat.qwen.ai. The correct endpoint for OAuth
Bearer tokens is portal.qwen.ai/v1.

* fix: correct Qwen Code model IDs and context windows

- coder-model (1M context): virtual alias that routes to best model
- qwen3-coder-plus (1M): was incorrectly 131K
- qwen3-coder-flash (1M): new, speed-optimized variant
- qwen3.5-plus (1M): was incorrectly 1048576 (power-of-two vs decimal)
- Removed qwen3-coder-next (local/self-hosted, not available via OAuth)
- Default model changed to coder-model (auto-routes server-side)

* fix: move Qwen device code request to extension (bypasses WAF)

Alibaba WAF blocks server-side requests to chat.qwen.ai. Move the
initial device code request to the extension (browser context with
cookies), then hand off the deviceCode + codeVerifier to the server
for background polling via new POST /oauth/:provider/poll endpoint.

* fix: persist OAuth flow-started flag in sessionStorage

The flowStartedRef was lost when the component remounted (e.g. user
navigated to onboarding then back to settings). Use sessionStorage
to persist the flag so auto-create works after navigation.

* revert: remove sessionStorage for OAuth flow flag

Revert to simple useRef pattern matching the original ChatGPT Pro
implementation. The auto-create works when the user stays on the
AI settings page during auth.

* revert: move Qwen back to server-side device code flow

WAF block was temporary (rate-limiting), not permanent. Server-side
fetch to chat.qwen.ai now works. Reverted client-side device code
approach — Qwen now uses the same clean server-side flow as Copilot.

Removed: clientSideDeviceCode config, startClientSideDeviceCode(),
POST /oauth/:provider/poll endpoint, startDeviceCodePolling().

* feat: add WAF detection, rate-limit protection, and token storage endpoint

- Detect WAF captcha responses (HTML instead of JSON) in device code
  request and token polling, with user-friendly error messages
- Add 30s cooldown on "USE" button to prevent rapid clicks triggering WAF
- WAF-blocked poll requests silently retry instead of aborting
- Add POST /oauth/:provider/token endpoint for storing externally-provided
  tokens (useful for future fallback flows)
- Add storeTokens() method to OAuthTokenManager
- Pass server error messages through to extension toast notifications

* refactor: remove 30s cooldown, simplify OAuth hook

The hook is now identical for all providers — server handles retries
via activeDeviceFlows.delete(). Removed flowStartedAtRef cooldown
that was blocking legitimate retries.

* feat: client-side OAuth for Copilot and Qwen Code

Move device code OAuth flow to the extension for GitHub Copilot and
Qwen Code. The extension makes requests using Chrome's network stack,
which bypasses Alibaba WAF TLS fingerprint detection that blocks
server-side Bun/Node.js fetch.

New files:
- client-oauth.ts: Client-side device code + PKCE + token polling

Changes:
- useOAuthProviderFlow: handleClientAuth() for providers with clientAuth
  config, handleServerAuth() for others (ChatGPT Pro)
- AISettingsPage: clientAuth config for Copilot and Qwen Code
- WAF detection: opens provider site for captcha solving on block

Server-side device code flow preserved as fallback (token-manager.ts,
providers.ts). Token storage via POST /oauth/:provider/token endpoint.

* fix: export OAuthProviderFlowConfig type, fix typecheck errors

- Export OAuthProviderFlowConfig interface so AISettingsPage can use it
  instead of duplicating the type inline
- Fix string | null → string | undefined for agentServerUrl parameter
2026-03-20 17:46:48 +05:30
Nikhil
9257832acf feat: gate ChatGPT Pro and GitHub Copilot behind server version 0.0.77 (#503)
Add CHATGPT_PRO_SUPPORT and GITHUB_COPILOT_SUPPORT feature flags gated
on minServerVersion 0.0.77. Hide template cards and provider type
dropdown options when the server doesn't support the OAuth endpoints.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:43:09 -07:00
Nikhil
7bde0d59fa chore: bump chromium version (#502) 2026-03-19 16:22:13 -07:00
Nikhil
1c737b0f02 chore: bump server version (#501) 2026-03-19 16:17:50 -07:00
Nikhil
5d0a2b9bfe feat: add model selector to newtab search bar (#499)
* feat: add model selector to newtab search bar

Add AI provider/model selector button to the newtab homepage footer bar,
matching the existing button aesthetics (Workspace, Tabs, Apps). Reuses
ChatProviderSelector popover from sidepanel. Users can now see and change
their AI provider before starting a conversation from the newtab page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clean up newtab footer with icon-only buttons

Reduce visual clutter in the search bar footer by converting Provider,
Workspace, and Tabs buttons to compact icon-only buttons (8x8). Text
labels and chevron indicators are removed — native title tooltips
provide discoverability on hover. Apps button on the right keeps its
text label per user preference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add hover-expand labels to newtab footer icon buttons

Replace static title tooltips with smooth hover-expand animation —
buttons show icon-only by default, text label slides out on hover
via max-w transition. Gives a clean compact look while keeping
labels discoverable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert workspace/tabs to full text, keep provider hover-expand only

Restore full text labels for Workspace and Tabs buttons. Only the
provider selector uses the compact icon + hover-expand pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify provider selector to plain icon button

Remove hover-expand animation, use a simple icon-only button with
native title tooltip for the provider selector.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:14:15 -07:00
shivammittal274
720baaed3e feat: add GitHub Copilot as OAuth LLM provider (#500)
* feat: add GitHub Copilot as OAuth-based LLM provider

Add GitHub Copilot as a second OAuth provider using the Device Code flow
(RFC 8628). Users authenticate via github.com/login/device, and the server
polls for token completion. Supports 25+ models through a single Copilot
subscription.

Key changes:
- Device Code OAuth flow in token manager (poll with safety margin)
- Custom fetch wrapper injecting Copilot headers + vision detection
- Provider factory using createOpenAICompatible for Chat Completions API
- Extension UI with template card, auto-create on auth, and disconnect

* fix: address PR review comments for GitHub Copilot OAuth

- Validate device code response for error fields (GitHub can return 200
  with error payload)
- Store empty refreshToken instead of access token for GitHub tokens
- Add closeButton to Toaster for dismissing device code toast

* fix: add github-copilot to agent provider factory

The chat route uses a separate provider-factory.ts (agent layer) from the
test-provider route (llm/provider.ts). Added createGitHubCopilotFactory
to the agent factory so chat works with GitHub Copilot.

* fix: add github-copilot to provider icons, models, and dialog

- Add Github icon from lucide-react to providerIcons map
- Add 8 Copilot models (GPT-4o, Claude, Gemini, Grok) to models.ts
- Add github-copilot to NewProviderDialog zod enum, validation skip,
  canTest check, and OAuth credential message

* fix: reorder copilot models with free-tier models first

Put models available on Copilot Free at the top (gpt-4o, gpt-4.1,
gpt-5-mini, claude-haiku-4.5, grok-code-fast-1), followed by
premium models that require paid Copilot subscription.

* fix: set correct 64K context window for Copilot models

Copilot API enforces a 64K input token limit regardless of the
underlying model's native context window. Updated all model entries
and the default template to 64000 so compaction triggers correctly.

* fix: use actual per-model prompt limits from Copilot /models API

Queried api.githubcopilot.com/models for real max_prompt_tokens values.
GPT-4o/4.1 have 64K, Claude/gpt-5-mini have 128K, GPT-5.x have 272K.
Also updated model list to match what's actually available on the API
(e.g. claude-sonnet-4.6 instead of 4.5, added gpt-5.4/5.2-codex).

* feat: resize images for Copilot using VS Code's algorithm

Large screenshots cause 413 errors on Copilot's API. Resize images
following VS Code's approach: max 2048px longest side, 768px shortest
side, re-encode as JPEG at 75% quality. Uses sharp for server-side
image processing.

* fix: address all Greptile P1 review comments

- Add .catch() on fire-and-forget pollDeviceCode to prevent unhandled
  rejection crashes (Node 15+)
- Add deduplication guard (activeDeviceFlows Set) to prevent concurrent
  device code flows for the same provider
- Add runtime validation of server response in frontend before calling
  window.open() and showing toast
- Remove dead GITHUB_DEVICE_VERIFICATION constant from urls.ts

* fix: upgrade biome to 2.4.8, fix all lint errors, and address review bugs

- Upgrade biome from 2.4.5 to 2.4.8 (matches CI) and migrate configs
- Fix image resize: only re-encode when dimensions actually change
- Fix device code polling: retry on transient network errors instead of aborting
- Allow restarting device code flow (clear old flow instead of throwing 500)
- Fix pre-existing noNonNullAssertion and noExplicitAny lint errors globally

* fix: address Greptile P2 review — image resize and config guard

- Fix early-return guard: check max/min sides against their respective
  limits (MAX_LONG_SIDE/MAX_SHORT_SIDE) instead of both against SHORT
- Preserve PNG alpha: detect hasAlpha and keep PNG format instead of
  unconditionally converting to lossy JPEG
- Keep browserosId guard in resolveGitHubCopilotConfig consistent with
  ChatGPT Pro pattern (safety check that caller context is valid)

* feat: update Copilot models to full list from pricing page, default to gpt-5-mini

Added all 23 models from GitHub Copilot pricing page. Ordered with
free-tier models first (gpt-5-mini, claude-haiku-4.5), then premium.
Changed default from gpt-4o to gpt-5-mini since it's unlimited on
Pro plan and has 128K context (vs gpt-4o's 64K limit).
2026-03-20 02:33:09 +05:30
shivammittal274
cee9c764b1 fix(skills): read-only view mode for built-in skills (#494)
* fix(skills): read-only view mode for built-in skills

- SkillCard shows Eye icon + "View" for built-in, Pencil + "Edit" for user
- SkillDialog in read-only mode: disabled fields, no toolbar on markdown
  editor, "View Skill" title, "Close" button, no "Update Skill"
- Hide tip section in read-only mode

* fix(skills): use react-markdown for read-only skill view

Replace MDXEditor with react-markdown for viewing built-in skills.
MDXEditor chokes on code fences, angle brackets, and image syntax
causing content truncation. react-markdown handles standard markdown
correctly with no rendering issues.
2026-03-19 23:48:51 +05:30
Nikhil
7bdeeb85d5 fix: revert: convert settings to popup dialog (#477) (#498)
* Revert "feat: convert settings to popup dialog (#477)"

This reverts commit 42aa0ff1ef.

* fix: address review feedback for PR #498

- Remove erroneous SETTINGS_PAGE_VIEWED_EVENT tracking from SidebarLayout
  (was firing on every non-settings page navigation)
- Fix mobile settings sidebar not closing on route change by merging
  setMobileOpen(false) into the pathname-dependent analytics useEffect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 11:13:14 -07:00
Dani Akash
19069cb9c4 fix: newtab layout (#497) 2026-03-19 20:40:38 +05:30
Dani Akash
5bb6143373 feat: display selected text from page in sidepanel (#496)
* feat: select text and pass to sidepanel

* fix: lint issues

* fix: persist selection across tabs

* fix: review comments

* fix: change when the selection is cleared

* feat: sanitize url
2026-03-19 20:21:31 +05:30
Dani Akash
f4d4b73a24 fix: improved memory tools (#495)
* fix: new prompt update tool

* fix: memory search tool

* fix: all review comments

* chore: remove dead code
2026-03-19 19:01:25 +05:30
Dani Akash
d965698905 fix: biome & tsc setup across repo (#493)
* fix: biome lint issues

* fix: code quality workflow

* fix: all lint issues

* chore: test lefthook pre-commit hook

* chore: test lefthook with agent file

* chore: revert test comment from lefthook verification

* feat: setup tsgo for typechecking agent

* fix: typecheck cli command

* fix: early return to prevent errors
2026-03-19 18:18:24 +05:30
shivammittal274
50b2f45590 fix(skills): UI section separation and fix find-alternatives rendering (#492)
* fix(skills): UI section separation and fix find-alternatives rendering

- Split skills page into "My Skills" (user) and "BrowserOS Skills" (built-in) sections
- Fix find-alternatives SKILL.md — replace angle bracket placeholders with curly
  braces to prevent MDXEditor from parsing them as JSX and rendering empty content

* fix(skills): bump find-alternatives to v1.1 for CDN sync
2026-03-19 17:38:28 +05:30
Dani Akash
1b88ade021 feat: updated homepage chat (#481)
* feat: updated chat ui from homepage

* fix: vertical scroll

* fix: horizontal scroll issue

* fix: lint issues

* fix: header width

* fix: message input from home to chat

* feat: created sidebar header support in new tab chat

* fix: remove history from new tab chat

* fix: remove the shared element transition

* fix: lint issues

* fix: review comments

* fix: defer the sendMessage callback

* fix: all code concerns

* fix: preserve state of chat on homepage

* fix: review comments
2026-03-19 15:24:05 +05:30
shivammittal274
079a254fa4 fix(skills): separate built-in and user skills into distinct directories (#487)
* fix(skills): separate built-in and user skills into distinct directories

- Move built-in skills to ~/.browseros/skills/builtin/, user skills stay in root
- Unify seed + sync into single syncBuiltinSkills() function, delete seed.ts
- Preserve user's enabled/disabled state during remote sync version updates
- Add catalog reconciliation — remove built-in skills dropped from remote catalog
- Fallback to bundled defaults per-skill when remote sync fails
- One-time migration moves existing default skills from root to builtin/
- Add builtIn field to SkillMeta, determined by directory (not metadata)
- UI shows "Built-in" badge, hides delete button for built-in skills
- Reject deletion of built-in skills in service layer
- Check both dirs for ID collision on skill creation

* fix(skills): address review — dedup by id, guard applyEnabled regex

- loader.ts: deduplication now keys on skill.id (directory slug) not
  skill.name (display name), preventing silent drops on name collision
- remote-sync.ts: applyEnabled checks if regex matched before writing,
  logs warning if remote content lacks an enabled field

* fix(skills): reconciliation preserves bundled defaults, delete returns 403

- reconcileRemovedSkills now keeps DEFAULT_SKILLS IDs in the safe set,
  preventing delete-then-reinstall cycle that lost enabled:false state
- DELETE /skills/:id returns 403 for built-in skills instead of 500

* refactor(skills): simplify syncBuiltinSkills to single clean pass

Build content map (bundled + remote), iterate once, preserve enabled,
reconcile deletions. Removes 7 helper functions, 70 lines of code.

* refactor(skills): extract syncOneSkill, patch content before writing

- syncBuiltinSkills is now 15 lines: build map, iterate, clean up
- syncOneSkill: flat, patches enabled state before writing (single write)
- setEnabled: pure function for content patching
- removeObsoleteSkills: extracted from inline block
2026-03-19 13:35:47 +05:30
Felarof
42aa0ff1ef feat: convert settings to popup dialog (#477)
* feat: convert settings page to popup dialog, move workflows to main nav

Replace the dedicated settings page layout (SettingsSidebarLayout) with a
modal dialog (SettingsDialog) that opens on top of the current page. Settings
are now accessible via a dialog triggered from the main sidebar, eliminating
the confusing dual-sidebar navigation pattern.

- Create SettingsDialog with tabbed left panel and content area
- Move Workflows into main sidebar navigation (feature-gated)
- Remove /settings/* routes (except /settings/survey)
- Delete SettingsSidebarLayout and SettingsSidebar components
- Update backward compatibility redirects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: setup new urls for the dialog box

* fix: dialog close button

* fix: settings analytics

* fix: address review comments

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Dani Akash <DaniAkash@users.noreply.github.com>
2026-03-18 23:26:13 +05:30
shivammittal274
4000f094f6 Feat/chatgpt pro polish (#484)
* fix: ChatGPT Pro UI polish — fix undefined display and add icon

- Fix "gpt-5.3-codex · undefined" — hide baseUrl when not set
- Add OpenAI icon for chatgpt-pro provider in icon map

* chore: rename ChatGPT Pro to ChatGPT Plus/Pro (supports both plans)

* chore: remove accidentally committed files
2026-03-18 22:51:22 +05:30
shivammittal274
151be81cee fix: ChatGPT Pro UI polish — fix undefined display and add icon (#483)
- Fix "gpt-5.3-codex · undefined" — hide baseUrl when not set
- Add OpenAI icon for chatgpt-pro provider in icon map
2026-03-18 22:23:28 +05:30
shivammittal274
46a8326140 feat: add ChatGPT Pro OAuth as LLM provider (#476)
* feat: add ChatGPT Pro OAuth as LLM provider

Adds OAuth 2.0 (Authorization Code + PKCE) flow so users can authenticate
with their ChatGPT Pro subscription to power BrowserOS's agent, matching
the pattern used by Codex CLI, OpenCode, and Pi.

Server:
- OAuth token lifecycle (PKCE, exchange, refresh, SQLite storage)
- Dedicated callback server on port 1455 (Codex client ID registration)
- Codex fetch wrapper routing API calls to chatgpt.com/backend-api
- Config resolution + provider factories for all code paths (chat, test, refine)

Extension:
- ChatGPT Pro template card with OAuth flow trigger
- Status polling hook + auto-create provider on auth success
- Model list with Codex-supported models (gpt-5.x-codex family)

* fix: address Greptile PR review comments

- Wire OAuth callback server stop handle into onShutdown (P1: port 1455 leak)
- Guard against missing refresh token + clear stale tokens on failed refresh (P1)
- Add logger.warn to silent catch in codex-fetch body mutation
- Document JWT trust assumption in parseAccessTokenClaims
- Source model ID from provider template instead of hard-coding

* simplify: remove unnecessary OAuth shutdown wiring and useCallback

- Revert OAuthHandle interface — callback server port releases on process exit
- Remove stopCallbackServer from shutdown flow (dead code)
- Remove all useCallback from useOAuthStatus per CLAUDE.md guidance

* style: add readonly modifiers and braces per TS style guide

* docs: add E2E test screenshots for ChatGPT Pro OAuth

* fix: strip item IDs from Codex requests to fix multi-turn conversations

* fix: preserve function_call_output IDs in Codex requests

* fix: resolve Codex store=false + tool-use incompatibility

- Pass providerOptions { openai: { store: false } } to ToolLoopAgent
  so the AI SDK inlines content instead of using item_reference
- Strip item IDs and previous_response_id in codex-fetch (safety net)
- Use .responses() model (Codex only speaks Responses API format)

* fix: remove non-Codex model gpt-5.2 from chatgpt-pro model list

* fix: strip unsupported Codex params and update model list

- Strip temperature, max_tokens, top_p from Codex requests (unsupported)
- Add all available Codex models including gpt-5.4, gpt-5.2, gpt-5.1

* chore: remove screenshots containing email

* feat: enable reasoning events for ChatGPT Pro Codex models

* chore: set reasoning effort to high for ChatGPT Pro

* feat: add configurable reasoning effort and summary for ChatGPT Pro

- Add reasoningEffort (none/low/medium/high) and reasoningSummary
  (auto/concise/detailed) dropdowns in the Edit Provider dialog
- Pass through extension → chat request → agent config → providerOptions
- Defaults: effort=high, summary=auto

* fix: strip max_output_tokens from Codex requests (fixes compaction)

* fix: address Greptile P1 issues

- Fix default model fallback: gpt-4o → gpt-5.3-codex (Codex endpoint)
- Clear stale tokens on refresh failure (prevents infinite retry loop)
- Only auto-create provider after explicit OAuth flow, not on page load
- Add catch block to auto-create effect with error toast
2026-03-18 22:07:43 +05:30
Dani Akash
4b18723a21 fix: undo shortcut in rewrite button (#472)
* fix: undo shortcut in rewrite button

* fix: address reviews
2026-03-18 07:04:48 +05:30
Nikhil
4909927c03 chore: bump PATCH and OFFSET (#479) 2026-03-17 17:41:45 -07:00