* fix(server): tighten CORS allowlist for the agent server
Replace the permissive `origin || '*'` reflection in
`defaultCorsConfig` with an explicit allowlist composed of:
- a static list (empty by default)
- comma-separated origins from `BROWSEROS_TRUSTED_ORIGINS`
Add a small `requireTrustedOrigin` middleware that actively
rejects (403) any request whose `Origin` header is present and
not in the allowlist. The middleware is permissive when the
`Origin` header is absent — CLI tools, internal Node clients,
and some service-worker fetches legitimately omit it; the
threat model only covers cross-origin browser fetches, which
always carry `Origin` (it's on the Forbidden Header List, so
JS cannot suppress it).
Mount the middleware globally in `createHttpServer` after the
existing `cors()` layer. Document the new env var in
`.env.example`.
Tests cover allowlist parsing (empty, single, multi, trims,
case sensitivity, port match) and middleware behaviour
(missing Origin allowed, allowlisted Origin allowed, unknown
Origin rejected, "null" rejected, port mismatch rejected,
disallowed Origin doesn't reach the handler).
* fix(server): include published extension origin in default allowlist
Pin the published BrowserOS extension origin in the static
allowlist so the default install accepts the legitimate
extension without requiring `BROWSEROS_TRUSTED_ORIGINS` to be
populated. Additional origins (dev / alpha) keep working
through the env override.
* chore(server): trim .env.example comments
* chore(server): drop redundant comments from cors helpers
* feat: add deterministic eval graders (AGI SDK + WebArena-Infinity)
Two new benchmark integrations with programmatic grading — no LLM judge.
AGI SDK / REAL Bench (52 tasks):
- 11 React/Next.js clones of consumer apps (DoorDash, Amazon, Gmail, etc.)
- Grader navigates browser to /finish, extracts state diff from <pre> tag
- Python verifier checks exact values via jmespath queries
WebArena-Infinity (50 hard tasks):
- 13 LLM-generated SaaS clones (Gmail, GitLab, Linear, Figma, etc.)
- InfinityAppManager starts fresh app server per task per worker
- Python verifier calls /api/state and asserts on JSON state
Infrastructure:
- GraderInput extended with mcpUrl + infinityAppUrl for parallel workers
- Each worker gets isolated ports (no cross-worker state contamination)
- CI workflow: pip install agisdk, clone webarena-infinity repo
* chore: switch eval configs back to kimi-k2p5
* fix: register deterministic graders in pass rate calculation
Add agisdk_state_diff and infinity_state to PASS_FAIL_GRADER_ORDER
in both runner types and weekly report script, so scores show correctly
in the dashboard.
* chore: temp switch to opus 4.6 for eval run
* chore: restore kimi-k2p5 as default eval config
* ci: add timeout and continue-on-error for trend report step
* feat(llm): Minimax Chinese and International Users providers
* fix(llm): Patch for p2 bugs
* fix(agent): correct MiniMax base URL handling and enforce API key validation
* fix(agent): add minimax entry to PROVIDER_DISPLAY_NAMES
The Record<ProviderType, string> map in ChatError.tsx was missing
the new minimax key added in this PR, causing a typecheck failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: krish-mm <112251957+krish-mm@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(credits): move credits fetch to extension side using install_id
Extension now reads `browseros.metrics_install_id` pref directly and fetches
credits from `llm.browseros.com` without going through the bundled server.
Unblocks the referral submit flow in prod without requiring a BrowserOS
binary release.
- Revert `/credits` route change that added `browserosId` to the response.
- Add `getOrCreateBrowserosId()` helper reading from BrowserOS prefs.
- Add `CREDITS_GATEWAY` to shared EXTERNAL_URLS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(credits): drop fallback UUID, read install_id directly
Extension only runs inside BrowserOS, so the prefs API is always available.
The chrome.storage fallback was dead code that would generate a ghost ID
diverging from the server's install_id anyway. Rename the helper to match
its simpler contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(credits): guard against empty install_id pref
Address Greptile P1 — throw instead of silently fetching `/credits/null`
when `browseros.metrics_install_id` is unset. Fails loudly so the broken
state is observable rather than masquerading as a credits outage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(agent): declare @browseros/shared as workspace dependency
The agent app imports @browseros/shared/constants/urls in
lib/referral/submit-referral.ts but never declared the package in its
dependencies, so vite failed to resolve the import during dev.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(referral): cap daily referral earnings at 500 credits
Block tweet submissions client-side once the user's balance reaches
500 to prevent unlimited credit farming via repeated shares.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(referral): randomize tweet variations for Twitter share
Replace the single hardcoded share text with 10 feature-specific
variations (agent mode, chat, scheduled tasks, connect apps, cowork,
workflows, memory, skills, local models, ad blocking) and pick one at
random each time the share button is clicked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(referral): regenerate share URL on click
Previously getShareOnTwitterUrl() was evaluated once at render time as
a static href, so every click produced the same tweet variation. Move
the call into onClick so a new random variation is picked each time.
Addresses Greptile P1 review on PR #737.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(agent): clarify upstream provider rate-limit errors
When a non-BrowserOS provider (OpenAI, Anthropic, OpenRouter, etc.)
returned a 429, ChatError rendered the retry-wrapped message
"Failed after 3 attempts. Last error: The usage limit has been reached"
with a generic "Something went wrong" title, leading users to blame
BrowserOS for throttling imposed by their configured upstream.
Detect upstream 429s in parseErrorMessage, show the provider name in
the title ("OpenAI rate limit reached"), strip the retry prefix,
render the raw upstream message, and add clarifying subtext that
names the provider and explicitly excludes BrowserOS. Skip the
BrowserOS-specific ShareForCredits / survey / upgrade affordances on
this path — they do not apply.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile review comments
- Tighten 429 pattern to \b429\b so it only matches the standalone
status code, not incidental substrings (model IDs, paths, etc.).
- Unwrap JSON-encoded provider error bodies on the upstream-rate-limit
path so users see the human-readable message instead of raw JSON.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(referral): show share rules and lower default daily limit fallback
Surface the three referral validation rules (must mention @browserOS_ai,
posted within last 30 minutes, single-use) directly in the ShareForCredits
UI so users understand submission requirements before pasting a tweet link.
Also align the UsagePage daily-limit fallback (used while credits load) with
the gateway default of 50.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(usage): handle credit balance exceeding daily limit
The "Credits used today" stat was computed as `dailyLimit - credits`,
which goes negative once a referral bonus pushes the balance above the
daily cap (e.g. balance 294 with cap 100 showed "-194 of 100"). Clamp
the math to zero and surface a separate "Bonus credits" stat when the
balance exceeds the daily allowance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: add Twitter share referral UI and expose browserosId
When credits are exhausted, users now see a "Share on Twitter" CTA with
a pre-filled tweet URL and an input to paste their tweet link. Reusable
ShareForCredits component used in both ChatError and UsagePage. Server's
GET /credits now includes browserosId for the extension to pass to the
referral service.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: rebuild chat session on provider change
* fix: address Greptile review comments
- Move referral service URL to EXTERNAL_URLS
- Guard submitReferral on !response.ok
- Remove stale TODO comment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: enable agent interaction with elements inside iframes
Fetch accessibility trees from all frames via Page.getFrameTree() +
per-frame Accessibility.getFullAXTree(frameId), so iframe elements
appear in snapshots with valid backendNodeIds. Pages without iframes
take the original single-call path with zero overhead.
Update snapshot tree builders to walk multiple RootWebArea roots from
merged multi-frame trees. Extract same-origin iframe content in the
markdown walker; show [iframe: url] placeholder for cross-origin.
* fix: namespace AX nodeIds by frameId to prevent cross-frame collisions
CDP AXNodeId values are frame-scoped — each frame's accessibility tree
starts its own counter from 1. Prefix nodeId and childIds with frameId
before merging so the nodeMap in snapshot builders never overwrites
nodes from a different frame.
* docs: add uBlock Origin install info to getting started and ad-blocking pages
Chrome dropped support for the full uBlock Origin extension — highlight
that BrowserOS brings it back and make it easy to install from both the
getting started guide and the dedicated ad-blocking page.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: revert Kimi partnership UI, restore daily limit survey
Remove Kimi/Moonshot AI partnership branding from the rate limit
banner, provider card, provider templates, and LLM hub. Restore
the original survey CTA on daily limit errors. Moonshot AI remains
as a regular provider template without the "Recommended" badge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address Greptile review comments
- Guard survey CTA with !isCreditsExhausted to avoid showing it for
credits-exhausted users who already see "View Usage & Billing"
- Remove dead kimi-launch feature flag files (kimi-launch.ts,
useKimiLaunch.ts)
- Remove unused KIMI_RATE_LIMIT analytics events
- Remove VITE_PUBLIC_KIMI_LAUNCH from env schema and .env.example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The merged PR (#661) injected custom entries into filteredModels, but
cmdk auto-scrolls to its first selected CommandItem, pushing the custom
entry out of view. Fix by using forceMount on a separate CommandGroup
and resetting scroll to top on every keystroke via requestAnimationFrame.
* feat: show custom model ID as first option in model selector
When typing in the model dropdown, the user's exact input now appears as the
first selectable row, followed by fuzzy search suggestions. This makes entering
custom model IDs intuitive — previously the option was hidden behind a
zero-results-only Enter shortcut that fuzzy search almost always prevented.
* fix: correct is_custom_model flag and prevent duplicate analytics events
- Use modelInfoList check instead of hardcoding is_custom_model: true in
the Enter key handler
- Add stopPropagation to prevent cmdk's root keydown handler from also
firing onSelect, which caused duplicate MODEL_SELECTED_EVENT emissions
* fix: install linux sysroot in configure, not via gclient hook
`gn gen` was failing on the arm64 leg with `Missing sysroot
(//build/linux/debian_bullseye_arm64-sysroot)`. The previous design
relied on `git_setup` writing `target_cpus` to `.gclient` so that
`gclient sync`'s DEPS hook would download the cross-arch sysroot. That
chain breaks for any chromium_src that was synced before cross-arch
support landed (the hook is gated on .gclient state at sync time) and
for partial pipeline runs that skip git_setup entirely. Nothing in
configure declared or verified its sysroot precondition.
Make configure self-healing: on Linux, invoke
`build/linux/sysroot_scripts/install-sysroot.py --arch=<target>`
directly before `gn gen`. install-sysroot.py is idempotent (stamp file
+ SHA check), fast when already installed, and decoupled from .gclient
— it's exactly what the failing assertion's error message recommends.
The script accepts our arch names directly: `x64` translates to `amd64`
internally via ARCH_TRANSLATIONS, and `arm64` is a valid pass-through.
Also temporarily pin release.linux.yaml to x64 only while we validate
the sysroot bootstrap end-to-end. Flip back to `[x64, arm64]` once
arm64 is green.
* chore: pin release.linux.yaml to arm64-only for sysroot bootstrap test
x64 already builds cleanly — the failing leg is arm64 cross-compile from
an x64 host. Pin the config to arm64 to exercise the new
install-sysroot.py path in configure without burning time on x64.
Flip back to [x64, arm64] once arm64 is green.
* feat(server): cache klavis createStrata to unblock /chat hot path
Conversation creation in /chat was blocking on a Worker-proxied
klavisClient.createStrata round-trip every time the user had any
managed Klavis app connected. The 5s KLAVIS_TIMEOUT_MS in the
ai-worker proxy existed specifically to bound this latency, but
the same cap also caused user-visible 504s on /klavis/servers/remove
since Strata DELETE operations routinely take >5s. Without caching
we couldn't raise the timeout without regressing chat creation.
This adds an in-process cache for Strata createStrata responses,
keyed by (browserosId, hashed sorted-server-set) and gated by a 1h
TTL. The cache stores only immutable JSON metadata (strataServerUrl,
strataId, addedServers); per-session MCP clients continue to be
opened and disposed by AiSdkAgent exactly as before, which keeps
the cache concurrency-safe by construction.
Cache invalidation has two layers: (a) the cache key embeds the
server set, so adding/removing apps naturally produces a different
key; (b) POST /klavis/servers/add and DELETE /klavis/servers/remove
explicitly call invalidate(browserosId) after their underlying
Klavis API call succeeds, as defense-in-depth.
Other changes:
- Consolidates klavis-related services into a new
apps/server/src/api/services/klavis/ directory; moves
register-klavis-mcp.ts -> strata-proxy.ts and adds strata-cache.ts
there. lib/clients/klavis/ stays unchanged.
- Refactors KlavisClient.removeServer into a low-level
deleteServersFromStrata(strataId, servers) primitive. The
cache-lookup + delete + invalidate orchestration moves up into
routes/klavis.ts where it belongs, eliminating the lib->api
layering inversion the original removeServer would have introduced.
- Uses Bun.hash (xxhash64) for fixed-width 16-hex-char keys, with
serverKey verified on read to make collision risk strictly zero.
- Dedupes concurrent fetches via in-flight Promise sharing, with
identity-checks before delete to avoid races between invalidate()
and a racing replacement insert.
Follow-up (separate PR): bump KLAVIS_TIMEOUT_MS to 30000 in
ai-worker/wrangler.toml so /klavis/servers/remove stops 504-ing.
* fix: address greptile review comments for klavis strata cache
- Drop dead `invalidated` field on InflightEntry. It was added to
support a "discard post-resolution if invalidated" check that I
later replaced with identity-checked deletes during self-review,
but I forgot to remove the field and the misleading comment
referencing it. Simplify Map<string, InflightEntry> to plain
Map<string, Promise<CacheEntry>>.
- Lower cache miss log from info to debug. Misses fire on every new
conversation; matching the existing debug-level for hits.
- Stop routing the /klavis/servers/remove handler through
klavisStrataCache.getOrFetch. The chat hot path keys its cache by
the user's full enabled-server set (e.g. hash('Gmail,Linear')),
so a single-server lookup here (hash('Gmail')) is guaranteed to
miss, write a spurious entry, and then have it immediately
cleared by invalidate() on the next line. Call createStrata
directly to recover the strataId, mirroring the original
removeServer flow.
`release.linux.yaml` now declares `architecture: [x64, arm64]` and the
runner loops the entire pipeline once per architecture. depot_tools
fetches both Linux sysroots automatically — `git_setup` idempotently
ensures `target_cpus = ['x64', 'arm64']` is in `.gclient` before
`gclient sync`, so cross-compiling arm64 from an x64 host just works.
The resolver returns `List[Context]` (single-element for the common
single-arch case), and `build/cli/build.py` loops `execute_pipeline` over
the per-arch contexts. Modules stay 100% arch-agnostic — no new
orchestration module, no new YAML schema beyond the list form.
Also fix a cross-compile bug in `build/modules/package/linux.py`: the
appimagetool binary must match the BUILD machine's arch (it executes
locally), not the target arch. Split into a host-keyed
`LINUX_HOST_APPIMAGETOOL` lookup vs the existing target-keyed
`LINUX_ARCHITECTURE_CONFIG`. Target arch is still passed to appimagetool
via the `ARCH` env var.
- build/common/resolver.py: scalar OR list `architecture` -> List[Context]
- build/cli/build.py: loop pipeline per arch, log multi-arch headers
- build/config/release.linux.yaml: `architecture: [x64, arm64]`
- build/modules/setup/git.py: idempotent `target_cpus` edit on Linux
- build/modules/package/linux.py: host vs target appimagetool split
- build/modules/package/linux_test.py: cover the host/target split
The --compile-only and --ci flags served overlapping purposes for CI
builds. Remove --compile-only entirely since --ci already handles the
CI use case (skip R2, skip prod env validation, local zip packaging)
and --no-upload covers the upload-skipping use case for full builds.
The server release CI workflow fails on ubuntu-latest because
patch-windows-exe.ts requires Wine to run rcedit. Thread the existing
--ci flag through compileServerBinaries so Windows PE metadata patching
is skipped in CI mode with a warning log.
* feat: add server release workflow
* fix: address PR review comments for 0331-add_server_release_workflow
* refactor: rework 0331-add_server_release_workflow based on feedback
* refactor: rework 0331-add_server_release_workflow based on feedback
* feat(cli): skip self-update prompts for package manager installs
Checks BROWSEROS_INSTALL_METHOD env var (npm, brew) and skips automatic
update checks. Users should use their package manager's update mechanism.
FormatNotice now shows the appropriate upgrade command based on install method.
* feat(cli): add npm bin wrapper for browseros-cli
* feat(cli): add npm postinstall script to download platform binary
Downloads the correct platform binary from GitHub releases during npm
install, verifies SHA256 checksums, and extracts to .binary directory.
* feat(cli): add npm package metadata and README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: move npm package files to correct monorepo path
The bin wrapper and postinstall were created at apps/cli/npm/ instead of
packages/browseros-agent/apps/cli/npm/. Moves them to the correct location.
* style: use node: protocol for builtin module imports
* feat(cli): add Makefile npm targets and release workflow npm publish step
Adds npm-version and npm-publish Makefile targets for version sync.
Adds Node.js setup and npm publish step to the release workflow.
Adds npm/npx install instructions to release notes template.
* fix(cli): fail on missing checksum entry and limit redirect depth
- Abort if checksums.txt downloaded but archive entry is missing
- Warn if checksums.txt itself failed to download
- Cap redirect depth at 5 to prevent stack overflow on circular redirects
* fix(cli): match install.sh checksum behavior — warn instead of abort
The existing shell installer (install.sh) warns and continues when the
checksum entry is missing from checksums.txt. Match that behavior in the
npm postinstall to avoid unnecessary install failures. Both files come
from the same GitHub release, so the checksum is a corruption check,
not a strong security boundary.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The model picker in NewProviderDialog rendered inline, causing dialog
resizing and lacked keyboard navigation. Replace it with a Popover +
Command (shadcn Combobox) pattern and add fuse.js for fuzzy search.
- Replace custom ModelPickerList with Popover + Command dropdown
- Add fuse.js for fuzzy model search (replaces string.includes)
- Add MODEL_SELECTED_EVENT and AI_PROVIDER_UPDATED_EVENT analytics
- Enrich PROVIDER_SELECTED_EVENT with model_id in chat sessions
* feat: add browseros-cli self-updater
* fix: address review comments for 0327-cli_self_updater
* fix: address PR review comments for 0327-cli_self_updater
* fix: replace goreleaser with Makefile-based release build
Remove .goreleaser.yml (required Pro license for monorepo field) and
consolidate cross-compilation into `make release`. CI now uses the same
Makefile target, fixing a bug where POSTHOG_API_KEY was missing from
release ldflags.
* fix: address critical self-updater bugs from code review
- Fix SHA256 checksum mismatch: verify archive checksum before extraction
instead of verifying extracted binary against archive hash (was always
failing). Add VerifyChecksum() and integration test.
- Fix JSON field name mismatch: TypeScript was emitting camelCase
(publishedAt, archiveFormat) but Go expected snake_case
(published_at, archive_format). Manifest parsing was silently broken.
- Add decompression size limit (256 MB) to prevent zip/gzip bombs.
- Don't update LastCheckedAt on transient errors so retry happens on
next CLI invocation instead of waiting 24h.
* feat: add PostHog usage analytics to CLI
Add anonymous command-level analytics to browseros-cli using the PostHog
Go SDK. Tracks which commands are executed, their success/failure status,
and duration — no PII or person profiles.
- New analytics package with Init/Track/Close singleton
- Distinct ID resolves from server's browseros_id (server.json), falls
back to CLI-generated UUID (~/.config/browseros-cli/install_id)
- API key injected at build time via ldflags (dev builds = silent no-op)
- Server now writes browseros_id into server.json for cross-surface
identity correlation
* fix: address PR review feedback for #603
- Return "unknown" for unrecognized args in commandName to avoid
sending arbitrary user input to PostHog
- Revert goreleaser to {{ .Env.POSTHOG_API_KEY }} (intentional hard
fail — release builds must have the key set)
- go mod tidy to fix posthog-go direct/indirect marker
- Add POSTHOG_API_KEY to .env.production.example
* feat: upload CLI binaries to CDN during release and gate workflow to core team
- Extend scripts/build/cli/upload.ts with uploadCliRelease() that pushes
archives + checksums to R2 under versioned (cli/v{VERSION}/) and latest
(cli/latest/) paths, plus a version.txt for lightweight latest resolution
- Update scripts/build/cli.ts entry point with --release/--version/--binaries-dir
flags (existing no-args behavior preserved for upload:cli-installers)
- Rewrite install.sh and install.ps1 to fetch from cdn.browseros.com instead of
GitHub releases API — eliminates rate limits and API dependency
- Add environment: release-core to release-cli.yml for core-team gating via
GitHub environment protection rules
- Add Bun setup + CDN upload step to the workflow between build and GitHub release
* fix: address review feedback for PR #602
- Make loadProdEnv return empty map when .env.production is absent so
pickEnv falls through to process.env in CI (Greptile P1)
- Add semver format validation for version string in install.sh and
install.ps1 to guard against malformed CDN responses
- Pass inputs.version via env var instead of inline ${{ }} interpolation
to prevent command injection in workflow shell
* test: temporarily allow release workflow on any branch
* fix(cli): restore main-only guard, remove goreleaser dependency
Replaces GoReleaser (Pro-only monorepo feature) with plain go build.
Tested: RC release created successfully on branch with all 6 binaries.
* fix(cli): fix hdiutil mount detection, update README with install/launch/init flow
* test: temporarily allow release workflow on any branch
* fix(cli): restore main-only guard, remove goreleaser dependency
Replaces GoReleaser (Pro-only monorepo feature) with plain go build.
Tested: RC release created successfully on branch with all 6 binaries.
* fix(cli): remove -quiet from hdiutil so mount point is detected