Compare commits

...

16 Commits

Author SHA1 Message Date
Dani Akash
dde403962f fix(server): tighten CORS allowlist for the agent server (#966)
* fix(server): tighten CORS allowlist for the agent server

Replace the permissive `origin || '*'` reflection in
`defaultCorsConfig` with an explicit allowlist composed of:

- a static list (empty by default)
- comma-separated origins from `BROWSEROS_TRUSTED_ORIGINS`

Add a small `requireTrustedOrigin` middleware that actively
rejects (403) any request whose `Origin` header is present and
not in the allowlist. The middleware is permissive when the
`Origin` header is absent — CLI tools, internal Node clients,
and some service-worker fetches legitimately omit it; the
threat model only covers cross-origin browser fetches, which
always carry `Origin` (it's on the Forbidden Header List, so
JS cannot suppress it).

Mount the middleware globally in `createHttpServer` after the
existing `cors()` layer. Document the new env var in
`.env.example`.

Tests cover allowlist parsing (empty, single, multi, trims,
case sensitivity, port match) and middleware behaviour
(missing Origin allowed, allowlisted Origin allowed, unknown
Origin rejected, "null" rejected, port mismatch rejected,
disallowed Origin doesn't reach the handler).

* fix(server): include published extension origin in default allowlist

Pin the published BrowserOS extension origin in the static
allowlist so the default install accepts the legitimate
extension without requiring `BROWSEROS_TRUSTED_ORIGINS` to be
populated. Additional origins (dev / alpha) keep working
through the env override.

* chore(server): trim .env.example comments

* chore(server): drop redundant comments from cors helpers
2026-05-08 11:22:54 +05:30
shivammittal274
4a3b9ff294 feat: deterministic eval graders (AGI SDK + WebArena-Infinity) (#664)
* feat: add deterministic eval graders (AGI SDK + WebArena-Infinity)

Two new benchmark integrations with programmatic grading — no LLM judge.

AGI SDK / REAL Bench (52 tasks):
- 11 React/Next.js clones of consumer apps (DoorDash, Amazon, Gmail, etc.)
- Grader navigates browser to /finish, extracts state diff from <pre> tag
- Python verifier checks exact values via jmespath queries

WebArena-Infinity (50 hard tasks):
- 13 LLM-generated SaaS clones (Gmail, GitLab, Linear, Figma, etc.)
- InfinityAppManager starts fresh app server per task per worker
- Python verifier calls /api/state and asserts on JSON state

Infrastructure:
- GraderInput extended with mcpUrl + infinityAppUrl for parallel workers
- Each worker gets isolated ports (no cross-worker state contamination)
- CI workflow: pip install agisdk, clone webarena-infinity repo

* chore: switch eval configs back to kimi-k2p5

* fix: register deterministic graders in pass rate calculation

Add agisdk_state_diff and infinity_state to PASS_FAIL_GRADER_ORDER
in both runner types and weekly report script, so scores show correctly
in the dashboard.

* chore: temp switch to opus 4.6 for eval run

* chore: restore kimi-k2p5 as default eval config

* ci: add timeout and continue-on-error for trend report step
2026-04-23 13:11:55 +05:30
Felarof
1a2fe3a5bf feat(llm): Minimax LLM provider for Chinese and International Users (#756)
* feat(llm): Minimax Chinese and International Users providers

* fix(llm): Patch for p2 bugs

* fix(agent): correct MiniMax base URL handling and enforce API key validation

* fix(agent): add minimax entry to PROVIDER_DISPLAY_NAMES

The Record<ProviderType, string> map in ChatError.tsx was missing
the new minimax key added in this PR, causing a typecheck failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: krish-mm <112251957+krish-mm@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:40:50 -07:00
Felarof
392cd58932 docs(byollm): add NVIDIA free endpoint provider (#784)
Document NVIDIA's free OpenAI-compatible API at build.nvidia.com — 80+ free models including GLM 5.1, MiniMax M2.7, Qwen 3.5, Mistral, and Nemotron — wired through BrowserOS's OpenAI Compatible provider template.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:45:37 -07:00
Felarof
b5bbbe1aff fix(credits): move credits fetch to extension side (#740)
* fix(credits): move credits fetch to extension side using install_id

Extension now reads `browseros.metrics_install_id` pref directly and fetches
credits from `llm.browseros.com` without going through the bundled server.
Unblocks the referral submit flow in prod without requiring a BrowserOS
binary release.

- Revert `/credits` route change that added `browserosId` to the response.
- Add `getOrCreateBrowserosId()` helper reading from BrowserOS prefs.
- Add `CREDITS_GATEWAY` to shared EXTERNAL_URLS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(credits): drop fallback UUID, read install_id directly

Extension only runs inside BrowserOS, so the prefs API is always available.
The chrome.storage fallback was dead code that would generate a ghost ID
diverging from the server's install_id anyway. Rename the helper to match
its simpler contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(credits): guard against empty install_id pref

Address Greptile P1 — throw instead of silently fetching `/credits/null`
when `browseros.metrics_install_id` is unset. Fails loudly so the broken
state is observable rather than masquerading as a credits outage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:27:21 -07:00
Felarof
4f03afcac8 chore: add .auctor entries to gitignore (#738)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 18:00:20 -07:00
Felarof
6d3498c91b fix: randomized tweet variations + referral fixes (#737)
* fix(agent): declare @browseros/shared as workspace dependency

The agent app imports @browseros/shared/constants/urls in
lib/referral/submit-referral.ts but never declared the package in its
dependencies, so vite failed to resolve the import during dev.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(referral): cap daily referral earnings at 500 credits

Block tweet submissions client-side once the user's balance reaches
500 to prevent unlimited credit farming via repeated shares.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(referral): randomize tweet variations for Twitter share

Replace the single hardcoded share text with 10 feature-specific
variations (agent mode, chat, scheduled tasks, connect apps, cowork,
workflows, memory, skills, local models, ad blocking) and pick one at
random each time the share button is clicked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(referral): regenerate share URL on click

Previously getShareOnTwitterUrl() was evaluated once at render time as
a static href, so every click produced the same tweet variation. Move
the call into onClick so a new random variation is picked each time.

Addresses Greptile P1 review on PR #737.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:09:28 -07:00
Felarof
7f2e387903 fix(agent): clarify upstream provider rate-limit errors (#734)
* fix(agent): clarify upstream provider rate-limit errors

When a non-BrowserOS provider (OpenAI, Anthropic, OpenRouter, etc.)
returned a 429, ChatError rendered the retry-wrapped message
"Failed after 3 attempts. Last error: The usage limit has been reached"
with a generic "Something went wrong" title, leading users to blame
BrowserOS for throttling imposed by their configured upstream.

Detect upstream 429s in parseErrorMessage, show the provider name in
the title ("OpenAI rate limit reached"), strip the retry prefix,
render the raw upstream message, and add clarifying subtext that
names the provider and explicitly excludes BrowserOS. Skip the
BrowserOS-specific ShareForCredits / survey / upgrade affordances on
this path — they do not apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile review comments

- Tighten 429 pattern to \b429\b so it only matches the standalone
  status code, not incidental substrings (model IDs, paths, etc.).
- Unwrap JSON-encoded provider error bodies on the upstream-rate-limit
  path so users see the human-readable message instead of raw JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 16:14:45 -07:00
Felarof
fc00ed23bf feat(referral): show tweet share rules and lower default daily limit fallback (#731)
* feat(referral): show share rules and lower default daily limit fallback

Surface the three referral validation rules (must mention @browserOS_ai,
posted within last 30 minutes, single-use) directly in the ShareForCredits
UI so users understand submission requirements before pasting a tweet link.
Also align the UsagePage daily-limit fallback (used while credits load) with
the gateway default of 50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(usage): handle credit balance exceeding daily limit

The "Credits used today" stat was computed as `dailyLimit - credits`,
which goes negative once a referral bonus pushes the balance above the
daily cap (e.g. balance 294 with cap 100 showed "-194 of 100"). Clamp
the math to zero and surface a separate "Bonus credits" stat when the
balance exceeds the daily allowance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 15:34:33 -07:00
Felarof
b6d6d4eb1d feat: Twitter share referral UI for credit rewards (#729)
* feat: add Twitter share referral UI and expose browserosId

When credits are exhausted, users now see a "Share on Twitter" CTA with
a pre-filled tweet URL and an input to paste their tweet link. Reusable
ShareForCredits component used in both ChatError and UsagePage. Server's
GET /credits now includes browserosId for the extension to pass to the
referral service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rebuild chat session on provider change

* fix: address Greptile review comments

- Move referral service URL to EXTERNAL_URLS
- Guard submitReferral on !response.ok
- Remove stale TODO comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:25:04 -07:00
Felarof
f78068bb9d chore: add .omc/ to gitignore (#682)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 20:53:24 -07:00
github-actions[bot]
6b18ebb1d8 docs: update agent extension changelog for v0.0.99 (#660)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-10 09:53:44 -07:00
shivammittal274
1f2e783ab9 fix: enable agent interaction with elements inside iframes (#667)
* fix: enable agent interaction with elements inside iframes

Fetch accessibility trees from all frames via Page.getFrameTree() +
per-frame Accessibility.getFullAXTree(frameId), so iframe elements
appear in snapshots with valid backendNodeIds. Pages without iframes
take the original single-call path with zero overhead.

Update snapshot tree builders to walk multiple RootWebArea roots from
merged multi-frame trees. Extract same-origin iframe content in the
markdown walker; show [iframe: url] placeholder for cross-origin.

* fix: namespace AX nodeIds by frameId to prevent cross-frame collisions

CDP AXNodeId values are frame-scoped — each frame's accessibility tree
starts its own counter from 1. Prefix nodeId and childIds with frameId
before merging so the nodeMap in snapshot builders never overwrites
nodes from a different frame.
2026-04-09 23:14:53 +05:30
Felarof
df7873562d Revert Kimi partnership UI, restore daily limit survey (#663)
* docs: add uBlock Origin install info to getting started and ad-blocking pages

Chrome dropped support for the full uBlock Origin extension — highlight
that BrowserOS brings it back and make it easy to install from both the
getting started guide and the dedicated ad-blocking page.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: revert Kimi partnership UI, restore daily limit survey

Remove Kimi/Moonshot AI partnership branding from the rate limit
banner, provider card, provider templates, and LLM hub. Restore
the original survey CTA on daily limit errors. Moonshot AI remains
as a regular provider template without the "Recommended" badge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Greptile review comments

- Guard survey CTA with !isCreditsExhausted to avoid showing it for
  credits-exhausted users who already see "View Usage & Billing"
- Remove dead kimi-launch feature flag files (kimi-launch.ts,
  useKimiLaunch.ts)
- Remove unused KIMI_RATE_LIMIT analytics events
- Remove VITE_PUBLIC_KIMI_LAUNCH from env schema and .env.example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:39:00 -07:00
shivammittal274
412386b489 fix: ensure custom model entry is always visible in model selector (#662)
The merged PR (#661) injected custom entries into filteredModels, but
cmdk auto-scrolls to its first selected CommandItem, pushing the custom
entry out of view. Fix by using forceMount on a separate CommandGroup
and resetting scroll to top on every keystroke via requestAnimationFrame.
2026-04-09 02:40:38 +05:30
shivammittal274
33617ba9e7 feat: show custom model ID as first option in model selector (#661)
* feat: show custom model ID as first option in model selector

When typing in the model dropdown, the user's exact input now appears as the
first selectable row, followed by fuzzy search suggestions. This makes entering
custom model IDs intuitive — previously the option was hidden behind a
zero-results-only Enter shortcut that fuzzy search almost always prevented.

* fix: correct is_custom_model flag and prevent duplicate analytics events

- Use modelInfoList check instead of hardcoding is_custom_model: true in
  the Enter key handler
- Add stopPropagation to prevent cmdk's root keydown handler from also
  firing onSelect, which caused duplicate MODEL_SELECTED_EVENT emissions
2026-04-09 01:44:17 +05:30
61 changed files with 2265 additions and 263 deletions

View File

@@ -43,6 +43,12 @@ jobs:
working-directory: packages/browseros-agent
run: bun install --ignore-scripts && bun run build:agent-sdk
- name: Install Python eval dependencies
run: pip install agisdk requests
- name: Clone WebArena-Infinity
run: git clone --depth 1 https://github.com/web-arena-x/webarena-infinity.git /tmp/webarena-infinity
- name: Install xvfb
run: sudo apt-get update && sudo apt-get install -y xvfb
@@ -57,9 +63,11 @@ jobs:
working-directory: packages/browseros-agent/apps/eval
env:
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
NOPECHA_API_KEY: ${{ secrets.NOPECHA_API_KEY }}
BROWSEROS_BINARY: /usr/bin/browseros
WEBARENA_INFINITY_DIR: /tmp/webarena-infinity
EVAL_CONFIG: ${{ github.event.inputs.config || 'configs/browseros-agent-weekly.json' }}
run: |
echo "Running eval with config: $EVAL_CONFIG"
@@ -81,6 +89,8 @@ jobs:
- name: Generate trend report
if: success()
timeout-minutes: 5
continue-on-error: true
working-directory: packages/browseros-agent
env:
EVAL_R2_ACCOUNT_ID: ${{ secrets.EVAL_R2_ACCOUNT_ID }}

3
.gitignore vendored
View File

@@ -1,4 +1,6 @@
**/.DS_Store
**.auctor/**
.auctor.json
.gcs_entries
**/dmg
**/env
@@ -29,3 +31,4 @@ packages/browseros/build/tools/
# AI SDK DevTools traces
.devtools/
.omc/

View File

@@ -3,13 +3,17 @@ title: "Ad Blocking"
description: "BrowserOS supports full ad blocking with uBlock Origin"
---
BrowserOS supports full ad blocking through [uBlock Origin](https://ublockorigin.com/), the most effective open-source ad blocker available.
BrowserOS supports full ad blocking through [uBlock Origin](https://ublockorigin.com/), the most powerful open-source ad blocker available — the full extension, not the watered-down "Lite" version.
## How It Works
## Why BrowserOS?
Chrome has been [phasing out support](https://developer.chrome.com/docs/extensions/develop/migrate/mv2-deprecation-timeline) for Manifest V2 extensions, which uBlock Origin relies on for its full blocking capabilities. We re-enabled Manifest V2 support in BrowserOS so uBlock Origin can run at full power.
Chrome [killed support](https://developer.chrome.com/docs/extensions/develop/migrate/mv2-deprecation-timeline) for uBlock Origin by phasing out Manifest V2 extensions. The only option left on Chrome is "uBlock Origin Lite," a significantly weaker version that can't use advanced filtering rules.
Install it from the Chrome Web Store: [uBlock Origin](https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm)
**BrowserOS re-enabled full Manifest V2 support**, so you can install and run the original uBlock Origin at full power — the same extension Chrome no longer allows.
<Card title="Install uBlock Origin" icon="shield-check" href="https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm">
Install the full uBlock Origin extension from the Chrome Web Store. Works on BrowserOS out of the box.
</Card>
## BrowserOS vs Chrome

View File

@@ -131,6 +131,29 @@ Connect to powerful AI models using your API keys. Your keys stay on your machin
![Gemini config](/images/byollm--gemini-provider-config.png)
</Accordion>
<div id="nvidia" />
<Accordion title="NVIDIA (Free)" icon="microchip">
NVIDIA's [build.nvidia.com](https://build.nvidia.com/models) hosts 80+ models — including GLM 5.1, MiniMax M2.7, GPT-OSS-120B, Qwen 3.5, Mistral, and Nemotron — behind a **free OpenAI-compatible API endpoint**. Great for chatting, prototyping, and personal projects.
**Get your API key:**
1. Go to [build.nvidia.com/models](https://build.nvidia.com/models) and sign in with a free NVIDIA developer account
2. Pick any model tagged **Free Endpoint** (e.g. [`minimaxai/minimax-m2.7`](https://build.nvidia.com/minimaxai/minimax-m2.7), [`z-ai/glm-5.1`](https://build.nvidia.com/z-ai/glm-5.1), [`qwen/qwen3.5-122b-a10b`](https://build.nvidia.com/qwen/qwen3.5-122b-a10b))
3. Click **Get API Key** on the model page and copy the `nvapi-...` key
**Add to BrowserOS:**
1. Go to `chrome://browseros/settings`
2. Click **USE** on the **OpenAI Compatible** card
3. Set **Base URL** to `https://integrate.api.nvidia.com/v1`
4. Set **Model ID** to a model from the catalog (e.g. `minimaxai/minimax-m2.7`, `z-ai/glm-5.1`, `qwen/qwen3.5-122b-a10b`)
5. Paste your NVIDIA API key
6. Set **Context Window** based on the model (most are `128000` or higher)
7. Click **Save**
<Tip>
NVIDIA's free endpoints share GPU capacity across all developers, so throughput is slower than a paid API. They're best for Chat Mode, exploring new open-source models, and personal projects. For production agent workloads, use a paid provider like Claude or Kimi.
</Tip>
</Accordion>
<div id="claude" />
<Accordion title="Claude (Best for Agents)" icon="message-bot">
Claude Opus 4.5 gives the best results for Agent Mode.

View File

@@ -42,6 +42,10 @@ Welcome to BrowserOS! Let's get you set up.
## You're all set!
<Tip>
**Block ads with uBlock Origin** — Chrome dropped support for the full uBlock Origin extension, but BrowserOS brought it back. [Install it from the Chrome Web Store](https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm) and browse ad-free. [Learn more →](/features/ad-blocking)
</Tip>
Explore what BrowserOS can do:
<Columns cols={2}>

View File

@@ -15,9 +15,6 @@ VITE_PUBLIC_SENTRY_DSN=
# BrowserOS API URL
VITE_PUBLIC_BROWSEROS_API=https://api.browseros.com
# Launch feature flags
VITE_PUBLIC_KIMI_LAUNCH=false
# GraphQL Schema Path (optional — falls back to schema/schema.graphql)
GRAPHQL_SCHEMA_PATH=

View File

@@ -1,5 +1,16 @@
# BrowserOS Agent Extension
## v0.0.99 (2026-04-08)
## What's Changed
- chore: bump server and extension version (#659)
- chore(agent): remove workflows feature (#656)
- feat: replace model picker with shadcn Combobox + fuse.js fuzzy search (#617)
- feat: clean-up - remove obsolete controller extension (#610)
- docs: update agent extension changelog for v0.0.98 (#609)
## v0.0.98 (2026-03-27)
## What's Changed

View File

@@ -0,0 +1,148 @@
import { REFERRAL_LIMITS } from '@browseros/shared/constants/limits'
import { ExternalLink, Loader2, Send } from 'lucide-react'
import type { FC } from 'react'
import { useState } from 'react'
import { Button } from '@/components/ui/button'
import { Input } from '@/components/ui/input'
import { useCredits, useInvalidateCredits } from '@/lib/credits/useCredits'
import {
getShareOnTwitterUrl,
submitReferral,
} from '@/lib/referral/submit-referral'
interface ShareForCreditsProps {
compact?: boolean
}
export const ShareForCredits: FC<ShareForCreditsProps> = ({ compact }) => {
const [tweetUrl, setTweetUrl] = useState('')
const [isSubmitting, setIsSubmitting] = useState(false)
const [result, setResult] = useState<{
success: boolean
message: string
} | null>(null)
const { data } = useCredits()
const invalidateCredits = useInvalidateCredits()
const credits = data?.credits ?? 0
const atDailyMax = credits >= REFERRAL_LIMITS.MAX_DAILY_CREDITS
const handleSubmit = async () => {
if (!tweetUrl.trim() || !data?.browserosId || atDailyMax) return
setIsSubmitting(true)
setResult(null)
try {
const res = await submitReferral(tweetUrl.trim(), data.browserosId)
if (res.success) {
setResult({
success: true,
message: `${res.creditsAdded ?? 200} credits added!`,
})
setTweetUrl('')
invalidateCredits()
} else {
setResult({
success: false,
message: res.reason ?? 'Submission failed. Please try again.',
})
}
} catch {
setResult({
success: false,
message: 'Network error. Please try again.',
})
} finally {
setIsSubmitting(false)
}
}
if (atDailyMax) {
return (
<div className={compact ? 'space-y-2' : 'space-y-3'}>
<p className={compact ? 'text-muted-foreground text-xs' : 'text-sm'}>
You've reached the daily cap of {REFERRAL_LIMITS.MAX_DAILY_CREDITS}{' '}
credits. Come back tomorrow to earn more!
</p>
</div>
)
}
return (
<div className={compact ? 'space-y-2' : 'space-y-3'}>
<p className={compact ? 'text-muted-foreground text-xs' : 'text-sm'}>
Share BrowserOS on Twitter to earn{' '}
{REFERRAL_LIMITS.CREDITS_PER_REFERRAL} bonus credits!
</p>
<ul className="list-disc space-y-0.5 pl-4 text-muted-foreground text-xs">
<li>
Tweet must mention <span className="font-medium">@browserOS_ai</span>
</li>
<li>Tweet must be posted within the last 30 minutes</li>
<li>Each tweet can only be submitted once</li>
<li>
Daily cap of {REFERRAL_LIMITS.MAX_DAILY_CREDITS} credits — resets at
midnight UTC
</li>
</ul>
<Button variant="outline" size="sm" className="w-full gap-2" asChild>
<a
href={getShareOnTwitterUrl()}
target="_blank"
rel="noopener noreferrer"
onClick={(e) => {
e.currentTarget.href = getShareOnTwitterUrl()
}}
>
<ExternalLink className="h-3.5 w-3.5" />
Share on Twitter
</a>
</Button>
<p className="text-muted-foreground text-xs">
Already shared? Paste your tweet link:
</p>
<div className="flex gap-2">
<Input
type="url"
placeholder="https://x.com/..."
value={tweetUrl}
onChange={(e) => setTweetUrl(e.target.value)}
className="h-8 text-xs"
disabled={isSubmitting}
/>
<Button
variant="default"
size="sm"
onClick={handleSubmit}
disabled={isSubmitting || !tweetUrl.trim()}
className="shrink-0 gap-1.5"
>
{isSubmitting ? (
<Loader2 className="h-3.5 w-3.5 animate-spin" />
) : (
<Send className="h-3.5 w-3.5" />
)}
Submit
</Button>
</div>
{result && (
<p
className={
result.success
? 'text-green-600 text-xs dark:text-green-400'
: 'text-destructive text-xs'
}
>
{result.message}
</p>
)}
</div>
)
}

View File

@@ -8,7 +8,7 @@ import {
Loader2,
XCircle,
} from 'lucide-react'
import { type FC, useEffect, useMemo, useState } from 'react'
import { type FC, useEffect, useMemo, useRef, useState } from 'react'
import { useForm } from 'react-hook-form'
import { z } from 'zod/v3'
import { Button } from '@/components/ui/button'
@@ -61,10 +61,10 @@ import {
KIMI_API_KEY_GUIDE_CLICKED_EVENT,
MODEL_SELECTED_EVENT,
} from '@/lib/constants/analyticsEvents'
import { useKimiLaunch } from '@/lib/feature-flags/useKimiLaunch'
import {
getDefaultBaseUrlForProviders,
getProviderTemplate,
MINIMAX_REGIONS,
providerTypeOptions,
} from '@/lib/llm-providers/providerTemplates'
import { type TestResult, testProvider } from '@/lib/llm-providers/testProvider'
@@ -88,6 +88,7 @@ const providerTypeEnum = z.enum([
'chatgpt-pro',
'github-copilot',
'qwen-code',
'minimax',
])
/**
@@ -106,7 +107,7 @@ export const providerFormSchema = z
temperature: z.number().min(0).max(2),
// Azure-specific
resourceName: z.string().optional(),
// Bedrock-specific
// Bedrock-specific / MiniMax region
accessKeyId: z.string().optional(),
secretAccessKey: z.string().optional(),
region: z.string().optional(),
@@ -165,6 +166,30 @@ export const providerFormSchema = z
) {
// No validation needed — OAuth tokens are on the server
}
// MiniMax: require baseUrl + apiKey
else if (data.type === 'minimax') {
if (!data.baseUrl) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: 'Base URL is required',
path: ['baseUrl'],
})
} else if (!/^https?:\/\/.+/.test(data.baseUrl)) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: 'Must be a valid URL',
path: ['baseUrl'],
})
}
if (!data.apiKey?.trim()) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: 'API Key is required',
path: ['apiKey'],
})
}
}
// Other providers: require baseUrl
else if (!data.baseUrl) {
ctx.addIssue({
@@ -223,9 +248,9 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
const [testResult, setTestResult] = useState<TestResult | null>(null)
const [modelPickerOpen, setModelPickerOpen] = useState(false)
const [modelSearch, setModelSearch] = useState('')
const modelListRef = useRef<HTMLDivElement>(null)
const { supports } = useCapabilities()
const { baseUrl: agentServerUrl } = useAgentServerUrl()
const kimiLaunch = useKimiLaunch()
const filteredProviderTypeOptions = providerTypeOptions.filter((opt) => {
if (opt.value === 'chatgpt-pro')
@@ -233,8 +258,6 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
if (opt.value === 'github-copilot')
return supports(Feature.GITHUB_COPILOT_SUPPORT)
if (opt.value === 'qwen-code') return supports(Feature.QWEN_CODE_SUPPORT)
if (opt.value === 'moonshot')
return kimiLaunch || initialValues?.type === 'moonshot'
if (opt.value === 'openai-compatible') {
return supports(Feature.OPENAI_COMPATIBLE_SUPPORT)
}
@@ -309,6 +332,9 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
? modelFuse.search(modelSearch).map((r) => r.item)
: modelInfoList
const showCustomEntry =
modelSearch && !filteredModels.some((m) => m.modelId === modelSearch)
// Handle provider type change (user-initiated via Select)
const handleTypeChange = (newType: ProviderType) => {
form.setValue('type', newType)
@@ -316,6 +342,9 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
if (defaultUrl) {
form.setValue('baseUrl', defaultUrl)
}
if (newType === 'minimax') {
form.setValue('region', 'chinese')
}
form.setValue('modelId', '')
}
@@ -722,6 +751,94 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
)
}
// Minimax: region selector
if (watchedType === 'minimax') {
return (
<>
<FormField
control={form.control}
name="region"
render={({ field }) => (
<FormItem>
<FormLabel>Region *</FormLabel>
<Select
onValueChange={(v) => {
field.onChange(v)
form.setValue(
'baseUrl',
MINIMAX_REGIONS[v as keyof typeof MINIMAX_REGIONS].api,
)
}}
value={field.value || 'chinese'}
>
<FormControl>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
</FormControl>
<SelectContent>
<SelectItem value="chinese">
Chinese (api.minimaxi.com)
</SelectItem>
<SelectItem value="international">
International (api.minimax.io)
</SelectItem>
</SelectContent>
</Select>
<FormDescription>
Choose the endpoint closest to your location
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<FormField
control={form.control}
name="baseUrl"
render={({ field }) => (
<FormItem>
<FormLabel>Base URL *</FormLabel>
<FormControl>
<Input placeholder="https://api.minimaxi.com/v1" {...field} />
</FormControl>
<FormMessage />
</FormItem>
)}
/>
<FormField
control={form.control}
name="apiKey"
render={({ field }) => (
<FormItem>
<FormLabel>API Key *</FormLabel>
<FormControl>
<Input
type="password"
placeholder="Enter your MiniMax API key"
{...field}
/>
</FormControl>
<FormDescription>
Your API key is encrypted and stored locally.{' '}
{setupGuideUrl && (
<a
href={setupGuideUrl}
onClick={handleSetupGuideClick}
className="inline-flex cursor-pointer items-center gap-1 text-primary hover:underline"
>
<ExternalLink className="h-3 w-3" />
{setupGuideText}
</a>
)}
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
</>
)
}
// Standard providers (OpenAI, Anthropic, Google, etc.)
return (
<>
@@ -894,59 +1011,96 @@ export const NewProviderDialog: FC<NewProviderDialogProps> = ({
<CommandInput
placeholder="Search models..."
value={modelSearch}
onValueChange={setModelSearch}
onValueChange={(v) => {
setModelSearch(v)
requestAnimationFrame(() => {
modelListRef.current?.scrollTo(0, 0)
})
}}
onKeyDown={(e) => {
if (
e.key === 'Enter' &&
modelSearch &&
filteredModels.length === 0
) {
if (e.key === 'Enter' && modelSearch) {
e.preventDefault()
e.stopPropagation()
form.setValue('modelId', modelSearch)
track(MODEL_SELECTED_EVENT, {
provider_type: watchedType,
model_id: modelSearch,
is_custom_model: true,
is_custom_model: !modelInfoList.some(
(m) => m.modelId === modelSearch,
),
})
setModelPickerOpen(false)
setModelSearch('')
}
}}
/>
<CommandList>
<CommandList ref={modelListRef}>
<CommandEmpty>
No models found. Press Enter to use &quot;
{modelSearch}&quot;
</CommandEmpty>
<CommandGroup>
{filteredModels.map((model) => (
{showCustomEntry && (
<CommandGroup forceMount>
<CommandItem
key={model.modelId}
value={model.modelId}
forceMount
value={`custom:${modelSearch}`}
onSelect={() => {
form.setValue('modelId', model.modelId)
form.setValue('modelId', modelSearch)
track(MODEL_SELECTED_EVENT, {
provider_type: watchedType,
model_id: model.modelId,
context_window: model.contextLength,
is_custom_model: false,
model_id: modelSearch,
is_custom_model: true,
})
setModelPickerOpen(false)
setModelSearch('')
}}
>
<span className="flex-1 truncate">
{model.modelId}
{modelSearch}
</span>
<span className="ml-2 shrink-0 rounded-md bg-muted px-1.5 py-0.5 font-mono text-[10px] text-muted-foreground">
{formatContextWindow(model.contextLength)}
</span>
{field.value === model.modelId && (
{field.value === modelSearch && (
<Check className="ml-2 h-4 w-4 shrink-0" />
)}
</CommandItem>
))}
</CommandGroup>
</CommandGroup>
)}
{filteredModels.length > 0 && (
<CommandGroup>
{filteredModels.map((model) => (
<CommandItem
key={model.modelId}
value={model.modelId}
onSelect={() => {
form.setValue('modelId', model.modelId)
track(MODEL_SELECTED_EVENT, {
provider_type: watchedType,
model_id: model.modelId,
context_window: model.contextLength,
is_custom_model: !modelInfoList.some(
(m) => m.modelId === model.modelId,
),
})
setModelPickerOpen(false)
setModelSearch('')
}}
>
<span className="flex-1 truncate">
{model.modelId}
</span>
{model.contextLength > 0 && (
<span className="ml-2 shrink-0 rounded-md bg-muted px-1.5 py-0.5 font-mono text-[10px] text-muted-foreground">
{formatContextWindow(
model.contextLength,
)}
</span>
)}
{field.value === model.modelId && (
<Check className="ml-2 h-4 w-4 shrink-0" />
)}
</CommandItem>
))}
</CommandGroup>
)}
</CommandList>
</Command>
</PopoverContent>

View File

@@ -2,7 +2,6 @@ import { Check, Loader2, Trash2 } from 'lucide-react'
import type { FC } from 'react'
import { Badge } from '@/components/ui/badge'
import { Button } from '@/components/ui/button'
import { useKimiLaunch } from '@/lib/feature-flags/useKimiLaunch'
import { BrowserOSIcon, ProviderIcon } from '@/lib/llm-providers/providerIcons'
import type { LlmProviderConfig } from '@/lib/llm-providers/types'
import { cn } from '@/lib/utils'
@@ -30,7 +29,6 @@ export const ProviderCard: FC<ProviderCardProps> = ({
isTesting = false,
}) => {
const inputId = `provider-${provider.id}`
const kimiLaunch = useKimiLaunch()
return (
<label
@@ -79,30 +77,21 @@ export const ProviderCard: FC<ProviderCardProps> = ({
</Badge>
)}
</div>
{isBuiltIn && provider.type === 'browseros' && kimiLaunch && (
<span className="mb-1 inline-block rounded-full border border-orange-300/60 bg-orange-100/70 px-3 py-0.5 font-semibold text-orange-700 text-xs dark:border-orange-400/40 dark:bg-orange-500/15 dark:text-orange-300">
In partnership with Moonshot AI
</span>
)}
<p className="truncate text-muted-foreground text-sm">
{isBuiltIn ? (
kimiLaunch ? (
'Extended usage limits for the next 2 weeks!'
) : (
<>
BrowserOS-hosted model with strict rate limits.{' '}
<a
href="https://docs.browseros.com/features/bring-your-own-llm"
target="_blank"
rel="noopener noreferrer"
className="underline hover:text-foreground"
onClick={(e) => e.stopPropagation()}
>
Bring your own key
</a>{' '}
for better performance.
</>
)
<>
BrowserOS-hosted model with strict rate limits.{' '}
<a
href="https://docs.browseros.com/features/bring-your-own-llm"
target="_blank"
rel="noopener noreferrer"
className="underline hover:text-foreground"
onClick={(e) => e.stopPropagation()}
>
Bring your own key
</a>{' '}
for better performance.
</>
) : provider.baseUrl ? (
`${provider.modelId}${provider.baseUrl}`
) : (

View File

@@ -7,7 +7,6 @@ import {
} from '@/components/ui/collapsible'
import { Feature } from '@/lib/browseros/capabilities'
import { useCapabilities } from '@/lib/browseros/useCapabilities'
import { useKimiLaunch } from '@/lib/feature-flags/useKimiLaunch'
import {
type ProviderTemplate,
providerTemplates,
@@ -23,7 +22,6 @@ export const ProviderTemplatesSection: FC<ProviderTemplatesSectionProps> = ({
onUseTemplate,
}) => {
const { supports } = useCapabilities()
const kimiLaunch = useKimiLaunch()
const filteredTemplates = providerTemplates.filter((template) => {
if (template.id === 'chatgpt-pro')
@@ -31,7 +29,6 @@ export const ProviderTemplatesSection: FC<ProviderTemplatesSectionProps> = ({
if (template.id === 'github-copilot')
return supports(Feature.GITHUB_COPILOT_SUPPORT)
if (template.id === 'qwen-code') return supports(Feature.QWEN_CODE_SUPPORT)
if (template.id === 'moonshot') return kimiLaunch
if (template.id === 'openai-compatible') {
return supports(Feature.OPENAI_COMPATIBLE_SUPPORT)
}
@@ -67,7 +64,6 @@ export const ProviderTemplatesSection: FC<ProviderTemplatesSectionProps> = ({
<ProviderTemplateCard
key={template.id}
template={template}
highlighted={template.id === 'moonshot'}
isNew={isNew}
onUseTemplate={onUseTemplate}
/>

View File

@@ -2,8 +2,6 @@ import { Globe2, Trash2 } from 'lucide-react'
import type { FC } from 'react'
import { useMemo } from 'react'
import { Button } from '@/components/ui/button'
import { useKimiLaunch } from '@/lib/feature-flags/useKimiLaunch'
import { cn } from '@/lib/utils'
import { getFaviconUrl, type LlmHubProvider } from './models'
interface HubProviderRowProps {
@@ -20,20 +18,9 @@ export const HubProviderRow: FC<HubProviderRowProps> = ({
onDelete,
}) => {
const iconUrl = useMemo(() => getFaviconUrl(provider.url), [provider.url])
const kimiLaunch = useKimiLaunch()
const normalizedName = provider.name.trim().toLowerCase()
const normalizedUrl = provider.url.trim().toLowerCase()
const isKimi = normalizedName === 'kimi' || normalizedUrl.includes('kimi.com')
const showKimiFlare = isKimi && kimiLaunch
return (
<div
className={cn(
'group flex w-full items-center gap-4 rounded-xl border border-border bg-card p-4 transition-all hover:border-[var(--accent-orange)] hover:shadow-md',
showKimiFlare &&
'border-orange-300/80 bg-orange-50/20 shadow-sm ring-1 ring-orange-300/45 dark:bg-orange-500/5',
)}
>
<div className="group flex w-full items-center gap-4 rounded-xl border border-border bg-card p-4 transition-all hover:border-[var(--accent-orange)] hover:shadow-md">
<div className="flex h-10 w-10 shrink-0 items-center justify-center overflow-hidden rounded-lg bg-muted">
{iconUrl ? (
<img
@@ -49,16 +36,6 @@ export const HubProviderRow: FC<HubProviderRowProps> = ({
<div className="min-w-0 flex-1">
<div className="mb-0.5 flex items-center gap-2">
<span className="block truncate font-semibold">{provider.name}</span>
{showKimiFlare && (
<div className="flex flex-wrap items-center gap-1">
<span className="rounded-full border border-orange-300/60 bg-orange-100/70 px-2 py-0.5 font-semibold text-[11px] text-orange-700 dark:border-orange-400/40 dark:bg-orange-500/15 dark:text-orange-300">
Recommended
</span>
<span className="rounded-full border border-orange-300/60 bg-orange-100/60 px-2.5 py-0.5 font-medium text-orange-700 text-xs dark:border-orange-400/40 dark:bg-orange-500/15 dark:text-orange-300">
Powered by Moonshot AI
</span>
</div>
)}
</div>
<p className="truncate text-muted-foreground/70 text-xs">
{provider.url}

View File

@@ -1,5 +1,6 @@
import { AlertCircle, Clock, Coins, CreditCard, Zap } from 'lucide-react'
import { AlertCircle, Clock, Coins, Gift, Zap } from 'lucide-react'
import type { FC } from 'react'
import { ShareForCredits } from '@/components/referral/ShareForCredits'
import { Button } from '@/components/ui/button'
import {
getCreditBarColor,
@@ -43,8 +44,10 @@ export const UsagePage: FC = () => {
}
const credits = data?.credits ?? 0
const total = data?.dailyLimit ?? 100
const total = data?.dailyLimit ?? 50
const percentage = Math.min((credits / total) * 100, 100)
const bonusCredits = Math.max(0, credits - total)
const creditsUsed = Math.max(0, total - credits)
return (
<div className="space-y-6 p-6">
@@ -95,30 +98,32 @@ export const UsagePage: FC = () => {
<div className="flex items-center gap-2.5 rounded-lg bg-muted/50 px-3 py-2.5">
<Zap className="h-4 w-4 shrink-0 text-muted-foreground" />
<div>
<p className="font-medium text-xs">Credits used today</p>
<p className="text-muted-foreground text-xs">
{total - credits} of {total}
</p>
{bonusCredits > 0 ? (
<>
<p className="font-medium text-xs">Bonus credits</p>
<p className="text-muted-foreground text-xs">
+{bonusCredits} from referrals
</p>
</>
) : (
<>
<p className="font-medium text-xs">Credits used today</p>
<p className="text-muted-foreground text-xs">
{creditsUsed} of {total}
</p>
</>
)}
</div>
</div>
</div>
</div>
<div className="rounded-xl border p-5">
<div className="flex items-center gap-3">
<CreditCard className="h-5 w-5 text-muted-foreground" />
<div>
<p className="flex items-center gap-2 font-semibold text-sm">
Need more credits?
<span className="rounded-full bg-muted px-2 py-0.5 font-medium text-[10px] text-muted-foreground uppercase tracking-wide">
Coming soon
</span>
</p>
<p className="text-muted-foreground text-xs">
Additional credit packages will be available soon
</p>
</div>
<div className="mb-4 flex items-center gap-2">
<Gift className="h-5 w-5 text-muted-foreground" />
<span className="font-semibold text-sm">Earn More Credits</span>
</div>
<ShareForCredits />
</div>
<div className="rounded-xl border border-[var(--accent-orange)]/30 bg-[var(--accent-orange)]/5 p-5">

View File

@@ -1,20 +1,59 @@
import { AlertCircle, RefreshCw } from 'lucide-react'
import type { FC } from 'react'
// import { useMemo } from 'react'
import { useMemo } from 'react'
import { ShareForCredits } from '@/components/referral/ShareForCredits'
import { Button } from '@/components/ui/button'
import type { ProviderType } from '@/lib/llm-providers/types'
// --- Commented out for Kimi partnership launch (restore after) ---
// const SURVEY_DIRECTIONS = [
// 'competitor',
// 'switching',
// 'workflow',
// 'activation',
// ] as const
//
// function pickRandomDirection(): string {
// return SURVEY_DIRECTIONS[Math.floor(Math.random() * SURVEY_DIRECTIONS.length)]
// }
// --- End commented out survey code ---
const SURVEY_DIRECTIONS = [
'competitor',
'switching',
'workflow',
'activation',
] as const
function pickRandomDirection(): string {
return SURVEY_DIRECTIONS[Math.floor(Math.random() * SURVEY_DIRECTIONS.length)]
}
const PROVIDER_DISPLAY_NAMES: Record<ProviderType, string> = {
anthropic: 'Anthropic',
openai: 'OpenAI',
'openai-compatible': 'OpenAI-compatible',
google: 'Google',
openrouter: 'OpenRouter',
azure: 'Azure OpenAI',
ollama: 'Ollama',
lmstudio: 'LM Studio',
bedrock: 'AWS Bedrock',
browseros: 'BrowserOS',
moonshot: 'Moonshot',
'chatgpt-pro': 'ChatGPT Pro',
'github-copilot': 'GitHub Copilot',
'qwen-code': 'Qwen Code',
minimax: 'MiniMax',
}
const UPSTREAM_RATE_LIMIT_PATTERNS: Array<string | RegExp> = [
'usage limit',
'rate limit',
'rate-limit',
'quota',
/\b429\b/,
'too many requests',
'insufficient_quota',
]
function getProviderDisplayName(providerType?: string): string {
if (providerType && providerType in PROVIDER_DISPLAY_NAMES) {
return PROVIDER_DISPLAY_NAMES[providerType as ProviderType]
}
return 'your provider'
}
function stripRetryPrefix(message: string): string {
return message.replace(/^Failed after \d+ attempts?\.\s*Last error:\s*/i, '')
}
interface ChatErrorProps {
error: Error
@@ -31,6 +70,8 @@ function parseErrorMessage(
isRateLimit?: boolean
isCreditsExhausted?: boolean
isConnectionError?: boolean
isUpstreamRateLimit?: boolean
providerName?: string
} {
const isBrowserosProvider = providerType === 'browseros'
@@ -71,6 +112,28 @@ function parseErrorMessage(
}
}
// Detect rate limits from non-BrowserOS upstream providers. Users were
// confused that a quota/429 from OpenAI/Anthropic/etc. looked like a
// BrowserOS-imposed limit.
if (!isBrowserosProvider && providerType) {
const lower = message.toLowerCase()
const matchesRateLimit = UPSTREAM_RATE_LIMIT_PATTERNS.some((p) =>
typeof p === 'string' ? lower.includes(p) : p.test(lower),
)
if (matchesRateLimit) {
let stripped = stripRetryPrefix(message).trim()
try {
const parsed = JSON.parse(stripped)
if (parsed?.error?.message) stripped = parsed.error.message
} catch {}
return {
text: stripped || message,
isUpstreamRateLimit: true,
providerName: getProviderDisplayName(providerType),
}
}
}
let text = message
try {
const parsed = JSON.parse(message)
@@ -92,18 +155,28 @@ export const ChatError: FC<ChatErrorProps> = ({
onRetry,
providerType,
}) => {
const { text, url, isRateLimit, isCreditsExhausted, isConnectionError } =
parseErrorMessage(error.message, providerType)
const {
text,
url,
isRateLimit,
isCreditsExhausted,
isConnectionError,
isUpstreamRateLimit,
providerName,
} = parseErrorMessage(error.message, providerType)
// --- Commented out for Kimi partnership launch (restore after) ---
// const surveyUrl = useMemo(
// () =>
// `/app.html?page=survey&maxTurns=20&experimentId=daily_limit_${pickRandomDirection()}#/settings/survey`,
// [],
// )
// --- End commented out survey code ---
const surveyUrl = useMemo(
() =>
`/app.html?page=survey&maxTurns=20&experimentId=daily_limit_${pickRandomDirection()}#/settings/survey`,
[],
)
const getTitle = () => {
if (isUpstreamRateLimit) {
return providerName && providerName !== 'your provider'
? `${providerName} rate limit reached`
: 'Upstream rate limit reached'
}
if (isRateLimit) return 'Daily limit reached'
if (isConnectionError) return 'Connection failed'
return 'Something went wrong'
@@ -116,6 +189,14 @@ export const ChatError: FC<ChatErrorProps> = ({
<span className="font-medium text-sm">{getTitle()}</span>
</div>
<p className="text-center text-destructive text-xs">{text}</p>
{isUpstreamRateLimit && (
<p className="text-center text-muted-foreground text-xs">
This is a limit from{' '}
<span className="font-medium">{providerName}</span>
{' — your configured model provider — not BrowserOS. Check your '}
provider's dashboard for quota, usage, or billing details.
</p>
)}
{isConnectionError && url && (
<a
href={url}
@@ -126,8 +207,24 @@ export const ChatError: FC<ChatErrorProps> = ({
View troubleshooting guide
</a>
)}
{/* --- Commented out for Kimi partnership launch (restore after) ---
{isRateLimit && (
{isCreditsExhausted && (
<>
<div className="w-full border-border/50 border-t pt-3">
<ShareForCredits compact />
</div>
{url && (
<a
href={url}
target="_blank"
rel="noopener noreferrer"
className="text-muted-foreground text-xs underline hover:text-foreground"
>
View Usage & Billing
</a>
)}
</>
)}
{isRateLimit && !isCreditsExhausted && (
<p className="text-muted-foreground text-xs">
<a
href={url}
@@ -148,27 +245,6 @@ export const ChatError: FC<ChatErrorProps> = ({
</a>
</p>
)}
--- End commented out survey code --- */}
{isCreditsExhausted && url && (
<a
href={url}
target="_blank"
rel="noopener noreferrer"
className="text-muted-foreground text-xs underline hover:text-foreground"
>
View Usage & Billing
</a>
)}
{isRateLimit && providerType === 'browseros' && (
<a
href="/app.html#/settings/ai"
target="_blank"
rel="noopener noreferrer"
className="inline-flex items-center gap-1.5 rounded-md border border-[var(--accent-orange)] bg-[var(--accent-orange)]/10 px-3 py-1.5 font-medium text-[var(--accent-orange)] text-xs transition-colors hover:bg-[var(--accent-orange)]/20"
>
Add your own provider for unlimited usage
</a>
)}
{onRetry && (
<Button
variant="outline"

View File

@@ -280,14 +280,6 @@ export const KIMI_API_KEY_CONFIGURED_EVENT = 'settings.kimi.api_key_configured'
export const KIMI_API_KEY_GUIDE_CLICKED_EVENT =
'settings.kimi.api_key_guide_clicked'
/** @public */
export const KIMI_RATE_LIMIT_DOCS_CLICKED_EVENT =
'ui.rate_limit.kimi_docs_clicked'
/** @public */
export const KIMI_RATE_LIMIT_PLATFORM_CLICKED_EVENT =
'ui.rate_limit.moonshot_platform_clicked'
/** @public */
export const SIDEPANEL_VOICE_RECORDING_STARTED_EVENT =
'sidepanel.voice.recording_started'

View File

@@ -0,0 +1,15 @@
import { getBrowserOSAdapter } from '@/lib/browseros/adapter'
import { BROWSEROS_PREFS } from '@/lib/browseros/prefs'
// TODO(credits-identity): temporary shim — reuses the BrowserOS metrics
// install_id as the credits/referral identifier. Replace with a dedicated
// identity module once we have one.
export async function getBrowserosId(): Promise<string> {
const adapter = getBrowserOSAdapter()
const pref = await adapter.getPref(BROWSEROS_PREFS.INSTALL_ID)
const id = pref.value
if (typeof id !== 'string' || id.length === 0) {
throw new Error('browseros.metrics_install_id is not set')
}
return id
}

View File

@@ -1,20 +1,25 @@
import { EXTERNAL_URLS } from '@browseros/shared/constants/urls'
import { useQuery, useQueryClient } from '@tanstack/react-query'
import { getAgentServerUrl } from '@/lib/browseros/helpers'
import { getBrowserosId } from './browseros-id'
export interface CreditsInfo {
credits: number
dailyLimit: number
lastResetAt?: string
browserosId?: string
}
const CREDITS_QUERY_KEY = ['credits']
async function fetchCredits(): Promise<CreditsInfo> {
const baseUrl = await getAgentServerUrl()
const response = await fetch(`${baseUrl}/credits`)
const browserosId = await getBrowserosId()
const response = await fetch(
`${EXTERNAL_URLS.CREDITS_GATEWAY}/credits/${browserosId}`,
)
if (!response.ok)
throw new Error(`Failed to fetch credits: ${response.status}`)
return response.json()
const data = (await response.json()) as CreditsInfo
return { ...data, browserosId }
}
export function useCredits() {

View File

@@ -6,7 +6,6 @@ const EnvSchema = z.object({
VITE_PUBLIC_POSTHOG_HOST: z.string().optional(),
VITE_PUBLIC_SENTRY_DSN: z.string().optional(),
VITE_PUBLIC_BROWSEROS_API: z.string().optional(),
VITE_PUBLIC_KIMI_LAUNCH: z.string().optional(),
PROD: z.boolean(),
})

View File

@@ -1,14 +0,0 @@
import { env } from '@/lib/env'
const ENABLED_VALUES = new Set(['1', 'true', 'yes', 'on'])
function parseKimiLaunchFlag(value: string | undefined): boolean {
if (!value) return false
return ENABLED_VALUES.has(value.trim().toLowerCase())
}
const kimiLaunchEnabled = parseKimiLaunchFlag(env.VITE_PUBLIC_KIMI_LAUNCH)
export function isKimiLaunchEnabled(): boolean {
return kimiLaunchEnabled
}

View File

@@ -1,5 +0,0 @@
import { isKimiLaunchEnabled } from './kimi-launch'
export function useKimiLaunch(): boolean {
return isKimiLaunchEnabled()
}

View File

@@ -1,6 +1,5 @@
import { getBrowserOSAdapter } from '@/lib/browseros/adapter'
import { BROWSEROS_PREFS } from '@/lib/browseros/prefs'
import { isKimiLaunchEnabled } from '@/lib/feature-flags/kimi-launch'
/** @public */
export interface LlmHubProvider {
@@ -8,43 +7,15 @@ export interface LlmHubProvider {
url: string
}
const KIMI_PROVIDER: LlmHubProvider = {
name: 'Kimi',
url: 'https://www.kimi.com',
}
function ensureKimiFirst(providers: LlmHubProvider[]): LlmHubProvider[] {
if (!isKimiLaunchEnabled()) return providers
const hasKimi = providers.some(
(p) => p.name === 'Kimi' || p.url.includes('kimi.com'),
)
return hasKimi ? providers : [KIMI_PROVIDER, ...providers]
}
export async function loadProviders(): Promise<LlmHubProvider[]> {
try {
const adapter = getBrowserOSAdapter()
const providersPref = await adapter.getPref(
BROWSEROS_PREFS.THIRD_PARTY_LLM_PROVIDERS,
)
const providers = (providersPref?.value as LlmHubProvider[]) || []
if (providers.length === 0) {
if (isKimiLaunchEnabled()) {
const defaults = [KIMI_PROVIDER]
await saveProviders(defaults)
return defaults
}
return []
}
const normalized = ensureKimiFirst(providers)
if (normalized !== providers) {
await saveProviders(normalized)
}
return normalized
return (providersPref?.value as LlmHubProvider[]) || []
} catch {
return isKimiLaunchEnabled() ? [KIMI_PROVIDER] : []
return []
}
}

View File

@@ -5402,5 +5402,89 @@
"outputCost": 0
}
]
},
"minimax": {
"name": "MiniMax",
"api": "https://api.minimaxi.com/v1",
"doc": "https://platform.minimax.io",
"models": [
{
"id": "MiniMax-M2.7",
"name": "MiniMax M2.7",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.3,
"outputCost": 1.2
},
{
"id": "MiniMax-M2.7-highspeed",
"name": "MiniMax M2.7 Highspeed",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.6,
"outputCost": 2.4
},
{
"id": "MiniMax-M2.5",
"name": "MiniMax M2.5",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.3,
"outputCost": 1.2
},
{
"id": "MiniMax-M2.5-highspeed",
"name": "MiniMax M2.5 Highspeed",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.6,
"outputCost": 2.4
},
{
"id": "MiniMax-M2.1",
"name": "MiniMax M2.1",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.3,
"outputCost": 1.2
},
{
"id": "MiniMax-M2.1-highspeed",
"name": "MiniMax M2.1 Highspeed",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": true,
"supportsToolCall": true,
"inputCost": 0.6,
"outputCost": 2.4
},
{
"id": "M2-her",
"name": "M2-her",
"contextWindow": 204800,
"maxOutput": 8192,
"supportsImages": false,
"supportsReasoning": false,
"supportsToolCall": true,
"inputCost": 0.3,
"outputCost": 1.2
}
]
}
}

View File

@@ -5,6 +5,7 @@ import {
Gemini,
Kimi,
LmStudio,
Minimax,
Ollama,
OpenAI,
OpenRouter,
@@ -36,6 +37,7 @@ const providerIconMap: Record<ProviderType, IconComponent | null> = {
'chatgpt-pro': OpenAI,
'github-copilot': Github,
'qwen-code': Qwen,
minimax: Minimax,
}
interface ProviderIconProps {

View File

@@ -140,8 +140,31 @@ export const providerTemplates: ProviderTemplate[] = [
setupGuideUrl:
'https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html',
}),
enrichTemplate('minimax', {
defaultModelId: 'MiniMax-M2.7',
apiKeyUrl:
'https://platform.minimax.io/user-center/basic-information/interface-key',
setupGuideUrl: 'https://platform.minimax.io/docs/guides/models-intro',
}),
]
export const MINIMAX_REGIONS = {
chinese: {
api: 'https://api.minimaxi.com/v1',
apiKeyUrl:
'https://platform.minimaxi.com/user-center/basic-information/interface-key',
setupGuideUrl: 'https://platform.minimaxi.com/document',
},
international: {
api: 'https://api.minimax.io/v1',
apiKeyUrl:
'https://platform.minimax.io/user-center/basic-information/interface-key',
setupGuideUrl: 'https://platform.minimax.io/docs/guides/models-intro',
},
} as const
export type MinimaxRegion = keyof typeof MINIMAX_REGIONS
/**
* Provider type options for select dropdowns
* @public
@@ -161,6 +184,7 @@ export const providerTypeOptions: { value: ProviderType; label: string }[] = [
{ value: 'lmstudio', label: 'LM Studio' },
{ value: 'bedrock', label: 'AWS Bedrock' },
{ value: 'browseros', label: 'BrowserOS' },
{ value: 'minimax', label: 'MiniMax' },
]
/**
@@ -192,6 +216,7 @@ export const DEFAULT_BASE_URLS: Record<ProviderType, string> = {
lmstudio: 'http://localhost:1234/v1',
bedrock: '',
browseros: '',
minimax: MINIMAX_REGIONS.chinese.api,
}
/**

View File

@@ -2,14 +2,12 @@ import { storage } from '@wxt-dev/storage'
import { sessionStorage } from '@/lib/auth/sessionStorage'
import { getBrowserOSAdapter } from '@/lib/browseros/adapter'
import { BROWSEROS_PREFS } from '@/lib/browseros/prefs'
import { isKimiLaunchEnabled } from '@/lib/feature-flags/kimi-launch'
import type { LlmProviderConfig, LlmProvidersBackup } from './types'
import { uploadLlmProvidersToGraphql } from './uploadLlmProvidersToGraphql'
/** Default provider ID constant */
export const DEFAULT_PROVIDER_ID = 'browseros'
const DEFAULT_PROVIDER_NAME = 'BrowserOS'
const KIMI_LAUNCH_PROVIDER_NAME = 'Kimi K2.5'
/** Storage key for LLM providers array */
export const providersStorage = storage.defineItem<LlmProviderConfig[]>(
@@ -91,7 +89,7 @@ export function setupLlmProvidersSyncToBackend(): () => void {
/** Load providers from storage */
export async function loadProviders(): Promise<LlmProviderConfig[]> {
const providers = (await providersStorage.getValue()) || []
const normalizedProviders = normalizeProvidersForLaunch(providers)
const normalizedProviders = normalizeProviderNames(providers)
// Keep storage consistent so every consumer sees the same provider name.
if (
@@ -109,7 +107,7 @@ export function createDefaultBrowserOSProvider(): LlmProviderConfig {
return {
id: DEFAULT_PROVIDER_ID,
type: 'browseros',
name: getBuiltInProviderName(),
name: DEFAULT_PROVIDER_NAME,
baseUrl: 'https://api.browseros.com/v1',
modelId: 'browseros-auto',
supportsImages: true,
@@ -125,26 +123,22 @@ export function createDefaultProvidersConfig(): LlmProviderConfig[] {
return [createDefaultBrowserOSProvider()]
}
function getBuiltInProviderName(): string {
return isKimiLaunchEnabled()
? KIMI_LAUNCH_PROVIDER_NAME
: DEFAULT_PROVIDER_NAME
}
function normalizeProvidersForLaunch(
/**
* Normalize built-in provider names back to "BrowserOS" (e.g. from "Kimi K2.5"
* which was set during a previous partnership launch).
*/
function normalizeProviderNames(
providers: LlmProviderConfig[],
): LlmProviderConfig[] {
const builtInProviderName = getBuiltInProviderName()
return providers.map((provider) => {
if (
provider.id === DEFAULT_PROVIDER_ID &&
provider.type === 'browseros' &&
provider.name !== builtInProviderName
provider.name !== DEFAULT_PROVIDER_NAME
) {
return {
...provider,
name: builtInProviderName,
name: DEFAULT_PROVIDER_NAME,
}
}
return provider

View File

@@ -17,6 +17,7 @@ export type ProviderType =
| 'chatgpt-pro'
| 'github-copilot'
| 'qwen-code'
| 'minimax'
/**
* LLM Provider configuration

View File

@@ -0,0 +1,108 @@
import { EXTERNAL_URLS } from '@browseros/shared/constants/urls'
interface ReferralResult {
success: boolean
creditsAdded?: number
reason?: string
}
export async function submitReferral(
tweetUrl: string,
browserosId: string,
): Promise<ReferralResult> {
const response = await fetch(
`${EXTERNAL_URLS.REFERRAL_SERVICE}/referral/submit`,
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ tweetUrl, browserosId }),
},
)
if (!response.ok) {
return {
success: false,
reason: `Request failed with status ${response.status}`,
}
}
return response.json()
}
const TWEET_VARIATIONS = [
`ngl @browseros_ai is kinda wild
just type what u want in plain english and it handles the annoying web shit
forms, research, data pulls... all automated
actually works`,
`been using @browseros_ai to chat with webpages lately
summarize articles, pull data, translate stuff
all happens in the same tab
no copy/paste, no switching windows
just ask and it does it`,
`wake up to @browseros_ai having already read ur emails and calendar while u were sleeping
scheduled agents are lowkey magic`,
`ngl @browseros_ai is kinda crazy
connects gmail, slack, linear, notion + 40 other apps into one ai assistant
just talk to it in plain english and it handles cross-app workflows for u
no more switching between tabs like a psycho`,
`i use @browseros_ai to automate research
it handles the browser work and drops reports straight into local folders
no switching between tools or manually saving files
just one task instead of three`,
`been messing with @browseros_ai lately
it comes with a prebuilt MCP server and I connect it claude code or codex and it just runs things for you
set it up once, use it whenever
way better than clicking through the same shit manually every time`,
`the ai actually remembers what we talked about yesterday
no more "here's the context again" every single conversation
@browseros_ai just picks up where we left off
feels like talking to someone who actually pays attention`,
`i built a skill library for my ai agent
now when i need it to do something specific, i just load the recipe i made earlier
@browseros_ai MCP is very handy`,
`been running @browseros_ai with ollama locally
everything stays on my machine, nothing gets sent out
kinda nice not having to think about what data i'm sharing`,
`switched to @browseros_ai from chrome
blocks 10x more ads and runs full ublock origin (not the lite version)
check it out`,
]
export function getShareOnTwitterUrl(): string {
const text =
TWEET_VARIATIONS[Math.floor(Math.random() * TWEET_VARIATIONS.length)]
return `https://x.com/intent/tweet?text=${encodeURIComponent(text)}`
}

View File

@@ -20,6 +20,7 @@
"dependencies": {
"@ai-sdk/react": "^3.0.96",
"@browseros/server": "workspace:*",
"@browseros/shared": "workspace:*",
"@hookform/resolvers": "^5.2.2",
"@lobehub/icons": "^2.44.0",
"@mdxeditor/editor": "^3.52.4",

View File

@@ -0,0 +1,26 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "accounts/fireworks/models/kimi-k2p5",
"apiKey": "FIREWORKS_API_KEY",
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../data/agisdk-real.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["agisdk_state_diff"],
"timeout_ms": 1800000
}

View File

@@ -0,0 +1,26 @@
{
"agent": {
"type": "single",
"provider": "openai-compatible",
"model": "accounts/fireworks/models/kimi-k2p5",
"apiKey": "FIREWORKS_API_KEY",
"baseUrl": "https://api.fireworks.ai/inference/v1",
"supportsImages": true
},
"dataset": "../data/webarena-infinity-hard-50.jsonl",
"num_workers": 10,
"restart_server_per_task": true,
"browseros": {
"server_url": "http://127.0.0.1:9110",
"base_cdp_port": 9010,
"base_server_port": 9110,
"base_extension_port": 9310,
"load_extensions": false,
"headless": false
},
"captcha": {
"api_key_env": "NOPECHA_API_KEY"
},
"graders": ["infinity_state"],
"timeout_ms": 1800000
}

View File

@@ -0,0 +1,52 @@
{"query_id": "agisdk-dashdish-10", "dataset": "agisdk-real", "query": "Place an order from \"Souvla\" for a \"Medium Classic Cheeseburger\" and a \"Small Bacon Double Cheeseburger\" with \"Standard Delivery\" as the method with the default charged options.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-10", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
{"query_id": "agisdk-fly-unified-5", "dataset": "agisdk-real", "query": "Find me the cheapest fare for a flight from Orlando to Milwaukee on December 5th, 2024 and book it.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/25, Security Code: 420 Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-5", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-udriver-10", "dataset": "agisdk-real", "query": "Order me a ride for 4pm, I'll be at the de Young muesum headed to the Waterbar, fanciest option possible please.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-10", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-udriver-9", "dataset": "agisdk-real", "query": "Book me a ride from the thai restaurant I last took a ride to for later today at 2pm, I'll be at 333 Apartments on Fremont", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-9", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-9", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-topwork-4", "dataset": "agisdk-real", "query": "Create a job post for a UI/UX Designer with expertise in Figma, Sketch, and Adobe Creative Suite, including project details, timeline, and required skills (Wireframing, Prototyping, Responsive Design).", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-4", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-gocalendar-4", "dataset": "agisdk-real", "query": "Change the \"Team Check-In\" event on July 18, 2024, name to \"Project Kickoff\" and update the location to \"Zoom\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-4", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-staynb-6", "dataset": "agisdk-real", "query": "Find and book the stay with the best value for money (cheapest stay with the best reviews) for 1 day. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-6", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-6", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-omnizon-10", "dataset": "agisdk-real", "query": "Click on \"buy now\" on any product, increase its quantity to the maximum allowed, update the delivery date to the last available, and place the order.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-omnizon.vercel.app", "metadata": {"original_task_id": "omnizon-10", "website": "Omnizon", "category": "agisdk-real", "additional": {"agisdk_task_id": "omnizon-10", "challenge_type": "action", "difficulty": "hard", "similar_to": "Amazon"}}}
{"query_id": "agisdk-fly-unified-9", "dataset": "agisdk-real", "query": "Book me a flight from San Francisco to Chicago in Basic Economy on December 18th at 10:00. Ensure no seat selection is made.\nPassenger: David Lee\nDate of Birth: 07/22/1985\nSex: Male\nSeat Selection: No\nPayment: Credit Card (9999 8888 7777), Exp: 03/30, Address: 987 Cedar St, Chicago, IL, 60601, USA, Phone: 555-987-1234, Email: davidlee@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-9", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-9", "challenge_type": "action", "difficulty": "hard", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-networkin-9", "dataset": "agisdk-real", "query": "Find a professional who attended Stanford and send them a connection request and a message.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-9", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-9", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-udriver-11", "dataset": "agisdk-real", "query": "I need to go from Pacific Catch on Chestnut back home to 333 Fremont now. If the fancy version is within ten dollars of the regular one, book that.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-11", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-11", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-fly-unified-4", "dataset": "agisdk-real", "query": "Book me a round-trip flight from Providence (Rhode Island) to Indianapolis, departing on December 5th, 2024 at 08:00 and returning on December 9th at 14:00.\nPassenger: Jane Smith\nDate of Birth: 02/14/1995\nSex: Female\nSeat Selection: Yes (Window seat)\nPayment: Credit Card (378342143523967), Exp: 06/26, security code: 345 Address: 456 Elm St, Miami, FL, 33101, USA, Phone: 555-987-6543, Email: janesmith@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-4", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-networkin-5", "dataset": "agisdk-real", "query": "Send a connection request to John Smith.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-5", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-5", "challenge_type": "action", "difficulty": "easy", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-zilloft-6", "dataset": "agisdk-real", "query": "Select a property listed in San Francisco as \"Condos\" within a price range under $300,000 and request a tour for tomorrow at 4:00 PM. Use these contact details: Name: Sarah Brown, Email: sarahbrown@example.com, Phone: 555-987-6543.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-6", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "Zillow"}}}
{"query_id": "agisdk-topwork-2", "dataset": "agisdk-real", "query": "Create a job posting for a Backend Developer specializing in Python, Django, and Flask to develop a high-performance web application. Include project details such as required skills (PostgreSQL, Docker, AWS, CI/CD), estimated project timeline, and budget.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-2", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-2", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-gocalendar-3", "dataset": "agisdk-real", "query": "Delete the event titled \"Breakfast Meeting with Client\" scheduled for July 19, 2024", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-3", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-3", "challenge_type": "action", "difficulty": "easy", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-topwork-3", "dataset": "agisdk-real", "query": "Create a job listing for a Full-Stack Developer with expertise in Java, Spring Boot, and Angular, outlining the project scope, estimated duration, and required skills (MySQL, Docker, Kubernetes, and Jenkins). The ideal candidate should have experience in enterprise-level applications and building scalable microservices. After creating the job post, please describe what you included in the job listing.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-3", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-3", "challenge_type": "retrieval", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-fly-unified-2", "dataset": "agisdk-real", "query": "Book me a one-way flight from Indiana to New York on December 2nd 2024 at 12:00.\nPassenger: John Doe\nDate of Birth: 01/01/1990\nSex: Male\nSeat Selection: No\nPayment: Credit Card (378342143523967), Exp: 12/25, Security Code: 245, Address: 123 Main St, San Francisco, CA, 94105, USA, Phone: 555-123-4567, Email: johndoe@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-2", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-2", "challenge_type": "action", "difficulty": "easy", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-dashdish-7", "dataset": "agisdk-real", "query": "Select \"Express Delivery\" for an order from \"DragonEats\" of \"Mushroom Swiss Burger\" and complete the checkout with the pre-loaded Visa card.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-7", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-7", "challenge_type": "action", "difficulty": "hard", "similar_to": "Doordash"}}}
{"query_id": "agisdk-networkin-3", "dataset": "agisdk-real", "query": "Write a post inviting users to a networking event, including details about the event's purpose, date, and target audience.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-3", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-3", "challenge_type": "action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-gomail-7", "dataset": "agisdk-real", "query": "Delete the email with the subject \"New Leadership Articles You Can't Miss\" from the Inbox.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-7", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-7", "challenge_type": "retrieval-action", "difficulty": "hard", "similar_to": "Gmail"}}}
{"query_id": "agisdk-opendining-8", "dataset": "agisdk-real", "query": "Identify and book the restaurant with the lowest rating. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-8", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-8", "challenge_type": "retrieval-action", "difficulty": "easy", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-omnizon-2", "dataset": "agisdk-real", "query": "Search for \"smartphones\" using the search bar, add the first two to your cart, view the details of the third product, click on \"Buy Now,\" and proceed through the checkout process.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-omnizon.vercel.app", "metadata": {"original_task_id": "omnizon-2", "website": "Omnizon", "category": "agisdk-real", "additional": {"agisdk_task_id": "omnizon-2", "challenge_type": "action", "difficulty": "medium", "similar_to": "Amazon"}}}
{"query_id": "agisdk-udriver-1", "dataset": "agisdk-real", "query": "Book a ride from Fitness Urbano to Pacific Cafe", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-1", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-1", "challenge_type": "action", "difficulty": "easy", "similar_to": "Uber"}}}
{"query_id": "agisdk-staynb-2", "dataset": "agisdk-real", "query": "Click on one of the stays displayed on the homepage and book it for a family of 4 (2 adults and 2 children). For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-2", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-2", "challenge_type": "action", "difficulty": "easy", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-opendining-10", "dataset": "agisdk-real", "query": "Check the menus of all restaurants for vegetarian options and make a reservation at the one with the most vegetarian choices. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-10", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-10", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-opendining-4", "dataset": "agisdk-real", "query": "Use the search bar to search for a restaurant on September 2nd at 4:30 PM for 7 people, using \"Japanese\" as the search term, and book the first result. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-4", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-4", "challenge_type": "action", "difficulty": "hard", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-gomail-8", "dataset": "agisdk-real", "query": "Clear all emails from \"GitHub\" in the inbox to trash.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-8", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-8", "challenge_type": "action", "difficulty": "medium", "similar_to": "Gmail"}}}
{"query_id": "agisdk-dashdish-4", "dataset": "agisdk-real", "query": "Schedule a delivery order from \"Taco Bell\" adding a \"Classic Cheeseburger\" large size for later and add the note \"Leave at the front door\".", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-4", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Doordash"}}}
{"query_id": "agisdk-networkin-1", "dataset": "agisdk-real", "query": "Create a new text post for the feed with a professional update about AI trends in 2025, mentioning three key advancements and their impact on the job market.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-1", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-1", "challenge_type": "action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-dashdish-5", "dataset": "agisdk-real", "query": "Add three \"Loaded Bacon Cheese Fries\" to the shopping cart from \"Man vs. Fries\". Proceed to checkout and select \"Pickup\" as the delivery method.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-5", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Doordash"}}}
{"query_id": "agisdk-opendining-5", "dataset": "agisdk-real", "query": "Scroll through the homepage carousel until \"Ocean Breeze\" is visible, select the second available time slot, and complete the reservation. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-5", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-5", "challenge_type": "action", "difficulty": "medium", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-topwork-1", "dataset": "agisdk-real", "query": "Create a new job post for a Frontend Developer with expertise in React and TypeScript, specifying project details such as estimated duration, required skills, and budget.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-topwork.vercel.app", "metadata": {"original_task_id": "topwork-1", "website": "TopWork", "category": "agisdk-real", "additional": {"agisdk_task_id": "topwork-1", "challenge_type": "action", "difficulty": "medium", "similar_to": "Upwork"}}}
{"query_id": "agisdk-gocalendar-1", "dataset": "agisdk-real", "query": "Create a new event titled \"Team Meeting\" on July 19, 2024, from 2 PM to 2:30 PM, and include \"Conference Room A\" as the location", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-1", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-1", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-gomail-5", "dataset": "agisdk-real", "query": "Schedule an email to jane.doe@example.com with the subject \"Weekly Update\" to be sent next Monday at 9:00 AM.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-5", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-5", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Gmail"}}}
{"query_id": "agisdk-staynb-4", "dataset": "agisdk-real", "query": "Book a stay for 2 children with 1 adult. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-4", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-4", "challenge_type": "action", "difficulty": "medium", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-omnizon-8", "dataset": "agisdk-real", "query": "Search for \"Automatic Espresso Machine,\" click on the cheapest one, change the quantity to 5, use \"buy now\" to purchase them and complete the checkout.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-omnizon.vercel.app", "metadata": {"original_task_id": "omnizon-8", "website": "Omnizon", "category": "agisdk-real", "additional": {"agisdk_task_id": "omnizon-8", "challenge_type": "retrieval-action", "difficulty": "easy", "similar_to": "Amazon"}}}
{"query_id": "agisdk-networkin-6", "dataset": "agisdk-real", "query": "Choose a random person who you haven't connected with, connect with them, and send them a message saying, 'howdy, partner'.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-6", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-dashdish-2", "dataset": "agisdk-real", "query": "Add a \"Medium Pepperoni Pizza\" from the restaurant \"Papa Johns Pizza\" to the shopping cart and purchase it.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-dashdish.vercel.app", "metadata": {"original_task_id": "dashdish-2", "website": "DashDish", "category": "agisdk-real", "additional": {"agisdk_task_id": "dashdish-2", "challenge_type": "action", "difficulty": "easy", "similar_to": "Doordash"}}}
{"query_id": "agisdk-staynb-8", "dataset": "agisdk-real", "query": "Scroll through the homepage and book the last stay located in Paris.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-8", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-8", "challenge_type": "retrieval-action", "difficulty": "medium", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-omnizon-4", "dataset": "agisdk-real", "query": "Search for a \"Marshall Emberton II Portable Bluetooth Speaker\" and add it to your cart, then search for the \"Michael Kors Oversized Slim Runway Men's Watch,\" add it to the cart, and complete the checkout with both items.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-omnizon.vercel.app", "metadata": {"original_task_id": "omnizon-4", "website": "Omnizon", "category": "agisdk-real", "additional": {"agisdk_task_id": "omnizon-4", "challenge_type": "action", "difficulty": "hard", "similar_to": "Amazon"}}}
{"query_id": "agisdk-gomail-2", "dataset": "agisdk-real", "query": "Mark the first email in the Inbox as \"read\".", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-2", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-2", "challenge_type": "action", "difficulty": "easy", "similar_to": "Gmail"}}}
{"query_id": "agisdk-networkin-10", "dataset": "agisdk-real", "query": "Generate a polite follow-up message for a previous unanswered chat, starting with \"Following up on\".", "graders": ["agisdk_state_diff"], "start_url": "https://evals-networkin.vercel.app", "metadata": {"original_task_id": "networkin-10", "website": "Networkin", "category": "agisdk-real", "additional": {"agisdk_task_id": "networkin-10", "challenge_type": "action", "difficulty": "medium", "similar_to": "LinkedIn"}}}
{"query_id": "agisdk-gomail-3", "dataset": "agisdk-real", "query": "Compose a new email to jonathan.smith@example.com with the subject \"Meeting Notes\" and body \"Please find the meeting notes attached.\"", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gomail.vercel.app", "metadata": {"original_task_id": "gomail-3", "website": "GoMail", "category": "agisdk-real", "additional": {"agisdk_task_id": "gomail-3", "challenge_type": "action", "difficulty": "easy", "similar_to": "Gmail"}}}
{"query_id": "agisdk-udriver-6", "dataset": "agisdk-real", "query": "Me and 4 friends need a ride from the Palace Hotel to dinner at Osha Thai leaving now", "graders": ["agisdk_state_diff"], "start_url": "https://evals-udriver.vercel.app", "metadata": {"original_task_id": "udriver-6", "website": "UDriver", "category": "agisdk-real", "additional": {"agisdk_task_id": "udriver-6", "challenge_type": "action", "difficulty": "hard", "similar_to": "Uber"}}}
{"query_id": "agisdk-staynb-9", "dataset": "agisdk-real", "query": "Book a stay with the maximum number of guests supported. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-9", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-9", "challenge_type": "action", "difficulty": "hard", "similar_to": "Airbnb"}}}
{"query_id": "agisdk-zilloft-3", "dataset": "agisdk-real", "query": "Find a home in San Diego priced under $150,000 with at least 2 bedrooms and request a tour. Use these details: Contact Name: John Doe, Email: johndoe@example.com, Phone: 555-123-4567, Tour Time: 2:00 PM, Tour Date: First available.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-zilloft.vercel.app", "metadata": {"original_task_id": "zilloft-3", "website": "Zilloft", "category": "agisdk-real", "additional": {"agisdk_task_id": "zilloft-3", "challenge_type": "retrieval-action", "difficulty": "easy", "similar_to": "Zillow"}}}
{"query_id": "agisdk-fly-unified-6", "dataset": "agisdk-real", "query": "Reserve me a seat for the flight from Austin to Pittsburgh departing on December 11th, 2024 at 8:00 in Basic Economy.\nPassenger: Alice Brown\nDate of Birth: 05/20/1992\nSex: Female\nSeat Selection: Yes (Aisle seat)\nPayment: Credit Card (378342143523967), Exp: 09/27, security code: 332 Address: 789 Pine St, Los Angeles, CA, 90012, USA, Phone: 555-456-7890, Email: alicebrown@example.com.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-fly-unified.vercel.app", "metadata": {"original_task_id": "fly-unified-6", "website": "Fly Unified", "category": "agisdk-real", "additional": {"agisdk_task_id": "fly-unified-6", "challenge_type": "action", "difficulty": "medium", "similar_to": "United Airlines"}}}
{"query_id": "agisdk-opendining-3", "dataset": "agisdk-real", "query": "Book a table at \"The Royal Dine\" for a party of 4 on July 20, 2024, at 7 PM. For fields you don't know the answer for, just fill them in with anything of your choice.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-opendining.vercel.app", "metadata": {"original_task_id": "opendining-3", "website": "OpenDining", "category": "agisdk-real", "additional": {"agisdk_task_id": "opendining-3", "challenge_type": "action", "difficulty": "easy", "similar_to": "OpenTable"}}}
{"query_id": "agisdk-omnizon-9", "dataset": "agisdk-real", "query": "Search for \"PlayStation DualSense\", purchase it using the \"buy now\" button after opening the first result and change the default payment method to:\nname: Jack Fulton\ncard number: 9231 3432 8927 7764\nexp date: 1/2029\nsecurity code: 128\n before placing your order. ", "graders": ["agisdk_state_diff"], "start_url": "https://evals-omnizon.vercel.app", "metadata": {"original_task_id": "omnizon-9", "website": "Omnizon", "category": "agisdk-real", "additional": {"agisdk_task_id": "omnizon-9", "challenge_type": "action", "difficulty": "hard", "similar_to": "Amazon"}}}
{"query_id": "agisdk-gocalendar-7", "dataset": "agisdk-real", "query": "Reschedule the \"Morning Coffee with sister\" event from July 18, 2024, at 9 AM to July 19, 2024, at 10AM using drag-and-drop functionality", "graders": ["agisdk_state_diff"], "start_url": "https://evals-gocalendar.vercel.app", "metadata": {"original_task_id": "gocalendar-7", "website": "GoCalendar", "category": "agisdk-real", "additional": {"agisdk_task_id": "gocalendar-7", "challenge_type": "action", "difficulty": "medium", "similar_to": "Google Calendar"}}}
{"query_id": "agisdk-staynb-5", "dataset": "agisdk-real", "query": "Use the search bar to look for a stay. For the \"Where\" section, use the \"Search by region\" popover and select \"Europe\". Set the check-in date to October 13th and the check-out date to October 23rd. For the \"Who\" section, select 1 infant, 2 children, and 2 adults. Press the search button, select the first stay, and book it.", "graders": ["agisdk_state_diff"], "start_url": "https://evals-staynb.vercel.app", "metadata": {"original_task_id": "staynb-5", "website": "StayNB", "category": "agisdk-real", "additional": {"agisdk_task_id": "staynb-5", "challenge_type": "action", "difficulty": "medium", "similar_to": "Airbnb"}}}

View File

@@ -0,0 +1,50 @@
{"query_id": "infinity-elation-prescriptions-task_h69", "dataset": "webarena-infinity", "query": "Approve all pending refill requests except for any medication that is involved in a major drug-drug interaction with another of the patient's active medications. Deny those with the reason 'Drug interaction \u2014 needs provider review before renewal'.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h69", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h69.py", "app_base_port": 8020}}}
{"query_id": "infinity-elation-clinical-records-task_h52", "dataset": "webarena-infinity", "query": "Add the document tag 'Provider-Reviewed' to every visit note template that was created by the current logged-in provider. Do not modify templates created by other providers.", "graders": ["infinity_state"], "start_url": "http://localhost:8000", "metadata": {"original_task_id": "elation-clinical-records-task_h52", "website": "elation-clinical-records", "category": "webarena-infinity", "additional": {"app_name": "elation-clinical-records", "difficulty": "hard", "verifier_path": "real-tasks/task_h52.py", "app_base_port": 8000}}}
{"query_id": "infinity-gmail-accounts-and-contacts-task_h44", "dataset": "webarena-infinity", "query": "Your sister's husband is one of your contacts. Find him, star his entry, and add the Friends label.", "graders": ["infinity_state"], "start_url": "http://localhost:8070", "metadata": {"original_task_id": "gmail-accounts-and-contacts-task_h44", "website": "gmail-accounts-and-contacts", "category": "webarena-infinity", "additional": {"app_name": "gmail-accounts-and-contacts", "difficulty": "hard", "verifier_path": "real-tasks/task_h44.py", "app_base_port": 8070}}}
{"query_id": "infinity-gmail-task_h2", "dataset": "webarena-infinity", "query": "Update the Datadog alerts filter to also archive matching emails and forward them to priya.sharma@cloudnine.dev instead of nate.patel@devops.tools.", "graders": ["infinity_state"], "start_url": "http://localhost:8060", "metadata": {"original_task_id": "gmail-task_h2", "website": "gmail", "category": "webarena-infinity", "additional": {"app_name": "gmail", "difficulty": "hard", "verifier_path": "real-tasks/task_h2.py", "app_base_port": 8060}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h58", "dataset": "webarena-infinity", "query": "The Performance Initiative epic has two child epics. For the child epic with more open issues, set the weight of every issue in it to 13. For the other child epic, close all its open issues.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h58", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h58.py", "app_base_port": 8050}}}
{"query_id": "infinity-figma-slides-task_h46", "dataset": "webarena-infinity", "query": "There are two slides with tables in the deck. Lock the table that compares competitors, and change the font size to 16 on the table that tracks quarterly feature adoption.", "graders": ["infinity_state"], "start_url": "http://localhost:8030", "metadata": {"original_task_id": "figma-slides-task_h46", "website": "figma-slides", "category": "webarena-infinity", "additional": {"app_name": "figma-slides", "difficulty": "hard", "verifier_path": "real-tasks/task_h46.py", "app_base_port": 8030}}}
{"query_id": "infinity-elation-prescriptions-task_h50", "dataset": "webarena-infinity", "query": "Deny the pending refill for the patient's cholesterol medication because his lipid panel is overdue. Then deny the Lisinopril refill as well \u2014 he needs a follow-up blood pressure check first.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h50", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h50.py", "app_base_port": 8020}}}
{"query_id": "infinity-elation-prescriptions-task_h19", "dataset": "webarena-infinity", "query": "Discontinue the Omeprazole and prescribe Famotidine 20mg tablet twice daily as a replacement for GERD \u2014 qty 60, 3 refills, send to CVS #4521.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h19", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h19.py", "app_base_port": 8020}}}
{"query_id": "infinity-paypal-my-wallet-task_h25", "dataset": "webarena-infinity", "query": "Convert all of my Australian dollars to euros.", "graders": ["infinity_state"], "start_url": "http://localhost:8100", "metadata": {"original_task_id": "paypal-my-wallet-task_h25", "website": "paypal-my-wallet", "category": "webarena-infinity", "additional": {"app_name": "paypal-my-wallet", "difficulty": "hard", "verifier_path": "real-tasks/task_h25.py", "app_base_port": 8100}}}
{"query_id": "infinity-elation-clinical-records-task_h66", "dataset": "webarena-infinity", "query": "Create a new template called 'Anxiety Management' with HPI and Assessment sections, and billing code 99213 with description 'Office visit, established, low complexity'. Then create a visit note for Emily Nakamura using that new template and the Telehealth category, add a Psychological Status block to the note, and sign it.", "graders": ["infinity_state"], "start_url": "http://localhost:8000", "metadata": {"original_task_id": "elation-clinical-records-task_h66", "website": "elation-clinical-records", "category": "webarena-infinity", "additional": {"app_name": "elation-clinical-records", "difficulty": "hard", "verifier_path": "real-tasks/task_h66.py", "app_base_port": 8000}}}
{"query_id": "infinity-elation-clinical-records-task_h62", "dataset": "webarena-infinity", "query": "Look up which template is assigned to the COVID Vaccine appointment type. Remove all its existing document tags and replace them with the single tag 'COVID-Protocol'. Then also assign that same template to the Urgent Same-Day appointment type.", "graders": ["infinity_state"], "start_url": "http://localhost:8000", "metadata": {"original_task_id": "elation-clinical-records-task_h62", "website": "elation-clinical-records", "category": "webarena-infinity", "additional": {"app_name": "elation-clinical-records", "difficulty": "hard", "verifier_path": "real-tasks/task_h62.py", "app_base_port": 8000}}}
{"query_id": "infinity-elation-prescriptions-task_h32", "dataset": "webarena-infinity", "query": "The patient has a medication that's being dispensed as written (brand name only). Discontinue that prescription and replace it with a new one \u2014 same medication, same sig, same pharmacy \u2014 but allow generic substitution this time. Qty 30, 3 refills, 30 days supply.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h32", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h32.py", "app_base_port": 8020}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h48", "dataset": "webarena-infinity", "query": "Add the 'breaking-change' label to every open issue in the API v3 Migration epic and remove any existing workflow-scoped labels from those issues.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h48", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h48.py", "app_base_port": 8050}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h77", "dataset": "webarena-infinity", "query": "Rename the 'UX' label to 'user-experience', change its type to 'group', and then add it to every open issue in the Frontend Modernization epic that doesn't already have it.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h77", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h77.py", "app_base_port": 8050}}}
{"query_id": "infinity-xero-invoicing-task_h15", "dataset": "webarena-infinity", "query": "Create a new invoice for Summit Health Group for an annual software license and 12 months of support with a 10% discount on support.", "graders": ["infinity_state"], "start_url": "http://localhost:8120", "metadata": {"original_task_id": "xero-invoicing-task_h15", "website": "xero-invoicing", "category": "webarena-infinity", "additional": {"app_name": "xero-invoicing", "difficulty": "hard", "verifier_path": "real-tasks/task_h15.py", "app_base_port": 8120}}}
{"query_id": "infinity-elation-clinical-records-task_h55", "dataset": "webarena-infinity", "query": "Resolve every problem across all patients in the system that currently has a status of Controlled.", "graders": ["infinity_state"], "start_url": "http://localhost:8000", "metadata": {"original_task_id": "elation-clinical-records-task_h55", "website": "elation-clinical-records", "category": "webarena-infinity", "additional": {"app_name": "elation-clinical-records", "difficulty": "hard", "verifier_path": "real-tasks/task_h55.py", "app_base_port": 8000}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h8", "dataset": "webarena-infinity", "query": "Create a confidential issue titled 'Emergency security patch' with priority::critical and the 'security' label, assigned to James O'Brien and Oliver Schmidt, with weight 2 in the Security Hardening milestone.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h8", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h8.py", "app_base_port": 8050}}}
{"query_id": "infinity-paypal-my-wallet-task_h20", "dataset": "webarena-infinity", "query": "Make a $200 payment on PayPal Credit and change autopay to pay the full balance.", "graders": ["infinity_state"], "start_url": "http://localhost:8100", "metadata": {"original_task_id": "paypal-my-wallet-task_h20", "website": "paypal-my-wallet", "category": "webarena-infinity", "additional": {"app_name": "paypal-my-wallet", "difficulty": "hard", "verifier_path": "real-tasks/task_h20.py", "app_base_port": 8100}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h52", "dataset": "webarena-infinity", "query": "Create a new board called 'Performance Tracker' with lists for the priority::critical, priority::high, and priority::medium labels. Then add the 'priority::high' label to every open issue in the v4.1 milestone that has the 'performance' label.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h52", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h52.py", "app_base_port": 8050}}}
{"query_id": "infinity-paypal-my-wallet-task_h80", "dataset": "webarena-infinity", "query": "Save all available Food & Drink offers, buy a $25 DoorDash gift card for yourself, and switch currency conversion to use my card issuer.", "graders": ["infinity_state"], "start_url": "http://localhost:8100", "metadata": {"original_task_id": "paypal-my-wallet-task_h80", "website": "paypal-my-wallet", "category": "webarena-infinity", "additional": {"app_name": "paypal-my-wallet", "difficulty": "hard", "verifier_path": "real-tasks/task_h80.py", "app_base_port": 8100}}}
{"query_id": "infinity-gmail-accounts-and-contacts-task_h50", "dataset": "webarena-infinity", "query": "Add the Emergency label to every contact who is currently listed as a delegate (active, pending, or expired). Then remove all delegates whose status is not 'active'.", "graders": ["infinity_state"], "start_url": "http://localhost:8070", "metadata": {"original_task_id": "gmail-accounts-and-contacts-task_h50", "website": "gmail-accounts-and-contacts", "category": "webarena-infinity", "additional": {"app_name": "gmail-accounts-and-contacts", "difficulty": "hard", "verifier_path": "real-tasks/task_h50.py", "app_base_port": 8070}}}
{"query_id": "infinity-elation-clinical-records-task_h14", "dataset": "webarena-infinity", "query": "Add the tag 'Flu-Season' to every patient whose primary provider is Dr. Sarah Chen.", "graders": ["infinity_state"], "start_url": "http://localhost:8000", "metadata": {"original_task_id": "elation-clinical-records-task_h14", "website": "elation-clinical-records", "category": "webarena-infinity", "additional": {"app_name": "elation-clinical-records", "difficulty": "hard", "verifier_path": "real-tasks/task_h14.py", "app_base_port": 8000}}}
{"query_id": "infinity-figma-text-and-typography-task_h7", "dataset": "webarena-infinity", "query": "Remove all list formatting from every layer.", "graders": ["infinity_state"], "start_url": "http://localhost:8040", "metadata": {"original_task_id": "figma-text-and-typography-task_h7", "website": "figma-text-and-typography", "category": "webarena-infinity", "additional": {"app_name": "figma-text-and-typography", "difficulty": "hard", "verifier_path": "real-tasks/task_h7.py", "app_base_port": 8040}}}
{"query_id": "infinity-paypal-my-wallet-task_h26", "dataset": "webarena-infinity", "query": "Send a $50 Amazon gift card to sarah.chen@email.com with 'Thank you!' as the message, and save the Amazon cashback offer.", "graders": ["infinity_state"], "start_url": "http://localhost:8100", "metadata": {"original_task_id": "paypal-my-wallet-task_h26", "website": "paypal-my-wallet", "category": "webarena-infinity", "additional": {"app_name": "paypal-my-wallet", "difficulty": "hard", "verifier_path": "real-tasks/task_h26.py", "app_base_port": 8100}}}
{"query_id": "infinity-handshake-career-exploration-task_h97", "dataset": "webarena-infinity", "query": "Find the single most helpful answer across all Q&A questions and mark it helpful. Then find the most-viewed question and submit your own answer to it.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h97", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h97.py", "app_base_port": 8080}}}
{"query_id": "infinity-figma-slides-task_h79", "dataset": "webarena-infinity", "query": "In the adoption table, find the feature with the highest Target Q4 percentage. In the competitive table, change DesignCraft's entry for that same feature to 'Market Leader'. Then update that feature's Target Q4 to '95%'.", "graders": ["infinity_state"], "start_url": "http://localhost:8030", "metadata": {"original_task_id": "figma-slides-task_h79", "website": "figma-slides", "category": "webarena-infinity", "additional": {"app_name": "figma-slides", "difficulty": "hard", "verifier_path": "real-tasks/task_h79.py", "app_base_port": 8030}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h41", "dataset": "webarena-infinity", "query": "For every open issue in the v4.2 - Security Hardening milestone: if it is already confidential, set its health status to 'at risk'. If it is not confidential, make it confidential and set its health status to 'needs attention'.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h41", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h41.py", "app_base_port": 8050}}}
{"query_id": "infinity-handshake-career-exploration-task_h90", "dataset": "webarena-infinity", "query": "A student in the feed mentioned attending the NSBE conference. That student also answered a Q&A question about diversity programs in tech. Submit your own answer to that same question sharing your experience, then bookmark that student's feed post.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h90", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h90.py", "app_base_port": 8080}}}
{"query_id": "infinity-elation-prescriptions-task_h30", "dataset": "webarena-infinity", "query": "The patient has three temporary medications. Discontinue the corticosteroid taper and the penicillin antibiotic \u2014 the patient completed both courses. Move the remaining temporary medication to permanent Rx.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h30", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h30.py", "app_base_port": 8020}}}
{"query_id": "infinity-linear-account-settings-task_h19", "dataset": "webarena-infinity", "query": "Turn off all desktop application settings: open in desktop app, notification badge, and spell check.", "graders": ["infinity_state"], "start_url": "http://localhost:8090", "metadata": {"original_task_id": "linear-account-settings-task_h19", "website": "linear-account-settings", "category": "webarena-infinity", "additional": {"app_name": "linear-account-settings", "difficulty": "hard", "verifier_path": "real-tasks/task_h19.py", "app_base_port": 8090}}}
{"query_id": "infinity-elation-prescriptions-task_h39", "dataset": "webarena-infinity", "query": "Change the default pharmacy to Express Scripts Mail Pharmacy for mail-order prescriptions. Then document that the patient takes Magnesium Citrate 400mg tablet as an OTC supplement \u2014 once daily at bedtime, 30-day supply.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h39", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h39.py", "app_base_port": 8020}}}
{"query_id": "infinity-handshake-career-exploration-task_h136", "dataset": "webarena-infinity", "query": "Your earliest completed appointment was a specific type. Schedule a follow-up appointment of the same category and type with the same staff member, for March 28, 2026 at 9:00 AM, in person.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h136", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h136.py", "app_base_port": 8080}}}
{"query_id": "infinity-handshake-career-exploration-task_h105", "dataset": "webarena-infinity", "query": "Find the second-most-viewed question in Q&A. It has two answers \u2014 mark the one with fewer helpful votes as helpful.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h105", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h105.py", "app_base_port": 8080}}}
{"query_id": "infinity-gmail-accounts-and-contacts-task_h22", "dataset": "webarena-infinity", "query": "The Engineering Manager at TechCorp is listed as one of your delegates. Remove her delegation and unstar her contact.", "graders": ["infinity_state"], "start_url": "http://localhost:8070", "metadata": {"original_task_id": "gmail-accounts-and-contacts-task_h22", "website": "gmail-accounts-and-contacts", "category": "webarena-infinity", "additional": {"app_name": "gmail-accounts-and-contacts", "difficulty": "hard", "verifier_path": "real-tasks/task_h22.py", "app_base_port": 8070}}}
{"query_id": "infinity-elation-patient-communication-task_h9", "dataset": "webarena-infinity", "query": "Acknowledge all unacknowledged reminders in the system.", "graders": ["infinity_state"], "start_url": "http://localhost:8010", "metadata": {"original_task_id": "elation-patient-communication-task_h9", "website": "elation-patient-communication", "category": "webarena-infinity", "additional": {"app_name": "elation-patient-communication", "difficulty": "hard", "verifier_path": "real-tasks/task_h9.py", "app_base_port": 8010}}}
{"query_id": "infinity-superhuman-general-task_h1", "dataset": "webarena-infinity", "query": "Label the FinancePlus partnership email and the QuantumLab prototype email as 'Clients'.", "graders": ["infinity_state"], "start_url": "http://localhost:8110", "metadata": {"original_task_id": "superhuman-general-task_h1", "website": "superhuman-general", "category": "webarena-infinity", "additional": {"app_name": "superhuman-general", "difficulty": "hard", "verifier_path": "real-tasks/task_h1.py", "app_base_port": 8110}}}
{"query_id": "infinity-xero-invoicing-task_h79", "dataset": "webarena-infinity", "query": "Change the invoice prefix to 'AUS-' and the next number to 100, then create a new invoice for CloudNine Analytics for 8 hours of UI/UX design work.", "graders": ["infinity_state"], "start_url": "http://localhost:8120", "metadata": {"original_task_id": "xero-invoicing-task_h79", "website": "xero-invoicing", "category": "webarena-infinity", "additional": {"app_name": "xero-invoicing", "difficulty": "hard", "verifier_path": "real-tasks/task_h79.py", "app_base_port": 8120}}}
{"query_id": "infinity-figma-slides-task_h16", "dataset": "webarena-infinity", "query": "Enable slide numbers on every slide using the 'with total' format and change the aspect ratio to 4:3.", "graders": ["infinity_state"], "start_url": "http://localhost:8030", "metadata": {"original_task_id": "figma-slides-task_h16", "website": "figma-slides", "category": "webarena-infinity", "additional": {"app_name": "figma-slides", "difficulty": "hard", "verifier_path": "real-tasks/task_h16.py", "app_base_port": 8030}}}
{"query_id": "infinity-linear-account-settings-task_h16", "dataset": "webarena-infinity", "query": "Revoke all API keys that have an expiration date.", "graders": ["infinity_state"], "start_url": "http://localhost:8090", "metadata": {"original_task_id": "linear-account-settings-task_h16", "website": "linear-account-settings", "category": "webarena-infinity", "additional": {"app_name": "linear-account-settings", "difficulty": "hard", "verifier_path": "real-tasks/task_h16.py", "app_base_port": 8090}}}
{"query_id": "infinity-elation-prescriptions-task_h2", "dataset": "webarena-infinity", "query": "Prescribe Buspirone 10mg for the patient's anxiety \u2014 once daily in the morning, qty 30, 5 refills. Send it to the same pharmacy that fills his Sertraline.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h2", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h2.py", "app_base_port": 8020}}}
{"query_id": "infinity-handshake-career-exploration-task_h1", "dataset": "webarena-infinity", "query": "Follow all consulting firms on Handshake.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h1", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h1.py", "app_base_port": 8080}}}
{"query_id": "infinity-handshake-career-exploration-task_h141", "dataset": "webarena-infinity", "query": "Some of your saved jobs are from employers you haven't followed yet. Find and follow each of those employers.", "graders": ["infinity_state"], "start_url": "http://localhost:8080", "metadata": {"original_task_id": "handshake-career-exploration-task_h141", "website": "handshake-career-exploration", "category": "webarena-infinity", "additional": {"app_name": "handshake-career-exploration", "difficulty": "hard", "verifier_path": "real-tasks/task_h141.py", "app_base_port": 8080}}}
{"query_id": "infinity-figma-text-and-typography-task_h74", "dataset": "webarena-infinity", "query": "Set the spelling language to Japanese, the big nudge amount to 50, and the default horizontal alignment to right.", "graders": ["infinity_state"], "start_url": "http://localhost:8040", "metadata": {"original_task_id": "figma-text-and-typography-task_h74", "website": "figma-text-and-typography", "category": "webarena-infinity", "additional": {"app_name": "figma-text-and-typography", "difficulty": "hard", "verifier_path": "real-tasks/task_h74.py", "app_base_port": 8040}}}
{"query_id": "infinity-elation-patient-communication-task_h63", "dataset": "webarena-infinity", "query": "Check the visit summaries to find the patient whose BNP level improved. Reply to their most recent message confirming they can resume light activity, then update their emergency contact's phone number to (650) 555-0001.", "graders": ["infinity_state"], "start_url": "http://localhost:8010", "metadata": {"original_task_id": "elation-patient-communication-task_h63", "website": "elation-patient-communication", "category": "webarena-infinity", "additional": {"app_name": "elation-patient-communication", "difficulty": "hard", "verifier_path": "real-tasks/task_h63.py", "app_base_port": 8010}}}
{"query_id": "infinity-elation-patient-communication-task_h14", "dataset": "webarena-infinity", "query": "Change Dr. Torres's notification timeframe to 'Do not notify me' and remove Dr. Torres from Dr. Chen's General Question routing.", "graders": ["infinity_state"], "start_url": "http://localhost:8010", "metadata": {"original_task_id": "elation-patient-communication-task_h14", "website": "elation-patient-communication", "category": "webarena-infinity", "additional": {"app_name": "elation-patient-communication", "difficulty": "hard", "verifier_path": "real-tasks/task_h14.py", "app_base_port": 8010}}}
{"query_id": "infinity-gitlab-plan-and-track-task_h67", "dataset": "webarena-infinity", "query": "Delete all time entries from the GraphQL gateway issue, add a single new entry of 16 hours with summary 'Complete rewrite estimate', and set its time estimate to 40 hours.", "graders": ["infinity_state"], "start_url": "http://localhost:8050", "metadata": {"original_task_id": "gitlab-plan-and-track-task_h67", "website": "gitlab-plan-and-track", "category": "webarena-infinity", "additional": {"app_name": "gitlab-plan-and-track", "difficulty": "hard", "verifier_path": "real-tasks/task_h67.py", "app_base_port": 8050}}}
{"query_id": "infinity-gmail-accounts-and-contacts-task_h73", "dataset": "webarena-infinity", "query": "Among the individual people in your other contacts (those with a first and last name), find the one who was saved most recently. Move them to your main contacts, set their company to 'Salesforce', job title to 'Account Executive', and add the Work label.", "graders": ["infinity_state"], "start_url": "http://localhost:8070", "metadata": {"original_task_id": "gmail-accounts-and-contacts-task_h73", "website": "gmail-accounts-and-contacts", "category": "webarena-infinity", "additional": {"app_name": "gmail-accounts-and-contacts", "difficulty": "hard", "verifier_path": "real-tasks/task_h73.py", "app_base_port": 8070}}}
{"query_id": "infinity-elation-prescriptions-task_h4", "dataset": "webarena-infinity", "query": "Run a medication reconciliation and mark the Calcium+D3 supplement for discontinuation during the review.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h4", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h4.py", "app_base_port": 8020}}}
{"query_id": "infinity-elation-prescriptions-task_h47", "dataset": "webarena-infinity", "query": "The patient's SSRI is currently dispensed at a different pharmacy than most of his other medications. Prescribe a refill of the same SSRI at the same dose and sig, but send it to CVS #4521 instead \u2014 qty 30, 5 refills, 30 days supply.", "graders": ["infinity_state"], "start_url": "http://localhost:8020", "metadata": {"original_task_id": "elation-prescriptions-task_h47", "website": "elation-prescriptions", "category": "webarena-infinity", "additional": {"app_name": "elation-prescriptions", "difficulty": "hard", "verifier_path": "real-tasks/task_h47.py", "app_base_port": 8020}}}
{"query_id": "infinity-paypal-my-wallet-task_h89", "dataset": "webarena-infinity", "query": "If your USD PayPal balance is above $2,500, convert $500 to Japanese Yen. If it is $2,500 or below, first add $500 from your Chase bank account, then convert $500 to JPY. Either way, set the debit card cash back category to Fuel.", "graders": ["infinity_state"], "start_url": "http://localhost:8100", "metadata": {"original_task_id": "paypal-my-wallet-task_h89", "website": "paypal-my-wallet", "category": "webarena-infinity", "additional": {"app_name": "paypal-my-wallet", "difficulty": "hard", "verifier_path": "real-tasks/task_h89.py", "app_base_port": 8100}}}

View File

@@ -0,0 +1,88 @@
#!/usr/bin/env python3
"""
AGI SDK evaluation helper for BrowserOS eval framework.
Reads JSON from stdin with task_id and env_state, runs the agisdk
evaluator, and outputs the result as JSON to stdout.
Input format:
{"task_id": "dashdish-1", "env_state": {...}, "model_response": ""}
Output format:
{"reward": 0.0, "pass": false, "message": "...", "per_criterion": [...]}
"""
import json
import sys
def main():
data = json.loads(sys.stdin.read())
task_id = data["task_id"]
env_state = data["env_state"]
model_response = data.get("model_response", "")
try:
from agisdk.REAL.browsergym.webclones.evaluate import WebCloneEvaluator
from agisdk.REAL.browsergym.webclones.task_config import TaskConfig
except ImportError:
print(
json.dumps(
{
"reward": 0,
"pass": False,
"message": "agisdk package not installed. Run: pip install agisdk",
"per_criterion": [],
}
)
)
sys.exit(0)
try:
# Redirect stdout to stderr during evaluation — agisdk's rich logger
# prints directly to stdout, which would corrupt our JSON output
real_stdout = sys.stdout
sys.stdout = sys.stderr
tc = TaskConfig(task_id)
evaluator = WebCloneEvaluator(tc)
reward_val, _done, message, info = evaluator.evaluate(
env_state=env_state, model_response=model_response
)
sys.stdout = real_stdout
reward_val = float(reward_val) if reward_val is not None else 0.0
results = info.get("results", [])
per_criterion = [
{"passed": r[0], "detail": str(r[1]) if len(r) > 1 else ""}
for r in results
]
print(
json.dumps(
{
"reward": reward_val,
"pass": reward_val == 1.0,
"message": str(message),
"per_criterion": per_criterion,
}
)
)
except Exception as e:
sys.stdout = real_stdout if "real_stdout" in dir() else sys.__stdout__
print(
json.dumps(
{
"reward": 0,
"pass": False,
"message": f"Evaluation error: {str(e)}",
"per_criterion": [],
}
)
)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,83 @@
#!/usr/bin/env python3
"""
Build JSONL dataset for AGI SDK / REAL Bench evaluation.
Reads task definitions from the agisdk package, filters to feasible
action-only tasks (excludes llm_boolean evaluators), and outputs JSONL
to stdout in the BrowserOS eval framework format.
Usage:
python scripts/build-agisdk-dataset.py > data/agisdk-real.jsonl
"""
import json
import sys
def has_llm_eval(task: dict) -> bool:
return any(e.get("type") == "llm_boolean" for e in task.get("evals", []))
def main():
try:
from agisdk.REAL.tasks import all_tasks
except ImportError:
print(
"Error: agisdk package not installed. Run: pip install agisdk",
file=sys.stderr,
)
sys.exit(1)
count = 0
skipped_infeasible = 0
skipped_llm = 0
for task in all_tasks:
if not task.get("possible", True):
skipped_infeasible += 1
continue
if has_llm_eval(task):
skipped_llm += 1
continue
task_id = task["id"]
website = task.get("website", {})
goal = task.get("goal", "")
start_url = website.get("url", "")
if not start_url or not goal:
print(f"Warning: Skipping {task_id} — missing url or goal", file=sys.stderr)
continue
entry = {
"query_id": f"agisdk-{task_id}",
"dataset": "agisdk-real",
"query": goal,
"graders": ["agisdk_state_diff"],
"start_url": start_url,
"metadata": {
"original_task_id": task_id,
"website": website.get("name", ""),
"category": "agisdk-real",
"additional": {
"agisdk_task_id": task_id,
"challenge_type": task.get("challengeType", "action"),
"difficulty": task.get("difficulty", "unknown"),
"similar_to": website.get("similarTo", ""),
},
},
}
print(json.dumps(entry))
count += 1
print(
f"Generated {count} tasks (skipped {skipped_infeasible} infeasible, "
f"{skipped_llm} llm_boolean)",
file=sys.stderr,
)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,118 @@
#!/usr/bin/env python3
"""
Dataset generator for WebArena-Infinity benchmark.
Reads real-tasks.json from each app directory and outputs JSONL
in the eval framework's TaskSchema format.
Usage:
python build-infinity-dataset.py --apps-dir /path/to/webarena-infinity/apps
python build-infinity-dataset.py --apps-dir /path/to/apps --apps gmail linear --difficulty medium
"""
import argparse
import json
import os
import sys
def load_tasks(app_dir: str, app_name: str) -> list[dict]:
tasks_file = os.path.join(app_dir, "real-tasks.json")
if not os.path.exists(tasks_file):
print(f"Warning: No real-tasks.json found in {app_dir}", file=sys.stderr)
return []
with open(tasks_file) as f:
return json.load(f)
def build_task_entry(
app_name: str,
task: dict,
base_port: int,
) -> dict:
task_id = task.get("id", task.get("task_id", "unknown"))
difficulty = task.get("difficulty", "unknown")
query = task.get("query", task.get("instruction", task.get("task", "")))
verifier_path = task.get(
"verify",
task.get("verifier_path", f"real-tasks/{task_id}.py"),
)
return {
"query_id": f"infinity-{app_name}-{task_id}",
"dataset": "webarena-infinity",
"query": query,
"graders": ["infinity_state"],
"start_url": f"http://localhost:{base_port}",
"setup_script": f"POST http://localhost:{base_port}/api/reset",
"metadata": {
"original_task_id": f"{app_name}-{task_id}",
"website": app_name,
"category": "webarena-infinity",
"additional": {
"app_name": app_name,
"difficulty": difficulty,
"verifier_path": verifier_path,
"app_port": base_port,
},
},
}
def main():
parser = argparse.ArgumentParser(
description="Generate JSONL dataset from WebArena-Infinity apps"
)
parser.add_argument(
"--apps-dir",
required=True,
help="Path to webarena-infinity/apps/ directory",
)
parser.add_argument(
"--apps",
nargs="*",
default=None,
help="Filter to specific app names (default: all)",
)
parser.add_argument(
"--difficulty",
choices=["easy", "medium", "hard"],
default=None,
help="Filter by difficulty tier",
)
parser.add_argument(
"--base-port",
type=int,
default=8000,
help="Starting port number for apps (default: 8000)",
)
args = parser.parse_args()
if not os.path.isdir(args.apps_dir):
print(f"Error: {args.apps_dir} is not a directory", file=sys.stderr)
sys.exit(1)
app_dirs = sorted(os.listdir(args.apps_dir))
if args.apps:
app_dirs = [d for d in app_dirs if d in args.apps]
port = args.base_port
for app_name in app_dirs:
app_path = os.path.join(args.apps_dir, app_name)
if not os.path.isdir(app_path):
continue
tasks = load_tasks(app_path, app_name)
for task in tasks:
difficulty = task.get("difficulty", "unknown")
if args.difficulty and difficulty != args.difficulty:
continue
entry = build_task_entry(app_name, task, port)
print(json.dumps(entry))
port += 1
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
Evaluation helper for WebArena-Infinity verifier scripts.
Reads JSON from stdin with app_server_url, verifier_path, and task_id.
Runs the verifier against the app server and outputs a JSON result.
Verifiers have the signature: verify(server_url: str) -> tuple[bool, str]
They fetch /api/state internally and return (passed, message).
Usage:
echo '{"app_server_url": "http://localhost:8000", "verifier_path": "/path/to/verify.py"}' | python infinity-evaluate.py
"""
import importlib.util
import json
import sys
import traceback
def load_verifier(verifier_path: str):
spec = importlib.util.spec_from_file_location("verifier", verifier_path)
if spec is None or spec.loader is None:
raise ImportError(f"Cannot load verifier from {verifier_path}")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
def main():
try:
data = json.loads(sys.stdin.read())
except json.JSONDecodeError as e:
print(json.dumps({"pass": False, "reward": 0.0, "message": f"Invalid JSON input: {e}"}))
sys.exit(1)
server_url = data.get("app_server_url", "")
verifier_path = data.get("verifier_path", "")
if not server_url or not verifier_path:
print(json.dumps({
"pass": False,
"reward": 0.0,
"message": "Missing app_server_url or verifier_path",
}))
sys.exit(1)
try:
verifier = load_verifier(verifier_path)
fn = getattr(verifier, "verify", None)
if not callable(fn):
raise AttributeError(
f"Verifier has no verify() function. "
f"Available: {[a for a in dir(verifier) if not a.startswith('_')]}"
)
# Verifiers take server_url and fetch state internally
result = fn(server_url)
# Return is tuple[bool, str]
if isinstance(result, tuple) and len(result) >= 2:
passed, message = result[0], str(result[1])
else:
passed, message = bool(result), str(result)
except Exception as e:
print(json.dumps({
"pass": False,
"reward": 0.0,
"message": f"Verifier error: {e}\n{traceback.format_exc()}",
}))
sys.exit(1)
print(json.dumps({
"pass": passed,
"reward": 1.0 if passed else 0.0,
"message": message,
}))
if __name__ == "__main__":
main()

View File

@@ -59,6 +59,8 @@ interface RunSummary {
}
const PASS_FAIL_GRADER_ORDER = [
'agisdk_state_diff',
'infinity_state',
'performance_grader',
'webvoyager_grader',
'fara_combined',

View File

@@ -0,0 +1,202 @@
import { spawn } from 'node:child_process'
import { join } from 'node:path'
import type { GraderResult } from '../../types'
import { callMcpTool } from '../../utils/mcp-client'
import type { Grader, GraderInput } from '../types'
const EVAL_SCRIPT = join(
import.meta.dirname,
'..',
'..',
'..',
'scripts',
'agisdk-evaluate.py',
)
export class AgisdkStateDiffGrader implements Grader {
name = 'agisdk_state_diff'
async grade(input: GraderInput): Promise<GraderResult> {
const taskId = this.extractTaskId(input.task.query_id)
const startUrl = this.extractStartUrl(input)
const mcpEndpoint =
input.mcpUrl ||
`${process.env.BROWSEROS_SERVER_URL || 'http://127.0.0.1:9110'}/mcp`
if (!startUrl) {
return {
score: 0,
pass: false,
reasoning: 'Could not determine clone site URL from task',
}
}
const origin = new URL(startUrl).origin
let envState: Record<string, unknown>
try {
envState = await this.fetchFinishState(origin, mcpEndpoint)
} catch (error) {
return {
score: 0,
pass: false,
reasoning: `Failed to fetch /finish endpoint: ${error instanceof Error ? error.message : String(error)}`,
details: { origin, error: true },
}
}
try {
const result = await this.runPythonEvaluator(
taskId,
envState,
input.finalAnswer || '',
)
return {
score: result.reward,
pass: result.pass,
reasoning:
result.message ||
(result.pass ? 'All criteria passed' : 'Some criteria failed'),
details: {
reward: result.reward,
per_criterion: result.per_criterion,
origin,
agisdk_task_id: taskId,
},
}
} catch (error) {
return {
score: 0,
pass: false,
reasoning: `Python evaluator error: ${error instanceof Error ? error.message : String(error)}`,
details: { error: true },
}
}
}
private extractTaskId(queryId: string): string {
return queryId.replace(/^agisdk-/, '')
}
private extractStartUrl(input: GraderInput): string | null {
// Derive from task_id: "dashdish-10" → "https://evals-dashdish.vercel.app"
// Task IDs are "{site}-{number}" where site may contain hyphens (e.g. "fly-unified-5")
const taskId = this.extractTaskId(input.task.query_id)
const siteId = taskId.replace(/-\d+$/, '')
if (siteId) return `https://evals-${siteId}.vercel.app`
// Fallback: search messages for vercel.app URLs
for (const msg of input.messages) {
const text =
msg.type === 'user'
? msg.content
: msg.type === 'tool-input-available'
? JSON.stringify(msg.input)
: ''
const urlMatch = text.match(/https?:\/\/[^\s"']+\.vercel\.app/)
if (urlMatch) return urlMatch[0]
}
return null
}
private async fetchFinishState(
origin: string,
mcpEndpoint: string,
): Promise<Record<string, unknown>> {
const finishUrl = `${origin}/finish`
// Navigate browser to /finish page (state diff is rendered client-side)
await callMcpTool(mcpEndpoint, 'navigate_page', {
url: finishUrl,
page: 1,
})
// Wait for the page to render, then extract JSON from <pre> element
const result = await callMcpTool(mcpEndpoint, 'evaluate_script', {
page: 1,
expression: `
new Promise((resolve, reject) => {
let attempts = 0;
const check = () => {
const pre = document.querySelector('pre');
if (pre && pre.textContent.trim().startsWith('{')) {
resolve(pre.textContent);
} else if (++attempts > 20) {
reject(new Error('Timed out waiting for <pre> JSON on /finish'));
} else {
setTimeout(check, 500);
}
};
check();
})
`,
})
const textContent = result.content?.find(
(c: { type: string }) => c.type === 'text',
)
if (!textContent?.text) {
throw new Error('No text content returned from /finish page')
}
return JSON.parse(textContent.text) as Record<string, unknown>
}
private runPythonEvaluator(
taskId: string,
envState: Record<string, unknown>,
modelResponse: string,
): Promise<{
reward: number
pass: boolean
message: string
per_criterion: unknown[]
}> {
return new Promise((resolve, reject) => {
const proc = spawn('python3', [EVAL_SCRIPT], {
stdio: ['pipe', 'pipe', 'pipe'],
})
const inputData = JSON.stringify({
task_id: taskId,
env_state: envState,
model_response: modelResponse,
})
let stdout = ''
let stderr = ''
proc.stdout.on('data', (data: Buffer) => {
stdout += data.toString()
})
proc.stderr.on('data', (data: Buffer) => {
stderr += data.toString()
})
proc.on('close', (code) => {
if (code !== 0) {
reject(
new Error(`Python evaluator exited with code ${code}: ${stderr}`),
)
return
}
try {
const result = JSON.parse(stdout.trim())
resolve(result)
} catch {
reject(new Error(`Failed to parse evaluator output: ${stdout}`))
}
})
proc.on('error', (err) => {
reject(new Error(`Failed to spawn Python evaluator: ${err.message}`))
})
proc.stdin.write(inputData)
proc.stdin.end()
})
}
}

View File

@@ -0,0 +1,134 @@
import { join, resolve } from 'node:path'
import type { GraderResult } from '../../types'
import type { Grader, GraderInput } from '../types'
interface InfinityEvalInput {
app_server_url: string
verifier_path: string
task_id: string
}
interface InfinityEvalOutput {
pass: boolean
reward: number
message: string
}
const EVAL_SCRIPT = resolve(
import.meta.dir,
'../../../scripts/infinity-evaluate.py',
)
export class InfinityStateGrader implements Grader {
name = 'infinity_state'
async grade(input: GraderInput): Promise<GraderResult> {
const parsed = this.parseQueryId(input.task.query_id)
if (!parsed) {
return {
score: 0,
pass: false,
reasoning: `Cannot parse query_id "${input.task.query_id}" — expected format: infinity-{app}-{task_id}`,
}
}
const appServerUrl = this.resolveAppServerUrl(input)
if (!appServerUrl) {
return {
score: 0,
pass: false,
reasoning: 'Cannot determine app server URL',
}
}
const infinityDir = process.env.WEBARENA_INFINITY_DIR
if (!infinityDir) {
return {
score: 0,
pass: false,
reasoning:
'WEBARENA_INFINITY_DIR env var not set. Point it to the webarena-infinity repo root.',
}
}
const verifierPath = join(
infinityDir,
'apps',
parsed.appName,
'real-tasks',
`${parsed.taskId}.py`,
)
const evalInput: InfinityEvalInput = {
app_server_url: appServerUrl,
verifier_path: verifierPath,
task_id: input.task.query_id,
}
try {
const result = await this.runPythonEvaluator(evalInput)
return {
score: result.pass ? 1 : 0,
pass: result.pass,
reasoning: result.message,
details: {
reward: result.reward,
app_name: parsed.appName,
app_server_url: appServerUrl,
},
}
} catch (error) {
return {
score: 0,
pass: false,
reasoning: `Evaluator process error: ${error instanceof Error ? error.message : String(error)}`,
}
}
}
private parseQueryId(
queryId: string,
): { appName: string; taskId: string } | null {
// Task IDs start with "task_", app names may contain hyphens
// e.g. "infinity-elation-prescriptions-task_h69"
const match = queryId.match(/^infinity-(.+)-(task_.+)$/)
if (!match) return null
return { appName: match[1], taskId: match[2] }
}
private resolveAppServerUrl(input: GraderInput): string | null {
// Passed directly from task executor (started by InfinityAppManager)
if (input.infinityAppUrl) return input.infinityAppUrl
// Fallback: env var for manual testing
if (process.env.INFINITY_APP_URL) return process.env.INFINITY_APP_URL
return null
}
private async runPythonEvaluator(
evalInput: InfinityEvalInput,
): Promise<InfinityEvalOutput> {
const proc = Bun.spawn(['python3', EVAL_SCRIPT], {
stdin: 'pipe',
stdout: 'pipe',
stderr: 'pipe',
})
const inputJson = JSON.stringify(evalInput)
proc.stdin.write(inputJson)
proc.stdin.end()
const stdout = await new Response(proc.stdout).text()
const stderr = await new Response(proc.stderr).text()
const exitCode = await proc.exited
if (exitCode !== 0) {
throw new Error(
`Python evaluator exited with code ${exitCode}: ${stderr || stdout}`,
)
}
return JSON.parse(stdout.trim()) as InfinityEvalOutput
}
}

View File

@@ -1,4 +1,6 @@
import type { GraderResult } from '../types'
import { AgisdkStateDiffGrader } from './benchmark/agisdk-state-diff'
import { InfinityStateGrader } from './benchmark/infinity-state'
import { Mind2WebJudgeGrader } from './benchmark/mind2web'
import { WebVoyagerGrader } from './benchmark/webvoyager'
import { FaraAlignmentGrader } from './fara/alignment'
@@ -19,7 +21,13 @@ export function createGrader(
options: GraderOptions | null,
): Grader | null {
switch (name) {
// Benchmark graders
// Deterministic benchmark graders (no LLM judge)
case 'agisdk_state_diff':
return new AgisdkStateDiffGrader()
case 'infinity_state':
return new InfinityStateGrader()
// LLM-based benchmark graders
case 'webvoyager_grader':
if (!options?.apiKey) return null
return new WebVoyagerGrader(
@@ -107,10 +115,12 @@ export async function runGraders(
// Export grader classes for direct use
export {
AgisdkStateDiffGrader,
FaraAlignmentGrader,
FaraCombinedGrader,
FaraMultimodalGrader,
FaraRubricGrader,
InfinityStateGrader,
Mind2WebJudgeGrader,
PerformanceGrader,
WebVoyagerGrader,

View File

@@ -11,6 +11,8 @@ export interface GraderInput {
finalAnswer: string | null
expectedAnswer?: string | null
outputDir: string
mcpUrl?: string
infinityAppUrl?: string
}
export interface Grader {

View File

@@ -0,0 +1,89 @@
/**
* Manages WebArena-Infinity app server lifecycle per task.
*
* Each worker gets a unique port: base_port + worker_index.
* Server is started fresh before each task and killed after,
* guaranteeing clean state.
*/
import { type ChildProcess, spawn } from 'node:child_process'
import { join } from 'node:path'
export class InfinityAppManager {
private proc: ChildProcess | null = null
private port: number
private infinityDir: string
constructor(
private workerIndex: number,
private basePort: number = 8000,
) {
this.port = basePort + workerIndex
this.infinityDir = process.env.WEBARENA_INFINITY_DIR || ''
}
async startApp(appName: string): Promise<string> {
await this.stop()
if (!this.infinityDir) {
throw new Error('WEBARENA_INFINITY_DIR env var not set')
}
const serverScript = join(this.infinityDir, 'apps', appName, 'server.py')
this.proc = spawn('python3', [serverScript, '--port', String(this.port)], {
stdio: ['ignore', 'pipe', 'pipe'],
cwd: join(this.infinityDir, 'apps', appName),
})
// Wait for server to be ready
const url = `http://localhost:${this.port}`
await this.waitForReady(url)
return url
}
async stop(): Promise<void> {
if (this.proc) {
this.proc.kill('SIGTERM')
await new Promise<void>((resolve) => {
const timeout = setTimeout(() => {
this.proc?.kill('SIGKILL')
resolve()
}, 3000)
this.proc?.on('exit', () => {
clearTimeout(timeout)
resolve()
})
})
this.proc = null
}
}
getPort(): number {
return this.port
}
getUrl(): string {
return `http://localhost:${this.port}`
}
private async waitForReady(
url: string,
maxAttempts = 30,
intervalMs = 500,
): Promise<void> {
for (let i = 0; i < maxAttempts; i++) {
try {
const resp = await fetch(url, {
signal: AbortSignal.timeout(2000),
})
if (resp.ok) return
} catch {
// Server not ready yet
}
await new Promise((r) => setTimeout(r, intervalMs))
}
throw new Error(
`Infinity app server not ready after ${maxAttempts * intervalMs}ms on port ${this.port}`,
)
}
}

View File

@@ -9,6 +9,7 @@ import {
import { runGraders } from '../graders/registry'
import type { ErrorSource, EvalConfig, GraderResult, Task } from '../types'
import { callMcpTool } from '../utils/mcp-client'
import { InfinityAppManager } from './infinity-app-manager'
import type { GraderOptions, TaskResult } from './types'
// ============================================================================
@@ -101,6 +102,36 @@ export class TaskExecutor {
// Resolve page ID once — fresh browser has exactly one page
const pageId = await this.resolveInitialPageId(mcpUrl)
// For Infinity tasks, start a fresh app server per task
let infinityManager: InfinityAppManager | null = null
let actualStartUrl = task.start_url
if (task.dataset === 'webarena-infinity') {
const appName = (task.metadata?.additional as Record<string, unknown>)
?.app_name as string
const appBasePort =
((task.metadata?.additional as Record<string, unknown>)
?.app_base_port as number) || 8000
const workerIndex = this.config.browseros.base_server_port - 9110 // derive from port offset
if (appName && process.env.WEBARENA_INFINITY_DIR) {
infinityManager = new InfinityAppManager(workerIndex, appBasePort)
try {
actualStartUrl = await infinityManager.startApp(appName)
console.log(
` Infinity app "${appName}" started on port ${infinityManager.getPort()}`,
)
} catch (error) {
throw new TaskExecutionError(
`Failed to start Infinity app: ${error instanceof Error ? error.message : String(error)}`,
task,
'navigation',
error instanceof Error ? error : undefined,
)
}
}
}
try {
// Phase 1: Set viewport + navigate to start URL
try {
@@ -114,10 +145,10 @@ export class TaskExecutor {
)
}
if (task.start_url && task.start_url !== 'about:blank') {
if (actualStartUrl && actualStartUrl !== 'about:blank') {
try {
await callMcpTool(mcpUrl, 'navigate_page', {
url: task.start_url,
url: actualStartUrl,
page: pageId,
})
} catch (error) {
@@ -134,7 +165,11 @@ export class TaskExecutor {
const agentResult = await this.executeAgent(task, pageId)
// Phase 3: Run graders
const graderResults = await this.runGraders(task, agentResult)
const graderResults = await this.runGraders(
task,
agentResult,
infinityManager?.getUrl(),
)
const status =
agentResult.metadata.termination_reason === 'timeout'
@@ -169,6 +204,11 @@ export class TaskExecutor {
} catch {
// Ignore cleanup errors
}
// Stop Infinity app server if running
if (infinityManager) {
await infinityManager.stop().catch(() => {})
}
}
}
@@ -209,6 +249,7 @@ export class TaskExecutor {
private async runGraders(
task: Task,
agentResult: AgentResult,
infinityAppUrl?: string,
): Promise<Record<string, GraderResult>> {
const configGraders = this.config.graders ?? []
const taskGraders = task.graders ?? []
@@ -234,6 +275,8 @@ export class TaskExecutor {
expectedAnswer: (task.metadata?.additional as Record<string, unknown>)
?.answer as string | undefined,
outputDir: join(this.outputDir, task.query_id),
mcpUrl: `${this.config.browseros.server_url}/mcp`,
infinityAppUrl,
},
this.deps.graderOptions,
)

View File

@@ -100,6 +100,8 @@ export interface TaskResultSummary {
// ============================================================================
export const PASS_FAIL_GRADER_ORDER = [
'agisdk_state_diff',
'infinity_state',
'performance_grader',
'webvoyager_grader',
'fara_combined',

View File

@@ -13,6 +13,8 @@ BROWSEROS_VERSION=
BROWSEROS_INSTALL_ID=
BROWSEROS_CLIENT_ID=
BROWSEROS_TRUSTED_ORIGINS=
# Graph service
CODEGEN_SERVICE_URL=

View File

@@ -11,6 +11,8 @@ export interface AgentSession {
mcpServerKey?: string
/** Workspace directory when the session was created, for change detection. */
workingDir?: string
/** LLM config used when the session was created, for provider/model changes. */
llmConfigKey?: string
}
export class SessionStore {

View File

@@ -0,0 +1,28 @@
/**
* @license
* Copyright 2025 BrowserOS
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
import type { MiddlewareHandler } from 'hono'
import { isAllowedOrigin } from '../utils/cors'
export function requireTrustedOrigin(): MiddlewareHandler {
return async (c, next) => {
const origin = c.req.header('Origin')
if (origin !== undefined && !isAllowedOrigin(origin)) {
return c.json(
{
error: {
name: 'ForbiddenOrigin',
message: 'Origin not allowed',
code: 'FORBIDDEN_ORIGIN',
statusCode: 403,
},
},
403,
)
}
return next()
}
}

View File

@@ -20,6 +20,7 @@ import { initializeOAuth } from '../lib/clients/oauth'
import { getDb } from '../lib/db'
import { logger } from '../lib/logger'
import { Sentry } from '../lib/sentry'
import { requireTrustedOrigin } from './middleware/require-trusted-origin'
import { createChatRoutes } from './routes/chat'
import { createCreditsRoutes } from './routes/credits'
import { createHealthRoute } from './routes/health'
@@ -103,6 +104,7 @@ export async function createHttpServer(config: HttpServerConfig) {
const app = new Hono<Env>()
.use('/*', cors(defaultCorsConfig))
.use('/*', requireTrustedOrigin())
.route('/health', createHealthRoute({ browser }))
.route(
'/shutdown',

View File

@@ -65,6 +65,7 @@ export class ChatService {
declinedApps: request.declinedApps,
browserosId: this.deps.browserosId,
}
const llmConfigKey = this.buildLlmConfigKey(agentConfig)
let session = sessionStore.get(request.conversationId)
let isNewSession = false
@@ -144,6 +145,24 @@ export class ChatService {
}
}
// Detect provider/model/auth change mid-conversation -> rebuild session.
// The AI SDK agent captures the language model at construction time, so a
// reused session would keep calling the previous provider.
if (session && session.llmConfigKey !== llmConfigKey) {
logger.info('LLM config changed mid-conversation, rebuilding session', {
conversationId: request.conversationId,
provider: agentConfig.provider,
model: agentConfig.model,
})
session = await this.rebuildSession(
session,
request,
agentConfig,
mcpServerKey,
llmConfigKey,
)
}
if (!session) {
isNewSession = true
let hiddenPageId: number | undefined
@@ -209,6 +228,7 @@ export class ChatService {
browserContext,
mcpServerKey,
workingDir: request.userWorkingDir,
llmConfigKey,
}
sessionStore.set(request.conversationId, session)
}
@@ -341,6 +361,7 @@ export class ChatService {
request: ChatRequest,
agentConfig: ResolvedAgentConfig,
mcpServerKey: string,
llmConfigKey = this.buildLlmConfigKey(agentConfig),
): Promise<AgentSession> {
const previousMessages = session.agent.messages
await session.agent.dispose()
@@ -365,6 +386,7 @@ export class ChatService {
browserContext,
mcpServerKey,
workingDir: request.userWorkingDir,
llmConfigKey,
}
newSession.agent.messages = sanitizeMessagesForToolset(
previousMessages,
@@ -374,6 +396,26 @@ export class ChatService {
return newSession
}
private buildLlmConfigKey(config: ResolvedAgentConfig): string {
return JSON.stringify({
provider: config.provider,
model: config.model,
apiKey: config.apiKey,
baseUrl: config.baseUrl,
upstreamProvider: config.upstreamProvider,
resourceName: config.resourceName,
region: config.region,
accessKeyId: config.accessKeyId,
secretAccessKey: config.secretAccessKey,
sessionToken: config.sessionToken,
accountId: config.accountId,
reasoningEffort: config.reasoningEffort,
reasoningSummary: config.reasoningSummary,
contextWindowSize: config.contextWindowSize,
supportsImages: config.supportsImages,
})
}
private buildMcpServerKey(browserContext?: BrowserContext): string {
const managed = browserContext?.enabledMcpServers?.slice().sort() ?? []
const custom =

View File

@@ -8,12 +8,40 @@ import type { cors } from 'hono/cors'
type CorsOptions = Parameters<typeof cors>[0]
/**
* Default CORS configuration for the HTTP server.
* Permissive since MCP endpoints are protected by localhost check.
*/
const STATIC_ALLOWED_ORIGINS = new Set<string>([
'chrome-extension://bflpfmnmnokmjhmgnolecpppdbdophmk',
])
let cachedAllowedOrigins: Set<string> | null = null
function buildAllowedOrigins(): Set<string> {
const fromEnv = (process.env.BROWSEROS_TRUSTED_ORIGINS ?? '')
.split(',')
.map((value) => value.trim())
.filter((value) => value.length > 0)
return new Set([...STATIC_ALLOWED_ORIGINS, ...fromEnv])
}
function getAllowedOrigins(): Set<string> {
if (!cachedAllowedOrigins) {
cachedAllowedOrigins = buildAllowedOrigins()
}
return cachedAllowedOrigins
}
export function resetAllowedOriginsForTesting(): void {
cachedAllowedOrigins = null
}
export function isAllowedOrigin(origin: string): boolean {
return getAllowedOrigins().has(origin)
}
export const defaultCorsConfig: CorsOptions = {
origin: (origin: string | undefined) => origin || '*',
origin: (origin: string | undefined) => {
if (origin && isAllowedOrigin(origin)) return origin
return null
},
allowMethods: ['GET', 'POST', 'DELETE', 'OPTIONS'],
allowHeaders: ['Content-Type', 'Authorization', 'Accept'],
credentials: true,

View File

@@ -392,9 +392,48 @@ export class Browser {
// --- Observation ---
private async getFrameIds(session: ProtocolApi): Promise<string[]> {
try {
const result = await session.Page.getFrameTree()
const ids: string[] = []
type Tree = { frame: { id: string }; childFrames?: Tree[] }
function collect(tree: Tree) {
ids.push(tree.frame.id)
if (tree.childFrames)
for (const child of tree.childFrames) collect(child)
}
collect(result.frameTree as Tree)
return ids
} catch {
return []
}
}
private async fetchAXTree(session: ProtocolApi): Promise<AXNode[]> {
const result = await session.Accessibility.getFullAXTree()
return (result.nodes as AXNode[]) ?? []
const frameIds = await this.getFrameIds(session)
if (frameIds.length <= 1) {
const result = await session.Accessibility.getFullAXTree()
return (result.nodes as AXNode[]) ?? []
}
const allNodes: AXNode[] = []
for (const frameId of frameIds) {
try {
const result = await session.Accessibility.getFullAXTree({ frameId })
const nodes = (result.nodes as AXNode[]) ?? []
for (const node of nodes) {
allNodes.push({
...node,
nodeId: `${frameId}:${node.nodeId}`,
childIds: node.childIds?.map((id) => `${frameId}:${id}`),
})
}
} catch {
// Cross-origin or detached frames may fail — skip
}
}
return allNodes
}
async snapshot(page: number): Promise<string> {

View File

@@ -20,7 +20,7 @@ export function buildContentMarkdownExpression(
// Uses var + ES5 style for consistency with other injected scripts.
// Context object: { pre: bool, ld: listDepth, lt: listType, td: tableDepth }
const DOM_WALKER_SCRIPT = `(function(o) {
var SKIP = {SCRIPT:1,STYLE:1,NOSCRIPT:1,SVG:1,TEMPLATE:1,IFRAME:1,CANVAS:1,VIDEO:1,AUDIO:1,OBJECT:1,EMBED:1};
var SKIP = {SCRIPT:1,STYLE:1,NOSCRIPT:1,SVG:1,TEMPLATE:1,CANVAS:1,VIDEO:1,AUDIO:1,OBJECT:1,EMBED:1};
var FORM = {INPUT:1,SELECT:1,TEXTAREA:1,BUTTON:1};
var vh = window.innerHeight, vw = window.innerWidth;
var root = o.selector ? document.querySelector(o.selector) : document.body;
@@ -219,6 +219,15 @@ function walk(node, ctx) {
t = kids(el, ctx).trim();
return t ? '\\n*' + t + '*\\n' : '';
case 'IFRAME':
try {
var idoc = el.contentDocument;
if (idoc && idoc.body) return walk(idoc.body, ctx);
} catch(e) {}
var isrc = el.src || el.getAttribute('src');
if (isrc) return '\\n\\n[iframe: ' + isrc + ']\\n\\n';
return '';
default:
return kids(el, ctx);
}

View File

@@ -100,11 +100,16 @@ export function buildInteractiveTree(nodes: AXNode[]): string[] {
if (node.childIds) for (const childId of node.childIds) walk(childId)
}
const root =
nodes.find(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
) ?? nodes[0]
if (root?.childIds) for (const childId of root.childIds) walk(childId)
const roots = nodes.filter(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
)
if (roots.length === 0 && nodes[0]?.childIds) {
for (const childId of nodes[0].childIds) walk(childId)
} else {
for (const root of roots) {
if (root.childIds) for (const childId of root.childIds) walk(childId)
}
}
return lines
}
@@ -160,11 +165,16 @@ export function buildEnhancedTree(nodes: AXNode[]): string[] {
for (const childId of node.childIds) walk(childId, depth + 1)
}
const root =
nodes.find(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
) ?? nodes[0]
if (root?.childIds) for (const childId of root.childIds) walk(childId, 0)
const roots = nodes.filter(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
)
if (roots.length === 0 && nodes[0]?.childIds) {
for (const childId of nodes[0].childIds) walk(childId, 0)
} else {
for (const root of roots) {
if (root.childIds) for (const childId of root.childIds) walk(childId, 0)
}
}
return lines
}
@@ -292,11 +302,16 @@ export function extractLinkNodes(nodes: AXNode[]): LinkNode[] {
if (node.childIds) for (const childId of node.childIds) walk(childId)
}
const root =
nodes.find(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
) ?? nodes[0]
if (root?.childIds) for (const childId of root.childIds) walk(childId)
const roots = nodes.filter(
(n) => n.role?.value === 'RootWebArea' || n.role?.value === 'WebArea',
)
if (roots.length === 0 && nodes[0]?.childIds) {
for (const childId of nodes[0].childIds) walk(childId)
} else {
for (const root of roots) {
if (root.childIds) for (const childId of root.childIds) walk(childId)
}
}
return links
}

View File

@@ -148,6 +148,16 @@ function createMoonshotModel(config: ResolvedLLMConfig): LanguageModel {
})(config.model)
}
function createMinimaxModel(config: ResolvedLLMConfig): LanguageModel {
if (!config.baseUrl) throw new Error('Minimax provider requires baseUrl')
if (!config.apiKey) throw new Error('Minimax provider requires apiKey')
return createOpenAICompatible({
name: 'minimax',
baseURL: config.baseUrl,
apiKey: config.apiKey,
})(config.model)
}
function createQwenCodeModel(config: ResolvedLLMConfig): LanguageModel {
if (!config.apiKey) throw new Error('Qwen Code requires OAuth authentication')
return createOpenAICompatible({
@@ -192,6 +202,7 @@ const PROVIDER_FACTORIES: Record<string, ProviderFactory> = {
[LLM_PROVIDERS.CHATGPT_PRO]: createChatGPTProModel,
[LLM_PROVIDERS.GITHUB_COPILOT]: createGitHubCopilotModel,
[LLM_PROVIDERS.QWEN_CODE]: createQwenCodeModel,
[LLM_PROVIDERS.MINIMAX]: createMinimaxModel,
}
export function createLLMProvider(config: ResolvedLLMConfig): LanguageModel {

View File

@@ -0,0 +1,87 @@
/**
* @license
* Copyright 2025 BrowserOS
*/
import { afterEach, beforeEach, describe, expect, it } from 'bun:test'
import { Hono } from 'hono'
import { requireTrustedOrigin } from '../../../src/api/middleware/require-trusted-origin'
import { resetAllowedOriginsForTesting } from '../../../src/api/utils/cors'
function buildApp() {
return new Hono()
.use('/*', requireTrustedOrigin())
.get('/probe', (c) => c.json({ ok: true }))
.post('/probe', (c) => c.json({ ok: true, method: 'POST' }))
}
describe('requireTrustedOrigin', () => {
const previousEnv = process.env.BROWSEROS_TRUSTED_ORIGINS
beforeEach(() => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'chrome-extension://allowed'
resetAllowedOriginsForTesting()
})
afterEach(() => {
process.env.BROWSEROS_TRUSTED_ORIGINS = previousEnv
resetAllowedOriginsForTesting()
})
it('passes when no Origin header is set', async () => {
const res = await buildApp().request('/probe')
expect(res.status).toBe(200)
})
it('passes when Origin matches the allowlist', async () => {
const res = await buildApp().request('/probe', {
headers: { Origin: 'chrome-extension://allowed' },
})
expect(res.status).toBe(200)
})
it('rejects with 403 when Origin is unknown', async () => {
const res = await buildApp().request('/probe', {
headers: { Origin: 'https://example.com' },
})
expect(res.status).toBe(403)
const body = (await res.json()) as { error?: { code?: string } }
expect(body.error?.code).toBe('FORBIDDEN_ORIGIN')
})
it('rejects with 403 when Origin is the literal "null"', async () => {
const res = await buildApp().request('/probe', {
headers: { Origin: 'null' },
})
expect(res.status).toBe(403)
})
it('rejects POST with disallowed Origin without invoking the route handler', async () => {
const app = new Hono()
.use('/*', requireTrustedOrigin())
.post('/probe', () => {
throw new Error('handler must not run')
})
const res = await app.request('/probe', {
method: 'POST',
headers: { Origin: 'https://example.com' },
})
expect(res.status).toBe(403)
})
it('rejects port mismatches even when host matches', async () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'http://localhost:5173'
resetAllowedOriginsForTesting()
const res = await buildApp().request('/probe', {
headers: { Origin: 'http://localhost:5174' },
})
expect(res.status).toBe(403)
})
it('treats the allowlist as case-sensitive', async () => {
const res = await buildApp().request('/probe', {
headers: { Origin: 'CHROME-EXTENSION://allowed' },
})
expect(res.status).toBe(403)
})
})

View File

@@ -44,11 +44,19 @@ const createAgentUIStreamResponseSpy = mock(
},
)
const resolveLLMConfigSpy = mock(async () => ({
provider: 'openai',
model: 'gpt-5',
apiKey: 'test-key',
}))
const resolveLLMConfigSpy = mock(
async (config: {
provider?: string
model?: string
apiKey?: string
baseUrl?: string
}) => ({
provider: config.provider ?? 'openai',
model: config.model ?? 'gpt-5',
apiKey: config.apiKey ?? 'test-key',
baseUrl: config.baseUrl,
}),
)
mock.module('ai', () => ({
createAgentUIStreamResponse: createAgentUIStreamResponseSpy,
@@ -288,4 +296,65 @@ describe('ChatService scheduled task hidden page lifecycle', () => {
})
expect(browser.closePage).toHaveBeenCalledWith(88)
})
it('rebuilds an existing session when the LLM provider changes', async () => {
const firstAgent = createFakeAgent()
agentToReturn = firstAgent
streamResponseHandler = async ({ onFinish }) => {
await onFinish({ messages: agentToReturn?.messages ?? [] })
return new Response('ok')
}
const browser = {
resolveTabIds: mock(async () => new Map<number, number>()),
}
const sessionStore = createSessionStore()
const service = new ChatService({
sessionStore: sessionStore as never,
klavisClient: {} as never,
browser: browser as never,
registry: {} as never,
})
const conversationId = crypto.randomUUID()
const createCallsBefore = createAgentSpy.mock.calls.length
await service.processMessage(
{
conversationId,
message: 'First message',
provider: 'browseros',
model: 'browseros-auto',
mode: 'agent',
origin: 'sidepanel',
} as never,
new AbortController().signal,
)
const secondAgent = createFakeAgent()
agentToReturn = secondAgent
await service.processMessage(
{
conversationId,
message: 'Second message',
provider: 'chatgpt-pro',
model: 'gpt-5.3-codex',
mode: 'agent',
origin: 'sidepanel',
} as never,
new AbortController().signal,
)
expect(createAgentSpy.mock.calls.length).toBe(createCallsBefore + 2)
expect(firstAgent.dispose).toHaveBeenCalledTimes(1)
expect(sessionStore.get(conversationId)?.agent).toBe(secondAgent)
const latestCreateArgs = createAgentSpy.mock.calls.at(-1)?.[0] as {
resolvedConfig: { provider: string; model: string }
}
expect(latestCreateArgs.resolvedConfig).toMatchObject({
provider: 'chatgpt-pro',
model: 'gpt-5.3-codex',
})
})
})

View File

@@ -0,0 +1,73 @@
/**
* @license
* Copyright 2025 BrowserOS
*/
import { afterEach, beforeEach, describe, expect, it } from 'bun:test'
import {
isAllowedOrigin,
resetAllowedOriginsForTesting,
} from '../../../src/api/utils/cors'
describe('isAllowedOrigin', () => {
const previousEnv = process.env.BROWSEROS_TRUSTED_ORIGINS
beforeEach(() => {
resetAllowedOriginsForTesting()
})
afterEach(() => {
process.env.BROWSEROS_TRUSTED_ORIGINS = previousEnv
resetAllowedOriginsForTesting()
})
it('accepts the pinned published extension origin even when env is empty', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = ''
expect(
isAllowedOrigin('chrome-extension://bflpfmnmnokmjhmgnolecpppdbdophmk'),
).toBe(true)
})
it('rejects unknown origins when env is empty', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = ''
expect(isAllowedOrigin('https://example.com')).toBe(false)
expect(isAllowedOrigin('chrome-extension://someotherid')).toBe(false)
expect(isAllowedOrigin('null')).toBe(false)
})
it('accepts a single origin from env', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'chrome-extension://abcdef'
expect(isAllowedOrigin('chrome-extension://abcdef')).toBe(true)
expect(isAllowedOrigin('chrome-extension://other')).toBe(false)
})
it('accepts multiple comma-separated origins and trims whitespace', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS =
' chrome-extension://abc , http://localhost:5173 '
expect(isAllowedOrigin('chrome-extension://abc')).toBe(true)
expect(isAllowedOrigin('http://localhost:5173')).toBe(true)
expect(isAllowedOrigin('http://localhost:5174')).toBe(false)
})
it('is case-sensitive on origin match', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'chrome-extension://abc'
expect(isAllowedOrigin('CHROME-EXTENSION://abc')).toBe(false)
})
it('treats port as part of the origin', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'http://localhost:5173'
expect(isAllowedOrigin('http://localhost:5173')).toBe(true)
expect(isAllowedOrigin('http://localhost:5174')).toBe(false)
expect(isAllowedOrigin('http://localhost')).toBe(false)
})
it('rejects the literal string "null" unless explicitly allowlisted', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'chrome-extension://abc'
expect(isAllowedOrigin('null')).toBe(false)
})
it('drops empty entries between commas', () => {
process.env.BROWSEROS_TRUSTED_ORIGINS = 'chrome-extension://abc,,, ,'
expect(isAllowedOrigin('chrome-extension://abc')).toBe(true)
expect(isAllowedOrigin('')).toBe(false)
})
})

View File

@@ -27,6 +27,7 @@
"dependencies": {
"@ai-sdk/react": "^3.0.96",
"@browseros/server": "workspace:*",
"@browseros/shared": "workspace:*",
"@hookform/resolvers": "^5.2.2",
"@lobehub/icons": "^2.44.0",
"@mdxeditor/editor": "^3.52.4",
@@ -2210,7 +2211,7 @@
"chrome-devtools-frontend": ["chrome-devtools-frontend@1.0.1577886", "", {}, "sha512-B9hY3o/0RuVCDWNYh9YnkEbRrPUMCY+NaOgBxvZRzGvqbGSMNckkVSdO67SwWR8bm4fo/qplXbUj0cSr229V6w=="],
"chrome-devtools-mcp": ["chrome-devtools-mcp@0.20.3", "", { "bin": { "chrome-devtools-mcp": "build/src/bin/chrome-devtools-mcp.js", "chrome-devtools": "build/src/bin/chrome-devtools.js" } }, "sha512-6MlNKlKa+J1FX9w4SUnFERF4MRGWLlrnZvIJGhhsuuMPM7qUG0F4SwheRyjwl0+tsTemxMCBHiib8mXkg5j6og=="],
"chrome-devtools-mcp": ["chrome-devtools-mcp@0.21.0", "", { "bin": { "chrome-devtools-mcp": "build/src/bin/chrome-devtools-mcp.js", "chrome-devtools": "build/src/bin/chrome-devtools.js" } }, "sha512-d+iqrRmcwpRFV3Q4DRCF2LCoq+WCRU3GhISKQ9v8g+1C2Uh8upj3urkjxNO4QIjhBMIYei/VQ1OQLFceby80Og=="],
"chrome-launcher": ["chrome-launcher@1.2.0", "", { "dependencies": { "@types/node": "*", "escape-string-regexp": "^4.0.0", "is-wsl": "^2.2.0", "lighthouse-logger": "^2.0.1" }, "bin": { "print-chrome-path": "bin/print-chrome-path.cjs" } }, "sha512-JbuGuBNss258bvGil7FT4HKdC3SC2K7UAEUqiPy3ACS3Yxo3hAW6bvFpCu2HsIJLgTqxgEX6BkujvzZfLpUD0Q=="],

View File

@@ -80,3 +80,8 @@ export const CONTENT_LIMITS = {
CONSOLE_DEFAULT_LIMIT: 50,
CONSOLE_MAX_LIMIT: 200,
} as const
export const REFERRAL_LIMITS = {
MAX_DAILY_CREDITS: 500,
CREDITS_PER_REFERRAL: 200,
} as const

View File

@@ -19,4 +19,6 @@ export const EXTERNAL_URLS = {
QWEN_DEVICE_CODE: 'https://chat.qwen.ai/api/v1/oauth2/device/code',
QWEN_OAUTH_TOKEN: 'https://chat.qwen.ai/api/v1/oauth2/token',
QWEN_CODE_API: 'https://portal.qwen.ai/v1',
REFERRAL_SERVICE: 'https://browseros-referral.fly.dev',
CREDITS_GATEWAY: 'https://llm.browseros.com',
} as const

View File

@@ -27,6 +27,7 @@ export const LLM_PROVIDERS = {
CHATGPT_PRO: 'chatgpt-pro',
GITHUB_COPILOT: 'github-copilot',
QWEN_CODE: 'qwen-code',
MINIMAX: 'minimax',
} as const
/**
@@ -48,6 +49,7 @@ export const LLMProviderSchema: z.ZodEnum<
'chatgpt-pro',
'github-copilot',
'qwen-code',
'minimax',
]
> = z.enum([
LLM_PROVIDERS.ANTHROPIC,
@@ -64,6 +66,7 @@ export const LLMProviderSchema: z.ZodEnum<
LLM_PROVIDERS.CHATGPT_PRO,
LLM_PROVIDERS.GITHUB_COPILOT,
LLM_PROVIDERS.QWEN_CODE,
LLM_PROVIDERS.MINIMAX,
])
export type LLMProvider = z.infer<typeof LLMProviderSchema>