mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-21 12:55:09 +00:00
341 lines
10 KiB
TypeScript
#!/usr/bin/env bun
/**
 * Annotate Screenshots with Tool Coordinates
 *
 * Reads messages.jsonl from an eval run and annotates screenshots with
 * coordinate markers showing where browser actions (click, fill, hover, drag)
 * actually landed.
 *
 * Coordinates are in CSS pixels (returned by tool outputs). They're mapped to
 * screenshot pixels using: screenshot_xy = css_xy × devicePixelRatio
 *
 * Usage:
 *   bun run apps/eval/scripts/annotate-screenshots.ts <results-folder> [--dpr=2]
 *
 * Options:
 *   --dpr=N  devicePixelRatio. Use the value from take_screenshot output.
 *            If omitted, device_pixel_ratio is read from
 *            <results-folder>/metadata.json; the script exits with an error
 *            if neither source provides a value.
 *
 * Output:
 *   Creates an 'annotated' folder inside the screenshots directory.
 */

import {
  copyFileSync,
  existsSync,
  mkdirSync,
  readdirSync,
  readFileSync,
} from 'node:fs'
import { basename, join } from 'node:path'
import sharp from 'sharp'

interface ActionInfo {
  screenshotNum: number
  toolName: string
  cssX: number
  cssY: number
  // For drag: second coordinate
  cssX2?: number
  cssY2?: number
}

const COORDINATE_TOOLS = new Set([
  'click',
  'click_at',
  'fill',
  'hover',
  'hover_at',
  'type_at',
  'drag',
  'drag_at',
])

/**
 * Parse CSS coordinates from tool output text.
 *
 * Formats returned by tools:
 *   "Clicked [47] at (125, 42)"
 *   "Typed 5 characters into [12] at (300, 150)"
 *   "Hovered over [31] at (200, 88)"
 *   "Clicked at (125, 42)"
 *   "Hovered at (125, 42)"
 *   "Typed 10 chars at (125, 42)"
 *   "Dragged [10] (50, 100) → [20] (400, 300)"
 *   "Dragged from (50, 100) to (400, 300)"
 */
function parseCoordinates(
  toolName: string,
  output: unknown,
): { x: number; y: number; x2?: number; y2?: number } | null {
  const text = extractText(output)
  if (!text) return null

  // Drag with two coordinate pairs: "(x1, y1) → ... (x2, y2)" or "from (x1, y1) to (x2, y2)"
  if (toolName === 'drag' || toolName === 'drag_at') {
    const dragMatch = text.match(
      /\((\d+),\s*(\d+)\).*?(?:→|to)\s*.*?\((\d+),\s*(\d+)\)/,
    )
    if (dragMatch) {
      return {
        x: Number(dragMatch[1]),
        y: Number(dragMatch[2]),
        x2: Number(dragMatch[3]),
        y2: Number(dragMatch[4]),
      }
    }
  }

  // Single coordinate: "at (x, y)" or just "(x, y)"
  const singleMatch = text.match(/\((\d+),\s*(\d+)\)/)
  if (singleMatch) {
    return { x: Number(singleMatch[1]), y: Number(singleMatch[2]) }
  }

  return null
}

function extractText(output: unknown): string | null {
  if (typeof output === 'string') return output
  if (Array.isArray(output)) {
    for (const item of output) {
      if (item?.type === 'text' && typeof item.text === 'string')
        return item.text
    }
  }
  if (output && typeof output === 'object' && 'text' in output) {
    return String((output as Record<string, unknown>).text)
  }
  return null
}

/**
 * Parse messages.jsonl to extract actions with coordinates
 */
function parseMessages(messagesPath: string): ActionInfo[] {
  const content = readFileSync(messagesPath, 'utf-8')
  // Skip blank lines so an empty file or a stray empty line does not make JSON.parse throw
  const lines = content.split('\n').filter((line) => line.trim() !== '')
  const messages = lines.map((line) => JSON.parse(line))

  const actions: ActionInfo[] = []
  const pendingTools = new Map<
    string,
    { toolName: string; screenshotNum: number }
  >()
  let screenshotNum = 0

  for (const msg of messages) {
    if (msg.type === 'tool-input-available') {
      pendingTools.set(msg.toolCallId, {
        toolName: msg.toolName,
        screenshotNum: -1,
      })
    }

    if (msg.type === 'tool-output-available') {
      // Every tool output advances the counter, whether or not it carries coordinates
      screenshotNum++
      const pending = pendingTools.get(msg.toolCallId)
      if (!pending) continue

      if (!COORDINATE_TOOLS.has(pending.toolName)) {
        pendingTools.delete(msg.toolCallId)
        continue
      }

      const coords = parseCoordinates(pending.toolName, msg.output)
      if (coords) {
        actions.push({
          screenshotNum,
          toolName: pending.toolName,
          cssX: coords.x,
          cssY: coords.y,
          cssX2: coords.x2,
          cssY2: coords.y2,
        })
      }

      pendingTools.delete(msg.toolCallId)
    }
  }

  return actions
}

async function annotateScreenshot(
  inputPath: string,
  outputPath: string,
  action: ActionInfo | null,
  dpr: number,
): Promise<void> {
  if (!action) {
    copyFileSync(inputPath, outputPath)
    return
  }

  const image = sharp(inputPath)
  const metadata = await image.metadata()
  // biome-ignore lint/style/noNonNullAssertion: sharp metadata always has dimensions for valid images
  const imgWidth = metadata.width!
  // biome-ignore lint/style/noNonNullAssertion: sharp metadata always has dimensions for valid images
  const imgHeight = metadata.height!

  const sx = Math.round(action.cssX * dpr)
  const sy = Math.round(action.cssY * dpr)

  let markersSvg = ''

  // Primary marker (red crosshair)
  markersSvg += `
    <circle cx="${sx}" cy="${sy}" r="25" fill="none" stroke="red" stroke-width="4"/>
    <circle cx="${sx}" cy="${sy}" r="6" fill="red" fill-opacity="0.6"/>
    <line x1="${sx - 40}" y1="${sy}" x2="${sx - 10}" y2="${sy}" stroke="red" stroke-width="3"/>
    <line x1="${sx + 10}" y1="${sy}" x2="${sx + 40}" y2="${sy}" stroke="red" stroke-width="3"/>
    <line x1="${sx}" y1="${sy - 40}" x2="${sx}" y2="${sy - 10}" stroke="red" stroke-width="3"/>
    <line x1="${sx}" y1="${sy + 10}" x2="${sx}" y2="${sy + 40}" stroke="red" stroke-width="3"/>
  `

  // Drag target marker (orange)
  if (action.cssX2 !== undefined && action.cssY2 !== undefined) {
    const sx2 = Math.round(action.cssX2 * dpr)
    const sy2 = Math.round(action.cssY2 * dpr)
    markersSvg += `
      <circle cx="${sx2}" cy="${sy2}" r="25" fill="none" stroke="orange" stroke-width="4"/>
      <circle cx="${sx2}" cy="${sy2}" r="6" fill="orange" fill-opacity="0.6"/>
      <line x1="${sx}" y1="${sy}" x2="${sx2}" y2="${sy2}" stroke="orange" stroke-width="2" stroke-dasharray="8,4"/>
    `
  }

  // Info box
  const label2 =
    action.cssX2 !== undefined
      ? ` → (${action.cssX2}, ${action.cssY2}) css`
      : ''
  const infoText = `${action.toolName}: (${action.cssX}, ${action.cssY}) css × ${dpr} dpr = (${sx}, ${sy}) px${label2}`

  markersSvg += `
    <rect x="10" y="10" width="${Math.min(infoText.length * 8 + 20, imgWidth - 20)}" height="50" fill="rgba(0,0,0,0.9)" rx="5"/>
    <text x="20" y="30" fill="red" font-family="monospace" font-size="14" font-weight="bold">
      Screenshot ${action.screenshotNum}: AFTER ${action.toolName}
    </text>
    <text x="20" y="50" fill="white" font-family="monospace" font-size="12">
      ${infoText}
    </text>
  `

  const svg = `<svg width="${imgWidth}" height="${imgHeight}">${markersSvg}</svg>`

  await image
    .composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
    .toFile(outputPath)
}

async function main() {
  const args = process.argv.slice(2)
  const flags = args.filter((a) => a.startsWith('--'))
  const positional = args.filter((a) => !a.startsWith('--'))

  if (positional.length === 0) {
    console.log(
      'Usage: bun run apps/eval/scripts/annotate-screenshots.ts <results-folder> [--dpr=2]',
    )
    console.log('')
    console.log('Example:')
    console.log(
      '  bun run apps/eval/scripts/annotate-screenshots.ts apps/eval/results/single/Amazon--3',
    )
    process.exit(1)
  }

  const dprFlag = flags.find((f) => f.startsWith('--dpr='))
  let dpr = dprFlag ? Number(dprFlag.split('=')[1]) : 0

  // Try reading DPR from metadata.json if not explicitly provided
  if (!dpr) {
    const metadataPath = join(positional[0], 'metadata.json')
    if (existsSync(metadataPath)) {
      const meta = JSON.parse(readFileSync(metadataPath, 'utf-8'))
      dpr = meta.device_pixel_ratio ?? 0
      if (dpr) console.log(`Read devicePixelRatio=${dpr} from metadata.json`)
    }
  }
  if (!dpr) {
    console.error(
      'Error: devicePixelRatio not found in metadata.json. Provide --dpr=N flag.',
    )
    process.exit(1)
  }

  const resultsFolder = positional[0]
  const messagesPath = join(resultsFolder, 'messages.jsonl')
  const screenshotsDir = join(resultsFolder, 'screenshots')
  const annotatedDir = join(screenshotsDir, 'annotated')

  if (!existsSync(messagesPath)) {
    console.error(`Error: messages.jsonl not found at ${messagesPath}`)
    process.exit(1)
  }

  if (!existsSync(screenshotsDir)) {
    console.error(`Error: screenshots directory not found at ${screenshotsDir}`)
    process.exit(1)
  }

  mkdirSync(annotatedDir, { recursive: true })

  console.log(`devicePixelRatio: ${dpr}`)
  console.log('Parsing messages.jsonl...')
  const actions = parseMessages(messagesPath)

  console.log(`Found ${actions.length} actions with coordinates:`)
  for (const action of actions) {
    const dragInfo =
      action.cssX2 !== undefined ? ` → (${action.cssX2}, ${action.cssY2})` : ''
    console.log(
      `  Screenshot ${action.screenshotNum}: ${action.toolName} at (${action.cssX}, ${action.cssY})${dragInfo} css → (${Math.round(action.cssX * dpr)}, ${Math.round(action.cssY * dpr)}) px`,
    )
  }
  console.log('')

  const screenshots = readdirSync(screenshotsDir)
    .filter((f) => f.endsWith('.png') && !f.includes('annotated'))
    .sort((a, b) => {
      const numA = parseInt(basename(a, '.png'), 10)
      const numB = parseInt(basename(b, '.png'), 10)
      return numA - numB
    })

  console.log(`Found ${screenshots.length} screenshots`)

  // Without this guard, screenshots[0] is undefined and the metadata read below throws
  if (screenshots.length === 0) {
    console.error(`Error: no .png screenshots found in ${screenshotsDir}`)
    process.exit(1)
  }

  const firstMeta = await sharp(join(screenshotsDir, screenshots[0])).metadata()
  console.log(`Screenshot dimensions: ${firstMeta.width} x ${firstMeta.height}`)
  console.log('')

  const actionByScreenshot = new Map<number, ActionInfo>()
  for (const action of actions) {
    actionByScreenshot.set(action.screenshotNum, action)
  }

  console.log('Annotating screenshots...')
  for (const ss of screenshots) {
    const ssNum = parseInt(basename(ss, '.png'), 10)
    const inputPath = join(screenshotsDir, ss)
    const outputPath = join(annotatedDir, `${ssNum}_annotated.png`)
    const action = actionByScreenshot.get(ssNum) || null

    if (action) {
      console.log(`  ${ss} → annotated (${action.toolName})`)
    } else {
      console.log(`  ${ss} → copied (no coordinates)`)
    }

    await annotateScreenshot(inputPath, outputPath, action, dpr)
  }

  console.log('')
  console.log(`Done! Annotated screenshots saved to: ${annotatedDir}`)
}

main().catch((err) => {
  console.error('Error:', err)
  process.exit(1)
})