BrowserOS

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-18 11:06:19 +00:00

Author	SHA1	Message	Date
shivammittal274	51de62fcb6	fix(eval): use CLAUDE_CODE_OAUTH_TOKEN for performance grader auth	2026-03-21 23:14:05 +05:30
shivammittal274	f157436e7d	feat(eval): switch to Linux GitHub-hosted runner (#519 ) * feat(eval): switch to ubuntu-latest runner, add OE-Clado config - Switch workflow from self-hosted Mac Studio to ubuntu-latest - Install BrowserOS Linux .deb in CI (no self-hosted runner needed) - Add browseros-oe-clado-weekly.json config for orchestrator-executor - Fix report chart to show date+time (not just date) - Make BROWSEROS_BINARY configurable via env var * feat(eval): add NopeCHA captcha solver extension to eval runs - Auto-load NopeCHA extension in eval Chrome instances - Works in incognito + headless mode - CI workflow downloads NopeCHA before eval - extensions/ directory gitignored (downloaded at runtime) * feat(eval): per-config concurrency — different configs run in parallel * feat(eval): remove concurrency limit — all runs execute in parallel	2026-03-21 23:04:45 +05:30
Nikhil	ba7892322b	ci: run BrowserOS test suites on PRs (#514 ) * ci: run browseros tests on pull requests * refactor: rework 0320-github_action_for_tests based on feedback * refactor: rework 0320-github_action_for_tests based on feedback * chore: add CI artifacts to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove mikepenz/action-junit-report to fix check suite misattribution The JUnit report action creates check runs that GitHub associates with the CLA check suite instead of the Tests check suite, causing test reports to appear under "CLA Assistant" in the PR checks UI. Remove the action and rely on job status + step summary + artifact upload for test result visibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 09:46:36 -07:00
shivammittal274	4e90b4561a	feat(eval): weekly eval pipeline with R2 uploads and trend dashboard (#516 ) * feat(eval): weekly eval pipeline with R2 uploads and trend dashboard Add infrastructure for running weekly evaluations and tracking score trends over time: - Auto-generated output dirs: results/{config-name}/{timestamp}/ Each eval run gets its own timestamped folder, nothing is overwritten. - upload-run.ts: uploads eval results to Cloudflare R2. Supports uploading a specific run or all un-uploaded runs for a config. - weekly-report.ts: generates an interactive HTML dashboard from R2 data. Config dropdown, trend chart with hover tooltips, searchable runs table. Groups runs by config name. - viewer.html: client-facing 3-column run viewer (task list, screenshots with autoplay, agent stream with messages.jsonl). Shows performance grader axis breakdown with per-axis scores. - browseros-agent-weekly.json: weekly benchmark config (kimi-k2p5, webbench-2of4-50, 10 workers, performance grader, headless). - eval-weekly.yml: GitHub Actions workflow with cron (Saturday 6am) and manual trigger. Runs on self-hosted Mac Studio runner. Concurrency group ensures only one eval runs at a time. - Dashboard updates: load previous runs, messages.jsonl viewer, grade badges show percentages, async stream loading. - Grader updates: timeout 30min, max turns 100, DOM content verification guidance for performance grader. * fix(eval): address Greptile review — injection, nested dirs, escaping - Fix script injection in eval-weekly.yml: pass github.event.inputs through env var instead of interpolating into shell - Fix /api/runs to enumerate nested results/{config}/{timestamp}/ dirs - Fix /api/load-run to allow single-slash run names (config/timestamp) - Add HTML escaping for R2-sourced values in weekly-report.ts - Escape axis names in viewer.html renderAxesBreakdown * fix(eval): fix biome lint — non-null assertion, template literals * fix(eval): fix biome errors — replace var with let, fix inner function declaration * fix(eval): address Greptile P2 issues - isRunDir: check all subdirs for metadata.json, not just first 3 - eval-runner: guard configPath for dashboard-driven runs (fallback to 'eval') - load-run: default unknown termination_reason to 'failed' not 'completed' * feat(eval): make BROWSEROS_BINARY configurable via env var	2026-03-21 22:12:52 +05:30
Dani Akash	d965698905	fix: biome & tsc setup across repo (#493 ) * fix: biome lint issues * fix: code quality workflow * fix: all lint issues * chore: test lefthook pre-commit hook * chore: test lefthook with agent file * chore: revert test comment from lefthook verification * feat: setup tsgo for typechecking agent * fix: typecheck cli command * fix: early return to prevent errors	2026-03-19 18:18:24 +05:30
Dani Akash	58adac17db	feat: new workflows (#470 )	2026-03-17 18:56:55 +05:30
Nikhil Sonti	304b3b3289	chore: remove update submodule sync	2026-03-13 09:14:57 -07:00
Nikhil Sonti	5cee158876	feat: update top issues action to include RFCs	2026-01-22 12:39:47 -08:00
Nikhil Sonti	05ca99bb62	fix: update top-issues yml to be more descriptive	2025-11-18 15:22:59 -08:00
Felarof	fa42090e31	Update github action	2025-11-07 09:21:11 -08:00
Felarof	a4f1fb14af	Update update-agent-submodule.yml	2025-11-07 09:21:11 -08:00
Felarof	d8c7b96d4a	Update github action	2025-11-07 09:13:59 -08:00
Felarof	1708d68cd3	Added github action to update agent submodule	2025-11-06 15:32:38 -08:00
Nikhil Sonti	913fb3f483	github action: top issues by vote	2025-10-31 16:01:20 -07:00
Felarof	c2e4bd4605	Add CLA workflow	2025-08-21 14:58:08 -07:00

15 Commits