23 Commits

Author SHA1 Message Date
shivammittal274
51de62fcb6 fix(eval): use CLAUDE_CODE_OAUTH_TOKEN for performance grader auth 2026-03-21 23:14:05 +05:30
shivammittal274
f157436e7d feat(eval): switch to Linux GitHub-hosted runner (#519)
* feat(eval): switch to ubuntu-latest runner, add OE-Clado config

- Switch workflow from self-hosted Mac Studio to ubuntu-latest
- Install BrowserOS Linux .deb in CI (no self-hosted runner needed)
- Add browseros-oe-clado-weekly.json config for orchestrator-executor
- Fix report chart to show date+time (not just date)
- Make BROWSEROS_BINARY configurable via env var
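
The last item above, making BROWSEROS_BINARY configurable, could look roughly like the sketch below. It is an illustration, not the repository's code: the helper name `resolveBrowserBinary` and the fallback path are hypothetical, and only the variable name BROWSEROS_BINARY comes from the commit message.

```typescript
// Hypothetical helper: choose the browser binary from the environment,
// falling back to a default when BROWSEROS_BINARY is unset or blank.
type Env = Record<string, string | undefined>;

// Assumed default; the real install path is not stated in the commit.
const DEFAULT_BINARY = "/usr/bin/browseros";

function resolveBrowserBinary(env: Env, fallback: string = DEFAULT_BINARY): string {
  // Trim so that a value consisting only of whitespace counts as unset.
  const override = env["BROWSEROS_BINARY"]?.trim();
  return override ? override : fallback;
}
```

In a Node script this would presumably be called as `resolveBrowserBinary(process.env)`; taking the environment as a parameter keeps the helper easy to test.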

* feat(eval): add NopeCHA captcha solver extension to eval runs

- Auto-load NopeCHA extension in eval Chrome instances
- Works in incognito + headless mode
- CI workflow downloads NopeCHA before eval
- extensions/ directory gitignored (downloaded at runtime)

* feat(eval): per-config concurrency — different configs run in parallel

* feat(eval): remove concurrency limit — all runs execute in parallel
2026-03-21 23:04:45 +05:30
Nikhil
ba7892322b ci: run BrowserOS test suites on PRs (#514)
* ci: run browseros tests on pull requests

* refactor: rework 0320-github_action_for_tests based on feedback

* refactor: rework 0320-github_action_for_tests based on feedback

* chore: add CI artifacts to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove mikepenz/action-junit-report to fix check suite misattribution

The JUnit report action creates check runs that GitHub associates with the
CLA check suite instead of the Tests check suite, causing test reports to
appear under "CLA Assistant" in the PR checks UI.

Remove the action and rely on job status + step summary + artifact upload
for test result visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:36 -07:00
shivammittal274
4e90b4561a feat(eval): weekly eval pipeline with R2 uploads and trend dashboard (#516)
* feat(eval): weekly eval pipeline with R2 uploads and trend dashboard

Add infrastructure for running weekly evaluations and tracking score
trends over time:

- Auto-generated output dirs: results/{config-name}/{timestamp}/
  Each eval run gets its own timestamped folder, nothing is overwritten.

- upload-run.ts: uploads eval results to Cloudflare R2. Supports
  uploading a specific run or all un-uploaded runs for a config.

- weekly-report.ts: generates an interactive HTML dashboard from R2
  data. Config dropdown, trend chart with hover tooltips, searchable
  runs table. Groups runs by config name.

- viewer.html: client-facing 3-column run viewer (task list,
  screenshots with autoplay, agent stream with messages.jsonl).
  Shows performance grader axis breakdown with per-axis scores.

- browseros-agent-weekly.json: weekly benchmark config (kimi-k2p5,
  webbench-2of4-50, 10 workers, performance grader, headless).

- eval-weekly.yml: GitHub Actions workflow with cron (Saturday 6am)
  and manual trigger. Runs on self-hosted Mac Studio runner.
  Concurrency group ensures only one eval runs at a time.

- Dashboard updates: load previous runs, messages.jsonl viewer,
  grade badges show percentages, async stream loading.

- Grader updates: timeout 30min, max turns 100, DOM content
  verification guidance for performance grader.
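
The results/{config-name}/{timestamp}/ layout from the first item above might be derived along these lines. This is a sketch under assumptions: the function name `runOutputDir` is hypothetical, and the timestamp format is a guess, since the commit only says the folders are timestamped.

```typescript
// Hypothetical sketch of the results/{config-name}/{timestamp}/ layout.
function runOutputDir(configPath: string, now: Date): string {
  // Derive the config name from the file name, e.g. "browseros-agent-weekly".
  const fileName = configPath.split("/").pop() ?? configPath;
  const configName = fileName.replace(/\.json$/, "");
  // Assumed format: ISO 8601 in UTC with ":" and "." replaced by "-",
  // so the folder name avoids characters some filesystems reject.
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `results/${configName}/${timestamp}`;
}
```

Because the timestamp has millisecond resolution, two runs of the same config started at different instants get different folders, which matches the "nothing is overwritten" goal.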

* fix(eval): address Greptile review — injection, nested dirs, escaping

- Fix script injection in eval-weekly.yml: pass github.event.inputs
  through env var instead of interpolating into shell
- Fix /api/runs to enumerate nested results/{config}/{timestamp}/ dirs
- Fix /api/load-run to allow single-slash run names (config/timestamp)
- Add HTML escaping for R2-sourced values in weekly-report.ts
- Escape axis names in viewer.html renderAxesBreakdown
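
The HTML-escaping fixes listed above follow a common pattern, sketched here. The names `escapeHtml` and `renderRunCell` are hypothetical and the actual implementation in weekly-report.ts or viewer.html may differ; the idea is only that any string coming from storage is escaped before it is interpolated into markup.

```typescript
// Minimal HTML-escaping sketch for untrusted strings interpolated into markup.
const HTML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: string): string {
  // Single pass, so an "&" produced by one replacement is never escaped twice.
  return value.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch] ?? ch);
}

// Hypothetical use: rendering a run name fetched from storage into a table cell.
function renderRunCell(runName: string): string {
  return `<td>${escapeHtml(runName)}</td>`;
}
```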

* fix(eval): fix biome lint — non-null assertion, template literals

* fix(eval): fix biome errors — replace var with let, fix inner function declaration

* fix(eval): address Greptile P2 issues

- isRunDir: check all subdirs for metadata.json, not just first 3
- eval-runner: guard configPath for dashboard-driven runs (fallback to 'eval')
- load-run: default unknown termination_reason to 'failed' not 'completed'
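
Two of the fixes above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the list of known termination reasons, the helper names, and the reading that a directory is a run when any of its subdirectories contains metadata.json. The commit message only states that all subdirectories are checked and that unknown reasons default to 'failed'.

```typescript
// Hypothetical set of recognised values; the real list is not given in the commit.
const KNOWN_REASONS = ["completed", "failed", "timeout"] as const;
type TerminationReason = (typeof KNOWN_REASONS)[number];

// Unknown or missing reasons are treated as failures rather than successes.
function normalizeTerminationReason(raw: unknown): TerminationReason {
  const match = KNOWN_REASONS.find((reason) => reason === raw);
  return match ?? "failed";
}

// Assumed rule: a directory is a run if any subdirectory holds metadata.json.
// Every subdirectory is checked, with no cap on how many are inspected.
// `hasFile` is injected so the check can be exercised without touching the disk.
function isRunDir(subdirs: string[], hasFile: (path: string) => boolean): boolean {
  return subdirs.some((dir) => hasFile(`${dir}/metadata.json`));
}
```

Defaulting to 'failed' is the conservative choice: a run with a malformed or missing reason is not silently counted as a success in the dashboard.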

* feat(eval): make BROWSEROS_BINARY configurable via env var
2026-03-21 22:12:52 +05:30
Dani Akash
d965698905 fix: biome & tsc setup across repo (#493)
* fix: biome lint issues

* fix: code quality workflow

* fix: all lint issues

* chore: test lefthook pre-commit hook

* chore: test lefthook with agent file

* chore: revert test comment from lefthook verification

* feat: setup tsgo for typechecking agent

* fix: typecheck cli command

* fix: early return to prevent errors
2026-03-19 18:18:24 +05:30
Dani Akash
58adac17db feat: new workflows (#470) 2026-03-17 18:56:55 +05:30
Nikhil Sonti
304b3b3289 chore: remove update submodule sync 2026-03-13 09:14:57 -07:00
Nikhil Sonti
5cee158876 feat: update top issues action to include RFCs 2026-01-22 12:39:47 -08:00
Felarof
e0628e3506 Update SECURITY.md 2025-11-19 10:41:57 -08:00
Felarof
10efeb52dc Create SECURITY.md 2025-11-19 10:41:36 -08:00
Nikhil Sonti
05ca99bb62 fix: update top-issues yml to be more descriptive 2025-11-18 15:22:59 -08:00
Felarof
fa42090e31 Update github action 2025-11-07 09:21:11 -08:00
Felarof
a4f1fb14af Update update-agent-submodule.yml 2025-11-07 09:21:11 -08:00
Felarof
d8c7b96d4a Update github action 2025-11-07 09:13:59 -08:00
Felarof
1708d68cd3 Added github action to update agent submodule 2025-11-06 15:32:38 -08:00
Nikhil Sonti
913fb3f483 github action: top issues by vote 2025-10-31 16:01:20 -07:00
Nikhil Sonti
cfb2356a41 update issue template 2025-10-08 10:32:52 -07:00
Nikhil Sonti
3dbc5dc7ef minor 2025-10-08 10:31:33 -07:00
Nikhil Sonti
9cf520b957 update ISSUE_TEMPALTES 2025-10-08 10:29:12 -07:00
Nikhil Sonti
83c77ae41d update ISSUE_TEMPALTES 2025-10-08 10:25:00 -07:00
Nikhil Sonti
9fab4b0492 minor updates to template 2025-10-08 10:07:50 -07:00
Nikhil Sonti
cdb3461fb2 add issue template 2025-10-08 10:03:52 -07:00
Felarof
c2e4bd4605 Add CLA workflow 2025-08-21 14:58:08 -07:00