Mirror of https://github.com/browseros-ai/BrowserOS.git (synced 2026-05-18 02:57:47 +00:00)
The performance grader now connects to the live BrowserOS instance the agent just used, which is still on the task page during Phase 3 grading, and can verify state-change claims via read-only mcp__browseros__* tools. The system prompt teaches per-axis usage and caps live calls at 2-3 per task. Adds the mind2web-e2e-perf suite (10 online-mind2web tasks, Bedrock Opus 4.6) for smoke-testing the new path.
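The commit describes the call cap as something the system prompt teaches; it does not say the cap is enforced in code. As a rough illustration only, a hard guard around the same two rules (read-only tools, at most 3 live calls per task) might look like the sketch below. The class `LiveCallBudget`, its method `try_call`, and the two tool names in `READ_ONLY_TOOLS` are hypothetical and do not come from the BrowserOS codebase; only the `mcp__browseros__` prefix and the 2-3 call limit are taken from the message above.

```python
from dataclasses import dataclass, field
from typing import List

# Tool-name prefix taken from the commit message.
TOOL_PREFIX = "mcp__browseros__"

# Hypothetical read-only tool names; the real allowlist is not given in the source.
READ_ONLY_TOOLS = frozenset({
    TOOL_PREFIX + "get_page_content",
    TOOL_PREFIX + "take_snapshot",
})


@dataclass
class LiveCallBudget:
    """Tracks live verification calls made while grading one task."""

    max_calls: int = 3  # upper end of the 2-3 per-task cap in the commit message
    used: List[str] = field(default_factory=list)

    def try_call(self, tool_name: str) -> bool:
        """Return True and record the call if it is read-only and within budget."""
        if tool_name not in READ_ONLY_TOOLS:
            return False  # reject anything that could change page state
        if len(self.used) >= self.max_calls:
            return False  # budget for this task is exhausted
        self.used.append(tool_name)
        return True
```

A grader loop would create one `LiveCallBudget` per task and consult `try_call` before dispatching each live tool request. Rejected calls fall back to grading from the recorded trajectory alone.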