shivammittal274 e51e2fad90 feat(eval): wire BrowserOS MCP into performance grader
Performance grader now connects to the live BrowserOS instance the agent
just used (still on the task page during Phase 3 grading) and can verify
state-change claims via read-only mcp__browseros__* tools. The system
prompt teaches per-axis usage and caps live calls at 2-3 per task.
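
The commit enforces the read-only restriction and the call cap through the system prompt. As an illustration only, the same two constraints could also be checked in code. The sketch below is not taken from the repository: the class `LiveCallBudget`, the exception `LiveCallBudgetExceeded`, the `call_tool` callback, and the two tool names are all hypothetical, and the cap of 3 is simply the upper end of the "2-3 per task" range.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical read-only tool names. The commit only states the
# mcp__browseros__* prefix; the real tool list is not given.
READ_ONLY_TOOLS = frozenset({
    "mcp__browseros__get_page_content",
    "mcp__browseros__take_screenshot",
})


class LiveCallBudgetExceeded(RuntimeError):
    """Raised when the grader asks for more live calls than allowed."""


@dataclass
class LiveCallBudget:
    """Wraps a tool-calling function with an allowlist and a per-task cap."""

    # Assumed callback that performs the actual MCP tool call.
    call_tool: Callable[[str, Dict[str, Any]], Any]
    max_calls: int = 3  # assumed upper bound of the "2-3 per task" cap
    used: List[str] = field(default_factory=list)

    def call(self, name: str, args: Optional[Dict[str, Any]] = None) -> Any:
        # Reject anything outside the read-only allowlist before spending budget.
        if name not in READ_ONLY_TOOLS:
            raise PermissionError(f"{name} is not an allowed read-only tool")
        # Refuse the call once the per-task budget is used up.
        if len(self.used) >= self.max_calls:
            raise LiveCallBudgetExceeded(
                f"live call cap of {self.max_calls} reached"
            )
        self.used.append(name)
        return self.call_tool(name, args or {})
```

A grader would create one `LiveCallBudget` per task and route every live verification through `call`, so a rejected tool name never consumes budget and the fourth allowed call fails loudly instead of silently slowing the run.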

Adds mind2web-e2e-perf suite (10 online-mind2web tasks, Bedrock
Opus 4.6) for smoke-testing the new path.
2026-05-05 22:43:41 +05:30