mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-18 19:16:22 +00:00
* implement validator eval
* implement online eval foundation
* further implementing online evals
* enhance evaluation data logging
* implement LLM scoring, remove redundant EventEnricher
* cleanup
* fix build errors from merging, extend LLM scorer context
* settled evaluation framework
* update evals documentation
* fix evals screenshots
* fix typos
* Evals config moved to env variables and tested
* test
* Update manifest to 49.1
* Removed duplicate + button
* Just use previous way of registering tools as that is not required for evals
* Add claude commands for research, plan and implement
* evals2 research and plan
  implementation plan
  new implementation plan
* Evals2 implementation
  test
* Removed old eval hooks
  Remove old evals hooks
* evals 2 added to env
* Eval2 enhancement plan
  backup
* Make Braintrust project configurable
* Enhanced scorer -- using Gemini 2.5 Pro for evaluation
  backup v0.1
  enhancement v0.2
  v0.2
  backup v0.3
  backup v0.4
* Deleted old evals directory
* Clean up old evals code
* Bunch of fixes and improvements
  backup
  fixes 0.1
  more fixes
  fixes
  more elaborate prompts
  braintrust logger fix
* Renamed files
  backup