BrowserOS

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-18 19:16:22 +00:00


Felarof 1abbee638a Braintrust basic evals (#87)

* implement validator eval

* implement online eval foundation

* further implementing online evals

* enhance evaluation data logging

* implement LLM scoring, remove redundant EventEnricher

* cleanup

* fix build errs from merging, extend LLM scorer context

* settled evaluation framework

* update evals documentation

* fix evals screenshots

* fix typos

* Evals config moved to env variables and tested

* test

* Update manifest to 49.1

* Removed duplicate + button

* Just use previous way of registering tools as that is not required for evals

* Add claude commands for research, plan and implement

* evals2 research and plan

implementation plan

new implementation plan

* Evals2 implementation

test test

* Removed old eval hooks

Remove old evals hooks

* evals 2 added to env

* Eval2 enhancement plan

backup

* Make Braintrust project configurable


* Enhanced scorer: using Gemini 2.5 Pro for evaluation

backup v0.1

enhancement v0.2

v0.2

backup v0.3

backup v0.4

* Deleted old evals directory

* Clean up old evals code

* Bunch of fixes and improvements

backup

fixes 0.1

more fixes

fixes

more elaborate prompts

braintrust logger fix

* Renamed files

backup

2025-09-05 18:04:07 -07:00

plans       Braintrust basic evals (#87)    2025-09-05 18:04:07 -07:00
research    Braintrust basic evals (#87)    2025-09-05 18:04:07 -07:00