BrowserOS/thoughts/shared
Felarof 1abbee638a Braintrust basic evals (#87)
* implement validator eval

* implement online eval foundation

* further implementing online evals

* enhance evaluation data logging

* implement LLM scoring, remove redundant EventEnricher

* cleanup

* fix build errors from merging, extend LLM scorer context

* settled evaluation framework

* update evals documentation

* fix evals screenshots

* fix typos

* Evals config moved to env variables and tested

* test

* Update manifest to 49.1

* Removed duplicate + button

* Just use previous way of registering tools as that is not required for evals

* Add Claude commands for research, plan and implement

* evals2 research and plan

implementation plan

new implementation plan

* Evals2 implementation

test test

* Removed old eval hooks

Remove old evals hooks

* evals 2 added to env

* Eval2 enhancement plan

backup

* Make Braintrust project configurable

* Enhanced scorer: using Gemini 2.5 Pro for evaluation

backup v0.1

enhancement v0.2

v0.2

backup v0.3

backup v0.4

* Deleted old evals directory

* Clean up old evals code

* Bunch of fixes and improvements

backup

fixes 0.1

more fixes

fixes

more elaborate prompts

braintrust logger fix

* Renamed files

backup
2025-09-05 18:04:07 -07:00