BrowserOS

LLM/BrowserOS

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-21 12:55:09 +00:00

Commit Graph

Author: Felarof

SHA1: 1abbee638a

Message:

Braintrust basic evals (#87)

* implement validator eval

* implement online eval foundation

* further implementing online evals

* enhance evaluation data logging

* implement LLM scoring, remove redundant EventEnricher

* cleanup

* fix build errs from merging, extend LLM scorer context

* settled evaluation framework

* update evals documentation

* fix evals screenshots

* fix typos

* Evals config moved to env variables and tested

* test

* Update manifest to 49.1

* Removed duplicate + button

* Just use previous way of registering tools as that is not required for evals

* Add claude commands for research, plan and implement

* evals2 research and plan

implementation plan

new implementation plan

* Evals2 implementation

test test

* Removed old eval hooks

Remove old evals hooks

* evals 2 added to env

* Eval2 enhancement plan

backup

* Make Braintrust project configurable

Make Braintrust project configurable

* Enhanced scorer -- using Gemini 2.5 pro for evaluation

backup v0.1

enhancement v0.2

v0.2

backup v0.3

backup v0.4

* Deleted old evals directory

* Clean up old evals code

* Bunch of fixes and improvements

backup

fixes 0.1

more fixes

fixes

more elaborate prompts

braintrust logger fix

* Renamed files

backup

Date: 2025-09-05 18:04:07 -07:00

1 Commit