shivammittal274
|
d383b5e344
|
feat(eval): add claude-generated run report artifact (#892)
* feat(eval): add claude-generated run report artifact
* fix(eval): install claude code cli for CI evals
* fix(eval): bypass claude code tool permissions
* Eval metrics configs (#932)
* feat(eval): add agisdk comparison metrics configs
* fix(eval): keep cdp crashes from aborting run
|
2026-05-04 21:09:06 +05:30 |
|
Nikhil
|
26afb826c6
|
feat(eval): add viewer manifest contract (#878)
* refactor(eval): canonicalize viewer manifest contract
* refactor(eval): publish canonical viewer manifests
* feat(eval): make r2 viewer use manifest artifact paths
* fix(eval): keep weekly report compatible with viewer manifests
* docs(eval): document r2 viewer manifest contract
* chore: self-review fixes
* fix: address review feedback for PR #878
|
2026-04-29 20:50:35 -07:00 |
|
Nikhil
|
84a79ba0a1
|
feat: refactor eval pipeline workflow (#875)
* feat(eval): add suite variant config bridge
* feat(eval): add stable run artifacts
* refactor(eval): add shared grader contract
* feat(eval): persist grader artifacts
* refactor(eval): rename runner layers
* refactor(eval): add executor backend boundary
* refactor(eval): split clado backend
* feat(eval): add workflow compatible cli
* feat(eval): add r2 publisher module
* ci(eval): migrate weekly workflow to eval cli
* docs(eval): document suite pipeline
* chore(eval): verify pipeline refactor
* fix: address review feedback for PR #875
* docs(eval): add env example
* docs(eval): explain suites and variants
* chore(eval): organize config layouts
* chore(eval): colocate grader python evaluators
|
2026-04-29 17:21:02 -07:00 |
|