EvSys

report

Pass@1/pass@3/pass^3 aggregation + retry-failure section.

Works on any per-row, per-query result shape: the caller supplies a function that maps one query-result to a bool found, so the same scorer serves model evals and any custom eval harness.

funcmodel_query_found(qresult, expected_slug, matcher) -> bool
paramqresultdict[str, Any]
paramexpected_slugstr
parammatcherAliasMatcher

Returns

bool
funcscore_rows(rows, *, matcher, query_found_fn, retry_report=None) -> EvalSummary
paramrowslist[dict[str, Any]]
parammatcherAliasMatcher
paramquery_found_fnCallable[[dict[str, Any], str, AliasMatcher], bool]
paramretry_reportRetryReport | None
= None

Returns

evsys_sdk.eval.report.EvalSummary
funcformat_summary_markdown(summary, *, title='Eval Summary') -> str
paramsummaryEvalSummary
paramtitlestr
= 'Eval Summary'

Returns

str