report
pass@1 / pass@3 / pass^3 aggregation, plus the retry-failure section.
Works on any per-row, per-query result shape: the caller supplies a function
that maps one query result to a boolean "found" value, so the same scorer
serves model evals and any custom eval harness.
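
As a rough illustration of the aggregation described above, the sketch below computes the three metrics under one common reading: pass@1 counts rows whose first attempt succeeds, pass@3 counts rows where at least one of three attempts succeeds, and pass^3 counts rows where all three succeed. The row shape (an `attempts` list), the single-argument `found_fn`, and the name `aggregate_pass_metrics` are assumptions made for this example, not the module's actual definitions, which may differ.

```python
from typing import Any, Callable


def aggregate_pass_metrics(
    rows: list[dict[str, Any]],
    found_fn: Callable[[dict[str, Any]], bool],
    k: int = 3,
) -> dict[str, float]:
    """Return pass@1, pass@k and pass^k as fractions of rows.

    Assumed (hypothetical) row shape: row["attempts"] is a list of
    per-attempt result dicts, one per retry of the same query.
    """
    keys = ("pass@1", f"pass@{k}", f"pass^{k}")
    if not rows:
        return dict.fromkeys(keys, 0.0)

    first_hit = any_hit = all_hit = 0
    for row in rows:
        # The caller-supplied predicate decides what "found" means.
        hits = [found_fn(attempt) for attempt in row["attempts"][:k]]
        first_hit += bool(hits and hits[0])          # first attempt succeeded
        any_hit += any(hits)                         # at least one success
        all_hit += len(hits) == k and all(hits)      # every attempt succeeded

    n = len(rows)
    return dict(zip(keys, (first_hit / n, any_hit / n, all_hit / n)))


rows = [
    {"attempts": [{"found": True}, {"found": True}, {"found": True}]},
    {"attempts": [{"found": False}, {"found": True}, {"found": False}]},
    {"attempts": [{"found": False}, {"found": False}, {"found": False}]},
    {"attempts": [{"found": True}, {"found": False}, {"found": True}]},
]
metrics = aggregate_pass_metrics(rows, lambda attempt: attempt["found"])
```

Keeping the predicate separate from the counting is what lets one scorer work across result shapes: only `found_fn` needs to know how a single result is laid out.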
func model_query_found(qresult, expected_slug, matcher) -> bool
    param qresult: dict[str, Any]
    param expected_slug: str
    param matcher: AliasMatcher
    Returns: bool

func score_rows(rows, *, matcher, query_found_fn, retry_report=None) -> EvalSummary
    param rows: list[dict[str, Any]]
    param matcher: AliasMatcher
    param query_found_fn: Callable[[dict[str, Any], str, AliasMatcher], bool]
    param retry_report: RetryReport | None = None
    Returns: evsys_sdk.eval.report.EvalSummary

func format_summary_markdown(summary, *, title='Eval Summary') -> str
    param summary: EvalSummary
    param title: str = 'Eval Summary'
    Returns: str
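
A custom `query_found_fn` only has to follow the documented shape `Callable[[dict[str, Any], str, AliasMatcher], bool]`. The sketch below shows what such a function might look like for a harness that returns ranked candidates. Everything here except the three-argument shape is an assumption: `LowercaseMatcher` is a hypothetical stand-in for `AliasMatcher` (whose real interface is not shown in this reference), and the `candidates` key and the `matches` method are invented for illustration.

```python
from typing import Any


class LowercaseMatcher:
    """Hypothetical stand-in for AliasMatcher; not the SDK's real class."""

    def __init__(self, aliases: dict[str, set[str]]) -> None:
        # Map each canonical slug to its accepted aliases, case-insensitively.
        self._aliases = {
            slug.lower(): {alias.lower() for alias in names}
            for slug, names in aliases.items()
        }

    def matches(self, candidate: str, expected_slug: str) -> bool:
        cand, slug = candidate.lower(), expected_slug.lower()
        return cand == slug or cand in self._aliases.get(slug, set())


def top_k_query_found(
    qresult: dict[str, Any],
    expected_slug: str,
    matcher: LowercaseMatcher,
) -> bool:
    """Return True if any of the first three candidates matches the slug.

    Assumed (hypothetical) result shape: qresult["candidates"] is a ranked
    list of strings. A missing key counts as "not found".
    """
    candidates = qresult.get("candidates", [])[:3]
    return any(matcher.matches(c, expected_slug) for c in candidates)


matcher = LowercaseMatcher({"new-york": {"nyc"}})
hit = top_k_query_found({"candidates": ["Boston", "NYC"]}, "new-york", matcher)
miss = top_k_query_found({"candidates": ["a", "b", "c", "nyc"]}, "new-york", matcher)
```

With the real SDK, a function of this shape would presumably be passed as the `query_found_fn` keyword argument of `score_rows`, alongside an actual `AliasMatcher`.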