runner
High-level eval runner: loads a dataset, runs the model evaluation, and computes a summary.
The runner is generic and domain-agnostic. Project-specific eval harnesses (e.g. an API search
eval) live in their own repos and reuse this infrastructure (score_rows, AliasMatcher,
load_eval_dataset, …).

func load_eval_dataset(path) -> list[dict[str, Any]]
Load an eval JSON file. Two shapes are supported:
- a list of rows `[{tool_slug, toolkit, queries}, ...]` (v2 shape), or
- a dict with a `results: [...]` key (older 3-query shape).
param path: str | Path
Returns
list[dict[str, typing.Any]]

func evaluate_model(*, dataset_path, aliases_path, client, secondary_aliases_path=None, config=None, output_dir=None, progress=True) -> EvalArtifacts
param dataset_path: str | Path
param aliases_path: str | Path
param client: InferenceClient
param secondary_aliases_path: str | Path | None = None
param config: ModelEvalConfig | None = None
param output_dir: str | Path | None = None
param progress: bool = True
Returns
evsys_sdk.eval.runner.EvalArtifacts

func _write_artifacts(artifacts, output_dir, *, kind) -> None
param artifacts: EvalArtifacts
param output_dir: str | Path
param kind: str
Returns
None
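
As a rough illustration of the two file shapes that `load_eval_dataset` accepts, the sketch below dispatches on the top-level JSON type. `load_rows_sketch` is a hypothetical name and is not part of the runner module. The sketch assumes both shapes are plain JSON and that the older shape keeps its rows under a `results` key. It returns rows unchanged, whereas the real function may also normalize older rows into the v2 shape.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def load_rows_sketch(path: str | Path) -> list[dict[str, Any]]:
    """Hypothetical stand-in for load_eval_dataset's shape handling (not the library's code)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        # v2 shape: a top-level list of {tool_slug, toolkit, queries} rows
        return data
    if isinstance(data, dict) and isinstance(data.get("results"), list):
        # Older shape (assumed): rows live under a "results" key and are returned as-is
        return data["results"]
    raise ValueError(f"unrecognized eval dataset shape in {path}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        v2 = Path(tmp, "v2.json")
        v2.write_text(json.dumps([{"tool_slug": "t1", "toolkit": "k", "queries": ["q"]}]))
        # Row fields for the older shape are placeholders; its real schema is not documented here
        old = Path(tmp, "old.json")
        old.write_text(json.dumps({"results": [{"tool_slug": "t1"}, {"tool_slug": "t2"}]}))
        print(len(load_rows_sketch(v2)), len(load_rows_sketch(old)))  # prints "1 2"
```

Dispatching on the top-level JSON type keeps older files loadable without a separate version field, and raising on anything else surfaces malformed files early instead of passing an empty dataset downstream.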