EvSys

runner

High-level eval runner: loads a dataset, runs the model eval, and computes a summary.

Generic, domain-agnostic. Project-specific eval harnesses (e.g. an API search eval) live in their own repos and reuse this infra (score_rows, AliasMatcher, load_eval_dataset, …).

func load_eval_dataset(path) -> list[dict[str, Any]]

Load an eval JSON file. Supports both:

  • list of rows [{tool_slug, toolkit, queries}, ...] (v2 shape), or
  • dict with results: [...] (older 3-query shape).

param path: str | Path

Returns

list[dict[str, typing.Any]]
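
The two accepted file shapes can be illustrated with a minimal sketch. This is not the library's implementation: `load_rows` is a hypothetical stand-in, and it assumes that rows from the older `results` shape are returned unchanged (the real `load_eval_dataset` may normalize them further).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def load_rows(path: Union[str, Path]) -> list[dict[str, Any]]:
    """Hypothetical loader: return the row list from either file shape."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)

    # v2 shape: the file is already a list of rows.
    if isinstance(data, list):
        return data
    # Older shape: the rows sit under a top-level "results" key.
    if isinstance(data, dict) and isinstance(data.get("results"), list):
        return data["results"]
    raise ValueError(f"Unrecognized eval dataset shape in {path}")


# Small demonstration: write one file per shape and load both.
sample_rows = [{"tool_slug": "example_tool", "toolkit": "example", "queries": ["q1"]}]
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "v2.json"
    dict_file = Path(tmp) / "old.json"
    list_file.write_text(json.dumps(sample_rows), encoding="utf-8")
    dict_file.write_text(json.dumps({"results": sample_rows}), encoding="utf-8")
    from_list = load_rows(list_file)
    from_dict = load_rows(dict_file)
```

Under these assumptions both calls yield the same list of rows, so downstream code does not need to know which shape was on disk.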

func evaluate_model(*, dataset_path, aliases_path, client, secondary_aliases_path=None, config=None, output_dir=None, progress=True) -> EvalArtifacts
param dataset_path: str | Path
param aliases_path: str | Path
param client: InferenceClient
param secondary_aliases_path: str | Path | None = None
param config: ModelEvalConfig | None = None
param output_dir: str | Path | None = None
param progress: bool = True

Returns

evsys_sdk.eval.runner.EvalArtifacts
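
The leading `*` in the `evaluate_model` signature makes every parameter keyword-only. The sketch below shows that calling convention with a hypothetical stand-in, `evaluate_model_stub`, which copies the documented parameter list but only echoes its arguments. The file paths and the `object()` client are placeholders, not real library values.

```python
from typing import Any


def evaluate_model_stub(
    *,
    dataset_path,
    aliases_path,
    client,
    secondary_aliases_path=None,
    config=None,
    output_dir=None,
    progress=True,
) -> dict[str, Any]:
    """Stand-in with the documented parameter list; returns its arguments."""
    return {
        "dataset_path": dataset_path,
        "aliases_path": aliases_path,
        "client": client,
        "secondary_aliases_path": secondary_aliases_path,
        "config": config,
        "output_dir": output_dir,
        "progress": progress,
    }


# Required arguments must be passed by name; optional ones keep their defaults.
call = evaluate_model_stub(
    dataset_path="data/eval.json",        # placeholder path
    aliases_path="data/aliases.json",     # placeholder path
    client=object(),                      # placeholder for an InferenceClient
    progress=False,
)

# Passing the same values positionally is rejected by Python itself.
try:
    evaluate_model_stub("data/eval.json", "data/aliases.json", object())
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

Keyword-only parameters keep call sites readable when a function takes several path arguments of the same type, since swapping `dataset_path` and `aliases_path` by accident is no longer possible.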

func _write_artifacts(artifacts, output_dir, *, kind) -> None
param artifacts: EvalArtifacts
param output_dir: str | Path
param kind: str

Returns

None