EvSys

ArmResult

One arm of the experiment (one expanded RunConfig).

Attributes

attribute name: str
attribute run_config: RunConfig
attribute status: str
attribute metrics: dict[str, float] = field(default_factory=dict)

Training-side metrics from RunResult.metrics.

attribute eval_metrics: dict[str, float] = field(default_factory=dict)

Backward-compatible alias that mirrors evals[<primary>].metrics, where <primary> is the first test-tagged post-training eval, falling back to the first post-training eval, then to the first eval overall.

attribute eval_breakdowns: dict[str, Any] = field(default_factory=dict)
attribute evals: list[EvalResult] = field(default_factory=list)

All benchmarks scored against this arm - see EvalResult.

attribute run_id: str | None = None
attribute error: str | None = None
attribute train_seconds: float | None = None
attribute eval_seconds: float | None = None
attribute run_result: RunResult | None = None

Raw underlying RunResult for advanced consumers.

attribute group_id: str | None = None

Dashboard group_id when n_repeats > 1 (else None).

attribute group_name: str | None = None

Name of the primary arm (which is also the group's name) when grouped; otherwise None.
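
As a rough illustration of how the primary eval behind eval_metrics could be chosen, the sketch below applies the three-step fallback described for that attribute. EvalStub, its is_test flag, and pick_primary are hypothetical stand-ins introduced for this example, not SDK names; the real EvalResult fields may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalStub:
    """Hypothetical stand-in for EvalResult; field names are assumptions."""

    name: str
    metrics: dict[str, float] = field(default_factory=dict)
    step: int | None = None   # None marks a post-training row
    is_test: bool = False     # assumed flag meaning "test-tagged"


def pick_primary(evals: list[EvalStub]) -> EvalStub | None:
    """Pick the primary eval using the documented fallback order."""
    post_training = [e for e in evals if e.step is None]
    # 1. First test-tagged post-training eval.
    for e in post_training:
        if e.is_test:
            return e
    # 2. Otherwise the first post-training eval.
    if post_training:
        return post_training[0]
    # 3. Otherwise the first eval overall (None when there are no evals).
    return evals[0] if evals else None


sample_evals = [
    EvalStub("val", {"pass_rate": 0.5}, step=100),
    EvalStub("val", {"pass_rate": 0.6}),
    EvalStub("test", {"pass_rate": 0.7}, is_test=True),
]
primary = pick_primary(sample_evals)
```

Under these assumptions, eval_metrics would mirror primary.metrics, here the metrics of the test-tagged row.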

Functions

func eval(self, name, *, step=None) -> EvalResult | None

Look up an eval result by benchmark name.

step=None (default) → returns the post-training row (the row whose step is None) if present; otherwise the last in-loop row for that benchmark.

step=<int> → returns the in-loop row recorded at exactly that step, or None if there is no exact match. Callers that want a nearest-step lookup can do their own search over arm.evals.
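
The lookup rules above can be sketched as follows. This is a minimal illustration and not the SDK's implementation: EvalStub, lookup_eval, and nearest_eval are hypothetical names, and the sketch assumes that rows in the list are stored in chronological order, so "last in-loop row" means the last matching list entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalStub:
    """Hypothetical stand-in for EvalResult; field names are assumptions."""

    name: str
    metrics: dict[str, float] = field(default_factory=dict)
    step: int | None = None  # None marks the post-training row


def lookup_eval(evals, name, *, step=None):
    """Mimic the documented eval() semantics over a plain list of rows."""
    rows = [e for e in evals if e.name == name]
    if step is not None:
        # Exact-step match only; no nearest-step fallback.
        return next((e for e in rows if e.step == step), None)
    # Prefer the post-training row when one exists.
    for e in rows:
        if e.step is None:
            return e
    # Otherwise fall back to the last in-loop row (assumes list order).
    return rows[-1] if rows else None


def nearest_eval(evals, name, step):
    """Caller-side nearest-step lookup over in-loop rows."""
    rows = [e for e in evals if e.name == name and e.step is not None]
    return min(rows, key=lambda e: abs(e.step - step), default=None)


rows = [
    EvalStub("bench", {"pass_rate": 0.2}, step=100),
    EvalStub("bench", {"pass_rate": 0.4}, step=200),
    EvalStub("bench", {"pass_rate": 0.5}),
]
final_row = lookup_eval(rows, "bench")
closest_row = nearest_eval(rows, "bench", 120)
```

With the real API, the equivalent of nearest_eval would iterate over arm.evals after arm.eval(name, step=...) returns None.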

param self
param name: str
param step: int | None = None

Returns

evsys_sdk.experiment.EvalResult | None

func score(self, metric) -> float | None

Return the metric value for ranking by success_metric.

Resolution rules:

  • "pass_rate" (bare) - looked up first on eval_metrics (the primary post-training eval), then on metrics (training-side).
  • "bench.pass_rate" (dotted) - looked up as arm.eval("bench").metrics["pass_rate"]. The dotted form takes precedence when an experiment carries multiple named benchmarks and success_metric picks one explicitly.
  • If neither path resolves, the result is None.
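
A minimal sketch of this resolution order, under stated assumptions: ArmStub, EvalStub, and resolve_score are hypothetical stand-ins, and the sketch assumes that a dotted name which does not resolve to a benchmark falls through to the bare lookup on the full string. The SDK may handle that case differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalStub:
    """Hypothetical stand-in for EvalResult."""

    name: str
    metrics: dict[str, float] = field(default_factory=dict)


@dataclass
class ArmStub:
    """Hypothetical stand-in carrying only the fields score() reads."""

    metrics: dict[str, float] = field(default_factory=dict)
    eval_metrics: dict[str, float] = field(default_factory=dict)
    evals: list[EvalStub] = field(default_factory=list)

    def eval(self, name: str) -> EvalStub | None:
        # Simplified lookup: first row with a matching benchmark name.
        return next((e for e in self.evals if e.name == name), None)


def resolve_score(arm: ArmStub, metric: str) -> float | None:
    # Dotted form: "<benchmark>.<metric>" is tried first.
    if "." in metric:
        bench, key = metric.split(".", 1)
        row = arm.eval(bench)
        if row is not None and key in row.metrics:
            return row.metrics[key]
    # Bare form: primary eval metrics first, then training-side metrics.
    if metric in arm.eval_metrics:
        return arm.eval_metrics[metric]
    return arm.metrics.get(metric)  # None when nothing resolves


arm = ArmStub(
    metrics={"loss": 0.3, "pass_rate": 0.1},
    eval_metrics={"pass_rate": 0.7},
    evals=[EvalStub("bench", {"pass_rate": 0.9})],
)
bare = resolve_score(arm, "pass_rate")
dotted = resolve_score(arm, "bench.pass_rate")
```

Because a missing metric yields None, code that ranks arms by score should filter out None values before sorting.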

param self
param metric: str

Returns

float | None

func __init__(self, name, run_config, status, metrics=dict(), eval_metrics=dict(), eval_breakdowns=dict(), evals=list(), run_id=None, error=None, train_seconds=None, eval_seconds=None, run_result=None, group_id=None, group_name=None) -> None
param self
param name: str
param run_config: RunConfig
param status: str
param metrics: dict[str, float] = dict()
param eval_metrics: dict[str, float] = dict()
param eval_breakdowns: dict[str, Any] = dict()
param evals: list[EvalResult] = list()
param run_id: str | None = None
param error: str | None = None
param train_seconds: float | None = None
param eval_seconds: float | None = None
param run_result: RunResult | None = None
param group_id: str | None = None
param group_name: str | None = None

Returns

None
