ArmResult
One arm of the experiment (one expanded RunConfig).
Attributes
attribute name: str
attribute run_config: RunConfig
attribute status: str
attribute metrics: dict[str, float] = field(default_factory=dict)
    Training-side metrics from RunResult.metrics.
attribute eval_metrics: dict[str, float] = field(default_factory=dict)
    Back-compat alias that mirrors evals[<primary>].metrics. The primary eval is the first test-tagged post-training eval, else the first post-training eval, else the first eval overall.
attribute eval_breakdowns: dict[str, Any] = field(default_factory=dict)
attribute evals: list[EvalResult] = field(default_factory=list)
    All benchmarks scored against this arm; see EvalResult.
attribute run_id: str | None = None
attribute error: str | None = None
attribute train_seconds: float | None = None
attribute eval_seconds: float | None = None
attribute run_result: RunResult | None = None
    Raw underlying RunResult for advanced consumers.
attribute group_id: str | None = None
    Dashboard group_id when n_repeats > 1 (else None).
attribute group_name: str | None = None
    Primary's name (= the group's name) when grouped; else None.
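The fallback order that picks the primary eval behind eval_metrics can be sketched as follows. This is only an illustration of the three-step rule described for eval_metrics, not the SDK's implementation: EvalRow is a hypothetical stand-in for EvalResult, and the is_test flag is an assumed way of representing "test-tagged" (the real field may differ).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalRow:
    # Hypothetical stand-in for EvalResult; all field names are assumptions.
    name: str
    step: Optional[int] = None   # None marks a post-training row
    is_test: bool = False        # assumed representation of "test-tagged"
    metrics: dict = field(default_factory=dict)


def pick_primary(evals):
    """Return the primary eval using the documented fallback order."""
    post_training = [e for e in evals if e.step is None]
    # 1. First test-tagged post-training eval.
    for e in post_training:
        if e.is_test:
            return e
    # 2. Otherwise the first post-training eval.
    if post_training:
        return post_training[0]
    # 3. Otherwise the first eval overall (or None when there are no evals).
    return evals[0] if evals else None
```

Under these assumptions, eval_metrics would correspond to pick_primary(arm.evals).metrics whenever at least one eval exists.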
Functions
func eval(self, name, *, step=None) -> EvalResult | None
    Look up an eval result by benchmark name.
    step=None (default): return the post-training row (the one whose step is None) if present; otherwise the last in-loop row for that benchmark.
    step=<int>: return the exact in-loop row at that step, or None if there is no exact match. Callers can do their own nearest-step lookup over arm.evals.
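The lookup rules above can be sketched with a small stand-in. This is a minimal illustration, not the SDK's code: EvalRow and both helper functions are hypothetical, the field names are assumptions, and "last in-loop row" is taken to mean the last matching row in list order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalRow:
    # Hypothetical stand-in for EvalResult; field names are assumptions.
    name: str
    step: Optional[int] = None   # None marks the post-training row
    metrics: dict = field(default_factory=dict)


def lookup_eval(evals, name, *, step=None):
    """Mimic the documented eval() rules over a plain list of rows."""
    rows = [e for e in evals if e.name == name]
    if step is not None:
        # Exact in-loop match only; there is no nearest-step fallback.
        return next((e for e in rows if e.step == step), None)
    # Prefer the post-training row (step is None).
    for e in rows:
        if e.step is None:
            return e
    # Otherwise fall back to the last in-loop row (list order assumed).
    return rows[-1] if rows else None


def nearest_step(evals, name, step):
    """Caller-side nearest-step lookup over in-loop rows."""
    rows = [e for e in evals if e.name == name and e.step is not None]
    return min(rows, key=lambda e: abs(e.step - step), default=None)
```

A caller that wants a tolerant lookup could try the exact step first and fall back to something like nearest_step(arm.evals, name, step) when the result is None.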
    param self
    param name: str
    param step: int | None = None
    Returns
    evsys_sdk.experiment.EvalResult | None
func score(self, metric) -> float | None
    Return the metric value for ranking by success_metric.
    Two forms are supported:
    - "pass_rate" (bare): looked up on eval_metrics (the primary post-training eval), then on metrics (training-side).
    - "bench.pass_rate" (dotted): looked up on arm.eval("bench").metrics["pass_rate"]. The dotted form wins when an experiment carries multiple named benchmarks and success_metric picks one explicitly.
    Returns None if neither path resolves.
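The resolution order for score can be sketched as a standalone function. This is a hedged illustration rather than the SDK's implementation: resolve_score and its benchmarks mapping (benchmark name to that benchmark's metrics dict) are hypothetical, and two details are assumptions not confirmed by the text above: the metric is split on its first dot, and a dotted name that does not resolve falls through to the bare lookup.

```python
from typing import Mapping, Optional


def resolve_score(
    metric: str,
    eval_metrics: Mapping[str, float],
    metrics: Mapping[str, float],
    benchmarks: Mapping[str, Mapping[str, float]],
) -> Optional[float]:
    """Resolve a success metric: dotted form first, then the bare form."""
    if "." in metric:
        # Dotted form: "<bench>.<key>" is read from that benchmark's metrics.
        bench, key = metric.split(".", 1)
        value = benchmarks.get(bench, {}).get(key)
        if value is not None:
            return value
    # Bare form: primary post-training eval first, then training-side metrics.
    if metric in eval_metrics:
        return eval_metrics[metric]
    # Returns None when neither path resolves.
    return metrics.get(metric)
```

With two named benchmarks, a success_metric such as "bench_b.pass_rate" selects that benchmark explicitly, while a bare "pass_rate" reads from whichever eval is primary.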
    param self
    param metric: str
    Returns
    float | None
func __init__(self, name, run_config, status, metrics=dict(), eval_metrics=dict(), eval_breakdowns=dict(), evals=list(), run_id=None, error=None, train_seconds=None, eval_seconds=None, run_result=None, group_id=None, group_name=None) -> None
    param self
    param name: str
    param run_config: RunConfig
    param status: str
    param metrics: dict[str, float] = dict()
    param eval_metrics: dict[str, float] = dict()
    param eval_breakdowns: dict[str, Any] = dict()
    param evals: list[EvalResult] = list()
    param run_id: str | None = None
    param error: str | None = None
    param train_seconds: float | None = None
    param eval_seconds: float | None = None
    param run_result: RunResult | None = None
    param group_id: str | None = None
    param group_name: str | None = None
    Returns
    None