EvalResult
One benchmark scored against one arm at one moment.
Multiple EvalResult entries can attach to an ArmResult when an
experiment carries several benchmarks under metadata.benchmark
(e.g. a val set + a test set). The step field disambiguates
in-loop validation rows (with their training-step value) from a
single post-training row (step is None).
Attributes
name: str
benchmark_id: str | None
metrics: dict[str, float]
breakdowns: dict[str, Any]
eval_seconds: float
step: int | None = None
    None → scored once post-training; int → in-loop at that step.
tags: list[str] = field(default_factory=list)
Functions
__init__(self, name, benchmark_id, metrics, breakdowns, eval_seconds, step=None, tags=list()) -> None
Parameters
    self
    name: str
    benchmark_id: str | None
    metrics: dict[str, float]
    breakdowns: dict[str, Any]
    eval_seconds: float
    step: int | None = None
    tags: list[str] = list()
Returns
    None