EvSys

EvalResult

One benchmark scored against one arm at one moment.

Multiple EvalResult entries can attach to an ArmResult when an experiment carries several benchmarks under metadata.benchmark (for example, a validation set and a test set). The step field distinguishes in-loop validation rows, which carry their training-step value, from a single post-training row, where step is None.

Attributes

attribute name: str
attribute benchmark_id: str | None
attribute metrics: dict[str, float]
attribute breakdowns: dict[str, Any]
attribute eval_seconds: float
attribute step: int | None = None

None → scored once post-training; int → in-loop at that step.

attribute tags: list[str] = field(default_factory=list)
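
The attribute list above can be mirrored as a plain dataclass. What follows is a minimal sketch reconstructed only from the documented field names, types, and defaults; it is not the library's actual source. The names `EvalResultSketch` and `split_rows`, along with the benchmark ids and metric values, are hypothetical and exist purely for illustration. It shows one way to separate in-loop rows from the post-training row using `step`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvalResultSketch:
    """Hypothetical stand-in mirroring the documented EvalResult fields."""

    name: str
    benchmark_id: str | None
    metrics: dict[str, float]
    breakdowns: dict[str, Any]
    eval_seconds: float
    step: int | None = None  # None: scored once post-training
    tags: list[str] = field(default_factory=list)


def split_rows(rows):
    """Return (in-loop rows sorted by step, post-training rows)."""
    in_loop = sorted(
        (r for r in rows if r.step is not None), key=lambda r: r.step
    )
    final = [r for r in rows if r.step is None]
    return in_loop, final


# Illustrative values only; these are not real results.
rows = [
    EvalResultSketch("val", "val-set", {"accuracy": 0.71}, {}, 12.5, step=2000),
    EvalResultSketch("val", "val-set", {"accuracy": 0.64}, {}, 12.1, step=1000),
    EvalResultSketch("test", "test-set", {"accuracy": 0.70}, {}, 30.0),
]

in_loop, final = split_rows(rows)
print([r.step for r in in_loop])
print([r.name for r in final])
```

Testing `step is not None` rather than truthiness matters here, since a row scored at step 0 would otherwise be misclassified as post-training.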

Functions

func __init__(self, name, benchmark_id, metrics, breakdowns, eval_seconds, step=None, tags=list()) -> None
param self
param name: str
param benchmark_id: str | None
param metrics: dict[str, float]
param breakdowns: dict[str, Any]
param eval_seconds: float
param step: int | None = None
param tags: list[str] = list()

Returns

None
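
The rendered signature shows `tags=list()`, which could be read as a single list shared by every instance. Assuming EvalResult is a standard Python dataclass, as the `field(default_factory=list)` default on the attribute suggests, each instance would instead get its own fresh list. The sketch below illustrates that behavior with a hypothetical one-field class, `TaggedSketch`, which is not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedSketch:
    """Hypothetical one-field stand-in, used only to show default_factory."""

    tags: list = field(default_factory=list)


a = TaggedSketch()
b = TaggedSketch()
a.tags.append("smoke")  # mutate only the first instance's list

print(a.tags)
print(b.tags)
print(a.tags is b.tags)
```

Under that assumption, appending to one result's `tags` leaves other results untouched.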
