EvalResult

One benchmark scored against one arm at one moment.

Multiple EvalResult entries can attach to an ArmResult when an experiment carries several benchmarks under metadata.benchmark (e.g. a validation set and a test set). The step field distinguishes in-loop validation rows, which carry their training-step value, from the single post-training row, where step is None.
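As a rough illustration of how step separates the two kinds of rows, the sketch below uses a stand-in dataclass that only mirrors the fields documented on this page. It is not the library's own class, and the helper split_by_phase, the benchmark ids, and the metric values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EvalResult:
    """Stand-in mirroring the documented fields; NOT the library's class."""
    name: str
    benchmark_id: Optional[str]  # documented as `str | None`
    metrics: dict[str, float]
    breakdowns: dict[str, Any]
    eval_seconds: float
    step: Optional[int] = None   # documented as `int | None`
    tags: list[str] = field(default_factory=list)


def split_by_phase(results):
    """Hypothetical helper: in-loop rows (int step) vs. post-training rows (step is None)."""
    in_loop = sorted((r for r in results if r.step is not None), key=lambda r: r.step)
    post = [r for r in results if r.step is None]
    return in_loop, post


# Made-up rows: two in-loop validation scores and one final test score.
rows = [
    EvalResult("val", "val-v1", {"accuracy": 0.71}, {}, 12.5, step=2000),
    EvalResult("val", "val-v1", {"accuracy": 0.64}, {}, 12.1, step=1000),
    EvalResult("test", "test-v1", {"accuracy": 0.70}, {}, 30.4),
]

in_loop, post = split_by_phase(rows)
print([r.step for r in in_loop])  # [1000, 2000]
print([r.name for r in post])     # ['test']
```

Testing `step is not None` rather than truthiness matters here: a row scored at step 0 would otherwise be misfiled as post-training.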

Attributes

attribute name: str

attribute benchmark_id: str | None

attribute metrics: dict[str, float]

attribute breakdowns: dict[str, Any]

attribute eval_seconds: float

attribute step: int | None = None

None → scored once post-training; int → in-loop at that step.

attribute tags: list[str] = field(default_factory=list)
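The field(default_factory=list) default suggests the class is a standard-library dataclass, in which case each instance would receive its own empty tags list. The sketch below shows that behaviour on a stand-in class reconstructed from the attribute list above; the class body and the sample values are assumptions, not the library's source.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EvalResult:
    """Stand-in reconstructed from the documented attributes; NOT the library's class."""
    name: str
    benchmark_id: Optional[str]  # documented as `str | None`
    metrics: dict[str, float]
    breakdowns: dict[str, Any]
    eval_seconds: float
    step: Optional[int] = None   # documented as `int | None`
    tags: list[str] = field(default_factory=list)


# Two rows built without passing `tags` (values are made up).
a = EvalResult("val", None, {"loss": 0.42}, {}, 3.0)
b = EvalResult("val", None, {"loss": 0.40}, {}, 3.1)

a.tags.append("smoke")   # mutate only the first row's tags
print(a.tags)            # ['smoke']
print(b.tags)            # []
print(a.step is None)    # True
```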

Functions

func __init__(self, name, benchmark_id, metrics, breakdowns, eval_seconds, step=None, tags=list()) -> None

param self

param name: str

param benchmark_id: str | None

param metrics: dict[str, float]

param breakdowns: dict[str, Any]

param eval_seconds: float

param step: int | None = None

param tags: list[str] = list()

Returns

None
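To make the parameter order concrete, here is a minimal sketch that calls the constructor positionally and then converts the row to a plain dict with dataclasses.asdict. It assumes the class is a standard dataclass and uses a stand-in definition with the documented fields; the argument values are invented for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class EvalResult:
    """Stand-in with the documented fields; NOT the library's class."""
    name: str
    benchmark_id: Optional[str]  # documented as `str | None`
    metrics: dict[str, float]
    breakdowns: dict[str, Any]
    eval_seconds: float
    step: Optional[int] = None   # documented as `int | None`
    tags: list[str] = field(default_factory=list)


# Positional arguments follow the documented order:
# name, benchmark_id, metrics, breakdowns, eval_seconds, step, tags.
result = EvalResult(
    "val",
    "val-v1",
    {"accuracy": 0.70},
    {"by_split": {"easy": 0.9}},
    30.4,
    5000,
    ["in-loop"],
)

record = asdict(result)  # plain dict, keys in field-declaration order
print(list(record))
# ['name', 'benchmark_id', 'metrics', 'breakdowns', 'eval_seconds', 'step', 'tags']
```

In practice, keyword arguments are usually easier to read for a seven-field constructor; the positional form is shown only to mirror the signature above.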
