EvSys

Callback

Override any of these. Defaults are no-ops so callbacks don't have to implement every hook.

Hook signatures are positional + typed (not a generic event-bag) so IDE autocomplete and static type checking still work. Each hook fires at exactly one moment in the loop - see the docstrings.

All hooks are synchronous (def, not async def). To schedule async work, call asyncio.create_task(...) from inside the hook; this requires an event loop to be running in the current thread.
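A minimal sketch of scheduling async work from a synchronous hook. The `Callback` class here is a stand-in with a single no-op hook, not the real base class, and `AsyncUploadCallback`, `_upload`, and `demo` are hypothetical names used only for illustration.

```python
import asyncio


class Callback:
    """Stand-in for the documented base class (assumption: no-op defaults)."""

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        pass


class AsyncUploadCallback(Callback):
    """Hypothetical callback that hands work to the event loop from a sync hook."""

    def __init__(self) -> None:
        self.tasks = set()
        self.uploaded = []

    async def _upload(self, step_idx, metrics) -> None:
        await asyncio.sleep(0)  # placeholder for real async I/O
        self.uploaded.append(step_idx)

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        # create_task raises RuntimeError if no event loop is running here.
        task = asyncio.create_task(self._upload(step_idx, dict(metrics)))
        # Hold a strong reference so the task is not garbage-collected early.
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)


async def demo():
    cb = AsyncUploadCallback()
    for step in range(3):  # hand-rolled stand-in for the training loop
        cb.on_step_end(None, step, None, {"loss": 1.0 / (step + 1)})
    await asyncio.gather(*cb.tasks)
    return cb.uploaded


uploaded_steps = asyncio.run(demo())
```

The hook itself returns immediately; the coroutine only runs once the surrounding code yields to the event loop, so anything that must finish before the run ends has to be awaited or gathered somewhere.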

Functions

func on_train_start(self, state) -> None

Fires once before the for-loop starts. Open log files, register a wandb run, snapshot the config - anything that should happen before the first step.

param self
param state: LoopState

Returns

None
func on_train_end(self, state, artifacts) -> None

Fires once after the loop completes (including the final checkpoint save). Flush summary writes, close files.

param self
param state: LoopState
param artifacts: 'LoopArtifacts'

Returns

None
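As a sketch of how on_train_start and on_train_end pair up, the callback below opens a JSONL file before the first step and closes it after the loop. The `Callback` base is a stand-in, `JsonlLogCallback` is a hypothetical name, and the loop is simulated by hand; `state` and `artifacts` are passed as `None` because their fields are not described here.

```python
import json
import tempfile
from pathlib import Path


class Callback:
    """Stand-in base class with no-op defaults (not the real one)."""

    def on_train_start(self, state) -> None:
        pass

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        pass

    def on_train_end(self, state, artifacts) -> None:
        pass


class JsonlLogCallback(Callback):
    """Hypothetical logger: one JSON line per step plus a summary line."""

    def __init__(self, path) -> None:
        self.path = Path(path)
        self._fh = None
        self.rows_written = 0

    def on_train_start(self, state) -> None:
        # Open the sink once, before the first step.
        self._fh = self.path.open("w", encoding="utf-8")

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        self._fh.write(json.dumps({"step": step_idx, **metrics}) + "\n")
        self.rows_written += 1

    def on_train_end(self, state, artifacts) -> None:
        # Write the summary, then release the file handle.
        self._fh.write(json.dumps({"summary": {"rows": self.rows_written}}) + "\n")
        self._fh.close()
        self._fh = None


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "train.jsonl"
    cb = JsonlLogCallback(log_path)
    cb.on_train_start(None)
    for step in range(3):  # simulated loop
        cb.on_step_end(None, step, None, {"loss": 1.0 / (step + 1)})
    cb.on_train_end(None, None)
    lines = log_path.read_text(encoding="utf-8").splitlines()
```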
func on_step_end(self, state, step_idx, batch, metrics) -> None

Fires after every train step's metric row is written. The universal "do something per step" hook (printing, plotting, custom metric derivations, gradient debugging).

param self
param state: LoopState
param step_idx: int
param batch: 'TrainingBatch'
param metrics: dict[str, float]

Returns

None
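One possible "custom metric derivation" is a smoothed loss. The sketch below assumes the metrics dict carries a `"loss"` key, which is not stated above; `LossEmaCallback` and the stand-in `Callback` are hypothetical.

```python
class Callback:
    """Stand-in base class (assumption: default hook is a no-op)."""

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        pass


class LossEmaCallback(Callback):
    """Hypothetical callback tracking an exponential moving average of one metric."""

    def __init__(self, key: str = "loss", beta: float = 0.9) -> None:
        self.key = key    # assumed metric name
        self.beta = beta  # weight given to the previous average
        self.ema = None
        self.history = []

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        value = metrics.get(self.key)
        if value is None:
            return  # metric absent this step; leave the average untouched
        if self.ema is None:
            self.ema = value  # seed with the first observation
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * value
        self.history.append((step_idx, self.ema))


ema_cb = LossEmaCallback(beta=0.5)
for step, loss in enumerate([1.0, 0.5]):
    ema_cb.on_step_end(None, step, None, {"loss": loss})
```

With `beta=0.5` the second value is `0.5 * 1.0 + 0.5 * 0.5`, so `ema_cb.history` ends at `(1, 0.75)`.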
func on_checkpoint(self, state, row) -> None

Fires after each checkpoint manifest row is recorded. Useful for shipping to S3, pruning old checkpoints, kicking a side eval.

param self
param state: LoopState
param row: 'ManifestRow'

Returns

None
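Pruning old checkpoints might look like the sketch below. The fields of `ManifestRow` are not listed here, so the example uses a stand-in `FakeManifestRow` with a hypothetical `path` attribute; `KeepLastN` is likewise an illustrative name.

```python
import tempfile
from collections import deque
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeManifestRow:
    path: Path  # hypothetical field; the real ManifestRow may differ


class Callback:
    """Stand-in base class (assumption: default hook is a no-op)."""

    def on_checkpoint(self, state, row) -> None:
        pass


class KeepLastN(Callback):
    """Hypothetical callback that deletes all but the newest `keep` checkpoints."""

    def __init__(self, keep: int = 2) -> None:
        self.keep = keep
        self._paths = deque()

    def on_checkpoint(self, state, row) -> None:
        self._paths.append(Path(row.path))
        while len(self._paths) > self.keep:
            oldest = self._paths.popleft()
            oldest.unlink(missing_ok=True)  # tolerate files already removed


with tempfile.TemporaryDirectory() as tmp:
    pruner = KeepLastN(keep=2)
    for i in range(4):  # simulate four checkpoint saves
        ckpt = Path(tmp) / f"ckpt_{i}.bin"
        ckpt.write_bytes(b"weights")
        pruner.on_checkpoint(None, FakeManifestRow(path=ckpt))
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```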
func on_eval(self, state, step_idx, eval_name, metrics) -> None

Fires per evaluator after each in-loop eval completes. Useful for pushing to a dashboard, plotting val curves, driving early-stopping decisions. (Logger callbacks that also need the rollout predictions should use on_benchmark_eval, which the loop fires for benchmark evaluators with the full payload.)

param self
param state: LoopState
param step_idx: int
param eval_name: str
param metrics: dict[str, float]

Returns

None
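A patience-based early-stopping decision could be tracked as in this sketch. The hook returns None, and how the loop would consume the decision is not specified here, so the example only sets a `should_stop` flag on the callback. The evaluator name `"val"`, the metric key `"loss"`, and the class name are assumptions.

```python
class Callback:
    """Stand-in base class (assumption: default hook is a no-op)."""

    def on_eval(self, state, step_idx, eval_name, metrics) -> None:
        pass


class EarlyStopFlag(Callback):
    """Hypothetical callback: flag a stop after `patience` evals without improvement."""

    def __init__(self, eval_name="val", metric="loss", patience=2) -> None:
        self.eval_name = eval_name  # assumed evaluator name
        self.metric = metric        # assumed metric key, lower is better
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0
        self.should_stop = False

    def on_eval(self, state, step_idx, eval_name, metrics) -> None:
        # The hook fires once per evaluator, so filter on the one we track.
        if eval_name != self.eval_name or self.metric not in metrics:
            return
        value = metrics[self.metric]
        if value < self.best:
            self.best = value
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.should_stop = True


stopper = EarlyStopFlag(patience=2)
for step, val_loss in zip((100, 200, 300, 400), (1.0, 0.8, 0.9, 0.85)):
    stopper.on_eval(None, step, "val", {"loss": val_loss})
```

Here the best value is 0.8 at step 200; the next two evals fail to improve on it, so the flag is set after step 400.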
func on_experiment_start(self, ctx) -> None

Fires once at the start of an experiment, before any arm. A logger creates the experiment record here (ctx.ids['experiment_id'] = ...).

param self
param ctx: LogContext

Returns

None
func on_group_start(self, ctx, group_name) -> None

Fires when a new run-group is needed (n_repeats replicates or continual stages). A logger creates the group (ctx.ids[f'group:{group_name}'] = ...).

param self
param ctx: LogContext
param group_name: str

Returns

None
func on_run_start(self, ctx) -> None

Fires per arm, before training. ctx.run_config is set. A logger opens its run-scoped sink (wandb.init / create_run → ctx.ids['run_id']), reading ctx.ids['experiment_id'] / ctx.ids[f'group:{ctx.group_name}'] to parent it.

param self
param ctx: LogContext

Returns

None
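The three start hooks above pass IDs to one another through ctx.ids. The sketch below shows that hand-off with an in-memory logger. `FakeLogContext` is a stand-in exposing only the attributes mentioned above (`ids`, `group_name`, `run_config`), and the ID format, the `records` list, and the fallback to the experiment as parent when no group exists are all assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class FakeLogContext:
    """Stand-in for LogContext; only the attributes named in the text."""
    ids: dict = field(default_factory=dict)
    group_name: Optional[str] = None
    run_config: Optional[dict] = None


class Callback:
    """Stand-in base class with no-op defaults."""

    def on_experiment_start(self, ctx) -> None:
        pass

    def on_group_start(self, ctx, group_name) -> None:
        pass

    def on_run_start(self, ctx) -> None:
        pass


class InMemoryLogger(Callback):
    """Hypothetical logger that records experiment -> group -> run parentage."""

    def __init__(self) -> None:
        self._next = count(1)
        self.records = []

    def _create(self, kind, parent=None, **extra):
        rec_id = f"{kind}-{next(self._next)}"
        self.records.append({"id": rec_id, "kind": kind, "parent": parent, **extra})
        return rec_id

    def on_experiment_start(self, ctx) -> None:
        ctx.ids["experiment_id"] = self._create("experiment")

    def on_group_start(self, ctx, group_name) -> None:
        ctx.ids[f"group:{group_name}"] = self._create(
            "group", parent=ctx.ids["experiment_id"], name=group_name
        )

    def on_run_start(self, ctx) -> None:
        # Assumption: parent under the group if one exists, else the experiment.
        parent = ctx.ids.get(f"group:{ctx.group_name}", ctx.ids.get("experiment_id"))
        ctx.ids["run_id"] = self._create("run", parent=parent, config=ctx.run_config)


ctx = FakeLogContext()
logger = InMemoryLogger()
logger.on_experiment_start(ctx)
logger.on_group_start(ctx, "seed-sweep")
ctx.group_name, ctx.run_config = "seed-sweep", {"lr": 1e-3}
logger.on_run_start(ctx)
```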
func on_benchmark_eval(self, ctx, eval_result, predictions, *, step=None) -> None

Fires per benchmark scored - in-loop (step = the train step) or post-training (step=None). Carries metrics + breakdowns + tags (on eval_result) AND the per-task prediction rows. A logger creates one eval row per (eval_result.name, step) and persists the predictions.

param self
param ctx: LogContext
param eval_result: 'EvalResult'
param predictions: list[dict]
param step: int | None = None

Returns

None
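The "one eval row per (eval_result.name, step)" rule can be sketched with a dict keyed on that pair. `FakeEvalResult` is a stand-in; `name` comes from the text above, while the `metrics` attribute name, the prediction row keys, and `BenchmarkStore` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeEvalResult:
    name: str
    metrics: dict = field(default_factory=dict)  # attribute name is assumed


class Callback:
    """Stand-in base class (assumption: default hook is a no-op)."""

    def on_benchmark_eval(self, ctx, eval_result, predictions, *, step=None) -> None:
        pass


class BenchmarkStore(Callback):
    """Hypothetical logger keeping eval rows and predictions in memory."""

    def __init__(self) -> None:
        self.eval_rows = {}
        self.predictions = {}

    def on_benchmark_eval(self, ctx, eval_result, predictions, *, step=None) -> None:
        key = (eval_result.name, step)  # step is None for post-training evals
        self.eval_rows[key] = {
            "run_id": ctx.ids.get("run_id"),
            "metrics": dict(eval_result.metrics),
        }
        # Copy the rows so later mutation by the caller does not leak in.
        self.predictions[key] = [dict(p) for p in predictions]


bench_ctx = SimpleNamespace(ids={"run_id": "run-1"})  # stand-in LogContext
store = BenchmarkStore()
rows = [{"task_id": "t1", "correct": True}]  # hypothetical prediction row
store.on_benchmark_eval(bench_ctx, FakeEvalResult("bench-a", {"acc": 0.5}), rows, step=100)
store.on_benchmark_eval(bench_ctx, FakeEvalResult("bench-a", {"acc": 0.75}), rows)
```

Because `step` is part of the key, the in-loop result at step 100 and the post-training result for the same benchmark are kept as separate rows.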
func on_run_end(self, ctx, run_result, arm) -> None

Fires per arm, after eval, before the run is marked completed. A logger flushes/closes its run-scoped sink (wandb.finish) and records the final status (update_run).

param self
param ctx: LogContext
param run_result: 'RunResult'
param arm: 'ArmResult'

Returns

None
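A rough sketch of closing a run-scoped sink in on_run_end, paired with the on_run_start that opened it. The sink here is a plain list. The `status` attribute read from `run_result` is hypothetical (the fields of RunResult are not described here), as are `ListSinkLogger` and the ID format.

```python
from types import SimpleNamespace


class Callback:
    """Stand-in base class with no-op defaults."""

    def on_run_start(self, ctx) -> None:
        pass

    def on_run_end(self, ctx, run_result, arm) -> None:
        pass


class ListSinkLogger(Callback):
    """Hypothetical logger whose run-scoped sink is an in-memory list."""

    def __init__(self) -> None:
        self.open_sinks = {}
        self.final_status = {}
        self._n_runs = 0

    def on_run_start(self, ctx) -> None:
        self._n_runs += 1
        run_id = f"run-{self._n_runs}"
        ctx.ids["run_id"] = run_id
        self.open_sinks[run_id] = []  # "open" the sink

    def on_run_end(self, ctx, run_result, arm) -> None:
        run_id = ctx.ids["run_id"]
        events = self.open_sinks.pop(run_id)  # "close" the sink
        # `status` is an assumed attribute; fall back if it is absent.
        status = getattr(run_result, "status", "unknown")
        self.final_status[run_id] = {"status": status, "n_events": len(events)}


run_ctx = SimpleNamespace(ids={})  # stand-in LogContext
sink_logger = ListSinkLogger()
sink_logger.on_run_start(run_ctx)
sink_logger.open_sinks[run_ctx.ids["run_id"]].extend(["step 0", "step 1"])
sink_logger.on_run_end(run_ctx, SimpleNamespace(status="ok"), arm=None)
```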
func on_experiment_end(self, ctx, result) -> None

Fires once at the end of the experiment. Final summary / flush.

param self
param ctx: LogContext
param result: 'ExperimentResult'

Returns

None
