Callback

Override any of these hooks. The defaults are no-ops, so a callback only needs to implement the hooks it actually uses.

Hook signatures use typed positional parameters rather than a generic event bag, so IDE autocomplete and static type checking keep working. Each hook fires at exactly one point in the training or experiment lifecycle; the entry for each hook below says when.
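A minimal sketch of the override pattern, assuming a stand-in base class: `Callback` below is a hypothetical local copy that mirrors two of the signatures on this page (the real class and its import path are not shown here), `LoopState` is a placeholder dataclass, and `batch` is typed as `object` instead of the real `TrainingBatch`. The subclass overrides a single hook; the other keeps its inherited no-op.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LoopState:
    """Hypothetical placeholder; the real LoopState's fields are not documented here."""
    step: int = 0


class Callback:
    """Hypothetical stand-in mirroring two hook signatures from this page."""

    def on_train_start(self, state: LoopState) -> None:
        pass  # no-op default

    def on_step_end(self, state: LoopState, step_idx: int, batch: object,
                    metrics: dict[str, float]) -> None:
        pass  # no-op default


class LossPrinter(Callback):
    """Overrides only on_step_end; on_train_start stays a no-op."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def on_step_end(self, state: LoopState, step_idx: int, batch: object,
                    metrics: dict[str, float]) -> None:
        # "loss" is an assumed metric key, used here only for illustration.
        line = f"step {step_idx}: loss={metrics.get('loss', float('nan')):.4f}"
        self.lines.append(line)
        print(line)


cb = LossPrinter()
cb.on_train_start(LoopState())                        # inherited no-op
cb.on_step_end(LoopState(), 0, None, {"loss": 0.5})  # prints "step 0: loss=0.5000"
```

Because every hook has an explicit typed signature, a type checker can flag an override whose parameters drift from the base class.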

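All hooks are synchronous (plain def, not async def). To schedule async work, call asyncio.create_task(...) from inside the hook. This only works when an event loop is already running in the calling thread; otherwise create_task raises RuntimeError.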
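A sketch of scheduling async work from a sync hook, assuming the hook may or may not be called while an event loop is running. `AsyncUploadCallback` and its `_upload` coroutine are hypothetical; the guard uses `asyncio.get_running_loop()` so the hook degrades to a no-op instead of raising when no loop is running.

```python
from __future__ import annotations

import asyncio


class AsyncUploadCallback:
    """Hypothetical callback; in practice it would subclass Callback."""

    def __init__(self) -> None:
        self.tasks: set[asyncio.Task] = set()
        self.uploaded: list[int] = []

    async def _upload(self, step_idx: int) -> None:
        await asyncio.sleep(0)  # stands in for real async I/O
        self.uploaded.append(step_idx)

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        # Hooks are plain `def`, so we cannot `await`; schedule a task instead.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return  # no running loop in this thread: create_task would raise
        task = asyncio.create_task(self._upload(step_idx))
        self.tasks.add(task)  # keep a strong reference until the task finishes
        task.add_done_callback(self.tasks.discard)


async def demo() -> list[int]:
    cb = AsyncUploadCallback()
    for step in range(3):
        cb.on_step_end(None, step, None, {})
    await asyncio.gather(*cb.tasks)
    return cb.uploaded


print(asyncio.run(demo()))  # [0, 1, 2]
```

Holding the tasks in a set matters: the event loop keeps only weak references to tasks, so an unreferenced task can be garbage-collected before it finishes.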

Functions

on_train_start(self, state) -> None

Fires once, before the training loop starts. Use it for anything that must happen before the first step: opening log files, registering a wandb run, snapshotting the config.

Parameters

self

state: LoopState

Returns

None

on_train_end(self, state, artifacts) -> None

Fires once after the loop completes and the final checkpoint has been saved. Use it to flush summary writes and close files.

Parameters

self

state: LoopState

artifacts: 'LoopArtifacts'

Returns

None
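A sketch of the open/close pairing that on_train_start and on_train_end are meant for, assuming a JSON Lines log file. `JsonlLogCallback`, its constructor arguments, and the record layout are hypothetical; `state` and `artifacts` are accepted but unused because their fields are not documented here.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, TextIO


class JsonlLogCallback:
    """Hypothetical logger: opens a file in on_train_start, closes it in on_train_end.

    In practice this would subclass Callback.
    """

    def __init__(self, path: str | Path, config: dict[str, Any]) -> None:
        self.path = Path(path)
        self.config = config
        self._fh: TextIO | None = None

    def on_train_start(self, state) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._fh = self.path.open("w", encoding="utf-8")
        # Snapshot the config as the first record, before step 0 runs.
        self._fh.write(json.dumps({"config": self.config}) + "\n")

    def on_step_end(self, state, step_idx, batch, metrics) -> None:
        if self._fh is not None:
            self._fh.write(json.dumps({"step": step_idx, **metrics}) + "\n")

    def on_train_end(self, state, artifacts) -> None:
        # Runs after the final checkpoint save: flush and release the handle.
        if self._fh is not None:
            self._fh.flush()
            self._fh.close()
            self._fh = None
```

Resetting `_fh` to None in on_train_end makes a stray late call to on_step_end harmless instead of writing to a closed file.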

on_step_end(self, state, step_idx, batch, metrics) -> None

Fires after every train step, once that step's metric row is written. This is the general-purpose per-step hook: printing, plotting, deriving custom metrics, debugging gradients.

Parameters

self

state: LoopState

step_idx: int

batch: 'TrainingBatch'

metrics: dict[str, float]

Returns

None
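A sketch of a custom metric derivation in on_step_end, assuming each step's `metrics` dict contains a `"loss"` entry holding mean cross-entropy in nats (that key and its meaning are assumptions). `DerivedMetrics` is hypothetical. Since the hook fires after the metric row has already been written, the sketch keeps derived values on the callback instead of mutating `metrics`.

```python
from __future__ import annotations

import math


class DerivedMetrics:
    """Hypothetical on_step_end callback: perplexity plus an EMA of the loss."""

    def __init__(self, ema_decay: float = 0.9, print_every: int = 100) -> None:
        self.ema_decay = ema_decay
        self.print_every = print_every
        self.ema_loss: float | None = None
        self.history: list[dict[str, float]] = []

    def on_step_end(self, state, step_idx: int, batch, metrics: dict[str, float]) -> None:
        loss = metrics.get("loss")
        if loss is None:
            return  # nothing to derive from on this step
        # Exponential moving average, seeded with the first observed loss.
        if self.ema_loss is None:
            self.ema_loss = loss
        else:
            self.ema_loss = self.ema_decay * self.ema_loss + (1 - self.ema_decay) * loss
        derived = {"perplexity": math.exp(loss), "loss_ema": self.ema_loss}
        self.history.append(derived)
        if step_idx % self.print_every == 0:
            print(f"step {step_idx}: " + ", ".join(f"{k}={v:.3f}" for k, v in derived.items()))
```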

on_checkpoint(self, state, row) -> None

Fires after each checkpoint manifest row is recorded. Useful for shipping checkpoints to S3, pruning old checkpoints, or kicking off a side eval.

Parameters

self

state: LoopState

row: 'ManifestRow'

Returns

None
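A sketch of checkpoint pruning in on_checkpoint, assuming each `ManifestRow` exposes the checkpoint location as a `path` field; that field and the placeholder dataclass below are hypothetical. `KeepLastK` deletes everything but the newest `k` checkpoints it has seen. If the loop needs older checkpoints for resuming, or other code reads paths from the manifest, pruning like this may break that, so treat it as illustrative only.

```python
from __future__ import annotations

import shutil
from collections import deque
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ManifestRow:
    """Hypothetical placeholder: only `step` and `path` are assumed."""
    step: int
    path: str


class KeepLastK:
    """Hypothetical on_checkpoint callback keeping the newest k checkpoints."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self._recent: deque[Path] = deque()

    def on_checkpoint(self, state, row: ManifestRow) -> None:
        self._recent.append(Path(row.path))
        while len(self._recent) > self.k:
            old = self._recent.popleft()
            if old.is_dir():
                shutil.rmtree(old)  # sharded checkpoints are often directories
            elif old.exists():
                old.unlink()
```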

on_eval(self, state, step_idx, eval_name, metrics) -> None

Fires once per evaluator after each in-loop eval completes. Useful for pushing to a dashboard, plotting validation curves, or driving early-stopping decisions. (Logger callbacks that also need the rollout predictions should use on_benchmark_eval, which the loop fires for benchmark evaluators with the full payload.)

Parameters

self

state: LoopState

step_idx: int

eval_name: str

metrics: dict[str, float]

Returns

None
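A sketch of an early-stopping decision in on_eval, assuming a lower-is-better metric. `EarlyStopping`, its `should_stop` flag, and the evaluator and metric names are hypothetical; the hook returns None, so how (or whether) the loop reacts to the flag is not documented here and is left as an assumption.

```python
from __future__ import annotations


class EarlyStopping:
    """Hypothetical on_eval callback: sets should_stop after `patience` evals without improvement."""

    def __init__(self, eval_name: str, metric: str, patience: int = 3,
                 min_delta: float = 0.0) -> None:
        self.eval_name = eval_name
        self.metric = metric  # lower is better, e.g. a validation loss
        self.patience = patience
        self.min_delta = min_delta
        self.best: float | None = None
        self.best_step: int | None = None
        self.bad_evals = 0
        self.should_stop = False

    def on_eval(self, state, step_idx: int, eval_name: str, metrics: dict[str, float]) -> None:
        # The hook fires once per evaluator, so skip evaluators we are not tracking.
        if eval_name != self.eval_name or self.metric not in metrics:
            return
        value = metrics[self.metric]
        if self.best is None or value < self.best - self.min_delta:
            self.best, self.best_step, self.bad_evals = value, step_idx, 0
        else:
            self.bad_evals += 1
            self.should_stop = self.bad_evals >= self.patience
```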

on_experiment_start(self, ctx) -> None

Fires once at the start of an experiment, before any arm runs. A logger creates the experiment record here (ctx.ids['experiment_id'] = ...).

Parameters

self

ctx: LogContext

Returns

None

on_group_start(self, ctx, group_name) -> None

Fires when a new run-group is needed (n_repeats replicates or continual stages). A logger creates the group (ctx.ids[f'group:{group_name}'] = ...).

Parameters

self

ctx: LogContext

group_name: str

Returns

None

on_run_start(self, ctx) -> None

Fires once per arm, before training; ctx.run_config is already set. A logger opens its run-scoped sink (wandb.init / create_run → ctx.ids['run_id']), reading ctx.ids['experiment_id'] / ctx.ids[f'group:{ctx.group_name}'] to parent it.

Parameters

self

ctx: LogContext

Returns

None
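A sketch of the id-parenting pattern described for on_experiment_start, on_group_start, and on_run_start, assuming an in-memory backend. `LogContext` here is a hypothetical placeholder with only the attributes this page mentions (`ids`, `group_name`, `run_config`), and `InMemoryLogger` stands in for a real backend call such as create_run. Falling back to the experiment as parent when the run has no group is a choice made for the sketch, not documented behavior.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LogContext:
    """Hypothetical placeholder with only the attributes this page mentions."""
    ids: dict[str, Any] = field(default_factory=dict)
    group_name: str | None = None
    run_config: dict[str, Any] | None = None


class InMemoryLogger:
    """Hypothetical logger that records parent links instead of calling a backend."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.records: dict[str, dict[str, Any]] = {}

    def _create(self, kind: str, parent: str | None, **extra: Any) -> str:
        rec_id = f"{kind}-{next(self._next_id)}"
        self.records[rec_id] = {"kind": kind, "parent": parent, **extra}
        return rec_id

    def on_experiment_start(self, ctx: LogContext) -> None:
        ctx.ids["experiment_id"] = self._create("experiment", None)

    def on_group_start(self, ctx: LogContext, group_name: str) -> None:
        ctx.ids[f"group:{group_name}"] = self._create(
            "group", ctx.ids["experiment_id"], name=group_name)

    def on_run_start(self, ctx: LogContext) -> None:
        # Parent the run under its group when one exists, else under the experiment.
        parent = ctx.ids.get(f"group:{ctx.group_name}", ctx.ids["experiment_id"])
        ctx.ids["run_id"] = self._create("run", parent, config=ctx.run_config)
```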

on_benchmark_eval(self, ctx, eval_result, predictions, *, step=None) -> None

Fires once per scored benchmark, either in-loop (step is the current train step) or after training (step=None). It carries the metrics, breakdowns, and tags (on eval_result) as well as the per-task prediction rows. A logger creates one eval row per (eval_result.name, step) pair and persists the predictions.

Parameters

self

ctx: LogContext

eval_result: 'EvalResult'

predictions: list[dict]

step: int | None = None (keyword-only)

Returns

None
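A sketch of the "one eval row per (eval_result.name, step)" rule, assuming an in-memory store. `EvalResult` below is a hypothetical placeholder exposing only `name` and `metrics` (the real class also carries breakdowns and tags), and `BenchmarkStore` is hypothetical. Note that `step` must be passed by keyword, matching the `*` in the documented signature.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvalResult:
    """Hypothetical placeholder: only `name` and `metrics` are assumed."""
    name: str
    metrics: dict[str, float] = field(default_factory=dict)


class BenchmarkStore:
    """Hypothetical logger keeping one eval row per (eval_result.name, step)."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, int | None], dict[str, Any]] = {}

    def on_benchmark_eval(self, ctx, eval_result: EvalResult, predictions: list[dict],
                          *, step: int | None = None) -> None:
        key = (eval_result.name, step)  # step=None marks the post-training eval
        # A repeated (name, step) overwrites its row instead of duplicating it.
        self.rows[key] = {
            "metrics": dict(eval_result.metrics),
            "predictions": list(predictions),
        }
```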

on_run_end(self, ctx, run_result, arm) -> None

Fires once per arm, after eval and before the run is marked completed. A logger flushes and closes its run-scoped sink (wandb.finish) and records the final status (update_run).

Parameters

self

ctx: LogContext

run_result: 'RunResult'

arm: 'ArmResult'

Returns

None
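A sketch of the on_run_end ordering (flush, close, then record the final status), assuming a fake sink. `RunResult` is a hypothetical placeholder whose `status` field is an assumption, `FakeSink` stands in for something like a wandb run, and `RunFinalizer` plays the role of a logger; `arm` is accepted but unused.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical placeholder: a `status` field is assumed, not documented."""
    status: str


class FakeSink:
    """Stands in for a run-scoped sink such as a wandb run."""

    def __init__(self) -> None:
        self.buffer: list[str] = []
        self.flushed: list[str] = []
        self.closed = False

    def flush(self) -> None:
        self.flushed.extend(self.buffer)
        self.buffer.clear()

    def close(self) -> None:
        self.closed = True


class RunFinalizer:
    """Hypothetical logger: flush and close the sink, then record the final status."""

    def __init__(self) -> None:
        self.sinks: dict[str, FakeSink] = {}
        self.statuses: dict[str, str] = {}

    def on_run_end(self, ctx, run_result: RunResult, arm) -> None:
        run_id = ctx.ids["run_id"]
        sink = self.sinks.pop(run_id, None)
        if sink is not None:
            sink.flush()  # push anything still buffered
            sink.close()  # analogous to wandb.finish()
        # Analogous to update_run(...): store this run's final status.
        self.statuses[run_id] = run_result.status
```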

on_experiment_end(self, ctx, result) -> None

Fires once at the end of the experiment. Use it to write a final summary and flush any buffered output.

Parameters

self

ctx: LogContext

result: 'ExperimentResult'

Returns

None
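A sketch of the experiment-level firing order implied by the descriptions above: experiment start, then per group and per arm a run start, a post-training benchmark eval, and a run end, then experiment end. `run_experiment` is a hypothetical driver, not the real loop; it assumes every arm belongs to a group and omits the training-loop hooks, which fire between on_run_start and the evals. `RecordingCallback` is a test double that records every `on_*` call.

```python
from __future__ import annotations

from types import SimpleNamespace


class RecordingCallback:
    """Test double: any on_* attribute becomes a callable that logs its own name."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __getattr__(self, name: str):
        # Only invoked for attributes not found normally, i.e. the hook names.
        if name.startswith("on_"):
            return lambda *args, **kwargs: self.calls.append(name)
        raise AttributeError(name)


def run_experiment(cb, groups: dict[str, list[str]]) -> None:
    """Hypothetical driver calling the experiment-level hooks in documented order."""
    ctx = SimpleNamespace(ids={}, group_name=None, run_config=None)
    cb.on_experiment_start(ctx)
    for group_name, arms in groups.items():
        cb.on_group_start(ctx, group_name)
        ctx.group_name = group_name
        for arm in arms:
            ctx.run_config = {"arm": arm}
            cb.on_run_start(ctx)
            # ... on_train_start / on_step_end / ... / on_train_end would fire here ...
            cb.on_benchmark_eval(ctx, None, [], step=None)  # post-training eval
            cb.on_run_end(ctx, None, arm)
    cb.on_experiment_end(ctx, None)
```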
