EvSys

CheckpointManager

Decide WHEN to save and WRITE the manifest row when we do.

The decision policy is intentionally tiny: should_save(step) returns True once every save_every steps, checked after the optimizer step at that index. The actual save call is dispatched by the loop, which holds the live Backend handle; the manager only records the resulting paths.

Final-step save is unconditional: the loop calls save_final(...) after its for-loop completes, so even when save_every doesn't land exactly on the last step, the final sampler is always recorded. That is the URI downstream eval consumes.
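The interplay between periodic saves and the unconditional final save can be sketched as below. This is an illustrative stand-in, not the real loop: plan_saves and the "periodic"/"final" labels are hypothetical names, and the policy is passed in as a plain callable rather than a CheckpointManager. Whether the real loop skips the final save when a periodic save already landed on the last step is not stated here, so the sketch simply records both.

```python
from typing import Callable


def plan_saves(num_steps: int, should_save: Callable[[int], bool]) -> list[tuple[int, str]]:
    """Return (step, kind) pairs in the order a loop of this shape would save."""
    plan: list[tuple[int, str]] = []
    for step in range(num_steps):
        # (the optimizer step at index `step` would run here)
        if should_save(step):
            plan.append((step, "periodic"))
    # After the for-loop: the final save does not consult the policy.
    if num_steps > 0:
        plan.append((num_steps - 1, "final"))
    return plan


def every_4(step: int) -> bool:
    # Hypothetical policy equivalent to save_every == 4.
    return (step + 1) % 4 == 0


print(plan_saves(10, every_4))
# [(3, 'periodic'), (7, 'periodic'), (9, 'final')]
```

With 10 steps and a period of 4, the last periodic save is at step 7, so only the final save covers step 9.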

Attributes

attribute log_path = Path(log_path)
attribute save_every = max(0, int(save_every))
attribute manifest_path = self.log_path / MANIFEST_NAME
attribute rows: list[ManifestRow]

Functions

func __init__(self, *, log_path, save_every) -> None
param self
param log_path: Path
param save_every: int

Returns

None
func should_save(self, step) -> bool

Save after the optimizer step at index step (zero-based).

The convention matches tinker_cookbook: (step + 1) % save_every == 0. Saving is disabled when save_every == 0; negative values are clamped to 0 at construction, so they disable saving as well.

param self
param step: int

Returns

bool
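
As a rough illustration of the convention above, the following free function mirrors the documented rule. It is a sketch under assumptions: the real method reads save_every from the instance, whereas here it is passed as an argument, and the clamp is copied from the save_every attribute definition.

```python
def should_save(step: int, save_every: int) -> bool:
    """Stand-in for the documented policy; `step` is zero-based."""
    save_every = max(0, int(save_every))  # same clamp as the save_every attribute
    if save_every == 0:
        return False  # saving disabled
    return (step + 1) % save_every == 0


print([s for s in range(10) if should_save(s, 5)])
# [4, 9]
```

Because step is zero-based, a period of 5 fires after the 5th and 10th optimizer steps, which are indices 4 and 9.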
func record(self, row) -> None

Append one row to the manifest on disk and remember it.

param self
param row: ManifestRow

Returns

None
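
An append-and-remember step of this kind might look roughly like the sketch below. It assumes the manifest is a JSON Lines file (the find_resume description refers to checkpoints.jsonl) and uses plain dicts in place of ManifestRow, whose fields are not listed on this page. The helper name append_row, the "step" field, and the example path strings are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_row(manifest_path: Path, rows: list[dict], row: dict) -> None:
    """Append one JSON line to the manifest and keep the row in memory."""
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")
    rows.append(row)


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "checkpoints.jsonl"  # assumed value of MANIFEST_NAME
    rows: list[dict] = []
    append_row(manifest, rows, {"step": 3, "state_path": "hypothetical://state/3"})
    append_row(manifest, rows, {"step": 7, "state_path": "hypothetical://state/7"})
    lines_on_disk = manifest.read_text(encoding="utf-8").splitlines()
```

Opening the file in append mode means earlier rows are never rewritten, which suits a manifest that is only ever read back in order.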
func find_resume(self) -> Checkpoint | None

Find the most-recent recorded checkpoint to resume training from.

Reads the existing checkpoints.jsonl (if any) under log_path and picks the last row that has a state_path. The loop hands the state_path back to the backend to recreate the training client with optimizer state intact.

Returns None when no resumable checkpoint is on disk; the caller should then start fresh.

param self

Returns

evsys_sdk.checkpoint.Checkpoint | None
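
The lookup described for find_resume can be approximated as follows. This is a sketch, not the library's code: it returns a plain dict rather than a Checkpoint, the function name find_last_resumable is hypothetical, and apart from state_path the row fields ("step", "sampler_path") are invented for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def find_last_resumable(manifest_path: Path) -> Optional[dict]:
    """Return the last manifest row with a non-empty state_path, or None."""
    if not manifest_path.exists():
        return None  # nothing recorded yet: start fresh
    last = None
    with manifest_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            if row.get("state_path"):
                last = row  # later rows overwrite earlier ones
    return last


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "checkpoints.jsonl"
    missing = find_last_resumable(manifest)  # file does not exist yet
    example_rows = [
        {"step": 3, "state_path": "hypothetical://state/3"},
        {"step": 7, "state_path": "hypothetical://state/7"},
        {"step": 9, "sampler_path": "hypothetical://sampler/9"},  # no state_path
    ]
    manifest.write_text("".join(json.dumps(r) + "\n" for r in example_rows), encoding="utf-8")
    found = find_last_resumable(manifest)
```

In this example the step-9 row is skipped because it carries no state_path, so the resume point is the step-7 row.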
