BaseAlgorithmConfig
Fields common to every training algorithm.
Concrete algorithms subclass this and add their own knobs (SDFT: topk;
RL: num_samples / verifier_name / …). The model sets extra="forbid", so
a mistyped YAML key fails loudly instead of being silently ignored. A
subclass may re-declare a field to change its default (e.g. RL drops
learning_rate to 1e-5).
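The subclassing pattern described above can be sketched with Pydantic v2 roughly as follows. This is a trimmed illustration, not the project's actual code: the base class is cut down to three fields, and the subclass name `RLConfig`, the type and default of `num_samples`, and the misspelled key are all assumptions made for the example.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class BaseAlgorithmConfig(BaseModel):
    # Trimmed to three fields for illustration; the documented class has more.
    model_config = ConfigDict(extra="forbid")

    learning_rate: float = 1e-4
    num_epochs: int = 1
    batch_size: int = Field(default=4, gt=0)


class RLConfig(BaseAlgorithmConfig):
    # Hypothetical subclass name. Re-declaring the field changes its default.
    learning_rate: float = 1e-5
    # Field name comes from the docs; the type and default are assumed here.
    num_samples: int = 8


# A dict like this is what a parsed YAML file would typically produce.
cfg = RLConfig(**{"num_epochs": 2})

# A misspelled key is rejected because extra="forbid" is inherited.
typo_error = None
try:
    RLConfig(**{"learning_rtae": 3e-5})
except ValidationError as err:
    typo_error = err
```

Because `model_config` is inherited, the subclass does not need to repeat `extra="forbid"`; without it, Pydantic's default would silently ignore the misspelled key and keep the default learning rate.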
Attributes
attribute model_config = ConfigDict(extra='forbid')
attribute learning_rate: float = 0.0001
attribute num_epochs: int = 1
attribute batch_size: int = Field(default=4, gt=0)
attribute max_steps: int | None = None
    Hard step cap. When set, wins over num_epochs * steps_per_epoch.
attribute lora_rank: int = 8
attribute renderer_name: str | None = None
attribute enable_thinking: bool | None = None
attribute save_every: int = 0
    If 0, computed from save_at_fractions (GCD-of-marks heuristic).
attribute save_at_fractions: list[float] = Field(default_factory=(lambda: [1.0]))
attribute callbacks: list[CallbackSpec] = Field(default_factory=list)
attribute adam_beta1: float = 0.9
attribute adam_beta2: float = 0.95
attribute adam_eps: float = 1e-08
attribute wandb_project: str | None = None
attribute wandb_name: str | None = None
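The interaction between max_steps, num_epochs, and the save settings might look something like the sketch below. Both helper functions are hypothetical and not part of the documented API. The first follows the documented rule literally (max_steps replaces the epoch-derived count when set). The second is only one plausible reading of the "GCD-of-marks heuristic": convert each fraction to a step mark and take the greatest common divisor, so that a fixed interval lands on every mark. The actual implementation may differ.

```python
from __future__ import annotations

from functools import reduce
from math import gcd


def resolve_total_steps(
    num_epochs: int, steps_per_epoch: int, max_steps: int | None
) -> int:
    """Hypothetical helper: max_steps, when set, overrides the epoch count."""
    if max_steps is not None:
        return max_steps
    return num_epochs * steps_per_epoch


def resolve_save_every(
    save_every: int, save_at_fractions: list[float], total_steps: int
) -> int:
    """Hypothetical helper: assumed reading of the GCD-of-marks heuristic."""
    if save_every > 0:
        # An explicit interval is used as-is.
        return save_every
    # Turn each fraction into a step mark (at least step 1).
    marks = [max(1, round(f * total_steps)) for f in save_at_fractions]
    # The GCD is the largest interval that hits every mark exactly.
    return reduce(gcd, marks)


total = resolve_total_steps(num_epochs=2, steps_per_epoch=100, max_steps=None)
interval = resolve_save_every(0, [0.25, 0.5, 1.0], total)
```

Under this reading, the default save_at_fractions of [1.0] yields a single checkpoint at the final step, since the only mark is total_steps itself.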