RLConfig
Config for :class:RL - shared knobs from :class:BaseAlgorithmConfig
plus RL/rollout-engine fields.
Attributes
attributelearning_ratefloat= 1e-05RL needs a lower LR than SFT - IS gradients can be large.
attributenum_samplesint= 1Rollouts per task (group_size); >= 2 enables the advantage baseline.
attributeverifier_namestr | None= NoneFallback verifier-fn name when a HarborTask's InProcessVerifier leaves
fn_name empty (normally the verifier rides per-row on the task).
attributemax_tokensint= 256attributetemperaturefloat= 1.0attributemax_turnsint= 1attributedrop_constant_rewardbool= Trueattributesystem_promptstr | None= Noneattributeuser_templatestr= '{prompt}'attributeagent_import_pathstr | None= NoneOverride the rollout agent harness (any harbor BaseAgent import path).
Default: our BasicLoopAgent.
attributen_concurrentint= 4attributemax_retriesint= 2