RLConfig

Config for :class:RL - shared knobs from :class:BaseAlgorithmConfig plus RL/rollout-engine fields.

Attributes

attributelearning_ratefloat

= 1e-05

RL needs a lower LR than SFT - IS gradients can be large.

attributenum_samplesint

= 1

Rollouts per task (group_size); >= 2 enables the advantage baseline.

attributeverifier_namestr | None

= None

Fallback verifier-fn name when a HarborTask's InProcessVerifier leaves fn_name empty (normally the verifier rides per-row on the task).

attributemax_tokensint

= 256

attributetemperaturefloat

= 1.0

attributemax_turnsint

= 1

attributedrop_constant_rewardbool

= True

attributesystem_promptstr | None

= None

attributeuser_templatestr

= '{prompt}'

attributeagent_import_pathstr | None

= None

Override the rollout agent harness (any harbor BaseAgent import path). Default: our BasicLoopAgent.

attributen_concurrentint

= 4

attributemax_retriesint

= 2