EvSys

RLConfig

Configuration for :class:`RL`: the shared settings from :class:`BaseAlgorithmConfig` plus RL and rollout-engine fields.

Attributes

attribute learning_rate: float = 1e-05

RL needs a lower learning rate than SFT because importance-sampling (IS) gradients can be large.

attribute num_samples: int = 1

Rollouts per task (group_size); >= 2 enables the advantage baseline.
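The reason a baseline needs at least two rollouts can be illustrated with a small sketch. It assumes a group-mean baseline (each rollout's reward minus the mean reward of its group), which is a common choice but is not confirmed by this page; `group_advantages` is a hypothetical helper, not part of the library.

```python
from statistics import mean


def group_advantages(rewards: list[float]) -> list[float]:
    """Subtract the group-mean reward from each rollout's reward.

    Assumption: the baseline is the mean reward over the rollouts of one task.
    """
    if len(rewards) < 2:
        # A single rollout has no group to compare against,
        # so the raw reward is returned unchanged.
        return list(rewards)
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Four rollouts of one task, two of which succeeded.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.5, -0.5, -0.5, 0.5]
```

Under this assumption, a group whose rollouts all receive the same reward yields all-zero advantages and carries no learning signal.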

attribute verifier_name: str | None = None

Fallback verifier-fn name when a HarborTask's InProcessVerifier leaves fn_name empty (normally the verifier rides per-row on the task).

attribute max_tokens: int = 256

attribute temperature: float = 1.0

attribute max_turns: int = 1

attribute drop_constant_reward: bool = True

attribute system_prompt: str | None = None

attribute user_template: str = '{prompt}'

attribute agent_import_path: str | None = None

Override the rollout agent harness (any harbor BaseAgent import path). Default: our BasicLoopAgent.
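As a rough illustration of what an import path means here, the sketch below resolves a dotted string to a Python object with `importlib`. The accepted path formats (`module:Name` and `module.Name`) and the helper `resolve_import_path` are assumptions for illustration; this page does not say which format the library expects or how it loads the class.

```python
import importlib


def resolve_import_path(path: str):
    """Resolve 'package.module:Name' or 'package.module.Name' to an object."""
    module_name, sep, attr = path.partition(":")
    if not sep:
        # No colon: treat the last dotted component as the attribute name.
        module_name, _, attr = path.rpartition(".")
    if not module_name or not attr:
        raise ValueError(f"not a valid import path: {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library class, since the real agent
# classes are not available in this sketch.
agent_cls = resolve_import_path("collections:OrderedDict")
print(agent_cls.__name__)  # OrderedDict
```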

attribute n_concurrent: int = 4

attribute max_retries: int = 2
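The defaults listed above can be summarised in one place with a stand-in dataclass. `RLConfigSketch` is hypothetical: the real `RLConfig` also inherits fields from `BaseAlgorithmConfig` that are not shown on this page, and rendering `user_template` with `str.format` is an assumption based on its `'{prompt}'` default.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RLConfigSketch:
    """Hypothetical stand-in mirroring the documented RLConfig defaults."""

    learning_rate: float = 1e-05
    num_samples: int = 1
    verifier_name: str | None = None
    max_tokens: int = 256
    temperature: float = 1.0
    max_turns: int = 1
    drop_constant_reward: bool = True
    system_prompt: str | None = None
    user_template: str = "{prompt}"
    agent_import_path: str | None = None
    n_concurrent: int = 4
    max_retries: int = 2


# Override only the fields that differ from the defaults.
cfg = RLConfigSketch(num_samples=8, user_template="Solve: {prompt}")

# Assumed rendering step: fill the task prompt into the template.
user_message = cfg.user_template.format(prompt="2 + 2")
print(user_message)  # Solve: 2 + 2
```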
