rl
RL - on-policy reinforcement learning, with rollouts run by harbor's engine.

Rollouts are handed to harbor's ``Job`` engine (retries, bounded concurrency, timeouts, persistence) via :func:`evsys_sdk.training.harbor_engine.run_harbor_rollouts`; this algorithm just turns ``HarborTask`` rows into the engine's inputs and assembles the returned trajectories into importance-sampling-loss ``Datum`` objects.
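This is a minimal sketch of what an "importance-sampling-loss Datum" could look like. It turns one trajectory into a next-token example and evaluates the standard importance-sampling policy-gradient surrogate on it: the negative sum over tokens of ``exp(logp_new - logp_sampler) * advantage``.

The ``Datum`` dataclass, ``build_is_datum`` and ``importance_sampling_loss`` are hypothetical stand-ins, not evsys_sdk's or harbor's actual types. Prompt positions are given advantage 0 here, so only completion tokens are trained.

```python
import math
from dataclasses import dataclass


@dataclass
class Datum:
    """Hypothetical training example (NOT the real evsys_sdk/harbor type)."""
    tokens: list[int]               # model input: the sequence without its last token
    target_tokens: list[int]        # next-token targets: the sequence without its first token
    sampling_logprobs: list[float]  # logprobs recorded by the sampler (0.0 on prompt targets)
    advantages: list[float]         # per-target advantage (0.0 on prompt targets)


def build_is_datum(prompt, completion, completion_logprobs, advantage):
    """Hypothetical helper: one trajectory -> one next-token Datum."""
    if not prompt or len(completion_logprobs) != len(completion):
        raise ValueError("need a non-empty prompt and one logprob per completion token")
    full = prompt + completion
    n_prompt_targets = len(prompt) - 1  # targets that still fall inside the prompt
    return Datum(
        tokens=full[:-1],
        target_tokens=full[1:],
        sampling_logprobs=[0.0] * n_prompt_targets + list(completion_logprobs),
        advantages=[0.0] * n_prompt_targets + [advantage] * len(completion),
    )


def importance_sampling_loss(datum, target_logprobs):
    """-sum_t exp(logp_theta - logp_sampler) * A_t; the ratio is formed in log space."""
    loss = 0.0
    for lp_new, lp_old, adv in zip(target_logprobs, datum.sampling_logprobs, datum.advantages):
        loss -= math.exp(lp_new - lp_old) * adv  # zero-advantage positions contribute nothing
    return loss


d = build_is_datum([1, 2, 3], [4, 5], [-0.5, -1.0], advantage=2.0)
print(importance_sampling_loss(d, [-0.1, -0.2, -0.5, -1.0]))  # -4.0: on-policy, so every ratio is 1
```

Because ``build_batch`` samples from a freshly saved checkpoint, the trainer's logprobs should be close to the sampler's, and the ratio starts near 1. The ratio only matters once the policy drifts from the sampler that produced the rollout.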

Composer plumbing lives in :class:`~evsys_sdk.algorithms.base.BaseAlgorithm`; RL supplies:

- :meth:`setup` - parse ``HarborTask`` rows; stash the backend and the ``.evsys`` rollout workspace.
- :meth:`build_batch` - save a sampler checkpoint (on-policy), roll out the batch via harbor, group-normalize advantages, and emit IS-loss ``Datum`` objects.
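The "group-normalize advantages" step of ``build_batch`` can be sketched as follows. The sketch assumes each ``HarborTask`` is rolled out several times and that the rollouts of one task form a group.

The helper name ``group_normalize`` is hypothetical, and so is the exact scheme. It subtracts the group mean and optionally divides by the group's population standard deviation, which is a common GRPO-style variant. The module may only center rewards or use a different epsilon.

```python
import statistics
from collections import defaultdict


def group_normalize(rewards, group_ids, eps=1e-6, scale_by_std=True):
    """Hypothetical helper: advantage of each rollout relative to its own group.

    Subtracts the group's mean reward and, if scale_by_std is set, divides by
    the group's population standard deviation plus eps.
    """
    if len(rewards) != len(group_ids):
        raise ValueError("rewards and group_ids must have the same length")
    members = defaultdict(list)  # group id -> indices of its rollouts
    for i, gid in enumerate(group_ids):
        members[gid].append(i)
    advantages = [0.0] * len(rewards)
    for idxs in members.values():
        group_rewards = [rewards[i] for i in idxs]
        mean = statistics.fmean(group_rewards)
        denom = statistics.pstdev(group_rewards) + eps if scale_by_std else 1.0
        for i in idxs:
            advantages[i] = (rewards[i] - mean) / denom
    return advantages


print(group_normalize([1.0, 0.0, 1.0, 0.0, 1.0, 1.0], ["a"] * 4 + ["b"] * 2, scale_by_std=False))
# [0.5, -0.5, 0.5, -0.5, 0.0, 0.0]
```

Note that a group whose rollouts all earn the same reward (task ``b`` above) gets zero advantage and therefore contributes no gradient. Centering within a group is what makes the signal "better or worse than this task's other attempts" rather than raw reward.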

The agent harness is pluggable: by default harbor runs our ``BasicLoopAgent`` (``Chat(TinkerLLM)``); to use a different harness, set ``agent_import_path`` to the import path of any harbor ``BaseAgent`` subclass.
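Resolving a pluggable ``agent_import_path`` usually comes down to importing a module and fetching a class from it. This sketch assumes a ``"package.module:ClassName"`` string, and also accepts the dotted ``"package.module.ClassName"`` form. Both the accepted formats and the helper ``load_agent_class`` are assumptions for illustration; harbor's own loader may parse the path differently.

```python
import importlib


def load_agent_class(agent_import_path: str, base: type = object) -> type:
    """Hypothetical helper: resolve 'package.module:ClassName' to a class object.

    Also accepts 'package.module.ClassName'. Harbor's real loader may differ.
    """
    if ":" in agent_import_path:
        module_name, _, attr_path = agent_import_path.partition(":")
    else:
        module_name, _, attr_path = agent_import_path.rpartition(".")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:ClassName', got {agent_import_path!r}")
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):  # walk nested attributes, e.g. 'Outer.Inner'
        obj = getattr(obj, part)
    # Reject non-classes and classes outside the expected hierarchy (e.g. a BaseAgent).
    if not isinstance(obj, type) or not issubclass(obj, base):
        raise TypeError(f"{agent_import_path!r} is not a subclass of {base.__name__}")
    return obj


print(load_agent_class("collections:OrderedDict", base=dict).__name__)  # OrderedDict
```

In this sketch, passing the harness base class as ``base`` (for example, a hypothetical ``load_agent_class("my_pkg.agents:MyAgent", base=BaseAgent)``) makes a misconfigured path fail at load time instead of partway through a rollout.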

Module attribute ``__all__ = ['RL', 'RLConfig']``.