rl

RL - on-policy reinforcement learning, rollouts run by harbor's engine.

Rollouts are handed to harbor's Job engine (retries, bounded concurrency, timeouts, persistence) via :func:evsys_sdk.training.harbor_engine.run_harbor_rollouts. This algorithm only turns HarborTask rows into the engine's inputs and assembles the returned trajectories into importance-sampling-loss (IS-loss) Datums.

Composer plumbing lives in :class:~evsys_sdk.algorithms.base.BaseAlgorithm; RL supplies:

:meth:setup - parse HarborTask rows; stash the backend + the .evsys rollout workspace.
:meth:build_batch - save a sampler checkpoint (on-policy), roll out the batch via harbor, group-normalize advantages, emit IS-loss Datums.
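As an illustration of the "group-normalize advantages" step in build_batch, here is a minimal sketch. It assumes the common scheme of centering each rollout's reward on the mean of its group (the rollouts sampled for the same task) and dividing by the group's standard deviation. The name group_normalize and the eps parameter are hypothetical, not taken from evsys_sdk, and the real implementation may differ.

```python
from statistics import fmean, pstdev


def group_normalize(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Center rewards on the group mean and scale by the group std.

    Hypothetical helper: an assumption about what "group-normalize"
    means here, not the evsys_sdk implementation.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards of four rollouts sampled for the same task.
advantages = group_normalize([1.0, 0.0, 0.0, 1.0])
```

Under this scheme, rollouts that beat their group's mean get a positive advantage and the rest a negative one, and a group whose rewards are all identical yields all-zero advantages.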

The agent harness is pluggable: by default harbor runs our BasicLoopAgent (Chat(TinkerLLM)); set agent_import_path to point at any harbor BaseAgent to use a different one.

attribute __all__ = ['RL', 'RLConfig']

RLConfig

RL

func _all_equal(xs) -> bool

param xs: list[float]

Returns

bool
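Only the signature of _all_equal is documented here. The sketch below shows what a helper with this signature plausibly does, inferred from its name alone: report whether every value in the list is the same. The function name all_equal, the exact (non-tolerance) comparison, and the empty-list behaviour are assumptions.

```python
def all_equal(xs: list[float]) -> bool:
    """Return True when every element equals the first one.

    Assumed behaviour, inferred from the name; an empty list is
    treated as vacuously all-equal.
    """
    # all() over an empty generator is True, so xs[0] is never read for [].
    return all(x == xs[0] for x in xs)


uniform = all_equal([1.0, 1.0, 1.0])  # every rollout got the same reward
mixed = all_equal([1.0, 0.0, 1.0])    # rewards differ within the group
```

One plausible use, stated as an assumption: if advantages are centered on the group mean, a group with identical rewards carries no learning signal, so a check like this can detect such groups.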