Algorithms

The training recipe - SFT, RL, distillation, or any custom loss - that decides what each batch contains.

An algorithm is a training recipe. It owns one thing the rest of the stack does not: how each step's batch is built and which loss it trains under. Everything else - backend allocation, step/save cadence, evaluators, logging, artifacts - is shared plumbing. Any recipe (LoRA on any model, any loss) can be expressed by overriding a single method, so write your own whenever the recipe differs from SFT/RL/SDFT.

The contract

At the protocol level (src/evsys_sdk/protocols.py), an algorithm declares two ClassVars and implements one method:

name: ClassVar[str] - registry key and the YAML algorithm.kind.
Config: ClassVar[type] - a Pydantic model for the recipe's params, validated against algorithm.params in YAML.
train(self, ctx: RunContext) -> RunResult - execute one full run.
- ctx: RunContext carries everything the run needs: run_id, output_dir, config (the parsed ExperimentConfig), data_store, log_store, backend, and a free-form extras dict (where the backend handles, the train_rows, the model_name, etc. live).
- Returns a RunResult: run_id, status ("completed"/"failed"/"cancelled"), metrics (final scalars), artifacts (named paths, e.g. {"run_dir": ..., "checkpoint-final": ...}), and an optional error string.
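
A minimal sketch of the contract above, for orientation only. It does not import evsys_sdk: RunContext and RunResult here are stand-in dataclasses carrying a subset of the fields listed above, NoopAlgorithm and NoopConfig are hypothetical names, and the real Config would be a Pydantic model. Reporting errors through status="failed" rather than raising is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, ClassVar, Optional


@dataclass
class RunContext:  # stand-in: only some of the documented fields
    run_id: str
    output_dir: str
    config: Any = None
    extras: dict = field(default_factory=dict)


@dataclass
class RunResult:  # stand-in for the SDK's result type
    run_id: str
    status: str  # "completed" | "failed" | "cancelled"
    metrics: dict = field(default_factory=dict)
    artifacts: dict = field(default_factory=dict)
    error: Optional[str] = None


@dataclass
class NoopConfig:  # hypothetical; the SDK expects a Pydantic model here
    steps: int = 3


class NoopAlgorithm:
    name: ClassVar[str] = "noop"        # registry key and YAML algorithm.kind
    Config: ClassVar[type] = NoopConfig

    def __init__(self, cfg: NoopConfig) -> None:
        self.cfg = cfg

    def train(self, ctx: RunContext) -> RunResult:
        try:
            rows = ctx.extras["train_rows"]  # free-form extras dict
            metrics = {"num_rows": float(len(rows)), "steps": float(self.cfg.steps)}
            return RunResult(ctx.run_id, "completed", metrics, {"run_dir": ctx.output_dir})
        except Exception as exc:  # assumption: failures are reported, not raised
            return RunResult(ctx.run_id, "failed", error=str(exc))
```

Calling NoopAlgorithm(NoopConfig()).train(ctx) with train_rows present in ctx.extras yields a "completed" result; without it, the result is "failed" with the error string set.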

The easy path: `BaseAlgorithm`

The Tinker-backed recipes don't reimplement train. They subclass BaseAlgorithm (src/evsys_sdk/algorithms/base.py), which owns the whole composer body and is itself the StepBuilder the loop drives. A concrete algorithm overrides just the per-recipe pieces:

async setup(self, ctx: RunContext, backend: TinkerBackend) -> None - One-time prep before the loop. Stash per-algorithm state on self (tokenized datums for SFT; dataset + teacher client for SDFT/RL) and set self._steps_per_epoch, which is required. The backend is the already-allocated LoRA training client; call backend.get_tokenizer(), backend.save_for_sampler(...), etc.
async build_batch(self, step_idx: int) -> TrainingBatch - Produce the batch for step_idx (0-based). SFT slices its static datums; RL and SDFT roll out on-policy here. The loss rides on the returned batch; there is no separate loss override.
step_metrics(self, step_idx: int, batch: TrainingBatch, fb_result: Any) -> dict[str, float] - Optional; defaults to {}. Per-step metrics computed from the forward-backward result. fb_result is whatever the backend's forward_backward_*_async() future resolved to; inspect fb_result.loss_fn_outputs (one entry per Datum) to derive, e.g., train_mean_nll.
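
The three hooks can be pictured with a toy driver. This is a sketch under assumptions, not BaseAlgorithm's actual loop: TrainingBatch is a stand-in dataclass, ToySFT and drive are hypothetical names, the call order (setup once, then build_batch and step_metrics per step) is inferred from the descriptions above, and the forward-backward call is replaced by a placeholder.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrainingBatch:  # stand-in, not the SDK class
    data: list
    loss_fn: Any
    metrics: dict = field(default_factory=dict)


class ToySFT:
    """Hypothetical recipe exposing the three hooks."""

    def __init__(self, rows: list, batch_size: int) -> None:
        self.rows, self.batch_size = rows, batch_size

    async def setup(self, ctx, backend) -> None:
        # Ceiling division so a final partial batch still gets a step.
        self._steps_per_epoch = -(-len(self.rows) // self.batch_size)

    async def build_batch(self, step_idx: int) -> TrainingBatch:
        start = (step_idx % self._steps_per_epoch) * self.batch_size
        return TrainingBatch(self.rows[start:start + self.batch_size], "cross_entropy")

    def step_metrics(self, step_idx, batch, fb_result) -> dict:
        return {"batch_rows": float(len(batch.data))}


async def drive(algo, num_epochs: int) -> list:
    """Assumed order: setup once, then build_batch + step_metrics each step."""
    await algo.setup(ctx=None, backend=None)
    log = []
    for step_idx in range(num_epochs * algo._steps_per_epoch):
        batch = await algo.build_batch(step_idx)
        fb_result = None  # placeholder for the backend's forward-backward result
        log.append(algo.step_metrics(step_idx, batch, fb_result))
    return log


log = asyncio.run(drive(ToySFT(rows=list(range(5)), batch_size=2), num_epochs=1))
```

With five rows and a batch size of two, setup yields three steps per epoch and the last batch holds the single leftover row.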

BaseAlgorithmConfig supplies the shared knobs every recipe inherits (extra="forbid"): learning_rate, num_epochs, batch_size, max_steps (a hard cap that wins over num_epochs * steps_per_epoch), lora_rank, renderer_name, enable_thinking, save_every / save_at_fractions, callbacks, and Adam knobs. A subclass adds its own and may re-declare a field to change its default (RL drops learning_rate to 1e-5).
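
The interaction between max_steps and num_epochs can be written as a one-line rule. The helper below is a hypothetical illustration, not SDK code; it assumes max_steps may be unset (None) and reads "hard cap" as taking the minimum of the two counts.

```python
from typing import Optional


def total_steps(num_epochs: int, steps_per_epoch: int,
                max_steps: Optional[int] = None) -> int:
    """Number of optimizer steps a run would take under the assumed rule."""
    scheduled = num_epochs * steps_per_epoch
    # max_steps only ever shortens the run; it never extends it.
    return scheduled if max_steps is None else min(scheduled, max_steps)
```

For example, 3 epochs of 50 steps with max_steps=100 stops at step 100, while max_steps=200 leaves the 150-step schedule untouched.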

`TrainingBatch` - where the loss lives

From src/evsys_sdk/training/loop.py, the dataclass build_batch returns:

data: list[tinker.Datum] - the step's training examples. The loss mask on each Datum decides which tokens are supervised (the algorithm's call, not the dataset's).
loss_fn: tinker.types.LossFnType | LossCallable - either a server-recognized string name ("cross_entropy", "importance_sampling", …) for a server-side loss, or a Python callable for a client-side custom loss. The loop dispatches the right backend method based on the type: forward_backward_async for a string, forward_backward_custom_async for a callable. A LossCallable is Callable[[model_output, batch_metadata], Any] returning a scalar tensor for the backward pass.
loss_fn_config: dict[str, Any] | None - extra loss params; only used when loss_fn is a string.
metrics: dict[str, float] - algorithm-precomputed per-step metrics (e.g. reward stats, teacher entropy) merged into the per-step log row.

This is the seam for any loss: focal loss, DPO, GRPO, a custom RL objective - override build_batch, set the loss_fn, and the shared loop drives it.
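
The type-based dispatch described for loss_fn might look roughly like this. It is a sketch, not the code in loop.py: TrainingBatch is a stand-in dataclass, FakeBackend is hypothetical, and the argument lists of forward_backward_async and forward_backward_custom_async are assumptions (only the method names come from the text above).

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Union


@dataclass
class TrainingBatch:  # stand-in mirroring the four documented fields
    data: list
    loss_fn: Union[str, Callable[..., Any]]
    loss_fn_config: Optional[dict] = None
    metrics: dict = field(default_factory=dict)


class FakeBackend:
    """Hypothetical backend that only reports which path was taken."""

    async def forward_backward_async(self, data, loss_fn, loss_fn_config=None):
        return ("server", loss_fn)

    async def forward_backward_custom_async(self, data, loss_fn):
        return ("client", loss_fn.__name__)


async def forward_backward(backend, batch: TrainingBatch):
    if isinstance(batch.loss_fn, str):
        # Server-side loss: pass the name plus its optional config.
        return await backend.forward_backward_async(
            batch.data, batch.loss_fn, batch.loss_fn_config
        )
    if callable(batch.loss_fn):
        # Client-side loss: loss_fn_config is not used on this path.
        return await backend.forward_backward_custom_async(batch.data, batch.loss_fn)
    raise TypeError(f"unsupported loss_fn type: {type(batch.loss_fn).__name__}")
```

Keeping the branch in the loop rather than in each recipe is what lets build_batch switch losses by assigning a different loss_fn.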

Use a built-in

algorithm:
  kind: sft                 # sft | rl | sdft | local_sft | local_rl | mock_sft
  params:
    learning_rate: 1.0e-4
    batch_size: 4
    max_steps: 200
    lora_rank: 8
    supervise: all_assistant   # sft-only: which assistant turns get loss
    save_at_fractions: [1.0]

Built-in	What it does
`sft`	Supervised fine-tuning on Tinker. Tokenizes chat rows to static datums with assistant-span masks; `loss_fn="cross_entropy"`. Knobs: `max_seq_len`, `supervise`.
`rl`	On-policy RL on Tinker. Rolls out the batch through harbor's Job engine, group-normalizes advantages, emits `loss_fn="importance_sampling"` Datums. Knobs: `num_samples`, `verifier_name`, `agent_import_path`, `n_concurrent`.
`sdft`	Self-distillation FT on Tinker. On-policy student rollout → frozen-teacher top-K scoring → `cross_entropy` soft-target distillation. Knobs: `topk`, `max_context_length`, `demo_template`.
`local_sft`	TRL `SFTTrainer` on the `local` backend (transformers + peft + trl). LoRA on any HF model, runs on CPU/MPS/CUDA.
`local_rl`	TRL `GRPOTrainer` on the `local` backend, reward driven by a registered verifier (`verifier_kind` / `verifier_params`).
`mock_sft`	Fake SFT for tests: logs a deterministic loss curve and writes stub checkpoints. No backend needed.
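
As an illustration of what "group-normalizes advantages" usually means in GRPO-style training, here is a minimal sketch. The exact formula the rl recipe uses is not stated on this page; the population standard deviation, the eps term, and the function name group_normalize are all assumptions.

```python
from statistics import mean, pstdev


def group_normalize(rewards: list, eps: float = 1e-8) -> list:
    """Center and scale the rewards of one rollout group (same prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


advantages = group_normalize([1.0, 0.0, 0.0, 1.0])
```

A group whose rollouts all receive the same reward gets all-zero advantages, so it contributes no gradient signal under this scheme.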

Create your own

Subclass a concrete recipe and override one method - here, SFT with a custom client-side loss:

from typing import ClassVar
from evsys_sdk.algorithms.sft import SFT, SFTConfig
from evsys_sdk.registry import register_algorithm
from evsys_sdk.training.loop import TrainingBatch


class FocalSFTConfig(SFTConfig):   # inherits extra="forbid" + all SFT knobs
    gamma: float = 2.0             # focal-loss focusing parameter


@register_algorithm("focal_sft")
class FocalSFT(SFT):
    name: ClassVar[str] = "focal_sft"
    Config: ClassVar[type] = FocalSFTConfig

    async def build_batch(self, step_idx: int) -> TrainingBatch:
        batch = await super().build_batch(step_idx)   # reuse SFT's slicing

        def focal_loss(model_output, meta):            # a LossCallable
            logp = model_output.logprobs               # per-token target logprob
            p = logp.exp()                             # probability of the target token
            return -((1 - p) ** self.cfg.gamma * logp).mean()   # scalar for backward

        batch.loss_fn = focal_loss                     # client-side custom loss
        return batch

algorithm:
  kind: focal_sft
  params:
    gamma: 2.0
    lora_rank: 8
    max_steps: 200
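
To see what gamma does in the focal loss above, the same formula can be evaluated on plain floats. This is a standalone arithmetic check, not SDK code; the two token probabilities are made up for illustration.

```python
import math


def focal_loss(logprobs: list, gamma: float) -> float:
    """Mean of -(1 - p)^gamma * log p over tokens, with p = exp(logprob)."""
    terms = [-((1.0 - math.exp(lp)) ** gamma) * lp for lp in logprobs]
    return sum(terms) / len(terms)


logprobs = [math.log(0.9), math.log(0.5)]   # one confident token, one uncertain
nll = focal_loss(logprobs, gamma=0.0)       # gamma=0 reduces to mean NLL
focal = focal_loss(logprobs, gamma=2.0)     # confident token is down-weighted
```

With gamma=2.0 the token predicted at p=0.9 is weighted by 0.01 and the one at p=0.5 by 0.25, so the loss concentrates on tokens the model still gets wrong.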

For a recipe built from scratch, subclass BaseAlgorithm instead and implement setup + build_batch (+ optionally step_metrics).

Ship it in a package

A third-party package self-registers via a Python entry point in the evsys_sdk.algorithms group:

[project.entry-points."evsys_sdk.algorithms"]
my_dpo = "my_pkg.algorithms:MyDPO"

Importing evsys_sdk imports the target module, which runs its @register_algorithm decorator, so the recipe is selectable by name from any config.yaml without forking the SDK.
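
Entry-point discovery of this kind can be done with the standard library alone. The sketch below shows the general mechanism; how evsys_sdk itself scans the group is not shown on this page, and discover_algorithms is a hypothetical helper.

```python
from importlib.metadata import entry_points

GROUP = "evsys_sdk.algorithms"


def discover_algorithms(group: str = GROUP) -> dict:
    """Load every entry point in the group, keyed by entry-point name."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict of lists.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    # ep.load() imports the target module, which runs any decorators in it.
    return {ep.name: ep.load() for ep in selected}
```

With the pyproject.toml snippet above installed, the returned dict would map "my_dpo" to the MyDPO class; with no matching packages installed it is empty.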
