Infrastructure for thousands of task-specialised models that learn continuously from every interaction.
Continually-learning models
Easily create continually-learning models on your own data.
One declarative YAML
Every experiment is a single ExperimentConfig - nothing hidden in scripts.
Autoresearch friendly
Let your coding agents fine-tune models on your own data, on demand.
Easily customisable
Run any algorithm or data ablation, against any Tinker-protocol backend.
The main components
One ExperimentConfig ties together five layers. Every kind: in it resolves through a registry.
Experiment
The organizing unit - a hypothesis, one or more runs, an auto-synthesized conclusion and best_arm.
Data surface
Raw sources → ordered transforms → standardized typed rows that carry only data.
Algorithm surface
One contract - train(ctx) -> RunResult - over any Tinker-compatible backend.
Evaluation
A test/validation firewall: Benchmark (once) vs Validation (in-loop), scored by metrics & verifiers.
Plugins
Eight registries - implement a protocol, register a kind, reference it in YAML.
API reference
263 pages auto-generated from the code - always in sync.
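
The registry idea behind Plugins, together with the train(ctx) -> RunResult contract, can be illustrated with a minimal self-contained sketch. Only the names register_algorithm, train(ctx) and RunResult come from the text above; the ALGORITHMS dict, this decorator implementation, the RunResult fields, the dict-shaped ctx and the resolve helper are hypothetical stand-ins, not the library's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Type

# Hypothetical registry: maps a `kind` string to an algorithm class.
ALGORITHMS: Dict[str, Type] = {}


def register_algorithm(kind: str) -> Callable[[Type], Type]:
    """Return a class decorator that registers the class under `kind`."""
    def decorator(cls: Type) -> Type:
        if kind in ALGORITHMS:
            raise ValueError(f"algorithm kind already registered: {kind!r}")
        ALGORITHMS[kind] = cls
        return cls
    return decorator


@dataclass
class RunResult:
    """Illustrative result type; the real RunResult fields are not shown here."""
    metrics: Dict[str, float] = field(default_factory=dict)


@register_algorithm("my_sft")
class MySFT:
    def train(self, ctx: dict) -> RunResult:
        # A real algorithm would talk to the backend here; this stub only
        # echoes the configured learning rate back as a metric.
        return RunResult(metrics={"lr": float(ctx["params"]["lr"])})


def resolve(kind: str):
    """Look up a `kind:` value (as it would appear in YAML) and instantiate it."""
    if kind not in ALGORITHMS:
        raise KeyError(f"unknown algorithm kind: {kind!r}")
    return ALGORITHMS[kind]()


result = resolve("my_sft").train({"params": {"lr": 1.0e-5}})
print(result.metrics)
```

In this sketch, adding a new algorithm means defining one decorated class; nothing in the lookup code changes, which is the property the "no SDK edit" claim relies on.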
The overall structure
The whole system on one screen - one canonical config drives the Experiment layer, each run's data and algorithm surfaces, evaluation, and storage; the registries resolve every kind:.
Built for coding agents
Because everything is one declarative artifact, a coding agent can drive the whole loop programmatically - and customise it at every layer without forking the library:
- Launch & sweep. An agent edits the ExperimentConfig - flip an algorithm, add a matrix axis - and a whole campaign of runs expands from one file.
- Register new parts. Every kind: resolves through a registry, so an agent can add a brand-new algorithm, verifier, or backend with @register_algorithm(...) - no SDK edit.
- Learn from results. Outcomes are structured (ExperimentResult · best_arm · conclusion), so an agent can read them and decide the next experiment.
- Train continuously. Weights chain via init_from_checkpoint, so models keep learning across experiments instead of starting cold.
# A coding agent launches an experiment by writing this - and
# sweeps, swaps algorithms, or registers new components by editing it.
matrix:
  axes:
    algorithm.kind: [sft, dpo, grpo]  # try three methods at once
    algorithm.params.lr: [1.0e-5, 2.0e-5]
base_run:
  data: { source: { kind: jsonl, params: { path: data/train.jsonl } } }
  backend: { kind: tinker }
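
One plausible reading of the matrix above is a Cartesian product of its axes, which would give 3 × 2 = 6 runs. The sketch below illustrates that reading only: the expand_matrix and set_dotted helpers, the dotted-path override logic and the plain-dict config shape are assumptions made for illustration, not the library's expansion code.

```python
import copy
import itertools
from typing import Any, Dict, List


def set_dotted(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"]["c"] = value for the key "a.b.c", creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def expand_matrix(axes: Dict[str, List[Any]], base_run: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one run config per combination of axis values (assumed Cartesian product)."""
    keys = list(axes)
    runs = []
    for combo in itertools.product(*(axes[k] for k in keys)):
        run = copy.deepcopy(base_run)  # never mutate the shared base config
        for key, value in zip(keys, combo):
            set_dotted(run, key, value)
        runs.append(run)
    return runs


axes = {
    "algorithm.kind": ["sft", "dpo", "grpo"],
    "algorithm.params.lr": [1.0e-5, 2.0e-5],
}
base_run = {
    "data": {"source": {"kind": "jsonl", "params": {"path": "data/train.jsonl"}}},
    "backend": {"kind": "tinker"},
}

runs = expand_matrix(axes, base_run)
print(len(runs))             # 6
print(runs[0]["algorithm"])  # {'kind': 'sft', 'params': {'lr': 1e-05}}
```

Under this assumption, adding a third axis with two values would double the campaign to 12 runs without any other change to the file.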