Cookbook
Walkthroughs for the most common things you'll do with evsys-sdk.
Setup
cd evsys-sdk
uv sync
source .venv/bin/activate
For real Tinker training, also export TINKER_API_KEY=....
Run unit tests:
pytest tests/
1. Hello world (no GPU, no network)
examples/01_local_mock_sft.py runs a deterministic mock SFT. It's the
fastest way to verify the framework is wired up right.
python examples/01_local_mock_sft.py
You should see a final_checkpoint artifact and a metrics.jsonl in
examples/outputs/01/.../logs/.
2. Drive everything from YAML
examples/02_yaml_driven.py shows the same experiment expressed as a YAML
string. The library's canonical interface is YAML - Python configs are just
sugar that produces the same dict tree. This is what an external evolution
loop should mutate.
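To make the "mutate the dict tree" idea concrete, here is a minimal sketch of one mutation step an external loop might apply to a parsed config. The config keys, the path, and the `mutate` helper are all assumptions made for illustration; they are not taken from evsys-sdk.

```python
import copy
import random

# Hypothetical experiment config as a plain dict tree (roughly what a parsed
# YAML file would look like). Every key and value here is illustrative.
base_config = {
    "runs": [
        {
            "algorithm": {
                "kind": "sft",
                "params": {"learning_rate": 1e-4, "lora_rank": 8},
            }
        }
    ]
}


def mutate(config, path, choices, rng):
    """Return a deep copy of `config` with the value at `path` replaced by a random choice."""
    child = copy.deepcopy(config)  # never modify the parent in place
    node = child
    for key in path[:-1]:  # walk down to the parent of the target field
        node = node[key]
    node[path[-1]] = rng.choice(choices)
    return child


rng = random.Random(0)  # seeded so the sketch is reproducible
child_config = mutate(
    base_config,
    path=["runs", 0, "algorithm", "params", "lora_rank"],
    choices=[4, 16, 32],
    rng=rng,
)
print(child_config["runs"][0]["algorithm"]["params"])
```

A loop built this way would presumably write each child back out as YAML and pass it through `evsys validate ... --deep` before running it.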
evsys validate examples/outputs/02/example.yaml --deep
evsys run examples/outputs/02/example.yaml
3. Matrix campaigns
examples/03_matrix_campaign.py shows a 6-run sweep from one matrix: block.
For real campaigns (e.g. a LoRA-rank × SFT-checkpoint fan-out), this is the
pattern you'd write.
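The way one matrix block can fan out into six runs can be sketched as a Cartesian product over its axes. The axis names and values below are hypothetical, and the real `matrix:` schema in evsys-sdk may differ; this only illustrates the expansion idea.

```python
import itertools

# Hypothetical matrix block: two axes, 2 x 3 = 6 combinations.
matrix = {
    "lora_rank": [8, 32],
    "checkpoint": ["sft_step_100", "sft_step_200", "sft_step_300"],
}


def expand_matrix(matrix):
    """Expand a dict of axis -> values into one override dict per combination."""
    keys = list(matrix)
    combos = itertools.product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


runs = expand_matrix(matrix)
for run in runs:
    print(run)
print(len(runs), "runs")
```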
4. Adding your own algorithm
examples/04_custom_algorithm.py registers a new training recipe via
@register_algorithm("cosine_toy"). The decorator + Pydantic Config is the
entire integration; YAMLs can immediately use kind: cosine_toy.
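The decorator-plus-Config pattern described above can be sketched without the library. Everything here is a stand-in: the `ALGORITHMS` dict and this local `register_algorithm` are hypothetical re-implementations, and a stdlib dataclass takes the place of the Pydantic Config so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import ClassVar

# Stand-in registry; the real evsys-sdk registry may be structured differently.
ALGORITHMS = {}


def register_algorithm(name):
    """Class decorator that records the class under `name` (illustrative only)."""
    def decorator(cls):
        if name in ALGORITHMS:
            raise ValueError(f"algorithm {name!r} is already registered")
        ALGORITHMS[name] = cls
        return cls
    return decorator


@dataclass
class CosineToyConfig:
    # A dataclass standing in for a Pydantic model; field names are hypothetical.
    learning_rate: float = 1e-3
    total_steps: int = 100


@register_algorithm("cosine_toy")
class CosineToy:
    Config: ClassVar[type] = CosineToyConfig

    def __init__(self, config):
        self.config = config


def build_algorithm(block):
    """Resolve a `kind` + `params` block to a configured instance."""
    cls = ALGORITHMS[block["kind"]]
    return cls(cls.Config(**block.get("params", {})))


algo = build_algorithm({"kind": "cosine_toy", "params": {"total_steps": 50}})
print(type(algo).__name__, algo.config)
```

The point of the design is that the config loader only needs the `kind` string: the registry supplies the class, and the class supplies its own parameter schema.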
For external packages that should auto-extend the registry, declare an entry
point in your pyproject.toml:
[project.entry-points."evsys_sdk.algorithms"]
my_dpo = "my_pkg.algorithms:MyDPO"
When evsys_sdk is imported, every entry point in that group is
loaded. No fork needed.
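For readers unfamiliar with entry points, the following sketch shows how a package can discover and load everything declared under a group using the standard library. The `load_entry_point_group` helper is hypothetical and is not evsys-sdk's actual loader; only `importlib.metadata.entry_points` is a real API.

```python
from importlib.metadata import entry_points


def load_entry_point_group(group):
    """Load every installed entry point in `group` and return {name: object}."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict mapping group name to entry points.
        selected = eps.get(group, [])
    # ep.load() imports the module and resolves the "module:attr" target.
    return {ep.name: ep.load() for ep in selected}


plugins = load_entry_point_group("evsys_sdk.algorithms")
print(sorted(plugins))
```

With no plugin packages installed, the result is simply an empty dict.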
5. Adding your own verifier (reward function)
examples/05_custom_verifier.py registers a Gaussian length-reward verifier.
Same pattern as algorithms; entry point group is
evsys_sdk.verifiers.
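As an illustration of what a Gaussian length reward computes, here is a minimal standalone function. The signature, the default `target_len` and `sigma`, and the choice to measure length in characters rather than tokens are all assumptions; the verifier in examples/05_custom_verifier.py may be defined differently.

```python
import math


def gaussian_length_reward(completion, target_len=200, sigma=50.0):
    """Reward in (0, 1] that peaks when len(completion) equals target_len.

    Length is measured in characters here (an assumption for this sketch).
    """
    deviation = len(completion) - target_len
    return math.exp(-(deviation ** 2) / (2 * sigma ** 2))


for n in (100, 200, 300):
    print(n, gaussian_length_reward("x" * n))
```

The reward is symmetric around the target, so completions that are too short and too long by the same amount are penalized equally.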
6. Real Tinker SFT
examples/06_tinker_sft_real.py - minimal real Tinker run, costs roughly $0.01.
For a real-world campaign you'd write an SFT-then-RL fan-out over LoRA rank with checkpointing at 50/75/90/100% of training steps, then evaluate everything and produce a single combined plot.
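The checkpoint schedule mentioned above reduces to simple arithmetic. This sketch converts percentages into step numbers and pairs them with LoRA ranks; the helper name, the rank values, and the round-up rule are assumptions for illustration only.

```python
def checkpoint_steps(total_steps, percents=(50, 75, 90, 100)):
    """Return the training step for each percentage, rounded up to a whole step."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return [(total_steps * pct + 99) // 100 for pct in percents]


# Hypothetical fan-out: every LoRA rank is checkpointed at every step.
lora_ranks = (8, 32)
plan = [(rank, step) for rank in lora_ranks for step in checkpoint_steps(1000)]

print(checkpoint_steps(1000))  # [500, 750, 900, 1000]
print(len(plan), "checkpoints to evaluate")
```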
CLI
evsys list # everything in the registries
evsys list --kind algorithms
evsys schema algorithm sft # print Pydantic JSON schema
evsys validate config.yaml --deep # validate top-level + each kind/params block
evsys run config.yaml # run an experiment
Architecture map
ExperimentConfig (YAML)
 ├─ data_store: kind+params ────► DataStore (LocalDataStore / SupabaseDataStore / InMemory)
 ├─ log_store: kind+params ────► LogStore (JSONLLogStore / TensorBoardLogStore / Multiplex)
 └─ runs / matrix:
     └─ RunConfig
         ├─ data: source_kind + transforms[]
         │   └─ Transform (jsonl_to_chat / identity / your own)
         ├─ model: name, load_checkpoint_path, renderer_name
         ├─ backend: kind+params ───► Backend (tinker / local / mock)
         ├─ algorithm: kind+params ──► Algorithm (sft / sdft / rl / local_sft / local_rl / mock_*)
         └─ eval:
             ├─ inference: kind+params ► InferenceClient (tinker / local / mock)
             └─ metrics[]: kind+params ► Metric (exact_match / pass_at_k / mean_reward / ...)
Every box on the right is one extension point with its own registry and entry-point group.
Stores: local vs Supabase
The library defines DataStore and LogStore as protocols (see
src/evsys_sdk/protocols.py). Built-in implementations:
- LocalDataStore (root-anchored filesystem JSONL/JSON)
- InMemoryDataStore (test-only)
- JSONLLogStore (append-only metrics.jsonl + hyperparams.json + artifacts.json)
- TensorBoardLogStore (SummaryWriter)
- MultiplexLogStore (fan-out to N children)
Supabase adapters live in evsys_sdk.adapters.supabase (extra:
pip install evsys-sdk[supabase]). They are optional - the core
library imports zero Supabase code.
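The protocol-based design and the fan-out idea can be sketched as follows. The single `log_metrics` method and the `ListLogStore` and `FanOutLogStore` classes are hypothetical; the actual method set is defined in src/evsys_sdk/protocols.py and is not reproduced here.

```python
from typing import Dict, List, Protocol


class LogStore(Protocol):
    """Hypothetical one-method protocol; the real one likely has more methods."""

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None:
        ...


class ListLogStore:
    """In-memory store: satisfies the protocol structurally, with no inheritance."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None:
        self.rows.append({"step": step, **metrics})


class FanOutLogStore:
    """Forwards every call to each child store."""

    def __init__(self, children: List[LogStore]) -> None:
        self.children = list(children)

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None:
        for child in self.children:
            child.log_metrics(step, metrics)


first, second = ListLogStore(), ListLogStore()
store: LogStore = FanOutLogStore([first, second])
store.log_metrics(1, {"loss": 0.5})
print(first.rows, second.rows)
```

Because protocols are structural, an optional adapter only has to implement the right methods; it does not need to import a base class from the core library.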
Backends
Pick mock for tests, local for local TRL training, tinker for hosted
Tinker training. Algorithms are routed to the right code path based on
ctx.backend.name.
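Routing on the backend name can be pictured as a small dispatch table. The handler functions and the `ctx` object below are stand-ins built with `SimpleNamespace`; the real context type and code paths inside evsys-sdk are not shown in this document.

```python
from types import SimpleNamespace


# Hypothetical code paths, one per backend name.
def _train_mock(ctx):
    return "mock path"


def _train_local(ctx):
    return "local TRL path"


def _train_tinker(ctx):
    return "hosted Tinker path"


_CODE_PATHS = {"mock": _train_mock, "local": _train_local, "tinker": _train_tinker}


def run_algorithm(ctx):
    """Pick the code path from ctx.backend.name and fail loudly on unknown names."""
    name = ctx.backend.name
    if name not in _CODE_PATHS:
        raise ValueError(f"unsupported backend: {name!r}")
    return _CODE_PATHS[name](ctx)


ctx = SimpleNamespace(backend=SimpleNamespace(name="mock"))
print(run_algorithm(ctx))  # mock path
```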
Modular extension cheat sheet
| Extension point | Decorator | Entry point group |
|---|---|---|
| Algorithm | @register_algorithm("name") | evsys_sdk.algorithms |
| Verifier | @register_verifier("name") | evsys_sdk.verifiers |
| Metric | @register_metric("name") | evsys_sdk.metrics |
| Backend | @register_backend("name") | evsys_sdk.backends |
| InferenceClient | @register_inference("name") | evsys_sdk.inference |
| Transform | @register_transform("name") | evsys_sdk.transforms |
| DataStore | @register_data_store("name") | evsys_sdk.data_stores |
| LogStore | @register_log_store("name") | evsys_sdk.log_stores |
Each registered class declares a Config: ClassVar[type] Pydantic model that
defines the legal field space - this is exactly what an evolutionary
algorithm needs to know to mutate the YAML safely.
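To show why a per-class Config is useful to a mutation loop, this sketch reads the legal field space off a class and rejects proposals that fall outside it. A stdlib dataclass stands in for the Pydantic model, and `ToyAlgorithm`, `legal_fields`, and `is_safe_mutation` are hypothetical names; with the real library, `evsys schema` prints the corresponding JSON schema.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class ToyConfig:
    # Dataclass stand-in for a Pydantic Config; field names are illustrative.
    learning_rate: float = 1e-3
    total_steps: int = 100


class ToyAlgorithm:
    Config: ClassVar[type] = ToyConfig


def legal_fields(cls):
    """Map each Config field name to its declared type."""
    return {f.name: f.type for f in fields(cls.Config)}


def is_safe_mutation(cls, params):
    """Accept only known field names whose values match the declared type."""
    space = legal_fields(cls)
    return all(
        key in space and isinstance(value, space[key])
        for key, value in params.items()
    )


print(legal_fields(ToyAlgorithm))
print(is_safe_mutation(ToyAlgorithm, {"total_steps": 200}))
print(is_safe_mutation(ToyAlgorithm, {"momentum": 0.9}))
```

This strict `isinstance` check is cruder than real Pydantic validation, which can also coerce values and enforce constraints such as ranges.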