EvSys

Cookbook

Walkthroughs for the most common things you'll do with evsys-sdk.

Setup

cd evsys-sdk
uv sync
source .venv/bin/activate

For real Tinker training, also export TINKER_API_KEY=....

Run unit tests:

pytest tests/

1. Hello world (no GPU, no network)

examples/01_local_mock_sft.py runs a deterministic mock SFT. It's the fastest way to verify the framework is wired up right.

python examples/01_local_mock_sft.py

You should see a final_checkpoint artifact and a metrics.jsonl in examples/outputs/01/.../logs/.

2. Drive everything from YAML

examples/02_yaml_driven.py shows the same experiment expressed as a YAML string. The library's canonical interface is YAML - Python configs are just sugar that produces the same dict tree. This is what an external evolution loop should mutate.

evsys validate examples/outputs/02/example.yaml --deep
evsys run examples/outputs/02/example.yaml
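Because the config is just a dict tree, an outer loop can mutate it with ordinary Python before writing it back to YAML. The sketch below shows one way to do that. The nesting (runs, algorithm, kind, params) follows the architecture map further down, but the field names learning_rate and lora_rank, the model name, and the set_path helper are illustrative assumptions, not part of evsys-sdk.

```python
import copy

# Hypothetical config tree. Only the kind+params nesting comes from the docs;
# the parameter names and values are made up for illustration.
base_config = {
    "runs": [
        {
            "model": {"name": "example-model"},
            "algorithm": {
                "kind": "sft",
                "params": {"learning_rate": 1e-4, "lora_rank": 8},
            },
        }
    ]
}


def set_path(tree, path, value):
    """Return a deep copy of `tree` with the node at `path` set to `value`."""
    mutated = copy.deepcopy(tree)
    node = mutated
    for key in path[:-1]:
        node = node[key]  # works for dict keys and list indices alike
    node[path[-1]] = value
    return mutated


child = set_path(base_config, ["runs", 0, "algorithm", "params", "lora_rank"], 16)
```

The parent tree is left untouched, so each candidate can be dumped to its own YAML file and checked with evsys validate before it is run.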

3. Matrix campaigns

examples/03_matrix_campaign.py shows a 6-run sweep generated from a single matrix: block. This is the pattern you'd use for a real campaign, e.g. a LoRA-rank × SFT-checkpoint fan-out.
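To see where a run count like 6 comes from, here is a minimal sketch of a Cartesian matrix expansion. It assumes a matrix is a mapping from axis name to a list of values and that every combination becomes one run; the actual matrix: syntax and the axis names used here are assumptions, so check the example file for the real shape.

```python
from itertools import product

# Hypothetical axes: 3 LoRA ranks x 2 starting checkpoints.
matrix = {
    "lora_rank": [8, 16, 32],
    "checkpoint": ["sft-step-100", "sft-step-200"],
}


def expand_matrix(matrix):
    """Return one override dict per combination of axis values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


runs = expand_matrix(matrix)
print(len(runs))  # 3 * 2 = 6
```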

4. Adding your own algorithm

examples/04_custom_algorithm.py registers a new training recipe via @register_algorithm("cosine_toy"). The decorator plus a Pydantic Config class is the entire integration; YAML files can use kind: cosine_toy immediately.
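If the decorator-plus-Config pattern is new to you, the self-contained sketch below mimics it without importing evsys-sdk. Everything in it is a stand-in: the ALGORITHMS dict, this register_algorithm, the build helper, and the steps and peak_lr fields are assumptions, and a dataclass takes the place of the Pydantic model so the snippet has no dependencies.

```python
from dataclasses import dataclass
from typing import Callable, ClassVar, Dict

# Stand-in registry: maps a `kind` string to the class that implements it.
ALGORITHMS: Dict[str, type] = {}


def register_algorithm(name: str) -> Callable[[type], type]:
    """Stand-in decorator that records a class under `name`."""
    def decorator(cls: type) -> type:
        if name in ALGORITHMS:
            raise ValueError(f"algorithm {name!r} is already registered")
        ALGORITHMS[name] = cls
        return cls
    return decorator


@dataclass
class CosineToyConfig:
    # Hypothetical fields; a real Config would be a Pydantic model.
    steps: int = 100
    peak_lr: float = 1e-3


@register_algorithm("cosine_toy")
class CosineToy:
    Config: ClassVar[type] = CosineToyConfig

    def __init__(self, config: CosineToyConfig) -> None:
        self.config = config


def build(block: dict):
    """Resolve a kind+params block into a configured instance."""
    cls = ALGORITHMS[block["kind"]]
    return cls(cls.Config(**block.get("params", {})))


algo = build({"kind": "cosine_toy", "params": {"steps": 10}})
```

The point of the pattern is that a kind+params block carries everything needed to construct the object, so a new recipe needs no changes to the config loader.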

For external packages that should auto-extend the registry, declare an entry point in your pyproject.toml:

[project.entry-points."evsys_sdk.algorithms"]
my_dpo = "my_pkg.algorithms:MyDPO"

When evsys_sdk is imported, every entry point in that group is loaded. No fork needed.

5. Adding your own verifier (reward function)

examples/05_custom_verifier.py registers a Gaussian length-reward verifier. Same pattern as algorithms; entry point group is evsys_sdk.verifiers.
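As a rough idea of what a Gaussian length reward computes, here is a standalone sketch. It assumes length is measured in characters and that the reward peaks at 1.0 when the response hits a target length; the function name, target_len, and sigma are illustrative choices and may differ from what the example file uses.

```python
import math


def gaussian_length_reward(text: str, target_len: int = 200, sigma: float = 50.0) -> float:
    """Reward in (0, 1] that peaks when len(text) equals target_len."""
    deviation = len(text) - target_len
    return math.exp(-(deviation ** 2) / (2 * sigma ** 2))


on_target = gaussian_length_reward("a" * 200)
too_short = gaussian_length_reward("a" * 150)
too_long = gaussian_length_reward("a" * 250)
```

Responses that are equally far above or below the target get the same reward, and sigma controls how quickly the reward falls off.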

6. Real Tinker SFT

examples/06_tinker_sft_real.py is a minimal real Tinker run; it costs roughly $0.01.

For a real-world campaign you'd write an SFT-then-RL fan-out over LoRA rank with checkpointing at 50/75/90/100% of training steps, then evaluate everything and produce a single combined plot.
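Turning the 50/75/90/100% schedule into concrete step numbers is simple arithmetic. The helper below is a hypothetical sketch (not an evsys-sdk function) that uses integer math to avoid floating-point rounding surprises and de-duplicates steps for very short runs.

```python
def checkpoint_steps(total_steps: int, percents=(50, 75, 90, 100)) -> list:
    """Return the sorted, de-duplicated steps at which to save a checkpoint."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    # Integer floor division keeps e.g. 90% of 200 at exactly 180.
    steps = {max(1, total_steps * pct // 100) for pct in percents}
    return sorted(steps)


print(checkpoint_steps(200))  # [100, 150, 180, 200]
```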

CLI

evsys list                          # everything in the registries
evsys list --kind algorithms
evsys schema algorithm sft          # print Pydantic JSON schema
evsys validate config.yaml --deep   # validate top-level + each kind/params block
evsys run config.yaml               # run an experiment

Architecture map

ExperimentConfig (YAML)
├─ data_store: kind+params  ────►  DataStore  (LocalDataStore / SupabaseDataStore / InMemory)
├─ log_store: kind+params   ────►  LogStore   (JSONLLogStore / TensorBoardLogStore / Multiplex)
└─ runs / matrix:
   └─ RunConfig
      ├─ data:  source_kind + transforms[]
      │            └─ Transform (jsonl_to_chat / identity / your own)
      ├─ model: name, load_checkpoint_path, renderer_name
      ├─ backend: kind+params  ───►  Backend  (tinker / local / mock)
      ├─ algorithm: kind+params ──►  Algorithm (sft / sdft / rl / local_sft / local_rl / mock_*)
      └─ eval:
         ├─ inference: kind+params ► InferenceClient (tinker / local / mock)
         └─ metrics[]: kind+params  ► Metric (exact_match / pass_at_k / mean_reward / ...)

Every box on the right is one extension point with its own registry and entry-point group.
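One practical consequence of the kind+params convention is that tooling can find every extension point in a config by walking the tree. The sketch below does that for a dict shaped like the map above. The find_kind_blocks helper and the store kind names "local" and "jsonl" are assumptions for illustration; the real registry names may differ.

```python
def find_kind_blocks(node, path=()):
    """Return (dotted_path, kind) for every dict in the tree that has a `kind` key."""
    found = []
    if isinstance(node, dict):
        if "kind" in node:
            found.append((".".join(str(part) for part in path), node["kind"]))
        for key, value in node.items():
            found.extend(find_kind_blocks(value, path + (key,)))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_kind_blocks(item, path + (index,)))
    return found


# Hypothetical config mirroring the architecture map.
config = {
    "data_store": {"kind": "local", "params": {}},
    "log_store": {"kind": "jsonl", "params": {}},
    "runs": [
        {
            "backend": {"kind": "mock", "params": {}},
            "algorithm": {"kind": "mock_sft", "params": {}},
            "eval": {
                "inference": {"kind": "mock", "params": {}},
                "metrics": [{"kind": "exact_match", "params": {}}],
            },
        }
    ],
}

blocks = find_kind_blocks(config)
```

Each (path, kind) pair is a spot where a validator would look up the registered class and check params against its Config.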

Stores: local vs Supabase

The library defines DataStore and LogStore as protocols (see src/evsys_sdk/protocols.py). Built-in implementations:

  • LocalDataStore (root-anchored filesystem JSONL/JSON)
  • InMemoryDataStore (test-only)
  • JSONLLogStore (append-only metrics.jsonl + hyperparams.json + artifacts.json)
  • TensorBoardLogStore (SummaryWriter)
  • MultiplexLogStore (fan-out to N children)

Supabase adapters live in evsys_sdk.adapters.supabase (extra: pip install evsys-sdk[supabase]). They are optional - the core library imports zero Supabase code.
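To illustrate the protocol idea and the multiplex fan-out, here is a standalone sketch using typing.Protocol. The log_metrics method and its signature are assumptions made for this example; the real method names are defined in src/evsys_sdk/protocols.py and may differ.

```python
from typing import Dict, List, Protocol, Tuple


class LogStoreLike(Protocol):
    """Hypothetical minimal logging protocol (not the real LogStore)."""

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None: ...


class ListLogStore:
    """Toy store that keeps records in memory."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None:
        self.records.append((step, dict(metrics)))


class FanOutLogStore:
    """Forwards every call to each child, in the spirit of MultiplexLogStore."""

    def __init__(self, children: List[LogStoreLike]) -> None:
        self.children = list(children)

    def log_metrics(self, step: int, metrics: Dict[str, float]) -> None:
        for child in self.children:
            child.log_metrics(step, metrics)


first, second = ListLogStore(), ListLogStore()
logger = FanOutLogStore([first, second])
logger.log_metrics(1, {"loss": 0.5})
```

Because the protocol is structural, neither toy class inherits from anything; matching the method signature is enough, which is what keeps optional adapters out of the core import path.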

Backends

Pick mock for tests, local for local TRL training, tinker for hosted Tinker training. Algorithms are routed to the right code path based on ctx.backend.name.
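Routing on ctx.backend.name can be pictured as a small dispatch table. The sketch below is an assumption about the general shape, not the library's code: the Backend and Context classes, the handler functions, and their return values are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str


@dataclass
class Context:
    backend: Backend


def _train_mock(ctx: Context) -> str:
    return "deterministic mock loop"


def _train_local(ctx: Context) -> str:
    return "local TRL training"


def _train_tinker(ctx: Context) -> str:
    return "hosted Tinker training"


HANDLERS = {"mock": _train_mock, "local": _train_local, "tinker": _train_tinker}


def train(ctx: Context) -> str:
    """Pick the code path for the active backend, failing loudly if unknown."""
    try:
        handler = HANDLERS[ctx.backend.name]
    except KeyError:
        raise ValueError(f"unsupported backend: {ctx.backend.name!r}") from None
    return handler(ctx)


result = train(Context(backend=Backend(name="mock")))
```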

Modular extension cheat sheet

Extension point    Decorator                        Entry point group
Algorithm          @register_algorithm("name")      evsys_sdk.algorithms
Verifier           @register_verifier("name")       evsys_sdk.verifiers
Metric             @register_metric("name")         evsys_sdk.metrics
Backend            @register_backend("name")        evsys_sdk.backends
InferenceClient    @register_inference("name")      evsys_sdk.inference
Transform          @register_transform("name")      evsys_sdk.transforms
DataStore          @register_data_store("name")     evsys_sdk.data_stores
LogStore           @register_log_store("name")      evsys_sdk.log_stores

Each registered class declares a Config: ClassVar[type] Pydantic model that defines the legal field space - this is exactly what an evolutionary algorithm needs to know to mutate the YAML safely.
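To make the "legal field space" idea concrete, the sketch below mutates one parameter while staying inside bounds read from a JSON-schema-like dict, similar in spirit to what evsys schema prints. The schema contents, field names, and both helper functions are hypothetical; a real schema may express constraints differently (for example with exclusive bounds or nested models).

```python
import random

# Hypothetical schema fragment; real output comes from the algorithm's Config.
schema = {
    "properties": {
        "learning_rate": {"type": "number", "minimum": 1e-6, "maximum": 1e-2},
        "lora_rank": {"type": "integer", "enum": [4, 8, 16, 32]},
        "epochs": {"type": "integer", "minimum": 1, "maximum": 5},
    }
}


def sample_field(spec: dict, rng: random.Random):
    """Draw one legal value for a field described by `spec`."""
    if "enum" in spec:
        return rng.choice(spec["enum"])
    if spec["type"] == "integer":
        return rng.randint(spec["minimum"], spec["maximum"])  # inclusive bounds
    if spec["type"] == "number":
        return rng.uniform(spec["minimum"], spec["maximum"])
    raise ValueError(f"unsupported field spec: {spec}")


def mutate_params(params: dict, schema: dict, rng: random.Random) -> dict:
    """Return a copy of `params` with one randomly chosen field resampled."""
    name = rng.choice(sorted(schema["properties"]))
    mutated = dict(params)
    mutated[name] = sample_field(schema["properties"][name], rng)
    return mutated


parent = {"learning_rate": 1e-4, "lora_rank": 8, "epochs": 2}
child = mutate_params(parent, schema, random.Random(0))
```

Sampling from the schema rather than perturbing values blindly means every child config should still pass validation, at least for the constraint types handled here.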
