Quickstart
Train your first model in five minutes - no GPU, no network - then point it at a real backend.
This takes you from a zero-setup mock wiring test to real local
training - no GPU required. The config shape is identical; you only swap the
backend.
1. Install
uv syncEverything ships in the base install - see Installation.
2. Describe the experiment
Everything about a run lives in one declarative config.yaml. The repo ships
this ready to run at examples/quickstart_experiment/ - two configs (a
mock wiring test and a real local train) plus a tiny run.py that loads
either. The only difference between the two configs is backend, model,
and algorithm - the data is identical.
name: quickstart_experiment
output_dir: ./outputs
log_store: { kind: jsonl }
run:
name: mock_sft_run
data:
source_kind: in_memory
rows:
- {query: "save a contact", tool_slug: OUTLOOK_CREATE_CONTACT, toolkit: OUTLOOK, description: "..."}
- {query: "edit a slack message", tool_slug: SLACK_UPDATES_A_SLACK_MESSAGE, toolkit: SLACK, description: "..."}
transforms:
- kind: jsonl_to_chat
params: {user_template: "Query: {query}", assistant_template: "<answer>{tool_slug}</answer>"}
model: { name: tiny/fake }
algorithm:
kind: mock_sft
params: {num_epochs: 1, batch_size: 1, save_at_fractions: [0.5, 1.0]}
backend: { kind: mock }name: quickstart_local
output_dir: ./outputs
log_store: { kind: jsonl }
run:
name: local_sft_run
data: # ← identical to the mock config
source_kind: in_memory
rows:
- {query: "save a contact", tool_slug: OUTLOOK_CREATE_CONTACT, toolkit: OUTLOOK, description: "..."}
- {query: "edit a slack message", tool_slug: SLACK_UPDATES_A_SLACK_MESSAGE, toolkit: SLACK, description: "..."}
transforms:
- kind: jsonl_to_chat
params: {user_template: "Query: {query}", assistant_template: "<answer>{tool_slug}</answer>"}
model: { name: Qwen/Qwen3-0.6B } # ← a real model
algorithm:
kind: local_sft # ← TRL + peft LoRA
params: {max_steps: 5, lora_rank: 4, lora_alpha: 8, max_seq_len: 256, warmup_steps: 2}
backend:
kind: local # ← trains on your machine
params: {dtype: float32, device: cpu} # cuda / mps if availableimport sys
from pathlib import Path
from evsys_sdk import load_yaml, run_experiment
HERE = Path(__file__).parent
def main():
config_name = sys.argv[1] if len(sys.argv) > 1 else "config.yaml"
cfg = load_yaml(HERE / config_name)
[result] = run_experiment(cfg)
print(f"Status: {result.status}")
print(f"Artifacts: {list(result.artifacts.keys())}")
print(f"Metrics: {result.metrics}")
if __name__ == "__main__":
main()3. Run the wiring test (mock)
The mock backend produces a deterministic loss curve and stub checkpoints -
no model, no GPU, no network. It's the fastest way to confirm everything is
wired up.
evsys validate examples/quickstart_experiment/config.yaml --deep # parse + check every kind/params block
python examples/quickstart_experiment/run.py # or: evsys run examples/quickstart_experiment/config.yamlStatus: completed
Artifacts: ['ckpt_step_1', 'ckpt_step_2', 'final_checkpoint']
Metrics: {'train/final_loss': 0.1278, 'total_steps': 2.0}4. Train for real (local)
Now swap to the local backend - it trains a LoRA adapter on Qwen3-0.6B
with transformers + TRL + peft, right on your machine. No GPU required; it runs
on CPU/MPS in about a minute for 5 steps (the first run downloads the model,
~1.2 GB).
python examples/quickstart_experiment/run.py config_local.yaml{'loss': '6.047', 'grad_norm': '1.9e+04', 'learning_rate': '6.7e-05', 'mean_token_accuracy': '0.42', 'epoch': '5'}
Status: completed
Artifacts: ['final_checkpoint', 'checkpoint-5']
Metrics: {'train/final_loss': 6.05}This writes a real adapter_model.safetensors (the trained LoRA), per-step
metrics.jsonl, and a resumable checkpoint-5/ under
examples/quickstart_experiment/outputs/. Bump max_steps to 100–500 for a
real run, or set backend.kind: tinker to train hosted.
5. The three training paradigms
evsys-sdk runs SFT, RL, and distillation behind one config shape. Each has
a ready-to-run config in examples/configs/ and a hands-on walkthrough - raw
data → transform → config → run:
6. Go further
📚 Learn the concepts
How the Experiment, data, algorithm, and eval layers fit together.
🍳 Cookbook
End-to-end recipes: YAML-driven runs, sweeps, RL, distillation.
⚙️ Pick a real backend
Swap mock for local (TRL+peft) or tinker (hosted).
🧩 Register your own
Add an algorithm, verifier, or backend without forking.