Architecture
The whole system at a glance - one ExperimentConfig, five layers, eight registries.
evsys-sdk is organized around a single idea: one ExperimentConfig (YAML)
drives everything, and every kind: in it resolves through a registry.
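To make the "every kind: resolves through a registry" idea concrete, here is a minimal sketch of what such a lookup could look like. The names `Registry`, `register`, `resolve`, and the `lowercase` transform are hypothetical and chosen only for illustration; they are not taken from evsys-sdk's actual API.

```python
from typing import Callable, Dict


class Registry:
    """Hypothetical registry mapping a `kind` string to a factory (not the SDK's real class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._factories: Dict[str, Callable[..., object]] = {}

    def register(self, kind: str):
        """Decorator that records a factory (class or function) under `kind`."""
        def decorator(factory):
            if kind in self._factories:
                raise ValueError(f"{self.name}: kind {kind!r} already registered")
            self._factories[kind] = factory
            return factory
        return decorator

    def resolve(self, spec: dict) -> object:
        """Build an object from a config mapping such as {"kind": ..., **params}."""
        params = dict(spec)          # copy so the caller's config is not mutated
        kind = params.pop("kind")
        try:
            factory = self._factories[kind]
        except KeyError:
            raise KeyError(f"{self.name}: unknown kind {kind!r}") from None
        return factory(**params)


# One registry per plugin family; "transforms" is an assumed example family.
transforms = Registry("transforms")


@transforms.register("lowercase")
class Lowercase:
    def __init__(self, field: str) -> None:
        self.field = field

    def __call__(self, row: dict) -> dict:
        return {**row, self.field: row[self.field].lower()}


# A parsed YAML entry like `{kind: lowercase, field: text}` would resolve as:
step = transforms.resolve({"kind": "lowercase", "field": "text"})
result = step({"text": "Hello"})
```

The point of the pattern is that the config only ever names a `kind`; which class backs it is decided by whatever was registered, so new behavior can be added without touching the code that reads the config.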
The Experiment - the organizing unit
The Experiment is the top of the system. It's the scientific container: you
give it a hypothesis and one or more runs, and it produces a best arm and an
auto-synthesized conclusion. Concretely, Experiment.run():
- Expands the config - a single run, a list of runs, or a matrix shorthand - into one or more arms (each arm is one RunConfig = one training job). n_repeats replicates an arm across seeds for variance.
- Runs each arm through the same pipeline - Data → Training → Evaluation - with per-arm failure isolation.
- Picks the best arm by your success_metric and records an ExperimentResult (best_arm, conclusion, metrics).
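The expansion step can be sketched as follows. This is only an illustration under stated assumptions: it treats the matrix shorthand as a Cartesian product over parameter lists and gives each repeat its own seed. The `Arm` dataclass, `expand_matrix`, and the `base_seed + repeat` seeding scheme are hypothetical and may not match how evsys-sdk actually expands a config.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Dict, List


@dataclass(frozen=True)
class Arm:
    """Hypothetical stand-in for one expanded run (one training job)."""
    name: str
    params: Dict[str, Any]
    seed: int


def expand_matrix(
    matrix: Dict[str, List[Any]],
    n_repeats: int = 1,
    base_seed: int = 0,
) -> List[Arm]:
    """Expand {param: [values...]} into one arm per combination per repeat."""
    keys = sorted(matrix)  # fixed key order keeps arm names deterministic
    arms: List[Arm] = []
    for values in product(*(matrix[k] for k in keys)):
        params = dict(zip(keys, values))
        label = ",".join(f"{k}={v}" for k, v in params.items())
        for repeat in range(n_repeats):
            # Assumption: repeats differ only by seed.
            arms.append(Arm(name=f"{label}#r{repeat}", params=params, seed=base_seed + repeat))
    return arms


# 2 learning rates x 2 batch sizes x 2 repeats = 8 arms
arms = expand_matrix({"lr": [1e-4, 1e-5], "batch_size": [8, 16]}, n_repeats=2)
```

Keeping the repeat index in the arm name makes it easy to group repeats of the same combination later when estimating variance.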
So an experiment is one YAML that can fan out into a whole campaign of runs, each of which is the left-to-right pipeline you saw in the introduction. In the diagram, blue is what you author; the white pillars and the grey result are what the SDK owns.
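Per-arm failure isolation and best-arm selection can be pictured with a small sketch like the one below. Everything in it is an assumption made for illustration: `ArmOutcome`, `run_arms`, `pick_best`, the `higher_is_better` flag, and the fake pipeline are hypothetical names, and the real `Experiment.run()` may handle errors and ties differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ArmOutcome:
    """Hypothetical per-arm record: metrics on success, an error string on failure."""
    arm: str
    metrics: Dict[str, float]
    error: Optional[str] = None


def run_arms(arms: List[str], run_one: Callable[[str], Dict[str, float]]) -> List[ArmOutcome]:
    """Run every arm; an exception in one arm is recorded and does not stop the rest."""
    outcomes: List[ArmOutcome] = []
    for arm in arms:
        try:
            outcomes.append(ArmOutcome(arm, run_one(arm)))
        except Exception as exc:  # isolate the failure to this arm
            outcomes.append(ArmOutcome(arm, {}, error=f"{type(exc).__name__}: {exc}"))
    return outcomes


def pick_best(
    outcomes: List[ArmOutcome],
    success_metric: str,
    higher_is_better: bool = True,
) -> Optional[ArmOutcome]:
    """Choose the best successful arm by `success_metric`, or None if none qualify."""
    scored = [o for o in outcomes if o.error is None and success_metric in o.metrics]
    if not scored:
        return None
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda o: o.metrics[success_metric])


def fake_pipeline(arm: str) -> Dict[str, float]:
    """Stand-in for Data -> Training -> Evaluation; one arm fails on purpose."""
    if arm == "lr=1e-2":
        raise RuntimeError("loss diverged")
    return {"accuracy": {"lr=1e-4": 0.81, "lr=1e-5": 0.77}[arm]}


outcomes = run_arms(["lr=1e-2", "lr=1e-4", "lr=1e-5"], fake_pipeline)
best = pick_best(outcomes, "accuracy")
```

Recording the failure on the outcome, rather than re-raising, is what lets a campaign of many arms finish and still report which arms broke and why.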
The five layers
| Layer | What it does | Start here |
|---|---|---|
| ① Experiment | The organizing unit - a hypothesis, one or more runs, an auto-synthesized conclusion. | Experiments |
| ② Data | Raw sources → ordered transforms → standardized typed rows. | Data |
| ③ Algorithm | One contract - train(ctx) -> RunResult - over any tinker-compatible backend. | Algorithms |
| ④ Evaluation | A test/validation firewall: Benchmark (once) vs Validation (in-loop). | Algorithms |
| ⑤ Plugins | Eight registries - implement a protocol, register a kind, reference it in YAML. | Plugins |
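As an illustration of the "one contract" row, the sketch below expresses train(ctx) -> RunResult as a Python Protocol. Only the names `train` and `RunResult` come from the table; the `RunContext` type, the fields on both dataclasses, the choice of a method rather than a free function, and the `ConstantBaseline` class are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol, runtime_checkable


@dataclass
class RunContext:
    """Hypothetical context object; the SDK's real ctx may carry different fields."""
    seed: int
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class RunResult:
    """Name taken from the table above; the `metrics` field is assumed."""
    metrics: Dict[str, float]


@runtime_checkable
class Algorithm(Protocol):
    """Structural contract: anything with a matching `train` method qualifies."""

    def train(self, ctx: RunContext) -> RunResult: ...


class ConstantBaseline:
    """Toy algorithm that satisfies the contract without inheriting from it."""

    def train(self, ctx: RunContext) -> RunResult:
        return RunResult(metrics={"loss": ctx.params.get("target_loss", 1.0)})


algo = ConstantBaseline()
result = algo.train(RunContext(seed=0, params={"target_loss": 0.5}))
```

A structural contract like this means a backend only has to provide the right method shape; it does not need to subclass anything from the SDK.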
Everything below this page drills into one layer. If you just want to run something, jump to the Quickstart.