workspace
Workspace - local cache for remote datasets/benchmarks.
Remote-first: datasets live in the backend (D20, accessed via EvsysStore
over the gateway). Streaming every row over HTTP during training is slow, so the
agent materializes a dataset to a local JSONL once and trains from the local
file. On pull_dataset the local copy is reused if present and complete;
otherwise it's fetched from remote, written, and cached.
Safe to cache: datasets are versioned and immutable per version, so a given
dataset_id never changes - the only risk is a partial pull, guarded by a
.meta.json manifest (atomic rename + complete flag + n_rows match).
The workspace root ($EVSYS_WORKSPACE or ./.evsys) writes a
self-ignoring .gitignore (*) on init, so nothing in it is ever tracked.
Rows are written raw (D17); MaterializedDataset carries the dataset's
format + transform so the trainer can render typed rows on read.
attribute__all__= ['Workspace', 'MaterializedDataset', 'read_jsonl_rows']funcread_jsonl_rows(path) -> list[dict[str, Any]]Read a materialized .evsys/ JSONL (one payload per line) to dicts.
parampathstrReturns
list[dict[str, typing.Any]]