DataConfig

Where the training data comes from and how to transform it.

The preferred way to reference a dataset is by dataset_id (or dataset_name). The SDK pulls the dataset from the dashboard into the local .evsys/ workspace and trains from that cache, so stored experiment scripts are portable and do not depend on local file layout. dataset_id and dataset_name take precedence over source_kind and path, which remain as an offline or development fallback.
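The precedence rule above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `DataConfigSketch` is a hypothetical dataclass that only mirrors the documented field names and defaults, and `effective_source` is a hypothetical helper. Checking dataset_id before dataset_name is an assumption; the text does not say which wins when both are set.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class DataConfigSketch:
    """Hypothetical stand-in mirroring the documented fields; not the SDK class."""

    source_kind: Literal["jsonl", "json", "in_memory", "hf_dataset"] = "jsonl"
    dataset_id: Optional[str] = None
    dataset_name: Optional[str] = None
    path: Optional[str] = None
    rows: Optional[list[dict[str, Any]]] = None
    hf_dataset: Optional[str] = None
    hf_split: str = "train"
    transforms: list[Any] = field(default_factory=list)


def effective_source(cfg: DataConfigSketch) -> str:
    """Report which source a config would resolve to under the stated rule."""
    # Dashboard references take precedence over the local fallback fields.
    # Assumption: dataset_id is checked before dataset_name.
    if cfg.dataset_id is not None:
        return f"dashboard:id={cfg.dataset_id}"
    if cfg.dataset_name is not None:
        return f"dashboard:name={cfg.dataset_name}"
    # Offline / development fallback: use the built-in loader named by source_kind.
    return f"local:{cfg.source_kind}"
```

With this sketch, a config that sets both a dashboard id and a local path resolves to the dashboard dataset, and the path is only consulted when neither dataset_id nor dataset_name is given.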

Attributes

attribute source_kind: Literal['jsonl', 'json', 'in_memory', 'hf_dataset'] = 'jsonl'

Which built-in source loader to use. Ignored when dataset_id or dataset_name is set.

attribute dataset_id: str | None = None

Dashboard dataset id. The dataset is pulled into .evsys/ and training reads from that local copy.

attribute dataset_name: str | None = None

Dashboard dataset name. It is resolved to the id of the latest version, which is then pulled.

attribute path: str | None = None

For 'jsonl' / 'json': absolute or store-relative path.
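The two file-based source kinds differ in layout: JSONL holds one JSON object per line, while a JSON file holds a single document. The sketch below shows that difference using only the standard library. `parse_rows` is a hypothetical helper, not the SDK's loader, and it assumes that a 'json' source is a top-level array of row objects, which the text above does not specify.

```python
import json
from typing import Any


def parse_rows(text: str, source_kind: str) -> list[dict[str, Any]]:
    """Parse file contents into rows for the 'jsonl' and 'json' source kinds."""
    if source_kind == "jsonl":
        # One JSON object per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if source_kind == "json":
        # Assumption: the file is a single top-level array of row objects.
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON array of row objects")
        return data
    raise ValueError(f"not a file-based source kind: {source_kind!r}")
```

Both layouts yield the same list of dicts, so later steps do not need to know which file format the rows came from.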

attribute rows: list[dict[str, Any]] | None = None

For 'in_memory': the rows themselves, one dict per row.

attribute hf_dataset: str | None = None

For 'hf_dataset': HuggingFace dataset id.

attribute hf_split: str = 'train'

attribute transforms: list[TransformSpec] = Field(default_factory=list)

Applied in order to the raw rows.
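Because transforms run in list order, each one sees the output of the one before it, and reordering them can change the result. The sketch below illustrates this with plain callables. The shape of TransformSpec is not described here, so `Transform`, `apply_transforms`, `drop_empty`, and `strip_text` are all hypothetical stand-ins rather than SDK names.

```python
from typing import Any, Callable

Row = dict[str, Any]
# Hypothetical stand-in for TransformSpec: a function from rows to rows.
Transform = Callable[[list[Row]], list[Row]]


def apply_transforms(rows: list[Row], transforms: list[Transform]) -> list[Row]:
    """Apply each transform in list order, feeding its output to the next."""
    for transform in transforms:
        rows = transform(rows)
    return rows


def drop_empty(rows: list[Row]) -> list[Row]:
    # Keep only rows whose "text" field is a non-empty string.
    return [r for r in rows if r["text"] != ""]


def strip_text(rows: list[Row]) -> list[Row]:
    # Trim surrounding whitespace from the "text" field of every row.
    return [{**r, "text": r["text"].strip()} for r in rows]


raw = [{"text": "  hi "}, {"text": "   "}]
stripped_first = apply_transforms(raw, [strip_text, drop_empty])
dropped_first = apply_transforms(raw, [drop_empty, strip_text])
```

Here `stripped_first` contains only the "hi" row, while `dropped_first` still contains a row with empty text, because the whitespace-only row was not yet empty when `drop_empty` ran.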
