sft_data
SFT data shaping - turn {messages: [...]} rows into tinker.Datums.
Pure function over the tokenizer + max_seq_len + enable_thinking; no
tinker_cookbook imports, no global state. The SFT algorithm calls
sft_tokenize once at construction time and reads tokenized Datums out of
the result in each build_batch call.
Output shape: list[tinker.Datum] where each Datum has:
- model_input: token ids of the entire chat-templated sequence
- loss_fn_inputs["weights"]: per-position float mask, 1.0 on assistant tokens, 0.0 elsewhere
This is the shape tinker's server-side cross_entropy loss consumes;
see training_client.forward_backward_async(loss_fn="cross_entropy").
attribute logger = logging.getLogger(__name__)

attribute Supervise = Literal['all_assistant', 'last_assistant']
Which assistant turns the loss is computed on. This is an algorithm
decision (passed in by the composer), NOT a property of the dataset;
evsys_sdk.data_types.ChatMessagesRow carries only the conversation.
attribute __all__ = ['row_to_datum', 'sft_tokenize']

func sft_tokenize(rows, tokenizer, *, max_seq_len, enable_thinking=None, supervise='all_assistant') -> list[tinker.Datum]
Tokenize every ChatMessagesRow → list of tinker.Datum.
supervise selects which assistant turns receive loss weight 1.0
(all_assistant = every assistant turn, last_assistant = only the
final one). Rows that render zero supervised tokens (e.g. system+user only,
or no assistant turn) are dropped rather than raising an error; a warning
logs the count.
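The drop-and-count behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the module's code: `tokenize_rows_sketch` and its `to_datum` callback are hypothetical names standing in for `sft_tokenize` and a per-row converter that returns None for rows without supervised tokens. The log message wording is also an assumption.

```python
import logging
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger(__name__)


def tokenize_rows_sketch(
    rows: Sequence[Any],
    to_datum: Callable[[Any], Optional[Any]],
) -> list[Any]:
    """Hypothetical stand-in for the loop inside sft_tokenize."""
    datums: list[Any] = []
    dropped = 0
    for row in rows:
        datum = to_datum(row)  # None means "no supervised tokens in this row"
        if datum is None:
            dropped += 1
            continue
        datums.append(datum)
    if dropped:
        # One summary warning instead of one log line per dropped row.
        logger.warning(
            "dropped %d of %d rows with no supervised tokens", dropped, len(rows)
        )
    return datums
```

Because rows can be dropped, the returned list may be shorter than `rows`, so callers should not assume the two stay index-aligned.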
param rows: Sequence[ChatMessagesRow]
param tokenizer: Any
param max_seq_len: int
param enable_thinking: bool | None = None
param supervise: Supervise = 'all_assistant'
Returns
list[tinker.tinker.Datum]

func row_to_datum(row, tokenizer, *, max_seq_len, enable_thinking=None, supervise='all_assistant') -> tinker.Datum | None
Single-row tokenize + assistant-span loss mask → tinker.Datum.
Strategy: render the full sequence once for the input ids; then walk
the messages and, for each supervised assistant turn, render the messages
up to (but not including) it with add_generation_prompt=True to find where
the span starts, and render through it without the generation prompt to
find where it ends. The tokens between those two offsets are the assistant
span; mark them 1.0 in the weight vector. supervise selects which
assistant turns count (all, or just the last).
Returns None when the row has no assistant turn or the supervised
span ends up empty after truncation to max_seq_len.
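The prefix-diff strategy can be illustrated with a toy renderer. Everything here is an assumption made for the sketch: `toy_render` stands in for the tokenizer's chat-template rendering (one token per character, a role id before each message, an end-of-turn id after it), and `assistant_weights` is a hypothetical helper that returns plain lists rather than a tinker.Datum.

```python
from typing import Literal, Optional

Supervise = Literal["all_assistant", "last_assistant"]

ROLE_IDS = {"system": 1, "user": 2, "assistant": 3}  # toy role-header tokens
EOT = 0  # toy end-of-turn token


def toy_render(messages: list[dict], add_generation_prompt: bool = False) -> list[int]:
    """Toy chat template: <role> <one id per character> <EOT> for each message."""
    ids: list[int] = []
    for m in messages:
        ids.append(ROLE_IDS[m["role"]])
        ids.extend(100 + ord(c) for c in m["content"])
        ids.append(EOT)
    if add_generation_prompt:
        ids.append(ROLE_IDS["assistant"])  # header that opens an assistant turn
    return ids


def assistant_weights(
    messages: list[dict],
    max_seq_len: int,
    supervise: Supervise = "all_assistant",
) -> Optional[tuple[list[int], list[float]]]:
    ids = toy_render(messages)  # full sequence, rendered once
    weights = [0.0] * len(ids)
    turns = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if supervise == "last_assistant":
        turns = turns[-1:]
    for i in turns:
        # Prefix up to (not including) turn i, plus the generation prompt.
        start = len(toy_render(messages[:i], add_generation_prompt=True))
        # Everything through turn i, without a generation prompt.
        end = len(toy_render(messages[: i + 1]))
        for pos in range(start, end):
            weights[pos] = 1.0
    ids, weights = ids[:max_seq_len], weights[:max_seq_len]
    if not any(weights):
        return None  # no assistant turn, or the span was truncated away
    return ids, weights
```

In this toy template the end-of-turn token falls inside the supervised span and the role header does not. With a real chat template, where those boundaries land depends on the template itself.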
param row: ChatMessagesRow
param tokenizer: Any
param max_seq_len: int
param enable_thinking: bool | None = None
param supervise: Supervise = 'all_assistant'
Returns
tinker.tinker.Datum | None

func _build_datum(ids, weights) -> tinker.Datum
Construct the Datum the cross_entropy loss expects.
Tinker's CE loss is left-shifted (the output at position t-1 predicts the
token at position t), so for token positions 0..N, model_input covers
positions [0, N-1], target_tokens covers [1, N] (bounds inclusive), and
weights is aligned with target_tokens, i.e. it also covers [1, N].
param ids: list[int]
param weights: list[float]
Returns
tinker.tinker.Datum
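The shift described for _build_datum can be sketched with plain lists. This is an illustration only: `shift_for_cross_entropy` is a hypothetical helper, and the returned dict stands in for a tinker.Datum, whose actual constructor is not shown here.

```python
def shift_for_cross_entropy(ids: list[int], weights: list[float]) -> dict[str, list]:
    """Align inputs, targets and weights for a next-token cross-entropy loss."""
    if len(ids) != len(weights):
        raise ValueError("ids and weights must have the same length")
    if len(ids) < 2:
        raise ValueError("need at least two tokens to form one (input, target) pair")
    return {
        "model_input": ids[:-1],    # positions 0 .. N-1
        "target_tokens": ids[1:],   # positions 1 .. N
        "weights": weights[1:],     # same positions as target_tokens
    }
```

For example, ids `[10, 11, 12, 13]` with weights `[0.0, 0.0, 1.0, 1.0]` gives inputs `[10, 11, 12]`, targets `[11, 12, 13]`, and weights `[0.0, 1.0, 1.0]`, so only the predictions of tokens 12 and 13 contribute to the loss.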