sft_data

SFT data shaping - turn \{messages: [...]\} rows into tinker.Datums.

Pure function over the tokenizer + max_seq_len + enable_thinking; no tinker_cookbook imports, no global state. The SFT algorithm calls sft_tokenize once at construction time and reads tokenized Datums out of the result in each build_batch call.

Output shape: list[tinker.Datum] where each Datum has

model_input - token ids of the entire chat-templated sequence
loss_fn_inputs["weights"] - per-position float mask, 1.0 on assistant tokens, 0.0 elsewhere.

This is the shape tinker's server-side cross_entropy loss consumes; see training_client.forward_backward_async(loss_fn="cross_entropy").

attributelogger

= logging.getLogger(__name__)

attributeSupervise

= Literal['all_assistant', 'last_assistant']

Which assistant turns the loss is computed on. This is an algorithm decision (passed in by the composer), NOT a property of the dataset - :class:~evsys_sdk.data_types.ChatMessagesRow carries only the conversation.

attribute__all__

= ['row_to_datum', 'sft_tokenize']

funcsft_tokenize(rows, tokenizer, *, max_seq_len, enable_thinking=None, supervise='all_assistant') -> list[tinker.Datum]

Tokenize every :class:ChatMessagesRow → list of tinker.Datum.

supervise selects which assistant turns are loss-masked (all_assistant = every assistant turn, last_assistant = only the final one). Rows that render zero supervised tokens (e.g. system+user only, or no assistant turn) are silently dropped; a warning logs the count.

paramrowsSequence[ChatMessagesRow]

paramtokenizerAny

parammax_seq_lenint

paramenable_thinkingbool | None

= None

paramsuperviseSupervise

= 'all_assistant'

Returns

list[tinker.tinker.Datum]

funcrow_to_datum(row, tokenizer, *, max_seq_len, enable_thinking=None, supervise='all_assistant') -> tinker.Datum | None

Single-row tokenize + assistant-span loss mask → tinker.Datum.

Strategy: render the full sequence once for the input ids; then walk the messages and for each supervised assistant turn, render up to (and not including) it with add_generation_prompt=True to find the prefix span, render through it without to find the end. The diff is the assistant token span - mark it 1.0 in the weight vector. supervise selects which assistant turns count (all, or just the last).

Returns None when the row has no assistant turn or the supervised span ends up empty after truncation to max_seq_len.

paramrowChatMessagesRow

paramtokenizerAny

parammax_seq_lenint

paramenable_thinkingbool | None

= None

paramsuperviseSupervise

= 'all_assistant'

Returns

tinker.tinker.Datum | None

func_build_datum(ids, weights) -> tinker.Datum

Construct the Datum the cross_entropy loss expects.

Tinker's CE loss is left-shifted (predicts position t from position t-1), so model_input covers positions [0, N-1], target_tokens covers [1, N], and weights aligns with target_tokens.

paramidslist[int]

paramweightslist[float]

Returns

tinker.tinker.Datum