sdft_data
SDFT data shaping - port the cookbook's distillation math, minus
orchestration. Pure functions over tinker types - no I/O, no
tinker_cookbook imports.
The Self-Distillation Fine-Tuning algorithm (Shenfeld et al., 2026):
- Student rollout - sample from the live student weights on the user question (no demo). Each sample is a completion + per-position logprobs.
- Teacher prompt - build a frozen-teacher prompt that contains the golden answer as an in-context demonstration.
- Teacher topK - append the student completion to the teacher prompt,
ask the teacher for its top-K token distribution at each completion
position via tinker's
topk_prompt_logprobssampling API. - CE distillation - train the student to match the teacher's renormalized top-K distribution at each position via cross_entropy.
This module owns Step 4's data shaping (turning teacher responses into
tinker.Datum objects with (N, K)-shaped target_tokens + weights)
plus the teacher-prompt helper for Step 2. The
:class:~evsys_sdk.algorithms.sdft.SDFT algorithm orchestrates 1-4.
attributelogger= logging.getLogger(__name__)attributeDEFAULT_DEMO_TEMPLATE= '{question}\n\nThis is an example for a response to the question:\n{golden_answer}\n\nNow answer with a response of your own, including the thinking process.'The cookbook's default demonstration template. Researchers usually
override this in the algorithm config (e.g. to bracket the golden answer
in \<answer>...\</answer> tags).
attribute__all__= ['CompletionSlice', 'DEFAULT_DEMO_TEMPLATE', 'SDFTDataset', 'SimpleSDFTDataset', 'build_teacher_forced_sequence', 'build_teacher_prompt', 'build_topk_targets', 'extract_completion_tokens', 'student_datum_from_rollout']funcbuild_teacher_prompt(*, question, golden_answer, tokenizer, system_prompt=None, demo_template=DEFAULT_DEMO_TEMPLATE, enable_thinking=None) -> tinker.ModelInputRender the teacher prompt (system + user-with-demo) → ModelInput.
The teacher gets to see the golden answer as a soft hint in the user
turn (via demo_template). Student completions then get appended to
this prompt for teacher-forced top-K scoring.
We use :func:~evsys_sdk.training.templates.messages_to_model_input
rather than the cookbook's Renderer.build_generation_prompt -
same end shape (HF chat-template applied with add_generation_prompt=True),
no Renderer class hierarchy needed.
paramquestionstrparamgolden_answerstrparamtokenizerAnyparamsystem_promptstr | None= Noneparamdemo_templatestr= DEFAULT_DEMO_TEMPLATEparamenable_thinkingbool | None= NoneReturns
tinker.tinker.ModelInputfuncstudent_datum_from_rollout(*, prompt, completion_tokens) -> tinker.DatumWrap a student rollout (prompt + sampled completion) as a tinker.Datum.
The Datum is what :func:build_topk_targets consumes: it carries the
full sequence as model_input and a per-position mask indicating
which positions are completion tokens (the ones the teacher scores).
Position alignment (matches the cookbook convention):
model_inputcoversprompt + completion[:-1](left-shifted)target_tokenscoverscompletion(the loss targets)maskis1on completion positions and0on prompt positions
paramprompttinker.ModelInputparamcompletion_tokensSequence[int]Returns
tinker.tinker.Datumfuncextract_completion_tokens(datum, teacher_prompt_len, *, max_context_length) -> CompletionSlicePull the completion tokens off a student Datum, truncating if
teacher_prompt + completion would exceed max_context_length.
The cookbook does the same step inline; here it's a named function so
the :class:~evsys_sdk.algorithms.sdft.SDFT algorithm and tests
share the implementation.
paramdatumtinker.Datumparamteacher_prompt_lenintparammax_context_lengthintReturns
evsys_sdk.training.sdft_data.CompletionSlicefuncbuild_teacher_forced_sequence(teacher_prompt, completion_tokens) -> tinker.ModelInputAppend completion tokens to a teacher prompt to form the sequence the teacher will score.
paramteacher_prompttinker.ModelInputparamcompletion_tokensSequence[int]Returns
tinker.tinker.ModelInputfuncbuild_topk_targets(*, student_data, completion_slices, teacher_topk_logprobs, topk=20, vocab_size=None, skip_first_n=3) -> tuple[list[tinker.Datum], dict[str, float]]Build cross_entropy Datums with (N, K) soft targets from a
batch of teacher top-K responses. Pure function - no I/O.
paramstudent_datalist[tinker.Datum]Student Datums from :func:student_datum_from_rollout. Their
mask tells us which positions are completion tokens; the
loss targets the K-best teacher tokens at each of those positions.
paramcompletion_sliceslist[CompletionSlice]Output of :func:extract_completion_tokens (one per datum). Carries
the teacher_prompt_len needed to index into teacher_topk_logprobs.
paramteacher_topk_logprobslist[list[list[tuple[int, float]] | None] | None]One list per datum. Each is a per-position list (length =
teacher_prompt + completion length); each position is either
None or a list of (token_id, logprob) tuples (the teacher's
top-K). On real tinker this comes from
sample_async(topk_prompt_logprobs=K).topk_prompt_logprobs.
paramtopkint= 20How many of the teacher's top tokens to keep (truncates if the teacher returned fewer).
paramvocab_sizeint | None= NoneIf set, drop teacher tokens >= vocab_size (handles special
tokens vLLM may emit outside the student's vocab).
paramskip_first_nint= 3Skip the first N completion positions from the loss. Matches the reference SDFT paper (default 3).
Returns
``(new_datums, metrics)`` where each new Datum has cross_entropyfunc_make_topk_datum(source, targets_NK, weights_NK) -> tinker.Datumparamsourcetinker.Datumparamtargets_NKtorch.Tensorparamweights_NKtorch.TensorReturns
tinker.tinker.Datum