chat_templated

ChatTemplatedInference - wrap an InferenceClient so eval-time inputs match the chat-templated distribution the model was trained on.

When a model is SFT'd on chat-templated sequences (HF tokenizer.apply_chat_template over a system + user + assistant message list), the assistant-side output format only emerges reliably when the prompt at inference time is shaped the same way. Benchmark.score() hands the client a raw task.instruction string with no role markers; this wrapper rebuilds the (system + user) opener around it before forwarding to the underlying client.

Project-specific values: the system_prompt (verbatim training-time string) and user_template (e.g. "Query: \{prompt\}" if user-turn content has a domain-specific prefix). With user_template left at the default "\{prompt\}" the raw instruction passes through unchanged.

Tokenizer-agnostic: any base client exposing a HF-style _tokenizer attribute works (e.g. TinkerInference).

attribute__all__

= ['ChatTemplatedInference']