evaluators
In-loop evaluator adapters - wrap an :class:evsys_sdk.Benchmark
so the training loop can score it every N steps against the live
sampler.
The post-training scoring path (Experiment._eval_arm →
Benchmark.score(client)) takes a synchronous InferenceClient.
The training-loop eval slot hands evaluators a live async
SamplingClient. :class:_AsyncToSyncSampler bridges the two with
asyncio.run_coroutine_threadsafe + asyncio.to_thread so the
Benchmark iteration doesn't block the event loop and the existing
ChatTemplatedInference wrapper (which inspects _tokenizer) keeps
working unchanged.
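The bridging pattern described above can be illustrated with a minimal, self-contained sketch. It is not the actual _AsyncToSyncSampler implementation: FakeAsyncSampler, AsyncToSyncBridge, score_sync, and the sample method are hypothetical stand-ins, and only the two asyncio calls named in the text are taken from the source.

```python
import asyncio


class FakeAsyncSampler:
    """Hypothetical stand-in for the live async SamplingClient."""

    async def sample(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield to the loop, as a real request would
        return prompt.upper()


class AsyncToSyncBridge:
    """Hypothetical synchronous facade over a sampler owned by a running loop."""

    def __init__(self, sampler, loop, tokenizer=None):
        self._sampler = sampler
        self._loop = loop
        # Kept as an attribute so wrappers that inspect _tokenizer still work.
        self._tokenizer = tokenizer

    def sample(self, prompt: str) -> str:
        # Called from a worker thread: schedule the coroutine on the loop's
        # thread and block only the worker thread until it finishes.
        future = asyncio.run_coroutine_threadsafe(
            self._sampler.sample(prompt), self._loop
        )
        return future.result()


def score_sync(client, prompts):
    """Hypothetical synchronous benchmark iteration."""
    return [client.sample(p) for p in prompts]


async def evaluate(sampler, prompts):
    loop = asyncio.get_running_loop()
    bridge = AsyncToSyncBridge(sampler, loop)
    # Run the blocking iteration in a worker thread so the loop stays free
    # to execute the coroutines the bridge schedules on it.
    return await asyncio.to_thread(score_sync, bridge, prompts)


results = asyncio.run(evaluate(FakeAsyncSampler(), ["a", "b"]))
```

The two calls depend on each other: if score_sync ran directly on the event loop thread, future.result() would block the very loop that has to run the scheduled coroutine, and the call would deadlock.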
The other half of the wiring is :func:build_in_loop_evaluators - a
metadata-aware factory that the three native algorithm composers call to
turn metadata.benchmark list entries with a run_every field into
:class:BenchmarkEvaluator instances ready for the
:class:~evsys_sdk.training.loop.TrainingLoop evaluators list.
attribute logger = logging.getLogger(__name__)
attribute __all__ = ['BenchmarkEvaluator', 'build_in_loop_evaluators']
func build_in_loop_evaluators(metadata, *, tokenizer, store=None, model_name=None, workspace_dir=None, run_id=None) -> list[BenchmarkEvaluator]
Read metadata.benchmark and return one
:class:BenchmarkEvaluator per entry whose run_every > 0.
Both the single-dict and the list form of benchmark are accepted, matching
the parser in :meth:evsys_sdk.experiment.Experiment._resolve_benchmarks.
Entries without run_every are post-training-only and silently
skipped here - they're handled by Experiment._eval_arm.
tokenizer is required (used by the async→sync sampler bridge);
store is required when an entry resolves a benchmark by id or
name rather than path.
param metadata: dict[str, Any] | None
param tokenizer: Any
param store: Any = None
param model_name: str | None = None
param workspace_dir: Any = None
param run_id: str | None = None
Returns
list[evsys_sdk.training.evaluators.BenchmarkEvaluator]
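The selection rule described for build_in_loop_evaluators (accept a single dict or a list, keep only entries whose run_every > 0) might look roughly like the sketch below. It covers only the filtering step and does not construct any BenchmarkEvaluator; select_in_loop_entries is a hypothetical helper, and the path keys in the example metadata are illustrative assumptions rather than the library's documented schema.

```python
from __future__ import annotations

from typing import Any


def select_in_loop_entries(metadata: dict[str, Any] | None) -> list[dict[str, Any]]:
    """Hypothetical helper: return the benchmark entries that run in-loop."""
    if not metadata:
        return []
    raw = metadata.get("benchmark")
    if raw is None:
        return []
    # Normalise the single-dict form to a one-element list.
    entries = [raw] if isinstance(raw, dict) else list(raw)
    selected = []
    for entry in entries:
        run_every = entry.get("run_every")
        # Entries with a missing or non-positive run_every are left for the
        # post-training path and skipped without an error.
        if isinstance(run_every, int) and run_every > 0:
            selected.append(entry)
    return selected


# Illustrative metadata; the "path" keys are assumed for the example.
example_metadata = {
    "benchmark": [
        {"path": "a.yaml", "run_every": 50},
        {"path": "b.yaml"},
        {"path": "c.yaml", "run_every": 0},
    ]
}
selected = select_in_loop_entries(example_metadata)
single = select_in_loop_entries({"benchmark": {"path": "d.yaml", "run_every": 10}})
```

In this sketch only the first list entry survives, and the single-dict form yields a one-element list.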