EvSys

evaluators

In-loop evaluator adapters - wrap an :class:evsys_sdk.Benchmark so the training loop can score it every N steps against the live sampler.

The post-training scoring path (Experiment._eval_arm → Benchmark.score(client)) takes a synchronous InferenceClient, whereas the training-loop eval slot hands evaluators a live async SamplingClient. :class:_AsyncToSyncSampler bridges the two with asyncio.run_coroutine_threadsafe + asyncio.to_thread, so the Benchmark iteration doesn't block the event loop and the existing ChatTemplatedInference wrapper (which inspects _tokenizer) keeps working unchanged.
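The bridging pattern described above can be sketched with the standard library alone. This is a minimal illustration and not the actual :class:_AsyncToSyncSampler: FakeAsyncSampler, AsyncToSyncBridge, score_sync, evaluate and run_demo are hypothetical names, and the real SamplingClient and InferenceClient interfaces are assumed to differ.

```python
import asyncio


class FakeAsyncSampler:
    """Hypothetical stand-in for a live async sampling client."""

    async def sample(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield to the loop, as a network call would
        return prompt.upper()


class AsyncToSyncBridge:
    """Hypothetical bridge exposing a blocking complete() over an async sampler."""

    def __init__(self, sampler, loop: asyncio.AbstractEventLoop):
        self._sampler = sampler
        self._loop = loop

    def complete(self, prompt: str) -> str:
        # Called from a worker thread: schedule the coroutine on the loop
        # thread, then block this worker until the result is ready.
        future = asyncio.run_coroutine_threadsafe(
            self._sampler.sample(prompt), self._loop
        )
        return future.result()


def score_sync(client, prompts):
    """Hypothetical synchronous benchmark body: one blocking call per prompt."""
    return [client.complete(p) for p in prompts]


async def evaluate(prompts):
    loop = asyncio.get_running_loop()
    bridge = AsyncToSyncBridge(FakeAsyncSampler(), loop)
    # Run the blocking iteration in a worker thread so the loop stays free
    # to execute the coroutines that the bridge submits.
    return await asyncio.to_thread(score_sync, bridge, prompts)


def run_demo(prompts):
    return asyncio.run(evaluate(prompts))
```

The two calls play complementary roles: asyncio.to_thread moves the synchronous iteration off the loop thread, and asyncio.run_coroutine_threadsafe lets that worker thread submit sampling coroutines back to the loop. Calling future.result() directly on the loop thread would deadlock, which is why both halves are needed.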

The other half of the wiring is :func:build_in_loop_evaluators - a metadata-aware factory that the three native algorithm composers call to turn metadata.benchmark list entries with a run_every field into :class:BenchmarkEvaluator instances ready for the :class:~evsys_sdk.training.loop.TrainingLoop evaluators list.

attribute logger = logging.getLogger(__name__)

attribute __all__ = ['BenchmarkEvaluator', 'build_in_loop_evaluators']

func build_in_loop_evaluators(metadata, *, tokenizer, store=None, model_name=None, workspace_dir=None, run_id=None) -> list[BenchmarkEvaluator]

Read metadata.benchmark and return one :class:BenchmarkEvaluator per entry whose run_every > 0.

Both the single-dict and list forms of benchmark are accepted (matching the parser in :meth:evsys_sdk.experiment.Experiment._resolve_benchmarks). Entries without run_every are post-training-only and are silently skipped here - they're handled by Experiment._eval_arm.

tokenizer is required (used by the async→sync sampler bridge); store is required when an entry resolves a benchmark by id or name rather than path.
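The selection rule described above (accept a single dict or a list, keep only entries with a positive run_every) can be sketched as follows. This is an illustration under stated assumptions, not the factory's real code: select_in_loop_entries is a hypothetical helper, and the entry keys other than run_every (path, id, name) are assumed from the prose rather than taken from the SDK.

```python
from __future__ import annotations

from typing import Any


def select_in_loop_entries(metadata: dict[str, Any] | None) -> list[dict[str, Any]]:
    """Return the benchmark entries that should run inside the training loop."""
    if not metadata:
        return []
    raw = metadata.get("benchmark")
    if raw is None:
        return []
    # Normalize the single-dict form to a one-element list.
    entries = [raw] if isinstance(raw, dict) else list(raw)
    selected = []
    for entry in entries:
        run_every = entry.get("run_every")
        # Entries with no run_every (or a non-positive one) are left to the
        # post-training path and skipped without raising.
        if isinstance(run_every, int) and run_every > 0:
            selected.append(entry)
    return selected
```

A real factory would then resolve each selected entry to a Benchmark (using store for id or name lookups) and wrap it in a :class:BenchmarkEvaluator; that step is omitted here because it depends on SDK internals not shown in this page.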

param metadata: dict[str, Any] | None
param tokenizer: Any
param store: Any = None
param model_name: str | None = None
param workspace_dir: Any = None
param run_id: str | None = None

Returns

list[evsys_sdk.training.evaluators.BenchmarkEvaluator]