EvSys

step_metrics

Forward per-step metrics from a local JSONL log to an EvsysStore.

Two row shapes are accepted:

  • Nested (the SDK's own log_store output):

    {"ts": 1700000000.0, "step": 50, "metrics": {"loss": 0.42, "lr": 1e-4}}

  • Flat (tinker_cookbook's native metrics.jsonl):

    {"step": 50, "epoch": 0, "train_mean_nll": 0.42, "learning_rate": 1e-4, "time/get_batch": 3e-06, "time/step": 18.1, ...}

The forwarder normalizes both into a dict[str, float] via _extract_metrics() before pushing.
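A minimal sketch of that normalization, for illustration only. extract_metrics and META_KEYS are hypothetical names, and the set of meta fields skipped in flat rows (ts, step, epoch) is an assumption; the real _extract_metrics may treat other keys as metadata.

```python
from __future__ import annotations

from typing import Any

# Assumed set of non-metric fields in flat rows (hypothetical, for illustration).
META_KEYS = {"ts", "step", "epoch"}


def extract_metrics(row: dict[str, Any]) -> dict[str, float] | None:
    """Hypothetical stand-in for _extract_metrics: normalize nested and flat rows."""
    nested = row.get("metrics")
    if isinstance(nested, dict):
        source = nested  # nested shape: the metrics live under "metrics"
    else:
        # flat shape: every top-level key that isn't an assumed meta field
        source = {k: v for k, v in row.items() if k not in META_KEYS}
    out: dict[str, float] = {}
    for key, value in source.items():
        # bool is a subclass of int; skip it so True/False don't become 1.0/0.0
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            out[key] = float(value)
    return out or None  # None when nothing usable was found
```

With the two example rows above, the nested row yields {"loss": 0.42, "lr": 1e-4} and the flat row yields its train_mean_nll, learning_rate, and time/* values, while a row carrying only {"step": 50} yields None.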

Until now, researcher scripts have hand-rolled a forwarder loop after each training run (see composio-bench/training/backfill_step_metrics.py). This module replaces that loop with one call:

    forward_step_metrics(store, run_id, run_dir)

It locates metrics.jsonl under run_dir (preferring run_dir/logs/), walks each row, and calls store.log_metrics(run_id=..., step=..., metrics={...}). Per-row store errors are caught and logged rather than raised, so one flaky upload doesn't drop the rest of the metrics.
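The loop described above can be sketched as follows. This is a simplified illustration, not the module's implementation: forward_rows is a hypothetical name, it takes the metrics file path directly instead of locating it under run_dir, it reads only the nested row shape, it assumes store.log_metrics accepts the keyword arguments run_id, step, and metrics (as the call above suggests), and it counts only the rows the store accepted.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def forward_rows(store: Any, run_id: str | None, metrics_path: Path) -> int:
    """Hypothetical sketch of the forward loop: one store.log_metrics call per usable row."""
    if store is None or run_id is None or not metrics_path.is_file():
        return 0  # silent no-op, mirroring the documented guards
    pushed = 0
    with metrics_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed JSON row: skip it
            if not isinstance(row, dict):
                continue  # valid JSON but not an object: skip it
            step = row.get("step")
            metrics = row.get("metrics")  # nested shape only, for brevity
            if step is None or not isinstance(metrics, dict):
                continue  # row missing step or metrics: skip it
            try:
                store.log_metrics(run_id=run_id, step=step, metrics=metrics)
                pushed += 1
            except Exception:
                # Log and keep going, so one failed upload doesn't stop the loop.
                logger.exception("log_metrics failed for step %s", step)
    return pushed
```

Wrapping each store call in its own try/except, rather than the whole loop, is what keeps a single failed upload from aborting the remaining rows.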

Experiment._train_arm calls it automatically after each arm, so researcher scripts using the OOP path don't have to think about it.

attribute logger = logging.getLogger(__name__)
attribute DEFAULT_METRICS_FILE = 'metrics.jsonl'
attribute __all__ = ['DEFAULT_METRICS_FILE', 'forward_step_metrics']
func _extract_metrics(row) -> dict[str, float] | None

Return the metrics dict from a row, normalizing nested + flat shapes.

Nested rows carry the dict under "metrics". Flat rows (tinker_cookbook) put each metric at the top level alongside step; for those, every numeric value that is not a meta field (such as step) is collected and coerced to float. Returns None when no usable metrics can be extracted.

param row: dict[str, Any]

Returns

dict[str, float] | None
func forward_step_metrics(store, run_id, run_dir, *, metrics_file=DEFAULT_METRICS_FILE) -> int

Push every row of a metrics.jsonl file to the store. Returns the row count.

Returns 0 (silent no-op) if store is None, run_id is None, run_dir is missing, or the metrics file isn't found. Malformed JSON rows and rows missing step/metrics are skipped; per-row store exceptions are caught so one bad upload doesn't drop the rest.

param store: Any | None
param run_id: str | None
param run_dir: str | Path | None
param metrics_file: str = DEFAULT_METRICS_FILE

Returns

int
func _locate_metrics_file(run_dir, metrics_file) -> Path | None

Find metrics.jsonl under run_dir, preferring <run_dir>/logs/.

param run_dir: Path
param metrics_file: str

Returns

pathlib.Path | None
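A sketch of the lookup order, assuming the fallback after <run_dir>/logs/ is the run directory itself. The documentation only states that logs/ is preferred, so the real helper may look elsewhere (for example, search recursively). locate_metrics_file is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def locate_metrics_file(run_dir: Path, metrics_file: str = "metrics.jsonl") -> Path | None:
    """Hypothetical sketch: try <run_dir>/logs/<metrics_file>, then <run_dir>/<metrics_file>."""
    # Order matters: the logs/ copy wins when both files exist.
    for candidate in (run_dir / "logs" / metrics_file, run_dir / metrics_file):
        if candidate.is_file():
            return candidate
    return None  # caller treats this as "nothing to forward"
```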