BasicLoopAgent

Drives Chat(TinkerLLM(model_path)) and records the rollout: the token-level rollout_details and the completion text are stored on the AgentContext, and the completion is also written to the agent dir for the host-side verifier to score.
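The recording step described above can be pictured with a minimal sketch. Everything named here is hypothetical: `ContextStub` stands in for AgentContext (whose real fields are not shown on this page), and the `completion.txt` file name and the shape of the per-token record are assumptions made only for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ContextStub:
    """Hypothetical stand-in for AgentContext; not the real class."""
    rollout_details: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def record_rollout(context, agent_dir, completion, token_ids, logprobs):
    """Store the rollout on the context and write the completion to disk."""
    # Token-level details are kept on the context (assumed record shape).
    context.rollout_details.append({"token_ids": token_ids, "logprobs": logprobs})
    # The completion text is kept on the context as well.
    context.metadata["completion"] = completion
    # The completion is also written to the agent dir so a verifier running
    # on the host can read and score it. The file name is an assumption.
    agent_dir.mkdir(parents=True, exist_ok=True)
    out = agent_dir / "completion.txt"
    out.write_text(completion, encoding="utf-8")
    return out


def demo():
    """Run one recording step in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        ctx = ContextStub()
        path = record_rollout(ctx, Path(tmp) / "agent", "42", [1, 2], [-0.1, -0.2])
        return path.read_text(encoding="utf-8"), ctx
```

The point of the sketch is the split: training-oriented data stays on the context, while the verifier only needs the file in the agent dir.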

Functions

func __init__(self, *, model_name, model_path=None, renderer_name=None, max_tokens=512, temperature=1.0, max_turns=1, system_prompt=None, model_client='tinker', api_base=None, **kw) -> None

param self

param model_name: str

param model_path: str | None = None

param renderer_name: str | None = None

param max_tokens: int = 512

param temperature: float = 1.0

param max_turns: int = 1

param system_prompt: str | None = None

param model_client: str = 'tinker'

param api_base: str | None = None

param kw: Any = {}

Returns

None

func name() -> str

Returns

str

func version(self) -> str | None

param self

Returns

str | None

func setup(self, environment) -> None

param self

param environment: BaseEnvironment

Returns

None

func _build_llm(self) -> Any

Builds the harbor sampler for this rollout, selected by model_client: "tinker" gives the on-policy TinkerLLM (requires model_path); "litellm" gives harbor's litellm LLM for any provider, where model_name is a litellm model string such as "anthropic/claude-opus-4-1" and API keys are read from the provider's environment variables.

collect_rollout_details is ON for tinker (token-level rollouts are needed for training) but OFF for litellm: enabling it makes harbor request logprobs and extra_body.return_token_ids, which closed APIs (Anthropic, OpenAI) reject with a 400. API-model benchmarking needs only the completion, reward, and usage (cost/tokens), all of which are still captured from the response; token ids are not required. See _trial_to_trajectory.

param self

Returns

typing.Any
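The selection rule above can be sketched without harbor installed. `SamplerSpec` and `choose_sampler` are hypothetical names, not harbor or tinker APIs; the sketch only encodes the documented rule (tinker needs model_path and collects rollout details, litellm does not). Raising on an unknown client is an assumption, since the page does not say how that case is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplerSpec:
    """Hypothetical description of the sampler to build; not a harbor type."""
    backend: str
    model: str
    collect_rollout_details: bool


def choose_sampler(model_client: str, model_name: str,
                   model_path: Optional[str] = None) -> SamplerSpec:
    if model_client == "tinker":
        # On-policy sampling: model_path is required, and token-level
        # rollout details are collected for training.
        if model_path is None:
            raise ValueError("model_client='tinker' requires model_path")
        return SamplerSpec("tinker", model_path, collect_rollout_details=True)
    if model_client == "litellm":
        # Closed APIs reject logprob / token-id requests, so rollout
        # details stay off; model_name is the litellm model string.
        return SamplerSpec("litellm", model_name, collect_rollout_details=False)
    # Assumption: unknown clients are treated as an error.
    raise ValueError(f"unknown model_client: {model_client!r}")
```

For example, `choose_sampler("litellm", "anthropic/claude-opus-4-1")` yields a spec with rollout-detail collection disabled, while the tinker branch fails fast when model_path is missing.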

func _cache_key(self) -> tuple

param self

Returns

tuple

func _shared_llm(self) -> Any

Returns the cached LLM for this rollout's (loop, config) pair. It is built once per harbor job and reused by every trial (see the module cache note). The first build is warmed under a per-loop lock, so concurrent trials share one sampling client instead of each creating its own.

param self

Returns

typing.Any
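A minimal sketch of this build-once pattern follows, assuming "loop" means the asyncio event loop (the page does not say so explicitly). The names `_LLM_CACHE`, `_LOOP_LOCKS`, `build_llm`, and `shared_llm` are hypothetical, and `build_llm` is a stand-in for the real client construction and warm-up.

```python
import asyncio

# Hypothetical module-level state; the real module's names are not shown here.
_LLM_CACHE = {}
_LOOP_LOCKS = {}
BUILD_STATS = {"builds": 0}


async def build_llm(config):
    """Stand-in for an expensive client build plus warm-up."""
    BUILD_STATS["builds"] += 1
    await asyncio.sleep(0.01)  # simulate warm-up latency
    return {"config": config}


async def shared_llm(config):
    loop = asyncio.get_running_loop()
    # id(loop) is used as the key for simplicity in this sketch.
    key = (id(loop), config)
    # Fast path: already built for this (loop, config).
    if key in _LLM_CACHE:
        return _LLM_CACHE[key]
    # One lock per event loop, created on first use.
    lock = _LOOP_LOCKS.get(id(loop))
    if lock is None:
        lock = _LOOP_LOCKS[id(loop)] = asyncio.Lock()
    async with lock:
        # Re-check under the lock: another trial may have finished the build.
        if key not in _LLM_CACHE:
            _LLM_CACHE[key] = await build_llm(config)
    return _LLM_CACHE[key]


async def run_trials(n, config):
    """Launch n concurrent trials that all ask for the same client."""
    return await asyncio.gather(*(shared_llm(config) for _ in range(n)))
```

The re-check inside the lock is what keeps concurrent first callers from each building a client: only the first one to acquire the lock builds, and the rest find the cached instance.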

func run(self, instruction, environment, context) -> None

param self

param instruction: str

param environment: BaseEnvironment

param context: AgentContext

Returns

None
