BasicLoopAgent
Drives Chat(TinkerLLM(model_path)) and records the rollout: the token-level
rollout_details and the completion text are stored on the AgentContext, and
the completion is written to the agent dir for the host-side verifier to score.
Functions
func __init__(self, *, model_name, model_path=None, renderer_name=None, max_tokens=512, temperature=1.0, max_turns=1, system_prompt=None, model_client='tinker', api_base=None, **kw) -> None
param self
param model_name: str
param model_path: str | None = None
param renderer_name: str | None = None
param max_tokens: int = 512
param temperature: float = 1.0
param max_turns: int = 1
param system_prompt: str | None = None
param model_client: str = 'tinker'
param api_base: str | None = None
param kw: Any = {}
Returns
None

func name() -> str
Returns
str

func version(self) -> str | None
param self
Returns
str | None

func setup(self, environment) -> None
param self
param environment: BaseEnvironment
Returns
None

func _build_llm(self) -> Any
The harbor sampler for this rollout. model_client picks it:
"tinker" → on-policy TinkerLLM (needs model_path);
"litellm" → harbor's litellm LLM for any provider (model_name a
litellm string, e.g. "anthropic/claude-opus-4-1"; keys from the
provider env vars).
collect_rollout_details is ON for tinker (we want token-level
rollouts for training) but OFF for litellm: enabling it makes harbor
request logprobs and extra_body.return_token_ids, which closed APIs
(Anthropic, OpenAI) reject with a 400. API-model benchmarking only needs
the completion, reward, and usage (cost/tokens), not token ids; the former
are still captured from the response. See _trial_to_trajectory.
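The selection rule described above can be sketched as a small dispatch function. This is an illustration only, not the actual body of _build_llm: SamplerSpec and build_sampler_spec are hypothetical names, no harbor or Tinker classes are constructed, and raising on an unknown model_client is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplerSpec:
    """Hypothetical stand-in for the sampler that _build_llm would return."""
    client: str
    model: str
    collect_rollout_details: bool


def build_sampler_spec(model_client: str, model_name: str,
                       model_path: Optional[str] = None) -> SamplerSpec:
    if model_client == "tinker":
        # On-policy sampling: model_path is required, and token-level
        # rollout details are collected for training.
        if model_path is None:
            raise ValueError("model_client='tinker' needs model_path")
        return SamplerSpec("tinker", model_path, collect_rollout_details=True)
    if model_client == "litellm":
        # Any provider via a litellm model string; rollout details stay off
        # so no logprobs / return_token_ids are requested from closed APIs.
        return SamplerSpec("litellm", model_name, collect_rollout_details=False)
    # Assumption: unknown clients are rejected rather than silently defaulted.
    raise ValueError(f"unknown model_client: {model_client!r}")
```

Keeping the flag tied to the client choice in one place means a caller cannot accidentally request token ids from an API that would reject the request.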
param self
Returns
typing.Any

func _cache_key(self) -> tuple
param self
Returns
tuple

func _shared_llm(self) -> Any
The cached LLM for this rollout's (loop, config) - built once per harbor job and reused by every trial (see the module cache note). The first build is warmed under a per-loop lock so concurrent trials share ONE sampling client instead of each creating their own.
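A minimal sketch of the build-once-and-share pattern this describes, using plain threads. The names shared_llm, _CACHE, _LOCKS and _LOCKS_GUARD are hypothetical, and the sketch simplifies by keeping one lock per cache key; the real module cache and its per-loop locking may be organized differently.

```python
import threading
from typing import Any, Callable, Dict, Hashable

# Hypothetical module-level cache: one built client per cache key.
_CACHE: Dict[Hashable, Any] = {}
_LOCKS: Dict[Hashable, threading.Lock] = {}
_LOCKS_GUARD = threading.Lock()  # protects creation of per-key locks


def shared_llm(key: Hashable, build: Callable[[], Any]) -> Any:
    # Fetch (or create) the lock for this key without racing other callers.
    with _LOCKS_GUARD:
        lock = _LOCKS.setdefault(key, threading.Lock())
    # Only the first caller for a key runs build(); later callers wait on
    # the lock and then reuse the cached client.
    with lock:
        if key not in _CACHE:
            _CACHE[key] = build()
        return _CACHE[key]
```

Here key plays the role of the tuple returned by _cache_key, and build stands in for the expensive client construction that should happen once per job.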
param self
Returns
typing.Any

func run(self, instruction, environment, context) -> None
param self
param instruction: str
param environment: BaseEnvironment
param context: AgentContext
Returns
None