REPL Environment for OpenEnv#
`repl_env` is an OpenEnv-native Python REPL environment for Recursive Language Model (RLM)-style execution. It now follows the current OpenEnv client/server conventions:
- `REPLEnv` is the remote async `EnvClient`
- `.sync()` is the sync wrapper for remote usage
- `LocalREPLEnv` is the explicit in-process helper
- `LocalRLMRunner` is the higher-level orchestration loop for local recursive RLM runs
The architecture is intentionally split the same way the official `rlm` and DSPy implementations split things:
- the environment executes code and exposes tools
- the runner owns the iterative prompting loop
- recursive behavior lives in backend/controller modules, not in the executor
Overview#
Inside the REPL, the model can:
- inspect `context`
- execute Python code across multiple turns with persistent state
- call `llm_query(...)` and `llm_query_batched(...)`
- call `rlm_query(...)` and `rlm_query_batched(...)` for recursive child runs when configured
- finish with `FINAL(...)`, `FINAL_VAR(...)`, or `answer = {"content": ..., "ready": True}`
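The multi-turn, persistent-state model can be pictured as `exec` calls against one shared namespace (a simplified sketch, not the environment's actual executor):

```python
# Minimal sketch of multi-turn execution with persistent state,
# in the spirit of the REPL loop (not the environment's real executor).
namespace = {"context": "alpha beta gamma"}

def execute(code: str) -> None:
    # Each turn runs against the same namespace, so variables persist.
    exec(code, namespace)

execute("count = len(context.split())")  # turn 1: define a variable
execute("doubled = count * 2")           # turn 2: reuse it
print(namespace["count"], namespace["doubled"])  # → 3 6
```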
Current Architecture#
Main modules:
- `client.py`: remote async OpenEnv client
- `local.py`: explicit in-process local env helper
- `runner.py`: local RLM orchestration loop
- `recursive_backends.py`: direct and recursive backend implementations
- `recursive_controller.py`: server-side backend/broker composition
- `rubrics.py`: reward rubrics (OpenEnv RFC 004)
- `server/repl_environment.py`: server-side execution environment
- `server/app.py`: OpenEnv HTTP server app and env factory
What Works Today#
- Standard remote OpenEnv usage through `REPLEnv`
- Local in-process execution through `LocalREPLEnv`
- Local recursive RLM runs through `LocalRLMRunner`
- Server-backed recursive calls through the current controller/broker path
- Explicit recursion controls:
  - `max_depth`
  - `max_children_total`
  - `max_children_per_batch`
  - `per_child_timeout_s`
  - `result_truncation_limit`
- Lightweight child trace metadata on local runner results
- Rubric-based rewards (OpenEnv RFC 004):
  - `ExactMatchRubric`: binary outcome reward against ground truth
  - `FuzzyMatchRubric`: partial credit for containment matches
  - `CustomMetricRubric`: user-provided `metric(expected, predicted) -> float`
  - `CodeExecutionRubric`: per-step process reward for code errors
  - `REPLRubric`: composite rubric combining outcome + process
  - Ground truth injectable at reset via `expected_answer`
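For intuition, the recursion controls behave like a small budget tracker consulted before each child spawn (a toy sketch; `RecursionBudget` is illustrative, not part of the package):

```python
# Toy budget tracker mirroring the recursion controls listed above.
# The knob names match the docs; this class is illustrative, not repl_env API.
class RecursionBudget:
    def __init__(self, max_depth=2, max_children_total=8,
                 result_truncation_limit=64):
        self.max_depth = max_depth
        self.max_children_total = max_children_total
        self.result_truncation_limit = result_truncation_limit
        self.children_spawned = 0

    def may_spawn(self, depth: int) -> bool:
        # A child run is allowed only while both budgets have headroom.
        return (depth < self.max_depth
                and self.children_spawned < self.max_children_total)

    def record_child(self, result: str) -> str:
        # Count the child and truncate its result before surfacing it.
        self.children_spawned += 1
        return result[: self.result_truncation_limit]

budget = RecursionBudget(max_children_total=1)
print(budget.may_spawn(depth=0))   # True: budgets still available
budget.record_child("some long child result " * 10)
print(budget.may_spawn(depth=0))   # False: child budget spent
```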
Rewards#
Rewards follow the OpenEnv Rubric system (RFC 004). The environment uses `REPLRubric` by default, which combines:

- Outcome reward (on terminal steps): compares `final_answer` against `expected_answer` if provided. Returns 1.0 for a match, 0.0 otherwise.
- Process reward (on non-terminal steps): returns -0.05 for code execution errors, 0.0 for successful steps.
- Failure reward: returns -0.1 when max iterations are exhausted without an answer.
For RL training (GRPO, etc.), pass `expected_answer` at reset time:
```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    env.reset(
        context="...",
        task_prompt="...",
        expected_answer="42",  # ground truth for rubric scoring
    )
    result = env.execute("print(FINAL(42))")
    print(result.reward)  # 1.0 (correct)
```
Custom rubrics can be injected at construction:
```python
from repl_env import LocalREPLEnv, CustomMetricRubric, REPLRubric

def my_metric(expected, predicted):
    return 1.0 if expected.strip() == predicted.strip() else 0.0

env = LocalREPLEnv(rubric=REPLRubric(outcome=CustomMetricRubric(my_metric)))
```
Quick Start#
Remote Server Usage#
Async:
```python
import asyncio

from repl_env import REPLEnv

async def main():
    async with REPLEnv(base_url="http://127.0.0.1:8000") as env:
        result = await env.reset(
            context="alpha beta gamma",
            task_prompt="Count the words",
        )
        result = await env.execute("count = len(context.split())")
        result = await env.execute("print(FINAL(count))")
        print(result.done)

asyncio.run(main())
```
Sync:
```python
from repl_env import REPLEnv

with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    result = env.reset(
        context="alpha beta gamma",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(result.observation.result.stdout)
```
Local Environment Usage#
```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    result = env.reset(
        context="The quick brown fox jumps over the lazy dog",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(env.state().final_answer)
```
Local Recursive RLM Usage#
`LocalRLMRunner` takes any `chat_fn(messages, model=None) -> str`. It works with the HF Inference API, vLLM, SGLang, Ollama, or any OpenAI-compatible server.

With the HF Inference API:
```python
from huggingface_hub import InferenceClient

from repl_env import LocalRLMRunner, RLM_SYSTEM_PROMPT

client = InferenceClient(model="Qwen/Qwen3.5-9B", timeout=300)

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run("The answer is 42", "What number is mentioned?")
print(result.final_answer)
```
With a local vLLM server:
```python
from openai import OpenAI

from repl_env import LocalRLMRunner

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run(context, task)
```
Using Different Models for Outer and Inner Loops#
The outer loop (code generation) can use a large model while inner `llm_query`/`rlm_query` calls use a smaller, faster model. Pass a custom `backend_factory` to the runner:
```python
from huggingface_hub import InferenceClient
from openai import OpenAI

from repl_env import LocalRLMRunner
from repl_env.recursive_backends import BackendLimits, LocalChildRLMBackend

# Outer loop: large local model via vLLM
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def outer_chat(messages, model=None):
    r = vllm.chat.completions.create(
        model="Qwen/Qwen3-32B", messages=messages, max_tokens=2048,
    )
    return r.choices[0].message.content

# Inner calls (llm_query/rlm_query): smaller HF-hosted model
hf = InferenceClient(model="Qwen/Qwen3.5-9B")

def inner_chat(messages, model=None):
    r = hf.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B", messages=messages, max_tokens=2048,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return r.choices[0].message.content

def my_backend_factory(llm_chat_fn, **kwargs):
    return LocalChildRLMBackend(
        inner_chat,  # inner calls use the smaller model
        runner_factory=LocalRLMRunner,
        system_prompt=kwargs["system_prompt"],
        max_iterations=kwargs["max_iterations"],
        env_max_iterations_multiplier=kwargs["env_max_iterations_multiplier"],
        depth=kwargs["depth"],
        limits=BackendLimits(max_depth=2),
    )

runner = LocalRLMRunner(
    outer_chat,  # outer loop: large model
    backend_factory=my_backend_factory,  # inner calls: small model
    max_iterations=30,
    max_depth=2,
)
result = runner.run(context, task)
```
Server#
Run the local server:
```bash
PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```
The server uses a proper OpenEnv environment factory in `server/app.py`.
API Surface#
Remote Client#
```python
class REPLEnv(EnvClient[REPLAction, REPLObservation, REPLState]):
    async def reset(...)
    async def execute(code: str)
    async def submit_final_answer(answer: str)
    async def state()
```
Use `.sync()` for synchronous code.
Local Helpers#
```python
class LocalREPLEnv:
    def reset(...)
    def execute(code: str)
    def state()

class LocalRLMRunner:
    def run(context: str, task_prompt: str, *, model: str | None = None) -> RLMRunResult
```
Actions and Observations#
```python
REPLAction
    code: str = ""
    is_final: bool = False
    final_answer: str | None = None

REPLObservation
    result: CodeBlockResult
    context_preview: str | None
    context_length: int
    available_variables: list[str]
    iteration: int
    max_iterations: int
    done: bool
    reward: float | None
    metadata: dict
```
Injected REPL Helpers#
When configured, the REPL namespace exposes:
- `llm_query(prompt, model=None)`
- `llm_query_batched(prompts, model=None)`
- `rlm_query(prompt, model=None)`
- `rlm_query_batched(prompts, model=None)`
- `FINAL(value)`
- `FINAL_VAR(name)`
- `SHOW_VARS()`
Notes:
- `rlm_query` is the recursive child-run surface.
- At max recursion depth, recursion falls back to direct LM calls rather than spawning more children.
- Lifecycle callbacks follow the official `rlm` pattern: `on_subcall_start(depth, model, prompt_preview)` and `on_subcall_complete(depth, model, duration, error_or_none)`.
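The max-depth fallback can be sketched like so (`chat_fn`, `run_child`, and this `rlm_query` stand-in are illustrative, not the package's internals):

```python
# Sketch of the depth-fallback behavior: at max depth, rlm_query-style
# calls degrade to a single direct LM call instead of a child run.
MAX_DEPTH = 2

def chat_fn(messages, model=None):
    return "direct answer"                     # stand-in for a real LM call

def run_child(prompt: str) -> str:
    return "answer from recursive child run"   # stand-in for a child RLM run

def rlm_query(prompt: str, depth: int) -> str:
    if depth >= MAX_DEPTH:
        # Recursion budget spent: answer with one direct LM call.
        return chat_fn([{"role": "user", "content": prompt}])
    return run_child(prompt)

print(rlm_query("What is mentioned?", depth=2))  # direct answer
print(rlm_query("What is mentioned?", depth=0))  # answer from recursive child run
```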
Finalization Patterns#
FINAL(...)#
```python
result = env.execute("answer = 42")
result = env.execute("print(FINAL(answer))")
```
FINAL_VAR(...)#
```python
result = env.execute("my_answer = '42'")
result = env.execute('print(FINAL_VAR("my_answer"))')
```
answer dict#
```python
result = env.execute("answer['content'] = '42'")
result = env.execute("answer['ready'] = True")
```
Prompt Utilities#
`prompts.py` contains the current message-building and parsing helpers used by the examples and runner.
Important exports:
- `RLM_SYSTEM_PROMPT`
- `RLM_SYSTEM_PROMPT_QWEN`
- `QueryMetadata`
- `build_rlm_system_prompt(...)`
- `build_user_prompt(...)`
- `extract_code_blocks(...)`
- `format_observations(...)`
These prompts were updated to reflect the actual helper surface the environment provides, rather than documenting tools that do not exist.
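For intuition, a fenced-block extractor in the spirit of `extract_code_blocks(...)` could be as simple as the following (a regex sketch, not the packaged implementation):

```python
import re

# Sketch of pulling fenced ```python blocks out of a model reply,
# in the spirit of extract_code_blocks (not the actual implementation).
FENCE_RE = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def extract_code_blocks(text: str) -> list[str]:
    return [block.strip() for block in FENCE_RE.findall(text)]

reply = "Let me count.\n```python\ncount = len(context.split())\n```\nDone."
print(extract_code_blocks(reply))  # ['count = len(context.split())']
```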
Examples#
The default hosted model in the examples is currently `Qwen/Qwen3.5-9B`, but real hosted inference still depends on provider availability and token access.
Environment Variables#
Server-side configuration in `server/app.py`:

- `LLM_MODEL`
- `HF_TOKEN`
- `REPL_MAX_ITERATIONS`
- `REPL_MAX_OUTPUT_LENGTH`
- `REPL_CONTEXT_PREVIEW_LENGTH`
- `REPL_RLM_MAX_DEPTH`
- `REPL_RLM_MAX_ITERATIONS`
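These variables are typically read once at startup with fallback defaults, along these lines (a sketch; the default values shown are illustrative, not the server's actual defaults):

```python
import os

# Sketch of startup configuration from the variables above.
# The fallback values here are illustrative, not server/app.py's defaults.
config = {
    "model": os.environ.get("LLM_MODEL", "Qwen/Qwen3.5-9B"),
    "max_iterations": int(os.environ.get("REPL_MAX_ITERATIONS", "30")),
    "max_output_length": int(os.environ.get("REPL_MAX_OUTPUT_LENGTH", "8192")),
    "rlm_max_depth": int(os.environ.get("REPL_RLM_MAX_DEPTH", "2")),
}
print(config["max_iterations"])
```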