
REPL Environment for OpenEnv#

repl_env is an OpenEnv-native Python REPL environment for Recursive Language Model (RLM)-style execution. It now follows the current OpenEnv client/server conventions:

  • REPLEnv is the remote async EnvClient

  • .sync() is the sync wrapper for remote usage

  • LocalREPLEnv is the explicit in-process helper

  • LocalRLMRunner is the higher-level orchestration loop for local recursive RLM runs

The architecture is intentionally split along the same lines as the official rlm and DSPy implementations:

  • the environment executes code and exposes tools

  • the runner owns the iterative prompting loop

  • recursive behavior lives in backend/controller modules, not in the executor

Overview#

Inside the REPL, the model can:

  • inspect context

  • execute Python code across multiple turns with persistent state

  • call llm_query(...) and llm_query_batched(...)

  • call rlm_query(...) and rlm_query_batched(...) for recursive child runs when configured

  • finish with FINAL(...), FINAL_VAR(...), or answer = {"content": ..., "ready": True}

Current Architecture#

Main modules:

What Works Today#

  • Standard remote OpenEnv usage through REPLEnv

  • Local in-process execution through LocalREPLEnv

  • Local recursive RLM runs through LocalRLMRunner

  • Server-backed recursive calls through the current controller/broker path

  • Explicit recursion controls:

    • max_depth

    • max_children_total

    • max_children_per_batch

    • per_child_timeout_s

    • result_truncation_limit

  • Lightweight child trace metadata on local runner results

  • Rubric-based rewards (OpenEnv RFC 004):

    • ExactMatchRubric: binary outcome reward against ground truth

    • FuzzyMatchRubric: partial credit for containment matches

    • CustomMetricRubric: user-provided metric(expected, predicted) -> float

    • CodeExecutionRubric: per-step process reward for code errors

    • REPLRubric: composite rubric combining outcome + process

    • Ground truth injectable at reset via expected_answer

Rewards#

Rewards follow the OpenEnv Rubric system (RFC 004). The environment uses REPLRubric by default, which combines:

  • Outcome reward (on terminal steps): compares final_answer against expected_answer if provided. Returns 1.0 for match, 0.0 otherwise.

  • Process reward (on non-terminal steps): returns -0.05 for code execution errors, 0.0 for successful steps.

  • Failure reward: returns -0.1 when max iterations exhausted without an answer.

For RL training (GRPO, etc.), pass expected_answer at reset time:

with LocalREPLEnv() as env:
    env.reset(
        context="...",
        task_prompt="...",
        expected_answer="42",  # ground truth for rubric scoring
    )
    result = env.execute("print(FINAL(42))")
    print(result.reward)  # 1.0 (correct)

Custom rubrics can be injected at construction:

from repl_env import LocalREPLEnv, CustomMetricRubric, REPLRubric

def my_metric(expected, predicted):
    return 1.0 if expected.strip() == predicted.strip() else 0.0

env = LocalREPLEnv(rubric=REPLRubric(outcome=CustomMetricRubric(my_metric)))

Quick Start#

Remote Server Usage#

Async:

import asyncio
from repl_env import REPLEnv


async def main():
    async with REPLEnv(base_url="http://127.0.0.1:8000") as env:
        result = await env.reset(
            context="alpha beta gamma",
            task_prompt="Count the words",
        )
        result = await env.execute("count = len(context.split())")
        result = await env.execute("print(FINAL(count))")
        print(result.done)


asyncio.run(main())

Sync:

from repl_env import REPLEnv

with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    result = env.reset(
        context="alpha beta gamma",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(result.observation.result.stdout)

Local Environment Usage#

from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    result = env.reset(
        context="The quick brown fox jumps over the lazy dog",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(env.state().final_answer)

Local Recursive RLM Usage#

LocalRLMRunner takes any chat_fn(messages, model=None) -> str. It works with HF Inference API, vLLM, SGLang, Ollama, or any OpenAI-compatible server.

With HF Inference API:

from huggingface_hub import InferenceClient
from repl_env import LocalRLMRunner, RLM_SYSTEM_PROMPT

client = InferenceClient(model="Qwen/Qwen3.5-9B", timeout=300)

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run("The answer is 42", "What number is mentioned?")
print(result.final_answer)

With a local vLLM server:

from openai import OpenAI
from repl_env import LocalRLMRunner

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run(context, task)

Using Different Models for Outer and Inner Loops#

The outer loop (code generation) can use a large model while inner llm_query/rlm_query calls use a smaller, faster model. Pass a custom backend_factory to the runner:

from openai import OpenAI
from huggingface_hub import InferenceClient
from repl_env import LocalRLMRunner
from repl_env.recursive_backends import BackendLimits, LocalChildRLMBackend

# Outer loop: large local model via vLLM
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def outer_chat(messages, model=None):
    r = vllm.chat.completions.create(
        model="Qwen/Qwen3-32B", messages=messages, max_tokens=2048,
    )
    return r.choices[0].message.content

# Inner calls (llm_query/rlm_query): smaller HF-hosted model
hf = InferenceClient(model="Qwen/Qwen3.5-9B")

def inner_chat(messages, model=None):
    r = hf.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B", messages=messages, max_tokens=2048,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return r.choices[0].message.content

def my_backend_factory(llm_chat_fn, **kwargs):
    return LocalChildRLMBackend(
        inner_chat,  # inner calls use the smaller model
        runner_factory=LocalRLMRunner,
        system_prompt=kwargs["system_prompt"],
        max_iterations=kwargs["max_iterations"],
        env_max_iterations_multiplier=kwargs["env_max_iterations_multiplier"],
        depth=kwargs["depth"],
        limits=BackendLimits(max_depth=2),
    )

runner = LocalRLMRunner(
    outer_chat,                        # outer loop: large model
    backend_factory=my_backend_factory, # inner calls: small model
    max_iterations=30,
    max_depth=2,
)
result = runner.run(context, task)

Server#

Run the local server:

PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000

The server uses a proper OpenEnv environment factory in server/app.py.

API Surface#

Remote Client#

class REPLEnv(EnvClient[REPLAction, REPLObservation, REPLState]):
    async def reset(...)
    async def execute(code: str)
    async def submit_final_answer(answer: str)
    async def state()

Use .sync() for synchronous code.

Local Helpers#

class LocalREPLEnv:
    def reset(...)
    def execute(code: str)
    def state()

class LocalRLMRunner:
    def run(context: str, task_prompt: str, *, model: str | None = None) -> RLMRunResult

Actions and Observations#

REPLAction

code: str = ""
is_final: bool = False
final_answer: str | None = None

REPLObservation

result: CodeBlockResult
context_preview: str | None
context_length: int
available_variables: list[str]
iteration: int
max_iterations: int
done: bool
reward: float | None
metadata: dict

Injected REPL Helpers#

When configured, the REPL namespace exposes:

  • llm_query(prompt, model=None)

  • llm_query_batched(prompts, model=None)

  • rlm_query(prompt, model=None)

  • rlm_query_batched(prompts, model=None)

  • FINAL(value)

  • FINAL_VAR(name)

  • SHOW_VARS()

Notes:

  • rlm_query is the recursive child-run surface.

  • At max recursion depth, rlm_query falls back to direct LM calls rather than spawning more children.

  • Lifecycle callbacks follow the official rlm pattern:

    • on_subcall_start(depth, model, prompt_preview)

    • on_subcall_complete(depth, model, duration, error_or_none)
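The max-depth fallback can be sketched as follows. This is illustrative control flow only; every name other than rlm_query is an assumption made for the sketch:

```python
def rlm_query(prompt, depth, max_depth, llm_call, spawn_child_run):
    # At or beyond max depth, fall back to a direct LM call
    # instead of spawning another recursive child run.
    if depth >= max_depth:
        return llm_call(prompt)
    return spawn_child_run(prompt, depth + 1)


# Usage with stub callables standing in for the real backend:
answer = rlm_query(
    "summarize chunk 3", depth=2, max_depth=2,
    llm_call=lambda p: f"direct:{p}",
    spawn_child_run=lambda p, d: f"child@{d}:{p}",
)
```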

Finalization Patterns#

FINAL(...)#

result = env.execute("answer = 42")
result = env.execute("print(FINAL(answer))")

FINAL_VAR(...)#

result = env.execute("my_answer = '42'")
result = env.execute('print(FINAL_VAR("my_answer"))')

answer dict#

result = env.execute("answer['content'] = '42'")
result = env.execute("answer['ready'] = True")

Prompt Utilities#

prompts.py contains the current message-building and parsing helpers used by the examples and runner.

Important exports:

  • RLM_SYSTEM_PROMPT

  • RLM_SYSTEM_PROMPT_QWEN

  • QueryMetadata

  • build_rlm_system_prompt(...)

  • build_user_prompt(...)

  • extract_code_blocks(...)

  • format_observations(...)

These prompts were updated to reflect the actual helper surface the environment provides, rather than documenting tools that do not exist.
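As an illustration of the parsing side, a minimal re-implementation of what an extract_code_blocks-style helper does might look like this (the actual helper in prompts.py may handle more cases and languages):

```python
import re

# Matches fenced blocks like ```python\n...\n``` in a model response.
_CODE_BLOCK = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)


def extract_code_blocks(text: str) -> list[str]:
    """Pull fenced Python code blocks out of a model response."""
    return [m.strip() for m in _CODE_BLOCK.findall(text)]
```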

Examples#

The default hosted model in the examples is currently Qwen/Qwen3.5-9B, but hosted inference still depends on provider availability and token access.

Environment Variables#

Server-side configuration in server/app.py:

  • LLM_MODEL

  • HF_TOKEN

  • REPL_MAX_ITERATIONS

  • REPL_MAX_OUTPUT_LENGTH

  • REPL_CONTEXT_PREVIEW_LENGTH

  • REPL_RLM_MAX_DEPTH

  • REPL_RLM_MAX_ITERATIONS
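For example, a server might be configured like this before launch (all values are illustrative, and the token is a placeholder):

```shell
export LLM_MODEL="Qwen/Qwen3.5-9B"
export HF_TOKEN="hf_..."            # placeholder hosted-inference credential
export REPL_MAX_ITERATIONS=30       # illustrative value
export REPL_RLM_MAX_DEPTH=2         # illustrative value

PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```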

References#