MCP Tools in OpenEnv Environments#

Most agentic work ends up needing the same thing: a way for the model to call tools and receive structured feedback, whether that is during RL training or offline evaluation. OpenEnv standardises that surface with MCP (Model Context Protocol), so the same tool interface works during training, eval, inference, and external serving. This tutorial covers the four paths you will walk in practice — wiring an MCP-backed environment into a training loop, using the same env for offline eval, inspecting the API underneath both of those, and building your own MCP environment when no existing one fits.

Why MCP?#

If your tools are just local Python functions, you do not need MCP: pass them to your trainer directly (TRL, torchforge, whatever) and you are done. MCP earns its complexity when the tool surface has to sit behind a process boundary rather than a function call:

The env runs elsewhere — in a Docker container, a Hugging Face Space, a remote server. MCP is the transport that crosses that boundary.
You want to reuse someone else’s env — the OpenEnv catalog, third-party envs, and community hubs all expose their tools over MCP, so the same env works in your training run without rewriting its interface.
You want the env to be callable by other agents — Claude Desktop, Cursor, inference servers, and any MCP-compatible client can plug into an MCP server. A private Python function doesn’t get that for free.
You need tool discovery and schemas — list_tools() + auto-generated JSON schemas are part of the protocol; models see them the same way they see any MCP server’s tools.

In short: MCP is the answer when your env is more than a helper function in your training script — when the same tools have to be usable from training, inference, and external clients without maintaining three interfaces.

The dual API boundary#

Inside OpenEnv, MCP plays a specific role in a two-surface split:

Training / orchestration infrastructure uses the Gym-style control plane — reset(), step(), state() — over WebSocket (/ws). This is what the trainer needs to roll out episodes, compute rewards, and enforce termination.
Agents use MCP tools over the /mcp JSON-RPC endpoint. Tools are what the model calls to act on the world.
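
To make the agent-facing surface concrete, here is a minimal sketch of the JSON-RPC 2.0 envelopes an MCP client sends for discovery and invocation. The method names `tools/list` and `tools/call` come from the MCP specification; the `jsonrpc_request` helper is hypothetical, and the sketch only builds payloads. It does not send them, and a real client would normally run the protocol's initialization handshake first, which an MCP client library handles for you.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# MCP method names: "tools/list" for discovery, "tools/call" for invocation.
list_request = jsonrpc_request("tools/list")
call_request = jsonrpc_request(
    "tools/call",
    {"name": "echo_message", "arguments": {"message": "Hello from MCP!"}},
)

print(json.dumps(call_request, indent=2))
```

The Gym-style control plane never appears in these payloads: reset(), step(), and state() stay on the infrastructure side of the split.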

Note

In simulation mode, MCP tool calls flow through step(). The trainer stays in control of timing, rewards, and termination; the MCP action types are just a standardised action schema. The MCP environment lifecycle guide covers the split in depth.

Note

MCP adoption in OpenEnv is still in flight. RFC 003 proposes MCP as the standard interface for all agent-facing actions, but it is still In Review. Today only a handful of envs are MCP-backed: echo_env and finqa_env inherit from the canonical openenv.core.env_server.mcp_environment.MCPEnvironment; calendar_env uses a local wrapper with the same shape. The majority (textarena_env / Wordle, openspiel_env, chess_env, browsergym_env, and most others) still use custom action types that you pass through env.step(CustomAction(...)) without MCP plumbing. Before using the patterns in this tutorial against a specific env, check whether it inherits from an MCPEnvironment base; if not, the env’s own action schema applies instead.

Using MCP Tools in a Training Loop#

An MCP-backed env is consumed like any other OpenEnv env from the trainer’s side. At the atomic level, each agent action is:

obs = env.step(CallToolAction(tool_name=..., arguments=...))
# obs.result       — runtime tool result object, or None on error
# obs.reward       — env's reward for this turn (may be None)
# obs.done         — episode terminated

That is the only MCP-specific piece. Everything around it — how the trainer generates actions, how tool schemas are surfaced to the model, how rewards are collected — belongs to your training framework, not to MCP.

Framework-agnostic rollout loop#

If you drive the rollout yourself (a custom loop, torchforge, an external agent server), you own the full generation path and call env.step() directly:

obs = env.reset()
total_reward = 0.0
for turn in range(max_turns):
    tool_call = model.decide(obs)  # your agent picks a tool + args from the latest observation
    obs = env.step(
        CallToolAction(tool_name=tool_call.name, arguments=tool_call.arguments)
    )
    total_reward += obs.reward or 0.0
    if obs.done:
        break

Whatever policy / generation code you use, env.step(CallToolAction(...)) is the only line that talks to the MCP env.
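
The loop above leaves model.decide() open. One possible shape, sketched here under the assumption that your model emits a single JSON object with `name` and `arguments` keys, is a small parser that turns raw text into the two fields CallToolAction needs. `ToolCall` and `parse_tool_call` are hypothetical names, not part of OpenEnv, and the output format is a choice your prompt template makes.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    """Hypothetical container for one parsed tool call."""
    name: str
    arguments: dict = field(default_factory=dict)


def parse_tool_call(model_output: str) -> Optional[ToolCall]:
    """Parse '{"name": ..., "arguments": {...}}' into a ToolCall, or None if malformed."""
    try:
        payload = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        return None
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        return None
    return ToolCall(name=payload["name"], arguments=arguments)


call = parse_tool_call('{"name": "echo_message", "arguments": {"message": "hi"}}')
print(call)
print(parse_tool_call("not json"))
```

Returning None for malformed output lets the rollout loop decide what a parse failure costs (skip the turn, penalise it, or end the episode) instead of hiding that decision inside the parser.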

TRL `environment_factory`#

TRL’s GRPOTrainer takes an environment_factory class whose public methods auto-register as discoverable tools — the trainer then handles the multi-turn generation loop for you. The Wordle GRPO tutorial shows the full recipe (wrapper class, reward function, GRPOTrainer construction) with a non-MCP env. For an MCP-backed env, only the tool method bodies change; they call through to env.step(CallToolAction(...)):

def echo(self, message: str) -> str:
    """Echo back a message.

    Args:
        message: The message to echo.
    """
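    # This wrapper reads the return value of self.env.step() as a result wrapper:
    # the observation sits on .observation and the reward may be on either object.
    step_result = self.env.step(
        CallToolAction(tool_name="echo_message", arguments={"message": message})
    )
    obs = step_result.observation
    self.reward = step_result.reward or obs.reward or 0.0
    result = obs.result
    # FastMCP result objects expose .data; plain values are returned unchanged.
    return result.data if hasattr(result, "data") else result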

environment_factory is a TRL API, not an MCP API. It works equally well with non-MCP envs (Wordle uses it with TextArenaAction), and MCP envs work equally well without it (the rollout-loop path above). They compose, but they are orthogonal.

The rest of this tutorial is for the other paths: the API underneath env.step(CallToolAction(...)) (useful when you need the full observation or want to debug), using the same env for offline eval, and building your own MCP environment from scratch.

Under the Hood: `CallToolAction` and `ListToolsAction`#

The two MCP action types are ListToolsAction (discover what’s available) and CallToolAction (invoke one). They behave like any other Gym action — pass them to step() and inspect the returned observation.

Discovering tools#

from echo_env.server.echo_environment import EchoEnvironment
from openenv.core.env_server.mcp_types import ListToolsAction, ListToolsObservation

env = EchoEnvironment()
env.reset()

obs = env.step(ListToolsAction())
assert isinstance(obs, ListToolsObservation)

for tool in obs.tools:
    print(f"{tool.name}: {tool.description}")

Each Tool carries a name, a description, and an input_schema (JSON Schema) describing the accepted arguments. The schema is what lets a language-model agent know which parameters to fill in when it emits a tool call.
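
As a rough illustration of how those three fields reach a model, the sketch below wraps a tool in the function-calling layout that many chat-completion APIs accept. `ToolStub` is a local stand-in carrying the same three fields, not the OpenEnv class, and the schema shown is hand-written for illustration rather than captured from FastMCP.

```python
from dataclasses import dataclass


@dataclass
class ToolStub:
    """Stand-in with the three fields described above (not the OpenEnv class)."""
    name: str
    description: str
    input_schema: dict


def to_function_spec(tool) -> dict:
    """Wrap one tool in the common chat-completions function-calling layout."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.input_schema,
        },
    }


# Hand-written example schema; a real one would come from obs.tools.
echo = ToolStub(
    name="echo_message",
    description="Echo back the provided message.",
    input_schema={
        "type": "object",
        "properties": {"message": {"type": "string"}},
        "required": ["message"],
    },
)
spec = to_function_spec(echo)
print(spec["function"]["name"])
```

Because the function only reads `name`, `description`, and `input_schema`, it should work unchanged on any object that exposes those attributes.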

Calling a tool#

from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation

obs = env.step(
    CallToolAction(
        tool_name="echo_message",
        arguments={"message": "Hello from MCP!"},
    )
)

assert isinstance(obs, CallToolObservation)
print(obs.tool_name)       # "echo_message"
print(obs.error)           # None
result = obs.result
print(result.data if hasattr(result, "data") else result)  # "Hello from MCP!"

CallToolObservation.result is typed as Any in OpenEnv. At runtime, FastMCP commonly returns a fastmcp.client.client.CallToolResult object with .data, .structured_content, and .content attributes, but JSON round-trips or custom environments can surface a plain dict or value instead. Treat .data as a convenience when it exists, not as an OpenEnv-defined wrapper type.

obs.error carries every failure mode: transport errors, unknown tool names, malformed arguments, and exceptions raised from inside the tool function itself (as ToolErrorType.EXECUTION_ERROR). On an error, obs.result is None. Always branch on obs.error is None before reading a runtime result.
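
The branching described above can be folded into one small helper. This is a sketch, not an OpenEnv utility: `unwrap_result` is a hypothetical name, and the observations are `SimpleNamespace` stand-ins so the snippet runs without an environment.

```python
from types import SimpleNamespace


def unwrap_result(obs):
    """Return (value, error): value is None on failure, error is None on success."""
    if obs.error is not None:
        return None, obs.error
    result = obs.result
    # Prefer .data when the runtime handed back a result object; otherwise pass through.
    value = result.data if hasattr(result, "data") else result
    return value, None


# Stand-ins for three shapes an observation can take (not real OpenEnv objects).
wrapped = SimpleNamespace(error=None, result=SimpleNamespace(data="hello"))
plain = SimpleNamespace(error=None, result={"message": "hello", "length": 5})
failed = SimpleNamespace(error="boom", result=None)

for obs in (wrapped, plain, failed):
    print(unwrap_result(obs))
```

Checking the error first means the `.data` lookup is never attempted on a failed call.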

Error handling#

obs = env.step(
    CallToolAction(tool_name="does_not_exist", arguments={}),
)

assert isinstance(obs, CallToolObservation)
print(obs.error.error_type)  # ToolErrorType.TOOL_NOT_FOUND
print(obs.error.message)     # human-readable message from FastMCP, e.g. "Unknown tool: 'does_not_exist'"

The ToolError.error_type enum (TOOL_NOT_FOUND, INVALID_ARGS, EXECUTION_ERROR, TRANSPORT_ERROR, TIMEOUT) lets training code distinguish between bugs in the agent, bugs in the environment, and transient infrastructure issues — which often warrant different reward signals.
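
One way to act on that distinction is a small reward-shaping function. The sketch below mirrors the five error-type names in a local `ErrorKind` enum so it runs standalone; in real code you would compare against obs.error.error_type. The penalty values and the grouping into agent faults and transient faults are illustrative assumptions, not OpenEnv defaults.

```python
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    """Local mirror of the error-type names listed above (not the OpenEnv enum)."""
    TOOL_NOT_FOUND = "tool_not_found"
    INVALID_ARGS = "invalid_args"
    EXECUTION_ERROR = "execution_error"
    TRANSPORT_ERROR = "transport_error"
    TIMEOUT = "timeout"


AGENT_FAULTS = {ErrorKind.TOOL_NOT_FOUND, ErrorKind.INVALID_ARGS}
TRANSIENT = {ErrorKind.TRANSPORT_ERROR, ErrorKind.TIMEOUT}


def shaped_reward(
    env_reward: Optional[float], error_kind: Optional[ErrorKind]
) -> Optional[float]:
    """Return a reward, or None to signal 'drop this sample and retry'."""
    if error_kind is None:
        return env_reward or 0.0
    if error_kind in AGENT_FAULTS:
        return -1.0  # hypothetical penalty: the model emitted a bad call
    if error_kind in TRANSIENT:
        return None  # infrastructure noise: do not train on it
    return 0.0  # EXECUTION_ERROR: possibly an env bug, so stay neutral


print(shaped_reward(0.5, None))
print(shaped_reward(1.0, ErrorKind.INVALID_ARGS))
print(shaped_reward(1.0, ErrorKind.TIMEOUT))
```

Returning None for transient failures keeps flaky infrastructure from leaking into the gradient; whether EXECUTION_ERROR should be neutral or penalised depends on how much you trust the tool implementations.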

`step(CallToolAction(...))` vs `call_tool()`#

Environment clients that inherit from MCPToolClient (such as EchoEnv and FinQAEnv) expose a shorter async helper for a running environment server: await env.call_tool("name", arg=value). It returns the tool’s raw return value directly instead of a CallToolObservation, and it raises RuntimeError on any tool error (transport failure, unknown tool, invalid arguments, or a tool exception), so you cannot branch on error_type without a try/except. Use step(CallToolAction(...)) when you need the whole observation (reward, done, metadata, or graceful error classification); reach for call_tool() in async production scripts where the raw result is all you care about and a failure is allowed to propagate. The lifecycle guide covers the exact trade-offs.
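
If you want call_tool()'s brevity but cannot let a failure propagate, a thin wrapper with try/except is enough. The sketch below uses a `FakeToolClient` stand-in that mimics only the contract described above (raw value on success, RuntimeError on failure); it is not the real MCPToolClient, and `call_tool_or_default` is a hypothetical helper.

```python
import asyncio


class FakeToolClient:
    """Stand-in that mimics the described contract: raw value or RuntimeError."""

    async def call_tool(self, name: str, **arguments):
        if name != "echo_message":
            raise RuntimeError(f"Unknown tool: {name!r}")
        return arguments["message"]


async def call_tool_or_default(client, name: str, default=None, **arguments):
    """Await client.call_tool and swap any RuntimeError for a default value."""
    try:
        return await client.call_tool(name, **arguments)
    except RuntimeError:
        return default


async def main():
    client = FakeToolClient()
    ok = await call_tool_or_default(client, "echo_message", message="hi")
    missing = await call_tool_or_default(client, "does_not_exist", default="<failed>")
    return ok, missing


outcome = asyncio.run(main())
print(outcome)
```

The wrapper throws away the error classification, which is exactly the trade-off noted above: when the kind of failure matters, go through step(CallToolAction(...)) instead.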

Note

MCPToolClient and its base MCPClientBase only support mode="production"; construction raises ValueError for other modes. For direct in-process training or eval snippets like the ones above, call env.step(CallToolAction(...)) on the environment class itself.

Using MCP Tools for Evaluation#

The same mechanics work outside a training loop. For an offline eval — benchmarking a model’s tool use on a static dataset, regression-testing a deployed agent, or scoring a policy — drop the trainer and drive the step loop yourself:

from echo_env.server.echo_environment import EchoEnvironment
from openenv.core.env_server.mcp_types import CallToolAction

env = EchoEnvironment()
env.reset()

results = []
for sample in eval_dataset:
    tool_call = model.decide(sample)   # your agent picks a tool + arguments
    obs = env.step(
        CallToolAction(tool_name=tool_call.name, arguments=tool_call.arguments),
    )
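    # Unwrap the reply only on success: result objects expose .data,
    # plain values pass through unchanged.
    if obs.error is not None:
        reply = None
    elif hasattr(obs.result, "data"):
        reply = obs.result.data
    else:
        reply = obs.result
    results.append({
        "prompt": sample.prompt,
        "reply": reply,
        "reward": obs.reward or 0.0,
        "error": obs.error,
    })
    env.reset()  # start a fresh episode for the next sample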

Pair the loop with a scoring function of your choice — the Reward Design guide covers common patterns (test-pass rate, LLM-as-judge quality, compliance gates) — and aggregate across the dataset. The eval harness integration in src/openenv/core/evals/ is still evolving; until that bridge lands, this plain-Python loop is the canonical pattern.
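
A minimal aggregation step might look like the sketch below. It assumes records with the same keys the loop above builds (`prompt`, `reply`, `reward`, `error`); `summarize` is a hypothetical helper and the demo records are made up for illustration, not measured results.

```python
def summarize(results: list) -> dict:
    """Aggregate per-sample records with 'reward' and 'error' keys."""
    total = len(results)
    if total == 0:
        return {"samples": 0, "mean_reward": 0.0, "error_rate": 0.0}
    failures = sum(1 for r in results if r["error"] is not None)
    mean_reward = sum(r["reward"] for r in results) / total
    return {
        "samples": total,
        "mean_reward": mean_reward,
        "error_rate": failures / total,
    }


# Made-up records for illustration only.
demo = [
    {"prompt": "a", "reply": "a", "reward": 1.0, "error": None},
    {"prompt": "b", "reply": None, "reward": 0.0, "error": "tool not found"},
]
summary = summarize(demo)
print(summary)
```

Reporting the error rate next to the mean reward helps separate a weak policy from a broken tool surface when a score drops.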

Building an MCP Environment#

Reach for this path when no existing environment covers the tools your agent needs — e.g. a new coding sandbox, a game, a proprietary API wrapper. The provider side is small: subclass MCPEnvironment, create a FastMCP server, register tools with the @mcp.tool decorator, and pass the server to super().__init__. Here is the echo environment, trimmed from envs/echo_env/server/echo_environment.py down to the parts this tutorial covers:

from uuid import uuid4

from fastmcp import FastMCP

from openenv.core.env_server.mcp_environment import MCPEnvironment
from openenv.core.env_server.types import Action, Observation, State


class EchoEnvironment(MCPEnvironment):
    SUPPORTS_CONCURRENT_SESSIONS = True

    def __init__(self):
        mcp = FastMCP("echo_env")

        @mcp.tool
        def echo_message(message: str) -> str:
            """Echo back the provided message.

            Args:
                message: The message to echo back

            Returns:
                The same message that was provided
            """
            return message

        @mcp.tool
        def echo_with_length(message: str) -> dict:
            """Echo back the message with its length.

            Args:
                message: The message to echo back

            Returns:
                Dictionary with the message and its length
            """
            return {"message": message, "length": len(message)}

        super().__init__(mcp)
        self._state = State(episode_id=str(uuid4()), step_count=0)

    def reset(self, seed=None, episode_id=None, **kwargs) -> Observation:
        self._state = State(episode_id=episode_id or str(uuid4()), step_count=0)
        return Observation(done=False, reward=0.0, metadata={"status": "ready"})

    def _step_impl(self, action: Action, timeout_s=None, **kwargs) -> Observation:
        # Called for non-MCP actions. Echo exposes MCP tools only,
        # so anything that isn't ListToolsAction / CallToolAction is an error.
        return Observation(
            done=False,
            reward=0.0,
            metadata={"error": f"Unknown action type: {type(action).__name__}"},
        )

    @property
    def state(self) -> State:
        return self._state

A few things worth calling out:

Docstring → schema. FastMCP inspects each tool’s signature and Google-style docstring to build the input_schema automatically. The Args: block becomes parameter descriptions, and type hints become JSON types. No hand-written schemas.
Reserved names. reset, step, state, and close are reserved and cannot be tool names — they belong to the infrastructure boundary. Trying to register a tool with one of those names raises at construction time.
_step_impl is required, step is not. MCPEnvironment.step already routes ListToolsAction and CallToolAction through the FastMCP server for you. Your subclass only has to implement _step_impl, which the base class calls for any non-MCP action. In pure-MCP environments like Echo it just returns an error observation; in environments that mix tool calls with other action types (e.g. a terminal “submit” action) it’s where that extra dispatch lives.
Rewards and done still work. Because MCP actions flow through step(), you can compute rewards, flip done, and emit metadata just like in any other OpenEnv environment.
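
The division of labour between step and _step_impl is a template-method pattern. The toy model below illustrates that routing idea only: `ToyToolEnv`, `ListTools`, `CallTool`, and `Submit` are hypothetical stand-ins, observations are plain dicts, and none of it reflects how MCPEnvironment is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ListTools:  # stand-in for ListToolsAction
    pass


@dataclass
class CallTool:  # stand-in for CallToolAction
    tool_name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class Submit:  # hypothetical non-MCP action that ends the episode
    answer: str


class ToyToolEnv:
    """Toy base class: step() routes tool actions, subclasses fill in _step_impl."""

    def __init__(self, tools: dict):
        self._tools = tools

    def step(self, action):
        if isinstance(action, ListTools):
            return {"tools": sorted(self._tools)}
        if isinstance(action, CallTool):
            tool = self._tools.get(action.tool_name)
            if tool is None:
                return {"result": None, "error": "TOOL_NOT_FOUND"}
            return {"result": tool(**action.arguments), "error": None}
        return self._step_impl(action)  # everything else is the subclass's job

    def _step_impl(self, action):
        raise NotImplementedError


class QuizEnv(ToyToolEnv):
    def __init__(self):
        super().__init__({"add": lambda a, b: a + b})

    def _step_impl(self, action):
        if isinstance(action, Submit):
            correct = action.answer == "4"
            return {"done": True, "reward": 1.0 if correct else 0.0}
        return {"done": False, "reward": 0.0, "error": "unknown action"}


env = QuizEnv()
print(env.step(ListTools()))
print(env.step(CallTool("add", {"a": 2, "b": 2})))
print(env.step(Submit(answer="4")))
```

The subclass never touches tool dispatch; it only decides what a non-tool action means, which in this toy is where the terminal reward is assigned.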

Running the Demo End-to-End#

The repo ships a self-contained walkthrough at examples/echo_mcp_demo.py. Run it directly from the repo root:

PYTHONPATH=src:envs uv run python examples/echo_mcp_demo.py

You will see the discovery call, two tool invocations, and an error case printed in sequence: the same operations the “Under the Hood” section covers, run end-to-end against the real EchoEnvironment.

Next Steps#

End-to-end training recipe — the Wordle GRPO tutorial walks through a full GRPO training run with environment_factory. The wrapper-class shape is the same for an MCP-backed env; inside each tool method, build a CallToolAction(tool_name=..., arguments={...}) instead of Wordle’s single-field TextArenaAction(message=guess).
MCP lifecycle details — the MCP Environment Lifecycle guide covers step() vs step_async(), the call_tool() convenience path, and common debugging questions.
A richer MCP environment — envs/finqa_env/ shows tool calls participating in episode progression, rewards, and terminal submission — not just a stateless echo.
Design rationale — RFC 003 explains why OpenEnv picked MCP as the agent boundary and how tool-calling and CodeAct styles share the same plumbing.
Serving tools to an external agent — the /mcp JSON-RPC endpoint is available alongside /ws on any MCP environment server. Point an MCP-compatible client at it for production inference without going through the step loop. This direct path bypasses reward computation, step counts, and episode termination, and it exposes only registered MCP tools — not reset, step, or state.
