Rate this Page

Simulation vs Production Mode#

OpenEnv has two related but different ideas of “mode”:

  • Simulation mode is for training, evaluation, and any workflow where the orchestrator controls episode boundaries.

  • Production mode is for exposing tools directly to clients over MCP without the training loop controlling reset(), step(), or state().

This guide explains when to use each mode and how they interact with MCP tools.

The Short Version#

Use simulation mode when you need:

  • reset(), step(), and state()

  • rewards and done signals

  • one action per step in a controlled trajectory

  • training or evaluation loops

Use production mode when you need:

  • direct MCP tool access

  • no simulation control routes

  • an agent or client talking to tools as a live service

  • the environment to get out of the way and expose the tool interface directly

Why OpenEnv Has Two Modes#

This split follows the core OpenEnv design principles:

  • training and evaluation need a controlled step loop

  • production integrations need direct tool access

  • the same environment should support both without inventing separate environment implementations

In practice, simulation mode models trajectory time and production mode models service time.

Simulation Mode#

Simulation mode is the default environment-control model.

In simulation mode, the orchestrator owns the episode:

  1. Call reset() to start an episode.

  2. Call step() for each action.

  3. Read reward, done, and state() as part of the rollout.

For environments with MCP tools, the canonical simulation-mode pattern is still step().

from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction

obs = env.step(ListToolsAction())

obs = env.step(
    CallToolAction(
        tool_name="echo_message",
        arguments={"message": "Hello from simulation mode"},
    )
)

That pattern matters because the training loop can then:

  • count tool usage as actions

  • assign rewards to tool interactions

  • record a full trajectory

  • preserve the same reset/step contract across environments

Simulation-Mode Routes#

When an HTTPEnvServer registers routes in simulation mode, it exposes the full control surface:

  • /ws

  • /mcp

  • /reset

  • /step

  • /state

This is the right mode for RL training infrastructure and for most environment testing.

Production Mode#

Production mode is for exposing tools directly.

In production mode, clients should interact with MCP tools as a service instead of driving the environment through reset() and step() as a trajectory loop.

Production mode keeps the MCP surface and removes the HTTP simulation control routes.

Production-Mode Routes#

When an HTTPEnvServer registers routes in production mode, OpenEnv does not expose:

  • /reset

  • /step

  • /state

It still registers /ws, because the WebSocket transport remains part of the infrastructure boundary.

That does not mean /ws should be exposed to agents.

  • /ws is for orchestration and simulation control

  • /mcp is the agent-facing boundary

  • production deployments should restrict /ws at the network, auth, or gateway layer if agents can reach the service directly

In other words, production mode removes the HTTP simulation endpoints, but operators must still treat /ws as infrastructure-only.

This is the right mode when:

  • you are serving a tool-backed environment to external clients

  • you do not want callers controlling episode boundaries

  • the MCP interface is the product surface

How MCP Fits Into Both Modes#

MCP is available in both modes, but the role is different.

In simulation mode#

MCP tools are part of the environment action space.

  • tool discovery can still happen

  • tool calls are modeled as actions

  • rewards and episode control remain in the OpenEnv loop

This is why examples such as examples/echo_mcp_demo.py use ListToolsAction and CallToolAction through step().

In production mode#

MCP is the primary interface.

  • clients call tools directly

  • OpenEnv does not present simulation-control endpoints

  • the service behaves like a live MCP endpoint, not an RL rollout loop

Server-Side Configuration#

The server-side switch happens when routes are registered.

from fastapi import FastAPI

from openenv.core.env_server.http_server import HTTPEnvServer
from openenv.core.env_server.types import ServerMode

app = FastAPI()
server = HTTPEnvServer(env=MyEnv, action_cls=MyAction, observation_cls=MyObservation)

# Training / evaluation
server.register_routes(app, mode=ServerMode.SIMULATION)

# Direct MCP serving
server.register_routes(app, mode=ServerMode.PRODUCTION)

ServerMode.SIMULATION is the default route-registration mode.

Client-Side Patterns#

For simulation-style interaction, use a client that participates in the OpenEnv control loop.

Examples:

  • an environment-specific EnvClient[...] subclass

  • GenericEnvClient(base_url=..., mode="simulation")

  • step(ListToolsAction()) and step(CallToolAction(...)) for MCP-backed environments

For direct MCP access, use an MCP-oriented client.

Examples:

  • MCPToolClient(base_url=...)

  • environment-specific clients built on top of MCPToolClient

MCPToolClient defaults to production mode and rejects mode="simulation".

Mode-Aware Tools#

MCPEnvironment supports mode-aware tool registration, so you can expose different tools depending on how the environment is being used.

class MyEnv(MCPEnvironment):
    def __init__(self):
        @self.tool(mode="simulation")
        def score_candidate(answer: str) -> str:
            return "Used inside the training loop"

        @self.tool(mode="production")
        def lookup_docs(query: str) -> str:
            return "Used by live MCP clients"

This lets one environment preserve the training contract while still serving a cleaner production surface.

Choosing the Right Mode#

Choose simulation mode if the caller needs to control trajectories.

Typical cases:

  • RL training

  • policy evaluation

  • benchmarking with rewards

  • environments where tool calls should count as agent actions

Choose production mode if the caller needs direct tool access.

Typical cases:

  • agent runtimes that speak MCP directly

  • demos and hosted services

  • integrations where reset() and step() should not be public

Common Mistake#

The most common confusion is assuming that “MCP environment” automatically means “production mode only”.

That is not the model OpenEnv uses.

  • An MCP-backed environment can still run in simulation mode.

  • In simulation mode, MCP tool interactions are represented through the OpenEnv step loop.

  • Production mode changes the public control surface, not the underlying environment concept.