Simulation vs Production Mode#
OpenEnv has two related but different ideas of “mode”:
Simulation mode is for training, evaluation, and any workflow where the orchestrator controls episode boundaries.
Production mode is for exposing tools directly to clients over MCP without the training loop controlling
reset(),step(), orstate().
This guide explains when to use each mode and how they interact with MCP tools.
The Short Version#
Use simulation mode when you need:
reset(),step(), andstate()rewards and
donesignalsone action per step in a controlled trajectory
training or evaluation loops
Use production mode when you need:
direct MCP tool access
no simulation control routes
an agent or client talking to tools as a live service
the environment to get out of the way and expose the tool interface directly
Why OpenEnv Has Two Modes#
This split follows the core OpenEnv design principles:
training and evaluation need a controlled step loop
production integrations need direct tool access
the same environment should support both without inventing separate environment implementations
In practice, simulation mode models trajectory time and production mode models service time.
Simulation Mode#
Simulation mode is the default environment-control model.
In simulation mode, the orchestrator owns the episode:
Call
reset()to start an episode.Call
step()for each action.Read
reward,done, andstate()as part of the rollout.
For environments with MCP tools, the canonical simulation-mode pattern is still step().
from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction
obs = env.step(ListToolsAction())
obs = env.step(
CallToolAction(
tool_name="echo_message",
arguments={"message": "Hello from simulation mode"},
)
)
That pattern matters because the training loop can then:
count tool usage as actions
assign rewards to tool interactions
record a full trajectory
preserve the same
reset/stepcontract across environments
Simulation-Mode Routes#
When an HTTPEnvServer registers routes in simulation mode, it exposes the full control surface:
/ws/mcp/reset/step/state
This is the right mode for RL training infrastructure and for most environment testing.
Production Mode#
Production mode is for exposing tools directly.
In production mode, clients should interact with MCP tools as a service instead of driving the environment through reset() and step() as a trajectory loop.
Production mode keeps the MCP surface and removes the HTTP simulation control routes.
Production-Mode Routes#
When an HTTPEnvServer registers routes in production mode, OpenEnv does not expose:
/reset/step/state
It still registers /ws, because the WebSocket transport remains part of the infrastructure boundary.
That does not mean /ws should be exposed to agents.
/wsis for orchestration and simulation control/mcpis the agent-facing boundaryproduction deployments should restrict
/wsat the network, auth, or gateway layer if agents can reach the service directly
In other words, production mode removes the HTTP simulation endpoints, but operators must still treat /ws as infrastructure-only.
This is the right mode when:
you are serving a tool-backed environment to external clients
you do not want callers controlling episode boundaries
the MCP interface is the product surface
How MCP Fits Into Both Modes#
MCP is available in both modes, but the role is different.
In simulation mode#
MCP tools are part of the environment action space.
tool discovery can still happen
tool calls are modeled as actions
rewards and episode control remain in the OpenEnv loop
This is why examples such as examples/echo_mcp_demo.py use ListToolsAction and CallToolAction through step().
In production mode#
MCP is the primary interface.
clients call tools directly
OpenEnv does not present simulation-control endpoints
the service behaves like a live MCP endpoint, not an RL rollout loop
Server-Side Configuration#
The server-side switch happens when routes are registered.
from fastapi import FastAPI
from openenv.core.env_server.http_server import HTTPEnvServer
from openenv.core.env_server.types import ServerMode
app = FastAPI()
server = HTTPEnvServer(env=MyEnv, action_cls=MyAction, observation_cls=MyObservation)
# Training / evaluation
server.register_routes(app, mode=ServerMode.SIMULATION)
# Direct MCP serving
server.register_routes(app, mode=ServerMode.PRODUCTION)
ServerMode.SIMULATION is the default route-registration mode.
Client-Side Patterns#
For simulation-style interaction, use a client that participates in the OpenEnv control loop.
Examples:
an environment-specific
EnvClient[...]subclassGenericEnvClient(base_url=..., mode="simulation")step(ListToolsAction())andstep(CallToolAction(...))for MCP-backed environments
For direct MCP access, use an MCP-oriented client.
Examples:
MCPToolClient(base_url=...)environment-specific clients built on top of
MCPToolClient
MCPToolClient defaults to production mode and rejects mode="simulation".
Mode-Aware Tools#
MCPEnvironment supports mode-aware tool registration, so you can expose different tools depending on how the environment is being used.
class MyEnv(MCPEnvironment):
def __init__(self):
@self.tool(mode="simulation")
def score_candidate(answer: str) -> str:
return "Used inside the training loop"
@self.tool(mode="production")
def lookup_docs(query: str) -> str:
return "Used by live MCP clients"
This lets one environment preserve the training contract while still serving a cleaner production surface.
Choosing the Right Mode#
Choose simulation mode if the caller needs to control trajectories.
Typical cases:
RL training
policy evaluation
benchmarking with rewards
environments where tool calls should count as agent actions
Choose production mode if the caller needs direct tool access.
Typical cases:
agent runtimes that speak MCP directly
demos and hosted services
integrations where
reset()andstep()should not be public
Common Mistake#
The most common confusion is assuming that “MCP environment” automatically means “production mode only”.
That is not the model OpenEnv uses.
An MCP-backed environment can still run in simulation mode.
In simulation mode, MCP tool interactions are represented through the OpenEnv step loop.
Production mode changes the public control surface, not the underlying environment concept.