Environments#
The OpenEnv community has built a catalog of ready-to-run environments that cover deterministic smoke tests, full developer workflows, and multi-step reasoning challenges. Browse the catalog below and jump directly into the guide for each environment.
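All of these environments share the same observation/action loop: reset, then step until the episode ends. A minimal sketch of that loop, using a hypothetical echo-style toy environment (the class and field names here are illustrative stand-ins, not the actual OpenEnv client API):

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the environment returns after reset() or step()."""
    message: str
    done: bool
    reward: float


class ToyEchoEnv:
    """Toy echo environment: the observation repeats the last action.

    Illustrative only; a real OpenEnv client would talk to a server
    over HTTP rather than run in-process.
    """

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Observation:
        self.steps = 0
        return Observation(message="ready", done=False, reward=0.0)

    def step(self, action: str) -> Observation:
        self.steps += 1
        done = self.steps >= self.max_steps
        return Observation(message=action, done=done, reward=1.0)


# The canonical loop: reset once, step until done.
env = ToyEchoEnv()
obs = env.reset()
total_reward = 0.0
while not obs.done:
    obs = env.step(f"ping {env.steps}")
    total_reward += obs.reward
print(obs.message, total_reward)  # → ping 2 3.0
```

The same loop drives every environment in the catalog; only the observation and action types change.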
- Minimal observation/action loop for verifying client integrations, CI pipelines, and onboarding flows in seconds.
- Secure sandbox with filesystem access and evaluation hooks for executing generated code and building autonomous dev workflows.
- Message-driven loop tailored for conversational agents that need structured turns, safety rails, and message attribution.
- Classic Arcade Learning Environment tasks packaged for fast benchmarking of reinforcement-learning-style agents.
- Multi-agent, game-theory workloads powered by DeepMind's OpenSpiel suite, ideal for search and self-play experiments.
- Traffic-control scenarios built on the SUMO simulator for agents that reason about continuous control and scheduling.
- Financial market simulations with portfolio APIs, perfect for RLHF strategies and algorithmic trading experiments.
- Multi-task text arena for language-game competitions such as Wordle, reasoning puzzles, and program synthesis.
- Teaches agents to navigate repositories, inspect diffs, and land changes via Git-native operations.
- Safety-critical diagnostics from the DIPG benchmark, highlighting guardrails, adversarial prompts, and risk scoring.
- Classic snake game environment for RL research with configurable grids, partial observability, and customizable rewards.
- Web search environment for RL research in which agents issue queries and work with retrieved results, with customizable rewards.
- Browser automation environment for web agents with DOM interaction, navigation, and multi-step task completion.
- RL environment for GPU kernel optimization. Train LLM agents to write fast CUDA/Triton kernels that beat baseline implementations.
- Calendar tool-use environment exposing a Calendar Gym through the OpenEnv reset/step/state interface for scheduling agents.
- Chess RL environment powered by the moonfish engine with configurable opponents, position evaluation, and full chess rules.
- Classic Connect Four board game environment for training agents on turn-based strategy with a 6×7 grid.
- Generic OpenEnv wrapper for dm_control.suite, providing access to all MuJoCo-based continuous control tasks like cartpole, walker, and humanoid.
- Financial question-answering environment that evaluates LLMs on complex financial questions using tool calls on SEC 10-K filing data.
- Simple 5×5 grid world RL testbed and step-by-step guide for building new OpenEnv environments from scratch.
- Julia code execution environment with test result tracking and reward calculation for RL training on Julia programming tasks.
- Gridworld maze where agents navigate from start to exit while avoiding walls, with configurable 8×8 layouts.
- Web application simulation wrapping the OpenApps framework and BrowserGym for training UI agents on calendar, todo, messenger, and maps apps.
- Integrates the Reasoning Gym library to provide single-step reasoning tasks with configurable datasets and scoring.
- Python REPL environment for code execution tasks based on the Recursive Language Models paradigm with sandboxed execution and context loading.
- OpenEnv wrapper for Terminal-Bench 2 tasks with local and Docker execution modes for terminal-based agent evaluation.
- OpenEnv wrapper for Unity ML-Agents environments, providing access to Unity's RL environments through HTTP/WebSocket interfaces.
- Autonomous wildfire-control simulation where agents contain spreading fires using water, firebreaks, and timing under dynamic conditions.
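Several entries above (the calendar gym, the 5×5 grid world) expose the reset/step/state interface. A minimal sketch of what a new environment in that shape might look like, using a tiny grid world; all class and method names here are hypothetical, not the actual OpenEnv base classes:

```python
class GridWorldEnv:
    """5x5 grid: the agent starts at (0, 0); reaching (4, 4) ends the
    episode with reward 1.0. Illustrative sketch only."""

    SIZE = 5
    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self):
        self.pos = (0, 0)
        self.episode_steps = 0

    def reset(self) -> dict:
        """Start a new episode and return the initial observation."""
        self.pos = (0, 0)
        self.episode_steps = 0
        return {"position": self.pos, "done": False, "reward": 0.0}

    def step(self, action: str) -> dict:
        """Apply one action, clamping movement to the grid bounds."""
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.SIZE - 1)
        y = min(max(self.pos[1] + dy, 0), self.SIZE - 1)
        self.pos = (x, y)
        self.episode_steps += 1
        done = self.pos == (self.SIZE - 1, self.SIZE - 1)
        return {"position": self.pos, "done": done,
                "reward": 1.0 if done else 0.0}

    def state(self) -> dict:
        """Episode metadata, separate from per-step observations."""
        return {"episode_steps": self.episode_steps}


env = GridWorldEnv()
env.reset()
obs = None
for action in ["right"] * 4 + ["down"] * 4:
    obs = env.step(action)
print(obs, env.state())
```

Keeping episode metadata behind `state()` rather than stuffing it into every observation is the design choice the reset/step/state split encodes: observations stay small, while diagnostics remain queryable on demand.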
**Tip:** Want to publish your own environment? Head over to the Build Your Own Environment guide for a step-by-step walkthrough.
Community Environments#
A suite of 400 environments that procedurally generate reasoning problems for LM training with configurable difficulty.