
Introduction & Quick Start#

Part 1 of 5 in the OpenEnv Getting Started Series

This notebook introduces OpenEnv, explains why it exists, and gets you running your first environment.

Note

Time: ~10 minutes | Difficulty: Beginner | GPU Required: No

What You'll Learn#

  • What is OpenEnv: The unified framework for RL environments

  • Why OpenEnv: How it compares to traditional solutions like Gym

  • RL Basics: The observe-act-reward loop in 60 seconds

  • Quick Start: Connect to and interact with your first environment

Setup: Enable nested async event loops#

This is needed when running in environments like Sphinx-Gallery or Jupyter that already have an event loop running.

import nest_asyncio
nest_asyncio.apply()

What is OpenEnv?#

OpenEnv is a unified framework for building, sharing, and interacting with reinforcement learning environments. It's a collaborative effort between Meta, Hugging Face, Unsloth, GPU Mode, and other industry leaders.

The Goal: Make environment creation as easy and standardized as model sharing on Hugging Face.

Key Features#

  • Standardized API: Gymnasium-style reset(), step(), state()

  • Type-Safe: Full IDE autocomplete and error checking

  • Containerized: Environments run in Docker for isolation and reproducibility

  • Shareable: Push to Hugging Face Hub with one command

  • Language-Agnostic: HTTP/WebSocket API works from any language
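A minimal sketch of what that standardized surface looks like in practice. The class names below are illustrative stand-ins based on the bullets above, not OpenEnv's actual base classes:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StepResult:
    # One typed bundle instead of an untyped (obs, reward, done, ...) tuple
    observation: Any
    reward: Optional[float]
    done: bool

class Env:
    """Illustrative Gymnasium-style surface; not OpenEnv's real base class."""

    def reset(self) -> StepResult:
        raise NotImplementedError

    def step(self, action: Any) -> StepResult:
        raise NotImplementedError

    def state(self) -> Any:
        raise NotImplementedError

result = StepResult(observation=[0.0, 1.0], reward=None, done=False)
print(result.done)  # attribute access, so an IDE can autocomplete it
```

Because results are attributes rather than tuple positions, a typo like `result.rewrd` is caught by tooling instead of silently indexing the wrong value.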

RL in 60 Seconds#

Reinforcement Learning is simpler than you think. It's just a loop:

┌─────────────────────────────────────────────────────────────┐
│                 THE RL LOOP                                 │
│                                                             │
│    ┌─────────┐         ┌─────────────┐                      │
│    │  AGENT  │─action─▶│ ENVIRONMENT │                      │
│    │         │◀─reward─│             │                      │
│    │         │◀──obs───│             │                      │
│    └─────────┘         └─────────────┘                      │
│                                                             │
│    1. Agent observes the environment                        │
│    2. Agent chooses an action                               │
│    3. Environment returns reward + new observation          │
│    4. Repeat until done                                     │
└─────────────────────────────────────────────────────────────┘

In code, it looks like this:

result = env.reset()                    # Start episode
while not result.done:
    action = agent.choose(result.observation)
    result = env.step(action)           # Take action, get reward
    agent.learn(result.reward)

That's it. That's RL!
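To watch the loop run end to end, here is a self-contained toy version. The one-dimensional corridor environment and the random "policy" below are invented for illustration; they are not OpenEnv classes:

```python
import random

class ToyEnv:
    """A hypothetical 5-cell corridor: start at cell 0, reach cell 4 for reward."""

    def reset(self):
        self.pos = 0
        return self.pos                      # the observation

    def step(self, action):
        # action is -1 (move left) or +1 (move right), clamped to the corridor
        self.pos = max(0, min(4, self.pos + action))
        done = self.pos == 4
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

env = ToyEnv()
obs = env.reset()                            # 1. observe
done = False
total = 0.0
while not done:
    action = random.choice([-1, 1])          # 2. choose an action (randomly here)
    obs, reward, done = env.step(action)     # 3. get reward + new observation
    total += reward                          # 4. repeat until done

print(f"Episode finished with total reward {total}")
```

A real agent would replace `random.choice` with something that learns from the rewards, but the loop itself never changes.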

Why OpenEnv? (vs. Traditional Solutions)#

Traditional RL environments (like OpenAI Gym/Gymnasium) have been the backbone of RL research for years. They provide a simple API for interacting with environments, and the community has built thousands of environments on top of them.

However, as RL moves from research to production, several challenges emerge:

The Problem with Traditional Approaches#

  1. No Type Safety: Observations are numpy arrays like obs[0][3]. What does index 3 mean? You have to read documentation or source code to find out.

  2. Same-Process Execution: The environment runs in your training process. A bug in the environment can crash your entire training run.

  3. Dependency Hell: Sharing environments means copying files and hoping the recipient has the same dependencies installed.

  4. Python Lock-in: Want to use Rust or C++ for your agent? Too bad: Gym is Python-only.

  5. "Works on My Machine": Environments behave differently on different systems due to floating-point differences, library versions, or OS quirks.

How OpenEnv Solves These Problems#

| Challenge   | Traditional (Gym)          | OpenEnv                       |
|-------------|----------------------------|-------------------------------|
| Type Safety | obs[0][3] - what is it?    | obs.info_state - IDE knows!   |
| Isolation   | Same process (can crash)   | Docker container (isolated)   |
| Deployment  | "Works on my machine"      | Same container everywhere     |
| Sharing     | Copy files, manage deps    | openenv push to Hub           |
| Language    | Python only                | Any language (HTTP/WebSocket) |
| Scaling     | Single machine             | Deploy to Kubernetes          |
| Debugging   | Cryptic numpy index errors | Clear, typed error messages   |

Side-by-Side Code Comparison#

Let's compare the same workflow in both approaches:

Traditional Gym approach:

import gym
import numpy as np

# Create environment - runs in your process
env = gym.make("CartPole-v1")

# Reset returns numpy arrays
obs, info = env.reset()
# obs = array([0.01, 0.02, -0.03, 0.01])
# What do these numbers mean? You have to check docs!

# Step returns multiple values
obs, reward, done, truncated, info = env.step(action)
# No IDE autocomplete, easy to mix up return values

# If env crashes, your whole training crashes
# Sharing requires: pip install gym[atari], hope versions match

OpenEnv approach:

from openenv import AutoEnv, AutoAction

# Load environment and action classes via auto-discovery
OpenSpielEnv = AutoEnv.get_env_class("openspiel")
OpenSpielAction = AutoAction.from_env("openspiel")

# Connect to containerized environment
with OpenSpielEnv(base_url="http://localhost:8000") as env:
    # Reset returns typed StepResult
    result = env.reset()
    # result.observation.legal_actions - IDE autocompletes!
    # result.observation.info_state - you know exactly what this is

    # Step with typed action
    action = OpenSpielAction(action_id=1, game_name="catch")
    result = env.step(action)
    # result.reward, result.done - all typed

    # Environment runs in Docker - isolated from your code
    # Share via: openenv push my-env (one command!)

Part 1: Environment Setup#

Let's set up our environment. This works in Google Colab, locally, or anywhere Python runs.

import subprocess
import sys
from pathlib import Path

# Detect environment
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print("=" * 70)
    print("   GOOGLE COLAB DETECTED - Installing OpenEnv...")
    print("=" * 70)

    # Install OpenEnv
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "openenv-core"],
        capture_output=True,
    )
    print("   OpenEnv installed!")
    print("=" * 70)
else:
    print("=" * 70)
    print("   RUNNING LOCALLY")
    print("=" * 70)
    print()
    print("If you haven't installed OpenEnv yet:")
    print("   pip install openenv-core")
    print()

    # Add src to path for local development (when running from docs folder)
    src_path = Path.cwd().parent.parent.parent / "src"
    if src_path.exists():
        sys.path.insert(0, str(src_path))

    # Add envs to path
    envs_path = Path.cwd().parent.parent.parent / "envs"
    if envs_path.exists():
        sys.path.insert(0, str(envs_path.parent))

    print("=" * 70)

print()
print("Ready to explore OpenEnv!")
======================================================================
   RUNNING LOCALLY
======================================================================

If you haven't installed OpenEnv yet:
   pip install openenv-core

======================================================================

Ready to explore OpenEnv!

Part 2: Your First Environment - OpenSpiel#

What is OpenSpiel?#

OpenSpiel is an open-source collection of 70+ game environments developed by DeepMind for research in reinforcement learning, game theory, and multi-agent systems.

It includes:

  • Classic board games: Chess, Go, Backgammon, Tic-Tac-Toe

  • Card games: Poker variants, Blackjack, Bridge

  • Simple RL benchmarks: Catch, Cliff Walking, 2048

  • Multi-agent games: Hanabi, Kuhn Poker, Negotiation games

OpenSpiel is widely used in RL research because it provides consistent, well-tested implementations with support for both single-player and multi-player scenarios.

How OpenSpiel Connects to OpenEnv#

OpenEnv wraps OpenSpiel games as containerized, type-safe environments. This means:

  1. You get all the benefits of OpenSpielโ€™s game library

  2. Plus type-safe Python clients with IDE autocomplete

  3. Plus Docker isolation for reproducibility

  4. Plus easy sharing via Hugging Face Hub

Currently, OpenEnv includes wrappers for 6 OpenSpiel games:

| Game          | Players | Description                           |
|---------------|---------|---------------------------------------|
| Catch         | 1       | Catch a falling ball with a paddle    |
| 2048          | 1       | Slide tiles to combine numbers        |
| Blackjack     | 1       | Classic card game against dealer      |
| Cliff Walking | 1       | Navigate a grid while avoiding cliffs |
| Tic-Tac-Toe   | 2       | Classic 3×3 grid game                 |
| Kuhn Poker    | 2       | Simplified 3-card poker               |

The Catch Game#

For this tutorial, we'll use Catch, one of the simplest RL environments. It's perfect for learning because:

  • Simple rules (easy to understand)

  • Fast episodes (10 steps each)

  • Clear success metric (did you catch the ball?)

  • Optimal strategy is learnable (move toward the ball)

Game Rules:

⬜ ⬜ 🔴 ⬜ ⬜    <- Ball starts at random column (row 0)
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ ⬜ ⬜ ⬜       The ball falls down one row
⬜ ⬜ ⬜ ⬜ ⬜       each time step
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ ⬜ ⬜ ⬜
⬜ ⬜ 🏓 ⬜ ⬜    <- Paddle at bottom (row 9)

  • Grid Size: 10 rows × 5 columns

  • Ball: Starts at a random column in row 0, falls one row per step

  • Paddle: Starts at center column, you control it

  • Episode Length: 10 steps (ball reaches bottom)

Actions:

| Action ID | Movement       |
|-----------|----------------|
| 0         | Move LEFT      |
| 1         | STAY (no move) |
| 2         | Move RIGHT     |

Rewards:

  • +1.0 if the paddle is in the same column as the ball when it lands

  • 0.0 if you miss the ball

Optimal Strategy: Track the ball's column and move toward it. A perfect policy wins 100% of the time: the grid is only 5 columns wide, so the paddle (starting at the center) needs at most two moves to reach any column, and the ball takes 10 steps to land.
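That strategy is short enough to write down. Below is a sketch of a greedy policy, assuming the flattened 10 × 5 one-hot `info_state` layout described above (the helper name `greedy_action` is ours, not part of OpenEnv):

```python
GRID_HEIGHT, GRID_WIDTH = 10, 5
LEFT, STAY, RIGHT = 0, 1, 2

def greedy_action(info_state):
    """Move the paddle toward the ball's column (hypothetical helper).

    Assumes info_state is a flat 10x5 one-hot grid: the ball is the only
    1.0 above the bottom row, and the paddle is the 1.0 in the bottom row.
    """
    bottom = (GRID_HEIGHT - 1) * GRID_WIDTH
    paddle_col = info_state[bottom:].index(1.0)
    ball_col = next(i % GRID_WIDTH
                    for i, v in enumerate(info_state[:bottom]) if v == 1.0)
    if ball_col < paddle_col:
        return LEFT
    if ball_col > paddle_col:
        return RIGHT
    return STAY

# Ball at row 0, column 0; paddle at the center column 2
state = [0.0] * (GRID_HEIGHT * GRID_WIDTH)
state[0] = 1.0                                    # ball position
state[(GRID_HEIGHT - 1) * GRID_WIDTH + 2] = 1.0   # paddle position
print(greedy_action(state))                       # prints 0 (LEFT)
```

Plugging this in where the later examples call `random.choice(legal_actions)` turns the random player into a perfect one.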

Importing OpenEnv#

First, let's import the OpenSpiel environment client and models:

# Real imports from OpenEnv
try:
    # Direct imports from the openspiel_env package
    from openspiel_env.client import OpenSpielEnv
    from openspiel_env.models import OpenSpielAction, OpenSpielObservation, OpenSpielState

    OPENENV_AVAILABLE = True
    print("✓ OpenEnv imports successful!")
    print(f"  - OpenSpielEnv: {OpenSpielEnv}")
    print(f"  - OpenSpielAction: {OpenSpielAction}")
except ImportError as e:
    OPENENV_AVAILABLE = False
    print(f"✗ OpenEnv not fully installed: {e}")
    print("  Run: pip install openenv-core")
    print("  And: pip install -e ./envs/openspiel_env")
✓ OpenEnv imports successful!
  - OpenSpielEnv: <class 'openspiel_env.client.OpenSpielEnv'>
  - OpenSpielAction: <class 'openspiel_env.models.OpenSpielAction'>

Connecting to an Environment#

OpenEnv provides three ways to connect to environments:

  1. From Hugging Face Hub (auto-downloads and starts container)

  2. From Docker image (uses local image)

  3. From URL (connects to running server)

Let's examine the actual methods available on the client class:

print("=" * 70)
print("   THREE WAYS TO CONNECT")
print("=" * 70)
print()

if OPENENV_AVAILABLE:
    # Show actual method signatures from the class
    import inspect

    print("Connection methods available on OpenSpielEnv:")
    print()

    # Method 1: from_hub
    if hasattr(OpenSpielEnv, "from_hub"):
        sig = inspect.signature(OpenSpielEnv.from_hub)
        print(f"1. OpenSpielEnv.from_hub{sig}")
        print("   โ†’ Auto-downloads from Hugging Face, starts container, connects")
        print("   Example: env = OpenSpielEnv.from_hub('openenv/openspiel-env')")
        print()

    # Method 2: from_docker_image
    if hasattr(OpenSpielEnv, "from_docker_image"):
        sig = inspect.signature(OpenSpielEnv.from_docker_image)
        print(f"2. OpenSpielEnv.from_docker_image{sig}")
        print("   โ†’ Starts container from local image, connects")
        print("   Example: env = OpenSpielEnv.from_docker_image('openspiel-env:latest')")
        print()

    # Method 3: Direct connection
    sig = inspect.signature(OpenSpielEnv.__init__)
    print(f"3. OpenSpielEnv.__init__{sig}")
    print("   โ†’ Connects to already-running server")
    print("   Example: env = OpenSpielEnv(base_url='http://localhost:8000')")
    print()

    print("-" * 70)
    print("All three give you the same API - just different ways to start!")
else:
    print("(OpenEnv not installed - showing expected methods)")
    print()
    print("1. OpenSpielEnv.from_hub(repo_id, *, use_docker=True, ...)")
    print("   → Auto-downloads from Hugging Face, starts container, connects")
    print()
    print("2. OpenSpielEnv.from_docker_image(image, provider=None, ...)")
    print("   → Starts container from local image, connects")
    print()
    print("3. OpenSpielEnv(base_url, connect_timeout_s=10.0, ...)")
    print("   → Connects to already-running server")
======================================================================
   THREE WAYS TO CONNECT
======================================================================

Connection methods available on OpenSpielEnv:

2. OpenSpielEnv.from_docker_image(image: 'str', provider: "Optional['ContainerProvider']" = None, **kwargs: 'Any') -> 'EnvClientT'
   → Starts container from local image, connects
   Example: env = OpenSpielEnv.from_docker_image('openspiel-env:latest')

3. OpenSpielEnv.__init__(self, base_url: 'str', connect_timeout_s: 'float' = 10.0, message_timeout_s: 'float' = 60.0, max_message_size_mb: 'float' = 100.0, provider: "Optional['ContainerProvider | RuntimeProvider']" = None, mode: 'Optional[str]' = None)
   → Connects to already-running server
   Example: env = OpenSpielEnv(base_url='http://localhost:8000')

----------------------------------------------------------------------
All three give you the same API - just different ways to start!

Part 3: Playing the Catch Game#

Now let's actually play! This code attempts to connect to a real server. If no server is running, we'll show what the interaction looks like.

import random

# Check if we can connect to a server
SERVER_URL = "http://localhost:8000"
SERVER_AVAILABLE = False

if OPENENV_AVAILABLE:
    try:
        # Try to connect using sync wrapper
        env = OpenSpielEnv(base_url=SERVER_URL)
        with env.sync() as client:
            # Quick test to verify connection
            pass
        SERVER_AVAILABLE = True
        print(f"✓ Connected to server at {SERVER_URL}")
    except Exception as e:
        print(f"✗ No server running at {SERVER_URL}")
        print(f"  Error: {e}")
        print()
        print("To start a server, run one of these:")
        print("  docker run -p 8000:8000 openenv/openspiel-env:latest")
        print("  # OR")
        print("  cd envs/openspiel_env && openenv serve")
✗ No server running at http://localhost:8000
  Error: Failed to connect to ws://localhost:8000/ws: Multiple exceptions: [Errno 111] Connect call failed ('::1', 8000, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 8000)

To start a server, run one of these:
  docker run -p 8000:8000 openenv/openspiel-env:latest
  # OR
  cd envs/openspiel_env && openenv serve

Playing with a Real Server#

When connected to a real server, here's how the interaction works:

if OPENENV_AVAILABLE and SERVER_AVAILABLE:
    print("=" * 70)
    print("   PLAYING CATCH - LIVE!")
    print("=" * 70)

    env = OpenSpielEnv(base_url=SERVER_URL)
    with env.sync() as client:
        # Reset to start a new episode
        result = client.reset()

        print("\nEpisode started!")
        print(f"  Observation type: {type(result.observation).__name__}")
        print(f"  Legal actions: {result.observation.legal_actions}")
        print(f"  Done: {result.done}")

        # Play until the episode ends
        step_count = 0
        while not result.done:
            # Choose a random action from legal actions
            action_id = random.choice(result.observation.legal_actions)
            action = OpenSpielAction(action_id=action_id, game_name="catch")

            # Take the action
            result = client.step(action)
            step_count += 1

            print(f"\nStep {step_count}:")
            print(f"  Action: {action_id} ({'LEFT' if action_id == 0 else 'STAY' if action_id == 1 else 'RIGHT'})")
            print(f"  Reward: {result.reward}")
            print(f"  Done: {result.done}")

        # Get final state
        state = client.state()
        print("\nEpisode complete!")
        print(f"  Total steps: {state.step_count}")
        print(f"  Final reward: {result.reward}")
        print(f"  Result: {'CAUGHT!' if result.reward > 0 else 'MISSED!'}")

else:
    # Run a local simulation to demonstrate the gameplay
    print("=" * 70)
    print("   PLAYING CATCH - LOCAL SIMULATION")
    print("=" * 70)
    print()
    print("No server running - demonstrating with local simulation.")
    print("(This shows exactly what happens when playing the real game)")
    print()

    # Simulate the Catch game locally
    GRID_HEIGHT = 10
    GRID_WIDTH = 5

    # Initialize game state
    ball_col = random.randint(0, GRID_WIDTH - 1)
    paddle_col = GRID_WIDTH // 2  # Start in center

    print("Game initialized:")
    print(f"  Ball starting column: {ball_col}")
    print(f"  Paddle starting column: {paddle_col}")
    print(f"  Grid size: {GRID_HEIGHT} rows × {GRID_WIDTH} columns")
    print()

    # Simulate episode
    for step in range(GRID_HEIGHT):
        # Create observation (matching OpenSpiel format)
        info_state = [0.0] * (GRID_HEIGHT * GRID_WIDTH)
        info_state[step * GRID_WIDTH + ball_col] = 1.0  # Ball position
        info_state[(GRID_HEIGHT - 1) * GRID_WIDTH + paddle_col] = 1.0  # Paddle

        legal_actions = [0, 1, 2]  # LEFT, STAY, RIGHT

        # Choose random action
        action_id = random.choice(legal_actions)
        action_name = {0: "LEFT", 1: "STAY", 2: "RIGHT"}[action_id]

        # Execute action
        old_paddle = paddle_col
        if action_id == 0:  # LEFT
            paddle_col = max(0, paddle_col - 1)
        elif action_id == 2:  # RIGHT
            paddle_col = min(GRID_WIDTH - 1, paddle_col + 1)

        print(f"Step {step + 1}: Ball at row {step}, col {ball_col} | "
              f"Paddle: {old_paddle}→{paddle_col} ({action_name})")

    # Determine result
    caught = (paddle_col == ball_col)
    reward = 1.0 if caught else 0.0

    print()
    print("Episode complete!")
    print(f"  Ball landed at column: {ball_col}")
    print(f"  Paddle final column: {paddle_col}")
    print(f"  Reward: {reward}")
    print(f"  Result: {'CAUGHT! 🎉' if caught else 'MISSED! 😢'}")
    print()
    print("-" * 70)
    print("This is exactly how the real OpenSpielEnv works,")
    print("just running locally instead of via WebSocket to a server.")
======================================================================
   PLAYING CATCH - LOCAL SIMULATION
======================================================================

No server running - demonstrating with local simulation.
(This shows exactly what happens when playing the real game)

Game initialized:
  Ball starting column: 0
  Paddle starting column: 2
  Grid size: 10 rows × 5 columns

Step 1: Ball at row 0, col 0 | Paddle: 2→3 (RIGHT)
Step 2: Ball at row 1, col 0 | Paddle: 3→4 (RIGHT)
Step 3: Ball at row 2, col 0 | Paddle: 4→3 (LEFT)
Step 4: Ball at row 3, col 0 | Paddle: 3→2 (LEFT)
Step 5: Ball at row 4, col 0 | Paddle: 2→1 (LEFT)
Step 6: Ball at row 5, col 0 | Paddle: 1→2 (RIGHT)
Step 7: Ball at row 6, col 0 | Paddle: 2→3 (RIGHT)
Step 8: Ball at row 7, col 0 | Paddle: 3→4 (RIGHT)
Step 9: Ball at row 8, col 0 | Paddle: 4→3 (LEFT)
Step 10: Ball at row 9, col 0 | Paddle: 3→4 (RIGHT)

Episode complete!
  Ball landed at column: 0
  Paddle final column: 4
  Reward: 0.0
  Result: MISSED! 😢

----------------------------------------------------------------------
This is exactly how the real OpenSpielEnv works,
just running locally instead of via WebSocket to a server.

Part 4: Understanding the Response Types#

OpenEnv uses type-safe models for all interactions. Let's create actual instances and examine their attributes:

print("=" * 70)
print("   OPENENV TYPE SYSTEM - ACTUAL INSTANCES")
print("=" * 70)

# Create example instances that match what you'd get from the Catch game
# These are the actual Pydantic models used by OpenEnv

# 1. OpenSpielObservation - what the agent receives after each step
print("\n📦 OpenSpielObservation (returned in StepResult)")
print("-" * 50)

if OPENENV_AVAILABLE:
    # OpenSpielObservation was already imported above via auto-discovery
    # Create a sample observation like what Catch game returns
    sample_observation = OpenSpielObservation(
        info_state=[0.0, 0.0, 1.0, 0.0, 0.0] + [0.0] * 45,  # Ball at col 2, row 0
        legal_actions=[0, 1, 2],  # LEFT, STAY, RIGHT
        game_phase="playing",
        current_player_id=0,
        opponent_last_action=None,
    )

    print(f"  info_state: {sample_observation.info_state[:10]}... (length: {len(sample_observation.info_state)})")
    print(f"  legal_actions: {sample_observation.legal_actions}")
    print(f"  game_phase: {sample_observation.game_phase!r}")
    print(f"  current_player_id: {sample_observation.current_player_id}")
    print(f"  opponent_last_action: {sample_observation.opponent_last_action}")
else:
    # Create without imports to show the structure
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class OpenSpielObservation:
        info_state: List[float]
        legal_actions: List[int]
        game_phase: str = "playing"
        current_player_id: int = 0
        opponent_last_action: Optional[int] = None

    sample_observation = OpenSpielObservation(
        info_state=[0.0, 0.0, 1.0, 0.0, 0.0] + [0.0] * 45,
        legal_actions=[0, 1, 2],
        game_phase="playing",
        current_player_id=0,
        opponent_last_action=None,
    )

    print(f"  info_state: {sample_observation.info_state[:10]}... (length: {len(sample_observation.info_state)})")
    print(f"  legal_actions: {sample_observation.legal_actions}")
    print(f"  game_phase: {sample_observation.game_phase!r}")
    print(f"  current_player_id: {sample_observation.current_player_id}")
    print(f"  opponent_last_action: {sample_observation.opponent_last_action}")

# 2. OpenSpielState - the environment's internal state
print("\n📊 OpenSpielState (returned by state())")
print("-" * 50)

if OPENENV_AVAILABLE:
    # OpenSpielState was already imported above via auto-discovery
    sample_state = OpenSpielState(
        game_name="catch",
        agent_player=0,
        opponent_policy="random",
        game_params={"rows": 10, "columns": 5},
        num_players=1,
    )

    print(f"  game_name: {sample_state.game_name!r}")
    print(f"  agent_player: {sample_state.agent_player}")
    print(f"  opponent_policy: {sample_state.opponent_policy!r}")
    print(f"  game_params: {sample_state.game_params}")
    print(f"  num_players: {sample_state.num_players}")
else:
    @dataclass
    class OpenSpielState:
        game_name: str = "catch"
        agent_player: int = 0
        opponent_policy: str = "random"
        game_params: Optional[dict] = None
        num_players: int = 1

    sample_state = OpenSpielState(
        game_name="catch",
        agent_player=0,
        opponent_policy="random",
        game_params={"rows": 10, "columns": 5},
        num_players=1,
    )

    print(f"  game_name: {sample_state.game_name!r}")
    print(f"  agent_player: {sample_state.agent_player}")
    print(f"  opponent_policy: {sample_state.opponent_policy!r}")
    print(f"  game_params: {sample_state.game_params}")
    print(f"  num_players: {sample_state.num_players}")

# 3. OpenSpielAction - what you send to step()
print("\n🎮 OpenSpielAction (what you send to step())")
print("-" * 50)

if OPENENV_AVAILABLE:
    # OpenSpielAction was already imported above via auto-discovery
    sample_action = OpenSpielAction(
        action_id=1,  # STAY
        game_name="catch",
        game_params={"rows": 10, "columns": 5},
    )

    print(f"  action_id: {sample_action.action_id}  # 0=LEFT, 1=STAY, 2=RIGHT")
    print(f"  game_name: {sample_action.game_name!r}")
    print(f"  game_params: {sample_action.game_params}")
else:
    @dataclass
    class OpenSpielAction:
        action_id: int
        game_name: str = "catch"
        game_params: Optional[dict] = None

    sample_action = OpenSpielAction(
        action_id=1,
        game_name="catch",
        game_params={"rows": 10, "columns": 5},
    )

    print(f"  action_id: {sample_action.action_id}  # 0=LEFT, 1=STAY, 2=RIGHT")
    print(f"  game_name: {sample_action.game_name!r}")
    print(f"  game_params: {sample_action.game_params}")

print("\n" + "=" * 70)
print("These are the actual Pydantic/dataclass models used by OpenEnv.")
print("Type safety helps catch errors before they reach the environment!")
print("=" * 70)
======================================================================
   OPENENV TYPE SYSTEM - ACTUAL INSTANCES
======================================================================

📦 OpenSpielObservation (returned in StepResult)
--------------------------------------------------
  info_state: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]... (length: 50)
  legal_actions: [0, 1, 2]
  game_phase: 'playing'
  current_player_id: 0
  opponent_last_action: None

📊 OpenSpielState (returned by state())
--------------------------------------------------
  game_name: 'catch'
  agent_player: 0
  opponent_policy: 'random'
  game_params: {'rows': 10, 'columns': 5}
  num_players: 1

🎮 OpenSpielAction (what you send to step())
--------------------------------------------------
  action_id: 1  # 0=LEFT, 1=STAY, 2=RIGHT
  game_name: 'catch'
  game_params: {'rows': 10, 'columns': 5}

======================================================================
These are the actual Pydantic/dataclass models used by OpenEnv.
Type safety helps catch errors before they reach the environment!
======================================================================

Part 5: The Architecture#

OpenEnv uses a client-server architecture:

┌─────────────────────────────────────────────────────────────┐
│  YOUR CODE                                                  │
│                                                             │
│  from openenv import AutoEnv                                │
│  OpenSpielEnv = AutoEnv.get_env_class("openspiel")          │
│  env = OpenSpielEnv(base_url="http://localhost:8000")       │
│  result = env.reset()      # Sends WebSocket message        │
│  result = env.step(action) # Sends WebSocket message        │
│                                                             │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ WebSocket (persistent connection)
                         │
┌────────────────────────▼────────────────────────────────────┐
│  DOCKER CONTAINER                                           │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  FastAPI Server + Environment Logic                 │    │
│  │  - /ws (WebSocket endpoint)                         │    │
│  │  - Handles reset(), step(), state()                 │    │
│  │  - Runs the actual game simulation                  │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Isolated • Reproducible • Scalable                         │
└─────────────────────────────────────────────────────────────┘

Key insight: You never deal with HTTP/WebSocket directly. The OpenEnv client handles all the networking!
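Conceptually, each `reset()` or `step()` call serializes a request, sends it over the persistent WebSocket, and parses a typed reply. The sketch below illustrates that round trip with plain JSON; the message shape is purely illustrative and is not OpenEnv's actual wire format:

```python
import json

def encode_step(action_id, game_name):
    """What a client might conceptually send for step(); illustrative only."""
    return json.dumps({"method": "step",
                       "action": {"action_id": action_id,
                                  "game_name": game_name}})

def decode_result(payload):
    """Parse a hypothetical reply into (reward, done, observation)."""
    msg = json.loads(payload)
    return msg["reward"], msg["done"], msg["observation"]

request = encode_step(2, "catch")            # e.g. move RIGHT
reply = '{"reward": 1.0, "done": true, "observation": {"legal_actions": [0, 1, 2]}}'
reward, done, observation = decode_result(reply)
print(reward, done)                          # prints: 1.0 True
```

The client library does this serialization, transport, and validation for you, which is why your code only ever sees typed `StepResult` objects.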

Summary#

In this notebook, you learned:

What OpenEnv Is:

  • A unified framework for RL environments

  • Containerized, type-safe, and shareable

Why Use OpenEnv:

  • Type safety with IDE autocomplete

  • Isolated Docker containers

  • Easy sharing via Hugging Face Hub

How to Use It:

  • env.reset() - Start a new episode

  • env.step(action) - Take an action

  • env.state() - Get current state

Next Steps#

Continue to Notebook 2: Using Environments

In the next notebook, you'll:

  • Explore all available OpenEnv environments

  • Create different AI policies

  • Run evaluations and compare performance

  • Work with multi-player games

Total running time of the script: (0 minutes 0.017 seconds)

Gallery generated by Sphinx-Gallery