OpenSpiel Environment

This environment integrates OpenSpiel games with the OpenEnv framework. OpenSpiel (https://github.com/google-deepmind/open_spiel) is DeepMind's collection of 70+ game environments for reinforcement-learning research.

Supported Games

This environment supports 6 games across two categories:

Single-Player Games (No Opponent)

  1. Catch - Move horizontally to catch a falling ball
  2. Cliff Walking - Navigate a grid without falling off the cliff (Sutton & Barto benchmark)
  3. 2048 - Classic tile-merging puzzle game
  4. Blackjack - Simplified blackjack (HIT/STAND only)

Multi-Player Games (with Bot Opponent)

  1. Tic-Tac-Toe - Classic 3x3 game
  2. Kuhn Poker - 2-player simplified poker (game theory benchmark)

Architecture

┌────────────────────────────────────┐
│ RL Training Code (Client)          │
│   OpenSpielEnv.step(action)        │
└──────────────┬─────────────────────┘
               │ HTTP
┌──────────────▼─────────────────────┐
│ FastAPI Server (Docker)            │
│   OpenSpielEnvironment             │
│     ├─ Wraps rl_environment.Env    │
│     ├─ Agent controls player 0     │
│     └─ Opponent: Random/Fixed      │
└────────────────────────────────────┘
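
The client hides this HTTP layer behind reset()/step(), but it can be useful to poke the server directly when debugging. Below is a minimal sketch using requests; the route names and payload shape here are assumptions for illustration, so verify them against the OpenEnv HTTP spec for your version:

import requests

BASE = "http://localhost:8000"

# Assumed endpoints and payload shape -- check the OpenEnv HTTP spec.
obs = requests.post(f"{BASE}/reset").json()
step = requests.post(f"{BASE}/step", json={"action": {"action_id": 0}}).json()
print(obs, step)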

Installation & Usage

Option 1: Local Development (without Docker)

Requirements:

  • OpenSpiel must be installed (see https://github.com/google-deepmind/open_spiel; prebuilt wheels are available via pip install open_spiel on supported platforms)
  • Python 3.11+

from envs.openspiel_env import OpenSpielEnv, OpenSpielAction

# Start local server manually
# python -m envs.openspiel_env.server.app

# Connect to local server
env = OpenSpielEnv(base_url="http://localhost:8000")

# Reset environment
result = env.reset()
print(f"Initial state: {result.observation.info_state}")
print(f"Legal actions: {result.observation.legal_actions}")

# Take actions
for _ in range(10):
    action_id = result.observation.legal_actions[0]  # Choose first legal action
    result = env.step(OpenSpielAction(action_id=action_id))
    print(f"Reward: {result.reward}, Done: {result.done}")
    if result.done:
        break

# Cleanup
env.close()
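
A common variant of the loop above samples uniformly from the legal actions instead of always taking the first one, which gives more diverse rollouts. It uses only the client API already shown (run it before env.close()):

import random

result = env.reset()
while not result.done:
    # Pick a uniformly random legal action each turn
    action_id = random.choice(result.observation.legal_actions)
    result = env.step(OpenSpielAction(action_id=action_id))
print(f"Final reward: {result.reward}")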

Option 2: Docker

Build the Docker image:

cd OpenEnv
docker build -f src/envs/openspiel_env/server/Dockerfile -t openspiel-env:latest .

Run specific games:

# Catch (default)
docker run -p 8000:8000 openspiel-env:latest

# Tic-Tac-Toe with random opponent
docker run -p 8000:8000 -e OPENSPIEL_GAME=tic_tac_toe openspiel-env:latest

# Kuhn Poker
docker run -p 8000:8000 -e OPENSPIEL_GAME=kuhn_poker openspiel-env:latest

# 2048
docker run -p 8000:8000 -e OPENSPIEL_GAME=2048 openspiel-env:latest

Use with from_docker_image():

from envs.openspiel_env import OpenSpielEnv, OpenSpielAction

# Automatically starts container
env = OpenSpielEnv.from_docker_image("openspiel-env:latest")

result = env.reset()
result = env.step(OpenSpielAction(action_id=0))

env.close()  # Stops container

Game-Specific Information

1. Catch

  • Type: Single-player
  • Action Space: 3 actions (left, stay, right)
  • Observation: 5x5 grid flattened (25 dimensions)
  • Reward: +1 for catching ball, 0 otherwise
  • Episode Length: ~10 steps

env = OpenSpielEnv.from_docker_image("openspiel-env:latest")
# Or set OPENSPIEL_GAME=catch
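
Since the observation is a flat 25-dimensional vector, a quick way to eyeball it is to reshape it into the 5x5 grid. A sketch, assuming row-major flattening (verify the layout against the OpenSpiel catch implementation):

import numpy as np

grid = np.asarray(result.observation.info_state).reshape(5, 5)
for row in grid:
    # Nonzero cells mark the ball and the paddle
    print(" ".join("#" if cell else "." for cell in row))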

2. Tic-Tac-Toe

  • Type: 2-player turn-based, perfect information
  • Players: Agent (X) vs Random Bot (O)
  • Action Space: 9 positions
  • Observation: 27 dimensions (3x3 board + game state)
  • Reward: +1 win, -1 loss, 0 draw/mid-game

# Set environment variable or run directly
docker run -p 8000:8000 -e OPENSPIEL_GAME=tic_tac_toe openspiel-env:latest
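
The 27-dimensional observation is three 3x3 planes. A decoding sketch, assuming OpenSpiel's tic_tac_toe plane order of (empty, O, X); verify the order against your OpenSpiel version:

import numpy as np

planes = np.asarray(result.observation.info_state).reshape(3, 3, 3)
empty, noughts, crosses = planes          # assumed plane order
board = np.where(crosses != 0, "X", np.where(noughts != 0, "O", "."))
print("\n".join(" ".join(row) for row in board))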

3. Kuhn Poker

  • Type: 2-player turn-based, imperfect information
  • Players: Agent vs Random Bot
  • Action Space: 2 actions (pass/fold, bet/call)
  • Observation: 6 dimensions (card + betting history)
  • Reward: Pot winnings (typically -1, 0, +1, +2)
  • Notes: A classic benchmark for imperfect-information games

docker run -p 8000:8000 -e OPENSPIEL_GAME=kuhn_poker openspiel-env:latest

4. Cliff Walking

  • Type: Single-player grid world
  • Action Space: 4 actions (up, down, left, right)
  • Observation: Position encoding
  • Reward: -1 per step, -100 for falling off cliff
  • Notes: Classic RL benchmark from Sutton & Barto

docker run -p 8000:8000 -e OPENSPIEL_GAME=cliff_walking openspiel-env:latest

5. 2048

  • Type: Single-player puzzle
  • Action Space: 4 actions (up, down, left, right)
  • Observation: 4x4 grid with tile values
  • Reward: Points from merging tiles
  • Notes: Stochastic tile spawning

docker run -p 8000:8000 -e OPENSPIEL_GAME=2048 openspiel-env:latest

6. Blackjack

  • Type: Single-player vs dealer
  • Action Space: 2 actions (HIT, STAND)
  • Observation: Player hand + dealer's visible card
  • Reward: +1 win, -1 loss, 0 draw
  • Notes: Simplified version, no double/split

docker run -p 8000:8000 -e OPENSPIEL_GAME=blackjack openspiel-env:latest

Configuration

Environment Variables

  • OPENSPIEL_GAME: Game name (default: "catch")
  • OPENSPIEL_AGENT_PLAYER: Player ID for the agent (default: 0)
  • OPENSPIEL_OPPONENT_POLICY: Opponent policy for multi-player games:
      • random: Uniform random (default)
      • first: Always picks the first legal action
      • last: Always picks the last legal action

Example: Tic-Tac-Toe with Fixed Opponent

docker run -p 8000:8000 \
  -e OPENSPIEL_GAME=tic_tac_toe \
  -e OPENSPIEL_OPPONENT_POLICY=first \
  openspiel-env:latest
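
The three policies are simple functions of the opponent's legal-action list. A sketch of their behavior (illustrative only, not the server's actual code):

import random

def opponent_action(legal_actions, policy="random"):
    if policy == "random":
        return random.choice(legal_actions)  # uniform over legal moves
    if policy == "first":
        return legal_actions[0]              # deterministic, exploitable
    if policy == "last":
        return legal_actions[-1]             # deterministic, exploitable
    raise ValueError(f"Unknown policy: {policy}")

The fixed policies (first, last) are handy for debugging because episodes are reproducible; random is the more meaningful baseline for training.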

API Reference

OpenSpielAction

@dataclass
class OpenSpielAction(Action):
    action_id: int                      # Action to take
    game_name: str = "catch"            # Game name
    # default_factory avoids the mutable-default error dataclasses raise for {}
    game_params: Dict[str, Any] = field(default_factory=dict)

OpenSpielObservation

@dataclass
class OpenSpielObservation(Observation):
    info_state: List[float]             # Agent's information state
    legal_actions: List[int]            # Legal action IDs
    game_phase: str                     # "initial", "playing", "terminal"
    current_player_id: int              # Current player (-1 for simultaneous)
    opponent_last_action: Optional[int] # Last opponent action (if available)
    done: bool                          # Episode finished
    reward: Optional[float]             # Reward for last action

OpenSpielState

@dataclass
class OpenSpielState(State):
    episode_id: str                     # Unique episode ID
    step_count: int                     # Number of steps
    game_name: str                      # Game name
    agent_player: int                   # Agent's player ID
    opponent_policy: str                # Opponent policy name
    num_players: int                    # Total players
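
Putting the types together, a minimal rollout that inspects the documented fields:

from envs.openspiel_env import OpenSpielEnv, OpenSpielAction

env = OpenSpielEnv(base_url="http://localhost:8000")
result = env.reset()
while not result.done:
    obs = result.observation            # OpenSpielObservation
    print(obs.game_phase, obs.current_player_id, obs.legal_actions)
    result = env.step(OpenSpielAction(action_id=obs.legal_actions[0]))
print(f"Final reward: {result.reward}")
env.close()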

Testing

Automated Testing (All 6 Games)

Quick test of all games in Docker:

./test_docker_all_games.sh

This automated script will:

  • Build and run a Docker container for each game
  • Test the reset, step, and state APIs
  • Verify episode completion
  • Report pass/fail for all 6 games

Expected output:

========================================
OpenSpiel Docker Integration Test
========================================

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Testing: catch
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  🐳 Starting Docker container...
  ⏳ Waiting for server to be ready...
  ✓ Server ready (2s)
  🎮 Running Python client test...
  ✓ PASSED - Episode completed successfully

[... tests all 6 games ...]

========================================
Test Summary
========================================

  ✓ catch
  ✓ tic_tac_toe
  ✓ kuhn_poker
  ✓ cliff_walking
  ✓ 2048
  ✓ blackjack

Total: 6 passed, 0 failed out of 6 games

========================================
All tests PASSED! 🎉
========================================

Manual Testing

# Local (requires OpenSpiel installed)
python -m pytest src/envs/openspiel_env/

# Docker build
docker build -f src/envs/openspiel_env/server/Dockerfile -t openspiel-env:latest .

# Run specific game
docker run -p 8000:8000 openspiel-env:latest

# Test from another terminal
python3 examples/openspiel_simple.py

Development

Adding New Games

To add support for more OpenSpiel games:

  1. Verify the game works with rl_environment.Environment (see the sketch after this list)
  2. Test with different opponent policies if multi-player
  3. Document game-specific configuration
  4. Add example script
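
For step 1, the check is just loading and stepping the game under OpenSpiel's own wrapper before wiring it into the server (connect_four below is only an example game name):

from open_spiel.python import rl_environment

env = rl_environment.Environment("connect_four")
time_step = env.reset()
while not time_step.last():
    player = time_step.observations["current_player"]
    legal = time_step.observations["legal_actions"][player]
    time_step = env.step([legal[0]])   # always play the first legal action
print("Terminal rewards:", time_step.rewards)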

Limitations

  • Simultaneous-move games: Only agent_player=0 supported
  • Multi-agent training: Single agent only (no self-play yet)
  • Opponent policies: Random and fixed only (no MCTS yet)
  • Build time: Docker image takes ~5-10 minutes to build (compiles C++)

Future Work

  • MCTS opponent policies
  • Self-play support (multiple agents)
  • More games (Chess, Go, Texas Hold'em)
  • Faster build with pre-built OpenSpiel base image
  • Game-specific reward shaping options
