# Chat Environment
A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.
## Overview
`ChatEnvironment` is a lightweight environment that:

- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents
`ChatEnvironment` can be used in two ways:

1. **Direct usage**: Import and use `ChatEnvironment` directly in your Python code (best for local development)
2. **HTTP client**: Use the `ChatEnv` client to connect to a `ChatEnvironment` server (best for distributed/containerized deployments)
## Quick Start
### Option 1: Direct Usage (Local)
```python
from transformers import AutoTokenizer

from envs.chat_env import ChatAction, ChatObservation
from envs.chat_env.server import ChatEnvironment
from core.env_server import Message

# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system",
)

# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")

# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)

# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```
### Option 2: HTTP Client (Distributed)
```python
from transformers import AutoTokenizer

from envs.chat_env import ChatEnv

# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")

# Or connect to an existing server:
# client = ChatEnv(base_url="http://localhost:8000")

# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")

# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")

# Cleanup
client.close()
```
### Building the Docker Image
Before using the HTTP client, build the Docker image:
```bash
# From the project root
docker build -t chat-env:latest -f src/envs/chat_env/server/Dockerfile .

# Optionally specify a different tokenizer
docker build -t chat-env:latest \
  --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
  -f src/envs/chat_env/server/Dockerfile .
```
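Once the image is built, you can run it and point the client at the exposed port. A minimal sketch, assuming the server listens on port 8000 inside the container (matching the `base_url` used in the client example above):

```bash
# Run the server and map the API port to the host
# (assumption: the container serves on port 8000)
docker run -p 8000:8000 chat-env:latest
```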
## Architecture

### Data Models

#### ChatAction
Actions contain only tokens (PyTorch tensors) that interface directly with models:
```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```
#### ChatObservation
Observations contain both the message history and flattened tokens:
```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```
#### ChatState
Internal state tracking message and token history:
```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```
### Key Methods

#### `reset() -> ChatObservation`

Resets the environment to its initial state, applying the optional system prompt.

#### `step(action: ChatAction) -> ChatObservation`

Takes an action (tokens), decodes it to text, appends it to the history, and returns the updated observation.

#### `message_to_action(message: Message) -> ChatAction`

Convenience method that converts a message dict into a tokenized `ChatAction`.
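To make the `step` flow concrete, here is a simplified sketch of what it plausibly does internally; the actual implementation lives in `server/chat_environment.py` and may differ in details such as role handling and special tokens:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# State after reset(): the system prompt in both message and token form
history_messages = [{"role": "system", "content": "You are a helpful assistant."}]
history_tokens = [tokenizer("You are a helpful assistant.", return_tensors="pt").input_ids.flatten()]

# A ChatAction carries only tokens...
action_tokens = tokenizer("Hello!", return_tensors="pt").input_ids.flatten()

# ...which step() decodes, appends to the history, and flattens into the observation
text = tokenizer.decode(action_tokens, skip_special_tokens=True)
history_messages.append({"role": "user", "content": text})
history_tokens.append(action_tokens)

obs_messages = list(history_messages)    # human-readable history
obs_tokens = torch.cat(history_tokens)   # model-ready flattened tokens
print(obs_messages, obs_tokens.shape)
```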
## Usage Patterns

### Basic Conversation
```python
from transformers import AutoTokenizer

from envs.chat_env.server import ChatEnvironment
from core.env_server import Message

tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

# Reset
obs = env.reset()

# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)

# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)

# Access the conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```
### With Transforms
You can add transforms to compute rewards or modify observations:
```python
from core.env_server import Transform, Observation

class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, "messages") and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message["content"]) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform(),
)
```
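Assuming the transform is applied to each observation the environment returns, the reward shows up directly on `obs.reward`; with the length-based transform above, a 9-character reply would yield 0.9:

```python
obs = env.reset()

reply = {"role": "assistant", "content": "Hi there!"}  # 9 characters
obs = env.step(env.message_to_action(reply))

print(obs.reward)  # 0.9 == 9 * 0.1 under LengthRewardTransform
```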
### Direct Token Usage
If you're generating tokens from a model, you can create actions directly:
```python
import torch

from envs.chat_env import ChatAction

# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])

# Create the action directly
action = ChatAction(tokens=generated_tokens)

# Step the environment
obs = env.step(action)
```
## Design Philosophy
ChatEnvironment is intentionally minimal and flexible:
- **No HTTP overhead**: Works directly with Python objects and tensors
- **Tokenizer ownership**: The environment handles tokenization consistently
- **Dual representation**: Maintains both human-readable messages and model-ready tokens
- **Transform support**: Extensible reward computation and observation modification
- **Type-safe**: Uses typed `Message`s compatible with the Hugging Face chat format
## Integration with Models
ChatEnvironment pairs naturally with language models:
```python
# Pseudo-code for an RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)

for episode in range(num_episodes):
    obs = env.reset()
    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step the environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
## Project Structure
```text
chat_env/
├── __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
├── README.md                # This file
├── client.py                # ChatEnv HTTP client
├── models.py                # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for the HTTP server
```
## Requirements

- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., Hugging Face `transformers`); a quick compatibility check is sketched below
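A minimal way to verify compatibility before constructing the environment (a sketch; `chat_template` is the attribute Hugging Face tokenizers use to store their configured template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# ChatEnvironment relies on the chat-template API; verify the tokenizer has it
assert hasattr(tokenizer, "apply_chat_template")

# Optionally inspect the configured template (None means no template is set)
print(tokenizer.chat_template)
```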
## Notes

- `ChatEnvironment` does not generate responses; it only manages conversation state
- You must provide tokens from your model or another source
- The environment is designed for single-threaded use and is not thread-safe
- For multi-turn conversations, alternate between user and assistant messages