BrowserGym Environment
BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.
Why BrowserGym?
BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
What are these benchmarks?
MiniWoB++ (Training): 100+ synthetic web tasks like “click this button”, “fill out this form”, “select from dropdown”. Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. No external setup needed - tasks run in isolated browser sessions.
WebArena (Evaluation): 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like “find the cheapest laptop and add to cart” or “create a merge request for bug #123”. Multistep, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. Requires running 7 backend services (shopping site, GitLab instance, etc.).
VisualWebArena: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
WorkArena: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
The training → evaluation pipeline:
Train on MiniWoB (simple, controlled, fast iterations)
Evaluate on WebArena (complex, realistic, measures real-world capability)
Key advantage: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
Quick Start - Training (MiniWoB)
No Setup Required! 🎉
from browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    }
)

# Train your agent!
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")
    done = False
    while not done:
        # Your agent decides what to do
        action_str = agent.get_action(result.observation.text)
        action = BrowserGymAction(action_str=action_str)
        result = env.step(action)
        done = result.done
        print(f"Reward: {result.reward}")

env.close()
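The loop above calls `agent.get_action(...)` without defining `agent`. As a minimal sketch of something that could fill that slot, the baseline below picks a random clickable element from the observation text. It assumes the text is an accessibility tree whose lines look like `[12] button 'Submit'`; that line format, the `RandomClickAgent` name, and the set of clickable roles are all assumptions for illustration, not part of the environment's API.

```python
import random
import re

# Assumed line format of the accessibility-tree text: "[12] button 'Submit'".
BID_PATTERN = re.compile(r"^\s*\[(\w+)\]\s+(\w+)", re.MULTILINE)

# Roles treated as clickable; a hypothetical choice for this sketch.
CLICKABLE_ROLES = {"button", "link", "checkbox", "radio", "tab", "option"}


class RandomClickAgent:
    """Baseline policy: click a random clickable element found in the text."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def get_action(self, observation_text: str) -> str:
        # Collect the ids of every element whose role looks clickable.
        candidates = [
            bid
            for bid, role in BID_PATTERN.findall(observation_text)
            if role in CLICKABLE_ROLES
        ]
        if not candidates:
            # Nothing recognisable to click, so fall back to scrolling.
            return "scroll('down')"
        return f"click('{self._rng.choice(candidates)}')"
```

With `agent = RandomClickAgent(seed=0)`, the training loop can run end to end before a learned policy exists, which gives a floor to compare later agents against.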
Available Tasks by Benchmark
MiniWoB++ Tasks (Training - 100+ tasks)
MiniWoB tasks are organized by difficulty and type. Here are the main categories:
Click Tasks (Basic interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| | Click a single button | ⭐ Easy |
| | Click button with specific text | ⭐ Easy |
| | Click buttons in order | ⭐⭐ Medium |
| | Select specific checkboxes | ⭐⭐ Medium |
| | Select checkboxes (multiple valid) | ⭐⭐ Medium |
| | Many checkboxes to select from | ⭐⭐ Medium |
| | Transfer learning variation | ⭐⭐ Medium |
| | Click correct button in dialog | ⭐ Easy |
| | More complex dialog | ⭐⭐ Medium |
| | Click on a link | ⭐ Easy |
| | Select from dropdown | ⭐⭐ Medium |
| | Click on pie chart slice | ⭐⭐ Medium |
| | Click item in scrollable list | ⭐⭐⭐ Hard |
| | Click on specific color shade | ⭐⭐ Medium |
| | Click on specific shape | ⭐⭐ Medium |
| | Switch between tabs | ⭐⭐ Medium |
| | More complex tab switching | ⭐⭐⭐ Hard |
| | Click on UI widget | ⭐⭐ Medium |
Text Entry Tasks (Typing and forms)
| Task Name | Description | Difficulty |
|---|---|---|
| | Type text into input field | ⭐ Easy |
| | Dynamic text entry | ⭐⭐ Medium |
| | Multiple text fields | ⭐⭐ Medium |
| | Fill password field | ⭐ Easy |
| | Enter a date | ⭐⭐ Medium |
| | Enter a time | ⭐⭐ Medium |
| | Complete login form | ⭐⭐ Medium |
| | Login via popup | ⭐⭐⭐ Hard |
Navigation Tasks (Multi-step interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| | Navigate through tree structure | ⭐⭐⭐ Hard |
| | Use search interface | ⭐⭐ Medium |
| | Interact with autocomplete | ⭐⭐⭐ Hard |
| | Book a flight (complex form) | ⭐⭐⭐⭐ Very Hard |
| | Pick date from calendar | ⭐⭐⭐ Hard |
| | Simplified date picker | ⭐⭐ Medium |
| | Medium difficulty date picker | ⭐⭐⭐ Hard |
| | Select from long list | ⭐⭐ Medium |
Visual/Spatial Tasks (Requires visual understanding)
| Task Name | Description | Difficulty |
|---|---|---|
| | Count sides of shape | ⭐⭐ Medium |
| | Count specific shapes | ⭐⭐ Medium |
| | Find word in text | ⭐⭐ Medium |
| | Focus on text element | ⭐ Easy |
| | More complex focus task | ⭐⭐ Medium |
| | Click grid coordinate | ⭐⭐ Medium |
| | Guess a number game | ⭐⭐⭐ Hard |
| | Identify shape type | ⭐⭐ Medium |
| | Extract info from table | ⭐⭐⭐ Hard |
| | More complex table reading | ⭐⭐⭐ Hard |
Email/Social Tasks (Realistic scenarios)
| Task Name | Description | Difficulty |
|---|---|---|
| | Manage email inbox | ⭐⭐⭐⭐ Very Hard |
| | Forward emails | ⭐⭐⭐⭐ Very Hard |
| | Natural language email task | ⭐⭐⭐⭐ Very Hard |
| | Star and reply to emails | ⭐⭐⭐⭐ Very Hard |
| | Social media interaction | ⭐⭐⭐⭐ Very Hard |
| | Partial social media task | ⭐⭐⭐ Hard |
Total: 100+ tasks across all categories
Usage:
# Easy task for quick testing
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})
# Medium difficulty for training
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})
# Hard task for evaluation
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})
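The three usage lines above suggest a simple easy-to-hard progression. A minimal sketch of that idea as a curriculum follows; the task names are the ones used above, while `CURRICULUM`, `miniwob_environment`, and `curriculum_environments` are hypothetical helpers introduced here for illustration.

```python
# Task names come from the usage lines above; the stage labels are illustrative.
CURRICULUM = [
    ("easy", "click-test"),
    ("medium", "click-checkboxes"),
    ("hard", "email-inbox"),
]


def miniwob_environment(task_name: str) -> dict:
    """Build the environment-variable mapping for one MiniWoB task."""
    return {
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": task_name,
    }


def curriculum_environments(up_to: str = "hard") -> list:
    """Return environment mappings for every stage up to and including `up_to`."""
    stages = [stage for stage, _ in CURRICULUM]
    if up_to not in stages:
        raise ValueError(f"unknown stage {up_to!r}; expected one of {stages}")
    cutoff = stages.index(up_to)
    return [miniwob_environment(task) for _, task in CURRICULUM[: cutoff + 1]]
```

Each returned mapping can be passed as the `environment=` argument when creating an environment, so an agent can be trained stage by stage.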
WebArena Tasks (Evaluation - 812 tasks)
WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.
By Website:
| Website | Task Count | Description | Example Tasks |
|---|---|---|---|
| Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
| Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
| Reddit | ~150 | Forum/social | Post, comment, search discussions |
| GitLab | ~200 | Code repository | Create issues, merge requests, review code |
| Wikipedia | ~100 | Knowledge base | Search, read, extract information |
| Map | ~12 | Location service | Find places, get directions |
By Difficulty:
| Difficulty | Task Count | Steps Required | Example |
|---|---|---|---|
| Easy | ~200 | 1-5 steps | “Find the price of product X” |
| Medium | ~400 | 5-15 steps | “Add cheapest laptop to cart” |
| Hard | ~212 | 15+ steps | “Create merge request for bug fix” |
Usage:
# Task 0 (usually easy)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "0",
    "SHOPPING": "http://your-server:7770",
    # ... other URLs
})

# Task 156 (GitLab merge request)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "156",
    # ... URLs
})
Note: WebArena tasks require the full backend infrastructure. See WebArena setup guide.
VisualWebArena Tasks (910 tasks)
Similar to WebArena but requires visual understanding. Tasks involve:
Image-based reasoning
Visual element identification
Multimodal interaction (text + images)
WorkArena Tasks
Enterprise software automation tasks:
CRM operations
Project management
Business workflows
Full task lists:
Evaluation (WebArena)
Prerequisites
WebArena requires setting up backend infrastructure. See the WebArena documentation.
Usage
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    }
)

# Evaluate your trained agent
result = env.reset()
while not result.done:
    # Pass the text observation, as in the MiniWoB training loop
    action_str = agent.get_action(result.observation.text)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)

print(f"Success: {result.reward}")
env.close()
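The seven backend URLs have to be repeated for every WebArena task, so it can help to generate them from a single host. The sketch below is one way to do that, reusing the ports and paths from the example above; `webarena_environment` is a hypothetical helper, and it assumes all services share one host, which may not match your deployment.

```python
WEBARENA_TASK_COUNT = 812  # tasks are numbered 0-811


def webarena_environment(task_id: int, host: str = "http://your-server") -> dict:
    """Build the environment mapping for one WebArena task.

    Assumes every backend service is reachable on the same host, using the
    ports shown in the example above.
    """
    if not 0 <= task_id < WEBARENA_TASK_COUNT:
        raise ValueError(f"task_id must be in 0..{WEBARENA_TASK_COUNT - 1}")
    wiki_path = "wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
    return {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": str(task_id),
        "SHOPPING": f"{host}:7770",
        "SHOPPING_ADMIN": f"{host}:7780/admin",
        "REDDIT": f"{host}:9999",
        "GITLAB": f"{host}:8023",
        "MAP": f"{host}:3000",
        "WIKIPEDIA": f"{host}:8888/{wiki_path}",
        "HOMEPAGE": f"{host}:4399",
    }
```

The range check catches an out-of-range task ID before a container is started, which is cheaper than discovering it after the browser has launched.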
Building the Docker Image
Prerequisites
Base Image: Build the OpenEnv base image first:
# From the OpenEnv repository root
docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .
Build the BrowserGym Environment
# From the browsergym_env directory
cd envs/browsergym_env
docker build -t browsergym-env:latest -f server/Dockerfile .
Run the Server
For MiniWoB (Training):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest
For WebArena (Evaluation):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest
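When the set of `-e` flags is generated by a script, building the `docker run` command as an argument list avoids shell-quoting mistakes. The following is a small sketch of that approach; `docker_run_command` is a hypothetical helper, and the image name and port simply mirror the commands above.

```python
import shlex


def docker_run_command(environment: dict,
                       image: str = "browsergym-env:latest",
                       port: int = 8000) -> list:
    """Return an argv list equivalent to the `docker run` commands above."""
    argv = ["docker", "run", "-p", f"{port}:{port}"]
    for name, value in environment.items():
        # One "-e NAME=value" pair per environment variable.
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


command = docker_run_command({
    "BROWSERGYM_BENCHMARK": "miniwob",
    "BROWSERGYM_TASK_NAME": "click-test",
})
# shlex.join renders the list as a copy-pasteable shell command.
printable = shlex.join(command)
```

The list form can be handed directly to `subprocess.run(command)`, while `printable` is useful for logging the exact command that was launched.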
Environment Details
Action
Actions in BrowserGym are strings written in a function-call syntax, each describing one browser operation:
from envs.browsergym_env import BrowserGymAction
# Click actions
action = BrowserGymAction(action_str="click('Submit button')")
action = BrowserGymAction(action_str="click('element_id_123')")
# Type actions
action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
action = BrowserGymAction(action_str="fill('password', 'secret123')")
# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")
# Keyboard actions
action = BrowserGymAction(action_str="press('Enter')")
action = BrowserGymAction(action_str="press('Tab')")
# Scroll actions
action = BrowserGymAction(action_str="scroll('down')")
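Because action strings embed quoted arguments, text that itself contains quotes can break a hand-written string. As a minimal sketch, the hypothetical helper `format_action` below builds the string with `repr()`, so every argument is emitted as a valid Python literal. Whether the server accepts every literal form `repr()` can produce is an assumption to verify against your setup.

```python
import ast


def format_action(name: str, *args) -> str:
    """Render an action string such as fill('username', 'john@example.com')."""
    return f"{name}({', '.join(repr(arg) for arg in args)})"


click = format_action("click", "Submit button")
fill = format_action("fill", "username", "john@example.com")
tricky = format_action("fill", "note", "it's done")

# Each result parses as a single Python call expression.
for action_str in (click, fill, tricky):
    ast.parse(action_str, mode="eval")
```

The resulting strings can be passed to `BrowserGymAction(action_str=...)` in the same way as the literal examples above.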
Observation
Observations contain multiple modalities:
result = env.step(action)
obs = result.observation
# Text observations
print(obs.text) # Primary text representation (AXTree or DOM)
print(obs.axtree_txt) # Accessibility tree
print(obs.pruned_html) # Pruned HTML (interactive elements only)
# Page metadata
print(obs.url) # Current URL
print(obs.goal) # Task goal/instruction
# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]
# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")
# Episode status
print(obs.done) # True if episode ended
print(obs.reward) # Reward for the step
# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"]) # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.)
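For a text-only agent, the fields above are typically folded into a single prompt. The sketch below shows one way to do that: it prefers `text`, falls back to `axtree_txt` and then `pruned_html`, and surfaces the previous action error. The helper names, the fallback order, and the `max_chars` truncation are assumptions; a `SimpleNamespace` stands in for a real observation so the example runs without a server.

```python
from types import SimpleNamespace


def best_text(obs, max_chars: int = 4000) -> str:
    """Return the first non-empty text representation, truncated."""
    for field in ("text", "axtree_txt", "pruned_html"):
        value = getattr(obs, field, "") or ""
        if value.strip():
            return value[:max_chars]
    return ""


def build_prompt(obs, max_chars: int = 4000) -> str:
    """Combine goal, URL, last error, and page text into one prompt string."""
    lines = [f"Goal: {obs.goal}", f"URL: {obs.url}"]
    if getattr(obs, "last_action_error", False):
        lines.append(f"Previous action failed: {obs.error}")
    lines.append(best_text(obs, max_chars))
    return "\n".join(lines)


# Stand-in observation with the same attribute names as shown above.
fake_obs = SimpleNamespace(
    text="",
    axtree_txt="[12] button 'Submit'",
    pruned_html="<button>Submit</button>",
    goal="Click Submit",
    url="http://localhost/task",
    last_action_error=True,
    error="timeout",
)
prompt = build_prompt(fake_obs)
```

Truncating the page text keeps prompts bounded on large pages; the limit of 4000 characters is arbitrary and should be tuned to the model in use.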
Advanced: Accessing Raw BrowserGym Data
For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in metadata:
result = env.step(action)
# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")
# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object
# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")
State
The environment state tracks progress:
state = env.state()
print(f"Benchmark: {state.benchmark}") # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}") # Task name/ID
print(f"Episode: {state.episode_id}") # Unique episode ID
print(f"Steps: {state.step_count}") # Number of steps taken
print(f"Total Reward: {state.cum_reward}") # Cumulative reward
print(f"Goal: {state.goal}") # Task instruction
print(f"URL: {state.current_url}") # Current page URL
Configuration
Environment variables:
Common Settings
BROWSERGYM_BENCHMARK: Benchmark to use (miniwob, webarena, visualwebarena, workarena)
BROWSERGYM_TASK_NAME: Specific task name (optional; the first available task is used if not set)
BROWSERGYM_HEADLESS: Run browser in headless mode (default: true)
BROWSERGYM_VIEWPORT_WIDTH: Browser viewport width (default: 1280)
BROWSERGYM_VIEWPORT_HEIGHT: Browser viewport height (default: 720)
BROWSERGYM_TIMEOUT: Action timeout in milliseconds (default: 10000)
WebArena-Specific (only needed for WebArena benchmark)
SHOPPING: Shopping website URL
SHOPPING_ADMIN: Shopping admin panel URL
REDDIT: Reddit-like forum URL
GITLAB: GitLab instance URL
MAP: Map service URL
WIKIPEDIA: Wikipedia instance URL
HOMEPAGE: Homepage URL
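To make the defaults above concrete, here is a sketch of how these variables could be read and validated in Python. It is not the server's actual parsing code: `BrowserGymSettings` and `load_settings` are hypothetical names, the fallback benchmark of `miniwob` is an assumption (no default is documented), and the accepted spellings of a true value are a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_BENCHMARKS = ("miniwob", "webarena", "visualwebarena", "workarena")


@dataclass(frozen=True)
class BrowserGymSettings:
    benchmark: str
    task_name: Optional[str]
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> BrowserGymSettings:
    """Read the common settings, applying the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: fall back to miniwob when no benchmark is set.
    benchmark = env.get("BROWSERGYM_BENCHMARK", "miniwob")
    if benchmark not in SUPPORTED_BENCHMARKS:
        raise ValueError(f"unsupported benchmark: {benchmark!r}")
    headless_raw = env.get("BROWSERGYM_HEADLESS", "true")
    return BrowserGymSettings(
        benchmark=benchmark,
        task_name=env.get("BROWSERGYM_TASK_NAME"),  # None means "first available"
        headless=headless_raw.strip().lower() in ("1", "true", "yes"),
        viewport_width=int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSERGYM_TIMEOUT", "10000")),
    )
```

Passing a plain dict instead of relying on `os.environ` makes the function easy to exercise in tests, for example `load_settings({"BROWSERGYM_HEADLESS": "false"})`.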
Supported Benchmarks
1. MiniWoB++ (Training) ✅ Recommended for Training
100+ tasks ranging from simple (click buttons) to complex (form filling, navigation)
Fast: Instant resets, quick episodes
Randomized: Task variations for generalization
No setup: Works out-of-the-box
Dense rewards: Immediate feedback for learning
Use Case: Train agents on fundamental web navigation skills
2. WebArena (Evaluation) 📊 Benchmark
812 realistic tasks across 6 websites
Complex: Multi-step reasoning, real web interfaces
Requires setup: Need to run 7 backend services
Sparse rewards: Binary success/failure
Evaluation-focused: Test real-world performance
Use Case: Evaluate agents on realistic web tasks
3. VisualWebArena (Evaluation) 👁️ Visual Benchmark
910 tasks requiring visual understanding
Multimodal: Both text and visual observations
Requires setup: Similar to WebArena
Challenging: Requires visual reasoning
Use Case: Test visual web navigation capabilities
4. WorkArena (Evaluation) 💼 Enterprise Benchmark
Enterprise tasks: CRM, project management, etc.
Realistic workflows: Real enterprise software
Requires setup: Enterprise software instances
Use Case: Evaluate on business automation tasks
Typical Training Pipeline
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    }
)

# Train your agent (RL, imitation learning, etc.)
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic)
eval_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        # ... WebArena URLs
    }
)

# Test performance
success_rate = agent.evaluate(eval_env, num_tasks=812)
print(f"WebArena Success Rate: {success_rate:.2%}")
eval_env.close()
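Since `BROWSERGYM_TASK_NAME` selects one task per environment, an evaluation over many task IDs presumably needs one environment per task. The sketch below captures only the bookkeeping: `evaluate_tasks` is a hypothetical helper that calls a user-supplied `run_task(task_id)` function, which is assumed to create the environment, run one episode, close it, and return the final reward. Treating any reward above zero as success follows the binary success/failure rewards described for WebArena.

```python
from typing import Callable, Iterable


def evaluate_tasks(task_ids: Iterable, run_task: Callable[[str], float]) -> float:
    """Return the fraction of tasks whose final reward is positive.

    `run_task` is supplied by the caller and is expected to run one full
    episode for the given task ID and return its final reward.
    """
    outcomes = [run_task(str(task_id)) > 0 for task_id in task_ids]
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def fake_run_task(task_id: str) -> float:
    """Stand-in runner for demonstration: even-numbered tasks succeed."""
    return 1.0 if int(task_id) % 2 == 0 else 0.0


demo_rate = evaluate_tasks(range(4), fake_run_task)
```

In a real run, `fake_run_task` would be replaced by a function that builds the WebArena environment for that task ID and steps the agent until `result.done`.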
Development & Testing
Running Tests
# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py
Local Development
# Install in development mode
cd /path/to/OpenEnv
pip install -e .
# Install BrowserGym
pip install browsergym browsergym-miniwob browsergym-webarena
# Run the server locally
cd envs/browsergym_env/server
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py
Project Structure
browsergym_env/
├── __init__.py                      # Module exports
├── models.py                        # Action, Observation, State dataclasses
├── client.py                        # HTTPEnvClient implementation
├── README.md                        # This file
└── server/
    ├── __init__.py
    ├── app.py                       # FastAPI application
    ├── browsergym_environment.py    # Environment implementation
    ├── Dockerfile                   # Container specification
    └── requirements.txt             # Python dependencies