# Reward Design
> **Note:** Coming soon: this page is under construction.
Learn how to design effective reward functions for your OpenEnv environments.
## Overview

Reward functions are critical for RL training: they signal to your agent what behaviors are desirable.
## Reward Design Principles

### 1. Start Simple
Begin with sparse rewards (success/failure) before adding shaped rewards:
```python
def compute_reward(observation, action, terminated):
    if terminated and observation.success:
        return 1.0
    elif terminated:
        return -1.0
    return 0.0
```
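As a sketch of how a sparse reward like this behaves across an episode, here is a self-contained version with a minimal `Observation` stand-in (the dataclass and its `success` field are illustrative assumptions, not the OpenEnv API):

```python
from dataclasses import dataclass


@dataclass
class Observation:
    success: bool  # hypothetical field: did the episode end in success?


def compute_reward(observation, action, terminated):
    # Sparse reward: only the terminal step carries a learning signal.
    if terminated and observation.success:
        return 1.0
    elif terminated:
        return -1.0
    return 0.0


# Mid-episode steps return 0.0; only the final step is scored.
print(compute_reward(Observation(success=False), action=None, terminated=False))  # 0.0
print(compute_reward(Observation(success=True), action=None, terminated=True))    # 1.0
```

Because every non-terminal step returns zero, the agent must reach at least one successful episode before any gradient signal appears, which is why sparse rewards are a clean but slow starting point.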
### 2. Shape Carefully
Add intermediate rewards to help learning, but avoid reward hacking:
```python
def compute_reward(observation, action, terminated):
    reward = 0.0

    # Progress reward
    reward += 0.1 * observation.progress_delta

    # Success bonus
    if terminated and observation.success:
        reward += 10.0

    return reward
```
### 3. Consider Density

Dense rewards (a signal at every step) speed up learning but can trap the agent in local optima. Sparse rewards are cleaner but slower to learn from.
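One standard way to add density without distorting the optimal policy is potential-based reward shaping, where the shaped term is the discounted difference of a state potential. A minimal sketch (the potential values here are hypothetical stand-ins for any state-quality estimate, such as task progress):

```python
def shaped_reward(base_reward, potential_prev, potential_curr, gamma=0.99):
    # Potential-based shaping: F = gamma * phi(s') - phi(s).
    # Adding F to the base reward leaves the optimal policy unchanged,
    # while giving the agent a dense per-step signal.
    return base_reward + gamma * potential_curr - potential_prev


# Moving from potential 0.2 to 0.5 yields a positive shaping bonus.
print(round(shaped_reward(0.0, 0.2, 0.5), 3))  # 0.295
```

Unlike ad-hoc progress bonuses, this form cannot be farmed by looping through the same states, since revisiting a state cancels its own potential.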
## Environment Examples

### Chess (Sparse)
```python
# Win: +1, Loss: -1, Draw: 0
reward = result.observation.game_result
```
### Coding (Dense)
```python
reward = 0.0
if observation.tests_passed > prev_tests_passed:
    reward += 0.5 * (observation.tests_passed - prev_tests_passed)
if observation.all_tests_passed:
    reward += 5.0
```
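The snippet above relies on tracking `prev_tests_passed` between steps. One way to do that is a small stateful tracker (the class name and observation fields are illustrative assumptions):

```python
class CodingRewardTracker:
    """Tracks test progress across steps so each newly passing test is rewarded once."""

    def __init__(self):
        self.prev_tests_passed = 0

    def compute(self, tests_passed, all_tests_passed):
        reward = 0.0
        if tests_passed > self.prev_tests_passed:
            # Reward only the newly passing tests, not the cumulative total,
            # so the agent cannot collect the same bonus twice.
            reward += 0.5 * (tests_passed - self.prev_tests_passed)
        if all_tests_passed:
            reward += 5.0
        self.prev_tests_passed = tests_passed
        return reward


tracker = CodingRewardTracker()
print(tracker.compute(tests_passed=2, all_tests_passed=False))  # 1.0
print(tracker.compute(tests_passed=2, all_tests_passed=False))  # 0.0 (no new tests)
print(tracker.compute(tests_passed=5, all_tests_passed=True))   # 6.5
```

Resetting the tracker at episode boundaries matters; carrying `prev_tests_passed` across episodes would silently suppress rewards in the next rollout.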
### TextArena (Mixed)
```python
# Per-turn progress + final outcome
reward = observation.score_delta + (10.0 if observation.won else 0.0)
```
## Common Pitfalls

- **Reward hacking**: the agent finds unintended shortcuts that maximize reward without solving the task.
- **Overly sparse rewards**: the agent never stumbles onto a positive signal, so learning stalls.
- **Conflicting signals**: mixed incentives pull the policy in different directions and confuse learning.
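One cheap guard against the first two pitfalls is to clip the shaped term so accumulated progress can never outweigh the terminal outcome. A sketch, assuming a `progress_delta`-style observation field as in the shaping example above (the cap and bonus values are illustrative):

```python
def guarded_reward(progress_delta, terminated, success,
                   shaping_cap=0.2, success_bonus=10.0):
    # Clip the per-step shaping term so a policy that farms "progress"
    # without ever finishing cannot out-earn one that actually succeeds.
    shaping = max(-shaping_cap, min(shaping_cap, 0.1 * progress_delta))
    if terminated:
        return shaping + (success_bonus if success else -1.0)
    return shaping


# A huge progress spike is capped rather than rewarded proportionally.
print(guarded_reward(progress_delta=50.0, terminated=False, success=False))  # 0.2 (clipped)
```

With the cap in place, the maximum shaping reward over a T-step episode is `T * shaping_cap`, which you can size against the success bonus to keep the terminal signal dominant.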
## Next Steps

- RL Framework Integration: use rewards in training
- Environment Anatomy: where to implement rewards