
Reward Design

Note

Coming Soon: This page is under construction.

Learn how to design effective reward functions for your OpenEnv environments.

Overview

Reward functions are critical for RL training: they signal to your agent which behaviors are desirable.

Reward Design Principles

1. Start Simple

Begin with sparse rewards (success/failure) before adding shaped rewards:

def compute_reward(observation, action, terminated):
    # Sparse signal: nonzero reward only when the episode ends.
    if terminated and observation.success:
        return 1.0   # task solved
    elif terminated:
        return -1.0  # episode ended without success
    return 0.0       # no intermediate signal

2. Shape Carefully

Add intermediate rewards to help learning, but avoid reward hacking:

def compute_reward(observation, action, terminated):
    reward = 0.0

    # Progress reward
    reward += 0.1 * observation.progress_delta

    # Success bonus
    if terminated and observation.success:
        reward += 10.0

    return reward

3. Consider Density

Dense rewards (signal on every step) speed up learning but can trap the agent in local optima. Sparse rewards are cleaner to specify but make learning slower, since the agent must first stumble on success through exploration.
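One way to get a dense signal without changing which policy is optimal is potential-based shaping. A minimal sketch, assuming a hypothetical observation with a `distance_to_goal` field (the field name and dict-style access are illustrative, not part of the OpenEnv API):

```python
def shaped_reward(sparse_reward, prev_obs, obs, gamma=0.99):
    """Potential-based shaping: F = gamma * phi(s') - phi(s).

    Adding F to the sparse reward densifies the signal while
    preserving the optimal policy (the shaping theorem), because
    the potential terms telescope over any trajectory.
    """
    # Hypothetical potential: negative distance to the goal,
    # so moving closer yields a positive shaped term.
    phi_prev = -prev_obs["distance_to_goal"]
    phi_curr = -obs["distance_to_goal"]
    return sparse_reward + gamma * phi_curr - phi_prev
```

With `gamma = 1.0`, a step that closes 2.0 units of distance earns a shaped bonus of exactly 2.0 on top of the sparse reward.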

Environment Examples

Chess (Sparse)

# Win: +1, Loss: -1, Draw: 0
reward = result.observation.game_result

Coding (Dense)

reward = 0.0
# prev_tests_passed is the count from the previous step,
# tracked by the caller; reward only newly passing tests.
if observation.tests_passed > prev_tests_passed:
    reward += 0.5 * (observation.tests_passed - prev_tests_passed)
# Bonus once the full suite passes
if observation.all_tests_passed:
    reward += 5.0

TextArena (Mixed)

# Per-turn progress + final outcome
reward = observation.score_delta + (10.0 if observation.won else 0.0)

Common Pitfalls

  1. Reward hacking - the agent maximizes reward through unintended shortcuts instead of the behavior you wanted

  2. Overly sparse rewards - the agent never encounters a positive signal to learn from

  3. Conflicting signals - shaped terms that pull in different directions confuse learning
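A common guard against the first pitfall is to clip each shaped term and cap the cumulative shaped reward so it can never outweigh the terminal success bonus. A minimal sketch, with illustrative parameter names and caps:

```python
def compute_capped_reward(progress_delta, shaped_so_far,
                          terminated, success,
                          step_cap=0.1, total_cap=5.0,
                          success_bonus=10.0):
    """Bound shaping so hacking the progress signal cannot pay
    more than actually succeeding.

    Returns the step reward and the updated cumulative shaped
    total, which the caller carries across steps.
    """
    # Clip the per-step shaped term to [-step_cap, step_cap].
    step = max(-step_cap, min(step_cap, progress_delta))
    # Respect the remaining shaping budget (caps only the
    # positive side; penalties pass through).
    step = min(step, total_cap - shaped_so_far)
    reward = step
    if terminated and success:
        reward += success_bonus
    return reward, shaped_so_far + step
```

Because the shaped total is bounded by `total_cap` (here 5.0) and the success bonus is 10.0, looping on progress can never beat finishing the task.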

Next Steps