# Reward Design

:::{note}
Coming Soon
    This page is under construction.

Learn how to design

Learn how to design effective reward functions for your OpenEnv environments.

## Overview

Reward functions are critical for RL training. They signal to your agent what behaviors are desirable.

## Reward Design Principles

### 1. Start Simple

Begin with sparse rewards (success/failure) before adding shaped rewards:

```python
def compute_reward(observation, action, terminated):
    if terminated and observation.success:
        return 1.0
    elif terminated:
        return -1.0
    return 0.0
```

### 2. Shape Carefully

Add intermediate rewards to help learning, but avoid reward hacking:

```python
def compute_reward(observation, action, terminated):
    reward = 0.0

    # Progress reward
    reward += 0.1 * observation.progress_delta

    # Success bonus
    if terminated and observation.success:
        reward += 10.0

    return reward
```

### 3. Consider Density

Dense rewards (every step) speed learning but can cause local optima. Sparse rewards are cleaner but slower.

## Environment Examples

### Chess (Sparse)

```python
# Win: +1, Loss: -1, Draw: 0
reward = result.observation.game_result
```

### Coding (Dense)

```python
reward = 0.0
if observation.tests_passed > prev_tests_passed:
    reward += 0.5 * (observation.tests_passed - prev_tests_passed)
if observation.all_tests_passed:
    reward += 5.0
```

### TextArena (Mixed)

```python
# Per-turn progress + final outcome
reward = observation.score_delta + (10.0 if observation.won else 0.0)
```

## Common Pitfalls

1. **Reward hacking** - Agent finds unintended shortcuts
2. **Sparse rewards** - Agent never finds positive signal
3. **Conflicting signals** - Mixed incentives confuse learning

## Next Steps

- [RL Framework Integration](rl-integration.md) - Use rewards in training
- [Environment Anatomy](environment-anatomy.md) - Where to implement rewards
:::