FinRL Environment#

A wrapper around FinRL stock trading environments that conforms to the OpenEnv specification.

Overview#

This environment enables reinforcement learning for stock trading tasks using FinRL's powerful StockTradingEnv, exposed through OpenEnv's simple HTTP API. It supports:

  • Stock Trading: Buy/sell actions across multiple stocks

  • Portfolio Management: Track balance, holdings, and portfolio value

  • Technical Indicators: MACD, RSI, CCI, DX, and more

  • Flexible Configuration: Custom data sources and trading parameters

Quick Start#

1. Build the Docker Image#

First, build the base image (from OpenEnv root):

cd OpenEnv
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .

Then build the FinRL environment image:

docker build -t finrl-env:latest -f envs/finrl_env/server/Dockerfile .

2. Run the Server#

Option A: With Default Sample Data#

docker run -p 8000:8000 finrl-env:latest

This starts the server with synthetic sample data for testing.
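
Before training anything, it helps to confirm the container is actually reachable. A minimal check in Python (this assumes the FastAPI server keeps its default interactive docs at /docs; if they are disabled, calling the client's reset() is an equally good probe):

import urllib.request

# Expect HTTP 200 if the container is up and listening on port 8000
response = urllib.request.urlopen("http://localhost:8000/docs")
print(response.status)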

Option B: With Custom Configuration#

Create a configuration file config.json:

{
  "data_path": "/data/stock_data.csv",
  "stock_dim": 3,
  "hmax": 100,
  "initial_amount": 100000,
  "num_stock_shares": [0, 0, 0],
  "buy_cost_pct": [0.001, 0.001, 0.001],
  "sell_cost_pct": [0.001, 0.001, 0.001],
  "reward_scaling": 0.0001,
  "state_space": 25,
  "action_space": 3,
  "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"]
}

Run with configuration:

docker run -p 8000:8000 \
  -v $(pwd)/config.json:/config/config.json \
  -v $(pwd)/data:/data \
  -e FINRL_CONFIG_PATH=/config/config.json \
  finrl-env:latest

3. Use the Client#

from envs.finrl_env import FinRLEnv, FinRLAction
import numpy as np

# Connect to server
client = FinRLEnv(base_url="http://localhost:8000")

# Get configuration
config = client.get_config()
print(f"Trading {config['stock_dim']} stocks")
print(f"Initial capital: ${config['initial_amount']:,.0f}")

# Reset environment
result = client.reset()
print(f"Initial portfolio value: ${result.observation.portfolio_value:,.2f}")

# Trading loop
for step in range(100):
    # Get current state
    state = result.observation.state

    # Your RL policy here (example: random actions)
    num_stocks = config['stock_dim']
    actions = np.random.uniform(-1, 1, size=num_stocks).tolist()

    # Execute action
    result = client.step(FinRLAction(actions=actions))

    print(f"Step {step}: Portfolio=${result.observation.portfolio_value:,.2f}, "
          f"Reward={result.reward:.2f}")

    if result.done:
        print("Episode finished!")
        break

client.close()

Architecture#

┌──────────────────────────────────────────────────────────────┐
│                    RL Training Framework                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ Policy Net   │  │ Value Net    │  │ Replay       │        │
│  │ (PyTorch)    │  │ (PyTorch)    │  │ Buffer       │        │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘        │
│         └─────────────────┴─────────────────┘                │
│                           │                                  │
│                  ┌────────▼────────┐                         │
│                  │ FinRLEnv        │ ← HTTP Client           │
│                  │ (HTTPEnvClient) │                         │
│                  └────────┬────────┘                         │
└───────────────────────────┼──────────────────────────────────┘
                            │ HTTP (JSON)
                   ┌────────▼────────┐
                   │ Docker Container│
                   │  Port: 8000     │
                   │                 │
                   │ ┌─────────────┐ │
                   │ │ FastAPI     │ │
                   │ │ Server      │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │ FinRL       │ │
                   │ │ Environment │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │ FinRL       │ │
                   │ │ StockTrading│ │
                   │ │ Env         │ │
                   │ └─────────────┘ │
                   └─────────────────┘

API Reference#

FinRLAction#

Trading action for the environment.

Attributes:

  • actions: list[float] - Array of normalized action values (-1 to 1) for each stock

    • Positive values: Buy

    • Negative values: Sell

    • Magnitude: Relative trade size

Example:

# Buy stock 0, sell stock 1, hold stock 2
action = FinRLAction(actions=[0.5, -0.3, 0.0])
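
Policy networks often emit values outside the normalized range, so it is common to clip raw outputs before building the action. A small sketch (the clipping step is a usage convention, not part of the FinRLAction API):

import numpy as np
from envs.finrl_env import FinRLAction

# Clip raw policy outputs into the normalized [-1, 1] range
raw_output = np.array([1.7, -0.2, 0.4])
action = FinRLAction(actions=np.clip(raw_output, -1.0, 1.0).tolist())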

FinRLObservation#

Observation returned by the environment.

Attributes:

  • state: list[float] - Flattened state vector

    • Structure: [balance, prices..., holdings..., indicators...]

  • portfolio_value: float - Total portfolio value (cash + holdings)

  • date: str - Current trading date

  • done: bool - Whether episode has ended

  • reward: float - Reward for the last action

  • metadata: dict - Additional information

Example:

obs = result.observation
print(f"Portfolio: ${obs.portfolio_value:,.2f}")
print(f"Date: {obs.date}")
print(f"State dimension: {len(obs.state)}")

Client Methods#

reset() -> StepResult[FinRLObservation]#

Reset the environment to start a new episode.

result = client.reset()

step(action: FinRLAction) -> StepResult[FinRLObservation]#

Execute a trading action.

action = FinRLAction(actions=[0.5, -0.3])
result = client.step(action)

state() -> State#

Get episode metadata (episode_id, step_count).

state = client.state()
print(f"Episode: {state.episode_id}, Step: {state.step_count}")

get_config() -> dict#

Get environment configuration.

config = client.get_config()
print(config['stock_dim'])
print(config['initial_amount'])

Data Format#

The environment expects stock data in the following CSV format:

| date | tic | close | high | low | open | volume | macd | rsi_30 | cci_30 | dx_30 |
|------|-----|-------|------|-----|------|--------|------|--------|--------|-------|
| 2020-01-01 | AAPL | 100.0 | 102.0 | 98.0 | 99.0 | 1000000 | 0.5 | 55.0 | 10.0 | 15.0 |
| 2020-01-01 | GOOGL | 1500.0 | 1520.0 | 1480.0 | 1490.0 | 500000 | -0.3 | 48.0 | -5.0 | 20.0 |

Required columns:

  • date: Trading date

  • tic: Stock ticker symbol

  • close, high, low, open: Price data

  • volume: Trading volume

  • Technical indicators (as specified in tech_indicator_list)
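
For a quick smoke test you can synthesize a minimal conforming file with pandas, using the sample rows from the table above (values are illustrative only):

import pandas as pd

# Two sample rows covering every required column
rows = [
    {"date": "2020-01-01", "tic": "AAPL", "close": 100.0, "high": 102.0,
     "low": 98.0, "open": 99.0, "volume": 1000000,
     "macd": 0.5, "rsi_30": 55.0, "cci_30": 10.0, "dx_30": 15.0},
    {"date": "2020-01-01", "tic": "GOOGL", "close": 1500.0, "high": 1520.0,
     "low": 1480.0, "open": 1490.0, "volume": 500000,
     "macd": -0.3, "rsi_30": 48.0, "cci_30": -5.0, "dx_30": 20.0},
]
pd.DataFrame(rows).to_csv("stock_data.csv", index=False)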

Configuration Parameters#

| Parameter | Type | Description |
|-----------|------|-------------|
| data_path | str | Path to CSV file with stock data |
| stock_dim | int | Number of stocks to trade |
| hmax | int | Maximum shares per trade |
| initial_amount | int | Starting cash balance |
| num_stock_shares | list[int] | Initial holdings for each stock |
| buy_cost_pct | list[float] | Transaction cost for buying (per stock) |
| sell_cost_pct | list[float] | Transaction cost for selling (per stock) |
| reward_scaling | float | Scaling factor for rewards |
| state_space | int | Dimension of state vector |
| action_space | int | Dimension of action space |
| tech_indicator_list | list[str] | Technical indicators to include |
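
Several of these parameters have to agree with one another. A hedged sanity check (the consistency rules follow from the descriptions above, not from any documented validator):

import json

with open("config.json") as f:
    cfg = json.load(f)

n = cfg["stock_dim"]
# Each per-stock list needs one entry per traded stock
for key in ("num_stock_shares", "buy_cost_pct", "sell_cost_pct"):
    assert len(cfg[key]) == n, f"{key} must have {n} entries"
# One normalized action value per stock
assert cfg["action_space"] == n, "action_space should equal stock_dim"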

Integration with RL Frameworks#

Stable Baselines 3#

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

from envs.finrl_env import FinRLEnv, FinRLAction

# Gymnasium-compatible wrapper so SB3 can drive the remote environment
class SB3FinRLWrapper(gym.Env):
    def __init__(self, base_url):
        super().__init__()
        self.env = FinRLEnv(base_url=base_url)
        config = self.env.get_config()
        self.action_space = spaces.Box(
            low=-1, high=1,
            shape=(config['action_space'],),
            dtype=np.float32
        )
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(config['state_space'],),
            dtype=np.float32
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        result = self.env.reset()
        obs = np.array(result.observation.state, dtype=np.float32)
        return obs, result.observation.metadata or {}

    def step(self, action):
        result = self.env.step(FinRLAction(actions=action.tolist()))
        obs = np.array(result.observation.state, dtype=np.float32)
        # Gymnasium API: (obs, reward, terminated, truncated, info)
        return (
            obs,
            result.reward or 0.0,
            result.done,  # terminated
            False,        # truncated (not reported separately by the server)
            result.observation.metadata or {}
        )

# Train
env = SB3FinRLWrapper("http://localhost:8000")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
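
Because the wrapper subclasses gym.Env, recent Stable Baselines 3 releases (which expect the Gymnasium API) can consume it directly; when learn() is called, SB3 wraps the single environment in a DummyVecEnv on its own.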

Troubleshooting#

Server won't start#

  1. Check if base image exists:

    docker images | grep envtorch-base
    
  2. Build base image if missing:

    docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
    

Import errors#

Make sure you're in the src directory:

cd OpenEnv/src
python -c "from envs.finrl_env import FinRLEnv"

Configuration errors#

Verify your data file has all required columns:

import pandas as pd
df = pd.read_csv('your_data.csv')
print(df.columns.tolist())
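
To go a step further, compare the file against the columns the environment actually needs (the required set here is assembled from the list in the Data Format section plus your configured indicators):

import pandas as pd

df = pd.read_csv('your_data.csv')
required = {"date", "tic", "close", "high", "low", "open", "volume"}
required |= set(config["tech_indicator_list"])  # e.g. from client.get_config()

missing = required - set(df.columns)
print("Missing columns:", sorted(missing) if missing else "none")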

Examples#

See the examples/ directory for complete examples:

  • examples/finrl_simple.py - Basic usage

  • examples/finrl_training.py - Full training loop with PPO

  • examples/finrl_backtesting.py - Backtesting a trained agent

License#

BSD 3-Clause License (see LICENSE file in repository root)
