FinRL Environment#
A wrapper around FinRL stock trading environments that conforms to the OpenEnv specification.
Overview#
This environment enables reinforcement learning for stock trading tasks using FinRL's StockTradingEnv, exposed through OpenEnv's simple HTTP API. It supports:
- Stock Trading: Buy/sell actions across multiple stocks
- Portfolio Management: Track balance, holdings, and portfolio value
- Technical Indicators: MACD, RSI, CCI, DX, and more
- Flexible Configuration: Custom data sources and trading parameters
Quick Start#
1. Build the Docker Image#
First, build the base image (from OpenEnv root):
cd OpenEnv
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
Then build the FinRL environment image:
docker build -t finrl-env:latest -f envs/finrl_env/server/Dockerfile .
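Before moving on, you can confirm both images were built:
docker images | grep -E "envtorch-base|finrl-env"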
2. Run the Server#
Option A: With Default Sample Data#
docker run -p 8000:8000 finrl-env:latest
This starts the server with synthetic sample data for testing.
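To confirm the container is actually serving requests, you can fetch its configuration with the client described later in this guide (a quick smoke test; assumes the OpenEnv repo root is on your PYTHONPATH):
from envs.finrl_env import FinRLEnv

client = FinRLEnv(base_url="http://localhost:8000")
print(client.get_config())  # prints the default sample configuration
client.close()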
Option B: With Custom Configuration#
Create a configuration file config.json:
{
  "data_path": "/data/stock_data.csv",
  "stock_dim": 3,
  "hmax": 100,
  "initial_amount": 100000,
  "num_stock_shares": [0, 0, 0],
  "buy_cost_pct": [0.001, 0.001, 0.001],
  "sell_cost_pct": [0.001, 0.001, 0.001],
  "reward_scaling": 0.0001,
  "state_space": 25,
  "action_space": 3,
  "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"]
}
Run with configuration:
docker run -p 8000:8000 \
-v $(pwd)/config.json:/config/config.json \
-v $(pwd)/data:/data \
-e FINRL_CONFIG_PATH=/config/config.json \
finrl-env:latest
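Note that data_path in config.json is a path inside the container: the -v mounts above map your local config.json and data/ directory to /config and /data, so /data/stock_data.csv resolves to ./data/stock_data.csv on the host.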
3. Use the Client#
from envs.finrl_env import FinRLEnv, FinRLAction
import numpy as np
# Connect to server
client = FinRLEnv(base_url="http://localhost:8000")
# Get configuration
config = client.get_config()
print(f"Trading {config['stock_dim']} stocks")
print(f"Initial capital: ${config['initial_amount']:,.0f}")
# Reset environment
result = client.reset()
print(f"Initial portfolio value: ${result.observation.portfolio_value:,.2f}")
# Trading loop
for step in range(100):
    # Get current state
    state = result.observation.state

    # Your RL policy here (example: random actions)
    num_stocks = config['stock_dim']
    actions = np.random.uniform(-1, 1, size=num_stocks).tolist()

    # Execute action
    result = client.step(FinRLAction(actions=actions))
    print(f"Step {step}: Portfolio=${result.observation.portfolio_value:,.2f}, "
          f"Reward={result.reward:.2f}")

    if result.done:
        print("Episode finished!")
        break
client.close()
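Random actions are only a placeholder; a real policy maps the state to actions. Below is a hypothetical rule-based sketch (not part of the API) that buys stocks whose price rose since the previous step and sells those that fell, assuming the documented [balance, prices..., holdings..., indicators...] state layout:
from envs.finrl_env import FinRLEnv, FinRLAction

client = FinRLEnv(base_url="http://localhost:8000")
config = client.get_config()
stock_dim = config['stock_dim']

def naive_policy(state, prev_prices):
    # Documented state layout: [balance, prices..., holdings..., indicators...]
    prices = state[1:1 + stock_dim]
    # Hypothetical momentum rule: buy what rose since the last step, sell what fell
    actions = [0.2 if p > q else -0.2 for p, q in zip(prices, prev_prices)]
    return actions, prices

result = client.reset()
prev = result.observation.state[1:1 + stock_dim]
while not result.done:
    actions, prev = naive_policy(result.observation.state, prev)
    result = client.step(FinRLAction(actions=actions))
client.close()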
Architecture#
┌──────────────────────────────────────────────────────────────┐
│                   RL Training Framework                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │  Policy Net  │  │  Value Net   │  │   Replay     │        │
│  │  (PyTorch)   │  │  (PyTorch)   │  │   Buffer     │        │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘        │
│         └─────────────────┼─────────────────┘                │
│                  ┌────────▼────────┐                         │
│                  │    FinRLEnv     │  ← HTTP Client          │
│                  │ (HTTPEnvClient) │                         │
│                  └────────┬────────┘                         │
└───────────────────────────┼──────────────────────────────────┘
                            │ HTTP (JSON)
                   ┌────────▼────────┐
                   │ Docker Container│
                   │   Port: 8000    │
                   │                 │
                   │ ┌─────────────┐ │
                   │ │  FastAPI    │ │
                   │ │  Server     │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │   FinRL     │ │
                   │ │ Environment │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │   FinRL     │ │
                   │ │StockTrading │ │
                   │ │    Env      │ │
                   │ └─────────────┘ │
                   └─────────────────┘
API Reference#
FinRLAction#
Trading action for the environment.
Attributes:
- actions: list[float] - Array of normalized action values (-1 to 1), one per stock
  - Positive values: buy
  - Negative values: sell
  - Magnitude: relative trade size
Example:
# Buy stock 0, sell stock 1, hold stock 2
action = FinRLAction(actions=[0.5, -0.3, 0.0])
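The magnitude is interpreted relative to the hmax configuration value: in FinRL's StockTradingEnv the normalized action is typically scaled by hmax to obtain a share count, so with hmax=100 an action of 0.5 corresponds to trading on the order of 50 shares. Treat the exact scaling as a detail of the underlying environment.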
FinRLObservation#
Observation returned by the environment.
Attributes:
- state: list[float] - Flattened state vector with structure: [balance, prices..., holdings..., indicators...]
- portfolio_value: float - Total portfolio value (cash + holdings)
- date: str - Current trading date
- done: bool - Whether the episode has ended
- reward: float - Reward for the last action
- metadata: dict - Additional information
Example:
obs = result.observation
print(f"Portfolio: ${obs.portfolio_value:,.2f}")
print(f"Date: {obs.date}")
print(f"State dimension: {len(obs.state)}")
Client Methods#
reset() -> StepResult[FinRLObservation]#
Reset the environment to start a new episode.
result = client.reset()
step(action: FinRLAction) -> StepResult[FinRLObservation]#
Execute a trading action.
action = FinRLAction(actions=[0.5, -0.3])
result = client.step(action)
state() -> State#
Get episode metadata (episode_id, step_count).
state = client.state()
print(f"Episode: {state.episode_id}, Step: {state.step_count}")
get_config() -> dict#
Get environment configuration.
config = client.get_config()
print(config['stock_dim'])
print(config['initial_amount'])
Data Format#
The environment expects stock data in the following CSV format:
| date | tic | close | high | low | open | volume | macd | rsi_30 | cci_30 | dx_30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-01 | AAPL | 100.0 | 102.0 | 98.0 | 99.0 | 1000000 | 0.5 | 55.0 | 10.0 | 15.0 |
| 2020-01-01 | GOOGL | 1500.0 | 1520.0 | 1480.0 | 1490.0 | 500000 | -0.3 | 48.0 | -5.0 | 20.0 |
Required columns:
- date: Trading date
- tic: Stock ticker symbol
- close, high, low, open: Price data
- volume: Trading volume
- Technical indicators (as specified in tech_indicator_list)
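For quick experiments, a minimal file in this format can be generated with pandas (purely synthetic values; the indicator columns here are placeholders, whereas real data should carry properly computed indicators):
import pandas as pd

rows = []
for date in ["2020-01-01", "2020-01-02"]:
    for tic, base in [("AAPL", 100.0), ("GOOGL", 1500.0)]:
        rows.append({
            "date": date, "tic": tic,
            "close": base, "high": base * 1.02, "low": base * 0.98,
            "open": base * 0.99, "volume": 1_000_000,
            # Placeholder indicator values for testing only
            "macd": 0.0, "rsi_30": 50.0, "cci_30": 0.0, "dx_30": 10.0,
        })
pd.DataFrame(rows).to_csv("stock_data.csv", index=False)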
Configuration Parameters#
| Parameter | Type | Description |
|---|---|---|
| data_path | str | Path to CSV file with stock data |
| stock_dim | int | Number of stocks to trade |
| hmax | int | Maximum shares per trade |
| initial_amount | int | Starting cash balance |
| num_stock_shares | list[int] | Initial holdings for each stock |
| buy_cost_pct | list[float] | Transaction cost for buying (per stock) |
| sell_cost_pct | list[float] | Transaction cost for selling (per stock) |
| reward_scaling | float | Scaling factor for rewards |
| state_space | int | Dimension of state vector |
| action_space | int | Dimension of action space |
| tech_indicator_list | list[str] | Technical indicators to include |
Integration with RL Frameworks#
Stable-Baselines3#
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

from envs.finrl_env import FinRLAction, FinRLEnv

# Custom Gymnasium wrapper so SB3 can drive the HTTP environment
class SB3FinRLWrapper(gym.Env):
    def __init__(self, base_url):
        super().__init__()
        self.env = FinRLEnv(base_url=base_url)
        config = self.env.get_config()
        self.action_space = spaces.Box(
            low=-1, high=1,
            shape=(config['action_space'],),
            dtype=np.float32
        )
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(config['state_space'],),
            dtype=np.float32
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        result = self.env.reset()
        return np.array(result.observation.state, dtype=np.float32), {}

    def step(self, action):
        result = self.env.step(FinRLAction(actions=action.tolist()))
        obs = np.array(result.observation.state, dtype=np.float32)
        # Gymnasium API: (obs, reward, terminated, truncated, info)
        return (
            obs,
            result.reward or 0.0,
            result.done,
            False,
            result.observation.metadata
        )

# Train
env = SB3FinRLWrapper("http://localhost:8000")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
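Once trained, the usual SB3 prediction loop runs against the same wrapper (a minimal sketch; deterministic=True picks the policy's mode rather than sampling):
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated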
Troubleshooting#
Server won't start#
Check if base image exists:
docker images | grep envtorch-base
Build base image if missing:
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
Import errors#
Make sure you're in the src directory:
cd OpenEnv/src
python -c "from envs.finrl_env import FinRLEnv"
Configuration errors#
Verify your data file has all required columns:
import pandas as pd
df = pd.read_csv('your_data.csv')
print(df.columns.tolist())
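To compare against the required schema directly (the base column set from the Data Format section; extend it with your configured indicators):
required = {"date", "tic", "open", "high", "low", "close", "volume"}
required |= {"macd", "rsi_30", "cci_30", "dx_30"}  # your tech_indicator_list
missing = required - set(df.columns)
print("Missing columns:", sorted(missing) if missing else "none")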
Examples#
See the examples/ directory for complete examples:
- examples/finrl_simple.py - Basic usage
- examples/finrl_training.py - Full training loop with PPO
- examples/finrl_backtesting.py - Backtesting a trained agent
License#
BSD 3-Clause License (see LICENSE file in repository root)