# FinRL Environment
A wrapper around FinRL stock trading environments that conforms to the OpenEnv specification.
## Overview
This environment enables reinforcement learning for stock trading tasks using FinRL's `StockTradingEnv`, exposed through OpenEnv's simple HTTP API. It supports:
- Stock Trading: Buy/sell actions across multiple stocks
- Portfolio Management: Track balance, holdings, and portfolio value
- Technical Indicators: MACD, RSI, CCI, DX, and more
- Flexible Configuration: Custom data sources and trading parameters
## Quick Start
### 1. Build the Docker Image
First, build the base image (from OpenEnv root):
```bash
cd OpenEnv
docker build -t envtorch-base:latest -f src/core/containers/images/Dockerfile .
```
Then build the FinRL environment image:
```bash
docker build -t finrl-env:latest -f src/envs/finrl_env/server/Dockerfile .
```
### 2. Run the Server
#### Option A: With Default Sample Data

```bash
docker run -p 8000:8000 finrl-env:latest
```
This starts the server with synthetic sample data for testing.
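Once the container is running, a quick smoke test from Python confirms the server is reachable (a minimal sketch using the client described below):

```python
# Smoke test: fetch the environment configuration from the running server.
from envs.finrl_env import FinRLEnv

client = FinRLEnv(base_url="http://localhost:8000")
print(client.get_config())  # should show the sample-data configuration
client.close()
```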
#### Option B: With Custom Configuration
Create a configuration file `config.json`:
```json
{
  "data_path": "/data/stock_data.csv",
  "stock_dim": 3,
  "hmax": 100,
  "initial_amount": 100000,
  "num_stock_shares": [0, 0, 0],
  "buy_cost_pct": [0.001, 0.001, 0.001],
  "sell_cost_pct": [0.001, 0.001, 0.001],
  "reward_scaling": 0.0001,
  "state_space": 25,
  "action_space": 3,
  "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"]
}
```
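Note that the per-stock lists (`num_stock_shares`, `buy_cost_pct`, `sell_cost_pct`) need one entry per stock, i.e. length `stock_dim`. A minimal sketch for generating such a file programmatically (the `make_config` helper is illustrative, not part of the package; `state_space` is copied from the example above):

```python
# Sketch: write a config.json for N stocks with uniform transaction costs.
import json

def make_config(stock_dim: int, data_path: str) -> dict:
    # Hypothetical helper, not part of the package.
    return {
        "data_path": data_path,
        "stock_dim": stock_dim,
        "hmax": 100,
        "initial_amount": 100_000,
        "num_stock_shares": [0] * stock_dim,    # one entry per stock
        "buy_cost_pct": [0.001] * stock_dim,    # one entry per stock
        "sell_cost_pct": [0.001] * stock_dim,   # one entry per stock
        "reward_scaling": 1e-4,
        "state_space": 25,                      # from the example; depends on your data
        "action_space": stock_dim,
        "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"],
    }

with open("config.json", "w") as f:
    json.dump(make_config(3, "/data/stock_data.csv"), f, indent=2)
```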
Run with configuration:
```bash
docker run -p 8000:8000 \
  -v $(pwd)/config.json:/config/config.json \
  -v $(pwd)/data:/data \
  -e FINRL_CONFIG_PATH=/config/config.json \
  finrl-env:latest
```
### 3. Use the Client
```python
from envs.finrl_env import FinRLEnv, FinRLAction
import numpy as np

# Connect to server
client = FinRLEnv(base_url="http://localhost:8000")

# Get configuration
config = client.get_config()
print(f"Trading {config['stock_dim']} stocks")
print(f"Initial capital: ${config['initial_amount']:,.0f}")

# Reset environment
result = client.reset()
print(f"Initial portfolio value: ${result.observation.portfolio_value:,.2f}")

# Trading loop
for step in range(100):
    # Get current state
    state = result.observation.state

    # Your RL policy here (example: random actions)
    num_stocks = config['stock_dim']
    actions = np.random.uniform(-1, 1, size=num_stocks).tolist()

    # Execute action
    result = client.step(FinRLAction(actions=actions))

    print(f"Step {step}: Portfolio=${result.observation.portfolio_value:,.2f}, "
          f"Reward={result.reward:.2f}")

    if result.done:
        print("Episode finished!")
        break

client.close()
```
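The loop above extends naturally into a simple episode-level evaluation using only the documented client calls; a minimal sketch (random actions stand in for a trained policy):

```python
# Sketch: run one full episode and report total reward and net portfolio change.
import numpy as np
from envs.finrl_env import FinRLEnv, FinRLAction

client = FinRLEnv(base_url="http://localhost:8000")
config = client.get_config()

result = client.reset()
initial_value = result.observation.portfolio_value
total_reward = 0.0

while not result.done:
    actions = np.random.uniform(-1, 1, size=config['stock_dim']).tolist()
    result = client.step(FinRLAction(actions=actions))
    total_reward += result.reward or 0.0

print(f"Total reward: {total_reward:.2f}")
print(f"Net change:   ${result.observation.portfolio_value - initial_value:,.2f}")

client.close()
```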
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                    RL Training Framework                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Policy Net  │  │  Value Net   │  │    Replay    │       │
│  │  (PyTorch)   │  │  (PyTorch)   │  │    Buffer    │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
│         └─────────────────┴─────────────────┘               │
│                           │                                 │
│                  ┌────────▼────────┐                        │
│                  │    FinRLEnv     │  ← HTTP Client         │
│                  │ (HTTPEnvClient) │                        │
│                  └────────┬────────┘                        │
└───────────────────────────┼─────────────────────────────────┘
                            │ HTTP (JSON)
                   ┌────────▼────────┐
                   │ Docker Container│
                   │   Port: 8000    │
                   │                 │
                   │ ┌─────────────┐ │
                   │ │   FastAPI   │ │
                   │ │   Server    │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │    FinRL    │ │
                   │ │ Environment │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │    FinRL    │ │
                   │ │ StockTrading│ │
                   │ │     Env     │ │
                   │ └─────────────┘ │
                   └─────────────────┘
```
## API Reference
### FinRLAction
Trading action for the environment.
Attributes:
- `actions: list[float]` - Array of normalized action values (-1 to 1) for each stock
    - Positive values: Buy
    - Negative values: Sell
    - Magnitude: Relative trade size
Example:
```python
# Buy stock 0, sell stock 1, hold stock 2
action = FinRLAction(actions=[0.5, -0.3, 0.0])
```
### FinRLObservation
Observation returned by the environment.
Attributes:
- `state: list[float]` - Flattened state vector
    - Structure: `[balance, prices..., holdings..., indicators...]`
- `portfolio_value: float` - Total portfolio value (cash + holdings)
- `date: str` - Current trading date
- `done: bool` - Whether the episode has ended
- `reward: float` - Reward for the last action
- `metadata: dict` - Additional information
Example:
```python
obs = result.observation
print(f"Portfolio: ${obs.portfolio_value:,.2f}")
print(f"Date: {obs.date}")
print(f"State dimension: {len(obs.state)}")
```
### Client Methods
#### `reset() -> StepResult[FinRLObservation]`
Reset the environment to start a new episode.
```python
result = client.reset()
```
#### `step(action: FinRLAction) -> StepResult[FinRLObservation]`
Execute a trading action.
```python
action = FinRLAction(actions=[0.5, -0.3])
result = client.step(action)
```
#### `state() -> State`
Get episode metadata (`episode_id`, `step_count`).
```python
state = client.state()
print(f"Episode: {state.episode_id}, Step: {state.step_count}")
```
#### `get_config() -> dict`
Get environment configuration.
```python
config = client.get_config()
print(config['stock_dim'])
print(config['initial_amount'])
```
## Data Format
The environment expects stock data in the following CSV format:
| date | tic | close | high | low | open | volume | macd | rsi_30 | cci_30 | dx_30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-01 | AAPL | 100.0 | 102.0 | 98.0 | 99.0 | 1000000 | 0.5 | 55.0 | 10.0 | 15.0 |
| 2020-01-01 | GOOGL | 1500.0 | 1520.0 | 1480.0 | 1490.0 | 500000 | -0.3 | 48.0 | -5.0 | 20.0 |
Required columns:
- `date`: Trading date
- `tic`: Stock ticker symbol
- `close`, `high`, `low`, `open`: Price data
- `volume`: Trading volume
- Technical indicators (as specified in `tech_indicator_list`)
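Before pointing `data_path` at a file, it can be worth validating the frame programmatically; a minimal sketch (the indicator set mirrors the example configuration above):

```python
# Sketch: check that a CSV has the required long-format structure.
import pandas as pd

REQUIRED = {"date", "tic", "close", "high", "low", "open", "volume"}
INDICATORS = {"macd", "rsi_30", "cci_30", "dx_30"}  # match your tech_indicator_list

df = pd.read_csv("stock_data.csv")

missing = (REQUIRED | INDICATORS) - set(df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

# Long format: exactly one row per (date, tic) pair.
dupes = df.duplicated(["date", "tic"])
if dupes.any():
    raise ValueError(f"{dupes.sum()} duplicate (date, tic) rows")
```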
## Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| `data_path` | `str` | Path to CSV file with stock data |
| `stock_dim` | `int` | Number of stocks to trade |
| `hmax` | `int` | Maximum shares per trade |
| `initial_amount` | `int` | Starting cash balance |
| `num_stock_shares` | `list[int]` | Initial holdings for each stock |
| `buy_cost_pct` | `list[float]` | Transaction cost for buying (per stock) |
| `sell_cost_pct` | `list[float]` | Transaction cost for selling (per stock) |
| `reward_scaling` | `float` | Scaling factor for rewards |
| `state_space` | `int` | Dimension of state vector |
| `action_space` | `int` | Dimension of action space |
| `tech_indicator_list` | `list[str]` | Technical indicators to include |
## Integration with RL Frameworks
### Stable Baselines 3
```python
from stable_baselines3 import PPO
import gymnasium as gym
import numpy as np
from gymnasium import spaces

from envs.finrl_env import FinRLEnv, FinRLAction

# Custom wrapper so SB3 (which expects a Gymnasium env) can drive the HTTP client
class SB3FinRLWrapper(gym.Env):
    def __init__(self, base_url):
        super().__init__()
        self.env = FinRLEnv(base_url=base_url)
        config = self.env.get_config()
        self.action_space = spaces.Box(
            low=-1, high=1,
            shape=(config['action_space'],),
            dtype=np.float32,
        )
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(config['state_space'],),
            dtype=np.float32,
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        result = self.env.reset()
        return np.array(result.observation.state, dtype=np.float32), {}

    def step(self, action):
        result = self.env.step(FinRLAction(actions=action.tolist()))
        obs = np.array(result.observation.state, dtype=np.float32)
        # Gymnasium step API: (obs, reward, terminated, truncated, info)
        return obs, result.reward or 0.0, result.done, False, result.observation.metadata

# Train
env = SB3FinRLWrapper("http://localhost:8000")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
```
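After training, the same wrapper can be reused for a quick deterministic rollout; a minimal sketch using SB3's `model.predict`:

```python
# Sketch: roll out the trained policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```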
## Troubleshooting
### Server won't start
1. Check if the base image exists:

   ```bash
   docker images | grep envtorch-base
   ```

2. Build the base image if missing:

   ```bash
   docker build -t envtorch-base:latest -f src/core/containers/images/Dockerfile .
   ```
### Import errors
Make sure you're in the `src` directory:
```bash
cd OpenEnv/src
python -c "from envs.finrl_env import FinRLEnv"
```
### Configuration errors
Verify your data file has all required columns:
```python
import pandas as pd

df = pd.read_csv('your_data.csv')
print(df.columns.tolist())
```
## Examples
See the `examples/` directory for complete examples:
- `examples/finrl_simple.py` - Basic usage
- `examples/finrl_training.py` - Full training loop with PPO
- `examples/finrl_backtesting.py` - Backtesting a trained agent
## License
BSD 3-Clause License (see LICENSE file in repository root)