FinQA Environment
A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.
Based on FinQABenchmark from Snorkel AI.
Overview
FinQA tests an agent's ability to:
Explore available financial tables for a company
Query table metadata and execute SQL queries
Perform calculations on extracted data
Submit final answers to financial questions
Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)
Reward: Binary (1.0 for correct answer, 0.0 for incorrect) using fuzzy numerical matching with 1% tolerance.
Note: This dataset is for evaluation only. Do not train on it.
Quick Start
Using Docker
# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .
# Run the server
docker run -p 8000:8000 finqa-env:latest
# Run the evaluation script (gpt-5 is only an example model)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py
Local Development
# Install dependencies
uv pip install pandas
# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh
Using the Client
The client uses the MCP protocol and is async by default:
import asyncio

from envs.finqa_env import FinQAEnv, CallToolAction


async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access.
        # NOTE: Tool Constraints states that SELECT * is not allowed; replace *
        # with explicit column names taken from the table metadata.
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
                "query": "SELECT * FROM data WHERE year = '2022'"
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer (ends the episode)
        result = await env.call_tool("submit_answer", answer="6.118")


asyncio.run(main())
Available Tools
Tools are auto-discovered via MCP. Use `await env.list_tools()` to see all available tools at runtime.
| Tool | Description | Arguments |
|---|---|---|
| `get_descriptions` | Get list of available table names for a company | `company_name` |
| | Get table metadata (columns, dtypes, unique values) | |
| `sql_query` | Execute SQL query on a table (requires filters) | `company_name`, `table_name`, `query` |
| `submit_answer` | Submit final answer (ends episode) | `answer` |
Tool Constraints
`sql_query`: Must include filters (`WHERE`, `HAVING`, etc.). `SELECT *` is not allowed.
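
The constraint above can be checked on the client before a call is spent on a query the server would reject. The following is a minimal sketch, not the server's actual validation: `check_query` is a hypothetical helper, and it only recognises `WHERE` and `HAVING` because the other accepted filter clauses are not listed here.

```python
import re
from typing import List

# Matches "SELECT *" but not aggregates such as "SELECT COUNT(*)".
_SELECT_STAR = re.compile(r"\bselect\s+\*", re.IGNORECASE)
# Assumption: only WHERE and HAVING are treated as filters in this sketch.
_FILTER = re.compile(r"\b(where|having)\b", re.IGNORECASE)


def check_query(query: str) -> List[str]:
    """Return a list of problems; an empty list means the pre-check passed."""
    problems = []
    if _SELECT_STAR.search(query):
        problems.append("SELECT * is not allowed; name the columns explicitly")
    if not _FILTER.search(query):
        problems.append("query has no filter clause (WHERE, HAVING, ...)")
    return problems


if __name__ == "__main__":
    print(check_query("SELECT * FROM data WHERE year = '2022'"))
    print(check_query("SELECT year FROM data WHERE year = '2022'"))
```

A regex check like this is deliberately loose (it does not parse SQL), so treat a pass as "probably fine" and still handle an error response from `sql_query`.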
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `FINQA_DATA_PATH` | | Path to data directory |
| | | Maximum tool calls per episode |
| | | Task name |
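
Because the server caps tool calls per episode, an agent loop may want to track the same budget on the client so it can submit an answer before the cap is reached. A minimal sketch under stated assumptions: `BudgetedEnv` and `ToolBudgetExceeded` are hypothetical names, `max_calls` is whatever limit you configured on the server, and only `call_tool(name, **arguments)` is taken from the client example above. Whether the server counts `submit_answer` toward the cap is not specified here; this wrapper counts every call.

```python
import asyncio


class ToolBudgetExceeded(RuntimeError):
    """Raised when the client-side tool-call budget is used up."""


class BudgetedEnv:
    """Hypothetical wrapper that counts call_tool() invocations on an env-like object."""

    def __init__(self, env, max_calls: int):
        self._env = env
        self.max_calls = max_calls
        self.calls = 0

    @property
    def remaining(self) -> int:
        """Number of tool calls still allowed by this client-side counter."""
        return self.max_calls - self.calls

    async def call_tool(self, name: str, **arguments):
        # Refuse locally instead of sending a call past the configured cap.
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(
                f"budget of {self.max_calls} tool calls exhausted"
            )
        self.calls += 1
        return await self._env.call_tool(name, **arguments)
```

An agent can check `remaining` before each exploratory query and reserve the last call for `submit_answer`.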
Reward Computation
Rewards use fuzzy numerical matching:
Extracts numbers from `\boxed{...}` format
Handles percentages, fractions, and decimals
1% relative tolerance or 0.01 absolute tolerance
Returns `1.0` for correct, `0.0` for incorrect
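
The matching rules above can be illustrated with a short sketch. This is not the code in `rewards.py`: `parse_number` and `is_correct` are hypothetical names, and converting a trailing `%` to a fraction (so `12.5%` becomes `0.125`) is an assumption about how percentages are normalised.

```python
import re
from fractions import Fraction
from typing import Optional

# Captures the contents of \boxed{...} (no nested braces in this sketch).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def parse_number(text: str) -> Optional[float]:
    """Parse a decimal, percentage, or simple fraction; None if unparseable."""
    boxed = BOXED.findall(text)
    # Prefer the last \boxed{...} if present, else use the whole string.
    s = (boxed[-1] if boxed else text).strip()
    s = s.replace(",", "").replace("$", "").replace("\\%", "%")
    is_percent = s.endswith("%")
    if is_percent:
        s = s[:-1].strip()
    try:
        value = float(Fraction(s))  # accepts "6.118", "3/4", "-2"
    except (ValueError, ZeroDivisionError):
        return None
    # Assumption: percentages are compared as fractions of 1.
    return value / 100.0 if is_percent else value


def is_correct(predicted: str, expected: str,
               rel_tol: float = 0.01, abs_tol: float = 0.01) -> float:
    """Binary reward: 1.0 if within either tolerance, else 0.0."""
    p, e = parse_number(predicted), parse_number(expected)
    if p is None or e is None:
        return 0.0
    diff = abs(p - e)
    return 1.0 if (diff <= abs_tol or diff <= rel_tol * abs(e)) else 0.0
```

Accepting either tolerance means small values are judged mainly by the absolute bound and large values by the relative one.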
Local Development
# From OpenEnv repo root
cd envs/finqa_env
# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000
# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset
Integration with RL Frameworks
TRL (GRPO)
import asyncio

from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv


async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...)
        completion = ""  # placeholder: replace with the text your agent produced
        return {"reward": obs.reward, "completion": completion}


trainer = GRPOTrainer(
    model=model,
    rollout_func=rollout_func,
    ...
)
Project Structure
finqa_env/
├── __init__.py               # Exports FinQAEnv, CallToolAction, ListToolsAction
├── models.py                 # FinQAState and tool name constants
├── client.py                 # MCP client (subclasses MCPToolClient)
├── pyproject.toml            # Dependencies
├── README.md                 # This file
├── data/                     # Benchmark data (run download_data.sh)
│   ├── benchmark_questions/
│   │   └── finqa.csv
│   └── input_companies/
│       └── [company folders]
├── download_data.sh          # Downloads data from HuggingFace
└── server/
    ├── __init__.py
    ├── finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    ├── tools.py              # Tool implementations
    ├── rewards.py            # Reward computation
    ├── app.py                # FastAPI server
    └── Dockerfile
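
Before starting the server it can help to confirm that `download_data.sh` produced the `data/` layout shown in the tree above. A minimal sketch: `missing_data_paths` is a hypothetical helper, and falling back to `./data` when `FINQA_DATA_PATH` is unset is an assumption, not the server's documented default.

```python
import os
from pathlib import Path

# Relative paths taken from the data/ subtree in the project structure.
EXPECTED = ("benchmark_questions/finqa.csv", "input_companies")


def missing_data_paths(data_dir=None):
    """Return the expected entries that are absent under the data directory."""
    # Assumption: use the argument, then FINQA_DATA_PATH, then ./data.
    root = Path(data_dir or os.environ.get("FINQA_DATA_PATH", "data"))
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```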