snorkelai/finqa-data tags:
openenv

FinQA Environment¶

A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.

Based on FinQABenchmark from Snorkel AI.

Overview¶

FinQA tests an agent's ability to:
- Explore available financial tables for a company
- Query table metadata and execute SQL queries
- Perform calculations on extracted data
- Submit final answers to financial questions
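The four stages above map onto one tool call each. The following is a minimal sketch of that ordering, not the environment's own agent loop. `run_episode` and `FakeEnv` are hypothetical names introduced for illustration; the only assumption about the real client is that it exposes an async `call_tool(name, **kwargs)` method, as in the client example in this README. A real agent would derive the query and the answer from earlier tool results instead of receiving them as arguments.

```python
import asyncio
from typing import Any


async def run_episode(env: Any, company: str, table: str, query: str, answer: str) -> list[str]:
    """Walk the four stages in order and return the tool names that were called."""
    calls: list[str] = []

    async def call(tool: str, **kwargs: Any) -> Any:
        calls.append(tool)
        return await env.call_tool(tool, **kwargs)

    # 1. Explore: which tables exist for this company?
    await call("get_descriptions", company_name=company)
    # 2. Inspect: columns, dtypes, and unique values for one table.
    await call("get_table_info", company_name=company, table_name=table)
    # 3. Query: run a filtered SQL query against that table.
    await call("sql_query", company_name=company, table_name=table, query=query)
    # 4. Submit: the final answer ends the episode.
    await call("submit_answer", answer=answer)
    return calls


class FakeEnv:
    """Hypothetical stand-in so the sketch runs without a server."""

    async def call_tool(self, name: str, **kwargs: Any) -> str:
        return f"{name} called"


calls = asyncio.run(
    run_episode(
        FakeEnv(),
        company="alphabet",
        table="some_table",  # placeholder table name
        query="SELECT year FROM data WHERE year = '2022'",
        answer="6.118",
    )
)
print(calls)
```

Passing a connected `FinQAEnv` instead of `FakeEnv` should exercise the same sequence against a running server.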

Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)

Reward: Binary (1.0 for a correct answer, 0.0 for an incorrect one), decided by fuzzy numerical matching with a 1% relative tolerance or a 0.01 absolute tolerance.

Note: This dataset is for evaluation only. Do not train on it.

Quick Start¶

Using Docker¶

# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .

# Run the server
docker run -p 8000:8000 finqa-env:latest

# Run the evaluation script (example model: gpt-5)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py

Local Development¶

# Install dependencies
uv pip install pandas

# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh

Using the Client¶

The client uses the MCP protocol and is async by default:

import asyncio
from envs.finqa_env import FinQAEnv, CallToolAction

async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
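                # Name columns explicitly: sql_query rejects SELECT * (see Tool Constraints).
                # "year" is the only column this example knows; take real names from get_table_info.
                "query": "SELECT year FROM data WHERE year = '2022'"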
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer
        result = await env.call_tool("submit_answer", answer="6.118")

asyncio.run(main())

Available Tools¶

Tools are auto-discovered via MCP. Use await env.list_tools() to see all available tools at runtime.

| Tool | Description | Arguments |
|------|-------------|-----------|
| `get_descriptions` | Get list of available table names for a company | `company_name: str` |
| `get_table_info` | Get table metadata (columns, dtypes, unique values) | `company_name: str, table_name: str` |
| `sql_query` | Execute SQL query on a table (requires filters) | `company_name: str, table_name: str, query: str` |
| `submit_answer` | Submit final answer (ends episode) | `answer: str` |

Tool Constraints¶

sql_query: Must include filters (WHERE, HAVING, etc.). SELECT * is not allowed.
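A client can check a query against these two constraints before spending a step on it. The sketch below is an assumption about how such a pre-check might look, not the server's actual validation logic; `check_query` and `FILTER_KEYWORDS` are hypothetical names, and only the two filter keywords named above are covered.

```python
import re

# Only the keywords spelled out in the text; the "etc." is left unfilled.
FILTER_KEYWORDS = ("WHERE", "HAVING")


def check_query(query: str) -> tuple[bool, str]:
    """Return (ok, reason) for a query under the two stated constraints."""
    # Reject a bare star projection such as "SELECT * FROM data ...".
    if re.search(r"\bselect\s+\*", query, flags=re.IGNORECASE):
        return False, "SELECT * is not allowed"
    # Require at least one filtering clause.
    if not any(re.search(rf"\b{kw}\b", query, flags=re.IGNORECASE) for kw in FILTER_KEYWORDS):
        return False, "query must include a filter"
    return True, "ok"


print(check_query("SELECT * FROM data WHERE year = '2022'"))
print(check_query("SELECT year FROM data"))
print(check_query("SELECT year FROM data WHERE year = '2022'"))
```

This is a keyword scan, not a SQL parser: a filter keyword inside a string literal would pass the check, so the server's response remains the authority.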

Environment Variables¶

| Variable | Default | Description |
|----------|---------|-------------|
| `FINQA_DATA_PATH` | `/app/env/data` | Path to data directory |
| `FINQA_MAX_STEPS` | `50` | Maximum tool calls per episode |
| `FINQA_TASK` | `finqa` | Task name |
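As an illustration of how these variables and their defaults could be read, here is a minimal sketch. `FinQAConfig` and `load_config` are hypothetical names, not part of the package; the server may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FinQAConfig:
    data_path: str
    max_steps: int
    task: str


def load_config(environ: Optional[Mapping[str, str]] = None) -> FinQAConfig:
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return FinQAConfig(
        data_path=env.get("FINQA_DATA_PATH", "/app/env/data"),
        max_steps=int(env.get("FINQA_MAX_STEPS", "50")),  # stored as text, used as an int
        task=env.get("FINQA_TASK", "finqa"),
    )


# An empty mapping yields the defaults; overrides replace single fields.
print(load_config({}))
print(load_config({"FINQA_DATA_PATH": "./data", "FINQA_MAX_STEPS": "10"}))
```

Accepting a mapping argument keeps the function testable without touching the process environment.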

Reward Computation¶

Rewards use fuzzy numerical matching:

- Extracts numbers from \boxed{...} format
- Handles percentages, fractions, and decimals
- 1% relative tolerance or 0.01 absolute tolerance
- Returns 1.0 for correct, 0.0 for incorrect
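The rules above can be sketched roughly as follows. This is an illustrative approximation, not the code in `server/rewards.py`. `extract_number` and `score` are hypothetical names, and two behaviours are assumptions: the last boxed group wins when several are present, and a percent sign is dropped so that `12.5%` parses as 12.5.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_number(text: str) -> Optional[float]:
    """Parse the content of the last boxed{...} group as a float, or return None."""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
    if not boxed:
        return None
    raw = boxed[-1].replace(",", "").replace(" ", "")
    # Assumption: drop a percent sign (plain or LaTeX-escaped) and keep the number as written.
    raw = raw.replace("\\%", "%").rstrip("%")
    try:
        # Fraction parses integers, decimals, and "a/b" strings alike.
        return float(Fraction(raw))
    except (ValueError, ZeroDivisionError):
        return None


def score(predicted: str, expected: float, rel_tol: float = 0.01, abs_tol: float = 0.01) -> float:
    """Return 1.0 if the boxed number is within either tolerance of expected, else 0.0."""
    value = extract_number(predicted)
    if value is None:
        return 0.0
    diff = abs(value - expected)
    within = diff <= abs_tol or diff <= rel_tol * abs(expected)
    return 1.0 if within else 0.0


print(score(r"The answer is \boxed{6.118}", 6.12))
print(score(r"\boxed{3/4}", 0.75))
print(score(r"\boxed{100}", 102.0))
```

Accepting either tolerance means small expected values are not penalized by the relative check alone, while large values are not held to two decimal places.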

Local Development¶

# From OpenEnv repo root
cd envs/finqa_env

# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000

# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset

Integration with RL Frameworks¶

TRL (GRPO)¶

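import asyncio
from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv, CallToolAction

async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...).
        # `completion` is a placeholder for the answer text your agent produces.
        completion = "..."
        # Submit through step() so the final reward is available on the result.
        step_result = await env.step(CallToolAction(
            tool_name="submit_answer",
            arguments={"answer": completion},
        ))
        return {"reward": step_result.reward, "completion": completion}

# `model` is your policy model, defined elsewhere.
trainer = GRPOTrainer(
    model=model,
    rollout_func=rollout_func,
    ...
)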

Project Structure¶

finqa_env/
├── __init__.py           # Exports FinQAEnv, CallToolAction, ListToolsAction
├── models.py             # FinQAState and tool name constants
├── client.py             # MCP client (subclasses MCPToolClient)
├── pyproject.toml        # Dependencies
├── README.md             # This file
├── data/                 # Benchmark data (run download_data.sh)
│   ├── benchmark_questions/
│   │   └── finqa.csv
│   └── input_companies/
│       └── [company folders]
├── download_data.sh      # Downloads data from HuggingFace
└── server/
    ├── __init__.py
    ├── finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    ├── tools.py              # Tool implementations
    ├── rewards.py            # Reward computation
    ├── app.py                # FastAPI server
    └── Dockerfile
