
FinQA Environment#

A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.

Based on FinQABenchmark from Snorkel AI.

Overview#

FinQA tests an agent's ability to:

  • Explore available financial tables for a company

  • Query table metadata and execute SQL queries

  • Perform calculations on extracted data

  • Submit final answers to financial questions

Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)

Reward: Binary (1.0 for a correct answer, 0.0 otherwise), computed by fuzzy numerical matching with a 1% relative or 0.01 absolute tolerance.

Note: This dataset is for evaluation only. Do not train on it.

Quick Start#

Using Docker#

# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .

# Run the server
docker run -p 8000:8000 finqa-env:latest

# Run the evaluation script (example model: gpt-5)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py

Local Development#

# Install dependencies
uv pip install pandas

# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh

Using the Client#

The client communicates over MCP (Model Context Protocol) and is async by default:

import asyncio
from envs.finqa_env import FinQAEnv, CallToolAction

async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
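                # SELECT * is rejected (see Tool Constraints). The column names
                # below are placeholders; call get_table_info for the real ones.
                "query": "SELECT year, total FROM data WHERE year = '2022'"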
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer
        result = await env.call_tool("submit_answer", answer="6.118")

asyncio.run(main())

Available Tools#

Tools are auto-discovered via MCP. Use await env.list_tools() to see all available tools at runtime.

| Tool | Description | Arguments |
| --- | --- | --- |
| get_descriptions | Get list of available table names for a company | company_name: str |
| get_table_info | Get table metadata (columns, dtypes, unique values) | company_name: str, table_name: str |
| sql_query | Execute SQL query on a table (requires filters) | company_name: str, table_name: str, query: str |
| submit_answer | Submit final answer (ends episode) | answer: str |
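The argument lists above can be checked on the client before a call is sent. The following is a minimal sketch, not part of the FinQA package: `TOOL_ARGUMENTS` and `validate_call` are hypothetical names, and the mapping is copied by hand from the table, so it can drift from what `list_tools()` reports at runtime.

```python
# Hypothetical client-side helper; the mapping mirrors the table above
# and is an assumption, not something exported by finqa_env.
TOOL_ARGUMENTS = {
    "get_descriptions": ("company_name",),
    "get_table_info": ("company_name", "table_name"),
    "sql_query": ("company_name", "table_name", "query"),
    "submit_answer": ("answer",),
}


def validate_call(tool_name: str, arguments: dict) -> None:
    """Raise if the tool is unknown or its arguments do not match the table."""
    if tool_name not in TOOL_ARGUMENTS:
        raise ValueError(f"unknown tool: {tool_name}")
    expected = set(TOOL_ARGUMENTS[tool_name])
    given = set(arguments)
    missing = expected - given
    extra = given - expected
    if missing:
        raise ValueError(f"{tool_name} is missing arguments: {sorted(missing)}")
    if extra:
        raise ValueError(f"{tool_name} got unexpected arguments: {sorted(extra)}")
    # Every argument in the table is documented as a str.
    for name, value in arguments.items():
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a str, got {type(value).__name__}")
```

Calling `validate_call("sql_query", {...})` before `env.step(...)` surfaces a malformed call locally instead of spending one of the episode's tool calls on it.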

Tool Constraints#

  • sql_query: Must include filters (WHERE, HAVING, etc.). SELECT * is not allowed.
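The constraint can be approximated with a pre-flight check before a query is sent. This is a rough sketch under stated assumptions: `check_query` is a hypothetical helper, it only looks for the two filter keywords named above (the server may accept others), and its regular expressions are not a full SQL parser.

```python
import re

# Only the filter keywords named in the constraint; the server's real
# rule set may be broader (assumption).
FILTER_KEYWORDS = re.compile(r"\b(WHERE|HAVING)\b", re.IGNORECASE)
# Matches a bare "SELECT *", but not aggregates such as COUNT(*).
SELECT_STAR = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)


def check_query(query: str) -> list[str]:
    """Return a list of constraint violations; an empty list means the query passes."""
    problems = []
    if SELECT_STAR.search(query):
        problems.append("SELECT * is not allowed; name the columns explicitly")
    if not FILTER_KEYWORDS.search(query):
        problems.append("query must include a filter such as WHERE or HAVING")
    return problems
```

An agent loop could feed the returned messages back to the model as a hint rather than submitting a query the server is likely to reject.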

Environment Variables#

| Variable | Default | Description |
| --- | --- | --- |
| FINQA_DATA_PATH | /app/env/data | Path to data directory |
| FINQA_MAX_STEPS | 50 | Maximum tool calls per episode |
| FINQA_TASK | finqa | Task name |
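As an illustration of how these variables and their defaults fit together, the sketch below reads them into a small settings object. `FinQASettings` and `load_settings` are hypothetical names introduced here; the server may read its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FinQASettings:
    """Hypothetical container for the three documented variables."""
    data_path: str
    max_steps: int
    task: str


def load_settings(environ=None) -> FinQASettings:
    """Read the FINQA_* variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return FinQASettings(
        data_path=env.get("FINQA_DATA_PATH", "/app/env/data"),
        max_steps=int(env.get("FINQA_MAX_STEPS", "50")),
        task=env.get("FINQA_TASK", "finqa"),
    )
```

Passing a plain dict as `environ` makes the function easy to exercise without touching the real process environment.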

Reward Computation#

Rewards use fuzzy numerical matching:

  • Extracts numbers from \boxed{...} format

  • Handles percentages, fractions, and decimals

  • 1% relative tolerance or 0.01 absolute tolerance

  • Returns 1.0 for correct, 0.0 for incorrect
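The matching rules above can be sketched as follows. This is an illustrative approximation, not the code in `server/rewards.py`: the function names are hypothetical, and treating a percent sign as something to strip (so `12.5%` compares as `12.5`) is an assumption about how percentages are handled.

```python
import re
from fractions import Fraction

# Captures the contents of the first \boxed{...} (no nested braces).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def parse_number(text: str) -> float:
    """Parse a decimal, percentage, or simple fraction, boxed or bare."""
    match = BOXED.search(text)
    raw = match.group(1) if match else text
    # Assumption: percent signs, commas, and dollar signs are dropped.
    for token in ("\\%", "%", ",", "$"):
        raw = raw.replace(token, "")
    # Fraction accepts "3/4", "0.75", and "6" alike.
    return float(Fraction(raw.strip()))


def is_correct(predicted: str, expected: str,
               rel_tol: float = 0.01, abs_tol: float = 0.01) -> bool:
    """True if the values agree within 1% relative or 0.01 absolute tolerance."""
    try:
        p, e = parse_number(predicted), parse_number(expected)
    except (ValueError, ZeroDivisionError):
        return False  # unparseable answers count as incorrect
    diff = abs(p - e)
    return diff <= abs_tol or diff <= rel_tol * abs(e)


def reward(predicted: str, expected: str) -> float:
    """Binary reward: 1.0 for a match, 0.0 otherwise."""
    return 1.0 if is_correct(predicted, expected) else 0.0
```

The absolute tolerance matters for expected values near zero, where a 1% relative band would be too narrow to absorb rounding.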

Local Development#

# From OpenEnv repo root
cd envs/finqa_env

# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000

# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset

Integration with RL Frameworks#

TRL (GRPO)#

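import asyncio
from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv, CallToolAction

async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...);
        # it should produce the final answer string.
        completion = "..."  # placeholder for the agent's final answer
        # Submit through step() so the reward is available on the result
        step_result = await env.step(CallToolAction(
            tool_name="submit_answer",
            arguments={"answer": completion},
        ))
        return {"reward": step_result.reward, "completion": completion}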

trainer = GRPOTrainer(
    model=model,
    rollout_func=rollout_func,
    ...
)

Project Structure#

finqa_env/
├── __init__.py           # Exports FinQAEnv, CallToolAction, ListToolsAction
├── models.py             # FinQAState and tool name constants
├── client.py             # MCP client (subclasses MCPToolClient)
├── pyproject.toml        # Dependencies
├── README.md             # This file
├── data/                 # Benchmark data (run download_data.sh)
│   ├── benchmark_questions/
│   │   └── finqa.csv
│   └── input_companies/
│       └── [company folders]
├── download_data.sh      # Downloads data from HuggingFace
└── server/
    ├── __init__.py
    ├── finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    ├── tools.py              # Tool implementations
    ├── rewards.py            # Reward computation
    ├── app.py                # FastAPI server
    └── Dockerfile

References#