DIPG Safety Environment (DIPGSafetyEnv)¶
Overview¶
The DIPGSafetyEnv is a custom environment built on the OpenEnv framework for Reinforcement Learning research in high-stakes AI safety. It was developed to address a critical use case: ensuring the reliability and safety of a Large Language Model (LLM) agent operating in the medical domain of Diffuse Intrinsic Pontine Glioma (DIPG), a universally fatal pediatric brain tumor.
In this context, an AI's failure is not an option. The environment's primary purpose is to train and rigorously evaluate an agent's ability to:
1. Base its answers only on the verified clinical context provided.
2. Correctly identify and report conflicting information from different sources.
3. Safely abstain from answering when the context is insufficient.
4. Strictly avoid hallucinating facts or providing unsafe, unsupported information.
Features¶
The environment server contains a suite of safety-critical reward functions that score an agent's response based on the following behaviors:
- Conflict Identification: Rewards the agent for correctly stating that provided sources are contradictory.
- Knowledge Abstention: Rewards the agent for recognizing when a question cannot be answered from the given text and explicitly saying so.
- Format Adherence: Rewards or penalizes the response based on its adherence to the required structured output format.
- Hallucination Penalty: Heavily penalizes the agent for generating any information that is not supported by the provided context.
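The actual scoring logic lives in the server's reward functions (see server/dipg_environment.py); the sketch below is only an illustration of how simple keyword-style checks could be combined into a single reward. The function name, trigger phrases, and score values are all assumptions made for illustration, not the environment's real implementation.
# Hypothetical sketch of keyword-based scoring; the real logic lives in
# server/dipg_environment.py and will differ in names, phrases, and weights.
def score_response(response: str, expected_behavior: str) -> float:
    reward = 0.0
    text = response.lower()
    if expected_behavior == "conflict" and "conflicting" in text:
        reward += 1.0   # reward for flagging contradictory sources
    if expected_behavior == "abstain" and "cannot be answered" in text:
        reward += 1.0   # reward for explicitly abstaining
    if not text.startswith("based on the provided context"):
        reward -= 0.5   # hypothetical format-adherence penalty
    return reward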
Getting Started: How to Use the Environment¶
The DIPGSafetyEnv follows a standard client-server model.
1. Running the Server¶
The server requires the custom synthetic dataset (harmonic_reasoner_dataset_structured.jsonl). You can download it from here.
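Before starting the server, it can help to confirm that the dataset file is readable and is valid JSONL. The short check below assumes only that each line is a JSON object; it does not rely on any particular field names.
import json
# Point this at wherever you saved the dataset.
path = "/path/to/your/harmonic_reasoner_dataset_structured.jsonl"
with open(path, "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]
print(f"Loaded {len(records)} records")
print(f"Fields in first record: {sorted(records[0].keys())}")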
The recommended way to run the server is with gunicorn for better performance and stability.
# Install gunicorn
pip install gunicorn
# Set the dataset path environment variable
export DIPG_DATASET_PATH=/path/to/your/harmonic_reasoner_dataset_structured.jsonl
# Run the server
PYTHONPATH=./src gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8009 envs.dipg_safety_env.server.app:app
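Once gunicorn is running, you can confirm that the port is accepting connections before pointing a client at it. The snippet below is a plain TCP check using only the host and port from the command above; it makes no assumptions about the server's HTTP routes.
import socket
# Raises an exception if nothing is listening on the port.
with socket.create_connection(("localhost", 8009), timeout=5):
    print("Server is listening on port 8009")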
2. Interacting from the Client¶
Once the server is running, an agent can interact with it using the DIPGSafetyEnv client.
from envs.dipg_safety_env.client import DIPGSafetyEnv
from envs.dipg_safety_env.models import DIPGAction
# Connect to the running server
env = DIPGSafetyEnv(base_url="http://localhost:8009", timeout=60)
# Start a new episode and get the first challenge
# The 'obs' object will contain a medical context and a question.
obs = env.reset()
print(f"Question: {obs.observation.question}")
# The agent processes the observation and generates a response
agent_response_text = "Based on the provided context, the information is conflicting."
# Send the response (as an Action) to the environment to be scored
action = DIPGAction(llm_response=agent_response_text)
result = env.step(action)
# The result contains the reward and a flag indicating the episode is done
print(f"Reward: {result.reward}")
print(f"Done: {result.done}")
Running Tests¶
The environment includes a suite of tests to ensure its core logic is working correctly. These tests verify that the environment can be reset, that actions are processed, and that the reward functions are behaving as expected.
Prerequisites¶
You must have pytest installed:
pip install pytest
How to Run¶
From the root directory of the OpenEnv project, run the following commands:
# Activate your virtual environment if you have one
source venv/bin/activate
# Set the PYTHONPATH
export PYTHONPATH=src
# Run the tests
pytest tests/envs/test_dipg_environment.py
pytest tests/envs/test_dipg_client.py
pytest tests/envs/test_dipg_reward_functions.py
A successful run will show output indicating that all tests passed.
Test Structure¶
- tests/envs/test_dipg_environment.py: An end-to-end test that starts the server, connects a client, and exercises the reset() and step() functions.
- tests/envs/test_dipg_client.py: Unit tests for the client, checking error handling with invalid URLs and server timeouts.
- tests/envs/test_dipg_reward_functions.py: Unit tests for the reward functions, ensuring they calculate scores correctly for different scenarios.
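To illustrate the style of check the client tests perform, the sketch below asserts that pointing the client at an unreachable server surfaces an error on reset(). The exact exception type raised by the real client is an assumption here, so the sketch catches a generic Exception.
import pytest
from envs.dipg_safety_env.client import DIPGSafetyEnv

def test_reset_against_unreachable_server():
    # Nothing should be listening on this port, so the call should fail fast.
    env = DIPGSafetyEnv(base_url="http://localhost:9", timeout=1)
    # Assumption: the client raises some exception when the server is unreachable.
    with pytest.raises(Exception):
        env.reset()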
Core Components¶
- models.py: Defines the data structures for interaction:
  - DIPGObservation: Contains the context and question served to the agent.
  - DIPGAction: Contains the llm_response generated by the agent.
- server/dipg_environment.py: The core of the environment. It loads the dataset, serves challenges via reset(), and calculates rewards via step().
- client.py: The "remote control" that lets a Python script communicate with the server over HTTP, handling all JSON serialization and parsing.
- tests/: Contains the unit and integration tests for the environment.
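For orientation, a simplified sketch of the two models is shown below. The actual definitions in models.py may use different base classes from OpenEnv or carry additional fields; only the context, question, and llm_response fields are taken from the description above.
from dataclasses import dataclass

@dataclass
class DIPGObservation:
    # Verified clinical context and the question the agent must answer.
    context: str
    question: str

@dataclass
class DIPGAction:
    # The free-text response generated by the agent.
    llm_response: str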