TorchForge Documentation#
TorchForge is a PyTorch-native library for RL post-training and agentic development. Built on the principle that researchers should write algorithms, not infrastructure.
Note
Experimental Status: TorchForge is currently in early development. Expect bugs, incomplete features, and API changes. Please file issues on GitHub for bug reports and feature requests.
Why TorchForge?#
Reinforcement Learning has become essential to frontier AI - from instruction following and reasoning to complex research capabilities. But infrastructure complexity often dominates the actual research.
TorchForge lets you express RL algorithms as naturally as pseudocode, while powerful infrastructure handles distribution, fault tolerance, and optimization underneath.
Core Design Principles#
Algorithms, Not Infrastructure: Write your RL logic without distributed systems code
Any Degree of Asynchrony: From fully synchronous PPO to fully async off-policy training
Composable Components: Mix and match proven frameworks (vLLM, TorchTitan) with custom logic
Built on Solid Foundations: Leverages Monarch’s single-controller model for simplified distributed programming
Foundation: The Technology Stack#
TorchForge is built on carefully selected, battle-tested components:
Monarch: Single-controller distributed programming framework that orchestrates clusters like you'd program a single machine. Provides actor meshes, fault tolerance, and RDMA-based data transfers.
Why it matters: Eliminates SPMD complexity, making distributed RL tractable.
vLLM: High-throughput, memory-efficient inference engine with PagedAttention and continuous batching.
Why it matters: Handles policy generation efficiently at scale.
TorchTitan: Meta's production-grade LLM training platform with FSDP, pipeline parallelism, and tensor parallelism.
Why it matters: Battle-tested training infrastructure proven at scale.
TorchStore: Distributed, in-memory key-value store for PyTorch tensors, built on Monarch and optimized for weight synchronization with automatic DTensor resharding.
Why it matters: Solves the weight transfer bottleneck in async RL.
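To make the weight-transfer piece concrete, here is a minimal, self-contained sketch of the pattern a distributed tensor store enables: a trainer publishes versioned weights while a generator pulls the newest version without blocking training. The WeightStore class and its publish/fetch methods are hypothetical stand-ins for illustration only, not TorchStore's actual API.

import asyncio
import torch


class WeightStore:
    """Toy in-memory stand-in for a distributed tensor key-value store."""

    def __init__(self):
        self._versions: dict[int, dict[str, torch.Tensor]] = {}
        self.latest = -1

    async def publish(self, version: int, state_dict: dict[str, torch.Tensor]):
        # A real store would reshard DTensors and move bytes over RDMA.
        self._versions[version] = {k: v.detach().clone() for k, v in state_dict.items()}
        self.latest = version

    async def fetch(self, version: int) -> dict[str, torch.Tensor]:
        return self._versions[version]


async def trainer_loop(store: WeightStore, model: torch.nn.Module, steps: int):
    for step in range(steps):
        # ... optimizer step would go here ...
        await store.publish(step, model.state_dict())
        await asyncio.sleep(0)  # yield to other tasks


async def generator_loop(store: WeightStore, model: torch.nn.Module, steps: int):
    seen = -1
    for _ in range(steps):
        if store.latest > seen:  # pull newest weights without blocking the trainer
            model.load_state_dict(await store.fetch(store.latest))
            seen = store.latest
        await asyncio.sleep(0)   # generate rollouts with the current weights


async def main():
    store = WeightStore()
    await asyncio.gather(
        trainer_loop(store, torch.nn.Linear(4, 4), steps=3),
        generator_loop(store, torch.nn.Linear(4, 4), steps=3),
    )


asyncio.run(main())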
What You Can Build#
Supervised Fine-Tuning: Adapt foundation models to specific tasks using labeled data with efficient multi-GPU training.
GRPO: Train models with Group Relative Policy Optimization to align them with human preferences (sketched below).
Asynchronous RL: Continuous rollout generation with non-blocking training for maximum throughput.
Code Execution: Safe, sandboxed code execution environments for RL on coding tasks (RLVR).
Agentic Workflows: Extensible environment system for agents that interact with tools and APIs.
Custom Components: Build your own components and compose them naturally with existing infrastructure.
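As a taste of what GRPO involves, the snippet below sketches the group-relative advantage computation at its core: sample several responses per prompt, then normalize each reward against its own group's mean and standard deviation. This is a simplified illustration, not TorchForge's implementation; real training also handles KL penalties, clipping, and token-level credit assignment.

import torch


def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [num_prompts, group_size] scores for responses to the same prompt."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)


# Two prompts, four sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))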
Requirements at a Glance#
Before diving in, check out Getting Started and ensure your system meets the requirements.
Writing RL Code#
With TorchForge, your RL logic looks like pseudocode:
async def generate_episode(dataloader, policy, reward, replay_buffer):
    # Sample a prompt
    prompt, target = await dataloader.sample.route()

    # Generate response
    response = await policy.generate.route(prompt)

    # Score the response
    reward_value = await reward.evaluate_response.route(
        prompt=prompt,
        response=response.text,
        target=target,
    )

    # Store for training
    await replay_buffer.add.route(
        Episode(
            prompt_ids=response.prompt_ids,
            response_ids=response.token_ids,
            reward=reward_value,
        )
    )
No retry logic, no resource management, no synchronization code - just your algorithm.
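A fully asynchronous driver can then run rollout generation and training as two concurrent coroutines. The sketch below is illustrative only: the trainer.train_step endpoint, the .call() / .sample.route() usage, and the batch size are assumptions, not a documented TorchForge API, and it reuses the generate_episode function defined above.

import asyncio


async def continuous_rollouts(dataloader, policy, reward, replay_buffer):
    # Keep producing episodes independently of the training loop.
    while True:
        await generate_episode(dataloader, policy, reward, replay_buffer)


async def continuous_training(trainer, replay_buffer):
    # Train whenever enough episodes have accumulated.
    while True:
        batch = await replay_buffer.sample.route(batch_size=8)
        if batch is None:
            await asyncio.sleep(0.1)  # buffer not full enough yet
        else:
            await trainer.train_step.call(batch)


async def main(dataloader, policy, reward, replay_buffer, trainer):
    await asyncio.gather(
        continuous_rollouts(dataloader, policy, reward, replay_buffer),
        continuous_training(trainer, replay_buffer),
    )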
Documentation Paths#
Choose your learning path:
Getting Started: Installation, prerequisites, verification, and your first training run.
Time to first run: ~15 minutes
Tutorials: Step-by-step guides and practical examples for training with TorchForge.
For hands-on development
API Reference: Complete API documentation for customization and extension.
For deep integration
Validation & Partnerships#
TorchForge has been validated in real-world deployments:
Stanford Collaboration: Integration with the Weaver weak verifier project, training models that hill-climb on challenging reasoning benchmarks (MATH, GPQA)
CoreWeave: Large-scale training on clusters of 512 H100 GPUs with smooth, efficient performance
Scale: Tested across hundreds of GPUs with continuous rollouts and asynchronous training
Community#
GitHub: meta-pytorch/forge
Issues: Report bugs and request features
Contributing: CONTRIBUTING.md
Code of Conduct: CODE_OF_CONDUCT.md
Tip
Before starting significant work, signal your intention in the issue tracker to coordinate with maintainers.
Post-Training Focus: Specializes in techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
PyTorch Integration: Built natively on PyTorch with dependencies on PyTorch nightly, Monarch, vLLM, and TorchTitan.
Multi-GPU Support: Designed for distributed training; GRPO training requires a minimum of 3 GPUs
Model Support: Includes pre-configured setups for popular models such as Llama3 8B and Qwen3 1.7B
Indices#
Index - Index of all documented objects
Module Index - Python module index
License: BSD 3-Clause | GitHub: meta-pytorch/forge