TorchForge Documentation#

TorchForge is a PyTorch-native library for RL post-training and agentic development, built on the principle that researchers should write algorithms, not infrastructure.

Note

Experimental Status: TorchForge is currently in early development. Expect bugs, incomplete features, and API changes. Please file issues on GitHub for bug reports and feature requests.

Why TorchForge?#

Reinforcement Learning has become essential to frontier AI - from instruction following and reasoning to complex research capabilities. But infrastructure complexity often dominates the actual research.

TorchForge lets you express RL algorithms as naturally as pseudocode, while powerful infrastructure handles distribution, fault tolerance, and optimization underneath.

Core Design Principles#

  • Algorithms, Not Infrastructure: Write your RL logic without distributed systems code

  • Any Degree of Asynchrony: From fully synchronous PPO to fully async off-policy training

  • Composable Components: Mix and match proven frameworks (vLLM, TorchTitan) with custom logic

  • Built on Solid Foundations: Leverages Monarch’s single-controller model for simplified distributed programming

Foundation: The Technology Stack#

TorchForge is built on carefully selected, battle-tested components:

Monarch

Single-controller distributed programming framework that lets you orchestrate a whole cluster the way you'd program a single machine. Provides actor meshes, fault tolerance, and RDMA-based data transfers.

Why it matters: Eliminates SPMD complexity, making distributed RL tractable

https://meta-pytorch.org/monarch
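
To make the single-controller idea concrete, here is a minimal actor sketch in the style of Monarch's getting-started examples. The proc_mesh, Actor, and endpoint names are assumptions drawn from those docs; verify the exact API against the link above.

import asyncio

# Hypothetical sketch of Monarch's actor model; names and signatures
# are assumed from its getting-started docs, not guaranteed.
from monarch.actor import Actor, endpoint, proc_mesh


class Counter(Actor):
    def __init__(self, start: int):
        self.value = start

    @endpoint
    async def incr(self) -> int:
        self.value += 1
        return self.value


async def main():
    # One controller script drives a mesh of processes as if it
    # were a single machine.
    procs = await proc_mesh(gpus=4)
    counters = await procs.spawn("counters", Counter, 0)
    await counters.incr.call()  # broadcast to every actor in the mesh

asyncio.run(main())
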
vLLM

High-throughput, memory-efficient inference engine with PagedAttention and continuous batching.

Why it matters: Handles policy generation efficiently at scale

https://docs.vllm.ai
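
To show what the engine itself provides, here is vLLM's standalone offline-inference API (the model name is just an example); in TorchForge this engine sits behind the policy service used in the code later on this page.

from vllm import LLM, SamplingParams

# PagedAttention and continuous batching happen inside the engine;
# callers simply submit prompts.
llm = LLM(model="Qwen/Qwen3-1.7B")  # example model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain GRPO in one sentence."], params)
print(outputs[0].outputs[0].text)
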
TorchTitan

Meta’s production-grade LLM training platform with FSDP, pipeline parallelism, and tensor parallelism.

Why it matters: Battle-tested training infrastructure proven at scale

https://github.com/pytorch/torchtitan

TorchStore

Distributed, in-memory key-value store for PyTorch tensors built on Monarch, optimized for weight synchronization with automatic DTensor resharding.

Why it matters: Solves the weight transfer bottleneck in async RL

https://github.com/meta-pytorch/torchstore
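
As a rough sketch of that key-value surface, the snippet below assumes the module-level initialize/put/get style shown in the TorchStore README; treat the exact signatures as assumptions and check the repository above.

import torch
import torchstore as ts  # put/get surface assumed; see the repo above


async def sync_weights():
    await ts.initialize()

    # Trainer side: publish updated weights under a key.
    await ts.put("policy/layer0.weight", torch.randn(1024, 1024))

    # Inference side: fetch the latest weights. With DTensors, TorchStore
    # reshards automatically between trainer and inference layouts.
    return await ts.get("policy/layer0.weight")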

What You Can Build#

Supervised Fine-Tuning

Adapt foundation models to specific tasks using labeled data with efficient multi-GPU training.

GRPO Training

Train models with Group Relative Policy Optimization (GRPO) to align them with human preferences (see the advantage sketch after this list).

Asynchronous RL

Continuous rollout generation with non-blocking training for maximum throughput.

Code Execution

Safe, sandboxed code execution environments for RL on coding tasks (RLVR).

Tool Integration

Extensible environment system for agents that interact with tools and APIs.

Custom Workflows

Build your own components and compose them naturally with existing infrastructure.
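
One concrete detail behind the GRPO item above: instead of a learned value function, GRPO baselines each reward against the other responses sampled for the same prompt. A minimal sketch of that group-relative advantage computation (illustrative, not TorchForge's internal implementation):

import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for GRPO.

    rewards has shape (num_prompts, group_size): G sampled responses
    per prompt, each scored by the reward function. A response earns
    positive advantage by beating its own group's average.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)


# One prompt, four sampled responses:
print(grpo_advantages(torch.tensor([[0.1, 0.9, 0.4, 0.6]])))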

Requirements at a Glance#

Before diving in, check out Getting Started and ensure your system meets the requirements.

Writing RL Code#

With TorchForge, your RL logic looks like pseudocode:

async def generate_episode(dataloader, policy, reward, replay_buffer):
    # Sample a prompt
    prompt, target = await dataloader.sample.route()

    # Generate response
    response = await policy.generate.route(prompt)

    # Score the response
    reward_value = await reward.evaluate_response.route(
        prompt=prompt,
        response=response.text,
        target=target
    )

    # Store for training
    await replay_buffer.add.route(
        Episode(prompt_ids=response.prompt_ids,
                response_ids=response.token_ids,
                reward=reward_value)
    )

No retry logic, no resource management, no synchronization code - just your algorithm.
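
Because generate_episode is an ordinary async function, continuous, non-blocking rollout generation is just a loop around it. A hypothetical driver (the shutdown event is an assumption for illustration, not a TorchForge API):

import asyncio


async def continuous_rollouts(dataloader, policy, reward, replay_buffer,
                              shutdown: asyncio.Event):
    # Rollouts keep flowing while a trainer drains the replay buffer
    # concurrently; how far off-policy to run is your choice.
    while not shutdown.is_set():
        await generate_episode(dataloader, policy, reward, replay_buffer)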

Documentation Paths#

Choose your learning path:

🚀 Getting Started

Installation, prerequisites, verification, and your first training run.

Time to first run: ~15 minutes

💻 Tutorials

Step-by-step guides and practical examples for training with TorchForge.

For hands-on development

📖 API Reference

Complete API documentation for customization and extension.

For deep integration


Validation & Partnerships#

TorchForge has been validated in real-world deployments:

  • Stanford Collaboration: Integration with the Weaver weak verifier project, training models that hill-climb on challenging reasoning benchmarks (MATH, GPQA)

  • CoreWeave: Large-scale training on 512 H100 GPU clusters with smooth, efficient performance

  • Scale: Tested across hundreds of GPUs with continuous rollouts and asynchronous training

Community#

Tip

Before starting significant work, signal your intention in the issue tracker to coordinate with maintainers.

Key project facts for anyone planning a contribution:

  • Post-Training Focus: Specializes in techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)

  • PyTorch Integration: Built natively on PyTorch with dependencies on PyTorch nightly, Monarch, vLLM, and TorchTitan

  • Multi-GPU Support: Designed for distributed training; GRPO training requires a minimum of 3 GPUs

  • Model Support: Includes pre-configured setups for popular models like Llama3 8B and Qwen3 1.7B

License: BSD 3-Clause | GitHub: meta-pytorch/forge