# TorchForge Documentation

**TorchForge** is a PyTorch-native library for RL post-training and agentic development, built on the principle that **researchers should write algorithms, not infrastructure**.

```{note}
**Experimental Status:** TorchForge is currently in early development. Expect bugs, incomplete features, and API changes. Please file issues on [GitHub](https://github.com/meta-pytorch/forge) for bug reports and feature requests.
```

## Why TorchForge?

Reinforcement Learning has become essential to frontier AI - from instruction following and reasoning to complex research capabilities. But infrastructure complexity often dominates the actual research. TorchForge lets you **express RL algorithms as naturally as pseudocode**, while powerful infrastructure handles distribution, fault tolerance, and optimization underneath.

### Core Design Principles

- **Algorithms, Not Infrastructure**: Write your RL logic without distributed systems code
- **Any Degree of Asynchrony**: From fully synchronous PPO to fully async off-policy training
- **Composable Components**: Mix and match proven frameworks (vLLM, TorchTitan) with custom logic
- **Built on Solid Foundations**: Leverages Monarch's single-controller model for simplified distributed programming

## Foundation: The Technology Stack

TorchForge is built on carefully selected, battle-tested components:

::::{grid} 1 1 2 2
:gutter: 3

:::{grid-item-card} **Monarch**
:link: https://meta-pytorch.org/monarch

Single-controller distributed programming framework that orchestrates clusters like you'd program a single machine. Provides actor meshes, fault tolerance, and RDMA-based data transfers.

**Why it matters:** Eliminates SPMD complexity, making distributed RL tractable
:::

:::{grid-item-card} **vLLM**
:link: https://docs.vllm.ai

High-throughput, memory-efficient inference engine with PagedAttention and continuous batching.

**Why it matters:** Handles policy generation efficiently at scale
:::

:::{grid-item-card} **TorchTitan**
:link: https://github.com/pytorch/torchtitan

Meta's production-grade LLM training platform with FSDP, pipeline parallelism, and tensor parallelism.

**Why it matters:** Battle-tested training infrastructure proven at scale
:::

:::{grid-item-card} **TorchStore**
:link: https://github.com/meta-pytorch/torchstore

Distributed, in-memory key-value store for PyTorch tensors built on Monarch, optimized for weight synchronization with automatic DTensor resharding.

**Why it matters:** Solves the weight transfer bottleneck in async RL
:::

::::

## What You Can Build

::::{grid} 1 1 2 3
:gutter: 2

:::{grid-item-card} Supervised Fine-Tuning
Adapt foundation models to specific tasks using labeled data with efficient multi-GPU training.
:::

:::{grid-item-card} GRPO Training
Train models with Group Relative Policy Optimization for alignment with human preferences.
:::

:::{grid-item-card} Asynchronous RL
Continuous rollout generation with non-blocking training for maximum throughput.
:::

:::{grid-item-card} Code Execution
Safe, sandboxed code execution environments for RL on coding tasks (RLVR).
:::

:::{grid-item-card} Tool Integration
Extensible environment system for agents that interact with tools and APIs.
:::

:::{grid-item-card} Custom Workflows
Build your own components and compose them naturally with existing infrastructure.
:::

::::

## Requirements at a Glance

Before diving in, check out {doc}`getting_started` and ensure your system meets the requirements.
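As a quick sanity check, you can confirm that PyTorch sees enough GPUs before your first run. The snippet below uses only standard PyTorch calls and is not a TorchForge API; the three-GPU figure is the minimum noted for GRPO training under Key Features below.

```python
import torch

# Minimal environment check: the GRPO recipe needs at least 3 GPUs
# (see "Multi-GPU Support" under Key Features below).
cuda_ok = torch.cuda.is_available()
num_gpus = torch.cuda.device_count() if cuda_ok else 0

print(f"CUDA available: {cuda_ok}, GPUs visible: {num_gpus}")
if num_gpus >= 3:
    print("Enough GPUs for GRPO training.")
else:
    print("Fewer than 3 GPUs visible - see the getting_started prerequisites.")
```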
## Writing RL Code

With TorchForge, your RL logic looks like pseudocode:

```python
async def generate_episode(dataloader, policy, reward, replay_buffer):
    # Sample a prompt
    prompt, target = await dataloader.sample.route()

    # Generate a response
    response = await policy.generate.route(prompt)

    # Score the response
    reward_value = await reward.evaluate_response.route(
        prompt=prompt, response=response.text, target=target
    )

    # Store the episode for training
    await replay_buffer.add.route(
        Episode(
            prompt_ids=response.prompt_ids,
            response_ids=response.token_ids,
            reward=reward_value,
        )
    )
```

No retry logic, no resource management, no synchronization code - just your algorithm.

## Documentation Paths

Choose your learning path:

::::{grid} 1 1 2 2
:gutter: 3

:::{grid-item-card} 🚀 Getting Started
:link: getting_started
:link-type: doc

Installation, prerequisites, verification, and your first training run.

**Time to first run: ~15 minutes**
:::

:::{grid-item-card} 💻 Tutorials
:link: tutorials
:link-type: doc

Step-by-step guides and practical examples for training with TorchForge.

**For hands-on development**
:::

:::{grid-item-card} 📖 API Reference
:link: api
:link-type: doc

Complete API documentation for customization and extension.

**For deep integration**
:::

::::

## Validation & Partnerships

TorchForge has been validated in real-world deployments:

- **Stanford Collaboration**: Integration with the Weaver weak verifier project, training models that hill-climb on challenging reasoning benchmarks (MATH, GPQA)
- **CoreWeave**: Large-scale training on 512 H100 GPU clusters with smooth, efficient performance
- **Scale**: Tested across hundreds of GPUs with continuous rollouts and asynchronous training

## Community

- **GitHub**: [meta-pytorch/forge](https://github.com/meta-pytorch/forge)
- **Issues**: [Report bugs and request features](https://github.com/meta-pytorch/forge/issues)
- **Contributing**: [CONTRIBUTING.md](https://github.com/meta-pytorch/forge/blob/main/CONTRIBUTING.md)
- **Code of Conduct**: [CODE_OF_CONDUCT.md](https://github.com/meta-pytorch/forge/blob/main/CODE_OF_CONDUCT.md)

```{tip}
Before starting significant work, signal your intention in the issue tracker to coordinate with maintainers.
```

## Key Features

* **Post-Training Focus**: Specializes in techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO)
* **PyTorch Integration**: Built natively on PyTorch with dependencies on [PyTorch nightly](https://pytorch.org/get-started/locally/), [Monarch](https://meta-pytorch.org/monarch), [vLLM](https://docs.vllm.ai/en/latest/), and [TorchTitan](https://github.com/pytorch/torchtitan)
* **Multi-GPU Support**: Designed for distributed training, with a minimum of 3 GPUs required for GRPO training
* **Model Support**: Includes pre-configured setups for popular models such as Llama3 8B and Qwen3 1.7B

```{toctree}
:maxdepth: 2
:caption: Documentation

getting_started
tutorials
api
```

## Indices

* {ref}`genindex` - Index of all documented objects
* {ref}`modindex` - Python module index

---

**License**: BSD 3-Clause | **GitHub**: [meta-pytorch/forge](https://github.com/meta-pytorch/forge)