Getting Started#
This guide will walk you through installing TorchForge, understanding its dependencies, verifying your setup, and running your first training job.
System Requirements#
Before installing TorchForge, ensure your system meets the following requirements.
| Component | Requirement | Notes |
|---|---|---|
| Operating System | Linux (Fedora/Ubuntu/Debian) | macOS and Windows not currently supported |
| Python | 3.10 or higher | Python 3.11 recommended |
| GPU | NVIDIA with CUDA support | AMD GPUs not currently supported |
| Minimum GPUs | 2+ for SFT, 3+ for GRPO | More GPUs enable larger models |
| CUDA | 12.8 | Required for GPU training |
| RAM | 32GB+ recommended | Depends on model size |
| Disk Space | 50GB+ free | For models, datasets, and checkpoints |
| PyTorch | Nightly build | Latest distributed features (DTensor, FSDP) |
| Monarch | Pre-packaged wheel | Distributed orchestration and actor system |
| vLLM | v0.10.0+ | Fast inference with PagedAttention |
| TorchTitan | Latest | Production training infrastructure |
Prerequisites#
Conda or Miniconda: For environment management. Download from conda.io.
GitHub CLI (gh): Required for downloading pre-packaged dependencies. Install instructions: github.com/cli/cli#installation. After installing, authenticate with:
gh auth login
You can use either HTTPS or SSH as the authentication protocol.
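To confirm authentication succeeded, check the account and protocol in use:
gh auth status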
Git: For cloning the repository. Usually pre-installed on Linux systems. Verify with:
git --version
Installation note: The installation script provides pre-built wheels with PyTorch nightly already included.
Installation#
TorchForge uses pre-packaged wheels for all dependencies, making installation faster and more reliable.
Clone the Repository
git clone https://github.com/meta-pytorch/forge.git
cd forge
Create Conda Environment
conda create -n forge python=3.10
conda activate forge
Run Installation Script
./scripts/install.sh
The installation script will:
Install system dependencies using DNF (or your package manager)
Download pre-built wheels for PyTorch nightly, Monarch, vLLM, and TorchTitan
Install TorchForge and all Python dependencies
Configure the environment for GPU training
Tip
Using sudo instead of conda: If you prefer installing system packages directly rather than through conda, use:
./scripts/install.sh --use-sudo
Warning
When adding packages to pyproject.toml, use uv sync --inexact to avoid removing Monarch and vLLM.
Verifying Your Setup#
After installation, verify that all components are working correctly:
Check GPU Availability
python -c "import torch; print(f'GPUs available: {torch.cuda.device_count()}')"
Expected output:
GPUs available: 2 (or more)
Check CUDA Version
python -c "import torch; print(f'CUDA version: {torch.version.cuda}')"
Expected output:
CUDA version: 12.8
Check All Dependencies
# Check core components
python -c "import torch, forge, monarch, vllm; print('All imports successful')"
# Check specific versions
python -c "
import torch
import forge
import vllm
print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'vLLM: {vllm.__version__}')
print(f'CUDA: {torch.version.cuda}')
print(f'GPUs: {torch.cuda.device_count()}')
"
Verify Monarch
python -c "
from monarch.actor import Actor, this_host
# Test basic Monarch functionality
procs = this_host().spawn_procs({'gpus': 1})
procs.initialized.get()
print('Monarch: Process spawning works')
"
Quick Start Examples#
Now that TorchForge is installed, let’s run some training examples.
Here’s the complete workflow with TorchForge, from installation to training:
# Install dependencies
conda create -n forge python=3.10
conda activate forge
git clone https://github.com/meta-pytorch/forge
cd forge
./scripts/install.sh
# Download a model
hf download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir /tmp/Meta-Llama-3.1-8B-Instruct --exclude "original/consolidated.00.pth"
# Run SFT training (requires 2+ GPUs)
uv run forge run --nproc_per_node 2 \
apps/sft/main.py --config apps/sft/llama3_8b.yaml
# Run GRPO training (requires 3+ GPUs)
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
Example 1: Supervised Fine-Tuning (SFT)#
Fine-tune Llama 3 8B on your data. Requires: 2+ GPUs
Access the Model
Note
Model downloads are no longer required, but Hugging Face authentication is required to access the models. Run huggingface-cli login first if you haven’t already.
Run Training
uv run forge run --nproc_per_node 2 \
    apps/sft/main.py --config apps/sft/llama3_8b.yaml
What’s Happening:
--nproc_per_node 2: Use 2 GPUs for training
apps/sft/main.py: SFT training script
--config apps/sft/llama3_8b.yaml: Configuration file with hyperparameters
TorchTitan handles model sharding across the 2 GPUs
Monarch coordinates the distributed training
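For intuition about the sharding step, here is a standalone plain-PyTorch FSDP sketch. It is illustrative only (a toy model, not TorchForge or TorchTitan code), but it shows the same idea of sharding parameters across 2 GPUs:
# Illustrative only: plain-PyTorch FSDP sharding, roughly the kind of
# work TorchTitan manages for you. Launch with:
#   torchrun --nproc_per_node 2 fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# A toy model standing in for Llama 3 8B
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# FSDP shards parameters, gradients, and optimizer state across the ranks
sharded = FSDP(model)
out = sharded(torch.randn(8, 1024, device="cuda"))
print(f"rank {rank}: forward output shape {tuple(out.shape)}")
dist.destroy_process_group()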
Example 2: GRPO Training#
Train a model using reinforcement learning with GRPO. Requires: 3+ GPUs
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
What’s Happening:
GPU 0: Trainer model (being trained, powered by TorchTitan)
GPU 1: Reference model (frozen baseline, powered by TorchTitan)
GPU 2: Policy model (scoring outputs, powered by vLLM)
Monarch orchestrates all three components
TorchStore handles weight synchronization from training to inference
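To make this layout concrete, here is a hypothetical sketch that spawns one Monarch process group per role, using only the spawn_procs call from the Verify Monarch step above. The variable names and one-GPU-per-role mapping are illustrative assumptions, not TorchForge’s actual GRPO wiring:
# Hypothetical layout sketch: one Monarch process group per GRPO role.
from monarch.actor import this_host

trainer_procs = this_host().spawn_procs({'gpus': 1})  # trainer (TorchTitan)
ref_procs = this_host().spawn_procs({'gpus': 1})      # frozen reference model
policy_procs = this_host().spawn_procs({'gpus': 1})   # policy model (vLLM)

# Block until all three process groups are up
for procs in (trainer_procs, ref_procs, policy_procs):
    procs.initialized.get()
print('All three GRPO roles spawned')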
Understanding Configuration Files#
TorchForge uses YAML configuration files to manage training parameters. Let’s examine a typical config:
# Example: apps/sft/llama3_8b.yaml
model:
name: meta-llama/Meta-Llama-3.1-8B-Instruct
path: /tmp/Meta-Llama-3.1-8B-Instruct
training:
batch_size: 4
learning_rate: 1e-5
num_epochs: 10
gradient_accumulation_steps: 4
distributed:
strategy: fsdp # Managed by TorchTitan
precision: bf16
checkpointing:
save_interval: 1000
output_dir: /tmp/checkpoints
Key Sections:
model: Model path and settings
training: Hyperparameters like batch size and learning rate
distributed: Multi-GPU strategy (FSDP, tensor parallel, etc.) handled by TorchTitan
checkpointing: Where and when to save model checkpoints
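To see how these values land in Python, here is a minimal sketch (not TorchForge’s internal config loader, just PyYAML) that reads the file above and works out the effective batch size, assuming the 2 data-parallel GPUs from Example 1:
# Minimal sketch: read the YAML with PyYAML and compute the effective
# batch size. This is NOT how TorchForge loads configs internally.
import yaml

with open("apps/sft/llama3_8b.yaml") as f:
    cfg = yaml.safe_load(f)

num_gpus = 2  # assumption: the 2-GPU SFT setup above
training = cfg["training"]
# effective batch = batch_size * gradient_accumulation_steps * num_gpus
#                 = 4 * 4 * 2 = 32 samples per optimizer step
effective_batch = (training["batch_size"]
                   * training["gradient_accumulation_steps"]
                   * num_gpus)
print(f"effective batch size: {effective_batch}")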
Next Steps#
Now that you have TorchForge installed and verified:
Explore Examples: Check the apps/ directory for more training examples
Read Tutorials: Follow the Tutorials for step-by-step guides
API Reference: Explore the API Reference for detailed documentation
Getting Help#
If you encounter issues:
Search Issues: Look through GitHub Issues
File a Bug Report: Create a new issue with:
Your system configuration (output of diagnostic command below)
Full error message
Steps to reproduce
Expected vs actual behavior
Diagnostic command:
python -c "
import torch
import forge
try:
import monarch
monarch_status = 'OK'
except Exception as e:
monarch_status = str(e)
try:
import vllm
vllm_version = vllm.__version__
except Exception as e:
vllm_version = str(e)
print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'Monarch: {monarch_status}')
print(f'vLLM: {vllm_version}')
print(f'CUDA: {torch.version.cuda}')
print(f'GPUs: {torch.cuda.device_count()}')
"
Include this output in your bug reports!
Additional Resources#
Contributing Guide: CONTRIBUTING.md
Code of Conduct: CODE_OF_CONDUCT.md
Monarch Documentation: meta-pytorch.org/monarch
vLLM Documentation: docs.vllm.ai
TorchTitan: github.com/pytorch/torchtitan