Examples
ping_pong.py: Demonstrates the basics of Monarch’s Actor/endpoint API with a ping-pong communication example
crawler.py: Demonstrates Monarch’s actor API and many-to-one communications with a web crawler example
spmd_ddp.py: Shows how to run PyTorch’s Distributed Data Parallel (DDP) using SPMDActor
Interactive SPMD Job: Shows how to use serve() and run_spmd() for interactive SPMD training with job caching and debugging
kubernetes_ddp.py: Extends the DDP example to run on Kubernetes using the MonarchMesh CRD and operator
grpo_actor.py: Implements a distributed PPO-like reinforcement learning algorithm using the Monarch actor framework
distributed_tensors.py: Shows how to dispatch tensors and tensor-level operations to a distributed mesh of workers and GPUs
debugging.py: Shows how to use the Monarch debugger to debug a distributed program
Multinode Slurm Tutorial: Walks through multinode distributed training, using Monarch and Slurm to run an SPMD training job
Running on Kubernetes using SkyPilot: Shows how to run Monarch on Kubernetes and cloud VMs via SkyPilot