Examples

  • ping_pong.py: Demonstrates the basics of Monarch’s Actor/endpoint API with a ping-pong communication example

  • crawler.py: Demonstrates Monarch’s actor API and many-to-one communications with a web crawler example

  • spmd_ddp.py: Shows how to run PyTorch’s Distributed Data Parallel (DDP) using SPMDActor

  • Interactive SPMD Job: Shows how to use serve() and run_spmd() for interactive SPMD training with job caching and debugging

  • kubernetes_ddp.py: Extends the DDP example to run on Kubernetes using the MonarchMesh CRD and operator

  • grpo_actor.py: Implements a distributed PPO-like reinforcement learning algorithm using the Monarch actor framework

  • grpo_yield.py: Same GRPO task as grpo_actor.py, wired with direct port messaging instead of queue actors and RDMABuffer; each actor’s run endpoint is structured like a Python generator that yields values to, and receives values from, the next actor in the ring

  • kubernetes_grpo.py: Extends the GRPO example to run on Kubernetes using MonarchMesh CRDs, fine-tuning the open-source Qwen3.5-0.8B model on the GSM8K math dataset

  • distributed_tensors.py: Shows how to dispatch tensors and tensor-level operations to a distributed mesh of workers and GPUs

  • debugging.py: Shows how to use the Monarch debugger to debug a distributed program

  • otel_collector.py: Exports Monarch metrics and logs to an OpenTelemetry Collector deployed on Kubernetes, with Grafana for visualization

  • Multinode Slurm Tutorial: Walks through multinode distributed training, using Monarch and Slurm to run an SPMD training job

  • Running on Kubernetes using SkyPilot: Shows how to run Monarch on Kubernetes and cloud VMs via SkyPilot
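
The generator-style structure mentioned for grpo_yield.py can be pictured with ordinary Python generators. The sketch below is plain Python only and does not use Monarch or reflect the example's actual code: the names `stage`, `run_ring`, and `actor_0` through `actor_2` are hypothetical, and a local driver loop stands in for the port messaging that would connect real actors. It shows only the pattern of a routine that receives a value, does some work, and yields a result to the next participant in a ring.

```python
from typing import Generator, List, Tuple


def stage(name: str, log: List[str]) -> Generator[int, int, None]:
    """Hypothetical ring stage: receive a value, record it, yield value + 1."""
    value = yield 0  # priming yield; the driver discards this 0
    while True:
        log.append(f"{name} received {value}")
        # Yield the result onward and wait for the next incoming value.
        value = yield value + 1


def run_ring(names: List[str], rounds: int, start: int = 0) -> Tuple[int, List[str]]:
    """Drive the stages in ring order, feeding each output to the next stage."""
    log: List[str] = []
    stages = [stage(name, log) for name in names]
    for s in stages:
        next(s)  # advance each generator to its first yield
    value = start
    for _ in range(rounds):
        for s in stages:
            value = s.send(value)  # deliver a value, get the stage's output
    return value, log


final, log = run_ring(["actor_0", "actor_1", "actor_2"], rounds=2)
print(final)
for line in log:
    print(line)
```

Each `send` call plays the role of a message arriving at a stage, and each `yield` plays the role of a message leaving it; in a distributed setting the driver loop would be replaced by whatever transport links the actors.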