Monarch 🦋

Monarch is a distributed programming framework for PyTorch based on scalable actor messaging. It provides:

  1. Remote actors with scalable messaging: Actors are grouped into collections called meshes and messages can be broadcast to all members.

  2. Fault tolerance through supervision trees: Actors and processes form a tree and failures propagate up the tree, providing good default error behavior and enabling fine-grained fault recovery.

  3. Point-to-point RDMA transfers: Cheap registration of any GPU or CPU memory in a process, with one-sided transfers based on libibverbs.

  4. Distributed tensors: Actors can work with tensor objects sharded across processes.
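To make the supervision-tree idea in item 2 more concrete, here is a minimal, framework-free sketch in plain Python. It does not use Monarch and is not how Monarch implements supervision; the names `Node`, `on_child_failure`, and `restart_trainer` are hypothetical and exist only for illustration. The sketch assumes that a failure walks up the tree until some ancestor handles it, and that an unhandled failure surfaces at the root.

```python
from typing import Callable, Optional

# Hypothetical handler signature: (name of failed node, error) -> True if handled.
FailureHandler = Callable[[str, Exception], bool]


class Node:
    """Toy supervision-tree node standing in for a process or an actor."""

    def __init__(
        self,
        name: str,
        parent: Optional["Node"] = None,
        on_child_failure: Optional[FailureHandler] = None,
    ) -> None:
        self.name = name
        self.parent = parent
        self.on_child_failure = on_child_failure

    def spawn(self, name: str, on_child_failure: Optional[FailureHandler] = None) -> "Node":
        # A spawned node records its parent, which is what forms the tree.
        return Node(name, parent=self, on_child_failure=on_child_failure)

    def fail(self, error: Exception) -> str:
        """Propagate a failure upward; return the name of the node that handled it."""
        node = self.parent
        while node is not None:
            if node.on_child_failure is not None and node.on_child_failure(self.name, error):
                return node.name
            node = node.parent
        # Default in this toy model: an unhandled failure surfaces at the root.
        raise RuntimeError(f"unhandled failure in {self.name}") from error


def restart_trainer(failed: str, error: Exception) -> bool:
    # A supervisor that only knows how to recover trainers.
    return failed.startswith("trainer")


root = Node("root")
supervisor = root.spawn("supervisor", on_child_failure=restart_trainer)
trainer = supervisor.spawn("trainer_0")
logger = supervisor.spawn("logger")

# The supervisor recognizes trainer failures and handles them locally.
handled_by = trainer.fail(ValueError("out of memory"))

# Nobody handles logger failures, so the error propagates past the root.
try:
    logger.fail(ValueError("disk full"))
    unhandled_message = None
except RuntimeError as exc:
    unhandled_message = str(exc)
```

The point of the design is that recovery logic lives with whichever ancestor knows how to recover, while everything else falls through to a single default.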

Monarch code imperatively describes how to create processes and actors using a simple Python API:

from monarch.actor import Actor, endpoint, this_host

# spawn 8 trainer processes, one for each GPU
training_procs = this_host().spawn_procs({"gpus": 8})


# define the actor to run on each process
class Trainer(Actor):
    @endpoint
    def train(self, step: int): ...


# create the trainers
trainers = training_procs.spawn("trainers", Trainer)

# tell all the trainers to take a step
fut = trainers.train.call(step=0)

# wait for all trainers to complete
fut.get()
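For readers who want to experiment with the call-then-wait pattern above without installing Monarch, the following toy model mimics it with the standard library. It is only an analogy under stated assumptions: it uses threads in one process rather than remote processes, and `ToyTrainer`, `ToyMesh`, `ToyFuture`, and `call_train` are hypothetical names, not part of Monarch's API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


class ToyTrainer:
    """Stand-in for an actor; each member is identified by its rank."""

    def __init__(self, rank: int) -> None:
        self.rank = rank

    def train(self, step: int) -> str:
        return f"rank {self.rank} finished step {step}"


class ToyFuture:
    """Single handle over the replies of every member."""

    def __init__(self, pending: List[Future]) -> None:
        self._pending = pending

    def get(self) -> List[str]:
        # Block until every member has replied; results stay in rank order.
        return [f.result() for f in self._pending]


class ToyMesh:
    """Hypothetical collection of actors that fans one call out to all members."""

    def __init__(self, size: int) -> None:
        self.members = [ToyTrainer(rank) for rank in range(size)]
        self.pool = ThreadPoolExecutor(max_workers=size)

    def call_train(self, step: int) -> ToyFuture:
        # Send the same message to every member without waiting for replies.
        return ToyFuture([self.pool.submit(m.train, step) for m in self.members])

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)


mesh = ToyMesh(size=8)
fut = mesh.call_train(step=0)  # returns immediately
results = fut.get()            # waits for all 8 members
mesh.shutdown()
```

As in the Monarch snippet, the caller issues one call against the whole collection and waits on one handle, rather than looping over members by hand.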

Note: Monarch is currently supported only on Linux systems.

Getting Started

Here are some suggested steps to get started with Monarch:

  1. Installation: See the Install guide for instructions on installing Monarch.

  2. Getting Started: The Getting Started guide provides an introduction to Monarch’s core API.

  3. Explore Examples: Review the Examples to see Monarch in action.

  4. Dive Deeper: Explore the API Documentation for more detailed information.

  5. Deep Understanding of Actors: Gain comprehensive knowledge of Actors, the foundational building blocks of Monarch.

  6. Monitoring Tools: Inspect running meshes with the Admin TUI (terminal) or the Monarch Dashboard (web GUI).

License

Monarch is BSD-3 licensed, as found in the LICENSE file.

  • Terms of Use: https://opensource.fb.com/legal/terms

  • Privacy Policy: https://opensource.fb.com/legal/privacy

Community

We welcome contributions from the community! If you’re interested in contributing, please:

  1. Check the GitHub repository

  2. Review existing issues or create a new one

  3. Discuss your proposed changes before starting work

  4. Submit a pull request with your changes

Examples and blogs