# Get Started

Welcome to Monarch! This guide will help you get up and running with Monarch, a distributed execution engine for PyTorch that delivers a high-quality user experience at cluster scale.

## What is Monarch?

Monarch extends PyTorch's capabilities so that it runs efficiently on distributed systems. It keeps the familiar PyTorch API while handling the complexities of distributed execution, making it easier to scale your deep learning workloads across multiple GPUs and nodes.

## Prerequisites

Before installing Monarch, ensure you have:

- A Linux system (Monarch is currently only supported on Linux)
- Python 3.10 or later
- CUDA-compatible GPU(s)
- Basic familiarity with PyTorch

## Installation

### Quick Installation

The simplest way to install Monarch is via pip:

```bash
pip install torchmonarch-nightly
```

### Manual Installation

For more control or for development purposes, you can install Monarch manually:

```bash
# Create and activate the conda environment
conda create -n monarchenv python=3.10 -y
conda activate monarchenv

# Install the nightly Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup toolchain install nightly
rustup default nightly

# Install non-Python dependencies
conda install libunwind -y

# Install the correct CUDA and CUDA toolkit versions for your machine
sudo dnf install cuda-toolkit-12-0 cuda-12-0

# Install clang-dev and nccl-dev
sudo dnf install clang-devel libnccl-devel
# Or, in some environments, the following may be necessary instead
conda install -c conda-forge clangdev nccl
conda update -n monarchenv --all -c conda-forge -y

# Install build dependencies
pip install -r build-requirements.txt
# Install test dependencies
pip install -r python/tests/requirements.txt

# Build and install Monarch
pip install --no-build-isolation .
# or set it up for development
pip install --no-build-isolation -e .
```
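After building, a quick sanity check before running the full test suite is to import the package (the examples below all import Monarch as `monarch`):

```bash
# The import should succeed without errors if the build worked
python -c "import monarch; print('Monarch imported successfully')"
```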
## Verifying Your Installation

After installation, you can verify that Monarch is working correctly by running the unit tests:

```bash
pytest python/tests/ -v -m "not oss_skip"
```

## Basic Usage

Here's a simple example to get you started with Monarch:

```python
import torch
import monarch as mon

# Initialize Monarch
mon.init()

# Create a simple model
model = torch.nn.Linear(10, 5)

# Distribute the model using Monarch
distributed_model = mon.distribute(model)

# Create some input data
input_data = torch.randn(8, 10)

# Run a forward pass
output = distributed_model(input_data)

# Clean up
mon.shutdown()
```

## Example: Ping Pong

One of the simplest examples of using Monarch is the "ping pong" example, which demonstrates basic communication between processes:

```python
import monarch as mon
import torch

# Initialize Monarch
mon.init()

# Get the current process rank and world size
rank = mon.get_rank()
world_size = mon.get_world_size()

# Create a tensor to send
send_tensor = torch.tensor([rank], dtype=torch.float32)

# Determine the destination rank (the next rank in the ring)
dst_rank = (rank + 1) % world_size

# Send the tensor to the destination rank
mon.send(send_tensor, dst_rank)

# Receive a tensor from the source rank (the previous rank in the ring)
src_rank = (rank - 1) % world_size
recv_tensor = torch.zeros(1, dtype=torch.float32)
mon.recv(recv_tensor, src_rank)

print(f"Rank {rank} received {recv_tensor.item()} from rank {src_rank}")

# Clean up
mon.shutdown()
```

## Distributed Data Parallel Training

Monarch makes it easy to implement distributed data parallel training:

```python
import monarch as mon
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize Monarch
mon.init()

# Create a simple model and distribute it
model = nn.Linear(10, 5)
model = mon.distribute(model)

# Create the optimizer and loss function
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# A toy in-memory dataset stands in for your distributed data loader
data_loader = [(torch.randn(8, 10), torch.randn(8, 5)) for _ in range(4)]

# Training loop
for epoch in range(10):
    for data, target in data_loader:
        # Forward pass
        output = model(data)
        loss = criterion(output, target)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Clean up
mon.shutdown()
```

## Next Steps

Now that you've got the basics, you can:

1. Check out the [Examples](./generated/examples/index) directory for more detailed demonstrations
2. Explore the [API documentation](rust-api) for a complete reference

## Troubleshooting

Remember that Monarch is currently in an experimental stage, so you may encounter bugs or incomplete features. Contributions and bug reports are welcome! If you encounter issues:

- Make sure your CUDA environment is properly set up (a quick check is sketched after this list)
- Check that you're using a compatible version of PyTorch
- Verify that all dependencies are installed correctly
- Consult the [GitHub repository](https://github.com/pytorch-labs/monarch) for known issues
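Before debugging Monarch itself, it can help to confirm that plain PyTorch can see your GPUs; this small check uses only standard PyTorch calls:

```python
import torch

# Report the PyTorch build and the GPUs it can see
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```

If CUDA is unavailable here, fix your PyTorch/CUDA installation first; Monarch will not be able to use GPUs that PyTorch itself cannot see.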