Crate monarch_tensor_worker

A hyperactor-based implementation of a PyTorch worker actor.

The worker is responsible for executing PyTorch operations on a local device. It assumes it has exclusive access to device resources, and manages concurrency internally via device-specific constructs (CUDA stream, threads, etc.).
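The exclusive-ownership model described above can be illustrated with a minimal sketch that uses only the Rust standard library. It does not use hyperactor, PyTorch, or CUDA: `ToyWorker`, `Op`, and the `Vec<f32>` standing in for device memory are hypothetical names chosen for illustration, not items from this crate. The idea shown is that a single thread owns the device state and every operation reaches it through a message channel, so operations run in order without any locking on the state itself.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical operations; these are NOT the crate's real worker messages.
enum Op {
    /// Add a constant to every element of the toy "device" buffer.
    AddScalar(f32),
    /// Reply with the sum of the buffer on the provided channel.
    Sum(Sender<f32>),
}

/// Hypothetical worker: one thread with exclusive ownership of the buffer.
struct ToyWorker {
    tx: Sender<Op>,
    handle: JoinHandle<()>,
}

impl ToyWorker {
    fn spawn(mut buffer: Vec<f32>) -> Self {
        let (tx, rx) = channel::<Op>();
        let handle = thread::spawn(move || {
            // Messages are processed strictly in arrival order, loosely
            // analogous to operations enqueued on a single stream.
            for op in rx {
                match op {
                    Op::AddScalar(x) => buffer.iter_mut().for_each(|v| *v += x),
                    Op::Sum(reply) => {
                        // Ignore the error if the caller stopped waiting.
                        let _ = reply.send(buffer.iter().sum::<f32>());
                    }
                }
            }
        });
        ToyWorker { tx, handle }
    }

    fn add_scalar(&self, x: f32) {
        self.tx.send(Op::AddScalar(x)).expect("worker thread exited");
    }

    fn sum(&self) -> f32 {
        let (reply_tx, reply_rx) = channel();
        self.tx.send(Op::Sum(reply_tx)).expect("worker thread exited");
        reply_rx.recv().expect("worker thread exited")
    }

    fn shutdown(self) {
        let ToyWorker { tx, handle } = self;
        // Dropping the only sender ends the worker's receive loop.
        drop(tx);
        handle.join().expect("worker thread panicked");
    }
}

/// Small usage example: enqueue one update, then read the result back.
fn demo() -> f32 {
    let worker = ToyWorker::spawn(vec![1.0, 2.0, 3.0]);
    worker.add_scalar(1.0);
    let total = worker.sum();
    worker.shutdown();
    total
}
```

Because callers only hold a `Sender`, they cannot touch the buffer directly; the worker thread is the sole owner, which is the property the paragraph above relies on.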

This is a port of monarch/python/controller/worker.py, but it has gaps due to drift that still need to be reconciled. The main gaps are:

Support for record and replay
Debugger support
General drift in existing messages

Modules§

device_mesh
stream
test_util

Structs§

WorkerActor: A PyTorch runtime instance, operating on a single accelerator device, controlled via hyperactor messaging.

Enums§

AssignRankMessage: Worker messages. These define the observable behavior of the worker, so the documentations here

Traits§

AssignRankMessageClient: The custom client trait for this message type.
AssignRankMessageHandler: The custom handler trait for this message type.
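As a rough illustration of how a client trait and a handler trait can pair up around one message enum, here is a minimal standard-library sketch. It assumes that the client side enqueues messages and the handler side receives them, which the one-line descriptions above suggest but do not spell out. Every name in it (`ToyMessage`, `ToyMessageClient`, `ToyMessageHandler`, `ToyActor`, and the `rank` field) is hypothetical; the real traits are generated for hyperactor and their signatures may differ.

```rust
use std::sync::mpsc::{channel, SendError, Sender};

/// Hypothetical message enum; the variant and its field are illustrative only.
#[derive(Debug, PartialEq)]
enum ToyMessage {
    AssignRank { rank: usize },
}

/// Handler side: one method per message variant, implemented by the actor.
trait ToyMessageHandler {
    fn assign_rank(&mut self, rank: usize);

    /// Dispatch an incoming message to the matching handler method.
    fn handle(&mut self, msg: ToyMessage) {
        match msg {
            ToyMessage::AssignRank { rank } => self.assign_rank(rank),
        }
    }
}

/// Client side: same method name, but a call only enqueues a message.
trait ToyMessageClient {
    fn assign_rank(&self, rank: usize) -> Result<(), SendError<ToyMessage>>;
}

impl ToyMessageClient for Sender<ToyMessage> {
    fn assign_rank(&self, rank: usize) -> Result<(), SendError<ToyMessage>> {
        self.send(ToyMessage::AssignRank { rank })
    }
}

/// Hypothetical actor that records the rank it was assigned.
#[derive(Default)]
struct ToyActor {
    rank: Option<usize>,
}

impl ToyMessageHandler for ToyActor {
    fn assign_rank(&mut self, rank: usize) {
        self.rank = Some(rank);
    }
}

/// Send one message through the client trait, then drain it into the actor.
fn demo_assign_rank() -> Option<usize> {
    let (tx, rx) = channel::<ToyMessage>();
    tx.assign_rank(3).expect("receiver dropped");
    drop(tx); // close the channel so the loop below terminates
    let mut actor = ToyActor::default();
    for msg in rx {
        actor.handle(msg);
    }
    actor.rank
}
```

Keeping the two traits separate lets callers depend only on the sending interface while the actor alone implements the behavior.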
