Crate monarch_tensor_worker


A hyperactor-based implementation of a PyTorch worker actor.

The worker is responsible for executing PyTorch operations on a local device. It assumes it has exclusive access to device resources, and manages concurrency internally via device-specific constructs (CUDA stream, threads, etc.).

This crate is a port of monarch/python/controller/worker.py, but it has drifted from the original, and the resulting gaps still need to be reconciled. The main gaps are:

  • Support for record and replay
  • Debugger support
  • General drift in existing messages
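The exclusive-ownership model described above (one worker owns the device and serializes work on it internally) can be illustrated with a minimal sketch using only the standard library. Everything here is hypothetical and not part of this crate: `FakeDevice`, `Op`, `DeviceThread`, and `spawn_device` are stand-ins, with a plain `Vec<f32>` playing the role of device memory and a dedicated thread with a FIFO channel standing in for a device-specific construct such as a CUDA stream.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for device state. Only the device thread touches it.
struct FakeDevice {
    buffer: Vec<f32>,
}

// Hypothetical operations, executed in the order they are submitted.
enum Op {
    Fill(f32),
    Scale(f32),
    // Carries a reply channel so the caller can wait for the result.
    Sum(Sender<f32>),
}

// Handle used by callers; the device itself never leaves its thread.
pub struct DeviceThread {
    tx: Sender<Op>,
    handle: JoinHandle<()>,
}

pub fn spawn_device(len: usize) -> DeviceThread {
    let (tx, rx) = channel::<Op>();
    let handle = thread::spawn(move || {
        // The device is created and owned here, so no locking is needed.
        let mut dev = FakeDevice { buffer: vec![0.0; len] };
        // The loop ends once every sender has been dropped.
        for op in rx {
            match op {
                Op::Fill(v) => dev.buffer.iter_mut().for_each(|x| *x = v),
                Op::Scale(k) => dev.buffer.iter_mut().for_each(|x| *x *= k),
                Op::Sum(reply) => {
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(dev.buffer.iter().sum());
                }
            }
        }
    });
    DeviceThread { tx, handle }
}

impl DeviceThread {
    pub fn fill(&self, v: f32) {
        self.tx.send(Op::Fill(v)).expect("device thread exited");
    }

    pub fn scale(&self, k: f32) {
        self.tx.send(Op::Scale(k)).expect("device thread exited");
    }

    // Blocks until all previously queued operations have run.
    pub fn sum(&self) -> f32 {
        let (reply_tx, reply_rx) = channel();
        self.tx.send(Op::Sum(reply_tx)).expect("device thread exited");
        reply_rx.recv().expect("device thread exited")
    }

    // Closes the queue and waits for the device thread to finish.
    pub fn shutdown(self) {
        let DeviceThread { tx, handle } = self;
        drop(tx);
        handle.join().expect("device thread panicked");
    }
}
```

Because the channel preserves submission order, a caller that queues `fill`, then `scale`, then `sum` observes the result of all three in sequence, which is the property the sketch borrows from stream-style execution.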

Modules§

bootstrap
device_mesh
pipe
py_pipe
stream
test_util

Structs§

WorkerActor
A PyTorch runtime instance, operating on a single accelerator device, controlled via hyperactor messaging.

Enums§

AssignRankMessage
Worker messages. These define the observable behavior of the worker, so the documentations here

Traits§

AssignRankMessageClient
The custom client trait for this message type.
AssignRankMessageHandler
The custom handler trait for this message type.
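The pairing of a message enum with a client trait and a handler trait can be sketched in plain Rust as follows. This is an illustration of the general pattern only, not the crate's generated code: `RankMessage`, its `AssignRank { rank, world_size }` fields, `RankMessageClient`, `RankMessageHandler`, `SketchWorker`, and `drain` are all hypothetical names, and a standard-library channel stands in for hyperactor messaging.

```rust
use std::sync::mpsc::{Receiver, Sender};

// Hypothetical message enum; the variant and its fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum RankMessage {
    AssignRank { rank: usize, world_size: usize },
}

// Handler side: one method per variant, plus a dispatcher.
pub trait RankMessageHandler {
    fn assign_rank(&mut self, rank: usize, world_size: usize) -> Result<(), String>;

    // Routes each variant to its matching method.
    fn handle(&mut self, msg: RankMessage) -> Result<(), String> {
        match msg {
            RankMessage::AssignRank { rank, world_size } => self.assign_rank(rank, world_size),
        }
    }
}

// Client side: typed methods that build a variant and post it.
pub trait RankMessageClient {
    fn assign_rank(&self, rank: usize, world_size: usize) -> Result<(), String>;
}

// Here the "client" is just the sending half of a channel.
impl RankMessageClient for Sender<RankMessage> {
    fn assign_rank(&self, rank: usize, world_size: usize) -> Result<(), String> {
        self.send(RankMessage::AssignRank { rank, world_size })
            .map_err(|e| e.to_string())
    }
}

// Hypothetical worker that records the rank it was assigned.
#[derive(Default)]
pub struct SketchWorker {
    pub rank: Option<usize>,
    pub world_size: Option<usize>,
}

impl RankMessageHandler for SketchWorker {
    fn assign_rank(&mut self, rank: usize, world_size: usize) -> Result<(), String> {
        if rank >= world_size {
            return Err(format!("rank {rank} out of range for world size {world_size}"));
        }
        self.rank = Some(rank);
        self.world_size = Some(world_size);
        Ok(())
    }
}

// Handles every message currently queued and returns how many were processed.
pub fn drain(worker: &mut SketchWorker, rx: &Receiver<RankMessage>) -> Result<usize, String> {
    let mut handled = 0;
    for msg in rx.try_iter() {
        worker.handle(msg)?;
        handled += 1;
    }
    Ok(handled)
}
```

In this sketch the caller never constructs an enum variant directly: it calls `assign_rank` on the client, and the worker implements `assign_rank` on the handler, so both sides share one typed method signature per message.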