monarch.rdma#

The monarch.rdma module provides Remote Direct Memory Access (RDMA) support for high-performance networking and zero-copy data transfers between processes. See the Point-to-Point RDMA guide for an overview.

RDMA Buffer#

class monarch.rdma.RDMABuffer(data)[source]#

Bases: object

__init__(data)[source]#

RDMABuffer supports 1d contiguous tensors (including tensor views/slices) or 1d c-contiguous memoryviews.

Parameters:

data (Tensor | memoryview) – torch.Tensor or memoryview to create the buffer from. Must be 1d and contiguous. If provided, addr and size must not be specified.

Raises:
  • ValueError – If data is not 1d contiguous, if size is 0, or if data is a GPU tensor.

  • RuntimeError – If RDMA is not available on this platform.

Note

Currently only CPU tensors are supported. GPU tensor support will be added in the future.

TODO: Create TensorBuffer, which will be main user API supporting non-contiguous tensors
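The shape requirements above can be checked locally before constructing a buffer. The following is a minimal sketch for the memoryview case only; `is_rdma_compatible` is a hypothetical helper, not part of monarch.rdma, and it mirrors the documented rules (1d, C-contiguous, non-zero size) rather than the library's actual validation code.

```python
def is_rdma_compatible(view: memoryview) -> bool:
    """Return True if `view` is 1-D, C-contiguous, and non-empty.

    Hypothetical pre-check; RDMABuffer performs its own validation.
    """
    return view.ndim == 1 and view.c_contiguous and view.nbytes > 0


data = bytearray(16)
whole = memoryview(data)
window = whole[4:12]   # a contiguous slice still qualifies
strided = whole[::2]   # every other byte: 1-D but not contiguous

print(is_rdma_compatible(whole))    # True
print(is_rdma_compatible(window))   # True
print(is_rdma_compatible(strided))  # False
```

A check like this lets a caller fail early with a clearer message than the ValueError raised at construction time.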

size()[source]#

read_into(dst, *, timeout=3)[source]#

Read data from the RDMABuffer into a destination tensor.

The destination tensor must be contiguous (including tensor views/slices).

Parameters:

dst (Tensor | memoryview) – Destination tensor or memoryview to read into.

Keyword Arguments:

timeout (int, optional) – Timeout in seconds for the operation. Defaults to 3s.

Returns:

A Monarch Future that can be awaited or called with .get() for blocking operation.

Return type:

Future[Optional[int]]

Raises:

ValueError – If the destination tensor size is smaller than the RDMA buffer size.

Note

Currently only CPU tensors are fully supported. GPU tensors will be temporarily copied to CPU, which may impact performance.

write_from(src, *, timeout=3)[source]#

Write data from a source tensor into the RDMABuffer.

Parameters:

src (Tensor | memoryview) – Source tensor containing data to be written to the RDMA buffer. Must be a contiguous tensor (including tensor views/slices). Either src or addr/size must be provided.

Keyword Arguments:

timeout (int, optional) – Timeout in seconds for the operation. Defaults to 3s.

Returns:

A Monarch Future object that can be awaited or called with .get() for blocking operation. Returns None when completed successfully.

Return type:

Future[None]

Raises:

ValueError – If the source tensor size exceeds the RDMA buffer size.

Note

Currently only CPU tensors are fully supported. GPU tensors will be temporarily copied to CPU, which may impact performance.
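The size rules documented for read_into and write_from point in opposite directions: a read needs a destination at least as large as the buffer, while a write needs a source no larger than the buffer. The sketch below expresses both as pre-flight checks on byte counts. The helper names are hypothetical and not part of monarch.rdma, and the sketch assumes sizes are compared in bytes.

```python
def check_read_sizes(buffer_nbytes: int, dst_nbytes: int) -> None:
    """Mirror the documented read_into rule: dst must not be smaller."""
    if dst_nbytes < buffer_nbytes:
        raise ValueError(
            f"destination ({dst_nbytes} B) is smaller than the buffer "
            f"({buffer_nbytes} B)"
        )


def check_write_sizes(buffer_nbytes: int, src_nbytes: int) -> None:
    """Mirror the documented write_from rule: src must not be larger."""
    if src_nbytes > buffer_nbytes:
        raise ValueError(
            f"source ({src_nbytes} B) exceeds the buffer ({buffer_nbytes} B)"
        )


# A 64-byte buffer can be read into 64 or more bytes,
# and written from 64 or fewer bytes.
check_read_sizes(buffer_nbytes=64, dst_nbytes=128)
check_write_sizes(buffer_nbytes=64, src_nbytes=32)
```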

drop()[source]#

Release the handle that the src holds on this buffer's memory.

property owner: str#

The owner reference (str)

RDMA Actions#

class monarch.rdma.RDMAAction[source]#

Bases: object

Schedule a batch of RDMA operations at once. This provides an opportunity to optimize bulk RDMA transactions without exposing complexity to users.

class RDMAOp(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Enumeration of RDMA operation types.

READ_INTO = 'read_into'#
WRITE_FROM = 'write_from'#
FETCH_ADD = 'fetch_add'#
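COMPARE_AND_SWAP = 'compare_and_swap'#

__init__()[source]#

read_into(src, dst)[source]#

Read from src RDMA buffer into dst memory.

Parameters:
  • src (RDMABuffer) – src RDMA buffer to read from

  • dst (Tensor | memoryview) – Local memory to read into

write_from(src, dst)[source]#

Write from dst memory to src RDMA buffer.

Parameters:
  • src (RDMABuffer) – src RDMA buffer to write to

  • dst (Tensor | memoryview) – Local memory to write from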
fetch_add(src, dst, add)[source]#

Perform atomic fetch-and-add operation on src RDMA buffer.

Parameters:
  • src (RDMABuffer) – src RDMA buffer to perform operation on

  • dst (Tensor | memoryview) – Local memory to store the original value

  • add (int) – Value to add to the src buffer

Atomically:

    *dst = *src
    *src = *src + add

Note: src/dst are 8 bytes
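The fetch-and-add pseudocode above can be modeled in plain Python to make the result concrete. This is a single-threaded model of the outcome only, not the monarch API and not atomic. It assumes the 8-byte value is an unsigned little-endian integer that wraps on overflow, which the documentation does not state. `fetch_add_model` is a hypothetical name.

```python
import struct

U64_MASK = (1 << 64) - 1  # assumption: 64-bit unsigned wraparound


def fetch_add_model(src: bytearray, dst: bytearray, add: int) -> None:
    """Model of: *dst = *src; *src = *src + add (both 8 bytes)."""
    (old,) = struct.unpack_from("<Q", src, 0)
    struct.pack_into("<Q", dst, 0, old)                     # *dst = *src
    struct.pack_into("<Q", src, 0, (old + add) & U64_MASK)  # *src += add


counter = bytearray(struct.pack("<Q", 40))  # stands in for the remote value
previous = bytearray(8)                     # local memory for the old value
fetch_add_model(counter, previous, 2)

print(struct.unpack("<Q", previous)[0])  # 40
print(struct.unpack("<Q", counter)[0])   # 42
```

Because the caller receives the value from before the addition, each caller of a shared counter gets a distinct number, which is why fetch-and-add is a common building block for ticketing and sequence allocation.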

compare_and_swap(src, dst, compare, swap)[source]#

Perform atomic compare-and-swap operation on src RDMA buffer.

Parameters:
  • src (RDMABuffer) – src RDMA buffer to perform operation on

  • dst (Tensor | memoryview) – Local memory to store the original value

  • compare (int) – Value to compare against

  • swap (int) – Value to swap in if comparison succeeds

Atomically:

    *dst = *src
    if (*src == compare) {
        *src = swap
    }

Note: src/dst are 8 bytes
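The compare-and-swap pseudocode above can be modeled the same way. The sketch below is a single-threaded, non-atomic model of the result, not the monarch API. It assumes unsigned little-endian 8-byte values, and `compare_and_swap_model` is a hypothetical name. The usage shows a simple lock word where 0 means free and 1 means held, which is an illustrative convention rather than anything the library prescribes.

```python
import struct


def compare_and_swap_model(
    src: bytearray, dst: bytearray, compare: int, swap: int
) -> None:
    """Model of: *dst = *src; if (*src == compare) *src = swap."""
    (old,) = struct.unpack_from("<Q", src, 0)
    struct.pack_into("<Q", dst, 0, old)       # always report the old value
    if old == compare:
        struct.pack_into("<Q", src, 0, swap)  # swap only on a match


lock_word = bytearray(struct.pack("<Q", 0))  # 0 = free (illustrative)
seen = bytearray(8)

# First attempt: the word is 0, so the swap to 1 succeeds.
compare_and_swap_model(lock_word, seen, compare=0, swap=1)
first_seen = struct.unpack("<Q", seen)[0]

# Second attempt: the word is now 1, so nothing changes.
compare_and_swap_model(lock_word, seen, compare=0, swap=1)
second_seen = struct.unpack("<Q", seen)[0]

print(first_seen, second_seen)  # 0 1
```

The caller tells success from failure by comparing the value stored in dst with the compare argument: equal means the swap took effect.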

submit()[source]#

Schedule the work. submit() can be called multiple times to schedule the same work more than once. The returned Future completes when all the work is done.

Executes futures for each src actor independently and concurrently for optimal performance.

Utility Functions#

monarch.rdma.is_rdma_available()[source]#
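Since RDMABuffer raises RuntimeError when RDMA is not available on the platform, a caller may want to probe availability first and fall back to another transport. The following is a minimal sketch: `rdma_supported` is a hypothetical wrapper, and it assumes is_rdma_available() returns a truthy or falsy value, which this page does not document. It also treats a missing monarch installation as "not available".

```python
def rdma_supported() -> bool:
    """Return True only if monarch is importable and reports RDMA support."""
    try:
        from monarch.rdma import is_rdma_available
    except ImportError:
        # monarch (or its rdma module) is not installed in this environment.
        return False
    # Assumption: the return value can be interpreted as a boolean.
    return bool(is_rdma_available())


if rdma_supported():
    print("RDMA path enabled")
else:
    print("RDMA unavailable; use a fallback transport")
```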