monarch.rdma
The monarch.rdma module provides Remote Direct Memory Access (RDMA) support for high-performance networking and zero-copy data transfers between processes. See the Point-to-Point RDMA guide for an overview.
RDMA Buffer
- class monarch.rdma.RDMABuffer(data)
Bases: object
- __init__(data)
RDMABuffer supports 1d contiguous tensors (including tensor views/slices) or 1d c-contiguous memoryviews.
- Parameters:
data (Tensor | memoryview) – torch.Tensor or memoryview to create the buffer from. Must be 1d and contiguous. If provided, addr and size must not be specified.
- Raises:
ValueError – If data is not 1d and contiguous, if its size is 0, or if data is a GPU tensor.
RuntimeError – If RDMA is not available on this platform.
Note
Currently only CPU tensors are supported. GPU tensor support will be added in the future.
TODO: Create TensorBuffer, which will be the main user API and will support non-contiguous tensors.
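The constraints above (1d, c-contiguous, non-empty) can be checked on a memoryview before it is handed to RDMABuffer. The following is a minimal sketch using only the standard library. `check_rdma_compatible` is a hypothetical helper and not part of monarch.rdma, and the actual constructor may apply additional checks.

```python
def check_rdma_compatible(view: memoryview) -> None:
    """Raise ValueError if `view` breaks the documented constraints.

    Hypothetical helper for illustration; not part of monarch.rdma.
    """
    if view.ndim != 1:
        raise ValueError(f"expected a 1d memoryview, got {view.ndim} dimensions")
    if not view.c_contiguous:
        raise ValueError("memoryview must be c-contiguous")
    if view.nbytes == 0:
        raise ValueError("memoryview must not be empty")


storage = bytearray(64)
check_rdma_compatible(memoryview(storage))          # 1d, contiguous, non-empty
check_rdma_compatible(memoryview(storage)[16:32])   # a contiguous slice also passes

try:
    check_rdma_compatible(memoryview(storage)[::2])  # strided view is not contiguous
except ValueError as exc:
    print(f"rejected: {exc}")
```

Running a check like this up front surfaces the failure at the call site rather than inside buffer construction.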
- read_into(dst, *, timeout=3)
Read data from the RDMABuffer into a destination tensor.
The destination tensor must be contiguous (including tensor views/slices).
- Parameters:
dst (Tensor | memoryview) – Destination tensor or memoryview to read into.
- Keyword Arguments:
timeout (int, optional) – Timeout in seconds for the operation. Defaults to 3s.
- Returns:
A Monarch Future that can be awaited or called with .get() for blocking operation.
- Return type:
Future
- Raises:
ValueError – If the destination tensor size is smaller than the RDMA buffer size.
Note
Currently only CPU tensors are fully supported. GPU tensors will be temporarily copied to CPU, which may impact performance.
- write_from(src, *, timeout=3)
Write data from a source tensor into the RDMABuffer.
- Parameters:
src (Tensor | memoryview) – Source tensor containing data to be written to the RDMA buffer. Must be a contiguous tensor (including tensor views/slices). Either src or addr/size must be provided.
- Keyword Arguments:
timeout (int, optional) – Timeout in seconds for the operation. Defaults to 3s.
- Returns:
A Monarch Future object that can be awaited or called with .get() for blocking operation. Returns None when completed successfully.
- Return type:
Future[None]
- Raises:
ValueError – If the source tensor size exceeds the RDMA buffer size.
Note
Currently only CPU tensors are fully supported. GPU tensors will be temporarily copied to CPU, which may impact performance.
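The size rules and the blocking `.get()` call pattern for read_into and write_from can be tried without RDMA hardware. The sketch below uses `LocalStandInBuffer` and `_ImmediateFuture`, both hypothetical in-process stand-ins. They are not monarch.rdma classes and perform no RDMA. Copying from offset 0 is an assumption, and the timeout is accepted but ignored.

```python
class _ImmediateFuture:
    """Hypothetical stand-in for a Monarch Future; only .get() is modelled."""

    def __init__(self, value=None):
        self._value = value

    def get(self):
        return self._value


class LocalStandInBuffer:
    """Hypothetical stand-in that mimics the documented size rules locally."""

    def __init__(self, data: memoryview):
        self._data = data

    def read_into(self, dst: memoryview, *, timeout: int = 3):
        # Documented rule: the destination must not be smaller than the buffer.
        if dst.nbytes < self._data.nbytes:
            raise ValueError("destination is smaller than the buffer")
        dst[: self._data.nbytes] = self._data  # assumption: copy starts at offset 0
        return _ImmediateFuture()

    def write_from(self, src: memoryview, *, timeout: int = 3):
        # Documented rule: the source must not exceed the buffer size.
        if src.nbytes > self._data.nbytes:
            raise ValueError("source exceeds the buffer size")
        self._data[: src.nbytes] = src  # assumption: copy starts at offset 0
        return _ImmediateFuture()


remote = bytearray(b"abcdefgh")
buf = LocalStandInBuffer(memoryview(remote))

local = bytearray(8)
buf.read_into(memoryview(local), timeout=5).get()  # block until "complete"
print(bytes(local))   # b'abcdefgh'

buf.write_from(memoryview(b"XY")).get()
print(bytes(remote))  # b'XYcdefgh'
```

With a real RDMABuffer the same two calls return a Monarch Future, which can be awaited instead of calling `.get()`.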
RDMA Actions
- class monarch.rdma.RDMAAction
Bases: object
Schedule a bunch of actions at once. This provides an opportunity to optimize bulk RDMA transactions without exposing complexity to users.
- class RDMAOp(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Enumeration of RDMA operation types.
- READ_INTO = 'read_into'
- WRITE_FROM = 'write_from'
- FETCH_ADD = 'fetch_add'
- COMPARE_AND_SWAP = 'compare_and_swap'
- read_into(src, dst)
Read from src RDMA buffer into dst memory.
- Parameters:
src (RDMABuffer) – Source RDMA buffer to read from
dst (Tensor | memoryview | List[Tensor | memoryview]) – Destination local memory to read into. If dst is a list, it is treated as the concatenation of the data in the list.
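- write_from(src, dst)
Write from dst local memory into the src RDMA buffer. Note the naming: here src is the RDMA buffer being written to, and dst is the local memory that supplies the data.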
- Parameters:
src (RDMABuffer) – Destination RDMA buffer to write to
dst (Tensor | memoryview | List[Tensor | memoryview]) – Source local memory to write from. If dst is a list, it is treated as the concatenation of the data in the list.
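When dst is a list, the pieces behave like one logical region laid end to end. The following sketch illustrates that concatenation rule for the read direction using plain memoryviews. `scatter_into` is a hypothetical helper and not part of monarch.rdma, and the requirement that the pieces together hold the whole payload is an assumption.

```python
from typing import List


def scatter_into(payload: bytes, dst: List[memoryview]) -> None:
    """Fill the views in `dst`, in order, as if they were one concatenated region.

    Hypothetical helper for illustration; not part of monarch.rdma.
    """
    total = sum(view.nbytes for view in dst)
    if total < len(payload):  # assumption: the pieces must hold the whole payload
        raise ValueError("combined destination is smaller than the payload")
    offset = 0
    for view in dst:
        chunk = payload[offset : offset + view.nbytes]
        if not chunk:
            break  # payload exhausted; remaining views stay untouched
        view[: len(chunk)] = chunk
        offset += len(chunk)


head, tail = bytearray(3), bytearray(5)
scatter_into(b"abcdefgh", [memoryview(head), memoryview(tail)])
print(bytes(head), bytes(tail))  # b'abc' b'defgh'
```

The write direction is the mirror image: the list is read piece by piece, in order, as one contiguous source.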
- fetch_add(src, dst, add)
Perform atomic fetch-and-add operation on src RDMA buffer.
- Parameters:
src (RDMABuffer) – src RDMA buffer to perform operation on
dst (Tensor | memoryview) – Local memory to store the original value
add (int) – Value to add to the src buffer
Note: src and dst are each 8 bytes.
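The effect of fetch-and-add on an 8-byte value can be modelled locally. The original value is copied to dst and the sum is stored back in src. This sketch is a plain-Python simulation, not the Monarch implementation. `fetch_add_local` is a hypothetical name, and treating the value as a little-endian unsigned 64-bit integer that wraps on overflow is an assumption. Unlike the RDMA operation, the local version is not atomic.

```python
import struct

MASK64 = (1 << 64) - 1  # assumption: values wrap modulo 2**64


def fetch_add_local(target: memoryview, result: memoryview, add: int) -> None:
    """Simulate fetch-and-add on an 8-byte word (hypothetical helper)."""
    if target.nbytes != 8 or result.nbytes != 8:
        raise ValueError("both operands must be exactly 8 bytes")
    (old,) = struct.unpack_from("<Q", target)
    struct.pack_into("<Q", result, 0, old)                   # report the original value
    struct.pack_into("<Q", target, 0, (old + add) & MASK64)  # store the sum


counter = bytearray(struct.pack("<Q", 41))
previous = bytearray(8)
fetch_add_local(memoryview(counter), memoryview(previous), 1)

print(struct.unpack("<Q", previous)[0])  # 41
print(struct.unpack("<Q", counter)[0])   # 42
```

A typical use is a shared ticket counter, where each participant adds 1 and takes the returned original value as its ticket number.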
- compare_and_swap(src, dst, compare, swap)
Perform atomic compare-and-swap operation on src RDMA buffer.
- Parameters:
src (RDMABuffer) – src RDMA buffer to perform operation on
dst (Tensor | memoryview) – Local memory to store the original value
compare (int) – Value to compare against
swap (int) – Value to swap in if comparison succeeds
Note: src and dst are each 8 bytes.
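Compare-and-swap can be modelled the same way. The original value is always copied to dst, and src is overwritten with swap only when it equals compare. The sketch below is a local, non-atomic simulation under the assumption of little-endian unsigned 64-bit values. `compare_and_swap_local` and `try_acquire` are hypothetical names, not part of monarch.rdma.

```python
import struct

MASK64 = (1 << 64) - 1  # assumption: operands are unsigned 64-bit values


def compare_and_swap_local(
    target: memoryview, result: memoryview, compare: int, swap: int
) -> None:
    """Simulate compare-and-swap on an 8-byte word (hypothetical helper)."""
    if target.nbytes != 8 or result.nbytes != 8:
        raise ValueError("both operands must be exactly 8 bytes")
    (old,) = struct.unpack_from("<Q", target)
    struct.pack_into("<Q", result, 0, old)  # the original value is always reported
    if old == (compare & MASK64):
        struct.pack_into("<Q", target, 0, swap & MASK64)


def try_acquire(lock_word: memoryview) -> bool:
    """Treat the word as a lock: 0 means free, 1 means held."""
    previous = bytearray(8)
    compare_and_swap_local(lock_word, memoryview(previous), compare=0, swap=1)
    return struct.unpack("<Q", previous)[0] == 0  # success if it was free


lock = memoryview(bytearray(8))
print(try_acquire(lock))  # True: the word was 0 and is now 1
print(try_acquire(lock))  # False: the word was already 1
```

The caller learns whether the swap happened by comparing the value returned in dst against compare.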