Crate monarch_rdma

Modules§

device_selection
This module provides functionality to automatically pair compute devices with the best available RDMA NICs based on PCI topology distance.
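As a rough illustration of pairing by PCI topology distance, the sketch below treats each device as a path of PCI components from the root complex and counts the hops to the lowest common ancestor. The helpers `pci_distance` and `closest_nic` and the path representation are assumptions made for this example; they are not taken from the `device_selection` module, whose actual algorithm may differ.

```rust
/// Hypothetical helper: number of hops between two devices in a PCI tree,
/// where each device is described by its path from the root complex.
fn pci_distance(a: &[&str], b: &[&str]) -> usize {
    // Length of the shared prefix, i.e. the depth of the lowest common ancestor.
    let common = a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();
    (a.len() - common) + (b.len() - common)
}

/// Hypothetical helper: index of the NIC with the smallest distance to `gpu`.
/// Returns `None` when no NICs are given; ties go to the first candidate.
fn closest_nic(gpu: &[&str], nics: &[Vec<&str>]) -> Option<usize> {
    nics.iter()
        .enumerate()
        .min_by_key(|(_, nic)| pci_distance(gpu, nic))
        .map(|(i, _)| i)
}
```

A NIC that shares a PCI bridge with the compute device yields a shorter distance than one attached to a different root complex, so it would be selected first.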

Macros§

cu_check

Structs§

DoorBell
Gid
IbvWc
Wrapper around ibv_wc (ibverbs work completion).
IbverbsConfig
Represents ibverbs specific configurations.
RdmaBuffer
RdmaDevice
Represents an RDMA device in the system.
RdmaDomain
Represents a domain for RDMA operations, encapsulating the necessary resources for establishing and managing RDMA connections.
RdmaManagerActor
RdmaMemoryRegionView
Represents a view of a memory region that can be registered with an RDMA device.
RdmaPort
RdmaQpInfo
Contains information needed to establish an RDMA queue pair with a remote endpoint.
RdmaQueuePair
Represents an RDMA Queue Pair (QP) that enables communication between two endpoints.

Enums§

PollTarget
Specifies which completion queue to poll.
RdmaManagerMessage
Messages exchanged with RdmaManagerActor to set up and use remote RDMA buffers, each of which encapsulates the information needed to identify and access a memory region on a remote host using RDMA.
RdmaOperation
Enum representing the common RDMA operations.
RdmaQpType
Queue pair type for RDMA operations.

Traits§

RdmaManagerMessageClient
The custom client trait for this message type.
RdmaManagerMessageHandler
The custom handler trait for this message type.

Functions§

format_gid
Formats a GID (Global Identifier) into a human-readable string.
get_all_devices
Retrieves information about all available RDMA devices in the system.
get_link_layer_str
Converts the given link layer type to a human-readable string.
get_port_phy_state_str
Converts the given physical state to a human-readable string.
get_port_state_str
Converts the given port state to a human-readable string.
get_rdmaxcel_error_message
Helper function that returns detailed error messages for RDMAXCEL error codes.
get_registered_cuda_segments
Gets all CUDA segments that have been registered with memory regions (MRs).
ibverbs_supported
Checks if ibverbs devices can be retrieved successfully.
is_cuda_available
Safely checks if CUDA is available on the system.
mlx5dv_supported
Checks if mlx5dv (Mellanox device-specific verbs extension) is supported.
print_device_info
Prints comprehensive RDMA device information for debugging, unconditionally.
print_device_info_if_debug_enabled
Prints comprehensive RDMA device information for debugging. Controlled by the MONARCH_DEBUG_RDMA environment variable.
pt_cuda_allocator_compatibility
Checks whether the PyTorch CUDA caching allocator has expandable segments enabled.
rdma_supported
Checks if RDMA is fully supported on this system.
resolve_qp_type
Converts RdmaQpType to the corresponding integer enum value in rdmaxcel_sys.
validate_execution_context
Utility to validate execution context.
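
To make the idea of a human-readable GID concrete, here is a minimal sketch that renders a 16-byte GID as eight colon-separated groups of four hex digits, similar to how GIDs are commonly displayed. The name `format_gid_sketch` and its signature are hypothetical and chosen for this example; the crate's `format_gid` may take a different input type and produce a different layout.

```rust
/// Hypothetical stand-in for a GID formatter (not the crate's `format_gid`):
/// renders 16 raw bytes as eight colon-separated groups of four hex digits.
fn format_gid_sketch(gid: &[u8; 16]) -> String {
    gid.chunks(2)
        // Each chunk holds exactly two bytes because 16 is divisible by 2.
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```

Zero-padding every byte to two hex digits keeps the output at a fixed width, which makes GIDs easier to compare side by side in debug logs.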