pub struct Communicator { /* private fields */ }
Wraps an NCCL communicator and provides a Tensor-based interface to it.
This implements a subset of the c10d::ProcessGroup API.
Implementations
impl Communicator
pub fn new(
    device: CudaDevice,
    world_size: i32,
    unique_id: UniqueId,
    rank: i32,
) -> Result<Self, NcclError>
Create a new communicator. This function must be called by a different thread or process for each rank.
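For illustration, a per-rank setup sketch. How the UniqueId is produced and shared between ranks, and how the CudaDevice for a rank is selected, are crate-specific details assumed here rather than taken from this page:

// Hypothetical setup: rank 0 generates a UniqueId and distributes it to the
// other ranks out of band (e.g. over TCP or a key-value store). Every rank
// then calls Communicator::new with the same id and world size, but its own
// rank, from its own thread or process.
fn init_rank(
    device: CudaDevice,   // the GPU this rank will drive
    world_size: i32,
    unique_id: UniqueId,  // identical on every rank
    rank: i32,            // unique per rank, in 0..world_size
) -> Result<Communicator, NcclError> {
    Communicator::new(device, world_size, unique_id, rank)
}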
pub fn split_all(
    &mut self,
    config: Option<NcclConfig>,
) -> Result<Self, NcclError>
Split off a new communicator from this one, preserving the same world size.
pub fn split_from(
    &mut self,
    ranks: Vec<i32>,
    config: Option<NcclConfig>,
) -> Result<Option<Self>, NcclError>
Split off a new communicator from this one. Only ranks will be present
on this new communicator.
If ranks is empty, ncclCommSplit will be called with
NCCL_SPLIT_NOCOLOR. This can be useful if ranks excluded from the split
don’t even know what ranks will be included.
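A sketch of how both sides of a split might call this; `comm`, `my_rank`, and the rank numbers are illustrative. Per NCCL's split semantics, ranks excluded from the new group still make the call, passing an empty Vec:

// Ranks 0 and 1 form the new communicator; every other rank opts out with an
// empty Vec (NCCL_SPLIT_NOCOLOR) and would presumably get Ok(None) back.
let ranks = if my_rank < 2 { vec![0, 1] } else { vec![] };
match comm.split_from(ranks, None)? {
    Some(sub_comm) => { /* this rank is part of the new communicator */ }
    None => { /* this rank was excluded from the split */ }
}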
pub fn all_reduce(
    &mut self,
    tensor: &TensorCell,
    reduce_op: ReduceOp,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce the tensor data across all ranks, with each rank receiving the final result in-place.
See torch.distributed.all_reduce for more detailed documentation.
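A minimal sketch, assuming `comm` is a Communicator, `grad_cell` is a TensorCell wrapping a CUDA tensor on this rank's device, and `stream` is the Stream the collective should run on (constructing those is crate-specific and not shown). ReduceOp::Sum is assumed to mirror ncclSum:

// Sum the gradient across all ranks, in place. Like the underlying NCCL
// call, this is enqueued on `stream`; the result is only valid once the
// stream has been synchronized.
comm.all_reduce(&grad_cell, ReduceOp::Sum, &stream)?;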
pub fn broadcast(
    &mut self,
    tensor: &TensorCell,
    root: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Broadcast the tensor data on the root rank to all the others.
See torch.distributed.broadcast for more detailed documentation.
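Sketch, with illustrative names: every rank passes the same root; only the root's tensor contents matter on entry, and every rank's tensor holds the root's data afterwards:

// Rank 0 holds the freshly initialized parameters; everyone else receives them.
let root = 0;
comm.broadcast(&params_cell, root, &stream)?;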
pub fn reduce(
    &mut self,
    tensor: &TensorCell,
    reduce_op: ReduceOp,
    root: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce the tensor data across all ranks, writing the result out to
tensor on the root rank.
See torch.distributed.reduce for more detailed documentation.
pub fn all_gather(
    &mut self,
    output_cells: &[TensorCell],
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Gather tensors from all ranks into a list of output tensors.
See torch.distributed.all_gather for more detailed documentation.
pub fn all_gather_into_tensor(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Gather tensors from all ranks into a single output tensor.
See torch.distributed.all_gather_into_tensor for more detailed
documentation.
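As with its torch.distributed counterpart, the output must be large enough to hold one input-sized shard per rank. A hedged sketch with illustrative cell names:

// input_cell:  this rank's shard, say shape [n]
// output_cell: shape [world_size * n], receives every rank's shard in rank order
comm.all_gather_into_tensor(&output_cell, &input_cell, &stream)?;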
pub fn reduce_scatter_tensor(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    reduce_op: ReduceOp,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce, then scatter the result to all tensors in the group.
See torch.distributed.reduce_scatter_tensor for more detailed
documentation.
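This is the mirror of all_gather_into_tensor: the input holds one output-sized shard per rank, and each rank receives the reduction of its own shard. Illustrative sketch:

// input_cell:  shape [world_size * n], one shard destined for each rank
// output_cell: shape [n], receives the reduction of this rank's shard
comm.reduce_scatter_tensor(&output_cell, &input_cell, ReduceOp::Sum, &stream)?;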
pub fn send(
    &mut self,
    tensor_cell: &TensorCell,
    dst: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Send a tensor to the rank dst.
pub fn recv(
    &mut self,
    tensor_cell: &TensorCell,
    src: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Receive a tensor from the rank src.
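send and recv must be paired across ranks, and the receiver's tensor must already have the matching shape and dtype. A two-rank sketch, where `rank` is the value that was passed to Communicator::new and `msg_cell` is illustrative:

// Rank 0 ships `msg_cell` to rank 1; an unmatched send or recv will hang.
if rank == 0 {
    comm.send(&msg_cell, /* dst */ 1, &stream)?;
} else if rank == 1 {
    comm.recv(&msg_cell, /* src */ 0, &stream)?;
}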
pub fn all_to_all_single(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Split the input tensor then scatter the split list to all processes in the group. The received splits are then concatenated into the output tensor.
See torch.distributed.all_to_all_single for more detailed
documentation.
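Since this signature takes no per-rank split sizes, an even split along the first dimension is presumably assumed. An illustrative sketch under that assumption:

// input_cell and output_cell have the same shape, e.g. [world_size * n, ...].
// Chunk i of the input (along dim 0) is sent to rank i; chunk i of the
// output is filled with the chunk received from rank i.
comm.all_to_all_single(&output_cell, &input_cell, &stream)?;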
Trait Implementations
impl Debug for Communicator
impl Send for Communicator
SAFETY: ncclComm_t is okay to access from multiple threads, but each
communicator must issue nccl calls in the same order. It is up to the user
to ensure this.
impl Sync for Communicator
SAFETY: ncclComm_t is okay to access from multiple threads, but each
communicator must issue nccl calls in the same order. It is up to the user
to ensure this.
Auto Trait Implementations
impl Freeze for Communicator
impl RefUnwindSafe for Communicator
impl Unpin for Communicator
impl UnwindSafe for Communicator
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<A, M> Handler<IndexedErasedUnbound<M>> for A
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.