pub struct Communicator { /* private fields */ }
Wraps an NCCL communicator and provides a Tensor-based interface to it.
This implements a subset of the c10d::ProcessGroup API.
Implementations§
impl Communicator
pub fn new(
    device: CudaDevice,
    world_size: i32,
    unique_id: UniqueId,
    rank: i32,
) -> Result<Self, NcclError>
Create a new communicator. This function must be called by a different thread/process per rank.
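A minimal setup sketch under stated assumptions: only the Communicator::new signature above comes from this page; UniqueId::new, a Clone impl on UniqueId, and the CudaDevice constructor are hypothetical, and in practice the unique id must be shared with every rank out of band.

use std::thread;

let world_size = 2;
let unique_id = UniqueId::new()?; // hypothetical constructor; share this id with every rank
let handles: Vec<_> = (0..world_size)
    .map(|rank| {
        let unique_id = unique_id.clone(); // assumes UniqueId: Clone
        thread::spawn(move || {
            let device = CudaDevice::new(rank as usize); // hypothetical constructor
            // Each rank creates its own Communicator from its own thread.
            Communicator::new(device, world_size, unique_id, rank)
        })
    })
    .collect();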
pub fn split_all(
    &mut self,
    config: Option<NcclConfig>,
) -> Result<Self, NcclError>
Split off a new communicator from this one, preserving the same world size.
pub fn split_from(
    &mut self,
    ranks: Vec<i32>,
    config: Option<NcclConfig>,
) -> Result<Option<Self>, NcclError>
Split off a new communicator from this one. Only the ranks listed in ranks will be present on the new communicator.
If ranks is empty, ncclCommSplit will be called with NCCL_SPLIT_NOCOLOR. This is useful when ranks excluded from the split don’t even know which ranks will be included.
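A sketch of splitting ranks 0 and 1 into a sub-communicator; comm is an existing Communicator on the calling rank (assumed), and ranks outside the list receive Ok(None).

// Every rank of the parent communicator participates in the split call.
let maybe_sub = comm.split_from(vec![0, 1], None)?;
if let Some(mut sub_comm) = maybe_sub {
    // Only ranks 0 and 1 get a new communicator; run sub-group collectives here.
}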
pub fn all_reduce(
    &mut self,
    tensor: &TensorCell,
    reduce_op: ReduceOp,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce the tensor data across all ranks, with each rank receiving the final result in-place.
See torch.distributed.all_reduce for more detailed documentation.
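A call sketch, assuming an existing comm, a TensorCell holding this rank's data, a CUDA Stream, and a Sum variant on ReduceOp (the variant name is an assumption):

// Element-wise sum across all ranks; every rank sees the result in `tensor` in-place.
comm.all_reduce(&tensor, ReduceOp::Sum, &stream)?;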
pub fn broadcast(
    &mut self,
    tensor: &TensorCell,
    root: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Broadcast the tensor data on the root rank to all the others.
See torch.distributed.broadcast for more detailed documentation.
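A call sketch with the same assumed comm, tensor, and stream:

// Rank 0 supplies the data; every other rank's `tensor` is overwritten with it.
comm.broadcast(&tensor, 0, &stream)?;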
pub fn reduce(
    &mut self,
    tensor: &TensorCell,
    reduce_op: ReduceOp,
    root: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce the tensor data across all ranks, writing the result out to tensor on the root rank.
See torch.distributed.reduce for more detailed documentation.
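A call sketch with the same assumptions as above; unlike all_reduce, only the root rank ends up with the reduced values:

// Sum across all ranks; only rank 0's `tensor` holds the result afterwards.
comm.reduce(&tensor, ReduceOp::Sum, 0, &stream)?;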
pub fn all_gather(
    &mut self,
    output_cells: &[TensorCell],
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Gather tensors from all ranks into a list of output tensors.
See torch.distributed.all_gather for more detailed documentation.
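A call sketch; outputs is assumed to be a Vec<TensorCell> with one cell per rank, each shaped like input:

// After the call, outputs[r] holds rank r's input tensor on every rank.
comm.all_gather(&outputs, &input, &stream)?;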
pub fn all_gather_into_tensor(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Gather tensors from all ranks into a single output tensor.
See torch.distributed.all_gather_into_tensor for more detailed documentation.
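A call sketch; output is assumed to be pre-allocated to hold world_size copies of input concatenated along the first dimension:

// output = [input_rank0, input_rank1, ...] concatenated, identical on every rank.
comm.all_gather_into_tensor(&output, &input, &stream)?;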
pub fn reduce_scatter_tensor(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    reduce_op: ReduceOp,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Reduce, then scatter the result to all ranks in the group.
See torch.distributed.reduce_scatter_tensor for more detailed documentation.
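A call sketch; this is the shape inverse of all_gather_into_tensor, so input is assumed to hold world_size chunks and output one chunk (ReduceOp::Sum is an assumed variant name):

// Each rank receives the sum-reduced chunk corresponding to its own rank index.
comm.reduce_scatter_tensor(&output, &input, ReduceOp::Sum, &stream)?;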
pub fn send(
    &mut self,
    tensor_cell: &TensorCell,
    dst: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Send a tensor to the rank dst.
pub fn recv(
    &mut self,
    tensor_cell: &TensorCell,
    src: i32,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Receive a tensor from the rank src.
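A point-to-point sketch between ranks 0 and 1; comm, tensor, stream, and the rank variable are assumed to exist on each rank, and every send must be matched by a recv on the peer:

if rank == 0 {
    comm.send(&tensor, 1, &stream)?; // matched by the recv issued on rank 1
} else if rank == 1 {
    comm.recv(&tensor, 0, &stream)?; // writes rank 0's data into `tensor`
}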
pub fn all_to_all_single(
    &mut self,
    output_cell: &TensorCell,
    input_cell: &TensorCell,
    stream: &Stream,
) -> Result<NcclStatus, NcclError>
Split the input tensor, then scatter the splits to all processes in the group. The received splits are concatenated into the output tensor.
See torch.distributed.all_to_all_single for more detailed documentation.
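A call sketch; input and output are assumed to be pre-allocated with matching total sizes divisible by the world size:

// input is split into world_size chunks along dim 0; chunk r is sent to rank r.
// output receives one chunk from every rank, concatenated in rank order.
comm.all_to_all_single(&output, &input, &stream)?;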
Trait Implementations§
impl Debug for Communicator
impl Send for Communicator
SAFETY: ncclComm_t is okay to access from multiple threads, but each communicator must issue NCCL calls in the same order. It is up to the user to ensure this.
impl Sync for Communicator
SAFETY: ncclComm_t is okay to access from multiple threads, but each communicator must issue NCCL calls in the same order. It is up to the user to ensure this.
Auto Trait Implementations§
impl Freeze for Communicator
impl RefUnwindSafe for Communicator
impl Unpin for Communicator
impl UnwindSafe for Communicator
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<A, M> Handler<IndexedErasedUnbound<M>> for A
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.