Struct Communicator

pub struct Communicator { /* private fields */ }

Wraps an NCCL communicator and provides a Tensor-based interface to it.

This implements a subset of the c10d::ProcessGroup API.

Implementations

impl Communicator
pub fn new( device: CudaDevice, world_size: i32, unique_id: UniqueId, rank: i32, ) -> Result<Self, NcclError>

Create a new communicator. This function must be called by a different thread/process per rank.
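
As a sketch of that setup, assuming a hypothetical `UniqueId::new()` constructor and out-of-band distribution of the id (neither is documented here; only the `Communicator::new` signature above is):

```rust
// Rank 0 generates a UniqueId and distributes it out of band
// (e.g. over TCP) to every other rank. Each rank then constructs
// its own Communicator on its own thread or process.
let unique_id = UniqueId::new()?;          // hypothetical constructor
let world_size = 4;

// On rank `rank`, bound to that rank's own GPU:
let device = CudaDevice::new(rank)?;       // hypothetical constructor
let comm = Communicator::new(device, world_size, unique_id, rank)?;
```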

pub fn split_all( &mut self, config: Option<NcclConfig>, ) -> Result<Self, NcclError>

Split off a new communicator from this one, preserving the same world size.

pub fn split_from( &mut self, ranks: Vec<i32>, config: Option<NcclConfig>, ) -> Result<Option<Self>, NcclError>

Split off a new communicator from this one. Only the given ranks will be present on the new communicator.

If ranks is empty, ncclCommSplit will be called with NCCL_SPLIT_NOCOLOR. This can be useful if ranks excluded from the split don’t even know what ranks will be included.
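
A possible usage sketch, assuming that ranks outside the list receive Ok(None) (an assumption; only the signature and the NCCL_SPLIT_NOCOLOR note above are given):

```rust
// Every rank of the parent communicator participates in the split.
// Ranks 0 and 1 form the new group; other ranks are excluded and
// (by assumption) get back Ok(None).
match comm.split_from(vec![0, 1], None)? {
    Some(sub_comm) => {
        // This rank is a member of the new, smaller communicator.
    }
    None => {
        // This rank was excluded from the split.
    }
}
```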

pub fn all_reduce( &mut self, tensor: &TensorCell, reduce_op: ReduceOp, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Reduce the tensor data across all ranks, with each rank receiving the final result in-place.

See torch.distributed.all_reduce for more detailed documentation.
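
For illustration, a hedged sketch of an in-place sum across ranks, assuming a `ReduceOp::Sum` variant and an already-constructed `TensorCell` and `Stream` (none of which are specified on this page):

```rust
// Every rank passes its own tensor; after the call completes on
// `stream`, each rank's tensor holds the elementwise sum over all
// ranks. ReduceOp::Sum is an assumed variant name.
comm.all_reduce(&tensor_cell, ReduceOp::Sum, &stream)?;
```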

pub fn broadcast( &mut self, tensor: &TensorCell, root: i32, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Broadcast the tensor data on the root rank to all the others.

See torch.distributed.broadcast for more detailed documentation.

pub fn reduce( &mut self, tensor: &TensorCell, reduce_op: ReduceOp, root: i32, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Reduce the tensor data across all ranks, writing the result out to tensor on the root rank.

See torch.distributed.reduce for more detailed documentation.

pub fn all_gather( &mut self, output_cells: &[TensorCell], input_cell: &TensorCell, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Gather tensors from all ranks into a list of output tensors.

See torch.distributed.all_gather for more detailed documentation.

pub fn all_gather_into_tensor( &mut self, output_cell: &TensorCell, input_cell: &TensorCell, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Gather tensors from all ranks into a single output tensor.

See torch.distributed.all_gather_into_tensor for more detailed documentation.
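
In NCCL's all-gather, the output buffer holds the concatenation of every rank's input, so the output tensor must be world_size times the size of the input. A sketch, with `input_cell` and `output_cell` assumed to be pre-allocated `TensorCell`s:

```rust
// input: shape [n] on each rank.
// output: shape [world_size * n]; after the call, rank r's input
// occupies output[r * n .. (r + 1) * n] on every rank.
comm.all_gather_into_tensor(&output_cell, &input_cell, &stream)?;
```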

pub fn reduce_scatter_tensor( &mut self, output_cell: &TensorCell, input_cell: &TensorCell, reduce_op: ReduceOp, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Reduce the input tensor across all ranks, then scatter the result so that each rank receives one chunk in its output tensor.

See torch.distributed.reduce_scatter_tensor for more detailed documentation.

pub fn send( &mut self, tensor_cell: &TensorCell, dst: i32, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Send a tensor to the rank dst.

pub fn recv( &mut self, tensor_cell: &TensorCell, src: i32, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Receive a tensor from the rank src.
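
send and recv are point-to-point and must be paired across the two ranks, or both will block. A sketch, with `rank` assumed to hold this process's rank:

```rust
// Rank 0 sends its tensor to rank 1; rank 1 receives into its own
// tensor. Both calls are enqueued on the given stream.
if rank == 0 {
    comm.send(&tensor_cell, 1, &stream)?;   // dst = 1
} else if rank == 1 {
    comm.recv(&tensor_cell, 0, &stream)?;   // src = 0
}
```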

pub fn all_to_all_single( &mut self, output_cell: &TensorCell, input_cell: &TensorCell, stream: &Stream, ) -> Result<NcclStatus, NcclError>

Split the input tensor, then scatter the splits to all processes in the group. The received splits are then concatenated into the output tensor.

See torch.distributed.all_to_all_single for more detailed documentation.

pub fn barrier(&mut self, stream: &Stream) -> Result<NcclStatus, NcclError>

Synchronize all ranks.

See torch.distributed.barrier for more detailed documentation.

Trait Implementations

impl Debug for Communicator

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Send for Communicator

SAFETY: ncclComm_t is okay to access from multiple threads, but each communicator must issue nccl calls in the same order. It is up to the user to ensure this.

impl Sync for Communicator

SAFETY: ncclComm_t is okay to access from multiple threads, but each communicator must issue nccl calls in the same order. It is up to the user to ensure this.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> FutureExt for T

fn with_context(self, otel_cx: Context) -> WithContext<Self>

Attaches the provided Context to this type, returning a WithContext wrapper.

fn with_current_context(self) -> WithContext<Self>

Attaches the current Context to this type, returning a WithContext wrapper.

impl<A, M> Handler<IndexedErasedUnbound<M>> for A
where A: Handler<M>, M: Castable,

fn handle<'life0, 'life1, 'async_trait>( &'life0 mut self, cx: &'life1 Context<'_, A>, msg: IndexedErasedUnbound<M>, ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'async_trait>>
where 'life0: 'async_trait, 'life1: 'async_trait, A: 'async_trait,

Handle the next M-typed message.
impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a pointer with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<M> Message for M
where M: Debug + Send + Sync + 'static,

impl<T> Ungil for T
where T: Send,