Function cuLaunchCooperativeKernel

pub unsafe extern "C" fn cuLaunchCooperativeKernel(
    f: CUfunction,
    gridDimX: c_uint,
    gridDimY: c_uint,
    gridDimZ: c_uint,
    blockDimX: c_uint,
    blockDimY: c_uint,
    blockDimZ: c_uint,
    sharedMemBytes: c_uint,
    hStream: CUstream,
    kernelParams: *mut *mut c_void,
) -> CUresult

\brief Launches a CUDA function ::CUfunction or a CUDA kernel ::CUkernel where thread blocks can cooperate and synchronize as they execute

Invokes the function ::CUfunction or the kernel ::CUkernel \p f on a \p gridDimX x \p gridDimY x \p gridDimZ grid of blocks. Each block contains \p blockDimX x \p blockDimY x \p blockDimZ threads.

\p sharedMemBytes sets the amount of dynamic shared memory that will be available to each thread block.

The device on which this kernel is invoked must have a non-zero value for the device attribute ::CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH.

The total number of blocks launched cannot exceed the maximum number of blocks per multiprocessor as returned by ::cuOccupancyMaxActiveBlocksPerMultiprocessor (or ::cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of multiprocessors as specified by the device attribute ::CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT.
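The grid-size limit above is a simple product, which can be sketched as a pure helper. This is only an illustration: `max_cooperative_blocks` and `grid_fits` are hypothetical names, and the two inputs are assumed to have been obtained beforehand from ::cuOccupancyMaxActiveBlocksPerMultiprocessor and the ::CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT device attribute.

```rust
/// Upper bound on the total number of blocks in a cooperative launch:
/// blocks per multiprocessor times the number of multiprocessors.
/// Both inputs are assumed to come from the driver queries named above.
pub fn max_cooperative_blocks(max_blocks_per_sm: u32, sm_count: u32) -> u64 {
    u64::from(max_blocks_per_sm) * u64::from(sm_count)
}

/// Returns true if a gridDimX x gridDimY x gridDimZ grid stays within the bound.
/// The product is computed in u128 so that three u32 factors cannot overflow.
pub fn grid_fits(grid: (u32, u32, u32), max_blocks_per_sm: u32, sm_count: u32) -> bool {
    let total = u128::from(grid.0) * u128::from(grid.1) * u128::from(grid.2);
    total <= u128::from(max_cooperative_blocks(max_blocks_per_sm, sm_count))
}
```

Checking the grid before launching lets a caller shrink it up front instead of handling ::CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE afterwards.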

The kernel cannot make use of CUDA dynamic parallelism.

Kernel parameters must be specified via \p kernelParams. If \p f has N parameters, then \p kernelParams needs to be an array of N pointers. Each of \p kernelParams[0] through \p kernelParams[N-1] must point to a region of memory from which the actual kernel parameter will be copied. The number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel’s image.
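As a minimal sketch of the pointer array described above, assume a hypothetical kernel with the signature `(float *data, unsigned int n, float scale)`. The `Args` struct and its method are illustrative names, not part of this crate, and the device pointer is modelled as a plain `u64` on the assumption of a 64-bit platform.

```rust
use std::ffi::c_void;

/// Host-side copies of the arguments for a hypothetical kernel
/// `(float *data, unsigned int n, float scale)`.
pub struct Args {
    pub data: u64, // stand-in for a device pointer value (assumed 64-bit)
    pub n: u32,
    pub scale: f32,
}

impl Args {
    /// Builds the N-element array of pointers, one per kernel parameter,
    /// in declaration order. Each entry points at the host value that the
    /// launch call copies from, so `self` must outlive the launch call.
    pub fn as_kernel_params(&mut self) -> [*mut c_void; 3] {
        [
            &mut self.data as *mut u64 as *mut c_void,
            &mut self.n as *mut u32 as *mut c_void,
            &mut self.scale as *mut f32 as *mut c_void,
        ]
    }
}
```

Calling `as_mut_ptr()` on the returned array yields a `*mut *mut c_void` that matches the type of \p kernelParams.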

Calling ::cuLaunchCooperativeKernel() sets persistent function state that is the same as the function state set through the ::cuLaunchKernel API.

When the kernel \p f is launched via ::cuLaunchCooperativeKernel(), the previous block shape, shared size and parameter info associated with \p f are overwritten.

Note that to use ::cuLaunchCooperativeKernel(), the kernel \p f must either have been compiled with toolchain version 3.2 or later so that it will contain kernel parameter information, or have no kernel parameters. If neither of these conditions is met, then ::cuLaunchCooperativeKernel() will return ::CUDA_ERROR_INVALID_IMAGE.

Note that the API can also be used to launch a context-less kernel ::CUkernel by querying the handle using ::cuLibraryGetKernel() and then passing it to the API by casting it to ::CUfunction. In this case, the context to launch the kernel on is taken from the specified stream \p hStream, or from the current context if \p hStream is the NULL stream.
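The cast mentioned above is a plain pointer reinterpretation. The sketch below uses local stand-in definitions for the opaque handle types, on the assumption that both handles are pointers to opaque structs. In real code the types exported by this crate would be used instead, and `kernel_as_function` is a hypothetical helper name.

```rust
// Local stand-ins for the opaque driver handle types (assumed layout:
// both are raw pointers to opaque structs).
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct CUfunc_st {
    _private: [u8; 0],
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct CUkern_st {
    _private: [u8; 0],
}

pub type CUfunction = *mut CUfunc_st;
pub type CUkernel = *mut CUkern_st;

/// Reinterprets a kernel handle as a function handle so that it can be
/// passed as the `f` argument. The pointer value itself is unchanged.
pub fn kernel_as_function(kernel: CUkernel) -> CUfunction {
    kernel.cast::<CUfunc_st>()
}
```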

\param f - Function ::CUfunction or Kernel ::CUkernel to launch
\param gridDimX - Width of grid in blocks
\param gridDimY - Height of grid in blocks
\param gridDimZ - Depth of grid in blocks
\param blockDimX - X dimension of each thread block
\param blockDimY - Y dimension of each thread block
\param blockDimZ - Z dimension of each thread block
\param sharedMemBytes - Dynamic shared-memory size per thread block in bytes
\param hStream - Stream identifier
\param kernelParams - Array of pointers to kernel parameters

\return ::CUDA_SUCCESS, ::CUDA_ERROR_DEINITIALIZED, ::CUDA_ERROR_NOT_INITIALIZED, ::CUDA_ERROR_INVALID_CONTEXT, ::CUDA_ERROR_INVALID_HANDLE, ::CUDA_ERROR_INVALID_IMAGE, ::CUDA_ERROR_INVALID_VALUE, ::CUDA_ERROR_LAUNCH_FAILED, ::CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, ::CUDA_ERROR_LAUNCH_TIMEOUT, ::CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, ::CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE, ::CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, ::CUDA_ERROR_NOT_FOUND
\note_null_stream
\notefnerr
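A caller will typically want to turn the returned code into a `Result`. The following is a rough sketch covering only a few of the codes listed above. It assumes the result is available as a plain `u32`, defines the numeric values locally rather than taking them from this crate, and uses the hypothetical names `LaunchError` and `check_launch`.

```rust
// Numeric values of a few driver result codes, defined locally for this sketch.
pub const CUDA_SUCCESS: u32 = 0;
pub const CUDA_ERROR_INVALID_VALUE: u32 = 1;
pub const CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: u32 = 701;
pub const CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE: u32 = 720;

/// Hypothetical error type distinguishing the cases a caller may want
/// to react to differently after a cooperative launch.
#[derive(Debug, PartialEq, Eq)]
pub enum LaunchError {
    InvalidValue,
    OutOfResources,
    GridTooLarge,
    Other(u32),
}

/// Maps a raw result code to `Ok(())` or a `LaunchError`.
pub fn check_launch(code: u32) -> Result<(), LaunchError> {
    match code {
        CUDA_SUCCESS => Ok(()),
        CUDA_ERROR_INVALID_VALUE => Err(LaunchError::InvalidValue),
        CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES => Err(LaunchError::OutOfResources),
        CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE => Err(LaunchError::GridTooLarge),
        other => Err(LaunchError::Other(other)),
    }
}
```

Separating `GridTooLarge` from the other errors lets a caller retry with a smaller grid while treating everything else as fatal.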

\sa ::cuCtxGetCacheConfig, ::cuCtxSetCacheConfig, ::cuFuncSetCacheConfig, ::cuFuncGetAttribute, ::cuLaunchCooperativeKernelMultiDevice, ::cudaLaunchCooperativeKernel, ::cuLibraryGetKernel, ::cuKernelSetCacheConfig, ::cuKernelGetAttribute, ::cuKernelSetAttribute