Function cuLaunchKernel

pub unsafe extern "C" fn cuLaunchKernel(
    f: CUfunction,
    gridDimX: c_uint,
    gridDimY: c_uint,
    gridDimZ: c_uint,
    blockDimX: c_uint,
    blockDimY: c_uint,
    blockDimZ: c_uint,
    sharedMemBytes: c_uint,
    hStream: CUstream,
    kernelParams: *mut *mut c_void,
    extra: *mut *mut c_void,
) -> CUresult

\brief Launches a CUDA function ::CUfunction or a CUDA kernel ::CUkernel

Invokes the function ::CUfunction or the kernel ::CUkernel \p f on a \p gridDimX x \p gridDimY x \p gridDimZ grid of blocks. Each block contains \p blockDimX x \p blockDimY x \p blockDimZ threads.

\p sharedMemBytes sets the amount of dynamic shared memory that will be available to each thread block.
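As an illustration of how the grid and shared-memory arguments are typically derived, here is a minimal sketch that computes a one-dimensional \p gridDimX from a problem size and a \p blockDimX, and a \p sharedMemBytes value from a per-block element count. The helper names `blocks_for` and `shared_bytes_per_block` are hypothetical and not part of these bindings.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Number of blocks needed along one axis so that `blocks * block_dim >= n`.
/// Returns `None` if `block_dim` is zero or the count does not fit in `u32`
/// (the grid dimensions of `cuLaunchKernel` are `c_uint`).
/// Hypothetical helper, not part of the bindings.
fn blocks_for(n: u64, block_dim: u32) -> Option<u32> {
    if block_dim == 0 {
        return None;
    }
    let bd = u64::from(block_dim);
    // Ceiling division written so that it cannot overflow.
    let blocks = n / bd + u64::from(n % bd != 0);
    u32::try_from(blocks).ok()
}

/// Dynamic shared memory, in bytes, for `elems_per_block` values of type `T`.
/// Returns `None` if the byte count overflows or does not fit in `u32`.
/// Hypothetical helper, not part of the bindings.
fn shared_bytes_per_block<T>(elems_per_block: u32) -> Option<u32> {
    let bytes = (elems_per_block as usize).checked_mul(size_of::<T>())?;
    u32::try_from(bytes).ok()
}
```

For example, `blocks_for(1000, 256)` yields `Some(4)`: four blocks of 256 threads cover 1000 elements, and the kernel itself is expected to bounds-check the last 24 threads.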

Kernel parameters to \p f can be specified in one of two ways:

  1. Kernel parameters can be specified via \p kernelParams. If \p f has N parameters, then \p kernelParams needs to be an array of N pointers. Each of \p kernelParams[0] through \p kernelParams[N-1] must point to a region of memory from which the actual kernel parameter will be copied. The number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel’s image.

  2. Kernel parameters can also be packaged by the application into a single buffer that is passed in via the \p extra parameter. This places the burden on the application of knowing each kernel parameter’s size and alignment/padding within the buffer. Here is an example of using the \p extra parameter in this manner:

\code
size_t argBufferSize;
char argBuffer[256];

// populate argBuffer and argBufferSize

void *config[] = {
    CU_LAUNCH_PARAM_BUFFER_POINTER, argBuffer,
    CU_LAUNCH_PARAM_BUFFER_SIZE,    &argBufferSize,
    CU_LAUNCH_PARAM_END
};
status = cuLaunchKernel(f, gx, gy, gz, bx, by, bz, sh, s, NULL, config);
\endcode
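For the first option, a Rust caller needs an array of N pointers, one per kernel parameter. The sketch below builds that array for a hypothetical kernel `saxpy(float a, const float *x, float *y, unsigned int n)`. It assumes device pointers are represented as `u64`, which is how `CUdeviceptr` is commonly bound on 64-bit platforms; check the type your bindings actually export.

```rust
use std::ffi::c_void;

/// Builds the `kernelParams` array for a hypothetical kernel
/// `saxpy(float a, const float *x, float *y, unsigned int n)`.
///
/// `x` and `y` hold device addresses (assumed to be `u64` here). Each entry
/// of the returned array points at the host storage of one argument, so the
/// referenced values must stay alive and unmoved until the launch call returns.
fn saxpy_params(a: &mut f32, x: &mut u64, y: &mut u64, n: &mut u32) -> [*mut c_void; 4] {
    [
        (a as *mut f32).cast::<c_void>(),
        (x as *mut u64).cast::<c_void>(),
        (y as *mut u64).cast::<c_void>(),
        (n as *mut u32).cast::<c_void>(),
    ]
}
```

The array's `as_mut_ptr()` has type `*mut *mut c_void` and can be passed as \p kernelParams, with a null pointer for \p extra.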

The \p extra parameter exists to allow ::cuLaunchKernel to take additional, less commonly used arguments. \p extra specifies a list of names of extra settings and their corresponding values. Each extra setting name is immediately followed by the corresponding value. The list must be terminated with either NULL or ::CU_LAUNCH_PARAM_END. The setting names are:

  • ::CU_LAUNCH_PARAM_END, which indicates the end of the \p extra array;
  • ::CU_LAUNCH_PARAM_BUFFER_POINTER, which specifies that the next value in \p extra will be a pointer to a buffer containing all the kernel parameters for launching kernel \p f;
  • ::CU_LAUNCH_PARAM_BUFFER_SIZE, which specifies that the next value in \p extra will be a pointer to a size_t containing the size of the buffer specified with ::CU_LAUNCH_PARAM_BUFFER_POINTER.
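The name/value layout of \p extra can be sketched in Rust as follows. The three constants are redefined locally with the values I understand `cuda.h` to use (0, 1 and 2 cast to `void *`); treat them as an assumption and prefer the constants exported by your bindings. The packing helpers and the `saxpy` argument list are hypothetical, and the alignments used are the natural alignments of the argument types.

```rust
use std::ffi::c_void;

// Assumed to mirror the cuda.h definitions; verify against your bindings.
const CU_LAUNCH_PARAM_END: *mut c_void = 0 as *mut c_void;
const CU_LAUNCH_PARAM_BUFFER_POINTER: *mut c_void = 1 as *mut c_void;
const CU_LAUNCH_PARAM_BUFFER_SIZE: *mut c_void = 2 as *mut c_void;

/// Appends `bytes` to `buf` after zero-padding `buf` to a multiple of `align`.
/// `align` must be a power of two.
fn push_arg(buf: &mut Vec<u8>, bytes: &[u8], align: usize) {
    debug_assert!(align.is_power_of_two());
    let padded = (buf.len() + align - 1) & !(align - 1);
    buf.resize(padded, 0);
    buf.extend_from_slice(bytes);
}

/// Packs the arguments of a hypothetical
/// `saxpy(float a, const float *x, float *y, unsigned int n)` kernel.
fn pack_saxpy_args(a: f32, x: u64, y: u64, n: u32) -> Vec<u8> {
    let mut buf = Vec::new();
    push_arg(&mut buf, &a.to_ne_bytes(), 4); // offset 0
    push_arg(&mut buf, &x.to_ne_bytes(), 8); // offset 8, after 4 padding bytes
    push_arg(&mut buf, &y.to_ne_bytes(), 8); // offset 16
    push_arg(&mut buf, &n.to_ne_bytes(), 4); // offset 24
    buf
}

/// Builds the `extra` list: each setting name is followed by its value, and
/// the list ends with `CU_LAUNCH_PARAM_END`. Both referenced objects must
/// outlive the launch call.
fn launch_config(arg_buffer: &mut [u8], arg_buffer_size: &mut usize) -> [*mut c_void; 5] {
    [
        CU_LAUNCH_PARAM_BUFFER_POINTER,
        arg_buffer.as_mut_ptr().cast::<c_void>(),
        CU_LAUNCH_PARAM_BUFFER_SIZE,
        (arg_buffer_size as *mut usize).cast::<c_void>(),
        CU_LAUNCH_PARAM_END,
    ]
}
```

The array returned by `launch_config` would be passed as \p extra via `as_mut_ptr()`, with \p kernelParams set to null.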

The error ::CUDA_ERROR_INVALID_VALUE will be returned if kernel parameters are specified with both \p kernelParams and \p extra (i.e. both \p kernelParams and \p extra are non-NULL).
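One way a Rust wrapper might rule out this error by construction is to model the two mechanisms as enum variants, so a caller cannot supply both. This is a design sketch only; `KernelArgs` and `as_raw` are hypothetical names, not part of these bindings.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical wrapper type: a launch uses `kernelParams`, `extra`, or neither.
enum KernelArgs<'a> {
    /// The kernel takes no parameters.
    NoArgs,
    /// One pointer per kernel parameter, passed as `kernelParams`.
    Params(&'a mut [*mut c_void]),
    /// Name/value list ending in `CU_LAUNCH_PARAM_END`, passed as `extra`.
    Extra(&'a mut [*mut c_void]),
}

impl<'a> KernelArgs<'a> {
    /// Returns `(kernelParams, extra)`; at most one of the two is non-null.
    fn as_raw(&mut self) -> (*mut *mut c_void, *mut *mut c_void) {
        match self {
            KernelArgs::NoArgs => (ptr::null_mut(), ptr::null_mut()),
            KernelArgs::Params(p) => (p.as_mut_ptr(), ptr::null_mut()),
            KernelArgs::Extra(e) => (ptr::null_mut(), e.as_mut_ptr()),
        }
    }
}
```

The returned pair maps directly onto the last two arguments of ::cuLaunchKernel.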

Calling ::cuLaunchKernel() invalidates the persistent function state set through the following deprecated APIs: ::cuFuncSetBlockShape(), ::cuFuncSetSharedSize(), ::cuParamSetSize(), ::cuParamSeti(), ::cuParamSetf(), ::cuParamSetv().

Note that to use ::cuLaunchKernel(), the kernel \p f must either have been compiled with toolchain version 3.2 or later so that it will contain kernel parameter information, or have no kernel parameters. If either of these conditions is not met, then ::cuLaunchKernel() will return ::CUDA_ERROR_INVALID_IMAGE.

Note that the API can also be used to launch a context-less kernel ::CUkernel by querying the handle using ::cuLibraryGetKernel() and then passing it to the API by casting it to ::CUfunction. In that case, the context on which the kernel is launched is taken from the specified stream \p hStream, or from the current context if \p hStream is NULL.

\param f - Function ::CUfunction or Kernel ::CUkernel to launch
\param gridDimX - Width of grid in blocks
\param gridDimY - Height of grid in blocks
\param gridDimZ - Depth of grid in blocks
\param blockDimX - X dimension of each thread block
\param blockDimY - Y dimension of each thread block
\param blockDimZ - Z dimension of each thread block
\param sharedMemBytes - Dynamic shared-memory size per thread block in bytes
\param hStream - Stream identifier
\param kernelParams - Array of pointers to kernel parameters
\param extra - Extra options

\return
::CUDA_SUCCESS,
::CUDA_ERROR_DEINITIALIZED,
::CUDA_ERROR_NOT_INITIALIZED,
::CUDA_ERROR_INVALID_CONTEXT,
::CUDA_ERROR_INVALID_HANDLE,
::CUDA_ERROR_INVALID_IMAGE,
::CUDA_ERROR_INVALID_VALUE,
::CUDA_ERROR_LAUNCH_FAILED,
::CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES,
::CUDA_ERROR_LAUNCH_TIMEOUT,
::CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
::CUDA_ERROR_SHARED_OBJECT_INIT_FAILED,
::CUDA_ERROR_NOT_FOUND
\note_null_stream
\notefnerr
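Callers usually convert the returned ::CUresult into a Rust `Result`. The sketch below does this for a few of the codes listed above. It assumes the result is available as a raw `u32` and uses the numeric values I understand `cuda.h` to assign (0, 1, 200, 400, 701); how `CUresult` is represented depends on the bindings, and `LaunchError` and `check_launch` are hypothetical names.

```rust
/// Hypothetical error type covering a subset of the documented return codes.
#[derive(Debug, PartialEq, Eq)]
enum LaunchError {
    InvalidValue,   // assumed CUDA_ERROR_INVALID_VALUE = 1
    InvalidImage,   // assumed CUDA_ERROR_INVALID_IMAGE = 200
    InvalidHandle,  // assumed CUDA_ERROR_INVALID_HANDLE = 400
    OutOfResources, // assumed CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES = 701
    /// Any other non-zero code, kept verbatim for reporting.
    Other(u32),
}

/// Maps a raw result code to `Ok(())` for success (assumed to be 0) or a
/// `LaunchError` otherwise.
fn check_launch(code: u32) -> Result<(), LaunchError> {
    match code {
        0 => Ok(()),
        1 => Err(LaunchError::InvalidValue),
        200 => Err(LaunchError::InvalidImage),
        400 => Err(LaunchError::InvalidHandle),
        701 => Err(LaunchError::OutOfResources),
        other => Err(LaunchError::Other(other)),
    }
}
```

Because launches are asynchronous, a wrapper like this only reports what the call itself returned; as the note above says, errors from earlier asynchronous work may also surface here.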

\sa ::cuCtxGetCacheConfig, ::cuCtxSetCacheConfig, ::cuFuncSetCacheConfig, ::cuFuncGetAttribute, ::cudaLaunchKernel, ::cuLibraryGetKernel, ::cuKernelSetCacheConfig, ::cuKernelGetAttribute, ::cuKernelSetAttribute