pub unsafe extern "C" fn cuLaunchKernelEx(
config: *const CUlaunchConfig,
f: CUfunction,
kernelParams: *mut *mut c_void,
extra: *mut *mut c_void,
) -> CUresult
\brief Launches a CUDA function ::CUfunction or a CUDA kernel ::CUkernel with launch-time configuration
Invokes the function ::CUfunction or the kernel ::CUkernel \p f with the specified launch-time configuration \p config.
The ::CUlaunchConfig structure is defined as:
\code
typedef struct CUlaunchConfig_st {
    unsigned int gridDimX;
    unsigned int gridDimY;
    unsigned int gridDimZ;
    unsigned int blockDimX;
    unsigned int blockDimY;
    unsigned int blockDimZ;
    unsigned int sharedMemBytes;
    CUstream hStream;
    CUlaunchAttribute *attrs;
    unsigned int numAttrs;
} CUlaunchConfig;
\endcode
where:
- ::CUlaunchConfig::gridDimX is the width of the grid in blocks.
- ::CUlaunchConfig::gridDimY is the height of the grid in blocks.
- ::CUlaunchConfig::gridDimZ is the depth of the grid in blocks.
- ::CUlaunchConfig::blockDimX is the X dimension of each thread block.
- ::CUlaunchConfig::blockDimY is the Y dimension of each thread block.
- ::CUlaunchConfig::blockDimZ is the Z dimension of each thread block.
- ::CUlaunchConfig::sharedMemBytes is the dynamic shared-memory size per thread block in bytes.
- ::CUlaunchConfig::hStream is the handle to the stream to perform the launch in. The CUDA context associated with this stream must match that associated with function f.
- ::CUlaunchConfig::attrs is an array of ::CUlaunchConfig::numAttrs contiguous ::CUlaunchAttribute elements. The value of this pointer is not considered if ::CUlaunchConfig::numAttrs is zero. However, in that case, it is recommended to set the pointer to NULL.
- ::CUlaunchConfig::numAttrs is the number of attributes populating the first ::CUlaunchConfig::numAttrs positions of the ::CUlaunchConfig::attrs array.
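As a rough illustration of how the grid and block fields relate, the sketch below computes a one-dimensional launch shape that covers `n` elements. `LaunchDims`, `dims_1d` and `total_threads` are hypothetical helpers written for this example, not part of the driver API or of these bindings, and clamping to at least one block for `n == 0` is an assumption of the sketch.

```rust
/// Hypothetical local mirror of the six dimension fields of CUlaunchConfig.
/// It is for illustration only and is not the binding's own struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LaunchDims {
    pub grid_dim_x: u32,
    pub grid_dim_y: u32,
    pub grid_dim_z: u32,
    pub block_dim_x: u32,
    pub block_dim_y: u32,
    pub block_dim_z: u32,
}

/// Cover `n` elements with a 1D grid of `block` threads per block.
/// Returns None if `block` is zero or the block count does not fit in u32.
/// Device-specific limits are NOT checked here.
pub fn dims_1d(n: u64, block: u32) -> Option<LaunchDims> {
    if block == 0 {
        return None;
    }
    let b = u64::from(block);
    // Ceiling division without overflow; use at least one block (assumption).
    let blocks = (n / b + u64::from(n % b != 0)).max(1);
    if blocks > u64::from(u32::MAX) {
        return None;
    }
    Some(LaunchDims {
        grid_dim_x: blocks as u32,
        grid_dim_y: 1,
        grid_dim_z: 1,
        block_dim_x: block,
        block_dim_y: 1,
        block_dim_z: 1,
    })
}

/// Total number of threads in the launch: blocks in the grid times threads per block.
pub fn total_threads(d: &LaunchDims) -> u64 {
    let grid = u64::from(d.grid_dim_x) * u64::from(d.grid_dim_y) * u64::from(d.grid_dim_z);
    let block = u64::from(d.block_dim_x) * u64::from(d.block_dim_y) * u64::from(d.block_dim_z);
    grid * block
}
```

With 1000 elements and 256 threads per block, `dims_1d` yields four blocks, so the kernel itself still has to guard against the 24 surplus threads.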
Launch-time configuration is specified by adding entries to ::CUlaunchConfig::attrs. Each entry is an attribute ID and a corresponding attribute value.
The ::CUlaunchAttribute structure is defined as:
\code
typedef struct CUlaunchAttribute_st {
    CUlaunchAttributeID id;
    CUlaunchAttributeValue value;
} CUlaunchAttribute;
\endcode
where:
- ::CUlaunchAttribute::id is a unique enum identifying the attribute.
- ::CUlaunchAttribute::value is a union that holds the attribute value.
An example of using the \p config parameter:
\code
CUlaunchAttribute coopAttr = {.id = CU_LAUNCH_ATTRIBUTE_COOPERATIVE,
                              .value.cooperative = 1};
CUlaunchConfig config = {… // set block and grid dimensions
                         .attrs = &coopAttr,
                         .numAttrs = 1};

cuLaunchKernelEx(&config, kernel, NULL, NULL);
\endcode
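On the Rust side, the same id-plus-union pairing can be sketched with a `#[repr(C)]` union. The types below are simplified, hypothetical mirrors that carry only three of the union members; they are not layout-compatible with the real ::CUlaunchAttribute and must not be passed to the driver. The numeric IDs are the ones listed in the ::CUlaunchAttributeID enum in this description.

```rust
use std::os::raw::c_int;

// Attribute IDs, with the values from the CUlaunchAttributeID enum in the text.
pub const ATTR_COOPERATIVE: u32 = 2;
pub const ATTR_PRIORITY: u32 = 8;

/// Simplified stand-in for the clusterDim member of the value union.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ClusterDim {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

/// Hypothetical, reduced mirror of CUlaunchAttributeValue (illustration only).
#[repr(C)]
#[derive(Clone, Copy)]
pub union AttrValue {
    pub cooperative: c_int,
    pub priority: c_int,
    pub cluster_dim: ClusterDim,
}

/// Hypothetical mirror of CUlaunchAttribute: an ID that says which union member is live.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Attr {
    pub id: u32,
    pub value: AttrValue,
}

/// Build the attribute that requests a cooperative launch.
pub fn cooperative_attr() -> Attr {
    Attr { id: ATTR_COOPERATIVE, value: AttrValue { cooperative: 1 } }
}

/// True if any attribute in the slice requests a cooperative launch.
pub fn is_cooperative(attrs: &[Attr]) -> bool {
    attrs.iter().any(|a| {
        // SAFETY: the `cooperative` member is only read when the ID says it is the live one.
        a.id == ATTR_COOPERATIVE && (unsafe { a.value.cooperative } != 0)
    })
}
```

Reading a union member is `unsafe` in Rust, which is why the sketch checks `id` before touching `value`; the real bindings impose the same discipline on callers.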
The ::CUlaunchAttributeID enum is defined as:
\code
typedef enum CUlaunchAttributeID_enum {
    CU_LAUNCH_ATTRIBUTE_IGNORE = 0,
    CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW = 1,
    CU_LAUNCH_ATTRIBUTE_COOPERATIVE = 2,
    CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY = 3,
    CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION = 4,
    CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE = 5,
    CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION = 6,
    CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT = 7,
    CU_LAUNCH_ATTRIBUTE_PRIORITY = 8,
    CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP = 9,
    CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN = 10,
    CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT = 12,
    CU_LAUNCH_ATTRIBUTE_DEVICE_UPDATABLE_KERNEL_NODE = 13,
} CUlaunchAttributeID;
\endcode
and the corresponding ::CUlaunchAttributeValue union is defined as:
\code
typedef union CUlaunchAttributeValue_union {
    CUaccessPolicyWindow accessPolicyWindow;
    int cooperative;
    CUsynchronizationPolicy syncPolicy;
    struct {
        unsigned int x;
        unsigned int y;
        unsigned int z;
    } clusterDim;
    CUclusterSchedulingPolicy clusterSchedulingPolicyPreference;
    int programmaticStreamSerializationAllowed;
    struct {
        CUevent event;
        int flags;
        int triggerAtBlockStart;
    } programmaticEvent;
    int priority;
    CUlaunchMemSyncDomainMap memSyncDomainMap;
    CUlaunchMemSyncDomain memSyncDomain;
    struct {
        CUevent event;
        int flags;
    } launchCompletionEvent;
    struct {
        int deviceUpdatable;
        CUgraphDeviceNode devNode;
    } deviceUpdatableKernelNode;
} CUlaunchAttributeValue;
\endcode
Setting ::CU_LAUNCH_ATTRIBUTE_COOPERATIVE to a non-zero value causes the kernel launch to be a cooperative launch, with exactly the same usage and semantics as ::cuLaunchCooperativeKernel.
Setting ::CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION to a non-zero value causes the kernel to use programmatic means to resolve its stream dependency, enabling the CUDA runtime to opportunistically allow the grid’s execution to overlap with the previous kernel in the stream, if that kernel requests the overlap.
::CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT records an event along with the kernel launch. An event recorded through this launch attribute is guaranteed to trigger only after all blocks in the associated kernel trigger the event. A block can trigger the event through PTX launchdep.release or the CUDA builtin function cudaTriggerProgrammaticLaunchCompletion(). A trigger can also be inserted at the beginning of each block’s execution if triggerAtBlockStart is set to non-0. Note that dependents (including the CPU thread calling cuEventSynchronize()) are not guaranteed to observe the release precisely when it is released. For example, cuEventSynchronize() may only observe the event trigger long after the associated kernel has completed. This recording type is primarily meant for establishing programmatic dependency between device tasks. The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the ::CU_EVENT_DISABLE_TIMING flag set).
::CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT records an event along with the kernel launch. Nominally, the event is triggered once all blocks of the kernel have begun execution. Currently this is a best effort. If a kernel B has a launch completion dependency on a kernel A, B may wait until A is complete. Alternatively, blocks of B may begin before all blocks of A have begun, for example:
- If B can claim execution resources unavailable to A, for example if they run on different GPUs.
- If B has a higher priority than A.
Exercise caution if such an ordering inversion could lead to deadlock. The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the ::CU_EVENT_DISABLE_TIMING flag set).
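The event requirements shared by the two event-recording attributes can be pre-checked on the host before the launch. The sketch below assumes the caller has kept the flags it passed when creating the event; the flag values are the ones I understand `cuda.h` to define and should be verified against your header. `event_flags_ok_for_launch_attr` is a hypothetical helper, and it cannot detect interop events, which are not identified by creation flags.

```rust
// Event creation flags (values assumed from cuda.h; verify against your header).
pub const CU_EVENT_BLOCKING_SYNC: u32 = 0x1;
pub const CU_EVENT_DISABLE_TIMING: u32 = 0x2;
pub const CU_EVENT_INTERPROCESS: u32 = 0x4;

/// Hypothetical pre-check for an event passed via a PROGRAMMATIC_EVENT or
/// LAUNCH_COMPLETION_EVENT attribute: timing must be disabled and the event
/// must not be an interprocess event.
pub fn event_flags_ok_for_launch_attr(creation_flags: u32) -> bool {
    let timing_disabled = (creation_flags & CU_EVENT_DISABLE_TIMING) != 0;
    let interprocess = (creation_flags & CU_EVENT_INTERPROCESS) != 0;
    timing_disabled && !interprocess
}
```

A failed check here only means the launch would likely be rejected; the driver remains the authority on what it accepts.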
Setting ::CU_LAUNCH_ATTRIBUTE_DEVICE_UPDATABLE_KERNEL_NODE to 1 on a captured launch causes the resulting kernel node to be device-updatable. This attribute is specific to graphs, and passing it to a launch in a non-capturing stream results in an error. Passing a value other than 0 or 1 is not allowed.
On success, a handle will be returned via ::CUlaunchAttributeValue::deviceUpdatableKernelNode::devNode which can be passed to the various device-side update functions to update the node’s kernel parameters from within another kernel. For more information on the types of device updates that can be made, as well as the relevant limitations thereof, see ::cudaGraphKernelNodeUpdatesApply.
Kernel nodes which are device-updatable have additional restrictions compared to regular kernel nodes. Firstly, device-updatable nodes cannot be removed from their graph via ::cuGraphDestroyNode. Additionally, once opted-in to this functionality, a node cannot opt out, and any attempt to set the attribute to 0 will result in an error. Graphs containing one or more device-updatable nodes also do not allow multiple instantiation.
The effect of other attributes is consistent with their effect when set via persistent APIs.
See ::cuStreamSetAttribute for
- ::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW
- ::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY
See ::cuFuncSetAttribute for
- ::CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION
- ::CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE
Kernel parameters to \p f can be specified in the same ways that they can be using ::cuLaunchKernel.
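For the \p kernelParams route, ::cuLaunchKernel expects an array with one pointer per kernel parameter, each pointing at host memory from which that parameter's value is copied. A minimal sketch of building such an array in Rust follows; `param_ptr` and `read_back_demo` are hypothetical helpers, and the two parameters (`u32` and `f32`) stand in for whatever the kernel signature actually requires.

```rust
use std::os::raw::c_void;

/// Turn a mutable reference to a parameter value into the untyped pointer
/// that goes into the kernelParams array. The value must outlive the launch call.
pub fn param_ptr<T>(value: &mut T) -> *mut c_void {
    value as *mut T as *mut c_void
}

/// Builds a two-entry kernelParams array and reads the values back through it,
/// the way a consumer of the array would. No driver call is made here.
pub fn read_back_demo() -> (u32, f32) {
    let mut n: u32 = 1024;
    let mut alpha: f32 = 2.5;

    // One pointer per kernel parameter, in declaration order.
    let mut params: [*mut c_void; 2] = [param_ptr(&mut n), param_ptr(&mut alpha)];

    // This is the value that would be passed as `kernelParams`.
    let kernel_params: *mut *mut c_void = params.as_mut_ptr();

    // SAFETY: both entries point at live locals of the stated types.
    unsafe {
        let p0 = *kernel_params.add(0) as *const u32;
        let p1 = *kernel_params.add(1) as *const f32;
        (*p0, *p1)
    }
}
```

The array holds pointers to the values, not the values themselves, so both the array and the locals it points to must stay alive until the launch call returns.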
Note that the API can also be used to launch a context-less kernel ::CUkernel by querying the handle using ::cuLibraryGetKernel() and then passing it to the API by casting to ::CUfunction. Here, the context to launch the kernel on is taken either from the specified stream ::CUlaunchConfig::hStream or, if the stream is NULL, from the current context.
\param config - Config to launch
\param f - Function ::CUfunction or Kernel ::CUkernel to launch
\param kernelParams - Array of pointers to kernel parameters
\param extra - Extra options
\return ::CUDA_SUCCESS, ::CUDA_ERROR_DEINITIALIZED, ::CUDA_ERROR_NOT_INITIALIZED, ::CUDA_ERROR_INVALID_CONTEXT, ::CUDA_ERROR_INVALID_HANDLE, ::CUDA_ERROR_INVALID_IMAGE, ::CUDA_ERROR_INVALID_VALUE, ::CUDA_ERROR_LAUNCH_FAILED, ::CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, ::CUDA_ERROR_LAUNCH_TIMEOUT, ::CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, ::CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE, ::CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, ::CUDA_ERROR_NOT_FOUND \note_null_stream \notefnerr
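Callers usually convert the returned ::CUresult into a Rust `Result`. The sketch below does so for a handful of the codes listed above, taking the raw numeric value as a `u32`; whether the bindings expose ::CUresult as an integer or as an enum is an assumption to check. The numeric values are the ones I understand `cuda.h` to define, and `LaunchError` and `check_launch_result` are hypothetical names.

```rust
/// Hypothetical error type covering a subset of the documented return codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LaunchError {
    InvalidValue,
    NotInitialized,
    Deinitialized,
    InvalidContext,
    InvalidHandle,
    LaunchOutOfResources,
    CooperativeLaunchTooLarge,
    /// Any other code, including errors reported from earlier asynchronous launches.
    Other(u32),
}

/// Map a raw CUresult value to a Result (numeric values assumed from cuda.h).
pub fn check_launch_result(code: u32) -> Result<(), LaunchError> {
    match code {
        0 => Ok(()),                                        // CUDA_SUCCESS
        1 => Err(LaunchError::InvalidValue),                // CUDA_ERROR_INVALID_VALUE
        3 => Err(LaunchError::NotInitialized),              // CUDA_ERROR_NOT_INITIALIZED
        4 => Err(LaunchError::Deinitialized),               // CUDA_ERROR_DEINITIALIZED
        201 => Err(LaunchError::InvalidContext),            // CUDA_ERROR_INVALID_CONTEXT
        400 => Err(LaunchError::InvalidHandle),             // CUDA_ERROR_INVALID_HANDLE
        701 => Err(LaunchError::LaunchOutOfResources),      // CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
        720 => Err(LaunchError::CooperativeLaunchTooLarge), // CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE
        other => Err(LaunchError::Other(other)),
    }
}
```

Keeping a catch-all variant lets the wrapper carry codes it does not name, rather than dropping them.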
\sa ::cuCtxGetCacheConfig, ::cuCtxSetCacheConfig, ::cuFuncSetCacheConfig, ::cuFuncGetAttribute, ::cudaLaunchKernel, ::cudaLaunchKernelEx, ::cuLibraryGetKernel, ::cuKernelSetCacheConfig, ::cuKernelGetAttribute, ::cuKernelSetAttribute