#[repr(C)]
pub struct cudaFuncAttributes {
pub sharedSizeBytes: usize,
pub constSizeBytes: usize,
pub localSizeBytes: usize,
pub maxThreadsPerBlock: c_int,
pub numRegs: c_int,
pub ptxVersion: c_int,
pub binaryVersion: c_int,
pub cacheModeCA: c_int,
pub maxDynamicSharedSizeBytes: c_int,
pub preferredShmemCarveout: c_int,
pub clusterDimMustBeSet: c_int,
pub requiredClusterWidth: c_int,
pub requiredClusterHeight: c_int,
pub requiredClusterDepth: c_int,
pub clusterSchedulingPolicyPreference: c_int,
pub nonPortableClusterSizeAllowed: c_int,
pub reserved: [c_int; 16],
}
CUDA function attributes
Fields
sharedSizeBytes: usize
The size in bytes of statically-allocated shared memory per block required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
constSizeBytes: usize
The size in bytes of user-allocated constant memory required by this function.
localSizeBytes: usize
The size in bytes of local memory used by each thread of this function.
maxThreadsPerBlock: c_int
The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
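Because a launch fails once the block exceeds this limit, the block size can be checked on the host beforehand. A minimal sketch, assuming the block dimensions are passed as three `u32` values; `block_fits` is a hypothetical helper, not part of these bindings:

```rust
use std::os::raw::c_int;

/// Hypothetical helper: returns true if a block of `x * y * z` threads
/// stays within the `maxThreadsPerBlock` value reported for a function.
fn block_fits(max_threads_per_block: c_int, x: u32, y: u32, z: u32) -> bool {
    // Multiply in u64 so large dimensions cannot overflow.
    let threads = x as u64 * y as u64 * z as u64;
    // A negative limit is not meaningful; treat it as "nothing fits".
    max_threads_per_block >= 0 && threads <= max_threads_per_block as u64
}
```

For example, `block_fits(1024, 32, 32, 1)` is true, while `block_fits(1024, 1024, 2, 1)` is false.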
numRegs: c_int
The number of registers used by each thread of this function.
ptxVersion: c_int
The PTX virtual architecture version for which the function was compiled. This value is the major PTX version * 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13.
binaryVersion: c_int
The binary architecture version for which the function was compiled. This value is the major binary version * 10 + the minor binary version, so a binary version 1.3 function would return the value 13.
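Both `ptxVersion` and `binaryVersion` use the `major * 10 + minor` encoding, so integer division and remainder recover the two parts. A small sketch, assuming the minor version is a single digit (which the encoding itself implies); `decode_version` and `version_string` are hypothetical helpers:

```rust
use std::os::raw::c_int;

/// Hypothetical helper: splits a version encoded as `major * 10 + minor`
/// (as in `ptxVersion` and `binaryVersion`) into `(major, minor)`.
fn decode_version(encoded: c_int) -> (c_int, c_int) {
    (encoded / 10, encoded % 10)
}

/// Hypothetical helper: formats an encoded version as "major.minor".
fn version_string(encoded: c_int) -> String {
    let (major, minor) = decode_version(encoded);
    format!("{major}.{minor}")
}
```

With this, the value 13 from the description decodes to `(1, 3)` and formats as `"1.3"`.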
cacheModeCA: c_int
Indicates whether the function was compiled with the user-specified option “-Xptxas --dlcm=ca”.
maxDynamicSharedSizeBytes: c_int
The maximum size in bytes of dynamic shared memory per block for this function. Any launch must have a dynamic shared memory size smaller than this value.
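Since `sharedSizeBytes` covers only static shared memory, the per-block total is the static size plus whatever dynamic size the launch requests, and the dynamic part is bounded by `maxDynamicSharedSizeBytes`. A sketch with hypothetical helper names; the comparison follows the strict "smaller than" wording of the description literally, so check the CUDA Runtime API reference for whether the boundary value itself is accepted:

```rust
use std::os::raw::c_int;

/// Hypothetical helper: checks a requested dynamic shared-memory size
/// (bytes) against `maxDynamicSharedSizeBytes`. Uses a strict `<` to
/// mirror the field description's "smaller than" wording.
fn dynamic_smem_within_limit(max_dynamic_shared_size_bytes: c_int, requested: usize) -> bool {
    match usize::try_from(max_dynamic_shared_size_bytes) {
        Ok(limit) => requested < limit,
        // A negative limit is not meaningful; reject the request.
        Err(_) => false,
    }
}

/// Hypothetical helper: total shared memory per block is the static part
/// (`sharedSizeBytes`) plus the dynamic part requested at launch.
fn total_shared_bytes(shared_size_bytes: usize, dynamic_bytes: usize) -> usize {
    shared_size_bytes + dynamic_bytes
}
```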
preferredShmemCarveout: c_int
On devices where the L1 cache and shared memory use the same hardware resources, this sets the shared memory carveout preference, in percent of the maximum shared memory. Refer to cudaDevAttrMaxSharedMemoryPerMultiprocessor. This is only a hint, and the driver can choose a different ratio if required to execute the function. See cudaFuncSetAttribute.
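A percentage carveout translates into a preferred byte count once the device's maximum shared memory per multiprocessor is known. A sketch, assuming that maximum has already been obtained (for example via the cudaDevAttrMaxSharedMemoryPerMultiprocessor attribute) and that values outside 0..=100 are simply rejected here; `preferred_carveout_bytes` is a hypothetical helper, and because the driver treats the carveout only as a hint, the result is a preference rather than a guarantee:

```rust
use std::os::raw::c_int;

/// Hypothetical helper: converts a carveout preference (percent of the
/// maximum shared memory) into bytes, rounding down. Returns None for
/// values outside 0..=100 in this sketch.
fn preferred_carveout_bytes(carveout_percent: c_int, max_smem_per_sm: u64) -> Option<u64> {
    if !(0..=100).contains(&carveout_percent) {
        return None;
    }
    Some(max_smem_per_sm * carveout_percent as u64 / 100)
}
```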
clusterDimMustBeSet: c_int
If this attribute is set, the kernel must launch with a valid cluster dimension specified.
requiredClusterWidth: c_int
The required cluster width/height/depth in blocks. The values must either all be 0 or all be positive. The validity of the cluster dimensions is otherwise checked at launch time.
If the value is set at compile time, it cannot be set at runtime. Setting it at runtime should return cudaErrorNotPermitted. See cudaFuncSetAttribute.
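The "all 0 or all positive" rule for the required cluster dimensions can be checked directly. A minimal sketch; `required_cluster_dims_consistent` is a hypothetical helper, and the remaining validity checks that happen at launch time are not modeled:

```rust
use std::os::raw::c_int;

/// Hypothetical helper: checks the rule for requiredClusterWidth/Height/Depth,
/// namely that the three values are either all 0 or all positive.
fn required_cluster_dims_consistent(width: c_int, height: c_int, depth: c_int) -> bool {
    let dims = [width, height, depth];
    dims.iter().all(|&d| d == 0) || dims.iter().all(|&d| d > 0)
}
```

Mixed values such as `(2, 0, 1)` fail the check, as do negative values.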
requiredClusterHeight: c_int
requiredClusterDepth: c_int
clusterSchedulingPolicyPreference: c_int
The block scheduling policy of a function. See cudaFuncSetAttribute.
nonPortableClusterSizeAllowed: c_int
Whether the function can be launched with a non-portable cluster size: 1 means allowed, 0 means disallowed. A non-portable cluster size may only work on the specific SKUs the program was tested on, and the launch might fail if the program is run on a different hardware platform.
The CUDA API provides cudaOccupancyMaxActiveClusters to help check whether the desired cluster size can be launched on the current device.
Portable Cluster Size
A portable cluster size is guaranteed to be functional on all compute capabilities higher than the target compute capability. The portable cluster size for sm_90 is 8 blocks per cluster. This value may increase for future compute capabilities.
A specific hardware unit may support larger cluster sizes that are not guaranteed to be portable. See cudaFuncSetAttribute.
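Combining the portable limit with the `nonPortableClusterSizeAllowed` flag gives a simple host-side pre-check. A sketch, assuming positive cluster dimensions and the sm_90 portable size of 8 blocks stated above; both function names are hypothetical, and passing this check does not mean the device supports the size (cudaOccupancyMaxActiveClusters is the tool mentioned for that):

```rust
use std::os::raw::c_int;

/// Portable cluster size for sm_90 in blocks per cluster, as stated above.
/// Future compute capabilities may raise this value.
const PORTABLE_CLUSTER_SIZE_SM90: u64 = 8;

/// Hypothetical helper: true if a `w * h * d` cluster is within the
/// portable size for sm_90.
fn is_portable_on_sm90(w: u32, h: u32, d: u32) -> bool {
    (w as u64) * (h as u64) * (d as u64) <= PORTABLE_CLUSTER_SIZE_SM90
}

/// Hypothetical helper: sizes above the portable limit additionally need
/// nonPortableClusterSizeAllowed == 1. Device support is not checked here.
fn cluster_size_allowed_by_flag(non_portable_allowed: c_int, w: u32, h: u32, d: u32) -> bool {
    is_portable_on_sm90(w, h, d) || non_portable_allowed == 1
}
```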
reserved: [c_int; 16]
Reserved for future use.
Trait Implementations
impl Clone for cudaFuncAttributes
fn clone(&self) -> cudaFuncAttributes
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.