Function cudaMemPrefetchAsync_v2

pub unsafe extern "C" fn cudaMemPrefetchAsync_v2(
    devPtr: *const c_void,
    count: usize,
    location: cudaMemLocation,
    flags: c_uint,
    stream: cudaStream_t,
) -> cudaError_t

\brief Prefetches memory to the specified destination location

Prefetches memory to the specified destination location. \p devPtr is the base device pointer of the memory to be prefetched and \p location specifies the destination location. \p count specifies the number of bytes to copy. \p stream is the stream in which the operation is enqueued. The memory range must refer to managed memory allocated via ::cudaMallocManaged or declared via __managed__ variables, or it may also refer to system-allocated memory on systems with a non-zero value for the device attribute ::cudaDevAttrPageableMemoryAccess.

Specifying ::cudaMemLocationTypeDevice for ::cudaMemLocation::type will prefetch memory to the GPU specified by device ordinal ::cudaMemLocation::id, which must have a non-zero value for the device attribute ::cudaDevAttrConcurrentManagedAccess. Additionally, \p stream must be associated with a device that has a non-zero value for the device attribute ::cudaDevAttrConcurrentManagedAccess. Specifying ::cudaMemLocationTypeHost as ::cudaMemLocation::type will prefetch data to host memory. Applications can request prefetching memory to a specific host NUMA node by specifying ::cudaMemLocationTypeHostNuma for ::cudaMemLocation::type and a valid host NUMA node id in ::cudaMemLocation::id. Users can also request prefetching memory to the host NUMA node closest to the current thread’s CPU by specifying ::cudaMemLocationTypeHostNumaCurrent for ::cudaMemLocation::type. Note that when ::cudaMemLocation::type is either ::cudaMemLocationTypeHost or ::cudaMemLocationTypeHostNumaCurrent, ::cudaMemLocation::id will be ignored.
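The destination variants above can be sketched in CUDA C++ host code (a minimal sketch: `managed` is assumed to be a ::cudaMallocManaged allocation, and error checking is elided for brevity):

```cuda
#include <cuda_runtime.h>

void prefetch_example(float* managed, size_t bytes, int device, cudaStream_t stream) {
    // Prefetch to the GPU identified by `device` (which must report a
    // non-zero cudaDevAttrConcurrentManagedAccess).
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id   = device;
    cudaMemPrefetchAsync_v2(managed, bytes, loc, 0 /* flags must be zero */, stream);

    // Later: prefetch back to host NUMA node 0. For cudaMemLocationTypeHost
    // or cudaMemLocationTypeHostNumaCurrent, `id` would be ignored.
    loc.type = cudaMemLocationTypeHostNuma;
    loc.id   = 0;
    cudaMemPrefetchAsync_v2(managed, bytes, loc, 0, stream);
}
```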

The start address and end address of the memory range will be rounded down and rounded up respectively to be aligned to CPU page size before the prefetch operation is enqueued in the stream.

If no physical memory has been allocated for this region, then this memory region will be populated and mapped on the destination device. If there’s insufficient memory to prefetch the desired region, the Unified Memory driver may evict pages from other ::cudaMallocManaged allocations to host memory in order to make room. Device memory allocated using ::cudaMalloc or ::cudaMallocArray will not be evicted.

By default, any mappings to the previous location of the migrated pages are removed and mappings for the new location are only setup on the destination location. The exact behavior however also depends on the settings applied to this memory range via ::cuMemAdvise as described below:

If ::cudaMemAdviseSetReadMostly was set on any subset of this memory range, then that subset will create a read-only copy of the pages on destination location. If however the destination location is a host NUMA node, then any pages of that subset that are already in another host NUMA node will be transferred to the destination.

If ::cudaMemAdviseSetPreferredLocation was called on any subset of this memory range, then the pages will be migrated to \p location even if \p location is not the preferred location of any pages in the memory range.

If ::cudaMemAdviseSetAccessedBy was called on any subset of this memory range, then mappings to those pages from all the appropriate processors are updated to refer to the new location if establishing such a mapping is possible. Otherwise, those mappings are cleared.
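The ::cudaMemAdviseSetReadMostly interaction above can be sketched as follows (a hedged sketch: after the advice, prefetching to a GPU duplicates the pages there as read-only copies rather than migrating them away from the host; error checking elided):

```cuda
#include <cuda_runtime.h>

void replicate_read_only(const float* managed, size_t bytes,
                         int device, cudaStream_t stream) {
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id   = device;

    // Mark the range as read-mostly, then prefetch: the GPU receives a
    // read-only copy instead of an exclusive migration.
    cudaMemAdvise_v2(managed, bytes, cudaMemAdviseSetReadMostly, loc);
    cudaMemPrefetchAsync_v2(managed, bytes, loc, 0, stream);
}
```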

Note that this API is not required for functionality and only serves to improve performance by allowing the application to migrate data to a suitable location before it is accessed. Memory accesses to this range are always coherent and are allowed even when the data is actively being migrated.

Note that this function is asynchronous with respect to the host and all work on other devices.
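A typical use of this asynchrony is to enqueue the prefetch ahead of a kernel in the same stream, so stream ordering guarantees the data is resident before the kernel runs while the host stays free (a sketch; `scale` is a hypothetical kernel and error checking is elided):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* data, size_t n, float s) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

void run(float* managed, size_t n, int device, cudaStream_t stream) {
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id   = device;

    // Enqueue the prefetch; it is asynchronous to the host, so the CPU can
    // do other work while pages migrate.
    cudaMemPrefetchAsync_v2(managed, n * sizeof(float), loc, 0, stream);

    // The kernel is in the same stream, so it executes after the prefetch
    // and finds its working set already resident on the GPU.
    scale<<<(unsigned)((n + 255) / 256), 256, 0, stream>>>(managed, n, 2.0f);
}
```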

\param devPtr - Pointer to be prefetched
\param count - Size in bytes
\param location - Location to prefetch to
\param flags - Flags for future use; must be zero now
\param stream - Stream to enqueue prefetch operation

\return ::cudaSuccess, ::cudaErrorInvalidValue, ::cudaErrorInvalidDevice
\notefnerr \note_async \note_null_stream \note_init_rt \note_callback

\sa ::cudaMemcpy, ::cudaMemcpyPeer, ::cudaMemcpyAsync, ::cudaMemcpy3DPeerAsync, ::cudaMemAdvise, ::cudaMemAdvise_v2, ::cuMemPrefetchAsync