pub unsafe extern "C" fn cuMemPrefetchAsync(
    devPtr: CUdeviceptr,
    count: usize,
    dstDevice: CUdevice,
    hStream: CUstream,
) -> CUresult
\brief Prefetches memory to the specified destination device
Note that there is a later version of this API, ::cuMemPrefetchAsync_v2. It will supplant this version in 13.0; this version is retained for minor version compatibility.
Prefetches memory to the specified destination device. \p devPtr is the base device pointer of the memory to be prefetched and \p dstDevice is the destination device. \p count specifies the number of bytes to copy. \p hStream is the stream in which the operation is enqueued. The memory range must refer to managed memory allocated via ::cuMemAllocManaged or declared via managed variables. It may also refer to system-allocated memory on systems with non-zero CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
Passing in CU_DEVICE_CPU for \p dstDevice will prefetch the data to host memory. If \p dstDevice is a GPU, then the device attribute ::CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS must be non-zero. Additionally, \p hStream must be associated with a device that has a non-zero value for the device attribute ::CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.
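The destination and stream requirements above can be sketched as a small pre-flight check. This is an illustrative sketch, not part of the bindings: `check_prefetch_target` and `PrefetchCheck` are hypothetical names, the attribute values are assumed to have been queried by the caller (for example via ::cuDeviceGetAttribute), and applying the stream requirement regardless of destination is one reading of the "Additionally" sentence. The constant mirrors the CUDA header definition of CU_DEVICE_CPU as `(CUdevice)-1`.

```rust
/// Mirrors CU_DEVICE_CPU from the CUDA driver headers: ((CUdevice)-1).
const CU_DEVICE_CPU: i32 = -1;

/// Hypothetical result type for the pre-flight check below.
#[derive(Debug, PartialEq)]
enum PrefetchCheck {
    Ok,
    DstLacksConcurrentManagedAccess,
    StreamDeviceLacksConcurrentManagedAccess,
}

/// Hypothetical helper. `dst_cma` and `stream_dev_cma` are the values of
/// CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS for the destination device
/// and for the device associated with the stream, queried by the caller.
fn check_prefetch_target(dst_device: i32, dst_cma: i32, stream_dev_cma: i32) -> PrefetchCheck {
    // A GPU destination must report a non-zero attribute value.
    // `dst_cma` is ignored when the destination is the CPU.
    if dst_device != CU_DEVICE_CPU && dst_cma == 0 {
        return PrefetchCheck::DstLacksConcurrentManagedAccess;
    }
    // Assumption: the stream's device must report a non-zero value
    // whether the destination is a GPU or the host.
    if stream_dev_cma == 0 {
        return PrefetchCheck::StreamDeviceLacksConcurrentManagedAccess;
    }
    PrefetchCheck::Ok
}
```

For instance, `check_prefetch_target(CU_DEVICE_CPU, 0, 1)` passes because the destination attribute is only consulted for GPU destinations.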
The start address and end address of the memory range will be rounded down and rounded up respectively to be aligned to CPU page size before the prefetch operation is enqueued in the stream.
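The rounding rule can be illustrated with plain integer arithmetic. The sketch below only models the description above; it does not call the driver. `page_aligned_range` is a hypothetical helper, the address is represented as a `u64` (an assumption about how `CUdeviceptr` is defined on 64-bit targets), and the page size is supplied by the caller and assumed to be a power of two.

```rust
/// Hypothetical helper: returns the page-aligned `[start, end)` range that
/// covers `count` bytes beginning at `dev_ptr`. The start is rounded down
/// and the end is rounded up to a multiple of `page_size`.
/// Returns `None` if `page_size` is not a non-zero power of two or if the
/// arithmetic would overflow.
fn page_aligned_range(dev_ptr: u64, count: usize, page_size: u64) -> Option<(u64, u64)> {
    if !page_size.is_power_of_two() {
        return None; // also rejects 0
    }
    let mask = page_size - 1;
    // Round the start address down to a page boundary.
    let start = dev_ptr & !mask;
    // Round the end address (one past the last byte) up to a page boundary.
    let end = dev_ptr.checked_add(count as u64)?.checked_add(mask)? & !mask;
    Some((start, end))
}
```

With a 4096-byte page, a request for `0x2000` bytes at `0x1010` covers `0x1000..0x4000`, so three whole pages are affected even though only two pages' worth of bytes were requested.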
If no physical memory has been allocated for this region, then this memory region will be populated and mapped on the destination device. If there’s insufficient memory to prefetch the desired region, the Unified Memory driver may evict pages from other ::cuMemAllocManaged allocations to host memory in order to make room. Device memory allocated using ::cuMemAlloc or ::cuArrayCreate will not be evicted.
By default, any mappings to the previous location of the migrated pages are removed and mappings for the new location are only set up on \p dstDevice. The exact behavior, however, also depends on the settings applied to this memory range via ::cuMemAdvise as described below:
If ::CU_MEM_ADVISE_SET_READ_MOSTLY was set on any subset of this memory range, then that subset will create a read-only copy of the pages on \p dstDevice.
If ::CU_MEM_ADVISE_SET_PREFERRED_LOCATION was set on any subset of this memory range, then the pages will be migrated to \p dstDevice even if \p dstDevice is not the preferred location of any pages in the memory range.
If ::CU_MEM_ADVISE_SET_ACCESSED_BY was set on any subset of this memory range, then mappings to those pages from all the appropriate processors are updated to refer to the new location if establishing such a mapping is possible. Otherwise, those mappings are cleared.
Note that this API is not required for functionality and only serves to improve performance by allowing the application to migrate data to a suitable location before it is accessed. Memory accesses to this range are always coherent and are allowed even when the data is actively being migrated.
Note that this function is asynchronous with respect to the host and all work on other devices.
\param devPtr - Pointer to be prefetched
\param count - Size in bytes
\param dstDevice - Destination device to prefetch to
\param hStream - Stream to enqueue prefetch operation
\return ::CUDA_SUCCESS, ::CUDA_ERROR_INVALID_VALUE, ::CUDA_ERROR_INVALID_DEVICE \notefnerr \note_async \note_null_stream
\sa ::cuMemcpy, ::cuMemcpyPeer, ::cuMemcpyAsync, ::cuMemcpy3DPeerAsync, ::cuMemAdvise, ::cuMemPrefetchAsync, ::cudaMemPrefetchAsync_v2