CUDA Tensor Operations and Management

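This package provides support for CUDA tensors, which implement the same functionality as CPU tensors but run their computations on the GPU. It covers functions for the current CUDA device, streams, memory management, and random number generators, such as current_device(), memory_allocated(), and set_rng_state(), and is useful for deep learning and machine learning workloads.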


Reference: torch.cuda - Cloud+ Community - Tencent Cloud

torch.cuda.current_blas_handle()

torch.cuda.current_device()

torch.cuda.current_stream(device=None)

torch.cuda.default_stream(device=None)

class torch.cuda.device(device)

torch.cuda.device_count()

class torch.cuda.device_of(obj)

torch.cuda.empty_cache()

torch.cuda.get_device_capability(device=None)

torch.cuda.get_device_name(device=None)

torch.cuda.init()

torch.cuda.ipc_collect()

torch.cuda.is_available()

torch.cuda.max_memory_allocated(device=None)

torch.cuda.max_memory_cached(device=None)

torch.cuda.memory_allocated(device=None)

torch.cuda.memory_cached(device=None)

torch.cuda.reset_max_memory_allocated(device=None)

torch.cuda.reset_max_memory_cached(device=None)

torch.cuda.set_device(device)

torch.cuda.stream(stream)

torch.cuda.synchronize(device=None)

Random Number Generator

torch.cuda.get_rng_state(device='cuda')

torch.cuda.get_rng_state_all()

torch.cuda.set_rng_state(new_state, device='cuda')

torch.cuda.set_rng_state_all(new_states)

torch.cuda.manual_seed(seed)

torch.cuda.seed()

torch.cuda.seed_all()

torch.cuda.initial_seed()

Communication collectives

torch.cuda.comm.broadcast(tensor, devices)

torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)

torch.cuda.comm.reduce_add(inputs, destination=None)

torch.cuda.comm.scatter(tensor, devices, chunk_sizes=None, dim=0, streams=None)

torch.cuda.comm.gather(tensors, dim=0, destination=None)

Streams and events

class torch.cuda.Stream

query()

record_event(event=None)

synchronize()

wait_event(event)

wait_stream(stream)

class torch.cuda.Event

elapsed_time(end_event)

Memory management

torch.cuda.empty_cache()

torch.cuda.memory_allocated(device=None)

torch.cuda.max_memory_allocated(device=None)

torch.cuda.reset_max_memory_allocated(device=None)

torch.cuda.memory_cached(device=None)

torch.cuda.max_memory_cached(device=None)

torch.cuda.reset_max_memory_cached(device=None)

NVIDIA Tools Extension (NVTX)

torch.cuda.nvtx.mark(msg)

torch.cuda.nvtx.range_push(msg)

torch.cuda.nvtx.range_pop()

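This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation. It is lazily initialized, so you can always import it and call is_available() to determine whether your system supports CUDA. CUDA semantics has more details about working with CUDA.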
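As a minimal sketch of the lazy-initialization pattern, the snippet below imports torch unconditionally and only then asks is_available(). The helper name pick_device is hypothetical (it is not part of PyTorch), and falling back to the CPU is an assumption about what a caller would want.

```python
import torch


def pick_device() -> torch.device:
    """Return a CUDA device when one is usable, otherwise fall back to the CPU.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 3, device=device)  # six ones on the chosen device
total = (x + x).sum().item()         # same arithmetic on CPU or GPU
print(device.type, total)
```

Because the import itself never touches the GPU, the same script should start on machines with and without CUDA.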

torch.cuda.current_blas_handle()

Returns a cublasHandle_t pointer to the current cuBLAS handle.

torch.cuda.current_device()

Returns the index of the currently selected device.

torch.cuda.current_stream(device=None)

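Returns the currently selected stream for a given device.

Parameters:

device (torch.device or int, optional) – the selected device. If device is None (default), returns the currently selected stream for the current device, given by current_device().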

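torch.cuda.default_stream(device=None)

Returns the default stream for a given device.

Parameters:

device (torch.device or int, optional) – the selected device. If device is None (default), returns the default stream for the current device, given by current_device().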
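The relationship between the current stream and the default stream can be sketched as follows. The function name stream_report is hypothetical, and the sketch assumes that nothing else in the process has already switched streams; without CUDA it simply returns None.

```python
import torch


def stream_report():
    """Return (current_is_default, side_is_current), or None without CUDA.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if not torch.cuda.is_available():
        return None
    default = torch.cuda.default_stream()
    # Outside any stream context the current stream is normally the default one.
    current_is_default = bool(torch.cuda.current_stream() == default)
    side = torch.cuda.Stream()  # a new, non-default stream
    with torch.cuda.stream(side):
        # Inside the context manager, `side` is the current stream.
        side_is_current = bool(torch.cuda.current_stream() == side)
    return current_is_default, side_is_current


report = stream_report()
print(report)
```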

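class torch.cuda.device(device)

Context manager that changes the selected device.

Parameters:

device (torch.device or int) – the index of the device to select. If this argument is a negative integer or None, the context manager is a no-op.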
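A small sketch of how the context manager scopes the device selection is shown below. The helper device_inside_context is a made-up name, and index 0 is used only because it exists on any machine with at least one GPU; on a single-GPU machine all three values will be the same.

```python
import torch


def device_inside_context(index: int = 0):
    """Return (before, inside, after) device indices, or None without CUDA.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if not torch.cuda.is_available():
        return None
    before = torch.cuda.current_device()
    with torch.cuda.device(index):
        # Within the block, `index` is the selected device.
        inside = torch.cuda.current_device()
    # On exit the previously selected device is restored.
    after = torch.cuda.current_device()
    return before, inside, after


result = device_inside_context(0)
print(result)
```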

torch.cuda.device_count()

Returns the number of GPUs available.

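class torch.cuda.device_of(obj)

Context manager that changes the current device to that of the given object. Both tensors and storages can be passed as arguments. If the given object is not allocated on a GPU, this is a no-op.

Parameters:

obj (Tensor or Storage) – an object allocated on the selected device.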
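One way device_of might be used is to allocate a new tensor next to an existing one. In the sketch below, zeros_next_to is a hypothetical helper; the CPU branch is there so the example also runs on machines without a GPU.

```python
import torch


def zeros_next_to(reference: torch.Tensor, size: int = 3) -> torch.Tensor:
    """Allocate a zero tensor on whichever device `reference` lives on.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    with torch.cuda.device_of(reference):
        # For a CUDA reference the bare "cuda" string now resolves to its device;
        # for a CPU reference the context manager does nothing.
        target = "cuda" if reference.is_cuda else "cpu"
        return torch.zeros(size, device=target)


reference = torch.ones(2, device="cuda" if torch.cuda.is_available() else "cpu")
result = zeros_next_to(reference)
print(result.device == reference.device)
```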

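torch.cuda.empty_cache()

Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.

Note

empty_cache() does not increase the amount of GPU memory available to PyTorch. See Memory management for more details about GPU memory management.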
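The difference between memory occupied by tensors and memory held in the cache can be sketched as below. This assumes a PyTorch version that provides memory_reserved() (the newer name for what this page calls memory_cached()); cache_demo and the dictionary keys are hypothetical, and the exact byte counts depend on the allocator and the device.

```python
import torch


def cache_demo():
    """Return allocator readings around empty_cache(), or None without CUDA.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if not torch.cuda.is_available():
        return None
    x = torch.empty(1024, 1024, device="cuda")  # roughly 4 MiB of float32
    allocated_live = torch.cuda.memory_allocated()
    del x  # the block goes back to PyTorch's cache, not to the driver
    allocated_freed = torch.cuda.memory_allocated()
    reserved_before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    reserved_after = torch.cuda.memory_reserved()
    return {
        "allocated_live": allocated_live,
        "allocated_freed": allocated_freed,
        "reserved_before": reserved_before,
        "reserved_after": reserved_after,
    }


stats = cache_demo()
print(stats)
```

Deleting the tensor lowers the allocated figure, while only empty_cache() is expected to lower the reserved figure, which is the number other GPU applications care about.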

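torch.cuda.get_device_capability(device=None)

Gets the CUDA capability of a device.

Parameters:

device (torch.device or int, optional) – the device whose capability is returned. This function is a no-op if the argument is a negative integer. If device is None (default), the current device, given by current_device(), is used.

Returns:

The major and minor CUDA capability of the device.

Return type:

tuple(int, int)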

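torch.cuda.get_device_name(device=None)

Gets the name of a device.

Parameters:

device (torch.device or int, optional) – the device whose name is returned. This function is a no-op if the argument is a negative integer. If device is None (default), the current device, given by current_device(), is used.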
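The query functions above combine naturally into a short device inventory. The sketch below uses a hypothetical helper, describe_devices, and returns an empty list when no GPU is present.

```python
import torch


def describe_devices():
    """Return (index, name, "major.minor") for every visible CUDA device.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    devices = []
    for index in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(index)
        name = torch.cuda.get_device_name(index)
        devices.append((index, name, f"{major}.{minor}"))
    return devices


inventory = describe_devices()
for entry in inventory:
    print(entry)
```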

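torch.cuda.init()

Initializes PyTorch's CUDA state. You may need to call this explicitly if you are interacting with PyTorch through its C API, because the Python bindings for CUDA functionality are not available until this initialization has taken place. Ordinary users should not need this, since all of PyTorch's CUDA methods initialize the CUDA state automatically on demand. Does nothing if the CUDA state is already initialized.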
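For completeness, an explicit initialization might look like the sketch below. It assumes a PyTorch version that exposes torch.cuda.is_initialized(); ensure_cuda_initialized is a hypothetical name, and the is_available() guard is there because init() is only meaningful on a CUDA-capable machine.

```python
import torch


def ensure_cuda_initialized() -> bool:
    """Initialize the CUDA state explicitly; return False when CUDA is unusable.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if not torch.cuda.is_available():
        return False
    torch.cuda.init()
    torch.cuda.init()  # already initialized, so this second call does nothing
    return torch.cuda.is_initialized()


ready = ensure_cuda_initialized()
print(ready)
```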