Multi-GPU computing by CUDA

zhouchao2013

Article link: https://blog.youkuaiyun.com/u012313751/article/details/104684271

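This article describes how to use multiple GPUs with CUDA, including switching GPUs through the CUDA API and the limitations of peer-to-peer memory access together with workarounds. It also discusses combining OpenMP with CUDA, using the OpenMP API to drive several GPUs in parallel, and lists points to watch when compiling. It closes with the use of timing functions for performance evaluation.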

Using multiple GPUs with CUDA

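1. The CUDA API provides cudaSetDevice() to switch GPUs, for example cudaSetDevice(1). With CUDA alone, a single CPU thread controls one GPU at a time, and the default is GPU 0. The communication topology is shown in the figure below (left):
The blue arrows indicate access relationships.
GPU0 can read values from GPU1 and transfer values to GPU1; GPU1 can read values from the host but cannot transfer values to the host; GPU1 can only transfer values to GPU0 and cannot read values from GPU0.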
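The device switching described in point 1 can be sketched as follows. This is a minimal illustration, not the author's code: the kernel `fill` and the helper `fill_on_all_devices` are hypothetical names introduced here, and the sketch assumes each GPU has enough free memory for the buffer. A single host thread selects each GPU in turn with cudaSetDevice(), so allocations, kernel launches and copies issued afterwards go to that device.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of `data`.
__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Runs `fill` on every visible GPU from one host thread (n must be > 0).
// Returns the number of GPUs whose result was verified on the host,
// 0 if no usable GPU is found, or -1 if a CUDA call fails part-way.
int fill_on_all_devices(int n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    int verified = 0;
    for (int dev = 0; dev < count; ++dev) {
        // All subsequent CUDA calls from this thread target device `dev`.
        if (cudaSetDevice(dev) != cudaSuccess) return -1;

        float *d = nullptr;
        if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;

        fill<<<(n + 255) / 256, 256>>>(d, n, float(dev));

        // cudaMemcpy waits for the kernel on this device to finish.
        std::vector<float> h(n);
        cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                     cudaMemcpyDeviceToHost);
        cudaFree(d);
        if (err != cudaSuccess) return -1;

        if (h[0] == float(dev) && h[n - 1] == float(dev)) ++verified;
    }
    return verified;
}

int main() {
    int ok = fill_on_all_devices(1 << 20);
    printf("verified on %d GPU(s)\n", ok);
    return ok < 0;
}
```

Because the loop copies each result back before moving on, the GPUs in this sketch work one after another rather than concurrently; it only shows how the current device is selected.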

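2. Consider peer-to-peer memory access. On this system cudaDeviceCanAccessPeer() did not report 1 (access possible), so this strategy could not be used. The official documentation gives the following explanation: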

On Linux only, CUDA and the display driver does not support IOMMU-enabled bare-metal PCIe peer to peer memory copy. However, CUDA and the display driver does support IOMMU via VM pass through. As a consequence, users on Linux, when running on a native bare metal system, should disable the IOMMU. The IOMMU should be enabled and the VFIO driver be used as a PCIe pass through for virtual machines.
On Windows the above limitation does not exist.
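A check of this kind can be sketched as below. This is an illustrative sketch under stated assumptions, not the author's code: the helper `peer_access` is a hypothetical name, devices 0 and 1 are assumed to be the pair of interest, and error handling is kept minimal. When direct access is not available, cudaMemcpyPeer() can still copy between the two devices, with the runtime staging the data through host memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if `dev` can directly access memory on `peer`,
// 0 if it cannot, and -1 if the query itself fails.
int peer_access(int dev, int peer) {
    int can = 0;
    if (cudaDeviceCanAccessPeer(&can, dev, peer) != cudaSuccess) return -1;
    return can;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        printf("fewer than two usable GPUs; nothing to check\n");
        return 0;
    }

    const size_t bytes = 1 << 20;
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0);
    if (cudaMalloc(&d0, bytes) != cudaSuccess) return 1;
    cudaSetDevice(1);
    if (cudaMalloc(&d1, bytes) != cudaSuccess) return 1;

    int p2p = peer_access(0, 1);
    printf("GPU0 -> GPU1 direct peer access: %d\n", p2p);
    if (p2p == 1) {
        // Enable direct access from the current device (0) to device 1.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
    }

    // Copy from d0 on device 0 to d1 on device 1. This works with or
    // without peer access; without it the copy is staged through the host.
    cudaError_t err = cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    printf("cudaMemcpyPeer: %s\n", cudaGetErrorString(err));

    cudaSetDevice(0); cudaFree(d0);
    cudaSetDevice(1); cudaFree(d1);
    return err != cudaSuccess;
}
```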