Using Multiple GPUs with CUDA
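1. The CUDA API provides the cudaSetDevice() function for switching GPUs; for example, cudaSetDevice(1) selects GPU 1. In plain CUDA programming, a single CPU thread controls one GPU at a time, and the default is GPU 0. The communication topology is shown in the figure below (left):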
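GPU 0 can read values from GPU 1 and transfer values to GPU 1. GPU 1 can read values from the host but cannot transfer values to the host. GPU 1 can only transfer values to GPU 0 and cannot read values from GPU 0.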
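To make the device-switching step concrete, here is a minimal sketch of one host thread driving each visible GPU in turn with cudaSetDevice(). The kernel fillWithDeviceId and the helper visibleDeviceCount are hypothetical names introduced only for illustration, and the buffer size is an arbitrary assumption; the sketch does not reproduce the topology described above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: fills a buffer with the index of the GPU that ran it.
__global__ void fillWithDeviceId(int *out, int n, int deviceId) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = deviceId;
}

// Hypothetical helper: number of visible CUDA devices, or 0 if the runtime reports an error.
int visibleDeviceCount() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    return count;
}

int main() {
    const int n = 256;  // arbitrary buffer size (assumption)
    int count = visibleDeviceCount();
    printf("visible devices: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        // After this call, runtime calls from this host thread target `dev`.
        if (cudaSetDevice(dev) != cudaSuccess) continue;

        int *dBuf = nullptr;
        if (cudaMalloc(&dBuf, n * sizeof(int)) != cudaSuccess) continue;

        fillWithDeviceId<<<(n + 127) / 128, 128>>>(dBuf, n, dev);

        // Copy the result back to the host; this also waits for the kernel to finish.
        std::vector<int> host(n);
        cudaError_t err = cudaMemcpy(host.data(), dBuf, n * sizeof(int),
                                     cudaMemcpyDeviceToHost);
        if (err == cudaSuccess) {
            printf("GPU %d wrote %d\n", dev, host[0]);
        } else {
            printf("GPU %d failed: %s\n", dev, cudaGetErrorString(err));
        }
        cudaFree(dBuf);
    }
    return 0;
}
```

Without any cudaSetDevice() call, the same code would run entirely on device 0, which matches the default mentioned above.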
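2. Consider peer-to-peer memory access. cudaDeviceCanAccessPeer() did not report 1 (its canAccessPeer output was 0), so this approach cannot be used. The official documentation gives the following explanation: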
On Linux only, CUDA and the display driver does not support IOMMU-enabled bare-metal PCIe peer to peer memory copy. However, CUDA and the display driver does support IOMMU via VM pass through. As a consequence, users on Linux, when running on a native bare metal system, should disable the IOMMU. The IOMMU should be enabled and the VFIO driver be used as a PCIe pass through for virtual machines.
On Windows the above limitation does not exist.
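The peer-access check discussed in item 2 can be sketched as follows. This is an illustration under stated assumptions, not the original author's code: the helper canAccessPeer is a hypothetical wrapper, devices 0 and 1 are assumed to be the pair of interest, and the 1 MiB buffer size is arbitrary. When the query reports 0, the sketch still uses cudaMemcpyPeer(), which, to my understanding, the runtime can service without direct peer access by staging the copy through host memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: true only if the runtime reports that `from` can
// directly access memory on `to`. The answer is written to an output
// parameter; the function's return value is only an error code.
bool canAccessPeer(int from, int to) {
    int flag = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&flag, from, to);
    return err == cudaSuccess && flag == 1;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        printf("fewer than two CUDA devices visible; nothing to check\n");
        return 0;
    }

    const size_t bytes = 1 << 20;  // arbitrary 1 MiB buffer (assumption)
    int *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    if (canAccessPeer(0, 1)) {
        // Direct access is available: enable it from device 0 toward device 1.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        printf("peer access 0 -> 1 enabled\n");
    } else {
        // For example, a bare-metal Linux system with the IOMMU enabled.
        printf("peer access 0 -> 1 not available\n");
    }

    // Copy from device 0 to device 1: (dst, dstDevice, src, srcDevice, bytes).
    cudaError_t err = cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    printf("cudaMemcpyPeer: %s\n", cudaGetErrorString(err));

    cudaSetDevice(0);
    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Peer access is directional, so checking 0 -> 1 says nothing about 1 -> 0; each direction needs its own query and its own cudaDeviceEnablePeerAccess() call.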