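This article describes how to enable NVLink on a pair of RTX 2080 Ti cards under Ubuntu 22.04.
First, make sure CUDA is installed and its environment variables are configured.
1. Enable
Turn on persistence mode, reboot, and then print the GPU topology: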
nvidia-smi -pm 1
sudo reboot
nvidia-smi topo -m
The output is as follows:
(base) root@myd-gpu:~# nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV2     0-11,24-35      0               N/A
GPU1    NV2      X      12-23,36-47     1               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
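In the matrix above, NV2 between GPU0 and GPU1 means the two cards are connected by a bonded set of 2 NVLinks. To check this automatically, the topology text can be scanned for NV# cells. The following is a minimal Python sketch, assuming the output of nvidia-smi topo -m has been captured as a string; the function name nvlink_pairs is hypothetical and not part of any NVIDIA tool.

```python
import re

GPU_NAME = re.compile(r"^GPU(\d+)$")   # matches GPU0, GPU1, ... but not the bare word "GPU"
NVLINK = re.compile(r"^NV(\d+)$")      # matches NV1, NV2, ...


def nvlink_pairs(topo_text):
    """Return {(gpu_a, gpu_b): link_count} for GPU pairs joined by NVLink.

    Hypothetical helper: it only reads the GPU-to-GPU part of the matrix
    printed by `nvidia-smi topo -m` and ignores every other line.
    """
    header = None
    pairs = {}
    for line in topo_text.splitlines():
        tokens = line.split()
        if not tokens or not GPU_NAME.match(tokens[0]):
            continue  # prompt, legend, or blank line
        if header is None:
            # The first line that starts with a GPU name is the column header.
            header = [int(GPU_NAME.match(t).group(1))
                      for t in tokens if GPU_NAME.match(t)]
            continue
        row = int(GPU_NAME.match(tokens[0]).group(1))
        for col, cell in zip(header, tokens[1:]):
            link = NVLINK.match(cell)
            if link and row < col:  # keep each pair once
                pairs[(row, col)] = int(link.group(1))
    return pairs
```

An empty result would suggest that no NVLink connection is visible between the GPUs, in which case the cells typically show one of the PCIe types listed in the legend instead.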
2. Test
Download the official samples:
git clone https://github.com/NVIDIA/cuda-samples.git
Build and run:
pip install cmake
cd cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
mkdir build && cd build
cmake ..
make -j$(nproc)
./p2pBandwidthLatencyTest
Check the results:
(base) root@myd-gpu:~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest/build# ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 2080 Ti, pciBusID: 4, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 2080 Ti, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 541.95 5.67
1 5.72 536.94
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 523.63 47.11
1 47.11 536.57
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 535.84 8.49
1 8.44 533.98
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 534.00 94.18
1 94.13 533.34
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.48 16.92
1 14.64 1.34
CPU 0 1
0 3.10 9.39
1 9.35 3.30
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 1.34 1.46
1 1.53 1.34
CPU 0 1
0 2.96 2.60
1 2.73 3.30
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
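The effect of NVLink shows up in the off-diagonal cells: with P2P enabled, GPU-to-GPU bandwidth rises and latency drops. To compare the matrices without reading them by eye, the saved output can be parsed. The following is a minimal Python sketch, assuming the text of ./p2pBandwidthLatencyTest has been captured as a string; parse_matrices and p2p_speedup are hypothetical helper names, and only the matrices whose header row starts with D\D are read (the latency matrices use a different header and are skipped).

```python
def parse_matrices(text):
    """Map each matrix title to its rows of floats.

    Assumes the layout shown above: a title line, a header line
    starting with D\\D, then one row per device.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    matrices = {}
    for i, line in enumerate(lines):
        if i == 0 or not line.startswith("D\\D"):
            continue
        title = lines[i - 1]
        n = len(line.split()) - 1  # number of devices in the header
        rows = []
        for row_line in lines[i + 1:i + 1 + n]:
            # Drop the leading device index, keep the measurements.
            rows.append([float(v) for v in row_line.split()[1:]])
        matrices[title] = rows
    return matrices


def p2p_speedup(disabled, enabled):
    """Ratio enabled/disabled for every off-diagonal (GPU-to-GPU) cell."""
    n = len(disabled)
    return {(i, j): enabled[i][j] / disabled[i][j]
            for i in range(n) for j in range(n) if i != j}
```

With the figures above, unidirectional bandwidth from GPU 0 to GPU 1 goes from 5.67 GB/s to 47.11 GB/s (roughly 8x), and bidirectional bandwidth from 8.49 GB/s to 94.18 GB/s (roughly 11x). As the sample's own note says, these numbers are not meant as rigorous performance measurements.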