GPUs compared:
V100 SXM2、V100 PCIe、V100S PCIe
A100 40GB PCIe、A100 80GB PCIe、A100 40GB SXM、A100 80GB SXM
H100 SXM5、H100 PCIe
Included for reference: GeForce RTX 4090
I. Hardware Specifications
Spec | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
GPU die | GV100 | GV100 | GV100 | GA100 | GA100 | GA100 | GA100 | GH100 | GH100 | AD102-300 |
Architecture | Volta | Volta | Volta | Ampere | Ampere | Ampere | Ampere | Hopper | Hopper | Ada Lovelace |
SM | 80 | 80 | 80 | 108 | 108 | 108 | 108 | 132 | 114 | 128 |
CUDA Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128 |
CUDA Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
FP32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128* |
FP32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
FP64 Cores / SM | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 64 | 64 | 2 |
FP64 Cores / GPU | 2560 | 2560 | 2560 | 3456 | 3456 | 3456 | 3456 | 8448 | 7296 | 256 |
INT32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64* |
INT32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 8448 | 7296 | 8192 |
Tensor Core | 1st | 1st | 1st | 3rd | 3rd | 3rd | 3rd | 4th | 4th | 4th |
Tensor Cores / SM | 8 | 8 | 8 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Tensor Cores / GPU | 640 | 640 | 640 | 432 | 432 | 432 | 432 | 528 | 456 | 512 |
GPU boost clock (MHz) | 1530 | 1380 | 1597 | 1410 | 1410 | 1410 | 1410 | 1830 / 1980** | 1620 / 1755** | 2520 |
Memory | 16 / 32 GB HBM2 | 16 / 32 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 80 GB HBM2e | 24 GB GDDR6X |
Memory bus width (bit) | 4096 | 4096 | 4096 | 5120 | 5120 | 5120 | 5120 | 5120 | 5120 | 384 |
Memory bandwidth (GB/s) | 897 | 897 | 1133 | 1555 | 1935 | 1555 | 2039 | 3352 | 2039 | 1008 |
L1 cache / shared memory (KB per SM) | 128 | 128 | 128 | 192 | 192 | 192 | 192 | 256 | 256 | 128 |
L2 cache (MB) | 6 | 6 | 6 | 40 | 40 | 40 | 40 | 50 | 50 | 72 |
Interface | SXM2 | PCIe 3.0 x16 | PCIe 3.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | SXM4 | SXM4 | SXM5 | PCIe 5.0 x16 | PCIe 4.0 x16 |
TDP (W) | 300 | 250 | 250 | 250 | 300 | 400 | 400 | 700 | 350 | 450 |
Process | TSMC 12nm FFN | TSMC 12nm FFN | TSMC 12nm FFN | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) |
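* Each SM of the 4090's AD102-300 die has 128 CUDA cores: 64 can execute either FP32 or INT32, and the other 64 execute FP32 only. All 128 therefore count as FP32 cores, but only 64 as INT32 cores.
** The first value is the boost clock for Tensor Core FP8, FP16, BF16 and TF32 math; the second is the boost clock for Tensor Core FP64 and for CUDA-core math (FP32, FP64).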
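The per-GPU rows in the table are the per-SM counts multiplied by the SM count, and dividing the bandwidth row by the bus width gives the effective data rate per memory pin. A minimal Python sketch of both calculations, with inputs copied from the table above; the helper names are illustrative and not from any library:

```python
# Per-SM figures copied from the hardware table:
# (SMs, FP32 cores/SM, FP64 cores/SM, Tensor Cores/SM)
SPECS = {
    "V100 SXM2": (80, 64, 32, 8),
    "A100 80GB PCIe": (108, 64, 32, 4),
    "H100 SXM5": (132, 128, 64, 4),
    "H100 PCIe": (114, 128, 64, 4),
    "RTX 4090": (128, 128, 2, 4),
}


def per_gpu_totals(sms, fp32_per_sm, fp64_per_sm, tc_per_sm):
    """Scale per-SM unit counts up to whole-GPU totals."""
    return {
        "fp32": sms * fp32_per_sm,
        "fp64": sms * fp64_per_sm,
        "tensor": sms * tc_per_sm,
    }


def per_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Effective per-pin data rate (Gbit/s) implied by bandwidth (GB/s) and bus width (bits)."""
    return bandwidth_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(name, per_gpu_totals(*spec))
    print("4090 GDDR6X:", per_pin_rate_gbps(1008, 384), "Gbit/s per pin")  # 21.0
    print("H100 SXM5 HBM3:", per_pin_rate_gbps(3352, 5120), "Gbit/s per pin")
```

The same multiplication reproduces the INT32 row, e.g. 128 SMs × 64 INT32 cores = 8192 for the 4090.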
II. Compute Throughput
1. CUDA Core Throughput
Floating point: TFLOPS
Integer: TIOPS
Relative figures take the A100 80GB PCIe as 100%.
Precision (CUDA cores) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
FP32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | 82.6 |
FP16 | 31.33 | 28.26 | 32.71 | 78 | 78 | 78 | 78 | 133.8 | 102.4 | 82.6 |
FP64 | 7.834 | 7.066 | 8.177 | 9.7 | 9.7 | 9.7 | 9.7 | 33.5 | 25.6 | 1.29 |
BF16 | NA | NA | NA | 39 | 39 | 39 | 39 | 133.8 | 102.4 | 82.6 |
INT32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 33.5 | 25.6 | 41.3 |
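Each CUDA-core figure above follows one peak-rate formula: units × operations per unit per clock × boost clock, where one fused multiply-add (FMA) counts as two operations. A minimal sketch, assuming the unit counts and clocks from the hardware table (for H100, the second clock in footnote **); the helper name is hypothetical:

```python
def peak_tera_ops(units, boost_mhz, ops_per_unit_per_clock=2):
    """Theoretical peak in TFLOPS (or TIOPS).

    ops_per_unit_per_clock defaults to 2 because one FMA counts as two operations.
    """
    return units * ops_per_unit_per_clock * boost_mhz * 1e6 / 1e12


if __name__ == "__main__":
    print(round(peak_tera_ops(5120, 1530), 2))   # V100 SXM2 FP32
    print(round(peak_tera_ops(16896, 1980), 1))  # H100 SXM5 FP32 (CUDA-core clock)
    print(round(peak_tera_ops(16384, 2520), 1))  # 4090 FP32
    print(round(peak_tera_ops(256, 2520), 2))    # 4090 FP64
```

The A100's 78 TFLOPS FP16 entry is four times its FP32 rate, which the same helper reproduces with `ops_per_unit_per_clock=8`.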
Relative to A100 80GB PCIe | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
FP32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 424% |
FP16 | 40.2% | 36.2% | 41.9% | 100% | 100% | 100% | 100% | 172% | 131% | 106% |
FP64 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 13.3% |
BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | 212% |
INT32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 172% | 131% | 212% |
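Each percentage is an entry divided by the A100 80GB PCIe entry in the same row. A minimal sketch using the FP32 row from the absolute table; it rounds to one decimal place, whereas the table rounds values above 100% to whole percent, and the names are illustrative:

```python
BASELINE = "A100 80GB PCIe"

# FP32 CUDA-core TFLOPS copied from the absolute table.
FP32_TFLOPS = {
    "V100 SXM2": 15.67,
    "V100 PCIe": 14.13,
    "V100S PCIe": 16.35,
    "A100 80GB PCIe": 19.5,
    "H100 SXM5": 66.9,
    "H100 PCIe": 51.2,
    "RTX 4090": 82.6,
}


def relative_percent(values, baseline=BASELINE, digits=1):
    """Express every entry as a percentage of the baseline GPU's entry."""
    base = values[baseline]
    return {name: round(100 * v / base, digits) for name, v in values.items()}


if __name__ == "__main__":
    for name, pct in relative_percent(FP32_TFLOPS).items():
        print(f"{name:>15}: {pct}%")
```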
2. Tensor Core Throughput
Floating point: TFLOPS
Integer: TIOPS
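Where two values are given, they are dense / sparse (2:4 structured sparsity); a single value is dense only. The 4090's FP16 and FP8 figures assume FP16 accumulation; with FP32 accumulation the 4090 runs at half those rates (FP16: 165.2 / 330.4), which is why its BF16 row is half its FP16 row.
Relative figures take the A100 80GB PCIe as 100%.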
Precision (Tensor Cores) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
FP8 | NA | NA | NA | NA | NA | NA | NA | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
FP16 | 125 | 112 | 130 | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 330.3 / 660.6 |
BF16 | NA | NA | NA | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 165.2 / 330.4 |
TF32 | NA | NA | NA | 156 / 312 | 156 / 312 | 156 / 312 | 156 / 312 | 494.7 / 989.4 | 378 / 756 | 82.6 / 165.2 |
FP64 | NA | NA | NA | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | NA |
INT8 | NA | NA | NA | 624 / 1248 | 624 / 1248 | 624 / 1248 | 624 / 1248 | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
INT4 | NA | NA | NA | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 3957.8 / 7915.6 | 3026 / 6052 | 1321.2 / 2642.4 |
Binary | NA | NA | NA | 4992 | 4992 | 4992 | 4992 | NA | NA | NA |
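The Tensor Core figures follow the same pattern, with a per-Tensor-Core rate in place of the per-core one, and the sparse value is twice the dense value. A minimal sketch under one stated assumption: the dense FP16 FMA rates per Tensor Core per clock (64 for Volta, 256 for Ampere, 512 for Hopper) come from NVIDIA's architecture whitepapers and are not listed in the tables here. For H100, the first clock in footnote ** applies; the function name is hypothetical:

```python
# ASSUMPTION: dense FP16 FMA operations per Tensor Core per clock, taken from
# NVIDIA's Volta / Ampere / Hopper whitepapers (not part of the tables above).
FP16_FMA_PER_TC_PER_CLK = {"Volta": 64, "Ampere": 256, "Hopper": 512}


def tensor_tflops(tensor_cores, arch, boost_mhz, sparse=False):
    """Peak FP16 Tensor Core TFLOPS; 2:4 structured sparsity doubles the dense peak."""
    fma = FP16_FMA_PER_TC_PER_CLK[arch]
    dense = tensor_cores * fma * 2 * boost_mhz * 1e6 / 1e12  # 1 FMA = 2 FLOPs
    return 2 * dense if sparse else dense


if __name__ == "__main__":
    print(round(tensor_tflops(640, "Volta", 1530), 1))                # V100 SXM2
    print(round(tensor_tflops(432, "Ampere", 1410), 1))               # A100
    print(round(tensor_tflops(528, "Hopper", 1830), 1))               # H100 SXM5 dense
    print(round(tensor_tflops(528, "Hopper", 1830, sparse=True), 1))  # H100 SXM5 sparse
```

For the A100 and H100 the other rows scale from FP16 by fixed factors visible in the table: TF32 is half the FP16 rate, and INT8 (plus FP8 on H100) is twice it.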
Relative to A100 80GB PCIe | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
FP8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
FP16 | 40.1% | 35.9% | 41.7% | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
TF32 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
FP64 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | NA |
INT8 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
INT4 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
Binary | NA | NA | NA | 100% | 100% | 100% | 100% | NA | NA | NA |
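The relative Tensor Core rows are computed the same way, but a percentage exists only when both the GPU and the baseline support the format; the A100 has no FP8 Tensor Core rate, so the entire FP8 row is NA. A minimal sketch with a few dense values copied from the absolute table, using `None` for NA (the data layout and function name are illustrative):

```python
from typing import Optional

BASELINE = "A100 80GB PCIe"

# Dense Tensor Core TFLOPS copied from the absolute table; None means "NA".
DENSE_TFLOPS = {
    "FP8": {"A100 80GB PCIe": None, "H100 SXM5": 1978.9, "RTX 4090": 660.6},
    "FP16": {"A100 80GB PCIe": 312.0, "H100 SXM5": 989.4, "RTX 4090": 330.3},
    "BF16": {"A100 80GB PCIe": 312.0, "H100 SXM5": 989.4, "RTX 4090": 165.2},
}


def relative(dtype: str, gpu: str) -> Optional[float]:
    """Percentage of the baseline GPU's figure, or None when either value is NA."""
    row = DENSE_TFLOPS[dtype]
    base, value = row[BASELINE], row.get(gpu)
    if base is None or value is None:
        return None
    return round(100 * value / base, 1)


if __name__ == "__main__":
    for dtype in DENSE_TFLOPS:
        print(dtype, relative(dtype, "H100 SXM5"), relative(dtype, "RTX 4090"))
```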