Compute Performance of Common GPUs Compared (X100 Compute Cards, GX100)

GPUs compared:

        V100 SXM2, V100 PCIe, V100S PCIe

        A100 40GB PCIe, A100 80GB PCIe, A100 40GB SXM, A100 80GB SXM

        H100 SXM5, H100 PCIe

Included for reference: 4090

I. Hardware Specifications

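| | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU chip | GV100 | GV100 | GV100 | GA100 | GA100 | GA100 | GA100 | GH100 | GH100 | AD102-300 |
| Architecture | Volta | Volta | Volta | Ampere | Ampere | Ampere | Ampere | Hopper | Hopper | Ada Lovelace |
| SMs | 80 | 80 | 80 | 108 | 108 | 108 | 108 | 132 | 114 | 128 |
| CUDA Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128 |
| CUDA Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
| FP32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128* |
| FP32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 64 | 64 | 2 |
| FP64 Cores / GPU | 2560 | 2560 | 2560 | 3456 | 3456 | 3456 | 3456 | 8448 | 7296 | 256 |
| INT32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64* |
| INT32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 8448 | 7296 | 8192 |
| Tensor Core generation | 1st | 1st | 1st | 3rd | 3rd | 3rd | 3rd | 4th | 4th | 4th |
| Tensor Cores / SM | 8 | 8 | 8 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Tensor Cores / GPU | 640 | 640 | 640 | 432 | 432 | 432 | 432 | 528 | 456 | 512 |
| GPU boost clock (MHz) | 1530 | 1380 | 1597 | 1410 | 1410 | 1410 | 1410 | 1830 / 1980** | 1620 / 1755** | 2520 |
| Memory | 16 / 32 GB HBM2 | 16 / 32 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 80 GB HBM2e | 24 GB GDDR6X |
| Memory bus width (bit) | 4096 | 4096 | 4096 | 5120 | 5120 | 5120 | 5120 | 5120 | 5120 | 384 |
| Memory bandwidth (GB/s) | 897 | 897 | 1133 | 1555 | 1935 | 1555 | 2039 | 3352 | 2039 | 1008 |
| L1 cache (KB per SM) | 128 | 128 | 128 | 192 | 192 | 192 | 192 | 256 | 256 | 128 |
| L2 cache (MB) | 6 | 6 | 6 | 40 | 40 | 40 | 40 | 50 | 50 | 72 |
| Interface | SXM2 | PCIe 3.0 x16 | PCIe 3.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | SXM4 | SXM4 | SXM5 | PCIe 5.0 x16 | PCIe 4.0 x16 |
| TDP (W) | 300 | 250 | 250 | 250 | 300 | 400 | 400 | 700 | 350 | 450 |
| Process node | TSMC 12nm FFN | TSMC 12nm FFN | TSMC 12nm FFN | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) |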

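* In the 4090's AD102-300 chip, each SM has 128 CUDA cores: 64 of them can execute either FP32 or INT32, and the other 64 execute FP32 only. This is why the table lists 128 FP32 cores but only 64 INT32 cores per SM.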

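** The first value is the boost clock when the Tensor Cores compute FP8, FP16, BF16, or TF32. The second value is the boost clock when the Tensor Cores compute FP64 and when the CUDA Cores compute FP32 or FP64.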
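The core counts and boost clocks above are enough to estimate theoretical peak throughput. The following is a minimal sketch, assuming each core retires one fused multiply-add per clock and that an FMA counts as 2 operations (the article does not state its formula, so this is an assumption). The function name `peak_tera_ops` and the `GPUS` dictionary are illustrative, with values copied from the hardware table.

```python
# Peak throughput sketch: cores x ops-per-core-per-clock x boost clock.
# Assumption (not stated in the article): one FMA per core per clock = 2 ops.

OPS_PER_CORE_PER_CLOCK = 2  # assumed: fused multiply-add counted as 2 operations


def peak_tera_ops(cores: int, boost_mhz: float) -> float:
    """Return theoretical peak in TFLOPS (or TIOPS) for `cores` units at `boost_mhz`."""
    return cores * OPS_PER_CORE_PER_CLOCK * boost_mhz * 1e6 / 1e12


# (FP32 cores per GPU, boost clock in MHz) copied from the hardware table.
# For the H100 the second clock is used, since the ** footnote says it
# applies to CUDA Core FP32/FP64 work.
GPUS = {
    "V100 SXM2": (5120, 1530),
    "A100 80GB PCIe": (6912, 1410),
    "H100 SXM5": (16896, 1980),
    "4090": (16384, 2520),
}

fp32_peak = {name: peak_tera_ops(cores, mhz) for name, (cores, mhz) in GPUS.items()}

for name, tflops in fp32_peak.items():
    print(f"{name}: {tflops:.2f} TFLOPS FP32")
```

Under that assumption the results land close to the CUDA Core FP32 figures in section II. The same function applies to FP64 or INT32 by passing the corresponding core count from the table.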

II. Compute Performance

1. CUDA Core Compute

Floating point: TFLOPS

Integer: TIOPS

All percentages take the A100 80GB PCIe as 100%.

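| Precision (TFLOPS / TIOPS) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
|---|---|---|---|---|---|---|---|---|---|---|
| FP32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | 82.6 |
| FP16 | 31.33 | 28.26 | 32.71 | 78 | 78 | 78 | 78 | 133.8 | 102.4 | 82.6 |
| FP64 | 7.834 | 7.066 | 8.177 | 9.7 | 9.7 | 9.7 | 9.7 | 33.5 | 25.6 | 1.29 |
| BF16 | NA | NA | NA | 39 | 39 | 39 | 39 | 133.8 | 102.4 | 82.6 |
| INT32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 33.5 | 25.6 | 41.3 |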

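| Precision (% of A100 80GB PCIe) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
|---|---|---|---|---|---|---|---|---|---|---|
| FP32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 424% |
| FP16 | 40.2% | 36.2% | 41.9% | 100% | 100% | 100% | 100% | 172% | 131% | 106% |
| FP64 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 13.3% |
| BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | 212% |
| INT32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 172% | 131% | 212% |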
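The percentages are each card's throughput divided by that of the A100 80GB PCIe. As a rough illustration, the sketch below recomputes the FP32 row from the TFLOPS values listed earlier. The helper name `relative_percent` and the dictionary layout are hypothetical, chosen only for this example.

```python
BASELINE = "A100 80GB PCIe"  # the card the article takes as 100%

# CUDA Core FP32 TFLOPS, copied from the table in this section.
fp32_tflops = {
    "V100 SXM2": 15.67,
    "V100 PCIe": 14.13,
    "V100S PCIe": 16.35,
    "A100 80GB PCIe": 19.5,
    "H100 SXM5": 66.9,
    "H100 PCIe": 51.2,
    "4090": 82.6,
}


def relative_percent(values: dict, baseline: str = BASELINE) -> dict:
    """Express every entry as a percentage of the baseline entry."""
    base = values[baseline]
    return {name: 100 * v / base for name, v in values.items()}


fp32_rel = relative_percent(fp32_tflops)

for name, pct in fp32_rel.items():
    print(f"{name}: {pct:.1f}%")
```

The article's table shows one decimal below 100% and whole numbers above it, so small rounding differences against this output are expected.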

2. Tensor Core Compute

Floating point: TFLOPS

Integer: TIOPS

Values are given as dense / sparse.

All percentages take the A100 80GB PCIe as 100%.

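| Precision (TFLOPS / TIOPS) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
|---|---|---|---|---|---|---|---|---|---|---|
| FP8 | NA | NA | NA | NA | NA | NA | NA | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
| FP16 | 125 | 112 | 130 | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 330.3 / 660.6 |
| BF16 | NA | NA | NA | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 165.2 / 330.4 |
| TF32 | NA | NA | NA | 156 / 312 | 156 / 312 | 156 / 312 | 156 / 312 | 494.7 / 989.4 | 378 / 756 | 82.6 / 165.2 |
| FP64 | NA | NA | NA | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | NA |
| INT8 | NA | NA | NA | 624 / 1248 | 624 / 1248 | 624 / 1248 | 624 / 1248 | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
| INT4 | NA | NA | NA | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 3957.8 / 7915.6 | 3026 / 6052 | 1321.2 / 2642.4 |
| Binary | NA | NA | NA | 4992 | 4992 | 4992 | 4992 | NA | NA | NA |

| Precision (% of A100 80GB PCIe) | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
|---|---|---|---|---|---|---|---|---|---|---|
| FP8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| FP16 | 40.1% | 35.9% | 41.7% | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
| TF32 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
| FP64 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | NA |
| INT8 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| INT4 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| Binary | NA | NA | NA | 100% | 100% | 100% | 100% | NA | NA | NA |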
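One way to read the Tensor Core figures is to divide the dense FP16 throughput by the number of Tensor Cores and the boost clock, which gives the operations each Tensor Core performs per clock. The sketch below does this using only values from the tables in this article. The function name and the `ROWS` layout are hypothetical, and for the H100 the first boost clock is used because the ** footnote assigns it to Tensor Core FP16 work.

```python
def ops_per_tensor_core_per_clock(dense_tera_ops: float,
                                  tensor_cores: int,
                                  boost_mhz: float) -> float:
    """Back-derive per-Tensor-Core, per-clock operations from a peak figure."""
    return dense_tera_ops * 1e12 / (tensor_cores * boost_mhz * 1e6)


# (dense FP16 TFLOPS, Tensor Cores per GPU, boost clock in MHz), from the tables.
ROWS = {
    "V100 SXM2": (125, 640, 1530),
    "A100 80GB PCIe": (312, 432, 1410),
    "H100 SXM5": (989.4, 528, 1830),  # first clock: Tensor Core FP16
    "4090": (330.3, 512, 2520),
}

derived = {
    name: round(ops_per_tensor_core_per_clock(tflops, tcs, mhz))
    for name, (tflops, tcs, mhz) in ROWS.items()
}

for name, ops in derived.items():
    print(f"{name}: ~{ops} FP16 ops per Tensor Core per clock")

# Sparse figures in the table are twice the dense ones, e.g. for the A100:
sparse_speedup = 624 / 312
print(f"A100 sparse / dense: {sparse_speedup:.1f}x")
```

The derived values are rounded because the published TFLOPS figures are themselves rounded. They are an interpretation of the table, not numbers quoted from a vendor specification.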