---
title: Performance and Hardware Configuration
---

# Performance and Hardware Configuration

To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.

For training, each time point covers 20 iterations (minibatches) of 256 images each, for 5,120 images in total. For testing, each time point covers classifying the full 50,000-image validation set.
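The timings in this document convert to throughput by dividing the image count by the elapsed seconds. The following is a minimal sketch, assuming a POSIX shell with `awk` available; the function name `images_per_sec` is hypothetical and not part of Caffe.

```shell
# images_per_sec IMAGES SECONDS
# Prints throughput in images per second, rounded to one decimal place.
# Hypothetical helper for reading the timings in this document.
images_per_sec() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# One training time point: 20 iterations x 256 images per minibatch.
images=$((20 * 256))
echo "images per time point: $images"

# K40 training time with standard Caffe, as reported in this document.
images_per_sec "$images" 26.5
```

The same function applies to the test timings by passing 50,000 as the image count.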

**Acknowledgements**: BVLC members are very grateful to NVIDIA for providing several GPUs to conduct this research.

## NVIDIA K40

Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.

Best settings with ECC off and maximum clock speed in standard Caffe:

* Training is 26.5 secs / 20 iterations (5,120 images)
* Testing is 100 secs / validation set (50,000 images)

Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):

* Training is 19.2 secs / 20 iterations (5,120 images)
* Testing is 60.7 secs / validation set (50,000 images)
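The gain from cuDNN on the K40 is the ratio of the standard Caffe timing to the cuDNN timing. The following is a minimal sketch, assuming a POSIX shell with `awk` available; the function name `speedup` is hypothetical.

```shell
# speedup BASELINE_SECS NEW_SECS
# Prints how many times faster the new timing is, to two decimal places.
# Hypothetical helper; the inputs below are the K40 timings from this document.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", a / b }'
}

speedup 26.5 19.2    # training: standard Caffe vs. cuDNN
speedup 100 60.7     # testing: standard Caffe vs. cuDNN
```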

Other settings:

* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
* ECC on, default speed: training 31 secs / 20 iterations, test 117 secs / validation set
* ECC off, default speed: training 31 secs / 20 iterations, test 118 secs / validation set

### K40 configuration tips

For maximum K40 performance, turn off ECC and boost the clock speed (at your own risk).

To turn off ECC, run

    sudo nvidia-smi -i 0 --ecc-config=0    # repeat with -i x for each GPU ID

then reboot.

Enable persistence mode on the GPU with

    sudo nvidia-smi -pm 1

and then set the clock speed with

    sudo nvidia-smi -i 0 -ac 3004,875    # repeat with -i x for each GPU ID

Note that these settings are reset whenever the driver is reloaded or the machine reboots, so include the commands in a boot script to reapply them. A simple fix on Ubuntu is to add them to `/etc/rc.local`.
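On a machine with several GPUs, the per-GPU lines for a boot script can be generated rather than typed by hand. The following is a minimal sketch that uses only the `nvidia-smi` flags already shown in this section. The function name `k40_boot_commands` is hypothetical, and the clock pair `3004,875` is the K40 value from this document, which may not suit other cards.

```shell
# k40_boot_commands GPU_ID...
# Prints the commands to re-apply at boot. It only prints; nothing is executed.
# Hypothetical helper; the clock values are the K40 settings from this document.
k40_boot_commands() {
    # Persistence mode is set once, without -i, as in the command above.
    echo "nvidia-smi -pm 1"
    for id in "$@"; do
        # Application clocks are given as memory,graphics in MHz, per GPU ID.
        echo "nvidia-smi -i $id -ac 3004,875"
    done
}

# Example: a machine with two K40s, GPU IDs 0 and 1.
k40_boot_commands 0 1
```

The printed lines can be reviewed and then pasted into `/etc/rc.local`. That script runs as root, so `sudo` is omitted from the generated commands.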

## NVIDIA Titan

Training: 26.26 secs / 20 iterations (5,120 images).
Testing: 100 secs / validation set (50,000 images).

cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
cuDNN Testing: 66.3 secs / validation set (50,000 images).


## NVIDIA K20

Training: 36.0 secs / 20 iterations (5,120 images).
Testing: 133 secs / validation set (50,000 images).

## NVIDIA GTX 770

Training: 33.0 secs / 20 iterations (5,120 images).
Testing: 129 secs / validation set (50,000 images).

cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
cuDNN Testing: 104 secs / validation set (50,000 images).
