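Setting cudnn.benchmark = True makes the program spend a little extra time at startup searching, for each convolutional layer in the network, for the convolution algorithm that runs fastest on it, which then speeds the network up. It suits cases where the network structure is fixed (not changing dynamically) and the input shape (batch size, image size, input channels) stays the same, which in practice covers most situations. Conversely, if the convolution layer settings keep changing, the program will keep re-running this search and end up costing more time.
Author: xiaopl
Source: https://zhuanlan.zhihu.com/p/73711222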
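In PyTorch this is a single global flag, torch.backends.cudnn.benchmark. A minimal sketch, assuming a toy convolution layer and a hypothetical 3x64x64 input (the post does not give its network or input size): with the flag on and a fixed input shape, the first forward pass on a GPU includes the one-off algorithm search, and later passes reuse the cached choice. On a CPU-only machine the flag is simply ignored. The time_forward helper is illustrative, not from the post.

```python
import time

import torch
from torch import nn


def time_forward(conv: nn.Module, shape, device, iters: int = 5) -> list:
    """Time `iters` forward passes at one fixed input shape (hypothetical helper)."""
    x = torch.randn(*shape, device=device)
    timings = []
    with torch.no_grad():
        for _ in range(iters):
            start = time.perf_counter()
            conv(x)
            if device.type == "cuda":
                torch.cuda.synchronize()  # wait for the GPU kernel before stopping the clock
            timings.append(time.perf_counter() - start)
    return timings


# Fixed network and fixed input shape: the case where benchmark mode pays off.
# If the shape kept changing, cuDNN would repeat the search for every new shape.
# The flag only affects cuDNN (GPU) convolutions; on CPU it has no effect.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
# Batch size 12 as in the runs below; 3x64x64 is an assumed image size.
timings = time_forward(conv, (12, 3, 64, 64), device)
# On a GPU the first entry includes the one-off algorithm search.
print([f"{t * 1000:.2f} ms" for t in timings])
```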
Per-epoch speed (setup: 16 virtual CPU cores, 64 GB RAM, RTX 3090 GPU)
Without GradScaler and autocast:
Main parameters (workers=4, batch-size=12, cudnn.benchmark = False)
Epoch 001 (lr 0.01000)
Train: tpr 96.42, tnr 0.00, total pos 2627, total neg 7504, time 579.38
loss 51.8898, classify loss 50.0478, regress loss 0.5026, 0.4659, 0.4887, 0.3849
Validation: tpr 100.00, tnr 5.83922219, total pos 634, total neg 61716576, time 23.60
loss 22.7819, classify loss 22.6462, regress loss 0.0312, 0.0368, 0.0262, 0.0416
Epoch time: 579.38 s (only one epoch was logged for this run)
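The remaining runs enable automatic mixed precision. The post does not show its training code, so the following is only a minimal sketch, assuming a standard classification-style loop, of how torch.autocast and GradScaler usually fit together, with num_workers exposed because the later runs vary it. The model, the random data and the loss are placeholders. AMP is switched on only when CUDA is available (matching the RTX 3090 runs); on a CPU-only machine the sketch runs in plain float32 with the scaler disabled.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, scaler, device, use_amp=True):
    """One epoch of (optionally) mixed-precision training; returns the mean loss."""
    # float16 on CUDA (needs loss scaling); bfloat16 is CPU autocast's usual dtype.
    amp_dtype = torch.float16 if device.type == "cuda" else torch.bfloat16
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    total = 0.0
    for x, y in loader:
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        # Eligible ops (conv, matmul) run in reduced precision inside this block.
        with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
            loss = loss_fn(model(x), y)
        # Scale the loss so small float16 gradients do not underflow to zero.
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # unscales grads; skips the step if any are inf/NaN
        scaler.update()         # adjusts the scale factor for the next iteration
        total += loss.item()
    return total / len(loader)


def main(num_workers=0):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.backends.cudnn.benchmark = True
    # Placeholder data: 48 random 3x32x32 images, 2 classes.
    data = TensorDataset(torch.randn(48, 3, 32, 32), torch.randint(0, 2, (48,)))
    loader = DataLoader(data, batch_size=12, shuffle=True,
                        num_workers=num_workers, pin_memory=device.type == "cuda")
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    use_amp = device.type == "cuda"
    # Create the scaler once, outside the epoch loop, so its scale persists.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    return train_one_epoch(model, loader, optimizer, scaler, device, use_amp)


if __name__ == "__main__":
    print(f"mean loss: {main():.4f}")
```

Creating the GradScaler outside the per-epoch function is a deliberate choice: its loss-scale factor adapts over many iterations, and re-creating it each epoch would reset that state.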
With GradScaler and autocast:
1. Main parameters (workers=4, batch-size=12, cudnn.benchmark = False)
Epoch 001 (lr 0.01000)
Train: tpr 2.44, tnr 4.42, total pos 2627, total neg 7504, time 300.46
loss 2.6525, classify loss 1.2253, regress loss 0.3179, 0.3708, 0.3444, 0.3941
Validation: tpr 0.00, tnr 83.19652942, total pos 634, total neg 61707300, time 23.81
loss 0.6969, classify loss 0.5074, regress loss 0.0520, 0.0296, 0.0533, 0.0545
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 6.78, total pos 2627, total neg 7504, time 301.67
loss 0.8012, classify loss 0.7067, regress loss 0.0204, 0.0208, 0.0202, 0.0331
Validation: tpr 0.00, tnr 88.57814414, total pos 634, total neg 61709096, time 23.57
loss 0.5895, classify loss 0.4833, regress loss 0.0240, 0.0202, 0.0245, 0.0374
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 13.89, total pos 2627, total neg 7504, time 308.26
loss 0.8198, classify loss 0.7180, regress loss 0.0228, 0.0207, 0.0227, 0.0356
Validation: tpr 0.00, tnr 93.44098638, total pos 634, total neg 61704400, time 23.67
loss 0.6270, classify loss 0.4882, regress loss 0.0304, 0.0309, 0.0349, 0.0425
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 15.54, total pos 2627, total neg 7504, time 304.40
loss 0.7945, classify loss 0.7007, regress loss 0.0201, 0.0208, 0.0210, 0.0319
Validation: tpr 0.00, tnr 94.17747036, total pos 634, total neg 61719016, time 23.63
loss 0.5930, classify loss 0.4951, regress loss 0.0198, 0.0202, 0.0204, 0.0375
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 14.54, total pos 2627, total neg 7504, time 303.77
loss 0.7859, classify loss 0.6997, regress loss 0.0193, 0.0194, 0.0186, 0.0290
Validation: tpr 0.00, tnr 94.92142907, total pos 634, total neg 61710352, time 23.67
loss 0.6252, classify loss 0.5087, regress loss 0.0224, 0.0244, 0.0235, 0.0463
Epoch 006 (lr 0.01000)
Train: tpr 0.00, tnr 19.00, total pos 2627, total neg 7504, time 301.97
loss 0.7867, classify loss 0.6980, regress loss 0.0193, 0.0198, 0.0194, 0.0302
Validation: tpr 0.00, tnr 99.96104801, total pos 634, total neg 61696464, time 23.62
loss 0.6016, classify loss 0.5038, regress loss 0.0216, 0.0214, 0.0199, 0.0349
Epoch 007 (lr 0.01000)
Train: tpr 0.00, tnr 22.79, total pos 2627, total neg 7504, time 304.46
loss 0.7846, classify loss 0.6997, regress loss 0.0187, 0.0188, 0.0188, 0.0288
Validation: tpr 0.00, tnr 95.42382979, total pos 634, total neg 61702600, time 23.56
loss 0.6197, classify loss 0.5211, regress loss 0.0217, 0.0200, 0.0189, 0.0381
Epoch 008 (lr 0.01000)
Train: tpr 0.00, tnr 24.81, total pos 2627, total neg 7504, time 302.30
loss 0.7809, classify loss 0.6971, regress loss 0.0179, 0.0190, 0.0184, 0.0286
Validation: tpr 0.00, tnr 93.81690706, total pos 634, total neg 61718820, time 23.57
loss 0.6067, classify loss 0.5134, regress loss 0.0194, 0.0175, 0.0197, 0.0366
Epoch 009 (lr 0.01000)
Train: tpr 0.00, tnr 24.16, total pos 2627, total neg 7504, time 305.15
loss 0.7804, classify loss 0.6973, regress loss 0.0182, 0.0187, 0.0182, 0.0280
Validation: tpr 0.00, tnr 87.05005522, total pos 634, total neg 61713128, time 23.62
loss 0.6147, classify loss 0.5206, regress loss 0.0194, 0.0180, 0.0208, 0.0360
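Mean epoch time excluding epoch 1 (epochs 2-5, so that every run is averaged the same way): 304.525 s. Mixed precision is clearly effective: it nearly halves the epoch time compared with the 579.38 s epoch without it.
2. Main parameters (workers=4, batch-size=12, cudnn.benchmark = True)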
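The averaging rule used for every run (skip epoch 1, average epochs 2-5) can be reproduced directly from the logged train times. A small sketch using run 1's numbers from the log above; the steady_state_mean helper is illustrative. Epoch 1 is often slower in these logs (for example 310.73 s vs about 302 s in run 2), plausibly because of one-off start-up costs, which is one reason to leave it out.

```python
from statistics import mean

# Train times (s) of run 1 (AMP, workers=4, benchmark=False), copied from the log.
run1_epoch_times = [300.46, 301.67, 308.26, 304.40, 303.77,
                    301.97, 304.46, 302.30, 305.15]
no_amp_epoch_time = 579.38  # the single epoch logged without AMP


def steady_state_mean(times, first=2, last=5):
    """Mean of epochs first..last (1-based, inclusive)."""
    return mean(times[first - 1:last])


avg = steady_state_mean(run1_epoch_times)
reduction = (no_amp_epoch_time - avg) / no_amp_epoch_time
print(f"epochs 2-5 mean: {avg:.3f} s")
print(f"time saved by AMP: {reduction:.1%}")  # 47.4%
```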
Epoch 001 (lr 0.01000)
Train: tpr 1.18, tnr 5.25, total pos 2627, total neg 7504, time 310.73
loss 3.1704, classify loss 1.4532, regress loss 0.4762, 0.4834, 0.2843, 0.4733
Validation: tpr 0.00, tnr 90.80579587, total pos 634, total neg 61703568, time 22.39
loss 0.8063, classify loss 0.6344, regress loss 0.0380, 0.0468, 0.0295, 0.0576
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 15.99, total pos 2627, total neg 7504, time 301.82
loss 0.8186, classify loss 0.7168, regress loss 0.0208, 0.0215, 0.0201, 0.0394
Validation: tpr 0.00, tnr 94.02115859, total pos 634, total neg 61697104, time 20.59
loss 0.6936, classify loss 0.4773, regress loss 0.0366, 0.0584, 0.0607, 0.0605
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 15.53, total pos 2627, total neg 7504, time 301.61
loss 0.8147, classify loss 0.7100, regress loss 0.0235, 0.0218, 0.0232, 0.0362
Validation: tpr 0.00, tnr 94.03832154, total pos 634, total neg 61725704, time 20.57
loss 0.5585, classify loss 0.4610, regress loss 0.0212, 0.0196, 0.0211, 0.0356
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 8.04, total pos 2627, total neg 7504, time 302.98
loss 0.7936, classify loss 0.7006, regress loss 0.0202, 0.0194, 0.0202, 0.0332
Validation: tpr 0.00, tnr 84.07998293, total pos 634, total neg 61695964, time 20.63
loss 0.5632, classify loss 0.4687, regress loss 0.0203, 0.0206, 0.0193, 0.0343
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 17.04, total pos 2627, total neg 7504, time 304.19
loss 0.7881, classify loss 0.7006, regress loss 0.0195, 0.0193, 0.0194, 0.0293
Validation: tpr 0.00, tnr 94.05568317, total pos 634, total neg 61709564, time 20.58
loss 0.5956, classify loss 0.4768, regress loss 0.0233, 0.0214, 0.0236, 0.0505
Mean excluding epoch 1: 302.65 s (0.62% faster than run 1; with a gain this small, it is hard to tell whether cudnn.benchmark really helps.)
3. Main parameters (workers=8, batch-size=12, cudnn.benchmark = False)
Epoch 001 (lr 0.01000)
Train: tpr 2.47, tnr 6.34, total pos 2627, total neg 7504, time 234.42
loss 3.2693, classify loss 1.6693, regress loss 0.3342, 0.4426, 0.3981, 0.4251
Validation: tpr 0.00, tnr 87.95868974, total pos 634, total neg 61695080, time 24.24
loss 0.7502, classify loss 0.5725, regress loss 0.0356, 0.0353, 0.0541, 0.0526
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 16.54, total pos 2627, total neg 7504, time 233.51
loss 0.8038, classify loss 0.7088, regress loss 0.0203, 0.0200, 0.0213, 0.0333
Validation: tpr 0.00, tnr 85.33051354, total pos 634, total neg 61714144, time 24.06
loss 0.6119, classify loss 0.4917, regress loss 0.0221, 0.0268, 0.0249, 0.0463
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 16.55, total pos 2627, total neg 7504, time 234.47
loss 0.7977, classify loss 0.7066, regress loss 0.0198, 0.0198, 0.0197, 0.0318
Validation: tpr 0.00, tnr 88.41188995, total pos 634, total neg 61724992, time 24.13
loss 0.5672, classify loss 0.4685, regress loss 0.0212, 0.0205, 0.0219, 0.0352
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 20.64, total pos 2627, total neg 7504, time 234.07
loss 0.7895, classify loss 0.7023, regress loss 0.0197, 0.0187, 0.0193, 0.0295
Validation: tpr 0.00, tnr 88.00590522, total pos 634, total neg 61713536, time 24.06
loss 0.5761, classify loss 0.4746, regress loss 0.0246, 0.0212, 0.0196, 0.0362
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 14.49, total pos 2627, total neg 7504, time 234.16
loss 0.7862, classify loss 0.7000, regress loss 0.0192, 0.0186, 0.0196, 0.0289
Validation: tpr 0.00, tnr 93.82381128, total pos 634, total neg 61694164, time 24.05
loss 0.5781, classify loss 0.4709, regress loss 0.0205, 0.0211, 0.0203, 0.0453
Mean excluding epoch 1: 234.0525 s (compared with run 2, increasing workers is very effective.)
4. Main parameters (workers=8, batch-size=12, cudnn.benchmark = True)
Epoch 001 (lr 0.01000)
Train: tpr 1.26, tnr 2.87, total pos 2627, total neg 7504, time 236.82
loss 2.4346, classify loss 1.1293, regress loss 0.3311, 0.3774, 0.2640, 0.3329
Validation: tpr 0.00, tnr 94.18171435, total pos 634, total neg 61710480, time 22.78
loss 1.0356, classify loss 0.4749, regress loss 0.1270, 0.1281, 0.1586, 0.1469
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 4.29, total pos 2627, total neg 7504, time 224.36
loss 0.8743, classify loss 0.7090, regress loss 0.0379, 0.0381, 0.0363, 0.0530
Validation: tpr 0.00, tnr 94.17087003, total pos 634, total neg 61740260, time 21.08
loss 0.7275, classify loss 0.4995, regress loss 0.0500, 0.0640, 0.0445, 0.0695
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 18.31, total pos 2627, total neg 7504, time 223.75
loss 0.7954, classify loss 0.7017, regress loss 0.0197, 0.0197, 0.0200, 0.0343
Validation: tpr 0.00, tnr 94.09238272, total pos 634, total neg 61705216, time 21.10
loss 0.6135, classify loss 0.4880, regress loss 0.0273, 0.0296, 0.0248, 0.0437
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 22.63, total pos 2627, total neg 7504, time 223.33
loss 0.7913, classify loss 0.6988, regress loss 0.0197, 0.0199, 0.0204, 0.0326
Validation: tpr 0.00, tnr 94.16160580, total pos 634, total neg 61696896, time 20.98
loss 0.5864, classify loss 0.4960, regress loss 0.0173, 0.0208, 0.0194, 0.0328
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 21.35, total pos 2627, total neg 7504, time 223.07
loss 0.8088, classify loss 0.7007, regress loss 0.0252, 0.0250, 0.0222, 0.0357
Validation: tpr 0.00, tnr 92.49454172, total pos 634, total neg 61706292, time 21.06
loss 0.7395, classify loss 0.4886, regress loss 0.0569, 0.0713, 0.0486, 0.0742
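Mean excluding epoch 1: 223.6275 s (4.45% faster than run 3, which shows that cudnn.benchmark does have an effect, though the gain is still modest.)
5. Main parameters (workers=16, batch-size=12, cudnn.benchmark = False)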
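The percentages quoted for runs 2 and 4 are relative reductions in mean epoch time, (baseline - new) / baseline. A quick check using the epoch 2-5 averages reported above; the relative_speedup name is illustrative. The workers comparison uses runs 1 and 3, which differ only in the number of workers.

```python
def relative_speedup(baseline_s: float, new_s: float) -> float:
    """Fraction of epoch time saved relative to the baseline run."""
    return (baseline_s - new_s) / baseline_s


# Mean epoch times (epochs 2-5) as reported in the post, in seconds.
mean_epoch_time = {
    1: 304.525,   # workers=4, benchmark=False
    2: 302.65,    # workers=4, benchmark=True
    3: 234.0525,  # workers=8, benchmark=False
    4: 223.6275,  # workers=8, benchmark=True
}

bench_4 = relative_speedup(mean_epoch_time[1], mean_epoch_time[2])
bench_8 = relative_speedup(mean_epoch_time[3], mean_epoch_time[4])
workers = relative_speedup(mean_epoch_time[1], mean_epoch_time[3])
print(f"benchmark, 4 workers: {bench_4:.2%}")  # 0.62%
print(f"benchmark, 8 workers: {bench_8:.2%}")  # 4.45%
print(f"workers 4 -> 8:       {workers:.2%}")  # 23.14%
```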
Epoch 001 (lr 0.01000)
Train: tpr 1.10, tnr 2.03, total pos 2627, total neg 7504, time 238.76
loss 2.2026, classify loss 1.1155, regress loss 0.2142, 0.2928, 0.2440, 0.3361
Validation: tpr 0.00, tnr 89.28941718, total pos 634, total neg 61712272, time 24.68
loss 0.7023, classify loss 0.4787, regress loss 0.0673, 0.0381, 0.0406, 0.0776
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 7.09, total pos 2627, total neg 7504, time 234.89
loss 0.8196, classify loss 0.7145, regress loss 0.0217, 0.0222, 0.0239, 0.0373
Validation: tpr 0.00, tnr 95.67126550, total pos 634, total neg 61696184, time 24.72
loss 0.7545, classify loss 0.4795, regress loss 0.0516, 0.0795, 0.0639, 0.0799
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 5.57, total pos 2627, total neg 7504, time 234.96
loss 0.8156, classify loss 0.7032, regress loss 0.0260, 0.0244, 0.0234, 0.0386
Validation: tpr 0.00, tnr 94.63027450, total pos 634, total neg 61719952, time 24.80
loss 0.6207, classify loss 0.4791, regress loss 0.0301, 0.0324, 0.0280, 0.0512
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 13.10, total pos 2627, total neg 7504, time 234.13
loss 0.7988, classify loss 0.7020, regress loss 0.0205, 0.0212, 0.0217, 0.0333
Validation: tpr 0.00, tnr 94.19245519, total pos 634, total neg 61716476, time 24.38
loss 0.5772, classify loss 0.4771, regress loss 0.0252, 0.0204, 0.0185, 0.0359
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 18.38, total pos 2627, total neg 7504, time 234.48
loss 0.7844, classify loss 0.6984, regress loss 0.0183, 0.0190, 0.0191, 0.0296
Validation: tpr 0.00, tnr 99.97719179, total pos 634, total neg 61732164, time 24.61
loss 0.6018, classify loss 0.4779, regress loss 0.0263, 0.0225, 0.0210, 0.0542
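Mean excluding epoch 1: 234.615 s (compared with run 3, raising workers to 16 does not change the speed; the ceiling is reached somewhere between 8 and 16 workers, or possibly already at 8 or fewer.)
6. Main parameters (workers=16, batch-size=12, cudnn.benchmark = True)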
Epoch 001 (lr 0.01000)
Train: tpr 1.26, tnr 3.97, total pos 2627, total neg 7504, time 237.99
loss 2.0472, classify loss 1.0144, regress loss 0.2400, 0.2850, 0.2477, 0.2601
Validation: tpr 0.00, tnr 94.91618138, total pos 634, total neg 61709440, time 22.84
loss 0.7827, classify loss 0.4707, regress loss 0.0801, 0.0523, 0.0708, 0.1087
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 1.91, total pos 2627, total neg 7504, time 222.99
loss 0.8020, classify loss 0.7012, regress loss 0.0217, 0.0229, 0.0214, 0.0348
Validation: tpr 0.00, tnr 94.29440158, total pos 634, total neg 61716436, time 21.50
loss 0.6096, classify loss 0.4714, regress loss 0.0305, 0.0314, 0.0290, 0.0473
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 15.19, total pos 2627, total neg 7504, time 224.01
loss 0.7885, classify loss 0.6988, regress loss 0.0201, 0.0193, 0.0188, 0.0317
Validation: tpr 0.00, tnr 99.98782044, total pos 634, total neg 61709928, time 21.53
loss 0.6001, classify loss 0.4754, regress loss 0.0292, 0.0214, 0.0291, 0.0451
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 13.62, total pos 2627, total neg 7504, time 223.43
loss 0.7898, classify loss 0.6995, regress loss 0.0207, 0.0198, 0.0199, 0.0298
Validation: tpr 0.00, tnr 84.85430224, total pos 634, total neg 61710488, time 21.33
loss 0.5831, classify loss 0.4856, regress loss 0.0193, 0.0205, 0.0217, 0.0359
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 23.01, total pos 2627, total neg 7504, time 223.51
loss 0.7829, classify loss 0.6981, regress loss 0.0189, 0.0186, 0.0191, 0.0282
Validation: tpr 0.00, tnr 92.59010229, total pos 634, total neg 61697100, time 21.44
loss 0.6070, classify loss 0.4848, regress loss 0.0251, 0.0329, 0.0202, 0.0441
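Mean excluding epoch 1: 223.485 s (compared with run 4, raising workers to 16 does not change the speed; the ceiling is reached somewhere between 8 and 16 workers, or possibly already at 8 or fewer.)
7. Main parameters (workers=12, batch-size=12, cudnn.benchmark = True)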
Epoch 001 (lr 0.01000)
Train: tpr 1.14, tnr 5.01, total pos 2627, total neg 7504, time 236.27
loss 2.2576, classify loss 1.0455, regress loss 0.2435, 0.3450, 0.2617, 0.3618
Validation: tpr 0.00, tnr 84.05598201, total pos 634, total neg 61695816, time 23.13
loss 0.7008, classify loss 0.5469, regress loss 0.0334, 0.0374, 0.0321, 0.0510
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 6.98, total pos 2627, total neg 7504, time 222.65
loss 0.8073, classify loss 0.7051, regress loss 0.0226, 0.0227, 0.0222, 0.0346
Validation: tpr 0.00, tnr 83.84634619, total pos 634, total neg 61699948, time 21.39
loss 0.6410, classify loss 0.5139, regress loss 0.0211, 0.0277, 0.0305, 0.0478
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 10.55, total pos 2627, total neg 7504, time 222.98
loss 0.7946, classify loss 0.7008, regress loss 0.0203, 0.0203, 0.0192, 0.0340
Validation: tpr 0.00, tnr 93.67948321, total pos 634, total neg 61716156, time 21.53
loss 0.5785, classify loss 0.4808, regress loss 0.0197, 0.0200, 0.0221, 0.0359
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 6.17, total pos 2627, total neg 7504, time 223.06
loss 0.8055, classify loss 0.7033, regress loss 0.0224, 0.0230, 0.0218, 0.0350
Validation: tpr 0.00, tnr 89.36315748, total pos 634, total neg 61700528, time 21.44
loss 0.5915, classify loss 0.4925, regress loss 0.0206, 0.0208, 0.0218, 0.0358
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 11.15, total pos 2627, total neg 7504, time 223.52
loss 0.8676, classify loss 0.7470, regress loss 0.0229, 0.0289, 0.0302, 0.0387
Validation: tpr 0.00, tnr 93.91831179, total pos 634, total neg 61699184, time 21.35
loss 0.6082, classify loss 0.5048, regress loss 0.0195, 0.0206, 0.0201, 0.0433
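Mean excluding epoch 1: 223.0525 s (compared with runs 4 and 6, workers=12 gives the same speed; top shows a load average mostly between 6 and 7, occasionally between 7 and 8.)
8. Main parameters (workers=6, batch-size=12, cudnn.benchmark = True)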
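The load average read from top can also be logged from inside the training script, next to the epoch times. A minimal sketch, assuming a Unix-like system (os.getloadavg is not available on Windows). The cpu_load_report helper and its workers_likely_idle flag are illustrative assumptions, not from the post: the idea, in line with the reasoning above, is that a load average well below the configured worker count suggests some data-loading workers are idle, so adding more of them is unlikely to help.

```python
import os


def cpu_load_report(num_workers: int) -> dict:
    """Snapshot of the 1/5/15-minute load averages (the numbers top shows)."""
    one, five, fifteen = os.getloadavg()  # Unix only; raises OSError if unavailable
    return {
        "load_1m": one,
        "load_5m": five,
        "load_15m": fifteen,
        "cpu_count": os.cpu_count(),
        "num_workers": num_workers,
        # Rough heuristic (an assumption): load below the worker count
        # suggests some loader workers are waiting rather than working.
        "workers_likely_idle": one < num_workers,
    }


if __name__ == "__main__":
    print(cpu_load_report(num_workers=12))
```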
Epoch 001 (lr 0.01000)
Train: tpr 1.26, tnr 2.81, total pos 2627, total neg 7504, time 236.20
loss 2.0773, classify loss 1.1112, regress loss 0.2226, 0.2610, 0.2248, 0.2577
Validation: tpr 0.00, tnr 91.34270569, total pos 634, total neg 61710828, time 22.63
loss 0.9347, classify loss 0.6396, regress loss 0.0653, 0.0818, 0.0394, 0.1086
Epoch 002 (lr 0.01000)
Train: tpr 0.00, tnr 5.14, total pos 2627, total neg 7504, time 223.88
loss 0.8033, classify loss 0.7082, regress loss 0.0207, 0.0209, 0.0201, 0.0334
Validation: tpr 0.00, tnr 94.75498918, total pos 634, total neg 61706336, time 20.74
loss 0.6284, classify loss 0.4937, regress loss 0.0310, 0.0329, 0.0235, 0.0473
Epoch 003 (lr 0.01000)
Train: tpr 0.00, tnr 3.20, total pos 2627, total neg 7504, time 223.01
loss 0.8043, classify loss 0.7104, regress loss 0.0203, 0.0213, 0.0190, 0.0333
Validation: tpr 0.00, tnr 94.79560477, total pos 634, total neg 61710148, time 20.80
loss 0.5671, classify loss 0.4643, regress loss 0.0215, 0.0218, 0.0218, 0.0378
Epoch 004 (lr 0.01000)
Train: tpr 0.00, tnr 17.30, total pos 2627, total neg 7504, time 222.51
loss 0.7959, classify loss 0.7036, regress loss 0.0201, 0.0200, 0.0196, 0.0325
Validation: tpr 0.00, tnr 99.98570032, total pos 634, total neg 61707656, time 20.81
loss 0.6401, classify loss 0.5133, regress loss 0.0298, 0.0267, 0.0246, 0.0458
Epoch 005 (lr 0.01000)
Train: tpr 0.00, tnr 13.73, total pos 2627, total neg 7504, time 223.31
loss 0.7868, classify loss 0.7001, regress loss 0.0187, 0.0190, 0.0183, 0.0308
Validation: tpr 0.00, tnr 94.07303553, total pos 634, total neg 61713412, time 20.71
loss 0.6445, classify loss 0.5237, regress loss 0.0232, 0.0219, 0.0259, 0.0497
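Mean excluding epoch 1: 223.1775 s (compared with runs 4, 6 and 7, workers=6 still gives the same speed; top shows a load average mostly above 6.)
Summary: (workers=8, batch-size=12, cudnn.benchmark = True) is a reasonable configuration.
The author compared training speed under different configurations with cudnn.benchmark on and off, and found that with a fixed network structure and fixed input shape cudnn.benchmark does speed up training, but only modestly (0.62% with 4 workers and 4.45% with 8 workers, at a fixed batch size of 12), while raising the number of workers from 4 to 8 had a much larger effect.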