The paper "Bag of Tricks for Image Classification with Convolutional Neural Networks" notes:
“Using large batch size, however, may slow down the training progress. For convex problems, convergence rate decreases as batch size increases. Similar empirical results have been reported for neural networks [25]. In other words, for the same number of epochs, training with a large batch size results in a model with degraded validation accuracy compared to the ones trained with smaller batch sizes”
This statement raises a question:
In theory, a larger batch size reduces the variance of the gradient estimate, so each update is less affected by the noise of small-sample mini-batches and training should converge faster. Why, then, does the model end up with worse generalization?
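Before getting to the paper's answer, the variance half of the question is easy to check numerically. The sketch below is a minimal example on synthetic data (the linear model, data sizes, and batch sizes are my own illustrative choices, not taken from either paper): it samples many mini-batch gradients at fixed weights and prints how their variance shrinks as the batch size grows, roughly like 1/batch_size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression problem and a tiny model (illustrative assumptions only).
X = torch.randn(4096, 20)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(4096, 1)
model = nn.Linear(20, 1)
loss_fn = nn.MSELoss()


def grad_vector(batch_x, batch_y):
    """Return the flattened gradient of the loss on one mini-batch."""
    model.zero_grad()
    loss_fn(model(batch_x), batch_y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])


for batch_size in (8, 64, 512):
    grads = []
    for _ in range(200):  # sample many mini-batches at the same weights
        idx = torch.randint(0, X.shape[0], (batch_size,))
        grads.append(grad_vector(X[idx], y[idx]))
    grads = torch.stack(grads)
    # Variance of the gradient estimate, averaged over coordinates:
    # it should shrink roughly like 1 / batch_size.
    var = grads.var(dim=0).mean().item()
    print(f"batch_size={batch_size:4d}  grad variance={var:.6f}")
```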
With this question in mind, I found the paper "ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA", which offers two explanations for this phenomenon and backs them with experiments:
(1) LB (large-batch) methods lack the explorative properties of SB (small-batch) methods and tend to zoom in on the minimizer closest to the initial point.
(2) LB methods tend to converge to sharp minimizers of the training function, whereas SB methods converge to flat minimizers, which generalize better; a simple sharpness probe is sketched after this list.

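A rough way to see the sharp-versus-flat distinction in practice is to nudge the trained weights with small random perturbations and watch how much the training loss rises: a flat minimizer barely notices, a sharp one blows up. The probe below is only a crude stand-in for the paper's sharpness metric (which uses a constrained maximization); the perturbation scale, sample count, and the model_sb / model_lb names in the usage comment are placeholders I made up.

```python
import copy
import torch
import torch.nn as nn


@torch.no_grad()
def sharpness_probe(model, loss_fn, data, target, epsilon=1e-2, n_samples=20):
    """Average relative loss increase under random weight perturbations of size
    roughly epsilon * |w| per parameter -- a crude proxy for the sharpness of a minimum."""
    base_loss = loss_fn(model(data), target).item()
    increases = []
    for _ in range(n_samples):
        perturbed = copy.deepcopy(model)
        for p in perturbed.parameters():
            p.add_(epsilon * (p.abs() + 1e-3) * torch.randn_like(p))
        loss = loss_fn(perturbed(data), target).item()
        increases.append((loss - base_loss) / (1.0 + abs(base_loss)))
    return sum(increases) / len(increases)


# Usage sketch: compare a model trained with a small batch against one trained
# with a large batch, evaluated on the same training data (names are placeholders).
# print(sharpness_probe(model_sb, nn.CrossEntropyLoss(), x_train, y_train))
# print(sharpness_probe(model_lb, nn.CrossEntropyLoss(), x_train, y_train))
```

A larger returned value means the loss surface around the solution is more sensitive to perturbation, i.e. sharper, which is the behavior the paper reports for large-batch training.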
To summarize: a large batch size can, in theory, reduce the variance of the gradient estimate and speed up training, but experiments show it can degrade the model's generalization. The study argues that small-batch methods are more explorative and converge to flat minima that generalize better, whereas large-batch methods tend to converge to sharp minima. To mitigate the problem, one can use warmup training, data augmentation, conservative training, adversarial training, and suitable learning-rate strategies such as cyclic learning-rate schedules and learning-rate restarts; a minimal warmup-plus-restarts schedule is sketched below.
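Of the remedies listed above, the learning-rate ones are the most mechanical to write down. Here is a minimal sketch that chains a linear warmup with cosine-annealing warm restarts using stock PyTorch schedulers; the optimizer, warmup length, and restart period are illustrative assumptions, not values recommended by either paper.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR, CosineAnnealingWarmRestarts, SequentialLR

model = torch.nn.Linear(20, 1)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.4, momentum=0.9)

warmup_epochs = 5
# Linear warmup: scale the LR from a small value up to its nominal value.
warmup = LambdaLR(optimizer, lr_lambda=lambda e: (e + 1) / warmup_epochs)
# After warmup: cosine annealing that restarts every 10 epochs, doubling each cycle.
restarts = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)
scheduler = SequentialLR(optimizer, [warmup, restarts], milestones=[warmup_epochs])

for epoch in range(45):
    # ... one epoch of training would go here ...
    scheduler.step()
    print(epoch, optimizer.param_groups[0]["lr"])
```

SequentialLR drives the optimizer with the warmup schedule for the first warmup_epochs epochs and hands it over to the restart schedule afterwards, which is one straightforward way to combine warmup with learning-rate restarts.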