[BoT] Bag of Tricks for Image Classification with Convolutional Neural Networks

Last modified 2024-01-09 17:04:15

License: CC 4.0 BY-SA

Tags:

#Tricks #knowledge distillation #mixup #cosine lr

First published 2020-12-24 18:48:25

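This post looks in detail at how combining training strategies such as large-batch training, low-precision arithmetic, and model tweaks raises ResNet's accuracy on ImageNet from 75.3% to 79.29%, and at how these techniques carry over to object detection and semantic segmentation. Key strategies include linear learning rate scaling, zero γ initialization, knowledge distillation, and data augmentation methods such as PCA noise and HSV transforms.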

CVPR-2019

Contents

1 Background and Motivation
2 Advantages / Contributions
3 Baseline
4 Efficient Training
5 Model Tweaks
6 Training Refinements
7 Transfer Learning
- 7.1 Object Detection
- 7.2 Semantic Segmentation
8 Conclusion (own)
[Appendix A]
[Appendix B: More tricks]
- Learning rate

1 Background and Motivation

As deep learning has advanced, image classification accuracy has kept climbing.

The gains come not only from model architecture but also from training procedure refinements such as:

loss functions
data pre-processing
optimization methods
learning rate schedule
stride size of a particular convolution layer

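However, training procedure refinements rarely draw as much attention as model architectures. They are usually relegated to the implementation details section of a paper, or are only visible in source code.

In this paper, the authors put training procedure refinements in the spotlight.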

The authors analyze and test a range of training procedure refinements in detail. Combined, they raise ResNet-50's Top-1 accuracy on ImageNet from 75.3% to 79.29%, and the same tricks also improve accuracy on object detection and semantic segmentation.

2 Advantages / Contributions

Surveys and summarizes a range of tricks and combines them to tune ResNet, raising accuracy from 75.3% to 79.29% (Top-1 on ImageNet).

3 Baseline

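The paper defines a standard training and testing pipeline.

Training

Decode each image into 32-bit floating point pixel values in [0, 255]
Randomly crop a region with aspect ratio in [3/4, 4/3] and area in [8%, 100%] of the image, then resize it to 224×224
Flip horizontally with probability 50%
Scale hue, saturation, and brightness by coefficients drawn from [0.6, 1.4] (see the appendix of this post)
Add PCA noise (see the appendix of this post)
Normalize by subtracting the mean and dividing by the standard deviation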
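The crop, flip, and normalization steps above can be sketched in plain Python. This is only an illustration under a few assumptions: the function names are made up for this post, the aspect ratio is sampled uniformly, the whole-image fallback after ten failed attempts is my own choice, and the per-channel constants are the RGB statistics I recall from the paper, so check them against your own data pipeline. The color jitter and PCA noise steps are left out.

```python
import math
import random

# Per-channel RGB statistics on the 0-255 scale (assumed values, see the lead-in).
MEAN = (123.68, 116.779, 103.939)
STD = (58.393, 57.12, 57.375)


def sample_crop_box(width, height, rng, scale=(0.08, 1.0),
                    ratio=(3 / 4, 4 / 3), attempts=10):
    """Return (left, top, crop_w, crop_h) for a random crop of the image."""
    area = width * height
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)   # 8% to 100% of the image
        aspect = rng.uniform(*ratio)               # crop width / crop height
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Fallback (an assumption of this sketch): use the whole image.
    return 0, 0, width, height


def normalize_pixel(rgb):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


rng = random.Random(0)
left, top, crop_w, crop_h = sample_crop_box(500, 375, rng)
flip = rng.random() < 0.5  # horizontal flip with probability 0.5
```

The sampled box would then be cropped and resized to 224×224 by whatever image library is in use.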

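Testing

Resize the shorter side to 256 while keeping the aspect ratio, then crop the central 224×224 region
Normalize by subtracting the mean and dividing by the standard deviation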
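The test-time geometry can be worked through with a small sketch. The helper names are hypothetical, and rounding the resized side to the nearest integer is an assumption; image libraries may round differently.

```python
def resize_shorter_side(width, height, target=256):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# A 500x375 image is resized to 341x256, then cropped to the box (58, 16, 282, 240).
new_w, new_h = resize_shorter_side(500, 375)
box = center_crop_box(new_w, new_h)
```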

Following the standard pipeline above gives the baseline results shown in the table below.

[Image: baseline results table]

4 Efficient Training

4.1 Large-batch training

The benefit of a large batch size is a shorter training time.

However:

For convex problems，convergence rate decreases as batch size increases

If this is not intuitive at first, see this discussion (the answer by Rhett).

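In other words:

for the same number of epochs, training with a large batch size (only the batch size is increased, nothing else changes) results in a model with degraded validation accuracy compared to the ones trained with smaller batch sizes

To speed up training (by increasing the batch size) without hurting convergence (accuracy), the authors offer the following suggestions.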

1) Linear scaling learning rate
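As I read the paper, the rule is: if 0.1 is the initial learning rate for a batch size of 256, then for a batch size b the initial learning rate becomes 0.1 × b / 256. A minimal sketch, where the function name and defaults are chosen here for illustration:

```python
def scaled_learning_rate(batch_size, base_lr=0.1, base_batch_size=256):
    """Scale the initial learning rate linearly with the batch size."""
    return base_lr * batch_size / base_batch_size


# Initial learning rates for a few batch sizes.
for b in (256, 512, 1024):
    print(b, scaled_learning_rate(b))
```

The intuition is that a larger batch reduces the variance of the gradient estimate without changing its expectation, so a proportionally larger step can be taken.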