Dropout Reduces Underfitting
Affiliations: Meta AI, UC Berkeley, MBZUAI
Paper: https://arxiv.org/abs/2303.01500
Code (recently open-sourced): https://github.com/facebookresearch/dropout
Date: submitted March 2, 2023
Reading date: 2023-03-08
Main contributions:
The paper proposes two variants of standard dropout, early dropout and late dropout, which improve the performance of underfitting and overfitting models respectively. Specifically:
- early dropout: apply dropout only during an initial phase of training and disable it for the rest of training, helping underfitting models fit the training data better (a minimal sketch follows this list)
- late dropout: leave dropout off at the start of training and enable it later on, reducing overfitting in models that already use dropout
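To make the idea concrete, here is a minimal sketch of early dropout in a plain PyTorch training loop. The cutoff epoch, dropout rate, and `model` are hypothetical placeholders, and the helper is not from the repo; the official code implements this more generally with a per-iteration schedule, shown below.

import torch.nn as nn

def set_dropout_rate(model, p):
    # Helper (not from the repo): set p on every nn.Dropout module.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

cutoff_epoch, total_epochs, drop_rate = 15, 100, 0.1  # hypothetical values
for epoch in range(total_epochs):
    # Early dropout: on for the first cutoff_epoch epochs, off afterwards.
    set_dropout_rate(model, drop_rate if epoch < cutoff_epoch else 0.0)
    # ... run one epoch of training on `model` ...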
Paper code
Implementation of early dropout and late dropout
# https://github.com/facebookresearch/dropout/blob/main/drop_scheduler.py
import numpy as np

def drop_scheduler(drop_rate, epochs, niter_per_ep, cutoff_epoch=0, mode="standard", schedule="constant"):
    '''
    drop_rate: target dropout rate
    epochs: total number of training epochs
    niter_per_ep: number of training steps per epoch
    cutoff_epoch: with mode "early", the epoch at which dropout stops being used;
                  with mode "late", the epoch at which dropout starts being used
    mode: one of ["standard", "early", "late"]
    schedule: whether the dropout rate is constant or linearly annealed, ["constant", "linear"]
    returns: np.array of length epochs * niter_per_ep,
             the dropout rate for each training iteration
    '''
    assert mode in ["standard", "early", "late"]
    if mode == "standard":
        return np.full(epochs * niter_per_ep, drop_rate)

    early_iters = cutoff_epoch * niter_per_ep
    late_iters = (epochs - cutoff_epoch) * niter_per_ep

    if mode == "early":
        assert schedule in ["constant", "linear"]
        if schedule == 'constant':
            early_schedule = np.full(early_iters, drop_rate)
        elif schedule == 'linear':
            early_schedule = np.linspace(drop_rate, 0, early_iters)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, 0)))
    elif mode == "late":
        assert schedule in ["constant"]
        early_schedule = np.full(early_iters, 0)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, drop_rate)))

    assert len(final_schedule) == epochs * niter_per_ep
    return final_schedule
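For example, early mode with a linear schedule produces one dropout rate per training iteration; the numbers below are made up for illustration:

# 100 epochs, 500 iterations per epoch, early dropout for the first 15 epochs
schedule = drop_scheduler(drop_rate=0.1, epochs=100, niter_per_ep=500,
                          cutoff_epoch=15, mode="early", schedule="linear")
print(schedule.shape)   # (50000,) -- one rate per iteration
print(schedule[0])      # 0.1 at the start of training
print(schedule[-1])     # 0.0 after the cutoff epoch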
Usage of early dropout and late dropout
# Add an update_dropout method when defining the model, e.g.:
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
# ...
def update_dropout(self, drop_rate):
    self.drop_rate = drop_rate
    # Overwrite the dropout probability of every nn.Dropout module in the model.
    for module in self.modules():
        if isinstance(module, nn.Dropout):
            module.p = drop_rate
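A quick sanity check of this pattern on a toy model (the ToyNet class is invented for illustration; it is not from the repo):

import torch.nn as nn

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)
        self.drop = nn.Dropout(p=0.1)

    def forward(self, x):
        return self.drop(self.fc(x))

    def update_dropout(self, drop_rate):
        # Same pattern as the repo: overwrite p on every nn.Dropout module.
        for module in self.modules():
            if isinstance(module, nn.Dropout):
                module.p = drop_rate

net = ToyNet()
net.update_dropout(0.0)  # e.g. after the early-dropout cutoff
print(net.drop.p)        # 0.0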
# The same pattern applies to drop_path (stochastic depth); see the timm implementation:
# https://github.com/huggingface/pytorch-image-models/blob/4b8cfa6c0a355a9b3cb2a77298b240213fb3b921/timm/layers/drop.py#L137
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
def update_drop_path(self, drop_path_rate):
    self.drop_path = drop_path_rate
    # Linearly scale the drop-path rate across the transformer blocks.
    dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, self.depth)]
    for i in range(self.depth):
        self.blocks[i].drop_path.drop_prob = dp_rates[i]
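For reference, a minimal sketch of the drop-path (stochastic depth) forward pass, in the spirit of the timm implementation linked above; this is not the exact timm code:

import torch

def drop_path(x, drop_prob=0.0, training=False):
    # Randomly zero entire residual branches per sample, scaling the
    # survivors by 1 / keep_prob to preserve the expected value.
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    return x * mask / keep_prob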
# During training, update the dropout rate at every batch:
# https://github.com/facebookresearch/dropout/blob/main/engine.py#L114
model.module.update_dropout(schedules['do'][it])
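Putting the pieces together, a simplified sketch of the training loop: the schedules dict and the it counter follow the repo's engine.py, but data_loader, criterion, and optimizer are placeholders, and model.module assumes the model is wrapped in DistributedDataParallel, as in the repo.

schedules = {
    'do': drop_scheduler(0.1, epochs=100, niter_per_ep=len(data_loader),
                         cutoff_epoch=15, mode="early", schedule="linear"),
}

it = 0  # global iteration counter across all epochs
for epoch in range(100):
    for samples, targets in data_loader:
        # Refresh the dropout rate for this iteration before the forward pass.
        model.module.update_dropout(schedules['do'][it])
        loss = criterion(model(samples), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        it += 1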
Paper results
See the paper and the GitHub repo for detailed results.
References:
- https://mp.weixin.qq.com/s/TqdOoHMtbQxveNSGgRC6Rw
- https://github.com/facebookresearch/dropout
- https://stackoverflow.com/questions/69175642/droppath-in-timm-seems-like-a-dropout