Dropout Reduces Underfitting
Affiliations: Meta AI, UC Berkeley, MBZUAI
Paper: https://arxiv.org/abs/2303.01500
Code (recently open-sourced): https://github.com/facebookresearch/dropout
Date: submitted March 2, 2023
Reading date: 2023-03-08
Main contributions:
The paper proposes two variants of standard dropout, early dropout and late dropout, which improve the performance of underfitting and overfitting models respectively. Specifically:
- early dropout: apply dropout only during an initial phase of training and disable it for the rest of training, helping underfitting models fit the training data better (a minimal sketch follows this list)
- late dropout: leave dropout off at the start of training and enable it later on, reducing overfitting in models that already use dropout
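To make the idea concrete, here is a minimal sketch of early dropout in a plain PyTorch training loop. The cutoff epoch, dropout rate, and `model` are hypothetical placeholders, and the helper is not from the repo; the official code implements this more generally with a per-iteration schedule, shown below.

import torch.nn as nn

def set_dropout_rate(model, p):
    # Helper (not from the repo): set p on every nn.Dropout module.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

cutoff_epoch, total_epochs, drop_rate = 15, 100, 0.1  # hypothetical values
for epoch in range(total_epochs):
    # Early dropout: on for the first cutoff_epoch epochs, off afterwards.
    set_dropout_rate(model, drop_rate if epoch < cutoff_epoch else 0.0)
    # ... run one epoch of training on `model` ...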
Paper code
Implementation of early dropout and late dropout
# https://github.com/facebookresearch/dropout/blob/main/drop_scheduler.py
import numpy as np

def drop_scheduler(drop_rate, epochs, niter_per_ep, cutoff_epoch=0, mode="standard", schedule="constant"):
    '''
    drop_rate: target dropout rate
    epochs: total number of training epochs
    niter_per_ep: number of training steps per epoch
    cutoff_epoch: with mode "early", the epoch at which dropout stops being used;
                  with mode "late", the epoch at which dropout starts being used
    mode: one of ["standard", "early", "late"]
    schedule: whether the dropout rate is constant or linearly annealed, ["constant", "linear"]
    returns: np.array of length epochs * niter_per_ep,
             the dropout rate for each training iteration
    '''
    assert mode in ["standard", "early", "late"]
    if mode == "standard":
        return np.full(epochs * niter_per_ep, drop_rate)

    early_iters = cutoff_epoch * niter_per_ep
    late_iters = (epochs - cutoff_epoch) * niter_per_ep

    if mode == "early":
        assert schedule in ["constant", "linear"]
        if schedule == 'constant':
            early_schedule = np.full(early_iters, drop_rate)
        elif schedule == 'linear':
            early_schedule = np.linspace(drop_rate, 0, early_iters)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, 0)))
    elif mode == "late":
        assert schedule in ["constant"]
        early_schedule = np.full(early_iters, 0)
        final_schedule = np.concatenate((early_schedule, np.full(late_iters, drop_rate)))

    assert len(final_schedule) == epochs * niter_per_ep
    return final_schedule
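For example, early mode with a linear schedule produces one dropout rate per training iteration; the numbers below are made up for illustration:

# 100 epochs, 500 iterations per epoch, early dropout for the first 15 epochs
schedule = drop_scheduler(drop_rate=0.1, epochs=100, niter_per_ep=500,
                          cutoff_epoch=15, mode="early", schedule="linear")
print(schedule.shape)   # (50000,) -- one rate per iteration
print(schedule[0])      # 0.1 at the start of training
print(schedule[-1])     # 0.0 after the cutoff epoch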
Usage of early dropout and late dropout
# Add an update_dropout method when defining the model, e.g.:
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
# ...
def update_dropout(self, drop_rate):
    self.drop_rate = drop_rate
    # Overwrite the dropout probability of every nn.Dropout module in the model.
    for module in self.modules():
        if isinstance(module, nn.Dropout):
            module.p = drop_rate
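A quick sanity check of this pattern on a toy model (the ToyNet class is invented for illustration; it is not from the repo):

import torch.nn as nn

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)
        self.drop = nn.Dropout(p=0.1)

    def forward(self, x):
        return self.drop(self.fc(x))

    def update_dropout(self, drop_rate):
        # Same pattern as the repo: overwrite p on every nn.Dropout module.
        for module in self.modules():
            if isinstance(module, nn.Dropout):
                module.p = drop_rate

net = ToyNet()
net.update_dropout(0.0)  # e.g. after the early-dropout cutoff
print(net.drop.p)        # 0.0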
# The same pattern applies to drop_path (stochastic depth); see the timm implementation:
# https://github.com/huggingface/pytorch-image-models/blob/4b8cfa6c0a355a9b3cb2a77298b240213fb3b921/timm/layers/drop.py#L137
# https://github.com/facebookresearch/dropout/blob/main/models/vision_transformer.py
def update_drop_path(self, drop_path_rate):
    self.drop_path = drop_path_rate
    # Linearly scale the drop-path rate across the transformer blocks.
    dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, self.depth)]
    for i in range(self.depth):
        self.blocks[i].drop_path.drop_prob = dp_rates[i]
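For reference, a minimal sketch of the drop-path (stochastic depth) forward pass, in the spirit of the timm implementation linked above; this is not the exact timm code:

import torch

def drop_path(x, drop_prob=0.0, training=False):
    # Randomly zero entire residual branches per sample, scaling the
    # survivors by 1 / keep_prob to preserve the expected value.
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    return x * mask / keep_prob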
# During training, update the dropout rate at every batch:
# https://github.com/facebookresearch/dropout/blob/main/engine.py#L114
model.module.update_dropout(schedules['do'][it])
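Putting the pieces together, a simplified sketch of the training loop: the schedules dict and the it counter follow the repo's engine.py, but data_loader, criterion, and optimizer are placeholders, and model.module assumes the model is wrapped in DistributedDataParallel, as in the repo.

schedules = {
    'do': drop_scheduler(0.1, epochs=100, niter_per_ep=len(data_loader),
                         cutoff_epoch=15, mode="early", schedule="linear"),
}

it = 0  # global iteration counter across all epochs
for epoch in range(100):
    for samples, targets in data_loader:
        # Refresh the dropout rate for this iteration before the forward pass.
        model.module.update_dropout(schedules['do'][it])
        loss = criterion(model(samples), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        it += 1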
Paper results
See the paper and the GitHub repo for detailed results.
References:
- https://mp.weixin.qq.com/s/TqdOoHMtbQxveNSGgRC6Rw
- https://github.com/facebookresearch/dropout
- https://stackoverflow.com/questions/69175642/droppath-in-timm-seems-like-a-dropout