Model Training Tricks: CutMix

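This post walks through CutMix, a data augmentation technique that combines the strengths of Cutout and Mixup: a rectangular region is cropped at random from one image and pasted onto another, which improves model generalization. Experiments show notable gains on image recognition and object detection tasks.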

Paper: https://arxiv.org/pdf/1905.04899v2.pdf

Official code: https://github.com/clovaai/CutMix-PyTorch

1. Core idea of the paper

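Put simply, a rectangular region is cropped at random from image A, and its pixels replace the corresponding rectangular region of image B, producing a new composite image. The labels are combined linearly in the same proportion (the fraction of the whole image's area covered by the rectangle), and the loss is computed against that combination.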
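The cut-and-paste step and the area-based label weight can be sketched on toy data. This is a minimal illustration that uses only the standard library and nested lists instead of tensors; `rand_box`, `cutmix_pair`, `patch_frac`, and `weight_a` are hypothetical names chosen for this sketch. It assumes the box side lengths are scaled by the square root of the target area fraction, as the implementation in section 2 does.

```python
import math
import random


def rand_box(width, height, patch_frac, rng):
    """Sample a box whose area is about patch_frac of the image (before clipping)."""
    side_ratio = math.sqrt(patch_frac)
    cut_w, cut_h = int(width * side_ratio), int(height * side_ratio)
    cx, cy = rng.randrange(width), rng.randrange(height)  # random box centre

    def clip(value, upper):
        return max(0, min(value, upper))

    x1, x2 = clip(cx - cut_w // 2, width), clip(cx + cut_w // 2, width)
    y1, y2 = clip(cy - cut_h // 2, height), clip(cy + cut_h // 2, height)
    return x1, y1, x2, y2


def cutmix_pair(img_a, img_b, patch_frac, rng):
    """Paste a random rectangle of img_a over the same rectangle of img_b."""
    height, width = len(img_b), len(img_b[0])
    x1, y1, x2, y2 = rand_box(width, height, patch_frac, rng)
    mixed = [row[:] for row in img_b]  # copy so img_b is left untouched
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_a[y][x1:x2]
    # Clipping at the border can shrink the box, so the label weight of A
    # is recomputed from the area that was actually pasted.
    weight_a = (x2 - x1) * (y2 - y1) / (width * height)
    return mixed, weight_a


rng = random.Random(0)
img_a = [[1] * 8 for _ in range(8)]  # toy image of class A (all ones)
img_b = [[0] * 8 for _ in range(8)]  # toy image of class B (all zeros)
mixed, weight_a = cutmix_pair(img_a, img_b, patch_frac=0.25, rng=rng)
frac_from_a = sum(map(sum, mixed)) / 64  # share of pixels that came from A
print(weight_a, 1 - weight_a, frac_from_a)
```

The mixed label is then `weight_a` for class A and `1 - weight_a` for class B. Note that the code in section 2 tracks the complementary quantity: its `lam` is the weight of the image that receives the patch.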

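The paper formulates it as follows:

\[
\tilde{x} = \mathbf{M} \odot x_A + (\mathbf{1} - \mathbf{M}) \odot x_B, \qquad
\tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B
\]

Here \mathbf{M} \in \{0, 1\}^{W \times H} is a binary mask that is 0 inside the rectangle and 1 elsewhere, \odot is element-wise multiplication, and \lambda is sampled from Beta(\alpha, \alpha). In this notation the rectangle is filled from x_B, and \lambda is the share of the area kept from x_A.

Both the images and the labels are combined linearly.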
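For cross-entropy, combining the labels linearly is the same as combining the two hard-label losses linearly, which is the form `cutmix_criterion` uses in section 2. The sketch below checks this numerically with the standard library only; the logits, labels, and mixing weight are made-up values, and the helper names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for a single sample."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]


def cross_entropy(logits, target_probs):
    """Cross-entropy between softmax(logits) and a target distribution."""
    return -sum(t * lp for t, lp in zip(target_probs, log_softmax(logits)))


def one_hot(index, num_classes):
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


logits = [2.0, -1.0, 0.5]   # made-up network output for one mixed image
y_a, y_b, lam = 0, 2, 0.7   # made-up labels and mixing weight

# Form 1: weighted sum of two hard-label losses.
loss_two_terms = (lam * cross_entropy(logits, one_hot(y_a, 3))
                  + (1 - lam) * cross_entropy(logits, one_hot(y_b, 3)))

# Form 2: a single loss against the mixed (soft) label.
soft_label = [lam * a + (1 - lam) * b
              for a, b in zip(one_hot(y_a, 3), one_hot(y_b, 3))]
loss_soft = cross_entropy(logits, soft_label)

print(loss_two_terms, loss_soft)
```

The two values agree up to floating-point error because cross-entropy is linear in the target distribution. This equivalence would not hold in general for a loss that is nonlinear in the target.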

2. Code implementation

import numpy as np
import torch


def cutmix_criterion(criterion, pred, y_a, y_b, lam):
    # Weight the two losses by the area share of each source image
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)


def cutmix_data(x, y, alpha=1.0):
    '''Returns mixed inputs, pairs of targets, and lambda'''
    assert alpha > 0
    lam = np.random.beta(alpha, alpha)
    batch_size = x.size()[0]

    # Shuffle on the same device as x, so this works on both CPU and GPU
    index = torch.randperm(batch_size, device=x.device)

    y_a, y_b = y, y[index]
    bbx1, bby1, bbx2, bby2 = rand_bbox(x.size(), lam)
    # Note: this overwrites the box region of x in place
    x[:, :, bbx1:bbx2, bby1:bby2] = x[index, :, bbx1:bbx2, bby1:bby2]

    # adjust lambda to exactly match pixel ratio
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (x.size()[-1] * x.size()[-2]))

    return x, y_a, y_b, float(lam)


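def rand_bbox(size, lam):
    # size is (N, C, H, W), so the names W and H are swapped relative to the
    # tensor layout. The result is still consistent, because cutmix_data
    # slices dim 2 with the "x" bounds and dim 3 with the "y" bounds.
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24
    cut_h = int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # Clip to the image border; this can shrink the box
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2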


# Make the following changes in the train function; nothing else needs to change
    for (inputs, targets) in tqdm(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)

        r = np.random.rand()
        if r < 0.5:  # apply CutMix with probability 0.5
            inputs, targets_a, targets_b, lam = cutmix_data(inputs, targets)
            # No Variable wrapper is needed: tensors track gradients directly since PyTorch 0.4
            outputs = net(inputs)
            loss = cutmix_criterion(criterion, outputs, targets_a.long(), targets_b.long(), lam)
        else:
            outputs = net(inputs)
            loss = criterion(outputs, targets.long())

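The official code puts all of this inside the train function. I found that function too long, so I wrapped the core CutMix logic into separate functions, which reads more cleanly.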
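One thing to keep in mind when wrapping the logic this way: `cutmix_data` writes the pasted region into `x` itself, so the caller's batch is modified. If the original batch is needed afterwards (for logging or a second forward pass, say), a non-mutating variant may be preferable. The following is a sketch under that assumption, not the official implementation; `cutmix_batch` is a hypothetical name, and it samples with `torch` rather than NumPy purely to keep the example self-contained.

```python
import torch


def cutmix_batch(x, y, alpha=1.0):
    """Return a CutMix-ed copy of x, both label sets, and the adjusted lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0), device=x.device)

    height, width = x.shape[-2:]
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy = int(torch.randint(height, (1,)))  # random box centre
    cx = int(torch.randint(width, (1,)))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)

    mixed = x.clone()  # leave the caller's batch untouched
    mixed[:, :, y1:y2, x1:x2] = x[index, :, y1:y2, x1:x2]
    # Recompute lambda from the area that was actually pasted
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    return mixed, y, y[index], lam
```

It can be dropped into the training loop in place of `cutmix_data`, at the cost of one extra copy of the batch per step.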