【Copy-Paste】《Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation》


CVPR-2021

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Appendix
7 Conclusion (own)

1 Background and Motivation

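Instance segmentation models are often data-hungry.

Manual annotation is expensive, so this paper focuses on data augmentation as a way to ease the problem.

Many data augmentation methods have been proposed, but they are more general-purpose in nature and have not been designed specifically for instance segmentation.

The authors propose Copy-Paste, a data augmentation method for instance segmentation: randomly pick objects and paste them at random locations on the target image.

The method is similar to 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》. The differences are:

1）It does not use geometric transformations (e.g. rotation), and the authors find Gaussian blurring of the pasted instances not beneficial.

2）It is extended to semi-supervised learning.

3）It pastes objects contained in one image into another image already populated with instances (rather than pasting pure foreground objects onto pure background images).

4）The evaluation moves from the GMU Kitchen Scenes dataset to the more widely used COCO and LVIS.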


2 Related Work

Data Augmentations: mainly used for encoding invariances to data transformations, which benefits classification.
Mixing Image Augmentations (mixup, CutMix and Mosaic): still not object-aware and not designed specifically for the task of instance segmentation.
Copy-Paste Augmentation
Instance Segmentation
Long-Tail Visual Recognition: prior work relies on data re-sampling and loss re-weighting; the authors' method yields significant gains on top of these.

3 Advantages / Contributions

The paper proposes the Copy-Paste method, which provides a significant boost on top of baselines across multiple settings.

It gives solid improvements across a wide range of settings with variability in backbone architecture, extent of scale jittering, training schedule and image size.

4 Method

For a randomly selected pair of images, the pipeline is:

Apply scale jittering and random horizontal flipping to each image.
Select a random subset of objects from one of the images and paste them onto the other image.
Adjust the ground-truth annotations: remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.

The generated images can look very different from real images.
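The annotation-adjustment step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `update_annotations` and the list-of-boolean-masks representation are assumptions made here, and at least one non-empty pasted mask is assumed.

```python
import numpy as np

def update_annotations(main_masks, pasted_masks):
    """Update main-image instances after new instances are pasted on top.

    main_masks, pasted_masks: lists of boolean arrays of shape (H, W).
    Assumes pasted_masks is non-empty and every pasted mask has pixels.
    Returns (masks, boxes); boxes are (x_min, y_min, x_max, y_max), inclusive.
    """
    # Union of all pasted pixels: these now hide the main image.
    occluder = np.zeros_like(pasted_masks[0], dtype=bool)
    for m in pasted_masks:
        occluder |= m

    kept = []
    for m in main_masks:
        visible = m & ~occluder
        if visible.any():         # fully occluded objects are dropped
            kept.append(visible)  # partially occluded ones keep the visible part
    kept.extend(pasted_masks)     # pasted objects sit on top, so they are unchanged

    boxes = []
    for m in kept:
        ys, xs = np.nonzero(m)
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return kept, boxes
```

Recomputing each box from the surviving mask keeps boxes and masks consistent without a separate box-clipping rule.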

1）Blending Pasted Objects

$I_1 \times \alpha + I_2 \times (1-\alpha)$

where $\alpha$ is the binary mask of the pasted objects, $I_1$ is the pasted image and $I_2$ is the main image.

Simply composing without any blending has similar performance. This differs from 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》, which explored several composing methods.
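The composition formula can be written directly in NumPy. This is a sketch: the function name `blend` is an assumption made here, and `alpha` may be either a hard binary mask or a smoothed one with values in [0, 1].

```python
import numpy as np

def blend(pasted_img, main_img, alpha):
    """Compose I1 * alpha + I2 * (1 - alpha).

    pasted_img (I1), main_img (I2): uint8 arrays of shape (H, W, 3).
    alpha: array of shape (H, W) with values in [0, 1].
    """
    a = alpha.astype(np.float32)[..., None]  # broadcast over the channel axis
    out = pasted_img.astype(np.float32) * a + main_img.astype(np.float32) * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

With a binary `alpha` this reduces to a hard paste: pixels inside the mask come from the pasted image and all others from the main image.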

2）Large Scale Jittering

Standard scale jittering (SSJ): resize range of 0.8 to 1.25 of the original image size
Large scale jittering (LSJ): resize range of 0.1 to 2.0 of the original image size
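To make the two ranges concrete, the sketch below samples a resize factor and the resulting image size. The names `SSJ_RANGE`, `LSJ_RANGE` and `sample_jittered_size` are made up for illustration, and uniform sampling is an assumption; after resizing, the image would still be padded or cropped back to a fixed size.

```python
import numpy as np

SSJ_RANGE = (0.8, 1.25)  # standard scale jittering
LSJ_RANGE = (0.1, 2.0)   # large scale jittering

def sample_jittered_size(height, width, scale_range, rng):
    """Sample a resize factor uniformly and return it with the new (height, width)."""
    scale = rng.uniform(*scale_range)
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    return scale, (new_h, new_w)

rng = np.random.default_rng(0)
scale, (h, w) = sample_jittered_size(1024, 1024, LSJ_RANGE, rng)
```

Under LSJ a 1024-pixel side can end up anywhere between roughly 102 and 2048 pixels, which is a far wider spread of object scales than SSJ produces.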

3）Self-training Copy-Paste

This follows the standard self-training procedure:

Train a supervised model with Copy-Paste augmentation on labeled data.
Generate pseudo labels on unlabeled data.
Paste ground-truth instances into pseudo labeled and supervised labeled images and train a model on this new data.
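The three steps can be laid out as a small driver function. This is only a structural sketch: `train_fn`, `predict_fn` and `copy_paste_fn` are hypothetical callables supplied by the caller, and their signatures are assumptions made here rather than any real library API.

```python
def self_training_copy_paste(labeled, unlabeled, train_fn, predict_fn, copy_paste_fn):
    """Self-training with Copy-Paste.

    labeled: list of (image, annotations) pairs with ground-truth annotations.
    unlabeled: list of images.
    train_fn(data, augment) -> model, predict_fn(model, image) -> annotations,
    copy_paste_fn(sample, source_pool) -> augmented sample (all hypothetical).
    """
    # Pasted instances always come from the ground-truth (labeled) pool.
    def paste_gt(sample):
        return copy_paste_fn(sample, source_pool=labeled)

    # 1) Supervised model trained on labeled data with Copy-Paste.
    teacher = train_fn(labeled, augment=paste_gt)

    # 2) Pseudo labels for the unlabeled images.
    pseudo = [(image, predict_fn(teacher, image)) for image in unlabeled]

    # 3) New model trained on labeled + pseudo-labeled images,
    #    with ground-truth instances pasted into both.
    student = train_fn(labeled + pseudo, augment=paste_gt)
    return student
```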

5 Experiments

Training a model from scratch with large scale jittering and Copy-Paste augmentation requires 576 epochs, while training with only standard scale jittering takes 96 epochs.

With large scale jittering and Copy-Paste, the model needs, and benefits from, a much longer training schedule.

5.1 Datasets

COCO
VOC
LVIS
LVIS has 1203 classes to simulate the long-tail distribution of classes in natural images.

5.2 Copy-Paste is robust to training configurations

Copy-Paste is robust across a variety of training iterations, models and training hyperparameters.

1）Robustness to backbone initialization


2）Robustness to training schedules

Training for longer improves performance further.

3）Copy-Paste is additive to large scale jittering augmentation
mixup works reasonably well with SSJ, but its gain is largely cancelled out when combined with LSJ. Copy-Paste, in contrast, remains compatible with LSJ.

4）Copy-Paste works across backbone architectures and image sizes

The results show the gains at different image sizes.

5.3 Copy-Paste helps data-efficiency

The same accuracy can be reached with less data.

A model trained on 75% of COCO with Copy-Paste and LSJ has a similar AP to a model trained on 100% of COCO with LSJ.

5.4 Copy-Paste and self-training are additive

These are the self-training experiments, which bring in additional datasets.

1）Data to Paste on

supervised COCO data (120k images)
pseudo labeled data (110k images from unlabeled COCO and 610k from Objects365).

The results in Table 3 are consistent with those in Table 2.

2）Data to Copy from

This variant pastes pseudo labeled objects from an unlabeled dataset directly into the COCO labeled dataset.

Pasting in this reverse direction gives no additional AP improvements.

5.5 Copy-Paste improves COCO state-of-the-art


5.6 Copy-Paste produces better representations for PASCAL detection and segmentation

Models trained with Copy-Paste on COCO are used for transfer learning experiments on the PASCAL VOC 2007 dataset.

5.7 Copy-Paste provides strong gains on LVIS

LVIS is a long-tailed dataset.

Two different training paradigms are typically used for LVIS:

single-stage, where a detector is trained directly on the LVIS dataset
two-stage, where the model from the first stage is fine-tuned with class re-balancing losses to help handle the class imbalance.
The re-balancing term takes the form $(1-\beta)/(1-\beta^n)$, where $n$ is the number of instances of the class and $\beta = 0.999$
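A quick numeric look at the re-balancing term $(1-\beta)/(1-\beta^n)$ shows how it favours rare classes. The function name `class_balanced_weight` is an assumption made here for illustration.

```python
def class_balanced_weight(n, beta=0.999):
    """Return (1 - beta) / (1 - beta ** n) for a class with n instances."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - beta) / (1.0 - beta ** n)

# Print the weight for classes of increasing size.
for n in (1, 10, 100, 1000, 10000):
    print(n, class_balanced_weight(n))
```

A class with a single instance gets weight 1, and the weight decreases monotonically toward $1-\beta$ as $n$ grows, so frequent classes are down-weighted relative to rare ones.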

1）Copy-Paste improves single-stage LVIS training

APr is the AP for rare classes.

Repeat Factor Sampling (RFS) is used to handle the class imbalance problem on LVIS.

The results show that Copy-Paste and RFS are compatible.
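For reference, RFS as described in the LVIS paper assigns each category a repeat factor $\max(1, \sqrt{t / f_c})$, where $f_c$ is the fraction of training images containing category $c$, and each image takes the maximum factor over its categories. The sketch below follows that description; the function name `repeat_factors` and the input format are assumptions made here, and the threshold `t` is left as a parameter.

```python
import math
from collections import Counter

def repeat_factors(image_categories, t=0.001):
    """Per-image repeat factors.

    image_categories: list with one set of category ids per image.
    t: frequency threshold below which a category gets oversampled.
    """
    num_images = len(image_categories)
    # Number of images in which each category appears.
    image_count = Counter(c for cats in image_categories for c in set(cats))
    cat_rf = {
        c: max(1.0, math.sqrt(t / (count / num_images)))
        for c, count in image_count.items()
    }
    # An image is repeated according to its rarest category.
    return [max((cat_rf[c] for c in cats), default=1.0) for cats in image_categories]
```

Images containing only frequent categories keep a factor of 1, so RFS changes how often an image is sampled, while Copy-Paste changes the image content; the two operate at different points of the data pipeline.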

2）Copy-Paste improves two-stage LVIS training


3）Comparison with the state-of-the-art


6 Appendix

6.1 Ablation on the Copy-Paste method

1）Subset of pasted objects

Pasting a random subset of objects works best; pasting only one object or all objects is worse than a random subset.
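One simple way to draw such a random subset is sketched below. The sampling distribution (a uniformly random subset size, then a uniform choice without replacement) is an assumption made here, as is the function name `pick_random_subset`; the notes above do not specify how the subset is drawn.

```python
import numpy as np

def pick_random_subset(num_objects, rng):
    """Return sorted indices of a random, non-empty subset of the objects."""
    if num_objects == 0:
        return np.array([], dtype=int)
    k = rng.integers(1, num_objects + 1)  # number of objects to paste, 1..num_objects
    return np.sort(rng.choice(num_objects, size=k, replace=False))

rng = np.random.default_rng(0)
indices = pick_random_subset(5, rng)  # indices of the instances to copy
```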

2）Blending

Table 10 shows that the form of blending matters little; dropping blending altogether does not hurt AP.

3）Scale jittering

Random scale jittering is applied to both the pasted image (the image that pasted objects are copied from) and the main image.
Most of the gain from LSJ comes from augmenting the main image.

6.2 Copy-Paste provides more gain on harder categories of COCO

The x-axis sorts the categories by their baseline AP; the y-axis is the relative improvement, not the absolute gain in AP points.

6.3 How likely objects are copied to an un-matched scene?

The authors compute the probability of copying objects to an unmatched scene category.

Only indoor and outdoor scenes are distinguished.

We found there are 42538 indoor and 71017 outdoor images (we couldn't estimate the category of the rest 4732 images).
Objects are copied to an unmatched scene in about half (46.8%) of generated images (23.4% + 23.4%).
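The two 23.4% terms can be reproduced from the image counts, assuming the source and target images are drawn independently and uniformly from the images with a known scene category. That sampling assumption and the function name `unmatched_scene_probability` are introduced here for illustration.

```python
def unmatched_scene_probability(num_indoor, num_outdoor):
    """Probabilities of pasting indoor->outdoor and outdoor->indoor."""
    total = num_indoor + num_outdoor
    p_in, p_out = num_indoor / total, num_outdoor / total
    indoor_to_outdoor = p_in * p_out  # source indoor, target outdoor
    outdoor_to_indoor = p_out * p_in  # source outdoor, target indoor
    return indoor_to_outdoor, outdoor_to_indoor

a, b = unmatched_scene_probability(42538, 71017)
print(f"{a:.1%} + {b:.1%}")  # each direction rounds to 23.4%
```

The two directions are equal by symmetry, and their sum is just under 47%, in line with the figure quoted above.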

6.4 Benchmark results on different object sizes


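7 Conclusion (own)

The results are impressive: first place on COCO for both object detection and instance segmentation, reaching 57.3 AP on detection and 49.1 AP on instance segmentation (Table 4).
Instaboost: Boosting instance segmentation via probability map guided copy-pasting. In ICCV, 2019.
Self-training in semi-supervised learning
Copy-Paste: a simple yet effective data augmentation
Code reproduction: Copy-Paste data augmentation for semantic segmentation
The data is expected in VOC format.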

"""
Unofficial implementation of Copy-Paste for semantic segmentation
"""
 
from PIL import Image
import imgviz
import cv2
import argparse
import os
import numpy as np
import tqdm
 
 
def save_colored_mask(mask, save_path):
    lbl_pil = Image.fromarray(mask.astype(np.uint8), mode="P")
    colormap = imgviz.label_colormap()
    lbl_pil.putpalette(colormap.flatten())
    lbl_pil.save(save_path)
 
 
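def random_flip_horizontal(mask, img, p=0.5):
    """Flip mask and image left-right together with probability p."""
    if np.random.random() < p:
        # copy into contiguous arrays: a reversed slice is a negatively strided
        # view, which some OpenCV calls may not accept
        img = np.ascontiguousarray(img[:, ::-1, :])
        mask = np.ascontiguousarray(mask[:, ::-1])
    return mask, img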
 
 
def img_add(img_src, img_main, mask_src):
    """Paste the masked region of img_src onto img_main (works for images and label masks)."""
    h, w = img_main.shape[:2]
    mask = np.asarray(mask_src, dtype=np.uint8)
    # cut out the src foreground (pixels where mask is non-zero)
    sub_img01 = cv2.add(img_src, np.zeros(np.shape(img_src), dtype=np.uint8), mask=mask)
    # resize the mask to the main image and cut out the region of img_main it covers
    mask_02 = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    mask_02 = np.asarray(mask_02, dtype=np.uint8)
    sub_img02 = cv2.add(img_main, np.zeros(np.shape(img_main), dtype=np.uint8),
                        mask=mask_02)
    # remove that region from main, then add the (resized) src foreground in its place
    img_main = img_main - sub_img02 + cv2.resize(sub_img01, (w, h),
                                                 interpolation=cv2.INTER_NEAREST)
    return img_main
 
 
def rescale_src(mask_src, img_src, h, w):
    """Shrink the source image/mask to fit an (h, w) canvas and place it at a random offset."""
    h_src, w_src = mask_src.shape[:2]
    max_reshape_ratio = min(h / h_src, w / w_src)
    # keep the lower bound at or below the upper bound so the result always fits
    rescale_ratio = np.random.uniform(min(0.2, max_reshape_ratio), max_reshape_ratio)

    # resize src img and mask (at least 1 pixel per side)
    rescale_h = max(1, int(h_src * rescale_ratio))
    rescale_w = max(1, int(w_src * rescale_ratio))
    mask_src = cv2.resize(mask_src, (rescale_w, rescale_h),
                          interpolation=cv2.INTER_NEAREST)
    img_src = cv2.resize(img_src, (rescale_w, rescale_h),
                         interpolation=cv2.INTER_LINEAR)

    # set paste coord
    py = int(np.random.random() * (h - rescale_h))
    px = int(np.random.random() * (w - rescale_w))

    # paste src img and mask onto a zero background; slice with the resized
    # dimensions so the target region always matches the resized arrays
    img_pad = np.zeros((h, w, 3), dtype=np.uint8)
    mask_pad = np.zeros((h, w), dtype=np.uint8)
    img_pad[py:py + rescale_h, px:px + rescale_w, :] = img_src
    mask_pad[py:py + rescale_h, px:px + rescale_w] = mask_src

    return mask_pad, img_pad
 
 
def Large_Scale_Jittering(mask, img, min_scale=0.1, max_scale=2.0):
    """Randomly rescale img/mask, then pad or crop back to the original size."""
    rescale_ratio = np.random.uniform(min_scale, max_scale)
    h, w, _ = img.shape

    # rescale (at least 1 pixel per side so cv2.resize does not fail on tiny images)
    h_new, w_new = max(1, int(h * rescale_ratio)), max(1, int(w * rescale_ratio))
    img = cv2.resize(img, (w_new, h_new), interpolation=cv2.INTER_LINEAR)
    mask = cv2.resize(mask, (w_new, h_new), interpolation=cv2.INTER_NEAREST)

    # crop or padding: random offset within the size difference
    x, y = int(np.random.uniform(0, abs(w_new - w))), int(np.random.uniform(0, abs(h_new - h)))
    if rescale_ratio <= 1.0:  # padding with a gray background
        img_pad = np.full((h, w, 3), 168, dtype=np.uint8)
        mask_pad = np.zeros((h, w), dtype=np.uint8)
        img_pad[y:y+h_new, x:x+w_new, :] = img
        mask_pad[y:y+h_new, x:x+w_new] = mask
        return mask_pad, img_pad
    else:  # crop
        img_crop = img[y:y+h, x:x+w, :]
        mask_crop = mask[y:y+h, x:x+w]
        return mask_crop, img_crop
 
 
def copy_paste(mask_src, img_src, mask_main, img_main, lsj=True):
    """Paste the labeled foreground of the source image onto the main image."""
    mask_src, img_src = random_flip_horizontal(mask_src, img_src)
    mask_main, img_main = random_flip_horizontal(mask_main, img_main)

    # passed in explicitly instead of reading the global `args`
    if lsj:
        # LSJ, Large Scale Jittering
        mask_src, img_src = Large_Scale_Jittering(mask_src, img_src)
        mask_main, img_main = Large_Scale_Jittering(mask_main, img_main)
    else:
        # rescale mask_src/img_src to less than mask_main/img_main's size
        h, w, _ = img_main.shape
        mask_src, img_src = rescale_src(mask_src, img_src, h, w)

    img = img_add(img_src, img_main, mask_src)  # cut out the src foreground and paste it onto main
    mask = img_add(mask_src, mask_main, mask_src)

    return mask, img


def main(args):
    # input path
    segclass = os.path.join(args.input_dir, 'SegmentationClass')
    JPEGs = os.path.join(args.input_dir, 'JPEGImages')

    # create output path
    os.makedirs(args.output_dir, exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'SegmentationClass'), exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'JPEGImages'), exist_ok=True)

    masks_path = os.listdir(segclass)
    tbar = tqdm.tqdm(masks_path, ncols=100)
    for mask_path in tbar:
        # get source mask and img
        mask_src = np.asarray(Image.open(os.path.join(segclass, mask_path)), dtype=np.uint8)
        img_src = cv2.imread(os.path.join(JPEGs, mask_path.replace('.png', '.jpg')))

        # random choice main mask/img
        mask_main_path = np.random.choice(masks_path)
        mask_main = np.asarray(Image.open(os.path.join(segclass, mask_main_path)), dtype=np.uint8)
        img_main = cv2.imread(os.path.join(JPEGs, mask_main_path.replace('.png', '.jpg')))

        # Copy-Paste data augmentation
        mask, img = copy_paste(mask_src, img_src, mask_main, img_main, lsj=args.lsj)

        mask_filename = "copy_paste_" + mask_path
        img_filename = mask_filename.replace('.png', '.jpg')
        save_colored_mask(mask, os.path.join(args.output_dir, 'SegmentationClass', mask_filename))  # save the mask
        cv2.imwrite(os.path.join(args.output_dir, 'JPEGImages', img_filename), img)  # save the composited image


def str2bool(value):
    # argparse's type=bool treats any non-empty string (even "False") as True
    return str(value).lower() in ("1", "true", "yes", "y")


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", default="../dataset/VOCdevkit2012/VOC2012", type=str,
                        help="input annotated directory")
    parser.add_argument("--output_dir", default="../dataset/VOCdevkit2012/VOC2012_copy_paste", type=str,
                        help="output dataset directory")
    parser.add_argument("--lsj", default=True, type=str2bool, help="if use Large Scale Jittering")
    return parser.parse_args()


if __name__ == '__main__':
    args = get_args()
    main(args)