【Copy-Paste】《Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation》


CVPR-2021



1 Background and Motivation

Instance segmentation models are often data-hungry.

Manual annotation is expensive, so this paper focuses on data augmentation to alleviate the problem.

Many data augmentation methods have been proposed, but they are more general-purpose in nature and have not been designed specifically for instance segmentation.

The authors propose Copy-Paste, a data augmentation method for instance segmentation: randomly picking objects and pasting them at random locations on the target image.

The method is similar to 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》, with the following differences:

1)It does not use geometric transformations (e.g. rotation), and it finds Gaussian blurring of the pasted instances not beneficial

2)It is extended to semi-supervised learning

3)It pastes objects contained in one image into another image already populated with instances (rather than pasting pure foreground objects onto pure background images)

4)The datasets move from CMU to the more widely used COCO and LVIS

2 Related Work

  • Data Augmentations
    mainly used for encoding invariances to data transformations, which benefits classification
  • Mixing Image Augmentations(mixup, CutMix and Mosaic)
    still not object-aware and have not been designed specifically for the task of instance segmentation
  • Copy-Paste Augmentation
  • Instance Segmentation
  • Long-Tail Visual Recognition
    data re-sampling and loss re-weighting; the authors' method yields significant gains in this setting

3 Advantages / Contributions

The paper proposes the Copy-Paste method, which provides a significant boost on top of baselines across multiple settings.

It gives solid improvements across a wide range of settings with variability in backbone architecture, extent of scale jittering, training schedule and image size.

4 Method


  • scale jittering and random horizontal flipping

  • select a random subset of objects from one of the images and paste them onto the other image

  • adjust the ground-truth annotations: remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.
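
The annotation update in the last step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the helper names `mask_to_box` and `update_annotations`, and the choice to represent each instance as a boolean mask of the full image size, are made up for this note. Overlaps between the pasted objects themselves are ignored.

```python
import numpy as np


def mask_to_box(mask):
    """Return the tight (x_min, y_min, x_max, y_max) box of a non-empty boolean mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def update_annotations(main_masks, pasted_masks):
    """Hypothetical helper: merge instance masks after pasting.

    main_masks / pasted_masks: lists of boolean arrays with the same (H, W) shape.
    Pasted objects lie on top, so they occlude the instances of the main image.
    """
    kept = []
    if pasted_masks:
        occluder = np.any(np.stack(pasted_masks), axis=0)
        for m in main_masks:
            visible = m & ~occluder      # drop the pixels hidden by pasted objects
            if visible.any():            # fully occluded instances are removed
                kept.append(visible)
    else:
        kept = list(main_masks)
    masks = kept + list(pasted_masks)
    boxes = [mask_to_box(m) for m in masks]  # boxes are recomputed from the updated masks
    return masks, boxes
```

Recomputing every box from its mask keeps the two annotations consistent without tracking which instances were actually touched.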

Generated images can look very different from real images.

1)Blending Pasted Objects

$I_1 \times \alpha + I_2 \times (1-\alpha)$

where $\alpha$ is a binary mask, $I_1$ is the pasted image and $I_2$ is the main image

Simply composing without any blending has similar performance, unlike 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》, which explored different composing methods
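
The composition formula maps directly onto array arithmetic. A minimal sketch, assuming both images are already the same size and `alpha` marks the pasted objects; the function name `paste_with_mask` is made up for this note.

```python
import numpy as np


def paste_with_mask(pasted_img, main_img, alpha):
    """Compose I1 * alpha + I2 * (1 - alpha).

    pasted_img (I1) and main_img (I2): uint8 arrays of shape (H, W, 3).
    alpha: (H, W) array, 1 where a pasted object is and 0 elsewhere.
    A smoothed, non-binary alpha would give soft blending with the same formula.
    """
    a = alpha.astype(np.float32)[..., None]  # add a channel axis for broadcasting
    out = pasted_img.astype(np.float32) * a + main_img.astype(np.float32) * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

With a strictly binary `alpha` this is a plain pixel copy, which matches the "no blending" variant.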

2)Large Scale Jittering


  • standard scale jittering (SSJ): resize range of 0.8 to 1.25

  • large scale jittering (LSJ): resize range of 0.1 to 2.0
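
The only difference between the two settings is the range the resize factor is drawn from. The sketch below samples that factor and the offset used when the resized image is padded or cropped back to the original size. The helper name `sample_jitter`, the uniform sampling, and the pad-or-crop rule are assumptions for illustration, in the spirit of the script at the end of this post.

```python
import numpy as np

# Resize ranges quoted above
SCALE_RANGES = {"SSJ": (0.8, 1.25), "LSJ": (0.1, 2.0)}


def sample_jitter(h, w, mode="LSJ", rng=None):
    """Hypothetical helper: sample a resize factor and a pad/crop offset."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = SCALE_RANGES[mode]
    scale = rng.uniform(lo, hi)
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # Smaller than the canvas: pad. Larger: crop. The offset is the top-left corner.
    dy = int(rng.integers(0, abs(new_h - h) + 1))
    dx = int(rng.integers(0, abs(new_w - w) + 1))
    action = "pad" if scale <= 1.0 else "crop"
    return scale, (new_h, new_w), (dy, dx), action
```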

3)Self-training Copy-Paste

The standard self-training pipeline:

  • train a supervised model with Copy-Paste augmentation on labeled data
  • generate pseudo labels on unlabeled data
  • paste ground-truth instances into pseudo labeled and supervised labeled images and train a model on this new data
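
The three steps above can be written as a small driver function. Everything here is a sketch: `train_fn`, `predict_fn` and `copy_paste_fn` are hypothetical callables the caller would supply, and the sample format (an `(image, annotations)` pair) is an assumption.

```python
import random


def self_training_copy_paste(labeled, unlabeled, train_fn, predict_fn, copy_paste_fn, seed=0):
    """Hypothetical driver for self-training with Copy-Paste.

    labeled: list of (image, annotations) pairs with ground truth.
    unlabeled: list of images.
    train_fn(samples, augment) -> model
    predict_fn(model, image) -> annotations (pseudo labels)
    copy_paste_fn(source_sample, target_sample) -> augmented sample
    """
    rng = random.Random(seed)

    def paste_ground_truth(target_sample):
        # Objects are always copied from a ground-truth labeled sample
        source_sample = rng.choice(labeled)
        return copy_paste_fn(source_sample, target_sample)

    # 1) supervised model trained with Copy-Paste on labeled data
    teacher = train_fn(labeled, paste_ground_truth)
    # 2) pseudo labels for the unlabeled images
    pseudo = [(image, predict_fn(teacher, image)) for image in unlabeled]
    # 3) new model trained on both pools, with ground-truth instances pasted in
    return train_fn(labeled + pseudo, paste_ground_truth)
```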

5 Experiments

Training a model from scratch with large scale jittering and Copy-Paste augmentation requires 576 epochs while training with only standard scale jittering takes 96 epochs.

With large scale jittering and Copy-Paste added, the model needs a much longer training schedule.

5.1 Datasets

  • COCO
  • VOC
  • LVIS
    LVIS has 1203 classes to simulate the long-tail distribution of classes in natural images.

5.2 Copy-Paste is robust to training configurations

Copy-Paste is robust across a variety of training iterations, models and training hyperparameters.

1)Robustness to backbone initialization


2)Robustness to training schedules

Training longer improves performance further.

3)Copy-Paste is additive to large scale jittering augmentation
mixup works reasonably well with SSJ, but its gain is largely cancelled out under LSJ, whereas Copy-Paste remains compatible with LSJ.

4)Copy-Paste works across backbone architectures and image sizes

Compare the gains across different image sizes.

5.3 Copy-Paste helps data-efficiency

The same performance is reached with less data.

A model trained on 75% of COCO with Copy-Paste and LSJ has a similar AP to a model trained on 100% of COCO with LSJ.

5.4 Copy-Paste and self-training are additive

These are the self-training experiments, which bring in additional datasets.

1)Data to Paste on

  • supervised COCO data (120k images)
  • pseudo labeled data (110k images from unlabeled COCO and 610k from Objects365).

The results in Table 3 correspond to those in Table 2.

2)Data to Copy from

Pasting pseudo labeled objects from an unlabeled dataset directly into the COCO labeled dataset.

Pasting in this reverse direction gives no additional AP improvements.

5.5 Copy-Paste improves COCO state-of-the-art


5.6 Copy-Paste produces better representations for PASCAL detection and segmentation

Models are trained with Copy-Paste on COCO.

They are then used in transfer learning experiments on the PASCAL VOC 2007 dataset.

5.7 Copy-Paste provides strong gains on LVIS

LVIS is a long-tailed dataset.

Two different training paradigms are typically used for LVIS:

  • single-stage where a detector is trained directly on the LVIS dataset

  • two-stage where the model from the first stage is fine-tuned with class re-balancing losses to help handle the class imbalance.
    The re-balancing weight for a class has the form $(1-\beta)/(1-\beta^n)$, where $n$ is the number of instances of the class and $\beta = 0.999$
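
To get a feel for the re-balancing weight, it helps to evaluate it for a rare and a frequent class. A minimal sketch; the function name `class_balanced_weights` is made up for this note, and how the weight is plugged into the loss is not shown.

```python
import numpy as np


def class_balanced_weights(instance_counts, beta=0.999):
    """Evaluate (1 - beta) / (1 - beta ** n) for each per-class instance count n."""
    n = np.asarray(instance_counts, dtype=np.float64)
    return (1.0 - beta) / (1.0 - np.power(beta, n))


# A class seen once keeps a weight of about 1; very frequent classes approach 1 - beta
weights = class_balanced_weights([1, 100, 100000])
```

The weight decreases monotonically with the number of instances, so rare classes contribute more per instance during the second-stage fine-tuning.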

1)Copy-Paste improves single-stage LVIS training

APr (the AP for rare classes)

Repeat Factor Sampling (RFS) is used to handle the class imbalance problem on LVIS

The results show that Copy-Paste and RFS are compatible
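
For reference, repeat factor sampling can be sketched as below. The rule used here (a per-category factor max(1, sqrt(t / f_c)), with f_c the fraction of images containing category c, and a per-image factor equal to the largest factor among its categories) follows the LVIS paper as I understand it; it is not spelled out in the Copy-Paste notes above. The function name and the input format are assumptions, and the threshold `t` is left to the caller.

```python
import math
from collections import Counter


def repeat_factors(image_categories, t):
    """image_categories: list of sets of category ids, one set per image."""
    num_images = len(image_categories)
    # Number of images in which each category appears at least once
    freq = Counter(c for cats in image_categories for c in set(cats))
    cat_rep = {c: max(1.0, math.sqrt(t / (n / num_images))) for c, n in freq.items()}
    # An image is repeated according to its rarest category
    return [max((cat_rep[c] for c in cats), default=1.0) for cats in image_categories]
```

Images that contain only frequent categories keep a factor of 1, so oversampling touches the rare categories only.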

2)Copy-Paste improves two-stage LVIS training


3)Comparison with the state-of-the-art


6 Appendix

6.1 Ablation on the Copy-Paste method

1)Subset of pasted objects

Pasting a random subset of objects works best; pasting only one object or all objects is worse than the random subset.
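
The three options compared in this ablation can be sketched as an index selection. The helper name `pick_objects` is made up, and how the subset size is drawn (uniformly between 1 and the number of objects) is an assumption, since the notes only say "random subset".

```python
import numpy as np


def pick_objects(num_objects, mode="random", rng=None):
    """Return indices of the objects to paste. mode: 'one', 'all' or 'random'."""
    rng = np.random.default_rng() if rng is None else rng
    if num_objects == 0:
        return np.array([], dtype=int)
    if mode == "all":
        return np.arange(num_objects)
    if mode == "one":
        return rng.choice(num_objects, size=1)
    # Random subset: assumed size in [1, num_objects], sampled without replacement
    k = int(rng.integers(1, num_objects + 1))
    return np.sort(rng.choice(num_objects, size=k, replace=False))
```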

2)Blending

Table 10 shows that the form of blending matters little; leaving blending out altogether does not cost any AP.

3)Scale jittering

Random scale jittering is applied to both the pasted image (the image that pasted objects are being copied from) and the main image.
The gain from LSJ comes mainly from augmenting the main image.

6.2 Copy-Paste provides more gain on harder categories of COCO

In the paper's figure, the x-axis sorts the categories by baseline AP, and the y-axis is the relative improvement (a rate), not the absolute AP gain.

6.3 How likely objects are copied to an un-matched scene?

Compute the probability of copying objects to an unmatched scene category.

Only indoor versus outdoor scenes are distinguished.

We found there are 42538 indoor and 71017 outdoor images (we couldn't estimate the category of the remaining 4732 images).
Objects are copied to an unmatched scene in about half (46.8%) of generated images (23.4% + 23.4%).
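
The two 23.4% terms can be reproduced from the image counts. The sketch assumes that the source and the main image are drawn independently and uniformly from the images whose scene type is known; that sampling model is my reading, not something stated in the notes.

```python
indoor, outdoor = 42538, 71017
total = indoor + outdoor  # the 4732 unclassified images are left out

p_in, p_out = indoor / total, outdoor / total
p_indoor_to_outdoor = p_in * p_out   # indoor objects pasted onto an outdoor image
p_outdoor_to_indoor = p_out * p_in   # outdoor objects pasted onto an indoor image
p_unmatched = p_indoor_to_outdoor + p_outdoor_to_indoor

print(f"{p_indoor_to_outdoor:.1%} + {p_outdoor_to_indoor:.1%}")  # prints "23.4% + 23.4%"
```

Each direction comes out at about 23.4%, which agrees with the breakdown quoted above.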

6.4 Benchmark results on different object sizes


7 Conclusion (own)

"""
Unofficial implementation of Copy-Paste for semantic segmentation
"""
 
from PIL import Image
import imgviz
import cv2
import argparse
import os
import numpy as np
import tqdm
 
 
def save_colored_mask(mask, save_path):
    lbl_pil = Image.fromarray(mask.astype(np.uint8), mode="P")
    colormap = imgviz.label_colormap()
    lbl_pil.putpalette(colormap.flatten())
    lbl_pil.save(save_path)
 
 
def random_flip_horizontal(mask, img, p=0.5):
    if np.random.random() < p:
        img = img[:, ::-1, :]
        mask = mask[:, ::-1]
    return mask, img
 
 
def img_add(img_src, img_main, mask_src):
    if len(img_main.shape) == 3:
        h, w, c = img_main.shape
    elif len(img_main.shape) == 2:
        h, w = img_main.shape
    mask = np.asarray(mask_src, dtype=np.uint8)
    sub_img01 = cv2.add(img_src, np.zeros(np.shape(img_src), dtype=np.uint8), mask=mask)  # cut out the src foreground
    mask_02 = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    mask_02 = np.asarray(mask_02, dtype=np.uint8)
    sub_img02 = cv2.add(img_main, np.zeros(np.shape(img_main), dtype=np.uint8),
                        mask=mask_02)  # cut out the region of img_main covered by the src foreground
    img_main = img_main - sub_img02 + cv2.resize(sub_img01, (img_main.shape[1], img_main.shape[0]),
                                                 interpolation=cv2.INTER_NEAREST)  # clear that region in main, then paste the src foreground into it
    return img_main
 
 
def rescale_src(mask_src, img_src, h, w):
    if len(mask_src.shape) == 3:
        h_src, w_src, c = mask_src.shape
    elif len(mask_src.shape) == 2:
        h_src, w_src = mask_src.shape
    max_reshape_ratio = min(h / h_src, w / w_src)
    rescale_ratio = np.random.uniform(0.2, max_reshape_ratio)
 
    # reshape src img and mask
    rescale_h, rescale_w = int(h_src * rescale_ratio), int(w_src * rescale_ratio)
    mask_src = cv2.resize(mask_src, (rescale_w, rescale_h),
                          interpolation=cv2.INTER_NEAREST)
    # mask_src = mask_src.resize((rescale_w, rescale_h), Image.NEAREST)
    img_src = cv2.resize(img_src, (rescale_w, rescale_h),
                         interpolation=cv2.INTER_LINEAR)
 
    # set paste coord
    py = int(np.random.random() * (h - rescale_h))
    px = int(np.random.random() * (w - rescale_w))
 
    # paste src img and mask to a zeros background
    img_pad = np.zeros((h, w, 3), dtype=np.uint8)
    mask_pad = np.zeros((h, w), dtype=np.uint8)
    # use the integer sizes computed above so the slices always match the resized arrays
    img_pad[py:py + rescale_h, px:px + rescale_w, :] = img_src
    mask_pad[py:py + rescale_h, px:px + rescale_w] = mask_src
 
    return mask_pad, img_pad
 
 
def Large_Scale_Jittering(mask, img, min_scale=0.1, max_scale=2.0):
    rescale_ratio = np.random.uniform(min_scale, max_scale)
    h, w, _ = img.shape
 
    # rescale
    h_new, w_new = int(h * rescale_ratio), int(w * rescale_ratio)
    img = cv2.resize(img, (w_new, h_new), interpolation=cv2.INTER_LINEAR)
    mask = cv2.resize(mask, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
    # mask = mask.resize((w_new, h_new), Image.NEAREST)
 
    # crop or padding
    x, y = int(np.random.uniform(0, abs(w_new - w))), int(np.random.uniform(0, abs(h_new - h)))
    if rescale_ratio <= 1.0:  # padding
        img_pad = np.ones((h, w, 3), dtype=np.uint8) * 168
        mask_pad = np.zeros((h, w), dtype=np.uint8)
        img_pad[y:y+h_new, x:x+w_new, :] = img
        mask_pad[y:y+h_new, x:x+w_new] = mask
        return mask_pad, img_pad
    else:  # crop
        img_crop = img[y:y+h, x:x+w, :]
        mask_crop = mask[y:y+h, x:x+w]
        return mask_crop, img_crop
 
 
def copy_paste(mask_src, img_src, mask_main, img_main):
    mask_src, img_src = random_flip_horizontal(mask_src, img_src)
    mask_main, img_main = random_flip_horizontal(mask_main, img_main)
 
    # LSJ, Large_Scale_Jittering
    if args.lsj:
        mask_src, img_src = Large_Scale_Jittering(mask_src, img_src)
        mask_main, img_main = Large_Scale_Jittering(mask_main, img_main)
    else:
        # rescale mask_src/img_src to less than mask_main/img_main's size
        h, w, _ = img_main.shape
        mask_src, img_src = rescale_src(mask_src, img_src, h, w)
 
    img = img_add(img_src, img_main, mask_src)  # cut out the src foreground and paste it onto main
    mask = img_add(mask_src, mask_main, mask_src)
 
    return mask, img
 
 
def main(args):
    # input path
    segclass = os.path.join(args.input_dir, 'SegmentationClass')
    JPEGs = os.path.join(args.input_dir, 'JPEGImages')
 
    # create output path
    os.makedirs(args.output_dir, exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'SegmentationClass'), exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'JPEGImages'), exist_ok=True)
 
    masks_path = os.listdir(segclass)
    tbar = tqdm.tqdm(masks_path, ncols=100)
    for mask_path in tbar:
        # get source mask and img
        mask_src = np.asarray(Image.open(os.path.join(segclass, mask_path)), dtype=np.uint8)
        img_src = cv2.imread(os.path.join(JPEGs, mask_path.replace('.png', '.jpg')))
 
        # random choice main mask/img
        mask_main_path = np.random.choice(masks_path)
        mask_main = np.asarray(Image.open(os.path.join(segclass, mask_main_path)), dtype=np.uint8)
        img_main = cv2.imread(os.path.join(JPEGs, mask_main_path.replace('.png', '.jpg')))
 
        # Copy-Paste data augmentation
        mask, img = copy_paste(mask_src, img_src, mask_main, img_main)  # apply Copy-Paste
 
        mask_filename = "copy_paste_" + mask_path
        img_filename = mask_filename.replace('.png', '.jpg')
        save_colored_mask(mask, os.path.join(args.output_dir, 'SegmentationClass', mask_filename))  # save the mask
        cv2.imwrite(os.path.join(args.output_dir, 'JPEGImages', img_filename), img)  # save the composited image
 
 
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", default="../dataset/VOCdevkit2012/VOC2012", type=str,
                        help="input annotated directory")
    parser.add_argument("--output_dir", default="../dataset/VOCdevkit2012/VOC2012_copy_paste", type=str,
                        help="output dataset directory")
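    # argparse's type=bool treats any non-empty string (even "False") as True, so parse the text explicitly
    parser.add_argument("--lsj", default=True, type=lambda s: s.lower() in ("1", "true", "yes"),
                        help="if use Large Scale Jittering")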
    return parser.parse_args()
 
 
if __name__ == '__main__':
    args = get_args()
    main(args)