[Translation Series] Data Augmentation for Bounding Boxes, Part 2: Scaling and Translation

Original article: https://blog.paperspace.com


In this part we implement the scaling and translation augmentations, and we also handle the case where a bounding box ends up partly or fully outside the image, so the code is robust and safe to use.

Code

The techniques used in this article, along with all the other augmentations in the series, are available in the repo below:

https://github.com/Paperspace/DataAugmentationForObjectDetection

Scale

The result of scaling looks roughly like this:

[Image: left, the original image; right, the scaled image]

Design details

  • First, think about the parameters of the scale transform. The obvious one is the scale factor itself, and it must be greater than -1: the image is resized by a factor of (1 + scale), so anything smaller would give a zero or negative size.
  • Your first instinct might be to use a single scale value, which scales height and width by the same amount. But by adding a diff parameter that controls whether the aspect ratio may change, we can let the height and width be scaled independently.
  • As for where the factor comes from: if the user supplies a range, the factor is sampled from that range. If the user supplies only a single float, it must be positive, and the sampling range becomes (-scale, scale).

Let's start by defining the __init__ function.

class RandomScale(object):
    """Randomly scales an image    
    
    Bounding boxes which have an area of less than 25% remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    scale: float or tuple(float)
        if **float**, the image is scaled by a factor drawn 
        randomly from a range (1 - `scale` , 1 + `scale`). If **tuple**,
        the `scale` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndarray
        Scaled image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, scale = 0.2, diff = False):
        self.scale = scale

        
        if type(self.scale) == tuple:
            assert len(self.scale) == 2, "Invalid range"
            assert self.scale[0] > -1, "Scale factor can't be less than -1"
            assert self.scale[1] > -1, "Scale factor can't be less than -1"
        else:
            assert self.scale > 0, "Please input a positive float"
            self.scale = (max(-1, -self.scale), self.scale)
        
        self.diff = diff

One more note: do not sample the scale factor inside __init__. If you fix it there, every image gets exactly the same scaling and the transform is no longer random.

Do the sampling inside __call__ instead!
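
To see why, here is a hypothetical, deliberately broken variant (the class name and the omitted resize logic are just for illustration): the factor is drawn once at construction time, so every image in the dataset would be scaled by exactly the same amount.

import random

class BadRandomScale(object):
    # Hypothetical counter-example: the factor is sampled once, at construction.
    def __init__(self, scale = 0.2):
        self.scale_x = random.uniform(-scale, scale)   # fixed for the object's lifetime

    def __call__(self, img, bboxes):
        # every call reuses the same self.scale_x, so there is no randomness across images
        # (resize logic omitted)
        return img, bboxes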

Augmentation logic

Transforming the image itself is easy: cv2.resize does it in one call. The real work is adjusting the bounding boxes.

img_shape = img.shape

if self.diff:
    # sample the x and y factors independently so the aspect ratio can change
    scale_x = random.uniform(*self.scale)
    scale_y = random.uniform(*self.scale)
else:
    scale_x = random.uniform(*self.scale)
    scale_y = scale_x

resize_scale_x = 1 + scale_x
resize_scale_y = 1 + scale_y

img = cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

Note that the output keeps the same size as the input. If the scale factor is negative, i.e. the image shrinks, the leftover area is filled with black (zero) pixels, just like the example above.

The code above performs the resize; the code below makes sure the result has the same size as the original input. Take it slowly!

First we create an empty image of the same size as the input, i.e., a canvas.

canvas = np.zeros(img_shape, dtype = np.uint8)

Then we paste the resized image onto the canvas. That takes care of the image itself!

y_lim = int(min(resize_scale_y,1)*img_shape[0])
x_lim = int(min(resize_scale_x,1)*img_shape[1])

canvas[:y_lim,:x_lim,:] =  img[:y_lim,:x_lim,:]

img = canvas

Fixing the bounding boxes

The last task is to keep the bounding boxes valid. For example, with a scale factor of 0.1 the image is enlarged by a factor of 1.1, and as the figure below shows, the football can disappear from view entirely. So what should we do?

[Image: the football is lost when the image is scaled up]

And hold on, won't this same problem come up in other augmentations as well? That is exactly why it is worth solving properly.

So in bbox_utils.py we define clip_box, which clips boxes to the image boundaries. It takes a configurable threshold: the fraction of a box remaining inside the image is compared against this threshold to decide whether the box is kept or dropped.

It is defined as follows.

def clip_box(bbox, clip_box, alpha):
    """Clip the bounding boxes to the borders of an image
    
    Parameters
    ----------
    
    bbox: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    
    clip_box: numpy.ndarray
        An array of shape (4,) specifying the diagonal co-ordinates of the image
        The coordinates are represented in the format `x1 y1 x2 y2`
        
    alpha: float
        If the fraction of a bounding box left in the image after being clipped is 
        less than `alpha` the bounding box is dropped. 
    
    Returns
    -------
    
    numpy.ndarray
        Numpy array containing **clipped** bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes left after being clipped and the bounding boxes are represented in the
        format `x1 y1 x2 y2` 
    
    """
    ar_ = (bbox_area(bbox))
    # clip each coordinate to the image boundaries
    x_min = np.maximum(bbox[:,0], clip_box[0]).reshape(-1,1)
    y_min = np.maximum(bbox[:,1], clip_box[1]).reshape(-1,1)
    x_max = np.minimum(bbox[:,2], clip_box[2]).reshape(-1,1)
    y_max = np.minimum(bbox[:,3], clip_box[3]).reshape(-1,1)
    
    bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:,4:]))
    
    # fraction of each box's area that was clipped away
    delta_area = ((ar_ - bbox_area(bbox))/ar_)
    
    # keep only boxes that retain at least `alpha` of their original area
    mask = (delta_area < (1 - alpha)).astype(int)
    
    bbox = bbox[mask == 1,:]


    return bbox

This function does exactly what we need.

If you'd rather not dig into its internals, that's fine; we simply call it from the augmentation's __call__. Here a box is dropped if less than 25% of it remains inside the image, and img_shape is the shape of the final image (the same as the original, since we keep the resolution).

bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)
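
For intuition, here is a tiny made-up example of the threshold in action: a box hanging off the left edge of a 200x200 image keeps only 20% of its area after clipping, which is below 0.25, so it is dropped.

import numpy as np

# one 100x100 box hanging off the left edge of a 200x200 image
boxes = np.array([[-80., 0., 20., 100.]])

# after clipping, 20x100 = 20% of the original area remains (< 0.25), so the box is removed
print(clip_box(boxes, [0, 0, 200, 200], 0.25))   # -> an empty (0, 4) array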

To compute box areas it relies on a small helper, bbox_area.
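
bbox_area itself isn't reproduced in this article; a minimal version consistent with the N x 4 `x1 y1 x2 y2` format used here (the repo's implementation may differ slightly) would be:

def bbox_area(bbox):
    # width * height of every box in an N x 4 array of x1,y1,x2,y2 coordinates
    return (bbox[:,2] - bbox[:,0])*(bbox[:,3] - bbox[:,1])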

Putting it all together, our __call__ looks like this.

def __call__(self, img, bboxes):

    # Choose a random scale factor
    img_shape = img.shape

    if self.diff:
        scale_x = random.uniform(*self.scale)
        scale_y = random.uniform(*self.scale)
    else:
        scale_x = random.uniform(*self.scale)
        scale_y = scale_x

    resize_scale_x = 1 + scale_x
    resize_scale_y = 1 + scale_y

    img = cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

    bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

    # paste the resized image onto a black canvas of the original size
    canvas = np.zeros(img_shape, dtype = np.uint8)

    y_lim = int(min(resize_scale_y,1)*img_shape[0])
    x_lim = int(min(resize_scale_x,1)*img_shape[1])

    canvas[:y_lim,:x_lim,:] = img[:y_lim,:x_lim,:]

    img = canvas
    bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)

    return img, bboxes

Translate

Next, translation.


As with scaling, we need a factor that determines how far the image is shifted, expressed as a fraction of the image dimensions.

This time, though, the factor must be greater than -1 and less than 1; otherwise the image would be shifted entirely off the canvas and we would be left with a blank image!

Enforcing that constraint is the main job of the __init__ function.

class RandomTranslate(object):
    """Randomly Translates the image    
    
    
    Bounding boxes which have an area of less than 25% remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    translate: float or tuple(float)
        if **float**, the image is translated by a factor drawn 
        randomly from a range (-`translate`, `translate`). If **tuple**,
        `translate` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndarray
        Translated image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, translate = 0.2, diff = False):
        self.translate = translate
        
        if type(self.translate) == tuple:
            assert len(self.translate) == 2, "Invalid range"
            assert self.translate[0] > 0 and self.translate[0] < 1
            assert self.translate[1] > 0 and self.translate[1] < 1

        else:
            assert self.translate > 0 and self.translate < 1
            self.translate = (-self.translate, self.translate)
            
            
        self.diff = diff

增强逻辑

Translation has a few more details to take care of than scaling, so bear with me.

First we set up a few variables.

def __call__(self, img, bboxes):
    # Choose a random translation factor
    img_shape = img.shape

    # translate the image

    # percentage of the image dimensions to translate by
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x

Translation also leaves empty space behind, which we again fill with black. If you worked through the scaling section, this should pose no problem.

As before, we start by defining a canvas.

canvas = np.zeros(img_shape) 

Now there are two questions:

  1. Which part of the input image do we keep?
  2. Where on the canvas do we place it?

[Image: left, the part of the input that is kept; right, where it is placed on the canvas]

First we determine the part of the input that is kept, i.e., the purple region in the left image above.

#get the top-left corner co-ordinates of the shifted image 
corner_x = int(translate_factor_x*img.shape[1])
corner_y = int(translate_factor_y*img.shape[0])

mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]

Then we place that region on the canvas. The indexing here is the neat part, so take a close look!

orig_box_cords =  [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
img = canvas
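
As a concrete illustration (the numbers are made up for this example), here is how the index arithmetic works out for a 100x100 image shifted 20% to the left and 10% down:

H, W = 100, 100
translate_factor_x, translate_factor_y = -0.2, 0.1

corner_x = int(translate_factor_x * W)   # -20: the image moves 20 px to the left
corner_y = int(translate_factor_y * H)   #  10: the image moves 10 px down

# part of the input that stays visible: rows 0:90, columns 20:100
mask_rows = (max(-corner_y, 0), min(H, -corner_y + H))   # (0, 90)
mask_cols = (max(-corner_x, 0), min(W, -corner_x + W))   # (20, 100)

# where it lands on the canvas: rows 10:100, columns 0:80
canvas_rows = (max(corner_y, 0), min(H, corner_y + H))   # (10, 100)
canvas_cols = (max(corner_x, 0), min(W, corner_x + W))   # (0, 80)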

Got that? Then relax, the rest is easy: adjusting the bounding boxes is just a matter of adding the shift to their coordinates, but remember to call clip_box afterwards!

bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

So __call__ is defined as follows.

def __call__(self, img, bboxes):
    # Choose a random translation factor
    img_shape = img.shape

    # translate the image

    # percentage of the image dimensions to translate by
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x

    canvas = np.zeros(img_shape).astype(np.uint8)

    corner_x = int(translate_factor_x*img.shape[1])
    corner_y = int(translate_factor_y*img.shape[0])

    # change the origin to the top-left corner of the translated box
    orig_box_cords = [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

    mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]
    canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
    img = canvas

    bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

    bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

    return img, bboxes

Testing

Let's look at the test code and the result. If you use your own annotations, make sure they are in the correct format, which was described in Part 1.

from data_aug.bbox_utils import *
import matplotlib.pyplot as plt

# img and bboxes are loaded in the format described in Part 1;
# RandomScale and RandomTranslate are the classes defined above.

scale = RandomScale(0.2, diff = True)
translate = RandomTranslate(0.2, diff = True)

img, bboxes = translate(img, bboxes)
img, bboxes = scale(img, bboxes)

plt.imshow(draw_rect(img, bboxes))
plt.show()
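
The helper draw_rect also comes from bbox_utils.py in the repo. It isn't shown in this article, but a minimal sketch of the idea (assuming x1,y1,x2,y2 boxes; the repo's version may differ in signature and styling) looks like this:

import cv2

def draw_rect(im, bboxes, color = (255, 0, 0)):
    # draw every box on a copy of the image so the original stays untouched
    im = im.copy()
    for box in bboxes:
        x1, y1, x2, y2 = [int(c) for c in box[:4]]
        cv2.rectangle(im, (x1, y1), (x2, y2), color, 2)
    return im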

The final result looks like this:

[Image: the final augmented result]

And that wraps up another fun learning session! See you next time!

Wait, before you go: have you thought about whether translating first and then scaling gives the same result as scaling first and then translating? Give it some thought! A small experiment is sketched below.
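
One simple way to explore the question is to apply the two transforms to copies of the same input in both orders and compare the outputs. The snippet below just reuses the classes and helpers from the test above; note that each call draws fresh random factors, so for a strict comparison you would want to fix the factors (or the seed) first.

scale = RandomScale(0.2, diff = True)
translate = RandomTranslate(0.2, diff = True)

# order 1: scale, then translate
img_a, boxes_a = translate(*scale(img.copy(), bboxes.copy()))

# order 2: translate, then scale
img_b, boxes_b = scale(*translate(img.copy(), bboxes.copy()))

plt.subplot(1, 2, 1); plt.imshow(draw_rect(img_a, boxes_a))
plt.subplot(1, 2, 2); plt.imshow(draw_rect(img_b, boxes_b))
plt.show()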

Next time we'll cover rotation and shearing. See you then!
