[Translation Series] Data Augmentation for Bounding Boxes, Part 2: Scaling and Translation

Original article: https://blog.paperspace.com


In this part we implement the scaling and translation augmentations, and we also handle the case where a bounding box ends up partly or fully outside the image, so the code is robust and safe to use.

Code

The techniques used in this article, along with all the other augmentations in the series, are available in the repo below:

https://github.com/Paperspace/DataAugmentationForObjectDetection

Scale

The result of scaling looks roughly like this:

[Image: left, the original image; right, the scaled image]

Design details

  • First, think about the parameters of the scale transform. The obvious one is the scale factor itself, and it must be greater than -1: the image is resized by a factor of (1 + scale), so anything smaller would give a zero or negative size.
  • Your first instinct might be to use a single scale value, which scales height and width by the same amount. But by adding a diff parameter that controls whether the aspect ratio may change, we can let the height and width be scaled independently.
  • As for where the factor comes from: if the user supplies a range, the factor is sampled from that range. If the user supplies only a single float, it must be positive, and the sampling range becomes (-scale, scale).

Let's start by defining the __init__ function.

class RandomScale(object):
    """Randomly scales an image    
    
    Bounding boxes which have an area of less than 25% remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    scale: float or tuple(float)
        if **float**, the image is scaled by a factor drawn 
        randomly from a range (1 - `scale` , 1 + `scale`). If **tuple**,
        the `scale` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndarray
        Scaled image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, scale = 0.2, diff = False):
        self.scale = scale

        
        if type(self.scale) == tuple:
            assert len(self.scale) == 2, "Invalid range"
            assert self.scale[0] > -1, "Scale factor can't be less than -1"
            assert self.scale[1] > -1, "Scale factor can't be less than -1"
        else:
            assert self.scale > 0, "Please input a positive float"
            self.scale = (max(-1, -self.scale), self.scale)
        
        self.diff = diff

One more note: do not sample the scale factor inside __init__. If you fix it there, every image gets exactly the same scaling and the transform is no longer random.

Do the sampling inside __call__ instead!
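
To see why, here is a hypothetical, deliberately broken variant (the class name and the omitted resize logic are just for illustration): the factor is drawn once at construction time, so every image in the dataset would be scaled by exactly the same amount.

import random

class BadRandomScale(object):
    # Hypothetical counter-example: the factor is sampled once, at construction.
    def __init__(self, scale = 0.2):
        self.scale_x = random.uniform(-scale, scale)   # fixed for the object's lifetime

    def __call__(self, img, bboxes):
        # every call reuses the same self.scale_x, so there is no randomness across images
        # (resize logic omitted)
        return img, bboxes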

Augmentation logic

Transforming the image itself is easy: cv2.resize does it in one call. The real work is adjusting the bounding boxes.

img_shape = img.shape

if self.diff:
    # sample the x and y factors independently so the aspect ratio can change
    scale_x = random.uniform(*self.scale)
    scale_y = random.uniform(*self.scale)
else:
    scale_x = random.uniform(*self.scale)
    scale_y = scale_x

resize_scale_x = 1 + scale_x
resize_scale_y = 1 + scale_y

img = cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

Note that the output keeps the same size as the input. If the scale factor is negative, i.e. the image shrinks, the leftover area is filled with black (zero) pixels, just like the example above.

The code above performs the resize; the code below makes sure the result has the same size as the original input. Take it slowly!

First we create an empty image of the same size as the input, i.e., a canvas.

canvas = np.zeros(img_shape, dtype = np.uint8)

Then we paste the resized image onto the canvas. That takes care of the image itself!

y_lim = int(min(resize_scale_y,1)*img_shape[0])
x_lim = int(min(resize_scale_x,1)*img_shape[1])

canvas[:y_lim,:x_lim,:] =  img[:y_lim,:x_lim,:]

img = canvas

Fixing the bounding boxes

The last task is to keep the bounding boxes valid. For example, with a scale factor of 0.1 the image is enlarged by a factor of 1.1, and as the figure below shows, the football can disappear from view entirely. So what should we do?

[Image: the football is lost when the image is scaled up]

And hold on, won't this same problem come up in other augmentations as well? That is exactly why it is worth solving properly.

So in bbox_utils.py we define clip_box, which clips boxes to the image boundaries. It takes a configurable threshold: the fraction of a box remaining inside the image is compared against this threshold to decide whether the box is kept or dropped.

It is defined as follows.

def clip_box(bbox, clip_box, alpha):
    """Clip the bounding boxes to the borders of an image
    
    Parameters
    ----------
    
    bbox: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    
    clip_box: numpy.ndarray
        An array of shape (4,) specifying the diagonal co-ordinates of the image
        The coordinates are represented in the format `x1 y1 x2 y2`
        
    alpha: float
        If the fraction of a bounding box left in the image after being clipped is 
        less than `alpha` the bounding box is dropped. 
    
    Returns
    -------
    
    numpy.ndarray
        Numpy array containing **clipped** bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes left after being clipped and the bounding boxes are represented in the
        format `x1 y1 x2 y2` 
    
    """
    ar_ = (bbox_area(bbox))
    # clip each coordinate to the image boundaries
    x_min = np.maximum(bbox[:,0], clip_box[0]).reshape(-1,1)
    y_min = np.maximum(bbox[:,1], clip_box[1]).reshape(-1,1)
    x_max = np.minimum(bbox[:,2], clip_box[2]).reshape(-1,1)
    y_max = np.minimum(bbox[:,3], clip_box[3]).reshape(-1,1)
    
    bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:,4:]))
    
    # fraction of each box's area that was clipped away
    delta_area = ((ar_ - bbox_area(bbox))/ar_)
    
    # keep only boxes that retain at least `alpha` of their original area
    mask = (delta_area < (1 - alpha)).astype(int)
    
    bbox = bbox[mask == 1,:]


    return bbox

This function does exactly what we need.

If you'd rather not dig into its internals, that's fine; we simply call it from the augmentation's __call__. Here a box is dropped if less than 25% of it remains inside the image, and img_shape is the shape of the final image (the same as the original, since we keep the resolution).

bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)
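
For intuition, here is a tiny made-up example of the threshold in action: a box hanging off the left edge of a 200x200 image keeps only 20% of its area after clipping, which is below 0.25, so it is dropped.

import numpy as np

# one 100x100 box hanging off the left edge of a 200x200 image
boxes = np.array([[-80., 0., 20., 100.]])

# after clipping, 20x100 = 20% of the original area remains (< 0.25), so the box is removed
print(clip_box(boxes, [0, 0, 200, 200], 0.25))   # -> an empty (0, 4) array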

To compute box areas it relies on a small helper, bbox_area.
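
bbox_area itself isn't reproduced in this article; a minimal version consistent with the N x 4 `x1 y1 x2 y2` format used here (the repo's implementation may differ slightly) would be:

def bbox_area(bbox):
    # width * height of every box in an N x 4 array of x1,y1,x2,y2 coordinates
    return (bbox[:,2] - bbox[:,0])*(bbox[:,3] - bbox[:,1])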

Putting it all together, our __call__ looks like this.

def __call__(self, img, bboxes):

    # Choose a random scale factor
    img_shape = img.shape

    if self.diff:
        scale_x = random.uniform(*self.scale)
        scale_y = random.uniform(*self.scale)
    else:
        scale_x = random.uniform(*self.scale)
        scale_y = scale_x

    resize_scale_x = 1 + scale_x
    resize_scale_y = 1 + scale_y

    img = cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

    bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

    # paste the resized image onto a black canvas of the original size
    canvas = np.zeros(img_shape, dtype = np.uint8)

    y_lim = int(min(resize_scale_y,1)*img_shape[0])
    x_lim = int(min(resize_scale_x,1)*img_shape[1])

    canvas[:y_lim,:x_lim,:] = img[:y_lim,:x_lim,:]

    img = canvas
    bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)

    return img, bboxes

Translate

Next, translation.


As with scaling, we need a factor that determines how far the image is shifted, expressed as a fraction of the image dimensions.

This time, though, the factor must be greater than -1 and less than 1; otherwise the image would be shifted entirely off the canvas and we would be left with a blank image!

Enforcing that constraint is the main job of the __init__ function.

class RandomTranslate(object):
    """Randomly Translates the image    
    
    
    Bounding boxes which have an area of less than 25% remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    translate: float or tuple(float)
        if **float**, the image is translated by a factor drawn 
        randomly from a range (-`translate`, `translate`). If **tuple**,
        `translate` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndarray
        Translated image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, translate = 0.2, diff = False):
        self.translate = translate
        
        if type(self.translate) == tuple:
            assert len(self.translate) == 2, "Invalid range"
            assert self.translate[0] > 0 and self.translate[0] < 1
            assert self.translate[1] > 0 and self.translate[1] < 1

        else:
            assert self.translate > 0 and self.translate < 1
            self.translate = (-self.translate, self.translate)
            
            
        self.diff = diff

增强逻辑

Translation has a few more details to take care of than scaling, so bear with me.

First we set up a few variables.

def __call__(self, img, bboxes):
    # Choose a random translation factor
    img_shape = img.shape

    # translate the image

    # percentage of the image dimensions to translate by
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x

Translation also leaves empty space behind, which we again fill with black. If you worked through the scaling section, this should pose no problem.

As before, we start by defining a canvas.

canvas = np.zeros(img_shape) 

Now there are two questions:

  1. Which part of the input image do we keep?
  2. Where on the canvas do we place it?

[Image: left, the part of the input that is kept; right, where it is placed on the canvas]

First we determine the part of the input that is kept, i.e., the purple region in the left image above.

#get the top-left corner co-ordinates of the shifted image 
corner_x = int(translate_factor_x*img.shape[1])
corner_y = int(translate_factor_y*img.shape[0])

mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]

Then we place that region on the canvas. The indexing here is the neat part, so take a close look!

orig_box_cords =  [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
img = canvas
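
As a concrete illustration (the numbers are made up for this example), here is how the index arithmetic works out for a 100x100 image shifted 20% to the left and 10% down:

H, W = 100, 100
translate_factor_x, translate_factor_y = -0.2, 0.1

corner_x = int(translate_factor_x * W)   # -20: the image moves 20 px to the left
corner_y = int(translate_factor_y * H)   #  10: the image moves 10 px down

# part of the input that stays visible: rows 0:90, columns 20:100
mask_rows = (max(-corner_y, 0), min(H, -corner_y + H))   # (0, 90)
mask_cols = (max(-corner_x, 0), min(W, -corner_x + W))   # (20, 100)

# where it lands on the canvas: rows 10:100, columns 0:80
canvas_rows = (max(corner_y, 0), min(H, corner_y + H))   # (10, 100)
canvas_cols = (max(corner_x, 0), min(W, corner_x + W))   # (0, 80)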

Got that? Then relax, the rest is easy: adjusting the bounding boxes is just a matter of adding the shift to their coordinates, but remember to call clip_box afterwards!

bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

So __call__ is defined as follows.

def __call__(self, img, bboxes):
    # Choose a random translation factor
    img_shape = img.shape

    # translate the image

    # percentage of the image dimensions to translate by
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x

    canvas = np.zeros(img_shape).astype(np.uint8)

    corner_x = int(translate_factor_x*img.shape[1])
    corner_y = int(translate_factor_y*img.shape[0])

    # change the origin to the top-left corner of the translated box
    orig_box_cords = [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

    mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]
    canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
    img = canvas

    bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

    bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

    return img, bboxes

Testing

Let's look at the test code and the result. If you use your own annotations, make sure they are in the correct format, which was described in Part 1.

from data_aug.bbox_utils import *
import matplotlib.pyplot as plt

# img and bboxes are loaded in the format described in Part 1;
# RandomScale and RandomTranslate are the classes defined above.

scale = RandomScale(0.2, diff = True)
translate = RandomTranslate(0.2, diff = True)

img, bboxes = translate(img, bboxes)
img, bboxes = scale(img, bboxes)

plt.imshow(draw_rect(img, bboxes))
plt.show()
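
The helper draw_rect also comes from bbox_utils.py in the repo. It isn't shown in this article, but a minimal sketch of the idea (assuming x1,y1,x2,y2 boxes; the repo's version may differ in signature and styling) looks like this:

import cv2

def draw_rect(im, bboxes, color = (255, 0, 0)):
    # draw every box on a copy of the image so the original stays untouched
    im = im.copy()
    for box in bboxes:
        x1, y1, x2, y2 = [int(c) for c in box[:4]]
        cv2.rectangle(im, (x1, y1), (x2, y2), color, 2)
    return im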

The final result looks like this:

[Image: the final augmented result]

And that wraps up another fun learning session! See you next time!

Wait, before you go: have you thought about whether translating first and then scaling gives the same result as scaling first and then translating? Give it some thought! A small experiment is sketched below.
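
One simple way to explore the question is to apply the two transforms to copies of the same input in both orders and compare the outputs. The snippet below just reuses the classes and helpers from the test above; note that each call draws fresh random factors, so for a strict comparison you would want to fix the factors (or the seed) first.

scale = RandomScale(0.2, diff = True)
translate = RandomTranslate(0.2, diff = True)

# order 1: scale, then translate
img_a, boxes_a = translate(*scale(img.copy(), bboxes.copy()))

# order 2: translate, then scale
img_b, boxes_b = scale(*translate(img.copy(), bboxes.copy()))

plt.subplot(1, 2, 1); plt.imshow(draw_rect(img_a, boxes_a))
plt.subplot(1, 2, 2); plt.imshow(draw_rect(img_b, boxes_b))
plt.show()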

Next time we'll cover rotation and shearing. See you then!
