[Translation Series] Data Augmentation for Bounding Boxes 2: Scale and Translate
Original article: https://blog.paperspace.com
In this part we implement the scaling and translation augmentations, and we also handle what happens when bounding boxes end up partially outside the image, so the code is robust and safe to use!
Code
The methods from this part, along with all the other augmentation techniques, can be found in the repo below:
https://github.com/Paperspace/DataAugmentationForObjectDetection
Scale
After scaling, the result looks roughly as shown in the figure.
Design details
- The first thing to think about is the parameters of the scale function. The obvious one is how much to scale by: this factor must be greater than -1, so that the final resize factor (1 + scale) stays positive and we don't end up with no image at all.
- Your first instinct might be to use a single scale value, which would mean width and height always grow or shrink by the same amount. But by adding a diff flag that controls whether the aspect ratio is preserved, we can let width and height be scaled independently.
- We need to end up with a scaling factor. If the user provides a range, the factor is sampled from that range. If the user provides only a single number, it must be positive, and the factor is sampled from the range (-scale, scale).
Let's define the `__init__` function first.
class RandomScale(object):
    """Randomly scales an image

    Bounding boxes which have less than 25% of their area remaining in the
    transformed image are dropped. The resolution is maintained, and the
    remaining area, if any, is filled with black.

    Parameters
    ----------
    scale: float or tuple(float)
        if **float**, the image is scaled by a factor drawn
        randomly from a range (1 - `scale`, 1 + `scale`). If **tuple**,
        the `scale` is drawn randomly from values specified by the
        tuple

    Returns
    -------
    numpy.ndarray
        Scaled image in the numpy format of shape `HxWxC`
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is
        the number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
    """

    def __init__(self, scale=0.2, diff=False):
        self.scale = scale

        if type(self.scale) == tuple:
            assert len(self.scale) == 2, "Invalid range"
            assert self.scale[0] > -1, "Scale factor can't be less than -1"
            assert self.scale[1] > -1, "Scale factor can't be less than -1"
        else:
            assert self.scale > 0, "Please input a positive float"
            self.scale = (max(-1, -self.scale), self.scale)

        self.diff = diff
One more note: don't decide on the actual scale factor inside `__init__`. If you fix it there, every image gets the exact same transform and the randomness is gone. Leave that work to `__call__`!
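To see why, here is a tiny sketch (not from the repo, hypothetical class names) contrasting the two choices: sampling in `__init__` bakes in one factor forever, while sampling in `__call__` gives a fresh factor for every image.
import random

# Hypothetical illustration, not part of the library.
class ScaleSampledInInit(object):
    def __init__(self, scale=0.2):
        # sampled once at construction: every call returns the SAME factor
        self.factor = random.uniform(-scale, scale)

    def __call__(self):
        return self.factor

class ScaleSampledInCall(object):
    def __init__(self, scale=0.2):
        self.scale = (-scale, scale)

    def __call__(self):
        # sampled per call: a fresh factor for every image
        return random.uniform(*self.scale)

fixed, fresh = ScaleSampledInInit(), ScaleSampledInCall()
print([round(fixed(), 3) for _ in range(3)])  # the same number three times
print([round(fresh(), 3) for _ in range(3)])  # three different numbers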
Augmentation logic
The image part is simple: `cv2.resize` does it in one call. All we really have to do is adjust the bounding boxes.
img_shape = img.shape

if self.diff:
    # sample the width and height scales independently
    scale_x = random.uniform(*self.scale)
    scale_y = random.uniform(*self.scale)
else:
    scale_x = random.uniform(*self.scale)
    scale_y = scale_x

resize_scale_x = 1 + scale_x
resize_scale_y = 1 + scale_y

img = cv2.resize(img, None, fx=resize_scale_x, fy=resize_scale_y)

bboxes[:, :4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]
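As a quick sanity check of that last line, here is what it does to a single box, with made-up numbers (the fifth column is a class label and is left untouched):
import numpy as np

# Illustrative numbers only: one box in x1, y1, x2, y2, class format.
bboxes = np.array([[20., 30., 100., 120., 0.]])
resize_scale_x, resize_scale_y = 1.2, 0.8

bboxes[:, :4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]
print(bboxes)  # [[ 24.  24. 120.  96.   0.]]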
Keep in mind that the output must have the same size as the input. If the scale factor is negative, i.e. the image shrinks, the leftover area is padded with black (zero) pixels, just like in the example above.
The code above only performs the resize; next we make sure the result has the same size as the original input. Take your time with this part!
First, create an empty image (a canvas) of the same size as the input.
canvas = np.zeros(img_shape, dtype = np.uint8)
Then place the resized image onto the canvas. That takes care of the image part!
y_lim = int(min(resize_scale_y,1)*img_shape[0])
x_lim = int(min(resize_scale_x,1)*img_shape[1])
canvas[:y_lim,:x_lim,:] = img[:y_lim,:x_lim,:]
img = canvas
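Here is a small sketch, with made-up dimensions, of how y_lim and x_lim behave when one side grows and the other shrinks: the enlarged dimension gets cropped back to the original size, while the shrunken one keeps everything and leaves the rest of the canvas black.
import numpy as np
import cv2

# Toy example: a 400x600 image scaled by -20% in x and +20% in y.
img_shape = (400, 600, 3)
img = np.full(img_shape, 255, dtype=np.uint8)
resize_scale_x, resize_scale_y = 0.8, 1.2

img = cv2.resize(img, None, fx=resize_scale_x, fy=resize_scale_y)  # now 480 x 480

canvas = np.zeros(img_shape, dtype=np.uint8)
y_lim = int(min(resize_scale_y, 1) * img_shape[0])  # 400: the enlarged height is cropped
x_lim = int(min(resize_scale_x, 1) * img_shape[1])  # 480: the shrunken width fits entirely
canvas[:y_lim, :x_lim, :] = img[:y_lim, :x_lim, :]

print(canvas.shape)  # (400, 600, 3) -- same size as the input, right strip left black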
Fixing the bounding boxes
The last thing is to make sure the boxes stay valid. For example, suppose the scale factor is 0.1, so the image is enlarged by a factor of 1.1; as the figure below shows, this can push the football out of the image entirely. What should we do then?
And no, this isn't specific to scaling: the same issue shows up in other augmentations too, which is exactly why it matters!
That's why bbox_utils.py defines a clip_box function that keeps the boxes within the image. It takes a configurable threshold: we compare the fraction of each box's area that remains inside the image against this threshold to decide whether the box should be dropped.
It is defined as follows:
def clip_box(bbox, clip_box, alpha):
    """Clip the bounding boxes to the borders of an image

    Parameters
    ----------
    bbox: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    clip_box: numpy.ndarray
        An array of shape (4,) specifying the diagonal co-ordinates of the image
        The co-ordinates are represented in the format `x1 y1 x2 y2`
    alpha: float
        If the fraction of a bounding box left in the image after being clipped is
        less than `alpha` the bounding box is dropped.

    Returns
    -------
    numpy.ndarray
        Numpy array containing **clipped** bounding boxes of shape `N X 4` where N is the
        number of bounding boxes left after clipping and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    """
    ar_ = (bbox_area(bbox))
    x_min = np.maximum(bbox[:, 0], clip_box[0]).reshape(-1, 1)
    y_min = np.maximum(bbox[:, 1], clip_box[1]).reshape(-1, 1)
    x_max = np.minimum(bbox[:, 2], clip_box[2]).reshape(-1, 1)
    y_max = np.minimum(bbox[:, 3], clip_box[3]).reshape(-1, 1)

    bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:, 4:]))

    delta_area = ((ar_ - bbox_area(bbox)) / ar_)

    mask = (delta_area < (1 - alpha)).astype(int)

    bbox = bbox[mask == 1, :]

    return bbox
That function does exactly what we need.
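To make the area check concrete, here is a hand-worked example with made-up numbers: a 100x100 box hanging 40 pixels off the left edge of a 200x200 image keeps 60% of its area, more than the 25% threshold, so it survives.
import numpy as np

# Illustrative numbers: one 100x100 box sticking 40 px past the left edge of a 200x200 image.
box = np.array([[-40., 10., 60., 110., 0.]])
orig_area = (box[:, 2] - box[:, 0]) * (box[:, 3] - box[:, 1])                  # 10000

clipped = box.copy()
clipped[:, 0] = np.maximum(clipped[:, 0], 0)                                   # x1 clamped to 0
new_area = (clipped[:, 2] - clipped[:, 0]) * (clipped[:, 3] - clipped[:, 1])   # 6000

delta_area = (orig_area - new_area) / orig_area                                # 0.4 of the area was lost
print(delta_area < (1 - 0.25))                                                 # [ True] -> the box is kept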
If you'd rather not study the source, that's fine: we simply call it from the augmentation's `__call__`. Here the threshold is set so that boxes keeping less than 25% of their area are dropped, and `img_shape` is the size of the output image.
bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)
To compute box areas, a small helper function `bbox_area` is also used.
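`bbox_area` isn't reproduced in this article; here is a minimal sketch of what such a helper looks like (the repo's version may differ in details):
def bbox_area(bbox):
    # bbox: N x 4+ array in x1, y1, x2, y2 (, ...) format
    return (bbox[:, 2] - bbox[:, 0]) * (bbox[:, 3] - bbox[:, 1])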
Putting it all together, our final `__call__` looks like this:
def __call__(self, img, bboxes):
    # Choose a random amount to scale by
    img_shape = img.shape

    if self.diff:
        scale_x = random.uniform(*self.scale)
        scale_y = random.uniform(*self.scale)
    else:
        scale_x = random.uniform(*self.scale)
        scale_y = scale_x

    resize_scale_x = 1 + scale_x
    resize_scale_y = 1 + scale_y

    img = cv2.resize(img, None, fx=resize_scale_x, fy=resize_scale_y)

    bboxes[:, :4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

    canvas = np.zeros(img_shape, dtype=np.uint8)

    y_lim = int(min(resize_scale_y, 1) * img_shape[0])
    x_lim = int(min(resize_scale_x, 1) * img_shape[1])

    canvas[:y_lim, :x_lim, :] = img[:y_lim, :x_lim, :]

    img = canvas
    bboxes = clip_box(bboxes, [0, 0, 1 + img_shape[1], img_shape[0]], 0.25)

    return img, bboxes
Translate
Next up, translation.
Again, we need a factor that determines how far the image is shifted, as a fraction of its dimensions.
This time the factor has to be greater than -1 and less than 1; otherwise the image would be shifted completely out of the frame and we'd be left with a blank canvas!
That check is essentially all the `__init__` function does.
class RandomTranslate(object):
    """Randomly translates the image

    Bounding boxes which have less than 25% of their area remaining in the
    transformed image are dropped. The resolution is maintained, and the
    remaining area, if any, is filled with black.

    Parameters
    ----------
    translate: float or tuple(float)
        if **float**, the image is translated by a factor drawn
        randomly from a range (-`translate`, `translate`). If **tuple**,
        `translate` is drawn randomly from values specified by the
        tuple

    Returns
    -------
    numpy.ndarray
        Translated image in the numpy format of shape `HxWxC`
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is
        the number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
    """

    def __init__(self, translate=0.2, diff=False):
        self.translate = translate

        if type(self.translate) == tuple:
            assert len(self.translate) == 2, "Invalid range"
            assert 0 < self.translate[0] < 1, "Translate factor must lie between 0 and 1"
            assert 0 < self.translate[1] < 1, "Translate factor must lie between 0 and 1"
        else:
            assert 0 < self.translate < 1, "Translate factor must lie between 0 and 1"
            self.translate = (-self.translate, self.translate)

        self.diff = diff
Augmentation logic
Translation has a few more moving parts than scaling, so bear with me.
First, set up a few variables.
def __call__(self, img, bboxes):
    # Choose random amounts to translate by
    img_shape = img.shape

    # translate the image
    # percentage of the dimension of the image to translate
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x
Translation also leaves some empty space behind, which we again fill with black. If you followed the scaling section closely, this will feel familiar.
As before, start by defining a canvas:
canvas = np.zeros(img_shape)
Now there are two questions:
- Which part of the input image do we keep?
- Where on the canvas does that part go?
Let's first work out which part of the input survives, i.e. the purple region in the left image above.
#get the top-left corner co-ordinates of the shifted image
corner_x = int(translate_factor_x*img.shape[1])
corner_y = int(translate_factor_y*img.shape[0])
mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]
Then we place that region onto the canvas. The indexing here is the clever part, so read it carefully!
orig_box_cords = [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]
canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
img = canvas
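If the indexing feels opaque, here is a toy walk-through with made-up numbers: a 100x100 image shifted 20 px right and 30 px up keeps its bottom-left 70x80 region, which lands in the top-right part of the canvas.
import numpy as np

# Toy numbers: a 100x100x3 image shifted 20 px right (corner_x) and 30 px up (corner_y).
img_shape = (100, 100, 3)
img = np.full(img_shape, 255, dtype=np.uint8)
corner_x, corner_y = 20, -30

canvas = np.zeros(img_shape, dtype=np.uint8)
mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]),
           max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]), :]
print(mask.shape)  # (70, 80, 3) -> rows 30:100, cols 0:80 of the input survive

orig_box_cords = [max(0, corner_y), max(corner_x, 0),
                  min(img_shape[0], corner_y + img.shape[0]),
                  min(img_shape[1], corner_x + img.shape[1])]
print(orig_box_cords)  # [0, 20, 70, 100] -> that region lands in the top-right of the canvas

canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3], :] = mask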
If the image part made sense, relax: adjusting the boxes is easy. We just add the shift offsets to the coordinates, and then remember to run them through `clip_box`!
bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]
bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)
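Continuing the same toy numbers as above, the box update is just an offset followed by the clip: a box at (50, 60, 90, 95) moves to (70, 30, 110, 65), and `clip_box` would then clamp x2 to the image width and check how much area is left.
import numpy as np

# Same made-up offsets as above: 20 px right, 30 px up, on a 100x100 image.
bboxes = np.array([[50., 60., 90., 95., 0.]])
corner_x, corner_y = 20, -30

bboxes[:, :4] += [corner_x, corner_y, corner_x, corner_y]
print(bboxes)  # [[ 70.  30. 110.  65.   0.]] -- x2 now pokes 10 px past the right edge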
So the full `__call__` is defined as follows:
def __call__(self, img, bboxes):
    # Choose random amounts to translate by
    img_shape = img.shape

    # translate the image
    # percentage of the dimension of the image to translate
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)

    if not self.diff:
        translate_factor_y = translate_factor_x

    canvas = np.zeros(img_shape).astype(np.uint8)

    corner_x = int(translate_factor_x * img.shape[1])
    corner_y = int(translate_factor_y * img.shape[0])

    # change the origin to the top-left corner of the translated box
    orig_box_cords = [max(0, corner_y), max(corner_x, 0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1], corner_x + img.shape[1])]

    mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]), :]
    canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3], :] = mask
    img = canvas

    bboxes[:, :4] += [corner_x, corner_y, corner_x, corner_y]
    bboxes = clip_box(bboxes, [0, 0, img_shape[1], img_shape[0]], 0.25)

    return img, bboxes
Testing
Let's look at the test code and results. If you want to use your own annotations, make sure they follow the format described in Part 1.
import matplotlib.pyplot as plt
from data_aug.bbox_utils import *   # bounding-box helpers such as clip_box and draw_rect

# img (HxWxC array) and bboxes (n x 5 array: x1, y1, x2, y2, class) are assumed
# to be loaded as in Part 1; RandomScale and RandomTranslate are defined above.
scale = RandomScale(0.2, diff=True)
translate = RandomTranslate(0.2, diff=True)

img, bboxes = translate(img, bboxes)
img, bboxes = scale(img, bboxes)

plt.imshow(draw_rect(img, bboxes))
plt.show()
The final result is shown in the figure above.
That's it for this session. See you next time!
But wait: have you thought about whether translating first and then scaling gives the same result as scaling first and then translating? Give it some thought!
Next time we'll cover rotation and shearing. See you then!