Deep Learning: Studying the SSD Object Detection Source Code, Image Preprocessing

License: CC 4.0 BY-SA

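This article walks through the image preprocessing steps in the SSD network: reading the training data, image augmentation (cropping, flipping, and color distortion), and updating the bounding-box annotations. It pays particular attention to how box coordinates are adjusted and filtered after an image is cropped.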

I took some time to study this network in detail. Corrections from more experienced readers are welcome.

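Rough understanding: SSD extracts feature maps at several scales. Each feature map can be viewed as a grid in which every cell is an anchor point. Centered on each anchor point, anchors of different sizes and aspect ratios are generated, and every anchor is a candidate object. A detection network has two parts, localization and classification. Classification is simple: every anchor at every location of every feature map is classified, and SSD treats the background as a class of its own. Localization involves bounding-box regression, which first appeared in R-CNN. The idea is that the network predicts a regression value for every anchor at every location of every feature map, while the true regression value can be computed from the ground-truth position of the object. The deviation between the predicted and true values is then what training minimizes.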
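To make the "true regression value" a little more concrete, here is a minimal sketch of the center-size offset encoding introduced with R-CNN, written in plain Python. The helper names (`to_center_size`, `encode_box`, `decode_box`) are my own for illustration and are not taken from the SSD source; the SSD code additionally rescales these offsets by prior scaling factors, which this sketch leaves out.

```python
import math


def to_center_size(box):
    """Convert [ymin, xmin, ymax, xmax] into (cy, cx, h, w)."""
    ymin, xmin, ymax, xmax = box
    h, w = ymax - ymin, xmax - xmin
    return ymin + h / 2.0, xmin + w / 2.0, h, w


def encode_box(anchor, gt_box):
    """Regression targets that would move `anchor` onto `gt_box`."""
    a_cy, a_cx, a_h, a_w = to_center_size(anchor)
    g_cy, g_cx, g_h, g_w = to_center_size(gt_box)
    return ((g_cy - a_cy) / a_h,      # center shift, relative to anchor height
            (g_cx - a_cx) / a_w,      # center shift, relative to anchor width
            math.log(g_h / a_h),      # log height ratio
            math.log(g_w / a_w))      # log width ratio


def decode_box(anchor, offsets):
    """Inverse of encode_box: apply offsets to an anchor."""
    a_cy, a_cx, a_h, a_w = to_center_size(anchor)
    t_cy, t_cx, t_h, t_w = offsets
    cy, cx = t_cy * a_h + a_cy, t_cx * a_w + a_cx
    h, w = math.exp(t_h) * a_h, math.exp(t_w) * a_w
    return [cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0]


# Hypothetical anchor and ground-truth box, normalized to [0, 1].
anchor = [0.2, 0.2, 0.6, 0.6]
gt_box = [0.25, 0.3, 0.75, 0.7]
offsets = encode_box(anchor, gt_box)
restored = decode_box(anchor, offsets)
```

Decoding the encoded offsets gives back the ground-truth box, which is why the network can be trained to predict the offsets instead of raw coordinates.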

1. Reading the training data

In the source code, the training data is read by the code in the datasets folder. Reading the data takes the following few lines:

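# dataset_factory lives in the repository's datasets package.
from datasets import dataset_factory

dataset_dir = './datasets/train/'
dataset_split_name = 'train'
dataset_name = 'pascalvoc_2012'
dataset = dataset_factory.get_dataset(dataset_name, dataset_split_name, dataset_dir)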

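Inside dataset_factory you can see three kinds of datasets: cifar, imagenet, and pascalvoc. Here I use the VOC2012 format to build and read my training data. The source code's conversion of VOC data to TFRecord files is straightforward, so I will not describe it here.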

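You can open pascalvoc_2012.py to see the corresponding configuration. Its get_split function reads the data mainly by constructing a slim.dataset.Dataset object. For details, see the article "Reading data from disk in TensorFlow" (tensorflow从磁盘读取数据).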

2. Data preprocessing

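Preprocessing for deep-learning image tasks is mostly image augmentation: typically cropping, random brightness, random contrast, whitening, and so on. Because SSD also carries bounding-box annotations, the box information has to be updated accordingly after the image is cropped.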

In the source code, data processing lives in the preprocessing folder, where preprocessing_factory.py selects which preprocessing method to use.

While reading the source, I noticed that Python allows a function to return another function directly:

def get_preprocessing(name, is_training=False):
    
    preprocessing_fn_map = {
        'ssd_300_vgg': ssd_vgg_preprocessing,
        'ssd_512_vgg': ssd_vgg_preprocessing,
    }

    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image, labels, bboxes,
                         out_shape, data_format='NHWC', **kwargs):
        return preprocessing_fn_map[name].preprocess_image(
            image, labels, bboxes, out_shape, data_format=data_format,
            is_training=is_training, **kwargs)
    return preprocessing_fn

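I looked this usage up. Functions in Python are first-class objects, so they can be passed around and returned like any other value. Calling get_preprocessing from outside therefore does not run any preprocessing. It returns the inner function preprocessing_fn, a closure that remembers name and is_training: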

image_preprocessing_fn = preprocessing_factory.get_preprocessing(
        preprocessing_name, is_training=True)

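Personally I think this pattern defers the actual call and makes it easy to check the arguments before anything executes, although the layers of indirection are confusing when reading the source. For a fuller explanation, see the Zhihu question "Why can a Python function return a function defined inside it?" (Python 里为什么函数可以返回一个函数内部定义的函数？).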
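As a small self-contained illustration of this pattern, the sketch below mimics the factory with toy stand-ins. The names `preprocess_for_train`, `preprocess_for_eval`, and `'toy_model'` here are placeholders that operate on strings, not the real SSD functions.

```python
def preprocess_for_train(image):
    """Toy stand-in for the training-time pipeline."""
    return 'train:' + image


def preprocess_for_eval(image):
    """Toy stand-in for the evaluation-time pipeline."""
    return 'eval:' + image


def get_preprocessing(name, is_training=False):
    fn_map = {'toy_model': (preprocess_for_train, preprocess_for_eval)}

    # The name is validated when the factory is called, before any image
    # is ever processed.
    if name not in fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image):
        # `name` and `is_training` are captured from the enclosing call.
        train_fn, eval_fn = fn_map[name]
        return train_fn(image) if is_training else eval_fn(image)

    # Return the function object itself; nothing has been preprocessed yet.
    return preprocessing_fn


train_preprocess = get_preprocessing('toy_model', is_training=True)
eval_preprocess = get_preprocessing('toy_model')
print(train_preprocess('img'))  # train:img
print(eval_preprocess('img'))   # eval:img
```

The two returned functions share the same code but remember different values of `is_training`, which is exactly what lets the training script configure preprocessing once and call it later.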

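Now for the concrete preprocessing method. The function that preprocessing_fn ends up calling is preprocess_image, and you can see that training and evaluation use different preprocessing: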

def preprocess_image(image,
                     labels,
                     bboxes,
                     out_shape,
                     data_format,
                     is_training=False,
                     **kwargs):

    if is_training:
        return preprocess_for_train(image, labels, bboxes,
                                    out_shape=out_shape,
                                    data_format=data_format)
    else:
        return preprocess_for_eval(image, labels, bboxes,
                                   out_shape=out_shape,
                                   data_format=data_format,
                                   **kwargs)

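The training-time preprocessing pipeline is: 1. crop the image; 2. randomly flip it left-right; 3. distort the colors; 4. whiten. The slightly troublesome part is that the bboxes must be updated to match after both the crop and the flip.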

First, the image crop:

def distorted_bounding_box_crop(image,
                                labels,
                                bboxes,
                                min_object_covered=0.3,
                                aspect_ratio_range=(0.9, 1.1),
                                area_range=(0.1, 1.0),
                                max_attempts=200,
                                clip_bboxes=True,
                                scope=None):
    
    with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bboxes]):
        # Each bounding box has shape [1, num_boxes, box coords] and
        # the coordinates are ordered [ymin, xmin, ymax, xmax].
        # Sample the box used to crop the image; it also serves as the reference
        # for recomputing the bboxes. bbox_begin is its top-left corner.
        bbox_begin, bbox_size, distort_bbox = tf.image.sample_distorted_bounding_box(
                tf.shape(image),
                bounding_boxes=tf.expand_dims(bboxes, 0),
                min_object_covered=min_object_covered,
                aspect_ratio_range=aspect_ratio_range,
                area_range=area_range,
                max_attempts=max_attempts,
                use_image_if_no_bounding_boxes=True)
        # The returned distort_bbox has shape [1, 1, 4], so extract the single box.
        distort_bbox = distort_bbox[0, 0]

        # Crop the image to the specified bounding box.
        cropped_image = tf.slice(image, bbox_begin, bbox_size)
        # Restore the shape since the dynamic slice loses 3rd dimension.
        cropped_image.set_shape([None, None, 3])

        # Update bounding boxes: resize and filter out.
        bboxes = tfe.bboxes_resize(distort_bbox, bboxes)
        labels, bboxes = tfe.bboxes_filter_overlap(labels, bboxes,
                                                   threshold=BBOX_CROP_OVERLAP,
                                                   assign_negative=False)
        return cropped_image, labels, bboxes, distort_bbox
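
To get a feel for what the sampling parameters mean, here is a simplified rejection-sampling sketch in plain Python. It is only an illustration of the constraints (area range, aspect ratio range, minimum object coverage, fallback to the whole image), under the assumption of a square image so that the aspect ratio can be taken directly in normalized coordinates. It is not the algorithm TensorFlow uses internally, and `sample_crop` and `covered_fraction` are hypothetical helper names.

```python
import random


def covered_fraction(crop, box):
    """Fraction of the area of `box` that lies inside `crop`."""
    h = max(min(crop[2], box[2]) - max(crop[0], box[0]), 0.0)
    w = max(min(crop[3], box[3]) - max(crop[1], box[1]), 0.0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return h * w / area if area > 0 else 0.0


def sample_crop(boxes, min_object_covered=0.3, aspect_ratio_range=(0.9, 1.1),
                area_range=(0.1, 1.0), max_attempts=200, rng=random):
    """Return a crop [ymin, xmin, ymax, xmax] in normalized coordinates."""
    full_image = [0.0, 0.0, 1.0, 1.0]
    if not boxes:
        return full_image
    for _ in range(max_attempts):
        area = rng.uniform(*area_range)           # fraction of the image area
        ratio = rng.uniform(*aspect_ratio_range)  # width / height
        h = (area / ratio) ** 0.5
        w = (area * ratio) ** 0.5
        if h > 1.0 or w > 1.0:
            continue                              # crop would not fit
        ymin = rng.uniform(0.0, 1.0 - h)
        xmin = rng.uniform(0.0, 1.0 - w)
        crop = [ymin, xmin, ymin + h, xmin + w]
        # Accept once at least one object is sufficiently covered.
        if any(covered_fraction(crop, b) >= min_object_covered for b in boxes):
            return crop
    return full_image                             # give up: use the whole image


boxes = [[0.1, 0.2, 0.6, 0.7]]
crop = sample_crop(boxes, rng=random.Random(0))
```

Whatever crop comes back, it either keeps at least 30% of some object or is the full image, which is the guarantee the later bbox update relies on.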

The bbox update is implemented in tf_extended. The first step is to update the bbox coordinates:

def bboxes_resize(bbox_ref, bboxes, name=None):
    
    # Bboxes is dictionary.
    if isinstance(bboxes, dict):
        with tf.name_scope(name, 'bboxes_resize_dict'):
            d_bboxes = {}
            for c in bboxes.keys():
                d_bboxes[c] = bboxes_resize(bbox_ref, bboxes[c])
            return d_bboxes

    # Tensors inputs.
    with tf.name_scope(name, 'bboxes_resize'):
        # Translate.
        # Equivalent to moving the origin from [0, 0] to [bbox_ref[0], bbox_ref[1]].
        v = tf.stack([bbox_ref[0], bbox_ref[1], bbox_ref[0], bbox_ref[1]])
        bboxes = bboxes - v
        # Scale.
        # Re-normalize by the height and width of the reference box.
        s = tf.stack([bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1],
                      bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1]])
        bboxes = bboxes / s
        return bboxes
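
A worked example in plain Python may help here. The helper `resize_box` below is my own single-box rewrite of the translate-then-scale arithmetic above, with made-up numbers.

```python
def resize_box(crop, box):
    """Re-express `box` in the coordinate system of `crop`.

    Both boxes are [ymin, xmin, ymax, xmax], normalized to the original image.
    """
    crop_h = crop[2] - crop[0]
    crop_w = crop[3] - crop[1]
    return [(box[0] - crop[0]) / crop_h,   # translate, then rescale each edge
            (box[1] - crop[1]) / crop_w,
            (box[2] - crop[0]) / crop_h,
            (box[3] - crop[1]) / crop_w]


crop = [0.25, 0.25, 0.75, 0.75]    # central crop covering half of each side
box = [0.5, 0.375, 1.0, 0.625]     # object reaching the bottom of the image
print(resize_box(crop, box))       # [0.5, 0.25, 1.5, 0.75]
```

The new ymax of 1.5 lies outside the [0, 1] range, which shows that part of the object fell outside the crop. Nothing is clipped at this step, so the overlap check that follows decides whether such a box is kept.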

Then the code checks whether any object has been cropped away too heavily and decides whether to keep it:

def bboxes_filter_overlap(labels, bboxes,
                          threshold=0.5, assign_negative=False,
                          scope=None):

    with tf.name_scope(scope, 'bboxes_filter', [labels, bboxes]):
        # Ratio of the area remaining after the crop to the original bbox area.
        scores = bboxes_intersection(tf.constant([0, 0, 1, 1], bboxes.dtype),
                                     bboxes)
        mask = scores > threshold
        # Keep every label and box, but negate the labels with too little overlap.
        if assign_negative:
            labels = tf.where(mask, labels, -labels)
            # bboxes = tf.where(mask, bboxes, bboxes)
        # Drop the labels and boxes with too little overlap.
        else:
            labels = tf.boolean_mask(labels, mask)
            bboxes = tf.boolean_mask(bboxes, mask)
        return labels, bboxes

def bboxes_intersection(bbox_ref, bboxes, name=None):
    with tf.name_scope(name, 'bboxes_intersection'):
        # Should be more efficient to first transpose.
        bboxes = tf.transpose(bboxes)
        bbox_ref = tf.transpose(bbox_ref)
        # Intersection bbox and volume.
        int_ymin = tf.maximum(bboxes[0], bbox_ref[0])
        int_xmin = tf.maximum(bboxes[1], bbox_ref[1])
        int_ymax = tf.minimum(bboxes[2], bbox_ref[2])
        int_xmax = tf.minimum(bboxes[3], bbox_ref[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        bboxes_vol = (bboxes[2] - bboxes[0]) * (bboxes[3] - bboxes[1])
        scores = tfe_math.safe_divide(inter_vol, bboxes_vol, 'intersection')
        return scores
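
The same filtering logic can be sketched for plain Python lists. The helper names `intersection_score` and `filter_overlap` are mine, and the zero-area guard is my assumption about what a safe division should do. The threshold of 0.5 is the default from the function signature above.

```python
def intersection_score(ref, box):
    """Fraction of the area of `box` that lies inside `ref`."""
    h = max(min(box[2], ref[2]) - max(box[0], ref[0]), 0.0)
    w = max(min(box[3], ref[3]) - max(box[1], ref[1]), 0.0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    # Assumed behavior of a "safe" division: zero-area boxes score 0.
    return h * w / area if area > 0 else 0.0


def filter_overlap(labels, boxes, threshold=0.5):
    """Keep only the boxes whose visible fraction exceeds `threshold`."""
    crop = [0.0, 0.0, 1.0, 1.0]   # the cropped image in its own coordinates
    kept = [(label, box) for label, box in zip(labels, boxes)
            if intersection_score(crop, box) > threshold]
    return [label for label, _ in kept], [box for _, box in kept]


labels = [7, 12]
boxes = [[0.5, 0.25, 1.25, 0.75],   # two thirds of the box is still visible
         [-0.5, 0.0, 0.25, 0.5]]    # only one third is still visible
print(filter_overlap(labels, boxes))  # ([7], [[0.5, 0.25, 1.25, 0.75]])
```

Note that the score is not an IoU: the denominator is the area of the object box alone, so it measures how much of the object survived the crop.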

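Next, the horizontal flip. Along the x axis every coordinate x becomes 1 - x. Because this swaps the left and right edges, the new xmin is 1 - xmax and the new xmax is 1 - xmin: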

def random_flip_left_right(image, bboxes, seed=None):
    """Random flip left-right of an image and its bounding boxes.
    """
    def flip_bboxes(bboxes):
        """Flip bounding boxes coordinates.
        """
        bboxes = tf.stack([bboxes[:, 0], 1 - bboxes[:, 3],
                           bboxes[:, 2], 1 - bboxes[:, 1]], axis=-1)
        return bboxes

    # Random flip. Tensorflow implementation.
    with tf.name_scope('random_flip_left_right'):
        image = ops.convert_to_tensor(image, name='image')
        _Check3DImage(image, require_static=False)
        # Draw a random number in [0, 1) and compare it with 0.5.
        uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
        mirror_cond = math_ops.less(uniform_random, .5)
        # Flip image.
        # control_flow_ops.cond acts like an if-else statement.
        result = control_flow_ops.cond(mirror_cond,
                                       lambda: array_ops.reverse_v2(image, [1]),
                                       lambda: image)
        # Flip bboxes.
        bboxes = control_flow_ops.cond(mirror_cond,
                                       lambda: flip_bboxes(bboxes),
                                       lambda: bboxes)
        return fix_image_flip_shape(image, result), bboxes
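
The box arithmetic is easy to verify by hand. Below is a small plain-Python sketch with a toy one-row "image"; `flip_boxes_left_right` and `flip_image_left_right` are illustrative names of my own.

```python
def flip_boxes_left_right(boxes):
    """Mirror [ymin, xmin, ymax, xmax] boxes horizontally (normalized coords)."""
    return [[ymin, 1.0 - xmax, ymax, 1.0 - xmin]
            for ymin, xmin, ymax, xmax in boxes]


def flip_image_left_right(image):
    """Mirror a nested-list image by reversing every row."""
    return [row[::-1] for row in image]


image = [[1, 2, 3, 4]]                 # toy image: 1 row, 4 columns
boxes = [[0.25, 0.0, 0.75, 0.25]]      # object hugging the left edge

print(flip_image_left_right(image))    # [[4, 3, 2, 1]]
print(flip_boxes_left_right(boxes))    # [[0.25, 0.75, 0.75, 1.0]]
```

After the flip the object hugs the right edge, the y coordinates are unchanged, and xmin is still smaller than xmax. Flipping twice returns the original boxes.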

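The other two steps are fairly simple, so I will not describe them here. Preprocessing for evaluation mainly adds a box the size of the original image when there is no object, and it likewise uses cropping, whitening, and so on. I will fill in the details when I go through the evaluation code. That concludes the image preprocessing part of the SSD network.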