SSD-keras ulti_boxes 代码解读

本文详细解析了SSD网络在目标检测领域的应用,包括不同类型的boxes(如groundtruthbox、priorbox等)及其作用,同时介绍了IoU计算方法,并通过源代码展示了SSD网络的工作流程。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

SSD网络做目标检测领域,速度快,实时性好,而且检测精度也很高(mAP为74.3)。作为一次检测网络,效果是非常不错的。我们今天的主要任务就是解读ssd网络中的各种boxes及相关的源代码。

1.原文中出现的各种box解读

ssd之所以能预测图片的位置,是因为ssd对一幅图片的坐标(xmin,ymin,xmax,ymax)进行回归运算。任何一个图片的区域,由左上角top-left和右下角bottom-right即可唯一确定。

gournd truth box:是你训练集中,标注好的待检测类别的的位置;即真实的位置。

prior box:是在feature map上每一个点上生成的某一类别图片的位置。feature map没一点生成4或6个box(这个到底生成多少box,数量是事先指定的)。每一个box长宽比例是从[1,2,1/2,3,1/3]和sqrt(smax*smin)中选择。

default box:则是经过IoU阈值筛选后,剩下的可能性高的box。这个box才是会被真正送去回归(做回归前,先要对ground truth box进行相关的表变换,变换过程为:有顶点坐标式→(x-central,y-central,w,h)→再转换为对应特征图下的尺寸)。这个过程也被称为encode过程。原文中的转换公式如下:

2.部分算法原理介绍

IoU计算:2个矩形的交集怎么计算呢?其实很简单,只需要确定相交区域的左上角顶点和右下角顶点即可。

交集的左上角顶点确定方法如下:

Ixmin=max(xmin_A,xmin_B); xmin_A,xmin_B分别表示矩形区域A,B的左上角顶点横坐标,后面自行类比,不再赘述。

Ixmin=max(ymin_A,ymin_B);

Ixmax=min(ymin_A,ymin_B);

Ixmax=min(ymin_A,ymin_B);

自己看不懂,可以画个草图演算,看究竟是不是这样。

3.源码解读

"""Some utils for SSD."""

import numpy as np
import tensorflow as tf


class BBoxUtility(object):
    """Utility class to do some stuff with bounding boxes and priors.
    # Arguments
        num_classes: Number of classes including background.
        priors: Priors and variances, numpy tensor of shape (num_priors, 8),
            priors[i] = [xmin, ymin, xmax, ymax, varxc, varyc, varw, varh].
        overlap_threshold: Threshold to assign box to a prior.
        nms_thresh: Nms threshold.
        top_k: Number of total bboxes to be kept per image after nms step.
    # References
        https://arxiv.org/abs/1512.02325
    """
    # TODO add setter methods for nms_thresh and top_K
    def __init__(self, num_classes, priors=None, overlap_threshold=0.5,
                 nms_thresh=0.45, top_k=400):
        self.num_classes = num_classes #预测的类别数
        self.priors = priors
        self.num_priors = 0 if priors is None else len(priors)
        self.overlap_threshold = overlap_threshold
        self._nms_thresh = nms_thresh
        self._top_k = top_k 
        self.boxes = tf.placeholder(dtype='float32', shape=(None, 4))
        self.scores = tf.placeholder(dtype='float32', shape=(None,))
         #非极大值抑制
        self.nms = tf.image.non_max_suppression(self.boxes, self.scores,
                                                self._top_k,
                                                iou_threshold=self._nms_thresh) 
        self.sess = tf.Session(config=tf.ConfigProto(device_count={'GPU': 0}))

    @property
    def nms_thresh(self):
        return self._nms_thresh

    @nms_thresh.setter
    def nms_thresh(self, value):
        self._nms_thresh = value
        self.nms = tf.image.non_max_suppression(self.boxes, self.scores,
                                                self._top_k,
                                                iou_threshold=self._nms_thresh)

    @property
    def top_k(self):
        return self._top_k

    @top_k.setter
    def top_k(self, value):
        self._top_k = value
        self.nms = tf.image.non_max_suppression(self.boxes, self.scores,
                                                self._top_k,
                                                iou_threshold=self._nms_thresh)

    '''
    下面这个函数是计算IoU.计算IoU,需要计算两个矩形的交集和并集
    '''
    def iou(self, box):
        """Compute intersection over union for the box with all priors.
        # Arguments
            box: Box, numpy tensor of shape (4,).
        # Return
            iou: Intersection over union,
                numpy tensor of shape (num_priors).
        """
        # compute intersection
        inter_upleft = np.maximum(self.priors[:, :2], box[:2])#计算左顶点xmin,ymin
        inter_botright = np.minimum(self.priors[:, 2:4], box[2:])#右顶点xmax,ymax
        inter_wh = inter_botright - inter_upleft
        inter_wh = np.maximum(inter_wh, 0)
        inter = inter_wh[:, 0] * inter_wh[:, 1]
        # compute union
        area_pred = (box[2] - box[0]) * (box[3] - box[1])
        area_gt = (self.priors[:, 2] - self.priors[:, 0])
        area_gt *= (self.priors[:, 3] - self.priors[:, 1])
        union = area_pred + area_gt - inter
        # compute iou
        iou = inter / union
        return iou

    '''
    对box进行编码,实际上就是把gt box的坐标转换到feature map下,后面进行回归运算
    特别注意:这个转换分为两个步骤,在前面第1部分已经介绍过了
    '''
    def encode_box(self, box, return_iou=True):
        """Encode box for training, do it only for assigned priors.
        # Arguments
            box: Box, numpy tensor of shape (4,).
            return_iou: Whether to concat iou to encoded values.
        # Return
            encoded_box: Tensor with encoded box
                numpy tensor of shape (num_priors, 4 + int(return_iou)).
        """
        iou = self.iou(box)
        encoded_box = np.zeros((self.num_priors, 4 + return_iou))
        '''assign_mask为与iou shape 相同的tensor'''
        '''overlap_threshold为设定的iou阈值,一般为0.5'''
        assign_mask = iou > self.overlap_threshold
        '''assign_mask都为0,则把iou中最大的那一个取出来,即相应位置设为true'''
        if not assign_mask.any():'''assign_mask都为0,则把iou中最大的那一个取出来'''
            assign_mask[iou.argmax()] = True
        if return_iou:'''把他们连接起来,怎么连,这个句法我还没有弄懂'''
            encoded_box[:, -1][assign_mask] = iou[assign_mask]
        assigned_priors = self.priors[assign_mask]
        box_center = 0.5 * (box[:2] + box[2:]) '''计算x_central,ycentral'''
        box_wh = box[2:] - box[:2] '''计算W,H'''
        assigned_priors_center = 0.5 * (assigned_priors[:, :2] +
                                        assigned_priors[:, 2:4])'''default box'''
        assigned_priors_wh = (assigned_priors[:, 2:4] -
                              assigned_priors[:, :2])'''default box,计算W,H'''
        '''以下就是第一部分的转换公式'''
        # we encode variance
        encoded_box[:, :2][assign_mask] = box_center - assigned_priors_center
        encoded_box[:, :2][assign_mask] /= assigned_priors_wh
        encoded_box[:, :2][assign_mask] /= assigned_priors[:, -4:-2]
        encoded_box[:, 2:4][assign_mask] = np.log(box_wh /
                                                  assigned_priors_wh)
        encoded_box[:, 2:4][assign_mask] /= assigned_priors[:, -2:]
        return encoded_box.ravel()

    def assign_boxes(self, boxes):
        """Assign boxes to priors for training.
        # Arguments
            boxes: Box, numpy tensor of shape (num_boxes, 4 + num_classes),
                num_classes without background.
        # Return
            assignment: Tensor with assigned boxes,
                numpy tensor of shape (num_boxes, 4 + num_classes + 8),
                priors in ground truth are fictitious,
                assignment[:, -8] has 1 if prior should be penalized
                    or in other words is assigned to some ground truth box,
                assignment[:, -7:] are all 0. See loss for more details.
        """
        assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))
        assignment[:, 4] = 1.0
        if len(boxes) == 0:
            return assignment
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        best_iou_mask = best_iou > 0
        best_iou_idx = best_iou_idx[best_iou_mask]
        assign_num = len(best_iou_idx)
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]
        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,
                                                         np.arange(assign_num),
                                                         :4]
        assignment[:, 4][best_iou_mask] = 0
        assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]
        assignment[:, -8][best_iou_mask] = 1
        return assignment

    def decode_boxes(self, mbox_loc, mbox_priorbox, variances):
        """Convert bboxes from local predictions to shifted priors.
        # Arguments
            mbox_loc: Numpy array of predicted locations.
            mbox_priorbox: Numpy array of prior boxes.
            variances: Numpy array of variances.
        # Return
            decode_bbox: Shifted priors.
        """
        '''
        这一部分就是前面第一部分的公式,反求出预测的坐标,在原图中的位置
        注意:这里反求解过程中,乘以了variance。至于原因,暂不清楚
        最后的格式也是要由(x-cetral,ycentral,w,h)转换为(xmin,ymin,xmax,ymax)格式
        '''
        prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
        prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]
        prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
        prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])
        decode_bbox_center_x = mbox_loc[:, 0] * prior_width * variances[:, 0]
        decode_bbox_center_x += prior_center_x
        decode_bbox_center_y = mbox_loc[:, 1] * prior_width * variances[:, 1]
        decode_bbox_center_y += prior_center_y
        decode_bbox_width = np.exp(mbox_loc[:, 2] * variances[:, 2])
        decode_bbox_width *= prior_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] * variances[:, 3])
        decode_bbox_height *= prior_height
        decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
        decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
        decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
        decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height
        decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                      decode_bbox_ymin[:, None],
                                      decode_bbox_xmax[:, None],
                                      decode_bbox_ymax[:, None]), axis=-1)
        decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
        return decode_bbox

    def detection_out(self, predictions, background_label_id=0, keep_top_k=200,
                      confidence_threshold=0.01):
        """Do non maximum suppression (nms) on prediction results.
        # Arguments
            predictions: Numpy array of predicted values.
            num_classes: Number of classes for prediction.
            background_label_id: Label of background class.
            keep_top_k: Number of total bboxes to be kept per image
                after nms step.
            confidence_threshold: Only consider detections,
                whose confidences are larger than a threshold.
        # Return
            results: List of predictions for every picture. Each prediction is:
                [label, confidence, xmin, ymin, xmax, ymax]
        """
        '''
        上面 prediction为ndarray形式的多位数组,第一维表示label,第二维为得分,
        第三维为预测坐标(4个值)
        '''
        mbox_loc = predictions[:, :, :4] '''取出坐标'''
        variances = predictions[:, :, -4:]'''这个好像与上面的内容一样'''
        mbox_priorbox = predictions[:, :, -8:-4]
        mbox_conf = predictions[:, :, 4:-8]
        results = []
        for i in range(len(mbox_loc)):
            results.append([])
            decode_bbox = self.decode_boxes(mbox_loc[i],
                                            mbox_priorbox[i], variances[i])
            for c in range(self.num_classes):
                if c == background_label_id:
                    continue
                c_confs = mbox_conf[i, :, c]
                c_confs_m = c_confs > confidence_threshold
                if len(c_confs[c_confs_m]) > 0:
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]
                    feed_dict = {self.boxes: boxes_to_process,
                                 self.scores: confs_to_process}
                    idx = self.sess.run(self.nms, feed_dict=feed_dict)
                    good_boxes = boxes_to_process[idx]
                    confs = confs_to_process[idx][:, None]
                    labels = c * np.ones((len(idx), 1))
                    c_pred = np.concatenate((labels, confs, good_boxes),
                                            axis=1)
                    results[-1].extend(c_pred)
            if len(results[-1]) > 0:
                results[-1] = np.array(results[-1])
                argsort = np.argsort(results[-1][:, 1])[::-1]
                results[-1] = results[-1][argsort]
                results[-1] = results[-1][:keep_top_k]
        return results

 

### MobileNet SSD Implementation in Keras Tutorial and Usage #### Overview of MobileNet SSD with Keras MobileNet SSD (Single Shot MultiBox Detector) is a highly efficient object detection model that combines the lightweight architecture of MobileNets with the fast single-shot detection framework. This combination allows for real-time object detection on mobile devices while maintaining reasonable accuracy levels[^1]. #### Key Components of MobileNet SSD Model The core components include: - **Base Network**: Utilizes MobileNet as the backbone feature extractor. - **Detection Layers**: Adds several convolutional layers on top of MobileNet to predict bounding boxes and class scores. #### Installation Requirements To implement MobileNet SSD using Keras, ensure these dependencies are installed: ```bash pip install tensorflow keras opencv-python h5py numpy matplotlib ``` #### Loading Pre-trained Weights For convenience, pre-trained weights can be loaded directly from public repositories or trained models available online. Here’s how one might load such a model: ```python from keras.models import load_model model_path = 'path_to_mobilenet_ssd_weights.hdf5' ssd_model = load_model(model_path) ``` #### Data Preparation Data preparation involves converting images into tensors suitable for feeding into the network. For instance: ```python import cv2 import numpy as np def preprocess_image(image_path): img = cv2.imread(image_path) resized_img = cv2.resize(img, (300, 300)) input_data = np.array(resized_img).astype('float32') input_data -= [123, 117, 104] # Mean subtraction based on ImageNet statistics input_data = np.expand_dims(input_data, axis=0) return input_data ``` #### Performing Inference Once everything is set up, performing inference becomes straightforward: ```python input_tensor = preprocess_image('example.jpg') predictions = ssd_model.predict(input_tensor) for pred in predictions[0]: confidence = pred[-1] if confidence >= 0.5: # Confidence threshold label_id = int(pred[-2]) box_coords = pred[:4] * [width, height, width, height] print(f'Detected {label_id} at coordinates {box_coords}') ``` --related questions-- 1. How does MobileNet differ from other CNN architectures used in object detection? 2. What optimizations techniques exist specifically for deploying MobileNet SSD on edge devices? 3. Can you provide an example of fine-tuning a pre-trained MobileNet SSD model on custom datasets within Keras? 4. Are there any specific considerations when choosing between TensorFlow and PyTorch implementations of MobileNet SSD?
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值