A First Look at the TensorFlow Object Detection API Source Code (2.1.2): FasterRCNNMetaArch._predict_second_stage

This post walks through the prediction path of FasterRCNNMetaArch in the TensorFlow Object Detection API: the softmax step, non_max_suppression for selecting valid boxes, and the reason for stop_gradient. It also covers formatting the groundtruth data, converting box coordinates to normalized coordinates, and how _mask_rcnn_box_predictor uses the block4 features for prediction, including _predict_boxes_and_classes and _predict_masks.


_postprocess_rpn

proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
    rpn_box_encodings, rpn_objectness_predictions_with_background,
    anchors, image_shape)

Softmax processing

rpn_objectness_softmax_without_background = tf.nn.softmax(
        rpn_objectness_predictions_with_background_batch)[:, :, 1]
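The `[:, :, 1]` slice keeps only the foreground (objectness) column of the softmax output. A minimal TF2 eager sketch with made-up logits:

```python
import tensorflow as tf

# Hypothetical RPN logits for one image and three anchors; column 0 is
# background, column 1 is objectness, matching the [:, :, 1] slice above.
logits = tf.constant([[[2.0, 0.0],
                       [0.0, 2.0],
                       [1.0, 1.0]]])
probs = tf.nn.softmax(logits)   # shape (1, 3, 2); each row sums to 1
objectness = probs[:, :, 1]     # drop the background column
```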

non_max_suppression step: outputs the valid boxes (in absolute coordinates) and their scores

(proposal_boxes, proposal_scores, _, _, _,
     num_proposals) = post_processing.batch_multiclass_non_max_suppression(
         tf.expand_dims(proposal_boxes, axis=2),
         tf.expand_dims(rpn_objectness_softmax_without_background,
                        axis=2),
         self._first_stage_nms_score_threshold,
         self._first_stage_nms_iou_threshold,
         self._first_stage_max_proposals,
         self._first_stage_max_proposals,
         clip_window=clip_window)
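batch_multiclass_non_max_suppression lives in the OD API's post_processing module; the core idea can be sketched per image with the stock tf.image.non_max_suppression (boxes, scores, and thresholds below are made up):

```python
import tensorflow as tf

# Boxes in [ymin, xmin, ymax, xmax] order, as in the OD API.
boxes = tf.constant([[0.0, 0.0, 1.0, 1.0],
                     [0.0, 0.0, 0.9, 0.9],   # heavy overlap with box 0
                     [2.0, 2.0, 3.0, 3.0]])
scores = tf.constant([0.9, 0.8, 0.7])
# Keep at most 2 boxes; suppress any box with IoU > 0.7 against a
# higher-scoring kept box (box 1 gets suppressed here).
keep = tf.image.non_max_suppression(
    boxes, scores, max_output_size=2, iou_threshold=0.7)
proposals = tf.gather(boxes, keep)
```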

stop_gradient:

if self._is_training:
      proposal_boxes = tf.stop_gradient(proposal_boxes)

stop_gradient is applied here. The reason: non_max_suppression discards a portion of the boxes, and the discarded boxes play no role in backpropagation.
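The effect of stop_gradient can be seen in a tiny TF2 sketch (made-up values): the wrapped tensor is treated as a constant, so no gradient flows back through it.

```python
import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = tf.stop_gradient(x) * 3.0  # treated as a constant: no gradient to x
    z = x * 3.0                    # gradient flows normally
grad = tape.gradient([y, z], x)    # only z contributes: dz/dx = 3
```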

self._format_groundtruth_data:

if not self._hard_example_miner:
        (groundtruth_boxlists, groundtruth_classes_with_background_list,
         _) = self._format_groundtruth_data(true_image_shapes)

Since the input image was preprocessed, its groundtruth data must undergo the corresponding preprocessing as well.


self._unpad_proposals_and_sample_box_classifier_batch:

(proposal_boxes, proposal_scores,
         num_proposals) = self._unpad_proposals_and_sample_box_classifier_batch(
             proposal_boxes, proposal_scores, num_proposals,
             groundtruth_boxlists, groundtruth_classes_with_background_list)

to_normalized_coordinates

def normalize_boxes(args):
  proposal_boxes_per_image = args[0]
  image_shape = args[1]
  normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
      box_list.BoxList(proposal_boxes_per_image), image_shape[0],
      image_shape[1], check_range=False).get()
  return normalized_boxes_per_image

normalized_proposal_boxes = shape_utils.static_or_dynamic_map_fn(
    normalize_boxes, elems=[proposal_boxes, image_shapes], dtype=tf.float32)
return normalized_proposal_boxes, proposal_scores, num_proposals

Converts absolute box coordinates to normalized coordinates in [0, 1].
Usually one uses the dynamic shape of the image or conv-layer tensor:

boxlist = box_list_ops.to_normalized_coordinates(boxlist,
                                                tf.shape(images)[1],
                                                tf.shape(images)[2])
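The same arithmetic can be done by hand with a dynamic image shape (sizes below are made up): divide each coordinate by the image height or width.

```python
import tensorflow as tf

image = tf.zeros([1, 200, 100, 3])                # NHWC batch of one image
boxes = tf.constant([[20.0, 20.0, 100.0, 60.0]])  # absolute [ymin, xmin, ymax, xmax]
shape = tf.cast(tf.shape(image), tf.float32)
h, w = shape[1], shape[2]
normalized = boxes / tf.stack([h, w, h, w])       # now in [0, 1]
```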

Internally there are a lot of reshape and expand_dims calls; reading through them makes my head spin…

self._compute_second_stage_input_feature_maps

flattened_proposal_feature_maps = (
    self._compute_second_stage_input_feature_maps(
        rpn_features_to_crop, proposal_boxes_normalized))

Using the proposal_boxes output in #1 and the block3 output, this crops from rpn_features_to_crop and resizes, then applies slim.max_pool2d.
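The crop-and-resize plus max-pool step can be sketched with the stock TF ops (feature-map shape, crop size, and channel count below are made up, not the API's configured values):

```python
import tensorflow as tf

features = tf.random.normal([1, 38, 50, 256])           # stand-in for rpn_features_to_crop
proposal_boxes = tf.constant([[0.0, 0.0, 0.5, 0.5],
                              [0.25, 0.25, 1.0, 1.0]])  # normalized coordinates
box_indices = tf.zeros([2], dtype=tf.int32)             # both crops come from image 0
# Cut each proposal out of the shared feature map and resize to a fixed grid.
crops = tf.image.crop_and_resize(
    features, proposal_boxes, box_indices, crop_size=[14, 14])
# 2x2 max pool with stride 2, mirroring the slim.max_pool2d step.
pooled = tf.nn.max_pool2d(crops, ksize=2, strides=2, padding='SAME')
```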

max_pooling

self._feature_extractor.extract…

box_classifier_features = (
    self._feature_extractor.extract_box_classifier_features(
        flattened_proposal_feature_maps,
        scope=self.second_stage_feature_extractor_scope))

block4

Finally found where the hand-built block4 is used…

self._mask_rcnn_box_predictor.predict

box_predictions = self._mask_rcnn_box_predictor.predict(
    box_classifier_features,
    num_predictions_per_location=1,
    scope=self.second_stage_box_predictor_scope)

Here _mask_rcnn_box_predictor is used.
I did not expect Faster R-CNN to use a Mask R-CNN component; quite a surprise.
Tracking it down leads to the MaskRCNNBoxPredictor class, which contains the relevant code.
The class has a few key functions:

_predict_boxes_and_classes

with slim.arg_scope(self._fc_hyperparams):
      box_encodings = slim.fully_connected(
          flattened_image_features,
          self._num_classes * self._box_code_size,
          activation_fn=None,
          scope='BoxEncodingPredictor')
      class_predictions_with_background = slim.fully_connected(
          flattened_image_features,
          self._num_classes + 1,
          activation_fn=None,
          scope='ClassPredictor')

Box inference
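The two slim.fully_connected heads above can be sketched with tf.keras layers; the feature width (1024), num_classes (3), and box_code_size (4) below are made-up values, not the configured ones:

```python
import tensorflow as tf

num_classes, box_code_size = 3, 4
flattened_image_features = tf.random.normal([8, 1024])  # 8 proposals
# One head predicts per-class box encodings, the other predicts class
# scores including the background class; neither has an activation.
box_head = tf.keras.layers.Dense(num_classes * box_code_size, activation=None)
cls_head = tf.keras.layers.Dense(num_classes + 1, activation=None)
box_encodings = box_head(flattened_image_features)
class_predictions_with_background = cls_head(flattened_image_features)
```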

_predict_masks

Since we are running Faster R-CNN here, this function is not actually called.

_predict

Completes inference by calling the two functions above.

if predict_boxes_and_classes:
      (box_encodings, class_predictions_with_background
      ) = self._predict_boxes_and_classes(image_feature)
      predictions_dict[BOX_ENCODINGS] = box_encodings
      predictions_dict[
          CLASS_PREDICTIONS_WITH_BACKGROUND] = class_predictions_with_background
if self._predict_instance_masks and predict_auxiliary_outputs:
      predictions_dict[MASK_PREDICTIONS] = self._predict_masks(image_feature)

Returns predictions_dict.

Along the way, tf.squeeze is used several times to reshape the data.
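For reference, tf.squeeze drops size-1 axes, for example the axis produced by num_predictions_per_location=1 (the shapes below are illustrative):

```python
import tensorflow as tf

x = tf.zeros([8, 1, 3, 4])   # a size-1 axis at position 1
y = tf.squeeze(x, axis=1)    # removed: shape becomes (8, 3, 4)
```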

prediction_dict = {
        'refined_box_encodings': refined_box_encodings,
        'class_predictions_with_background':
        class_predictions_with_background,
        'num_proposals': num_proposals,
        'proposal_boxes': absolute_proposal_boxes,
        'box_classifier_features': box_classifier_features,
        'proposal_boxes_normalized': proposal_boxes_normalized,
    }