With the Faster R-CNN groundwork from earlier, R-FCN is fairly easy to follow.
"""object_detection/meta_architectures/rfcn_meta_arch.py
The R-FCN meta architecture is similar to Faster R-CNN and only differs in the
second stage. Hence this class inherits FasterRCNNMetaArch and overrides only
the `_predict_second_stage` method
"""
The most significant change is the following:
box_predictions = self._rfcn_box_predictor.predict(
box_classifier_features,
num_predictions_per_location=1,
scope=self.second_stage_box_predictor_scope,
proposal_boxes=proposal_boxes_normalized)
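So little changes because the code does not follow the original paper exactly; see Section 3.4, "Training and hyperparameter tuning", of the code authors' paper.
Also note the ordering. In Faster R-CNN, ROI pooling comes first and its output then enters the second-stage feature extractor. In R-FCN, second-stage feature extraction runs first and its output then goes to RfcnBoxPredictor. This is exactly what R-FCN improves: convolutional computation is shared across ROIs as much as possible, so the (higher-level) ROIs in R-FCN have far fewer prediction layers to pass through individually, which greatly improves efficiency.
"""First, a recap of Faster R-CNN: ROI pooling"""
The implementation is in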
def _compute_second_stage_input_feature_maps(self, features_to_crop,
proposal_boxes_normalized)
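As a rough illustration of what this function has to achieve, namely turning proposals of different sizes into crops of one fixed size, here is a minimal pure-Python sketch. The helper names `bilinear_sample` and `crop_and_resize` are hypothetical and are not the TensorFlow API; the sampling convention (normalized box corners mapped onto the first and last pixel centers) is an assumption made for this sketch.

```python
def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D list of floats at the fractional point (y, x).

    Assumes 0 <= y <= H - 1 and 0 <= x <= W - 1.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_and_resize(feature, box, out_size):
    """Resample a normalized box (ymin, xmin, ymax, xmax) to out_size x out_size.

    Assumed convention: 0.0 maps to pixel 0 and 1.0 maps to the last pixel.
    """
    h, w = len(feature), len(feature[0])
    ymin, xmin, ymax, xmax = box
    out = []
    for i in range(out_size):
        # Fraction along the box, in [0, 1]; a single sample sits at the center.
        fy = i / (out_size - 1) if out_size > 1 else 0.5
        y = (ymin + fy * (ymax - ymin)) * (h - 1)
        row = []
        for j in range(out_size):
            fx = j / (out_size - 1) if out_size > 1 else 0.5
            x = (xmin + fx * (xmax - xmin)) * (w - 1)
            row.append(bilinear_sample(feature, y, x))
        out.append(row)
    return out


# A 5x5 toy feature map whose value encodes its position: value = 5 * y + x.
feature_map = [[float(5 * y + x) for x in range(5)] for y in range(5)]

# Two proposals of different sizes both come out as 3x3 crops.
large_roi = crop_and_resize(feature_map, (0.0, 0.0, 1.0, 1.0), out_size=3)
small_roi = crop_and_resize(feature_map, (0.0, 0.0, 0.5, 0.5), out_size=3)
```

Because every crop has the same shape after this step, the crops can be batched and fed to the second-stage feature extractor directly.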
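An ROI is a crop of features_to_crop (the feature map produced by the first-stage feature extractor), cut out according to the proposal boxes (proposal_boxes_normalized). ROI pooling is a special case of the SPP layer: it uses adaptively sized pooling to map the feature map to a fixed size so that it can be fed into the fc layers. The TF code takes a different route: it first resizes ROIs of different sizes to one uniform size, after which ROI pooling in that sense is no longer needed (see _compute_second_stage_input_feature_maps). In other words, ROI pooling is needed only because ROIs differ in size. (The R-FCN paper's angle: this region-specific operation breaks down translation invariance, and the post-RoI convolutional layers are no longer translation-invariant when evaluated across different regions.)
"""The basic idea of R-FCN was analyzed and compared above; a careful look at the three large figures in the R-FCN paper makes clear what position-sensitive means. Why is position-sensitive RoI pooling needed? The paper says it is to incorporate translation variance into the FCN. Judging by how the position-sensitive score maps are generated, though, it looks like a kind of "prior structure" is being introduced explicitly, as in Fig. 3 above: how can one be sure that the nine score maps end up corresponding to exactly the nine positions (For example, the “top-center-sensitive” score map exhibits high scores roughly near the top-center position of an object.)?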
position-sensitive score maps
position-sensitive RoI pooling layer
"""
The implementation in the Tensorflow object detection API is somewhat different:
class RfcnBoxPredictor(BoxPredictor)
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations
ops.position_sensitive_crop_regions
"""Position-sensitive crop and pool rectangular regions from a feature grid.
The output crops are split into `spatial_bins_y` vertical bins
and `spatial_bins_x` horizontal bins. For each intersection of a vertical
and a horizontal bin the output values are gathered by performing
`tf.image.crop_and_resize` (bilinear resampling) on a separate subset of
channels of the image. This reduces `depth` by a factor of
`(spatial_bins_y * spatial_bins_x)`.
When global_pool is True, this function implements a differentiable version
of position-sensitive RoI pooling used in
[R-FCN detection system](https://arxiv.org/abs/1605.06409).
When global_pool is False, this function implements a differentiable version
of position-sensitive assembling operation used in
[instance FCN](https://arxiv.org/abs/1603.08678)."""
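To make the global_pool=True case described in this docstring concrete, here is a minimal pure-Python sketch of position-sensitive RoI pooling. It is an illustration under stated assumptions, not the API's implementation: `ps_roi_pool` and `quadrant_map` are hypothetical names, each bin is average-pooled over whole pixels (the docstring says the API uses bilinear `tf.image.crop_and_resize` instead), and the channel layout, with all outputs of bin (i, j) stored contiguously, is assumed.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_outputs):
    """Position-sensitive RoI pooling followed by average voting over the k*k bins.

    score_maps: list of k * k * num_outputs channels, each an H x W list of lists.
    roi: (y0, x0, y1, x1) in pixels, end-exclusive (assumed convention).
    Returns a list with one voted score per output.
    """
    y0, x0, y1, x1 = roi
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    votes = [0.0] * num_outputs
    for i in range(k):        # vertical bin index
        for j in range(k):    # horizontal bin index
            ys = range(y0 + math.floor(i * bin_h), y0 + math.ceil((i + 1) * bin_h))
            xs = range(x0 + math.floor(j * bin_w), x0 + math.ceil((j + 1) * bin_w))
            for c in range(num_outputs):
                # Bin (i, j) reads ONLY its own channel group (assumed layout).
                channel = score_maps[(i * k + j) * num_outputs + c]
                pooled = sum(channel[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
                votes[c] += pooled / (k * k)
    return votes


def quadrant_map(qi, qj, size=4):
    """Toy score map that fires (1.0) only inside quadrant (qi, qj) of the image."""
    half = size // 2
    return [[1.0 if (y // half, x // half) == (qi, qj) else 0.0
             for x in range(size)] for y in range(size)]


# k = 2, one output: four score maps, one per relative position of the object.
score_maps = [quadrant_map(i, j) for i in range(2) for j in range(2)]

aligned = ps_roi_pool(score_maps, (0, 0, 4, 4), k=2, num_outputs=1)
shifted = ps_roi_pool(score_maps, (0, 0, 2, 2), k=2, num_outputs=1)
```

In this toy setup, the RoI that covers the whole "object" gets a full vote from every bin, while the RoI that covers only its top-left quarter gets a vote from one bin only, which is the sense in which the pooled score depends on position.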
'''
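As far as I can tell, a 1*1 convolution expands the features from the second-stage extractor to k*k*(c+1) channels, which gives the position-sensitive score maps. The snippet below is the box-regression branch, whose depth is k*k*num_classes*box_code_size instead: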
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
location_feature_map = slim.conv2d(net, location_feature_map_depth,
[1, 1], activation_fn=None,
scope='refined_locations')
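Position-sensitive RoI pooling is then carried out in much the same way as in the Faster R-CNN code. The position-sensitive score maps really do resemble a kind of "prior structure"; bold end-to-end prior structures like this show up in papers quite often. The paper says: With end-to-end training, this RoI layer shepherds the last convolutional layer to learn specialized position-sensitive score maps. The visualizations do seem to show this effect, which is interesting.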
'''
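The channel arithmetic above can be checked with a few lines. This is a sketch: `rfcn_feature_map_depths` is a hypothetical helper, the location depth mirrors the snippet quoted above, the class depth follows the k*k*(c+1) count from the text, and the values k = 3, 20 classes and box_code_size = 4 are example inputs chosen for illustration.

```python
def rfcn_feature_map_depths(num_spatial_bins, num_classes, box_code_size=4):
    """Return (location_depth, class_depth) of the two 1x1-conv score maps."""
    bins_y, bins_x = num_spatial_bins
    # Mirrors location_feature_map_depth in the quoted snippet.
    location_depth = bins_y * bins_x * num_classes * box_code_size
    # k * k * (c + 1): one extra class for background.
    class_depth = bins_y * bins_x * (num_classes + 1)
    return location_depth, class_depth


loc_depth, cls_depth = rfcn_feature_map_depths((3, 3), num_classes=20)

# Position-sensitive pooling reduces depth by a factor of bins_y * bins_x.
loc_per_roi = loc_depth // (3 * 3)   # box-code values per RoI
cls_per_roi = cls_depth // (3 * 3)   # class scores per RoI
```

With these example inputs the location map has 720 channels and the class map has 189; after pooling, each RoI is left with 80 box-code values (20 classes times 4) and 21 class scores.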
Finally, here is code that combines deformable convolution with R-FCN: deformable R-FCN implemented in MXNet.