TITLE: Deformable Part-based Fully Convolutional Network for Object Detection
AUTHOR: Taylor Mordan, Nicolas Thome, Matthieu Cord, Gilles Henaff
FROM: arXiv:1707.06175
CONTRIBUTIONS
- Deformable Part-based Fully Convolutional Network (DP-FCN) is proposed: an end-to-end model that integrates ideas from DPM into region-based deep ConvNets for object detection.
- A new deformable part-based RoI pooling layer is introduced, which explicitly selects discriminative elements of objects around region proposals by simultaneously optimizing latent displacements of all parts.
- A deformation-aware localization module is designed, which exploits the configuration of the parts to refine localization.
METHOD
R-FCN is the work closest to DP-FCN. Both build on Faster R-CNN, in which an RPN generates object proposals and a dedicated pooling layer extracts features for classification and localization. The architecture of DP-FCN is illustrated in the following figure. A deformable part-based RoI pooling layer follows a fully convolutional backbone, and two branches then predict category and location respectively. The output of the backbone FCN is similar to that in R-FCN: it has k²(C+1) channels, corresponding to k×k parts for each of the C categories plus background.
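To make the channel count concrete, here is a minimal sketch of the arithmetic. The values k=7 and C=20 are only illustrative, and the channel ordering used by `channel_index` (parts outermost, classes innermost) is an assumption of this note, not something stated in the paper.

```python
def score_map_channels(k: int, num_classes: int) -> int:
    """Number of backbone output channels: k*k parts times (C + 1) classes."""
    return k * k * (num_classes + 1)


def channel_index(i: int, j: int, c: int, k: int, num_classes: int) -> int:
    """Channel holding the score of part (i, j) for class c.

    The layout (part-major, class-minor) is a hypothetical convention
    chosen for illustration only.
    """
    return (i * k + j) * (num_classes + 1) + c


print(score_map_channels(7, 20))  # 7 * 7 * 21 = 1029
```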

Deformable part-based RoI pooling
As in DPM, a transformation is applied to each input channel to spread high responses to nearby locations, taking the deformation costs into account.

In my understanding, the output of the RPN plays the role of the root filter in DPM. Each region proposal is evenly divided into k×k sub-regions, and each sub-region is then allowed to shift, trading its pooled response against a deformation cost. The displacements computed during the forward pass are stored and used to backpropagate gradients at the same locations.
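The pooling step described above can be sketched in NumPy as follows. This is my own simplified reading, not the authors' implementation: it assumes average pooling inside each bin, a quadratic deformation cost `lam * (dx**2 + dy**2)`, an exhaustive search over integer shifts, and a single class. The names `deformable_part_pool`, `max_disp` and `lam` are hypothetical.

```python
import numpy as np


def deformable_part_pool(score_map, roi, k=3, max_disp=2, lam=0.3):
    """Pool k*k part scores for one class, letting each part shift.

    score_map: array of shape (k*k, H, W), one channel per part.
    roi: (x0, y0, x1, y1) integer box in feature-map coordinates.
    Returns (pooled, disp): pooled has shape (k, k); disp has shape
    (k, k, 2) and stores the chosen (dx, dy) of every part.
    """
    x0, y0, x1, y1 = roi
    _, H, W = score_map.shape
    # Bin edges of the evenly divided proposal.
    xs = np.linspace(x0, x1, k + 1).round().astype(int)
    ys = np.linspace(y0, y1, k + 1).round().astype(int)
    pooled = np.full((k, k), -np.inf)
    disp = np.zeros((k, k, 2), dtype=int)
    for i in range(k):
        for j in range(k):
            part = score_map[i * k + j]
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    ya, yb = ys[i] + dy, ys[i + 1] + dy
                    xa, xb = xs[j] + dx, xs[j + 1] + dx
                    # Skip shifts that leave the map or give an empty bin.
                    if ya < 0 or xa < 0 or yb > H or xb > W or yb <= ya or xb <= xa:
                        continue
                    # Pooled response minus the (assumed quadratic) cost.
                    value = part[ya:yb, xa:xb].mean() - lam * (dx * dx + dy * dy)
                    if value > pooled[i, j]:
                        pooled[i, j] = value
                        disp[i, j] = (dx, dy)
    return pooled, disp


# Toy example: the response of part (0, 0) sits two rows below its default bin.
demo = np.zeros((4, 12, 12))
demo[0, 4:8, 2:6] = 1.0
scores, shifts = deformable_part_pool(demo, (2, 2, 10, 10), k=2, lam=0.05)
print(shifts[0, 0])  # chosen (dx, dy) for part (0, 0)
```

The returned `disp` array corresponds to the stored displacements: in a backward pass, the gradient of each pooled value would be routed to the shifted bin that produced it.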
Classification and localization predictions with deformable parts
As is common practice, predictions are made by two sibling branches, one for classification and one for relocalization of region proposals. The classification branch simply consists of average pooling followed by a softmax layer.
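As a rough illustration of the classification branch, the sketch below averages the k×k pooled part scores of each class and applies a softmax. The input layout `(k, k, C+1)` and the function name `classify_roi` are assumptions made for this example.

```python
import numpy as np


def classify_roi(part_scores):
    """Average the part scores per class, then apply a softmax.

    part_scores: array of shape (k, k, C + 1) from the pooling layer.
    Returns an array of C + 1 class probabilities.
    """
    logits = part_scores.mean(axis=(0, 1))  # average pooling over the parts
    logits = logits - logits.max()          # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()
```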

For localization, 4 values are predicted for every part. In addition, the part displacements are fed to two fully connected layers, and the result is element-wise multiplied with these first values to yield the final localization output for the class.
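A very rough sketch of how I read this module is given below, for a single class. Several details are assumptions on my part: the per-part box values are averaged to get the "first values", the hidden layer uses a ReLU, and the weights `w1`, `b1`, `w2`, `b2` are hypothetical parameters that would be learned in practice.

```python
import numpy as np


def refine_localization(part_boxes, displacements, w1, b1, w2, b2):
    """Deformation-aware box regression for one class (illustrative only).

    part_boxes: (k, k, 4) box values predicted for every part.
    displacements: (k, k, 2) part shifts chosen by the pooling layer.
    w1, b1, w2, b2: weights of the two fully connected layers.
    """
    first = part_boxes.mean(axis=(0, 1))        # (4,) first localization values
    flat = displacements.reshape(-1).astype(float)
    hidden = np.maximum(0.0, flat @ w1 + b1)    # first FC layer, ReLU assumed
    gate = hidden @ w2 + b2                     # second FC layer, 4 outputs
    return first * gate                         # element-wise multiplication
```

With `w2` set to zeros and `b2` to ones the second branch outputs ones, so the function returns the averaged values unchanged, which is a convenient sanity check.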

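In summary, the paper proposes DP-FCN, a deformable part-based fully convolutional network that combines ideas from DPM with region-based deep ConvNets for object detection. Its new deformable part-based RoI pooling layer optimizes the latent displacements of all parts simultaneously in order to select discriminative elements of objects around region proposals. A deformation-aware localization module is also designed, which uses configuration information to refine localization.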