[Object Detection Series, Part 10] Anchor Free | GARPN | Region Proposal by Guided Anchoring

License: CC 4.0 BY-SA

Tags: #Region Proposal by Guided Anchoring #GARPN #Anchor Free

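Guided Anchoring is a new approach to object detection that learns to generate anchors instead of relying on hand-designed presets. By predicting likely object centers and the shapes at those centers, it produces fewer but more accurate anchors, which improves both recall and detection accuracy. On MS COCO it uses 90% fewer anchors than the RPN baseline while achieving 9.1% higher recall, and it improves detection mAP by 2.2%, 2.7% and 1.2% in Fast R-CNN, Faster R-CNN and RetinaNet, respectively.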

CVPR 2019
SenseTime
Region Proposal by Guided Anchoring

https://github.com/open-mmlab/mmdetection

DCN

anchor

dense --> sparse
fixed --> deformable

proposed method

center of objects
scales and aspect ratios

With Guided Anchoring, we achieve

9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline
We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, respectively improving the detection mAP by 2.2%, 2.7% and 1.2%.

Introduction

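Anchor strategy

Today anchors are designed by hand (Faster R-CNN with FPN presets three aspect ratios, 1:2, 1:1 and 2:1, at a single scale of 8, while YOLOv2 obtains its anchors by clustering).
Problems with this approach:
- Anchors have to be redesigned for each dataset, and choosing IoU thresholds and other hyperparameters is difficult.
- Keeping recall high requires a large number of anchors, most of which are negative samples (causing a severe imbalance between positives and negatives).
The paper proposes a learnable anchor mechanism in which anchors are generated from the semantic information of the image itself (mainly location and context).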
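
To make the hand-designed baseline concrete, here is a minimal sketch of dense sliding-window anchors with one scale (8) and three aspect ratios. The function name `preset_anchors`, the reading of a ratio as h / w, and the cell-center convention are assumptions made for illustration; this is not the mmdetection implementation.

```python
import math


def preset_anchors(feat_h, feat_w, stride, scale=8, ratios=(0.5, 1.0, 2.0)):
    """Return dense anchors as (cx, cy, w, h) tuples in image coordinates.

    Assumption of this sketch: a ratio r means h / w, and every anchor
    keeps the same area (scale * stride) ** 2.
    """
    base = scale * stride  # side length of the square (1:1) anchor
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of the feature-map cell, mapped back to the image.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for r in ratios:
                w = base / math.sqrt(r)
                h = base * math.sqrt(r)
                anchors.append((cx, cy, w, h))
    return anchors


anchors = preset_anchors(feat_h=4, feat_w=6, stride=16)
print(len(anchors))  # 4 * 6 locations * 3 ratios = 72
```

Every location receives every shape, whether or not an object is nearby, which is why most of these anchors end up as negatives.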

Our method generates sparse anchors in two steps:

first identifying sub-regions that may contain objects
then determining the shapes at different locations

Guided Anchoring

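Anchor generation is guided by semantic features. The main idea is to locate likely object centers and then place the best-fitting anchor box at each center. The method jointly predicts, for every location, whether it is likely to be an object center and the corresponding scale and aspect ratio.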

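Compared with RPN during training:

GA-RPN produces more positive samples, and a larger share of its proposals have high IoU.
GA-RPN uses a higher threshold and fewer samples.
Making use of high-quality proposals requires adapting the distribution of training samples to the distribution of proposals.
GA-RPN uses 90% fewer anchors than RPN while achieving 9.1% higher recall. Adopted in Fast R-CNN, Faster R-CNN and RetinaNet, it improves detection mAP by 2.2%, 2.7% and 1.2%, respectively.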

Guided Anchoring

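Predefined anchor scales and aspect ratios have to be tuned separately for each dataset and each algorithm. How can we generate anchors that are sparse and whose shapes adapt to the content?

First identify the sub-regions that may contain objects, then determine the scale and aspect ratio at each location.

The location and shape of an object are represented by a 4-tuple (x, y, w, h), where (x, y) is the center of the object and w and h are its width and height. Its distribution given the image $I$ factorizes as:

$$p(x, y, w, h \mid I) = p(x, y \mid I)\, p(w, h \mid x, y, I)$$

This factorization captures two important intuitions:

Given an image, objects are likely to exist only in certain regions.
The shape of an object (i.e. its scale and aspect ratio) is closely related to its location.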
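
As a toy illustration of the factorization into a location term and a location-conditioned shape term, the sketch below multiplies a location distribution by per-location shape distributions and checks that the result is a valid joint distribution. All numbers and the two candidate shapes are made up for illustration.

```python
# Toy setting: a 1 x 2 "feature map" and two candidate shapes (w, h).
p_loc = {(0, 0): 0.9, (0, 1): 0.1}  # p(x, y | I)
p_shape = {  # p(w, h | x, y, I), one distribution per location
    (0, 0): {(64, 128): 0.8, (128, 64): 0.2},
    (0, 1): {(64, 128): 0.3, (128, 64): 0.7},
}

# Joint p(x, y, w, h | I) = p(x, y | I) * p(w, h | x, y, I)
joint = {
    (xy, wh): p_loc[xy] * p_wh
    for xy, shapes in p_shape.items()
    for wh, p_wh in shapes.items()
}

total = sum(joint.values())       # should be 1 up to rounding
best = max(joint, key=joint.get)  # most likely (location, shape) pair
print(best)
```

The most likely pair combines the most probable location with the shape that is most probable at that location, not a shape chosen globally.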

Anchor generation module
- based on a single feature map
- a network made up of two branches, one for location prediction and one for shape prediction
- image $I$
- feature map $F_I$
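- on $F_I$, the location prediction branch yields a probability map that indicates the possible locations of objects, while the shape prediction branch predicts location-dependent shapes
- combining the two branches, we generate a set of anchors by selecting the locations whose predicted probability exceeds a certain threshold, together with the most probable shape at each selected location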
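
The selection step just described can be sketched as follows. This is a simplified illustration in plain Python, assuming the shape branch already outputs one (w, h) pair per location; the helper name `generate_anchors`, the cell-center convention and all numbers are hypothetical.

```python
def generate_anchors(loc_prob, shape_pred, stride, threshold):
    """Keep locations whose probability exceeds `threshold` and attach
    the shape predicted at that location.

    loc_prob:   H x W nested list of probabilities in [0, 1]
    shape_pred: H x W nested list of (w, h) tuples (assumed to be the
                most probable shape at each location)
    Returns a list of (cx, cy, w, h) anchors in image coordinates.
    """
    anchors = []
    for i, row in enumerate(loc_prob):
        for j, p in enumerate(row):
            if p > threshold:
                w, h = shape_pred[i][j]
                # Map the feature-map cell back to its image-space center.
                cx = (j + 0.5) * stride
                cy = (i + 0.5) * stride
                anchors.append((cx, cy, w, h))
    return anchors


loc_prob = [
    [0.02, 0.10, 0.03],
    [0.05, 0.92, 0.60],
]
shape_pred = [
    [(32, 32), (40, 80), (32, 32)],
    [(64, 64), (90, 180), (120, 60)],
]
anchors = generate_anchors(loc_prob, shape_pred, stride=16, threshold=0.5)
print(anchors)
```

Only two of the six locations pass the threshold, and each contributes a single anchor with its own shape, in contrast to a fixed set of shapes at every location.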
feature adaptation module

Anchor generation

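Directly predict the location and shape (width and height) of the anchors.
Anchor generation can be decomposed into two steps: anchor location prediction and anchor shape prediction.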

Location Prediction

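Goal
Predict which regions should serve as centers for generating anchors
1×1 conv, 1 output channel, element-wise sigmoid
the probability map $p(i, j|F_I)$ is predicted using a sub-network $N_L$
Binary classification
Predict whether a location is the center of an object
From the resulting probability map, we determine the regions where objects may exist by selecting the locations whose probability exceeds a predefined threshold $\epsilon_L$
(this step can filter out 90% of the regions while still maintaining the same recall)
replace the ensuing convolutional layers by masked convolution for more efficient inference
Regions such as sky and ocean are excluded, while anchors concentrate densely around the person and the surfboard. Since the excluded regions no longer need to be considered, the subsequent convolutional layers are replaced by masked convolution to make inference more efficient.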
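
The location branch and the masking idea can be sketched in plain Python. This is only an illustration under stated assumptions: the 1×1 convolution is written as a per-location dot product, the weights and features are made-up numbers, and `masked_apply` stands in for masked convolution by evaluating a per-location function only where the mask is active. It is not the paper's or mmdetection's implementation.

```python
import math


def location_branch(feature, weight, bias):
    """1x1 conv with one output channel, then an element-wise sigmoid.

    feature: H x W x C nested list; weight: list of C floats; bias: float.
    Returns an H x W probability map.
    """
    prob = []
    for row in feature:
        prob_row = []
        for vec in row:
            logit = sum(w * v for w, v in zip(weight, vec)) + bias
            prob_row.append(1.0 / (1.0 + math.exp(-logit)))
        prob.append(prob_row)
    return prob


def active_mask(prob, eps_l):
    """Binary mask of locations whose probability exceeds eps_l."""
    return [[p > eps_l for p in row] for row in prob]


def masked_apply(feature, mask, fn):
    """Evaluate `fn` only at active locations; inactive ones yield None."""
    return [
        [fn(vec) if keep else None for vec, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mask)
    ]


feature = [
    [[-2.0, -1.0], [-1.5, -2.5], [0.5, -3.0]],
    [[-1.0, -1.0], [3.0, 2.0], [2.0, 1.0]],
]
prob = location_branch(feature, weight=[1.0, 1.0], bias=0.0)
mask = active_mask(prob, eps_l=0.5)
kept = sum(sum(row) for row in mask)  # number of active locations
out = masked_apply(feature, mask, fn=lambda vec: max(vec))
print(kept, out)
```

Later computation is skipped wherever the mask is inactive, which is where the inference savings come from when most locations are filtered out.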