Faster R-CNN


zhoujunr1

License: CC 4.0 BY-SA

Category: Deep Learning

Article link: https://blog.youkuaiyun.com/zhoujunr1/article/details/80058854

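This article covers the principles and implementation details of Faster R-CNN, focusing on how the Region Proposal Network (RPN) enables real-time object detection. The RPN computes candidate regions with convolutional layers and assigns an objectness score to each region. By introducing anchors at different scales and aspect ratios, the method improves the model's ability to detect objects at multiple scales.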

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Computing proposals with convolutional layers
On top of these convolutional features, we construct an RPN by adding a few additional convolutional layers that simultaneously regress region bounds and objectness scores at each location on a regular grid.

In contrast to earlier pyramid-based approaches, the authors introduce anchors, which serve as references at different scales and aspect ratios.

Training scheme: alternates between fine-tuning for the region proposal task and then fine-tuning for object detection, while keeping the proposals fixed.

Model

Faster R-CNN

Region Proposal Networks

An RPN takes an image as input and outputs a set of rectangular object proposals, each with an objectness score.

To generate region proposals, we slide a small network over the convolutional feature map output by the last shared convolutional layer.

This small network takes as input an n × n spatial window of the input convolutional feature map.

Each sliding window is mapped to a lower-dimensional feature.

This feature is fed into two sibling fully-connected layers: a box-regression layer (reg) and a box-classification layer (cls).

We use n=3 in this paper.
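To make the layer sizes concrete, here is a small shape-bookkeeping sketch in Python. It rests on several assumptions that are not stated in these notes: the n × n window is applied as a padded convolution so the spatial size is preserved, the lower-dimensional feature is 256-d (the value the paper reports for the ZF net), and, with k anchors per location, the cls layer emits 2k scores and the reg layer 4k coordinates, as described in the paper. The function name `rpn_head_shapes` is made up for illustration.

```python
def rpn_head_shapes(feat_h, feat_w, k=9, n=3, mid_dim=256):
    """Return (channels, height, width) for each stage of the RPN head.

    Assumes an n x n convolution with padding n // 2 and stride 1,
    followed by two sibling 1 x 1 layers (cls and reg).
    """
    pad = n // 2
    # Standard convolution output size for stride 1.
    out_h = feat_h + 2 * pad - n + 1
    out_w = feat_w + 2 * pad - n + 1
    return {
        "intermediate": (mid_dim, out_h, out_w),  # lower-dimensional feature
        "cls": (2 * k, out_h, out_w),             # object / not-object per anchor
        "reg": (4 * k, out_h, out_w),             # 4 box coordinates per anchor
    }


# Hypothetical feature map size, chosen only for the example.
shapes = rpn_head_shapes(38, 50)
```

Because the same small network is applied at every window position, the two sibling layers can be implemented as 1 × 1 convolutions rather than separate fully-connected layers per location.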

Anchors

At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k.

An anchor is centered at the sliding window in question, and is associated with a scale and aspect ratio (Figure 3, left). By default we use 3 scales and 3 aspect ratios, yielding k = 9 anchors at each sliding position.
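The k = 9 anchors at one position can be sketched as follows. The concrete values are assumptions taken from the paper's defaults rather than from these notes: box areas of 128², 256² and 512² pixels and aspect ratios of 1:2, 1:1 and 2:1. The helper name `make_anchors` and the (x1, y1, x2, y2) corner format are illustrative choices.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors centered at (cx, cy).

    Each anchor is (x1, y1, x2, y2). A scale s means an area of s * s pixels;
    a ratio r means height / width.
    """
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            # Solve w * h = area and h / w = r for w and h.
            w = math.sqrt(area / r)
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0.0, 0.0)
```

Every anchor shares the same center, so only the width and height change across the nine boxes.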

Translation-Invariant Anchors

Both the anchors and the functions that compute proposals relative to the anchors are translation-invariant.
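One way to see the translation invariance of the anchors: the same base set is reused at every sliding-window position, only shifted by the feature-map stride. The sketch below assumes a stride of 16 pixels (the value the paper gives for ZF and VGG) and uses a hypothetical helper `shift_anchors` with two hand-written base anchors.

```python
def shift_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Replicate base anchors (x1, y1, x2, y2) at every feature-map position.

    Position (row, col) on the feature map corresponds to an offset of
    (col * stride, row * stride) pixels in the image.
    """
    all_anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            dx, dy = col * stride, row * stride
            for x1, y1, x2, y2 in base_anchors:
                all_anchors.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return all_anchors


# Two illustrative base anchors centered at the origin.
base = [(-64.0, -64.0, 64.0, 64.0), (-128.0, -128.0, 128.0, 128.0)]
grid = shift_anchors(base, feat_h=2, feat_w=3)
```

A feature map of size H × W with k base anchors therefore yields H · W · k anchors in total, and an object that moves in the image is matched by the same anchor shapes at its new position.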

Multi-Scale Anchors as Regression References

Our method is built on a pyramid of anchors.

Our method classifies and regresses bounding boxes with reference to anchor boxes of multiple scales and aspect ratios.

It only relies on images and feature maps of a single scale, and uses filters (sliding windows on the feature map) of a single size.

Loss Function

We assign a binary class label (of being an object or not) to each anchor.
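A minimal sketch of how such labels might be assigned. The rule used here is the one described in the paper, not spelled out in these notes: an anchor is positive if it has the highest IoU with some ground-truth box or an IoU above 0.7 with any ground-truth box, negative if its IoU is below 0.3 for all ground-truth boxes, and ignored otherwise. The function names and the -1 "ignore" marker are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 (object), 0 (not object), -1 (ignored)."""
    ious = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in ious:
        best = max(row) if row else 0.0
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The anchor with the highest IoU for each ground-truth box is also positive,
    # so every ground-truth box gets at least one positive anchor.
    for g in range(len(gt_boxes)):
        column = [row[g] for row in ious]
        if column:
            best_i = max(range(len(column)), key=column.__getitem__)
            if column[best_i] > 0:
                labels[best_i] = 1
    return labels
```

Anchors labeled -1 fall between the two thresholds and do not contribute to the training objective.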


$t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is that of the ground-truth box associated with a positive anchor.
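The equation image is missing from these notes. For reference, the multi-task loss as given in the Faster R-CNN paper can be written as below, where $i$ indexes the anchors in a mini-batch, $p_i$ is the predicted probability that anchor $i$ is an object, $p_i^*$ is its ground-truth label (1 for a positive anchor, 0 for a negative one), $L_{cls}$ is a log loss over the two classes, $L_{reg}$ is a smooth L1 loss, and $\lambda$ balances the two normalized terms.

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

Because $p_i^*$ multiplies the regression term, only positive anchors contribute to the box-regression loss.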

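The "4 parameterized coordinates" can be illustrated with a short sketch. It assumes the parameterization used in the paper: center offsets normalized by the anchor's width and height, and log-space width and height ratios. Boxes are given here as (center x, center y, width, height); the names `encode` and `decode` are illustrative.

```python
import math


def encode(box, anchor):
    """Map a box to regression targets (tx, ty, tw, th) relative to an anchor.

    Both box and anchor are (cx, cy, w, h).
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Inverse of encode: recover (cx, cy, w, h) from targets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 100.0, 100.0, 100.0)
gt_box = (110.0, 100.0, 200.0, 100.0)
targets = encode(gt_box, anchor)      # regression target for this anchor
recovered = decode(targets, anchor)   # approximately equal to gt_box
```

Applying `encode` to the predicted box gives the predicted coordinate vector and applying it to the ground-truth box gives the target vector, both measured relative to the same anchor.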

Training RPNs

It is possible to optimize for the loss functions of all anchors, but this will bias towards negative samples as they dominate.

Instead, we randomly sample 256 anchors in an image to compute the loss function of a mini-batch, where the sampled positive and negative anchors have a ratio of up to 1:1.
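The sampling step might look like the following sketch. It assumes anchors have already been labeled 1 (positive), 0 (negative) or -1 (ignored), and it pads the mini-batch with negatives when an image has fewer than 128 positives, as the paper describes. The function name `sample_anchors` and the label encoding are illustrative.

```python
import random


def sample_anchors(labels, batch_size=256, max_pos_fraction=0.5, rng=None):
    """Pick anchor indices for one mini-batch.

    Returns (positive_indices, negative_indices). At most
    batch_size * max_pos_fraction positives are kept; the rest of the
    batch is filled with negatives.
    """
    rng = rng or random.Random()
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(pos), int(batch_size * max_pos_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    return rng.sample(pos, num_pos), rng.sample(neg, num_neg)


# Illustrative label list: 10 positives, 1000 negatives, 5 ignored anchors.
labels = [1] * 10 + [0] * 1000 + [-1] * 5
pos_idx, neg_idx = sample_anchors(labels, rng=random.Random(0))
```

With only 10 positives available, the batch holds 10 positive and 246 negative anchors, which is why the ratio is described as "up to" 1:1.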