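Fast R-CNN: Paper Notes and Source Code Walkthrough

Fast R-CNN addresses the efficiency problems of R-CNN and SPPnet. An RoI pooling layer extracts a fixed-length feature vector for each proposal, which then passes through fully connected layers for classification and bounding-box regression. The model is initialized from a pre-trained network and fine-tuned end to end, with hierarchically sampled mini-batches and a multi-task loss.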


Fast R-CNN

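Comparison with R-CNN and SPPnet

  1. R-CNN training is a multi-stage pipeline: it first fine-tunes the network with a log loss, then trains SVMs, and finally trains bounding-box regressors.
  2. This pipeline is expensive (the paper cites both training time and disk space).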

Fast R-CNN architecture and training


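An image first passes through several convolutional and max pooling layers to produce a convolutional feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map.

Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: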
1. one that produces softmax probability estimates over K object classes plus a catch-all “background” class
2. another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes.
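The two sibling outputs can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `sibling_outputs`, the weight layout (one row per output unit), and the use of a single linear layer per branch are hypothetical choices, not code from the Fast R-CNN source.

```python
import math

def sibling_outputs(feature, w_cls, b_cls, w_bbox, b_bbox):
    """Map one RoI feature vector to (class probabilities, per-class boxes).

    w_cls has K + 1 rows (K object classes plus background);
    w_bbox has 4 * K rows (four numbers for each object class).
    """
    def linear(weights, bias):
        return [sum(w * x for w, x in zip(row, feature)) + b
                for row, b in zip(weights, bias)]

    # Sibling 1: softmax over K + 1 classes (max subtracted for stability).
    scores = linear(w_cls, b_cls)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sibling 2: 4 * K real values, regrouped into one 4-tuple per class.
    flat = linear(w_bbox, b_bbox)
    boxes = [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]
    return probs, boxes
```

With K object classes, `probs` has K + 1 entries that sum to 1, and `boxes` has K entries of four values each.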

The RoI pooling layer

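The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H × W (e.g., 7 × 7).
Each RoI is defined by a four-tuple (r, c, h, w) that specifies its top-left corner (r, c) and its height and width (h, w).

RoI max pooling divides the h × w RoI window into an H × W grid of sub-windows of approximate size h/H × w/W, then max-pools the values in each sub-window into the corresponding output cell.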
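A minimal sketch of RoI max pooling on a single-channel feature map, written in plain Python. The function name `roi_max_pool` and the floor/ceil rounding of sub-window edges are assumptions of this sketch. A real implementation would apply the same pooling to every channel independently.

```python
def roi_max_pool(feature_map, roi, H, W):
    """Max-pool one RoI of a 2-D feature map into an H x W grid.

    roi is (r, c, h, w): top-left corner (r, c), height h, width w,
    all in feature-map cells. Bin edges use floor/ceil rounding, which is
    an assumption of this sketch.
    """
    r, c, h, w = roi
    pooled = []
    for i in range(H):
        # Rows covered by output row i: roughly h / H rows each.
        row_start = r + (i * h) // H
        row_end = r + -(-((i + 1) * h) // H)  # ceiling division
        out_row = []
        for j in range(W):
            # Columns covered by output column j: roughly w / W columns each.
            col_start = c + (j * w) // W
            col_end = c + -(-((j + 1) * w) // W)
            out_row.append(max(
                feature_map[y][x]
                for y in range(row_start, row_end)
                for x in range(col_start, col_end)
            ))
        pooled.append(out_row)
    return pooled
```

For a 4 × 4 map holding the values 0 to 15 in row-major order, `roi_max_pool(fm, (0, 0, 4, 4), 2, 2)` returns `[[5, 7], [13, 15]]`. When h is not a multiple of H, the floor/ceil rounding makes neighbouring sub-windows overlap by one row or column.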

Initializing from pre-trained networks

When a pre-trained network initializes a Fast R-CNN network, it undergoes three transformations:
1. The last max pooling layer is replaced by an RoI pooling layer.
2. The last fc layer and softmax are replaced with the two sibling layers (a fully connected layer and softmax over K + 1 categories, and category-specific bounding-box regressors).
3. The network is modified to take two data inputs: a list of images and a list of RoIs in those images.
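The three transformations can be illustrated on a toy description of a network. Everything in this sketch is hypothetical: the function `to_fast_rcnn`, the layer names, and the dictionary it returns are made up for illustration and do not correspond to any framework's API.

```python
def to_fast_rcnn(layers, num_classes):
    """Apply the three transformations to a list of layer names.

    layers is a hypothetical ordered list such as
    ["conv1", "pool1", ..., "fc8", "softmax"]; num_classes is K.
    """
    layers = list(layers)

    # 1. Replace the last max pooling layer with an RoI pooling layer.
    last_pool = max(i for i, name in enumerate(layers)
                    if name.startswith("pool"))
    layers[last_pool] = "roi_pool"

    # 2. Drop the last fc layer and softmax, then attach two sibling heads.
    assert layers[-1] == "softmax" and layers[-2].startswith("fc")
    trunk = layers[:-2]
    heads = {
        "cls": f"fc + softmax over {num_classes + 1} categories",
        "bbox": f"fc with {4 * num_classes} outputs",
    }

    # 3. The network now takes two inputs: images and RoIs.
    inputs = ["images", "rois"]
    return {"inputs": inputs, "trunk": trunk, "heads": heads}
```

The earlier layers keep their pre-trained weights. Only the RoI pooling layer and the two heads are new.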

Fine-tuning for detection

In Fast R-CNN training, SGD mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.
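A minimal sketch of hierarchical sampling, assuming proposals are stored in a dictionary keyed by image id. The function name `sample_minibatch`, the data layout, and the fallback for images with too few proposals are assumptions of this sketch. It also leaves out any balancing between foreground and background RoIs.

```python
import random

def sample_minibatch(rois_by_image, N, R, seed=None):
    """Pick N images, then R // N RoIs from each of them.

    rois_by_image maps an image id to its list of RoIs (hypothetical layout).
    Returns a list of (image_id, roi) pairs of length N * (R // N).
    """
    rng = random.Random(seed)
    per_image = R // N
    batch = []
    for image_id in rng.sample(sorted(rois_by_image), N):
        rois = rois_by_image[image_id]
        if len(rois) >= per_image:
            chosen = rng.sample(rois, per_image)
        else:
            # Too few proposals: fall back to sampling with replacement
            # (an assumption of this sketch, not something the notes specify).
            chosen = rng.choices(rois, k=per_image)
        batch.extend((image_id, roi) for roi in chosen)
    return batch
```

With N = 2 and R = 128, the setting used in the paper, each mini-batch holds 64 RoIs from each of two images. RoIs from the same image share the convolutional computation in the forward and backward passes, which is why a small N makes training faster.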

Multi-task loss
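For reference, the multi-task loss from the Fast R-CNN paper can be written as below. It is restated here from the paper rather than from these notes, so check it against the original. Here p is the predicted distribution over the K + 1 classes, u is the ground-truth class (u = 0 for background), t^u is the predicted box offset for class u, v is the regression target, λ balances the two terms, and [u ≥ 1] equals 1 for object classes and 0 for background.

```latex
\begin{align}
L(p, u, t^u, v) &= L_{\mathrm{cls}}(p, u) + \lambda\,[u \ge 1]\,L_{\mathrm{loc}}(t^u, v) \\
L_{\mathrm{cls}}(p, u) &= -\log p_u \\
L_{\mathrm{loc}}(t^u, v) &= \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i) \\
\mathrm{smooth}_{L_1}(x) &=
  \begin{cases}
    0.5\,x^2 & \text{if } |x| < 1 \\
    |x| - 0.5 & \text{otherwise}
  \end{cases}
\end{align}
```

Because of the [u ≥ 1] factor, background RoIs contribute only to the classification term.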

(To be continued.)
