Reading notes on the three seminal R-CNN object detection papers

R-CNN reading summary (pattern recognition)

R-CNN

[source code](http://www.cs.berkeley.edu/~rbg/rcnn)

Abstract

Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features.
Two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

Introduction

Purpose
Before this: recognition was mainly based on SIFT and HOG features.
Focus on two problems: localizing objects with a deep network, and training a high-capacity model (higher capacity than SIFT or HOG-like features) with only a small quantity of annotated detection data.
Detection approaches before this:
1. treat it as a regression problem (poor practical results).
2. build a sliding-window detector (the sliding-window size is fixed, so localization cannot be accurate).
Method in this paper:
1. use the "recognition using regions" paradigm.
2. supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning on a small dataset (PASCAL), is an effective paradigm for learning high-capacity CNNs when data is scarce.
R-CNN: Regions with CNN features:
1. Input image
2. Extract region proposals (~2K)
3. Compute CNN features(conv and pool)
4. Classify regions with SVM
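The four steps above can be sketched end to end. Everything here is a hypothetical stand-in (`selective_search`, `cnn_features`, and `svm_scores` are placeholders for selective search, the fine-tuned CNN, and the per-class linear SVMs); the point is only how the stages connect:

```python
def selective_search(image):
    # Placeholder: would return ~2000 candidate boxes as (x1, y1, x2, y2).
    return [(0, 0, 50, 50), (10, 10, 80, 80)]

def cnn_features(image, box):
    # Placeholder: would warp the crop to the fixed input size and
    # return the 4096-d fc feature vector.
    return [0.0] * 4096

def svm_scores(feature):
    # Placeholder: one score per object class from the linear SVMs.
    return {"person": 0.9, "dog": 0.1}

def rcnn_detect(image):
    detections = []
    for box in selective_search(image):      # steps 1-2: region proposals
        feat = cnn_features(image, box)      # step 3: CNN features
        scores = svm_scores(feat)            # step 4: SVM classification
        cls = max(scores, key=scores.get)
        detections.append((box, cls, scores[cls]))
    return detections
```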
Other features:
1. Computation is simpler than previous region-feature systems.
2. Using a detection analysis tool, they show that a simple bounding-box regression method improves the model.
3. R-CNN works well on semantic segmentation problems.

Object detection with R-CNN

Procedure:
Our object detection system consists of three modules. The first generates category-independent region proposals. These proposals define the set of candidate detections available to our detector. The second module is a large convolutional neural network that extracts a fixed-length feature vector from each region. The third module is a set of class-specific linear SVMs.

Module in details

  1. Region proposals: generate category-independent region proposals (here the selective search method is used to obtain them).
  2. Feature extraction: a 4096-dimensional feature vector is extracted from each region proposal using a CNN (5 conv layers and 2 fully connected layers). Before being fed into the net, each proposal's tight bounding box is warped to the fixed input size. Before warping, the box is dilated to include surrounding context (to keep the shape from distorting too badly?).
  3. Score each extracted feature vector using the trained SVMs (softmax could also be used), then apply greedy non-maximum suppression: when a proposal has an intersection-over-union ("IoU") overlap with a higher-scoring proposal larger than a threshold, that proposal is rejected. (This is essentially an initial filtering step that discards proposals overlapping a higher-scoring one.)
    This method is faster than the alternatives of the time because: 1. the CNN weights are shared across all images; 2. the feature vector produced by the fully connected layer is low-dimensional (4096) compared with other detection methods of the time. (Even so, one pass of proposal generation and feature extraction still took over ten seconds on a GPU.)
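The greedy NMS step in point 3 can be sketched as follows. `iou` and `nms` are my own minimal implementations, and the 0.3 threshold is just an illustrative default, not necessarily the value the paper tuned per class:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, reject any lower-scoring
    box whose IoU with an already-kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```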

Net training

Supervised pre-training
Pre-training is done on ILSVRC 2012, producing an ordinary 1000-class classification model.
Supervised Domain-specific fine-tuning
Training continues on the VOC dataset on top of the pre-trained weights, using SGD. The only architectural change is replacing the final 1000-way prediction output with a randomly initialized 21-way output, since VOC has 20 classes plus background as one extra class. The learning rate is 1/10 of the pre-training rate, to avoid completely destroying the pre-trained result.
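A minimal numpy sketch of the head swap and learning-rate choice described above (the 0.01 pre-training rate and the 0.01 init scale are illustrative assumptions, not values from the original Caffe code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-trained ImageNet head: 4096-d fc features -> 1000 classes.
pretrained_head = rng.normal(0, 0.01, size=(4096, 1000))

# Fine-tuning head: randomly initialized, 20 VOC classes + 1 background.
num_classes = 20 + 1
new_head = rng.normal(0, 0.01, size=(4096, num_classes))

# Fine-tune at 1/10 of the pre-training rate so the
# pre-trained features are not clobbered.
pretrain_lr = 0.01            # assumed pre-training rate
finetune_lr = pretrain_lr / 10
```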
Each SGD iteration takes a batch of 128 windows: 32 positive windows and 96 background windows.
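That minibatch construction can be sketched as below; `sample_minibatch` is a hypothetical helper, and uniform sampling within each pool is my assumption (the sampling is deliberately biased toward positives because they are rare):

```python
import random

def sample_minibatch(positives, backgrounds, batch_size=128, num_pos=32):
    """Build one fine-tuning minibatch: 32 positive windows and
    96 background windows (128 total)."""
    pos = random.sample(positives, min(num_pos, len(positives)))
    neg = random.sample(backgrounds, batch_size - len(pos))
    return pos + neg
```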
* to be continued… *
