R-CNN reading summary (pattern recognition)
R-CNN
[source code](http://www.cs.berkeley.edu/~rbg/rcnn)
Abstract
Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features.
Two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Introduction
Purpose
Before this paper, recognition was mainly based on SIFT and HOG.
Focus on two problems: localizing objects with a deep network, and training a high-capacity model (higher capacity than SIFT- or HOG-like features) with only a small quantity of annotated detection data.
Detection approaches before this paper:
1. Treat detection as a regression problem (poor results in practice).
2. Build a sliding-window detector (the window size is fixed, so localization cannot be accurate).
Method in this paper:
1. using the “recognition using regions” paradigm.
2. supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning on a small dataset (PASCAL), is an effective paradigm for learning high-capacity CNNs when data is scarce.
R-CNN: Regions with CNN features:
1. Input image
2. Extract region proposals (~2K)
3. Compute CNN features(conv and pool)
4. Classify regions with SVM
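The four steps above can be sketched end to end. This is a toy sketch, not the paper's implementation: `propose_regions`, `extract_features`, and the identity SVM weights are all hypothetical stand-ins for selective search, the CNN, and the trained class-specific SVMs.

```python
import numpy as np

def propose_regions(image, n=4):
    # Stand-in for selective search: just tile the image into candidate boxes.
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2), (w // 2, 0, w, h // 2),
            (0, h // 2, w // 2, h), (w // 2, h // 2, w, h)][:n]

def extract_features(image, box):
    # Stand-in for the CNN: crop the region, pool it into a fixed-length vector.
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    return np.array([crop.mean(), crop.std(), x2 - x1, y2 - y1], dtype=float)

def score_regions(features, weights):
    # Stand-in for the class-specific linear SVMs: one dot product per class.
    return features @ weights.T

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
boxes = propose_regions(image)
feats = np.stack([extract_features(image, b) for b in boxes])
weights = np.eye(4)  # 4 hypothetical classes, identity weights for the demo
scores = score_regions(feats, weights)
print(scores.shape)  # one score per (proposal, class) pair
```

The key structural point the sketch preserves: feature extraction is class-agnostic and runs once per proposal, while classification is a cheap per-class linear scoring on top.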
Other features:
1. The computation is simpler than in previous approaches based on region features.
2. A detection analysis tool shows that a simple bounding-box regression method improves the model.
3. R-CNN works well on semantic segmentation problems.
Object detection with R-CNN
Procedure:
Our object detection system consists of three modules. The first generates category-independent region proposals. These proposals define the set of candidate detections available to our detector. The second module is a large convolutional neural network that extracts a fixed-length feature vector from each region. The third module is a set of class-specific linear SVMs.
Module in details
- Region proposals: generate category-independent region proposals (selective search is used here to obtain them).
- Feature extraction: a 4096-dimensional feature vector is extracted from each region proposal using a CNN (5 conv layers and 2 fully connected layers). Before being fed into the net, each proposal region is warped from its tight bounding box to the fixed input size; before warping, the box is dilated to include some surrounding context (to keep the shape from being distorted too badly?).
- Scoring: score each extracted feature vector using the trained SVMs (softmax could also be used). Apply greedy non-maximum suppression: when a proposal has an intersection-over-union (IoU) overlap with a higher-scoring proposal larger than a threshold, that proposal is rejected. (This is essentially an initial filtering that removes proposals overlapping a higher-scoring one.)
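The greedy NMS rule above is short enough to write out directly. A minimal sketch (the threshold value and the box format `(x1, y1, x2, y2)` are illustrative choices, not taken from the paper):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.5):
    # Visit boxes from highest score to lowest; keep a box only if its IoU
    # with every already-kept box is at or below the threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # the near-duplicate of box 0 is suppressed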
This method is faster than the alternatives because: 1. the weights are shared across all images; 2. the feature vectors output by the network's fully connected layers are low-dimensional (4096) compared with other detection methods of the time. (Even so, one pass of proposals plus feature extraction still takes over ten seconds on a GPU.)
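The warping step described above (dilate the tight box for context, then warp anisotropically to the CNN's fixed input size) can be sketched as follows. The 227-pixel input size is AlexNet's; the nearest-neighbour resize and the 16-pixel context amount here are illustrative simplifications, not the paper's exact procedure.

```python
import numpy as np

def warp_proposal(image, box, out_size=227, context=16):
    # Dilate the tight box by `context` pixels on each side (clipped to the
    # image bounds), crop, then warp anisotropically to out_size x out_size.
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 - context), max(0, y1 - context)
    x2, y2 = min(w, x2 + context), min(h, y2 + context)
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbour resize; a real pipeline would use proper interpolation.
    ys = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    xs = (np.arange(out_size) * crop.shape[1] / out_size).astype(int)
    return crop[np.ix_(ys, xs)]

patch = warp_proposal(np.zeros((480, 640)), (100, 100, 200, 160))
print(patch.shape)  # every proposal, whatever its aspect ratio, becomes 227x227
```

Since the warp is anisotropic, a wide box and a tall box both end up as the same square input, which is why some distortion is unavoidable and context padding helps.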
Net training
Supervised pre-training
Pre-training is done on ILSVRC 2012, producing an ordinary 1000-class classification model.
Supervised Domain-specific fine-tuning
Training continues on top of the pre-trained weights using the VOC dataset, with SGD. The only architectural change is replacing the previous 1000-way prediction output with a randomly initialized 21-way output, since VOC has 20 classes plus 1 for background. The learning rate is set to 1/10 of the pre-training rate, to avoid completely disrupting the pre-trained result.
Each iteration takes as input a batch of size 128: 32 positive windows and 96 background windows.
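The biased minibatch construction above (32 positives, 96 backgrounds out of 128, because positives are rare relative to background) can be sketched like this; the `sample_minibatch` helper and the labeled-window lists are hypothetical illustrations:

```python
import random

def sample_minibatch(positives, backgrounds, n_pos=32, n_bg=96, rng=None):
    # Uniformly sample 32 positive and 96 background windows, then shuffle
    # them together into one SGD minibatch of 128.
    rng = rng or random.Random(0)
    batch = rng.sample(positives, n_pos) + rng.sample(backgrounds, n_bg)
    rng.shuffle(batch)
    return batch

# Hypothetical pools of labeled windows from the training images.
positives = [("pos", i) for i in range(500)]
backgrounds = [("bg", i) for i in range(5000)]
batch = sample_minibatch(positives, backgrounds)
print(len(batch), sum(1 for kind, _ in batch if kind == "pos"))  # 128 32
```

Without this bias, a uniformly sampled batch would be almost entirely background and the gradient signal from positives would be drowned out.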
* to be continued… *