Paper: He, Kaiming, et al. “Mask R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
Code: TensorFlow implementation
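“Instance Segmentation Model Mask R-CNN Explained: From R-CNN, Fast R-CNN, and Faster R-CNN to Mask R-CNN”: this article starts with R-CNN [1] and works its way up to Mask R-CNN [2]. It gives a complete and concise account of how the R-CNN family developed, but it is short on detail.
Mask R-CNN [2] can be understood as Faster R-CNN [3] + FCN [4].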
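One way to make "Faster R-CNN + FCN" concrete is to look at the training loss: the Mask R-CNN paper adds a mask term to the detection losses, L = L_cls + L_box + L_mask, where L_mask is the average per-pixel binary cross-entropy computed only on the mask predicted for the ground-truth class. The sketch below is a minimal pure-Python illustration of that idea. The function names and the nested-list tensor layout are assumptions made for this example, not taken from the TensorFlow implementation mentioned above.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class's mask.

    mask_logits: list of K masks, each an m x m grid of raw scores
                 (one mask per class, as an FCN-style head would produce).
    gt_mask:     m x m grid of 0/1 labels for the region of interest.
    gt_class:    index of the ground-truth class; other masks are ignored.
    """
    total, count = 0.0, 0
    for logit_row, gt_row in zip(mask_logits[gt_class], gt_mask):
        for z, y in zip(logit_row, gt_row):
            # Clamp the probability to avoid log(0).
            p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

def total_loss(l_cls: float, l_box: float, l_mask: float) -> float:
    # Detection losses (the Faster R-CNN part) plus the mask loss (the FCN part).
    return l_cls + l_box + l_mask
```

Because only the ground-truth class's mask contributes, the masks of different classes do not compete with each other; the class label comes from the classification branch.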
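R-CNN [1] works in four steps:
a. Identify roughly 1000-2000 region proposals in the image
b. For the image patch inside each proposal, extract features with a deep network
c. Run a classifier on the features extracted from each proposal to decide whether it belongs to a particular class
d. For proposals assigned to a class, use a regressor to further refine their position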
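Step d can be sketched as follows. The text above does not spell out how the regressor's output is applied; this example assumes the parameterization from the R-CNN paper, where the regressor predicts four offsets (dx, dy, dw, dh) that shift the box center in proportion to its size and rescale width and height in log space. The function name and the corner-based box format are choices made for this illustration.

```python
import math

def apply_box_deltas(box, deltas):
    """Refine a proposal with predicted regression offsets.

    box:    (x1, y1, x2, y2) corners of the proposal.
    deltas: (dx, dy, dw, dh) predicted by the regressor (assumed R-CNN style).
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the center by a fraction of the box size.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    # Rescale width and height in log space so they stay positive.
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)
```

With all-zero offsets the proposal is returned unchanged, which is why small regressor outputs correspond to small corrections.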
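Experimental findings for Fast R-CNN:
Training classification and box refinement jointly at the end of the network improves accuracy
Using a multi-scale image pyramid yields almost no performance gain
Doubling the training data improves accuracy by 2%-3%
Having the network output class probabilities directly (softmax) performs slightly better than an SVM classifier
More region proposals do not improve performance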
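The first and fourth findings both refer to the joint objective at the end of the network. A minimal sketch of such a multi-task loss is given below, assuming the form used in the Fast R-CNN paper: softmax cross-entropy for classification plus a smooth L1 term for box regression, with the regression term skipped for background regions. Treating class index 0 as background, the weight `lam`, and the function names are assumptions made for this example.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_l1(x: float) -> float:
    # Quadratic near zero, linear elsewhere (less sensitive to outliers than L2).
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multitask_loss(class_scores, true_class, pred_deltas, target_deltas, lam=1.0):
    """Classification loss plus (for non-background regions) box regression loss."""
    probs = softmax(class_scores)
    l_cls = -math.log(probs[true_class])
    if true_class == 0:  # assumed background index: no box to regress
        return l_cls
    l_loc = sum(smooth_l1(p - t) for p, t in zip(pred_deltas, target_deltas))
    return l_cls + lam * l_loc
```

Because both terms are computed from the same shared features and summed into one scalar, a single backward pass trains the classifier and the box regressor together.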
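With Faster R-CNN [3], the four basic steps of object detection (region proposal generation, feature extraction, classification, and box refinement) are finally unified within a single deep network framework.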
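Region proposal generation moves inside the network through the region proposal network (RPN), which scores a fixed set of reference boxes ("anchors") at every feature-map location. The sketch below generates the anchors for one location. The three scales and three aspect ratios are the defaults reported in the Faster R-CNN paper and are not stated in the text above; the function name and box format are assumptions made for this example.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors (x1, y1, x2, y2) centered at (cx, cy).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

With the default arguments this yields 9 anchors per location; the RPN then predicts an objectness score and box offsets for each of them.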
References:
[1] Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[2] He, Kaiming, et al. “Mask R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
[3] Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in Neural Information Processing Systems. 2015.
[4] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[5] Girshick, Ross. “Fast R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
[6] Lin, Tsung-Yi, et al. “Feature pyramid networks for object detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.