Object detection guide

Thanks to handong1587, whose object detection list this guide is based on: https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html


object-detection

This is a list of awesome articles about object detection.

  • R-CNN
  • Fast R-CNN
  • Faster R-CNN
  • Light-Head R-CNN
  • Cascade R-CNN
  • SPP-Net
  • YOLO
  • YOLOv2
  • YOLOv3
  • YOLT
  • SSD
  • DSSD
  • FSSD
  • ESSD
  • MDSSD
  • Pelee
  • Fire SSD
  • R-FCN
  • FPN
  • DSOD
  • RetinaNet
  • MegDet
  • RefineNet
  • DetNet
  • SSOD
  • 3D Object Detection
  • ZSD (Zero-Shot Object Detection)
  • OSD (One-Shot Object Detection)
  • Other

Based on handong1587’s GitHub page (https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html)


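| Method | Backbone | Test size | VOC2007 | VOC2010 | VOC2012 | ILSVRC 2013 | MSCOCO 2015 | Speed |
|---|---|---|---|---|---|---|---|---|
| OverFeat | | | | | | 24.3% | | |
| R-CNN | AlexNet | | 58.5% | 53.7% | 53.3% | 31.4% | | |
| R-CNN | VGG16 | | 66.0% | | | | | |
| SPP_net | ZF-5 | | 54.2% | | | 31.84% | | |
| DeepID-Net | | | 64.1% | | | 50.3% | | |
| NoC | | | 73.3% | | 68.8% | | | |
| Fast-RCNN | VGG16 | | 70.0% | 68.8% | 68.4% | | 19.7%(@[0.5-0.95]), 35.9%(@0.5) | |
| MR-CNN | | | 78.2% | | 73.9% | | | |
| Faster-RCNN | VGG16 | | 78.8% | | 75.9% | | 21.9%(@[0.5-0.95]), 42.7%(@0.5) | 198ms |
| Faster-RCNN | ResNet101 | | 85.6% | | 83.8% | | 37.4%(@[0.5-0.95]), 59.0%(@0.5) | |
| YOLO | | | 63.4% | | 57.9% | | | 45 fps |
| YOLO VGG-16 | | | 66.4% | | | | | 21 fps |
| YOLOv2 | | 448x448 | 78.6% | | 73.4% | | 21.6%(@[0.5-0.95]), 44.0%(@0.5) | 40 fps |
| SSD | VGG16 | 300x300 | 77.2% | | 75.8% | | 25.1%(@[0.5-0.95]), 43.1%(@0.5) | 46 fps |
| SSD | VGG16 | 512x512 | 79.8% | | 78.5% | | 28.8%(@[0.5-0.95]), 48.5%(@0.5) | 19 fps |
| SSD | ResNet101 | 300x300 | | | | | 28.0%(@[0.5-0.95]) | 16 fps |
| SSD | ResNet101 | 512x512 | | | | | 31.2%(@[0.5-0.95]) | 8 fps |
| DSSD | ResNet101 | 300x300 | | | | | 28.0%(@[0.5-0.95]) | 8 fps |
| DSSD | ResNet101 | 500x500 | | | | | 33.2%(@[0.5-0.95]) | 6 fps |
| ION | | | 79.2% | | 76.4% | | | |
| CRAFT | | | 75.7% | | 71.3% | 48.5% | | |
| OHEM | | | 78.9% | | 76.3% | | 25.5%(@[0.5-0.95]), 45.9%(@0.5) | |
| R-FCN | ResNet50 | | 77.4% | | | | | 0.12sec(K40), 0.09sec(Titan X) |
| R-FCN | ResNet101 | | 79.5% | | | | | 0.17sec(K40), 0.12sec(Titan X) |
| R-FCN(ms train) | ResNet101 | | 83.6% | | 82.0% | | 31.5%(@[0.5-0.95]), 53.2%(@0.5) | |
| PVANet 9.0 | | | 84.9% | | 84.2% | | | 750ms(CPU), 46ms(Titan X) |
| RetinaNet | ResNet101-FPN | | | | | | | |
| Light-Head R-CNN | Xception* | 800/1200 | | | | | 31.5%@[0.5:0.95] | 95 fps |
| Light-Head R-CNN | Xception* | 700/1100 | | | | | 30.7%@[0.5:0.95] | 102 fps |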
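
The MSCOCO column uses two notations: `@0.5` is average precision at an IoU (intersection over union) threshold of 0.5, while `@[0.5-0.95]` averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05. As a minimal sketch of what those thresholds measure, the helper below computes the IoU of two boxes and lists the thresholds. The function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this illustration, not part of any benchmark toolkit.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95 behind the @[0.5-0.95] figures."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(pred_box, gt_box, thresholds):
    """For each threshold, whether the prediction counts as a match."""
    value = iou(pred_box, gt_box)
    return {t: value >= t for t in thresholds}
```

A prediction that overlaps its ground-truth box with IoU 0.6 counts as a hit at thresholds 0.5 through 0.6 and as a miss above that, which is why the `@[0.5-0.95]` numbers in the table are always lower than the `@0.5` numbers.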

Papers&Codes

R-CNN

Rich feature hierarchies for accurate object detection and semantic segmentation

Fast R-CNN

Fast R-CNN

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Faster R-CNN

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

R-CNN minus R

Faster R-CNN in MXNet with distributed implementation and data parallelization

Contextual Priming and Feedback for Faster R-CNN

An Implementation of Faster RCNN with Study for Region Sampling

Interpretable R-CNN

Light-Head R-CNN

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Cascade R-CNN

Cascade R-CNN: Delving into High Quality Object Detection

SPP-Net

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

Object Detectors Emerge in Deep Scene CNNs

segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

Object Detection Networks on Convolutional Feature Maps

Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction

DeepBox: Learning Objectness with Convolutional Networks

YOLO

You Only Look Once: Unified, Real-Time Object Detection

darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++

Start Training YOLO with Our Own Data

YOLO: Core ML versus MPSNNGraph

TensorFlow YOLO object detection on Android

Computer Vision in iOS – Object Detection

YOLOv2

YOLO9000: Better, Faster, Stronger

darknet_scripts

Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2

LightNet: Bringing pjreddie’s DarkNet out of the shadows

https://github.com/explosion/lightnet

YOLO v2 Bounding Box Tool

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

  • intro: LRM is presented as the first hard example mining strategy that fits YOLOv2 well, making it better suited to real-world scenarios that demand both real-time speed and accurate detection.
  • arxiv: https://arxiv.org/abs/1804.04606

Object detection at 200 Frames Per Second

Event-based Convolutional Networks for Object Detection in Neuromorphic Cameras

OmniDetector: With Neural Networks to Bounding Boxes

YOLOv3

YOLOv3: An Incremental Improvement

YOLT

You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery

SSD

SSD: Single Shot MultiBox Detector

What’s the diffience in performance between this new code you pushed and the previous code? #327

https://github.com/weiliu89/caffe/issues/327

DSSD

DSSD : Deconvolutional Single Shot Detector

Enhancement of SSD by concatenating feature maps for object detection

Context-aware Single-Shot Detector

Feature-Fused SSD: Fast Detection for Small Objects

https://arxiv.org/abs/1709.05054

FSSD

FSSD: Feature Fusion Single Shot Multibox Detector

https://arxiv.org/abs/1712.00960

Weaving Multi-scale Context for Single Shot Detector

ESSD

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

https://arxiv.org/abs/1801.05918

Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

https://arxiv.org/abs/1802.06488

MDSSD

MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects

Pelee

Pelee: A Real-Time Object Detection System on Mobile Devices

https://github.com/Robert-JunWang/Pelee

Fire SSD

Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device

R-FCN

R-FCN: Object Detection via Region-based Fully Convolutional Networks

R-FCN-3000 at 30fps: Decoupling Detection and Classification

https://arxiv.org/abs/1712.01802

Recycle deep features for better object detection

FPN

Feature Pyramid Networks for Object Detection

Action-Driven Object Detection with Top-Down Visual Attentions

Beyond Skip Connections: Top-Down Modulation for Object Detection

Wide-Residual-Inception Networks for Real-time Object Detection

Attentional Network for Visual Object Detection

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Spatial Memory for Context Reasoning in Object Detection

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

https://arxiv.org/abs/1704.05775

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Point Linking Network for Object Detection

Perceptual Generative Adversarial Networks for Small Object Detection

https://arxiv.org/abs/1706.05274

Few-shot Object Detection

https://arxiv.org/abs/1706.08249

Yes-Net: An effective Detector Based on Global Information

https://arxiv.org/abs/1706.09180

SMC Faster R-CNN: Toward a scene-specialized multi-object detector

https://arxiv.org/abs/1706.10217

Towards lightweight convolutional neural networks for object detection

https://arxiv.org/abs/1707.01395

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Mimicking Very Efficient Network for Object Detection

Residual Features and Unified Prediction Network for Single Stage Detection

https://arxiv.org/abs/1707.05031

Deformable Part-based Fully Convolutional Network for Object Detection

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

Recurrent Scale Approximation for Object Detection in CNN

DSOD

DSOD: Learning Deeply Supervised Object Detectors from Scratch

Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

RetinaNet

Focal Loss for Dense Object Detection

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

Incremental Learning of Object Detectors without Catastrophic Forgetting

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

https://arxiv.org/abs/1709.04347

StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

https://arxiv.org/abs/1709.05788

Dynamic Zoom-in Network for Fast Object Detection in Large Images

https://arxiv.org/abs/1711.05187

Zero-Annotation Object Detection with Web Knowledge Transfer

MegDet

MegDet: A Large Mini-Batch Object Detector

Receptive Field Block Net for Accurate and Fast Object Detection

An Analysis of Scale Invariance in Object Detection - SNIP

Feature Selective Networks for Object Detection

https://arxiv.org/abs/1711.08879

Learning a Rotation Invariant Detector with Rotatable Bounding Box

Scalable Object Detection for Stylized Objects

Deep Regionlets for Object Detection

Training and Testing Object Detectors with Virtual Images

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video

  • keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
  • arxiv: https://arxiv.org/abs/1712.08832

Spot the Difference by Object Detection

Localization-Aware Active Learning for Object Detection

Object Detection with Mask-based Feature Encoding

https://arxiv.org/abs/1802.03934

LSTD: A Low-Shot Transfer Detector for Object Detection

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Pseudo Mask Augmented Object Detection

https://arxiv.org/abs/1803.05858

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

https://arxiv.org/abs/1803.06799

Learning Region Features for Object Detection

Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection

Object Detection for Comics using Manga109 Annotations

Task-Driven Super Resolution: Object Detection in Low-resolution Images

https://arxiv.org/abs/1803.11316

Transferring Common-Sense Knowledge for Object Detection

https://arxiv.org/abs/1804.01077

Multi-scale Location-aware Kernel Representation for Object Detection

Robust Physical Adversarial Attack on Faster R-CNN Object Detector

https://arxiv.org/abs/1804.05810

RefineNet

Single-Shot Refinement Neural Network for Object Detection

DetNet

DetNet: A Backbone network for Object Detection

SSOD

Self-supervisory Signals for Object Discovery and Detection

3D Object Detection

LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDARs

ZSD

Zero-Shot Detection

Zero-Shot Object Detection

Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts

Zero-Shot Object Detection by Hybrid Region Embedding

OSD

One-Shot Object Detection

RepMet: Representative-based metric learning for classification and one-shot object detection

2018

MetaAnchor: Learning to Detect Objects with Customized Anchors

arxiv: https://arxiv.org/abs/1807.00980

Relation Network for Object Detection

Quantization Mimic: Towards Very Tiny CNN for Object Detection

Learning Rich Features for Image Manipulation Detection

SNIPER: Efficient Multi-Scale Training

Soft Sampling for Robust Object Detection

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

### Fixing the TensorFlow load error: Unknown loss function 'detection_loss'

When you load an H5 model that was compiled with a custom loss function (such as the 'detection_loss' of an object detection model), TensorFlow must be given the definition of that custom component. The options below cover the common cases.

#### 1. Core solution: register the custom loss function

Pass the custom loss explicitly through the `custom_objects` argument when loading the model:

```python
import tensorflow as tf

# Option 1: import the loss from the code that trained the model.
# `my_training_code` is a placeholder for your own module.
# from my_training_code import detection_loss

# Option 2: redefine the loss under the same name. Replace the body with
# the computation used during training; this one is only an example.
def detection_loss(y_true, y_pred):
    """Example loss: weighted cross-entropy on logits."""
    return tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(
            labels=y_true, logits=y_pred, pos_weight=2.0
        )
    )

model = tf.keras.models.load_model(
    'detection_model.h5',
    custom_objects={'detection_loss': detection_loss},
)
```

> **Key point**: `detection_loss` is usually a combined loss in object detection models, made up of a classification loss and a localization loss[^2]. The TensorFlow Object Detection API defines its losses as classes in `object_detection/core/losses.py` (for example `WeightedSigmoidClassificationLoss` and `WeightedSmoothL1LocalizationLoss`), so check the training configuration to see which ones were used.

#### 2. Loading a model trained with the Object Detection API

For models trained with the TensorFlow Object Detection API, rebuild the model from its pipeline configuration and then restore the weights. The API normally stores weights as checkpoints, so the checkpoint path below is a placeholder to adapt:

```python
import sys
import tensorflow as tf

# Make the API importable (adjust the path to your checkout)
sys.path.append('/content/models/research')

from object_detection.utils import config_util
from object_detection.builders import model_builder

# Load the model configuration (requires the original pipeline file)
pipeline_config = 'ssd_resnet50.config'  # replace with the actual file
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']

# Rebuild the model structure
detection_model = model_builder.build(
    model_config=model_config,
    is_training=False
)

# Restore the weights from a training checkpoint (placeholder path)
ckpt = tf.train.Checkpoint(model=detection_model)
ckpt.restore('path/to/ckpt-0').expect_partial()
```

#### 3. Temporary workaround (inference only)

If you do not need to continue training, you can skip the loss function altogether:

```python
model = tf.keras.models.load_model(
    'detection_model.h5',
    compile=False  # skip compilation, so the loss is never looked up
)

# Check that the model is usable
sample_input = tf.random.normal([1, 320, 320, 3])  # replace with the real input size
predictions = model.predict(sample_input)
print(predictions.shape)  # inspect the output shape (single-output models)
```

If you later want to resume training, recompile the model with the original loss; a substitute loss changes what the model is optimized for.

#### 4. Complete workflow example

```python
import tensorflow as tf
from custom_objects import detection_loss  # your own module, see "Best practices"

# 1. Load the model
try:
    model = tf.keras.models.load_model(
        'detection_model.h5',
        custom_objects={'detection_loss': detection_loss}
    )
except ValueError as e:
    # Keras reports unknown losses and layers as ValueError
    print(f"Missing custom component: {e}")
    # Rebuild the architecture (requires the original code;
    # `model_arch` is a placeholder module name)
    from model_arch import build_detection_model
    model = build_detection_model()
    model.load_weights('detection_model.h5')

# 2. Inspect the model
model.summary()

# 3. Run a prediction
input_tensor = tf.ones([1, 300, 300, 3])  # the input size must match the model
detections = model(input_tensor)
# Assumes the model returns a dict with a 'num_detections' entry
print("Number of detections:", detections['num_detections'].numpy()[0])
```

#### Common problems

1. **"Unknown layer" error**:
   - Check whether the model contains custom layers.
   - Add every custom layer to `custom_objects`:
   ```python
   custom_objects = {
       'CustomConvLayer': CustomConvLayer,
       'detection_loss': detection_loss
   }
   ```

2. **Version compatibility problems**:
   ```python
   # Loading without compiling avoids deserializing the optimizer and loss,
   # which is where version differences tend to surface
   model = tf.keras.models.load_model(
       'detection_model.h5',
       custom_objects={'detection_loss': detection_loss},
       compile=False
   )
   ```

3. **Corrupted H5 file**:
   ```bash
   # Check file integrity by listing its contents
   h5ls -r detection_model.h5
   ```

#### Best practices

1. **Include metadata when saving**:
   ```python
   model.save('detection_model.h5', include_optimizer=True, save_format='h5')
   ```

2. **Keep custom components in one place**:
   ```python
   # custom_objects.py
   import tensorflow as tf

   def detection_loss(y_true, y_pred):
       # loss implementation
       ...

   class CustomLayer(tf.keras.layers.Layer):
       # custom layer implementation
       ...

   # In the loading script:
   # from custom_objects import detection_loss, CustomLayer
   ```

3. **Convert to the SavedModel format**:
   ```python
   tf.saved_model.save(model, 'saved_model_dir')
   loaded = tf.saved_model.load('saved_model_dir')
   ```

> For object detection models, prefer the TensorFlow Object Detection API's native loading path[^2][^4], or make sure the training and deployment environments match[^5].
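
To make the idea of a combined detection loss more concrete, here is a framework-free sketch of a classification term (sigmoid cross-entropy) added to a weighted localization term (smooth L1). It only illustrates the general shape of such a loss: the function names, the `loc_weight` parameter and the flat-list inputs are assumptions for this example, not the definition used by any particular model or by the Object Detection API.

```python
import math


def sigmoid_cross_entropy(logits, labels):
    """Sum of per-element sigmoid cross-entropy, in a numerically stable form."""
    total = 0.0
    for x, z in zip(logits, labels):
        # Equivalent to -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))
        total += max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
    return total


def smooth_l1(pred, target, delta=1.0):
    """Sum of smooth L1 (Huber) penalties over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < delta:
            total += 0.5 * diff * diff / delta  # quadratic near zero
        else:
            total += diff - 0.5 * delta         # linear for large errors
    return total


def combined_detection_loss(cls_logits, cls_labels, box_pred, box_target,
                            loc_weight=1.0):
    """Classification loss plus weighted localization loss (illustrative)."""
    cls_loss = sigmoid_cross_entropy(cls_logits, cls_labels)
    loc_loss = smooth_l1(box_pred, box_target)
    return cls_loss + loc_weight * loc_loss
```

A Keras version of such a loss would take `(y_true, y_pred)` tensors and split them into class and box parts. That split depends on how the model packs its outputs, which is why the original training code is the only reliable source for the real `detection_loss`.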