DSOD: Learning Deeply Supervised Object Detectors from Scratch

This post introduces DSOD, an object detection framework that reaches state-of-the-art accuracy without any pre-trained model. The framework is simple and efficient, yet flexible enough to be tailored to a variety of computing platforms. Step-by-step ablation studies validate its design principles, and it achieves strong results on benchmarks such as PASCAL VOC and MS COCO.


Key Problems

  • Limited structure design space.
  • Learning bias
    • Since both the loss functions and the category distributions of classification and detection differ, we argue that this leads to different searching/optimization spaces. Learning may therefore be biased towards a local minimum that is not the best for the detection task.
  • Domain mismatch
    • State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets such as ImageNet.
    • Transferring pre-trained models from classification to detection across discrepant domains is even more difficult.

Architecture


Principles

  • Training a detection network from scratch requires a proposal-free framework.
  • Deep Supervision
    • Transition w/o Pooling Layer: introduced to increase the number of dense blocks without reducing the final feature-map resolution.
  • Stem Block
    • The stem block reduces the information loss from raw input images.
  • Dense Prediction Structure
    • Each prediction scale fuses features learned at that scale with features down-sampled from the preceding scale.
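The resolution bookkeeping behind these principles can be checked with simple arithmetic using the standard output-size formula floor((n + 2p - k)/s) + 1. The sketch below compares a DenseNet-style transition (1×1 conv + 2×2 pooling) against DSOD's transition w/o pooling, and traces a 300×300 input through a stem of the kind the paper describes (a stride-2 3×3 conv, two stride-1 3×3 convs, then 2×2 max pooling); the exact kernel/stride/padding values are illustrative assumptions, not taken verbatim from the paper.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def transition_with_pool(n):
    # DenseNet-style transition: 1x1 conv (resolution kept) + 2x2 avg pool, stride 2.
    return out_size(out_size(n, k=1), k=2, s=2)

def transition_wo_pool(n):
    # DSOD's "transition w/o pooling": the 1x1 conv only, so resolution is
    # preserved and more dense blocks can be stacked without shrinking the map.
    return out_size(n, k=1)

def stem(n):
    # Illustrative stem: 3x3 conv stride 2, two 3x3 convs stride 1 (all pad 1),
    # then 2x2 max pool stride 2 -- gentler down-sampling than one large
    # stride-2 7x7 conv, which helps retain information from the raw input.
    n = out_size(n, k=3, s=2, p=1)
    n = out_size(n, k=3, s=1, p=1)
    n = out_size(n, k=3, s=1, p=1)
    return out_size(n, k=2, s=2)

print(transition_with_pool(38))  # 19: pooling halves the resolution
print(transition_wo_pool(38))    # 38: resolution unchanged
print(stem(300))                 # 75: 300 -> 150 -> 150 -> 150 -> 75
```

The comparison makes the trade-off concrete: dropping the pooling step is what lets extra dense blocks be inserted without the feature map collapsing below a useful prediction resolution.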

Contributions

  • DSOD is a simple yet efficient framework that can learn object detectors from scratch.
  • DSOD is fairly flexible, so we can tailor various network structures for different computing platforms such as servers, desktops, mobile and even embedded devices.
  • We present DSOD, to the best of our knowledge, the first framework that can train object detection networks from scratch with state-of-the-art performance.
  • We introduce and validate a set of principles for designing efficient object detection networks from scratch through step-by-step ablation studies.
  • We show that DSOD achieves state-of-the-art performance on three standard benchmarks (PASCAL VOC 2007, 2012 and MS COCO) with real-time processing speed and more compact models.

Experiments


Others

  • A well-designed network structure can outperform state-of-the-art solutions without using pre-trained models.
  • Only the proposal-free method (the 3rd category) can converge successfully without pre-trained models.
    • RoI pooling generates features for each region proposal, which hinders the gradients from being smoothly back-propagated from the region level to the convolutional feature maps.
    • Proposal-based methods work well with pre-trained network models because the parameter initialization is good for the layers before RoI pooling, but this does not hold when training from scratch.