Motivation:
Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints (e.g., from 3B to 300B FLOPs)?
【CC】Straight to the point: build a family of networks for different compute budgets.
We systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time.
【CC】The motivation is very pure: push network efficiency as far as possible without losing (and even while improving) accuracy. First, a BiFPN structure is proposed for the multi-scale feature fusion stage (this is the paper's biggest contribution!); second, building on the authors' own EfficientNet, a family of networks is derived to cover different compute budgets.
Approach:
Challenge 1: efficient multi-scale feature fusion
Since these different input features are at different resolutions, we observe they usually contribute to the fused output feature unequally. We propose a simple yet highly effective weighted bi-directional feature pyramid network (BiFPN), which introduces learnable weights to learn the importance of different input features.
【CC】The observation is that features at different scales contribute unequally to the final output; based on this, a bidirectional pyramid structure with learnable weights is designed for feature fusion. Could an MLP also be used to learn the weights? Likewise, could self-attention work? People have already done this; see the sketch below.
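【CC】For reference, a minimal PyTorch sketch of the paper's fast normalized fusion, O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i; the class name and the example tensors are hypothetical, and all inputs are assumed to be already resized to a common resolution:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Fast normalized fusion as described in the EfficientDet paper:
    O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with w_i >= 0 via ReLU.
    Class name and defaults are illustrative, not the official code."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable scalar weight per input feature map
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # ReLU keeps the weights non-negative, so the normalized sum is stable
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        # Weighted sum of the (already same-sized) input feature maps
        return sum(wi * x for wi, x in zip(w, inputs))

# Usage: fuse a lateral feature with an upsampled higher-level feature
p4_in = torch.randn(1, 64, 32, 32)   # hypothetical lateral feature
p5_up = torch.randn(1, 64, 32, 32)   # hypothetical upsampled feature
p4_td = FastNormalizedFusion(num_inputs=2)([p4_in, p5_up])
```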
Challenge 2: model scaling
Recently, [36] demonstrates remarkable model efficiency for image classification by jointly scaling up network width, depth, and resolution. We observe that scaling up the feature network and box/class prediction network is also critical when taking into account both accuracy and efficiency. We propose a compound scaling method for object detectors, which jointly scales up the resolution/depth/width for all backbone, feature network, and box/class prediction networks.
【CC】This actually builds on prior work: jointly scaling backbone/head/resolution has a large impact on final accuracy. Based on this idea, the authors scale EfficientNet + BiFPN + head + resolution at different ratios, forming their own network family called EfficientDet; the scaling rules are sketched below.
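【CC】A small sketch of the paper's compound scaling equations, all driven by a single coefficient φ. Note the official configs round channel counts to hardware-friendly values and hand-adjust the largest models (e.g., the D5–D7 resolutions), so this reproduces the equations rather than the exact published table:

```python
def efficientdet_scaling(phi: int):
    """Compound scaling equations from the EfficientDet paper:
    one coefficient phi jointly scales the BiFPN width/depth,
    the box/class head depth, and the input resolution."""
    w_bifpn = 64 * (1.35 ** phi)   # BiFPN channels: W_bifpn = 64 * 1.35^phi
    d_bifpn = 3 + phi              # BiFPN layers:   D_bifpn = 3 + phi
    d_head = 3 + phi // 3          # head depth:     D_box = D_class = 3 + floor(phi/3)
    r_input = 512 + phi * 128      # input size:     R_input = 512 + 128 * phi
    return round(w_bifpn), d_bifpn, d_head, r_input

# EfficientDet-D0 ... D6 roughly follow these equations
for phi in range(7):
    print(f"D{phi}:", efficientdet_scaling(phi))
```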
BiFPN
Multi-scale feature fusion aims to aggregate features at different resolutions.
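【CC】The paper formalizes this as finding a transformation f over a list of multi-scale features, with conventional top-down FPN being one instance of f:

```latex
% Formalization from the EfficientDet paper: given features at levels
% l_1, l_2, ..., find a transformation f that aggregates them.
\vec{P}^{\,in} = \left(P^{in}_{l_1}, P^{in}_{l_2}, \ldots\right),
\qquad \vec{P}^{\,out} = f\!\left(\vec{P}^{\,in}\right)
% Example: conventional top-down FPN (levels 3--7) instantiates f as
% P_7^{out} = Conv(P_7^{in}),
% P_6^{out} = Conv(P_6^{in} + Resize(P_7^{out})), ..., down to P_3^{out}.
```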