FPN

最新推荐文章于 2025-05-16 12:08:17 发布

原创最新推荐文章于 2025-05-16 12:08:17 发布 · 590 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#目标检测 #FPN

深度学习同时被 2 个专栏收录

66 篇文章

订阅专栏

目标检测

17 篇文章

订阅专栏

Paper-info

title: Feature Pyramid Networks for object detection[2017-CVPR]
author : Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie.
Pyramid:

history:

Motivation

对于目标检测任务而言，被检测物体/目标的尺度大小往往会对最终的检测结果产生很大的影响，尤其对于一些小尺度的目标，很多情况下可能会被漏检。Region-based目标检测算法[以Faster-RCNN为代表的检测算法]一般流程为：

feature_maps = process(image)
ROIs         = RPN(image)         # Expensive!
for ROI in ROIs
    patch = roi_pooling(feature_maps, ROI)
    results = detector2(patch)

它存在一个固有的缺陷：所有的检测都是基于主干网络最后一个卷基层输出的单一尺度特征图[记为:feature_map]，相对于原始图像而言，feature_map具有high-level的semantical-info, 但是经过上游网络不断地up-sampling也损失resolution.因此，对于一些尺度具有明显差异的object，Region-based方法往往容易漏检。为了解决该问题，提作者提出了Feature Pyramid Networks [PFN]，它能够为detection提供Pyramid Features[multi-scale feature map]，从而获取更好的检测结果。

Idea

一般而言，获取不同scale的feature的方法有如下三种方式(a, b, c)：

a). 在detection任务的竞赛中，由于没有inference-time的限制，常常采用该策略。缺点：耗时、耗内存;

b). 采用多尺度feature_map融合策略，在得到的refine_map上进行detection，本质上属于基于单尺度feature_map的目标检测方法，与a)相比，省时高效，但精度有所不及；

c). 以SSD为代表，由于底层feature_map包含边缘、颜色等low-level的信息，导致检测到的bbox-region在后续的分类环节得分不高，这样尽管小尺度的object被检测到了，但是这样的bbox因为分类得分过低而被过掉，从而使得小尺度物体漏检！

fpn). FPN采用的Feature-Pyramid结构，兼具b)和c)的优点，大幅提升模型对小尺度物体的检测效果！

Code-torchvision

class FeaturePyramidNetwork(nn.Module):
    """
    Module that adds a FPN from on top of a set of feature maps. This is based on
    `"Feature Pyramid Network for Object Detection" <https://arxiv.org/abs/1612.03144>`_.

    The feature maps are currently supposed to be in increasing depth
    order.

    The input to the model is expected to be an OrderedDict[Tensor], containing
    the feature maps on top of which the FPN will be added.

    Arguments:
        in_channels_list (list[int]): number of channels for each feature map that
            is passed to the module
        out_channels (int): number of channels of the FPN representation
        extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
            be performed. It is expected to take the fpn features, the original
            features and the names of the original features as input, and returns
            a new list of feature maps and their corresponding names
    """

    def __init__(self, in_channels_list, out_channels, extra_blocks=None):

        super(FeaturePyramidNetwork, self).__init__()

        self.inner_blocks = nn.ModuleList()   # data-foreward
        self.layer_blocks = nn.ModuleList()   # feat-fusion

        for in_channels in in_channels_list:
            if in_channels == 0:
                continue
            inner_block_module = nn.Conv2d(in_channels, out_channels, 1)
            layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1)
            self.inner_blocks.append(inner_block_module)
            self.layer_blocks.append(layer_block_module)

        # initialize parameters now to avoid modifying the initialization of top_blocks
        for m in self.children():

            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight, a=1)
                nn.init.constant_(m.bias, 0)

        if extra_blocks is not None:
            assert isinstance(extra_blocks, ExtraFPNBlock)
        self.extra_blocks = extra_blocks


    def forward(self, x):
        """
        Computes the FPN for a set of feature maps.

        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.

        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
        x = list(x.values())

        last_inner = self.inner_blocks[-1](x[-1])  # high-semantic 
        results = []
        results.append(self.layer_blocks[-1](last_inner))
        layer_iter = zip(x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1])
        
        for feature, inner_block, layer_block in layer_iter:

            if not inner_block:
                continue
            inner_lateral = inner_block(feature)
            feat_shape = inner_lateral.shape[-2:]
            inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")  # 2x up
            last_inner = inner_lateral + inner_top_down  # add 
            results.insert(0, layer_block(last_inner))

        if self.extra_blocks is not None:
            results, names = self.extra_blocks(results, x, names)

        # make it back an OrderedDict
        out = OrderedDict([(k, v) for k, v in zip(names, results)])

        return out

Outline(resnet-50)

FPN with RPN

Pipeline

我们已经知道Fast R-CNN和SPP-Net都是采用single-scale的feature_map，一个很自然的问题是：如果不改变原有的架构，同时还要利用FPN，应该怎么做？作者给出了一个参考策略：对于不同尺度为(w, h)的ROI，为它分配不同resolution的特征图。分配原则由下式给出：

其中，w, h依次表示ROI的宽和高，(224, 224, 3)为input_img的size，对于ResNet-based Faster R-CNN而言，k0=4，即ROI所能获得的最低级的分辨率特征图(最高的语义图)为refine_map为P4。直觉上，当ROI越小，分配给它的refine_map level越小，意味着需要用高分辨率的特征图进行检测；反之，则用低分辨率的特征图检测即可！