FCOS Code (2) (demo process): the RPN network structure

This post walks through the implementation of FCOS (Fully Convolutional One-Stage Object Detection), focusing on the structure of the FCOSHead module and its classification, regression and center-ness branches. FCOSHead does not use an RPN; it regresses bounding boxes in a per-pixel fashion, processing the features with a few convolutional layers. Its outputs are classification score maps, box-regression vectors and center-ness scores. In the post-processing stage, FCOSPostProcessor filters the candidates and applies non-maximum suppression to obtain the final predicted boxes. Together, these steps show how FCOS turns feature maps into predictions and refines them into the final detections.

FCOS Code (1) (demo process): detailed backbone network structure, mask-rcnn ResNet+FPN

FCOS Code (3): the complete demo pipeline

FCOS does not use an RPN to regress bounding boxes; it regresses them in a per-pixel fashion instead. The corresponding code simply keeps the name RPN. The printed structure of this module is shown below.

FCOSModule(
  (head): FCOSHead(
    (cls_tower): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): GroupNorm(32, 256, eps=1e-05, affine=True)
      (2): ReLU()
      (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): GroupNorm(32, 256, eps=1e-05, affine=True)
      (5): ReLU()
      (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): GroupNorm(32, 256, eps=1e-05, affine=True)
      (8): ReLU()
      (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (10): GroupNorm(32, 256, eps=1e-05, affine=True)
      (11): ReLU()
    )
    (bbox_tower): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): GroupNorm(32, 256, eps=1e-05, affine=True)
      (2): ReLU()
      (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): GroupNorm(32, 256, eps=1e-05, affine=True)
      (5): ReLU()
      (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): GroupNorm(32, 256, eps=1e-05, affine=True)
      (8): ReLU()
      (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (10): GroupNorm(32, 256, eps=1e-05, affine=True)
      (11): ReLU()
    )
    (cls_logits): Conv2d(256, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bbox_pred): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (centerness): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (scales): ModuleList(
      (0): Scale()
      (1): Scale()
      (2): Scale()
      (3): Scale()
      (4): Scale()
    )
  )
  (box_selector_test): FCOSPostProcessor()
)

1) The FCOSHead module

i. cls_tower and bbox_tower

As the printout shows, these two towers have identical structures: cls_tower feeds the classification branch and bbox_tower feeds the regression branch. Because classification and regression are different tasks, the two towers cannot share parameters, so each is defined separately. Their output sizes are shown below, where cls_tower_list holds the outputs of cls_tower and box_tower_list the outputs of bbox_tower (these two lists were added to the code only to inspect the outputs; they are not in the original source). Each contains 5 outputs, one per FPN level, all with 256 channels.

cls_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
box_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
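
As a quick sanity check, here is a minimal sketch that rebuilds one such tower by hand (same layout as printed above: four Conv3x3 -> GroupNorm(32) -> ReLU blocks) and confirms it preserves both the spatial size and the 256 channels; the input shape is the P3 size from the demo, and the snippet is only illustrative, not part of the repo.

import torch
import torch.nn as nn

# 4 x (Conv3x3 -> GroupNorm(32) -> ReLU); every conv uses stride 1 and padding 1,
# so the tower keeps the input resolution and channel count unchanged.
tower = nn.Sequential(*[
    layer
    for _ in range(4)
    for layer in (nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
                  nn.GroupNorm(32, 256),
                  nn.ReLU())
])

x = torch.randn(1, 256, 100, 140)   # one FPN level (P3-sized feature from the demo)
print(tower(x).shape)               # torch.Size([1, 256, 100, 140])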

ii. cls_logits

It takes the features output by cls_tower and produces the classification score maps, shown below: 5 outputs in total, corresponding to the 5 levels of the ResNet+FPN backbone from the previous post, each with 80 channels, one channel per class.

logits = []  # list:5 {Tensor:(1,80,100,140),Tensor:(1,80,50,70),Tensor:(1,80,25,35),Tensor:(1,80,13,18),Tensor:(1,80,7,9)}
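
FCOS scores each of the 80 classes independently with a sigmoid (which is also why num_classes is NUM_CLASSES - 1 in the code further below: the background class has no channel of its own). A rough illustration with a random tensor, using the P3 shape from the demo:

import torch

logits_p3 = torch.randn(1, 80, 100, 140)   # one level of `logits`
cls_prob = logits_p3.sigmoid()             # independent per-class probabilities in (0, 1)
print(cls_prob.shape)                      # torch.Size([1, 80, 100, 140])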

iii. bbox_pred

It takes the features output by bbox_tower and produces the regression vectors, shown below: 5 outputs, each with 4 channels, giving the regression vector (l, t, r, b) for the location corresponding to each position of the classification score map.

bbox_reg = []  # list:5 {Tensor:(1,4,100,140),Tensor:(1,4,50,70),Tensor:(1,4,25,35),Tensor:(1,4,13,18),Tensor:(1,4,7,9)}
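
To make the meaning of (l, t, r, b) concrete, here is a small hand-written sketch of how such a vector at one location is decoded back into an (x1, y1, x2, y2) box, assuming the location has already been mapped to image coordinates (cx, cy) via the level's stride; the function name and numbers are illustrative, not taken from the repo.

# (l, t, r, b) are the distances from the location (cx, cy) to the box's
# left, top, right and bottom sides, so decoding is just four additions.
def decode_ltrb(cx, cy, l, t, r, b):
    return cx - l, cy - t, cx + r, cy + b   # (x1, y1, x2, y2)

print(decode_ltrb(cx=64.0, cy=64.0, l=10.0, t=20.0, r=30.0, b=5.0))
# (54.0, 44.0, 94.0, 69.0)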

iv. centerness

It takes the features output by bbox_tower and implements the center-ness branch proposed in the FCOS paper. Its outputs are shown below: each has only 1 channel, which is later multiplied element-wise with the corresponding 80-channel score map (the center-ness branch acts on the classification scores).

centerness = []  # list:5 {Tensor:(1,1,100,140),Tensor:(1,1,50,70),Tensor:(1,1,25,35),Tensor:(1,1,13,18),Tensor:(1,1,7,9)}
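
A short sketch of how the single center-ness channel modulates the 80-channel classification scores at test time: after a sigmoid, it broadcasts across the class dimension. In the repo this multiplication happens inside the post-processor; the tensors below are random placeholders with the P3 shapes from the demo.

import torch

cls_prob = torch.rand(1, 80, 100, 140)   # sigmoid of cls_logits for one level
ctr_prob = torch.rand(1, 1, 100, 140)    # sigmoid of centerness for the same level

final_score = cls_prob * ctr_prob        # the 1-channel map broadcasts over the 80 classes
print(final_score.shape)                 # torch.Size([1, 80, 100, 140])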

v. The FCOSHead class

Putting this together, the FCOSHead class is defined as follows. It builds the head networks described above (i–iv). The "head" is the output head: it produces the final output of each branch of the network (classification, regression and center-ness). These are the network's outputs, not the final predictions; the final predictions are obtained later by filtering over all of these outputs. In effect, the 5 FPN feature levels are fed through the same head, producing 5 corresponding sets of outputs. Some of the details are analyzed below:

import math

import torch
import torch.nn.functional as F
from torch import nn

# Scale and DFConv2d are project-specific layers from the FCOS repo
# (the Scale class is reproduced further below in this post).

class FCOSHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):  # in_channels:256
        """
        Arguments:
            in_channels (int): number of channels of the input feature
        """
        super(FCOSHead, self).__init__()
        # TODO: Implement the sigmoid version first.
        num_classes = cfg.MODEL.FCOS.NUM_CLASSES - 1  # 80
        self.fpn_strides = cfg.MODEL.FCOS.FPN_STRIDES  # [8,16,32,64,128]
        self.norm_reg_targets = cfg.MODEL.FCOS.NORM_REG_TARGETS  # True
        self.centerness_on_reg = cfg.MODEL.FCOS.CENTERNESS_ON_REG  # True
        self.use_dcn_in_tower = cfg.MODEL.FCOS.USE_DCN_IN_TOWER  # False

        cls_tower = []
        bbox_tower = []
        for i in range(cfg.MODEL.FCOS.NUM_CONVS):  # 4
            if self.use_dcn_in_tower and \
                    i == cfg.MODEL.FCOS.NUM_CONVS - 1:
                conv_func = DFConv2d
            else:
                conv_func = nn.Conv2d

            cls_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            cls_tower.append(nn.GroupNorm(32, in_channels))
            cls_tower.append(nn.ReLU())
            bbox_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            bbox_tower.append(nn.GroupNorm(32, in_channels))
            bbox_tower.append(nn.ReLU())

        self.add_module('cls_tower', nn.Sequential(*cls_tower))
        self.add_module('bbox_tower', nn.Sequential(*bbox_tower))
        self.cls_logits = nn.Conv2d(
            in_channels, num_classes, kernel_size=3, stride=1,
            padding=1
        )
        self.bbox_pred = nn.Conv2d(
            in_channels, 4, kernel_size=3, stride=1,
            padding=1
        )
        self.centerness = nn.Conv2d(
            in_channels, 1, kernel_size=3, stride=1,
            padding=1
        )

        # initialization
        for modules in [self.cls_tower, self.bbox_tower,
                        self.cls_logits, self.bbox_pred,
                        self.centerness]:
            for l in modules.modules():
                if isinstance(l, nn.Conv2d):
                    torch.nn.init.normal_(l.weight, std=0.01)
                    torch.nn.init.constant_(l.bias, 0)

        # initialize the bias for focal loss
        prior_prob = cfg.MODEL.FCOS.PRIOR_PROB  # 0.01
        bias_value = -math.log((1 - prior_prob) / prior_prob)  # -4.59511985013459
        torch.nn.init.constant_(self.cls_logits.bias, bias_value)

        self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])

    def forward(self, x):  # from FPN, tuple:5
        logits = []  # list:5 {Tensor:(1,80,100,140),Tensor:(1,80,50,70),Tensor:(1,80,25,35),Tensor:(1,80,13,18),Tensor:(1,80,7,9)}
        bbox_reg = []  # list:5 {Tensor:(1,4,100,140),Tensor:(1,4,50,70),Tensor:(1,4,25,35),Tensor:(1,4,13,18),Tensor:(1,4,7,9)}
        centerness = []  # list:5 {Tensor:(1,1,100,140),Tensor:(1,1,50,70),Tensor:(1,1,25,35),Tensor:(1,1,13,18),Tensor:(1,1,7,9)}
        # cls_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
        # box_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
        for l, feature in enumerate(x):
            cls_tower = self.cls_tower(feature)
            box_tower = self.bbox_tower(feature)
            # cls_tower_list.append(cls_tower)
            # box_tower_list.append(box_tower)

            logits.append(self.cls_logits(cls_tower))
            if self.centerness_on_reg:
                centerness.append(self.centerness(box_tower))
            else:
                centerness.append(self.centerness(cls_tower))

            bbox_pred = self.scales[l](self.bbox_pred(box_tower))
            if self.norm_reg_targets:
                bbox_pred = F.relu(bbox_pred)
                if self.training:
                    bbox_reg.append(bbox_pred)
                else:
                    bbox_reg.append(bbox_pred * self.fpn_strides[l])
            else:
                bbox_reg.append(torch.exp(bbox_pred))
        return logits, bbox_reg, centerness

## ------ self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])

Its definition is shown below. As the code above makes clear, the bounding-box regression uses it:

bbox_pred = self.scales[l](self.bbox_pred(box_tower))

This is because the head receives FPN features at 5 different scales (the five levels share the same head), and each level has a different regression range, so a learnable scale factor is applied to each level's regression output.

import torch
from torch import nn

class Scale(nn.Module):
    def __init__(self, init_value=1.0):
        super(Scale, self).__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))  # scale , size is 1

    def forward(self, input):
        return input * self.scale
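
A quick usage sketch for the Scale class defined above (the level index and values are illustrative): one Scale per FPN level, each multiplying that level's raw bbox_pred output by a single learnable scalar.

import torch
from torch import nn

scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])  # one Scale per FPN level

raw_pred = torch.randn(1, 4, 25, 35)   # bbox_pred output of level l = 2 (P5 in the demo)
scaled = scales[2](raw_pred)           # same shape; every element multiplied by scales[2].scale
print(scaled.shape)                    # torch.Size([1, 4, 25, 35])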

The values of the 5 Scale parameters were printed, as shown below:

number l is 0
self.scale is Parameter containing:
tensor([0.9034], device='cuda:0', requires_grad=True)
=============================================
number l is 1
self.scale is Parameter containing:
tensor([0.9520], device='cuda:0', requires_grad=True)