FCOS Code (1): (Demo Process) Detailed Walkthrough of the Backbone Network Structure, mask-rcnn ResNet+FPN
FCOS Code (3): The Complete Flow of the Demo Process
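FCOS does not use an RPN to regress bounding boxes. Instead it regresses them per pixel; the code for this part simply keeps the name RPN. The structure of this part of the network is printed below.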
FCOSModule(
  (head): FCOSHead(
    (cls_tower): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): GroupNorm(32, 256, eps=1e-05, affine=True)
      (2): ReLU()
      (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): GroupNorm(32, 256, eps=1e-05, affine=True)
      (5): ReLU()
      (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): GroupNorm(32, 256, eps=1e-05, affine=True)
      (8): ReLU()
      (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (10): GroupNorm(32, 256, eps=1e-05, affine=True)
      (11): ReLU()
    )
    (bbox_tower): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): GroupNorm(32, 256, eps=1e-05, affine=True)
      (2): ReLU()
      (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): GroupNorm(32, 256, eps=1e-05, affine=True)
      (5): ReLU()
      (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): GroupNorm(32, 256, eps=1e-05, affine=True)
      (8): ReLU()
      (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (10): GroupNorm(32, 256, eps=1e-05, affine=True)
      (11): ReLU()
    )
    (cls_logits): Conv2d(256, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bbox_pred): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (centerness): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (scales): ModuleList(
      (0): Scale()
      (1): Scale()
      (2): Scale()
      (3): Scale()
      (4): Scale()
    )
  )
  (box_selector_test): FCOSPostProcessor()
)
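1) The FCOSHead part
i. cls_tower and bbox_tower
As the printout shows, the two towers have identical structure: cls_tower serves the classification branch and bbox_tower serves the regression branch. Because classification and regression are different kinds of tasks, the two towers do not share parameters, so each is defined separately. Their output sizes are shown below, where cls_tower_list holds the outputs of cls_tower and box_tower_list holds the outputs of bbox_tower (I added these two lists to the code only to inspect the output shapes; they are not in the original source). Each list contains 5 outputs (one per FPN level), all with 256 channels.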
cls_tower_list = [] # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
box_tower_list = [] # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
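The sizes listed above are consistent with the standard convolution output-size formula. The following is a minimal sketch, assuming every tower layer is a 3x3, stride-1, padding-1 convolution (as in the printout) and, as a simplification that is not taken from the source, modelling each step to the next pyramid level as a 3x3, stride-2, padding-1 downsampling that starts from the 100x140 map. The names conv_out and pyramid_sizes are illustrative helpers, not part of the FCOS code.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(h, w, levels=5):
    """Sizes of `levels` feature maps, each halved by an assumed stride-2 3x3 conv."""
    sizes = [(h, w)]
    for _ in range(levels - 1):
        h, w = conv_out(h, stride=2), conv_out(w, stride=2)
        sizes.append((h, w))
    return sizes


# A tower layer (3x3, stride 1, padding 1) leaves the spatial size unchanged,
# so four stacked layers keep every level at its input resolution.
assert conv_out(100) == 100 and conv_out(7) == 7

print(pyramid_sizes(100, 140))
```

Since the towers never change the channel count either (256 in, 256 out), each tower output has the same shape as the FPN feature it was computed from.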
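ii. cls_logits
Takes the features output by cls_tower and produces the classification score maps. There are 5 outputs, one for each of the 5 levels of the ResNet+FPN pyramid described earlier. Each output has 80 channels, one channel per class.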
logits = [] # list:5 {Tensor:(1,80,100,140),Tensor:(1,80,50,70),Tensor:(1,80,25,35),Tensor:(1,80,13,18),Tensor:(1,80,7,9)}
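iii. bbox_pred
Takes the features output by bbox_tower and produces the regression vectors. There are 5 outputs, each with 4 channels that hold the regression vector (l, t, r, b) for the corresponding location of the classification score map.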
bbox_reg = [] # list:5 {Tensor:(1,4,100,140),Tensor:(1,4,50,70),Tensor:(1,4,25,35),Tensor:(1,4,13,18),Tensor:(1,4,7,9)}
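How an (l, t, r, b) vector becomes a box can be sketched as follows. This assumes that the feature-map cell (row, col) on a level with stride s is mapped back to the image as (s*col + s//2, s*row + s//2), and that the four distances are already expressed in image pixels. The function names are illustrative and are not the repository's API.

```python
def location(row, col, stride):
    """Image-plane (x, y) of a feature-map cell, taken near the cell's center."""
    return stride * col + stride // 2, stride * row + stride // 2


def decode_box(x, y, l, t, r, b):
    """Turn the distances to the four box sides into (x1, y1, x2, y2)."""
    return x - l, y - t, x + r, y + b


x, y = location(row=10, col=20, stride=8)  # a cell on the stride-8 level
box = decode_box(x, y, l=30.0, t=12.0, r=50.0, b=40.0)
print((x, y), box)
```

Because every location predicts its own four distances, no anchor boxes are needed to describe the box.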
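iv. centerness
Takes the features output by bbox_tower (because CENTERNESS_ON_REG is True in this configuration; otherwise the cls_tower features are used). This is the center-ness branch proposed in the FCOS paper. Each output has only 1 channel, but it is multiplied element-wise with the corresponding 80-channel score map, so the center-ness branch acts on the classification scores.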
centerness = [] # list:5 {Tensor:(1,1,100,140),Tensor:(1,1,50,70),Tensor:(1,1,25,35),Tensor:(1,1,13,18),Tensor:(1,1,7,9)}
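The multiplication can be sketched for a single location. This is only an illustration: it assumes both the class logits and the center-ness logit are passed through a sigmoid before being multiplied, and it uses 3 classes instead of 80 for brevity. The name final_scores is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def final_scores(cls_logits, centerness_logit):
    """Down-weight every class score at one location by that location's center-ness."""
    c = sigmoid(centerness_logit)
    return [sigmoid(z) * c for z in cls_logits]


# Same class logits, once near an object's center and once far from it.
near_center = final_scores([2.0, -1.0, 0.0], centerness_logit=3.0)
off_center = final_scores([2.0, -1.0, 0.0], centerness_logit=-3.0)
print(near_center)
print(off_center)
```

The single center-ness value is broadcast across all class channels, which is why one channel is enough.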
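v. The FCOSHead class
The FCOSHead class is defined as follows. It builds the head networks described in (i) to (iv). The "head" is the output head: it produces the final output of each part of the network (classification, regression, and center-ness). These are the raw network outputs, not the final predictions; the final predictions still have to be selected by filtering these outputs. In effect, the 5 FPN feature levels are fed through the same head to produce 5 corresponding outputs. Some details are analyzed below: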
import math

import torch
import torch.nn.functional as F
from torch import nn

# Scale and DFConv2d are custom layers provided by the repository itself;
# Scale is shown further below.


class FCOSHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):  # in_channels:256
        """
        Arguments:
            in_channels (int): number of channels of the input feature
        """
        super(FCOSHead, self).__init__()
        # TODO: Implement the sigmoid version first.
        num_classes = cfg.MODEL.FCOS.NUM_CLASSES - 1  # 80
        self.fpn_strides = cfg.MODEL.FCOS.FPN_STRIDES  # [8,16,32,64,128]
        self.norm_reg_targets = cfg.MODEL.FCOS.NORM_REG_TARGETS  # True
        self.centerness_on_reg = cfg.MODEL.FCOS.CENTERNESS_ON_REG  # True
        self.use_dcn_in_tower = cfg.MODEL.FCOS.USE_DCN_IN_TOWER  # False

        cls_tower = []
        bbox_tower = []
        for i in range(cfg.MODEL.FCOS.NUM_CONVS):  # 4
            # Optionally use a deformable conv for the last layer of each tower.
            if self.use_dcn_in_tower and \
                    i == cfg.MODEL.FCOS.NUM_CONVS - 1:
                conv_func = DFConv2d
            else:
                conv_func = nn.Conv2d

            cls_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            cls_tower.append(nn.GroupNorm(32, in_channels))
            cls_tower.append(nn.ReLU())
            bbox_tower.append(
                conv_func(
                    in_channels,
                    in_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=True
                )
            )
            bbox_tower.append(nn.GroupNorm(32, in_channels))
            bbox_tower.append(nn.ReLU())

        self.add_module('cls_tower', nn.Sequential(*cls_tower))
        self.add_module('bbox_tower', nn.Sequential(*bbox_tower))
        self.cls_logits = nn.Conv2d(
            in_channels, num_classes, kernel_size=3, stride=1,
            padding=1
        )
        self.bbox_pred = nn.Conv2d(
            in_channels, 4, kernel_size=3, stride=1,
            padding=1
        )
        self.centerness = nn.Conv2d(
            in_channels, 1, kernel_size=3, stride=1,
            padding=1
        )

        # initialization
        for modules in [self.cls_tower, self.bbox_tower,
                        self.cls_logits, self.bbox_pred,
                        self.centerness]:
            for l in modules.modules():
                if isinstance(l, nn.Conv2d):
                    torch.nn.init.normal_(l.weight, std=0.01)
                    torch.nn.init.constant_(l.bias, 0)

        # initialize the bias for focal loss
        prior_prob = cfg.MODEL.FCOS.PRIOR_PROB  # 0.01
        bias_value = -math.log((1 - prior_prob) / prior_prob)  # -4.59511985013459
        torch.nn.init.constant_(self.cls_logits.bias, bias_value)

        # one learnable scale per FPN level
        self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])

    def forward(self, x):  # from FPN, tuple:5
        logits = []  # list:5 {Tensor:(1,80,100,140),Tensor:(1,80,50,70),Tensor:(1,80,25,35),Tensor:(1,80,13,18),Tensor:(1,80,7,9)}
        bbox_reg = []  # list:5 {Tensor:(1,4,100,140),Tensor:(1,4,50,70),Tensor:(1,4,25,35),Tensor:(1,4,13,18),Tensor:(1,4,7,9)}
        centerness = []  # list:5 {Tensor:(1,1,100,140),Tensor:(1,1,50,70),Tensor:(1,1,25,35),Tensor:(1,1,13,18),Tensor:(1,1,7,9)}
        # cls_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
        # box_tower_list = []  # list:5 {Tensor:(1,256,100,140),Tensor:(1,256,50,70),Tensor:(1,256,25,35),Tensor:(1,256,13,18),Tensor:(1,256,7,9)}
        for l, feature in enumerate(x):
            cls_tower = self.cls_tower(feature)
            box_tower = self.bbox_tower(feature)
            # cls_tower_list.append(cls_tower)
            # box_tower_list.append(box_tower)

            logits.append(self.cls_logits(cls_tower))
            if self.centerness_on_reg:
                centerness.append(self.centerness(box_tower))
            else:
                centerness.append(self.centerness(cls_tower))

            bbox_pred = self.scales[l](self.bbox_pred(box_tower))
            if self.norm_reg_targets:
                bbox_pred = F.relu(bbox_pred)
                if self.training:
                    bbox_reg.append(bbox_pred)
                else:
                    # at inference, convert stride-normalized distances to pixels
                    bbox_reg.append(bbox_pred * self.fpn_strides[l])
            else:
                bbox_reg.append(torch.exp(bbox_pred))
        return logits, bbox_reg, centerness
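The focal-loss bias initialization in __init__ can be checked by hand. Setting the bias to -log((1 - p) / p) makes sigmoid(bias) equal to p, so with near-zero weights every location starts out predicting a foreground probability of roughly PRIOR_PROB. A small sketch in plain Python, where prior_bias is an illustrative helper name:

```python
import math


def prior_bias(prior_prob):
    """Bias whose sigmoid equals the desired prior foreground probability."""
    return -math.log((1 - prior_prob) / prior_prob)


bias = prior_bias(0.01)
prob = 1.0 / (1.0 + math.exp(-bias))  # sigmoid of the bias
print(bias, prob)
```

Starting from a low foreground probability keeps the large number of background locations from dominating the classification loss in the first iterations.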
## ------ self.scales = nn.ModuleList([Scale(init_value=1.0) for _ in range(5)])
Scale is defined below. As the code above shows, the bounding-box regression uses it:
bbox_pred = self.scales[l](self.bbox_pred(box_tower))
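This is because the head receives FPN features at 5 different sizes (all five levels share one head), and each level regresses a different range of box sizes, so a learnable scale factor is used to rescale the regression output of each level.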
class Scale(nn.Module):
    def __init__(self, init_value=1.0):
        super(Scale, self).__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))  # scale , size is 1

    def forward(self, input):
        return input * self.scale
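The path a raw regression value takes through forward can be sketched in plain Python for a single number. The sketch mirrors the branches of the code above (multiply by the per-level scale, then apply ReLU and a stride multiplication at inference when NORM_REG_TARGETS is on, otherwise apply exp). The function regress and its arguments are illustrative names, and the scale value used in the example is arbitrary.

```python
import math


def regress(raw, scale, stride, norm_reg_targets=True, training=False):
    """Map one raw bbox_pred value to a non-negative distance."""
    value = raw * scale                  # per-level learnable Scale
    if norm_reg_targets:
        value = max(value, 0.0)          # ReLU keeps distances non-negative
        return value if training else value * stride
    return math.exp(value)               # exp also guarantees positivity


print(regress(2.0, scale=0.9, stride=8))
print(regress(-1.0, scale=0.9, stride=8))
print(regress(0.0, scale=1.0, stride=8, norm_reg_targets=False))
```

With normalized targets the network predicts distances in units of the level's stride, so the multiplication by the stride at inference converts them back to image pixels.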
The values of the 5 scales were printed as follows:
number l is 0
self.scale is Parameter containing:
tensor([0.9034], device='cuda:0', requires_grad=True)
=============================================
number l is 1
self.scale is Parameter containing:
tensor([0.9520], device='cuda:0', requires_grad

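This article analyzes the implementation of the FCOS (Fully Convolutional One-Stage Object Detection) detector in detail, focusing on the structure of the FCOSHead module and its classification, regression, and center-ness branches. FCOSHead does not use an RPN; it regresses bounding boxes per pixel, processing the features through a few convolutional layers. Its outputs are the classification score maps, the bounding-box regression vectors, and the center-ness scores. In post-processing, FCOSPostProcessor performs filtering and non-maximum suppression to obtain the final predicted boxes. Together, these steps show how FCOS turns feature maps into predictions and refines them into the final detections.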