mmdetection source code analysis, using the FCOS training flow as an example

License: CC 4.0 BY-SA

Tags: #deep learning #mmdetection

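This article walks through the training flow of the FCOS detector in mmdetection, covering config-file parsing, model construction, dataset loading, and the training process itself.

Introduction

Source code analysis of mmdetection, with FCOS as the example

This article draws on my "senior labmate's" blog post, "（八）深度学习实战 | MMDetection之FCOS（1）" (Skies_'s CSDN blog). Following the approach of his article, it starts from mmdetection's config files and training script and traces the whole training flow.

The training flow is the main thread here. The focus is on understanding how mmdetection works internally, as preparation for building new models of my own later.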

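About FCOS

FCOS is a one-stage, anchor-free and proposal-free fully convolutional network. It performs object detection in a per-pixel fashion and can also be applied to other instance-level vision tasks such as instance segmentation. FCOS can additionally serve on its own as the Region Proposal Network of a two-stage detector. Because it is anchor-free, it avoids the complications that come with anchor boxes, such as the extra computation and the tuning of scales and sizes. With NMS as its only post-processing step it reaches strong performance, so it is both simple and effective. It was the first to show that a simple fully convolutional detector can outperform anchor-based detectors.

For details on FCOS, see my paper-reading notes: "论文阅读|FCOS" (yanghao201607030101's CSDN blog).

Its network architecture is as follows:

[Figure: FCOS network architecture]

The network consists of three parts: the backbone, the feature pyramid, and the head.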

Training

train.py

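To train with mmdetection, run train.py in the tools folder from the command line with the arguments you need. During training, a folder named after the detector's config file is created under work_dirs, and the current model's config and logs are stored there. You can also specify a different output directory.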

python tools/train.py \
    ${CONFIG_FILE} \
    [optional arguments]

The optional arguments include:

--no-validate (not suggested): Disable evaluation during training.
--work-dir ${WORK_DIR}: Override the working directory.
--resume-from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file.
--options 'Key=value': Overrides other settings in the used config.
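The default output location described above, together with the --work-dir override, can be sketched as follows. This is a simplified illustration rather than the code in train.py: the function name resolve_work_dir and the config file name my_fcos_config.py are hypothetical, and the real script may also consult the config itself.

```python
import os.path as osp


def resolve_work_dir(config_path, cli_work_dir=None):
    """Pick the output directory for a training run (hypothetical helper)."""
    if cli_work_dir is not None:
        # A directory given with --work-dir takes priority.
        return cli_work_dir
    # Otherwise fall back to ./work_dirs/<config file name without extension>.
    config_name = osp.splitext(osp.basename(config_path))[0]
    return osp.join('./work_dirs', config_name)


# Hypothetical config file name, used only for illustration.
print(resolve_work_dir('configs/fcos/my_fcos_config.py'))
print(resolve_work_dir('configs/fcos/my_fcos_config.py', 'outputs/run1'))
```

Keeping one directory per config file name means runs of different detectors do not overwrite each other's logs by default.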

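Training flow

# Excerpt from tools/train.py; cfg, args, distributed, timestamp and meta
# are set up earlier in that script.
from mmdet.apis import train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
# Build the training dataset; if the workflow includes validation,
# the validation set is appended to this list
datasets = [build_dataset(cfg.data.train)]

# Start training through train_detector
train_detector(
    model,
    datasets,
    cfg,
    distributed=distributed,
    validate=(not args.no_validate),
    timestamp=timestamp,
    meta=meta)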

The three key pieces here are build_detector, build_dataset, and train_detector. build_detector and build_dataset are each implemented by their own build function (for datasets, mmdet/datasets/builder.py), while train_detector is implemented in mmdet/apis/train.py.
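The build functions turn a config dict into an object by looking up its type string in a registry of classes. The sketch below is a simplified stand-in for that idea, not mmdetection's actual implementation: the Registry class, its method names, and ToyDetector are all hypothetical.

```python
class Registry:
    """Maps a class name to the class itself (simplified, hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register(self, cls):
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg, **extra_args):
        # Copy so the caller's config dict is left untouched.
        args = dict(cfg)
        cls_name = args.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not in the {self.name} registry')
        # The remaining keys become constructor arguments.
        return self._modules[cls_name](**args, **extra_args)


DETECTORS = Registry('detector')


@DETECTORS.register
class ToyDetector:
    """Stand-in for a real detector class; exists only for this example."""

    def __init__(self, depth, train_cfg=None):
        self.depth = depth
        self.train_cfg = train_cfg


model_cfg = dict(type='ToyDetector', depth=50)
model = DETECTORS.build(model_cfg, train_cfg=dict(debug=False))
print(type(model).__name__, model.depth)
```

With this pattern, adding a new model means registering a new class and naming it in the config's type field, without touching the training script.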

[Figure]

build_model

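FCOS config file

The config files in mmdetection all live in the configs folder. A detector's config usually consists of four parts: the model config, the learning-rate schedule config, the dataset config, and the runtime config. When writing a complete config for a detector, you can inherit the base dataset, schedule, and runtime configs and override whatever you need to change in the current file. The model config is written in that file as well. (Usually a folder contains only one root config file, and the other config files in it inherit from that one.)

Below is a base config file for FCOS. To modify it, inherit from it and override only the parts you need.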
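The inherit-then-override behaviour can be pictured as a recursive dictionary merge. The sketch below illustrates that idea only and is not the library's config loader: merge_config is a hypothetical helper and the values are illustrative.

```python
def merge_config(base, override):
    """Return a new dict where `override` is merged into `base` recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the child value replaces the inherited one.
            merged[key] = value
    return merged


# Illustrative values standing in for an inherited base config.
base_cfg = dict(
    data=dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(type='CocoDataset')),
    total_epochs=12)

# The child config only states what it changes.
child_cfg = dict(data=dict(samples_per_gpu=4, workers_per_gpu=4))

cfg = merge_config(base_cfg, child_cfg)
print(cfg['data']['samples_per_gpu'], cfg['data']['train'], cfg['total_epochs'])
```

Keys that the child does not mention, such as data.train and total_epochs here, keep their inherited values.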

_base_ = [
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    type='FCOS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        norm_cfg=None,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_threshold=0.5),
    max_per_img=100)
img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(pipeline=train_pipeline),
    val=dict