Exploring the DeepLabV3 Semantic Segmentation Network on the Cityscapes Dataset (Part 1)

Paper: Rethinking Atrous Convolution for Semantic Image Segmentation

1. Revisits atrous convolution: within both cascaded-module and spatial-pyramid-pooling frameworks, it enlarges the receptive field to capture multi-scale context;

2. Improves the ASPP module, which is composed of atrous convolutions with different sampling rates plus BN layers, and experiments with arranging the modules in cascade or in parallel;

3. Discusses an important failure mode: a 3×3 atrous convolution with a very large sampling rate cannot capture long-range information because of image boundary effects; only the center weight remains effective, so it degenerates into a 1×1 convolution. As a remedy, the paper proposes fusing image-level features into the ASPP module.

Paper download: https://arxiv.org/abs/1706.05587

The figure above shows the author's prediction results (trained for only 20 epochs due to limited hardware).

Reference code: https://github.com/fregu856/deeplabv3

Reference articles:

https://blog.youkuaiyun.com/qq_37541097/article/details/121797301

https://blog.youkuaiyun.com/qq_35759272/article/details/123700919

https://blog.youkuaiyun.com/qq_43492938/article/details/111183906

1. Cascaded modules

Figure 1(a) uses no atrous convolution, so the feature resolution keeps shrinking stage after stage and the information loss is severe. Figure 1(b) keeps the resolution unchanged without sacrificing the receptive field: Block1-Block4 are the stage structures of the original ResNet, but in Block4 the stride of the 3x3 convolution in the first residual block and of the 1x1 convolution on its shortcut branch is changed from 2 to 1 (i.e. no further downsampling), and all ordinary 3x3 convolutions inside the residual blocks are replaced with atrous convolutions. Block5, Block6 and Block7 are newly added stages with the same structure as Block4, i.e. each consists of three residual blocks.

Figure 1: Cascaded modules without and with atrous convolution
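As a quick sanity check of the idea in Figure 1(b), the minimal PyTorch sketch below (tensor sizes are illustrative assumptions, not from the paper) confirms that a stride-1 3x3 convolution with dilation d and padding d leaves the spatial resolution unchanged while its receptive field grows:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 64, 64)  # dummy feature map: (batch, channels, h, w)
for d in [1, 2, 4]:
    conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=d, dilation=d)
    print(d, conv(x).shape)  # torch.Size([1, 64, 64, 64]) for every d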

2. Atrous Spatial Pyramid Pooling

ASPP can capture multi-scale context effectively by sampling at different rates, but as the blocks get deeper and the atrous rate grows, the convolution degenerates towards 1x1. For example, when a 3x3 atrous kernel with rate=30 is applied to a 65x65 feature map, almost no output position has all nine filter weights falling inside the map; in the limit only the center weight is applied, and global context cannot be captured. To solve this, image-level features are added. Concretely, global average pooling is applied to every channel of the input feature map, 256 1x1 convolution kernels then form a new (1, 1, 256) feature map, and bilinear interpolation upsamples it to the required resolution (branch (b)); this compensates for the information lost when the rate is too large. Branch (a) consists of one 1x1 convolution and three 3x3 atrous convolutions with rates 6, 12 and 18. The outputs of (a) and (b) are concatenated, then passed through another 256 1x1 kernels to obtain the new feature map, which is upsampled before computing the loss.

Figure 2: Parallel modules with atrous convolution (ASPP), augmented with image-level features
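This degeneration is easy to quantify with a small back-of-the-envelope sketch (the counting logic is my own, not the paper's code): for a 65x65 map, count the fraction of output positions at which all nine taps of a 3x3 kernel with a given rate land inside the map.

import numpy as np

size = 65  # feature-map side length, as in the example above
for rate in [1, 6, 12, 18, 30, 33]:
    # A 3x3 kernel with dilation `rate` spans 2*rate+1 pixels, so all nine
    # taps stay inside the map only at positions >= rate away from each border.
    valid_per_dim = max(0, size - 2 * rate)
    frac = (valid_per_dim / size) ** 2
    print("rate=%3d: %6.2f%% of positions apply all 9 weights" % (rate, 100 * frac))

At rate=30 that fraction is already below 1%, and at rate=33 it is zero: effectively only the center weight is used everywhere.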

The ASPP structure in DeepLab V3 has five parallel branches (center of Figure 3): one 1x1 convolution layer, three 3x3 dilated convolution layers, and one global average pooling layer (added to inject global context information). The outputs of the five branches are concatenated, and a final 1x1 convolution layer fuses the result.

Figure 3: DeepLab V3 network architecture (CSDN: 太阳花的小绿豆)

Part 1: Data Preprocessing

1. Convert the ids in the labelIds.png files under gtFine in the Cityscapes dataset to trainIds, and place the converted files in cityscapes_meta_path

import os
import pickle
import numpy as np
import cv2
from cityscapesscripts.helpers.labels import labels  # official Cityscapes label definitions

# Build the dict mapping each raw id to its trainId:
id_to_trainId = {label.id: label.trainId for label in labels}
id_to_trainId_map_func = np.vectorize(id_to_trainId.get)  # applies the mapping element-wise
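A toy run of the vectorized mapping (the input values below are made up for illustration):

toy_ids = np.array([[7, 26], [11, 21]], dtype=np.uint8)  # hypothetical raw labelIds
print(id_to_trainId_map_func(toy_ids))  # each id is replaced by its trainId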

train_dirs = ["jena/", "zurich/", "weimar/", "ulm/", "tubingen/", "stuttgart/",
              "strasbourg/", "monchengladbach/", "krefeld/", "hanover/",
              "hamburg/", "erfurt/", "dusseldorf/", "darmstadt/", "cologne/",
              "bremen/", "bochum/", "aachen/"]
val_dirs = ["frankfurt/", "munster/", "lindau/"]
test_dirs = ["berlin", "bielefeld", "bonn", "leverkusen", "mainz", "munich"]
cityscapes_data_path = "/home/luyx/zk/Cityscapes"
cityscapes_meta_path = "/home/luyx/zk/Cityscapes/meta"
if not os.path.exists(cityscapes_meta_path):
    os.makedirs(cityscapes_meta_path)
if not os.path.exists(cityscapes_meta_path + "/label_imgs"):
    os.makedirs(cityscapes_meta_path + "/label_imgs")

# Convert the ids in gtFine's labelIds.png to trainIds and write the results to cityscapes_meta_path
train_label_img_paths = []

img_dir = cityscapes_data_path + "/leftImg8bit/train/"
label_dir = cityscapes_data_path + "/gtFine/train/"
for train_dir in train_dirs:
    print (train_dir)
    train_img_dir_path = img_dir + train_dir
    train_label_dir_path = label_dir + train_dir
    file_names = os.listdir(train_img_dir_path)
    for file_name in file_names:
        # extract the image id, e.g. "aachen_000000_000019"
        img_id = file_name.split("_leftImg8bit.png")[0]
        # path of the corresponding labelIds.png image under gtFine
        gtFine_img_path = train_label_dir_path + img_id + "_gtFine_labelIds.png"
        gtFine_img = cv2.imread(gtFine_img_path, -1) # (shape: (1024, 2048))
        # map id to trainId with the vectorized function
        label_img = id_to_trainId_map_func(gtFine_img) # (shape: (1024, 2048))
        label_img = label_img.astype(np.uint8)

        cv2.imwrite(cityscapes_meta_path + "/label_imgs/" + img_id + ".png", label_img)
        train_label_img_paths.append(cityscapes_meta_path + "/label_imgs/" + img_id + ".png")

img_dir = cityscapes_data_path + "/leftImg8bit/val/"
label_dir = cityscapes_data_path + "/gtFine/val/"
for val_dir in val_dirs:
    print (val_dir)
    val_img_dir_path = img_dir + val_dir
    val_label_dir_path = label_dir + val_dir
    file_names = os.listdir(val_img_dir_path)
    for file_name in file_names:
        img_id = file_name.split("_leftImg8bit.png")[0]
        gtFine_img_path = val_label_dir_path + img_id + "_gtFine_labelIds.png"
        gtFine_img = cv2.imread(gtFine_img_path, -1) # (shape: (1024, 2048))
        label_img = id_to_trainId_map_func(gtFine_img) # (shape: (1024, 2048))
        label_img = label_img.astype(np.uint8)

        cv2.imwrite(cityscapes_meta_path + "/label_imgs/" + img_id + ".png", label_img)
2. Compute class weights

print ("computing class weights")
# 20 classes in total, with trainIds 0-19
num_classes = 20
trainId_to_count = {}
for trainId in range(num_classes):
    trainId_to_count[trainId] = 0

# Count the total number of pixels of each class over all training label_imgs
for step, label_img_path in enumerate(train_label_img_paths):
    if step % 100 == 0:
        print (step)
    label_img = cv2.imread(label_img_path, -1)
    for trainId in range(num_classes):
        trainId_mask = np.equal(label_img, trainId)
        trainId_count = np.sum(trainId_mask)
        trainId_to_count[trainId] += trainId_count

# Compute the class weights following the ENet paper:
class_weights = []
total_count = sum(trainId_to_count.values())
for trainId, count in trainId_to_count.items():
    trainId_prob = float(count)/float(total_count)
    trainId_weight = 1/np.log(1.02 + trainId_prob)
    class_weights.append(trainId_weight)
print (class_weights)
with open(cityscapes_meta_path + "/class_weights.pkl", "wb") as file:
    pickle.dump(class_weights, file, protocol=2)
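To get a feel for the ENet weighting w = 1/ln(1.02 + p): frequent classes are damped and rare classes are boosted, while the weight stays bounded (at most 1/ln(1.02) ≈ 50.5). A quick numeric check with made-up class probabilities:

for p in [0.3, 0.05, 0.001]:
    print(p, 1 / np.log(1.02 + p))  # ~3.60, ~14.77, ~48.12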
3. Data augmentation

import torch
import torch.utils.data
import numpy as np
import cv2
import os
train_dirs = ["jena/", "zurich/", "weimar/", "ulm/", "tubingen/", "stuttgart/",
              "strasbourg/", "monchengladbach/", "krefeld/", "hanover/",
              "hamburg/", "erfurt/", "dusseldorf/", "darmstadt/", "cologne/",
              "bremen/", "bochum/", "aachen/"]
val_dirs = ["frankfurt/", "munster/", "lindau/"]
test_dirs = ["berlin", "bielefeld", "bonn", "leverkusen", "mainz", "munich"]

class DatasetTrain(torch.utils.data.Dataset):
    def __init__(self, cityscapes_data_path, cityscapes_meta_path):
        self.img_dir = cityscapes_data_path + "/leftImg8bit/train/"
        self.label_dir = cityscapes_meta_path + "/label_imgs/"
        self.img_h = 1024
        self.img_w = 2048
        self.new_img_h = 512
        self.new_img_w = 1024
        self.examples = []
        for train_dir in train_dirs:
            train_img_dir_path = self.img_dir + train_dir
            file_names = os.listdir(train_img_dir_path)
            for file_name in file_names:
                img_id = file_name.split("_leftImg8bit.png")[0]
                img_path = train_img_dir_path + file_name
                label_img_path = self.label_dir + img_id + ".png"
                example = {}
                example["img_path"] = img_path
                example["label_img_path"] = label_img_path
                example["img_id"] = img_id
                self.examples.append(example)
        self.num_examples = len(self.examples)

    def __getitem__(self, index):
        example = self.examples[index]
        img_path = example["img_path"]
        img = cv2.imread(img_path, -1) # (shape: (1024, 2048, 3))
        img = cv2.resize(img, (self.new_img_w, self.new_img_h),
                         interpolation=cv2.INTER_NEAREST) # (shape: (512, 1024, 3))
        label_img_path = example["label_img_path"]
        label_img = cv2.imread(label_img_path, -1) # (shape: (1024, 2048))
        label_img = cv2.resize(label_img, (self.new_img_w, self.new_img_h),
                               interpolation=cv2.INTER_NEAREST) # (shape: (512, 1024))
        # Flip img and label horizontally with probability 0.5:
        flip = np.random.randint(low=0, high=2) # random integer in [low, high), i.e. 0 or 1
        if flip == 1:
            img = cv2.flip(img, 1)
            label_img = cv2.flip(label_img, 1)

        scale = np.random.uniform(low=0.7, high=2.0) # sample a scale uniformly from [low, high)
        new_img_h = int(scale*self.new_img_h)
        new_img_w = int(scale*self.new_img_w)

        img = cv2.resize(img, (new_img_w, new_img_h),
                         interpolation=cv2.INTER_NEAREST) # (shape: (new_img_h, new_img_w, 3))

        label_img = cv2.resize(label_img, (new_img_w, new_img_h),
                               interpolation=cv2.INTER_NEAREST) # (shape: (new_img_h, new_img_w))

        # Randomly select a 256x256 crop from img and label
        start_x = np.random.randint(low=0, high=(new_img_w - 256))
        end_x = start_x + 256
        start_y = np.random.randint(low=0, high=(new_img_h - 256))
        end_y = start_y + 256

        img = img[start_y:end_y, start_x:end_x] # (shape: (256, 256, 3))
        label_img = label_img[start_y:end_y, start_x:end_x] # (shape: (256, 256))
        # Normalize img (ImageNet mean/std):
        img = img/255.0
        img = img - np.array([0.485, 0.456, 0.406])
        img = img/np.array([0.229, 0.224, 0.225]) # (shape: (256, 256, 3))
        img = np.transpose(img, (2, 0, 1)) # (shape: (3, 256, 256))
        img = img.astype(np.float32)

        img = torch.from_numpy(img) # (shape: (3, 256, 256))
        label_img = torch.from_numpy(label_img) # (shape: (256, 256))

        return (img, label_img)

    def __len__(self):
        return self.num_examples
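A minimal usage sketch for the dataset (the paths are the ones used earlier in this post; adjust them to your setup):

train_dataset = DatasetTrain(cityscapes_data_path="/home/luyx/zk/Cityscapes",
                             cityscapes_meta_path="/home/luyx/zk/Cityscapes/meta")
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=4,
                                           shuffle=True, num_workers=2)
imgs, label_imgs = next(iter(train_loader))
print(imgs.shape, label_imgs.shape)  # torch.Size([4, 3, 256, 256]) torch.Size([4, 256, 256])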

Part 2: Building the Network

The network in this post largely follows Figure 3, with some differences. The input first passes through a ResNet-18 trunk whose fully connected layer, average pooling layer, and last two residual stages have been removed. It then goes through four residual blocks similar to those in Figure 3 (here each block is a BasicBlock: two 3x3 convolutions, each followed by BN, added to a shortcut that uses a 1x1 convolution plus BN when the shapes differ), then through the ASPP head, and finally upsampled to the input resolution.

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

def make_layer(block, in_channels, channels, num_blocks, stride=1, dilation=1):
    strides = [stride] + [1]*(num_blocks - 1) # (stride == 2, num_blocks == 4 --> strides == [2, 1, 1, 1])
    blocks = []
    for stride in strides:
        blocks.append(block(in_channels=in_channels, channels=channels, stride=stride, dilation=dilation))
        in_channels = block.expansion*channels
    layer = nn.Sequential(*blocks) # (*blocks: call with unpacked list entries as arguments)
    return layer

class BasicBlock(nn.Module):
    expansion = 1
    def __init__(self, in_channels, channels, stride=1, dilation=1):
        super(BasicBlock, self).__init__()
        out_channels = self.expansion*channels
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=stride, padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        if (stride != 1) or (in_channels != out_channels):
            conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
            bn = nn.BatchNorm2d(out_channels)
            self.downsample = nn.Sequential(conv, bn)
        else:
            self.downsample = nn.Sequential()
    def forward(self, x):
        # (x has shape: (batch_size, in_channels, h, w))
        out = F.relu(self.bn1(self.conv1(x))) # (shape: (batch_size, channels, h, w) if stride == 1, (batch_size, channels, h/2, w/2) if stride == 2)
        out = self.bn2(self.conv2(out)) # (shape: (batch_size, channels, h, w) if stride == 1, (batch_size, channels, h/2, w/2) if stride == 2)
        out = out + self.downsample(x) # (shape: (batch_size, channels, h, w) if stride == 1, (batch_size, channels, h/2, w/2) if stride == 2)
        out = F.relu(out) # (shape: (batch_size, channels, h, w) if stride == 1, (batch_size, channels, h/2, w/2) if stride == 2)
        return out

class ResNet_BasicBlock_OS8(nn.Module):
    def __init__(self, num_layers):
        super(ResNet_BasicBlock_OS8, self).__init__()
        if num_layers == 18:
            resnet = models.resnet18()
            # load pretrained model:
            resnet.load_state_dict(torch.load("/home/luyx/zk/Cityscapes/MANYTESTS/DeepLabV3+/deeplabv3-master/deeplabv3-master/pretrained_models/resnet/resnet18-5c106cde.pth"))
            # drop torchvision's layer3, layer4, avg pool and fully connected layer
            # (dilated replacements, named layer4/layer5 here, are added below):
            self.resnet = nn.Sequential(*list(resnet.children())[:-4])
            num_blocks_layer_4 = 2
            num_blocks_layer_5 = 2
            print ("pretrained resnet, 18")
        self.layer4 = make_layer(BasicBlock, in_channels=128, channels=256, num_blocks=num_blocks_layer_4, stride=1, dilation=2)
        self.layer5 = make_layer(BasicBlock, in_channels=256, channels=512, num_blocks=num_blocks_layer_5, stride=1, dilation=4)
    def forward(self, x):
        # (x has shape (batch_size, 3, h, w))
        # pass x through (parts of) the pretrained ResNet:
        c3 = self.resnet(x) # (shape: (batch_size, 128, h/8, w/8)) (it's called c3 since 8 == 2^3)
        output = self.layer4(c3) # (shape: (batch_size, 256, h/8, w/8))
        output = self.layer5(output) # (shape: (batch_size, 512, h/8, w/8))
        return output
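The reference repository wraps this class in a small factory function, which the DeepLabV3 class below calls:

def ResNet18_OS8():
    return ResNet_BasicBlock_OS8(num_layers=18)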

ASPP

class ASPP(nn.Module):
    def __init__(self, num_classes):
        super(ASPP, self).__init__()
        self.conv_1x1_1 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_1 = nn.BatchNorm2d(256)
        self.conv_3x3_1 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=6, dilation=6)
        self.bn_conv_3x3_1 = nn.BatchNorm2d(256)
        self.conv_3x3_2 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=12, dilation=12)
        self.bn_conv_3x3_2 = nn.BatchNorm2d(256)
        self.conv_3x3_3 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=18, dilation=18)
        self.bn_conv_3x3_3 = nn.BatchNorm2d(256)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv_1x1_2 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_2 = nn.BatchNorm2d(256)
        self.conv_1x1_3 = nn.Conv2d(1280, 256, kernel_size=1) # (1280 = 5*256)
        self.bn_conv_1x1_3 = nn.BatchNorm2d(256)
        self.conv_1x1_4 = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, feature_map):
        # (feature_map has shape (batch_size, 512, h/16, w/16)) (assuming self.resnet is ResNet18_OS16 or ResNet34_OS16. If self.resnet instead is ResNet18_OS8 or ResNet34_OS8, it will be (batch_size, 512, h/8, w/8))
        feature_map_h = feature_map.size()[2] # (== h/16)
        feature_map_w = feature_map.size()[3] # (== w/16)
        out_1x1 = F.relu(self.bn_conv_1x1_1(self.conv_1x1_1(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_1 = F.relu(self.bn_conv_3x3_1(self.conv_3x3_1(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_2 = F.relu(self.bn_conv_3x3_2(self.conv_3x3_2(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_3 = F.relu(self.bn_conv_3x3_3(self.conv_3x3_3(feature_map))) # (shape: (batch_size, 256, h/16, w/16))

        out_img = self.avg_pool(feature_map) # (shape: (batch_size, 512, 1, 1))
        out_img = F.relu(self.bn_conv_1x1_2(self.conv_1x1_2(out_img))) # (shape: (batch_size, 256, 1, 1))
        out_img = F.interpolate(out_img, size=(feature_map_h, feature_map_w), mode="bilinear", align_corners=False) # (shape: (batch_size, 256, h/16, w/16))

        out = torch.cat([out_1x1, out_3x3_1, out_3x3_2, out_3x3_3, out_img], 1) # (shape: (batch_size, 1280, h/16, w/16))
        out = F.relu(self.bn_conv_1x1_3(self.conv_1x1_3(out))) # (shape: (batch_size, 256, h/16, w/16))
        out = self.conv_1x1_4(out) # (shape: (batch_size, num_classes, h/16, w/16))
        return out
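A quick shape check of the ASPP head (the dummy tensor mimics a 256x512 input at output stride 8):

aspp = ASPP(num_classes=20)
feats = torch.randn(2, 512, 32, 64)  # (batch_size, 512, h/8, w/8)
print(aspp(feats).shape)  # torch.Size([2, 20, 32, 64])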

DeepLabV3 ties the components above together

class DeepLabV3(nn.Module):
    def __init__(self, model_id, project_dir):
        super(DeepLabV3, self).__init__()
        self.num_classes = 20
        self.model_id = model_id
        self.project_dir = project_dir
        self.create_model_dirs()
        self.resnet = ResNet18_OS8()
        self.aspp = ASPP(num_classes=self.num_classes)
    def forward(self, x):
        # (x has shape (batch_size, 3, h, w))
        h = x.size()[2]
        w = x.size()[3]
        feature_map = self.resnet(x) # (shape: (batch_size, 512, h/8, w/8))
        output = self.aspp(feature_map) # (shape: (batch_size, num_classes, h/8, w/8))
        output = F.interpolate(output, size=(h, w), mode="bilinear", align_corners=False) # (shape: (batch_size, num_classes, h, w))
        return output

    def create_model_dirs(self):
        self.logs_dir = self.project_dir + "/training_logs"
        self.model_dir = self.logs_dir + "/model_%s" % self.model_id
        self.checkpoints_dir = self.model_dir + "/checkpoints"
        if not os.path.exists(self.logs_dir):
            os.makedirs(self.logs_dir)
        if not os.path.exists(self.model_dir):
            os.makedirs(self.model_dir)
            os.makedirs(self.checkpoints_dir)
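Finally, a minimal training-step sketch tying Parts 1 and 2 together, assuming the class weights pickled earlier, the train_loader from Part 1, and a CUDA device (the optimizer and learning rate are illustrative choices, not necessarily the repo's):

with open("/home/luyx/zk/Cityscapes/meta/class_weights.pkl", "rb") as f:
    class_weights = torch.tensor(pickle.load(f), dtype=torch.float32).cuda()

network = DeepLabV3(model_id="1", project_dir="/home/luyx/zk").cuda()
loss_fn = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)

for imgs, label_imgs in train_loader:
    imgs = imgs.cuda()
    label_imgs = label_imgs.long().cuda()  # CrossEntropyLoss expects int64 targets
    outputs = network(imgs)                # (batch_size, num_classes, h, w)
    loss = loss_fn(outputs, label_imgs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()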

·To be continued·
