ResNet, residual blocks, Batch Normalization, transfer learning, ResNeXt, code examples

Article link: https://blog.youkuaiyun.com/2302_80719643/article/details/142924251

Highlights of the network:
- A very deep architecture (more than 1,000 layers)
- Introduces the residual module
- Uses Batch Normalization to speed up training (and drops dropout)

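Architecture diagram:

The 34-layer ResNet network

The network starts with a convolutional layer and a pooling layer, followed by a series of blocks with skip connections (the residual structure). Finally, an average-pooling downsampling step and a fully connected (output) layer produce the output.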

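Parameter walkthrough:

The figure below shows networks built by simply stacking convolutional layers and max-pooling downsampling layers. It shows that making such a plain network deeper does not by itself improve its results.

Remedies for vanishing or exploding gradients: data standardization, weight initialization, and batch normalization.

Degradation (the phenomenon where a deeper network ends up with a higher error rate): addressed by the residual structure, which lets us keep deepening the network.

The error rate here is the top-5 error: the model gives five predictions per image, the sample counts as correct if any one of them is right, and it counts as wrong only if all five are wrong.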
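
The top-5 rule described above can be made concrete with a small sketch. The function name `top_k_error` and the toy scores are illustrative assumptions, not part of the original post or of any library.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    wrong = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest score to lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(labels)


# Toy data: 2 samples, 6 classes (made-up scores).
scores = [
    [0.05, 0.40, 0.10, 0.20, 0.15, 0.10],  # true class 0 has the lowest score
    [0.30, 0.05, 0.25, 0.20, 0.10, 0.10],  # true class 2 is ranked second
]
labels = [0, 2]

top5 = top_k_error(scores, labels, k=5)  # only the first sample is a top-5 miss
top1 = top_k_error(scores, labels, k=1)  # both samples are top-1 misses
```

With these made-up scores the second sample is wrong under top-1 but right under top-5, which is why top-5 error is always less than or equal to top-1 error.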

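The sudden drops in the curves occur where, after a certain number of iterations, the learning rate changes; it is typically multiplied by 0.1.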
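
A step schedule of this kind might look like the following sketch. The starting rate of 0.1 and the milestone epochs 30 and 60 are hypothetical values chosen for illustration; only the multiply-by-0.1 rule comes from the text.

```python
def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate after multiplying by gamma once for every milestone already reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)


# Hypothetical schedule: start at 0.1, drop at epochs 30 and 60.
milestones = [30, 60]
schedule = {epoch: step_lr(0.1, epoch, milestones) for epoch in (0, 29, 30, 60)}
```

Each time the rate drops, the optimizer takes smaller steps, which typically shows up as an abrupt fall in the error curve.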

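The residual structure:

Parameter counts of the two block designs when both take a 256-channel input:

Two 3×3 layers: 3 × 3 × 256 × 256 + 3 × 3 × 256 × 256 = 1,179,648

1×1, 3×3, 1×1 bottleneck: 1 × 1 × 256 × 64 + 3 × 3 × 64 × 64 + 1 × 1 × 64 × 256 = 69,632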
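
The arithmetic above can be checked with a few lines of Python. The helper name `conv_weights` is an illustrative assumption; it counts only the weights of a bias-free convolution (kernel height × kernel width × input channels × output channels).

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square, bias-free convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels


# Block built from two 3x3 convolutions, 256 channels in and out.
two_3x3 = conv_weights(3, 256, 256) + conv_weights(3, 256, 256)

# Bottleneck block: 1x1 reduces to 64 channels, 3x3 works at 64, 1x1 expands to 256.
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))

ratio = two_3x3 / bottleneck  # roughly 17x fewer weights in the bottleneck
```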

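Note: the output feature maps of the main branch and the shortcut must have the same shape.

In the left figure, the input feature map passes through two 3×3 convolutional layers; the result is added to the input feature map and then passed through the ReLU function.

Solid versus dashed shortcuts in the residual structure:

A solid line means the input and output shapes match. A dashed line means they differ, so the stride and kernel size must be adjusted.

Dashed: some blocks change the height, width, and channel count; others change only the depth.

Note: one convolution kernel produces exactly one feature map; several kernels produce several (each one a matrix), and stacking them gives a multi-channel output. So the number of kernels equals the output depth.

The stride changes the height and width of the output feature map; the number of kernels determines its channel count.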
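
The effect of stride on height and width follows the usual convolution size formula, floor((size + 2 × padding − kernel) / stride) + 1. The sketch below applies it to the layer settings used in the code later in this post; the 224-pixel input is an assumption taken from the crop size in train.py, and the helper name is illustrative.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height (or width) of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


after_conv1 = conv_output_size(224, kernel_size=7, stride=2, padding=3)
after_maxpool = conv_output_size(after_conv1, kernel_size=3, stride=2, padding=1)

# Solid-line block: 3x3, stride 1, padding 1 keeps the size.
solid = conv_output_size(after_maxpool, kernel_size=3, stride=1, padding=1)

# Dashed-line block: the 3x3 stride-2 main branch and the 1x1 stride-2 shortcut
# must agree so that the two outputs can be added.
dashed_main = conv_output_size(after_maxpool, kernel_size=3, stride=2, padding=1)
dashed_shortcut = conv_output_size(after_maxpool, kernel_size=1, stride=2, padding=0)
```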

Batch Normalization

The goal of Batch Normalization is to make each dimension (channel) of a batch of feature maps follow a distribution with mean 0 and variance 1.
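
As a rough sketch of the normalization step for a single channel, consider the pure-Python function below. The function name, the toy values, and `eps=1e-5` are assumptions for illustration; the learnable scale and shift that BN layers normally apply afterwards are omitted.

```python
def batch_norm_channel(values, eps=1e-5):
    """Normalize all values of one channel across a batch to mean 0, variance ~1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased (population) variance
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Toy channel: every activation of this channel across the whole batch, flattened.
channel = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm_channel(channel)

new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
```

The small `eps` only guards against division by zero, so the resulting variance is very slightly below 1.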

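Points to watch when using BN:
(1) Set the training parameter to True during training and to False during validation. In PyTorch this is controlled through the model's model.train() and model.eval() methods.

(2) Make the batch size as large as you can. A small batch size can perform very poorly; the larger the batch, the closer its mean and variance are to those of the whole training set.

(3) Place the BN layer between the convolutional layer (Conv) and the activation layer (e.g. ReLU), and do not use a bias in the convolution. The bias has no effect: the result after BN is the same with or without it.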
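
Point (3) can be checked numerically. A convolution bias adds the same constant to every value of a channel, and subtracting the batch mean removes that constant again. The sketch below uses made-up numbers and an illustrative helper name.

```python
def normalize(values, eps=1e-5):
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


conv_out = [1.0, 2.0, 3.0, 4.0]   # one channel of a conv output, no bias
bias = 0.5                        # hypothetical bias for that channel
conv_out_with_bias = [v + bias for v in conv_out]

without_bias = normalize(conv_out)
with_bias = normalize(conv_out_with_bias)

# Largest difference between the two normalized results.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
```

Because the two results match, the bias would only add parameters without changing the output, which is why the model code in this post passes bias=False to every convolution that is followed by BN.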

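Transfer learning:

Advantages:
1. A good result can be trained quickly
2. Good results are possible even when the dataset is small

Note: when using someone else's pretrained parameters, pay attention to their preprocessing and use the same one.

Common transfer learning approaches:

1. Load the weights, then train all parameters

2. Load the weights, then train only the parameters of the last few layers

3. Load the weights, add one more fully connected layer on top of the original network, and train only that last fully connected layer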

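ResNeXt

The block is updated.

ResNeXt introduces the idea of grouped convolution: a 256-channel input is reduced by 1×1 convolutions into 32 groups of 4 channels each (128 channels in total), each group is convolved, 1×1 convolutions then expand each of the 32 groups back to 256 channels, and the 32 results are added element-wise into a single 256-channel output.

(a) splits first, convolves each branch and computes its output separately, and finally adds the outputs. This is the three-stage split-transform-merge form.

(b) splits first, convolves each branch separately, then concatenates before computing the output. The last 1×1 convolutions of all branches are merged into a single convolution.

(c) is grouped convolution. The first 1×1 convolutions of all branches are fused into a single convolution, and the 3×3 convolution is a grouped convolution whose number of groups equals the cardinality.

The block modules shown below are exactly equivalent mathematically.

Take (c) as an example: a 1×1 convolutional layer reduces the input channels from 256 to 128, a grouped convolution with 3×3 kernels and 32 groups processes the result, a 1×1 convolutional layer restores the dimension, and the output is added to the input to give the final output.

Module (b) groups the first and second convolutional layers. The first layer (1×1 kernels, each with 256 channels) is split into 32 groups of 4 kernels, so each group outputs 4 channels. The second layer is likewise split into 32 groups matching the first; each group takes 4 input channels and, with 4 kernels, also outputs 4 channels. The group outputs are concatenated into 128 channels, which then pass through a convolutional layer with 256 kernels to give the final output.

Module (a) splits the last layer of module (b): each of the 32 group outputs from the second layer goes through its own 1×1 convolution (256 kernels, each with 4 channels), and the 32 results are added to give the final output.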
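
One quick consistency check on these descriptions is to count weights. The sketch below (helper name and layout are illustrative assumptions) counts the bias-free weights of form (c) and form (a) and compares them with the ResNet bottleneck from earlier in the post. Equal counts are consistent with the equivalence claim, though they do not prove it on their own.

```python
def conv_weights(kernel_size, in_channels, out_channels, groups=1):
    """Weights of a bias-free convolution; each kernel sees in_channels // groups inputs."""
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


cardinality = 32   # number of groups / branches
group_width = 4    # channels per group
width = cardinality * group_width  # 128 channels in the middle of the block

# Form (c): 1x1 reduce, grouped 3x3, 1x1 expand.
form_c = (conv_weights(1, 256, width)
          + conv_weights(3, width, width, groups=cardinality)
          + conv_weights(1, width, 256))

# Form (a): 32 independent branches, each 256 -> 4 -> 4 -> 256.
one_branch = (conv_weights(1, 256, group_width)
              + conv_weights(3, group_width, group_width)
              + conv_weights(1, group_width, 256))
form_a = cardinality * one_branch

# ResNet bottleneck (256 -> 64 -> 64 -> 256) for comparison.
resnet_bottleneck = (conv_weights(1, 256, 64)
                     + conv_weights(3, 64, 64)
                     + conv_weights(1, 64, 256))
```

The grouped block ends up with about the same number of weights as the ResNet bottleneck while using a middle layer that is twice as wide.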

Code example (building ResNeXt on top of ResNet):

model.py

import torch.nn as nn
import torch


class BasicBlock(nn.Module):
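    # expansion: whether the number of kernels on the main branch changes within the block (1 = no change)
    expansion = 1
    # Layers that make up the residual block; out_channel is the number of kernels on the main branch.
    # downsample (default None) is the shortcut projection used by the dashed-line residual blocks.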
    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        # With BN, the convolution needs no bias: set bias=False.
        # The BN layer sits between the conv layer and the ReLU layer.
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        # downsample is built in ResNet._make_layer below and passed in to this block
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """
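    Note: in the original paper, on the main branch of the dashed-line residual block, the first
    1x1 convolution has stride 2 and the 3x3 convolution that follows has stride 1.
    The official PyTorch implementation instead uses stride 1 for the first 1x1 convolution and
    stride 2 for the 3x3 convolution, which improves top-1 accuracy by roughly 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch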
    """
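    # expansion: in the 50-, 101-, and 152-layer networks the third convolution has
    # 4 times as many kernels as the first two
    expansion = 4
    # Only the layers are defined here; they are wired together (with ReLU) in forward.
    # out_channel is the base number of kernels of the 3x3 convolution (the actual count is width, below)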
    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        # number of kernels in the first and second convolutions
        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

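    # blocks_num: number of residual blocks in each stage (a list)
    # include_top=True keeps the classification head; False makes it easier to build larger networks on top of ResNet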
    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        # depth of the feature map after max pooling
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group
        # padding=3 halves the height and width: (x - 7 + 6) / 2 + 1 = x / 2 + 1 / 2, floored to x / 2
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # the residual stages conv2_x through conv5_x
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        # adaptive average-pooling downsampling
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        # initialize the convolutional layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
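        # "!=" means "not equal", so the first test is true for, e.g., stride 2.
        # Left of `or`: with stride != 1 the output height and width shrink relative to the input.
        # Right of `or`: the input and output channel counts differ.
        # In either case the main-branch output and identity cannot be added without a projection.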
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
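        # channel is the number of kernels in the first convolution of the main branch.
        # The dashed-line block changes the data shape: in the 50-layer network, conv2 changes only the
        # channel count, while conv3, conv4, and conv5 change both the channel count and the height/width.
        # Solid-line blocks sit inside a stage and leave the shape unchanged.
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,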
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        # depth of the feature map after the first (dashed-line) block
        self.in_channel = channel * block.expansion

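        # The first block is already built, so start from 1 and append the remaining blocks,
        # passing the input feature depth and the kernel count of the first main-branch convolution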
        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)
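
To see how the `width` line in Bottleneck turns the `groups` and `width_per_group` arguments into concrete channel counts, the sketch below copies that one expression into a standalone helper (the helper name is an assumption) and evaluates it for the four stages of each constructor above.

```python
def bottleneck_width(out_channel, groups=1, width_per_group=64):
    """Same expression as the `width` line in Bottleneck.__init__."""
    return int(out_channel * (width_per_group / 64.)) * groups


stage_channels = [64, 128, 256, 512]  # the `channel` values passed to _make_layer

resnet50_widths = [bottleneck_width(c) for c in stage_channels]
resnext50_32x4d_widths = [bottleneck_width(c, groups=32, width_per_group=4)
                          for c in stage_channels]
resnext101_32x8d_widths = [bottleneck_width(c, groups=32, width_per_group=8)
                           for c in stage_channels]
```

With the defaults (groups=1, width_per_group=64) the width equals out_channel, so the same Bottleneck class serves both ResNet and ResNeXt. In the first stage of resnext50_32x4d the width is 128, which matches the 32 groups of 4 channels described in the ResNeXt section.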

train.py

import os
import sys
import json

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets
from tqdm import tqdm

from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     # normalization statistics taken from the official documentation
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),
                                   # Resize(256) keeps the aspect ratio and scales the shorter side to 256
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 16
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))

    # instantiate the model
    net = resnet34()
    # transfer learning
    # load pretrain weights
    # download url: https://download.pytorch.org/models/resnet34-333f7ec4.pth
    model_weight_path = "./resnet34-pre.pth"
    # load the pretrained weights into the model
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
    # To freeze every weight except the final fully connected layer and train only that layer, uncomment:
    # for param in net.parameters():
    #     param.requires_grad = False

    # change fc layer structure
    in_channel = net.fc.in_features
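    # fc is the fully connected layer defined in the model;
    # the flower dataset has only 5 classes, so the layer is redefined here.
    # Alternatively, delete the fc parameters from the checkpoint before loading it into the model.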
    net.fc = nn.Linear(in_channel, 5)
    net.to(device)

    # define loss function
    loss_function = nn.CrossEntropyLoss()

    # construct an optimizer
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)

    epochs = 3
    best_acc = 0.0
    save_path = './resNet34.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        # puts the BN layers into training mode
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                # loss = loss_function(outputs, test_labels)
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

                val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1,
                                                           epochs)

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    img_path = r"D:\pycharm\深度学习-图像分类\data_set\flower_data\flower_photos\daisy\5547758_eea9edfd54_n.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = resnet34(num_classes=5).to(device)

    # load model weights (the parameters saved by train.py)
    weights_path = "./resNet34.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # prediction
    model.eval()
    with torch.no_grad():
        # predict class: run the image through the model and squeeze out the batch dimension
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
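
predict.py turns the model's raw outputs into probabilities with torch.softmax and then picks the largest one. As a minimal pure-Python sketch of what that step computes (the logits below are made-up numbers for a 5-class problem, and the helper is illustrative rather than the PyTorch implementation):

```python
import math


def softmax(logits):
    """Exponentiate and normalize so that the outputs are positive and sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0, 0.5, 0.0]  # hypothetical raw scores for 5 classes
probs = softmax(logits)

# Index of the most probable class, the counterpart of torch.argmax.
predicted = max(range(len(probs)), key=lambda i: probs[i])
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether argmax is taken before or after it; the softmax is only needed to report a probability.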