d2l Modern Convolutional Neural Networks (fully updated)

Original post

Last modified 2023-03-18 08:06:25


License: CC 4.0 BY-SA

Tags:

#cnn #AI #machine-learning

First published 2023-03-14 22:06:56

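This post describes the key features, network architectures, and implementations of six deep learning models: AlexNet, VGG, NiN, GoogLeNet, ResNet, and DenseNet. Code examples show how each network is built and what shapes its layers output, and the text explains the motivation behind each design and its optimization strategies, such as how convolutional, pooling, and fully connected layers are used and how overfitting is handled. These models are widely used in image recognition and together trace the evolution of deep learning models.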

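This post covers AlexNet, VGG, NiN, GoogLeNet, ResNet, and DenseNet from Chapter 7.

2.2.1 Why does in_c = out_c appear twice?

5.2.2 Why must the channel count and spatial size be kept consistent? A review of broadcasting and the BN layer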

1.AlexNet

1.1 Model overview

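The model overview is shown in the figure. AlexNet can be regarded as an enhanced LeNet. Its improvements over LeNet, marked in the figure, mainly include dropout, ReLU in place of sigmoid, and max pooling.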
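Two of the changes listed above can be checked numerically. A minimal sketch (the sample input values are arbitrary choices for illustration) uses autograd to compare the gradients of sigmoid and ReLU, and shows that `nn.Dropout` only zeroes and rescales activations in training mode:

```python
import torch
from torch import nn

# Gradients of sigmoid vs ReLU at a few sample inputs (values chosen for illustration)
x = torch.tensor([-5.0, -1.0, 0.5, 5.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
sig_grad = x.grad.clone()   # sigmoid'(x) = s(x) * (1 - s(x)), never above 0.25
x.grad = None               # reset before the second backward pass
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()  # 1 for x > 0, 0 for x < 0
print('sigmoid grad:', sig_grad)
print('relu grad:   ', relu_grad)

# Dropout is only active in training mode
drop = nn.Dropout(p=0.5)
h = torch.ones(1000)
drop.train()
h_train = drop(h)           # kept entries are scaled by 1 / (1 - p) = 2
drop.eval()
h_eval = drop(h)            # identity at evaluation time
print('train values:', torch.unique(h_train).tolist())
print('eval equals input:', torch.equal(h_eval, h))
```

Sigmoid's gradient is at most 0.25 and almost vanishes for large |x|, while ReLU passes a gradient of 1 for every positive input, which is one commonly cited reason it makes deeper networks easier to train.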

1.2 Network implementation

The code:

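import torch
from torch import nn

net = nn.Sequential(
    # Use a larger 11x11 window to capture objects.
    # A stride of 4 reduces the output height and width.
    # The number of output channels is much larger than in LeNet.
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Shrink the convolution window, use padding 2 so the output keeps the
    # input's height and width, and increase the number of output channels
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Three consecutive convolutional layers with a smaller window.
    # Except for the final convolutional layer, the number of output channels is further increased.
    # No pooling layer follows the first two, so they do not reduce height or width
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # The fully connected layers have several times more outputs than in LeNet.
    # Dropout layers mitigate overfitting
    nn.Linear(6400, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    # Output layer: Fashion-MNIST has 10 classes, not the 1000 used in the paper
    nn.Linear(4096, 10))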

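1. A larger 11×11 window is used to capture objects. A stride of 4 reduces the output height and width. The number of output channels (96) is also much larger than in LeNet.

2. The convolution window is shrunk, padding 2 keeps the output height and width equal to the input's, and the number of output channels is increased.

3. Three consecutive convolutional layers with a smaller window follow. Except for the final one, they increase the number of output channels further. No pooling layer follows the first two of them, so they do not reduce the height and width.

4. The fully connected layers have several times more outputs than in LeNet. Dropout layers are used to mitigate overfitting.

5. Finally comes the output layer. Since Fashion-MNIST is used here, the number of classes is 10 rather than the 1000 in the paper.

6. Why are there two identical 4096-unit layers at the end? Because the preceding convolutions do not extract features well or deeply enough, two large dense layers are used to compensate; removing one of them makes the results worse.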
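Item 6 is the course's explanation, and a parameter count cannot confirm it, but counting parameters does show how much those two 4096-wide layers cost. A sketch that rebuilds the same layer stack and sums `numel()` per layer type (the helper `count` is a hypothetical name introduced here):

```python
import torch
from torch import nn

# Same layer stack as the AlexNet variant above (1 input channel, 10 classes)
net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10))

def count(layer_type):
    """Total number of weights and biases in all layers of the given type."""
    return sum(p.numel() for m in net if isinstance(m, layer_type)
               for p in m.parameters())

conv_params = count(nn.Conv2d)
fc_params = count(nn.Linear)
print(f'conv: {conv_params:,}  linear: {fc_params:,}  '
      f'linear share: {fc_params / (conv_params + fc_params):.1%}')
```

With this configuration the linear layers hold about 92% of all parameters (the first one alone takes 6400 × 4096 weights), so most of the model's size sits in the dense head rather than in the convolutions.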

1.3 Per-layer output shapes

X = torch.randn(1, 1, 224, 224)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t',X.shape)

'''
Conv2d output shape:	 torch.Size([1, 96, 54, 54])
ReLU output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d output shape:	 torch.Size([1, 96, 26, 26])
Conv2d output shape:	 torch.Size([1, 256, 26, 26])
ReLU output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d output shape:	 torch.Size([1, 256, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 256, 12, 12])
ReLU output shape:	 torch.Size([1, 256, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 256, 5, 5])
Flatten output shape:	 torch.Size([1, 6400])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1, 10])
'''
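The sizes in this printout follow from the output-size formula for convolution and pooling layers (dilation 1, and PyTorch's default `ceil_mode=False` for `MaxPool2d`): floor((n - k + 2p) / s) + 1. A small pure-Python sketch, where `conv_out` is a hypothetical helper name, reproduces the spatial sizes and the 6400 input features of the first `Linear` layer:

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer: floor((n - k + 2p) / s) + 1."""
    return (n - kernel + 2 * padding) // stride + 1

# (name, kernel, stride, padding) of each conv/pool layer in the AlexNet variant above.
# ReLU, Flatten and Dropout are omitted because they do not change the spatial size.
layers = [
    ('Conv2d 11x11/4, p=1', 11, 4, 1),
    ('MaxPool2d 3x3/2', 3, 2, 0),
    ('Conv2d 5x5, p=2', 5, 1, 2),
    ('MaxPool2d 3x3/2', 3, 2, 0),
    ('Conv2d 3x3, p=1', 3, 1, 1),
    ('Conv2d 3x3, p=1', 3, 1, 1),
    ('Conv2d 3x3, p=1', 3, 1, 1),
    ('MaxPool2d 3x3/2', 3, 2, 0),
]

n = 224  # input height and width
sizes = []
for name, k, s, p in layers:
    n = conv_out(n, k, s, p)
    sizes.append(n)
    print(f'{name:22s} -> {n}x{n}')
print('flattened features:', 256 * n * n)  # 256 channels after the last conv
```

For example, the first layer gives (224 - 11 + 2) // 4 + 1 = 53 + 1 = 54, matching the 54×54 in the printout.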

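1.4 Training

This uses the improved, data-only version from the previous section. I packed everything into one block so it can simply be copied for later use.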

import torch
import torchvision
from torch import nn
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l


def load_data_fashion_mnist_nw2(batch_size, resize=None):
    """Download the Fashion-MNIST dataset and load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=2),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=2))

def train_ch6_data(net, train_iter, test_iter, num_epochs, lr, device):
    """Train a model on a GPU (defined in Chapter 6)."""

    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)

    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    # animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
    #                         legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, number of examples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
              f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')
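The line `metric.add(l * X.shape[0], ...)` multiplies the loss by the batch size because `nn.CrossEntropyLoss` returns the mean over the batch by default. Multiplying recovers each batch's total loss, so dividing the running sum by the example count gives the exact per-example average even when the last batch is smaller. A sketch with random logits (the batch sizes 128 and 32 are illustrative assumptions) compares that weighted average against the per-example mean computed in one pass:

```python
import torch
from torch import nn

torch.manual_seed(0)
loss = nn.CrossEntropyLoss()  # default reduction='mean'
# A full batch followed by a smaller final batch, with random logits and labels
batches = [(torch.randn(128, 10), torch.randint(0, 10, (128,))),
           (torch.randn(32, 10), torch.randint(0, 10, (32,)))]

loss_sum, n = 0.0, 0
batch_means = []
for y_hat, y in batches:
    l = loss(y_hat, y)                 # mean loss over this batch
    batch_means.append(float(l))
    loss_sum += float(l) * y.shape[0]  # undo the mean: total loss of the batch
    n += y.shape[0]
weighted = loss_sum / n

# Reference: mean of the per-example losses over all 160 examples
all_logits = torch.cat([b[0] for b in batches])
all_labels = torch.cat([b[1] for b in batches])
exact = float(nn.CrossEntropyLoss(reduction='none')(all_logits, all_labels).mean())
naive = sum(batch_means) / len(batch_means)  # ignores the unequal batch sizes
print(f'weighted {weighted:.6f}  exact {exact:.6f}  naive {naive:.6f}')
```

The weighted value matches the exact mean up to float rounding, while simply averaging the batch means would over-weight the small final batch.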

Hyperparameters and dataset loading:

batch_size = 128
train_iter, test_iter = load_data_fashion_mnist_nw2(batch_size, resize=224)

Training:

lr, num_epochs = 0.01, 10
train_ch6_data(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

'''
training on cuda:0
loss 1.336, train acc 0.500, test acc 0.744
loss 0.659, train acc 0.752, test acc 0.791
loss 0.542, train acc 0.798, test acc 0.808
loss 0.476, train acc 0.824, test acc 0.841
loss 0.436, train acc 0.840, test acc 0.851
loss 0.403, train acc 0.853, test acc 0.860
loss 0.381, train acc 0.861, test acc 0.862
loss 0.361, train acc 0.867, test acc 0.870
loss 0.344, train acc 0.875, test acc 0.876
loss 0.335, train acc 0.877, test acc 0.880
1200.0 examples/sec on cuda:0
'''

2.VGG

2.1 Model overview

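Why VGG? Because AlexNet did not provide a general template to guide later researchers in designing new networks.

Its distinguishing features compared with AlexNet are marked in the figure: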

2.2 Network implementation

First, define the VGG block, which consists of n convolutional layers followed by one max-pooling layer:

def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
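To see the block's shape behavior, and why `in_channels = out_channels` is reassigned inside the loop, here is a sketch that redefines the same `vgg_block` and runs it on a random input. The 3-channel 32×32 input and the 64 output channels are arbitrary choices for illustration:

```python
import torch
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    # Same definition as above: num_convs 3x3 convs (padding 1) + a 2x2 max pool
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        # After the first conv, every later conv receives out_channels as input
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

blk = vgg_block(2, 3, 64)
X = torch.randn(1, 3, 32, 32)
Y = blk(X)
conv_in = [m.in_channels for m in blk if isinstance(m, nn.Conv2d)]
print(conv_in, tuple(Y.shape))
```

The 3×3 convolutions with padding 1 leave height and width unchanged, and the 2×2 pool with stride 2 halves them; only the first convolution sees the original 3 input channels, the second sees 64.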

Define the number of VGG blocks and their corresponding input and output channels, where