AlexNet Paper Reading Notes (arwin, CSDN blog)

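AlexNet is a milestone in deep learning. Its success rests on using the ReLU activation function to mitigate vanishing gradients, applying Dropout to prevent overfitting, and using data augmentation to improve generalization. The network has five convolutional layers and three fully connected layers, and its results on the ImageNet dataset were markedly better than those of earlier methods. An efficient GPU implementation also sped up training.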


AlexNet Paper Reading Notes


ImageNet Classification with Deep Convolutional Neural Networks

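Vocabulary

Less important

high-resolution: having a high level of image detail

fully segmented images: images in which every pixel is assigned to a region or object

compensate for: to make up for

Important

better than the previous state-of-the-art: outperforming the best previously published result

convolutional neural network: a neural network built from convolutional layers

five convolutional layers: the network in the paper has five convolutional layers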

The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.

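regularization method: a technique for reducing overfitting

prior knowledge: assumptions built into a model before it sees the data

Local response normalization (LRN): a normalization applied across neighboring channels

not overlap: pooling windows that do not overlap (the paper itself uses overlapping pooling)

epoch: one full pass over the training set

pixel: the smallest unit of a digital image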

Reference: 基于卷积的图像分类识别（一）：AlexNet (Convolution-Based Image Classification (1): AlexNet), 图灵猫-Arwin's blog, CSDN

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used nonsaturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
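The top-1 and top-5 error rates quoted in the abstract can be made concrete with a small sketch. The scores and labels below are made-up illustrative values, and `top_k_error` is a hypothetical helper name rather than anything from the paper. A prediction counts as correct under top-k when the true label is among the k highest-scoring classes.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest scores."""
    wrong = 0
    for row, label in zip(scores, labels):
        # class indices sorted by descending score
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(labels)


# made-up scores for 4 samples over 6 classes (illustrative only)
scores = [
    [0.1, 0.5, 0.2, 0.05, 0.1, 0.05],    # true label 1: top-1 hit
    [0.3, 0.1, 0.25, 0.2, 0.1, 0.05],    # true label 2: top-1 miss, top-5 hit
    [0.05, 0.1, 0.15, 0.2, 0.22, 0.28],  # true label 0: miss even for k=5
    [0.4, 0.3, 0.1, 0.1, 0.05, 0.05],    # true label 0: top-1 hit
]
labels = [1, 2, 0, 0]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(top1, top5)
```

Top-5 error is never larger than top-1 error, which is why the paper's 17.0% top-5 figure sits well below its 37.5% top-1 figure.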

AlexNet Theory

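AlexNet's success comes down to several design choices:

A non-saturating nonlinear activation function: ReLU
Random deactivation of neurons: Dropout
Data augmentation
Others: a multi-GPU implementation and local response normalization (LRN) layers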

1. Activation function: ReLU

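Traditional neural networks commonly used saturating nonlinearities such as sigmoid or tanh as the activation function, and these are prone to vanishing gradients. Take sigmoid as an example: when the input is very large or very small, the output barely changes, so the gradient at those neurons is close to 0 (gradient saturation). Backpropagation multiplies these local gradients together layer by layer, so factors close to 0 shrink the product more and more as depth grows, and the gradient that reaches the early layers vanishes.

ReLU does not have this defect for positive inputs: in the first quadrant it is exactly y = x, so its gradient there is 1 and does not shrink. ReLU remains one of the most widely used activation functions in both academia and industry, across many fields and models.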


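The design of ReLU also has a loose biological analogy. Recall a junior high school biology experiment in which biologists stimulate a frog's thigh muscle with an electric current. When the current is too weak, the muscle does not respond (like ReLU outputting 0 for x<0). Once the current reaches a threshold, the muscle starts to twitch, and the stronger the current, the stronger the twitch (like ReLU outputting y=x for x>0). In essence, this is a nonlinear response.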
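The saturation argument above can be checked numerically. The following is a toy sketch, not the paper's code: it ignores weights and simply multiplies the same local activation gradient `depth` times, and the helper names (`sigmoid_grad`, `relu_grad`, `chained_grad`) are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def chained_grad(local_grad, x, depth):
    """Product of `depth` identical local gradients, a toy stand-in for backprop."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad(x)
    return g


# local gradients at a small and a large input
for x in (0.0, 5.0):
    print(x, sigmoid_grad(x), relu_grad(x))

print(chained_grad(sigmoid_grad, 0.0, 10))  # 0.25 ** 10, roughly 1e-6
print(chained_grad(relu_grad, 2.0, 10))     # stays 1.0 for positive inputs
```

Even in sigmoid's best case (x = 0) the product collapses after ten layers, while the ReLU product stays at 1 as long as the inputs are positive.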

2. Random deactivation: Dropout

Dropout is introduced mainly to prevent overfitting during training. Overfitting has two main causes: 1. the dataset is too small; 2. the model is too complex.

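When the dataset is too small, the model has little incentive to learn the underlying patterns or to extract general features. The easiest route is to memorize every training example, and that is overfitting. An overfitted model cannot generalize: it handles the training data well, but its performance drops sharply on a new batch of similar data.
How can this be fixed? There are two options. 1. Enlarge the dataset so that the model can no longer memorize everything; learning the underlying patterns then becomes the easier solution. 2. Simplify the model. Consider an analogy: a final-year high school student may choose to memorize the answers precisely because they are capable enough to do so, whereas a primary school pupil most likely would not think of it. Models behave the same way: a model memorizes the data partly because it is complex enough to do so, and reducing its complexity makes overfitting less likely. In short, overfitting is a mismatch in complexity between the dataset and the model.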

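In a neural network, Dropout counters overfitting by reducing the effective model complexity. For a given layer, each neuron's output is multiplied by 0 with a certain probability. A dropped neuron takes no part in forward or backward propagation for that iteration, as if it had been removed from the network, while the number of neurons in the input and output layers stays the same. The parameters are then updated as usual. In the next iteration, a new random set of neurons is dropped (set to 0), and this repeats until training ends.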
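The mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's implementation: it uses the "inverted" variant, which is also what PyTorch's `nn.Dropout` does, where surviving activations are scaled by 1 / (1 - p) during training so nothing needs to change at inference time. The original paper instead multiplies the outputs by 0.5 at test time. The function name and the `rng` parameter are assumptions made for this example.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations.

    Each value is zeroed with probability p; survivors are scaled by 1 / (1 - p)
    so the expected activation is unchanged. At inference time nothing is dropped.
    """
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 8

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)

print(train_out)  # a mix of 0.0 and 2.0; the mask changes on every call
print(eval_out)   # unchanged: all 1.0
```

Calling `dropout` again in training mode draws a fresh mask, which matches the "new random set of neurons in each iteration" behavior described above.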

3. Data augmentation

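Neural networks are data-driven, and one common view is that they are "fed" into shape by data: providing more training data can effectively improve accuracy because it helps avoid overfitting, which in turn allows larger and deeper networks. When training data is limited, new samples can be generated from the existing training set through transformations, which quickly enlarges it.
The simplest and most general image transformations are horizontal flips, random crops and translations of the original image, and color and lighting changes.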

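Data augmentation is an effective way to improve a model. Recent approaches go beyond random cropping; for example, generative adversarial networks can synthesize images for augmentation.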
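Two of the transformations mentioned above, random cropping and horizontal flipping, are simple enough to sketch on a toy image stored as a nested list. The helper names and the 4x4 image are made up for illustration; in a real PyTorch pipeline one would more likely reach for `transforms.RandomResizedCrop` and `transforms.RandomHorizontalFlip` from torchvision.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, size, rng=random):
    """Cut a size x size patch at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, size, rng=random):
    patch = random_crop(image, size, rng)
    if rng.random() < 0.5:                # flip about half of the time
        patch = horizontal_flip(patch)
    return patch


# a 4x4 single-channel toy "image" with distinct pixel values
image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)
for _ in range(3):
    print(augment(image, size=3, rng=rng))
```

Each call can return a different patch of the same source image, which is how a fixed dataset yields many distinct training samples.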

Code Reproduction

train_sample.py

This script trains the model.

import os
import sys
import json
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm  # progress bar
from classic_models.alexnet import AlexNet  # AlexNet lives in classic_models/alexnet.py and must be imported before use
# from classic_models.googlenet_v1 import GoogLeNet


def main():
    # pick the available device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # conditional expression: "cuda:0" if a GPU is available, otherwise "cpu"
    print("using {} device.".format(device))

    # dataset path
    data_path = "D://BaiduNetdiskDownload//flower"
    assert os.path.exists(data_path), "{} path does not exist.".format(data_path)

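    # data preprocessing and augmentation
    """
    ToTensor() converts pixel values from the 0-255 range to a tensor in the range 0-1.
    transforms.Normalize() then maps 0-1 to [-1, 1]. For each channel it computes image = (image - mean) / std,
    with mean and std both given as (0.5, 0.5, 0.5). The minimum 0 becomes (0 - 0.5) / 0.5 = -1 and the maximum 1 becomes (1 - 0.5) / 0.5 = 1.
    This is a linear rescaling to [-1, 1]; it does not turn the data into a zero-mean, unit-variance normal distribution. Inputs roughly centered on 0 usually help the network converge faster.
    """
    data_transform = {
        "train": transforms.Compose([transforms.Resize(224),  # resize the shorter side to 224; AlexNet's fully connected layers require a fixed input size
                                     transforms.CenterCrop(224),  # center-crop to 224x224 (deterministic, so this is not random augmentation)
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),  # one mean/std value per RGB channel

        # "val" is the validation set
        "val": transforms.Compose([transforms.Resize((224, 224)),  # validation needs no augmentation
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}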


    # Load the images with ImageFolder and apply the given preprocessing. Each item is an (image, class_index) tuple.
    train_dataset = datasets.ImageFolder(root=os.path.join(data_path, "train"), transform=data_transform["train"])
    validate_dataset = datasets.ImageFolder(root=os.path.join(data_path, "val"), transform=data_transform["val"])
    train_num = len(train_dataset)
    val_num = len(validate_dataset)

    # class_to_idx assigns an index to each class, used as the training label: {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    # build a dict mapping index -> class name; it is used at inference time
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write the dict to a JSON file
    json_str = json.dumps(cla_dict, indent=4)
    with open(os.path.join(data_path, 'class_indices.json'), 'w') as json_file:
        json_file.write(json_str)

    batch_size = 64  # hyperparameter; try a smaller value if the model does not fit in memory

    # DataLoader batches the ImageFolder dataset; with batch_size=64, every 64 images form one batch
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=4, shuffle=False)  # the validation set does not need shuffling
    print("using {} images for training, {} images for validation.".format(train_num, val_num))
    
    # instantiate the model and move it to the device
    # net = GoogLeNet(num_classes=5)
    net = AlexNet(num_classes=5)
    net.to(device)

    # loss function, optimizer for updating the parameters, number of training epochs, and where to save the weights
    loss_function = nn.CrossEntropyLoss()  # cross-entropy loss for multi-class classification (not MSE)
    optimizer = optim.Adam(net.parameters(), lr=0.0002)  # Adam optimizer; lr is the learning rate
    epochs = 70  # number of training epochs
    save_path = os.path.abspath(os.path.join(os.getcwd(), './results/weights/alexnet'))  # os.getcwd() returns the current working directory
    os.makedirs(save_path, exist_ok=True)

    best_acc = 0.0  # best validation accuracy so far, used to select the best weights
    for epoch in range(epochs):
        ############################################################## train ######################################################
        net.train()
        acc_num = torch.zeros(1).to(device)    # number of correct training predictions so far
        sample_num = 0                         # number of samples seen so far in this epoch
        # tqdm prints a progress bar in the terminal; it is only for visualization
        train_bar = tqdm(train_loader, file=sys.stdout, ncols=100)
        for data in train_bar:
            images, labels = data
            sample_num += images.shape[0]  # images: [64, 3, 224, 224]; dim 0 is the batch size, dim 1 the channels, dims 2 and 3 the height and width
            optimizer.zero_grad()  # reset the gradients: PyTorch accumulates them across backward() calls
            # clearing them keeps earlier batches from steering this update, like a hiker who re-checks the downhill direction every few steps
            outputs = net(images.to(device))  # output shape: [batch_size, num_classes]
            pred_class = torch.max(outputs, dim=1)[1]  # torch.max returns a tuple: (max values, indices of the max values)
            acc_num += torch.eq(pred_class, labels.to(device)).sum()  # count the correct predictions in this batch and add them to the running total
            loss = loss_function(outputs, labels.to(device))  # compute the loss
            loss.backward()  # backpropagate
            optimizer.step()  # update the parameters

            # print statistics
            train_acc = acc_num.item() / sample_num
            # desc is the tqdm attribute that holds the description text
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1, epochs, loss.item())

        # validate after each full training epoch
        # evaluation must not change the model, just as an exam tests knowledge without changing it
        net.eval()  # switch layers such as Dropout to inference behavior
        acc_num = 0.0  # accumulate accurate number per epoch
        # torch.no_grad() is a context manager that disables gradient tracking
        with torch.no_grad():
            for val_data in validate_loader:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))  # shape [4, 5]: one raw score (logit) per class for each of the up to 4 images in a validation batch
                predict_y = torch.max(outputs, dim=1)[1]  # torch.max(outputs, dim=1) returns (values, indices); [1] selects the indices
                acc_num += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc_num / val_num
        # note: train_loss here is the loss of the last training batch, not the epoch average
        print('[epoch %d] train_loss: %.3f  train_acc: %.3f  val_accuracy: %.3f' % (epoch + 1, loss.item(), train_acc, val_accurate))
        # if this is the best validation accuracy so far, overwrite the saved weights
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), os.path.join(save_path, "AlexNet.pth"))

        # reset the metrics after each epoch
        train_acc = 0.0
        val_accurate = 0.0

    print('Finished Training')

 
if __name__ == '__main__':
    main()
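As a quick check of the preprocessing arithmetic used in the training script, the sketch below reproduces the two steps by hand in plain Python. The function names are made up for this example and are not torchvision APIs; they only mirror what `ToTensor()` and `Normalize(mean=0.5, std=0.5)` do to a single 8-bit pixel value.

```python
def to_unit_range(pixel):
    """What ToTensor does to an 8-bit pixel value: 0..255 -> 0.0..1.0."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Per-channel Normalize: (value - mean) / std."""
    return (value - mean) / std


# darkest, a dark-ish, and the brightest pixel value
for pixel in (0, 51, 255):
    scaled = to_unit_range(pixel)
    print(pixel, scaled, normalize(scaled))
```

The endpoints land on -1 and 1, so the transform is a linear rescaling to [-1, 1] and does not change the shape of the pixel distribution.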

AlexNet.py

import torch.nn as nn
import torch
from torchsummary import summary

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
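        # two stages, features and classifier (the red and green boxes in the architecture diagram)
        self.features = nn.Sequential(
            # first argument: number of input channels
            # second argument: 96 kernels, which is also the number of output channels of this convolution
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[96, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[96, 27, 27]
            # AlexNet is a stack of layers, so each layer's output must match the next layer's input:
            # the 96 output channels above are the 96 input channels below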
            nn.Conv2d(96, 256, kernel_size=5, padding=2),           # output[256, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[256, 13, 13]

            nn.Conv2d(256, 384, kernel_size=3, padding=1),          # output[384, 13, 13]
            nn.ReLU(inplace=True),

            nn.Conv2d(384, 384, kernel_size=3, padding=1),          # output[384, 13, 13]
            nn.ReLU(inplace=True),

            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # output[256, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[256, 6, 6]
        )

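        # fully connected layers
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            # first argument of nn.Linear: input dimension 256*6*6 (number of feature maps times their size); second argument: output dimension
            nn.Linear(256 * 6 * 6, 4096),  # the last feature maps are 6x6 and there are 256 of them, i.e. output[256, 6, 6] above
            nn.ReLU(inplace=True),

            # the input size of the second fully connected layer is the output size of the first
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),

            # the output size of the last layer is the number of classes
            nn.Linear(4096, num_classes),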
        )
        if init_weights:
            self._initialize_weights()

    # forward runs features first and then classifier
    # forward is never called by name: net(x) invokes it through nn.Module.__call__
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

def alexnet(num_classes): 
    model = AlexNet(num_classes=num_classes)
    return model

# net = AlexNet(num_classes=1000)
# summary(net.to('cuda'), (3,224,224))
#########################################################################################################################################
# Total params: 62,378,344
# Trainable params: 62,378,344
# Non-trainable params: 0
# ----------------------------------------------------------------
# Input size (MB): 0.57
# Forward/backward pass size (MB): 11.09
# Params size (MB): 237.95
# Estimated Total Size (MB): 249.62
# ----------------------------------------------------------------
# conv_parameters:  3,747,200
# fnn_parameters:  58,631,144   about 94% of all parameters (fully connected layers)
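The shapes in the layer comments and the parameter totals in the summary can be re-derived with plain arithmetic. The sketch below assumes the layer configuration from the AlexNet class above with num_classes=1000, and uses the usual output-size formula floor((size + 2*padding - kernel) / stride) + 1. The helper names are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel):
    return c_in * c_out * kernel * kernel + c_out  # weights + biases


def linear_params(n_in, n_out):
    return n_in * n_out + n_out                    # weights + biases


# (c_in, c_out, kernel, stride, padding, followed by a 3x3 stride-2 max pool)
conv_layers = [
    (3, 96, 11, 4, 2, True),
    (96, 256, 5, 1, 2, True),
    (256, 384, 3, 1, 1, False),
    (384, 384, 3, 1, 1, False),
    (384, 256, 3, 1, 1, True),
]

size, conv_total = 224, 0
for c_in, c_out, k, s, p, pool in conv_layers:
    size = conv_out(size, k, s, p)
    if pool:
        size = conv_out(size, 3, 2)
    conv_total += conv_params(c_in, c_out, k)
    print(c_out, size)  # channels and spatial size after this stage

fc_total = (linear_params(256 * size * size, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 1000))
print(conv_total, fc_total, conv_total + fc_total)
```

The totals agree with the torchsummary output listed above: the three fully connected layers hold the large majority of the roughly 62 million parameters.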

test.py

This script runs inference on a single test image.

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from classic_models.alexnet import AlexNet 

def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image: a test sample picked by hand
    img_path = "F:/python/Deep-Learning-Image-Classification-Models-Based-CNN-or-Attention-main/dataset/th.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path).convert("RGB")  # force 3 channels to match the 3-channel Normalize

    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = "D://BaiduNetdiskDownload//flower//class_indices.json"
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:  # the with block closes the file afterwards
        class_indict = json.load(json_file)

    # create model
    model = AlexNet(num_classes=5).to(device)

    # load model weights produced by training
    weights_path = "results/weights/alexnet/AlexNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))  # map_location lets GPU-trained weights load on a CPU-only machine

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
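The last step of test.py turns the model's raw outputs into class probabilities with softmax and picks the largest one. A minimal pure-Python sketch of that step follows. The logits are made-up values, and the class names are taken from the class_to_idx comment in the training script.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# made-up logits for the five flower classes (illustrative only)
class_names = ["daisy", "dandelion", "roses", "sunflower", "tulips"]
logits = [1.2, 0.3, 4.0, -0.5, 2.1]

probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
print(class_names[best], round(probs[best], 3))
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether argmax is taken before or after it; the probabilities only matter for reporting confidence.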