[Paper Reading] AlexNet: ImageNet Classification with Deep Convolutional Neural Networks

AlexNet is an 8-layer convolutional neural network that was successfully applied to image recognition in the ILSVRC competition, handling the 1000-class task on the ImageNet dataset. The network consists of 5 convolutional layers and 3 fully connected layers, and uses ReLU activations, local response normalization (LRN), and pooling layers. Notably, the design also incorporates dropout to prevent overfitting.

1. Introduction

AlexNet is a convolutional neural network for image recognition that was applied in the ILSVRC competition. It was trained on the ImageNet dataset and recognizes a total of 1000 categories.

2. Network Architecture

The overall network architecture is shown in the figure below. There are 8 layers in total: the first five are convolutional layers and the last three are fully connected layers. The final fully connected layer outputs a 1000-way classification after softmax.
(Figure: AlexNet network architecture)
(1) Input image size: 224×224×3 (the paper states 224×224, but the arithmetic below only works out with an effective input of 227×227×3, a well-known discrepancy in the paper)
(2) First convolutional layer: convolution → ReLU → local response normalization (LRN) → pooling

  • Kernels: 96 kernels of size 11×11×3, stride = 4
  • Output: by the convolution output-size formula, (input_size + 2 × padding − kernel_size) / stride + 1 = (227 + 2×0 − 11)/4 + 1 = 55, so the output is 55×55×96; split across the two GPUs, each group is 55×55×48
  • Pooling: max pooling with a 3×3 window and stride = 2 gives (55 − 3)/2 + 1 = 27, i.e. each group's output is 27×27×48
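The output-size formula used throughout this section can be captured in a small helper (a sketch; the function name `conv_out_size` is our own):

```python
def conv_out_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution or pooling layer:
    (input_size + 2*padding - kernel_size) // stride + 1
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# First convolutional layer: 227x227 input, 11x11 kernel, stride 4
conv1 = conv_out_size(227, kernel_size=11, stride=4)
# Overlapping max pooling: 3x3 window, stride 2
pool1 = conv_out_size(conv1, kernel_size=3, stride=2)
print(conv1, pool1)  # 55 27
```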

(3) Second convolutional layer: convolution → ReLU → local response normalization (LRN) → pooling

  • Kernels: each group uses 128 kernels of size 5×5×48, padding = 2, stride = 1; by the formula, (27 + 2×2 − 5)/1 + 1 = 27, so each group's output is 27×27×128
  • Pooling: max pooling with a 3×3 window and stride = 2 gives (27 + 2×0 − 3)/2 + 1 = 13, so each group's output is 13×13×128

(4) Third convolutional layer: convolution → ReLU

  • The input is 13×13×256 (the two GPUs communicate at this layer); 384 kernels of size 3×3×256 are applied with padding = 1, stride = 1; by the formula, (13 + 2×1 − 3)/1 + 1 = 13, so the output is 13×13×384

(5) Fourth convolutional layer: convolution → ReLU
Each of the two groups takes a 13×13×192 input and applies 192 kernels of size 3×3×192 with padding = 1, stride = 1; by the formula, (13 + 2×1 − 3)/1 + 1 = 13, so each group's feature-map output is 13×13×192

(6) Fifth convolutional layer: convolution → ReLU → pooling
Each of the two groups takes a 13×13×192 input and applies 128 kernels of size 3×3×192 with padding = 1, stride = 1; by the formula, (13 + 2×1 − 3)/1 + 1 = 13, so each group's feature-map output is 13×13×128

Max pooling is then applied with a 3×3 window and stride = 2. Note that this is overlapping pooling: the stride is smaller than the window size. By the formula, (13 + 2×0 − 3)/2 + 1 = 6, so each group's output is 6×6×128
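The five convolutional layers described above can be sketched in PyTorch as a single-GPU stack (the two GPU groups are merged, so the channel counts become 96/256/384/384/256; LRN is omitted for brevity):

```python
import torch
import torch.nn as nn

# Single-GPU sketch of AlexNet's five convolutional layers.
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),               # overlapping pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 227, 227)   # effective input resolution
print(features(x).shape)          # torch.Size([1, 256, 6, 6])
```

The final 6×6×256 feature map matches the input to the fully connected layers below.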

(7) Fully connected layer 1
The input is 6×6×256, convolved with 4096 kernels of size 6×6×256. Since each kernel has exactly the same size as the input, every kernel coefficient multiplies exactly one input value; by the formula, (6 + 2×0 − 6)/1 + 1 = 1, so the output is 1×1×4096. The layer therefore has 4096 neurons and is called a fully connected layer.

ReLU: the outputs of these 4096 neurons are passed through the ReLU activation function.

Dropout: connections to some neurons in the fully connected layer are randomly dropped; by deactivating neurons at random, dropout helps prevent overfitting. The 4096 neurons are also split evenly across the two GPUs.
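The equivalence claimed in (7) — a 6×6×256 convolution over a 6×6×256 input acting as a fully connected layer — can be checked directly (a sketch; copying the weights between the two layers is our own illustration):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(256, 4096, kernel_size=6)   # 4096 kernels of size 6x6x256
fc = nn.Linear(256 * 6 * 6, 4096)

# Give both layers identical weights and biases
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(4096, -1))
    fc.bias.copy_(conv.bias)

x = torch.randn(1, 256, 6, 6)
out_conv = conv(x).view(1, 4096)     # 1x1x4096 feature map, flattened
out_fc = fc(x.view(1, -1))           # same computation as a linear layer
print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True
```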

(8) Fully connected layer 2
The input is 4096 neurons and the output is also 4096 neurons, again split across the two GPUs.

(9) Output layer (fully connected layer 3)
The input is 4096 neurons and the output is 1000 neurons, corresponding to the 1000 classes.
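Layers (7)–(9) together form the classifier head. A single-GPU sketch in PyTorch (dropout probability 0.5, as in the paper; the exact ordering of ReLU and dropout is our own choice here):

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(p=0.5),  # FC1
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),         # FC2
    nn.Linear(4096, 1000),                                       # FC3 / output
)

x = torch.randn(1, 256 * 6 * 6)        # flattened 6x6x256 feature map
logits = classifier(x)
print(logits.shape)                    # torch.Size([1, 1000])
probs = torch.softmax(logits, dim=1)   # 1000-way class probabilities
```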

References

[1] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

### ImageNet Classification Using Deep Convolutional Neural Networks: Paper Implementation and Explanation

#### Overview of the Approach

The approach uses a deep convolutional neural network (ConvNet) to classify images from the ImageNet dataset. When an unseen image enters the system, it is forward-propagated through the ConvNet, producing a set of probabilities over the classes the input could belong to. These probabilities are computed from the weights optimized during training.

#### Training Process Insights

Training optimizes these weights so that the network categorizes data accurately. A sufficiently large training set improves generalization: after training, the model should still perform reliably on entirely novel inputs, assigning labels based on learned features rather than memorized instances.

#### Historical Context and Impact

The 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" marked a significant advance in computer vision. It introduced a deeper architecture than earlier models, along with techniques such as ReLU activation functions that accelerated learning considerably compared with the methods in common use before it.

#### Detailed Architecture Review

Surveys covering CNN developments up to around 2019 provide comprehensive reviews of the architectural improvements made since AlexNet's introduction in 2012. Such resources cover both specific design choices and the broader trends shaping modern visual recognition systems that handle datasets at ImageNet scale and beyond. As an illustration, the snippet below loads a pretrained ImageNet classifier:

```python
import torch
from torchvision import models

# Load a ResNet-50 model pretrained on ImageNet
model = models.resnet50(pretrained=True)
model.eval()  # evaluation mode: disables dropout and batch-norm updates

def predict_image(image_tensor):
    """Predict the class index for a single preprocessed image tensor (C, H, W)."""
    with torch.no_grad():
        outputs = model(image_tensor.unsqueeze(0))  # add batch dimension
        _, predicted_class = torch.max(outputs, 1)
    return predicted_class.item()
```