Abstract
- Trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images.
- Dataset: the ImageNet LSVRC-2010
- Test set: top-1 and top-5 error rates of 37.5% and 17.0%
- Network: 60 million parameters and 650,000 neurons
- five convolutional layers, some of which are followed by max-pooling layers
- three fully-connected layers with a final 1000-way softmax
- non-saturating neurons
- To reduce overfitting: dropout
Introduction
- labeled high-resolution images
- Applying CNNs to large-scale, high-resolution images is expensive.
- Current GPUs, paired with a highly-optimized implementation of 2D convolution, are powerful enough to train such networks.
Model modifications
- removing any convolutional layer (each of which contains no more than 1% of the model's parameters) resulted in inferior performance
Dataset
- ImageNet: 15 million images
- roughly 22,000 categories
- images collected with the Mechanical Turk crowd-sourcing tool
- The ILSVRC dataset:
- 1.2 million training images,
- 50,000 validation images
- 150,000 testing images
Architecture
- eight learned layers
- five convolutional and three fully-connected
ReLU Nonlinearity
f(x) = tanh(x)
f(x) = (1 + e^{-x})^{-1}
f(x) = max(0, x)
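As a quick illustration, here is a minimal NumPy sketch of the three activations above; only the ReLU is non-saturating (its output is not squashed into a bounded range), which is what speeds up gradient-based training.

```python
import numpy as np

def tanh(x):
    # saturating: output bounded in (-1, 1), gradient vanishes for large |x|
    return np.tanh(x)

def sigmoid(x):
    # saturating: output squashed into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # non-saturating: identity for x > 0, zero otherwise
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 11)
print(tanh(x), sigmoid(x), relu(x), sep="\n")
```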
Training on Multiple GPUs
Local Response Normalization
Overlapping Pooling
Overall Architecture
- eight layers with weights
- the first five are convolutional
- the remaining three are fully-connected.
- The final output is fed to a 1000-way softmax.
- The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.
- First convolutional layer: filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels
- The second layer takes the output of the first layer as input and filters it with 256 kernels of size 5×5×48.
- The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer.
- Fourth convolutional layer: 384 kernels of size 3×3×192
- Fifth convolutional layer: 256 kernels of size 3×3×192
- The fully-connected layers have 4096 neurons each (the full stack is sketched in code after this list).
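Below is a minimal single-GPU sketch of this eight-layer stack, written with PyTorch's nn module (an assumption; the paper used a custom two-GPU implementation, so kernel depths such as 5×5×48 describe one GPU's half of the channels and are merged here). The padding values are also assumptions, chosen so that a 224×224 input produces the 6×6×256 feature map that feeds the fully-connected layers.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            # conv1: 96 kernels of 11x11x3, stride 4 (padding=2 is an assumption)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),               # overlapping pooling
            # conv2: 256 kernels of 5x5 (5x5x48 per GPU half in the paper)
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # conv3-conv5: 3x3 kernels, no pooling or normalization in between
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                        # fed to a 1000-way softmax
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# one 224x224x3 image -> 1000 class scores
logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```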
Reducing Overfitting
Data Augmentation
- artificially enlarge the dataset
- altering the intensities of the RGB channels in training images (sketched below)
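A rough NumPy sketch of the RGB-intensity alteration: multiples of the principal components of the RGB pixel values are added, scaled by the corresponding eigenvalues times a Gaussian draw. The per-image PCA, function name, and value range are assumptions for brevity (the paper computes the principal components over the whole ImageNet training set).

```python
import numpy as np

def pca_color_augment(image: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """image: H x W x 3 float array with values in [0, 1]."""
    pixels = image.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)          # 3x3 covariance of the RGB channels
    eigvals, eigvecs = np.linalg.eigh(cov)      # principal components of RGB space
    alphas = np.random.normal(0.0, sigma, 3)    # drawn each time the image is used
    shift = eigvecs @ (alphas * eigvals)        # [p1 p2 p3] [a1*l1, a2*l2, a3*l3]^T
    return np.clip(image + shift, 0.0, 1.0)     # same shift added to every pixel

augmented = pca_color_augment(np.random.rand(224, 224, 3))
```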
Dropout
- Neurons that are “dropped out” in this way do not contribute to the forward pass and do not participate in backpropagation (see the sketch below).
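A minimal NumPy sketch of this behaviour, with a drop probability of 0.5 as in the paper (the function name is made up for illustration); at test time the paper uses all neurons and multiplies their outputs by 0.5.

```python
import numpy as np

def dropout(activations: np.ndarray, p: float = 0.5, train: bool = True) -> np.ndarray:
    if train:
        mask = np.random.rand(*activations.shape) >= p  # keep each unit with prob 1-p
        return activations * mask                        # dropped units output exactly 0
    return activations * (1.0 - p)                       # test-time scaling of all units

h = np.random.randn(4, 8)   # hypothetical hidden-layer activations
print(dropout(h, train=True))
```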
Details of Learning
- stochastic gradient descent
- a batch size of 128 examples
- momentum of 0.9,
- weight decay of 0.0005 (the resulting update rule is sketched below)
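These hyperparameters correspond to the paper's update rule v ← 0.9·v − 0.0005·ε·w − ε·∇w, w ← w + v. A small NumPy sketch follows, with lr = 0.01 taken as the initial learning rate from the paper and made-up tensors standing in for real weights and gradients.

```python
import numpy as np

def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # v <- 0.9*v - 0.0005*lr*w - lr*grad ; w <- w + v
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w = np.random.randn(10)          # weights
v = np.zeros_like(w)             # momentum buffer
grad = np.random.randn(10)       # would be averaged over a batch of 128 examples
w, v = sgd_step(w, v, grad)
```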
Results
- validation and test error rates are used interchangeably
Qualitative Evaluations
- Image similarity is computed as the Euclidean distance between feature vectors from the last hidden layer (see the sketch below).
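A sketch of that comparison, assuming the 4096-dimensional feature vectors have already been extracted from the last hidden layer; all arrays here are random placeholders.

```python
import numpy as np

def feature_distance(f1: np.ndarray, f2: np.ndarray) -> float:
    # small Euclidean distance between feature vectors => visually similar images
    return float(np.linalg.norm(f1 - f2))

query = np.random.randn(4096)                 # feature vector of a query image
candidates = np.random.randn(100, 4096)       # feature vectors of other images
nearest = np.argsort([feature_distance(query, c) for c in candidates])[:5]
```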
Discussion
Paper Summary
AlexNet builds a convolutional network from five convolutional layers and three fully-connected layers, adds local response normalization, and uses dropout and artificial dataset enlargement to prevent overfitting.
Top-1 and top-5 error rates
For each test image, take the five classes with the highest predicted probabilities and check whether the correct label is among them.
Top-5 accuracy = (number of test images whose correct label is among the five highest-probability predictions) / (total number of test images)
- Top-5 error rate = (number of test images whose correct label is not among the five highest-probability predictions) / (total number of test images)
- Top-1: the class with the highest probability in the output vector is taken as the prediction; the prediction is correct if this class matches the true label, and wrong otherwise (both metrics are computed in the sketch below).
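A NumPy sketch that computes both metrics from a matrix of predicted class probabilities, matching the definitions above; the array names and random inputs are placeholders.

```python
import numpy as np

def top_k_error(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """probs: N x C predicted class probabilities, labels: N true class indices."""
    top_k = np.argsort(probs, axis=1)[:, -k:]            # indices of the k largest probabilities
    correct = np.any(top_k == labels[:, None], axis=1)   # is the true label among the top k?
    return 1.0 - correct.mean()                          # error = 1 - accuracy

probs = np.random.rand(1000, 1000)                       # fake predictions for 1000 test images
labels = np.random.randint(0, 1000, size=1000)
print(top_k_error(probs, labels, 1), top_k_error(probs, labels, 5))
```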