Understanding the AlexNet Paper and Implementing It in TensorFlow

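This article walks through the AlexNet architecture, which consists of 5 convolutional layers and 3 fully connected layers, and discusses its breakthrough in the 2012 ImageNet competition. AlexNet's innovations include the ReLU activation function, training on multiple GPUs in parallel, and local response normalization. The article also shows how to implement AlexNet in TensorFlow and covers data augmentation and Dropout as strategies against overfitting.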

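1. Introduction

AlexNet is the network with which Alex Krizhevsky, a student of Hinton, won the ImageNet competition in 2012 (top-1 and top-5 error rates of 37.5% and 17.0% on ILSVRC-2010, and a top-5 error rate of 15.3% on ILSVRC-2012). Many other strong architectures followed in its wake. The network is described in the paper "ImageNet Classification with Deep Convolutional Neural Networks".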

2. Network Architecture

AlexNet
AlexNet consists of 8 learned layers: 5 convolutional layers and 3 fully connected layers.

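1. conv1
Step 1 (convolution):
Input: a $224\times224\times3$ image (in practice $227\times227\times3$ after preprocessing, which is the size the arithmetic below assumes)
Kernel: $11\times11\times3$
Stride: $4$
Number of kernels: $96$ (split across the two GPUs, $48$ each)
Output: $55\times55\times96\Rightarrow55=(227-11)/4+1$
Step 2 (ReLU):
ReLU activation function

Step 3 (pooling):
Max pooling, window: $3\times3$, stride: $2$
Output after pooling (downsampling): $27\times27\times96\Rightarrow27=(55-3)/2+1$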

2. conv2
Step 1 (convolution):
Input: $27\times27\times96$
Kernel: $5\times5\times96$
Stride: $1$
Padding: $2$
Number of kernels: $256$
Output: $27\times27\times256\Rightarrow27=(27+2\times2-5)/1+1$

Step 2 (ReLU):
ReLU activation function

Step 3 (pooling):
Max pooling, window: $3\times3$ (applied to each of the $256$ channels), stride: $2$
Output after pooling (downsampling): $13\times13\times256\Rightarrow13=(27-3)/2+1$

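3. conv3
conv3 is not followed by a downsampling (pooling) layer.
Kernel: $3\times3\times256$
Stride: $1$
Padding: $1$
Number of kernels: $384$
Output: $13\times13\times384\Rightarrow13=(13+2\times1-3)/1+1$

4. conv4
conv4 is not followed by a downsampling (pooling) layer.
Kernel: $3\times3\times384$
Stride: $1$
Padding: $1$
Number of kernels: $384$
Output: $13\times13\times384\Rightarrow13=(13+2\times1-3)/1+1$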

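5. conv5
Step 1 (convolution):
Kernel: $3\times3\times384$
Stride: $1$
Padding: $1$
Number of kernels: $256$
Output: $13\times13\times256\Rightarrow13=(13+2\times1-3)/1+1$

Step 2 (ReLU):
ReLU activation function

Step 3 (pooling):
Max pooling, window: $3\times3$ (applied to each of the $256$ channels), stride: $2$
Output after pooling: $6\times6\times256\Rightarrow6=(13-3)/2+1$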
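The size arithmetic used for the convolution and pooling stages can be checked mechanically. The sketch below is only an illustration: the helper names `out_size` and `trace_sizes` are made up here, and the padding values (2 for conv2, 1 for conv3 to conv5) are the ones implied by the formulas in this section. It applies `(n + 2*pad - kernel) // stride + 1` stage by stage, starting from a 227-pixel input.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling stage."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad) for each stage described in this section.
STAGES = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(n=227):
    """Return the spatial size after every stage, keyed by stage name."""
    sizes = {}
    for name, kernel, stride, pad in STAGES:
        n = out_size(n, kernel, stride, pad)
        sizes[name] = n
    return sizes

sizes = trace_sizes()
for name, n in sizes.items():
    print(f"{name}: {n} x {n}")
```

The trace ends at 6 x 6, which is why the first fully connected layer sees 6*6*256 inputs.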

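6. fc6
Step 1:
fc6 has 4096 neurons, each fully connected to all 256 feature maps of size $6\times6$. This can be viewed as a convolution with a $6\times6\times256$ kernel that collapses the feature maps into a single value: each of the 4096 outputs is the sum of the inputs multiplied by the corresponding weights, plus a bias.

Step 2:
Apply dropout: randomly discard the outputs of some of the 4096 nodes (set them to zero), which yields a new set of 4096 activations.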

7. fc7
Similar to fc6.
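To get a feel for how large these two fully connected layers are, the following sketch counts their weights and biases. The helper name `fc_params` is hypothetical, and the input size of fc6 is taken to be the $6\times6\times256$ output of the last pooling stage.

```python
def fc_params(n_in, n_out):
    """Number of trainable values in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

fc6_params = fc_params(6 * 6 * 256, 4096)  # 9216 inputs -> 4096 outputs
fc7_params = fc_params(4096, 4096)

print("fc6:", fc6_params)
print("fc7:", fc7_params)
```

Under these assumptions fc6 alone holds more than 37 million parameters, which helps explain why dropout is applied to exactly these layers.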

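8. fc8
fc8 has 1000 neurons, fully connected to the 4096 neurons of fc7. Its output is passed through a softmax, which produces 1000 floating-point values: the predicted class probabilities.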
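In the paper, the 1000 outputs of the last fully connected layer are fed to a 1000-way softmax. A minimal pure-Python sketch of that final step is shown below. The function name and the three-element input are illustrative only, and the subtraction of the maximum is a standard numerical-stability trick rather than something stated in the article.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three scores stand in for the 1000 outputs of fc8.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```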

3. Innovations in AlexNet

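1. ReLU as the activation function (ReLU Nonlinearity)
ReLU is a non-saturating function. In deeper networks it outperforms sigmoid and tanh, and it largely alleviates the vanishing-gradient problem that sigmoid-like functions suffer from as depth grows.
[Figure: training error of a four-layer convolutional network on CIFAR-10, ReLU versus tanh]
The figure compares ReLU and tanh in a four-layer convolutional network trained on CIFAR-10. The solid line is ReLU and the dashed line is tanh; the ReLU network reaches a 25% training error rate about six times faster.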
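The difference between a saturating and a non-saturating activation shows up directly in the derivatives. The sketch below uses illustrative function names and compares the gradient of ReLU with that of tanh at a few positive inputs: the tanh gradient shrinks toward zero as the input grows, while the ReLU gradient stays at 1.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which saturates for large |x|."""
    return 1.0 - math.tanh(x) ** 2

for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x)}, tanh'={tanh_grad(x):.6f}")
```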

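2. Training on Multiple GPUs
GPU resources were limited at the time (a single GTX 580 has only 3 GB of memory), so AlexNet was trained on two GTX 580 GPUs in parallel. Each GPU holds half of the neurons and the GPUs communicate only in certain layers, which also shortened training time.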

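3. Local Response Normalization
$b^{i}_{x,y}=a^{i}_{x,y}\Big/\Big(k+\alpha\sum_{j=\max(0,i-n/2)}^{\min(N-1,i+n/2)}\big(a^{j}_{x,y}\big)^{2}\Big)^{\beta}$
Parameters:
$b^{i}_{x,y}\rightarrow$ the response-normalized activity;
$a^{i}_{x,y}\rightarrow$ the activity obtained by applying kernel $i$ at position $(x, y)$ and then the ReLU nonlinearity;
$\sum_{j=\max(0,i-n/2)}^{\min(N-1,i+n/2)}\rightarrow$ a sum over the $n$ "adjacent" kernel maps at the same spatial position, where $N$ is the total number of kernels in the layer;
$k,n,\alpha,\beta\rightarrow$ hyperparameters, set in the paper to $k=2,n=5,\alpha=10^{-4},\beta=0.75$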
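As a concrete illustration, local response normalization divides each activation by $(k+\alpha\cdot s)^{\beta}$, where $s$ is the sum of squared activations over up to $n$ neighboring channels at the same spatial position. The sketch below works on the channel vector of a single position. The function name `lrn` is hypothetical, and treating $n/2$ as integer division is an assumption.

```python
def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at one spatial position.

    a: list of activations, one per kernel map (channel), after ReLU.
    Defaults follow the hyperparameters quoted from the paper.
    """
    N = len(a)
    out = []
    for i in range(N):
        lo = max(0, i - n // 2)          # n/2 treated as integer division (assumption)
        hi = min(N - 1, i + n // 2)
        s = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * s) ** beta)
    return out

normalized = lrn([1.0, 2.0, 3.0])
print(normalized)
```

Note that `tf.nn.lrn` takes a `depth_radius` (half the window) rather than the full window size $n$.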

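4. Overlapping Pooling
The paper uses a stride of $2$ with a $3\times3$ pooling window, so adjacent pooling windows overlap, which makes the model slightly harder to overfit. Compared with non-overlapping pooling (stride $2$, window $2\times2$), the top-1 and top-5 error rates fall by $0.4\%$ and $0.3\%$, respectively.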
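A one-dimensional toy example makes the overlap visible. The helper `max_pool_1d` and the sample row are made up for illustration. With a window of 3 and a stride of 2, neighboring windows share one element; with a window of 2 and a stride of 2, they share none.

```python
def max_pool_1d(xs, size, stride):
    """Max pooling over a 1-D list with no padding."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, stride)]

row = [9, 1, 2, 3, 8, 4, 5]

# Window 3, stride 2: windows are [9,1,2], [2,3,8], [8,4,5] (overlapping).
overlapping = max_pool_1d(row, size=3, stride=2)

# Window 2, stride 2: windows are [9,1], [2,3], [8,4] (non-overlapping).
non_overlapping = max_pool_1d(row, size=2, stride=2)

print(overlapping, non_overlapping)
```

Both variants produce the same number of outputs here, but in the overlapping case the value 8 contributes to two neighboring windows.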

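5. Reducing Overfitting
5.1. Data Augmentation

The first form of augmentation extracts random $224\times224$ patches (and their horizontal reflections) from the $256\times256$ images and trains the network on these patches, which enlarges the training set by a factor of 2048. This makes it possible to train a larger network without severe overfitting.
The second form alters the intensities of the RGB channels in the training images, based on PCA over the set of RGB pixel values, which further reduces overfitting.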
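The crop-and-flip augmentation can be sketched in a few lines of pure Python. Everything here is illustrative: the function name `random_crop_flip`, the dummy image built from coordinate tuples, and the fixed seed are assumptions, not part of the original article. The factor 2048 corresponds to $32\times32$ crop offsets times 2 for the horizontal reflection.

```python
import random

def random_crop_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch and flip it horizontally half of the time.

    image: list of rows, each row a list of pixels.
    """
    h, w = len(image), len(image[0])
    top = rng.randrange(h - crop + 1)
    left = rng.randrange(w - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch

# Dummy 256 x 256 "image" whose pixels record their own coordinates.
image = [[(x, y, 0) for x in range(256)] for y in range(256)]
patch = random_crop_flip(image, rng=random.Random(0))

# 32 x 32 offsets, times 2 for the reflection.
augmentation_factor = (256 - 224) ** 2 * 2
print(len(patch), len(patch[0]), augmentation_factor)
```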

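5.2. Dropout
"Dropout" sets the output of each hidden neuron to zero with probability 0.5, so a neuron cannot rely on the presence of particular other neurons. It is therefore forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. This reduces overfitting, but roughly doubles the number of iterations required to converge.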
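A minimal sketch of dropout as the paper describes it: during training each output is zeroed with probability 0.5, and at test time all neurons are kept but their outputs are multiplied by 0.5. The function names are hypothetical. Note that `tf.nn.dropout` uses the "inverted" variant instead, scaling the surviving outputs up during training so that nothing needs to change at test time.

```python
import random

def dropout_train(activations, p_drop=0.5, rng=random):
    """Training-time dropout: zero each activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """Test-time behaviour from the paper: keep every neuron, scale outputs by (1 - p_drop)."""
    return [a * (1.0 - p_drop) for a in activations]

rng = random.Random(0)
train_out = dropout_train([1.0] * 1000, rng=rng)
test_out = dropout_test([2.0, 4.0])
print(train_out.count(0.0), test_out)
```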

4. TensorFlow Implementation of AlexNet

Full implementation: "Finetuning AlexNet with Tensorflow"
Speed benchmark:

from datetime import datetime
import math
import time
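# NOTE: this script uses the TensorFlow 1.x graph API (tf.Session, tf.truncated_normal, ...).
# On TensorFlow 2.x, replace the import below with:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()
import tensorflow as tf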


# Set batch_size and num_batches
batch_size = 32
num_batches = 100

'''
Print the structure of a network layer
t: tensor
t.op.name: name of the op
t.get_shape().as_list(): shape
'''
def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())


def inference(images):
    parameters = []

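    '''
    1. with tf.name_scope('conv1') as scope automatically names the Variables created inside the scope conv1/xxx, which keeps the components of different conv layers apart
    2. kernel is initialized with tf.truncated_normal (truncated normal distribution, stddev 0.1); size 11x11, 3 input channels, 96 kernels
    3. tf.nn.conv2d convolves images with kernel, stride 4, padding SAME
    4. biases are initialized to 0
    5. tf.nn.bias_add adds biases to conv, and tf.nn.relu applies the nonlinearity
    6. print_activations prints the shape of conv1; kernel and biases are then appended to parameters
    '''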
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

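    '''
    1. tf.nn.lrn applies LRN to conv1 with depth_radius=4, bias=1, alpha=0.001/9, beta=0.75
        Note: LRN is rarely used nowadays (little benefit) and it slows down the forward and backward passes (overall speed drops to about 1/3)
    2. tf.nn.max_pool applies max pooling to lrn1, window 3x3, stride 2, padding VALID
    '''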
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool1')
    print_activations(pool1)


    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 96, 256],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256],
                                         dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)

    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool2')
    print_activations(pool2)


    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 384],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384],
                                         dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)


    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 384],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384],
                                         dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)

    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256],
                                         dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)

    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool5')
    print_activations(pool5)

    with tf.name_scope('fc6') as scope:
        kernel = tf.Variable(tf.truncated_normal([6*6*256, 4096],
                                                 dtype=tf.float32, stddev=0.1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[4096],
                                         dtype=tf.float32), trainable=True, name='biases')
        flat = tf.reshape(pool5, [-1, 6*6*256])
        fc = tf.nn.relu(tf.matmul(flat, kernel) + biases)
        fc6 = tf.nn.dropout(fc, keep_prob=0.5, name=scope)
        parameters += [kernel, biases]
        print_activations(fc6)

    with tf.name_scope('fc7') as scope:
        kernel = tf.Variable(tf.truncated_normal([4096, 4096],
                                                 dtype=tf.float32, stddev=0.1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[4096],
                                         dtype=tf.float32), trainable=True, name='biases')
        fc = tf.nn.relu(tf.matmul(fc6, kernel) + biases)
        fc7 = tf.nn.dropout(fc, keep_prob=0.5, name=scope)
        parameters += [kernel, biases]
        print_activations(fc7)

    with tf.name_scope('fc8') as scope:
        kernel = tf.Variable(tf.truncated_normal([4096, 1000],
                                                 dtype=tf.float32, stddev=0.1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[1000],
                                         dtype=tf.float32), trainable=True, name='biases')
        fc8 = tf.nn.xw_plus_b(fc7, kernel, biases, name=scope)
        parameters += [kernel, biases]
        print_activations(fc8)

    return fc8, parameters

def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %(datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch'%(datetime.now(), info_string, num_batches, mn, sd))

def main():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size,
                                               image_size,
                                               image_size, 3],
                                              dtype=tf.float32,
                                              stddev=0.1))
        fc8, parameters = inference(images)

        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)

        time_tensorflow_run(sess, fc8, "Forward")
        objective = tf.nn.l2_loss(fc8)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, 'Forward-backward')

if __name__ == '__main__':
    main()
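The benchmark reports a mean and a standard deviation computed from running sums, using the identity Var(x) = E[x²] - E[x]². The standalone sketch below repeats that calculation outside TensorFlow so it can be checked against the standard library. The function name `mean_and_std` and the clamp to zero are additions for illustration: with nearly constant timings, rounding can make the computed variance slightly negative, which would make `math.sqrt` raise an error.

```python
import math
import statistics

def mean_and_std(durations):
    """Mean and population standard deviation from running sums."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    # Clamp at zero to guard against a tiny negative value caused by rounding.
    variance = max(total_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)

mean, std = mean_and_std([0.1, 0.2, 0.3])
print(f"{mean:.3f} +/- {std:.3f} sec / batch")
```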