1. Preliminaries
① Convolution. A convolution kernel is really a volume: its depth defaults to the number of input channels. For example, a 64*64*3 input passed through 100 3*3 kernels (each actually 3*3*3) gives 64*64*100, and one further 3*3 kernel (actually 3*3*100) gives 64*64*1; with padding=1 and stride=1 the spatial size is unchanged. See the sketch after this list.
② Pooling (spatial downsampling).
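A minimal shape check of both operations, assuming TensorFlow 1.x (the NHWC layout and variable names here are just for illustration):

```python
import tensorflow as tf

images = tf.placeholder(tf.float32, [1, 64, 64, 3])            # NHWC input
kernels = tf.get_variable('k1', [3, 3, 3, 100])                # H x W x C_in x C_out
feat = tf.nn.conv2d(images, kernels, strides=[1, 1, 1, 1],
                    padding='SAME')                            # 1 x 64 x 64 x 100
kernel2 = tf.get_variable('k2', [3, 3, 100, 1])                # depth follows the input
feat2 = tf.nn.conv2d(feat, kernel2, strides=[1, 1, 1, 1],
                     padding='SAME')                           # 1 x 64 x 64 x 1
pooled = tf.nn.max_pool(feat, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='VALID')  # 1 x 32 x 32 x 100
print(feat.shape, feat2.shape, pooled.shape)
```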
2. AlexNet
① 5 convolutional layers + 3 fully connected layers (each of the 5 conv layers is followed by its own activation)
② ReLU nonlinearity
③ Max pooling (after layers 1, 2, and 5)
④ Dropout regularization (on Fc1 and Fc2): prevents overfitting and improves the model's generalization
⑤ LRN (local response normalization, used after layers 1 and 2)
Example: k, α, β are hyperparameters; take i=10, N=96 (total kernels/channels), n=5 (neighborhood size). Then the feature a produced by kernel i=10 (i.e., channel 10) at position (x, y) is normalized by the sum of squared feature values at that same position in channels 8, 9, 10, 11, and 12.
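For reference, this is the LRN formula from the AlexNet paper that the example instantiates, where $a^{i}_{x,y}$ is the activation of channel $i$ at position $(x, y)$ and the sum runs over the $n$ neighboring channels:

$$
b^{i}_{x,y} = a^{i}_{x,y} \Bigg/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^{j}_{x,y} \right)^{2} \right)^{\beta}
$$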
| Stage | Input | Kernels | Conv output | Activation | Pooling (3*3, s=2) | LRN |
|---|---|---|---|---|---|---|
| Conv1 | 224*224*3 | 48 kernels 11*11, s=4 (2 groups) | 55*55*48 (per group) | ReLU | 27*27*48 | yes |
| Conv2 | 27*27*48 | 128 kernels 5*5, s=1, p=2 | 27*27*128 (per group) | ReLU | 13*13*128 | yes |
| Conv3 (cross-group) | 13*13*256 | 192 kernels 3*3, s=1, p=1 | 13*13*192 (per group) | ReLU | | |
| Conv4 | 13*13*192 | 192 kernels 3*3, s=1, p=1 | 13*13*192 (per group) | ReLU | | |
| Conv5 | 13*13*192 | 128 kernels 3*3, s=1, p=1 | 13*13*128 (per group) | ReLU | 6*6*128 | |
| Fc1 | 6*6*128*2 | output: 4096 | | | | |
| Fc2 | 4096 | output: 4096 | | | | |
| Fc3 | 4096 | output: 1000 | | | | |
| SoftMax | 1000 | output: 1000 | | | | |
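A minimal TF 1.x sketch wiring items ②, ④, and ⑤ together (the two-GPU grouping is ignored here, so Conv1 uses all 96 kernels; tf.nn.lrn takes depth_radius = n/2, and the paper's values are k=2, n=5, α=1e-4, β=0.75):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
w = tf.get_variable('conv1_w', [11, 11, 3, 96])     # all 96 kernels; grouping ignored
conv1 = tf.nn.relu(tf.nn.conv2d(x, w, strides=[1, 4, 4, 1], padding='SAME'))
# LRN with the paper's hyperparameters: k=2, n=5 (depth_radius = n/2), alpha=1e-4, beta=0.75
norm1 = tf.nn.lrn(conv1, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75)
# Overlapping 3*3/s=2 max pooling
pool1 = tf.nn.max_pool(norm1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# Dropout on the fully connected layers (Fc1/Fc2); keep_prob=0.5 at training time
fc1 = tf.layers.dense(tf.layers.flatten(pool1), 4096, activation=tf.nn.relu)
fc1 = tf.nn.dropout(fc1, keep_prob=0.5)
```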
3. VGG (kernel decomposition: 5*5 -> two 3*3, 7*7 -> three 3*3)
① VGG's main idea is kernel decomposition: every large kernel is broken into stacks of 3*3 kernels, which keeps the receptive field while cutting parameters and adding nonlinearities (see the check after the table below).
| VGG16 | VGG19 |
|---|---|
| conv-conv-pool | conv-conv-pool |
| conv-conv-pool | conv-conv-pool |
| conv-conv-conv-pool | conv-conv-conv-conv-pool |
| conv-conv-conv-pool | conv-conv-conv-conv-pool |
| conv-conv-conv-pool | conv-conv-conv-conv-pool |
| fc-fc-fc | fc-fc-fc |
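A quick parameter/shape check of the decomposition, assuming C = 64 channels in and out: one 5*5 conv costs 25C² weights, while two stacked 3*3 convs cost 18C² and see the same 5*5 neighborhood.

```python
import tensorflow as tf

C = 64
x = tf.placeholder(tf.float32, [1, 56, 56, C])
# One 5x5 conv: 5*5*C*C = 25C^2 weights
w5 = tf.get_variable('w5', [5, 5, C, C])
y5 = tf.nn.conv2d(x, w5, strides=[1, 1, 1, 1], padding='SAME')
# Two stacked 3x3 convs: 2 * 3*3*C*C = 18C^2 weights, same 5x5 receptive field
w3a = tf.get_variable('w3a', [3, 3, C, C])
w3b = tf.get_variable('w3b', [3, 3, C, C])
y3 = tf.nn.conv2d(tf.nn.relu(tf.nn.conv2d(x, w3a, [1, 1, 1, 1], 'SAME')),
                  w3b, [1, 1, 1, 1], 'SAME')
print(y5.shape, y3.shape)   # both (1, 56, 56, 64)
```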
4. GoogLeNet
① Inception V1
Idea: multi-scale kernels that widen the network; 1*1 convolutions for dimensionality reduction; auxiliary classifiers; the fully connected layers are dropped from the main classifier.
Understanding: 9 Inception modules, 2 auxiliary classifiers, and 1 main classifier, placed after the 3rd, 6th, and 9th modules respectively; the main classifier has no fully connected layer; the preprocessing stem also uses LRN (local response normalization).
Note: before this I had only seen non-overlapping 2*2/s=2 max pooling, never overlapping 3*3/s=2 pooling, and the padding was unclear to me. There are two modes, SAME and VALID: when the size does not divide evenly, SAME zero-pads while VALID discards the remainder (SAME is used throughout here). The output size is ceil(n/s) for SAME and floor((n-f)/s)+1 for VALID. For example, pooling a 3*3 input with 2*2/s=2 gives 2*2 under SAME and 1*1 under VALID; with 2*2/s=1 it gives 3*3 under SAME and 2*2 under VALID.
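A quick check of those numbers in TF 1.x:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 3, 3, 1])
# SAME: output = ceil(n/s); VALID: output = floor((n-f)/s) + 1
p1 = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')    # 2x2
p2 = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')   # 1x1
p3 = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 1, 1, 1], 'SAME')    # 3x3
p4 = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 1, 1, 1], 'VALID')   # 2x2
print(p1.shape, p2.shape, p3.shape, p4.shape)
```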
Input image: 224*224*3 -> 64 kernels 7*7/s=2/p=3 -> 112*112*64 -> pool -> 56*56*64 -> LRN -> 64 kernels 1*1/s=1 (reduction) -> 56*56*64 -> 192 kernels 3*3/s=1/p=1 -> 56*56*192 -> pool -> 28*28*192 (end of the stem; the first Inception module takes over). Input: 28*28*192
64 1*1 convs -> 28*28*64
96 1*1 convs (reduction ahead of the 3*3) -> 28*28*96 -> 128 3*3 convs -> 28*28*128 (p=1)
16 1*1 convs (reduction ahead of the 5*5) -> 28*28*16 -> 32 5*5 convs -> 28*28*32 (p=2)
3*3/s=1 pooling -> 28*28*192 -> 32 1*1 convs -> 28*28*32 (stride 1 with SAME padding, so pooling keeps the spatial size)
Output: 28*28*256 (64 + 128 + 32 + 32, concatenated along the channel axis; see the sketch below)
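A minimal sketch of that first module in tf.contrib.slim, assuming TF 1.x (the helper name inception_v1_module is mine):

```python
import tensorflow as tf
slim = tf.contrib.slim

def inception_v1_module(net):                                  # net: 28x28x192
    with slim.arg_scope([slim.conv2d, slim.max_pool2d], stride=1, padding='SAME'):
        b0 = slim.conv2d(net, 64, [1, 1])                      # 28x28x64
        b1 = slim.conv2d(net, 96, [1, 1])                      # 1x1 reduction
        b1 = slim.conv2d(b1, 128, [3, 3])                      # 28x28x128
        b2 = slim.conv2d(net, 16, [1, 1])                      # 1x1 reduction
        b2 = slim.conv2d(b2, 32, [5, 5])                       # 28x28x32
        b3 = slim.max_pool2d(net, [3, 3])                      # 28x28x192, size kept
        b3 = slim.conv2d(b3, 32, [1, 1])                       # 28x28x32
    return tf.concat([b0, b1, b2, b3], 3)                      # 28x28x256
```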
② Inception V2
Idea: on top of V1, kernel decomposition (each 5*5 kernel split into two 3*3 kernels) plus Batch Normalization.
BN (Batch Normalization)
During deep-network training, BN keeps each layer's input distribution fixed, which improves convergence. It is a whitening-like operation that normalizes each layer's output toward N(0, 1).
For example: with Batch=32 and 64 kernels there are 64 batch-normalization operations; each kernel's responses over the 32 images are normalized as one group (see the sketch below).
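A minimal sketch of one such per-channel normalization in TF 1.x (γ and β are the learned scale and shift that BN applies after whitening):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [32, 28, 28, 64])    # Batch=32, 64 channels
# One normalization per channel: moments taken over batch and spatial axes
mean, var = tf.nn.moments(x, axes=[0, 1, 2])
gamma = tf.get_variable('gamma', [64], initializer=tf.ones_initializer())
beta = tf.get_variable('beta', [64], initializer=tf.zeros_initializer())
y = tf.nn.batch_normalization(x, mean, var, offset=beta, scale=gamma,
                              variance_epsilon=1e-3)
```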
③ Inception V3
Idea: this is the biggest change (V1 -> V2 changed little). V3 introduces 3 kinds of Inception modules and asymmetric kernel decomposition (e.g. n*n -> 1*n followed by n*1). It also drops the 2 auxiliary classifiers and adds a single new one at a different position. (The parameter table in the paper does not match the actual code; there are 11 Inception modules in total, and each module's input/output is annotated in full in the code comments below.) V3 also optimizes pooling: the spatial size is reduced while the channel count grows, by replacing the usual conv-then-pool chain with a dedicated Inception module (the 4th and 9th modules below).
| Type | Input | Kernels | Output |
|---|---|---|---|
| Conv | 299*299*3 | 32 kernels 3*3/s=2 | 149*149*32 |
| Conv | 149*149*32 | 32 kernels 3*3/s=1 | 147*147*32 |
| Conv | 147*147*32 | 64 kernels 3*3/s=1, p=1 | 147*147*64 |
| Pool | 147*147*64 | 3*3/s=2 | 73*73*64 |
| Conv | 73*73*64 | 80 kernels 1*1 | 73*73*80 |
| Conv | 73*73*80 | 192 kernels 3*3/s=1 | 71*71*192 |
| Pool | 71*71*192 | 3*3/s=2 | 35*35*192 |
| 3*Inception | 35*35*192 | see code | 35*35*288 |
| 5*Inception | 35*35*288 | see code | 17*17*768 |
| 3*Inception | 17*17*768 | see code | 8*8*2048 |
| Pool | 8*8*2048 | 8*8 (global average) | 1*1*2048 |
| Linear | 1*1*2048 | | 1*1*1000 |
| Softmax | 1*1*1000 | | 1*1*1000 |
```python
# Assumes TensorFlow 1.x, where the slim API lives in tf.contrib.
import tensorflow as tf
slim = tf.contrib.slim


def inception_v3_base(inputs, scope=None):
  end_points = {}  # dict that saves key intermediate nodes for later use
  with tf.variable_scope(scope, 'InceptionV3', [inputs]):
    # Set default stride/padding for these three op types
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='VALID'):
      # The Inception V3 network proper starts here.
      # Input: 299 x 299 x 3
      net = slim.conv2d(inputs, 32, [3, 3], stride=2, scope='Conv2d_1a_3x3')
      # Output: 149 x 149 x 32
      net = slim.conv2d(net, 32, [3, 3], scope='Conv2d_2a_3x3')
      # Output: 147 x 147 x 32
      net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='Conv2d_2b_3x3')
      # Output: 147 x 147 x 64
      net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_3a_3x3')
      # Output: 73 x 73 x 64
      net = slim.conv2d(net, 80, [1, 1], scope='Conv2d_3b_1x1')
      # Output: 73 x 73 x 80
      net = slim.conv2d(net, 192, [3, 3], scope='Conv2d_4a_3x3')
      # Output: 71 x 71 x 192
      net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_5a_3x3')
      # Output: 35 x 35 x 192
    # The code above (5 conv layers, 2 pooling layers) compresses the image
    # spatially. The architecture diagram in the paper does not match this
    # source; follow the source! V3 proposes 3 module types, but the modules
    # in the source are only structurally similar to the paper's; the kernel
    # counts differ.

    # Inception modules
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
      # Module type 1 (first of 3, 4 branches, input: 35*35*192)
      with tf.variable_scope('Mixed_5b'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          # Output: 35*35*64
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
          # Output: 35*35*48
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
          # Output: 35*35*64
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          # Output: 35*35*64
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          # Output: 35*35*96
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
          # Output: 35*35*96
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          # Output: 35*35*192
          branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
          # Output: 35*35*32
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 35*35*256 (64 + 64 + 96 + 32)
      # Module type 1 (second, 4 branches, input: 35*35*256)
      with tf.variable_scope('Mixed_5c'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          # Output: 35*35*64
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0b_1x1')
          # Output: 35*35*48
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv_1_0c_5x5')
          # Output: 35*35*64
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          # Output: 35*35*64
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          # Output: 35*35*96
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
          # Output: 35*35*96
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          # Output: 35*35*256
          branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
          # Output: 35*35*64
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 35*35*288 (64 + 64 + 96 + 64)
      # Module type 1 (third, 4 branches, input: 35*35*288)
      with tf.variable_scope('Mixed_5d'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 35*35*288 (64 + 64 + 96 + 64)
      # Module type 2 (first of 5, 3 branches, input: 35*35*288);
      # this module acts as the pooling/grid-reduction step
      with tf.variable_scope('Mixed_6a'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 384, [3, 3], stride=2, padding='VALID',
                                 scope='Conv2d_1a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_1 = slim.conv2d(branch_1, 96, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_1x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                     scope='MaxPool_1a_3x3')
        net = tf.concat([branch_0, branch_1, branch_2], 3)
        # Output: 17 x 17 x 768 (384 + 96 + 288)
      # Module type 2 (second, 4 branches, input: 17*17*768)
      with tf.variable_scope('Mixed_6b'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 128, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 128, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 17*17*768 (192 + 192 + 192 + 192; note the average pooling)
      # Module type 2 (third, 4 branches, input: 17*17*768)
      with tf.variable_scope('Mixed_6c'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 17*17*768
      # Module type 2 (fourth, 4 branches, input: 17*17*768)
      with tf.variable_scope('Mixed_6d'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 17*17*768
      # Module type 2 (fifth, 4 branches, input: 17*17*768)
      with tf.variable_scope('Mixed_6e'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 17*17*768
      # Save this node: the auxiliary classifier branches off Mixed_6e
      end_points['Mixed_6e'] = net
      # Module type 3 (first of 3, 3 branches, input: 17*17*768);
      # again acts as the pooling/grid-reduction step
      with tf.variable_scope('Mixed_7a'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_0 = slim.conv2d(branch_0, 320, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
          branch_1 = slim.conv2d(branch_1, 192, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                     scope='MaxPool_1a_3x3')
        net = tf.concat([branch_0, branch_1, branch_2], 3)
        # Output: 8*8*1280 (320 + 192 + 768)
      # Module type 3 (second, 4 branches, input: 8*8*1280);
      # the difference: branches contain sub-branches
      with tf.variable_scope('Mixed_7b'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):  # this branch splits again
          branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = tf.concat([
              slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
              slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0b_3x1')], 3)
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = tf.concat([
              slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
              slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 8*8*2048 (320 + (384+384) + (384+384) + 192)
      # Module type 3 (third, 4 branches, input: 8*8*2048)
      with tf.variable_scope('Mixed_7c'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = tf.concat([
              slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
              slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0c_3x1')], 3)
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = tf.concat([
              slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
              slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        # Output: 8*8*2048 (320 + 768 + 768 + 192)
  return net, end_points
```
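A minimal usage sketch of the function above (assuming TF 1.x; in the original inception_v3.py the batch-norm and ReLU defaults for slim.conv2d are supplied by an outer slim.arg_scope, imitated here):

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [1, 299, 299, 3])
# conv2d defaults (BN + ReLU) that the base function relies on
with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm,
                    activation_fn=tf.nn.relu):
    net, end_points = inception_v3_base(inputs)
print(net.shape)                      # (1, 8, 8, 2048)
print(end_points['Mixed_6e'].shape)   # (1, 17, 17, 768), the aux classifier input
```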