Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Rethinking the Inception Architecture for Computer Vision
Inception V2 is a variant of the GoogLeNet architecture. The main changes are:
- [1] - The 5x5 convolutional layers are replaced by two consecutive 3x3 convolutional layers. This increases the maximum depth of the network by 9 weight layers, the parameter count by about 25%, and the computational cost by about 30%.
Two stacked 3x3 convolutional layers can take the place of a single 5x5 convolutional layer, since they cover the same 5x5 receptive field (a minimal sketch follows this list).
- [2] - The number of 28x28 Inception modules is increased from 2 to 3.
- [3] - Inside the Inception modules, both average and max pooling layers are used; see the table in the paper.
- [4] - Pooling layers are no longer inserted between two Inception modules; instead, stride-2 conv/pooling layers are applied before the concatenation in modules 3c and 4e.
- [5] - The first convolutional layer of the network uses a separable convolution with depth multiplier 8, which reduces the computation but increases memory consumption during training.
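As a rough illustration of change [1], the sketch below factorizes one 5x5 convolution into two stacked 3x3 convolutions using the same slim API as the code that follows. This is a minimal sketch: the function name and scope names are illustrative, not taken from the library.

import tensorflow as tf

slim = tf.contrib.slim

def factorized_5x5(inputs, num_outputs):
    # Two stacked 3x3 convolutions cover the same 5x5 receptive field as a
    # single 5x5 convolution, with 2 * 3 * 3 = 18 weights per in/out channel
    # pair instead of 5 * 5 = 25 (assuming equal channel counts throughout).
    net = slim.conv2d(inputs, num_outputs, [3, 3], scope='Conv2d_a_3x3')
    net = slim.conv2d(net, num_outputs, [3, 3], scope='Conv2d_b_3x3')
    return net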
Inception V2 definition from TensorFlow Slim
"""
Definition of the Inception V2 classification network.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
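# Helper that builds a truncated-normal weight initializer with mean 0.0
# and the given standard deviation.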
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def inception_v2_base(inputs,
final_endpoint='Mixed_5c',
min_depth=16,
depth_multiplier=1.0,
use_separable_conv=True,
data_format='NHWC',
scope=None):
"""
Inception V2 base architecture definition.
Constructs the Inception V2 network from the given inputs up to the given
final endpoint. It can build the architecture from the input up to the
inception(5b) layer in the table.
Args:
  inputs: a tensor of size [batch_size, height, width, channels].
  final_endpoint: specifies the endpoint at which network construction
    ends, i.e. the network depth.
    Candidate values: ['Conv2d_1a_7x7', 'MaxPool_2a_3x3',
    'Conv2d_2b_1x1', 'Conv2d_2c_3x3',
    'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
    'Mixed_4a', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
    'Mixed_5a', 'Mixed_5b', 'Mixed_5c'].
  min_depth: minimum depth value (number of channels) for all convolution
    ops. Enforced when depth_multiplier < 1; not an active constraint when
    depth_multiplier >= 1.
  depth_multiplier: float multiplier for the depth (number of channels) of
    all convolution ops. The value must be greater than zero. It is
    typically set to a float in (0, 1) to reduce the parameter count or
    the computational cost of the model.
  use_separable_conv: if True, the first convolutional layer,
    Conv2d_1a_7x7, uses a separable convolution; if False, a regular conv
    layer is used instead.
  data_format: data format of the activations ('NHWC' or 'NCHW').
  scope: optional variable_scope.
Returns:
  tensor_out: output tensor corresponding to the final network endpoint
    final_endpoint.
  end_points: a set of activations for external use, e.g. for summaries
    and losses.
Raises:
  ValueError: if final_endpoint is not set to one of the predefined
    values, or if depth_multiplier <= 0.
"""
# end_points collects the activations that are used externally,
# e.g. for summaries or losses.
end_points = {}
# Used to find the thinned depth (number of channels) for each layer.
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
depth = lambda d: max(int(d * depth_multiplier), min_depth)
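# For example, with depth_multiplier=0.5 and min_depth=16:
#   depth(64) = max(int(64 * 0.5), 16) = 32
#   depth(16) = max(int(16 * 0.5), 16) = 16  (min_depth takes effect)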
if data_format != 'NHWC' and data_format != 'NCHW':
raise ValueError('data_format must be either NHWC or NCHW.')
if data_format == 'NCHW' and use_separable_conv:
raise ValueError(
'separable convolution only supports NHWC layout. NCHW data format can'
' only be used when use_separable_conv is False.'
)
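# Inception branch outputs are concatenated along the channel axis:
# axis 3 for NHWC, axis 1 for NCHW.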
concat_dim = 3 if data_format == 'NHWC' else 1
with tf.variable_scope(scope, 'InceptionV2', [inputs]):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1,
padding='SAME',
data_format=data_format):
# In the comments below, the input size is assumed to be 224x224.
# In practice, the input can be any size larger than 32x32.
# 224 x 224 x 3
end_point = 'Conv2d_1a_7x7'
if use_separable_conv: # use a separable convolution
# depthwise_multiplier here is different from depth_multiplier.
# depthwise_multiplier determines the output channels of the initial
# depthwise conv (see docs for tf.nn.separable_conv2d), while
# depth_multiplier controls the # channels of the subsequent 1x1
# convolution. Must have
# in_channels * depthwise_multiplier <= out_channels
# so that the separable convolution is not overparameterized.
depthwise_multiplier = min(int(depth(64) / 3), 8)
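# With the defaults (depth_multiplier=1.0, min_depth=16), depth(64) = 64,
# so int(64 / 3) = 21 and depthwise_multiplier = min(21, 8) = 8, matching
# the depth multiplier of 8 noted above; the constraint holds for RGB
# inputs: in_channels * depthwise_multiplier = 3 * 8 = 24 <= 64.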
net = slim.separable_conv2d(inputs, depth(64), [7, 7],
depth_multiplier=depthwise_multiplier,
stride=2,
padding='SAME',
weights_initializer=trunc_normal(1.0),
scope=end_point)
else: # use a regular convolution
net = slim.conv2d(inputs, depth(64), [7, 7], stride=2,
weights_initializer=trunc_normal(1.0),
scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 112 x 112 x 64
end_point = 'MaxPool_2a_3x3'
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 64
end_point = 'Conv2d_2b_1x1'
net = slim.conv2d(net, depth(64), [1, 1], scope=end_point,
weights_initializer=trunc_normal(0.1))
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 64
end_point = 'Conv2d_2c_3x3'
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 192
end_point = 'MaxPool_3a_3x3'
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 28 x 28 x 192
# Inception module.
end_point = 'Mixed_3b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96),