Batch Normalization Theory
Batch Normalization essentially normalizes a layer's output feature maps. The technique was first proposed in Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, and it helps mitigate the notorious vanishing and exploding gradient problems in deep learning. Batch normalization is usually placed after a convolutional layer and before the activation layer.
Its theory is as follows. Concretely, the batch normalization operation performs these steps:
$$
\begin{aligned}
\mu_B &= \frac{1}{m_B} \sum_{i=1}^{m_B} \mathbf{x}^{(i)} \\
\sigma_B^2 &= \frac{1}{m_B} \sum_{i=1}^{m_B} \left( \mathbf{x}^{(i)} - \mu_B \right)^2 \\
\widehat{\mathbf{x}}^{(i)} &= \frac{\mathbf{x}^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
\mathbf{z}^{(i)} &= \gamma\, \widehat{\mathbf{x}}^{(i)} + \beta
\end{aligned}
$$
• $\mu_B$ is the empirical mean, evaluated over the whole mini-batch B.
• $\sigma_B$ is the empirical standard deviation, also evaluated over the whole mini-batch.
• $m_B$ is the number of instances in the mini-batch.
• $\widehat{\mathbf{x}}^{(i)}$ is the zero-centered and normalized input.
• $\gamma$ is the scaling parameter for the layer.
• $\beta$ is the shifting parameter (offset) for the layer.
• $\epsilon$ is a tiny number to avoid division by zero (typically $10^{-3}$). This is called a smoothing term.
• $\mathbf{z}^{(i)}$ is the output of the BN operation: it is a scaled and shifted version of the inputs.[^1]
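As an illustration only, the four training-time equations above map directly to code. The following is a minimal NumPy sketch, not TensorFlow's actual implementation; the function name batch_norm_forward is our own.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Training-time batch normalization over a mini-batch x of shape (m_B, features)."""
    mu = x.mean(axis=0)                    # empirical mean over the mini-batch
    var = x.var(axis=0)                    # empirical variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-centered, normalized input
    z = gamma * x_hat + beta               # scale by gamma and shift by beta
    return z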
Batch Normalization in TensorFlow
TensorFlow provides the following function for batch normalization:
tf.layers.batch_normalization(
    inputs, axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None, gamma_regularizer=None,
    beta_constraint=None, gamma_constraint=None,
    training=False, trainable=True, name=None, reuse=None,
    renorm=False, renorm_clipping=None, renorm_momentum=0.99,
    fused=None, virtual_batch_size=None, adjustment=None)
inputs: the input tensor; the only required argument.
training: whether the layer is in training mode; in practice this should always be set explicitly to True or False (or fed as a boolean tensor).
beta_initializer: initializer for the offset $\beta$ from the theory above; only used when center=True.
gamma_initializer: initializer for the scale $\gamma$ from the theory above; only used when scale=True.
moving_mean_initializer: initializer for the moving mean $\mu$ mentioned above.
moving_variance_initializer: initializer for the moving variance $\sigma^2$ mentioned above.
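As a usage sketch (the layer sizes, placeholder shapes, and variable names here are arbitrary choices, not part of the API), batch normalization is typically inserted after a convolution and before the activation, with the training flag driven by a boolean placeholder:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])            # input images (example shape)
is_training = tf.placeholder(tf.bool, name='is_training')    # toggled at train vs. inference time

conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same')  # no activation here
bn = tf.layers.batch_normalization(conv, training=is_training)         # normalize the conv output
out = tf.nn.relu(bn)                                                    # activation after BN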
A batch normalization wrapper function
def batch_normalization(inputs, is_training):
    # initialize the moving variance to a small constant instead of the default of 1
    moving_var = tf.constant_initializer(0.01)
    output = tf.layers.batch_normalization(inputs, moving_variance_initializer=moving_var, training=is_training)
    return output
Training considerations for batch normalization
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
update_weight = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
The two lines above are commonly seen in training code (see the previous post for a detailed comparison). In short, the first line collects the operations that must be run as part of every training step, such as the updates to batch normalization's moving mean and variance; the second line collects the trainable variables, such as weights and biases. Therefore, whenever batch normalization is used, the training op must be built under the control dependency established by the first line. The second line is typically used when only certain layers should be trained, i.e., to freeze part of the network; if it is not used, all trainable variables are updated.
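For the second line, a minimal sketch of freezing all but one part of the network might look like this (the scope name 'logits' is hypothetical, and optimizer and loss are assumed to be defined elsewhere, as in the training code below):

# Sketch: train only the variables under a given scope, leaving all other layers frozen.
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='logits')
train_op = optimizer.minimize(loss, var_list=train_vars)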
With the update ops collected by the first line, the training code is as follows:
# Run train_op1 directly for training; the control dependency guarantees that the
# batch normalization moving mean/variance updates are executed at every step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op1 = optimizer.minimize(loss)
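Putting it together, a training step and an inference call might look like the sketch below; this is illustrative only, and x, out, is_training, x_batch, and x_test are assumed to come from a graph like the earlier examples:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training step: batch statistics are used and the moving mean/variance are updated
    sess.run(train_op1, feed_dict={x: x_batch, is_training: True})
    # inference: the accumulated moving statistics are used instead of batch statistics
    predictions = sess.run(out, feed_dict={x: x_test, is_training: False})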
References
[^1] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow. 2017.

