The batch normalization paper introduced the technique to reduce internal covariate shift, alleviate vanishing gradients, and speed up model training. This post walks through the TensorFlow implementation and explains how usage differs between training and testing.
Figure 1: Illustration of the batch normalization algorithm
The body of the TensorFlow implementation, `tf.nn.batch_normalization`, is as follows:
```python
with ops.name_scope(name, "batchnorm", [x, mean, variance, scale, offset]):
  inv = math_ops.rsqrt(variance + variance_epsilon)
  if scale is not None:
    inv *= scale
  # Note: tensorflow/contrib/quantize/python/fold_batch_norms.py depends on
  # the precise order of ops that are generated by the expression below.
  return x * math_ops.cast(inv, x.dtype) + math_ops.cast(
      offset - mean * inv if offset is not None else -mean * inv, x.dtype)
```
Merging the normalize step and the scale-and-shift step from Figure 1 (substituting line 3 of the algorithm into line 4) gives exactly the `return` expression in the function body.
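Written out with the paper's notation ($\mu_B$ and $\sigma_B^2$ the mini-batch mean and variance, $\gamma$ the scale and $\beta$ the offset), the merged form is

$$
y = \gamma\,\frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta
  = x \cdot \mathrm{inv} + \bigl(\beta - \mu_B \cdot \mathrm{inv}\bigr),
\qquad \mathrm{inv} = \frac{\gamma}{\sqrt{\sigma_B^2 + \epsilon}},
$$

which corresponds one-to-one with `x * inv + (offset - mean * inv)` in the code.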
Having seen the principle and the implementation, let's look at how to use it, taking the code here as an example.
- Where to apply it: before the nonlinearity
```python
Y1l = tf.matmul(XX, W1)                                # linear layer output (logits)
Y1bn, update_ema1 = batchnorm(Y1l, O1, S1, tst, iter)  # batch-normalize before the activation
Y1 = tf.nn.sigmoid(Y1bn)                               # nonlinearity applied after BN
```
- Difference between training and testing: during training, the mean and variance are taken directly from the current mini-batch; at test time, they come from exponential moving averages accumulated over the training samples. A minimal end-to-end sketch follows the function below.
```python
def batchnorm(Ylogits, Offset, Scale, is_test, iteration):
    # passing the iteration prevents averaging across non-existing iterations
    exp_moving_avg = tf.train.ExponentialMovingAverage(0.998, iteration)
    bnepsilon = 1e-5
    mean, variance = tf.nn.moments(Ylogits, [0])  # statistics of the current mini-batch
    update_moving_averages = exp_moving_avg.apply([mean, variance])
    # use the moving averages at test time, the mini-batch statistics during training
    m = tf.cond(is_test, lambda: exp_moving_avg.average(mean), lambda: mean)
    v = tf.cond(is_test, lambda: exp_moving_avg.average(variance), lambda: variance)
    Ybn = tf.nn.batch_normalization(Ylogits, m, v, Offset, Scale, bnepsilon)
    return Ybn, update_moving_averages
```
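To put the two pieces together, here is a minimal sketch of how this function could be wired into a network and a training loop, in the spirit of the referenced tutorial. The placeholder names, layer sizes, and the loop shown in comments are assumptions for illustration, not the tutorial's exact code:

```python
import tensorflow as tf

# Assumed placeholders: input batch, test-mode flag, and global step counter.
XX = tf.placeholder(tf.float32, [None, 784])
tst = tf.placeholder(tf.bool)     # True at test time, False during training
iter = tf.placeholder(tf.int32)   # current training iteration, fed to the moving average

# One fully connected layer; with BN, the offset (beta) replaces the usual bias.
W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
O1 = tf.Variable(tf.zeros([200]))  # offset (beta)
S1 = tf.Variable(tf.ones([200]))   # scale (gamma)

Y1l = tf.matmul(XX, W1)
Y1bn, update_ema1 = batchnorm(Y1l, O1, S1, tst, iter)  # batchnorm defined above
Y1 = tf.nn.sigmoid(Y1bn)

# Training loop sketch (loss/optimizer omitted):
#   sess.run(train_step,  {XX: batch_x, tst: False, iter: step, ...})
#   sess.run(update_ema1, {XX: batch_x, tst: False, iter: step, ...})
# At test time, feed tst=True so the averaged mean/variance are used.
```

Note that the update op must be run explicitly at every training step; otherwise the moving averages never change and test-time normalization uses stale statistics.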
For how the moving average itself is computed, see here.
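In brief, `tf.train.ExponentialMovingAverage` keeps a shadow copy of each tracked value; when a `num_updates` argument is supplied (the `iteration` above), the effective decay is also capped early in training:

$$
\text{shadow} \leftarrow d \cdot \text{shadow} + (1 - d)\cdot \text{value},
\qquad
d = \min\!\left(0.998,\ \frac{1 + \text{num\_updates}}{10 + \text{num\_updates}}\right),
$$

so the averages follow the batch statistics closely at first and only later settle into the long decay of 0.998.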
For more discussion of batch normalization, see here.
References: