Understanding moving_mean and moving_variance in batch normalization

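This post explains how tf.control_dependencies is used in TensorFlow and why it matters for enforcing the order in which operations run. It also looks at how tf.layers.batch_normalization works during training, in particular how its update operations are correctly wired into the training step.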


While reading the training code for a batch normalization model, I came across the following lines:

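update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # collects the moving_mean/moving_variance update ops
with tf.control_dependencies(update_ops):
    # train_op is created inside the context, so it runs only after update_ops have run
    train_op = optimizer.minimize(loss, global_step)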

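First, a word on what tf.control_dependencies does.
In some machine learning programs we want to specify execution dependencies between operations, and tf.control_dependencies() lets us do exactly that.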

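control_dependencies(control_inputs) returns a context manager that specifies control dependencies. Every operation created inside the corresponding with block will run only after all the operations in control_inputs have executed.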

with g.control_dependencies([a, b, c]):
  # d and e will only run after a, b, and c have executed.
  d = …
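To see the ordering guarantee in action, here is a minimal sketch, assuming TensorFlow 2.x and its tf.compat.v1 graph-mode API (the names counter, increment, read_plain and read_after are illustrative only). A read created outside the with block does not trigger the increment, while a read created inside it does, because the increment becomes one of its control inputs.

```python
import tensorflow as tf

# Use TF1-style graph mode so ops run only when fetched through a Session.
tf.compat.v1.disable_eager_execution()

counter = tf.Variable(0, name="counter")
# Plays the role of "a": an op that only has a side effect.
increment = counter.assign_add(1, read_value=False)

# Created outside the context: fetching it does not run `increment`.
read_plain = counter.read_value()

with tf.control_dependencies([increment]):
    # Created inside the context: `increment` must run before this read.
    read_after = counter.read_value()

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    plain_value = sess.run(read_plain)  # increment is not executed
    after_value = sess.run(read_after)  # increment is pulled in and runs first

print(plain_value, after_value)
```

The same mechanism is what makes the train_op in the training snippet run the batch normalization update ops: they are listed in control_inputs, so fetching train_op forces them to execute.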

https://blog.youkuaiyun.com/pku_jade/article/details/73498753

https://blog.youkuaiyun.com/u012436149/article/details/72084744

*************************************************************

Back to the main topic.

The network I defined uses tf.layers.batch_normalization, and its API documentation contains this note: "note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op"
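To make the note concrete without relying on TensorFlow internals, here is a minimal NumPy sketch of the two statistics the layer tracks. The class SimpleBatchNorm and the update_moving_stats flag are hypothetical. The momentum of 0.99, epsilon of 1e-3, and initial values (moving_mean = 0, moving_variance = 1) follow the documented defaults of tf.layers.batch_normalization, and the update is the usual exponential moving average, moving = momentum * moving + (1 - momentum) * batch_stat. The learnable gamma/beta are omitted, and the sketch uses the biased batch variance; the real layer's variance estimator may differ. Passing update_moving_stats=False mimics building train_op without the control dependency.

```python
import numpy as np


class SimpleBatchNorm:
    """Per-feature batch normalization without gamma/beta (illustrative sketch)."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        # Initial moving statistics: mean 0, variance 1.
        self.moving_mean = np.zeros(num_features)
        self.moving_variance = np.ones(num_features)

    def __call__(self, x, training, update_moving_stats=True):
        if training:
            # Training: normalize with the statistics of this mini-batch.
            mean = x.mean(axis=0)
            var = x.var(axis=0)  # biased variance of the batch
            if update_moving_stats:
                # The "update op": exponential moving average of batch stats.
                m = self.momentum
                self.moving_mean = m * self.moving_mean + (1 - m) * mean
                self.moving_variance = m * self.moving_variance + (1 - m) * var
        else:
            # Inference: use the accumulated moving statistics instead.
            mean, var = self.moving_mean, self.moving_variance
        return (x - mean) / np.sqrt(var + self.epsilon)


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(10_000, 3))

bn_ok = SimpleBatchNorm(3)       # update ops run every step
bn_missing = SimpleBatchNorm(3)  # update ops never run
for _ in range(1000):
    batch = data[rng.integers(0, len(data), size=64)]
    bn_ok(batch, training=True)
    bn_missing(batch, training=True, update_moving_stats=False)

out_ok = bn_ok(data, training=False)            # roughly zero mean, unit std
out_missing = bn_missing(data, training=False)  # barely normalized at all
print(bn_ok.moving_mean, bn_missing.moving_mean)
```

Training-mode outputs look identical for both layers, because they use batch statistics either way. The difference only shows up at inference: without the updates, moving_mean and moving_variance stay at 0 and 1, so the layer effectively passes the raw activations through.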

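This raises the question of why the update has to be added by hand. Start with how the original batch normalization paper handles the moving average. During training, the mean and variance are computed separately for each mini-batch. At test time, the statistics are not computed over the whole test set; instead, inference uses the expectation of the batch means and the expectation of the batch variances, accumulated over all mini-batches seen during training. In more detail: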