Batch normalization and layer normalization

License: CC 4.0 BY-SA

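Batch normalization standardizes per feature and is commonly used in CNNs, whereas layer normalization suits RNNs and Transformers because it copes with variable sequence lengths. Standardization helps speed up training and prevent exploding gradients. However, batch normalization can be inaccurate with small batches and is poorly suited to sequence models, because sequences of different lengths can lead to inconsistent batch statistics.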

1. Batch normalization standardizes per feature: the values of one feature across all samples in a batch form a group, and that group is standardized.

2. Layer normalization standardizes per sample: all features of one sample form a group, and that group is standardized.
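To make the difference between the two grouping schemes concrete, here is a minimal sketch in plain Python. It assumes a batch stored as a list of rows (one row per sample, one column per feature). The helper names `standardize`, `batch_norm`, and `layer_norm` and the small constant `eps` are illustrative choices, not taken from any library, and the learnable scale and shift used by real implementations are left out.

```python
from statistics import fmean, pstdev


def standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation.

    eps is an assumed small constant that avoids division by zero.
    """
    mu = fmean(values)
    var = pstdev(values) ** 2
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


def batch_norm(x):
    """Standardize each feature (column) across all samples in the batch."""
    cols = [standardize(col) for col in zip(*x)]
    # Transpose back so the result is again one row per sample.
    return [list(row) for row in zip(*cols)]


def layer_norm(x):
    """Standardize each sample (row) across all of its features."""
    return [standardize(row) for row in x]


# Toy batch: 4 samples, 3 features on very different scales.
x = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]

bn = batch_norm(x)  # every column now has mean ~0 and std ~1
ln = layer_norm(x)  # every row now has mean ~0 and std ~1
```

After `batch_norm`, the three features share a common scale across the batch. After `layer_norm`, each sample is rescaled using only its own values, so the result does not depend on which other samples happen to be in the batch.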

3. The most common way to standardize is to subtract the mean and then divide by the standard deviation.
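Written out for a group of n values, the rule above takes the following form. The symbols are notation assumed here for illustration: x_i is one value in the group, μ is the group mean, and σ is the group standard deviation. Practical implementations usually add a small constant under the square root to avoid division by zero; that detail is omitted.

```latex
\hat{x}_i = \frac{x_i - \mu}{\sigma},
\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \mu\right)^2}
```

For batch normalization the group is one feature across the batch; for layer normalization it is all features of one sample.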

4. Goals of standardization: 1) faster training; 2) preventing exploding gradients.

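Batch normalization is commonly used in CNNs, whereas layer normalization is a better fit for RNNs and Transformers: because sequences differ in length, some features are absent from some samples, which complicates feature-based standardization.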
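The sketch below illustrates why variable lengths are awkward for feature-based statistics. It assumes a hypothetical toy batch of two sequences whose tokens are 3-dimensional vectors, and treats each time step as the "feature" over which batch statistics would be computed. The names `batch`, `support`, and `standardize` are illustrative.

```python
from statistics import fmean, pstdev


def standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation (eps is assumed)."""
    mu = fmean(values)
    var = pstdev(values) ** 2
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


# Hypothetical batch: one sequence with 3 tokens, one with a single token.
batch = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 5.0]],
    [[3.0, 3.0, 6.0]],
]

max_len = max(len(seq) for seq in batch)

# Batch-style view: how many samples contribute a value at each time step?
# Later steps are backed by fewer samples, so their statistics are unreliable.
support = [sum(len(seq) > t for seq in batch) for t in range(max_len)]
print(support)  # [2, 1, 1]

# Layer-style view: every token is standardized over its own features,
# so sequence length and batch composition do not matter.
normalized = [[standardize(token) for token in seq] for seq in batch]
```

In this toy batch, time steps 1 and 2 are backed by a single sample each, so a per-step batch standard deviation would be zero, while the per-token standardization is well defined for all four tokens.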

5. Drawbacks of batch normalization:

1) Batch normalization uses batch statistics: the mean and standard deviation of the current mini-batch. When the batch size is small, the sample mean and sample standard deviation are noisy estimates of the actual distribution, which can destabilize training and hurt what the network learns.

2) Because batch normalization depends on batch statistics, it is less suited to sequence models: sequences may have different lengths, and longer sequences often force smaller batch sizes.
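The small-batch issue in point 1) can be checked with a rough simulation. The sketch assumes a synthetic population of standard-normal values and compares how much the batch mean fluctuates for two batch sizes. The population size, the batch sizes, the number of trials, and the helper name `spread_of_batch_means` are arbitrary choices for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed synthetic "activations": 10,000 draws from a standard normal.
population = [random.gauss(0.0, 1.0) for _ in range(10_000)]


def spread_of_batch_means(batch_size, trials=500):
    """Standard deviation of the batch mean over many random mini-batches."""
    means = [fmean(random.sample(population, batch_size)) for _ in range(trials)]
    return pstdev(means)


small = spread_of_batch_means(2)   # batch size 2
large = spread_of_batch_means(64)  # batch size 64

print(f"spread of batch mean, batch size 2:  {small:.3f}")
print(f"spread of batch mean, batch size 64: {large:.3f}")
```

The mean of a size-2 batch varies much more from batch to batch than the mean of a size-64 batch, so the quantity that batch normalization subtracts is itself noisy when batches are small.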