BN (Batch Normalization)

Batch Normalization (BN) speeds up deep network training by reducing internal covariate shift. It inserts a normalization step at the input of each layer and uses the learnable parameters γ and β so the normalized features can still represent the original distribution. This improves gradient flow, allows larger learning rates, reduces the dependence on careful initialization, and acts as a form of regularization. At test time, the mean and variance are not taken from the current batch but from statistics estimated over the whole training set (in practice, running averages accumulated during training).


BN Training

1) Stochastic gradient descent (SGD) is simple and efficient for training deep networks, but it has a drawback: it requires us to choose hyperparameters by hand, such as the learning rate, parameter initialization, weight-decay coefficient, and dropout ratio. These choices are so critical to the training result that a great deal of time ends up being spent on tuning them. With BN (see the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"), you no longer need to tune these parameters quite so carefully.

2) Once a neural network starts training, its parameters are updated at every step. Apart from the input layer (whose data we have already normalized per sample), the input distribution of every later layer keeps changing during training, because updates to the parameters of earlier layers change the data those layers pass on. Take the second layer as an example: its input is computed from the first layer's parameters and the network input, and since the first layer's parameters change throughout training, the distribution of the second layer's input inevitably changes as well. This change in the distribution of intermediate-layer data during training is called "Internal Covariate Shift". The algorithm proposed in the paper is designed precisely to counteract this shift in the intermediate layers' data distributions, and so Batch Normalization was born; the sketch after this list illustrates the effect.
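The following is a minimal NumPy sketch, not from the original post, of internal covariate shift: the raw input is fixed, but as the first layer's weights change (here random perturbations stand in for gradient updates), the statistics of the second layer's input drift.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 64))            # fixed, per-sample-normalized input batch
W1 = rng.normal(scale=0.1, size=(64, 32))  # first-layer weights (hypothetical)

for step in range(3):
    h = np.maximum(0.0, x @ W1)            # first-layer activations = second layer's input
    print(f"step {step}: mean={h.mean():.3f}, std={h.std():.3f}")
    W1 += rng.normal(scale=0.05, size=W1.shape)  # stand-in for a gradient update
```

Even with the input data unchanged, the printed mean and standard deviation of the second layer's input move from step to step, which is exactly the distribution shift BN is meant to counteract.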

### Batch Normalization in Neural Networks Explained

#### Definition and Purpose

Batch normalization is a method used during the training of artificial neural networks that improves the speed, performance, and stability of these models by normalizing layer inputs through re-centering and re-scaling[^2]. This mitigates internal covariate shift, i.e., the change in the distribution of inputs to layers deep within the network as the weights are updated.

#### Implementation Details

During each mini-batch update step, batch normalization computes the mean μ_B and variance σ²_B across all instances in the current batch B. For every instance x^(i), the normalized value y^(i) is then

\[
y^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}
\]

where ε is a small constant added for numerical stability. Afterward, two learnable parameters γ and β are introduced per feature dimension, so that the transformed output z^(i) = γ·y^(i) + β controls the scale and offset after normalization. This keeps activations well-behaved during forward propagation and also provides regularization benefits similar to the dropout techniques described elsewhere[^3].

#### Benefits During the Training Phase

Applying batch norm brings several advantages: faster convergence due to reduced vanishing/exploding gradients; better generalization, partly because the mechanism acts as a form of noise injection into the learned representations; and less sensitivity to the specific weight-initialization scheme, since the acceptable range of initial values becomes more forgiving once normalization is applied.

A typical way to insert BN into a Keras model (the surrounding layers are elided here, as in the original):

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # ... preceding layers ...
    tf.keras.layers.BatchNormalization(),
    # ... following layers ...
])
```
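To make the formula above concrete, here is a minimal NumPy sketch of the BN forward pass, not part of the original post. The helper name `batch_norm_forward` is illustrative, and the use of running averages for test-time statistics is a common implementation choice standing in for the paper's population estimates over the training set.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, running_mean, running_var,
                       training=True, momentum=0.9, eps=1e-5):
    """Per-feature batch normalization over a (batch, features) array."""
    if training:
        mu = x.mean(axis=0)                      # mini-batch mean μ_B
        var = x.var(axis=0)                      # mini-batch variance σ²_B
        # Accumulate running statistics for use at inference time.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var      # use accumulated statistics
    y = (x - mu) / np.sqrt(var + eps)            # normalize
    z = gamma * y + beta                         # scale and shift
    return z, running_mean, running_var

# Usage sketch: a batch with nonzero mean and large variance comes out
# approximately zero-mean, unit-variance per feature (with γ=1, β=0).
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 16))
gamma, beta = np.ones(16), np.zeros(16)
run_mu, run_var = np.zeros(16), np.ones(16)
z, run_mu, run_var = batch_norm_forward(x, gamma, beta, run_mu, run_var, training=True)
print(z.mean(axis=0)[:3], z.std(axis=0)[:3])
```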