Batch Normalization (BN)

Batch normalization is a deep learning technique whose main goals are to speed up training and to address the covariate shift problem. It stabilizes the network by keeping the output distribution of each hidden layer roughly constant even as the weights change during training. By forcing each layer's outputs to have zero mean and unit variance, batch normalization lets later layers learn more independently of earlier ones, which accelerates training. In addition, because the mean and variance are computed over a mini-batch, batch normalization introduces a small amount of noise that acts as a mild regularizer, though it should not be relied on as the primary form of regularization.

Why does batch normalization work?

(1) Normalizing input features is known to speed up learning; one intuition is that doing the same thing for the hidden layers should also help.
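As a quick illustration, here is a minimal NumPy sketch of standardizing input features before training (the design matrix and feature scales below are made-up assumptions):

```python
import numpy as np

# Hypothetical design matrix: 1000 samples, 3 features on very different scales.
X = np.random.randn(1000, 3) * np.array([1.0, 100.0, 0.01])

# Standardize each feature to zero mean and unit variance so that
# gradient descent is not dominated by the largest-scale feature.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / (sigma + 1e-8)

print(X_norm.mean(axis=0))  # ~0 per feature
print(X_norm.std(axis=0))   # ~1 per feature
```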

(2) It addresses the problem of covariate shift.

Suppose you have trained your cat-recognizing network on black cats but evaluate it on colored cats: the data distribution has changed (this is called covariate shift). Even if a true boundary separating cat from non-cat exists, you cannot expect to learn that boundary from black cats alone, so you may need to retrain the network.

For a neural network, suppose the input distribution is constant; then the output distribution of a given hidden layer should also be constant. But as the weights of that layer and the layers before it change during training, its output distribution shifts, which is covariate shift from the perspective of the layers that come after it. Just as with the cat-recognizing network, those later layers effectively have to re-learn. To fix this, we use batch norm to force the layer's outputs toward a zero-mean, unit-variance distribution. This lets the layers after it learn more independently of the earlier layers, concentrate on their own task, and thereby speed up training.
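Here is a minimal NumPy sketch of the batch-norm forward pass described above (the function name, shapes, and eps constant are illustrative assumptions, not a particular framework's API):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize pre-activations z of shape (batch_size, n_units) to zero mean
    and unit variance per unit, then rescale and shift with learnable gamma/beta."""
    mu = z.mean(axis=0)                    # per-unit mini-batch mean
    var = z.var(axis=0)                    # per-unit mini-batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance outputs
    return gamma * z_hat + beta            # gamma/beta restore representational power

# A mini-batch of 64 samples feeding a 128-unit hidden layer.
z = np.random.randn(64, 128) * 3.0 + 5.0
gamma, beta = np.ones(128), np.zeros(128)
out = batch_norm_forward(z, gamma, beta)
print(out.mean(axis=0)[:3])  # close to beta (0)
print(out.std(axis=0)[:3])   # close to gamma (1)
```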

(3) Batch norm as a (slight) regularizer

In batch norm, the mean and variance are computed on a mini-batch, which contains only a limited number of samples, so these statistics are noisy. Like dropout, this adds some noise to each hidden layer's activations (dropout randomly multiplies activations by 0 or 1).

This is an extra, slight effect; don't rely on it as your main regularizer.
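To see where the noise comes from, here is a small NumPy sketch (the population distribution and batch sizes are arbitrary assumptions) showing how mini-batch means fluctuate around the population mean:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are one hidden unit's pre-activations over many samples
# (true mean 2.0, true std 1.5 -- arbitrary numbers for illustration).
activations = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Mini-batch statistics fluctuate around the population values, so each
# batch normalizes with slightly different mu/sigma -- a small source of noise,
# which shrinks as the batch size grows.
for batch_size in (32, 256):
    batches = activations[: batch_size * 8].reshape(8, batch_size)
    print(batch_size, np.round(batches.mean(axis=1), 2))
```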
