The BatchNorm layer normalizes its input to zero mean and unit variance, smoothing out large noise spikes and helping the network converge.
Let's first look at BatchNormParameter:
message BatchNormParameter {
  // use_global_stats controls how the mean and variance are obtained:
  // If false, mean/variance are computed across the current mini-batch,
  // while global values are accumulated via a moving average.
  // If true, those accumulated global values (statistics over all the data
  // seen so far) are used instead of the per-batch ones.
  // Typically true for TEST and false for TRAIN; when left unset, Caffe
  // derives it from the phase.
  optional bool use_global_stats = 1;
  // How much does the moving average decay each iteration?
  // (The decay factor of the moving average; defaults to 0.999.)
  optional float moving_average_fraction = 2 [default = .999];
  // Small value added to the variance estimate so that we don't divide by
  // zero; defaults to 1e-5.
  optional float eps = 3 [default = 1e-5];
}
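To make the three parameters concrete, here is a minimal NumPy sketch of the forward semantics. The function name batch_norm_forward and the state dict are my own illustration, not Caffe's API; in particular, Caffe itself accumulates unnormalized sums plus a scale factor in a third blob rather than the simplified exponential moving average used here.

import numpy as np

def batch_norm_forward(x, state, use_global_stats,
                       moving_average_fraction=0.999, eps=1e-5):
    """Per-channel batch normalization for NCHW input (illustrative sketch).

    `state` holds the running statistics: {'mean': (C,), 'var': (C,)}.
    """
    axes = (0, 2, 3)  # reduce over batch and spatial dims, keep channels
    if use_global_stats:
        # TEST phase: normalize with the accumulated global statistics
        mean, var = state['mean'], state['var']
    else:
        # TRAIN phase: use the current mini-batch statistics...
        mean = x.mean(axis=axes)
        var = x.var(axis=axes)
        # ...and fold them into the running averages
        f = moving_average_fraction
        state['mean'] = f * state['mean'] + (1 - f) * mean
        state['var'] = f * state['var'] + (1 - f) * var
    # eps keeps the denominator away from zero
    return (x - mean[None, :, None, None]) / \
        np.sqrt(var[None, :, None, None] + eps)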
Writing a BatchNorm layer in a prototxt (note that bottom and top share the same name, so conv1 is normalized in place):
layer {
  name: "bn_conv1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    use_global_stats: true
  }
}
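To tie the prototxt back to the sketch above, here is a short usage example (again illustrative, assuming the batch_norm_forward helper defined earlier): use_global_stats: false corresponds to the TRAIN branch, use_global_stats: true to the TEST branch.

x = np.random.randn(8, 16, 32, 32).astype(np.float32)  # N=8, C=16, 32x32
state = {'mean': np.zeros(16, dtype=np.float32),
         'var': np.ones(16, dtype=np.float32)}
y_train = batch_norm_forward(x, state, use_global_stats=False)  # TRAIN phase
y_test = batch_norm_forward(x, state, use_global_stats=True)    # TEST phase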
For example, in MobileNet:
layer {
  name: "conv6_4/bn"
  type: "BatchNorm"
  bottom: "conv6_4"
  top: "conv6_4/bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  batch_norm_param {
    use_global_stats: true
    eps: 1e-5
  }
}
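The three param blocks correspond to the layer's three internal blobs: the accumulated mean, the accumulated variance, and the moving-average scale factor. These blobs are updated by the moving average during the forward pass rather than by gradient descent, so lr_mult and decay_mult are set to 0 to keep the solver and weight decay from touching them. At test time Caffe divides the first two blobs by the scale factor to recover the global statistics, roughly as in this sketch (the function name is my own, simplified from Caffe's C++ source):

import numpy as np

def recover_global_stats(blob_mean_sum, blob_var_sum, blob_scale):
    """Sketch of how the three BatchNorm blobs yield usable statistics.

    blobs[0]/blobs[1] hold unnormalized mean/variance sums and blobs[2][0]
    holds the matching moving-average scale; dividing by that scale recovers
    the global mean/variance used when use_global_stats is true.
    """
    scale = blob_scale[0]
    factor = 0.0 if scale == 0 else 1.0 / scale
    return blob_mean_sum * factor, blob_var_sum * factor

Note also that Caffe's BatchNorm layer only normalizes; the learnable scale and shift come from a separate Scale layer (with bias_term: true) that typically follows it in prototxts like the one above.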