The normalization_layer in Caffe

This post walks through the L2 normalization layer in Caffe-SSD: how it is implemented, what its parameters mean, and how the forward pass behaves under each setting.

caffe-ssd contains an implementation of normalization (a .hpp, a .cpp and a .cu file). What it implements is L2 normalization.
The formula for L2 normalization is:

$$\hat{x}_i = \frac{x_i}{\sqrt{\sum_j x_j^2 + \epsilon}}$$

Now let's look at Caffe's implementation.
First, caffe.proto, where NormalizeParameter is defined:
message NormalizeParameter {
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}
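As a usage sketch, this is essentially how SSD itself configures the layer on top of conv4_3: per-pixel normalization with a learnable per-channel scale initialized to 20 (the exact layer names here are illustrative):

layer {
  name: "conv4_3_norm"
  type: "Normalize"
  bottom: "conv4_3"
  top: "conv4_3_norm"
  norm_param {
    across_spatial: false
    scale_filler {
      type: "constant"
      value: 20
    }
    channel_shared: false
  }
}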

Two parameters here are particularly important: across_spatial and channel_shared.
across_spatial determines the extent of the normalization. If it is true (the default), each num is normalized as a whole over its channel*height*width elements; that is, the sum of squared x_i in the formula above runs over channel*height*width values. If it is false, the normalization is not across-spatial: the sum runs over only channel values, meaning every spatial position (there are height*width of them) is normalized separately, which greatly shrinks the normalization extent.
As for channel_shared: after the normalization above is done, top_data is multiplied by a scale (this scale is the only learnable parameter of the normalization layer). If channel_shared is true (the default), every channel of top_data is multiplied by the same scalar; if it is false, each channel of top_data is multiplied by its own value.
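To make the two extents concrete, here is a minimal plain-loop sketch (my own illustration, not the actual Caffe code, which uses the BLAS helpers shown below) of how one num of shape channels*height*width is normalized under each setting:

#include <cmath>

// Plain-loop sketch of the two normalization extents.
// Assumes NCHW layout with spatial_dim = height * width.
void normalize_one_num(const float* x, float* y,
                       int channels, int spatial_dim,
                       bool across_spatial, float eps) {
  const int dim = channels * spatial_dim;
  if (across_spatial) {
    // One norm over all channel*height*width elements of this num.
    float sum = 0.f;
    for (int i = 0; i < dim; ++i) sum += x[i] * x[i];
    const float norm = std::sqrt(sum + eps);
    for (int i = 0; i < dim; ++i) y[i] = x[i] / norm;
  } else {
    // One norm per spatial position, summed over channels only.
    for (int s = 0; s < spatial_dim; ++s) {
      float sum = eps;
      for (int c = 0; c < channels; ++c) {
        const float v = x[c * spatial_dim + s];
        sum += v * v;
      }
      const float norm = std::sqrt(sum);
      for (int c = 0; c < channels; ++c)
        y[c * spatial_dim + s] = x[c * spatial_dim + s] / norm;
    }
  }
}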
Now let's look at Forward_cpu:

for (int n = 0; n < num; ++n) {
    caffe_sqr<Dtype>(dim, bottom_data, buffer_data);
    if (across_spatial_) {
      // add eps to avoid overflow
      norm_data[n] = pow(caffe_cpu_asum<Dtype>(dim, buffer_data)+eps_,
                         Dtype(0.5));
      caffe_cpu_scale<Dtype>(dim, Dtype(1.0 / norm_data[n]), bottom_data,
                             top_data);
    } else {
      caffe_cpu_gemv<Dtype>(CblasTrans, channels, spatial_dim, Dtype(1),
                            buffer_data, sum_channel_multiplier, Dtype(1),
                            norm_data);
      // compute norm
      caffe_powx<Dtype>(spatial_dim, norm_data, Dtype(0.5), norm_data);
      // scale the layer
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, spatial_dim,
                            1, Dtype(1), sum_channel_multiplier, norm_data,
                            Dtype(0), buffer_data);
      caffe_div<Dtype>(dim, bottom_data, buffer_data, top_data);
      norm_data += spatial_dim;
    }
    // scale the output
    if (channel_shared_) {   // default is true
      caffe_scal<Dtype>(dim, scale[0], top_data);  // multiply every element by scale[0]
    } else {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, spatial_dim,
                            1, Dtype(1), scale, sum_spatial_multiplier,
                            Dtype(0),
                            buffer_data);
      caffe_mul<Dtype>(dim, top_data, buffer_data, top_data);
    }
    bottom_data += dim;
    top_data += dim;
  }

Note the loop here and the trailing bottom_data += dim; top_data += dim;: in every case the normalization layer normalizes each num separately, never all nums at once.
First, caffe_sqr<Dtype>(dim, bottom_data, buffer_data); squares the current num of bottom_data element-wise and stores the result in buffer_data.
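caffe_sqr is nothing more than an element-wise square; as a plain-loop sketch:

// square the current num element-wise into the scratch buffer
for (int i = 0; i < dim; ++i)
  buffer_data[i] = bottom_data[i] * bottom_data[i];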
Then we enter the first if branch (across_spatial is true; at this point norm_data has shape (num, 1, 1, 1)):

if (across_spatial_) {
  // add eps to avoid overflow
  norm_data[n] = pow(caffe_cpu_asum<Dtype>(dim, buffer_data)+eps_,
                     Dtype(0.5));
  caffe_cpu_scale<Dtype>(dim, Dtype(1.0 / norm_data[n]), bottom_data,
                         top_data);
}

The first statement computes the sum of all elements of buffer_data (adding eps_ to guard the upcoming division), then takes the square root.
The second statement multiplies every element of bottom_data by the reciprocal of norm_data[n] and writes the result into top_data. That completes the normalization for across_spatial == true.
Next comes the first else, i.e. the across_spatial == false case:
} else {
  caffe_cpu_gemv<Dtype>(CblasTrans, channels, spatial_dim, Dtype(1),
                        buffer_data, sum_channel_multiplier, Dtype(1),
                        norm_data);
  // compute norm
  caffe_powx<Dtype>(spatial_dim, norm_data, Dtype(0.5), norm_data);
  // scale the layer
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, spatial_dim,
                        1, Dtype(1), sum_channel_multiplier, norm_data,
                        Dtype(0), buffer_data);
  caffe_div<Dtype>(dim, bottom_data, buffer_data, top_data);
  norm_data += spatial_dim;
}

We already described above what this code accomplishes; it is conceptually simple, just more involved to implement than the across_spatial == true case. There are four operations here; let's go through them one by one. (Note that norm_data now has shape (num, 1, height, width).)

The first operation:
norm_data = 1 * buffer_data^T * sum_channel_multiplier + 1 * norm_data
(In the full caffe-ssd implementation, norm_data is pre-filled with eps_ before the num loop, so the beta of 1 here accumulates the channel sums on top of eps_; with a freshly zeroed buffer a beta of 0 would work just as well.)
Here buffer_data has shape channels x spatial_dim (it is transposed in the computation, since the first argument is CblasTrans),
sum_channel_multiplier has shape channels x 1 (all of its elements are 1),
and the portion of norm_data taking part in the computation has shape spatial_dim x 1.
So this step sums the squared elements over the channel dimension and stores the results in norm_data, one value per spatial position (spatial_dim = height*width of them).
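Written out as plain loops (a sketch, not the actual implementation), this transposed gemv is just a per-pixel sum over channels; beta = 1 means the sums accumulate onto whatever norm_data already holds:

// buffer_data holds the squared inputs; for each spatial position s,
// add up the squares over the channel dimension.
for (int s = 0; s < spatial_dim; ++s)
  for (int c = 0; c < channels; ++c)
    norm_data[s] += buffer_data[c * spatial_dim + s];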
The second operation:
caffe_powx<Dtype>(spatial_dim, norm_data, Dtype(0.5), norm_data);
takes the square root of each element of the norm_data obtained above.
The third operation:
buffer_data = sum_channel_multiplier * norm_data
where buffer_data is channels x spatial_dim,
sum_channel_multiplier is channels x 1 (all elements are 1),
and norm_data is 1 x spatial_dim.
This replicates norm_data channels times to fill buffer_data, setting up the element-wise division that follows.
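This use of gemm is really a rank-1 outer product serving as a broadcast; written as plain loops (a sketch):

// sum_channel_multiplier is all ones, so row c of buffer_data
// simply becomes a copy of the per-pixel norms in norm_data.
for (int c = 0; c < channels; ++c)
  for (int s = 0; s < spatial_dim; ++s)
    buffer_data[c * spatial_dim + s] = norm_data[s];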
The fourth operation:
caffe_div<Dtype>(dim, bottom_data, buffer_data, top_data);
computes top_data = bottom_data / buffer_data (element-wise).
Finally, don't forget that norm_data is advanced by spatial_dim, so that the next num can be processed.
That completes the normalization for across_spatial == false.
Next comes the second if (parallel to, not nested inside, the if above; that is, it runs after the normalization above has finished):

if (channel_shared_) {   // default is true
  caffe_scal<Dtype>(dim, scale[0], top_data);  // multiply every element by scale[0]
}

If channel_shared is true, the whole of top_data is multiplied by the single scalar scale[0].

else {
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, spatial_dim,
                        1, Dtype(1), scale, sum_spatial_multiplier,
                        Dtype(0), buffer_data);
  caffe_mul<Dtype>(dim, top_data, buffer_data, top_data);
}

The operation performed is:
buffer_data = scale * sum_spatial_multiplier
where buffer_data is channels x spatial_dim,
scale is channels x 1,
and sum_spatial_multiplier is 1 x spatial_dim (all elements are 1).
In effect, scale is replicated spatial_dim times into a mask, ready for the element-wise multiplication:
caffe_mul<Dtype>(dim, top_data, buffer_data, top_data);
Finally come bottom_data += dim; top_data += dim;, mentioned above, which move on to normalizing the next num.
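The same broadcast idea can be written as plain loops; a sketch of the whole channel_shared == false scaling, fusing the tiling and the multiplication:

// each channel c of top_data is scaled by its own learned scale[c]
for (int c = 0; c < channels; ++c)
  for (int s = 0; s < spatial_dim; ++s)
    top_data[c * spatial_dim + s] *= scale[c];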
