message NormalizeParameter {
  // Whether to normalize over the whole (h, w) extent or per spatial position.
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}
The experiments below examine the concrete meaning of three parameters: across_spatial, scale_filler, and channel_shared.
- across_spatial: for each sample, the tensor arriving at the norm layer has shape (1,c,h,w); across_spatial indicates whether normalization spans spatial positions (i.e. (h,w)). If across_spatial=False, the c channel elements at each position in (h,w) (that position's feature vector) are normalized independently. If across_spatial=True, normalization is computed over all c*h*w elements of the sample.
- scale_filler: holds learnable parameters (see the prototxt below). As with convolution-layer parameters, the learning rate decides whether they are updated; setting it to 0, for example, yields a constant scaling.
- channel_shared: controls whether the scale_filler parameters are shared across channels. If channel_shared=True, the parameter has shape (1,), i.e. a single learnable scalar; if channel_shared=False, it has shape (c,), where c is the number of channels.
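The three parameters above can be sketched in NumPy. This is an illustration of the math only, not the layer's actual implementation; the function name `normalize_forward` is ours, and `eps` mirrors the proto default:

```python
import numpy as np

def normalize_forward(x, scale, across_spatial=True, eps=1e-10):
    """L2-normalize one sample x of shape (c, h, w), then multiply by scale.

    scale is a scalar when channel_shared=True, or an array of shape (c,)
    when channel_shared=False (one value per channel).
    """
    if across_spatial:
        norm = np.sqrt(np.sum(x ** 2) + eps)          # one norm over all c*h*w elements
    else:
        norm = np.sqrt(np.sum(x ** 2, axis=0) + eps)  # one norm per (h, w) position
    scale = np.asarray(scale, dtype=x.dtype)
    if scale.ndim == 1:                 # per-channel scale of shape (c,)
        scale = scale[:, None, None]    # broadcast over (h, w)
    return x / norm * scale

x = np.arange(1, 13, dtype=np.float32).reshape(3, 2, 2)
y = normalize_forward(x, 1.0, across_spatial=False)
print(np.sum(y ** 2, axis=0))  # each position's channel vector now has unit squared norm
```

With across_spatial=False the channel vector at every spatial position is unit-length after normalization; with across_spatial=True the whole sample is.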
Check 1: the forward pass
deploy.prototxt:
name: "demo"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 2 dim: 3 dim: 2 dim: 3 } }
}
layer {
name: "norm"
type: "Normalize"
bottom: "data"
top: "norm"
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: true
}
}
main.py:
#coding=UTF-8
import caffe
import numpy as np
caffe.set_mode_cpu()
input_data = np.zeros(shape=(2,3,2,3),dtype=np.float32)
input_data[0,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[0,1,:,:] = np.array([[1,1,2],[4,5,6]])
input_data[0,2,:,:] = np.array([[1,2,2],[4,5,6]])
input_data[1,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[1,1,:,:] = np.array([[1,2,3],[4,4,6]])
input_data[1,2,:,:] = np.array([[1,2,3],[4,5,5]])
deploy_pro = 'deploy.prototxt'
# pass any existing caffemodel; its layer names do not match this net, so no weights are actually loaded
weight_file = '../pytorch-caffe-master/ZOO_AlexNet/bvlc_alexnet.caffemodel'  # not used
net = caffe.Net(deploy_pro,weight_file,caffe.TEST)
shape = input_data.shape
net.blobs['data'].reshape(shape[0],shape[1],shape[2],shape[3])
net.blobs['data'].data[...] = input_data
net.forward()
result = net.blobs['norm'].data
print(result)
from caffe.proto import caffe_pb2
import google.protobuf.text_format
net = caffe_pb2.NetParameter()
with open(deploy_pro, 'r') as f:
    google.protobuf.text_format.Merge(f.read(), net)
across_spatial = True
channel_shared = True
scale_type = ''
scale_value = 0
for i in range(0, len(net.layer)):
    if net.layer[i].type == 'Normalize':
        across_spatial = net.layer[i].norm_param.across_spatial
        channel_shared = net.layer[i].norm_param.channel_shared
        scale_type = net.layer[i].norm_param.scale_filler.type
        scale_value = net.layer[i].norm_param.scale_filler.value
        break
print('The parameters in Normalize layer:')
print(across_spatial)
print(channel_shared)
print(scale_type)
print(scale_value)
if across_spatial == False and channel_shared == True and abs(scale_value - 1.0) < 1e-10:
    print('when: across_spatial == False, channel_shared == True, scale_value = 1')
    # across_spatial = False: normalize each position (h, w) across the c channels
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(temp_result ** 2))
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == True and abs(scale_value - 1.0) < 1e-10:
    print('when: across_spatial == True, channel_shared == True, scale_value = 1, check for across_spatial')
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == True and abs(scale_value - 1.0) >= 0.5:  # scale_value != 1, e.g. 6 in our prototxt
    print('when: across_spatial == True, channel_shared == True, scale_value != 1, check for scale_value')
    # across_spatial = True: normalize over all c*h*w elements, then multiply by the shared scale
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == False and abs(scale_value - 1.0) >= 0.5:  # scale_value != 1, e.g. 6 in our prototxt
    print('when: across_spatial == True, channel_shared == False, scale_value != 1, check for channel_shared')
    #####################################################################
    # need back propagation for per-channel scales to diverge; with a
    # constant filler every channel starts from the same value
    #####################################################################
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
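For the across_spatial=true, channel_shared=true, value=6 configuration above, all of these checks collapse to one property: the output of a sample has L2 norm equal to scale_value. A self-contained NumPy sketch of that property (synthetic data, `expected_output` is our name, not the layer itself):

```python
import numpy as np

def expected_output(x, scale_value=6.0, eps=1e-10):
    # one L2 norm over all c*h*w elements of the sample, then a shared scalar scale
    return x / np.sqrt(np.sum(x ** 2) + eps) * scale_value

x = np.random.rand(3, 2, 3).astype(np.float32) + 0.1
y = expected_output(x)
print(np.sqrt(np.sum(y ** 2)))  # ~6.0, the configured scale value
```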
Check 2: updating the scale
Like ordinary convolution-layer parameters, the value in scale_filler is updated at every iteration. For example, append a norm layer as above after the pool5 layer of AlexNet (whose output blob has shape (n*c*h*w) = (n*256*6*6)), then print after each iteration:
scale_value = mysolver.net.params['norm'][0].data
print(scale_value)
Result:
5.9999228
5.999773
5.999566
5.99917
5.9985614
5.997842
5.9970937
To fix scale_value at a constant value, set its learning rate to 0, just as for convolution parameters:
layer {
name: "norm"
type: "Normalize"
bottom: "pool5"
top: "norm"
param {
lr_mult: 0  # set the learning rate to 0
decay_mult: 0
}
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: true
}
}
Check 3: channel_shared
By default, channel_shared is true, so here we set it to false in the prototxt.
layer {
name: "norm"
type: "Normalize"
bottom: "pool5"
top: "norm"
param {
lr_mult: 0  # set the learning rate to 0
decay_mult: 0
}
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: false
}
}
Then, after each iteration, print scale_value (or its shape/length). According to references [1] and [2], its length should be 256, i.e. one scale_value per channel.
scale_value = mysolver.net.params['norm'][0].data
print(scale_value.shape)  # result: (256,)
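With channel_shared: false, that (256,)-shaped scale multiplies the normalized blob channel-wise. In NumPy terms (illustrative shapes and values only):

```python
import numpy as np

n, c, h, w = 2, 256, 6, 6
normalized = np.random.rand(n, c, h, w).astype(np.float32)
scale = np.full((c,), 6.0, dtype=np.float32)   # one learnable value per channel

# reshape (c,) -> (1, c, 1, 1) so it broadcasts over n, h and w
out = normalized * scale.reshape(1, c, 1, 1)
print(out.shape)  # (2, 256, 6, 6)
```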
References:
1. https://blog.youkuaiyun.com/zqjackking/article/details/69938901 [caffe中的normalization_layer]
2. https://blog.youkuaiyun.com/weixin_35653315/article/details/72715367 [Normalization on conv4_3 in SSD]
3. Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
4. test_normalize_layer.cpp in SSD.