message NormalizeParameter {
  // Whether to normalize over the whole (h, w) extent or per spatial position.
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}
The experiments below examine the concrete meaning of three parameters: across_spatial, scale_filler, and channel_shared.
- across_spatial: for each sample, the tensor arriving at the norm layer has shape (1,c,h,w); across_spatial indicates whether normalization spans spatial positions (i.e. (h,w)). If across_spatial=False, the c channel elements at each position in (h,w) (that position's feature vector) are normalized independently. If across_spatial=True, normalization is computed over all c*h*w elements of the sample.
- scale_filler: holds learnable parameters (see the prototxt below). As with convolution-layer parameters, the learning rate decides whether they are updated; setting it to 0, for example, yields a constant scaling.
- channel_shared: controls whether the scale_filler parameters are shared across channels. If channel_shared=True, the parameter has shape (1,), i.e. a single learnable scalar; if channel_shared=False, it has shape (c,), where c is the number of channels.
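The three parameters above can be sketched in NumPy. This is an illustration of the math only, not the layer's actual implementation; the function name `normalize_forward` is ours, and `eps` mirrors the proto default:

```python
import numpy as np

def normalize_forward(x, scale, across_spatial=True, eps=1e-10):
    """L2-normalize one sample x of shape (c, h, w), then multiply by scale.

    scale is a scalar when channel_shared=True, or an array of shape (c,)
    when channel_shared=False (one value per channel).
    """
    if across_spatial:
        norm = np.sqrt(np.sum(x ** 2) + eps)          # one norm over all c*h*w elements
    else:
        norm = np.sqrt(np.sum(x ** 2, axis=0) + eps)  # one norm per (h, w) position
    scale = np.asarray(scale, dtype=x.dtype)
    if scale.ndim == 1:                 # per-channel scale of shape (c,)
        scale = scale[:, None, None]    # broadcast over (h, w)
    return x / norm * scale

x = np.arange(1, 13, dtype=np.float32).reshape(3, 2, 2)
y = normalize_forward(x, 1.0, across_spatial=False)
print(np.sum(y ** 2, axis=0))  # each position's channel vector now has unit squared norm
```

With across_spatial=False the channel vector at every spatial position is unit-length after normalization; with across_spatial=True the whole sample is.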
Check 1: the forward pass
deploy.prototxt:
name: "demo"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 2 dim: 3 dim: 2 dim: 3 } }
}
layer {
name: "norm"
type: "Normalize"
bottom: "data"
top: "norm"
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: true
}
}
main.py:
#coding=UTF-8
import caffe
import numpy as np
caffe.set_mode_cpu()
input_data = np.zeros(shape=(2,3,2,3),dtype=np.float32)
input_data[0,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[0,1,:,:] = np.array([[1,1,2],[4,5,6]])
input_data[0,2,:,:] = np.array([[1,2,2],[4,5,6]])
input_data[1,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[1,1,:,:] = np.array([[1,2,3],[4,4,6]])
input_data[1,2,:,:] = np.array([[1,2,3],[4,5,5]])
deploy_pro = 'deploy.prototxt'
# pass any existing caffemodel; its layer names do not match this net, so no weights are actually loaded
weight_file = '../pytorch-caffe-master/ZOO_AlexNet/bvlc_alexnet.caffemodel'  # not used
net = caffe.Net(deploy_pro,weight_file,caffe.TEST)
shape = input_data.shape
net.blobs['data'].reshape(shape[0],shape[1],shape[2],shape[3])
net.blobs['data'].data[...] = input_data
net.forward()
result = net.blobs['norm'].data
print(result)
from caffe.proto import caffe_pb2
import google.protobuf.text_format
net = caffe_pb2.NetParameter()
with open(deploy_pro, 'r') as f:
    google.protobuf.text_format.Merge(f.read(), net)
across_spatial = True
channel_shared = True
scale_type = ''
scale_value = 0
for i in range(0, len(net.layer)):
    if net.layer[i].type == 'Normalize':
        across_spatial = net.layer[i].norm_param.across_spatial
        channel_shared = net.layer[i].norm_param.channel_shared
        scale_type = net.layer[i].norm_param.scale_filler.type
        scale_value = net.layer[i].norm_param.scale_filler.value
        break
print('The parameters in Normalize layer:')
print(across_spatial)
print(channel_shared)
print(scale_type)
print(scale_value)
if across_spatial == False and channel_shared == True and abs(scale_value - 1.0) < 1e-10:
    print('when: across_spatial == False, channel_shared == True, scale_value = 1')
    # across_spatial = False: normalize each position (h, w) across the c channels
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(temp_result ** 2))
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == True and abs(scale_value - 1.0) < 1e-10:
    print('when: across_spatial == True, channel_shared == True, scale_value = 1, check for across_spatial')
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == True and abs(scale_value - 1.0) >= 0.5:  # scale_value != 1, e.g. 6 in our prototxt
    print('when: across_spatial == True, channel_shared == True, scale_value != 1, check for scale_value')
    # across_spatial = True: normalize over all c*h*w elements, then multiply by the shared scale
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
if across_spatial == True and channel_shared == False and abs(scale_value - 1.0) >= 0.5:  # scale_value != 1, e.g. 6 in our prototxt
    print('when: across_spatial == True, channel_shared == False, scale_value != 1, check for channel_shared')
    #####################################################################
    # need back propagation for per-channel scales to diverge; with a
    # constant filler every channel starts from the same value
    #####################################################################
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0  # i = 0 or i = 1 in our case
    position = [0, 2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i, :, :, :] ** 2))  # one norm over the whole sample
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]
    print(result_byhand)
    print(result_bylayer)
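For the across_spatial=true, channel_shared=true, value=6 configuration above, all of these checks collapse to one property: the output of a sample has L2 norm equal to scale_value. A self-contained NumPy sketch of that property (synthetic data, `expected_output` is our name, not the layer itself):

```python
import numpy as np

def expected_output(x, scale_value=6.0, eps=1e-10):
    # one L2 norm over all c*h*w elements of the sample, then a shared scalar scale
    return x / np.sqrt(np.sum(x ** 2) + eps) * scale_value

x = np.random.rand(3, 2, 3).astype(np.float32) + 0.1
y = expected_output(x)
print(np.sqrt(np.sum(y ** 2)))  # ~6.0, the configured scale value
```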
Check 2: updating the scale
Like ordinary convolution-layer parameters, the value in scale_filler is updated at every iteration. For example, append a norm layer as above after the pool5 layer of AlexNet (whose output blob has shape (n*c*h*w) = (n*256*6*6)), then print after each iteration:
scale_value = mysolver.net.params['norm'][0].data
print(scale_value)
Result:
5.9999228
5.999773
5.999566
5.99917
5.9985614
5.997842
5.9970937
To fix scale_value at a constant value, set its learning rate to 0, just as for convolution parameters:
layer {
name: "norm"
type: "Normalize"
bottom: "pool5"
top: "norm"
param {
lr_mult: 0  # set the learning rate to 0
decay_mult: 0
}
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: true
}
}
Check 3: channel_shared
By default, channel_shared is true, so here we set it to false in the prototxt.
layer {
name: "norm"
type: "Normalize"
bottom: "pool5"
top: "norm"
param {
lr_mult: 0  # set the learning rate to 0
decay_mult: 0
}
norm_param {
across_spatial: true
scale_filler {
type: "constant"
value: 6
}
channel_shared: false
}
}
Then, after each iteration, print scale_value (or its shape/length). According to references [1] and [2], its length should be 256, i.e. one scale_value per channel.
scale_value = mysolver.net.params['norm'][0].data
print(scale_value.shape)  # result: (256,)
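With channel_shared: false, that (256,)-shaped scale multiplies the normalized blob channel-wise. In NumPy terms (illustrative shapes and values only):

```python
import numpy as np

n, c, h, w = 2, 256, 6, 6
normalized = np.random.rand(n, c, h, w).astype(np.float32)
scale = np.full((c,), 6.0, dtype=np.float32)   # one learnable value per channel

# reshape (c,) -> (1, c, 1, 1) so it broadcasts over n, h and w
out = normalized * scale.reshape(1, c, 1, 1)
print(out.shape)  # (2, 256, 6, 6)
```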
References:
1. https://blog.youkuaiyun.com/zqjackking/article/details/69938901 [caffe中的normalization_layer]
2. https://blog.youkuaiyun.com/weixin_35653315/article/details/72715367 [Normalization on conv4_3 in SSD]
3. Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
4. test_normalize_layer.cpp in SSD.