Normalize Layer in Caffe

This post examines the three key parameters of Caffe's Normalize layer (across_spatial, scale_filler, and channel_shared) and uses small experiments to show how each one affects the forward pass and how the scale parameter is updated during back propagation.

message NormalizeParameter {
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}

The experiments below pin down the exact meaning of the three parameters: across_spatial, scale_filler, and channel_shared.

  1. across_spatial: for each sample, the tensor reaching the norm layer has shape (1, c, h, w); across_spatial controls whether normalization spans spatial positions (h, w). With across_spatial=False, the c channel values at each spatial position (the feature vector at that position) are normalized independently. With across_spatial=True, normalization runs over all c*h*w elements of the sample (see the NumPy sketch after this list).
  2. scale_filler: initializes a learnable scale parameter (see the prototxt below). As with convolution weights, the learning rate decides whether it is updated; setting the learning rate to 0 yields a constant scaling.
  3. channel_shared: controls whether the scale parameter is shared across channels. With channel_shared=True the parameter has shape (1,), i.e. a single learnable scalar; with channel_shared=False it has shape (c,), one value per channel, where c is the number of channels.
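
To make the two modes concrete, here is a minimal NumPy sketch of the forward computation implied by these definitions (my own illustration, not Caffe's source; it assumes eps is added before the square root, matching the eps field in the message above):

import numpy as np

def normalize_forward(x, scale, across_spatial=True, eps=1e-10):
    # x has shape (n, c, h, w); scale is a scalar (channel_shared=True)
    # or an array of shape (c,) (channel_shared=False)
    if across_spatial:
        # one L2 norm per sample, taken over all c*h*w elements
        norm = np.sqrt(np.sum(x**2, axis=(1, 2, 3), keepdims=True) + eps)
    else:
        # one L2 norm per spatial position, taken over the c channels
        norm = np.sqrt(np.sum(x**2, axis=1, keepdims=True) + eps)
    scale = np.asarray(scale, dtype=x.dtype).reshape(1, -1, 1, 1)
    return x / norm * scale

The hand checks in the script below compute exactly these quantities for a single spatial position.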

Check 1: forward pass

deploy.prototxt:

name: "demo"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 2 dim: 3 dim: 2 dim: 3 } }
}
layer {
  name: "norm"
  type: "Normalize"
  bottom: "data"
  top: "norm"
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: true
  }
}

main.py:

#coding=UTF-8
import caffe
import numpy as np
caffe.set_mode_cpu()

input_data = np.zeros(shape=(2,3,2,3),dtype=np.float32)

input_data[0,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[0,1,:,:] = np.array([[1,1,2],[4,5,6]])
input_data[0,2,:,:] = np.array([[1,2,2],[4,5,6]])


input_data[1,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[1,1,:,:] = np.array([[1,2,3],[4,4,6]])
input_data[1,2,:,:] = np.array([[1,2,3],[4,5,5]])


deploy_pro = 'deploy.prototxt'
# pass any existing caffemodel just so the constructor succeeds; its layer names
# do not match this deploy net, so no weights are copied (scale comes from scale_filler)
weight_file = '../pytorch-caffe-master/ZOO_AlexNet/bvlc_alexnet.caffemodel' # not used
net = caffe.Net(deploy_pro,weight_file,caffe.TEST)

shape = input_data.shape
net.blobs['data'].reshape(shape[0],shape[1],shape[2],shape[3])
net.blobs['data'].data[...] = input_data

net.forward()

result = net.blobs['norm'].data

print(result)

from caffe.proto import caffe_pb2
import google.protobuf.text_format
net = caffe_pb2.NetParameter()
f = open(deploy_pro, 'r')
net = google.protobuf.text_format.Merge(str(f.read()), net)
f.close()

across_spatial = True
channel_shared = True
scale_type     = ''
scale_value    = 0

for i in range(0, len(net.layer)):
    if net.layer[i].type == 'Normalize':
        # read the Normalize layer's parameters straight from the parsed prototxt
        across_spatial = net.layer[i].norm_param.across_spatial
        channel_shared = net.layer[i].norm_param.channel_shared
        scale_type = net.layer[i].norm_param.scale_filler.type
        scale_value = net.layer[i].norm_param.scale_filler.value
        break
print('The parameters in Normalize layer:')
print(across_spatial)
print(channel_shared)
print(scale_type)
print(scale_value)
if across_spatial == False and channel_shared == True and abs(scale_value-1.0) < 1e-10:
    print('when: across_spatial == False, channel_shared == True, scale_value = 1')
    # across_spatial = False: normalize each spatial position (h, w) across the c channels
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0, 2]  # a spatial position (h, w) in the n * c * h * w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(temp_result**2))
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == True and abs(scale_value-1.0) < 1e-10:
    print('when: across_spatial == True, channel_shared == True, scale_value = 1, check for across_spatial')
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0, 2]  # a spatial position (h, w) in the n * c * h * w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # the norm runs over all c*h*w elements
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == True and abs(scale_value-1.0) >= 0.5: # true when scale_value != 1, e.g. value: 6 above
    print('when: across_spatial == True, channel_shared == True, scale_value != 1, check for scale_value')
    # across_spatial = True: normalize over all c*h*w elements, then multiply by the scale
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0, 2]  # a spatial position (h, w) in the n * c * h * w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # the norm runs over all c*h*w elements
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == False and abs(scale_value-1.0) >= 0.5: # true when scale_value != 1, e.g. value: 6 above
    print('when: across_spatial == True, channel_shared == False, scale_value != 1, check for channel_shared')
    #####################################################################
    # note: with a constant filler every channel starts from the same
    # scale, so per-channel differences only appear after back propagation
    #####################################################################
    # across_spatial = True: normalize over all c*h*w elements, then scale each channel
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0, 2]  # a spatial position (h, w) in the n * c * h * w blob
    # the result computed by hand
    temp_result = input_data[i, :, position[0], position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # the norm runs over all c*h*w elements
    result_byhand = result_byhand * scale_value
    # the result computed by the Normalize layer
    result_bylayer = result[i, :, position[0], position[1]]

    print(result_byhand)
    print(result_bylayer)

Check 2: updating the scale

Like ordinary convolution-layer weights, the value held by the scale parameter is updated at every training iteration. For example, append the norm layer above after AlexNet's pool5 layer (whose output blob has shape (n, c, h, w) = (n, 256, 6, 6)) and print the scale after each iteration:

scale_value = mysolver.net.params['norm'][0].data
print(scale_value)

Result:

5.9999228
5.999773
5.999566
5.99917
5.9985614
5.997842
5.9970937
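
For reference, a minimal sketch of the loop that produces prints like the ones above ('solver.prototxt' is a hypothetical name for a solver definition covering the modified AlexNet):

import caffe

caffe.set_mode_cpu()
mysolver = caffe.SGDSolver('solver.prototxt')  # hypothetical solver definition
for it in range(7):
    mysolver.step(1)  # one forward/backward pass plus parameter update
    print(mysolver.net.params['norm'][0].data)  # the scale parameter of the norm layer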

To keep scale_value fixed at a constant, set its learning rate to 0, just as you would for convolution parameters:

layer {
  name: "norm"
  type: "Normalize"
  bottom: "pool5"
  top: "norm"
  param {
    lr_mult: 0 # learning rate set to 0
    decay_mult: 0
  }
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: true
  }
}

Check 3: channel_shared

channel_shared defaults to True, so we change it to False in the prototxt:

layer {
  name: "norm"
  type: "Normalize"
  bottom: "pool5"
  top: "norm"
  param {
    lr_mult: 0 # learning rate set to 0
    decay_mult: 0
  }
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: false
  }
}

Then print scale_value (or its shape/length) after each iteration. As described in references [1] and [2], its length should be 256, i.e. one scale value per channel.

scale_value = mysolver.net.params['norm'][0].data
print(scale_value.shape)  # result: (256,)
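
To tie this back to the NumPy sketch after the parameter list, per-channel scaling can be checked by hand as follows (a sketch reusing the normalize_forward helper defined earlier; the random input is purely illustrative):

import numpy as np

x = np.random.rand(1, 256, 6, 6).astype(np.float32)
scale = np.full(256, 6, dtype=np.float32)  # one scale value per channel
y = normalize_forward(x, scale, across_spatial=True)
# channel c of the output equals scale[c] times the globally normalized input
c = 10
assert np.allclose(y[0, c], x[0, c] / np.sqrt(np.sum(x**2)) * scale[c], atol=1e-5)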

References:
1. caffe中的normalization_layer: https://blog.youkuaiyun.com/zqjackking/article/details/69938901
2. Normalization on conv4_3 in SSD: https://blog.youkuaiyun.com/weixin_35653315/article/details/72715367
3. Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers.
4. test_normalize_layer.cpp in the SSD code base.
