Background:
Neural network parameters are usually initialized randomly. If the weights are initialized to all zeros, the neurons in each layer degenerate into a single one: every neuron in a layer computes the same activation and receives the same gradient update, so they stay identical throughout training and all but one of them are effectively wasted. The connections between layers still function, but a multi-layer network with effectively one neuron per layer, do you really find that interesting? If you have any thoughts, feel free to leave a comment.
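To see why, note that if every hidden neuron starts with the same weight vector, they all produce the same activation, so the gradient flowing back to each of them is also the same and the weights can never separate. Here is a minimal sketch of that argument (toy values, not the script below):

import numpy as np

sigmoid = lambda z: 1.0/(1.0 + np.exp(-z))

x  = np.array([[0.5, 0.9, 1.0]])     # one sample, 3 inputs
y  = np.array([[1.0]])
W1 = np.full((3, 4), 0.7)            # 4 hidden neurons, all starting identical
W2 = np.full((4, 1), 0.7)

a1 = sigmoid(x.dot(W1))              # all 4 hidden activations are equal
a2 = sigmoid(a1.dot(W2))
d2 = (y - a2) * a2 * (1.0 - a2)      # output-layer delta
d1 = d2.dot(W2.T) * a1 * (1.0 - a1)  # hidden delta: identical for every neuron
W1 += x.T.dot(d1)
print(np.allclose(W1, W1[:, [0]]))   # True: after the update the columns of W1 are still equal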
Code test:
All-zero initialization of a 2-layer neural network
# -*- coding: utf-8 -*-
__author__ = 'jasonliu'
#Explore the effect of the neural network's weight initialization
#Case 1: initialize everything to 0
#Case 2: initialize everything to the same non-zero value
import numpy as np
def nonlin(x, deriv=False):
    #Sigmoid; with deriv=True, returns the derivative expressed in terms of
    #the sigmoid output (x is assumed to already be sigmoid(z))
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))
X = np.array([[0.5,0.9,1],
              [2,1,1],
              [0.3,0.6,1],
              [1.5,0.9,0.6]])
#X stacks the samples along the row direction (one sample per row)
Y = np.array([[1],
              [3],
              [2],
              [0]])
#Y also stacks the samples along the row direction (one target per row)
np.random.seed(1)
# randomly initialize our weights with mean 0
# syn0 = 2*np.random.random((3,4)) - 1
# syn1 = 2*np.random.random((4,1)) - 1
#All-zero initialization for this experiment (the random version above is commented out)
W1 = 2*np.zeros((3,4))# + 1
W2 = 2*np.zeros((4,1))# + 1
for j in range(60000):
    # Feed forward through layers 0, 1, and 2
    A0 = X
    Z1 = np.dot(A0, W1)
    A1 = nonlin(Z1)
    Z2 = np.dot(A1, W2)
    A2 = nonlin(Z2)
    # how much did we miss the target value?
    dZ_2 = Y - A2  # output error (residual)
    if (j % 10000) == 0:
        print("Error:" + str(np.mean(np.abs(dZ_2))))
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = dZ_2*nonlin(A2, deriv=True)  # delta of the output layer
    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(W2.T)
    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(A1, deriv=True)
    W2 += A1.T.dot(l2_delta)
    W1 += A0.T.dot(l1_delta)
print("Output After Training:")
print("W1=", W1)
print("W2=", W2)
#From the result you can see that W1 is repeated along the column direction.
#Pay attention to which dimension is rows vs. columns, and whether samples are stacked along rows or columns.
Output:
Error:1.25
Error:1.0000091298568936
Error:1.0000044798865095
Error:1.000002957418707
Error:1.0000022037278755
Error:1.0000017545861548
Output After Training:
W1= [[0.58078498 0.58078498 0.58078498 0.58078498]
[0.72845083 0.72845083 0.72845083 0.72845083]
[1.33742659 1.33742659 1.33742659 1.33742659]]
W2= [[3.52357914]
[3.52357914]
[3.52357914]
[3.52357914]]
As you can see, repetition appears: W1 is identical along the column direction, i.e. every neuron in that layer has learned exactly the same weights. (The error also plateaus near 1.0 because a sigmoid output can never reach targets like 2 or 3, but the point here is the symmetry, not the accuracy.)
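If you want to verify the symmetry programmatically rather than by eye, a quick check appended to the end of the script above (reusing its W1 and W2) could be:

#Every column of W1 should equal the first column, and every row of W2
#should equal the first row, under all-zero initialization.
print(np.allclose(W1, W1[:, [0]]))   # True
print(np.allclose(W2, W2[[0], :]))   # True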
All-2 initialization of the 2-layer neural network
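The code for this run is not shown here; presumably the only change to the script above is the initialization line, for example:

#Assumed change for the all-2 experiment: every weight starts at the same
#non-zero constant 2 instead of 0.
W1 = 2*np.ones((3,4))
W2 = 2*np.ones((4,1))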
The output is as follows:
Error:1.0001879134151608
Error:1.0000064142342748
Error:1.0000032676762678
Error:1.0000021930282932
Error:1.0000016505669969
Error:1.0000013233782656
Output After Training:
W1= [[2.0085157 2.0085157 2.0085157 2.0085157 ]
[2.02205683 2.02205683 2.02205683 2.02205683]
[2.03953857 2.03953857 2.03953857 2.03953857]]
W2= [[3.30069379]
[3.30069379]
[3.30069379]
[3.30069379]]
The result is similar: along the column direction the neurons are all identical, so the symmetry persists even though the shared initial value is non-zero.
Random initialization
W1 = 2*np.random.random((3,4)) - 1
W2 = 2*np.random.random((4,1)) - 1
Output:
W1= [[ 0.08581783 1.08039398 -1.16536044 0.27396062]
[-0.48584844 0.29602972 -0.86136823 0.54469744]
[ 0.24509319 2.23500284 -0.5412316 2.23673393]]
W2= [[1.23731123]
[6.40888963]
[0.09966753]
[5.78541642]]
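This time the symmetry is broken: the columns of W1 differ from the start and stay different, so each hidden neuron learns its own weight vector. The same check as before, appended after training, now confirms it:

#With random initialization the hidden neurons are no longer interchangeable.
print(np.allclose(W1, W1[:, [0]]))   # False: the columns of W1 differ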