NumPy implementation
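# Fragment from the network's parameter setup. It assumes `import numpy as np`
# and that `layer`, `in_dim` and `out_dim` are set by the enclosing code (not shown).
# Weights have shape [out_dim, in_dim]; biases have shape [out_dim].
if self.initialization == 'zeros':
    # All parameters start at 0.
    self.W[layer] = np.zeros([out_dim, in_dim])
    self.b[layer] = np.zeros([out_dim])
elif self.initialization == 'ones':
    # All parameters start at 1.
    self.W[layer] = np.ones([out_dim, in_dim])
    self.b[layer] = np.ones([out_dim])
elif self.initialization == 'normal':
    # Standard normal N(0, 1), with no scaling by layer size.
    self.W[layer] = np.random.normal(loc=0., scale=1., size=[out_dim, in_dim])
    self.b[layer] = np.random.normal(loc=0., scale=1., size=[out_dim])
elif self.initialization == 'xavier_Glorot_normal':
    # Standard normal scaled by 1 / sqrt(in_dim), i.e. std = sqrt(1 / in_dim).
    self.W[layer] = np.random.normal(loc=0., scale=1., size=[out_dim, in_dim]) / np.sqrt(in_dim)
    self.b[layer] = np.random.normal(loc=0., scale=1., size=[out_dim]) / np.sqrt(in_dim)
elif self.initialization == 'xavier_normal':
    # Normal with std = sqrt(2 / (in_dim + out_dim)).
    std = np.sqrt(2. / (in_dim + out_dim))
    self.W[layer] = np.random.normal(loc=0., scale=std, size=[out_dim, in_dim])
    self.b[layer] = np.random.normal(loc=0., scale=std, size=[out_dim])
elif self.initialization == 'uniform':
    # Uniform on [-a, a) with a = sqrt(1 / in_dim).
    a = np.sqrt(1. / in_dim)
    self.W[layer] = np.random.uniform(low=-a, high=a, size=[out_dim, in_dim])
    self.b[layer] = np.random.uniform(low=-a, high=a, size=[out_dim])
elif self.initialization == 'xavier_uniform':
    # Uniform on [-a, a) with a = sqrt(6 / (in_dim + out_dim)).
    a = np.sqrt(6. / (in_dim + out_dim))
    self.W[layer] = np.random.uniform(low=-a, high=a, size=[out_dim, in_dim])
    self.b[layer] = np.random.uniform(low=-a, high=a, size=[out_dim])
else:
    # Unknown scheme name: fail loudly instead of printing and exiting.
    raise ValueError("initialization error: unknown scheme %r" % self.initialization)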
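To compare the random schemes above on a common scale, it can help to look at the per-weight variance each one implies. The following is a minimal sketch, not part of the original network code: the helper name `weight_variances` and the example sizes (784 inputs, 64 outputs) are assumptions chosen for illustration. It relies on the fact that a uniform distribution on [-a, a] has variance a^2 / 3.

```python
import numpy as np

def weight_variances(in_dim, out_dim):
    """Theoretical per-weight variance implied by each random scheme.

    Hypothetical helper for illustration; the keys mirror the scheme names
    used in the initialization branch.
    """
    a_uniform = np.sqrt(1.0 / in_dim)
    a_xavier = np.sqrt(6.0 / (in_dim + out_dim))
    return {
        "normal": 1.0,
        "xavier_Glorot_normal": 1.0 / in_dim,
        "xavier_normal": 2.0 / (in_dim + out_dim),
        "uniform": a_uniform ** 2 / 3.0,        # Var(U(-a, a)) = a^2 / 3
        "xavier_uniform": a_xavier ** 2 / 3.0,  # equals 2 / (in_dim + out_dim)
    }

# Example sizes are an assumption: first hidden layer of a 784 -> 64 network.
variances = weight_variances(in_dim=784, out_dim=64)
for name, var in variances.items():
    print(f"{name:22s} std = {np.sqrt(var):.4f}")

# Empirical check of one scheme against its theoretical standard deviation.
rng = np.random.default_rng(0)
a = np.sqrt(6.0 / (784 + 64))
sample = rng.uniform(low=-a, high=a, size=(64, 784))
print("empirical xavier_uniform std:", sample.std())
```

Under these formulas, 'xavier_uniform' and 'xavier_normal' share the same variance and differ only in the shape of the distribution, while 'uniform' has one third of the variance of 'xavier_Glorot_normal'.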
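Performance comparison
MNIST dataset, input_feature_dim=784 (28*28)
MLP-64-64-softmax network (built by hand in NumPy, so it may contain bugs)
SGD optimizer, batch_size=128, max_epoch=100, lr=0.05
All values below are accuracy on the test set.
Note: only 100 epochs were run (100*128 = 12800 shuffled training samples), which means each "epoch" here amounts to a single mini-batch of 128 samples.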
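For orientation, the setup described above corresponds roughly to the forward pass sketched here. This is an assumption-laden reconstruction, not the author's network: the function names `softmax` and `forward`, the list `layer_dims`, the choice of 'xavier_normal' with relu, and the random stand-in batch (used in place of real MNIST images) are all illustrative. The output size of 10 follows from MNIST having ten digit classes.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W, b, activation):
    """Hidden layers apply `activation`; the last layer applies softmax."""
    h = x
    for layer in range(len(W) - 1):
        h = activation(h @ W[layer].T + b[layer])
    return softmax(h @ W[-1].T + b[-1])

rng = np.random.default_rng(0)
layer_dims = [784, 64, 64, 10]  # MLP-64-64-softmax on 28*28 inputs

# Initialise with the 'xavier_normal' rule, weights shaped [out_dim, in_dim].
W, b = [], []
for in_dim, out_dim in zip(layer_dims[:-1], layer_dims[1:]):
    std = np.sqrt(2.0 / (in_dim + out_dim))
    W.append(rng.normal(0.0, std, size=(out_dim, in_dim)))
    b.append(rng.normal(0.0, std, size=(out_dim,)))

# Stand-in batch of 128 samples with values in [0, 1); not real MNIST data.
x = rng.random(size=(128, 784))
probs = forward(x, W, b, activation=lambda z: np.maximum(z, 0.0))
print(probs.shape)
```

Each row of `probs` is a distribution over the ten classes, so the rows sum to 1.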
| activation | zeros | ones | normal | xavier_Glorot | xavier_normal | uniform | xavier_uniform |
| --- | --- | --- | --- | --- | --- | --- | --- |
| sigmoid | did not converge | did not converge | 0.838 | 0.756 | 0.623 | 0.347 | 0.645 |
| relu | did not converge | did not converge | did not converge | 0.895 | 0.895 | 0.881 | 0.896 |
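One possible reading of the relu row is that the unscaled 'normal' scheme produces very large pre-activations, which a learning rate of 0.05 may not cope with. The sketch below only illustrates that scale difference and does not reproduce the experiment: the helper `preactivation_stds` is hypothetical, biases are omitted for simplicity, and the input is a random stand-in batch rather than MNIST.

```python
import numpy as np

def preactivation_stds(std_fn, layer_dims, x, rng):
    """Standard deviation of the pre-activations at each layer of a relu MLP.

    `std_fn(in_dim, out_dim)` returns the weight std for a layer.
    Hypothetical helper; biases are left out to keep the sketch short.
    """
    h, stds = x, []
    for in_dim, out_dim in zip(layer_dims[:-1], layer_dims[1:]):
        W = rng.normal(0.0, std_fn(in_dim, out_dim), size=(out_dim, in_dim))
        z = h @ W.T
        stds.append(float(z.std()))
        h = np.maximum(z, 0.0)  # relu
    return stds

rng = np.random.default_rng(0)
layer_dims = [784, 64, 64, 10]
x = rng.random(size=(128, 784))  # stand-in for pixel values in [0, 1)

# 'normal': std = 1 regardless of layer size.
normal_stds = preactivation_stds(lambda i, o: 1.0, layer_dims, x, rng)
# 'xavier_normal': std = sqrt(2 / (in_dim + out_dim)).
xavier_stds = preactivation_stds(lambda i, o: np.sqrt(2.0 / (i + o)), layer_dims, x, rng)

print("normal       :", normal_stds)
print("xavier_normal:", xavier_stds)
```

With unit-variance weights the pre-activation scale grows from layer to layer, whereas the Xavier-scaled weights keep it within a much narrower range. Whether this is the actual cause of the non-convergence reported in the table is not established here.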