NumPy implementations of neural network initialization methods (normal, uniform, xavier) and a performance comparison

Kenn7

Category: Machine Learning. Tags: neural networks, machine learning, python

Article link: https://blog.youkuaiyun.com/kane7csdn/article/details/108896031

NumPy implementation

# Fragment from the layer-setup code. Assumes `import numpy as np`;
# in_dim / out_dim are the input / output sizes of the current layer.
if self.initialization == 'zeros':
    # all weights and biases set to 0
    self.W[layer] = np.zeros([out_dim, in_dim])
    self.b[layer] = np.zeros([out_dim])
elif self.initialization == 'ones':
    # all weights and biases set to 1
    self.W[layer] = np.ones([out_dim, in_dim])
    self.b[layer] = np.ones([out_dim])
elif self.initialization == 'normal':
    # standard normal N(0, 1), no scaling
    self.W[layer] = np.random.normal(loc=0., scale=1., size=[out_dim, in_dim])
    self.b[layer] = np.random.normal(loc=0., scale=1., size=[out_dim])
elif self.initialization == 'xavier_Glorot_normal':
    # standard normal divided by sqrt(in_dim), i.e. std = 1 / sqrt(in_dim)
    self.W[layer] = np.random.normal(loc=0., scale=1., size=[out_dim, in_dim]) / np.sqrt(in_dim)
    self.b[layer] = np.random.normal(loc=0., scale=1., size=[out_dim]) / np.sqrt(in_dim)
elif self.initialization == 'xavier_normal':
    # normal with std = sqrt(2 / (in_dim + out_dim))
    std = np.sqrt(2. / (in_dim + out_dim))
    self.W[layer] = np.random.normal(loc=0., scale=std, size=[out_dim, in_dim])
    self.b[layer] = np.random.normal(loc=0., scale=std, size=[out_dim])
elif self.initialization == 'uniform':
    # uniform on [-a, a) with a = sqrt(1 / in_dim)
    a = np.sqrt(1. / in_dim)
    self.W[layer] = np.random.uniform(low=-a, high=a, size=[out_dim, in_dim])
    self.b[layer] = np.random.uniform(low=-a, high=a, size=[out_dim])
elif self.initialization == 'xavier_uniform':
    # uniform on [-a, a) with a = sqrt(6 / (in_dim + out_dim))
    a = np.sqrt(6. / (in_dim + out_dim))
    self.W[layer] = np.random.uniform(low=-a, high=a, size=[out_dim, in_dim])
    self.b[layer] = np.random.uniform(low=-a, high=a, size=[out_dim])
else:
    # fail loudly on an unknown option instead of printing and exiting
    raise ValueError("unknown initialization: %r" % self.initialization)
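
The branch above lives inside a class that is not shown in the post. As a rough, self-contained sketch of the same formulas, the hypothetical helper below (`init_layer` and its `rng` argument are my own names, not part of the original code) returns the weight matrix and bias vector for a single layer. It uses NumPy's `default_rng` generator rather than the global `np.random` state, which is an assumption made here for reproducibility.

```python
import numpy as np


def init_layer(initialization, in_dim, out_dim, rng=None):
    """Return (W, b) for one dense layer; W has shape [out_dim, in_dim].

    Hypothetical helper: same formulas as the post, but as a plain function.
    """
    rng = np.random.default_rng() if rng is None else rng
    if initialization == 'zeros':
        return np.zeros([out_dim, in_dim]), np.zeros([out_dim])
    if initialization == 'ones':
        return np.ones([out_dim, in_dim]), np.ones([out_dim])

    # Map each random scheme to (distribution, scale).
    # For 'normal' entries the scale is the std; for 'uniform' it is the bound a.
    schemes = {
        'normal': ('normal', 1.0),
        'xavier_Glorot_normal': ('normal', np.sqrt(1.0 / in_dim)),
        'xavier_normal': ('normal', np.sqrt(2.0 / (in_dim + out_dim))),
        'uniform': ('uniform', np.sqrt(1.0 / in_dim)),
        'xavier_uniform': ('uniform', np.sqrt(6.0 / (in_dim + out_dim))),
    }
    if initialization not in schemes:
        raise ValueError(f"unknown initialization: {initialization!r}")

    dist, scale = schemes[initialization]
    if dist == 'normal':
        W = rng.normal(0.0, scale, size=[out_dim, in_dim])
        b = rng.normal(0.0, scale, size=[out_dim])
    else:
        W = rng.uniform(-scale, scale, size=[out_dim, in_dim])
        b = rng.uniform(-scale, scale, size=[out_dim])
    return W, b


# Usage: first hidden layer of a 784-64-64 MLP.
W, b = init_layer('xavier_uniform', in_dim=784, out_dim=64,
                  rng=np.random.default_rng(0))
print(W.shape, b.shape)  # (64, 784) (64,)
```

Keeping the scale formulas in one dictionary makes it easier to see that the five random schemes differ only in the distribution and in how the scale depends on `in_dim` and `out_dim`.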

Performance comparison

MNIST dataset, input_feature_dim=784 (28*28)

MLP-64-64-softmax network (built by myself with numpy, so it may contain bugs)

SGD optimizer, batch_size=128, max_epoch=100, lr=0.05

The values below are accuracy on the test set

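Note: only 100 epochs were run (100*128=12800 shuffled training samples), so each "epoch" here is one mini-batch of 128 samples, not a full pass over the training set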

activation	zeros	ones	normal	xavier_Glorot	xavier_normal	uniform	xavier_uniform
sigmoid	did not converge	did not converge	0.838	0.756	0.623	0.347	0.645
relu	did not converge	did not converge	did not converge	0.895	0.895	0.881	0.896
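
One possible reading of the relu row is that unscaled N(0, 1) weights produce much larger hidden activations than the scaled variants. The sketch below is only a rough illustration of that idea, not the author's experiment: it pushes random standard-normal inputs (a stand-in for a batch, not real MNIST) through two relu layers of a 784-64-64 network with zero biases and prints the standard deviation of the final hidden activations. The function `hidden_activation_std` and the `scales` dictionary are hypothetical names introduced here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def hidden_activation_std(scale_fn, activation, dims=(784, 64, 64), n=256, seed=0):
    """Std of the last hidden layer's activations for normally distributed weights.

    scale_fn(in_dim, out_dim) gives the weight std for a layer.
    Assumptions: random N(0, 1) inputs instead of MNIST, biases fixed at zero.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, dims[0]))  # random stand-in batch, NOT real data
    for in_dim, out_dim in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, scale_fn(in_dim, out_dim), size=(out_dim, in_dim))
        x = activation(x @ W.T)  # W is [out_dim, in_dim], as in the post
    return float(x.std())


# Weight std for the three normal-based schemes from the post.
scales = {
    'normal': lambda i, o: 1.0,
    'xavier_Glorot_normal': lambda i, o: np.sqrt(1.0 / i),
    'xavier_normal': lambda i, o: np.sqrt(2.0 / (i + o)),
}

for name, fn in scales.items():
    print(name, hidden_activation_std(fn, relu))
```

With these assumptions the 'normal' scheme should give activations that are orders of magnitude larger than the two scaled schemes, which would be consistent with (but does not prove) the relu network failing to converge at lr=0.05.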