The initialization of ANN weights and the choice of activation function

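This article discusses why weight initialization and the choice of activation function matter in neural networks. It stresses that weights should be moderate in size and random, since both overly small and overly large weights cause problems. It also argues that the activation function needs to be linear near zero, and explains the reasons behind these choices.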


The choice of weight initialization and activation function follows a paradigm: 1. weights must be suitably small and random (but too small is also bad); 2. the activation function is supposed to be linear in the range near zero.
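A minimal sketch of "small and random" initialization in plain Python. The uniform range ±1/sqrt(fan_in) is one common rule of thumb that I am assuming here; it is not a rule stated in this post, and the function name `init_weights` is hypothetical.

```python
import math
import random

def init_weights(fan_in, fan_out, rng):
    """Return a fan_out x fan_in weight matrix drawn uniformly from [-a, a].

    Assumed rule of thumb: a = 1 / sqrt(fan_in), which keeps each unit's net
    input sum_i w_i * x_i of order 1 when the inputs are of order 1.
    """
    a = 1.0 / math.sqrt(fan_in)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)                      # fixed seed for reproducibility
W = init_weights(fan_in=100, fan_out=10, rng=rng)
print(len(W), len(W[0]))                    # 10 rows, 100 columns
```

The weights are small (here at most 0.1 in magnitude) and random, so no two hidden units start out identical.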

Before I list the reasons, there are three foundational facts that should be known: 1. weight initialization is strongly related to the activation function; 2. uniform weights are meaningless, because they cannot represent the knowledge the network needs in order to deal with its inputs (this is a bit metaphysical, but intuitive: e.g., if all weights are zero, the network loses its ability to learn, and delta_w will also be zero, because its definition contains factors that vanish: with a tanh activation the hidden outputs are f(net) = f(0) = 0, and the error propagated back to each hidden unit is weighted by its outgoing weights, which are zero too); 3. the activation function can reflect prior knowledge: e.g., a Gaussian activation function is preferred if the targets have a Gaussian distribution.
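To make fact 2 concrete, here is a hypothetical toy network (2 inputs, 3 tanh hidden units, 1 linear output, squared error) with hand-written backpropagation. With all-zero weights every gradient is zero; with equal nonzero weights the gradients are nonzero but identical for every hidden unit, so the units remain copies of each other. The network size, the inputs and the function name `grads` are assumptions for illustration.

```python
import math

def grads(W1, W2, x, t):
    """Gradients of loss = (y - t)^2 / 2 for y = sum_j W2[j] * tanh(net_j),
    with net_j = sum_i W1[j][i] * x[i]. Returns (dW1, dW2)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hj for w, hj in zip(W2, h))
    err = y - t                                   # dLoss/dy for a linear output unit
    dW2 = [err * hj for hj in h]                  # output-weight gradients
    # Hidden deltas: tanh'(net_j) * W2[j] * err, using tanh' = 1 - tanh^2.
    delta_h = [(1 - hj * hj) * wj * err for hj, wj in zip(h, W2)]
    dW1 = [[dj * xi for xi in x] for dj in delta_h]
    return dW1, dW2

x, t = [1.0, -2.0], 1.0

# All-zero weights: tanh(0) = 0 and every W2[j] = 0, so every gradient vanishes.
zero_dW1, zero_dW2 = grads([[0.0, 0.0]] * 3, [0.0] * 3, x, t)
print(zero_dW1, zero_dW2)

# Equal nonzero weights: gradients are nonzero but identical across hidden units.
sym_dW1, _ = grads([[0.5, 0.5]] * 3, [0.5] * 3, x, t)
print(sym_dW1)
```

Random initialization breaks this symmetry, which is why the paradigm asks for weights that are random and not merely small.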

1. Large initial weights will saturate the activation function. Saturation is a deliberate property of the activation function: it bounds the output f(wx), which limits the search space and therefore the training time. Recall fact 2: if all hidden units are saturated, the network has been crippled, because the outputs of the hidden layer are nearly uniform (pinned near the asymptotes) and their derivatives are close to zero.
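A rough way to see saturation, assuming a tanh hidden unit with 100 inputs drawn uniformly from [-1, 1] and weights drawn uniformly from [-scale, scale]: the average slope tanh'(net) = 1 - tanh(net)^2 is close to 1 for small scales and collapses toward 0 as the scale grows, which is where backpropagated errors stop flowing. The setup and the function name `mean_tanh_slope` are illustrative assumptions.

```python
import math
import random

def mean_tanh_slope(scale, fan_in=100, units=200, seed=0):
    """Average tanh'(net) over `units` hidden units whose weights are drawn
    uniformly from [-scale, scale]; the input vector x is shared by all units."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(fan_in)]
    total = 0.0
    for _ in range(units):
        net = sum(rng.uniform(-scale, scale) * xi for xi in x)
        total += 1 - math.tanh(net) ** 2          # tanh'(net)
    return total / units

for s in (0.01, 0.1, 1.0, 10.0):
    print(f"weight scale {s:>5}: mean tanh'(net) = {mean_tanh_slope(s):.3f}")
```

Because the same seed is used for every scale, the net inputs differ only by the scale factor, so the printed slopes decrease as the scale grows.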

2. Too small weights slow down learning. The reason: for a hidden unit, delta_w contains the outgoing weights w as a factor (the error is propagated back through them), so a very small w makes the update small regardless of the other factors. Nothing else can compensate for it, because the target t and the input x are determined by the data and are independent of the network itself.
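In the usual backpropagation notation (assumed here), the update of a hidden weight is Δw_ji = η δ_j x_i with δ_j = f'(net_j) Σ_k w_kj δ_k and, for the output units, δ_k = (t_k − y_k) f'(net_k). The sketch below uses a hypothetical 2-4-1 tanh network with a linear output and a fixed input x and target t. It keeps the input-to-hidden weights fixed and only rescales the output weights w_kj, showing that the size of the first-layer update shrinks roughly in proportion. The function name `hidden_grad_norm` is an assumption.

```python
import math
import random

def hidden_grad_norm(out_scale, seed=0):
    """Euclidean norm of delta_j * x_i over all input-to-hidden weights,
    when the output weights are multiplied by `out_scale`."""
    rng = random.Random(seed)                       # same draws for every out_scale
    x, t = [0.5, -1.0], 1.0
    W1 = [[rng.uniform(-0.5, 0.5) for _ in x] for _ in range(4)]
    W2 = [rng.uniform(-1, 1) * out_scale for _ in range(4)]
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hj for w, hj in zip(W2, h))         # linear output unit
    delta_k = t - y                                 # f'(net_k) = 1 for a linear output
    # delta_j = f'(net_j) * w_kj * delta_k, with tanh' = 1 - tanh^2
    delta_j = [(1 - hj ** 2) * wj * delta_k for hj, wj in zip(h, W2)]
    return math.sqrt(sum((dj * xi) ** 2 for dj in delta_j for xi in x))

for s in (1.0, 1e-1, 1e-2, 1e-3):
    print(f"output-weight scale {s:g}: first-layer update norm = {hidden_grad_norm(s):.2e}")
```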

3. The activation function is linear in the range near zero. Here I have to admit I do not fully understand this. The following is a note from my reading: "The linearity of the activation function keeps outputs proportional to inputs, so a small change in the input values will not lead to a big shift in the outputs. The summed error should therefore be small, and this assumption also matches a natural condition: two almost identical inputs have a low probability of belonging to two different categories..." Notice that this ties back to weight initialization: weights are initialized in a range near zero, so the net inputs also fall in the near-linear range of the activation function.
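The "linear near zero" property can be checked numerically for tanh, used here as an assumed example of a sigmoid-type activation. Since tanh(x) = x - x^3/3 + ..., the ratio tanh(x)/x stays close to 1 for small |x| and drops well below 1 once the unit starts to saturate. The helper `linearity_ratio` is hypothetical.

```python
import math

def linearity_ratio(x):
    """tanh(x) / x: equals 1 for a perfectly linear unit with unit slope."""
    return math.tanh(x) / x

for x in (0.01, 0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"x = {x:<4}  tanh(x) = {math.tanh(x):.4f}  tanh(x)/x = {linearity_ratio(x):.4f}")

# Near zero, a small input change eps produces an output change of about eps,
# i.e. similar inputs give similar outputs.
eps = 1e-3
print(math.tanh(0.05 + eps) - math.tanh(0.05))
```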
