CS231n: Convolutional Neural Networks for Visual Recognition
Chinese translation of the course notes: https://zhuanlan.zhihu.com/p/21930884?refer=intelligentunit
Lecture 5: Convolutional Neural Networks
Convolutional layer parameter settings:
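A minimal sketch (mine, not from the slides) of the standard size arithmetic for a conv layer, assuming a square W×W×C input, K filters of size F×F, stride S, and zero-padding P:

```python
def conv_output_size(W, F, S, P):
    """Spatial output size of a conv layer: (W - F + 2P)/S + 1."""
    assert (W - F + 2 * P) % S == 0, "hyperparameters must tile the input evenly"
    return (W - F + 2 * P) // S + 1

def conv_num_params(F, C, K):
    """Each of the K filters has F*F*C weights plus one bias."""
    return K * (F * F * C + 1)

# e.g. a 32x32x3 input with 10 filters of 5x5, stride 1, pad 2:
print(conv_output_size(32, 5, 1, 2))  # -> 32, so the output volume is 32x32x10
print(conv_num_params(5, 3, 10))      # -> 760 learnable parameters
```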
Pooling layer parameter settings:
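The same arithmetic for a pooling layer (a sketch under the usual assumptions: no padding, no learnable parameters); the common setting F = 2, S = 2 halves each spatial dimension:

```python
def pool_output_size(W, F, S):
    """Spatial output size of a pooling layer: (W - F)/S + 1."""
    return (W - F) // S + 1

print(pool_output_size(32, 2, 2))  # -> 16: 2x2 pooling with stride 2 halves width/height
```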
Lecture 6: Training Neural Networks, Part I
Choosing an activation function
Sigmoid:
- - Squashes numbers to range [0,1]
- - Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron
- - Saturated neurons “kill” the gradients (when x is very small or very large, the gradient approaches 0)
- - Sigmoid outputs are not zero-centered (the gradient on w will always be all positive or all negative, i.e. every element of w is pushed in the same direction on each update; see the numeric check after this list)
(figure: w should move along the blue direction, but can only zig-zag along the red directions)
- - exp() is a bit computationally expensive
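A small numeric check (my own sketch, not lecture code) of the two issues above: for large |x| the sigmoid gradient vanishes, and the outputs are never negative, so they are not zero-centered:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(x)
grad = s * (1 - s)   # local gradient d(sigmoid)/dx

print(s)     # all outputs lie in (0, 1) -- never zero-centered
print(grad)  # ~4.5e-5 at |x| = 10 -- a saturated neuron passes almost no gradient
```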
tanh:
- - Squashes numbers to range [-1,1]
- - zero centered (nice)
- - still kills gradients when saturated (see the sketch below)
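The same quick check (again my own sketch) for tanh: the outputs are zero-centered in [-1, 1], but the gradient 1 - tanh(x)^2 still vanishes once the neuron saturates:

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
t = np.tanh(x)
grad = 1 - t ** 2   # local gradient d(tanh)/dx

print(t)     # symmetric around 0, bounded by [-1, 1]
print(grad)  # ~8.2e-9 at |x| = 10 -- saturation still kills the gradient
```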
ReLU:
- - Does not saturate (in +region)
- - Very computationally efficient
- - Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- - Actually more biologically plausible than sigmoid
- - Not zero-centered output
- - An annoyance: what is the gradient when x < 0?
A “dead ReLU” will never activate => never update (see the sketch below)
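A short sketch (mine, not from the slides) of why this matters: for x < 0 both the ReLU output and its local gradient are exactly 0, so a neuron whose pre-activations are always negative never receives a weight update:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # subgradient: 1 where the unit is active, 0 where it is "dead"
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- zero gradient for every x <= 0
```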