Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013). [Citations: 97].
[Timescale of Learning]
• Deep-net learning time depends on the optimal (largest stable) learning rate.
• The optimal learning rate can be estimated as the inverse of the maximal eigenvalue of the Hessian over the region of interest.
• The optimal learning rate scales as O(1/L), where L is the number of layers.
[Motivations] Unsupervised pretraining speeds up optimization and acts as a special regularizer, steering solutions toward better generalization.
• Unsupervised pretraining finds a special class of orthogonalized, decoupled initial conditions.
• These initial conditions allow rapid supervised learning, since the learning dynamics decouple across input-output modes.
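The learning-rate bullets above can be sketched numerically. A minimal sketch on a hypothetical quadratic toy loss (not from the paper), where the Hessian is explicit and its top eigenvalue sets the step size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(w) = 0.5 * w.T @ A @ w, whose Hessian is A
# (a hypothetical stand-in for "the Hessian over the region of interest";
# the ridge term just keeps the toy problem well conditioned).
M = rng.standard_normal((20, 20))
A = M @ M.T / 20.0 + 0.5 * np.eye(20)

lam_max = np.linalg.eigvalsh(A).max()  # maximal eigenvalue of the Hessian
eta = 1.0 / lam_max                    # estimated optimal learning rate

# Gradient descent on a quadratic is stable for eta < 2 / lam_max,
# so eta = 1 / lam_max sits safely inside the stable region.
w = rng.standard_normal(20)
for _ in range(500):
    w -= eta * (A @ w)                 # gradient of the loss is A @ w

print(np.linalg.norm(w) < 1e-3)        # True: iterates contract to the minimum
```

With eta pushed above 2 / lam_max the top eigenmode diverges, which is why the largest stable rate is tied to the maximal Hessian eigenvalue.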

This paper examines the nonlinear effect of weight initialization on learning dynamics in deep networks. It shows that the optimal learning rate is inversely proportional to the number of layers, and proposes random orthogonal matrices as an initialization strategy that preserves statistics across layers and speeds up learning. Unlike Gaussian matrices, orthogonal matrices preserve the norm of every vector exactly, so learning does not stall. In the nonlinear case, a good initialization requires the singular values of the Jacobian to concentrate near 1.
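The norm-preservation contrast between orthogonal and Gaussian initialization can be checked directly. A minimal NumPy sketch (the QR construction and the 50-layer depth are illustrative assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

def random_orthogonal(n):
    # Standard construction: QR-decompose a Gaussian matrix and fix the
    # signs of R's diagonal so Q is drawn uniformly (Haar measure).
    G = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    return Q @ np.diag(np.sign(np.diag(R)))

x = rng.standard_normal(n)
Q = random_orthogonal(n)

# An orthogonal matrix preserves the norm of EVERY vector exactly.
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True

# Propagate through 50 linear layers: orthogonal layers keep the norm
# fixed, while 1/sqrt(n)-scaled Gaussian layers only preserve norms on
# average, so the signal drifts with depth.
depth_Q, depth_W = x.copy(), x.copy()
for _ in range(50):
    depth_Q = random_orthogonal(n) @ depth_Q
    depth_W = (rng.standard_normal((n, n)) / np.sqrt(n)) @ depth_W

print(np.linalg.norm(depth_Q) / np.linalg.norm(x))  # stays at 1.0
print(np.linalg.norm(depth_W) / np.linalg.norm(x))  # drifts away from 1.0
```

All singular values of an orthogonal matrix equal 1, which is the linear-network version of the paper's nonlinear criterion that the Jacobian's singular values concentrate near 1.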




