These are my notes from reading "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent", the paper Google presents as the first theoretical proof of this kind for neural networks. Updated 2019.02.26.
Abstract
- In the infinite-width limit, a wide neural network is governed by the linear model obtained from the first-order Taylor expansion of the network around its initial parameters (the expansion is written out after this list). (for wide neural networks, …, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.)
- Mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of a wide neural network with squared loss produces test-set predictions drawn from a Gaussian process with a particular compositional kernel. (Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel.)
- Although these conclusions are derived in the infinite-width limit, the authors find empirically that for finite, practically sized networks the predictions of the original network and of its linearized version agree closely, and that this agreement holds across different architectures, optimization methods, and loss functions (a minimal numerical check is sketched below). (While these theoretical results are only exact in the infinite width limit, we find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.)
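For concreteness, the linearization that the first bullet refers to can be written as follows (in the paper's notation, with $\theta_0$ the parameters at initialization, $\theta_t$ the parameters at training time $t$, and $f_0(x)$ the network output at initialization):

$$
f_t^{\mathrm{lin}}(x) \;\equiv\; f_0(x) + \nabla_\theta f_0(x)\big|_{\theta=\theta_0}\,\omega_t, \qquad \omega_t \equiv \theta_t - \theta_0 .
$$

Training $f_t^{\mathrm{lin}}$ is then linear regression in the fixed features $\nabla_\theta f_0(x)$, and the associated kernel $\hat{\Theta}_0(x, x') = \nabla_\theta f_0(x)\,\nabla_\theta f_0(x')^\top$ (the empirical neural tangent kernel) is what yields the Gaussian-process description in the second bullet under squared loss.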
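To make the third bullet concrete, here is a minimal sketch of how such a linearized model can be built and compared against the original network. This is my own illustration in JAX, not the authors' code (they released the Neural Tangents library for this purpose); the toy architecture, the width, and all names here are assumptions.

```python
import jax
import jax.numpy as jnp

# Toy one-hidden-layer network; the paper's results concern the limit
# where `width` is large. Purely illustrative, not the paper's code.
def f(params, x):
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    return h @ W2 + b2

width = 2048
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params0 = (
    jax.random.normal(k1, (1, width)),                    # input weights
    jnp.zeros(width),                                     # hidden biases
    jax.random.normal(k2, (width, 1)) / jnp.sqrt(width),  # 1/sqrt(width) output scaling
    jnp.zeros(1),                                         # output bias
)

def f_lin(params, x):
    # First-order Taylor expansion of f around params0:
    # f_lin(x) = f(params0, x) + grad_theta f(params0, x) . (params - params0)
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, df = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
    return f0 + df

# The claim tested empirically in the paper: after training `params` by
# gradient descent (omitted here), f(params, x) and f_lin(params, x)
# remain close when the network is wide.
x = jnp.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print(jnp.abs(f(params0, x) - f_lin(params0, x)).max())  # exactly 0 at t = 0
```

Here `jax.jvp` evaluates the Jacobian-vector product $\nabla_\theta f_0(x)\,(\theta - \theta_0)$ without materializing the full Jacobian, which is what keeps the linearized model cheap to evaluate at large width.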