【Deep Learning Study Notes】Greedy Layer-Wise Training of Deep Networks

This post looks at the curse-of-dimensionality problem in training deep networks and its remedies: a greedy layer-wise training strategy, validated with the DBN model and extended to continuous-valued inputs and hidden units.


Title: Greedy Layer-Wise Training of Deep Networks

Authors: Yoshua Bengio et al.

Published at: NIPS 2006


Main content:

This is a classic paper. After Hinton proposed the DBN built by stacking RBMs, this paper by Bengio both validates the DBN model empirically and discusses possible extensions.


For shallow-architecture models such as SVMs, with d input dimensions one may need on the order of 2^d examples to train the model adequately; as d grows, this becomes the curse of dimensionality. Multi-layer neural networks can avoid this problem:

"boolean functions (such as the function that computes the multiplication of two numbers from their d-bit representation) expressible by O(log d) layers of combinatorial logic with O(d) elements in each layer may require O(2^d) elements when expressed with only 2 layers."
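A rough illustration of this depth-versus-size trade-off (my own example, not the paper's multiplication one): the d-bit parity function

$$\mathrm{parity}(x_1,\dots,x_d) = x_1 \oplus x_2 \oplus \cdots \oplus x_d$$

can be computed by a balanced tree of $d-1$ two-input XOR gates of depth $\lceil \log_2 d \rceil$, whereas any two-layer DNF or CNF expression of parity needs at least $2^{d-1}$ terms.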

However, training a deep neural network with gradient descent usually gets stuck in a poor local optimum.

The paper then introduces the deep belief network (DBN).


1. Extending to continuous-valued inputs

An intuitive approach is to normalize the real-valued input vectors into the (0, 1) interval and then train with the standard RBM CD-k procedure.
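A minimal NumPy sketch of this baseline (my own illustration, not code from the paper): a Bernoulli-Bernoulli RBM updated with CD-1, where the (0, 1)-normalized real values are used directly as the visible units' activation probabilities. The names (`cd1_step`, `sigmoid`, etc.) are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.05):
    """One CD-1 update for a Bernoulli-Bernoulli RBM; v0 holds inputs
    already normalized into (0, 1), used as visible activation probabilities."""
    p_h0 = sigmoid(v0 @ W + b_hid)                        # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(v0.dtype)
    p_v1 = sigmoid(h0 @ W.T + b_vis)                      # one Gibbs step back
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    W     += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0) # CD gradient estimate
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)

# Toy usage: 6-dimensional normalized inputs, 4 hidden units.
X = rng.random((32, 6))
W = 0.01 * rng.standard_normal((6, 4))
b_vis, b_hid = np.zeros(6), np.zeros(4)
for _ in range(100):
    cd1_step(X, W, b_vis, b_hid)
```

Stacking several such RBMs and training them one at a time on the previous layer's hidden activations gives the greedy layer-wise pretraining the paper studies.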

The authors instead start from the RBM energy function, turning the visible units into Gaussian units, after which CD-k can still be used for training. The paper does not spell out all the details of this, and I did not fully follow it.
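For reference, a standard energy function for an RBM with Gaussian visible units and binary hidden units (a common parameterization, not necessarily the exact one used in the paper) is

$$E(v,h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j ,$$

under which $p(h_j = 1 \mid v)$ is still a sigmoid of a linear function of $v$, while $p(v_i \mid h)$ becomes a Gaussian with mean $a_i + \sigma_i \sum_j W_{ij} h_j$ and variance $\sigma_i^2$, so CD-k carries over with the visible sampling step replaced by Gaussian sampling.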

2. Extending the hidden layer to continuous values

The approach above can also be applied to the hidden layer.
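One concrete way to do this (a sketch of a standard variant, not necessarily the exact formulation in the paper): give each hidden unit a quadratic term $h_j^2/2$ in the energy instead of keeping it binary. Each conditional then becomes a unit-variance Gaussian,

$$p(h_j \mid v) = \mathcal{N}\!\left(h_j;\; b_j + \sum_i W_{ij} v_i,\; 1\right),$$

so Gibbs sampling and CD-k still apply, with the Bernoulli sampling of $h_j$ replaced by Gaussian sampling.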

3. Understanding why the layer-wise strategy works

The authors replace the RBMs in the DBN with autoencoders and obtain comparable experimental results. When they replace the unsupervised RBM training with a supervised layer-wise training criterion, the results get slightly worse; their explanation is that the supervised approach is too "greedy" and discards part of the information about the input during training.
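A rough NumPy sketch of this autoencoder variant of the greedy layer-wise strategy (my own simplification, with my own names such as `train_autoencoder` and `greedy_layerwise_pretrain`; the paper's actual algorithms are given as pseudocode in its appendix and differ in details):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
    """Train a one-hidden-layer sigmoid autoencoder with tied weights
    by full-batch gradient descent on squared reconstruction error."""
    n_in = X.shape[1]
    W = 0.1 * rng.standard_normal((n_in, n_hidden))
    b_h, b_r = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)            # encode
        R = sigmoid(H @ W.T + b_r)          # decode with tied weights
        dR = (R - X) * R * (1 - R)          # backprop through decoder sigmoid
        dH = (dR @ W) * H * (1 - H)         # backprop through encoder sigmoid
        gW = X.T @ dH + dR.T @ H            # tied weights: encoder + decoder gradients
        W   -= lr * gW / len(X)
        b_h -= lr * dH.mean(axis=0)
        b_r -= lr * dR.mean(axis=0)
    return W, b_h

def greedy_layerwise_pretrain(X, layer_sizes):
    """Train a stack of autoencoders greedily: each layer is fit to the
    codes produced by the (frozen) layers below it."""
    weights, codes = [], X
    for n_hidden in layer_sizes:
        W, b_h = train_autoencoder(codes, n_hidden)
        weights.append((W, b_h))
        codes = sigmoid(codes @ W + b_h)    # input for the next layer
    return weights

# Toy usage: pretrain a 20-16-8 encoder stack on random "data".
X = rng.random((64, 20))
stack = greedy_layerwise_pretrain(X, [16, 8])
```

After pretraining, the stacked encoder weights would initialize a feed-forward classifier that is then fine-tuned end to end with supervised backpropagation, which is the overall scheme the paper evaluates.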


The authors append pseudocode for all the algorithms used in the experiments at the end of the paper, which is worth consulting.


Paper abstract:

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.