First, look at the RBM tutorial derivation at http://blog.youkuaiyun.com/itplus/article/details/19207371. When the derivation reaches the figure below, the second term inside the square brackets has to be computed by sampling. There are three sampling methods for this: Gibbs sampling, CD-k, and PCD. Each of the three is discussed below.
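The figure itself is not reproduced here, but the quantity in question is the standard RBM log-likelihood gradient; writing it out as a reminder (the notation follows the usual convention and may differ slightly from the tutorial's):

$$
\frac{\partial \log P(v)}{\partial w_{ij}}
= \mathbb{E}_{P(h\mid v)}\!\left[v_i h_j\right] - \mathbb{E}_{P(v,h)}\!\left[v_i h_j\right]
= \langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{model}}
$$

The first expectation is easy to compute from the training data; the second is taken over the model's joint distribution, which is intractable, so it must be estimated by sampling. That is exactly where Gibbs sampling, CD-k, and PCD come in.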
1: Gibbs sampling
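The references below cover the details; as a quick illustration, here is a minimal sketch of block Gibbs sampling for a binary RBM. The names W, b_vis, and b_hid (weights, visible bias, hidden bias) are illustrative, not taken from the tutorial.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_vis, b_hid, rng):
    """One full Gibbs step for a binary RBM: sample h ~ P(h|v), then v ~ P(v|h)."""
    p_h = sigmoid(v @ W + b_hid)                    # P(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_vis)                  # P(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

def gibbs_sample(v0, W, b_vis, b_hid, n_steps, rng):
    """Run the chain for n_steps; for large n_steps the result approximates
    a sample from the model distribution P(v)."""
    v = v0
    for _ in range(n_steps):
        v, _ = gibbs_step(v, W, b_vis, b_hid, rng)
    return v
```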
2: CD-k
For the first two methods, see Zhang Chunxia's 《受限波尔兹曼机简介》 (An Introduction to Restricted Boltzmann Machines) and http://blog.youkuaiyun.com/mytestmy/article/details/9150213; a short CD-k sketch is also given below. After that, the rest of this post mainly explains my understanding of the PCD algorithm.
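For reference, here is a minimal sketch of a single CD-k parameter update, reusing the gibbs_step helper from the Gibbs sketch above. It follows the usual CD-k recipe (negative-phase chain restarted from the data and run for only k steps); it is not code from the referenced posts.

```python
def cd_k_update(v_data, W, b_vis, b_hid, k=1, lr=0.01, rng=None):
    """One CD-k update: the negative-phase chain is (re)started from the data
    and run for only k Gibbs steps."""
    rng = rng or np.random.default_rng()
    n = v_data.shape[0]
    # Positive phase: hidden activation probabilities given the training data
    p_h_data = sigmoid(v_data @ W + b_hid)
    # Negative phase: k Gibbs steps starting from the data itself
    v_model = v_data
    for _ in range(k):
        v_model, _ = gibbs_step(v_model, W, b_vis, b_hid, rng)
    p_h_model = sigmoid(v_model @ W + b_hid)
    # Gradient estimate: <v h>_data - <v h>_model, averaged over the mini-batch
    W += lr * (v_data.T @ p_h_data - v_model.T @ p_h_model) / n
    b_vis += lr * (v_data - v_model).mean(axis=0)
    b_hid += lr * (p_h_data - p_h_model).mean(axis=0)
    return W, b_vis, b_hid
```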
3: PCD
The PCD paper, "Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient" (Tieleman), contains the following passages:
The standard way to get it is by using a Markov Chain, but running a chain for many steps is too time-consuming. However, between parameter updates, the model changes only slightly. We can take advantage of that by initializing a Markov Chain at the state in which it ended for the previous model. This initialization is often fairly close to the model distribution, even though the model has changed a bit in the parameter update. Neal uses this approach with Sigmoid Belief Networks to approximately sample from the posterior distribution over hidden layer states given the visible layer state. For RBMs, the situation is a bit simpler: there is only one distribution from which we need samples, as opposed to one distribution per training data point. Thus, the algorithm can be used to produce gradient estimates online or using mini-batches, using only a few training data points for the positive part of each gradient estimate, and only a few 'fantasy' points for the negative part. The fantasy points are updated by one full step of the Markov Chain each time a mini-batch is processed.
Of course this still is an approximation, because the model does change slightly with each parameter update. With infinitesimally small learning rate it becomes exact, and in general it seems to work best with small learning rates.
We call this algorithm Persistent Contrastive Divergence (PCD), to emphasize that the Markov Chain is not reset between parameter updates.
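Putting the quoted description into code, here is a minimal sketch of one PCD mini-batch update, under the same assumptions as the sketches above. The only structural difference from CD-k is that the negative-phase "fantasy" particles are kept between calls instead of being re-initialized from the data, and they are advanced by one full Gibbs step per mini-batch.

```python
def pcd_update(v_data, v_fantasy, W, b_vis, b_hid, lr=0.01, rng=None):
    """One PCD update: the persistent fantasy particles carry the Markov Chain
    across parameter updates (the chain is never reset)."""
    rng = rng or np.random.default_rng()
    # Positive phase from the current mini-batch
    p_h_data = sigmoid(v_data @ W + b_hid)
    # Negative phase: one full Gibbs step on the persistent fantasy particles
    v_fantasy, _ = gibbs_step(v_fantasy, W, b_vis, b_hid, rng)
    p_h_fantasy = sigmoid(v_fantasy @ W + b_hid)
    # Gradient estimate: <v h>_data - <v h>_fantasy
    W += lr * (v_data.T @ p_h_data / v_data.shape[0]
               - v_fantasy.T @ p_h_fantasy / v_fantasy.shape[0])
    b_vis += lr * (v_data.mean(axis=0) - v_fantasy.mean(axis=0))
    b_hid += lr * (p_h_data.mean(axis=0) - p_h_fantasy.mean(axis=0))
    # Return the updated particles so the next call continues the same chain
    return W, b_vis, b_hid, v_fantasy
```

In practice, v_fantasy would be initialized once (for example, to random binary vectors or to a batch of training data) and then simply threaded through successive calls. As the quoted passage notes, the estimate is still approximate because the model drifts between updates, which is why PCD tends to work best with a small learning rate.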