Dropout
On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present. Another way to view the dropout procedure is as a very efficient way of performing model averaging with neural networks.
In other words: on every presentation of each training case (each backpropagation pass), each hidden unit is randomly dropped (set to zero) with probability 0.5, so no hidden unit can rely on other hidden units to build its representation; the units' representations become relatively independent. Seen another way, dropout is an efficient way of performing model averaging over neural networks.
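A minimal NumPy sketch of this training-time behaviour: each hidden unit is kept independently with probability 0.5 and contributes nothing when dropped. The layer sizes, ReLU activation, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_forward_train(x, W, b, p_drop=0.5):
    """One hidden layer with dropout applied at training time."""
    h = np.maximum(0.0, x @ W + b)         # hidden activations (ReLU used here for concreteness)
    mask = rng.random(h.shape) >= p_drop   # each unit is kept independently with probability 1 - p_drop
    return h * mask                        # dropped units contribute nothing to the layers above

# toy example: a batch of 4 inputs, 8 hidden units
x = rng.standard_normal((4, 3))
W = 0.1 * rng.standard_normal((3, 8))
b = np.zeros(8)
h_train = hidden_forward_train(x, W, b)
```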
At test time, we use the “mean network” that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active.
At test time the “mean network” is used: it contains all of the hidden units, but their outgoing weights are halved, because twice as many units are active as during training.
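A sketch of the test-time “mean network” under the same illustrative assumptions as above: no units are dropped, and the weights leaving the dropped-out layer are multiplied by 1 - p_drop (here 0.5) to compensate.

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5

x = rng.standard_normal((4, 3))                         # toy batch of inputs
W_h, b_h = 0.1 * rng.standard_normal((3, 8)), np.zeros(8)
W_out, b_out = 0.1 * rng.standard_normal((8, 2)), np.zeros(2)

h = np.maximum(0.0, x @ W_h + b_h)                      # all hidden units are active at test time
y = h @ (W_out * (1.0 - p_drop)) + b_out                # outgoing weights halved to form the mean network
```

Halving the outgoing weights at test time is equivalent to scaling the surviving activations by 1 / (1 - p_drop) during training instead; the sketch follows the convention described in the text above.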
Dropout can also be combined with generative pre-training, but in this case we use a small learning rate and no weight constraints to avoid losing the feature detectors discovered by the pre-training.
That is, dropout can also be applied to networks that were generatively pre-trained (e.g., with unsupervised RBM training); in that case a smaller learning rate and no weight constraints are used, so that the feature detectors discovered during pre-training are not lost.
We found that fine-tuning a model using dropout with a small learning rate can give much better performance than standard backpropagation fine-tuning.
That is, fine-tuning a model with dropout and a small learning rate gives better results than fine-tuning with standard backpropagation.
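A hedged sketch of such a fine-tuning step: dropout stays on in the forward pass, updates use plain SGD with a small step size, and no weight constraint is applied. The tiny linear read-out, squared-error loss, and all sizes are illustrative assumptions; in practice the parameters would come from pre-training rather than the random stand-ins used here.

```python
import numpy as np

rng = np.random.default_rng(0)
lr, p_drop = 0.001, 0.5                       # deliberately small learning rate

# "pre-trained" parameters (random stand-ins here)
W_h, b_h = 0.1 * rng.standard_normal((3, 8)), np.zeros(8)
W_o, b_o = 0.1 * rng.standard_normal((8, 1)), np.zeros(1)

def finetune_step(x, t):
    global W_h, b_h, W_o, b_o
    h = np.maximum(0.0, x @ W_h + b_h)        # hidden activations (ReLU for concreteness)
    mask = rng.random(h.shape) >= p_drop      # dropout stays on during fine-tuning
    hd = h * mask
    y = hd @ W_o + b_o
    d_y = (y - t) / len(x)                    # gradient of the mean squared error w.r.t. y
    d_h = (d_y @ W_o.T) * mask * (h > 0)      # backprop through the dropout mask and ReLU
    # small-learning-rate SGD updates, no max-norm weight constraint
    W_o -= lr * hd.T @ d_y;  b_o -= lr * d_y.sum(0)
    W_h -= lr * x.T @ d_h;   b_h -= lr * d_h.sum(0)

x, t = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
finetune_step(x, t)
```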
Reference:
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012.