Deep Learning: Regularization (11)

Bagging (Bootstrap Aggregating) is a technique that reduces generalization error by combining several models: different models are trained separately, and all of them then vote on the prediction for each test example, which implements a model averaging strategy. When the models' errors are not perfectly correlated, model averaging can significantly improve prediction accuracy. Because of the randomness in their training, neural networks can benefit from model averaging even when trained on the same training set.


Bagging and Other Ensemble Methods

Bagging (short for bootstrap aggregating) is a technique for reducing generalization error by combining several models.
The idea is to train several different models separately, then have all of the models vote on the output for test examples.
This is an example of a general strategy in machine learning called model averaging. Techniques employing this strategy are known as ensemble methods.
The reason that model averaging works is that different models will usually not make all the same errors on the test set.
Consider for example a set of k regression models. Suppose that each model makes an error $\epsilon_i$ on each example, with the errors drawn from a zero-mean multivariate normal distribution with variances $E[\epsilon_i^2] = v$ and covariances $E[\epsilon_i \epsilon_j] = c$. Then the error made by the average prediction of all the ensemble models is $\frac{1}{k}\sum_i \epsilon_i$. The expected squared error of the ensemble predictor is

$$
E\left[\left(\frac{1}{k}\sum_i \epsilon_i\right)^{2}\right]
= \frac{1}{k^2}\, E\left[\sum_i \left(\epsilon_i^2 + \sum_{j \neq i} \epsilon_i \epsilon_j\right)\right]
= \frac{1}{k}\, v + \frac{k-1}{k}\, c.
$$
  • In the case where the errors are perfectly correlated and $c = v$, the mean squared error reduces to $v$, so the model averaging does not help at all.
  • In the case where the errors are perfectly uncorrelated and $c = 0$, the expected squared error of the ensemble is only $\frac{1}{k}v$.
  • This means that the expected squared error of the ensemble decreases linearly with the ensemble size.
  • In other words, on average, the ensemble will perform at least as well as any of its members, and if the members make independent errors, the ensemble will perform significantly better than its members.
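
As a quick check of this result, the following Monte Carlo sketch (not part of the original text; the values of k, v, and c are chosen arbitrarily for illustration) draws correlated errors and compares the empirical mean squared error of the averaged predictor against $\frac{1}{k}v + \frac{k-1}{k}c$.

```python
# Monte Carlo check of the ensemble error formula (illustrative values only).
import numpy as np

k, v, c = 10, 1.0, 0.3              # ensemble size, per-model variance, covariance
n_trials = 200_000

# Covariance matrix: v on the diagonal, c everywhere else.
cov = np.full((k, k), c)
np.fill_diagonal(cov, v)

rng = np.random.default_rng(0)
eps = rng.multivariate_normal(np.zeros(k), cov, size=n_trials)   # shape (n_trials, k)

empirical = np.mean(eps.mean(axis=1) ** 2)    # E[(1/k * sum_i eps_i)^2]
predicted = v / k + (k - 1) / k * c

print(f"empirical MSE of the ensemble: {empirical:.4f}")
print(f"formula (1/k)v + ((k-1)/k)c:   {predicted:.4f}")
```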

Different ensemble methods construct the ensemble of models in different ways. For example, each member of the ensemble could be formed by training a completely different kind of model using a different algorithm or objective function.
Bagging is a method that allows the same kind of model, training algorithm and objective function to be reused several times.
Specifically, bagging involves constructing k different datasets. Each dataset has the same number of examples as the original dataset, but each dataset is constructed by sampling with replacement from the original dataset. This means that, with high probability, each dataset is missing some of the examples from the original dataset and also contains several duplicate examples. Model i is then trained on dataset i. The differences between which examples are included in each dataset result in differences between the trained models.
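
A minimal sketch of this procedure is shown below. It is not from the original text: the synthetic regression data and the choice of DecisionTreeRegressor as the base model are assumptions made purely for illustration; any model, training algorithm, and objective could be substituted.

```python
# Bagging sketch: k bootstrap datasets, one model per dataset, averaged predictions.
# The data and base model below are illustrative assumptions, not from the text.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

k, n = 10, len(X)
models = []
for i in range(k):
    # Dataset i: n examples sampled *with replacement* from the original data,
    # so it typically contains duplicates and is missing some original examples.
    idx = rng.integers(0, n, size=n)
    models.append(DecisionTreeRegressor(random_state=i).fit(X[idx], y[idx]))

# The ensemble prediction is the average of the k members' predictions.
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
ensemble_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```

With replacement, each bootstrap dataset leaves out roughly $1/e \approx 37\%$ of the original examples on average, which is what drives the differences between the trained members.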

  • Neural networks reach a wide enough variety of solution points that they can often benefit from model averaging even if all of the models are trained on the same dataset.
  • Differences in random initialization, random selection of minibatches, differences in hyperparameters, or different outcomes of non-deterministic implementations of neural networks are often enough to cause different members of the ensemble to make partially independent errors (see the sketch after this list).
Model averaging is an extremely powerful and reliable method for reducing generalization error.
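
The sketch below illustrates such a seed-only ensemble. It is an assumption-laden illustration rather than anything from the original text: scikit-learn's MLPClassifier and a synthetic classification dataset stand in for "a neural network" and "the training set", and only the random seed changes between members.

```python
# Ensemble of identical networks trained on the *same* data with different seeds.
# MLPClassifier and the synthetic dataset are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Only the seed differs: it changes initialization and minibatch shuffling.
members = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed).fit(X_tr, y_tr)
    for seed in range(5)
]

# Average predicted class probabilities, then take the most probable class.
avg_proba = np.mean([m.predict_proba(X_te) for m in members], axis=0)
ensemble_acc = np.mean(avg_proba.argmax(axis=1) == y_te)

print("member accuracies:", [round(m.score(X_te, y_te), 3) for m in members])
print(f"ensemble accuracy: {ensemble_acc:.3f}")
```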