bootstrap && bagging && decision trees && random forest

weixin_30559481

License: CC 4.0 BY-SA

Tags: data structures and algorithms, artificial intelligence

Original link: http://www.cnblogs.com/robin2ML/p/9860816.html

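This post covers the Bootstrap method, the Bagging ensemble algorithm, and Random Forest as used in machine learning. The bootstrap improves a statistical estimate by drawing many sub-samples from a dataset and analyzing each one. Bagging reduces the variance of high-variance algorithms such as decision trees by training a tree on each of several sub-samples and combining their predictions. Random Forest adds further randomness by restricting the features each tree may consider at a split, which reduces the correlation between trees.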

I read an article that introduces these concepts and am collecting a few notes here. The original article:

https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/

1.Bootstrap Method

The bootstrap is a powerful statistical method for estimating a quantity from a data sample. This is easiest to understand if the quantity is a descriptive statistic such as a mean or a standard deviation.

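In other words, the bootstrap is a statistical resampling method for estimating a property of a data sample, such as its mean or standard deviation. A statistic computed from a limited sample carries error, and the bootstrap is used to obtain a better estimate of it.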

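Concretely: draw many random sub-samples from the dataset with replacement, compute the statistic (for example the mean) on each sub-sample, and then average the results.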
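The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `bootstrap_mean`, the toy `sample` values, the number of resamples, and the seed are all assumptions made for the example, not something taken from the original article.

```python
import random
import statistics


def bootstrap_mean(data, n_resamples=1000, seed=0):
    """Estimate the mean of `data` by averaging the means of bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    n = len(data)
    resample_means = []
    for _ in range(n_resamples):
        # Draw n values WITH replacement: some points repeat, others are left out.
        resample = rng.choices(data, k=n)
        resample_means.append(statistics.fmean(resample))
    # Average the per-resample statistics; their spread gives a rough
    # standard error for the estimate.
    return statistics.fmean(resample_means), statistics.stdev(resample_means)


# Hypothetical toy sample, made up for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.0]
estimate, std_error = bootstrap_mean(sample)
print(f"plain sample mean: {statistics.fmean(sample):.3f}")
print(f"bootstrap mean:    {estimate:.3f} (std. error ~ {std_error:.3f})")
```

The same loop works for other statistics: swapping `statistics.fmean` for `statistics.stdev` inside the loop would bootstrap the standard deviation instead.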

2.Bootstrap Aggregation (Bagging)

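Bagging is an ensemble method. An ensemble method is a technique that combines the predictions of multiple machine learning models, and the combined prediction is usually better than that of any single model.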

Bootstrap Aggregation is a general procedure that can be used to reduce the variance of algorithms that have high variance. Decision trees, such as classification and regression trees (CART), are one example of a high-variance algorithm.

Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees.

As this shows, bagging is simply the bootstrap method applied to a high-variance algorithm in order to reduce its variance. This is where terms like "5-bagged decision trees" come from.

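The procedure is simple and close to the bootstrap: draw sub-samples from the dataset with replacement (rather than partitioning it), train a decision tree on each sub-sample, and finally combine the trees' predictions. For example: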

Let’s assume we have a sample dataset of 1000 instances (x) and we are using the CART algorithm. Bagging of the CART algorithm would work as follows.

1.Create many (e.g. 100) random sub-samples of our dataset with replacement.
2.Train a CART model on each sample.
3.Given a new dataset, calculate the average prediction from each model.
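
The three steps above can be sketched as follows. To keep the example free of external dependencies, a depth-1 regression tree (a "stump") stands in for a full CART model; that substitution, the helper names (`fit_stump`, `predict_stump`, `fit_bagged`, `predict_bagged`), and the synthetic step-shaped data are all assumptions made for illustration and do not come from the original article.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold on x, a mean on each side."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    # Fallback if no split is possible: predict the overall mean everywhere.
    best = (float("inf"), float("inf"), total / n, total / n)
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        x, y = pairs[i]
        left_sum += y
        left_sq += y * y
        if x == pairs[i + 1][0]:
            continue  # cannot split between identical x values
        n_left, n_right = i + 1, n - i - 1
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        # Sum of squared errors when each side predicts its own mean.
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if sse < best[0]:
            best = (sse, x, left_sum / n_left, right_sum / n_right)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_models=100, seed=0):
    """Steps 1 and 2: draw bootstrap samples and train one model on each."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)  # sample row indices WITH replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Step 3: average the predictions of all models."""
    return statistics.fmean([predict_stump(m, x) for m in models])


# Synthetic data (an assumption): 1000 instances, y jumps from about 1 to about 3 at x = 5.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(1000)]
ys = [(1.0 if x < 5 else 3.0) + data_rng.gauss(0, 0.5) for x in xs]

models = fit_bagged(xs, ys, n_models=100)
pred_low, pred_high = predict_bagged(models, 2.0), predict_bagged(models, 8.0)
print(f"prediction at x=2: {pred_low:.2f}, at x=8: {pred_high:.2f}")
```

Averaging fits regression; for classification, the usual counterpart is a majority vote over the models' predicted classes.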