

Random Forest Model Explained

In this post, we will explain what a Random Forest model is, examine its strengths, see how it is built, and learn what it can be used for.

We will go through the theory and intuition of Random Forest, covering only the minimum amount of maths needed to understand how everything works, without diving into the most complex details.

Let’s get to it!

1. Introduction

In the Machine Learning world, Random Forest models are a kind of non-parametric model that can be used for both regression and classification. They are one of the most popular ensemble methods, belonging to the specific category of Bagging methods.

Ensemble methods involve using many learners to enhance the performance of any single one of them individually. These methods can be described as techniques that combine a group of weak learners (those which on average achieve only slightly better results than a random model) in order to create a stronger, aggregated one.
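As a rough illustration of why aggregating weak learners helps, here is a minimal, self-contained toy simulation (not a real Random Forest, and assuming the learners' errors are independent): each simulated learner is right only 60% of the time, yet a majority vote over many of them is right far more often.

```python
import random

random.seed(42)

N_LEARNERS = 101   # number of weak learners in the ensemble
P_CORRECT = 0.6    # each weak learner is right only 60% of the time
N_TRIALS = 1000    # number of binary classification trials

single_correct = 0
ensemble_correct = 0

for _ in range(N_TRIALS):
    # Each learner independently votes; True means "predicted the correct label".
    votes = [random.random() < P_CORRECT for _ in range(N_LEARNERS)]
    single_correct += votes[0]                       # track one learner alone
    ensemble_correct += sum(votes) > N_LEARNERS / 2  # majority vote

single_acc = single_correct / N_TRIALS
ensemble_acc = ensemble_correct / N_TRIALS
print(f"single learner: {single_acc:.2f}, majority vote: {ensemble_acc:.2f}")
```

The key caveat, which the construction of a Random Forest addresses directly, is that this only works when the learners' mistakes are not all the same: identical learners gain nothing from voting.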

In our case, Random Forests are an ensemble of many individual Decision Trees. If you are not familiar with Decision Trees, you can learn all about them here:

One of the main drawbacks of Decision Trees is that they are very prone to over-fitting: they do well on training data, but are not so flexible for making predictions on unseen samples. While there are workarounds for this, like pruning the trees, doing so reduces their predictive power. Generally they are models with medium bias and high variance, but they are simple and easy to interpret.

If you are not very confident with the difference between bias and variance, check out the following post:

Random Forest models combine the simplicity of Decision Trees with the flexibility and power of an ensemble model. In a forest of trees, we forget about the high variance of any specific tree, and are less concerned about each individual element, so we can grow nicer, larger trees that have more predictive power than pruned ones.
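This variance-reduction effect can be sketched with a toy simulation. The assumption below, that each tree's error is independent noise, is an idealization: in a real forest the trees' errors are partially correlated, so the gain is smaller, but the direction of the effect is the same.

```python
import random
import statistics

random.seed(1)
TRUE_VALUE = 5.0

def tree_predict():
    # A single "tree": unbiased but noisy (high variance) prediction.
    return TRUE_VALUE + random.gauss(0, 2.0)

def forest_predict(n_trees=100):
    # A "forest": average the predictions of many independent trees.
    return statistics.mean(tree_predict() for _ in range(n_trees))

single = [tree_predict() for _ in range(500)]
forest = [forest_predict() for _ in range(500)]

single_std = statistics.stdev(single)
forest_std = statistics.stdev(forest)
print(f"single-tree std: {single_std:.2f}")
print(f"forest std:      {forest_std:.2f}")
```

Under independence, averaging n predictions divides the variance by n, which is why the forest's spread around the true value is dramatically tighter than any single tree's.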

Although Random Forest models don’t offer as much interpretability as a single tree, their performance is a lot better, and we don’t have to worry as much about perfectly tuning the parameters of the forest as we do with individual trees.

Okay, I get it: a Random Forest is a collection of individual trees. But why the name Random? Where is the randomness? Let’s find out by learning how a Random Forest model is built.
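One well-known source of that randomness in Bagging methods is bootstrap sampling: each tree is trained on a sample of the same size as the training set, drawn with replacement, so each tree sees a slightly different dataset. A minimal sketch of drawing one such sample (the 10-element toy dataset is made up for illustration):

```python
import random

random.seed(0)

dataset = list(range(10))  # toy dataset of 10 sample indices

# Bootstrap sample: same size as the original data, drawn WITH replacement,
# so some samples appear multiple times and others not at all.
bootstrap = [random.choice(dataset) for _ in dataset]

# Samples a given tree never sees ("out-of-bag" samples).
out_of_bag = [x for x in dataset if x not in bootstrap]

print("bootstrap sample:  ", bootstrap)
print("out-of-bag samples:", out_of_bag)
```

The samples left out of each tree's bootstrap draw are not wasted: they can be used to estimate that tree's error without a separate validation set.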
