Recent surveys claim that it is one of the most commonly used techniques.
One of the best things about decision trees is that humans can easily understand the learned model.
The decision tree does a great job of distilling data into knowledge.
Decision trees are often used in expert systems. (Expert systems were an important early branch of artificial intelligence: computer programs endowed with specialized knowledge and experience, typically using AI techniques for knowledge representation and knowledge reasoning to tackle complex problems that would normally require a domain expert to solve.)
Decision trees
Pros: Computationally cheap to use, easy for humans to understand learned results,
missing values OK, can deal with irrelevant features
Cons: Prone to overfitting
Works with: Numeric values, nominal values
Check if every item in the dataset is in the same class:
    If so return the class label
    Else
        find the best feature to split the data
        split the dataset
        create a branch node
        for each split
            call createBranch and add the result to the branch node
        return branch node
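The pseudocode above can be sketched as a recursive Python function. This is a minimal illustration, not the book's implementation: it assumes nominal feature values, the class label in the last column of each row, and information gain (entropy reduction) as the "best feature" criterion; all function and variable names are my own.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split(rows, feature, value):
    """Rows whose `feature` column equals `value`, with that column removed."""
    return [row[:feature] + row[feature + 1:] for row in rows if row[feature] == value]

def best_feature(rows):
    """Index of the feature whose split yields the largest information gain, or -1."""
    base = entropy(rows)
    best_gain, best_idx = 0.0, -1
    for i in range(len(rows[0]) - 1):          # last column is the label, skip it
        values = set(row[i] for row in rows)
        new_entropy = sum(
            len(subset) / len(rows) * entropy(subset)
            for subset in (split(rows, i, v) for v in values)
        )
        if base - new_entropy > best_gain:
            best_gain, best_idx = base - new_entropy, i
    return best_idx

def create_branch(rows, feature_names):
    """Recursively build a tree as nested dicts, mirroring the pseudocode."""
    labels = [row[-1] for row in rows]
    if labels.count(labels[0]) == len(labels):  # every item in the same class
        return labels[0]
    idx = best_feature(rows)
    if len(rows[0]) == 1 or idx == -1:          # no useful feature left: majority vote
        return Counter(labels).most_common(1)[0][0]
    name = feature_names[idx]
    remaining = feature_names[:idx] + feature_names[idx + 1:]
    node = {name: {}}                           # the branch node
    for value in set(row[idx] for row in rows):
        node[name][value] = create_branch(split(rows, idx, value), remaining)
    return node
```

On a toy dataset such as `[[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]` with feature names `['f0', 'f1']`, the builder returns a nested dict keyed first on the highest-gain feature.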
The first topic to cover is the so-called regression tree. Decision trees were introduced in earlier posts on this blog, mostly for classification: the splitting attribute is chosen to maximize information gain, which involves the concept of entropy. When building a regression tree, however, we want the method to connect more closely with regression itself, so the criterion for choosing the splitting variable is the residual sum of squares.
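The residual-sum-of-squares criterion can be sketched as follows: for each candidate threshold on a feature, sum the squared deviations of the target values around the mean in each resulting half, and pick the threshold minimizing that total. This is a hedged sketch for a single numeric feature; the function names are hypothetical, not from the original.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold t on one numeric feature minimizing rss(left) + rss(right),
    where left = targets with x <= t and right = targets with x > t."""
    best_total, best_t = float('inf'), None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = rss(left) + rss(right)
        if total < best_total:
            best_total, best_t = total, t
    return best_t
```

For data with two flat regimes, e.g. `xs = [1, 2, 3, 10, 11, 12]` and targets near 1 on the left and near 5 on the right, the chosen threshold falls at the boundary between the regimes, since splitting there drives the within-half residuals toward zero.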