Recent surveys claim that it is one of the most commonly used techniques.
One of the best things about decision trees is that humans can easily understand the learned model.
The decision tree does a great job of distilling data into knowledge.
Decision trees are often used in expert systems. (Expert systems were an important early branch of artificial intelligence: computer programs endowed with specialized knowledge and experience, typically using AI techniques for knowledge representation and knowledge reasoning to tackle complex problems that would normally require a domain expert to solve.)
Decision trees
Pros: Computationally cheap to use, easy for humans to understand learned results,
missing values OK, can deal with irrelevant features
Cons: Prone to overfitting
Works with: Numeric values, nominal values
Check if every item in the dataset is in the same class:
    If so return the class label
    Else
        find the best feature to split the data
        split the dataset
        create a branch node
        for each split
            call createBranch and add the result to the branch node
        return branch node
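The pseudocode above can be sketched as a recursive Python function. This is a minimal illustration, not the book's implementation: it assumes nominal feature values, the class label in the last column of each row, and information gain (entropy reduction) as the "best feature" criterion; all function and variable names are my own.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split(rows, feature, value):
    """Rows whose `feature` column equals `value`, with that column removed."""
    return [row[:feature] + row[feature + 1:] for row in rows if row[feature] == value]

def best_feature(rows):
    """Index of the feature whose split yields the largest information gain, or -1."""
    base = entropy(rows)
    best_gain, best_idx = 0.0, -1
    for i in range(len(rows[0]) - 1):          # last column is the label, skip it
        values = set(row[i] for row in rows)
        new_entropy = sum(
            len(subset) / len(rows) * entropy(subset)
            for subset in (split(rows, i, v) for v in values)
        )
        if base - new_entropy > best_gain:
            best_gain, best_idx = base - new_entropy, i
    return best_idx

def create_branch(rows, feature_names):
    """Recursively build a tree as nested dicts, mirroring the pseudocode."""
    labels = [row[-1] for row in rows]
    if labels.count(labels[0]) == len(labels):  # every item in the same class
        return labels[0]
    idx = best_feature(rows)
    if len(rows[0]) == 1 or idx == -1:          # no useful feature left: majority vote
        return Counter(labels).most_common(1)[0][0]
    name = feature_names[idx]
    remaining = feature_names[:idx] + feature_names[idx + 1:]
    node = {name: {}}                           # the branch node
    for value in set(row[idx] for row in rows):
        node[name][value] = create_branch(split(rows, idx, value), remaining)
    return node
```

On a toy dataset such as `[[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]` with feature names `['f0', 'f1']`, the builder returns a nested dict keyed first on the highest-gain feature.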
The first topic to cover is the so-called regression tree. Decision trees were introduced in earlier posts on this blog, mostly for classification: the splitting attribute is chosen to maximize information gain, which involves the concept of entropy. When building a regression tree, however, we want the method to connect more closely with regression itself, so the criterion for choosing the splitting variable is the residual sum of squares.
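The residual-sum-of-squares criterion can be sketched as follows: for each candidate threshold on a feature, sum the squared deviations of the target values around the mean in each resulting half, and pick the threshold minimizing that total. This is a hedged sketch for a single numeric feature; the function names are hypothetical, not from the original.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold t on one numeric feature minimizing rss(left) + rss(right),
    where left = targets with x <= t and right = targets with x > t."""
    best_total, best_t = float('inf'), None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = rss(left) + rss(right)
        if total < best_total:
            best_total, best_t = total, t
    return best_t
```

For data with two flat regimes, e.g. `xs = [1, 2, 3, 10, 11, 12]` and targets near 1 on the left and near 5 on the right, the chosen threshold falls at the boundary between the regimes, since splitting there drives the within-half residuals toward zero.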