My previous post gave a simple implementation of the ID3 algorithm.
This one covers the theory behind ID3.
I originally wrote the algorithm for the graduation project of a colleague's classmate (a convoluted connection, I know). After finishing it I suddenly became interested in data mining, so I have decided to implement C4.5 and C5.0 as well, and to study data mining's classification algorithms further.
To be honest, none of the theory in this post is my own... it is quoted from someone else's blog (and I genuinely dislike plagiarism).
First, credit to the blog's author: 神威异度.
I have never spoken with him, but after an arduous search I finally found something valuable on his blog (I should mention that finding original material on the Chinese web is genuinely difficult, since reposts are everywhere, myself now included, to my own slight shame), so many thanks to 神威异度.
Blog link: http://www.blog.edu.cn/user2/huangbo929/archives/2006/1533249.shtml
And so, once more, I am copying his material over here.
Principles of Decision Tree Generation
Abstract
This paper details the ID3 classification algorithm. Very simply, ID3 builds a decision tree from a fixed set of examples, and the resulting tree is used to classify future samples. Each example has several attributes and belongs to a class (such as yes or no). The leaf nodes of the decision tree contain the class name, whereas a non-leaf node is a decision node: an attribute test with each branch (to another decision tree) being a possible value of the attribute. ID3 uses information gain to decide which attribute goes into a decision node. The advantage of learning a decision tree is that a program, rather than a knowledge engineer, elicits knowledge from an expert.
Introduction
J. Ross Quinlan originally developed ID3 at the University of Sydney, first presenting it in the journal Machine Learning, vol. 1, no. 1 (1986). ID3 is based on the Concept Learning System (CLS) algorithm. The basic CLS algorithm over a set of training instances C proceeds as follows:
Step 1: If all instances in C are positive, create a YES node and halt.
If all instances in C are negative, create a NO node and halt.
Otherwise, select a feature F with values v1, ..., vn and create a decision node.
Step 2: Partition the training instances in C into subsets C1, C2, ..., Cn according to the values v1, ..., vn of F.
Step 3: Apply the algorithm recursively to each subset Ci.
Note that in CLS the trainer (the expert) decides which feature to select; a sketch of the recursion is shown below.
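Below is a minimal Python sketch of the CLS recursion just described. The dataset format (a list of dicts carrying a 'label' key) and the choose_feature callback are illustrative assumptions on my part, not part of the original algorithm description.

```python
from collections import Counter

def cls(instances, features, choose_feature):
    # Basic CLS over a set of training instances C
    labels = {inst["label"] for inst in instances}
    if labels == {"yes"}:                 # Step 1: all instances positive -> YES leaf
        return "YES"
    if labels == {"no"}:                  # Step 1: all instances negative -> NO leaf
        return "NO"
    if not features:                      # guard not in the original description:
        return Counter(i["label"] for i in instances).most_common(1)[0][0]

    f = choose_feature(instances, features)        # in CLS the expert makes this choice
    node = {"feature": f, "branches": {}}
    for value in {inst[f] for inst in instances}:  # Step 2: partition C by the values of F
        subset = [inst for inst in instances if inst[f] == value]
        node["branches"][value] = cls(             # Step 3: recurse on each subset Ci
            subset, [g for g in features if g != f], choose_feature)
    return node
```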
ID3 improves on CLS by adding a feature selection heuristic: it searches through the attributes of the training instances and extracts the attribute that best separates the given examples, as measured by information gain. A sketch of that computation follows below.
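The following sketch shows one way the information-gain heuristic can be computed, assuming the same list-of-dicts dataset format as above. The formulas are the standard ID3 definitions: entropy H(C) = -Σ p_i log2(p_i) over the class proportions, and Gain(C, F) = H(C) - Σ_v (|C_v|/|C|) H(C_v).

```python
import math
from collections import Counter

def entropy(instances):
    # H(C) = -sum_i p_i * log2(p_i), over the class proportions in C
    total = len(instances)
    counts = Counter(inst["label"] for inst in instances)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(instances, feature):
    # Gain(C, F) = H(C) - sum_v (|C_v| / |C|) * H(C_v)
    total = len(instances)
    remainder = 0.0
    for value in {inst[feature] for inst in instances}:
        subset = [inst for inst in instances if inst[feature] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(instances) - remainder

def best_feature(instances, features):
    # ID3's heuristic: pick the attribute with the highest information gain
    return max(features, key=lambda f: information_gain(instances, f))
```

Passing best_feature as the choose_feature argument of the cls sketch above is what turns the expert-driven CLS into ID3.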
