1. Introduction
2. Concept Learning and the General-to-Specific Ordering
3. Decision Tree Learning
4. Artificial Neural Networks
5. Evaluating Hypotheses
6. Bayesian Learning
7. Computational Learning Theory
8. Instance-Based Learning
9. Genetic Algorithms
10. Learning Sets of Rules
11. Analytical Learning
12. Combining Inductive and Analytical Learning
13. Reinforcement Learning
6. Bayesian Learning
6.1 INTRODUCTION
Bayesian learning methods are relevant to our study of machine learning for two reasons. First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain types of learning problems. Second, Bayesian methods provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.
One practical difficulty in applying Bayesian methods is that they typically require initial knowledge of many probabilities. When these probabilities are not known in advance they are often estimated based on background knowledge, previously available data, and assumptions about the form of the underlying distributions. A second practical difficulty is the significant computational cost required to determine the Bayes optimal hypothesis in the general case (linear in the number of candidate hypotheses).
6.2 BAYES THEOREM
Probability theory is basic material for a science student, so I won't go over the fundamentals in detail here!
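Still, for reference, the one formula everything in this chapter builds on is Bayes theorem (Equation (6.1) in Mitchell's numbering), which expresses the posterior probability of a hypothesis h given training data D in terms of the prior P(h) and the likelihood P(D|h):

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)} \tag{6.1}$$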
The most probable hypothesis h ∈ H given the observed data D (or at least one of the maximally probable if there are several) is called a maximum a posteriori (MAP) hypothesis.
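Concretely (this is the Equation (6.2) referred to just below), the MAP hypothesis follows from Bayes theorem; in the final step P(D) is dropped because it is a constant independent of h:

$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h) \tag{6.2}$$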
We will assume that every hypothesis in H is equally probable a priori ($P(h_i) = P(h_j)$ for all $h_i, h_j \in H$). In this case we can further simplify Equation (6.2) and need only consider the term P(D|h) to find the most probable hypothesis. Any hypothesis that maximizes P(D|h) in this way is called a maximum likelihood (ML) hypothesis.
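Written out, with the constant prior dropped from Equation (6.2):

$$h_{ML} \equiv \arg\max_{h \in H} P(D \mid h)$$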
6.3 BAYES THEOREM AND CONCEPT LEARNING
6.3.1 Brute-Force Bayes Concept Learning
This algorithm may require significant computation, because it applies Bayes theorem to each hypothesis in H to calculate P(h|D). While this may prove impractical for large hypothesis spaces, the algorithm is still of interest because it provides a standard against which we may judge the performance of other concept learning algorithms.
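To make the algorithm concrete, here is a minimal Python sketch of brute-force MAP learning. It assumes a finite hypothesis space given as a list of Python predicates, noise-free training data, and a uniform prior; the names (p_data_given_h, brute_force_map) and the toy hypotheses are my own illustrative choices, not notation from the text:

```python
def p_data_given_h(h, data):
    """P(D|h) under the noise-free assumption of this section:
    1 if h agrees with every training example, 0 otherwise."""
    return 1.0 if all(h(x) == label for x, label in data) else 0.0

def brute_force_map(hypotheses, data):
    """Apply Bayes theorem to every h in H and return an h maximizing
    P(h|D). Assumes at least one hypothesis is consistent with the data."""
    prior = 1.0 / len(hypotheses)          # uniform prior P(h)
    joint = [p_data_given_h(h, data) * prior for h in hypotheses]
    p_data = sum(joint)                    # P(D) normalizes the posterior
    posteriors = [p / p_data for p in joint]
    i_best = max(range(len(hypotheses)), key=lambda i: posteriors[i])
    return hypotheses[i_best], posteriors[i_best]

# Toy example: hypotheses over integers, data labeled by "x is even".
H = [lambda x: x % 2 == 0, lambda x: x > 3, lambda x: True]
D = [(2, True), (4, True), (3, False)]
h_map, p = brute_force_map(H, D)
print(p)  # 1.0: only the first hypothesis is consistent with D
```

The cost of the double pass over `hypotheses` is exactly the point made above: the work grows with |H|, which is why the brute-force algorithm serves as a benchmark rather than a practical method.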
(The derivation is omitted here.)
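For completeness, the conclusion of the omitted derivation, under this section's assumptions (noise-free training data and a uniform prior over H), is that the posterior is spread uniformly over the version space $VS_{H,D}$, the set of hypotheses consistent with D:

$$P(h \mid D) = \begin{cases} \frac{1}{|VS_{H,D}|} & \text{if } h \text{ is consistent with } D \\ 0 & \text{otherwise} \end{cases}$$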