Naive Bayes classifier

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or on the presence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.

In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers often work much better in many complex real-world situations than one might expect. Recently, careful analysis of the Bayesian classification problem has shown that there are some theoretical reasons for the apparently unreasonable efficacy of naive Bayes classifiers.[1] An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.
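
As a concrete illustration of estimating only per-class means and variances, the following is a minimal Python sketch of maximum-likelihood training and prediction for a Gaussian naive Bayes model. The toy data, class labels, and query point are invented purely for illustration, and a production implementation would at least guard against zero variances.

import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature means and variances by maximum likelihood."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    priors, stats = {}, {}
    for label, rows in grouped.items():
        priors[label] = len(rows) / len(X)
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
                     for j in range(n_features)]          # ML estimates; no covariances are needed
        stats[label] = (means, variances)
    return priors, stats

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(priors, stats, x):
    """Return the class maximising log p(C) + sum_i log p(F_i | C)."""
    best_label, best_score = None, float("-inf")
    for label, (means, variances) in stats.items():
        score = math.log(priors[label])
        for xj, m, v in zip(x, means, variances):
            score += log_gaussian(xj, m, v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, invented data: two features per example, two classes.
X = [[5.0, 1.2], [5.2, 1.0], [6.8, 2.3], [7.1, 2.1]]
y = ["small", "small", "large", "large"]
priors, stats = fit_gaussian_nb(X, y)
print(predict(priors, stats, [6.9, 2.2]))   # prints "large"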

The naive Bayes probabilistic model

Abstractly, the probability model for a classifier is a conditional model

p(C \vert F_1, \dots, F_n)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
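
To get a feel for the scale involved (a rough, illustrative count): with n = 30 binary-valued features, a direct table for p(C \vert F_1, \dots, F_n) would need an entry for each of the 2^30, roughly one billion, feature combinations for every class, far more than could be estimated reliably from typical training sets.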

Using Bayes' theorem, we write

p(C \vert F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \vert C)}{p(F_1, \dots, F_n)}.

In plain English the above equation can be written as

\mbox{posterior} = \frac{\mbox{prior} \times \mbox{likelihood}}{\mbox{evidence}}.
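
As a purely numerical illustration (the figures are invented): with a single feature F_1, taking the prior p(C) = 0.3, the likelihood p(F_1 \vert C) = 0.8, and the evidence p(F_1) = 0.5 gives

p(C \vert F_1) = \frac{0.3 \times 0.8}{0.5} = 0.48.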

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

p(C, F_1, \dots, F_n)

which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F_1, \dots, F_n)
= p(C)\, p(F_1, \dots, F_n \vert C)
= p(C)\, p(F_1 \vert C)\, p(F_2, \dots, F_n \vert C, F_1)
= p(C)\, p(F_1 \vert C)\, p(F_2 \vert C, F_1)\, p(F_3, \dots, F_n \vert C, F_1, F_2)
= p(C)\, p(F_1 \vert C)\, p(F_2 \vert C, F_1)\, p(F_3 \vert C, F_1, F_2)\, p(F_4, \dots, F_n \vert C, F_1, F_2, F_3)
= p(C)\, p(F_1 \vert C)\, p(F_2 \vert C, F_1)\, p(F_3 \vert C, F_1, F_2) \dots p(F_n \vert C, F_1, F_2, F_3, \dots, F_{n-1}).

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i. This means that

p(F_i \vert C, F_j) = p(F_i \vert C)

and so the joint model can be expressed as

p(C, F_1, \dots, F_n)
= p(C)\, p(F_1 \vert C)\, p(F_2 \vert C)\, p(F_3 \vert C) \cdots
= p(C) \prod_{i=1}^n p(F_i \vert C).

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C \vert F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^n p(F_i \vert C)

where Z is a scaling factor dependent only on F_1, \dots, F_n, i.e., a constant if the values of the feature variables are known.
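
To make the role of Z concrete, here is a minimal Python sketch of this factored computation. All of the probabilities below are invented, the feature names simply echo the apple example from the introduction, and this is an illustrative sketch rather than a reference implementation.

# Minimal sketch of classification with the factored model
# p(C | F_1,...,F_n) = (1/Z) * p(C) * prod_i p(F_i | C).
# All probabilities here are invented for illustration.

priors = {"apple": 0.6, "orange": 0.4}            # class priors p(C)
cond = {                                          # p(F_i = 1 | C) for binary features
    "apple":  {"red": 0.8, "round": 0.9, "about_4in": 0.7},
    "orange": {"red": 0.1, "round": 0.9, "about_4in": 0.6},
}

def posterior(observed):
    """observed maps feature name -> 0 or 1; returns the normalised p(C | F_1,...,F_n)."""
    unnormalised = {}
    for c in priors:
        score = priors[c]
        for feature, value in observed.items():
            p = cond[c][feature]
            score *= p if value == 1 else (1.0 - p)
        unnormalised[c] = score
    Z = sum(unnormalised.values())                # the scaling factor Z from the text
    return {c: s / Z for c, s in unnormalised.items()}

print(posterior({"red": 1, "round": 1, "about_4in": 1}))   # "apple" comes out most probable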

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(F_i \vert C). If there are k classes and if a model for each p(F_i \vert C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli-distributed features) are common, so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
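
For example, continuing the rough count from earlier: with k = 2, r = 1, and n = 30 binary features this comes to (2 − 1) + 30 × 1 × 2 = 61 parameters, compared with the roughly one billion table entries per class that the unfactored model would need.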
