Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the class. Given a class variable y and a dependent feature vector x_1 through x_n, Bayes' theorem states the following relationship:

P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

Using the naive independence assumption that

P(x_i | y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i | y),

for all i, this relationship is simplified to

P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}

Since P(x_1, \dots, x_n) is constant given the input, we can use the following classification rule:

P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)

\Downarrow

\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),

and we can use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(x_i \mid y); the former is then the relative frequency of class y in the training set.
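
Concretely, a minimal sketch of these estimates (the symbols N_y and N are not in the original text; they denote the number of training samples of class y and the total number of training samples):

\hat{P}(y) = \frac{N_y}{N}

In practice the product of probabilities is usually computed as a sum of logarithms to avoid numerical underflow, so the decision rule becomes:

\hat{y} = \arg\max_y \left( \log \hat{P}(y) + \sum_{i=1}^{n} \log \hat{P}(x_i \mid y) \right)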

The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(x_i \mid y).

Example:

Spam filtering: y is the class label: Spam / Regular.

x stands for the different words in the emails.

P(y) is the probability of y-type emails among all emails, also known as the prior probability.

P(x_i \mid y) is the probability of word "x_i" given the email type; in other words, the frequency of word "x_i" across all y-type emails.

Then we simply compare the values of the final formula for the different y types; the type with the higher value is taken as the output class label. A small code sketch of this computation follows below.
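
As a minimal sketch of this computation (the toy training emails and the function names below are made up for illustration; Laplace smoothing is included so that unseen words do not zero out the product, as discussed in the note that follows):

```python
import math
from collections import Counter

# Toy training data (hypothetical): each email is a list of words plus a label.
train = [
    (["win", "money", "now"], "spam"),
    (["cheap", "money", "offer"], "spam"),
    (["meeting", "schedule", "today"], "regular"),
    (["project", "meeting", "notes"], "regular"),
]

# Prior P(y): relative frequency of each class in the training set.
class_counts = Counter(label for _, label in train)
total_emails = sum(class_counts.values())
priors = {y: n / total_emails for y, n in class_counts.items()}

# Word counts per class, used to estimate P(x_i | y).
word_counts = {y: Counter() for y in class_counts}
for words, y in train:
    word_counts[y].update(words)

vocab = {w for words, _ in train for w in words}

def log_score(words, y, alpha=1.0):
    """log P(y) + sum_i log P(x_i | y), with Laplace smoothing alpha."""
    total = sum(word_counts[y].values())
    score = math.log(priors[y])
    for w in words:
        score += math.log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
    return score

def classify(words):
    # Pick the class with the highest (log) posterior score.
    return max(class_counts, key=lambda y: log_score(words, y))

print(classify(["cheap", "money"]))    # expected: spam
print(classify(["meeting", "today"]))  # expected: regular
```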

Note: real spam filtering is much more complex than this. You have to consider many other issues, such as handling rare words (Laplace smoothing), dealing with stop words like "and", "is", "a", "the", choosing the threshold for the final classification, defining the posterior probability in real applications, and so on. To learn more, see the wiki link below:

https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

Here is another example that goes into more detail:

http://blog.youkuaiyun.com/amds123/article/details/70173402 

In the scikit-learn package, MultinomialNB and BernoulliNB are both suitable for discrete data. The difference is that MultinomialNB works with occurrence counts, while BernoulliNB is designed for binary/boolean features.
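
A minimal sketch of how the two classifiers might be used on word features (the tiny corpus below is made up; CountVectorizer, MultinomialNB, and BernoulliNB are standard scikit-learn APIs):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Tiny made-up corpus with spam/regular labels.
emails = [
    "win money now",
    "cheap money offer",
    "meeting schedule today",
    "project meeting notes",
]
labels = ["spam", "spam", "regular", "regular"]

# Turn each email into a vector of word occurrence counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# MultinomialNB models the word counts directly.
mnb = MultinomialNB().fit(X, labels)

# BernoulliNB only cares whether a word occurs; binarize turns counts into 0/1.
bnb = BernoulliNB(binarize=0.0).fit(X, labels)

X_new = vectorizer.transform(["cheap meeting offer"])
print(mnb.predict(X_new), bnb.predict(X_new))
```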



