Naive Bayes Theorem And Application - Application

The Naive Bayes classifier can be applied to text classification tasks such as classifying email as spam or not spam, identifying news categories, and detecting sentiment.
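As a concrete illustration of the spam example, here is a minimal sketch using scikit-learn's multinomial Naive Bayes (the tiny training set below is made up purely for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A toy labelled corpus: spam vs. not spam.
train_texts = ["win a free prize now", "meeting rescheduled to monday",
               "free money click here", "lunch tomorrow?"]
train_labels = ["spam", "not spam", "spam", "not spam"]

# Bag-of-words features + Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
print(model.predict(["claim your free prize"]))
```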

Autoclass: Naive Bayes for clustering

Autoclass is just like Naive Bayes, but it is designed for unsupervised learning. We are given unlabeled training data $D_1, \ldots, D_N$, where $D_i = (x_{i1}, \ldots, x_{ik})$ and $k$ is the number of attributes of instance $i$, with no class label such as win or fail. The goal is to learn a Naive Bayes model. We introduce two quantities: $P(C)$, the probability of class $C$, and $P(X_i \mid C)$, the probability of attribute $X_i$ given class $C$.

To solve this problem, we use maximum likelihood estimation, just like the approach discussed in Naive Bayes Theorem And Application - Theorem.
Parameters:
1. $\theta_C = P(C=T)$
2. $1 - \theta_C = P(C=F)$
3. $P(X_i=T \mid C=T) = \theta_{Ti}$
4. $P(X_i=F \mid C=T) = 1 - \theta_{Ti}$
5. $P(X_i=T \mid C=F) = \theta_{Fi}$
6. $P(X_i=F \mid C=F) = 1 - \theta_{Fi}$
7. $\theta = (\theta_C, \theta_{T1}, \ldots, \theta_{Tn}, \theta_{F1}, \ldots, \theta_{Fn})$

The approach is to find the $\theta$ that maximizes the likelihood:

$$L(\theta) = p(D \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$$
But this is a difficult problem: we do not have the sufficient statistics, because the class labels are missing.
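To see why, write the likelihood with the hidden class summed out (a sketch in my own notation, using the per-attribute indexing $x_{ji}$ that appears later in this post): each factor becomes a sum over the two possible labels, so the likelihood no longer factors into simple counts with closed-form maximizers.

$$L(\theta) = \prod_{j=1}^{N} \Big[ \theta_C \prod_{i=1}^{n} P(x_{ji} \mid C_j=T) + (1-\theta_C) \prod_{i=1}^{n} P(x_{ji} \mid C_j=F) \Big]$$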

EM (Expectation Maximization)

Expectation–maximization algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. —— Wikipedia

The problem now is that the data is not fully observed (the class labels are missing).
1. If we knew the sufficient statistics of the data, we could choose parameter values that maximize the likelihood, just as discussed in the theorem essay.
2. If we knew the model parameters, we could compute a probability distribution over the missing attributes. From these, we get the expected sufficient statistics.

Expected sufficient statistics

From the observed data and the model parameters we get the probability of every possible completion of the data (a guess of the label of each instance). Each completion then defines sufficient statistics (and hence a $\theta$ maximizing the likelihood). The expected sufficient statistics are the expectation, taken over all possible completions, of the sufficient statistics of each completion.
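In symbols (a compact restatement of the sentence above, in my own notation): if $c$ ranges over the possible completions and $N_{(\cdot)}(c)$ is a count under completion $c$, then

$$E[N_{(\cdot)}] = \sum_{c} P(c \mid D, \theta)\, N_{(\cdot)}(c)$$

which is exactly how the expectations are computed in the worked example below.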

Now we can give the general form of the EM algorithm:
Repeat:
  $\theta_{old} = \theta$
  E-step (Expectation): compute the expected sufficient statistics.
  M-step (Maximization): choose $\theta$ so as to maximize the likelihood of the expected sufficient statistics.
Until $\theta$ is close to $\theta_{old}$

Example: E-step

$$\theta_C = 0.7, \qquad \theta_{T1} = 0.9, \quad \theta_{F1} = 0.3, \qquad \theta_{T2} = 0.6, \quad \theta_{F2} = 0.2$$

The dataset has two instances (columns) and two attributes (rows):

$$D = \begin{bmatrix} F & T \\ T & T \end{bmatrix}$$

In the E-step we consider every possible completion of the data, i.e. every way of appending a row of class labels to the data matrix. With two instances there are four completions:
1st:
$$\begin{bmatrix} F & T \\ T & T \\ F & F \end{bmatrix}$$

2nd:
$$\begin{bmatrix} F & T \\ T & T \\ F & T \end{bmatrix}$$

3rd:
$$\begin{bmatrix} F & T \\ T & T \\ T & F \end{bmatrix}$$

4th:
$$\begin{bmatrix} F & T \\ T & T \\ T & T \end{bmatrix}$$

Example: M-step

The probability of the first completion (both class labels F) is

$$P(\text{Completion}_1 \mid \theta) \propto P(X_1=F, X_2=T, C=F)\, P(X_1=T, X_2=T, C=F) = \big(P(C=F)P(X_1=F \mid C=F)P(X_2=T \mid C=F)\big)\,\big(P(C=F)P(X_1=T \mid C=F)P(X_2=T \mid C=F)\big) = (0.3 \cdot 0.7 \cdot 0.2)(0.3 \cdot 0.3 \cdot 0.2) = 0.000756$$

With the same procedure, we can get:

$$P(\text{Completion}_2 \mid \theta) \propto 0.3 \cdot 0.7 \cdot 0.2 \cdot 0.7 \cdot 0.9 \cdot 0.6 = 0.015876$$
$$P(\text{Completion}_3 \mid \theta) \propto 0.7 \cdot 0.1 \cdot 0.6 \cdot 0.3 \cdot 0.3 \cdot 0.2 = 0.000756$$
$$P(\text{Completion}_4 \mid \theta) \propto 0.7 \cdot 0.1 \cdot 0.6 \cdot 0.7 \cdot 0.9 \cdot 0.6 = 0.015876$$
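To turn these unnormalized scores into weights over the four completions, divide each by their sum:

$$P(\text{Completion}_1 \mid D, \theta) = \frac{0.000756}{0.000756 + 0.015876 + 0.000756 + 0.015876} = \frac{0.000756}{0.033264} \approx 0.0227$$

and likewise $P(\text{Completion}_3 \mid D, \theta) \approx 0.0227$ and $P(\text{Completion}_2 \mid D, \theta) = P(\text{Completion}_4 \mid D, \theta) \approx 0.4773$. These are the weights used in the expectations below.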

Iteration with EM steps

Now one iteration is finished: we "guessed" the completions (i.e. the labels) and maximize the parameters $\theta$ using them, after which we can start a new iteration with the new $\theta$. The explanation of the expectation symbols is given in Naive Bayes Theorem And Application - Theorem.

$$E[N_T] = 0.0227 \cdot 0 + 0.4773 \cdot 1 + 0.0227 \cdot 1 + 0.4773 \cdot 2 = 1.4546$$
$$E[N_F] = N - E[N_T] = 0.5454$$
$$E[N_{T1,T}] = 0.0227 \cdot 0 + 0.4773 \cdot 1 + 0.0227 \cdot 0 + 0.4773 \cdot 1 = 0.9546$$
$$E[N_{F1,T}] = 0.0227 \cdot 1 + 0.4773 \cdot 0 + 0.0227 \cdot 1 + 0.4773 \cdot 0 = 0.0454$$
$$E[N_{T2,T}] = 1.4546, \qquad E[N_{F2,T}] = 0.5454$$

Now we can make the maximum likelihood estimates (the second M-step):

$$\theta_C = E[N_T]/N = 1.4546/2 = 0.7273$$
$$\theta_{T1} = E[N_{T1,T}]/E[N_T] = 0.9546/1.4546 = 0.6563$$
$$\theta_{F1} = E[N_{F1,T}]/E[N_F] = 0.0454/0.5454 = 0.0832$$
$$\theta_{T2} = E[N_{T2,T}]/E[N_T] = 1.4546/1.4546 = 1$$
$$\theta_{F2} = E[N_{F2,T}]/E[N_F] = 0.5454/0.5454 = 1$$
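The arithmetic above can be checked with a short script that enumerates all completions explicitly. This is a minimal sketch in plain Python (variable and function names are my own); it reproduces the expected counts and the new parameter values:

```python
from itertools import product

# Parameters from the example (T encoded as 1, F as 0).
theta_C = 0.7
theta_T = [0.9, 0.6]   # theta_T1, theta_T2
theta_F = [0.3, 0.2]   # theta_F1, theta_F2

# Data: instance 1 = (X1=F, X2=T), instance 2 = (X1=T, X2=T).
D = [(0, 1), (1, 1)]
N = len(D)

def p_joint(x, c):
    """P(x, C=c) under the Naive Bayes model."""
    p = theta_C if c == 1 else 1 - theta_C
    params = theta_T if c == 1 else theta_F
    for xi, th in zip(x, params):
        p *= th if xi == 1 else 1 - th
    return p

# Score every completion (one class label per instance).
weights = {}
for labels in product([0, 1], repeat=N):
    w = 1.0
    for x, c in zip(D, labels):
        w *= p_joint(x, c)
    weights[labels] = w
total = sum(weights.values())          # 0.033264

# Expected sufficient statistics, weighted by normalized completion probability.
E_NT = sum(w / total * sum(labels) for labels, w in weights.items())
E_NTi_T = [sum(w / total * sum(1 for x, c in zip(D, labels) if x[i] == 1 and c == 1)
               for labels, w in weights.items()) for i in range(2)]
E_NFi_T = [sum(w / total * sum(1 for x, c in zip(D, labels) if x[i] == 1 and c == 0)
               for labels, w in weights.items()) for i in range(2)]

print(E_NT)                               # ~1.4545
print(E_NT / N)                           # new theta_C  ~0.7273
print([e / E_NT for e in E_NTi_T])        # new theta_Ti ~[0.6563, 1.0]
print([e / (N - E_NT) for e in E_NFi_T])  # new theta_Fi ~[0.0833, 1.0]
```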

EM algorithm in Naive Bayes

In fact, the number of completions is exponential in the number of instances, so we cannot enumerate them all in general.

Key observation

We do not care about the exact completions, only about the expected sufficient statistics, and each instance contributes to them separately. So we can:
1. enumerate the completions of each instance separately,
2. compute the probability of each completion, and
3. compute the expected contribution of that instance to the sufficient statistics.

E-step for Naive Bayes:

Compute the expectations according to the current parameters $\theta$:
1. $E[N_T]$ is the expected number of instances in which the class is T.
2. Each instance has a probability of the class being T.
3. Each instance contributes that probability to $E[N_T]$.
4. In symbols:

$$E[N_T] = \sum_{j=1}^{N} P(C_j=T \mid x_{j1}, \ldots, x_{jn}), \qquad \text{where } P(C_j=T \mid x_{j1}, \ldots, x_{jn}) \propto P(C_j=T) \prod_{i=1}^{n} P(x_{ji} \mid C_j=T)$$

5. $E[N_{Ti,T}]$ is the expected number of instances in which the class is T and $X_i$ is T. If an instance has $X_i \neq T$, it contributes 0 to $E[N_{Ti,T}]$.
6. If an instance has $X_i = T$, it contributes the probability that the class is T to $E[N_{Ti,T}]$.
7. In symbols:

$$E[N_{Ti,T}] = \sum_{j:\, x_{ji}=T} P(C_j=T \mid x_{j1}, \ldots, x_{jn})$$

with the same per-instance posterior as above.

M-step for Naive Bayes:

Maximize the likelihood according to the expected sufficient statistics, exactly as in the fully observed case:

$$\theta_C = \frac{E[N_T]}{N}, \qquad \theta_{Ti} = \frac{E[N_{Ti,T}]}{E[N_T]}, \qquad \theta_{Fi} = \frac{E[N_{Fi,T}]}{N - E[N_T]}$$

For notational convenience, we encode T as 1 and F as 0; then for instance $j$:

$$P(x_{ji} \mid C_j=T) = \theta_{Ti}^{\,x_{ji}} (1-\theta_{Ti})^{1-x_{ji}}, \qquad P(x_{ji} \mid C_j=F) = \theta_{Fi}^{\,x_{ji}} (1-\theta_{Fi})^{1-x_{ji}}$$
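As a small illustration of this encoding, here is a minimal Python sketch (the function name is my own) of the per-class likelihood of one instance; the same product appears as $p_T$ and $p_F$ in the Autoclass pseudocode below:

```python
def instance_likelihood(x, theta):
    """Product over attributes of theta_i**x_i * (1 - theta_i)**(1 - x_i)."""
    p = 1.0
    for xi, th in zip(x, theta):
        p *= th ** xi * (1 - th) ** (1 - xi)
    return p

# Instance (X1=T, X2=T) under theta_T = [0.9, 0.6] gives 0.9 * 0.6 = 0.54.
print(instance_likelihood([1, 1], [0.9, 0.6]))
```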

Autoclass

Set $\theta_C$, $\theta_{Ti}$ and $\theta_{Fi}$ to arbitrary values for all attributes, then repeat the EM steps until convergence:
1. Expectation step
2. Maximization step

In the expectation step, initialize

$$E[N_T] = 0, \qquad E[N_{Ti,T}] = 0, \qquad E[N_{Fi,T}] = 0$$

For each instance $D_j$:

$$p_T = \theta_C \prod_{i=1}^{n} \theta_{Ti}^{\,x_{ji}} (1-\theta_{Ti})^{1-x_{ji}}, \qquad p_F = (1-\theta_C) \prod_{i=1}^{n} \theta_{Fi}^{\,x_{ji}} (1-\theta_{Fi})^{1-x_{ji}}$$

$$q = \frac{p_T}{p_T + p_F}, \qquad E[N_T] \leftarrow E[N_T] + q$$

and for each attribute $i$, if $x_{ji} = T$:

$$E[N_{Ti,T}] \leftarrow E[N_{Ti,T}] + q, \qquad E[N_{Fi,T}] \leftarrow E[N_{Fi,T}] + (1-q)$$

In the maximization step:

$$\theta_C = \frac{E[N_T]}{N}$$

and for each attribute $i$:

$$\theta_{Ti} = \frac{E[N_{Ti,T}]}{E[N_T]}, \qquad \theta_{Fi} = \frac{E[N_{Fi,T}]}{N - E[N_T]}$$
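Putting the two steps together, here is a runnable sketch of the whole Autoclass loop in plain Python (the function name, the convergence tolerance, and the iteration cap are my own additions; the updates follow the pseudocode above):

```python
def autoclass_em(data, theta_C, theta_T, theta_F, max_iters=100, tol=1e-6):
    """EM for two-cluster Naive Bayes. data is a list of 0/1 attribute tuples
    (T encoded as 1, F as 0)."""
    N = len(data)
    n = len(data[0])
    for _ in range(max_iters):
        old = (theta_C, list(theta_T), list(theta_F))

        # E-step: accumulate expected sufficient statistics.
        E_NT = 0.0
        E_NTi_T = [0.0] * n
        E_NFi_T = [0.0] * n
        for x in data:
            pT = theta_C
            pF = 1 - theta_C
            for i in range(n):
                pT *= theta_T[i] ** x[i] * (1 - theta_T[i]) ** (1 - x[i])
                pF *= theta_F[i] ** x[i] * (1 - theta_F[i]) ** (1 - x[i])
            q = pT / (pT + pF)              # P(C=T | x, theta)
            E_NT += q
            for i in range(n):
                if x[i] == 1:               # attribute value is T
                    E_NTi_T[i] += q
                    E_NFi_T[i] += 1 - q

        # M-step: maximum likelihood estimates from the expected statistics.
        theta_C = E_NT / N
        theta_T = [E_NTi_T[i] / E_NT for i in range(n)]
        theta_F = [E_NFi_T[i] / (N - E_NT) for i in range(n)]

        # Stop when the parameters barely change.
        if abs(theta_C - old[0]) < tol and \
           all(abs(a - b) < tol for a, b in zip(theta_T, old[1])) and \
           all(abs(a - b) < tol for a, b in zip(theta_F, old[2])):
            break
    return theta_C, theta_T, theta_F


# The two-instance example from the next section, run for a single iteration:
D = [(0, 1), (1, 1)]                        # instance 1 = (F, T), instance 2 = (T, T)
print(autoclass_em(D, 0.7, [0.9, 0.6], [0.3, 0.2], max_iters=1))
# roughly (0.727, [0.656, 1.0], [0.083, 1.0]), matching the hand calculation below
```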

Example of Autoclass

E-step: given the initial "guess" of the parameters $\theta$:

$$\theta_C = 0.7, \qquad \theta_{T1} = 0.9, \quad \theta_{F1} = 0.3, \qquad \theta_{T2} = 0.6, \quad \theta_{F2} = 0.2$$

and the dataset $D$:

| instance | attribute 1 ($X_1$) | attribute 2 ($X_2$) |
|----------|---------------------|---------------------|
| 1        | F                   | T                   |
| 2        | T                   | T                   |

the initial expected statistics are all zero:

$$E[N_T] = 0, \quad E[N_{T1,T}] = 0, \quad E[N_{F1,T}] = 0, \quad E[N_{T2,T}] = 0, \quad E[N_{F2,T}] = 0$$

Detail of EM steps in Autoclass

The first E-step, for instance 1:

$$p_T = \theta_C \prod_{i=1}^{2} \theta_{Ti}^{\,x_{1i}} (1-\theta_{Ti})^{1-x_{1i}} = 0.7 \cdot (0.9^0)(1-0.9)^{1} \cdot (0.6^1)(1-0.6)^{0} = 0.042$$
$$p_F = (1-\theta_C) \prod_{i=1}^{2} \theta_{Fi}^{\,x_{1i}} (1-\theta_{Fi})^{1-x_{1i}} = 0.3 \cdot (0.3^0)(1-0.3)^{1} \cdot (0.2^1)(1-0.2)^{0} = 0.042$$
$$q = \frac{0.042}{0.042 + 0.042} = 0.5$$

After processing instance 1, the expected statistics are:

$$E[N_T] = 0.5, \quad E[N_{T1,T}] = 0, \quad E[N_{F1,T}] = 0, \quad E[N_{T2,T}] = 0.5, \quad E[N_{F2,T}] = 0.5$$

For instance 2:

$$p_T = \theta_C \prod_{i=1}^{2} \theta_{Ti}^{\,x_{2i}} (1-\theta_{Ti})^{1-x_{2i}} = 0.7 \cdot (0.9^1)(0.1^0) \cdot (0.6^1)(0.4^0) = 0.378$$
$$p_F = (1-\theta_C) \prod_{i=1}^{2} \theta_{Fi}^{\,x_{2i}} (1-\theta_{Fi})^{1-x_{2i}} = 0.3 \cdot (0.3^1)(0.7^0) \cdot (0.2^1)(0.8^0) = 0.018$$
$$q = \frac{0.378}{0.378 + 0.018} \approx 0.95$$

After processing instance 2, the expected statistics are:

$$E[N_T] = 1.45, \quad E[N_{T1,T}] = 0.95, \quad E[N_{F1,T}] = 0.05, \quad E[N_{T2,T}] = 1.45, \quad E[N_{F2,T}] = 0.55$$

In the first M-step, we maximize the parameters according to the expected statistics:

$$\theta_C = \frac{E[N_T]}{N} = \frac{1.45}{2} = 0.725$$

For attribute 1:

$$\theta_{T1} = \frac{E[N_{T1,T}]}{E[N_T]} = \frac{0.95}{1.45} \approx 0.655, \qquad \theta_{F1} = \frac{E[N_{F1,T}]}{N - E[N_T]} = \frac{0.05}{0.55} \approx 0.09$$

For attribute 2:

$$\theta_{T2} = \frac{E[N_{T2,T}]}{E[N_T]} = \frac{1.45}{1.45} = 1.0, \qquad \theta_{F2} = \frac{E[N_{F2,T}]}{N - E[N_T]} = \frac{0.55}{0.55} = 1.0$$

Convergence

EM improves the likelihood on every iteration, and it is guaranteed to converge to a maximum of the likelihood function, but that maximum may only be a local maximum. A practical tip when using EM: do not start with symmetric parameter values, and in particular do not start with uniform parameters.
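As a minimal sketch of that tip (the function name and the jitter value are my own; any asymmetric initialization works), one way to break symmetry is to add a small random perturbation around 0.5:

```python
import random

def random_init(n_attributes, jitter=0.1):
    """Initialize parameters near 0.5 with random jitter to avoid a symmetric start."""
    rnd = lambda: 0.5 + random.uniform(-jitter, jitter)
    theta_C = rnd()
    theta_T = [rnd() for _ in range(n_attributes)]
    theta_F = [rnd() for _ in range(n_attributes)]
    return theta_C, theta_T, theta_F

print(random_init(2))
```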

Reference

Most of the content in this essay comes from CMU machine learning course notes, but I have forgotten the source link. Sorry!
