Model
Naive Bayes learns the joint probability distribution from the hypothesis space $\mathcal{F} = \{P \mid P(Y, X)\}$ according to Bayes' theorem, so naive Bayes is a generative model. By Bayes' formula, we therefore need to obtain from the data the prior probability distribution $P(Y=c_k)$ and the conditional probability distribution $P(X=x \mid Y=c_k)$.
Strategy
Maximizing the posterior probability
$$y = f(x) = \arg\max_{c_k} \cfrac{P(Y=c_k)\,\prod_j P(X^j = x^j \mid Y=c_k)}{\sum_k P(Y=c_k)\,\prod_j P(X^j = x^j \mid Y=c_k)}$$
Since the denominator is the same for every class $c_k$, this is equivalent to
$$y = f(x) = \arg\max_{c_k} P(Y=c_k)\,\prod_j P(X^j = x^j \mid Y=c_k)$$
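The decision rule above can be sketched directly: score each class by its prior times the product of the per-feature conditionals, and return the argmax. The probability tables below are hypothetical hand-filled estimates, not taken from any real dataset.

```python
# Sketch of y = argmax_{c_k} P(Y=c_k) * prod_j P(X^j = x^j | Y=c_k).
# prior[c] = P(Y = c); cond[j][c][v] = P(X^j = v | Y = c). Toy values, assumed.
prior = {1: 0.6, -1: 0.4}
cond = {
    0: {1: {"S": 0.2, "M": 0.3, "L": 0.5}, -1: {"S": 0.5, "M": 0.3, "L": 0.2}},
    1: {1: {0: 0.1, 1: 0.9}, -1: {0: 0.7, 1: 0.3}},
}

def predict(x):
    """Return the class maximizing the unnormalized posterior score."""
    best_c, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x):
            score *= cond[j][c][v]  # multiply in P(X^j = v | Y = c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

print(predict(["L", 1]))  # → 1 (score 0.27 vs 0.024 for class -1)
```

Because the denominator of the posterior is dropped, the scores are not probabilities, but the argmax is unchanged.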
Algorithm
Directly computing the conditional probability would involve an exponential number of parameters, so the probability-estimation problem is converted into a parameter-estimation problem.
Maximum likelihood estimation
Exercise
Suppose there are $N$ mutually independent data points, with prior probability $p = P(Y=c_k)$.
The likelihood function is
$$L(p) = P(y_1, y_2, \dots, y_N) = p^{\sum_{i=1}^{N} I(y_i = c_k)}\,(1-p)^{\sum_{i=1}^{N} I(y_i \ne c_k)}$$
Taking the derivative and setting it to zero gives
$$p^{\sum_{i=1}^{N} I(y_i = c_k) - 1}\,(1-p)^{\sum_{i=1}^{N} I(y_i \ne c_k) - 1}\left((1-p)\sum_{i=1}^{N} I(y_i = c_k) - p\sum_{i=1}^{N} I(y_i \ne c_k)\right) = 0$$
Solving this yields
$$p = P(Y=c_k) = \cfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}$$
Since naive Bayes makes independence assumptions on both the prior and the conditional probabilities, the joint probabilities are likewise estimated independently, and we obtain
$$P(Y=c_k,\ X^{(j)}=a_{jl}) = \cfrac{\sum_{i=1}^{N} I(X^{(j)}=a_{jl},\ y_i = c_k)}{N}$$
so the conditional probability is
$$P(X^{(j)}=a_{jl} \mid Y=c_k) = \cfrac{\sum_{i=1}^{N} I(X^{(j)}=a_{jl},\ y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}$$
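In code, these maximum likelihood estimates reduce to counting, exactly as the formulas indicate. The toy dataset below is hypothetical, just to make the counts concrete.

```python
from collections import Counter

# Each sample is (x, y), with x a tuple of discrete feature values. Toy data, assumed.
data = [(("S", 0), -1), (("S", 1), -1), (("M", 1), 1),
        (("L", 1), 1), (("L", 0), 1), (("M", 0), -1)]
N = len(data)

# Prior: P(Y=c_k) = #{y_i = c_k} / N
label_count = Counter(y for _, y in data)
prior = {c: n / N for c, n in label_count.items()}

# Conditional: P(X^(j)=a | Y=c_k) = #{x_i^(j)=a, y_i=c_k} / #{y_i=c_k}
joint_count = Counter((j, v, y) for x, y in data for j, v in enumerate(x))
cond = {k: n / label_count[k[2]] for k, n in joint_count.items()}

print(prior[1])           # → 0.5      (3 of 6 samples have y = 1)
print(cond[(0, "L", 1)])  # → 0.666...  (2 of the 3 y=1 samples have x^0 = "L")
```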
Bayesian estimation
Maximum likelihood estimation can produce probability estimates equal to zero, so Bayesian estimation is used instead.
Exercise
When nothing is known in advance, we can assume the prior probability $p = P(Y=c_k)$ follows a uniform distribution over the $K$ classes, i.e.
$$pK - 1 = 0 \quad (1)$$
Moreover, from the maximum likelihood estimate $p = P(Y=c_k) = \cfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}$, equivalently
$$pN - \sum_{i=1}^{N} I(y_i = c_k) = 0 \quad (2)$$
We therefore introduce a multiplier $\lambda$ and combine the two constraints as $(1)\cdot\lambda + (2) = 0$, i.e.
$$\lambda(pK - 1) + pN - \sum_{i=1}^{N} I(y_i = c_k) = 0$$
Solving gives
$$p = P_\lambda(Y=c_k) = \cfrac{\sum_{i=1}^{N} I(y_i = c_k) + \lambda}{N + K\lambda}$$
Similarly,
$$P_\lambda(X^{(j)}=a_{jl} \mid Y=c_k) = \cfrac{\sum_{i=1}^{N} I(X^{(j)}=a_{jl},\ y_i = c_k) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j\lambda}$$
where $S_j$ is the number of possible values of the $j$-th feature.
Note: $\lambda = 1$ corresponds to Laplace smoothing.
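A minimal sketch of the smoothed estimate, with hypothetical counts: suppose among $n_c = 5$ samples of class $c_k$, feature $j$ never takes the value $a_{jl}$, and the feature has $S_j = 3$ possible values.

```python
def smoothed_cond(count, n_c, S_j, lam=1.0):
    """Bayesian estimate (count + lam) / (n_c + S_j * lam); lam=1 is Laplace smoothing."""
    return (count + lam) / (n_c + S_j * lam)

print(smoothed_cond(0, 5, 3))           # → 0.125: a zero count no longer gives probability 0
print(smoothed_cond(0, 5, 3, lam=0.0))  # → 0.0: lam=0 recovers the maximum likelihood estimate
```

This matters because a single zero factor in $\prod_j P(X^j = x^j \mid Y = c_k)$ would zero out the whole posterior score for that class.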
See the answers to Chapter 4.