Derivation of the Expectation-Maximization (EM) Algorithm

The EM Algorithm Explained in Detail

License: CC 4.0 BY-SA

Let X be a random vector. We want to find the [img]http://latex.codecogs.com/gif.latex?\theta[/img]
that maximizes [img]http://latex.codecogs.com/gif.latex?P(X|\theta)[/img]; this is the maximum likelihood estimate of [img]http://latex.codecogs.com/gif.latex?\theta[/img].
To make estimating [img]http://latex.codecogs.com/gif.latex?\theta[/img] easier, we usually work with the log-likelihood function:
[img]http://latex.codecogs.com/gif.latex?L(\theta)=\ln{P(X|\theta)}[/img]
The EM algorithm is an iterative procedure. Suppose that after the n-th iteration the current estimate of [img]http://latex.codecogs.com/gif.latex?\theta[/img] is [img]http://latex.codecogs.com/gif.latex?\theta_n[/img]. Since our goal is to maximize [img]http://latex.codecogs.com/gif.latex?L(\theta)[/img], we want the next update of [img]http://latex.codecogs.com/gif.latex?\theta[/img] to satisfy
[img]http://latex.codecogs.com/gif.latex?L(\theta)>L(\theta_n)[/img]
Equivalently, we want to maximize the difference between the two:
[img]http://latex.codecogs.com/gif.latex?L(\theta)-L(\theta_n)=\ln{P(X|\theta)}-\ln{P(X|\theta_n)}[/img]
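Now consider latent variables. A latent variable may be a variable that was not observed or is missing; sometimes latent variables are also introduced deliberately to make the maximum likelihood problem easier to solve, because the EM framework then simplifies the computation. Denoting the latent variable by Z, we have
[img]http://latex.codecogs.com/gif.latex?P(X|\theta)=\sum_z{P(X|z,\theta)P(z|\theta)}[/img]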
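As a concrete illustration of this marginalization, here is a minimal sketch for a hypothetical two-coin model that is not part of the original derivation: X is the number of heads in a fixed number of flips, the latent z in {0, 1} says which coin was flipped, and theta = (w, p0, p1) holds the probability of picking coin 0 and the two head probabilities. The function names and the numbers are assumptions chosen for illustration.

```python
from math import comb, log

def binom_pmf(k, n, p):
    """P(k heads in n flips) for a coin with head probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def marginal_likelihood(heads, flips, theta):
    """P(X | theta) = sum over z of P(X | z, theta) * P(z | theta)."""
    w, p0, p1 = theta
    prior = (w, 1 - w)   # P(z | theta) for z = 0, 1
    bias = (p0, p1)      # head probability of each coin
    return sum(binom_pmf(heads, flips, bias[z]) * prior[z] for z in (0, 1))

theta = (0.5, 0.8, 0.3)                # illustrative parameter values
px = marginal_likelihood(7, 10, theta)  # P(X = 7 heads | theta)
log_px = log(px)                        # L(theta) for this single observation
```

Summing `marginal_likelihood(k, 10, theta)` over k = 0..10 should give 1 up to rounding, which is a quick sanity check that the mixture is a valid distribution over X.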
Rewriting [img]http://latex.codecogs.com/gif.latex?L(\theta)-L(\theta_n)[/img] with this expansion gives:
[img]http://latex.codecogs.com/gif.latex?L(\theta)-L(\theta_n)=\ln\left(\sum_z{P(X|z,\theta)P(z|\theta)}\right)-\ln{P(X|\theta_n)}[/img]
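We now use Jensen's inequality, which holds because the logarithm is concave:
[img]http://latex.codecogs.com/gif.latex?\ln\sum_{i=1}^n\lambda_ix_i\ge\sum_{i=1}^n\lambda_i\ln(x_i)[/img]
where the constants satisfy [img]http://latex.codecogs.com/gif.latex?\lambda_i\ge0[/img] and [img]http://latex.codecogs.com/gif.latex?\sum_{i=1}^n\lambda_i=1[/img], and each [img]http://latex.codecogs.com/gif.latex?x_i>0[/img].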
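The inequality can be checked numerically. The following is only a sanity check under assumed random inputs, not a proof: it draws random weights that sum to 1 and random positive values, then measures the gap between the two sides. The helper name `jensen_gap` is hypothetical.

```python
import math
import random

def jensen_gap(weights, xs):
    """ln(sum_i w_i * x_i) - sum_i w_i * ln(x_i); non-negative by Jensen."""
    lhs = math.log(sum(w * x for w, x in zip(weights, xs)))
    rhs = sum(w * math.log(x) for w, x in zip(weights, xs))
    return lhs - rhs

random.seed(0)
gaps = []
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in range(5)]
    total = sum(raw)
    weights = [r / total for r in raw]                   # lambda_i >= 0, sum to 1
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]   # positive x_i
    gaps.append(jensen_gap(weights, xs))

min_gap = min(gaps)
# When all x_i are equal, both sides coincide and the gap vanishes.
equal_gap = jensen_gap([0.2, 0.3, 0.5], [4.0, 4.0, 4.0])
```

The gap closes when all the x_i are equal, which is the same mechanism that makes the EM bound tight at the current estimate.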
Applying the inequality to the sum over z, with the weights [img]http://latex.codecogs.com/gif.latex?\lambda_z=P(z|X,\theta_n)[/img]:
[img]http://latex.codecogs.com/gif.latex?L(\theta)-L(\theta_n)=\ln\left(\sum_z{P(X|z,\theta)P(z|\theta)}\right)-\ln{P(X|\theta_n)}[/img]
[img]http://latex.codecogs.com/gif.latex?=\ln\left(\sum_z{P(X|z,\theta)P(z|\theta)\frac{P(z|X,\theta_n)}{P(z|X,\theta_n)}}\right)-\ln{P(X|\theta_n)}[/img]
[img]http://latex.codecogs.com/gif.latex?=\ln\left(\sum_z{P(z|X,\theta_n)\frac{P(X|z,\theta)P(z|\theta)}{P(z|X,\theta_n)}}\right)-\ln{P(X|\theta_n)}[/img]
[img]http://latex.codecogs.com/gif.latex?\ge\sum_z{P(z|X,\theta_n)\ln\left(\frac{P(X|z,\theta)P(z|\theta)}{P(z|X,\theta_n)}\right)}-\ln{P(X|\theta_n)}[/img]
[img]http://latex.codecogs.com/gif.latex?=\sum_z{P(z|X,\theta_n)\ln\left(\frac{P(X|z,\theta)P(z|\theta)}{P(z|X,\theta_n)P(X|\theta_n)}\right)}[/img]
[img]http://latex.codecogs.com/gif.latex?\doteq\Delta(\theta|\theta_{n})[/img]
The last equality holds because [img]http://latex.codecogs.com/gif.latex?\sum_z{P(z|X,\theta_n)}=1[/img],
so that:
[img]http://latex.codecogs.com/gif.latex?\ln{P(X|\theta_n)}=\sum_z{P(z|X,\theta_n)\ln{P(X|\theta_n)}}[/img]
We can therefore write:
[img]http://latex.codecogs.com/gif.latex?L(\theta)\ge{L(\theta_n)+\Delta(\theta|\theta_n)}[/img]
For convenience, we define:
[img]http://latex.codecogs.com/gif.latex?l(\theta|\theta_n)\doteq{L(\theta_n)+\Delta(\theta|\theta_n)}[/img]
which gives
[img]http://latex.codecogs.com/gif.latex?L(\theta)\ge{l(\theta|\theta_n)}[/img]

We have now obtained a lower bound [img]http://latex.codecogs.com/gif.latex?{l(\theta|\theta_n)}[/img] on the log-likelihood [img]http://latex.codecogs.com/gif.latex?L(\theta)[/img].
We also observe that:
[img]http://latex.codecogs.com/gif.latex?{l(\theta_n|\theta_n)}[/img]
[img]http://latex.codecogs.com/gif.latex?=L(\theta_n)+\Delta(\theta_n|\theta_n)=L(\theta_n)+\sum_z{P(z|X,\theta_n)\ln\frac{P(X|z,\theta_n)P(z|\theta_n)}{P(z|X,\theta_n)P(X|\theta_n)}}[/img]
[img]http://latex.codecogs.com/gif.latex?=L(\theta_n)+\sum_z{P(z|X,\theta_n)\ln\frac{P(X,z|\theta_n)}{P(X,z|\theta_n)}}=L(\theta_n)+\sum_z{P(z|X,\theta_n)\ln{1}}=L(\theta_n)[/img]
So when [img]http://latex.codecogs.com/gif.latex?\theta=\theta_n[/img], the bound is tight:
[img]http://latex.codecogs.com/gif.latex?{l(\theta|\theta_n)}=L(\theta)[/img]
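Both properties of the bound, that it never exceeds L(theta) and that it touches L(theta) at theta = theta_n, can be checked numerically. The sketch below assumes a hypothetical two-coin model with a single observation (X is a head count, z picks the coin, theta = (w, p0, p1)); the model, the function names, and the numbers are illustrative assumptions, not part of the original derivation.

```python
from math import comb, log

def joint(heads, flips, z, theta):
    """P(X, z | theta) = P(X | z, theta) * P(z | theta)."""
    w, p0, p1 = theta
    prior = w if z == 0 else 1 - w
    p = p0 if z == 0 else p1
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads) * prior

def log_lik(heads, flips, theta):
    """L(theta) = ln P(X | theta)."""
    return log(sum(joint(heads, flips, z, theta) for z in (0, 1)))

def lower_bound(heads, flips, theta, theta_n):
    """l(theta | theta_n) = L(theta_n) + Delta(theta | theta_n)."""
    px_n = sum(joint(heads, flips, z, theta_n) for z in (0, 1))
    total = log(px_n)  # L(theta_n)
    for z in (0, 1):
        post = joint(heads, flips, z, theta_n) / px_n  # P(z | X, theta_n)
        total += post * log(joint(heads, flips, z, theta) / (post * px_n))
    return total

heads, flips = 7, 10
theta_n = (0.5, 0.8, 0.3)
# Vary only p0 on a grid; the other two parameters stay at their theta_n values.
grid = [(0.5, k / 20, 0.3) for k in range(1, 20)]
gaps = [log_lik(heads, flips, t) - lower_bound(heads, flips, t, theta_n)
        for t in grid]
gap_at_theta_n = (log_lik(heads, flips, theta_n)
                  - lower_bound(heads, flips, theta_n, theta_n))
```

Every entry of `gaps` should be non-negative, and `gap_at_theta_n` should be zero up to floating-point rounding.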

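Therefore any [img]http://latex.codecogs.com/gif.latex?\theta[/img] that raises [img]http://latex.codecogs.com/gif.latex?{l(\theta|\theta_n)}[/img] above [img]http://latex.codecogs.com/gif.latex?{l(\theta_n|\theta_n)}=L(\theta_n)[/img] also raises [img]http://latex.codecogs.com/gif.latex?L(\theta)[/img] above [img]http://latex.codecogs.com/gif.latex?L(\theta_n)[/img].
To get the largest guaranteed improvement, the EM algorithm chooses the [img]http://latex.codecogs.com/gif.latex?\theta[/img] that maximizes [img]http://latex.codecogs.com/gif.latex?{l(\theta|\theta_n)}[/img].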

Finally, we obtain:
[img]http://latex.codecogs.com/gif.latex?\theta_{n+1}=\arg\max_{\theta}\{l(\theta|\theta_n)\}[/img]
[img]http://latex.codecogs.com/gif.latex?=\arg\max_\theta\{L(\theta_n)+\sum_z{P(z|X,\theta_n)\ln\left(\frac{P(X|z,\theta)P(z|\theta)}{P(z|X,\theta_n)P(X|\theta_n)}\right)}\}[/img]
Dropping the terms that are constant with respect to [img]http://latex.codecogs.com/gif.latex?\theta[/img] gives:
[img]http://latex.codecogs.com/gif.latex?=\arg\max_\theta\{\sum_z{P(z|X,\theta_n)\ln\left(P(X|z,\theta)P(z|\theta)\right)}\}[/img]
[img]http://latex.codecogs.com/gif.latex?=\arg\max_\theta\{\sum_z{P(z|X,\theta_n)\ln\left(\frac{P(X,z,\theta)}{P(z,\theta)}\frac{P(z,\theta)}{P(\theta)}\right)}\}[/img]
[img]http://latex.codecogs.com/gif.latex?=\arg\max_\theta\{\sum_z{P(z|X,\theta_n)\ln{P(X,z|\theta)}}\}[/img]
[img]http://latex.codecogs.com/gif.latex?=\arg\max_\theta\{E_{Z|X,\theta_n}\{\ln{P(X,z|\theta)}\}\}[/img]

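So EM consists of the following two steps, repeated until convergence:
1. E-step: compute the conditional expectation [img]http://latex.codecogs.com/gif.latex?E_{Z|X,\theta_n}\{\ln{P(X,z|\theta)}\}[/img]
2. M-step: find the [img]http://latex.codecogs.com/gif.latex?\theta[/img] that maximizes this conditional expectation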
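The two steps above can be sketched for a small concrete model. This is a minimal illustration under stated assumptions, not a general implementation: the model is a hypothetical mixture of two coins, each observation is the head count from `flips` tosses of one unknown coin, theta = (w, p0, p1), and the data are made-up numbers. For this model the E-step reduces to computing the posterior P(z_i = 0 | x_i, theta_n) for each observation, and the M-step has a closed form.

```python
from math import comb, log

def component_lik(heads, flips, p):
    """P(x | z, theta): binomial probability of the head count."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

def log_likelihood(data, flips, theta):
    """L(theta) = sum_i ln sum_z P(x_i | z, theta) P(z | theta)."""
    w, p0, p1 = theta
    return sum(log(w * component_lik(h, flips, p0)
                   + (1 - w) * component_lik(h, flips, p1)) for h in data)

def em_step(data, flips, theta):
    """One EM iteration: returns theta_{n+1} given theta_n."""
    w, p0, p1 = theta
    # E-step: posterior probability that each observation came from coin 0.
    resp = []
    for h in data:
        a = w * component_lik(h, flips, p0)
        b = (1 - w) * component_lik(h, flips, p1)
        resp.append(a / (a + b))
    # M-step: closed-form maximizer of the expected complete-data log-likelihood.
    r0 = sum(resp)
    r1 = len(data) - r0
    new_w = r0 / len(data)
    new_p0 = sum(r * h for r, h in zip(resp, data)) / (r0 * flips)
    new_p1 = sum((1 - r) * h for r, h in zip(resp, data)) / (r1 * flips)
    return (new_w, new_p0, new_p1)

data = [5, 9, 8, 4, 7]     # made-up head counts, 10 flips each
flips = 10
theta = (0.5, 0.6, 0.5)    # arbitrary starting point theta_0
history = [log_likelihood(data, flips, theta)]
for _ in range(50):
    theta = em_step(data, flips, theta)
    history.append(log_likelihood(data, flips, theta))
```

Because each iteration maximizes a lower bound that is tight at the current estimate, the values in `history` should never decrease. EM is only guaranteed to reach a stationary point, so the result can depend on the starting value.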