《统计学习方法》 (Chapter 9): The EM Algorithm and Its Extensions

The EM algorithm handles parameter estimation for probabilistic models that contain hidden variables. This post covers the motivation for the EM algorithm, its derivation, its application to unsupervised learning and to Gaussian mixture models, and its extensions: the maximization-maximization algorithm for the F function and the GEM algorithm. By iterating an E step and an M step, EM raises the likelihood of the model step by step until convergence.

Introducing the EM Algorithm

A probabilistic model sometimes contains both observed variables and hidden (latent) variables. In that case the parameters cannot be estimated by maximum likelihood directly, because the complete data are never observed. The EM algorithm is a maximum likelihood estimation algorithm for models with hidden variables.

The EM Algorithm

Conventionally, $Y$ denotes the data of the observed random variable and $Z$ the data of the hidden random variable; $Y$ and $Z$ together are called the complete data, and $Y$ alone the incomplete data. Suppose the observed data $Y$ has probability distribution $P(Y\mid\theta)$, where $\theta$ is the parameter. Then the likelihood function of the incomplete data $Y$ is $P(Y\mid\theta)$ and its log-likelihood is $L(\theta)=\log P(Y\mid\theta)$. If the joint probability distribution of $Y$ and $Z$ is $P(Y,Z\mid\theta)$, the log-likelihood of the complete data is $\log P(Y,Z\mid\theta)$.
The basic idea of the EM algorithm is to first compute an expectation (the E step) and then maximize it (the M step), thereby raising the likelihood.
Algorithm (EM):
Input: observed variable data $Y$, hidden variable data $Z$, joint distribution $P(Y,Z\mid\theta)$, conditional distribution $P(Z\mid Y,\theta)$
Output: model parameter $\theta$
(1) Choose an initial value $\theta^{(0)}$ for the parameter and start iterating.
(2) E step: let $\theta^{(i)}$ be the parameter estimate from the $i$-th iteration; at the $(i+1)$-th iteration, compute
$$Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]=\sum_Z \log P(Y,Z\mid\theta)\,P(Z\mid Y,\theta^{(i)})$$
(3) M step: find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$, which becomes the parameter estimate of the $(i+1)$-th iteration:
$$\theta^{(i+1)}=\arg\max_{\theta}Q(\theta,\theta^{(i)})$$
(4) Repeat steps (2) and (3) until convergence.

Note that $Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]$ is called the Q function. The iteration stops when
$$\|\theta^{(i+1)}-\theta^{(i)}\|<\epsilon_1 \quad \text{or} \quad \|Q(\theta^{(i+1)},\theta^{(i)})-Q(\theta^{(i)},\theta^{(i)})\|<\epsilon_2$$
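In code, the iteration is a short loop. The following is a minimal, model-agnostic sketch, assuming hypothetical `e_step` and `m_step` callbacks supplied by the concrete model (they are not part of the book's text); the stopping rule is the first criterion above.

```python
import numpy as np

def em(Y, theta0, e_step, m_step, eps=1e-6, max_iter=1000):
    """Generic EM skeleton.

    e_step(Y, theta) -> posterior P(Z | Y, theta) in some representation
    m_step(Y, post)  -> new theta maximizing Q(theta, theta_i)
    Both callbacks are hypothetical and must be supplied by the model.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        post = e_step(Y, theta)                               # E step
        theta_new = np.asarray(m_step(Y, post), dtype=float)  # M step
        # Stopping criterion: parameter change below epsilon_1
        if np.linalg.norm(theta_new - theta) < eps:
            return theta_new
        theta = theta_new
    return theta
```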

Derivation of the EM Algorithm

The log-likelihood of the observed data is
$$L(\theta)=\log P(Y\mid\theta)=\log\sum_Z P(Y,Z\mid\theta)=\log\Big(\sum_Z P(Y\mid Z,\theta)P(Z\mid\theta)\Big)$$
We want each new estimate to satisfy $L(\theta)>L(\theta^{(i)})$, so consider the difference
$$L(\theta)-L(\theta^{(i)})=\log\Big(\sum_Z P(Y\mid Z,\theta)P(Z\mid\theta)\Big)-\log P(Y\mid\theta^{(i)})$$
Jensen's inequality states that $\log\sum_j\lambda_j y_j\ge\sum_j\lambda_j\log y_j$ for $\lambda_j\ge 0$ with $\sum_j\lambda_j=1$. Multiplying and dividing by $P(Z\mid Y,\theta^{(i)})$ and applying it gives
$$L(\theta)-L(\theta^{(i)})=\log\Big(\sum_Z P(Z\mid Y,\theta^{(i)})\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}\Big)-\log P(Y\mid\theta^{(i)})$$
$$\ge \sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}-\log P(Y\mid\theta^{(i)})$$
$$=\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})}$$
Define
$$B(\theta,\theta^{(i)})=L(\theta^{(i)})+\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})}$$
Then
$$L(\theta)\ge B(\theta,\theta^{(i)})$$
with equality at the current estimate:
$$L(\theta^{(i)})=B(\theta^{(i)},\theta^{(i)})$$
Hence any $\theta$ that increases $B(\theta,\theta^{(i)})$ also increases $L(\theta)$, so we choose
$$\theta^{(i+1)}=\arg\max_{\theta}B(\theta,\theta^{(i)})$$
$$\theta^{(i+1)}=\arg\max_{\theta}\Big(L(\theta^{(i)})+\sum_Z P(Z\mid Y,\theta^{(i)})\log\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})P(Y\mid\theta^{(i)})}\Big)$$
Dropping the terms that do not depend on $\theta$,
$$=\arg\max_{\theta}\Big(\sum_Z P(Z\mid Y,\theta^{(i)})\log\big(P(Y\mid Z,\theta)P(Z\mid\theta)\big)\Big)=\arg\max_{\theta}\Big(\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)\Big)=\arg\max_{\theta}Q(\theta,\theta^{(i)})$$
The EM algorithm is therefore an approximation to maximum likelihood: each iteration maximizes a lower bound of the log-likelihood.

Application of the EM Algorithm to Unsupervised Learning

For training data $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$ regarded as generated by a joint distribution, we can treat $x_i$ as the observed variable and $y_i$ as the hidden variable. In unsupervised learning only the $x_i$ are available, and the EM algorithm can then be used to estimate the parameters of this generative model.

Convergence of the EM Algorithm

Let $P(Y\mid\theta)$ be the likelihood function of the observed data, let $\theta^{(i)}$ ($i=1,2,\dots$) be the parameter estimates produced by the EM algorithm, and let $P(Y\mid\theta^{(i)})$ ($i=1,2,\dots$) be the corresponding likelihood sequence. Then $P(Y\mid\theta^{(i)})$ is monotonically increasing, that is,
$$P(Y\mid\theta^{(i+1)})\ge P(Y\mid\theta^{(i)})$$
Proof. By the definition of conditional probability,
$$P(Y\mid\theta)=\frac{P(Y,Z\mid\theta)}{P(Z\mid Y,\theta)}$$
$$\log P(Y\mid\theta)=\log P(Y,Z\mid\theta)-\log P(Z\mid Y,\theta)$$
Define
$$Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]$$
$$H(\theta,\theta^{(i)})=\sum_Z \log P(Z\mid Y,\theta)\,P(Z\mid Y,\theta^{(i)})$$
Taking the expectation of the identity above with respect to $P(Z\mid Y,\theta^{(i)})$, the log-likelihood can be written as
$$\log P(Y\mid\theta)=Q(\theta,\theta^{(i)})-H(\theta,\theta^{(i)})$$
Therefore
$$\log P(Y\mid\theta^{(i+1)})-\log P(Y\mid\theta^{(i)})=[Q(\theta^{(i+1)},\theta^{(i)})-Q(\theta^{(i)},\theta^{(i)})]-[H(\theta^{(i+1)},\theta^{(i)})-H(\theta^{(i)},\theta^{(i)})]$$
Since $\theta^{(i+1)}$ is defined to maximize $Q(\theta,\theta^{(i)})$,
$$Q(\theta^{(i+1)},\theta^{(i)})-Q(\theta^{(i)},\theta^{(i)})\ge 0$$
For the second term, by Jensen's inequality,
$$H(\theta^{(i+1)},\theta^{(i)})-H(\theta^{(i)},\theta^{(i)})=\sum_Z\Big(\log\frac{P(Z\mid Y,\theta^{(i+1)})}{P(Z\mid Y,\theta^{(i)})}\Big)P(Z\mid Y,\theta^{(i)})\le\log\Big(\sum_Z\frac{P(Z\mid Y,\theta^{(i+1)})}{P(Z\mid Y,\theta^{(i)})}P(Z\mid Y,\theta^{(i)})\Big)=\log 1=0$$
This proves the claim.

Let $L(\theta)=\log P(Y\mid\theta)$ be the log-likelihood of the observed data, let $\theta^{(i)}$, $i=1,2,\dots$, be the parameter sequence produced by the EM algorithm, and let $L(\theta^{(i)})$ be the corresponding log-likelihood sequence. Then:

(1) If $P(Y\mid\theta)$ is bounded above, $L(\theta^{(i)})$ converges to some value $L^*$.
(2) Under certain conditions on the functions $Q$ and $L$, the limit point $\theta^*$ of the sequence produced by the EM algorithm is a stationary point of $L(\theta)$.

Application of the EM Algorithm to Gaussian Mixture Models

Gaussian Mixture Models

  • Definition: a Gaussian mixture model is a probability distribution of the form
$$P(y\mid\theta)=\sum_{k=1}^K a_k\,\phi(y\mid\theta_k)$$
where $a_k\ge 0$ are mixing coefficients with $\sum_{k=1}^K a_k=1$, and $\phi(y\mid\theta_k)$ is the Gaussian density with parameters $\theta_k=(\mu_k,\sigma_k^2)$,
$$\phi(y\mid\theta_k)=\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\Big(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\Big)$$
called the $k$-th component model.
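As a quick illustration (not from the book), the mixture density can be evaluated directly from this definition. The sketch below uses `scipy.stats.norm` for $\phi$; the parameter values are made-up examples.

```python
import numpy as np
from scipy.stats import norm

def gmm_density(y, a, mu, sigma):
    """Evaluate P(y | theta) = sum_k a_k * phi(y | mu_k, sigma_k)."""
    y = np.atleast_1d(y).astype(float)[:, None]   # shape (N, 1)
    comp = norm.pdf(y, loc=mu, scale=sigma)       # shape (N, K): phi(y_j | theta_k)
    return comp @ a                               # mix the K components

# Illustrative parameters for K = 2 (made-up values)
a = np.array([0.3, 0.7])       # mixing coefficients, sum to 1
mu = np.array([-1.0, 2.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations
print(gmm_density([0.0, 2.0], a, mu, sigma))
```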

EM Algorithm for Estimating the Parameters of a Gaussian Mixture Model

  • Derivation of the algorithm:
  1. Specify the hidden variable and write down the complete-data log-likelihood. Define
$$\gamma_{jk}=\begin{cases}1 & \text{if the } j\text{-th observation comes from the } k\text{-th component model}\\ 0 & \text{otherwise}\end{cases}$$
$$j=1,2,\dots,N;\quad k=1,2,\dots,K$$
The complete-data likelihood is then
$$P(y,\gamma\mid\theta)=\prod_{k=1}^K\prod_{j=1}^N\big[a_k\phi(y_j\mid\theta_k)\big]^{\gamma_{jk}}=\prod_{k=1}^K a_k^{n_k}\prod_{j=1}^N\Big[\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\Big(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big]^{\gamma_{jk}}$$
where $n_k=\sum_{j=1}^N\gamma_{jk}$ and $\sum_{k=1}^K n_k=N$. Taking logarithms,
$$\log P(y,\gamma\mid\theta)=\sum_{k=1}^K\Big\{n_k\log a_k+\sum_{j=1}^N\gamma_{jk}\Big[\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big]\Big\}$$
  2. E step of the EM algorithm: compute the Q function.
$$Q(\theta,\theta^{(i)})=E[\log P(y,\gamma\mid\theta)\mid y,\theta^{(i)}]=\sum_{k=1}^K\Big\{\sum_{j=1}^N E(\gamma_{jk})\log a_k+\sum_{j=1}^N E(\gamma_{jk})\Big[\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big]\Big\}$$
The required expectation, called the responsibility, is
$$\hat\gamma_{jk}=E(\gamma_{jk}\mid y,\theta)=P(\gamma_{jk}=1\mid y,\theta)=\frac{P(\gamma_{jk}=1,y_j\mid\theta)}{\sum_{k=1}^K P(\gamma_{jk}=1,y_j\mid\theta)}=\frac{P(y_j\mid\gamma_{jk}=1,\theta)P(\gamma_{jk}=1\mid\theta)}{\sum_{k=1}^K P(y_j\mid\gamma_{jk}=1,\theta)P(\gamma_{jk}=1\mid\theta)}=\frac{a_k\phi(y_j\mid\theta_k)}{\sum_{k=1}^K a_k\phi(y_j\mid\theta_k)}$$
With $n_k=\sum_{j=1}^N\hat\gamma_{jk}$, this gives
$$Q(\theta,\theta^{(i)})=\sum_{k=1}^K\Big\{n_k\log a_k+\sum_{j=1}^N\hat\gamma_{jk}\Big[\log\frac{1}{\sqrt{2\pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\Big]\Big\}$$
  3. M step of the EM algorithm: maximize, $\theta^{(i+1)}=\arg\max_{\theta}Q(\theta,\theta^{(i)})$.
Setting the derivatives with respect to the parameters to zero (with a Lagrange multiplier for the constraint $\sum_k a_k=1$ in the case of $a_k$) gives the following updates; the derivation of $\hat\mu_k$ is sketched after this list.
$$\hat\mu_k=\frac{\sum_{j=1}^N\hat\gamma_{jk}\,y_j}{\sum_{j=1}^N\hat\gamma_{jk}},\quad k=1,2,\dots,K$$
$$\hat\sigma_k^2=\frac{\sum_{j=1}^N\hat\gamma_{jk}(y_j-\hat\mu_k)^2}{\sum_{j=1}^N\hat\gamma_{jk}},\quad k=1,2,\dots,K$$
$$\hat a_k=\frac{n_k}{N}=\frac{\sum_{j=1}^N\hat\gamma_{jk}}{N},\quad k=1,2,\dots,K$$
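For example, for $\mu_k$ only the quadratic term of $Q$ depends on $\mu_k$, so

$$\frac{\partial Q}{\partial\mu_k}=\sum_{j=1}^N\hat\gamma_{jk}\,\frac{y_j-\mu_k}{\sigma_k^2}=0\;\Longrightarrow\;\hat\mu_k=\frac{\sum_{j=1}^N\hat\gamma_{jk}\,y_j}{\sum_{j=1}^N\hat\gamma_{jk}}$$

The updates for $\sigma_k^2$ and $a_k$ follow the same pattern ($a_k$ additionally needs the Lagrange multiplier for $\sum_k a_k=1$).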

Algorithm (EM for parameter estimation of a Gaussian mixture model):
Input: observed data $y_1,y_2,\dots,y_N$, the Gaussian mixture model
Output: the Gaussian mixture model parameters
(1) Choose initial values for the parameters and start iterating.
(2) E step: based on the current model parameters, compute the responsibility of component $k$ for observation $y_j$:
$$\hat\gamma_{jk}=\frac{a_k\phi(y_j\mid\theta_k)}{\sum_{k=1}^K a_k\phi(y_j\mid\theta_k)},\quad j=1,\dots,N;\ k=1,\dots,K$$
(3) M step: compute the new parameter estimates
$$\hat\mu_k=\frac{\sum_{j=1}^N\hat\gamma_{jk}\,y_j}{\sum_{j=1}^N\hat\gamma_{jk}},\quad k=1,2,\dots,K$$
$$\hat\sigma_k^2=\frac{\sum_{j=1}^N\hat\gamma_{jk}(y_j-\hat\mu_k)^2}{\sum_{j=1}^N\hat\gamma_{jk}},\quad k=1,2,\dots,K$$
$$\hat a_k=\frac{\sum_{j=1}^N\hat\gamma_{jk}}{N},\quad k=1,2,\dots,K$$
(4) Repeat steps (2) and (3) until convergence.
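Steps (2) and (3) translate into a compact NumPy implementation. The following is a sketch under this section's assumptions (one-dimensional data, fixed $K$); the initialization choices and the log-likelihood stopping rule are my own, not prescribed by the book.

```python
import numpy as np
from scipy.stats import norm

def gmm_em(y, K, n_iter=100, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture. Returns (a, mu, sigma)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    # Step (1): uniform weights, random means, common scale
    a = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False)
    sigma = np.full(K, np.std(y))
    log_lik = -np.inf
    for _ in range(n_iter):
        # E step (2): responsibilities gamma[j, k]
        dens = a * norm.pdf(y[:, None], loc=mu, scale=sigma)   # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step (3): closed-form updates
        nk = gamma.sum(axis=0)                                 # n_k = sum_j gamma[j, k]
        mu = (gamma * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        a = nk / N
        # Step (4): stop when the log-likelihood (under the parameters
        # entering this iteration) no longer improves
        new_log_lik = np.log(dens.sum(axis=1)).sum()
        if new_log_lik - log_lik < tol:
            break
        log_lik = new_log_lik
    return a, mu, sigma

# Illustrative usage on synthetic data drawn from two Gaussians
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-1, 0.5, 300), rng.normal(2, 1.0, 700)])
print(gmm_em(y, K=2))
```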

Extensions of the EM Algorithm

The Maximization-Maximization Algorithm for the F Function

  • Definition: suppose the hidden variable data $Z$ has probability distribution $\hat P(Z)$. Define the function $F(\hat P,\theta)$ of the distribution $\hat P$ and the parameter $\theta$ as
$$F(\hat P,\theta)=E_{\hat P}[\log P(Y,Z\mid\theta)]+H(\hat P)$$
called the F function, where $H(\hat P)=-E_{\hat P}\log\hat P(Z)$ is the entropy of the distribution $\hat P(Z)$.
For fixed $\theta$ there is a unique distribution $\hat P_\theta$ that maximizes $F(\hat P,\theta)$, given by
$$\hat P_\theta(Z)=P(Z\mid Y,\theta)$$
and $\hat P_\theta$ varies continuously with $\theta$.
Proof. The Lagrangian, with multiplier $\lambda$ for the constraint $\sum_Z\hat P(Z)=1$, is
$$L=E_{\hat P}\log P(Y,Z\mid\theta)-E_{\hat P}\log\hat P(Z)+\lambda\Big(1-\sum_Z\hat P(Z)\Big)$$
$$\frac{\partial L}{\partial\hat P(Z)}=\log P(Y,Z\mid\theta)-\log\hat P(Z)-1-\lambda=0$$
so
$$\lambda=\log P(Y,Z\mid\theta)-\log\hat P_\theta(Z)-1$$
for every $Z$, which means $\hat P_\theta(Z)\propto P(Y,Z\mid\theta)$. Normalizing over $Z$ finally gives
$$\hat P_\theta(Z)=P(Z\mid Y,\theta)$$
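Substituting this maximizer back into $F$ recovers the log-likelihood, which is what connects the F function to EM in the results below:
$$F(\hat P_\theta,\theta)=\sum_Z P(Z\mid Y,\theta)\log\frac{P(Y,Z\mid\theta)}{P(Z\mid Y,\theta)}=\sum_Z P(Z\mid Y,\theta)\log P(Y\mid\theta)=\log P(Y\mid\theta)=L(\theta)$$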
Let $L(\theta)=\log P(Y\mid\theta)$ be the log-likelihood of the observed data and $\theta^{(i)}$ the parameter estimates obtained by the EM algorithm. If $F(\hat P,\theta)$ has a local maximum (respectively, a global maximum) at $(\hat P^*,\theta^*)$, then $L(\theta)$ also has a local maximum (respectively, a global maximum) at $\theta^*$.

One iteration of the EM algorithm can be realized by one iteration of the maximization-maximization algorithm for the F function:
(1) For fixed $\theta^{(i)}$, find $\hat P^{(i+1)}$ that maximizes $F(\hat P,\theta^{(i)})$.
(2) For fixed $\hat P^{(i+1)}$, find $\theta^{(i+1)}$ that maximizes $F(\hat P^{(i+1)},\theta)$.

The GEM Algorithm

Algorithm 1:
Input: observed data, the F function
Output: model parameters
(1) Initialize the parameter $\theta^{(0)}$ and start iterating.
(2) With $\theta^{(i)}$ fixed, find $\hat P^{(i+1)}$ that maximizes $F(\hat P,\theta^{(i)})$.
(3) With $\hat P^{(i+1)}$ fixed, find $\theta^{(i+1)}$ that maximizes $F(\hat P^{(i+1)},\theta)$.
(4) Repeat (2) and (3) until convergence.

Algorithm 2:
Input: observed data, the Q function
Output: model parameters
(1) Initialize the parameter $\theta^{(0)}$ and start iterating.
(2) Compute
$$Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]=\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)$$
(3) Find a $\theta^{(i+1)}$ such that
$$Q(\theta^{(i+1)},\theta^{(i)})>Q(\theta^{(i)},\theta^{(i)})$$
(the Q function need only be increased, not fully maximized).
(4) Repeat (2) and (3) until convergence.

Algorithm 3:
Input: observed data, the Q function
Output: model parameters
(1) Initialize the parameter $\theta^{(0)}$ and start iterating.
(2) Compute
$$Q(\theta,\theta^{(i)})=E_Z[\log P(Y,Z\mid\theta)\mid Y,\theta^{(i)}]=\sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)$$
(3) With $\theta=(\theta_1,\dots,\theta_d)$, obtain $\theta^{(i+1)}$ by $d$ conditional maximizations: optimize each component $\theta_j$ in turn while keeping the other components fixed, so that
$$Q(\theta^{(i+1)},\theta^{(i)})>Q(\theta^{(i)},\theta^{(i)})$$
(4) Repeat (2) and (3) until convergence. A sketch of the coordinate-wise M step of Algorithm 3 follows below.
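Algorithm 3 differs from plain EM only in the M step, where each coordinate of $\theta$ is improved in turn instead of jointly. A minimal sketch of that step, assuming a hypothetical callback `Q(theta, theta_i)` that evaluates the Q function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gem_m_step(Q, theta_i):
    """One GEM M step (Algorithm 3): optimize each component of theta
    in turn while holding the others fixed, so that
    Q(theta_new, theta_i) > Q(theta_i, theta_i)."""
    theta = np.array(theta_i, dtype=float)
    for j in range(len(theta)):            # d conditional maximizations
        def neg_q(t, j=j):
            trial = theta.copy()
            trial[j] = t
            return -Q(trial, theta_i)      # Q is a hypothetical callback
        theta[j] = minimize_scalar(neg_q).x
    return theta
```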
