Derivations of the EM algorithm for HMMs can be found in many places, but they often omit details that are not so easy to understand.
In the forward-backward algorithm, with $\alpha_t(i)=P(o_1,\dots,o_t,i_t=q_i\mid\lambda)$ and $\beta_t(i)=P(o_{t+1},\dots,o_T\mid i_t=q_i,\lambda)$:
$$
\begin{aligned}
P(i_t=q_i, O \mid \lambda) &= P(O \mid i_t=q_i, \lambda)\,P(i_t=q_i \mid \lambda) \\
&= P(o_1,\dots,o_t \mid i_t=q_i, \lambda)\,P(o_{t+1},\dots,o_T \mid i_t=q_i, \lambda)\,P(i_t=q_i \mid \lambda) \\
&= P(o_1,\dots,o_t, i_t=q_i \mid \lambda)\,P(o_{t+1},\dots,o_T \mid i_t=q_i, \lambda) \\
&= \alpha_t(i)\,\beta_t(i)
\end{aligned}
$$

The second step uses the fact that, given $i_t$, the observations $o_1,\dots,o_t$ and $o_{t+1},\dots,o_T$ are conditionally independent; the third step folds $P(i_t=q_i\mid\lambda)$ back into the first factor by the chain rule.
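As a sanity check, here is a minimal NumPy sketch of this identity. The parameters `pi`, `A`, `B` and the observation sequence `obs` are made-up toy values (not from the text); the recursions are the standard forward and backward passes, and $\sum_i\alpha_t(i)\beta_t(i)$ should print the same value, $P(O\mid\lambda)$, for every $t$.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward probabilities alpha and backward probabilities beta for a
    discrete HMM (start distribution pi, transition matrix A, emission matrix B)
    on an observation index sequence obs."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # alpha_t(i) = P(o_1,...,o_t, i_t = q_i | lambda)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # beta_t(i) = P(o_{t+1},...,o_T | i_t = q_i, lambda)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    return alpha, beta

# Toy parameters, purely for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 1, 0]

alpha, beta = forward_backward(pi, A, B, obs)
# For every t, sum_i alpha_t(i) * beta_t(i) equals P(O | lambda).
print((alpha * beta).sum(axis=1))  # the same value printed for each t
```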
The Baum-Welch algorithm

First, rewrite the transition term of the $Q$ function:
$$
\begin{aligned}
\sum_{I}\Big(\sum_{t=1}^{T-1}\log a_{i_t i_{t+1}}\Big)P(O,I\mid\overline{\lambda})
&= \sum_{i_1}\cdots\sum_{i_T}\Big(\sum_{t=1}^{T-1}\log a_{i_t i_{t+1}}\Big)P(O,i_1,\dots,i_T\mid\overline{\lambda}) \\
&= \sum_{t=1}^{T-1}\sum_{i_t}\sum_{i_{t+1}}\log a_{i_t i_{t+1}}\,P(O,i_t,i_{t+1}\mid\overline{\lambda}) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j\mid\overline{\lambda})
\end{aligned}
$$

For each fixed $t$, summing $P(O,i_1,\dots,i_T\mid\overline{\lambda})$ over all state variables other than $i_t$ and $i_{t+1}$ marginalizes them out and leaves $P(O,i_t,i_{t+1}\mid\overline{\lambda})$, which is why the full sum over $I$ collapses to a sum over the pair $(i_t,i_{t+1})$.
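Maximizing this last expression subject to $\sum_{j=1}^{N}a_{ij}=1$ (the same Lagrange-multiplier argument spelled out for $b_j(k)$ below) gives the standard update $a_{ij}=\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j\mid\overline{\lambda})\big/\sum_{t=1}^{T-1}P(O,i_t=i\mid\overline{\lambda})$. The sketch below continues the toy example above (it reuses `alpha`, `beta`, `A`, `B`, `obs` from that block) and uses the factorization $P(O,i_t=i,i_{t+1}=j\mid\overline{\lambda})=\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)$:

```python
def reestimate_A(alpha, beta, A, B, obs):
    """Transition-matrix update.
    xi[t, i, j] = P(O, i_t=i, i_{t+1}=j | lambda-bar)
                = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
    gamma = alpha * beta  # gamma[t, i] = P(O, i_t=i | lambda-bar) = alpha_t(i) * beta_t(i)
    # a_ij = sum_{t=1..T-1} xi_t(i, j) / sum_{t=1..T-1} gamma_t(i)
    return xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]

A_new = reestimate_A(alpha, beta, A, B, obs)
print(A_new.sum(axis=1))  # each row sums to 1
```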
Solving for $b_j(k)$
The emission term of the $Q$ function (here the sum over $t$ runs to $T$, since every observation contributes an emission factor; the sum over $I$ collapses to a sum over $i_t=j$ exactly as in the transition term above):

$$
\begin{aligned}
L(B) &= \sum_{I}\Big(\sum_{t=1}^{T}\log b_{i_t}(o_t)\Big)P(O,I\mid\overline{\lambda}) \\
&= \sum_{j=1}^{N}\sum_{t=1}^{T}\log b_{j}(o_t)\,P(O,i_t=j\mid\overline{\lambda})
\end{aligned}
$$
Add to $L(B)$ the Lagrange term for the constraint $\sum_{k=1}^{M}b_j(k)=1$ (for a fixed state $j$):
$$
L(B) = \sum_{j=1}^{N}\sum_{t=1}^{T}\log b_{j}(o_t)\,P(O,i_t=j\mid\overline{\lambda}) + \varphi\Big(\sum_{k=1}^{M}b_j(k)-1\Big)
$$
Set $\dfrac{\partial L(B)}{\partial b_j(k)}=0$, noting that $\dfrac{\partial}{\partial b_j(k)}\log b_j(o_t)=\dfrac{I(o_t=k)}{b_j(k)}$, where $I(\cdot)$ is the indicator function:
$$
\frac{\partial L(B)}{\partial b_j(k)} = \frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{b_j(k)} + \varphi = 0
$$
which gives:
$$
b_j(k) = -\frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{\varphi}
$$
Substituting this into the constraint $\sum_{k=1}^{M}b_j(k)=1$:
$$
-\sum_{k=1}^{M}\frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{\varphi}=1
$$
and solving for $\varphi$ gives:
$$
\varphi = -\sum_{k=1}^{M}\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)
$$
$$
\begin{aligned}
b_j(k) &= -\frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{\varphi} \\
&= \frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{\sum_{k'=1}^{M}\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k')} \\
&= \frac{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})\,I(o_t=k)}{\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})}
\end{aligned}
$$
Note: for every $t$, $\sum_{k'=1}^{M}I(o_t=k')=1$ (the indicator behaves like a degenerate discrete distribution over $k'$), so summing over $k'$ in the denominator marginalizes the indicator away and leaves $\sum_{t=1}^{T}P(O,i_t=j\mid\overline{\lambda})$.
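For completeness, here is a continuation of the same toy sketch implementing this $b_j(k)$ update (it reuses `alpha`, `beta`, `obs` from above; `M` is the number of observation symbols, 2 in the toy example). Note that the derivation works with the unnormalized joint $P(O,i_t=j\mid\overline{\lambda})=\alpha_t(j)\beta_t(j)$; the common factor $P(O\mid\overline{\lambda})$ would cancel in the ratio anyway.

```python
def reestimate_B(alpha, beta, obs, M):
    """Emission-matrix update from the closed form derived above:
    b_j(k) = sum_t P(O, i_t=j | lambda-bar) * I(o_t=k)
             / sum_t P(O, i_t=j | lambda-bar)."""
    T, N = alpha.shape
    gamma = alpha * beta  # gamma[t, j] = P(O, i_t=j | lambda-bar)
    B_new = np.zeros((N, M))
    for k in range(M):
        indicator = (np.array(obs) == k).astype(float)  # I(o_t = k) for each t
        B_new[:, k] = (gamma * indicator[:, None]).sum(axis=0)
    return B_new / gamma.sum(axis=0)[:, None]

B_new = reestimate_B(alpha, beta, obs, M=2)
print(B_new.sum(axis=1))  # each row sums to 1, matching the constraint sum_k b_j(k) = 1
```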