First, be clear about what the objective function of HMM learning actually is. An HMM is a directed probabilistic graphical model; in the supervised setting, we use maximum likelihood estimation to maximize the joint probability of the data and solve for the optimal parameters:
$$ L(\theta) = p(x, z \mid \theta) $$
Here $x$ is the observation sequence and $z$ is the state sequence. The joint probability factorizes as:
$$ p(x, z \mid \theta) = p(z_1 \mid \theta) \prod_{t = 1}^{T} p(x_t \mid z_t, \theta) \prod_{t = 1}^{T - 1} p(z_{t + 1} \mid z_t, \theta) $$

Taking the logarithm, and dropping the initial-state term $\log p(z_1 \mid \theta)$ since it involves neither the emission nor the transition parameters, which are the focus here (the initial distribution is estimated analogously):

$$ \log p(x, z \mid \theta) = \sum_{t = 1}^{T} \log p(x_t \mid z_t, \theta) + \sum_{t = 1}^{T - 1} \log p(z_{t + 1} \mid z_t, \theta) $$
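As a quick numerical illustration of this decomposition, here is a minimal sketch (the model, parameter values, and sequences are hypothetical, not from the original post) that evaluates the emission and transition sums for a toy model with $H = 2$ states and $O = 3$ symbols, using 0-based indices in code:

```python
import numpy as np

# Hypothetical toy model: H = 2 hidden states, O = 3 observation symbols.
# A[h, o] = p(x_t = o | z_t = h) (emission), B[j, k] = p(z_{t+1} = k | z_t = j) (transition).
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
B = np.array([[0.8, 0.2],
              [0.4, 0.6]])

def log_joint(x, z, A, B):
    """log p(x, z) without the initial-state term:
    sum_t log A[z_t, x_t] + sum_t log B[z_t, z_{t+1}]."""
    emit = sum(np.log(A[z[t], x[t]]) for t in range(len(x)))
    trans = sum(np.log(B[z[t], z[t + 1]]) for t in range(len(x) - 1))
    return emit + trans

x = [0, 2, 1, 0]  # observation sequence (0-based symbol indices)
z = [0, 1, 1, 0]  # state sequence (0-based state indices)
print(log_joint(x, z, A, B))
```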
Assume $x_t \in \{1, 2, \dots, O\}$ and $z_t \in \{1, 2, \dots, H\}$. Write the emission probability matrix as $A$ and the transition probability matrix as $B$, let $e_{h, o}$ be the number of positions in the data where state $h$ emits observation $o$, and let $f_{j, k}$ be the number of transitions from state $j$ to state $k$. The log-likelihood can then be rewritten as:
$$ \log p(x, z \mid \theta) = \sum_{h = 1}^{H} \sum_{o = 1}^{O} e_{h, o} \log A_{h, o} + \sum_{j = 1}^{H} \sum_{k = 1}^{H} f_{j, k} \log B_{j, k} $$
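In code, the counts $e$ and $f$ are just tallies over a labeled sequence. A minimal sketch (function name and data are illustrative assumptions):

```python
import numpy as np

def count_stats(x, z, H, O):
    """Sufficient statistics of one labeled sequence:
    e[h, o] = #{t : z_t = h, x_t = o}, f[j, k] = #{t : z_t = j, z_{t+1} = k}."""
    e = np.zeros((H, O))
    f = np.zeros((H, H))
    for t in range(len(x)):
        e[z[t], x[t]] += 1          # emission count for the (state, symbol) pair
    for t in range(len(x) - 1):
        f[z[t], z[t + 1]] += 1      # transition count for the (state, next state) pair
    return e, f

e, f = count_stats(x=[0, 2, 1, 0], z=[0, 1, 1, 0], H=2, O=3)
```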
The learning problem is therefore the constrained optimization:
$$ \begin{array}{rl} \max & \displaystyle \sum_{h = 1}^{H} \sum_{o = 1}^{O} e_{h, o} \log A_{h, o} + \sum_{j = 1}^{H} \sum_{k = 1}^{H} f_{j, k} \log B_{j, k} \\ \text{s.t.} & \displaystyle \sum_{o = 1}^{O} A_{h, o} = 1, \quad \sum_{k = 1}^{H} B_{j, k} = 1 \end{array} $$
Apply the method of Lagrange multipliers, with one multiplier per row-normalization constraint:
$$ \begin{aligned} L(A, B) ={}& \sum_{h = 1}^{H} \sum_{o = 1}^{O} e_{h, o} \log A_{h, o} + \sum_{j = 1}^{H} \sum_{k = 1}^{H} f_{j, k} \log B_{j, k} \\ &- \sum_{h = 1}^{H} \alpha_h \bigg( \sum_{o = 1}^{O} A_{h, o} - 1 \bigg) - \sum_{j = 1}^{H} \beta_j \bigg( \sum_{k = 1}^{H} B_{j, k} - 1 \bigg) \end{aligned} $$
Setting the partial derivatives with respect to $A$, $B$, and the multipliers to zero gives:
$$ \begin{aligned} \frac{\partial L}{\partial A_{h, o}} &= \frac{e_{h, o}}{A_{h, o}} - \alpha_h = 0 \\ \frac{\partial L}{\partial B_{j, k}} &= \frac{f_{j, k}}{B_{j, k}} - \beta_j = 0 \\ \frac{\partial L}{\partial \alpha_h} &= -\bigg( \sum_{o = 1}^{O} A_{h, o} - 1 \bigg) = 0 \\ \frac{\partial L}{\partial \beta_j} &= -\bigg( \sum_{k = 1}^{H} B_{j, k} - 1 \bigg) = 0 \end{aligned} $$
Solving the first two equations gives $A_{h, o} = e_{h, o} / \alpha_h$ and $B_{j, k} = f_{j, k} / \beta_j$; substituting these into the constraints yields:
$$ \alpha_h = \sum_{o = 1}^{O} e_{h, o}, \qquad \beta_j = \sum_{k = 1}^{H} f_{j, k} $$
Substituting the multipliers back into the Lagrangian (renaming the bound summation indices to $o'$ and $k'$ so they do not clash with the free indices):
$$ \begin{aligned} L(A, B) ={}& \sum_{h = 1}^{H} \sum_{o = 1}^{O} e_{h, o} \log A_{h, o} + \sum_{j = 1}^{H} \sum_{k = 1}^{H} f_{j, k} \log B_{j, k} \\ &- \sum_{h = 1}^{H} \bigg( \sum_{o' = 1}^{O} e_{h, o'} \bigg) \bigg( \sum_{o = 1}^{O} A_{h, o} - 1 \bigg) - \sum_{j = 1}^{H} \bigg( \sum_{k' = 1}^{H} f_{j, k'} \bigg) \bigg( \sum_{k = 1}^{H} B_{j, k} - 1 \bigg) \end{aligned} $$
Differentiating once more with respect to $A$ and $B$:
$$ \frac{\partial L}{\partial A_{h, o}} = \frac{e_{h, o}}{A_{h, o}} - \sum_{o' = 1}^{O} e_{h, o'} = 0, \qquad \frac{\partial L}{\partial B_{j, k}} = \frac{f_{j, k}}{B_{j, k}} - \sum_{k' = 1}^{H} f_{j, k'} = 0 $$
The final result matches the intuitive frequency-count estimate (that intuition and derivation agree here is incidental, not something to presume):
$$ A_{h, o} = \frac{e_{h, o}}{\displaystyle \sum_{o' = 1}^{O} e_{h, o'}}, \qquad B_{j, k} = \frac{f_{j, k}}{\displaystyle \sum_{k' = 1}^{H} f_{j, k'}} $$
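These closed-form estimates are just the count matrices with each row normalized, so the supervised learning step is essentially a one-liner. A minimal sketch under the same hypothetical setup as above (in practice one would add smoothing, e.g. pseudo-counts, to avoid all-zero rows):

```python
import numpy as np

def mle_estimate(e, f):
    """Row-normalize the count matrices:
    A[h, o] = e[h, o] / sum_o' e[h, o'], B[j, k] = f[j, k] / sum_k' f[j, k']."""
    A = e / e.sum(axis=1, keepdims=True)
    B = f / f.sum(axis=1, keepdims=True)
    return A, B

# Hypothetical counts for H = 2 states, O = 3 symbols.
e = np.array([[3.0, 1.0, 2.0],
              [0.0, 4.0, 2.0]])
f = np.array([[5.0, 1.0],
              [2.0, 4.0]])
A_hat, B_hat = mle_estimate(e, f)
print(A_hat.sum(axis=1), B_hat.sum(axis=1))  # each row sums to 1
```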