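The Probability Computation Problem for HMMs
The probability computation problem for an HMM is the following: given the model parameters $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \dots, o_T)$, compute the probability $P(O \mid \lambda)$ that $O$ occurs under the model $\lambda$.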
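Direct Computation
Computing directly from the definitions, we marginalize over every hidden state sequence $I = (i_1, i_2, \dots, i_T)$ using the law of total probability and the product rule:
$$P(O \mid \lambda) = \sum_{I} P(O, I \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda)$$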
- Here $P(O \mid I, \lambda)$ covers the emissions $i_t \to o_t$ and is read from the emission probability matrix $[b_j(k)]_{N \times M}$:
$$P(O \mid I, \lambda) = P(o_1 \mid i_1) \cdots P(o_t \mid i_t) \cdots P(o_T \mid i_T) = b_{i_1}(o_1) \cdots b_{i_t}(o_t) \cdots b_{i_T}(o_T)$$
This is a product of $T$ factors.
- $P(I \mid \lambda)$ covers the transitions $i_{t-1} \to i_t$ and is read from the transition probability matrix $[a_{ij}]_{N \times N}$ and the initial state probability vector $\pi$:
$$P(I \mid \lambda) = \pi_{i_1} P(i_2 \mid i_1) \cdots P(i_t \mid i_{t-1}) \cdots P(i_T \mid i_{T-1}) = \pi_{i_1} a_{i_1 i_2} \cdots a_{i_{t-1} i_t} \cdots a_{i_{T-1} i_T}$$
This is also a product of $T$ factors.
Substituting both expressions gives:
$$\begin{aligned}
P(O \mid \lambda) &= \sum_{I} P(O, I \mid \lambda) \\
&= \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda) \\
&= \sum_{I} \left[ b_{i_1}(o_1) \cdots b_{i_t}(o_t) \cdots b_{i_T}(o_T) \right] \times \left[ \pi_{i_1} a_{i_1 i_2} \cdots a_{i_{t-1} i_t} \cdots a_{i_{T-1} i_T} \right] \\
&= \sum_{I} \pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}
\end{aligned}$$
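Since $\sum_{I} = \sum_{i_1} \cdots \sum_{i_t} \cdots \sum_{i_T}$ and each $i_t$ can take $N$ values, the sum $\sum_{I}$ has $N^T$ terms. Each term is itself a product of $2T$ factors, so evaluating $P(O \mid \lambda)$ directly costs on the order of $T N^T$ operations, which is impractical for all but very short sequences.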
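To make the $N^T$ blow-up concrete, here is a minimal sketch of the direct computation. The toy parameters (3 states, 2 observation symbols) and the function name `brute_force_likelihood` are illustrative assumptions, not something taken from the text above.

```python
from itertools import product

def brute_force_likelihood(A, B, pi, obs):
    """Sum P(O, I | lambda) over all N**T hidden state sequences I."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in product(range(N), repeat=T):  # N**T sequences
        # pi_{i_1} * b_{i_1}(o_1)
        p = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, T):
            # a_{i_{t-1} i_t} * b_{i_t}(o_t)
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total

# Hypothetical toy model: states and symbols are 0-based indices.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]  # A[i][j] = a_ij
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]                  # B[j][k] = b_j(k)
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

likelihood = brute_force_likelihood(A, B, pi, obs)
print(likelihood)  # approximately 0.130218
```

With $N = 3$ and $T = 3$ the loop visits only 27 sequences, but the same loop at $T = 30$ would visit $3^{30}$ of them.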
Forward Algorithm
The idea is to find a recursion for the forward probabilities that runs through the time steps $1 \to \dots \to t \to \dots \to T$.
Forward probability
At the time steps $1, \dots, t, \dots, T$ the observations are $o_1, \dots, o_t, \dots, o_T$ and the hidden states are $i_1, \dots, i_t, \dots, i_T$:
$$i_1 \to \dots \to i_t \to \dots \to i_T$$
$$o_1 \to \dots \to o_t \to \dots \to o_T$$
Define the forward probability as
$$\alpha_t(i) = P(o_1, \dots, o_t, i_t = q_i \mid \lambda)$$
It is the joint probability that the observations up to time $t$ are $o_1, o_2, \dots, o_t$ and that the state at time $t$ is $q_i$.
Deriving the recursion
From the definition, write out the forward probabilities for $t = 1$ and $t = 2$:
- For $t = 1$:
$$\alpha_1(i) = P(o_1, i_1 = q_i \mid \lambda) = P(o_1 \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda) = b_i(o_1) \pi_i$$
- For $t = 2$:
$$\begin{aligned}
\alpha_2(j) &= P(o_1, o_2, i_2 = q_j \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, o_2, i_1 = q_i, i_2 = q_j \mid \lambda) \\
&= \sum_{i=1}^N P(o_2 \mid i_2 = q_j, \lambda) P(i_2 = q_j \mid i_1 = q_i, \lambda) P(o_1 \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda) \\
&= \sum_{i=1}^N b_j(o_2) a_{ij} \alpha_1(i) \\
&= b_j(o_2) \sum_{i=1}^N a_{ij} \alpha_1(i)
\end{aligned}$$
Continuing in the same way for later time steps gives the relation between $\alpha_{t+1}(j)$ and $\alpha_t(i)$:
$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^N a_{ij} \alpha_t(i)$$
where $j \in \{1, 2, \dots, N\}$ and $t = 1, \dots, T-1$.
An intuitive view of the recursion
Take the two time steps $t = 1$ and $t = 2$. The observations and hidden states involved are $o_1$, $o_2$, $i_1$ and $i_2$:
$$i_1 \to i_2$$
$$o_1 \to o_2$$
Once $\alpha_1(i) = P(o_1, i_1 = q_i \mid \lambda)$ has been computed for $i \in \{1, 2, \dots, N\}$, we know the following at time $t = 1$: the probability $\alpha_1(1)$ that the hidden state is $q_1$ and the observation is $o_1$, and so on up to the probability $\alpha_1(N)$ that the hidden state is $q_N$ and the observation is $o_1$.
Computing $\alpha_2(j) = P(o_1, o_2, i_2 = q_j \mid \lambda)$ for $j \in \{1, 2, \dots, N\}$ means finding, at time $t = 2$: the probability $\alpha_2(1)$ that the hidden state is $q_1$ and the two observations so far are $o_1, o_2$, and so on up to the probability $\alpha_2(N)$ that the hidden state is $q_N$ and the two observations so far are $o_1, o_2$.
How can $\alpha_1(i)$ be used to compute $\alpha_2(j)$?
Comparing what we already have with what we want, the only new piece is the observation $o_2$. It is emitted by $i_2$ (the factor $b_{i_2}(o_2)$), and $i_2$ is in turn reached from $i_1$ (the factor $a_{i_1 i_2}$). So we take each $\alpha_1(i_1)$, multiply in the two probabilities $a_{i_1 i_2}$ and $b_{i_2}(o_2)$, and sum over all possible values of $i_1$:
$$\alpha_2(i_2) = \sum_{i_1 = 1}^N \alpha_1(i_1) \, b_{i_2}(o_2) \, a_{i_1 i_2}$$
Renaming the states so that $i_1 = q_i$ and $i_2 = q_j$ gives:
$$\alpha_2(j) = \sum_{i=1}^N \alpha_1(i) \, b_j(o_2) \, a_{ij} = b_j(o_2) \sum_{i=1}^N \alpha_1(i) \, a_{ij}$$
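Why it matters
Why compute forward probabilities?
- First, they give the target probability $P(O \mid \lambda)$. By definition, the forward probability at $t = T$ is
$$\alpha_T(i) = P(o_1, \dots, o_T, i_T = q_i \mid \lambda)$$
so summing over the final state gives $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.
- Second, thanks to the recursion, the work is far smaller than for direct computation. Computing $\alpha_1(i)$ for all $i \in \{1, 2, \dots, N\}$ takes $N$ multiplications. At each later time step, every one of the $N$ values $\alpha_{t+1}(j)$ needs a sum over $N$ terms, which is about $N^2$ operations per step. The total is on the order of $N^2 T$ operations, far fewer than the $N^T$ terms of direct computation.
The saving comes from reusing the results of the previous time step at every step, which avoids recomputing the same partial products again and again.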
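The initialization, recursion and final sum described above can be sketched as follows. This is a minimal illustration under assumptions: the toy parameters and the name `forward` are hypothetical, and states and observation symbols are encoded as 0-based indices, so `alpha[t][i]` holds $\alpha_{t+1}(i)$.

```python
def forward(A, B, pi, obs):
    """Return alpha, where alpha[t][i] stores alpha_{t+1}(i) (0-based t)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    # Recursion: alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i a_ij * alpha_t(i)
    for t in range(T - 1):          # T - 1 steps
        for j in range(N):          # N target states
            total = sum(alpha[t][i] * A[i][j] for i in range(N))  # N terms
            alpha[t + 1][j] = B[j][obs[t + 1]] * total
    return alpha

# Hypothetical toy model: 3 states, 2 observation symbols.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]  # A[i][j] = a_ij
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]                  # B[j][k] = b_j(k)
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

alpha = forward(A, B, pi, obs)
likelihood = sum(alpha[-1])  # P(O | lambda) = sum_i alpha_T(i)
print(likelihood)  # approximately 0.130218
```

The three nested loops make the $N^2 T$ cost visible. For long sequences the raw products underflow, so practical implementations usually rescale each row of `alpha` or work in log space; that is omitted here.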
Backward Algorithm
The idea is to find a recursion for the backward probabilities that runs through the time steps $T \to \dots \to t \to \dots \to 1$.
Backward probability
At the time steps $1, \dots, t, \dots, T$ the observations are $o_1, \dots, o_t, \dots, o_T$ and the hidden states are $i_1, \dots, i_t, \dots, i_T$:
$$i_1 \to \dots \to i_t \to \dots \to i_T$$
$$o_1 \to \dots \to o_t \to \dots \to o_T$$
Define the backward probability as
$$\beta_t(i) = P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda)$$
It is the probability that the observations at all time steps after $t$ are $o_{t+1}, o_{t+2}, \dots, o_T$, given that the state at time $t$ is $q_i$.
Deriving the recursion
From the definition, write out the backward probabilities for $t = T$, $t = T-1$ and $t = T-2$:
- For $t = T$: $\beta_T(i) = 1$
Note: the boundary value is $1$ because the backward probability only concerns observations after time $t$ (excluding $t$ itself). The observation sequence ends at time $T$, so there is nothing after $T$ left to account for and the value is defined as $1$. With this choice the recursion below also reproduces $\beta_{T-1}(i)$ correctly.
- For $t = T-1$:
$$\begin{aligned}
\beta_{T-1}(i) &= P(o_T \mid i_{T-1} = q_i, \lambda) \\
&= \sum_{k=1}^N P(o_T, i_T = q_k \mid i_{T-1} = q_i, \lambda) \\
&= \sum_{k=1}^N P(o_T \mid i_T = q_k, \lambda) P(i_T = q_k \mid i_{T-1} = q_i, \lambda) \\
&= \sum_{k=1}^N b_k(o_T) a_{ik}
\end{aligned}$$
- For $t = T-2$:
$$\begin{aligned}
\beta_{T-2}(j) &= P(o_T, o_{T-1} \mid i_{T-2} = q_j, \lambda) \\
&= \sum_{i=1}^N \sum_{k=1}^N P(o_T, o_{T-1}, i_T = q_k, i_{T-1} = q_i \mid i_{T-2} = q_j, \lambda) \\
&= \sum_{i=1}^N \sum_{k=1}^N P(o_T \mid i_T = q_k, \lambda) P(i_T = q_k \mid i_{T-1} = q_i, \lambda) P(o_{T-1} \mid i_{T-1} = q_i, \lambda) P(i_{T-1} = q_i \mid i_{T-2} = q_j, \lambda) \\
&= \sum_{i=1}^N \left[ \sum_{k=1}^N b_k(o_T) a_{ik} \right] b_i(o_{T-1}) a_{ji} \\
&= \sum_{i=1}^N \beta_{T-1}(i) b_i(o_{T-1}) a_{ji}
\end{aligned}$$
Continuing in the same way for earlier time steps gives the relation between $\beta_t(j)$ and $\beta_{t+1}(i)$:
$$\beta_t(j) = \sum_{i=1}^N \beta_{t+1}(i) b_i(o_{t+1}) a_{ji}$$
where $j \in \{1, 2, \dots, N\}$ and $t = T-1, \dots, 1$.
An intuitive view of the recursion
Take the two time steps $t = T-1$ and $t = T-2$. The observations and hidden states involved are $o_{T-2}$, $o_{T-1}$, $o_T$, $i_{T-2}$, $i_{T-1}$ and $i_T$:
$$i_{T-2} \to i_{T-1} \to i_T$$
$$o_{T-2} \to o_{T-1} \to o_T$$
Once $\beta_{T-1}(i) = P(o_T \mid i_{T-1} = q_i, \lambda)$ has been computed for $i \in \{1, 2, \dots, N\}$, we know the following at time $t = T-1$: the probability $\beta_{T-1}(1)$ that the later observation is $o_T$ given hidden state $q_1$, and so on up to the probability $\beta_{T-1}(N)$ that the later observation is $o_T$ given hidden state $q_N$.
Computing $\beta_{T-2}(j) = P(o_T, o_{T-1} \mid i_{T-2} = q_j, \lambda)$ for $j \in \{1, 2, \dots, N\}$ means finding, at time $t = T-2$: the probability $\beta_{T-2}(1)$ that the later observations are $o_{T-1}, o_T$ given hidden state $q_1$, and so on up to the probability $\beta_{T-2}(N)$ that the later observations are $o_{T-1}, o_T$ given hidden state $q_N$.
How can $\beta_{T-1}(i)$ be used to compute $\beta_{T-2}(j)$?
Comparing what we already have with what we want, the only new piece is the observation $o_{T-1}$. It is emitted by $i_{T-1}$ (the factor $b_{i_{T-1}}(o_{T-1})$), and $i_{T-1}$ is in turn reached from $i_{T-2}$ (the factor $a_{i_{T-2} i_{T-1}}$). So we take each $\beta_{T-1}(i_{T-1})$, multiply in the two probabilities $b_{i_{T-1}}(o_{T-1})$ and $a_{i_{T-2} i_{T-1}}$, and sum over all possible values of $i_{T-1}$:
$$\beta_{T-2}(i_{T-2}) = \sum_{i_{T-1} = 1}^N \beta_{T-1}(i_{T-1}) \, b_{i_{T-1}}(o_{T-1}) \, a_{i_{T-2} i_{T-1}}$$
Setting $t = T-2$, $t+1 = T-1$ and renaming the states so that $i_{T-1} = q_i$ and $i_{T-2} = q_j$ gives:
$$\beta_t(j) = \sum_{i=1}^N \beta_{t+1}(i) \, b_i(o_{t+1}) \, a_{ji}$$
Why it matters
Why compute backward probabilities?
- First, they can also give the target probability $P(O \mid \lambda)$. By definition, the backward probability at $t = 1$ is
$$\beta_1(i) = P(o_2, \dots, o_T \mid i_1 = q_i, \lambda)$$
Compared with the target $P(O \mid \lambda)$, $\beta_1(i)$ is still missing the observation $o_1$. Given the state at $t = 1$, the observation $o_1$ is conditionally independent of all later observations, and its conditional probability given state $q_i$ is
$$P(o_1 \mid i_1 = q_i, \lambda) = b_i(o_1)$$
Multiplying the two gives the joint probability of all observations $O = (o_1, \dots, o_T)$ given that the state at $t = 1$ is $q_i$:
$$P(o_1, \dots, o_T \mid i_1 = q_i, \lambda) = \beta_1(i) b_i(o_1)$$
The target probability is therefore
$$P(O \mid \lambda) = \sum_{i=1}^N P(o_1, \dots, o_T \mid i_1 = q_i, \lambda) P(i_1 = q_i \mid \lambda) = \sum_{i=1}^N \beta_1(i) b_i(o_1) \pi_i$$
- Second, the backward probabilities cost the same as the forward probabilities: on the order of $N^2 T$ operations in total, far fewer than the $N^T$ terms of direct computation.
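The backward recursion and the formula $P(O \mid \lambda) = \sum_i \beta_1(i) b_i(o_1) \pi_i$ can be sketched in the same style. As before, the toy parameters and the name `backward` are hypothetical; with 0-based indexing, `beta[t][i]` holds $\beta_{t+1}(i)$.

```python
def backward(A, B, obs):
    """Return beta, where beta[t][i] stores beta_{t+1}(i) (0-based t)."""
    N, T = len(A), len(obs)
    # Boundary condition: beta_T(i) = 1 (the last row is never overwritten)
    beta = [[1.0] * N for _ in range(T)]
    # Recursion, running from time T-1 down to time 1:
    # beta_t(j) = sum_i a_ji * b_i(o_{t+1}) * beta_{t+1}(i)
    for t in range(T - 2, -1, -1):
        for j in range(N):
            beta[t][j] = sum(
                A[j][i] * B[i][obs[t + 1]] * beta[t + 1][i] for i in range(N)
            )
    return beta

# Hypothetical toy model: 3 states, 2 observation symbols.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]  # A[i][j] = a_ij
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]                  # B[j][k] = b_j(k)
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

beta = backward(A, B, obs)
# P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(likelihood)  # approximately 0.130218
```

Note that `backward` does not need $\pi$: the initial distribution only enters in the final sum.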
Forward-Backward Algorithm
The forward algorithm uses the forward probabilities and works in the direction $1 \to T$: $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.
The backward algorithm uses the backward probabilities and works in the direction $T \to 1$: $P(O \mid \lambda) = \sum_{i=1}^N \beta_1(i) b_i(o_1) \pi_i$.
Forward and backward probabilities can also be combined to compute $P(O \mid \lambda)$. For any $t \in \{1, \dots, T\}$:
$$\begin{aligned}
P(O \mid \lambda) &= \sum_{i=1}^N P(O, i_t = q_i \mid \lambda) \\
&= \sum_{i=1}^N P(O \mid i_t = q_i, \lambda) P(i_t = q_i \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, \dots, o_t \mid i_t = q_i, \lambda) P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda) P(i_t = q_i \mid \lambda) \\
&= \sum_{i=1}^N P(o_1, \dots, o_t, i_t = q_i \mid \lambda) P(o_{t+1}, \dots, o_T \mid i_t = q_i, \lambda) \\
&= \sum_{i=1}^N \alpha_t(i) \beta_t(i)
\end{aligned}$$
The third line uses the fact that, given the state $i_t$, the observations up to time $t$ and the observations after time $t$ are conditionally independent.
Substituting the backward recursion $\beta_t(i) = \sum_{j=1}^N \beta_{t+1}(j) b_j(o_{t+1}) a_{ij}$ gives, for $t = 1, \dots, T-1$:
$$P(O \mid \lambda) = \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \beta_{t+1}(j) b_j(o_{t+1}) a_{ij}$$
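A quick numerical check of the two identities above: the value $\sum_i \alpha_t(i) \beta_t(i)$ should come out the same for every $t$, and so should the double sum for every $t < T$. The sketch below assumes hypothetical toy parameters and compact helper functions `forward` and `backward` written only for this illustration.

```python
def forward(A, B, pi, obs):
    # alpha[t][j]: forward probability at 0-based time t
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append(
            [B[j][o] * sum(prev[i] * A[i][j] for i in range(N)) for j in range(N)]
        )
    return alpha

def backward(A, B, obs):
    # beta[t][i]: backward probability at 0-based time t
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta

# Hypothetical toy model: 3 states, 2 observation symbols.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]
N, T = len(pi), len(obs)

alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)

# sum_i alpha_t(i) * beta_t(i), one value per time step
single = [sum(alpha[t][i] * beta[t][i] for i in range(N)) for t in range(T)]

# sum_i sum_j alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j), for t < T
double = [
    sum(
        alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
        for i in range(N)
        for j in range(N)
    )
    for t in range(T - 1)
]
print(single, double)  # every entry is approximately 0.130218
```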
Computing Other Probabilities
The forward and backward probabilities can also be used for other computations:
- Given the model $\lambda$, the probability that the observation sequence is $O = (o_1, \dots, o_T)$ and the hidden state at time $t$ is $q_i$:
$$P(O, i_t = q_i \mid \lambda) = \alpha_t(i) \beta_t(i)$$
- Given the model $\lambda$ and the observation sequence $O = (o_1, \dots, o_T)$, the probability that the hidden state at time $t$ is $q_i$ (a single state):
$$P(i_t = q_i \mid O, \lambda) = \frac{P(O, i_t = q_i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^N \alpha_t(j) \beta_t(j)}$$
- Given the model $\lambda$ and the observation sequence $O = (o_1, \dots, o_T)$, the probability that the hidden state at time $t$ is $q_i$ and the hidden state at time $t+1$ is $q_j$ (a pair of states):
$$P(i_t = q_i, i_{t+1} = q_j \mid O, \lambda) = \frac{P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \beta_{t+1}(j) b_j(o_{t+1}) a_{ij}}{\sum_{k=1}^N \sum_{l=1}^N \alpha_t(k) \beta_{t+1}(l) b_l(o_{t+1}) a_{kl}}$$
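The single-state and pair-of-states posteriors can be sketched as below. The names `gamma` and `xi` are a common convention for these two quantities but are an assumption here, since the text does not name them; the toy parameters and helper functions are likewise hypothetical.

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append(
            [B[j][o] * sum(prev[i] * A[i][j] for i in range(N)) for j in range(N)]
        )
    return alpha

def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta

def posteriors(A, B, pi, obs):
    """Return (gamma, xi) with 0-based time indices."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    # gamma[t][i] = P(i_t = q_i | O, lambda)
    gamma = []
    for t in range(T):
        joint = [alpha[t][i] * beta[t][i] for i in range(N)]
        norm = sum(joint)  # equals P(O | lambda)
        gamma.append([v / norm for v in joint])
    # xi[t][i][j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda), defined for t < T
    xi = []
    for t in range(T - 1):
        joint = [
            [alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
             for j in range(N)]
            for i in range(N)
        ]
        norm = sum(sum(row) for row in joint)
        xi.append([[v / norm for v in row] for row in joint])
    return gamma, xi

# Hypothetical toy model: 3 states, 2 observation symbols.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

gamma, xi = posteriors(A, B, pi, obs)
print(gamma[0])
```

Two sanity checks follow from the formulas: each `gamma[t]` sums to 1, and summing `xi[t][i][j]` over `j` recovers `gamma[t][i]`.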