Basic Concepts of Hidden Markov Models
Definition of a Hidden Markov Model
A hidden Markov model (HMM) is a model for time series. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then generates an observation, producing a random sequence of observations. The sequence of states generated by the hidden Markov chain is called the state sequence; the sequence of observations the states produce is called the observation sequence. Each position in a sequence can be regarded as a time step.
$$Q=\{q_1,q_2,\ldots,q_N\},\qquad V=\{v_1,v_2,\ldots,v_M\}$$
where $Q$ is the set of all possible states and $V$ is the set of all possible observations.
$$I=(i_1,i_2,\ldots,i_T),\qquad O=(o_1,o_2,\ldots,o_T)$$
$I$ is a state sequence of length $T$ and $O$ is the corresponding observation sequence.
Transition matrix
$$A=[a_{ij}]_{N \times N},\qquad a_{ij}=P(i_{t+1}=q_j \mid i_t=q_i)$$
Observation (emission) matrix
$$B=[b_j(k)]_{N \times M},\qquad b_j(k)=P(o_t=v_k \mid i_t=q_j)$$
Initial state probability vector
$$\pi=(\pi_i),\qquad \pi_i=P(i_1=q_i)$$
A hidden Markov model can therefore be represented as
$$\lambda=(A,B,\pi)$$
By definition, the model makes two assumptions: the homogeneous Markov assumption and the observation-independence assumption,
$$P(i_t \mid i_{t-1},o_{t-1},\ldots,i_1,o_1)=P(i_t \mid i_{t-1})$$
$$P(o_t \mid i_T,o_T,\ldots,i_1,o_1)=P(o_t \mid i_t)$$
Generating an Observation Sequence
Input: hidden Markov model $\lambda=(A,B,\pi)$ and observation sequence length $T$
Output: observation sequence $O=(o_1,o_2,\ldots,o_T)$
(1) Draw state $i_1$ according to the initial state distribution $\pi$
(2) Set $t=1$
(3) Generate $o_t$ according to the observation probability distribution $b_{i_t}(k)$ of state $i_t$
(4) Generate $i_{t+1}$ according to the transition probability distribution of state $i_t$
(5) Set $t=t+1$; if $t<T$, go to (3); otherwise terminate
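The procedure above can be sketched as a short Python function; the function name `simulate` and the representation (nested lists, a seed parameter) are my own choices, not from the text:

```python
import random

def simulate(A, B, pi, T, seed=0):
    """Generate an observation sequence of length T from an HMM (A, B, pi).

    A[i][j] = a_ij, B[i][k] = b_i(k), pi[i] = pi_i (plain nested lists).
    """
    rng = random.Random(seed)

    def draw(p):  # sample an index from a discrete distribution p
        return rng.choices(range(len(p)), weights=p, k=1)[0]

    i = draw(pi)              # step (1): i_1 ~ pi
    O = []
    for _ in range(T):        # steps (2)-(5) written as a loop over t
        O.append(draw(B[i]))  # step (3): o_t ~ b_{i_t}(.)
        i = draw(A[i])        # step (4): i_{t+1} ~ a_{i_t, .}
    return O
```

Any row-stochastic `A`, `B` and probability vector `pi` work; the returned list holds observation indices into $V$.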
Three Basic Problems of Hidden Markov Models
- Probability computation: given $\lambda=(A,B,\pi)$ and $O=(o_1,o_2,\ldots,o_T)$, compute $P(O\mid\lambda)$
- Learning: given $O=(o_1,o_2,\ldots,o_T)$, estimate the $\lambda$ that maximizes $P(O\mid\lambda)$
- Prediction (decoding): given $\lambda$ and $O=(o_1,o_2,\ldots,o_T)$, find the state sequence $I$ that maximizes $P(I\mid O)$
Probability Computation Algorithms
Direct computation
$$P(I\mid\lambda)=\pi_{i_1}a_{i_1i_2}a_{i_2i_3}\cdots a_{i_{T-1}i_T}$$
$$P(O\mid I,\lambda)=b_{i_1}(o_1)b_{i_2}(o_2)\cdots b_{i_T}(o_T)$$
$$P(O,I\mid\lambda)=P(O\mid I,\lambda)P(I\mid\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
$$P(O\mid\lambda)=\sum_{I}P(O\mid I,\lambda)P(I\mid\lambda)$$
But the sum runs over all $N^T$ state sequences, so the cost is $O(TN^T)$, which is prohibitively high.
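For intuition, the naive sum over all $N^T$ state sequences can be written directly. The toy parameters below are illustrative; the enumeration is only feasible for very small $T$, but it is handy as a correctness check for the dynamic-programming algorithms that follow.

```python
from itertools import product

def prob_direct(A, B, pi, O):
    """P(O|lambda) by enumerating all N**T state sequences (O(T * N**T) time)."""
    N = len(pi)
    total = 0.0
    for I in product(range(N), repeat=len(O)):
        p = pi[I[0]] * B[I[0]][O[0]]                # pi_{i1} * b_{i1}(o1)
        for t in range(1, len(O)):
            p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]  # a_{i_{t-1} i_t} * b_{i_t}(o_t)
        total += p                                  # accumulate P(O, I | lambda)
    return total

# Example: a 3-state, 2-symbol model and the observation sequence (0, 1, 0)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
print(prob_direct(A, B, pi, [0, 1, 0]))  # ~0.130218
```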
Forward algorithm
Define the forward variable
$$\alpha_t(i)=P(o_1,o_2,\ldots,o_t,i_t=q_i\mid\lambda)$$
Algorithm
Input: hidden Markov model $\lambda$ and observation sequence $O$
Output: the observation sequence probability $P(O\mid\lambda)$
(1) Initialization
$$\alpha_1(i)=\pi_ib_i(o_1)$$
(2) Recursion
$$\alpha_{t+1}(i)=\Big[\sum_{j=1}^N\alpha_t(j)a_{ji}\Big]b_i(o_{t+1})$$
(3) Termination
$$P(O\mid\lambda)=\sum_{i=1}^N\alpha_T(i)$$
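A minimal sketch of the three steps in plain Python (variable names are mine); the recursion brings the cost down to $O(TN^2)$:

```python
def forward(A, B, pi, O):
    """Forward algorithm: return (alpha, P(O|lambda)) for observation indices O."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]   # (1) alpha_1(i)
    for t in range(1, len(O)):                         # (2) recursion over t
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha, sum(alpha[-1])                       # (3) termination

# Same illustrative toy model as before
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
alpha, p = forward(A, B, pi, [0, 1, 0])
print(p)  # ~0.130218
```

For long sequences the $\alpha_t$ values underflow; practical implementations rescale each $\alpha_t$ or work in log space.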
Backward algorithm
Define the backward variable
$$\beta_t(i)=P(o_{t+1},o_{t+2},\ldots,o_T\mid i_t=q_i,\lambda)$$
Algorithm
Input: hidden Markov model $\lambda$ and observation sequence $O$
Output: the observation sequence probability $P(O\mid\lambda)$
(1) Initialization
$$\beta_T(i)=1$$
(2) Recursion
$$\beta_t(i)=\sum_{j=1}^Na_{ij}b_j(o_{t+1})\beta_{t+1}(j)$$
(3) Termination
$$P(O\mid\lambda)=\sum_{i=1}^N\pi_ib_i(o_1)\beta_1(i)$$
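The mirror-image sketch for the backward pass (again illustrative naming); it must return the same $P(O\mid\lambda)$ as the forward algorithm:

```python
def backward(A, B, pi, O):
    """Backward algorithm: return (beta, P(O|lambda))."""
    N, T = len(pi), len(O)
    beta = [[1.0] * N]                                 # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                     # (2) recursion, t = T-1..1
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    p = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))  # (3) termination
    return beta, p

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
beta, p = backward(A, B, pi, [0, 1, 0])
print(p)  # ~0.130218, matching the forward algorithm
```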
Computing Some Probabilities and Expectations
1. Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$:
$$\gamma_t(i)=P(i_t=q_i\mid O,\lambda)=\frac{P(i_t=q_i,O\mid\lambda)}{P(O\mid\lambda)}$$
Since
$$\alpha_t(i)\beta_t(i)=P(i_t=q_i,O\mid\lambda)$$
we obtain
$$\gamma_t(i)=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\beta_t(j)}$$
2. Given $\lambda$ and $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:
$$\xi_t(i,j)=P(i_t=q_i,i_{t+1}=q_j\mid O,\lambda)=\frac{P(i_t=q_i,i_{t+1}=q_j,O\mid\lambda)}{\sum_{i=1}^N\sum_{j=1}^NP(i_t=q_i,i_{t+1}=q_j,O\mid\lambda)}$$
$$=\frac{\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}{\sum_{i=1}^N\sum_{j=1}^N\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}$$
3. Derived expectations
- Expected number of times state $i$ appears under observation $O$: $\sum_{t=1}^T\gamma_t(i)$
- Expected number of transitions out of state $i$ under observation $O$: $\sum_{t=1}^{T-1}\gamma_t(i)$
- Expected number of transitions from state $i$ to state $j$ under observation $O$: $\sum_{t=1}^{T-1}\xi_t(i,j)$
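The quantities $\gamma_t(i)$ and $\xi_t(i,j)$ follow mechanically from the forward and backward variables. A self-contained sketch (the forward and backward passes are recomputed inline so the function stands alone; names are mine):

```python
def posteriors(A, B, pi, O):
    """Return (gamma, xi): gamma[t][i] ~ gamma_{t+1}(i), xi[t][i][j] ~ xi_{t+1}(i,j)."""
    N, T = len(pi), len(O)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]          # forward pass
    for t in range(1, T):
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    beta = [[1.0] * N]                                        # backward pass
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    PO = sum(alpha[-1])                                       # P(O|lambda)
    gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / PO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi
```

Each `gamma[t]` sums to 1, as does each `xi[t]` over all $(i,j)$ pairs; the three expectations above are simply sums of these arrays over $t$.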
Learning Algorithms
Supervised learning
Suppose labeled training data $\{(O_1,I_1),(O_2,I_2),\ldots,(O_S,I_S)\}$ is available; the parameters can then be estimated by maximum likelihood.
1. Transition probability estimate
$$a_{ij}=\frac{A_{ij}}{\sum_{j=1}^NA_{ij}}$$
where $A_{ij}$ is the number of transitions from state $i$ to state $j$ in the training data.
2. Observation probability estimate
$$b_j(k)=\frac{B_{jk}}{\sum_{k=1}^MB_{jk}}$$
where $B_{jk}$ is the number of times observation $v_k$ is emitted while in state $j$.
3. The initial state probability $\pi_i$ is estimated as the relative frequency of $q_i$ as the initial state among the $S$ samples.
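These frequency estimates reduce to counting; a minimal sketch (the `supervised_fit` name and the uniform fallback for unseen states are my own choices):

```python
def supervised_fit(pairs, N, M):
    """MLE of (A, B, pi) from labeled pairs (O_s, I_s) by counting frequencies."""
    Ac = [[0] * N for _ in range(N)]   # A_ij: transition counts i -> j
    Bc = [[0] * M for _ in range(N)]   # B_jk: emission counts of v_k in state j
    pic = [0] * N                      # initial-state counts
    for O, I in pairs:
        pic[I[0]] += 1
        for t in range(len(I) - 1):
            Ac[I[t]][I[t + 1]] += 1
        for t in range(len(I)):
            Bc[I[t]][O[t]] += 1

    def norm(row):  # normalize a count row; uniform if the row is all zeros
        s = sum(row)
        return [c / s for c in row] if s else [1 / len(row)] * len(row)

    return [norm(r) for r in Ac], [norm(r) for r in Bc], norm(pic)
```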
Baum-Welch Algorithm
When only $O$ is observed, the parameters are estimated by EM.
1. Determine the log-likelihood of the complete data
$$\log P(O,I\mid\lambda)$$
2. Derive the $Q$ function (with $\hat{\lambda}$ the current parameter estimate)
$$Q(\lambda,\hat{\lambda})=\sum_I\log P(O,I\mid\lambda)\,P(O,I\mid\hat{\lambda})$$
$$P(O,I\mid\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
$$Q(\lambda,\hat{\lambda})=\sum_I\log\pi_{i_1}P(O,I\mid\hat{\lambda})+\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_ti_{t+1}}\Big)P(O,I\mid\hat{\lambda})+\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I\mid\hat{\lambda})$$
3. Maximization
(1) First term
$$\sum_I\log\pi_{i_1}P(O,I\mid\hat{\lambda})=\sum_{i=1}^N\log\pi_i\,P(O,i_1=i\mid\hat{\lambda})$$
Since $\sum_{i=1}^N\pi_i=1$, the Lagrangian is
$$\sum_{i=1}^N\log\pi_i\,P(O,i_1=i\mid\hat{\lambda})+\gamma\Big(\sum_{i=1}^N\pi_i-1\Big)$$
Setting the derivative with respect to $\pi_i$ to zero gives
$$P(O,i_1=i\mid\hat{\lambda})+\gamma\pi_i=0$$
Summing over $i$ yields $\gamma=-P(O\mid\hat{\lambda})$, hence
$$\pi_i=\frac{P(O,i_1=i\mid\hat{\lambda})}{P(O\mid\hat{\lambda})}$$
(2) Second term
$$\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_ti_{t+1}}\Big)P(O,I\mid\hat{\lambda})=\sum_{i=1}^N\sum_{j=1}^N\sum_{t=1}^{T-1}\log a_{ij}\,P(O,i_t=i,i_{t+1}=j\mid\hat{\lambda})$$
With the constraint $\sum_{j=1}^Na_{ij}=1$, the same Lagrange-multiplier argument gives
$$a_{ij}=\frac{\sum_{t=1}^{T-1}P(O,i_t=i,i_{t+1}=j\mid\hat{\lambda})}{\sum_{t=1}^{T-1}P(O,i_t=i\mid\hat{\lambda})}$$
(3) Third term
$$\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I\mid\hat{\lambda})=\sum_{j=1}^N\sum_{t=1}^T\log b_j(o_t)\,P(O,i_t=j\mid\hat{\lambda})$$
Similarly, under the constraint $\sum_{k=1}^Mb_j(k)=1$,
$$b_j(k)=\frac{\sum_{t=1}^TP(O,i_t=j\mid\hat{\lambda})\,I(o_t=v_k)}{\sum_{t=1}^TP(O,i_t=j\mid\hat{\lambda})}$$
Baum-Welch parameter estimation formulas
Written in terms of $\gamma$ and $\xi$:
$$a_{ij}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$
$$b_j(k)=\frac{\sum_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$$
$$\pi_i=\gamma_1(i)$$
Algorithm
Input: observation data $O=(o_1,o_2,\ldots,o_T)$
Output: hidden Markov model parameters $\lambda=(A,B,\pi)$
(1) For $n=0$, choose initial values $\lambda^{(0)}=(A^{(0)},B^{(0)},\pi^{(0)})$
(2) Recursion: for $n=0,1,\ldots$, compute $\gamma_t(i)$ and $\xi_t(i,j)$ under $\lambda^{(n)}$, then
$$a_{ij}^{(n+1)}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$
$$b_j(k)^{(n+1)}=\frac{\sum_{t=1,\,o_t=v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$$
$$\pi_i^{(n+1)}=\gamma_1(i)$$
(3) Repeat until a stopping condition is met (e.g., the parameters or the likelihood stop changing); output $\lambda^{(n+1)}=(A^{(n+1)},B^{(n+1)},\pi^{(n+1)})$
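A compact sketch of one way to implement the iteration. Everything here is a simplification for illustration: random initialization, a fixed iteration count instead of a convergence test, and no rescaling (real implementations rescale $\alpha,\beta$ or use log space, and stop when the likelihood change falls below a threshold):

```python
import random

def baum_welch(O, N, M, n_iter=50, seed=0):
    """Baum-Welch (EM): re-estimate (A, B, pi) from one observation sequence O."""
    rng = random.Random(seed)

    def rand_dist(n):  # strictly positive random distribution for initialization
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    A = [rand_dist(N) for _ in range(N)]
    B = [rand_dist(M) for _ in range(N)]
    pi = rand_dist(N)
    T = len(O)
    for _ in range(n_iter):
        # E-step: forward/backward, then gamma and xi under current parameters
        alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
        for t in range(1, T):
            alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                          for i in range(N)])
        beta = [[1.0] * N]
        for t in range(T - 2, -1, -1):
            beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * beta[0][j]
                                for j in range(N)) for i in range(N)])
        PO = sum(alpha[-1])
        gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
        xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / PO
                for j in range(N)] for i in range(N)] for t in range(T - 1)]
        # M-step: the re-estimation formulas above
        A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
        B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
        pi = gamma[0][:]
    return A, B, pi
```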
Prediction Algorithms
Approximate algorithm
At each time $t$, pick the individually most likely state:
$$i_t^*=\arg\max_{1\le i\le N}[\gamma_t(i)]$$
This is simple, but it optimizes each time step in isolation: the resulting sequence is not guaranteed to be globally optimal and may even be infeasible, since adjacent chosen states can have transition probability zero.
Viterbi Algorithm
Input: model $\lambda$ and observation sequence $O$
Output: the optimal path $I^*=(i_1^*,i_2^*,\ldots,i_T^*)$
(1) Initialization
$$\delta_1(i)=\pi_ib_i(o_1),\qquad \psi_1(i)=0$$
(2) Recursion
$$\delta_t(i)=\max_{1\le j\le N}[\delta_{t-1}(j)a_{ji}]\,b_i(o_t)$$
$$\psi_t(i)=\arg\max_{1\le j\le N}[\delta_{t-1}(j)a_{ji}]$$
(3) Termination
$$P^*=\max_{1\le i\le N}\delta_T(i),\qquad i_T^*=\arg\max_{1\le i\le N}[\delta_T(i)]$$
(4) Backtrack the optimal path: for $t=T-1,T-2,\ldots,1$, set $i_t^*=\psi_{t+1}(i_{t+1}^*)$
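The four steps translate into a short dynamic program (0-indexed states; the toy model is the same illustrative one used earlier):

```python
def viterbi(A, B, pi, O):
    """Viterbi: return (best state path, P* = probability of that path)."""
    N, T = len(pi), len(O)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]    # (1) delta_1(i)
    psi = [[0] * N]                                   #     psi_1(i) = 0
    for t in range(1, T):                             # (2) recursion
        new_delta, back = [], []
        for i in range(N):
            j_best = max(range(N), key=lambda j: delta[j] * A[j][i])
            new_delta.append(delta[j_best] * A[j_best][i] * B[i][O[t]])
            back.append(j_best)                       # psi_t(i)
        delta = new_delta
        psi.append(back)
    p_star = max(delta)                               # (3) termination
    path = [max(range(N), key=lambda i: delta[i])]    #     i_T*
    for t in range(T - 1, 0, -1):                     # (4) backtracking
        path.insert(0, psi[t][path[0]])
    return path, p_star

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
path, p_star = viterbi(A, B, pi, [0, 1, 0])
print(path, p_star)  # [2, 2, 2] ~0.0147
```

Like the forward pass, a production version would work with log probabilities (products become sums) to avoid underflow.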