Hidden Markov Model (HMM)
Problems suited to HMMs share two characteristics:
- The data form a sequence, such as a time series or a state sequence
- There are two kinds of data: one sequence can be observed directly (the observation sequence), while the other cannot (the hidden state sequence, or state sequence for short)
Definition
Assume $N$ is the number of possible hidden states and $M$ is the number of possible observation states. Define

$$\mathcal{Q}=\{q_1,q_2,\dots,q_N\},\quad \mathcal{V}=\{v_1,v_2,\dots,v_M\}$$

as the sets of all possible hidden states and all possible observation states, respectively.
Also, for a state sequence $\mathcal{I}$ of length $T$ and its corresponding observation sequence $\mathcal{O}$:

$$\mathcal{I}=\{s_1,s_2,\dots,s_T\},\quad \mathcal{O}=\{o_1,o_2,\dots,o_T\}$$
HMM makes two important assumptions:
- Homogeneous Markov chain assumption: the hidden state at any time depends only on the hidden state at the previous time. Define the transition probability $A_{ij}$ as the probability of moving from state $q_i$ at time $t$ to state $q_j$ at time $t+1$:

$$A_{ij}=P(s_{t+1}=q_j\mid s_t=q_i)$$

which gives the state transition matrix $A\in \mathbb{R}^{N\times N}$.
- Observation independence assumption: the observation at any time depends only on the hidden state at that time. Define the emission probability $B_{ij}$ as the probability that hidden state $q_i$ generates observation state $v_j$:

$$B_{ij}=P(o_t=v_j\mid s_t=q_i)$$

which gives the emission matrix $B\in \mathbb{R}^{N\times M}$ (rows index hidden states, columns index observation states, consistent with the $B_{i,o_t}$ notation used below).
Finally, define the hidden state distribution at time $t$ as $\Pi_t=[\pi_t(k)]$, where $\pi_t(k)=P(s_t=q_k)$; the time index is usually dropped and the initial distribution is written $\Pi=[\pi_k]$.
An HMM is therefore specified by three parameters:

$$\lambda=(A,B,\Pi)$$
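For concreteness, here is a tiny running example that the code snippets below can be tested against. All numbers are made up for illustration: a two-state HMM with three observation symbols.

```python
import numpy as np

# Toy HMM with made-up numbers: N = 2 hidden states, M = 3 observation states
pi = np.array([0.6, 0.4])              # initial distribution, Pi
a = np.array([[0.7, 0.3],              # transition matrix A, shape (N, N)
              [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1],         # emission matrix B, shape (N, M)
              [0.1, 0.3, 0.6]])
obs = np.array([0, 1, 2, 1])           # an observation sequence of length T = 4
```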
Basic Problems
- Evaluation. Given the model $\lambda$ and an observation sequence $\mathcal{O}$, compute the probability $P(\mathcal{O}\mid\lambda)$ of observing $\mathcal{O}$ under the model. Solution: the forward-backward algorithm
- Prediction (decoding). Given an observation sequence $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$, find the most likely hidden state sequence. Solution: the Viterbi algorithm
- Learning. Given one or more observation sequences, estimate the model parameters $\lambda=(A,B,\Pi)$ that maximize $P(\mathcal{O}\mid\lambda)$. Solution: the Baum-Welch (EM) algorithm
Forward Algorithm
Procedure
Input: observation sequence $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$
Output: the observation sequence probability $P(\mathcal{O}\mid\lambda)$
Steps:
- Compute the forward probability of each hidden state $q_i$ at time 1:

$$\alpha_1(i)=\pi_i B_{i,o_1},\quad i=1,2,\dots,N$$

- Recursively compute the forward probabilities for times $2,3,\dots,T$:

$$\alpha_{t+1}(i)=\Big[\sum_{j=1}^{N}\alpha_{t}(j)A_{ji}\Big]B_{i,o_{t+1}},\quad i=1,2,\dots,N$$

- Final result:

$$P(\mathcal{O}\mid\lambda)=\sum_{i=1}^{N}\alpha_T(i)$$
Implementation
```python
import numpy as np

def HMMfwd(pi, a, b, obs):
    '''
    pi:  initial state distribution
    a:   state transition matrix
    b:   emission matrix
    obs: observation sequence
    '''
    nStates = np.shape(b)[0]
    T = np.shape(obs)[0]
    # alpha[i, t] corresponds to alpha_t(i) in the formulas above
    alpha = np.zeros((nStates, T))
    alpha[:, 0] = pi * b[:, obs[0]]
    for t in range(1, T):
        for s in range(nStates):
            alpha[s, t] = b[s, obs[t]] * np.sum(alpha[:, t-1] * a[:, s])
    return alpha
```
Finally, $P(\mathcal{O}\mid\lambda)$ is computed as:

```python
np.sum(alpha[:, -1])
```
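A quick check with the toy parameters defined at the start (made-up numbers, so the value itself is only illustrative):

```python
alpha = HMMfwd(pi, a, b, obs)
print(np.sum(alpha[:, -1]))    # P(O | lambda) for the toy model
```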
Backward Algorithm
Procedure
Input: observation sequence $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$
Output: the observation sequence probability $P(\mathcal{O}\mid\lambda)$
Steps:
- Initialize the backward probability of each hidden state $q_i$ at time $T$:

$$\beta_T(i)=1,\quad i=1,2,\dots,N$$

- Recursively compute the backward probabilities for times $T-1,T-2,\dots,1$:

$$\beta_{t}(i)=\sum_{j=1}^{N}A_{ij}B_{j,o_{t+1}}\beta_{t+1}(j),\quad i=1,2,\dots,N$$

- Final result:

$$P(\mathcal{O}\mid\lambda)=\sum_{i=1}^{N}\pi_i B_{i,o_1}\beta_1(i)$$
实现代码
def HMMbwd(a, b, obs):
'''
a:状态转移矩阵
b:发射矩阵
obs:观测序列
'''
nStates = np.shape(b)[0]
T = np.shape(obs)[0]
'''beta[i,t]表示上述公式的 beta_t(i)'''
beta = np.zeros((nStates,T))
beta[:,-1] = 1.0
for t in range(T-2,-1,-1):
for s in range(nStates):
beta[s,t] = np.sum(a[s,:] * b[:,obs[t+1]] * beta[:,t+1] )
return beta
Finally, $P(\mathcal{O}\mid\lambda)$ is computed as:

```python
np.sum(pi * b[:, obs[0]] * beta[:, 0])
```
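The forward and backward computations must agree on $P(\mathcal{O}\mid\lambda)$, which makes a handy sanity check (again using the toy parameters from above):

```python
alpha = HMMfwd(pi, a, b, obs)
beta = HMMbwd(a, b, obs)
p_fwd = np.sum(alpha[:, -1])
p_bwd = np.sum(pi * b[:, obs[0]] * beta[:, 0])
assert np.isclose(p_fwd, p_bwd)    # both equal P(O | lambda)
```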
Viterbi Algorithm
Procedure
Input: observation sequence $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$ and model parameters $\lambda=(A,B,\Pi)$
Output: the most likely hidden state sequence $\mathcal{I}^*=\{i^{*}_1,i^{*}_2,\dots,i^{*}_T\}$
Steps:
- Initialize the local states:

$$\delta_1(i)=\pi_iB_{i,o_1},\quad \Phi_1(i)=0,\quad i=1,2,\dots,N$$

- Dynamic-programming recursion of the local states over times $t=2,3,\dots,T$:

$$\begin{array}{c} {\delta_t(i)=\max_{1\leq j\leq N}[\delta_{t-1}(j)A_{j,i}]B_{i,o_t},\quad i=1,2,\dots,N} \\ \\ {\Phi_t(i)=\arg \max_{1\leq j\leq N}[\delta_{t-1}(j)A_{j,i}],\quad i=1,2,\dots,N} \end{array}$$

- At time $T$, the largest $\delta_T(i)$ is the probability of the most likely hidden state sequence, and the maximizing index is the most likely final hidden state:

$$i^*_T=\arg \max_{1\leq i\leq N}\delta_{T}(i)$$

- Backtrack using the local states $\Phi_t(i)$: for times $t=T-1,T-2,\dots,1$,

$$i^*_t=\Phi_{t+1}(i^*_{t+1})$$

- This yields the most likely hidden state sequence $\mathcal{I}^*=\{i^{*}_1,i^{*}_2,\dots,i^{*}_T\}$
Implementation
```python
def Viterbi(pi, a, b, obs):
    '''
    pi:  initial state distribution
    a:   state transition matrix
    b:   emission matrix
    obs: observation sequence
    '''
    nStates = np.shape(b)[0]
    T = np.shape(obs)[0]
    path = np.zeros(T, dtype=np.int32)
    delta = np.zeros((nStates, T))
    # phi stores backpointers, i.e. integer state indices
    phi = np.zeros((nStates, T), dtype=np.int32)
    delta[:, 0] = pi * b[:, obs[0]]
    phi[:, 0] = 0
    for t in range(1, T):
        for s in range(nStates):
            delta[s, t] = np.max(delta[:, t-1] * a[:, s]) * b[s, obs[t]]
            phi[s, t] = np.argmax(delta[:, t-1] * a[:, s])
    path[-1] = np.argmax(delta[:, -1])
    for t in range(T-2, -1, -1):
        path[t] = phi[path[t+1], t+1]
    return path
```
The decoded sequence is `path`.
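Decoding the toy sequence from earlier (illustrative numbers only):

```python
path = Viterbi(pi, a, b, obs)
print(path)    # index of the most likely hidden state at each time step
```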
Baum-Welch Algorithm
Single observation sequence
Input: a single observation sequence $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$
Output: model parameters $\lambda=(A,B,\Pi)$
Steps:
- Initialize $A,B,\Pi$ randomly or by hand
- E-step: compute two intermediate quantities, $\gamma_{t}(i)$ and $\xi_{t}(i, j)$, defined as follows
- $\gamma_{t}(i)$: given the model $\lambda$ and the observation sequence $\mathcal{O}$, the probability that the hidden state at time $t$ is $q_i$, i.e. $\gamma_{t}(i)=P(s_{t}=q_{i} \mid \mathcal{O}, \lambda)$
- $\xi_{t}(i, j)$: given the model $\lambda$ and the observation sequence $\mathcal{O}$, the probability that the hidden state is $q_i$ at time $t$ and $q_j$ at time $t+1$, i.e. $\xi_{t}(i, j)=P(s_{t}=q_{i}, s_{t+1}=q_{j} \mid \mathcal{O}, \lambda)$
Using the forward and backward probabilities defined earlier, these two quantities are computed as:

$$\begin{array}{c}{\gamma_{t}(i)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}} \\ \\ {\xi_{t}(i, j)=\frac{\alpha_{t}(i) A_{i j} B_{j,o_{t+1}} \beta_{t+1}(j)}{\sum_{p=1}^{N} \sum_{q=1}^{N} \alpha_{t}(p) A_{pq} B_{q,o_{t+1}} \beta_{t+1}(q)}}\end{array}$$
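The $\gamma$ formula follows directly from the definitions of the forward and backward probabilities; a one-line derivation (standard material, added here for completeness) shows why:

$$\gamma_t(i)=P(s_t=q_i\mid \mathcal{O},\lambda)=\frac{P(s_t=q_i,\mathcal{O}\mid \lambda)}{P(\mathcal{O}\mid \lambda)}=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)}$$

since $\alpha_t(i)$ covers the observations up to time $t$ with the chain ending in $q_i$, and $\beta_t(i)$ covers the observations after time $t$ given the chain is in $q_i$, their product is exactly the joint probability in the numerator. The $\xi$ formula is derived the same way, with one extra transition and emission factor.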
- M-step: re-estimate the model parameters from the two intermediate quantities computed in the E-step:

$$\begin{array}{c} {A_{i j}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)}} \\ \\ {B_{ij}=\frac{\sum_{t=1}^{T} \gamma_{t}(i) I\left(o_{t}=v_{j}\right)}{\sum_{t=1}^{T} \gamma_{t}(i)}} \\ \\ {\pi_{i}=\gamma_{1}(i)} \end{array}$$
Here $I(o_{t}=v_{j})$ is an indicator: $I(o_{t}=v_{j})=1$ when the observation at time $t$ is $v_j$, and $I(o_{t}=v_{j})=0$ otherwise.
Multiple observation sequences
Input: $D$ observation sequences $\{\mathcal{O}_1,\mathcal{O}_2,\dots,\mathcal{O}_D\}$, where $\mathcal{O}_d=\{o_1,o_2,\dots,o_T\},\ d=1,2,\dots,D$
Output: model parameters $\lambda=(A,B,\Pi)$
Steps:
- Initialize $A,B,\Pi$ randomly or by hand
- E-step: for each sequence, compute the two intermediate quantities $\gamma^{(d)}_{t}(i)$ and $\xi^{(d)}_{t}(i, j)$. This mirrors the single-sequence case, except that each sequence gets its own $\alpha^{(d)}$ and $\beta^{(d)}$ from its own forward-backward pass:
$$\begin{array}{c}{\gamma^{(d)}_{t}(i)=\frac{\alpha^{(d)}_{t}(i) \beta^{(d)}_{t}(i)}{\sum_{j=1}^{N} \alpha^{(d)}_{t}(j) \beta^{(d)}_{t}(j)}} \\ \\ {\xi^{(d)}_{t}(i, j)=\frac{\alpha^{(d)}_{t}(i) A_{i j} B_{j,o_{t+1}} \beta^{(d)}_{t+1}(j)}{\sum_{p=1}^{N} \sum_{q=1}^{N} \alpha^{(d)}_{t}(p) A_{pq} B_{q,o_{t+1}} \beta^{(d)}_{t+1}(q)}}\end{array}$$

- M-step: re-estimate the model parameters from the intermediate quantities, again as in the single-sequence case, except that the updates now sum over all sequences (an extra outer summation), as in the sketch after this list:

$$\begin{array}{c} {A_{i j}=\frac{\sum_{d=1}^D \sum_{t=1}^{T-1} \xi^{(d)}_{t}(i, j)}{\sum_{d=1}^D \sum_{t=1}^{T-1} \gamma^{(d)}_{t}(i)}} \\ \\ {B_{ij}=\frac{\sum_{d=1}^D \sum_{t=1}^{T} \gamma^{(d)}_{t}(i) I\left(o_{t}=v_{j}\right)}{\sum_{d=1}^D \sum_{t=1}^{T} \gamma^{(d)}_{t}(i)}} \\ \\ {\pi_{i}=\frac{\sum_{d=1}^D \gamma^{(d)}_{1}(i)}{D}} \end{array}$$
As before, $I(o_{t}=v_{j})$ is an indicator: $I(o_{t}=v_{j})=1$ when the observation at time $t$ is $v_j$, and $I(o_{t}=v_{j})=0$ otherwise.
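These pooled updates translate almost line-for-line into numpy. A minimal sketch, assuming the per-sequence gamma and xi arrays have already been computed via forward-backward as in the single-sequence implementation below (function and variable names here are hypothetical):

```python
import numpy as np

def pooled_m_step(gamma_list, xi_list, obs_list, n_states, n_obs):
    '''Pooled M-step over D sequences; gamma_list[d] has shape (n_states, T_d),
    xi_list[d] has shape (n_states, n_states, T_d), obs_list[d] is an int array.'''
    num_a = np.zeros((n_states, n_states))
    den_a = np.zeros(n_states)
    num_b = np.zeros((n_states, n_obs))
    den_b = np.zeros(n_states)
    pi = np.zeros(n_states)
    for gamma, xi, obs in zip(gamma_list, xi_list, obs_list):
        num_a += xi[:, :, :-1].sum(axis=2)   # sum of xi_t(i, j) over t = 1..T-1
        den_a += gamma[:, :-1].sum(axis=1)   # sum of gamma_t(i) over t = 1..T-1
        for j in range(n_obs):
            num_b[:, j] += gamma[:, obs == j].sum(axis=1)  # indicator I(o_t = v_j)
        den_b += gamma.sum(axis=1)
        pi += gamma[:, 0]
    a = num_a / den_a[:, None]
    b = num_b / den_b[:, None]
    return pi / len(obs_list), a, b
```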
Implementation (single observation sequence)
```python
def BaumWelch(obs, n_states, n_obs, pi=None, a=None, b=None, tol=1e-2, n_iter=10):
    T = np.shape(obs)[0]
    # initialize (note: `if pi is None`, not `if not pi`, which fails on arrays)
    if pi is None:
        pi = np.ones(n_states) / n_states
    if a is None:
        a = np.random.rand(n_states, n_states)
    if b is None:
        b = np.random.rand(n_states, n_obs)
    xi = np.zeros((n_states, n_states, T))
    nits = 0
    while True:
        nits += 1
        old_a = a.copy()
        old_b = b.copy()
        alpha = HMMfwd(pi, a, b, obs)
        beta = HMMbwd(a, b, obs)
        gamma = alpha * beta
        gamma /= gamma.sum(0)
        # E-step
        for t in range(T-1):
            for i in range(n_states):
                for j in range(n_states):
                    xi[i, j, t] = alpha[i, t] * beta[j, t+1] * a[i, j] * b[j, obs[t+1]]
            xi[:, :, t] /= xi[:, :, t].sum()
        # the final time step has no emission or beta term
        for i in range(n_states):
            for j in range(n_states):
                xi[i, j, -1] = alpha[i, -1] * a[i, j]
        xi[:, :, -1] /= xi[:, :, -1].sum()
        # M-step
        for i in range(n_states):
            for j in range(n_states):
                a[i, j] = xi[i, j, :-1].sum() / gamma[i, :-1].sum()
        a /= a.sum(1, keepdims=True)
        for i in range(n_states):
            for j in range(n_obs):
                found = (obs == j).nonzero()[0]
                b[i, j] = gamma[i, found].sum() / gamma[i].sum()
        b /= b.sum(1, keepdims=True)
        pi = gamma[:, 0]
        if (np.linalg.norm(a - old_a) < tol and np.linalg.norm(b - old_b) < tol) or nits > n_iter:
            break
    return pi, a, b
```
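A quick training run on the toy observation sequence (illustrative only; a single short sequence badly underdetermines the parameters):

```python
np.random.seed(0)                                  # reproducible random init
pi_hat, a_hat, b_hat = BaumWelch(obs, n_states=2, n_obs=3)
alpha_hat = HMMfwd(pi_hat, a_hat, b_hat, obs)
print(np.sum(alpha_hat[:, -1]))                    # P(O | lambda) after training
```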
Symbol Summary
| Symbol | Meaning | Symbol | Meaning |
|---|---|---|---|
| $N$ | number of possible hidden states | $M$ | number of possible observation states |
| $\mathcal{Q}=\{q_1,q_2,\dots,q_N\}$ | set of all possible hidden states | $\mathcal{V}=\{v_1,v_2,\dots,v_M\}$ | set of all possible observation states |
| $\mathcal{I}=\{s_1,s_2,\dots,s_T\}$ | hidden state sequence | $\mathcal{O}=\{o_1,o_2,\dots,o_T\}$ | observation sequence |
| $A_{ij}$ | probability of moving from state $q_i$ at time $t$ to state $q_j$ at time $t+1$ | $B_{ij}$ | probability that hidden state $q_i$ generates observation state $v_j$ |
| $\Pi_t=[\pi_t(k)]$ | $\pi_t(k)=P(s_t=q_k)$ is the probability that the hidden state at time $t$ is $q_k$; the time index is usually dropped, giving $\pi_k$ | $\lambda$ | $\lambda=(A,B,\Pi)$, the model parameters |
| $\delta_t(i)$ | largest probability of any state path that ends in state $q_i$ at time $t$ while emitting $o_1,\dots,o_t$ | $\Phi_t(i)$ | the previous-time state from which the maximizing path into state $q_i$ at time $t$ came |
| $\alpha_t(i)$ | forward probability: the hidden state at time $t$ is $q_i$ and the observations from time 1 to $t$ are $o_{1},o_{2},\dots,o_{t}$ | $\beta_t(i)$ | backward probability: given hidden state $q_i$ at time $t$, the observations from time $t+1$ to $T$ are $o_{t+1},o_{t+2},\dots,o_{T}$ |
| $\gamma_{t}(i)$ | given $\lambda$ and $\mathcal{O}$, the probability that the hidden state at time $t$ is $q_i$: $\gamma_{t}(i)=P(s_{t}=q_{i} \mid \mathcal{O}, \lambda)$ | $\xi_{t}(i, j)$ | given $\lambda$ and $\mathcal{O}$, the probability that the hidden state is $q_i$ at time $t$ and $q_j$ at time $t+1$: $\xi_{t}(i, j)=P(s_{t}=q_{i}, s_{t+1}=q_{j} \mid \mathcal{O}, \lambda)$ |
| $I(o_{t}=v_{j})$ | indicator: 1 when the observation at time $t$ is $v_j$, else 0 | | |
hmmlearn
hmmlearn is a library of algorithms for unsupervised learning and inference of hidden Markov models, with an easy-to-use, sklearn-like API.
Installation

```
pip install hmmlearn
```

Note: a C compiler such as gcc is required, since parts of the library are built from compiled C sources.
Version at the time of writing: 0.2.2
Usage
Only the generic discrete HMM (discrete observation sequences) is covered here.
- Build an HMM model:

```python
from hmmlearn import hmm
model = hmm.MultinomialHMM(n_components=n_states, tol=1e-2, n_iter=10, init_params='')
model.startprob_ = init_pi
model.transmat_ = init_T
model.emissionprob_ = init_E
```

If you initialize the parameters yourself, pass init_params=''; otherwise the model initializes randomly (the default init_params='ste' initializes all three parameters). With manual initialization you also need to set model.n_features = n_obs by hand.
- hmm.MultinomialHMM parameters:
  - n_components: int. Number of hidden states.
  - startprob_prior: array, shape (n_components,), optional. Parameters of the Dirichlet prior distribution for startprob_.
  - transmat_prior: array, shape (n_components, n_components), optional. Parameters of the Dirichlet prior distribution for each row of the transition probabilities transmat_.
  - algorithm: string, optional. Decoder algorithm, one of "viterbi" or "map". Defaults to "viterbi".
  - random_state: RandomState or an int seed, optional. A random number generator instance.
  - n_iter: int, optional. Maximum number of iterations to perform.
  - tol: float, optional. Convergence threshold. EM stops if the gain in log-likelihood is below this value.
  - verbose: bool, optional. When True, per-iteration convergence reports are printed to sys.stderr; convergence can be diagnosed via the monitor_ attribute.
  - params: string, optional. Controls which parameters are updated during training. Can contain any combination of 's' for startprob, 't' for transmat, and 'e' for emissionprob. Defaults to all parameters.
  - init_params: string, optional. Controls which parameters are initialized prior to training. Same codes as params. Defaults to all parameters.
- Compute $P(\mathcal{O}\mid\lambda)$ (problem 1):

```python
model.score(obs.T)
```

- Decode with the Viterbi algorithm (problem 2):

```python
model.decode(obs.T)
```

- Train with the Baum-Welch algorithm (problem 3):

```python
model.fit(obs.T)
```
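Putting the calls together with the toy model from the beginning. This is a sketch against the 0.2.2 API described above; the reshape reflects the library's expectation of a 2D column of symbols for a single sequence, and the float64 casts preempt the dtype error discussed below:

```python
from hmmlearn import hmm
import numpy as np

model = hmm.MultinomialHMM(n_components=2, tol=1e-2, n_iter=10, init_params='')
model.startprob_ = pi.astype(np.float64)     # toy parameters from earlier
model.transmat_ = a.astype(np.float64)
model.emissionprob_ = b.astype(np.float64)
model.n_features = 3

X = obs.reshape(-1, 1)         # single sequence as a 2D column array
print(model.score(X))          # log P(O | lambda)  (problem 1)
print(model.decode(X))         # (log prob, most likely state path)  (problem 2)
```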
Common Errors
- Expected 2D array, got 1D array instead
  The obs input must be a 2D array of shape [n_samples, n_obs]; a single observation sequence must be expanded to 2D, e.g. with np.atleast_2d(obs).
- Buffer dtype mismatch, expected 'dtype_t' but got 'float'
  hmmlearn's forward pass calls C-compiled code, so passing arrays whose dtype is not float64 raises this error. Fix: cast the initial parameter arrays to np.float64.
- arrays used as indices must be of integer (or boolean) type
  The obs sequence must have an integer dtype.
- expected a sample from a Multinomial distribution
  The source code explains: X should be an array of non-negative integers from range [min(X), max(X)], such that each integer from the range occurs in X at least once. For example, [0, 0, 2, 1, 3, 1, 1] is a valid sample from a Multinomial distribution, while [0, 0, 3, 5, 10] is not.
  In other words, every observation symbol must appear in the input at least once, which is rarely the case in practice. The workaround suggested on the official GitHub is:

```python
>>> from sklearn.preprocessing import LabelEncoder
>>> LabelEncoder().fit_transform([0, 1, 5, 10])
array([0, 1, 2, 3])
```
But re-encoding like this loses some of the states. My workaround is instead to comment out the self._check_input_symbols(X) call around line 407 of hmm.py, so the input is not checked, and then slightly patch the code around line 436 that can fail:

```python
def _compute_log_likelihood(self, X):
    return np.log(self.emissionprob_)[:, np.concatenate(X)].T
```

Adding a tiny constant to avoid taking the log of zero, this becomes:

```python
def _compute_log_likelihood(self, X):
    return np.log(self.emissionprob_ + 1e-10)[:, np.concatenate(X)].T
```