1. The relationship between forward and backward probabilities
(1) Forward probability:
\alpha_{t}(i)=P\left(o_{1}, o_{2}, \cdots, o_{t}, i_{t}=q_{i} | \lambda\right)
(2) Backward probability:
\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right)
(3) Relationship:
\begin{aligned} P\left(i_{t}=q_{i}, O | \lambda\right) &=P\left(O | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right)\\ &=P\left(o_{1}, \cdots, o_{t}, o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right)\\ &=P\left(o_{1}, \cdots, o_{t} | i_{t}=q_{i}, \lambda\right) P\left(o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right)\\ &=P\left(o_{1}, \cdots, o_{t}, i_{t}=q_{i} | \lambda\right) P\left(o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right)\\ &=\alpha_{t}(i) \beta_{t}(i) \end{aligned}
The third line uses the fact that, conditioned on the state at time t, the observations before and after t are independent (the Markov property).
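As a numerical sanity check, the following minimal NumPy sketch computes \alpha and \beta by their standard recursions and verifies the identity; the parameters pi, A, B and the observation sequence O are made-up illustrative values, not taken from the text.

```python
import numpy as np

# Illustrative (made-up) HMM: 2 states, 2 observation symbols.
pi = np.array([0.6, 0.4])        # initial distribution pi_i
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # transition probabilities a_ij
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # emission probabilities b_i(k)
O = [0, 1, 1, 0]                 # observation indices o_1 .. o_T
T, N = len(O), len(pi)

# Forward recursion: alpha_t(i) = P(o_1..o_t, i_t = q_i | lambda)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]

# Backward recursion: beta_t(i) = P(o_{t+1}..o_T | i_t = q_i, lambda)
beta = np.zeros((T, N))
beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# alpha_t(i) * beta_t(i) = P(i_t = q_i, O | lambda), so the sum over i
# must give the same P(O | lambda) at every t.
for t in range(T):
    print(t, alpha[t] * beta[t], np.sum(alpha[t] * beta[t]))
```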
2. The probability of a single state
Given the model \lambda and the observation sequence O, the probability of being in state q_i at time t is denoted:
\gamma_{t}(i)=P\left(i_{t}=q_{i} | O, \lambda\right)
From the definitions of the forward and backward probabilities:
\begin{array}{c} P\left(i_{t}=q_{i}, O | \lambda\right)=\alpha_{t}(i) \beta_{t}(i) \\ \gamma_{t}(i)=P\left(i_{t}=q_{i} | O, \lambda\right)=\frac{P\left(i_{t}=q_{i}, O | \lambda\right)}{P(O | \lambda)} \\ \gamma_{t}(i)=\frac{\alpha_{t}(i) \beta_{t}(i)}{P(O | \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)} \end{array}
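Continuing the sketch above, \gamma is just the elementwise product of alpha and beta, normalized at each time step:

```python
# gamma_t(i) = alpha_t(i) * beta_t(i) / sum_j alpha_t(j) * beta_t(j)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)  # each row now sums to 1
```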
The significance of \gamma: at each time t, choose the state that is individually most likely, i_{t}^{*}=\arg \max _{i} \gamma_{t}(i), which yields a state sequence I^{*}=\left\{i_{1}^{*}, i_{2}^{*}, \cdots, i_{T}^{*}\right\} that can be taken as the predicted result.
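In the running sketch, this pointwise prediction is a row-wise argmax over gamma. Note that it maximizes each \gamma_{t}(i) separately, so the resulting sequence need not be the single most likely path (which the Viterbi algorithm finds).

```python
# I* picks, independently at each t, the individually most likely state.
I_star = gamma.argmax(axis=1)  # shape (T,), entries in {0, ..., N-1}
```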
3. The probability of two states
The probability of being in state q_i at time t and in state q_j at time t+1, given O and \lambda:
\begin{array}{c} \xi_{t}(i, j)=P\left(i_{t}=q_{i}, i_{t+1}=q_{j} | O, \lambda\right) \\ =\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)}{P(O | \lambda)} \\ =\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)} \\ P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)=\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \end{array}
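In the sketch, \xi_{t}(i, j) is assembled from alpha, A, B, and beta, then normalized over all (i, j) pairs:

```python
# xi[t, i, j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda), for t = 0 .. T-2.
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * B[:, O[t + 1]][None, :] * beta[t + 1][None, :]
    xi[t] = num / num.sum()  # the denominator equals P(O | lambda)
```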
4. Expectations
The expected number of times state i occurs, given observation O:
\sum_{t=1}^{T} \gamma_{t}(i)
The expected number of transitions from state i to state j, given observation O:
\sum_{t=1}^{T-1} \xi_{t}(i, j)
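Both expectations are plain sums over the gamma and xi arrays from the sketch:

```python
expected_occupancy = gamma.sum(axis=0)   # sum_{t=1}^{T} gamma_t(i), per state i
expected_transitions = xi.sum(axis=0)    # sum_{t=1}^{T-1} xi_t(i, j), per pair (i, j)
```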
5. Learning algorithms
If the training data contain both the observation sequences and the state sequences, learning the HMM is straightforward supervised learning; if the training data contain only observation sequences, the HMM must be learned with the EM algorithm, which is unsupervised learning.
Suppose the training data consist of S observation sequences of equal length together with their corresponding state sequences, \left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \ldots,\left(O_{S}, I_{S}\right)\right\}. Then, by the Bernoulli law of large numbers ("the limit of a frequency is the probability"), the HMM parameters can be estimated directly from counts.
(1) Supervised learning (here |\cdot| denotes a count over the training data):
Initial probability:
\hat{\pi}_{i}=\frac{\left|q_{i}\right|}{\sum_{i}\left|q_{i}\right|}
Transition probability:
\hat{a}_{i j}=\frac{\left|q_{i j}\right|}{\sum_{j=1}^{N}\left|q_{i j}\right|}
Observation probability:
\hat{b}_{i k}=\frac{\left|s_{i k}\right|}{\sum_{k=1}^{M}\left|s_{i k}\right|}
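A minimal counting sketch of these three estimates, on hypothetical toy training pairs (state indices in I, observation indices in O; both the data and the sizes N, M are made up for illustration):

```python
import numpy as np

# Hypothetical toy training data: S pairs (state sequence, observation sequence).
data = [([0, 0, 1, 1], [0, 1, 1, 0]),
        ([1, 0, 0, 1], [1, 0, 0, 1])]
N, M = 2, 2                           # number of states / observation symbols

pi_hat = np.zeros(N)
a_hat = np.zeros((N, N))
b_hat = np.zeros((N, M))
for I, O in data:
    pi_hat[I[0]] += 1                 # count initial states
    for t in range(len(I) - 1):
        a_hat[I[t], I[t + 1]] += 1    # count q_i -> q_j transitions
    for q, o in zip(I, O):
        b_hat[q, o] += 1              # count symbol o emitted from state q

pi_hat /= pi_hat.sum()                # frequencies -> probabilities
a_hat /= a_hat.sum(axis=1, keepdims=True)
b_hat /= b_hat.sum(axis=1, keepdims=True)
```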
(2) The Baum-Welch algorithm
Write all the observed data as O=\left(o_{1}, o_{2}, \ldots, o_{T}\right) and all the hidden data as I=\left(i_{1}, i_{2}, \ldots, i_{T}\right). The complete data are (O, I)=\left(o_{1}, o_{2}, \ldots, o_{T}, i_{1}, i_{2}, \ldots, i_{T}\right), and the complete-data log-likelihood is \ln P(O, I | \lambda).
Let \bar{\lambda} be the current estimate of the HMM parameters and \lambda the parameters to be optimized.
\begin{aligned} Q(\lambda, \bar{\lambda}) &=\sum_{I}(\ln P(O, I | \lambda)) P(I | O, \bar{\lambda})\\ &=\sum_{I} \ln P(O, I | \lambda) \frac{P(O, I | \bar{\lambda})}{P(O | \bar{\lambda})}\\ &\propto \sum_{I} \ln P(O, I | \lambda) P(O, I | \bar{\lambda}) \end{aligned}
The last step drops P(O | \bar{\lambda}) because it is a constant with respect to \lambda and does not affect the maximization.
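Maximizing Q over \lambda yields the standard Baum-Welch reestimation formulas in terms of \gamma and \xi. Below is a sketch of one M-step update, reusing gamma, xi, O, and B from the earlier sketch; the update formulas are the standard ones, not derived in the text above.

```python
# M-step: reestimate parameters from the E-step quantities gamma and xi.
pi_new = gamma[0]                                          # pi_i = gamma_1(i)
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # expected transitions / visits
B_new = np.zeros_like(B)
obs = np.asarray(O)
for k in range(B.shape[1]):
    # expected emissions of symbol k from state i / expected visits to state i
    B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
```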
The EM procedure: alternate the E-step (computing \gamma and \xi under the current parameters \bar{\lambda}) with the M-step above (reestimating \pi, A, and B) until the likelihood converges.