【20210920】Getting Started with Hidden Markov Models (HMM)

The video lecture this post is based on (link)

First, be clear about what a sequence (series) is and what a set is.

Time-series model, a discrete dynamic model: the Hidden Markov Model.

$$P(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1) = P(X_t \mid X_{t-1}) \tag{1}$$

1. A Brief Introduction to Markov Processes


Once we know the sequence of hidden states, the observations are conditionally independent.

In the stock-market example, the numbers on the arrows are exactly the probabilities in Eq. (1).

To keep the notation consistent, from now on write the hidden states as $q$:

  • $p(q_t \mid q_{t-1})$ → the transition probability (in an HMM this is always discrete)
  • $p(y_t \mid q_t)$ → the emission/measurement probability (not necessarily discrete)

These two probabilities determine the HMM.

An application to speech is shown below (the phonetic symbols are the hidden variables):

[Figure: the HMM graphical model]

Once the hidden states are known, the observations are independent!


2. $\{A, B, \pi\}$

In an HMM, the transition probability is represented by a $k \times k$ matrix.


  • We use a $k \times k$ matrix $A$ to represent $p(q_t \mid q_{t-1})$.
  • Assuming $p(y_t \mid q_t)$ is discrete with $L$ possible observation symbols, we represent it with a $k \times L$ matrix $B$.

Now consider: is $\lambda = \{A, B\}$ alone enough to describe an HMM?
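To make the shapes concrete, here is a minimal NumPy sketch of such a pair $A, B$ for a toy 3-state stock model; the regime names and every number below are invented purely for illustration:

```python
import numpy as np

# Hypothetical toy model: k = 3 hidden market regimes, L = 3 observed moves.
states = ["bull", "bear", "flat"]          # hidden states q
obs_symbols = ["up", "down", "unchanged"]  # observation symbols y

# A[i, j] = p(q_t = j | q_{t-1} = i): the k x k transition matrix
A = np.array([[0.6, 0.2, 0.2],
              [0.5, 0.3, 0.2],
              [0.4, 0.1, 0.5]])

# B[j, l] = p(y_t = l | q_t = j): the k x L emission matrix
B = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

# Each row is a conditional distribution, so it must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```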

So think about it: how would you compute the probability of the stock observations $P(y_1 = \text{up}, y_2 = \text{up}, y_3 = \text{down})$?

We know that $P(x) = \int_{y} P(x, y)\, dy$, so the desired probability can be rewritten as:
$$
\begin{aligned}
P(y_1, y_2, y_3) &= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_1, y_2, y_3, q_1, q_2, q_3) \\
&= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_3 \mid y_1, y_2, q_1, q_2, q_3) \times p(y_1, y_2, q_1, q_2, q_3) \\
&= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_3 \mid q_3) \times p(y_1, y_2, q_1, q_2, q_3) \\
&= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_3 \mid q_3) \times p(q_3 \mid y_1, y_2, q_1, q_2) \times p(y_1, y_2, q_1, q_2) \\
&= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_3 \mid q_3) \times p(q_3 \mid q_2) \times p(y_1, y_2, q_1, q_2) \\
&= \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \sum_{q_3=1}^{k} p(y_3 \mid q_3) \times p(q_3 \mid q_2) \times p(y_2 \mid q_2) \times p(q_2 \mid q_1) \times p(y_1 \mid q_1) \times p(q_1)
\end{aligned}
$$
So one factor is still missing: $p(q_1)$. We need a distribution over the initial state, hence a third parameter $\pi$.
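Continuing the toy model above, here is a brute-force sketch of that final line: an invented initial distribution $\pi$ is added, and the triple sum is computed literally:

```python
# pi[i] = p(q_1 = i): the initial-state distribution (numbers invented).
pi = np.array([0.5, 0.2, 0.3])

def prob_up_up_down():
    """P(y1=up, y2=up, y3=down) by summing the factorized joint over all
    (q1, q2, q3), mirroring the last line of the derivation above."""
    k = len(pi)
    up, down = obs_symbols.index("up"), obs_symbols.index("down")
    total = 0.0
    for q1 in range(k):
        for q2 in range(k):
            for q3 in range(k):
                total += (pi[q1] * B[q1, up]
                          * A[q1, q2] * B[q2, up]
                          * A[q2, q3] * B[q3, down])
    return total

print(prob_up_up_down())
```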

The three main operations on an HMM are:

  • Evaluation: compute $p(Y \mid \lambda)$
  • Learning: $\lambda_{\text{MLE}} = \underset{\lambda}{\arg\max}\, p(Y \mid \lambda)$
  • Decoding: $\underset{Q}{\arg\max}\, p(Q \mid Y, \lambda)$
We start with Evaluation.

How is an HMM used? Take speech recognition as an example: say we record 50 people saying the same word, cat or dog, and fit one model per word:

$$\lambda_{\text{cat}} = \underset{\lambda}{\arg\max}\, \log P(y^{(1)}, y^{(2)}, \ldots \mid \lambda)$$

where $y^{(i)}$ denotes the $i$-th recording.
What do we do with these models? Recognition: given a recording we have never heard before, we evaluate which word's model assigns it the highest probability. That is the Evaluate operation.
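As a sketch of that recognition step (all names here, `word_models`, `likelihood`, `classify`, are hypothetical, not from the video): score the new recording under every word's $\lambda$ and take the argmax:

```python
import math

def classify(obs_seq, word_models, likelihood):
    """Pick the word whose HMM gives the recording the highest
    log-likelihood. `word_models` maps each word to its (pi, A, B);
    `likelihood` is any evaluator of p(Y | lambda), e.g. the forward
    procedure derived later in this post."""
    scores = {word: math.log(likelihood(*params, obs_seq))
              for word, params in word_models.items()}
    return max(scores, key=scores.get)
```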

Now look at Evaluation itself: given $\lambda$, score the observations:

$$P(y_1 \ldots y_T \mid \lambda) = \sum_{q_1=1}^{k} \sum_{q_2=1}^{k} \cdots \sum_{q_T=1}^{k} \underbrace{P(y_1 \ldots y_T, q_1 \ldots q_T)}$$

The usual way to compute this is:
$$
\begin{aligned}
p(Y \mid \lambda) &= \sum_{Q} p(Y, Q \mid \lambda) = \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} p(y_1, \ldots, y_T, q_1, \ldots, q_T \mid \lambda) \\
&= \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} p(q_1)\, p(y_1 \mid q_1)\, p(q_2 \mid q_1)\, p(y_2 \mid q_2) \cdots p(q_T \mid q_{T-1})\, p(y_T \mid q_T) \\
&= \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} \pi(q_1) \prod_{t=2}^{T} a_{q_{t-1}, q_t} \prod_{t=1}^{T} b_{q_t}(y_t)
\end{aligned}
$$

Here the transition probability is defined as the entry in row $i$, column $j$ of the transition matrix, $p(q_t = j \mid q_{t-1} = i) \equiv a_{i,j}$, and the measurement probability as $p(y_t \mid q_t = j) \equiv b_j(y_t)$. Note that there are $k^T$ possible state sequences $Q$, so we need a cheaper method.
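Spelled out in code (a sketch reusing the toy $\pi, A, B$ above), the naive evaluation literally enumerates all $k^T$ sequences:

```python
from itertools import product

def evaluate_brute_force(pi, A, B, obs):
    """p(Y | lambda) as the sum-of-products above, enumerating all k^T
    hidden-state sequences Q. Cost grows as k^T: fine for tiny T only."""
    k = len(pi)
    total = 0.0
    for Q in product(range(k), repeat=len(obs)):      # all k^T sequences
        p = pi[Q[0]] * B[Q[0], obs[0]]                # pi(q1) * b_{q1}(y1)
        for t in range(1, len(obs)):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], obs[t]]  # a and b factors
        total += p
    return total
```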

3. Introducing $\alpha$ and $\beta$ to Make Evaluation Tractable

The problem: the computation above is far too expensive! So we introduce two quantities, $\alpha$ and $\beta$.
Define the joint probability:

$$\alpha_i(t) = p(y_1, y_2, \ldots, y_t, q_t = i)$$
From this definition:

$$\alpha_i(1) = p(y_1, q_1 = i) = p(y_1 \mid q_1 = i) \times p(q_1 = i) = b_i(y_1) \cdot \pi_i$$

For $\alpha_j(2)$, note that the factor $p(y_2 \mid q_2 = j)$ does not depend on $i$ and can be pulled out of the sum:

$$
\begin{aligned}
\alpha_j(2) &= p(y_1, y_2, q_2 = j) \\
&= \sum_{i=1}^{k} p(y_1, y_2, q_1 = i, q_2 = j) \\
&= \sum_{i=1}^{k} p(y_2 \mid q_2 = j) \cdot p(q_2 = j \mid q_1 = i) \cdot \underbrace{p(y_1, q_1 = i)}_{\alpha_i(1)} \\
&= \underbrace{p(y_2 \mid q_2 = j)}_{b_j(y_2)} \sum_{i=1}^{k} \underbrace{p(q_2 = j \mid q_1 = i)}_{a_{i,j}}\, \alpha_i(1) \\
&= b_j(y_2) \sum_{i=1}^{k} a_{i,j}\, \alpha_i(1) \\
&\;\;\vdots \\
\alpha_j(t+1) &= \left[\sum_{i=1}^{k} \alpha_i(t)\, a_{i,j}\right] b_j(y_{t+1}) \\
&\;\;\vdots \\
\alpha_j(T) &= b_j(y_T) \left[\sum_{i=1}^{k} a_{i,j}\, \alpha_i(T-1)\right]
\end{aligned}
$$

Now there are only $k \times T$ quantities $\alpha_j(t)$ to compute (roughly $k^2 T$ operations) instead of a $k^T$-term sum. So far, then, we have defined the forward procedure:

$$\alpha_i(t) = p(y_1, y_2, \ldots, y_t, q_t = i \mid \lambda) \Longrightarrow p(Y \mid \lambda) = \sum_{i=1}^{k} \alpha_i(T)$$

That is, $\alpha_i(t)$ is the probability of the partial sequence $y_1, \ldots, y_t$ with the chain in state $i$ at time $t$.

The evaluation therefore costs only about $k^2 T$ operations instead of $k^T$:

$$P(y_1 \ldots y_T) = \sum_{j=1}^{k} \alpha_j(T)$$
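A minimal sketch of the forward procedure on the same toy parameters, checked against the exponential-time enumerator above:

```python
def evaluate_forward(pi, A, B, obs):
    """Forward procedure: alpha[j] holds p(y_1..y_t, q_t = j).
    Cost is O(k^2 T) instead of O(k^T)."""
    alpha = pi * B[:, obs[0]]               # alpha_i(1) = pi_i * b_i(y_1)
    for t in range(1, len(obs)):
        # alpha_j(t+1) = [sum_i alpha_i(t) * a_{i,j}] * b_j(y_{t+1})
        alpha = (alpha @ A) * B[:, obs[t]]
    return alpha.sum()                      # p(Y) = sum_j alpha_j(T)

# Sanity check against the brute-force version (tiny T only):
up, down = obs_symbols.index("up"), obs_symbols.index("down")
obs = [up, up, down]
assert np.isclose(evaluate_forward(pi, A, B, obs),
                  evaluate_brute_force(pi, A, B, obs))
```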
Everything above was Evaluation; next, how do we learn $\lambda$?

4. Parameter Learning with the EM Algorithm

The EM algorithm:

$$\theta^{(g+1)} = \underset{\theta}{\operatorname{argmax}} \int_{Q} \log P(Y, Q \mid \theta) \cdot P\left(Q \mid Y, \theta^{(g)}\right) dQ$$
Because $P(Q, Y \mid \theta^{(g)}) = P(Q \mid Y, \theta^{(g)}) \cdot P(Y \mid \theta^{(g)})$, and the last factor depends on neither $\theta$ nor $Q$, it is a constant whose presence does not change the argmax, so we may equivalently write:
$$\theta^{(g+1)} = \underset{\theta}{\operatorname{argmax}} \int_{Q} \log P(Y, Q \mid \theta) \cdot P\left(Q, Y \mid \theta^{(g)}\right) dQ$$
We already showed that:

$$p(Y \mid \lambda) = \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} \pi(q_1) \prod_{t=2}^{T} a_{q_{t-1}, q_t} \prod_{t=1}^{T} b_{q_t}(y_t)$$
so the update above becomes:
$$
\begin{aligned}
\theta^{(g+1)} &= \underset{\theta}{\operatorname{argmax}} \int_{Q} \log \left[\pi(q_1) \prod_{t=2}^{T} a_{q_{t-1}, q_t} \prod_{t=1}^{T} b_{q_t}(y_t)\right] \cdot P\left(Q, Y \mid \theta^{(g)}\right) dQ \\
&= \underset{\theta}{\operatorname{argmax}} \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} \left[\log \pi(q_1) + \sum_{t=2}^{T} \log a_{q_{t-1}, q_t} + \sum_{t=1}^{T} \log b_{q_t}(y_t)\right] \cdot P\left(Q, Y \mid \theta^{(g)}\right)
\end{aligned}
$$

Each of the three terms involves a different parameter, so each parameter can be learned separately from the observed data. For example, for $\pi$:
$$\mathcal{Q}^{\text{term 1}} = \sum_{q_1=1}^{k} \cdots \sum_{q_T=1}^{k} \ln \pi_{q_1}\, p\left(Q, Y \mid \lambda^{(g)}\right) = \sum_{i=1}^{k} \ln \pi_i\, p\left(q_1 = i, Y \mid \lambda^{(g)}\right)$$
Maximize subject to the constraint:

$$\arg \max \left(\mathcal{Q}^{\text{term 1}}\right) \text{ with } \sum_{i=1}^{k} \pi_i = 1$$
Using a Lagrange multiplier:
$$\mathbb{LM}^{\text{term 1}} = \sum_{i=1}^{k} \ln \pi_i\, p\left(q_1 = i, Y \mid \lambda^{(g)}\right) + \tau\left(\sum_{i=1}^{k} \pi_i - 1\right)$$
Differentiate with respect to $\pi_i$ and $\tau$, and set both derivatives to zero:
$$\frac{\partial \mathbb{LM}^{\text{term 1}}}{\partial \pi_i} = \frac{p\left(q_1 = i, Y \mid \lambda^{(g)}\right)}{\pi_i} + \tau = 0, \qquad \frac{\partial \mathbb{LM}^{\text{term 1}}}{\partial \tau} = \sum_{i=1}^{k} \pi_i - 1 = 0$$
$$p\left(q_1 = i, Y \mid \lambda^{(g)}\right) = -\tau\, \pi_i$$
To bring in the constraint, sum both sides over $i$:
$$\sum_{i=1}^{k} p\left(q_1 = i, Y \mid \lambda^{(g)}\right) = -\tau \sum_{i=1}^{k} \pi_i = -\tau$$
Substituting back gives:
$$\pi_i = \frac{p\left(q_1 = i, Y \mid \lambda^{(g)}\right)}{-\tau} \Longrightarrow \pi_i = \frac{p\left(q_1 = i, Y \mid \lambda^{(g)}\right)}{\sum_{j=1}^{k} p\left(q_1 = j, Y \mid \lambda^{(g)}\right)}$$
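As a sketch of this update on the toy model: the code below computes $p(q_1 = i, Y \mid \lambda^{(g)})$ by brute-force enumeration with $q_1$ held fixed, just to mirror the formula; a practical implementation would use the forward/backward quantities $\alpha, \beta$ instead:

```python
def pi_update(pi, A, B, obs):
    """One EM update for pi alone: new pi_i is proportional to
    p(q_1 = i, Y | lambda^(g)), computed here by enumeration for clarity."""
    k = len(pi)
    joint = np.zeros(k)                        # joint[i] = p(q_1 = i, Y)
    for Q in product(range(k), repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], obs[t]]
        joint[Q[0]] += p                       # accumulate by initial state
    return joint / joint.sum()                 # the Lagrange-multiplier result

print(pi_update(pi, A, B, obs))                # updated pi on the toy data
```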

Summary

This more or less got me through the door; I was mainly following the line of reasoning. The video did not cover the third of the three classic problems, which should require recursion with the Viterbi algorithm; I don't know that yet. And I still don't really understand how the EM formulas are derived.
