Hidden Markov Model

Model Definition

A Hidden Markov Model (HMM) contains three sets of parameters:

Transition matrix $A_{ij}\in \mathbb{R}^{N\times N}$,

Emission matrix $B_{jk}\in \mathbb{R}^{N\times M}$,

Initial probability distribution $\pi_i\in \mathbb{R}^{N\times 1}$,

where $N$ denotes the number of hidden states $z$, and $M$ denotes the number of observable states $x$.

In an HMM, we model the sequence as the result of a series of observations of a sequence of hidden states. Throughout the process, transitions among hidden states are modeled by the transition matrix $A$ and follow the Markov chain property. When the hidden state is $z_j$, there is a probability $B_{jk}$ of observing the $k$-th observable state. The initial distribution of the hidden states is described by $\pi$.
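For concreteness, the three parameter sets can be stored as NumPy arrays whose rows are probability distributions. The sketch below uses the example values from the experiment at the end of this post:

    import numpy as np

    # Example parameters (same values as the experiment below): N = 2 hidden states, M = 4 observable states
    A = np.array([[0.7, 0.3],               # A[i, j] = P(z_{t+1} = j | z_t = i); rows sum to 1
                  [0.2, 0.8]])
    B = np.array([[0.4, 0.1, 0.4, 0.1],     # B[j, k] = P(x_t = k | z_t = j); rows sum to 1
                  [0.1, 0.4, 0.1, 0.4]])
    pi = np.array([0.5, 0.5])               # pi[i] = P(z_1 = i)

    assert np.allclose(A.sum(axis=1), 1.0)
    assert np.allclose(B.sum(axis=1), 1.0)
    assert np.isclose(pi.sum(), 1.0)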

Maths and Viterbi Algorithm

A common question raised in the setting of an HMM is: given a series of observations, what is the most likely sequence of hidden states? To answer this question, we first recall Bayes' rule, i.e.

$$\mathcal{P}(z|x) = \frac{\mathcal{P}(x, z)}{\mathcal{P}(x)} \propto \mathcal{P}(x, z) = \mathcal{P}(x|z)\,\mathcal{P}(z)$$

Therefore, since we want a maximum a posteriori (MAP) estimate, we need to maximize the R.H.S. Based on the factorization rules of probabilistic graphical models, for an HMM we have

$$\mathcal{P}(z) = \pi_{z_1}\prod_{i=1}^{T-1} A_{z_i z_{i+1}},$$

where $T$ is the length of the sequence, and

$$\mathcal{P}(x|z) = \prod_{i=1}^{T} B_{z_i x_i}.$$

Thus, our optimization goal is

$$\max_z\ \pi_{z_1}\prod_{i=1}^{T-1} A_{z_i z_{i+1}} \prod_{i=1}^{T} B_{z_i x_i}.$$

With some insight, this optimization can be solved using dynamic programming (DP), since it satisfies the two conditions of DP, i.e. optimal substructure and overlapping subproblems.
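To make the objective concrete before deriving the recursion, here is a brute-force sketch (assuming the NumPy arrays A, B, pi from above; the helper name brute_force_map is my own) that simply enumerates all $N^T$ hidden sequences. It is only feasible for tiny $T$, which is exactly what motivates the DP solution:

    import itertools
    import numpy as np

    def brute_force_map(A, B, pi, O):
        """Return the most likely hidden sequence by exhaustive enumeration (tiny T only)."""
        N, T = A.shape[0], len(O)
        best_prob, best_z = -1.0, None
        for z in itertools.product(range(N), repeat=T):
            # joint probability P(x, z) = pi_{z_1} * prod_i A_{z_i z_{i+1}} * prod_i B_{z_i x_i}
            prob = pi[z[0]] * B[z[0], O[0]]
            for t in range(1, T):
                prob *= A[z[t-1], z[t]] * B[z[t], O[t]]
            if prob > best_prob:
                best_prob, best_z = prob, z
        return list(best_z), best_prob

On the example at the end of this post, this returns the same sequence as the Viterbi algorithm, since both maximize the same joint probability.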

Consider the prefix of the whole sequence of length $t > 1$; we can rewrite the subproblem as

$$\max_z\ \pi_{z_1}\prod_{i=1}^{t-1} A_{z_i z_{i+1}} \prod_{i=1}^{t} B_{z_i x_i} = \max_{z_t, z_{t-1}} \left\{ A_{z_{t-1} z_t} B_{z_t x_t} \max_{z_{1:t-1}} \left\{ \pi_{z_1}\prod_{i=1}^{t-2} A_{z_i z_{i+1}} \prod_{i=1}^{t-1} B_{z_i x_i} \right\} \right\}$$

Hence we obtain an iterative form of the optimization problem. For the sake of both numerical stability and compactness, we take the logarithm of the R.H.S. and get

$$\max_z\ \log\!\left(\pi_{z_1}\prod_{i=1}^{t-1} A_{z_i z_{i+1}} \prod_{i=1}^{t} B_{z_i x_i}\right) = \max_{z_t, z_{t-1}} \left\{ \log A_{z_{t-1} z_t} + \log B_{z_t x_t} + \max_{z_{1:t-1}} \left\{ \log \pi_{z_1} + \sum_{i=1}^{t-2} \log A_{z_i z_{i+1}} + \sum_{i=1}^{t-1} \log B_{z_i x_i} \right\} \right\}$$

This is an apposite form for solving the problem. The process of solving it iteratively in this way is known as the Viterbi algorithm.
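As a preview, a single step of this log-space recursion can be written in vectorized NumPy form. The helper below is only a sketch (viterbi_step is not part of the post's code), but it computes exactly the inner maximization above:

    import numpy as np

    def viterbi_step(H_prev, log_A, log_B, obs):
        """One DP step: H_prev[i] is the best log-likelihood of a prefix ending in state i at time t-1."""
        scores = H_prev[:, None] + log_A            # scores[i, j] = H_prev[i] + log A[i, j]
        H_t = scores.max(axis=0) + log_B[:, obs]    # best log-likelihood ending in each state j at time t
        best_prev = scores.argmax(axis=0)           # predecessor that attains the maximum for each j
        return H_t, best_prev

The implementation in the next section performs the same update with an explicit loop over the current state, which is equivalent but easier to follow.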

Implementation of Hidden Markov Model

We implement the basic behavior of an HMM, namely the generation of a sequence of hidden states and observations, and the Viterbi algorithm, which is the basic decoding solution for an HMM.

For the generation, all we need to do is sample from a series of multinoulli (categorical) distributions specified by the three parameters.

For the Viterbi algorithm, we utilize two matrices. One records the maximum log-likelihood of a hidden state sequence ending at time $t$ with $z_t = n$; we denote this matrix by $\mathcal{H}_{nt}\in\mathbb{R}^{N\times T}$. The other records, for each hidden state at time $t$, the precursor that attains the maximum log-likelihood at that time-state pair; we denote this matrix by $\mathcal{R}_{nt}\in\mathbb{R}^{N\times T}$.

Now we are ready to implement the algorithm.
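The two methods below are assumed to live in an HMM class. The post does not show its constructor, so here is a minimal hypothetical sketch of the state it would need to set up:

    import numpy as np

    class HMM:
        # Hypothetical constructor (not shown in the original post): it only prepares the
        # attributes that the generate() and viterbi() methods below rely on.
        def __init__(self, A, B, pi):
            self.A, self.B, self.pi = A, B, pi
            self.N, self.M = B.shape                  # number of hidden / observable states
            self.log_A = np.log(A)                    # precomputed logs for the Viterbi recursion
            self.log_B = np.log(B)
            self.log_pi = np.log(pi)
            self.sample, self.H, self.R, self.optimal = {}, {}, {}, {}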

Generation:

    def generate(self, T, series_idx):
        """Generate a hidden/observed sequence of length T and store it under series_idx."""
        self.sample[series_idx] = {'hidden': np.zeros(T, dtype=np.int32),
                                   'observed': np.zeros(T, dtype=np.int32)}
        # Draw the initial hidden state from pi, and the first observation from its emission row.
        cur_state = np.random.choice(np.arange(self.N), p=self.pi)
        self.sample[series_idx]['hidden'][0] = cur_state
        self.sample[series_idx]['observed'][0] = np.random.choice(np.arange(self.M), p=self.B[cur_state])
        for t in range(1, T):
            # Transition according to row A[cur_state], then emit according to row B[cur_state].
            cur_state = np.random.choice(np.arange(self.N), p=self.A[cur_state])
            self.sample[series_idx]['hidden'][t] = cur_state
            self.sample[series_idx]['observed'][t] = np.random.choice(np.arange(self.M), p=self.B[cur_state])

Viterbi:

    def viterbi(self, O, exp_idx):
        """Viterbi Algorithm

        Args:
            O (np.array): 1d array, length of T
            exp_idx (string): index of experiment
        """
        
        T = O.shape[0]
        # H holds log-likelihoods, so it must be a float array (int32 would truncate the values).
        self.H[exp_idx] = np.zeros((self.N, T), dtype=np.float64)
        self.R[exp_idx] = np.zeros((self.N, T), dtype=np.int32)
        self.optimal[exp_idx] = np.zeros(T, dtype=np.int32)

        # Initialization: log pi_i + log B_{i, O_0} for every hidden state i
        self.H[exp_idx][:, 0] = self.log_pi + self.log_B[:, O[0]]
        self.R[exp_idx][:, 0] = -1

        # Iteration: for each state j at time t, maximize over the predecessor state
        for t in range(1, T):
            for j in range(self.N):
                tmp = self.H[exp_idx][:, t-1] + self.log_A[:, j] + self.log_B[j, O[t]]
                self.H[exp_idx][j, t] = np.max(tmp)
                self.R[exp_idx][j, t] = np.argmax(tmp)
                
        print(f"Hidden State Matrix: \n{self.H[exp_idx]}")
        print(f"Route Matrix: \n{self.R[exp_idx]}")
                
        # Termination
        self.optimal[exp_idx][-1] = np.argmax(self.H[exp_idx][:, -1])
        for t in range(T-2, -1, -1):
            self.optimal[exp_idx][t] = self.R[exp_idx][self.optimal[exp_idx][t+1], t+1]
            
        print(f"Optimal Hidden State Sequence: {self.optimal[exp_idx]}")
            
        return self.optimal[exp_idx]

We consider a simple example, where

$$A = \begin{bmatrix} 0.7 & 0.3\\ 0.2 & 0.8 \end{bmatrix}, \quad B = \begin{bmatrix} 0.4 & 0.1 & 0.4 & 0.1\\ 0.1 & 0.4 & 0.1 & 0.4 \end{bmatrix}, \quad \pi = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}, \quad \mathrm{observation} = \begin{bmatrix} 0 & 1 & 3 \end{bmatrix}$$
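With the hypothetical constructor sketched earlier, the experiment can be reproduced roughly as follows (the exact driver script is not shown in the post; only the method names and attributes follow the code above):

    # A, B, pi are the NumPy arrays defined near the top of this post
    hmm = HMM(A, B, pi)                       # hypothetical constructor sketched before the methods
    hmm.generate(T=5, series_idx=0)           # sample a hidden/observed pair of sequences of length 5
    hmm.viterbi(np.array([0, 1, 3]), 'demo')  # decode the example observation sequence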

The following derivation is adapted from the course slides of Media and Cognition, EE, Tsinghua, so the variables use different letters. Here, $\delta$ denotes the likelihood, $\varphi$ stands for the precursor, $O$ stands for the observation, $P$ stands for the transition probability, and $q^*$ is the solution.

$$\begin{aligned}
& \delta_0(0)=\pi_0 b_0(O_0)=0.2 \\
& \delta_0(1)=\pi_1 b_1(O_0)=0.05 \\
& \varphi_0(0)=-1 \\
& \varphi_0(1)=-1 \\
& \delta_1(0)=\max\{\delta_0(0) P(S_0 \mid S_0),\ \delta_0(1) P(S_0 \mid S_1)\}\, b_0(O_1)=\max\{0.2 \times 0.7,\ 0.05 \times 0.2\} \times 0.1=0.014 \\
& \delta_1(1)=\max\{\delta_0(0) P(S_1 \mid S_0),\ \delta_0(1) P(S_1 \mid S_1)\}\, b_1(O_1)=\max\{0.2 \times 0.3,\ 0.05 \times 0.8\} \times 0.4=0.024 \\
& \varphi_1(0)=\arg\max\{\delta_0(0) P(S_0 \mid S_0),\ \delta_0(1) P(S_0 \mid S_1)\}=0 \\
& \varphi_1(1)=0 \\
& \delta_2(0)=\max\{\delta_1(0) P(S_0 \mid S_0),\ \delta_1(1) P(S_0 \mid S_1)\}\, b_0(O_2)=\max\{0.014 \times 0.7,\ 0.024 \times 0.2\} \times 0.1=0.00098 \\
& \delta_2(1)=\max\{\delta_1(0) P(S_1 \mid S_0),\ \delta_1(1) P(S_1 \mid S_1)\}\, b_1(O_2)=\max\{0.014 \times 0.3,\ 0.024 \times 0.8\} \times 0.4=0.00768 \\
& \varphi_2(0)=0 \\
& \varphi_2(1)=1 \\
& P^*=\max\{\delta_2(0),\ \delta_2(1)\}=\delta_2(1)=0.00768 \\
& q_2^*=1, \quad q_1^*=\varphi_2(1)=1, \quad q_0^*=\varphi_1(1)=0
\end{aligned}$$

So the optimal hidden state sequence is [0, 1, 1].
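As a quick sanity check (a sketch, not part of the original post), the same result can be confirmed by exhaustively scoring all $2^3$ hidden sequences with the brute-force helper sketched earlier:

    best_z, best_p = brute_force_map(A, B, pi, [0, 1, 3])
    print(best_z, best_p)   # expected: [0, 1, 1] with joint probability 0.00768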

Running the code, we get the following results.

====================
Hidden: [1 1 1 1 1]
--------------------
Observed: [1 1 2 1 3]
====================
Hidden State Matrix: 
[[0.2     0.014   0.00098]
 [0.05    0.024   0.00768]]
Route Matrix: 
[[-1  0  0]
 [-1  0  1]]
Optimal Hidden State Sequence: [0 1 1]

The first part (between the lines of "==") records the generated hidden sequence and the observed sequence. The second part shows the solving steps of the Viterbi algorithm, which agree with the previous calculation. This verifies the correctness of our program.
