Hidden Markov Model

Model Definition

A Hidden Markov Model (HMM) contains three sets of parameters:

Transition matrix $A_{ij}\in \mathbb{R}^{N\times N}$,

Emission matrix $B_{jk}\in \mathbb{R}^{N\times M}$,

Initial probability distribution $\pi_i\in \mathbb{R}^{N\times 1}$,

where $N$ denotes the number of hidden states $z$, and $M$ denotes the number of observable states $x$.

In an HMM, we model the sequence as the result of a series of observations of a sequence of hidden states. Throughout the process, transitions among hidden states are modeled by the transition matrix $A$ and follow the Markov chain property. When the hidden state is $z_j$, there is a probability $B_{jk}$ of observing the $k$-th observable state. The initial distribution of the hidden states is described by $\pi$.
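For concreteness, the three parameter sets can be stored as NumPy arrays whose rows are probability distributions. The sketch below uses the example values from the experiment at the end of this post:

    import numpy as np

    # Example parameters (same values as the experiment below): N = 2 hidden states, M = 4 observable states
    A = np.array([[0.7, 0.3],               # A[i, j] = P(z_{t+1} = j | z_t = i); rows sum to 1
                  [0.2, 0.8]])
    B = np.array([[0.4, 0.1, 0.4, 0.1],     # B[j, k] = P(x_t = k | z_t = j); rows sum to 1
                  [0.1, 0.4, 0.1, 0.4]])
    pi = np.array([0.5, 0.5])               # pi[i] = P(z_1 = i)

    assert np.allclose(A.sum(axis=1), 1.0)
    assert np.allclose(B.sum(axis=1), 1.0)
    assert np.isclose(pi.sum(), 1.0)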

Maths and Viterbi Algorithm

A common question raised in the setting of an HMM is: given a series of observations, what is the most likely sequence of hidden states? To answer this question, we first recall Bayes' rule, i.e.

$$\mathcal{P}(z|x) = \frac{\mathcal{P}(x, z)}{\mathcal{P}(x)} \propto \mathcal{P}(x, z) = \mathcal{P}(x|z)\,\mathcal{P}(z)$$

Therefore, since we want a maximum a posteriori (MAP) estimate, we need to maximize the R.H.S. Based on the factorization rules of probabilistic graphical models, for an HMM we have

$$\mathcal{P}(z) = \pi_{z_1}\prod_{i=1}^{T-1} A_{z_i z_{i+1}},$$

where $T$ is the length of the sequence, and

$$\mathcal{P}(x|z) = \prod_{i=1}^{T} B_{z_i x_i}.$$

Thus, our optimization goal is

$$\max_z\ \pi_{z_1}\prod_{i=1}^{T-1} A_{z_i z_{i+1}} \prod_{i=1}^{T} B_{z_i x_i}.$$

With some insight, this optimization can be solved using dynamic programming (DP), since it satisfies the two conditions of DP, i.e. optimal substructure and overlapping subproblems.
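To make the objective concrete before deriving the recursion, here is a brute-force sketch (assuming the NumPy arrays A, B, pi from above; the helper name brute_force_map is my own) that simply enumerates all $N^T$ hidden sequences. It is only feasible for tiny $T$, which is exactly what motivates the DP solution:

    import itertools
    import numpy as np

    def brute_force_map(A, B, pi, O):
        """Return the most likely hidden sequence by exhaustive enumeration (tiny T only)."""
        N, T = A.shape[0], len(O)
        best_prob, best_z = -1.0, None
        for z in itertools.product(range(N), repeat=T):
            # joint probability P(x, z) = pi_{z_1} * prod_i A_{z_i z_{i+1}} * prod_i B_{z_i x_i}
            prob = pi[z[0]] * B[z[0], O[0]]
            for t in range(1, T):
                prob *= A[z[t-1], z[t]] * B[z[t], O[t]]
            if prob > best_prob:
                best_prob, best_z = prob, z
        return list(best_z), best_prob

On the example at the end of this post, this returns the same sequence as the Viterbi algorithm, since both maximize the same joint probability.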

Consider the prefix of the whole sequence of length $t > 1$; we can rewrite the subproblem as

$$\max_z\ \pi_{z_1}\prod_{i=1}^{t-1} A_{z_i z_{i+1}} \prod_{i=1}^{t} B_{z_i x_i} = \max_{z_t, z_{t-1}} \left\{ A_{z_{t-1} z_t} B_{z_t x_t} \max_{z_{1:t-1}} \left\{ \pi_{z_1}\prod_{i=1}^{t-2} A_{z_i z_{i+1}} \prod_{i=1}^{t-1} B_{z_i x_i} \right\} \right\}$$

Hence we obtain an iterative form of the optimization problem. For the sake of both numerical stability and compactness, we take the logarithm of the R.H.S. and get

$$\max_z\ \log\!\left(\pi_{z_1}\prod_{i=1}^{t-1} A_{z_i z_{i+1}} \prod_{i=1}^{t} B_{z_i x_i}\right) = \max_{z_t, z_{t-1}} \left\{ \log A_{z_{t-1} z_t} + \log B_{z_t x_t} + \max_{z_{1:t-1}} \left\{ \log \pi_{z_1} + \sum_{i=1}^{t-2} \log A_{z_i z_{i+1}} + \sum_{i=1}^{t-1} \log B_{z_i x_i} \right\} \right\}$$

This is an apposite form for solving the problem. The process of solving it iteratively in this way is known as the Viterbi algorithm.
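As a preview, a single step of this log-space recursion can be written in vectorized NumPy form. The helper below is only a sketch (viterbi_step is not part of the post's code), but it computes exactly the inner maximization above:

    import numpy as np

    def viterbi_step(H_prev, log_A, log_B, obs):
        """One DP step: H_prev[i] is the best log-likelihood of a prefix ending in state i at time t-1."""
        scores = H_prev[:, None] + log_A            # scores[i, j] = H_prev[i] + log A[i, j]
        H_t = scores.max(axis=0) + log_B[:, obs]    # best log-likelihood ending in each state j at time t
        best_prev = scores.argmax(axis=0)           # predecessor that attains the maximum for each j
        return H_t, best_prev

The implementation in the next section performs the same update with an explicit loop over the current state, which is equivalent but easier to follow.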

Implementation of Hidden Markov Model

We implement the basic behavior of an HMM, namely the generation of a sequence of hidden states and observations, and the Viterbi algorithm, which is the basic decoding solution for an HMM.

For the generation, all we need to do is sample from a series of multinoulli (categorical) distributions specified by the three parameters.

For the Viterbi algorithm, we utilize two matrices. One records the maximum log-likelihood of a hidden state sequence ending at time $t$ with $z_t = n$; we denote this matrix by $\mathcal{H}_{nt}\in\mathbb{R}^{N\times T}$. The other records, for each hidden state at time $t$, the precursor that attains the maximum log-likelihood at that time-state pair; we denote this matrix by $\mathcal{R}_{nt}\in\mathbb{R}^{N\times T}$.

Now we are ready to implement the algorithm.
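The two methods below are assumed to live in an HMM class. The post does not show its constructor, so here is a minimal hypothetical sketch of the state it would need to set up:

    import numpy as np

    class HMM:
        # Hypothetical constructor (not shown in the original post): it only prepares the
        # attributes that the generate() and viterbi() methods below rely on.
        def __init__(self, A, B, pi):
            self.A, self.B, self.pi = A, B, pi
            self.N, self.M = B.shape                  # number of hidden / observable states
            self.log_A = np.log(A)                    # precomputed logs for the Viterbi recursion
            self.log_B = np.log(B)
            self.log_pi = np.log(pi)
            self.sample, self.H, self.R, self.optimal = {}, {}, {}, {}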

Generation:

    def generate(self, T, series_idx):
        """Generate a hidden/observed sequence of length T and store it under series_idx."""
        self.sample[series_idx] = {'hidden': np.zeros(T, dtype=np.int32),
                                   'observed': np.zeros(T, dtype=np.int32)}
        # Draw the initial hidden state from pi, and the first observation from its emission row.
        cur_state = np.random.choice(np.arange(self.N), p=self.pi)
        self.sample[series_idx]['hidden'][0] = cur_state
        self.sample[series_idx]['observed'][0] = np.random.choice(np.arange(self.M), p=self.B[cur_state])
        for t in range(1, T):
            # Transition according to row A[cur_state], then emit according to row B[cur_state].
            cur_state = np.random.choice(np.arange(self.N), p=self.A[cur_state])
            self.sample[series_idx]['hidden'][t] = cur_state
            self.sample[series_idx]['observed'][t] = np.random.choice(np.arange(self.M), p=self.B[cur_state])

Viterbi:

    def viterbi(self, O, exp_idx):
        """Viterbi Algorithm

        Args:
            O (np.array): 1d array, length of T
            exp_idx (string): index of experiment
        """
        
        T = O.shape[0]
        # H holds log-likelihoods, so it must be a float array (int32 would truncate the values).
        self.H[exp_idx] = np.zeros((self.N, T), dtype=np.float64)
        self.R[exp_idx] = np.zeros((self.N, T), dtype=np.int32)
        self.optimal[exp_idx] = np.zeros(T, dtype=np.int32)

        # Initialization: log pi_i + log B_{i, O_0} for every hidden state i
        self.H[exp_idx][:, 0] = self.log_pi + self.log_B[:, O[0]]
        self.R[exp_idx][:, 0] = -1

        # Iteration: for each state j at time t, maximize over the predecessor state
        for t in range(1, T):
            for j in range(self.N):
                tmp = self.H[exp_idx][:, t-1] + self.log_A[:, j] + self.log_B[j, O[t]]
                self.H[exp_idx][j, t] = np.max(tmp)
                self.R[exp_idx][j, t] = np.argmax(tmp)
                
        print(f"Hidden State Matrix: \n{self.H[exp_idx]}")
        print(f"Route Matrix: \n{self.R[exp_idx]}")
                
        # Termination
        self.optimal[exp_idx][-1] = np.argmax(self.H[exp_idx][:, -1])
        for t in range(T-2, -1, -1):
            self.optimal[exp_idx][t] = self.R[exp_idx][self.optimal[exp_idx][t+1], t+1]
            
        print(f"Optimal Hidden State Sequence: {self.optimal[exp_idx]}")
            
        return self.optimal[exp_idx]

We consider a simple example, where

$$A = \begin{bmatrix} 0.7 & 0.3\\ 0.2 & 0.8 \end{bmatrix}, \quad B = \begin{bmatrix} 0.4 & 0.1 & 0.4 & 0.1\\ 0.1 & 0.4 & 0.1 & 0.4 \end{bmatrix}, \quad \pi = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}, \quad \mathrm{observation} = \begin{bmatrix} 0 & 1 & 3 \end{bmatrix}$$
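With the hypothetical constructor sketched earlier, the experiment can be reproduced roughly as follows (the exact driver script is not shown in the post; only the method names and attributes follow the code above):

    # A, B, pi are the NumPy arrays defined near the top of this post
    hmm = HMM(A, B, pi)                       # hypothetical constructor sketched before the methods
    hmm.generate(T=5, series_idx=0)           # sample a hidden/observed pair of sequences of length 5
    hmm.viterbi(np.array([0, 1, 3]), 'demo')  # decode the example observation sequence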

The following derivation is adapted from the course slides of Media and Cognition, EE, Tsinghua, so the variables use different letters. Here, $\delta$ denotes the likelihood, $\varphi$ stands for the precursor, $O$ stands for the observation, $P$ stands for the transition probability, and $q^*$ is the solution.

$$\begin{aligned}
& \delta_0(0)=\pi_0 b_0(O_0)=0.2 \\
& \delta_0(1)=\pi_1 b_1(O_0)=0.05 \\
& \varphi_0(0)=-1 \\
& \varphi_0(1)=-1 \\
& \delta_1(0)=\max\{\delta_0(0) P(S_0 \mid S_0),\ \delta_0(1) P(S_0 \mid S_1)\}\, b_0(O_1)=\max\{0.2 \times 0.7,\ 0.05 \times 0.2\} \times 0.1=0.014 \\
& \delta_1(1)=\max\{\delta_0(0) P(S_1 \mid S_0),\ \delta_0(1) P(S_1 \mid S_1)\}\, b_1(O_1)=\max\{0.2 \times 0.3,\ 0.05 \times 0.8\} \times 0.4=0.024 \\
& \varphi_1(0)=\arg\max\{\delta_0(0) P(S_0 \mid S_0),\ \delta_0(1) P(S_0 \mid S_1)\}=0 \\
& \varphi_1(1)=0 \\
& \delta_2(0)=\max\{\delta_1(0) P(S_0 \mid S_0),\ \delta_1(1) P(S_0 \mid S_1)\}\, b_0(O_2)=\max\{0.014 \times 0.7,\ 0.024 \times 0.2\} \times 0.1=0.00098 \\
& \delta_2(1)=\max\{\delta_1(0) P(S_1 \mid S_0),\ \delta_1(1) P(S_1 \mid S_1)\}\, b_1(O_2)=\max\{0.014 \times 0.3,\ 0.024 \times 0.8\} \times 0.4=0.00768 \\
& \varphi_2(0)=0 \\
& \varphi_2(1)=1 \\
& P^*=\max\{\delta_2(0),\ \delta_2(1)\}=\delta_2(1)=0.00768 \\
& q_2^*=1, \quad q_1^*=\varphi_2(1)=1, \quad q_0^*=\varphi_1(1)=0
\end{aligned}$$

So the optimal hidden state sequence is [0, 1, 1].
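As a quick sanity check (a sketch, not part of the original post), the same result can be confirmed by exhaustively scoring all $2^3$ hidden sequences with the brute-force helper sketched earlier:

    best_z, best_p = brute_force_map(A, B, pi, [0, 1, 3])
    print(best_z, best_p)   # expected: [0, 1, 1] with joint probability 0.00768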

Running the code, we get the following results.

====================
Hidden: [1 1 1 1 1]
--------------------
Observed: [1 1 2 1 3]
====================
Hidden State Matrix: 
[[0.2     0.014   0.00098]
 [0.05    0.024   0.00768]]
Route Matrix: 
[[-1  0  0]
 [-1  0  1]]
Optimal Hidden State Sequence: [0 1 1]

The first part (between the lines of "==") records the generated hidden sequence and the observed sequence. The second part shows the solving steps of the Viterbi algorithm, which agree with the previous calculation. This verifies the correctness of our program.
