The previous post did not introduce this model systematically; this post covers it in detail.
1. Definition:
The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer, so the states are "hidden" from the outside; hence the name Hidden Markov Model.
In other words, a hidden Markov model consists of a finite number of states, each occurring with some probability. Transitions between states are governed by a table of transition probabilities, and each state can produce an observed output. Only the observed outputs are visible; the underlying Markov chain of states is not, which is why the model is called hidden. The model is specified by the following elements:
- The number of states of the model, $N$.
- The number of observation symbols in the alphabet, $M$. If the observations are continuous then $M$ is infinite.
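- A set of state transition probabilities $A = \{a_{ij}\}$,
$$a_{ij} = p\{q_{t+1} = j \mid q_t = i\}, \quad 1 \le i, j \le N,$$
where $q_t$ denotes the current state.
Transition probabilities should satisfy the normal stochastic constraints, $a_{ij} \ge 0$ for $1 \le i, j \le N$, and $\sum_{j=1}^{N} a_{ij} = 1$ for $1 \le i \le N$.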
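- A probability distribution in each of the states, $B = \{b_j(k)\}$,
$$b_j(k) = p\{o_t = v_k \mid q_t = j\}, \quad 1 \le j \le N, \; 1 \le k \le M,$$
where $v_k$ denotes the $k$-th observation symbol in the alphabet, and $o_t$ the current parameter vector.
The following stochastic constraints must be satisfied: $b_j(k) \ge 0$ for $1 \le j \le N$, $1 \le k \le M$, and $\sum_{k=1}^{M} b_j(k) = 1$ for $1 \le j \le N$.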
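If the observations are continuous then we have to use a continuous probability density function instead of a set of discrete probabilities. In this case we specify the parameters of the probability density function. Usually the probability density is approximated by a weighted sum of $M$ Gaussian distributions $\mathcal{N}$,
$$b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(\mu_{jm}, \Sigma_{jm}, o_t),$$
where $c_{jm}$ are the weighting coefficients, $\mu_{jm}$ the mean vectors, and $\Sigma_{jm}$ the covariance matrices.
The coefficients $c_{jm}$ should satisfy the stochastic constraints $c_{jm} \ge 0$ for $1 \le j \le N$, $1 \le m \le M$, and $\sum_{m=1}^{M} c_{jm} = 1$ for $1 \le j \le N$.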
- The initial state distribution, $\pi = \{\pi_i\}$, where
$$\pi_i = p\{q_1 = i\}, \quad 1 \le i \le N.$$
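Therefore we can use the compact notation
$$\lambda = (A, B, \pi)$$
to denote an HMM with discrete probability distributions, while
$$\lambda = (A, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)$$
denotes one with continuous densities.
In short, a hidden Markov model is written as $\lambda = (A, B, \pi)$.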
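As a concrete illustration of the discrete case, the sketch below stores $A$, $B$ and $\pi$ as nested Python lists and checks the stochastic constraints (non-negative entries, rows summing to 1). The class name `DiscreteHMM`, the method `validate`, and the numeric values are all assumptions made up for this example; they do not come from the tutorial.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    """Hypothetical container for lambda = (A, B, pi) with discrete outputs."""
    A: List[List[float]]  # A[i][j] = p{q_{t+1} = j | q_t = i}, N x N
    B: List[List[float]]  # B[j][k] = p{o_t = v_k | q_t = j}, N x M
    pi: List[float]       # pi[i]   = p{q_1 = i}, length N

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if a shape or stochastic constraint is violated."""
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B must each have N rows")
        m = len(self.B[0])
        checks = (("A", self.A, n), ("B", self.B, m), ("pi", [self.pi], n))
        for name, rows, width in checks:
            for row in rows:
                if len(row) != width:
                    raise ValueError(f"{name}: wrong row length")
                if any(p < 0 for p in row):
                    raise ValueError(f"{name}: negative probability")
                if abs(sum(row) - 1.0) > tol:
                    raise ValueError(f"{name}: row does not sum to 1")


# Toy model with N = 2 states and M = 3 symbols (values are illustrative).
hmm = DiscreteHMM(
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.6, 0.4],
)
hmm.validate()
```

Observation symbols are represented by their index $k$ (0-based here), so `B[j][k]` plays the role of $b_j(k)$.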
2. Some assumptions:
For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.
(1) The Markov assumption
The Markov assumption on states: each state depends only on the state immediately before it.
As given in the definition of HMMs, transition probabilities are defined as
$$a_{ij} = p\{q_{t+1} = j \mid q_t = i\}.$$
In other words, it is assumed that the next state depends only on the current state. This is called the Markov assumption, and the resulting model is a first-order HMM.
In general, however, the next state may depend on the past $k$ states. Such a model, called a $k$-th order HMM, is obtained by defining the transition probabilities as
$$a_{i_1 i_2 \ldots i_k j} = p\{q_{t+1} = j \mid q_t = i_1, q_{t-1} = i_2, \ldots, q_{t-k+1} = i_k\}, \quad 1 \le i_1, i_2, \ldots, i_k, j \le N.$$
A higher-order HMM has higher complexity. First-order HMMs are the most common, although some attempts have been made to use higher-order HMMs too.
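(2) The stationarity assumption
The stationarity assumption: state transitions do not depend on time.
Here it is assumed that state transition probabilities are independent of the actual time at which the transitions take place. Mathematically,
$$p\{q_{t_1+1} = j \mid q_{t_1} = i\} = p\{q_{t_2+1} = j \mid q_{t_2} = i\}$$
for any $t_1$ and $t_2$.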
(3) The output independence assumption
The probability of emitting an observation from the current state does not depend on the observations already seen, so the observation sequence factors into independent steps.
This is the assumption that the current output (observation) is statistically independent of the previous outputs (observations). We can formulate this assumption mathematically by considering a sequence of observations,
$$O = o_1, o_2, \ldots, o_T.$$
Then, according to the assumption, for an HMM $\lambda$,
$$p\{O \mid q_1, q_2, \ldots, q_T, \lambda\} = \prod_{t=1}^{T} p(o_t \mid q_t, \lambda).$$
However, unlike the other two, this assumption has very limited validity. In some cases it is not a fair approximation, and it then becomes a severe weakness of HMMs.
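Taken together, the three assumptions let the joint probability of one state path $Q = q_1, \ldots, q_T$ and one observation sequence $O$ factor as $\pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)$. The sketch below evaluates that product for a toy discrete model; the function name `joint_probability` and the numbers are assumptions chosen for illustration, with states and symbols written as 0-based indices.

```python
def joint_probability(A, B, pi, states, obs):
    """Return p{O, Q | lambda} for one state path and one observation sequence.

    Uses the first-order Markov assumption (A[prev][cur]), stationarity
    (the same A at every t) and output independence (B[cur][symbol]).
    """
    if not obs or len(states) != len(obs):
        raise ValueError("states and obs must be non-empty and equally long")
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


# Illustrative 2-state, 3-symbol model (values are made up for the example).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]

p_joint = joint_probability(A, B, pi, states=[0, 0, 1], obs=[0, 1, 2])
# 0.6*0.5 * 0.7*0.4 * 0.3*0.6 = 0.01512
```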
3. The three problems to solve:
Once we have an HMM, there are three problems of interest.
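(1) The Evaluation Problem
Compute the probability that a given observation sequence occurs under the model.
Given an HMM $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the probability that the observations are generated by the model, $p\{O \mid \lambda\}$?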
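(2) The Decoding Problem
Given the observed sequence, find the hidden state sequence most likely to correspond to it; this is the decoding problem.
Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the most likely state sequence in the model that produced the observations?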
(3) The Learning Problem
How to adjust the model so that the probability of the observed sequence is maximized.
Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, how should we adjust the model parameters $\{A, B, \pi\}$ in order to maximize $p\{O \mid \lambda\}$?
The evaluation problem can be used for isolated (word) recognition. The decoding problem is related to continuous recognition as well as to segmentation. The learning problem must be solved if we want to train an HMM for subsequent use in recognition tasks.
4. The evaluation problem (probability of an observation sequence):
We have a model $\lambda = (A, B, \pi)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, and $p\{O \mid \lambda\}$ must be found. We can calculate this quantity using simple probabilistic arguments, but that calculation involves a number of operations on the order of $N^T$. This is very large even if the length of the sequence, $T$, is moderate. Therefore we have to look for another method. Fortunately there is one with considerably lower complexity, which makes use of an auxiliary variable, $\alpha_t(i)$, called the forward variable.
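The forward variable is defined as the probability of the partial observation sequence $o_1, o_2, \ldots, o_t$, when it terminates at the state $i$. Mathematically,
$$\alpha_t(i) = p\{o_1, o_2, \ldots, o_t, q_t = i \mid \lambda\}.$$
Forward variable: the probability of observing $o_1, o_2, \ldots, o_t$ with $q_t = i$ at time $t$. It advances forward in $t$; at $t = T$ the whole observation sequence has been seen, so summing the forward variable over all states at time $T$ gives the probability of the observation sequence.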
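Then it is easy to see that the following recursive relationship holds:
$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i) \, a_{ij}, \quad 1 \le j \le N, \; 1 \le t \le T-1,$$
where
$$\alpha_1(j) = \pi_j \, b_j(o_1), \quad 1 \le j \le N.$$
Using this recursion we can calculate $\alpha_T(i)$, $1 \le i \le N$, and then the required probability is given by
$$p\{O \mid \lambda\} = \sum_{i=1}^{N} \alpha_T(i).$$
The complexity of this method, known as the forward algorithm, is proportional to $N^2 T$, which is linear with respect to $T$, whereas the direct calculation mentioned earlier has exponential complexity.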
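The forward recursion can be sketched in a few lines of plain Python. The example below is a minimal illustration under stated assumptions: the function names (`forward`, `sequence_probability`, `brute_force_probability`) and the toy parameters are invented for this post, symbols and states are 0-based indices, and the observation sequence is assumed non-empty. The brute-force function enumerates all $N^T$ state paths and is included only to check the result on a tiny input.

```python
from itertools import product


def forward(A, B, pi, obs):
    """Return alpha as a list of rows, alpha[t][i] for 0-based t."""
    n = len(pi)
    # Initialisation: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    # Induction: alpha_{t+1}(j) = b_j(o_{t+1}) * sum_i alpha_t(i) * a_ij
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


def sequence_probability(A, B, pi, obs):
    """p{O | lambda} = sum_i alpha_T(i); cost grows as N^2 * T."""
    return sum(forward(A, B, pi, obs)[-1])


def brute_force_probability(A, B, pi, obs):
    """Same quantity by summing over every state path; cost grows as N^T."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Illustrative 2-state, 3-symbol model (values are made up for the example).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

p_forward = sequence_probability(A, B, pi, obs)
p_brute = brute_force_probability(A, B, pi, obs)
```

For long sequences the products underflow in floating point; a practical implementation would rescale each row of `alpha` or work with logarithms, which this sketch leaves out.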
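In a similar way we can define the backward variable $\beta_t(i)$ as the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$, given that the current state is $i$. Mathematically,
$$\beta_t(i) = p\{o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda\}.$$
The backward variable covers the observations produced after time $t$, whereas the forward variable covers the observations up to time $t$. By the last of the three assumptions above, the joint probability of the whole sequence and of being in state $i$ at time $t$ is the product of the forward and backward variables; summing that product over all states $i$ gives the probability of the sequence.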
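As in the case of $\alpha_t(i)$, there is a recursive relationship which can be used to calculate $\beta_t(i)$ efficiently:
$$\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j) \, a_{ij} \, b_j(o_{t+1}), \quad 1 \le i \le N, \; 1 \le t \le T-1,$$
where
$$\beta_T(i) = 1, \quad 1 \le i \le N.$$
Further we can see that
$$\alpha_t(i) \, \beta_t(i) = p\{O, q_t = i \mid \lambda\}, \quad 1 \le i \le N, \; 1 \le t \le T.$$
Therefore this gives another way to calculate $p\{O \mid \lambda\}$, by using both forward and backward variables as given in eqn. 1.7:
$$p\{O \mid \lambda\} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i) \, \beta_t(i). \tag{1.7}$$
Eqn. 1.7 is very useful, especially in deriving the formulas required for gradient-based training.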
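A quick numerical check of the forward-backward identity: the sketch below computes both variables for a toy model and evaluates $\sum_i \alpha_t(i)\beta_t(i)$ at every $t$, which should give the same value each time. Function names and parameter values are assumptions for illustration only; states and symbols are 0-based indices and the observation sequence is assumed non-empty.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = p{o_1..o_t, q_t = i | lambda}, with 0-based t."""
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = p{o_{t+1}..o_T | q_t = i, lambda}, with 0-based t."""
    n = len(A)
    T = len(obs)
    # Initialisation: beta_T(i) = 1 for every state.
    beta = [[1.0] * n for _ in range(T)]
    # Recursion runs backwards in time.
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                beta[t + 1][j] * A[i][j] * B[j][obs[t + 1]] for j in range(n)
            )
    return beta


# Illustrative 2-state, 3-symbol model (values are made up for the example).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)

# One estimate of p{O | lambda} per time step; all should agree.
totals = [
    sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))
]
```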
5. The decoding problem:
In this case we want to find the most likely state sequence for a given sequence of observations, $O = o_1, o_2, \ldots, o_T$, and a model, $\lambda = (A, B, \pi)$.
The solution to this problem depends on how "most likely state sequence" is defined. One approach is to find the most likely state $q_t$ at each time $t$ and to concatenate all such $q_t$'s. But sometimes this method does not give a physically meaningful state sequence, for example when two consecutive chosen states are joined by a transition of probability zero. Therefore we use another method which has no such problem.
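To see how picking the individually most likely state at each time can go wrong, the sketch below computes the per-time posterior $p\{q_t = i \mid O, \lambda\}$ from forward and backward variables and takes the argmax at each $t$. The model is deliberately contrived (a single uninformative symbol, several zero transitions) and all names and values are assumptions for illustration. On this model the concatenated states are 0 then 2, although the transition from 0 to 2 has probability zero.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = p{o_1..o_t, q_t = i | lambda}, with 0-based t."""
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = p{o_{t+1}..o_T | q_t = i, lambda}, with 0-based t."""
    n = len(A)
    T = len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                beta[t + 1][j] * A[i][j] * B[j][obs[t + 1]] for j in range(n)
            )
    return beta


def per_time_most_likely_states(A, B, pi, obs):
    """Pick argmax_i p{q_t = i | O, lambda} independently at each t."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    total = sum(alpha[-1])  # p{O | lambda}, assumed non-zero here
    n = len(pi)
    gamma = [
        [alpha[t][i] * beta[t][i] / total for i in range(n)]
        for t in range(len(obs))
    ]
    path = [max(range(n), key=row.__getitem__) for row in gamma]
    return path, gamma


# Contrived 3-state model: 0 -> 1 -> 2 -> 2, one symbol that tells us nothing.
A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
B = [[1.0], [1.0], [1.0]]
pi = [0.4, 0.3, 0.3]
obs = [0, 0]

path, gamma = per_time_most_likely_states(A, B, pi, obs)
impossible_step = A[path[0]][path[1]] == 0.0
```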
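In this method, commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. To facilitate the computation we define an auxiliary variable,
$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} p\{q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda\},$$
which gives the highest probability that the partial observation sequence and state sequence up to time $t$ can have, when the current state is $i$.
The formula above defines the maximum probability, over all state paths ending in state $i$ at time $t$, of having observed $o_1, \ldots, o_t$. Decoding therefore reduces to finding the end state with the highest probability at time $T$ and tracing the path back from it.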
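It is easy to observe that the following recursive relationship holds:
$$\delta_{t+1}(j) = b_j(o_{t+1}) \left[ \max_{1 \le i \le N} \delta_t(i) \, a_{ij} \right], \quad 1 \le j \le N, \; 1 \le t \le T-1, \tag{1.8}$$
where
$$\delta_1(j) = \pi_j \, b_j(o_1), \quad 1 \le j \le N.$$
The recursion can be read as follows:
By the third assumption above, the probability of moving from time $t$ to time $t+1$ does not depend on the observations already seen. So we only need to keep the maximum probability at each state, multiply it by the probability of moving from that state to the next state (which gives the probability of that path at time $t+1$), and then take the largest of the $N$ candidate values.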
So the procedure to find the most likely state sequence starts from the calculation of $\delta_T(j)$, $1 \le j \le N$, using the recursion in 1.8, while always keeping a pointer to the "winning state" in the maximum-finding operation. Finally the state $j^*$ is found, where
$$j^* = \arg\max_{1 \le j \le N} \delta_T(j),$$
and starting from this state, the sequence of states is back-tracked as the pointer in each state indicates. This gives the required sequence of states.
This whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM at each time instant $t$, $1 \le t \le T$.
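The procedure described above (recursion, back-pointers, back-tracking) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `viterbi` and the toy parameters are assumptions, states and symbols are 0-based indices, the observation sequence is assumed non-empty, and ties are broken in favour of the lowest state index.

```python
def viterbi(A, B, pi, obs):
    """Return (most likely state path, its joint probability with obs)."""
    n = len(pi)
    # Initialisation: delta_1(j) = pi_j * b_j(o_1)
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]
    pointers = []  # pointers[t-1][j] = winning predecessor of state j at step t

    for t in range(1, len(obs)):
        best_prev = []
        new_delta = []
        for j in range(n):
            # Winning predecessor i maximises delta_t(i) * a_ij.
            i_star = max(range(n), key=lambda i: delta[i] * A[i][j])
            best_prev.append(i_star)
            new_delta.append(B[j][obs[t]] * delta[i_star] * A[i_star][j])
        pointers.append(best_prev)
        delta = new_delta

    # Termination: best final state, then follow the pointers backwards.
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for best_prev in reversed(pointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Illustrative 2-state, 3-symbol model (values are made up for the example).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

best_path, best_prob = viterbi(A, B, pi, obs)
# best_path is [0, 0, 1] with probability 0.01512 for this toy model
```

As with the forward algorithm, long sequences cause underflow; replacing products with sums of logarithms is the usual remedy and does not change the argmax.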
6. The learning problem