Before we begin, let us review the connections and differences among orthogonality, uncorrelatedness, and independence.
- **Orthogonality**
  Random variables: $\mathcal R(x, y) = \mathbb E[xy]$ is the correlation; if $\mathcal R(x, y) = 0$, then $x$ and $y$ are said to be orthogonal. (Think of the correlation as an inner product, and note that zero correlation means orthogonal, not uncorrelated.)
  Random processes: $\mathcal R(X(t), Y(t)) = \mathbb E[X(t)Y(t)]$; if $\mathcal R(X(t), Y(t)) = 0$, then $X(t)$ and $Y(t)$ are said to be orthogonal.
- **Uncorrelatedness**
  Random variables: if $\mathbb E[xy] = \mathbb E[x]\,\mathbb E[y]$, then $x$ and $y$ are uncorrelated.
  Random processes: if $\mathbb E[X(t)Y(t)] = \mathbb E[X(t)]\,\mathbb E[Y(t)]$, then $X(t)$ and $Y(t)$ are uncorrelated.
  Note: for Gaussian random variables, or Gaussian random processes, uncorrelatedness is equivalent to independence.
- **Independence**
  If the joint distribution factors as $p(x,y) = p(x)\,p(y)$, then $x$ and $y$ are independent.
- **Covariance, uncorrelatedness, and independence**
  The covariance is $\text{Cov}(x,y) = \mathbb E\left[(x - \mathbb E[x])(y - \mathbb E[y])\right]$; if $\text{Cov}(x,y) = 0$, then $x$ and $y$ are called uncorrelated. (Uncorrelated only means there is no linear relationship between the two; it does not mean there is no relationship at all.)
Relationships among orthogonality, uncorrelatedness, and independence:

- Independence $\Rightarrow$ uncorrelatedness.
- For Gaussian random variables, independence $\Leftrightarrow$ uncorrelatedness.
- When one of the variables has zero mean, uncorrelatedness $\Leftrightarrow$ orthogonality; otherwise the two notions are unrelated.
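To make the distinction concrete, here is a minimal numerical sketch (assuming NumPy): with $x \sim \mathcal N(0,1)$ and $y = x^2$ (a hypothetical toy pair, not from the text), $x$ and $y$ are uncorrelated yet clearly dependent, and since $\mathbb E[x] = 0$ their uncorrelatedness coincides with orthogonality.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # zero-mean Gaussian
y = x**2                           # a deterministic (hence dependent) function of x

print(np.cov(x, y)[0, 1])          # ~0: x and y are uncorrelated
print(np.mean(x * y))              # ~0: with E[x] = 0, uncorrelated <=> orthogonal
print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))   # ~2, not 0: x and y are dependent
```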
Kalman Filter: Scalar Form
Consider the scalar state equation and the scalar observation equation:
$$s[n] = a s[n-1] + u[n] \tag{1}$$
$$x[n] = s[n] + w[n] \tag{2}$$
Here we assume $s[-1] \sim \mathcal{N}(\mu_s, \sigma_s^2)$. The process noise $u[n]$ is zero-mean Gaussian with $\mathbb{E}[u^2[n]] = \sigma_u^2$, and the samples $\{u[n]\}$ are mutually independent. The observation noise $w[n]$ is zero-mean Gaussian with $\mathbb{E}[w^2[n]] = \sigma_n^2$, and the samples $\{w[n]\}$ are mutually independent. To simplify the derivation, we assume $\mu_s = 0$. We want to estimate $s[n]$ from the observations $\{x[0], x[1], \cdots, x[n]\}$, and we denote the estimator of $s[n]$ based on $\{x[0], x[1], \cdots, x[m]\}$ by $\hat{s}[n|m]$. Our criterion of optimality is the minimum Bayesian MSE,
$$\mathbb{E}\left[\left(s[n] - \hat{s}[n|n]\right)^2\right]$$
The expectation is taken with respect to the joint probability density function $p(x[0], x[1], \cdots, x[n], s[n])$. (This is where the Bayesian MSE differs from the classical MSE: the classical MSE treats $s[n]$ as an unknown deterministic parameter, so its expectation is taken with respect to $p(x[0], x[1], \cdots, x[n]; s[n])$, whereas the Bayesian MSE treats $s[n]$ as a random variable.)
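Before deriving the filter, it may help to see what data from model (1)–(2) looks like. The following is a minimal simulation sketch (assuming NumPy; the parameter values for $a$, $\sigma_u$, $\sigma_n$, $\sigma_s$ are arbitrary illustrative choices, not from the text).

```python
import numpy as np

rng = np.random.default_rng(1)

a, sigma_u, sigma_n, sigma_s = 0.95, 0.5, 1.0, 1.0   # illustrative values
N = 200

s = np.empty(N)
x = np.empty(N)
s_prev = rng.normal(0.0, sigma_s)                  # s[-1] ~ N(0, sigma_s^2), mu_s = 0
for n in range(N):
    s[n] = a * s_prev + rng.normal(0.0, sigma_u)   # state equation (1)
    x[n] = s[n] + rng.normal(0.0, sigma_n)         # observation equation (2)
    s_prev = s[n]
```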
The MMSE estimator is the posterior mean:
$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid x[0], x[1], \cdots, x[n]\right] \tag{3}$$
Let $\theta = s[n]$ and $\boldsymbol{x} = [x[0], x[1], \cdots, x[n]]^T$; since they are jointly Gaussian, we have
$$\hat{s}[n|n] = \boldsymbol{C}_{\theta x} \boldsymbol{C}_{xx}^{-1} \boldsymbol{x} \tag{4}$$
Because all of our statistical assumptions are Gaussian, the MMSE estimator is linear and therefore coincides with the LMMSE estimator.
For the MMSE estimator of $\theta$, we now state two properties.
**Property 1:** Given two uncorrelated data vectors $\boldsymbol{x}_1, \boldsymbol{x}_2$ that are jointly Gaussian with $\theta$ (all zero-mean), then
$$\begin{aligned} \hat{\theta} &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1, \boldsymbol{x}_2\right] \\ &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1\right] + \mathbb{E}\left[\theta \mid \boldsymbol{x}_2\right] \end{aligned}$$
We give two explanations of this property, as follows.
Explanation 1: Since $\boldsymbol{x} = [\boldsymbol{x}_1^T, \boldsymbol{x}_2^T]^T$ is Gaussian,
$$\begin{aligned} \hat{\theta} = \mathbb{E}[\theta \mid \boldsymbol{x}] &= \mathbb{E}[\theta] + \boldsymbol{C}_{\theta x} \boldsymbol{C}_{xx}^{-1} \left(\boldsymbol{x} - \mathbb{E}[\boldsymbol{x}]\right) \\ &= \boldsymbol{C}_{\theta x} \boldsymbol{C}_{xx}^{-1} \boldsymbol{x} \end{aligned}$$
where we have assumed $\mathbb{E}[\theta] = 0$ and $\mathbb{E}[\boldsymbol{x}] = \boldsymbol{0}$; this assumption is reasonable because we can always subtract the means before processing.
Since $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ are uncorrelated and $\mathbb{E}[\boldsymbol{x}_1] = \mathbb{E}[\boldsymbol{x}_2] = \boldsymbol{0}$, we have $\mathbb{E}[\boldsymbol{x}_1 \boldsymbol{x}_2^T] = \mathbb{E}[\boldsymbol{x}_1]\,\mathbb{E}[\boldsymbol{x}_2^T] = \boldsymbol{0}$, and therefore
$$\begin{aligned} \boldsymbol{C}_{xx}^{-1} &= \begin{bmatrix} \boldsymbol{C}_{x_1 x_1} & \boldsymbol{C}_{x_1 x_2} \\ \boldsymbol{C}_{x_2 x_1} & \boldsymbol{C}_{x_2 x_2} \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{C}_{x_1 x_1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2} \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{C}_{x_1 x_1}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2}^{-1} \end{bmatrix} \end{aligned}$$
Moreover,
$$\boldsymbol{C}_{\theta x} = \mathbb{E}\left[\theta \begin{bmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{bmatrix}^T\right] = \begin{bmatrix} \boldsymbol{C}_{\theta x_1} & \boldsymbol{C}_{\theta x_2} \end{bmatrix}$$
Therefore,
$$\begin{aligned} \hat{\theta} &= \begin{bmatrix} \boldsymbol{C}_{\theta x_1} & \boldsymbol{C}_{\theta x_2} \end{bmatrix} \begin{bmatrix} \boldsymbol{C}_{x_1 x_1}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{C}_{x_2 x_2}^{-1} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{bmatrix} \\ &= \boldsymbol{C}_{\theta x_1} \boldsymbol{C}_{x_1 x_1}^{-1} \boldsymbol{x}_1 + \boldsymbol{C}_{\theta x_2} \boldsymbol{C}_{x_2 x_2}^{-1} \boldsymbol{x}_2 \\ &= \mathbb{E}\left[\theta \mid \boldsymbol{x}_1\right] + \mathbb{E}\left[\theta \mid \boldsymbol{x}_2\right] \end{aligned}$$
Explanation 2: From the viewpoint of a linear vector space the picture is quite intuitive: since $\mathbb{E}[\boldsymbol{x}_1 \boldsymbol{x}_2^T] = \mathbb{E}[\boldsymbol{x}_1]\,\mathbb{E}[\boldsymbol{x}_2^T] = \boldsymbol{0}$, the vectors $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ are mutually orthogonal, so the estimate decomposes into the sum of the estimates obtained from each of them separately.
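Property 1 is also easy to check numerically. Below is a small Monte Carlo sketch with a hypothetical toy model (not from the text): $x_1 = z_1$, $x_2 = z_2$, $\theta = z_1 + 2z_2 + z_3$ for i.i.d. standard Gaussians $z_i$, so $x_1$ and $x_2$ are uncorrelated and zero-mean. The helper `mmse` below is our own illustration of $\boldsymbol C_{\theta x}\boldsymbol C_{xx}^{-1}\boldsymbol x$ using sample covariances.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal((3, 200_000))
x1, x2 = z[0], z[1]
theta = z[0] + 2.0 * z[1] + z[2]

def mmse(theta, *xs):
    """Empirical linear-Gaussian MMSE estimate E[theta|x] = C_theta_x C_xx^{-1} x (zero-mean data)."""
    X = np.vstack(xs)
    C_tx = (theta @ X.T) / theta.size   # cross-covariance row vector
    C_xx = (X @ X.T) / theta.size       # data covariance matrix
    return C_tx @ np.linalg.solve(C_xx, X)

joint = mmse(theta, x1, x2)                    # E[theta | x1, x2]
split = mmse(theta, x1) + mmse(theta, x2)      # E[theta | x1] + E[theta | x2]
print(np.mean(np.abs(joint - split)))          # ~0: the two estimators agree up to sampling error
```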
**Property 2:** The MMSE estimator is additive: if $\theta = \theta_1 + \theta_2$, then
$$\begin{aligned} \hat{\theta} &= \mathbb{E}[\theta \mid \boldsymbol{x}] \\ &= \mathbb{E}[\theta_1 + \theta_2 \mid \boldsymbol{x}] \\ &= \mathbb{E}[\theta_1 \mid \boldsymbol{x}] + \mathbb{E}[\theta_2 \mid \boldsymbol{x}] \end{aligned}$$
With these two properties in hand, let $\boldsymbol{X}[n] = [x[0], x[1], \cdots, x[n]]^T$, and let $\tilde{x}[n]$ denote the innovation (the innovation is the part of $x[n]$ that is uncorrelated with the previous samples $\{x[0], \cdots, x[n-1]\}$):
$$\tilde{x}[n] = x[n] - \hat{x}[n|n-1] \tag{5}$$
Let me stress why $\tilde{x}[n]$ is uncorrelated with $\{x[0], \cdots, x[n-1]\}$: $\hat{x}[n|n-1]$ is the MMSE estimate of $x[n]$ based on the observations $\{x[0], \cdots, x[n-1]\}$, and by the orthogonality principle the estimation error $\tilde{x}[n]$ is orthogonal to any linear combination of the observed data (in particular, to the data themselves), so $\tilde{x}[n]$ is uncorrelated with $\{x[0], \cdots, x[n-1]\}$. In fact, conditioning on $\{\boldsymbol{X}[n-1], \tilde{x}[n]\}$ is equivalent to conditioning on $\boldsymbol{X}[n] = \{x[0], \cdots, x[n-1], x[n]\}$, because $x[n]$ can be recovered as:
$$\begin{aligned} x[n] &= \tilde{x}[n] + \hat{x}[n|n-1] \\ &= \tilde{x}[n] + \sum_{k=0}^{n-1} a_k x[k] \end{aligned}$$
where the $a_k$ are the coefficients of the corresponding linear MMSE estimator. We can therefore rewrite (3) as:
$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1], \tilde{x}[n]\right]$$
Since $\boldsymbol{X}[n-1]$ and $\tilde{x}[n]$ are uncorrelated, Property 1 gives:
$$\hat{s}[n|n] = \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1]\right] + \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right]$$
Here $\mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1]\right]$ is the prediction of $s[n]$ from the previous observations; denote it by $\hat{s}[n|n-1]$. Using the state equation (1) and Property 2, we further obtain:
$$\begin{aligned} \hat{s}[n|n-1] &= \mathbb{E}\left[s[n] \mid \boldsymbol{X}[n-1]\right] \\ &= \mathbb{E}\left[a s[n-1] + u[n] \mid \boldsymbol{X}[n-1]\right] \\ &= a\,\mathbb{E}\left[s[n-1] \mid \boldsymbol{X}[n-1]\right] \\ &= a\,\hat{s}[n-1|n-1] \end{aligned}$$
The third line uses $\mathbb{E}\left[u[n] \mid \boldsymbol{X}[n-1]\right] = 0$, which holds because
$$\mathbb{E}\left[u[n] \mid \boldsymbol{X}[n-1]\right] = \mathbb{E}[u[n]] = 0$$
since $u[n]$ is independent of $\{x[0], \cdots, x[n-1]\}$. (This independence comes from two facts: first, $u[n]$ is independent of all the $w[n]$; second, $s[0], s[1], \cdots, s[n-1]$ are linear combinations of the random variables $\{u[0], u[1], \cdots, u[n-1], s[-1]\}$, all of which are independent of $u[n]$.) We now have
$$\hat{s}[n|n] = \hat{s}[n|n-1] + \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right] \tag{6}$$
where
$$\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1]$$
Note that $\mathbb{E}\left[s[n] \mid \tilde{x}[n]\right]$ is the MMSE estimate of $s[n]$ based on $\tilde{x}[n]$; since this estimator is linear, it can be written as:
$$\begin{aligned} \mathbb{E}\left[s[n] \mid \tilde{x}[n]\right] &= K[n]\,\tilde{x}[n] \\ &= K[n]\left(x[n] - \hat{x}[n|n-1]\right) \end{aligned}$$
(Since the mean of $s[n]$ is zero, there is no "intercept" term here.) The gain is
$$K[n] = \frac{\mathbb{E}\left[s[n]\,\tilde{x}[n]\right]}{\mathbb{E}\left[\tilde{x}^2[n]\right]} \tag{7}$$
This is just the MMSE estimator for jointly Gaussian $\theta$ and scalar $x$, namely
$$\hat{\theta} = C_{\theta x} C_{xx}^{-1}\, x = \frac{\mathbb{E}[\theta x]}{\mathbb{E}[x^2]}\, x$$
Since the scalar observation equation gives $x[n] = s[n] + w[n]$, Property 2 (together with the fact that $w[n]$ is independent of the past observations, so $\hat{w}[n|n-1] = 0$) yields
$$\begin{aligned} \hat{x}[n|n-1] &= \hat{s}[n|n-1] + \hat{w}[n|n-1] \\ &= \hat{s}[n|n-1] \end{aligned}$$
Substituting into (6), we obtain
$$\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right) \tag{8}$$
where
$$\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1] \tag{9}$$
Only the gain factor $K[n]$ remains to be determined. From (7) we have
$$K[n] = \frac{\mathbb{E}\left[s[n]\left(x[n] - \hat{s}[n|n-1]\right)\right]}{\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right]} \tag{10}$$
To simplify $K[n]$ further, we first state two facts:

- $\mathbb{E}\left[s[n]\left(x[n] - \hat{s}[n|n-1]\right)\right] = \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right]$
- $\mathbb{E}\left[w[n]\left(s[n] - \hat{s}[n|n-1]\right)\right] = 0$
The first fact holds because
$$\begin{aligned} \tilde{x}[n] &= x[n] - \hat{x}[n|n-1] \\ &= x[n] - \hat{s}[n|n-1] \end{aligned} \tag{11}$$
is uncorrelated with the previous observations $\{x[0], \cdots, x[n-1]\}$, and hence also with $\hat{s}[n|n-1]$ (a linear combination of $\{x[0], \cdots, x[n-1]\}$); therefore $\mathbb{E}\left[\hat{s}[n|n-1]\left(x[n] - \hat{s}[n|n-1]\right)\right] = 0$, which gives the first fact. The second fact is straightforward ($w[n]$ is independent of both $s[n]$ and the past observations), so we omit the details. Substituting these two facts into (10), the gain becomes:
$$\begin{aligned} K[n] &= \frac{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right]}{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1] + w[n]\right)^2\right]} \\ &= \frac{\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]}{\sigma_n^2 + \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]} \end{aligned} \tag{12}$$
The numerator reduces to a squared term because $x[n] = s[n] + w[n]$ and $w[n]$ is independent of both $s[n]$ and $\hat{s}[n|n-1]$. Moreover, the numerator $\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right]$ is precisely the minimum MSE of the prediction based on the previous observations; denote it by $M[n|n-1]$. Then
$$K[n] = \frac{M[n|n-1]}{\sigma_n^2 + M[n|n-1]} \tag{13}$$
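Equation (13) already shows how the gain trades off the two sources of information: when the observation noise $\sigma_n^2$ is small relative to the prediction uncertainty $M[n|n-1]$, $K[n]$ approaches 1 and the new observation is trusted; when $\sigma_n^2$ is large, $K[n]$ approaches 0 and the prediction is kept. A quick numerical sketch (the values are arbitrary illustrations):

```python
def kalman_gain(M_pred, sigma_n2):
    """Scalar Kalman gain of eq. (13): K[n] = M[n|n-1] / (sigma_n^2 + M[n|n-1])."""
    return M_pred / (sigma_n2 + M_pred)

print(kalman_gain(1.0, 0.1))   # ~0.91: low observation noise -> follow the measurement
print(kalman_gain(1.0, 10.0))  # ~0.09: high observation noise -> keep the prediction
```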
Since $s[n] = a s[n-1] + u[n]$ and $\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1]$, we have
$$\begin{aligned} M[n|n-1] &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right] \\ &= \mathbb{E}\left[\left(a s[n-1] + u[n] - \hat{s}[n|n-1]\right)^2\right] \\ &= \mathbb{E}\left[\left(a\left(s[n-1] - \hat{s}[n-1|n-1]\right) + u[n]\right)^2\right] \end{aligned}$$
It is easy to see that
$$\mathbb{E}\left[\left(s[n-1] - \hat{s}[n-1|n-1]\right)u[n]\right] = 0$$
since $u[n]$ is independent of $s[n-1]$ and of the past observations. Therefore,
$$M[n|n-1] = a^2 M[n-1|n-1] + \sigma_u^2$$
Finally, we need a recursion for $M[n|n]$. Using (8), $\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right)$, we have
$$\begin{aligned} M[n|n] &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n]\right)^2\right] \\ &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1] - K[n]\left(x[n] - \hat{s}[n|n-1]\right)\right)^2\right] \\ &= \mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)^2\right] - 2K[n]\,\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right] \\ &\quad + K^2[n]\,\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right] \end{aligned}$$
Note that the expectation in the second term is exactly the numerator of $K[n]$ in (12), and the expectation in the last term is its denominator, so
$$\mathbb{E}\left[\left(s[n] - \hat{s}[n|n-1]\right)\left(x[n] - \hat{s}[n|n-1]\right)\right] = K[n]\left(M[n|n-1] + \sigma_n^2\right)$$
$$\mathbb{E}\left[\left(x[n] - \hat{s}[n|n-1]\right)^2\right] = \frac{M[n|n-1]}{K[n]}$$
Therefore,
$$\begin{aligned} M[n|n] &= M[n|n-1] - 2K^2[n]\left(M[n|n-1] + \sigma_n^2\right) + K[n]\,M[n|n-1] \\ &= M[n|n-1] - 2K[n]\,M[n|n-1] + K[n]\,M[n|n-1] \\ &= \left(1 - K[n]\right) M[n|n-1] \end{aligned}$$
This completes the derivation of the scalar Kalman filter, which we summarize as follows: for all $n \geq 0$,
Prediction:
$$\hat{s}[n|n-1] = a\,\hat{s}[n-1|n-1] \tag{14}$$
Minimum Prediction MSE:
$$M[n|n-1] = a^2 M[n-1|n-1] + \sigma_u^2 \tag{15}$$
Kalman Gain:
$$K[n] = \frac{M[n|n-1]}{\sigma_n^2 + M[n|n-1]} \tag{16}$$
Correction:
$$\hat{s}[n|n] = \hat{s}[n|n-1] + K[n]\left(x[n] - \hat{s}[n|n-1]\right) \tag{17}$$
Minimum MSE:
$$M[n|n] = \left(1 - K[n]\right) M[n|n-1] \tag{18}$$
Looking back at the derivation, the zero-mean assumptions (including $\mu_s = 0$ and $\mathbb{E}[s[n]] = 0$) were made so that we could exploit the orthogonality principle; in fact, even when $\mu_s \neq 0$, the final equations are identical to (14)–(18). For the initialization we use $\hat{s}[-1|-1] = \mathbb{E}[s[-1]] = \mu_s$ and $M[-1|-1] = \sigma_s^2$, since this is all the information available before any observations arrive. In addition, the correction term can be viewed as an estimate $\hat{u}[n]$ of $u[n]$:
$$\hat{s}[n|n] = a\,\hat{s}[n-1|n-1] + \hat{u}[n]$$
where $\hat{u}[n] = K[n]\left(x[n] - \hat{s}[n|n-1]\right)$. In some sense this term estimates $u[n]$, which is why it is reasonable to expect $\hat{s}[n|n] \approx s[n]$.
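Putting equations (14)–(18) together with this initialization, here is a minimal sketch of the scalar Kalman filter in Python (assuming NumPy; the function name, the synthetic data, and the parameter values are illustrative choices, not prescribed by the text):

```python
import numpy as np

def scalar_kalman_filter(x, a, sigma_u2, sigma_n2, mu_s=0.0, sigma_s2=1.0):
    """Scalar Kalman filter, eqs. (14)-(18); returns the estimates s_hat[n|n] and MSEs M[n|n]."""
    s_hat = mu_s          # s_hat[-1|-1] = mu_s
    M = sigma_s2          # M[-1|-1]    = sigma_s^2
    s_filt, M_filt = [], []
    for xn in x:
        s_pred = a * s_hat                      # (14) prediction
        M_pred = a**2 * M + sigma_u2            # (15) minimum prediction MSE
        K = M_pred / (sigma_n2 + M_pred)        # (16) Kalman gain
        s_hat = s_pred + K * (xn - s_pred)      # (17) correction
        M = (1.0 - K) * M_pred                  # (18) minimum MSE
        s_filt.append(s_hat)
        M_filt.append(M)
    return np.array(s_filt), np.array(M_filt)

# Illustrative usage on synthetic data from model (1)-(2):
rng = np.random.default_rng(1)
a, sigma_u, sigma_n = 0.95, 0.5, 1.0
s_true = np.empty(200)
s_prev = rng.normal(0.0, 1.0)                  # s[-1] ~ N(0, sigma_s^2)
for n in range(200):
    s_true[n] = a * s_prev + rng.normal(0.0, sigma_u)
    s_prev = s_true[n]
x = s_true + rng.normal(0.0, sigma_n, size=200)

s_filt, M_filt = scalar_kalman_filter(x, a, sigma_u**2, sigma_n**2)
print("raw observation MSE:", np.mean((x - s_true)**2))        # roughly sigma_n^2
print("filtered MSE:       ", np.mean((s_filt - s_true)**2))   # noticeably smaller
```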