Derivation of the Cramer-Rao Lower Bound

In the vast majority of cases, a complete sufficient statistic simply cannot be found. Even so, one can still derive the limit on how good an estimator can be: the Cramer-Rao lower bound.

The Cramer-Rao Lower Bound (CRLB) is itself an MSE value, and it depends on the following:

  • the statistical model $p(\mathbf{x}, \theta)$
  • the number of samples $n$

1. Derivation

We now derive the Cramer-Rao Lower Bound.

Assume the number of samples is $n$ and the sample vector is $\mathbf{x}$; the parameter to be estimated is a scalar $\theta \in \mathbb{R}$. Denote the estimator by $\hat\theta(\mathbf{x})$. Note that the CRLB does not depend on any particular $\hat\theta$: it is the MSE attained by the best possible $\hat\theta$.
For an unbiased estimator $\hat\theta(\mathbf{x})$, we look for the minimum achievable MSE. By unbiasedness:
$$\mathrm{E}(\hat\theta - \theta) = \int_{\mathbb{R}^n} (\hat\theta - \theta)\,p(\mathbf{x}, \theta)\,\mathrm{d}\mathbf{x} = 0$$
Differentiating with respect to $\theta$:
$$\int_{\mathbb{R}^n} \left[\frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}(\hat\theta - \theta) - p(\mathbf{x},\theta) \right] \mathrm{d}\mathbf{x} = 0$$
That is:
$$\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}(\hat\theta - \theta)\,\mathrm{d}\mathbf{x} = 1$$
Introduce the logarithm via $\frac{\partial p}{\partial \theta} = p \cdot \frac{\partial \ln p}{\partial \theta}$ to turn this into an expectation:
$$\int_{\mathbb{R}^n} \left[ \left( (\hat\theta - \theta) \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right) \cdot p(\mathbf{x}, \theta) \right] \mathrm{d}\mathbf{x} = 1$$

Now invoke the Cauchy-Schwarz inequality for random variables (the expectation of a product can be treated as an inner product):

$$\mathrm{E}(XY) \leq \sqrt{\mathrm{E}(X^2)\cdot \mathrm{E}(Y^2)}$$

Applying it with $X = \hat\theta - \theta$ and $Y = \partial \ln p(\mathbf{x},\theta)/\partial \theta$:
$$\mathrm{E}(\hat\theta - \theta)^2 \cdot \mathrm{E}\left[ \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right]^2 \geq \mathrm{E}\left[ (\hat\theta - \theta) \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right] = 1$$
Rearranging:
$$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) \geq \left[\mathrm{E}\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2 \right]^{-1}$$
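As a numerical sanity check of this derivation, the following minimal sketch runs a Monte Carlo simulation on a Gaussian location model (the same model used in the example below; all parameter values are illustrative assumptions, not part of the derivation) and verifies both the identity $\mathrm{E}[(\hat\theta - \theta)\,\partial \ln p/\partial \theta] = 1$ and the resulting bound for the sample mean:

```python
import numpy as np

# Illustrative assumptions (not part of the derivation): x_i ~ N(theta, sigma^2),
# unbiased estimator theta_hat = sample mean.
rng = np.random.default_rng(0)
theta, sigma, n, trials = 2.0, 1.5, 20, 200_000

x = rng.normal(theta, sigma, size=(trials, n))
theta_hat = x.mean(axis=1)                       # unbiased estimator
score = (x - theta).sum(axis=1) / sigma**2       # d/d(theta) of ln p(x, theta)

# Key identity from the derivation: E[(theta_hat - theta) * score] = 1
print(np.mean((theta_hat - theta) * score))      # ~ 1.0

# Cauchy-Schwarz bound: Var(theta_hat) >= 1 / E[score^2]
print(theta_hat.var(), 1.0 / np.mean(score**2))  # both ~ sigma^2 / n = 0.1125
```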

2. Example

This section uses an example to show how the CRLB is computed.

Consider a common estimation problem: measuring a DC voltage whose observations are corrupted by AWGN. We take $n$ samples, which are i.i.d. The parameterized model is:
$$p(\mathbf{x},\theta) = \frac{1}{(\sqrt{2\pi}\sigma)^n}\exp\left[-\frac{\sum_{i=1}^n(x_i - \theta)^2}{2\sigma^2}\right]$$
First form the log-likelihood:
$$\ln p(\mathbf{x}, \theta) = -n\ln(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \theta)^2$$
Then differentiate:
$$\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{i=1}^n(x_i - \theta)$$
Next, compute the Fisher information, i.e., the expectation of the square of the differentiated log-likelihood.
$$\begin{aligned} I(\mathbf{x}, \theta) &= \mathrm{E}\left[\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right]^2\\ &= \frac{1}{\sigma^4}\mathrm{E}\left[ \sum_{i=1}^n (x_i - \theta) \right]^2 \\ &= \frac{1}{\sigma^4}\mathrm{E}\left[ \sum_{i=1}^n (x_i - \theta)^2 + \sum_{j \neq k}(x_j - \theta)(x_k - \theta) \right]\\ &= \frac{n}{\sigma^2} \end{aligned}$$
The cross terms vanish by independence (each factor has zero mean), and each squared term contributes $\sigma^2$. So the CRLB is (this is exactly the MSE of the arithmetic mean):
$$\mathrm{MSE}(\mathbf{x},\theta) \geq \frac{\sigma^2}{n}$$
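A minimal simulation, under the same illustrative parameter choices as before, shows that the arithmetic mean attains this bound while a cruder unbiased estimator (here, a single observation, chosen purely for illustration) sits well above it:

```python
import numpy as np

# Illustrative check: the sample mean attains the CRLB sigma^2 / n,
# while another unbiased estimator (a single observation) does not.
rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 1.5, 20, 200_000

x = rng.normal(theta, sigma, size=(trials, n))
crlb = sigma**2 / n

mse_mean  = np.mean((x.mean(axis=1) - theta) ** 2)  # ~ crlb (efficient)
mse_first = np.mean((x[:, 0]        - theta) ** 2)  # ~ sigma^2, far above crlb
print(crlb, mse_mean, mse_first)
```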

3. A Simplified Way to Compute the Fisher Information

The Fisher information can also be computed another way:
$$\begin{aligned} I(\mathbf{x}, \theta) &= \mathrm{E}\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2\\ &= -\mathrm{E}\left( \frac{\partial^2}{\partial \theta^2} \ln p(\mathbf{x}, \theta) \right) \end{aligned}$$
We now prove this result.

First, from the fact that the density integrates to one:
$$\int_{\mathbb{R}^n} p(\mathbf{x}, \theta)\,\mathrm{d}\mathbf{x} = 1$$
Differentiate with respect to $\theta$:
$$\frac{\partial}{\partial \theta}\int_{\mathbb{R}^n} p(\mathbf{x}, \theta)\,\mathrm{d}\mathbf{x} = 0$$
Exchange the derivative and the integral:
$$\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = 0$$
This integral is in an awkward form: it is neither a probability nor an expectation. So we restructure it into something meaningful:
$$\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = \int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\,\mathrm{d}\mathbf{x} = 0$$
Now differentiate once more:
$$\frac{\partial}{\partial \theta}\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = \frac{\partial}{\partial \theta}\int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\,\mathrm{d}\mathbf{x} = 0$$
Exchanging the order of differentiation and integration in the middle expression and applying the product rule yields:
$$\begin{aligned} \frac{\partial}{\partial \theta}\int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\,\mathrm{d}\mathbf{x} &= \int_{\mathbb{R}^n} \left[ p(\mathbf{x}, \theta) \cdot \frac{\partial^2}{\partial \theta^2}\ln p(\mathbf{x}, \theta) + \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x}\\ &= \mathrm{E}\left[ \frac{\partial^2}{\partial\theta^2} \ln p(\mathbf{x}, \theta) \right] + \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x} \end{aligned}$$
The remaining term is still an awkward integral, so apply the log-derivative property $\frac{\partial p}{\partial \theta} = p \cdot \frac{\partial \ln p}{\partial \theta}$ once more:
$$\begin{aligned} \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x} &= \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \cdot \frac{p(\mathbf{x}, \theta)}{p(\mathbf{x}, \theta)} \right] \mathrm{d}\mathbf{x}\\ &= \int_{\mathbb{R}^n} \left(\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right)^2 p(\mathbf{x}, \theta)\,\mathrm{d}\mathbf{x}\\ &= \mathrm{E}\left[ \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right]^2 \end{aligned}$$
Since the overall expression equals zero, the two pieces must cancel, and therefore:
$$\begin{aligned} I(\mathbf{x}, \theta) &= \mathrm{E}\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2\\ &= -\mathrm{E}\left( \frac{\partial^2}{\partial \theta^2} \ln p(\mathbf{x}, \theta) \right) \end{aligned}$$
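The identity is easy to check numerically. The sketch below evaluates both forms by Monte Carlo for the Gaussian model of Section 2, using central differences on the log-likelihood (all parameter values are illustrative assumptions; for this model the log-likelihood is quadratic in $\theta$, so the finite differences are essentially exact):

```python
import numpy as np

# Numerical check of the two Fisher information expressions for the
# Gaussian model of Section 2 (illustrative parameters, central differences).
rng = np.random.default_rng(2)
theta, sigma, n, trials, h = 2.0, 1.5, 20, 100_000, 1e-3

x = rng.normal(theta, sigma, size=(trials, n))

def log_lik(t):
    # ln p(x, theta) for the Gaussian model, evaluated per trial
    return -n * np.log(np.sqrt(2 * np.pi) * sigma) \
           - ((x - t) ** 2).sum(axis=1) / (2 * sigma**2)

# First form: E[(d/dtheta ln p)^2], via a central first difference
score = (log_lik(theta + h) - log_lik(theta - h)) / (2 * h)
print(np.mean(score**2))          # ~ n / sigma^2 = 8.89

# Second form: -E[d^2/dtheta^2 ln p], via a central second difference
second = (log_lik(theta + h) - 2 * log_lik(theta) + log_lik(theta - h)) / h**2
print(-np.mean(second))           # ~ n / sigma^2 (exact here, up to rounding)
```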
