In the vast majority of cases a sufficient and complete statistic simply cannot be found. Even then, however, we can still derive the limit on how well any estimator can perform: the Cramér–Rao lower bound.
The Cramér–Rao Lower Bound (CRLB) is itself an MSE value, and it depends on:
- the statistical model $p(\mathbf{x}, \theta)$;
- the sample size $n$.
1. Derivation
We now derive the Cramér–Rao Lower Bound.
Suppose the sample size is $n$, the sample vector is $\mathbf{x}$, and the parameter to estimate is a scalar $\theta \in \mathbb{R}$. Denote the estimator by $\hat\theta(\mathbf{x})$. Note that the CRLB does not depend on any particular $\hat\theta$: it is the MSE attained by the best possible $\hat\theta$.
For an unbiased estimator $\hat\theta(\mathbf{x})$, we ask how small its MSE can be. Unbiasedness means:
$$
\mathrm{E}(\hat\theta - \theta) = \int_{\mathbb{R}^n} (\hat\theta - \theta)\, p(\mathbf{x}, \theta)\, \mathrm{d}\mathbf{x} = 0
$$
Differentiating with respect to $\theta$ (note that $\hat\theta(\mathbf{x})$ itself does not depend on $\theta$):
$$
\int_{\mathbb{R}^n} \left[\frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}(\hat\theta - \theta) - p(\mathbf{x},\theta) \right] \mathrm{d}\mathbf{x} = 0
$$
Since $\int_{\mathbb{R}^n} p(\mathbf{x},\theta)\,\mathrm{d}\mathbf{x} = 1$, this gives:
$$
\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}(\hat\theta - \theta)\, \mathrm{d}\mathbf{x} = 1
$$
Introduce a logarithm to restructure the integrand, using $\frac{\partial p}{\partial \theta} = p \cdot \frac{\partial \ln p}{\partial \theta}$:
$$
\int_{\mathbb{R}^n} \left[ \left( (\hat\theta - \theta) \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right) \cdot p(\mathbf{x}, \theta) \right] \mathrm{d}\mathbf{x} = 1
$$
Now invoke the Cauchy–Schwarz inequality in its probabilistic form (the product expectation can be treated as an inner product):
$$
|\mathrm{E}(XY)| \leq \sqrt{\mathrm{E}(X^2)\cdot \mathrm{E}(Y^2)}
$$
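As a quick numerical sanity check, the sketch below draws two correlated Gaussian variables (the names `x`, `y` and the correlation coefficient 0.7 are arbitrary illustrative choices, not part of the derivation) and confirms the inequality:

```python
# Numerical sanity check of |E(XY)| <= sqrt(E(X^2) * E(Y^2)).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.7 * x + rng.normal(size=1_000_000)    # correlated with x

lhs = abs(np.mean(x * y))                   # |E(XY)|
rhs = np.sqrt(np.mean(x**2) * np.mean(y**2))
print(lhs, "<=", rhs)                       # inequality holds
```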
Applying it with $X = \hat\theta - \theta$ and $Y = \partial \ln p(\mathbf{x},\theta) / \partial \theta$:
$$
\mathrm{E}\left[(\hat\theta - \theta)^2\right] \cdot \mathrm{E}\left[ \left(\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right)^2 \right] \geq \left(\mathrm{E}\left[ (\hat\theta - \theta) \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right]\right)^2 = 1
$$
That is:
$$
\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) \geq \left[\mathrm{E}\left[\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2\right] \right]^{-1}
$$
where $\mathrm{MSE} = \mathrm{Var}$ because the estimator is unbiased.
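Even when this expectation has no closed form, the bound can be evaluated by Monte Carlo: draw samples from $p(\mathbf{x}, \theta)$, evaluate the score $\partial \ln p / \partial \theta$ at each draw, and average its square. A minimal sketch; the function and variable names (`crlb_monte_carlo`, `sampler`, `score`) are illustrative, and the Gaussian model is just a placeholder:

```python
# Monte Carlo estimate of the Fisher information E[(d ln p / d theta)^2]
# and the resulting CRLB = 1 / I(theta).
import numpy as np

def crlb_monte_carlo(score, sampler, theta, trials=100_000):
    """Average the squared score over draws from p(x, theta)."""
    s2 = np.mean([score(sampler(theta), theta) ** 2 for _ in range(trials)])
    return 1.0 / s2

# Placeholder model: n i.i.d. N(theta, sigma^2) samples,
# whose score is sum(x_i - theta) / sigma^2.
n, sigma, theta = 10, 2.0, 1.0
sampler = lambda th: np.random.normal(th, sigma, size=n)
score = lambda x, th: np.sum(x - th) / sigma**2

print(crlb_monte_carlo(score, sampler, theta))  # close to sigma^2/n = 0.4
```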
2. Example
Let us walk through the CRLB computation with an example.
Consider a common estimation problem: measuring a DC voltage where the observations are corrupted by AWGN. We take $n$ samples, i.i.d. across samples, so the parameterized model is:
$$
p(\mathbf{x},\theta) = \frac{1}{(\sqrt{2\pi}\sigma)^n}\exp\left[-\frac{\sum_{i = 1}^n(x_i - \theta)^2}{2\sigma^2}\right]
$$
First, take the log-likelihood:
$$
\ln p(\mathbf{x}, \theta) = -n\ln(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (x_i - \theta)^2
$$
Then differentiate:
$$
\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{i = 1}^n(x_i - \theta)
$$
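A quick symbolic check of this derivative, assuming sympy is available (fixing $n = 3$ symbols only for the check):

```python
# Symbolically differentiate ln p(x, theta) for n = 3 samples and confirm
# the score equals sum(x_i - theta) / sigma^2.
import sympy as sp

theta = sp.Symbol('theta', real=True)
sigma = sp.Symbol('sigma', positive=True)
xs = sp.symbols('x1 x2 x3', real=True)

log_p = sum(-sp.log(sp.sqrt(2*sp.pi)*sigma) - (x - theta)**2/(2*sigma**2)
            for x in xs)
score = sp.diff(log_p, theta)
print(sp.simplify(score - sum(x - theta for x in xs)/sigma**2))  # prints 0
```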
Now compute the Fisher information, i.e., square the derivative of the log-likelihood and take its expectation:
$$
\begin{aligned} I(\theta) &= \mathrm{E}\left[\left(\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right)^2\right]\\ &= \frac{1}{\sigma^4}\mathrm{E}\left[ \left(\sum_{i = 1}^n (x_i - \theta)\right)^2 \right] \\ &= \frac{1}{\sigma^4}\mathrm{E}\left[ \sum_{i = 1}^n (x_i - \theta)^2 + \sum_{j \neq k}(x_j - \theta)(x_k - \theta) \right]\\ &= \frac{n}{\sigma^2} \end{aligned}
$$
The cross terms vanish because the samples are independent with $\mathrm{E}(x_i - \theta) = 0$, while each squared term contributes $\mathrm{E}[(x_i - \theta)^2] = \sigma^2$.
So the CRLB is (this is exactly the MSE of the arithmetic mean, which therefore attains the bound):
$$
\mathrm{MSE}(\hat\theta) \geq \frac{\sigma^2}{n}
$$
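A Monte Carlo check that the sample mean meets this bound, with arbitrary parameter values chosen for illustration:

```python
# Estimate the MSE of the sample-mean estimator for the DC-in-AWGN model
# and compare it against the CRLB sigma^2 / n.
import numpy as np

rng = np.random.default_rng(42)
theta, sigma, n, trials = 1.0, 2.0, 10, 200_000

x = rng.normal(theta, sigma, size=(trials, n))
theta_hat = x.mean(axis=1)                  # arithmetic-mean estimator
mse = np.mean((theta_hat - theta) ** 2)

print(mse, "vs CRLB =", sigma**2 / n)       # both approximately 0.4
```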
3. A Shortcut for Computing the Fisher Information
The Fisher information admits an alternative form:
$$
\begin{aligned} I(\theta) &= \mathrm{E}\left[\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2\right]\\ &= -\mathrm{E}\left( \frac{\partial^2}{\partial \theta^2} \ln p(\mathbf{x}, \theta) \right) \end{aligned}
$$
We now prove this result. First, use the fact that the density integrates to one:
$$
\int_{\mathbb{R}^n} p(\mathbf{x}, \theta)\, \mathrm{d}\mathbf{x} = 1
$$
Differentiate with respect to $\theta$:
$$
\frac{\partial}{\partial \theta}\int_{\mathbb{R}^n} p(\mathbf{x}, \theta)\, \mathrm{d}\mathbf{x} = 0
$$
Exchange differentiation and integration (assuming the usual regularity conditions):
$$
\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = 0
$$
This integral is in an awkward form: it is neither a probability nor an expectation. So we massage it into something meaningful:
$$
\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = \int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\, \mathrm{d}\mathbf{x} = 0
$$
Then differentiate once more:
$$
\frac{\partial }{\partial \theta}\int_{\mathbb{R}^n} \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\,\mathrm{d}\mathbf{x} = \frac{\partial }{\partial \theta}\int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\, \mathrm{d}\mathbf{x} = 0
$$
Exchanging the order of differentiation and integration in the middle expression and applying the product rule gives:
$$
\begin{aligned} \frac{\partial }{\partial \theta}\int_{\mathbb{R}^n} \left[ \frac{\partial}{\partial \theta}\ln p(\mathbf{x}, \theta) \right] \cdot p(\mathbf{x},\theta)\, \mathrm{d}\mathbf{x} &= \int_{\mathbb{R}^n} \left[ p(\mathbf{x}, \theta) \cdot \frac{\partial^2}{\partial \theta^2}\ln p(\mathbf{x}, \theta) + \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x}\\ &= \mathrm{E}\left[ \frac{\partial^2}{\partial\theta^2} \ln p(\mathbf{x}, \theta) \right] + \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x} \end{aligned}
$$
The remaining integral is again neither a probability nor an expectation, so apply the log-derivative trick once more:
$$
\begin{aligned} \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right] \mathrm{d}\mathbf{x} &= \int_{\mathbb{R}^n} \left[ \frac{\partial p(\mathbf{x}, \theta)}{\partial \theta}\cdot \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\, \frac{p(\mathbf{x}, \theta)}{p(\mathbf{x}, \theta)} \right] \mathrm{d}\mathbf{x}\\ &= \int_{\mathbb{R}^n} \left(\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right)^2 p(\mathbf{x}, \theta)\, \mathrm{d}\mathbf{x}\\ &= \mathrm{E}\left[ \left(\frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta}\right)^2 \right] \end{aligned}
$$
Substituting back, the two terms must sum to zero, and therefore:
$$
\begin{aligned} I(\theta) &= \mathrm{E}\left[\left( \frac{\partial \ln p(\mathbf{x}, \theta)}{\partial \theta} \right)^2\right]\\ &= -\mathrm{E}\left( \frac{\partial^2}{\partial \theta^2} \ln p(\mathbf{x}, \theta) \right) \end{aligned}
$$
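A symbolic verification of the identity for a single $N(\theta, \sigma^2)$ observation, assuming sympy is available; both forms should evaluate to $1/\sigma^2$:

```python
# Verify E[(d ln p / d theta)^2] == -E[d^2 ln p / d theta^2] for one
# Gaussian observation by evaluating both integrals symbolically.
import sympy as sp

x = sp.Symbol('x', real=True)
theta = sp.Symbol('theta', real=True)
sigma = sp.Symbol('sigma', positive=True)

# ln p for a single N(theta, sigma^2) observation, and p itself
log_p = -(x - theta)**2 / (2*sigma**2) - sp.log(sp.sqrt(2*sp.pi) * sigma)
p = sp.exp(log_p)

form1 = sp.integrate(sp.diff(log_p, theta)**2 * p, (x, -sp.oo, sp.oo))
form2 = -sp.integrate(sp.diff(log_p, theta, 2) * p, (x, -sp.oo, sp.oo))

print(sp.simplify(form1), sp.simplify(form2))  # both equal sigma**(-2)
```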