Asymptotic Distribution of the Maximum Likelihood Estimator
Denote the likelihood function by
L(\theta)=\prod_{i=1}^{n}f(X_i;\theta)
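As a concrete illustration (an example added here, not part of the original argument), for the exponential density $f(x;\theta)=\theta e^{-\theta x}$ this becomes
L(\theta)=\theta^{n}e^{-\theta\sum_{i=1}^{n}X_i}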
Let $l(\theta)=\log L(\theta)$ be the log-likelihood, let $\theta$ denote the true parameter value, and let $\hat{\theta}$ be the maximum likelihood estimate. A first-order Taylor expansion of the score around $\theta$, together with the fact that the score vanishes at the MLE, gives
\frac{\partial l(\hat{\theta})}{\partial \theta} = \frac{\partial l(\theta)}{\partial \theta}+\frac{\partial^2 l(\theta)}{\partial \theta^2}(\hat{\theta}-\theta)=0
Hence
\sqrt{n}(\hat{\theta}-\theta)=-\sqrt{n}\,\frac{l'(\theta)}{l''(\theta)}=\frac{(1/\sqrt{n})\,l'(\theta)}{-(1/n)\,l''(\theta)}
(i) Since
\begin{aligned}
\frac{1}{\sqrt{n}}l'(\theta)&=\frac{1}{\sqrt{n}}\sum_i \frac{\partial \log f(X_i;\theta)}{\partial \theta}\\
&=\sqrt{n}\,\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta},
\end{aligned}
and since $\mathbb{E}[\partial \log f(X_i;\theta)/\partial\theta]=0$ and $\mathbb{V}[\partial \log f(X_i;\theta)/\partial\theta]=I(\theta)$ (see my earlier post on the Fisher information matrix), the central limit theorem gives
\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta}\rightsquigarrow N(0,I(\theta)/n)
and therefore
\sqrt{n}\,\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta}\rightsquigarrow N(0,I(\theta))
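In the exponential illustration above (again, my added example), the score of a single observation is
\frac{\partial \log f(X_i;\theta)}{\partial\theta}=\frac{1}{\theta}-X_i,\qquad \mathbb{E}\!\left[\frac{1}{\theta}-X_i\right]=0,\qquad \mathbb{V}\!\left[\frac{1}{\theta}-X_i\right]=\frac{1}{\theta^{2}}=I(\theta),
so in that case $(1/\sqrt{n})\,l'(\theta)\rightsquigarrow N(0,1/\theta^{2})$.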
(ii) Since
-(1/n)\,l''(\theta)=-\frac{1}{n}\sum_i \frac{\partial^2 \log f(X_i;\theta)}{\partial \theta^2}
and since $\mathbb{E}\left[-\frac{\partial^2 \log f(X_i;\theta)}{\partial \theta^2}\right]=I(\theta)$, the law of large numbers gives
-(1/n)\,l''(\theta)\xrightarrow{P} I(\theta)
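(In the exponential illustration this step is exact: $\partial^{2}\log f(X_i;\theta)/\partial\theta^{2}=-1/\theta^{2}$, so $-(1/n)\,l''(\theta)=1/\theta^{2}=I(\theta)$ for every $n$.)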
Combining (i) and (ii) and applying Slutsky's theorem,
\frac{(1/\sqrt{n})\,l'(\theta)}{-(1/n)\,l''(\theta)}\rightsquigarrow N\left(0,\frac{I(\theta)}{I(\theta)^2}\right)=N\left(0,I(\theta)^{-1}\right)
Therefore the maximum likelihood estimator is asymptotically normal, with $\sqrt{n}(\hat{\theta}-\theta)\rightsquigarrow N(0,I(\theta)^{-1})$.
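As a quick numerical check (a simulation sketch added by me, not part of the original post; it assumes the exponential model used in the illustrations above, whose MLE is $\hat{\theta}=1/\bar{X}$ and whose Fisher information is $I(\theta)=1/\theta^{2}$):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0   # true rate of the exponential density f(x; theta) = theta * exp(-theta * x)
n = 5000      # sample size per replication
reps = 2000   # number of Monte Carlo replications

# numpy parameterizes the exponential by its scale, which is 1/theta here
samples = rng.exponential(scale=1.0 / theta, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)    # MLE of theta in each replication
z = np.sqrt(n) * (theta_hat - theta)      # centered and scaled estimator

print("empirical variance of sqrt(n)*(theta_hat - theta):", z.var())
print("theoretical variance I(theta)^(-1) = theta^2:     ", theta ** 2)
```

The two printed numbers should be close, and a histogram of `z` is approximately normal.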
Next, we show that the weighted maximum likelihood estimator also has an asymptotic normal distribution.
Theorem 1. (Hidetoshi, 2000)
Under suitable regularity conditions (roughly, the model is sufficiently smooth), let $\theta$ denote the weighted maximum likelihood estimator and $\theta^*$ the true value. Then $\sqrt{n}(\theta-\theta^*)$ has the asymptotic normal distribution $N(0,H^{-1}GH^{-1})$, where $H$ and $G$ are $m\times m$ nonsingular matrices defined by
G=E\left[\left.\frac{\partial l_w(x,y|\theta)}{\partial \theta}\right|_{\theta^{*}}\left.\frac{\partial l_w(x,y|\theta)}{\partial \theta^T}\right|_{\theta^{*}}\right]
H=E\left[\left.\frac{\partial^2 l_w(x,y|\theta)}{\partial \theta\,\partial \theta^T}\right|_{\theta^{*}}\right]
where
l_w(x,y|\theta)=-w(x)\log p(y|x,\theta)
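For concreteness (an illustrative model I choose here; it is not the setting of the original post), if $p(y|x,\theta)$ is the Gaussian regression model $N(y;x^{T}\theta,1)$, then
l_w(x,y|\theta)=w(x)\left[\tfrac{1}{2}(y-x^{T}\theta)^{2}+\tfrac{1}{2}\log(2\pi)\right],
so minimizing $\sum_i l_w(x_i,y_i|\theta)$ is just weighted least squares with weights $w(x_i)$.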
Proof.
The idea of the proof is the same as for the ordinary maximum likelihood estimator.
The weighted maximum likelihood estimator $\theta$ satisfies the estimating equation
\sum_i \frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}=0
Expanding this score in a first-order Taylor expansion around the true value $\theta^*$ gives
\sum_i \left.\frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\right|_{\theta=\theta^*}+\sum_i \left.\frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\theta^*}(\theta-\theta^*)=0
Hence
\sqrt{n}(\theta-\theta^*)=-\left[n^{-1}\sum_i \left.\frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\theta^*}\right]^{-1} n^{-1/2}\sum_i \left.\frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\right|_{\theta=\theta^*}
or, equivalently,
n^{-1}\sum_i \left.\frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\theta^*}\,\sqrt{n}(\theta-\theta^*)=-n^{-1/2}\sum_i \left.\frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\right|_{\theta=\theta^*}
By the central limit theorem, the right-hand side converges in distribution to $N(0,G)$, while by the law of large numbers the averaged Hessian on the left converges in probability to $H$, so the left-hand side behaves asymptotically like $H\sqrt{n}(\theta-\theta^*)$. The conclusion follows immediately:
\sqrt{n}(\theta-\theta^*)\rightsquigarrow N(0,H^{-1}GH^{-1})
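A minimal numerical sketch of the sandwich covariance (my own illustration under the Gaussian regression model assumed above; the weight function and all variable names are made up for the example). It estimates $H$ and $G$ by sample averages and compares $\hat{H}^{-1}\hat{G}\hat{H}^{-1}$ with the Monte Carlo covariance of $\sqrt{n}(\theta-\theta^*)$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, reps = 2, 2000, 1000
theta_star = np.array([1.0, -0.5])      # true parameter

def weight(X):
    # an arbitrary positive weight function standing in for w(x)
    return 1.0 + X[:, 0] ** 2

est = np.empty((reps, m))
for r in range(reps):
    X = rng.normal(size=(n, m))
    y = X @ theta_star + rng.normal(size=n)   # well-specified model p(y|x) = N(x^T theta*, 1)
    wt = weight(X)
    # weighted MLE: solves sum_i w_i x_i (y_i - x_i^T theta) = 0  (weighted least squares)
    est[r] = np.linalg.solve((X * wt[:, None]).T @ X, (X * wt[:, None]).T @ y)

# plug-in estimates of H and G from the last replication, evaluated at theta_hat
resid = y - X @ est[-1]
H_hat = (X * wt[:, None]).T @ X / n
G_hat = (X * (wt ** 2 * resid ** 2)[:, None]).T @ X / n
sandwich = np.linalg.inv(H_hat) @ G_hat @ np.linalg.inv(H_hat)

print("sandwich estimate H^{-1} G H^{-1}:\n", sandwich)
print("Monte Carlo covariance of sqrt(n)*(theta_hat - theta_star):\n", n * np.cov(est.T))
```

The two matrices should agree up to Monte Carlo error, which is exactly the content of Theorem 1 in this special case.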