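Asymptotic Normality of the Classical Quantile Regression Estimator
This article uses the Bahadur representation to walk through a proof of the asymptotic normality of the quantile regression estimator. The argument shows how the Bahadur representation is applied and what it reveals about the statistical properties of the estimator.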
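Preliminaries and Notation
1. Basic idea of the Bahadur representation
The Bahadur representation was introduced by the Indian statistician R. R. Bahadur in 1966 and has become a standard tool in asymptotic theory. It was originally used to describe the asymptotic behavior of sample quantiles and was later extended to more complex statistical models.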
For a simple sample quantile, the Bahadur representation writes the estimator as a linear statistic plus a higher-order remainder. Specifically, for an i.i.d. sample $X_1, X_2, \ldots, X_n$ with $\tau$-th sample quantile $\hat{q}_n(\tau)$, the representation is
$$\hat{q}_n(\tau) - q(\tau) = \frac{1}{n f(q(\tau))}\sum_{i=1}^n [\tau - I(X_i \leq q(\tau))] + R_n$$
where:
- $q(\tau)$ is the true $\tau$-quantile of the population distribution
- $f(\cdot)$ is the probability density function
- $I(\cdot)$ is the indicator function
- $R_n$ is the remainder, which satisfies $R_n = o_p(n^{-1/2})$
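The representation can be checked numerically. The sketch below is only an illustration under assumed settings (standard normal data, $\tau = 0.3$, NumPy's default interpolated sample quantile); the helper name `mean_scaled_remainder` is hypothetical. It averages $|\sqrt{n} R_n|$ over repeated samples, which should shrink as $n$ grows if $R_n = o_p(n^{-1/2})$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
tau = 0.3
std_normal = NormalDist()
q = std_normal.inv_cdf(tau)      # true tau-quantile of N(0, 1)
f_q = std_normal.pdf(q)          # density at the true quantile

def mean_scaled_remainder(n, reps=200):
    """Average of |sqrt(n) * R_n| over `reps` simulated N(0, 1) samples."""
    total = 0.0
    for _ in range(reps):
        x = rng.standard_normal(n)
        q_hat = np.quantile(x, tau)               # sample tau-quantile
        linear = np.mean(tau - (x <= q)) / f_q    # linear term of the representation
        total += abs(np.sqrt(n) * (q_hat - q - linear))
    return total / reps

remainders = {n: mean_scaled_remainder(n) for n in (200, 20_000)}
print(remainders)
```

The scaled remainder is expected to be clearly smaller at the larger sample size, while $\sqrt{n}(\hat{q}_n(\tau) - q(\tau))$ itself stays of order one.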
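2. Technical conditions
The Bahadur representation requires some technical conditions:
- Density condition: near the true quantile $q(\tau)$ the density $f$ exists and is strictly positive, $f(q(\tau)) > 0$
- Smoothness condition: the density is Hölder continuous near the quantile
- Moment condition: the random variables have finite moments of sufficiently high order (this matters mainly for the covariates in the regression setting; the sample quantile itself needs no moment condition)
For quantile regression, additional conditions are needed:
- The design matrix (covariates) satisfies regularity conditions, for example $E[XX']$ is finite and positive definite
- The conditional density satisfies a uniform smoothness condition
- The parameter space is compact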
3. Notation
Now consider the classical linear quantile regression model:
$$Q_{Y|X}(\tau|X) = X'\beta(\tau)$$
The quantile regression estimator $\hat{\beta}(\tau)$ is obtained by minimizing the objective function
$$\min_{\beta} \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta)$$
where $\rho_\tau(u) = u(\tau - I(u < 0))$ is the quantile check function (check loss).
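As a small illustration of the check function, the sketch below evaluates $\rho_\tau$ and minimizes the intercept-only objective by brute force over the observed values. The data (exponential draws), the sample size 501 and $\tau = 0.25$ are assumptions chosen so that the minimizer is a single order statistic; `check_loss` is a hypothetical helper name.

```python
import numpy as np

def check_loss(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

rng = np.random.default_rng(1)
y = rng.exponential(size=501)
tau = 0.25

# With only an intercept, the objective is minimized at an order statistic,
# so searching over the observed values finds an exact minimizer.
objective = np.array([check_loss(y - b, tau).sum() for b in y])
b_hat = y[np.argmin(objective)]
print(b_hat, np.quantile(y, tau))
```

With these settings the brute-force minimizer coincides with the sample quantile, which is the sense in which quantile regression generalizes the ordinary sample quantile.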
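Proof
Step 1: First-order condition
We start from the subgradient of the objective function:
$$\partial \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta) = -\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta)$$
where $\psi_\tau(u) = \tau - I(u < 0)$ is the derivative of the check function away from zero. The minus sign comes from the chain rule, since the residual $Y_i - X_i'\beta$ is decreasing in $\beta$.
At the minimizer $\hat{\beta}(\tau)$ the subdifferential must contain the zero vector:
$$\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\hat{\beta}(\tau)) = 0$$
Because $\psi_\tau$ jumps at zero, this equality is exact only up to the contribution of the observations with zero residual. Under the usual moment conditions on $X_i$ that contribution is $o_p(\sqrt{n})$, so it does not affect the limit and is ignored below.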
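The first-order condition can be inspected numerically. The sketch below assumes a scalar model without intercept, $Y = X(1.5 + \varepsilon)$ with $X \sim U(1, 2)$ and standard normal $\varepsilon$, because for positive scalar $X_i$ the minimizer is an exact weighted quantile of the ratios $Y_i / X_i$. The helper names `fit_scalar_qr` and `score_sum` are hypothetical, and this fitter is a stand-in for a general linear-programming solver.

```python
import numpy as np

def fit_scalar_qr(x, y, tau):
    """Exact minimizer of sum_i rho_tau(y_i - x_i * b) when every x_i > 0.

    Dividing by x_i turns the problem into a weighted tau-quantile of the
    ratios y_i / x_i with weights x_i.
    """
    ratio = y / x
    order = np.argsort(ratio)
    cum_w = np.cumsum(x[order])
    k = np.searchsorted(cum_w, tau * cum_w[-1])   # first index with cum_w >= tau * total
    return ratio[order][k]

def score_sum(x, y, b, tau):
    """sum_i x_i * psi_tau(y_i - x_i * b), with psi_tau(u) = tau - 1{u < 0}."""
    return np.sum(x * (tau - (y - x * b < 0)))

rng = np.random.default_rng(2)
n, tau = 1000, 0.7
x = rng.uniform(1.0, 2.0, size=n)
y = x * (1.5 + rng.standard_normal(n))

b_hat = fit_scalar_qr(x, y, tau)
score_at_fit = score_sum(x, y, b_hat, tau)
score_away = score_sum(x, y, b_hat + 0.5, tau)
print(score_at_fit, x.max(), score_away)
```

At the fitted value the score sum is bounded by the largest $|X_i|$, the contribution of the single interpolated observation, whereas away from the fit it grows in proportion to $n$.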
Step 2: Taylor expansion
Define the function
$$Z_n(\beta) = \frac{1}{n}\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta)$$
By the first-order condition, $Z_n(\hat{\beta}(\tau)) = 0$.
Expand $Z_n(\beta)$ around the true parameter $\beta(\tau)$:
$$Z_n(\hat{\beta}(\tau)) = Z_n(\beta(\tau)) + \frac{\partial Z_n(\beta)}{\partial \beta'}\bigg|_{\beta = \tilde{\beta}} \cdot (\hat{\beta}(\tau) - \beta(\tau))$$
where $\tilde{\beta}$ lies between $\hat{\beta}(\tau)$ and $\beta(\tau)$.
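Step 3: The derivative matrix
We need the derivative of $Z_n(\beta)$ with respect to $\beta$. Since $\psi_\tau$ is a step function, $Z_n$ is piecewise constant in $\beta$, so the derivative is understood as that of the conditional mean of $Z_n(\beta)$ given $X_1, \ldots, X_n$; the gap between $Z_n$ and its conditional mean is controlled by a stochastic equicontinuity argument under the regularity conditions listed earlier:
$$\frac{\partial Z_n(\beta)}{\partial \beta'} = -\frac{1}{n}\sum_{i=1}^n X_i X_i' \cdot f_{Y|X}(X_i'\beta|X_i)$$
where $f_{Y|X}(\cdot|X_i)$ is the conditional density of $Y$ given $X_i$.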
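The formula can be verified by first averaging out $Y_i$ given $X_i$. The short derivation below is a sketch, assuming the conditional distribution function $F_{Y|X}(\cdot|X_i)$ (a symbol introduced here) is differentiable with density $f_{Y|X}$ near $X_i'\beta$:

$$
\begin{aligned}
E[\psi_\tau(Y_i - X_i'\beta) \mid X_i] &= \tau - P(Y_i < X_i'\beta \mid X_i) = \tau - F_{Y|X}(X_i'\beta \mid X_i), \\
\frac{\partial}{\partial \beta'} E[X_i \, \psi_\tau(Y_i - X_i'\beta) \mid X_i] &= -X_i X_i' \cdot f_{Y|X}(X_i'\beta \mid X_i).
\end{aligned}
$$

Averaging the second line over $i = 1, \ldots, n$ gives the expression for the derivative matrix.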
To simplify notation, define
$$D_n(\beta) = -\frac{\partial Z_n(\beta)}{\partial \beta'} = \frac{1}{n}\sum_{i=1}^n X_i X_i' \cdot f_{Y|X}(X_i'\beta|X_i)$$
As $n \to \infty$, the law of large numbers implies that $D_n(\beta(\tau))$ converges in probability to
$$D(\tau) = E[X X' \cdot f_{Y|X}(X'\beta(\tau)|X)]$$
Step 4: The Bahadur representation
Combine the Taylor expansion with the first-order condition:
$$0 = Z_n(\hat{\beta}(\tau)) = Z_n(\beta(\tau)) - D_n(\tilde{\beta}) \cdot (\hat{\beta}(\tau) - \beta(\tau))$$
Rearranging gives
$$\hat{\beta}(\tau) - \beta(\tau) = [D_n(\tilde{\beta})]^{-1} \cdot Z_n(\beta(\tau))$$
which can be rewritten as
$$\hat{\beta}(\tau) - \beta(\tau) = [D(\tau)]^{-1} \cdot Z_n(\beta(\tau)) + R_n$$
with remainder
$$R_n = \{[D_n(\tilde{\beta})]^{-1} - [D(\tau)]^{-1}\} \cdot Z_n(\beta(\tau))$$
Under suitable regularity conditions (for example, smoothness and moment conditions on the design and the conditional density), $R_n = o_p(n^{-1/2})$. The reason is that $Z_n(\beta(\tau)) = O_p(n^{-1/2})$ by the central limit theorem, while $[D_n(\tilde{\beta})]^{-1} - [D(\tau)]^{-1} = o_p(1)$ provided that $\hat{\beta}(\tau)$, and hence $\tilde{\beta}$, is consistent for $\beta(\tau)$ and $D(\tau)$ is invertible.
Multiplying by $\sqrt{n}$ gives the Bahadur representation:
$$\sqrt{n}(\hat{\beta}(\tau) - \beta(\tau)) = [D(\tau)]^{-1} \cdot \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau)) + o_p(1)$$
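The regression version of the representation can be checked by simulation. The sketch below assumes a scalar location-scale model $Y = X(1 + \varepsilon)$ with $X \sim U(1, 2)$ and standard normal $\varepsilon$, for which $\beta(\tau) = 1 + z_\tau$ and $D(\tau) = \phi(z_\tau) E[X]$; the model, the value $\tau = 0.8$ and the helper `fit_scalar_qr` are illustrative assumptions, not part of the derivation.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
tau = 0.8
z_tau = NormalDist().inv_cdf(tau)
beta_tau = 1.0 + z_tau                  # true coefficient in Y = X * (1 + eps)
D = NormalDist().pdf(z_tau) * 1.5       # E[X^2 f(X beta | X)] = phi(z_tau) * E[X], with E[X] = 1.5

def fit_scalar_qr(x, y, tau):
    """Exact scalar quantile regression fit for x_i > 0 (weighted quantile of y/x)."""
    ratio = y / x
    order = np.argsort(ratio)
    cum_w = np.cumsum(x[order])
    return ratio[order][np.searchsorted(cum_w, tau * cum_w[-1])]

def scaled_remainder(n):
    """sqrt(n) * (beta_hat - beta - D^{-1} Z_n(beta)) for one simulated sample."""
    x = rng.uniform(1.0, 2.0, size=n)
    y = x * (1.0 + rng.standard_normal(n))
    z_n = np.mean(x * (tau - (y - x * beta_tau < 0)))
    return np.sqrt(n) * (fit_scalar_qr(x, y, tau) - beta_tau - z_n / D)

results = {n: np.mean([abs(scaled_remainder(n)) for _ in range(200)])
           for n in (250, 4000)}
print(results)
```

The average scaled remainder should decrease with $n$, consistent with the $o_p(1)$ term in the representation.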
Step 5: Central limit theorem
Next consider the random term in the representation:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau))$$
The summands have the following properties:
- They are i.i.d. random vectors, because the original observations are i.i.d.
- They have mean zero. By the law of iterated expectations,
$$E[X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau))] = E[X_i \cdot E[\psi_\tau(Y_i - X_i'\beta(\tau))|X_i]]$$
Since $X_i'\beta(\tau)$ is the conditional $\tau$-quantile, $P(Y_i \leq X_i'\beta(\tau)|X_i) = \tau$, which for a continuous conditional distribution implies
$$E[\psi_\tau(Y_i - X_i'\beta(\tau))|X_i] = \tau - P(Y_i \leq X_i'\beta(\tau)|X_i) = \tau - \tau = 0$$
and therefore $E[X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau))] = 0$.
- They have finite second moments (under suitable moment conditions)
By the multivariate central limit theorem,
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau)) \xrightarrow{d} N(0, J(\tau))$$
with covariance matrix $J(\tau) = E[X_i X_i' \cdot \psi_\tau(Y_i - X_i'\beta(\tau))^2]$.
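The two moment properties of the normalized score can be checked by simulation. The sketch below again assumes the scalar model $Y = X(1 + \varepsilon)$ with $X \sim U(1, 2)$, standard normal $\varepsilon$ and $\tau = 0.8$; `draw_scores` is a hypothetical helper. It compares the Monte Carlo mean and variance of the normalized sum with zero and with an independent estimate of $J(\tau)$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
tau = 0.8
beta_tau = 1.0 + NormalDist().inv_cdf(tau)

def draw_scores(size):
    """Draw X_i * psi_tau(Y_i - X_i * beta(tau)) for Y = X * (1 + eps)."""
    x = rng.uniform(1.0, 2.0, size=size)
    y = x * (1.0 + rng.standard_normal(size))
    return x * (tau - (y - x * beta_tau < 0))

n, reps = 400, 5000
s_n = draw_scores((reps, n)).sum(axis=1) / np.sqrt(n)   # one normalized sum per row
j_hat = np.mean(draw_scores(1_000_000) ** 2)            # direct estimate of J(tau)
print(s_n.mean(), s_n.var(), j_hat)
```

A histogram of `s_n` would be the natural next step for judging normality by eye; the sketch only compares the first two moments.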
Step 6: The covariance matrix
Observe that
$$\psi_\tau(u)^2 = (\tau - I(u < 0))^2 = \tau^2 \cdot I(u \geq 0) + (1-\tau)^2 \cdot I(u < 0)$$
Given $X_i$, we have
$$P(Y_i - X_i'\beta(\tau) < 0|X_i) = \tau \quad \text{and} \quad P(Y_i - X_i'\beta(\tau) \geq 0|X_i) = 1-\tau$$
Therefore
$$E[\psi_\tau(Y_i - X_i'\beta(\tau))^2|X_i] = \tau^2 \cdot (1-\tau) + (1-\tau)^2 \cdot \tau = \tau(1-\tau)$$
and hence
$$J(\tau) = \tau(1-\tau) \cdot E[X_i X_i']$$
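The algebra behind $\tau(1-\tau)$ can be confirmed exactly with rational arithmetic. This is a minimal check of the two-point calculation only, assuming $P(U < 0) = \tau$; `second_moment` is a hypothetical helper name.

```python
from fractions import Fraction

def second_moment(tau):
    """E[psi_tau(U)^2] when P(U < 0) = tau, computed from the two-point law."""
    # psi equals tau - 1 with probability tau, and tau with probability 1 - tau.
    return (tau - 1) ** 2 * tau + tau ** 2 * (1 - tau)

for tau in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(tau, second_moment(tau), tau * (1 - tau))
```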
Step 7: Asymptotic normality
Combine the central limit theorem with the Bahadur representation:
$$\sqrt{n}(\hat{\beta}(\tau) - \beta(\tau)) = [D(\tau)]^{-1} \cdot \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i \cdot \psi_\tau(Y_i - X_i'\beta(\tau)) + o_p(1)$$
By Slutsky's theorem,
$$\sqrt{n}(\hat{\beta}(\tau) - \beta(\tau)) \xrightarrow{d} N(0, [D(\tau)]^{-1} \cdot J(\tau) \cdot [D(\tau)]^{-1})$$
Substituting $J(\tau) = \tau(1-\tau) \cdot E[X_i X_i']$:
$$\sqrt{n}(\hat{\beta}(\tau) - \beta(\tau)) \xrightarrow{d} N\left(0, \tau(1-\tau) \cdot [D(\tau)]^{-1} \cdot E[X_i X_i'] \cdot [D(\tau)]^{-1}\right)$$
where
$$D(\tau) = E[X_i X_i' \cdot f_{Y|X}(X_i'\beta(\tau)|X_i)]$$
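A Monte Carlo experiment gives a rough check of the limiting variance. The sketch below assumes the scalar heteroskedastic model $Y = X(1 + \varepsilon)$ with $X \sim U(1, 2)$ and standard normal $\varepsilon$, so that $D(\tau) = \phi(z_\tau) E[X]$ and the asymptotic variance is $\tau(1-\tau) E[X^2] / D(\tau)^2$. The model, the settings and the helper `fit_scalar_qr` are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(5)
tau, n, reps = 0.8, 500, 2000
z_tau = NormalDist().inv_cdf(tau)
beta_tau = 1.0 + z_tau

def fit_scalar_qr(x, y, tau):
    """Exact scalar quantile regression fit for x_i > 0 (weighted quantile of y/x)."""
    ratio = y / x
    order = np.argsort(ratio)
    cum_w = np.cumsum(x[order])
    return ratio[order][np.searchsorted(cum_w, tau * cum_w[-1])]

# Population quantities for X ~ Uniform(1, 2): E[X] = 3/2 and E[X^2] = 7/3.
D = NormalDist().pdf(z_tau) * 1.5
asym_var = tau * (1 - tau) * (7 / 3) / D ** 2

draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1.0, 2.0, size=n)
    y = x * (1.0 + rng.standard_normal(n))
    draws[r] = np.sqrt(n) * (fit_scalar_qr(x, y, tau) - beta_tau)

print(draws.mean(), draws.var(), asym_var)
```

The simulated variance of $\sqrt{n}(\hat{\beta}(\tau) - \beta(\tau))$ should be close to the sandwich formula, with some finite-sample and simulation error.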
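Conclusion and Intuition
This completes the derivation of the asymptotic normality of the quantile regression estimator $\hat{\beta}(\tau)$. The core of the argument is the Bahadur representation, which linearizes a nonlinear estimation problem so that the central limit theorem can be applied directly.
Intuitively, asymptotic normality means that for large samples the deviation of the estimator from the true parameter, scaled by $\sqrt{n}$, is approximately normally distributed. The structure of the asymptotic variance shows:
- The variance is proportional to $\tau(1-\tau)$, which is largest at $\tau = 1/2$ and shrinks as $\tau$ approaches 0 or 1. Precision nevertheless usually deteriorates at extreme quantiles, because the conditional density there is typically small and enters the variance inversely.
- The conditional density $f_{Y|X}$ enters through $[D(\tau)]^{-1}$, that is, in the denominator: the higher the conditional density near the quantile, the more precise the estimate.
- The second moment of the design, $E[X_i X_i']$, affects precision, as in linear regression.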
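The trade-off between the factor $\tau(1-\tau)$ and the density can be seen in the intercept-only case, where the asymptotic variance reduces to $\tau(1-\tau)/f(q(\tau))^2$. The sketch below evaluates this for a standard normal distribution, which is an assumed example; `quantile_asym_var` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()

def quantile_asym_var(tau):
    """tau * (1 - tau) / f(q(tau))^2 for the N(0, 1) sample quantile."""
    q = std_normal.inv_cdf(tau)
    return tau * (1 - tau) / std_normal.pdf(q) ** 2

for tau in (0.5, 0.9, 0.99):
    print(tau, tau * (1 - tau), quantile_asym_var(tau))
```

Although $\tau(1-\tau)$ falls as $\tau$ moves toward 1, the squared density falls faster, so the asymptotic variance rises in the tail.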
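The Bahadur representation is not only a tool for proving asymptotic normality. It also provides the theoretical basis for confidence intervals and hypothesis tests, which is part of what makes quantile regression a practical inferential method.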
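As an illustration of how the limit result leads to a confidence interval, the sketch below builds a Wald interval from a plug-in sandwich variance in the scalar model $Y = X(1 + \varepsilon)$. Here $D(\tau)$ is estimated with a uniform-kernel (Powell-type) estimator; the bandwidth `h = 0.3`, the model, and the helper names `fit_scalar_qr` and `wald_interval` are assumptions made for this example, not recommendations.

```python
import numpy as np
from statistics import NormalDist

def fit_scalar_qr(x, y, tau):
    """Exact scalar quantile regression fit for x_i > 0 (weighted quantile of y/x)."""
    ratio = y / x
    order = np.argsort(ratio)
    cum_w = np.cumsum(x[order])
    return ratio[order][np.searchsorted(cum_w, tau * cum_w[-1])]

def wald_interval(x, y, tau, h, level=0.95):
    """Wald interval for beta(tau) from the plug-in sandwich variance."""
    n = len(y)
    b_hat = fit_scalar_qr(x, y, tau)
    resid = y - x * b_hat
    # Uniform-kernel estimate of D(tau) = E[X^2 f(X beta | X)]; h is an assumed bandwidth.
    d_hat = np.sum((np.abs(resid) <= h) * x ** 2) / (2 * n * h)
    j_hat = tau * (1 - tau) * np.mean(x ** 2)
    se = np.sqrt(j_hat / d_hat ** 2 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b_hat - z * se, b_hat + z * se

rng = np.random.default_rng(6)
tau, n, reps = 0.8, 1000, 300
beta_tau = 1.0 + NormalDist().inv_cdf(tau)
hits = 0
for _ in range(reps):
    x = rng.uniform(1.0, 2.0, size=n)
    y = x * (1.0 + rng.standard_normal(n))
    lo, hi = wald_interval(x, y, tau, h=0.3)
    hits += lo <= beta_tau <= hi
coverage = hits / reps
print(coverage)
```

The empirical coverage should be in the neighborhood of the nominal 95%, though it depends on the bandwidth and the sample size.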
References
Bahadur, R. Raj. “A Note on Quantiles in Large Samples.” The Annals of Mathematical Statistics, vol. 37, no. 3, 1966, pp. 577-580.
Belloni, Alexandre, et al. “Conditional Quantile Processes Based on Series or Many Regressors.” arXiv, 2011, arXiv:1105.6154.
Chernozhukov, Victor, et al. “Inference on Counterfactual Distributions.” Econometrica, vol. 81, no. 6, 2013, pp. 2205-2268.
Gutenbrunner, Christian, and Jana Jurečková. “Regression Rank Scores and Regression Quantiles.” The Annals of Statistics, vol. 20, no. 1, 1992, pp. 305-330.
He, Xuming, and Qi-Man Shao. “A General Bahadur Representation of M-Estimators and Its Application to Linear Regression with Nonstochastic Designs.” The Annals of Statistics, vol. 24, no. 6, 1996, pp. 2608-2630.
Jurečková, Jana. “Asymptotic Relations of M-Estimates and R-Estimates in Linear Regression Model.” The Annals of Statistics, vol. 5, no. 3, 1977, pp. 464-472.
Kato, Kengo. “Asymptotic Normality of Powell’s Kernel Estimator.” Annals of the Institute of Statistical Mathematics, vol. 64, no. 2, 2012, pp. 255-273.
Knight, Keith. “Limiting Distributions for L1 Regression Estimators Under General Conditions.” The Annals of Statistics, vol. 26, no. 2, 1998, pp. 755-770.
Koenker, Roger. Quantile Regression. Cambridge University Press, 2005.
Koenker, Roger, and Gilbert Bassett. “Regression Quantiles.” Econometrica, vol. 46, no. 1, 1978, pp. 33-50.
Koenker, Roger, and Stephen Portnoy. “L-Estimation for Linear Models.” Journal of the American Statistical Association, vol. 82, no. 399, 1987, pp. 851-857.
Pollard, David. “Asymptotics for Least Absolute Deviation Regression Estimators.” Econometric Theory, vol. 7, no. 2, 1991, pp. 186-199.
Portnoy, Stephen. “Asymptotic Behavior of Regression Quantiles in Non-Stationary, Dependent Cases.” Journal of Multivariate Analysis, vol. 38, no. 1, 1991, pp. 100-113.
van der Vaart, Aad W., and Jon A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
Welsh, Alan H. “On M-Processes and M-Estimation.” The Annals of Statistics, vol. 17, no. 1, 1989, pp. 337-361.