Covariance Matrix
Covariance
Definition: Let $X$ and $Y$ be two random variables. The covariance of $X$ and $Y$ is
$$
Cov[X,Y]=\sigma_{XY}=E[(X-E[X])(Y-E[Y])]=E[XY]-E[X]E[Y]
$$
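The sketch below checks that the two expressions for the covariance agree. It rests on an assumption: a finite list of equally likely outcomes is treated as the whole distribution, so $E[\cdot]$ is a plain average. The data and the helper names (`mean`, `cov_centered`, `cov_moments`) are hypothetical and for illustration only. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def mean(values):
    """E[V] when every listed outcome is equally likely (exact arithmetic)."""
    return Fraction(sum(values), len(values))

def cov_centered(xs, ys):
    """Cov[X,Y] = E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def cov_moments(xs, ys):
    """Cov[X,Y] = E[XY] - E[X]E[Y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

# Hypothetical joint distribution: five equally likely outcomes (x_k, y_k).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(cov_centered(xs, ys), cov_moments(xs, ys))  # 6/5 6/5
```

With exact arithmetic the two forms agree. With floating-point data whose means are large relative to their spread, the $E[XY]-E[X]E[Y]$ form can lose precision through cancellation. For that reason the centered form is usually preferred numerically.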
Covariance Matrix
Definition: Let $\sigma_{ij}$ denote the covariance of the $i$-th component $X_i$ and the $j$-th component $X_j$ of the $n$-dimensional random variable $(X_1,X_2,\dots,X_n)$. The matrix
$$
\Sigma=
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \dots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \dots & \sigma_{2n} \\
\vdots & \vdots & & \vdots \\
\sigma_{n1} & \sigma_{n2} & \dots & \sigma_{nn}
\end{pmatrix}
$$
is called the covariance matrix of the $n$-dimensional random variable $(X_1,X_2,\dots,X_n)$.
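To make the entrywise definition concrete, here is a minimal sketch that fills $\Sigma$ one entry at a time with $\sigma_{ij}=Cov[X_i,X_j]$. It assumes a hypothetical three-dimensional random variable given as five equally likely outcomes, so $E[\cdot]$ is a plain average. `covariance_matrix` is an illustrative helper, not a library function.

```python
from fractions import Fraction

def mean(values):
    """E[V] over equally likely outcomes (exact arithmetic)."""
    return Fraction(sum(values), len(values))

def cov(xs, ys):
    """Cov[X,Y] = E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def covariance_matrix(samples):
    """samples: equally likely outcomes, each a tuple (x_1, ..., x_n)."""
    columns = list(zip(*samples))  # columns[i] holds every outcome of X_i
    n = len(columns)
    return [[cov(columns[i], columns[j]) for j in range(n)] for i in range(n)]

# Hypothetical outcomes of (X_1, X_2, X_3).
samples = [(1, 2, 0), (2, 4, 1), (3, 5, 0), (4, 4, 1), (5, 5, 3)]
sigma = covariance_matrix(samples)
for row in sigma:
    print([str(v) for v in row])
```

The diagonal entries are the variances $\sigma_{ii}=Cov[X_i,X_i]$. The result is symmetric because $Cov[X_i,X_j]=Cov[X_j,X_i]$.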
Let $\Sigma$ be the covariance matrix of the $n$-dimensional random variable $(X_1,X_2,\dots,X_n)$. Then $\Sigma$ has the following properties:
- ① $\Sigma$ is a positive semidefinite matrix.
- ② If the random variables $X_1,X_2,\dots,X_n$ are mutually independent, then $\Sigma$ is a diagonal matrix.
**Proof:** ① By definition, $\sigma_{ij}=Cov[X_i,X_j]=Cov[X_j,X_i]=\sigma_{ji}$, so $\Sigma$ is a symmetric matrix.
For any $x\in\mathbb{R}^n$,
$$
\begin{aligned}
x^T\Sigma x&=\sum_{i=1}^n\sum_{j=1}^n(\sigma_{ij}x_ix_j)\\
&=\sum_{i=1}^n\sum_{j=1}^n\left(E[(X_i-E[X_i])(X_j-E[X_j])]x_ix_j\right) \\
&=E\left[\sum_{i=1}^n\sum_{j=1}^n(X_i-E[X_i])(X_j-E[X_j])x_ix_j\right] \\
&=E\left[\left(\sum_{i=1}^n(X_i-E[X_i])x_i\right)^2\right] \\
&\geqslant 0
\end{aligned}
$$
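Therefore, the covariance matrix $\Sigma$ is positive semidefinite.

② If $X_1,X_2,\dots,X_n$ are mutually independent, then $E[X_iX_j]=E[X_i]E[X_j]$ for every $i\neq j$. Hence $\sigma_{ij}=E[X_iX_j]-E[X_i]E[X_j]=0$, every off-diagonal entry of $\Sigma$ vanishes, and $\Sigma$ is diagonal.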
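Both properties can be spot-checked with exact arithmetic. The sketch below rests on two assumptions: the distributions are hypothetical finite sets of equally likely outcomes, and the helper names are illustrative only.

For ①, it evaluates $x^T\Sigma x$ directly and compares it with the last line of the proof, $E\left[\left(\sum_i (X_i-E[X_i])x_i\right)^2\right]$.

For ②, it builds a pair $(X_1,X_2)$ that is independent by construction: every combination of a value of $X_1$ with a value of $X_2$ is equally likely, generated with `itertools.product`.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """E[V] over equally likely outcomes (exact arithmetic)."""
    return Fraction(sum(values), len(values))

def covariance_matrix(samples):
    """sigma_ij = E[(X_i - E[X_i])(X_j - E[X_j])] over equally likely outcomes."""
    cols = list(zip(*samples))
    mus = [mean(c) for c in cols]
    n = len(cols)
    return [[mean([(s[i] - mus[i]) * (s[j] - mus[j]) for s in samples])
             for j in range(n)] for i in range(n)]

def quadratic_form(sigma, x):
    """x^T Sigma x = sum_i sum_j sigma_ij x_i x_j."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def variance_of_projection(samples, x):
    """E[(sum_i (X_i - E[X_i]) x_i)^2], the last line of the proof of (1)."""
    cols = list(zip(*samples))
    mus = [mean(c) for c in cols]
    return mean([sum((s[i] - mus[i]) * x[i] for i in range(len(x))) ** 2
                 for s in samples])

# Property (1): hypothetical outcomes of (X_1, X_2, X_3).
samples = [(1, 2, 0), (2, 4, 1), (3, 5, 0), (4, 4, 1), (5, 5, 3)]
sigma = covariance_matrix(samples)
for x in [(1, 0, 0), (1, -1, 0), (2, -3, 1), (-1, 2, -2)]:
    q = quadratic_form(sigma, x)
    assert q == variance_of_projection(samples, x) and q >= 0
    print(x, q)

# Property (2): X_1 uniform on {0, 1, 4}, X_2 uniform on {-1, 3}, all pairs
# equally likely, so X_1 and X_2 are independent by construction.
independent = list(product([0, 1, 4], [-1, 3]))
ind = covariance_matrix(independent)
print([[str(v) for v in row] for row in ind])
```

The off-diagonal entries of `ind` come out exactly zero. The converse of ② does not hold: zero covariance alone does not imply independence.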
Write the $n$-dimensional random variable as the column vector $X=\begin{bmatrix}X_1\\X_2\\\vdots\\X_n\end{bmatrix}$. Then the covariance matrix of $X$ is
$$
\Sigma=E[(X-E[X])(X-E[X])^T]=E[XX^T]-E[X]E[X^T]
$$
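**Proof:** The expectation of a random matrix or vector is taken entrywise. The last steps use linearity of expectation and the fact that $E[X]$ is a constant vector. Hence $E[XE[X]^T]=E[X]E[X]^T$ and $E[E[X]E[X]^T]=E[X]E[X]^T$.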
$$
\begin{aligned}
\Sigma&=
\begin{bmatrix}
Cov[X_1,X_1] & Cov[X_1,X_2] & \cdots & Cov[X_1,X_n] \\
Cov[X_2,X_1] & Cov[X_2,X_2] & \cdots & Cov[X_2,X_n] \\
\vdots & \vdots & & \vdots \\
Cov[X_n,X_1] & Cov[X_n,X_2] & \cdots & Cov[X_n,X_n]
\end{bmatrix} \\
&=\begin{bmatrix}
E[(X_1-E[X_1])(X_1-E[X_1])] & E[(X_1-E[X_1])(X_2-E[X_2])] & \cdots & E[(X_1-E[X_1])(X_n-E[X_n])] \\
E[(X_2-E[X_2])(X_1-E[X_1])] & E[(X_2-E[X_2])(X_2-E[X_2])] & \cdots & E[(X_2-E[X_2])(X_n-E[X_n])] \\
\vdots & \vdots & & \vdots \\
E[(X_n-E[X_n])(X_1-E[X_1])] & E[(X_n-E[X_n])(X_2-E[X_2])] & \cdots & E[(X_n-E[X_n])(X_n-E[X_n])]
\end{bmatrix} \\
&=E\begin{bmatrix}
(X_1-E[X_1])(X_1-E[X_1]) & (X_1-E[X_1])(X_2-E[X_2]) & \cdots & (X_1-E[X_1])(X_n-E[X_n]) \\
(X_2-E[X_2])(X_1-E[X_1]) & (X_2-E[X_2])(X_2-E[X_2]) & \cdots & (X_2-E[X_2])(X_n-E[X_n]) \\
\vdots & \vdots & & \vdots \\
(X_n-E[X_n])(X_1-E[X_1]) & (X_n-E[X_n])(X_2-E[X_2]) & \cdots & (X_n-E[X_n])(X_n-E[X_n])
\end{bmatrix} \\
&=E\left[
\begin{bmatrix}
X_1-E[X_1] \\
X_2-E[X_2] \\
\vdots \\
X_n-E[X_n]
\end{bmatrix}
\begin{bmatrix}
X_1-E[X_1] & X_2-E[X_2] & \cdots & X_n-E[X_n]
\end{bmatrix}
\right] \\
&=E[(X-E[X])(X-E[X])^T] \\
&=E[(X-E[X])(X^T-E[X]^T)] \\
&=E[XX^T-XE[X]^T-E[X]X^T+E[X]E[X]^T] \\
&=E[XX^T]-E[XE[X]^T]-E[E[X]X^T]+E[E[X]E[X]^T] \\
&=E[XX^T]-E[X]E[X]^T-E[X]E[X^T]+E[X]E[X]^T \\
&=E[XX^T]-E[X]E[X^T]
\end{aligned}
$$
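The matrix identity can be checked numerically by writing $\Sigma$ as an average of outer products. The minimal sketch below computes $E[(X-E[X])(X-E[X])^T]$ and $E[XX^T]-E[X]E[X]^T$ separately and compares them. Its assumptions are the same as before: a hypothetical finite set of equally likely outcomes, plain-list matrices, and illustrative helper names (`outer`, `mean_matrix`, `subtract`), with no external library.

```python
from fractions import Fraction

def mean_vector(samples):
    """E[X], taken componentwise over equally likely outcomes."""
    n = len(samples[0])
    return [Fraction(sum(s[i] for s in samples), len(samples)) for i in range(n)]

def outer(u, v):
    """The matrix u v^T for column vectors u, v stored as lists."""
    return [[a * b for b in v] for a in u]

def mean_matrix(matrices):
    """Entrywise expectation of equally likely random matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[Fraction(sum(m[r][c] for m in matrices), len(matrices))
             for c in range(cols)] for r in range(rows)]

def subtract(a, b):
    """Entrywise difference a - b of two same-shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical 3-dimensional X with five equally likely outcomes.
samples = [(1, 2, 0), (2, 4, 1), (3, 5, 0), (4, 4, 1), (5, 5, 3)]
mu = mean_vector(samples)

# E[(X - E[X])(X - E[X])^T]
deviations = [[x - m for x, m in zip(s, mu)] for s in samples]
centered = mean_matrix([outer(d, d) for d in deviations])

# E[X X^T] - E[X] E[X]^T
raw_moments = subtract(mean_matrix([outer(s, s) for s in samples]), outer(mu, mu))

print(centered == raw_moments)  # True
```

Each outer product $(X-E[X])(X-E[X])^T$ has entries $(X_i-E[X_i])(X_j-E[X_j])$, so averaging them entrywise reproduces the matrix of covariances from the first lines of the proof.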