- As we know, not all matrices can be factored as $A=PDP^{-1}$ with $D$ diagonal. However, a special factorization, the singular value decomposition, $A=QDP^{-1}$, is possible for any $m\times n$ matrix $A$!
Singular Values
Definition of Singular Values
- Let $A$ be an $m\times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
$$\left\|A\boldsymbol v_i\right\|^2=(A\boldsymbol v_i)^TA\boldsymbol v_i=\boldsymbol v_i^TA^TA\boldsymbol v_i=\lambda_i\boldsymbol v_i^T\boldsymbol v_i=\lambda_i\ \ \ \ (2)$$
So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that
$$\lambda_1\geq\lambda_2\geq...\geq\lambda_n\geq0$$
- The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$, and they are arranged in decreasing order; that is, $\sigma_i=\sqrt{\lambda_i}$ for $1\leq i\leq n$. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
- The first singular value $\sigma_1$ of an $m\times n$ matrix $A$ is the maximum of $\left\|A\boldsymbol x\right\|$ over all unit vectors. This maximum is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value $\sigma_2$ is the maximum of $\left\|A\boldsymbol x\right\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
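- A quick numerical sketch of this definition in NumPy (the matrix below is just an arbitrary example):
```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8., 7., -2.]])                 # any m x n matrix works here

# Eigenvalues of A^T A, sorted in decreasing order; singular values are their square roots.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigmas = np.sqrt(np.clip(eigvals, 0.0, None))  # clip guards tiny negative roundoff
print(sigmas)                                  # approx [18.974  9.487  0.   ]
print(np.linalg.svd(A, compute_uv=False))      # same nonzero values

# sigma_1 is the maximum of ||Ax|| over unit vectors x: sample many random unit vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100000))
X /= np.linalg.norm(X, axis=0)
print(np.linalg.norm(A @ X, axis=0).max())     # just below sigma_1
```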
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
- $A^T=(U\Sigma V^T)^T=V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
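- A one-line numerical check of this fact (the example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                   # arbitrary example matrix
s_A  = np.linalg.svd(A,   compute_uv=False)   # singular values of A
s_AT = np.linalg.svd(A.T, compute_uv=False)   # singular values of A^T
print(np.allclose(s_A, s_AT))                 # True
```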
Nonzero Singular Values
- THEOREM 9. Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq...\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$, and $rank\ A=r$.
PROOF
- For $i\neq j$,
$$(A\boldsymbol v_i)^T(A\boldsymbol v_j)=\boldsymbol v_i^TA^TA\boldsymbol v_j=\lambda_j\boldsymbol v_i^T\boldsymbol v_j=0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set.
- Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $Col\ A$.
- Finally, for any $\boldsymbol y=A\boldsymbol x$ in $Col\ A$, we can write $\boldsymbol x=c_1\boldsymbol v_1+...+c_n\boldsymbol v_n$, and
$$\boldsymbol y=A\boldsymbol x=c_1A\boldsymbol v_1+...+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+...+c_nA\boldsymbol v_n=c_1A\boldsymbol v_1+...+c_rA\boldsymbol v_r+\boldsymbol 0+...+\boldsymbol 0$$
Thus $\boldsymbol y$ is in $Span\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $Col\ A$. Hence $rank\ A=\dim Col\ A=r$. (Here $r$ counts singular values with multiplicity.)
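- A numerical sketch of Theorem 9 (the rank-2 example matrix below is constructed purely for illustration):
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))  # a 4x3 matrix of rank 2

lam, V = np.linalg.eigh(A.T @ A)    # eigenvalues in ascending order
V = V[:, ::-1]                      # reorder so lambda_1 >= ... >= lambda_n
AV = A @ V
print(np.round(AV.T @ AV, 6))       # diagonal matrix: the Av_i are orthogonal
s = np.linalg.svd(A, compute_uv=False)
r = int(np.sum(s > 1e-10))          # number of nonzero singular values
print(r, np.linalg.matrix_rank(A))  # 2 2
```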
The Singular Value Decomposition (SVD)
- THEOREM 10 (The Singular Value Decomposition). Let $A$ be an $m\times n$ matrix with rank $r$. Then there exist an $m\times n$ matrix $\Sigma$ as in (3) below, for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq...\geq\sigma_r>0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that $A=U\Sigma V^T$.
- The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
$$\Sigma=\begin{bmatrix}D&0\\0&0\end{bmatrix}\ \ \ \ (3)$$
where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$).
- The matrices U U U and V V V are not uniquely determined by A A A. The columns of U U U in such a decomposition are called left singular vectors of A A A, and the columns of V V V are called right singular vectors of A A A.
- Note: the singular values are conventionally arranged in decreasing order to ensure that $\Sigma$ is uniquely determined.
- When $A$ is a symmetric positive definite matrix, the singular value decomposition coincides with the eigenvalue decomposition.
- An eigendecomposition of $A$ gives $A=PDP^T$, where each column $\boldsymbol p_i$ of $P$ is an eigenvector of $A$ and the columns are mutually orthogonal. It is easy to verify that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, and that the square of the corresponding eigenvalue of $A$ is an eigenvalue of $A^TA$. We have therefore found an eigenvector basis $\{\boldsymbol p_1,...,\boldsymbol p_n\}$ of $A^TA$ with corresponding eigenvalues $\{\lambda_1^2,...,\lambda_n^2\}$, so $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (positive definiteness gives $\lambda_i>0$). Hence $V=P$ and $\Sigma=D$, from which $U=P$ follows.
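- A sketch of this observation on a randomly generated symmetric positive definite matrix; note the eigenvalues match the singular values (the eigenvector matrices may differ by column order and signs, so only the values are compared):
```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite by construction

lam, P = np.linalg.eigh(A)         # A = P diag(lam) P^T, lam ascending
s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(np.sort(lam)[::-1], s))   # True: sigma_i = lambda_i
```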
PROOF
- Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_r\}$, where
$$\boldsymbol u_i=\frac{1}{\left\|A\boldsymbol v_i\right\|}A\boldsymbol v_i=\frac{1}{\sigma_i}A\boldsymbol v_i$$
and
$$A\boldsymbol v_i=\sigma_i\boldsymbol u_i\ \ \ \ (1\leq i\leq r)\ \ \ \ (4)$$
- Now extend $\{\boldsymbol u_1,...,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_m\}$ of $\R^m$, and let
$$U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&...&\boldsymbol u_m\end{bmatrix},\quad V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2&...&\boldsymbol v_n\end{bmatrix}$$
By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
$$AV=\begin{bmatrix}A\boldsymbol v_1&...&A\boldsymbol v_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&...&\sigma_r\boldsymbol u_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}$$
- Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,...,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
$$U\Sigma=\begin{bmatrix}\sigma_1\boldsymbol u_1&...&\sigma_r\boldsymbol u_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}=AV$$
Since $V$ is an orthogonal matrix,
$$U\Sigma V^T=AVV^T=A$$
EXAMPLE 4
Find a singular value decomposition of $A=\begin{bmatrix}1&-1\\-2&2\\2&-2\end{bmatrix}$.
SOLUTION
- Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
$$\boldsymbol v_1=\begin{bmatrix}1/\sqrt2\\-1/\sqrt2\end{bmatrix},\quad\boldsymbol v_2=\begin{bmatrix}1/\sqrt2\\1/\sqrt2\end{bmatrix}$$
- Step 2. Set up $V$ and $\Sigma$:
$$V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2\end{bmatrix}=\begin{bmatrix}1/\sqrt2&1/\sqrt2\\-1/\sqrt2&1/\sqrt2\end{bmatrix},\quad\sigma_1=\sqrt{18}=3\sqrt2,\ \sigma_2=0,\quad\Sigma=\begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix}$$
- Step 3. Construct $U$. To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
$$A\boldsymbol v_1=\begin{bmatrix}2/\sqrt2\\-4/\sqrt2\\4/\sqrt2\end{bmatrix},\quad A\boldsymbol v_2=\boldsymbol 0$$
The only column found for $U$ so far is
$$\boldsymbol u_1=\frac{1}{3\sqrt2}A\boldsymbol v_1=\begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x=0$, which is equivalent to the equation $x_1-2x_2+2x_3=0$. A basis for the solution set of this equation is
$$\boldsymbol w_1=\begin{bmatrix}2\\1\\0\end{bmatrix},\quad\boldsymbol w_2=\begin{bmatrix}-2\\0\\1\end{bmatrix}$$
Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
$$\boldsymbol u_2=\begin{bmatrix}2/\sqrt5\\1/\sqrt5\\0\end{bmatrix},\quad\boldsymbol u_3=\begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$
Finally, set $U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\boldsymbol u_3\end{bmatrix}$, take $\Sigma$ and $V$ from Step 2, and write $A=U\Sigma V^T$.
- Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to note that $\{\boldsymbol u_1\}$ is an orthonormal basis for $Col\ A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must then form a basis for $(Col\ A)^\perp=Nul\ A^T$.
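- For comparison, NumPy's SVD of the same matrix (assuming the Example 4 matrix reconstructed above; NumPy may flip the signs of singular vector pairs):
```python
import numpy as np

A = np.array([[ 1., -1.],
              [-2.,  2.],
              [ 2., -2.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                   # [4.2426..., 0], i.e. 3*sqrt(2) and 0
Sigma = np.zeros((3, 2))
Sigma[0, 0] = s[0]
print(np.allclose(A, U @ Sigma @ Vt))      # True
print(np.round(np.abs(U[:, 0]), 4))        # |u_1| = [1/3, 2/3, 2/3] up to sign
```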
Some Properties
- From the preceding discussion, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the $r$ left singular vectors $\boldsymbol u_1,...,\boldsymbol u_r$ of $A$ form an orthonormal basis for $Col\ A$; it follows that the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},...,\boldsymbol u_m$ of $A$ form an orthonormal basis for $Nul\ A^T$.
- Since $A^T=V\Sigma^TU^T$, the same reasoning shows that the $r$ right singular vectors $\boldsymbol v_1,...,\boldsymbol v_r$ of $A$ form an orthonormal basis for $Col\ A^T$, and the remaining $n-r$ right singular vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ of $A$ form an orthonormal basis for $Nul\ A$.
Geometric Interpretation
- From the viewpoint of linear transformations, an $m\times n$ matrix $A$ represents a linear transformation $T:\boldsymbol x\mapsto A\boldsymbol x$ from the $n$-dimensional space $\R^n$ to the $m$-dimensional space $\R^m$.
- The SVD shows that this linear transformation can be decomposed into three simple transformations: a rotation or reflection of the coordinate system, a scaling along the coordinate axes, and another rotation or reflection of the coordinate system.
An orthogonal transformation preserves vector lengths and inner products, and hence orthogonality. In other words, an orthonormal basis remains an orthonormal basis of unchanged lengths after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)
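- A small sketch of this rotate, scale, rotate picture, applied to points on the unit circle (the 2x2 matrix is an arbitrary example):
```python
import numpy as np

A = np.array([[3., 0.],
              [4., 5.]])
U, s, Vt = np.linalg.svd(A)
theta = np.linspace(0.0, 2.0 * np.pi, 9)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle

step1 = Vt @ circle          # rotate/reflect: still on the unit circle
step2 = np.diag(s) @ step1   # scale along the axes: now an axis-aligned ellipse
step3 = U @ step2            # rotate/reflect the ellipse into final position
print(np.allclose(step3, A @ circle))               # True: same as applying A directly
```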
Compact SVD and Truncated SVD
- The singular value decomposition $A=U\Sigma V^T$ given by Theorem 10 is also called the full singular value decomposition of the matrix. In practice, the compact and truncated forms of the SVD are the ones commonly used: the compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank than the original matrix.
Compact SVD
- Suppose $A$ is an $m\times n$ matrix of rank $r$. The compact singular value decomposition of $A$ is $A=U_r\Sigma_rV_r^T$, where $U_r$ consists of the first $r$ columns of $U$, $V_r$ consists of the first $r$ columns of $V$, and $\Sigma_r$ is the $r\times r$ diagonal matrix of the nonzero singular values.
Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}u_1&...&u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}v_1^T\\...\\v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1u_1&...&\sigma_ru_r&0&...&0\end{bmatrix}\begin{bmatrix}v_1^T\\...\\v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_iu_iv_i^T \\&=U_r\Sigma_rV_r^T \end{aligned}$$
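- A numerical sketch of the compact SVD (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))  # a 5x4 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ur, Sr, Vrt = U[:, :r], np.diag(s[:r]), Vt[:r, :]      # keep only the rank-r blocks
print(np.allclose(A, Ur @ Sr @ Vrt))                   # True: reconstruction is lossless
print(Ur.shape, Sr.shape, Vrt.shape)                   # (5, 2) (2, 2) (2, 4)
```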
Truncated SVD
- In the singular value decomposition of a matrix, keeping only the parts corresponding to the $k$ largest singular values ($k<r$, where $r$ is the rank of the matrix) gives the truncated singular value decomposition. In practice, references to the singular value decomposition of a matrix usually mean the truncated SVD.
SVD and Matrix Approximation
- The SVD gives the optimal approximation of a matrix under squared loss (the Frobenius norm), i.e., it performs data compression: the compact SVD corresponds to lossless compression, and the truncated SVD to lossy compression.
The Frobenius Norm
- The Frobenius norm of a matrix, $\|A\|_F=\left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{\frac{1}{2}}$, is the direct generalization of the $L_2$ norm of a vector, and corresponds to the squared loss function in machine learning.
Proof
- In general, if $Q$ is an $m\times m$ orthogonal matrix, then
$$\|QA\|_F=\|A\|_F$$
because multiplying by an orthogonal matrix preserves the length of each column $\boldsymbol a_j$ of $A$:
$$\|QA\|_F^2=\sum_{j=1}^n\|Q\boldsymbol a_j\|^2=\sum_{j=1}^n\|\boldsymbol a_j\|^2=\|A\|_F^2$$
- If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
$$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$
- Hence
$$\|A\|_F=\left\|U\Sigma V^T\right\|_F=\|\Sigma\|_F=(\sigma_1^2+\sigma_2^2+...+\sigma_n^2)^{\frac{1}{2}}$$
Optimal Approximation of a Matrix
- THEOREM (best approximation). Let $A$ be an $m\times n$ real matrix of rank $r$, and let $\mathcal M$ be the set of all $m\times n$ matrices of rank at most $k$, where $0<k<r$. Then there exists a matrix $X\in\mathcal M$ attaining $\min_{S\in\mathcal M}\|A-S\|_F$, and
$$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
- This theorem says that the SVD gives the optimal approximation of a matrix under squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD to lossy compression.
- A spectral decomposition of $A$ gives
$$A=\sum_{i=1}^n\sigma_iu_iv_i^T=\sum_{i=1}^r\sigma_iu_iv_i^T$$
In general, let the truncated SVD of $A$ be
$$A_k=\sum_{i=1}^k\sigma_iu_iv_i^T$$
Then $A_k$ has rank $k$, and $A_k$ is the optimal approximation of $A$ in the Frobenius norm among all matrices of rank $k$. Since the singular values $\sigma_i$ usually decay quickly, $A_k$ can approximate $A$ well even for small $k$.
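- A numerical sketch of both identities above: the Frobenius norm formula and the truncated-SVD approximation error (example matrix arbitrary, $k=2$):
```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 5))
U, s, Vt = np.linalg.svd(A)
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]             # truncated SVD, rank k
err = np.linalg.norm(A - Ak, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))    # True: error formula
print(np.isclose(np.linalg.norm(A, 'fro'),
                 np.sqrt(np.sum(s ** 2))))             # True: ||A||_F from sigmas
```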
Proof
- Let $X\in\mathcal M$ be a matrix attaining $\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F$. Since $A_k\in\mathcal M$, we then have
$$\|A-X\|_F\leq\|A-A_k\|_F=(\sigma_{k+1}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
It therefore suffices to prove
$$\|A-X\|_F\geq(\sigma_{k+1}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
- Let the SVD of $X$ be $Q\Omega P^T$, where $\Omega$ is an $m\times n$ “diagonal” matrix whose diagonal entries are the singular values of $X$, at most $k$ of which are nonzero. If we set $B=Q^TAP$, then $A=QBP^T$, and hence
$$\|A-X\|_F=\left\|QBP^T-Q\Omega P^T\right\|_F=\|B-\Omega\|_F$$
Partition $B$ in the same block pattern as $\Omega$, with $\Omega_k=diag(\omega_1,...,\omega_k)$:
$$B=\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix},\quad\Omega=\begin{bmatrix}\Omega_k&0\\0&0\end{bmatrix}$$
which gives
$$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2$$
We now show $B_{12}=0$ and $B_{21}=0$, arguing by contradiction. If $B_{12}\neq0$, set
$$Y=Q\begin{bmatrix}B_{11}&B_{12}\\0&0\end{bmatrix}P^T$$
Then $Y\in\mathcal M$, and
$$\|A-Y\|_F^2=\|B_{21}\|_F^2+\|B_{22}\|_F^2<\|A-X\|_F^2$$
This contradicts the defining property of $X$, proving $B_{12}=0$. Similarly $B_{21}=0$. Thus
$$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
Next we show $B_{11}=\Omega_k$. To this end, set
$$Z=Q\begin{bmatrix}B_{11}&0\\0&0\end{bmatrix}P^T$$
Then $Z\in\mathcal M$, and
$$\|A-Z\|_F^2=\|B_{22}\|_F^2\leq\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2=\|A-X\|_F^2$$
By the minimality of $X$, equality must hold, and therefore
$$\|B_{11}-\Omega_k\|_F^2=0$$
i.e. $B_{11}=\Omega_k$. Finally, consider $B_{22}$. If the $(m-k)\times(n-k)$ submatrix $B_{22}$ has the singular value decomposition $U_1\Lambda V_1^T$, then
$$\|A-X\|_F=\|B_{22}\|_F=\|\Lambda\|_F$$
We now show that the diagonal entries of $\Lambda$ are singular values of $A$. To this end, set
$$U_2=\begin{bmatrix}I_k&0\\0&U_1\end{bmatrix},\quad V_2=\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix}$$
where $I_k$ is the $k\times k$ identity matrix. Then
$$\begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix}\begin{bmatrix}\Omega_k&0\\0&U_1\Lambda V_1^T\end{bmatrix}\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix} \end{aligned}$$
Therefore
$$A=(QU_2)\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix}(PV_2)^T$$
This is a singular value decomposition of $A$ (up to ordering), so the diagonal entries of $\Lambda$ are singular values of $A$. Hence
$$\|A-X\|_F=\|\Lambda\|_F\geq(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
This completes the proof that
$$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
Applications of the Singular Value Decomposition
- The next few exercises show some interesting facts.
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
- [Hint: Use the SVD to compute $A^TA$ and $AA^T$.] Indeed, $A^TA=V\Sigma^TU^TU\Sigma V^T=V(\Sigma^T\Sigma)V^T$ and $AA^T=U(\Sigma\Sigma^T)U^T$, which are orthogonal diagonalizations with diagonal entries $\sigma_i^2$.
EXERCISE 25
Let $T:\R^n\mapsto\R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\R^n$ and a basis $\mathcal C$ for $\R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m\times n$ “diagonal” matrix.
SOLUTION
- Consider the SVD for the standard matrix of $T$, say, $A=U\Sigma V^T$. Let $\mathcal B=\{\boldsymbol v_1,…,\boldsymbol v_n\}$ and $\mathcal C=\{\boldsymbol u_1,…,\boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j=\boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
$$T(\boldsymbol v_j)=A\boldsymbol v_j=U\Sigma V^T\boldsymbol v_j=U\Sigma\boldsymbol e_j=\sigma_jU\boldsymbol e_j=\sigma_j\boldsymbol u_j$$
So $[T(\boldsymbol v_j)]_{\mathcal C}=\sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
Polar Decomposition (极分解)
- Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A=PQ$, where $P$ is an $n\times n$ positive semidefinite matrix with the same rank as $A$, and $Q$ is an $n\times n$ orthogonal matrix.
Proof
- [Hint: Use a singular value decomposition, $A=U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric matrix.]
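- A numerical sketch of this construction, following the hint (the example matrix is random):
```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T     # symmetric positive semidefinite, same rank as A
Q = U @ Vt                   # product of orthogonal matrices, hence orthogonal
print(np.allclose(A, P @ Q))                    # True: A = PQ
print(np.allclose(Q @ Q.T, np.eye(4)))          # True: Q is orthogonal
print(np.all(np.linalg.eigvalsh(P) >= -1e-10))  # True: P is PSD
```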
Estimating the Rank of a Matrix
- See Theorem 9: the rank of $A$ equals the number of nonzero singular values of $A$. In numerical work, $rank\ A$ is estimated by counting the singular values that exceed a small tolerance.
The Condition Number (条件数)
- Most numerical calculations involving an equation $A\boldsymbol x=\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
- If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x=\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x=\boldsymbol b$.)
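- A small sketch with a nearly singular matrix; NumPy's built-in `np.linalg.cond` computes exactly this ratio for the 2-norm:
```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])   # nearly singular, hence badly conditioned
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])             # ~4e4: sigma_1 / sigma_n
print(np.linalg.cond(A, 2))     # the same number from NumPy's built-in
```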
Bases for Fundamental Subspaces
- Given an SVD for an $m\times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\ \ \ \ (5)$$
is an orthonormal basis for $Col\ A$.
- Recall that $(Col\ A)^\perp=Nul\ A^T$. Hence
$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\ \ \ \ (6)$$
is an orthonormal basis for $Nul\ A^T$.
- Since $\left\|A\boldsymbol v_i\right\|=\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i>r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $Nul\ A$ of dimension $n-r$. By the Rank Theorem, $\dim Nul\ A=n-rank\ A=n-r$. It follows that
$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\ \ \ \ (7)$$
is an orthonormal basis for $Nul\ A$.
- $(Nul\ A)^\perp=Col\ A^T=Row\ A$. Hence, from (7),
$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\ \ \ \ (8)$$
is an orthonormal basis for $Row\ A$.
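- A numerical sketch of (5)–(8): reading off orthonormal bases for the four fundamental subspaces from one full SVD (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))  # a 4x3 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
col_A  = U[:, :r]       # (5): orthonormal basis for Col A
nul_AT = U[:, r:]       # (6): orthonormal basis for Nul A^T
nul_A  = Vt[r:, :].T    # (7): orthonormal basis for Nul A
row_A  = Vt[:r, :].T    # (8): orthonormal basis for Row A
print(np.allclose(A.T @ nul_AT, 0))   # True: these columns lie in Nul A^T
print(np.allclose(A @ nul_A, 0))      # True: these columns lie in Nul A
```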
- Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $Col\ A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i=\sigma_i\boldsymbol u_i$ for $1\leq i\leq r$.
- The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
Reduced SVD and the Pseudoinverse of $A$
- When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r=rank\ A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
$$U=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix},\quad V=\begin{bmatrix}V_r&V_{n-r}\end{bmatrix}$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
$$A=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix}=U_rDV_r^T$$
- This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (伪逆) (also, the Moore–Penrose inverse (穆尔-彭罗斯逆)) of $A$:
$$A^+=V_rD^{-1}U_r^T$$
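- A sketch of this pseudoinverse built from the reduced SVD, checked against NumPy's built-in `np.linalg.pinv` (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))     # a 4x3 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
A_plus = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T  # V_r D^{-1} U_r^T
print(np.allclose(A_plus, np.linalg.pinv(A)))             # True
print(np.allclose(A @ A_plus @ A, A))                     # True: A A+ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))           # True: A+ A A+ = A+
```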
- The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
- Verify the following properties of $A^+$:
a. For each $\boldsymbol y$ in $\R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $Col\ A$.
b. For each $\boldsymbol x$ in $\R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $Row\ A$.
c. $AA^+A=A$ and $A^+AA^+=A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x=\boldsymbol b$ is consistent, and let $\boldsymbol x^+=A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $Row\ A$ such that $A\boldsymbol p=\boldsymbol b$. The following steps prove that $\boldsymbol x^+=\boldsymbol p$ and that $\boldsymbol x^+$ is the minimum length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $Row\ A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x=\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x=\boldsymbol b$, then $\left\|\boldsymbol x^+\right\|\leq\left\|\boldsymbol u\right\|$, with equality only if $\boldsymbol u=\boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $Row\ A$, $\boldsymbol x^+$ is a linear combination of this orthonormal basis. Thus $\boldsymbol x^+$ is in $Row\ A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol x=A\boldsymbol x=\boldsymbol b$, using property (c) of Exercise 12.
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $Row\ A$. …
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x=\hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $Col\ A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x=\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x}=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$$
Then, from the reduced SVD, $A\hat{\boldsymbol x}=(U_rDV_r^T)(V_rD^{-1}U_r^T\boldsymbol b)=U_rU_r^T\boldsymbol b$, and $U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $Col\ A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x=\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
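- A numerical sketch of Example 8, using NumPy's `pinv` and `lstsq` on a random overdetermined system:
```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
x_hat = np.linalg.pinv(A) @ b                   # x_hat = A+ b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # NumPy's least-squares solution
print(np.allclose(x_hat, x_ls))                 # True
# The residual b - A x_hat is orthogonal to Col A, as least-squares requires.
print(np.allclose(A.T @ (b - A @ x_hat), 0))    # True
```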
Ref
- 《统计学习方法》
- *Linear Algebra and Its Applications*