Chapter 7 (Symmetric Matrices and Quadratic Forms): The Singular Value Decomposition (SVD)

  • As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a special factorization (the singular value decomposition) $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$!

Singular Values

Definition of Singular Values

  • Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ be an orthonormal basis for $\mathbb R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,\dots,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
    $$\|A\boldsymbol v_i\|^2 = (A\boldsymbol v_i)^TA\boldsymbol v_i = \boldsymbol v_i^TA^TA\boldsymbol v_i = \lambda_i\boldsymbol v_i^T\boldsymbol v_i = \lambda_i \geq 0 \qquad (2)$$
    So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that
    $$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0$$
  • The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,\dots,\sigma_n$, and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$.
    • The first singular value $\sigma_1$ of an $m \times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum value is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
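As a numerical sanity check of this definition, the sketch below (NumPy; the matrix is an arbitrary illustrative choice) compares the square roots of the eigenvalues of $A^TA$ with the singular values returned by a library SVD routine.

```python
import numpy as np

# An arbitrary example matrix for illustration.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues of A^T A (eigh returns them in ascending order for symmetric input).
eigvals = np.linalg.eigh(A.T @ A)[0]

# Singular values = square roots of the eigenvalues of A^T A, in decreasing order.
# Clip tiny negative rounding errors before taking the square root.
sigmas = np.sqrt(np.clip(eigvals[::-1], 0.0, None))

# Compare against the singular values reported by np.linalg.svd.
s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sigmas[:len(s)], s))  # True
```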

EXERCISE

How are the singular values of $A$ and $A^T$ related?

SOLUTION

  • $A^T=(U\Sigma V^T)^T=V\Sigma^T U^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
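This can be spot-checked numerically (a minimal NumPy sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

s_A = np.linalg.svd(A, compute_uv=False)     # singular values of A
s_AT = np.linalg.svd(A.T, compute_uv=False)  # singular values of A^T

# The nonzero singular values of A and A^T coincide.
print(np.allclose(s_A, s_AT))  # True
```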

Nonzero Singular Values

THEOREM 9: Let $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ be an orthonormal basis of $\mathbb R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues satisfy $\lambda_1 \geq \dots \geq \lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.
PROOF

  • For $i\neq j$,
    $$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^TA^TA\boldsymbol v_j = \lambda_j\boldsymbol v_i^T\boldsymbol v_j = 0$$
    Thus $\{A\boldsymbol v_1,\dots,A\boldsymbol v_n\}$ is an orthogonal set.
  • Furthermore, since the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,\dots,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$.
  • Finally, for any $\boldsymbol y=A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\dots+c_n\boldsymbol v_n$, and
    $$\boldsymbol y = A\boldsymbol x = c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+\dots+c_nA\boldsymbol v_n = c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+\boldsymbol 0+\dots+\boldsymbol 0$$
    Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim \mathrm{Col}\,A = r$. (Here $r$ counts repeated singular values.)
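Theorem 9 can be illustrated numerically; the sketch below (NumPy, with a matrix that has rank 2 by construction) checks that the vectors $A\boldsymbol v_i$ are mutually orthogonal and that the number of nonzero singular values equals the rank.

```python
import numpy as np

# A rank-2 matrix built as a sum of two independent outer products.
A = np.outer([1., 2., 3.], [1., 0., 1.]) + np.outer([0., 1., 1.], [2., 1., 0.])

# v_i: orthonormal eigenvectors of A^T A, reordered by decreasing eigenvalue.
w, V = np.linalg.eigh(A.T @ A)
V = V[:, ::-1]

AV = A @ V           # columns are A v_1, ..., A v_n
G = AV.T @ AV        # Gram matrix: off-diagonal entries ~ 0 -> orthogonal set
assert np.allclose(G - np.diag(np.diag(G)), 0.0, atol=1e-8)

s = np.linalg.svd(A, compute_uv=False)
r = int(np.sum(s > 1e-10))   # number of nonzero singular values
print(r, np.linalg.matrix_rank(A))  # 2 2
```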

The Singular Value Decomposition (SVD)

Singular Value Decomposition

  • The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
    $$\Sigma = \begin{bmatrix}D & 0\\0 & 0\end{bmatrix} \qquad (3)$$
    where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$)

THEOREM 10 (The Singular Value Decomposition): Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) in which the diagonal entries of $D$ are the first $r$ singular values of $A$, $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that $A = U\Sigma V^T$.

  • The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
  • Note: the singular values are conventionally arranged in decreasing order, which makes $\Sigma$ unique.
  • When $A$ is positive definite, the singular value decomposition coincides with the eigendecomposition.
    • Orthogonally diagonalizing $A$ gives $A=PDP^{T}$, where the columns $\boldsymbol p_i$ of $P$ are mutually orthogonal eigenvectors of $A$. It is easy to verify that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, with eigenvalue the square of the corresponding eigenvalue of $A$. So $\{\boldsymbol p_1,\dots,\boldsymbol p_n\}$ is an eigenvector basis of $A^TA$ with associated eigenvalues $\{\lambda_1^2,\dots,\lambda_n^2\}$. Hence $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (positive definiteness gives $\lambda_i>0$), so $V=P$ and $\Sigma=D$, from which $U=P$ follows.
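This special case can be checked numerically (NumPy sketch; the matrix is made symmetric positive definite by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)   # symmetric positive definite by construction

eigvals = np.linalg.eigvalsh(A)[::-1]           # eigenvalues, decreasing order
sigmas = np.linalg.svd(A, compute_uv=False)     # singular values, decreasing order

# For a positive definite matrix, the singular values equal the eigenvalues.
print(np.allclose(eigvals, sigmas))  # True
```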

PROOF

  • Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$, where
    $$\boldsymbol u_i = \frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i = \frac{1}{\sigma_i}A\boldsymbol v_i$$
    and
    $$A\boldsymbol v_i = \sigma_i\boldsymbol u_i \qquad (1\leq i\leq r) \qquad (4)$$
  • Now extend $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ of $\mathbb R^m$, and let
    $$U = [\boldsymbol u_1\ \boldsymbol u_2\ \cdots\ \boldsymbol u_m], \qquad V = [\boldsymbol v_1\ \boldsymbol v_2\ \cdots\ \boldsymbol v_n]$$
    By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
    $$AV = [A\boldsymbol v_1\ \cdots\ A\boldsymbol v_r\ \boldsymbol 0\ \cdots\ \boldsymbol 0] = [\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \boldsymbol 0\ \cdots\ \boldsymbol 0]$$
  • Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,\dots,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
    $$U\Sigma = [\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \boldsymbol 0\ \cdots\ \boldsymbol 0] = AV$$
    Since $V$ is an orthogonal matrix,
    $$U\Sigma V^T = AVV^T = A$$
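The construction in this proof can be carried out directly in code. A minimal NumPy sketch (the example matrix is an arbitrary choice; here $r = m$, so no basis extension is needed):

```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8., 7., -2.]])
m, n = A.shape

# Step 1: orthonormal eigenvectors of A^T A, with decreasing eigenvalues.
w, V = np.linalg.eigh(A.T @ A)
w, V = w[::-1], V[:, ::-1]
sigmas = np.sqrt(np.clip(w, 0.0, None))
r = int(np.sum(sigmas > 1e-10))

# Step 2: u_i = A v_i / sigma_i for the nonzero singular values.
# (In general one would extend to an orthonormal basis of R^m; here r == m.)
U = (A @ V[:, :r]) / sigmas[:r]

# Step 3: assemble the m x n "diagonal" Sigma and verify A = U Sigma V^T.
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigmas[:r])
print(np.allclose(U @ Sigma @ V.T, A))  # True
```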

EXAMPLE 4

Find a singular value decomposition of
$$A = \begin{bmatrix}1 & -1\\-2 & 2\\2 & -2\end{bmatrix}$$
SOLUTION

  • Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
    $$\boldsymbol v_1 = \begin{bmatrix}1/\sqrt2\\-1/\sqrt2\end{bmatrix}, \qquad \boldsymbol v_2 = \begin{bmatrix}1/\sqrt2\\1/\sqrt2\end{bmatrix}$$
  • Step 2. Set up $V$ and $\Sigma$. Let $V = [\boldsymbol v_1\ \boldsymbol v_2]$. The singular values are $\sigma_1 = \sqrt{18} = 3\sqrt2$ and $\sigma_2 = 0$, so
    $$\Sigma = \begin{bmatrix}3\sqrt2 & 0\\0 & 0\\0 & 0\end{bmatrix}$$
  • Step 3. Construct $U$. To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
    $$A\boldsymbol v_1 = \begin{bmatrix}2/\sqrt2\\-4/\sqrt2\\4/\sqrt2\end{bmatrix}, \qquad A\boldsymbol v_2 = \boldsymbol 0$$
    The only column found for $U$ so far is
    $$\boldsymbol u_1 = \frac{1}{3\sqrt2}A\boldsymbol v_1 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
    The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\mathbb R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x= 0$, which is equivalent to the equation $x_1-2x_2+2x_3= 0$. A basis for the solution set of this equation is
    $$\boldsymbol w_1 = \begin{bmatrix}2\\1\\0\end{bmatrix}, \qquad \boldsymbol w_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$
    Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
    $$\boldsymbol u_2 = \begin{bmatrix}2/\sqrt5\\1/\sqrt5\\0\end{bmatrix}, \qquad \boldsymbol u_3 = \begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$
    Finally, set $U = [\boldsymbol u_1\ \boldsymbol u_2\ \boldsymbol u_3]$; then $A = U\Sigma V^T$.
    • Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to note that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must form a basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
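The computation in this example can be checked numerically. A minimal NumPy sketch, assuming the matrix is the one whose $A^TA$ has eigenvalues 18 and 0 and whose first left singular vector satisfies $x_1-2x_2+2x_3=0$ as stated above (the original figures are not reproduced here):

```python
import numpy as np

# Matrix inferred from the data of this example (an assumption; see lead-in).
A = np.array([[1., -1.],
              [-2., 2.],
              [2., -2.]])

v1 = np.array([1., -1.]) / np.sqrt(2)
v2 = np.array([1., 1.]) / np.sqrt(2)
u1 = np.array([1., -2., 2.]) / 3.0
u2 = np.array([2., 1., 0.]) / np.sqrt(5)
u3 = np.array([-2., 4., 5.]) / np.sqrt(45)

U = np.column_stack([u1, u2, u3])
V = np.column_stack([v1, v2])
Sigma = np.array([[3 * np.sqrt(2), 0.],
                  [0., 0.],
                  [0., 0.]])

assert np.allclose(U.T @ U, np.eye(3))   # U is orthogonal
assert np.allclose(V.T @ V, np.eye(2))   # V is orthogonal
print(np.allclose(U @ Sigma @ V.T, A))   # True
```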

Some Properties

  • From the discussion above, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the first $r$ left singular vectors $\boldsymbol u_1,\dots,\boldsymbol u_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A$; it follows that the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},\dots,\boldsymbol u_m$ form an orthonormal basis for $\mathrm{Nul}\,A^T$.
  • Since $A^T=V\Sigma^TU^T$, the same reasoning shows that the first $r$ right singular vectors $\boldsymbol v_1,\dots,\boldsymbol v_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A^T$, and the remaining $n-r$ right singular vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ form an orthonormal basis for $\mathrm{Nul}\,A$.

Geometric Interpretation

  • Viewed through linear transformations, an $m \times n$ matrix $A$ represents a linear transformation from the $n$-dimensional space $\mathbb R^n$ to the $m$-dimensional space $\mathbb R^m$:
    $$T:\boldsymbol x\mapsto A\boldsymbol x$$
  • By the singular value decomposition, this linear transformation decomposes into three simple transformations: a rotation or reflection of one coordinate system ($V^T$), a scaling along the coordinate axes ($\Sigma$), and a rotation or reflection of another coordinate system ($U$).

The orthogonal transformation given by an orthogonal matrix preserves vector lengths and inner products, and hence preserves orthogonality. In other words, an orthogonal basis remains an orthogonal basis, with unchanged lengths, after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)

Compact SVD and Truncated SVD

  • The decomposition given by Theorem 10,
    $$A=U\Sigma V^T$$
    is also called the full singular value decomposition. In practice, the compact and truncated forms of the SVD are used more often. The compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank than the original matrix.

Compact SVD

Compact SVD: let $A$ be an $m\times n$ matrix of rank $r$. Then $A = U_r\Sigma_r V_r^T$, where $U_r$ consists of the first $r$ columns of $U$, $V_r$ consists of the first $r$ columns of $V$, and $\Sigma_r$ is the $r\times r$ diagonal matrix whose diagonal entries are the nonzero singular values $\sigma_1,\dots,\sigma_r$.

Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T \\&=U_r\Sigma_r V^T_r \end{aligned}$$
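The compact SVD is easy to form from a library SVD by truncating to the first $r$ columns (NumPy sketch with a rank-2 matrix built by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
# Build a 6 x 4 matrix of rank 2 as a product of thin factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))   # rank of A

# Compact SVD: keep only the first r columns/rows of U, Sigma, V^T.
Ur, Sr, Vrt = U[:, :r], np.diag(s[:r]), Vt[:r, :]
print(r, np.allclose(Ur @ Sr @ Vrt, A))  # 2 True
```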


Truncated SVD

  • In the SVD of a matrix, keeping only the parts corresponding to the $k$ largest singular values ($k < r$, where $r$ is the rank of the matrix) yields the truncated singular value decomposition. In practice, when the SVD of a matrix is mentioned, the truncated SVD is usually what is meant.

Truncated SVD: let $A$ be an $m\times n$ matrix of rank $r$, and let $0 < k < r$. Then $A \approx A_k = U_k\Sigma_k V_k^T$, where $U_k$ consists of the first $k$ columns of $U$, $V_k$ consists of the first $k$ columns of $V$, and $\Sigma_k$ is the $k\times k$ diagonal matrix of the $k$ largest singular values. The matrix $A_k$ has rank $k$.
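A truncated SVD can likewise be formed by keeping only the $k$ largest singular values (NumPy sketch; $k=2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))

U, s, Vt = np.linalg.svd(A)
k = 2   # keep the two largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))  # 2
# The truncation error in Frobenius norm is sqrt(sum of the discarded sigma_i^2).
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```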

SVD and Matrix Approximation

  • The SVD gives the optimal approximation of a matrix in the sense of squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.

The Frobenius Norm

  • The Frobenius norm of a matrix is the direct generalization of the $L_2$ norm for vectors, and it corresponds to the squared loss function in machine learning.

Definition: the Frobenius norm of an $m\times n$ matrix $A = [a_{ij}]$ is
$$\|A\|_F = \left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{\frac12}$$


Lemma: if $A = U\Sigma V^T$ is the singular value decomposition of $A$, then
$$\|A\|_F = (\sigma_1^2+\sigma_2^2+\dots+\sigma_n^2)^{\frac12}$$
Proof

  • In general, if $Q$ is an $m\times m$ orthogonal matrix, then
    $$\|QA\|_F = \|A\|_F$$
    because multiplication by an orthogonal matrix preserves the length of each column:
    $$\|QA\|_F^2 = \sum_{j=1}^n\|Q\boldsymbol a_j\|^2 = \sum_{j=1}^n\|\boldsymbol a_j\|^2 = \|A\|_F^2$$
  • If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
    $$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$

  • Hence $\|A\|_{F}=\|U \Sigma V^{T}\|_{F}=\|\Sigma\|_{F} =(\sigma_1^2+\sigma_2^2+\dots+\sigma_n^2)^{\frac{1}{2}}$
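Both the lemma and the orthogonal invariance used in its proof can be spot-checked numerically (NumPy sketch, random matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)
# ||A||_F equals the square root of the sum of squared singular values.
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))  # True

# Orthogonal invariance: multiplying by an orthogonal Q preserves the norm.
Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]
print(np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))  # True
```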

Optimal Matrix Approximation

Theorem (existence of an optimal approximation): let $A$ be an $m\times n$ real matrix of rank $r$, and let $\mathcal M$ denote the set of all $m\times n$ matrices of rank at most $k$, where $k < r$. Then there exists a matrix $X\in\mathcal M$ such that
$$\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\dots+\sigma_n^2)^{\frac12}$$


Theorem (the truncated SVD attains the minimum): with the notation above, let $A' = U\Sigma'V^T$, where $\Sigma'$ is obtained from $\Sigma$ by keeping the $k$ largest singular values and replacing the remaining ones by zero. Then $A'$ has rank $k$, $A'\in\mathcal M$, and
$$\|A-A'\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\dots+\sigma_n^2)^{\frac12}=\min_{S\in\mathcal M}\|A-S\|_F$$

  • The theorems above show that the SVD is the optimal approximation of a matrix under squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression and the truncated SVD to lossy compression.
  • Expanding $A$ in rank-one terms gives
    $$A=\sum_{i=1}^n\sigma_i\boldsymbol u_i\boldsymbol v_i^T=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T$$
    In general, if the truncated SVD of $A$ is $A_k=\sum_{i=1}^k\sigma_i\boldsymbol u_i\boldsymbol v_i^T$, then $A_k$ has rank $k$, and $A_k$ is the optimal approximation of $A$, in the Frobenius-norm sense, among all matrices of rank $k$. Since the singular values $\sigma_i$ usually decay quickly, $A_k$ can approximate $A$ well even for small $k$.
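The optimality claim can be spot-checked by comparing $A_k$ against randomly generated rank-$k$ matrices (NumPy sketch; a spot check on random candidates, not a proof):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
best_err = np.linalg.norm(A - A_k, 'fro')

# Any other rank-k matrix should do no better (Eckart-Young style check).
for _ in range(200):
    S = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
    assert np.linalg.norm(A - S, 'fro') >= best_err - 1e-12

print(np.isclose(best_err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```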

Proof

  • X ∈ M X\in\mathcal M XM 为满足 ∣ ∣ A − X ∣ ∣ F = min ⁡ S ∈ M ∣ ∣ A − S ∣ ∣ F ||A-X||_F=\min_{S\in\mathcal M}||A-S||_F AXF=minSMASF 的一个矩阵,因此有
    在这里插入图片描述下面证明
    在这里插入图片描述即可
  • X X X 的奇异值分解为 Q Ω P T Q\Omega P^T QΩPT,其中
    在这里插入图片描述若令矩阵 B = Q T A P B = Q^TAP B=QTAP,则 A = Q B P T A=QBP^T A=QBPT。由此得到
    在这里插入图片描述 Ω \Omega Ω 的分块方法对 B B B 分块
    在这里插入图片描述可得
    在这里插入图片描述现证 B 12 = 0 B_{12} = 0 B12=0, B 21 = 0 B_{21} = 0 B21=0。用反证法。若 B 12 ≠ 0 B_{12}\neq0 B12=0,令
    在这里插入图片描述 Y ∈ M Y\in\mathcal M YM,且
    在这里插入图片描述这与 X X X 的定义式矛盾,证明了 B 12 = 0 B_{12} = 0 B12=0。同样可证 B 21 = 0 B_{21} = 0 B21=0。于是
    在这里插入图片描述再证 B 11 = Ω k B_{11} = \Omega_k B11=Ωk。为此令
    在这里插入图片描述 Z ∈ M Z\in\mathcal M ZM,且
    在这里插入图片描述因此
    ∣ ∣ B 11 − Ω k ∣ ∣ F 2 = 0 ||B_{11}-\Omega_k||_F^2=0 B11ΩkF2=0 B 11 = Ω k B_{11}=\Omega_k B11=Ωk。最后看 B 22 B_{22} B22。若 ( m − k ) × ( n − k ) (m-k)\times (n-k) (mk)×(nk) 子矩阵 B 22 B_{22} B22 有奇异值分解 U 1 Λ V 1 T U_{1} \Lambda V_{1}^{\mathrm{T}} U1ΛV1T,则
    在这里插入图片描述下面证明 Λ \Lambda Λ 的对角线元素为 A A A 的奇异值。为此,令
    在这里插入图片描述其中 I k I_k Ik k k k 阶单位矩阵,则
    U 2 T Q T A P V 2 = U 2 T B V 2 = [ I k 0 0 U 1 T ] [ Ω k 0 0 U 1 Λ V 1 T ] [ I k 0 0 V 1 ] = [ Ω k 0 0 Λ ] \begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix} \begin{bmatrix}\Omega_k&0\\0&U_{1} \Lambda V_{1}^{\mathrm{T}}\end{bmatrix} \begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0 \\0&\Lambda \end{bmatrix} \end{aligned} U2TQTAPV2=U2TBV2=[Ik00U1T][Ωk00U1ΛV1T][Ik00V1]=[Ωk00Λ]因此
    在这里插入图片描述由此可知 Λ \Lambda Λ 的对角线元素为 A A A 的奇异值。故有
    在这里插入图片描述于是证明了
    在这里插入图片描述

Applications of the Singular Value Decomposition

  • The next few exercises show some interesting facts.

EXERCISE 19

$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.

SOLUTION

  • [Hint: Use the SVD to compute $A^TA$ and $AA^T$.]
    $$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^T$$
    so the columns of $V$ are eigenvectors of $A^TA$, with eigenvalues the diagonal entries of $\Sigma^T\Sigma$, i.e., the squares of the diagonal entries of $\Sigma$. Similarly,
    $$AA^T = U(\Sigma\Sigma^T)U^T$$
    so the columns of $U$ are eigenvectors of $AA^T$. Since the eigenvalues of $A^TA$ are the squares of the diagonal entries of $\Sigma$, and these entries are nonnegative and arranged in decreasing order, the diagonal entries of $\Sigma$ are the singular values of $A$.

EXERCISE 25

Let $T: \mathbb R^n\to \mathbb R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\mathbb R^n$ and a basis $\mathcal C$ for $\mathbb R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m \times n$ “diagonal” matrix.

SOLUTION

  • Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1, \dots, \boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1, \dots, \boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
    $$T(\boldsymbol v_j) = A\boldsymbol v_j = U\Sigma V^T\boldsymbol v_j = U\Sigma\boldsymbol e_j = \sigma_j U\boldsymbol e_j = \sigma_j\boldsymbol u_j$$
    So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.

Polar Decomposition (极分解)

  • Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A= PQ$, where $P$ is an $n \times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.

Proof

  • [Hint: Use a singular value decomposition, $A= U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric positive semidefinite matrix with the same rank as $A$, and $UV^T$, as a product of orthogonal matrices, is orthogonal.]
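The hinted construction can be carried out directly (NumPy sketch, random square matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T   # symmetric PSD: its eigenvalues are the s_i >= 0
Q = U @ Vt                 # product of orthogonal matrices, hence orthogonal

assert np.allclose(Q @ Q.T, np.eye(4))          # Q is orthogonal
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)  # P is positive semidefinite
print(np.allclose(P @ Q, A))  # True: A = P Q
```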

Estimating the Rank of a Matrix

Numerical note: in practice, a reliable way to estimate the rank of a large matrix numerically is to compute its singular values and count those above a small threshold; singular values below the threshold are treated as zeros produced by rounding error.

Check Theorem 9

The Condition Number (条件数)

  • Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
  • If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
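Computing the condition number from the singular values (NumPy sketch; the Hilbert-like test matrix is an arbitrary ill-conditioned example):

```python
import numpy as np

# A mildly ill-conditioned 4 x 4 Hilbert-like matrix.
A = np.array([[1.0 / (i + j + 1) for j in range(4)] for i in range(4)])

s = np.linalg.svd(A, compute_uv=False)
cond = s[0] / s[-1]   # ratio of largest to smallest singular value

# np.linalg.cond uses the 2-norm by default, i.e. this same ratio.
print(np.isclose(cond, np.linalg.cond(A)))  # True
```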

Bases for Fundamental Subspaces

  • Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,\dots,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,\dots,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,\dots,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
    $$\{\boldsymbol u_1,\dots,\boldsymbol u_r\}\qquad(5)$$
    is an orthonormal basis for $\mathrm{Col}\,A$.
  • Recall that $(\mathrm{Col}\,A)^{\perp}= \mathrm{Nul}\,A^T$. Hence
    $$\{\boldsymbol u_{r+1},\dots,\boldsymbol u_m\}\qquad(6)$$
    is an orthonormal basis for $\mathrm{Nul}\,A^T$.
  • Since $\|A\boldsymbol v_i\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n - r$. By the Rank Theorem, $\dim \mathrm{Nul}\,A = n - \mathrm{rank}\,A=n-r$. It follows that
    $$\{\boldsymbol v_{r+1},\dots,\boldsymbol v_n\}\qquad(7)$$
    is an orthonormal basis for $\mathrm{Nul}\,A$.
  • $(\mathrm{Nul}\,A)^\perp= \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
    $$\{\boldsymbol v_1,\dots,\boldsymbol v_r\}\qquad(8)$$
    is an orthonormal basis for $\mathrm{Row}\,A$.

  • Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,\dots,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
    Figure 4: The four fundamental subspaces and the action of $A$.
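The four bases (5)–(8) can be read off directly from a computed SVD (NumPy sketch with a rank-2 matrix built by construction):

```python
import numpy as np

rng = np.random.default_rng(7)
# A 5 x 4 matrix of rank 2.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

col_A = U[:, :r]      # orthonormal basis for Col A      (5)
nul_AT = U[:, r:]     # orthonormal basis for Nul A^T    (6)
nul_A = Vt[r:, :].T   # orthonormal basis for Nul A      (7)
row_A = Vt[:r, :].T   # orthonormal basis for Row A      (8)

assert np.allclose(A.T @ nul_AT, 0.0)   # columns lie in Nul A^T
assert np.allclose(A @ nul_A, 0.0)      # columns lie in Nul A
# A v_i = sigma_i u_i for i <= r:
print(np.allclose(A @ row_A, col_A * s[:r]))  # True
```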

  • The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.

THEOREM (The Invertible Matrix Theorem, concluded): Let $A$ be an $n\times n$ matrix. Then each of the following statements is equivalent to the statement that $A$ is an invertible matrix:
u. $(\mathrm{Col}\,A)^\perp = \{\boldsymbol 0\}$.
v. $(\mathrm{Nul}\,A)^\perp = \mathbb R^n$.
w. $\mathrm{Row}\,A = \mathbb R^n$.
x. $A$ has $n$ nonzero singular values.

Reduced SVD and the Pseudoinverse of $A$ (奇异值分解的简化和伪逆)

  • When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
    $$U = [U_r\ \ U_{m-r}],\qquad V = [V_r\ \ V_{n-r}]$$
    Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
    $$A = [U_r\ \ U_{m-r}]\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix} = U_rDV_r^T$$
  • This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (伪逆) (also, the Moore–Penrose inverse (穆尔-彭罗斯逆)) of $A$:
    $$A^+ = V_rD^{-1}U_r^T$$
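The reduced SVD and the pseudoinverse can be formed directly from a library SVD (NumPy sketch, random matrix; compared against `np.linalg.pinv`):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3))   # full column rank with probability 1

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ur, D, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

A_plus = Vr @ np.linalg.inv(D) @ Ur.T   # pseudoinverse from the reduced SVD
print(np.allclose(A_plus, np.linalg.pinv(A)))  # True

# Two defining properties of the pseudoinverse:
assert np.allclose(A @ A_plus @ A, A)
assert np.allclose(A_plus @ A @ A_plus, A_plus)
```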

  • The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.

Supplementary EXERCISE 12

  • Verify the following properties of $A^+$:
    a. For each $\boldsymbol y$ in $\mathbb R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
    b. For each $\boldsymbol x$ in $\mathbb R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
    c. $AA^+A = A$ and $A^+AA^+ = A^+$.

Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x =\boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p =\boldsymbol b$. The following steps prove that $\boldsymbol x^+ =\boldsymbol p$ and that $\boldsymbol x^+$ is the minimum-length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x =\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x =\boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of this basis. Thus $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol p=A\boldsymbol p=\boldsymbol b$, using the solution $\boldsymbol p$ and the property $AA^+A = A$.
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. …

Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\mathbb R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of $A\boldsymbol x = \boldsymbol b$ of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]


EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x} = A^+\boldsymbol b = V_rD^{-1}U_r^T\boldsymbol b$$
Then,
$$A\hat{\boldsymbol x} = U_rDV_r^TV_rD^{-1}U_r^T\boldsymbol b = U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
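The minimum-length property can be illustrated numerically (NumPy sketch with a rank-deficient matrix, so that least-squares solutions are not unique):

```python
import numpy as np

rng = np.random.default_rng(9)
# A rank-deficient 5 x 4 system (rank 2), so least-squares solutions form a family.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(5)

x_hat = np.linalg.pinv(A) @ b    # x^+ = A^+ b

# x_hat is a least-squares solution: the residual is orthogonal to Col A.
assert np.allclose(A.T @ (A @ x_hat - b), 0.0, atol=1e-8)

# Any other least-squares solution x_hat + z (z in Nul A) is at least as long.
_, _, Vt = np.linalg.svd(A)
z = Vt[-1, :]                    # a unit vector in Nul A (rank 2 < 4)
print(np.linalg.norm(x_hat) <= np.linalg.norm(x_hat + 0.5 * z))  # True
```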




Ref

  • 《统计学习方法》
  • Linear Algebra and Its Applications