- As we know, not all matrices can be factored as $A=PDP^{-1}$ with $D$ diagonal. However, a special factorization, the singular value decomposition, $A=QDP^{-1}$, is possible for any $m\times n$ matrix $A$!
Singular Values
Definition of Singular Values
- Let $A$ be an $m\times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
$$\left\|A\boldsymbol v_i\right\|^2=(A\boldsymbol v_i)^TA\boldsymbol v_i=\boldsymbol v_i^TA^TA\boldsymbol v_i=\lambda_i\boldsymbol v_i^T\boldsymbol v_i=\lambda_i\ \ \ \ (2)$$
So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that
$$\lambda_1\geq\lambda_2\geq...\geq\lambda_n\geq0$$
- The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$, and they are arranged in decreasing order; that is, $\sigma_i=\sqrt{\lambda_i}$ for $1\leq i\leq n$. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
- The first singular value $\sigma_1$ of an $m\times n$ matrix $A$ is the maximum of $\left\|A\boldsymbol x\right\|$ over all unit vectors. This maximum is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value $\sigma_2$ is the maximum of $\left\|A\boldsymbol x\right\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
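- A quick numerical sketch of this definition in NumPy (the matrix below is just an arbitrary example):
```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8., 7., -2.]])                 # any m x n matrix works here

# Eigenvalues of A^T A, sorted in decreasing order; singular values are their square roots.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigmas = np.sqrt(np.clip(eigvals, 0.0, None))  # clip guards tiny negative roundoff
print(sigmas)                                  # approx [18.974  9.487  0.   ]
print(np.linalg.svd(A, compute_uv=False))      # same nonzero values

# sigma_1 is the maximum of ||Ax|| over unit vectors x: sample many random unit vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100000))
X /= np.linalg.norm(X, axis=0)
print(np.linalg.norm(A @ X, axis=0).max())     # just below sigma_1
```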
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
- $A^T=(U\Sigma V^T)^T=V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
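- A one-line numerical check of this fact (the example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                   # arbitrary example matrix
s_A  = np.linalg.svd(A,   compute_uv=False)   # singular values of A
s_AT = np.linalg.svd(A.T, compute_uv=False)   # singular values of A^T
print(np.allclose(s_A, s_AT))                 # True
```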
Nonzero Singular Values
- THEOREM 9. Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq...\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$, and $rank\ A=r$.
PROOF
- For $i\neq j$,
$$(A\boldsymbol v_i)^T(A\boldsymbol v_j)=\boldsymbol v_i^TA^TA\boldsymbol v_j=\lambda_j\boldsymbol v_i^T\boldsymbol v_j=0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set.
- Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $Col\ A$.
- Finally, for any $\boldsymbol y=A\boldsymbol x$ in $Col\ A$, we can write $\boldsymbol x=c_1\boldsymbol v_1+...+c_n\boldsymbol v_n$, and
$$\boldsymbol y=A\boldsymbol x=c_1A\boldsymbol v_1+...+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+...+c_nA\boldsymbol v_n=c_1A\boldsymbol v_1+...+c_rA\boldsymbol v_r+\boldsymbol 0+...+\boldsymbol 0$$
Thus $\boldsymbol y$ is in $Span\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $Col\ A$. Hence $rank\ A=\dim Col\ A=r$. (Here $r$ counts singular values with multiplicity.)
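- A numerical sketch of Theorem 9 (the rank-2 example matrix below is constructed purely for illustration):
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))  # a 4x3 matrix of rank 2

lam, V = np.linalg.eigh(A.T @ A)    # eigenvalues in ascending order
V = V[:, ::-1]                      # reorder so lambda_1 >= ... >= lambda_n
AV = A @ V
print(np.round(AV.T @ AV, 6))       # diagonal matrix: the Av_i are orthogonal
s = np.linalg.svd(A, compute_uv=False)
r = int(np.sum(s > 1e-10))          # number of nonzero singular values
print(r, np.linalg.matrix_rank(A))  # 2 2
```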
The Singular Value Decomposition (SVD)
- THEOREM 10 (The Singular Value Decomposition). Let $A$ be an $m\times n$ matrix with rank $r$. Then there exist an $m\times n$ matrix $\Sigma$ as in (3) below, for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq...\geq\sigma_r>0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that $A=U\Sigma V^T$.
- The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
$$\Sigma=\begin{bmatrix}D&0\\0&0\end{bmatrix}\ \ \ \ (3)$$
where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$).
- The matrices U U U and V V V are not uniquely determined by A A A. The columns of U U U in such a decomposition are called left singular vectors of A A A, and the columns of V V V are called right singular vectors of A A A.
- Note: the singular values are conventionally arranged in decreasing order to ensure that $\Sigma$ is uniquely determined.
- When $A$ is a symmetric positive definite matrix, the singular value decomposition coincides with the eigenvalue decomposition.
- An eigendecomposition of $A$ gives $A=PDP^T$, where each column $\boldsymbol p_i$ of $P$ is an eigenvector of $A$ and the columns are mutually orthogonal. It is easy to verify that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, and that the square of the corresponding eigenvalue of $A$ is an eigenvalue of $A^TA$. We have therefore found an eigenvector basis $\{\boldsymbol p_1,...,\boldsymbol p_n\}$ of $A^TA$ with corresponding eigenvalues $\{\lambda_1^2,...,\lambda_n^2\}$, so $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (positive definiteness gives $\lambda_i>0$). Hence $V=P$ and $\Sigma=D$, from which $U=P$ follows.
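- A sketch of this observation on a randomly generated symmetric positive definite matrix; note the eigenvalues match the singular values (the eigenvector matrices may differ by column order and signs, so only the values are compared):
```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite by construction

lam, P = np.linalg.eigh(A)         # A = P diag(lam) P^T, lam ascending
s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(np.sort(lam)[::-1], s))   # True: sigma_i = lambda_i
```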
PROOF
- Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_r\}$, where
$$\boldsymbol u_i=\frac{1}{\left\|A\boldsymbol v_i\right\|}A\boldsymbol v_i=\frac{1}{\sigma_i}A\boldsymbol v_i$$
and
$$A\boldsymbol v_i=\sigma_i\boldsymbol u_i\ \ \ \ (1\leq i\leq r)\ \ \ \ (4)$$
- Now extend $\{\boldsymbol u_1,...,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_m\}$ of $\R^m$, and let
$$U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&...&\boldsymbol u_m\end{bmatrix},\quad V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2&...&\boldsymbol v_n\end{bmatrix}$$
By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
$$AV=\begin{bmatrix}A\boldsymbol v_1&...&A\boldsymbol v_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&...&\sigma_r\boldsymbol u_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}$$
- Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,...,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
$$U\Sigma=\begin{bmatrix}\sigma_1\boldsymbol u_1&...&\sigma_r\boldsymbol u_r&\boldsymbol 0&...&\boldsymbol 0\end{bmatrix}=AV$$
Since $V$ is an orthogonal matrix,
$$U\Sigma V^T=AVV^T=A$$
EXAMPLE 4
Find a singular value decomposition of $A=\begin{bmatrix}1&-1\\-2&2\\2&-2\end{bmatrix}$.
SOLUTION
- Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
$$\boldsymbol v_1=\begin{bmatrix}1/\sqrt2\\-1/\sqrt2\end{bmatrix},\quad\boldsymbol v_2=\begin{bmatrix}1/\sqrt2\\1/\sqrt2\end{bmatrix}$$
- Step 2. Set up $V$ and $\Sigma$:
$$V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2\end{bmatrix}=\begin{bmatrix}1/\sqrt2&1/\sqrt2\\-1/\sqrt2&1/\sqrt2\end{bmatrix},\quad\sigma_1=\sqrt{18}=3\sqrt2,\ \sigma_2=0,\quad\Sigma=\begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix}$$
- Step 3. Construct $U$. To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
$$A\boldsymbol v_1=\begin{bmatrix}2/\sqrt2\\-4/\sqrt2\\4/\sqrt2\end{bmatrix},\quad A\boldsymbol v_2=\boldsymbol 0$$
The only column found for $U$ so far is
$$\boldsymbol u_1=\frac{1}{3\sqrt2}A\boldsymbol v_1=\begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x=0$, which is equivalent to the equation $x_1-2x_2+2x_3=0$. A basis for the solution set of this equation is
$$\boldsymbol w_1=\begin{bmatrix}2\\1\\0\end{bmatrix},\quad\boldsymbol w_2=\begin{bmatrix}-2\\0\\1\end{bmatrix}$$
Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
$$\boldsymbol u_2=\begin{bmatrix}2/\sqrt5\\1/\sqrt5\\0\end{bmatrix},\quad\boldsymbol u_3=\begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$
Finally, set $U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\boldsymbol u_3\end{bmatrix}$, take $\Sigma$ and $V$ from Step 2, and write $A=U\Sigma V^T$.
- Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to note that $\{\boldsymbol u_1\}$ is an orthonormal basis for $Col\ A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must then form a basis for $(Col\ A)^\perp=Nul\ A^T$.
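- For comparison, NumPy's SVD of the same matrix (assuming the Example 4 matrix reconstructed above; NumPy may flip the signs of singular vector pairs):
```python
import numpy as np

A = np.array([[ 1., -1.],
              [-2.,  2.],
              [ 2., -2.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                   # [4.2426..., 0], i.e. 3*sqrt(2) and 0
Sigma = np.zeros((3, 2))
Sigma[0, 0] = s[0]
print(np.allclose(A, U @ Sigma @ Vt))      # True
print(np.round(np.abs(U[:, 0]), 4))        # |u_1| = [1/3, 2/3, 2/3] up to sign
```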
Some Properties
- From the preceding discussion, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $Col\ A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the $r$ left singular vectors $\boldsymbol u_1,...,\boldsymbol u_r$ of $A$ form an orthonormal basis for $Col\ A$; it follows that the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},...,\boldsymbol u_m$ of $A$ form an orthonormal basis for $Nul\ A^T$.
- Since $A^T=V\Sigma^TU^T$, the same reasoning shows that the $r$ right singular vectors $\boldsymbol v_1,...,\boldsymbol v_r$ of $A$ form an orthonormal basis for $Col\ A^T$, and the remaining $n-r$ right singular vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ of $A$ form an orthonormal basis for $Nul\ A$.
Geometric Interpretation
- From the viewpoint of linear transformations, an $m\times n$ matrix $A$ represents a linear transformation $T:\boldsymbol x\mapsto A\boldsymbol x$ from the $n$-dimensional space $\R^n$ to the $m$-dimensional space $\R^m$.
- The SVD shows that this linear transformation can be decomposed into three simple transformations: a rotation or reflection of the coordinate system, a scaling along the coordinate axes, and another rotation or reflection of the coordinate system.
An orthogonal transformation preserves vector lengths and inner products, and hence orthogonality. In other words, an orthonormal basis remains an orthonormal basis of unchanged lengths after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)
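- A small sketch of this rotate, scale, rotate picture, applied to points on the unit circle (the 2x2 matrix is an arbitrary example):
```python
import numpy as np

A = np.array([[3., 0.],
              [4., 5.]])
U, s, Vt = np.linalg.svd(A)
theta = np.linspace(0.0, 2.0 * np.pi, 9)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle

step1 = Vt @ circle          # rotate/reflect: still on the unit circle
step2 = np.diag(s) @ step1   # scale along the axes: now an axis-aligned ellipse
step3 = U @ step2            # rotate/reflect the ellipse into final position
print(np.allclose(step3, A @ circle))               # True: same as applying A directly
```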
Compact SVD and Truncated SVD
- The singular value decomposition $A=U\Sigma V^T$ given by Theorem 10 is also called the full singular value decomposition of the matrix. In practice, the compact and truncated forms of the SVD are the ones commonly used: the compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank than the original matrix.
Compact SVD
- Suppose $A$ is an $m\times n$ matrix of rank $r$. The compact singular value decomposition of $A$ is $A=U_r\Sigma_rV_r^T$, where $U_r$ consists of the first $r$ columns of $U$, $V_r$ consists of the first $r$ columns of $V$, and $\Sigma_r$ is the $r\times r$ diagonal matrix of the nonzero singular values.
Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}u_1&...&u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}v_1^T\\...\\v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1u_1&...&\sigma_ru_r&0&...&0\end{bmatrix}\begin{bmatrix}v_1^T\\...\\v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_iu_iv_i^T \\&=U_r\Sigma_rV_r^T \end{aligned}$$
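- A numerical sketch of the compact SVD (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))  # a 5x4 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ur, Sr, Vrt = U[:, :r], np.diag(s[:r]), Vt[:r, :]      # keep only the rank-r blocks
print(np.allclose(A, Ur @ Sr @ Vrt))                   # True: reconstruction is lossless
print(Ur.shape, Sr.shape, Vrt.shape)                   # (5, 2) (2, 2) (2, 4)
```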
Truncated SVD
- In the singular value decomposition of a matrix, keeping only the parts corresponding to the $k$ largest singular values ($k<r$, where $r$ is the rank of the matrix) gives the truncated singular value decomposition. In practice, references to the singular value decomposition of a matrix usually mean the truncated SVD.
SVD and Matrix Approximation
- The SVD gives the optimal approximation of a matrix under squared loss (the Frobenius norm), i.e., it performs data compression: the compact SVD corresponds to lossless compression, and the truncated SVD to lossy compression.
The Frobenius Norm
- The Frobenius norm of a matrix, $\|A\|_F=\left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{\frac{1}{2}}$, is the direct generalization of the $L_2$ norm of a vector, and corresponds to the squared loss function in machine learning.
Proof
- In general, if $Q$ is an $m\times m$ orthogonal matrix, then
$$\|QA\|_F=\|A\|_F$$
because multiplying by an orthogonal matrix preserves the length of each column $\boldsymbol a_j$ of $A$:
$$\|QA\|_F^2=\sum_{j=1}^n\|Q\boldsymbol a_j\|^2=\sum_{j=1}^n\|\boldsymbol a_j\|^2=\|A\|_F^2$$
- If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
$$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$
- Hence
$$\|A\|_F=\left\|U\Sigma V^T\right\|_F=\|\Sigma\|_F=(\sigma_1^2+\sigma_2^2+...+\sigma_n^2)^{\frac{1}{2}}$$
Optimal Approximation of a Matrix
- THEOREM (best approximation). Let $A$ be an $m\times n$ real matrix of rank $r$, and let $\mathcal M$ be the set of all $m\times n$ matrices of rank at most $k$, where $0<k<r$. Then there exists a matrix $X\in\mathcal M$ attaining $\min_{S\in\mathcal M}\|A-S\|_F$, and
$$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
- This theorem says that the SVD gives the optimal approximation of a matrix under squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD to lossy compression.
- A spectral decomposition of $A$ gives
$$A=\sum_{i=1}^n\sigma_iu_iv_i^T=\sum_{i=1}^r\sigma_iu_iv_i^T$$
In general, let the truncated SVD of $A$ be
$$A_k=\sum_{i=1}^k\sigma_iu_iv_i^T$$
Then $A_k$ has rank $k$, and $A_k$ is the optimal approximation of $A$ in the Frobenius norm among all matrices of rank $k$. Since the singular values $\sigma_i$ usually decay quickly, $A_k$ can approximate $A$ well even for small $k$.
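- A numerical sketch of both identities above: the Frobenius norm formula and the truncated-SVD approximation error (example matrix arbitrary, $k=2$):
```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 5))
U, s, Vt = np.linalg.svd(A)
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]             # truncated SVD, rank k
err = np.linalg.norm(A - Ak, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))    # True: error formula
print(np.isclose(np.linalg.norm(A, 'fro'),
                 np.sqrt(np.sum(s ** 2))))             # True: ||A||_F from sigmas
```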
Proof
- Let $X\in\mathcal M$ be a matrix attaining $\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F$. Since $A_k\in\mathcal M$, we then have
$$\|A-X\|_F\leq\|A-A_k\|_F=(\sigma_{k+1}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
It therefore suffices to prove
$$\|A-X\|_F\geq(\sigma_{k+1}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
- Let the SVD of $X$ be $Q\Omega P^T$, where $\Omega$ is an $m\times n$ “diagonal” matrix whose diagonal entries are the singular values of $X$, at most $k$ of which are nonzero. If we set $B=Q^TAP$, then $A=QBP^T$, and hence
$$\|A-X\|_F=\left\|QBP^T-Q\Omega P^T\right\|_F=\|B-\Omega\|_F$$
Partition $B$ in the same block pattern as $\Omega$, with $\Omega_k=diag(\omega_1,...,\omega_k)$:
$$B=\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix},\quad\Omega=\begin{bmatrix}\Omega_k&0\\0&0\end{bmatrix}$$
which gives
$$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2$$
We now show $B_{12}=0$ and $B_{21}=0$, arguing by contradiction. If $B_{12}\neq0$, set
$$Y=Q\begin{bmatrix}B_{11}&B_{12}\\0&0\end{bmatrix}P^T$$
Then $Y\in\mathcal M$, and
$$\|A-Y\|_F^2=\|B_{21}\|_F^2+\|B_{22}\|_F^2<\|A-X\|_F^2$$
This contradicts the defining property of $X$, proving $B_{12}=0$. Similarly $B_{21}=0$. Thus
$$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
Next we show $B_{11}=\Omega_k$. To this end, set
$$Z=Q\begin{bmatrix}B_{11}&0\\0&0\end{bmatrix}P^T$$
Then $Z\in\mathcal M$, and
$$\|A-Z\|_F^2=\|B_{22}\|_F^2\leq\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2=\|A-X\|_F^2$$
By the minimality of $X$, equality must hold, and therefore
$$\|B_{11}-\Omega_k\|_F^2=0$$
i.e. $B_{11}=\Omega_k$. Finally, consider $B_{22}$. If the $(m-k)\times(n-k)$ submatrix $B_{22}$ has the singular value decomposition $U_1\Lambda V_1^T$, then
$$\|A-X\|_F=\|B_{22}\|_F=\|\Lambda\|_F$$
We now show that the diagonal entries of $\Lambda$ are singular values of $A$. To this end, set
$$U_2=\begin{bmatrix}I_k&0\\0&U_1\end{bmatrix},\quad V_2=\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix}$$
where $I_k$ is the $k\times k$ identity matrix. Then
$$\begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix}\begin{bmatrix}\Omega_k&0\\0&U_1\Lambda V_1^T\end{bmatrix}\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix} \end{aligned}$$
Therefore
$$A=(QU_2)\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix}(PV_2)^T$$
This is a singular value decomposition of $A$ (up to ordering), so the diagonal entries of $\Lambda$ are singular values of $A$. Hence
$$\|A-X\|_F=\|\Lambda\|_F\geq(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
This completes the proof that
$$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+...+\sigma_n^2)^{\frac{1}{2}}$$
Applications of the Singular Value Decomposition
- The next few exercises show some interesting facts.
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
- [Hint: Use the SVD to compute $A^TA$ and $AA^T$.] Indeed, $A^TA=V\Sigma^TU^TU\Sigma V^T=V(\Sigma^T\Sigma)V^T$ and $AA^T=U(\Sigma\Sigma^T)U^T$, which are orthogonal diagonalizations with diagonal entries $\sigma_i^2$.
EXERCISE 25
Let $T:\R^n\mapsto\R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\R^n$ and a basis $\mathcal C$ for $\R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m\times n$ “diagonal” matrix.
SOLUTION
- Consider the SVD for the standard matrix of $T$, say, $A=U\Sigma V^T$. Let $\mathcal B=\{\boldsymbol v_1,…,\boldsymbol v_n\}$ and $\mathcal C=\{\boldsymbol u_1,…,\boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j=\boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
$$T(\boldsymbol v_j)=A\boldsymbol v_j=U\Sigma V^T\boldsymbol v_j=U\Sigma\boldsymbol e_j=\sigma_jU\boldsymbol e_j=\sigma_j\boldsymbol u_j$$
So $[T(\boldsymbol v_j)]_{\mathcal C}=\sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
Polar Decomposition (极分解)
- Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A=PQ$, where $P$ is an $n\times n$ positive semidefinite matrix with the same rank as $A$, and $Q$ is an $n\times n$ orthogonal matrix.
Proof
- [Hint: Use a singular value decomposition, $A=U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric matrix.]
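- A numerical sketch of this construction, following the hint (the example matrix is random):
```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T     # symmetric positive semidefinite, same rank as A
Q = U @ Vt                   # product of orthogonal matrices, hence orthogonal
print(np.allclose(A, P @ Q))                    # True: A = PQ
print(np.allclose(Q @ Q.T, np.eye(4)))          # True: Q is orthogonal
print(np.all(np.linalg.eigvalsh(P) >= -1e-10))  # True: P is PSD
```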
Estimating the Rank of a Matrix
- See Theorem 9: the rank of $A$ equals the number of nonzero singular values of $A$. In numerical work, $rank\ A$ is estimated by counting the singular values that exceed a small tolerance.
The Condition Number (条件数)
- Most numerical calculations involving an equation $A\boldsymbol x=\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
- If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x=\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x=\boldsymbol b$.)
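- A small sketch with a nearly singular matrix; NumPy's built-in `np.linalg.cond` computes exactly this ratio for the 2-norm:
```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])   # nearly singular, hence badly conditioned
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])             # ~4e4: sigma_1 / sigma_n
print(np.linalg.cond(A, 2))     # the same number from NumPy's built-in
```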
Bases for Fundamental Subspaces
- Given an SVD for an $m\times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\ \ \ \ (5)$$
is an orthonormal basis for $Col\ A$.
- Recall that $(Col\ A)^\perp=Nul\ A^T$. Hence
$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\ \ \ \ (6)$$
is an orthonormal basis for $Nul\ A^T$.
- Since $\left\|A\boldsymbol v_i\right\|=\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i>r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $Nul\ A$ of dimension $n-r$. By the Rank Theorem, $\dim Nul\ A=n-rank\ A=n-r$. It follows that
$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\ \ \ \ (7)$$
is an orthonormal basis for $Nul\ A$.
- $(Nul\ A)^\perp=Col\ A^T=Row\ A$. Hence, from (7),
$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\ \ \ \ (8)$$
is an orthonormal basis for $Row\ A$.
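- A numerical sketch of (5)–(8): reading off orthonormal bases for the four fundamental subspaces from one full SVD (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))  # a 4x3 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
col_A  = U[:, :r]       # (5): orthonormal basis for Col A
nul_AT = U[:, r:]       # (6): orthonormal basis for Nul A^T
nul_A  = Vt[r:, :].T    # (7): orthonormal basis for Nul A
row_A  = Vt[:r, :].T    # (8): orthonormal basis for Row A
print(np.allclose(A.T @ nul_AT, 0))   # True: these columns lie in Nul A^T
print(np.allclose(A @ nul_A, 0))      # True: these columns lie in Nul A
```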
- Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $Col\ A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i=\sigma_i\boldsymbol u_i$ for $1\leq i\leq r$.
- The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
Reduced SVD and the Pseudoinverse of $A$
- When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r=rank\ A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
$$U=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix},\quad V=\begin{bmatrix}V_r&V_{n-r}\end{bmatrix}$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
$$A=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix}=U_rDV_r^T$$
- This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (伪逆) (also, the Moore–Penrose inverse (穆尔-彭罗斯逆)) of $A$:
$$A^+=V_rD^{-1}U_r^T$$
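- A sketch of this pseudoinverse built from the reduced SVD, checked against NumPy's built-in `np.linalg.pinv` (the rank-2 example matrix is arbitrary):
```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))     # a 4x3 matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
A_plus = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T  # V_r D^{-1} U_r^T
print(np.allclose(A_plus, np.linalg.pinv(A)))             # True
print(np.allclose(A @ A_plus @ A, A))                     # True: A A+ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))           # True: A+ A A+ = A+
```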
- The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
- Verify the following properties of $A^+$:
a. For each $\boldsymbol y$ in $\R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $Col\ A$.
b. For each $\boldsymbol x$ in $\R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $Row\ A$.
c. $AA^+A=A$ and $A^+AA^+=A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x=\boldsymbol b$ is consistent, and let $\boldsymbol x^+=A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $Row\ A$ such that $A\boldsymbol p=\boldsymbol b$. The following steps prove that $\boldsymbol x^+=\boldsymbol p$ and that $\boldsymbol x^+$ is the minimum length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $Row\ A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x=\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x=\boldsymbol b$, then $\left\|\boldsymbol x^+\right\|\leq\left\|\boldsymbol u\right\|$, with equality only if $\boldsymbol u=\boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $Row\ A$, $\boldsymbol x^+$ is a linear combination of this orthonormal basis. Thus $\boldsymbol x^+$ is in $Row\ A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol x=A\boldsymbol x=\boldsymbol b$, using property (c) of Exercise 12.
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $Row\ A$. …
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x=\hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $Col\ A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x=\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x}=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$$
Then, from the reduced SVD, $A\hat{\boldsymbol x}=(U_rDV_r^T)(V_rD^{-1}U_r^T\boldsymbol b)=U_rU_r^T\boldsymbol b$, and $U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $Col\ A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x=\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
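- A numerical sketch of Example 8, using NumPy's `pinv` and `lstsq` on a random overdetermined system:
```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
x_hat = np.linalg.pinv(A) @ b                   # x_hat = A+ b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # NumPy's least-squares solution
print(np.allclose(x_hat, x_ls))                 # True
# The residual b - A x_hat is orthogonal to Col A, as least-squares requires.
print(np.allclose(A.T @ (b - A @ x_hat), 0))    # True
```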
Ref
- 《统计学习方法》
- *Linear Algebra and Its Applications*