Note: this is a repost of an English article on the SVD; the machine translation accompanying it was never proofread, so consult the original if anything looks off. Because of CSDN's length limit the article is split into two posts; this is the second. Part 1:
- 线性代数 · SVD | 令人困扰的精度 1-优快云博客
  https://blog.youkuaiyun.com/u013669912/article/details/152056616
Properties
Singular value decomposition has lots of useful properties, some of which we'll prove here. First, note that taking the transpose of a singular value decomposition $M = U \Sigma V^{T}$ gives another singular value decomposition

$$M^{T} = V \Sigma^{T} U^{T}$$

showing that $M^{T}$ has the same singular values as $M$, but with the left and right singular vectors swapped. This can be proven more conceptually as follows.
Key lemma #2: Write $B(u, v) = \langle u, Mv \rangle = \langle M^{T} u, v \rangle$. Then for every $1 \leq i \leq r$, the left and right singular vectors $u_{i}$, $v_{i}$ maximize the value of $B(u, v)$ subject to the constraint that $\|u\| = \|v\| = 1$, $u$ is orthogonal to $u_{j}$ for all $j \leq i - 1$, and $v$ is orthogonal to $v_{j}$ for all $j \leq i - 1$. This maximum value is $\sigma_{i}$.
Proof. At the maximum value of $B(u, v)$ subject to the above constraints, if we fix $v$ then $B(\cdot, v)$ takes its maximum value at $u$. But for fixed $v$, $B(u, v) = \langle u, Mv \rangle$ uniquely takes its maximum value when $u$ is proportional to $Mv$ (if $Mv \neq 0$), hence must in fact be equal to $\frac{Mv}{\|Mv\|}$; moreover, this is always possible thanks to key lemma #1. So we are in fact maximizing

$$\left\langle \frac{Mv}{\|Mv\|}, Mv \right\rangle = \|Mv\|$$

subject to the above constraints, and we already know the solution is given by $v = v_{i}$.
Left-right symmetry: Let $\sigma_{i}$, $u_{i}$, $v_{i}$ be the singular values, left singular vectors, and right singular vectors of $M$ as above. Then $\sigma_{i}$, $v_{i}$, $u_{i}$ are the singular values, left singular vectors, and right singular vectors of $M^{T}$. In particular, $M^{T} u_{i} = \sigma_{i} v_{i}$.
Proof. Apply key lemma #2 to $M^{T}$, and note that $B(u, v)$ is the same for $M$ and $M^{T}$, just with the roles of $u$ and $v$ switched.
Singular = eigen: The left singular vectors $u_{i}$ are the eigenvectors of $M M^{T}$ corresponding to its nonzero eigenvalues, which are $\sigma_{i}^{2}$ for $1 \leq i \leq r$. The right singular vectors $v_{i}$ are the eigenvectors of $M^{T} M$ corresponding to its nonzero eigenvalues, which are also $\sigma_{i}^{2}$ for $1 \leq i \leq r$.
Proof. We now know that $M v_{i} = \sigma_{i} u_{i}$ and that $M^{T} u_{i} = \sigma_{i} v_{i}$, hence

$$M^{T} M v_{i} = M^{T}(\sigma_{i} u_{i}) = \sigma_{i}^{2} v_{i}$$

and

$$M M^{T} u_{i} = M(\sigma_{i} v_{i}) = \sigma_{i}^{2} u_{i}.$$

Hence $v_{i}$, $u_{i}$ are orthonormal eigenvectors of $M^{T} M$, $M M^{T}$ respectively. Moreover, these matrices have rank at most (in fact exactly) $r$, so this exhausts all eigenvectors corresponding to nonzero eigenvalues.
This gives an alternative route to understanding singular value decomposition which comes from writing $\|Mv\|^{2}$ as

$$\|Mv\|^{2} = \langle Mv, Mv \rangle = \langle v, M^{T} M v \rangle$$

and then applying the spectral theorem (https://en.wikipedia.org/wiki/Spectral_theorem) to $M^{T} M$ to diagonalize it. But I think it's worth knowing that there's a route to singular value decomposition which is independent of the spectral theorem.
In addition to the above algebraic characterization of singular values, the singular values also admit the following variational characterization.
Variational characterizations of singular values (Courant-Fischer): We have

$$\sigma_{k} = \max_{V \subseteq \mathbb{R}^{m}, \dim V = k} \min_{v \in V, \|v\| = 1} \|Mv\|$$

and

$$\sigma_{k + 1} = \min_{V \subseteq \mathbb{R}^{m}, \dim V = m - k} \max_{v \in V, \|v\| = 1} \|Mv\|.$$
Proof. For the first characterization, any $k$-dimensional subspace $V$ intersects $\mathrm{span}(v_{k}, \dots, v_{m})$ nontrivially, hence contains a unit vector of the form

$$v = \sum_{i = k}^{m} c_{i} v_{i}, \qquad \|v\|^{2} = \sum_{i = k}^{m} c_{i}^{2} = 1.$$

We compute that

$$Mv = \sum_{i = k}^{m} c_{i} \sigma_{i} u_{i}$$

and hence that

$$\|Mv\|^{2} = \sum_{i = k}^{m} c_{i}^{2} \sigma_{i}^{2} \leq \sigma_{k}^{2}.$$

We conclude that every $V$ contains a $v$ such that $\|Mv\| \leq \sigma_{k}$, hence $\min_{v \in V, \|v\| = 1} \|Mv\| \leq \sigma_{k}$. Equality is obtained when $V = \mathrm{span}(v_{1}, \dots, v_{k})$.
The second characterization is very similar. Any $(m - k)$-dimensional subspace $V$ intersects $\mathrm{span}(v_{1}, \dots, v_{k + 1})$ nontrivially, hence contains a unit vector of the form

$$v = \sum_{i = 1}^{k + 1} c_{i} v_{i}, \qquad \|v\|^{2} = \sum_{i = 1}^{k + 1} c_{i}^{2} = 1.$$

We compute that

$$Mv = \sum_{i = 1}^{k + 1} c_{i} \sigma_{i} u_{i}$$

and hence that

$$\|Mv\|^{2} = \sum_{i = 1}^{k + 1} c_{i}^{2} \sigma_{i}^{2} \geq \sigma_{k + 1}^{2}.$$

We conclude that every $V$ contains a vector $v$ such that $\|Mv\| \geq \sigma_{k + 1}$, hence $\max_{v \in V, \|v\| = 1} \|Mv\| \geq \sigma_{k + 1}$. Equality is obtained when $V = \mathrm{span}(v_{k + 1}, \dots, v_{m})$.
The second variational characterization above can be used to prove the following important theorem.
Low rank approximation (Eckart-Young): If $M = U \Sigma V^{T}$ is the SVD of $M$, let $M_{k} = U \Sigma_{k} V^{T}$ where $\Sigma_{k}$ has diagonal entries $\sigma_{1}, \dots, \sigma_{k}$ and all other entries zero. Then $M_{k}$ is the closest matrix to $M$ in operator norm with rank at most $k$; that is, $M_{k}$ minimizes $\|M - X\|$ subject to the constraint that $\mathrm{rank}(X) \leq k$. This minimum value is $\sigma_{k + 1}$.
Proof. Suppose $X$ is a matrix of rank at most $k$. Let $W = \ker(X)$ be the nullspace of $X$, which by hypothesis has dimension at least $m - k$. By the second variational characterization above, this means that $W$ contains a vector $w$ such that $\|Mw\| \geq \sigma_{k + 1}$, and since $Xw = 0$ this gives

$$\|(M - X)w\| = \|Mw\| \geq \sigma_{k + 1}$$

and hence that $\|M - X\| \geq \sigma_{k + 1}$. Equality is obtained when $X = M_{k}$ as defined above.
The variational characterizations can also be used to prove the following inequality relating the singular values of two matrices and of their sum, which can be thought of as a quantitative refinement of the observation that the rank of a sum $M + N$ of two matrices is at most the sum of their ranks.
Additive perturbation (Weyl): Let $M$, $N$ be $n \times m$ matrices with singular values $\sigma_{i}(M)$, $\sigma_{i}(N)$. Then

$$\sigma_{k + \ell + 1}(M + N) \leq \sigma_{k + 1}(M) + \sigma_{\ell + 1}(N).$$
Proof. We want to bound $\sigma_{k + \ell + 1}(M + N)$ in terms of the singular values of $M$ and $N$. By the second variational characterization, we have

$$\sigma_{k + \ell + 1}(M + N) = \min_{V \subseteq \mathbb{R}^{m}, \dim V = m - k - \ell} \max_{v \in V, \|v\| = 1} \|(M + N)v\|.$$

To give an upper bound on a minimum value of a function, we just need to give an upper bound on some value that it takes. Let $V_{M}$ and $V_{N}$ be the subspaces of $\mathbb{R}^{m}$ of dimensions $m - k$, $m - \ell$ respectively which achieve the minimum values of $\max_{v \in V_{M}, \|v\| = 1} \|Mv\|$ and $\max_{v \in V_{N}, \|v\| = 1} \|Nv\|$ respectively, and let $W = V_{M} \cap V_{N}$ be their intersection. This intersection has dimension at least $m - k - \ell$, and by construction

$$\max_{v \in W, \|v\| = 1} \|Mv + Nv\| \leq \max_{v \in W, \|v\| = 1} \|Mv\| + \max_{v \in W, \|v\| = 1} \|Nv\| \leq \sigma_{k + 1}(M) + \sigma_{\ell + 1}(N).$$

Since $W$ has dimension at least $m - k - \ell$, the above is an upper bound on the value of $\max_{v \in V, \|v\| = 1} \|(M + N)v\|$ for any $(m - k - \ell)$-dimensional subspace $V \subseteq W$, from which the conclusion follows.
The slightly curious off-by-one indexing in the above inequality can be understood as follows: if $\sigma_{k + 1}(M)$ and $\sigma_{\ell + 1}(N)$ are both very small, this means that $M$ and $N$ are close to matrices of rank at most $k$ and $\ell$ respectively, and hence $M + N$ is close to a matrix of rank at most $k + \ell$, hence $\sigma_{k + \ell + 1}(M + N)$ also ought to be small.
Setting $\ell = 0$ in the additive perturbation inequality we deduce the following corollary.
Singular values are Lipschitz: The singular values, as functions on matrices, are uniformly Lipschitz with respect to the operator norm with Lipschitz constant 1: that is,

$$\left|\sigma_{k}(M) - \sigma_{k}(N)\right| \leq \|M - N\|.$$
Proof. Apply additive perturbation twice with $\ell = 0$, first to get

$$\sigma_{k}(M) \leq \sigma_{k}(N) + \sigma_{1}(M - N)$$

(remembering that $\sigma_{1}$ is the operator norm), and second to get

$$\sigma_{k}(N) \leq \sigma_{k}(M) + \sigma_{1}(N - M)$$

(remembering that $\sigma_{1}(N - M) = \sigma_{1}(M - N)$). Combining these two inequalities gives $\left|\sigma_{k}(M) - \sigma_{k}(N)\right| \leq \|M - N\|$.
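The Lipschitz bound is easy to observe on a small perturbation. A NumPy sketch (not in the original post; the matrix, seed, and perturbation size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
N = M + 0.01 * rng.standard_normal((4, 4))

s_M = np.linalg.svd(M, compute_uv=False)
s_N = np.linalg.svd(N, compute_uv=False)
gap = np.linalg.norm(M - N, ord=2)  # operator norm of the perturbation

# No singular value moves by more than the operator norm of the perturbation.
assert np.all(np.abs(s_M - s_N) <= gap + 1e-10)
```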
This is very much not the case with eigenvalues: a small perturbation of a square matrix can have a large effect on its eigenvalues. This is explained e.g. in this blog post by Terence Tao, and is related to pseudospectra.
Setting $\sigma_{\ell + 1}(N) = 0$ (or equivalently $\mathrm{rank}(N) \leq \ell$) in the additive perturbation inequality, we deduce the following corollary.
Interlacing: Suppose $M$, $N$ are matrices such that $\mathrm{rank}(M - N) \leq \ell$. Then

$$\sigma_{k + \ell}(M) \leq \sigma_{k}(N) \leq \sigma_{k - \ell}(M).$$

(Here, we take $\sigma_{i}(A) = 0$ if $i > \mathrm{rank}(A)$ and $\sigma_{i}(A) = \infty$ if $i < 1$ for any matrix $A$.)
Proof. Apply additive perturbation twice, first to get

$$\sigma_{k + \ell}(M) \leq \sigma_{k}(N) + \sigma_{\ell + 1}(M - N) = \sigma_{k}(N)$$

(since $\sigma_{\ell + 1}(M - N) = 0$ because $\mathrm{rank}(M - N) \leq \ell$), and second to get

$$\sigma_{k}(N) \leq \sigma_{k - \ell}(M) + \sigma_{\ell + 1}(N - M) = \sigma_{k - \ell}(M)$$

(since $\sigma_{\ell + 1}(N - M) = \sigma_{\ell + 1}(M - N) = 0$). This completes the proof.
Interlacing gives us some control over what happens to the singular values under a low-rank perturbation (as opposed to a low-norm perturbation; a low-rank perturbation may have arbitrarily high norm, and vice versa). For example, we learn that if all of the singular values of $M$ are clumped together, then a rank-$\ell$ perturbation will keep most of the singular values clumped together, except possibly for either the $\ell$ largest or $\ell$ smallest singular values. We can't expect any control over these, since in the worst case a rank-$\ell$ perturbation can make the $\ell$ largest singular values arbitrarily large, or make the $\ell$ smallest singular values arbitrarily small.
A particular special case of a low-rank perturbation is deleting a small number of rows or columns (note that a row or column which is entirely zero does not affect the singular values, so deleting a row or column is equivalent to setting all of its entries to zero), in which case the upper bound above can be tightened.
Cauchy interlacing: Suppose $M$ is a matrix and $N$ is obtained from $M$ by deleting at most $\ell$ rows. Then

$$\sigma_{k}(M) \geq \sigma_{k}(N) \geq \sigma_{k + \ell}(M).$$
Proof. The lower bound follows from interlacing (since deleting $\ell$ rows is a rank-$\ell$ perturbation). The upper bound follows from the observation that we have $\|Nv\| \leq \|Mv\|$ for all $v$, then applying either variational characterization of the singular values.
Cauchy interlacing also applies to deleting columns, or combinations of rows and columns, because the singular values are unchanged by transposition. In particular, we learn that if $N$ is obtained from $M$ by deleting either a single row or a single column, then the singular values of $N$ interlace with the singular values of $M$, hence the name.

In particular, if all of the singular values of $M$ are clumped together then so are those of $N$, with no exceptions. Taking the contrapositive, if the singular values of $N$ are spread out, then the singular values of $M$ must be as well.
Three special cases
Three special cases of the general singular value decomposition $M = U \Sigma V^{T}$ are worth pointing out.
First, if $M$ has orthogonal columns, or equivalently if $M^{T} M$ is diagonal, then the singular values $\sigma_{i}$ are the lengths of its columns, we can take the right singular vectors to be the standard basis vectors $v_{i} = e_{i}$, and we can take the left singular vectors to be the unit rescalings of its columns. This means that we can take $V = I$ to be the identity matrix, and in general suggests that $\|I - V\|$ is a measure of the extent to which the columns of $M$ fail to be orthogonal (with the caveat that $V$ is not unique and so in general we would want to look at the $V$ closest to $I$).
Second, if $M$ has orthogonal rows, or equivalently if $M M^{T}$ is diagonal, then the singular values $\sigma_{i}$ are the lengths of its rows, we can take the left singular vectors to be the standard basis vectors $u_{i} = e_{i}$, and we can take the right singular vectors to be the unit rescalings of its rows. This means that we can take $U = I$ to be the identity matrix, and in general suggests that $\|I - U\|$ is a measure of the extent to which the rows of $M$ fail to be orthogonal (with the same caveat as above).
Finally, if $M$ is square and an orthogonal matrix, so that $M^{T} M = M M^{T} = I$, then the singular values $\sigma_{i}$ are all equal to 1, and an arbitrary choice of either the left or the right singular vectors uniquely determines the other. This means that we can take $\Sigma = I$ to be the identity matrix, and in general suggests that $\|I - \Sigma\|$ is a measure of the extent to which $M$ fails to be orthogonal. In fact it is possible to show that the closest orthogonal matrix to $M = U \Sigma V^{T}$ is given by $U V^{T}$, or in other words by replacing all of the singular values of $M$ with 1, so

$$\|I - \Sigma\| = \max_{i} \left|1 - \sigma_{i}\right|$$

is precisely the distance from $M$ to the nearest orthogonal matrix. This fact can be used to solve the orthogonal Procrustes problem.
In general, we should expect that the SVD of a matrix $M$ is relevant to answering any question about $M$ whose answer is invariant under left and right multiplication by orthogonal matrices. This includes, for example, the question of low-rank approximations to $M$ with respect to operator norm we answered above, since both rank and operator norm are invariant.
Posted in math.SP | 4 Comments
4 Responses
Ramsay
on March 18, 2017 at 11:17 am | Reply
Btw. In the proof of interlacing, I don't see the first displayed equality: $\sigma_{k}(N) + \sigma_{\ell+1}(M-N) = \sigma_{k+\ell}(N)$. What am I missing?
Qiaochu Yuan
on March 18, 2017 at 2:14 pm | Reply
Oops, that's a typo; it should just be $\sigma_{k}(N)$ on the RHS.
on March 18, 2017 at 11:14 am | Reply
Thanks for that concise and clear introduction to the SVD. I do not understand why it is often not even touched in a first class in linear algebra. It seems to me that it would make sense to introduce it even before the spectral theorem.
Regarding "weighted projections": up to a scale factor (i.e., a single weight $\sigma_{1}$), you can view a linear transformation $T: X \to Y$ between Euclidean spaces as an orthogonal projection. Specifically, if $X$ and $Y$ are subspaces of $\mathbb{R}^{N}$ of dimension $n$ and $m$ respectively, and $P_{Y|X}$ is the orthogonal projection onto $Y$ restricted to $X$, then the singular values of $P_{Y|X}$ are the cosines of the principal angles. If $N \geq m + n$, then these singular values can take any value in $[0, 1]$. So we see that any $T: X \to Y$ can be represented as $\sigma_{1}(T) P_{Y|X}$ for an appropriate choice of $X, Y$ as subspaces of $\mathbb{R}^{m + n - 1}$.
Also, regarding the best orthogonal transformation to represent $M$, it is worth pointing out that you are talking about the orthogonal factor in the polar decomposition, which is an immediate consequence of the SVD. We can always represent our matrix $M$ as a composition of an orthogonal matrix and a positive semidefinite matrix: $M = \Phi R = R' \Phi$ where $\Phi = U V^{T}$, $R = V \Sigma V^{T}$, and $R' = U \Sigma U^{T}$.
Ammar Husain
on March 15, 2017 at 2:57 am | Reply
Thinking of the QR algorithm and the Toda lattice, you gain when you swap and refactor repeatedly as dressing transformations. Haven’t thought about what if anything reasonable happens when you permute the factors in a square SVD.
via:
- Singular value decomposition | Annoying Precision
https://qchu.wordpress.com/2017/03/13/singular-value-decomposition/