A tutorial on computing the full SVD of a matrix the easy way. A handy shortcut for computing "left" eigenvectors is described.
Dr. E. Garcia
Mi Islita.com
Last Update: 09/11/06
Topics
Revisiting Singular Values
Computing "Right" Eigenvectors
Computing "Left" Eigenvectors
Computing the Full SVD
The Reduced SVD
Summary
Tutorial Review
References
Revisiting Singular Values
In Part 2 of this tutorial you learned that SVD decomposes a matrix A into three matrices
Equation 1: A = USV^T
S was computed by the following procedure:
- A^T and A^TA were computed.
- the eigenvalues of A^TA were determined and sorted in descending order, in the absolute sense. The nonnegative square roots of these are the singular values of A.
- S was constructed by placing singular values in descending order along its diagonal.
You learned that the Rank of a Matrix is the number of nonzero singular values.
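To make these steps concrete, here is a minimal numpy sketch. The matrix A below is a hypothetical 2 x 2 example chosen so that A^TA has eigenvalues 40 and 10, the values used throughout this tutorial; it is not necessarily the matrix from Part 2.

```python
import numpy as np

# Hypothetical example matrix, chosen so that A^T A has eigenvalues 40 and 10.
A = np.array([[5.0, 3.0],
              [0.0, 4.0]])

AtA = A.T @ A                                      # A^T A
eigenvalues = np.linalg.eigvalsh(AtA)              # eigenvalues of the symmetric matrix A^T A
eigenvalues = np.sort(np.abs(eigenvalues))[::-1]   # descending order, in the absolute sense

singular_values = np.sqrt(eigenvalues)             # nonnegative square roots
S = np.diag(singular_values)                       # singular values along the diagonal of S

rank = int(np.sum(singular_values > 1e-12))        # rank = number of nonzero singular values
print(S)      # approximately [[6.3246, 0], [0, 3.1623]]
print(rank)   # 2
```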
You also learned that since S is a diagonal matrix, its nondiagonal elements are equal to zero. This can be verified by computing S from U^TAV. However, one would need to know U first, which we have not defined yet. Either way, the alternate expression for S is obtained by postmultiplying Equation 1 by V and premultiplying by U^T:
Equation 2: AV = USV^TV = US
Equation 3: U^TAV = S
These steps rely on U and V being orthogonal matrices. As discussed in Matrix Tutorial 2: Basic Matrix Operations, if a matrix M is orthogonal then
Equation 4: MM^T = M^TM = I
where I is the identity matrix. But we also know that MM^-1 = I. Consequently, M^T = M^-1.
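As a small numerical check of Equation 4 (a sketch only; the rotation matrix below is just an assumed example of an orthogonal matrix):

```python
import numpy as np

# An example orthogonal matrix: a rotation by 45 degrees.
M = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4)],
              [np.sin(np.pi / 4),  np.cos(np.pi / 4)]])

I = np.eye(2)
print(np.allclose(M @ M.T, I))              # M M^T = I
print(np.allclose(M.T @ M, I))              # M^T M = I
print(np.allclose(M.T, np.linalg.inv(M)))   # hence M^T = M^-1
```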
Computing "right" eigenvectors, V, and VT
In the example given in Part 2 you learned that the eigenvalues of AA^T and A^TA are identical since both satisfy the same characteristic equation:
Figure 1. Characteristic equation and eigenvalues for AA^T and A^TA.
Let's use these eigenvalues to compute the eigenvectors of A^TA. This is done by solving
Equation 5: (A^TA - c_i I)X_i = 0
As mentioned in Matrix Tutorial 3: Eigenvalues and Eigenvectors, for large matrices one would need to resort to the Power Method or other methods to do this. Fortunately, in this case we are dealing with a small matrix, so simple algebra suffices.
We first compute an eigenvector for each eigenvalue, c1 = 40 and c2 = 10. Once computed, we convert the eigenvectors to unit vectors by normalizing their lengths. Figure 2 illustrates these steps.
Figure 2. Right eigenvectors of A^TA
We would have arrived at identical results if during normalization we had assumed an arbitrary coordinate value for either x1 or x2. We now construct V by placing these vectors along its columns and compute V^T.
Figure 3. V and its transpose V^T
Hey! That wasn't that hard.
Note that we constructed V by preserving the order in which the singular values were placed along the diagonal of S. That is, we placed the eigenvector of the largest eigenvalue in the first column and the second eigenvector in the second column. These end up paired with the singular values placed along the diagonal of S. Preserving the order in which singular values, eigenvalues and eigenvectors are placed in their corresponding matrices is very important. Otherwise we end up with the wrong SVD.
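In code, the same steps look roughly as follows, again using the hypothetical example matrix introduced earlier. Note how the eigenpairs are sorted so that the columns of V pair up with the singular values on the diagonal of S.

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [0.0, 4.0]])              # hypothetical example; A^T A has eigenvalues 40 and 10

AtA = A.T @ A
eigvals, eigvecs = np.linalg.eigh(AtA)  # eigh returns unit-length eigenvectors as columns

# Sort eigenpairs by descending eigenvalue so they pair up with the diagonal of S.
order = np.argsort(np.abs(eigvals))[::-1]
eigvals = eigvals[order]
V = eigvecs[:, order]                   # "right" eigenvectors along the columns of V
Vt = V.T

print(eigvals)   # [40. 10.]
print(V)         # unit columns, roughly [0.7071, 0.7071] and [0.7071, -0.7071] (signs may flip)
print(Vt)
```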
Let's now compute the "left" eigenvectors and U.
Computing "left" eigenvectors and U
To compute U we could reuse the eigenvalues and compute, in exactly the same manner, the eigenvectors of AA^T. Once these are computed we would place them along the columns of U. However, with large matrices this is time consuming. In fact, one would need to compute the eigenvectors by resorting again to the Power Method or other suitable methods.
In practice, it is common to use the following shortcut. Postmultiply Equation 2 by S^-1 to obtain
Equation 6: AVS^-1 = USS^-1
Equation 7: U = AVS^-1
and then compute U. Since A and V are already known, we just need to invert S. Since S is a diagonal matrix, it follows that
Figure 4. The inverse of the singular value matrix, S^-1.
Since s1 = 40^1/2 = 6.3246 and s2 = 10^1/2 = 3.1623 (expressed to four decimal places), then
Figure 5. "Left" eigenvectors and U.
That was quite a mechanical task. Huh?
This shortcut is very popular since it simplifies calculations. Unfortunately, its widespread use has resulted in many overlooking important information contained in the AA^T matrix. In recent years, LSI researchers have found that high-order term-term co-occurrence patterns contained in this matrix might be important. At least two studies (1, 2), one a 2005 thesis, indicate that the high-order term-term co-occurrence present in this matrix might be at the heart of LSI.
These studies are:
- Understanding LSI via the Truncated Term-term Matrix
- A Framework for Understanding Latent Semantic Indexing (LSI) Performance
In the first issue of our IR Watch - The Newsletter - which is free - our subscribers learned about this thesis and other equally interesting LSI resources.
The orthogonal nature of the V and U matrices is evident by inspecting their eigenvectors. This can be demonstrated by computing dot products between column vectors: all of these dot products are equal to zero. A visual inspection is also possible in this case. In Figure 6 we have plotted the eigenvectors. Observe that they are all orthogonal and end up to the right and left of each other; hence the reference to these as "right" and "left" eigenvectors.
Figure 6. "Right" and "Left" Eigenvectors.
Computing the Full SVD
So, we finally know U, S, V and V^T. To complete the proof, we reconstruct A by computing its full SVD.
Figure 7. Computing the full SVD.
So as we can see, SVD is a straightforward matrix decomposition and reconstruction technique.
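For completeness, here is the reconstruction step in code, this time using numpy's built-in SVD as a cross-check; A is the same hypothetical example used above.

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [0.0, 4.0]])          # hypothetical example matrix

U, s, Vt = np.linalg.svd(A)         # numpy returns U, the singular values, and V^T directly
S = np.diag(s)

A_rebuilt = U @ S @ Vt              # Equation 1: A = U S V^T
print(np.allclose(A, A_rebuilt))    # True: the full SVD reconstructs A (up to rounding)
```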
The Reduced SVD
Obtaining an approximation of the original matrix is quite easy. This is done by truncating the three matrices obtained from a full SVD. Essentially we keep the first k columns of U, the first k rows of V^T and the first k rows and columns of S; that is, the first k singular values. This removes noisy dimensions and exposes the effect of the largest k singular values on the original data. This effect is hidden, masked, latent, in the full SVD.
The reduction process is illustrated in Figure 8 and is often referred to as "computing the reduced SVD", dimensionality reduction or the Rank k Approximation.
Figure 8. The Reduced SVD or Rank k Approximation.
The shaded areas indicate the part of the matrices retained. The approximated matrix Ak is the Rank k Approximation of the original matrix and is defined as
Equation 8: Ak = UkSkVk^T
So, once these matrices are approximated we simply compute their products to get Ak.
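A sketch of the truncation step, assuming a small made-up 5 x 4 matrix standing in for a term-document matrix and k = 2. Only the shaded parts of Figure 8 are kept: the first k columns of U, the first k singular values, and the first k rows of V^T.

```python
import numpy as np

# A small made-up matrix (5 x 4), standing in for a term-document matrix.
A = np.array([[1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 1., 1., 1.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Uk  = U[:, :k]          # first k columns of U
Sk  = np.diag(s[:k])    # first k singular values
Vtk = Vt[:k, :]         # first k rows of V^T

Ak = Uk @ Sk @ Vtk      # Equation 8: the Rank k Approximation of A
print(np.round(Ak, 2))
```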
Quite easy. Huh?
Summary
So far we have learned that the full SVD of a matrix A can be computed by the following procedure:
- compute its transpose A^T and A^TA.
- determine the eigenvalues of A^TA and sort these in descending order, in the absolute sense. Take the square roots of these to obtain the singular values of A.
- Construct diagonal matrix S by placing the singular values in descending order along its diagonal. Compute its inverse, S^-1.
- use the ordered eigenvalues from step 2 and compute the eigenvectors of A^TA. Place these eigenvectors along the columns of V and compute its transpose, V^T.
- Compute U as U = AVS^-1. To complete the proof, compute the full SVD using A = USV^T.
These steps are also summarized in our Singular Value Decomposition (SVD) - A Fast Track Tutorial.
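Wrapping the five steps above into a single routine gives something like the sketch below. It follows the eigenvalue-based procedure of this tutorial, assuming a square matrix with nonzero singular values; it is not meant as a production-quality SVD routine.

```python
import numpy as np

def full_svd(A):
    """Full SVD of a square matrix A with nonzero singular values,
    following the steps of this tutorial (not a production routine)."""
    AtA = A.T @ A                              # step 1: A^T and A^T A
    eigvals, eigvecs = np.linalg.eigh(AtA)     # step 2: eigenvalues/eigenvectors of A^T A
    order = np.argsort(np.abs(eigvals))[::-1]  #         sorted in descending order
    s = np.sqrt(np.abs(eigvals[order]))        #         singular values of A
    S = np.diag(s)                             # step 3: S (its inverse is used below)
    V = eigvecs[:, order]                      # step 4: V, with V^T returned at the end
    U = A @ V @ np.diag(1.0 / s)               # step 5: U = A V S^-1
    return U, S, V.T

A = np.array([[5.0, 3.0],
              [0.0, 4.0]])          # hypothetical example matrix
U, S, Vt = full_svd(A)
print(np.allclose(A, U @ S @ Vt))   # True: A = U S V^T
```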
Before concluding, let me mention this: in this tutorial you have learned how SVD is applied to a matrix where m = n. This is just one possible scenario. In general, if
- m = n and all singular values are greater than zero, the pseudoinverse of A coincides with its inverse, A^-1 = VS^-1U^T.
- m < n, S is m x n with its last n - m columns being all zero. There are more unknowns than equations, and the SVD gives the minimum norm solution.
- m > n, S is n x n, there are more equations than unknowns, and the SVD gives the least squares solution.
Movellan discusses these cases in great detail.
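To illustrate the first case, here is a small check, again with the hypothetical square matrix used earlier, that VS^-1U^T reproduces the inverse; numpy's pinv handles the rectangular cases in the same spirit.

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [0.0, 4.0]])          # hypothetical square matrix with nonzero singular values

U, s, Vt = np.linalg.svd(A)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T          # V S^-1 U^T

print(np.allclose(A_pinv, np.linalg.inv(A)))    # True for the m = n, full-rank case
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # pinv also covers the m < n and m > n cases
```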
So, how is SVD used in Latent Semantic Indexing (LSI)?
In LSI, the intent is not to reconstruct A. The goal is to find the best rank k approximation of A that improves retrieval. The selection of k, and hence the number of singular values of S to use, is still an open area of research. During her tenure at Bellcore (now Telcordia), Microsoft's Susan Dumais mentioned in the 1995 presentation Transcription of the Application that her research group experimented with k values largely "by seat of the pants".
Early studies with the MED database, using a few hundred documents and dozens of queries, indicate that performance versus k is not entirely proportional, but tends to describe an inverted U-shaped curve peaking around k = 100. These results might change under other experimental conditions. At the time of writing, optimum k values are still determined via trial-and-error experimentation.
Now that we have the basic calculations out of the way, let's move forward and learn how LSI scores documents and queries. It is time to demystify these calculations. Wait for Part 4 and see.
This is getting exciting.
Next: SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations
Prev: SVD and LSI Tutorial 2: Computing Singular Values
Tutorial Review
For the matrix

- Compute the eigenvalues of A^TA.
- Prove that this is a matrix of Rank 2.
- Compute its full SVD.
- Compute its Rank 2 Approximation.
References
- Understanding LSI via the Truncated Term-term Matrix, Thesis, Regis Newo, Germany (2005).
- A Framework for Understanding Latent Semantic Indexing (LSI) Performance, April Kontostathis and William Pottenger (Lehigh University).
- Transcription of the Application, Susan Dumais, Bellcore (1995).