Multidimensional scaling (MDS) aims to embed data in a lower-dimensional space in such a way that pairwise distances between data points are preserved.
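Say we have $N$ points $x_i \in \mathbb{R}^n$ for $i \in [1, N]$, and let $X = [x_1, x_2, \cdots, x_N]$. We don't know the positions of the $x_i$; we are only given the pairwise Euclidean distances among these points. The objective is to find $N$ points $y_i \in \mathbb{R}^k$ with $k < n$, $Y = [y_1, y_2, \cdots, y_N]$, such that the pairwise distances among the points of $Y$ are the same as (or, when that is impossible, as close as possible to) those among the points of $X$.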
Given the matrix $D_X$ of squared pairwise distances, each element of $D_X$ can be written as:
$$(D_X)_{ij} = (x_i - x_j)^T(x_i - x_j) = \|x_i\|^2 - 2x_i^Tx_j + \|x_j\|^2$$
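A small numerical check of this expansion, as a sketch: it assumes NumPy, stores hypothetical random points as the columns of `X` (matching $X = [x_1, \cdots, x_N]$), and computes every squared distance both from the definition and from the expanded form.

```python
import numpy as np

# Hypothetical data: N = 5 points in R^3 stored as the columns of X (shape n x N).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
n, N = X.shape

D_direct = np.empty((N, N))    # (x_i - x_j)^T (x_i - x_j)
D_expanded = np.empty((N, N))  # ||x_i||^2 - 2 x_i^T x_j + ||x_j||^2
for i in range(N):
    for j in range(N):
        diff = X[:, i] - X[:, j]
        D_direct[i, j] = diff @ diff
        D_expanded[i, j] = X[:, i] @ X[:, i] - 2 * X[:, i] @ X[:, j] + X[:, j] @ X[:, j]

print(np.allclose(D_direct, D_expanded))  # True
```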
From this we can see that
$$D_X = Z - 2X^TX + Z^T$$
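Here, $Z = ze^T$, where $e$ is the $N$-dimensional vector of ones and $z = [\|x_1\|^2\ \ \|x_2\|^2\ \cdots\ \|x_N\|^2]^T$. Therefore $Z$ takes the form
$$Z = \begin{bmatrix} \|x_1\|^2 & \|x_1\|^2 & \cdots & \|x_1\|^2 \\ \|x_2\|^2 & \|x_2\|^2 & \cdots & \|x_2\|^2 \\ \vdots & \vdots & \ddots & \vdots \\ \|x_N\|^2 & \|x_N\|^2 & \cdots & \|x_N\|^2 \end{bmatrix}$$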
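The matrix form can be checked the same way. This NumPy sketch (random hypothetical points, columns as points) builds $Z = ze^T$ with `np.outer` and compares $Z - 2X^TX + Z^T$ against brute-force squared distances.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))      # hypothetical points as columns: n = 3, N = 6
N = X.shape[1]

e = np.ones(N)                   # all-ones vector e
z = np.sum(X**2, axis=0)         # z_i = ||x_i||^2
Z = np.outer(z, e)               # Z = z e^T: row i is filled with ||x_i||^2

D_X = Z - 2 * X.T @ X + Z.T      # matrix form of the squared distances

# Brute-force squared distances for comparison.
D_ref = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(N)] for i in range(N)])
print(np.allclose(D_X, D_ref))   # True
```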
Now, let's translate the hypothetical point set $X$ so that its mean lies at the origin. This operation does not change the Euclidean distance between any pair of points, since every point is shifted by the same vector.
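To see the translation invariance numerically, here is a NumPy sketch on hypothetical, deliberately off-center random points; `squared_distances` is a hypothetical helper that implements $Z - 2X^TX + Z^T$ with broadcasting.

```python
import numpy as np

def squared_distances(X):
    """Squared Euclidean distance matrix for points stored as columns of X."""
    z = np.sum(X**2, axis=0)
    return z[:, None] - 2 * X.T @ X + z[None, :]   # Z - 2 X^T X + Z^T

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, size=(3, 6))              # hypothetical points, far from the origin
X_centered = X - X.mean(axis=1, keepdims=True)    # subtract the mean point from every column

print(np.allclose(X_centered.mean(axis=1), 0))                           # True: mean at origin
print(np.allclose(squared_distances(X), squared_distances(X_centered)))  # True: distances unchanged
```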
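For better understanding, we introduce the identity
$$e^T\left(I_N - \tfrac{1}{N}ee^T\right) = e^T - \tfrac{1}{N}(e^Te)\,e^T = 0^T,$$
which holds because $e^Te = N$. Hence,
$$Z\left(I_N - \tfrac{1}{N}ee^T\right) = z\,e^T\left(I_N - \tfrac{1}{N}ee^T\right) = 0,$$
and similarly,
$$\left(I_N - \tfrac{1}{N}ee^T\right)Z^T = \left(I_N - \tfrac{1}{N}ee^T\right)e\,z^T = 0.$$
The centering matrix is defined as:
$$J = I_N - \tfrac{1}{N}ee^T,$$
so that $XJ$ subtracts the mean point from every column of $X$.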
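These identities are easy to confirm numerically. The sketch below (NumPy, hypothetical random points) builds $J = I_N - \frac{1}{N}ee^T$ and checks $e^TJ = 0^T$, $ZJ = 0$, $JZ^T = 0$, plus two standard properties of $J$ used implicitly in the next step: it is symmetric and idempotent.

```python
import numpy as np

N = 6
rng = np.random.default_rng(3)
X = rng.normal(size=(3, N))              # hypothetical points as columns

e = np.ones(N)
J = np.eye(N) - np.outer(e, e) / N       # centering matrix J = I_N - (1/N) e e^T
Z = np.outer(np.sum(X**2, axis=0), e)    # Z = z e^T

print(np.allclose(e @ J, 0))             # e^T J = 0^T
print(np.allclose(Z @ J, 0))             # Z J = 0
print(np.allclose(J @ Z.T, 0))           # J Z^T = 0
print(np.allclose(J, J.T), np.allclose(J @ J, J))   # symmetric, idempotent
print(np.allclose(X @ J, X - X.mean(axis=1, keepdims=True)))  # X J centers the points
```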
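Let's now apply double centering to $D_X$, i.e. multiply it on both sides by the centering matrix $J = I_N - \frac{1}{N}ee^T$. The $Z$ terms vanish because $ZJ = 0$ and $JZ^T = 0$, and $J$ is symmetric, so we get
$$B_X = -\tfrac{1}{2}JD_XJ = -\tfrac{1}{2}J\left(Z - 2X^TX + Z^T\right)J = JX^TXJ = \tilde{X}^T\tilde{X},$$
where $\tilde{X} = X\left(I_N - \frac{1}{N}ee^T\right)$ is the point set translated so that its mean is at the origin.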
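A quick NumPy sketch of the double-centering result on hypothetical random points: starting only from the squared distance matrix, $-\frac{1}{2}JD_XJ$ reproduces the Gram matrix $\tilde{X}^T\tilde{X}$ of the centered points.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 7))                   # hypothetical points as columns
N = X.shape[1]

z = np.sum(X**2, axis=0)
D_X = z[:, None] - 2 * X.T @ X + z[None, :]   # squared distances: Z - 2 X^T X + Z^T

J = np.eye(N) - np.ones((N, N)) / N           # centering matrix
B_X = -0.5 * J @ D_X @ J                      # double centering

X_tilde = X @ J                               # mean-centered points
print(np.allclose(B_X, X_tilde.T @ X_tilde))  # True
```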
Remember, the task was to find a concrete set of $N$ points $Y$ in $k$ dimensions so that the pairwise Euclidean distances between all the pairs in the concrete set $Y$ are as close as possible to those in the hypothetical set $X$.
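Note that after applying the “double centering” operation to the squared distance matrices of both $X$ and $Y$, the problem becomes one of matching Gram matrices:
$$\min_{Y} \left\| B_X - Y^TY \right\|_F^2,$$
where $Y$ is taken to be centered, so that double centering its squared distance matrix gives exactly $Y^TY$.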
The above equation is a well-known low-rank approximation problem that can be solved via the Singular Value Decomposition (SVD) of $B_X$.
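Keeping only the $k$ largest singular values gives $B_X \approx U\Lambda U^T$, and the embedding is
$$Y = \Lambda^{1/2}U^T.$$
Here, $U$ is an $N \times k$ matrix whose columns are the singular vectors of $B_X$ for its $k$ largest singular values, and $\Lambda$ is the $k \times k$ diagonal matrix of those singular values. Because $B_X = \tilde{X}^T\tilde{X}$ is symmetric positive semidefinite, these coincide with its leading eigenvectors and eigenvalues.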
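Putting the steps together, here is a minimal NumPy sketch of classical MDS, assuming the input is the matrix of squared distances. It uses `np.linalg.eigh` instead of `np.linalg.svd`: for a symmetric positive semidefinite $B_X$ both give the same factors, and `eigh` keeps the sign of each eigenvalue so that small negative values from round-off can be clipped explicitly. The names `classical_mds` and `squared_distances` are hypothetical, as is the test data.

```python
import numpy as np

def squared_distances(X):
    """Hypothetical helper: squared Euclidean distances between the columns of X."""
    z = np.sum(X**2, axis=0)
    return z[:, None] - 2 * X.T @ X + z[None, :]   # Z - 2 X^T X + Z^T

def classical_mds(D_X, k):
    """Sketch of classical MDS.

    D_X: (N, N) matrix of *squared* pairwise Euclidean distances.
    k:   target dimension.
    Returns Y of shape (k, N); column i is the embedded point y_i.
    """
    N = D_X.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix I_N - (1/N) e e^T
    B_X = -0.5 * J @ D_X @ J              # double centering
    w, V = np.linalg.eigh(B_X)            # symmetric eigendecomposition, ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest eigenvalues
    lam = np.clip(w[idx], 0.0, None)      # guard against tiny negative round-off
    U = V[:, idx]                         # N x k
    return np.sqrt(lam)[:, None] * U.T    # Y = Lambda^{1/2} U^T

# Usage: points in R^3 that actually lie on a 2-D plane, so k = 2 suffices.
rng = np.random.default_rng(7)
P = rng.normal(size=(2, 10))                    # coordinates within the plane
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random orthogonal matrix
X = Q[:, :2] @ P + rng.normal(size=(3, 1))      # rotate into R^3 and translate

Y = classical_mds(squared_distances(X), k=2)
print(Y.shape)                                                  # (2, 10)
print(np.allclose(squared_distances(Y), squared_distances(X)))  # True
```

Because the data here has intrinsic dimension 2, the distances are recovered exactly; for data that does not fit in $k$ dimensions, the result is only the best rank-$k$ approximation of $B_X$.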