Multidimensional Scaling (MDS)

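This article examines the basic principles and technical details of multidimensional scaling (MDS). Through a mathematical derivation, it explains how to start from a given distance matrix and find a set of low-dimensional points whose pair-wise distances best approximate the original distance matrix.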


MDS aims to embed data in a lower-dimensional space in such a way that the pair-wise distances between data points are preserved.

Say we have $N$ points $x_i \in \mathbb{R}^n$ for $i \in [1, N]$, and let $X = [x_1, x_2, \dots, x_N]$. We don't know the positions of the $x_i$; we are only supplied with the pair-wise Euclidean distances among these points. The objective is to find $N$ points $y_i \in \mathbb{R}^k$, $k < n$, with $Y = [y_1, y_2, \dots, y_N]$, such that the pair-wise distances in $Y$ are the same as those in $X$.

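Given the distance matrix $D^X$, the square of each element can be written as:
$$(D^X_{ij})^2 = (x_i - x_j)^T (x_i - x_j) = \|x_i\|^2 - 2 x_i^T x_j + \|x_j\|^2$$
Collecting these squared distances into a matrix, which we keep calling $D^X$ from here on, we can easily see that
$$D^X = Z - 2 X^T X + Z^T$$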

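Here, $Z = z e^T$, where $e$ is the $N$-dimensional vector of all ones and $z = [\|x_1\|^2, \|x_2\|^2, \dots, \|x_N\|^2]^T$. Therefore $Z$ takes the form

$$Z = \begin{bmatrix} \|x_1\|^2 & \|x_1\|^2 & \cdots & \|x_1\|^2 \\ \|x_2\|^2 & \|x_2\|^2 & \cdots & \|x_2\|^2 \\ \vdots & \vdots & \ddots & \vdots \\ \|x_N\|^2 & \|x_N\|^2 & \cdots & \|x_N\|^2 \end{bmatrix}$$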
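The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small random point set whose sizes are chosen only for illustration; the variable names are not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 5                      # illustrative sizes: 5 points in 3 dimensions
X = rng.normal(size=(n, N))      # columns of X are the points x_i

# Squared pairwise distances computed directly from the definition.
diff = X[:, :, None] - X[:, None, :]   # diff[:, i, j] = x_i - x_j
D_sq = np.sum(diff ** 2, axis=0)       # D_sq[i, j] = ||x_i - x_j||^2

# The same matrix assembled as Z - 2 X^T X + Z^T with Z = z e^T.
z = np.sum(X ** 2, axis=0)             # z[i] = ||x_i||^2
e = np.ones(N)
Z = np.outer(z, e)                     # row i of Z is ||x_i||^2 repeated N times
D_from_gram = Z - 2 * X.T @ X + Z.T

identity_holds = np.allclose(D_sq, D_from_gram)
```

Both constructions agree up to floating-point roundoff, which is what the matrix form of the squared distances claims.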

Now, let's translate the hypothetical point set $X$ so that its mean lies at the origin. Note that this operation does not change the Euclidean distance between any pair of points.

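For better understanding, we introduce $\frac{1}{N} A e e^T$ and $\frac{1}{N} e e^T A$. Here, $A$ is an $N$-by-$N$ matrix which takes the form:

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{bmatrix}$$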

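Hence,

$$\frac{1}{N} A e e^T = \frac{1}{N} \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{N}\sum_{j=1}^{N} A_{1j} & \frac{1}{N}\sum_{j=1}^{N} A_{1j} & \cdots & \frac{1}{N}\sum_{j=1}^{N} A_{1j} \\ \frac{1}{N}\sum_{j=1}^{N} A_{2j} & \frac{1}{N}\sum_{j=1}^{N} A_{2j} & \cdots & \frac{1}{N}\sum_{j=1}^{N} A_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{N}\sum_{j=1}^{N} A_{Nj} & \frac{1}{N}\sum_{j=1}^{N} A_{Nj} & \cdots & \frac{1}{N}\sum_{j=1}^{N} A_{Nj} \end{bmatrix}$$

That is, every entry in row $i$ of $\frac{1}{N} A e e^T$ is the mean of row $i$ of $A$: the first row holds the mean of the first row of $A$, the second row holds the mean of the second row of $A$, and so on up to the $N$th row.

Similarly,

$$\frac{1}{N} e e^T A = \frac{1}{N} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{bmatrix} = \begin{bmatrix} \frac{1}{N}\sum_{i=1}^{N} A_{i1} & \frac{1}{N}\sum_{i=1}^{N} A_{i2} & \cdots & \frac{1}{N}\sum_{i=1}^{N} A_{iN} \\ \frac{1}{N}\sum_{i=1}^{N} A_{i1} & \frac{1}{N}\sum_{i=1}^{N} A_{i2} & \cdots & \frac{1}{N}\sum_{i=1}^{N} A_{iN} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{N}\sum_{i=1}^{N} A_{i1} & \frac{1}{N}\sum_{i=1}^{N} A_{i2} & \cdots & \frac{1}{N}\sum_{i=1}^{N} A_{iN} \end{bmatrix}$$

That is, every entry in column $j$ of $\frac{1}{N} e e^T A$ is the mean of column $j$ of $A$: the first column holds the mean of the first column of $A$, the second column holds the mean of the second column of $A$, and so on up to the $N$th column.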
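The two products above can be illustrated on a small concrete matrix. This is a minimal sketch, assuming NumPy; the 3-by-3 matrix is an arbitrary example and not taken from the original text.

```python
import numpy as np

# Arbitrary example matrix: rows are [0, 1, 2], [3, 4, 5], [6, 7, 8].
A = np.arange(9, dtype=float).reshape(3, 3)
N = A.shape[0]
e = np.ones((N, 1))                 # column vector of all ones

row_mean_matrix = A @ e @ e.T / N   # entry (i, j) is the mean of row i of A
col_mean_matrix = e @ e.T @ A / N   # entry (i, j) is the mean of column j of A

print(row_mean_matrix)
print(col_mean_matrix)
```

For this matrix the row means are 1, 4, 7 and the column means are 3, 4, 5, so the first product repeats the row means along each row and the second repeats the column means down each column.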

The centering matrix is defined as:

$$H = I_N - \frac{1}{N} e e^T$$
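Combining the centering matrix with the row-mean and column-mean products above, multiplying by $H$ on the right removes row means and multiplying on the left removes column means. The sketch below checks this, assuming NumPy and a random 4-by-4 test matrix chosen purely for illustration.

```python
import numpy as np

N = 4
e = np.ones((N, 1))
H = np.eye(N) - e @ e.T / N          # centering matrix

rng = np.random.default_rng(1)
A = rng.normal(size=(N, N))          # arbitrary test matrix

A_rows_centered = A @ H              # A - (1/N) A e e^T: each row minus its own mean
A_cols_centered = H @ A              # A - (1/N) e e^T A: each column minus its own mean
A_double = H @ A @ H                 # centered on both sides

is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)
```

After centering on both sides, every row and every column of the result sums to zero. The symmetry of $H$ is what later allows $H X^T X H$ to be written as $(XH)^T (XH)$.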

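Let's now apply double centering to $D^X$ to get

$$\begin{aligned} A^X = H D^X H &= \left(I_N - \tfrac{1}{N} e e^T\right)\left(Z - 2 X^T X + Z^T\right)\left(I_N - \tfrac{1}{N} e e^T\right) \\ &= \left(I_N - \tfrac{1}{N} e e^T\right) Z \left(I_N - \tfrac{1}{N} e e^T\right) - 2\left(I_N - \tfrac{1}{N} e e^T\right) X^T X \left(I_N - \tfrac{1}{N} e e^T\right) + \left(I_N - \tfrac{1}{N} e e^T\right) Z^T \left(I_N - \tfrac{1}{N} e e^T\right) \\ &= -2\left(I_N - \tfrac{1}{N} e e^T\right) X^T X \left(I_N - \tfrac{1}{N} e e^T\right) \\ &= -2\left(X\left(I_N - \tfrac{1}{N} e e^T\right)\right)^T X\left(I_N - \tfrac{1}{N} e e^T\right) \\ &= -2 \tilde{X}^T \tilde{X} \end{aligned}$$

where $\tilde{X} = X\left(I_N - \frac{1}{N} e e^T\right)$, i.e. $X$ with the mean point subtracted from every column. The terms in $Z$ and $Z^T$ vanish because $Z = z e^T$ and $e^T e = N$, so $Z\left(I_N - \frac{1}{N} e e^T\right) = z e^T - z e^T = 0$, and likewise $\left(I_N - \frac{1}{N} e e^T\right) Z^T = 0$.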

$$B^X = -\frac{1}{2} A^X = -\frac{1}{2} H D^X H = \tilde{X}^T \tilde{X}$$
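The double-centering result can be verified numerically: starting from squared distances alone, $-\frac{1}{2} H D^X H$ should reproduce the Gram matrix of the mean-centered points. A minimal sketch, assuming NumPy and random points of illustrative size:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 6                                  # illustrative sizes
X = rng.normal(size=(n, N))                  # columns are the points x_i

# Matrix of squared distances, built as Z - 2 X^T X + Z^T.
G = X.T @ X
sq_norms = np.diag(G)
D_sq = sq_norms[:, None] - 2 * G + sq_norms[None, :]

# Double centering of the squared distances.
H = np.eye(N) - np.ones((N, N)) / N
B = -0.5 * H @ D_sq @ H

# Gram matrix of the centered points, computed directly from X.
X_centered = X @ H                           # subtracts the mean point from every column
B_expected = X_centered.T @ X_centered

matches = np.allclose(B, B_expected)
```

Note that `B` is computed without access to `X`, only to `D_sq`, which is the situation MDS starts from.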

Remember, the task was to find a concrete set of $N$ points $Y$ in $k$ dimensions such that the pair-wise Euclidean distances between all pairs in $Y$ closely approximate the pair-wise distances given to us in the matrix $D^X$, i.e. we want to find $D^Y$ such that

$$D^Y = \arg\min \left\| D^X - D^Y \right\|_F^2$$

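Note that after applying the "double centering" operation to both $D^X$ and $D^Y$, the equation above yields
$$B^Y = \arg\min \left\| B^X - B^Y \right\|_F^2 = \arg\min \left\| \tilde{X}^T \tilde{X} - \tilde{Y}^T \tilde{Y} \right\|_F^2$$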

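The above equation is a well-known optimization problem that can be solved via the singular value decomposition (SVD) of $B^X$.

$$B^X \approx U D U^T = \left(U D^{\frac{1}{2}}\right)\left(D^{\frac{1}{2}} U^T\right) = \tilde{Y}^T \tilde{Y}$$

Here, $U$ is an $N$-by-$k$ matrix whose columns are the singular vectors belonging to the $k$ largest singular values, $D$ is a $k$-by-$k$ diagonal matrix with those $k$ largest singular values on its diagonal, and $\tilde{Y} = D^{\frac{1}{2}} U^T$ is a $k$-by-$N$ matrix. Finally, we get the $N$ embedding points in $k$ dimensions as the column vectors of $\tilde{Y}$.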
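The whole procedure can be sketched in a few lines. The following is a minimal illustration, assuming NumPy, not a definitive implementation. The function names `classical_mds` and `squared_distances` are hypothetical. Because $B^X$ is symmetric and positive semidefinite, its SVD coincides with its eigendecomposition, so the sketch uses `np.linalg.eigh` in place of an SVD routine; clipping small negative eigenvalues caused by roundoff is an added assumption.

```python
import numpy as np

def classical_mds(D_sq, k):
    """Return a k-by-N embedding from an N-by-N matrix of squared distances."""
    N = D_sq.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    B = -0.5 * H @ D_sq @ H                   # double centering
    B = (B + B.T) / 2                         # remove asymmetry due to roundoff
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    lam = np.clip(eigvals[idx], 0.0, None)    # assumption: clip roundoff negatives
    U = eigvecs[:, idx]                       # N-by-k
    return np.sqrt(lam)[:, None] * U.T        # D^(1/2) U^T, shape k-by-N

def squared_distances(P):
    """Squared pairwise distances between the columns of P."""
    G = P.T @ P
    s = np.diag(G)
    return s[:, None] - 2 * G + s[None, :]

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 8))                   # 8 hidden points in 3 dimensions
D_sq = squared_distances(X)

Y_full = classical_mds(D_sq, k=3)             # same dimension as the hidden points
Y_2d = classical_mds(D_sq, k=2)               # lower-dimensional approximation
```

With $k$ equal to the original dimension, the embedding reproduces the given squared distances up to roundoff, although the points themselves are only recovered up to a rotation, reflection, and translation. With $k = 2$ the distances are approximated rather than matched.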

Reference:
http://www.cs.umd.edu/~djacobs/CMSC828/MDSexplain.pdf
https://homepage.uni-tuebingen.de/florian.wickelmaier/pubs/Wickelmaier2003SQRU.pdf
https://inside.mines.edu/~whereman/talks/delaPorte-Herbst-Hereman-vanderWalt-DiffusionMaps-PRASA2008.pdf
