What is Mahalanobis distance?

This post walks through the concept of Mahalanobis distance, including its geometric meaning, how the Cholesky transformation turns uncorrelated variables into variables with a given covariance structure, and how the inverse Cholesky transformation removes the correlations between variables. It also covers applications of the Cholesky decomposition to generating multivariate normal data and to solving linear systems that involve a covariance matrix.

https://blogs.sas.com/content/iml/2012/02/15/what-is-mahalanobis-distance.html

https://blogs.sas.com/content/iml/2012/02/08/.html

https://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance
The three pages above explain Mahalanobis distance very well.

A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that “uncorrelates” the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is called the Cholesky transformation; it is represented by a matrix that is the “square root” of the covariance matrix.

The Square Root Matrix
Given a covariance matrix Σ, it can be factored uniquely into a product Σ = UᵀU, where U is an upper triangular matrix with positive diagonal entries and the superscript ᵀ denotes the matrix transpose. The matrix U is the Cholesky (or "square root") matrix. Some people (including me) prefer to work with lower triangular matrices. If you define L = Uᵀ, then Σ = LLᵀ. This is the form of the Cholesky decomposition that is given in Golub and Van Loan (1996, p. 143). Golub and Van Loan provide a proof of the Cholesky decomposition, as well as various ways to compute it.
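
The linked SAS posts work in SAS/IML; as a rough Python/NumPy sketch of the same factorization (my example, not the post's code), `np.linalg.cholesky` returns the lower-triangular factor L:

```python
import numpy as np

# an example symmetric positive-definite covariance matrix
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])

L = np.linalg.cholesky(Sigma)   # lower triangular with positive diagonal
U = L.T                         # the upper triangular factor described above

assert np.allclose(U.T @ U, Sigma)   # Sigma = U'U = LL'
```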

Geometrically, the Cholesky matrix transforms uncorrelated variables into variables whose variances and covariances are given by Σ. In particular, if you generate p standard normal variates, the Cholesky transformation maps the variables into variables for the multivariate normal distribution with covariance matrix Σ and centered at the origin (denoted MVN(0, Σ)).

The Cholesky Transformation: The Simple Case
Let’s see how the Cholesky transformation works in a very simple situation. Suppose that you want to generate multivariate normal data that are uncorrelated but have non-unit variance. The covariance matrix for this situation is the diagonal matrix of variances: Σ = diag(σ₁²,…,σₚ²). The square root of Σ is the diagonal matrix D that consists of the standard deviations: Σ = DᵀD, where D = diag(σ₁,…,σₚ).

Geometrically, the D matrix scales each coordinate direction independently of the other directions. This is shown in the following image. The X axis is scaled by a factor of 3, whereas the Y axis is unchanged (scale factor of 1). The transformation D is diag(3, 1), which corresponds to a covariance matrix of diag(9, 1). If you think of the circles in the top image as being probability contours for the multivariate distribution MVN(0, I), then the bottom image shows the corresponding probability ellipses for the distribution MVN(0, diag(9, 1)).

[Figure] Geometry of transformation: a diagonal matrix transforms uncorrelated standardized variables to uncorrelated scaled variables. Shown for bivariate data.
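
As a quick check of this pure-scaling case, here is a small NumPy sketch (my example, not the post's):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((10000, 2))       # uncorrelated MVN(0, I) samples

D = np.diag([3.0, 1.0])                   # scale X by 3, leave Y unchanged
x = z @ D.T                               # samples from MVN(0, diag(9, 1))

print(np.cov(x, rowvar=False).round(1))   # approximately [[9, 0], [0, 1]]
```
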
The General Cholesky Transformation Correlates Variables
In the general case, a covariance matrix contains off-diagonal elements. The geometry of the Cholesky transformation is similar to the “pure scaling” case shown previously, but the transformation also rotates and shears the top image.

The following graph shows the geometry of the transformation in terms of the data and in terms of probability ellipses. The top graph is a scatter plot of the X and Y variables. Notice that they are uncorrelated and that the probability ellipses are circles. The bottom graph is a scatter plot of the Z and W variables. Notice that they are correlated and that the probability contours are ellipses that are tilted with respect to the coordinate axes. The bottom graph shows the image under L of the points and circles in the top graph.

[Figure] Scatter plots of the uncorrelated (X, Y) variables with circular probability contours (top) and the correlated (Z, W) variables with tilted probability ellipses (bottom).
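
A NumPy sketch of this general case (the covariance matrix below is my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.5],              # covariance with off-diagonal
                  [1.5, 2.0]])             # (correlating) entries
L = np.linalg.cholesky(Sigma)

xy = rng.standard_normal((10000, 2))       # uncorrelated X, Y ~ MVN(0, I)
zw = xy @ L.T                              # correlated Z, W ~ MVN(0, Sigma)

print(np.cov(zw, rowvar=False).round(2))   # close to Sigma
```
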
The Inverse Cholesky Transformation Uncorrelates Variables
You might wonder: Can you go the other way? That is, if you start with correlated variables, can you apply a linear transformation such that the transformed variables are uncorrelated? Yes, and it’s easy to guess the transformation that works: it is the inverse of the Cholesky transformation!

Suppose that you generate multivariate normal data from MVN(0, Σ). You can "uncorrelate" the data by transforming them according to L⁻¹, where L is the Cholesky factor of Σ.
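
In practice you rarely form L⁻¹ explicitly; a triangular solve does the same job. A sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.5],
                  [1.5, 2.0]])
L = np.linalg.cholesky(Sigma)
zw = rng.standard_normal((10000, 2)) @ L.T    # correlated data ~ MVN(0, Sigma)

# apply L^{-1} by solving the triangular system L u = z for each sample
xy = solve_triangular(L, zw.T, lower=True).T
print(np.cov(xy, rowvar=False).round(2))      # essentially the identity matrix
```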

Success! The covariance matrix of the transformed data is essentially the identity matrix. The inverse Cholesky transformation "uncorrelates" the variables.

The TRISOLV function in SAS/IML, which uses back-substitution to solve the linear system, is extremely fast. Any time you are trying to solve a linear system that involves a covariance matrix, you should try to solve the system by computing the Cholesky factor of the covariance matrix, followed by back-substitution.
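
TRISOLV itself is a SAS/IML routine; the same "factor once, then triangular solves" pattern in Python looks like this (a sketch, assuming SciPy):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
b = np.array([1.0, 2.0])

c, low = cho_factor(Sigma)         # Cholesky factorization of Sigma
x = cho_solve((c, low), b)         # two triangular solves (back-substitution)

assert np.allclose(Sigma @ x, b)   # x solves Sigma x = b
```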

In summary, you can use the Cholesky factor of a covariance matrix in several ways:

To generate multivariate normal data with a given covariance structure from uncorrelated normal variables.
To remove the correlations between variables. This task requires using the inverse Cholesky transformation.
To quickly solve linear systems that involve a covariance matrix.
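
Tying this back to the title: the Mahalanobis distance of a point x from the mean μ under covariance Σ satisfies D² = (x − μ)ᵀΣ⁻¹(x − μ), which is just the squared Euclidean length of the "uncorrelated" vector L⁻¹(x − μ). A sketch with made-up example values:

```python
import numpy as np
from scipy.linalg import solve_triangular

Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
mu = np.array([1.0, 1.0])
x = np.array([3.0, 4.0])

L = np.linalg.cholesky(Sigma)
u = solve_triangular(L, x - mu, lower=True)   # u = L^{-1}(x - mu)
d = np.sqrt(u @ u)                            # Mahalanobis distance

# agrees with the direct formula (x - mu)' Sigma^{-1} (x - mu)
assert np.isclose(d**2, (x - mu) @ np.linalg.solve(Sigma, x - mu))
```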


""" Outlier Detection Toolbox ========================= This is a single-file distribution (for ease of preview) of a production-grade outlier/anomaly detection toolbox intended to be split into a small package: outlier_detection/ ├── __init__.py ├── utils.py ├── statistical.py ├── distance_density.py ├── model_based.py ├── deep_learning.py ├── ensemble.py ├── visualization.py └── cli.py --- NOTE --- This code block contains *all* modules concatenated (with file headers) so you can preview and copy each file out into separate .py files. When you save them as separate files the package will work as expected. Design goals (what you asked for): - Detailed, well-documented functions (purpose, math, applicability, edge-cases) - Robust handling of NaNs, constant columns, categorical data - Functions return structured metadata + masks + scores so you can inspect - Utilities for ensemble combining methods and producing a readable report - Optional deep learning methods (AutoEncoder/VAE) with clear dependency instructions and graceful error messages if libraries are missing. Dependencies (recommended): pip install numpy pandas scipy scikit-learn matplotlib joblib tensorflow>=2.0 If you prefer PyTorch for deep models you can adapt deep_learning.py accordingly. """ # --------------------------- # File: outlier_detection/__init__.py # --------------------------- __version__ = "0.1.0" # make it easy to import core helpers from typing import Dict from .utils import ensure_dataframe, OutlierResult, summarize_results, recommend_methods from .statistical import z_score_method, modified_z_score, iqr_method, grubbs_test from .distance_density import lof_method, mahalanobis_method, dbscan_method, knn_distance_method from .model_based import ( isolation_forest_method, one_class_svm_method, pca_reconstruction_error, gmm_method, elliptic_envelope_method, ) # deep_learning module is optional (heavy dependency) try: from .deep_learning import autoencoder_method, vae_method except Exception: # graceful: user may not have TF installed; import will raise at use time autoencoder_method = None vae_method = None from .ensemble import ensemble_methods, aggregate_scores from .visualization import plot_boxplot, plot_pair_scatter __all__ = [ "__version__", "ensure_dataframe", "OutlierResult", "summarize_results", "recommend_methods", "z_score_method", "modified_z_score", "iqr_method", "grubbs_test", "lof_method", "mahalanobis_method", "dbscan_method", "knn_distance_method", "isolation_forest_method", "one_class_svm_method", "pca_reconstruction_error", "gmm_method", "elliptic_envelope_method", "autoencoder_method", "vae_method", "ensemble_methods", "aggregate_scores", "plot_boxplot", "plot_pair_scatter", ] # --------------------------- # File: outlier_detection/utils.py # --------------------------- """ Utilities for the outlier detection package. Key responsibilities: - Input validation and type normalization - Handling numeric / categorical separation - Standardization and robust scaling helpers - A consistent result object shape used by all detectors """ from typing import Dict, Any, Tuple, Optional, List import numpy as np import pandas as pd import logging logger = logging.getLogger(__name__) # A simple, documented result schema for detector functions. 
# Each detector returns a dict with these keys (guaranteed): # - 'mask': pd.Series[bool] same index as input rows; True means OUTLIER # - 'score': pd.Series or pd.DataFrame numeric score (bigger usually means more anomalous) # - 'method': short string # - 'params': dict of parameters used # - 'explanation': short textual note about interpretation OutlierResult = Dict[str, Any] def ensure_dataframe(X) -> pd.DataFrame: """ Convert input into a pandas DataFrame with a stable integer index. Accepts: pd.DataFrame, np.ndarray, list-of-lists, pd.Series. Returns DataFrame with numeric column names if necessary. """ if isinstance(X, pd.DataFrame): df = X.copy() elif isinstance(X, pd.Series): df = X.to_frame() else: # try to coerce df = pd.DataFrame(X) # if no index or non-unique, reset if df.index is None or not df.index.is_unique: df = df.reset_index(drop=True) # name numeric columns if unnamed df.columns = [str(c) for c in df.columns] return df def numeric_only(df: pd.DataFrame, return_cols: bool = False) -> pd.DataFrame: """ Select numeric columns and warn if non-numeric columns are dropped. If no numeric columns found raises ValueError. """ df = ensure_dataframe(df) numeric_df = df.select_dtypes(include=["number"]).copy() non_numeric = [c for c in df.columns if c not in numeric_df.columns] if non_numeric: logger.debug("Dropping non-numeric columns for numeric-only detectors: %s", non_numeric) if numeric_df.shape[1] == 0: raise ValueError("No numeric columns available for numeric detectors. Consider encoding categoricals.") if return_cols: return numeric_df, list(numeric_df.columns) return numeric_df def handle_missing(df: pd.DataFrame, strategy: str = "drop", fill_value: Optional[float] = None) -> pd.DataFrame: """ Handle missing values in data before passing to detectors. Parameters ---------- df : DataFrame strategy : {'drop', 'mean', 'median', 'zero', 'constant', 'keep'} - 'drop' : drop rows with any NaN (useful when most values are present) - 'mean' : fill numeric columns with mean - 'median' : fill numeric with median - 'zero' : fill with 0 - 'constant' : fill with supplied fill_value - 'keep' : keep NaNs (many detectors can handle NaN rows implicitly) fill_value : numeric (used when strategy=='constant') Returns ------- DataFrame cleaned according to strategy. Original index preserved. Notes ----- - Some detectors (LOF, IsolationForest) do NOT accept NaNs; choose strategy accordingly. """ df = df.copy() if strategy == "drop": return df.dropna(axis=0, how="any") elif strategy == "mean": return df.fillna(df.mean()) elif strategy == "median": return df.fillna(df.median()) elif strategy == "zero": return df.fillna(0) elif strategy == "constant": if fill_value is None: raise ValueError("fill_value must be provided for strategy='constant'") return df.fillna(fill_value) elif strategy == "keep": return df else: raise ValueError(f"Unknown missing value strategy: {strategy}") def robust_scale(df: pd.DataFrame) -> pd.DataFrame: """ Scale numeric columns using median and IQR (robust to outliers). Returns a DataFrame of same shape with scaled values. """ df = numeric_only(df) med = df.median() q1 = df.quantile(0.25) q3 = df.quantile(0.75) iqr = q3 - q1 # avoid division by zero iqr_replaced = iqr.replace(0, 1.0) return (df - med) / iqr_replaced def create_result(mask: pd.Series, score: pd.Series, method: str, params: Dict[str, Any], explanation: str) -> OutlierResult: """ Wrap mask + score into the standard result dict. 
""" # ensure index alignment if not mask.index.equals(score.index): # try to reindex score = score.reindex(mask.index) return { "mask": mask.astype(bool), "score": score, "method": method, "params": params, "explanation": explanation, } def summarize_results(results: Dict[str, OutlierResult]) -> pd.DataFrame: """ Given a dict of results keyed by method name, return a single DataFrame where each column is that method's boolean flag and another column is the score (if numeric). Also returns a short per-row summary like how many detectors flagged the row. """ # Collect masks and scores masks = {} scores = {} for k, r in results.items(): masks[f"{k}_flag"] = r["mask"].astype(int) # flatten score: if DataFrame use mean across columns sc = r["score"] if isinstance(sc, pd.DataFrame): sc = sc.mean(axis=1) scores[f"{k}_score"] = sc masks_df = pd.DataFrame(masks) scores_df = pd.DataFrame(scores) combined = pd.concat([masks_df, scores_df], axis=1) combined.index = next(iter(results.values()))["mask"].index combined["n_flags"] = masks_df.sum(axis=1) combined["any_flag"] = combined["n_flags"] > 0 return combined def recommend_methods(X: pd.DataFrame) -> List[str]: """ Heuristic recommender: returns a short list of methods to try depending on data shape. Rules (simple heuristics): - single numeric column: ['iqr', 'modified_z'] - low-dimensional (n_features <= 10) and numeric: ['mahalanobis','lof','isolation_forest'] - high-dimensional (n_features > 10): ['isolation_forest','pca','autoencoder'] """ df = ensure_dataframe(X) n_features = df.select_dtypes(include=["number"]).shape[1] if n_features == 0: raise ValueError("No numeric features to recommend methods for") if n_features == 1: return ["iqr", "modified_z"] elif n_features <= 10: return ["mahalanobis", "lof", "isolation_forest"] else: return ["isolation_forest", "pca", "autoencoder"] # --------------------------- # File: outlier_detection/statistical.py # --------------------------- """ Statistical / univariate outlier detectors. Each function focuses on single-dimension input (pd.Series) or will operate column-wise if given a DataFrame (then returns DataFrame of scores / masks). """ from typing import Union import numpy as np import pandas as pd from scipy import stats from .utils import create_result, numeric_only def _as_series(x: Union[pd.Series, pd.DataFrame], col: str = None) -> pd.Series: if isinstance(x, pd.DataFrame): if col is None: raise ValueError("If passing DataFrame, must pass column name") return x[col] return x def z_score_method(x: Union[pd.Series, pd.DataFrame], threshold: float = 3.0) -> OutlierResult: """ Z-Score method (univariate) Math: z = (x - mean) / std Flag where |z| > threshold. Applicability: single numeric column, approximately normal distribution. Not robust to heavy-tailed distributions. Returns OutlierResult with score = |z| (higher => more anomalous). 
""" if isinstance(x, pd.DataFrame): # apply per-column and return a DataFrame score masks = pd.DataFrame(index=x.index) scores = pd.DataFrame(index=x.index) for c in x.columns: res = z_score_method(x[c], threshold=threshold) masks[c] = res["mask"].astype(int) scores[c] = res["score"] # Derive a combined mask: any column flagged mask_any = masks.sum(axis=1) > 0 combined_score = scores.mean(axis=1) return create_result(mask_any, combined_score, "z_score_dataframe", {"threshold": threshold}, "Applied z-score per-column and combined by mean score and any-flag") s = x.dropna() if s.shape[0] == 0: mask = pd.Series([False]*len(x), index=x.index) score = pd.Series([0.0]*len(x), index=x.index) return create_result(mask, score, "z_score", {"threshold": threshold}, "Empty or all-NaN series") mu = s.mean() sigma = s.std(ddof=0) if sigma == 0: score = pd.Series(0.0, index=x.index) mask = pd.Series(False, index=x.index) explanation = "Zero variance: no z-score possible" return create_result(mask, score, "z_score", {"threshold": threshold}, explanation) z = (x - mu) / sigma absz = z.abs() mask = absz > threshold score = absz.fillna(0.0) explanation = f"z-score with mean={mu:.4g}, std={sigma:.4g}; flag |z|>{threshold}" return create_result(mask, score, "z_score", {"threshold": threshold}, explanation) def modified_z_score(x: Union[pd.Series, pd.DataFrame], threshold: float = 3.5) -> OutlierResult: """ Modified Z-score using median and MAD (robust to extreme values). Formula: M_i = 0.6745 * (x_i - median) / MAD Where MAD = median(|x_i - median|) Recommended threshold: 3.5 (common in literature) """ if isinstance(x, pd.DataFrame): masks = pd.DataFrame(index=x.index) scores = pd.DataFrame(index=x.index) for c in x.columns: res = modified_z_score(x[c], threshold=threshold) masks[c] = res["mask"].astype(int) scores[c] = res["score"] mask_any = masks.sum(axis=1) > 0 combined_score = scores.mean(axis=1) return create_result(mask_any, combined_score, "modified_z_dataframe", {"threshold": threshold}, "Applied modified z per-column and combined") s = x.dropna() if len(s) == 0: return create_result(pd.Series(False, index=x.index), pd.Series(0.0, index=x.index), "modified_z", {"threshold": threshold}, "empty") med = np.median(s) mad = np.median(np.abs(s - med)) if mad == 0: # all equal or too small score = pd.Series(0.0, index=x.index) mask = pd.Series(False, index=x.index) return create_result(mask, score, "modified_z", {"threshold": threshold}, "mad==0: no variation") M = 0.6745 * (x - med) / mad score = M.abs().fillna(0.0) mask = score > threshold return create_result(mask, score, "modified_z", {"threshold": threshold, "median": med, "mad": mad}, "Robust modified z-score; higher => more anomalous") def iqr_method(x: Union[pd.Series, pd.DataFrame], k: float = 1.5) -> OutlierResult: """ IQR (boxplot) method. Flags points outside [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is common; use larger k for fewer false positives. 
""" if isinstance(x, pd.DataFrame): masks = pd.DataFrame(index=x.index) scores = pd.DataFrame(index=x.index) for c in x.columns: res = iqr_method(x[c], k=k) masks[c] = res["mask"].astype(int) scores[c] = res["score"] mask_any = masks.sum(axis=1) > 0 combined_score = scores.mean(axis=1) return create_result(mask_any, combined_score, "iqr_dataframe", {"k": k}, "Applied IQR per column") s = x.dropna() if s.shape[0] == 0: return create_result(pd.Series(False, index=x.index), pd.Series(0.0, index=x.index), "iqr", {"k": k}, "empty") q1 = np.percentile(s, 25) q3 = np.percentile(s, 75) iqr = q3 - q1 lower = q1 - k * iqr upper = q3 + k * iqr mask = (x < lower) | (x > upper) # score: distance from nearest fence normalized by iqr (if iqr==0 use abs distance) if iqr == 0: score = (x - q1).abs().fillna(0.0) else: score = pd.Series(0.0, index=x.index) score[x < lower] = ((lower - x[x < lower]) / (iqr + 1e-12)) score[x > upper] = ((x[x > upper] - upper) / (iqr + 1e-12)) return create_result(mask.fillna(False), score.fillna(0.0), "iqr", {"k": k, "q1": q1, "q3": q3}, f"IQR fences [{lower:.4g}, {upper:.4g}]") def grubbs_test(x: Union[pd.Series, pd.DataFrame], alpha: float = 0.05) -> OutlierResult: """ Grubbs' test for a single outlier (requires approx normality). This test is intended to *detect one outlier at a time*. Use iteratively (recompute after removing detected outlier) if you expect multiple outliers, but be careful with multiplicity adjustments. Returns mask with at most one True (the most extreme point) unless alpha is very large. """ # For simplicity operate only on a single series. If DataFrame provided, # run per-column and combine (like other funcs) if isinstance(x, pd.DataFrame): masks = pd.DataFrame(index=x.index) scores = pd.DataFrame(index=x.index) for c in x.columns: res = grubbs_test(x[c], alpha=alpha) masks[c] = res["mask"].astype(int) scores[c] = res["score"] mask_any = masks.sum(axis=1) > 0 combined_score = scores.mean(axis=1) return create_result(mask_any, combined_score, "grubbs_dataframe", {"alpha": alpha}, "Applied Grubbs per column") from math import sqrt s = x.dropna() n = len(s) if n < 3: return create_result(pd.Series(False, index=x.index), pd.Series(0.0, index=x.index), "grubbs", {"alpha": alpha}, "n<3: cannot run") mean = s.mean() std = s.std(ddof=0) if std == 0: return create_result(pd.Series(False, index=x.index), pd.Series(0.0, index=x.index), "grubbs", {"alpha": alpha}, "zero std") # compute G statistic for max dev deviations = (s - mean).abs() max_idx = deviations.idxmax() G = deviations.loc[max_idx] / std # critical value from t-distribution t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2) G_crit = ((n - 1) / sqrt(n)) * (t_crit / sqrt(n - 2 + t_crit ** 2)) mask = pd.Series(False, index=x.index) mask.loc[max_idx] = G > G_crit score = pd.Series(0.0, index=x.index) score.loc[max_idx] = float(G) explanation = f"G={G:.4g}, Gcrit={G_crit:.4g}, alpha={alpha}" return create_result(mask, score, "grubbs", {"alpha": alpha, "G": G, "Gcrit": G_crit}, explanation) # --------------------------- # File: outlier_detection/distance_density.py # --------------------------- """ Distance and density based detectors (multivariate-capable). Functions generally accept a numeric DataFrame X and return OutlierResult. 
""" from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors from sklearn.cluster import DBSCAN from sklearn.covariance import EmpiricalCovariance from .utils import ensure_dataframe, create_result, numeric_only def lof_method(X, n_neighbors: int = 20, contamination: float = 0.05) -> OutlierResult: """ Local Outlier Factor (LOF). Returns score = -lof. LOF API returns negative_outlier_factor_. We negate so higher score => more anomalous. Applicability: medium-dimensional data, clusters of varying density. Beware: LOF does not provide a predictable probabilistic threshold. """ X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < 2: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "lof", {"n_neighbors": n_neighbors}, "too few samples") lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, max(1, Xnum.shape[0]-1)), contamination=contamination) y = lof.fit_predict(Xnum) negative_factor = lof.negative_outlier_factor_ # higher -> more anomalous score = (-negative_factor) score = pd.Series(score, index=Xnum.index) mask = pd.Series(y == -1, index=Xnum.index) return create_result(mask, score, "lof", {"n_neighbors": n_neighbors, "contamination": contamination}, "LOF: higher score more anomalous") def knn_distance_method(X, k: int = 5) -> OutlierResult: """ k-NN distance based scoring: compute distance to k-th nearest neighbor. Points with large k-distance are candidate outliers. Returns score = k-distance (bigger => more anomalous). """ X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < k + 1: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "knn_distance", {"k": k}, "too few samples") nbrs = NearestNeighbors(n_neighbors=k + 1).fit(Xnum) distances, _ = nbrs.kneighbors(Xnum) # distances[:, 0] is zero (self). take k-th neighbor kdist = distances[:, k] score = pd.Series(kdist, index=Xnum.index) # threshold: e.g., mean + 2*std thr = score.mean() + 2 * score.std() mask = score > thr return create_result(mask, score, "knn_distance", {"k": k, "threshold": thr}, "k-distance method") def mahalanobis_method(X, threshold_p: float = 0.01) -> OutlierResult: """ Mahalanobis distance based detection. Computes D^2 for each point. One can threshold by chi-square quantile with df=n_features: P(D^2 > thresh) = threshold_p. We return score = D^2. Applicability: data approximately elliptical (multivariate normal-ish). """ X = ensure_dataframe(X) Xnum = numeric_only(X) n, d = Xnum.shape if n <= d: # covariance ill-conditioned; apply shrinkage or PCA beforehand explanation = "n <= n_features: covariance may be singular, consider PCA or regularization" else: explanation = "" cov = EmpiricalCovariance().fit(Xnum) mahal = cov.mahalanobis(Xnum) score = pd.Series(mahal, index=Xnum.index) # default threshold: chi2 quantile from scipy.stats import chi2 thr = chi2.ppf(1 - threshold_p, df=d) if d > 0 else np.inf mask = score > thr return create_result(mask, score, "mahalanobis", {"threshold_p": threshold_p, "chi2_thr": float(thr)}, explanation) def dbscan_method(X, eps: float = 0.5, min_samples: int = 5) -> OutlierResult: """ DBSCAN clusterer: points labeled -1 are considered noise -> outliers. Applicability: non-spherical clusters, variable density; choose eps carefully. 
""" X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < min_samples: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "dbscan", {"eps": eps, "min_samples": min_samples}, "too few samples") db = DBSCAN(eps=eps, min_samples=min_samples).fit(Xnum) labels = db.labels_ mask = pd.Series(labels == -1, index=Xnum.index) # score: negative of cluster size (noise points get score 1) # To keep simple: noise -> 1, else 0 score = pd.Series((labels == -1).astype(float), index=Xnum.index) return create_result(mask, score, "dbscan", {"eps": eps, "min_samples": min_samples}, "DBSCAN noise points flagged") # --------------------------- # File: outlier_detection/model_based.py # --------------------------- """ Model-based detectors: tree ensembles, SVM boundary, PCA reconstruction, GMM These functions are intended for multivariate numeric data. """ from sklearn.ensemble import IsolationForest from sklearn.svm import OneClassSVM from sklearn.decomposition import PCA from sklearn.mixture import GaussianMixture from sklearn.covariance import EllipticEnvelope from .utils import ensure_dataframe, numeric_only, create_result def isolation_forest_method(X, contamination: float = 0.05, random_state: int = 42) -> OutlierResult: """ Isolation Forest Returns mask and anomaly score (higher => more anomalous). Good general-purpose method for medium-to-high dimensional data. """ X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < 2: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "isolation_forest", {"contamination": contamination}, "too few samples") iso = IsolationForest(contamination=contamination, random_state=random_state) iso.fit(Xnum) pred = iso.predict(Xnum) # decision_function: higher -> more normal, so we invert raw_score = -iso.decision_function(Xnum) score = pd.Series(raw_score, index=Xnum.index) mask = pd.Series(pred == -1, index=Xnum.index) return create_result(mask, score, "isolation_forest", {"contamination": contamination}, "IsolationForest: inverted decision function as score") def one_class_svm_method(X, kernel: str = "rbf", nu: float = 0.05, gamma: str = "scale") -> OutlierResult: """ One-Class SVM for boundary-based anomaly detection. Carefully tune nu and gamma; not robust to large datasets without subsampling. """ X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < 5: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "one_class_svm", {"nu": nu}, "too few samples") ocsvm = OneClassSVM(kernel=kernel, nu=nu, gamma=gamma) ocsvm.fit(Xnum) pred = ocsvm.predict(Xnum) # decision_function: positive => inside boundary (normal); invert raw_score = -ocsvm.decision_function(Xnum) score = pd.Series(raw_score, index=Xnum.index) mask = pd.Series(pred == -1, index=Xnum.index) return create_result(mask, score, "one_class_svm", {"nu": nu, "kernel": kernel}, "OneClassSVM: invert decision_function for anomaly score") def pca_reconstruction_error(X, n_components: int = None, explained_variance: float = None, threshold: float = None) -> OutlierResult: """ PCA-based reconstruction error. If n_components not set, choose the minimum components to reach explained_variance (if provided). Otherwise uses min(n_features, 2). Score: squared reconstruction error per sample. Default threshold: mean+3*std. 
""" X = ensure_dataframe(X) Xnum = numeric_only(X) n, d = Xnum.shape if n == 0 or d == 0: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "pca_recon", {}, "empty data") if n_components is None: if explained_variance is not None: temp_pca = PCA(n_components=min(n, d)) temp_pca.fit(Xnum) cum = np.cumsum(temp_pca.explained_variance_ratio_) n_components = int(np.searchsorted(cum, explained_variance) + 1) n_components = max(1, n_components) else: n_components = min(2, d) pca = PCA(n_components=n_components) proj = pca.fit_transform(Xnum) recon = pca.inverse_transform(proj) errors = ((Xnum - recon) ** 2).sum(axis=1) score = pd.Series(errors, index=Xnum.index) if threshold is None: threshold = score.mean() + 3 * score.std() mask = score > threshold return create_result(mask, score, "pca_recon", {"n_components": n_components, "threshold": float(threshold)}, "PCA reconstruction error") def gmm_method(X, n_components: int = 2, contamination: float = 0.05) -> OutlierResult: """ Gaussian Mixture Model based anomaly score (log-likelihood). Score: negative log-likelihood (bigger => less likely => more anomalous). Threshold: empirical quantile of scores. """ X = ensure_dataframe(X) Xnum = numeric_only(X) if Xnum.shape[0] < n_components: return create_result(pd.Series(False, index=X.index), pd.Series(0.0, index=X.index), "gmm", {}, "too few samples") gmm = GaussianMixture(n_components=n_components) gmm.fit(Xnum) logprob = gmm.score_samples(Xnum) score = pd.Series(-logprob, index=Xnum.index) thr = score.quantile(1 - contamination) mask = score > thr return create_result(mask, score, {"n_components": n_components, "threshold": float(thr)}, "gmm", "GMM negative log-likelihood") def elliptic_envelope_method(X, contamination: float = 0.05) -> OutlierResult: """ EllipticEnvelope fits a robust covariance (assumes data come from a Gaussian-like ellipse). Flags outliers outside the ellipse. """ X = ensure_dataframe(X) Xnum = numeric_only(X) ee = EllipticEnvelope(contamination=contamination) ee.fit(Xnum) pred = ee.predict(Xnum) # decision_function: larger -> more normal; invert raw_score = -ee.decision_function(Xnum) score = pd.Series(raw_score, index=Xnum.index) mask = pd.Series(pred == -1, index=Xnum.index) return create_result(mask, score, "elliptic_envelope", {"contamination": contamination}, "EllipticEnvelope") # --------------------------- # File: outlier_detection/deep_learning.py # --------------------------- """ Deep learning based detectors (AutoEncoder, VAE). These require TensorFlow/Keras installed. If not present, importing this module will raise an informative ImportError. Design: a training function accepts X (numpy or DataFrame) and returns a callable `score_fn(X_new) -> pd.Series` plus a threshold selection helper. """ from typing import Callable import numpy as np import pandas as pd # lazy import to avoid hard TF dependency if user doesn't need it try: import tensorflow as tf from tensorflow.keras import layers, models, backend as K except Exception as e: raise ImportError("TensorFlow / Keras is required for deep_learning module. Install with `pip install tensorflow`. 
Error: " + str(e)) from .utils import ensure_dataframe, create_result def _build_autoencoder(input_dim: int, latent_dim: int = 8, hidden_units=(64, 32)) -> models.Model: inp = layers.Input(shape=(input_dim,)) x = inp for h in hidden_units: x = layers.Dense(h, activation='relu')(x) z = layers.Dense(latent_dim, activation='relu', name='latent')(x) x = z for h in reversed(hidden_units): x = layers.Dense(h, activation='relu')(x) out = layers.Dense(input_dim, activation='linear')(x) ae = models.Model(inp, out) return ae def autoencoder_method(X, latent_dim: int = 8, hidden_units=(128, 64), epochs: int = 50, batch_size: int = 32, validation_split: float = 0.1, threshold_method: str = 'quantile', threshold_val: float = 0.99, verbose: int = 0) -> OutlierResult: """ Train an AutoEncoder on X and compute reconstruction error as anomaly score. Parameters ---------- X : DataFrame or numpy array (numeric) threshold_method : 'quantile' or 'mean_std' threshold_val : if quantile -> e.g. 0.99 means top 1% flagged; if mean_std -> number of stds Returns ------- OutlierResult where score = reconstruction error and mask = score > threshold Notes ----- - This trains on the entire provided X. For actual anomaly detection, it's common to train the autoencoder only on "normal" data. If you have labels, pass only normal subset for training. - Requires careful scaling of inputs before training (robust_scale recommended). """ Xdf = ensure_dataframe(X) Xnum = Xdf.select_dtypes(include=['number']).fillna(0.0) input_dim = Xnum.shape[1] if input_dim == 0: return create_result(pd.Series(False, index=Xdf.index), pd.Series(0.0, index=Xdf.index), "autoencoder", {}, "no numeric columns") # convert to numpy arr = Xnum.values.astype(np.float32) ae = _build_autoencoder(input_dim=input_dim, latent_dim=latent_dim, hidden_units=hidden_units) ae.compile(optimizer='adam', loss='mse') ae.fit(arr, arr, epochs=epochs, batch_size=batch_size, validation_split=validation_split, verbose=verbose) recon = ae.predict(arr) errors = np.mean((arr - recon) ** 2, axis=1) score = pd.Series(errors, index=Xdf.index) if threshold_method == 'quantile': thr = float(score.quantile(threshold_val)) else: thr = float(score.mean() + threshold_val * score.std()) mask = score > thr return create_result(mask, score, "autoencoder", {"latent_dim": latent_dim, "threshold": thr}, "AutoEncoder reconstruction error") def vae_method(X, latent_dim: int = 8, hidden_units=(128, 64), epochs: int = 50, batch_size: int = 32, threshold_method: str = 'quantile', threshold_val: float = 0.99, verbose: int = 0) -> OutlierResult: """ Variational Autoencoder (VAE) anomaly detection. Implementation note: VAE is more involved; here we provide a simple implementation that uses reconstruction error as score. For strict probabilistic anomaly scoring one would use the ELBO / likelihood; this minimal implementation keeps it practical. """ # For brevity we reuse autoencoder path (a more complete VAE impl is possible) return autoencoder_method(X, latent_dim=latent_dim, hidden_units=hidden_units, epochs=epochs, batch_size=batch_size, threshold_method=threshold_method, threshold_val=threshold_val, verbose=verbose) # --------------------------- # File: outlier_detection/ensemble.py # --------------------------- """ Combine multiple detectors and produce an aggregated report. Provides strategies: union, intersection, majority voting, weighted sum of normalized scores. 
""" from typing import List, Dict import numpy as np import pandas as pd from .utils import ensure_dataframe, create_result def normalize_scores(scores: pd.DataFrame) -> pd.DataFrame: """Min-max normalize each score column to [0,1].""" sc = scores.copy() for c in sc.columns: col = sc[c] mn = col.min() mx = col.max() if mx == mn: sc[c] = 0.0 else: sc[c] = (col - mn) / (mx - mn) return sc def aggregate_scores(results: Dict[str, Dict], method: str = 'weighted', weights: Dict[str, float] = None) -> Dict: """ Aggregate multiple OutlierResult dictionaries produced by detectors. Returns an OutlierResult-like dict with: - mask (final boolean by threshold on aggregate score), - score (aggregate numeric score) Aggregation methods: - 'union' : any detector flagged => outlier (score = max of normalized scores) - 'intersection' : flagged by all detectors => outlier - 'majority' : flagged by >50% detectors - 'weighted' : weighted sum of normalized scores (weights provided or equal) """ # collect masks and scores into DataFrames masks = pd.DataFrame({k: v['mask'].astype(int) for k, v in results.items()}) raw_scores = pd.DataFrame({k: (v['score'] if isinstance(v['score'], pd.Series) else pd.Series(v['score'])) for k, v in results.items()}) raw_scores.index = masks.index norm_scores = normalize_scores(raw_scores) if method == 'union': agg_score = norm_scores.max(axis=1) elif method == 'intersection': agg_score = norm_scores.min(axis=1) elif method == 'majority': agg_score = masks.sum(axis=1) / max(1, masks.shape[1]) elif method == 'weighted': if weights is None: weights = {k: 1.0 for k in results.keys()} # align weights w = pd.Series({k: weights.get(k, 1.0) for k in results.keys()}) # make sure weights sum to 1 w = w / w.sum() agg_score = (norm_scores * w).sum(axis=1) else: raise ValueError("Unknown aggregation method") # default threshold: 0.5 mask = agg_score > 0.5 return create_result(mask, agg_score, f"ensemble_{method}", {"method": method}, "Aggregated ensemble score") def ensemble_methods(X, method_list: List[str] = None, method_params: Dict = None) -> Dict[str, Dict]: """ Convenience: run multiple detectors by name and return dict of results. method_list: list of names from ['iqr','modified_z','z_score','lof','mahalanobis','isolation_forest', ...] method_params: optional dict mapping method name to params """ from . 
import statistical, distance_density, model_based, deep_learning X = ensure_dataframe(X) if method_list is None: method_list = ['iqr', 'modified_z', 'isolation_forest', 'lof'] if method_params is None: method_params = {} results = {} for m in method_list: params = method_params.get(m, {}) try: if m == 'iqr': results[m] = statistical.iqr_method(X, **params) elif m == 'modified_z': results[m] = statistical.modified_z_score(X, **params) elif m == 'z_score': results[m] = statistical.z_score_method(X, **params) elif m == 'lof': results[m] = distance_density.lof_method(X, **params) elif m == 'mahalanobis': results[m] = distance_density.mahalanobis_method(X, **params) elif m == 'dbscan': results[m] = distance_density.dbscan_method(X, **params) elif m == 'knn': results[m] = distance_density.knn_distance_method(X, **params) elif m == 'isolation_forest': results[m] = model_based.isolation_forest_method(X, **params) elif m == 'one_class_svm': results[m] = model_based.one_class_svm_method(X, **params) elif m == 'pca': results[m] = model_based.pca_reconstruction_error(X, **params) elif m == 'gmm': results[m] = model_based.gmm_method(X, **params) elif m == 'elliptic': results[m] = model_based.elliptic_envelope_method(X, **params) elif m == 'autoencoder': results[m] = deep_learning.autoencoder_method(X, **params) else: logger.warning("Unknown method requested: %s", m) except Exception as e: logger.exception("Method %s failed: %s", m, e) return results # --------------------------- # File: outlier_detection/visualization.py # --------------------------- """ Simple plotting helpers for quick inspection. Note: plotting is intentionally minimal; for report-quality figures users can adapt styles. The functions return the matplotlib Figure object so they can be further customized. """ import matplotlib.pyplot as plt from .utils import ensure_dataframe def plot_boxplot(series: pd.Series, show: bool = True): df = ensure_dataframe(series) col = df.columns[0] fig, ax = plt.subplots() ax.boxplot(df[col].dropna()) ax.set_title(f"Boxplot: {col}") if show: plt.show() return fig def plot_pair_scatter(X, columns: list = None, show: bool = True): X = ensure_dataframe(X) if columns is not None: X = X[columns] cols = X.columns.tolist()[:4] # avoid huge plots fig, axes = plt.subplots(len(cols) - 1, len(cols) - 1, figsize=(4 * (len(cols) - 1), 4 * (len(cols) - 1))) for i in range(1, len(cols)): for j in range(i): ax = axes[i - 1, j] ax.scatter(X[cols[j]], X[cols[i]], s=8) ax.set_xlabel(cols[j]) ax.set_ylabel(cols[i]) fig.suptitle("Pairwise scatter (first 4 numeric cols)") if show: plt.show() return fig # --------------------------- # File: outlier_detection/cli.py # --------------------------- """ A very small CLI to run detectors on a CSV file and output a CSV report. 
Usage (example): python -m outlier_detection.cli detect input.csv output_report.csv --methods iqr,isolation_forest """ import argparse import pandas as pd from .ensemble import ensemble_methods, aggregate_scores def main(): parser = argparse.ArgumentParser(description='Outlier detection CLI') sub = parser.add_subparsers(dest='cmd') det = sub.add_parser('detect') det.add_argument('input_csv') det.add_argument('output_csv') det.add_argument('--methods', default='iqr,modified_z,isolation_forest,lof') args = parser.parse_args() df = pd.read_csv(args.input_csv) methods = args.methods.split(',') results = ensemble_methods(df, method_list=methods) agg = aggregate_scores(results, method='weighted') summary = pd.concat([pd.DataFrame({k: v['mask'].astype(int) for k, v in results.items()}), pd.DataFrame({k: v['score'] for k, v in results.items()})], axis=1) summary['ensemble_score'] = agg['score'] summary['ensemble_flag'] = agg['mask'].astype(int) summary.to_csv(args.output_csv, index=False) print(f"Wrote report to {args.output_csv}") if __name__ == '__main__': main()改成中文说明并返回代码给我
最新发布
08-27
/* * Copyright 2018-2019 Autoware Foundation. All rights reserved. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ #include "ekf_localizer/ekf_localizer.h" // clang-format off #define PRINT_MAT(X) std::cout << #X << ":\n" << X << std::endl << std::endl #define DEBUG_INFO(...) { if (show_debug_info_) { ROS_INFO(__VA_ARGS__); } } #define DEBUG_PRINT_MAT(X) { if (show_debug_info_) { std::cout << #X << ": " << X << std::endl; } } // clang-format on /* x, y:机器人位置。 yaw:机器人朝向(偏航角)。 yaw_bias:偏航角偏差(用于估计传感器误差)。 vx, wz:线速度和角速度。 */ EKFLocalizer::EKFLocalizer() : nh_(""), pnh_("~"), dim_x_(8 /* x, y, yaw, yaw_bias, vx, wz */) { pnh_.param("show_debug_info", show_debug_info_, bool(false)); // 是否显示调试信息 pnh_.param("predict_frequency", ekf_rate_, double(50.0)); // EKF 预测频率(Hz) ekf_dt_ = 1.0 / std::max(ekf_rate_, 0.1); // 计算时间步长(秒) pnh_.param("enable_yaw_bias_estimation", enable_yaw_bias_estimation_, bool(true)); // 是否估计偏航角偏差 pnh_.param("extend_state_step", extend_state_step_, int(50)); // 状态扩展步数(用于滑动窗口优化) pnh_.param("pose_frame_id", pose_frame_id_, std::string("map")); // 输出位姿的坐标系 ID pnh_.param("output_frame_id", output_frame_id_, std::string("base_link")); // 输出坐标系 /* pose measurement 位姿测量参数*/ pnh_.param("pose_additional_delay", pose_additional_delay_, double(0.0)); // 额外延迟(秒) pnh_.param("pose_measure_uncertainty_time", pose_measure_uncertainty_time_, double(0.01)); // 测量不确定性时间 pnh_.param("pose_rate", pose_rate_, double(10.0)); // 位姿测量频率(用于协方差计算) // used for covariance calculation pnh_.param("pose_gate_dist", pose_gate_dist_, double(10000.0)); // 马氏距离阈值(异常值过滤) // Mahalanobis limit pnh_.param("pose_stddev_x", pose_stddev_x_, double(0.05)); // X 方向标准差(米) pnh_.param("pose_stddev_y", pose_stddev_y_, double(0.05)); // Y 方向标准差(米) pnh_.param("pose_stddev_yaw", pose_stddev_yaw_, double(0.035)); // 偏航角标准差(弧度) pnh_.param("use_pose_with_covariance", use_pose_with_covariance_, bool(false)); // 是否使用带协方差的位姿输入 /* twist measurement 速度测量参数*/ pnh_.param("twist_additional_delay", twist_additional_delay_, double(0.0)); // 额外延迟(秒) pnh_.param("twist_rate", twist_rate_, double(10.0)); // 速度测量频率(用于协方差计算) // used for covariance calculation pnh_.param("twist_gate_dist", twist_gate_dist_, double(10000.0)); // 马氏距离阈值(异常值过滤) // Mahalanobis limit pnh_.param("twist_stddev_vx", twist_stddev_vx_, double(0.2)); // 线速度标准差(米/秒) pnh_.param("twist_stddev_wz", twist_stddev_wz_, double(0.03)); // 角速度标准差(弧度/秒) pnh_.param("use_twist_with_covariance", use_twist_with_covariance_, bool(false)); // 是否使用带协方差的速度输入 /* IMU measurement parameters */ pnh_.param("use_imu", use_imu_, bool(true)); pnh_.param("imu_rate", imu_rate_, double(50.0)); pnh_.param("imu_gate_dist", imu_gate_dist_, double(10000.0)); pnh_.param("imu_stddev_ax", imu_stddev_ax_, double(0.5)); pnh_.param("imu_stddev_wz", imu_stddev_wz_, double(0.01)); /* process noise 过程噪声参数*/ double proc_stddev_yaw_c, proc_stddev_yaw_bias_c, proc_stddev_vx_c, proc_stddev_wz_c; double proc_stddev_ax_c, proc_stddev_wz_imu_c; pnh_.param("proc_stddev_yaw_c", proc_stddev_yaw_c, double(0.005)); // 偏航角过程噪声(连续时间) 
pnh_.param("proc_stddev_yaw_bias_c", proc_stddev_yaw_bias_c, double(0.001)); // 偏航角偏差过程噪声 pnh_.param("proc_stddev_vx_c", proc_stddev_vx_c, double(2.0)); // 线速度过程噪声 pnh_.param("proc_stddev_wz_c", proc_stddev_wz_c, double(0.2)); // 角速度过程噪声 if (!enable_yaw_bias_estimation_) { proc_stddev_yaw_bias_c = 0.0; } /* convert to continuous to discrete 转换为离散时间噪声(乘以时间步长)*/ proc_cov_vx_d_ = std::pow(proc_stddev_vx_c, 2.0) * ekf_dt_; proc_cov_wz_d_ = std::pow(proc_stddev_wz_c, 2.0) * ekf_dt_; proc_cov_yaw_d_ = std::pow(proc_stddev_yaw_c, 2.0) * ekf_dt_; proc_cov_yaw_bias_d_ = std::pow(proc_stddev_yaw_bias_c, 2.0) * ekf_dt_; proc_cov_ax_d_ = std::pow(proc_stddev_ax_c, 2.0) * ekf_dt_; proc_cov_wz_imu_d_ = std::pow(proc_stddev_wz_imu_c, 2.0) * ekf_dt_; /* initialize ros system */ // 定时器(用于 EKF 预测步) timer_control_ = nh_.createTimer(ros::Duration(ekf_dt_), &EKFLocalizer::timerCallback, this); // 发布话题 //pub_pose_ = nh_.advertise<geometry_msgs::PoseStamped>("/ekf_pose", 1); pub_pose_ = nh_.advertise<geometry_msgs::PoseStamped>("/ndt_pose", 10); pub_pose_cov_ = nh_.advertise<geometry_msgs::PoseWithCovarianceStamped>("ekf_pose_with_covariance", 10); //pub_twist_ = nh_.advertise<geometry_msgs::TwistStamped>("/ekf_twist", 1); pub_twist_ = nh_.advertise<geometry_msgs::TwistStamped>("/estimate_twist", 10); pub_twist_cov_ = nh_.advertise<geometry_msgs::TwistWithCovarianceStamped>("ekf_twist_with_covariance", 10); pub_yaw_bias_ = pnh_.advertise<std_msgs::Float64>("estimated_yaw_bias", 10); // 订阅话题 sub_initialpose_ = nh_.subscribe("initialpose", 10, &EKFLocalizer::callbackInitialPose, this); sub_pose_with_cov_ = nh_.subscribe("in_pose_with_covariance", 10, &EKFLocalizer::callbackPoseWithCovariance, this); sub_pose_ = nh_.subscribe("/in_pose", 10, &EKFLocalizer::callbackPose, this); sub_twist_with_cov_ = nh_.subscribe("in_twist_with_covariance", 10, &EKFLocalizer::callbackTwistWithCovariance, this); //sub_twist_ = nh_.subscribe("/can_info", 10, &EKFLocalizer::callbackTwist, this); imu_sub_.subscribe(nh_, "/imu_raw", 100); vehicle_sub_.subscribe(nh_, "/can_info", 50); sync_ = boost::make_shared<message_filters::Synchronizer<SyncPolicy>>(SyncPolicy(10)); sync_->connectInput(imu_sub_, vehicle_sub_); sync_->registerCallback(boost::bind(&EKFLocalizer::sensorCallback, this, _1, _2)); sync_->setMaxIntervalDuration(ros::Duration(0.003)); // 3ms容差 dim_x_ex_ = dim_x_ * extend_state_step_; // 扩展状态维度(用于滑动窗口优化) initEKF(); // 初始化 EKF 内部状态 last_timer_call_time_ = 0.0; /* debug */ pub_debug_ = pnh_.advertise<std_msgs::Float64MultiArray>("debug", 1); // 调试信息(数组) pub_measured_pose_ = pnh_.advertise<geometry_msgs::PoseStamped>("debug/measured_pose", 1); // 调试用测量位姿 pub_measured_imu_ = pnh_.advertise<sensor_msgs::Imu>("debug/measured_imu", 1); }; EKFLocalizer::~EKFLocalizer(){}; /* * timerCallback */ void EKFLocalizer::timerCallback(const ros::TimerEvent& e) { DEBUG_INFO("========================= timer called ========================="); /* predict model in EKF 预测步(Predict)*/ auto start = std::chrono::system_clock::now(); DEBUG_INFO("------------------------- start prediction -------------------------"); double actual_dt = (last_timer_call_time_ > 0.0) ? 
(ros::Time::now().toSec() - last_timer_call_time_) : ekf_dt_; predictKinematicsModel(actual_dt); // 执行运动模型预测 double elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now() - start).count(); // 计算耗时 DEBUG_INFO("[EKF] predictKinematicsModel calculation time = %f [ms]", elapsed * 1.0e-6); DEBUG_INFO("------------------------- end prediction -------------------------\n"); /* pose measurement update */ if (current_pose_ptr_ != nullptr) // 位姿更新(当有新位姿数据时) { DEBUG_INFO("------------------------- start Pose -------------------------"); start = std::chrono::system_clock::now(); measurementUpdatePose(*current_pose_ptr_); // 融合传感器位姿数据 elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now() - start).count(); DEBUG_INFO("[EKF] measurementUpdatePose calculation time = %f [ms]", elapsed * 1.0e-6); DEBUG_INFO("------------------------- end Pose -------------------------\n"); } /* twist measurement update */ if (current_twist_ptr_ != nullptr) // 速度更新(当有新速度数据时) { DEBUG_INFO("------------------------- start twist -------------------------"); start = std::chrono::system_clock::now(); measurementUpdateTwist(*current_twist_ptr_); // 融合速度数据 elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now() - start).count(); DEBUG_INFO("[EKF] measurementUpdateTwist calculation time = %f [ms]", elapsed * 1.0e-6); DEBUG_INFO("------------------------- end twist -------------------------\n"); } /* IMU measurement update */ if (use_imu_ && current_imu_ptr_ != nullptr) { DEBUG_INFO("------------------------- start IMU -------------------------"); start = std::chrono::system_clock::now(); measurementUpdateIMU(*current_imu_ptr_); elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now() - start).count(); DEBUG_INFO("[EKF] measurementUpdateIMU calculation time = %f [ms]", elapsed * 1.0e-6); DEBUG_INFO("------------------------- end IMU -------------------------\n"); } /* set current pose, twist */ setCurrentResult(); // 更新内部状态 last_timer_call_time_ = ros::Time::now().toSec(); /* publish ekf result */ publishEstimateResult(); // 发布最终估计结果 } void EKFLocalizer::showCurrentX() { // 检查调试信息显示标志是否开启 if (show_debug_info_) { // 创建临时矩阵存储状态向量 Eigen::MatrixXd X(dim_x_, 1); // 从EKF获取最新状态估计值 ekf_.getLatestX(X); // 打印转置后的状态向量(行向量形式) DEBUG_PRINT_MAT(X.transpose()); } } /* * setCurrentResult */ void EKFLocalizer::setCurrentResult() { current_ekf_pose_.header.frame_id = pose_frame_id_; current_ekf_pose_.header.stamp = ros::Time::now(); current_ekf_pose_.pose.position.x = ekf_.getXelement(IDX::X); current_ekf_pose_.pose.position.y = ekf_.getXelement(IDX::Y); tf2::Quaternion q_tf; double roll, pitch, yaw; if (current_pose_ptr_ != nullptr) { current_ekf_pose_.pose.position.z = current_pose_ptr_->pose.position.z; tf2::fromMsg(current_pose_ptr_->pose.orientation, q_tf); /* use Pose pitch and roll */ tf2::Matrix3x3(q_tf).getRPY(roll, pitch, yaw); } else { current_ekf_pose_.pose.position.z = 0.0; roll = 0; pitch = 0; } yaw = ekf_.getXelement(IDX::YAW) + ekf_.getXelement(IDX::YAWB); q_tf.setRPY(roll, pitch, yaw); tf2::convert(q_tf, current_ekf_pose_.pose.orientation); current_ekf_twist_.header.frame_id = output_frame_id_; current_ekf_twist_.header.stamp = ros::Time::now(); current_ekf_twist_.twist.linear.x = ekf_.getXelement(IDX::VX); current_ekf_twist_.twist.angular.z = ekf_.getXelement(IDX::WZ) + ekf_.getXelement(IDX::WZ_IMU); } /* * broadcastTF */ void EKFLocalizer::broadcastTF(ros::Time time) { // if 
(current_ekf_pose_.header.frame_id == "") // { // return; // } // tf::Transform transform; // transform.setOrigin(tf::Vector3(current_ekf_pose_.pose.position.x, current_ekf_pose_.pose.position.y, current_ekf_pose_.pose.position.z)); // tf::Quaternion current_q( // current_ekf_pose_.pose.orientation.x, // current_ekf_pose_.pose.orientation.y, // current_ekf_pose_.pose.orientation.z, // current_ekf_pose_.pose.orientation.w // ); // transform.setRotation(current_q); // tf_br_.sendTransform(tf::StampedTransform(transform, time, "/map", output_frame_id_)); if (current_ekf_pose_.header.frame_id == "") { return; } geometry_msgs::TransformStamped transformStamped; transformStamped.header = current_ekf_pose_.header; transformStamped.child_frame_id = output_frame_id_; transformStamped.transform.translation.x = current_ekf_pose_.pose.position.x; transformStamped.transform.translation.y = current_ekf_pose_.pose.position.y; transformStamped.transform.translation.z = current_ekf_pose_.pose.position.z; transformStamped.transform.rotation.x = current_ekf_pose_.pose.orientation.x; transformStamped.transform.rotation.y = current_ekf_pose_.pose.orientation.y; transformStamped.transform.rotation.z = current_ekf_pose_.pose.orientation.z; transformStamped.transform.rotation.w = current_ekf_pose_.pose.orientation.w; tf_br_.sendTransform(transformStamped); } /* * getTransformFromTF */ bool EKFLocalizer::getTransformFromTF(std::string parent_frame, std::string child_frame, geometry_msgs::TransformStamped& transform) { // tf::TransformListener listener; // for (int i = 0; i < 50; ++i) // { // try // { // auto now = ros::Time(0); // listener.waitForTransform(parent_frame, child_frame, now, ros::Duration(10.0)); // listener.lookupTransform(parent_frame, child_frame, now, transform); // return true; // } // catch (tf::TransformException& ex) // { // ROS_ERROR("%s", ex.what()); // ros::Duration(0.1).sleep(); // } // } // return false; tf2_ros::Buffer tf_buffer; tf2_ros::TransformListener tf_listener(tf_buffer); ros::Duration(0.1).sleep(); if (parent_frame.front() == '/') parent_frame.erase(0, 1); if (child_frame.front() == '/') child_frame.erase(0, 1); for (int i = 0; i < 50; ++i) { try { transform = tf_buffer.lookupTransform(parent_frame, child_frame, ros::Time(0)); return true; } catch (tf2::TransformException& ex) { ROS_WARN("%s", ex.what()); ros::Duration(0.1).sleep(); } } return false; } /* * callbackInitialPose */ void EKFLocalizer::callbackInitialPose(const geometry_msgs::PoseWithCovarianceStamped& initialpose) { // (1) 获取 TF 变换 // tf::StampedTransform transform; // if (!getTransformFromTF(pose_frame_id_, initialpose.header.frame_id, transform)) // { // ROS_ERROR("[EKF] TF transform failed. 
parent = %s, child = %s", // pose_frame_id_.c_str(), initialpose.header.frame_id.c_str()); // return; // 必须返回,避免使用无效变换 // } // // (2) 初始化状态向量 X // Eigen::MatrixXd X(dim_x_, 1); // X.setZero(); // 显式初始化所有状态为 0 // // 将 initialpose 变换到 pose_frame_id_ 坐标系 // tf::Pose tf_initial_pose; // tf::poseMsgToTF(initialpose.pose.pose, tf_initial_pose); // tf::Pose transformed_pose = transform * tf_initial_pose; // 正确应用 TF 变换 // X(IDX::X) = transformed_pose.getOrigin().x(); // X(IDX::Y) = transformed_pose.getOrigin().y(); // X(IDX::YAW) = tf::getYaw(transformed_pose.getRotation()); // X(IDX::YAWB) = 0.0; // 偏航角偏差初始化为 0 // X(IDX::VX) = 0.0; // 速度初始化为 0 // X(IDX::WZ) = 0.0; // 角速度初始化为 0 // X(IDX::AX) = 0.0; // 加速度初始化为 0 // X(IDX::WZ_IMU) = 0.0; // IMU 角速度初始化为 0 // // (3) 初始化协方差矩阵 P // Eigen::MatrixXd P = Eigen::MatrixXd::Zero(dim_x_, dim_x_); // const auto& cov = initialpose.pose.covariance; // // 检查协方差矩阵是否有效(非负且非全零) // if (cov[0] > 0.0) P(IDX::X, IDX::X) = cov[0]; // X variance // if (cov[7] > 0.0) P(IDX::Y, IDX::Y) = cov[7]; // Y variance // if (cov[35] > 0.0) P(IDX::YAW, IDX::YAW) = cov[35]; // YAW variance // // 其他状态的协方差(默认值) // P(IDX::YAWB, IDX::YAWB) = 0.0001; // 偏航角偏差 // P(IDX::VX, IDX::VX) = 0.01; // 速度 // P(IDX::WZ, IDX::WZ) = 0.01; // 角速度 // P(IDX::AX, IDX::AX) = 0.01; // 加速度 // P(IDX::WZ_IMU, IDX::WZ_IMU) = 0.01; // IMU 角速度 // // (4) 初始化 EKF // ekf_.init(X, P, extend_state_step_); geometry_msgs::TransformStamped transform; if (!getTransformFromTF(pose_frame_id_, initialpose.header.frame_id, transform)) { ROS_ERROR("[EKF] TF transform failed. parent = %s, child = %s", pose_frame_id_.c_str(), initialpose.header.frame_id.c_str()); }; Eigen::MatrixXd X(dim_x_, 1); Eigen::MatrixXd P = Eigen::MatrixXd::Zero(dim_x_, dim_x_); X(IDX::X) = initialpose.pose.pose.position.x /* + transform.transform.translation.x */; X(IDX::Y) = initialpose.pose.pose.position.y /* + transform.transform.translation.y */; X(IDX::YAW) = tf2::getYaw(initialpose.pose.pose.orientation) /* + tf2::getYaw(transform.transform.rotation) */; X(IDX::YAWB) = 0.0; X(IDX::VX) = 0.0; X(IDX::WZ) = 0.0; X(IDX::AX) = 0.0; // 加速度初始化为 0 X(IDX::WZ_IMU) = 0.0; // IMU 角速度初始化为 0 P(IDX::X, IDX::X) = initialpose.pose.covariance[0]; P(IDX::Y, IDX::Y) = initialpose.pose.covariance[6 + 1]; P(IDX::YAW, IDX::YAW) = initialpose.pose.covariance[6 * 5 + 5]; P(IDX::YAWB, IDX::YAWB) = 0.0001; P(IDX::VX, IDX::VX) = 0.01; P(IDX::WZ, IDX::WZ) = 0.01; P(IDX::AX, IDX::AX) = 0.01; // 加速度 P(IDX::WZ_IMU, IDX::WZ_IMU) = 0.01; // IMU 角速度 ekf_.init(X, P, extend_state_step_); }; /* * callbackPose */ void EKFLocalizer::callbackPose(const geometry_msgs::PoseStamped::ConstPtr& msg) { if (!use_pose_with_covariance_) { current_pose_ptr_ = std::make_shared<geometry_msgs::PoseStamped>(*msg); } }; /* * callbackPoseWithCovariance */ void EKFLocalizer::callbackPoseWithCovariance(const geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg) { if (use_pose_with_covariance_) { geometry_msgs::PoseStamped pose; pose.header = msg->header; pose.pose = msg->pose.pose; current_pose_ptr_ = std::make_shared<geometry_msgs::PoseStamped>(pose); current_pose_covariance_ = msg->pose.covariance; } }; /* * callbackTwist */ void EKFLocalizer::sensorCallback(const sensor_msgs::Imu::ConstPtr& imu_msg, const autoware_can_msgs::CANInfo::ConstPtr& vehicle_msg) { geometry_msgs::TwistStamped twist_msg; twist_msg.header = vehicle_msg->header; twist_msg.header.frame_id = "/base_link"; // 根据实际坐标系设置 // 设置线速度 (来自CAN) twist_msg.twist.linear.x = (vehicle_msg->speed / 3.6) * cos(vehicle_msg->angle); 
```cpp
/*
 * callbackTwistWithCovariance
 */
void EKFLocalizer::callbackTwistWithCovariance(const geometry_msgs::TwistWithCovarianceStamped::ConstPtr& msg)
{
  if (use_twist_with_covariance_)
  {
    geometry_msgs::TwistStamped twist;
    twist.header = msg->header;
    twist.twist = msg->twist.twist;
    current_twist_ptr_ = std::make_shared<geometry_msgs::TwistStamped>(twist);
    current_twist_covariance_ = msg->twist.covariance;
  }
}

/*
 * initEKF
 */
void EKFLocalizer::initEKF()
{
  Eigen::MatrixXd X = Eigen::MatrixXd::Zero(dim_x_, 1);
  Eigen::MatrixXd P = Eigen::MatrixXd::Identity(dim_x_, dim_x_) * 1.0E15;  // for x & y
  P(IDX::YAW, IDX::YAW) = 50.0;                                            // for yaw
  P(IDX::YAWB, IDX::YAWB) = proc_cov_yaw_bias_d_;                          // for yaw bias
  P(IDX::VX, IDX::VX) = 1000.0;                                            // for vx
  P(IDX::WZ, IDX::WZ) = 50.0;                                              // for wz
  P(IDX::AX, IDX::AX) = 10.0;
  P(IDX::WZ_IMU, IDX::WZ_IMU) = 1.0;
  ekf_.init(X, P, extend_state_step_);
}

/*
 * predictKinematicsModel
 */
void EKFLocalizer::predictKinematicsModel(double actual_dt)
{
  Eigen::MatrixXd X_curr(dim_x_, 1);
  Eigen::MatrixXd X_next(dim_x_, 1);
  ekf_.getLatestX(X_curr);
  DEBUG_PRINT_MAT(X_curr.transpose());

  Eigen::MatrixXd P_curr;
  ekf_.getLatestP(P_curr);

  const double yaw = X_curr(IDX::YAW);
  const double yaw_bias = X_curr(IDX::YAWB);
  const double vx = X_curr(IDX::VX);
  const double wz = X_curr(IDX::WZ);
  const double ax = X_curr(IDX::AX);
  const double wz_imu = X_curr(IDX::WZ_IMU);
  const double dt = actual_dt;

  /* Update for latest state */
  X_next(IDX::X) = X_curr(IDX::X) + vx * cos(yaw + yaw_bias) * dt + 0.5 * ax * cos(yaw + yaw_bias) * dt * dt;
  X_next(IDX::Y) = X_curr(IDX::Y) + vx * sin(yaw + yaw_bias) * dt + 0.5 * ax * sin(yaw + yaw_bias) * dt * dt;
  X_next(IDX::YAW) = X_curr(IDX::YAW) + (wz + wz_imu) * dt;
  X_next(IDX::YAWB) = yaw_bias;
  X_next(IDX::VX) = vx + ax * dt;
  X_next(IDX::WZ) = wz;
  X_next(IDX::AX) = ax;
  X_next(IDX::WZ_IMU) = wz_imu;
  X_next(IDX::YAW) = std::atan2(std::sin(X_next(IDX::YAW)), std::cos(X_next(IDX::YAW)));

  /* Set A matrix for latest state */
  Eigen::MatrixXd A = Eigen::MatrixXd::Identity(dim_x_, dim_x_);
  A(IDX::X, IDX::YAW) = -vx * sin(yaw + yaw_bias) * dt - 0.5 * ax * sin(yaw + yaw_bias) * dt * dt;
  A(IDX::X, IDX::YAWB) = -vx * sin(yaw + yaw_bias) * dt - 0.5 * ax * sin(yaw + yaw_bias) * dt * dt;
  A(IDX::X, IDX::VX) = cos(yaw + yaw_bias) * dt;
  A(IDX::X, IDX::AX) = 0.5 * cos(yaw + yaw_bias) * dt * dt;
  A(IDX::Y, IDX::YAW) = vx * cos(yaw + yaw_bias) * dt + 0.5 * ax * cos(yaw + yaw_bias) * dt * dt;
  A(IDX::Y, IDX::YAWB) = vx * cos(yaw + yaw_bias) * dt + 0.5 * ax * cos(yaw + yaw_bias) * dt * dt;
  A(IDX::Y, IDX::VX) = sin(yaw + yaw_bias) * dt;
  A(IDX::Y, IDX::AX) = 0.5 * sin(yaw + yaw_bias) * dt * dt;
  A(IDX::YAW, IDX::WZ) = dt;
  A(IDX::YAW, IDX::WZ_IMU) = dt;
  A(IDX::VX, IDX::AX) = dt;

  /* Process noise covariance matrix Q */
  Eigen::MatrixXd Q = Eigen::MatrixXd::Zero(dim_x_, dim_x_);

  // position process noise (induced by velocity, yaw, and acceleration uncertainty)
  const double dvx = std::sqrt(P_curr(IDX::VX, IDX::VX));
  const double dax = std::sqrt(P_curr(IDX::AX, IDX::AX));
  const double dyaw = std::sqrt(P_curr(IDX::YAW, IDX::YAW));

  if (dvx < 10.0 && dyaw < 1.0 && dax < 5.0)
  {
    // propagate the (vx, yaw, ax) uncertainty into (x, y) through the Jacobian Jp_pos
    Eigen::MatrixXd Jp_pos = Eigen::MatrixXd::Zero(2, 3);
    Jp_pos << cos(yaw), -vx * sin(yaw), 0.5 * cos(yaw),
              sin(yaw),  vx * cos(yaw), 0.5 * sin(yaw);
    Eigen::MatrixXd Q_vx_yaw_ax = Eigen::MatrixXd::Zero(3, 3);
    Q_vx_yaw_ax(0, 0) = P_curr(IDX::VX, IDX::VX) * dt;
    Q_vx_yaw_ax(1, 1) = P_curr(IDX::YAW, IDX::YAW) * dt;
    Q_vx_yaw_ax(2, 2) = P_curr(IDX::AX, IDX::AX) * dt;
    Q_vx_yaw_ax(0, 1) = P_curr(IDX::VX, IDX::YAW) * dt;
    Q_vx_yaw_ax(1, 0) = P_curr(IDX::YAW, IDX::VX) * dt;
    Q_vx_yaw_ax(0, 2) = P_curr(IDX::VX, IDX::AX) * dt;
    Q_vx_yaw_ax(2, 0) = P_curr(IDX::AX, IDX::VX) * dt;
    Q_vx_yaw_ax(1, 2) = P_curr(IDX::YAW, IDX::AX) * dt;
    Q_vx_yaw_ax(2, 1) = P_curr(IDX::AX, IDX::YAW) * dt;
    Q.block(0, 0, 2, 2) = Jp_pos * Q_vx_yaw_ax * Jp_pos.transpose();
  }
  else
  {
    // uncertainty too large for the linearization to be trusted; fall back to fixed noise
    Q(IDX::X, IDX::X) = 0.1;
    Q(IDX::Y, IDX::Y) = 0.1;
  }

  // yaw process noise
  Q(IDX::YAW, IDX::YAW) = proc_cov_yaw_d_;
  Q(IDX::YAWB, IDX::YAWB) = proc_cov_yaw_bias_d_;
  // velocity process noise
  Q(IDX::VX, IDX::VX) = proc_cov_vx_d_;
  Q(IDX::WZ, IDX::WZ) = proc_cov_wz_d_;
  // acceleration and IMU angular-velocity process noise
  Q(IDX::AX, IDX::AX) = proc_cov_ax_d_;
  Q(IDX::WZ_IMU, IDX::WZ_IMU) = proc_cov_wz_imu_d_;

  ekf_.predictWithDelay(X_next, A, Q);

  // debug
  Eigen::MatrixXd X_result(dim_x_, 1);
  ekf_.getLatestX(X_result);
  DEBUG_PRINT_MAT(X_result.transpose());
  DEBUG_PRINT_MAT((X_result - X_curr).transpose());
}
```
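It helps to write out the motion model this function implements. With time step $\Delta t$, yaw $\theta$, yaw bias $b$, velocity $v$, yaw rates $\omega$ and $\omega_{\text{imu}}$, and acceleration $a$:

$$
\begin{aligned}
x_{k+1} &= x_k + v\cos(\theta+b)\,\Delta t + \tfrac{1}{2}a\cos(\theta+b)\,\Delta t^2\\
y_{k+1} &= y_k + v\sin(\theta+b)\,\Delta t + \tfrac{1}{2}a\sin(\theta+b)\,\Delta t^2\\
\theta_{k+1} &= \theta_k + (\omega+\omega_{\text{imu}})\,\Delta t\\
v_{k+1} &= v_k + a\,\Delta t
\end{aligned}
$$

with $b$, $\omega$, $a$, and $\omega_{\text{imu}}$ propagated as constants. The matrix `A` filled in above is the Jacobian $\partial f/\partial x$ of this map evaluated at the current state, e.g. `A(IDX::X, IDX::YAW)` $= -v\sin(\theta+b)\Delta t - \tfrac{1}{2}a\sin(\theta+b)\Delta t^2$.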
```cpp
/*
 * measurementUpdatePose
 */
void EKFLocalizer::measurementUpdatePose(const geometry_msgs::PoseStamped& pose)
{
  if (pose.header.frame_id != pose_frame_id_)
  {
    ROS_WARN_DELAYED_THROTTLE(2, "pose frame_id is %s, but pose_frame is set as %s. They must be same.",
                              pose.header.frame_id.c_str(), pose_frame_id_.c_str());
  }
  Eigen::MatrixXd X_curr(dim_x_, 1);  // current state
  ekf_.getLatestX(X_curr);
  DEBUG_PRINT_MAT(X_curr.transpose());

  constexpr int dim_y = 3;  // pos_x, pos_y, yaw, depending on Pose output
  const ros::Time t_curr = ros::Time::now();

  /* Calculate delay step */
  double delay_time = (t_curr - pose.header.stamp).toSec() + pose_additional_delay_;
  if (delay_time < 0.0)
  {
    delay_time = 0.0;
    ROS_WARN_DELAYED_THROTTLE(1.0, "Pose time stamp is inappropriate, set delay to 0[s]. delay = %f", delay_time);
  }
  int delay_step = std::roundf(delay_time / ekf_dt_);
  if (delay_step > extend_state_step_ - 1)
  {
    ROS_WARN_DELAYED_THROTTLE(1.0,
                              "Pose delay exceeds the compensation limit, ignored. delay: %f[s], limit = "
                              "extend_state_step * ekf_dt : %f [s]",
                              delay_time, extend_state_step_ * ekf_dt_);
    return;
  }
  DEBUG_INFO("delay_time: %f [s]", delay_time);

  /* Set yaw */
  double yaw = tf2::getYaw(pose.pose.orientation);
  const double ekf_yaw = ekf_.getXelement(delay_step * dim_x_ + IDX::YAW);
  const double yaw_error = normalizeYaw(yaw - ekf_yaw);  // normalize the error not to exceed 2 pi
  yaw = yaw_error + ekf_yaw;

  /* Set measurement vector */
  Eigen::MatrixXd y(dim_y, 1);
  y << pose.pose.position.x, pose.pose.position.y, yaw;

  if (isnan(y.array()).any() || isinf(y.array()).any())
  {
    ROS_WARN("[EKF] pose measurement matrix includes NaN or Inf. ignore update. check pose message.");
    return;
  }

  /* Gate */
  Eigen::MatrixXd y_ekf(dim_y, 1);
  y_ekf << ekf_.getXelement(delay_step * dim_x_ + IDX::X), ekf_.getXelement(delay_step * dim_x_ + IDX::Y), ekf_yaw;
  Eigen::MatrixXd P_curr, P_y;
  ekf_.getLatestP(P_curr);
  P_y = P_curr.block(0, 0, dim_y, dim_y);
  if (!mahalanobisGate(pose_gate_dist_, y_ekf, y, P_y))
  {
    ROS_WARN_DELAYED_THROTTLE(2.0, "[EKF] Pose measurement update, mahalanobis distance is over limit. ignore "
                                   "measurement data.");
    return;
  }

  DEBUG_PRINT_MAT(y.transpose());
  DEBUG_PRINT_MAT(y_ekf.transpose());
  DEBUG_PRINT_MAT((y - y_ekf).transpose());

  /* Set measurement matrix */
  Eigen::MatrixXd C = Eigen::MatrixXd::Zero(dim_y, dim_x_);
  C(0, IDX::X) = 1.0;    // for pos x
  C(1, IDX::Y) = 1.0;    // for pos y
  C(2, IDX::YAW) = 1.0;  // for yaw

  /* Set measurement noise covariance */
  Eigen::MatrixXd R = Eigen::MatrixXd::Zero(dim_y, dim_y);
  if (use_pose_with_covariance_)
  {
    R(0, 0) = current_pose_covariance_.at(0);   // x - x
    R(0, 1) = current_pose_covariance_.at(1);   // x - y
    R(0, 2) = current_pose_covariance_.at(5);   // x - yaw
    R(1, 0) = current_pose_covariance_.at(6);   // y - x
    R(1, 1) = current_pose_covariance_.at(7);   // y - y
    R(1, 2) = current_pose_covariance_.at(11);  // y - yaw
    R(2, 0) = current_pose_covariance_.at(30);  // yaw - x
    R(2, 1) = current_pose_covariance_.at(31);  // yaw - y
    R(2, 2) = current_pose_covariance_.at(35);  // yaw - yaw
  }
  else
  {
    const double ekf_yaw = ekf_.getXelement(IDX::YAW);
    const double vx = ekf_.getXelement(IDX::VX);
    const double wz = ekf_.getXelement(IDX::WZ);
    const double cov_pos_x = std::pow(pose_measure_uncertainty_time_ * vx * cos(ekf_yaw), 2.0);
    const double cov_pos_y = std::pow(pose_measure_uncertainty_time_ * vx * sin(ekf_yaw), 2.0);
    const double cov_yaw = std::pow(pose_measure_uncertainty_time_ * wz, 2.0);
    R(0, 0) = std::pow(pose_stddev_x_, 2) + cov_pos_x;  // pos_x
    R(1, 1) = std::pow(pose_stddev_y_, 2) + cov_pos_y;  // pos_y
    R(2, 2) = std::pow(pose_stddev_yaw_, 2) + cov_yaw;  // yaw
  }

  /* In order to avoid a large change at the time of updating, the measurement update is performed by dividing at
   * every step. */
  R *= (ekf_rate_ / pose_rate_);

  ekf_.updateWithDelay(y, C, R, delay_step);

  // debug
  Eigen::MatrixXd X_result(dim_x_, 1);
  ekf_.getLatestX(X_result);
  DEBUG_PRINT_MAT(X_result.transpose());
  DEBUG_PRINT_MAT((X_result - X_curr).transpose());
}
```
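The delay bookkeeping is easiest to see with numbers. A hypothetical example (these values are assumptions, not from the source): with `ekf_dt_` = 0.02 s and a pose that arrives 0.05 s old,

$$\text{delay\_step} = \operatorname{round}\!\left(\frac{0.05}{0.02}\right) = 3$$

so the update is applied to the state snapshot stored three prediction steps back in the extended state vector, provided `delay_step` $\le$ `extend_state_step_` $- 1$; anything older is discarded with a warning.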
check pose message."); return; } /* Gate */ Eigen::MatrixXd y_ekf(dim_y, 1); y_ekf << ekf_.getXelement(delay_step * dim_x_ + IDX::X), ekf_.getXelement(delay_step * dim_x_ + IDX::Y), ekf_yaw; Eigen::MatrixXd P_curr, P_y; ekf_.getLatestP(P_curr); P_y = P_curr.block(0, 0, dim_y, dim_y); if (!mahalanobisGate(pose_gate_dist_, y_ekf, y, P_y)) { ROS_WARN_DELAYED_THROTTLE(2.0, "[EKF] Pose measurement update, mahalanobis distance is over limit. ignore " "measurement data."); return; } DEBUG_PRINT_MAT(y.transpose()); DEBUG_PRINT_MAT(y_ekf.transpose()); DEBUG_PRINT_MAT((y - y_ekf).transpose()); /* Set measurement matrix */ Eigen::MatrixXd C = Eigen::MatrixXd::Zero(dim_y, dim_x_); C(0, IDX::X) = 1.0; // for pos x C(1, IDX::Y) = 1.0; // for pos y C(2, IDX::YAW) = 1.0; // for yaw /* Set measurement noise covariancs */ Eigen::MatrixXd R = Eigen::MatrixXd::Zero(dim_y, dim_y); if (use_pose_with_covariance_) { R(0, 0) = current_pose_covariance_.at(0); // x - x R(0, 1) = current_pose_covariance_.at(1); // x - y R(0, 2) = current_pose_covariance_.at(5); // x - yaw R(1, 0) = current_pose_covariance_.at(6); // y - x R(1, 1) = current_pose_covariance_.at(7); // y - y R(1, 2) = current_pose_covariance_.at(11); // y - yaw R(2, 0) = current_pose_covariance_.at(30); // yaw - x R(2, 1) = current_pose_covariance_.at(31); // yaw - y R(2, 2) = current_pose_covariance_.at(35); // yaw - yaw } else { const double ekf_yaw = ekf_.getXelement(IDX::YAW); const double vx = ekf_.getXelement(IDX::VX); const double wz = ekf_.getXelement(IDX::WZ); const double cov_pos_x = std::pow(pose_measure_uncertainty_time_ * vx * cos(ekf_yaw), 2.0); const double cov_pos_y = std::pow(pose_measure_uncertainty_time_ * vx * sin(ekf_yaw), 2.0); const double cov_yaw = std::pow(pose_measure_uncertainty_time_ * wz, 2.0); R(0, 0) = std::pow(pose_stddev_x_, 2) + cov_pos_x; // pos_x R(1, 1) = std::pow(pose_stddev_y_, 2) + cov_pos_y; // pos_y R(2, 2) = std::pow(pose_stddev_yaw_, 2) + cov_yaw; // yaw } /* In order to avoid a large change at the time of updating, measuremeent update is performed by dividing at every * step. */ R *= (ekf_rate_ / pose_rate_); ekf_.updateWithDelay(y, C, R, delay_step); // debug Eigen::MatrixXd X_result(dim_x_, 1); ekf_.getLatestX(X_result); DEBUG_PRINT_MAT(X_result.transpose()); DEBUG_PRINT_MAT((X_result - X_curr).transpose()); } /* * measurementUpdateTwist */ void EKFLocalizer::measurementUpdateTwist(const geometry_msgs::TwistStamped& twist) { if (twist.header.frame_id != output_frame_id_) { ROS_WARN_DELAYED_THROTTLE(2.0, "twist frame_id must be %s", output_frame_id_.c_str()); } Eigen::MatrixXd X_curr(dim_x_, 1); // curent state ekf_.getLatestX(X_curr); DEBUG_PRINT_MAT(X_curr.transpose()); constexpr int dim_y = 2; // vx, wz const ros::Time t_curr = ros::Time::now(); /* Calculate delay step */ double delay_time = (t_curr - twist.header.stamp).toSec() + twist_additional_delay_; if (delay_time < 0.0) { ROS_WARN_DELAYED_THROTTLE(1.0, "Twist time stamp is inappropriate (delay = %f [s]), set delay to 0[s].", delay_time); delay_time = 0.0; } int delay_step = std::roundf(delay_time / ekf_dt_); if (delay_step > extend_state_step_ - 1) { ROS_WARN_DELAYED_THROTTLE(1.0, "Twist delay exceeds the compensation limit, ignored. 
```cpp
/*
 * mahalanobisGate
 */
bool EKFLocalizer::mahalanobisGate(const double& dist_max, const Eigen::MatrixXd& x, const Eigen::MatrixXd& obj_x,
                                   const Eigen::MatrixXd& cov)
{
  // squared Mahalanobis distance between the predicted and the measured vector
  Eigen::MatrixXd mahalanobis_squared = (x - obj_x).transpose() * cov.inverse() * (x - obj_x);
  // DEBUG_INFO("measurement update: mahalanobis = %f, gate limit = %f", std::sqrt(mahalanobis_squared(0)), dist_max);
  ROS_INFO("measurement update: mahalanobis = %f, gate limit = %f", std::sqrt(mahalanobis_squared(0)), dist_max);
  if (mahalanobis_squared(0) > dist_max * dist_max)
  {
    return false;
  }
  return true;
}
```
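This gate is exactly the Mahalanobis distance discussed at the top of this article: it computes $(x-\mu)^T \Sigma^{-1} (x-\mu)$ and rejects measurements beyond a threshold. Forming `cov.inverse()` explicitly is fine for these 2×2 and 3×3 blocks, but the inverse-Cholesky route from earlier in the article is cheaper and more stable. A minimal sketch under that assumption (the function name `mahalanobisGateChol` is illustrative, not part of the node):

```cpp
#include <Eigen/Dense>

// Hypothetical variant: gate via a Cholesky (LLT) solve instead of an explicit inverse.
bool mahalanobisGateChol(double dist_max, const Eigen::VectorXd& x,
                         const Eigen::VectorXd& obj_x, const Eigen::MatrixXd& cov)
{
  const Eigen::VectorXd d = x - obj_x;

  // Factor cov = L * L^T; fails if cov is not positive definite.
  Eigen::LLT<Eigen::MatrixXd> llt(cov);
  if (llt.info() != Eigen::Success)
    return false;  // degenerate covariance: reject the measurement

  // llt.solve(d) computes cov^{-1} * d through the factorization,
  // so d.dot(...) is the squared Mahalanobis distance d^T cov^{-1} d.
  const double squared = d.dot(llt.solve(d));
  return squared <= dist_max * dist_max;
}
```

The connection to the Cholesky discussion above: solving with $L$ first applies the inverse Cholesky transform $z = L^{-1}d$, which "uncorrelates" the residual, and then $d^T\Sigma^{-1}d = \lVert L^{-1}d\rVert^2$ is just the ordinary Euclidean length of the uncorrelated residual.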
```cpp
/*
 * publishEstimateResult
 */
void EKFLocalizer::publishEstimateResult()
{
  ros::Time current_time = ros::Time::now();
  Eigen::MatrixXd X(dim_x_, 1);
  Eigen::MatrixXd P(dim_x_, dim_x_);
  ekf_.getLatestX(X);
  ekf_.getLatestP(P);

  /* publish latest pose */
  pub_pose_.publish(current_ekf_pose_);

  /* publish latest pose with covariance */
  geometry_msgs::PoseWithCovarianceStamped pose_cov;
  pose_cov.header.stamp = current_time;
  pose_cov.header.frame_id = current_ekf_pose_.header.frame_id;
  pose_cov.pose.pose = current_ekf_pose_.pose;
  pose_cov.pose.covariance[0] = P(IDX::X, IDX::X);
  pose_cov.pose.covariance[1] = P(IDX::X, IDX::Y);
  pose_cov.pose.covariance[5] = P(IDX::X, IDX::YAW);
  pose_cov.pose.covariance[6] = P(IDX::Y, IDX::X);
  pose_cov.pose.covariance[7] = P(IDX::Y, IDX::Y);
  pose_cov.pose.covariance[11] = P(IDX::Y, IDX::YAW);
  pose_cov.pose.covariance[30] = P(IDX::YAW, IDX::X);
  pose_cov.pose.covariance[31] = P(IDX::YAW, IDX::Y);
  pose_cov.pose.covariance[35] = P(IDX::YAW, IDX::YAW);
  pub_pose_cov_.publish(pose_cov);

  /* publish latest twist */
  pub_twist_.publish(current_ekf_twist_);

  /* publish latest twist with covariance */
  geometry_msgs::TwistWithCovarianceStamped twist_cov;
  twist_cov.header.stamp = current_time;
  twist_cov.header.frame_id = current_ekf_twist_.header.frame_id;
  twist_cov.twist.twist = current_ekf_twist_.twist;
  twist_cov.twist.covariance[0] = P(IDX::VX, IDX::VX);
  // the published yaw rate is wz + wz_imu, so its covariance terms combine both states
  twist_cov.twist.covariance[5] = P(IDX::VX, IDX::WZ) + P(IDX::VX, IDX::WZ_IMU);
  twist_cov.twist.covariance[30] = P(IDX::WZ, IDX::VX) + P(IDX::WZ_IMU, IDX::VX);
  twist_cov.twist.covariance[35] =
      P(IDX::WZ, IDX::WZ) + P(IDX::WZ, IDX::WZ_IMU) + P(IDX::WZ_IMU, IDX::WZ) + P(IDX::WZ_IMU, IDX::WZ_IMU);
  pub_twist_cov_.publish(twist_cov);

  /* send transform of pose */
  broadcastTF(current_time);

  /* publish yaw bias */
  std_msgs::Float64 yawb;
  yawb.data = X(IDX::YAWB);
  pub_yaw_bias_.publish(yawb);

  /* debug: measured pose */
  if (current_pose_ptr_ != nullptr)
  {
    geometry_msgs::PoseStamped p;
    p = *current_pose_ptr_;
    p.header.stamp = current_time;
    pub_measured_pose_.publish(p);
  }

  /* debug publish */
  double RAD2DEG = 180.0 / 3.141592;
  double pose_yaw = 0.0;
  if (current_pose_ptr_ != nullptr)
    pose_yaw = tf2::getYaw(current_pose_ptr_->pose.orientation) * RAD2DEG;

  std_msgs::Float64MultiArray msg;
  msg.data.push_back(X(IDX::YAW) * RAD2DEG);   // [0] ekf yaw angle
  msg.data.push_back(pose_yaw);                // [1] measurement yaw angle
  msg.data.push_back(X(IDX::YAWB) * RAD2DEG);  // [2] yaw bias
  msg.data.push_back(X(IDX::AX));              // [3] estimated x acceleration
  msg.data.push_back(X(IDX::WZ_IMU));          // [4] estimated wz from IMU
  pub_debug_.publish(msg);
}

double EKFLocalizer::normalizeYaw(const double& yaw)
{
  return std::atan2(std::sin(yaw), std::cos(yaw));
}
```
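`normalizeYaw` is the standard branch-free way to wrap an angle into $(-\pi, \pi]$: taking $\sin$ and $\cos$ discards whole turns, and $\operatorname{atan2}$ reconstructs the wrapped angle. For example,

$$\operatorname{atan2}\!\big(\sin(3\pi/2),\,\cos(3\pi/2)\big) = \operatorname{atan2}(-1,\,0) = -\pi/2$$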
ignore " "measurement data."); return; } DEBUG_PRINT_MAT(y.transpose()); DEBUG_PRINT_MAT(y_ekf.transpose()); DEBUG_PRINT_MAT((y - y_ekf).transpose()); /* Set measurement matrix */ Eigen::MatrixXd C = Eigen::MatrixXd::Zero(dim_y, dim_x_); C(0, IDX::AX) = 1.0; // for ax C(1, IDX::WZ_IMU) = 1.0; // for wz_imu /* Set measurement noise covariance */ Eigen::MatrixXd R = Eigen::MatrixXd::Zero(dim_y, dim_y); R(0, 0) = imu_stddev_ax_ * imu_stddev_ax_; // for ax R(1, 1) = imu_stddev_wz_ * imu_stddev_wz_; // for wz_imu /* In order to avoid a large change by update, measurement update is performed by dividing at every step. */ R *= (ekf_rate_ / imu_rate_); ekf_.updateWithDelay(y, C, R, delay_step); // debug Eigen::MatrixXd X_result(dim_x_, 1); ekf_.getLatestX(X_result); DEBUG_PRINT_MAT(X_result.transpose()); DEBUG_PRINT_MAT((X_result - X_curr).transpose()); // Publish debug IMU message if (pub_measured_imu_.getNumSubscribers() > 0) { sensor_msgs::Imu debug_imu = imu; debug_imu.header.stamp = t_curr; pub_measured_imu_.publish(debug_imu); } } 小车转向时ekf发布的base_link在rviz中的转向不变但点云在转动,而且小车相对点云位置准确而且稳定