Calculating the similarity of two vectors is a fundamental technique in machine learning and many other fields.
Given two vectors of n dimensions, $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_n)$:
1. Euclidean Distance. (Ed for short; the most frequently seen)
In fact, it is just the straight-line distance between two sample points (vectors) in a multi-dimensional space.
We have:

$$\mathrm{Ed}_1(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The formula above tells us how similar two vectors are through the value of Ed1: the smaller it is, the more similar they are. When Ed1 equals 0, we deem them identical vectors. There is also an alternative:

$$\mathrm{Ed}_2(X, Y) = \frac{1}{1 + \mathrm{Ed}_1(X, Y)}$$

This form can feel more natural, since a bigger value of Ed2 means greater similarity. Notice that Ed2 lies within the interval (0, 1]: it equals 1 exactly when Ed1 = 0, and approaches 0 as Ed1 grows.
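To make this concrete, here is a minimal Python sketch of both variants (the function names ed1/ed2 and the sample vectors are mine, for illustration only):

import math

def ed1(x, y):
    # Straight-line (Euclidean) distance between two n-dimensional vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def ed2(x, y):
    # Similarity variant in (0, 1]: the larger the value, the more similar.
    return 1.0 / (1.0 + ed1(x, y))

x = [1.0, 2.0, 3.0]
y = [1.0, 2.5, 3.5]
print(ed1(x, y))  # about 0.707: a small distance, so the vectors are similar
print(ed2(x, y))  # about 0.586: closer to 1 means more similar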
2. Pearson Correlation. (PC for short)
PC is slightly more sophisticated than Ed. A Pearson correlation coefficient (r) is computed to measure how well two sets of data fit on a straight line.
The formula for r is as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
r lies in the interval [-1, 1]; the larger |r| is, the more strongly the two sets of data are correlated. A positive r means they are positively correlated, a negative r means they are negatively correlated, and r = 0 means they are not linearly correlated. The Pearson correlation approach works even when the vector dimensions are not well normalized, because each vector is centered by its mean and scaled by its own spread.
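As with Ed, here is a minimal Python sketch of the coefficient, written directly from the formula above (the function name pearson_r and the sample data are mine):

import math

def pearson_r(x, y):
    # Pearson correlation coefficient of two equal-length vectors.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x, so the points fit a straight line perfectly
print(pearson_r(x, y))    # 1.0: perfectly positively correlated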
This post introduced two commonly used methods for computing vector similarity: Euclidean distance (Ed) and the Pearson correlation coefficient (PC). Both can effectively assess how similar vectors are in a multi-dimensional space. Ed directly measures the distance between two points, with smaller values indicating more similar vectors, while PC measures how correlated two sets of data are, with a larger absolute value indicating a stronger correlation.