Calculation of Vector Similarity

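This article introduces two commonly used methods for computing vector similarity: Euclidean distance (Ed) and the Pearson correlation coefficient (PC). Both can be used to evaluate how similar vectors are in a multi-dimensional space. Ed directly measures the distance between two points, and a smaller value means the vectors are more similar; PC measures the correlation between two sets of data, and a larger absolute value means a stronger correlation.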


Calculating the similarity of two vectors is a fundamental technique in machine learning and other fields.

Given two vectors of n dimensions:

        $$ x = (x_1, x_2, \dots, x_n), \qquad y = (y_1, y_2, \dots, y_n) $$

1. Euclidean Distance (Ed, the most frequently used)

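        This is simply the straight-line distance between the two sample points (vectors) in a multi-dimensional space.

        For vectors x and y with components x_i and y_i (i = 1, ..., n), we have:

        $$ Ed_1 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$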

        The value of Ed1 tells us how similar two vectors are: the smaller it is, the more similar they are. When Ed1 equals 0, the two vectors are identical. A common alternative turns the distance into a similarity score:

        $$ Ed_2 = \frac{1}{1 + Ed_1} $$

        This form is often more intuitive, since a larger value of Ed2 means greater similarity. Note that Ed2 lies in the interval (0, 1], and it reaches 1 only when the two vectors are identical.
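
        The two Euclidean measures can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean_distance` and `euclidean_similarity` are made up for this example, and Ed2 is assumed to take the common form 1 / (1 + Ed1), which is consistent with the (0, 1] range stated above.

```python
import math


def euclidean_distance(x, y):
    """Ed1: straight-line distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x, y):
    """Ed2, assumed here to be 1 / (1 + Ed1); the result lies in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(x, y))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [4.0, 6.0, 3.0]
    # Differences are (3, 4, 0), so Ed1 = sqrt(9 + 16 + 0) = 5.0
    print(euclidean_distance(x, y))
    # Ed2 = 1 / (1 + 5) = 1/6
    print(euclidean_similarity(x, y))
    # Identical vectors: Ed1 = 0, so Ed2 = 1.0
    print(euclidean_similarity(x, x))
```

        Because Ed1 depends on the raw magnitudes of the components, scaling one vector changes the result even when the two vectors point in the same direction.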


2. Pearson Correlation (PC)

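        PC is slightly more sophisticated than Ed. The Pearson correlation coefficient (r) measures how well two sets of data fit a straight line.

        For vectors x and y with components x_i and y_i and means \bar{x} and \bar{y}, r is calculated as follows:

        $$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$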

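        r lies in the interval [-1, 1]; the larger | r | (the absolute value) is, the more strongly the two vectors are linearly related. A positive r means they are positively correlated, a negative r means they are negatively correlated, and r = 0 means there is no linear relationship. Because r is computed from mean-centered values and divided by their spread, it does not change when either vector is shifted by a constant or scaled by a positive factor, so the Pearson approach works even when the vectors are not well normalized.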
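
        As a rough sketch of the calculation, the following Python function computes r using the standard mean-centered form of the Pearson coefficient. The function name `pearson_correlation` and the choice to raise an error for constant vectors (where r is undefined because the denominator is zero) are assumptions made for this example.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient r of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two components")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of the mean-centered components
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations from the mean
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a vector is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]   # same trend as x, ten times the scale
    z = [40.0, 30.0, 20.0, 10.0]   # opposite trend
    print(pearson_correlation(x, y))  # close to 1: perfectly positively correlated
    print(pearson_correlation(x, z))  # close to -1: perfectly negatively correlated
```

        In this example x and y are far apart in terms of straight-line distance, yet r is close to 1, which illustrates why the Pearson approach tolerates vectors on different scales.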


        

       




