PyTorch: Vector Similarity Metrics and Distance Metrics

Original article, last modified 2023-06-06 21:14:03


License: CC 4.0 BY-SA

Tags: #pytorch #deep-learning #python

First published 2021-04-22 14:49:33


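This article shows how to compute cosine similarity and Euclidean distance in PyTorch. Cosine similarity measures the cosine of the angle between two vectors, whereas Euclidean distance is the standard straight-line distance between two points. Code examples demonstrate nn.CosineSimilarity and torch.cosine_similarity, along with four hand-written ways to compute Euclidean distance. Other vector similarity measures, such as dot-product similarity, are also mentioned.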


Cosine similarity: CosineSimilarity

torch.nn.CosineSimilarity(dim=1, eps=1e-08)

Returns cosine similarity between x1 and x2, computed along dim.

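Implementation:

import torch
import torch.nn as nn
input1 = torch.randn(2, 4)
input2 = torch.randn(2, 4)
print(input1)
print(input2)
# Method 1: module form
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
cos_similarity = cos(input1, input2)
print(cos_similarity)
# Method 2: functional form
cos_similarity = torch.cosine_similarity(input1, input2, dim=1)
print(cos_similarity)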

tensor([[-0.2479, 0.0530, 1.1974, 0.6467],
[ 0.1524, 2.1820, 1.2043, 1.0184]])
tensor([[-1.4082, -0.7162, -0.6705, 0.5021],
[-1.7972, -0.4961, 1.1505, 0.9610]])
tensor([-0.0675, 0.1562])
tensor([-0.0675, 0.1562])
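
As a sanity check, the same values can be reproduced from the definition: the dot product divided by the product of the L2 norms. The sketch below is illustrative and is not claimed to be the library's internal implementation. The helper name manual_cosine_similarity is hypothetical, and clamping each norm at eps is an assumption about how division by zero is avoided.

```python
import torch

def manual_cosine_similarity(x1, x2, dim=1, eps=1e-8):
    # Hypothetical helper: dot product over the product of the L2 norms.
    dot = (x1 * x2).sum(dim=dim)
    # Assumption: clamp each norm at eps so zero vectors do not divide by zero.
    norm1 = x1.norm(p=2, dim=dim).clamp(min=eps)
    norm2 = x2.norm(p=2, dim=dim).clamp(min=eps)
    return dot / (norm1 * norm2)

torch.manual_seed(0)  # fixed seed so repeated runs give the same inputs
input1 = torch.randn(2, 4)
input2 = torch.randn(2, 4)

manual = manual_cosine_similarity(input1, input2)
builtin = torch.cosine_similarity(input1, input2, dim=1)
print(manual)
print(builtin)
```

For non-degenerate inputs the two printed tensors should agree up to floating-point rounding.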

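Euclidean distance

That is, the pairwise Euclidean distances between the rows of an m×e tensor and the rows of an n×e tensor.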

Theoretical analysis
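
The loop-free variant in the implementation (method 1) relies on expanding the squared distance. A brief derivation follows, with notation that is assumed here rather than taken from the original: x_i is the i-th row of X (shape m×e), y_j is the j-th row of Y (shape n×e), a and b hold the squared row norms of X and Y, and the square root is applied elementwise.

```latex
\[
\lVert x_i - y_j \rVert_2^2
  = \sum_{k=1}^{e} (x_{ik} - y_{jk})^2
  = \lVert x_i \rVert_2^2 + \lVert y_j \rVert_2^2 - 2\, x_i^{\top} y_j
\]
\[
D = \sqrt{\, a\,\mathbf{1}_n^{\top} + \mathbf{1}_m\, b^{\top} - 2\, X Y^{\top} \,},
\qquad a_i = \lVert x_i \rVert_2^2,\quad b_j = \lVert y_j \rVert_2^2
\]
```

The resulting matrix D has shape m×n, with D_{ij} the distance between x_i and y_j.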

Implementation

import torch

def euclidean_dist(x, y):
    """
    Args:
      x: torch.Tensor with shape [m, d]
      y: torch.Tensor with shape [n, d]
    Returns:
      dist: torch.Tensor with shape [m, n]
    """
    m = x.size(0)
    n = y.size(0)
    e = x.size(1)

    # Method 1: expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x . y
    a1 = (x ** 2).sum(1, keepdim=True).expand(-1, n)
    b2 = (y ** 2).sum(1).expand(m, -1)
    # clamp guards against tiny negative values caused by floating-point rounding
    dist = (a1 + b2 - 2 * torch.mm(x, y.T)).clamp(min=0).sqrt()
    # Equivalent: dist = (a1 + b2 - 2 * (x @ y.T)).clamp(min=0).sqrt()
    print(dist)

    # Method 2: broadcast both tensors to [m, n, e] and subtract
    x1 = x.unsqueeze(1).expand(m, n, e)
    y1 = y.expand(m, n, e)
    dist = (x1 - y1).pow(2).sum(2).float().sqrt()
    print(dist)

    # Method 3: double loop over every (row of x, row of y) pair
    dist = torch.zeros((m, n))
    for i, xi in enumerate(x):
        for j, yi in enumerate(y):
            # Method 3.1
            # dist[i][j] = ((xi - yi) ** 2).sum().float().sqrt()
            # Method 3.2
            dist[i][j] = torch.pairwise_distance(torch.unsqueeze(xi, 0), torch.unsqueeze(yi, 0), p=2)
    print(dist)

    # Method 4: single loop; each row of x is broadcast against all rows of y
    dist = torch.zeros((m, n))
    for i, xi in enumerate(x):
        dist[i] = torch.pairwise_distance(xi, y, p=2)
    print(dist)
    return dist

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
b = torch.tensor([[2, 3], [4, 5], [5, 6], [8, 9]])
dist = euclidean_dist(a, b)

tensor([[1.4142, 4.2426, 5.6569, 9.8995],
[1.4142, 1.4142, 2.8284, 7.0711],
[4.2426, 1.4142, 0.0000, 4.2426]])

[A fast, loop-free way to compute Euclidean distances between matrices]
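
PyTorch also ships a built-in for this m×n case, torch.cdist, which is not used in the original article. The sketch below applies it to the same example data. Note that the inputs are written as floats here, since torch.cdist is assumed to require floating-point tensors.

```python
import torch

# Same example data as above, written as floating-point tensors.
a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
b = torch.tensor([[2., 3.], [4., 5.], [5., 6.], [8., 9.]])

# Pairwise L2 distances between the rows of a ([m, d]) and the rows of b ([n, d]).
dist = torch.cdist(a, b, p=2)
print(dist.shape)
print(dist)
```

The result has shape [3, 4] and should match the output printed by euclidean_dist up to rounding.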

Row-wise Euclidean distance between two m×e tensors: pairwise_distance

import torch.nn.functional as F
distance = F.pairwise_distance(rep_a, rep_b, p=2)

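Here rep_a and rep_b have shape [batch_size, hidden_dim]. The two shapes must match, or the first argument may have shape [hidden_dim], in which case it is broadcast automatically (whether inputs are limited to at most two dimensions is unverified).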

[torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)]
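
To make the shapes concrete, here is a minimal sketch with made-up inputs (the tensor values and the name vec are illustrative, not from the original). It shows the functional form, the module form, and broadcasting of a 1-D tensor against every row. Because pairwise_distance adds a small eps to the difference before taking the norm, the distance between identical vectors comes out close to zero rather than exactly zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative inputs: batch_size = 2, hidden_dim = 2.
rep_a = torch.tensor([[1., 2.], [3., 4.]])
rep_b = torch.tensor([[2., 3.], [4., 6.]])

# Functional form: one distance per row pair, shape [batch_size].
row_dist = F.pairwise_distance(rep_a, rep_b, p=2)

# Module form with the same p.
pdist = nn.PairwiseDistance(p=2)
module_dist = pdist(rep_a, rep_b)

# A 1-D tensor of shape [hidden_dim] is broadcast against every row of rep_a.
vec = torch.tensor([1., 2.])
broadcast_dist = F.pairwise_distance(rep_a, vec, p=2)

print(row_dist)
print(module_dist)
print(broadcast_dist)
```

Unlike the m×n distance matrix computed earlier, this returns only the distances between corresponding rows.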

The same pairwise (m×n) computation with NumPy

import numpy as np

def euclidean_dist(a, b):
    '''
    Compute the pairwise Euclidean distances between the row vectors of a and the row vectors of b.
    '''
    a = np.asarray(a)
    b = np.asarray(b)
    # ||a_i||^2 + ||b_j||^2 - 2 * a_i . b_j, followed by an elementwise square root
    dist = np.sqrt(np.sum(a ** 2, 1, keepdims=True).repeat(b.shape[0], axis=1) +
                   np.sum(b ** 2, 1, keepdims=True).repeat(a.shape[0], axis=1).transpose() -
                   a.dot(b.transpose()) * 2)
    return dist

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 3], [4, 5], [5, 6], [8, 9]])
dist = euclidean_dist(a, b)
print(dist)

[[1.41421356 4.24264069 5.65685425 9.89949494]
[1.41421356 1.41421356 2.82842712 7.07106781]
[4.24264069 1.41421356 0. 4.24264069]]
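
An alternative NumPy formulation, sketched here as an assumption rather than as the author's method, lets broadcasting build the [m, n, d] difference tensor directly and then takes the norm over the last axis. The helper name euclidean_dist_broadcast is hypothetical. This version is shorter but uses O(m·n·d) memory, so the expansion above is likely preferable for large inputs.

```python
import numpy as np

def euclidean_dist_broadcast(a, b):
    # Hypothetical helper: pairwise distances via broadcasting.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # a[:, None, :] has shape [m, 1, d]; b[None, :, :] has shape [1, n, d].
    diff = a[:, None, :] - b[None, :, :]   # shape [m, n, d]
    return np.linalg.norm(diff, axis=-1)   # shape [m, n]

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 3], [4, 5], [5, 6], [8, 9]])
dist_broadcast = euclidean_dist_broadcast(a, b)
print(dist_broadcast)
```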

Pairwise Euclidean distances between the vectors within a single tensor

def self_euclidean_dist(embeddings):
    # Method 1 (analogous to method 2 of euclidean_dist(x, y))
    m, e = embeddings.shape
    # t1[i, j] = embeddings[i] and t2[i, j] = embeddings[j], both of shape [m, m, e]
    t1 = embeddings.unsqueeze(1).expand(m, m, e)
    t2 = embeddings.unsqueeze(0).expand(m, m, e)
    dist = (t1 - t2).pow(2).sum(2).float().sqrt()
    print(dist)
    return dist

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
self_euclidean_dist(a)
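
Since the self-distance matrix is symmetric with a zero diagonal, only the upper triangle carries information. As a hedged alternative that the original does not use, torch.nn.functional.pdist returns exactly those m(m-1)/2 values in condensed form. The sketch below also rebuilds the full matrix for comparison; the variable names are illustrative, and float inputs are assumed to be required.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])

# Condensed distances for the row pairs (0, 1), (0, 2), (1, 2).
condensed = F.pdist(a, p=2)
print(condensed)

# Rebuild the full symmetric [m, m] matrix from the condensed form.
m = a.size(0)
rows, cols = torch.triu_indices(m, m, offset=1)
full = torch.zeros(m, m)
full[rows, cols] = condensed
full = full + full.T
print(full)
```

The rebuilt matrix should equal the output of self_euclidean_dist on the same data, up to rounding.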