Cosine similarity: CosineSimilarity
torch.nn.CosineSimilarity(dim=1, eps=1e-08)
Returns cosine similarity between x1 and x2, computed along dim.
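For reference, the quantity being computed can be written out as below. This is the standard definition of cosine similarity; how eps enters the denominator is an assumption here, since it has been applied slightly differently across PyTorch versions (to the product of the norms, or to each norm separately).

$$
\text{similarity} = \frac{x_1 \cdot x_2}{\max\left(\lVert x_1 \rVert_2 \, \lVert x_2 \rVert_2,\ \epsilon\right)}
$$

The result lies in [-1, 1], and eps only matters when one of the vectors is (nearly) zero.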
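Implementation:
import torch
import torch.nn as nn

input1 = torch.randn(2, 4)
input2 = torch.randn(2, 4)
print(input1)
print(input2)
# Method 1: module form
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
cos_similarity = cos(input1, input2)
print(cos_similarity)
# Method 2: functional form
cos_similarity = torch.cosine_similarity(input1, input2, dim=1)
print(cos_similarity)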
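Example output (input1, input2, then the result of each method; the values differ between runs because the inputs are random):
tensor([[-0.2479,  0.0530,  1.1974,  0.6467],
        [ 0.1524,  2.1820,  1.2043,  1.0184]])
tensor([[-1.4082, -0.7162, -0.6705,  0.5021],
        [-1.7972, -0.4961,  1.1505,  0.9610]])
tensor([-0.0675,  0.1562])
tensor([-0.0675,  0.1562])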
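As a sanity check on what the two calls compute, the same value can be reproduced by hand: a row-wise dot product divided by the product of the row norms. This is a minimal sketch; the fixed seed and the names dot, norms, manual and builtin are illustrative and not part of the original post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed only so the comparison is repeatable
input1 = torch.randn(2, 4)
input2 = torch.randn(2, 4)

# Manual cosine similarity along dim=1: dot product over the product of L2 norms.
dot = (input1 * input2).sum(dim=1)
norms = input1.norm(dim=1) * input2.norm(dim=1)
manual = dot / norms.clamp_min(1e-8)  # clamp mirrors the role of eps

builtin = F.cosine_similarity(input1, input2, dim=1)
matches = torch.allclose(manual, builtin, atol=1e-6)
print(manual, builtin, matches)
```

Both tensors have shape [2], one similarity per row, because the reduction runs over dim=1.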
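Euclidean distance
That is, the Euclidean distance between every row of an m×e tensor and every row of an n×e tensor, which gives an m×n matrix.
Theoretical analysis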
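The formula that originally appeared here did not survive, so the following is a reconstruction of the standard identity rather than the author's own notation. For a row $x_i$ of x and a row $y_j$ of y (both of length e), the squared distance expands as:

$$
d_{ij}^2 = \lVert x_i - y_j \rVert_2^2 = \lVert x_i \rVert_2^2 + \lVert y_j \rVert_2^2 - 2\, x_i^\top y_j
$$

Collecting all pairs into an m×n matrix gives

$$
D^2 = a\,\mathbf{1}_n^\top + \mathbf{1}_m\, b^\top - 2\, X Y^\top, \qquad a_i = \lVert x_i \rVert_2^2,\quad b_j = \lVert y_j \rVert_2^2
$$

which is what the terms a1, b2 and torch.mm(x, y.T) compute in method 1 of the implementation.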

Implementation
import torch

def euclidean_dist(x, y):
    """
    Args:
        x: torch.Tensor with shape [m, e]
        y: torch.Tensor with shape [n, e]
    Returns:
        dist: torch.Tensor with shape [m, n]
    """
    # Work in floating point so that sqrt and pairwise_distance accept integer inputs
    x, y = x.float(), y.float()
    m, n, e = x.size(0), y.size(0), x.size(1)

    # Method 1: expand ||x||^2 + ||y||^2 - 2 * x @ y.T
    a1 = (x ** 2).sum(1, keepdim=True).expand(-1, n)
    b2 = (y ** 2).sum(1).expand(m, -1)
    # clamp guards against tiny negative values caused by rounding
    dist = (a1 + b2 - 2 * torch.mm(x, y.T)).clamp(min=0).sqrt()
    # equivalently: dist = (a1 + b2 - 2 * (x @ y.T)).clamp(min=0).sqrt()
    print(dist)

    # Method 2: broadcast both tensors to [m, n, e] and subtract
    x1 = x.unsqueeze(1).expand(m, n, e)
    y1 = y.expand(m, n, e)
    dist = (x1 - y1).pow(2).sum(2).sqrt()
    print(dist)

    # Method 3: explicit double loop over all pairs
    dist = torch.zeros((m, n))
    for i, xi in enumerate(x):
        for j, yi in enumerate(y):
            # Method 3.1
            # dist[i][j] = ((xi - yi) ** 2).sum().sqrt()
            # Method 3.2
            dist[i][j] = torch.pairwise_distance(torch.unsqueeze(xi, 0), torch.unsqueeze(yi, 0), p=2)
    print(dist)

    # Method 4: one row of x against all rows of y
    dist = torch.zeros((m, n))
    for i, xi in enumerate(x):
        dist[i] = torch.pairwise_distance(xi, y, p=2)
    print(dist)
    return dist

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
b = torch.tensor([[2, 3], [4, 5], [5, 6], [8, 9]])
dist = euclidean_dist(a, b)
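Output of methods 1 and 2 (methods 3 and 4 go through pairwise_distance, which adds eps=1e-6 to the difference before taking the norm, so their zero entry comes out as roughly 1.4e-06 instead):
tensor([[1.4142, 4.2426, 5.6569, 9.8995],
        [1.4142, 1.4142, 2.8284, 7.0711],
        [4.2426, 1.4142, 0.0000, 4.2426]])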
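PyTorch also ships torch.cdist, which returns the same m×n distance matrix in a single call. The original post does not use it; the sketch below only cross-checks it against a broadcasting version of method 2, with float inputs since torch.cdist expects floating-point tensors. The names manual and builtin are illustrative.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
b = torch.tensor([[2., 3.], [4., 5.], [5., 6.], [8., 9.]])

# Broadcasting version: [m, 1, e] - [1, n, e] -> [m, n, e], then reduce over e.
manual = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(dim=2).sqrt()

# Built-in pairwise distance matrix, shape [m, n].
builtin = torch.cdist(a, b, p=2)

print(torch.allclose(manual, builtin, atol=1e-5))
```

For large inputs the broadcasting version materializes an [m, n, e] tensor, so the expansion used in method 1 or torch.cdist is usually lighter on memory.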
Row-wise Euclidean distance between two m×e tensors: pairwise_distance
import torch.nn.functional as F
distance = F.pairwise_distance(rep_a, rep_b, p=2)
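Here rep_a and rep_b both have shape [batch_size, hidden_dim]. The two shapes must match, or one input can have shape [hidden_dim], in which case it is broadcast automatically. (Inputs may be limited to at most two dimensions; this has not been verified.)
Module form: torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)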
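A small worked example may help here, since rep_a and rep_b are not defined in the snippet above. The tensors below are made-up values, and d_ref is an illustrative reference computed without eps so that the effect of eps is visible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

rep_a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
rep_b = torch.tensor([[2., 3.], [4., 5.], [5., 6.]])

# Functional form: one distance per row pair, shape [batch_size].
d_func = F.pairwise_distance(rep_a, rep_b, p=2)

# Module form with the same settings.
d_mod = nn.PairwiseDistance(p=2.0)(rep_a, rep_b)

# Reference without eps: plain L2 norm of the row-wise difference.
d_ref = (rep_a - rep_b).norm(dim=1)

print(d_func, d_mod, d_ref)
```

Unlike the m×n matrix from euclidean_dist, this returns only the distances between matching rows. The last pair is identical, so d_ref is exactly 0 there while the pairwise_distance results are a tiny positive number because of eps.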
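The same operation with NumPy
import numpy as np

def euclidean_dist(a, b):
    '''
    Compute the Euclidean distance between every vector in a and every vector in b.
    '''
    a = np.asarray(a)
    b = np.asarray(b)
    # ||a||^2 tiled to [m, n], plus ||b||^2 tiled to [m, n], minus twice the dot products
    sq = (np.sum(a ** 2, 1, keepdims=True).repeat(b.shape[0], axis=1) +
          np.sum(b ** 2, 1, keepdims=True).repeat(a.shape[0], axis=1).transpose() -
          a.dot(b.transpose()) * 2)
    # clip tiny negative values caused by rounding before taking the square root
    dist = np.sqrt(np.maximum(sq, 0))
    return dist

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 3], [4, 5], [5, 6], [8, 9]])
dist = euclidean_dist(a, b)
print(dist)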
Output:
[[1.41421356 4.24264069 5.65685425 9.89949494]
 [1.41421356 1.41421356 2.82842712 7.07106781]
 [4.24264069 1.41421356 0.         4.24264069]]
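The repeat and transpose calls can be replaced by NumPy broadcasting. The sketch below is an alternative formulation rather than the author's code: it compares a direct version (subtract every pair of rows, then take the norm) with the expanded squared-norm version. The names direct, sq and expanded are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
b = np.array([[2, 3], [4, 5], [5, 6], [8, 9]], dtype=float)

# Direct form: [m, 1, e] - [1, n, e] -> [m, n, e], then the norm over the last axis.
direct = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# Expanded form: ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 before the square root.
sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
expanded = np.sqrt(np.maximum(sq, 0))

print(np.allclose(direct, expanded))
```

The direct form is easier to read but allocates an [m, n, e] intermediate; the expanded form needs only [m, n] arrays.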
Pairwise Euclidean distances between the vectors inside a single tensor
def self_euclidean_dist(embeddings):
    # Method 1 (analogous to method 2 of euclidean_dist(x, y))
    m, e = embeddings.shape[0], embeddings.shape[1]
    t1 = embeddings.unsqueeze(1).expand(m, m, e)
    t2 = embeddings.unsqueeze(0).expand(m, m, e)
    dist = (t1 - t2).pow(2).sum(2).float().sqrt()
    print(dist)
    return dist

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
self_euclidean_dist(a)
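The resulting m×m matrix is symmetric with a zero diagonal, so each distance appears twice. If only the unique pairs are needed, one option is to read off the strict upper triangle, as in this sketch. The use of torch.triu_indices and the names emb, dist, rows, cols and unique are illustrative additions, not part of the original.

```python
import torch

emb = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
m = emb.size(0)

# Same computation as self_euclidean_dist, written with implicit broadcasting.
dist = (emb.unsqueeze(1) - emb.unsqueeze(0)).pow(2).sum(dim=2).sqrt()

# Indices of the strict upper triangle: pairs (i, j) with i < j.
rows, cols = torch.triu_indices(m, m, offset=1)
unique = dist[rows, cols]  # distances for (0, 1), (0, 2), (1, 2)

print(dist)
print(unique)
```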
Other common vector similarity measures and their implementations
2. DotProductSimilarity
3. ProjectedDotProductSimilarity
4. BiLinearSimilarity nn.Bilinear
5. TriLinearSimilarity
6. MultiHeadedSimilarity nn.MultiheadAttention
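The list above gives names only. As a rough illustration of two of them, the sketch below computes a plain dot-product similarity and a bilinear similarity using nn.Bilinear. It shows the general idea under assumed shapes and made-up input values; it is not a reproduction of any particular library's DotProductSimilarity or BiLinearSimilarity class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixes the random initial weights of nn.Bilinear
u = torch.tensor([[1., 0., 2.], [0., 1., 1.]])
v = torch.tensor([[3., 1., 0.], [2., 2., 2.]])

# Dot-product similarity: row-wise inner product, shape [batch].
dot_sim = (u * v).sum(dim=-1)

# Bilinear similarity u^T W v + bias with a learnable W; output [batch, 1] before squeeze.
bilinear = nn.Bilinear(in1_features=3, in2_features=3, out_features=1)
bi_sim = bilinear(u, v).squeeze(-1)

print(dot_sim)
print(bi_sim.shape)
```

The dot product has no parameters and is sensitive to vector length, whereas the bilinear form learns a weight matrix during training, so its untrained output here is arbitrary.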
from: -柚子皮-
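This article showed how to compute cosine similarity and Euclidean distance in PyTorch. Cosine similarity measures the cosine of the angle between two vectors, while Euclidean distance is the standard straight-line distance between two points. The code examples covered nn.CosineSimilarity and torch.cosine_similarity, as well as four ways to implement Euclidean distance by hand. Other vector similarity measures, such as dot-product similarity, were also mentioned.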