Handling Duplicate Tensors with the unique Functions of torch and numpy

Article link: https://blog.youkuaiyun.com/moshuilangting/article/details/121495761

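Both torch and numpy provide a unique function. It removes duplicates from a tensor or array: duplicate scalar elements by default, or duplicate sub-tensors (for example rows) when dim/axis is given.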

torch

torch.unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None)

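Two optional flags are worth noting: return_inverse and return_counts. With return_inverse=True, the function also returns, for every element of input, its index in the output tensor. With return_counts=True, it also returns how many times each unique value occurs in input.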

output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long))
#output:tensor([ 1,  2,  3])  (sorted, because sorted=True is the default)

output, inverse_indices = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long), sorted=True, return_inverse=True)
#output:tensor([ 1,  2,  3]) inverse_indices:tensor([ 0,  2,  1,  2])

output, inverse_indices = torch.unique(torch.tensor([[1, 3], [2, 3]], dtype=torch.long), sorted=True, return_inverse=True)
#output:tensor([ 1,  2,  3])  inverse_indices: tensor([[ 0,  2],[ 1,  2]])
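
The examples above do not exercise return_counts. Here is a minimal sketch, assuming a CPU tensor with the same values as above. It requests both optional outputs and shows that indexing the unique values with the inverse indices rebuilds the input:

```python
import torch

t = torch.tensor([1, 3, 2, 3], dtype=torch.long)

# Request both optional outputs; they come back in the order
# (output, inverse_indices, counts).
output, inverse_indices, counts = torch.unique(
    t, sorted=True, return_inverse=True, return_counts=True
)

# counts[i] is the number of times output[i] occurs in t.
print(output)  # tensor([1, 2, 3])
print(counts)  # tensor([1, 1, 2])

# Indexing the unique values with the inverse indices rebuilds the input.
reconstructed = output[inverse_indices]
print(torch.equal(reconstructed, t))  # True
```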

numpy

numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)

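numpy's unique has one extra flag, return_index. With return_index=True it also returns, for each unique value, the index of its first occurrence in the original array (that is, the position of the element that was kept).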

a = np.array(['a', 'b', 'b', 'c', 'a'])
u, indices = np.unique(a, return_index=True)
# u:array(['a', 'b', 'c'], dtype='<U1')
# indices:array([0, 1, 3])

a[indices]
# array(['a', 'b', 'c'], dtype='<U1')
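
The same flag also works on whole rows when axis=0 is passed. The following is a small sketch with made-up data (the array `rows` is an assumption for illustration, not from the original post):

```python
import numpy as np

# Hypothetical 2-D data: rows 0 and 2 are equal, as are rows 1 and 3.
rows = np.array([[1, 7], [7, 4], [1, 7], [7, 4], [4, 0]])

# axis=0 treats each row as one item; the unique rows come back
# in lexicographic order.
u, first_idx, counts = np.unique(
    rows, axis=0, return_index=True, return_counts=True
)

print(u)          # [[1 7] [4 0] [7 4]]
print(first_idx)  # [0 4 1]  first position of each unique row in `rows`
print(counts)     # [2 1 2]  how often each unique row occurs
```

As in the 1-D case, `rows[first_idx]` gives back exactly the unique rows.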

return_index turns out to be very useful.

Example:

Suppose we have the following torch tensor:

x:
tensor([[ 1.00000,  7.00000, 30.00000,  0.44995],
        [ 7.00000,  4.00000, 30.00000,  0.30664],
        [ 7.00000,  4.00000, 30.00000,  0.30127],
        [ 7.00000,  0.00000, 31.00000,  0.28174],
        [ 7.00000,  0.00000, 38.00000,  0.27441],
        [ 7.00000,  1.00000, 31.00000,  0.25757],
        [ 7.00000,  4.00000, 30.00000,  0.24072],
        [ 1.00000,  7.00000, 30.00000,  0.23401],
        [ 1.00000,  4.00000, 30.00000,  0.23096],
        [ 4.00000,  0.00000, 31.00000,  0.22205],
        [ 7.00000,  0.00000, 31.00000,  0.21472],
        [ 7.00000,  4.00000, 30.00000,  0.20691]], device='cuda:0')

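The goal is to deduplicate the rows by their first three columns: among all rows that share the same first three columns, keep only the one with the largest value in the fourth column.

First, sort the rows by the fourth column in descending order: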

x = x[x[:, 3].argsort(descending=True)]

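torch.unique with dim=0 then yields the distinct combinations of the first three columns, and return_inverse=True tells us which combination each row of x belongs to: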

# _x holds the unique key rows; _ maps each row of x to its row index in _x
_x, _ = torch.unique(x[:, :3], dim=0, return_inverse=True)
'''
_x:
tensor([ 1.,  4., 30.], device='cuda:0')
tensor([ 1.,  7., 30.], device='cuda:0')
tensor([ 4.,  0., 31.], device='cuda:0')
tensor([ 7.,  0., 31.], device='cuda:0')
tensor([ 7.,  0., 38.], device='cuda:0')
tensor([ 7.,  1., 31.], device='cuda:0')
tensor([ 7.,  4., 30.], device='cuda:0')
_:
tensor([1, 6, 6, 3, 4, 5, 6, 1, 0, 2, 3, 6], device='cuda:0')
'''

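Convert _ to a numpy array, then apply numpy's unique with return_index=True. Because x is sorted by the fourth column in descending order, the first occurrence of each group is the row with the largest fourth-column value: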

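# Move the inverse indices to host memory as a numpy array
_ = _.cpu().numpy()
# return_index gives the first position at which each group id appears
vals, idx_start = np.unique(_, return_index=True)
# idx_start:[8 0 9 3 4 5 1]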

Select the rows of x at the positions in idx_start:

x_unique = x[torch.tensor(idx_start)]
'''
x_unique:
tensor([ 1.00000,  4.00000, 30.00000,  0.23096], device='cuda:0')
tensor([ 1.00000,  7.00000, 30.00000,  0.44995], device='cuda:0')
tensor([ 4.00000,  0.00000, 31.00000,  0.22205], device='cuda:0')
tensor([ 7.00000,  0.00000, 31.00000,  0.28174], device='cuda:0')
tensor([ 7.00000,  0.00000, 38.00000,  0.27441], device='cuda:0')
tensor([ 7.00000,  1.00000, 31.00000,  0.25757], device='cuda:0')
tensor([ 7.00000,  4.00000, 30.00000,  0.30664], device='cuda:0')
'''
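
The steps above can be collected into one function. This is a minimal sketch rather than a definitive implementation: the function name `dedup_keep_max` and its parameters are hypothetical, and the sample tensor is a CPU subset of the rows shown earlier so that it runs without a GPU.

```python
import numpy as np
import torch


def dedup_keep_max(x: torch.Tensor, key_cols: int = 3, score_col: int = 3) -> torch.Tensor:
    """For each distinct x[:, :key_cols], keep the row with the largest x[:, score_col]."""
    # Sort rows by score, highest first, so that the first occurrence
    # of every key is its best-scoring row.
    x = x[x[:, score_col].argsort(descending=True)]
    # inverse[i] is the group id (row index into the unique keys) of row i.
    _, inverse = torch.unique(x[:, :key_cols], dim=0, return_inverse=True)
    # np.unique reports the first position of each group id.
    _, idx_start = np.unique(inverse.cpu().numpy(), return_index=True)
    return x[torch.from_numpy(idx_start).to(x.device)]


# A few rows taken from the tensor above, deliberately unsorted.
x = torch.tensor([
    [7.0, 4.0, 30.0, 0.30127],
    [1.0, 7.0, 30.0, 0.44995],
    [7.0, 4.0, 30.0, 0.30664],
    [1.0, 7.0, 30.0, 0.23401],
    [4.0, 0.0, 31.0, 0.22205],
])
result = dedup_keep_max(x)
print(result)
```

The result has one row per distinct key, ordered by key, each carrying the highest score seen for that key.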

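This meets the requirement. Alternatively, x could be converted to numpy at the very start and np.unique applied directly to its first three columns with axis=0.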
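
A sketch of that numpy-only route, under the assumption that the data fits comfortably in host memory (for a CUDA tensor, `x.cpu().numpy()` would be the starting point). The array `x_np` reuses a few rows from the example above:

```python
import numpy as np

# A few rows taken from the example tensor, deliberately unsorted.
x_np = np.array([
    [7.0, 4.0, 30.0, 0.30127],
    [1.0, 7.0, 30.0, 0.44995],
    [7.0, 4.0, 30.0, 0.30664],
    [1.0, 7.0, 30.0, 0.23401],
    [4.0, 0.0, 31.0, 0.22205],
], dtype=np.float32)

# Sort rows by the fourth column, largest first.
x_sorted = x_np[np.argsort(-x_np[:, 3])]

# Unique rows of the first three columns; return_index gives the first
# (and therefore highest-scoring) row of each group.
_, idx_start = np.unique(x_sorted[:, :3], axis=0, return_index=True)

x_unique = x_sorted[idx_start]
print(x_unique)
```

This needs only one unique call, since return_index with axis=0 already points at the first row of every group.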