Python: How to Generate a Pairwise Hamming Distance Matrix

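This post explains how to use the numpy library to compute the pairwise Hamming distance matrix between the rows of an input matrix without loops, using vectorization. In the code sample, the author tries to write a function `compute_HammingDistance`, but it returns a scalar instead of a matrix. The solution uses numpy broadcasting: `(arr[:,None,:]!=arr).sum(2)` computes the Hamming distances and yields the expected matrix.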


I'm a beginner with Python. I'm having trouble calculating the pairwise Hamming distance matrix between the rows of a binary input matrix using only the numpy library. I'm supposed to avoid loops and use vectorization. For instance, if I have something like:

[ 1, 0, 0, 1, 1, 0]
[ 1, 0, 0, 0, 0, 0]
[ 1, 1, 1, 1, 0, 0]

The matrix should be something like:

[ 0, 2, 3]
[ 2, 0, 3]
[ 3, 3, 0]

That is, if the original matrix is A and the Hamming distance matrix is B, then B[0,1] = hammingdistance(A[0], A[1]). In this case the answer is 2, since the two rows differ in exactly two positions.
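To make the definition concrete, one entry of B can be checked directly. This is only a sketch: it assumes the example rows are stored in a NumPy array named A (the name used in the description), and it checks a single pair of rows rather than building the whole matrix.

```python
import numpy as np

# Example input from the question, stored as a 2-D array (assumed name: A).
A = np.array([[1, 0, 0, 1, 1, 0],
              [1, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 0, 0]])

# Hamming distance between two rows = number of positions where they differ.
mismatch = A[0] != A[1]            # boolean vector, True where the rows differ
b_01 = np.count_nonzero(mismatch)  # count the True entries

print(mismatch)
print(b_01)  # 2, matching B[0,1] in the expected matrix
```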

So far my code is something like this:

def compute_HammingDistance(X):
    hammingDistanceMatrix = np.zeros(shape = (len(X), len(X)))
    hammingDistanceMatrix = np.count_nonzero((X[:,:,None] != X[:,:,None].T))
    return hammingDistanceMatrix

However, it just returns a scalar value instead of the intended matrix. I know I'm probably doing something wrong with the array broadcasting, but I can't figure out how to fix it. I've tried np.sum instead of np.count_nonzero, but both gave me much the same result.
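The scalar can be reproduced and explained with a short check. The sketch below assumes the example matrix is passed as a NumPy array X; the names diff, total and per_pair are introduced here for illustration only. Called without an axis argument, np.count_nonzero counts over the entire comparison array, so a single number comes back.

```python
import numpy as np

# Example input from the question (assumed to be what X holds).
X = np.array([[1, 0, 0, 1, 1, 0],
              [1, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 0, 0]])

# Same comparison as in the function above: shapes (3, 6, 1) and (1, 6, 3)
# broadcast to (3, 6, 3), with diff[i, k, j] = (X[i, k] != X[j, k]).
diff = X[:, :, None] != X[:, :, None].T
print(diff.shape)   # (3, 6, 3)

# No axis given: every True in the 3-D array is counted, giving one scalar.
total = np.count_nonzero(diff)
print(total)        # 16, the sum of all entries of the expected matrix

# Counting only along axis 1 (the column index k) keeps one value per row pair.
per_pair = np.count_nonzero(diff, axis=1)
print(per_pair)
```

In this sketch the comparison array already holds every pairwise mismatch; only the reduction collapses too many axes. Restricting the count to axis 1 leaves one count per pair of rows.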

Solution

Try this approach: insert a new axis at position 1, let broadcasting compare every row with every other row, and then count the True (non-zero) values with sum:

(arr[:, None, :] != arr).sum(2)
# array([[0, 2, 3],
#        [2, 0, 3],
#        [3, 3, 0]])

def compute_HammingDistance(X):
    return (X[:, None, :] != X).sum(2)

Explanation:

1) Create a 3d array with shape (3, 1, 6):

arr[:, None, :]
# array([[[1, 0, 0, 1, 1, 0]],
#        [[1, 0, 0, 0, 0, 0]],
#        [[1, 1, 1, 1, 0, 0]]])

2) This is the original 2d array, with shape (3, 6):

arr
# array([[1, 0, 0, 1, 1, 0],
#        [1, 0, 0, 0, 0, 0],
#        [1, 1, 1, 1, 0, 0]])

3) Comparing the two triggers broadcasting, since their shapes don't match. The 2d array arr, of shape (3, 6), is first treated as shape (1, 3, 6) and repeated along axis 0 of the 3d array arr[:, None, :]; then each (1, 6) slice of the 3d array is broadcast against the (3, 6) array. Together, the two broadcasting steps compare every row of the original array with every row, itself included, giving a boolean array of shape (3, 3, 6):

arr[:, None, :] != arr
# array([[[False, False, False, False, False, False],
#         [False, False, False,  True,  True, False],
#         [False,  True,  True, False,  True, False]],
#
#        [[False, False, False,  True,  True, False],
#         [False, False, False, False, False, False],
#         [False,  True,  True,  True, False, False]],
#
#        [[False,  True,  True, False,  True, False],
#         [False,  True,  True,  True, False, False],
#         [False, False, False, False, False, False]]], dtype=bool)

4) The sum along the third axis (axis 2) counts how many elements are not equal, i.e., the True values, which gives the Hamming distance.
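The four steps can be traced by checking the shape at each stage. The sketch below assumes the example matrix from the question. The loop-based helper hamming_reference is hypothetical, added only to cross-check the vectorized result; it is not part of the solution, since the task asks to avoid loops.

```python
import numpy as np

arr = np.array([[1, 0, 0, 1, 1, 0],
                [1, 0, 0, 0, 0, 0],
                [1, 1, 1, 1, 0, 0]])

def compute_HammingDistance(X):
    # (n, 1, m) compared with (n, m) broadcasts to (n, n, m); summing over
    # the last axis counts the mismatches for each pair of rows.
    return (X[:, None, :] != X).sum(2)

def hamming_reference(X):
    # Hypothetical loop-based check, used here only to verify the result.
    n = len(X)
    out = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            out[i, j] = np.count_nonzero(X[i] != X[j])
    return out

expanded = arr[:, None, :]   # step 1: shape (3, 1, 6)
diff = expanded != arr       # step 3: shape (3, 3, 6)
dist = diff.sum(2)           # step 4: shape (3, 3)

print(expanded.shape, arr.shape, diff.shape, dist.shape)
print(dist)
print(np.array_equal(dist, hamming_reference(arr)))  # True
```

One consequence of this approach is that the intermediate boolean array holds n * n * m elements for an input of n rows and m columns, so memory use grows quadratically with the number of rows.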
