Computing Jaccard Similarity in Python

weixin_39632693

Published 2020-12-04 14:03:02

Here is a vectorized approach:

import numpy as np

# Get the row, col indices that are to be set in the output array
r, c = np.tril_indices(ndocs, -1)

# Use those indices to slice out the respective columns
p1 = rawdata[:, c]
p2 = rawdata[:, r]

# Perform the n11 and n00 vectorized computations across all indexed columns
n11v = ((p1 == 1) & (p2 == 1)).sum(0)
n00v = ((p1 == 0) & (p2 == 0)).sum(0)

# Finally, set up the output array and fill in the results of the division
out = np.eye(ndocs)
out[c, r] = n11v / (nfeats - n00v)
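To make the indexing concrete, here is a minimal worked sketch on a tiny matrix. The 4x3 `rawdata` below is made up for illustration, and it assumes `nfeats` is the number of rows (features) and `ndocs` the number of columns (documents), which the post does not state explicitly.

```python
import numpy as np

# Illustrative data (not from the post): 4 features (rows) x 3 documents (columns)
rawdata = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [0, 1, 1],
                    [0, 0, 1]])
nfeats, ndocs = rawdata.shape  # assumption: nfeats = number of rows

# Strictly-lower-triangle indices: r = [1, 2, 2], c = [0, 0, 1]
r, c = np.tril_indices(ndocs, -1)

# Column j of p1 and p2 holds the two documents of pair (c[j], r[j])
p1 = rawdata[:, c]
p2 = rawdata[:, r]

# Per-pair counts of rows where both are 1 and where both are 0
n11v = ((p1 == 1) & (p2 == 1)).sum(0)
n00v = ((p1 == 0) & (p2 == 0)).sum(0)

# Writing to out[c, r] fills the upper triangle; the diagonal stays 1
out = np.eye(ndocs)
out[c, r] = n11v / (nfeats - n00v)
print(out)
```

Note that a pair of all-zero columns would make the denominator zero, so real data may need a guard for that case.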

An alternative way to compute n11v and n00v, using np.einsum:

n11v = np.einsum('ij,ij->j', (p1 == 1), (p2 == 1).astype(int))
n00v = np.einsum('ij,ij->j', (p1 == 0), (p2 == 0).astype(int))

If rawdata contains only 0s and 1s, a simpler way is:

n11v = np.einsum('ij,ij->j', p1, p2)
n00v = np.einsum('ij,ij->j', 1 - p1, 1 - p2)
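As a quick sanity check, the sketch below compares the boolean-mask counts with the simplified einsum counts on random binary data. The array sizes and the seed are arbitrary choices for illustration, not values from the post.

```python
import numpy as np

# Arbitrary random 0/1 data: 50 features x 6 documents
rng = np.random.default_rng(0)
rawdata = (rng.random((50, 6)) > 0.5).astype(int)
ndocs = rawdata.shape[1]

r, c = np.tril_indices(ndocs, -1)
p1, p2 = rawdata[:, c], rawdata[:, r]

# Counts via boolean masks
n11_mask = ((p1 == 1) & (p2 == 1)).sum(0)
n00_mask = ((p1 == 0) & (p2 == 0)).sum(0)

# Counts via einsum: elementwise product summed over rows (valid for 0/1 data only)
n11_einsum = np.einsum('ij,ij->j', p1, p2)
n00_einsum = np.einsum('ij,ij->j', 1 - p1, 1 - p2)

counts_match = (np.array_equal(n11_mask, n11_einsum)
                and np.array_equal(n00_mask, n00_einsum))
print(counts_match)
```

The product trick works because `p1 * p2` is 1 only where both entries are 1, and `(1 - p1) * (1 - p2)` is 1 only where both are 0.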

Benchmarking

Function definitions:

import numpy as np

def original_app(rawdata, ndocs, nfeats):
    # Loop-based version; jaccard() is the pairwise function, which is not shown in this post
    tru_sim = np.zeros((ndocs, ndocs))
    for i in range(0, ndocs):
        tru_sim[i, i] = 1
        for j in range(i + 1, ndocs):
            tru_sim[i, j] = jaccard(rawdata[:, i], rawdata[:, j])
    return tru_sim

def vectorized_app(rawdata, ndocs, nfeats):
    # Vectorized version: process all column pairs at once
    r, c = np.tril_indices(ndocs, -1)
    p1 = rawdata[:, c]
    p2 = rawdata[:, r]
    n11v = ((p1 == 1) & (p2 == 1)).sum(0)
    n00v = ((p1 == 0) & (p2 == 0)).sum(0)
    out = np.eye(ndocs)
    out[c, r] = n11v / (nfeats - n00v)
    return out
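Since `jaccard()` itself does not appear in the post, the following self-contained sketch substitutes a hypothetical `jaccard_pair()` helper so that the loop and vectorized versions can be compared end to end. It assumes the score is n11 / (n - n00) with n equal to the number of rows; the helper name, function names, data sizes, and seed are all illustrative.

```python
import numpy as np

def jaccard_pair(x, y):
    """Hypothetical stand-in for the post's jaccard(): n11 / (n - n00)."""
    n11 = np.sum((x == 1) & (y == 1))
    n00 = np.sum((x == 0) & (y == 0))
    return n11 / (len(x) - n00)

def loop_similarity(rawdata):
    """Pairwise loop over columns; fills the upper triangle and diagonal."""
    ndocs = rawdata.shape[1]
    sim = np.eye(ndocs)
    for i in range(ndocs):
        for j in range(i + 1, ndocs):
            sim[i, j] = jaccard_pair(rawdata[:, i], rawdata[:, j])
    return sim

def vectorized_similarity(rawdata):
    """Same result, computed for all column pairs at once."""
    nfeats, ndocs = rawdata.shape
    r, c = np.tril_indices(ndocs, -1)
    p1, p2 = rawdata[:, c], rawdata[:, r]
    n11v = ((p1 == 1) & (p2 == 1)).sum(0)
    n00v = ((p1 == 0) & (p2 == 0)).sum(0)
    out = np.eye(ndocs)
    out[c, r] = n11v / (nfeats - n00v)
    return out

# Arbitrary test data: 200 features x 8 documents, about 80% ones
rng = np.random.default_rng(42)
data = (rng.random((200, 8)) > 0.2).astype(int)
results_agree = np.allclose(loop_similarity(data), vectorized_similarity(data))
print(results_agree)
```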

Verification and timings:

In [6]: # Setup inputs
   ...: rawdata = (np.random.rand(20,10000)>0.2).astype(int)
   ...: rawdata = np.transpose(rawdata)
   ...: ndocs = rawdata.shape[1]
   ...: nwords = rawdata.shape[0]
   ...: nfeats = 5
   ...:

In [7]: # Verify results
   ...: out1 = original_app(rawdata, ndocs, nfeats)
   ...: out2 = vectorized_app(rawdata, ndocs, nfeats)
   ...: print(np.allclose(out1,out2))
   ...:
True

In [8]: %timeit original_app(rawdata, ndocs, nfeats)
1 loops, best of 3: 8.72 s per loop

In [9]: %timeit vectorized_app(rawdata, ndocs, nfeats)
10 loops, best of 3: 27.6 ms per loop

That's a magical 300x+ speedup right there!

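So why is it so much faster? Many factors are involved, the most important being that NumPy arrays are built for performance and optimized for vectorized computations. The proposed approach makes good use of that, which is where the speedup comes from.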
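One consequence worth keeping in mind is that the vectorized version materializes every column pair at once, so it trades memory for speed. The sketch below estimates the size of the intermediate `p1` (or `p2`) block for the benchmark dimensions. The helper name `pair_block_shape` is made up for illustration, and the byte count assumes 8-byte integers, which depends on the platform and NumPy version.

```python
import numpy as np

def pair_block_shape(nwords, ndocs):
    """Shape of p1 (or p2): one column per unordered pair of documents."""
    npairs = ndocs * (ndocs - 1) // 2
    return (nwords, npairs)

# Benchmark dimensions from the post: 10000 rows, 20 documents
shape = pair_block_shape(10000, 20)

# Cross-check the pair count against np.tril_indices
r, c = np.tril_indices(20, -1)
npairs_from_indices = len(r)

# Assumption: 8 bytes per element (e.g. int64)
approx_bytes = shape[0] * shape[1] * 8
print(shape, npairs_from_indices, approx_bytes)
```

Because the number of pairs grows quadratically with the number of documents, very large inputs may need to be processed in chunks of pairs.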

Here's a related Q&A that discusses these performance criteria in detail.
