Here's a vectorized approach:
import numpy as np

# Get the row, col indices that are to be set in the output array
r,c = np.tril_indices(ndocs,-1)
# Use those indices to slice out the respective columns
p1 = rawdata[:,c]
p2 = rawdata[:,r]
# Perform n11 and n00 vectorized computations across all indexed columns
n11v = ((p1==1) & (p2==1)).sum(0)
n00v = ((p1==0) & (p2==0)).sum(0)
# Finally, set up the output array and assign the final division results
out = np.eye(ndocs)
out[c,r] = n11v / (nfeats-n00v)
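To make the indexing concrete, here is a small self-contained sketch that runs the same steps on a made-up 4x3 binary matrix and checks the result against a plain double loop. The toy data and the helper `jaccard_pair` are assumptions for illustration only, and `nfeats` is taken here to be the number of rows, which may differ from how the question defines it.

```python
import numpy as np

# Toy binary data (assumed for illustration): 4 features (rows) x 3 documents (columns)
rawdata = np.array([[1, 0, 1],
                    [1, 1, 0],
                    [0, 0, 0],
                    [1, 1, 1]])
nfeats, ndocs = rawdata.shape

def jaccard_pair(c1, c2):
    # Hypothetical reference: shared ones / (features that are not jointly zero)
    n11 = np.sum((c1 == 1) & (c2 == 1))
    n00 = np.sum((c1 == 0) & (c2 == 0))
    return n11 / (nfeats - n00)

# Strictly-lower-triangle indices: r = [1, 2, 2], c = [0, 0, 1]
r, c = np.tril_indices(ndocs, -1)
p1, p2 = rawdata[:, c], rawdata[:, r]
n11v = ((p1 == 1) & (p2 == 1)).sum(0)
n00v = ((p1 == 0) & (p2 == 0)).sum(0)

out = np.eye(ndocs)
out[c, r] = n11v / (nfeats - n00v)  # c < r, so this fills the upper triangle

# Loop-based reference over the same upper-triangle pairs
ref = np.eye(ndocs)
for i in range(ndocs):
    for j in range(i + 1, ndocs):
        ref[i, j] = jaccard_pair(rawdata[:, i], rawdata[:, j])

print(np.allclose(out, ref))
```

Because `np.tril_indices(ndocs, -1)` returns row indices larger than column indices, assigning to `out[c, r]` rather than `out[r, c]` is what places the values above the diagonal, matching the loop version.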
An alternative way to compute n11v and n00v, using np.einsum:
n11v = np.einsum('ij,ij->j',(p1==1),(p2==1).astype(int))
n00v = np.einsum('ij,ij->j',(p1==0),(p2==0).astype(int))
If rawdata consists of 0s and 1s only, a simpler way to get them would be:
n11v = np.einsum('ij,ij->j',p1,p2)
n00v = np.einsum('ij,ij->j',1-p1,1-p2)
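As a quick sanity check, the sketch below compares the mask-and-sum version with the `np.einsum` version on random 0/1 data. The arrays `p1` and `p2` are generated here purely for illustration; in the answer they come from slicing `rawdata`.

```python
import numpy as np

# Random 0/1 arrays standing in for the sliced columns (assumed data)
rng = np.random.default_rng(0)
p1 = rng.integers(0, 2, size=(6, 5))
p2 = rng.integers(0, 2, size=(6, 5))

# Mask-based counts per column
n11_mask = ((p1 == 1) & (p2 == 1)).sum(0)
n00_mask = ((p1 == 0) & (p2 == 0)).sum(0)

# einsum multiplies elementwise and sums over the row axis i
n11_es = np.einsum('ij,ij->j', p1, p2)
n00_es = np.einsum('ij,ij->j', 1 - p1, 1 - p2)

print(np.array_equal(n11_mask, n11_es), np.array_equal(n00_mask, n00_es))
```

The shortcut relies on the product of two 0/1 values being 1 only when both are 1, so it would not hold if `rawdata` contained other values.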
Benchmarking
Function definitions:
def original_app(rawdata, ndocs, nfeats):
    tru_sim = np.zeros((ndocs,ndocs))
    for i in range(0,ndocs):
        tru_sim[i,i]=1
        for j in range(i+1,ndocs):
            tru_sim[i,j] = jaccard(rawdata[:,i],rawdata[:,j])
    return tru_sim

def vectorized_app(rawdata, ndocs, nfeats):
    r,c = np.tril_indices(ndocs,-1)
    p1 = rawdata[:,c]
    p2 = rawdata[:,r]
    n11v = ((p1==1) & (p2==1)).sum(0)
    n00v = ((p1==0) & (p2==0)).sum(0)
    out = np.eye(ndocs)
    out[c,r] = n11v / (nfeats-n00v)
    return out
Verification and timings:
In [6]: # Setup inputs
...: rawdata = (np.random.rand(20,10000)>0.2).astype(int)
...: rawdata = np.transpose(rawdata)
...: ndocs = rawdata.shape[1]
...: nwords = rawdata.shape[0]
...: nfeats = 5
...:
In [7]: # Verify results
...: out1 = original_app(rawdata, ndocs, nfeats)
...: out2 = vectorized_app(rawdata, ndocs, nfeats)
...: print np.allclose(out1,out2)
...:
True
In [8]: %timeit original_app(rawdata, ndocs, nfeats)
1 loops, best of 3: 8.72 s per loop
In [9]: %timeit vectorized_app(rawdata, ndocs, nfeats)
10 loops, best of 3: 27.6 ms per loop
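That's a 300x+ speedup right there!
So, why is it so fast? Well, many factors are involved, the most important being that NumPy arrays are built for performance and optimized for vectorized computations. The proposed approach makes good use of that, which is where the speedup comes from.
Here's a related Q&A that discusses these performance criteria in detail.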
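One way to see where the time goes, and what the trade-off is: the loop version makes one Python-level function call per document pair, while the vectorized version gathers every pair into two wide arrays and reduces them with a handful of NumPy calls. The sketch below only inspects shapes, using the sizes from the benchmark setup; the zero-filled `rawdata` is a placeholder, not real data.

```python
import numpy as np

ndocs, nwords = 20, 10000                # sizes taken from the benchmark setup
r, c = np.tril_indices(ndocs, -1)
npairs = r.size                          # ndocs*(ndocs-1)/2 document pairs

# Placeholder data: only the shapes matter for this sketch
rawdata = np.zeros((nwords, ndocs), dtype=int)
p1 = rawdata[:, c]                       # one gathered column per pair

print(npairs, p1.shape)
```

The gathered arrays grow quadratically with `ndocs`, so for a large number of documents memory use may become the limiting factor of this approach.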