scipy.sparse.hstack blocks must be 2-D

Latest recommended article published 2024-07-23 00:08:13

License: CC 4.0 BY-SA



Merging two tables with scipy.sparse.hstack() raised an error.

It turned out to be a real pitfall.

I found this answer on Stack Overflow.

# Abridged from the SciPy source quoted in the answer
def hstack(blocks, ...):
    return bmat([blocks], ...)

def bmat(blocks, ...):
    blocks = np.asarray(blocks, dtype='object')
    if blocks.ndim != 2:
        raise ValueError('blocks must be 2-D')
    # ... (continues)

# @hpaulj
# https://stackoverflow.com/questions/31900567/scipy-sparse-hstack1-2-valueerror-blocks-must-be-2-d-why

As you can see, the argument is first converted to an np.ndarray, and then the code checks whether the result is 2-D.
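To see what that check actually receives, here is a minimal sketch that mirrors the two quoted lines. The helper name blocks_ndim is made up for illustration and is not part of SciPy; it only assumes, as the quoted source shows, that hstack wraps the blocks in one more list before the conversion.

```python
import numpy as np
from scipy.sparse import coo_matrix

dense = [np.array([[4], [5], [6]]), np.array([[1], [2], [3]])]
sparse = [coo_matrix(d) for d in dense]

def blocks_ndim(blocks):
    # Hypothetical helper: hstack passes [blocks] to bmat,
    # and bmat converts that to an object array before checking ndim.
    return np.asarray([blocks], dtype='object').ndim

print(blocks_ndim(sparse))  # 2: one row holding two sparse blocks
print(blocks_ndim(dense))   # 4: the dense arrays were unpacked into extra axes
```

Only the sparse case passes the `ndim != 2` check.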

A test:

import numpy as np
from scipy.sparse import coo_matrix, hstack

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])
print(aa.shape)    # (3, 1)
print(bb.shape)    # (3, 1)

A = coo_matrix(aa)
B = coo_matrix(bb)
print(A.shape)    # (3, 1)
print(B.shape)    # (3, 1)

# After converting to scipy.sparse.coo_matrix, the merge works
C = hstack([A, B])
print(C.shape)    # (3, 2)

# With plain numpy.ndarray inputs it raises
c = hstack([aa, bb])
# ValueError: blocks must be 2-D

If that's not a pitfall, what is??

Why on earth does np.ndarray raise an error?!

Let's reproduce what happens inside.

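# Inside hstack, the argument is first converted to an np.ndarray
blocks = [aa, bb]
blocks = np.asarray(blocks, dtype='object')

# Print it to see
print(blocks.shape)

# Output:
# (2, 3, 1): the two arrays were stacked along a new leading axis!
# (hstack actually passes [blocks] to bmat, so the real shape is (1, 2, 3, 1))

print(blocks.ndim)
# Output: 3, so it is judged not to be a 2-D array of blocks

if blocks.ndim != 2:
    raise ValueError('blocks must be 2-D')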

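So in the end, the culprit is how NumPy's own np.asarray() handles its input: ndarrays are unpacked as nested sequences, while sparse matrices are kept as single objects.

With sparse.coo_matrix inputs, [A, B] is laid out horizontally, one matrix per element.

With np.ndarray inputs, [aa, bb] is stacked along a new leading axis.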
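The difference can be shown with NumPy alone. In this sketch the class Opaque is a made-up stand-in for a sparse matrix (an object that NumPy cannot unpack as a sequence or an array); it is an assumption for illustration, not anything from SciPy.

```python
import numpy as np

class Opaque:
    """Hypothetical stand-in for a sparse matrix: no sequence or array protocol."""

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])

# Opaque objects stay as single elements, so the list keeps its own shape
as_objects = np.asarray([Opaque(), Opaque()], dtype='object')
# ndarrays are unpacked, so their dimensions are appended to the list's shape
as_arrays = np.asarray([aa, bb], dtype='object')

print(as_objects.shape)  # (2,)
print(as_arrays.shape)   # (2, 3, 1)
```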

In summary, to avoid the error,

convert every input to sparse.coo_matrix before calling hstack.

import numpy as np
from scipy.sparse import coo_matrix, hstack

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])

A = coo_matrix(aa)
B = coo_matrix(bb)

C = hstack([A, B])
# This way no error is raised
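When there are many blocks, the conversion can at least be done in one place. The wrapper below is a rough sketch; hstack_any is a hypothetical name, not a SciPy function, and it assumes every block is something coo_matrix accepts (a 2-D ndarray or an existing sparse matrix).

```python
import numpy as np
from scipy.sparse import coo_matrix, hstack

def hstack_any(blocks):
    """Hypothetical helper: convert every block to coo_matrix, then hstack."""
    return hstack([coo_matrix(b) for b in blocks])

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])

# Dense and sparse blocks can be mixed freely
C = hstack_any([aa, coo_matrix(bb)])
print(C.shape)      # (3, 2)
print(C.toarray())
```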

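But converting like this is such a hassle (said in a Taiwanese accent)!!

So what do we do?!

As Uncle says, magic must defeat magic!

Leave NumPy's problems to NumPy!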

numpy.hstack(tup)

This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis. Rebuilds arrays divided by hsplit.

tup : sequence of ndarrays

The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length.
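A short illustration of the two cases in that description, using made-up arrays: 2-D inputs are joined along the second axis, and 1-D inputs along the first.

```python
import numpy as np

col_a = np.array([[4], [5], [6]])   # shape (3, 1)
col_b = np.array([[1], [2], [3]])   # shape (3, 1)

# 2-D inputs: same as concatenating along axis 1
side_by_side = np.hstack([col_a, col_b])
same_thing = np.concatenate([col_a, col_b], axis=1)

# 1-D inputs: concatenated along the first axis, lengths may differ
flat = np.hstack([np.array([4, 5, 6]), np.array([1, 2])])

print(side_by_side.shape)  # (3, 2)
print(flat.shape)          # (5,)
```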

Example:

import numpy as np
from scipy.sparse import coo_matrix, hstack

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])
print(aa.shape)    # (3, 1)
print(bb.shape)    # (3, 1)

# Magic must defeat magic!
c = np.hstack([aa, bb])
print(c.shape)
# (3, 2)

# Finally, convert back to a sparse matrix
cc = coo_matrix(c)

# -----------------------------------
# For comparison: the sparse-first route
A = coo_matrix(aa)
B = coo_matrix(bb)
print(A.shape)    # (3, 1)
print(B.shape)    # (3, 1)

C = hstack([A, B])
print(C.shape)    # (3, 2)
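As a quick sanity check, the two routes can be compared directly. This is only a sketch on the same toy data; the variable names are made up for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix, hstack

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])

# Route 1: stack with NumPy first, then convert to sparse
dense_first = coo_matrix(np.hstack([aa, bb]))
# Route 2: convert to sparse first, then stack with scipy.sparse.hstack
sparse_first = hstack([coo_matrix(aa), coo_matrix(bb)])

same = np.array_equal(dense_first.toarray(), sparse_first.toarray())
print(same)  # True
```

Note that the np.hstack route builds the full dense array in memory, so for data that is already large and sparse, the sparse-first route is likely the safer choice.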