Suppose I have a RowMatrix.
How can I transpose it? The API documentation does not seem to have a transpose method.
The Matrix class has a transpose() method, but it is not distributed. If I have a matrix larger than memory, how can I transpose it?
I have converted a RowMatrix to a DenseMatrix as follows:
DenseMatrix Mat = new DenseMatrix(m,n,MatArr);
which requires converting the RowMatrix to a JavaRDD and then converting the JavaRDD to an array.
Is there any other convenient way to do the conversion?
Thanks in advance
Solution
You are correct: there is no RowMatrix.transpose() method. You will need to do this operation manually.
Here is the non-distributed (local) version:
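def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
  (for {
    c <- m(0).indices  // iterate over the column indices of the first row
  } yield m.map(_(c))).toArray  // column c of m becomes row c of the result
}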
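As a quick sanity check of the local approach, here is a minimal sketch that applies the same column-by-column idea to a 2x3 matrix and compares it with the transpose method Scala already provides on nested arrays. The names transposeLocal, sample, manual and builtIn are made up for this illustration.

```scala
// Hypothetical name for this sketch; same idea as the local version above:
// for each column index c, collect element c of every row.
def transposeLocal(m: Array[Array[Double]]): Array[Array[Double]] =
  m(0).indices.map(c => m.map(_(c))).toArray

// A 2x3 input matrix.
val sample: Array[Array[Double]] = Array(
  Array(1.0, 2.0, 3.0),
  Array(4.0, 5.0, 6.0)
)

// 3x2 result from the hand-written function.
val manual: Array[Array[Double]] = transposeLocal(sample)

// Scala's standard library also offers transpose on an array of arrays.
val builtIn: Array[Array[Double]] = sample.transpose
```

Both results should hold the rows (1.0, 4.0), (2.0, 5.0) and (3.0, 6.0). For matrices that fit in memory, the built-in method is probably the simpler choice.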
The distributed version would be along the following lines:
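import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
// Tag every entry with its row index i and column index j, keyed by column.
val byColumn = origMatRdd.rows.zipWithIndex.flatMap { case (row, i) =>
  row.toArray.zipWithIndex.map { case (x, j) => (j, (i, x)) }
}
// Each column becomes one row of the transpose; order its entries by original row index.
val transposedRows = byColumn.groupByKey.sortByKey().map { case (_, entries) =>
  Vectors.dense(entries.toArray.sortBy(_._1).map(_._2))
}
val transposed = new RowMatrix(transposedRows)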
Caveat: I have not tested the above, so it may have bugs. But the basic approach is valid, and it is similar to work I did in the past for a small LinAlg library for Spark.
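Another possible route, assuming a Spark version whose MLlib provides CoordinateMatrix.transpose() (I believe 1.3 and later), is to turn the rows into MatrixEntry triples and let CoordinateMatrix do the index swap. The sketch below is untested against a cluster. The helper name transposeRowMatrix is made up for this illustration, and it assumes the order of the rows RDD is the intended row order, since a RowMatrix carries no row indices of its own.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix, MatrixEntry, RowMatrix}

// Hypothetical helper, not part of Spark: transposes a RowMatrix by way of CoordinateMatrix.
def transposeRowMatrix(mat: RowMatrix): IndexedRowMatrix = {
  // zipWithIndex assigns row index i from the RDD's current ordering (an assumption).
  val entries = mat.rows.zipWithIndex.flatMap { case (row, i) =>
    row.toArray.zipWithIndex.map { case (x, j) => MatrixEntry(i, j.toLong, x) }
  }
  // Pass the dimensions explicitly so they do not depend on which entries are present.
  new CoordinateMatrix(entries, mat.numRows(), mat.numCols())
    .transpose()           // swaps the i and j of every entry
    .toIndexedRowMatrix()  // keeps an explicit index on every row of the result
}
```

Returning an IndexedRowMatrix rather than a RowMatrix keeps the row indices of the transpose explicit, so nothing depends on partition order after the shuffle. Each row of the result still has to fit in memory on a single executor, so this suits matrices with many columns and a moderate number of rows better than the reverse.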