How to convert a Matrix to RDD[Vector] in Spark

Reposted; published 2017-07-21 11:28:00

License: CC BY-SA 4.0

Original post: http://www.cnblogs.com/xiaoma0529/p/7216802.html

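This post shows how to convert a matrix produced by singular value decomposition (SVD) into a Spark RDD, so that it can be passed to clustering algorithms that only accept an RDD as input.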

The matrix is generated from SVD, and I am using the results from SVD to do clustering analysis.

If your clustering algorithm only accepts an RDD as its input, here's how you can do the transformation:

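  import org.apache.spark.SparkContext
  import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
  import org.apache.spark.rdd.RDD
  // Converts a local Matrix into an RDD with one Vector per row.
  def toRDD(sc: SparkContext, m: Matrix): RDD[Vector] = {
    // m.toArray is column-major, so each group of numRows values is one column.
    val columns: Iterator[Array[Double]] = m.toArray.grouped(m.numRows)
    // Transpose columns into rows. Use columns.toSeq instead if you want a column-major RDD.
    val rows: Seq[Seq[Double]] = columns.toSeq.transpose
    val vectors: Seq[Vector] = rows.map(row => Vectors.dense(row.toArray))
    sc.parallelize(vectors)
  }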

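The only subtle part of toRDD is the memory layout, so here is a minimal plain-Scala sketch (no Spark needed) of the grouped/transpose steps. It assumes the values arrive in column-major order, which is what Spark's Matrix.toArray returns; ColumnMajorDemo, toRows and toColumns are hypothetical names used only for illustration.

```scala
// Hypothetical demo: plain arrays stand in for Spark's Matrix.toArray output,
// which stores values column by column (column-major).
object ColumnMajorDemo {
  // Split into columns of numRows values each, then transpose into rows.
  def toRows(values: Array[Double], numRows: Int): List[List[Double]] =
    values.grouped(numRows).map(_.toList).toList.transpose

  // Skipping the transpose keeps one list per column instead.
  def toColumns(values: Array[Double], numRows: Int): List[List[Double]] =
    values.grouped(numRows).map(_.toList).toList

  def main(args: Array[String]): Unit = {
    // The 2x3 matrix
    //   1.0 2.0 3.0
    //   4.0 5.0 6.0
    // stored column by column:
    val values = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
    println(toRows(values, 2))    // prints List(List(1.0, 2.0, 3.0), List(4.0, 5.0, 6.0))
    println(toColumns(values, 2)) // prints List(List(1.0, 4.0), List(2.0, 5.0), List(3.0, 6.0))
  }
}
```

toRows matches the row-per-Vector RDD that toRDD builds; toColumns matches what you get by skipping the transpose.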