Custom multi-dimensional sorting in Spark


License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/jackie2016/p/5667025.html

Tags: #BigData

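This article shows how to implement multi-dimensional sorting in Spark. It defines a custom sort key that orders data on three dimensions and gives a concrete code example.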


In Spark, the built-in sortByKey operates on an RDD of key-value pairs: it sorts the pairs by key, and each value travels along with its key.
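As a Spark-free illustration of that behavior, the sketch below sorts a plain Scala Seq of key-value pairs by key. The sample data and the names `SortByKeyAnalogy`, `pairs`, `ascending`, and `descending` are made up for illustration, and a local collection only stands in for the pair RDD that sortByKey actually works on.

```scala
object SortByKeyAnalogy {
  // Hypothetical (key, value) pairs standing in for a pair RDD.
  val pairs: Seq[(Int, String)] = Seq((3, "c"), (1, "a"), (2, "b"))

  // Sort by the key only; each value stays attached to its key.
  val ascending: Seq[(Int, String)] = pairs.sortBy(_._1)

  // Reverse the key ordering, similar in spirit to sortByKey(false).
  val descending: Seq[(Int, String)] = pairs.sortBy(_._1)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    ascending.foreach(println)
    descending.foreach(println)
  }
}
```

With a single Int key the standard ordering is enough; the custom key only becomes necessary once several fields decide the order.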

To sort on several dimensions at once, we need to define a custom key object.

The following uses three dimensions as an example:

The key class must extend Ordered[T] with Serializable. Objects of this class then serve as the key (the first element) of each pair, and sortByKey sorts on them.
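The pattern can be sketched on a smaller, hypothetical two-field key. `PairKey`, `PairKeyDemo`, and their members are illustrative names that do not come from the original article, and the remark about shipping keys assumes Spark's default Java serializer.

```scala
// Hypothetical two-field key, used only to illustrate the pattern.
class PairKey(val major: Int, val minor: Int) extends Ordered[PairKey] with Serializable {
  // Negative, zero, or positive when this key sorts before, equal to, or after `other`.
  def compare(other: PairKey): Int =
    if (major != other.major) Integer.compare(major, other.major)
    else Integer.compare(minor, other.minor)
}

object PairKeyDemo {
  val low = new PairKey(1, 9)
  val high = new PairKey(2, 0)

  // Ordered derives <, <=, >, >= from compare.
  val lowSortsFirst: Boolean = low < high

  // Ordered also makes an implicit Ordering[PairKey] available,
  // which is what sortByKey requires of its key type.
  val largest: PairKey = Seq(low, high).max

  // Serializable lets Spark ship key objects between tasks
  // (assumption: the default Java serializer is in use).
  val shippable: Boolean = low.isInstanceOf[java.io.Serializable]

  def main(args: Array[String]): Unit = {
    println(lowSortsFirst)
    println(s"${largest.major} ${largest.minor}")
    println(shippable)
  }
}
```

Only compare has to be written by hand; the comparison operators and the implicit Ordering follow from it.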

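import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setAppName("thirdSort")
conf.setMaster("local")
val sc = new SparkContext(conf)

val lines = sc.textFile("d:/third.txt")

// Split each line once and wrap its three integer fields in a ThirdOrderKey; the whole line is the value
val a = lines.map { line =>
  val fields = line.split(" ")
  (new ThirdOrderKey(fields(0).toInt, fields(1).toInt, fields(2).toInt), line)
}

// sortByKey(false) sorts in descending order; the result still carries the keys, so map keeps only the values
a.sortByKey(false).map(x => x._2).collect.foreach(println)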

Below is the definition of ThirdOrderKey. There are two key points: it extends Ordered, and it implements the compare method.

class ThirdOrderKey(val first: Int, val second: Int, val third: Int) extends Ordered[ThirdOrderKey] with Serializable {

  // Compare field by field; Integer.compare avoids the overflow that subtracting two Ints can cause
  def compare(other: ThirdOrderKey): Int = {
    if (this.first != other.first) {
      Integer.compare(this.first, other.first)
    } else if (this.second != other.second) {
      Integer.compare(this.second, other.second)
    } else {
      Integer.compare(this.third, other.third)
    }
  }
}
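To sanity-check the ordering without starting Spark, the sketch below applies a three-field comparison of the same shape to a few made-up lines held in a local Seq. The class is restated compactly so the snippet runs on its own. `ThirdOrderKeyCheck`, `sampleLines`, and `toKey` are hypothetical names, and sortBy on a local collection only approximates what sortByKey does on an RDD.

```scala
// Compact restatement of the three-field key so this snippet is self-contained (no Spark needed).
class ThirdOrderKey(val first: Int, val second: Int, val third: Int)
    extends Ordered[ThirdOrderKey] with Serializable {
  def compare(other: ThirdOrderKey): Int =
    if (first != other.first) Integer.compare(first, other.first)
    else if (second != other.second) Integer.compare(second, other.second)
    else Integer.compare(third, other.third)
}

object ThirdOrderKeyCheck {
  // Hypothetical sample rows: three space-separated integers per line.
  val sampleLines: Seq[String] = Seq("1 2 3", "1 2 1", "2 0 0", "1 3 0")

  // Build the key from the three fields of a line.
  def toKey(line: String): ThirdOrderKey = {
    val f = line.split(" ")
    new ThirdOrderKey(f(0).toInt, f(1).toInt, f(2).toInt)
  }

  // Ascending by first, then second, then third field.
  val ascending: Seq[String] = sampleLines.sortBy(toKey)

  // Descending order, comparable to what sortByKey(false) asks for.
  val descending: Seq[String] = sampleLines.sortBy(toKey)(Ordering[ThirdOrderKey].reverse)

  def main(args: Array[String]): Unit = {
    descending.foreach(println)
  }
}
```

Lines that tie on the first field are separated by the second, and ties on both are separated by the third, which is the behavior the custom key is meant to provide.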

Reposted from: https://www.cnblogs.com/jackie2016/p/5667025.html