Raw data:
Two files
Sort by the third column in descending order and take the 5 largest records.
Code:
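import org.apache.spark.{SparkConf, SparkContext}

object Top {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration (local mode)
    val conf = new SparkConf().setAppName("TopApplicationTest").setMaster("local")
    // Create the SparkContext
    val sc = new SparkContext(conf)
    // Read every file in the input directory into a single partition
    val lines = sc.textFile("file:///D:/doc/spark/input/dir1/*", 1)
    // Clean: drop blank lines and lines that do not have exactly 4 comma-separated fields
    val cleanLines = lines.filter(s => s.trim.nonEmpty && s.split(",").length == 4)
    // Split each line, key it by the third column as an Int, and sort by key in descending order
    val sortedRDD = cleanLines
      .map(_.split(","))
      .map(s => (s(2).trim.toInt, (s(0), s(1), s(3))))
      .sortByKey(ascending = false)
    // Take the 5 records with the largest keys
    val top = sortedRDD.take(5)
    top.foreach(println)
    // Release Spark resources
    sc.stop()
  }
}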
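PS: Before sorting the key-value pairs, be sure to convert the key with toInt. Otherwise the keys are compared as strings (character by character), so "9" sorts above "10" and the result is not in numeric order.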
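
To see why the toInt conversion matters, here is a minimal sketch that uses plain Scala collections instead of an RDD. The object name KeyOrderingSketch and the sample values are made up for illustration and do not come from the input files above. The ordering rules are the same ones sortByKey picks up for String and Int keys.

```scala
object KeyOrderingSketch {
  // Hypothetical sample values for the third column (not from the original input files).
  val rawKeys: Seq[String] = Seq("9", "10", "100", "25", "3")

  // Descending sort on the raw strings: compared character by character.
  val asStrings: Seq[String] = rawKeys.sorted(Ordering[String].reverse)

  // Descending sort after toInt: compared numerically.
  val asInts: Seq[Int] = rawKeys.map(_.toInt).sorted(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(asStrings.mkString(", ")) // prints: 9, 3, 25, 100, 10
    println(asInts.mkString(", "))    // prints: 100, 25, 10, 9, 3
  }
}
```

With string keys, a "top 5" would start with 9 rather than 100, which is why the sort appears to fail when toInt is left out.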