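reduce operates on an RDD whose elements are single values with no key, while reduceByKey operates on an RDD of key-value pairs.
reduce is an action: it runs a job and returns one value to the driver. reduceByKey is a transformation: it returns a new RDD and is evaluated lazily.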
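The keyed versus unkeyed distinction can be illustrated without Spark. The following is only an analogy on plain Scala collections, not Spark code; the object and method names (ReduceAnalogy, total, countPerKey) are hypothetical and chosen for illustration.

```scala
// Plain Scala collections, no Spark required (an analogy only).
object ReduceAnalogy {
  // Like RDD.reduce: collapse every element into a single value.
  def total(xs: Seq[Int]): Int = xs.reduce(_ + _)

  // Like map((_, 1)).reduceByKey(_ + _): merge the values separately for each key.
  def countPerKey(xs: Seq[Int]): Map[Int, Int] =
    xs.groupBy(identity).map { case (key, group) =>
      key -> group.map(_ => 1).reduce(_ + _)
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 1, 2, 3)
    println(total(data))       // one value for the whole collection
    println(countPerKey(data)) // one value per distinct key
  }
}
```

Unlike this eager analogy, the real reduceByKey does nothing until an action such as collect() is called on its result.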
Examples
reduce
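import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
val sc: SparkContext = spark.sparkContext
val rdd: RDD[Int] = sc.parallelize(Array(1, 2, 3))
println(rdd.reduce(_ + _)) // 1 + 2 + 3 = 6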
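// Separate example: reduce on an RDD of tuples treats each tuple as one element; keys are not grouped.
val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
val sc: SparkContext = spark.sparkContext
val rdd: RDD[Int] = sc.parallelize(Array(1, 1, 2, 3))
val tuple: (Int, Int) = rdd.map((_, 1)).reduce((x: (Int, Int), y: (Int, Int)) => {
  // Sum the first components; the second component is fixed at 1.
  (x._1 + y._1, 1)
})
println(tuple) // (7,1)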
reduceByKey
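// Separate example: reduceByKey merges the values of each key and returns a new RDD.
val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
val sc: SparkContext = spark.sparkContext
val rdd: RDD[Int] = sc.parallelize(Array(1, 1, 2, 3))
val value: RDD[(Int, Int)] = rdd.map((_, 1)).reduceByKey(_ + _)
// collect() is the action that triggers the job; the pairs are (1,2), (2,1), (3,1), in no guaranteed order.
println(value.collect().mkString("Array(", ", ", ")"))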
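To make the action versus transformation difference visible in one program, here is a minimal sketch. It assumes Spark is on the classpath; the object name, helper names, app name, and the "local[*]" master are illustrative choices that do not come from the examples above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ActionVsTransformation {
  // Action: runs a job immediately and returns a plain Int to the driver.
  def sumOf(rdd: RDD[Int]): Int = rdd.reduce(_ + _)

  // Transformation: only describes the computation and returns another RDD.
  def countsOf(rdd: RDD[Int]): RDD[(Int, Int)] = rdd.map((_, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Master and app name are assumptions made for this sketch.
    val spark = SparkSession.builder().master("local[*]").appName("reduce-demo").getOrCreate()
    val rdd: RDD[Int] = spark.sparkContext.parallelize(Seq(1, 1, 2, 3))

    println(sumOf(rdd)) // the value is already computed here

    val counts = countsOf(rdd) // no job has run yet
    // collectAsMap() is an action, so the reduceByKey job runs at this point.
    println(counts.collectAsMap())

    spark.stop()
  }
}
```

The return types carry the distinction: sumOf yields an Int, while countsOf yields an RDD that still needs an action before any result reaches the driver.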
Reference
Spark: reduce and reduceByKey, lisery's blog, CSDN