Version information
Spark 2.3.3
JDK 1.8
IntelliJ IDEA 2019
MacBook Pro
ShuffleDependency
First, let's search for ShuffleDependency in IDEA.
The search shows that the RDDs whose generated dependency is a ShuffleDependency are:
CoGroupedRDD
ShuffledRDD
SubtractedRDD
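Besides searching the source, the same thing can be checked at runtime by looking at an RDD's dependencies. The following is only a minimal sketch, assuming Spark 2.3.x on the classpath and a local master. The object name ShuffleDependencyCheck, the helper hasShuffleDependency, and the sample data are made up for illustration, and sortByKey and subtractByKey are just two convenient operators to try.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object ShuffleDependencyCheck {

  // True if any direct dependency of the RDD is a ShuffleDependency.
  // Only the RDD itself is inspected, not its whole lineage.
  def hasShuffleDependency(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-dependency-check")
      .setMaster("local[2]") // assumption: run locally
    val sc = new SparkContext(conf)
    try {
      // Sample data, made up for this sketch.
      val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
      val other = sc.parallelize(Seq((1, "x")))

      val candidates: Seq[(String, RDD[_])] = Seq(
        "parallelize"   -> pairs,                     // no parent RDD at all
        "sortByKey"     -> pairs.sortByKey(),
        "subtractByKey" -> pairs.subtractByKey(other)
      )

      // Print the concrete RDD class and whether it shuffles.
      candidates.foreach { case (name, rdd) =>
        val cls = rdd.getClass.getSimpleName
        println(s"$name -> $cls, shuffle dependency: ${hasShuffleDependency(rdd)}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that the helper only looks at the direct dependencies. Some operators wrap the shuffling RDD in a further map step, so for those the ShuffleDependency sits one level up in the lineage; rdd.toDebugString is a handy way to see the whole chain.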
Next, let's look at which operators produce each of these RDDs.
ShuffledRDD
We can see that four operators produce it:
org.apache.spark.rdd.OrderedRDDFunctions#sortByKey
org.apache.spark.rdd.OrderedRDDFunctions#repartitionAndSortWithinPartitions
org.apache.spark.rdd.PairRDDFunctions#combineByKeyWithClassTag
org.apache.spark.rdd.PairRDDFunctions#partitio