def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
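This function is simply coalesce called with its second argument (shuffle) set to true, so it always performs a shuffle. The name coalesce means to merge or combine, and that operator leans toward merging partitions into fewer ones, whereas the repartition operator redistributes the data into a new set of partitions, which may be fewer or more than before.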
scala> var rdd2 = data.repartition(1)
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at repartition at <console>:29
scala> rdd2.partitions.size
res8: Int = 1
scala> var rdd2 = data.repartition(4)
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19] at repartition at <console>:29
scala> rdd2.partitions.size
res9: Int = 4
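
To make the relationship between the two operators concrete, here is a minimal sketch that compares partition counts after repartition and coalesce. The object name RepartitionVsCoalesce, the helper partitionCounts, the local[2] master, and the sample RDD of integers are assumptions chosen for illustration and are not part of the original session; the sketch also assumes Spark is on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; names are illustrative only.
object RepartitionVsCoalesce {

  // Returns the partition counts produced by four different calls
  // on an RDD that starts with 2 partitions.
  def partitionCounts(sc: SparkContext): (Int, Int, Int, Int) = {
    val data = sc.parallelize(1 to 100, 2) // sample data, 2 initial partitions

    val viaRepartition = data.repartition(4)               // always shuffles
    val viaCoalesceShuffle = data.coalesce(4, shuffle = true) // same effect as repartition(4)
    val viaCoalesceNoShuffle = data.coalesce(4)            // no shuffle: cannot grow past 2
    val merged = data.coalesce(1)                          // merging down works without a shuffle

    (viaRepartition.partitions.length,
     viaCoalesceShuffle.partitions.length,
     viaCoalesceNoShuffle.partitions.length,
     merged.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-vs-coalesce")
      .master("local[2]") // assumption: run locally with two threads
      .getOrCreate()
    try {
      println(partitionCounts(spark.sparkContext))
    } finally {
      spark.stop()
    }
  }
}
```

Without a shuffle, coalesce can only reduce the number of partitions, so a request for more partitions leaves the count unchanged. This is why repartition is the usual choice for increasing parallelism, while coalesce without a shuffle is the cheaper choice for merging partitions.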