I saw a post online claiming that map uses a pull model while mapPartitions uses a push model.
You can't call that wrong; it's a simple mental model for someone new to the technology who is just starting to learn.
https://blog.youkuaiyun.com/xingzhiqing/article/details/56304155
Here, let's analyze it through the source code instead.
def mapPartitionsWithIndex[U: ClassTag](
    f: (Int, Iterator[T]) => Iterator[U],
    preservesPartitioning: Boolean = false): RDD[U] = withScope {
  val cleanedF = sc.clean(f)
  new MapPartitionsRDD(
    this,
    (context: TaskContext, index: Int, iter: Iterator[T]) => cleanedF(index, iter),
    preservesPartitioning)
}
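To see what the `(index, iter)` function actually receives, here is a minimal pure-Scala sketch. No Spark is needed: the partitions are simulated as a `Vector` of `Seq`s, and all names here are mine, not Spark's.

```scala
object MapPartitionsWithIndexSketch {
  // Simulate calling f once per partition, as mapPartitionsWithIndex does.
  def run(partitions: Vector[Seq[Int]]): Vector[String] = {
    // Same shape as mapPartitionsWithIndex's f:
    // (partition index, iterator over that partition) => iterator of results
    val f: (Int, Iterator[Int]) => Iterator[String] =
      (index, iter) => iter.map(x => s"p$index:$x")

    // Spark invokes f exactly once per partition; we do the same here.
    partitions.zipWithIndex.flatMap { case (part, idx) => f(idx, part.iterator) }
  }

  def main(args: Array[String]): Unit =
    println(run(Vector(Seq(1, 2), Seq(3, 4))).mkString(", "))
}
```

The key point the sketch shows: the user function is called once per partition and is handed the whole partition's iterator, not one element at a time.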
Focus on the highlighted part (originally marked in red in the post): the construction of MapPartitionsRDD.
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}
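Note that map's wrapper function just builds `iter.map(cleanF)`. A Scala `Iterator`'s `map` is lazy: the user function runs only when something downstream pulls an element, which is why describing map as "pull" is a reasonable intuition. A quick plain-Scala illustration (no Spark required):

```scala
object PullModelSketch {
  // Returns (function calls before any pull, calls after pulling one element).
  def pullCounts(): (Int, Int) = {
    var applied = 0
    // iter.map(f) is what map's wrapper builds; it is lazy.
    val mapped = Iterator(1, 2, 3).map { x => applied += 1; x * 10 }
    val before = applied   // still 0: nothing has been evaluated yet
    mapped.next()          // the downstream consumer pulls one element
    (before, applied)      // (0, 1): f ran exactly once, on demand
  }

  def main(args: Array[String]): Unit = println(pullCounts())
}
```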
Notice that map passes only two arguments to the constructor; the third is omitted, so it takes its default value.
Both methods construct a MapPartitionsRDD.
Now look at the MapPartitionsRDD implementation itself: the highlighted third parameter, preservesPartitioning, defaults to false.
Look at the first line of the class body: override val partitioner = if (preservesPartitioning) firstParent[T].partitioner else None
That is, if preservesPartitioning is true, the new RDD reuses its parent's partitioner; if it is false (the default), the partitioner is None.
private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
    var prev: RDD[T],
    f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
    preservesPartitioning: Boolean = false)
  extends RDD[U](prev) {

  override val partitioner = if (preservesPartitioning) firstParent[T].partitioner else None

  override def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] =
    f(context, split.index, firstParent[T].iterator(split, context))

  override def clearDependencies() {
    super.clearDependencies()
    prev = null
  }
}
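Since compute hands f the entire partition iterator, mapPartitions lets per-partition setup run exactly once, whereas a function passed to map is applied per element. The contrast can be sketched in plain Scala; here the counters stand in for something expensive like opening a database connection (a simulation, not Spark API):

```scala
object SetupCostSketch {
  // Count how many times each user function body runs for one partition.
  def setupCounts(partition: Seq[Int]): (Int, Int) = {
    var mapCalls = 0
    var mapPartitionsCalls = 0

    // map style: the user function body runs once per element.
    partition.iterator.map { x => mapCalls += 1; x + 1 }.toList

    // mapPartitions style: the user function sees the whole iterator,
    // so any setup placed here runs once per partition.
    val g: Iterator[Int] => Iterator[Int] = { iter =>
      mapPartitionsCalls += 1
      iter.map(_ + 1)
    }
    g(partition.iterator).toList

    (mapCalls, mapPartitionsCalls)
  }

  def main(args: Array[String]): Unit =
    println(setupCounts(Seq(1, 2, 3, 4)))  // (4, 1)
}
```

This is the practical reason mapPartitions is often preferred when each task needs expensive one-time setup.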