spark-core Source Code Reading: Broadcast
Preface
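In (spark core Source Code Reading: Task Introduction (6)) we discussed submitMissingTasks and touched on broadcast along the way. At that point we only said that the serialized taskBytes is broadcast out. This post looks at how broadcast is implemented in detail.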
BroadcastManager is instantiated when SparkEnv is initialized. During instantiation, its initialize method creates the BroadcastFactory, which defaults to TorrentBroadcastFactory.
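To make the "initialize once, fall back to a default factory" idea concrete, here is a minimal sketch. It does not use Spark's classes: ToyBroadcastManager, the Toy* factories, and the config key toy.broadcast.factory are all hypothetical names invented for illustration, and a plain registry lookup stands in for whatever class loading the real code performs.

```scala
// Toy stand-ins; these are NOT Spark's classes.
trait ToyBroadcastFactory { def name: String }
class ToyTorrentFactory extends ToyBroadcastFactory { val name = "torrent" }
class ToyHttpFactory extends ToyBroadcastFactory { val name = "http" }

class ToyBroadcastManager(conf: Map[String, String]) {
  // Hypothetical registry: factory name -> constructor.
  private val registry: Map[String, () => ToyBroadcastFactory] = Map(
    "torrent" -> (() => new ToyTorrentFactory),
    "http"    -> (() => new ToyHttpFactory)
  )

  private var initialized = false
  private var factory: ToyBroadcastFactory = null

  // Pick the factory exactly once; default to the torrent variant
  // when the (hypothetical) config key is absent.
  private def initialize(): Unit = synchronized {
    if (!initialized) {
      val key = conf.getOrElse("toy.broadcast.factory", "torrent")
      factory = registry(key)()
      initialized = true
    }
  }

  initialize()

  def factoryName: String = factory.name
}
```

With an empty configuration the manager ends up with the torrent factory; setting the hypothetical key to "http" selects the other one.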
BroadcastFactory types
- TorrentBroadcastFactory: corresponds to TorrentBroadcast
- HttpBroadcastFactory: corresponds to HttpBroadcast
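BroadcastFactory has two main methods:
- newBroadcast: creates a broadcast variable. It is invoked when SparkContext.broadcast broadcasts data.
- unbroadcast: deletes broadcast data. The call chain is registerBroadcastForCleanup => doCleanupBroadcast => unbroadcast. registerBroadcastForCleanup relies on weak references; for background, see WeakReference and WeakHashMap. Phantom references (PhantomReference) work in a similar way.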
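The two-method contract can be sketched with a toy in-memory factory. This is only an illustration under simplifying assumptions: ToyBroadcast, ToyFactory, and InMemoryFactory are hypothetical names, and the signatures are deliberately reduced compared with Spark's real BroadcastFactory.

```scala
import scala.collection.mutable

// Hypothetical, simplified shapes; not Spark's real types.
final case class ToyBroadcast[T](id: Long, value: T)

trait ToyFactory {
  def newBroadcast[T](value: T, id: Long): ToyBroadcast[T] // create broadcast data
  def unbroadcast(id: Long): Unit                          // delete broadcast data
}

class InMemoryFactory extends ToyFactory {
  // Stands in for wherever the real implementation keeps broadcast blocks.
  private val store = mutable.Map.empty[Long, Any]

  def newBroadcast[T](value: T, id: Long): ToyBroadcast[T] = {
    store(id) = value
    ToyBroadcast(id, value)
  }

  def unbroadcast(id: Long): Unit = {
    store.remove(id)
    ()
  }

  // Helper for the illustration only.
  def isStored(id: Long): Boolean = store.contains(id)
}
```

Creating a value with newBroadcast makes it visible in the store under its id, and unbroadcast with the same id removes it again, which is the lifecycle the cleanup chain drives.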
Broadcast
A broadcast variable, created through a BroadcastFactory:
scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
scala> broadcastVar.value
res0: Array[Int] = Array(1, 2, 3)
How the BroadcastFactory methods are triggered
1. SparkContext.broadcast
def broadcast[T: ClassTag](value: T): Broadcast[T] = {
  assertNotStopped()
  if (classOf[RDD[_]].isAssignableFrom(classTag[T].runtimeClass)) {
    // This is a warning instead of an exception in order to avoid breaking user programs that
    // might have created RDD broadcast variables but not used them:
    logWarning("Can not directly broadcast RDDs; instead, call collect() and "
      + "broadcast the result (see SPARK-5063)")
  }
  // Create the broadcast variable through the BroadcastManager
  val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
  val callSite = getCallSite
  logInfo("Created broadcast " + bc.id + " from " + callSite.shortForm)
  // Register the variable so it can be cleaned up once it is no longer referenced
  cleaner.foreach(_.registerBroadcastForCleanup(bc))
  bc
}
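The RDD guard at the top of broadcast relies on a ClassTag to recover the erased runtime class of T. A small sketch of that check in isolation, using hypothetical stand-in classes (FakeRDD and FakePairRDD are not Spark types) so that it runs without Spark on the classpath:

```scala
import scala.reflect.{ClassTag, classTag}

class FakeRDD[T]                               // hypothetical stand-in for RDD
class FakePairRDD extends FakeRDD[(Int, Int)]  // hypothetical subclass

object RddCheck {
  // True when T's erased runtime class is FakeRDD or one of its subclasses.
  def looksLikeRDD[T: ClassTag]: Boolean =
    classOf[FakeRDD[_]].isAssignableFrom(classTag[T].runtimeClass)
}
```

Here RddCheck.looksLikeRDD[FakePairRDD] is true because the subclass is assignable to FakeRDD, while RddCheck.looksLikeRDD[Array[Int]] is false. Since the check works on the erased class, the element type of the RDD plays no role.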
2. The weak-reference CleanupTask. On each GC, if the broadcast variable is no longer strongly referenced, it is collected and its CleanupTaskWeakReference is added to referenceQueue.
def registerBroadcastForCleanup[T](broadcast: Broadcast[T]): Unit = {
  registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
}

private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask): Unit = {
  referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue)
}

private class CleanupTaskWeakReference(
    val task: CleanupTask,
    referent: AnyRef,
    referenceQueue: ReferenceQueue[AnyRef])
  extends WeakReference(referent, referenceQueue)
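The mechanics of a weak reference that carries a cleanup task and surfaces on a ReferenceQueue can be tried out with the JDK alone. The sketch below uses hypothetical names (LabeledWeakRef, WeakRefDemo, pollLabel) and a plain string instead of a CleanupTask. Because GC timing is not deterministic, the usage described afterwards calls enqueue() by hand to simulate what the collector would do.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}

// Hypothetical weak reference that carries a cleanup label alongside its referent.
class LabeledWeakRef(
    val label: String,
    referent: AnyRef,
    queue: ReferenceQueue[AnyRef])
  extends WeakReference[AnyRef](referent, queue)

object WeakRefDemo {
  // Wait up to timeoutMs for one enqueued reference (timeoutMs must be > 0,
  // since 0 means "wait forever") and return its label, or None on timeout.
  def pollLabel(queue: ReferenceQueue[AnyRef], timeoutMs: Long): Option[String] =
    Option(queue.remove(timeoutMs)).collect { case r: LabeledWeakRef => r.label }
}
```

While the referent is still strongly reachable, pollLabel(queue, 100L) comes back empty after the timeout. Once the reference has been enqueued, either by the collector or by an explicit ref.enqueue(), the same call returns the label, and that label is what a cleaning thread would act on.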
3. The cleaning thread runs keepCleaning, which keeps draining tasks from referenceQueue. When no task is available, the poll times out after 100 ms.
private val cleaningThread = new Thread() { override