Reading the Spark Core Source: Broadcast (Part 9)

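This article takes a close look at Spark Core's broadcast mechanism, including the write, read, and delete operations of TorrentBroadcast and HttpBroadcast. It explains how BroadcastFactory creates and removes broadcast variables, how HttpBroadcast on the Executor side fetches data from the Driver, and how TorrentBroadcast propagates data in chunks.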

Preface

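In (Reading the Spark Core Source: Task Introduction (Part 6)) we discussed submitMissingTasks and touched on broadcast, but only said that the serialized taskBytes are broadcast out. This article discusses in detail how broadcast is implemented.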

BroadcastManager is instantiated when SparkEnv is initialized. During construction, its initialize method instantiates a BroadcastFactory, which defaults to TorrentBroadcastFactory.
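
The initialization described above can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyBroadcastManager, the two toy factories, and the configuration key toy.broadcast.factory are hypothetical names invented for illustration, and the real BroadcastManager may select its factory differently.

```scala
// Toy model, not Spark's classes: every name below is hypothetical.
object FactorySelectionSketch {
  trait ToyBroadcastFactory { def name: String }
  class ToyTorrentFactory extends ToyBroadcastFactory { val name = "torrent" }
  class ToyHttpFactory extends ToyBroadcastFactory { val name = "http" }

  // Hypothetical configuration key standing in for a real setting
  val FactoryKey = "toy.broadcast.factory"

  class ToyBroadcastManager(conf: Map[String, String]) {
    private var initialized = false
    private var factory: ToyBroadcastFactory = null

    // Runs as part of construction, mirroring "initialize during instantiation"
    initialize()

    private def initialize(): Unit = synchronized {
      if (!initialized) {
        // Fall back to the torrent factory when nothing is configured
        factory = conf.getOrElse(FactoryKey, "torrent") match {
          case "torrent" => new ToyTorrentFactory
          case "http"    => new ToyHttpFactory
          case other     => throw new IllegalArgumentException(s"Unknown factory: $other")
        }
        initialized = true
      }
    }

    def factoryName: String = factory.name
  }
}
```

With an empty configuration the toy manager ends up with the torrent factory, which matches the default noted above.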

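  • BroadcastFactory implementations

    • TorrentBroadcastFactory
      Corresponds to TorrentBroadcast
    • HttpBroadcastFactory
      Corresponds to HttpBroadcast
  • BroadcastFactory has two main methods:

    • newBroadcast: creates a broadcast variable
      Called when SparkContext.broadcast broadcasts data
    • unbroadcast: removes broadcast data
      Call chain: registerBroadcastForCleanup => doCleanupBroadcast => unbroadcast
      registerBroadcastForCleanup relies on weak references; for background see WeakReference and WeakHashMap. Phantom references (PhantomReference) work similarly.
  • Broadcast
    A broadcast variable, created through a BroadcastFactory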

    scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
    broadcastVar: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
    scala> broadcastVar.value
    res0: Array[Int] = Array(1, 2, 3)
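
The two factory methods listed above can be sketched as a pair on a toy factory, with a manager that hands out ids from a counter. This is an illustrative model only: ToyBroadcast, ToyFactory, ToyManager, and the in-memory map used as storage are assumptions made for the example, not Spark's implementation.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Toy model, not Spark's classes: every name below is hypothetical.
object BroadcastLifecycleSketch {
  // Stand-in for a broadcast variable: an id plus the wrapped value
  final class ToyBroadcast[T](val id: Long, val value: T) {
    override def toString: String = s"Broadcast($id)"
  }

  class ToyFactory {
    // In-memory map standing in for wherever a real factory stores its data
    private val store = mutable.Map.empty[Long, Any]

    def newBroadcast[T](value: T, id: Long): ToyBroadcast[T] = {
      store(id) = value
      new ToyBroadcast(id, value)
    }

    def unbroadcast(id: Long): Unit = {
      store.remove(id)
      ()
    }

    def isStored(id: Long): Boolean = store.contains(id)
  }

  class ToyManager(val factory: ToyFactory) {
    // Ids start at 0 and increase by one per broadcast variable
    private val nextBroadcastId = new AtomicLong(0)

    def newBroadcast[T](value: T): ToyBroadcast[T] =
      factory.newBroadcast(value, nextBroadcastId.getAndIncrement())

    def unbroadcast(id: Long): Unit = factory.unbroadcast(id)
  }
}
```

In this model the first variable created gets id 0, which is consistent with the Broadcast(0) shown in the shell session above.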

How the BroadcastFactory methods are triggered

1. SparkContext.broadcast

  def broadcast[T: ClassTag](value: T): Broadcast[T] = {
    assertNotStopped()
    if (classOf[RDD[_]].isAssignableFrom(classTag[T].runtimeClass)) {
      // This is a warning instead of an exception in order to avoid breaking user programs that
      // might have created RDD broadcast variables but not used them:
      logWarning("Can not directly broadcast RDDs; instead, call collect() and "
        + "broadcast the result (see SPARK-5063)")
    }
    // Create the broadcast variable through the BroadcastManager
    val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
    val callSite = getCallSite
    logInfo("Created broadcast " + bc.id + " from " + callSite.shortForm)
    cleaner.foreach(_.registerBroadcastForCleanup(bc))
    bc
  }
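
The RDD check at the top of broadcast relies on a ClassTag to recover the runtime class of T. The sketch below isolates that technique with hypothetical stand-in classes (ToyRdd and ToyPairRdd are not Spark types) so it runs without a Spark dependency.

```scala
import scala.reflect.{ClassTag, classTag}

// Toy model: ToyRdd and ToyPairRdd are hypothetical stand-ins for RDD types.
object ClassTagCheckSketch {
  class ToyRdd
  class ToyPairRdd extends ToyRdd

  // True when T is ToyRdd or one of its subclasses, decided from the
  // runtime class carried by the implicit ClassTag
  def looksLikeRdd[T: ClassTag]: Boolean =
    classOf[ToyRdd].isAssignableFrom(classTag[T].runtimeClass)
}
```

Because the check uses isAssignableFrom, it also catches subclasses, which is why a single test against the base class is enough in the listing above.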

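2. The CleanupTask is tracked through a weak reference. At each GC, if the broadcast variable is no longer strongly referenced, it is reclaimed and its CleanupTaskWeakReference is added to referenceQueue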

  def registerBroadcastForCleanup[T](broadcast: Broadcast[T]): Unit = {
    registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
  }
  private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask): Unit = {
    referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue)
  }

  private class CleanupTaskWeakReference(
      val task: CleanupTask,
      referent: AnyRef,
      referenceQueue: ReferenceQueue[AnyRef])
    extends WeakReference(referent, referenceQueue)
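
The weak-reference registration above can be tried outside Spark with plain java.lang.ref classes. This is a minimal sketch, assuming hypothetical names (ToyCleanupTask, ToyCleanBroadcast, ToyTaskRef, awaitTask). Note that System.gc() is only a hint to the JVM, so the helper retries a bounded number of times and may still return None.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Toy model, not Spark's classes: every name below is hypothetical.
object WeakCleanupSketch {
  sealed trait ToyCleanupTask
  final case class ToyCleanBroadcast(broadcastId: Long) extends ToyCleanupTask

  // Carries the cleanup task alongside the weak reference to the tracked object
  final class ToyTaskRef(val task: ToyCleanupTask, referent: AnyRef, q: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, q)

  val queue = new ReferenceQueue[AnyRef]

  // Strong references to the reference objects themselves; without them the
  // references could be collected before they are ever enqueued
  val buffer = mutable.Set.empty[ToyTaskRef]

  def register(obj: AnyRef, task: ToyCleanupTask): Unit = synchronized {
    buffer += new ToyTaskRef(task, obj, queue)
    ()
  }

  // Request a GC and wait up to 100 ms for an enqueued reference, a bounded
  // number of times; returns the task attached to the first reference found
  def awaitTask(maxTries: Int = 20): Option[ToyCleanupTask] = {
    var found: Option[ToyCleanupTask] = None
    var tries = 0
    while (found.isEmpty && tries < maxTries) {
      System.gc()
      Option(queue.remove(100)).foreach { r =>
        val ref = r.asInstanceOf[ToyTaskRef]
        synchronized { buffer -= ref }
        found = Some(ref.task)
      }
      tries += 1
    }
    found
  }
}
```

Calling WeakCleanupSketch.register(new Object, WeakCleanupSketch.ToyCleanBroadcast(0L)) keeps no strong reference to the new object, so a later awaitTask() will typically hand back the attached task once the collector has run.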

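3. A cleaning thread is registered that runs keepCleaning, which continuously processes tasks from referenceQueue; when no task is available, the wait times out after 100 ms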

  private val cleaningThread = new Thread() { override 
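
The polling behaviour described in step 3 can be sketched as a daemon thread that waits on the queue with a 100 ms timeout. This is a toy model rather than the ContextCleaner source: ToyCleaner, ToyTaskRef, and the cleaned list are hypothetical, and the usage note below enqueues a reference by hand instead of waiting for a real GC.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import java.util.concurrent.CopyOnWriteArrayList

// Toy model, not Spark's classes: every name below is hypothetical.
object CleaningLoopSketch {
  final class ToyTaskRef(val broadcastId: Long, referent: AnyRef, q: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, q)

  class ToyCleaner {
    val queue = new ReferenceQueue[AnyRef]
    // Records which broadcast ids were "cleaned"; stands in for real cleanup work
    val cleaned = new CopyOnWriteArrayList[java.lang.Long]()

    @volatile private var stopped = false
    private val pollTimeoutMs = 100L

    private val thread = new Thread("toy-cleaner") {
      override def run(): Unit = keepCleaning()
    }
    thread.setDaemon(true)

    def start(): Unit = thread.start()

    def stop(): Unit = {
      stopped = true
      thread.join(1000)
    }

    private def keepCleaning(): Unit = {
      while (!stopped) {
        // remove(timeout) blocks for at most the timeout and returns null
        // when no reference arrived, so the loop can re-check the stop flag
        Option(queue.remove(pollTimeoutMs)).foreach {
          case ref: ToyTaskRef => cleaned.add(java.lang.Long.valueOf(ref.broadcastId))
          case _               => ()
        }
      }
    }
  }
}
```

To exercise the loop deterministically, create a ToyTaskRef against the cleaner's queue and call its enqueue() method, which simulates what the collector would do once the referent becomes unreachable. The timeout matters because it lets the thread notice the stop flag even when the queue stays empty.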