Spark's ContextCleaner

This article takes a close look at how Spark's ContextCleaner works, explaining in detail how it uses weak references to clean up RDDs, broadcast variables, and other data that are no longer needed, along with ContextCleaner's internal structure and cleanup flow.

ContextCleaner is the component Spark uses to clean up data that is no longer needed, such as RDDs and broadcast variables. It achieves this mainly through Java's WeakReference mechanism, which lets it detect when such data has become unreachable.

ContextCleaner is built primarily from two threads and two collections.

// Strongly holds the CleanupTaskWeakReference wrappers so that the wrappers
// themselves are not garbage-collected before their cleanup tasks run.
private val referenceBuffer =
  Collections.newSetFromMap[CleanupTaskWeakReference](new ConcurrentHashMap)

// The JVM enqueues a weak reference here once its referent has been collected.
private val referenceQueue = new ReferenceQueue[AnyRef]

Any data registered with ContextCleaner for later cleanup is wrapped in a CleanupTaskWeakReference:

private class CleanupTaskWeakReference(
    val task: CleanupTask,
    referent: AnyRef,
    referenceQueue: ReferenceQueue[AnyRef])
  extends WeakReference(referent, referenceQueue)

task describes the cleanup work to perform. It is a case class whose concrete type encodes the kind of data being cleaned, which in turn determines the cleanup method to invoke.
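For reference, in the Spark source (2.x era; minor details may differ between versions) these task types form a sealed trait with one case class per resource kind:

private sealed trait CleanupTask
private case class CleanRDD(rddId: Int) extends CleanupTask
private case class CleanShuffle(shuffleId: Int) extends CleanupTask
private case class CleanBroadcast(broadcastId: Int) extends CleanupTask
private case class CleanAccum(accId: Int) extends CleanupTask
private case class CleanCheckpoint(rddId: Int) extends CleanupTask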

referent is the registered object itself, passed directly to the WeakReference constructor, along with referenceQueue, one of the two collections mentioned above. Once the registered object is reachable only through this weak reference, with no strong references left anywhere else, it becomes eligible for garbage collection; when it is collected, the JVM enqueues the weak reference into referenceQueue, where the cleaner can retrieve it.
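This mechanism can be demonstrated outside of Spark. Below is a minimal, self-contained sketch (not Spark code) of the WeakReference-plus-ReferenceQueue behavior that ContextCleaner relies on:

import java.lang.ref.{ReferenceQueue, WeakReference}

object WeakRefDemo {
  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[AnyRef]
    var payload: AnyRef = new Array[Byte](1024)         // stands in for a registered object
    val ref = new WeakReference[AnyRef](payload, queue)

    payload = null                                      // drop the last strong reference
    System.gc()                                         // hint the JVM to collect it

    // remove(timeout) blocks until a collected reference is enqueued,
    // just like referenceQueue.remove(...) in keepCleaning() below.
    val enqueued = queue.remove(10000)
    println(s"got back the same reference: ${enqueued eq ref}")  // typically true after a GC
  }
}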

The other container mentioned above, referenceBuffer, holds the CleanupTaskWeakReference objects. It keeps strong references to the wrappers so they survive until their tasks are processed, and the task each one carries determines the concrete cleanup steps for its data type.

A CleanupTaskWeakReference is registered as follows, taking RDD checkpoint data as an example:

def registerRDDCheckpointDataForCleanup[T](rdd: RDD[_], parentId: Int): Unit = {
  registerForCleanup(rdd, CleanCheckpoint(parentId))
}

/** Register an object for cleanup. */
private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask): Unit = {
  referenceBuffer.add(new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue))
}
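Equivalent hooks exist for the other resource types; for example, from the Spark source (signatures may vary slightly between versions):

def registerRDDForCleanup(rdd: RDD[_]): Unit = {
  registerForCleanup(rdd, CleanRDD(rdd.id))
}

def registerBroadcastForCleanup[T](broadcast: Broadcast[T]): Unit = {
  registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
}

def registerShuffleForCleanup(shuffleDependency: ShuffleDependency[_, _, _]): Unit = {
  registerForCleanup(shuffleDependency, CleanShuffle(shuffleDependency.shuffleId))
}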

The two threads are as follows:

private val cleaningThread = new Thread() { override def run() { keepCleaning() }}

private val periodicGCService: ScheduledExecutorService =
  ThreadUtils.newDaemonSingleThreadScheduledExecutor("context-cleaner-periodic-gc")

def start(): Unit = {
  cleaningThread.setDaemon(true)
  cleaningThread.setName("Spark Context Cleaner")
  cleaningThread.start()
  periodicGCService.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = System.gc()
  }, periodicGCInterval, periodicGCInterval, TimeUnit.SECONDS)
}

Both threads are started by ContextCleaner. The thread in the single-threaded scheduled pool has a simple job: it periodically calls System.gc() to trigger a garbage collection, which in turn drives the cleanup.
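The interval between these forced collections comes from configuration. In the Spark source it is read roughly as follows, with a default of 30 minutes; the periodic System.gc() matters mostly on long-running drivers that otherwise rarely garbage-collect:

private val periodicGCInterval =
  sc.conf.getTimeAsSeconds("spark.cleaner.periodicGC.interval", "30min")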

The other thread is marked as a daemon and started in the start() method; it then runs the keepCleaning() method.

private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
  while (!stopped) {
    try {
      val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
        .map(_.asInstanceOf[CleanupTaskWeakReference])
      // Synchronize here to avoid being interrupted on stop()
      synchronized {
        reference.foreach { ref =>
          logDebug("Got cleaning task " + ref.task)
          referenceBuffer.remove(ref)
          ref.task match {
            case CleanRDD(rddId) =>
              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
            case CleanShuffle(shuffleId) =>
              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
            case CleanBroadcast(broadcastId) =>
              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
            case CleanAccum(accId) =>
              doCleanupAccum(accId, blocking = blockOnCleanupTasks)
            case CleanCheckpoint(rddId) =>
              doCleanCheckpoint(rddId)
          }
        }
      }
    } catch {
      case ie: InterruptedException if stopped => // ignore
      case e: Exception => logError("Error in cleaning thread", e)
    }
  }
}

Here the cleaning thread loops continuously: it pulls the weak references of collected objects out of referenceQueue (as described above), removes each one from referenceBuffer, and matches on its task case class to determine the concrete cleanup step. For an RDD, for example, the task is a CleanRDD, and doCleanupRDD() is called with the rddId to reclaim that RDD.

def doCleanupRDD(rddId: Int, blocking: Boolean): Unit = {
  try {
    logDebug("Cleaning RDD " + rddId)
    sc.unpersistRDD(rddId, blocking)
    listeners.asScala.foreach(_.rddCleaned(rddId))
    logInfo("Cleaned RDD " + rddId)
  } catch {
    case e: Exception => logError("Error cleaning RDD " + rddId, e)
  }
}

Cleaning an RDD involves two steps: first the RDD is unpersisted, removing its blocks from the BlockManager; then each registered listener is notified via rddCleaned().
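Putting it all together, here is a hedged end-to-end illustration (assuming an active SparkContext named sc): once user code drops its last strong reference to a cached RDD, the next garbage collection hands it over to ContextCleaner.

var rdd = sc.parallelize(1 to 1000000).cache()
rdd.count()    // materialize the cached blocks

rdd = null     // drop the only strong reference held by user code
System.gc()    // after collection, the CleanupTaskWeakReference is enqueued;
               // keepCleaning() then runs doCleanupRDD(), which unpersists
               // the blocks and notifies the listeners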
