Spark ListenerBus

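This article takes a close look at the ListenerBus mechanism in Spark: how it processes events asynchronously and decouples listeners from event producers, with particular attention to the roles and implementation details of SparkListenerBus and AsyncEventQueue.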

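Systems often need to handle events asynchronously. Listeners decouple the code that produces an event from the code that reacts to it. ListenerBus is the bus that carries events to those listeners.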

The ListenerBus trait
/**
 * An event bus which posts events to its listeners.
 */
// L is the listener type; E is the type of event passed to the listeners.
private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {
  // Listeners are kept in a CopyOnWriteArrayList, which suits read-heavy,
  // write-light workloads such as this one.
  private[this] val listenersPlusTimers = new CopyOnWriteArrayList[(L, Option[Timer])]
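
To make the role of the CopyOnWriteArrayList concrete, here is a minimal sketch of a listener bus built on the same idea. It is not Spark's implementation: `SimpleListenerBus` and `PrintBus` are hypothetical names, and the timers and logging of the real trait are left out.

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.util.control.NonFatal

// Hypothetical, simplified stand-in for Spark's ListenerBus (no timers, no Logging).
trait SimpleListenerBus[L <: AnyRef, E] {
  // Copy-on-write list: iterating while another thread adds or removes
  // a listener never throws ConcurrentModificationException.
  private val listeners = new CopyOnWriteArrayList[L]()

  def addListener(listener: L): Unit = { listeners.add(listener); () }

  def removeListener(listener: L): Unit = { listeners.remove(listener); () }

  // Deliver one event to every registered listener, in registration order.
  def postToAll(event: E): Unit = {
    val it = listeners.iterator()
    while (it.hasNext) {
      val listener = it.next()
      try doPostEvent(listener, event)
      catch {
        // A failing listener must not stop delivery to the remaining ones.
        case NonFatal(e) =>
          System.err.println(s"Listener threw an exception: ${e.getMessage}")
      }
    }
  }

  // Subclasses decide how an event maps onto a call on the listener.
  protected def doPostEvent(listener: L, event: E): Unit
}

// Example instantiation: listeners are plain String => Unit functions.
class PrintBus extends SimpleListenerBus[String => Unit, String] {
  protected def doPostEvent(listener: String => Unit, event: String): Unit =
    listener(event)
}
```

Posting is the hot path and registration is rare, which is why paying the cost of copying the list on every add or remove is acceptable here.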

The SparkListenerInterface trait

This interface is the L type parameter of ListenerBus. It declares one callback for each kind of event a listener can handle.

/**
 * Interface for listening to events from the Spark scheduler. Most applications should probably
 * extend SparkListener or SparkFirehoseListener directly, rather than implementing this class.
 *
 * Note that this is an internal interface which might change in different Spark releases.
 */
private[spark] trait SparkListenerInterface {

  /**
   * Called when a stage completes successfully or fails, with information on the completed stage.
   */
  def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit
  
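The SparkListenerBus trait implements ListenerBus specifically for SparkListenerInterface listeners and the corresponding SparkListenerEvent events.

It implements doPostEvent, which routes each event to the matching listener callback: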

/**
 * A [[SparkListenerEvent]] bus that relays [[SparkListenerEvent]]s to its listeners
 */
private[spark] trait SparkListenerBus
  extends ListenerBus[SparkListenerInterface, SparkListenerEvent] {

  protected override def doPostEvent(
      listener: SparkListenerInterface,
      event: SparkListenerEvent): Unit = {
    event match {
      // Call a different listener method depending on the type of the event.
      case stageSubmitted: SparkListenerStageSubmitted =>
        listener.onStageSubmitted(stageSubmitted)
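
The dispatch above is ordinary pattern matching on the runtime type of the event. A minimal sketch with a two-event hierarchy shows the whole shape; `DemoEvent`, `StageSubmitted`, `StageCompleted`, `DemoListener` and `DemoDispatcher` are hypothetical stand-ins, not Spark classes. The no-op defaults in `DemoListener` play the role that the doc comment above assigns to SparkListener: implementations override only the callbacks they care about.

```scala
// Hypothetical event hierarchy standing in for SparkListenerEvent.
sealed trait DemoEvent
final case class StageSubmitted(stageId: Int) extends DemoEvent
final case class StageCompleted(stageId: Int) extends DemoEvent

// Hypothetical listener interface with no-op defaults.
trait DemoListener {
  def onStageSubmitted(event: StageSubmitted): Unit = ()
  def onStageCompleted(event: StageCompleted): Unit = ()
}

object DemoDispatcher {
  // Route each event to the matching callback based on its runtime type.
  // Because DemoEvent is sealed, the compiler can warn about a missing case.
  def doPostEvent(listener: DemoListener, event: DemoEvent): Unit = event match {
    case e: StageSubmitted => listener.onStageSubmitted(e)
    case e: StageCompleted => listener.onStageCompleted(e)
  }
}
```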

The AsyncEventQueue class, an implementation of SparkListenerBus

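Events are managed with a bounded LinkedBlockingQueue: posting an event adds it to the queue. AsyncEventQueue also starts a thread that continuously drains the queue and passes each event on to the actual listeners.

Each AsyncEventQueue has exactly one such thread.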
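
The queue-plus-thread arrangement can be sketched in a few lines. This is an illustration under simplifying assumptions, not the Spark class: `MiniAsyncQueue` is a hypothetical name, it delivers to a single `deliver` function instead of a list of listeners, and it has no metrics or error handling.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical miniature: one bounded queue drained by one dispatch thread.
class MiniAsyncQueue[E <: AnyRef](capacity: Int, deliver: E => Unit) {
  private val queue = new LinkedBlockingQueue[AnyRef](capacity)

  // Sentinel object that tells the dispatch thread to exit.
  private val PoisonPill = new Object

  private val dispatchThread = new Thread("mini-dispatch") {
    setDaemon(true)
    override def run(): Unit = {
      var next = queue.take() // blocks until an event is available
      while (next ne PoisonPill) {
        deliver(next.asInstanceOf[E])
        next = queue.take()
      }
    }
  }

  // Events posted before start() simply wait in the queue.
  def start(): Unit = dispatchThread.start()

  // Non-blocking: returns false if the queue is full.
  def post(event: E): Boolean = queue.offer(event)

  // Enqueue the sentinel behind all pending events, then wait for the thread.
  def stop(): Unit = {
    queue.put(PoisonPill)
    dispatchThread.join()
  }
}
```

Because the sentinel is enqueued behind the pending events, everything posted before `stop()` is still delivered before the thread exits.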

/**
 * An asynchronous queue for events. All events posted to this queue will be delivered to the child
 * listeners in a separate thread.
 *
 * Delivery will only begin when the `start()` method is called. The `stop()` method should be
 * called when no more events need to be delivered.
 */
private class AsyncEventQueue(
    val name: String,
    conf: SparkConf,
    metrics: LiveListenerBusMetrics,
    bus: LiveListenerBus)
  extends SparkListenerBus
  with Logging {

  import AsyncEventQueue._

  // Cap the capacity of the queue so we get an explicit error (rather than an OOM exception) if
  // it's perpetually being added to more quickly than it's being drained.
  private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](
    conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY))
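
What happens when there are more events than the bounded queue can hold?

Spark drops the event rather than blocking the thread that posted it, and writes a message to the log.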
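
The drop-on-overflow behaviour follows from using the non-blocking `offer` on a bounded queue. A minimal sketch, assuming a hypothetical `DroppingQueue` class and a deliberately crude logging policy (only the first drop is reported):

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: a bounded queue that drops, counts and logs on overflow.
class DroppingQueue[E](capacity: Int) {
  private val queue = new LinkedBlockingQueue[E](capacity)
  private val dropped = new AtomicLong(0L)

  // offer() never blocks: it returns false immediately when the queue is full.
  def post(event: E): Unit = {
    if (!queue.offer(event)) {
      val total = dropped.incrementAndGet()
      // Report only the first drop so a burst does not flood the log.
      if (total == 1L) {
        System.err.println(s"Dropping event $event: queue is full")
      }
    }
  }

  def droppedCount: Long = dropped.get()

  def size: Int = queue.size()
}
```

With a capacity of 2 and no consumer, posting five events leaves two in the queue and three counted as dropped. The trade-off is explicit: the producer is never stalled, at the price of losing events when listeners are too slow.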

LiveListenerBus

Spark has too many listeners for a single thread to serve, so LiveListenerBus sorts them into named queues:

private[scheduler] val SHARED_QUEUE = "shared"
private[scheduler] val APP_STATUS_QUEUE = "appStatus"
private[scheduler] val EXECUTOR_MANAGEMENT_QUEUE = "executorManagement"
private[scheduler] val EVENT_LOG_QUEUE = "eventLog"

Each queue can hold multiple listeners and, being an AsyncEventQueue, is served by its own thread.

/**
 * Asynchronously passes SparkListenerEvents to registered SparkListeners.
 *
 * Until `start()` is called, all posted events are only buffered. Only after this listener bus
 * has started will events be actually propagated to all attached listeners. This listener bus
 * is stopped when `stop()` is called, and it will drop further events after stopping.
 */
private[spark] class LiveListenerBus(conf: SparkConf) {

  import LiveListenerBus._

  private var sparkContext: SparkContext = _
  private val queues = new CopyOnWriteArrayList[AsyncEventQueue]()
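
The effect of grouping listeners into named queues can be sketched as follows. This is an assumption-laden illustration, not LiveListenerBus itself: `MiniLiveBus` is a hypothetical name, and a single-thread executor stands in for AsyncEventQueue (its internal queue is unbounded, unlike Spark's).

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import scala.collection.mutable

// Hypothetical sketch: every queue name owns a list of listeners and one thread,
// so a slow listener in one group cannot delay delivery in another group.
class MiniLiveBus[E] {
  // queue name -> (listeners in that queue, the single thread serving them)
  private val queues =
    mutable.LinkedHashMap.empty[String, (mutable.ArrayBuffer[E => Unit], ExecutorService)]

  // Create the named queue on first use, then attach the listener to it.
  def addToQueue(listener: E => Unit, queueName: String): Unit = synchronized {
    val (listeners, _) = queues.getOrElseUpdate(
      queueName,
      (mutable.ArrayBuffer.empty[E => Unit], Executors.newSingleThreadExecutor()))
    listeners += listener
    ()
  }

  // Fan the event out: each queue delivers it on its own thread.
  def post(event: E): Unit = synchronized {
    queues.values.foreach { case (listeners, executor) =>
      val snapshot = listeners.toList
      executor.execute(() => snapshot.foreach(l => l(event)))
    }
  }

  def queueNames: List[String] = synchronized { queues.keys.toList }

  // Let already-submitted deliveries finish, then release the threads.
  def stop(): Unit = synchronized {
    queues.values.foreach { case (_, executor) =>
      executor.shutdown()
      executor.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

The executor threads are not daemon threads, so `stop()` has to be called for the JVM to exit.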

How to use it
  • Method 1
    [image not available]
  • Method 2
  /**
   * Registers listeners specified in spark.extraListeners, then starts the listener bus.
   * This should be called after all internal listeners have been registered with the listener bus
   * (e.g. after the web UI and event logging listeners have been registered).
   */
  private def setupAndStartListenerBus(): Unit = {
    try {
      conf.get(EXTRA_LISTENERS).foreach { classNames =>
        val listeners = Utils.loadExtensions(classOf[SparkListenerInterface], classNames, conf)
        listeners.foreach { listener =>
          listenerBus.addToSharedQueue(listener)
          logInfo(s"Registered listener ${listener.getClass().getName()}")
        }
      }
    } catch {
      case e: Exception =>
        try {
          stop()
        } finally {
          throw new SparkException(s"Exception when registering SparkListener", e)
        }
    }

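Method 2 is probably the better choice. Listeners named in spark.extraListeners are registered before the listener bus starts. With method 1, SparkContext has already started the listenerBus by the time the listener is added, so the listener may never see some of the earlier events.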
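
The ordering effect is easy to reproduce with a toy bus. The sketch below is synchronous and entirely hypothetical (`OrderingDemoBus`, `OrderingDemo` and the event names are made up for illustration); it only shows that a listener registered after an event has been dispatched never receives that event.

```scala
import scala.collection.mutable

// Hypothetical synchronous bus, used only to show the ordering effect.
class OrderingDemoBus {
  private val listeners = mutable.ArrayBuffer.empty[String => Unit]

  def addListener(listener: String => Unit): Unit = { listeners += listener; () }

  def post(event: String): Unit = listeners.foreach(l => l(event))
}

object OrderingDemo {
  // Returns (events seen by the early listener, events seen by the late one).
  def run(): (List[String], List[String]) = {
    val bus = new OrderingDemoBus
    val early = mutable.ArrayBuffer.empty[String]
    val late = mutable.ArrayBuffer.empty[String]

    bus.addListener(e => early += e) // registered before any event is posted
    bus.post("applicationStart")
    bus.addListener(e => late += e)  // registered after the first event
    bus.post("jobStart")

    (early.toList, late.toList)
  }
}
```

Here the early listener sees both events, while the late listener sees only `jobStart`.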
