Spark Source Code Analysis (2): The Master Registration Mechanism

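This article walks through the registration flows that the Spark Master handles for applications, drivers, and workers: how the Master responds to a registration request, how the driver handles the registration reply, and how a Worker registers after it starts. It also introduces PersistenceEngine and the role it plays in these flows.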

Master Registration Mechanism

Application Registration

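The previous post analyzed the initialization of SparkContext, which ends by sending a RegisterApplication message to the Master.
Let's see how the Master responds when it receives this message.
First, note that the Master class extends ThreadSafeRpcEndpoint.
Now look at the Master's receive() method.
Only the code related to application registration is shown here.
Spark allows several Masters to exist at once, but only one of them is active (RecoveryState.ALIVE); the others are STANDBY. As the code below shows, a STANDBY Master ignores application registration requests and sends no reply.
The handler adds the application to the Master's in-memory structures, persists it through the persistence engine, and sends a RegisteredApplication message to the application's driver. Finally it calls schedule(), whose role will be covered in a later post.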

override def receive: PartialFunction[Any, Unit] = {
  // Handle a registration message from an application
  case RegisterApplication(description, driver) =>
    // TODO Prevent repeated registrations from some driver
    // If this master is STANDBY rather than active, do not respond
    if (state == RecoveryState.STANDBY) {
      // ignore, don't send response
    } else {
      logInfo("Registering app " + description.name)
      // Build the application object from the registration message.
      // This also generates the application ID, whose format is
      // val appId = "app-%s-%04d".format(createDateFormat.format(submitDate), nextAppNumber)
      // where nextAppNumber starts at 0 and increments with each application
      val app = createApplication(description, driver)
      registerApplication(app)
      logInfo("Registered app " + description.name + " with ID " + app.id)
      // Persist the ApplicationInfo through the persistence engine
      persistenceEngine.addApplication(app)
      // Send the reply to the driver
      driver.send(RegisteredApplication(app.id, self))
      schedule()
    }
}
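/**
 * Handles the registration of an application:
 *   - adds the ApplicationInfo to the Master's in-memory maps
 *   - adds the application to the waiting queue
 * @param app the application to register
 */
private def registerApplication(app: ApplicationInfo): Unit = {
  // Address of the application's driver
  val appAddress = app.driver.address
  // If an application is already registered at this driver address,
  // treat this as a duplicate registration and return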
  if (addressToApp.contains(appAddress)) {
    logInfo("Attempted to re-register application at same address: " + appAddress)
    return
  }

  applicationMetricsSystem.registerSource(app.appSource)

  apps += app
  idToApp(app.id) = app
  endpointToApp(app.driver) = app
  addressToApp(appAddress) = app
  // Add to the queue of applications waiting to be scheduled
  waitingApps += app
}
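
To make the bookkeeping above concrete, here is a minimal, self-contained sketch of the ID format and the duplicate-address check. It is not Spark's code: AppSketch, RegistrySketch, newAppId, and register are hypothetical names, and the date pattern "yyyyMMddHHmmss" is an assumption about what createDateFormat produces. Only the "app-%s-%04d" format and the counter starting at 0 come from the listing.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}
import scala.collection.mutable

// Hypothetical stand-in for ApplicationInfo; only the fields this sketch needs.
final case class AppSketch(id: String, name: String, driverAddress: String)

final class RegistrySketch {
  // Assumed date pattern; the listing only shows that some date format is applied.
  private val dateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
  private var nextAppNumber = 0
  private val addressToApp = mutable.HashMap.empty[String, AppSketch]
  val waitingApps = mutable.ArrayBuffer.empty[AppSketch]

  // Same shape as the ID in the listing: "app-<date>-<4-digit counter>".
  def newAppId(submitDate: Date): String = {
    val id = "app-%s-%04d".format(dateFormat.format(submitDate), nextAppNumber)
    nextAppNumber += 1
    id
  }

  // Returns true when the app was registered, false for a duplicate driver address.
  def register(app: AppSketch): Boolean = {
    if (addressToApp.contains(app.driverAddress)) {
      false
    } else {
      addressToApp(app.driverAddress) = app
      waitingApps += app
      true
    }
  }
}
```

One design difference is worth noting: registerApplication in the listing returns Unit, so the caller cannot tell a duplicate apart and still goes on to persist the application and reply to the driver. The sketch returns a Boolean purely to make the duplicate case visible.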

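Next, look at what the driver does when it receives the reply. Because this analysis covers standalone mode, the reply is handled by StandaloneAppClient, which also shows that the AppClient is the component the application uses to communicate with the cluster.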

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // Record the application ID and mark the registration as successful
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
}

Driver Registration

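When a job is submitted with spark-submit in standalone cluster deploy mode, the Driver is registered first: a RequestSubmitDriver message is sent to the Master. Let's see how the Master handles it.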

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RequestSubmitDriver(description) =>
    // A master that is not in the ALIVE state cannot accept driver submissions
    if (state != RecoveryState.ALIVE) {
      val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
        "Can only accept driver submissions in ALIVE state."
      context.reply(SubmitDriverResponse(self, false, None, msg))
    } else {
      logInfo("Driver submitted " + description.command.mainClass)
      // Create the driver, persist it, and update the in-memory state
      val driver = createDriver(description)
      persistenceEngine.addDriver(driver)
      // Add the driver to the queue of waiting drivers
      waitingDrivers += driver
      drivers.add(driver)
      // Trigger scheduling
      schedule()

      // Reply to the client that submitted the driver
      context.reply(SubmitDriverResponse(self, true, Some(driver.id),
        s"Driver successfully submitted as ${driver.id}"))
    }
}
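
The state gate is the key point in this handler, so a small standalone sketch may help. It assumes nothing beyond the listing except a few illustrative names: RecoveryStateSketch mirrors the RecoveryState values, while SubmitResponseSketch, DriverSubmissionSketch, and the "driver-NNNN" ID format are hypothetical and are not how Spark builds driver IDs.

```scala
import scala.collection.mutable

// Mirrors the state names used by the Master; defined locally so the sketch is self-contained.
object RecoveryStateSketch extends Enumeration {
  val STANDBY, ALIVE, RECOVERING, COMPLETING_RECOVERY = Value
}

// Hypothetical reply type, loosely shaped like SubmitDriverResponse.
final case class SubmitResponseSketch(success: Boolean, driverId: Option[String], message: String)

final class DriverSubmissionSketch(var state: RecoveryStateSketch.Value) {
  private var nextDriverNumber = 0
  val waitingDrivers = mutable.ArrayBuffer.empty[String]

  def submit(mainClass: String): SubmitResponseSketch = {
    if (state != RecoveryStateSketch.ALIVE) {
      // Any non-ALIVE state rejects the submission, but a reply is still sent.
      SubmitResponseSketch(false, None,
        s"Current state is $state. Can only accept driver submissions in ALIVE state.")
    } else {
      // Illustrative ID format only.
      val driverId = f"driver-$nextDriverNumber%04d"
      nextDriverNumber += 1
      waitingDrivers += driverId
      SubmitResponseSketch(true, Some(driverId), s"Driver successfully submitted as $driverId")
    }
  }
}
```

Compare this with application registration: there a STANDBY Master stays silent, whereas here every state other than ALIVE produces an explicit failure reply, because the request arrives through receiveAndReply and the caller is waiting for an answer.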

Worker Registration

After a worker starts, it also sends a registration message to the Master.

case RegisterWorker(
  id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl, masterAddress) =>
  // The worker's message carries its core count and memory size
  logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
    workerHost, workerPort, cores, Utils.megabytesToString(memory)))
  // If this master is STANDBY, tell the worker so
  if (state == RecoveryState.STANDBY) {
    workerRef.send(MasterInStandby)
  } else if (idToWorker.contains(id)) {
    workerRef.send(RegisterWorkerFailed("Duplicate worker ID"))
  } else {
    val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
      workerRef, workerWebUiUrl)
    // Add the worker to the in-memory structures, then persist it
    if (registerWorker(worker)) {
      persistenceEngine.addWorker(worker)
      workerRef.send(RegisteredWorker(self, masterWebUiUrl, masterAddress))
      // Trigger scheduling
      schedule()
    } else {
      val workerAddress = worker.endpoint.address
      logWarning("Worker registration failed. Attempted to re-register worker at same " +
        "address: " + workerAddress)
      workerRef.send(RegisterWorkerFailed("Attempted to re-register worker at same address: "
        + workerAddress))
    }
  }
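
The handler above has four possible outcomes, which can be summarized as a small decision function. This is a simplified sketch under stated assumptions, not Spark's registerWorker: WorkerSketch, WorkerRegistrySketch, and the reply types are hypothetical names, and the address check here is a plain map lookup, whereas the real method may do more work before rejecting a worker.

```scala
import scala.collection.mutable

// Hypothetical reply types, named after the messages in the listing.
sealed trait WorkerReply
case object MasterInStandbySketch extends WorkerReply
case object RegisteredWorkerSketch extends WorkerReply
final case class RegisterWorkerFailedSketch(reason: String) extends WorkerReply

// Hypothetical stand-in for WorkerInfo.
final case class WorkerSketch(id: String, address: String, cores: Int, memoryMb: Int)

final class WorkerRegistrySketch(var standby: Boolean) {
  private val idToWorker = mutable.HashMap.empty[String, WorkerSketch]
  private val addressToWorker = mutable.HashMap.empty[String, WorkerSketch]

  def register(w: WorkerSketch): WorkerReply = {
    if (standby) {
      // A STANDBY master tells the worker instead of staying silent.
      MasterInStandbySketch
    } else if (idToWorker.contains(w.id)) {
      RegisterWorkerFailedSketch("Duplicate worker ID")
    } else if (addressToWorker.contains(w.address)) {
      RegisterWorkerFailedSketch("Attempted to re-register worker at same address: " + w.address)
    } else {
      idToWorker(w.id) = w
      addressToWorker(w.address) = w
      RegisteredWorkerSketch
    }
  }
}
```

Modeling the replies as a sealed trait lets the compiler check that a caller handles every outcome, which is one way to keep the branches of such a handler easy to follow.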

All three registration paths above call persistenceEngine.
Let's look at what PersistenceEngine is for.

/**
 * Allows Master to persist any state that is necessary in order to recover from a failure.
 * The following semantics are required:
 *   - addApplication and addWorker are called before completing registration of a new app/worker.
 *   - removeApplication and removeWorker are called at any time.
 * Given these two requirements, we will have all apps and workers persisted, but
 * we might not have yet deleted apps or workers that finished (so their liveness must be verified
 * during recovery).
 *
 * The implementation of this trait defines how name-object pairs are stored or retrieved.
 */

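In short, PersistenceEngine lets the Master persist whatever state it needs in order to recover from a failure,
and it requires the following semantics:
(1) addApplication and addWorker are called before the registration of a new app or worker completes
(2) removeApplication and removeWorker may be called at any time
PersistenceEngine is initialized in the Master's onStart() method, which offers three recovery modes: ZOOKEEPER, FILESYSTEM, and CUSTOM. The mode is selected by the spark.deploy.recoveryMode setting; when none of the three is configured, a no-op engine is used and nothing is persisted.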
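
The doc comment says an implementation only has to define how name-object pairs are stored and retrieved. The following is a minimal sketch of that idea, not Spark's trait: PersistenceSketch, InMemoryPersistenceSketch, the method signatures, and the "app_" name prefix are assumptions made for illustration. An in-memory map obviously cannot survive a Master failure; it stands in here for a ZooKeeper or filesystem backend.

```scala
import scala.collection.mutable

// Hypothetical trait: the backend supplies three primitives over name-object pairs,
// and the add/remove helpers are built on top of them.
trait PersistenceSketch {
  def persist(name: String, obj: AnyRef): Unit
  def unpersist(name: String): Unit
  def read(prefix: String): Seq[AnyRef]

  // Assumed naming scheme: one key per application, with a shared prefix.
  final def addApplication(appId: String, app: AnyRef): Unit = persist("app_" + appId, app)
  final def removeApplication(appId: String): Unit = unpersist("app_" + appId)
}

// Illustrative backend only; real recovery needs durable storage.
final class InMemoryPersistenceSketch extends PersistenceSketch {
  private val store = mutable.LinkedHashMap.empty[String, AnyRef]

  def persist(name: String, obj: AnyRef): Unit = store.update(name, obj)

  def unpersist(name: String): Unit = {
    store.remove(name)
    ()
  }

  // Returns every stored object whose name starts with the prefix, in insertion order.
  def read(prefix: String): Seq[AnyRef] =
    store.iterator.collect { case (name, obj) if name.startsWith(prefix) => obj }.toList
}
```

With this shape, recovery amounts to reading back everything under each prefix. Because removals may lag behind, the recovered entries still have to be checked for liveness, as the doc comment points out.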
