Table of Contents
1. SparkConf
- The SparkConf object is Spark's configuration object: it describes a Spark application's configuration, carried mainly as key-value pairs.
- As soon as the object is instantiated via new SparkConf(), all spark.* Java system properties are loaded by default.
class SparkConf(loadDefaults: Boolean) {
def this() = this(true)
}
Notes
- Instantiating a SparkContext requires a SparkConf object as a constructor parameter.
- Inside SparkContext, that SparkConf is cloned, yielding an object whose property values are all identical but which is not the same object as the one passed in.
- All of SparkContext's subsequent operations use this cloned SparkConf.
- Note: after the SparkConf has been passed to the SparkContext, modifying the original SparkConf has no effect!
/** Copy this object */
override def clone: SparkConf = {
val cloned = new SparkConf(false)
settings.entrySet().asScala.foreach { e =>
cloned.set(e.getKey(), e.getValue(), true)
}
cloned
}
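A minimal sketch (not from the post's source) illustrating the caveat above: settings applied to the SparkConf after the SparkContext has been created never reach the context, because the context works on a clone.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()                 // loadDefaults = true: picks up spark.* system properties
  .setAppName("conf-clone-demo")
  .setMaster("local[*]")
val sc = new SparkContext(conf)            // SparkContext clones `conf` internally

conf.set("spark.executor.memory", "4g")    // too late: only the original object changes
println(sc.getConf.contains("spark.executor.memory"))   // prints false -- the clone is unaffected

sc.stop()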
2. SparkContext
The SparkContext initialization process:
1. A SparkConf object is created; it reads the default configuration and additional settings can be applied to it.
2. The SparkConf is passed into the SparkContext, which uses it to initialize its configuration properties.
3. The createTaskScheduler method instantiates the SchedulerBackend and TaskScheduler; the DAGScheduler is created right afterwards, holding a reference to this TaskScheduler.
// Create the SchedulerBackend and TaskScheduler according to the master URL passed in
// SparkContext.scala, line 2692
private def createTaskScheduler(
sc: SparkContext,
master: String,
deployMode: String): (SchedulerBackend, TaskScheduler) = {
import SparkMasterRegex._
// When running locally, don't try to re-execute tasks on failure.
val MAX_LOCAL_TASK_FAILURES = 1
master match {
// setMaster("local"),local模式
case "local" =>
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
scheduler.initialize(backend)
(backend, scheduler)
// setMaster("local[2]") || setMaster("local[*]"),local模式
case LOCAL_N_REGEX(threads) =>
def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
// local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
val threadCount = if (threads == "*") localCpuCount else threads.toInt
if (threadCount <= 0) {
throw new SparkException(s"Asked to run locally with $threadCount threads")
}
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
scheduler.initialize(backend)
(backend, scheduler)
// Standalone mode: setMaster("spark://host:port")
case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
(backend, scheduler)
// local-cluster mode: setMaster("local-cluster[numSlaves,coresPerSlave,memoryPerSlave]"), an in-process test cluster (not Mesos/YARN)
case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
// Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
val memoryPerSlaveInt = memoryPerSlave.toInt
if (sc.executorMemory > memoryPerSlaveInt) {
throw new SparkException(
"Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
memoryPerSlaveInt, sc.executorMemory))
}
val scheduler = new TaskSchedulerImpl(sc)
val localCluster = new LocalSparkCluster(
numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
val masterUrls = localCluster.start()
val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
backend.shutdownCallback = (backend: StandaloneSchedulerBackend) => {
localCluster.stop()
}
(backend, scheduler)
}
}
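For reference, a few setMaster values and the branch of createTaskScheduler they fall into (host names are placeholders):
import org.apache.spark.SparkConf

new SparkConf().setMaster("local")                          // "local": single thread
new SparkConf().setMaster("local[4]")                       // LOCAL_N_REGEX: exactly 4 threads
new SparkConf().setMaster("local[*]")                       // LOCAL_N_REGEX: one thread per CPU core
new SparkConf().setMaster("spark://host1:7077,host2:7077")  // SPARK_REGEX: StandaloneSchedulerBackend
new SparkConf().setMaster("local-cluster[2,1,1024]")        // LOCAL_CLUSTER_REGEX: in-process test cluster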
3. TaskScheduler
/**
* Low-level task scheduler interface, currently implemented exclusively by
* [[org.apache.spark.scheduler.TaskSchedulerImpl]].
* This interface allows plugging in different task schedulers. Each TaskScheduler schedules tasks
* for a single SparkContext. These schedulers get sets of tasks submitted to them from the
* DAGScheduler for each stage, and are responsible for sending the tasks to the cluster, running
* them, retrying if there are failures, and mitigating stragglers. They return events to the
* DAGScheduler.
*/
TaskScheduler is a low-level task scheduling interface; its only current implementation is TaskSchedulerImpl. A TaskScheduler can be plugged into different scheduler backends (i.e. SchedulerBackend implementations).
Each TaskScheduler schedules tasks for a single SparkContext and handles only that application's tasks. If a new Spark application is submitted, the current TaskScheduler is torn down and a new one is created for the new application.
The TaskScheduler receives a set of tasks (a TaskSet) for each stage from the DAGScheduler, sends those tasks to the cluster, runs them, retries them on failure, mitigates stragglers, and reports events and results back to the DAGScheduler.
(Stragglers: tasks submitted to the cluster may fall behind; such tasks have to be dealt with, for example by launching speculative copies, so that one or two slow tasks do not drag down the whole job.)
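Straggler mitigation is done through speculative execution. Below is a minimal configuration sketch using Spark's standard keys; the values shown are just examples, not taken from this post.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.speculation", "true")             // enable speculative re-launch of slow tasks
  .set("spark.speculation.interval", "100ms")   // how often speculatable tasks are checked for
  .set("spark.speculation.multiplier", "1.5")   // a task is "slow" if it runs 1.5x longer than the median
  .set("spark.speculation.quantile", "0.75")    // start checking once 75% of a stage's tasks have finished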
3.1. TaskSchedulerImpl
Clients must first call initialize() and start(); only after that can TaskSets be submitted, via the submitTasks method.
// line81
// How often to check for speculatable (straggling) tasks, default 100ms
val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")
// line92
// How long to wait before warning that the initial TaskSet may be starved, default 15s
val STARVATION_TIMEOUT_MS = conf.getTimeAsMs("spark.starvation.timeout", "15s")
// line95
// Number of CPU cores allocated to each task
val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)
// line136
// Scheduling mode, FIFO by default
private val schedulingModeConf = conf.get(SCHEDULER_MODE_PROPERTY, SchedulingMode.FIFO.toString)
// Coarse-grained scheduler backend (CoarseGrainedSchedulerBackend):
//   Executors are held for the whole lifetime of the application.
//   When a task finishes, its executor is not released immediately,
//   and a newly arriving task reuses an existing executor instead of
//   spawning a new one -- i.e. executors are reused.
//
// Fine-grained mode (Mesos only, via MesosFineGrainedSchedulerBackend):
//   Resources are released as soon as a task finishes,
//   and a newly arriving task acquires resources again.
//
// Standalone and YARN support only the coarse-grained backend;
// only Mesos also offers a fine-grained mode.
//
// Scheduling modes (how jobs/TaskSets within one application share resources):
//   FIFO : first in, first out -- TaskSets are served in submission order,
//          so an earlier long-running job can delay later short jobs.
//   FAIR : fair scheduling -- TaskSets are grouped into pools that share
//          resources fairly, so short jobs submitted later are not starved.
def initialize(backend: SchedulerBackend) {
this.backend = backend
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
case _ =>
throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
s"$schedulingMode")
}
}
schedulableBuilder.buildPools()
}
override def start() {
backend.start()
if (!isLocal && conf.getBoolean("spark.speculation", false)) {
logInfo("Starting speculative execution thread")
speculationScheduler.scheduleWithFixedDelay(new Runnable {
override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
checkSpeculatableTasks()
}
}, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
}
}
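The scheduling mode matched in initialize() above is driven by configuration. A minimal sketch using standard keys; the allocation-file path and pool name are hypothetical placeholders.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")                                    // switch from the default FIFO to FAIR
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")   // optional pool definitions (placeholder path)

// At runtime a job can be routed into a specific pool (pool name is hypothetical):
// sc.setLocalProperty("spark.scheduler.pool", "reporting_pool")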
3.2. StandaloneSchedulerBackend
override def start() {
// Calls the parent class CoarseGrainedSchedulerBackend's start(), which runs
// driverEndpoint = createDriverEndpointRef(properties)
// i.e. it instantiates the driver's RPC endpoint (DriverEndpoint)
super.start()
// ...
// Build an ApplicationDescription carrying the resources the application requires (app name, max cores, executor memory, ...)
val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
// Create the client object that holds this application description
// and talks to the cluster manager (the standalone Master)
client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()
launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
// Wait for registration to finish; the actual registration is carried out by StandaloneAppClient
waitForRegistration()
launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
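A small sketch of the standalone-mode settings that feed the ApplicationDescription built above (the master address is a placeholder):
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder master address
  .set("spark.executor.memory", "2g")      // -> sc.executorMemory
  .set("spark.cores.max", "8")             // -> maxCores
  .set("spark.executor.cores", "2")        // -> coresPerExecutor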
4. DriverEndpoint
An inner class of CoarseGrainedSchedulerBackend; it is the driver side's RPC communication endpoint.
override def onStart() {
// Periodically revive offers to allow delay scheduling to work
val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
reviveThread.scheduleAtFixedRate(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
// Send a ReviveOffers message to itself
Option(self).foreach(_.send(ReviveOffers))
}
}, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
}
override def receive: PartialFunction[Any, Unit] = {
case StatusUpdate(executorId, taskId, state, data) =>
scheduler.statusUpdate(taskId, state, data.value)
if (TaskState.isFinished(state)) {
executorDataMap.get(executorId) match {
case Some(executorInfo) =>
executorInfo.freeCores += scheduler.CPUS_PER_TASK
makeOffers(executorId)
case None =>
// Ignoring the update since we don't know about the executor.
logWarning(s"Ignored task status update ($taskId state $state) " +
s"from unknown executor with ID $executorId")
}
}
case ReviveOffers =>
makeOffers()
}
// Make resource offers describing the free resources on all alive executors
private def makeOffers() {
// Make sure no executor is killed while some task is launching on it
val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
// Filter out executors under killing
val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
val workOffers = activeExecutors.map { case (id, executorData) =>
new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
}.toIndexedSeq
scheduler.resourceOffers(workOffers)
}
if (!taskDescs.isEmpty) {
launchTasks(taskDescs)
}
}
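To make the offer model concrete, here is a deliberately simplified, hypothetical sketch -- this is not Spark's resourceOffers logic. Each alive executor's free capacity is described as an offer, and the scheduler decides how many tasks fit into each offer.
// Hypothetical, simplified illustration of offer-based scheduling -- not Spark's actual algorithm.
case class Offer(executorId: String, host: String, freeCores: Int)

def assign(offers: Seq[Offer], cpusPerTask: Int, pendingTasks: Int): Seq[(String, Int)] = {
  var remaining = pendingTasks
  offers.map { o =>
    val fit = math.min(o.freeCores / cpusPerTask, remaining) // tasks this executor can accept
    remaining -= fit
    o.executorId -> fit
  }
}

// Example: two executors with 4 free cores each, 1 core per task, 6 pending tasks
// assign(Seq(Offer("exec-1", "hostA", 4), Offer("exec-2", "hostB", 4)), 1, 6)
//   -> Seq(("exec-1", 4), ("exec-2", 2))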
5. StandaloneAppClient
override def onStart(): Unit = {
try {
// Send the registration request to the Master
// The argument 1 means this is the first attempt; if registration fails, the counter is incremented and registerWithMaster is called again
// Once the attempt count reaches 3 (REGISTRATION_RETRIES), registration is abandoned
registerWithMaster(1)
} catch {
case e: Exception =>
logWarning("Failed to connect to master", e)
markDisconnected()
stop()
}
}
private def registerWithMaster(nthRetry: Int) {
registerMasterFutures.set(tryRegisterAllMasters())
registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
override def run(): Unit = {
if (registered.get) {
registerMasterFutures.get.foreach(_.cancel(true))
registerMasterThreadPool.shutdownNow()
} else if (nthRetry >= REGISTRATION_RETRIES) {
markDead("All masters are unresponsive! Giving up.")
} else {
registerMasterFutures.get.foreach(_.cancel(true))
registerWithMaster(nthRetry + 1)
}
}
}, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
}
6. Master
// line 258
// In Master.receive, messages sent from the driver side are received and pattern-matched
case RegisterApplication(description, driver) =>
// TODO Prevent repeated registrations from some driver
if (state == RecoveryState.STANDBY) {
// ignore, don't send response
} else {
logInfo("Registering app " + description.name)
// Build the ApplicationInfo, wrapping the application description and the driver's RPC endpoint
val app = createApplication(description, driver)
// Register the application in the Master's internal bookkeeping
registerApplication(app)
logInfo("Registered app " + description.name + " with ID " + app.id)
// Persist the application's metadata with the persistence engine so it can be recovered after a Master failover
persistenceEngine.addApplication(app)
// Tell the driver side that registration is complete
driver.send(RegisteredApplication(app.id, self))
schedule()
}
Back in StandaloneAppClient, the RegisteredApplication reply from the Master is handled in receive:
override def receive: PartialFunction[Any, Unit] = {
case RegisteredApplication(appId_, masterRef) =>
// FIXME How to handle the following cases?
// 1. A master receives multiple registrations and sends back multiple
// RegisteredApplications due to an unstable network.
// 2. Receive multiple RegisteredApplication from different masters because the master is
// changing.
appId.set(appId_)
registered.set(true)
master = Some(masterRef)
listener.connected(appId.get)
// ... the remaining cases (e.g. ApplicationRemoved, ExecutorAdded) are omitted here
}