I. Launching the driver
1. Everything starts from the schedule() method in Master.scala. Inside it, two methods, launchDriver() and launchExecutor(), are responsible for starting drivers and executors respectively. schedule() is invoked on the Master whenever the available resources change or a new application is submitted.
2. All waiting drivers are scheduled first; they take strict precedence over applications. A driver is started on a worker via launchDriver(worker, driver) only when that worker's free memory is at least the memory the driver requires and its free cores are at least the cores the driver requires. Once launched, the driver is removed from the waiting-driver list (an ArrayBuffer) and the local launched flag is set to true.
def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // Send the LaunchDriver case class to the Worker node via the Akka actor messaging model
  worker.actor ! LaunchDriver(driver.id, driver.desc)
  driver.state = DriverState.RUNNING
}
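To make the scheduling loop concrete, here is a self-contained, simplified sketch of the FIFO driver-placement logic described above. WorkerSlot and DriverReq are hypothetical stand-ins for Spark's WorkerInfo and DriverInfo; the real schedule() also shuffles the alive workers first, which is omitted here:

import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for Spark's WorkerInfo and DriverInfo
case class WorkerSlot(id: String, var coresFree: Int, var memoryFree: Int)
case class DriverReq(id: String, cores: Int, mem: Int)

object DriverScheduleSketch {
  def main(args: Array[String]): Unit = {
    val workers = Seq(WorkerSlot("w1", 4, 4096), WorkerSlot("w2", 2, 1024))
    val waitingDrivers = ArrayBuffer(DriverReq("d1", 1, 2048), DriverReq("d2", 2, 2048))

    // FIFO over the waiting list: place each driver on the first worker
    // whose free memory and free cores cover the driver's requirements
    for (driver <- waitingDrivers.toList; worker <- workers) {
      if (waitingDrivers.contains(driver) &&
          worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
        println(s"launching ${driver.id} on ${worker.id}")
        worker.coresFree -= driver.cores
        worker.memoryFree -= driver.mem
        waitingDrivers -= driver // mirrors removal from the real waiting-driver ArrayBuffer
      }
    }
  }
}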
3. The Worker receives the LaunchDriver message sent by the Master (Worker.scala):
case LaunchDriver(driverId, driverDesc) => {
  logInfo(s"Asked to launch driver $driverId")
  val driver = new DriverRunner(
    conf,
    driverId,
    workDir,
    sparkHome,
    driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
    self,
    akkaUrl)
  drivers(driverId) = driver
  driver.start()
  coresUsed += driverDesc.cores
  memoryUsed += driverDesc.mem
}
Inside the LaunchDriver handler, the Worker creates a DriverRunner object (which internally wraps a thread), registers it in the Worker's driver map (a HashMap from driverId to DriverRunner), calls the DriverRunner's start() method, and then adds the driver's cores and memory to the Worker's used totals.
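The register-start-account pattern is easy to see in isolation. Below is a minimal, hypothetical sketch (RunnerStub stands in for DriverRunner) of the bookkeeping the handler performs:

import scala.collection.mutable.HashMap

// RunnerStub is a hypothetical stand-in for DriverRunner
class RunnerStub(val driverId: String) {
  def start(): Unit = println(s"runner thread for $driverId started")
}

object WorkerBookkeepingSketch {
  val drivers = new HashMap[String, RunnerStub]() // driverId -> runner, as in Worker.scala
  var coresUsed = 0
  var memoryUsed = 0

  def onLaunchDriver(driverId: String, cores: Int, mem: Int): Unit = {
    val runner = new RunnerStub(driverId)
    drivers(driverId) = runner // register first, then start, same order as the real handler
    runner.start()
    coresUsed += cores // account for the resources the driver now occupies
    memoryUsed += mem
  }

  def main(args: Array[String]): Unit = {
    onLaunchDriver("driver-20150101000000-0001", 1, 1024)
    println(s"coresUsed=$coresUsed, memoryUsed=$memoryUsed MB")
  }
}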
4. Now look at driver.start() (DriverRunner.scala).
This method's main responsibilities: ① create the driver's working directory; ② download the user jar from the master (move the computation to the data, not the data to the computation); ③ call the internal launchDriver(builder, driverDir, supervise) method, which drives a java.lang.ProcessBuilder (the Java API for spawning and controlling an OS process) to actually start the driver. This is where the driver comes to life:

def start() = {
  new Thread("DriverRunner for " + driverId) {
    override def run() {
      try {
        val driverDir = createWorkingDirectory()
        val localJarFilename = downloadUserJar(driverDir)

        def substituteVariables(argument: String): String = argument match {
          case "{{WORKER_URL}}" => workerUrl
          case "{{USER_JAR}}" => localJarFilename
          case other => other
        }

        // TODO: If we add ability to submit multiple jars they should also be added here
        // The launch command assembled here is analogous to:
        //   storm jar jar-path classpath parameters
        val builder = CommandUtils.buildProcessBuilder(driverDesc.command, driverDesc.mem,
          sparkHome.getAbsolutePath, substituteVariables)
        launchDriver(builder, driverDir, driverDesc.supervise)
      } catch {
        case e: Exception => finalException = Some(e)
      }

      // Determine the driver's final state (KILLED / ERROR / FINISHED / FAILED)
      val state =
        if (killed) {
          DriverState.KILLED
        } else if (finalException.isDefined) {
          DriverState.ERROR
        } else {
          finalExitCode match {
            case Some(0) => DriverState.FINISHED
            case _ => DriverState.FAILED
          }
        }

      finalState = Some(state)
      worker ! DriverStateChanged(driverId, state, finalException)
    }
  }.start()
}
Finally, the thread sends a DriverStateChanged(driverId, state, finalException) message to the Worker; on receiving it, the Worker forwards the same DriverStateChanged(driverId, state, finalException) message to the Master.
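Since the heavy lifting here is done by java.lang.ProcessBuilder, a tiny standalone example helps. This sketch (not Spark code) launches a child JVM the same way DriverRunner does, waits for it, and maps the exit code to a final state the way the finalExitCode match above does:

object DriverLaunchSketch {
  def main(args: Array[String]): Unit = {
    // Build and spawn an OS process with the plain JDK API
    val builder = new ProcessBuilder("java", "-version")
    builder.redirectErrorStream(true) // merge stderr into stdout for simplicity
    val process = builder.start()
    scala.io.Source.fromInputStream(process.getInputStream).getLines().foreach(println)
    // Block until the child exits, then map the exit code to a state
    val exitCode = process.waitFor()
    val state = if (exitCode == 0) "FINISHED" else "FAILED"
    println(s"driver process exited with code $exitCode -> $state")
  }
}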
II. Launching the executor
1. This mirrors launching a driver: once schedule() (Master.scala) finds that the necessary conditions hold (e.g. sufficient free resources), it calls launchExecutor().
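For reference, the standalone Master (with the default spark.deploy.spreadOut=true) hands out cores one at a time, round-robin across the usable workers, so an application is spread as widely as possible. A simplified, self-contained sketch of that strategy, using made-up numbers rather than the real data structures:

object SpreadOutSketch {
  def main(args: Array[String]): Unit = {
    val freeCores = Array(4, 3, 2) // free cores on each usable worker (made-up numbers)
    val assigned  = Array(0, 0, 0) // cores this app gets on each worker
    var coresLeft = 6              // cores the application still needs
    var pos = 0
    // One core per step, cycling through the workers, until the demand is met
    while (coresLeft > 0 && assigned.sum < freeCores.sum) {
      if (assigned(pos) < freeCores(pos)) {
        assigned(pos) += 1
        coresLeft -= 1
      }
      pos = (pos + 1) % freeCores.length
    }
    println(assigned.mkString("cores assigned per worker: [", ", ", "]"))
  }
}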
2. In Master.scala, launchExecutor(worker, exec) records the executor on the chosen worker, sends it a LaunchExecutor message, and registers the new executor with the application's driver via an ExecutorAdded message:

def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc) {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.actor ! LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
  // After launching the executor, register it with the application's driver
  exec.application.driver ! ExecutorAdded(
    exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
}
3. On the Worker side, the LaunchExecutor handler creates an ExecutorRunner object, which holds an internal Thread and encapsulates all the information needed to start the executor, calls its start() method, and reports an ExecutorStateChanged back to the Master (Worker.scala):

case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  ......
  val manager = new ExecutorRunner(...)
  manager.start()
  coresUsed += cores_
  memoryUsed += memory_
  master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
  ......
manager.start() (ExecutorRunner.scala) spawns a thread that runs fetchAndRunExecutor() and installs a JVM shutdown hook so that the child process is killed if the Worker itself shuts down:

def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = new Thread() {
    override def run() { killProcess(Some("Worker shutting down")) }
  }
  Runtime.getRuntime.addShutdownHook(shutdownHook)
}
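The shutdown hook above is the standard JDK mechanism; this minimal sketch shows the same pattern in isolation (killing a child process is replaced by a println):

object ShutdownHookSketch {
  def main(args: Array[String]): Unit = {
    // Registered hooks run as the JVM exits, which is how ExecutorRunner
    // gets a chance to kill its child executor process on Worker shutdown
    Runtime.getRuntime.addShutdownHook(new Thread() {
      override def run(): Unit = println("shutdown hook: kill the child process here")
    })
    println("main finished; the hook fires while the JVM shuts down")
  }
}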
fetchAndRunExecutor() builds the launch command, starts the executor process, redirects its output to files, and waits for it to exit:

def fetchAndRunExecutor() {
  try {
    // Launch the process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, memory,
      sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    logInfo("Launch command: " + command.mkString("\"", "\" \"", "\""))

    builder.directory(executorDir)
    builder.environment.put("SPARK_LOCAL_DIRS", appLocalDirs.mkString(","))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    val baseUrl =
      s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      command.mkString("\"", "\" \"", "\""), "=" * 40)

    // Redirect its stdout and stderr to files
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker ! ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode))
  } catch {
    case interrupted: InterruptedException => {
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    }
    case e: Exception => {
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
    }
  }
}
Its main work mirrors the driver path: create the executor's working directory, download the jar to run, and start the executor process, the process that actually executes the application, from within fetchAndRunExecutor(). When the process exits, an ExecutorStateChanged message is sent to the Worker.
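FileAppender is Spark-internal, but the effect it achieves, persisting the child's stdout/stderr under the executor's working directory, can be reproduced with the plain JDK redirection API. A hedged sketch (the directory name is made up):

import java.io.File

object RedirectSketch {
  def main(args: Array[String]): Unit = {
    val executorDir = new File("executor-demo") // hypothetical working directory
    executorDir.mkdirs()
    val builder = new ProcessBuilder("java", "-version")
    builder.directory(executorDir)
    // Send the child's output streams straight to files named like Spark's
    builder.redirectOutput(new File(executorDir, "stdout"))
    builder.redirectError(new File(executorDir, "stderr"))
    val exitCode = builder.start().waitFor()
    println(s"exited with $exitCode; see ${executorDir.getPath}/stdout and /stderr")
  }
}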
4. The Worker then forwards the received ExecutorStateChanged message to the Master.
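This two-hop state propagation (runner thread to Worker, Worker to Master) is the same for drivers and executors. A minimal sketch of the relay, with a local case class standing in for Spark's ExecutorStateChanged message and plain method calls standing in for actor sends:

// Hypothetical stand-in for Spark's ExecutorStateChanged message
case class StateChanged(appId: String, execId: Int, state: String,
                        message: Option[String], exitStatus: Option[Int])

object StateRelaySketch {
  def masterReceive(msg: StateChanged): Unit =
    println(s"master: ${msg.appId}/${msg.execId} is now ${msg.state} (${msg.message.getOrElse("")})")

  def workerReceive(msg: StateChanged): Unit = {
    // The Worker does its local cleanup, then forwards the same message upstream
    masterReceive(msg)
  }

  def main(args: Array[String]): Unit =
    workerReceive(StateChanged("app-1", 0, "EXITED", Some("Command exited with code 0"), Some(0)))
}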