Spark 2.2 Source Code Reading Order
1. Spark 2.2 Source Code Analysis: Spark-Submit Job Submission
2. Spark 2.2 Source Code Analysis: Driver Registration and Startup
After the client submits a job with the spark-submit command, a series of operations runs inside the spark-submit process (part 0 in the diagram).
Roughly, what happens after the Spark cluster starts is as follows:
Overview of the main steps:
1. Run the spark-submit script, prepare the arguments, and choose a cluster manager.
2. Start the driver, register the application, start the executors, then split and distribute the tasks.
3. Return (or persist) the computation results; the Spark job is then complete.
1. The user submits a Spark job with a command like the following:
./bin/spark-submit \
--class cn.face.cdp.run.WordCount \
--master spark://192.168.1.11:7077 \
--deploy-mode cluster \
--executor-memory 4G \
--total-executor-cores 20 \
--supervise \
./face.jar \
city
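For reference, the submitted class cn.face.cdp.run.WordCount might look like the minimal sketch below. This is purely hypothetical (the application's code is not part of the Spark source we are reading); the input and output paths are assumptions, and the "city" argument is taken to name an input dataset:

import org.apache.spark.sql.SparkSession

// Hypothetical skeleton of the submitted user application; the real
// cn.face.cdp.run.WordCount is not shown in this article.
object WordCount {
  def main(args: Array[String]): Unit = {
    val dataset = args(0) // e.g. "city", the argument passed after the jar
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    spark.sparkContext
      .textFile(s"hdfs:///data/$dataset.txt")            // assumed input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(s"hdfs:///output/$dataset-counts") // assumed output path
    spark.stop()
  }
}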
Internally, this shell script executes the main method of a JVM class:
export PYTHONHASHSEED=0
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
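As an aside, the same submission can also be issued programmatically through Spark's public launcher API, which forks a spark-submit process much like the shell command above. A minimal sketch, reusing the values from the example command:

import org.apache.spark.launcher.SparkLauncher

// Programmatic equivalent of the shell submission above, using the public
// launcher API; under the hood it forks a spark-submit process.
object LaunchWordCount {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("./face.jar")
      .setMainClass("cn.face.cdp.run.WordCount")
      .setMaster("spark://192.168.1.11:7077")
      .setDeployMode("cluster")
      .setConf("spark.executor.memory", "4g")
      .setConf("spark.cores.max", "20")          // equivalent of --total-executor-cores
      .setConf("spark.driver.supervise", "true") // equivalent of --supervise
      .addAppArgs("city")
      .launch()
    process.waitFor() // wait for the forked spark-submit process to exit
  }
}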
Back to the shell path: spark-class builds a java command line with this class as the entry point and exec's it, so this class's main method is the place to look:
org.apache.spark.deploy.SparkSubmit
override def main(args: Array[String]): Unit = {
  // Parse and validate the command-line arguments, returning them wrapped
  // in a SparkSubmitArguments object
  val appArgs = new SparkSubmitArguments(args)
  ...
  // Match on the requested action; a normal submission takes the SUBMIT case
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
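Where does appArgs.action get its value? In SparkSubmitArguments, the action defaults to SUBMIT when no kill or status request was parsed from the command line (the source does this with Option(action).getOrElse(SUBMIT)). The following is a minimal, self-contained model of that dispatch, not the actual Spark code:

// Simplified, self-contained model of the action dispatch above;
// not the real Spark source.
object SubmitDispatchSketch {
  sealed trait Action
  case object Submit extends Action
  case object Kill extends Action
  case object RequestStatus extends Action

  // Mirrors the idea in SparkSubmitArguments: --kill selects KILL,
  // --status selects REQUEST_STATUS, everything else defaults to SUBMIT.
  def resolveAction(args: Array[String]): Action =
    if (args.contains("--kill")) Kill
    else if (args.contains("--status")) RequestStatus
    else Submit

  def main(args: Array[String]): Unit = resolveAction(args) match {
    case Submit        => println("would call submit(appArgs)")
    case Kill          => println("would call kill(appArgs)")
    case RequestStatus => println("would call requestStatus(appArgs)")
  }
}

In the normal case we take the SUBMIT branch, which calls into submit: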
private def submit(args: SparkSubmitArguments): Unit = {