Is the driver a process?
The short answer:
1. Standalone mode:
- client mode: the driver is a thread started inside the spark-submit process, which then runs the driver code's main method via reflection.
- cluster mode: a separate DriverWrapper process is started to run the driver.
2. Yarn mode:
- client mode: the driver is a thread started inside the spark-submit process, which then runs the driver code's main method via reflection.
- cluster mode: the driver is a thread started inside the ApplicationMaster process, which runs the driver code's main method via reflection.
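The client-mode behavior above can be sketched in plain Java: the submitting JVM loads the user's main class and invokes its static `main` method via reflection, so the "driver" is just code running inside that same JVM. This is a simplified sketch, not actual Spark source; `UserApp` is a made-up stand-in for the user's driver class.

```java
import java.lang.reflect.Method;

// Simplified sketch of what spark-submit does in client mode:
// load the user's class and invoke its static main via reflection.
// "UserApp" is a stand-in for the user's driver class, not Spark code.
public class ReflectiveSubmitSketch {
    public static class UserApp {
        static boolean ran = false;
        public static void main(String[] args) {
            ran = true; // driver logic would run here, in the same JVM
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> mainClass = Class.forName("ReflectiveSubmitSketch$UserApp");
        Method mainMethod = mainClass.getMethod("main", String[].class);
        // Invoked in the current process -- no new process is created,
        // which is why jps shows only SparkSubmit hosting the driver.
        mainMethod.invoke(null, (Object) new String[0]);
        System.out.println("driver ran in-process: " + UserApp.ran);
    }
}
```

The key point: reflection runs the user code on a thread of the existing JVM, so in client mode no extra driver process ever appears.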
Standalone mode
- client mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop000:7077 \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.4.2.jar 1000
Processes running:
[hadoop@hadoop000 ~]$ jps
16610 CoarseGrainedExecutorBackend
15156 Worker
15062 Master
16551 SparkSubmit
16713 Jps
- cluster mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop000:7077 \
--deploy-mode cluster \
./examples/jars/spark-examples_2.11-2.4.2.jar 1000
Processes started. Initially:
[hadoop@hadoop000 ~]$ jps
16416 CoarseGrainedExecutorBackend
15156 Worker
16309 SparkSubmit
15062 Master
16348 DriverWrapper
16476 Jps
A few seconds later, the SparkSubmit process exits and the shell shows no running logs, because the driver now lives in the separate DriverWrapper process:
[hadoop@hadoop000 ~]$ jps
16209 CoarseGrainedExecutorBackend
15156 Worker
16276 Jps
15062 Master
16141 DriverWrapper
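In cluster mode, by contrast, the Worker launches a brand-new JVM (DriverWrapper) to host the driver. The idea can be sketched with `ProcessBuilder`; the command below is a placeholder, not how the Worker actually assembles its `java ... DriverWrapper ...` command line.

```java
import java.io.IOException;

// Simplified sketch: a parent process (think: Worker) launching a
// separate child JVM (think: DriverWrapper) and waiting for it to exit.
public class LaunchDriverProcessSketch {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // Placeholder command ("java -version"); the real Worker builds a
        // full command launching org.apache.spark.deploy.worker.DriverWrapper.
        ProcessBuilder pb = new ProcessBuilder(
            System.getProperty("java.home") + "/bin/java", "-version");
        pb.inheritIO(); // forward the child's output to our console
        Process driver = pb.start();
        int exit = driver.waitFor();
        System.out.println("driver process exited with code " + exit);
    }
}
```

Because the driver is a separate process here, the submitting SparkSubmit JVM can exit as soon as the submission is accepted, which matches the jps output above.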
Yarn mode
- client mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.4.3.jar 1000
Processes running (note: in yarn-client mode the ApplicationMaster runs as the ExecutorLauncher process, while the driver stays inside SparkSubmit):
[hadoop@hadoop000 ~]$ jps
18740 ExecutorLauncher
16949 ResourceManager
17061 NodeManager
17813 SecondaryNameNode
18021 SparkSubmit
18917 CoarseGrainedExecutorBackend
17640 DataNode
17500 NameNode
18940 Jps
18846 CoarseGrainedExecutorBackend
- cluster mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.11-2.4.3.jar 1000
Processes running. The shell only shows the job's final status, not its output:
[hadoop@hadoop000 ~]$ jps
21041 Jps
16949 ResourceManager
17061 NodeManager
17813 SecondaryNameNode
17640 DataNode
20777 ApplicationMaster
20026 SparkSubmit
20923 CoarseGrainedExecutorBackend
17500 NameNode
21006 CoarseGrainedExecutorBackend
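Because in yarn-cluster mode the driver runs as a thread inside the ApplicationMaster on the cluster, the job's output (e.g. SparkPi's result line) ends up in the driver's container log rather than in the submitting shell. It can be retrieved afterwards with the standard YARN log command (the application id below is a placeholder):

```shell
# Fetch aggregated logs for the finished application; the driver's
# stdout is in the ApplicationMaster container's log.
yarn logs -applicationId <application_id>
```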
For source-level details, these two posts explain the flow well:
【Spark】部署流程的深度了解
Spark源码 —— 从 SparkSubmit 到 Driver启动