When SparkSubmit submits a job, the SparkSubmitCommandBuilder class that builds the command in launcher.main calls parse() on its OptionParser class to parse the submitted arguments. OptionParser extends SparkSubmitOptionParser, and the parse method it calls is inherited directly from SparkSubmitOptionParser.
private class OptionParser extends SparkSubmitOptionParser {...}
When SparkSubmit actually carries out the submission, parseArguments() in the SparkSubmit class creates an instance of SparkSubmitArguments, which extends the abstract class SparkSubmitArgumentsParser, and SparkSubmitArgumentsParser in turn extends SparkSubmitOptionParser.
So SparkSubmitOptionParser is the class that really parses the arguments. Let's walk through the parsing process and what it produces.
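To make the callback protocol concrete: SparkSubmitOptionParser.parse() walks the argument list and reports what it finds through three overridable callbacks, handle(opt, value) for every recognized option, handleUnknown(opt) for the first token that is not a known option (normally the application jar), and handleExtraArgs(extra) for everything after that. OptionParser and SparkSubmitArguments are simply two different implementations of these callbacks. Below is a minimal sketch of such a subclass; EchoParser and its run() helper are invented names for illustration only, and the class sits in the org.apache.spark.launcher package because the parser class is package-private (which is exactly why SparkSubmitArgumentsParser lives there too).

package org.apache.spark.launcher

import java.util.{List => JList}
import scala.collection.JavaConverters._

// Hypothetical subclass, only to show the callback protocol driven by parse()
class EchoParser extends SparkSubmitOptionParser {

  // Called once for each recognized option, e.g. handle("--master", "local[4]");
  // returning true tells parse() to keep going
  override protected def handle(opt: String, value: String): Boolean = {
    println(s"option: $opt -> $value")
    true
  }

  // Called for the first token that is not a known option (usually the application jar);
  // returning false stops option parsing right here
  override protected def handleUnknown(opt: String): Boolean = {
    println(s"primary resource (or unknown option): $opt")
    false
  }

  // Receives whatever is left after parsing stopped, i.e. the application's own arguments
  override protected def handleExtraArgs(extra: JList[String]): Unit = {
    println("application args: " + extra.asScala.mkString(" "))
  }

  // parse() is protected final in the Java base class, so expose a small driver
  def run(args: Seq[String]): Unit = parse(args.asJava)
}

Calling new EchoParser().run(Seq("--master", "local[4]", "--name", "demo", "myApp.jar", "100")) prints the callbacks in the order parse() fires them, which is the same protocol SparkSubmitArguments relies on below.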
When we submit a job, the arguments typically look like this:
spark-submit \
--name "appName" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
myApp.jar
For example, when running SparkPi from a local machine, the arguments are:
--class org.apache.spark.examples.SparkPi
--master spark://192.168.2.1:7077
D:\spark\spark-2.4.3\examples\target\original-spark-examples_2.11-2.4.3.jar
The full list of supported options can be found in the official spark-submit documentation.
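When these are set as program arguments in an IDE, SparkSubmit.main receives them as a plain Array[String]. Roughly, as a sketch built from the values above (with the Windows backslashes escaped for Scala source):

// What SparkSubmit.main(args) sees for the SparkPi example: option names and values
// alternate, and the application jar comes last as the "primary resource"
val args: Array[String] = Array(
  "--class", "org.apache.spark.examples.SparkPi",
  "--master", "spark://192.168.2.1:7077",
  "D:\\spark\\spark-2.4.3\\examples\\target\\original-spark-examples_2.11-2.4.3.jar"
)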
Let's start with parseArguments() in SparkSubmit.
As you can see, parseArguments() simply returns an instance of the SparkSubmitArguments class: it creates new SparkSubmitArguments(args) here.
object SparkSubmit extends CommandLineUtils with Logging {
  ...
  override def main(args: Array[String]): Unit = {
    // main() builds an anonymous subclass of SparkSubmit; `self` is an alias for that instance
    val submit = new SparkSubmit() {
      self =>

      // Override class SparkSubmit's argument-parsing method
      override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
        // Create a SparkSubmitArguments instance; its constructor body runs the whole
        // parsing and initialization flow shown below
        new SparkSubmitArguments(args) {
          // Delegate logging to the enclosing instance (`self`), so messages are recorded
          // by org.apache.spark.deploy.SparkSubmit
          override protected def logInfo(msg: => String): Unit = self.logInfo(msg)
          override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
        }
      }
      ...
    }
    submit.doSubmit(args)
  }
}
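For context on what happens after parsing: main() ends by calling submit.doSubmit(args), which first builds the SparkSubmitArguments instance via parseArguments() and then dispatches on its action field. Simplified from the Spark 2.4 source:

// Simplified from SparkSubmit.doSubmit (Spark 2.4)
def doSubmit(args: Array[String]): Unit = {
  val uninitLog = initializeLogIfNecessary(true, silent = true)
  // parseArguments -> new SparkSubmitArguments(args), as shown above
  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  // action is set while parsing (--kill / --status) and defaults to SUBMIT otherwise
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}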
Stepping into the SparkSubmitArguments class, it starts with a long list of fields; these are the parameters required by the various run modes.
/**
* Parses and encapsulates arguments from the spark-submit script.
* The env argument is used for testing.
*/
private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env)
  extends SparkSubmitArgumentsParser with Logging {
  var master: String = null
  var deployMode: String = null
  var executorMemory: String = null
  var executorCores: String = null
  var totalExecutorCores: String = null
  var propertiesFile: String = null
  var driverMemory: String = null
  var driverExtraClassPath: String = null
  var driverExtraLibraryPath: String = null
  var driverExtraJavaOptions: String = null
  var queue: String = null
  var numExecutors: String = null
  var files: String = null
  var archives: String = null
  var mainClass: String = null
  var primaryResource: String = null
  var name: String = null
  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
  var jars: String = null
  var packages: String = null
  var repositories: String = null
  var ivyRepoPath: String = null
  var ivySettingsPath: Option[String] = None
  var packagesExclusions: String = null
  var verbose: Boolean = false
  var isPython: Boolean = false
  var pyFiles: String = null
  var isR: Boolean = false
  var action: SparkSubmitAction = null
  // sparkProperties is a HashMap that every parsed Spark setting eventually ends up in
  val sparkProperties: HashMap[String, String] = new HashMap[String, String]()
  var proxyUser: String = null
  var principal: String = null
  var keytab: String = null
  private var dynamicAllocationEnabled: Boolean = false
  // Standalone cluster mode only
  var supervise: Boolean = false
  var driverCores: String = null
  var submissionToKill: String = null
  var submissionToRequestStatusFor: String = null
  var useRest: Boolean = false // used internally
  /** Default properties present in the currently defined defaults file. */
  // Read the settings from the default properties file.
  // A lazy val is evaluated only once, on first access (a plain val is evaluated at
  // definition time), so the file is not read at all unless these defaults are needed
  lazy val defaultSparkProperties: HashMap[String, String] = {
    // Map that will hold the defaults read from the file
    val defaultProperties = new HashMap[String, String]()
    if (verbose) {
      logInfo(s"Using properties file: $propertiesFile")
    }
    // Parse the file into key/value pairs
    Option(propertiesFile).foreach { filename =>
      val properties = Utils.getPropertiesFromFile(filename)
      properties.foreach { case (k, v) =>
        // Put each key/value pair into the map
        defaultProperties(k) = v
      }
      // Property files may contain sensitive information, so redact before printing
      // Utils.redact masks sensitive values (e.g. passwords, keys, tokens) before they are logged
      if (verbose) {
        Utils.redact(properties).foreach { case (k, v) =>
          logInfo(s"Adding default property: $k=$v")
        }
      }
    }
    // Return the parsed defaults
    defaultProperties
  }
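  // Example only (not part of the Spark source): the properties file read by
  // Utils.getPropertiesFromFile above is an ordinary Java-properties-style file
  // such as conf/spark-defaults.conf, for instance:
  //   spark.master              spark://192.168.2.1:7077
  //   spark.eventLog.enabled    false
  //   spark.serializer          org.apache.spark.serializer.KryoSerializer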
  // Set parameters from command line arguments
  // This call goes straight to parse() in SparkSubmitOptionParser (the same parse() we walked
  // through for launcher.main); .asJava converts the Scala Seq into a java.util.List because
  // SparkSubmitOptionParser is a Java class
  parse(args.asJava)

  // Populate `sparkProperties` map from properties file
  // (the defaults parsed above are merged in without overwriting values already set via --conf;
  // the method is defined just below)
  mergeDefaultSparkProperties()

  // Remove keys that don't start with "spark." from `sparkProperties`.
  ignoreNonSparkProperties()

  // Use `sparkProperties` map along with env vars to fill in any missing parameters
  loadEnvironmentArguments()

  // Decide from the properties whether the REST submission gateway should be used
  useRest = sparkProperties.getOrElse("spark.master.rest.enabled", "false").toBoolean

  // Validate the arguments
  validateArguments()

  /**
   * Merge values