Spark-submit提交作业

最新推荐文章于 2024-08-11 23:51:02 发布

U2014684

最新推荐文章于 2024-08-11 23:51:02 发布

阅读量533

点赞数

CC 4.0 BY-SA版权

分类专栏： Spark

本文链接：https://blog.youkuaiyun.com/m0_38028800/article/details/88795235

Spark 专栏收录该内容

3 篇文章

订阅专栏

本文介绍了如何使用`spark-submit`命令提交Spark作业，包括常见的命令实例和部分关键参数的解释，详细参考资料链接见官方文档。

提交命令：

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

常用命令举例：

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

# Run on a Kubernetes cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://xx.yy.zz.ww:443 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000

部分命令参数解释：

--class: 应用程序入口，即main函数所在类
--master: master节点的URL (e.g. spark://23.195.26.187:7077)
--deploy-mode: 部署模式，client和cluster两种，默认是client
--conf: 配置参数
application-jar: 应用和所有依赖jar的路径（必须对所有集群节点可见，hdfs:// path or a file:// path)
application-arguments: 传递到主函数的参数

参考：https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit