spark-submit
在spark安装目录的bin目录下有一个spark-submit脚本,可以用来提交运行spark程序
如果配置了spark的path可以直接使用spark-submit命令
编译构建spark程序
使用sbt 或者maven构建程序生成jar包
spark-submit的使用
spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
--class: 要运行的jar包里的类,比如 test.spark.examples
--master: master的地址 比如 spark://23.195.26.187:7077
--deploy-mode: 部署模式
--conf: 运行时的一些配置 “key=value”类型
application-jar: 要运行的jar包路径,可以是hdfs:// 开头或者 file:// 开头。比如:/root/program/spark/test.jar
application-arguments: 要传给运行类主方法的参数,没有可以不传
例子
# 本地运行,使用8个核心,传入参数100
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \ # can be client for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
http://path/to/examples.jar \
1000
例子:
程序:
路径:
/root/worspace/test-1.0.jar
命令:
spark-submit --class SparkSQLExample --master local /root/worspace/test-1.0.jar
结果:
部分输出如下
17/10/09 17:58:20 INFO DAGScheduler: ResultStage 9 (show at SparkSQLExample.scala:104) finished in 0.027 s
17/10/09 17:58:20 INFO DAGScheduler: Job 7 finished: show at SparkSQLExample.scala:104, took 0.044894 s
+--------------------+----+-------+
| _corrupt_record| age| name|
+--------------------+----+-------+
| null|null|Michael|
| null| 30| Andy|
| null| 19| Justin|
|spark-submit --cl...|null| null|
| 100|null| null|
+--------------------+----+-------+