Spark Source Code Reading, Part 1: Startup Scripts

This post walks through compiling Spark (tag v1.4.1, since Spark 1.5 requires Maven 3.3.3): setting up the Maven version, switching the build to Scala 2.11, and adjusting the Java memory options. It also collects Spark's default configuration, common commands, an FAQ, and a trace of the startup script execution.


Spark Compile

Help Links

# Spark 1.5 requires Maven 3.3.3, so I track branch-1.4 instead
git branch -a
git checkout --track origin/branch-1.4
git tag 
git checkout v1.4.1

# Build for Scala 2.11
./dev/change-version-to-2.11.sh 

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Edit ~/sql/catalyst/pom.xml and replace the quasiquotes_2.10 artifactId name
mvn clean package -DskipTests -Pscala-2.11 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0
# Other useful profiles: -Psbt -Pjava8-tests -Phive-thriftserver -Ptest-java-home

# Build a runnable distribution
./make-distribution.sh --name custom-spark --tgz -DskipTests -Pscala-2.11 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0  
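
A quick sanity check of the resulting tarball; the file name below assumes make-distribution.sh's spark-<version>-bin-<name>.tgz naming scheme:

# List the archive contents to confirm the assembly jar and scripts made it in
tar -tzf spark-1.4.1-bin-custom-spark.tgz | head -20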

Note

  • If compilation fails in the hive-thriftserver module, add the following dependency to its pom:

<dependency>
  <groupId>jline</groupId>
  <artifactId>jline</artifactId>
  <version>0.9.94</version>
</dependency>

Configuration

Examples

# Pick one master: standalone, or YARN client mode
spark.master                     spark://master:7077
# spark.master                   yarn-client
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://dmp.zamplus.net:9000/logs/spark
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              2g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

spark.yarn.jar                    hdfs://dmp.zamplus.net:9000/libs/spark-assembly-1.4.1-hadoop2.4.0.jar
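
For spark.yarn.jar to resolve, the assembly jar must already exist at that HDFS path; a minimal sketch using the paths from the example above:

# Upload the assembly once so YARN containers can fetch it from HDFS
hdfs dfs -mkdir -p hdfs://dmp.zamplus.net:9000/libs
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.4.1-hadoop2.4.0.jar hdfs://dmp.zamplus.net:9000/libs/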

Help

Spark Default Configuration

  • executors
    • --num-executors (default: 2)
    • --executor-cores (default: 1)
  • memory
    • --driver-memory 4g
    • --executor-memory 2g
  • Java opts
    • -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
    • spark.driver.extraJavaOptions -XX:PermSize=128M -XX:MaxPermSize=256M (same as --driver-java-options on the command line)
  • spark.serializer
    • default: org.apache.spark.serializer.JavaSerializer; org.apache.spark.serializer.KryoSerializer is the usual recommendation
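
As a sketch, here is how these options combine on one command line (the application class and jar are hypothetical placeholders):

# Hypothetical YARN client-mode submission combining the options above
$SPARK_HOME/bin/spark-submit \
  --master yarn-client \
  --num-executors 4 \
  --executor-cores 2 \
  --driver-memory 4g \
  --executor-memory 2g \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class com.example.MyApp \
  my-app.jar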

[TODO]

  • I don't know why, but when I use the spark-shell script I must add the parameter -Dspark.master=spark://dmp.zamplus.net:7077. This really puzzled me.
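
A workaround sketch, assuming the standalone master at dmp.zamplus.net:7077; setting the master once in conf/spark-defaults.conf should make the extra parameter unnecessary:

# Pass the master explicitly on the command line...
$SPARK_HOME/bin/spark-shell --master spark://dmp.zamplus.net:7077
# ...or persist it in conf/spark-defaults.conf so spark-shell picks it up
echo "spark.master spark://dmp.zamplus.net:7077" >> $SPARK_HOME/conf/spark-defaults.conf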

Startup script execution

  • $SPARK_HOME/bin/spark-shell
  • $SPARK_HOME/bin/spark-submit --class org.apache.spark.repl.Main
  • $SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main
  • $JAVA_HOME/bin/java -cp $SPARK_HOME/lib/spark-assembly-1.4.1-hadoop2.4.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main
  • $JAVA_HOME/bin/java -cp $SPARK_HOME/conf/:$SPARK_HOME/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:$SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar:$SPARK_HOME/lib/datanucleus-core-3.2.10.jar:$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar:/home/wankun/hadoop/etc/hadoop/ -Xms2g -Xmx2g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main spark-shell
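
A quick way to verify this chain without tracing the scripts by hand: the launcher honors the SPARK_PRINT_LAUNCH_COMMAND environment variable (checked in org.apache.spark.launcher.Main) and prints the final command.

# Echo the fully expanded java command line before the JVM starts
SPARK_PRINT_LAUNCH_COMMAND=1 $SPARK_HOME/bin/spark-shell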

Notes

  • The commands output by org.apache.spark.launcher.Main are separated by '\0'.
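
This is how bin/spark-class consumes the launcher output; a simplified sketch of that read loop (RUNNER and LAUNCH_CLASSPATH are the variables the script sets up earlier):

# Read the '\0'-separated arguments emitted by launcher.Main into an array, then exec
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
exec "${CMD[@]}"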

FAQ

  • Q1

Invalid initial heap size: -Xms2g
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

This error is caused by a bad entry in spark-defaults.conf: there were two spaces after the spark.driver.memory parameter, and the stray whitespace ended up inside the -Xms option passed to the JVM.
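
A quick way to spot such invisible whitespace problems in the config file:

# Make all whitespace visible: tabs show as ^I, line ends as $
cat -A $SPARK_HOME/conf/spark-defaults.conf
# Or flag lines with trailing whitespace directly
grep -n '[[:space:]]$' $SPARK_HOME/conf/spark-defaults.conf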
