Environment Setup
Maven-3.3.3
JDK 7u79
Scala 2.10.6
Hive 2.0.1
Spark 1.5.0 source
Hadoop 2.6.4
The Hive and Spark versions must match, so check spark.version in the pom.xml of the Hive source code to determine which Spark version to use.
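For example, from the root of the Hive source tree (a quick sketch; the exact value depends on the Hive release):

# check which Spark version this Hive release expects
grep -m1 '<spark.version>' pom.xml
# for Hive 2.0.1 this should print something like <spark.version>1.5.0</spark.version>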
Note that you must have a version of Spark that does not include the Hive jars, i.e. one that was not built with the Hive profile.
Note: the pre-built Spark 2.x packages on the Spark download page all bundle Hive, so to use Hive on Spark you must download the Spark source and build it yourself.
Recommended pairings: hive-1.2.1 on spark-1.3.1 / hive-2.0.1 on spark-1.5.2
Building Spark
By default Spark is built with Scala 2.10.4.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.6 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn
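As a sanity check against the note above, you can confirm the resulting distribution does not bundle the Hive classes (a rough sketch; the assembly jar name and path depend on your build options):

# dist/ is created by make-distribution.sh; the exact jar name varies by build
jar tf dist/lib/spark-assembly-*.jar | grep -c 'org/apache/hadoop/hive'
# expected output: 0 (no Hive classes bundled)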
If building with Scala 2.11.x:
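A sketch following the standard Spark 1.5.x build procedure for Scala 2.11 (switch the POMs to 2.11 first, then pass -Dscala-2.11):

# switch the build to Scala 2.11 before compiling
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn -Dscala-2.11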