Cloudera's Spark distribution does not ship with Spark SQL support. Here I take the Spark in CDH 5.4.1 as an example of how to enable Spark SQL.
The overall steps are: update the Hive version, resolve compile issues, and update the Spark package.
- Update Hive version: CDH 5.4.1 uses Hive 1.1, while Apache Spark at this point only supports Hive 0.12 and 0.13, so switch the Hive version to 0.13.1a:
pom.xml:
- <hive.group>org.apache.hive</hive.group>
+ <hive.group>org.spark-project.hive</hive.group>
- <hive.version>${cdh.hive.version}</hive.version>
+ <hive.version>0.13.1a</hive.version>
- <hive.version.short>1.1.0</hive.version.short>
+ <hive.version.short>0.13.1</hive.version.short>
<profile>
<id>hive-0.13.1</id>
<properties>
+ <hive.group>org.spark-project.hive</hive.group>
<hive.version>0.13.1a</hive.version>
- Resolve compile issues:
pom.xml:
+ <dependency>
+ <groupId>jline</groupId>
+ <artifactId>jline</artifactId>
+ <version>0.9.94</version>
+ </dependency>
sql/hive-thriftserver/pom.xml
+ <dependency>
+ <groupId>jline</groupId>
+ <artifactId>jline</artifactId>
+ <version>0.9.94</version>
+ </dependency>
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
- // SessionManager.init() initializes the log manager but this method never actually calls
- // super.init(), so fix that here.
- val logManager = new LogManager()
- setSuperField(this, "logManager", logManager)
- addService(logManager)
-
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala
- import org.apache.hive.com.esotericsoftware.kryo.Kryo
+ import com.esotericsoftware.kryo.Kryo
- Update spark package:
- Repackage Spark: ./make-distribution.sh --tgz -Pyarn -Phive -Phive-thriftserver
- Copy hive-site.xml to ${SPARK_HOME}/conf
- Update the ${SPARK_HOME}/lib directory with the newly built jars
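The hive-site.xml copied into ${SPARK_HOME}/conf usually only needs to tell Spark SQL where the Hive metastore lives; a minimal sketch is below (the metastore host and port here are placeholders, not values from this setup -- use your cluster's actual metastore URI):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder metastore URI; replace with your cluster's metastore host/port. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```

With this in place, spark-sql and the thrift server will read table definitions from the existing Hive metastore instead of creating a local Derby one.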
This article has shown how to enable Spark SQL for the Spark shipped in CDH 5.4.1: update the Hive version to 0.13.1a, resolve the resulting compile issues, and repackage Spark. These steps let Spark SQL run correctly on CDH 5.4.1.