[TOC]
To let SparkSQL on an Ambari-managed cluster read HBase-mapped tables through Hive, the configuration needs the following changes.
## Component versions

- Spark 2.3.0
- Hive 3.0.0
- HBase 2.0.0
- Ambari 2.7.1
- HDP 3.0
## Create symlinks

On every Spark client, add symlinks for the HBase jars in the Spark2 `jars` directory (a one-pass loop sketch follows the list):
```bash
ln -s /usr/hdp/current/hbase-client/lib/hbase-client.jar /usr/hdp/current/spark2-client/jars/hbase-client.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-protocol.jar /usr/hdp/current/spark2-client/jars/hbase-protocol.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-common.jar /usr/hdp/current/spark2-client/jars/hbase-common.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-server.jar /usr/hdp/current/spark2-client/jars/hbase-server.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-annotations.jar /usr/hdp/current/spark2-client/jars/hbase-annotations.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-backup.jar /usr/hdp/current/spark2-client/jars/hbase-backup.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-endpoint.jar /usr/hdp/current/spark2-client/jars/hbase-endpoint.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-examples.jar /usr/hdp/current/spark2-client/jars/hbase-examples.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-external-blockcache.jar /usr/hdp/current/spark2-client/jars/hbase-external-blockcache.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar /usr/hdp/current/spark2-client/jars/hbase-hadoop-compat.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar /usr/hdp/current/spark2-client/jars/hbase-hadoop2-compat.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-http.jar /usr/hdp/current/spark2-client/jars/hbase-http.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-it.jar /usr/hdp/current/spark2-client/jars/hbase-it.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-mapreduce.jar /usr/hdp/current/spark2-client/jars/hbase-mapreduce.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-metrics-api.jar /usr/hdp/current/spark2-client/jars/hbase-metrics-api.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-metrics.jar /usr/hdp/current/spark2-client/jars/hbase-metrics.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-procedure.jar /usr/hdp/current/spark2-client/jars/hbase-procedure.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-protocol-shaded.jar /usr/hdp/current/spark2-client/jars/hbase-protocol-shaded.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-replication.jar /usr/hdp/current/spark2-client/jars/hbase-replication.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-resource-bundle.jar /usr/hdp/current/spark2-client/jars/hbase-resource-bundle.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-rest.jar /usr/hdp/current/spark2-client/jars/hbase-rest.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-rsgroup.jar /usr/hdp/current/spark2-client/jars/hbase-rsgroup.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-shaded-client.jar /usr/hdp/current/spark2-client/jars/hbase-shaded-client.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-shaded-mapreduce.jar /usr/hdp/current/spark2-client/jars/hbase-shaded-mapreduce.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-spark.jar /usr/hdp/current/spark2-client/jars/hbase-spark.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-thrift.jar /usr/hdp/current/spark2-client/jars/hbase-thrift.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-zookeeper.jar /usr/hdp/current/spark2-client/jars/hbase-zookeeper.jar
ln -s /usr/hdp/current/hbase-client/lib/hbase-shaded-netty-2.1.0.jar /usr/hdp/current/spark2-client/jars/hbase-shaded-netty-2.1.0.jar
ln -s /usr/hdp/current/hbase-client/lib/metrics-core-3.2.1.jar /usr/hdp/current/spark2-client/jars/metrics-core-3.2.1.jar
ln -s /usr/hdp/3.0.1.0-187/hbase/lib/hbase-shaded-miscellaneous-2.1.0.jar /usr/hdp/current/spark2-client/jars/hbase-shaded-miscellaneous-2.1.0.jar
ln -s /usr/hdp/3.0.1.0-187/hbase/lib/hbase-shaded-protobuf-2.1.0.jar /usr/hdp/current/spark2-client/jars/hbase-shaded-protobuf-2.1.0.jar
```
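The same links can be created in one pass. Below is a minimal loop sketch, assuming the needed jars match `hbase-*.jar` under the HBase client `lib` directory; it links a superset of the list above, and `metrics-core-3.2.1.jar` still needs its own link:

```bash
# Link every HBase jar into the Spark2 jars directory (run on each Spark client).
for jar in /usr/hdp/current/hbase-client/lib/hbase-*.jar; do
  # -sfn replaces any stale link left over from a previous attempt
  ln -sfn "$jar" "/usr/hdp/current/spark2-client/jars/$(basename "$jar")"
done
```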
## Add Hive symlinks

Likewise, link the Hive jars into the Spark2 `jars` directory:
```bash
ln -s /usr/hdp/current/hive-server2/lib/hive-hbase-handler.jar /usr/hdp/current/spark2-client/jars/hive-hbase-handler.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-accumulo-handler.jar /usr/hdp/current/spark2-client/jars/hive-accumulo-handler.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-beeline.jar /usr/hdp/current/spark2-client/jars/hive-beeline.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-classification.jar /usr/hdp/current/spark2-client/jars/hive-classification.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-cli.jar /usr/hdp/current/spark2-client/jars/hive-cli.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-common.jar /usr/hdp/current/spark2-client/jars/hive-common.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-contrib.jar /usr/hdp/current/spark2-client/jars/hive-contrib.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-druid-handler.jar /usr/hdp/current/spark2-client/jars/hive-druid-handler.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-exec.jar /usr/hdp/current/spark2-client/jars/hive-exec.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-hcatalog-core.jar /usr/hdp/current/spark2-client/jars/hive-hcatalog-core.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-hcatalog-server-extensions.jar /usr/hdp/current/spark2-client/jars/hive-hcatalog-server-extensions.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-hplsql.jar /usr/hdp/current/spark2-client/jars/hive-hplsql.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-jdbc-handler.jar /usr/hdp/current/spark2-client/jars/hive-jdbc-handler.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-jdbc.jar /usr/hdp/current/spark2-client/jars/hive-jdbc.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-kryo-registrator.jar /usr/hdp/current/spark2-client/jars/hive-kryo-registrator.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-llap-client.jar /usr/hdp/current/spark2-client/jars/hive-llap-client.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-llap-common.jar /usr/hdp/current/spark2-client/jars/hive-llap-common.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-llap-ext-client.jar /usr/hdp/current/spark2-client/jars/hive-llap-ext-client.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-llap-server.jar /usr/hdp/current/spark2-client/jars/hive-llap-server.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-llap-tez.jar /usr/hdp/current/spark2-client/jars/hive-llap-tez.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-metastore.jar /usr/hdp/current/spark2-client/jars/hive-metastore.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-pre-upgrade.jar /usr/hdp/current/spark2-client/jars/hive-pre-upgrade.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-serde.jar /usr/hdp/current/spark2-client/jars/hive-serde.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-service-rpc.jar /usr/hdp/current/spark2-client/jars/hive-service-rpc.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-service.jar /usr/hdp/current/spark2-client/jars/hive-service.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-shims-common.jar /usr/hdp/current/spark2-client/jars/hive-shims-common.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-shims-scheduler.jar /usr/hdp/current/spark2-client/jars/hive-shims-scheduler.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-shims.jar /usr/hdp/current/spark2-client/jars/hive-shims.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-standalone-metastore.jar /usr/hdp/current/spark2-client/jars/hive-standalone-metastore.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-storage-api.jar /usr/hdp/current/spark2-client/jars/hive-storage-api.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-streaming.jar /usr/hdp/current/spark2-client/jars/hive-streaming.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-testutils.jar /usr/hdp/current/spark2-client/jars/hive-testutils.jar
ln -s /usr/hdp/current/hive-server2/lib/hive-vector-code-gen.jar /usr/hdp/current/spark2-client/jars/hive-vector-code-gen.jar
```
## Add a hive-hbase-handler symlink under $SPARK_HOME/standalone-metastore

In the `$SPARK_HOME/standalone-metastore` directory, add a symlink for `hive-hbase-handler`:
```bash
ln -s /usr/hdp/current/hive-server2/lib/hive-hbase-handler.jar /usr/hdp/current/spark2-client/standalone-metastore/hive-hbase-handler.jar
```
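Before moving on, it is worth confirming that none of the links are dangling. A quick check, assuming GNU findutils (with `-L`, a result of `-type l` is a symlink whose target does not resolve):

```bash
# No output means every symlink in both directories points at a real file.
find -L /usr/hdp/current/spark2-client/jars \
        /usr/hdp/current/spark2-client/standalone-metastore -type l
```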
## Project POM
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>XXXXX</groupId>
    <artifactId>XXXXXXX</artifactId>
    <version>1.0</version>

    <properties>
        <spark.version>2.3.1</spark.version>
        <hive.version>3.1.0</hive.version>
        <hbase.version>2.0.0</hbase.version>
        <scala.version>2.11</scala.version>
        <zookeeper.version>3.4.13</zookeeper.version>
        <hadoop.version>3.1.0</hadoop.version>
    </properties>

    <dependencies>
        <!-- Spark support -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- HBase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>httpclient</artifactId>
                    <groupId>org.apache.httpcomponents</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-mapreduce</artifactId>
            <version>${hbase.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.glassfish</groupId>
                    <artifactId>javax.el</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-procedure</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-protocol-shaded</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <!-- hbase-thirdparty artifacts are versioned independently of HBase;
             2.1.0 matches the hbase-shaded-*-2.1.0.jar files linked above -->
        <dependency>
            <groupId>org.apache.hbase.thirdparty</groupId>
            <artifactId>hbase-shaded-miscellaneous</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase.thirdparty</groupId>
            <artifactId>hbase-shaded-netty</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase.thirdparty</groupId>
            <artifactId>hbase-shaded-protobuf</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.htrace</groupId>
            <artifactId>htrace-core4</artifactId>
            <version>4.2.0-incubating</version>
        </dependency>
        <!-- Hive support -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-hbase-handler</artifactId>
            <version>${hive.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.glassfish</groupId>
                    <artifactId>javax.el</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Logging (logback) -->
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.1.2</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe.scala-logging</groupId>
            <artifactId>scala-logging-slf4j_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <!--<testSourceDirectory>src/test/java</testSourceDirectory>-->
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.bcht.bigdata.streaming.ApplicationRabbitMQ</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                        <configuration>
                            <includes>
                                <include>**/*.scala</include>
                            </includes>
                        </configuration>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
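With this POM, the shade plugin runs during `package`. A minimal build sketch, assuming Maven 3 and JDK 8 on the build machine:

```bash
# Build the jars; the shade plugin produces the fat jar and keeps the thin one as
# target/original-<artifactId>-<version>.jar. The spark-submit step below uses the
# thin jar, presumably because the cluster already provides the Hive/HBase jars
# through the symlinks created above.
mvn clean package -DskipTests
```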
## Configure Spark

In Ambari, open Spark2 -> CONFIGS -> Advanced spark2-defaults and set `spark.driver.extraLibraryPath` and `spark.executor.extraLibraryPath` as follows:
```properties
spark.driver.extraLibraryPath={{spark_hadoop_lib_native}}:/usr/hdp/current/spark2-client/standalone-metastore/hive-hbase-handler.jar
spark.executor.extraLibraryPath={{spark_hadoop_lib_native}}:/usr/hdp/current/spark2-client/standalone-metastore/hive-hbase-handler.jar
```
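Save the change and restart the affected Spark2 components when Ambari prompts for it. As an optional sanity check (a sketch, assuming `spark-shell` is on the PATH of a client node), confirm that Hive's HBase storage handler class now resolves on the Spark classpath:

```bash
# Pipe a one-liner into spark-shell; it should print the class rather than throw a
# ClassNotFoundException if the hive-hbase-handler symlink is visible to Spark.
echo 'println(Class.forName("org.apache.hadoop.hive.hbase.HBaseStorageHandler"))' \
  | spark-shell --master 'local[2]'
```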
## Test with spark-submit

The test program:
```scala
import org.apache.spark.sql.SparkSession

// Only needed when running locally on Windows without a Hadoop install:
// System.setProperty("hadoop.home.dir", "D:\\hadoop-3.1.0")
val warehouseLocation: String = "hdfs://nn1.bcht:8020/user/hive/warehouse"
// Note: a master hard-coded here takes precedence over spark-submit's --master flag,
// so remove .master("local[*]") when submitting to YARN.
val ss = SparkSession.builder().master("local[*]").appName("XXXXXXX")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport().getOrCreate()
ss.sqlContext.table("hive_vio_violation").createOrReplaceTempView("hive_vio_violation")
ss.sqlContext.sql("select * from hive_vio_violation limit 10").show(10)
```
Finally, package the jar, upload it to the cluster, and test it with spark-submit:
```bash
spark-submit --master yarn --deploy-mode cluster \
  --files /usr/local/bcht_lhyjg/hive-site.xml \
  --class com.bcht.bigdata.lhyjg.Application_ydfx_bak \
  /usr/local/bcht_lhyjg/original-LHYJG-1.0.jar
```
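In `cluster` deploy mode the `show(10)` output goes to the driver (Application Master) log rather than the submitting console; it can be retrieved afterwards, assuming YARN log aggregation is enabled:

```bash
# <application_id> is printed by spark-submit and shown in the YARN ResourceManager UI;
# the 10 sample rows from show(10) appear in the driver/AM portion of the log.
yarn logs -applicationId <application_id> | less
```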