1. Software preparation
2. Build the source package
(1) Unpack the source package and edit the pom file in the project root
Change the version number of each component to match the versions you actually use. Below is my modified pom file:
```
<groupId>com.hortonworks</groupId>
<artifactId>shc</artifactId>
<version>spark-2.3.0-hbase-1.2.6</version>
<packaging>pom</packaging>
<properties>
    <spark.version>2.3.0</spark.version>
    <hbase.version>1.2.6</hbase.version>
    <phoenix.version>4.14.0-HBase-1.2</phoenix.version>
    <java.version>1.8</java.version>
</properties>
```
Notes:
- The snippet above shows only the parts I changed; unchanged sections are omitted.
- Since I changed the version, the pom files of the submodules must be updated to the same version.

Below are the pom files of the two submodules, core and examples, after my changes (only the version was modified):
```
<!-- core/pom.xml -->
<parent>
    <groupId>com.hortonworks</groupId>
    <artifactId>shc</artifactId>
    <version>spark-2.3.0-hbase-1.2.6</version>
    <relativePath>../pom.xml</relativePath>
</parent>
<artifactId>shc-core</artifactId>
<version>spark-2.3.0-hbase-1.2.6</version>
<packaging>jar</packaging>
<name>HBase Spark Connector Project Core</name>

<!-- examples/pom.xml -->
<parent>
    <groupId>com.hortonworks</groupId>
    <artifactId>shc</artifactId>
    <version>spark-2.3.0-hbase-1.2.6</version>
    <relativePath>../pom.xml</relativePath>
</parent>
<artifactId>shc-examples</artifactId>
<version>spark-2.3.0-hbase-1.2.6</version>
<packaging>jar</packaging>
<name>HBase Spark Connector Project Examples</name>
```
(2) Build the source
Run the following mvn command in the root directory of the source package:
```
mvn install -DskipTests
```
Once the build succeeds, the project's jar packages are available in your local Maven repository.
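With the default settings, Maven places the core artifact under the standard local-repository layout, e.g. ~/.m2/repository/com/hortonworks/shc-core/spark-2.3.0-hbase-1.2.6/shc-core-spark-2.3.0-hbase-1.2.6.jar, which is what the dependency declared in the next step resolves to.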
3. Create a Maven project to test SHC
(1) Create a new Maven project and declare a dependency on the shc-core artifact we just built.
Note that only the shc-core dependency is needed:
```
<dependency>
    <groupId>com.hortonworks</groupId>
    <artifactId>shc-core</artifactId>
    <version>spark-2.3.0-hbase-1.2.6</version>
</dependency>
```
(2) Add the Spark dependencies and resolve the dependency conflicts
The following Spark dependencies exclude hadoop-client, because it conflicts with the hadoop-client version pulled in by shc-core:
```
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
```
Then add the hadoop-client dependency manually:
```
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.6</version>
</dependency>
```
hadoop-client 2.7.6 is chosen here because, according to the Hadoop/HBase compatibility matrix on the official HBase website, the Hadoop 2.7.x line is marked as supported for all of the HBase release lines listed there.
(3) Add the HBase dependencies and resolve the dependency conflicts
Here it is enough to exclude the conflicting transitive dependencies:
```
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.6</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.6</version>
    <exclusions>
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```
(4) Put hdfs-site.xml, core-site.xml, and hbase-site.xml into the project's resources directory.
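These files end up on the runtime classpath, which is how the Hadoop and HBase client code locates HDFS and the HBase ZooKeeper quorum; without hbase-site.xml on the classpath the client falls back to its defaults and will typically look for ZooKeeper on localhost.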
(5) Other notes
- I created this Maven project in IntelliJ IDEA and added the Scala SDK to it; I won't go into the details here.
- I adjusted the project layout, adding Scala main-source and test-source folders.
- I configured the Maven plugins, adding the Scala compiler plugin; the full build section is shown below.
```
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.2</version>
            <executions>
                <execution>
                    <id>scala-compile-first</id>
                    <phase>process-resources</phase>
                    <goals>
                        <goal>add-source</goal>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <scalaVersion>2.11.8</scalaVersion>
                <recompileMode>incremental</recompileMode>
                <useZincServer>true</useZincServer>
                <args>
                    <arg>-unchecked</arg>
                    <arg>-deprecation</arg>
                    <arg>-feature</arg>
                </args>
                <javacArgs>
                    <javacArg>-source</javacArg>
                    <javacArg>1.8</javacArg>
                    <javacArg>-target</javacArg>
                    <javacArg>1.8</javacArg>
                </javacArgs>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <executions>
                <execution>
                    <phase>compile</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <skip>true</skip>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-resources-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.1.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <!-- Set the main class of the jar (optional) -->
                        <!--
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.bonc.rdpe.spark.hbase.SparkToHBase</mainClass>
                            </transformer>
                        </transformers>
                        -->
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
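With this build section in place, running mvn package compiles the Scala sources and produces a shaded jar under target/ with the dependencies bundled and signature files stripped, ready to hand to spark-submit.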
At this point, all the preparation needed before testing SHC is complete!
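As a point of reference, a first smoke test of this setup could look roughly like the sketch below. The table name shc_test, the column layout, and the local[*] master are assumptions chosen for illustration; the catalog JSON and the org.apache.spark.sql.execution.datasources.hbase data source name follow the usage documented in the SHC README.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object ShcSmokeTest {

  // SHC catalog: maps the DataFrame schema to an HBase table layout
  // (illustrative table and column names).
  val catalog: String =
    s"""{
       |  "table":{"namespace":"default", "name":"shc_test"},
       |  "rowkey":"key",
       |  "columns":{
       |    "id":{"cf":"rowkey", "col":"key",  "type":"string"},
       |    "name":{"cf":"info", "col":"name", "type":"string"},
       |    "age":{"cf":"info",  "col":"age",  "type":"int"}
       |  }
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shc-smoke-test")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a few rows; newTable asks SHC to create the table with 5 regions.
    val df = Seq(("1", "alice", 20), ("2", "bob", 25)).toDF("id", "name", "age")
    df.write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()

    // Read the rows back through the same catalog.
    val readBack = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()
    readBack.show()

    spark.stop()
  }
}
```

Submitting the shaded jar with spark-submit (or running the object from the IDE with the resources directory on the classpath) should create the table, write two rows, and read them back.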
Adapted from: CoderJed
https://www.jianshu.com/p/ab09eb36b32a