0. Preface
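I recently got a task that requires reading and writing Hive from Flink. The Hive connector is only available from Flink 1.9.x onward, and Flink has to be compiled against the cluster's Hadoop version before it can be installed.
I ran into many dependency problems during the build. The fixes are all covered in section 3; skip it if you have already done this configuration or your build works without it. Most of the steps are based on material found online, so please point out anything that looks wrong.
The Flink-1.9.1 package built for CDH-5.14.2 is uploaded to Baidu Netdisk. Take it if you need it (for reference only, no guarantees):
Link: https://pan.baidu.com/s/18OGSyuPl_ZYAPLCy5DuDTA
Extraction code: xbso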
1. Preparation
2. Download the source package
[root@node01 ~]# cd /opt/software/
[root@node01 software]# wget http://archive.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-src.tgz
[root@node01 software]# tar -zxf flink-1.9.1-src.tgz -C /opt/module/
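3. Pre-build setup
3.1 Configure CDH dependency support
By default Maven cannot download the CDH dependencies. Edit settings.xml under conf in the Maven directory (/opt/module/maven3/conf/settings.xml) as follows (cloudera-releases here is the repository id configured in the Flink source):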
<mirrors>
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>*,!cloudera-releases,!cloudera</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<!-- hortonworks maven -->
<mirror>
<id>nexus-hortonworks</id>
<mirrorOf>*,!central</mirrorOf>
<name>Nexus hortonworks</name>
<url>https://repo.hortonworks.com/content/groups/public/</url>
</mirror>
</mirrors>
Edit flink-1.9.1/pom.xml and add:
<!-- Add the CDH repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
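The same cloudera repository block has to go into several pom.xml files, so a quick grep can confirm that none was missed. This is a minimal sketch: the function name has_cloudera_repo is hypothetical, and it only checks that the repository URL appears in the file, not that it sits inside a valid <repositories> element.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given pom.xml mentions the Cloudera repository URL.
has_cloudera_repo() {
    grep -q 'repository.cloudera.com/artifactory/cloudera-repos' "$1"
}

# Usage with the path used in this article: report whether the Flink root pom has the repository.
pom=/opt/module/flink-1.9.1/pom.xml
if [ -f "$pom" ]; then
    if has_cloudera_repo "$pom"; then
        echo "cloudera repository found in $pom"
    else
        echo "cloudera repository MISSING in $pom"
    fi
fi
```

The same function can be pointed at any other pom.xml that needs the repository.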
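3.2 Build flink-shaded
Different Flink versions use different flink-shaded versions; Flink 1.9.0 and 1.9.1 use 7.0.
If flink-shaded is not built first, the Flink build fails because flink-shaded-hadoop-2:jar:2.6.0-cdh5.14.2-7.0 cannot be found: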
[ERROR] Failed to execute goal on project flink-hadoop-fs: Could not resolve dependencies for project org.apache.flink:flink-hadoop-fs:jar:1.9.1: The following artifacts could not be resolved: org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-cdh5.14.2-7.0, org.apache.hadoop:hadoop-hdfs:jar:tests:2.6.0-cdh5.14.2, org.apache.hadoop:hadoop-common:jar:tests:2.6.0-cdh5.14.2: Could not find artifact org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-cdh5.14.2-7.0 in nexus-hortonworks (https://repo.hortonworks.com/content/groups/public/) -> [Help 1]
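So flink-shaded-hadoop-2 has to be built manually and installed into the local Maven repository.
1) Download flink-shaded-7.0-src.tgz and extract it:
tar -zxvf flink-shaded-7.0-src.tgz -C /opt/module/
cd /opt/module/flink-shaded-7.0
2) Edit the project's pom.xml files
Add the Cloudera Maven repository to flink-shaded-7.0/pom.xml:
<!-- Add the CDH repository -->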
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
Add the same repository to flink-shaded-7.0/flink-shaded-hadoop-2/pom.xml:
<!-- Add the CDH repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
In flink-shaded-7.0/flink-shaded-hadoop-2-uber/pom.xml, add the following dependency inside the dependencyManagement tag:
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.3.1</version>
</dependency>
Note: this step is required. Without it the build succeeds, but Flink fails to start and the .out file shows the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
The cause is that the packaged jar otherwise bundles commons-cli 1.2, and the Option.builder method does not exist in that version.
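One way to confirm which commons-cli ended up in the built uber jar is to look for the Option$Builder class named in the error. This is a sketch under two assumptions: unzip is installed, and the commons-cli classes are not relocated inside the jar (the unrelocated class name in the stack trace suggests this, but it is not verified here). The helper name has_option_builder is hypothetical, and the jar path is whatever your build produced.

```shell
#!/bin/sh
# Hypothetical helper: reads a jar listing on stdin and succeeds if
# org.apache.commons.cli.Option$Builder is present in it.
has_option_builder() {
    grep -q 'org/apache/commons/cli/Option\$Builder\.class'
}

# Usage: pass the path of the built uber jar as the first argument.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
    if unzip -l "$1" | has_option_builder; then
        echo "Option.builder is available"
    else
        echo "Option.builder is missing (commons-cli older than 1.3?)"
    fi
fi
```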
3) Start the build:
mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.14.2
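After the install finishes, the flink-shaded-hadoop-2 jar should be in the local Maven repository. The sketch below checks for it, assuming the default repository location ~/.m2/repository (a custom localRepository in settings.xml would change this). The helper m2_jar_path is hypothetical; the coordinates are taken from the error message earlier in this section.

```shell
#!/bin/sh
# Hypothetical helper: prints the path of an artifact's jar in a local Maven repository.
# Arguments: groupId artifactId version [repository root]
m2_jar_path() {
    group_dir=$(printf '%s' "$1" | tr '.' '/')   # org.apache.flink -> org/apache/flink
    repo=${4:-$HOME/.m2/repository}              # assumed default repository location
    printf '%s/%s/%s/%s/%s-%s.jar\n' "$repo" "$group_dir" "$2" "$3" "$2" "$3"
}

jar=$(m2_jar_path org.apache.flink flink-shaded-hadoop-2 2.6.0-cdh5.14.2-7.0)
if [ -f "$jar" ]; then
    echo "installed: $jar"
else
    echo "not found: $jar"
fi
```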
3.3 Remove the Flink test modules
Remove the following test modules from the modules section of flink-1.9.1/pom.xml to avoid build errors:
<module>flink-tests</module>
<module>flink-end-to-end-tests</module>
<module>flink-yarn-tests</module>
<module>flink-fs-tests</module>
After removal:
<modules>
<!-- Dummy module to force execution of the Maven Shade plugin (see Shade plugin below) -->
<module>tools/force-shading</module>
<module>flink-annotations</module>
<module>flink-shaded-curator</module>
<module>flink-core</module>
<module>flink-java</module>
<module>flink-scala</module>
<module>flink-filesystems</module>
<module>flink-runtime</module>
<module>flink-runtime-web</module>
<module>flink-optimizer</module>
<module>flink-streaming-java</module>
<module>flink-streaming-scala</module>
<module>flink-connectors</module>
<module>flink-formats</module>
<module>flink-examples</module>
<module>flink-clients</module>
<module>flink-container</module>
<module>flink-queryable-state</module>
<module>flink-test-utils-parent</module>
<module>flink-state-backends</module>
<module>flink-libraries</module>
<module>flink-table</module>
<module>flink-quickstart</module>
<module>flink-contrib</module>
<module>flink-dist</module>
<module>flink-mesos</module>
<module>flink-metrics</module>
<module>flink-yarn</module>
<module>flink-docs</module>
<module>flink-python</module>
<module>flink-ml-parent</module>
</modules>
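The four module lines can also be removed with sed instead of by hand. This is a sketch, assuming each <module> entry sits on its own line as in the listing above; strip_test_modules is a hypothetical helper name. The pattern matches the full module name, so flink-test-utils-parent and flink-yarn are left in place.

```shell
#!/bin/sh
# Hypothetical helper: prints a copy of the given pom.xml without the four test modules.
strip_test_modules() {
    sed -E '/<module>(flink-tests|flink-end-to-end-tests|flink-yarn-tests|flink-fs-tests)<\/module>/d' "$1"
}

pom=/opt/module/flink-1.9.1/pom.xml
if [ -f "$pom" ]; then
    cp "$pom" "$pom.bak"                     # keep a backup of the original
    strip_test_modules "$pom.bak" > "$pom"   # rewrite the pom without the test modules
fi
```

Comparing pom.xml against pom.xml.bak with diff afterwards shows exactly which lines were dropped.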
3.4 Configure the maven-assembly-plugin
Edit flink-1.9.1/flink-libraries/pom.xml and add the maven-assembly-plugin; otherwise the build fails.
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>