Building Flink with Hadoop Integration

Translated from: https://ci.apache.org/projects/flink/flink-docs-release-1.7/flinkDev/building.html

Building Flink

To build Flink you need the source code, which is available at https://github.com/apache/flink

You also need Maven 3 and a JDK. Flink requires at least Java 8 to build.

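Note: Maven 3.3.x can build Flink, but it does not shade certain dependencies correctly. Maven 3.2.5 creates the libraries correctly. To build the unit tests, use Java 8u51 or later to prevent failures in unit tests that use the PowerMock runner.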
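As a rough pre-flight check for the note above, the sketch below compares dotted version strings so that a build script can warn about a Maven version that may not shade correctly. The helper names `version_ge` and `check_maven` are hypothetical, the script assumes GNU `sort -V` is available, and treating everything at or above 3.3.0 as suspect is an assumption of this sketch (the note itself only mentions 3.3.x).

```shell
#!/bin/sh
# Hypothetical helper (not part of Flink or Maven):
# succeeds when version $1 >= version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "warn" for Maven 3.3.0 or newer, "ok" otherwise.
# $1: a Maven version string such as 3.2.5
check_maven() {
  if version_ge "$1" "3.3.0"; then
    echo "warn"
  else
    echo "ok"
  fi
}

# Query the real tool only when it is installed.
if command -v mvn >/dev/null 2>&1; then
  installed=$(mvn -version 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')
  if [ -n "$installed" ]; then
    echo "Maven $installed: $(check_maven "$installed")"
  fi
fi
```

A result of `warn` only means the shaded JARs deserve a second look; it does not prove the build is broken.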

To clone from git, enter:

git clone https://github.com/apache/flink

Then check out the release branch:

git checkout release-1.7

The simplest way to build Flink is to run:

mvn clean install -DskipTests

This instructs Maven (mvn) to first remove all existing builds (clean) and then create a new Flink binary (install).

To speed up the build, you can skip tests, QA plugins, and JavaDocs:

mvn clean install -DskipTests -Dfast

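The fast profile in pom.xml is shown below. It is activated by the fast property and sets skip to true for the listed plugins, which shortens the build.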

<profile>
   <id>fast</id>
   <activation>
      <property>
         <name>fast</name>
      </property>
   </activation>
   <build>
      <pluginManagement>
         <plugins>
            <plugin>
               <groupId>org.apache.rat</groupId>
               <artifactId>apache-rat-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
            <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-checkstyle-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
            <plugin>
               <groupId>org.scalastyle</groupId>
               <artifactId>scalastyle-maven-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
            <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-enforcer-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
            <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-javadoc-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
            <plugin>
               <groupId>com.github.siom79.japicmp</groupId>
               <artifactId>japicmp-maven-plugin</artifactId>
               <configuration>
                  <skip>true</skip>
               </configuration>
            </plugin>
         </plugins>
      </pluginManagement>
   </build>
</profile>

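The default build adds a Flink JAR for Hadoop 2, so that Flink can be used with HDFS and YARN.

Hadoop Versions

Info: Most users do not need to do this manually, because binary packages for the common Hadoop versions are already provided.

Flink depends on HDFS and YARN, both of which come from Apache Hadoop. Many different versions of Hadoop exist (from the upstream project and from the various Hadoop distributions). Using a wrong combination of versions can cause exceptions.

Hadoop is supported only from version 2.4.0 onward. You can also specify a particular Hadoop version to build against: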

mvn clean install -DskipTests -Dhadoop.version=2.6.1

Vendor-specific Hadoop Versions

To build Flink against a vendor-specific Hadoop version, issue the following command:

mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.1-cdh5.0.0

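The -Pvendor-repos option activates a Maven build profile that includes the repositories of popular Hadoop vendors such as Cloudera, Hortonworks, and MapR.

The corresponding configuration in pom.xml is shown below. It lists the Cloudera, Hortonworks, and MapR repository addresses.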

<profile>
   <id>vendor-repos</id>
   <activation>
      <property>
         <name>vendor-repos</name>
      </property>
   </activation>
   <!-- Add vendor maven repositories -->
   <repositories>
      <!-- Cloudera -->
      <repository>
         <id>cloudera-releases</id>
         <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
         <releases>
            <enabled>true</enabled>
         </releases>
         <snapshots>
            <enabled>false</enabled>
         </snapshots>
      </repository>
      <!-- Hortonworks -->
      <repository>
         <id>HDPReleases</id>
         <name>HDP Releases</name>
         <url>http://repo.hortonworks.com/content/repositories/releases/</url>
         <snapshots><enabled>false</enabled></snapshots>
         <releases><enabled>true</enabled></releases>
      </repository>
      <repository>
         <id>HortonworksJettyHadoop</id>
         <name>HDP Jetty</name>
         <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
         <snapshots><enabled>false</enabled></snapshots>
         <releases><enabled>true</enabled></releases>
      </repository>
      <!-- MapR -->
      <repository>
         <id>mapr-releases</id>
         <url>http://repository.mapr.com/maven/</url>
         <snapshots><enabled>false</enabled></snapshots>
         <releases><enabled>true</enabled></releases>
      </repository>
   </repositories>
</profile>
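The two build variants above differ only in whether the vendor repositories are needed. The sketch below prints the matching command for a given Hadoop version. The function name `flink_build_cmd` is hypothetical, and the rule that any version containing a hyphen (such as 2.6.1-cdh5.0.0) is a vendor build is an assumption of this sketch, not something Flink or Maven defines.

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven command for a given Hadoop version.
# Assumption of this sketch: a version string containing a hyphen
# (for example 2.6.1-cdh5.0.0) is a vendor build and needs -Pvendor-repos.
flink_build_cmd() {
  hadoop_version=$1
  case "$hadoop_version" in
    *-*)
      echo "mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=$hadoop_version"
      ;;
    *)
      echo "mvn clean install -DskipTests -Dhadoop.version=$hadoop_version"
      ;;
  esac
}

# Print the command for an upstream version and for a vendor version.
flink_build_cmd 2.6.1
flink_build_cmd 2.6.1-cdh5.0.0
```

Printing the command instead of running it keeps the helper safe to try outside a Flink checkout; the printed line can be pasted into the source root once it looks right.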

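Scala Versions

Users who use only the Java APIs and libraries can skip this section.

Flink has APIs, libraries, and runtime modules written in Scala. Users of the Scala API and libraries may have to match Flink's Scala version to the Scala version of their own projects (because Scala is not strictly backward compatible).

Since version 1.7, Flink builds with Scala 2.11 and 2.12.

Encrypted File Systems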

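If your home directory is encrypted, you might encounter a java.io.IOException: File name too long exception. Some encrypted file systems, such as encfs used by Ubuntu, do not allow long file names, which is the cause of this error.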
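One way to see whether a directory is likely to be affected is to ask the system for the longest file name it accepts there. The sketch below uses the POSIX `getconf NAME_MAX` query. The function name `name_max` is hypothetical, and reading a value well below the common 255 bytes as a warning sign is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: prints the maximum file name length (in bytes)
# allowed by the file system that holds the given directory.
# $1: directory to inspect (defaults to the current directory)
name_max() {
  getconf NAME_MAX "${1:-.}"
}

# Inspect the home directory, falling back to / when HOME is unset.
name_max "${HOME:-/}"
```

Run it against the directory that holds the Flink checkout, since that is where the compiler writes its class files.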

The workaround is to add:

<args>
    <arg>-Xmax-classfile-name</arg>
    <arg>128</arg>
</args>

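to the compiler configuration in the pom.xml file of the module that causes the error.

For example, if the error appears in the flink-yarn module, add the code above under the <configuration> tag of scala-maven-plugin.

The resulting configuration: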

<!-- Scala Compiler -->
<plugin>
   <groupId>net.alchim31.maven</groupId>
   <artifactId>scala-maven-plugin</artifactId>
   <executions>
      <!-- Run scala compiler in the process-resources phase, so that dependencies on
         scala classes can be resolved later in the (Java) compile phase -->
      <execution>
         <id>scala-compile-first</id>
         <phase>process-resources</phase>
         <goals>
            <goal>compile</goal>
         </goals>
      </execution>

      <!-- Run scala compiler in the process-test-resources phase, so that dependencies on
          scala classes can be resolved later in the (Java) test-compile phase -->
      <execution>
         <id>scala-test-compile</id>
         <phase>process-test-resources</phase>
         <goals>
            <goal>testCompile</goal>
         </goals>
      </execution>
   </executions>
   <configuration>
      <jvmArgs>
         <jvmArg>-Xms128m</jvmArg>
         <jvmArg>-Xmx512m</jvmArg>
      </jvmArgs>
      <args>
         <arg>-Xmax-classfile-name</arg>
         <arg>128</arg>
      </args>
   </configuration>
</plugin>

Common build errors: