Flink version: 1.11.3
Contents
1. Create the project
2. Create the data source file hello.txt under resources
3. Create com.mit.wc.WordCount.class
4. Run WordCount

1. Create the project
Create the project with Gradle; the contents of build.gradle are:
plugins {
    id 'java'
}

apply plugin: 'idea'

sourceCompatibility = 1.8
def flinkVersion = '1.11.3'
group 'org.mit'
version '0.1'

repositories {
    maven {
        url 'http://maven.aliyun.com/nexus/content/groups/public/'
    }
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'
    compile group: 'org.apache.flink', name: 'flink-java', version: flinkVersion
    compile group: 'org.apache.flink', name: 'flink-streaming-java_2.12', version: flinkVersion
    compile group: 'org.apache.flink', name: 'flink-streaming-scala_2.12', version: flinkVersion
}

task createDirs {
    sourceSets*.java.srcDirs*.each {
        it.mkdirs()
    }
    sourceSets*.resources.srcDirs*.each {
        it.mkdirs()
    }
}

[compileJava, javadoc, compileTestJava]*.options*.encoding = 'UTF-8'

// Clean up the files produced by the previous build
task clearPj(type: Delete) {
    delete 'build', 'target'
}

// Copy the dependency JARs into the target directory
task copyJar(type: Copy) {
    from configurations.runtime
    into('build/libs/lib')
}

task release(type: Copy, dependsOn: [build, copyJar]) {
    // from 'conf'
    // into('build/libs/eachend/conf') // target location
}
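The createDirs task above simply calls mkdirs() on every source-set directory so the standard project layout exists before you start adding code. As a rough illustration, the equivalent logic in plain Java looks like this (the directory list is Gradle's default source-set layout, assumed here; a temp directory stands in for the project root):

```java
import java.io.File;

public class CreateDirsSketch {
    // Create one source-set directory under the given project root;
    // returns true if the directory exists afterwards
    static boolean createDir(File root, String relativePath) {
        File dir = new File(root, relativePath);
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) throws Exception {
        // Gradle's default source-set layout (assumed; these are the
        // directories the createDirs task's mkdirs() calls would create)
        String[] dirs = {
            "src/main/java", "src/main/resources",
            "src/test/java", "src/test/resources"
        };
        // Stand-in for the project root so the sketch has no side effects
        File root = java.nio.file.Files.createTempDirectory("project").toFile();
        for (String d : dirs) {
            System.out.println(d + " -> " + createDir(root, d));
        }
    }
}
```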
flink-streaming-scala_2.12 is included here because Flink's runtime is built on Akka, Scala's high-concurrency actor framework, under the hood.
2. Create the data source file hello.txt under resources
hello asd
hello asfcas
hello DFV
ASDf asdfg
ASDF ASD
asd asdadfas
asdfwert asdgf
asdf asd
3. Create com.mit.wc.WordCount.class
package com.mit.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/**
 * @Author mit
 * @Description Batch-processing WordCount
 * @Date 2020/12/22 8:31 PM
 * @Version 1.0
 */
public class WordCount {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the data from a file
        String inputPath = "/home/mit/my_project/big_data/flink/flink_java_study/src/main/resources/hello.txt";
        DataSet<String> inputDataSet = env.readTextFile(inputPath);

        // Process the data set: split each line on spaces into (word, 1) tuples, then aggregate
        DataSet<Tuple2<String, Integer>> resultSet = inputDataSet.flatMap(new MyFlatMapper())
                .groupBy(0)   // group by the word (tuple field 0)
                .sum(1);      // sum the counts (tuple field 1)

        resultSet.print();
    }

    // Custom class implementing the FlatMapFunction interface
    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            // Split the line on spaces
            String[] words = value.split(" ");
            // Wrap each word in a (word, 1) tuple and emit it
            for (String word : words) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
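The flatMap → groupBy → sum pipeline can be sanity-checked without a Flink runtime at all: split each line on spaces, then accumulate a per-word count. A minimal plain-Java sketch of the same logic, run over the hello.txt sample data (class and method names here are illustrative, not part of the Flink job):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // Same logic as MyFlatMapper followed by groupBy(0).sum(1)
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {      // flatMap: split on spaces
                counts.merge(word, 1, Integer::sum);   // groupBy + sum: accumulate per word
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample lines from hello.txt
        String[] lines = {
            "hello asd", "hello asfcas", "hello DFV", "ASDf asdfg",
            "ASDF ASD", "asd asdadfas", "asdfwert asdgf", "asdf asd"
        };
        // Print each (word, count) pair; for this data, hello and asd each appear 3 times
        count(lines).forEach((w, c) -> System.out.println("(" + w + "," + c + ")"));
    }
}
```

Note that the Flink job's print() output will contain the same pairs, though not necessarily in the same order, since grouping is distributed.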
4. Run WordCount
The first run fails with the exception: No ExecutorFactory found to execute the application

Fix:
Add the following dependency to build.gradle:
compile group: 'org.apache.flink', name: 'flink-clients_2.12', version: flinkVersion
Re-running the program then succeeds.

This article shows how to build a simple batch WordCount job with Apache Flink 1.11.3. It first walks through creating the project with Gradle and configuring its dependencies, then sets up the data source file, provides the full WordCount implementation, and finally covers a problem you may hit at runtime and how to fix it.