Implementing WordCount with Flink 1.8

This post records how, after installing Flink 1.8 on a Mac, to create a Maven project named flink_begin, add the dependencies to pom.xml, write WordCount.java, open a port listener, and run the project. A NoClassDefFoundError may occur at runtime; see the linked post for the fix. After typing some input, the console prints each word together with its count.

A previous post covered installing Flink 1.8 on a Mac:

https://blog.youkuaiyun.com/zhangvalue/article/details/93166895

1️⃣ Create a Maven project named flink_begin.

2️⃣ Add the following dependencies to pom.xml:

<dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.8.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.8.0</version>
            <scope>provided</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.8.0</version>
        </dependency>
    </dependencies>

The WordCount.java file is as follows:

/**
 * @author zhangsf
 * @createTime 2019-06-20 - 18:58
 */

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;


public class WordCount {
    public static void main(String[] args) throws Exception {
        // port of the socket source
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No --port argument given; falling back to the default port 9000");
            port = 9000;
        }

        // get the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // connect to the socket and read the input lines
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", port, "\n");

        // process the stream
        DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split("\\s");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }) // flatten each line into <word, count> records
                .keyBy("word") // group records with the same word
                .timeWindow(Time.seconds(2), Time.seconds(1)) // 2-second window sliding every 1 second
                .sum("count");

        // print the results to the console
        windowCount.print()
                .setParallelism(1); // use a single parallel instance for the sink
        // note: Flink builds the job graph lazily, so execute() must be called
        // for any of the code above to actually run
        env.execute("streaming word count");

    }

    /**
     * POJO holding a word and the number of times it occurred.
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }


}
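The `timeWindow(Time.seconds(2), Time.seconds(1))` call defines a 2-second window that slides every second, so each record is counted in two overlapping windows. The following plain-Java sketch (no Flink involved; the timestamps and words are made-up sample data) illustrates that window-assignment logic:

```java
import java.util.*;

public class SlidingWindowSketch {
    // assign each (timestamp, word) record to every sliding window that covers it;
    // a window starting at s covers the half-open range [s, s + size)
    static Map<Long, Map<String, Long>> assign(long[] times, String[] words,
                                               long size, long slide) {
        Map<Long, Map<String, Long>> windows = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            for (long s = times[i] - size + 1; s <= times[i]; s += slide) {
                windows.computeIfAbsent(s, k -> new TreeMap<>())
                       .merge(words[i], 1L, Long::sum);
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] times = {0, 0, 1, 2};          // hypothetical event timestamps (seconds)
        String[] words = {"hello", "world", "hello", "flink"};
        // size 2, slide 1 -- matches the timeWindow(2s, 1s) call above
        assign(times, words, 2, 1).forEach((s, counts) ->
            System.out.println("window [" + s + "," + (s + 2) + "): " + counts));
    }
}
```

Note how "hello" at t=0 and t=1 both land in the window starting at 0, giving it a count of 2 there, while each single record also shows up in a neighboring window.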

Now open a listener on port 9000 in a terminal on the local machine:

nc -l 9000

Once the listener is ready, run the project.

You may run into this error:

java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/datastream/DataStream

This typically happens because flink-streaming-java is declared with <scope>provided</scope> in the pom.xml above, so it is not on the classpath when the program is launched from the IDE. See https://blog.youkuaiyun.com/zhangvalue/article/details/93165357 for the fix.
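One way to resolve it (a sketch for local IDE runs; alternatively, tick "Include dependencies with 'provided' scope" in the IntelliJ run configuration) is to drop the scope from that dependency:

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.8.0</version>
    <!-- no <scope>provided</scope>: keep it on the classpath for IDE runs -->
</dependency>
```

Keep the `provided` scope when building a JAR to submit to a Flink cluster, since the cluster already supplies these classes.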

After the job starts, type the following into the nc terminal and press Enter:

hello hello world world world world flink flink flink flink flink

IDEA's console then prints each word together with its count.
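With that line typed once, the output looks roughly like the following (the exact grouping depends on timing, and each count tends to appear twice because every record falls into two overlapping 2-second windows):

```text
WordWithCount{word='hello', count=2}
WordWithCount{word='world', count=4}
WordWithCount{word='flink', count=5}
WordWithCount{word='hello', count=2}
WordWithCount{word='world', count=4}
WordWithCount{word='flink', count=5}
```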
