A Hands-On Introduction to Flink Stream Processing

This article shows how to build a simple WordCount program with Apache Flink's flink-quickstart archetype: configuring the Maven project, setting up the dependencies, writing the Java code for the streaming pipeline, and introducing the concept and typical use cases of event-driven applications.

Building a Flink program with flink-quickstart: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/configuration/overview/

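The quickstart project can be generated with the Maven archetype described in the docs linked above (run without the -D flags it prompts interactively; the flags below pin the Flink version used in this article):

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.16.0

The generated pom.xml looks like this: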

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>org.example</groupId>
	<artifactId>untitled</artifactId>
	<version>1.0-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>Flink Quickstart Job</name>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<flink.version>1.16.0</flink.version>
		<target.java.version>1.8</target.java.version>
		<scala.binary.version>2.12</scala.binary.version>
		<maven.compiler.source>${target.java.version}</maven.compiler.source>
		<maven.compiler.target>${target.java.version}</maven.compiler.target>
		<log4j.version>2.17.1</log4j.version>
	</properties>

	<repositories>
		<repository>
			<id>apache.snapshots</id>
			<name>Apache Development Snapshot Repository</name>
			<url>https://repository.apache.org/content/repositories/snapshots/</url>
			<releases>
				<enabled>false</enabled>
			</releases>
			<snapshots>
				<enabled>true</enabled>
			</snapshots>
		</repository>
	</repositories>

	<dependencies>
		<!-- Apache Flink dependencies -->
		<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-streaming-java</artifactId>
			<version>${flink.version}</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-clients</artifactId>
			<version>${flink.version}</version>
			<scope>provided</scope>
		</dependency>

		<!-- Add connector dependencies here. They must be in the default scope (compile). -->

		<!-- Example:

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-connector-kafka</artifactId>
			<version>${flink.version}</version>
		</dependency>
		-->

		<!-- Add logging framework, to produce console output when running in the IDE. -->
		<!-- These dependencies are excluded from the application JAR by default. -->
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-slf4j-impl</artifactId>
			<version>${log4j.version}</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-api</artifactId>
			<version>${log4j.version}</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-core</artifactId>
			<version>${log4j.version}</version>
			<scope>runtime</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>

			<!-- Java Compiler -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.1</version>
				<configuration>
					<source>${target.java.version}</source>
					<target>${target.java.version}</target>
				</configuration>
			</plugin>

			<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
			<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>3.1.1</version>
				<executions>
					<!-- Run shade goal on package phase -->
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
						<configuration>
							<createDependencyReducedPom>false</createDependencyReducedPom>
							<artifactSet>
								<excludes>
									<exclude>org.apache.flink:flink-shaded-force-shading</exclude>
									<exclude>com.google.code.findbugs:jsr305</exclude>
									<exclude>org.slf4j:*</exclude>
									<exclude>org.apache.logging.log4j:*</exclude>
								</excludes>
							</artifactSet>
							<filters>
								<filter>
									<!-- Do not copy the signatures in the META-INF folder.
									Otherwise, this might cause SecurityExceptions when using the JAR. -->
									<artifact>*:*</artifact>
									<excludes>
										<exclude>META-INF/*.SF</exclude>
										<exclude>META-INF/*.DSA</exclude>
										<exclude>META-INF/*.RSA</exclude>
									</excludes>
								</filter>
							</filters>
							<transformers>
								<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
								<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
									<mainClass>org.example.DataStreamJob</mainClass>
								</transformer>
							</transformers>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>

		<pluginManagement>
			<plugins>

				<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
				<plugin>
					<groupId>org.eclipse.m2e</groupId>
					<artifactId>lifecycle-mapping</artifactId>
					<version>1.0.0</version>
					<configuration>
						<lifecycleMappingMetadata>
							<pluginExecutions>
								<pluginExecution>
									<pluginExecutionFilter>
										<groupId>org.apache.maven.plugins</groupId>
										<artifactId>maven-shade-plugin</artifactId>
										<versionRange>[3.1.1,)</versionRange>
										<goals>
											<goal>shade</goal>
										</goals>
									</pluginExecutionFilter>
									<action>
										<ignore/>
									</action>
								</pluginExecution>
								<pluginExecution>
									<pluginExecutionFilter>
										<groupId>org.apache.maven.plugins</groupId>
										<artifactId>maven-compiler-plugin</artifactId>
										<versionRange>[3.1,)</versionRange>
										<goals>
											<goal>testCompile</goal>
											<goal>compile</goal>
										</goals>
									</pluginExecutionFilter>
									<action>
										<ignore/>
									</action>
								</pluginExecution>
							</pluginExecutions>
						</lifecycleMappingMetadata>
					</configuration>
				</plugin>
			</plugins>
		</pluginManagement>
	</build>
</project>
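With the shade plugin bound to the package phase as configured above, a single command builds a fat jar containing all compile-scope dependencies (given the coordinates in this pom, it ends up at target/untitled-1.0-SNAPSHOT.jar):

mvn clean package

The provided-scope Flink dependencies are deliberately left out of the jar, because a Flink cluster already ships them. To run the job inside the IDE instead, the IDE must add provided-scope dependencies to the runtime classpath (in IntelliJ IDEA this is an option in the run configuration).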

Word Count

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCount {

    public static void main(String[] args) throws Exception {

        // Get the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Set the parallelism
        env.setParallelism(1);

        // Create a source from an in-memory collection
        DataStreamSource<String> sourceStream = env
                .fromElements("Flink", "Kafka", "Flink", "Hadoop", "Redis", "Flink", "Hadoop");

        // map: turn each word into a (word, 1) pair
        SingleOutputStreamOperator<Tuple2<String, Long>> mapStream = sourceStream
                // first type parameter: input type
                // second type parameter: output type
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String word) throws Exception {
                        return Tuple2.of(word, 1L);
                    }
                });

        // keyBy: group by the word
        KeyedStream<Tuple2<String, Long>, String> keyedStream = mapStream
                // first type parameter: type of the stream elements
                // second type parameter: type of the key
                .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Long> value) throws Exception {
                        return value.f0;
                    }
                });


        // reduce: maintains an accumulator per key.
        // When the first record of a key arrives, it becomes the accumulator and is emitted.
        // When a later record arrives, it is aggregated with the accumulator, which is then emitted.
        SingleOutputStreamOperator<Tuple2<String, Long>> resultStream = keyedStream
                .reduce(new ReduceFunction<Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> reduce(Tuple2<String, Long> value1, Tuple2<String, Long> value2) throws Exception {
                        return Tuple2.of(value1.f0, value1.f1 + value2.f1);
                    }
                });

        // Print the results to the console
        resultStream.print();

        // Execute the job
        env.execute("Word Count");

    }

}
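
For comparison, the same pipeline can be written more compactly with lambdas (a sketch of an equivalent variant, not part of the original quickstart). Because Java erases the generic types of lambdas, returns(...) must supply the Tuple2 type information explicitly:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountLambda {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.fromElements("Flink", "Kafka", "Flink", "Hadoop", "Redis", "Flink", "Hadoop")
                // (word, 1); returns(...) restores the type info erased from the lambda
                .map(word -> Tuple2.of(word, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                // group by the word (field f0)
                .keyBy(value -> value.f0)
                // per-key accumulator, same semantics as the ReduceFunction above
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .print();

        env.execute("Word Count (lambda)");
    }
}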


Flink is event-driven: it processes each record as soon as it arrives.
The first element to arrive is "Flink":

  • Step 1: map converts the word into a Tuple2, with f0 the word and f1 the count -> (Flink,1)
  • Step 2: keyBy groups by f0 of the Tuple2, i.e. "Flink"
  • Step 3: the aggregation runs; the first record of a key becomes the result, is stored in the accumulator, and is emitted directly
  • Step 4: the result is printed to the console -> (Flink,1)

The second element to arrive is "Kafka":

  • Step 1: map converts the word into a Tuple2, with f0 the word and f1 the count -> (Kafka,1)
  • Step 2: keyBy groups by f0 of the Tuple2, i.e. "Kafka"
  • Step 3: since "Flink" and "Kafka" belong to different groups, their accumulators do not affect each other; (Kafka,1) is the first record of its group, so it is emitted directly
  • Step 4 is the same as for the first element, and (Kafka,1) is finally printed

The third element to arrive is "Flink":

  • The first two steps are the same as before
  • Step 3: reduce maintains an accumulator, so a later record is aggregated with the previous result, and the accumulated result is emitted
  • Step 4: the result is printed to the console -> (Flink,2)
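
Following this logic through all seven input elements (and with parallelism set to 1, so print() adds no subtask prefix), the expected console output is:

(Flink,1)
(Kafka,1)
(Flink,2)
(Hadoop,1)
(Redis,1)
(Flink,3)
(Hadoop,2)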

Event-Driven Applications

Event-driven applications are a class of stateful streaming applications that ingest data from one or more event streams and trigger computations as events arrive. Depending on the business logic, an event-driven application can trigger actions such as sending an alert or an email, or write events to an outgoing event stream to be consumed by another application.
Typical scenarios for event-driven applications include:

  • real-time recommendation
  • behavior pattern detection, or complex event processing (CEP)
  • anomaly detection

Event-driven applications are an evolution of microservices. They communicate through event logs rather than REST calls, and keep application data as local state instead of writing it to an external data store such as a relational or key-value database. The figure below shows a service architecture composed of event-driven streaming applications.
[Figure: a service architecture of event-driven streaming applications connected through an event log]

The applications in the figure above are connected through event logs. One application sends its output to an event log channel (Kafka), and other applications consume the events it emits. The event log channel decouples senders from receivers and provides asynchronous, non-blocking event delivery. Each application can be stateful and manage its own state locally without accessing an external data store. Applications can also be operated and scaled independently.
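
As a concrete illustration of this pattern, here is a minimal sketch of an event-driven Flink job (not from the original article): it reads events from one Kafka topic, keeps a per-key counter as local keyed state instead of consulting an external store, and writes alerts back to another topic. The topic names, threshold, and alert format are hypothetical, and the flink-connector-kafka dependency (commented out in the pom above) is assumed to be on the classpath.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class EventDrivenApp {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ingest from the event log (topic name is hypothetical)
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("user-events")
                .setGroupId("event-driven-app")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Emit alerts back to the event log for downstream applications
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("alerts")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
                // key by the raw event string for simplicity
                .keyBy(event -> event)
                .process(new CountAlertFunction())
                .sinkTo(sink);

        env.execute("Event-Driven App");
    }

    // Keeps a per-key count in local keyed state; no external database is involved.
    public static class CountAlertFunction extends KeyedProcessFunction<String, String, String> {

        private transient ValueState<Long> countState;

        @Override
        public void open(Configuration parameters) {
            countState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Types.LONG));
        }

        @Override
        public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
            Long count = countState.value();
            long newCount = (count == null ? 0L : count) + 1;
            countState.update(newCount);
            // Hypothetical rule: alert on every 100th occurrence of the same event
            if (newCount % 100 == 0) {
                out.collect("ALERT: " + event + " seen " + newCount + " times");
            }
        }
    }
}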
