Apache Flink Source Code Analysis: stream-sink

This article takes a close look at SinkFunction in Flink: its core interface, built-in implementations such as WriteSinkFunction and SocketClientSink, and the sink implementations found in the connectors for third-party systems.

In the previous article we discussed Flink's stream sources, which serve as the data entry point of a stream and the starting point of the whole DAG (directed acyclic graph) topology. Correspondingly, the data exit point of a stream is the sink. That is the subject of this article.

SinkFunction

Corresponding to SourceFunction, the root interface Flink defines for sinks is called SinkFunction. It extends the marker interface Function. The SinkFunction interface provides only one method:

    void invoke(IN value) throws Exception;

This method is invoked at the record level, that is, it is called once for every record that is emitted. The parameter value is the record to be written out.
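The record-level contract can be sketched with a simplified stand-in interface. The names mirror Flink's, but this is an illustration, not the actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the SinkFunction contract (illustrative, not Flink's actual API):
// invoke is called exactly once per emitted record.
public class SinkSketch {
    interface SinkFunction<IN> {
        void invoke(IN value) throws Exception;
    }

    // A sink that collects the records it receives, to make the call pattern visible.
    static class CollectingSink<IN> implements SinkFunction<IN> {
        final List<IN> collected = new ArrayList<>();
        @Override
        public void invoke(IN value) {
            collected.add(value);
        }
    }

    // Push every record of a (finite) "stream" through the sink, one invoke per record.
    static <IN> List<IN> drainTo(CollectingSink<IN> sink, List<IN> stream) throws Exception {
        for (IN record : stream) {
            sink.invoke(record);
        }
        return sink.collected;
    }

    public static void main(String[] args) throws Exception {
        CollectingSink<String> sink = new CollectingSink<>();
        List<String> out = drainTo(sink, List.of("a", "b", "c"));
        System.out.println(out); // [a, b, c]
    }
}
```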

SinkFunction itself is quite minimal, so let us look at its implementations.

Built-in SinkFunctions

As before, we start with the complete type hierarchy:

(Figure: flink-stream-sink_all-class-diagram — class diagram of all stream sink types)

DiscardingSink

This is the simplest implementation of SinkFunction; its implementation is effectively no implementation at all (the method body is empty). Its purpose is to discard records, which is useful when the final results of a computation are not needed.

WriteSinkFunction

WriteSinkFunction is an abstract class. Its job is to write the tuples to be emitted as plain text to a file at a given path: tuples are collected into a list and then periodically written to the file.

The constructor of WriteSinkFunction takes two parameters:

  • path : the path of the file to write to
  • format : an instance of WriteFormat that specifies the format of the written data

In the constructor it calls the method cleanFile, which initializes the file at the given path: the file is created if it does not exist, and truncated if it does.

The implementation of the invoke method:

    public void invoke(IN tuple) {
        tupleList.add(tuple);
        if (updateCondition()) {
            format.write(path, tupleList);
            resetParameters();
        }
    }

As the implementation shows, the tuple to be sunk is first added to the internal collection, then updateCondition is called. This is an abstract method defined by WriteSinkFunction; it decides when tupleList should be written to the file and cleared. If the condition holds, the tuples in the collection are written to the specified file, and finally resetParameters is called. This is also an abstract method; its purpose is to reset any state parameters that a batched write may maintain.

WriteSinkFunctionByMillis

This class is a concrete subclass of WriteSinkFunction. It writes tuples to the file in batches at a specified interval in milliseconds, given by the constructor parameter millis. Internally, the field lastTime maintains the timestamp of the last write. The class mainly provides implementations of the two abstract methods mentioned above:

    protected boolean updateCondition() {
        return System.currentTimeMillis() - lastTime >= millis;
    }

The implementation of updateCondition is simple: it compares the host's current timestamp with the timestamp of the last write; if at least the specified interval has elapsed, the condition is true and a write is triggered.

    protected void resetParameters() {
        tupleList.clear();
        lastTime = System.currentTimeMillis();
    }

resetParameters first clears tupleList and then overwrites the old lastTime timestamp with the current one.
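The interplay of invoke, updateCondition, and resetParameters can be sketched end to end. To keep the example deterministic, the current time is passed in explicitly instead of calling System.currentTimeMillis(), and a list stands in for the target file; everything else follows the logic quoted above:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of WriteSinkFunctionByMillis's batching logic with an injected clock.
public class MillisBatchSketch {
    final long millis;             // flush interval in milliseconds
    long lastTime = 0L;            // timestamp of the last write
    final List<String> tupleList = new ArrayList<>();
    final List<List<String>> flushed = new ArrayList<>();  // stand-in for the target file

    MillisBatchSketch(long millis) { this.millis = millis; }

    boolean updateCondition(long now) { return now - lastTime >= millis; }

    void resetParameters(long now) {
        tupleList.clear();
        lastTime = now;
    }

    void invoke(String tuple, long now) {
        tupleList.add(tuple);
        if (updateCondition(now)) {
            flushed.add(new ArrayList<>(tupleList)); // stand-in for format.write(path, tupleList)
            resetParameters(now);
        }
    }

    public static void main(String[] args) {
        MillisBatchSketch sink = new MillisBatchSketch(100);
        sink.invoke("a", 10);   // 10 ms elapsed: buffered only
        sink.invoke("b", 120);  // 120 ms elapsed: flush [a, b], lastTime becomes 120
        sink.invoke("c", 150);  // 30 ms since last flush: buffered only
        System.out.println(sink.flushed);   // [[a, b]]
        System.out.println(sink.tupleList); // [c]
    }
}
```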

WriteFormat

An abstract class for write formats, with two implementations:

  • WriteFormatAsText : writes the data verbatim as text to the file at the given path
  • WriteFormatAsCsv : writes the data to the given file in CSV format

RichSinkFunction

RichSinkFunction provides the basis for implementing a rich SinkFunction by extending AbstractRichFunction (which contributes an open/close method pair and a way to obtain the runtime context). RichSinkFunction is also an abstract class, and it has three concrete implementations.
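The open/close pair frames every record the sink processes. A minimal sketch of that lifecycle, with illustrative names rather than Flink's classes:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the rich-function lifecycle: open() runs once before any record,
// invoke() once per record, close() once at the end. Names are illustrative.
public class RichLifecycleSketch {
    final List<String> events = new ArrayList<>();

    void open()           { events.add("open");  }  // e.g. establish a connection
    void invoke(String v) { events.add("invoke:" + v); }
    void close()          { events.add("close"); }  // e.g. release the connection

    // Drive a finite "stream" through the full lifecycle.
    List<String> run(List<String> records) {
        open();
        for (String r : records) invoke(r);
        close();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(new RichLifecycleSketch().run(List.of("r1", "r2")));
        // [open, invoke:r1, invoke:r2, close]
    }
}
```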

SocketClientSink

A sink that sends the stream's data over a socket to a given target host. Records are serialized into byte arrays and then written to the socket. The sink supports retrying failed sends. autoFlush can be enabled: doing so lowers latency but significantly reduces throughput. Its constructor takes the following parameters:

  • hostName : the host name of the server to connect to
  • port : the server's port
  • schema : an instance of SerializationSchema, used to serialize records
  • maxNumRetries : the maximum number of retries (-1 means retry indefinitely)
  • autoflush : whether to flush automatically

The retry logic lives in the invoke method: when a send fails, the retries are performed inside the exception-handling block.
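That retry-in-the-catch-block pattern can be sketched as follows. The transport is faked so the example is self-contained, and the method names are hypothetical; only the shape of the loop and the maxNumRetries semantics (-1 retries indefinitely, a non-negative value caps the retries) follow the description above:

```java
// Sketch of a retry-on-failure send loop in the spirit of SocketClientSink's invoke.
public class RetrySketch {
    int attempts = 0;
    final int failuresBeforeSuccess;

    RetrySketch(int failuresBeforeSuccess) { this.failuresBeforeSuccess = failuresBeforeSuccess; }

    // Stand-in for writing the serialized bytes to the socket: fails a fixed
    // number of times, then succeeds.
    void trySend(byte[] payload) throws Exception {
        attempts++;
        if (attempts <= failuresBeforeSuccess) throw new Exception("broken pipe");
    }

    boolean sendWithRetry(byte[] payload, int maxNumRetries) {
        int retries = 0;
        while (true) {
            try {
                trySend(payload);
                return true;
            } catch (Exception e) {
                // In the real sink this is where the connection would be re-established.
                retries++;
                if (maxNumRetries >= 0 && retries > maxNumRetries) return false;
            }
        }
    }

    public static void main(String[] args) {
        // Two failed sends, then success within a budget of 3 retries.
        System.out.println(new RetrySketch(2).sendWithRetry("msg".getBytes(), 3)); // true
        // Two failed sends, but only 1 retry allowed: gives up.
        System.out.println(new RetrySketch(2).sendWithRetry("msg".getBytes(), 1)); // false
    }
}
```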

OutputFormatSinkFunction

A SinkFunction implementation that writes records to an OutputFormat.

OutputFormat: the interface that defines how consumed records are output, i.e. how the final records are stored; a file is one such storage implementation.

PrintSinkFunction

This implementation writes each record to the standard output stream (stdout) or the standard error stream (stderr). If the current task has more than one parallel subtask instance, that is, the task runs in parallel, each record is preceded by a prefix: the position of the current subtask in the global context.
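The prefixing can be sketched like this; the exact prefix format here is an assumption for illustration, the point being that it is only added when the task runs with more than one parallel subtask:

```java
// Sketch of PrintSinkFunction-style prefixing: with parallelism > 1, each record is
// prefixed with the subtask's position; with parallelism 1, it is printed as-is.
// The "index> " format is an assumption, not necessarily Flink's exact output.
public class PrintPrefixSketch {
    static String format(int subtaskIndex, int parallelism, Object record) {
        if (parallelism > 1) {
            return (subtaskIndex + 1) + "> " + record; // 1-based position of the subtask
        }
        return String.valueOf(record);
    }

    public static void main(String[] args) {
        System.out.println(format(0, 1, "hello")); // hello
        System.out.println(format(2, 4, "hello")); // 3> hello
    }
}
```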

Sinks in Common Connectors

Flink itself ships with connectors for several mainstream third-party open-source systems:

  • elasticsearch
  • flume
  • kafka(0.8/0.9版本)
  • nifi
  • rabbitmq
  • twitter

The sinks for these third-party systems, with the exception of Twitter, all extend RichSinkFunction.

Summary

In this article we discussed the design and implementation of Flink's stream sinks. The topic is not exhausted here; follow-up articles will continue the walkthrough.



Originally published: 2016-05-07

Author: vinoYang

This article comes from CSDN Blog, a partner of the Yunqi Community; follow the CSDN Blog for related information.
