Flink 1.13 Error Summary

License: CC 4.0 BY-SA

Tags: #BigData #flink

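This article lists errors encountered when using Flink CDC to read data from MySQL and PostgreSQL, together with their fixes: configuration problems, binlog retention time, data type conversion, and so on. It also covers the Druid connection pool, Flink checkpoints, HDFS configuration, JDBC operations, writing data to Phoenix, KafkaSink failures, and HBase and StarRocks connection and configuration problems.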

1. Flink CDC error when reading MySQL
   Caused by: java.lang.IllegalStateException: The connector is trying to read binlog starting at Struct{version=1.5.4.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1668071717703,db=,server_id=0,file=mysql-bin.096881,pos=53976934,row=0}, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.

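   Solutions:
       1. Configure the Flink CDC MySqlSource to start from the latest data: .startupOptions(StartupOptions.latest()); (this skips the snapshot and any changes made while the job was not reading)
       2. (possibly) Increase the MySQL binlog retention time;
       3. (possibly) MySQL's default connection time is 30 min.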
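The error means the binlog position saved by the job points at a file the server has already deleted after its retention period. A minimal sketch for confirming that, assuming you have already fetched the file list with `SHOW BINARY LOGS` (column `Log_name`); `BinlogCheck` and `wasPurged` are hypothetical names. Retention is controlled by `binlog_expire_logs_seconds` on MySQL 8.0 and `expire_logs_days` on 5.7.

```java
import java.util.List;

/**
 * Hypothetical helper: checks whether the binlog file a Flink CDC job wants to
 * resume from (the "file=" value in the exception) was already purged.
 * availableFiles is assumed to be the Log_name column of SHOW BINARY LOGS.
 */
public class BinlogCheck {

    /** Sequence number of a binlog file name, e.g. 96881 for "mysql-bin.096881". */
    static long sequenceOf(String fileName) {
        return Long.parseLong(fileName.substring(fileName.lastIndexOf('.') + 1));
    }

    /** True if the requested file is older than the oldest binlog still on the server. */
    public static boolean wasPurged(String requestedFile, List<String> availableFiles) {
        if (availableFiles.isEmpty()) {
            return false; // nothing to compare against
        }
        long oldest = Long.MAX_VALUE;
        for (String name : availableFiles) {
            oldest = Math.min(oldest, sequenceOf(name));
        }
        return sequenceOf(requestedFile) < oldest;
    }
}
```

The sketch assumes all files share the same base name and that the sequence number has not wrapped around.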

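2. Using the Druid connection pool (usually together with a thread pool)
   Use it when you need multithreading or async I/O; otherwise a plain JDBC connection is enough.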

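3. Flink checkpoints on HDFS contain only the metadata file
   Possible causes:
       1. The saved state is small, so it is stored inline in the single metadata file; once more state is saved the directory looks normal. This is not an error.
       2. The client may be unable to reach HDFS remotely. Add the following to hdfs-site.xml so the client connects to DataNodes by hostname instead of by their (possibly internal) IP addresses:

           <property>
               <description>only config in clients</description>
               <name>dfs.client.use.datanode.hostname</name>
               <value>true</value>
           </property>
           Also put core-site.xml and hdfs-site.xml into the resources folder of the IDEA project.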

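4. After packaging in IDEA, the checkpoint paths configured in several classes stop working and all checkpoints end up under the same HDFS path
   Possible cause:
       The Maven packaging plugin in pom.xml binds a main class.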

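5. Flink CDC error when reading PostgreSQL (the account needs replication privileges, and logical replication must be enabled, i.e. wal_level = logical):
   replication slot "flink" is active for PID 627067

   Cause: the slot named "flink" is already held by another process (PID 627067), e.g. another job using the same slot name.

   Solution:
       1. On the PostgreSQL source, set a slot name of its own and the decoding plugin: .slotName("flink_test") and .decodingPluginName("pgoutput")
       Useful queries:
           -- replication slot view
           select * from pg_replication_slots;
           -- replication connections (WAL senders)
           select * from pg_stat_replication;
           -- tables included in publications
           select * from pg_publication_tables;
           -- create a physical replication slot
           SELECT * FROM pg_create_physical_replication_slot('test_slot');
           -- create a logical replication slot
           select * from pg_create_logical_replication_slot('test_logical_slot_81_72','wal2json');
           -- drop a replication slot
           SELECT * FROM pg_drop_replication_slot('flink_test');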
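Giving every job its own slot avoids the "is active for PID" clash. A minimal sketch of a hypothetical helper that derives a valid slot name from a job name; PostgreSQL slot names may contain only lower-case letters, digits and underscores, and are limited to 63 characters in a default build (NAMEDATALEN - 1).

```java
import java.util.Locale;

/** Hypothetical helper: derive a distinct, valid replication slot name per Flink job. */
public class PgSlotNames {

    // NAMEDATALEN - 1 in a default PostgreSQL build
    private static final int MAX_LEN = 63;

    public static String slotNameFor(String jobName) {
        // lower-case, then replace anything outside [a-z0-9_] with '_'
        String s = jobName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
        return s.length() > MAX_LEN ? s.substring(0, MAX_LEN) : s;
    }
}
```

The result would then go into `.slotName(...)`, so two jobs with different names never share a slot.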

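6. When Flink CDC reads a DECIMAL column, the printed value is a string
   Solution (by default the JSON converter emits decimals as Base64 strings; NUMERIC makes it emit numbers):
       // imports: org.apache.kafka.connect.json.JsonConverterConfig, org.apache.kafka.connect.json.DecimalFormat
       Map<String, Object> config = new HashMap<>();
       config.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, DecimalFormat.NUMERIC.name());
       JsonDebeziumDeserializationSchema jdd = new JsonDebeziumDeserializationSchema(false, config);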
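In Debezium's default precise mode, that string is the Base64 encoding of the unscaled value's big-endian two's-complement bytes. If the converter config cannot be changed, a minimal sketch of decoding it by hand, assuming the column's scale is known (it is carried as the `scale` parameter of the field's schema); `DebeziumDecimals` is a hypothetical helper.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

/** Hypothetical helper: decode a Debezium "precise" DECIMAL that arrived as a Base64 string. */
public class DebeziumDecimals {

    public static BigDecimal decode(String base64, int scale) {
        // Base64 -> big-endian two's-complement bytes of the unscaled value
        byte[] unscaled = Base64.getDecoder().decode(base64);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }
}
```

For example, `decode("MDk=", 2)` yields `123.45`, since `MDk=` encodes the bytes `0x30 0x39` (12345).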

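7. A numeric column read by Flink is null; how do you write that null into a numeric column of the target table?
   Solutions:
       (Recommended) 1.
       if (data.getProject_id() == null) {
           statement.setNull(2, Types.INTEGER); // pass the SQL type of the target column
       } else {
           statement.setInt(2, data.getProject_id());
       }
       (Not recommended) 2. Check for null in a map or process operator and leave the field out of the JDBC write when it is null.

   Writing string data into a MySQL DATETIME column?
   Solution: through JDBC, a datetime held as a string can be written into a DATETIME column of a MySQL table with either of:
   ps.setObject(9, obj.getString("add_date"));
   ps.setString(9, obj.getString("add_date"));
   Handling null or "0" values:
   if (obj.getString("add_date") == null || "0".equals(obj.getString("add_date"))) {
       // a zero date is rejected if sql_mode contains NO_ZERO_DATE together with strict mode
       ps.setString(9, "0000-00-00 00:00:00");
   } else {
       ps.setString(9, obj.getString("add_date"));
   }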
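Option 1 can be wrapped in a small reusable helper. A minimal sketch with a hypothetical `JdbcNullables.setNullableInt`; it passes `Types.INTEGER` to `setNull`, since the JDBC contract asks for the SQL type code of the parameter.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

/** Hypothetical helper: bind a nullable Integer to a numeric column. */
public class JdbcNullables {

    public static void setNullableInt(PreparedStatement ps, int index, Integer value) throws SQLException {
        if (value == null) {
            // setNull expects the SQL type of the target column, here INTEGER
            ps.setNull(index, Types.INTEGER);
        } else {
            ps.setInt(index, value);
        }
    }
}
```

Usage: `JdbcNullables.setNullableInt(statement, 2, data.getProject_id());` replaces the if/else above.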

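8. Inserting data into Phoenix via JDBC: no error, but no data is written
   Solution (Phoenix connections do not auto-commit by default, so commit explicitly):
       phoenixConnection.commit();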
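Without `commit()` the UPSERTs stay buffered on the client. A minimal sketch of a write loop that commits once per batch, assuming a `java.sql.Connection` already opened with the Phoenix JDBC driver; `PhoenixWriter` and `upsertAll` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper: UPSERT a batch of rows and commit explicitly. */
public class PhoenixWriter {

    public static void upsertAll(Connection conn, String upsertSql, List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(upsertSql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.executeUpdate();
            }
            conn.commit(); // send the buffered mutations to the server
        } catch (SQLException e) {
            try {
                conn.rollback(); // discard the mutations buffered for this batch
            } catch (SQLException rollbackError) {
                e.addSuppressed(rollbackError);
            }
            throw e;
        }
    }
}
```

Committing once per batch instead of per row keeps the number of commits low; `conn.setAutoCommit(true)` is the alternative if per-statement commits are acceptable.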

9. Error when logging in with the Phoenix client
   Error: org.apache.hadoop.hbase.DoNotRetryIOException: Unable to load configured region split policy 'org.apache.phoenix.schema.MetaDataSplitPolicy' for table 'SYSTEM.CATALOG' Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks

   Solution:
       Add to hbase-site.xml:
       <property>
           <name>hbase.table.sanity.checks</name>
           <value>false</value>
       </property>

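10. The KafkaSink of a Flink job cannot write data out
   Cause: Hadoop, Flink and Kafka are not on the same cluster (the same set of servers). They talk to each other by hostname, and those hostnames cannot be resolved to the right servers.
   Solution: add IP-to-hostname mappings for the other servers to /etc/hosts on the Flink (Hadoop) cluster.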
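Kafka clients connect to the hostnames the brokers advertise, so a broker hostname missing from DNS and /etc/hosts breaks the sink. A minimal sketch for listing which hostnames a Flink (Hadoop) machine cannot resolve; `HostCheck` is a hypothetical helper and the host list is your own input.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: report hostnames this machine cannot resolve. */
public class HostCheck {

    public static List<String> unresolvable(List<String> hostnames) {
        List<String> missing = new ArrayList<>();
        for (String host : hostnames) {
            try {
                InetAddress.getByName(host); // consults /etc/hosts and DNS
            } catch (UnknownHostException e) {
                missing.add(host);
            }
        }
        return missing;
    }
}
```

Every name it returns is a candidate for a new /etc/hosts entry.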

11. Flink job reports a classloader.xxx error at startup
   Cause:
       A bug in Flink 1.13.x; it does not affect how the program runs.

   Solution:
       Add classloader.check-leaked-classloader: false to flink-conf.yaml

12. Flink job works in local IDEA but shows garbled Chinese characters on the server
   Solutions:
       (Recommended) 1. Add env.java.opts: "-Dfile.encoding=UTF-8" to flink-conf.yaml
       2. Pass the option when starting the Flink job: -yD env.java.opts="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"
       3. Change the YARN configuration: https://www.freesion.com/article/86431040769/

13. Flink job (using HBase) fails at startup
   Error message:
       Exception in thread "Thread-2" java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (null), this version is 2.0.0

   Solution:
       Add to hbase-site.xml:
   <property>
       <name>hbase.defaults.for.version.skip</name>
       <value>true</value>
       <description>
       Set to true to skip the 'hbase.defaults.for.version' check.
       Setting this to true can be useful in contexts other than
       the other side of a maven generation; i.e. running in an
       ide. You'll want to set this boolean to true to avoid
       seeing the RuntimException complaint: "hbase-default.xml file
       seems to be for and old version of HBase (@@@VERSION@@@), this
       version is X.X.X-SNAPSHOT"
       </description>
   </property>

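14. Flink in local IDEA times out connecting to StarRocks, while the same code packaged and run on the cloud server works
   Cause:
       StarRocks rejects remote writes

   Solution:
       Enable remote writes in fe.conf
           # Whether to enable remote writes: 0 = disabled, 1 = enabled, default 0
           remote_load_enable=1
           # Whether to verify a signature on written data: 0 = no, 1 = yes, default 0
           remote_load_verify_hash=0
           # Whether to enable remote writes
           remote_query_executor_enable=true
       Then configure the URL in the code:
           // local IDEA: 18040 (default 8040)
           // public static String STARROCKS_LOAD_URL = "10.206.65.215:18040;10.206.66.52:18040;10.206.64.55:18040";
           // cloud server: 18030 (default 8030)
           public static String STARROCKS_LOAD_URL = "10.206.65.215:18030;10.206.66.52:18030;10.206.64.55:18030";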

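15. With StarRocks as the data warehouse, when Flink writes into an ODS table, rows with too many NULL columns fail to load
   Tip: after a StarRocks dynamic-partition table is created, the partitions are not generated immediately (the delay varies); inserting data before the target partition exists fails.
   Error message:
       ERROR com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor - Stream Load response:
       {"Status":"Fail","BeginTxnTimeMs":101,"Message":"too many filtered rows","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"061f0edd-7a68-445f-a4d5-079aa79d8d23","LoadBytes":245396,"StreamLoadPlanTimeMs":102,"NumberTotalRows":984,"WriteDataTimeMs":265,"TxnId":13787,"LoadTimeMs":469,"ErrorURL":"http://172.30.16.26:18040/api/_load_error_log?file=error_log_5a4aad49f7900ad7_9dec92739f8027b7","ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":984}
   Cause:
       StarRocks enables strict mode by default, which filters these rows out. The ErrorURL in the response shows why each row was filtered.

   Solutions:
       1) When creating the ODS table in StarRocks, declare the before_xx columns as DEFAULT NULL. Then for r (snapshot read) or c (create) records the before_xx values are null, and for u (update) records the before_xx columns receive the corresponding before values;
       2) Fill the before_xx columns of the ODS table with the corresponding after values (not recommended).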

16. When Flink CDC reads a MySQL DATE column, the value is the number of days since 1970-01-01
   Solution:
       Convert the value yourself:
       if (after.getLong("entryDate") != null){
           after.put("entryDate",ParseDateTime.longToDate(after.getLong("entryDate")*24*3600*1000));
       }
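`ParseDateTime.longToDate` above is the author's own helper and is not shown. A minimal sketch of the same conversion using only `java.time` (hypothetical class name); `LocalDate.ofEpochDay` turns the day count straight into a date, with no detour through milliseconds and time zones.

```java
import java.time.LocalDate;

/** Hypothetical helper: Debezium DATE value (days since 1970-01-01) to yyyy-MM-dd. */
public class EpochDays {

    public static String toDateString(long epochDay) {
        // LocalDate carries no time zone, so no hour offset can shift the day
        return LocalDate.ofEpochDay(epochDay).toString();
    }
}
```

Usage: `after.put("entryDate", EpochDays.toDateString(after.getLong("entryDate")));`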

17. When Flink CDC writes DATE and DATETIME data into StarRocks, the values become null
   Solution:
       Convert the data to strings before writing it to StarRocks:
       after.put("last_login",ParseDateTime.longToDateTime(after.getLong("last_login")-8*3600*1000));
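The `- 8*3600*1000` compensates for the time zone used when formatting. Debezium encodes a MySQL DATETIME (precision 0 to 3, default time precision mode) as `io.debezium.time.Timestamp`: epoch milliseconds computed as if the wall-clock value were UTC. Formatting it in UTC therefore gives back the stored value. A minimal sketch under that assumption; `DebeziumTimestamps` is hypothetical, and if `last_login` is actually a TIMESTAMP column, Debezium sends a zoned string instead and this does not apply.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: format a Debezium DATETIME (epoch millis, UTC wall clock). */
public class DebeziumTimestamps {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public static String toDateTimeString(long epochMillis) {
        // read the millis back in UTC, the same zone Debezium used to encode them
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC).toLocalDateTime().format(FMT);
    }
}
```

Reading in UTC avoids hard-coding the server's +8 hour offset.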

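18. When Flink CDC reads MySQL date/time data, use a custom converter class (known bug: TIMESTAMP values from the historical (snapshot) sync come out 8 hours ahead, while TIMESTAMP values from the binlog are correct)
   MySqlDateTimeConverter (source code at the end of this article)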
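Debezium picks up a custom converter through its `converters` property plus a `<name>.type` entry holding the converter's fully qualified class name. A minimal sketch that builds those properties; the converter name `dateConverters`, the class `ConverterProps` and the package `com.example` are placeholders.

```java
import java.util.Properties;

/** Hypothetical helper: Debezium properties that register a custom converter. */
public class ConverterProps {

    public static Properties forConverter(String converterClassName) {
        Properties props = new Properties();
        // "dateConverters" is an arbitrary converter name chosen for this sketch
        props.setProperty("converters", "dateConverters");
        props.setProperty("dateConverters.type", converterClassName);
        return props;
    }
}
```

The `Properties` would then be handed to the source, e.g. `.debeziumProperties(ConverterProps.forConverter("com.example.MySqlDateTimeConverter"))` on the Flink CDC `MySqlSource` builder.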

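19. Some systems' shell scripts cannot handle backticks
   When an SQL statement in a shell script contains backticks, the script cannot handle them; the workaround used here was to delete the backticks. (Inside double quotes the shell treats backticks as command substitution; escaping them as \` or putting the SQL in single quotes also avoids this.)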

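20. When one Flink stream carries several kinds of data, split it with side outputs
   Solution:
       Use a process operator and branch inside it, emitting each record either to the main stream (out.collect(value);) or to a side output (ctx.output(new OutputTag<HzdxPojo>("cycle_mark20yyMM") {}, value);). This splits the data into separate streams; retrieve a side output with getSideOutput(tag) on the resulting stream, using an OutputTag with the same id.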

21. Flume supports real-time monitoring of ES, HBase, Doris and similar frameworks
   Learned from a ChatGPT answer; not yet tried in practice. Doris can enable binlog.

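22. When querying MySQL via JDBC, a tinyint column comes back as type BIT, i.e. true or false; if the column holds values such as 0, 1 or -1, the results no longer match the stored values
   Solution: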
       String columnTypeName = metaData.getColumnTypeName(i);
       if ("BIT".equals(columnTypeName)){
           String columnName = metaData.getColumnLabel(i);
           int v = resultSet.getInt(i);
           BeanUtils.setProperty(t, columnName, v);
       }else {
           String columnName = metaData.getColumnLabel(i);
           Object v = resultSet.getObject(i);
           BeanUtils.setProperty(t, columnName, v);
       }
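MySQL Connector/J reports TINYINT(1) columns as BIT/Boolean because its `tinyInt1isBit` connection property defaults to true. Instead of special-casing BIT in the result-set loop, the property can be turned off in the JDBC URL. A minimal sketch of a hypothetical helper that appends it.

```java
/** Hypothetical helper: make Connector/J return TINYINT(1) as an integer. */
public class MysqlUrls {

    public static String withTinyInt1AsInt(String jdbcUrl) {
        // append as the first query parameter or after the existing ones
        String sep = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + sep + "tinyInt1isBit=false";
    }
}
```

With the property set, `getObject` returns the numeric value, so -1 no longer collapses to `true`.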

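23. Using MyBatis in a Flink job throws an exception for date/time data
   Solution:
       Add to mybatis_config.xml:
       <typeHandlers>
           <typeHandler handler="com.lhjsdt.flink.utils.DateTimeTypeHandler"/>
       </typeHandlers>

       The DateTimeTypeHandler class is listed at the end of this article, after MySqlDateTimeConverter.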

import io.debezium.spi.converter.CustomConverter;
import io.debezium.spi.converter.RelationalColumn;
import org.apache.kafka.connect.data.SchemaBuilder;

import java.time.*;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

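/**
 * Flink CDC converter that handles time zone and format for MySQL date/time columns.
 * Known bug: TIMESTAMP values from the historical (snapshot) sync come out 8 hours ahead;
 * TIMESTAMP values from the binlog are correct.
 */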
public class MySqlDateTimeConverter implements CustomConverter<SchemaBuilder, RelationalColumn> {

    private DateTimeFormatter dateFormatter = DateTimeFormatter.ISO_DATE;

    private DateTimeFormatter timeFormatter = DateTimeFormatter.ISO_TIME;

    private DateTimeFormatter datetimeFormatter = DateTimeFormatter.ISO_DATE_TIME;

    private DateTimeFormatter timestampFormatter = DateTimeFormatter.ISO_DATE_TIME;

    private ZoneId timestampZoneId = ZoneId.systemDefault();

    @Override
    public void configure(Properties props) {

    }

    @Override
    public void converterFor(RelationalColumn column, ConverterRegistration<SchemaBuilder> registration) {

        String sqlType = column.typeName().toUpperCase();

        SchemaBuilder schemaBuilder = null;

        Converter converter = null;

        if ("DATE".equals(sqlType)) {

            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.date.string");

            converter = this::convertDate;

        }

        if ("TIME".equals(sqlType)) {

            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.time.string");

            converter = this::convertTime;

        }

        if ("DATETIME".equals(sqlType)) {

            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.datetime.string");

            converter = this::convertDateTime;


        }

        if ("TIMESTAMP".equals(sqlType)) {

            schemaBuilder = SchemaBuilder.string().optional().name("com.darcytech.debezium.timestamp.string");

            converter = this::convertTimestamp;

        }

        if (schemaBuilder != null) {

            registration.register(schemaBuilder, converter);

        }

    }


    private String convertDate(Object input) {

        if (input == null) {
            return null;
        }

        if (input instanceof LocalDate) {

            return dateFormatter.format((LocalDate) input);

        }

        if (input instanceof Integer) {

            LocalDate date = LocalDate.ofEpochDay((Integer) input);

            return dateFormatter.format(date);

        }

        return String.valueOf(input);

    }


    private String convertTime(Object input) {

        if (input == null) {
            return null;
        }

        if (input instanceof Duration) {

            Duration duration = (Duration) input;

            long seconds = duration.getSeconds();

            int nano = duration.getNano();

            LocalTime time = LocalTime.ofSecondOfDay(seconds).withNano(nano);

            return timeFormatter.format(time);

        }

        return String.valueOf(input);

    }


    private String convertDateTime(Object input) {

        if (input == null) {
            return null;
        }

        if (input instanceof LocalDateTime) {

            return datetimeFormatter.format((LocalDateTime) input).replaceAll("T", " ");

        }

        return String.valueOf(input);

    }


    private String convertTimestamp(Object input) {

        if (input == null) {
            return null;
        }

        if (input instanceof ZonedDateTime) {

            // MySQL stores TIMESTAMP values in UTC, so the ZonedDateTime here is always in UTC

            ZonedDateTime zonedDateTime = (ZonedDateTime) input;

            LocalDateTime localDateTime = zonedDateTime.withZoneSameInstant(timestampZoneId).toLocalDateTime();

            return timestampFormatter.format(localDateTime).replaceAll("T", " ");

        }
        return String.valueOf(input);
    }
}

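// Joda-Time DateTime as shaded into Flink's table module; with Joda-Time as a direct dependency, org.joda.time.DateTime works the same way
import org.apache.flink.table.shaded.org.joda.time.DateTime;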
import org.apache.ibatis.type.BaseTypeHandler;
import org.apache.ibatis.type.JdbcType;

import java.sql.CallableStatement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DateTimeTypeHandler extends BaseTypeHandler<DateTime> {
    @Override
    public void setNonNullParameter(PreparedStatement ps, int i, DateTime parameter, JdbcType jdbcType)
            throws SQLException {
        ps.setTimestamp(i, new java.sql.Timestamp(parameter.getMillis()));
    }

    @Override
    public DateTime getNullableResult(ResultSet rs, String columnName) throws SQLException {
        java.sql.Timestamp timestamp = rs.getTimestamp(columnName);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }

    @Override
    public DateTime getNullableResult(ResultSet rs, int columnIndex) throws SQLException {
        java.sql.Timestamp timestamp = rs.getTimestamp(columnIndex);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }

    @Override
    public DateTime getNullableResult(CallableStatement cs, int columnIndex) throws SQLException {
        java.sql.Timestamp timestamp = cs.getTimestamp(columnIndex);
        if (timestamp != null) {
            return new DateTime(timestamp.getTime());
        }
        return null;
    }
}