flink-connector-starrocks Issue Log

This document records three common Flink runtime errors and their fixes: a ClassNotFoundException caused by a missing Maven dependency, an IllegalArgumentException about an undefined primary key, and troubleshooting real-time data that fails to load into StarRocks. The fixes include adding the flink-table-planner dependency, defining the primary key on the TableSchema correctly, and tuning the StarRocksSink flush strategy via parameters such as sink.buffer-flush.max-rows and sink.buffer-flush.interval-ms.

Problem 1: ClassNotFoundException: org.apache.flink.calcite.shaded.com.google.common.base.Preconditions

Caused by: java.lang.ClassNotFoundException: org.apache.flink.calcite.shaded.com.google.common.base.Preconditions
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 7 more

Fix: the flink-table-planner dependency is missing. Add it to the project's pom.xml (available from the Maven repository):

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.12.0</version>
    <scope>provided</scope>
</dependency>
```

Problem 2: Real-time data fails to load into StarRocks (JSON batch exceeds the size limit)

The job fails at flush/checkpoint time with the error below. The same StreamLoadFailException recurs in the caused-by and suppressed chains; the duplicates are abbreviated here.

Caused by: org.apache.flink.util.SerializedThrowable: Stream load failed because of error, db: s_rpt, table: rpt_rt_dau_dim, label: 26639e7c-856e-477f-896f-d2ba3bb37f66, responseBody: { "Status": "INTERNAL_ERROR", "Message": "The size of this batch exceed the max size [104857600] of json type data data [ 106097396 ]. Set ignore_json_size to skip the check, although it may lead huge memory consuming." } errorLog: null
	at com.starrocks.data.load.stream.DefaultStreamLoader.send(DefaultStreamLoader.java:289) ~[flink-connector-starrocks-1.2.7_flink-1.12_2.11.jar:?]
	at com.starrocks.data.load.stream.DefaultStreamLoader.lambda$send$2(DefaultStreamLoader.java:120) ~[flink-connector-starrocks-1.2.7_flink-1.12_2.11.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_362]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
	... 1 more

2025-06-10 01:14:52,776 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManagerV2 [] - Stream load manager flush
2025-06-10 01:14:52,776 ERROR com.starrocks.connector.flink.manager.StarRocksSinkManagerV2 [] - catch exception, wait rollback
com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: s_rpt, table: rpt_rt_dau_dim, label: 26639e7c-856e-477f-896f-d2ba3bb37f66, responseBody: (same as above) errorLog: null
	(same DefaultStreamLoader stack as above)

2025-06-10 01:14:52,776 ERROR com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2 [] - Failed to flush when closing
java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: s_rpt, table: rpt_rt_dau_dim, label: 26639e7c-856e-477f-896f-d2ba3bb37f66, responseBody: (same as above) errorLog: null
	at com.starrocks.connector.flink.manager.StarRocksSinkManagerV2.AssertNotException(StarRocksSinkManagerV2.java:367) ~[flink-connector-starrocks-1.2.7_flink-1.12_2.11.jar:?]
	at com.starrocks.connector.flink.manager.StarRocksSinkManagerV2.flush(StarRocksSinkManagerV2.java:299) ~[flink-connector-starrocks-1.2.7_flink-1.12_2.11.jar:?]
	at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.close(StarRocksDynamicSinkFunctionV2.java:184) ~[flink-connector-starrocks-1.2.7_flink-1.12_2.11.jar:?]
	at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:41) ~[roi-flinksql-custom-1.0-SNAPSHOT.jar:?]
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:791) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runAndSuppressThrowable(StreamTask.java:770) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUpInvoke(StreamTask.java:689) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:593) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: (same as above)
	... 1 more

2025-06-10 01:14:52,778 WARN  org.apache.flink.runtime.taskmanager.Task [] - Sink: Sink(table=[default_catalog.default_database.sink_rt_dau_pvreport2sr], fields=[day, EXPR$1, client_act_time, is_client_act, ts, EXPR$5]) (1/1)#0 (6b68924ad80786daa5913d36d18c04f2) switched from RUNNING to FAILED.
java.io.IOException: Could not perform checkpoint 1 for operator Sink: Sink(table=[default_catalog.default_database.sink_rt_dau_pvreport2sr], fields=[day, EXPR$1, client_act_time, is_client_act, ts, EXPR$5]) (1/1)#0.
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:963) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:115) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:156) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:157) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:179) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) [flink-dist_2.11-1.12.3.jar:1.12.3]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
	Suppressed: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: (same StarRocksSinkManagerV2 flush/close chain as above)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1 for operator Sink: Sink(table=[default_catalog.default_database.sink_rt_dau_pvreport2sr], fields=[day, EXPR$1, client_act_time, is_client_act, ts, EXPR$5]) (1/1)#0. Failure reason: Checkpoint was declined.
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:241) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:162) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:686) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:607) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:572) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:298) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$9(StreamTask.java:1004) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:988) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:947) ~[flink-dist_2.11-1.12.3.jar:1.12.3]
	... 13 more
Caused by: org.apache.flink.util.SerializedThrowable: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: s_rpt, table: rpt_rt_dau_dim, label: 26639e7c-856e-477f-896f-d2ba3bb37f66, responseBody: { "Status": "INTERNAL_ERROR", "Message": "The size of this batch exceed the max size [104857600] of json type data data [ 106097396 ]. Set ignore_json_size to skip the check, although it may lead huge memory consuming." }
### Fixing the JSON batch-size limit in the Flink–StarRocks integration

When writing from Flink to StarRocks, a `StreamLoadFailException` caused by a JSON batch exceeding the size limit (here `104857600` bytes, i.e. 100 MB) can be addressed in the following ways.

#### 1. Reduce the batch size

The connector's flush parameters control how much data each stream load carries. Lowering the rows per flush, or flushing more frequently, keeps every batch under the limit:

```properties
# Example connector options
sink.buffer-flush.max-rows=1000
sink.buffer-flush.interval-ms=1000
```

Reducing the rows per batch, or shortening the flush interval so data is flushed more often, effectively prevents a single write from exceeding StarRocks' limit.

#### 2. Raise StarRocks' maximum JSON data size

If tuning the batch is not enough, the BE-side limit on JSON payload size can be raised. Check the exact parameter name against the BE configuration reference for your StarRocks version; for example:

```bash
# In be.conf (parameter name may differ across StarRocks versions)
max_json_data_size=209715200  # 200 MB
```

Restart the BE nodes afterwards for the change to take effect.

#### 3. The `ignore_json_size` parameter

The error message itself suggests setting `ignore_json_size` to skip the size check. Whether this works depends on the StarRocks and connector versions in use; where supported, it is passed as a stream load property (e.g. `sink.properties.ignore_json_size=true`). Note the server's own warning that skipping the check may lead to heavy memory consumption, so prefer reducing the batch size first.

#### 4. Pre-process the data in Flink

Pre-processing on the Flink side is also effective: split large JSON objects into smaller records, or convert them to a more compact format such as Avro or Protobuf, so each row is smaller. A minimal PyFlink sketch of dropping unneeded fields (the table and field names are illustrative):

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Suppose the input rows carry several fields, some of them large
input_table = t_env.from_elements(
    [('k1', 'k2', 'a very large payload ...')],
    ['key1', 'key2', 'large_payload'])

# Keep only the fields the sink actually needs
processed_table = input_table.select("key1, key2")
```

Trimming the rows in this way noticeably reduces the volume written to StarRocks and avoids exceeding the batch-size limit.

### Notes

Make sure the Flink and StarRocks versions are compatible, choose the strategy that fits the workload, and validate any configuration change in a test environment before relying on it in production.
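The flush options above are set on the StarRocks table sink itself. A minimal Flink SQL DDL sketch for the sink in this job is shown below; the FE addresses and credentials are placeholders, the column list is abridged from the job's field names, and the option values should be checked against the connector version in use:

```sql
CREATE TABLE sink_rt_dau_pvreport2sr (
    `day` STRING,
    client_act_time STRING,
    is_client_act INT,
    ts BIGINT
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://fe-host:9030',    -- placeholder FE address
    'load-url' = 'fe-host:8030',                 -- placeholder stream load endpoint
    'database-name' = 's_rpt',
    'table-name' = 'rpt_rt_dau_dim',
    'username' = 'user',                         -- placeholder
    'password' = 'pass',                         -- placeholder
    -- keep each stream load batch well under the 100 MB JSON limit
    'sink.buffer-flush.max-rows' = '64000',
    'sink.buffer-flush.max-bytes' = '67108864',  -- 64 MB
    'sink.buffer-flush.interval-ms' = '1000'
);
```

With these settings the connector flushes whenever any of the three thresholds (rows, bytes, interval) is reached, so no single stream load request should grow past the server-side JSON size check.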