Kafka Java API: producing data fails with Error: MESSAGE_TOO_LARGE

  • 1. Kafka version: kafka_2.11-2.1.0-kafka-4.0.0.jar
  • 2. server.properties: all tuning parameters left at their defaults
  • 3. Topic: null, all parameters left at their defaults
  • 4. Loading a 1 GB txt file, with only three producer parameters set:

acks: all

batch.size: 1048576

linger.ms: 10

batch.size and linger.ms are the two parameters with the biggest impact on Kafka producer performance. batch.size is the basic unit of the producer's batched sends and defaults to 16384 bytes (16 KB); linger.ms is what the sender thread checks, when deciding whether a batch is ready, to see whether the batch has waited long enough, and it defaults to 0 ms.

So does the producer send batches when batch.size is reached, or at linger.ms intervals? The conclusion up front: as soon as either batch.size or linger.ms is satisfied, the producer starts sending.
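For reference, a minimal sketch of the producer setup from item 4, assuming a local broker at localhost:9092 and string records; only acks, batch.size, and linger.ms differ from the defaults:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");         // wait for all in-sync replicas
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1048576); // 1 MB, above the broker default message.max.bytes (1000012)
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);       // wait up to 10 ms for a batch to fill

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("null", "one line of the 1 GB txt file")); // topic named "null", per item 3
        }
    }
}

With batch.size set above message.max.bytes, any full batch already violates the broker's per-batch limit, which is what triggers the error below.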

  • 5. The load fails with:
[2019-12-18 16:50:50,325] WARN [Producer clientId=producer-1] Got error produce response in correlation id 645 on topic-partition null-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-18 16:50:50,359] WARN [Producer clientId=producer-1] Got error produce response in correlation id 646 on topic-partition null-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender)

Currently, producers do the batch splitting based on the batch size. However, the split will never succeed when the batch size is much larger than the topic-level max message size.

For instance, if the batch size is set to 8 MB but the broker-side `message.max.bytes` keeps its default value (1000012, about 1 MB), the producer will endlessly try to split the oversized batch and never succeed, as shown below:

[2019-05-10 16:25:09,233] WARN [Producer clientId=producer-1] Got error produce response in correlation id 61 on topic-partition test-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender:617)
[2019-05-10 16:25:10,021] WARN [Producer clientId=producer-1] Got error produce response in correlation id 62 on topic-partition test-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender:617)
[2019-05-10 16:25:10,758] WARN [Producer clientId=producer-1] Got error produce response in correlation id 63 on topic-partition test-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender:617)
[2019-05-10 16:25:12,071] WARN [Producer clientId=producer-1] Got error produce response in correlation id 64 on topic-partition test-0, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE (org.apache.kafka.clients.producer.internals.Sender:617)

A better solution would be for the producer to split based on the minimum of these two configs. However, it is tricky for the client to obtain the topic-level or broker-level config values. There seem to be three ways to do this:

  1. When the broker throws `RecordTooLargeException`, do not swallow its real message, since it already contains the max message size. If the message is not swallowed, the client can easily read the limit from the response.
  2. Add code to issue a `DescribeConfigsRequest` to retrieve the value (see the AdminClient sketch after this list).
  3. If splitting fails, decrease the batch size gradually until the split succeeds. For example,
// In RecordAccumulator.java
private int steps = 1;
// ...
public int splitAndReenqueue(ProducerBatch bigBatch) {
    // ...
    // Shrink the split size (1/1, 1/2, 1/3, ... of batchSize) until the split takes effect
    Deque<ProducerBatch> dq = bigBatch.split(this.batchSize / steps);
    if (dq.size() == 1) // still a single batch, i.e. the split failed: try a smaller size next time
        steps++;
    // ...
}
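For option 2, a minimal sketch of fetching the topic-level limit through the AdminClient, which issues DescribeConfigsRequest under the hood; broker address and topic name are the same placeholders as above:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class MaxMessageBytesLookup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "null"); // placeholder topic
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            // Topic-level limit; when unset it reflects the broker's message.max.bytes
            System.out.println("max.message.bytes = " + config.get("max.message.bytes").value());
        }
    }
}

A producer could then cap its split size at this value instead of blindly dividing batch.size.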
  • 7. Adjusting the load parameters against the topic's max message size (the broker default `message.max.bytes` is 1000012; the topic-level equivalent is `max.message.bytes`):

With batch.size = 1000001, the load succeeds.

With batch.size = 1000012, the load succeeds.

With batch.size = 1000112, the load fails.
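Conversely, if batches above 1000012 bytes are really wanted, the topic-level limit can be raised rather than shrinking batch.size; a sketch using the same placeholder broker and topic:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseMaxMessageBytes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "null"); // placeholder topic
            // 2 MB topic-level limit, comfortably above the 1048576-byte batch.size
            Config update = new Config(Collections.singleton(
                    new ConfigEntry("max.message.bytes", "2097152")));
            admin.alterConfigs(Collections.singletonMap(topic, update)).all().get();
        }
    }
}

Note that alterConfigs replaces the resource's entire config set; on clients newer than 2.1, incrementalAlterConfigs is the preferred call.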

  • 8. Open question: setting only the producer-side compression to gzip (the one-line change sketched below), with batch.size = 1048576 (greater than the default message.max.bytes) and everything else unchanged, MESSAGE_TOO_LARGE never occurs. I don't know why; any explanation would be appreciated. (A likely explanation: the broker validates message.max.bytes against the batch as it arrives, i.e. after compression, and 1 MB of text gzips to far below the limit.)
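The one-line change relative to the item 4 sketch (same props object):

props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip"); // batches are compressed client-side before the broker's size check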

  • 9. Then, with the topic's compression.type set to lz4 while the producer still compresses with gzip and batch.size = 1048576, the load fails again. (A plausible explanation: when the topic codec differs from the producer's, the broker decompresses and recompresses each batch with lz4 and re-validates its size; lz4 generally compresses text less tightly than gzip, so the recompressed batch can exceed message.max.bytes.)
