By default, Kafka's entire message pipeline is sized for 1 MB. In other words, out of the box a producer can only send messages of at most 1 MB to Kafka, Kafka itself can only handle messages of at most 1 MB internally, and a consumer can only fetch messages of at most 1 MB.
If you send a 2 MB payload (for easy arithmetic, assume 1 M = 1000 K throughout), the client throws an exception:
```
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 2000037 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
```
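For reference, a minimal sketch that reproduces this against a default setup; the broker address localhost:9092 and the topic name test are placeholder assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class OversizeProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        byte[] payload = new byte[2_000_000]; // 2 MB of zeros

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // get() blocks until the send completes, so the
            // RecordTooLargeException surfaces right here.
            producer.send(new ProducerRecord<>("test", payload)).get();
        }
    }
}
```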
Following the hint, set max.request.size in the producer config to something above 2 MB. It must be strictly greater than 2 MB, because the message's own metadata also takes space: in the log above, a message carrying 2 MB of data is 2,000,037 bytes when serialized.
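In terms of the sketch above, that is one extra producer property (2,100,000 is just a comfortable margin, not a required value):

```java
// Must exceed the serialized record size, payload plus framing;
// the default is 1048576.
props.put("max.request.size", "2100000");
```

With this the client-side check passes, but now the broker rejects the request: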
```
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
```
The broker cannot accept a message this large. So edit Kafka's server.properties, add message.max.bytes with a value above 2 MB (the value is a raw byte count; Kafka does not parse suffixes like "2M"), and restart the Kafka cluster. The send now succeeds.
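The corresponding server.properties line on every broker, matching the margin used above:

```properties
# Largest message the broker will accept, in bytes (default is about 1 MB).
message.max.bytes=2100000
```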
Then, trying to consume the message fails:
```
Exception in thread "main" org.apache.kafka.common.errors.RecordTooLargeException: There are some messages at [Partition=Offset]: {test-1=6863354} whose size is larger than the fetch size 1048576 and hence cannot be ever returned. Increase the fetch size on the client (using max.partition.fetch.bytes), or decrease the maximum message size the broker will allow (using message.max.bytes).
```
Following the hint, set max.partition.fetch.bytes in the consumer config to a value above 2 MB (like max.request.size, it has to cover the serialized message, not just the payload). Consumption succeeds.
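A minimal consumer sketch with that change; as before, localhost:9092, the group id, and the topic name test are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OversizeConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "oversize-demo");           // placeholder group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        // Raise the per-partition fetch limit above the largest expected
        // message; the 1048576-byte default is what the error above reports.
        props.put("max.partition.fetch.bytes", "2100000");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<String, byte[]> record : records) {
                    System.out.printf("offset=%d size=%d%n",
                            record.offset(), record.value().length);
                }
            }
        }
    }
}
```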
But that is not the end of it. Kafka's bytes_out metric then turned abnormally high, close to the bandwidth limit, even though no client was sending data to Kafka at the time. The Kafka logs contained an error:
```
[2016-10-25 03:15:08,361] ERROR [ReplicaFetcherThread-0-1], Replication is failing due to a message that is greater than replica.fetch.max.bytes for partition [test,1]. This generally occurs when the max.message.bytes has been overridden to exceed this value and a suitably large message has also been sent. To fix this problem increase replica.fetch.max.bytes in your broker config to be equal or larger than your settings for max.message.bytes, both at a broker and topic level. (kafka.server.ReplicaFetcherThread)
```
The cause: after message.max.bytes was raised, the broker could accept 2 MB messages, but the replica fetcher failed when it tried to replicate them, because replica.fetch.max.bytes capped the largest replicable message at 1 MB. Since the replica fetcher retries endlessly, bytes_out shoots up. Setting replica.fetch.max.bytes above 2 MB and restarting the brokers resolved the problem.
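The matching line in every broker's server.properties (again, raw bytes):

```properties
# Must be >= message.max.bytes, or replication of large messages stalls.
replica.fetch.max.bytes=2100000
```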
Because the topic-level parameter max.message.bytes controls the largest message each individual topic accepts, you can avoid future restarts by setting replica.fetch.max.bytes to a comfortably large value up front (say, 10 MB). Later, if some topic needs to send and receive oversized messages, only that topic's max.message.bytes needs to change; no cluster restart is required.
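A sketch of that per-topic override, using the ZooKeeper-based tooling of this Kafka era (newer releases take --bootstrap-server instead); localhost:2181 and the topic name test are placeholders:

```bash
# One-time broker setting in server.properties (10 MB of headroom):
#   replica.fetch.max.bytes=10485760
# Per-topic limit, changed on the fly with no broker restart:
bin/kafka-configs.sh --zookeeper localhost:2181 \
    --entity-type topics --entity-name test \
    --alter --add-config max.message.bytes=2100000
```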