日志
A log for a topic named "my_topic" with two partitions consists of two directories (namely my_topic_0 and my_topic_1) populated with data files containing the messages for that topic. The format of the log files is a sequence of "log entries""; each log entry is a 4 byte integer Nstoring the message length which is followed by the Nmessage bytes. Each message is uniquely identified by a 64-bit integer offsetgiving the byte position of the start of this message in the stream of all messages ever sent to that topic on that partition. The on-disk format of each message is given below. Each log file is named with the offset of the first message it contains. So the first file created will be 00000000000.kafka, and each additional file will have an integer name roughly Sbytes from the previous file where Sis the max log file size given in the configuration.
假设有2个分区的主题“my_topic”,它将由2个目录构成(my_topic_0和my_topic_1),用于存放该主题消息的数据文件。日志文件的格式是一个“日志条目”序列。每条日志条目都由一个存储消息长度的4字节整型N和紧跟着的N字节消息组成。其中每条消息都有一个64位整型的唯一标识offset,offset(偏移量)代表了topic分区中所有消息流中该消息的起始字节位置。每条消息在磁盘上的格式如下:每个日志文件用第一条消息的offset来命名的,因此,创建的第一个文件将是00000000000.kafka,并且每个附加文件都将是上一个文件S字节的整数命名,其中S是配置中设置的最大日志文件大小。
The exact binary format for messages is versioned and maintained as a standard interface so message sets can be transfered between producer, broker, and client without recopying or conversion when desirable. This format is as follows:
消息是二进制格式并作为一个标准接口,所以消息可以在producer,broker,client之间传输,无需再copy或转换。格式如下:
On-disk format of a message
message length : 4 bytes (value: 1+4+n)
"magic" value : 1 byte
crc : 4 bytes
payload : n bytes