Overview
A Log is made up of a sequence of LogSegments, and each LogSegment has a base offset: the offset of the first message in that segment.
New LogSegments are created according to the Log's configuration policy, which controls both the byte size a LogSegment may grow to and the time interval after which a new segment is rolled.
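As a rough illustration, these two triggers correspond to the topic-level configs segment.bytes (default 1 GiB) and segment.ms (default 7 days). Below is a minimal sketch of the roll decision, with illustrative parameter names rather than Kafka's actual fields:
// Sketch only: two roll triggers, size-based and time-based
def shouldRoll(segmentSizeBytes: Long, segmentAgeMs: Long,
               maxSegmentBytes: Long, // segment.bytes, default 1 GiB
               maxSegmentMs: Long     // segment.ms, default 7 days
              ): Boolean =
  segmentSizeBytes >= maxSegmentBytes || segmentAgeMs >= maxSegmentMs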
Member variables
- dir
The directory in which the Log's segment files are created.
- logStartOffset
The earliest offset that may be exposed to clients. logStartOffset can be updated by:
- a user's DeleteRecordsRequest
- the broker's log retention
- the broker's log truncation
logStartOffset is used in the following situations (a sketch of both rules follows this member list):
- Log deletion: a LogSegment whose next offset is less than or equal to the Log's logStartOffset can be deleted. This may also trigger a log roll if the active segment is deleted.
- ListOffsetRequest: the Log's logStartOffset is returned in the response. To avoid an OffsetOutOfRange exception, we must ensure that logStartOffset <= the Log's highWatermark.
- activeSegment
The most recent segment among those managed by this Log (the "active" segment). A Log has only one active segment at a time; all the other segments have already been persisted to disk.
- logEndOffset
The offset of the next message to be appended, i.e. the next offset of the activeSegment.
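Here is a minimal, self-contained sketch of the two logStartOffset rules above. It is a simplified stand-in, not Kafka's actual Log implementation, though Kafka's Log does expose a maybeIncrementLogStartOffset method with a similar forward-only check:
object LogStartOffsetSketch {
  @volatile private var logStartOffset: Long = 0L

  // DeleteRecordsRequest, retention and truncation all advance the start
  // offset through a forward-only update like this one
  def maybeIncrementLogStartOffset(newLogStartOffset: Long): Unit =
    if (newLogStartOffset > logStartOffset) logStartOffset = newLogStartOffset

  // A segment is deletable once all of its offsets fall below logStartOffset,
  // i.e. its next offset (= the following segment's base offset) <= logStartOffset
  def deletable(segmentNextOffset: Long): Boolean =
    segmentNextOffset <= logStartOffset
}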
@threadsafe
class Log(@volatile var dir: File,
          @volatile var config: LogConfig,
          @volatile var logStartOffset: Long,
          @volatile var recoveryPoint: Long,
          scheduler: Scheduler,
          brokerTopicStats: BrokerTopicStats,
          val time: Time,
          val maxProducerIdExpirationMs: Int,
          val producerIdExpirationCheckIntervalMs: Int,
          val topicPartition: TopicPartition,
          val producerStateManager: ProducerStateManager,
          logDirFailureChannel: LogDirFailureChannel) extends Logging with KafkaMetricsGroup {
  /* The earliest offset which is part of an incomplete transaction. This is used to compute the
   * last stable offset (LSO) in ReplicaManager. Note that it is possible that the "true" first unstable offset
   * gets removed from the log (through record or segment deletion). In this case, the first unstable offset
   * will point to the log start offset, which may actually be either part of a completed transaction or not
   * part of a transaction at all. However, since we only use the LSO for the purpose of restricting the
   * read_committed consumer to fetching decided data (i.e. committed, aborted, or non-transactional), this
   * temporary abuse seems justifiable and saves us from scanning the log after deletion to find the first offsets
   * of each ongoing transaction in order to compute a new first unstable offset. It is possible, however,
   * that this could result in disagreement between replicas depending on when they began replicating the log.
   * In the worst case, the LSO could be seen by a consumer to go backwards.
   */
  @volatile var firstUnstableOffset: Option[LogOffsetMetadata] = None

  /* Keep track of the current high watermark in order to ensure that segments containing offsets at or above it are
   * not eligible for deletion. This means that the active segment is only eligible for deletion if the high watermark
   * equals the log end offset (which may never happen for a partition under consistent load). This is needed to
   * prevent the log start offset (which is exposed in fetch responses) from getting ahead of the high watermark.
   */
  @volatile private var replicaHighWatermark: Option[Long] = None

  /* the actual segments of the log */
  private val segments: ConcurrentNavigableMap[java.lang.Long, LogSegment] = new ConcurrentSkipListMap[java.lang.Long, LogSegment]

  // Visible for testing
  @volatile var leaderEpochCache: Option[LeaderEpochFileCache] = None

  /**
   * The active segment that is currently taking appends
   */
  def activeSegment = segments.lastEntry.getValue

  /**
   * The offset metadata of the next message that will be appended to the log
   */
  def logEndOffsetMetadata: LogOffsetMetadata = nextOffsetMetadata

  /**
   * The offset of the next message that will be appended to the log
   */
  def logEndOffset: Long = nextOffsetMetadata.messageOffset
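Because segments is a ConcurrentSkipListMap keyed by each segment's base offset, locating the segment that contains a given offset is a single floor lookup. A small, self-contained sketch of that idea follows; Seg is a stand-in for LogSegment, and Kafka's Log applies the same floorEntry pattern when serving reads:
import java.util.concurrent.ConcurrentSkipListMap

object SegmentLookupSketch {
  case class Seg(baseOffset: Long) // stand-in for LogSegment

  val segments = new ConcurrentSkipListMap[java.lang.Long, Seg]
  Seq(0L, 100L, 250L).foreach(b => segments.put(b, Seg(b)))

  // floorEntry returns the entry with the greatest key <= the target offset,
  // i.e. the segment whose range would contain that offset
  def segmentFor(offset: Long): Option[Seg] =
    Option(segments.floorEntry(offset)).map(_.getValue)

  def main(args: Array[String]): Unit =
    assert(segmentFor(137L).contains(Seg(100L))) // offset 137 lives in the segment based at 100
}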
Appending records
Append the given records to the Log's active segment, rolling over to a new segment if necessary.
This method assigns an offset to each record; however, if the assignOffsets parameter is false, it only checks that the existing offsets are valid.
The main steps of the method are as follows (a condensed sketch follows the list):
- Validate the incoming messages, mainly checking message sizes and CRC checksums;
- Iterate over the records, assigning each one an offset that increases monotonically from the Log's current LEO, and perform further per-message validation;
- Every message carries a timestamp; if the timestamp type is configured as LogAppendTime, logAppendTime is set to the current time;
- Check whether the active segment is full; if it is, roll the log and create a new LogSegment.
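Below is a condensed, self-contained sketch of this flow. Record, Segment and MiniLog are simplified stand-ins rather than Kafka's actual classes; the real Log.append additionally handles transactions, leader epochs and locking:
case class Record(var offset: Long, var timestamp: Long, payload: Array[Byte])

class Segment(val baseOffset: Long) {
  private val buf = scala.collection.mutable.ArrayBuffer.empty[Record]
  def sizeInBytes: Int = buf.iterator.map(_.payload.length).sum
  def append(r: Record): Unit = buf += r
}

class MiniLog(segmentMaxBytes: Int, useLogAppendTime: Boolean) {
  private var segments = List(new Segment(0L))
  private var logEndOffset = 0L // LEO: the offset of the next message to append

  def activeSegment: Segment = segments.head

  def append(records: Seq[Record]): Unit = {
    val now = System.currentTimeMillis()
    records.foreach { r =>
      // step 1: validation (size check here; CRC checks omitted)
      require(r.payload.length <= segmentMaxBytes, "record too large")
      // step 2: assign offsets monotonically, starting from the current LEO
      r.offset = logEndOffset
      // step 3: overwrite the timestamp when configured with LogAppendTime
      if (useLogAppendTime) r.timestamp = now
      // step 4: roll to a new segment if the active one would overflow
      if (activeSegment.sizeInBytes + r.payload.length > segmentMaxBytes)
        segments = new Segment(logEndOffset) :: segments
      activeSegment.append(r)
      logEndOffset += 1
    }
  }
}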