CleanerConfig
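numThreads: Int = 1
Number of cleaner threads. Each thread calls cleanerManager.grabFilthiestLog(), which returns the LogToClean object for the topicAndPartition that most needs cleaning, and then starts cleaning that log.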
dedupeBufferSize: Long = 4 * 1024 * 1024L
dedupeBufferLoadFactor: Double = 0.9d
hashAlgorithm: String = "MD5"
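offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads,
    Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm)
Each cleaner thread therefore gets an equal share of dedupeBufferSize for its offsetMap, capped at Int.MaxValue bytes.
hashAlgorithm is the algorithm used to convert each key in the map into a hash.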
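The sketch below shows how the per-thread share of dedupeBufferSize could turn into offsetMap capacity. It assumes that each slot holds the key's digest plus an 8-byte offset, so slots = memory / (hashSize + 8). That is how I recall Kafka's SkimpyOffsetMap working. The object OffsetMapSizing and its methods are hypothetical names used only for illustration.

```scala
import java.security.MessageDigest

object OffsetMapSizing {
  // Per-thread share of the dedupe buffer, capped at Int.MaxValue as in the offsetMap line above.
  def memoryPerThread(dedupeBufferSize: Long, numThreads: Int): Int =
    math.min(dedupeBufferSize / numThreads, Int.MaxValue.toLong).toInt

  // Assumption: one slot = digest bytes of the key + 8 bytes for the Long offset.
  def slots(memory: Int, hashAlgorithm: String): Int = {
    val hashSize = MessageDigest.getInstance(hashAlgorithm).getDigestLength
    memory / (hashSize + 8)
  }

  // The map stores a digest of each key rather than the key itself.
  def keyHash(key: Array[Byte], hashAlgorithm: String): Array[Byte] =
    MessageDigest.getInstance(hashAlgorithm).digest(key)
}
```

Take the defaults: dedupeBufferSize = 4 MiB, numThreads = 1, and MD5, whose digest is 16 bytes. These give 4194304 / 24 = 174762 slots. With dedupeBufferLoadFactor = 0.9, one pass therefore covers roughly 157,000 offsets before the minStopOffset bound is reached.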
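The messages cleaned in one pass start at offset 0 and end at endOffset.
endOffset is the offset returned by buildOffsetMap. Two factors determine it:
the bound (cleanable.firstDirtyOffset + map.slots * this.dupBufferLoadFactor): every dirty segment whose base offset is at or below this bound is loaded into offsetMap;
map.utilization: beyond that bound, a segment's (message.key, entry.offset) pairs are still written into offsetMap only while map.utilization < this.dupBufferLoadFactor.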
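1). Get the set of segments to clean: take every segment from cleanable.firstDirtyOffset up to log.activeSegment.baseOffset, and call this set dirty.
2). From the size of offsetMap, compute the offset at which one cleaning pass may stop, and call it minStopOffset (start is cleanable.firstDirtyOffset):
minStopOffset = (start + map.slots * this.dupBufferLoadFactor).toLong
3). Iterate over dirty.
If either of the two conditions holds, segment.baseOffset <= minStopOffset || map.utilization < this.dupBufferLoadFactor,
call buildOffsetMapForSegment to store that segment's keys and offsets in offsetMap.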
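buildOffsetMapForSegment reads the segment's messages into Cleaner.readBuffer. It then creates a ByteBufferMessageSet from Cleaner.readBuffer.
Each entry in that set has type MessageAndOffset(message: Message, offset: Long).
For each entry, offsetMap stores map.put(message.key, entry.offset). A key that appears again therefore overwrites its earlier offset.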
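The steps above can be sketched in Scala over an in-memory model. Segment, Message and OffsetMap here are hypothetical stand-ins, not Kafka's LogSegment, MessageAndOffset or SkimpyOffsetMap. This map uses slots only to compute utilization and does not enforce a hard capacity.

```scala
import scala.collection.mutable

object BuildOffsetMapSketch {
  // Hypothetical in-memory stand-ins for a message with its offset and for a log segment.
  final case class Message(key: String, offset: Long)
  final case class Segment(baseOffset: Long, messages: Seq[Message])

  // Hypothetical stand-in for SkimpyOffsetMap: keeps the latest offset per key;
  // `slots` is only used to report utilization, not enforced as a hard limit.
  final class OffsetMap(val slots: Int) {
    private val entries = mutable.HashMap.empty[String, Long]
    def put(key: String, offset: Long): Unit = entries(key) = offset
    def get(key: String): Option[Long] = entries.get(key)
    def utilization: Double = entries.size.toDouble / slots
  }

  // map.put(message.key, entry.offset) for every message; returns the last offset seen.
  private def buildOffsetMapForSegment(segment: Segment, map: OffsetMap): Long = {
    var offset = segment.baseOffset
    for (m <- segment.messages) {
      map.put(m.key, m.offset)
      offset = m.offset
    }
    offset
  }

  // `dirty` holds the segments from firstDirtyOffset (= start) up to, not including, the active segment.
  def buildOffsetMap(dirty: Seq[Segment], start: Long, map: OffsetMap, loadFactor: Double): Long = {
    val minStopOffset = (start + map.slots * loadFactor).toLong
    var offset = start
    for (segment <- dirty) {
      if (segment.baseOffset <= minStopOffset || map.utilization < loadFactor)
        offset = buildOffsetMapForSegment(segment, map)
    }
    offset
  }
}
```

Suppose slots = 4 and the load factor is 0.75. Then minStopOffset is 3, and the first two segments below are loaded. After that, utilization has reached 0.75, so the third segment is skipped and the pass ends at offset 4.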
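ioBufferSize: Int = 1024 * 1024
Cleaner.readBuffer
Cleaner.writeBuffer
Each of these two buffers is config.ioBufferSize / config.numThreads / 2 bytes.
When cleaning a topicAndPartition, data is read from the old segment into Cleaner.readBuffer. The messages that are kept are written first into Cleaner.writeBuffer and then into the new segment.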
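The sketch below shows the buffer sizing. It assumes heap ByteBuffers are used for both buffers. The object and method names are hypothetical.

```scala
import java.nio.ByteBuffer

object CleanerBuffers {
  // Size of each of the two per-thread buffers (readBuffer and writeBuffer).
  def perBufferSize(ioBufferSize: Int, numThreads: Int): Int = ioBufferSize / numThreads / 2

  // Allocates one thread's read and write buffers; heap allocation is an assumption of this sketch.
  def allocate(ioBufferSize: Int, numThreads: Int): (ByteBuffer, ByteBuffer) = {
    val size = perBufferSize(ioBufferSize, numThreads)
    (ByteBuffer.allocate(size), ByteBuffer.allocate(size))
  }
}
```

Each of the numThreads threads gets two buffers, so the total stays at about ioBufferSize. With the default of 1 MiB and one thread, each buffer is 524288 bytes. With four threads, each buffer is 131072 bytes.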
maxMessageSize: Int = 32 * 1024 * 1024
maxIoBytesPerSecond: Double = Double.MaxValue
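backOffMs: Long = 15 * 1000
The thread calls cleanerManager.grabFilthiestLog() to get the LogToClean object for the topicAndPartition that most needs cleaning.
If that LogToClean is empty, no log currently meets the cleaning conditions. The thread then calls backOffWaitLatch.await(config.backOffMs, TimeUnit.MILLISECONDS) and waits before trying again.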
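The sketch below covers one iteration of a cleaner thread's loop. It assumes grabFilthiestLog returns an Option, with None meaning no LogToClean is available. The function parameters grabFilthiestLog and clean are hypothetical hooks. The only real library call is the JDK's CountDownLatch.await(timeout, unit).

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

object CleanerLoopSketch {
  // One iteration of a cleaner thread. `grabFilthiestLog` and `clean` are hypothetical hooks
  // standing in for cleanerManager.grabFilthiestLog() and the actual cleaning work.
  // Returns true if a log was cleaned, false if the thread backed off.
  def cleanOrSleep[L](grabFilthiestLog: () => Option[L],
                      clean: L => Unit,
                      backOffWaitLatch: CountDownLatch,
                      backOffMs: Long): Boolean =
    grabFilthiestLog() match {
      case Some(log) =>
        clean(log)
        true
      case None =>
        // Nothing eligible: wait up to backOffMs; counting the latch down wakes the thread early.
        backOffWaitLatch.await(backOffMs, TimeUnit.MILLISECONDS)
        false
    }
}
```

Waiting on a latch rather than calling Thread.sleep presumably lets a shutdown path count the latch down and wake every backing-off thread at once.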
enableCleaner: Boolean = true
Whether the cleaner is allowed to run.
LogConfig
segmentSize: Int = Defaults.SegmentSize
Maximum size in bytes of each segment in the log. Once a segment would exceed this size, a new segment is created.
segmentMs: Long = Defaults.SegmentMs
If more than segmentMs has passed since the segment currently being written was created, a new segment is created.
segmentJitterMs: Long = Defaults.SegmentJitterMs
This setting keeps segments from all rolling over at the same moment once config.segmentMs elapses (rolling over means starting a new segment). segmentJitterMs staggers the rolls by a random amount:
randomSegmentJitter = if (segmentJitterMs == 0) 0 else Utils.abs(scala.util.Random.nextInt()) % math.min(segmentJitterMs, segmentMs)
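The sketch below computes the jitter and shows a roll check that uses it. The roll rule is an assumption about how the jitter is applied: roll on size overflow, or when the segment is older than segmentMs minus the jitter. The expression `& 0x7fffffff` stands in for Utils.abs.

```scala
import scala.util.Random

object SegmentRollSketch {
  // Mirrors randomSegmentJitter above; `& 0x7fffffff` is a non-negative abs used in place of Utils.abs.
  def randomSegmentJitter(segmentJitterMs: Long, segmentMs: Long, rnd: Random): Long =
    if (segmentJitterMs == 0) 0L
    else (rnd.nextInt() & 0x7fffffff) % math.min(segmentJitterMs, segmentMs)

  // Assumed roll rule: roll when the next append would push the segment past segmentSize,
  // or when a non-empty segment is older than segmentMs minus its jitter.
  def shouldRoll(segmentBytes: Int, appendBytes: Int, segmentSize: Int,
                 createdMs: Long, nowMs: Long, segmentMs: Long, jitterMs: Long): Boolean =
    segmentBytes.toLong + appendBytes > segmentSize ||
      (segmentBytes > 0 && nowMs - createdMs > segmentMs - jitterMs)
}
```

Each segment draws its own jitter, so segments created at the same time reach their age limit at different moments.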
flushInterval: Long = Defaults.FlushInterval
When the number of messages written since the last flush reaches flushInterval, the log of this topicAndPartition is flushed.
Messages written since the last flush: unflushedMessages() = this.logEndOffset - this.recoveryPoint
Flush condition: unflushedMessages >= config.flushInterval
The flush runs up to offset = nextOffsetMetadata.messageOffset:
1. Flush, one by one, the segments that cover the absolute partition offsets from this.recoveryPoint to offset. segment.flush flushes both the index file and the log file.
2. Set this.recoveryPoint to offset.
3. Set this.lastflushedTime to the current time, time.milliseconds.
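The count-based flush described above is sketched here over hypothetical Segment and LogState classes, not Kafka's types. The variable offset plays the role of nextOffsetMetadata.messageOffset. This sketch assumes it equals logEndOffset.

```scala
object FlushSketch {
  // Hypothetical segment; in the text, segment.flush flushes both the .log and the .index file.
  final class Segment(val baseOffset: Long) {
    var flushCount = 0
    def flush(): Unit = flushCount += 1
  }

  // Hypothetical mutable log state holding the fields named in the text.
  final class LogState(val segments: Vector[Segment], var logEndOffset: Long,
                       var recoveryPoint: Long, var lastFlushedTime: Long) {
    def unflushedMessages: Long = logEndOffset - recoveryPoint

    // Segments that may hold offsets in [from, to): the one containing `from` plus later ones starting before `to`.
    private def segmentsInRange(from: Long, to: Long): Seq[Segment] = {
      val firstIdx = math.max(0, segments.lastIndexWhere(_.baseOffset <= from))
      segments.drop(firstIdx).takeWhile(_.baseOffset < to)
    }

    def flush(offset: Long, nowMs: Long): Unit =
      if (offset > recoveryPoint) {
        segmentsInRange(recoveryPoint, offset).foreach(_.flush()) // step 1
        recoveryPoint = offset                                      // step 2
        lastFlushedTime = nowMs                                     // step 3
      }
  }

  // Called after an append; returns true if a flush was performed.
  def maybeFlush(log: LogState, flushInterval: Long, nowMs: Long): Boolean =
    if (log.unflushedMessages >= flushInterval) { log.flush(log.logEndOffset, nowMs); true }
    else false
}
```

In the test below, logEndOffset is 150 and recoveryPoint is 50. The flush therefore touches only the segments starting at 0 and 100, and leaves the segment starting at 200 alone.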
flushMs: Long = Defaults.FlushMs
If the time since the log's last flush is at least log.config.flushMs, the log is flushed. The flush time is then recorded with
lastflushedTime.set(time.milliseconds)
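The sketch below shows the time-based check. It assumes a periodic task compares the time elapsed since lastflushedTime against flushMs, which is how I recall Kafka's LogManager.flushDirtyLogs working. LogFlushState and flushDue are hypothetical names. With the default FlushMs = Long.MaxValue, a log is effectively never due on time alone.

```scala
object TimeFlushSketch {
  // Hypothetical view of a log: its name and the time of its last flush.
  final case class LogFlushState(name: String, lastFlushedTime: Long)

  // A log is due when at least flushMs has passed since its last flush.
  def isDue(log: LogFlushState, nowMs: Long, flushMs: Long): Boolean =
    nowMs - log.lastFlushedTime >= flushMs

  // One pass of a periodic checker: returns the logs to flush, with lastFlushedTime reset to nowMs.
  def flushDue(logs: Seq[LogFlushState], nowMs: Long, flushMs: Long): Seq[LogFlushState] =
    logs.filter(isDue(_, nowMs, flushMs)).map(_.copy(lastFlushedTime = nowMs))
}
```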
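retentionSize: Long = Defaults.RetentionSize
retentionMs: Long = Defaults.RetentionMs
Segments are cleaned up according to their modification time and the total byte size of the log:
1. Delete the segments in the log directory whose modification time is more than retentionMs in the past.
2. If the total byte size of the log's segments exceeds log.config.retentionSize, delete segments, oldest first, for as long as the remaining total stays at or above log.config.retentionSize.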
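Both retention rules are sketched here over a hypothetical Seg summary, with segments ordered oldest first. The size rule follows the stop condition stated above. As I recall Kafka doing, only a leading run of segments is removed. Details such as never deleting the active segment are omitted.

```scala
object RetentionSketch {
  // Hypothetical segment summary; the Seq passed in is ordered oldest first.
  final case class Seg(baseOffset: Long, sizeBytes: Long, lastModified: Long)

  // Rule 1: the leading run of segments last modified more than retentionMs ago.
  def expired(segments: Seq[Seg], nowMs: Long, retentionMs: Long): Seq[Seg] =
    segments.takeWhile(s => nowMs - s.lastModified > retentionMs)

  // Rule 2: leading segments whose removal still leaves the total >= retentionSize.
  def overSize(segments: Seq[Seg], retentionSize: Long): Seq[Seg] = {
    val total = segments.map(_.sizeBytes).sum
    if (retentionSize < 0 || total < retentionSize) Seq.empty
    else {
      var diff = total - retentionSize // bytes that may still be removed
      segments.takeWhile { s =>
        if (diff - s.sizeBytes >= 0) { diff -= s.sizeBytes; true } else false
      }
    }
  }
}
```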
maxMessageSize: Int = Defaults.MaxMessageSize
Maximum size in bytes of a single message.
maxIndexSize: Int = Defaults.MaxIndexSize
Every segment has its own index file, and that file may not exceed maxIndexSize bytes.
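indexInterval: Int = Defaults.IndexInterval
As data is written to the log, an entry is added to the index whenever more than indexInterval bytes have been written since the previous entry. Each entry records an offset and its position in the log file.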
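The sketch below models a sparse index under two assumptions. First, each index entry takes 8 bytes (a 4-byte relative offset plus a 4-byte file position), so maxIndexSize / 8 bounds the number of entries. Second, an entry is added when more than indexInterval bytes have been appended since the previous entry. SegmentWriter is hypothetical. In Kafka, I believe a full index causes a new segment to be rolled; this sketch simplifies that to skipping the entry.

```scala
import scala.collection.mutable.ArrayBuffer

object SparseIndexSketch {
  // Assumption: 8 bytes per index entry (4-byte relative offset + 4-byte file position).
  val EntryBytes = 8

  def maxEntries(maxIndexSize: Int): Int = maxIndexSize / EntryBytes

  // Hypothetical writer that indexes the first message appended after more than
  // indexInterval bytes have accumulated since the previous index entry.
  final class SegmentWriter(indexInterval: Int, maxIndexSize: Int) {
    val index = ArrayBuffer.empty[(Long, Int)] // (offset, position in the .log file)
    private var logBytes = 0
    private var bytesSinceLastIndexEntry = 0

    def isIndexFull: Boolean = index.size >= maxEntries(maxIndexSize)

    def append(offset: Long, messageBytes: Int): Unit = {
      if (bytesSinceLastIndexEntry > indexInterval && !isIndexFull) {
        index += ((offset, logBytes))
        bytesSinceLastIndexEntry = 0
      }
      logBytes += messageBytes
      bytesSinceLastIndexEntry += messageBytes
    }
  }
}
```

With the defaults (MaxIndexSize = 1 MiB, IndexInterval = 4096), the index holds up to 131072 entries. It gains roughly one entry per 4 KiB of log data.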
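fileDeleteDelayMs: Long = Defaults.FileDeleteDelayMs
Segments are deleted on a timer:
1. Add a .deleted suffix to the segment's log and index files.
2. Schedule a timer task that calls segment.delete() after config.fileDeleteDelayMs.
segment.delete() deletes the specified files that carry the .deleted suffix.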
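The two-step deletion is sketched here with java.nio.file and a ScheduledExecutorService. The file names and the single-threaded scheduler are assumptions for illustration. Kafka uses its own scheduler and segment classes.

```scala
import java.nio.file.{Files, Path}
import java.util.concurrent.{Executors, ScheduledExecutorService, ScheduledFuture, TimeUnit}

object AsyncSegmentDelete {
  val DeletedSuffix = ".deleted"

  // Step 1: rename e.g. 00000000000000000000.log to 00000000000000000000.log.deleted.
  def markDeleted(file: Path): Path = {
    val target = file.resolveSibling(file.getFileName.toString + DeletedSuffix)
    Files.move(file, target)
  }

  // Step 2: after fileDeleteDelayMs, physically delete the renamed files.
  def scheduleDelete(files: Seq[Path], scheduler: ScheduledExecutorService,
                     fileDeleteDelayMs: Long): ScheduledFuture[_] = {
    val renamed = files.map(markDeleted)
    scheduler.schedule(new Runnable {
      def run(): Unit = renamed.foreach(p => Files.deleteIfExists(p))
    }, fileDeleteDelayMs, TimeUnit.MILLISECONDS)
  }
}
```

The rename presumably takes the files out of the live log right away. The delay then gives in-flight reads time to finish before the data is removed.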
segment.lastModified
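deleteRetentionMs: Long = Defaults.DeleteRetentionMs
This setting is used to compute a timestamp called deleteHorizonMs. In segments last modified before this timestamp, delete markers (messages with a null value) are dropped outright instead of being carried into the cleaned log.
1) Take the set of segments covering offsets 0 to cleanable.firstDirtyOffset.
2) Take the last segment of that set, which is the most recently modified one, and compute
deleteHorizonMs = seg.lastModified - log.config.deleteRetentionMs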
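The sketch below computes the horizon and shows how it would be used per segment. Seg is hypothetical. Two further assumptions belong to this sketch: it returns 0 when no segment lies below firstDirtyOffset, and it keeps delete markers only in segments with lastModified > deleteHorizonMs.

```scala
object DeleteHorizonSketch {
  // Hypothetical segment summary, ordered by baseOffset.
  final case class Seg(baseOffset: Long, lastModified: Long)

  // Steps 1) and 2): the last already-cleaned segment (below firstDirtyOffset) sets the horizon.
  def deleteHorizonMs(segments: Seq[Seg], firstDirtyOffset: Long, deleteRetentionMs: Long): Long =
    segments.filter(_.baseOffset < firstDirtyOffset).lastOption match {
      case None      => 0L // assumption: no cleaned segment yet
      case Some(seg) => seg.lastModified - deleteRetentionMs
    }

  // Assumption: delete markers are kept only in segments modified after the horizon.
  def retainDeletes(seg: Seg, horizonMs: Long): Boolean = seg.lastModified > horizonMs
}
```

For example, suppose the last segment below firstDirtyOffset was modified at 5000 ms and deleteRetentionMs is 3000. Then deleteHorizonMs is 2000. A segment last modified at 1000 ms loses its delete markers, while one modified at 5000 ms keeps them.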
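minCleanableRatio: Double = Defaults.MinCleanableDirtyRatio
Minimum ratio of dirty (not yet cleaned) bytes to total bytes across a log's segments for the log to be eligible for cleaning.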
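The sketch below shows how minCleanableRatio could gate the choice made by grabFilthiestLog. It assumes two things, as I recall from Kafka's LogCleanerManager. The ratio is dirty bytes over total bytes, and the log with the highest ratio above the threshold is chosen. LogToClean here is a simplified hypothetical case class.

```scala
object FilthiestLogSketch {
  // Hypothetical per-log byte counts: clean part before firstDirtyOffset, dirty part after it.
  final case class LogToClean(name: String, cleanBytes: Long, dirtyBytes: Long) {
    def totalBytes: Long = cleanBytes + dirtyBytes
    def cleanableRatio: Double = if (totalBytes == 0) 0.0 else dirtyBytes.toDouble / totalBytes
  }

  // The eligible log (ratio above minCleanableRatio) with the highest dirty ratio, if any.
  def grabFilthiestLog(logs: Seq[LogToClean], minCleanableRatio: Double): Option[LogToClean] = {
    val eligible = logs.filter(_.cleanableRatio > minCleanableRatio)
    if (eligible.isEmpty) None else Some(eligible.maxBy(_.cleanableRatio))
  }
}
```

With the default ratio of 0.5, a log that is 70% dirty is picked over one that is 60% dirty. A log that is only 10% dirty is not cleaned at all.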
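compact: Boolean = Defaults.Compact
Whether old data is cleaned by log compaction, which keeps only the latest message for each key, rather than by deletion.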
uncleanLeaderElectionEnable: Boolean = Defaults.UncleanLeaderElectionEnable
minInSyncReplicas: Int = Defaults.MinInSyncReplicas
Defaults
SegmentSize = 1024 * 1024
config.segmentMs (@param segmentMs) is how long a segment lives before it is rolled.
To keep segments from all rolling at once after config.segmentMs, @param segmentJitterMs staggers the rolls.
SegmentMs = Long.MaxValue
SegmentJitterMs = 0L
FlushInterval = Long.MaxValue
FlushMs = Long.MaxValue
RetentionSize = Long.MaxValue
RetentionMs = Long.MaxValue
MaxMessageSize = Int.MaxValue
MaxIndexSize = 1024 * 1024
IndexInterval = 4096
FileDeleteDelayMs = 60 * 1000L
DeleteRetentionMs = 24 * 60 * 60 * 1000L
MinCleanableDirtyRatio = 0.5
Compact = false
UncleanLeaderElectionEnable = true
MinInSyncReplicas = 1
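The defaults are collected below into a Scala object so that the arithmetic is explicit. The name LogConfigDefaults is hypothetical; the text only calls it Defaults.

```scala
object LogConfigDefaults {
  val SegmentSize: Int = 1024 * 1024                  // 1 MiB
  val SegmentMs: Long = Long.MaxValue
  val SegmentJitterMs: Long = 0L
  val FlushInterval: Long = Long.MaxValue
  val FlushMs: Long = Long.MaxValue
  val RetentionSize: Long = Long.MaxValue
  val RetentionMs: Long = Long.MaxValue
  val MaxMessageSize: Int = Int.MaxValue
  val MaxIndexSize: Int = 1024 * 1024                 // 1 MiB
  val IndexInterval: Int = 4096
  val FileDeleteDelayMs: Long = 60 * 1000L            // one minute
  val DeleteRetentionMs: Long = 24 * 60 * 60 * 1000L  // one day
  val MinCleanableDirtyRatio: Double = 0.5
  val Compact: Boolean = false
  val UncleanLeaderElectionEnable: Boolean = true
  val MinInSyncReplicas: Int = 1
}
```

Spelled out, SegmentSize and MaxIndexSize are 1 MiB each. FileDeleteDelayMs is one minute, and DeleteRetentionMs is 86,400,000 ms, or one day.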