IV. Replication Subsystem
1. Replica
Replica is the smallest unit in which Kafka distributes data. Its main code is as follows:
class Replica(val brokerId: Int,
              val partition: Partition,
              time: Time = SystemTime,
              initialHighWatermarkValue: Long = 0L,
              val log: Option[Log] = None) extends Logging {
  // the high watermark offset value, in non-leader replicas only its message offsets are kept
  @volatile private[this] var highWatermarkMetadata: LogOffsetMetadata = new LogOffsetMetadata(initialHighWatermarkValue)
  // the log end offset value, kept in all replicas;
  // for local replica it is the log's end offset, for remote replicas its value is only updated by follower fetch
  @volatile private[this] var logEndOffsetMetadata: LogOffsetMetadata = LogOffsetMetadata.UnknownOffsetMetadata
  // the time when log offset is updated
  private[this] val logEndOffsetUpdateTimeMsValue = new AtomicLong(time.milliseconds)
  val topic = partition.topic
  val partitionId = partition.partitionId
  ……………………………………
  override def equals(that: Any): Boolean = {
    if(!(that.isInstanceOf[Replica]))
      return false
    val other = that.asInstanceOf[Replica]
    if(topic.equals(other.topic) && brokerId == other.brokerId && partition.equals(other.partition))
      return true
    false
  }
  override def hashCode(): Int = {
    31 + topic.hashCode() + 17*brokerId + partition.hashCode()
  }
  ……………………………………
}
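Its main members are the following:
highWatermarkMetadata: the high watermark (HW). It is an offset, but it is not a consumer's position: it marks the offset up to which messages have been replicated to all in-sync replicas, and consumers can read only up to it. A consumer's own progress is a separate offset, recorded per (consumer, topic, partition). As the code comment notes, non-leader replicas keep only the message offset of the HW.
logEndOffsetMetadata: the log end offset (LEO), the offset at which the next message would be appended. If the replica is local to this broker, it is the end offset of the local log; otherwise it is the offset this broker learned from that follower's fetch requests. Note also that both of these fields carry the @volatile annotation, so a value written by one thread is visible to later reads by other threads.
logEndOffsetUpdateTimeMsValue: the time at which the log end offset was last updated.
topic: the topic of the partition.
partitionId: the id of the partition.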
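How HW and LEO relate can be sketched with a small model. This is only an illustration under stated assumptions: ReplicaState and HighWatermarkSketch are hypothetical names, not Kafka classes, and the sketch assumes the leader sets HW to the smallest LEO among the in-sync replicas.

```scala
// Hypothetical, simplified model of a replica's state; not Kafka's Replica class.
final case class ReplicaState(brokerId: Int, logEndOffset: Long)

object HighWatermarkSketch {
  // Assumption: HW is the smallest LEO among the in-sync replicas,
  // i.e. the offset every in-sync replica has already reached.
  def highWatermark(inSync: Seq[ReplicaState]): Long = {
    require(inSync.nonEmpty, "the in-sync set must contain at least the leader")
    inSync.map(_.logEndOffset).min
  }

  def main(args: Array[String]): Unit = {
    val inSync = Seq(
      ReplicaState(brokerId = 1, logEndOffset = 120L), // leader (local replica)
      ReplicaState(brokerId = 2, logEndOffset = 100L), // follower, lagging
      ReplicaState(brokerId = 3, logEndOffset = 110L)  // follower
    )
    println(highWatermark(inSync)) // prints 100
  }
}
```

In this model, messages at offsets 100 to 119 exist on the leader but sit above the HW, so they would not yet be visible to consumers.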
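The way this class overrides equals is also worth studying: it overrides hashCode along with equals, which is the rigorous practice the textbooks recommend, since two objects that compare equal must also produce the same hash code for hash-based collections to treat them as one.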
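A minimal sketch of why the two overrides belong together, using a hypothetical stand-in class (ReplicaKey is not part of Kafka) that is keyed on topic, partition id, and broker id in the same spirit as Replica:

```scala
import scala.collection.mutable

// Hypothetical stand-in for Replica; identity is (topic, partitionId, brokerId).
class ReplicaKey(val topic: String, val partitionId: Int, val brokerId: Int) {
  override def equals(that: Any): Boolean = that match {
    case other: ReplicaKey =>
      topic == other.topic && partitionId == other.partitionId && brokerId == other.brokerId
    case _ => false
  }
  // Built from the same fields as equals, so equal objects share a hash code.
  override def hashCode(): Int = 31 + topic.hashCode + 17 * brokerId + partitionId
}

object EqualsHashCodeSketch {
  def main(args: Array[String]): Unit = {
    val seen = mutable.HashSet[ReplicaKey]()
    seen += new ReplicaKey("orders", 0, 1)
    seen += new ReplicaKey("orders", 0, 1) // equal to the first, so not added again
    println(seen.size)                                      // prints 1
    println(seen.contains(new ReplicaKey("orders", 0, 1))) // prints true
  }
}
```

If hashCode were left at its default, the two equal instances would usually land in different buckets, and the set would hold both.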
2. ReplicaManager
This class provides