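Kafka maintains partition state through the PartitionStateMachine class. Below we explore this class by walking through its source code:
I. Member variables of the class: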
private val controllerContext = controller.controllerContext
private val controllerId = controller.config.brokerId
private val zkUtils = controllerContext.zkUtils
private val partitionState: mutable.Map[TopicAndPartition, PartitionState] = mutable.Map.empty
private val brokerRequestBatch = new ControllerBrokerRequestBatch(controller)
private val noOpPartitionLeaderSelector = new NoOpLeaderSelector(controllerContext)
private val stateChangeLogger = KafkaController.stateChangeLogger
this.logIdent = "[Partition state machine on Controller " + controllerId + "]: "
II. The startup() method:
def startup() {
  initializePartitionState()
  triggerOnlinePartitionStateChange()
  info("Started partition state machine with initial state -> " + partitionState.toString())
}
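This method is invoked after a successful controller election. It first initializes the state of all partitions and then triggers the state change that brings them online. The state-change listeners that drive later transitions are not registered in the body shown here.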
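1 initializePartitionState()
Initializes the state of each partition. Inside this method the partition states are stored in a Map structure: partitionState.
2 triggerOnlinePartitionStateChange()
Performs the actual partition state transition. Internally it delegates to handleStateChange(), which pattern-matches the target state against the Kafka partition states (NewPartition, OnlinePartition, and so on).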
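The two steps of startup() can be sketched with a small self-contained model. This is only an illustration under stated assumptions, not the Kafka code: the types below are stand-ins for Kafka's classes, and the rule in initialState (no recorded leader means NewPartition, a leader on a live broker means OnlinePartition, anything else means OfflinePartition) is my assumption about how the initial state is derived. The helper onlineCandidates is a hypothetical name for the set of partitions that a "trigger online" pass would try to move to OnlinePartition.

```scala
object StartupSketch {
  // Illustrative stand-ins for Kafka's state and partition types.
  sealed trait PartitionState
  case object NewPartition extends PartitionState
  case object OnlinePartition extends PartitionState
  case object OfflinePartition extends PartitionState
  case object NonExistentPartition extends PartitionState

  final case class TopicAndPartition(topic: String, partition: Int)

  // Assumed rule: no leader recorded yet => NewPartition;
  // leader recorded and its broker alive => OnlinePartition;
  // leader recorded but its broker dead => OfflinePartition.
  def initialState(leader: Option[Int], liveBrokers: Set[Int]): PartitionState =
    leader match {
      case None                                 => NewPartition
      case Some(id) if liveBrokers.contains(id) => OnlinePartition
      case Some(_)                              => OfflinePartition
    }

  // Builds the partition -> state map, playing the role of partitionState.
  def initializePartitionState(
      partitions: Seq[TopicAndPartition],
      leaders: Map[TopicAndPartition, Int],
      liveBrokers: Set[Int]): Map[TopicAndPartition, PartitionState] =
    partitions.map(tp => tp -> initialState(leaders.get(tp), liveBrokers)).toMap

  // Hypothetical helper: partitions that are not yet online and therefore
  // are candidates for a transition to OnlinePartition.
  def onlineCandidates(
      states: Map[TopicAndPartition, PartitionState]): Set[TopicAndPartition] =
    states.collect { case (tp, NewPartition | OfflinePartition) => tp }.toSet
}
```

In this model, a partition whose leader sits on a live broker is left alone, while new partitions and partitions with a dead leader are the ones handed to the OnlinePartition transition.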
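Kafka partitions can go through the following state transitions:
1 NonExistentPartition -> NewPartition:
Loads the assigned replicas from ZooKeeper into the controller cache.
2 NewPartition -> OnlinePartition
Assigns the first live replica as the leader and all live replicas as the ISR.
3 OnlinePartition,OfflinePartition -> OnlinePartition
Re-elects the leader and determines a new ISR.
OnlinePartition -> OnlinePartition happens in two situations. In the first, leaders are unevenly distributed across the cluster, so load differs noticeably from broker to broker and a leader re-election is needed; here the first replica in the AR list is preferred as the leader. In the second, the current leader is about to go offline, which triggers a re-election; here the first ISR replica after the original leader is considered for leadership.
OfflinePartition -> OnlinePartition happens when all replicas of a partition have gone offline and some of them come back online. Electing the ISR and the leader then has to take into account the relationship between the original ISR, the AR, and the current leader broker: the first live broker in the ISR is preferred, and failing that, the first live broker in the AR.
4 NewPartition,OnlinePartition,OfflinePartition -> OfflinePartition
Only marks the partition state as OfflinePartition.
5 OfflinePartition -> NonExistentPartition
Only marks the partition state as NonExistentPartition.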
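The five transition rules listed above can be condensed into a lookup from target state to the states it may be entered from. The sketch below is a simplified model, not the Kafka implementation: the state objects are re-declared locally so the snippet stands alone, and validPreviousStates and isValidTransition are hypothetical names chosen for illustration. It shows the kind of check a precondition such as assertValidTransition has to enforce before a transition is applied.

```scala
object TransitionSketch {
  // Local stand-ins for the four partition states.
  sealed trait PartitionState
  case object NonExistentPartition extends PartitionState
  case object NewPartition extends PartitionState
  case object OnlinePartition extends PartitionState
  case object OfflinePartition extends PartitionState

  // For each target state, the set of states it may be entered from,
  // taken directly from the numbered list of transitions.
  def validPreviousStates(target: PartitionState): Set[PartitionState] =
    target match {
      case NewPartition         => Set(NonExistentPartition)
      case OnlinePartition      => Set(NewPartition, OnlinePartition, OfflinePartition)
      case OfflinePartition     => Set(NewPartition, OnlinePartition, OfflinePartition)
      case NonExistentPartition => Set(OfflinePartition)
    }

  // A transition is legal only if the current state is an allowed predecessor.
  def isValidTransition(from: PartitionState, to: PartitionState): Boolean =
    validPreviousStates(to).contains(from)
}
```

For example, isValidTransition(NonExistentPartition, OnlinePartition) is false in this model: a partition has to pass through NewPartition before it can go online.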
private def handleStateChange(topic: String, partition: Int, targetState: PartitionState,
                              leaderSelector: PartitionLeaderSelector,
                              callbacks: Callbacks) {
  val topicAndPartition = TopicAndPartition(topic, partition)
  val currState = partitionState.getOrElseUpdate(topicAndPartition, NonExistentPartition)
  try {
    // Check that the current state is a valid previous state for targetState
    assertValidTransition(topicAndPartition, targetState)
    targetState match {
      case NewPartition =>
        // Switching to NewPartition is simple: just record NewPartition as the partition's state
        partitionState.put(topicAndPartition, NewPartition)
        val assignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition).mkString(",")
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s with assigned replicas %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState,
            assignedReplicas))
        // post: partition has been assigned replicas
      case OnlinePartition =>
        partitionState(topicAndPartition) match {
          case NewPartition =>
            // initialize leader and isr path for new partition, done by initializeLeaderAndIsrForPartition.
            // NewPartition -> OnlinePartition involves no leader election and no ISR selection:
            // by default the first live broker in the AR list becomes the leader, the live brokers
            // become the ISR, and the result is persisted to ZK
            initializeLeaderAndIsrForPartition(topicAndPartition)
          case OfflinePartition =>
            // OfflinePartition -> OnlinePartition elects a leader and selects the ISR
            electLeaderForPartition(topic, partition, leaderSelector)
          case OnlinePartition => // invoked when the leader needs to be re-elected
            electLeaderForPartition(topic, partition, leaderSelector)
          case _ => // should never come here since illegal previous states are checked above
        }
        partitionState.put(topicAndPartition, OnlinePartition)
        val leader = controllerContext.partitionLeadershipInfo(topicAndPartition).leaderAndIsr.leader
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s from %s to %s with leader %d"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState, leader))
        // post: partition has a leader
      case OfflinePartition =>
        // should be called when the leader for a partition is no longer alive
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState))
        partitionState.put(topicAndPartition, OfflinePartition)
        // post: partition has no alive leader
      case NonExistentPartition =>
        stateChangeLogger.trace("Controller %d epoch %d changed partition %s state from %s to %s"
          .format(controllerId, controller.epoch, topicAndPartition, currState, targetState))
        partitionState.put(topicAndPartition, NonExistentPartition)
        // post: partition state is deleted from all brokers and zookeeper
    }
  } catch {
    case t: Throwable =>
      stateChangeLogger.error("Controller %d epoch %d initiated state change for partition %s from %s to %s failed"
        .format(controllerId, controller.epoch, topicAndPartition, currState, targetState), t)
  }
}
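To make the two paths into OnlinePartition more concrete, here is a minimal sketch of the selection rules described earlier: the NewPartition case (first live replica in the AR becomes leader, live replicas become the ISR) and the OfflinePartition case (prefer the first live broker in the ISR, otherwise fall back to the first live broker in the AR). It is a simplified model under assumptions, not Kafka's selector classes: LeaderAndIsr here is a local two-field stand-in, initialLeaderAndIsr and selectOfflineLeader are hypothetical names, ZooKeeper persistence and epochs are left out, and shrinking the ISR to just the new leader in the fallback branch is my assumption.

```scala
object LeaderSelectionSketch {
  // Local stand-in holding only the leader id and the ISR.
  final case class LeaderAndIsr(leader: Int, isr: List[Int])

  // NewPartition -> OnlinePartition: no election, just take the first live
  // replica in the AR as leader and all live replicas as the ISR.
  // Returns None when no assigned replica is alive.
  def initialLeaderAndIsr(assigned: List[Int], live: Set[Int]): Option[LeaderAndIsr] = {
    val liveAssigned = assigned.filter(live.contains)
    liveAssigned.headOption.map(leader => LeaderAndIsr(leader, liveAssigned))
  }

  // OfflinePartition -> OnlinePartition: prefer the first live broker in the
  // ISR; otherwise fall back to the first live broker in the AR.
  // Assumption: in the fallback case the new ISR contains only the new leader.
  def selectOfflineLeader(assigned: List[Int], isr: List[Int], live: Set[Int]): Option[LeaderAndIsr] = {
    val liveIsr = isr.filter(live.contains)
    liveIsr.headOption match {
      case Some(leader) => Some(LeaderAndIsr(leader, liveIsr))
      case None =>
        assigned.filter(live.contains).headOption
          .map(leader => LeaderAndIsr(leader, List(leader)))
    }
  }
}
```

Both functions return an Option so that the "no replica is alive" case stays explicit to the caller; in this model that is the situation in which the partition cannot be brought online.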