DelayedOperationPurgatory Mechanism (7): The DelayedFetch Implementation

This article takes a deep look at how Kafka's DelayedFetch mechanism works, covering its key fields, the execution logic and completion conditions of tryComplete, and how onComplete builds the FetchResponse and returns it to the client.

The main fields of DelayedFetch are shown below:

class DelayedFetch(
                   // how long this delayed operation may stay pending before it expires
                   delayMs: Long,
                   // per-partition state for every partition in the FetchRequest, used mainly
                   // to decide whether this DelayedFetch is ready to complete
                   fetchMetadata: FetchMetadata,
                   replicaManager: ReplicaManager,
                   // invoked from onComplete when the operation is satisfied or expires; it builds
                   // the FetchResponse and appends it to the corresponding responseQueue in RequestChannel
                   responseCallback: Map[TopicAndPartition, FetchResponsePartitionData] => Unit)
  extends DelayedOperation(delayMs) {}
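
For reference, here is a simplified sketch of FetchMetadata and FetchPartitionStatus, paraphrased from ReplicaManager.scala of this Kafka line (companion members omitted); tryComplete below relies on nearly every field:

case class FetchMetadata(fetchMinBytes: Int,          // minimum bytes before the fetch may complete (Case D)
                         fetchOnlyLeader: Boolean,    // whether only the leader replica may serve the read
                         fetchOnlyCommitted: Boolean, // true for consumers: read only committed data, up to the HW
                         isFromFollower: Boolean,     // whether the request came from a follower replica
                         fetchPartitionStatus: Map[TopicAndPartition, FetchPartitionStatus])

case class FetchPartitionStatus(startOffsetMetadata: LogOffsetMetadata, // where the previous read ended
                                fetchInfo: PartitionFetchInfo)          // requested offset and max fetchSize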

DelayedFetch.tryComplete checks whether the DelayedFetch is satisfied and, when it is, calls forceComplete to complete the operation. Meeting any one of the following four conditions is enough:

  /**
   * The operation can be completed if:
   *
   * Case A: This broker is no longer the leader for some partitions it tries to fetch
   * Case B: This broker does not know of some partitions it tries to fetch
   * Case C: The fetch offset locates not on the last segment of the log
   * Case D: The accumulated bytes from all the fetching partitions exceeds the minimum bytes
   *
   * Upon completion, should return whatever data is available for each valid partition
   */
  override def tryComplete() : Boolean = {
    var accumulatedSize = 0
    // walk the status recorded for every partition in fetchMetadata
    fetchMetadata.fetchPartitionStatus.foreach {
      case (topicAndPartition, fetchStatus) =>
        // the position where the previous read of this partition's log ended
        val fetchOffset = fetchStatus.startOffsetMetadata
        try {
          if (fetchOffset != LogOffsetMetadata.UnknownOffsetMetadata) {
            // look up the partition's leader replica; throws if this broker is not the leader
            val replica = replicaManager.getLeaderReplicaIfLocal(topicAndPartition.topic, topicAndPartition.partition)
            // choose the max readable offset based on who sent the FetchRequest: for consumers
            // endOffset is the HW, for follower replicas it is the LEO
            val endOffset =
              if (fetchMetadata.fetchOnlyCommitted)
                replica.highWatermark
              else
                replica.logEndOffset

            // Go directly to the check for Case D if the message offsets are the same. If the log segment
            // has just rolled, then the high watermark offset will remain the same but be on the old segment,
            // which would incorrectly be seen as an instance of Case C.
            // check whether endOffset has moved since the last read. If it has not, the data that was
            // insufficient before is still insufficient, so the operation stays unsatisfied; if it has,
            // continue with the checks below
            if (endOffset.messageOffset != fetchOffset.messageOffset) {
              // Case C: the offset to read from is no longer on the active segment
              if (endOffset.onOlderSegment(fetchOffset)) { // probably the log was truncated
                // Case C, this can happen when the new fetch operation is on a truncated leader
                debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
                return forceComplete()
              } else if (fetchOffset.onOlderSegment(endOffset)) { // fetchOffset precedes endOffset, but a new active segment has rolled: fetchOffset sits on an older segment while endOffset is on the new one
                // Case C, this can happen when the fetch operation is falling behind the current segment
                // or the partition has just rolled a new segment
                debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
                return forceComplete()
              } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
                // we need take the partition fetch size as upper bound when accumulating the bytes
                // fetchOffset and endOffset are on the same active segment and endOffset has moved forward, so accumulate the readable bytes
                accumulatedSize += math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
              }
            }
          }
        } catch {
          case utpe: UnknownTopicOrPartitionException => // Case B: this broker cannot find a replica of the partition to read
            debug("Broker no longer know of %s, satisfy %s immediately".format(topicAndPartition, fetchMetadata))
            return forceComplete()
          case nle: NotLeaderForPartitionException =>  // Case A: partition leadership has moved to another broker
            debug("Broker is no longer the leader of %s, satisfy %s immediately".format(topicAndPartition, fetchMetadata))
            return forceComplete()
        }
    }

    // Case D: the accumulated bytes reach the minimum-bytes threshold
    if (accumulatedSize >= fetchMetadata.fetchMinBytes)
      forceComplete()
    else
      false
  }
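
Both Case C branches compare log segments rather than raw offsets, and Case D measures a byte distance. A condensed sketch of the two LogOffsetMetadata helpers involved, paraphrased from the source (validity checks omitted):

case class LogOffsetMetadata(messageOffset: Long,            // logical message offset
                             segmentBaseOffset: Long,        // base offset of the segment holding it
                             relativePositionInSegment: Int) // physical byte position within that segment
{
  // true if this offset sits on an older segment than `that`; this is the test behind both Case C branches
  def onOlderSegment(that: LogOffsetMetadata): Boolean =
    this.segmentBaseOffset < that.segmentBaseOffset

  // byte distance from `that` to this offset when both sit on the same segment; this feeds accumulatedSize
  def positionDiff(that: LogOffsetMetadata): Int =
    this.relativePositionInSegment - that.relativePositionInSegment
}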

DelayedFetch.onComplete is shown below:

  override def onComplete() {
      // re-read the data from the local log
    val logReadResults = replicaManager.readFromLocalLog(fetchMetadata.fetchOnlyLeader,
      fetchMetadata.fetchOnlyCommitted,
      fetchMetadata.fetchPartitionStatus.mapValues(status => status.fetchInfo))
    // wrap the read results
    val fetchPartitionData = logReadResults.mapValues(result =>
      FetchResponsePartitionData(result.errorCode, result.hw, result.info.messageSet))
    // invoke the response callback
    responseCallback(fetchPartitionData)
  }
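
The responseCallback wired into DelayedFetch is the sendResponseCallback that KafkaApis.handleFetchRequest creates for each FetchRequest. It down-converts messages for old clients where necessary, records outbound-byte metrics, and hands the FetchResponse to the network layer: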

// the callback for sending a fetch response
def sendResponseCallback(responsePartitionData: Map[TopicAndPartition, FetchResponsePartitionData]) {

  val convertedPartitionData =
    // Need to down-convert message when consumer only takes magic value 0.
    if (fetchRequest.versionId <= 1) {
      responsePartitionData.map { case (tp, data) =>

        // We only do down-conversion when:
        // 1. The message format version configured for the topic is using magic value > 0, and
        // 2. The message set contains message whose magic > 0
        // This is to reduce the message format conversion as much as possible. The conversion will only occur
        // when new message format is used for the topic and we see an old request.
        // Please note that if the message format is changed from a higher version back to lower version this
        // test might break because some messages in new message format can be delivered to consumers before 0.10.0.0
        // without format down conversion.
        val convertedData = if (replicaManager.getMessageFormatVersion(tp).exists(_ > Message.MagicValue_V0) &&
          !data.messages.isMagicValueInAllWrapperMessages(Message.MagicValue_V0)) {
          trace(s"Down converting message to V0 for fetch request from ${fetchRequest.clientId}")
          new FetchResponsePartitionData(data.error, data.hw, data.messages.asInstanceOf[FileMessageSet].toMessageFormat(Message.MagicValue_V0))
        } else data

        tp -> convertedData
      }
    } else responsePartitionData

  val mergedPartitionData = convertedPartitionData ++ unauthorizedPartitionData

  mergedPartitionData.foreach { case (topicAndPartition, data) =>
    if (data.error != Errors.NONE.code)
      debug(s"Fetch request with correlation id ${fetchRequest.correlationId} from client ${fetchRequest.clientId} " +
        s"on partition $topicAndPartition failed due to ${Errors.forCode(data.error).exceptionName}")
    // record the bytes out metrics only when the response is being sent
    BrokerTopicStats.getBrokerTopicStats(topicAndPartition.topic).bytesOutRate.mark(data.messages.sizeInBytes)
    BrokerTopicStats.getBrokerAllTopicsStats().bytesOutRate.mark(data.messages.sizeInBytes)
  }
  // define the fetchResponseCallback helper
  def fetchResponseCallback(delayTimeMs: Int) {
    trace(s"Sending fetch response to client ${fetchRequest.clientId} of " +
      s"${convertedPartitionData.values.map(_.messages.sizeInBytes).sum} bytes")
      // build the FetchResponse object
    val response = FetchResponse(fetchRequest.correlationId, mergedPartitionData, fetchRequest.versionId, delayTimeMs)
    // enqueue a SendAction Response wrapping the FetchResponse above onto the corresponding responseQueue in RequestChannel
    requestChannel.sendResponse(new RequestChannel.Response(request, new FetchResponseSend(request.connectionId, response)))
  }


  // When this callback is triggered, the remote API call has completed
  request.apiRemoteCompleteTimeMs = SystemTime.milliseconds

  // Do not throttle replication traffic
  if (fetchRequest.isFromFollower) {
      // send the FetchResponse right away via fetchResponseCallback, with zero throttle delay
    fetchResponseCallback(0)
  } else {
      // this path ultimately invokes fetchResponseCallback as well, possibly after a quota-imposed delay
    quotaManagers(ApiKeys.FETCH.id).recordAndMaybeThrottle(fetchRequest.clientId,
                                                           FetchResponse.responseSize(mergedPartitionData.groupBy(_._1.topic),
                                                                                      fetchRequest.versionId),
                                                           fetchResponseCallback)
  }
}
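
On the client side, two consumer settings feed this mechanism: fetch.min.bytes becomes fetchMetadata.fetchMinBytes (the Case D threshold), and fetch.max.wait.ms becomes the DelayedFetch delayMs. A minimal consumer sketch; the broker address and group id are placeholders:

import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "delayed-fetch-demo")      // placeholder group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")
// becomes fetchMinBytes on the broker: Case D completes once this many bytes accumulate
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "65536")
// becomes the DelayedFetch delayMs: the longest the broker will park this fetch
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500")

val consumer = new KafkaConsumer[String, String](props)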

The DelayedFetch workflow is as follows:
1. A follower replica or a consumer sends a FetchRequest to pull messages from some partitions.
2. The FetchRequest passes through the network and API layers to ReplicaManager, which reads the data from the log subsystem, checks whether the ISR set, HW, and so on need updating, and then tries to complete any DelayedProduce operations in delayedProducePurgatory whose conditions are now satisfied.
3. The log subsystem returns the messages it read together with related metadata, such as the offsets reached.
4. ReplicaManager wraps the FetchRequest in a DelayedFetch object and hands it to delayedFetchPurgatory for management (see the sketch after this list).
5. delayedFetchPurgatory uses a SystemTimer to track whether each DelayedFetch has expired.
6. When a producer sends a ProduceRequest that appends messages, the broker also checks whether any watching DelayedFetch is now ready to complete.
7. When a DelayedFetch executes, its callback builds a FetchResponse and appends it to RequestChannel.
8. The network layer returns the FetchResponse to the client.
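
Steps 4 through 6 correspond to just a few lines inside ReplicaManager. A condensed sketch, with method bodies and surrounding logic omitted (names follow the source of this Kafka line):

// Steps 4-5: wrap the pending fetch and hand it to the purgatory. If tryComplete succeeds
// immediately the response goes out at once; otherwise the operation is watched under one
// key per partition and timed out by the purgatory's SystemTimer.
val delayedFetch = new DelayedFetch(timeout, fetchMetadata, this, responseCallback)
val delayedFetchKeys = fetchPartitionStatus.keys.map(new TopicPartitionOperationKey(_)).toSeq
delayedFetchPurgatory.tryCompleteElseWatch(delayedFetch, delayedFetchKeys)

// Step 6: after a produce appends messages to a partition, the append path pokes the purgatory
// so that every DelayedFetch watching that partition re-runs tryComplete.
delayedFetchPurgatory.checkAndComplete(new TopicPartitionOperationKey(topicAndPartition))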
