Flink Kafka Connector Analysis

1 FlinkKafkaConsumer

The Flink Kafka consumer connector ingests data from Kafka into a Flink job and acts as a source in the Flink topology. Flink currently ships connectors for Kafka 0.8, 0.9, 0.10, 0.11, and 2.0+. Since the Kafka version we use is 0.10.0.1, the analysis below is based on the 0.10 connector.
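
Before diving into the internals, here is a minimal usage sketch of the 0.10 connector as a source. The broker address, group id, topic name, and job name are placeholder values for illustration only:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class KafkaSourceDemo {
	public static void main(String[] args) throws Exception {
		Properties props = new Properties();
		props.setProperty("bootstrap.servers", "localhost:9092"); //placeholder broker
		props.setProperty("group.id", "flink-demo");              //placeholder group id

		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		//FlinkKafkaConsumer010 is the connector class analyzed below
		env.addSource(new FlinkKafkaConsumer010<>("demo-topic", new SimpleStringSchema(), props))
			.print();
		env.execute("kafka-source-demo");
	}
}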

[Figure: FlinkKafkaConsumer class inheritance diagram]

The figure above shows the inheritance hierarchy of FlinkKafkaConsumer. The class we care about, FlinkKafkaConsumer010, extends FlinkKafkaConsumer09 and is a RichFunction. Its base class is FlinkKafkaConsumerBase, which we analyze next.

1.1 FlinkKafkaConsumerBase

FlinkKafkaConsumerBase is the base class of FlinkKafkaConsumer and the core of the connector. Besides extending RichParallelSourceFunction, it implements the CheckpointedFunction and CheckpointListener interfaces, which are used to snapshot and restore checkpoint state and to run a callback once a checkpoint completes.
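
For reference, these are the checkpoint-related callbacks the class must therefore provide (signatures as defined by the Flink APIs it implements):

//from org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
void snapshotState(FunctionSnapshotContext context) throws Exception;         //take a state snapshot
void initializeState(FunctionInitializationContext context) throws Exception; //initialize/restore state

//from org.apache.flink.runtime.state.CheckpointListener
void notifyCheckpointComplete(long checkpointId) throws Exception;            //runs after a checkpoint completes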

public abstract class FlinkKafkaConsumerBase<T> extends RichParallelSourceFunction<T> implements
		CheckpointListener,
		ResultTypeQueryable<T>,
		CheckpointedFunction

Let's look at the fields defined in FlinkKafkaConsumerBase:

//FlinkKafkaConsumerBase
//pendingOffsetsToCommit holds at most 100 pending checkpoints; beyond that, the oldest is evicted
public static final int MAX_NUM_PENDING_CHECKPOINTS = 100;
//sentinel value meaning partition discovery is disabled (the default)
public static final long PARTITION_DISCOVERY_DISABLED = Long.MIN_VALUE;
//config key for the metrics switch
public static final String KEY_DISABLE_METRICS = "flink.disable-metrics";
//config key for the partition discovery interval
public static final String KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS = "flink.partition-discovery.interval-millis";
//name of the partition-offset state
private static final String OFFSETS_STATE_NAME = "topic-partition-offset-states";
//descriptor of the subscribed topics, i.e. the topic names (or pattern)
private final KafkaTopicsDescriptor topicsDescriptor;
//schema used to deserialize Kafka messages
protected final KafkaDeserializationSchema<T> deserializer;
//the topic partitions to subscribe to, each with the offset to start from
private Map<KafkaTopicPartition, Long> subscribedPartitionsToStartOffsets;
//periodic watermark assigner
private SerializedValue<AssignerWithPeriodicWatermarks<T>> periodicWatermarkAssigner;
//punctuated watermark assigner
private SerializedValue<AssignerWithPunctuatedWatermarks<T>> punctuatedWatermarkAssigner;
//whether offsets are committed on checkpoints; enabled by default
private boolean enableCommitOnCheckpoints = true;
//whether to use the current topics descriptor to filter out non-matching partitions (when restoring from a snapshot)
private boolean filterRestoredPartitionsWithCurrentTopicsDescriptor = true;
//offset commit mode: disabled / commit on checkpoints / periodic commit by Kafka
private OffsetCommitMode offsetCommitMode;
//partition discovery interval
private final long discoveryIntervalMillis;
//startup mode deciding the offsets to start subscribing from: EARLIEST/LATEST/GROUP_OFFSETS/SPECIFIC_OFFSETS/TIMESTAMP; TIMESTAMP is not supported by Kafka 0.10.0.1
private StartupMode startupMode = StartupMode.GROUP_OFFSETS;
//specific offsets to start subscribing the topic partitions from
private Map<KafkaTopicPartition, Long> specificStartupOffsets;
//timestamp whose corresponding offsets to start subscribing from
private Long startupOffsetsTimestamp;
//tracks in-flight snapshots (i.e. the partition-offset state awaiting commit)
private final LinkedMap pendingOffsetsToCommit = new LinkedMap();
//fetches data from Kafka
private transient volatile AbstractFetcher<T, ?> kafkaFetcher;
//discovers partitions at runtime
private transient volatile AbstractPartitionDiscoverer partitionDiscoverer;
//the state to restore from
private transient volatile TreeMap<KafkaTopicPartition, Long> restoredState;
//the persisted partition-offset state, stored as a union list state
private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;
//whether we restored from legacy state, i.e. state written by Flink 1.1 or 1.2
private boolean restoredFromOldState;
//the partition discovery thread
private transient volatile Thread discoveryLoopThread;
//running flag
private volatile boolean running = true;
//metrics
private final boolean useMetrics;
private transient Counter successfulCommits;
private transient Counter failedCommits;
private transient KafkaCommitCallback offsetCommitCallback;

The user-facing APIs exposed by FlinkKafkaConsumerBase include assignTimestampsAndWatermarks, setStartFromEarliest, setStartFromLatest, disableFilterRestoredPartitionsWithSubscribedTopics, and so on. They are mainly used to configure watermarks and to choose the position to start consuming from, as the sketch below shows.
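
A short sketch of these configuration hooks, reusing the props from the earlier example; topic, partition, and offset values are placeholders:

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

FlinkKafkaConsumer010<String> consumer =
		new FlinkKafkaConsumer010<>("demo-topic", new SimpleStringSchema(), props);

//choose where to start consuming (sets startupMode)
consumer.setStartFromEarliest();                  //StartupMode.EARLIEST
//consumer.setStartFromLatest();                  //StartupMode.LATEST
//consumer.setStartFromGroupOffsets();            //StartupMode.GROUP_OFFSETS (the default)

//or start from explicit offsets (StartupMode.SPECIFIC_OFFSETS)
Map<KafkaTopicPartition, Long> specificOffsets = new HashMap<>();
specificOffsets.put(new KafkaTopicPartition("demo-topic", 0), 23L); //placeholder offset
consumer.setStartFromSpecificOffsets(specificOffsets);

//toggle the enableCommitOnCheckpoints flag described above
consumer.setCommitOffsetsOnCheckpoints(true);

//attach a watermark assigner (stored in periodicWatermarkAssigner);
//myPeriodicAssigner is a hypothetical AssignerWithPeriodicWatermarks instance
//consumer.assignTimestampsAndWatermarks(myPeriodicAssigner);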
The first method of FlinkKafkaConsumerBase to run is initializeState, which initializes the operator state:

(1) Obtain the stateStore from the initialization context.

(2) Fetch state from the stateStore, both the legacy state (presumably kept for compatibility with Flink 1.2) and the new state; if legacy state exists, migrate it into unionOffsetStates.

(3) Copy the entries of unionOffsetStates into restoredState, which is used to restore the subscription.

	//FlinkKafkaConsumerBase
	public final void initializeState(FunctionInitializationContext context) throws Exception {

		OperatorStateStore stateStore = context.getOperatorStateStore();
		//fetch the legacy (Flink 1.1/1.2) state
		ListState<Tuple2<KafkaTopicPartition, Long>> oldRoundRobinListState =
			stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
		//fetch the state from the stateStore; it is created if absent
		this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(
				OFFSETS_STATE_NAME,
				TypeInformation.of(new TypeHint<Tuple2<KafkaTopicPartition, Long>>() {})));

		if (context.isRestored() && !restoredFromOldState) {
			restoredState = new TreeMap<>(new KafkaTopicPartition.Comparator());
			//migrate the legacy state into unionOffsetStates
			for (Tuple2<KafkaTopicPartition, Long> kafkaOffset : oldRoundRobinListState.get()) {
				restoredFromOldState = true;
				unionOffsetStates.add(kafkaOffset);
			}
			oldRoundRobinListState.clear();
			//legacy state exists and partition discovery is not disabled
			if (restoredFromOldState && discoveryIntervalMillis != PARTITION_DISCOVERY_DISABLED) {
				//throw exception
			}
			//copy the entries of unionOffsetStates into restoredState
			for (Tuple2<KafkaTopicPartition, Long> kafkaOffset : unionOffsetStates.get()) {
				restoredState.put(kafkaOffset.f0, kafkaOffset.f1);
			}
			//log
		} else {
			//log
		}
	}
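
A side note on the getUnionListState call above: with union redistribution, every subtask receives the complete partition-offset list on restore, which is why open() must later filter out the partitions the current subtask does not own. A minimal sketch of the two redistribution flavors, assuming stateStore is the OperatorStateStore from the context (the state names here are hypothetical):

	TypeInformation<Tuple2<KafkaTopicPartition, Long>> typeInfo =
			TypeInformation.of(new TypeHint<Tuple2<KafkaTopicPartition, Long>>() {});

	//union redistribution: on restore, EVERY subtask sees all entries
	ListState<Tuple2<KafkaTopicPartition, Long>> union =
			stateStore.getUnionListState(new ListStateDescriptor<>("offsets-union", typeInfo));

	//even-split redistribution: entries are divided among the subtasks on restore
	ListState<Tuple2<KafkaTopicPartition, Long>> split =
			stateStore.getListState(new ListStateDescriptor<>("offsets-split", typeInfo));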

The open method of FlinkKafkaConsumerBase performs the main initialization and runs after initializeState. Let's walk through it along with the code:

(1) Initialize offsetCommitMode. With checkpointing enabled, the mode is ON_CHECKPOINTS if enableCommitOnCheckpoints is set, otherwise DISABLED. With checkpointing disabled, it is KAFKA_PERIODIC if Kafka's auto-commit is enabled, otherwise DISABLED. Finally, if the resulting mode is ON_CHECKPOINTS or DISABLED, enable.auto.commit is forced to false; this happens in the fairly simple adjustAutoCommitConfig method. Once offsetCommitMode is set, partitionDiscoverer is initialized; the 0.10 connector creates a Kafka010PartitionDiscoverer and then calls AbstractPartitionDiscoverer's open method, which mainly initializes the Kafka consumer.

offsetCommitMode   Description
DISABLED           offset committing is disabled
ON_CHECKPOINTS     commit offsets to Kafka only after a checkpoint completes
KAFKA_PERIODIC     commit offsets to Kafka periodically and automatically
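
The decision in OffsetCommitModes.fromConfiguration can be sketched as follows, mirroring the description above:

	public static OffsetCommitMode fromConfiguration(
			boolean enableAutoCommit,
			boolean enableCommitOnCheckpoint,
			boolean enableCheckpointing) {
		if (enableCheckpointing) {
			//checkpointing on: commit on checkpoints only if the user left the flag enabled
			return enableCommitOnCheckpoint ? OffsetCommitMode.ON_CHECKPOINTS : OffsetCommitMode.DISABLED;
		} else {
			//no checkpointing: fall back to Kafka's periodic auto-commit, if enabled
			return enableAutoCommit ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
		}
	}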

(2) Discover all partitions of the subscribed topics and initialize the start offset of each partition, storing them in subscribedPartitionsToStartOffsets.

startupMode        Description
GROUP_OFFSETS      start from the last offsets committed by the consumer group
EARLIEST           start from the earliest offset
LATEST             start from the latest offset
TIMESTAMP          start from the committed offsets corresponding to the given timestamp
SPECIFIC_OFFSETS   start from user-specified offsets

	//FlinkKafkaConsumerBase
	public void open(Configuration configuration) throws Exception {
		//initialize offsetCommitMode
		this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
				getIsAutoCommitEnabled(),
				enableCommitOnCheckpoints,
				((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());
		//initialize partitionDiscoverer and call its open method
		this.partitionDiscoverer = createPartitionDiscoverer(
				topicsDescriptor,
				getRuntimeContext().getIndexOfThisSubtask(),
				getRuntimeContext().getNumberOfParallelSubtasks());
		this.partitionDiscoverer.open();
		subscribedPartitionsToStartOffsets = new HashMap<>();
		//discover all partitions of the topics; covered in detail with PartitionDiscoverer later
		final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
		//restoring from state
		if (restoredState != null) {
			for (KafkaTopicPartition partition : allPartitions) {
				if (!restoredState.containsKey(partition)) {
					//partitions absent from the restored state default to EARLIEST and are added to restoredState
					restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);
				}
			}
			//iterate over restoredState
			for (Map.Entry<KafkaTopicPartition, Long> restoredStateEntry : restoredState.entrySet()) {
				if (!restoredFromOldState) {
					//not restoring from legacy state: keep only the partitions assigned to this subtask,
					//recording them with their offsets in subscribedPartitionsToStartOffsets
					if (KafkaTopicPartitionAssigner.assign(
						restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())
							== getRuntimeContext().getIndexOfThisSubtask()){
						subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());
					}
				} else {
					//legacy state: add every entry to subscribedPartitionsToStartOffsets directly
					subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());
				}
			}
			//filter out partitions that no longer match the topics descriptor
			if (filterRestoredPartitionsWithCurrentTopicsDescriptor) {
				subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -> {
					if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) {
						//log
						return true;
					}
					return false;
				});
			}
			//log
		} else {
			//not restoring from state
			switch (startupMode) {
				//start from specific offsets
				case SPECIFIC_OFFSETS:
					if (specificStartupOffsets == null) {
						//throw IllegalStateException
					}

					for (KafkaTopicPartition seedPartition : allPartitions) {
						Long specificOffset = specificStartupOffsets.get(seedPartition);
						if (specificOffset != null) {
							//a specific offset exists for this partition; record it in
							//subscribedPartitionsToStartOffsets so consumption starts there
							//(minus one, since the stored value is the last consumed offset)
							subscribedPartitionsToStartOffsets.put(seedPartition, specificOffset - 1);
						} else {
							//no specific offset for this partition; fall back to GROUP_OFFSET
							subscribedPartitionsToStartOffsets.put(seedPartition, KafkaTopicPartitionStateSentinel.GROUP_OFFSET);
						}
					}
					break;
				//start from the offsets for a given timestamp; requires Kafka 0.10.2+
				case TIMESTAMP:
					if (startupOffsetsTimestamp == null) {
						//throw IllegalStateException
					}
					//look up the partition offsets for the timestamp; partitions without
					//a matching offset fall back to LATEST_OFFSET
					for (Map.Entry<KafkaTopicPartition, Long> partitionToOffset
							: fetchOffsetsWithTimestamp(allPartitions, startupOffsetsTimestamp).entrySet()) {
						subscribedPartitionsToStartOffsets.put(
							partitionToOffset.getKey(),
							(partitionToOffset.getValue() == null)
									? KafkaTopicPartitionStateSentinel.LATEST_OFFSET
									: partitionToOffset.getValue() - 1);
					}
					break;
				//otherwise subscribe using the startup mode's sentinel (e.g. GROUP_OFFSET)
				default:
					for (KafkaTopicPartition seedPartition : allPartitions) {
						subscribedPartitionsToStartOffsets.put(seedPartition, startupMode.getStateSentinel());
					}
			}
			//log
		}
	}
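
Worth noting in the restore branch is KafkaTopicPartitionAssigner.assign, which maps each partition to exactly one subtask: a topic-dependent start index plus the partition number, modulo the parallelism, so the partitions of one topic are spread round-robin across subtasks. Its logic is roughly the following:

	public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
		//start index derived from the topic name, so different topics start at different subtasks
		int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
		//partitions of the same topic are then distributed round-robin from startIndex
		return (startIndex + partition.getPartition()) % numParallelSubtasks;
	}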

With initialization done, we arrive at the core run method, which holds the actual execution logic. Here is what it does:

(1) Initialize the metrics and offsetCommitCallback, and create the kafkaFetcher.

(2) If partition discovery is enabled, start both the partition discovery thread and the fetch loop; otherwise start only the fetch loop.

	//FlinkKafkaConsumerBase
	public void run(SourceContext<T> sourceContext) throws Exception {
		if (subscribedPartitionsToStartOffsets == null) {
			//throw Exception
		}
		//initialize the successfulCommits and failedCommits metrics (omitted)

		//get the index of the current subtask
		final int subtaskIndex = this.getRuntimeContext().getIndexOfThisSubtask();

		//initialize offsetCommitCallback, the callback invoked on offset commits (omitted)
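
Regarding point (2): partition discovery is enabled by setting the interval key defined earlier (KEY_PARTITION_DISCOVery_INTERVAL_MILLIS) in the consumer properties; otherwise discoveryIntervalMillis stays at PARTITION_DISCOVERY_DISABLED and only the fetch loop runs. A sketch, where the broker, group id, and the 10-second interval are arbitrary example values:

	Properties props = new Properties();
	props.setProperty("bootstrap.servers", "localhost:9092"); //placeholder broker
	props.setProperty("group.id", "flink-demo");              //placeholder group id
	//check for newly added partitions every 10 seconds
	props.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "10000");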