Flink Blog Series
Flink QuickStart
Flink Dual-Stream Operations
Kerberos Configuration for Flink on Yarn
Flink on Yarn Deployment and Job Submission
Configuring Prometheus Monitoring for Flink
Deploying Flink in Docker
Flink HA Deployment
A Summary of Common Flink Tuning Parameters
Flink Source Code: Job Submission Process
Flink Source Code: Basic Operators
Flink Source Code: Trigger
Flink Source Code: Evictor
Flink Source Code: Window
Flink Source Code: WindowOperator
Flink Source Code: StreamGraph Generation
Flink Source Code: JobGraph Generation
Two-Phase Commit Protocol
The two-phase commit protocol targets Flink sinks: it requires the downstream system either to support transactions or, failing that, to be idempotent. The two phases are the following (a minimal sketch follows the list):
preCommit: the pre-commit step. Called when the sink performs its snapshot operation.
commit: the actual commit. Once the checkpoints of all operators in the job have succeeded, the JobManager notifies every operator that the checkpoint is complete, and only then is this method called.
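To make the two phases concrete, here is a minimal sketch of a custom sink built on the TwoPhaseCommitSinkFunction base class discussed below. The TransactionalFileSink name, the FileTransaction handle, and the file layout are all hypothetical, chosen only to show where preCommit and commit fit; production concerns such as idempotent commits during recovery are only hinted at in the comments:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Each transaction buffers records in its own temporary file;
// commit publishes the file with an atomic rename.
public class TransactionalFileSink
        extends TwoPhaseCommitSinkFunction<String, TransactionalFileSink.FileTransaction, Void> {

    public TransactionalFileSink() {
        super(new KryoSerializer<>(FileTransaction.class, new ExecutionConfig()),
              VoidSerializer.INSTANCE);
    }

    @Override
    protected FileTransaction beginTransaction() throws IOException {
        // one fresh temporary file per checkpoint interval
        return new FileTransaction(Files.createTempFile("txn-", ".tmp").toString());
    }

    @Override
    protected void invoke(FileTransaction txn, String value, Context context) throws IOException {
        txn.writer().write(value);
        txn.writer().newLine();
    }

    @Override
    protected void preCommit(FileTransaction txn) throws IOException {
        // phase 1: flush and close, so the data is durable but not yet visible
        txn.closeWriter();
    }

    @Override
    protected void commit(FileTransaction txn) {
        // phase 2: publish atomically; this may be re-invoked during recovery,
        // so a real implementation must tolerate an already-renamed file
        try {
            Path tmp = Paths.get(txn.tmpPath);
            Files.move(tmp, tmp.resolveSibling("committed-" + tmp.getFileName()),
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    protected void abort(FileTransaction txn) {
        // discard everything buffered for this transaction
        try {
            txn.closeWriter();
            Files.deleteIfExists(Paths.get(txn.tmpPath));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // The transaction handle that is checkpointed. Only the path is state;
    // the writer is transient and recreated lazily after restore.
    public static class FileTransaction {
        final String tmpPath;
        transient BufferedWriter writer;

        FileTransaction(String tmpPath) {
            this.tmpPath = tmpPath;
        }

        BufferedWriter writer() throws IOException {
            if (writer == null) {
                writer = Files.newBufferedWriter(Paths.get(tmpPath));
            }
            return writer;
        }

        void closeWriter() throws IOException {
            if (writer != null) {
                writer.close();
                writer = null;
            }
        }
    }
}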
TwoPhaseCommitSinkFunction
This class is the base class for two-phase commit sinks and encapsulates the core two-phase commit logic.
The initializeState method is defined in the CheckpointedFunction interface and is invoked when the function starts running on the cluster; it initializes and restores the function's state from the state backend.
The method performs the following steps:
Obtain the state store variable state.
Commit all transactions that have already been preCommitted.
Abort all transactions that have not yet been preCommitted.
Begin a new transaction.
The code is as follows:
@Override
public void initializeState(FunctionInitializationContext context) throws Exception {
    // when we are restoring state with pendingCommitTransactions, we don't really know whether the
    // transactions were already committed, or whether there was a failure between
    // completing the checkpoint on the master, and notifying the writer here.
    // (the common case is actually that it was already committed; the window
    // between the commit on the master and the notification here is very small)
    // it is possible to not have any transactions at all if there was a failure before
    // the first completed checkpoint, or in case of a scale-out event, where some of the
    // new tasks do not have any transactions assigned to check
    // we can have more than one transaction to check in case of a scale-in event, or
    // for the reasons discussed in the 'notifyCheckpointComplete()' method.

    // obtain the state store
    state = context.getOperatorStateStore().getListState(stateDescriptor);

    boolean recoveredUserContext = false;
    // isRestored() returns true when recovering from a previous snapshot,
    // and always false if the task does not support snapshots
    if (context.isRestored()) {
        LOG.info("{} - restoring state", name());
        for (State<TXN, CONTEXT> operatorState : state.get()) {
            userContext = operatorState.getContext();
            // fetch the transactions awaiting commit; snapshotState stores each
            // transaction in this list right after calling preCommit on it
            List<TransactionHolder<TXN>> recoveredTransactions = operatorState.getPendingCommitTransactions();
            List<TXN> handledTransactions = new ArrayList<>(recoveredTransactions.size() + 1);
            for (TransactionHolder<TXN> recoveredTransaction : recoveredTransactions) {
                // If this fails to succeed eventually, there is actually data loss
                // recover and commit the transactions previously saved in state
                recoverAndCommitInternal(recoveredTransaction);
                handledTransactions.add(recoveredTransaction.handle);
                LOG.info("{} committed recovered transaction {}", name(), recoveredTransaction);
            }

            {
                // fetch the transaction that has not been preCommitted yet
                TXN transaction = operatorState.getPendingTransaction().handle;
                // recover and abort it
                recoverAndAbort(transaction);
                handledTransactions.add(transaction);
                LOG.info("{} aborted recovered transaction {}", name(), operatorState.getPendingTransaction());
            }

            if (userContext.isPresent()) {
                finishRecoveringContext(handledTransactions);
                recoveredUserContext = true;
            }
        }
    }

    // if in restore we didn't get any userContext or we are initializing from scratch
    if (!recoveredUserContext) {
        LOG.info("{} - no state to restore", name());
        userContext = initializeUserContext();
    }
    this.pendingCommitTransactions.clear();

    // begin a new transaction
    currentTransactionHolder = beginTransactionInternal();
    LOG.debug("{} - started new transaction '{}'", name(), currentTransactionHolder);
}
preCommit is invoked from the sink's snapshotState method, which is called whenever the sink takes a state snapshot.
@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
    // this is like the pre-commit of a 2-phase-commit transaction
    // we are ready to commit and remember the transaction

    // sanity check: a transaction must exist while taking a snapshot
    checkState(currentTransactionHolder != null, "bug: no transaction object when performing state snapshot");

    long checkpointId = context.getCheckpointId();
    LOG.debug("{} - checkpoint {} triggered, flushing transaction '{}'", name(), context.getCheckpointId(), currentTransactionHolder);

    // invoke preCommit
    preCommit(currentTransactionHolder.handle);
    // record the transaction in the pending-commit map (pendingCommitTransactions)
    pendingCommitTransactions.put(checkpointId, currentTransactionHolder);
    LOG.debug("{} - stored pending transactions {}", name(), pendingCommitTransactions);

    // begin a new transaction
    currentTransactionHolder = beginTransactionInternal();
    LOG.debug("{} - started new transaction '{}'", name(), currentTransactionHolder);

    // clear the state, then store the current transaction and the pending-commit transactions
    state.clear();
    state.add(new State<>(
        this.currentTransactionHolder,
        new ArrayList<>(pendingCommitTransactions.values()),
        userContext));
}
commit is invoked from notifyCheckpointComplete: once all operators have completed the checkpoint, the JobManager notifies each operator that the checkpoint has finished, and this method is called. An illustrative trace of the full cycle follows the code below.
@Override
public final void notifyCheckpointComplete(long checkpointId) throws Exception {
    // the following scenarios are possible here
    //
    //  (1) there is exactly one transaction from the latest checkpoint that
    //      was triggered and completed. That should be the common case.
    //      Simply commit that transaction in that case.
    //
    //  (2) there are multiple pending transactions because one previous
    //      checkpoint was skipped. That is a rare case, but can happen
    //      for example when:
    //
    //        - the master cannot persist the metadata of the last
    //          checkpoint (temporary outage in the storage system) but
    //          could persist a successive checkpoint (the one notified here)
    //
    //        - other tasks could not persist their status during
    //          the previous checkpoint, but did not trigger a failure because they
    //          could hold onto their state and could successfully persist it in
    //          a successive checkpoint (the one notified here)
    //
    //      In both cases, the prior checkpoint never reached a committed state, but
    //      this checkpoint is always expected to subsume the prior one and cover all
    //      changes since the last successful one. As a consequence, we need to commit
    //      all pending transactions.
    //
    //  (3) Multiple transactions are pending, but the checkpoint complete notification
    //      relates not to the latest. That is possible, because notification messages
    //      can be delayed (in an extreme case until after a succeeding checkpoint
    //      was triggered) and because there can be concurrent overlapping checkpoints
    //      (a new one is started before the previous fully finished).
    //
    // ==> There should never be a case where we have no pending transaction here
    //

    // obtain all transactions awaiting commit
    Iterator<Map.Entry<Long, TransactionHolder<TXN>>> pendingTransactionIterator = pendingCommitTransactions.entrySet().iterator();
    checkState(pendingTransactionIterator.hasNext(), "checkpoint completed, but no transaction pending");
    Throwable firstError = null;

    while (pendingTransactionIterator.hasNext()) {
        Map.Entry<Long, TransactionHolder<TXN>> entry = pendingTransactionIterator.next();
        Long pendingTransactionCheckpointId = entry.getKey();
        TransactionHolder<TXN> pendingTransaction = entry.getValue();
        // only commit transactions belonging to this checkpoint or earlier ones
        if (pendingTransactionCheckpointId > checkpointId) {
            continue;
        }

        LOG.info("{} - checkpoint {} complete, committing transaction {} from checkpoint {}",
            name(), checkpointId, pendingTransaction, pendingTransactionCheckpointId);

        logWarningIfTimeoutAlmostReached(pendingTransaction);
        try {
            // commit the previously preCommitted transactions one by one
            commit(pendingTransaction.handle);
        } catch (Throwable t) {
            if (firstError == null) {
                firstError = t;
            }
        }

        LOG.debug("{} - committed checkpoint transaction {}", name(), pendingTransaction);

        // remove the committed transaction from the pending-commit map
        pendingTransactionIterator.remove();
    }

    if (firstError != null) {
        throw new FlinkRuntimeException("Committing one of transactions failed, logging first encountered failure",
            firstError);
    }
}
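Putting snapshotState and notifyCheckpointComplete together, an illustrative trace (checkpoint ids and transaction names are invented) of the rare case (2) above, where one checkpoint notification is skipped, looks like this:

t0: initializeState              -> begin txn1; pendingCommitTransactions = {}
t1: snapshotState(checkpoint 1)  -> preCommit(txn1); pending = {1: txn1}; begin txn2
t2: snapshotState(checkpoint 2)  -> preCommit(txn2); pending = {1: txn1, 2: txn2}
    (the notification for checkpoint 1 never arrived, e.g. its metadata could not be persisted)
t3: notifyCheckpointComplete(2)  -> commit(txn1) and commit(txn2); pending = {}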
FlinkKafkaInternalProducer
FlinkKafkaInternalProducer is Flink's wrapper around the Kafka producer.
It introduces a producerClosingLock field that guards transaction commit, transaction abort, and producer close operations. Before Kafka 2.3.0 there was a bug where the thread closing the producer could deadlock with a thread committing or aborting a transaction. By taking this lock manually around those operations, FlinkKafkaInternalProducer avoids the problem.
FlinkKafkaInternalProducer also holds a transactional id, read from the ProducerConfig when the producer is created.
The key code is shown below:
@Override
public void beginTransaction() throws ProducerFencedException {
    synchronized (producerClosingLock) {
        ensureNotClosed();
        kafkaProducer.beginTransaction();
    }
}

@Override
public void commitTransaction() throws ProducerFencedException {
    synchronized (producerClosingLock) {
        ensureNotClosed();
        kafkaProducer.commitTransaction();
    }
}

@Override
public void abortTransaction() throws ProducerFencedException {
    synchronized (producerClosingLock) {
        ensureNotClosed();
        kafkaProducer.abortTransaction();
    }
}

@Override
public void close() {
    closed = true;
    synchronized (producerClosingLock) {
        kafkaProducer.close();
    }
}
Every transactional method, including close, acquires the lock first, preventing the deadlock described above.
FlinkKafkaProducer
FlinkKafkaProducer extends TwoPhaseCommitSinkFunction and provides the Kafka-specific implementation of the two-phase commit hooks.
The beginTransaction method creates a new transaction:
@Override
protected FlinkKafkaProducer.KafkaTransactionState beginTransaction() throws FlinkKafkaException {
    switch (semantic) {
        case EXACTLY_ONCE:
            // obtain a transaction-capable Kafka producer of type FlinkKafkaInternalProducer
            FlinkKafkaInternalProducer<byte[], byte[]> producer = createTransactionalProducer();
            // start the Kafka producer transaction
            producer.beginTransaction();
            // return the transaction state, which carries the transactional id
            return new FlinkKafkaProducer.KafkaTransactionState(producer.getTransactionalId(), producer);
        case AT_LEAST_ONCE:
        case NONE:
            // Do not create new producer on each beginTransaction() if it is not necessary
            // fetch the current transaction
            final FlinkKafkaProducer.KafkaTransactionState currentTransaction = currentTransaction();
            // if a transaction is already in progress, reuse its Kafka producer
            if (currentTransaction != null && currentTransaction.producer != null) {
                return new FlinkKafkaProducer.KafkaTransactionState(currentTransaction.producer);
            }
            // otherwise return a non-transactional Kafka producer
            return new FlinkKafkaProducer.KafkaTransactionState(initNonTransactionalProducer(true));
        default:
            throw new UnsupportedOperationException("Not implemented semantic");
    }
}
The preCommit method is shown below:
@Override
protected void preCommit(FlinkKafkaProducer.KafkaTransactionState transaction) throws FlinkKafkaException {
    switch (semantic) {
        case EXACTLY_ONCE:
        case AT_LEAST_ONCE:
            // EXACTLY_ONCE and AT_LEAST_ONCE both need a flush
            flush(transaction);
            break;
        case NONE:
            // NONE does nothing here
            break;
        default:
            throw new UnsupportedOperationException("Not implemented semantic");
    }

    checkErroneous();
}
preCommit calls the Kafka producer's flush method, ensuring that every message in the producer's buffer has been sent to the Kafka brokers.
The source of the flush method is as follows:
private void flush(FlinkKafkaProducer.KafkaTransactionState transaction) throws FlinkKafkaException {
    // call the Kafka producer's flush method to drain the send queue; flush blocks until done
    if (transaction.producer != null) {
        transaction.producer.flush();
    }

    // after the flush, the number of pending records must be zero; otherwise throw
    long pendingRecordsCount = pendingRecords.get();
    if (pendingRecordsCount != 0) {
        throw new IllegalStateException("Pending record count must be zero at this point: " + pendingRecordsCount);
    }

    // if the flushed requests have errors, propagate them and fail the checkpoint
    checkErroneous();
}
Next is the commit method. It calls the Kafka producer's commitTransaction method, then recycles the transactional id for reuse and closes the Kafka producer (the recycling step is sketched after the code).
@Override
protected void commit(FlinkKafkaProducer.KafkaTransactionState transaction) {
    if (transaction.isTransactional()) {
        try {
            // commit the transaction
            transaction.producer.commitTransaction();
        } finally {
            // recycle the producer's transactional id (adding it to availableTransactionalIds)
            // and close the producer
            recycleTransactionalProducer(transaction.producer);
        }
    }
}
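recycleTransactionalProducer itself is not shown in this post. Based on the comment above, its behavior amounts to roughly the following sketch (a reconstruction, not a verbatim copy of the Flink source):

private void recycleTransactionalProducer(FlinkKafkaInternalProducer<byte[], byte[]> producer) {
    // return the transactional id to the pool (availableTransactionalIds) for reuse
    availableTransactionalIds.add(producer.getTransactionalId());
    // the producer instance itself is not reused: flush any remaining records and close it
    producer.flush();
    producer.close();
}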
The recoverAndCommit method recreates the Kafka producer from the stored transactional id and then commits the transaction.
@Override
protected void recoverAndCommit(FlinkKafkaProducer.KafkaTransactionState transaction) {
    if (transaction.isTransactional()) {
        try (
            // try to rebuild a producer from the existing transactional id, then commit
            FlinkKafkaInternalProducer<byte[], byte[]> producer =
                initTransactionalProducer(transaction.transactionalId, false)) {
            producer.resumeTransaction(transaction.producerId, transaction.epoch);
            producer.commitTransaction();
        } catch (InvalidTxnStateException | ProducerFencedException ex) {
            // That means we have committed this transaction before.
            LOG.warn("Encountered error {} while recovering transaction {}. " +
                "Presumably this transaction has been already committed before",
                ex,
                transaction);
        }
    }
}
The abort method terminates the transaction: it abandons the commit, recycles the transactional id, and closes the producer.
@Override
protected void abort(FlinkKafkaProducer.KafkaTransactionState transaction) {
    if (transaction.isTransactional()) {
        // abort the transaction
        transaction.producer.abortTransaction();
        // recycle the transactional id and close the producer
        recycleTransactionalProducer(transaction.producer);
    }
}
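Finally, none of this machinery is active unless the sink is constructed with Semantic.EXACTLY_ONCE. Below is a minimal usage sketch; the topic name, broker address, and the stream variable are placeholders, and the timeout value is only an example:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker1:9092");
// must not exceed transaction.max.timeout.ms configured on the brokers
props.setProperty("transaction.timeout.ms", "600000");

FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
    "output-topic",                                                  // default target topic
    new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()), // record serialization
    props,
    // EXACTLY_ONCE selects the transactional, two-phase commit path analyzed above
    FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

// stream is an existing DataStream<String>
stream.addSink(sink);

With EXACTLY_ONCE, downstream consumers should also set isolation.level=read_committed; otherwise they will read records from transactions that have been pre-committed but not yet committed.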