Flink Learning - 9. Using Checkpoints
Enabling checkpoints
Checkpointing is disabled by default, so it must be enabled before it can be used.
How to enable it:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Trigger a checkpoint every 5000 ms
env.enableCheckpointing(5000);
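To illustrate what the interval controls, here is a toy model in plain Java. It is not Flink code. A counting job reads a replayable source, snapshots (source offset, counter) at every multiple of the interval, crashes once, restores the latest snapshot and replays the source from the saved offset. The class PeriodicCheckpointDemo, the method run and its parameters are hypothetical. The model also assumes every checkpoint completes instantly, which real checkpoints do not.

```java
/**
 * Toy model (not Flink code): a counting job over a replayable source that
 * snapshots (source offset, counter) every intervalMs, crashes once at failAtMs,
 * restores the latest snapshot and replays the source from the saved offset.
 * Assumption: each checkpoint completes instantly at a multiple of intervalMs.
 */
public class PeriodicCheckpointDemo {

    /** Outcome of one simulated run. */
    public static final class Result {
        public final long finalCount; // counter after the whole input is consumed
        public final int reprocessed; // records processed both before the crash and on replay

        Result(long finalCount, int reprocessed) {
            this.finalCount = finalCount;
            this.reprocessed = reprocessed;
        }
    }

    /** recordTimesMs must be ascending; a checkpoint at time c covers records with time < c. */
    public static Result run(long[] recordTimesMs, long intervalMs, long failAtMs) {
        long count = 0;            // operator state
        int savedOffset = 0;       // source position stored in the latest checkpoint
        long savedCount = 0;       // operator state stored in the latest checkpoint
        long nextCheckpointMs = intervalMs;
        int offset = 0;

        // Process the records that arrive before the crash, checkpointing on each interval boundary.
        while (offset < recordTimesMs.length && recordTimesMs[offset] < failAtMs) {
            while (recordTimesMs[offset] >= nextCheckpointMs) {
                savedOffset = offset;
                savedCount = count;
                nextCheckpointMs += intervalMs;
            }
            count++;
            offset++;
        }
        // A checkpoint due after the last record but before the crash still completes.
        if (nextCheckpointMs < failAtMs) {
            savedOffset = offset;
            savedCount = count;
        }

        // Crash: restore the snapshot, rewind the source, and replay to the end of the input.
        int reprocessed = offset - savedOffset;
        count = savedCount + (recordTimesMs.length - savedOffset);
        return new Result(count, reprocessed);
    }
}
```

Consider 15 records arriving one second apart with a crash at 12300 ms. A 5000 ms interval restores the checkpoint taken at 10000 ms and reprocesses 3 records. A 1000 ms interval reprocesses only 1. Both runs end with a count of 15. In this model, a shorter interval means less replay after a failure, at the cost of taking snapshots more often.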
Checkpoint mode
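The default checkpointing mode is EXACTLY_ONCE, and it can be changed to AT_LEAST_ONCE. These are the two available modes.
Exactly-once is the most appropriate choice for most applications. At-least-once is used by some applications that need extremely low latency and can accept less accurate results, because some records may be processed twice after recovery.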
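import org.apache.flink.streaming.api.CheckpointingMode;

// Switch from the default EXACTLY_ONCE to AT_LEAST_ONCE
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);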
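The practical difference between the two modes comes from barrier alignment, which the Javadoc of CheckpointingMode mentions. The following toy model in plain Java is not Flink's implementation. It has one counting operator with two input channels, A and B, and each channel delivers one checkpoint barrier. With alignment (EXACTLY_ONCE-style behaviour), the operator holds back records from a channel whose barrier has already arrived until the other barrier arrives. Without alignment (AT_LEAST_ONCE-style behaviour), it keeps processing those records, so they land both in the snapshot and in the replay. The class BarrierAlignmentDemo and its event encoding are hypothetical. The model assumes exactly one checkpoint completes before a crash at the end of the input.

```java
/**
 * Toy model (not Flink code) of one counting operator with two input channels.
 * Events: "A" / "B" is a record on that channel, "A|" / "B|" is its checkpoint barrier.
 * Assumption: both barriers arrive (one checkpoint), then the job crashes after the last event.
 */
public class BarrierAlignmentDemo {

    /**
     * @param aligned true: hold back a channel after its barrier until all barriers arrive
     *                (exactly-once style); false: keep processing it (at-least-once style)
     * @return the counter value after restore and replay
     */
    public static int countAfterRecovery(boolean aligned, String... events) {
        int count = 0;        // operator state: number of records counted
        int snapshot = -1;    // counter value stored in the checkpoint
        boolean barrierA = false;
        boolean barrierB = false;
        int held = 0;         // records held back during alignment
        int replayed = 0;     // post-barrier records the sources re-emit after restore

        for (String e : events) {
            char channel = e.charAt(0);
            boolean afterBarrier = (channel == 'A') ? barrierA : barrierB;
            if (e.endsWith("|")) {
                if (channel == 'A') barrierA = true; else barrierB = true;
                if (barrierA && barrierB) {
                    snapshot = count; // all barriers in: take the snapshot first
                    count += held;    // then process the held-back records
                    held = 0;
                }
            } else {
                if (afterBarrier) replayed++;  // the source will emit this record again
                if (aligned && afterBarrier && snapshot < 0) {
                    held++;                    // alignment: do not touch state yet
                } else {
                    count++;
                }
            }
        }
        // Crash: restore the snapshot, then count every replayed record once more.
        return snapshot + replayed;
    }
}
```

Consider the arrival order A, B, A|, A, A, B, B|, A, B, which contains seven records. The aligned run snapshots a count of 3, replays 4 records and ends at 7. The unaligned run snapshots 5, replays the same 4 records and ends at 9. The two A records that arrived between the barriers are counted twice. This is the kind of duplicate the at-least-once mode accepts in exchange for never blocking an input channel.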
The two modes as defined in Flink's source (the CheckpointingMode enum):
/**
* The checkpointing mode defines what consistency guarantees the system gives in the presence of
* failures.
*
* <p>When checkpointing is activated, the data streams are replayed such that lost parts of the
* processing are repeated. For stateful operations and functions, the checkpointing mode defines
* whether the system draws checkpoints such that a recovery behaves as if the operators/functions
* see each record "exactly once" ({@link #EXACTLY_ONCE}), or whether the checkpoints are drawn
* in a simpler fashion that typically encounters some duplicates upon recovery
* ({@link #AT_LEAST_ONCE})</p>
*/
@Public
public enum CheckpointingMode {
/**
* Sets the checkpointing mode to "exactly once". This mode means that the system will
* checkpoint the operator and user function state in such a way that, upon recovery,
* every record will be reflected exactly once in the operator state.
*
* <p>For example, if a user function counts the number of elements in a stream,
* this number will consistently be equal to the number of actual elements in the stream,
* regardless of failures and recovery.</p>
*
* <p>Note that this does not mean that each record flows through the streaming data flow
* only once. It means that upon recovery, the state of operators/functions is restored such
* that the resumed data streams pick up exactly at after the last modification to the state.</p>
*
* <p>Note that this mode does not guarantee exactly-once behavior in the interaction with
* external systems (only state in Flink's operators and user functions). The reason for that
* is that a certain level of "collaboration" is required between two systems to achieve
* exactly-once guarantees. However, for certain systems, connectors can be written that facilitate
* this collaboration.</p>
*
* <p>This mode sustains high throughput. Depending on the data flow graph and operations,
* this mode may increase the record latency, because operators need to align their input
* streams, in order to create a consistent snapshot point. The latency increase for simple
* dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average
* latency remains small, but the slowest records typically have a