Flink State and Fault Tolerance

This article covers state management in Apache Flink, including the use of Keyed State and Operator State, the difference between Managed State and Raw State, and how to process data with ValueState, ListState, MapState, and the other state primitives. It also discusses Flink's fault-tolerance mechanisms: checkpoints, savepoints, the choice of state backend, and failure recovery strategies.


State & Fault Tolerance

Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/

Using State in Flink

Flink state falls into two broad categories: Keyed State and Operator State.

Keyed State: Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream. Each keyed state is bound to <parallel-operator-instance, key>, and since each key "belongs" to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.
Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.
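As a small illustration of this relationship, a minimal sketch (the object name and the parallelism values below are arbitrary): the maximum parallelism fixes the number of key groups and can be set explicitly on the execution environment.

import org.apache.flink.streaming.api.scala._

object KeyGroupsSketch {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // the maximum parallelism fixes the number of key groups, i.e. the finest
    // granularity at which keyed state can later be redistributed
    fsEnv.setMaxParallelism(128)
    // each of the 4 parallel instances of a keyed operator then works on one or more key groups
    fsEnv.setParallelism(4)
  }
}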

Operator State: With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance.

Summary: in Flink, both Keyed State and Operator State exist in two forms, Managed State and Raw State. **Managed State** is recommended, because Flink can optimize how managed state is stored and can redistribute it during failure recovery.

Managed Keyed State


  • ValueState
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object FlinkWordCountsValueState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.socketTextStream("HadoopNode00",9999)
        .flatMap(line=>line.split("\\s+"))
        .map(word=>(word,1))
        .keyBy(0)
        .map(new ValueStateRichMapFunction)
        .printToErr("测试")

    fsEnv.execute("FlinkWordCounts")
  }
}
class ValueStateRichMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var historyCount:ValueState[Int]=_

  override def open(parameters: Configuration): Unit = {
    val vsd = new ValueStateDescriptor[Int]("wordcount",createTypeInformation[Int])
    historyCount=getRuntimeContext.getState[Int](vsd)
  }
  override def map(value: (String, Int)): (String, Int) = {
    val historyCountNum:Int = historyCount.value()
    val currentCount=historyCountNum+value._2
    historyCount.update(currentCount)

    (value._1,currentCount)
  }
}
  • ListState
object FlinkUserPasswordListState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    //001 zhangsan 123456
    fsEnv.socketTextStream("HadoopNode00",9999)
        .map(line=>line.split("\\s+"))
        .map(tokens=>(tokens(0),tokens(1),tokens(2)))
        .keyBy(0)
        .map(new ListStateRichMapFunction)
        .printToErr("测试")

    fsEnv.execute("FlinkWordCounts")
  }
}
class ListStateRichMapFunction extends RichMapFunction[(String,String,String),(String,String,String)]{
  var historyPasswords:ListState[String]=_

  override def open(parameters: Configuration): Unit = {
    val vsd = new ListStateDescriptor[String]("historyPassword",createTypeInformation[String])
    historyPasswords=getRuntimeContext.getListState[String](vsd)
  }

  override def map(value: (String, String, String)): (String, String, String) = {
    historyPasswords.add(value._3)
    val list = historyPasswords.get().asScala.toList
    (value._1,value._2,list.mkString(","))
  }
}
  • MapState
object FlinkUserMapState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    //001 zhangsan 水果类 4.5
    fsEnv.socketTextStream("HadoopNode00",9999)
        .map(line=>line.split("\\s+"))
        .map(tokens=>(tokens(0),tokens(1),tokens(2),tokens(3).toDouble))
        .keyBy(0)
        .map(new MapStateRichMapFunction)
        .printToErr("测试")

    fsEnv.execute("FlinkWordCounts")
  }
}
class MapStateRichMapFunction extends RichMapFunction[(String,String,String,Double),(String,String,String)]{
  var historyMapState:MapState[String,Double]=_

  override def open(parameters: Configuration): Unit = {
    val vsd = new MapStateDescriptor[String,Double]("mapstate",createTypeInformation[String],
      createTypeInformation[Double])
    historyMapState=getRuntimeContext.getMapState[String,Double](vsd)
  }

  override def map(value: (String, String, String,Double)): (String, String, String) = {
    val cost = historyMapState.get(value._3)
    var currentCost=cost+value._4
    historyMapState.put(value._3,currentCost)
    (value._1,value._2,historyMapState.iterator().asScala.map(t=>t.getKey+":"+t.getValue).mkString(","))
  }
}
  • ReducingState
object FlinkWordCountReducingState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    //input: whitespace-separated words
    fsEnv.socketTextStream("HadoopNode00",9999)
        .flatMap(line=>line.split("\\s+"))
        .map((_,1))
        .keyBy(0)
        .map(new ReducingStateRichMapFunction)
        .printToErr("测试")

    fsEnv.execute("FlinkWordCountReducingState")
  }
}
class ReducingStateRichMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var historyReducingState:ReducingState[Int]=_

  override def open(parameters: Configuration): Unit = {
    val vsd = new ReducingStateDescriptor[Int]("reducecount",new ReduceFunction[Int] {
      override def reduce(v1: Int, v2: Int): Int = {
        v1+v2
      }
    },createTypeInformation[Int])

    historyReducingState=getRuntimeContext.getReducingState[Int](vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    historyReducingState.add(value._2)
    (value._1,historyReducingState.get())
  }
}
  • AggregatingState
object FlinkUserAggregatingState {
    def main(args: Array[String]): Unit = {
        val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
        //001 zhangsan 10000
        fsEnv.socketTextStream("HadoopNode00",9999)
        .map(line=>line.split("\\s+"))
        .map(tokens=>(tokens(0),tokens(1),tokens(2).toDouble))
        .keyBy(0)
        .map(new AggregatingStateRichMapFunction)
        .printToErr("测试")

        fsEnv.execute("FlinkWordCounts")
    }
}
class AggregatingStateRichMapFunction extends RichMapFunction[(String,String,Double),(String,String,Double)]{
    var avgSalaryState:AggregatingState[Double,Double]=_

    override def open(parameters: Configuration): Unit = {
        val vsd = new AggregatingStateDescriptor[Double,(Double,Int),Double]("avgstate",
            new AggregateFunction[Double,(Double,Int),Double] {
                override def createAccumulator(): (Double, Int) = (0.0,0)

                override def add(value: Double, accumulator: (Double, Int)): (Double, Int) = (accumulator._1+value,accumulator._2+1)

                override def getResult(accumulator: (Double, Int)): Double = accumulator._1/accumulator._2

                override def merge(a: (Double, Int), b: (Double, Int)): (Double, Int) = (a._1+b._1,a._2+b._2)
            },createTypeInformation[(Double,Int)])
        avgSalaryState=getRuntimeContext.getAggregatingState[Double,(Double,Int),Double](vsd)
    }

    override def map(value: (String, String, Double)): (String, String, Double) = {
        avgSalaryState.add(value._3)
        (value._1,value._2,avgSalaryState.get())
    }
}

Managed Operator State

To use managed operator state, a stateful function can implement either the more general CheckpointedFunction interface, or the ListCheckpointed interface. Currently, list-style managed operator state is supported.

  • CheckpointedFunction
public interface CheckpointedFunction {
    // invoked when a checkpoint is taken; the function persists its state through the context
    void snapshotState(FunctionSnapshotContext context) throws Exception;
    // invoked on initialization and on failure recovery
    void initializeState(FunctionInitializationContext context) throws Exception;
}

State redistribution on failure recovery: operator list state registered through CheckpointedFunction can be redistributed either even-split (each parallel instance receives a sub-list, obtained via getListState) or union (each parallel instance receives the complete list, obtained via getUnionListState).

class UserDefineBufferSink(threshold:Int)  extends SinkFunction[String] with CheckpointedFunction{
  @transient
  private var checkpointedState: ListState[String] = _
  private val bufferedElements = ListBuffer[String]()


  override def invoke(value: String): Unit = {
    bufferedElements += value
    if(bufferedElements.size >= threshold){
      for(e <- bufferedElements){
        println("Element:"+e)
      }
      bufferedElements.clear()
    }
  }
  override def snapshotState(context: FunctionSnapshotContext): Unit = {
    checkpointedState.clear()
    for(e<-bufferedElements){
      checkpointedState.add(e)
    }
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
    println("initializeState...")
    val lsd = new ListStateDescriptor[String]("list",createTypeInformation[String])
    checkpointedState = context.getOperatorStateStore.getUnionListState(lsd) // union redistribution; use getListState(lsd) for even-split redistribution

    if(context.isRestored){
      var list=checkpointedState.get().asScala
      println("State Restore :"+list.mkString(","))
      for(e<-list){
        bufferedElements += e
      }
    }
  }
}
object FlinkWordPrint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.socketTextStream("HadoopNode00",9999)

        .flatMap(line=>line.split("\\s+"))
        .addSink(new UserDefineBufferSink(5))
        .uid("wordsink")//对有状态的算子打标记,名字必须唯一 推荐添加

    fsEnv.execute("FlinkWordCounts")
  }
}
  • ListCheckpointed
public interface ListCheckpointed<T extends Serializable> {

    // invoked when a checkpoint is taken; the function returns the list of elements
    // to snapshot and the framework persists it automatically
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
    // invoked on failure recovery
    void restoreState(List<T> state) throws Exception;
}

Both interfaces are used to store and restore Operator State. The difference is that state registered through CheckpointedFunction can be redistributed on recovery in two ways, even-split or union, whereas ListCheckpointed only supports even-split redistribution.

class UserDefineCounterSource extends RichParallelSourceFunction[Long] with ListCheckpointed[JLong]{
  @volatile
  private var isRunning = true
  private var offset = 0L

  override def snapshotState(checkpointId: Long, timestamp: Long): util.List[JLong] = {
    var v:JLong=offset
    List(v).asJava
  }

  override def restoreState(state: util.List[JLong]): Unit = {
    for(v<-state.asScala){
      println("restoreState:"+v)
      offset=v
    }
  }

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    val lock = ctx.getCheckpointLock
    while (isRunning) {
      Thread.sleep(1000)
      lock.synchronized({
        ctx.collect(offset)
        offset += 1
      })
    }
  }

  override def cancel(): Unit = isRunning=false
}
object FlinkCountPrint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.addSource[Long](new UserDefineCounterSource)
        .uid("countsource")
        .print()
    fsEnv.execute("FlinkCountPrint")
  }
}

State Time-To-Live (TTL)

Basic Usage

A TTL can be configured for keyed state of any type. Once a TTL is configured and a state value has expired, Flink removes the expired value on a best-effort basis. For collection-type state, every element or entry has its own independent expiration time.

A time-to-live (TTL) can be assigned to the keyed state of any type. If a TTL is configured and a state value has expired, the stored value will be cleaned up on a best effort basis which is discussed in more detail below.

All state collection types support per-entry TTLs. This means that list elements and map entries expire independently.

To use the TTL feature, create a StateTtlConfig object and pass it to the state descriptor's enableTimeToLive method:

import org.apache.flink.api.common.state.StateTtlConfig
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.common.time.Time

val ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(10))//required: the state expires 10s after the last update
    //optional: OnCreateAndWrite (default) or OnReadAndWrite
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    //optional: NeverReturnExpired (default) or ReturnExpiredIfNotCleanedUp
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build
    
val stateDescriptor = new ValueStateDescriptor[String]("text state", classOf[String])
stateDescriptor.enableTimeToLive(ttlConfig)

Example

object FlinkWordCountsValueStateWithTTL {
    def main(args: Array[String]): Unit = {
        val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
        fsEnv.socketTextStream("HadoopNode00",9999)
        .flatMap(line=>line.split("\\s+"))
        .map(word=>(word,1))
        .keyBy(0)
        .map(new ValueStateRichMapFunction)
        .printToErr("测试")

        fsEnv.execute("FlinkWordCountsValueStateWithTTL")
    }
}
class ValueStateRichMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
    var historyCount:ValueState[Int]=_

    override def open(parameters: Configuration): Unit = {
        val vsd = new ValueStateDescriptor[Int]("wordcount",createTypeInformation[Int])

        val ttlConfig = StateTtlConfig
        .newBuilder(Time.seconds(10))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build

        vsd.enableTimeToLive(ttlConfig)

        historyCount=getRuntimeContext.getState[Int](vsd)
    }
    override def map(value: (String, Int)): (String, Int) = {
        val historyCountNum:Int = historyCount.value()
        val currentCount=historyCountNum+value._2
        historyCount.update(currentCount)
        
        (value._1,currentCount)
    }
}

Notes

1. Enabling TTL increases state size, because every state entry additionally stores an 8-byte long timestamp.

2. If state was originally written without TTL, enabling TTL when restoring from a checkpoint or savepoint makes the restore fail (the state serializers are incompatible).

3. The TTL clock is processing time, i.e. the local time of the processing node.

Cleanup of Expired State

By default Flink does not actively delete expired state; only when a piece of state is accessed does Flink check whether it has expired and remove it. As a result, entries that are rarely accessed can remain in Flink's memory long after they expired, wasting memory.

  • Cleanup in full snapshot

With this option, expired entries are removed only when a state snapshot is loaded, i.e. when the job is restarted from a checkpoint or savepoint; during normal execution expired state is not cleaned up. In practice, operators can only reclaim the memory by periodically taking a savepoint or checkpoint and then restarting the job from it.

val ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    .cleanupFullSnapshot
    .build

Note: this option has no effect when incremental checkpointing is used with the RocksDB state backend.

This option is not applicable for the incremental checkpointing in the RocksDB state backend.

  • Cleanup in background

Besides cleanupFullSnapshot, which removes expired data when a snapshot is taken, users can also enable the cleanupInBackground strategy, which automatically chooses a background cleanup mode that matches the configured state backend.

import org.apache.flink.api.common.state.StateTtlConfig
val ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    .cleanupInBackground//defaults: incremental cleanup (5, false) on heap backends, compaction filter (1000) on RocksDB
    .build

Flink's state backend implementations currently fall into two families: heap state backends and the RocksDB backend. Heap state backends use incremental cleanup, while the RocksDB backend uses the compaction filter cleanup strategy.

  • Incremental cleanup
import org.apache.flink.api.common.state.StateTtlConfig
val ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    //check 5 state entries per trigger; false means cleanup is only triggered when state is accessed
    //true means a cleanup step also runs for every processed record
    .cleanupIncrementally(5, false)
    .build
  • Cleanup during RocksDB compaction
import org.apache.flink.api.common.state.StateTtlConfig

val ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    //during compaction, after every 1000 merged state entries the filter queries the current timestamp and drops expired entries
    .cleanupInRocksdbCompactFilter(1000)
    .build

Note that the RocksDB TTL compaction filter feature must additionally be enabled, in one of two ways:

1. Add the following setting to flink-conf.yaml:

state.backend.rocksdb.ttl.compaction.filter.enabled: true

2. Or enable it through the API:
RocksDBStateBackend::enableTtlCompactionFilter

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
val rocksDBStateBackend = new RocksDBStateBackend("hdfs:///rockdbs-statebackend")
rocksDBStateBackend.enableTtlCompactionFilter()
fsEnv.setStateBackend(rocksDBStateBackend)

Broadcast State

DataStream -> BroadcastStream

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
//001 zhangsan 母婴 200
var userOrderItem = fsEnv.socketTextStream("HadoopNode00",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1),tokens(2),tokens(3).toDouble))
//母婴 150  (category and threshold; orders at or above the threshold join the lottery draw)
var configStream = fsEnv.socketTextStream("HadoopNode00",8888)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1).toDouble))

var ruleMapStateDescriptor=new MapStateDescriptor[String,Double]("bmsd",createTypeInformation[String], createTypeInformation[Double])
//connect the order stream with the broadcast configuration stream
userOrderItem.connect(configStream.broadcast(ruleMapStateDescriptor))
.process(new UserDefineBroadcastProcessFunction(ruleMapStateDescriptor) )
.print()

fsEnv.execute("FlinkBroadCastState")
class UserDefineBroadcastProcessFunction(msd:MapStateDescriptor[String,Double]) extends BroadcastProcessFunction[
    (String,String,String,Double),(String,Double),(String,String,Double)]{

    //read-only access to the broadcast state; produces output records
    override def processElement(value: (String, String, String, Double),
                                ctx: BroadcastProcessFunction[(String, String, String, Double), (String, Double), (String, String, Double)]#ReadOnlyContext,
                                out: Collector[(String, String, Double)]): Unit = {
        val readOnlyState = ctx.getBroadcastState(msd)
        if(readOnlyState.contains(value._3)){
            val threshold = readOnlyState.get(value._3)
            if(value._4>=threshold){
                val random = new Random().nextInt(10)
                out.collect((value._1,value._3,random*1.0))//emit the result downstream
            }
        }

    }
    //read-write access: update the broadcast state
    override def processBroadcastElement(value: (String, Double),
                                         ctx: BroadcastProcessFunction[(String, String, String, Double), (String, Double), (String, String, Double)]#Context,
                                         out: Collector[(String, String, Double)]): Unit = {
        val stateTobeBroadcast = ctx.getBroadcastState(msd)
        stateTobeBroadcast.put(value._1,value._2)
    }
}

KeyedDataStream -> BroadcastStream

def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    //001 zhangsan
    var userLoginStream = fsEnv.socketTextStream("HadoopNode00",9999)
    .map(line=>line.split("\\s+"))
    .map(tokens=>(tokens(0),tokens(1)))
    .keyBy(t=>t._1)

    //001 zhangsan 100.0
    var orderStream = fsEnv.socketTextStream("HadoopNode00",8888)
    .map(line=>line.split("\\s+"))
    .map(tokens=>(tokens(0),tokens(1),tokens(2).toDouble))
    .keyBy(0)
    .sum(2)


    var ruleMapStateDescriptor=new MapStateDescriptor[String,Double]("bmsd",createTypeInformation[String],
                                                                    createTypeInformation[Double])
    //broadcast the accumulated order totals to the keyed login stream
    userLoginStream.connect(orderStream.broadcast(ruleMapStateDescriptor))
    .process(new UserDefineKeyedBroadcastProcessFunction(ruleMapStateDescriptor) )
    .print()


    fsEnv.execute("FlinkBroadCastState")
}
class UserDefineKeyedBroadcastProcessFunction(msd:MapStateDescriptor[String,Double]) extends KeyedBroadcastProcessFunction[String,(String,String),(String,String,Double),(String,String,String)]{

    override def processElement(value: (String, String),
                                ctx: KeyedBroadcastProcessFunction[String, (String, String), (String, String, Double), (String, String, String)]#ReadOnlyContext,
                                out: Collector[(String, String, String)]): Unit = {

        val userID = ctx.getCurrentKey
        val readOnlyState = ctx.getBroadcastState(msd)
        val historyCost = readOnlyState.get(userID)
        if(historyCost > 10000){
            out.collect((value._1,value._2,"金牌"))
        }else if(historyCost > 1000){
            out.collect((value._1,value._2,"银牌"))
        }else if(historyCost > 500){
            out.collect((value._1,value._2,"铜牌"))
        }else{
            out.collect((value._1,value._2,"铁牌"))
        }

    }

    override def processBroadcastElement(value: (String, String, Double),
                                         ctx: KeyedBroadcastProcessFunction[String, (String, String), (String, String, Double), (String, String, String)]#Context,
                                         out: Collector[(String, String, String)]): Unit = {

        val state = ctx.getBroadcastState(msd)
        state.put(value._1,value._3)
    }
}

Checkpoint & Savepoint & State Backend

Concepts

A checkpoint is a mechanism in which the JobManager periodically injects checkpoint barriers into the stream; when a downstream task receives a barrier it persists its state to the state backend. Once every task of the job has completed its snapshot, the JobManager marks the checkpoint as successful and discards the previous one. By default Flink does not enable checkpointing; the user has to turn it on explicitly, as the snippet below and the full example later in this section show.
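A minimal sketch of enabling checkpointing (the 5-second interval and exactly-once mode are arbitrary choices here; the example at the end of this section adds further tuning):

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
// take an exactly-once checkpoint every 5 seconds
fsEnv.enableCheckpointing(5000,CheckpointingMode.EXACTLY_ONCE)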
A savepoint is a manually triggered checkpoint: before stopping a job, the operator specifies a savepoint directory and asks the system to create one. Under the hood it is still a checkpoint created by the system; the difference is that a checkpoint created as a savepoint is never deleted automatically.
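Operationally, savepoints are usually triggered and consumed through the Flink CLI; a sketch of the typical workflow is shown below (the job id, target directory, and jar path are placeholders):

# trigger a savepoint for a running job
bin/flink savepoint <jobId> hdfs:///flink-savepoints
# or cancel the job and take a savepoint in one step
bin/flink cancel -s hdfs:///flink-savepoints <jobId>
# later, resume the job from the savepoint
bin/flink run -s hdfs:///flink-savepoints/savepoint-xxxx <job-jar>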

Whether it is a checkpoint or a savepoint, the state data ultimately has to be written to a state backend. Flink currently provides three state backend options.

  • MemoryStateBackend (for testing): when a checkpoint is taken, all state is snapshotted and sent to the JobManager, where it is kept in the memory of the (single) JobManager node. This is the default configuration.

1. By default a single state snapshot may not exceed 5 MB.
2. The aggregate state must fit into the JobManager's memory.

     fsEnv.setStateBackend(new MemoryStateBackend(1024*1024*5,true) )//5 MB limit, asynchronous snapshots
  • FsStateBackend: the working state lives in TaskManager memory (across the cluster); when a checkpoint is taken, the state snapshots are written to the configured file system path.

    1. Use it when the job holds large state (large relative to a single machine).
    2. Recommended for production use.

fsEnv.setStateBackend(new FsStateBackend("hdfs:///xxxx-path",true) )//asynchronous snapshots
  • RocksDBStateBackend: each TaskManager holds a local RocksDB database (memory + disk) that stores the working state; when a checkpoint is taken, the RocksDB data is written to the configured file system path.

    1. Use it for very large state (beyond what fits into cluster memory).
    2. Recommended for production use.
    3. Limited by RocksDB itself: a single key or value may not exceed 2^31 bytes (about 2 GB).

  fsEnv.setStateBackend(new RocksDBStateBackend("hdfs:///xxxx-path",true) )//incremental checkpoints
  flink-conf.yaml - global configuration
#==============================================================================
# Fault tolerance and checkpointing
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink-checkpoints
state.savepoints.dir: hdfs:///flink-savepoints
state.backend.incremental: true
state.backend.rocksdb.ttl.compaction.filter.enabled: true

Example

object FlinkWordCountsCheckpoint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment

    //take an exactly-once checkpoint every 5 seconds
    fsEnv.enableCheckpointing(5000,CheckpointingMode.EXACTLY_ONCE)
    //a checkpoint must complete within 2 seconds, otherwise it is abandoned
    fsEnv.getCheckpointConfig.setCheckpointTimeout(2000)
    //at least 2 seconds must pass between checkpoints; takes precedence over the checkpoint interval
    fsEnv.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    //fail the job if a checkpoint fails
    fsEnv.getCheckpointConfig.setFailOnCheckpointingErrors(true)
    //whether checkpoint data is kept when the job is cancelled; RETAIN_ON_CANCELLATION is recommended
    //(the checkpoint is kept if the user cancels without taking a savepoint)
    fsEnv.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

    fsEnv.socketTextStream("HadoopNode00",9999)
        .flatMap(line=>line.split("\\s+"))
        .map(word=>(word,1))
        .keyBy(0)
        .map(new WordCountMapFunction())
        .uid("wordcount")
        .print("test")

    fsEnv.execute("FlinkWordCountsCheckpoint")
  }
}
class WordCountMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var wcs:ValueState[Int]=_

  override def open(parameters: Configuration): Unit = {
    val wcvsd = new ValueStateDescriptor[Int]("wordcount",createTypeInformation[Int])
    wcs=getRuntimeContext.getState(wcvsd)
  }
  override def map(value: (String, Int)): (String, Int) = {
    val count=wcs.value()
    wcs.update(count+value._2)
    (value._1,wcs.value())
  }
}

Task Failure Recovery

RestartStrategy

A restart strategy determines whether and when a job is restarted after a failure. Flink currently provides the following strategies:

  • noRestart - the job simply terminates on failure
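For completeness, disabling restarts looks like this (mirroring the snippets for the other strategies below):
fsEnv.setRestartStrategy(RestartStrategies.noRestart())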
  • fixedDelayRestart - restart a fixed number of times, with a user-defined delay between attempts
//restart every 5 seconds, at most 5 attempts in total
fsEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5,Time.seconds(5)))
  • failureRateRestart - the job is considered failed once the number of failures within a given time window reaches the configured limit
//at most 5 failures within 1 minute, with 5 seconds between restart attempts
fsEnv.setRestartStrategy(RestartStrategies.failureRateRestart(5,Time.minutes(1),Time.seconds(5)))
  • fallBackRestart - use the restart strategy configured on the cluster; if none is configured, the default fixedDelayRestart strategy is used
fsEnv.setRestartStrategy(RestartStrategies.fallBackRestart())
Failover Strategies

This setting controls how much of the job is restarted after a task failure. Flink currently supports two failover strategies, region (restart only the affected region) and full (restart the whole job), configured in flink-conf.yaml:

jobmanager.execution.failover-strategy: region