State & Fault Tolerance
Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/
Using State in Flink
State in Flink falls into two broad categories: Keyed State and Operator State.
Keyed State: Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream. Each keyed state is bound to <parallel-operator-instance, key>, and since each key "belongs" to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.
Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.
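The number of Key Groups is fixed by the job's maximum parallelism, which can be set explicitly on the environment. A minimal sketch of this relationship (the value 128 is an illustrative choice):
import org.apache.flink.streaming.api.scala._

object KeyGroupsSketch {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // the maximum parallelism determines the number of Key Groups, i.e. the
    // upper bound to which keyed state can later be rescaled
    fsEnv.setMaxParallelism(128)
    // keyBy produces a KeyedStream; every keyed state used downstream is scoped
    // to <operator, key> and hashed into one of the 128 Key Groups
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)
      .print()
    fsEnv.execute("KeyGroupsSketch")
  }
}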
Operator State: With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance.
Summary: whether it is Keyed State or Operator State, state in Flink can exist in two forms, Managed State and Raw State. **Managed State** is recommended, because Flink can optimize how the state is stored and can redistribute it during failure recovery.
Managed Keyed State
- ValueState
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object FlinkWordCountsValueState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .map(new ValueStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkWordCounts")
  }
}

class ValueStateRichMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  // running count for the current key
  var historyCount: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    historyCount = getRuntimeContext.getState[Int](vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    val historyCountNum: Int = historyCount.value()
    val currentCount = historyCountNum + value._2
    historyCount.update(currentCount)
    (value._1, currentCount)
  }
}
- ListState
object FlinkUserPasswordListState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // sample input: 001 zhangsan 123456
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1), tokens(2)))
      .keyBy(0)
      .map(new ListStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkUserPasswordListState")
  }
}

// requires: import scala.collection.JavaConverters._
class ListStateRichMapFunction extends RichMapFunction[(String, String, String), (String, String, String)] {
  // every password ever seen for the current user key
  var historyPasswords: ListState[String] = _

  override def open(parameters: Configuration): Unit = {
    val lsd = new ListStateDescriptor[String]("historyPassword", createTypeInformation[String])
    historyPasswords = getRuntimeContext.getListState[String](lsd)
  }

  override def map(value: (String, String, String)): (String, String, String) = {
    historyPasswords.add(value._3)
    val list = historyPasswords.get().asScala.toList
    (value._1, value._2, list.mkString(","))
  }
}
- MapState
object FlinkUserMapState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // sample input: 001 zhangsan 水果类 4.5
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1), tokens(2), tokens(3).toDouble))
      .keyBy(0)
      .map(new MapStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkUserMapState")
  }
}

// requires: import scala.collection.JavaConverters._
class MapStateRichMapFunction extends RichMapFunction[(String, String, String, Double), (String, String, String)] {
  // accumulated cost per category for the current user key
  var historyMapState: MapState[String, Double] = _

  override def open(parameters: Configuration): Unit = {
    val msd = new MapStateDescriptor[String, Double]("mapstate", createTypeInformation[String],
      createTypeInformation[Double])
    historyMapState = getRuntimeContext.getMapState[String, Double](msd)
  }

  override def map(value: (String, String, String, Double)): (String, String, String) = {
    // get returns null for an unknown category, which unboxes to 0.0 in Scala
    val cost = historyMapState.get(value._3)
    val currentCost = cost + value._4
    historyMapState.put(value._3, currentCost)
    (value._1, value._2, historyMapState.iterator().asScala.map(t => t.getKey + ":" + t.getValue).mkString(","))
  }
}
- ReducingState
object FlinkWordCountReducingState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // sample input: a line of whitespace-separated words
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(line => line.split("\\s+"))
      .map((_, 1))
      .keyBy(0)
      .map(new ReducingStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkWordCountReducingState")
  }
}

class ReducingStateRichMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  // running sum per word, maintained by the ReduceFunction below
  var historyReducingState: ReducingState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val rsd = new ReducingStateDescriptor[Int]("reducecount", new ReduceFunction[Int] {
      override def reduce(v1: Int, v2: Int): Int = v1 + v2
    }, createTypeInformation[Int])
    historyReducingState = getRuntimeContext.getReducingState[Int](rsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    historyReducingState.add(value._2)
    (value._1, historyReducingState.get())
  }
}
- AggregatingState
object FlinkUserAggregatingState {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // sample input: 001 zhangsan 10000
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1), tokens(2).toDouble))
      .keyBy(0)
      .map(new AggregatingStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkUserAggregatingState")
  }
}

class AggregatingStateRichMapFunction extends RichMapFunction[(String, String, Double), (String, String, Double)] {
  // average salary per user key; the accumulator is (sum, count)
  var avgSalaryState: AggregatingState[Double, Double] = _

  override def open(parameters: Configuration): Unit = {
    val asd = new AggregatingStateDescriptor[Double, (Double, Int), Double]("avgstate",
      new AggregateFunction[Double, (Double, Int), Double] {
        override def createAccumulator(): (Double, Int) = (0.0, 0)
        override def add(value: Double, accumulator: (Double, Int)): (Double, Int) = (accumulator._1 + value, accumulator._2 + 1)
        override def getResult(accumulator: (Double, Int)): Double = accumulator._1 / accumulator._2
        override def merge(a: (Double, Int), b: (Double, Int)): (Double, Int) = (a._1 + b._1, a._2 + b._2)
      }, createTypeInformation[(Double, Int)])
    avgSalaryState = getRuntimeContext.getAggregatingState[Double, (Double, Int), Double](asd)
  }

  override def map(value: (String, String, Double)): (String, String, Double) = {
    avgSalaryState.add(value._3)
    (value._1, value._2, avgSalaryState.get())
  }
}
Managed Operator State
To use managed operator state, a stateful function can implement either the more general CheckpointedFunction interface, or the ListCheckpointed interface. Currently, list-style managed operator state is supported.
- CheckpointedFunction
public interface CheckpointedFunction {
    // called when a checkpoint is taken; the user snapshots the current state via the context
    void snapshotState(FunctionSnapshotContext context) throws Exception;
    // called when state is initialized, both on first start and on failure recovery
    void initializeState(FunctionInitializationContext context) throws Exception;
}
State recovery on failure:
// requires: import scala.collection.mutable.ListBuffer and scala.collection.JavaConverters._
class UserDefineBufferSink(threshold: Int) extends SinkFunction[String] with CheckpointedFunction {
  @transient
  private var checkpointedState: ListState[String] = _
  private val bufferedElements = ListBuffer[String]()

  override def invoke(value: String): Unit = {
    bufferedElements += value
    if (bufferedElements.size >= threshold) {
      for (e <- bufferedElements) {
        println("Element:" + e)
      }
      bufferedElements.clear()
    }
  }

  override def snapshotState(context: FunctionSnapshotContext): Unit = {
    checkpointedState.clear()
    for (e <- bufferedElements) {
      checkpointedState.add(e)
    }
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
    println("initializeState...")
    val lsd = new ListStateDescriptor[String]("list", createTypeInformation[String])
    // union redistribution: on restore every parallel instance receives the full list;
    // use getOperatorStateStore.getListState(lsd) for even-split redistribution instead
    checkpointedState = context.getOperatorStateStore.getUnionListState(lsd)
    if (context.isRestored) {
      val list = checkpointedState.get().asScala
      println("State Restore :" + list.mkString(","))
      for (e <- list) {
        bufferedElements += e
      }
    }
  }
}
object FlinkWordPrint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(line => line.split("\\s+"))
      .addSink(new UserDefineBufferSink(5))
      .uid("wordsink") // give the stateful operator a unique uid; recommended so state can be matched on restore
    fsEnv.execute("FlinkWordPrint")
  }
}
- ListCheckpointed
public interface ListCheckpointed<T extends Serializable> {
    // called on checkpoint; the user simply returns the list of elements to snapshot and the framework persists it
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
    // called on failure recovery with the restored list
    void restoreState(List<T> state) throws Exception;
}
Both interfaces are used to store and recover Operator State. The difference lies in how the state is redistributed on recovery: CheckpointedFunction supports two schemes, even-split and union, whereas ListCheckpointed only supports even-split redistribution.
// JLong stands for java.lang.Long (e.g. via: import java.lang.{Long => JLong});
// also requires java.util and scala.collection.JavaConverters._
class UserDefineCounterSource extends RichParallelSourceFunction[Long] with ListCheckpointed[JLong] {
  @volatile
  private var isRunning = true
  private var offset = 0L

  override def snapshotState(checkpointId: Long, timestamp: Long): util.List[JLong] = {
    val v: JLong = offset
    List(v).asJava
  }

  override def restoreState(state: util.List[JLong]): Unit = {
    for (v <- state.asScala) {
      println("restoreState:" + v)
      offset = v
    }
  }

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    val lock = ctx.getCheckpointLock
    while (isRunning) {
      Thread.sleep(1000)
      // emit and advance the offset under the checkpoint lock so snapshots see a consistent value
      lock.synchronized({
        ctx.collect(offset)
        offset += 1
      })
    }
  }

  override def cancel(): Unit = isRunning = false
}
object FlinkCountPrint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.addSource[Long](new UserDefineCounterSource)
      .uid("countsource")
      .print()
    fsEnv.execute("FlinkCountPrint")
  }
}
State Time-To-Live (TTL)
Basic usage
A time-to-live (TTL) can be assigned to keyed state of any type. If a TTL is configured and a state value has expired, the stored value is cleaned up on a best-effort basis, as discussed in more detail below.
All state collection types support per-entry TTLs. This means that list elements and map entries expire independently.
To use the TTL feature, create a StateTtlConfig object and pass it to the state descriptor's enableTimeToLive method:
import org.apache.flink.api.common.state.StateTtlConfig
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.common.time.Time
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(10)) // required: the TTL, here 10 s
  // optional: OnCreateAndWrite (default) or OnReadAndWrite
  .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
  // optional: NeverReturnExpired (default) or ReturnExpiredIfNotCleanedUp
  .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
  .build
val stateDescriptor = new ValueStateDescriptor[String]("text state", classOf[String])
stateDescriptor.enableTimeToLive(ttlConfig)
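The same ttlConfig can be attached to collection-state descriptors, in which case each element or entry expires on its own; a minimal sketch reusing the ttlConfig built above (the descriptor names are illustrative):
// each list element / map entry carries its own expiration timestamp
val listDescriptor = new ListStateDescriptor[String]("visited-pages", createTypeInformation[String])
listDescriptor.enableTimeToLive(ttlConfig)
val mapDescriptor = new MapStateDescriptor[String, Double]("cost-per-category", createTypeInformation[String], createTypeInformation[Double])
mapDescriptor.enableTimeToLive(ttlConfig)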
Example
object FlinkWordCountsValueStateWithTTL {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .map(new ValueStateRichMapFunction)
      .printToErr("测试")
    fsEnv.execute("FlinkWordCountsValueStateWithTTL")
  }
}

class ValueStateRichMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  var historyCount: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    val ttlConfig = StateTtlConfig
      .newBuilder(Time.seconds(10))
      .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
      .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
      .build
    // attach the TTL config to the descriptor before obtaining the state
    vsd.enableTimeToLive(ttlConfig)
    historyCount = getRuntimeContext.getState[Int](vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    val historyCountNum: Int = historyCount.value()
    val currentCount = historyCountNum + value._2
    historyCount.update(currentCount)
    (value._1, currentCount)
  }
}
Notes:
1. Enabling TTL increases the state size, because every state value stores an extra 8-byte long timestamp.
2. If state was originally written without TTL and the job is later restored with TTL enabled (or vice versa), recovery fails with a state compatibility error.
3. The TTL clock is the processing time of the node.
Cleanup of Expired State
By default Flink does not proactively remove expired state; an expired entry is only checked and cleared when that state is actually accessed. As a result, state that is rarely read may stay in Flink's memory long after it has expired, wasting memory.
- Cleanup in full snapshot
Expired entries are only filtered out when a full state snapshot is loaded on restart; during normal operation expired data is not removed. In practice operators can only reclaim the memory by periodically taking a savepoint or checkpoint and then restoring from it.
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(1))
  .cleanupFullSnapshot
  .build
Note: this option is not applicable to incremental checkpointing with the RocksDB state backend.
- Cleanup in background
In addition to cleanupFullSnapshot, the cleanupInBackground strategy can be enabled; it automatically picks a background cleanup implementation that matches the configured state backend.
import org.apache.flink.api.common.state.StateTtlConfig
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(1))
  // uses the backend defaults: heap backends check 5 entries per state access,
  // RocksDB re-queries the timestamp after every 1000 entries during compaction
  .cleanupInBackground
  .build
Flink's state backend implementations fall into two families: heap-based backends and the RocksDB backend. Heap-based backends use incremental cleanup, while the RocksDB backend uses a compaction-filter cleanup strategy.
- Incremental cleanup
import org.apache.flink.api.common.state.StateTtlConfig
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(1))
  // check 5 state entries per cleanup pass; false means cleanup is only triggered
  // when state is accessed, true would additionally run it for every processed record
  .cleanupIncrementally(5, false)
  .build
- Cleanup during RocksDB compaction
import org.apache.flink.api.common.state.StateTtlConfig
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(1))
  // after RocksDB compaction has processed 1000 state entries, the current timestamp
  // is queried again and expired entries are dropped
  .cleanupInRocksdbCompactFilter(1000)
  .build
Note: the TTL compaction filter feature must additionally be enabled, in either of two ways:
1. Add the following to flink-conf.yaml:
state.backend.rocksdb.ttl.compaction.filter.enabled: true
2. Or enable it via the API:
RocksDBStateBackend::enableTtlCompactionFilter
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
val rocksDBStateBackend = new RocksDBStateBackend("hdfs:///rockdbs-statebackend")
rocksDBStateBackend.enableTtlCompactionFilter()
fsEnv.setStateBackend(rocksDBStateBackend)
Broadcast State
DataStream -> BroadcastStream
object FlinkBroadcastState { // wrapping object added so the snippet compiles standalone
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // order stream, sample input: 001 zhangsan 母婴 200
    val userOrderItem = fsEnv.socketTextStream("HadoopNode00", 9999)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1), tokens(2), tokens(3).toDouble))
    // rule stream, sample input: 母婴 150  (orders of this category at or above 150 join the lottery)
    val configStream = fsEnv.socketTextStream("HadoopNode00", 8888)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1).toDouble))
    val ruleMapStateDescriptor = new MapStateDescriptor[String, Double]("bmsd", createTypeInformation[String], createTypeInformation[Double])
    // broadcast the rule stream and connect it to the order stream
    userOrderItem.connect(configStream.broadcast(ruleMapStateDescriptor))
      .process(new UserDefineBroadcastProcessFunction(ruleMapStateDescriptor))
      .print()
    fsEnv.execute("FlinkBroadCastState")
  }
}

class UserDefineBroadcastProcessFunction(msd: MapStateDescriptor[String, Double]) extends BroadcastProcessFunction[
  (String, String, String, Double), (String, Double), (String, String, Double)] {

  // read-only access to the broadcast state; produces the output records
  override def processElement(value: (String, String, String, Double),
                              ctx: BroadcastProcessFunction[(String, String, String, Double), (String, Double), (String, String, Double)]#ReadOnlyContext,
                              out: Collector[(String, String, Double)]): Unit = {
    val readOnlyState = ctx.getBroadcastState(msd)
    if (readOnlyState.contains(value._3)) {
      val threshold = readOnlyState.get(value._3)
      if (value._4 >= threshold) {
        val random = new Random().nextInt(10)
        out.collect((value._1, value._3, random * 1.0)) // emit to the downstream operator
      }
    }
  }

  // read-write access: update the broadcast state with the latest rule
  override def processBroadcastElement(value: (String, Double),
                                       ctx: BroadcastProcessFunction[(String, String, String, Double), (String, Double), (String, String, Double)]#Context,
                                       out: Collector[(String, String, Double)]): Unit = {
    val stateTobeBroadcast = ctx.getBroadcastState(msd)
    stateTobeBroadcast.put(value._1, value._2)
  }
}
KeyedDataStream -> BroadcastStream
object FlinkKeyedBroadcastState { // wrapping object added so the snippet compiles standalone
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // login stream, sample input: 001 zhangsan
    val userLoginStream = fsEnv.socketTextStream("HadoopNode00", 9999)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1)))
      .keyBy(t => t._1)
    // order stream, sample input: 001 zhangsan 100.0 ; keeps a running total per user
    val orderStream = fsEnv.socketTextStream("HadoopNode00", 8888)
      .map(line => line.split("\\s+"))
      .map(tokens => (tokens(0), tokens(1), tokens(2).toDouble))
      .keyBy(0)
      .sum(2)
    val ruleMapStateDescriptor = new MapStateDescriptor[String, Double]("bmsd", createTypeInformation[String],
      createTypeInformation[Double])
    // broadcast the aggregated order stream and connect it to the keyed login stream
    userLoginStream.connect(orderStream.broadcast(ruleMapStateDescriptor))
      .process(new UserDefineKeyedBroadcastProcessFunction(ruleMapStateDescriptor))
      .print()
    fsEnv.execute("FlinkBroadCastState")
  }
}

class UserDefineKeyedBroadcastProcessFunction(msd: MapStateDescriptor[String, Double]) extends KeyedBroadcastProcessFunction[String, (String, String), (String, String, Double), (String, String, String)] {

  override def processElement(value: (String, String),
                              ctx: KeyedBroadcastProcessFunction[String, (String, String), (String, String, Double), (String, String, String)]#ReadOnlyContext,
                              out: Collector[(String, String, String)]): Unit = {
    val userID = ctx.getCurrentKey
    val readOnlyState = ctx.getBroadcastState(msd)
    // returns null (unboxed to 0.0) if the user has no recorded spending yet
    val historyCost = readOnlyState.get(userID)
    if (historyCost > 10000) {
      out.collect((value._1, value._2, "金牌"))
    } else if (historyCost > 1000) {
      out.collect((value._1, value._2, "银牌"))
    } else if (historyCost > 500) {
      out.collect((value._1, value._2, "铜牌"))
    } else {
      out.collect((value._1, value._2, "铁牌"))
    }
  }

  override def processBroadcastElement(value: (String, String, Double),
                                       ctx: KeyedBroadcastProcessFunction[String, (String, String), (String, String, Double), (String, String, String)]#Context,
                                       out: Collector[(String, String, String)]): Unit = {
    val state = ctx.getBroadcastState(msd)
    state.put(value._1, value._3)
  }
}
Checkpoint & Savepoint & State Backend
Concepts
A checkpoint is a mechanism in which the JobManager periodically injects checkpoint barriers into the stream; when a downstream task receives a barrier it snapshots its own state to the state backend. Once every task in the job has completed its snapshot, the JobManager marks the checkpoint as successful and discards the previous one. Checkpointing is not enabled by default; the user has to turn it on explicitly.
A savepoint is a manually triggered checkpoint: before stopping a job, the operator specifies a savepoint directory. Internally the system still creates a checkpoint; the difference is that a checkpoint created as a savepoint is never deleted automatically.
Both checkpoints and savepoints ultimately store the state data in a state backend. Flink currently provides three state backend options.
- MemoryStateBackend (for testing): when a checkpoint is taken, all state is snapshotted and sent to the JobManager, where it is kept in the JobManager's (single-node) memory. This is the default backend.
1. By default a single state may not exceed 5 MB.
2. The aggregate state must fit into the JobManager's memory.
fsEnv.setStateBackend(new MemoryStateBackend(1024*1024*5,true)) //5 MB state size limit, asynchronous snapshots
- FsStateBackend: working state is kept in the TaskManager's memory (multi-node); when a checkpoint is taken, the state snapshot is written to the configured file system path.
1. Suitable when the cluster holds large state (compared with a single machine).
2. Recommended for production.
fsEnv.setStateBackend(new FsStateBackend("hdfs:///xxxx",true)) //path is a placeholder; asynchronous snapshots
- RocksDBStateBackend: each TaskManager holds a local RocksDB instance (memory + disk) in which working state is stored; when a checkpoint is taken, the RocksDB data is written to the configured file system path.
1. Suitable for very large state (compared with the memory-based backends).
2. Recommended for production.
3. Limited by RocksDB itself: a single key or value may not exceed 2^31 bytes (about 2 GB).
fsEnv.setStateBackend(new RocksDBStateBackend("hdfs:///xxxx",true)) //path is a placeholder; true enables incremental checkpoints
flink-conf.yaml - global configuration
#==============================================================================
# Fault tolerance and checkpointing
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink-checkpoints
state.savepoints.dir: hdfs:///flink-savepoints
state.backend.incremental: true
state.backend.rocksdb.ttl.compaction.filter.enabled: true
Example
object FlinkWordCountsCheckpoint {
  def main(args: Array[String]): Unit = {
    val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
    // take a checkpoint every 5 s with exactly-once semantics
    fsEnv.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE)
    // a checkpoint must complete within 2 s, otherwise it is discarded
    fsEnv.getCheckpointConfig.setCheckpointTimeout(2000)
    // at least 2 s must pass between checkpoints; this takes precedence over the checkpoint interval
    fsEnv.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    // fail the job if a checkpoint fails
    fsEnv.getCheckpointConfig.setFailOnCheckpointingErrors(true)
    // whether checkpoint data is kept when the job is cancelled; RETAIN_ON_CANCELLATION is recommended
    // (the checkpoint is only kept if no savepoint was specified at cancellation)
    fsEnv.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
    fsEnv.socketTextStream("HadoopNode00", 9999)
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .map(new WordCountMapFunction())
      .uid("wordcount")
      .print("测试")
    fsEnv.execute("FlinkWordCountsCheckpoint")
  }
}

class WordCountMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  var wcs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val wcvsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    wcs = getRuntimeContext.getState(wcvsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    val count = wcs.value()
    wcs.update(count + value._2)
    (value._1, wcs.value())
  }
}
Task Failure Recovery
RestartStrategy
A restart strategy describes whether and when a job is restarted after a failure. Flink currently offers the following strategies:
- noRestart - the job is not restarted; it terminates on the first failure
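A one-line sketch of selecting this strategy explicitly (mirrors the calls shown for the other strategies below):
fsEnv.setRestartStrategy(RestartStrategies.noRestart())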
- fixedDelayRestart - restart a fixed number of times, with a configurable delay between attempts
//restart every 5 seconds, at most 5 attempts in total
fsEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5,Time.seconds(5)))
- failureRateRestart - the job is declared failed once the number of failures within a given time window reaches the configured limit
//at most 5 failures within 1 minute, with a 5-second delay between restart attempts
fsEnv.setRestartStrategy(RestartStrategies.failureRateRestart(5,Time.minutes(1),Time.seconds(5)))
- fallBackRestart - use the restart strategy configured on the cluster; if the cluster does not define one, the system falls back to fixedDelayRestart
fsEnv.setRestartStrategy(RestartStrategies.fallBackRestart())
Failover Strategies
This setting controls how much of the job is restarted on failure. Flink currently supports two failover strategies: region (restart only the affected region) and full (restart the whole job). It is configured in flink-conf.yaml:
jobmanager.execution.failover-strategy: region