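Concept
- updateStateByKey takes one parameter, a function. That function takes two parameters. The first is a sequence (Seq) holding the key's new values in the current batch. The second is an Option holding the key's previous state, which is None if the key has no state yet. The function returns the new state as an Option, and returning None removes the key from the state.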
def updateStateByKey[S : ClassTag](updateFunc: (Seq[V], Option[S]) => Option[S]): DStream[(K, S)]
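updateStateByKey also has other overloads, for example versions that additionally take a number of partitions or a Partitioner. The one above is one of the simplest.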
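You can check the update function's contract without a cluster. The plain-Scala sketch below calls an update function directly, the way Spark would for one key in one batch. `sumUpdate` mirrors the logic of the example's `updateFunc`. `evictIfIdle` is a hypothetical variant, not part of the original example. It shows that returning None drops the key from the state.

```scala
object UpdateFuncContract {
  // Same rule as the example's updateFunc: add this batch's values to the previous total.
  def sumUpdate(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(newValues.sum + state.getOrElse(0))

  // Hypothetical variant: return None (drop the key) when no new values arrived in this batch.
  def evictIfIdle(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    if (newValues.isEmpty) None else sumUpdate(newValues, state)

  def main(args: Array[String]): Unit = {
    println(sumUpdate(Seq(1, 1), None))      // Some(2): key seen for the first time
    println(sumUpdate(Seq(1), Some(2)))      // Some(3): one new value added to the old state
    println(sumUpdate(Seq.empty, Some(3)))   // Some(3): no new values, state carried over
    println(evictIfIdle(Seq.empty, Some(3))) // None: the key would be removed from the state
  }
}
```

The Seq is empty when a key already has state but received nothing in the current batch. Spark still calls the update function for every key that has state, so an update function that returns None is how keys get evicted.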
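Example
Use nc (netcat) as the test data source. Start a server listening on host mypc01, port 10087. The -l flag makes nc listen, and -k keeps it listening after a client disconnects: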
nc -lk mypc01 10087
The example code is as follows:
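import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Ordinary (stateless) operators such as reduceByKey only aggregate within a single batch;
 * updateStateByKey carries per-key state from one batch to the next.
 */
object UpdateBykeyDemo3 {

  // Spark recommends a main() method rather than extending scala.App
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local[*]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // 5-second batch interval
    val ssc = new StreamingContext(conf, Seconds(5))
    // updateStateByKey requires a checkpoint directory for its state
    ssc.checkpoint("data")

    // Read lines from the nc server started above
    val dstream: ReceiverInputDStream[String] = ssc.socketTextStream("mypc01", 10087)
    // Split each line into words and pair every word with a count of 1
    val dstream2: DStream[(String, Int)] = dstream.flatMap(_.split(" ")).map((_, 1))
    // Running word count across all batches received so far
    val value: DStream[(String, Int)] = dstream2.updateStateByKey(updateFunc)
    value.print()

    ssc.start()
    ssc.awaitTermination()
  }

  // seq: this key's values in the current batch (empty if none arrived)
  // option: the key's previous state (None the first time the key is seen)
  def updateFunc(seq: Seq[Int], option: Option[Int]): Option[Int] = {
    println(s"option: $option + seq: ${seq.mkString(",")}")
    Some(seq.sum + option.getOrElse(0))
  }
}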
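The comment in the example says ordinary operators cannot aggregate across batches. To illustrate this, the plain-Scala sketch below needs no Spark. It replays hypothetical input lines as consecutive batches. `perBatch` counts words within one batch only, like a stateless `reduceByKey`. `step` is a simplified local model of one updateStateByKey step, not Spark's implementation. It groups a batch's values per key, calls the update function for every key that has new values or existing state, and keeps only keys whose result is Some.

```scala
object StatefulVsPerBatch {
  // Stateless count for a single batch, comparable to dstream2.reduceByKey(_ + _).
  def perBatch(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // Simplified local model of one updateStateByKey step (not Spark's code).
  def step[K, V, S](state: Map[K, S], batch: Seq[(K, V)])(
      updateFunc: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Every key with new values or existing state is updated; None drops the key.
    (state.keySet ++ newValues.keySet).flatMap { k =>
      updateFunc(newValues.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
    }.toMap
  }

  // Same rule as the example's updateFunc, without the println.
  def updateFunc(seq: Seq[Int], option: Option[Int]): Option[Int] =
    Some(seq.sum + option.getOrElse(0))

  // Like dstream.flatMap(_.split(" ")).map((_, 1)), skipping empty tokens.
  def toPairs(line: String): Seq[(String, Int)] =
    line.split(" ").filter(_.nonEmpty).map(w => (w, 1)).toSeq

  def main(args: Array[String]): Unit = {
    val batches = Seq("a b a", "b c", "") // hypothetical lines typed into nc, one per batch
    var state = Map.empty[String, Int]
    for (line <- batches) {
      val pairs = toPairs(line)
      state = step(state, pairs)(updateFunc)
      println(s"per batch: ${perBatch(pairs).toSeq.sorted.mkString(", ")}" +
        s" | running: ${state.toSeq.sorted.mkString(", ")}")
    }
  }
}
```

After the second batch ("b c"), the per-batch counts are only b=1 and c=1. The running state is a=2, b=2, c=1. The empty third batch leaves the running state unchanged. In Spark the state is kept in RDDs that are periodically checkpointed, which is why the example calls `ssc.checkpoint`.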