Spark Streaming Source Code Walkthrough: A Deep Dive into the Design and Implementation of the Driver-Side ReceiverTracker
The main responsibilities of ReceiverTracker are:
1. Launching Receivers on the Executors.
2. Stopping Receivers.
3. Updating the rate at which Receivers ingest data (which enables rate limiting).
4. Monitoring the running state of the Receivers and restarting any Receiver that stops running; in other words, Receiver fault tolerance.
5. Accepting Receiver registrations.
6. Managing the metadata of the data received by Receivers, with the help of ReceivedBlockTracker.
7. Reporting error information sent back by the Receivers.
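Most of the responsibilities above are driven by messages sent to the tracker's RPC endpoint. As a rough sketch, a few of them can be modeled as case classes; the names below mirror messages in the real ReceiverTrackerMessage hierarchy, but the payload types are deliberately trimmed down for illustration:

```scala
// Simplified sketch of messages a ReceiverTracker-like component handles.
// Payloads are trimmed for illustration (the real messages carry Receiver
// objects, rate limits as Long, etc.).
sealed trait TrackerMessage
case class StartAllReceivers(receiverIds: Seq[Int]) extends TrackerMessage
case class RestartReceiver(receiverId: Int) extends TrackerMessage
case class UpdateReceiverRateLimit(receiverId: Int, newRate: Long) extends TrackerMessage
case class ReportError(receiverId: Int, message: String) extends TrackerMessage

// A message handler pattern-matches over the sealed trait:
def describe(msg: TrackerMessage): String = msg match {
  case StartAllReceivers(ids)            => s"start receivers ${ids.mkString(",")}"
  case RestartReceiver(id)               => s"restart receiver $id"
  case UpdateReceiverRateLimit(id, rate) => s"limit receiver $id to $rate records/s"
  case ReportError(id, m)                => s"receiver $id failed: $m"
}
```

Because the trait is sealed, the compiler can warn if a handler forgets one of the message types.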
Starting the Receivers
In ReceiverTracker's start method, a ReceiverTrackerEndpoint is instantiated and the Receivers are launched on the Executors:
ReceiverTracker.scala (lines 149-161)
def start(): Unit = synchronized {
  if (isTrackerStarted) {
    throw new SparkException("ReceiverTracker already started")
  }

  if (!receiverInputStreams.isEmpty) {
    endpoint = ssc.env.rpcEnv.setupEndpoint(
      "ReceiverTracker",
      new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
    if (!skipReceiverLaunch) launchReceivers()
    logInfo("ReceiverTracker started")
    trackerState = Started
  }
}
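Note the guard at the top of start(): the tracker refuses to start twice, and trackerState only flips to Started after the endpoint is up and the receivers have been launched. This lifecycle-guard idiom can be sketched in plain Scala, with no Spark dependency (the TrackerState values follow the real source; everything else is simplified):

```scala
// Minimal, Spark-free sketch of ReceiverTracker's lifecycle guard.
object TrackerState extends Enumeration {
  val Initialized, Started, Stopping, Stopped = Value
}

class Tracker {
  import TrackerState._
  @volatile private var trackerState = Initialized

  private def isTrackerStarted: Boolean = trackerState == Started

  def start(): Unit = synchronized {
    if (isTrackerStarted) {
      throw new IllegalStateException("Tracker already started")
    }
    // ...set up the RPC endpoint and launch receivers here...
    trackerState = Started
  }
}
```

Calling start() a second time throws, just as the real ReceiverTracker throws a SparkException.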
ReceiverTracker.scala (lines 413-424)
private def launchReceivers(): Unit = {
  val receivers = receiverInputStreams.map(nis => {
    val rcvr = nis.getReceiver()
    rcvr.setReceiverId(nis.id)
    rcvr
  })

  runDummySparkJob()

  logInfo("Starting " + receivers.length + " receivers")
  endpoint.send(StartAllReceivers(receivers))
}
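runDummySparkJob() runs a throwaway job so that all executors have registered with the driver before receivers are scheduled; without it, every receiver could end up on the one executor known so far. Once StartAllReceivers arrives at the endpoint, the scheduling policy spreads the receivers across the known executors. A simplified, Spark-free round-robin sketch of that even-spread idea follows (the real logic lives in ReceiverSchedulingPolicy and additionally honors each receiver's preferred location and current executor load):

```scala
// Simplified round-robin placement of receivers onto executors.
// The real ReceiverSchedulingPolicy also honors preferredLocation
// and balances by load; this only illustrates the even spread.
def scheduleReceivers(receiverIds: Seq[Int], executors: Seq[String]): Map[Int, String] = {
  require(executors.nonEmpty, "no executors registered yet")
  receiverIds.zipWithIndex.map { case (id, i) =>
    id -> executors(i % executors.size)
  }.toMap
}
```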
Following the call chain further, each receiver is wrapped into an RDD (see Lecture 5; ReceiverTracker.scala, lines 583-589):
val receiverRDD: RDD[Receiver[_]] =
  if (scheduledLocations.isEmpty) {
    ssc.sc.makeRDD(Seq(receiver), 1)
  } else {
    val preferredLocations = scheduledLocations.map(_.toString).distinct
    ssc.sc.makeRDD(Seq(receiver -> preferredLocations))
  }
which is then submitted as a job (ReceiverTracker.scala, line 591):
ssc.sparkContext.setJobDescription(s"Streaming job running receiver $receiverId")
Receiver Registration on Startup
After a Receiver starts, it registers with the ReceiverTracker; only once the registration succeeds is the Receiver considered officially started.
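In the real source, the ReceiverSupervisor running on the executor sends a RegisterReceiver message to the ReceiverTrackerEndpoint, and the tracker accepts the registration only while it is in the Started state. A simplified sketch of that acceptance check (field and method names are trimmed for illustration; the real registerReceiver also records the receiver's host, executor ID, and RPC endpoint for later restarts and rate updates):

```scala
// Simplified sketch of the tracker-side registration check.
class RegistrationTracker {
  private var started = false
  private val registered = scala.collection.mutable.Set.empty[Int]

  def start(): Unit = synchronized { started = true }

  /** Returns true if the registration was accepted. */
  def registerReceiver(streamId: Int): Boolean = synchronized {
    if (!started) {
      false // tracker not running: reject the registration
    } else {
      registered += streamId
      true
    }
  }

  def registeredIds: Set[Int] = synchronized { registered.toSet }
}
```

A registration attempt before the tracker is started is rejected, which is why a Receiver counts as started only after its registration succeeds.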
(To be continued)