I. The SparkStreaming Instantiation Process
There are many ways to create a StreamingContext. In this example it is created from a SparkConf and a batch interval:
val ssc = new StreamingContext(conf, Seconds(3))
1. Check that the SparkContext and the checkpoint are not both null; otherwise an exception is thrown.
2. Check whether checkpointing is enabled.
3. Update the SparkContext: if no SparkContext exists, write the information the checkpoint needs into the SparkConf and build a SparkContext from it; if a SparkContext already exists, use it.
The properties the checkpoint needs to reload:
val propertiesToReload = List(
"spark.yarn.app.id",
"spark.yarn.app.attemptId",
"spark.driver.host",
"spark.driver.port",
"spark.master",
"spark.yarn.keytab",
"spark.yarn.principal",
"spark.ui.filters")
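The intent of propertiesToReload is that these keys describe the *current* run (driver host/port, YARN attempt id, and so on), so when restoring from a checkpoint they must be taken from the live environment rather than the checkpointed conf. A minimal sketch of that merge, using plain Maps as stand-ins for SparkConf (names and types here are illustrative, not Spark's actual API):

```scala
// Sketch: rebuild a conf from a checkpoint, but refresh the
// environment-specific keys from the current run's settings.
object ReloadSketch {
  val propertiesToReload = List(
    "spark.yarn.app.id", "spark.yarn.app.attemptId",
    "spark.driver.host", "spark.driver.port",
    "spark.master", "spark.yarn.keytab",
    "spark.yarn.principal", "spark.ui.filters")

  def createConf(checkpointed: Map[String, String],
                 current: Map[String, String]): Map[String, String] =
    propertiesToReload.foldLeft(checkpointed) { (conf, key) =>
      current.get(key) match {
        case Some(v) => conf.updated(key, v) // overwrite with the fresh value
        case None    => conf                 // keep whatever the checkpoint had
      }
    }
}
```

So the restored conf keeps application-level settings from the checkpoint but picks up, say, the new driver host assigned on restart.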
4. Mode check: spark.master must not be single-threaded local mode (local or local[1]), since receiver-based streaming needs at least one thread for the receiver and one for processing.
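The check itself can be sketched as a simple predicate on the master string (a minimal sketch of the assumed logic, not Spark's exact code):

```scala
// Warn when the master offers only one local thread: the receiver would
// occupy it and no batch processing could run.
object MasterCheck {
  def warnIfSingleThreaded(master: String): Option[String] =
    if (master == "local" || master == "local[1]")
      Some("spark.master should be set as local[n], n > 1 in local mode")
    else None
}
```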
5. Update the SparkConf and SparkEnv through the SparkContext.
6. Instantiate the DStreamGraph:
private[streaming] val graph: DStreamGraph = {
if (isCheckpointPresent) {
cp_.graph.setContext(this)
cp_.graph.restoreCheckpointData()
cp_.graph
} else {
require(batchDur_ != null, "Batch duration for StreamingContext cannot be null")
val newGraph = new DStreamGraph()
newGraph.setBatchDuration(batchDur_)
newGraph
}
}
7. Set the checkpoint directory:
private[streaming] var checkpointDir: String = {
if (isCheckpointPresent) {
sc.setCheckpointDir(cp_.checkpointDir)
cp_.checkpointDir
} else {
null
}
}
8. Create the streaming job scheduler, JobScheduler, which uses the DStreamGraph to generate jobs:
val scheduler = new JobScheduler(this)
9. Create a ContextWaiter:
val waiter = new ContextWaiter
At each batch time, job generation goes roughly through these steps:
jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
graph.generateJobs(time) // generate jobs using allocated blocks
val jobOption = outputStream.generateJob(time)
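The overall shape of graph.generateJobs is: ask every output DStream for an optional job at the batch time and keep the ones that are defined. A simplified sketch with stand-in types (Job and OutputStream here are illustrative, not Spark's real classes):

```scala
// Sketch of the per-batch flow: each output stream may or may not
// produce a job for a given batch time; collect only the defined ones.
object GenerateJobsSketch {
  case class Job(time: Long, run: () => Unit)
  type OutputStream = Long => Option[Job]

  def generateJobs(time: Long, outputStreams: Seq[OutputStream]): Seq[Job] =
    outputStreams.flatMap(os => os(time)) // None means "no job this batch"
}
```

This is why a StreamingContext with no output operations produces no jobs: with no output streams registered in the graph, generateJobs returns an empty sequence every batch.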