I. Introduction
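In the Worker actor, each LaunchExecutor message creates one CoarseGrainedExecutorBackend process. Executor and CoarseGrainedExecutorBackend have a one-to-one relationship, so the cluster runs exactly as many CoarseGrainedExecutorBackend processes as it has Executor instances.
So how are Executors actually allocated, and how can the number of Executors be controlled and tuned?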
II. Driver and Executor Resource Scheduling
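This section describes Spark's Executor allocation strategy.
We only look at what happens after an Application has been submitted and registered with the Master: the Master replies with RegisteredApplication and then calls schedule(), which allocates resources for Drivers and for launching Executors.
schedule() schedules the resources that are currently available. It manages resource allocation for the Apps that are still waiting in the queue, and it is called every time the cluster's resources change, so that Apps are always allocated against the latest state of the cluster.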
Driver resource scheduling:
// First schedule drivers, they take strict precedence over applications
val shuffledWorkers = Random.shuffle(workers) // randomly shuffle the order of the workers HashSet
for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) { // iterate over the workers that are alive
  for (driver <- waitingDrivers) { // try to allocate resources for each Driver in the waiting queue
    if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { // the worker's free memory and cores are both at least what the driver requests, so launch it
      launchDriver(worker, driver) // launch the Driver: internally this sends a launch-Driver command to the chosen Worker, which starts the Driver
      waitingDrivers -= driver // remove the launched Driver from the queue
    }
  }
}
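The driver placement loop above can be tried in isolation with a small, self-contained sketch. The types `Worker` and `Driver` and the function `scheduleDrivers` below are hypothetical stand-ins, not Spark's WorkerInfo, DriverInfo, or launchDriver. The sketch also makes two assumptions that the excerpt does not show: launching a driver deducts its memory and cores from the worker, and the inner loop walks a snapshot of the queue so that removing a driver while iterating is safe.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object DriverSchedulingSketch {
  // Hypothetical, simplified stand-ins for Spark's WorkerInfo and DriverInfo.
  final case class Worker(id: String, var memoryFree: Int, var coresFree: Int, alive: Boolean = true)
  final case class Driver(id: String, mem: Int, cores: Int)

  /** Places waiting drivers on workers; returns (driverId, workerId) for each launch. */
  def scheduleDrivers(
      workers: Seq[Worker],
      waitingDrivers: ArrayBuffer[Driver],
      rng: Random = new Random()): Seq[(String, String)] = {
    val launched = ArrayBuffer.empty[(String, String)]
    // Shuffle so that drivers do not always land on the same workers.
    for (worker <- rng.shuffle(workers) if worker.alive) {
      // Iterate over a snapshot; the queue itself is modified inside the loop.
      for (driver <- waitingDrivers.toList) {
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          // Assumption: launching reserves the driver's resources on the worker.
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          waitingDrivers -= driver
          launched += ((driver.id, worker.id))
        }
      }
    }
    launched.toSeq
  }
}
```

With a single live worker that has 1024 MB and 4 cores, and three waiting drivers that each ask for 512 MB, the first two are placed and the third stays in the queue until resources free up and scheduling runs again.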
Executor resource scheduling:
val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", true)
Before going further, we first need to introduce one concept:
/**
* Can an app use the given worker? True if the worker has enough memory and we haven't already
* launched an executor for the app on it (right now the standalone backend doesn't like having
* two executors on the same worker).
*/
def canUse(app: ApplicationInfo, worker: WorkerInfo): Boolean = {
  worker.memoryFree >= app.desc.memoryPerSlave && !worker.hasExecutor(app)
}
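To see what the canUse predicate does in practice, here is a minimal sketch that filters a list of workers for one application. The `Worker` and `App` case classes and the `usableWorkers` helper are hypothetical simplifications introduced only for illustration; in particular, representing "this worker already runs an executor for the app" as a set of app IDs is an assumption, not how WorkerInfo stores it.

```scala
object CanUseSketch {
  // Hypothetical, simplified stand-ins for WorkerInfo and ApplicationInfo.
  final case class Worker(id: String, memoryFree: Int, appsWithExecutor: Set[String])
  final case class App(id: String, memoryPerSlave: Int)

  // Same two conditions as canUse: enough free memory, and no executor
  // for this app on the worker yet.
  def canUse(app: App, worker: Worker): Boolean =
    worker.memoryFree >= app.memoryPerSlave && !worker.appsWithExecutor.contains(app.id)

  // Workers on which a new executor for the app could be started.
  def usableWorkers(app: App, workers: Seq[Worker]): Seq[Worker] =
    workers.filter(canUse(app, _))
}
```

A worker is rejected either because its free memory is below the per-executor memory the app requests, or because it already hosts an executor for that app, so at most one executor per app is started on each worker.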