Spark Task Resource Allocation in Standalone Mode
1. Background
When a job is submitted to a Spark cluster with the spark-submit command, resource-related parameters let us control the resources the job requests, such as the total number of cores and the number of cores and amount of memory per executor. However, the resources the job actually occupies in the cluster cannot be configured directly; they depend on how these parameters interact with the actual cluster environment. What follows is an analysis of how resources are actually allocated to a Spark job on the cluster in Standalone mode.
Note that the analysis only covers CPU cores and executors, not memory: the total memory actually used equals the number of executors multiplied by the memory per executor, and since the latter is specified explicitly, it is enough to analyze the number of executors.
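For reference, a Standalone-mode submission that sets these parameters might look like the following sketch; the master URL, class name and jar path are placeholders, and the three resource flags correspond to the configuration properties spark.cores.max, spark.executor.cores and spark.executor.memory:
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  --total-executor-cores 12 \
  --executor-cores 4 \
  --executor-memory 2g \
  my-app.jar
Assuming the cluster has enough free resources, this request asks for 12 cores in total, i.e. at most 12 / 4 = 3 executors, and the total memory used would be 3 * 2g = 6g; how many executors are actually launched, and on which workers, is exactly what the code below decides.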
2. Source Code Analysis
The analysis below uses Spark 2.0.2 as the example version.
After the job is submitted, the driver registers with the master. On receiving the registration message, the master iterates over the list of workers it maintains and, on workers that meet certain conditions, reserves resources for executors. The relevant code is as follows.
In the Master class, after the master receives the driver's message registering the application, it registers the application with the Master and then starts allocating resources for it:
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
Entering schedule(), drivers are placed first, and then startExecutorsOnWorkers() begins allocating executors on the workers:
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers()
}
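To make the driver-placement loop above concrete, here is a minimal, self-contained Scala sketch of the same round-robin logic; the FakeWorker/FakeDriver case classes, worker names and resource numbers are made up for illustration and only mimic the fields that schedule() reads:
import scala.util.Random

// Simplified stand-ins for WorkerInfo and DriverDescription (illustration only)
case class FakeWorker(id: String, memoryFree: Int, coresFree: Int)
case class FakeDriver(id: String, mem: Int, cores: Int)

object DriverPlacementSketch {
  def main(args: Array[String]): Unit = {
    // Shuffle the alive workers, just as schedule() does
    val aliveWorkers = Random.shuffle(Seq(
      FakeWorker("worker-1", memoryFree = 4096, coresFree = 2),
      FakeWorker("worker-2", memoryFree = 8192, coresFree = 8),
      FakeWorker("worker-3", memoryFree = 2048, coresFree = 4)))
    val waitingDrivers = Seq(FakeDriver("driver-0", mem = 1024, cores = 1))

    val numWorkersAlive = aliveWorkers.size
    var curPos = 0
    for (driver <- waitingDrivers) {
      var launched = false
      var numWorkersVisited = 0
      // Visit each alive worker at most once, starting from the current position
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = aliveWorkers(curPos)
        numWorkersVisited += 1
        // A worker qualifies if it has enough free memory and free cores for the driver
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          println(s"${driver.id} would be launched on ${worker.id}")
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
  }
}
After driver placement, startExecutorsOnWorkers() eventually calls launchExecutor for each executor it has decided to allocate, which is the next piece of code: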
private def launchExecutor(worke