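This article follows the source-code call path from application submission to the launch of the ApplicationMaster. The main components involved are ClientRMService, RMAppManager, RMAppImpl, RMAppAttemptImpl, RMNode, and ResourceScheduler.
After the client calls the RPC function ApplicationClientProtocol#submitApplication, the ResourceManager handles the request as follows:
Step 1:
ClientRMService in the ResourceManager implements the ApplicationClientProtocol protocol. It handles requests from clients and calls RMAppManager#submitApplication to notify the other relevant services for further processing.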
//ClientRMService.java
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException {
  ...
  try {
    // call RMAppManager to submit application directly
    // submission of the application starts here
    rmAppManager.submitApplication(submissionContext,
        System.currentTimeMillis(), user);
    LOG.info("Application with id " + applicationId.getId() +
        " submitted by user " + user);
    RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
        "ClientRMService", applicationId);
  } catch (YarnException e) {
    ...
  }
  return response;
}
Step 2:
RMAppManager creates an RMAppImpl object for the application to track its running state, and sends an RMAppEventType.START event.
//RMAppManager.java
protected void submitApplication(
    ApplicationSubmissionContext submissionContext, long submitTime,
    String user) throws YarnException {
  LOG.info("begin to submitApplication");
  // get the application ID
  ApplicationId applicationId = submissionContext.getApplicationId();
  // create the RMAppImpl and register its application ACLs
  RMAppImpl application =
      createAndPopulateNewRMApp(submissionContext, submitTime, user);
  ApplicationId appId = submissionContext.getApplicationId();
  if (UserGroupInformation.isSecurityEnabled()) {
    ...
  } else {
    // Dispatcher is not yet started at this time, so these START events
    // enqueued should be guaranteed to be first processed when dispatcher
    // gets started.
    // fire the application START event
    LOG.info("send event RMAppEventType.START");
    LOG.info("this.rmContext="+this.rmContext.toString());
    // this.rmContext is an RMContextImpl
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.START));
  }
}
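Step 3:
On receiving the RMAppEventType.START event, RMAppImpl calls RMStateStore#storeApplication to record its current information in a log. At this point the running state of RMAppImpl moves from NEW to NEW_SAVING. This step is fairly involved, so it is described in detail below.
RMAppEventType is registered with the central asynchronous dispatcher in ResourceManager.java: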
//ResourceManager.java
protected void serviceInit(Configuration configuration) throws Exception {
  conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
  ...
  rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);
  // Register event handler for RmAppEvents
  rmDispatcher.register(RMAppEventType.class,
      new ApplicationEventDispatcher(rmContext));
  // Register event handler for RmAppAttemptEvents
  rmDispatcher.register(RMAppAttemptEventType.class,
      new ApplicationAttemptEventDispatcher(rmContext));
  // Register event handler for RmNodes
  rmDispatcher.register(
      RMNodeEventType.class, new NodeEventDispatcher(rmContext));
  ...
}
For the call in RMAppManager#submitApplication above:
this.rmContext = RMContextImpl,
this.rmContext.getDispatcher() = AsyncDispatcher,
this.rmContext.getDispatcher().getEventHandler() = AsyncDispatcher$GenericEventHandler
So the call enters the handle function of GenericEventHandler, an inner class of AsyncDispatcher:
//AsyncDispatcher.java
class GenericEventHandler implements EventHandler<Event> {
  public void handle(Event event) {
    LOG.info("begin to call GenericEventHandler::handle, event= "+event.toString());
    if (blockNewEvents) {
      return;
    }
    drained = false;
    /* all this method does is enqueue all the events onto the queue */
    int qSize = eventQueue.size();
    if (qSize !=0 && qSize %1000 == 0) {
      LOG.info("Size of event-queue is " + qSize);
    }
    int remCapacity = eventQueue.remainingCapacity();
    if (remCapacity < 1000) {
      LOG.warn("Very low remaining capacity in the event-queue: "
          + remCapacity);
    }
    try {
      LOG.info("begin to put event in queue.");
      eventQueue.put(event);
    } catch (InterruptedException e) {
      if (!stopped) {
        LOG.warn("AsyncDispatcher thread interrupted", e);
      }
      throw new YarnRuntimeException(e);
    }
  };
}
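In the end, all the handle function does is put the event on the queue eventQueue: eventQueue.put(event);
Note that this asynchronous dispatcher, AsyncDispatcher, is shared by all of the event types registered on it.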
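The enqueue-then-dispatch pattern described above can be sketched without any Hadoop dependency. The following is only an illustration under stated assumptions, not Hadoop's AsyncDispatcher: MiniDispatcher, AppEvent, AppEventType and demo are hypothetical names, handlers are plain Consumer objects keyed by the enum class of the event type, and the queue-size warnings, draining and shutdown logic are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a shared asynchronous dispatcher.
class MiniDispatcher {
    /** Every event exposes an enum type; the enum's class selects the handler. */
    interface Event { Enum<?> getType(); }

    enum AppEventType { START, KILL }

    static final class AppEvent implements Event {
        final String appId;
        final AppEventType type;
        AppEvent(String appId, AppEventType type) { this.appId = appId; this.type = type; }
        @Override public Enum<?> getType() { return type; }
    }

    private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>();
    private final Map<Class<?>, Consumer<Event>> handlers = new ConcurrentHashMap<>();

    /** One handler per event-type enum class. */
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Event> handler) {
        handlers.put(eventTypeClass, handler);
    }

    /** Producer side: the only work done here is enqueueing. */
    void handle(Event event) {
        try {
            eventQueue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Consumer side: a single thread takes events and routes them by type class. */
    void start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Event event = eventQueue.take();
                    Consumer<Event> handler =
                        handlers.get(event.getType().getDeclaringClass());
                    if (handler != null) {
                        handler.accept(event);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop dispatching
            }
        }, "mini-dispatcher");
        worker.setDaemon(true);
        worker.start();
    }

    /** Enqueues START before the worker runs, then KILL; returns what the handler saw. */
    static List<String> demo() {
        MiniDispatcher dispatcher = new MiniDispatcher();
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        dispatcher.register(AppEventType.class, e -> {
            AppEvent appEvent = (AppEvent) e;
            seen.add(appEvent.appId + ":" + appEvent.type);
            done.countDown();
        });
        dispatcher.handle(new AppEvent("app_1", AppEventType.START));
        dispatcher.start();
        dispatcher.handle(new AppEvent("app_1", AppEventType.KILL));
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for dispatch");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because demo() enqueues START before the worker thread starts, the event simply waits in the queue and is processed first once dispatching begins. This is the property that the comment in RMAppManager#submitApplication relies on.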
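Once the RMAppEventType.START event is on eventQueue, the dispatcher's event-handling thread takes it off the queue and passes it to the handler registered for RMAppEventType, ApplicationEventDispatcher, which hands it to the matching RMAppImpl. The event thus enters the handle function of RMAppImpl: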
//RMAppImpl.java
public void handle(RMAppEvent event) {
  this.writeLock.lock();
  try {
    ApplicationId appID = event.getApplicationId();
    LOG.info("Processing event for " + appID + " of type "
        + event.getType());
    final RMAppState oldState = getState();
    try {
      /* keep the master in sync with the state machine */
      this.stateMachine.doTransition(event.getType(), event);
    } catch (InvalidStateTransitonException e) {
      LOG.error("Can't handle this event at current state", e);
      /* fail the application on the failed transition */
    }
    if (oldState != getState()) {
      LOG.info(appID + " State change from " + oldState + " to "
          + getState());
    }
  } finally {
    this.writeLock.unlock();
  }
}
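The two ideas in this hand-off, looking up the application by ID and changing its state only while holding a write lock, can be shown in a small self-contained sketch. Everything here is hypothetical (AppRouting, App, AppEventRouter, the two-state enum); the real RMAppImpl delegates the state change to a state machine rather than to an if statement, and it logs invalid events instead of silently ignoring them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: route an event to the right application object,
// which then updates its own state under a write lock.
class AppRouting {
    enum AppState { NEW, NEW_SAVING }
    enum AppEventType { START }

    /** Stand-in for an application object that owns its state. */
    static final class App {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private AppState state = AppState.NEW;

        void handle(AppEventType type) {
            lock.writeLock().lock();
            try {
                // Assumption for brevity: the only valid move is NEW -> NEW_SAVING on START.
                if (state == AppState.NEW && type == AppEventType.START) {
                    state = AppState.NEW_SAVING;
                }
                // Any other combination leaves the state unchanged.
            } finally {
                lock.writeLock().unlock();
            }
        }

        AppState getState() {
            lock.readLock().lock();
            try {
                return state;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    /** Stand-in for the per-type handler that finds the target application. */
    static final class AppEventRouter {
        private final Map<String, App> apps = new ConcurrentHashMap<>();

        void submit(String appId) {
            apps.putIfAbsent(appId, new App());
        }

        /** Returns false when no application with this ID is known. */
        boolean route(String appId, AppEventType type) {
            App app = apps.get(appId);
            if (app == null) {
                return false;
            }
            app.handle(type);
            return true;
        }

        AppState stateOf(String appId) {
            return apps.get(appId).getState();
        }
    }

    public static void main(String[] args) {
        AppEventRouter router = new AppEventRouter();
        router.submit("application_1");
        router.route("application_1", AppEventType.START);
        System.out.println(router.stateOf("application_1"));
    }
}
```

Taking the write lock around the whole transition means that a reader calling getState() never observes a half-applied change.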
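The key statement in RMAppImpl#handle is
this.stateMachine.doTransition(event.getType(), event);
This stateMachine is created from a static state machine factory, stateMachineFactory, which has many event transitions bound to it: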
//RMAppImpl.java
private static final StateMachineFactory<RMAppImpl,
                                         RMAppState,
                                         RMAppEventType,
                                         RMAppEvent> stateMachineFactory
    = new StateMachineFactory<RMAppImpl,
                              RMAppState,
                              RMAppEventType,
                              RMAppEvent>(RMAppState.NEW)
    // Transitions from NEW state
    .addTransition(RMAppState.NEW, RMAppState.NEW,
        RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
    .addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
        RMAppEventType.START, new RMAppNewlySavingTransition())
    .addTransition(...)
    ...
    .addTransition(...)
The second of these is
addTransition(RMAppState.NEW, RMAppState.NEW_SAVING, RMAppEventType.START, new RMAppNewlySavingTransition())
It means: on an event of type RMAppEventType.START, move the state from RMAppState.NEW to RMAppState.NEW_SAVING, and invoke the callback class RMAppNewlySavingTransition.
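The meaning of one addTransition entry (current state plus event type selects a callback and a target state) can be sketched as a small lookup table. This is a simplified illustration, not Hadoop's StateMachineFactory: MiniStateMachine, App, Transition and appLifecycle are hypothetical names, the hooks are plain BiConsumer lambdas standing in for the transition classes, and unlike the factory in the source this sketch mutates a single table in place.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of a table-driven state machine.
class MiniStateMachine {
    enum State { NEW, NEW_SAVING }
    enum EventType { NODE_UPDATE, START }

    /** The operand whose state is tracked; log records which hooks ran. */
    static final class App {
        State state = State.NEW;
        final StringBuilder log = new StringBuilder();
    }

    /** One table cell: the target state plus the callback to run. */
    static final class Transition {
        final State postState;
        final BiConsumer<App, EventType> hook;
        Transition(State postState, BiConsumer<App, EventType> hook) {
            this.postState = postState;
            this.hook = hook;
        }
    }

    private final Map<State, Map<EventType, Transition>> table =
        new EnumMap<State, Map<EventType, Transition>>(State.class);

    MiniStateMachine addTransition(State preState, State postState,
                                   EventType eventType,
                                   BiConsumer<App, EventType> hook) {
        table.computeIfAbsent(preState,
                s -> new EnumMap<EventType, Transition>(EventType.class))
             .put(eventType, new Transition(postState, hook));
        return this;
    }

    /** Runs the hook registered for (current state, event type), then moves the state. */
    void doTransition(App app, EventType eventType) {
        Map<EventType, Transition> row = table.get(app.state);
        Transition transition = (row == null) ? null : row.get(eventType);
        if (transition == null) {
            throw new IllegalStateException(
                "Invalid event " + eventType + " at " + app.state);
        }
        transition.hook.accept(app, eventType);
        app.state = transition.postState;
    }

    /** Mirrors the two transitions out of NEW quoted in the article. */
    static MiniStateMachine appLifecycle() {
        return new MiniStateMachine()
            .addTransition(State.NEW, State.NEW, EventType.NODE_UPDATE,
                (app, e) -> app.log.append("node-update;"))
            .addTransition(State.NEW, State.NEW_SAVING, EventType.START,
                (app, e) -> app.log.append("save-app;"));
    }

    public static void main(String[] args) {
        MiniStateMachine machine = appLifecycle();
        App app = new App();
        machine.doTransition(app, EventType.START);
        System.out.println(app.state + " " + app.log);
    }
}
```

An event with no entry for the current state raises an exception here, which plays the role that InvalidStateTransitonException plays in RMAppImpl#handle.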
Inside the addTransition function, the second argument, postState, is passed to a newly constructed instance of the inner class SingleInternalArc:
//StateMachineFactory.java
public StateMachineFactory
           <OPERAND, STATE, EVENTTYPE, EVENT>
        addTransition(STATE preState, STATE postState,
                      EVENTTYPE eventType,
                      SingleArcTransition<OPERAND, EVENT> hook){
  return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>
      (this, new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT>
          (preState, eventType, new SingleInternalArc(postState, hook)));
}
The newly constructed SingleInternalArc
stores the post-transition state postState, which here is RMAppState.NEW_SAVING.
It also stores the callback hook, which here is an RMAppNewlySavingTransition.
//StateMachineFactory.java
SingleInternalArc(STATE postState,
    SingleArcTransition<OPERAND, EVENT> hook) {
  this.postState = postState;
  this.hook = hook;
}
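Two details of the code above are easy to miss: addTransition does not modify the factory but returns a new one that points back at the old one, and the arc is just a pair of target state and callback. The sketch below illustrates both under simplifying assumptions. ChainedFactory, SingleArc, size and lookup are hypothetical names, the hook is a BiConsumer rather than a SingleArcTransition, and lookup walks the chain directly, whereas the real factory builds a lookup table from the chain.

```java
import java.util.function.BiConsumer;

// Hypothetical sketch: an immutable factory where each addTransition
// returns a new link in a chain, and each link carries one arc.
class ChainedFactory<OPERAND, STATE, EVENTTYPE> {

    /** The arc remembers where the transition ends and what to run on the way. */
    static final class SingleArc<O, S, E> {
        final S postState;
        final BiConsumer<O, E> hook;

        SingleArc(S postState, BiConsumer<O, E> hook) {
            this.postState = postState;
            this.hook = hook;
        }

        /** Runs the hook, if any, and reports the state to move to. */
        S doTransition(O operand, E eventType) {
            if (hook != null) {
                hook.accept(operand, eventType);
            }
            return postState;
        }
    }

    private final ChainedFactory<OPERAND, STATE, EVENTTYPE> previous; // null for the root
    private final STATE preState;
    private final EVENTTYPE eventType;
    private final SingleArc<OPERAND, STATE, EVENTTYPE> arc;           // null for the root

    /** Creates the empty root factory. */
    ChainedFactory() {
        this(null, null, null, null);
    }

    private ChainedFactory(ChainedFactory<OPERAND, STATE, EVENTTYPE> previous,
                           STATE preState, EVENTTYPE eventType,
                           SingleArc<OPERAND, STATE, EVENTTYPE> arc) {
        this.previous = previous;
        this.preState = preState;
        this.eventType = eventType;
        this.arc = arc;
    }

    /** Leaves this factory untouched and returns a new one with the extra transition. */
    ChainedFactory<OPERAND, STATE, EVENTTYPE> addTransition(
            STATE preState, STATE postState, EVENTTYPE eventType,
            BiConsumer<OPERAND, EVENTTYPE> hook) {
        return new ChainedFactory<OPERAND, STATE, EVENTTYPE>(
            this, preState, eventType,
            new SingleArc<OPERAND, STATE, EVENTTYPE>(postState, hook));
    }

    /** Number of transitions reachable from this factory. */
    int size() {
        int count = 0;
        for (ChainedFactory<OPERAND, STATE, EVENTTYPE> f = this; f.arc != null; f = f.previous) {
            count++;
        }
        return count;
    }

    /** Finds the arc for (state, eventType), newest first; null if none is registered. */
    SingleArc<OPERAND, STATE, EVENTTYPE> lookup(STATE state, EVENTTYPE type) {
        for (ChainedFactory<OPERAND, STATE, EVENTTYPE> f = this; f.arc != null; f = f.previous) {
            if (f.preState.equals(state) && f.eventType.equals(type)) {
                return f.arc;
            }
        }
        return null;
    }
}
```

With this shape, a chain of addTransition calls can be assigned to a static final field, as RMAppImpl does, without any intermediate factory ever being mutated.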
Back in the handle function of the RMAppImpl class, the call this.stateMachine.doTransition(e