Hadoop MapReduce Next Generation - Writing YARN Applications

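This article explains how to submit an application to the ResourceManager with YARN: obtaining an ApplicationId, supplying the information needed to launch the application's container, registering the ApplicationMaster and communicating with the ResourceManager, through to completion of the task.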

The general concept is that an 'Application Submission Client' submits an 'Application' to the YARN ResourceManager. The client communicates with the ResourceManager using the 'ApplicationClientProtocol': it first acquires a new 'ApplicationId', if needed, via ApplicationClientProtocol#getNewApplication, and then submits the 'Application' to be run via ApplicationClientProtocol#submitApplication. As part of the ApplicationClientProtocol#submitApplication call, the client must give the ResourceManager enough information to 'launch' the application's first container, i.e. the ApplicationMaster. This includes the local files/jars that must be available for the application to run, the actual command to execute (with the necessary command-line arguments), any Unix environment settings (optional), and so on. Effectively, you describe the Unix process(es) that need to be launched for your ApplicationMaster.
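To make the "describe the Unix process" idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `AmLaunchSpec`, the jar name, the main class `com.example.MyAppMaster` and the `--num_tasks` argument are all hypothetical names chosen for illustration. The `<LOG_DIR>` placeholder follows the convention used in Hadoop's own examples for redirecting stdout and stderr into the container's log directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the launch information a client supplies; not a Hadoop class. */
final class AmLaunchSpec {
    final List<String> localFiles = new ArrayList<>();              // files/jars the process needs
    final Map<String, String> environment = new LinkedHashMap<>();  // optional environment settings
    final List<String> commandParts = new ArrayList<>();            // command plus its arguments

    AmLaunchSpec localFile(String path) { localFiles.add(path); return this; }
    AmLaunchSpec env(String key, String value) { environment.put(key, value); return this; }
    AmLaunchSpec arg(String part) { commandParts.add(part); return this; }

    /** Joins the parts into the single command line that would be executed. */
    String commandLine() { return String.join(" ", commandParts); }

    /** Builds an illustrative spec; every concrete value here is an assumption. */
    static AmLaunchSpec example() {
        return new AmLaunchSpec()
            .localFile("app-master.jar")
            .env("CLASSPATH", "./*")
            .arg("$JAVA_HOME/bin/java")
            .arg("-Xmx256m")
            .arg("com.example.MyAppMaster")
            .arg("--num_tasks").arg("4")
            .arg("1><LOG_DIR>/stdout")
            .arg("2><LOG_DIR>/stderr");
    }
}
```

In a real client the same three pieces (local resources, environment, commands) would be placed into the launch context carried by the submission request; the sketch only shows what information has to be assembled.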

The YARN ResourceManager will then launch the ApplicationMaster (as specified) on an allocated container. The ApplicationMaster is then expected to communicate with the ResourceManager using the 'ApplicationMasterProtocol'. First, the ApplicationMaster needs to register itself with the ResourceManager. To complete the task assigned to it, the ApplicationMaster can then request and receive containers via ApplicationMasterProtocol#allocate. After a container is allocated to it, the ApplicationMaster communicates with the NodeManager using ContainerManagementProtocol#startContainers (named ContainerManager#startContainer in earlier releases) to launch the container for its task. As part of launching this container, the ApplicationMaster has to specify the ContainerLaunchContext which, like the ApplicationSubmissionContext, carries the launch information such as the command-line specification, environment, etc. Once the task is completed, the ApplicationMaster has to notify the ResourceManager of its completion via ApplicationMasterProtocol#finishApplicationMaster.
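The call order described above (register, allocate, start containers, finish) can be sketched as a small state machine. This is a toy model in plain Java, not the Hadoop API: `AmLifecycleModel` and its methods are hypothetical, and it simplifies real behaviour by granting containers immediately, whereas a real ApplicationMaster typically calls allocate repeatedly until the ResourceManager hands containers back.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ApplicationMaster call order; hypothetical, not a Hadoop class. */
final class AmLifecycleModel {
    enum Phase { LAUNCHED, REGISTERED, FINISHED }

    private Phase phase = Phase.LAUNCHED;
    private int nextContainerId = 1;
    private final List<String> started = new ArrayList<>();

    /** Models registering with the ResourceManager; must be the first call. */
    void register() {
        require(phase == Phase.LAUNCHED, "register must be the first call");
        phase = Phase.REGISTERED;
    }

    /** Models an allocate call; returns the ids of the containers granted (immediately, as a simplification). */
    List<Integer> allocate(int count) {
        require(phase == Phase.REGISTERED, "allocate requires a registered ApplicationMaster");
        List<Integer> granted = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            granted.add(nextContainerId++);
        }
        return granted;
    }

    /** Models asking a NodeManager to start an allocated container with a launch command. */
    void startContainer(int containerId, String command) {
        require(phase == Phase.REGISTERED, "startContainer requires a registered ApplicationMaster");
        require(containerId > 0 && containerId < nextContainerId, "container was never allocated");
        started.add(containerId + ":" + command);
    }

    /** Models the final completion signal to the ResourceManager. */
    void finish() {
        require(phase == Phase.REGISTERED, "finish requires a registered ApplicationMaster");
        phase = Phase.FINISHED;
    }

    Phase phase() { return phase; }
    List<String> started() { return started; }

    private static void require(boolean ok, String message) {
        if (!ok) throw new IllegalStateException(message);
    }
}
```

Calling `allocate` before `register`, or starting a container id that was never granted, throws an `IllegalStateException`, which mirrors the ordering constraint in the text rather than any specific error the real services return.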

Meanwhile, the client can monitor the application's status by querying the ResourceManager or by directly querying the ApplicationMaster if it supports such a service. If needed, it can also kill the application via ApplicationClientProtocol#forceKillApplication.
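The monitoring step can be sketched as a polling loop that stops on a terminal state and otherwise falls back to a kill request. The code below is plain Java with hypothetical names (`AppMonitorSketch`, `waitOrKill`); the `State` enum is defined locally and only mirrors the names of YARN's application states. The status source and the kill action are passed in as functions, so a real client would plug in its own calls to the ResourceManager. Returning `KILLED` right after the kill action is an assumption made to keep the sketch short; a real kill request completes asynchronously.

```java
import java.util.function.Supplier;

/** Hypothetical polling helper; the status source and kill action are supplied by the caller. */
final class AppMonitorSketch {
    /** Local enum that mirrors the names of YARN's application states. */
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    static boolean isTerminal(State s) {
        return s == State.FINISHED || s == State.FAILED || s == State.KILLED;
    }

    /**
     * Polls statusSource up to maxPolls times, sleeping sleepMillis between polls.
     * Returns the first terminal state seen. If none is seen (or the thread is
     * interrupted), runs killAction and reports KILLED.
     */
    static State waitOrKill(Supplier<State> statusSource, int maxPolls,
                            long sleepMillis, Runnable killAction) {
        for (int i = 0; i < maxPolls; i++) {
            State current = statusSource.get();
            if (isTerminal(current)) {
                return current;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
        }
        killAction.run();
        return State.KILLED;
    }
}
```

Bounding the number of polls keeps the client from waiting forever on an application that never leaves the running state, which is the situation where the kill call mentioned above becomes useful.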

Ref: http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html