The (most) important paper on YARN
Problems with the original MapReduce (Hadoop 1.0), i.e., the problems YARN set out to solve:
1,Tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model
2,Centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler
Hence YARN was developed: The new architecture decouples the programming model from the resource management infrastructure and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components.
Requirements for YARN:
1,Scalability
2,Multi-tenancy
3,Serviceability
4,Locality awareness
5,High Cluster Utilization
6,Reliability/Availability
7,Secure and auditable operation
8,Support for programming model diversity
9,Flexible Resource Model
10,Backward compatibility
YARN components:
Resource Manager (RM): A daemon on a dedicated machine that acts as the central authority arbitrating resources among various competing applications.
Application Master (AM): Coordinates the logical plan of a single job by requesting resources from the RM, generating a physical plan from the resources it receives, and coordinating the execution of that plan around faults.
Node Manager (NM): A special system daemon running on each node.
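To make the division of labor concrete, here is a minimal toy sketch of the three roles in Python. It is not the real YARN API: the class names mirror the components above, but every method, field, and the first-fit placement policy are assumptions made purely for illustration. The AM turns a logical plan (a list of tasks with memory needs) into a physical plan (task to host) using whatever the RM grants.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy NM: only tracks the free memory of one node (hypothetical model)."""
    host: str
    free_mb: int


class ResourceManager:
    """Toy RM: the single central arbiter of cluster resources."""

    def __init__(self, nodes):
        self.nodes = {n.host: n for n in nodes}

    def allocate(self, app_id, mem_mb):
        # Assumed policy: first-fit over nodes in registration order.
        for node in self.nodes.values():
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                return (app_id, node.host, mem_mb)
        return None  # nothing available right now


class ApplicationMaster:
    """Toy AM: one per job, owns that job's logical plan."""

    def __init__(self, app_id, rm, tasks):
        self.app_id = app_id
        self.rm = rm
        self.tasks = tasks  # logical plan: list of (task_name, mem_mb)

    def run(self):
        # Physical plan: which host each task ends up on.
        plan = {}
        for task, mem_mb in self.tasks:
            grant = self.rm.allocate(self.app_id, mem_mb)
            if grant is None:
                break  # a real AM would wait and ask again
            plan[task] = grant[1]
        return plan


rm = ResourceManager([NodeManager("n1", 2048), NodeManager("n2", 1024)])
am = ApplicationMaster(
    "app-1", rm, [("map-0", 1024), ("map-1", 1024), ("reduce-0", 1024)]
)
physical_plan = am.run()
```

Note that the RM here knows nothing about maps or reduces; it only hands out memory. All job-specific logic lives in the AM, which is the decoupling the paper argues for.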
Key points:
The RM dynamically allocates leases, called containers, to applications to run on particular nodes. A container is a logical bundle of resources (e.g., <2GB RAM, 1 CPU>) bound to a particular node.
All containers in YARN, including AMs, are described by a container launch context (CLC).
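The two key points can be sketched as plain data structures. This is an illustrative model only: the field names below (`memory_mb`, `vcores`, `command`, `environment`) and the helper `describe_launch` are assumptions, not the actual YARN record definitions. The idea is that a container is just a resource bundle pinned to a node, while the CLC says what process to start inside it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A logical bundle of resources, e.g. <2GB RAM, 1 CPU>."""
    memory_mb: int
    vcores: int


@dataclass(frozen=True)
class Container:
    """A lease granted by the RM, bound to one particular node."""
    container_id: str
    node: str
    resource: Resource


@dataclass
class ContainerLaunchContext:
    """Hypothetical CLC: what to run inside a container and with which env."""
    command: list
    environment: dict = field(default_factory=dict)


def describe_launch(container, clc):
    """Render the launch an NM would perform, as a single readable line."""
    env = [f"{k}={v}" for k, v in sorted(clc.environment.items())]
    return f"{container.node}: " + " ".join(env + list(clc.command))


lease = Container("container_01", "node-7", Resource(memory_mb=2048, vcores=1))
clc = ContainerLaunchContext(["python", "task.py"], {"TASK_ID": "0"})
launch_line = describe_launch(lease, clc)
```

Because an AM is itself launched from a CLC, the same two structures would cover both the job's coordinator and its tasks.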
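This article looks at how YARN addresses the problems of MapReduce in Hadoop 1.0: the tight coupling of the programming model with resource management, and the scalability concerns caused by centralized handling of jobs' control flow. It describes how YARN separates the programming model from the resource management infrastructure and delegates scheduling functions to per-application components in order to meet requirements such as scalability, multi-tenancy, and serviceability. It also breaks down YARN's components, covering the roles and functions of the ResourceManager, ApplicationMaster, and NodeManager.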