Apache Hadoop 2.x consists of significant improvements over the previous stable release (hadoop-1.x).
Here is a short overview of the improvements to both HDFS and MapReduce.
-
HDFS Federation
To scale the name service horizontally, federation uses multiple independent Namenodes/Namespaces. The Namenodes are federated, that is, they are independent and don't require coordination with each other. The Datanodes are used as common storage for blocks by all the Namenodes. Each Datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports to the Namenodes and handle commands from them.
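The registration pattern described above can be sketched as a toy model. This is not the Hadoop API: the `Namenode` and `Datanode` classes, their methods, and the names `ns1`, `ns2`, `dn1` and so on are all hypothetical, chosen only to show that every Datanode talks to every Namenode while the Namenodes never talk to each other.

```python
from dataclasses import dataclass, field


@dataclass
class Namenode:
    """Toy Namenode: owns one namespace and tracks the Datanodes it knows."""
    namespace: str
    datanodes: set = field(default_factory=set)
    heartbeats: dict = field(default_factory=dict)

    def register(self, datanode_id: str) -> None:
        self.datanodes.add(datanode_id)

    def heartbeat(self, datanode_id: str) -> None:
        # Count heartbeats per Datanode; no other Namenode is consulted.
        self.heartbeats[datanode_id] = self.heartbeats.get(datanode_id, 0) + 1


@dataclass
class Datanode:
    """Toy Datanode: common block storage shared by every Namenode."""
    node_id: str

    def register_with_all(self, namenodes) -> None:
        for nn in namenodes:
            nn.register(self.node_id)

    def send_heartbeats(self, namenodes) -> None:
        for nn in namenodes:
            nn.heartbeat(self.node_id)


# Hypothetical cluster: two independent namespaces, three Datanodes.
namenodes = [Namenode("ns1"), Namenode("ns2")]
datanodes = [Datanode("dn1"), Datanode("dn2"), Datanode("dn3")]

for dn in datanodes:
    dn.register_with_all(namenodes)
    dn.send_heartbeats(namenodes)

for nn in namenodes:
    print(nn.namespace, sorted(nn.datanodes))
```

Each toy Namenode ends up with the full set of Datanodes, which mirrors the idea that storage is shared while namespaces stay separate.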
-
MapReduce NextGen aka YARN aka MRv2
The new architecture, introduced in hadoop-0.23, divides the two major functions of the JobTracker, resource management and job life-cycle management, into separate components.
The new ResourceManager manages the global assignment of compute resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination.
An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs.
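To make the "DAG of jobs" idea concrete, here is a minimal sketch that uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which dependent jobs could run. The job names and the `run_order` helper are hypothetical, and this says nothing about how YARN itself schedules work; it only illustrates the data structure.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it depends on.
job_dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"clean", "aggregate"},
}


def run_order(dependencies):
    """Return one valid execution order for a DAG of jobs."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(job_dependencies)
print(order)

# A classic single MapReduce job is just a DAG with one node.
single = run_order({"wordcount": set()})
print(single)
```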
The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric.
The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
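The division of labor between the three roles can be illustrated with a toy model. Nothing here is the real YARN API: the class names reuse the component names from the text, but the methods (`allocate`, `launch`, `run`), the `free_containers` counter, and the first-fit allocation policy are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-machine daemon that runs tasks on its own host."""
    host: str
    free_containers: int
    running: list = field(default_factory=list)

    def launch(self, task: str) -> None:
        self.running.append(task)


class ResourceManager:
    """Toy global arbiter of compute resources across all applications."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, requested: int):
        """Grant up to `requested` containers (first fit, an assumed policy)."""
        granted = []
        for nm in self.node_managers:
            while nm.free_containers > 0 and len(granted) < requested:
                nm.free_containers -= 1
                granted.append(nm)
        return granted


class ApplicationMaster:
    """Toy per-application coordinator: negotiates resources, then runs tasks."""

    def __init__(self, app_id: str, tasks):
        self.app_id = app_id
        self.tasks = list(tasks)

    def run(self, rm: ResourceManager) -> int:
        containers = rm.allocate(len(self.tasks))
        # Work with the NodeManagers that host the granted containers.
        for task, nm in zip(self.tasks, containers):
            nm.launch(f"{self.app_id}/{task}")
        return len(containers)


# Hypothetical cluster with three containers in total.
nms = [NodeManager("host1", 2), NodeManager("host2", 1)]
rm = ResourceManager(nms)

launched1 = ApplicationMaster("app1", ["map0", "map1"]).run(rm)
launched2 = ApplicationMaster("app2", ["map0", "reduce0"]).run(rm)

for nm in nms:
    print(nm.host, nm.running)
```

In this sketch the second application asks for two containers but receives only one, because the ResourceManager alone decides how the cluster's capacity is shared; the ApplicationMaster only manages its own tasks.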