Preface
The first three articles in klose's "First Look at Hadoop 0.23.0" series covered the background of Hadoop's evolution, the configuration of HDFS Federation, and the relationships among the HDFS NN, SNN, BN, and HA. This fourth installment lightens the load a little: it covers deploying YARN and running a first HelloWorld (the MapReduce wordcount example).
Introduction to the YARN Framework

The ResourceManager (RM) is responsible for scheduling jobs and resources. It receives jobs from the JobSubmitter and, based on each job's context information together with the status reports collected from the NodeManagers, starts the scheduling process and allocates a Container to serve as the App Master.
The NodeManager (NM) maintains the state of its Containers and sends heartbeats to the RM.
The App Master is responsible for all work within one job's lifecycle. If the application is an MR App, the App Master is roughly a JobTracker that manages only a single job.
The Container is a framework YARN introduces with future resource isolation in mind. This appears to borrow from the work on Mesos (see my related articles); for now it is only a framework, and the only isolation it provides is of the Java virtual machine's memory.
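The division of labor above can be caricatured in a few lines of Python (purely illustrative pseudostructure, not YARN's actual API; the class and method names are invented for this sketch):

```python
class NodeManager:
    """Caricature of the NM: launches containers on its host and tracks them."""
    def __init__(self, host):
        self.host = host
        self.containers = []

    def launch_container(self, payload):
        # In real YARN the NM also enforces (JVM memory) limits on the container.
        self.containers.append(payload)
        return f"{payload} on {self.host}"

class ResourceManager:
    """Caricature of the RM: registers NMs and allocates the first container
    of a submitted job to host that job's App Master."""
    def __init__(self):
        self.nodes = []

    def register(self, node):
        self.nodes.append(node)

    def submit(self, job):
        node = self.nodes[0]  # trivial "scheduler": always pick the first node
        return node.launch_container(f"AppMaster({job})")

rm = ResourceManager()
rm.register(NodeManager("gb22"))
print(rm.submit("wordcount"))  # AppMaster(wordcount) on gb22
```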
Deploying YARN
1. Configure HDFS and the related parameters as described in the HDFS Federation deployment article.
2. Configure ${YARN_CONF_DIR}/yarn-site.xml
Note: I use the gb17 node as my RM and configure the relevant ports; the only property that must be configured is yarn.nodemanager.aux-services:
<configuration>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>gb17:18040</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>gb17:18030</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>gb17:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>gb17:18025</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>gb17:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>
Copy yarn-site.xml to ${YARN_CONF_DIR} on the other nodes.
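Before copying the file around, it can be worth sanity-checking that it is well-formed XML and that the mandatory property is present. A minimal sketch (generic stdlib code, not part of Hadoop; the sample content and helper name are invented here):

```python
import xml.etree.ElementTree as ET

# A minimal yarn-site.xml fragment, embedded here so the sketch is self-contained.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>"""

def read_hadoop_conf(xml_text):
    """Parse Hadoop *-site.xml content into a {name: value} dict.
    ET.fromstring raises ParseError if the XML is malformed."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

conf = read_hadoop_conf(SAMPLE)
assert "yarn.nodemanager.aux-services" in conf, \
    "yarn.nodemanager.aux-services must be configured"
print(conf["yarn.nodemanager.aux-services"])  # mapreduce.shuffle
```

For a real deployment you would read the file from ${YARN_CONF_DIR} instead of the embedded sample.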
3. $bin/start-all.sh
This starts the RM and the NMs; by default the nodes listed in ${YARN_CONF_DIR}/slaves are used as NMs.
Note: ${HADOOP_HOME_DIR}/bin contains the YARN startup scripts, while ${HADOOP_HOME_DIR}/sbin contains the HDFS startup scripts.
4. Test HelloWorld: WordCount
The current node is gb17. Add the following to ${HADOOP_CONF_DIR}/core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://gb17:9000</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.
  </description>
  <final>true</final>
</property>
Otherwise, when using bin/hadoop fs you have to specify the namespace of the HDFS filesystem you want to access. With the configuration above in place, bin/hadoop fs -ls / is equivalent to bin/hadoop fs -ls hdfs://gb17:9000/.
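The effect of fs.defaultFS can be illustrated by how a client qualifies a bare path against it. A simplified sketch of the idea (not Hadoop's actual resolution code; the function name is invented here):

```python
def qualify(path, default_fs="hdfs://gb17:9000"):
    """Qualify a path the way the hadoop fs client conceptually does:
    fully qualified URIs pass through, bare absolute paths get the
    configured default filesystem prepended."""
    if "://" in path:
        return path  # already a full URI such as hdfs://gb17:9000/
    if not path.startswith("/"):
        # Relative paths resolve against the user's HDFS home directory;
        # that case is out of scope for this sketch.
        raise ValueError("only absolute paths handled here")
    return default_fs + path

print(qualify("/"))                       # hdfs://gb17:9000/
print(qualify("hdfs://gb17:9000/user"))   # hdfs://gb17:9000/user
```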
Upload the input files:
$ bin/hadoop fs -copyFromLocal /home/jiangbing/input input
Run the MapReduce program:
$ bin/hadoop jar ./hadoop-mapreduce-examples-0.23.0.jar wordcount input output
11/12/14 19:35:47 INFO ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
11/12/14 19:35:47 INFO mapred.ResourceMgrDelegate: Connecting to ResourceManager at gb17/10.10.102.17:18040
11/12/14 19:35:47 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
11/12/14 19:35:47 INFO mapred.ResourceMgrDelegate: Connected to ResourceManager at gb17/10.10.102.17:18040
11/12/14 19:35:47 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
11/12/14 19:35:47 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
11/12/14 19:35:47 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 19:35:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/12/14 19:35:47 WARN snappy.LoadSnappy: Snappy native library not loaded
11/12/14 19:35:47 INFO mapreduce.JobSubmitter: number of splits:2
11/12/14 19:35:48 INFO mapred.YARNRunner: AppMaster capability = memory: 2048
11/12/14 19:35:48 INFO mapred.YARNRunner: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
11/12/14 19:35:48 INFO mapred.ResourceMgrDelegate: Submitted application application_1323853159748_0002 to ResourceManager
11/12/14 19:35:48 INFO mapreduce.Job: Running job: job_1323853159748_0002
11/12/14 19:35:49 INFO mapreduce.Job: map 0% reduce 0%
11/12/14 19:35:56 INFO mapred.ClientServiceDelegate: Tracking Url of JOB is gb17:18088/proxy/application_1323853159748_0002/
11/12/14 19:35:56 INFO mapred.ClientServiceDelegate: Connecting to gb22:59973
11/12/14 19:35:56 INFO ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
11/12/14 19:35:56 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
11/12/14 19:36:07 INFO mapreduce.Job: map 50% reduce 0%
11/12/14 19:36:09 INFO mapreduce.Job: map 62% reduce 0%
11/12/14 19:36:12 INFO mapreduce.Job: map 65% reduce 0%
11/12/14 19:36:14 INFO mapreduce.Job: map 65% reduce 16%
11/12/14 19:36:15 INFO mapreduce.Job: map 72% reduce 16%
11/12/14 19:36:21 INFO mapreduce.Job: map 78% reduce 16%
11/12/14 19:36:24 INFO mapreduce.Job: map 82% reduce 16%
11/12/14 19:36:30 INFO mapreduce.Job: map 100% reduce 16%
11/12/14 19:36:30 INFO mapreduce.Job: map 100% reduce 100%
11/12/14 19:36:30 INFO mapreduce.Job: Job job_1323853159748_0002 completed successfully
11/12/14 19:36:31 INFO mapreduce.Job: Counters: 44
File System Counters
FILE: BYTES_READ=460980
FILE: BYTES_WRITTEN=748861
FILE: READ_OPS=0
FILE: LARGE_READ_OPS=0
FILE: WRITE_OPS=0
HDFS: BYTES_READ=76089518
HDFS: BYTES_WRITTEN=67808
HDFS: READ_OPS=12
HDFS: LARGE_READ_OPS=0
HDFS: WRITE_OPS=4
org.apache.hadoop.mapreduce.JobCounter
NUM_FAILED_MAPS=1
TOTAL_LAUNCHED_MAPS=3
TOTAL_LAUNCHED_REDUCES=1
DATA_LOCAL_MAPS=3
SLOTS_MILLIS_MAPS=53133
SLOTS_MILLIS_REDUCES=26001
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS=1487938
MAP_OUTPUT_RECORDS=11968511
MAP_OUTPUT_BYTES=123503747
MAP_OUTPUT_MATERIALIZED_BYTES=153542
SPLIT_RAW_BYTES=204
COMBINE_INPUT_RECORDS=11989987
COMBINE_OUTPUT_RECORDS=32214
REDUCE_INPUT_GROUPS=5369
REDUCE_SHUFFLE_BYTES=153542
REDUCE_INPUT_RECORDS=10738
REDUCE_OUTPUT_RECORDS=5369
SPILLED_RECORDS=42952
SHUFFLED_MAPS=2
FAILED_SHUFFLE=0
MERGED_MAP_OUTPUTS=2
GC_TIME_MILLIS=205
CPU_MILLISECONDS=43000
PHYSICAL_MEMORY_BYTES=588017664
VIRTUAL_MEMORY_BYTES=1867513856
COMMITTED_HEAP_BYTES=575340544
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ=76089314
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN=67808
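For reference, the computation the example job performs is the classic word count. It is roughly equivalent to this single-machine sketch (illustrative only; the real job splits the input into blocks and runs the map, combine, and reduce phases across the cluster):

```python
from collections import Counter

def wordcount(lines):
    """Single-machine equivalent of the MapReduce wordcount example:
    the "map" step emits one count per whitespace-separated token,
    the "reduce" step sums the counts per word."""
    counts = Counter()
    for line in lines:            # map: tokenize each input record
        counts.update(line.split())  # reduce: accumulate per-word totals
    return dict(counts)

print(wordcount(["hello hadoop", "hello yarn"]))
# {'hello': 2, 'hadoop': 1, 'yarn': 1}
```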
This article comes from the klose blog.
It was written after reading the Hadoop-0.23.0 source code; readers with questions on this topic are welcome to contact the author, whose email address is given after the references.
References
[1]http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
[2]https://issues.apache.org/jira/browse/MAPREDUCE-2983