Hadoop can run in three modes:
1. Local (standalone) mode, the default
2. Pseudo-distributed mode, which runs a "cluster" on a single node
3. Fully distributed mode
Standalone mode was already covered in yesterday's test and is not recorded again.
Notes on configuring the fully distributed mode:
1. Prepare two virtual machines that can reach each other over the network; /etc/hosts on both is as follows:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.203.111 hdpNameNode
192.168.203.112 hdpDataNode
Note that 127.0.0.1 should preferably not be mapped to the machine's hostname, or that line should at least be moved to the bottom; otherwise, when a configuration file below refers to the hostname, it may resolve to the wrong address. A datanode that cannot connect to the namenode is the typical symptom of this mistake; see http://wiki.apache.org/hadoop/ConnectionRefused
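This pitfall can be checked mechanically before starting any daemons. A minimal sketch (check_hosts is my own hypothetical helper, not a Hadoop tool):

```shell
#!/bin/sh
# Warn if a hostname is mapped to a loopback address in a hosts file --
# the common cause of the datanode "Connection refused" mentioned above.
check_hosts() {
    hostsfile=$1
    name=$2
    if grep -E '^(127\.|::1)' "$hostsfile" | grep -qw "$name"; then
        echo "WARN: $name maps to a loopback address in $hostsfile"
        return 1
    fi
    echo "OK: $name does not map to a loopback address"
}

# Example: check both cluster hostnames against /etc/hosts
# check_hosts /etc/hosts hdpNameNode
# check_hosts /etc/hosts hdpDataNode
```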
2. Configure passwordless SSH access, i.e. hdpNameNode can log in to the other node without a password; the detailed steps are not recorded here.
[hdpuser@hdpNameNode hadoop]$ ssh hdpDataNode
Last login: Sat Dec 7 22:17:09 2013 from hdpnamenode
[hdpuser@hdpDataNode ~]$
3. Go to /home/hdpuser/hadoop-2.2.0/etc/hadoop and edit the files below.
4. [hdpuser@hdpDataNode hadoop]$ cat masters
hdpNameNode
5. [hdpuser@hdpDataNode hadoop]$ cat slaves
hdpDataNode
6. Configure core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdpNameNode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>
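As a quick sanity check that the file says what you think, the fs.defaultFS value can be pulled out with sed. This is a rough sketch that assumes `<value>` sits on the line directly after `<name>`, as in the file above; on a live installation, `hdfs getconf -confKey fs.defaultFS` is the supported way:

```shell
#!/bin/sh
# Extract the value of fs.defaultFS from a core-site.xml.
# Fragile by design: relies on <value> following <name> on the next line,
# which holds for the file shown above.
get_default_fs() {
    sed -n '/<name>fs.defaultFS<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$1"
}

# Example:
# get_default_fs /home/hdpuser/hadoop-2.2.0/etc/hadoop/core-site.xml
```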
7. Configure mapred-site.xml. In Hadoop 2.x the old mapred.job.tracker setting is obsolete; the execution framework is selected with mapreduce.framework.name, and the value must be lowercase "yarn":
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
8. Configure hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hdpuser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hdpuser/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
9. Configure yarn-site.xml. Property names must start with lowercase "yarn.", not "Yarn.", or they are silently ignored; and as of Hadoop 2.2.0 the shuffle service is named mapreduce_shuffle, not the older mapreduce.shuffle:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>hdpNameNode:18040</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hdpNameNode:18030</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hdpNameNode:18088</value>
  </property>
  <property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdpNameNode:8025</value>
  </property>
</configuration>
10. scp these configuration files to the other machine, keeping the directory layout identical to the master node's.
11. Format HDFS so it is ready to store data (this form still works in 2.2.0 but is deprecated; hdfs namenode -format is the current command):
hadoop namenode -format
12、启动集群
[hdpuser@hdpNameNode dfs]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hdpNameNode]
hdpNameNode: starting namenode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-namenode-hdpNameNode.out
hdpDataNode: starting datanode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-datanode-hdpDataNode.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-secondarynamenode-hdpNameNode.out
starting yarn daemons
starting resourcemanager, logging to /home/hdpuser/hadoop-2.2.0/logs/yarn-hdpuser-resourcemanager-hdpNameNode.out
hdpDataNode: starting nodemanager, logging to /home/hdpuser/hadoop-2.2.0/logs/yarn-hdpuser-nodemanager-hdpDataNode.out
13. Check the daemon processes; run jps on every node.
[hdpuser@hdpNameNode dfs]$ jps
9807 ResourceManager
10048 Jps
9493 NameNode
[hdpuser@hdpDataNode dfs]$ jps
6053 DataNode
6167 NodeManager
6293 Jps
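The jps checks above can be scripted so each node verifies its own expected daemon set. A minimal sketch (check_daemons is my own hypothetical helper):

```shell
#!/bin/sh
# Verify that each expected daemon name appears in `jps` output.
# Feed it live output with, e.g. on the master:
#   check_daemons "$(jps)" NameNode ResourceManager
# and on the slave:
#   check_daemons "$(jps)" DataNode NodeManager
check_daemons() {
    output=$1
    shift
    missing=0
    for d in "$@"; do
        if ! printf '%s\n' "$output" | grep -qw "$d"; then
            echo "MISSING: $d"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "all expected daemons are running"
}
```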
14. Create a user directory in HDFS and upload a test file:
[hdpuser@hdpNameNode dfs]$ hadoop fs -mkdir /user
[hdpuser@hdpNameNode dfs]$ hadoop fs -mkdir /user/hdpuser
[hdpuser@hdpNameNode dfs]$ cd
[hdpuser@hdpNameNode ~]$ ls
Desktop dfs hadoop-2.2.0 test.txt
[hdpuser@hdpNameNode ~]$ hadoop fs -put test.txt /user/hdpuser
[hdpuser@hdpNameNode ~]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdpuser supergroup 0 2013-12-07 22:40 /user
drwxr-xr-x - hdpuser supergroup 0 2013-12-07 22:40 /user/hdpuser
-rw-r--r-- 3 hdpuser supergroup 17 2013-12-07 22:40 /user/hdpuser/test.txt
15. Log in to the NameNode web UI and check:
Security is OFF
1 files and directories, 0 blocks = 1 total.
Heap Memory used 44.07 MB is 90% of Commited Heap Memory 48.58 MB. Max Heap Memory is 966.69 MB.
Non Heap Memory used 25.17 MB is 70% of Commited Non Heap Memory 35.91 MB. Max Non Heap Memory is 118 MB.
Configured Capacity               : 46.38 GB
DFS Used                          : 24 KB
Non DFS Used                      : 6.15 GB
DFS Remaining                     : 40.23 GB
DFS Used%                         : 0.00%
DFS Remaining%                    : 86.75%
Block Pool Used                   : 24 KB
Block Pool Used%                  : 0.00%
DataNodes usages                  : Min %   Median %   Max %   stdev %
                                    0.00%   0.00%      0.00%   0.00%
Live Nodes                        : 1 (Decommissioned: 0)
Dead Nodes                        : 0 (Decommissioned: 0)
Decommissioning Nodes             : 0
Number of Under-Replicated Blocks
Key problems encountered during configuration:
1. In the datanode's /etc/hosts, 127.0.0.1 was mapped to hdpDataNode, so the slave node could not connect to the master.
2. Repeated formatting during debugging left mismatching clusterID values in /home/hdpuser/dfs/data/current/VERSION; clearing the data directory and formatting again resolved it.
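Problem 2 can be diagnosed before wiping anything by comparing the two VERSION files. A small sketch (cluster_ids_match is my own helper; the paths in the usage comment are this cluster's dfs.namenode.name.dir and dfs.datanode.data.dir values):

```shell
#!/bin/sh
# Compare the clusterID recorded by the namenode with a datanode's.
# A mismatch after re-formatting is what keeps the datanode out of
# the cluster.
cluster_ids_match() {
    nn_id=$(sed -n 's/^clusterID=//p' "$1")
    dn_id=$(sed -n 's/^clusterID=//p' "$2")
    if [ "$nn_id" = "$dn_id" ]; then
        echo "clusterIDs match: $nn_id"
    else
        echo "MISMATCH: namenode=$nn_id datanode=$dn_id"
    fi
}

# On this cluster:
# cluster_ids_match /home/hdpuser/dfs/name/current/VERSION \
#                   /home/hdpuser/dfs/data/current/VERSION
```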
3. The Hadoop framework has changed between versions, so configuration files found online differ in places; for details see
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/#_3.1_hadoop_0.23.0