Hadoop Study Notes - Day 3 - Setting Up a Cluster

    Hadoop can run in three modes:

    1. Local (standalone) mode, the default

    2. Pseudo-distributed mode, which runs a "cluster" on a single node

    3. Fully distributed mode

    Standalone mode was already tested yesterday, so it is not recorded again.

Record of configuring fully distributed mode:

   1. Prepare two virtual machines that can reach each other over the network; their /etc/hosts file is as follows:

127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.203.111 hdpNameNode
192.168.203.112 hdpDataNode

       Note that 127.0.0.1 should not be mapped to a hostname, or at least that entry should be moved to the bottom; otherwise, when later configuration files refer to the hostname, it may resolve to the wrong address. This is a common cause of a DataNode failing to connect to the NameNode: http://wiki.apache.org/hadoop/ConnectionRefused
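The loopback pitfall above can be checked mechanically. A minimal sketch (the helper name check_hosts is my own, not a Hadoop tool): it fails if the given hostname appears on a 127.0.0.1 line of a hosts file.

```shell
#!/bin/sh
# Hypothetical helper: fail if the hostname is bound to 127.0.0.1,
# the common cause of the DataNode "Connection refused" error.
check_hosts() {
  hosts_file=$1; host=$2
  if grep -E '^127\.0\.0\.1[[:space:]]' "$hosts_file" | grep -qw "$host"; then
    echo "BAD: $host resolves to 127.0.0.1 in $hosts_file"
    return 1
  fi
  echo "OK: $host is not bound to loopback"
}

# e.g. on each node: check_hosts /etc/hosts hdpNameNode
```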
    2. Configure passwordless SSH access, i.e. hdpNameNode can log in to the other node without a password; the detailed setup is not recorded here.
     
[hdpuser@hdpNameNode hadoop]$ ssh hdpDataNode
Last login: Sat Dec  7 22:17:09 2013 from hdpnamenode
[hdpuser@hdpDataNode ~]$ 
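The passwordless-SSH setup itself is skipped in the notes; the usual recipe is sketched below. The runnable part writes into a scratch directory so it is safe to try anywhere; in practice the key lives in ~/.ssh and ssh-copy-id pushes it to each node.

```shell
#!/bin/sh
# Sketch of the usual passwordless-SSH setup (omitted in the notes above).
# A scratch directory is used here; in practice use the default ~/.ssh paths.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$keydir/id_rsa" -q   # key pair, no passphrase

# In practice (run as hdpuser on hdpNameNode, requires the remote hosts):
#   ssh-copy-id hdpuser@hdpDataNode   # append the public key on the slave
#   ssh-copy-id hdpuser@hdpNameNode   # start scripts also ssh to the local node
#   ssh hdpDataNode true              # should now succeed without a password
ls "$keydir"
```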
    3. Go to /home/hdpuser/hadoop-2.2.0/etc/hadoop to configure the files below.
    4. [hdpuser@hdpDataNode hadoop]$ cat masters 
           hdpNameNode
    5. [hdpuser@hdpDataNode hadoop]$ cat slaves 
            hdpDataNode
   6. Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdpNameNode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>

    7. Configure mapred-site.xml (the Hadoop 1.x property mapred.job.tracker no longer exists in 2.2; jobs are handed to YARN via mapreduce.framework.name)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

    8. Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hdpuser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hdpuser/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
(Note: dfs.replication is 3 although this cluster has only one DataNode, so every file will be reported as under-replicated.)

    9. Configure yarn-site.xml (property names must start with lowercase "yarn."; in Hadoop 2.2 the aux-service name is mapreduce_shuffle, not the older mapreduce.shuffle)
<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>hdpNameNode:18040</value>
  </property>

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hdpNameNode:18030</value>
  </property>

  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hdpNameNode:18088</value>
  </property>

  <property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdpNameNode:8025</value>
  </property>
</configuration>

   10. scp the configuration files above directly to the other machine, keeping the directory layout the same as on the master node.

   11. Format HDFS to prepare it for storing data (the hadoop namenode -format form is deprecated in 2.x in favor of hdfs namenode -format, but both still work)
    hadoop namenode -format
   12. Start the cluster
[hdpuser@hdpNameNode dfs]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hdpNameNode]
hdpNameNode: starting namenode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-namenode-hdpNameNode.out
hdpDataNode: starting datanode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-datanode-hdpDataNode.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hdpuser/hadoop-2.2.0/logs/hadoop-hdpuser-secondarynamenode-hdpNameNode.out
starting yarn daemons
starting resourcemanager, logging to /home/hdpuser/hadoop-2.2.0/logs/yarn-hdpuser-resourcemanager-hdpNameNode.out
hdpDataNode: starting nodemanager, logging to /home/hdpuser/hadoop-2.2.0/logs/yarn-hdpuser-nodemanager-hdpDataNode.out

    13. Check the daemon processes; run this on every node.
[hdpuser@hdpNameNode dfs]$ jps
9807 ResourceManager
10048 Jps
9493 NameNode


[hdpuser@hdpDataNode dfs]$ jps
6053 DataNode
6167 NodeManager
6293 Jps
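The jps checks above can be scripted. A small sketch (the check_daemons helper is my own, not a Hadoop tool) that takes captured jps output and the daemon names expected on that node:

```shell
#!/bin/sh
# Hypothetical helper: verify that every expected daemon appears in
# captured `jps` output; prints the first missing one and fails.
check_daemons() {
  out=$1; shift
  for proc in "$@"; do
    if ! printf '%s\n' "$out" | grep -qw "$proc"; then
      echo "MISSING: $proc"
      return 1
    fi
  done
  echo "all expected daemons running"
}

# e.g. on the master: check_daemons "$(jps)" NameNode ResourceManager
# e.g. on the slave:  check_daemons "$(jps)" DataNode NodeManager
```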

    14. Add files and directories
[hdpuser@hdpNameNode dfs]$ hadoop fs -mkdir /user
[hdpuser@hdpNameNode dfs]$ hadoop fs -mkdir /user/hdpuser
[hdpuser@hdpNameNode dfs]$ cd
[hdpuser@hdpNameNode ~]$ ls
Desktop  dfs  hadoop-2.2.0  test.txt
[hdpuser@hdpNameNode ~]$ hadoop fs -put test.txt /user/hdpuser
[hdpuser@hdpNameNode ~]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - hdpuser supergroup          0 2013-12-07 22:40 /user
drwxr-xr-x   - hdpuser supergroup          0 2013-12-07 22:40 /user/hdpuser
-rw-r--r--   3 hdpuser supergroup         17 2013-12-07 22:40 /user/hdpuser/test.txt

     15. Log in to the web UI to check (the NameNode web UI listens on port 50070 by default in Hadoop 2.2, e.g. http://hdpNameNode:50070)

Security is OFF
1 files and directories, 0 blocks = 1 total.
Heap Memory used 44.07 MB is 90% of Commited Heap Memory 48.58 MB. Max Heap Memory is 966.69 MB.
Non Heap Memory used 25.17 MB is 70% of Commited Non Heap Memory 35.91 MB. Max Non Heap Memory is 118 MB.
Configured Capacity: 46.38 GB
DFS Used: 24 KB
Non DFS Used: 6.15 GB
DFS Remaining: 40.23 GB
DFS Used%: 0.00%
DFS Remaining%: 86.75%
Block Pool Used: 24 KB
Block Pool Used%: 0.00%
DataNodes usages: Min 0.00%, Median 0.00%, Max 0.00%, stdev 0.00%
Live Nodes: 1 (Decommissioned: 0)
Dead Nodes: 0 (Decommissioned: 0)
Decommissioning Nodes: 0
Number of Under-Replicated Blocks


Key problems encountered during configuration:

1. In the DataNode's /etc/hosts, 127.0.0.1 was mapped to hdpDataNode, so the slave node could not connect to the master.
2. Repeated reformatting during debugging left different clusterIDs in /home/hdpuser/dfs/data/current/VERSION; resolved by clearing the directory and formatting again.
3. The Hadoop framework has changed across versions, so configuration files found online differ in places; for details see http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/#_3.1_hadoop_0.23.0
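For problem 2, the mismatch can be confirmed before wiping anything by comparing the clusterID fields of the two VERSION files. A minimal sketch (the helper name is my own; the paths in the usage comment are the dfs.*.dir values from hdfs-site.xml above):

```shell
#!/bin/sh
# Hypothetical check for problem 2: do two Hadoop VERSION files carry the
# same clusterID? If not, clear the DataNode's data dir and re-format.
same_cluster_id() {
  a=$(sed -n 's/^clusterID=//p' "$1")
  b=$(sed -n 's/^clusterID=//p' "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# e.g.:
#   same_cluster_id /home/hdpuser/dfs/name/current/VERSION \
#                   /home/hdpuser/dfs/data/current/VERSION \
#     || echo "clusterID mismatch: clear dfs/data, then hdfs namenode -format"
```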



