hadoop-2.6 Distributed Cluster Setup

(Reposted article)

1. Background

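The previous post covered the core Hadoop configuration and the basic ZooKeeper configuration. This post records my actual configuration files, plus a summary of the startup process. I have now built this simple distributed environment four times, learned a little along the way, and finally got it to start. I won't describe my runtime environment in detail here. One more note: everything below was done as the root user.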

2. Hadoop configuration

2.1 Files in etc/hadoop

First, change into that directory:

 root@note1:~# cd ~/hadoop-2.6/etc/hadoop

(1) hadoop-env.sh

Configure the Java runtime by setting JAVA_HOME:

 root@note1:~/hadoop-2.6/etc/hadoop# vi hadoop-env.sh
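Editing with vi works, but the same change can be scripted so it is identical on every node. Pinning an explicit path matters because daemons launched over SSH by start-dfs.sh may not inherit JAVA_HOME from your login shell. A minimal sketch, assuming GNU sed (for `sed -i`); the helper name `set_java_home` and the JDK path in the usage line are hypothetical.

```shell
# Hypothetical helper: pin JAVA_HOME in hadoop-env.sh without opening an editor.
set_java_home() {
  env_file=$1   # e.g. ~/hadoop-2.6/etc/hadoop/hadoop-env.sh
  jdk_dir=$2    # your JDK directory (assumption: adjust to your install)
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Overwrite the existing line (2.6 ships "export JAVA_HOME=${JAVA_HOME}").
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
  else
    # No such line yet: append one at the end of the file.
    printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
  fi
}
```

Usage on note1, before copying hadoop-env.sh to the other nodes: `set_java_home ~/hadoop-2.6/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-7-openjdk-amd64` (hypothetical JDK path).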

(2) core-site.xml

 root@note1:~/hadoop-2.6/etc/hadoop# more core-site.xml

The complete configuration:

 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--  
   Licensed under the Apache License, Version 2.0 (the "License");  
   you may not use this file except in compliance with the License.  
   You may obtain a copy of the License at  
   
     http://www.apache.org/licenses/LICENSE-2.0  
   
   Unless required by applicable law or agreed to in writing, software  
   distributed under the License is distributed on an "AS IS" BASIS,  
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
   See the License for the specific language governing permissions and  
   limitations under the License. See accompanying LICENSE file.  
 -->  
   
 <!-- Put site-specific property overrides in this file. -->  
   
 <configuration>  
   
 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://yuannews</value>
 </property>

 <property>
   <name>ha.zookeeper.quorum</name>
   <value>note1:2181,note3:2181,note4:2181</value>
 </property>

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/opt/hadoop2</value>
 </property>

 </configuration>
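Before starting anything, it is worth checking that Hadoop resolves these keys as intended; `hdfs getconf -confKey <key>` reads the local configuration files and does not need running daemons. A small wrapper sketch; `expect_conf` and the `HDFS` override variable are hypothetical (the variable only lets the hdfs binary be swapped out).

```shell
# Hypothetical helper: compare the value Hadoop resolves for a key with the expected one.
expect_conf() {
  key=$1
  expected=$2
  # "hdfs getconf -confKey" prints the effective value of a configuration key.
  actual=$(${HDFS:-hdfs} getconf -confKey "$key" 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    echo "OK $key = $actual"
  else
    echo "FAIL $key = '$actual' (expected '$expected')" >&2
    return 1
  fi
}
```

For this core-site.xml: `expect_conf fs.defaultFS hdfs://yuannews` and `expect_conf ha.zookeeper.quorum note1:2181,note3:2181,note4:2181`.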

(3) hdfs-site.xml

 root@note1:~/hadoop-2.6/etc/hadoop# cat hdfs-site.xml

The configuration:

 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--  
   Licensed under the Apache License, Version 2.0 (the "License");  
   you may not use this file except in compliance with the License.  
   You may obtain a copy of the License at  
   
     http://www.apache.org/licenses/LICENSE-2.0  
   
   Unless required by applicable law or agreed to in writing, software  
   distributed under the License is distributed on an "AS IS" BASIS,  
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
   See the License for the specific language governing permissions and  
   limitations under the License. See accompanying LICENSE file.  
 -->  
   
 <!-- Put site-specific property overrides in this file. -->  
   
 <configuration>  
   
  <property>  
   <name>dfs.nameservices</name>  
   <value>yuannews</value>  
  </property>  
   
 <property>  
   <name>dfs.ha.namenodes.yuannews</name>  
   <value>nn1,nn2</value>  
 </property>  
   
 <property>  
   <name>dfs.namenode.rpc-address.yuannews.nn1</name>  
   <value>note1:8020</value>  
 </property>  
 <property>  
   <name>dfs.namenode.rpc-address.yuannews.nn2</name>  
   <value>note3:8020</value>  
 </property>  
   
 <property>  
   <name>dfs.namenode.http-address.yuannews.nn1</name>  
   <value>note1:50070</value>  
 </property>  
 <property>  
   <name>dfs.namenode.http-address.yuannews.nn2</name>  
   <value>note3:50070</value>  
 </property>  
   
 <property>  
   <name>dfs.namenode.shared.edits.dir</name>  
   <value>qjournal://note3:8485;note4:8485;note5:8485/yuannews</value>  
 </property>  
   
 <property>  
   <name>dfs.client.failover.proxy.provider.yuannews</name>  
   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>  
 </property>  
   
 <property>  
   <name>dfs.ha.fencing.methods</name>  
   <value>sshfence</value>  
 </property>  
   
 <property>  
   <name>dfs.ha.fencing.ssh.private-key-files</name>  
   <value>/root/.ssh/id_rsa</value>  
 </property>  
   
 <property>  
   <name>dfs.journalnode.edits.dir</name>  
   <value>/opt/hadoop/jn/data/</value>  
 </property>  
   
 <property>  
    <name>dfs.ha.automatic-failover.enabled</name>  
    <value>true</value>  
  </property>  
   
   
 </configuration>  
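The shared edits URI has the form qjournal://host1:port1;host2:port2;.../journalId: the JournalNodes are separated by semicolons and the trailing path is the journal ID (here the nameservice, yuannews). The Hadoop HA documentation asks for at least three JournalNodes, and an odd number of them. The sketch below, with hypothetical helper names, pulls the host list and journal ID out of such a URI, which is handy when scripting the JournalNode steps.

```shell
# Hypothetical helpers: parse a qjournal:// URI such as dfs.namenode.shared.edits.dir.
journal_hosts() {
  uri=$1
  rest=${uri#qjournal://}     # drop the scheme
  hostports=${rest%%/*}       # drop the "/<journalId>" suffix
  # Entries are ';'-separated "host:port"; keep the host part, join with spaces.
  printf '%s\n' "$hostports" | tr ';' '\n' | cut -d: -f1 | paste -s -d ' ' -
}

journal_id() {
  uri=$1
  printf '%s\n' "${uri##*/}"  # everything after the last '/'
}
```

For example, `journal_hosts "$(hdfs getconf -confKey dfs.namenode.shared.edits.dir)"` should print `note3 note4 note5` with the configuration above.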

(4) mapred-site.xml

Rename mapred-site.xml.template to mapred-site.xml:

 root@note1:~/hadoop-2.6/etc/hadoop# mv mapred-site.xml.template mapred-site.xml

The configuration:

 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--  
   Licensed under the Apache License, Version 2.0 (the "License");  
   you may not use this file except in compliance with the License.  
   You may obtain a copy of the License at  
   
     http://www.apache.org/licenses/LICENSE-2.0  
   
   Unless required by applicable law or agreed to in writing, software  
   distributed under the License is distributed on an "AS IS" BASIS,  
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
   See the License for the specific language governing permissions and  
   limitations under the License. See accompanying LICENSE file.  
 -->  
   
 <!-- Put site-specific property overrides in this file. -->  
   
 <configuration>  
  <property>  
    <name>mapreduce.framework.name</name>  
    <value>yarn</value>  
  </property>  
 </configuration>  

(5) yarn-site.xml

 root@note1:~/hadoop-2.6/etc/hadoop# more yarn-site.xml

The configuration below sets the node that runs the ResourceManager (the master node); mine is note1:

 <?xml version="1.0"?>  
 <!--  
   Licensed under the Apache License, Version 2.0 (the "License");  
   you may not use this file except in compliance with the License.  
   You may obtain a copy of the License at  
   
     http://www.apache.org/licenses/LICENSE-2.0  
   
   Unless required by applicable law or agreed to in writing, software  
   distributed under the License is distributed on an "AS IS" BASIS,  
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
   See the License for the specific language governing permissions and  
   limitations under the License. See accompanying LICENSE file.  
 -->  
 <configuration>  
   
 <!-- Site specific YARN configuration properties -->  
   
     <property>  
         <name>yarn.resourcemanager.hostname</name>  
         <value>note1</value>  
     </property>  

     <property>  
         <name>yarn.nodemanager.aux-services</name>  
         <value>mapreduce_shuffle</value>  
     </property>  
   
     <property>  
         <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
     </property>  
   
 </configuration>  

(6) slaves

List the other machines in the cluster, i.e. the hosts that run DataNodes, one per line. Mine:

168.56.3  
168.56.4  
168.56.5  

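(7) Configuration summary

The directory set in dfs.journalnode.edits.dir has to be created by hand. Beyond that, what matters most is the service names: the nameservice ID (yuannews) and the NameNode IDs (nn1, nn2) must be spelled exactly the same everywhere they appear!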
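Creating that directory by hand on every JournalNode host is easy to forget on one of them. A minimal sketch that does it over SSH, assuming passwordless root SSH from note1 to each JournalNode host; `make_jn_dirs` and the `SSH` override variable are hypothetical.

```shell
# Hypothetical helper: mkdir -p the JournalNode edits directory on each given host.
make_jn_dirs() {
  dir=$1
  shift
  for host in "$@"; do
    # Stop at the first host where the remote mkdir fails.
    ${SSH:-ssh} "root@$host" "mkdir -p '$dir'" || return 1
  done
}
```

With this post's layout: `make_jn_dirs /opt/hadoop/jn/data note3 note4 note5`.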

3. ZooKeeper configuration

(1) zoo.cfg

Rename zoo_sample.cfg to zoo.cfg, then edit it. Mine looks like this:

 root@note1:~/zookeeper-3.4.6/conf# more zoo.cfg

 # The number of milliseconds of each tick  
 tickTime=2000  
 # The number of ticks that the initial   
 # synchronization phase can take  
 initLimit=10  
 # The number of ticks that can pass between   
 # sending a request and getting an acknowledgement  
 syncLimit=5  
 # the directory where the snapshot is stored.  
 # do not use /tmp for storage, /tmp here is just   
 # example sakes.  
 dataDir=/opt/zookeeper  
 # the port at which the clients will connect  
 clientPort=2181  
 # the maximum number of client connections.  
 # increase this if you need to handle more clients  
 # maxClientCnxns=60  
 #  
 # Be sure to read the maintenance section of the   
 # administrator guide before turning on autopurge.  
 #  
 # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance  
   
 # The number of snapshots to retain in dataDir  
 # autopurge.snapRetainCount=3  
 # Purge task interval in hours  
 # Set to "0" to disable auto purge feature  
 # autopurge.purgeInterval=1  
 server.1=note1:2888:3888  
 server.2=note3:2888:3888  
 server.3=note4:2888:3888  

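Notes:
(1) Create the dataDir directory by hand; mine is /opt/zookeeper.

(2) In that directory, create a file named myid. Its content is the x of the matching server.x line at the end of zoo.cfg. The rules are:

1) If m machines run ZooKeeper, copy the ZooKeeper program (the extracted directory) with the identical configuration onto each of those m machines.

2) On every one of those machines, create the dataDir directory and the myid file.

3) The myid content must match the server lines at the end of zoo.cfg. For example, for server.2=note3:2888:3888 the myid file on note3 contains only the number 2; the other nodes follow the same pattern.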
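Rule 3 can be automated so that each node derives its own id from the shared zoo.cfg. A minimal sketch with hypothetical helper names; it assumes server lines of the exact form server.N=host:2888:3888 and short hostnames without dots, because awk splits fields on '.', '=' and ':'. It also creates dataDir, covering note (1).

```shell
# Hypothetical helper: print N from the "server.N=<host>:..." line for a host.
myid_for_host() {
  cfg=$1
  host=$2
  # server.2=note3:2888:3888 splits into: server | 2 | note3 | 2888 | 3888
  awk -F'[.=:]' -v h="$host" '$1 == "server" && $3 == h { print $2 }' "$cfg"
}

# Hypothetical helper: create dataDir and write that id into dataDir/myid.
write_myid() {
  cfg=$1
  host=$2
  data_dir=$3
  id=$(myid_for_host "$cfg" "$host")
  if [ -z "$id" ]; then
    echo "no server.N entry for $host in $cfg" >&2
    return 1
  fi
  mkdir -p "$data_dir"
  echo "$id" > "$data_dir/myid"
}
```

On each ZooKeeper node, assuming `hostname` prints the same name used in zoo.cfg: `write_myid ~/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" /opt/zookeeper`.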

(2) Global configuration

Add ZooKeeper's bin directory to PATH in /etc/profile. Mine is:

 export PATH=$PATH:/root/zookeeper-3.4.6/bin  

Don't forget to run source /etc/profile afterwards!

(3) Test-starting ZooKeeper

Run the following on each ZooKeeper node (note1, note3 and note4):

 zkServer.sh start  
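Once the ensemble has formed, `zkServer.sh status` reports each server's role as `Mode: leader` or `Mode: follower`; with the three servers configured here, one should be the leader and two followers. A sketch that collects the mode from every node over SSH; the helper name, the `SSH`/`ZK_BIN` overrides and the install path /root/zookeeper-3.4.6 (taken from the PATH line above) are assumptions. The full path is used because a non-interactive SSH session may not read /etc/profile.

```shell
# Hypothetical helper: print "<host> <mode>" for each ZooKeeper host.
zk_status() {
  zk_bin=${ZK_BIN:-/root/zookeeper-3.4.6/bin}
  for host in "$@"; do
    # Keep only the text after "Mode: " in zkServer.sh's output.
    mode=$(${SSH:-ssh} "root@$host" "$zk_bin/zkServer.sh status" 2>&1 | sed -n 's/^Mode: //p')
    # "unknown": no Mode line (server down, no quorum yet, or SSH failed).
    echo "$host ${mode:-unknown}"
  done
}
```

Usage: `zk_status note1 note3 note4`.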

4. Initialization

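(1) Start the JournalNodes

Change into the hadoop/sbin directory and start a JournalNode on each JournalNode host (note3, note4 and note5 in dfs.namenode.shared.edits.dir):
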
 ./hadoop-daemon.sh start journalnode  
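`jps` lists the running JVMs by main class name, so a JournalNode shows up as `JournalNode`; the same check works later for `NameNode`, `DataNode`, `QuorumPeerMain` (ZooKeeper) and `DFSZKFailoverController`. A sketch that runs `jps` on each host over SSH; it assumes `jps` is on the PATH of non-interactive SSH sessions, and `check_daemon` plus the `SSH` override are hypothetical.

```shell
# Hypothetical helper: report whether a daemon (jps class name) runs on each host.
check_daemon() {
  daemon=$1
  shift
  missing=0
  for host in "$@"; do
    # jps prints "<pid> <MainClass>" per JVM; match the class name exactly.
    if ${SSH:-ssh} "root@$host" jps 2>/dev/null | awk '{ print $2 }' | grep -qx "$daemon"; then
      echo "$host: $daemon running"
    else
      echo "$host: $daemon NOT running"
      missing=1
    fi
  done
  return $missing
}
```

Usage: `check_daemon JournalNode note3 note4 note5` returns non-zero if any host lacks the daemon.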

(2) Format one NameNode

I have two NameNodes, so the format is run on only one machine, called namenode1 here (note1). The other NameNode is not formatted, but it does need the follow-up steps below.

 root@note1:~/hadoop-2.6/bin# ./hdfs namenode -format  

(3) Initialize the other NameNode

namenode1 is formatted; now initialize namenode2. First start the freshly formatted namenode1:

 root@note1:~/hadoop-2.6/sbin# ./hadoop-daemon.sh start namenode  

Then run the initialization on the namenode2 host (note3):

 root@note3:~/hadoop-2.6/bin# ./hdfs namenode -bootstrapStandby  
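`-bootstrapStandby` copies the formatted namespace metadata from the running namenode1, so both NameNodes should end up with the same clusterID. A sketch that compares the two, assuming dfs.namenode.name.dir was left at its default (${hadoop.tmp.dir}/dfs/name, i.e. /opt/hadoop2/dfs/name with the core-site.xml above); the function names and the `SSH`/`VERSION_FILE` overrides are hypothetical.

```shell
# Hypothetical helper: read clusterID from a NameNode's VERSION file over SSH.
cluster_id_on() {
  version_file=${VERSION_FILE:-/opt/hadoop2/dfs/name/current/VERSION}
  ${SSH:-ssh} "root@$1" "cat $version_file" 2>/dev/null | sed -n 's/^clusterID=//p'
}

# Hypothetical helper: print both IDs and succeed only if they are non-empty and equal.
same_cluster_id() {
  a=$(cluster_id_on "$1")
  b=$(cluster_id_on "$2")
  echo "$1 ${a:-?}"
  echo "$2 ${b:-?}"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `same_cluster_id note1 note3`.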

(4) Initialize ZKFC

Prerequisite: ZooKeeper (ZK) must already be running on the machines you configured for it. Only then can the ZKFC state be formatted; otherwise the command fails with an error.

 root@note1:~/hadoop-2.6/bin# ./hdfs zkfc -formatZK  
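`-formatZK` creates a znode for the nameservice under the HA parent znode, which defaults to /hadoop-ha (ha.zookeeper.parent-znode). A sketch that checks for it, assuming zkCli.sh in ZooKeeper 3.4 runs a single command given after `-server host:port` and then exits; `ha_znode_exists` and the `ZKCLI` override are hypothetical.

```shell
# Hypothetical helper: succeed if /hadoop-ha lists the given nameservice.
ha_znode_exists() {
  server=$1   # e.g. note1:2181
  ns=$2       # nameservice ID, e.g. yuannews
  # "ls" prints the children as "[a, b]"; keep that line, strip brackets and
  # spaces, split on commas, then look for an exact match.
  ${ZKCLI:-/root/zookeeper-3.4.6/bin/zkCli.sh} -server "$server" ls /hadoop-ha 2>/dev/null |
    grep '^\[' | sed 's/[][ ]//g' | tr ',' '\n' | grep -qx "$ns"
}
```

Usage: `ha_znode_exists note1:2181 yuannews && echo "ZKFC state formatted"`.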

(5) Starting and stopping

Start and stop HDFS with:

 start-dfs.sh
 stop-dfs.sh
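With dfs.ha.automatic-failover.enabled set to true, start-dfs.sh also starts the ZKFC daemons, which elect one NameNode as active. `hdfs haadmin -getServiceState <id>` prints active or standby for the IDs from dfs.ha.namenodes.yuannews. A sketch with hypothetical helper names (the `HDFS` override is an assumption) that lists both states and checks that exactly one is active:

```shell
# Hypothetical helper: print "<id> <state>" for each NameNode ID.
ha_states() {
  for nn in "$@"; do
    state=$(${HDFS:-hdfs} haadmin -getServiceState "$nn" 2>/dev/null)
    echo "$nn ${state:-unreachable}"
  done
}

# Hypothetical helper: succeed only if exactly one of the given IDs is active.
one_active() {
  ha_states "$@" | awk '$2 == "active" { n++ } END { exit (n != 1) }'
}
```

Usage: `ha_states nn1 nn2` and `one_active nn1 nn2 || echo "no single active NameNode"`.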

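(6) Notes

If something does not come up at startup, check two things: whether each node's IP can be reached with ping, and whether the node's firewall is turned off (it is sometimes the cause).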
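The ping check can be scripted. A minimal sketch using Linux iputils ping (`-c 1` sends one packet, `-W 2` waits up to two seconds); the helper name and the `PING` override are hypothetical. If you keep the firewall on instead of disabling it, at least the ports from the configuration above must be open between nodes: 8020 and 50070 for the NameNodes, 8485 for the JournalNodes, and 2181, 2888 and 3888 for ZooKeeper. Other daemons use further default ports not listed in this post.

```shell
# Hypothetical helper: ping each host once and report reachability.
check_reachable() {
  bad=0
  for host in "$@"; do
    if ${PING:-ping} -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host UNREACHABLE"
      bad=1
    fi
  done
  return $bad
}
```

Usage: `check_reachable note1 note3 note4 note5`.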

5. Startup order

Start ZooKeeper first, then HDFS (start-dfs.sh), and finally YARN (start-yarn.sh).
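The same order as one function, run from note1. A sketch only: the host list and install paths are taken from this post's setup and can be overridden with the hypothetical `ZK_HOSTS`, `ZK_BIN` and `HADOOP_HOME` variables (`SSH` and `RUN` exist only to swap out the real commands). In Hadoop 2.x, start-dfs.sh also starts the JournalNodes and, with automatic failover enabled, the ZKFCs; start-yarn.sh starts the ResourceManager on the local node and NodeManagers on the hosts in slaves.

```shell
# Hypothetical helper: ZooKeeper -> HDFS -> YARN, stopping at the first failure.
start_cluster() {
  zk_hosts=${ZK_HOSTS:-"note1 note3 note4"}
  zk_bin=${ZK_BIN:-/root/zookeeper-3.4.6/bin}
  hadoop_home=${HADOOP_HOME:-/root/hadoop-2.6}
  # 1) ZooKeeper on every ensemble member (the server.N hosts in zoo.cfg)
  for host in $zk_hosts; do
    ${SSH:-ssh} "root@$host" "$zk_bin/zkServer.sh start" || return 1
  done
  # 2) HDFS: NameNodes, DataNodes, JournalNodes and ZKFCs
  ${RUN:-} "$hadoop_home/sbin/start-dfs.sh" || return 1
  # 3) YARN: ResourceManager here, NodeManagers on the slaves
  ${RUN:-} "$hadoop_home/sbin/start-yarn.sh"
}
```

To shut down, reverse the order: stop-yarn.sh, stop-dfs.sh, then zkServer.sh stop on each ZooKeeper node.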