1. Install ZooKeeper
1. Extract the archive: tar -zxf zookeeper-3.4.5.tar.gz
2. In the conf directory, rename zoo_sample.cfg to zoo.cfg # mv zoo_sample.cfg zoo.cfg
3. Edit zoo.cfg with the following content (this file is identical on every host):
tickTime=2000
dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
clientPort=2181
initLimit=5
syncLimit=2
server.0=172.17.138.67:4888:5888
server.1=172.17.138.68:4888:5888
server.2=172.17.138.69:4888:5888
server.3=172.17.138.70:4888:5888
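Besides zoo.cfg, each node needs a myid file in its dataDir whose content is that node's server.N number from the config above (0 for 172.17.138.67, 1 for .68, and so on); without it the server refuses to start in clustered mode. A minimal sketch, using a demo directory in place of the real dataDir:

```shell
# Create the myid file that identifies this host within the ensemble.
# Demo path; on the real hosts use the dataDir configured in zoo.cfg.
DATA_DIR="/tmp/zk-myid-demo/data"
MY_ID=0                        # set per host: 0, 1, 2 or 3
mkdir -p "$DATA_DIR"
echo "$MY_ID" > "$DATA_DIR/myid"
cat "$DATA_DIR/myid"           # prints 0 on this host
```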
4. Start the service: run zkServer.sh start on each node in the ensemble, one after another; do not wait too long between nodes. If no errors appear, the startup succeeded.
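The timing matters because the ensemble only becomes available once a majority (floor(n/2) + 1) of servers is running. A quick check of the arithmetic for the four-server ensemble above:

```shell
# Quorum size for a ZooKeeper ensemble is floor(n/2) + 1.
n=4
quorum=$(( n / 2 + 1 ))
echo "ensemble=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
```

With 4 servers, 3 must be up, so only 1 failure is tolerated — the same as a 3-server ensemble, which is why odd ensemble sizes are usually preferred.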
5. Test: run the command below on one machine and make sure it connects successfully; otherwise the Hadoop NameNodes may later both end up in standby mode.
#bin/zkCli.sh -server 127.0.0.1:2181
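As an alternative connectivity check, a healthy ZooKeeper answers the four-letter command "ruok" with "imok". This sketch assumes nc (netcat) is installed and only prints the command to run against each ensemble member:

```shell
# Print a ruok probe for every server in the ensemble; run each line by hand
# and expect the reply "imok" from a healthy node.
for h in 172.17.138.67 172.17.138.68 172.17.138.69 172.17.138.70; do
  echo "echo ruok | nc $h 2181   # expect: imok"
done
```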
2. Modify the Hadoop configuration
#vi core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopData/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>EBPPTEST01:2181,EBPPTEST02:2181,EBPPTEST03:2181</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>172.17.138.67</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
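A point worth checking in core-site.xml: fs.defaultFS must be the logical nameservice URI (hdfs://mycluster), not a single NameNode host:port, or clients will not fail over. A sketch of extracting the value with sed, demonstrated on an inline copy of the snippet above (point the pipeline at your real core-site.xml instead):

```shell
# Write a demo copy of the relevant core-site.xml fragment.
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
</configuration>
EOF
# Grab the line after the fs.defaultFS <name> tag and strip the <value> markup.
fs_default=$(sed -n '/<name>fs.defaultFS<\/name>/{n;p;}' /tmp/core-site-demo.xml \
  | sed -e 's/.*<value>//' -e 's/<\/value>.*//')
echo "$fs_default"   # hdfs://mycluster
```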
#vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>EBPPTEST01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>EBPPTEST01:19888</value>
</property>
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>3</value>
</property>
</configuration>
#vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoopData/filesystem/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/data,/home/hadoop/hadoopData/filesystem/data</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>EBPPTEST01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>EBPPTEST01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>EBPPTEST02:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>EBPPTEST02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://EBPPTEST01:8485;EBPPTEST02:8485;EBPPTEST03:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hadoopData/journalData</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
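The sshfence method above works by SSH-ing into the previously active NameNode to kill it, so passwordless SSH must already work between the two NameNodes using the key named in dfs.ha.fencing.ssh.private-key-files. A sketch of generating a key pair, using a temp directory and an RSA key for the demo (the config above points at a DSA key at /home/hadoop/.ssh/id_dsa):

```shell
# Generate a passwordless key pair for fencing (demo location and type).
KEY_DIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$KEY_DIR/id_demo"
# On a real cluster: append id_demo.pub to authorized_keys on the peer NameNode.
[ -f "$KEY_DIR/id_demo" ] && [ -f "$KEY_DIR/id_demo.pub" ] && echo "key pair ready"
```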
#vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
3. Deployment
1. Start the JournalNode cluster
On every node that will run a JournalNode, execute: hadoop-daemon.sh start journalnode
2. Format the QJM shared-edits storage
./hdfs namenode -initializeSharedEdits
3. Start the existing NameNode
hadoop-daemon.sh start namenode
4. Bootstrap the other NameNode (run this on the other machine)
./hdfs namenode -bootstrapStandby
5. Stop HDFS
stop-dfs.sh
6. Initialize the HA state in ZooKeeper (format the ZKFC znode)
hdfs zkfc -formatZK
7. Start the cluster
start-dfs.sh
start-yarn.sh
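Once both scripts finish, `jps` on each host should show the daemons for that host's roles. A checklist sketch of the daemons involved in this QJM HA layout (which ones appear on which host depends on how roles are distributed):

```shell
# Daemons expected somewhere in the cluster after a full start.
for d in NameNode DFSZKFailoverController JournalNode DataNode \
         ResourceManager NodeManager; do
  echo "$d"
done
```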
References:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/