1. Pre-installation preparation
① Cluster plan:
| Hostname | User | Host IP | Software installed | Running processes |
| --- | --- | --- | --- | --- |
| centos71 | hzq | 192.168.1.201 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| centos72 | hzq | 192.168.1.202 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| centos73 | hzq | 192.168.1.203 | jdk, hadoop | ResourceManager |
| centos74 | hzq | 192.168.1.204 | jdk, hadoop | ResourceManager |
| centos75 | hzq | 192.168.1.205 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos76 | hzq | 192.168.1.206 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos77 | hzq | 192.168.1.207 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos78 | hzq | 192.168.1.208 | jdk, zookeeper | QuorumPeerMain |
| centos79 | hzq | 192.168.1.209 | jdk, zookeeper | QuorumPeerMain |
| centos710 | hzq | 192.168.1.210 | jdk, zookeeper | QuorumPeerMain |
② Set up passwordless SSH login between all hosts; see 《ssh免密登陆》.
③ Install jdk1.8.0_131 on every host; see 《Linux安装JDK步骤》 for installation and configuration.
④ Build the ZooKeeper cluster; see 《zookeeper-3.4.10安装教程---分布式配置》 for the steps.
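Three ZooKeeper hosts are used because the ensemble only stays available while a strict majority of its servers is up; the arithmetic can be checked directly in the shell:

```shell
# ZooKeeper stays writable only while a strict majority of servers is up.
# For an ensemble of n servers, the quorum size is floor(n/2) + 1.
n=3                      # ensemble size in this guide (centos78, centos79, centos710)
quorum=$(( n / 2 + 1 ))  # smallest strict majority
echo "quorum=$quorum tolerated_failures=$(( n - quorum ))"
# prints: quorum=2 tolerated_failures=1
```

So one of the three ZooKeeper hosts can fail without losing the quorum; a 4th server would not raise the failure tolerance, which is why odd ensemble sizes are used.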
⑤ Edit the "/etc/hosts" file as follows:
192.168.31.128 centos71
192.168.31.129 centos72
192.168.31.130 centos73
192.168.31.131 centos74
192.168.31.133 centos75
192.168.31.132 centos76
192.168.31.137 centos77
192.168.31.134 centos78
192.168.31.135 centos79
192.168.31.136 centos710
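Before moving on, it is worth checking that no IP or hostname appears twice in /etc/hosts, since a duplicate silently breaks name resolution for one of the hosts. A small sketch (demonstrated here against a throwaway sample file with a deliberate duplicate; point it at the real /etc/hosts on the cluster):

```shell
# Print every IP or hostname that appears more than once in a hosts-style file.
check_hosts() {
  awk '{ if (ip[$1]++) print "duplicate IP: " $1
         if (hn[$2]++) print "duplicate hostname: " $2 }' "$1"
}

# Demo against a sample file containing one duplicated IP.
cat > /tmp/hosts.sample <<'EOF'
192.168.31.128 centos71
192.168.31.129 centos72
192.168.31.129 centos73
EOF
check_hosts /tmp/hosts.sample   # prints: duplicate IP: 192.168.31.129
```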
⑥ Prepare the Hadoop tarball: hadoop-2.8.0.tar.gz
⑦ Disable the firewall (on CentOS 7 this is firewalld)
- systemctl stop firewalld
- systemctl disable firewalld
2. Hadoop installation:
① Create a "hadoop" directory under "/home/hzq/software/"
② Create a "data" directory under "hadoop" to hold Hadoop's runtime files
③ Extract "hadoop-2.8.0.tar.gz" into the hadoop directory
- tar -zxvf ../package/hadoop-2.8.0.tar.gz -C /home/hzq/software/hadoop/
④ Delete the doc directory under "hadoop-2.8.0/share" to speed up the scp copies later
- rm -rf hadoop-2.8.0/share/doc
3. Hadoop configuration:
① Edit hadoop-env.sh and set JAVA_HOME
- export JAVA_HOME=/home/hzq/software/jdk1.8.0_131
② Edit core-site.xml
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://hzqnns/</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/hzq/software/hadoop/data</value>
- </property>
- <property>
- <name>ha.zookeeper.quorum</name>
- <value>centos78:2181,centos79:2181,centos710:2181</value>
- </property>
③ Edit hdfs-site.xml
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- </property>
- <property>
- <name>dfs.blocksize</name>
- <value>64m</value>
- </property>
- <property>
- <name>dfs.nameservices</name>
- <value>hzqnns</value>
- </property>
- <property>
- <name>dfs.ha.namenodes.hzqnns</name>
- <value>nn1,nn2</value>
- </property>
- <property>
- <name>dfs.namenode.rpc-address.hzqnns.nn1</name>
- <value>centos71:9000</value>
- </property>
- <property>
- <name>dfs.namenode.http-address.hzqnns.nn1</name>
- <value>centos71:50070</value>
- </property>
- <property>
- <name>dfs.namenode.rpc-address.hzqnns.nn2</name>
- <value>centos72:9000</value>
- </property>
- <property>
- <name>dfs.namenode.http-address.hzqnns.nn2</name>
- <value>centos72:50070</value>
- </property>
- <property>
- <name>dfs.namenode.shared.edits.dir</name>
- <value>qjournal://centos75:8485;centos76:8485;centos77:8485/hzqnns</value>
- </property>
-
- <property>
- <name>dfs.journalnode.edits.dir</name>
- <value>/home/hzq/software/hadoop/data/journaldata</value>
- </property>
-
- <property>
- <name>dfs.ha.automatic-failover.enabled</name>
- <value>true</value>
- </property>
-
- <property>
- <name>dfs.client.failover.proxy.provider.hzqnns</name>
- <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
- </property>
-
- <property>
- <name>dfs.ha.fencing.methods</name>
- <value>
- sshfence
-
- shell(/bin/true)
- </value>
- </property>
-
- <property>
- <name>dfs.ha.fencing.ssh.private-key-files</name>
- <value>/home/hzq/.ssh/id_rsa</value>
- </property>
- <property>
- <name>dfs.ha.fencing.ssh.connect-timeout</name>
- <value>30000</value>
- </property>
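A single missing HA key in hdfs-site.xml tends to make startup fail in confusing ways, so a quick grep for each required key catches typos early. This is a sketch, demonstrated against a deliberately incomplete sample file; point `conf` at the real hdfs-site.xml on the cluster:

```shell
# Check that every HDFS-HA key configured above is present in the file.
required="dfs.nameservices dfs.ha.namenodes.hzqnns dfs.namenode.shared.edits.dir dfs.journalnode.edits.dir dfs.ha.automatic-failover.enabled dfs.ha.fencing.methods"

conf=/tmp/hdfs-site.sample.xml   # demo file; use the real hdfs-site.xml for real checks
printf '%s\n' '<property><name>dfs.nameservices</name><value>hzqnns</value></property>' > "$conf"

missing=0
for key in $required; do
  grep -q "<name>$key</name>" "$conf" || { echo "missing: $key"; missing=1; }
done
echo "missing=$missing"   # missing=0 means every required key is present
```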
④ Edit mapred-site.xml
Rename "mapred-site.xml.template" first:
- mv mapred-site.xml.template mapred-site.xml
Then edit mapred-site.xml:
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
⑤ Edit yarn-site.xml
- <property>
- <name>yarn.resourcemanager.ha.enabled</name>
- <value>true</value>
- </property>
- <property>
- <name>yarn.resourcemanager.cluster-id</name>
- <value>yrc</value>
- </property>
-
- <property>
- <name>yarn.resourcemanager.ha.rm-ids</name>
- <value>rm1,rm2</value>
- </property>
- <property>
- <name>yarn.resourcemanager.hostname.rm1</name>
- <value>centos73</value>
- </property>
- <property>
- <name>yarn.resourcemanager.hostname.rm2</name>
- <value>centos74</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.resourcemanager.zk-address</name>
- <value>centos78:2181,centos79:2181,centos710:2181</value>
- </property>
⑥ Configure the DataNode hosts by editing slaves:
- centos75
- centos76
- centos77
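The slaves file is just one DataNode hostname per line, so it can be written in a single command. The conf directory below is a demo stand-in for hadoop-2.8.0/etc/hadoop:

```shell
# Write the DataNode list, one hostname per line.
conf_dir=/tmp/hadoop-conf-demo   # demo path; use hadoop-2.8.0/etc/hadoop for real
mkdir -p "$conf_dir"
printf '%s\n' centos75 centos76 centos77 > "$conf_dir/slaves"
cat "$conf_dir/slaves"
```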
⑦ Copy the configured Hadoop directory to the other six hosts
- scp -r hadoop/ centos72:/home/hzq/software/
- scp -r hadoop/ centos73:/home/hzq/software/
- scp -r hadoop/ centos74:/home/hzq/software/
- scp -r hadoop/ centos75:/home/hzq/software/
- scp -r hadoop/ centos76:/home/hzq/software/
- scp -r hadoop/ centos77:/home/hzq/software/
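The six scp commands can be collapsed into one loop. It is shown in dry-run form (with echo in front) so the generated commands can be inspected first; remove the echo to actually copy:

```shell
# Generate the copy command for each remaining host; remove "echo" to run them.
# Assumes passwordless SSH between the hosts is already set up (step 1-②).
for host in centos72 centos73 centos74 centos75 centos76 centos77; do
  echo scp -r hadoop/ "$host:/home/hzq/software/"
done
```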
4. Starting Hadoop (on first startup the steps must be executed in this order)
① Check that the ZooKeeper cluster is fully started; if not, start it first.
- Run zkServer.sh start on centos78, centos79, and centos710
② Start the JournalNodes (run on centos75, centos76, and centos77)
- hadoop-daemon.sh start journalnode
Note: run jps to verify; on success, centos75, centos76, and centos77 each show an additional JournalNode process.
③ Format HDFS on centos71
- hdfs namenode -format
④ Bring the two NameNodes' metadata into sync: copy the contents of the data directory on centos71 to the data directory on centos72.
- scp -r data/ centos72:/home/hzq/software/hadoop/data
⑤ Format ZKFC on centos71
- hdfs zkfc -formatZK
⑥ Start HDFS on centos71
- start-dfs.sh
⑦ Start the ResourceManager and the NodeManagers from centos73
- start-yarn.sh
⑧ Start the standby ResourceManager on centos74
- yarn-daemon.sh start resourcemanager
5. Verify the startup:
① Run jps on every host and compare the processes with the cluster plan.
② HDFS web UI: http://centos71:50070 or http://centos72:50070
③ YARN (MapReduce) web UI: http://centos73:8088 or http://centos74:8088
6. Common commands:
- hdfs haadmin -getServiceState nn1        # query whether a NameNode is active or standby
- hadoop-daemon.sh start namenode          # start a single NameNode
- hadoop-daemon.sh start zkfc              # start a single ZKFC daemon
- yarn-daemon.sh start resourcemanager     # start a single ResourceManager
7. Summary
1. This cluster was built purely for learning; no tuning or optimization has been done.
2. Corrections and pointers from more experienced readers are very welcome.