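This week I tried setting up Hadoop HA. This post covers two things: the basic architecture and the environment setup.
I. Basic Architecture
Hadoop HA stands for Hadoop High Availability. An HDFS cluster with a single NameNode has a single point of failure: if the NameNode machine goes down, the whole cluster becomes unavailable. Hadoop HA addresses this by running two NameNodes, one in the active state and one in the standby state. The active NameNode serves clients, while the standby only keeps its state in sync with the active one so that it can take over quickly when the active NameNode fails.
The figure below shows the Hadoop HA architecture in detail. It has three main parts: ZooKeeper, NameNode, and DataNode.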
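To make the active/standby idea concrete, here is a toy model in Python. It is only a sketch: the NameNode class and the elect_active function are names I made up for illustration, and a real cluster does this through the ZKFC processes and ZooKeeper (with fencing), not through code like this.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    """Hypothetical stand-in for a NameNode; not a Hadoop class."""
    name: str
    state: str = "standby"  # "active", "standby" or "down"
    alive: bool = True


def elect_active(nodes):
    """Return the active node, promoting the first live standby if needed."""
    for node in nodes:
        if not node.alive:
            node.state = "down"
    live = [node for node in nodes if node.alive]
    for node in live:
        if node.state == "active":
            return node
    if not live:
        return None  # nothing left to promote
    live[0].state = "active"
    return live[0]


nn1 = NameNode("namenode1", state="active")
nn2 = NameNode("namenode2")
cluster = [nn1, nn2]
print(elect_active(cluster).name)  # namenode1 is still active
nn1.alive = False                  # simulate a crash of the active node
print(elect_active(cluster).name)  # namenode2 is promoted
```

The point of the model is only that clients always talk to whichever node is currently active, and that the standby has to be ready to take that role at any moment.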
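II. Environment Setup
As the architecture above suggests, setting up Hadoop HA mainly involves installing ZooKeeper and Hadoop and configuring the NameNodes and DataNodes. The ZooKeeper ensemble is usually given an odd number of machines.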
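The reason for the odd number is that ZooKeeper needs a strict majority of its servers to be up. The small Python calculation below (the function names are mine) shows that a fourth machine does not buy any extra fault tolerance over three.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
# 3 servers: majority 2, tolerates 1 failure
# 4 servers: majority 3, tolerates 1 failure
# 5 servers: majority 3, tolerates 2 failures
```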
The environment used here:
OS: CentOS 6.8 (64-bit)
JDK: 1.7
ZooKeeper: 3.4.9
Hadoop: 2.7.3
Machines and installed software:
Hostname | IP address | Installed software |
namenode1 | 192.168.1.10 | ZooKeeper, Hadoop |
namenode2 | 192.168.1.11 | ZooKeeper, Hadoop |
datanode | 192.168.1.12 | ZooKeeper, Hadoop |
1. Add the hadoop user and group
groupadd hadoop
useradd -g hadoop hadoop
2. Edit /etc/hosts
vi /etc/hosts
192.168.1.10 namenode1
192.168.1.11 namenode2
192.168.1.12 datanode
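Since every later step refers to the machines by hostname, it is worth checking that all three entries are present on every machine. A minimal Python sketch, assuming the three hostname/IP pairs listed above; parse_hosts and missing_or_wrong are helper names I invented.

```python
EXPECTED = {
    "namenode1": "192.168.1.10",
    "namenode2": "192.168.1.11",
    "datanode": "192.168.1.12",
}


def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def missing_or_wrong(text, expected=EXPECTED):
    """Return the expected entries that are absent or point elsewhere."""
    found = parse_hosts(text)
    return {host: ip for host, ip in expected.items() if found.get(host) != ip}


sample = """
127.0.0.1 localhost
192.168.1.10 namenode1
192.168.1.11 namenode2
"""
print(missing_or_wrong(sample))  # the datanode entry is missing here
```

On a real machine the text would come from open("/etc/hosts").read(); an empty result means all three entries are in place.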
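3. Passwordless SSH login
ssh-keygen -t rsa    # generate the key pair
ssh-copy-id -i ~/.ssh/id_rsa.pub namenode1    # copy the public key to every machine
ssh-copy-id -i ~/.ssh/id_rsa.pub namenode2
ssh-copy-id -i ~/.ssh/id_rsa.pub datanode
Repeat the commands above on all three hosts so that each can log in to the others without a password. Test with ssh namenode1, ssh namenode2, and ssh datanode.
4. Stop the firewall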
service iptables stop
5. Disable SELinux
vi /etc/selinux/config
SELINUX=disabled
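A reboot is required for this setting to take effect.
6. Install the JDK. This was covered in the first article, so it is not repeated here.
7. Install ZooKeeper
(1) Extract the archive under /opt and edit conf/zoo.cfg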
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/hadoopdata/zookeeper/data
dataLogDir=/opt/hadoopdata/zookeeper/data/log
# the port at which the clients will connect
clientPort=2181
server.1=namenode1:2888:3888
server.2=namenode2:2888:3888
server.3=datanode:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
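(2) Create the directory named in the configuration file, /opt/hadoopdata/zookeeper/data, and inside it create the log directory and the myid file.
(3) Copy the hadoopdata directory and the extracted, configured zookeeper directory to the other two machines, and change their owner to the hadoop user.
(4) Edit the myid file on each machine so that the three machines contain 1, 2, and 3 respectively, matching the server.N entries in zoo.cfg.
(5) Start ZooKeeper on each machine in turn
cd bin
./zkServer.sh start
After startup, the QuorumPeerMain process should be visible.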
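A mismatch between the number in myid and the server.N lines is an easy mistake to make when the directory is copied between machines. The Python sketch below derives the id a given host should have from the zoo.cfg text; parse_servers and expected_myid are names I made up, and the sample text only repeats the three server lines from the configuration above.

```python
import re


def parse_servers(cfg_text):
    """Return {server_id: hostname} from the server.N lines of a zoo.cfg."""
    servers = {}
    for line in cfg_text.splitlines():
        # Commented-out lines start with '#', so they never match.
        match = re.match(r"\s*server\.(\d+)\s*=\s*([^:\s]+):\d+:\d+", line)
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def expected_myid(cfg_text, hostname):
    """Return the id that belongs in myid on the given host, or None."""
    for server_id, host in parse_servers(cfg_text).items():
        if host == hostname:
            return server_id
    return None


SAMPLE_CFG = """
clientPort=2181
server.1=namenode1:2888:3888
server.2=namenode2:2888:3888
server.3=datanode:2888:3888
"""
print(expected_myid(SAMPLE_CFG, "namenode2"))  # id for namenode2
```

Comparing that value with the contents of /opt/hadoopdata/zookeeper/data/myid on each machine confirms step (4) was done consistently.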
----------------------------------------------------------------------------------------------------
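That completes the ZooKeeper cluster configuration. In my case, however, ZooKeeper did not actually start successfully: ./zkServer.sh status reported an error. I checked the documentation, every setting looked correct, and the firewall was off, so I am not sure what else to try. If anyone knows, please tell me. As a result, configuring and starting Hadoop afterwards ran into all sorts of problems, for example the NameNode and zkfc would not start. I am still recording the steps below; I will look at this again next week when I have time and add the fix if I find one.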
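One way to narrow a problem like this down is to ask each server directly over its client port, using ZooKeeper's four-letter "stat" command, instead of relying on zkServer.sh status alone. The Python sketch below is only an assumption about how one might check; four_letter_word, parse_mode and probe are my own helper names. Keep in mind that a three-node ensemble reports errors until at least two servers are running and can reach each other on ports 2888 and 3888.

```python
import socket


def four_letter_word(host, port=2181, command="stat", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_reply):
    """Extract leader/follower/standalone from a 'stat' reply, if present."""
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(hosts, port=2181):
    """Summarize the state of each ZooKeeper server in one line."""
    report = {}
    for host in hosts:
        try:
            mode = parse_mode(four_letter_word(host, port))
            report[host] = mode or "process is up but reported no Mode"
        except OSError as exc:
            report[host] = f"unreachable ({exc})"
    return report
```

Calling probe(["namenode1", "namenode2", "datanode"]) from any of the machines gives one line per server. A healthy ensemble shows one leader and two followers; "unreachable" points to a process or network problem, and a reply without a Mode line suggests the process is running but has not joined a quorum.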
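8. Install Hadoop
The Hadoop installation steps were also covered in the first article, so they are not repeated here. What follows is mainly the configuration files.
(1) hadoop-env.sh
# The settings below are the ones I changed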
export JAVA_HOME=/opt/java/jdk1.7.0_25
export HADOOP_HEAPSIZE=1000
export HADOOP_PID_DIR=/opt/hadoop-2.7.3/pids
export HADOOP_LOG_DIR=/opt/hadoopdata/hadooplogs
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
All of the directories referenced in the settings above have to be created by hand.
(2) core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoopdata/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>namenode1:2181,namenode2:2181,datanode:2181</value>
</property>
</configuration>
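The ha.zookeeper.quorum value has to list the same hosts and client port that ZooKeeper itself uses (clientPort=2181 in zoo.cfg). A small Python sketch of that comparison follows; parse_quorum and quorum_mismatches are hypothetical helpers, not part of Hadoop.

```python
def parse_quorum(value, default_port=2181):
    """Split a ha.zookeeper.quorum value into (host, port) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs


def quorum_mismatches(value, zk_hosts, client_port=2181):
    """Return entries present on only one side (Hadoop config vs. zoo.cfg)."""
    wanted = {(host, client_port) for host in zk_hosts}
    return sorted(wanted.symmetric_difference(parse_quorum(value)))


hosts = ["namenode1", "namenode2", "datanode"]
print(quorum_mismatches("namenode1:2181,namenode2:2181,datanode:2181", hosts))
```

An empty list means the two configurations agree; anything else names the host and port that appears in only one of them.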
(3) hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>namenode1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>namenode2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>namenode1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>namenode2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://namenode1:8485;namenode2:8485;datanode:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/mdisk/journaldata</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/opt/hadoopdata/namenodedata</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoopdata/datanodedata</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>200</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>150</value>
</property>
</configuration>
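Hadoop silently ignores property names it does not recognize, and a misspelled class name only fails at runtime, so a quick consistency check on the HA keys can save a lot of debugging. The Python sketch below is a minimal example under my own assumptions: load_properties, check_ha and to_xml are names I invented, the check covers only a handful of keys, and it assumes two NameNodes per nameservice as in Hadoop 2.x.

```python
import xml.etree.ElementTree as ET

PROXY_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha."
    "ConfiguredFailoverProxyProvider"
)


def load_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def split_list(value):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_ha(props):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for ns in split_list(props.get("dfs.nameservices", "")):
        ids = split_list(props.get(f"dfs.ha.namenodes.{ns}", ""))
        if len(ids) != 2:
            problems.append(f"{ns}: expected 2 NameNode IDs, found {len(ids)}")
        for nn in ids:
            key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            if key not in props:
                problems.append(f"missing {key}")
        provider = props.get(f"dfs.client.failover.proxy.provider.{ns}")
        if provider != PROXY_PROVIDER:
            problems.append(f"{ns}: unexpected proxy provider {provider!r}")
    if "dfs.journalnode.edits.dir" not in props:
        problems.append("dfs.journalnode.edits.dir is not set")
    return problems


def to_xml(props):
    """Render a dict as a minimal Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


SAMPLE = {
    "dfs.nameservices": "mycluster",
    "dfs.ha.namenodes.mycluster": "nn1,nn2",
    "dfs.namenode.rpc-address.mycluster.nn1": "namenode1:8020",
    "dfs.namenode.rpc-address.mycluster.nn2": "namenode2:8020",
    "dfs.client.failover.proxy.provider.mycluster": PROXY_PROVIDER,
    "dfs.journalnode.edits.dir": "/opt/mdisk/journaldata",
}
print(check_ha(load_properties(to_xml(SAMPLE))))  # no problems for this sample
```

Against the real file, check_ha(load_properties(open("/opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml").read())) should also return an empty list; the path assumes the default layout of the Hadoop 2.7.3 tarball.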
9. Configure environment variables
export JAVA_HOME=/opt/java/jdk1.7.0_25
export PATH=$PATH:$JAVA_HOME/bin
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.9
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HADOOP_HOME=/opt/hadoop-2.7.3
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
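10. Initialize and start Hadoop
Start the JournalNode on each of the three machines
cd /opt/hadoop-2.7.3/sbin
./hadoop-daemon.sh start journalnode
Format the NameNode on namenode1:
hdfs namenode -format
Start the NameNode on namenode1:
./hadoop-daemon.sh start namenode
Then sync the HDFS metadata to namenode2 by running this on namenode2:
hdfs namenode -bootstrapStandby
Start the NameNode on namenode2
./hadoop-daemon.sh start namenode
Format the zkfc state in ZooKeeper from namenode1
hdfs zkfc -formatZK
Start DFS from namenode1:
./start-dfs.sh
Once startup completes, run jps on each machine. If namenode1 shows NameNode and zkfc, namenode2 shows NameNode and zkfc, and datanode shows DataNode, everything is working. Because my ZooKeeper setup had the problem described earlier, startup was not clean for me: NameNode and zkfc would not come up, while everything else was fine. I have not found the exact cause yet. If you know what is going wrong, please let me know. Many thanks!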
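The jps check can be turned into a small script that says exactly which daemon is missing on which host. This Python sketch assumes the layout used in this post; the EXPECTED table is my own reading of the steps above (JournalNode and QuorumPeerMain on all three machines), and running_daemons and missing_daemons are invented helper names.

```python
NAMENODE_SET = {"NameNode", "DFSZKFailoverController", "JournalNode", "QuorumPeerMain"}

EXPECTED = {
    "namenode1": NAMENODE_SET,
    "namenode2": NAMENODE_SET,
    "datanode": {"DataNode", "JournalNode", "QuorumPeerMain"},
}


def running_daemons(jps_output):
    """Return the set of Java process names listed in jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1] != "Jps":
            names.add(parts[1])
    return names


def missing_daemons(host, jps_output, expected=EXPECTED):
    """Return the expected daemons that jps does not show on this host."""
    return sorted(expected[host] - running_daemons(jps_output))


# Made-up jps output resembling the failure described above.
sample = """2101 QuorumPeerMain
2345 JournalNode
2790 Jps
"""
print(missing_daemons("namenode1", sample))
```

For the sample above the result names the NameNode and the zkfc process (DFSZKFailoverController in jps), which is a reasonable starting point for deciding which log file under /opt/hadoopdata/hadooplogs to read first.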