1. Preparation
| Node | test-194 | test-206 | test-210 |
|---|---|---|---|
| Zookeeper | * | * | * |
| zkfc | * | * | |
| JournalNode | * | * | * |
| NameNode | active | standby | |
| DataNode | * | * | * |
| HMaster | master | backup-master | |
| HRegionServer | * | * | * |
1. Installation packages
apache-zookeeper-3.5.9-bin.tar.gz
hadoop-3.1.4.tar.gz
hbase-2.3.4-bin.tar.gz
jdk-8u181-linux-x64.tar.gz
Extract the packages, delete the useless documentation files (.txt, .md, .html, etc.), and delete the .cmd Windows command scripts.
Download links:
https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz
https://archive.apache.org/dist/hbase/2.3.4/hbase-2.3.4-bin.tar.gz
https://mirrors.huaweicloud.com/java/jdk/8u181-b13/jdk-8u181-linux-x64.tar.gz
2. Synchronize the clocks
ntpdate time1.aliyun.com
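A single ntpdate run only aligns the clocks once; to keep them from drifting, a periodic job can be added on every node. A minimal sketch, assuming ntpdate stays installed and a root /etc/crontab entry is acceptable:
# /etc/crontab entry: resync against the same NTP server once per hour on every node
0 * * * * root /usr/sbin/ntpdate time1.aliyun.com >/dev/null 2>&1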
3. Configure the environment
ln -s /home/jdk/ /usr/local/jdk/
vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH
export HADOOP_HOME=/home/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
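After reloading the profile, the environment can be sanity-checked; with the packages used in this guide the versions should come back as shown in the comments:
java -version        # should report 1.8.0_181
hadoop version       # should report Hadoop 3.1.4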
4. Name resolution (use DNS, or edit /etc/hosts)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.224.192.194 test-194
10.224.192.206 test-206
10.224.192.210 test-210
5. Passwordless SSH login
Generate a key pair on one machine and place the public key in authorized_keys on every node.
1. Generate the key pair:
ssh-keygen -t rsa
2. Distribute the key to the other machines
The public key must end up in /home/coremail/.ssh/authorized_keys on every node; the key is copied into the .ssh directory of the other machines:
scp -P56789 -i ~/.ssh/id_rsa id_rsa coremail@10.224.192.210:/home/coremail/.ssh/
scp -P56789 -i ~/.ssh/id_rsa id_rsa coremail@10.224.192.206:/home/coremail/.ssh/
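Note that the scp commands above copy the private key; for passwordless login the public key (id_rsa.pub) also has to be appended to authorized_keys on each target. If it is not there yet, ssh-copy-id can append it (it prompts for the password once; user and port are the same ones used above):
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 56789 coremail@10.224.192.206
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 56789 coremail@10.224.192.210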
Test the login:
[coremail@test-194 .ssh]$ ssh-agent bash
[coremail@test-194 .ssh]$ ssh-add ~/.ssh/id_rsa
Identity added: /home/coremail/.ssh/id_rsa (/home/coremail/.ssh/id_rsa)
[coremail@test-194 .ssh]$ ssh -A -p 56789 coremail@10.224.192.206
Last login: Thu Apr 15 15:15:10 2021 from 10.224.192.194
[coremail@test-206 ~]$
2. ZooKeeper deployment
1. Configuration file zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zookeeper/zkdatas
clientPort=2181
# number of snapshots to retain (default 3)
autopurge.snapRetainCount=3
# how often, in hours, to purge old snapshots and transaction logs (here: every hour)
autopurge.purgeInterval=1
# ensemble server addresses
server.1=test-194:2888:3888
server.2=test-206:2888:3888
server.3=test-210:2888:3888
4lw.commands.whitelist=*
2. Create the myid file
[coremail@test-194 zookeeper]$ mkdir /home/zookeeper/zkdatas/
[coremail@test-194 zkdatas]$ echo 1 > /home/zookeeper/zkdatas/myid
3. Copy the zookeeper directory to the other machines and change myid
scp -r -P56789 -i ~/.ssh/id_rsa zookeeper/ coremail@10.224.192.210:/tmp/
scp -r -P56789 -i ~/.ssh/id_rsa zookeeper/ coremail@10.224.192.206:/tmp/
[root@test-206 zkdatas]# echo 2 > myid
[root@test-210 zkdatas]# echo 3 > myid
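A quick check that every node carries the expected id (run from test-194; SSH port and key are the ones configured earlier, paths are the ones used in this guide):
for h in test-194 test-206 test-210; do echo -n "$h: "; ssh -p 56789 coremail@$h cat /home/zookeeper/zkdatas/myid; done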
4. Start ZooKeeper on every machine
[coremail@test-206 ~]$ /home/zookeeper/bin/zkServer.sh start
/usr/local/jdk/bin/java
ZooKeeper JMX enabled by default
Using config: /home/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
5. Check the status of each ZooKeeper server
[coremail@test-206 zookeeper]$ ./bin/zkServer.sh status
/usr/local/jdk/bin/java
ZooKeeper JMX enabled by default
Using config: /home/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
6. ZooKeeper four-letter-command access problem
[root@test-194 ~]# echo stat|nc 127.0.0.1 2181
stat is not executed because it is not in the whitelist.
Solution
Add the following to zoo.cfg to whitelist these commands:
# enable the four-letter-word commands
4lw.commands.whitelist=*
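The setting only takes effect after the server is restarted; a quick check afterwards:
/home/zookeeper/bin/zkServer.sh restart
echo stat | nc 127.0.0.1 2181    # should now print server statistics instead of the whitelist error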
3. Hadoop HDFS deployment
1. core-site.xml configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-cluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/tmp/</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>test-194:2181,test-206:2181,test-210:2181</value>
</property>
</configuration>
2. hdfs-site.xml configuration
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/data/namenodedatas</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/data/datanodedatas</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster</value>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-cluster</name>
<value>test-194,test-206</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.test-194</name>
<value>test-194:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.test-206</name>
<value>test-206:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.test-194</name>
<value>test-194:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.test-206</name>
<value>test-206:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://test-194:8485;test-206:8485;test-210:8485/hadoop-cluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/coremail/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journalnodes</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
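Once these files are in place, the nameservice wiring can be sanity-checked from any node (the expected output for this guide's configuration is shown as comments):
hdfs getconf -confKey dfs.nameservices    # hadoop-cluster
hdfs getconf -namenodes                   # test-194 test-206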
3. workers
test-194
test-206
test-210
4. hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
export HADOOP_SSH_OPTS="-p 56789"
export HADOOP_HOME=/home/hadoop
export HADOOP_PID_DIR=/home/hadoop/tmp/
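The data and PID directories referenced above and in hdfs-site.xml all live under /home/hadoop; Hadoop creates most of them on format/startup, but pre-creating them on every node avoids permission surprises (paths taken from the configs in this guide):
mkdir -p /home/hadoop/tmp /home/hadoop/data/tmp /home/hadoop/data/namenodedatas /home/hadoop/data/datanodedatas /home/hadoop/data/journalnodes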
5. Copy to the other machines
scp -r -P56789 -i ~/.ssh/id_rsa hadoop/ coremail@10.224.192.206:/tmp/
scp -r -P56789 -i ~/.ssh/id_rsa hadoop/ coremail@10.224.192.210:/tmp/
6. Start the services
# start all the JournalNodes
hdfs --daemon start journalnode
# format HDFS (on test-194)
hdfs namenode -format
# initialize the HA state in ZooKeeper
hdfs zkfc -formatZK
# start HDFS
start-dfs.sh
# bootstrap the standby NameNode (on test-206)
[coremail@test-206 ~]$ hdfs namenode -bootstrapStandby
# start the NameNodes
start-dfs.sh
# manually switch which NameNode is active/standby
./hdfs haadmin -transitionToActive test-194
# check a NameNode's state
hdfs haadmin -getServiceState test-194
Startup order (a consolidated command sketch follows the list):
1. Start ZooKeeper
2. Start the JournalNodes on all three machines
3. Format the NameNode on test-194 and start it
4. Format zkfc on test-194 (hdfs zkfc -formatZK) and start it
5. Bootstrap the NameNode metadata onto test-206 and start its NameNode
6. Start zkfc on test-206 (the ZooKeeper znode was already formatted in step 4)
7. Start all remaining daemons with start-dfs.sh
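A minimal sketch of the full sequence, assuming the hostnames and paths used throughout this guide (start-dfs.sh at the end also starts anything not yet running):
# on every node
/home/zookeeper/bin/zkServer.sh start
hdfs --daemon start journalnode
# on test-194 only
hdfs namenode -format
hdfs zkfc -formatZK
hdfs --daemon start namenode
hdfs --daemon start zkfc
# on test-206 only
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode
hdfs --daemon start zkfc
# on test-194: start the remaining daemons (DataNodes, etc.)
start-dfs.sh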
7. Automatic failover
[coremail@test-194 bin]$ hdfs haadmin -transitionToActive test-194
Automatic failover is enabled for NameNode at test-206/10.224.192.206:8020
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the --forcemanual flag.
[coremail@test-194 bin]$ hdfs haadmin -transitionToActive --forcemanual test-194
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering 'n' or hitting ^C now.
Are you sure you want to continue? (Y or N) n
2021-04-18 21:27:23,806 ERROR ha.HAAdmin: Aborted
With automatic failover enabled, specifying the --forcemanual flag can cause a split-brain scenario and corrupt the HDFS namespace; if you really need to manage the HA state by hand, shut down the cluster and disable automatic failover instead.
# check the NameNode state
[coremail@test-194 bin]$ hdfs haadmin -getServiceState test-194
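With automatic failover working, the two NameNodes should report complementary states; which node is active depends on which zkfc won the election (illustrative output shown as comments):
hdfs haadmin -getServiceState test-194    # active
hdfs haadmin -getServiceState test-206    # standby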
4. HBase deployment
1. regionservers
test-194
test-206
test-210
2. hbase-env.sh
# JDK location
export JAVA_HOME=/usr/local/jdk/
# do not use the bundled ZooKeeper
export HBASE_MANAGES_ZK=false
# SSH port
export HBASE_SSH_OPTS="-p 56789"
3. hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hbase/tmp</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-cluster/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>test-194,test-206,test-210</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/zookeeper/zkdatas</value>
</property>
</configuration>
4. backup-masters
test-206
5. Symlink the Hadoop configuration files
ln -s /home/hadoop/etc/hadoop/core-site.xml /home/hbase/conf/core-site.xml
ln -s /home/hadoop/etc/hadoop/hdfs-site.xml /home/hbase/conf/hdfs-site.xml
6. Copy to the other machines
scp -i ~/.ssh/id_rsa -P56789 -r hbase/ coremail@10.224.192.206:/tmp/
scp -i ~/.ssh/id_rsa -P56789 -r hbase/ coremail@10.224.192.210:/tmp/
7. Start HBase
[coremail@test-194 hbase]$ bin/start-hbase.sh
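A quick way to confirm the cluster came up (illustrative; paths and role placement are the ones used in this guide):
jps                                           # expect HMaster on test-194/test-206 and HRegionServer on all three nodes
echo "status" | /home/hbase/bin/hbase shell   # should report 1 active master, 1 backup master, 3 servers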
5. Hadoop configuration reference
dfs.nameservices: choose a logical name for this nameservice, for example "hadoop-cluster", and use that name as the value of this option. The name is arbitrary; it is used both in configuration and as the authority component of absolute HDFS paths within the cluster.
Note: if you are also using HDFS Federation, this setting should additionally include the other nameservices, HA or otherwise, as a comma-separated list.
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster</value>
</property>
dfs.ha.namenodes.[nameservice ID]: a comma-separated list of NameNode IDs. DataNodes use this to determine all the NameNodes in the cluster. For example, if you used "hadoop-cluster" as the nameservice ID above, you could use "nn1", "nn2" and "nn3" as the individual NameNode IDs, configured as follows:
<property>
<name>dfs.ha.namenodes.hadoop-cluster</name>
<value>nn1,nn2,nn3</value>
</property>
For each of the NameNode IDs configured above, set the full address and RPC port of the NameNode process. Note that this results in a separate configuration option per NameNode. For example:
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.nn1</name>
<value>machine1.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.nn2</name>
<value>machine2.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.nn3</name>
<value>machine3.example.com:8020</value>
</property>
Set the addresses on which each NameNode's HTTP server listens:
<property>
<name>dfs.namenode.http-address.hadoop-cluster.nn1</name>
<value>machine1.example.com:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.nn2</name>
<value>machine2.example.com:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.nn3</name>
<value>machine3.example.com:9870</value>
</property>
dfs.namenode.shared.edits.dir: the address of the JournalNodes that provide the shared edits storage, written by the active NameNode and read by the standby NameNode to stay in sync with all file-system changes made by the active NameNode. Although several JournalNode addresses must be specified, only one URI is configured, of the form qjournal://host1:port1;host2:port2;host3:port3/journalId. The journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to provide storage for multiple federated namesystems. For example, if the JournalNodes of this cluster run on "node1.example.com", "node2.example.com" and "node3.example.com" and the nameservice ID is "hadoop-cluster", the following value would be used (the default JournalNode port is 8485):
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/hadoop-cluster</value>
</property>
dfs.client.failover.proxy.provider.[nameservice ID]: the Java class that DFS clients use to determine which NameNode is currently active and therefore serving client requests. The two implementations that ship with Hadoop are ConfiguredFailoverProxyProvider and RequestHedgingProxyProvider (which, for the first call, concurrently invokes all NameNodes to determine the active one, and on subsequent requests invokes the active NameNode until a failover occurs); use one of these unless you have a custom proxy provider. For example:
<property>
<name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
dfs.ha.fencing.methods: for correctness of the system, only one NameNode may be in the active state at any given time. Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file-system metadata in a split-brain scenario. However, when a failover occurs, the previously active NameNode may still be serving read requests to clients, and those reads may be stale until that NameNode shuts down when it tries to write to the JournalNodes. For this reason it is still desirable to configure a fencing method even when using the Quorum Journal Manager. To improve availability in the event that the fencing mechanisms fail, it is advisable to configure a fencing method that is guaranteed to return success as the last method in the list. Note that if you choose to use no actual fencing method, you must still configure something for this setting, for example "shell(/bin/true)".
The fencing methods used during a failover are configured as a carriage-return-separated list and are attempted in order until one of them indicates that fencing has succeeded. Hadoop ships with two methods: shell and sshfence. For information on implementing your own custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/exampleuser/.ssh/id_rsa</value>
</property>
Optionally, a non-standard username or port for the SSH connection may be configured, as well as a timeout in milliseconds after which this fencing method is considered to have failed. It may be configured like so:
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence([[username][:port]])</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
The shell fencing method runs an arbitrary shell command. It may be configured like so:
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>
dfs.journalnode.edits.dir: the absolute path on the JournalNode machines where the edits and other local state used by the JNs are stored. Only a single path may be used for this configuration. Redundancy of this data is provided by running multiple separate JournalNodes, or by configuring this directory on a locally attached RAID array. (The directory must be created on the machines where the JournalNodes are deployed.) For example:
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/path/to/journal/node/local/data</value>
</property>
Configuring automatic failover requires adding two new parameters. In your hdfs-site.xml add:
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
In core-site.xml add:
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
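After these two settings are in place, the HA state still has to be initialized in ZooKeeper once and a zkfc daemon started on each NameNode host, as was already done in section 3; a minimal reminder:
hdfs zkfc -formatZK           # run once, from one NameNode host
hdfs --daemon start zkfc      # run on each NameNode host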