Reposted from: http://www.micmiu.com/bigdata/hadoop/hadoop2-cluster-ha-setup/
[1] Introduction
Back in the early days of Hadoop 2.x I wrote "hadoop 2.2.0 cluster-mode installation, configuration and testing", which recorded the most basic steps of a distributed setup and a running demo, but did not experiment with HA. This article walks through a Hadoop 2 distributed deployment in detail, including HA for the NameNode and HA for the ResourceManager.
[2] Experiment environment
1. Nodes and role assignment
The experiment uses a five-node cluster, with roles assigned as follows:
| hostname | NameNode | DataNode | JournalNode | Zookeeper | ZKFC | ResourceManager |
| --- | --- | --- | --- | --- | --- | --- |
| nn1.hadoop | √ (Active) | | √ | √ | √ | √ |
| nn2.hadoop | √ (Standby) | | √ | √ | √ | √ |
| dn1.hadoop | | √ | √ | √ | | |
| dn2.hadoop | | √ | | | | |
| dn3.hadoop | | √ | | | | |
2. System and software versions
- CentOS 6.3, 64-bit
- Java 1.7.0_75
- Hadoop 2.6.0
- ZooKeeper 3.4.6
3. Install the JDK (on all nodes)
```bash
# Query the installed OpenJDK packages
rpm -qa | grep java
# Remove OpenJDK
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps tzdata-java-2012c-1.el6.noarch
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
```
Download the 64-bit JDK RPM from Oracle (jdk-7u75-linux-x64.rpm, matching the Java 1.7.0_75 listed above) and run the install command:
```bash
rpm -ivh jdk-7u75-linux-x64.rpm
```
Default installation path: /usr/java/jdk1.7.0_75
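As a quick sanity check (this step is not in the original write-up), confirm the new JDK is the one now on the PATH:
```bash
# Should report java version "1.7.0_75" rather than a leftover OpenJDK
java -version
```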
4. Configure hosts (on all nodes)
```
172.17.225.61    nn1.hadoop  zk1.hadoop
172.17.225.121   nn2.hadoop  zk2.hadoop
172.17.225.72    dn1.hadoop  zk3.hadoop
172.17.225.76    dn2.hadoop
172.17.225.19    dn3.hadoop
```
5. Confirm sshd is installed and running (on all nodes)
6. Configure clock synchronization
Method one (on all nodes): synchronize every node from a public NTP server:
```bash
$ cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
$ ntpdate us.pool.ntp.org
$ crontab -e
0-59/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org | logger -t NTP
```
Method two: set up an NTP server on one node and have the other nodes synchronize from it; a minimal sketch follows.
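The original does not spell this variant out. The sketch below assumes nn1.hadoop is chosen as the local NTP server; the subnet in the restrict line is taken from the hosts table above:
```bash
# On the NTP server node (assumed: nn1.hadoop), add to /etc/ntp.conf:
#   server us.pool.ntp.org
#   restrict 172.17.225.0 mask 255.255.255.0 nomodify notrap
service ntpd start
chkconfig ntpd on

# On every other node, point the ntpdate cron job at the local server instead:
/usr/sbin/ntpdate nn1.hadoop
```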
7. Create a dedicated user (on all nodes)
For example, create a hadoop user with an initial password of hadoop; all of the Hadoop deployment and configuration below is done as this user.
```bash
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
```
Edit the environment variables for the hadoop user (vi ~/.bash_profile):
```bash
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH="$JAVA_HOME/bin:$PATH"
```
8. Passwordless SSH login
Configure every NameNode node so that it can log in to all of the other nodes without a password. One-way passwordless login is sufficient, though configuring it both ways does no harm. For details on passwordless SSH, see the earlier article on configuring OpenSSH passwordless login on Linux (CentOS); a short sketch follows.
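A minimal sketch of the usual steps, run as the hadoop user on each NameNode host. The DSA key path is chosen here to match the dfs.ha.fencing.ssh.private-key-files setting configured later in hdfs-site.xml:
```bash
# Generate a DSA key pair with an empty passphrase
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# Push the public key to every node in the cluster
for host in nn1.hadoop nn2.hadoop dn1.hadoop dn2.hadoop dn3.hadoop; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$host
done
```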
[3] Experiment process
1. Building Hadoop 2
On any machine in the experiment environment, download the Hadoop 2.6.0 source, install and configure Java and Maven, then build from source with mvn package -Pdist,native -DskipTests -Dtar. The detailed build steps are covered in a separate article; a rough sketch is given below.
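A rough sketch of the build, assuming Maven, a JDK, and the native-build toolchain (gcc, cmake, protobuf 2.5, zlib/openssl headers) are already in place; the archive.apache.org mirror is one possible download source:
```bash
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0-src.tar.gz
tar -xzf hadoop-2.6.0-src.tar.gz
cd hadoop-2.6.0-src
mvn package -Pdist,native -DskipTests -Dtar
# the built distribution tarball lands under hadoop-dist/target/
```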
2. ZooKeeper installation and configuration
Download the latest stable release (3.4.6), deploy it on each ZK node, and edit the environment variables (vi ~/.bash_profile):
```bash
export ZOOKEEPER_HOME=/usr/local/share/zookeeper
export PATH="$ZOOKEEPER_HOME/bin:$PATH"
```
Edit the configuration file:
```bash
cd $ZOOKEEPER_HOME
cp conf/zoo_sample.cfg conf/zoo.cfg
vi conf/zoo.cfg
```
Change it to the following:
```
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/bigdata/hadoop/zookeeper/zkdata
dataLogDir=/bigdata/hadoop/zookeeper/zklogs
server.1=zk1.hadoop:2888:3888
server.2=zk2.hadoop:2888:3888
server.3=zk3.hadoop:2888:3888
```
The directories referenced in the configuration must be created beforehand and be readable and writable by the hadoop user (see the sketch after this list). Each ZK node gets its own myid:
- on zk1.hadoop run: echo 1 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk2.hadoop run: echo 2 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk3.hadoop run: echo 3 > /bigdata/hadoop/zookeeper/zkdata/myid
The number in myid must match the server.N entries in zoo.cfg.
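A sketch of the directory preparation, run once per ZK node (as root, or adjust ownership to however your environment is set up):
```bash
mkdir -p /bigdata/hadoop/zookeeper/zkdata /bigdata/hadoop/zookeeper/zklogs
chown -R hadoop:hadoop /bigdata/hadoop
```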
3. Hadoop installation and configuration (edit on all nodes)
3.1. Configure environment variables (vi ~/.bash_profile):
```bash
export HADOOP_HOME=/usr/local/share/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```
3.2. Edit $HADOOP_HOME/etc/hadoop/core-site.xml
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
        <description>mycluster is the logical name of the HA cluster;
        it must match dfs.nameservices in hdfs-site.xml</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/bigdata/hadoop/temp</value>
        <description>Default base directory where the NameNode, DataNode,
        JournalNode and other daemons keep their data; each kind of data can
        also be given its own directory. The directory tree must be created
        beforehand.</description>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
        <description>Address and port of each node in the ZK ensemble.
        Note: the number of nodes must be odd and match zoo.cfg.</description>
    </property>
</configuration>
```
3.3. Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of block replicas</description>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/bigdata/hadoop/dfs/name</value>
    <description>NameNode metadata storage directory</description>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/bigdata/hadoop/dfs/data</value>
    <description>DataNode block data storage directory</description>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>Logical name of the HA nameservice; any name will do,
    and fs.defaultFS in core-site.xml must reference it</description>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
    <description>Logical names of the NameNodes in this nameservice</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.hadoop:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.hadoop:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>nn1.hadoop:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>nn2.hadoop:50070</value>
</property>
<property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
    <value>nn1.hadoop:53310</value>
</property>
<property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
    <value>nn2.hadoop:53310</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
    <description>Whether to fail over automatically when the active NN fails</description>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn1.hadoop:8485;nn2.hadoop:8485;dn1.hadoop:8485/hadoop-journal</value>
    <description>JournalNode configuration, in three parts:
    1. the qjournal prefix names the protocol;
    2. the host:port of the three machines running a JournalNode, separated by semicolons;
    3. the trailing hadoop-journal is the JournalNode namespace, which can be any name.</description>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/bigdata/hadoop/dfs/journal/</value>
    <description>Local directory where each JournalNode stores its data</description>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Class that performs failover for mycluster when a NameNode fails</description>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>Fence the failed NameNode over SSH during failover</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_dsa</value>
    <description>Location of the private key used for the SSH connection
    when sshfence is in effect</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>1000</value>
</property>
<property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
</property>
```
3.4. Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
```xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>clusterrm</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>nn1.hadoop</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>nn2.hadoop</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
</property>
```
PS: the HA-related settings in yarn-site.xml follow the same pattern as the HA settings in hdfs-site.xml.
3.5. Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
</property>
```
3.6. Edit $HADOOP_HOME/etc/hadoop/slaves
```
dn1.hadoop
dn2.hadoop
dn3.hadoop
```
4. Startup steps in detail:
4.1. Start ZK
Run on every ZK node: zkServer.sh start
The command zkServer.sh status shows each ZK node's role in the ensemble; a convenience sketch for driving all of the nodes from one host follows.
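Since passwordless SSH is already in place, one small convenience (not in the original) is to start and check the whole ensemble from a single node; this assumes the hadoop user's ~/.bash_profile exports ZOOKEEPER_HOME on every ZK node:
```bash
# Start ZooKeeper on all three ZK nodes
for host in zk1.hadoop zk2.hadoop zk3.hadoop; do
  ssh hadoop@$host 'source ~/.bash_profile && zkServer.sh start'
done
# Check each node's role (leader / follower)
for host in zk1.hadoop zk2.hadoop zk3.hadoop; do
  ssh hadoop@$host 'source ~/.bash_profile && zkServer.sh status'
done
```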
4.2. Format ZK (first time only)
Run on any ZK node: hdfs zkfc -formatZK
```
[hadoop@nn1 micmiu]$ hdfs zkfc -formatZK
15/02/02 16:54:24 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn1.hadoop/172.17.225.61:53310
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:host.name=nn1.hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_75/jre
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/......
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/share/hadoop-2.6.0/lib/native
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/share/hadoop-2.6.0
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1e884ca9
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1.hadoop/172.17.225.61:2181. Will not attempt to authenticate using SASL (unknown error)
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Socket connection established to nn1.hadoop/172.17.225.61:2181, initiating session
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Session establishment complete on server nn1.hadoop/172.17.225.61:2181, sessionid = 0x14b496d55810000, negotiated timeout = 5000
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Session connected.
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Session: 0x14b496d55810000 closed
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: EventThread shut down
```
4.3. Start ZKFC
ZKFC (ZooKeeperFailoverController) monitors NameNode state and coordinates the active/standby switch, so it only needs to be started on the two NameNode nodes.
```
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn1.hadoop.out
[hadoop@nn1 micmiu]$
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
```
4.4. Start the JournalNodes, the shared storage system that keeps metadata in sync between the active and standby NN. On each JN node:
```
# JN node 1
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn1.hadoop.out
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8895 Jps
8837 JournalNode
# JN node 2
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8252 Jps
# JN node 3
[hadoop@dn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-dn1.hadoop.out
[hadoop@dn1 ~]$ jps
748 QuorumPeerMain
1008 JournalNode
1063 Jps
```
4.5. Format and start the primary NN
Format: hdfs namenode -format
Note: formatting is needed only the first time the system is brought up; never format again!
```
[hadoop@nn1 micmiu]$ hdfs namenode -format
15/02/02 17:03:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nn1.hadoop/172.17.225.61
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/share/hadoop/common/lib/.......
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG:   java = 1.7.0_75
************************************************************/
15/02/02 17:03:05 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:03:05 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:03:05 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 02 17:03:05
15/02/02 17:03:05 INFO util.GSet: Computing capacity for map BlocksMap
15/02/02 17:03:05 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:05 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/02 17:03:05 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: defaultReplication         = 3
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplication             = 512
15/02/02 17:03:05 INFO blockmanagement.BlockManager: minReplication             = 1
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/02/02 17:03:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
15/02/02 17:03:05 INFO namenode.FSNamesystem: supergroup          = supergroup
15/02/02 17:03:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
15/02/02 17:03:05 INFO namenode.FSNamesystem: HA Enabled: true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Append Enabled: true
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map INodeMap
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/02/02 17:03:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/02/02 17:03:06 INFO namenode.NNConf: ACLs enabled? false
15/02/02 17:03:06 INFO namenode.NNConf: XAttrs enabled? true
15/02/02 17:03:06 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/02/02 17:03:07 INFO namenode.FSImage: Allocated new BlockPoolId: BP-711086735-172.17.225.61-1422867787014
15/02/02 17:03:07 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:03:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/02 17:03:07 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:03:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1.hadoop/172.17.225.61
************************************************************/
[hadoop@nn1 micmiu]$
```
On the primary NN node, start the NN with: hadoop-daemon.sh start namenode
Compare the processes on the NN node before and after the start:
```
# Before starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8988 Jps
8837 JournalNode
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn1.hadoop.out
# After starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
9134 Jps
8771 DFSZKFailoverController
8837 JournalNode
9017 NameNode
```
4.6. Sync the primary NN's metadata to the standby NN: hdfs namenode -bootstrapStandby
```
[hadoop@nn2 ~]$ hdfs namenode -bootstrapStandby
15/02/02 17:04:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nn2.hadoop/172.17.225.121
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0......
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG:   java = 1.7.0_75
************************************************************/
15/02/02 17:04:43 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:04:43 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://nn1.hadoop:50070
  Other NN's IPC  address: nn1.hadoop/172.17.225.61:53310
             Namespace ID: 263802668
            Block pool ID: BP-711086735-172.17.225.61-1422867787014
               Cluster ID: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
           Layout version: -60
=====================================================
15/02/02 17:04:44 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:04:45 INFO namenode.TransferFsImage: Opening connection to http://nn1.hadoop:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:263802668:0:CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:04:45 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/02/02 17:04:45 INFO namenode.TransferFsImage: Transfer took 0.00s at 0.00 KB/s
15/02/02 17:04:45 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 352 bytes.
15/02/02 17:04:45 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:04:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn2.hadoop/172.17.225.121
************************************************************/
[hadoop@nn2 ~]$
```
4.7. Start the standby NN
On the standby NN run: hadoop-daemon.sh start namenode
```
[hadoop@nn2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn2.hadoop.out
[hadoop@nn2 ~]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8394 NameNode
8491 Jps
```
4.8. Set and confirm the active NN
Since this article configures automatic failover, ZK has already elected one node as the active NN and this step can be skipped. Check the node states:
```
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
standby
```
If manual failover were configured instead, this step would be mandatory: the system would not yet know which NN is active, and both nodes would report Standby. The command to manually activate the primary NN is: hdfs haadmin -transitionToActive nn1
4.9. Start the DataNodes from the primary NN
Start all DataNodes with: hadoop-daemons.sh start datanode
Note the difference between hadoop-daemons.sh and hadoop-daemon.sh: the plural form runs the daemon over SSH on every host listed in the slaves file, while the singular form only starts it on the local node.
```
[hadoop@nn1 ~]$ hadoop-daemons.sh start datanode
dn3.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn3.hadoop.out
dn1.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn1.hadoop.out
dn2.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn2.hadoop.out
[hadoop@nn1 ~]$
```
4.10. Start YARN
Method one: start the ResourceManager and NodeManagers in one go: start-yarn.sh
Method two: start them separately:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
(with multiple DataNodes, use yarn-daemons.sh for the NodeManagers)
Note that start-yarn.sh only brings up the ResourceManager on the node where it is run; the second ResourceManager (here on nn2.hadoop) still needs yarn-daemon.sh start resourcemanager.
The ResourceManager is also configured for HA; check a node's state with:
yarn rmadmin -getServiceState <serviceid>
```
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@nn1 ~]$
```
4.11. Start the MR JobHistory Server
Run the JobHistory Server (MRJS) on dn1.hadoop: mr-jobhistory-daemon.sh start historyserver
```
# Before starting the MRJS
[hadoop@dn1 ~]$ jps
14625 Jps
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
[hadoop@dn1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/share/hadoop-2.6.0/logs/mapred-hadoop-historyserver-dn1.hadoop.out
# After starting the MRJS
[hadoop@dn1 ~]$ jps
14745 JobHistoryServer
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
14786 Jps
```
4.12. Verify that NameNode and ResourceManager HA take effect
Kill the relevant processes on the current active node and watch the other nodes' states switch over; a sketch follows.
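A sketch of one way to run this check; the PID comes from your own jps output, and <namenode-pid> is a placeholder:
```bash
# On the active NN (nn1.hadoop):
jps                                 # note the NameNode PID
kill -9 <namenode-pid>              # simulate a NameNode crash
hdfs haadmin -getServiceState nn2   # should now report "active"

# Same idea for YARN: kill the active ResourceManager, then check
yarn rmadmin -getServiceState rm2
```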
4.13. Verify the transparency of NN HA
Check that hdfs dfs -ls / and hdfs dfs -ls hdfs://mycluster/ produce identical results:
```
[hadoop@nn1 ~]$ hdfs dfs -ls /
Found 2 items
drwx------   - hadoop supergroup          0 2015-02-02 23:42 /tmp
drwxr-xr-x   - hadoop supergroup          0 2015-02-02 23:39 /user
[hadoop@nn1 ~]$ hdfs dfs -ls hdfs://mycluster/
Found 2 items
drwx------   - hadoop supergroup          0 2015-02-02 23:42 hdfs://mycluster/tmp
drwxr-xr-x   - hadoop supergroup          0 2015-02-02 23:39 hdfs://mycluster/user
[hadoop@nn1 ~]$
```
[5] Running the wordcount demo
The demo itself is the same as the wordcount walkthrough in the earlier article "hadoop 2.2.0 cluster-mode installation, configuration and testing", so it is not repeated here; a short sketch is given below.
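For completeness, a minimal sketch, assuming some local text files are available to upload; the examples jar ships with the 2.6.0 distribution:
```bash
# Upload some input text
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put ./*.txt /user/hadoop/input
# Run the bundled wordcount example
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
  wordcount /user/hadoop/input /user/hadoop/output
# Inspect the result
hdfs dfs -cat /user/hadoop/output/part-r-00000
```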