Hadoop 3.3.6 Cluster Setup

1. Prerequisites

- Servers: 3 machines (1 master, 2 workers), CentOS 7
| IP | hostname | Role |
|---|---|---|
| 192.168.108.137 | centos137 | master |
| 192.168.108.138 | centos138 | node |
| 192.168.108.139 | centos139 | node |

The three servers must be able to reach each other by hostname.
```bash
# Set the hostname (adjust per machine)
hostnamectl set-hostname centos137
# On all three machines, append the following entries to /etc/hosts
192.168.108.137 centos137
192.168.108.138 centos138
192.168.108.139 centos139
# Reboot
reboot
```
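The hosts entries can be generated in one place and appended on each node with `>> /etc/hosts`. A minimal sketch; the `hosts_entries` function name is purely illustrative:

```shell
# Hypothetical helper: emit the /etc/hosts entries for the three nodes,
# e.g. `hosts_entries >> /etc/hosts` (run as root on each machine).
hosts_entries() {
cat <<'EOF'
192.168.108.137 centos137
192.168.108.138 centos138
192.168.108.139 centos139
EOF
}
hosts_entries
```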
- Hadoop cluster (version 2.2+) with the HDFS service installed
- JDK 1.8+ (installing the JDK yourself is recommended; the JAVA_HOME environment variable must be set)

Hadoop 3.3.6 is used in this guide.
2. Role Assignment

Node role assignment:

| Node | IP | NN | SNN | DN | RM | NM | HS |
|---|---|---|---|---|---|---|---|
| centos137 | 192.168.108.137 | √ | | √ | √ | √ | |
| centos138 | 192.168.108.138 | | √ | √ | | √ | √ |
| centos139 | 192.168.108.139 | | | √ | | √ | |
Role legend:

| HDFS | YARN | MapReduce |
|---|---|---|
| NameNode (NN) | ResourceManager (RM) | HistoryServer (HS) |
| SecondaryNameNode (SNN) | NodeManager (NM) | |
| DataNode (DN) | | |
Default component ports:

| Component | Port | Description |
|---|---|---|
| HDFS | 8020 | NameNode RPC |
| HDFS | 9870 | NameNode web UI |
| HDFS | 9866, 9867, 9864 | DataNode data transfer / IPC / web UI (50010, 50020, 50075 before Hadoop 3) |
| YARN | 8032 | ResourceManager RPC |
| YARN | 8088 | ResourceManager web UI |
| YARN | 8040 | NodeManager localizer |
| YARN | 8042 | NodeManager web UI |
| MapReduce | 10020 | HistoryServer RPC |
| MapReduce | 19888 | HistoryServer web UI |
| Hadoop Common | 49152-65535 | Inter-process communication |
| ZooKeeper | 2181 | Coordination service for the Hadoop cluster |
| HDFS | 8019 | ZKFC RPC (HA failover controller) |
Installation package (domestic mirror): Index of /apache/hadoop/common/hadoop-3.3.6 (tsinghua.edu.cn)
2.1 Software Installation (all servers)

Passwordless SSH login

```bash
# As root, create the hadoop user (ignore any password-too-short warning)
useradd hadoop
passwd hadoop
# Switch to the hadoop user
su hadoop
# SSH to localhost once (entering the password) so that ~/.ssh is created
ssh localhost
# Generate a key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
```
Distribute the public key (run on the master node; the argument is the hostname of the node you want passwordless access to):

```bash
ssh-copy-id centos137
ssh-copy-id centos138
ssh-copy-id centos139
```

Verify that centos137 can log in to each node without a password, e.g.:

```bash
ssh centos138
```
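The three `ssh-copy-id` calls can also be driven by a loop. A dry-run sketch that only prints the commands; drop the `echo` to actually distribute the key:

```shell
# Dry-run sketch: print the ssh-copy-id command for each node.
NODES="centos137 centos138 centos139"
for h in $NODES; do
    echo ssh-copy-id "$h"
done
```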
On every machine, create an export directory under the root with data, servers, and software subdirectories:

```bash
mkdir -p /export/data
mkdir -p /export/servers
mkdir -p /export/software
```
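The three `mkdir` calls can be collapsed into a loop. The sketch below uses a temporary directory instead of `/export` so it is safe to run anywhere; set `BASE=/export` on the real servers:

```shell
# Loop form of the three mkdir commands above.
BASE="$(mktemp -d)"   # BASE=/export on the real servers
for d in data servers software; do
    mkdir -p "$BASE/$d"
done
ls "$BASE"
```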
1. Extract the archive

```bash
tar -zxvf hadoop-3.3.6.tar.gz -C /export/servers/
```

2. Configure environment variables

```bash
vi /etc/profile
```

Append at the end of the file:

```bash
export HADOOP_HOME=/export/servers/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Make the variables take effect:

```bash
source /etc/profile
```
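To confirm the variables are in effect in the current shell, a quick check like the following can help (it re-exports the same values, so it is self-contained):

```shell
# Re-export the same variables as /etc/profile above, then check PATH.
export HADOOP_HOME=/export/servers/hadoop-3.3.6
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) echo "HADOOP_HOME/bin is on PATH" ;;
    *) echo "PATH is missing HADOOP_HOME/bin" ;;
esac
```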
3. Verify

```bash
[root@centos137 servers]# hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /export/servers/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar
```

**Note:** re-run `source /etc/profile` after switching users or shells so the variables stay in effect.
2.2 Master Node Configuration

Enter the Hadoop configuration directory:

```bash
cd /export/servers/hadoop-3.3.6/etc/hadoop
```

Edit hadoop-env.sh:

```bash
vim hadoop-env.sh
```

```bash
# Add JAVA_HOME
export JAVA_HOME=/export/servers/jdk
```

Edit workers:

```bash
vim workers
```

Add the three hostnames:

centos137
centos138
centos139

Resulting contents:

```bash
[hadoop@centos138 sbin]$ cat workers
centos137
centos138
centos139
```
Edit core-site.xml:

```bash
vim core-site.xml
```

- Configure the HDFS URI and the temporary directory
- Set the static user for HDFS web UI login

Add the following:

```xml
<configuration>
    <!-- HDFS NameNode URI -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos137:9000</value>
    </property>
    <!-- Temporary directory (default: /tmp/hadoop-${user.name}) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-3.3.6/tmp</value>
    </property>
    <!-- Static user for HDFS web UI login -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>
```
Edit hdfs-site.xml:

```bash
vim hdfs-site.xml
```

- Set the HDFS replication factor
- Configure the secondary NameNode address

Add the following:

```xml
<configuration>
    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Secondary NameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>centos138:50090</value>
    </property>
</configuration>
```
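A property value can be sanity-checked without extra tooling using `grep`/`sed`. The sketch below runs against an inline copy of the `dfs.replication` property above; on a real node, point `CONF` at `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` instead:

```shell
# Inline sample of the dfs.replication property; on a real node:
# CONF=$HADOOP_HOME/etc/hadoop/hdfs-site.xml
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
EOF
# Print the <value> on the line after the matching <name>.  # prints: 3
grep -A1 '<name>dfs.replication</name>' "$CONF" | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
```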
Edit mapred-site.xml:

```bash
vim mapred-site.xml
```

- Specify the framework MapReduce runs on: yarn here (the default is local)
- Configure the history server addresses

Add the following:

```xml
<configuration>
    <!-- MapReduce execution framework: yarn or local -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>centos138:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos138:19888</value>
    </property>
</configuration>
```
Edit yarn-site.xml:

- Set the address of the YARN ResourceManager
- Route MapReduce through the shuffle auxiliary service
- Enable log aggregation
- Set the log aggregation server address
- Keep aggregated logs for 7 days

```bash
vim yarn-site.xml
```

```xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>centos137</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server URL -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://centos138:19888/jobhistory/logs</value>
    </property>
    <!-- Log retention in seconds (7 days) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
```
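The retention value is easy to verify: 604800 seconds is exactly 7 days.

```shell
# 7 days * 24 hours * 3600 seconds = 604800
echo $((7 * 24 * 3600))   # -> 604800
```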
Directory ownership

Give the hadoop user ownership of the installation directory:

```bash
chown -R hadoop:hadoop /export/
```
2.3 File Distribution

Copy the master's configured installation to the worker nodes:

```bash
scp -r /export/servers centos138:/export
scp -r /export/servers centos139:/export
```
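With more workers, the copies can be looped. A dry-run sketch that only prints the scp commands; remove the `echo` to actually copy:

```shell
# Dry-run sketch: print the scp command for each worker node.
WORKERS="centos138 centos139"
for h in $WORKERS; do
    echo scp -r /export/servers "$h":/export
done
```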
2.4 Starting the Hadoop Cluster

Format the NameNode:

```bash
hdfs namenode -format
```

Formatting the NameNode generates a new cluster ID. If the DataNodes still hold a cluster ID from a previous format, it will not match the NameNode's new one and the DataNodes will fail to connect to the NameNode. So before reformatting, first delete the data and logs directories on every node, then format; normally this step is run only once, during initial setup.
Run on the master (centos137):

```bash
# Start the cluster
/export/servers/hadoop-3.3.6/sbin/start-all.sh
# Stop the cluster
/export/servers/hadoop-3.3.6/sbin/stop-all.sh
```

```bash
[hadoop@centos137 hadoop-3.3.6]$ /export/servers/hadoop-3.3.6/sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [centos137]
Starting datanodes
Starting secondary namenodes [centos138]
Starting resourcemanager
Starting nodemanagers
```

If the startup prints errors, check that the distributed directories are correct.
Start the history server (on the centos138 node):

```bash
mapred --daemon start historyserver
```

Alternatively, start and stop HDFS and YARN separately:

```bash
# Start
start-dfs.sh
start-yarn.sh
# Stop
stop-dfs.sh
stop-yarn.sh
```
2.5 Cluster Verification

- Run `jps` on every node and check that the expected HDFS/YARN roles are running.

centos137 roles: NN, DN, RM, NM

```bash
[hadoop@centos137 hadoop-3.3.6]$ jps
34082 ResourceManager
34228 NodeManager
33638 DataNode
33497 NameNode
37790 Jps
```
centos138 roles: SNN, DN, NM, HS

```bash
[hadoop@centos138 sbin]$ jps
32530 SecondaryNameNode
26679 JobHistoryServer
33321 Jps
32733 NodeManager
32383 DataNode
```
centos139 roles: NM, DN

```bash
[hadoop@centos139 hadoop]$ jps
51088 NodeManager
51685 Jps
50823 DataNode
```
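To compare a node's `jps` output against the role table mechanically, the process names can be extracted and joined. The sketch below uses a captured sample from centos139; on a live node, pipe `jps` in directly:

```shell
# Sample jps output from centos139 (captured above).
sample='51088 NodeManager
51685 Jps
50823 DataNode'
# Drop the Jps line, keep the process names, sort, and join with commas.
roles="$(printf '%s\n' "$sample" | awk '$2 != "Jps" {print $2}' | sort | tr '\n' ',' | sed 's/,$//')"
echo "$roles"   # -> DataNode,NodeManager
```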
Access the web UIs using the default port table above.
References:
https://blog.youkuaiyun.com/weixin_43655425/article/details/134751084
https://blog.youkuaiyun.com/tang5615/article/details/120382513