Prepare the installation packages
Hadoop package: hadoop-3.3.2.tar.gz
JDK package: jdk-8u192-linux-x64.tar.gz
VM OS image: ubuntu-22.04.3-desktop-amd64.iso
Install the virtual machines
- Install the virtual machines with KVM/QEMU. First make sure the CPU supports hardware virtualization (Intel VT-x or AMD-V):
root@hjwang-X10DRi:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Address sizes:         46 bits physical, 48 bits virtual
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Vendor ID:             GenuineIntel
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
CPU family:            6
Model:                 79
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
Stepping:              1
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4399.71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
Virtualization:        VT-x
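A quicker check for the same information: any non-zero count from the following grep means the CPU exposes the vmx (Intel VT-x) or svm (AMD-V) flag.
# 0 means hardware virtualization is unavailable or disabled in the BIOS
grep -Ec '(vmx|svm)' /proc/cpuinfo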
- Install KVM and QEMU:
sudo apt update
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
- Configure KVM
Add the current user to the kvm and libvirt groups so that sudo is not required when using KVM:
sudo usermod -aG kvm $USER
sudo usermod -aG libvirt $USER
- Start and enable the libvirtd service
sudo systemctl start libvirtd
sudo systemctl enable libvirtd
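Before creating any VMs, it is worth confirming that the service is running and that the group changes took effect (log out and back in first, since group membership is only re-read at login):
systemctl is-active libvirtd   # should print "active"
groups                         # should now list kvm and libvirt
virsh list --all               # should connect without sudo and show an empty VM list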
- Create the virtual machines with virt-manager
Open the virt-manager tool in the desktop session:
virt-manager
- Click File → New Virtual Machine in the top-left corner.
- Select the OS image.
- Set the memory and CPU count.
- Choose the storage location.
- Click Finish to start installing the guest OS.
- OS installation is not covered in detail here. Ideally set the username to the one the Hadoop deployment will use; if not, a user can be added later (this guide adds it afterwards).
- Create three VMs this way. Once they are installed, change each VM's IP address to a static one (see the nmcli sketch below) and record all three addresses.
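Ubuntu 22.04 desktop manages the network with NetworkManager, so one way to pin a static address is nmcli. The sketch below is illustrative only: the connection name is a placeholder (list yours with `nmcli connection show`), and the gateway/DNS values assume libvirt's default NAT network at 192.168.122.1; adjust the address per VM.
# Hypothetical connection name; run `nmcli connection show` to find the real one
CON="Wired connection 1"
sudo nmcli connection modify "$CON" \
    ipv4.method manual \
    ipv4.addresses 192.168.122.129/24 \
    ipv4.gateway 192.168.122.1 \
    ipv4.dns 192.168.122.1
sudo nmcli connection up "$CON"   # re-activate to apply the change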
- Edit the hostname and hosts files so the nodes can reach one another by name
vim /etc/hostname
Set the three machines to hadoop001, hadoop002, and hadoop003 respectively.
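Alternatively, hostnamectl writes /etc/hostname and applies the name in one step; run it on each VM with that VM's own name:
sudo hostnamectl set-hostname hadoop001   # hadoop002 / hadoop003 on the other VMs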
vim /etc/hosts
Add the IP address and hostname of every VM to the file:
192.168.122.129 hadoop001
192.168.122.234 hadoop002
192.168.122.151 hadoop003
Test from each of the three VMs that the others can be pinged:
ping hadoop001
ping hadoop002
ping hadoop003
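A small loop performs the same round of checks in one go (a minimal sketch; run it on every VM):
for h in hadoop001 hadoop002 hadoop003; do
    ping -c 2 "$h"
done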
- Create a hadoop user on each of the three VMs for the Hadoop deployment (skip this step if the user already exists):
sudo adduser hadoop
Switch to the hadoop user:
su - hadoop
If you have been working as root, grant the hadoop user ownership of the Hadoop directory:
root@hadoop002:/opt/hadoop-3.3.2/etc/hadoop# chown -R hadoop:hadoop /opt/hadoop-3.3.2/
- Install the SSH service
Install openssh-server, restart the service, and disable the firewall:
sudo apt install openssh-server
sudo /etc/init.d/ssh restart
sudo ufw disable
- Set up passwordless login between the three VMs
Generate an SSH key pair on each VM and distribute the public key to the other machines:
ssh localhost
cd ~/.ssh
ssh-keygen -t rsa
Press Enter to accept all the defaults.
Distribute the key to the other nodes, entering each VM's password when prompted:
ssh-copy-id hadoop001
ssh-copy-id hadoop002
ssh-copy-id hadoop003
Ubuntu does not enable the root account by default; it can be enabled by setting a root password:
sudo passwd root
The SSH service must also permit remote root login:
edit the PermitRootLogin entry in /etc/ssh/sshd_config, adding the line if it is missing:
PermitRootLogin yes
Restart the SSH service:
sudo systemctl restart sshd.service
Test from each VM that the other two can be reached without a password:
ssh hadoop002
ssh hadoop003
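To verify every pair at once, a non-interactive loop is handy; BatchMode makes ssh fail instead of prompting, so any remaining password prompt shows up as an error (run from each VM):
for h in hadoop001 hadoop002 hadoop003; do
    ssh -o BatchMode=yes "$h" hostname
done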
Install the JDK and Hadoop
Upload the JDK and Hadoop packages to each VM.
For JDK installation, see https://blog.youkuaiyun.com/rachelfffy/article/details/140952136?spm=1001.2014.3001.5501
Hadoop installation:
sudo tar -zxvf hadoop-3.3.2.tar.gz -C /opt
Configure the Java environment for Hadoop by opening etc/hadoop/hadoop-env.sh under the Hadoop installation directory:
sudo vim /opt/hadoop-3.3.2/etc/hadoop/hadoop-env.sh
Find JAVA_HOME and set it to your own JDK path:
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=
export JAVA_HOME=/usr/local/jdk1.8.0_192
Run the hadoop command from the installation directory to confirm the install succeeded:
hadoop@hadoop002:/opt/hadoop-3.3.2/etc/hadoop$ /opt/hadoop-3.3.2/bin/hadoop version
Hadoop 3.3.2
Source code repository git@github.com:apache/hadoop.git -r 0bcb014209e219273cb6fd4152df7df713cbac61
Compiled by chao on 2022-02-21T18:39Z
Compiled with protoc 3.7.1
From source with checksum 4b40fff8bb27201ba07b6fa5651217fb
This command was run using /opt/hadoop-3.3.2/share/hadoop/common/hadoop-common-3.3.2.jar
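Optionally, putting Hadoop on PATH saves typing full paths in the steps below (a sketch assuming the install locations used in this guide; the rest of this guide keeps using full paths):
# Append to /etc/profile (or ~/.bashrc) on every node, then source the file
export JAVA_HOME=/usr/local/jdk1.8.0_192
export HADOOP_HOME=/opt/hadoop-3.3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin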
Hadoop configuration
On hadoop001, go to etc/hadoop under the Hadoop installation directory and edit the configuration files according to the following role layout:
|      | hadoop001          | hadoop002                    | hadoop003                   |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
core-site.xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.3.2/data</value>
    </property>
    <!-- Static user for the Hadoop web UIs -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop001:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop003:9868</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop002</value>
        <description>resourcemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>
            JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME
        </value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log aggregation server address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop001:19888/jobhistory/logs</value>
    </property>
    <!-- Retain logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <!-- Site specific YARN configuration properties -->
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce runtime environment; the path can be printed by running `hadoop classpath` in a terminal -->
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>
            HADOOP_MAPRED_HOME=/opt/hadoop-3.3.2/etc/hadoop:/opt/hadoop-3.3.2/share/hadoop/common/lib/*:/opt/hadoop-3.3.2/share/hadoop/common/*:/opt/hadoop-3.3.2/share/hadoop/hdfs:/opt/hadoop-3.3.2/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.2/share/hadoop/hdfs/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/*:/opt/hadoop-3.3.2/share/hadoop/yarn:/opt/hadoop-3.3.2/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.2/share/hadoop/yarn/*
        </value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>
            HADOOP_MAPRED_HOME=/opt/hadoop-3.3.2/etc/hadoop:/opt/hadoop-3.3.2/share/hadoop/common/lib/*:/opt/hadoop-3.3.2/share/hadoop/common/*:/opt/hadoop-3.3.2/share/hadoop/hdfs:/opt/hadoop-3.3.2/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.2/share/hadoop/hdfs/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/*:/opt/hadoop-3.3.2/share/hadoop/yarn:/opt/hadoop-3.3.2/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.2/share/hadoop/yarn/*
        </value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>
            HADOOP_MAPRED_HOME=/opt/hadoop-3.3.2/etc/hadoop:/opt/hadoop-3.3.2/share/hadoop/common/lib/*:/opt/hadoop-3.3.2/share/hadoop/common/*:/opt/hadoop-3.3.2/share/hadoop/hdfs:/opt/hadoop-3.3.2/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.2/share/hadoop/hdfs/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.3.2/share/hadoop/mapreduce/*:/opt/hadoop-3.3.2/share/hadoop/yarn:/opt/hadoop-3.3.2/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.2/share/hadoop/yarn/*
        </value>
    </property>
</configuration>
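The long classpath value above does not need to be typed by hand; as the comment notes, it can be printed and pasted in:
# Prints the classpath to use as the HADOOP_MAPRED_HOME value
/opt/hadoop-3.3.2/bin/hadoop classpath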
workers
hadoop001
hadoop002
hadoop003
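The file can also be written in one command. Note that the stock workers file ships with a single localhost line, which should be replaced rather than appended to (a sketch; run on hadoop001 before distributing the configuration):
cat > /opt/hadoop-3.3.2/etc/hadoop/workers <<'EOF'
hadoop001
hadoop002
hadoop003
EOF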
Set the user each Hadoop daemon runs as
vim /etc/profile
Add the following:
export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_SECONDARYNAMENODE_USER=hadoop
export YARN_RESOURCEMANAGER_USER=hadoop
export YARN_NODEMANAGER_USER=hadoop
Apply the changes:
source /etc/profile
Create the xsync distribution script
cd /usr/local/jdk1.8.0_192/bin
sudo vim xsync
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not enough arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop001 hadoop002 hadoop003
do
    echo ==================== $host ====================
    # 3. Send each file or directory in turn
    for file in "$@"
    do
        # 4. Check that the file exists
        if [ -e "$file" ]
        then
            # 5. Get the parent directory
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the file name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
Grant execute permission:
sudo chmod a+x xsync
Distribute the Hadoop configuration. Sync only the configuration directory; leave the rest alone, since each VM keeps its own IDs in its other files.
xsync /opt/hadoop-3.3.2/etc/hadoop
Format the NameNode on the NameNode host (hadoop001):
hadoop@hadoop001:~$ /opt/hadoop-3.3.2/bin/hdfs namenode -format
Start HDFS on hadoop001:
hadoop@hadoop001:~$ /opt/hadoop-3.3.2/sbin/start-dfs.sh
Starting namenodes on [hadoop001]
Starting datanodes
hadoop003: WARNING: /opt/hadoop-3.3.2/logs does not exist. Creating.
hadoop002: WARNING: /opt/hadoop-3.3.2/logs does not exist. Creating.
Starting secondary namenodes [hadoop003]
Start YARN on hadoop002:
hadoop@hadoop002:~$ /opt/hadoop-3.3.2/sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Use jps to check the Hadoop processes on each node:
hadoop001
hadoop@hadoop001:~$ jps
13200 NodeManager
13301 Jps
12777 NameNode
12906 DataNode
hadoop002
hadoop@hadoop002:~$ jps
10773 NodeManager
11094 Jps
10493 DataNode
10639 ResourceManager
hadoop003
hadoop@hadoop003:~$ jps
11157 NodeManager
10922 DataNode
11260 Jps
11052 SecondaryNameNode
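As a final smoke test, the examples jar bundled with the distribution can run a small MapReduce job end to end (the jar name assumes version 3.3.2; run it on any node):
/opt/hadoop-3.3.2/bin/hadoop jar \
    /opt/hadoop-3.3.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar pi 2 10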
On hadoop001, open http://hadoop001:9870 in a browser to view the HDFS web UI.
The deployment is complete.
Reference: https://blog.youkuaiyun.com/wurobb/article/details/134281489