Tools:
- VMware® Workstation 16 Pro
- hadoop-3.3.1.tar.gz
- jdk-8u202-linux-x64.tar.gz
- CentOS-7-x86_64-DVD-2009.iso
- Xftp-8.0.0057p.exe
- Xshell-8.0.0057p.exe
- Create the virtual machine
Tips:
1. Network & Host Name -> enable the network and set the host name to hadoop1
2. Software Selection -> GNOME Desktop
3. Add a hadoop user with password hadoop
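If the hadoop user was not added in the installer, it can also be created later from a root shell; a minimal sketch using standard CentOS commands:
useradd hadoop
echo "hadoop" | passwd --stdin hadoop    # --stdin is a CentOS/RHEL passwd option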
- Clone the virtual machine
Use full clones (clone hadoop1 twice, as hadoop2 and hadoop3)
- Configure the virtual machines
(1) Change the host names
hostnamectl set-hostname hadoop2 //on the hadoop2 clone
bash
hostnamectl set-hostname hadoop3 //on the hadoop3 clone
bash
(2) Map host names to IP addresses
ip a //check the current IP address
vi /etc/hosts //edit on all three hosts; use the IP addresses of your own machines
192.168.1.132 hadoop1
192.168.1.133 hadoop2
192.168.1.134 hadoop3
vi editor tips:
o    open a new line below the cursor and enter insert mode
Esc  leave insert mode
:wq  write the file and quit
(3) Configure a static IP
1. Edit the network interface configuration
su
vi /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO="static"
ONBOOT="yes"
IPADDR="192.168.1.132"
NETMASK="255.255.255.0"
GATEWAY="192.168.1.2"
DNS1="8.8.8.8"
Tips: hadoop1, hadoop2, and hadoop3 each use a different IPADDR
2. Restart the network service
systemctl restart network
ip a //verify the new IP address
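After the network service has been restarted on all three machines, a quick check from hadoop1 that the static IPs and the /etc/hosts mapping both work (host names as configured above):
ping -c 3 hadoop2    # should resolve to 192.168.1.133 and get replies
ping -c 3 hadoop3    # should resolve to 192.168.1.134 and get replies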
(4) Passwordless SSH
ssh-keygen -t rsa //generate a key pair on all three hosts
ssh-copy-id hadoop1 //run these three ssh-copy-id commands on the hadoop1 node
ssh-copy-id hadoop2
ssh-copy-id hadoop3
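To confirm that passwordless login works, each of the following should print the remote host name without asking for a password (run on hadoop1):
ssh hadoop2 hostname
ssh hadoop3 hostname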
- Install the JDK
(1) Create directories (on all three nodes)
mkdir -p /export/data
mkdir -p /export/servers
mkdir -p /export/software
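The same three directories can also be created with a single brace-expansion command:
mkdir -p /export/{data,servers,software}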
(2) Upload the JDK archive
Use Xftp to upload the file to hadoop1
(3) Extract and set environment variables
cd /root
tar -xvzf jdk-8u202-linux-x64.tar.gz -C /export/servers/
vi /etc/profile
export JAVA_HOME=/export/servers/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
java -version
(4) Distribute to the other nodes
scp -r /export/servers/jdk1.8.0_202 root@hadoop2:/export/servers/
scp -r /export/servers/jdk1.8.0_202 root@hadoop3:/export/servers/
scp /etc/profile root@hadoop2:/etc/
scp /etc/profile root@hadoop3:/etc/
source /etc/profile //run on hadoop2 and hadoop3
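A quick way to confirm the JDK and profile reached the other nodes (run from hadoop1; each command should print the remote node's Java version):
ssh hadoop2 "source /etc/profile && java -version"
ssh hadoop3 "source /etc/profile && java -version"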
- Install Hadoop
(1) Upload the Hadoop archive to hadoop1
(2) Extract and set environment variables
cd /root
tar -zxvf /root/hadoop-3.3.1.tar.gz -C /export/servers
vi /etc/profile
export HADOOP_HOME=/export/servers/hadoop-3.3.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
hadoop version
(3) Edit the configuration files
cd /export/servers/hadoop-3.3.1/etc/hadoop
vi hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_202
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
vi core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/export/data/hadoop-3.3.1</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
vi hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop2:9868</value>
</property>
vi mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop1:19888</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
vi yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://hadoop1:19888/jobhistory/logs</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
vi workers
hadoop2
hadoop3
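Note that the <property> blocks above go inside the existing <configuration>...</configuration> element of each file. Before distributing, the edited files can optionally be checked for XML well-formedness (assumes xmllint from libxml2, normally present on CentOS 7):
cd /export/servers/hadoop-3.3.1/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml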
(4) Distribute to the other nodes
scp -r /export/servers/hadoop-3.3.1 root@hadoop2:/export/servers/
scp -r /export/servers/hadoop-3.3.1 root@hadoop3:/export/servers/
scp /etc/profile root@hadoop2:/etc
scp /etc/profile root@hadoop3:/etc
source /etc/profile //required on both hadoop2 and hadoop3
(5) Format HDFS
hdfs namenode -format //run once, on hadoop1 only
If HDFS is formatted more than once, fix the clusterID so that the DataNodes' clusterID matches the NameNode's.
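With hadoop.tmp.dir set to /export/data/hadoop-3.3.1 as in core-site.xml above, the clusterID values should be in VERSION files at roughly these paths (a sketch; exact locations depend on the name/data directory settings):
cat /export/data/hadoop-3.3.1/dfs/name/current/VERSION    # NameNode clusterID, on hadoop1
cat /export/data/hadoop-3.3.1/dfs/data/current/VERSION    # DataNode clusterID, on hadoop2/hadoop3
# Either copy the NameNode clusterID into each DataNode VERSION file,
# or delete /export/data/hadoop-3.3.1 on every node and format again.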
(6) Start Hadoop
Option 1:
start-all.sh //run on hadoop1; starts both HDFS and YARN
Option 2:
start-dfs.sh //run on hadoop1; starts HDFS
(hadoop1 processes: NameNode
hadoop2 processes: DataNode, SecondaryNameNode
hadoop3 processes: DataNode)
start-yarn.sh //run on hadoop1; starts YARN
(hadoop1 processes: ResourceManager
hadoop2 processes: NodeManager
hadoop3 processes: NodeManager)
(7) Check the running Hadoop processes
jps
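jps has to be run on every node; from hadoop1 the whole cluster can be checked in one loop (a convenience sketch, assuming passwordless SSH and the environment set up above):
for h in hadoop1 hadoop2 hadoop3; do echo "== $h =="; ssh $h "source /etc/profile && jps"; done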
(8) Check the cluster status in the web UI
1. Disable the firewall
systemctl status firewalld
systemctl stop firewalld
systemctl disable firewalld
If HDFS has entered safe mode, leave it with:
hdfs dfsadmin -safemode leave
Open in a browser: http://192.168.1.132:9870
- Word count
(1) Prepare the text data
vi word.txt
hello world
hello hadoop
hello hdfs
hello yarn
(2) Create the input directory in HDFS
hdfs dfs -mkdir -p /wordcount/input
(3) Upload the file to HDFS
hdfs dfs -put /root/word.txt /wordcount/input
(4) Verify that the file was uploaded
http://192.168.1.132:9870
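The upload can also be checked from the command line:
hdfs dfs -ls /wordcount/input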
(5) Run the MapReduce example job
cd /export/servers/hadoop-3.3.1/share/hadoop/mapreduce/
ll
hadoop jar hadoop-mapreduce-examples-3.3.1.jar wordcount /wordcount/input/ /wordcount/output
hdfs dfsadmin -safemode leave //only needed if HDFS is still in safe mode
(6) Check the job status and the result
http://192.168.1.132:8088
hdfs dfs -cat /wordcount/output/part-r-00000
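With the sample word.txt above, the result should look roughly like this (word and count separated by a tab, keys sorted alphabetically):
hadoop  1
hdfs    1
hello   4
world   1
yarn    1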