1 Preparation
1.1 Virtual machine plan
- OS version: CentOS Linux release 7.6.1810
- Three virtual machines installed in VMware
192.168.159.133(linux-01.potato.com) NameNode DataNode ResourceManager NodeManager
192.168.159.128(linux-02.potato.com) SecondaryNameNode DataNode NodeManager
192.168.159.131(linux-03.potato.com) DataNode NodeManager
1.2 User
- Hadoop should not be started as root, so create a dedicated startup user; this guide uses dehuab as the startup user
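- A minimal sketch of creating the user on every node (run as root); adding the user to the wheel group for sudo is an assumption and optional:
useradd dehuab
passwd dehuab    # set the login password interactively
usermod -aG wheel dehuab    # optional: allow sudo via the wheel group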
1.3 Passwordless SSH login
- Configure passwordless login among the three VMs so later cluster operations do not keep prompting for passwords
- Run the following as the dehuab user on the NameNode server (linux-01.potato.com)
ssh-keygen -t rsa (press Enter three times)
ssh-copy-id -i ~/.ssh/id_rsa.pub dehuab@linux-01.potato.com (the key must also be copied to the node itself)
ssh-copy-id -i ~/.ssh/id_rsa.pub dehuab@linux-02.potato.com
ssh-copy-id -i ~/.ssh/id_rsa.pub dehuab@linux-03.potato.com
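- A quick check; each command should print the remote hostname without asking for a password:
ssh dehuab@linux-01.potato.com hostname
ssh dehuab@linux-02.potato.com hostname
ssh dehuab@linux-03.potato.com hostname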
1.4 JDK
- Version 1.8.0_181
- Extract it to /usr/local/jdk (see the sketch below)
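- A minimal extraction sketch; the tarball name jdk-8u181-linux-x64.tar.gz and its location /opt are assumptions:
sudo mkdir -p /usr/local/jdk
sudo tar -zxf /opt/jdk-8u181-linux-x64.tar.gz -C /usr/local/jdk --strip-components=1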
1.5 Hadoop
- Version Hadoop 3.2.0
- Extract it to /usr/local/hadoop (see the sketch after this list)
- Create the Hadoop data directory /usr/local/hadoop-data
- Set the owner of both directories to the startup user dehuab
sudo chown -R dehuab:dehuab /usr/local/hadoop
sudo chown -R dehuab:dehuab /usr/local/hadoop-data
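- A minimal extraction sketch; the tarball name hadoop-3.2.0.tar.gz and its location /opt are assumptions:
sudo mkdir -p /usr/local/hadoop /usr/local/hadoop-data
sudo tar -zxf /opt/hadoop-3.2.0.tar.gz -C /usr/local/hadoop --strip-components=1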
1.6 Configure hosts
- Add the following entries to /etc/hosts (edit on linux-01; section 2.7 copies the file to the other nodes)
192.168.159.133 linux-01.potato.com
192.168.159.128 linux-02.potato.com
192.168.159.131 linux-03.potato.com
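- One convenient way to append the entries (run as root on linux-01); the heredoc form is just a convenience:
cat >> /etc/hosts <<EOF
192.168.159.133 linux-01.potato.com
192.168.159.128 linux-02.potato.com
192.168.159.131 linux-03.potato.com
EOF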
1.7 Firewall
- Stop firewalld: systemctl stop firewalld.service
- Disable firewalld at boot: systemctl disable firewalld.service
- Check the firewall state (shows not running when stopped, running when active): firewall-cmd --state
1.8 Environment variables
- Append the following to /etc/profile
JAVA_HOME=/usr/local/jdk
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
- Apply the environment variables: source /etc/profile
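- After sourcing the profile, both tools should resolve through the new PATH:
java -version
hadoop version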
2 Configuration files
- Edit the files on the NameNode machine (under /usr/local/hadoop/etc/hadoop); the other machines receive copies via scp in 2.7
2.1 hadoop-env.sh
- Re-declare JAVA_HOME explicitly in hadoop-env.sh (daemons started over ssh do not source /etc/profile, so Hadoop needs it set here)
export JAVA_HOME=/usr/local/jdk
2.2 hdfs-site.xml
<configuration>
<!-- Block replication factor; the default is 3 -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- HDFS permission checking; the default is true -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>linux-01.potato.com:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>linux-02.potato.com:50090</value>
</property>
</configuration>
2.3 core-site.xml
<configuration>
<!-- Address of the HDFS master (NameNode); 9000 is the RPC port -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://linux-01.potato.com:9000</value>
</property>
<!-- Directory where HDFS stores its blocks and metadata; be sure to change this from the default -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-data</value>
</property>
</configuration>
2.4 mapred-site.xml (create the file if it does not exist)
<configuration>
<!-- Framework that MapReduce jobs run on -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
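- On Hadoop 3.x, MapReduce jobs submitted to YARN often fail with classpath errors unless HADOOP_MAPRED_HOME is passed to the job; the properties below are a commonly used addition to this same file, shown here as a sketch to adapt rather than as a required part of the minimal setup:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>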
2.5 yarn-site.xml
<configuration>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>linux-01.potato.com</value>
</property>
<!-- Auxiliary service the NodeManager runs so MapReduce can shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
2.6 workers
- This file lists the hostnames of all worker (DataNode/NodeManager) nodes
- Before Hadoop 3.0 the file was named slaves; from Hadoop 3.0 onward it is named workers
linux-01.potato.com
linux-02.potato.com
linux-03.potato.com
2.7 Copy with scp
scp -r /etc/hosts root@linux-02.potato.com:/etc/hosts
scp -r /etc/profile root@linux-02.potato.com:/etc/profile
scp -r /usr/local/hadoop/ root@linux-02.potato.com:/usr/local/
scp -r /usr/local/jdk/ root@linux-02.potato.com:/usr/local/
scp -r /etc/hosts root@linux-03.potato.com:/etc/hosts
scp -r /etc/profile root@linux-03.potato.com:/etc/profile
scp -r /usr/local/hadoop/ root@linux-03.potato.com:/usr/local/
scp -r /usr/local/jdk/ root@linux-03.potato.com:/usr/local/
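- /usr/local/hadoop-data so far exists only on linux-01; one way to create it and fix ownership on the other two nodes, run from linux-01 and assuming root ssh access as used by the scp commands above:
ssh root@linux-02.potato.com "mkdir -p /usr/local/hadoop-data && chown -R dehuab:dehuab /usr/local/hadoop /usr/local/hadoop-data"
ssh root@linux-03.potato.com "mkdir -p /usr/local/hadoop-data && chown -R dehuab:dehuab /usr/local/hadoop /usr/local/hadoop-data"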
3 Format the HDFS NameNode
- Run on linux-01.potato.com as the dehuab user:
hdfs namenode -format
Success marker: Storage directory /usr/local/hadoop-data/dfs/name has been successfully formatted.
4 Start the services
- Start everything with start-all.sh (once the environment variables are in effect, start-all.sh can be run from any directory)
- Check the running processes with jps
Verify the five daemons (sample jps output; per the plan in 1.1, SecondaryNameNode runs on linux-02 while the others appear on linux-01):
5022 NameNode
5314 SecondaryNameNode
5586 NodeManager
5476 ResourceManager
5126 DataNode
YARN: http://linux-01.potato.com:8088
HDFS: http://linux-01.potato.com:50070
- Startup log: /usr/local/hadoop/logs/hadoop-dehuab-datanode-linux-01.potato.com.log (open it with less and press Shift+G to jump to the end)
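- A quick smoke test once the daemons are up (run as dehuab; the examples jar path follows the Hadoop 3.2.0 layout and the pi arguments are arbitrary):
hdfs dfsadmin -report
hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -ls /
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10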