Setting up a fully distributed Hadoop cluster is straightforward. Let's walk through it together!
1. Environment
Hostname | IP | Role
---|---|---
master | 192.168.1.200 | ResourceManager/NameNode/SecondaryNameNode
slave1 | 192.168.1.201 | NodeManager/DataNode
slave2 | 192.168.1.202 | NodeManager/DataNode
2. Prerequisites
- Windows 10 (host OS)
- VMware Workstation 12 (virtualization software)
- CentOS 7.0 (guest OS)
- Hadoop 3.0.0
- JDK 1.8
- Xshell
3. Install the virtual machines
Create three CentOS 7 virtual machines in VMware (master, slave1, slave2) matching the table above.
4. Configure hosts and passwordless SSH login
Perform the following steps as the root user.
Configure hostname resolution:
vim /etc/hosts
192.168.1.200 master
192.168.1.201 slave1
192.168.1.202 slave2
Repeat this on every node. The change takes effect as soon as the file is saved; /etc/hosts is not a shell script, so there is no need to source it.
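Once the hosts file is in place on a node, a quick sanity check confirms it can resolve the other nodes by name:

```shell
# Ping each node once by hostname (hostnames as defined in /etc/hosts).
for host in master slave1 slave2; do
  ping -c 1 "$host"
done
```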
Disable the firewall:
systemctl stop firewalld.service      # stop firewalld now
systemctl disable firewalld.service   # keep firewalld from starting at boot
Disable SELinux:
vim /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted
(This change takes effect after a reboot; run setenforce 0 to switch to permissive mode immediately.)
Once these steps are done, create a dedicated user for running Hadoop, then switch to that user for everything that follows.
Configure passwordless SSH login:
ssh-keygen -t rsa
cd ~/.ssh/
cp id_rsa.pub authorized_keys
Collect the id_rsa.pub public key from every node into a single authorized_keys file, then copy that combined file over the authorized_keys on every node.
Before going any further, verify that you can log in to every other machine without being prompted for a password; do not continue until this works!
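The key exchange above can also be scripted. A minimal sketch using ssh-copy-id, which appends the local public key to the remote authorized_keys (you will be prompted for each node's password once):

```shell
# Run as the hadoop user on each node: generate a key pair without a
# passphrase, then push the public key to every node, including this one.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in master slave1 slave2; do
  ssh-copy-id "$host"
done
```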
5. Install the base environment
Install the JDK
If OpenJDK is preinstalled, remove it first.
mkdir /usr/local/jdk
cd /usr/local/jdk
Download jdk-8u141-linux-x64.tar.gz into this directory and extract it:
tar -zxvf jdk-8u141-linux-x64.tar.gz
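The OpenJDK check mentioned above can be done with rpm/yum on CentOS 7; the exact package names below are typical, but depend on what the installer preloaded:

```shell
# Run as root: list any preinstalled OpenJDK packages, then remove them.
rpm -qa | grep -i openjdk
yum -y remove "java-1.8.0-openjdk*" "java-1.7.0-openjdk*"
```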
Install Hadoop
Download hadoop-3.0.0-alpha4.tar.gz into /usr/local and extract it:
tar -zxvf hadoop-3.0.0-alpha4.tar.gz
Then add both the JDK and Hadoop to the environment:
vim /etc/profile
# oracle jdk start
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_141
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# oracle jdk end
#hadoop start
export HADOOP_HOME=/usr/local/hadoop-3.0.0-alpha4
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=1
#hadoop end
Save the file and run source /etc/profile to apply it.
I recommend installing the JDK and Hadoop under identical paths on every node; otherwise each node needs its own configuration, which quickly becomes tedious.
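After sourcing /etc/profile, both tools should resolve from the new paths; a quick check on each node:

```shell
# Confirm the environment variables took effect.
java -version     # should report 1.8.0_141
hadoop version    # should report 3.0.0-alpha4
echo "$JAVA_HOME" "$HADOOP_HOME"
```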
6. Configure Hadoop
Edit hadoop-env.sh and set the Java path:
vim /usr/local/hadoop-3.0.0-alpha4/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_141
Edit hdfs-site.xml and add the replication factor and storage directories:
vim /usr/local/hadoop-3.0.0-alpha4/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop-3.0.0-alpha4/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop-3.0.0-alpha4/hdfs/data</value>
</property>
</configuration>
Edit mapred-site.xml and add the MapReduce configuration:
vim /usr/local/hadoop-3.0.0-alpha4/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop-3.0.0-alpha4/etc/hadoop,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/common/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/common/lib/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/hdfs/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/hdfs/lib/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/mapreduce/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/yarn/*,
/usr/local/hadoop-3.0.0-alpha4/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
Edit the workers file and add the worker hostnames (earlier Hadoop releases used a file named slaves; Hadoop 3 renamed it to workers):
vim /usr/local/hadoop-3.0.0-alpha4/etc/hadoop/workers
slave1
slave2
Edit yarn-site.xml and add the shuffle service and ResourceManager addresses:
vim /usr/local/hadoop-3.0.0-alpha4/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
Copy the configured Hadoop directory to every worker node.
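A minimal way to do the copy with scp, assuming identical paths on every node as recommended above:

```shell
# Push the configured Hadoop tree and the profile to each worker
# (run as root, or adjust ownership afterwards).
for host in slave1 slave2; do
  scp -r /usr/local/hadoop-3.0.0-alpha4 "$host":/usr/local/
  scp /etc/profile "$host":/etc/profile
done
```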
7. Start the Hadoop daemons and check the processes
Format the NameNode:
cd /usr/local/hadoop-3.0.0-alpha4/bin
hdfs namenode -format
If the format step prints errors, it has failed; fix the cause before continuing (deprecation warnings on their own are usually harmless).
Start Hadoop:
cd /usr/local/hadoop-3.0.0-alpha4/sbin
./start-all.sh
Check the running Java processes:
/usr/local/jdk/jdk1.8.0_141/bin/jps
Check the cluster status:
hdfs dfsadmin -report
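As a final smoke test, the bundled MapReduce examples jar (path assumed to follow the 3.0.0-alpha4 release layout) can run a small job through YARN:

```shell
# Estimate pi with 2 map tasks of 10 samples each; a successful run means
# HDFS and YARN are working end to end.
hadoop jar /usr/local/hadoop-3.0.0-alpha4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar pi 2 10
```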
Summary
Directory layouts differ slightly between Hadoop releases, and so does functionality, so take care to match the instructions to the version you are actually running.