hadoop-3.2.1 Distributed Environment Setup from Scratch (Part 1)
Preface
After several years away, I have finally come back to learn Hadoop; the copy of "Hadoop: The Definitive Guide" I bought back then is no longer quite so definitive.
This article follows the official Hadoop documentation and walks through a basic deployment, from virtual machine configuration to a running Hadoop cluster.
If you spot any mistakes, please point them out.
Official documentation: http://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
Environment
hadoop-3.2.1.tar.gz
JDK 1.8.0_181 - jdk-8u181-linux-x64.tar.gz
CentOS 7 virtual machines
- 192.168.1.10 ip10.hadoop.com
- 192.168.1.11 ip11.hadoop.com
- 192.168.1.12 ip12.hadoop.com
- 192.168.1.13 ip13.hadoop.com
Install and deploy the first virtual machine - ip10.hadoop.com
Create a new virtual machine and perform the following steps on it.
- Configure this VM as 192.168.1.10 ip10.hadoop.com
For network configuration, see [Linux VM - Bridged Network - Simple Visual Configuration](https://blog.youkuaiyun.com/sinat_25528181/article/details/106327434)
- Configure the hostname
ip10.hadoop.com
- Configure /etc/hosts
192.168.1.10 ip10.hadoop.com
192.168.1.11 ip11.hadoop.com
192.168.1.12 ip12.hadoop.com
192.168.1.13 ip13.hadoop.com
- Create the user
#create the hadoop user
[root@ip10]# useradd hadoop
[root@ip10]# passwd hadoop #set a password (here the same as the username)
- Extract the JDK to /home/hadoop/jdk1.8.0_181
- Extract Hadoop to /home/hadoop/hadoop-3.2.1
[root@ip10]# tar -xvf jdk-8u181-linux-x64.tar.gz -C /home/hadoop/
[root@ip10]# tar -xvf hadoop-3.2.1.tar.gz -C /home/hadoop/
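Note that extracting as root leaves the directories under /home/hadoop owned by root, while later steps edit files there as the hadoop user, so ownership needs to be handed over (a small sketch, assuming the paths above):
#give the hadoop user ownership of the extracted directories
[root@ip10]# chown -R hadoop:hadoop /home/hadoop/jdk1.8.0_181 /home/hadoop/hadoop-3.2.1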
#add environment variables
[root@ip10]# vim /etc/profile
###########
export JAVA_HOME=/home/hadoop/jdk1.8.0_181
export HADOOP_HOME=/home/hadoop/hadoop-3.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
###########
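Changes to /etc/profile only apply to new login shells; a minimal sketch for loading them into the current session and sanity-checking both tools:
#load the new variables and check that java and hadoop resolve
[root@ip10]# source /etc/profile
[root@ip10]# java -version
[root@ip10]# hadoop version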
Hadoop configuration
Set HADOOP_PID_DIR
#set HADOOP_PID_DIR (the default is under /tmp)
#the log directory is left unset and keeps its default of $HADOOP_HOME/logs
[hadoop@ip10 hadoop]$ echo 'export HADOOP_PID_DIR=$HADOOP_HOME/pid/' >> /home/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
Set JAVA_HOME
This must be set here. I originally thought exporting it in /etc/profile would be enough, but it turned out not to be: JAVA_HOME has to be set in hadoop-env.sh itself, because the start scripts launch the daemons over ssh in non-interactive shells that do not source /etc/profile.
#set JAVA_HOME in hadoop-env.sh
[hadoop@ip10 hadoop]$ echo 'export JAVA_HOME=/home/hadoop/jdk1.8.0_181' >> /home/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
Configure core-site.xml
[hadoop@ip10 ~]$ mkdir -p /home/hadoop/workdata/hadoop/tmp
[hadoop@ip10 hadoop]$ vim /home/hadoop/hadoop-3.2.1/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ip10.hadoop.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/workdata/hadoop/tmp</value>
    </property>
</configuration>
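To confirm that Hadoop actually picks up the new values, hdfs getconf can read a property back (assuming the environment variables from /etc/profile are loaded in this shell):
#should print hdfs://ip10.hadoop.com:9000
[hadoop@ip10 hadoop]$ hdfs getconf -confKey fs.defaultFS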
Configure hdfs-site.xml
[hadoop@ip10 hadoop]$ vim /home/hadoop/hadoop-3.2.1/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>ip10.hadoop.com:50090</value>
    </property>
</configuration>
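The same check works for hdfs-site.xml:
#should print 3
[hadoop@ip10 hadoop]$ hdfs getconf -confKey dfs.replication
#should print ip10.hadoop.com:50090
[hadoop@ip10 hadoop]$ hdfs getconf -confKey dfs.namenode.secondary.http-address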
Configure workers (formerly slaves)
#worker nodes; this file was called slaves in earlier Hadoop versions
[hadoop@ip10 hadoop]$ vim /home/hadoop/hadoop-3.2.1/etc/hadoop/workers
ip11.hadoop.com
ip12.hadoop.com
ip13.hadoop.com
Clone the second virtual machine
With the configuration above in place, shut down the ip10 VM and clone it, choosing a full clone.
Configure the second VM's network - ip11.hadoop.com
For network configuration, see [Linux VM - Bridged Network - Simple Visual Configuration](https://blog.youkuaiyun.com/sinat_25528181/article/details/106327434)
- Set the IP to 192.168.1.11
- Set the hostname to ip11.hadoop.com
Configure passwordless SSH login
- On the ip10 server:
#configure passwordless login
[hadoop@ip10 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@ip10 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@ip10 ~]$ ssh-copy-id ip11.hadoop.com
- On the ip11 server:
#configure passwordless login
[hadoop@ip11 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@ip11 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@ip11 ~]$ ssh-copy-id ip10.hadoop.com
With that, ip10 and ip11 can ssh into each other without a password.
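A quick check that the keys took effect: each command below should print the remote hostname without asking for a password.
#verify passwordless login in both directions
[hadoop@ip10 ~]$ ssh ip11.hadoop.com hostname
[hadoop@ip11 ~]$ ssh ip10.hadoop.com hostname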
Clone the third virtual machine - ip12.hadoop.com
For network configuration, see [Linux VM - Bridged Network - Simple Visual Configuration](https://blog.youkuaiyun.com/sinat_25528181/article/details/106327434)
- Set the IP to 192.168.1.12
- Set the hostname to ip12.hadoop.com
Clone the fourth virtual machine - ip13.hadoop.com
For network configuration, see [Linux VM - Bridged Network - Simple Visual Configuration](https://blog.youkuaiyun.com/sinat_25528181/article/details/106327434)
- Set the IP to 192.168.1.13
- Set the hostname to ip13.hadoop.com
At this point all four machines have passwordless login configured, but you still need to ssh between them once by hand: the first connection to each host asks for a yes to accept its host key.
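One way to avoid typing yes for every pair is to pre-populate known_hosts with ssh-keyscan (a sketch to run as the hadoop user on every node; ssh-ing around by hand as described above works just as well):
#collect the host keys of all four machines into known_hosts
[hadoop@ip10 ~]$ ssh-keyscan ip10.hadoop.com ip11.hadoop.com ip12.hadoop.com ip13.hadoop.com >> ~/.ssh/known_hosts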
Start-up test
#format the namenode
[hadoop@ip10 hadoop-3.2.1]$ hdfs namenode -format
#start all daemons
[hadoop@ip10 hadoop-3.2.1]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [ip10.hadoop.com]
Starting datanodes
Starting secondary namenodes [ip10.hadoop.com]
Starting resourcemanager
Starting nodemanagers
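To confirm the daemons really came up, jps on each node and an HDFS report from the namenode are enough (a sketch; the process list depends on each node's role):
#on ip10: expect NameNode, SecondaryNameNode and ResourceManager
[hadoop@ip10 ~]$ jps
#on each worker (ip11/ip12/ip13): expect DataNode and NodeManager
[hadoop@ip11 ~]$ jps
#live datanodes and capacity as seen by the namenode
[hadoop@ip10 ~]$ hdfs dfsadmin -report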
Check the web UIs
- The MapReduce JobHistory Server was not started, so its UI is skipped here
- http://ip10.hadoop.com:9870/
- http://ip10.hadoop.com:8088/
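Beyond the web UIs, a small write/read round trip confirms HDFS is usable (a sketch; the /test directory and the file chosen are arbitrary):
#write a file into HDFS and list it back
[hadoop@ip10 ~]$ hdfs dfs -mkdir /test
[hadoop@ip10 ~]$ hdfs dfs -put /etc/hosts /test/
[hadoop@ip10 ~]$ hdfs dfs -ls /test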
Problems encountered
- At first I assumed that setting the JDK environment variables meant hadoop-env.sh did not need touching, until the errors below showed up at start-up, so setting it there really is necessary:
echo 'export JAVA_HOME=/home/hadoop/jdk1.8.0_181' >> /home/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
[hadoop@ip10 sbin]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [ip10.hadoop.com]
ip10.hadoop.com: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
ip13.hadoop.com: ERROR: JAVA_HOME is not set and could not be found.
ip11.hadoop.com: ERROR: JAVA_HOME is not set and could not be found.
ip12.hadoop.com: ERROR: JAVA_HOME is not set and could not be found.
- The official documentation lists a huge number of configuration properties; for now everything else stays at its default, to be tuned when problems come up.
Observation
Since every node ends up with an identical configuration, could the Hadoop installation directory simply be shared between them, so the files never have to be copied back and forth?
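Short of a shared mount, one hedged option is to push the configuration directory out with rsync whenever it changes (a sketch, assuming rsync is installed on all nodes and the passwordless ssh set up above; $HADOOP_HOME expands locally, which is fine because the path is identical on every node):
#push the local Hadoop configuration to every worker
[hadoop@ip10 ~]$ for h in ip11 ip12 ip13; do rsync -av $HADOOP_HOME/etc/hadoop/ $h.hadoop.com:$HADOOP_HOME/etc/hadoop/; done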