I. Configure the network environment (host)
1. Check the IP address with ip addr show or ifconfig.
2. Change the hostname: vi /etc/sysconfig/network and set:
NETWORKING=yes
HOSTNAME=hdpvm1   ### change this to the name you want, e.g. master
3. Run hostname to check that the change took effect; if it has not, reboot the machine.
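If you are building a multi-node cluster, every machine must also be able to resolve the other hostnames. A minimal sketch of /etc/hosts, using hypothetical addresses (the hostnames match the ones used in the SSH section below; replace the IPs with the ones shown by ip addr show):
192.168.1.101   Hadoop-Master
192.168.1.102   Hadoop-Slave1
192.168.1.103   Hadoop-Slave2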
II. Install the JDK
1. Extract the JDK archive:
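A minimal sketch of this step, assuming the archive is jdk-8u151-linux-x64.tar.gz in the current directory and the target directory is /home/ubuntu/java (matching the JAVA_HOME used below):
mkdir -p /home/ubuntu/java                                    # create the install directory
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /home/ubuntu/java     # extract; produces jdk1.8.0_151/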
2. Configure the Java environment variables
vim ~/.bashrc
# add the following at the end of the file
export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
source ~/.bashrc   # reload the configuration
java -version      # verify: print the Java version
Possible problem:
At the last step of installing JDK 1.8 on Linux, while checking whether the JDK configuration had taken effect, an error came up:
bash: /usr/java/jdk1.8/bin/java: Permission denied
Solution:
Connect to the Linux machine (e.g. via Xshell) and make the binary executable:
chmod +x /usr/java/jdk1.8/bin/java
III. Turn off the firewall
# check firewall status
service iptables status
# stop the firewall
service iptables stop
# check whether the firewall starts on boot
chkconfig iptables --list
# disable firewall autostart on boot
chkconfig iptables off
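The commands above apply to the iptables service (CentOS 6 style). If your system uses firewalld with systemd instead (for example CentOS 7 or later), the rough equivalents are:
systemctl status firewalld     # check firewall status
systemctl stop firewalld       # stop the firewall
systemctl disable firewalld    # disable firewall autostart on boot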
IV. Configure passwordless SSH
1. First, configure the Master machine
1-1. Enter the .ssh directory: [root@Hadoop-Master ~]# cd ~/.ssh. If the directory does not exist, run ssh localhost once first; do not create it by hand, otherwise you may still be asked for a password after everything is configured.
1-2. Generate a key pair: ssh-keygen -t rsa, then just press Enter at every prompt; this produces the two files id_rsa and id_rsa.pub.
1-3. Create the authorized_keys file: [root@Hadoop-Master .ssh]# cat id_rsa.pub >> authorized_keys
1-4. Generate key pairs on the other two machines, Slave1 and Slave2, in the same way.
1-5. Copy Slave1's id_rsa.pub to the Master machine: [root@Slave1 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s1
1-6. Copy Slave2's id_rsa.pub to the Master machine: [root@Slave2 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s2
1-7. Switch back to the Master machine and merge the keys into authorized_keys:
[root@Hadoop-Master .ssh]# cat id_rsa.pub_s1>> authorized_keys
[root@Hadoop-Master .ssh]# cat id_rsa.pub_s2>> authorized_keys
1-8. Copy authorized_keys to the Slave1 and Slave2 machines:
[root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave1:~/.ssh/
[root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave2:~/.ssh/
1-9. Now, on every machine, set the permissions of the .ssh/ directory to 700 and of the authorized_keys file to 600 (or 644):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
1-10. Verify SSH:
[root@Hadoop-Master .ssh]# ssh Hadoop-Slave1
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave1 ~]# exit
logout
Connection to Hadoop-Slave1 closed.
[root@Hadoop-Master .ssh]# ssh Hadoop-Slave2
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave2 ~]# exit
logout
Connection to Hadoop-Slave2 closed.
V. Deploy Hadoop
3.1 Install Hadoop
3.1.1 Download
(1) Download from the Hadoop website; I downloaded hadoop-3.0.0.tar.gz.
(2) As with the JDK, create a hadoop folder under the home directory (/home/ubuntu/): mkdir hadoop, then extract the archive: tar -zxvf hadoop-3.0.0.tar.gz
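A sketch of these two commands, assuming hadoop-3.0.0.tar.gz was downloaded to the home directory, so that the result matches the HADOOP_HOME used below:
cd /home/ubuntu
mkdir hadoop                                            # create the target folder
tar -zxvf hadoop-3.0.0.tar.gz -C /home/ubuntu/hadoop    # extract; yields hadoop/hadoop-3.0.0/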
3.1.2 Configure environment variables
Run the following commands:
vim ~/.bashrc
# add the following at the end of the file
export HADOOP_HOME=/home/ubuntu/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc   # reload the configuration
3.1.3 Create the data directories
mkdir /home/ubuntu/hadoop/tmp
mkdir /home/ubuntu/hadoop/dfs
mkdir /home/ubuntu/hadoop/dfs/data
mkdir /home/ubuntu/hadoop/dfs/name
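Equivalently, all of these directories can be created in one command with mkdir -p:
mkdir -p /home/ubuntu/hadoop/tmp /home/ubuntu/hadoop/dfs/data /home/ubuntu/hadoop/dfs/name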
3.2 Configure Hadoop
Go to the hadoop-3.0.0 configuration directory: cd /home/ubuntu/hadoop/hadoop-3.0.0/etc/hadoop, then edit hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and the workers file in turn.
3.2.1 Configure hadoop-env.sh
vim hadoop-env.sh
export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151   # find JAVA_HOME in hadoop-env.sh and set it to your actual installation path
3.2.2 Configure core-site.xml (adjust to match your own nodes)
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mpi-1:9000</value>
<description>HDFS URI: filesystem://namenode-host:port</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop/tmp</value>
<description>Local Hadoop temporary directory on the namenode</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of read/write buffer used in SequenceFiles</description>
</property>
</configuration>
3.2.3 Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Replication factor: the number of copies of each block kept in the cluster; a higher value gives better redundancy but uses more storage</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/ubuntu/hadoop/dfs/name</value>
<description>Where the namenode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/ubuntu/hadoop/dfs/data</value>
<description>Physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>mpi-1:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>With dfs.permissions set to false, files can be created on DFS without permission checks. Convenient, but to guard against accidental deletion set it to true or simply remove this property (the default is true).</description>
</property>
</configuration>
3.2.4 Configure mapred-site.xml
vim mapred-site.xml
Note: earlier versions required cp mapred-site.xml.template mapred-site.xml; hadoop-3.0.0 ships mapred-site.xml directly.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>mpi-1:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>mpi-1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>mpi-1:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://mpi-1:9001</value>
</property>
</configuration>
3.2.5 Configure yarn-site.xml
vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>mpi-1</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>mpi-1:8032</value>
<description>${yarn.resourcemanager.hostname}:8032</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>mpi-1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>mpi-1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>mpi-1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>mpi-1:8088</value>
</property>
</configuration>
3.2.6 Configure the workers file (earlier versions used a slaves file instead; check which one your release has)
vim workers
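The workers file simply lists the hostnames of the worker (datanode/nodemanager) machines, one per line. A sketch assuming, hypothetically, that the workers are named mpi-2 and mpi-3 to go with the mpi-1 master used in the configuration files above; substitute your actual slave hostnames:
mpi-2
mpi-3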
Source: https://blog.youkuaiyun.com/secyb/article/details/80170804
4. Run the Hadoop cluster
4.1 Format the namenode
hdfs namenode -format
# HDFS must be formatted before its first use (formatting is needed only once)
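After formatting, start HDFS and YARN; these are the commands whose error output (when run as root without the user variables) appears in the next section, and jps can be used to confirm the daemons are running:
start-dfs.sh     # start namenode, datanodes and secondary namenode
start-yarn.sh    # start resourcemanager and nodemanagers
jps              # list the running Java processes to verify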
VI. Finally, if the daemons do not come up, the errors look like this:
Starting namenodes on [mpi-1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [mpi-1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
2019-03-14 11:40:27,925 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@mpi-1 hadoop]# start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Solution:
Important notes first:
1. All four files start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh must be modified, on both master and slaves.
2. If your Hadoop is started by some other user, replace root with that user.
Starting dfs after formatting HDFS produced the following errors:
[root@master sbin]# ./start-dfs.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [slave1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
A web search turned up a blog post with a FAQ on exactly this problem, so I followed it and record the fix here.
Reference: https://blog.youkuaiyun.com/u013725455/article/details/70147331
In the /hadoop/sbin directory:
add the following parameters at the top of both start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Also add the following at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
After the changes, rerun ./start-dfs.sh; this time it works:
[root@master sbin]# ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Sun Jun  3 03:01:37 CST 2018 from slave1 pts/2
master: Warning: Permanently added 'master,192.168.43.161' (ECDSA) to the list of known hosts.
Starting datanodes
Last login: Sun Jun  3 04:09:05 CST 2018 on pts/1
Starting secondary namenodes [slave1]
Last login: Sun Jun  3 04:09:08 CST 2018 on pts/1
VII. Final check in the browser
Check on port 9870 (it is 9870 now, no longer 50070).
In the browser, open the master's IP:9870; the result is as follows:
Test YARN
In the browser, open the master's IP:8088; the result is as follows:
Note: change the bound address from mpi-1 to 0.0.0.0 (and not to the local loopback address), so that port 8088 on this machine can be reached from outside. For example, change the following in yarn-site.xml:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>mpi-1:8088</value>
</property>
to:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>0.0.0.0:8088</value>
</property>
In addition, you can refer directly to the default configuration files on the Hadoop website when editing, e.g. for hdfs-site.xml; they contain detailed parameter descriptions. You can also use the hdfs dfs commands, for example hdfs dfs -ls / to inspect the storage directories; a few more examples are sketched below.
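A few more common hdfs dfs commands as an illustration; the /input directory and localfile.txt are only hypothetical examples, not paths created earlier in this guide:
hdfs dfs -mkdir -p /input            # create a directory in HDFS
hdfs dfs -put localfile.txt /input   # upload a local file (hypothetical name)
hdfs dfs -ls /                       # list the HDFS root directory
hdfs dfs -cat /input/localfile.txt   # print a file's contents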