Contents
Hadoop in Practice series, table of contents: https://blog.youkuaiyun.com/weixin_39565597/article/details/104525929
Introduction
Official documentation: http://hadoop.apache.org/
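Environment preparation
Cluster (these IP-to-hostname mappings should be present in /etc/hosts on every node):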
192.168.220.151 node1
192.168.220.152 node2
192.168.220.153 node3
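JDK 8 installed on every node: java -version
CentOS 7 with Internet access: ping www.baidu.com
Hostnames set to node1, node2 and node3 respectively
Firewall stopped and disabled (systemctl stop firewalld.service; systemctl disable firewalld.service): systemctl status firewalld.service should report inactive (dead)
Passwordless SSH as root: ssh root@node2 and ssh root@node3 log in without a password prompt. node1 runs start-dfs.sh and node2 runs start-yarn.sh, so both need passwordless access to every node, including themselves.
Installation directory: /opt/soft/hadoop/hadoop-2.7.2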
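Every node has to resolve node1, node2 and node3 to the addresses listed above, usually through /etc/hosts. The bash sketch below uses a hypothetical helper, missing_host_entries, that prints each cluster hostname a given hosts file does not map to its expected IP. It only inspects the file; it does not query DNS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list cluster hostnames missing from a hosts file.
# The IP/hostname pairs come from the cluster list above.
CLUSTER_HOSTS=(
  "192.168.220.151 node1"
  "192.168.220.152 node2"
  "192.168.220.153 node3"
)

missing_host_entries() {
  local hosts_file="$1" entry ip name
  for entry in "${CLUSTER_HOSTS[@]}"; do
    read -r ip name <<< "$entry"
    # awk exits 0 only if some line starts with the IP and also lists the hostname.
    if ! awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$hosts_file"; then
      echo "$name"
    fi
  done
}
```

On each node, `missing_host_entries /etc/hosts` should print nothing once all three entries are in place.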
1. Standalone deployment
# Check the version
[root@node1 hadoop-2.7.2] ./bin/hadoop version
Hadoop 2.7.2
# Create the input directory
[root@node1 hadoop-2.7.2] mkdir input
[root@node1 hadoop-2.7.2] cp etc/hadoop/*.xml input
# Run the bundled grep example (the output directory must not exist yet)
[root@node1 hadoop-2.7.2] ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
# View the results
[root@node1 hadoop-2.7.2] cat output/*
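The grep example counts how often each match of the regular expression dfs[a-z.]+ occurs in the input files and writes "count<TAB>match" lines, most frequent first. As a rough local cross-check (an assumption of this sketch, not part of Hadoop), the hypothetical helper local_grep_count computes the same counts with standard tools; entries with equal counts may come out in a different order than Hadoop's.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count regex matches across the files in a directory,
# printing "count<TAB>match" lines, highest count first (ties sorted by match).
local_grep_count() {
  local dir="$1" regex="$2"
  # -o: one match per line, -h: no file names, -E: extended regex
  grep -ohE "$regex" "$dir"/* \
    | sort | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ printf "%s\t%s\n", $1, $2 }'
}
```

For example, `local_grep_count input 'dfs[a-z.]+'` can be compared with `cat output/*`.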
# Create the input file for the wordcount example
[root@node1 hadoop-2.7.2] mkdir wcinput
[root@node1 hadoop-2.7.2] cd wcinput
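[root@node1 wcinput] vim wc.input
# Contents of wc.input:
hadoop yarn
hadoop mapreduce
# Return to the Hadoop directory
[root@node1 wcinput] cd ..
# Run the bundled wordcount example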
[root@node1 hadoop-2.7.2] ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
# View the results (output is sorted by word)
[root@node1 hadoop-2.7.2] cat wcoutput/part-r-00000
hadoop 2
mapreduce 1
yarn 1
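WordCount splits each line on whitespace, counts every token and writes "word<TAB>count" lines sorted by word. The hypothetical helper local_wordcount below is a local awk cross-check (not Hadoop code) that should give the same result for small inputs such as wc.input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count whitespace-separated words and print
# "word<TAB>count" lines sorted in byte order (C locale).
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$@" \
    | LC_ALL=C sort
}
```

Running `local_wordcount wcinput/wc.input` should print the same three lines as part-r-00000.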
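2. Fully distributed deployment
First make sure every node meets the environment requirements above, so that each of the three servers could run Hadoop on its own.
Cluster plan
node1: NameNode, DataNode, NodeManager
node2: DataNode, ResourceManager, NodeManager
node3: SecondaryNameNode, DataNode, NodeManager
Configuration directory: /opt/soft/hadoop/hadoop-2.7.2/etc/hadoop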
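2.1 HDFS configuration
Configure core-site.xml (the properties in this section go inside the file's <configuration> element)
<!-- Address of the HDFS NameNode -->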
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/soft/hadoop/hadoop-2.7.2/data/tmp</value>
</property>
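Each property snippet in this section belongs inside the <configuration> root element of its *-site.xml file. A small bash sketch with a hypothetical helper, write_site_xml, writes a complete file from name/value pairs. Values are not XML-escaped, which is acceptable for plain hostnames and paths like the ones used here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml file from name/value pairs.
# Usage: write_site_xml FILE NAME1 VALUE1 [NAME2 VALUE2 ...]
# Note: names and values are written as-is (no XML escaping).
write_site_xml() {
  local file="$1"; shift
  if (( $# % 2 != 0 )); then
    echo "write_site_xml: expected name/value pairs" >&2
    return 1
  fi
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<configuration>'
    while (( $# > 0 )); do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}
```

For example, `write_site_xml core-site.xml fs.defaultFS hdfs://node1:9000 hadoop.tmp.dir /opt/soft/hadoop/hadoop-2.7.2/data/tmp` run in the configuration directory overwrites core-site.xml with the two properties above.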
Configure hadoop-env.sh
[root@node1 hadoop] vim hadoop-env.sh
export JAVA_HOME=/opt/soft/jdk1.8.0_144
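JAVA_HOME is set explicitly because the start scripts launch daemons over SSH in non-interactive shells, where a JAVA_HOME exported from a login profile is usually not available. hadoop-env.sh ships with export JAVA_HOME=${JAVA_HOME}, while yarn-env.sh and mapred-env.sh typically ship the line commented out. The sed-based sketch below (hypothetical helper set_java_home, GNU sed assumed for -i) rewrites either form, or appends the line if none exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set JAVA_HOME in a Hadoop *-env.sh file.
# Replaces any "export JAVA_HOME=..." line, commented out or not;
# appends a new line if no such line exists. Requires GNU sed (-i).
set_java_home() {
  local env_file="$1" jdk_dir="$2"
  if grep -Eq '^#?[[:space:]]*export JAVA_HOME=' "$env_file"; then
    sed -i -E "s|^#?[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
  else
    echo "export JAVA_HOME=${jdk_dir}" >> "$env_file"
  fi
}
```

In the configuration directory, `for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do set_java_home "$f" /opt/soft/jdk1.8.0_144; done` covers the three env files used in this article.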
Configure hdfs-site.xml
<!-- Number of replicas per block -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- HTTP address (host:port) of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node3:50090</value>
</property>
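The SecondaryNameNode address must name the node that runs it in the cluster plan (node3): start-dfs.sh derives the host to start it on from this property, which is why the startup log prints "Starting secondary namenodes [node3]". Once the configuration is in place, `hdfs getconf -confKey dfs.namenode.secondary.http-address` reads the value back. For a quick check without Hadoop on the PATH, here is an awk sketch (hypothetical helper get_property, assuming each <name> and <value> element is not split across lines).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of a property from a Hadoop *-site.xml file.
# Assumes each <name>...</name> and <value>...</value> element sits on one line.
get_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/  { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
    /<value>/ && name == key {
      value = $0; gsub(/.*<value>|<\/value>.*/, "", value)
      print value
      exit
    }' "$file"
}
```

`get_property hdfs-site.xml dfs.namenode.secondary.http-address` should print node3:50090 on every node after the copy in 2.4.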
2.2 YARN configuration
Configure yarn-env.sh
[root@node1 hadoop] vim yarn-env.sh
export JAVA_HOME=/opt/soft/jdk1.8.0_144
Configure yarn-site.xml
<!-- Auxiliary service through which reducers fetch map output (shuffle) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node2</value>
</property>
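Setting only yarn.resourcemanager.hostname is enough because, in the Hadoop 2.7 defaults (yarn-default.xml), the individual ResourceManager addresses are derived from it. The sketch below (hypothetical helper rm_default_addresses) prints those derived defaults for a given host, which also shows which ports the other nodes must be able to reach on node2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ResourceManager addresses that Hadoop 2.7's
# yarn-default.xml derives from yarn.resourcemanager.hostname.
rm_default_addresses() {
  local host="$1"
  printf 'yarn.resourcemanager.address=%s:8032\n' "$host"
  printf 'yarn.resourcemanager.scheduler.address=%s:8030\n' "$host"
  printf 'yarn.resourcemanager.resource-tracker.address=%s:8031\n' "$host"
  printf 'yarn.resourcemanager.admin.address=%s:8033\n' "$host"
  printf 'yarn.resourcemanager.webapp.address=%s:8088\n' "$host"
}
```

With node2 as the host, NodeManagers register at node2:8031 and the web UI listens on node2:8088, so these ports must not be blocked by a firewall.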
2.3 MapReduce configuration
Configure mapred-env.sh
[root@node1 hadoop] vim mapred-env.sh
export JAVA_HOME=/opt/soft/jdk1.8.0_144
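Configure mapred-site.xml (Hadoop 2.7.2 ships only mapred-site.xml.template, so create the file first: cp mapred-site.xml.template mapred-site.xml)
<!-- Run MapReduce on YARN -->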
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
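The steps above do not show it, but in Hadoop 2.7.x start-dfs.sh and start-yarn.sh start DataNodes and NodeManagers on the hosts listed in etc/hadoop/slaves, which by default contains only localhost. For the startup output in 2.6 to include all three nodes, this file must list node1, node2 and node3 before the directory is copied in 2.4. A sketch with a hypothetical helper, write_slaves:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the slaves file (one worker hostname per line)
# into an existing Hadoop configuration directory.
write_slaves() {
  local conf_dir="$1"; shift
  if [ ! -d "$conf_dir" ]; then
    echo "write_slaves: no such directory: $conf_dir" >&2
    return 1
  fi
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

Usage on node1: `write_slaves /opt/soft/hadoop/hadoop-2.7.2/etc/hadoop node1 node2 node3`.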
2.4 Copy to the other cluster nodes
# Go to the parent directory of the Hadoop installation
[root@node1 hadoop] cd /opt/soft/hadoop
# Copy to node2
[root@node1 hadoop] scp -r ./hadoop-2.7.2 root@node2:/opt/soft/hadoop
# Copy to node3
[root@node1 hadoop] scp -r ./hadoop-2.7.2 root@node3:/opt/soft/hadoop
# If scp prompts for a password, passwordless SSH is not set up correctly
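Copying to each node by hand becomes repetitive as the cluster grows. The loop-based sketch below (hypothetical helper distribute) copies a directory to several hosts. It passes the OpenSSH option -o BatchMode=yes so that scp fails immediately instead of prompting when passwordless SSH is missing; with DRY_RUN=1 it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a directory to the same parent path on several hosts.
# Usage: distribute SRC_DIR DEST_PARENT HOST...
# Set DRY_RUN=1 to print the scp commands instead of running them.
distribute() {
  local src="$1" dest_parent="$2"; shift 2
  local host status=0
  for host in "$@"; do
    # BatchMode=yes makes scp fail instead of asking for a password.
    local cmd=(scp -r -o BatchMode=yes "$src" "root@${host}:${dest_parent}")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    elif ! "${cmd[@]}"; then
      echo "distribute: copy to $host failed" >&2
      status=1
    fi
  done
  return "$status"
}
```

From /opt/soft/hadoop, `distribute ./hadoop-2.7.2 /opt/soft/hadoop node2 node3` performs the two copies above and returns a non-zero status if either fails.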
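2.5 Format the NameNode
Before the first start, format the NameNode once on node1 (./bin/hdfs namenode -format is the non-deprecated form of the same command):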
[root@node1 hadoop-2.7.2] ./bin/hadoop namenode -format
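Formatting again later is a common mistake: it gives the NameNode a new clusterID, and DataNodes that still hold the old one refuse to register (their logs report incompatible clusterIDs). With the core-site.xml above and the default dfs.namenode.name.dir (${hadoop.tmp.dir}/dfs/name), a formatted NameNode has a current/VERSION file under that directory. The guard sketch below (hypothetical helpers namenode_formatted and format_namenode_once) formats only when that file is absent.

```bash
#!/usr/bin/env bash
# Hypothetical guard: format the NameNode only if it has not been formatted yet.
# Assumes the default dfs.namenode.name.dir, i.e. ${hadoop.tmp.dir}/dfs/name.
namenode_formatted() {
  local tmp_dir="$1"
  [ -f "$tmp_dir/dfs/name/current/VERSION" ]
}

# Returns 1 without formatting if a VERSION file already exists.
format_namenode_once() {
  local hadoop_home="$1" tmp_dir="$2"
  if namenode_formatted "$tmp_dir"; then
    echo "NameNode already formatted under $tmp_dir/dfs/name; skipping" >&2
    return 1
  fi
  "$hadoop_home/bin/hdfs" namenode -format
}
```

Usage on node1: `format_namenode_once /opt/soft/hadoop/hadoop-2.7.2 /opt/soft/hadoop/hadoop-2.7.2/data/tmp`.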
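2.6 Start the cluster
# On node1, start HDFS: start-dfs.sh starts the NameNode, a DataNode on every node, and the SecondaryNameNode on node3
# If a daemon fails to start, check its .out and .log files under logs/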
[root@node1 hadoop-2.7.2] /opt/soft/hadoop/hadoop-2.7.2/sbin/start-dfs.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-node1.out
node1: starting datanode, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-node1.out
node2: starting datanode, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-node2.out
node3: starting datanode, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-node3.out
Starting secondary namenodes [node3]
node3: starting secondarynamenode, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-node3.out
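# On node2 (the ResourceManager host), start YARN: start-yarn.sh starts the ResourceManager on the local machine and a NodeManager on every node
# If a daemon fails to start, check its .out and .log files under logs/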
[root@node2 hadoop-2.7.2] /opt/soft/hadoop/hadoop-2.7.2/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-node2.out
node1: starting nodemanager, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-node1.out
node2: starting nodemanager, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-node2.out
node3: starting nodemanager, logging to /opt/soft/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-node3.out
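The .out files named in the startup output mostly capture each daemon's console output; the detailed log messages go to the matching .log files in the same logs directory. The sketch below (hypothetical helper log_errors) lists ERROR lines and exception lines from those .log files on one node; its exit status is grep's, so it is non-zero when nothing matches.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ERROR and exception lines from Hadoop daemon logs.
# Usage: log_errors LOG_DIR
# Exit status is grep's: 0 if something matched, non-zero otherwise.
log_errors() {
  local log_dir="$1"
  # -H: prefix each line with its file name; -E: extended regex
  grep -HE 'ERROR|Exception' "$log_dir"/*.log 2>/dev/null
}
```

Usage on any node: `log_errors /opt/soft/hadoop/hadoop-2.7.2/logs`.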
# Check the running daemons on each node
[root@node1 hadoop-2.7.2] jps
29665 NameNode
30120 NodeManager
29801 DataNode
30254 Jps
[root@node2 hadoop-2.7.2] jps
21765 DataNode
21960 ResourceManager
22265 NodeManager
22415 Jps
[root@node3 hadoop-2.7.2] jps
23424 NodeManager
23221 SecondaryNameNode
23573 Jps
23142 DataNode
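Comparing jps output against the cluster plan by eye is error-prone. The sketch below (hypothetical helpers expected_daemons and missing_daemons) encodes the plan, including the NodeManager that start-yarn.sh starts on node2, and prints every expected daemon that is absent from jps output read on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare jps output (stdin) with the cluster plan.
expected_daemons() {
  case "$1" in
    node1) echo "NameNode DataNode NodeManager" ;;
    node2) echo "DataNode ResourceManager NodeManager" ;;
    node3) echo "SecondaryNameNode DataNode NodeManager" ;;
    *) return 1 ;;
  esac
}

# Prints each planned daemon that is not running on the given node.
missing_daemons() {
  local node="$1" expected running daemon
  expected=$(expected_daemons "$node") || { echo "unknown node: $node" >&2; return 2; }
  # jps prints "<pid> <MainClass>"; keep only the class names.
  running=$(awk '{ print $2 }')
  for daemon in $expected; do
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "$daemon"
    fi
  done
}
```

On each node, `jps | missing_daemons "$(hostname)"` should print nothing when all planned daemons are up.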
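Done. Open http://node1:50070/ in a browser (the machine running the browser must also resolve node1); if the NameNode web UI appears, HDFS is up. The YARN ResourceManager web UI is at http://node2:8088/.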
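Besides the web UI, the number of registered DataNodes can be checked from the command line with hdfs dfsadmin -report. Assuming the 2.7.x report format, in which a line such as "Live datanodes (3):" precedes the per-node details, the sketch below (hypothetical helper live_datanodes) extracts that count.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and print
# N from the first "Live datanodes (N):" line (assumed 2.7.x format).
live_datanodes() {
  sed -n -E 's/^Live datanodes \(([0-9]+)\):.*/\1/p' | head -n 1
}
```

For this cluster, `./bin/hdfs dfsadmin -report | live_datanodes` should print 3.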