0 Environment
- Ubuntu 18.04
- openssh-server
- Hadoop 3.0.2
- JDK 1.8.0_191
1 SSH Configuration
1.1 Install openssh-server
sudo apt-get install openssh-server
1.2 配置ssh登录
# 进入ssh目录
cd ~/.ssh
# 使用rsa算法生成秘钥和公钥对
ssh-keygen -t rsa
# 授权
cat ./id_rsa.pub >> ./authorized_keys
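To confirm passwordless login works, log in to the local machine; it should no longer prompt for a password:
# should open a shell without asking for a password
ssh localhost
# leave the test session
exit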
2 Install Hadoop
2.1 Download Hadoop
Hadoop: mirror address
Open the link, then navigate: HTTP → http: mirror → select a version.
Hadoop: backup address
2.2 Extract to the target directory
tar -zxvf hadoop-3.0.2.tar.gz -C /xindaqi/hadoop
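As a quick sanity check that the unpacked distribution works (assuming the extract path above), print the version:
# should report Hadoop 3.0.2
/xindaqi/hadoop/hadoop-3.0.2/bin/hadoop version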
2.3 Configure the Hadoop files
Directory: etc/hadoop under the Hadoop installation (here /home/xdq/xinPrj/hadoop/hadoop-3.0.2/etc/hadoop)
2.3.1 Pseudo-distributed deployment
- hadoop-env.sh
Point JAVA_HOME at the JDK installation:
export JAVA_HOME=/usr/java/jdk1.8.0_191
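If you are unsure where the JDK lives, one way to locate it (a sketch, assuming java is on the PATH; depending on packaging the result may point at a jre subdirectory):
# resolve the java symlink and strip the trailing /bin/java
readlink -f $(which java) | sed 's:/bin/java::'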
- core-site.xml
Set the temporary-file base directory and the NameNode address (the HDFS entry point):
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9820</value>
        <description>NameNode address; port 9000 could not be used here because it did not satisfy internal access.</description>
    </property>
</configuration>
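Once the file is saved, you can confirm the value Hadoop actually picks up (run from the install directory, or with bin on the PATH):
# prints hdfs://localhost:9820 if the config is being read
hdfs getconf -confKey fs.defaultFS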
- hdfs-site.xml
Set the NameNode and DataNode storage directories, the number of replicas per block, and the HDFS web address and port:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:9870</value>
        <description>9870 is the default web UI port in Hadoop 3.x.</description>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>xdq:9001</value>
        <description>hostname and port</description>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
- log4j.properties
Suppress the noisy NativeCodeLoader warning about missing native libraries:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
- mapred-site.xml
Run MapReduce on YARN (the yarn.nodemanager.* entries are NodeManager settings and usually live in yarn-site.xml; they are harmless here):
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- yarn-site.xml
Enable the MapReduce shuffle service on the NodeManager:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
2.3.2 Create the storage directories
Create the storage directories under hadoop-3.0.2, with this layout:
-- hadoop-3.0.2
|
`-- tmp
`-- dfs
|-- data
`-- name
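A one-liner to create this layout (paths taken from the hdfs-site.xml values above):
# -p creates intermediate directories; brace expansion makes both leaves
mkdir -p /home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp/dfs/{name,data}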
3 Testing
The start/stop scripts for each Hadoop component live under sbin; the sbin directory is structured as follows:
-- hadoop-3.0.2
|-- sbin
| |-- FederationStateStore
| | |-- MySQL
| | `-- SQLServer
| |-- distribute-exclude.sh
| |-- hadoop-daemon.sh
| |-- hadoop-daemons.sh
| |-- httpfs.sh
| |-- kms.sh
| |-- mr-jobhistory-daemon.sh
| |-- refresh-namenodes.sh
| |-- start-all.cmd
| |-- start-all.sh
| |-- start-balancer.sh
| |-- start-dfs.cmd
| |-- start-dfs.sh
| |-- start-secure-dns.sh
| |-- start-yarn.cmd
| |-- start-yarn.sh
| |-- stop-all.cmd
| |-- stop-all.sh
| |-- stop-balancer.sh
| |-- stop-dfs.cmd
| |-- stop-dfs.sh
| |-- stop-secure-dns.sh
| |-- stop-yarn.cmd
| |-- stop-yarn.sh
| |-- workers.sh
| |-- yarn-daemon.sh
| `-- yarn-daemons.sh
3.1 HDFS operations
3.1.1 Format the NameNode
Run from Hadoop's bin directory:
hdfs namenode -format
- Result
2019-07-24 09:14:48,822 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xdq/127.0.1.1
************************************************************/
3.1.2 Start
Run from Hadoop's sbin directory:
/home/xdq/xinPrj/hadoop/hadoop-3.0.2/sbin
start-dfs.sh
- Result
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost]
jps
- Result
10346 NameNode
10955 Jps
10811 SecondaryNameNode
3.1.3 Stop
stop-dfs.sh
- Result
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [localhost]
jps
- Result
10103 Jps
3.1.4 Web test
localhost:9870
The Hadoop file system web UI:
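From a terminal, you can also verify that the UI answers (a quick check against the default 3.x port; assumes curl is installed):
# prints 200 when the NameNode web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870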

- Summary
(1) start-dfs.sh launched two processes here: NameNode and SecondaryNameNode (the missing DataNode is dealt with in section 3.3);
(2) start-dfs.sh brings up the HDFS file system;
3.2 YARN operations
3.2.1 Start
start-yarn.sh
- Result
Starting resourcemanager
Starting nodemanagers
jps
- Result
11121 ResourceManager
11666 Jps
11481 NodeManager
10346 NameNode
10811 SecondaryNameNode
3.2.2 Stop
stop-yarn.sh
3.2.3 Web test
localhost:8088
The Hadoop resource manager (YARN) web UI:

- Summary
(1) start-yarn.sh launches two services: ResourceManager and NodeManager;
(2) YARN manages the node services;
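With HDFS and YARN both up, a quick end-to-end check is the example job bundled with the distribution (a sketch; the jar path assumes the stock 3.0.2 layout, run from the install directory):
# estimate pi with 2 map tasks and 5 samples per map; exercises HDFS and YARN together
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.2.jar pi 2 5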
3.3 Start all services
start-all.sh
jps
- Result
13377 NameNode
13845 SecondaryNameNode
14455 NodeManager
14616 Jps
14088 ResourceManager
No DataNode.
- Solution
Delete the files under tmp/ and restart the services. (A likely cause: each `hdfs namenode -format` writes a new clusterID under tmp/dfs/name, while tmp/dfs/data keeps the old one, so the DataNode exits at startup; clearing tmp/ removes the stale ID, at the cost of all HDFS data.)
- Re-format HDFS
hdfs namenode -format
- Start all services
start-all.sh
jps
- Result
19057 DataNode
19537 ResourceManager
20049 Jps
19894 NodeManager
18854 NameNode
19307 SecondaryNameNode
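If this recurs, you can confirm the diagnosis before wiping data by comparing the two clusterIDs (a sketch, assuming the tmp layout from section 2.3.2; the DataNode fails when the values differ):
grep clusterID /home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp/dfs/name/current/VERSION
grep clusterID /home/xdq/xinPrj/hadoop/hadoop-3.0.2/tmp/dfs/data/current/VERSION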
3.4 File operations
- hdfs-site.xml
Files could only be added after setting the SecondaryNameNode hostname and port:
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>xdq:9001</value>
</property>
- List files
hdfs dfs -ls /
- Result
If the listing is empty, create a directory.
- Create a directory
hdfs dfs -mkdir /user
- Query again
hdfs dfs -ls /
- Result
Found 1 items
drwxr-xr-x - xdq supergroup 0 2019-07-24 15:09 /user
- Result in the web UI
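To go one step beyond directory creation, a short sketch of writing a file into HDFS and reading it back (the source file and target name are illustrative):
# copy a local file into the new HDFS directory
hdfs dfs -put /etc/hosts /user/hosts
# read it back from HDFS
hdfs dfs -cat /user/hosts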
4 Summary
- From Hadoop 3.x on, the HDFS web UI port is 9870;
- In Hadoop 2.x, the HDFS web UI port is 50070;
- The YARN web UI port is 8088 in both Hadoop 2.x and 3.x;
- Compared with deploying Hadoop on a Mac, the Ubuntu deployment went smoothly, apart from the 9870 port change;
- (Good online resources matter a lot; find the right ones and you skip the detours! Haha!)
- Still at the beginner stage; comments on the configuration are yet to be added;
Reference for deploying Hadoop on a Mac: Deploying Hadoop on a Mac