Hadoop Deployment: Local (Standalone) Mode, Pseudo-Distributed (Single-Node) Mode, and Fully Distributed Cluster Setup

This article covers the basic architecture of the Hadoop distributed system, including how its core components, HDFS and MapReduce, work, and how to configure and test Hadoop in standalone, pseudo-distributed, and fully distributed environments. The hands-on steps demonstrate Hadoop's efficiency and flexibility when processing large data sets.


What is Hadoop?

Hadoop is a distributed system infrastructure developed under the Apache Software Foundation. It lets users write distributed programs without having to understand the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage.

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).

HDFS is highly fault-tolerant and designed to run on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).

The core design of the Hadoop framework is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over it.

Which problems does Hadoop address?

  • Massive data that must be analyzed and processed in a timely manner

  • Massive data that must be analyzed and mined in depth

  • Data that must be retained for the long term

Challenges of storing massive data:

  • Disk I/O, rather than CPU, becomes the bottleneck

  • Network bandwidth is a scarce resource

  • Hardware failure becomes a major factor affecting stability

HDFS uses a master/slave architecture.

Hadoop's three run modes:

1. Standalone (local) mode: no daemons are needed; everything runs in a single JVM. Debugging MapReduce programs is very efficient in this mode, so it is mainly used for learning and for debugging during development.

2. Pseudo-distributed mode: the Hadoop daemons run on the local machine, simulating a small-scale cluster. In other words, a single machine is configured as a Hadoop cluster; pseudo-distributed mode is a special case of the fully distributed mode.

3. Fully distributed mode: the Hadoop daemons run on a cluster of machines.
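
In practice, which mode Hadoop runs in is decided almost entirely by a handful of configuration properties rather than by a different installation. A minimal sketch of the relevant properties (the values shown are the out-of-the-box defaults that give standalone mode; the pseudo-distributed and distributed sections below override fs.defaultFS and dfs.replication):

    <property>
        <name>fs.defaultFS</name>
        <value>file:///</value>               <!-- local filesystem, no HDFS daemons -->
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>                  <!-- MapReduce jobs run in a single JVM -->
    </property>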

The main HDFS modules

1. NameNode:

Function: the management node of the entire file system. It maintains the file system's directory tree, the metadata of files and directories, and the block list of every file, and it accepts client requests.

2. DataNode:

Function: stores the actual data blocks of files and serves read/write requests, periodically reporting its block list to the NameNode.

3. SecondaryNameNode:

Acts as a cold-standby image of the NameNode metadata (it periodically merges the edit log into the fsimage). It does not support hot standby and is not a complete HA (high availability) solution.

I. Hadoop standalone (single-machine) test

1. Install Hadoop and create the hadoop user

[root@server1 ~]# useradd hadoop
[root@server1 ~]# echo redhat | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
==========================================================
[root@server1 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
==========================================================
[root@server1 ~]# mv * ~hadoop/
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-2.7.3.tar.gz     jdk-7u79-linux-x64.tar.gz
hbase-1.2.4-bin.tar.gz  zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ tar zxf hadoop-2.7.3.tar.gz 
[hadoop@server1 ~]$ tar zxf jdk-7u79-linux-x64.tar.gz 
[hadoop@server1 ~]$ ln -s jdk1.7.0_79/ java
[hadoop@server1 ~]$ ln -s hadoop-2.7.3 hadoop
[hadoop@server1 ~]$ ls
hadoop        hadoop-2.7.3.tar.gz     java         jdk-7u79-linux-x64.tar.gz
hadoop-2.7.3  hbase-1.2.4-bin.tar.gz  jdk1.7.0_79  zookeeper-3.4.9.tar.gz

2. Configure the environment variables

[hadoop@server1 ~]$ vim hadoop/etc/hadoop/hadoop-env.sh
========================================================
 25 export JAVA_HOME=/home/hadoop/java

[hadoop@server1 ~]$ vim .bash_profile 
=======================================
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/java/bin/

[hadoop@server1 ~]$ source .bash_profile 
[hadoop@server1 ~]$ jps
2280 Jps
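
Optionally (not part of the original steps), the Hadoop bin/ and sbin/ directories can also be put on PATH so that the hadoop and hdfs commands work without the bin/ prefix. A sketch, assuming the symlinks created above:

[hadoop@server1 ~]$ vim .bash_profile
=======================================
export HADOOP_HOME=$HOME/hadoop
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/java/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

[hadoop@server1 ~]$ source .bash_profile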

3. Run a test job

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
===================================================

[hadoop@server1 hadoop]$ ls input/
capacity-scheduler.xml  hadoop-policy.xml  httpfs-site.xml  kms-site.xml
core-site.xml           hdfs-site.xml      kms-acls.xml     yarn-site.xml
[hadoop@server1 hadoop]$ ls output/
part-r-00000  _SUCCESS
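
The grep example writes each matched string and its count to part-r-00000. With the stock configuration files copied into input/, the result is typically a single line such as the following (the exact content depends on which XML files are present):

[hadoop@server1 hadoop]$ cat output/*
1	dfsadmin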

II. Pseudo-distributed mode

1. Edit the configuration files

[hadoop@server1 hadoop]$ vim etc/hadoop/core-site.xml
======================================================
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.70.1:9000</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ vim etc/hadoop/hdfs-site.xml 
========================================
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

2. Set up passwordless SSH

[hadoop@server1 hadoop]$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
...
===============================================================
[hadoop@server1 ~]$ ssh-copy-id 172.25.70.1
[hadoop@server1 ~]$ ssh-copy-id localhost
[hadoop@server1 ~]$ ssh-copy-id server1
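
Before starting the daemons it is worth confirming that passwordless login really works for every address start-dfs.sh will use (localhost, the hostname, and 0.0.0.0 for the secondary NameNode). A quick sanity check, not in the original steps; none of these should prompt for a password:

[hadoop@server1 ~]$ ssh localhost hostname
[hadoop@server1 ~]$ ssh server1 hostname
[hadoop@server1 ~]$ ssh 0.0.0.0 hostname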

3. Format the NameNode and start the services

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
===================================================
...
19/05/24 02:12:16 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 353 bytes saved in 0 seconds.
19/05/24 02:12:16 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/05/24 02:12:16 INFO util.ExitUtil: Exiting with status 0
19/05/24 02:12:16 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server1/172.25.70.1
************************************************************/
=======================================================================
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
============================================
Starting namenodes on [server1]
server1: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server1.out
localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server1.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 8f:49:b9:37:74:65:26:03:4e:73:fc:44:2d:8b:6e:83.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-server1.out

[hadoop@server1 hadoop]$ jps
3064 DataNode
3389 Jps
3242 SecondaryNameNode
2962 NameNode

In a browser, open http://172.25.70.1:50070/ to reach the NameNode web UI.
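
If no browser is available on the test host, the same NameNode status can be pulled from the shell through the web UI's JMX endpoint (a sketch):

[hadoop@server1 hadoop]$ curl -s 'http://172.25.70.1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo' | head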

4. Test: create directories and upload files

[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ dfs -ls
-bash: dfs: command not found
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-05-24 02:20 input
[hadoop@server1 hadoop]$

[hadoop@server1 hadoop]$ rm -fr input
[hadoop@server1 hadoop]$ rm -fr output
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
==============================================================================
...
		Merged Map outputs=8
		GC time elapsed (ms)=618
		Total committed heap usage (bytes)=1378369536
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=26007
	File Output Format Counters 
		Bytes Written=9984
==============================================================================
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*
===================================================
"*"	18
"AS	8
"License");	8
"alice,bob	18
&quot;kerberos&quot;.	1
&quot;simple&quot;	1
'HTTP/'	1
'none'	1
'random'	1
'sasl'	1
'string'	1
'zookeeper'	2
...
=======================================================
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output
[hadoop@server1 hadoop]$ ls output/
======================================
part-r-00000  _SUCCESS
[hadoop@server1 hadoop]$ 

View the results in the web UI as well.

III. Fully distributed mode

1. Reset the environment

[hadoop@server1 hadoop]$ sbin/stop-dfs.sh 
Stopping namenodes on [server1]
server1: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
[hadoop@server1 hadoop]$ cd /tmp/
[hadoop@server1 tmp]$ ls
hadoop-hadoop                       Jetty_0_0_0_0_50090_secondary____y6aanv
hsperfdata_hadoop                   Jetty_localhost_38281_datanode____.rmss8j
Jetty_0_0_0_0_50070_hdfs____w2cu08
[hadoop@server1 tmp]$ rm -fr *
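
By default Hadoop keeps the NameNode and DataNode data under /tmp (the hadoop-hadoop directory above), which is why wiping /tmp fully resets the HDFS state. For a longer-lived cluster, hadoop.tmp.dir would normally point at a persistent location instead; a sketch, where the path is an assumption and not part of this setup:

[hadoop@server1 ~]$ vim hadoop/etc/hadoop/core-site.xml
========================================================
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-data</value>   <!-- hypothetical persistent path -->
    </property>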

2. Create the hadoop user on the new server2 and server3 nodes

  • On server2 and server3:
[root@server2 ~]# useradd hadoop
[root@server2 ~]# echo redhat | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@server2 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
======================================================
[root@server3 ~]# useradd hadoop
[root@server3 ~]# echo redhat | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@server3 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
  • On server1 through server3:
[root@server1 ~]# yum install -y nfs-utils
[root@server2 ~]# yum install -y nfs-utils
[root@server3 ~]# yum install -y nfs-utils

[root@server1 ~]# systemctl start rpcbind
[root@server2 ~]# systemctl start rpcbind
[root@server3 ~]# systemctl start rpcbind

3. Start the NFS service on server1 and configure the export

[root@server1 ~]# systemctl start nfs-server
[root@server1 ~]# vim /etc/exports
===================================
/home/hadoop *(rw,anonuid=1000,anongid=1000)
============================================
[root@server1 ~]# exportfs -r
[root@server1 ~]# exportfs -rv
exporting *:/home/hadoop
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *

4. Mount the NFS export on server2 and server3

  • On server2 and server3:
[root@server2 ~]# mount 172.25.70.1:/home/hadoop /home/hadoop
[root@server2 ~]# df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root     18855936 1097944  17757992   6% /
devtmpfs                    239256       0    239256   0% /dev
tmpfs                       250228       0    250228   0% /dev/shm
tmpfs                       250228    8584    241644   4% /run
tmpfs                       250228       0    250228   0% /sys/fs/cgroup
/dev/sda1                  1038336  141504    896832  14% /boot
tmpfs                        50048       0     50048   0% /run/user/0
172.25.70.1:/home/hadoop  18855936 2232320  16623616  12% /home/hadoop
[root@server2 ~]# 
================================================================================
[root@server3 ~]# mount 172.25.70.1:/home/hadoop /home/hadoop
[root@server3 ~]# df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root     18855936 1097932  17758004   6% /
devtmpfs                    239256       0    239256   0% /dev
tmpfs                       250228       0    250228   0% /dev/shm
tmpfs                       250228    8584    241644   4% /run
tmpfs                       250228       0    250228   0% /sys/fs/cgroup
/dev/sda1                  1038336  141504    896832  14% /boot
tmpfs                        50048       0     50048   0% /run/user/0
172.25.70.1:/home/hadoop  18855936 2232320  16623616  12% /home/hadoop
[root@server3 ~]# 
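
The mounts above do not survive a reboot. To make them persistent (not done in the original steps), an /etc/fstab entry on server2 and server3 would look roughly like this:

[root@server2 ~]# vim /etc/fstab
=================================
172.25.70.1:/home/hadoop  /home/hadoop  nfs  defaults,_netdev  0 0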

5. Edit the configuration files again (in Hadoop 2.7.3 the list of worker nodes lives in etc/hadoop/slaves; the workers file name is only used from Hadoop 3.x onward)

[hadoop@server1 ~]$ vim hadoop/etc/hadoop/hdfs-site.xml
=======================================
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

===============================================
[hadoop@server1 ~]$ vim hadoop/etc/hadoop/slaves
==================================================
172.25.70.2
172.25.70.3
=================================================
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ cat hadoop/etc/hadoop/slaves 
172.25.70.2
172.25.70.3
[hadoop@server2 ~]$ 
=================================================
[hadoop@server3 ~]$ cat hadoop/etc/hadoop/slaves 
172.25.70.2
172.25.70.3
[hadoop@server3 ~]$ 

6. Reformat the NameNode and restart the services

[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh 
Starting namenodes on [server1]
server1: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server1.out
localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-server1.out

=================================================================

[hadoop@server2 ~]$ jps
12032 Jps
[hadoop@server2 ~]$ 

=========================
[hadoop@server3 ~]$ jps
12153 Jps
[hadoop@server3 ~]$ 
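
Once the slaves file is picked up, each worker should show a DataNode process in jps and appear as a live node on the NameNode; if server2 and server3 only show Jps as above, the DataNodes have not started yet and the slaves file and datanode logs should be rechecked. A quick check from server1 (a sketch):

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'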

7. Test

[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ ls
bin  include  libexec      logs        output        README.txt  share
etc  lib      LICENSE.txt  NOTICE.txt  part-r-00000  sbin        _SUCCESS
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/ input

8. Upload a large file

[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 5.85916 s, 89.5 MB/s
[hadoop@server1 hadoop]$ bin/hdfs dfs -put bigfile
[hadoop@server1 hadoop]$ 
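
To see how the 500 MB file was split into blocks and where the replicas landed (with dfs.replication set to 2, each block should have a copy on both DataNodes), hdfs fsck can be run against the uploaded file; a sketch:

[hadoop@server1 hadoop]$ bin/hdfs fsck /user/hadoop/bigfile -files -blocks -locations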