Original link: HDFS (Part 1: HDFS overview, client, and shell operations).
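1. HDFS Definition
HDFS (Hadoop Distributed File System) is, first, a file system: it stores files and locates them through a directory tree. Second, it is distributed: many servers work together to provide its functionality, and each server in the cluster has its own role.
HDFS use cases: HDFS suits write-once, read-many workloads. A file cannot be modified in place once it has been written (content can only be appended), so HDFS works well for data analysis but not for network-drive style applications with frequent reads and writes.
2. HDFS Advantages and Disadvantages
1. Advantages:
2. Disadvantages:
3. HDFS Architecture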
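- **NameNode (nn):** the master. It is the supervisor and manager of the cluster.
  - manages the HDFS namespace;
  - configures the replica policy;
  - manages the mapping information of data blocks (Blocks);
  - handles read and write requests from clients.
- **DataNode:** the slave (called a worker in 3.x). The NameNode issues commands and the DataNode carries out the actual operations.
  - stores the actual data blocks;
  - performs block read and write operations.
- **Client:** the client.
  - splits files: when a file is uploaded to HDFS, the Client cuts it into Blocks and then uploads them;
  - talks to the NameNode to obtain file location information;
  - talks to the DataNodes to read or write data;
  - provides commands to administer HDFS, such as formatting the NameNode;
  - provides commands to access HDFS, such as creating, deleting, updating, and querying files.
- **Secondary NameNode:** not a hot standby for the NameNode. When the NameNode goes down, it cannot immediately take over and serve requests.
  - assists the NameNode and offloads part of its work, for example by periodically merging the Fsimage (namespace image) and Edits (edit log) and pushing the result back to the NameNode;
  - can help recover the NameNode in an emergency.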
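The merge of Fsimage and Edits performed by the Secondary NameNode can be pictured with a toy model. The sketch below is an illustration only: the set-based image, the `(op, path)` edit tuples, and the function name `checkpoint` are all hypothetical and do not reflect the real on-disk formats used by HDFS.

```python
# Toy model of a checkpoint: replay an edit log on top of a namespace image.
# Hypothetical data structures; the real Fsimage and Edits files are binary.

def checkpoint(fsimage, edits):
    """Return a new image with every logged edit applied in order."""
    merged = set(fsimage)  # copy, so the old image stays untouched
    for op, path in edits:
        if op in ("mkdir", "create"):
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged

fsimage = {"/tmp", "/user"}
edits = [
    ("mkdir", "/clearlove"),
    ("create", "/clearlove/4396.txt"),
    ("delete", "/tmp"),
]
new_image = checkpoint(fsimage, edits)
print(sorted(new_image))
```

After the merge, the edit log can start empty again, which is why the checkpoint keeps the log from growing without bound.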
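4. HDFS Block Size
In Hadoop 2.x and 3.x the default block size (`dfs.blocksize`) is 134217728 bytes: 134217728 / 1024 / 1024 = 128 MB.
If the disks are SSDs or otherwise offer a higher transfer rate, the block size can be set to a larger value.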
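To make the block arithmetic concrete, here is a minimal sketch of how a file of a given size maps onto blocks of 134217728 bytes. The helper name `block_sizes` and the 300 MB example file are assumptions made for illustration; they are not part of HDFS.

```python
BLOCK_SIZE = 134217728  # bytes, i.e. 128 MB
MB = 1024 * 1024

def block_sizes(file_size, block_size=BLOCK_SIZE):
    """Split file_size bytes into a list of block lengths."""
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # the last block holds only the remaining bytes
    return sizes

sizes = block_sizes(300 * MB)        # hypothetical 300 MB file
print(BLOCK_SIZE // MB)              # block size in MB
print([s // MB for s in sizes])      # block lengths in MB
```

In this example the file needs three blocks, and the last one holds only the remaining 44 MB rather than a full 128 MB.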
5. HDFS Shell Operations
Basic syntax: running `hdfs dfs` with no arguments prints the list of available commands.
[root@hadoop100 hadoop-3.2.1]# hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-v] [-x] <path> ...]
[-expunge [-immediate]]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-head <file>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] [-s <sleep interval>] <file>]
[-test -[defswrz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
- -help: prints the usage and options of a command
[root@hadoop100 hadoop-3.2.1]# hdfs dfs -help rm
-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ... :
Delete all files that match the specified file pattern. Equivalent to the Unix
command "rm <src>"
-f If the file does not exist, do not display a diagnostic message or
modify the exit status to reflect an error.
-[rR] Recursively deletes directories.
-skipTrash option bypasses trash, if enabled, and immediately deletes <src>.
-safely option requires safety confirmation, if enabled, requires
confirmation before deleting large directory with more than
<hadoop.shell.delete.limit.num.files> files. Delay is expected when
walking over large directory recursively to count the number of
files to be deleted before the confirmation.
- -ls: lists the contents of a directory; -ls -R: lists the directory tree recursively
[root@hadoop100 ~]# hdfs dfs -ls -R /
drwx------ - root supergroup 0 2020-06-14 21:34 /tmp
drwx------ - root supergroup 0 2020-06-14 21:34 /tmp/hadoop-yarn
drwx------ - root supergroup 0 2020-06-14 21:39 /tmp/hadoop-yarn/staging
drwxr-xr-x - root supergroup 0 2020-06-14 21:39 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - root supergroup 0 2020-06-14 21:39 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx--- - root supergroup 0 2020-06-14 21:40 /tmp/hadoop-yarn/staging/history/done_intermediate/root
-rwxrwx--- 3 root supergroup 22773 2020-06-14 21:40 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1592141128432_0002-1592141973598-root-word+count-1592142001631-1-1-SUCCEEDED-default-1592141984973.jhist
-rwxrwx--- 3 root supergroup 439 2020-06-14 21:40 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1592141128432_0002.summary
-rwxrwx--- 3 root supergroup 223001 2020-06-14 21:40 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1592141128432_0002_conf.xml
drwx------ - root supergroup 0 2020-06-14 21:34 /tmp/hadoop-yarn/staging/root
drwx------ - root supergroup 0 2020-06-14 21:40 /tmp/hadoop-yarn/staging/root/.staging
drwxr-xr-x - root supergroup 0 2020-06-14 21:39 /user
drwxr-xr-x - root supergroup 0 2020-06-14 21:35 /user/input
-rw-r--r-- 3 root supergroup 70 2020-06-14 21:35 /user/input/wc.input
drwxr-xr-x - root supergroup 0 2020-06-14 21:40 /user/output
-rw-r--r-- 3 root supergroup 0 2020-06-14 21:40 /user/output/_SUCCESS
-rw-r--r-- 3 root supergroup 72 2020-06-14 21:40 /user/output/part-r-00000
[root@hadoop100 ~]# hdfs dfs -ls /
Found 2 items
drwx------ - root supergroup 0 2020-06-14 21:34 /tmp
drwxr-xr-x - root supergroup 0 2020-06-14 21:39 /user
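Each line of the listing has eight columns: permissions, replication factor (`-` for directories), owner, group, size in bytes, modification date, modification time, and path. The following sketch splits one line into those fields. The function name `parse_ls_line` and the dictionary keys are hypothetical, and the parser assumes the default `-ls` output format shown above.

```python
def parse_ls_line(line):
    """Split one line of `hdfs dfs -ls` output into named fields."""
    # maxsplit=7 keeps a path that contains spaces in one piece
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "is_dir": perms.startswith("d"),
        "permissions": perms,
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }

file_entry = parse_ls_line(
    "-rw-r--r-- 3 root supergroup 70 2020-06-14 21:35 /user/input/wc.input"
)
dir_entry = parse_ls_line("drwx------ - root supergroup 0 2020-06-14 21:34 /tmp")
print(file_entry["replication"], file_entry["size"], file_entry["path"])
print(dir_entry["is_dir"], dir_entry["replication"])
```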
- -mkdir: creates a directory on HDFS
[root@hadoop100 ~]# hdfs dfs -mkdir /clearlove
[root@hadoop100 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2020-06-19 00:34 /clearlove
drwx------ - root supergroup 0 2020-06-14 21:34 /tmp
drwxr-xr-x - root supergroup 0 2020-06-14 21:39 /user
- -moveFromLocal: moves (cuts and pastes) a file from the local file system to HDFS
[root@hadoop100 hadoop-3.2.1]# cd input/
[root@hadoop100 input]# touch 4396.txt
[root@hadoop100 input]# vi 4396.txt
[root@hadoop100 input]# hdfs dfs -moveFromLocal ./4396.txt /clearlove
2020-06-19 00:39:08,549 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
- -appendToFile: appends a local file to the end of a file that already exists on HDFS
[root@hadoop100 input]# touch 777.txt
[root@hadoop100 input]# vi 777.txt
[root@hadoop100 input]# hdfs dfs -appendToFile 777.txt /clearlove/4396.txt
2020-06-19 00:50:57,495 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@hadoop100 input]# hdfs dfs -cat /clearlove/4396.txt
2020-06-19 00:52:04,788 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
7777
clearlove
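- -cat: displays the contents of a file (see the `-cat` call in the -appendToFile example above)
- -copyFromLocal: copies files from the local file system to an HDFS path
- -copyToLocal: copies files from HDFS to the local file system
- -cp: copies from one HDFS path to another HDFS path
- -mv: moves files between HDFS directories
- -get: equivalent to copyToLocal; downloads files from HDFS to the local file system
- -getmerge: downloads several files merged into one local file, for example when the HDFS directory /clearlove contains multiple files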
[root@hadoop100 input]# hdfs dfs -getmerge /clearlove/* ./uzi.txt
2020-06-19 00:59:21,523 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-06-19 00:59:21,575 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[root@hadoop100 input]# cat uzi.txt
7777
clearlove
clearlove
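The merged output is simply the source files concatenated one after another. The sketch below reproduces that behaviour on the local file system, without contacting HDFS. The helper name `merge_files` is hypothetical, the file contents are assumed from the `-cat` output shown earlier, and concatenating in name order is an assumption of this sketch.

```python
import tempfile
from pathlib import Path

def merge_files(src_dir, dst_file):
    """Concatenate every regular file in src_dir into dst_file, in name order."""
    with open(dst_file, "wb") as out:
        for src in sorted(Path(src_dir).iterdir()):
            if src.is_file():
                out.write(src.read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "clearlove"
    src_dir.mkdir()
    # Contents assumed from the transcript, not read from a real cluster
    (src_dir / "4396.txt").write_bytes(b"7777\nclearlove\n")
    (src_dir / "777.txt").write_bytes(b"clearlove\n")
    dst = Path(tmp) / "uzi.txt"
    merge_files(src_dir, dst)
    merged = dst.read_text()

print(merged, end="")
```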
- -put: equivalent to copyFromLocal
- -tail: displays the end of a file (the last 1 KB)
[root@hadoop100 input]# hdfs dfs -tail /clearlove/777.txt
2020-06-19 01:01:47,715 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
clearlove
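`hdfs dfs -help tail` describes the command as showing the last 1KB of the file, so a file shorter than that is printed in full. A minimal local sketch of the same idea follows. The helper name `tail_bytes` and the test files are hypothetical, and the code reads local files only.

```python
import os
import tempfile

def tail_bytes(path, limit=1024):
    """Return at most the last `limit` bytes of a local file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - limit))  # start of file if it is shorter than limit
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "777.txt")
    with open(small, "wb") as f:
        f.write(b"clearlove\n")
    large = os.path.join(tmp, "large.txt")
    with open(large, "wb") as f:
        f.write(b"x" * 2000 + b"clearlove\n")
    small_tail = tail_bytes(small)
    large_tail = tail_bytes(large)

print(small_tail)
print(len(large_tail))
```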
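- -rm: deletes files or directories (directories require -r)
- -rmdir: deletes empty directories
- -du: reports the size of a directory and its files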
[root@hadoop100 input]# hdfs dfs -du /clearlove
15 45 /clearlove/4396.txt
10 30 /clearlove/777.txt
[root@hadoop100 input]# hdfs dfs -du -s /clearlove
25 75 /clearlove
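The first column of the `-du` output is the file size and the second is the disk space consumed across all replicas. The check below confirms that the numbers in the output are consistent with a replication factor of 3. That factor is an assumption here, taken from the replication column of the earlier `-ls` output; the name `raw_usage` is hypothetical.

```python
REPLICATION = 3  # assumed replication factor

def raw_usage(size, replication=REPLICATION):
    """Disk space consumed by a file once every replica is counted."""
    return size * replication

# (size, space consumed, path) rows copied from the -du output
rows = [
    (15, 45, "/clearlove/4396.txt"),
    (10, 30, "/clearlove/777.txt"),
]

for size, consumed, path in rows:
    print(path, raw_usage(size) == consumed)

total_size = sum(size for size, _, _ in rows)
total_consumed = sum(consumed for _, consumed, _ in rows)
print(total_size, total_consumed)
```

The totals match the `-du -s /clearlove` line, which sums both columns over the whole directory.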
- -setrep: sets the replication factor of a file in HDFS
[root@hadoop100 input]# hdfs dfs -setrep 10 /clearlove/777.txt
Replication 10 set: /clearlove/777.txt