I. hadoop commands
Official documentation: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
[ruoze@rzdata001 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[ruoze@rzdata001 ~]$
hadoop -help
Show command help.
hadoop version
Show the Hadoop version.
[ruoze@rzdata001 ~]$ hadoop version
Hadoop 2.6.0-cdh5.16.2
Subversion http://github.com/cloudera/hadoop -r 4f94d60caa4cbb9af0709a2fd96dc3861af9cf20
Compiled by jenkins on 2019-06-03T10:42Z
Compiled with protoc 2.5.0
From source with checksum 79b9b24a29c6358b53597c3b49575e37
This command was run using /home/ruoze/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/hadoop-common-2.6.0-cdh5.16.2.jar
[ruoze@rzdata001 ~]$
hadoop jar
Run a jar file.
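# Arguments, in order: the examples jar, the program to run from it (wordcount),
# the HDFS input directory (must already exist) and the HDFS output directory
# (must not exist yet; the job fails if it does).
# A comment cannot follow a trailing "\", because the backslash would then escape the space instead of the newline.
hadoop jar \
~/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar \
wordcount \
/user/ruoze/data/wordcount/input \
/user/ruoze/data/wordcount/output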
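A MapReduce job refuses to start when its output directory already exists, so re-running the example usually means clearing the old output first. Below is a minimal sketch under that assumption. `rerun_wordcount` is a hypothetical helper name, not part of Hadoop, and it reuses only the `-rm` and `jar` commands described in this section.

```shell
# Hypothetical helper (not part of Hadoop): delete the old output directory,
# then run the wordcount example again.
# Usage: rerun_wordcount <examples_jar> <hdfs_input_dir> <hdfs_output_dir>
rerun_wordcount() {
  local jar="$1" input="$2" output="$3"
  # -r: delete recursively; -f: no error if the directory does not exist.
  # Stop here on a real failure (for example, missing permissions).
  hadoop fs -rm -r -f "$output" || return 1
  # The output directory must not exist when the job starts.
  hadoop jar "$jar" wordcount "$input" "$output"
}
```

Example: `rerun_wordcount ~/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar /user/ruoze/data/wordcount/input /user/ruoze/data/wordcount/output`. Without `-skipTrash`, the old output is moved to the trash if trash is enabled on the cluster.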
hadoop distcp
Copy in parallel (distcp runs the copy as a MapReduce job).
hadoop distcp -m 100 src_hdfs_path tar_hdfs_path
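-m sets the maximum number of simultaneous copies (map tasks). Note: the source's top-level directory may not be reproduced under the target. For example, with -update or -overwrite, distcp copies the contents of the source directory into the target, not the directory itself.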
hadoop archive
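Archive small files into a Hadoop archive (.har) to reduce the load on the NameNode, which keeps the metadata for every file and block in memory.
hadoop fs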
[ruoze@rzdata001 ~]$ hadoop fs -help
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
Commonly used commands:
hadoop fs -ls
List directories or show file information on HDFS.
hadoop fs -touchz
Create an empty file on HDFS.
hadoop fs -cp
Copy.
hadoop fs -mv
Move.
hadoop fs -rm
Delete files on HDFS (-r deletes directories recursively; -f forces the delete and does not report an error when the file does not exist).
hadoop fs -mkdir
Create directories on HDFS (-p also creates missing parent directories).
hadoop fs -put
Upload files from the local client to HDFS.
hadoop fs -get
Download files from HDFS to the local client.
hadoop fs -cat
Display the contents of a file on HDFS.
hadoop fs -appendToFile
Append one or more local files to a file on HDFS. A source of - reads from stdin.
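Because -appendToFile accepts - as its source and then reads from stdin, text can be appended without first writing a local file. This is a minimal sketch. `append_line` is a hypothetical helper name, and the target path in the usage line is only illustrative.

```shell
# Hypothetical helper (not part of Hadoop): append one line of text to an HDFS file,
# passing it through stdin ("-") instead of a local file.
# Usage: append_line <text> <hdfs_file>
append_line() {
  printf '%s\n' "$1" | hadoop fs -appendToFile - "$2"
}
```

Example: `append_line "hello hadoop" /user/ruoze/data/log.txt`.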
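hadoop fs -test
Check a path. The result is reported only through the exit status (0 means the condition holds):
- -d: returns 0 if the path is a directory.
- -e: returns 0 if the path exists, e.g. hadoop fs -test -e filename
- -f: returns 0 if the path is a file.
- -s: returns 0 if the path is not empty.
- -r: returns 0 if the path exists and read permission is granted.
- -w: returns 0 if the path exists and write permission is granted.
- -z: returns 0 if the file is zero length.
The 2.6.0-cdh5.16.2 usage listed above shows only -[defsz], so -r and -w may not be available on that version.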
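-test prints nothing, so scripts branch on its exit status instead. This is a minimal sketch using only -d and -f, which appear in the 2.6 usage. `hdfs_path_state` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Hadoop): classify an HDFS path as a directory,
# a file, or missing, using only the exit status of `hadoop fs -test`.
hdfs_path_state() {
  if hadoop fs -test -d "$1"; then
    echo directory
  elif hadoop fs -test -f "$1"; then
    echo file
  else
    # Neither check succeeded: the path does not exist (or the check itself failed).
    echo missing
  fi
}
```

Example: `[ "$(hdfs_path_state /user/ruoze/data/wordcount/output)" = directory ] && echo "output already exists"`.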
hadoop fs -text
Take a source file and output it in text format. The allowed formats are zip and TextRecordInputStream.
hadoop fs -du
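Show the space used by files and directories. For example, hadoop fs -du -s -h <path> prints a single summary line (-s) in human-readable units (-h).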
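hadoop fs -count
Count the directories, files and bytes under a path. The output columns are directory count, file count, content size and path.
- Two concepts: to keep files reliable, a distributed file system usually stores several replicas of each file (typically 3). Whenever the replication factor is not 1, the physical space used is therefore a multiple of the logical space.
HDFS physical space = logical space * block replication factor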
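Concept | Explanation |
---|---|
Logical space | the actual size of the file on the distributed file system |
Physical space | the space the file actually occupies in the distributed file system, counting all replicas |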
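The formula can be written as a one-line calculation. This is a minimal sketch. `physical_bytes` is a hypothetical helper name, and the replication factor of 3 in the example is the common default (dfs.replication), not a fixed value.

```shell
# Hypothetical helper (not part of Hadoop): physical space = logical space * replication factor
# Usage: physical_bytes <logical_bytes> <replication_factor>
physical_bytes() {
  echo $(( $1 * $2 ))
}

# A 1 GiB file (1073741824 bytes) stored with 3 replicas occupies 3 GiB of raw disk.
physical_bytes 1073741824 3   # prints 3221225472
```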
hadoop fs -count -q
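Outputs 8 columns:
File quota (name quota: max number of files + directories) | Remaining file quota | Physical space quota | Remaining physical space quota | Directory count | File count | Logical content size | Path |
---|---|---|---|---|---|---|---|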
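The eight unlabeled columns are easier to read once they are named. This minimal sketch labels each column of `hadoop fs -count -q` output read from stdin. `label_count_q` is a hypothetical helper name. It assumes whitespace-separated columns and a path without spaces. When no quota is set, the quota columns show none and inf.

```shell
# Hypothetical helper (not part of Hadoop): print each column of
# `hadoop fs -count -q` output as name=value, one per line.
# Assumes whitespace-separated columns and a path that contains no spaces.
label_count_q() {
  awk '{
    printf "name_quota=%s\nremaining_name_quota=%s\n", $1, $2
    printf "space_quota=%s\nremaining_space_quota=%s\n", $3, $4
    printf "dir_count=%s\nfile_count=%s\n", $5, $6
    printf "content_size=%s\npath=%s\n", $7, $8
  }'
}
```

Example: `hadoop fs -count -q /user/ruoze/data | label_count_q`.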
hadoop fs -chmod
Change permissions; -R changes them recursively.
hadoop fs -chown
Change the owning user; -R changes it recursively.
hadoop fs -chgrp
Change the owning group; -R changes it recursively.