Hadoop Definitive Guide --- Chapter 3. The Hadoop Distributed Filesystem

The Design of HDFS
HDFS is designed to store massive amounts of data on large clusters of commodity machines.
Very large files
"Very large files" means files that are hundreds of megabytes, gigabytes, or terabytes in size; Hadoop clusters today can store petabytes of data.
Streaming data access
HDFS is built around the most efficient data-processing pattern: write once, read many times. It is therefore optimized for streaming data access.
Commodity hardware
Hadoop does not require expensive, high-performance machines; it was designed from the outset to use clusters of commodity hardware to achieve massive storage and fast computation.

HDFS Concepts
Blocks
An ordinary filesystem on a single disk works with a block size of 512 bytes; the HDFS distributed filesystem uses a block size of 64 MB by default.
To list the blocks that make up each file in the filesystem, run: hadoop fsck / -files -blocks
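
The default can also be overridden for a single upload via the generic -D option; a minimal sketch, assuming Hadoop 2.x (where the property is named dfs.blocksize) and illustrative paths:

  • hdfs dfs -D dfs.blocksize=134217728 -put localfile /user/hadoop/bigfile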

NameNodes and DataNodes
An HDFS cluster consists of a single NameNode and multiple DataNodes.
The namenode manages the filesystem namespace: it maintains the filesystem tree and its metadata. This information is stored persistently on the local disk in two files, the namespace image and the edit log. The datanodes handle storage management on the physical nodes where they run.
Because the namenode is a single point, its failure would bring down the entire cluster, so Hadoop provides the following two mechanisms to address the namenode single-point-of-failure problem.
The first is to back up the files that make up the filesystem metadata to local disk. The second is to run a secondary namenode, which periodically merges the namespace image with the edit log and keeps a copy of the merged image that can be used for recovery when the namenode fails.
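
For illustration, the namenode can also be made to persist a fresh namespace image on demand with dfsadmin; a minimal sketch, assuming superuser privileges (saveNamespace requires the namenode to be in safe mode):

  • hdfs dfsadmin -safemode enter
  • hdfs dfsadmin -saveNamespace
  • hdfs dfsadmin -safemode leave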

HDFS Federation
The namenode keeps the mapping between files and blocks in memory, which means that on a very large cluster the namenode's memory becomes the bottleneck to further scaling. Hadoop 2.x allows multiple namenodes to be added, each managing a different portion of the filesystem:
for example, the first might manage the contents under /usr and the second the contents under /share.
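
Each federated namenode exposes its own URI, so a client addresses whichever namenode manages the part of the namespace it needs. A sketch, assuming hypothetical namenodes nn1 and nn2 managing /usr and /share respectively:

  • hdfs dfs -ls hdfs://nn1.example.com/usr
  • hdfs dfs -ls hdfs://nn2.example.com/share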

HDFS High-Availability
Even with a secondary namenode creating periodic checkpoints, the namenode remains a single point of failure: once it fails, an administrator has to start a new namenode, which must do the following startup work:
load the namespace image into memory, replay the edit log, and collect block reports from the datanodes. On a large cluster, starting a new namenode this way typically takes 30 minutes or more.
Hadoop 2.x introduces a pair of namenodes, active and standby. Architecturally, the pair shares the edit log, and the datanodes send their block reports to both namenodes. The standby can usually take over within tens of seconds.
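
Hadoop 2.x ships an hdfs haadmin tool for inspecting and switching the roles of the pair; a sketch, assuming the two namenodes are configured with the logical IDs nn1 and nn2:

  • hdfs haadmin -getServiceState nn1
  • hdfs haadmin -failover nn1 nn2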

The Command-Line Interface

cat

Usage: hdfs dfs -cat URI [URI …]

Copies source paths to stdout.

Example:

  • hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
  • hdfs dfs -cat file:///file3 /user/hadoop/file4

Exit Code:
Returns 0 on success and -1 on error.

chgrp

Usage: hdfs dfs -chgrp [-R] GROUP URI [URI …]

Change group association of files. With -R, make the change recursively through the directory structure. The user must be the owner of files, or else a super-user. Additional information is in the Permissions Guide.
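
Example (group and path are illustrative):

  • hdfs dfs -chgrp -R hadoop /user/hadoop/dir1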

chmod

Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI …]

Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user. Additional information is in the Permissions Guide.
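
Example (mode and path are illustrative):

  • hdfs dfs -chmod -R 755 /user/hadoop/dir1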

chown

Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI …]

Change the owner of files. With -R, make the change recursively through the directory structure. The user must be a super-user. Additional information is in the Permissions Guide.
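
Example (owner, group, and path are illustrative):

  • hdfs dfs -chown -R hadoop:hadoop /user/hadoop/dir1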

copyFromLocal

Usage: hdfs dfs -copyFromLocal <localsrc> URI

Similar to the put command, except that the source is restricted to a local file reference.
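
Example (paths are illustrative):

  • hdfs dfs -copyFromLocal localfile /user/hadoop/hadoopfile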

copyToLocal

Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to the get command, except that the destination is restricted to a local file reference.
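
Example (paths are illustrative):

  • hdfs dfs -copyToLocal /user/hadoop/hadoopfile localfile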

count

Usage: hdfs dfs -count [-q] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. 

The output columns with -count are:

DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

The output columns with -count -q are:

QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

Example:

  • hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
  • hdfs dfs -count -q hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.

cp

Usage: hdfs dfs -cp URI [URI …] <dest>

Copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.
Example:

  • hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
  • hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

du

Usage: hdfs dfs -du [-s] [-h] URI [URI …]

Displays sizes of files and directories contained in the given directory, or the length of a file in case it's just a file.

Options:

  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files.
  • The -h option will format file sizes in a "human-readable" fashion (e.g., 64.0m instead of 67108864)

Example:

  • hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1

Exit Code:

Returns 0 on success and -1 on error.

dus

Usage: hdfs dfs -dus <args>

Displays a summary of file lengths. This is an alternate form of hdfs dfs -du -s.
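
Example (path is illustrative):

  • hdfs dfs -dus /user/hadoop/dir1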

expunge

Usage: hdfs dfs -expunge

Empty the Trash. Refer to the HDFS Architecture Guide for more information on the Trash feature.
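
Example:

  • hdfs dfs -expunge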

get

Usage: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst> 

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

Example:

  • hdfs dfs -get /user/hadoop/file localfile
  • hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile

Exit Code:

Returns 0 on success and -1 on error.

getmerge

Usage: hdfs dfs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and concatenates files in src into the destination local file. Optionally addnl can be set to enable adding a newline character at the end of each file.
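
Example (paths are illustrative; addnl is optional):

  • hdfs dfs -getmerge /user/hadoop/dir1 ./merged.txt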

ls

Usage: hdfs dfs -ls <args>

For a file returns stat on the file with the following format:

permissions number_of_replicas userid groupid filesize modification_date modification_time filename

For a directory it returns a list of its direct children, as in Unix. A directory is listed as:

permissions userid groupid modification_date modification_time dirname

Example:

  • hdfs dfs -ls /user/hadoop/file1

Exit Code:

Returns 0 on success and -1 on error.

lsr

Usage: hdfs dfs -lsr <args>

Recursive version of ls. Similar to Unix ls -R.
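
Example (path is illustrative):

  • hdfs dfs -lsr /user/hadoop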

mkdir

Usage: hdfs dfs -mkdir <paths> 

Takes path URIs as arguments and creates directories. The behavior is much like Unix mkdir -p, creating parent directories along the path.

Example:

  • hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
  • hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

moveFromLocal

Usage: hdfs dfs -moveFromLocal <localsrc> <dst>

Similar to the put command, except that the source localsrc is deleted after it's copied.
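
Example (paths are illustrative):

  • hdfs dfs -moveFromLocal localfile /user/hadoop/hadoopfile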

moveToLocal

Usage: hdfs dfs -moveToLocal [-crc] <src> <dst>

Displays a "Not implemented yet" message.

mv

Usage: hdfs dfs -mv URI [URI …] <dest>

Moves files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across filesystems is not permitted.
Example:

  • hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
  • hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

Exit Code:

Returns 0 on success and -1 on error.

put

Usage: hdfs dfs -put <localsrc> ... <dst>

Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.

  • hdfs dfs -put localfile /user/hadoop/hadoopfile
  • hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
  • hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
  • hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile 
    Reads the input from stdin.

Exit Code:

Returns 0 on success and -1 on error.

rm

Usage: hdfs dfs -rm [-skipTrash] URI [URI …]

Delete files specified as args. Only deletes files. If the -skipTrash option is specified, the trash, if enabled, will be bypassed and the specified file(s) deleted immediately. This can be useful when it is necessary to delete files from an over-quota directory. Refer to rmr for recursive deletes.
Example:

  • hdfs dfs -rm hdfs://nn.example.com/file

Exit Code:

Returns 0 on success and -1 on error.

rmr

Usage: hdfs dfs -rmr [-skipTrash] URI [URI …]

Recursive version of delete. The rmr command recursively deletes the directory and any content under it. If the -skipTrash option is specified, the trash, if enabled, will be bypassed and the specified file(s) deleted immediately. This can be useful when it is necessary to delete files from an over-quota directory.
Example:

  • hdfs dfs -rmr /user/hadoop/dir
  • hdfs dfs -rmr hdfs://nn.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

setrep

Usage: hdfs dfs -setrep [-R] [-w] <rep> <path>

Changes the replication factor of a file. The -R option recursively changes the replication factor of files within a directory; the -w option waits for the replication to complete.

Example:

  • hdfs dfs -setrep -w 3 -R /user/hadoop/dir1

Exit Code:

Returns 0 on success and -1 on error.

stat

Usage: hdfs dfs -stat URI [URI …]

Returns the stat information on the path.

Example:

  • hdfs dfs -stat path

Exit Code:
Returns 0 on success and -1 on error.

tail

Usage: hdfs dfs -tail [-f] URI

Displays the last kilobyte of the file to stdout. The -f option can be used as in Unix.

Example:

  • hdfs dfs -tail pathname

Exit Code: 
Returns 0 on success and -1 on error.

test

Usage: hdfs dfs -test -[ezd] URI

Options: 
  • -e check to see if the file exists. Return 0 if true.
  • -z check to see if the file is zero length. Return 0 if true.
  • -d check to see if the path is a directory. Return 0 if true.

Example:

  • hdfs dfs -test -e filename

text

Usage: hdfs dfs -text <src> 

Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.
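
Example (path is illustrative, assuming a SequenceFile rendered via TextRecordInputStream):

  • hdfs dfs -text /user/hadoop/file.seq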

touchz

Usage: hdfs dfs -touchz URI [URI …] 

Create a file of zero length.

Example:

  • hdfs dfs -touchz pathname

Exit Code:
Returns 0 on success and -1 on error.
