1. Getting the full list of options
[hadoop@node01 ~]$ hdfs fsck
Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-maintenance]
<path> start checking from this path
-move move corrupted files to /lost+found
-delete delete corrupted files
-files print out files being checked
-openforwrite print out files opened for write
-includeSnapshots include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
-list-corruptfileblocks print out list of missing blocks and files they belong to
-blocks print out block report
-locations print out locations for every block
-racks print out network topology for data-node locations
-maintenance print out maintenance state node details
-blockId print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
Please Note:
1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status
2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
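The generic options above (such as -fs) can be combined with fsck's own flags. Since fsck needs a live cluster, the runnable part of the sketch below parses a sample of the summary format shown later in this article instead; the commented invocations and the NameNode address are illustrative, not verified against a cluster:

```shell
# Illustrative fsck invocations (require a running HDFS cluster, so
# they are shown here only as comments; paths/addresses are examples):
#   hdfs fsck / -list-corruptfileblocks          # list missing blocks cluster-wide
#   hdfs fsck /data -files -blocks -locations    # per-file block report under /data
#   hdfs fsck -fs hdfs://node01:8020 /           # pick the NameNode via a generic option

# Runnable part: extract the overall status from a captured fsck report.
report='Status: HEALTHY
 Corrupt blocks:		0
 Missing replicas:		0'
status=$(printf '%s\n' "$report" | awk -F': *' '/^Status:/ {print $2}')
echo "$status"
```

On a real cluster you would replace the hard-coded `report` with `report=$(hdfs fsck / 2>/dev/null)`.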
2. Getting a file's block and location information
[hadoop@node01 ~]$ hdfs fsck /a.txt -files -blocks -locations
Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&path=%2Fa.txt
FSCK started by hadoop (auth:SIMPLE) from /192.168.52.100 for path /a.txt at Tue Jan 07 21:14:12 CST 2020
/a.txt 0 bytes, 0 block(s): OK
Status: HEALTHY
Total size: 0 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jan 07 21:14:12 CST 2020 in 5 milliseconds
The filesystem under path '/a.txt' is HEALTHY
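A summary report like the one above is easy to turn into an automated health check. A minimal sketch, using the sample output from this article (on a real cluster you would capture `hdfs fsck /` instead of the hard-coded string):

```shell
# Sample fsck summary, matching the format shown above; replace with
#   report=$(hdfs fsck / 2>/dev/null)
# on a live cluster.
report='Status: HEALTHY
 Total blocks (validated):	0
 Corrupt blocks:		0
 Missing replicas:		0
The filesystem under path '\''/'\'' is HEALTHY'

# Pull the corrupt-block count and fail loudly if anything is wrong.
corrupt=$(printf '%s\n' "$report" | awk '/Corrupt blocks:/ {print $NF}')
if printf '%s\n' "$report" | grep -q 'is HEALTHY' && [ "$corrupt" -eq 0 ]; then
  echo "OK: filesystem healthy, $corrupt corrupt blocks"
else
  echo "ALERT: filesystem problems detected" >&2
fi
```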
This article walked through the HDFS fsck command: how to list all of its options and how to retrieve a file's block and location details. fsck checks the health of an HDFS filesystem, reporting missing blocks and corrupt files and giving a comprehensive overview of file status.