1. Hadoop Benchmarks
Hadoop ships with several benchmark programs, packaged in a few jar files. This article focuses on the Cloudera (CDH) distribution.
[root@bd129118 hadoop-0.20-mapreduce]# ls /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop* | egrep "examples|test"
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-2.6.0-mr1-cdh5.8.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-mr1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-mr1.jar
(1) Hadoop Test
[root@bd129118 hadoop-0.20-mapreduce]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar
The output lists all the available test programs:
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: measure hdfs throughput
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Before running the tests, switch to a directory the hdfs user has permission to write to, such as /tmp, because the tests need to write logs there.
(2) TestDFSIO write
TestDFSIO measures the I/O performance of HDFS. It runs a MapReduce job that performs reads and writes in parallel: each map task reads or writes one file, the map outputs collect statistics about the files processed, and the reduce task accumulates those statistics and produces a summary. TestDFSIO's usage is as follows:
TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
a. Write ten 1000 MB files to HDFS:
[root@cdhmaster hadoop-0.20-mapreduce]# su hdfs
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
18/01/05 10:15:03 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/01/05 10:15:03 INFO fs.TestDFSIO: Date & time: Fri Jan 05 10:15:03 CST 2018
18/01/05 10:15:03 INFO fs.TestDFSIO: Number of files: 10
18/01/05 10:15:03 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/01/05 10:15:03 INFO fs.TestDFSIO: Throughput mb/sec: 5.968153930626179
18/01/05 10:15:03 INFO fs.TestDFSIO: Average IO rate mb/sec: 6.171319007873535
18/01/05 10:15:03 INFO fs.TestDFSIO: IO rate std deviation: 1.2151655970099144
18/01/05 10:15:03 INFO fs.TestDFSIO: Test exec time sec: 219.775
18/01/05 10:15:03 INFO fs.TestDFSIO:
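The two rates in this summary are computed differently, which is why they can diverge: Throughput divides the total data volume by the sum of all maps' I/O times, while Average IO rate is the plain mean of each map's individual rate. A sketch with made-up numbers for two hypothetical maps (not taken from the run above):

```shell
# How TestDFSIO derives its two summary rates:
#   Throughput      = sum(MB) / sum(per-map I/O seconds)
#   Average IO rate = mean of each map's own MB/s
awk 'BEGIN {
  size[1] = 1000; secs[1] = 100;   # map 1: 1000 MB in 100 s -> 10 MB/s
  size[2] = 1000; secs[2] = 200;   # map 2: 1000 MB in 200 s ->  5 MB/s
  for (i = 1; i <= 2; i++) { s += size[i]; t += secs[i]; r += size[i] / secs[i] }
  printf "Throughput mb/sec: %.2f\n", s / t       # 2000 / 300 = 6.67
  printf "Average IO rate mb/sec: %.2f\n", r / 2  # (10 + 5) / 2 = 7.50
}'
```

A large gap between the two (and a large IO rate standard deviation) suggests uneven performance across the maps, e.g. a slow node or disk.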
(3) TestDFSIO read
Read the ten 1000 MB files back from HDFS:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
18/01/05 10:21:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
18/01/05 10:21:55 INFO fs.TestDFSIO: Date & time: Fri Jan 05 10:21:55 CST 2018
18/01/05 10:21:55 INFO fs.TestDFSIO: Number of files: 10
18/01/05 10:21:55 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/01/05 10:21:55 INFO fs.TestDFSIO: Throughput mb/sec: 107.43677345881949
18/01/05 10:21:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 198.25967407226562
18/01/05 10:21:55 INFO fs.TestDFSIO: IO rate std deviation: 171.02233976664863
18/01/05 10:21:55 INFO fs.TestDFSIO: Test exec time sec: 36.756
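Reads come out much faster than writes here. One reason, assuming the default replication factor of 3, is that every logical write is pipelined to three replicas, while a read is served from a single replica:

```shell
# Back-of-the-envelope sketch (assumes the default replication factor of 3):
# the 10,000 MB of logical writes above translate into roughly 30,000 MB
# of physical disk and network traffic across the cluster.
logical_mb=10000
replication=3
physical_mb=$((logical_mb * replication))
echo "physical MB written across the cluster: $physical_mb"
```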
(4) Clean up the test data
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -clean
2. nnbench
nnbench stresses the NameNode by generating a large number of HDFS-related requests. It can simulate operations such as creating, reading, renaming, and deleting files on HDFS.
Run the following command to see nnbench's usage:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench
Create 1000 files using 12 mappers and 6 reducers:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
18/01/05 10:39:39 INFO hdfs.NNBench: Test Inputs:
18/01/05 10:41:52 INFO hdfs.NNBench: -------------- NNBench -------------- :
18/01/05 10:41:52 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
18/01/05 10:41:52 INFO hdfs.NNBench: Date & time: 2018-01-05 10:41:52,622
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: Test Operation: create_write
18/01/05 10:41:52 INFO hdfs.NNBench: Start time: 2018-01-05 10:41:39,473
18/01/05 10:41:52 INFO hdfs.NNBench: Maps to run: 12
18/01/05 10:41:52 INFO hdfs.NNBench: Reduces to run: 6
18/01/05 10:41:52 INFO hdfs.NNBench: Block Size (bytes): 1
18/01/05 10:41:52 INFO hdfs.NNBench: Bytes to write: 0
18/01/05 10:41:52 INFO hdfs.NNBench: Bytes per checksum: 1
18/01/05 10:41:52 INFO hdfs.NNBench: Number of files: 1000
18/01/05 10:41:52 INFO hdfs.NNBench: Replication factor: 3
18/01/05 10:41:52 INFO hdfs.NNBench: Successful file operations: 0
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: # maps that missed the barrier: 0
18/01/05 10:41:52 INFO hdfs.NNBench: # exceptions: 0
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
18/01/05 10:41:52 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
18/01/05 10:41:52 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
18/01/05 10:41:52 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 0.0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: Late maps: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: # of exceptions: 0
3. mrbench
mrbench runs a small job many times over, to check whether small jobs run repeatably and efficiently on the cluster.
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar mrbench -numRuns 50
4. Hadoop Examples
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
(1) TeraSort
The following command runs TeraGen to generate 100,000,000 rows of input data in /examples/terasort-input:
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 100000000 /examples/terasort-input
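Note that TeraGen's first argument is a row count, not a byte count; each generated row is exactly 100 bytes, so the sizing of the command above works out as:

```shell
# TeraGen takes a number of rows; each row is exactly 100 bytes,
# so 100,000,000 rows come to 10,000,000,000 bytes (~9 GiB).
rows=100000000
row_bytes=100
total=$((rows * row_bytes))
echo "total bytes: $total (~$((total / 1024 / 1024 / 1024)) GiB)"
```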
The following command runs TeraSort on that data and writes the sorted result to /examples/terasort-output:
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /examples/terasort-input /examples/terasort-output
Each row produced by TeraGen has the following format:
<10 bytes key><10 bytes rowid><78 bytes filler>\r\n
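To make that layout concrete, the following sketch assembles one row of the same shape using placeholder bytes (TeraGen's real keys are pseudo-random; the content here is hypothetical) and confirms it is exactly 100 bytes long:

```shell
# Build one row in the TeraGen layout with placeholder content:
# 10-byte key + 10-byte row id + 78-byte filler + \r\n = 100 bytes.
key='KKKKKKKKKK'                       # 10-byte stand-in for the random key
rowid=$(printf '%010d' 42)             # 10-byte zero-padded row id
filler=$(printf 'F%.0s' $(seq 1 78))   # 78 filler bytes
len=$(printf '%s%s%s\r\n' "$key" "$rowid" "$filler" | wc -c)
echo "row length: $len bytes"
```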
(2) TeraValidate: verify the output is sorted
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teravalidate /examples/terasort-output /examples/terasort-validate
View the validation result:
hadoop fs -cat /examples/terasort-validate/*