Hadoop/HDFS Benchmarking

This article covers Hadoop's built-in benchmarking tools: TestDFSIO for measuring HDFS I/O performance, nnbench for stressing the NameNode, and mrbench for checking small-job efficiency. It also walks through a TeraSort example.

1. Hadoop Benchmarks

Hadoop ships with several benchmarks, packaged in a couple of jar files. The examples in this article were run on Cloudera (CDH) clusters; note that they come from more than one CDH release, so the jar paths and versions vary slightly between commands.

 

[root@bd129118 hadoop-0.20-mapreduce]# ls /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop* | egrep "examples|test"

 

/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-2.6.0-mr1-cdh5.8.3.jar

/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar

/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-mr1.jar

/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar

/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-mr1.jar

 

 

(1) Hadoop Test

 

[root@bd129118 hadoop-0.20-mapreduce]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar

Running the jar without arguments lists all of the test programs:

An example program must be given as the first argument.

Valid program names are:

  DFSCIOTest: Distributed i/o benchmark of libhdfs.

  DistributedFSCheck: Distributed checkup of the file system consistency.

  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures

  TestDFSIO: Distributed i/o benchmark.

  dfsthroughput: measure hdfs throughput

  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)

  loadgen: Generic map/reduce load generator

  mapredtest: A map/reduce test check.

  minicluster: Single process HDFS and MR cluster.

  mrbench: A map/reduce benchmark that can create many small jobs

  nnbench: A benchmark that stresses the namenode.

  testarrayfile: A test for flat files of binary key/value pairs.

  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce

  testfilesystem: A test for FileSystem read/write.

  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.

  testrpc: A test for rpc.

  testsequencefile: A test for flat files of binary key value pairs.

  testsequencefileinputformat: A test for sequence file input format.

  testsetfile: A test for flat files of binary key/value pairs.

  testtextinputformat: A test for text input format.

  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

 

Before running the tests, switch to a local directory that the hdfs user can write to, such as /tmp, because the benchmarks write their log/results file to the current working directory.
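For example (a minimal sketch, assuming the hdfs service account and /tmp as the working directory, matching the prompts in the runs below):

[root@cdhmaster ~]# su hdfs
[hdfs@cdhmaster root]$ cd /tmp
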

(2) TestDFSIO write

TestDFSIO measures HDFS I/O performance. It runs a MapReduce job that performs reads or writes in parallel: each map task reads or writes one file, the map output collects the per-file statistics, and the reduce task accumulates them into a summary. TestDFSIO usage:

TestDFSIO

Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

 

a. Write 10 files of 1000 MB each to HDFS:

[root@cdhmaster hadoop-0.20-mapreduce]# su hdfs

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

 

18/01/05 10:15:03 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write

18/01/05 10:15:03 INFO fs.TestDFSIO:            Date & time: Fri Jan 05 10:15:03 CST 2018

18/01/05 10:15:03 INFO fs.TestDFSIO:        Number of files: 10

18/01/05 10:15:03 INFO fs.TestDFSIO: Total MBytes processed: 10000.0

18/01/05 10:15:03 INFO fs.TestDFSIO:      Throughput mb/sec: 5.968153930626179

18/01/05 10:15:03 INFO fs.TestDFSIO: Average IO rate mb/sec: 6.171319007873535

18/01/05 10:15:03 INFO fs.TestDFSIO:  IO rate std deviation: 1.2151655970099144

18/01/05 10:15:03 INFO fs.TestDFSIO:     Test exec time sec: 219.775

18/01/05 10:15:03 INFO fs.TestDFSIO:
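
After a run, the summary above is also appended to a results file in the local working directory, and the test files themselves live in HDFS. Assuming the defaults (TestDFSIO_results.log as the results file and /benchmarks/TestDFSIO as the HDFS base directory; the results file can be changed with -resFile), you can inspect them with:

[hdfs@cdhmaster tmp]$ cat TestDFSIO_results.log
[hdfs@cdhmaster tmp]$ hadoop fs -ls /benchmarks/TestDFSIO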

 

(3) TestDFSIO read

Read the 10 files of 1000 MB each back from HDFS:

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

 

18/01/05 10:21:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read

18/01/05 10:21:55 INFO fs.TestDFSIO:            Date & time: Fri Jan 05 10:21:55 CST 2018

18/01/05 10:21:55 INFO fs.TestDFSIO:        Number of files: 10

18/01/05 10:21:55 INFO fs.TestDFSIO: Total MBytes processed: 10000.0

18/01/05 10:21:55 INFO fs.TestDFSIO:      Throughput mb/sec: 107.43677345881949

18/01/05 10:21:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 198.25967407226562

18/01/05 10:21:55 INFO fs.TestDFSIO:  IO rate std deviation: 171.02233976664863

18/01/05 10:21:55 INFO fs.TestDFSIO:     Test exec time sec: 36.756
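
A note on reading these two metrics (this reflects my understanding of how TestDFSIO aggregates its per-file measurements, not an authoritative specification): "Throughput mb/sec" divides the total data volume by the sum of all map task I/O times, while "Average IO rate mb/sec" averages each file's individual MB/s rate, and the standard deviation is computed over those per-file rates. Roughly:

Throughput      = sum(size_i) / sum(time_i)
Average IO rate = (1/N) * sum(size_i / time_i)

This is why a few fast reads (for example, reads served from local replicas) can pull the average IO rate well above the overall throughput, as in the read results above.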

 

 

(4) Clean up the test data

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -clean
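
To confirm the benchmark data has been removed (again assuming the default /benchmarks/TestDFSIO location):

[hdfs@cdhmaster tmp]$ hadoop fs -ls /benchmarks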

 

 

2. The nnbench Test

nnbench stresses the NameNode by generating a large number of HDFS requests. It can simulate creating, opening/reading, renaming, and deleting files on HDFS.

Run the following command without arguments to see the nnbench usage:

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench

 

Create 1000 files using 12 mappers and 6 reducers:

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`

 

NameNode Benchmark 0.4

18/01/05 10:39:39 INFO hdfs.NNBench: Test Inputs:

18/01/05 10:41:52 INFO hdfs.NNBench: -------------- NNBench -------------- :

18/01/05 10:41:52 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4

18/01/05 10:41:52 INFO hdfs.NNBench:                            Date & time: 2018-01-05 10:41:52,622

18/01/05 10:41:52 INFO hdfs.NNBench:

18/01/05 10:41:52 INFO hdfs.NNBench:                         Test Operation: create_write

18/01/05 10:41:52 INFO hdfs.NNBench:                             Start time: 2018-01-05 10:41:39,473

18/01/05 10:41:52 INFO hdfs.NNBench:                            Maps to run: 12

18/01/05 10:41:52 INFO hdfs.NNBench:                         Reduces to run: 6

18/01/05 10:41:52 INFO hdfs.NNBench:                     Block Size (bytes): 1

18/01/05 10:41:52 INFO hdfs.NNBench:                         Bytes to write: 0

18/01/05 10:41:52 INFO hdfs.NNBench:                     Bytes per checksum: 1

18/01/05 10:41:52 INFO hdfs.NNBench:                        Number of files: 1000

18/01/05 10:41:52 INFO hdfs.NNBench:                     Replication factor: 3

18/01/05 10:41:52 INFO hdfs.NNBench:             Successful file operations: 0

18/01/05 10:41:52 INFO hdfs.NNBench:

18/01/05 10:41:52 INFO hdfs.NNBench:         # maps that missed the barrier: 0

18/01/05 10:41:52 INFO hdfs.NNBench:                           # exceptions: 0

18/01/05 10:41:52 INFO hdfs.NNBench:

18/01/05 10:41:52 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0

18/01/05 10:41:52 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0

18/01/05 10:41:52 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN

18/01/05 10:41:52 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN

18/01/05 10:41:52 INFO hdfs.NNBench:

18/01/05 10:41:52 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0

18/01/05 10:41:52 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0

18/01/05 10:41:52 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 0

18/01/05 10:41:52 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 0.0

18/01/05 10:41:52 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0

18/01/05 10:41:52 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 0
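
nnbench also supports open_read, rename, and delete operations (as listed in its usage output). For example, a follow-up read-load test against the files created above could look like this (a sketch reusing the same parameters, not a tuned configuration):

[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench -operation open_read -maps 12 -reduces 6 -numberOfFiles 1000 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`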

 

 

3. The mrbench Test

mrbench runs a small job many times over, to check whether small jobs run repeatably and efficiently on the cluster.

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar mrbench -numRuns 50
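
mrbench also accepts options such as -maps, -reduces, -inputLines and -inputType to shape the generated job (confirm against the usage output of your version). A hypothetical variant:

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar mrbench -numRuns 20 -maps 10 -reduces 5 -inputLines 1000 -inputType random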

 

4. Hadoop Examples

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar

 

(1) TeraSort

First generate the input data with TeraGen. The following command writes 100,000,000 rows (100 bytes each, roughly 10 GB in total) to /examples/terasort-input:

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 100000000 /examples/terasort-input
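
Optionally, check the size of the generated input (before HDFS replication it should come to about 10 GB):

hadoop fs -du -s -h /examples/terasort-input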

Each row produced by TeraGen has the following format:

<10 bytes key><10 bytes rowid><78 bytes filler>\r\n

The following command runs TeraSort on the data and writes the sorted result to /examples/terasort-output:

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /examples/terasort-input /examples/terasort-output

 

(2) TeraValidate: verify the output is sorted

hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teravalidate /examples/terasort-output /examples/terasort-validate

View the validation result (if the data is sorted correctly, the output typically contains only a checksum record; out-of-order keys would be reported as errors):

hadoop fs -cat /examples/terasort-validate/*
