1. Hadoop Benchmarks
Hadoop ships with several benchmark programs, packaged in a few jar files. This article focuses on the Cloudera (CDH) distribution.
[root@bd129118 hadoop-0.20-mapreduce]# ls /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop* | egrep "examples|test"
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-2.6.0-mr1-cdh5.8.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-mr1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-mr1.jar
(1) Hadoop Test
[root@bd129118 hadoop-0.20-mapreduce]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar
The output lists all the available test programs:
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: measure hdfs throughput
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Before running the tests, switch to a directory the hdfs user has permission to write to, such as /tmp, because the tests need to write logs there.
(2) TestDFSIO write
TestDFSIO measures the I/O performance of HDFS. It runs a MapReduce job that performs reads and writes in parallel: each map task reads or writes one file, the map outputs collect statistics about the files processed, and the reduce task accumulates those statistics and produces a summary. TestDFSIO's usage is as follows:
TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
a. Write ten 1000 MB files to HDFS:
[root@cdhmaster hadoop-0.20-mapreduce]# su hdfs
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
18/01/05 10:15:03 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/01/05 10:15:03 INFO fs.TestDFSIO: Date & time: Fri Jan 05 10:15:03 CST 2018
18/01/05 10:15:03 INFO fs.TestDFSIO: Number of files: 10
18/01/05 10:15:03 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/01/05 10:15:03 INFO fs.TestDFSIO: Throughput mb/sec: 5.968153930626179
18/01/05 10:15:03 INFO fs.TestDFSIO: Average IO rate mb/sec: 6.171319007873535
18/01/05 10:15:03 INFO fs.TestDFSIO: IO rate std deviation: 1.2151655970099144
18/01/05 10:15:03 INFO fs.TestDFSIO: Test exec time sec: 219.775
18/01/05 10:15:03 INFO fs.TestDFSIO:
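The two rates in this summary are computed differently, which is why they can diverge: Throughput divides the total data volume by the sum of all maps' I/O times, while Average IO rate is the plain mean of each map's individual rate. A sketch with made-up numbers for two hypothetical maps (not taken from the run above):

```shell
# How TestDFSIO derives its two summary rates:
#   Throughput      = sum(MB) / sum(per-map I/O seconds)
#   Average IO rate = mean of each map's own MB/s
awk 'BEGIN {
  size[1] = 1000; secs[1] = 100;   # map 1: 1000 MB in 100 s -> 10 MB/s
  size[2] = 1000; secs[2] = 200;   # map 2: 1000 MB in 200 s ->  5 MB/s
  for (i = 1; i <= 2; i++) { s += size[i]; t += secs[i]; r += size[i] / secs[i] }
  printf "Throughput mb/sec: %.2f\n", s / t       # 2000 / 300 = 6.67
  printf "Average IO rate mb/sec: %.2f\n", r / 2  # (10 + 5) / 2 = 7.50
}'
```

A large gap between the two (and a large IO rate standard deviation) suggests uneven performance across the maps, e.g. a slow node or disk.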
(3) TestDFSIO read
Read the ten 1000 MB files back from HDFS:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
18/01/05 10:21:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
18/01/05 10:21:55 INFO fs.TestDFSIO: Date & time: Fri Jan 05 10:21:55 CST 2018
18/01/05 10:21:55 INFO fs.TestDFSIO: Number of files: 10
18/01/05 10:21:55 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/01/05 10:21:55 INFO fs.TestDFSIO: Throughput mb/sec: 107.43677345881949
18/01/05 10:21:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 198.25967407226562
18/01/05 10:21:55 INFO fs.TestDFSIO: IO rate std deviation: 171.02233976664863
18/01/05 10:21:55 INFO fs.TestDFSIO: Test exec time sec: 36.756
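Reads come out much faster than writes here. One reason, assuming the default replication factor of 3, is that every logical write is pipelined to three replicas, while a read is served from a single replica:

```shell
# Back-of-the-envelope sketch (assumes the default replication factor of 3):
# the 10,000 MB of logical writes above translate into roughly 30,000 MB
# of physical disk and network traffic across the cluster.
logical_mb=10000
replication=3
physical_mb=$((logical_mb * replication))
echo "physical MB written across the cluster: $physical_mb"
```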
(4) Clean up the test data
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar TestDFSIO -clean
2. nnbench
nnbench stresses the NameNode by generating a large number of HDFS-related requests. It can simulate operations such as creating, reading, renaming, and deleting files on HDFS.
Run the following command to see nnbench's usage:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench
Create 1000 files using 12 mappers and 6 reducers:
[hdfs@cdhmaster tmp]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.4.7.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
NameNode Benchmark 0.4
18/01/05 10:39:39 INFO hdfs.NNBench: Test Inputs:
18/01/05 10:41:52 INFO hdfs.NNBench: -------------- NNBench -------------- :
18/01/05 10:41:52 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
18/01/05 10:41:52 INFO hdfs.NNBench: Date & time: 2018-01-05 10:41:52,622
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: Test Operation: create_write
18/01/05 10:41:52 INFO hdfs.NNBench: Start time: 2018-01-05 10:41:39,473
18/01/05 10:41:52 INFO hdfs.NNBench: Maps to run: 12
18/01/05 10:41:52 INFO hdfs.NNBench: Reduces to run: 6
18/01/05 10:41:52 INFO hdfs.NNBench: Block Size (bytes): 1
18/01/05 10:41:52 INFO hdfs.NNBench: Bytes to write: 0
18/01/05 10:41:52 INFO hdfs.NNBench: Bytes per checksum: 1
18/01/05 10:41:52 INFO hdfs.NNBench: Number of files: 1000
18/01/05 10:41:52 INFO hdfs.NNBench: Replication factor: 3
18/01/05 10:41:52 INFO hdfs.NNBench: Successful file operations: 0
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: # maps that missed the barrier: 0
18/01/05 10:41:52 INFO hdfs.NNBench: # exceptions: 0
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
18/01/05 10:41:52 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 0.0
18/01/05 10:41:52 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
18/01/05 10:41:52 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
18/01/05 10:41:52 INFO hdfs.NNBench:
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 0.0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: Late maps: 0
18/01/05 10:41:52 INFO hdfs.NNBench: RAW DATA: # of exceptions: 0
3. mrbench
mrbench runs a small job many times over, to check whether small jobs run repeatably and efficiently on the cluster.
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-test-2.6.0-mr1-cdh5.8.3.jar mrbench -numRuns 50
4. Hadoop Examples
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
(1) TeraSort
The following command runs TeraGen to generate 100,000,000 rows of input data in /examples/terasort-input:
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 100000000 /examples/terasort-input
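Note that TeraGen's first argument is a row count, not a byte count; each generated row is exactly 100 bytes, so the sizing of the command above works out as:

```shell
# TeraGen takes a number of rows; each row is exactly 100 bytes,
# so 100,000,000 rows come to 10,000,000,000 bytes (~9 GiB).
rows=100000000
row_bytes=100
total=$((rows * row_bytes))
echo "total bytes: $total (~$((total / 1024 / 1024 / 1024)) GiB)"
```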
The following command runs TeraSort on that data and writes the sorted result to /examples/terasort-output:
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /examples/terasort-input /examples/terasort-output
Each row produced by TeraGen has the following format:
<10 bytes key><10 bytes rowid><78 bytes filler>\r\n
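To make that layout concrete, the following sketch assembles one row of the same shape using placeholder bytes (TeraGen's real keys are pseudo-random; the content here is hypothetical) and confirms it is exactly 100 bytes long:

```shell
# Build one row in the TeraGen layout with placeholder content:
# 10-byte key + 10-byte row id + 78-byte filler + \r\n = 100 bytes.
key='KKKKKKKKKK'                       # 10-byte stand-in for the random key
rowid=$(printf '%010d' 42)             # 10-byte zero-padded row id
filler=$(printf 'F%.0s' $(seq 1 78))   # 78 filler bytes
len=$(printf '%s%s%s\r\n' "$key" "$rowid" "$filler" | wc -c)
echo "row length: $len bytes"
```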
(2) TeraValidate: verify the output is sorted
hadoop jar /opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teravalidate /examples/terasort-output /examples/terasort-validate
View the validation result:
hadoop fs -cat /examples/terasort-validate/*