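Modifying HiBench to support reduce-side compression
- Background
When stress-testing an HDFS cluster with mapreduce.output.fileoutputformat.compress
set to true, HiBench fails with an error saying that a file cannot be read.
Two places need to be changed: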
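- When the run results are analysed, HiBench reads the file part-00000. With the parameter from the Background section set to true, however, that file is compressed and actually exists as part-00000.snappy.
The original source code in TestDFSIOEnh.java is as follows: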
Path reduceFile;
if (testType == TEST_TYPE_WRITE)
    reduceFile = new Path(DfsioeConfig.getInstance().getWriteDir(fsConfig), "part-00000");
else
    reduceFile = new Path(DfsioeConfig.getInstance().getReadDir(fsConfig), "part-00000");
There are two approaches:
- 1. Decompress the input stream while reading
/**
* Original code
*/
BufferedReader rd = null;
long tasks = 0;
long size = 0;
long time = 0;
float rate = 0;
float sqrate = 0;
float loggingTime = 0;
try {
rd = new BufferedReader(new InputStreamReader(new DataInputStream(fs.open(reduceFile))));
/**
* Modified
*/
FSDataInputStream in = null;
BufferedReader lines = null;
// Compressed reduce output (the read directory is shown here)
Path compressedFile = new Path(DfsioeConfig.getInstance().getReadDir(fsConfig), "part-00000.snappy");
CompressionCodecFactory factory = new CompressionCodecFactory(fsConfig);
// getCodec() picks the codec from the file extension (.snappy)
CompressionCodec codec = factory.getCodec(compressedFile);
try {
// fs.open() already returns an FSDataInputStream, so no extra wrapping is needed
in = fs.open(compressedFile);
- 2. After part-00000.snappy has been written, decompress it to produce part-00000
/**
* Modified
*/
// Compressed reduce output written by the job
Path outReducefile = new Path(DfsioeConfig.getInstance().getWriteDir(fsConfig), "part-00000.snappy");
FSDataOutputStream out;
CompressionCodecFactory factory = new CompressionCodecFactory(fsConfig);
// The codec is inferred from the .snappy extension
CompressionCodec codec = factory.getCodec(outReducefile);
FSDataInputStream in = fs.open(outReducefile);
CompressionInputStream comInStream = codec.createInputStream(in);
// Write the decompressed bytes to the plain part-00000 path
// (assumed here to be the reduceFile variable from the original code)
out = fs.create(reduceFile);
IOUtils.copyBytes(comInStream, out, 1024 * 1024 * 5, false);
out.flush();
comInStream.close();
in.close();
out.close();
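The two approaches can be tried out locally without a Hadoop cluster. The following is only a minimal sketch under stated assumptions: the JDK ships no Snappy codec, so GZIP and the .gz extension stand in for Snappy and .snappy, and local files stand in for HDFS. The class ReduceOutputGzipDemo and its methods openText, decompressToPlain and writeGzip are hypothetical names used for illustration; in HiBench the corresponding roles are played by CompressionCodecFactory.getCodec(path) and codec.createInputStream(in).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** JDK-only analogue of the two approaches; GZIP stands in for Snappy (assumption). */
final class ReduceOutputGzipDemo {
    static final String EXT = ".gz";

    /** Approach 1: return a reader that decompresses on the fly when the name ends in ".gz". */
    static BufferedReader openText(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        try {
            InputStream in = file.getFileName().toString().endsWith(EXT)
                    ? new GZIPInputStream(raw)  // plays the role of codec.createInputStream(in)
                    : raw;                      // unknown extension: read the bytes as they are
            return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            raw.close();                        // do not leak the file handle on a bad header
            throw e;
        }
    }

    /** Approach 2: decompress "name.gz" into a plain sibling file "name" and return its path. */
    static Path decompressToPlain(Path compressed) throws IOException {
        String name = compressed.getFileName().toString();
        if (!name.endsWith(EXT)) {
            throw new IllegalArgumentException("not a " + EXT + " file: " + compressed);
        }
        Path plain = compressed.resolveSibling(name.substring(0, name.length() - EXT.length()));
        try (InputStream raw = Files.newInputStream(compressed);
             InputStream in = new GZIPInputStream(raw)) {
            Files.copy(in, plain, StandardCopyOption.REPLACE_EXISTING);
        }
        return plain;
    }

    /** Helper for the demo: writes text through GZIP compression. */
    static void writeGzip(Path file, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reduce-output-demo");
        Path compressed = dir.resolve("part-00000" + EXT);
        writeGzip(compressed, "line one\nline two\n");

        try (BufferedReader rd = openText(compressed)) {   // approach 1
            System.out.println(rd.readLine());
        }
        Path plain = decompressToPlain(compressed);         // approach 2
        System.out.println(Files.readAllLines(plain, StandardCharsets.UTF_8));
    }
}
```

Approach 1 leaves the data on disk untouched and only changes the reader, while approach 2 costs one extra copy but lets the unmodified analysis code keep reading part-00000. The sketch uses try-with-resources so that the streams are closed even when decompression fails.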
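- When the merge is triggered, a copyMerge operation is run on the files under the reports directory. copyMerge copies the raw bytes without decompressing them, so the file that is read afterwards is garbled. Since Snappy does not support splitting, the file produced by copyMerge has exactly the same content as the original file, so the input stream is modified directly instead: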
/**
* Original code
*/
FileUtil.copyMerge(fs, DfsioeConfig.getInstance().getReportDir(fsConfig), fs, DfsioeConfig.getInstance().getReportTmp(fsConfig), false, fsConfig, null);
LOG.info("remote report file " + DfsioeConfig.getInstance().getReportTmp(fsConfig) + " merged.");
BufferedReader lines = new BufferedReader(new InputStreamReader(new DataInputStream(fs.open(DfsioeConfig.getInstance().getReportTmp(fsConfig)))));
/**
* Modified
*/
CompressionCodec coec = factory.getCodec(new Path(DfsioeConfig.getInstance().getReportDir(fsConfig),"part-r-00000.snappy"));
FSDataInputStream in2 = fs.open(new Path(DfsioeConfig.getInstance().getReportDir(fsConfig),"part-r-00000.snappy"));
CompressionInputStream comInStream1 = coec.createInputStream(in2);
BufferedReader lines = new BufferedReader(new InputStreamReader(comInStream1));
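The modified code reads the single file part-r-00000.snappy. If the reports directory ever held more than one part file, the same idea could be extended to a merge that decompresses each part while reading it. The following is only a sketch of that idea under stated assumptions, not HiBench code: GZIP from the JDK stands in for Snappy, a local directory stands in for HDFS, and CompressedReportMerge, readAllParts and writeGzip are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** JDK-only sketch of a merge that decompresses part files; GZIP stands in for Snappy. */
final class CompressedReportMerge {

    /** Reads every "part-*" file in dir in name order, decompressing those ending in ".gz". */
    static List<String> readAllParts(Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);               // marker files such as _SUCCESS do not match the glob
            }
        }
        Collections.sort(parts);            // directory listing order is not guaranteed

        List<String> lines = new ArrayList<>();
        for (Path p : parts) {
            boolean compressed = p.getFileName().toString().endsWith(".gz");
            try (InputStream raw = Files.newInputStream(p);
                 InputStream in = compressed ? new GZIPInputStream(raw) : raw;
                 BufferedReader rd = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = rd.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    /** Helper for the demo: writes text through GZIP compression. */
    static void writeGzip(Path file, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("report-merge-demo");
        writeGzip(dir.resolve("part-r-00000.gz"), "first\n");
        writeGzip(dir.resolve("part-r-00001.gz"), "second\n");
        System.out.println(readAllParts(dir));
    }
}
```

Unlike a byte-level concatenation, this returns plain text regardless of how many compressed parts exist, at the cost of holding all lines in memory.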
After these changes, the test passed.