A simple test program for compressing and decompressing a file with Hadoop:
package org.myorg;

import java.io.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        // Instantiate the codec named on the command line via reflection.
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        // Compress the data in str and write it to the file "text".
        CompressionOutputStream out = codec.createOutputStream(new FileOutputStream(new File("text")));
        String str = "try compress and decompress";
        byte[] bytes = str.getBytes();
        out.write(bytes);
        out.finish();
        out.close();

        // Decompress the data in the "text" file and print it to the console.
        InputStream in = codec.createInputStream(new FileInputStream(new File("text")));
        BufferedInputStream bfin = new BufferedInputStream(in);
        byte[] buf = new byte[1024];
        int n = bfin.read(buf);  // read() may return fewer bytes than the buffer holds
        System.out.println(new String(buf, 0, n));
        bfin.close();
    }
}
1. The value of args[0] is: org.apache.hadoop.io.compress.GzipCodec
2. First create a compression output stream out, write the data ("try compress and decompress") to it, then close the stream. At this point the compressed data has been written to the file "text".
3. Open an input stream on the "text" file to decompress it, then print the result. The output appears below, along with some warning messages whose cause is not yet clear.
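The same compress-then-decompress round trip can be sketched without any Hadoop dependencies using the JDK's built-in GZIP streams from java.util.zip, which is handy for checking the logic when a Hadoop environment is not at hand. The class and method names here (GzipRoundTrip, roundTrip) are purely illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a string with GZIP, then decompress it and return the result.
    static String roundTrip(String input) throws IOException {
        // Compress: analogous to codec.createOutputStream(...) in the Hadoop version.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(input.getBytes("UTF-8"));
        } // close() flushes the stream and writes the GZIP trailer

        // Decompress: analogous to codec.createInputStream(...).
        ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[1024];
            int n;
            // Loop until EOF: a single read() is not guaranteed to return all the data.
            while ((n = gz.read(buf)) != -1) {
                decompressed.write(buf, 0, n);
            }
        }
        return new String(decompressed.toByteArray(), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("try compress and decompress"));
    }
}
```

Note the read loop: relying on a single read() call, as the Hadoop test above does, only works for very small payloads.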
