Overview
- Purpose
- Use cases
- Example
- Make a file stored in HDFS available on the map/reduce task side (copied to the tasks) so that map/reduce code can use it
Use cases
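- Joining a large file with a small file, e.g. a 10 GB large file and a 10 MB small file, where the two files may even have completely different input formats. The small file is read into memory once per task, while the large file is processed as the job's normal input.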
- Driver (main function) code
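import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf);  // new Job(conf) is deprecated
    // Store the small file's HDFS path under the key "xyz" so every task can read it back
    job.getConfiguration().set("xyz", "fileHdfsLocation");
    // ... set mapper/reducer classes and input/output paths, then submit the job
}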
- Map or reduce class
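import java.io.IOException;
import java.util.HashSet;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.LineReader;

// The input value type must be Text to match map(Object key, Text value, ...);
// xxx stands for your own output key/value types.
public static class LogMapper extends Mapper<Object, Text, xxx, xxx> {
    // Contents of the small file, loaded once per task in setup()
    private HashSet<String> smallCollection = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        smallCollection = new HashSet<String>();
        // Read back the HDFS path that the driver stored under "xyz"
        Path fileIn = new Path(context.getConfiguration().get("xyz"));
        FileSystem hdfs = fileIn.getFileSystem(context.getConfiguration());
        FSDataInputStream hdfsReader = hdfs.open(fileIn);
        LineReader lineReader = new LineReader(hdfsReader);
        Text line = new Text();
        try {
            while (lineReader.readLine(line) > 0) {
                // you can do something here
                System.out.println(line.toString());  // written to the task's stdout log
                smallCollection.add(line.toString());
            }
        } finally {
            lineReader.close();  // also closes the underlying hdfsReader
        }
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // use this HashSet
    }
}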
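The map() body above is left empty. As a minimal, Hadoop-free sketch of what "use this HashSet" could look like, the class below runs a semi-join: it keeps only large-file records whose key appears in the small file. The record layout (join key = first tab-separated field) and the names `SmallFileFilterSketch`, `extractKey` and `filterByKeys` are hypothetical assumptions for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch (no Hadoop dependency) of the per-record logic a map()
// could run against the HashSet that setup() filled from the small file.
class SmallFileFilterSketch {

    // Hypothetical layout: the join key is the first tab-separated field;
    // a line without a tab is treated as a key on its own.
    static String extractKey(String record) {
        int tab = record.indexOf('\t');
        return tab < 0 ? record : record.substring(0, tab);
    }

    // Semi-join: keep only large-file records whose key occurs in the small file.
    static List<String> filterByKeys(Set<String> smallCollection, List<String> largeRecords) {
        List<String> kept = new ArrayList<>();
        for (String record : largeRecords) {
            // Average O(1) lookup per record, so the small file is loaded once
            // per task instead of being re-read for every large-file record
            if (smallCollection.contains(extractKey(record))) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> smallCollection = new HashSet<>(Arrays.asList("u1", "u3"));
        List<String> largeRecords = Arrays.asList("u1\tclick", "u2\tview", "u3\tbuy");
        System.out.println(filterByKeys(smallCollection, largeRecords));
    }
}
```

Inside the real mapper, the loop body would instead check `value.toString()` once per call and emit matches with `context.write(...)`; the lookup itself is the same.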