Ways to write & read HDFS files
- Output Stream
//Writer: assumes fs is an initialized FileSystem, e.g. FileSystem.get(conf)
FSDataOutputStream dos = fs.create(new Path("/user/tmp"), true);
dos.writeInt(counter); // writes the counter as a raw big-endian int
dos.close();
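Reading that value back is symmetric: fs.open returns an FSDataInputStream (a DataInputStream subclass), so readInt pairs with the writeInt above. A minimal sketch, assuming the same /user/tmp path and a Configuration in scope:
//Reader
FileSystem fs = FileSystem.get(conf);
FSDataInputStream dis = fs.open(new Path("/user/tmp"));
int counter = dis.readInt(); // pairs with dos.writeInt(counter)
dis.close();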
- Buffered Writer/Reader
//Writer
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fs.create(new Path("/user/tmp"), true)));
bw.write(String.valueOf(counter)); // counter is a primitive int, so convert explicitly
bw.close();
//Reader
Configuration conf = context.getConfiguration();
FileSystem fs = FileSystem.get(conf);
// fs.open already returns a DataInputStream subclass, so no extra wrapper is needed
BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(new Path(inFile))));
String line;
while ((line = reader.readLine()) != null) {
...
}
reader.close();
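One caveat with the pair above: OutputStreamWriter and InputStreamReader default to the platform charset, which can differ across cluster nodes. A sketch of the same writer/reader with UTF-8 pinned explicitly (uses java.nio.charset.StandardCharsets, Java 7+):
//Writer with explicit charset
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fs.create(new Path("/user/tmp"), true), StandardCharsets.UTF_8));
//Reader with explicit charset
BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(new Path(inFile)), StandardCharsets.UTF_8));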
- SequenceFile Reader and Writer (in my opinion the preferable way for Hadoop jobs):
//writer
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, new Path(pathForCounters, context.getTaskAttemptID().toString()), Text.class, Text.class);
writer.append(new Text(firstUrl.toString() + "__" + context.getTaskAttemptID().getTaskID().toString()), new Text(counter + ""));
writer.close();
//reader
SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(makeUUrlFileOffsetsPathName(FileInputFormat.getInputPaths(context)[0].toString())), conf);
Text key = new Text();
Text val = new Text();
while (reader.next(key, val)) {
offsets.put(key.toString(), Integer.parseInt(val.toString()));
}
reader.close();

This post covers three ways to work with HDFS files in Hadoop: raw reads and writes through FSDataOutputStream/FSDataInputStream, text handling with BufferedReader/BufferedWriter, and efficient serialized key/value I/O with SequenceFile. Each fits a different use case.