On to the code:
Set up the test environment:
Create a seq (SequenceFile) file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.junit.Test;

/**
 * Write operation: create a block-compressed SequenceFile with the gzip codec,
 * inserting sync points so the file can later be split.
 */
@Test
public void zipGzip() throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "file:///");
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("d:/seq/1.seq");
    SequenceFile.Writer writer = SequenceFile.createWriter(fs,
            conf,
            p,
            IntWritable.class,
            Text.class,
            SequenceFile.CompressionType.BLOCK,
            new GzipCodec());
    for (int i = 0; i < 10; i++) {
        writer.append(new IntWritable(i), new Text("tom" + i));
        // add a sync point after every record
        writer.sync();
    }
    for (int i = 0; i < 10; i++) {
        writer.append(new IntWritable(i), new Text("tom" + i));
        // add a sync point after every other record
        if (i % 2 == 0) {
            writer.sync();
        }
    }
    writer.close();
}
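The written file can be read back to check its contents. The original text does not include a reader; the following is a minimal sketch using the same deprecated constructor generation as the writer above, assuming the same d:/seq/1.seq path:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("d:/seq/1.seq");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, p, conf);
        IntWritable key = new IntWritable();
        Text val = new Text();
        // next() transparently decompresses each block written by the gzip codec
        while (reader.next(key, val)) {
            System.out.println(key.get() + " -> " + val);
        }
        // reader.sync(position) would instead advance to the first sync marker
        // at or after `position` -- the sync points added above are what make
        // the file splittable for MapReduce.
        reader.close();
    }
}
```

Requires the Hadoop client jars on the classpath, just like the writer.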
Write text files:
Create 1.txt and 2.txt under the txt directory.
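The two text files can also be created programmatically; a minimal sketch (file contents are assumptions, only the names come from the text):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeTxt {
    public static void main(String[] args) throws IOException {
        // Relative "txt" directory; the tutorial's actual location is on d:/
        Path dir = Paths.get("txt");
        Files.createDirectories(dir);
        // Sample contents -- any text works for the word-count style job being run
        Files.write(dir.resolve("1.txt"), "hello world\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("2.txt"), "hello hadoop\n".getBytes(StandardCharsets.UTF_8));
    }
}
```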
Run (note in the log below that the job reports 3 input splits: one from the seq file plus the two text files):
19/01/16 10:25:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/01/16 10:25:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/01/16 10:25:54 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/16 10:25:54 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
19/01/16 10:25:54 INFO input.FileInputFormat: Total input paths to process : 1
19/01/16 10:25:54 INFO input.FileInputFormat: Total input paths to process : 2
19/01/16 10:25:54 INFO mapreduce.JobSubmitter: number of splits:3
19/01/16 10:25:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1578362493_0001
19/01/16 10:25:55 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/01/16 10:25:55 INFO mapreduce.Job: Running job: job_local1578362493_0001
19/01/16 10:25:55 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/01/16 10:25:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/16 10:25:55 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/01/16 10:25:55 INFO mapred.LocalJobRunner: Waiting for map tasks
19/01/16 10:25:55 INFO mapred.LocalJobRunner: Starting task: attempt_local1578362493_0001_m_000000_0
19/01/16 10:25:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/01/16 10:25:55 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
19/01/16 10:25:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@68701d3d
19/01/16 10:25:55 INFO mapred.MapTask: Processing split: file:/d:/mr/seq/1.seq:0+928
19/01/16 10:25:55 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/01/16 10:25:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/01/16 10:25:55 INFO mapred.MapTask: soft limit at 83886080
19/01/16 10:25:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/01/16 10:25:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/01/16 10:25:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/01/16 10:25:55 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
19/01/16 10:25:55 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/01/16 10:25:55 INFO mapred.LocalJobRunner:
19/01/16 10:25:55 INFO mapred.MapTask: Starting flush of map output
19/01/16 10:25:55 INFO mapred.MapTask: Spilling map output
19/01/16 10:25:55 INFO mapred.MapTask: bufstart = 0; bufend = 180; bufvoid = 104857600
19/01/16 10:25:55 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214320(104857280); length = 77/6553600
19/01/16 10:25:55 INFO mapred.MapTask: Finished spill 0
19/01/16 10:25:55 INFO mapred.Task: Task:a
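The first WARN in the log suggests implementing the Tool interface and launching through ToolRunner, which also enables command-line option parsing (e.g. -D properties). A minimal sketch of that pattern (class name hypothetical, job setup omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Job setup would go here; getConf() already reflects any
        // -D options that ToolRunner parsed off the command line.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}
```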