Table of Contents
I. FileInputFormat implementation classes

II. An introduction to the common implementation classes


III. KeyValueTextInputFormat usage example
1. Requirement
Count the number of lines in the input file that share the same first word, i.e. for each distinct first word, output how many lines start with it.
(1) Input data:
Hello hi good morning
Please
Hello bye quit
Fox panda mouse
Please give me a book
(2) Expected output
Fox 1
Hello 2
Please 2
2. Requirement analysis
With KeyValueTextInputFormat and a space as the separator, every line reaches the mapper already split as (first word, rest of line). The key is exactly the first word, so the job reduces to counting how many times each key appears, just like word count.

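Before touching Hadoop, the logic can be sanity-checked with a plain-Java sketch (no Hadoop dependency; `splitLine` is a hypothetical helper that imitates how KeyValueTextInputFormat splits a line at the first separator, and the counting loop stands in for the map and reduce phases):

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstWordCount {

    // Imitates the input format: key = text before the first separator,
    // value = the rest; if the separator is absent, the whole line is the key.
    public static String[] splitLine(String line, String separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[]{line, ""};
        }
        return new String[]{line.substring(0, pos), line.substring(pos + separator.length())};
    }

    // Map: emit (first word, 1). Reduce: sum the 1s per key.
    public static Map<String, Integer> countFirstWords(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String key = splitLine(line, " ")[0];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Hello hi good morning",
                "Please",
                "Hello bye quit",
                "Fox panda mouse",
                "Please give me a book");
        System.out.println(countFirstWords(input));  // {Hello=2, Please=2, Fox=1}
    }
}

Running this on the sample input yields Hello=2, Please=2, Fox=1, matching the expected output above.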
3. Code implementation
(1) Create the package
Under src/main/java, create the package com.wolf.mr.kv.
(2) Write KVTextMapper.java
Create KVTextMapper.java with the following content:
package com.wolf.mr.kv;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class KVTextMapper extends Mapper<Text, Text, Text, IntWritable> {

    private final IntWritable v = new IntWritable(1);

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // The input format has already split the line: key = first word.
        // Simply emit (first word, 1).
        context.write(key, v);
    }
}
(3) Write KVTextReducer.java
Create KVTextReducer.java with the following content:
package com.wolf.mr.kv;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class KVTextReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // 1. Sum the counts for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        v.set(sum);
        // 2. Write out (first word, count)
        context.write(key, v);
    }
}
(4) Write KVTextDriver.java
package com.wolf.mr.kv;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class KVTextDriver {

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        // Split each input line at the first space: key = first word
        conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, " ");

        // 1. Get the job instance
        Job job = Job.getInstance(conf);
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // 2. Set the jar path
        job.setJarByClass(KVTextDriver.class);

        // 3. Wire up the mapper and reducer
        job.setMapperClass(KVTextMapper.class);
        job.setReducerClass(KVTextReducer.class);

        // 4. Set the mapper output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
The two lines that matter here are these:
conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR," ");
job.setInputFormatClass(KeyValueTextInputFormat.class);
The default separator is a tab character; this job overrides it with a single space. (Note that the constant really is spelled KEY_VALUE_SEPERATOR in Hadoop's own API.)
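The effect of the separator setting can be illustrated with a small plain-Java imitation (no Hadoop needed; `toKeyValue` is a hypothetical helper mirroring how KeyValueLineRecordReader picks the key and value; the line is split at the FIRST occurrence of the separator, and a line with no separator becomes an all-key, empty-value record):

public class SeparatorDemo {

    // Split at the first occurrence of the separator character;
    // if it never occurs, the whole line is the key and the value is empty.
    public static String[] toKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[]{line, ""};
        }
        return new String[]{line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        // With ' ' as separator, only the first word becomes the key.
        String[] kv = toKeyValue("Hello hi good morning", ' ');
        System.out.println(kv[0] + " | " + kv[1]);   // Hello | hi good morning

        // A line with no separator: the whole line is the key.
        kv = toKeyValue("Please", ' ');
        System.out.println(kv[0] + " | [" + kv[1] + "]");   // Please | []
    }
}

This is why the lone line "Please" in the sample input still contributes to the count for the key Please.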
(5) Set the input and output arguments
/home/wolf/kv.txt /home/wolf/output/out_KVText
(6) Run the job

(7) Check the results

As shown, the job produces exactly the expected output.
IV. Custom InputFormat
Although MapReduce already ships with ready-made InputFormat implementations, they still cannot cover every application scenario, so we often need to write a custom InputFormat tailored to our own requirements.

Hands-on: writing a custom InputFormat
I'll leave this as a placeholder for now and come back to it later.
This article walked through FileInputFormat and KeyValueTextInputFormat in Hadoop MapReduce, including a worked example that counts how many lines of a text file share the same first word, and noted that a custom InputFormat can be written when the built-in ones fall short.