The snippets below assume the standard Hadoop imports:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

The Mapper class, implementing the map function:
public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text(); // reused across calls to avoid reallocating

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the input line on whitespace and emit (word, 1) for each token.
        StringTokenizer str = new StringTokenizer(value.toString());
        while (str.hasMoreTokens()) {
            word.set(str.nextToken());
            context.write(word, one);
        }
    }
}
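To see what the map step emits, here is a minimal plain-Java sketch (no Hadoop required; the class name and sample line are invented for illustration) that tokenizes one input line exactly the way map() does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapSketch {
    // Mimics map(): split a line on whitespace and emit a (word, 1) pair per token.
    static List<String> emit(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer str = new StringTokenizer(line);
        while (str.hasMoreTokens()) {
            pairs.add("(" + str.nextToken() + ", 1)");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one pair per word, duplicates included; grouping happens later.
        System.out.println(emit("hello hadoop hello"));
    }
}
```

Note that the mapper emits a separate (word, 1) pair for every occurrence; it never counts anything itself. Aggregation is left entirely to the shuffle and reduce phases.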
The Reducer class, implementing the reduce function:
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum every count emitted for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
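Between map and reduce, the framework shuffles: it groups all values by key before handing each group to reduce(). A standalone sketch of that group-and-sum behavior (class name and sample words are invented for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // Groups (word, 1) pairs by key and sums each group,
    // mirroring what shuffle + reduce() do together.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like reducer output
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // equivalent of sum += val.get()
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[]{"hello", "hadoop", "hello"}));
    }
}
```

A TreeMap is used here because MapReduce delivers keys to the reducer in sorted order, so the output of this sketch matches the ordering of the job's final output.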
The driver method that configures and launches the MapReduce job:
public static void main(String[] args) throws Exception {
    Configuration hadoopConf = new Configuration();
    Job job = Job.getInstance(hadoopConf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(MyMapper.class);
    job.setCombinerClass(MyReducer.class); // safe: summing is commutative and associative
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path (must not already exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
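Reusing MyReducer as the combiner in setCombinerClass() works only because addition is commutative and associative: pre-summing partial counts on each map task and then reducing gives the same total as reducing all the raw 1s at once. A quick check of that property (class name and the counts are invented for illustration):

```java
import java.util.stream.IntStream;

public class CombinerCheck {
    // Sums a list of counts, standing in for reduce()'s summation loop.
    static int sum(int... vals) {
        return IntStream.of(vals).sum();
    }

    public static void main(String[] args) {
        // Five occurrences of one word, split across two map tasks.
        int direct = sum(1, 1, 1, 1, 1);             // no combiner: reducer sees all five 1s
        int combined = sum(sum(1, 1), sum(1, 1, 1)); // combiner pre-sums within each map task
        System.out.println(direct == combined);      // prints true: same total either way
    }
}
```

This is why the same class can play both roles here; an operation that is not associative (for example, computing an average of the raw values) could not be reused as its own combiner.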
This article has walked through word-frequency counting with the MapReduce framework. Using custom Mapper and Reducer classes, it showed how to read large amounts of text, map each word to a key-value pair, aggregate the occurrences of identical words in the reduce function, and finally output each word alongside its frequency.