I have been studying Hadoop MapReduce recently; this post summarizes what I have learned.
Writing a simple MapReduce program in Eclipse:
1. First, configure Eclipse to work with Hadoop; the setup is described here:
http://www.cnblogs.com/kinglau/p/3802705.html
Once the configuration is done, you can start creating MapReduce programs.
2. Create a MapReduce Project and prepare the input files.
Input files:

file1:          file2:
hello word      hello
hadoop          word good
welcome         good
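In a pseudo-distributed or cluster setup the input files have to be on HDFS before the job can read them. Below is a minimal sketch of how they might be copied over with the FileSystem API; the /user/hadoop/input target path and the local file names are assumptions, not part of the original setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Picks up fs.defaultFS from the core-site.xml on the classpath
        FileSystem fs = FileSystem.get(conf);
        // Copy the two local files into the (assumed) HDFS input directory
        fs.copyFromLocalFile(new Path("file1"), new Path("/user/hadoop/input/file1"));
        fs.copyFromLocalFile(new Path("file2"), new Path("/user/hadoop/input/file2"));
        fs.close();
    }
}

The same can be done from the shell with hadoop fs -put file1 file2 /user/hadoop/input.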
3. Create the Mapper class: extend Mapper&lt;Object, Text, Text, IntWritable&gt; and override the map(Object key, Text value, Context context) method.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HadoopMapper extends Mapper<Object, Text, Text, IntWritable> {
    // Reused across calls to avoid allocating new objects for every record
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on whitespace and emit <token, 1> for every token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
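The mapper can be sanity-checked without a cluster. Here is a sketch using MRUnit's MapDriver, assuming the MRUnit and JUnit libraries are on the classpath; the class name HadoopMapperTest is my own, and the sample line is taken from file1:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class HadoopMapperTest {
    @Test
    public void mapEmitsOnePerToken() throws IOException {
        MapDriver<Object, Text, Text, IntWritable> driver = MapDriver.newMapDriver(new HadoopMapper());
        // Feed one line of file1; the byte-offset key is arbitrary for this mapper
        driver.withInput(new LongWritable(0), new Text("hello word"))
              .withOutput(new Text("hello"), new IntWritable(1))
              .withOutput(new Text("word"), new IntWritable(1))
              .runTest();
    }
}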
4. Create the Reducer class: extend Reducer&lt;Text, IntWritable, Text, IntWritable&gt; and override the reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) method.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HadoopReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum every count emitted for this key and write <word, total>
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
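Because addition is associative and commutative, this reducer can double as the combiner in the driver below. Its behavior can likewise be checked with MRUnit's ReduceDriver; same assumptions as the mapper test above, and the input list mirrors what the shuffle delivers for "hello":

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class HadoopReducerTest {
    @Test
    public void reduceSumsCounts() throws IOException {
        ReduceDriver<Text, IntWritable, Text, IntWritable> driver = ReduceDriver.newReduceDriver(new HadoopReducer());
        // <hello, list(1,1)> should reduce to <hello, 2>
        driver.withInput(new Text("hello"), Arrays.asList(new IntWritable(1), new IntWritable(1)))
              .withOutput(new Text("hello"), new IntWritable(2))
              .runTest();
    }
}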
5. Create the MapReduce Driver class.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class HadoopMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -fs, ...) and keep the job arguments
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        for (String s : otherArgs) {
            System.out.println(s);
        }
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(HadoopMain.class);
        job.setMapperClass(HadoopMapper.class);
        // The reducer doubles as a combiner: partial sums are computed map-side
        job.setCombinerClass(HadoopReducer.class);
        job.setReducerClass(HadoopReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
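The job can be launched from Eclipse (with the input and output paths supplied as program arguments) or packaged into a jar and submitted from the command line. Assuming the jar is exported as wordcount.jar and the input directory from step 2 is used, the invocation would look like:

hadoop jar wordcount.jar HadoopMain /user/hadoop/input /user/hadoop/output

Note that the output directory must not exist yet: FileOutputFormat refuses to overwrite an existing path, and the job fails immediately if it does.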
The execution of this MapReduce program proceeds as follows:

Map phase:
  Step 1 (map output, automatically sorted by key):
    file1: <hadoop,1> <hello,1> <welcome,1> <word,1>
    file2: <good,1> <good,1> <hello,1> <word,1>
  Step 2 (combine, applied map-side to each file's output):
    file1: <hadoop,1> <hello,1> <welcome,1> <word,1>
    file2: <good,2> <hello,1> <word,1>

Reduce phase:
  Step 1 (shuffle: values for the same key are merged into a list):
    <good,list(2)> <hello,list(1,1)> <welcome,list(1)> <word,list(1,1)>
  Step 2 (the reduce method sums each list):
    <good,2> <hello,2> <welcome,1> <word,2>
Sorting happens automatically during the map phase, so each key's values arrive at the combiner and the reducer already grouped and in key order.
The final output is:
good 2
hello 2
welcome 1
word 2
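After the job completes, the result sits in the output directory on HDFS; with the new API each reducer writes a file named part-r-00000, part-r-00001, and so on. Assuming the paths used in the invocation above, it can be inspected with:

hadoop fs -cat /user/hadoop/output/part-r-00000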