hadoop-0.20.2 ships a new MapReduce API, but most examples on the web are still written against the old MapReduceBase-style API, for example the official tutorial:
http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html
Under the new API the Mapper should be written like this (the column family and qualifier names in the snippet are placeholders):
static class Mapper extends
        org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    // the parent class is fully qualified because this inner class is also called Mapper
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            System.out.println(this.getClass()
                    + " Mapper running on "
                    + InetAddress.getLocalHost().getHostName());
        } catch (UnknownHostException e) { // no way this is going to happen
        }
        // build a Put keyed on the line content; "cf" / "q" are placeholder
        // column family and qualifier names
        byte[] row = Bytes.toBytes(value.toString());
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), row);
        context.write(new ImmutableBytesWritable(row), put);
    }
}
The job is submitted like this. Note that the HBase reduce side is also initialized here (it is what writes the results into HBase; that part is not shown in the mapper snippet above):
public Job createJob() throws Exception {
    Job job = new Job();
    // input: text files on HDFS
    FileInputFormat.setInputPaths(job, "hdfs://host:port/....");
    job.setMapperClass(Mapper.class);
    job.setOutputFormatClass(NullOutputFormat.class);
    // initTableReducerJob switches the output format to TableOutputFormat; with a
    // null reducer class the mapper's Puts are written straight into "table_name"
    TableMapReduceUtil.initTableReducerJob("table_name", null, job);
    return job;
}
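createJob() only builds the Job object; under the new API the usual way to actually submit it and wait for the result is waitForCompletion(true). A minimal driver sketch, where MyHBaseJob is a made-up name for whatever class holds createJob():

// hypothetical entry point; MyHBaseJob is a placeholder for the class containing createJob()
public static void main(String[] args) throws Exception {
    Job job = new MyHBaseJob().createJob();
    // submit the job and block until it finishes, printing progress along the way
    boolean ok = job.waitForCompletion(true);
    System.exit(ok ? 0 : 1);
}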
// writing to the context like this is all that is needed to emit the value
context.write(new ImmutableBytesWritable(row),
        new IntWritable(values.list().size()));
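That write call lives inside a map method that reads rows from an HBase table. The full mapper is not shown above, so the following is only a sketch of what it might look like; the class name CountMapper is made up, and it simply emits one (row key, number of KeyValues) pair per scanned row:

static class CountMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException, InterruptedException {
        // make a defensive copy of the row key and emit how many KeyValues the row holds
        context.write(new ImmutableBytesWritable(row.get()),
                new IntWritable(values.list().size()));
    }
}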
The Reducer looks like this:
static class Reducer extends
        TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
    @Override
    public void reduce(ImmutableBytesWritable key,
            Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // demo only: report which node the reducer runs on;
        // nothing is written back to HBase in this method
        try {
            System.out.println(this.getClass()
                    + " Reducer running on "
                    + InetAddress.getLocalHost().getHostName());
        } catch (UnknownHostException e) { // no way this is going to happen
        }
    }
}
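The reduce above only prints which node it ran on and never writes anything back. If you did want to persist the counts, the reducer would have to emit Put (or Delete) objects; a rough sketch, where CountReducer and the column family "stats" / qualifier "count" are all made-up names:

static class CountReducer
        extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
    @Override
    public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // "stats" and "count" are placeholder column family / qualifier names
        Put put = new Put(key.get());
        put.add(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(key, put);
    }
}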
Basically the HDFS and HBase flavors follow the same pattern. Below are the snippets that initialize the map and the reduce side when the input is an HBase table; note that the mapper class passed to initTableMapperJob has to extend TableMapper (as in the sketch above), not the plain Mapper from the first example.
TableMapReduceUtil.initTableMapperJob("table_name", new Scan(), // start from a fresh Scan
        Mapper.class, ImmutableBytesWritable.class,
        IntWritable.class, job);
// catch the result, maybe put them back into hbase later
TableMapReduceUtil.initTableReducerJob("table_name",
        Reducer.class, job);
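For completeness, the two init calls above would typically sit inside a createJob()-style method just like the HDFS example; a sketch using the made-up CountMapper / CountReducer classes from the sketches above:

public Job createTableJob() throws Exception {
    Job job = new Job();
    // map side reads "table_name" with a full-table Scan and emits (row, count) pairs
    TableMapReduceUtil.initTableMapperJob("table_name", new Scan(),
            CountMapper.class, ImmutableBytesWritable.class,
            IntWritable.class, job);
    // reduce side writes Puts back into the same table
    TableMapReduceUtil.initTableReducerJob("table_name", CountReducer.class, job);
    return job;
}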