1. First, let's look at a standard example that uses HBase as both the data source and the output sink:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name ");
job.setJarByClass(test.class);
Scan scan = new Scan();
// --------------------------------- map ----------------------------------
TableMapReduceUtil.initTableMapperJob(inputTable, scan, mapper.class, Writable.class, Writable.class, job);
// -------------------------------- reduce --------------------------------
TableMapReduceUtil.initTableReducerJob(outputTable, reducer.class, job);
job.waitForCompletion(true);
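In practice the Scan is usually tuned a little before being handed to initTableMapperJob. A minimal sketch of the two most common tweaks (the caching value of 500 is an illustrative assumption, not something the original example requires):

scan.setCaching(500);        // rows fetched per RPC round trip; 500 is only an example value
scan.setCacheBlocks(false);  // a MapReduce job reads each row once, so the block cache adds little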
Next comes the mapper class:
public class mapper extends TableMapper<KEYOUT, VALUEOUT> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // mapper logic
        context.write(key, value);
    }
}
Then the reducer class:
public class reducer extends TableReducer<KEYIN, VALUEIN, KEYOUT> {
    public void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic
        context.write(null, put);  // the value written out must be a Put or a Delete
    }
}
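To make the Put/Delete placeholder concrete, here is a minimal sketch of a reducer that sums LongWritable counts and writes each total back to HBase as a Put. The key/value types and the column family and qualifier names ("cf", "count") are assumptions made up for illustration, not part of the original example:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class SumToHBaseReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // The Put's row key determines the HBase row; TableOutputFormat ignores the key passed to write().
        Put put = new Put(Bytes.toBytes(key.toString()));
        // addColumn is the HBase 1.x+ API; older clients use put.add(family, qualifier, value) instead.
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(null, put);
    }
}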
====================== Type 2 ===============================
2. Sometimes we need the data source to be plain text on HDFS and the output target to be HBase. The changes are again simple:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name ");
job.setJarByClass(test.class);
job.setMapperClass(mapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
FileInputFormat.setInputPaths(job, path);
TableMapReduceUtil.initTableReducerJob(tableName, reducer.class, job);
job.waitForCompletion(true);
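For reference, initTableReducerJob mostly wires up the reducer class, switches the output format to TableOutputFormat, and records the target table in the job configuration (it also takes care of ZooKeeper and serialization details). A rough, hand-written equivalent might look like the sketch below, assuming the TableOutputFormat.OUTPUT_TABLE constant of your HBase version:

job.setReducerClass(reducer.class);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);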
public class mapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        // mapper logic
        context.write(k, one);
    }
}

public class reducer extends TableReducer<KEYIN, VALUEIN, KEYOUT> {
    public void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic
        context.write(null, put);  // the value written out must be a Put or a Delete
    }
}
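Filled in as a word-count style job, consistent with the Text/LongWritable map output types the driver above declares, the mapper might look like the following sketch; the whitespace tokenization is an assumption for illustration, and the reducer can be written along the lines of the SumToHBaseReducer sketched in type 1:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        // Split each HDFS text line into whitespace-separated tokens and emit (token, 1).
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}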
====================== Type 3 ===============================
3. Finally, reading an HBase table as the data source and writing to HDFS as the output. A simple version looks like this:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name ");
job.setJarByClass(test.class);
Scan scan = new Scan();
TableMapReduceUtil.initTableMapperJob(inputTable, scan, mapper.class, Writable.class, Writable.class, job);
job.setReducerClass(reducer.class);
job.setOutputKeyClass(Writable.class);
job.setOutputValueClass(Writable.class);
FileOutputFormat.setOutputPath(job, path);
job.waitForCompletion(true);
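If the job only dumps or transforms rows and needs no aggregation at all, the reduce phase can also be skipped; a one-line variant of the driver above (everything else unchanged):

job.setNumReduceTasks(0);  // skip reduce entirely; map output goes straight to the configured OutputFormat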
The mapper and reducer are just as simple:
public class mapper extends TableMapper<KEYOUT, VALUEOUT> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // mapper logic
        context.write(key, value);
    }
}

public class reducer extends Reducer<Writable, Writable, Writable, Writable> {
    public void reduce(Writable key, Iterable<Writable> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic
        for (Writable value : values) {
            context.write(key, value);
        }
    }
}
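A concrete version of this direction might pull one cell out of each Result and emit it as text for HDFS. In this sketch the column family and qualifier ("cf", "name") and the Text/Text output types are assumptions made up for illustration:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class ExportToTextMapper extends TableMapper<Text, Text> {
    private final Text rowKey = new Text();
    private final Text cellValue = new Text();

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // The row key becomes the output key; the assumed column "cf:name" becomes the output value.
        byte[] raw = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"));
        if (raw != null) {
            rowKey.set(Bytes.toString(key.get()));
            cellValue.set(Bytes.toString(raw));
            context.write(rowKey, cellValue);
        }
    }
}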
Finally, a word about what TableMapper and TableReducer really are. These two classes exist purely to cut down on boilerplate: because some of the four generic type parameters are always the same fixed types when HBase is the input or output, they are just specialized versions of Mapper and Reducer, and there is essentially no difference between them. Their source is as follows:
public abstract class TableMapper<KEYOUT, VALUEOUT>
        extends Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT> {
}

public abstract class TableReducer<KEYIN, VALUEIN, KEYOUT>
        extends Reducer<KEYIN, VALUEIN, KEYOUT, Writable> {
}
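In other words, the following two declarations describe the same mapper (Text and LongWritable are just example output types):

public class MyMapper extends TableMapper<Text, LongWritable> { /* ... */ }
// is exactly equivalent to
public class MyMapper extends Mapper<ImmutableBytesWritable, Result, Text, LongWritable> { /* ... */ }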