In MapReduce, a Combiner is implemented by extending Reducer, so it is written the same way as a reduce implementation.
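import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerData extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Process the values that belong to this key; here each value is passed through unchanged
        for (Text va : values) {
            context.write(key, va);
        }
    }
}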
This way, the values that share a key are merged on the current node, before the map output leaves it.
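To make the local merge concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. It assumes a word-count style job in which every map output record is (word, 1); the class name LocalCombineSketch and the method combine are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop): combines the map output of a single node.
final class LocalCombineSketch {

    // Each element of mapOutputKeys stands for one map output record (key, 1).
    // The result holds one record per distinct key with the summed count.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> mapOutputKeys = Arrays.asList("a", "b", "a", "a");
        Map<String, Integer> combined = combine(mapOutputKeys);
        System.out.println("records before combine: " + mapOutputKeys.size());
        System.out.println("records after combine:  " + combined.size());
        System.out.println(combined);
    }
}
```

In this sketch four records shrink to two before anything would be sent to the reduce side, which is the point of running a combiner on the map node.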
The reduce side:
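import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DataReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Process the values that belong to this key; here each value is passed through unchanged
        for (Text va : values) {
            context.write(key, va);
        }
    }
}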
In other words, the Combiner and the reduce function can be identical, provided that the map output types and the reduce output types are the same.
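The type condition can be illustrated with a small plain-Java sketch. ReduceFn below is a hypothetical stand-in for Hadoop's Reducer, cut down to its value types, and combineThenReduce is a hypothetical helper. Because the combiner's output is fed to the reducer as input, the sketch only accepts a function whose input and output types match. It uses a sum, which also gives the same total whether or not partial results are computed first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CombinerTypeSketch {

    // Hypothetical stand-in for Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>, values only.
    interface ReduceFn<VIN, VOUT> {
        VOUT reduce(List<VIN> values);
    }

    // A summing function whose input and output types are both Integer.
    static final ReduceFn<Integer, Integer> SUM = values -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    };

    // The same function serves as combiner and reducer, so it must be ReduceFn<V, V>.
    static <V> V combineThenReduce(ReduceFn<V, V> fn, List<List<V>> perNodeValues) {
        List<V> partials = new ArrayList<>();
        for (List<V> nodeValues : perNodeValues) {
            partials.add(fn.reduce(nodeValues)); // combiner stage: one call per map node
        }
        return fn.reduce(partials); // reduce stage: sees only the partial results
    }

    public static void main(String[] args) {
        // Values for one key, spread over two map nodes.
        List<List<Integer>> perNode = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        System.out.println(combineThenReduce(SUM, perNode));
        // Passing a ReduceFn<Integer, String> here would be rejected by the compiler.
    }
}
```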
The partition step:
A custom partitioner extends the Partitioner class and overrides its getPartition method.
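import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class DataPartition extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int i) {
        // key and value are the map output key and value; i is the number of reduce tasks
        // If the key is a string, its first character can be used to choose the partition
        String keystr = key.toString();
        int num = 0;
        // startsWith also handles an empty key, where substring(0, 1) would throw
        if (keystr.startsWith("a")) {
            num = 0 % i;
        }
        // Keys with other first characters can be mapped to other partition numbers in the same way
        return num;
    }
}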
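The partitioner above only spells out the branch for "a". As a sketch of how the same idea could cover every key, the plain-Java class below maps the letters a to z onto partition numbers and falls back to a hash for anything else. FirstCharPartitionSketch and partitionFor are hypothetical names, and the letter-to-partition mapping is an assumption made for illustration, not part of the original job.

```java
// Plain-Java sketch (no Hadoop types) of first-character partitioning.
final class FirstCharPartitionSketch {

    // Returns a partition number in the range [0, numReduceTasks).
    static int partitionFor(String key, int numReduceTasks) {
        if (key.isEmpty()) {
            return 0; // assumption: empty keys all go to partition 0
        }
        char first = Character.toLowerCase(key.charAt(0));
        if (first >= 'a' && first <= 'z') {
            // 'a' -> 0, 'b' -> 1, ... wrapped around the number of reduce tasks
            return (first - 'a') % numReduceTasks;
        }
        // Fallback for other characters: clear the sign bit, then take the modulo
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 6;
        for (String key : new String[] {"apple", "banana", "grape", "123"}) {
            System.out.println(key + " -> " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Whatever mapping is chosen, the returned number has to stay below the number of reduce tasks, which is why every branch ends with a modulo.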
Next comes the job setup in the RunJob class:
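import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RunJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Queue that the job is submitted to
        conf.set("mapred.job.queue.name", "queue name");
        // Separate key and value in the output with "|"
        conf.set("mapred.textoutputformat.ignoreseparator", "true");
        conf.set("mapred.textoutputformat.separator", "|");
        Job job = Job.getInstance(conf, "four");
        job.setJarByClass(RunJob.class);
        job.setMapperClass(DataMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(6);
        job.setReducerClass(DataReduce.class);
        job.setPartitionerClass(DataPartition.class);
        // CombinerData is the combiner class defined earlier
        job.setCombinerClass(CombinerData.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("input path"));
        FileOutputFormat.setOutputPath(job, new Path("output path"));
        // Run the job once and use its result as the process exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}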
Delete the output directory before each run:
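// Needs java.io.IOException, org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path
public static void deleteFile(Path path, Configuration conf) throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    if (fs.exists(path)) {
        // true = delete the directory recursively
        fs.delete(path, true);
    }
}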
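When the job is tried out against the local filesystem rather than HDFS, the same check-then-delete step can be sketched with java.nio alone. This is only a local analog of the helper above, not a replacement for it; LocalOutputCleanupSketch and deleteIfExists are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analog of the delete helper (no Hadoop).
final class LocalOutputCleanupSketch {

    // Deletes dir and everything under it. Returns false if dir did not exist.
    static boolean deleteIfExists(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order puts children before their parent directories
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException e) {
            // Wrapped so that callers in this sketch need no checked exception handling
            throw new UncheckedIOException(e);
        }
    }
}
```

As with the helper above, the call belongs before the job is submitted, since the job expects the output directory not to exist yet.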