MapReduce Metapatterns: Parallel Job Chaining
The driver for a parallel job chain is similar to the one in MapReduce Metapatterns: Basic Job Chaining. The one big improvement is that the jobs are submitted in parallel and then monitored until they all complete.
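The key API difference from a serial chain is that Job.submit() returns immediately, while waitForCompletion() blocks until the job finishes. A minimal sketch of the contrast, assuming jobA and jobB are two fully configured org.apache.hadoop.mapreduce.Job instances:
// Serial chaining: each call blocks until that job finishes.
jobA.waitForCompletion(true);
jobB.waitForCompletion(true);

// Parallel chaining: submit() returns right away, so both jobs run
// at the same time and the driver polls until both are complete.
jobA.submit();
jobB.submit();
while (!jobA.isComplete() || !jobB.isComplete()) {
    Thread.sleep(5000); // check again in five seconds
}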
Parallel Job Chaining Example
Problem Description
Given users that have already been split into bins, where each record contains a user ID, the user's reputation, and the number of posts they have made, run jobs in parallel to compute the average reputation of the users in each bin.
Sample Input
The code used to create the dataset is as follows:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class create {
    public static void main(String[] args) throws IOException {
        File file = new File("input/aboveAvgInput.txt");
        if (!file.exists()) {
            file.getParentFile().mkdirs();
        }
        // Append 5000 random records of the form
        // "UserId=<id> reputation=<rep> total_posts=<n>".
        BufferedWriter bw = new BufferedWriter(new FileWriter(file, true));
        for (int i = 0; i < 5000; i++) {
            int id = (int) (Math.random() * 1000 + 1000);         // ids in [1000, 2000)
            int reputation = (int) (Math.random() * 5000 + 3000); // reputations in [3000, 8000)
            int posts = (int) (Math.random() * 300);              // post counts in [0, 300)
            bw.write("UserId=" + id + " reputation=" + reputation
                    + " total_posts=" + posts + "\n");
        }
        bw.close(); // flushes and closes the underlying FileWriter too
    }
}
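Note that this generator only writes the above-average bin, while the driver below also reads input/belowAvgInput.txt, whose creation is not shown. A minimal sketch for the second bin, assuming below-average users get reputations under the 3000 floor used above (both the class name and the range are illustrative, not from the original):
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class CreateBelowAvg {
    public static void main(String[] args) throws IOException {
        File file = new File("input/belowAvgInput.txt");
        file.getParentFile().mkdirs();
        BufferedWriter bw = new BufferedWriter(new FileWriter(file));
        for (int i = 0; i < 5000; i++) {
            int id = (int) (Math.random() * 1000 + 1000);
            int reputation = (int) (Math.random() * 3000); // assumed below-average range [0, 3000)
            int posts = (int) (Math.random() * 300);
            bw.write("UserId=" + id + " reputation=" + reputation
                    + " total_posts=" + posts + "\n");
        }
        bw.close();
    }
}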
Running the generators yields input lines of the form: UserId=<id> reputation=<rep> total_posts=<n>
Sample Output
Each job writes a single result line to its output directory (e.g. output/aboveAvgOutput/part-r-00000): the shared key "Average Reputation:" followed by a tab and the computed average.
Mapper Stage
The mapper parses each input line into its key/value fields and emits the user's reputation under a single shared key, so that the output of every map task is grouped together and the average reputation can be computed in one reduce call.
Mapper Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.Map;

public class AverageReputationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    // Every record is emitted under this one shared key, so a single
    // reduce group receives all of the reputation values.
    private static final Text GROUP_ALL_KEY = new Text("Average Reputation:");
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse "UserId=... reputation=... total_posts=..." into field -> value.
        Map<String, String> parsed = MRDPUtil.transInformation(value.toString());
        double reputation = Double.parseDouble(parsed.get("reputation"));
        outValue.set(reputation);
        context.write(GROUP_ALL_KEY, outValue);
    }
}
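The helper MRDPUtil.transInformation is used here but never shown. A minimal sketch, assuming it simply splits a line of space-separated key=value tokens into a Map (the behavior is inferred from how the mapper calls it):
import java.util.HashMap;
import java.util.Map;

public class MRDPUtil {
    // Parse a line like "UserId=1234 reputation=5678 total_posts=42"
    // into a map of field name -> field value.
    public static Map<String, String> transInformation(String line) {
        Map<String, String> parsed = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            String[] pair = token.split("=", 2);
            if (pair.length == 2) {
                parsed.put(pair[0], pair[1]);
            }
        }
        return parsed;
    }
}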
Reducer Stage
The reducer simply iterates over all of the reputation values, accumulating their count and sum, and then writes the resulting average together with the input key.
Reducer Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class AverageReputationReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        // Accumulate the total reputation and the number of users.
        for (DoubleWritable dw : values) {
            sum += dw.get();
            count++;
        }
        outValue.set(sum / count);
        context.write(key, outValue); // no need to copy the key into a new Text
    }
}
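One caution: this reducer should not also be registered as a combiner. Averaging is not associative, so an average of per-mapper partial averages generally differs from the true overall average; making the job combinable would require emitting (sum, count) pairs instead of a single averaged value.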
Driver
The driver sets up each job's input and output paths (hardcoded here rather than parsed from the command line), calls a helper method to configure and submit each job, and gets the two Job objects back. It then monitors the jobs until they finish: as long as either one is still running, the driver sleeps for another five seconds. Once both jobs have completed, it checks whether each succeeded or failed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Clear any output from a previous run.
        FileSystem.get(conf).delete(new Path("output"), true);

        Path belowAvgInputDir = new Path("input/belowAvgInput.txt");
        Path belowAvgOutputDir = new Path("output/belowAvgOutput");
        Path aboveAvgInputDir = new Path("input/aboveAvgInput.txt");
        Path aboveAvgOutputDir = new Path("output/aboveAvgOutput");

        // submitJob() returns without waiting for completion,
        // so the two jobs run in parallel.
        Job belowAvgJob = submitJob(conf, belowAvgInputDir, belowAvgOutputDir);
        Job aboveAvgJob = submitJob(conf, aboveAvgInputDir, aboveAvgOutputDir);

        // Poll until both jobs are done, sleeping five seconds between checks.
        while (!belowAvgJob.isComplete() || !aboveAvgJob.isComplete()) {
            Thread.sleep(5000);
        }
        if (belowAvgJob.isSuccessful()) {
            System.out.println("Below average job completed successfully!");
        } else {
            System.out.println("Below average job failed!");
        }
        if (aboveAvgJob.isSuccessful()) {
            System.out.println("Above average job completed successfully!");
        } else {
            System.out.println("Above average job failed!");
        }
        System.exit(belowAvgJob.isSuccessful() && aboveAvgJob.isSuccessful() ? 0 : 1);
    }

    public static Job submitJob(Configuration conf, Path inputDir, Path outputDir) throws Exception {
        Job job = Job.getInstance(conf, "Driver"); // new Job(conf, ...) is deprecated
        job.setJarByClass(Driver.class);
        job.setMapperClass(AverageReputationMapper.class);
        job.setReducerClass(AverageReputationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, inputDir);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        // submit() is non-blocking, unlike waitForCompletion().
        job.submit();
        return job;
    }
}
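As a side note, Hadoop also ships a utility for this kind of orchestration. A minimal sketch using org.apache.hadoop.mapreduce.lib.jobcontrol, assuming belowAvgJob and aboveAvgJob have been configured but not yet submitted (JobControl submits the jobs itself):
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

JobControl control = new JobControl("averageReputation");
control.addJob(new ControlledJob(belowAvgJob, null)); // null: no dependencies
control.addJob(new ControlledJob(aboveAvgJob, null));
new Thread(control).start(); // JobControl implements Runnable
while (!control.allFinished()) {
    Thread.sleep(5000);
}
control.stop();
System.exit(control.getFailedJobList().isEmpty() ? 0 : 1);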