MapReduce Metapatterns: Parallel Job Chaining
The driver for a parallel job chain is similar to the one in MapReduce Metapatterns: Basic Job Chaining. The one big improvement is that the jobs are submitted in parallel and then monitored until they all complete.
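The key API difference from a serial chain is that Job.submit() returns immediately, while waitForCompletion() blocks until the job finishes. A minimal sketch of the contrast, assuming jobA and jobB are two fully configured org.apache.hadoop.mapreduce.Job instances:
// Serial chaining: each call blocks until that job finishes.
jobA.waitForCompletion(true);
jobB.waitForCompletion(true);

// Parallel chaining: submit() returns right away, so both jobs run
// at the same time and the driver polls until both are complete.
jobA.submit();
jobB.submit();
while (!jobA.isComplete() || !jobB.isComplete()) {
    Thread.sleep(5000); // check again in five seconds
}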
Parallel Job Chaining Example
Problem Description
Given users that have already been split into bins, where each record contains a user ID, the user's reputation, and the number of posts they have made, run jobs in parallel to compute the average reputation of the users in each bin.
Sample Input
The code used to create the dataset is as follows:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class create {
    public static void main(String[] args) throws IOException {
        File file = new File("input/aboveAvgInput.txt");
        if (!file.exists()) {
            file.getParentFile().mkdirs();
        }
        // Append 5000 random records of the form
        // "UserId=<id> reputation=<rep> total_posts=<n>".
        BufferedWriter bw = new BufferedWriter(new FileWriter(file, true));
        for (int i = 0; i < 5000; i++) {
            int id = (int) (Math.random() * 1000 + 1000);         // ids in [1000, 2000)
            int reputation = (int) (Math.random() * 5000 + 3000); // reputations in [3000, 8000)
            int posts = (int) (Math.random() * 300);              // post counts in [0, 300)
            bw.write("UserId=" + id + " reputation=" + reputation
                    + " total_posts=" + posts + "\n");
        }
        bw.close(); // flushes and closes the underlying FileWriter too
    }
}
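Note that this generator only writes the above-average bin, while the driver below also reads input/belowAvgInput.txt, whose creation is not shown. A minimal sketch for the second bin, assuming below-average users get reputations under the 3000 floor used above (both the class name and the range are illustrative, not from the original):
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class CreateBelowAvg {
    public static void main(String[] args) throws IOException {
        File file = new File("input/belowAvgInput.txt");
        file.getParentFile().mkdirs();
        BufferedWriter bw = new BufferedWriter(new FileWriter(file));
        for (int i = 0; i < 5000; i++) {
            int id = (int) (Math.random() * 1000 + 1000);
            int reputation = (int) (Math.random() * 3000); // assumed below-average range [0, 3000)
            int posts = (int) (Math.random() * 300);
            bw.write("UserId=" + id + " reputation=" + reputation
                    + " total_posts=" + posts + "\n");
        }
        bw.close();
    }
}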
Running the generators yields input lines of the form: UserId=<id> reputation=<rep> total_posts=<n>
Sample Output
Each job writes a single result line to its output directory (e.g. output/aboveAvgOutput/part-r-00000): the shared key "Average Reputation:" followed by a tab and the computed average.
Mapper Stage
The mapper parses each input line into its key/value fields and emits the user's reputation under a single shared key, so that the output of every map task is grouped together and the average reputation can be computed in one reduce call.
Mapper Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.Map;

public class AverageReputationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    // Every record is emitted under this one shared key, so a single
    // reduce group receives all of the reputation values.
    private static final Text GROUP_ALL_KEY = new Text("Average Reputation:");
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse "UserId=... reputation=... total_posts=..." into field -> value.
        Map<String, String> parsed = MRDPUtil.transInformation(value.toString());
        double reputation = Double.parseDouble(parsed.get("reputation"));
        outValue.set(reputation);
        context.write(GROUP_ALL_KEY, outValue);
    }
}
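The helper MRDPUtil.transInformation is used here but never shown. A minimal sketch, assuming it simply splits a line of space-separated key=value tokens into a Map (the behavior is inferred from how the mapper calls it):
import java.util.HashMap;
import java.util.Map;

public class MRDPUtil {
    // Parse a line like "UserId=1234 reputation=5678 total_posts=42"
    // into a map of field name -> field value.
    public static Map<String, String> transInformation(String line) {
        Map<String, String> parsed = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            String[] pair = token.split("=", 2);
            if (pair.length == 2) {
                parsed.put(pair[0], pair[1]);
            }
        }
        return parsed;
    }
}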
Reducer Stage
The reducer simply iterates over all of the reputation values, accumulating their count and sum, and then writes the resulting average together with the input key.
Reducer Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class AverageReputationReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        // Accumulate the total reputation and the number of users.
        for (DoubleWritable dw : values) {
            sum += dw.get();
            count++;
        }
        outValue.set(sum / count);
        context.write(key, outValue); // no need to copy the key into a new Text
    }
}
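One caution: this reducer should not also be registered as a combiner. Averaging is not associative, so an average of per-mapper partial averages generally differs from the true overall average; making the job combinable would require emitting (sum, count) pairs instead of a single averaged value.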
Driver
The driver sets up each job's input and output paths (hardcoded here rather than parsed from the command line), calls a helper method to configure and submit each job, and gets the two Job objects back. It then monitors the jobs until they finish: as long as either one is still running, the driver sleeps for another five seconds. Once both jobs have completed, it checks whether each succeeded or failed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Clear any output from a previous run.
        FileSystem.get(conf).delete(new Path("output"), true);

        Path belowAvgInputDir = new Path("input/belowAvgInput.txt");
        Path belowAvgOutputDir = new Path("output/belowAvgOutput");
        Path aboveAvgInputDir = new Path("input/aboveAvgInput.txt");
        Path aboveAvgOutputDir = new Path("output/aboveAvgOutput");

        // submitJob() returns without waiting for completion,
        // so the two jobs run in parallel.
        Job belowAvgJob = submitJob(conf, belowAvgInputDir, belowAvgOutputDir);
        Job aboveAvgJob = submitJob(conf, aboveAvgInputDir, aboveAvgOutputDir);

        // Poll until both jobs are done, sleeping five seconds between checks.
        while (!belowAvgJob.isComplete() || !aboveAvgJob.isComplete()) {
            Thread.sleep(5000);
        }
        if (belowAvgJob.isSuccessful()) {
            System.out.println("Below average job completed successfully!");
        } else {
            System.out.println("Below average job failed!");
        }
        if (aboveAvgJob.isSuccessful()) {
            System.out.println("Above average job completed successfully!");
        } else {
            System.out.println("Above average job failed!");
        }
        System.exit(belowAvgJob.isSuccessful() && aboveAvgJob.isSuccessful() ? 0 : 1);
    }

    public static Job submitJob(Configuration conf, Path inputDir, Path outputDir) throws Exception {
        Job job = Job.getInstance(conf, "Driver"); // new Job(conf, ...) is deprecated
        job.setJarByClass(Driver.class);
        job.setMapperClass(AverageReputationMapper.class);
        job.setReducerClass(AverageReputationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, inputDir);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        // submit() is non-blocking, unlike waitForCompletion().
        job.submit();
        return job;
    }
}
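As a side note, Hadoop also ships a utility for this kind of orchestration. A minimal sketch using org.apache.hadoop.mapreduce.lib.jobcontrol, assuming belowAvgJob and aboveAvgJob have been configured but not yet submitted (JobControl submits the jobs itself):
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

JobControl control = new JobControl("averageReputation");
control.addJob(new ControlledJob(belowAvgJob, null)); // null: no dependencies
control.addJob(new ControlledJob(aboveAvgJob, null));
new Thread(control).start(); // JobControl implements Runnable
while (!control.allFinished()) {
    Thread.sleep(5000);
}
control.stop();
System.exit(control.getFailedJobList().isEmpty() ? 0 : 1);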