MapReduce Metapatterns: Parallel Job Chaining



The driver for a parallel job chain is similar to that of a basic job chain. The key difference is that the jobs are submitted in parallel, and then monitored continuously until they complete.
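The control flow is the same as waiting on any set of asynchronous tasks. As a minimal plain-Java sketch (using `Future` objects as stand-ins for Hadoop `Job` objects; the class name `ParallelWait` and the task payloads are illustrative):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWait {
    // Submits two tasks without blocking, then polls until both finish,
    // mirroring how the driver polls Job.isComplete() in a sleep loop.
    public static List<String> runAll() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<String> below = pool.submit(() -> "below-avg done");
        Future<String> above = pool.submit(() -> "above-avg done");
        // As long as either task is still running, keep sleeping
        while (!below.isDone() || !above.isDone()) {
            Thread.sleep(10); // the real driver sleeps five seconds per check
        }
        pool.shutdown();
        return List.of(below.get(), above.get());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll());
    }
}
```

The important point is that submission does not block: both tasks are in flight before the monitoring loop starts, which is what makes the chain parallel rather than sequential.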

Parallel Job Chain Example

Problem Description

Given users that have already been partitioned into bins, with each record containing a user ID, a reputation, and a post count, run jobs in parallel to compute the average reputation of the users in each bin.

Sample Input

The code to create the dataset is as follows:


import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class create {
    public static void main(String[] args) throws IOException {
        String path = "input/aboveAvgInput.txt";
        File file = new File(path);
        if (!file.exists()) {
            file.getParentFile().mkdirs();
            file.createNewFile();
        }
        // Append 5,000 records of the form "UserId=... reputation=... total_posts=..."
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(file, true))) {
            for (int i = 0; i < 5000; i++) {
                int id = (int) (Math.random() * 1000 + 1000);         // user IDs in [1000, 2000)
                int reputation = (int) (Math.random() * 5000 + 3000); // reputations in [3000, 8000)
                int totalPosts = (int) (Math.random() * 300);         // post counts in [0, 300)
                bw.write("UserId=" + id + " reputation=" + reputation
                        + " total_posts=" + totalPosts + "\n");
            }
        }
    }
}

The result of running it is as follows.
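Each run appends 5,000 lines in the format produced by the generator above. The values are random on every run; the lines below are illustrative only:

```
UserId=1467 reputation=5233 total_posts=112
UserId=1045 reputation=7810 total_posts=4
UserId=1890 reputation=3361 total_posts=256
```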

Sample Output

Mapper Phase Tasks

The mapper splits the input value into a set of key/value tokens. All map tasks emit the user's reputation under a single shared key, so that the reputations are aggregated together to compute the average.

Mapper Phase Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.Map;

public class AverageReuptationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    // A single shared key groups every reputation value into one reduce call
    private static final Text GROUP_ALL_KEY = new Text("Average Reputation:");
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // MRDPUtil.transInformation parses a "k1=v1 k2=v2 ..." line into a map
        Map<String, String> parsed = MRDPUtil.transInformation(line);
        double reputation = Double.parseDouble(parsed.get("reputation"));
        outValue.set(reputation);
        context.write(GROUP_ALL_KEY, outValue);
    }
}
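The mapper depends on an `MRDPUtil.transInformation` helper that is not shown in the post. A minimal sketch, assuming it parses the space-separated `key=value` tokens written by the dataset generator (the implementation below is a guess at that contract, not the author's code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the helper assumed by the mapper: turns a line
// like "UserId=1001 reputation=4200 total_posts=37" into a key/value map.
public class MRDPUtil {
    public static Map<String, String> transInformation(String line) {
        Map<String, String> parsed = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            String[] kv = token.split("=", 2);
            if (kv.length == 2) {
                parsed.put(kv[0], kv[1]);
            }
        }
        return parsed;
    }
}
```

For example, `transInformation("UserId=1001 reputation=4200 total_posts=37").get("reputation")` would return `"4200"`, which the mapper then parses as a double.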
Reducer Phase Tasks

The reducer simply iterates over all the reputation values, accumulating their count and sum, then writes the average along with the input key.

Reducer Phase Code
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class AverageReputationReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        // Sum every reputation value and count them to compute the average
        for (DoubleWritable dw : values) {
            sum += dw.get();
            count++;
        }
        outValue.set(sum / count);
        context.write(key, outValue);
    }
}
Driver

The driver determines each job's input and output directory, then calls a helper function that configures and submits the job. The helper returns the two Job objects, and the driver monitors them for completion: as long as either job is still running, the driver sleeps for another five seconds. Once both jobs have finished, it checks whether each one succeeded or failed.


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Clear any previous output so the jobs do not fail on existing directories
        FileSystem.get(conf).delete(new Path("output"), true);

        Path belowAvgInputDir = new Path("input/belowAvgInput.txt");
        Path belowAvgOutputDir = new Path("output/belowAvgOut");
        Path aboveAvgInputDir = new Path("input/aboveAvgInput.txt");
        Path aboveAvgOutputDir = new Path("output/aboveAvgOutput");

        // Submit both jobs without blocking, so they run in parallel
        Job belowAvgJob = submitJob(conf, belowAvgInputDir, belowAvgOutputDir);
        Job aboveAvgJob = submitJob(conf, aboveAvgInputDir, aboveAvgOutputDir);

        // Poll until both jobs have finished, sleeping five seconds between checks
        while (!belowAvgJob.isComplete() || !aboveAvgJob.isComplete()) {
            Thread.sleep(5000);
        }

        if (belowAvgJob.isSuccessful()) {
            System.out.println("Below average job completed successfully!");
        } else {
            System.out.println("Below average job failed!");
        }
        if (aboveAvgJob.isSuccessful()) {
            System.out.println("Above average job completed successfully!");
        } else {
            System.out.println("Above average job failed!");
        }
        System.exit(belowAvgJob.isSuccessful() && aboveAvgJob.isSuccessful() ? 0 : 1);
    }

    public static Job submitJob(Configuration conf, Path inputDir, Path outputDir) throws Exception {
        Job job = Job.getInstance(conf, "Driver");
        job.setJarByClass(Driver.class);
        job.setMapperClass(AverageReuptationMapper.class);
        job.setReducerClass(AverageReputationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, inputDir);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        job.submit(); // non-blocking, unlike waitForCompletion()
        return job;
    }
}
