1. Sequential composition
Run one MapReduce job to completion, then start the next one; the output directory of the first job serves as the input directory of the second.
// first MapReduce job: reads t2g_input and writes t2g_output1
Configuration conf1 = new Configuration();
conf1.set("mapred.job.tracker", "192.168.1.164:9001");
String[] ars = new String[]{"t2g_input", "t2g_output1"};
String[] otherArgs = new GenericOptionsParser(conf1, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job1 = new Job(conf1, "job1");
job1.setJarByClass(T2G.class);
job1.setMapperClass(Map.class);
job1.setReducerClass(Reduce.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job1, new Path(otherArgs[1]));
job1.waitForCompletion(true);
// second MapReduce job: reads t2g_output1 (the output of job1) and writes t2g_output2
Configuration conf2 = new Configuration();
conf2.set("mapred.job.tracker", "192.168.1.164:9001");
ars = new String[]{"t2g_output1", "t2g_output2"};
otherArgs = new GenericOptionsParser(conf2, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job2 = new Job(conf2, "job2");
job2.setJarByClass(T2G.class);
job2.setMapperClass(Map2.class);
job2.setReducerClass(Reduce2.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job2, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job2, new Path(otherArgs[1]));
job2.waitForCompletion(true);
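One weakness of the plain sequential form above is that job2 starts even if job1 failed. Since waitForCompletion() returns a boolean indicating success, a minimal sketch (reusing the job1 and job2 variables from the code above) can abort the chain on failure:
// run the jobs strictly in order and stop the chain if an earlier job fails
if (!job1.waitForCompletion(true)) {
    System.exit(1); // job1 failed, so do not start job2
}
if (!job2.waitForCompletion(true)) {
    System.exit(1); // job2 failed
}
System.exit(0);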
2. Dependency-based composition (JobControl)
Suppose a MapReduce application consists of three sub-jobs job1, job2, and job3, where job1 and job2 are independent of each other and job3 may only run after both job1 and job2 have finished. This kind of relationship is called composite MapReduce with complex data dependencies. Hadoop provides an execution and control mechanism for such compositions through the Job and JobControl classes (in the org.apache.hadoop.mapred.jobcontrol package). Besides holding a sub-job's configuration, Job also records its dependencies on other sub-jobs, while JobControl drives the whole workflow: add all the sub-jobs to a JobControl instance and call its run() method to execute the program.
// Job here is org.apache.hadoop.mapred.jobcontrol.Job, which wraps a JobConf
JobConf job1Conf = new JobConf();
Job job1 = new Job(job1Conf);
......... // other settings for job1
JobConf job2Conf = new JobConf();
Job job2 = new Job(job2Conf);
......... // other settings for job2
JobConf job3Conf = new JobConf();
Job job3 = new Job(job3Conf);
......... // other settings for job3
job3.addDependingJob(job1); // job3 must wait for job1
job3.addDependingJob(job2); // job3 must wait for job2
JobControl jc = new JobControl("123");
jc.addJob(job1); // add all three jobs to the JobControl
jc.addJob(job2);
jc.addJob(job3);
jc.run();
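Note that JobControl's run() loop keeps polling job states until stop() is called, so invoking it directly on the main thread will not return on its own. A common pattern, shown here only as a sketch built on the jc object above, is to drive the JobControl from a separate thread and wait for allFinished():
// run the JobControl in a background thread and poll until every job is done
Thread controller = new Thread(jc);
controller.start();
while (!jc.allFinished()) {
    Thread.sleep(500); // wait for job1/job2/job3 to finish
}
jc.stop(); // shut the control loop down once all jobs have completed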
3. Chained MapReduce (ChainMapper / ChainReducer)
First, an example to show why chained MapReduce is needed. Suppose that while counting words we encounter forms such as make, made, and making; they are all the same word and should be accumulated under a single entry. One way to solve this is with a separate MapReduce job, but adding extra MapReduce jobs lengthens the overall processing cycle and adds I/O, so it is not efficient.
A better approach is to add an auxiliary map step alongside the core MapReduce and merge this auxiliary map step with the core MapReduce into a single chained MapReduce job that completes the whole task in one pass. Hadoop provides ChainMapper and ChainReducer specifically for chained tasks: ChainMapper allows several mapper sub-steps to be added within one map task, and ChainReducer allows several mapper sub-steps to run after the reducer. They are invoked as follows:
ChainMapper.addMapper(...);
ChainReducer.addMapper(...);
// the addMapper() method has the following signature:
public static void addMapper(Job job,
        Class<? extends Mapper> mclass,
        Class<?> inputKeyClass,
        Class<?> inputValueClass,
        Class<?> outputKeyClass,
        Class<?> outputValueClass,
        Configuration conf) {
}
In addition, ChainReducer provides a setReducer() method to set the single Reducer for the whole job.
Note: the key and value types passed between adjacent mappers and the reducer in the chain must match.
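For the make/made/making scenario above, the auxiliary map step could be a mapper that normalizes word forms before the core word-count mapper sees them. The following is only a sketch; the class name NormalizeMapper and its small stemming table are illustrative assumptions, and it uses the new-API Mapper to match the chained example below:
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Rewrites each input line so that word variants collapse to one form (made/making -> make).
public class NormalizeMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    private static final HashMap<String, String> STEMS = new HashMap<String, String>();
    static {
        STEMS.put("made", "make");
        STEMS.put("making", "make");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (String word : value.toString().split("\\s+")) {
            sb.append(STEMS.containsKey(word) ? STEMS.get(word) : word).append(' ');
        }
        context.write(key, new Text(sb.toString().trim()));
    }
}
Registered with ChainMapper.addMapper() ahead of the core word-count mapper, this keeps the normalization inside the same MapReduce job instead of requiring a separate one.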
Here is an example: ChainMapper adds Map1 to the chain and runs it, then ChainReducer adds Reduce and Map2 to the reduce stage. The code is as follows (Map1.class must implement the map method):
public void runChainJob() throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("chainjob");
    // add Map1 to the ChainMapper
    Configuration map1Conf = new Configuration(false);
    ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
            Text.class, Text.class, map1Conf);
    // set the Reducer and add Map2 in the ChainReducer
    Configuration reduceConf = new Configuration(false);
    ChainReducer.setReducer(job, Reduce.class, Text.class, Text.class,
            Text.class, Text.class, reduceConf);
    Configuration map2Conf = new Configuration(false);
    ChainReducer.addMapper(job, Map2.class, Text.class, Text.class,
            Text.class, Text.class, map2Conf);
    job.waitForCompletion(true);
}
The ChainReducer class allows multiple Mapper classes to be chained after a Reducer within the reducer task. With ChainMapper, multiple Mapper classes can likewise be combined into a single map task.
The following example does not do anything particularly meaningful, but it demonstrates how ChainMapper works.
Source file:
100 tom 90
101 mary 85
102 kate 60
Output of map00 (the record with key 100 is filtered out):
101 mary 85
102 kate 60
Output of map01 (the record with key 101 is filtered out):
102 kate 60
Output of reduce:
102 kate 60
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapred.lib.*;
public class WordCount
{
    public static class Map00 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // drop the record whose key is "100"
            Text ft = new Text("100");
            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }
    public static class Map01 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // drop the record whose key is "101"
            Text ft = new Text("101");
            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }
    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // pass every remaining record through unchanged
            while (values.hasNext())
            {
                output.collect(key, values.next());
            }
        }
    }
    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount00");
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // chain Map00 and Map01 inside the single map task
        JobConf mapAConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map00.class, Text.class, Text.class, Text.class, Text.class, true, mapAConf);
        JobConf mapBConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map01.class, Text.class, Text.class, Text.class, Text.class, true, mapBConf);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}