Sequential, Dependency-Based Composite, and Chained MapReduce

This article describes three ways of composing MapReduce executions: sequential, dependency-based composite, and chained. It covers configuration examples, workflow, and code, and aims to help developers understand how MapReduce composition works and where each approach applies.

1. Sequential

Run one MapReduce job to completion, then run the next; the second job consumes the first job's output.

Configuration conf1 = new Configuration();
conf1.set("mapred.job.tracker", "192.168.1.164:9001");
String[] ars = new String[]{"t2g_input", "t2g_output1"};
String[] otherArgs = new GenericOptionsParser(conf1, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job1 = new Job(conf1, "job1");
job1.setJarByClass(T2G.class);
job1.setMapperClass(Map.class);
job1.setReducerClass(Reduce.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job1, new Path(otherArgs[1]));
job1.waitForCompletion(true); // block until job1 finishes

// second job: reads job1's output directory as its input

Configuration conf2 = new Configuration();
conf2.set("mapred.job.tracker", "192.168.1.164:9001");
ars = new String[]{"t2g_output1", "t2g_output2"};
otherArgs = new GenericOptionsParser(conf2, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job2 = new Job(conf2, "job2");
job2.setJarByClass(T2G.class);
job2.setMapperClass(Map2.class);
job2.setReducerClass(Reduce2.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job2, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job2, new Path(otherArgs[1]));
job2.waitForCompletion(true);
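
Because waitForCompletion(true) blocks until the job finishes, job2 cannot start before job1 completes, and job1's output directory (t2g_output1) is used directly as job2's input. The drawback is that the driver serializes every job, even ones with no real dependency on each other, which motivates the composition mechanism described next.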

2. Dependency-Based Composition

Suppose a MapReduce application consists of three sub-jobs, job1, job2, and job3, where job1 and job2 are independent of each other and job3 can run only after both job1 and job2 have completed. This arrangement is called composite MapReduce with complex data dependencies. Hadoop provides an execution and control mechanism for such compositions through the Job and JobControl classes: Job maintains not only a sub-job's configuration but also its dependency relationships, while JobControl manages the overall workflow. Add all sub-jobs to a JobControl instance and call its run() method to execute the whole program.


// composition via org.apache.hadoop.mapred.jobcontrol.Job and JobControl
JobConf job1conf = new JobConf();
Job job1 = new Job(job1conf);
// ......... other job1 settings
JobConf job2conf = new JobConf();
Job job2 = new Job(job2conf);
// ......... other job2 settings
JobConf job3conf = new JobConf();
Job job3 = new Job(job3conf);
// ......... other job3 settings
job3.addDependingJob(job1); // job3 will not start until job1 completes
job3.addDependingJob(job2); // likewise for job2
JobControl jc = new JobControl("123");
jc.addJob(job1); // add all three jobs to the JobControl
jc.addJob(job2);
jc.addJob(job3);
jc.run();
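
One practical detail: JobControl implements Runnable, and in many Hadoop versions its run() method keeps looping even after every job has completed, so calling it directly can block the current thread until stop() is invoked. A common pattern, sketched below as a minimal example under that assumption, is to start the controller on its own thread and poll allFinished():

// start the JobControl on its own thread and poll for completion
// (assumes the enclosing method declares throws InterruptedException)
Thread controller = new Thread(jc);
controller.start();
while (!jc.allFinished()) {
    Thread.sleep(500); // re-check every half second
}
jc.stop(); // tell the controller loop to exit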

3. Chained

To see why chained MapReduce exists, consider an example. When counting words we may encounter forms such as make, made, and making; they are all variants of one word and should be accumulated under a single entry. This could be handled by a separate MapReduce job, but adding extra MapReduce jobs lengthens the total processing time and adds I/O, so the approach is inefficient.

A better approach is to add an auxiliary map step alongside the core MapReduce and merge the auxiliary map step with the core MapReduce into a single chained MapReduce job that completes the whole task. Hadoop provides the dedicated ChainMapper and ChainReducer classes for chained jobs: ChainMapper allows multiple mapper sub-steps within a single map task, and ChainReducer allows additional mapper steps to run after the reducer. They are invoked as follows:

ChainMapper.addMapper(...);
ChainReducer.addMapper(...);
// the addMapper() method has this signature:
public static void addMapper(Job job,
        Class<? extends Mapper> mClass,
        Class<?> inputKeyClass,
        Class<?> inputValueClass,
        Class<?> outputKeyClass,
        Class<?> outputValueClass,
        Configuration conf)

ChainReducer additionally provides a dedicated setReducer() method to set the single Reducer for the entire job.
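
For reference, the corresponding setReducer() declaration in the org.apache.hadoop.mapreduce.lib.chain.ChainReducer API takes the same shape:

public static void setReducer(Job job,
        Class<? extends Reducer> rClass,
        Class<?> inputKeyClass,
        Class<?> inputValueClass,
        Class<?> outputKeyClass,
        Class<?> outputValueClass,
        Configuration reducerConf)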

Note: the key and value types passed between these mappers and the reducer must remain consistent; each step's output types must match the next step's input types.

Here is an example: use ChainMapper to add Map1 to the chain, then use ChainReducer to add the Reducer and Map2 to the reduce step. The code is as follows (Map1.class must implement the map method):


public void function() throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJobName("ChainJob");
        // add Map1 to the chain via ChainMapper:
        // (LongWritable, Text) -> (Text, Text)
        Configuration map1Conf = new Configuration(false);
        ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
                Text.class, Text.class, map1Conf);
        // set the single Reducer via ChainReducer; its input types must
        // match Map1's output types: (Text, Text) -> (Text, Text)
        Configuration reduceConf = new Configuration(false);
        ChainReducer.setReducer(job, Reduce.class, Text.class,
                Text.class, Text.class, Text.class, reduceConf);
        // append Map2 after the Reducer: (Text, Text) -> (Text, Text)
        Configuration map2Conf = new Configuration(false);
        ChainReducer.addMapper(job, Map2.class, Text.class, Text.class,
                Text.class, Text.class, map2Conf);
        job.waitForCompletion(true);
    }
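
The resulting job runs the pipeline Map1 → Reduce → Map2 in a single MapReduce job with one pass over the data: a chain job generally consists of one or more mappers, at most one reducer, and then zero or more mappers after it.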
The ChainMapper class allows multiple Mapper classes to be used within a single map task.

The ChainReducer class allows multiple Mapper classes to be chained after a Reducer within the reduce task.

In other words, ChainMapper merges several Mapper classes into one map task.

The following example has no real practical purpose, but it demonstrates the role of ChainMapper well.

Source file:
100 tom 90
101 mary 85
102 kate 60

Output of Map00 (the record with key 100 filtered out):
101 mary 85
102 kate 60

Output of Map01 (the record with key 101 filtered out):
102 kate 60

Output of Reduce:
102 kate 60
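
A note on the input format used below: KeyValueTextInputFormat splits each input line at the first separator (a tab character by default) to form the key and value, so the sample records are assumed to be tab-separated; that is how 100, 101, and 102 become the keys the chained mappers compare against.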

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapred.lib.*;

public class WordCount
{
    // first mapper in the chain: drops the record whose key is "100"
    public static class Map00 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            Text ft = new Text("100");

            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }

    // second mapper in the chain: drops the record whose key is "101"
    public static class Map01 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            Text ft = new Text("101");

            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }

    // identity reducer: emits every surviving record unchanged
    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            while (values.hasNext())
            {
                output.collect(key, values.next());
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount00");

        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // chain Map00 and Map01 into a single map task;
        // addMapper is static, and the old API takes a byValue flag
        // plus a private per-mapper JobConf
        JobConf mapAConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map00.class, Text.class, Text.class, Text.class, Text.class, true, mapAConf);

        JobConf mapBConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map01.class, Text.class, Text.class, Text.class, Text.class, true, mapBConf);

        conf.setReducerClass(Reduce.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}


