1. Sequential composition
Run one MapReduce job to completion, then start the next one; the output directory of the first job serves as the input directory of the second.
// first MapReduce job: reads t2g_input and writes t2g_output1
Configuration conf1 = new Configuration();
conf1.set("mapred.job.tracker", "192.168.1.164:9001");
String[] ars = new String[]{"t2g_input", "t2g_output1"};
String[] otherArgs = new GenericOptionsParser(conf1, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job1 = new Job(conf1, "job1");
job1.setJarByClass(T2G.class);
job1.setMapperClass(Map.class);
job1.setReducerClass(Reduce.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job1, new Path(otherArgs[1]));
job1.waitForCompletion(true);
// second MapReduce job: reads t2g_output1 (the output of job1) and writes t2g_output2
Configuration conf2 = new Configuration();
conf2.set("mapred.job.tracker", "192.168.1.164:9001");
ars = new String[]{"t2g_output1", "t2g_output2"};
otherArgs = new GenericOptionsParser(conf2, ars).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job2 = new Job(conf2, "job2");
job2.setJarByClass(T2G.class);
job2.setMapperClass(Map2.class);
job2.setReducerClass(Reduce2.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job2, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job2, new Path(otherArgs[1]));
job2.waitForCompletion(true);
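One weakness of the plain sequential form above is that job2 starts even if job1 failed. Since waitForCompletion() returns a boolean indicating success, a minimal sketch (reusing the job1 and job2 variables from the code above) can abort the chain on failure:
// run the jobs strictly in order and stop the chain if an earlier job fails
if (!job1.waitForCompletion(true)) {
    System.exit(1); // job1 failed, so do not start job2
}
if (!job2.waitForCompletion(true)) {
    System.exit(1); // job2 failed
}
System.exit(0);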
2. Dependency-based composition (JobControl)
Suppose a MapReduce application consists of three sub-jobs job1, job2, and job3, where job1 and job2 are independent of each other and job3 may only run after both job1 and job2 have finished. This kind of relationship is called composite MapReduce with complex data dependencies. Hadoop provides an execution and control mechanism for such compositions through the Job and JobControl classes (in the org.apache.hadoop.mapred.jobcontrol package). Besides holding a sub-job's configuration, Job also records its dependencies on other sub-jobs, while JobControl drives the whole workflow: add all the sub-jobs to a JobControl instance and call its run() method to execute the program.
// Job here is org.apache.hadoop.mapred.jobcontrol.Job, which wraps a JobConf
JobConf job1Conf = new JobConf();
Job job1 = new Job(job1Conf);
......... // other settings for job1
JobConf job2Conf = new JobConf();
Job job2 = new Job(job2Conf);
......... // other settings for job2
JobConf job3Conf = new JobConf();
Job job3 = new Job(job3Conf);
......... // other settings for job3
job3.addDependingJob(job1); // job3 must wait for job1
job3.addDependingJob(job2); // job3 must wait for job2
JobControl jc = new JobControl("123");
jc.addJob(job1); // add all three jobs to the JobControl
jc.addJob(job2);
jc.addJob(job3);
jc.run();
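Note that JobControl's run() loop keeps polling job states until stop() is called, so invoking it directly on the main thread will not return on its own. A common pattern, shown here only as a sketch built on the jc object above, is to drive the JobControl from a separate thread and wait for allFinished():
// run the JobControl in a background thread and poll until every job is done
Thread controller = new Thread(jc);
controller.start();
while (!jc.allFinished()) {
    Thread.sleep(500); // wait for job1/job2/job3 to finish
}
jc.stop(); // shut the control loop down once all jobs have completed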
3. Chained MapReduce (ChainMapper / ChainReducer)
First, an example to show why chained MapReduce is needed. Suppose that while counting words we encounter forms such as make, made, and making; they are all the same word and should be accumulated under a single entry. One way to solve this is with a separate MapReduce job, but adding extra MapReduce jobs lengthens the overall processing cycle and adds I/O, so it is not efficient.
A better approach is to add an auxiliary map step alongside the core MapReduce and merge this auxiliary map step with the core MapReduce into a single chained MapReduce job that completes the whole task in one pass. Hadoop provides ChainMapper and ChainReducer specifically for chained tasks: ChainMapper allows several mapper sub-steps to be added within one map task, and ChainReducer allows several mapper sub-steps to run after the reducer. They are invoked as follows:
ChainMapper.addMapper(...);
ChainReducer.addMapper(...);
// the addMapper() method has the following signature:
public static void addMapper(Job job,
        Class<? extends Mapper> mclass,
        Class<?> inputKeyClass,
        Class<?> inputValueClass,
        Class<?> outputKeyClass,
        Class<?> outputValueClass,
        Configuration conf) {
}
In addition, ChainReducer provides a setReducer() method to set the single Reducer for the whole job.
Note: the key and value types passed between adjacent mappers and the reducer in the chain must match.
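For the make/made/making scenario above, the auxiliary map step could be a mapper that normalizes word forms before the core word-count mapper sees them. The following is only a sketch; the class name NormalizeMapper and its small stemming table are illustrative assumptions, and it uses the new-API Mapper to match the chained example below:
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Rewrites each input line so that word variants collapse to one form (made/making -> make).
public class NormalizeMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    private static final HashMap<String, String> STEMS = new HashMap<String, String>();
    static {
        STEMS.put("made", "make");
        STEMS.put("making", "make");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (String word : value.toString().split("\\s+")) {
            sb.append(STEMS.containsKey(word) ? STEMS.get(word) : word).append(' ');
        }
        context.write(key, new Text(sb.toString().trim()));
    }
}
Registered with ChainMapper.addMapper() ahead of the core word-count mapper, this keeps the normalization inside the same MapReduce job instead of requiring a separate one.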
Here is an example: ChainMapper adds Map1 to the chain and runs it, then ChainReducer adds Reduce and Map2 to the reduce stage. The code is as follows (Map1.class must implement the map method):
public void runChainJob() throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("chainjob");
    // add Map1 to the ChainMapper
    Configuration map1Conf = new Configuration(false);
    ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
            Text.class, Text.class, map1Conf);
    // set the Reducer and add Map2 in the ChainReducer
    Configuration reduceConf = new Configuration(false);
    ChainReducer.setReducer(job, Reduce.class, Text.class, Text.class,
            Text.class, Text.class, reduceConf);
    Configuration map2Conf = new Configuration(false);
    ChainReducer.addMapper(job, Map2.class, Text.class, Text.class,
            Text.class, Text.class, map2Conf);
    job.waitForCompletion(true);
}
The ChainReducer class allows multiple Mapper classes to be chained after a Reducer within the reducer task. With ChainMapper, multiple Mapper classes can likewise be combined into a single map task.
The following example does not do anything particularly meaningful, but it demonstrates how ChainMapper works.
Source file:
100 tom 90
101 mary 85
102 kate 60
Output of map00 (the record with key 100 is filtered out):
101 mary 85
102 kate 60
Output of map01 (the record with key 101 is filtered out):
102 kate 60
Output of reduce:
102 kate 60
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapred.lib.*;
public class WordCount
{
    public static class Map00 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // drop the record whose key is "100"
            Text ft = new Text("100");
            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }
    public static class Map01 extends MapReduceBase implements Mapper<Text, Text, Text, Text>
    {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // drop the record whose key is "101"
            Text ft = new Text("101");
            if (!key.equals(ft))
            {
                output.collect(key, value);
            }
        }
    }
    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            // pass every remaining record through unchanged
            while (values.hasNext())
            {
                output.collect(key, values.next());
            }
        }
    }
    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount00");
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // chain Map00 and Map01 inside the single map task
        JobConf mapAConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map00.class, Text.class, Text.class, Text.class, Text.class, true, mapAConf);
        JobConf mapBConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map01.class, Text.class, Text.class, Text.class, Text.class, true, mapBConf);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}