1. The Macro-Level Flow of MapReduce
(1) MapReduce has two stages: the Map stage and the Reduce stage.
(2) One MapReduce application is one Job. As a Job runs through its stages it launches a number of Tasks:
The processes launched in the Map stage are called MapTasks.
The number of MapTasks is determined by the number of input splits: N splits means N MapTasks.
The processes launched in the Reduce stage are called ReduceTasks.
The number of ReduceTasks is set by the developer:
job.setNumReduceTasks(int n);
(3) Between Map and Reduce there is a process that partitions, sorts, and transfers the data output by Map.
This process is important enough to get its own name: shuffle. On top of the two-stage split, the flow is therefore refined into Map ---------- shuffle ---------- Reduce.
No separate process is started for shuffle; shuffle spans both the MapTask and the ReduceTask.
(4) The official stage division:
Map stage: map, sort
Reduce stage: copy, sort, reduce
map (Map stage) --- sort | copy | sort (shuffle) --- reduce (Reduce stage)
2. MapReduce Has Two Run Modes
local (local mode): when the job is submitted through LocalJobRunner, it runs locally.
MapTasks and ReduceTasks are simulated with multiple threads inside the local JVM.
(This is the mode used for the walkthrough below.)
YARN (running on YARN): when the job is submitted through YARNRunner, an MRAppMaster process is initialized before the Job runs. MRAppMaster requests the resources for running every Task from the ResourceManager (RM); the RM hands the requests to NodeManagers (NM), the resources an NM provides are wrapped in a Container, and the Task process is then started inside it.
(This is why no MRAppMaster is visible in local mode.)
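As a quick reference for how the mode is chosen: the submitter (LocalJobRunner vs. YARNRunner) is selected from the mapreduce.framework.name property. Below is a minimal configuration sketch; the property names are the standard Hadoop 2.x ones, and the host names are placeholders, not values taken from this article.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RunModeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Local mode: the Job is driven by LocalJobRunner and Tasks are simulated with threads.
        conf.set("mapreduce.framework.name", "local");
        conf.set("fs.defaultFS", "file:///");

        // YARN mode: the Job is submitted through YARNRunner; an MRAppMaster is started,
        // which requests Containers from the ResourceManager for every Task.
        // conf.set("mapreduce.framework.name", "yarn");
        // conf.set("fs.defaultFS", "hdfs://namenode:8020");              // placeholder host
        // conf.set("yarn.resourcemanager.hostname", "resourcemanager");  // placeholder host

        Job job = Job.getInstance(conf, "run-mode-demo");
        // ... the rest of the job setup is identical in both modes
    }
}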
3. Source Code Analysis
We will use the WordCount example throughout.
3.1 The WordCount Code
As follows.
Mapper class:
package andy.mywc;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
/**
* LongWritable: the input key, the byte offset of the line
* Text: the input value, here one line of text
* Text: the key output by the map stage
* IntWritable: the value output by the map stage
*/
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
Text k = new Text();
IntWritable v = new IntWritable(1);
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// 1. Get the line as a String (convert Text to String)
String line = value.toString();
// 2. Split the line into an array of words
String[] words = line.split(" ");
// 3. Iterate over the words
for (String word : words) {
// 4. Set the output key to the word; the value is 1
k.set(word);
// 5. Emit the pair
context.write(k,v);
}
}
}
Reducer class:
package andy.mywc;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
/**
* The reducer's input types match the mapper's output types:
* Text: the key input to reduce
* IntWritable: the value input to reduce
* Text: the key output by reduce
* IntWritable: the value output by reduce
*/
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
/**
* reduce() is entered once per distinct key: all values sharing the same key arrive in one call
*/
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
// 1. Iterate over the values and sum them
for (IntWritable value : values) {
sum += value.get();
}
IntWritable v = new IntWritable();
v.set(sum);
// 2. Write out the result
context.write(key, v);
}
}
Driver class:
package andy.mywc;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WCDriver {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
if(args == null || args.length < 2)
args = new String[] {"E:\\temp\\input","E:\\temp\\output3"};
//1. Create the configuration and the Job
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
//2. Set the jar by class
job.setJarByClass(WCDriver.class);
//3. Associate the Mapper and Reducer classes
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
//4. Set the key/value types output by the Mapper and by the Reducer
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//5. Set the input and output paths
FileInputFormat.setInputPaths(job,args[0]);
FileOutputFormat.setOutputPath(job,new Path(args[1]));
//6. Submit the job and wait for it to finish
boolean b = job.waitForCompletion(true);
System.exit( b ? 0 : 1);
}
}
3.2 Pre-processing Before the Job Is Submitted
(1) Where it starts
The program starts from the driver's main() method. Everything before the submission call merely configures the Job; the actual submission is
boolean b = job.waitForCompletion(true);
so that is where we begin.
(2) waitForCompletion
public boolean waitForCompletion(boolean verbose
) throws IOException, InterruptedException,
ClassNotFoundException {
if (state == JobState.DEFINE) {
submit();
}
if (verbose) {
monitorAndPrintJob();
} else {
// get the completion poll interval from the client.
int completionPollIntervalMillis =
Job.getCompletionPollInterval(cluster.getConf());
while (!isComplete()) {
try {
Thread.sleep(completionPollIntervalMillis);
} catch (InterruptedException ie) {
}
}
}
return isSuccessful();
}
The code below uses the verbose flag to decide whether to print progress information, and otherwise polls periodically until the job completes; by the time this code runs, the job has already been submitted.
if (verbose) {
monitorAndPrintJob();
} else {
// get the completion poll interval from the client.
int completionPollIntervalMillis =
Job.getCompletionPollInterval(cluster.getConf());
while (!isComplete()) {
try {
Thread.sleep(completionPollIntervalMillis);
} catch (InterruptedException ie) {
}
}
}
if (state == JobState.DEFINE)
This block checks whether the job has been submitted yet; if it is still in the DEFINE state, submit() is called.
(3) submit
public void submit()
throws IOException, InterruptedException, ClassNotFoundException {
ensureState(JobState.DEFINE);
setUseNewAPI();
connect();
final JobSubmitter submitter =
getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
public JobStatus run() throws IOException, InterruptedException,
ClassNotFoundException {
return submitter.submitJobInternal(Job.this, cluster);
}
});
state = JobState.RUNNING;
LOG.info("The url to track the job: " + getTrackingURL());
}
1) ensureState(JobState.DEFINE);
This line checks the job's state once more.
2) setUseNewAPI();
Hadoop ships two generations of the MapReduce API, the old 1.x (mapred) API and the new 2.x (mapreduce) API; this call configures the job to use the new API.
Next, let's look at the connect() method.
(4) connect
Inside connect(), a cluster object is created from the configuration (this is where local mode versus YARN mode is decided; in YARN mode a YARN cluster object is returned). It represents all the resources of the cluster.
With the cluster object created, we return to submit().
(5) getJobSubmitter
After connect() returns, submit() calls getJobSubmitter().
This produces a JobSubmitter, the object used to submit the job; it carries a great deal of information.
(6) submitJobInternal
submitter.submitJobInternal() is then executed. The core of job submission lives in this method, and it is very long.
1) checkSpecs(job);
This checks the number of reducers and whether it equals 0 (which means there is no reduce stage).
It then checks whether the new API is in use; if so, the branch below runs:
ReflectionUtils.newInstance(job.getOutputFormatClass(),
job.getConfiguration());
which obtains the OutputFormat class:
public Class<? extends OutputFormat<?,?>> getOutputFormatClass()
throws ClassNotFoundException {
return (Class<? extends OutputFormat<?,?>>)
conf.getClass(OUTPUT_FORMAT_CLASS_ATTR, TextOutputFormat.class);
}
If nothing is configured, TextOutputFormat.class is used, so the default output format is TextOutputFormat. Once the output format class has been obtained, the output specification is checked: whether the output directory exists, whether a path was set, whether there is enough space, and so on (if no output path was set, or the output directory already exists, an exception is thrown).
2) Back in submitJobInternal
addMRFrameworkToDistributedCache(conf); // put the MR framework into the distributed cache
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
Path submitJobDir = new Path(jobStagingArea, jobId.toString()); // create the job's staging (temporary) directory
The staging directory usually lives:
on the local file system: in a tmp directory on the drive where the IDE project resides
on HDFS: under /tmp
3) writeSplits
line 196: int maps = writeSplits(job, submitJobDir);
This is the splitting call: the input splits are computed inside it, and maps is the number of splits.
Normally, when splits are taken from files, FileSplit is used as the split object; it will come up again later in the flow.
FileSplit has three key properties:
Path p: the file this split belongs to
long start: the offset in that file where this split begins
long length: the length of this split
Reading length bytes starting at start in the given file yields exactly the data of this split.
Splits have no inherent relationship to HDFS blocks. The data of a split is physically stored as blocks in HDFS, so reading a split ends up accessing certain blocks, but which blocks a split maps onto depends entirely on how the splitting was done; there is no direct correspondence.
Note: by default, FileInputFormat uses the block size as the split size, so one split reads exactly one block.
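To make the default rule concrete, here is a small sketch of the split-size formula FileInputFormat uses (split size = max(minSize, min(maxSize, blockSize))); this mirrors the behavior described above rather than quoting the Hadoop source, and the values are examples only.
public class SplitSizeSketch {
    // The same formula FileInputFormat applies per file when computing splits.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize
        long maxSize = Long.MAX_VALUE;       // mapreduce.input.fileinputformat.split.maxsize

        // With the defaults, split size == block size, i.e. one split per block.
        System.out.println(computeSplitSize(blockSize, minSize, maxSize));

        // Raising minSize above the block size produces splits larger than a block;
        // lowering maxSize below the block size produces smaller splits.
        System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE));
        System.out.println(computeSplitSize(blockSize, 1L, 32L * 1024 * 1024));
    }
}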
Back to submitJobInternal.
4) writeConf
line 234: writeConf(conf, submitJobFile);
Here the job configuration is written into the staging directory created above (the split files were written there by writeSplits). After this step the staging directory contains:
job.xml: every configuration of the Job, collected from the xxx-default.xml and xxx-site.xml files
job.split: the serialized split objects
job.splitmetainfo: the metadata describing the split objects
5) submitClient.submitJob
line 240
This is where the job is formally submitted. In local mode a new LocalJobRunner.Job object is created here.
- line 150-151: create the necessary configuration object from the job.xml in the staging directory
- line 152-188: further configuration of the Job
- line 190: this.start(); starts the job
A note:
Because we are running in local mode there is no MRAppMaster, so LocalJobRunner.Job has to stand in for it.
LocalJobRunner.Job: functionally similar to MRAppMaster, responsible for requesting resources for, submitting, and running the whole Job.
a) new Job()
b) Job.start() is the equivalent of launching the MRAppMaster
c) Job.run() is the equivalent of the MRAppMaster starting its work
So after start(), the Job's run() method executes; we will set that run() method aside for now and come back to it.
After start() has been called, control returns to waitForCompletion(), which monitors the job until it finishes and prints its progress. The preparatory work for submitting the job is now done; next comes the MapReduce execution itself.
3.3 Map Phase
(1) As noted, local mode uses LocalJobRunner.Job to simulate the MRAppMaster, so let's look at Job.run().
public void run() {
try {
//Build TaskSplitMetaInfo[] from the job.split and job.splitmetainfo files generated earlier;
// one split metadata object per split
TaskSplitMetaInfo[] taskSplitMetaInfos =
SplitMetaInfoReader.readSplitMetaInfo(jobId, localFs, conf, systemJobDir);
// A Map holding, for every MapTask, information about the output file it produces
Map<TaskAttemptID, MapOutputFile> mapOutputFiles =
Collections.synchronizedMap(new HashMap<TaskAttemptID, MapOutputFile>());
// Create one MapTask runnable per split
List<RunnableWithThrowable> mapRunnables = getMapTaskRunnables(
taskSplitMetaInfos, jobId, mapOutputFiles);
initCounters(mapRunnables.size(), numReduceTasks);
ExecutorService mapService = createMapExecutor();
// Run the MapTasks
runTasks(mapRunnables, mapService, "map");
// If there is a reduce stage, run the ReduceTasks
try {
if (numReduceTasks > 0) {
List<RunnableWithThrowable> reduceRunnables = getReduceTaskRunnables(
jobId, mapOutputFiles);
ExecutorService reduceService = createReduceExecutor();
//Run the ReduceTasks
runTasks(reduceRunnables, reduceService, "reduce");
}
} finally {
for (MapOutputFile output : mapOutputFiles.values()) {
output.removeAll();
}
}
}
(2) runTasks
line 439-441: iterate over the Map runnables created above and submit them one by one (submission is sequential, but the threads execute in parallel).
The runnables here are instances of MapTaskRunnable, an inner class of LocalJobRunner.Job, so after submit() each MapTaskRunnable's run() method executes.
Let's look at that run() method.
(3) MapTaskRunnable.run
LocalJobRunner.Job.MapTaskRunnable is roughly the local equivalent of a Container: the MapTask runs inside it.
public void run() {
try {
TaskAttemptID mapId = new TaskAttemptID(new TaskID(
jobId, TaskType.MAP, taskId), 0);
LOG.info("Starting task: " + mapId);
mapIds.add(mapId);
//Create a MapTask object; it represents this MapTask and drives the overall execution of the Task
MapTask map = new MapTask(systemJobFile.toString(), mapId, taskId,
info.getSplitIndex(), 1);
......
//Run the MapTask
map.run(localConf, Job.this);
}
(4) MapTask.run()
@Override
public void run(final JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, ClassNotFoundException, InterruptedException {
this.umbilical = umbilical;
//Define the phase breakdown of the whole MapTask
if (isMapTask()) {
// If there are no reducers then there won't be any sort. Hence the map
// phase will govern the entire attempt's progress.
if (conf.getNumReduceTasks() == 0) {
mapPhase = getProgress().addPhase("map", 1.0f);
} else {
// If there are reducers then the entire attempt's progress will be
// split between the map phase (67%) and the sort phase (33%).
mapPhase = getProgress().addPhase("map", 0.667f);
sortPhase = getProgress().addPhase("sort", 0.333f);
}
}
.......
if (useNewApi) {
// Run the Mapper (new API)
runNewMapper(job, splitMetaInfo, umbilical, reporter);
} else {
runOldMapper(job, splitMetaInfo, umbilical, reporter);
}
done(umbilical, reporter);
}
Note: if there is a reduce stage, the Map phase is divided into two sub-phases:
map ------ 67%
sort ------ 33%
If there is no reduce stage, the Map phase consists of map only.
The data is sorted only when a reduce stage exists; without one, the data is processed in the order it is read and written straight out (a map-only sketch follows).
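To see the map-only case in practice, the WordCount driver can be turned into a map-only job as sketched below (assuming the WCMapper class from section 3.1 in the same package; the output path is a placeholder). With zero reduce tasks there is no sort phase and the output files are named part-m-XXXXX.
package andy.mywc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyWCDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(MapOnlyWCDriver.class);
        job.setMapperClass(WCMapper.class);
        job.setNumReduceTasks(0);                  // no reduce stage -> no sort, direct output
        job.setOutputKeyClass(Text.class);         // the map output types are now the final output types
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("E:\\temp\\input"));
        FileOutputFormat.setOutputPath(job, new Path("E:\\temp\\output-maponly")); // placeholder path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}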
(5) runNewMapper
private <INKEY,INVALUE,OUTKEY,OUTVALUE>
void runNewMapper(final JobConf job,
final TaskSplitIndex splitIndex,
final TaskUmbilicalProtocol umbilical,
TaskReporter reporter
) throws IOException, ClassNotFoundException,
InterruptedException {
// make a task context so we can get the classes
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
getTaskID(),
reporter);
// make a mapper: one MapTask creates exactly one Mapper instance
org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE> mapper =
(org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>)
ReflectionUtils.newInstance(taskContext.getMapperClass(), job);
// make the input format: create the InputFormat object
org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE> inputFormat =
(org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE>)
ReflectionUtils.newInstance(taskContext.getInputFormatClass(), job);
// rebuild the input split assigned to this MapTask
org.apache.hadoop.mapreduce.InputSplit split = null;
split = getSplitDetails(new Path(splitIndex.getSplitLocation()),
splitIndex.getStartOffset());
LOG.info("Processing split: " + split);
//Build the MapTask's input object; it drives the whole input side and invokes the RecordReader to read the data
org.apache.hadoop.mapreduce.RecordReader<INKEY,INVALUE> input =
new NewTrackingRecordReader<INKEY,INVALUE>
(split, inputFormat, reporter, taskContext);
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
org.apache.hadoop.mapreduce.RecordWriter output = null;
//Build the MapTask's output object
// get an output object
if (job.getNumReduceTasks() == 0) {
//If there is no reduce stage, the map output is collected and written out directly
output =
new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
} else {
// Otherwise create the sorting record collector
output = new NewOutputCollector(taskContext, job, umbilical, reporter);
}
org.apache.hadoop.mapreduce.MapContext<INKEY, INVALUE, OUTKEY, OUTVALUE>
mapContext =
new MapContextImpl<INKEY, INVALUE, OUTKEY, OUTVALUE>(job, getTaskID(),
input, output,
committer,
reporter, split);
//Build the Context object used inside the Mapper; it represents the MapTask's context (where its data comes from and where it goes)
org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>.Context
mapperContext =
new WrappedMapper<INKEY, INVALUE, OUTKEY, OUTVALUE>().getMapContext(
mapContext);
try {
// Run the initialization of all the components the input side needs,
// i.e. call RecordReader.initialize()
input.initialize(split, mapperContext);
// Call the run() of our Mapper
mapper.run(mapperContext);
mapPhase.complete();
setPhase(TaskStatus.Phase.SORT);
statusUpdate(umbilical);
input.close();
input = null;
output.close(mapperContext);
output = null;
} finally {
closeQuietly(input);
closeQuietly(output, mapperContext);
}
}
This method is fairly involved, so let's go through it piece by piece.
1) InputFormat
org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE> inputFormat =
(org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE>)
ReflectionUtils.newInstance(taskContext.getInputFormatClass(), job);
public Class<? extends InputFormat<?,?>> getInputFormatClass()
throws ClassNotFoundException {
return (Class<? extends InputFormat<?,?>>)
conf.getClass(INPUT_FORMAT_CLASS_ATTR, TextInputFormat.class);
}
Here the InputFormat is obtained: if one was configured, the configured class is used; otherwise TextInputFormat.class is the default.
org.apache.hadoop.mapreduce.RecordReader<INKEY,INVALUE> input =
new NewTrackingRecordReader<INKEY,INVALUE>
(split, inputFormat, reporter, taskContext);
NewTrackingRecordReader
NewTrackingRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.InputFormat<K, V> inputFormat,
TaskReporter reporter,
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
throws InterruptedException, IOException {
this.reporter = reporter;
this.inputRecordCounter = reporter
.getCounter(TaskCounter.MAP_INPUT_RECORDS);
this.fileInputByteCounter = reporter
.getCounter(FileInputFormatCounter.BYTES_READ);
List <Statistics> matchedStats = null;
if (split instanceof org.apache.hadoop.mapreduce.lib.input.FileSplit) {
matchedStats = getFsStatistics(((org.apache.hadoop.mapreduce.lib.input.FileSplit) split)
.getPath(), taskContext.getConfiguration());
}
fsStats = matchedStats;
long bytesInPrev = getInputBytes(fsStats);
this.real = inputFormat.createRecordReader(split, taskContext);
long bytesInCurr = getInputBytes(fsStats);
fileInputByteCounter.increment(bytesInCurr - bytesInPrev);
}
The key line is:
this.real = inputFormat.createRecordReader(split, taskContext);
public RecordReader<LongWritable, Text>
createRecordReader(InputSplit split,
TaskAttemptContext context) {
String delimiter = context.getConfiguration().get(
"textinputformat.record.delimiter");
byte[] recordDelimiterBytes = null;
if (null != delimiter)
recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
return new LineRecordReader(recordDelimiterBytes);
}
By default, TextInputFormat uses LineRecordReader.
This is exactly why custom input formats usually override createRecordReader().
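Short of writing a full custom InputFormat, the record boundary itself can already be changed through the textinputformat.record.delimiter property read in the createRecordReader() code quoted above. A minimal sketch; the delimiter choice is just an example.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RecordDelimiterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use a blank line as the record separator, so each value handed to map()
        // is a whole paragraph rather than a single line.
        conf.set("textinputformat.record.delimiter", "\n\n");
        Job job = Job.getInstance(conf, "paragraph-records");
        // ... the rest of the driver setup stays the same as in the WordCount example
    }
}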
2) Output of the map stage
//Build the MapTask's output object
// get an output object
if (job.getNumReduceTasks() == 0) {
//If there is no reduce stage, the map output is collected and written out directly
output =
new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
} else {
// Otherwise create the sorting record collector
output = new NewOutputCollector(taskContext, job, umbilical, reporter);
}
NewOutputCollector(org.apache.hadoop.mapreduce.JobContext jobContext,
JobConf job,
TaskUmbilicalProtocol umbilical,
TaskReporter reporter
) throws IOException, ClassNotFoundException {
//Create the output buffer; it not only collects records but also sorts them
collector = createSortingCollector(job, reporter);
// The total number of map-side partitions equals the number of ReduceTasks (not necessarily the number of partitions that actually receive data)
partitions = jobContext.getNumReduceTasks();
// If there is more than one ReduceTask, use the user-configured Partitioner
if (partitions > 1) {
partitioner = (org.apache.hadoop.mapreduce.Partitioner<K,V>)
ReflectionUtils.newInstance(jobContext.getPartitionerClass(), job);
} else {
partitioner = new org.apache.hadoop.mapreduce.Partitioner<K,V>() {
@Override
public int getPartition(K key, V value, int numPartitions) {
return partitions - 1;
}
};
a) Obtaining the Partitioner
The user-defined Partitioner is read from the configuration; if none is set, HashPartitioner.class is used.
public Class<? extends Partitioner<?,?>> getPartitionerClass()
throws ClassNotFoundException {
//Read mapreduce.job.partitioner.class; if it is not set, HashPartitioner is the default
return (Class<? extends Partitioner<?,?>>)
conf.getClass(PARTITIONER_CLASS_ATTR, HashPartitioner.class);
}
How HashPartitioner works: records with the same key always land in the same partition
public class HashPartitioner<K, V> extends Partitioner<K, V> {
/** Use {@link Object#hashCode()} to partition. */
public int getPartition(K key, V value,
int numReduceTasks) {
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
}
So when writing a custom Partitioner:
extend Partitioner and implement public int getPartition(KEY key, VALUE value, int numPartitions) (a sketch follows below).
Note: the partition number must be an int and must satisfy 0 <= partitionNum < numPartitions.
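For illustration, here is a sketch of a custom Partitioner for the WordCount key-value types; the routing rule (first letter of the word) is made up for the example and is not part of the walkthrough.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Words starting with a-m go to partition 0, everything else to partition 1.
// The returned number must always lie in [0, numPartitions).
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString();
        int partition = (!word.isEmpty() && Character.toLowerCase(word.charAt(0)) <= 'm') ? 0 : 1;
        return partition % numPartitions; // stay legal even if fewer reducers are configured
    }
}
It is wired in from the driver with job.setPartitionerClass(FirstLetterPartitioner.class) and job.setNumReduceTasks(2), since the number of partitions equals the number of ReduceTasks.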
b) Building the buffer
collector = createSortingCollector(job, reporter);
collector.init(context);
public void init(MapOutputCollector.Context context
) throws IOException, ClassNotFoundException {
partitions = job.getNumReduceTasks();
rfs = ((LocalFileSystem)FileSystem.getLocal(job)).getRaw();
//sanity checks
// Read the spill threshold from the configuration (mapreduce.map.sort.spill.percent);
// if it is not configured, 0.8 is used
final float spillper =
job.getFloat(JobContext.MAP_SORT_SPILL_PERCENT, (float)0.8);
// The buffer's initial size: mapreduce.task.io.sort.mb, defaulting to 100 (MB)
final int sortmb = job.getInt(JobContext.IO_SORT_MB, 100);
indexCacheMemoryLimit = job.getInt(JobContext.INDEX_CACHE_MEMORY_LIMIT,
INDEX_CACHE_MEMORY_LIMIT_DEFAULT);
// QuickSort is used by default
sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class",
QuickSort.class, IndexedSorter.class), job);
// k/v serialization
// Determine the key comparator
comparator = job.getOutputKeyComparator();
// Get the key/value types output by the Mapper
keyClass = (Class<K>)job.getMapOutputKeyClass();
valClass = (Class<V>)job.getMapOutputValueClass();
serializationFactory = new SerializationFactory(job);
//Return the serializer matching the key type
keySerializer = serializationFactory.getSerializer(keyClass);
keySerializer.open(bb);
valSerializer = serializationFactory.getSerializer(valClass);
valSerializer.open(bb);
// compression: compress the mapper output if configured to do so
if (job.getCompressMapOutput()) {
Class<? extends CompressionCodec> codecClass =
job.getMapOutputCompressorClass(DefaultCodec.class);
codec = ReflectionUtils.newInstance(codecClass, job);
} else {
codec = null;
}
// combiner: set up the combiner if one is configured
final Counters.Counter combineInputCounter =
reporter.getCounter(TaskCounter.COMBINE_INPUT_RECORDS);
combinerRunner = CombinerRunner.create(job, getTaskID(),
combineInputCounter,
reporter, null);
if (combinerRunner != null) {
final Counters.Counter combineOutputCounter =
reporter.getCounter(TaskCounter.COMBINE_OUTPUT_RECORDS);
combineCollector= new CombineOutputCollector<K,V>(combineOutputCounter, reporter, job);
} else {
combineCollector = null;
}
}
c) Determining the key comparator
public RawComparator getOutputKeyComparator() {
//Try to read mapreduce.job.output.key.comparator.class from the configuration as the comparator;
//if it is not defined the result is null; when defined, it must be a RawComparator
Class<? extends RawComparator> theClass = getClass(
JobContext.KEY_COMPARATOR, null, RawComparator.class);
//If the user configured one, instantiate an object of that type
if (theClass != null)
return ReflectionUtils.newInstance(theClass, this);
// Otherwise, check whether the Mapper's output key is a subclass of WritableComparable;
//if it is, the framework supplies a comparator; if not, an exception is thrown
return WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class), this);
}
How to control the sorting of keys in the Map stage:
① put the field(s) to compare into the key;
② provide a RawComparator for the key, or have the key implement WritableComparable.
The keys are then compared with the comparator's compare() (or the key's compareTo()).
When the comparison involves multiple fields, this is called secondary sort.
d) Serialization
① When is serialization needed?
When there is a reduce stage, the key-value pairs output by Map must be serializable.
② How is it done?
By implementing the Writable interface.
③ Is implementing Writable mandatory?
No.
④ When can Writable be skipped, and why implement it at all?
Hadoop automatically provides a serializer only for types that implement Writable; if you supply your own serializer, the type does not need to implement Writable.
A sketch of a key type that is both sortable and serializable follows.
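As a sketch covering both points at once (sorting and serialization), here is a hypothetical composite key that implements WritableComparable; the fields and the sort order are invented for illustration and are not part of the WordCount example.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A key that is serializable (write/readFields) and sortable (compareTo).
// Comparing on two fields like this is what the text calls secondary sort.
public class WordCountKey implements WritableComparable<WordCountKey> {
    private String word;
    private int count;

    public WordCountKey() { }          // Hadoop requires a no-arg constructor for deserialization

    public void set(String word, int count) {
        this.word = word;
        this.count = count;
    }

    public String getWord() {
        return word;
    }

    @Override
    public void write(DataOutput out) throws IOException {    // serialization
        out.writeUTF(word);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization, same field order
        word = in.readUTF();
        count = in.readInt();
    }

    @Override
    public int compareTo(WordCountKey other) {  // sort by count descending, then word ascending
        int byCount = Integer.compare(other.count, this.count);
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    @Override
    public String toString() {
        return word + "\t" + count;
    }
}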
3) mapper.run(mapperContext)
This invokes run() on our Mapper instance.
public void run(Context context) throws IOException, InterruptedException {
//Called exactly once, before any map() call
setup(context);
try {
//Call the RecordReader's nextKeyValue()
while (context.nextKeyValue()) {
//For every KEYIN-VALUEIN pair read, call map() once
map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
//Called exactly once, after all map() calls
cleanup(context);
}
}
This loop calls the RecordReader's nextKeyValue(), getCurrentKey() and getCurrentValue(), and hands each pair to our map() method.
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// 1. Get the line as a String (convert Text to String)
String line = value.toString();
// 2. Split the line into an array of words
String[] words = line.split(" ");
// 3. Iterate over the words
for (String word : words) {
// 4. Set the output key to the word; the value is 1
k.set(word);
// 5. Emit the pair
context.write(k,v);
}
}
public void write(KEYOUT key, VALUEOUT value) throws IOException,
InterruptedException {
mapContext.write(key, value);
}
public void write(K key, V value) throws IOException, InterruptedException {
collector.collect(key, value,
partitioner.getPartition(key, value, partitions));
}
Finally we arrive at collector.collect():
public synchronized void collect(K key, V value, final int partition
) throws IOException {
reporter.progress();
//Check that the key and value classes match the configured output types
if (key.getClass() != keyClass) {
throw new IOException("Type mismatch in key from map: expected "
+ keyClass.getName() + ", received "
+ key.getClass().getName());
}
if (value.getClass() != valClass) {
throw new IOException("Type mismatch in value from map: expected "
+ valClass.getName() + ", received "
+ value.getClass().getName());
}
//Check that the partition number is legal
if (partition < 0 || partition >= partitions) {
throw new IOException("Illegal partition for " + key + " (" +
partition + ")");
}
//Check whether a previous spill failed
checkSpillException();
//Account for the buffer space this record's metadata will need
bufferRemaining -= METASIZE;
if (bufferRemaining <= 0) {
// start spill if the thread is not running and the soft limit has been
// reached
spillLock.lock();
try {
do {
if (!spillInProgress) {
final int kvbidx = 4 * kvindex;
final int kvbend = 4 * kvend;
// serialized, unspilled bytes always lie between kvindex and
// bufindex, crossing the equator. Note that any void space
// created by a reset must be included in "used" bytes
final int bUsed = distanceTo(kvbidx, bufindex);
final boolean bufsoftlimit = bUsed >= softLimit;
if ((kvbend + METASIZE) % kvbuffer.length !=
equator - (equator % METASIZE)) {
// spill finished, reclaim space
resetSpill();
bufferRemaining = Math.min(
distanceTo(bufindex, kvbidx) - 2 * METASIZE,
softLimit - bUsed) - METASIZE;
continue;
} else if (bufsoftlimit && kvindex != kvend) {
// spill records, if any collected; check latter, as it may
// be possible for metadata alignment to hit spill pcnt
//Start spilling
startSpill();
final int avgRec = (int)
(mapOutputByteCounter.getCounter() /
mapOutputRecordCounter.getCounter());
// leave at least half the split buffer for serialization data
// ensure that kvindex >= bufindex
final int distkvi = distanceTo(bufindex, kvbidx);
final int newPos = (bufindex +
Math.max(2 * METASIZE - 1,
Math.min(distkvi / 2,
distkvi / (METASIZE + avgRec) * METASIZE)))
% kvbuffer.length;
setEquator(newPos);
bufmark = bufindex = newPos;
final int serBound = 4 * kvend;
// bytes remaining before the lock must be held and limits
// checked is the minimum of three arcs: the metadata space, the
// serialization space, and the soft limit
bufferRemaining = Math.min(
// metadata max
distanceTo(bufend, newPos),
Math.min(
// serialization max
distanceTo(newPos, serBound),
// soft limit
softLimit)) - 2 * METASIZE;
}
}
} while (false);
} finally {
spillLock.unlock();
}
}
try {
// serialize key bytes into buffer
int keystart = bufindex;
keySerializer.serialize(key);
if (bufindex < keystart) {
// wrapped the key; must make contiguous
bb.shiftBufferedKey();
keystart = 0;
}
// serialize value bytes into buffer
final int valstart = bufindex;
valSerializer.serialize(value);
// It's possible for records to have zero length, i.e. the serializer
// will perform no writes. To ensure that the boundary conditions are
// checked and that the kvindex invariant is maintained, perform a
// zero-length write into the buffer. The logic monitoring this could be
// moved into collect, but this is cleaner and inexpensive. For now, it
// is acceptable.
bb.write(b0, 0, 0);
// the record must be marked after the preceding write, as the metadata
// for this record are not yet written
int valend = bb.markRecord();
mapOutputRecordCounter.increment(1);
mapOutputByteCounter.increment(
distanceTo(keystart, valend, bufvoid));
// write accounting info
kvmeta.put(kvindex + PARTITION, partition);
kvmeta.put(kvindex + KEYSTART, keystart);
kvmeta.put(kvindex + VALSTART, valstart);
kvmeta.put(kvindex + VALLEN, distanceTo(valstart, valend));
// advance kvindex
kvindex = (kvindex - NMETA + kvmeta.capacity()) % kvmeta.capacity();
} catch (MapBufferTooSmallException e) {
LOG.info("Record too large for in-memory buffer: " + e.getMessage());
spillSingleRecord(key, value, partition);
mapOutputRecordCounter.increment(1);
return;
}
}
After the data has been written into the circular buffer, control returns to runNewMapper():
setPhase(TaskStatus.Phase.SORT);
statusUpdate(umbilical);
input.close();
input = null;
//Inside this close() the buffer's data is sorted and spilled
output.close(mapperContext);
output = null;
} finally {
closeQuietly(input);
closeQuietly(output, mapperContext);
}
4) collector.flush()
Execution eventually reaches sortAndSpill(); this is where the real sorting happens, and it is also where the combiner runs.
private void sortAndSpill() throws IOException, ClassNotFoundException,
InterruptedException {
//approximate the length of the output file to be the length of the
//buffer + header lengths for the partitions
try {
// create spill file
//Sort the collected records
sorter.sort(MapOutputBuffer.this, mstart, mend, reporter);
int spindex = mstart;
final IndexRecord rec = new IndexRecord();
final InMemValBytes value = new InMemValBytes();
for (int i = 0; i < partitions; ++i) {
IFile.Writer<K, V> writer = null;
try {
long segmentStart = out.getPos();
FSDataOutputStream partitionOut = CryptoUtils.wrapIfNecessary(job, out);
writer = new Writer<K, V>(job, partitionOut, keyClass, valClass, codec,
spilledRecordsCounter);
//Check whether a combiner is defined
if (combinerRunner == null) {
// spill directly
DataInputBuffer key = new DataInputBuffer();
while (spindex < mend &&
kvmeta.get(offsetFor(spindex % maxRec) + PARTITION) == i) {
final int kvoff = offsetFor(spindex % maxRec);
int keystart = kvmeta.get(kvoff + KEYSTART);
int valstart = kvmeta.get(kvoff + VALSTART);
key.reset(kvbuffer, keystart, valstart - keystart);
getVBytesForOffset(kvoff, value);
writer.append(key, value);
++spindex;
}
} else {
int spstart = spindex;
while (spindex < mend &&
kvmeta.get(offsetFor(spindex % maxRec)
+ PARTITION) == i) {
++spindex;
}
// Note: we would like to avoid the combiner if we've fewer
// than some threshold of records for a partition
//Run the combiner
if (spstart != spindex) {
combineCollector.setWriter(writer);
RawKeyValueIterator kvIter =
new MRResultIterator(spstart, spindex);
combinerRunner.combine(kvIter, combineCollector);
}
}
// close the writer
writer.close();
// record offsets
rec.startOffset = segmentStart;
rec.rawLength = writer.getRawLength() + CryptoUtils.cryptoPadding(job);
rec.partLength = writer.getCompressedLength() + CryptoUtils.cryptoPadding(job);
spillRec.putIndex(rec, i);
writer = null;
} finally {
if (null != writer) writer.close();
}
}
if (totalIndexCacheMemory >= indexCacheMemoryLimit) {
// create spill index file
Path indexFilename =
mapOutputFile.getSpillIndexFileForWrite(numSpills, partitions
* MAP_OUTPUT_INDEX_RECORD_LENGTH);
spillRec.writeToFile(indexFilename, job);
} else {
indexCacheList.add(spillRec);
totalIndexCacheMemory +=
spillRec.size() * MAP_OUTPUT_INDEX_RECORD_LENGTH;
}
LOG.info("Finished spill " + numSpills);
++numSpills;
} finally {
if (out != null) out.close();
}
}
5) This part is a bit tangled, so here is a summary (a configuration sketch follows the list):
① Partitioning happens first: for every output key-value pair, the Partitioner computes the partition number before the pair is collected into the buffer.
② Records accumulate in the buffer; when the spill condition is met, the Sorter sorts all the data currently in the buffer.
Only the index is sorted (the sorted order is recorded in the metadata), not the raw record bytes.
③ Spilling proceeds partition by partition, starting from partition 0. Before each spill, if a Combiner is configured, the data is combined first and then spilled.
Each spill produces a spillN.out file.
④ Once all records have been collected into the buffer, a final flush() spills whatever remains, even if it does not meet the spill condition.
⑤ After flush(), mergeParts() is called: for each partition, the pieces from the different spill files are merged and re-sorted, and all partitions are then written out as a single final.out file.
Before that final write, if a Combiner is configured and the number of previous spill files is >= 3, the Combiner is invoked once more.
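A sketch of how the knobs mentioned in this summary are set from the driver; the property names are the ones that appear in the init() code quoted earlier (mapreduce.task.io.sort.mb, mapreduce.map.sort.spill.percent), the values are examples only, and the WCReducer class from section 3.1 is assumed to be in the same package.
package andy.mywc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapSideTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // ② size of the circular buffer (default 100 MB)
        conf.setInt("mapreduce.task.io.sort.mb", 200);
        // ② spill threshold (default 0.8)
        conf.set("mapreduce.map.sort.spill.percent", "0.9");

        Job job = Job.getInstance(conf, "map-side-tuning");
        // ③/⑤ the Combiner runs before each spill, and again during the final merge when
        // enough spill files exist; for WordCount the Reducer can double as the Combiner.
        job.setCombinerClass(WCReducer.class);
        // ... the rest of the WordCount driver setup stays unchanged
    }
}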
3.4 Reduce Phase
Once the map phase has finished, the Reduce phase begins.
We return to the run() method of LocalJobRunner.Job.
public void run() {
Map<TaskAttemptID, MapOutputFile> mapOutputFiles =
Collections.synchronizedMap(new HashMap<TaskAttemptID, MapOutputFile>());
List<RunnableWithThrowable> mapRunnables = getMapTaskRunnables(
taskSplitMetaInfos, jobId, mapOutputFiles);
initCounters(mapRunnables.size(), numReduceTasks);
ExecutorService mapService = createMapExecutor();
runTasks(mapRunnables, mapService, "map"); //这里面执行完了map之后
//下面这里是执行reduce阶段的
try {
if (numReduceTasks > 0) {
List<RunnableWithThrowable> reduceRunnables = getReduceTaskRunnables(
jobId, mapOutputFiles);
ExecutorService reduceService = createReduceExecutor();
//the ReduceTasks start here
runTasks(reduceRunnables, reduceService, "reduce");
}
}
(1) runTasks
private void runTasks(List<RunnableWithThrowable> runnables,
ExecutorService service, String taskType) throws Exception {
// Start populating the executor with work units.
// They may begin running immediately (in other threads).
for (Runnable r : runnables) {
service.submit(r);
}
try {
service.shutdown(); // Instructs queue to drain.
// Wait for tasks to finish; do not use a time-based timeout.
// (See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6179024)
LOG.info("Waiting for " + taskType + " tasks");
service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException ie) {
// Cancel all threads.
service.shutdownNow();
throw ie;
}
(2) service.submit(r)
This again lands in the run() method of the ReduceTaskRunnable class:
public void run() {
try {
TaskAttemptID reduceId = new TaskAttemptID(new TaskID(
jobId, TaskType.REDUCE, taskId), 0);
//Create the ReduceTask object
ReduceTask reduce = new ReduceTask(systemJobFile.toString(),
reduceId, taskId, mapIds.size(), 1);
......
try {
// Run reduceTask.run()
reduce.run(localConf, Job.this);
}
(3) ReduceTask.run()
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, InterruptedException, ClassNotFoundException {
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
//Define the phase breakdown
if (isMapOrReduce()) {
copyPhase = getProgress().addPhase("copy");
sortPhase = getProgress().addPhase("sort");
reducePhase = getProgress().addPhase("reduce");
}
.....
// Initialize the codec: if the MapTask output was compressed, obtain the codec here so it can be decompressed
codec = initCodec();
RawKeyValueIterator rIter = null;
//The shuffle consumer plugin copies this reducer's partition from every MapTask's output over to the ReduceTask
ShuffleConsumerPlugin shuffleConsumerPlugin = null;
// The combiner: when the ReduceTask merges the same partition from several MapTasks and runs short of memory,
//it spills to disk, and before every such spill the combiner is invoked again
Class combinerClass = conf.getCombinerClass();
CombineOutputCollector combineCollector =
(null != combinerClass) ?
new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;
//Initialize the shuffle plugin and call its run()
rIter = shuffleConsumerPlugin.run();
// free up the data structures: the copy part of shuffle has finished
mapOutputFilesOnDisk.clear();
// The sort has also completed during shuffle: all data has been merge-sorted according to the Mapper output order
sortPhase.complete(); // sort is complete
setPhase(TaskStatus.Phase.REDUCE);
statusUpdate(umbilical);
//Get the key-value types output by the Mapper
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
//Determine the grouping comparator
RawComparator comparator = job.getOutputValueGroupingComparator();
if (useNewApi) {
//Run the Reducer (new API)
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
} else {
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
shuffleConsumerPlugin.close();
done(umbilical, reporter);
}
(4) Obtaining the grouping comparator
public RawComparator getOutputValueGroupingComparator() {
//First try the user-defined comparator: read mapreduce.job.output.group.comparator.class
//from the configuration; it must be a RawComparator
Class<? extends RawComparator> theClass = getClass(
JobContext.GROUP_COMPARATOR_CLASS, null, RawComparator.class);
//If the user did not define one, fall back to the key comparator used in the Map stage
if (theClass == null) {
return getOutputKeyComparator();
}
//If the user defined one, use it
return ReflectionUtils.newInstance(theClass, this);
}
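As a sketch of what supplying one looks like: a grouping comparator is usually a WritableComparator subclass registered with job.setGroupingComparatorClass(). The key type WordCountKey is the hypothetical one sketched in the Map-phase section, and the grouping rule (group by word only) is illustrative.
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Group records by the word field only, so all counts for the same word enter a
// single reduce() call even though the key comparator also looks at the count.
public class WordGroupingComparator extends WritableComparator {

    protected WordGroupingComparator() {
        super(WordCountKey.class, true);   // true -> create key instances for comparison
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        WordCountKey k1 = (WordCountKey) a;
        WordCountKey k2 = (WordCountKey) b;
        return k1.getWord().compareTo(k2.getWord());
    }
}
In the driver: job.setGroupingComparatorClass(WordGroupingComparator.class);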
(5) runNewReducer
private <INKEY,INVALUE,OUTKEY,OUTVALUE>
void runNewReducer(JobConf job,
final TaskUmbilicalProtocol umbilical,
final TaskReporter reporter,
RawKeyValueIterator rIter,
RawComparator<INKEY> comparator,
Class<INKEY> keyClass,
Class<INVALUE> valueClass
) throws IOException,InterruptedException,
ClassNotFoundException {
// wrap value iterator to report progress.
//Wrap the key-value iterator so it can report progress
final RawKeyValueIterator rawIter = rIter;
rIter = new RawKeyValueIterator() {
public void close() throws IOException {
rawIter.close();
}
//Each iteration does not turn the data directly into new key-value objects;
//it fetches the byte[] content of the current key-value and deserializes it
//into reused key-value instances
public DataInputBuffer getKey() throws IOException {
return rawIter.getKey();
}
public Progress getProgress() {
return rawIter.getProgress();
}
public DataInputBuffer getValue() throws IOException {
return rawIter.getValue();
}
public boolean next() throws IOException {
boolean ret = rawIter.next();
reporter.setProgress(rawIter.getProgress().getProgress());
return ret;
}
};
// Create the task context
org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
getTaskID(), reporter);
// Instantiate the reducer; one ReduceTask creates exactly one Reducer instance
org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
(org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
new NewTrackingRecordWriter<OUTKEY, OUTVALUE>(this, taskContext);
job.setBoolean("mapred.skip.on", isSkipping());
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
//The context used inside reduce()
org.apache.hadoop.mapreduce.Reducer.Context
reducerContext = createReduceContext(reducer, job, getTaskID(),
rIter, reduceInputKeyCounter,
reduceInputValueCounter,
trackedRW,
committer,
reporter, comparator, keyClass,
valueClass);
try {
//Run reducer.run()
reducer.run(reducerContext);
} finally {
trackedRW.close(reducerContext);
}
}
(6) Reducer.run()
public void run(Context context) throws IOException, InterruptedException {
//setup() is called once, before any reduce() call
setup(context);
try {
//Check whether there is another distinct key to read; all key-value pairs sharing that key enter one reduce() call
while (context.nextKey()) {
reduce(context.getCurrentKey(), context.getValues(), context);
// If a back up store is used, reset it
Iterator<VALUEIN> iter = context.getValues().iterator();
if(iter instanceof ReduceContext.ValueIterator) {
((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
}
}
} finally {
//cleanup() is called once, after all reduce() calls
cleanup(context);
}
}
Inside this loop our reduce() method is invoked, and the resulting key-value pairs are written out.
(7) reduce
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
// 1. Iterate over the values and sum them
for (IntWritable value : values) {
sum += value.get();
}
IntWritable v = new IntWritable();
v.set(sum);
// 2. Write out the result
context.write(key, v);
}
And with that, the reduce phase is complete.
This article has walked through how a MapReduce Job is submitted and executed: the macro-level flow, the two run modes, and the source code. At the source level it covered the pre-submission stage of the Job using the WordCount example (state checks, API selection, connecting to the cluster, writing splits and configuration), and then the execution details of the Map and Reduce phases: how a MapTask runs, how data is partitioned, sorted and serialized, and how a ReduceTask executes.