Hadoop MapReduce Internals: The Programming Model

This article walks through the Hadoop MapReduce programming model, covering the MR interface architecture, the serialization mechanism, the Reporter parameter, and the callback mechanism. It then examines job configuration and submission, InputFormat's getSplits() and createRecordReader() methods, the OutputFormat implementation, and the Mapper, Reducer, and Partitioner implementations (HashPartitioner, BinaryPartitioner, and TotalOrderPartitioner).

I. Programming Model Overview

1. The MR Interface Architecture

  1. As shown in the figure below, the MR interface architecture consists of two layers: a tools layer and a programming-interface layer.
  2. The interface layer consists of five core components: InputFormat, Mapper, Partitioner, Reducer, and OutputFormat.
  3. The tools layer was introduced to make writing complex MR programs easier and to improve MR compatibility with other programming languages.

[Figure: the two-layer MR interface architecture]

2. Serialization

Serialization converts structured data into a byte stream so that it can be transmitted over the network or written to persistent storage;
deserialization converts a byte stream or a persisted file back into structured data.
In Hadoop, serialization serves two purposes: 1. permanent storage and 2. inter-process communication.

Hadoop uses its own Writable mechanism for serialization rather than Java serialization, because Java serialization is too heavyweight: it embeds a lot of extra structural metadata, which makes it a poor fit for transmitting and storing data at big-data scale. Hadoop therefore serializes through the Writable interface.
Hadoop ships with many built-in Writable implementations, and users can define their own serializable type by implementing the Writable interface and overriding its write() and readFields() methods, as sketched below the figure.
[Figure: the Writable interface and Hadoop's built-in Writable implementations]
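A minimal sketch of a custom Writable (the PageViewWritable type and its fields are hypothetical, for illustration only):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom Writable holding a page-view record (url, count).
public class PageViewWritable implements Writable {
  private String url;
  private long count;

  // A no-arg constructor is required so the framework can instantiate it via reflection.
  public PageViewWritable() {}

  public PageViewWritable(String url, long count) {
    this.url = url;
    this.count = count;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize the fields in a fixed order.
    out.writeUTF(url);
    out.writeLong(count);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Deserialize the fields in the same order they were written.
    url = in.readUTF();
    count = in.readLong();
  }
}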

3. The Reporter Parameter

Reporter is a utility that MR exposes to applications; it wraps methods for reporting progress (progress), setting a status message (setStatus), and updating counters (incrCounter).
[Figure: the Reporter interface]
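A minimal sketch of how a map() implementation in the old org.apache.hadoop.mapred API might use Reporter (the counter group and name are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old-API mapper that reports progress, a status message, and a user-defined counter via Reporter.
public class ReportingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    reporter.setStatus("processing offset " + key);    // status message shown in the web UI
    reporter.incrCounter("MyApp", "RECORDS_SEEN", 1);   // hypothetical counter group/name
    reporter.progress();                                // tell the framework the task is still alive
    output.collect(value, new IntWritable(1));
  }
}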

4. The Callback Mechanism

The callback mechanism is a common design pattern: developers write their business logic against a prescribed interface, and MR calls it automatically during execution. All five MR components are callback interfaces.
An example of the MR callback mechanism is sketched below.
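The run() loop of the new-API Mapper illustrates the pattern (simplified from the Mapper base class): the framework drives run(), and run() calls back into the user-supplied setup(), map(), and cleanup():

// Simplified from org.apache.hadoop.mapreduce.Mapper#run(): the framework invokes run(),
// which repeatedly calls back into the methods the user overrides.
public void run(Context context) throws IOException, InterruptedException {
  setup(context);                                   // user hook: called once per task
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);  // user callback
    }
  } finally {
    cleanup(context);                               // user hook: called once per task
  }
}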

II. API Walkthrough

1. Job Configuration and Submission

1. Configuration Files

Every Hadoop module has its own configuration files, which hold that module's configurable parameters in XML format. They fall into two categories: system default configuration files and user-defined (site) configuration files. The default files contain default values for every configurable item, while the site files are maintained by administrators and override those defaults.
(Reference: Hadoop configuration files)

2. MR Job Configuration and Submission

Job configuration is handled by the Job class, which supports both configuring and submitting a job, as shown below:
[Figure: the Job class APIs]
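A minimal driver sketch showing configuration and submission through the Job API (WordCountDriver, MyMapper, and MyReducer are hypothetical user classes; MyMapper and MyReducer are sketched in the Mapper and Reducer section below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    // Default resources (*-default.xml) are loaded first and overridden by site files (*-site.xml);
    // values set here on the Configuration override both for this job only.
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.reduces", "2");

    Job job = Job.getInstance(conf, "word-count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(MyMapper.class);              // hypothetical Mapper implementation
    job.setReducerClass(MyReducer.class);            // hypothetical Reducer implementation
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Submit the job and block until it finishes.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}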

2. InputFormat

InputFormat describes the format of the input data and has two responsibilities:

  1. Data splitting: it splits the input into a number of splits according to some policy, which determines the number of Map Tasks and the split each of them processes.
  2. Providing input to the Mapper: given a split, it parses the split into key/value pairs.

[Figure: the InputFormat class]

1. The getSplits() Method

FileInputFormat's implementation is shown below:

 /** 
   * Generate the list of files and make them into FileSplits.
   * @param job the job context
   * @throws IOException
   */
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    StopWatch sw = new StopWatch().start();
    long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
    long maxSize = getMaxSplitSize(job);

    // generate splits
    List<InputSplit> splits = new ArrayList<InputSplit>();
    List<FileStatus> files = listStatus(job);

    boolean ignoreDirs = !getInputDirRecursive(job)
      && job.getConfiguration().getBoolean(INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, false);
    for (FileStatus file: files) {
      if (ignoreDirs && file.isDirectory()) {
        continue;
      }
      // the file's path
      Path path = file.getPath();
      // the file's length
      long length = file.getLen();
      if (length != 0) {
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          FileSystem fs = path.getFileSystem(job.getConfiguration());
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(job, path)) {
          // get the file's block size
          long blockSize = file.getBlockSize();
          // compute the split size (the block size by default)
          long splitSize = computeSplitSize(blockSize, minSize, maxSize);

          long bytesRemaining = length;
          // repeatedly carve off splitSize-sized chunks and add them to splits
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            // note: a split only records offset/length/host metadata, so splitting is logical, not physical
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                       blkLocations[blkIndex].getHosts(),
                       blkLocations[blkIndex].getCachedHosts()));
          }
        } else { // the file is not splittable
          if (LOG.isDebugEnabled()) {
            // Log only if the file is big enough to be splitted
            if (length > Math.min(file.getBlockSize(), minSize)) {
              LOG.debug("File is not splittable so no parallelization "
                  + "is possible: " + file.getPath());
            }
          }
          splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                      blkLocations[0].getCachedHosts()));
        }
      } else { 
        // zero-length file: create an empty split with no host locations
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
    // record the number of input files
    job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
    sw.stop();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total # of splits generated by getSplits: " + splits.size()
          + ", TimeTaken: " + sw.now(TimeUnit.MILLISECONDS));
    }
    // return the list of splits
    return splits;
  }

The makeSplit() method is shown below:
[Figure: the makeSplit() method]
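For reference, a sketch of what makeSplit() does in FileInputFormat (reconstructed from memory; the exact overloads may differ between versions): it simply wraps the split metadata in a FileSplit object.

protected FileSplit makeSplit(Path file, long start, long length,
                              String[] hosts, String[] inMemoryHosts) {
  // A FileSplit is pure metadata: file path, byte offset, length, and preferred hosts.
  return new FileSplit(file, start, length, hosts, inMemoryHosts);
}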

2. The createRecordReader() Method

During Map Task execution, MR repeatedly invokes the RecordReader returned by this method to iterate over key/value pairs and hand them to the map() function.
TextInputFormat's implementation is shown below:

  @Override
  public RecordReader<LongWritable, Text> 
    createRecordReader(InputSplit split,
                       TaskAttemptContext context) {
    String delimiter = context.getConfiguration().get(
        "textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter)
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    return new LineRecordReader(recordDelimiterBytes);
  }
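
As the code above shows, the record delimiter can be overridden through the textinputformat.record.delimiter property. A minimal sketch (the class name and delimiter are illustrative) that treats blank-line-separated blocks as records:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CustomDelimiterExample {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    // Treat blank-line-separated blocks as records instead of single lines.
    conf.set("textinputformat.record.delimiter", "\n\n");
    Job job = Job.getInstance(conf, "custom-delimiter");
    job.setInputFormatClass(TextInputFormat.class);
    return job;
  }
}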

[Figure: the RecordReader class]
The main InputFormat implementations shipped with Hadoop are shown below:
[Figure: built-in InputFormat implementations]

FileInputFormat provides a single getSplits() implementation shared by all file-based InputFormats. Its two core algorithms are the file-splitting algorithm and the host-selection algorithm:

  1. File-splitting algorithm:

The file-splitting algorithm determines the boundaries of each InputSplit, i.e. which byte range of which file it covers. FileInputFormat splits the input on a per-file basis, and the number of InputSplits is determined by the following three parameters:

  1. maxSize: the maximum split size, set via the SPLIT_MAXSIZE property (Long.MAX_VALUE by default):

public static final String SPLIT_MAXSIZE = 
    "mapreduce.input.fileinputformat.split.maxsize";

public static long getMaxSplitSize(JobContext context) {
    return context.getConfiguration().getLong(SPLIT_MAXSIZE, 
                                              Long.MAX_VALUE);
  }

  2. minSize: the minimum split size, 1 by default:

long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));

  3. blockSize: the HDFS block size.

The split size is then computed as:

  protected long computeSplitSize(long blockSize, long minSize,
                                  long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }
  2. Host-selection algorithm:

Files are stored on HDFS and may be spread across many servers, so building the metadata for an InputSplit means determining the file it belongs to, its start offset, its length, and the list of hosts holding its data: <file, start, length, hosts>.
The host-selection algorithm directly determines whether a task can achieve data locality. Hadoop distinguishes three locality levels: node locality, rack locality, and data center locality (the last is not yet implemented). The scheduler considers them in that order: an idle node preferentially processes data stored on itself; if it has no local data, it processes data on the same rack; in the worst case it processes data on another rack (rack awareness).
The algorithm first sorts the racks by how much of the split's data they hold, then sorts the nodes within each rack by how much data they hold, and finally takes the hosts of the first N nodes as the split's host list, where N is the block replication factor (3 by default). When the scheduler later assigns the Task to any node in this host list, the Task is considered data-local.
[Figure: host-selection example]
The host-selection algorithm implies that when an InputSplit is larger than a block, a Map Task cannot be fully data-local: some of the data will always be read from remote nodes. The practical conclusion is that, when using a FileInputFormat-based InputFormat, the InputSplit size should be kept equal to the block size to maximize Map Task data locality.
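A minimal sketch of pinning the split size to the block size (the 128 MB value is an assumption; substitute the actual dfs.blocksize of the input). Since computeSplitSize() returns max(minSize, min(maxSize, blockSize)), keeping minSize <= blockSize <= maxSize yields splitSize == blockSize:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void configure(Job job) {
    long blockSize = 128L * 1024 * 1024;               // assumed HDFS block size (128 MB)
    FileInputFormat.setMinInputSplitSize(job, 1L);     // the default minimum
    FileInputFormat.setMaxInputSplitSize(job, blockSize);
    // splitSize = max(minSize, min(maxSize, blockSize)) = blockSize
  }
}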

3. OutputFormat

OutputFormat describes the format of the output data:
[Figure: the OutputFormat class and its built-in implementations]
The corresponding FileOutputCommitter implementation:


public class FileOutputCommitter extends PathOutputCommitter {
  private static final Logger LOG =
      LoggerFactory.getLogger(FileOutputCommitter.class);

  /** 
   * Name of directory where pending data is placed.  Data that has not been
   * committed yet.
   */
  public static final String PENDING_DIR_NAME = "_temporary";
  /**
   * Temporary directory name 
   *
   * The static variable to be compatible with M/R 1.x
   */
  @Deprecated
  protected static final String TEMP_DIR_NAME = PENDING_DIR_NAME;
  public static final String SUCCEEDED_FILE_NAME = "_SUCCESS";
  public static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER =
      "mapreduce.fileoutputcommitter.marksuccessfuljobs";
  public static final String FILEOUTPUTCOMMITTER_ALGORITHM_VERSION =
      "mapreduce.fileoutputcommitter.algorithm.version";
  public static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT = 2;
  // Skip cleanup _temporary folders under job's output directory
  public static final String FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED =
      "mapreduce.fileoutputcommitter.cleanup.skipped";
  public static final boolean
      FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT = false;

  // Ignore exceptions in cleanup _temporary folder under job's output directory
  public static final String FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED =
      "mapreduce.fileoutputcommitter.cleanup-failures.ignored";
  public static final boolean
      FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT = false;

  // Number of attempts when failure happens in commit job
  public static final String FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS =
      "mapreduce.fileoutputcommitter.failures.attempts";

  // default value to be 1 to keep consistent with previous behavior
  public static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT = 1;

  // Whether tasks should delete their task temporary directories. This is
  // purely an optimization for filesystems without O(1) recursive delete, as
  // commitJob will recursively delete the entire job temporary directory.
  // HDFS has O(1) recursive delete, so this parameter is left false by default.
  // Users of object stores, for example, may want to set this to true. Note:
  // this is only used if mapreduce.fileoutputcommitter.algorithm.version=2
  public static final String FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED =
      "mapreduce.fileoutputcommitter.task.cleanup.enabled";
  public static final boolean
      FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT = false;

  private Path outputPath = null;
  private Path workPath = null;
  private final int algorithmVersion;
  private final boolean skipCleanup;
  private final boolean ignoreCleanupFailures;

  /**
   * Create a file output committer
   * @param outputPath the job's output path, or null if you want the output
   * committer to act as a noop.
   * @param context the task's context
   * @throws IOException
   */
  public FileOutputCommitter(Path outputPath, 
                             TaskAttemptContext context) throws IOException {
    this(outputPath, (JobContext)context);
    if (getOutputPath() != null) {
      workPath = Preconditions.checkNotNull(
          getTaskAttemptPath(context, getOutputPath()),
          "Null task attempt path in %s and output path %s",
          context, outputPath);
    }
  }
  
  /**
   * Create a file output committer
   * @param outputPath the job's output path, or null if you want the output
   * committer to act as a noop.
   * @param context the task's context
   * @throws IOException
   */
  @Private
  public FileOutputCommitter(Path outputPath, 
                             JobContext context) throws IOException {
    super(outputPath, context);
    Configuration conf = context.getConfiguration();
    algorithmVersion =
        conf.getInt(FILEOUTPUTCOMMITTER_ALGORITHM_VERSION,
                    FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT);
    LOG.info("File Output Committer Algorithm version is " + algorithmVersion);
    if (algorithmVersion != 1 && algorithmVersion != 2) {
      throw new IOException("Only 1 or 2 algorithm version is supported");
    }

    // if skip cleanup
    skipCleanup = conf.getBoolean(
        FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED,
        FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT);

    // if ignore failures in cleanup
    ignoreCleanupFailures = conf.getBoolean(
        FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED,
        FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT);

    LOG.info("FileOutputCommitter skip cleanup _temporary folders under " +
        "output directory:" + skipCleanup + ", ignore cleanup failures: " +
        ignoreCleanupFailures);

    if (outputPath != null) {
      FileSystem fs = outputPath.getFileSystem(context.getConfiguration());
      this.outputPath = fs.makeQualified(outputPath);
    }
  }
  /**
   * Create the temporary directory that is the root of all of the task 
   * work directories.
   * @param context the job's context
   */
  public void setupJob(JobContext context) throws IOException {
    if (hasOutputPath()) {
      Path jobAttemptPath = getJobAttemptPath(context);
      FileSystem fs = jobAttemptPath.getFileSystem(
          context.getConfiguration());
      if (!fs.mkdirs(jobAttemptPath)) {
        LOG.error("Mkdirs failed to create " + jobAttemptPath);
      }
    } else {
      LOG.warn("Output Path is null in setupJob()");
    }
  }

  /**
   * The job has completed, so do works in commitJobInternal().
   * Could retry on failure if using algorithm 2.
   * @param context the job's context
   */
  public void commitJob(JobContext context) throws IOException {
    int maxAttemptsOnFailure = isCommitJobRepeatable(context) ?
        context.getConfiguration().getInt(FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS,
            FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT) : 1;
    int attempt = 0;
    boolean jobCommitNotFinished = true;
    while (jobCommitNotFinished) {
      try {
        commitJobInternal(context);
        jobCommitNotFinished = false;
      } catch (Exception e) {
        if (++attempt >= maxAttemptsOnFailure) {
          throw e;
        } else {
          LOG.warn("Exception get thrown in job commit, retry (" + attempt +
              ") time.", e);
        }
      }
    }
  }

  /**
   * The job has completed, so do following commit job, include:
   * Move all committed tasks to the final output dir (algorithm 1 only).
   * Delete the temporary directory, including all of the work directories.
   * Create a _SUCCESS file to make it as successful.
   * @param context the job's context
   */
  @VisibleForTesting
  protected void commitJobInternal(JobContext context) throws IOException {
    if (hasOutputPath()) {
      Path finalOutput = getOutputPath();
      FileSystem fs = finalOutput.getFileSystem(context.getConfiguration());

      if (algorithmVersion == 1) {
        for (FileStatus stat: getAllCommittedTaskPaths(context)) {
          mergePaths(fs, stat, finalOutput, context);
        }
      }

      if (skipCleanup) {
        LOG.info("Skip cleanup the _temporary folders under job's output " +
            "directory in commitJob.");
      } else {
        // delete the _temporary folder and create a _done file in the o/p
        // folder
        try {
          cleanupJob(context);
        } catch (IOException e) {
          if (ignoreCleanupFailures) {
            // swallow exceptions in cleanup as user configure to make sure
            // commitJob could be success even when cleanup get failure.
            LOG.error("Error in cleanup job, manually cleanup is needed.", e);
          } else {
            // throw back exception to fail commitJob.
            throw e;
          }
        }
      }
      // True if the job requires output.dir marked on successful job.
      // Note that by default it is set to true.
      if (context.getConfiguration().getBoolean(
          SUCCESSFUL_JOB_OUTPUT_DIR_MARKER, true)) {
        Path markerPath = new Path(outputPath, SUCCEEDED_FILE_NAME);
        // If job commit is repeatable and previous/another AM could write
        // mark file already, we need to set overwritten to be true explicitly
        // in case other FS implementations don't overwritten by default.
        if (isCommitJobRepeatable(context)) {
          fs.create(markerPath, true).close();
        } else {
          fs.create(markerPath).close();
        }
      }
    } else {
      LOG.warn("Output Path is null in commitJob()");
    }
  }

  /**
   * Merge two paths together.  Anything in from will be moved into to, if there
   * are any name conflicts while merging the files or directories in from win.
   * @param fs the File System to use
   * @param from the path data is coming from.
   * @param to the path data is going to.
   * @throws IOException on any error
   */
  private void mergePaths(FileSystem fs, final FileStatus from,
      final Path to, JobContext context) throws IOException {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Merging data from " + from + " to " + to);
    }
    reportProgress(context);
    FileStatus toStat;
    try {
      toStat = fs.getFileStatus(to);
    } catch (FileNotFoundException fnfe) {
      toStat = null;
    }

    if (from.isFile()) {
      if (toStat != null) {
        if (!fs.delete(to, true)) {
          throw new IOException("Failed to delete " + to);
        }
      }

      if (!fs.rename(from.getPath(), to)) {
        throw new IOException("Failed to rename " + from + " to " + to);
      }
    } else if (from.isDirectory()) {
      if (toStat != null) {
        if (!toStat.isDirectory()) {
          if (!fs.delete(to, true)) {
            throw new IOException("Failed to delete " + to);
          }
          renameOrMerge(fs, from, to, context);
        } else {
          //It is a directory so merge everything in the directories
          for (FileStatus subFrom : fs.listStatus(from.getPath())) {
            Path subTo = new Path(to, subFrom.getPath().getName());
            mergePaths(fs, subFrom, subTo, context);
          }
        }
      } else {
        renameOrMerge(fs, from, to, context);
      }
    }
  }

  private void reportProgress(JobContext context) {
    if (context instanceof Progressable) {
      ((Progressable) context).progress();
    }
  }

  private void renameOrMerge(FileSystem fs, FileStatus from, Path to,
      JobContext context) throws IOException {
    if (algorithmVersion == 1) {
      if (!fs.rename(from.getPath(), to)) {
        throw new IOException("Failed to rename " + from + " to " + to);
      }
    } else {
      fs.mkdirs(to);
      for (FileStatus subFrom : fs.listStatus(from.getPath())) {
        Path subTo = new Path(to, subFrom.getPath().getName());
        mergePaths(fs, subFrom, subTo, context);
      }
    }
  }

  @Override
  @Deprecated
  public void cleanupJob(JobContext context) throws IOException {
    if (hasOutputPath()) {
      Path pendingJobAttemptsPath = getPendingJobAttemptsPath();
      FileSystem fs = pendingJobAttemptsPath
          .getFileSystem(context.getConfiguration());
      // if job allow repeatable commit and pendingJobAttemptsPath could be
      // deleted by previous AM, we should tolerate FileNotFoundException in
      // this case.
      try {
        fs.delete(pendingJobAttemptsPath, true);
      } catch (FileNotFoundException e) {
        if (!isCommitJobRepeatable(context)) {
          throw e;
        }
      }
    } else {
      LOG.warn("Output Path is null in cleanupJob()");
    }
  }

  /**
   * Delete the temporary directory, including all of the work directories.
   * @param context the job's context
   */
  @Override
  public void abortJob(JobContext context, JobStatus.State state) 
  throws IOException {
    // delete the _temporary folder
    cleanupJob(context);
  }
  
  /**
   * No task setup required.
   */
  @Override
  public void setupTask(TaskAttemptContext context) throws IOException {
    // FileOutputCommitter's setupTask doesn't do anything. Because the
    // temporary task directory is created on demand when the 
    // task is writing.
  }

  /**
   * Move the files from the work directory to the job output directory
   * @param context the task context
   */
  @Override
  public void commitTask(TaskAttemptContext context) 
  throws IOException {
    commitTask(context, null);
  }

  @Private
  public void commitTask(TaskAttemptContext context, Path taskAttemptPath) 
      throws IOException {

    TaskAttemptID attemptId = context.getTaskAttemptID();
    if (hasOutputPath()) {
      context.progress();
      if(taskAttemptPath == null) {
        taskAttemptPath = getTaskAttemptPath(context);
      }
      FileSystem fs = taskAttemptPath.getFileSystem(context.getConfiguration());
      FileStatus taskAttemptDirStatus;
      try {
        taskAttemptDirStatus = fs.getFileStatus(taskAttemptPath);
      } catch (FileNotFoundException e) {
        taskAttemptDirStatus = null;
      }

      if (taskAttemptDirStatus != null) {
        if (algorithmVersion == 1) {
          Path committedTaskPath = getCommittedTaskPath(context);
          if (fs.exists(committedTaskPath)) {
             if (!fs.delete(committedTaskPath, true)) {
               throw new IOException("Could not delete " + committedTaskPath);
             }
          }
          if (!fs.rename(taskAttemptPath, committedTaskPath)) {
            throw new IOException("Could not rename " + taskAttemptPath + " to "
                + committedTaskPath);
          }
          LOG.info("Saved output of task '" + attemptId + "' to " +
              committedTaskPath);
        } else {
          // directly merge everything from taskAttemptPath to output directory
          mergePaths(fs, taskAttemptDirStatus, outputPath, context);
          LOG.info("Saved output of task '" + attemptId + "' to " +
              outputPath);

          if (context.getConfiguration().getBoolean(
              FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED,
              FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT)) {
            LOG.debug(String.format(
                "Deleting the temporary directory of '%s': '%s'",
                attemptId, taskAttemptPath));
            if(!fs.delete(taskAttemptPath, true)) {
              LOG.warn("Could not delete " + taskAttemptPath);
            }
          }
        }
      } else {
        LOG.warn("No Output found for " + attemptId);
      }
    } else {
      LOG.warn("Output Path is null in commitTask()");
    }
  }

  /**
   * Delete the work directory
   * @throws IOException 
   */
  @Override
  public void abortTask(TaskAttemptContext context) throws IOException {
    abortTask(context, null);
  }

  @Private
  public void abortTask(TaskAttemptContext context, Path taskAttemptPath) throws IOException {
    if (hasOutputPath()) { 
      context.progress();
      if(taskAttemptPath == null) {
        taskAttemptPath = getTaskAttemptPath(context);
      }
      FileSystem fs = taskAttemptPath.getFileSystem(context.getConfiguration());
      if(!fs.delete(taskAttemptPath, true)) {
        LOG.warn("Could not delete "+taskAttemptPath);
      }
    } else {
      LOG.warn("Output Path is null in abortTask()");
    }
  }

  /**
   * Did this task write any files in the work directory?
   * @param context the task's context
   */
  @Override
  public boolean needsTaskCommit(TaskAttemptContext context
                                 ) throws IOException {
    return needsTaskCommit(context, null);
  }

  @Private
  public boolean needsTaskCommit(TaskAttemptContext context, Path taskAttemptPath
    ) throws IOException {
    if(hasOutputPath()) {
      if(taskAttemptPath == null) {
        taskAttemptPath = getTaskAttemptPath(context);
      }
      FileSystem fs = taskAttemptPath.getFileSystem(context.getConfiguration());
      return fs.exists(taskAttemptPath);
    }
    return false;
  }

  @Override
  @Deprecated
  public boolean isRecoverySupported() {
    return true;
  }

  @Override
  public boolean isCommitJobRepeatable(JobContext context) throws IOException {
    return algorithmVersion == 2;
  }

  @Override
  public void recoverTask(TaskAttemptContext context)
      throws IOException {
    if(hasOutputPath()) {
      context.progress();
      TaskAttemptID attemptId = context.getTaskAttemptID();
      int previousAttempt = getAppAttemptId(context) - 1;
      if (previousAttempt < 0) {
        throw new IOException ("Cannot recover task output for first attempt...");
      }

      Path previousCommittedTaskPath = getCommittedTaskPath(
          previousAttempt, context);
      FileSystem fs = previousCommittedTaskPath.getFileSystem(context.getConfiguration());
      if (LOG.isDebugEnabled()) {
        LOG.debug("Trying to recover task from " + previousCommittedTaskPath);
      }
      if (algorithmVersion == 1) {
        if (fs.exists(previousCommittedTaskPath)) {
          Path committedTaskPath = getCommittedTaskPath(context);
          if (!fs.delete(committedTaskPath, true) &&
              fs.exists(committedTaskPath)) {
            throw new IOException("Could not delete " + committedTaskPath);
          }
          //Rename can fail if the parent directory does not yet exist.
          Path committedParent = committedTaskPath.getParent();
          fs.mkdirs(committedParent);
          if (!fs.rename(previousCommittedTaskPath, committedTaskPath)) {
            throw new IOException("Could not rename " + previousCommittedTaskPath +
                " to " + committedTaskPath);
          }
        } else {
            LOG.warn(attemptId+" had no output to recover.");
        }
      } else {
        // essentially a no-op, but for backwards compatibility
        // after upgrade to the new fileOutputCommitter,
        // check if there are any output left in committedTaskPath
        try {
          FileStatus from = fs.getFileStatus(previousCommittedTaskPath);
          LOG.info("Recovering task for upgrading scenario, moving files from "
              + previousCommittedTaskPath + " to " + outputPath);
          mergePaths(fs, from, outputPath, context);
        } catch (FileNotFoundException ignored) {
        }
        LOG.info("Done recovering task " + attemptId);
      }
    } else {
      LOG.warn("Output Path is null in recoverTask()");
    }
  }
}

  1. setupJob: creates the temporary directory ${mapred.out.dir}/_temporary.
  2. commitJob: deletes the temporary directory and creates an empty _SUCCESS file under ${mapred.out.dir}.
  3. abortJob: deletes the temporary directory.
  4. setupTask: does nothing; the task's side-effect files would be created here, but they are created on demand while the task writes.
  5. needsTaskCommit: returns true as long as a side-effect file exists.
  6. commitTask: commits the result, i.e. moves the task's side-effect files under ${mapred.out.dir}.
  7. abortTask: deletes the task's side-effect files.

4. Mapper and Reducer

The map() method is called once for each key/value pair in the input split; most applications should override it.
The Mapper and Reducer class structure:
[Figure: the Mapper and Reducer classes]
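A minimal word-count-style sketch of overriding map() and reduce() in the new API (MyMapper and MyReducer are the hypothetical classes referenced by the driver sketch earlier; in a real project each class would live in its own file):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Tokenize each input line and emit (word, 1).
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }
}

// Sum the counts emitted for each word.
class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}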

5. Partitioner

The Partitioner shards the intermediate results produced by the Mapper so that all records in the same partition go to the same Reducer; it therefore directly affects load balancing in the Reduce phase. The available implementations are shown below, with HashPartitioner being the default:
[Figure: built-in Partitioner implementations]

HashPartitioner implementation:
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

}
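A custom Partitioner follows the same pattern. A hypothetical sketch that routes keys by their first character, registered on the job with job.setPartitionerClass(FirstCharPartitioner.class):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: all keys starting with the same character go to the same reducer.
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // Mask the sign bit so the result is non-negative, as HashPartitioner does.
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}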
BinaryPartitioner implementation:
public class BinaryPartitioner<V> extends Partitioner<BinaryComparable, V> 
  implements Configurable {

  public static final String LEFT_OFFSET_PROPERTY_NAME = 
    "mapreduce.partition.binarypartitioner.left.offset";
  public static final String RIGHT_OFFSET_PROPERTY_NAME = 
    "mapreduce.partition.binarypartitioner.right.offset";
  
  /**
   * Set the subarray to be used for partitioning to 
   * <code>bytes[left:(right+1)]</code> in Python syntax.
   * 
   * @param conf configuration object
   * @param left left Python-style offset
   * @param right right Python-style offset
   */
  public static void setOffsets(Configuration conf, int left, int right) {
    conf.setInt(LEFT_OFFSET_PROPERTY_NAME, left);
    conf.setInt(RIGHT_OFFSET_PROPERTY_NAME, right);
  }
  
  /**
   * Set the subarray to be used for partitioning to 
   * <code>bytes[offset:]</code> in Python syntax.
   * 
   * @param conf configuration object
   * @param offset left Python-style offset
   */
  public static void setLeftOffset(Configuration conf, int offset) {
    conf.setInt(LEFT_OFFSET_PROPERTY_NAME, offset);
  }
  
  /**
   * Set the subarray to be used for partitioning to 
   * <code>bytes[:(offset+1)]</code> in Python syntax.
   * 
   * @param conf configuration object
   * @param offset right Python-style offset
   */
  public static void setRightOffset(Configuration conf, int offset) {
    conf.setInt(RIGHT_OFFSET_PROPERTY_NAME, offset);
  }
  
  
  private Configuration conf;
  private int leftOffset, rightOffset;
  
  public void setConf(Configuration conf) {
    this.conf = conf;
    leftOffset = conf.getInt(LEFT_OFFSET_PROPERTY_NAME, 0);
    rightOffset = conf.getInt(RIGHT_OFFSET_PROPERTY_NAME, -1);
  }
  
  public Configuration getConf() {
    return conf;
  }
  
  /** 
   * Use (the specified slice of the array returned by) 
   * {@link BinaryComparable#getBytes()} to partition. 
   */
  @Override
  public int getPartition(BinaryComparable key, V value, int numPartitions) {
    int length = key.getLength();
    int leftIndex = (leftOffset + length) % length;
    int rightIndex = (rightOffset + length) % length;
    int hash = WritableComparator.hashBytes(key.getBytes(), 
      leftIndex, rightIndex - leftIndex + 1);
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }
  
}
TotalOrderPartitioner implementation:

TotalOrderPartitioner provides range-based partitioning and is typically used for total ordering of data. In the naive approach, each Map Task sorts its data locally and a single Reduce Task then performs the global sort; since there can be only one Reducer, this is a performance bottleneck. To improve performance and scalability, MR provides TotalOrderPartitioner, which divides the key space into several intervals (partitions) and guarantees that every key in an interval is greater than every key in the preceding interval. The efficiency of a TotalOrderPartitioner-based total sort depends directly on the key distribution and the sampling algorithm: the more uniform the keys and the more representative the sample, the better balanced the Reduce Tasks and the more efficient the sort.
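
A minimal sketch of wiring up a total sort with TotalOrderPartitioner and InputSampler (paths, sampler parameters, and the assumption of Text keys via KeyValueTextInputFormat are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "total-sort");
    job.setJarByClass(TotalSortDriver.class);

    // Identity map/reduce: the framework's shuffle does the actual sorting.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setNumReduceTasks(4);                        // 4 ordered ranges -> 3 split points

    // Input with Text keys; the sampler reads this input, so configure it first.
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // Sample the input to choose split points and write them to the partition file.
    Path partitionFile = new Path(args[1] + "_partitions");
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
    InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Text, Text>(0.01, 10000, 10));

    job.setPartitionerClass(TotalOrderPartitioner.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}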
