A First Look at MapReduce

Latest recommended article published on 2025-08-04 18:50:55

cudi7618

Tags: big data, java, operations

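These notes were written while taking over a company Hadoop cluster. They cover the principles and components of the MapReduce framework, its input/output flow, its user interfaces, and its memory management settings, along with task execution and environment configuration, the local directory structure, and how to tune the configuration for better load balancing.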


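I have started taking over the company's Hadoop cluster, so I need to understand how it works. I had some exposure to it a while ago,

but from now on it is for real.

These notes follow: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html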

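MapReduce is a software framework that helps applications implement distributed computation quickly, with good fault tolerance:
once part of a job fails, the framework re-executes the failed tasks.

Components: a master JobTracker and one TaskTracker per cluster node.

Hadoop itself is written in Java and supports Java well.

MapReduce applications can also be written in other programming languages through Hadoop Streaming, which runs any executable as the mapper or reducer,

or in C++ through Hadoop Pipes, a SWIG-compatible C++ API.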

MapReduce inputs and outputs (Inputs and Outputs)
The complete flow is:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
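The flow above can be imitated in a few lines of plain Java. This is only a sketch of the data flow using a word count, not the Hadoop API: the class and method names are hypothetical, each input line stands in for one map task, and a TreeMap stands in for the framework's sort.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the map -> combine -> reduce data flow.
// Nothing here comes from Hadoop; all names are hypothetical.
final class PipelineSketch {

    // map: one input line becomes a list of <word, 1> pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // For word count, combine and reduce share the same logic: sum per key.
    // The TreeMap keeps keys sorted, mimicking the sort step.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (String line : lines) {
            // Combine locally, once per simulated map task.
            combined.addAll(sumByKey(map(line)).entrySet());
        }
        return sumByKey(combined); // reduce
    }

    public static void main(String[] args) {
        // prints {bye=1, hello=2, world=2}
        System.out.println(run(Arrays.asList("hello world hello", "world bye")));
    }
}
```

The combine step runs per map task, so the reduce step sees one partial sum per word per task instead of one pair per occurrence.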

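MapReduce user interfaces (User Interfaces)

Mapper: takes input key/value pairs and emits intermediate key/value pairs.

The framework passes the job's JobConf to a Mapper implementation through JobConfigurable.configure(JobConf); the Mapper overrides this method to initialize itself,

and overrides Closeable.close() to clean up after the task finishes.

The output types do not have to match the input types.
Output key/value pairs are collected by calling OutputCollector.collect(WritableComparable, Writable).

Applications can also use the Reporter to report progress, set application-level status messages, update counters, or simply indicate that they are still alive.

The framework groups all intermediate values by output key before passing them to the Reducer; users can control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).

Users can optionally specify a combiner via JobConf.setCombinerClass(Class) to aggregate the intermediate output locally, which reduces the amount of data transferred from the Mapper to the Reducer.

The intermediate output is stored in the format (key-len, key, value-len, value).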
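To make the (key-len, key, value-len, value) layout concrete, here is a minimal sketch that writes and reads one record in that order. It assumes fixed 4-byte lengths and UTF-8 strings purely for illustration; the actual on-disk encoding Hadoop uses may differ, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Length-prefixed record layout: key-len, key, value-len, value.
// Assumption: 4-byte big-endian lengths; this is not Hadoop's real encoding.
final class RecordFormatSketch {

    static byte[] encode(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + k.length + 4 + v.length)
                .putInt(k.length).put(k)
                .putInt(v.length).put(v)
                .array();
    }

    static String[] decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        byte[] k = new byte[buf.getInt()]; // read key-len, then the key bytes
        buf.get(k);
        byte[] v = new byte[buf.getInt()]; // read value-len, then the value bytes
        buf.get(v);
        return new String[] {
            new String(k, StandardCharsets.UTF_8),
            new String(v, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        byte[] record = encode("hello", "1");
        String[] kv = decode(record);
        System.out.println(record.length + " bytes: " + kv[0] + " -> " + kv[1]);
    }
}
```

Because each field carries its own length, a reader can skip over a record without parsing the key or value.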

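How many maps are appropriate? The number is usually driven by the total size of the input files: one mapper per data block.

We currently use the default block size of 64 MB.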
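As a rough back-of-the-envelope check, the number of map tasks can be estimated by dividing the input size by the block size and rounding up. This sketch assumes a single large input; in practice splits are computed per file, so many small files produce more maps than the estimate. The class and method names are hypothetical.

```java
// Rough estimate of map tasks: one per block, rounded up.
// Assumption: a single input; real splits are computed per file.
final class MapCountSketch {

    static long estimateMaps(long inputBytes, long blockBytes) {
        if (inputBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (inputBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        long block64Mb = 64L * 1024 * 1024;
        // prints 16
        System.out.println(estimateMaps(oneGb, block64Mb));
    }
}
```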

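Reducer

The reduce side has three main phases: shuffle, sort, and reduce.

Shuffle: the framework fetches each reducer's partition of the intermediate output from all the mappers over HTTP.

Sort: the framework merges the fetched intermediate output and groups it by key.

Reduce: the reduce method is called once for each key / list-of-values pair in the grouped input and writes the final output.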

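How many reducers should be defined?
The right number seems to be 0.95 or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum).
With 0.95, all reduces can launch immediately and start fetching map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and then pick up a second wave, which gives better load balancing.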
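The rule of thumb is easy to compute. The sketch below takes the factor as a percentage (95 or 175) so the arithmetic stays in integers; the class name, method name, and the example cluster size are assumptions for illustration only.

```java
// Rule-of-thumb reducer count: factor * nodes * reduce slots per node.
// The factor is passed as a percentage (95 or 175) to avoid floating point.
final class ReduceCountSketch {

    static int suggestedReduces(int nodes, int reduceSlotsPerNode, int factorPercent) {
        return nodes * reduceSlotsPerNode * factorPercent / 100;
    }

    public static void main(String[] args) {
        int nodes = 10;         // hypothetical cluster size
        int slotsPerNode = 2;   // hypothetical reduce slots per node
        System.out.println(suggestedReduces(nodes, slotsPerNode, 95));  // prints 19
        System.out.println(suggestedReduces(nodes, slotsPerNode, 175)); // prints 35
    }
}
```

With 19 reduces on 20 slots everything runs in a single wave; with 35, the 15 extra reduces go to whichever slots free up first.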

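Partitioner

The Partitioner partitions the intermediate output of the mappers by key, typically with a hash function on the key; the total number of partitions equals the number of reduce tasks.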
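Hash partitioning boils down to one expression: clear the sign bit of the key's hash code and take the remainder modulo the number of reduce tasks. The sketch below applies that to ordinary Java objects; it is a standalone illustration with a hypothetical class name, and Hadoop key types compute their own hash codes.

```java
// Hash partitioning: non-negative hash code modulo the number of reduce tasks.
final class HashPartitionSketch {

    static int partitionFor(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the
        // result is always in the range [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Every occurrence of the same key lands in the same partition, which is what guarantees that one reducer sees all the values for that key.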

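The Reducer can also use the Reporter to report its status and progress and to update counters.

OutputCollector is the facility the MapReduce framework provides for collecting the output of the map and reduce functions.

Job Configuration

JobConf is the primary interface through which a user describes a MapReduce job to the framework for execution.

Some parameters may be marked final, usually by administrators in the configuration files; such parameters cannot be altered.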

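Task Execution & Environment

The TaskTracker executes each map or reduce task as a child process in a separate JVM.

The child process inherits the environment of its parent TaskTracker. Users can pass additional options to the child JVM through the mapred.{map|reduce}.child.java.opts configuration parameter in the JobConf; for example, -Djava.library.path=<> sets extra paths in which to search for native libraries. If the value contains the symbol @taskid@, it is replaced with the id of the task.

Here is an example: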

mapred.map.child.java.opts

     -Xmx512M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
      -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false


mapred.reduce.child.java.opts

     -Xmx1024M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
      -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
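In the tutorial these notes follow, the @taskid@ symbol in the options is replaced with the id of the task, so each task writes its own GC log. A minimal sketch of that substitution, assuming plain string replacement; the class name and the task id value are hypothetical.

```java
// Per-task substitution of the @taskid@ placeholder in child JVM options.
// Assumption: simple string replacement; the id below is made up.
final class ChildOptsSketch {

    static String interpolate(String opts, String taskId) {
        return opts.replace("@taskid@", taskId);
    }

    public static void main(String[] args) {
        String opts = "-Xmx512M -verbose:gc -Xloggc:/tmp/@taskid@.gc";
        System.out.println(interpolate(opts, "attempt_0001_m_000000_0"));
    }
}
```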


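Memory Management

Users can set the maximum virtual memory of a launched child task, and of any sub-process it starts, with mapred.{map|reduce}.child.ulimit. The value is in kilobytes (KB) and must be greater than or equal to the -Xmx value passed to the JVM, otherwise the JVM may not start.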
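Because the ulimit is given in KB while -Xmx usually carries an M or G suffix, the two are easy to mismatch. The following sketch extracts -Xmx from an options string, converts it to KB, and compares it with a ulimit value. It is a hypothetical helper for sanity-checking a configuration, not part of Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sanity check: the child ulimit (in KB) should be at least the -Xmx heap size.
final class UlimitCheckSketch {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the -Xmx value in KB, or -1 if the options contain no -Xmx flag.
    static long xmxKb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long n = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "k": return n;
            case "m": return n * 1024;
            case "g": return n * 1024 * 1024;
            default:  return n / 1024; // no suffix means bytes
        }
    }

    static boolean ulimitCoversHeap(String javaOpts, long ulimitKb) {
        return ulimitKb >= xmxKb(javaOpts);
    }

    public static void main(String[] args) {
        String opts = "-Xmx512M -verbose:gc";
        System.out.println(xmxKb(opts));                     // prints 524288
        System.out.println(ulimitCoversHeap(opts, 1048576)); // prints true
    }
}
```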

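mapred.{map|reduce}.child.java.opts only configures the child tasks launched by the TaskTracker.

Where memory management is enabled, users can also override the default memory limits:

mapred.task.maxvmem   the virtual memory (VM) limit for a task, in bytes

mapred.task.maxpmem   the physical memory (RAM) limit for a task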

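Map Parameters

io.sort.mb   The total size, in megabytes, of the buffers used to hold and sort the records emitted by the map.

io.sort.record.percent   The fraction of io.sort.mb set aside for per-record accounting information rather than for serialized record data. A spill to disk is triggered when either the accounting space or the serialization space fills past the spill threshold.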

io.sort.spill.percent   This is the threshold for the accounting and serialization buffers. When this percentage of either buffer has filled, their contents will be spilled to disk in the background. Let io.sort.record.percent be r, io.sort.mb be x, and this value be q. The maximum number of records collected before the collection thread will spill is r * x * q * 2^16. Note that a higher value may decrease the number of- or even eliminate- merges, but will also increase the probability of the map task getting blocked. The lowest average map times are usually obtained by accurately estimating the size of the map output and preventing multiple spills.
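The record limit quoted above, r * x * q * 2^16, can be evaluated directly. The sketch below plugs in r = 0.05, x = 100 and q = 0.80; these are illustrative values chosen for the example, so check your own cluster's settings before relying on the number. The class name is hypothetical.

```java
// Maximum records collected before a spill: r * x * q * 2^16,
// where r = io.sort.record.percent, x = io.sort.mb, q = io.sort.spill.percent.
final class SpillThresholdSketch {

    static long maxRecordsBeforeSpill(double r, double x, double q) {
        return Math.round(r * x * q * 65536.0);
    }

    public static void main(String[] args) {
        // Illustrative values only, not read from any real configuration.
        System.out.println(maxRecordsBeforeSpill(0.05, 100, 0.80)); // prints 262144
    }
}
```

If the map emits many small records, this record limit is reached long before the serialization buffer fills, which is the usual reason to raise io.sort.record.percent.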

Shuffle/Reduce Parameters:

io.sort.factor

int

Specifies the number of segments on disk to be merged at the same time. It limits the number of open files and compression codecs during the merge. If the number of files exceeds this limit, the merge will proceed in several passes. Though this limit also applies to the map, most jobs should be configured so that hitting this limit is unlikely there.

mapred.inmem.merge.threshold

int

The number of sorted map outputs fetched into memory before being merged to disk. Like the spill thresholds in the preceding note, this is not defining a unit of partition, but a trigger. In practice, this is usually set very high (1000) or disabled (0), since merging in-memory segments is often less expensive than merging from disk (see notes following this table). This threshold influences only the frequency of in-memory merges during the shuffle.

mapred.job.shuffle.merge.percent

float

The memory threshold for fetched map outputs before an in-memory merge is started, expressed as a percentage of memory allocated to storing map outputs in memory. Since map outputs that can't fit in memory can be stalled, setting this high may decrease parallelism between the fetch and merge. Conversely, values as high as 1.0 have been effective for reduces whose input can fit entirely in memory. This parameter influences only the frequency of in-memory merges during the shuffle.

mapred.job.shuffle.input.buffer.percent

float

The percentage of memory- relative to the maximum heapsize as typically specified in mapred.reduce.child.java.opts- that can be allocated to storing map outputs during the shuffle. Though some memory should be set aside for the framework, in general it is advantageous to set this high enough to store large and numerous map outputs.

mapred.job.reduce.input.buffer.percent

float

The percentage of memory relative to the maximum heapsize in which map outputs may be retained during the reduce. When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines. By default, all map outputs are merged to disk before the reduce begins to maximize the memory available to the reduce. For less memory-intensive reduces, this should be increased to avoid trips to disk.
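The two shuffle percentages stack: mapred.job.shuffle.input.buffer.percent carves a buffer out of the reduce heap, and mapred.job.shuffle.merge.percent sets how full that buffer gets before an in-memory merge starts. A small sketch of the arithmetic, with a hypothetical class name and example values that are assumptions rather than recommendations:

```java
// How the shuffle memory settings relate to the reduce task's heap.
final class ShuffleMemorySketch {

    // Bytes of heap available for holding fetched map outputs.
    static long shuffleBufferBytes(long maxHeapBytes, double inputBufferPercent) {
        return (long) (maxHeapBytes * inputBufferPercent);
    }

    // Bytes of fetched output that trigger an in-memory merge.
    static long mergeTriggerBytes(long maxHeapBytes, double inputBufferPercent,
                                  double mergePercent) {
        return (long) (shuffleBufferBytes(maxHeapBytes, inputBufferPercent) * mergePercent);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // matches -Xmx1024M in the example above
        // 0.70 and 0.66 are example values for this sketch.
        System.out.println(shuffleBufferBytes(heap, 0.70));
        System.out.println(mergeTriggerBytes(heap, 0.70, 0.66));
    }
}
```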

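Directory Structure

The TaskTracker uses the local directory ${mapred.local.dir}/taskTracker/ to create the localized cache and localized jobs.

${mapred.local.dir}/taskTracker/distcache/ : the public distributed cache, shared by the jobs of all users

${mapred.local.dir}/taskTracker/$user/distcache/ : the private distributed cache, shared only by the tasks and jobs of that user

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/ : the localized job directory

It has the following subdirectories:
${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/work/ : the job-specific shared directory, which tasks can use to share files among themselves. It is available through the API JobConf.getJobLocalDir() and also as a system property, so it can be read with System.getProperty("job.local.dir").

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/jars/ : the jars directory. job.jar is the application's jar, which is automatically distributed to each node. Its location is available through the API JobConf.getJar(); the unjarred directory is the parent directory of that path.

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/job.xml : the job's configuration file

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid : the task directory

Each task directory has the following structure:

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/job.xml : the task-localized job configuration

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/output : the directory for intermediate output files, for example the output of the mapper

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work : the current working directory of the task. With JVM reuse enabled, this is the directory in which the JVM was started.

${mapred.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work/tmp : the temporary directory of the task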
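When debugging on a node it helps to be able to compute these paths for a given user, job, and task. The sketch below assembles them with plain string concatenation; the class name and the example ids are hypothetical, and the layout is simply the one listed above.

```java
// Builds the TaskTracker local paths listed above from their components.
final class TaskDirsSketch {

    static String jobDir(String localDir, String user, String jobId) {
        return localDir + "/taskTracker/" + user + "/jobcache/" + jobId;
    }

    static String taskDir(String localDir, String user, String jobId, String taskId) {
        return jobDir(localDir, user, jobId) + "/" + taskId;
    }

    static String taskTmpDir(String localDir, String user, String jobId, String taskId) {
        return taskDir(localDir, user, jobId, taskId) + "/work/tmp";
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        System.out.println(taskTmpDir("/data/mapred/local", "alice",
                "job_0001", "attempt_0001_m_000000_0"));
    }
}
```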

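Localized configuration parameters of a task:
mapred.job.id : the job id

mapred.jar : the location of job.jar in the job directory

job.local.dir : the job-specific shared work directory described above

mapred.tip.id : the task id described above

mapred.task.is.map : whether this is a map task

mapred.task.partition : the id of the task within the job

map.input.file : the file the map task is reading from

map.input.start : the offset of the start of the map input split

map.input.length : the number of bytes in the map input split

mapred.work.output.dir : the task's temporary output directory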

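OutputCommitter

OutputCommitter describes how the output of the tasks of a MapReduce job is committed.

Its main responsibilities:

Set up the job during initialization, for example by creating the job's temporary directory.

Clean up after the job completes; the job is then marked as succeeded, failed, or killed.

Set up the temporary output of each task.

Check whether a task needs a commit (whether it has output to write out), so the commit step can be skipped when it is not required.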
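The commit protocol is easier to follow with a toy model. The sketch below keeps "temporary" and "committed" output in two in-memory maps: a task writes to the temporary area, and its output becomes visible only when it is committed. The method names echo the responsibilities listed above, but this class is a hypothetical stand-in and does not implement Hadoop's OutputCommitter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of task output commit: write to a temporary area, promote on commit.
final class CommitSketch {

    private final Map<String, String> pending = new HashMap<>();   // taskId -> temp output
    private final Map<String, String> committed = new TreeMap<>(); // taskId -> final output

    void writeTemp(String taskId, String data) {
        pending.put(taskId, data);
    }

    // A task needs a commit only if it produced temporary output.
    boolean needsTaskCommit(String taskId) {
        return pending.containsKey(taskId);
    }

    // Promote the task's temporary output to the final output.
    void commitTask(String taskId) {
        String data = pending.remove(taskId);
        if (data != null) {
            committed.put(taskId, data);
        }
    }

    // Discard the temporary output of a failed or killed task.
    void abortTask(String taskId) {
        pending.remove(taskId);
    }

    static String demo() {
        CommitSketch c = new CommitSketch();
        c.writeTemp("task_1", "a");
        c.writeTemp("task_2", "b");
        if (c.needsTaskCommit("task_1")) {
            c.commitTask("task_1");
        }
        c.abortTask("task_2"); // pretend task_2 failed
        return c.committed.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {task_1=a}
    }
}
```

Keeping uncommitted output separate is what lets a failed or duplicate task attempt be discarded without corrupting the final result.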

Other topics, such as queues, debugging, and the DistributedCache,

will be written up in detail later.

Source: ITPUB blog, http://blog.itpub.net/133735/viewspace-757338/. If you repost this article, please credit the source; otherwise legal action may be taken.
