MapReduce: Simplified Data Processing on Large Clusters (Paper Notes)

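This post introduces the MapReduce programming model, which simplifies parallel processing of large data sets. With user-defined Map and Reduce functions, data can be processed and aggregated in parallel with little effort. MapReduce hides complex details such as parallelization and fault tolerance, so developers without parallel-programming experience can still process massive data sets efficiently.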


Why do it

The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code to deal with these issues.

Programming Model

Map

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
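As a minimal sketch of this step, the Python below uses word count as the running example. The names `map_fn` and `group_by_key` are hypothetical, and the in-memory dictionary is only a stand-in for the grouping that the real library performs across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """User-written Map: emit (word, 1) for every word in one document."""
    for word in contents.split():
        yield word.lower(), 1


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for the library: collect all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# One input pair (document name, contents) produces many intermediate pairs.
intermediate = list(map_fn("doc1", "the quick fox jumps over the lazy dog"))
grouped = group_by_key(intermediate)
```

Here `grouped` maps each intermediate key to the list of values that would be handed to Reduce for that key.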

Reduce

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. The intermediate values are supplied to the user’s reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.
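The iterator point can be illustrated with a small sketch, again for word count. `reduce_fn` and `counts_for_key` are hypothetical names. The generator only imitates a value stream that is too large to hold in memory, and the reducer consumes it one value at a time without ever building a list.

```python
from typing import Iterator, Tuple


def reduce_fn(key: str, values: Iterator[int]) -> Tuple[str, int]:
    """User-written Reduce: sum the counts for one key, one value at a time."""
    total = 0
    for v in values:  # pulls values lazily from the iterator
        total += v
    return key, total


def counts_for_key(n: int) -> Iterator[int]:
    """Hypothetical value stream: yields n ones without storing them."""
    for _ in range(n):
        yield 1


# A million values are merged into a single output value for the key.
result = reduce_fn("the", counts_for_key(1_000_000))
```

Because `reduce_fn` keeps only a running total, its memory use does not grow with the number of values for the key.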

Execution overview

MapReduce

Conclusions

Why this model is a success

  1. The model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization, and load balancing.
  2. A large variety of problems are easily expressible as MapReduce computations.
  3. The authors developed an implementation of MapReduce that scales to large clusters comprising thousands of machines.

Experiences

  1. Restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant.
  2. Network bandwidth is a scarce resource, so the system tries to send less data across the network: the locality optimization allows it to read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth.
  3. Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.
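The redundant-execution idea can be sketched on a single machine with two threads. This is a simplification under stated assumptions, not the paper's scheduler: `run_task` and `run_with_backup` are hypothetical names, the `delay` argument only simulates a slow machine, and both copies are launched at the same time. It relies on the task being deterministic, so either copy gives the same answer.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task(data, delay):
    """Hypothetical worker: sleep to simulate machine speed, then sum."""
    time.sleep(delay)
    return sum(data)


def run_with_backup(data, primary_delay, backup_delay):
    """Launch the same task twice and return whichever copy finishes first."""
    pool = ThreadPoolExecutor(max_workers=2)
    primary = pool.submit(run_task, data, primary_delay)
    backup = pool.submit(run_task, data, backup_delay)
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block the caller on the straggler
    return next(iter(done)).result()


# The primary copy is the straggler; the backup copy supplies the result.
answer = run_with_backup([1, 2, 3, 4], primary_delay=0.5, backup_delay=0.01)
```

The caller gets the result as soon as the faster copy finishes, and the slower copy's output is simply ignored.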