Google’s MapReduce Programming Model - Revisited
Google's MapReduce programming model serves to process large data sets in a massively parallel manner. We deliver the first rigorous description of the model, including its advancement in the form of Google's domain-specific language Sawzall. To this end, we reverse-engineer the seminal papers on MapReduce and Sawzall, and we capture our findings as an executable specification. We also identify and resolve some obscurities in the informal presentation given in those papers. We use typed functional programming (specifically Haskell) as a tool for design recovery and executable specification. Our development comprises three components: (i) the basic program skeleton that underlies MapReduce computations; (ii) the opportunities for parallelism in executing MapReduce computations; (iii) the fundamental characteristics of Sawzall's aggregators as an advancement of the MapReduce approach. Our development does not formalize the more implementation-oriented aspects of an actual, distributed execution of MapReduce computations.
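The basic program skeleton in item (i) can be pictured as three phases: apply a user-supplied map function to every input key/value pair, group the intermediate values by key, and apply a user-supplied reduce function to each group. The Haskell sketch below illustrates that shape under our own assumptions and is not the specification from the paper: the names `mapReduce` and `wordCount`, the use of `Data.Map` for input and output, and the `Maybe` result of the reduce function (which lets a reducer drop a key) are choices made for this example only.

```haskell
import qualified Data.Map as Map

-- Minimal MapReduce skeleton (illustrative; not the code from the paper).
mapReduce
  :: Ord k2
  => (k1 -> v1 -> [(k2, v2)])  -- map: one input pair to a list of intermediate pairs
  -> (k2 -> [v2] -> Maybe v3)  -- reduce: all values of one key to an optional result
  -> Map.Map k1 v1             -- input key/value pairs
  -> Map.Map k2 v3             -- output key/value pairs
mapReduce mapF reduceF = reducePerKey . groupByKey . mapPerKey
  where
    -- Phase 1: apply the map function to every input pair (in ascending key order).
    mapPerKey = concatMap (uncurry mapF) . Map.toList
    -- Phase 2: collect intermediate values per key, keeping the order they were produced in.
    groupByKey = Map.fromListWith (flip (++)) . map (\(k, v) -> (k, [v]))
    -- Phase 3: reduce each group; keys whose reduction is Nothing are dropped.
    reducePerKey = Map.mapMaybeWithKey reduceF

-- Example: count word occurrences over documents keyed by name.
wordCount :: Map.Map String String -> Map.Map String Int
wordCount = mapReduce countWords sumCounts
  where
    countWords _docName text = [(word, 1) | word <- words text]
    sumCounts _word counts = Just (sum counts)
```

Loaded into GHCi, `wordCount (Map.fromList [("d1", "a b a"), ("d2", "b c")])` evaluates to `fromList [("a",2),("b",2),("c",1)]`. Keeping the two user-defined functions as plain arguments is what makes the skeleton reusable: only `countWords` and `sumCounts` are specific to word counting.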
Keywords: Data processing; Parallel programming; Distributed programming; Software design; Executable specification; Typed functional programming; MapReduce; Sawzall; Map; Reduce; List homomorphism; Haskell
http://portal.acm.org/citation.cfm?id=1290812
Ralf Lämmel
http://www.sciencedirect.com/science/article/pii/S0167642307001281
Author: R Lämmel. Cited by: 111
citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104...
Author: Ralf Lämmel, Data Programmability Team, Microsoft Corp., Redmond, WA, USA
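This article takes an in-depth look at Google's MapReduce programming model and its application to the massively parallel processing of large data sets. By reverse-engineering the original papers, it identifies the key characteristics of MapReduce and of Google's domain-specific language Sawzall, and describes them formally in Haskell.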
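Where the parallelism comes from can be illustrated with a small sketch. One common approach, assumed here rather than taken from the paper, is to split the input into chunks, run the map function plus a local combiner on each chunk independently, and then merge the partial results per key with the same operator. This gives the same answer as a sequential run only if that operator is associative (so each per-key reduction behaves like a list homomorphism) and commutative (because `Map.fromListWith` applies it to the new value first). The helpers `splitIntoChunks`, `mapTask`, `parallelMapReduce`, and `wordCounts` are hypothetical, and the chunks are processed one after another: the code mirrors the data-parallel structure without any actual concurrency.

```haskell
import qualified Data.Map as Map
import Data.List (foldl')

-- Hypothetical helper: split a list into chunks of at most n elements.
-- A non-positive n yields a single chunk, so the function always terminates.
splitIntoChunks :: Int -> [a] -> [[a]]
splitIntoChunks _ [] = []
splitIntoChunks n xs
  | n <= 0    = [xs]
  | otherwise = let (chunk, rest) = splitAt n xs in chunk : splitIntoChunks n rest

-- One "map task": apply the map function to every record of a chunk and
-- combine values for equal keys locally, as a combiner would.
mapTask :: Ord k => (a -> [(k, v)]) -> (v -> v -> v) -> [a] -> Map.Map k v
mapTask mapF combine = Map.fromListWith combine . concatMap mapF

-- Merge the partial maps of all tasks key by key with the same operator.
-- Assumes `combine` is associative and commutative.
parallelMapReduce :: Ord k => Int -> (a -> [(k, v)]) -> (v -> v -> v) -> [a] -> Map.Map k v
parallelMapReduce chunkSize mapF combine =
  foldl' (Map.unionWith combine) Map.empty . map (mapTask mapF combine) . splitIntoChunks chunkSize

-- Example: count word occurrences across lines of text.
wordCounts :: Int -> [String] -> Map.Map String Int
wordCounts chunkSize = parallelMapReduce chunkSize (\line -> [(w, 1) | w <- words line]) (+)
```

For example, `wordCounts 2 ["a b a", "b c", "c c"]` evaluates to `fromList [("a",2),("b",2),("c",3)]`, and the result does not change with the chunk size because `(+)` is associative and commutative. A non-commutative operator such as list concatenation would not be safe to plug in unchanged.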




