MapReduce patterns

Latest recommended article published on 2020-11-24 15:14:34

GarfieldEr007



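This post covers three common patterns in MapReduce programming: summarization, filtering, and structural operations, and illustrates how each one is applied.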


After modifying and running a job in the last post, we can now look at the patterns that come up most often in MapReduce programming.
Although there are many of them, I think the most important ones are:

Summarization
Filtering
Structural

Let's examine them in detail.

Summarization
By summarization we mean all the jobs that compute an aggregate over a set of data, such as:

indexing
mean (or other statistical functions) computation
min/max computation
count (we've seen the WordCount example)
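
To make the summarization idea concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code: the `map_phase`, `shuffle`, `reduce_phase` and `summarize` functions and the sample records are assumptions made up for illustration, and they only imitate the map, shuffle and reduce steps inside a single process.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, value) pairs: the key is a user name, the value a number.
    for user, value in records:
        yield user, value

def shuffle(pairs):
    # Group all values by key, as the framework would do between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Collapse every value of one key into a small summary record.
    return key, {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

def summarize(records):
    groups = shuffle(map_phase(records))
    return dict(reduce_phase(key, values) for key, values in groups.items())

# Hypothetical input: (user, measurement) pairs.
records = [("alice", 3), ("bob", 10), ("alice", 5), ("bob", 2), ("alice", 4)]
summary = summarize(records)
print(summary["alice"])
```

Count, min and max can be combined partially on each mapper before the shuffle, while the mean needs the sum and the count carried together, which is why the reducer here works from the full list of values.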

Filtering
Filtering is the act of retrieving only a subset of a bigger dataset. The most common use cases are retrieving all the data belonging to a single user, or the top-N elements (by some criterion) of the dataset. Another frequent use of filtering is sampling: when we're dealing with a lot of data, it is usually a good idea to take a random subset of the original data and use it to verify the behaviour of our job.
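
The three filtering uses mentioned above can be sketched in plain Python as follows. This is only an illustration under assumptions, not Hadoop code: the function names, the `(user, score)` record layout and the `partitions` list (standing in for the input splits handled by different mappers) are all hypothetical.

```python
import heapq
import random

def filter_by_user(records, user):
    # Map-only filter: keep the records whose key matches, no reducer needed.
    return [r for r in records if r[0] == user]

def top_n(partitions, n):
    # Each "mapper" keeps only its local top-n records...
    local = [heapq.nlargest(n, part, key=lambda r: r[1]) for part in partitions]
    # ...and a single "reducer" merges the local winners into the global top-n.
    merged = [r for part in local for r in part]
    return heapq.nlargest(n, merged, key=lambda r: r[1])

def sample(records, fraction, seed=0):
    # Keep each record independently with probability `fraction`.
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

# Hypothetical input: two splits of (user, score) records.
partitions = [
    [("a", 1), ("b", 9), ("c", 4)],
    [("d", 7), ("e", 2), ("a", 8)],
]
all_records = [r for part in partitions for r in part]

print(filter_by_user(all_records, "a"))
print(top_n(partitions, 2))
print(len(sample(all_records, 0.5)))
```

Keeping a local top-N in each mapper means that at most N records per mapper travel to the reducer, instead of the whole dataset.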

Structural
Structural patterns apply when you need to operate on the structure of the data. The most common case is a join between different datasets, like the ones we're used to in an RDBMS.
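
As an illustration of the join case, here is a small sketch of a reduce-side inner join, simulated in memory with Python. It is not Hadoop code, and the `users` and `orders` datasets, the `"U"`/`"O"` tags and all function names are assumptions chosen for the example.

```python
from collections import defaultdict

def map_users(users):
    # Tag each value with its source so the reducer can tell the datasets apart.
    for user_id, name in users:
        yield user_id, ("U", name)

def map_orders(orders):
    for user_id, item in orders:
        yield user_id, ("O", item)

def reduce_join(tagged_values):
    # Pair every user record with every order record that shares the key.
    names = [value for tag, value in tagged_values if tag == "U"]
    items = [value for tag, value in tagged_values if tag == "O"]
    return [(name, item) for name in names for item in items]

def join(users, orders):
    # Shuffle: group the tagged values of both datasets by the join key.
    groups = defaultdict(list)
    for key, value in list(map_users(users)) + list(map_orders(orders)):
        groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend((key, name, item) for name, item in reduce_join(groups[key]))
    return result

# Hypothetical datasets keyed by user id.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (4, "desk")]
joined = join(users, orders)
print(joined)
```

Keys that appear in only one dataset (user 3 and the order for user 4 here) produce no output, which matches an inner join in an RDBMS.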

In the next posts, we'll see in more detail how to deal with these patterns.

from: http://andreaiacono.blogspot.com/2014/03/mapreduce-patterns.html