MapReduce patterns

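This post introduces three common patterns in MapReduce programming: summarization, filtering, and structural operations, and describes the typical uses of each.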


Having modified and run a job in the last post, we can now look at the patterns we encounter most frequently in MapReduce programming.
Although there are many of them, I think the most important ones are:

  • Summarization
  • Filtering
  • Structural

Let's examine them in detail. 

Summarization 
By summarization we mean all the jobs that compute an aggregate, usually numerical, over a set of data, such as:

  • indexing
  • mean (or other statistical functions) computation
  • min/max computation
  • count (we've seen the WordCount example)
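As a rough illustration of the summarization pattern, here is a minimal sketch in plain Python that computes count, min, max and mean per key. It does not use the Hadoop API: `run_job` is a hypothetical in-memory stand-in for the framework (map, group by key, reduce), and the station/temperature records are made-up sample data.

```python
from collections import defaultdict


def map_reading(record):
    """Map phase: emit one (key, value) pair per record, here (station, temperature)."""
    station, temperature = record
    yield station, temperature


def reduce_stats(station, temperatures):
    """Reduce phase: collapse all the values of one key into a small summary."""
    values = list(temperatures)
    return station, {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def run_job(records, mapper, reducer):
    """Hypothetical stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Made-up sample data: (station, temperature) pairs.
readings = [("rome", 21.0), ("oslo", 4.0), ("rome", 25.0), ("oslo", 8.0), ("rome", 20.0)]
summary = run_job(readings, map_reading, reduce_stats)
```

In a real job, count, min and max can be pre-aggregated on the map side because they combine associatively, while a mean has to be carried as a (sum, count) pair until the final step.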


Filtering 
Filtering is the act of retrieving only a subset of a bigger dataset. The most common cases are retrieving all the data belonging to a single user, or the top-N elements of the dataset by some criterion. Another frequent use of filtering is sampling: when we're dealing with a lot of data, it is usually a good idea to pick a random subset of the original data and use it to verify the behaviour of our job.
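The three filtering cases mentioned here (one user's records, top-N, random sampling) can be sketched in plain Python as follows. This is only an illustration, not Hadoop code: the function names, the record layout and the idea of passing the input as a list of `splits` are all assumptions made for the example.

```python
import heapq
import random
from itertools import chain


def filter_by_user(records, user):
    """Map-only filter: each record is either kept as-is or dropped."""
    return [r for r in records if r["user"] == user]


def sample(records, fraction, seed=0):
    """Random sampling: keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    return [r for r in records if rng.random() < fraction]


def top_n(splits, n, key):
    """Top-N: every mapper keeps its local top n, a single reducer merges them."""
    local_tops = [heapq.nlargest(n, split, key=key) for split in splits]
    return heapq.nlargest(n, chain.from_iterable(local_tops), key=key)


# Made-up sample data, divided into two input splits.
splits = [
    [{"user": "alice", "score": 7}, {"user": "bob", "score": 3}],
    [{"user": "alice", "score": 9}, {"user": "carol", "score": 5}],
]
records = list(chain.from_iterable(splits))

alice_only = filter_by_user(records, "alice")
best_two = top_n(splits, 2, key=lambda r: r["score"])
subset = sample(records, 0.5)
```

Keeping only a local top n in each mapper means the reducer sees at most n records per split, which is why this pattern tends to be cheap even on large inputs.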

Structural 
Structural patterns apply when you need to operate on the structure of the data. The most common case is a join across different datasets, like the ones we're used to in an RDBMS.
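One common way to express such a join is a reduce-side join: each mapper tags its records with the dataset they came from, and the reducer pairs up the records that share a key. The sketch below simulates this in plain Python under stated assumptions: the `users` and `orders` datasets, their field names and the `run_join` driver are hypothetical, and the result is an inner join.

```python
from collections import defaultdict


def map_user(user):
    """Tag each user record with its origin and emit it under the join key."""
    yield user["id"], ("user", user["name"])


def map_order(order):
    """Tag each order record with its origin and emit it under the same join key."""
    yield order["user_id"], ("order", order["item"])


def reduce_join(user_id, tagged_values):
    """Split the values of one key by tag, then pair every user with every order."""
    names = [value for tag, value in tagged_values if tag == "user"]
    items = [value for tag, value in tagged_values if tag == "order"]
    return [(user_id, name, item) for name in names for item in items]


def run_join(users, orders):
    """Hypothetical driver: run both mappers, group by key, reduce each group."""
    groups = defaultdict(list)
    for user in users:
        for key, value in map_user(user):
            groups[key].append(value)
    for order in orders:
        for key, value in map_order(order):
            groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


# Made-up sample data: user 2 has no orders, and one order has no matching user.
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]
joined = run_join(users, orders)
```

Keys that appear in only one dataset produce no output here; emitting them with a placeholder for the missing side would turn this into an outer join.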

In the next posts, we'll see in more detail how to deal with these patterns.

from: http://andreaiacono.blogspot.com/2014/03/mapreduce-patterns.html
