Difference between 0 reducers and the identity reducer

License: CC 4.0 BY-SA

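This article explains the difference between 0 reducers and the identity reducer in Hadoop, and when to use each. With 0 reducers the reduce step is skipped and the mapper output is the final result; with the identity reducer the results are still sorted, but no aggregation is performed. It also covers using the identity reducer to combine all results into a chosen number of output files.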

0 reducers means the reduce step is skipped and the mapper output is the final output.
The identity reducer means shuffling and sorting still take place, but each record is passed through unchanged.

If you do not need the map results sorted, set the number of reducers to 0; the job is then called a map-only job.

If you need the map results sorted but do not need any aggregation, choose the identity reducer.
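The two modes can be contrasted with a small plain-Java simulation. This is only a sketch and does not use the Hadoop API: the class `ReducerModeSketch` and its methods are hypothetical names, records are reduced to bare string keys, and a single reducer is assumed. In a real job the choice is made on the job configuration, for example with `Job.setNumReduceTasks(0)` for a map-only job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the two modes; not Hadoop code.
public class ReducerModeSketch {

    // 0 reducers (map-only): each mapper's output is the final output,
    // one "file" per mapper, with no shuffle and no sort.
    public static List<List<String>> mapOnly(List<List<String>> mapperOutputs) {
        List<List<String>> files = new ArrayList<>();
        for (List<String> output : mapperOutputs) {
            files.add(new ArrayList<>(output));
        }
        return files;
    }

    // Identity reducer (one reducer assumed): records from all mappers are
    // shuffled to the reducer, sorted by key, and emitted unchanged.
    public static List<String> identityReduce(List<List<String>> mapperOutputs) {
        List<String> merged = new ArrayList<>();
        for (List<String> output : mapperOutputs) {
            merged.addAll(output);
        }
        Collections.sort(merged); // stands in for the shuffle/sort phase
        return merged;            // no aggregation: every record survives as-is
    }

    public static void main(String[] args) {
        List<List<String>> mapperOutputs = List.of(
                List.of("pear", "apple"),
                List.of("fig", "banana"));

        // prints [[pear, apple], [fig, banana]]
        System.out.println(mapOnly(mapperOutputs));
        // prints [apple, banana, fig, pear]
        System.out.println(identityReduce(mapperOutputs));
    }
}
```

The map-only result keeps each mapper's records in the order they were produced, while the identity-reducer result is a single sorted stream containing the same records.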

Another use case for the identity reducer is to combine all the results into <# of reducers> output files. This can be handy if you are using Amazon Web Services to write to S3 directly, especially if the mapper output is small (e.g. a grep/search for a record) and you have a lot of mappers (e.g. thousands), since a map-only job would otherwise write one small output file per mapper.
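The effect on the number of output files can be sketched the same way. Again this is plain Java rather than Hadoop code, and `OutputFileCountSketch`, `partition`, and `identityReduce` are hypothetical names. The partition formula is modeled on hash partitioning of the key modulo the reducer count, which is assumed here to match how records are routed to reducers by default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of output file counts; not Hadoop code.
public class OutputFileCountSketch {

    // Route a key to one of numReducers partitions (assumed hash partitioning).
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Identity reduce with numReducers reducers: one sorted output "file"
    // per reducer, no matter how many mappers produced the records.
    public static List<List<String>> identityReduce(
            List<List<String>> mapperOutputs, int numReducers) {
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            files.add(new ArrayList<>());
        }
        for (List<String> output : mapperOutputs) {
            for (String key : output) {
                files.get(partition(key, numReducers)).add(key);
            }
        }
        for (List<String> file : files) {
            Collections.sort(file);
        }
        return files;
    }

    public static void main(String[] args) {
        // 1000 mappers, most of which find nothing (a grep-like job).
        List<List<String>> mapperOutputs = new ArrayList<>();
        for (int m = 0; m < 1000; m++) {
            List<String> output = new ArrayList<>();
            if (m % 100 == 0) {
                output.add("match-" + m);
            }
            mapperOutputs.add(output);
        }

        List<List<String>> files = identityReduce(mapperOutputs, 2);
        // prints "map-only output files: 1000"
        System.out.println("map-only output files: " + mapperOutputs.size());
        // prints "identity-reducer output files: 2"
        System.out.println("identity-reducer output files: " + files.size());
    }
}
```

With a map-only job, each of the 1000 mappers would write its own (mostly empty) file; with two identity reducers the same records end up in two files.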

References

http://stackoverflow.com/questions/10630447/hadoop-difference-between-0-reducer-and-identity-reducer