- Zero reducers means the reduce step is skipped entirely and the mapper output is the final output.
- An identity reducer means shuffling and sorting still take place; the reducer then passes every record through unchanged.
If you do not need the map results sorted, set the number of reducers to 0; such a job is called map-only.
If you need the map results sorted but do not need any aggregation, choose the identity reducer.
Another use case for the identity reducer is to combine all the results into <# of reducers> output files, whereas a map-only job writes one output file per mapper. This can be handy if you are using Amazon Web Services to write to S3 directly, especially if the mapper output is small (e.g. a grep/search for a record) and you have a lot of mappers (e.g. thousands).
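The difference can be sketched with a small plain-Java simulation. This is not Hadoop code: the class `ReducerModes` and its methods are hypothetical names made up for illustration, records are reduced to bare string keys, and `String.hashCode()` stands in for the key type's hash. In a real job the two modes are normally selected with `job.setNumReduceTasks(0)` for map-only, or by leaving the base `Reducer` class (whose default `reduce` passes records through) as the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReducerModes {

    // Map-only (0 reducers): every mapper's output becomes its own
    // output file, in the order the mapper emitted it. No shuffle, no sort.
    public static List<List<String>> mapOnly(List<List<String>> mapperOutputs) {
        List<List<String>> files = new ArrayList<>();
        for (List<String> out : mapperOutputs) {
            files.add(new ArrayList<>(out));
        }
        return files;
    }

    // Identity reducer: keys are partitioned across numReducers (same
    // formula as a hash partitioner), sorted within each partition, and
    // written unchanged. The result is exactly numReducers output files.
    public static List<List<String>> identityReduce(
            List<List<String>> mapperOutputs, int numReducers) {
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            files.add(new ArrayList<>());
        }
        for (List<String> out : mapperOutputs) {
            for (String key : out) {
                int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
                files.get(partition).add(key);
            }
        }
        for (List<String> file : files) {
            Collections.sort(file); // stands in for the shuffle/sort phase
        }
        return files;
    }

    public static void main(String[] args) {
        // Three hypothetical mappers with small, unsorted outputs.
        List<List<String>> mappers = Arrays.asList(
                Arrays.asList("b", "a"),
                Arrays.asList("d"),
                Arrays.asList("c", "a"));

        System.out.println("map-only files:    " + mapOnly(mappers));
        System.out.println("identity, 1 file:  " + identityReduce(mappers, 1));
        System.out.println("identity, 2 files: " + identityReduce(mappers, 2));
    }
}
```

With many mappers and a small number of reducers, the simulated identity path collapses the per-mapper outputs into a handful of sorted files, which is the consolidation effect described above.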
References
http://stackoverflow.com/questions/10630447/hadoop-difference-between-0-reducer-and-identity-reducer
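This article explains the difference between 0 reducers and the identity reducer in Hadoop, along with their use cases. With 0 reducers the reduce step is skipped and the mapper output is the final result; the identity reducer still sorts the results but performs no aggregation. It also describes using the identity reducer to merge all results into a specified number of output files.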