Optimizing Map/Reduce with MongoDB

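This article looks at common performance problems with Map/Reduce in MongoDB 1.8 and earlier, and shows how setting an appropriate sort parameter improves efficiency. In particular, it explains how to choose an input sort key that matches the emit key, and why that key needs to be indexed.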


I’ve come across several users who experience poor performance when using Map/Reduce with MongoDB version 1.8 and older, and it turns out that in many cases it is easily fixable. Today I will focus on the “sort” parameter of the MapReduce command, which is often overlooked but critical.

Here is how M/R works in the general case, assuming there is no query filter:

  • mongod does a full table scan in natural order, going through every document in the collection.
  • for each document, map() is called, which emits a document like {_id: key, value: val} that gets stored in an in-memory map (tree).
  • every 100 records, mongod checks whether the size of the map has grown beyond 50KB; if it has, it runs reduce on ALL current keys. If the size of the map is still over 100KB after that, it dumps all current documents to disk in an “incremental” (inc) collection.
  • when all mapping is done, it reads the documents back from the inc collection sorted by _id, and does the final reduce.

Now if you have many documents and the key distribution is fairly random, the following can happen: every document gets inserted into the map, but the intermediate reduces accomplish little because the same key rarely repeats within a batch, so most documents end up in the “inc” collection on disk and have to be read back in order. The particular issue to understand is that since mongod has no idea which key you will emit, it cannot presort the data to make this efficient.
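To see why key order matters in this scheme, here is a toy model of the map phase in plain JavaScript. It is only a sketch under stated assumptions: the function name simulateMapPhase, the option names, and the thresholds (counted in emitted values, not bytes) are all made up for illustration and are not mongod's real internals.

```javascript
// Toy model of the map phase described above. Sizes are counted in
// emitted values rather than bytes, and all three thresholds are
// illustrative stand-ins, not mongod's real limits.
function simulateMapPhase(docs, options) {
  var opts = options || {};
  var checkEvery = opts.checkEvery || 100; // how often the map size is checked
  var reduceAt = opts.reduceAt || 50;      // above this, reduce all keys in memory
  var spillAt = opts.spillAt || 100;       // still above this, dump to "inc"

  // Counting reduce: sums the values emitted for one key.
  var reduce = function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  };

  var inMemory = new Map(); // emit key -> array of values
  var inc = [];             // stands in for the on-disk "inc" collection
  var spilledEarly = 0;     // values dumped before mapping finished

  function size() {
    var n = 0;
    inMemory.forEach(function (values) { n += values.length; });
    return n;
  }
  function reduceAll() {
    inMemory.forEach(function (values, key) {
      if (values.length > 1) inMemory.set(key, [reduce(key, values)]);
    });
  }
  function spill() {
    var count = 0;
    inMemory.forEach(function (values, key) {
      values.forEach(function (value) {
        inc.push({ _id: key, value: value });
        count += 1;
      });
    });
    inMemory = new Map();
    return count;
  }

  docs.forEach(function (doc, i) {
    // map(): emit(doc.key, 1)
    if (!inMemory.has(doc.key)) inMemory.set(doc.key, []);
    inMemory.get(doc.key).push(1);

    if ((i + 1) % checkEvery === 0 && size() > reduceAt) {
      reduceAll();
      if (size() > spillAt) spilledEarly += spill();
    }
  });

  // Mapping is done: flush the rest, read "inc" back by _id, final reduce.
  spill();
  inc.sort(function (a, b) { return a._id - b._id; });
  var result = new Map();
  inc.forEach(function (entry) {
    var soFar = result.has(entry._id) ? [result.get(entry._id)] : [];
    result.set(entry._id, reduce(entry._id, soFar.concat(entry.value)));
  });
  return { result: result, spilledEarly: spilledEarly };
}

// 10,000 documents over 1,000 distinct keys, 10 documents per key.
var sortedDocs = [];
var scatteredDocs = [];
for (var i = 0; i < 10000; i++) {
  sortedDocs.push({ key: Math.floor(i / 10) });    // equal keys are adjacent
  scatteredDocs.push({ key: (i * 7919) % 1000 });  // equal keys are 1,000 docs apart
}

var sortedRun = simulateMapPhase(sortedDocs);
var scatteredRun = simulateMapPhase(scatteredDocs);
console.log("spilled early, sorted input:   ", sortedRun.spilledEarly);
console.log("spilled early, scattered input:", scatteredRun.spilledEarly);
```

Both runs produce the same counts per key, but in this model the sorted input dumps far fewer values before mapping finishes, because each periodic reduce collapses a run of equal keys into one value. With scattered keys the periodic reduce finds nothing to merge, so almost everything goes to "inc".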

To fix this issue:

  • add an input sort key for the M/R job that is the same as the emit key.
  • make sure that key is indexed and works well with your query filter. Run a find() with the same query and sort, call explain() on it, and make sure it uses an index.
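The two steps above might look like this in the mongo shell. This is a minimal sketch, not a drop-in script: the collection name events, the emit field userId, and the output collection events_per_user are hypothetical, and the `typeof db` guard is only there so the file also loads outside the shell.

```javascript
// Hypothetical names throughout: "events" collection, "userId" emit key.
var mapFn = function () {
  emit(this.userId, 1);
};
var reduceFn = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
};

var query = {};               // use the same filter you pass to the M/R job
var sortSpec = { userId: 1 }; // same field as the emit key

var mapReduceOptions = {
  query: query,
  sort: sortSpec,             // input sort key for the M/R job
  out: "events_per_user"
};

// `db` only exists inside the mongo shell.
if (typeof db !== "undefined") {
  // ensureIndex is the shell helper of the 1.8 era; newer shells call it createIndex.
  db.events.ensureIndex(sortSpec);

  // Check that the same query and sort are served by an index before running the job.
  printjson(db.events.find(query).sort(sortSpec).explain());

  db.events.mapReduce(mapFn, reduceFn, mapReduceOptions);
}
```

If the explain() output shows the query walking the whole collection without an index, fix the index first; sorting on an unindexed key would add work instead of saving it.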

This can result in a 100x performance improvement in some cases. Note that in mongo 1.9 and above, some work has been done to improve performance:

  • the thresholds for running reduces or dumping to disk have been increased.
  • there is a new “pure JS” mode that can be very fast for light jobs.
  • the JS engine interface has been optimized.
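As a rough sketch of how the “pure JS” mode is switched on: I am assuming here that it corresponds to the jsMode option of mapReduce, which keeps intermediate data as JavaScript objects between map and reduce. The collection and field names are hypothetical, and the MongoDB documentation notes a limit on the number of distinct emit keys in this mode, so check it against your data before relying on it.

```javascript
// Hypothetical names: "events" collection, "userId" emit key.
var jsModeOptions = {
  sort: { userId: 1 },    // still sort on the emit key
  out: "events_per_user",
  jsMode: true            // assumed to be the "pure JS" mode mentioned above
};

// `db` only exists inside the mongo shell.
if (typeof db !== "undefined") {
  db.events.mapReduce(
    function () { emit(this.userId, 1); },
    function (key, values) {
      var total = 0;
      for (var i = 0; i < values.length; i++) total += values[i];
      return total;
    },
    jsModeOptions
  );
}
```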

But in any case mongod is still not aware of your emit key, so use sort!

cheers

AG
