MongoDB regular expression with indexed field

最新推荐文章于 2024-06-04 12:00:00 发布

SalmonellaVaccine

最新推荐文章于 2024-06-04 12:00:00 发布

阅读量729

点赞数

分类专栏： MongoDB

MongoDB 专栏收录该内容

23 篇文章

订阅专栏

本文探讨了在MongoDB中使用索引与正则表达式查询时的性能差异原因。通过对比带索引与不带索引的情况，解释了为何有时带索引查询可能比不带索引查询慢，并提出了提高查询效率的方法。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

http://stackoverflow.com/questions/8107102/mongodb-regular-expression-with-indexed-field

I was creating my first app using MongoDB. Created index for a field, and tried a find query with $regex param, launched in a shell

> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
        "cursor" : "BtreeCursor A_1 multi",
        "nscanned" : 500001,
        "nscannedObjects" : 10,
        "n" : 10,
        "millis" : 956,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {
                "A" : [
                        [
                                "",
                                {

                                }
                        ],
                        [
                                /BLABLA!25500[0-9]/,
                                /BLABLA!25500[0-9]/
                        ]
                ]
        }
}

It's very strange, because when i'm launching the same query, but with no index in collection, the performance is much better.

> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 500002,
        "nscannedObjects" : 500002,
        "n" : 10,
        "millis" : 531,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

Obviously, searching a field with index without regex is working much faster(i.e. searching document with constant field) , but i'm really interested in reason of such behavior.

The reason for the performance differential here is likely that, with the index enabled, your query must traverse the index (load into memory), then load the matching documents to be returned into memory also. Since you are not using the prefix query all values in the index will be scanned and tested against the regular expression. Not very efficient.

When you remove the index you are just doing a table scan and matching the regex there - essentially you simplified things from the first one slightly.

You might be able to make the indexed version quicker if it were a covered index query, it would also likely be faster if this were a compound index and you needed to combine it with the criteria for another field.

When you use a prefix query, it's not that it only uses an index then, but you use the index efficiently, which is key, and hence you see the real performance gains.