MongoDB: Pagination and Indexes


第十人



Article link: https://blog.youkuaiyun.com/bestlove1990/article/details/25803425



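One advantage of MongoDB as a NoSQL database is query speed on large data volumes. Once you talk about queries, pagination comes up right away, and queries have a constant companion: indexes.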

The optimizations are as follows:

1. Avoid skip():

The official documentation explains:

The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return result. As offset (e.g. pageNumber above) increases, cursor.skip() will become slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.

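In other words: with skip(), the server has to walk through everything up to the offset before it returns any results. Every document before the skip position is still read and then discarded, so the amount of work grows with the page number.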
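The cost can be made concrete with a minimal in-memory sketch in plain JavaScript, the mongo shell's language. A sorted array stands in for an index on _id. skipPage and rangePage are hypothetical helpers, not MongoDB APIs. skipPage walks from the start the way skip() does. rangePage illustrates range-based paging, a commonly used alternative: it remembers the last _id of the previous page and seeks past it. This assumes the sort key is unique.

```javascript
// In-memory model: documents sorted by _id stand in for an index on _id.
// skipPage / rangePage are hypothetical helpers, not MongoDB APIs.
function makeDocs(n) {
  const docs = [];
  for (let i = 1; i <= n; i++) docs.push({ _id: i });
  return docs;
}

// skip()-style paging: walk from the start up to the offset, then take `size`.
function skipPage(docs, pageNumber, size) {
  const offset = pageNumber * size;
  let walked = 0;
  const out = [];
  for (const doc of docs) {
    walked++;
    if (walked <= offset) continue; // entries read only to be thrown away
    out.push(doc);
    if (out.length === size) break;
  }
  return { docs: out, walked };
}

// Range-based paging: binary-search (like an index seek) for the first
// _id greater than lastId, then read only `size` entries.
function rangePage(docs, lastId, size) {
  let lo = 0;
  let hi = docs.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (docs[mid]._id <= lastId) lo = mid + 1;
    else hi = mid;
  }
  const out = docs.slice(lo, lo + size);
  return { docs: out, walked: out.length };
}

const docs = makeDocs(100000);
const a = skipPage(docs, 1000, 50);  // page 1000, 50 per page
const b = rangePage(docs, 50000, 50); // same page, previous page ended at _id 50000
console.log(a.walked, b.walked); // 50050 50
```

Both calls return the same 50 documents. The skip version touches 50050 entries, while the range version touches 50. In the shell, the range idea corresponds roughly to `find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(50)`, where lastId (a hypothetical variable) is the last _id seen on the previous page.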

2. Put the sort field in the index:

I could not find the official explanation again, but the reason is as follows:

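If the sort field is not in the index, MongoDB has to read every document that matches the query and sort them in memory. The limit does not reduce the number of documents read. This is clearly very expensive.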
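A toy model of the difference follows. It is not MongoDB internals. The field names seq and status and the data are hypothetical. The point it models is how many documents are examined.

- In the blocking sort, every candidate must be examined before the first result is known, however small the limit is.
- In the index-order walk, candidates already arrive in sort order, so the scan can stop as soon as `limit` documents pass the remaining filter.

```javascript
// Toy model, not MongoDB internals. `candidates` are the documents selected by
// the indexed part of the query; `passesRest` is the non-indexed remainder.
function blockingSort(candidates, passesRest, key, limit) {
  let examined = 0;
  const matches = [];
  for (const doc of candidates) {
    examined++; // limit cannot stop this loop: any doc might sort first
    if (passesRest(doc)) matches.push(doc);
  }
  matches.sort((x, y) => x[key] - y[key]); // in-memory sort
  return { docs: matches.slice(0, limit), examined };
}

function indexOrderWalk(candidatesInSortOrder, passesRest, limit) {
  let examined = 0;
  const docs = [];
  for (const doc of candidatesInSortOrder) {
    examined++;
    if (passesRest(doc)) docs.push(doc);
    if (docs.length === limit) break; // entries arrive sorted, so stop early
  }
  return { docs, examined };
}

// Hypothetical data: 10,000 candidates, every other one passes the filter.
const inKeyOrder = [];
for (let i = 0; i < 10000; i++) inKeyOrder.push({ seq: i, status: i % 2 });
const passesRest = (doc) => doc.status === 0;

// Blocking case: the index used returns candidates in some other order.
const blocking = blockingSort([...inKeyOrder].reverse(), passesRest, "seq", 50);
const streamed = indexOrderWalk(inKeyOrder, passesRest, 50);
console.log(blocking.examined, streamed.examined); // 10000 99
```

Both approaches return the same 50 documents. The pattern mirrors the two explain() outputs in this post: 55461 documents scanned with scanAndOrder true, against 93 scanned when the index order matches the sort.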

My own test:

Query: db.order.find({ "Id" : 1, "refStatus" : 0, "orderTime" : { "$gte" : ISODate("2014-03-01T16:00:00Z"), "$lt" : ISODate("2014-03-02T16:00:00Z") }, "clearStatus" : 1 }).limit(50).sort({ "_id" : 1 }).explain();

explain() output:
{
  "cursor" : "BtreeCursor Id_1_orderTime_-1",
  "isMultiKey" : false,
  "n" : 50,
  "nscannedObjects" : 55461,
  "nscanned" : 55461,
  "nscannedObjectsAllPlans" : 221846,
  "nscannedAllPlans" : 221846,
  "scanAndOrder" : true,
  "indexOnly" : false,
  "nYields" : 1,
  "nChunkSkips" : 0,
  "millis" : 1323,
  "indexBounds" : {
    "Id" : [
      [1, 1]
    ],
    "orderTime" : [
      [
        ISODate("2014-03-02T16:00:00Z"),
        ISODate("2014-03-01T16:00:00Z")
      ]
    ]
  },
  "server" : "master:37017"
}

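The explain() output above shows that the _id index was not used for the sort at all ("scanAndOrder" : true). As a result, every document in the indexed range (nscanned: 55461) had to be read to return 50 results.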

Now try it another way, sorting on a field that is part of the index being used:

Query:

db.union_order.find({ "unionId" : 1, "refStatus" : 0, "orderTime" : { "$gte" : ISODate("2014-03-01T16:00:00Z"), "$lt" : ISODate("2014-03-02T16:00:00Z") }, "clearStatus" : 1 }).limit(50).sort({ "orderTime" : -1 }).explain()

explain() output:

{
  "cursor" : "BtreeCursor unionId_1_orderTime_-1",
  "isMultiKey" : false,
  "n" : 50,
  "nscannedObjects" : 93,
  "nscanned" : 93,
  "nscannedObjectsAllPlans" : 278,
  "nscannedAllPlans" : 278,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 2,
  "indexBounds" : {
    "unionId" : [
      [1, 1]
    ],
    "orderTime" : [
      [
        ISODate("2014-03-02T16:00:00Z"),
        ISODate("2014-03-01T16:00:00Z")
      ]
    ]
  },
  "server" : "master:37017"
}

Note: in this data set, orderTime and _id follow exactly the same ordering, so the two sorts are comparable.

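Comparing the two queries shows the difference. When the sort field is not in the index, every matching document has to be read. When it is in the index, very little data is read: nscanned drops from 55461 to 93, and the time from 1323 ms to 2 ms.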
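The comparison can be condensed into a quick check on the explain() fields. This is a small sketch. summarizeExplain is a hypothetical helper. Its field names (scanAndOrder, nscanned, n, millis) are taken from the legacy MongoDB 2.x outputs above. Newer servers report executionStats instead, with fields such as totalKeysExamined and nReturned, so the helper only fits the old format.

```javascript
// Hypothetical helper for the legacy (2.x) explain() format shown above.
function summarizeExplain(explain) {
  return {
    inMemorySort: explain.scanAndOrder,               // true = blocking sort in RAM
    scannedPerReturned: explain.nscanned / explain.n, // close to 1 is ideal
    millis: explain.millis,
  };
}

// Values copied from the two explain() outputs in this post.
const sortById = { n: 50, nscanned: 55461, scanAndOrder: true, millis: 1323 };
const sortByOrderTime = { n: 50, nscanned: 93, scanAndOrder: false, millis: 2 };

console.log(summarizeExplain(sortById));
console.log(summarizeExplain(sortByOrderTime));
```

A high scanned-per-returned ratio combined with scanAndOrder: true is the signature of a sort that the index is not serving.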

This is also covered in the blog post "10gen工程师谈MongoDB组合索引的优化" (A 10gen engineer on optimizing MongoDB compound indexes).

3. An index that contains the sort field will not necessarily support the sort:

The official documentation explains:

For single-field indexes, the sort order of keys doesn’t matter because MongoDB can traverse the index in either direction. However, for compound indexes, sort order can matter in determining whether the index can support a sort operation.

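In other words: for a single-field index, the direction of the key does not matter. For a compound index, however, the sort direction determines whether the index can support the sort operation.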

For example, given the index: db.events.ensureIndex( { "username" : 1, "date" : -1 } )

the index can support the sort in both of the following queries:

db.events.find().sort( { username: -1, date: 1 } )

db.events.find().sort( { username: 1, date: -1 } )

But it cannot support the sort in this one:

db.events.find().sort( { username: 1, date: 1 } )
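The rule behind these three examples can be written as a small check. The following is a sketch, not MongoDB's planner. indexSupportsSort is a hypothetical helper that ignores the query filter entirely. It assumes two conditions:

- The sort keys must be a prefix of the index keys.
- Either every sort direction matches the index, or every one is reversed. That way a single forward or backward traversal of the index yields the requested order.

```javascript
// Hypothetical helper, not a MongoDB API. Specs are shell-style objects such as
// { username: 1, date: -1 }; Object.entries keeps their key order.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length > indexKeys.length) return false;
  let walk = 0; // 1 = traverse index forward, -1 = backward, 0 = undecided
  for (let i = 0; i < sortKeys.length; i++) {
    const [indexField, indexDir] = indexKeys[i];
    const [sortField, sortDir] = sortKeys[i];
    if (indexField !== sortField) return false; // sort must be an index prefix
    const dir = indexDir === sortDir ? 1 : -1;
    if (walk === 0) walk = dir;
    else if (dir !== walk) return false; // mixed directions: no single traversal works
  }
  return true;
}

const index = { username: 1, date: -1 };
console.log(indexSupportsSort(index, { username: -1, date: 1 })); // true (walk backward)
console.log(indexSupportsSort(index, { username: 1, date: -1 })); // true (walk forward)
console.log(indexSupportsSort(index, { username: 1, date: 1 }));  // false
```

With a single-field index there is only one direction to compare, so the check always succeeds. This matches the documentation's remark that key order does not matter for single-field indexes.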

One question remained for me: can the index support the sort in the following query?

db.events.find({ username: "a", date: { "$gte" : ISODate("2014-03-01T16:00:00Z"), "$lt" : ISODate("2014-03-02T16:00:00Z") }}).sort( { date: 1 } )

My guess was that it can, for the following reason:

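sort( { date: 1 } ) can in effect be served as { username: -1, date: 1 }, that is, by walking the index backward. This works because username is matched against a single value rather than a range, so I assumed MongoDB would be smart enough to handle it.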
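This reasoning matches what the MongoDB documentation describes as sorting on a non-prefix subset of an index. Index keys that come before the sort keys may be skipped, provided the query has equality conditions on them. The following is a hedged sketch of that documented rule only. The helper name is hypothetical, and the real query planner handles more cases than this.

```javascript
// Hypothetical helper (not a MongoDB API) for the "non-prefix subset" rule:
// leading index keys may be skipped only if the query pins them by equality.
function indexSupportsSortWithEquality(indexSpec, sortSpec, equalityFields) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length === 0) return true;
  let start = 0;
  // Skip leading index keys fixed to a single value by an equality condition.
  while (start < indexKeys.length &&
         indexKeys[start][0] !== sortKeys[0][0] &&
         equalityFields.includes(indexKeys[start][0])) {
    start++;
  }
  if (start + sortKeys.length > indexKeys.length) return false;
  let walk = 0; // 1 = forward, -1 = backward, 0 = undecided
  for (let i = 0; i < sortKeys.length; i++) {
    const [indexField, indexDir] = indexKeys[start + i];
    const [sortField, sortDir] = sortKeys[i];
    if (indexField !== sortField) return false;
    const dir = indexDir === sortDir ? 1 : -1;
    if (walk === 0) walk = dir;
    else if (dir !== walk) return false;
  }
  return true;
}

const eventsIndex = { username: 1, date: -1 };
// The query in question: equality on username, range on date, sort on date.
console.log(indexSupportsSortWithEquality(eventsIndex, { date: 1 }, ["username"])); // true
// Without the equality condition on username, the same sort is not supported.
console.log(indexSupportsSortWithEquality(eventsIndex, { date: 1 }, [])); // false
```

Under this rule the answer to the question is yes, which agrees with the explain() result in the update that follows.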

update @ 20140515

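Checking with explain(), I found that the sort is supported. The test ran on the union_order collection, whose index unionId_1_orderTime_-1 has the same shape.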

The explain() output:

"cursor" : "BtreeCursor unionId_1_orderTime_-1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 45154,
"nscanned" : 45154,
"nscannedObjectsAllPlans" : 46513,
"nscannedAllPlans" : 46513,
"scanAndOrder" : false,
"indexOnly" : false,