MongoDB : Aggregation mongo聚合操作详细说明

本文介绍了MongoDB中聚合操作的基本概念及使用方法,包括计数(count)、获取唯一值(distinct)和分组(group)等功能,并提供了多个示例来演示如何在不同场景下应用这些功能。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

This page last changed on Feb 02, 2012 by dwight.

Mongo includes utility functions which provide server-side count, distinct, and group by operations.  More advanced aggregate functions can be crafted using MapReduce.

Aggregation Framework

v2.2+

This is a new module in the v2.2 release.  You can test it via v2.1.0 in the 2.1 development (unstable) branch. Aggregation Framework

Count

count() returns the number of objects in a collection or matching a query. If a document selector is provided, only the number of matching documents will be returned.

size() is like count() but takes into consideration any limit() or skip() specified for the query.

db.collection.count(selector);

For example:

print( "# of objects: " + db.mycollection.count() );
print( db.mycollection.count( {active:true} );

count is faster if an index exists for the condition in the selector. For example, to make the count on active fast, invoke

db.mycollection.ensureIndex( {active:1} );

Distinct

The distinct command returns returns a list of distinct values for the given key across a collection. 

Command is of the form:

{ distinct : <collection_name>, key : <key>[, query : <query>] }

although many drivers have a helper function for distinct.

> db.addresses.insert({"zip-code": 10010})
> db.addresses.insert({"zip-code": 10010})
> db.addresses.insert({"zip-code": 99701})

> // shell helper:
> db.addresses.distinct("zip-code");
[ 10010, 99701 ]

> // running as a command manually:
> db.runCommand( { distinct: 'addresses', key: 'zip-code' } )
{ "values" : [ 10010, 99701 ], "ok" : 1 }

distinct may also reference a nested key:

> db.comments.save({"user": {"points": 25}})
> db.comments.save({"user": {"points": 31}})
> db.comments.save({"user": {"points": 25}})

> db.comments.distinct("user.points");
[ 25, 31 ]

You can add an optional query parameter to distinct as well

> db.address.distinct( "zip-code" , { age : 30 } )

Note: the distinct command results are returned as a single BSON object.  If the results could be large (> max document size – 4/16MB ), use map/reduce instead.

Covered Index Use
Starting with 1.7.3 distinct can use an index not only to find the documents in the query, but also to return the data.

Group

Note: currently one must use map/reduce instead of group() in sharded MongoDB configurations.

group returns an array of grouped items. The command is similar to SQL's group by. The SQL statement

select a,b,sum(c) csum from coll where active=1 group by a,b

corresponds to the following in MongoDB:

db.coll.group(
           {key: { a:true, b:true },
            cond: { active:1 },
            reduce: function(obj,prev) { prev.csum += obj.c; },
            initial: { csum: 0 }
            });

Note: the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception. For larger grouping operations without limits, please use map/reduce .

group takes a single object parameter containing the following fields:

  • key: Fields to group by.
  • reduce: The reduce function aggregates (reduces) the objects iterated. Typical operations of a reduce function include summing and counting. reduce takes two arguments: the current document being iterated over and the aggregation counter object. In the example above, these arguments are named obj and prev.
  • initial: initial value of the aggregation counter object.
  • keyf: An optional function returning a "key object" to be used as the grouping key. Use this instead of key to specify a key that is not a single/multiple existing fields. Could be used to group by day of the week, for example. Set in lieu of key.
  • cond: An optional condition that must be true for a row to be considered. This is essentially a find() query expression object. If null, the reduce function will run against all rows in the collection.
  • finalize: An optional function to be run on each item in the result set just before the item is returned. Can either modify the item (e.g., add an average field given a count and a total) or return a replacement object (returning a new object with just _id and average fields). See jstests/group3.js for examples.

To order the grouped data, simply sort it client-side upon return. The following example is an implementation of count() using group().

function gcount(collection, condition) {
  var res =
    db[collection].group(
           { key: {},
             initial: {count: 0},
             reduce: function(obj,prev){ prev.count++;},
             cond: condition } );
  // group() returns an array of grouped items.  here, there will be a single
  // item, as key is {}.
  return res[0] ? res[0].count : 0;
}

Examples

The examples assume data like this:

{ domain: "www.mongodb.org"
, invoked_at: {d:"2009-11-03", t:"17:14:05"}
, response_time: 0.05
, http_action: "GET /display/DOCS/Aggregation"
}

Show me stats for each http_action in November 2009:

db.test.group(
   { cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
   , key: {http_action: true}
   , initial: {count: 0, total_time:0}
   , reduce: function(doc, out){ out.count++; out.total_time+=doc.response_time }
   , finalize: function(out){ out.avg_time = out.total_time / out.count }
   } );

[
  {
    "http_action" : "GET /display/DOCS/Aggregation",
    "count" : 1,
    "total_time" : 0.05,
    "avg_time" : 0.05
  }
]

Show me stats for each domain for each day in November 2009:

db.test.group(
   { cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
   , key: {domain: true, invoked_at.d: true}
   , initial: {count: 0, total_time:0}
   , reduce: function(doc, out){ out.count++; out.total_time+=doc.response_time }
   , finalize: function(out){ out.avg_time = out.total_time / out.count }
   } );

[
  {
    "http_action" : "GET /display/DOCS/Aggregation",
    "count" : 1,
    "total_time" : 0.05,
    "avg_time" : 0.05
  }
]

Using Group from Various Languages

Some language drivers provide a group helper function.  For those that don't, one can manually issue the db command for group.  Here's an example using the Mongo shell syntax:

 > db.foo.find()
{"_id" :  ObjectId( "4a92af2db3d09cb83d985f6f")  , "x" : 1}
{"_id" :  ObjectId( "4a92af2fb3d09cb83d985f70")  , "x" : 3}
{"_id" :  ObjectId( "4a92afdab3d09cb83d985f71")  , "x" : 3}

> db.$cmd.findOne({group : {
... ns : "foo",
... cond : {},
... key : {x : 1},
... initial : {count : 0},
... $reduce : function(obj,prev){prev.count++;}}})
{"retval" : [{"x" : 1 , "count" : 1},{"x" : 3 , "count" : 2}] , "count" : 3 , "keys" : 2 , "ok" : 1}

If you use the database command with keyf (instead of key) it must be prefixed with a $. For example:

db.$cmd.findOne({group : {
... ns : "foo",
... $keyf : function(doc) { return {"x" : doc.x}; },
... initial : {count : 0},
... $reduce : function(obj,prev) { prev.count++; }}})

Map/Reduce

MongoDB provides a MapReduce facility for more advanced aggregation needs. CouchDB users: please note that basic queries in MongoDB do not use map/reduce.

See Also

site: http://api.mongodb.org/wiki/current/Aggregation.html#Aggregation-Group

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值