mongodb的聚合工具有四个:count,distinct,group,MapReduce,MapReduce内容有点多,本次先学习前三个。
Count
count是最简单的聚合工具,返回集合中的文档数量
>db.students.count(); 7200
不论集合有多大,count总是能很快的返回文档数量
也可以为count增加查询条件进行查询
db.students.count({"sex":"男"});
3598
distinct
distinct用来找出给定键的所有不同值。使用时必须指定集合和键
{ distinct: "<collection>", key: "<field>", query: <query> }
例如:
db.runCommand({"distinct":"students","key":"sex"})
结果:
{
"values" : [
"男",
"女"
],
"stats" : {
"n" : 7200,
"nscanned" : 7200,
"nscannedObjects" : 7200,
"timems" : 7,
"cursor" : "BasicCursor"
},
"ok" : 1
}
这样就获得了集合中sex键的所有的值,以数组的形式返回。
想要查询数学成绩在95分以上的学生的年龄,可以使用如下:
db.runCommand({"distinct":"students","key":"age","query":{"score.math":{"$gt":95}}}) 结果: { "values" : [ 20, 22, 18, 21, 19 ], "stats" : { "n" : 311, "nscanned" : 7200, "nscannedObjects" : 7200, "timems" : 31, "cursor" : "BasicCursor" }, "ok" : 1 }
group
选定分组所依据的键,而后mongodb会将集合依据所选定的兼职的不同分成若干组,然后通过聚合每一组内的文档,产生一个结果文档
{ group: { ns: <namespace>, key: <key>, initial:<initial document>, $reduce: <reduce function>, $keyf: <key function>, cond: <query>, finalize: <finalize function> } }
Field | Type | Description | 说明 |
---|---|---|---|
ns | String | The collection from which to perform the group by operation. | 指定要分组的集合 |
key | document | The field or fields to group. Returns a “key object” for use as the grouping key. | 指定文档分组依据的键 |
initial | document | Initializes the aggregation result document. | 叠加时的初始化文档 |
$reduce | function | An aggregation function that operates on the documents during the grouping operation. These functions may return a sum or a count. The function takes two arguments: the current document and an aggregation result document for that group. | 叠加时的逻辑处理函数 |
$keyf | function | Optional. Alternative to the key field. Specifies a function that creates a “key object” for use as the grouping key. Use $keyf instead of key to group by calculated fields rather than existing document fields. | |
cond | document | Optional. The selection criteria to determine which documents in the collection to process. If you omit the cond field, group processes all the documents in the collection for the group operation. | |
finalize | function | Optional. A function that runs each item in the result set before group returns the final value. This function can either modify the result document or replace the result document as a whole. Unlike the $keyf and $reduce fields that also specify a function, this field name is finalize, not $finalize. | 完成器。也就是最终文档的过滤函数 |
group这个一直是迷迷糊糊的,上网找了好多例子,慢慢分析,慢慢消化。
下面两个例子都是mongodb权威指南一本书上给的。
下面使用一个例子来消化group,先准备数据:
db.blog.insert({title:"J2EE实战",author:"li",day:"2012-12-12",tags:["java","J2EE","struts2","spring","hibernate"]}); db.blog.insert({title:"轻量级J2EE开发",author:"ligang",day:"2012-12-14",tags:["java","J2EE","struts2","spring","hibernate"]}); db.blog.insert({title:"疯狂Java",author:"May",day:"2012-12-16",tags:["java","J2SE"]}); db.blog.insert({title:"android开发实例",author:"WenDy",day:"2012-12-18",tags:["java","android","J2ME","移动开发"]}); db.blog.insert({title:"MongoDB权威指南",author:"coolcao",day:"2012-12-16",tags:["mongdb","nosql","数据库"]}); db.blog.insert({title:"srping-data-mongo",author:"lucy",day:"2012-12-16",tags:["java","spring-data","mongdb","数据库"]}); db.blog.insert({title:"struts2权威指南",author:"jack",day:"2012-12-12",tags:["java","J2EE","struts2","MVC"]}); db.blog.insert({title:"springMVC",author:"cate",day:"2012-12-18",tags:["java","J2EE","spring","MVC","springMVC"]}); db.blog.insert({title:"Oracle",author:"dog",day:"2012-12-12",tags:["数据库","Oracle"]}); db.blog.insert({title:"mysql",author:"zhu",day:"2012-12-14",tags:["数据库","mysql"]});
上面是一组博客的数据数据,其中tags表示的是一篇博客的相关标签。问题是要按照时间统计出每天的博客中,提到最多的标签tags是哪些
这里就要使用group
db.blog.group({ key:{"day":true}, initial:{tags:{}}, $reduce:function(doc,prev){ for(i in doc.tags){ if(doc.tags[i] in prev.tags){ prev.tags[doc.tags[i]]++; }else{ prev.tags[doc.tags[i]]=1; } } } }); 或: db.runCommand({ group:{ ns:"blog", key:{day:true}, initial:{tags:{}}, $reduce:function(doc,prev){ for(i in doc.tags){ if(doc.tags[i] in prev.tags){ prev.tags[doc.tags[i]]++; }else{ prev.tags[doc.tags[i]]=1; } } } } });
结果:/* 0 */ { "0" : { "day" : "2012-12-12", "tags" : { "java" : 2, "J2EE" : 2, "struts2" : 2, "spring" : 1, "hibernate" : 1, "MVC" : 1, "数据库" : 1, "Oracle" : 1 } }, "1" : { "day" : "2012-12-14", "tags" : { "java" : 1, "J2EE" : 1, "struts2" : 1, "spring" : 1, "hibernate" : 1, "数据库" : 1, "mysql" : 1 } }, "2" : { "day" : "2012-12-16", "tags" : { "java" : 2, "J2SE" : 1, "mongdb" : 2, "nosql" : 1, "数据库" : 2, "spring-data" : 1 } }, "3" : { "day" : "2012-12-18", "tags" : { "java" : 2, "android" : 1, "J2ME" : 1, "移动开发" : 1, "J2EE" : 1, "spring" : 1, "MVC" : 1, "springMVC" : 1 } } }
从例子中可以看出,例子里的查询语句中,统计了每天的每个tags的数量,按日期分组输出。
key即要分组的键,我们要按日期输出,那么分组的键就是day
initial,初始化的文档。group最终的输出是以文档的形式,在计算的过程中会采用“叠加”的方式进行统计。initial初始化的便是一个空的文档。用来“叠加”的最初的文档。
$reduce,这里是一个函数。在mongo中,使用的是js函数操作数据,因此一些复杂的操作,都是可以使用自定义函数来实现。$reduce便是分组统计时的逻辑实现函数。函数的两个参数doc是每次遍历时的文档,prev是前一次的文档。
拿上面的例子来讲,tags的值是一个数组,for循环遍历每个文档的tags值,如果此次遍历中的tags值在前一次遍历中出现过,那么前一次遍历的tags要加1,如果在前一次遍历中没有出现,那么把这个没有的tag放入文档,初始值为1。
如此遍历结束,便统计出了每天的博客的每个标签的出现次数。可能这里有点绕,对照着上面的group函数再思量一下。
返回的文档即最后的prev文档,其中的键,其实是由key和initial初始化的文档组成的。上面的例子中,key是day,initial初始化的数组是tags:{},因此最终输出的文档包含day,tags键
可是这样并没有达到我们的要求,我们只是想要每天出现次数最多的标签,看看相关博客哪方面是最热门的
db.blog.group({ key:{"day":true}, initial:{tags:{}}, $reduce:function(doc,prev){ for(i in doc.tags){ if(doc.tags[i] in prev.tags){ prev.tags[doc.tags[i]]++; }else{ prev.tags[doc.tags[i]]=1; } } }, finalize:function(prev){ var mostPopular = 0 ; for(i in prev.tags){ if(prev.tags[i]>mostPopular){ prev.tag = i; mostPopular = prev.tags[i]; } } delete prev.tags; } }); 或: db.blog.runCommand({ group:{ ns:"blog", key:{day:true}, initial:{tags:{}}, $reduce:function(doc,prev){ for(i in doc.tags){ if(doc.tags[i] in prev.tags){ prev.tags[doc.tags[i]]++; }else{ prev.tags[doc.tags[i]] = 1; } } }, finalize:function(prev){ var mostPopular = 0; for(i in prev.tags){ if(prev.tags[i]>mostPopular){ prev.tag = i; mostPopular = prev.tags[i]; } } delete prev.tags; } } });
结果:
/* 0 */
{
"retval" : [
{
"day" : "2012-12-12",
"tag" : "java"
},
{
"day" : "2012-12-14",
"tag" : "java"
},
{
"day" : "2012-12-16",
"tag" : "java"
},
{
"day" : "2012-12-18",
"tag" : "java"
}
],
"count" : 10,
"keys" : 4,
"ok" : 1
}
finalize便是最后的过滤函数。我们将每个标签按天统计出来后,再遍历统计结果,拿出出现次数最多的标签留下。如上面的结果
上面例子是我参照之前转的一篇博客中的例子,也和mongodb权威指南中的例子差不多,大家可以参考一下,自己想点别的例子实践一下。
下面是mongodb权威指南上的一个例子
db.stocks.insert({day:"2012-12-12",time:"2012-12-12 12:00:00",price:4.23});
db.stocks.insert({day:"2012-12-12",time:"2012-12-12 13:00:00",price:4.33});
db.stocks.insert({day:"2012-12-12",time:"2012-12-12 14:00:00",price:4.26});
db.stocks.insert({day:"2012-12-12",time:"2012-12-12 15:00:00",price:4.02});
db.stocks.insert({day:"2012-12-12",time:"2012-12-12 16:00:00",price:4.18});
db.stocks.insert({day:"2012-12-13",time:"2012-12-13 12:00:00",price:4.19});
db.stocks.insert({day:"2012-12-13",time:"2012-12-13 13:00:00",price:4.04});
db.stocks.insert({day:"2012-12-13",time:"2012-12-13 14:00:00",price:4.35});
db.stocks.insert({day:"2012-12-13",time:"2012-12-13 15:00:00",price:4.31});
db.stocks.insert({day:"2012-12-13",time:"2012-12-13 16:00:00",price:4.08});
db.stocks.insert({day:"2012-12-14",time:"2012-12-14 12:00:00",price:4.08});
db.stocks.insert({day:"2012-12-14",time:"2012-12-14 13:00:00",price:4.12});
db.stocks.insert({day:"2012-12-14",time:"2012-12-14 14:00:00",price:4.24});
db.stocks.insert({day:"2012-12-14",time:"2012-12-14 15:00:00",price:4.09});
db.stocks.insert({day:"2012-12-14",time:"2012-12-14 16:00:00",price:4.16});
每个文档中,day是指的哪一天,time是具体每天的哪个点,price指的是股票的价格。
我们也要按照天分组显示每天中股票的最高价格,这里不是统计数量了,而是找出最大值。
db.stocks.group({
key:{day:true},
initial:{time:"",price:0},
$reduce:function(doc,prev){
for(i in doc.price){
if(doc.price>prev.price){
prev.price = doc.price;
prev.time = doc.time;
}
}
}
});
或者:
db.runCommand({
group:{
ns:"stocks",
key:{day:true},
initial:{time:"",price:0},
$reduce:function(doc,prev){
for(i in doc.price){
if(doc.price>prev.price){
prev.price = doc.price;
prev.time = doc.time;
}
}
}
}
});
结果:
{
"0" : {
"day" : "2012-12-12",
"time" : "2012-12-12 13:00:00",
"price" : 4.33
},
"1" : {
"day" : "2012-12-13",
"time" : "2012-12-13 14:00:00",
"price" : 4.35
},
"2" : {
"day" : "2012-12-14",
"time" : "2012-12-14 14:00:00",
"price" : 4.24
}
}
其实这个例子比上一个例子要简单点,虽然两个例子都挺简单的,第一个例子先是统计,最后选择最大值,第二个例子直接选择最大值即可,因此没有使用finalize最后过滤器这个参数。
注意:如上两个例子中都没有使用$keyf,cond两个参数,$keyf,cond,finalize这三个参数都是可选的。
$keyf和cond两个参数在以后学习中继续补充
注意:db.runCommand()和db.collection.group()两种方法返回的数据有点小区别,上面例子中并没有做详细区分。两种方法的区别会慢慢学习。