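MapReduce first applies a Map function that turns each input <key, value> pair into a set of new <key, value> pairs. After intermediate processing (the pairs are grouped by key), they are handed to Reduce, which processes all the values that share a key and emits <key, value> pairs as the final result.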
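To make the flow above concrete, here is a minimal sketch in plain Python, with no MongoDB involved. The helper names (`map_reduce`, `count_by_type`, `sum_values`) and the sample records are made up for illustration; they only mimic the map, group-by-key, reduce sequence.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over records, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def count_by_type(record):
    # Emit (type, 1) only for records that carry a type, like a counting Map function.
    if record.get("type") is not None:
        yield record["type"], 1

def sum_values(key, values):
    # Reduce: add up all the 1s emitted for this key.
    return sum(values)

# Made-up sample records, for illustration only.
sample = [{"type": "XSS"}, {"type": "SQLi"}, {"type": "XSS"}, {"name": "no type"}]
counts = map_reduce(sample, count_by_type, sum_values)
print(counts)
```

The record without a `type` field emits nothing, so it does not appear in the output.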
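Command prototype
db.runCommand(
{ mapreduce : string, the collection name,
map : function,
reduce : function
[, query : document, filter applied to the documents before they reach the map function]
[, sort : document, sort order applied to the documents before they reach the map function]
[, limit : integer, maximum number of documents sent to the map function]
[, out : string, the collection in which the result is stored]
[, keeptemp : boolean, whether the temporary result collection is kept when the connection closes]
[, finalize : function, receives the result of reduce for final processing]
[, scope : document, variables that the JavaScript code can use]
[, jsMode : boolean, whether to reduce BSON/JavaScript conversions during execution, default false]
[, verbose : boolean, whether to include timing information in the result; the default depends on the server version]
}
);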
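As a rough illustration of how the prototype maps onto a command document, the sketch below assembles one in Python. `build_mapreduce_command` is a hypothetical helper, not part of pymongo, and the option whitelist simply mirrors the fields listed above. The only rule it enforces is that the command name comes first in the document.

```python
from collections import OrderedDict

def build_mapreduce_command(collection, map_js, reduce_js, out, **options):
    """Assemble a mapreduce command document (hypothetical helper, not a pymongo API)."""
    allowed = {"query", "sort", "limit", "finalize", "scope", "jsMode", "verbose"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError("unsupported options: %s" % sorted(unknown))
    # The command name must be the first key of the document.
    command = OrderedDict([
        ("mapreduce", collection),
        ("map", map_js),
        ("reduce", reduce_js),
        ("out", out),
    ])
    # Optional fields follow in a stable (alphabetical) order.
    command.update(sorted(options.items()))
    return command

cmd = build_mapreduce_command(
    "vulnerability.database",
    "function () { emit(this.type, 1); }",
    "function (key, values) { var t = 0; values.forEach(function (v) { t += v; }); return t; }",
    out="out_result",
    query={"type": {"$ne": None}},
    limit=1000,
)
print(list(cmd.keys()))
```

Sending the document to a server is left out because it needs a live connection. With PyMongo one would typically wrap it in `bson.son.SON` and pass it to `db.command`.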
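Below are the examples I wrote as Python scripts to group and count data (prerequisite: the pymongo driver must be installed, see http://blog.youkuaiyun.com/t_ells/article/details/50265889)
1. Count vulnerabilities by type within a limited time range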
# -*- coding: utf-8 -*-
# sys.setdefaultencoding is dropped: it fails without reload(sys) on Python 2 and does not exist on Python 3.
import datetime
from pymongo import MongoClient
from bson.code import Code
Client = MongoClient("192.168.62.15", 27017)  # connect to the database server; 27017 is the default port
db = Client["CVDB"]  # database
Col = db["vulnerability.database"]  # collection (the "table")
starttime1 = datetime.datetime.strptime("2015-01-01 12:20:00", '%Y-%m-%d %H:%M:%S')  # local time used as the filter condition
starttime = starttime1 - datetime.timedelta(hours=8)  # MongoDB stores dates in UTC, which is 8 hours behind local time here (UTC+8)
mapper = Code("""function () {
    if (this.type != null)
        emit(this.type, 1);
}
""")
reducer = Code("""
function (key, values) {
    var reduced = 0;
    values.forEach(function (val) {
        reduced += val;
    });
    return reduced;
}
""")
finalize = Code("""
function (key, reduced) {
    return reduced;
}""")
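# Collection.map_reduce is available in PyMongo 3.x; it was removed in PyMongo 4.0.
# finalize is passed as a keyword argument so that the function defined above is actually used.
result = Col.map_reduce(mapper, reducer, out="out_result", full_response=True, finalize=finalize, query={"time": {'$gt': starttime}})
# db.out_result.drop()  # drop the collection that holds the result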
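The script above subtracts a fixed 8 hours to turn local time into the UTC values that MongoDB stores. A slightly more explicit way to express the same conversion is sketched below for Python 3. The helper name `local_to_utc_naive` and the `LOCAL_TZ` constant are made up here, and the UTC+8 offset is an assumption carried over from the script.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC+8, matching the 8-hour offset used in the script.
LOCAL_TZ = timezone(timedelta(hours=8))

def local_to_utc_naive(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a local-time string and return the equivalent naive UTC datetime."""
    local = datetime.strptime(text, fmt).replace(tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc).replace(tzinfo=None)

starttime = local_to_utc_naive("2015-01-01 12:20:00")
print(starttime)  # 2015-01-01 04:20:00
```

The result is a naive datetime, the same kind of value the script builds with `timedelta(hours=8)`, so it can be used in the `query` filter in the same way.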
Result:
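Map-reduce writes one document per key into the output collection, with the key in `_id` and the reduced value in `value`. The sketch below shows one way such documents could be turned into readable lines. `format_counts` is a hypothetical helper and the sample documents are made up, not the actual output of the script.

```python
def format_counts(docs):
    """Turn map-reduce output documents into 'key: count' lines, largest count first."""
    ordered = sorted(docs, key=lambda d: (-d["value"], str(d["_id"])))
    return ["%s: %d" % (d["_id"], d["value"]) for d in ordered]

# Made-up sample documents in the {_id, value} shape that map-reduce writes.
# Counts are floats because numbers emitted from JavaScript are doubles.
# Against a live server one would pass db["out_result"].find() instead.
sample_docs = [
    {"_id": "XSS", "value": 2.0},
    {"_id": "SQL injection", "value": 5.0},
    {"_id": "CSRF", "value": 2.0},
]
lines = format_counts(sample_docs)
for line in lines:
    print(line)
```

Ties are broken alphabetically by key so that the ordering is stable between runs.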
Besides using a document field directly as the key, the map function can also use a computed value as the key for the statistics.
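A computed key can be pictured in plain Python. The sketch below mimics a JavaScript expression of the form `getFullYear() + "-" + (getMonth() + 1) + "-" + getDate()`, which builds a day key from a timestamp. The function name `date_key` is made up, and the point to notice is that such a key is not zero-padded.

```python
from datetime import datetime

def date_key(moment):
    """Build a 'year-month-day' key without zero padding, as string
    concatenation of the numeric date parts would in JavaScript."""
    return "%d-%d-%d" % (moment.year, moment.month, moment.day)

print(date_key(datetime(2015, 12, 25, 0, 12)))  # 2015-12-25
print(date_key(datetime(2015, 1, 5)))           # 2015-1-5
```

Because of the missing padding, keys such as `2015-1-5` do not sort chronologically as strings, which matters if the output collection is later sorted by `_id`.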
2. Next I count by the time field. Dates in my database are stored as ISODate("2015-12-25T00:12:00Z"); the goal is to count the vulnerabilities per day between 2015-12-10 and 2015-12-20
# -*- coding: utf-8 -*-
# sys.setdefaultencoding is dropped: it fails without reload(sys) on Python 2 and does not exist on Python 3.
import datetime
from pymongo import MongoClient
from bson.code import Code
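Client = MongoClient("192.168.62.15", 27017)  # connect to the database server; 27017 is the default port
db = Client["CVDB"]  # database
Col = db["vulnerability.database"]  # collection (the "table")
# The original start and stop values were identical, which selects nothing; they now match the stated range.
starttime1 = datetime.datetime.strptime("2015-12-10 00:00:00", '%Y-%m-%d %H:%M:%S')  # start of the range, local time
stoptime1 = datetime.datetime.strptime("2015-12-20 23:59:59", '%Y-%m-%d %H:%M:%S')  # end of the range, local time
starttime = starttime1 - datetime.timedelta(hours=8)  # convert local time (UTC+8) to UTC
stoptime = stoptime1 - datetime.timedelta(hours=8)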
mapper = Code("""function () {
    var date = new Date(this.time);
    var dateKey = "" + date.getFullYear() + "-" + (date.getMonth() + 1) + "-" + date.getDate();
    emit(dateKey, 1);
}
""")
reducer = Code("""
function (key, values) {
    var reduced = 0;
    values.forEach(function (val) {
        reduced += val;
    });
    return reduced;
}
""")
finalize = Code("""
function (key, reduced) {
    return reduced;
}""")
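# Collection.map_reduce is available in PyMongo 3.x; it was removed in PyMongo 4.0.
# finalize is passed as a keyword argument so that the function defined above is actually used.
result = Col.map_reduce(mapper, reducer, out="out", full_response=True, finalize=finalize, query={"time": {'$gt': starttime, '$lte': stoptime}})
# db.out.drop()  # drop the collection that holds the result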
Result:
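One thing to keep in mind when reading per-day counts: map-reduce only produces a key for days that have at least one document. The sketch below, in plain Python, fills the gaps with zero so every day in the range shows up. `fill_missing_days` and `date_key` are hypothetical helpers and the counts are made-up sample values, keyed in the unpadded `year-month-day` form that the mapper builds.

```python
from datetime import date, timedelta

def date_key(day):
    # Same unpadded 'year-month-day' form as the JavaScript dateKey.
    return "%d-%d-%d" % (day.year, day.month, day.day)

def fill_missing_days(counts, first_day, last_day):
    """Return (key, count) pairs for every day in the range, using 0 for absent keys."""
    rows = []
    day = first_day
    while day <= last_day:
        key = date_key(day)
        rows.append((key, int(counts.get(key, 0))))
        day += timedelta(days=1)
    return rows

# Made-up counts, for illustration only.
sample_counts = {"2015-12-10": 3.0, "2015-12-12": 1.0}
rows = fill_missing_days(sample_counts, date(2015, 12, 10), date(2015, 12, 13))
print(rows)
```

With real data, `counts` could be built from the output collection as `{doc["_id"]: doc["value"] for doc in db["out"].find()}`.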