MongoDB Notes

weixin_34406061

MongoDB data types

null: {"x":null}
Boolean: {"x":true}, {"x":false}
Number: the MongoDB shell uses 64-bit floating point by default, e.g. {"x":2.32} or {"x":2}; for integer types use {"x":NumberInt(2)} (32-bit) or {"x":NumberLong(2)} (64-bit)
String: strings are UTF-8 encoded, e.g. {"x":"hello world"}
Date: {"x":new Date()}
Regular expression: same syntax as JavaScript, e.g. {"x":/itbilu/i}
Array: used the same way as in JavaScript, e.g. {"x":["hello","world"]}
Embedded document: {"x":{"y":"Hello"}}
_id and ObjectId(): every document contains an _id; if you do not specify one, MongoDB generates an ObjectId automatically
Code: {"x":function aa(){}}
Binary data
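
The note about numbers follows from JavaScript itself: a plain numeric literal is a 64-bit double, so 2 and 2.0 are the same value, and integers beyond 2^53 can no longer be represented exactly. That is the practical reason for wrappers such as NumberLong. A small sketch in plain JavaScript (it runs in Node.js without a MongoDB connection; the variable names are illustrative only):

```javascript
// Plain JavaScript numbers are 64-bit floating-point values.
const sameValue = 2 === 2.0;                     // integer and float literals are one type
const maxSafe = Number.MAX_SAFE_INTEGER;         // 2^53 - 1, the largest exactly representable integer range
const big = 9007199254740993;                    // 2^53 + 1 cannot be stored exactly
const lostPrecision = big === 9007199254740992;  // the literal was rounded to 2^53

console.log(sameValue, maxSafe, lostPrecision);
```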

Queries

// List all documents
db.getCollection('article').find({})

// Equality condition
db.getCollection('article').find({name : 'name'})

// AND condition
db.getCollection('article').find({name:'name', age:18});

// OR condition (the second argument returns only the title field)
db.getCollection('article').find({$or:[{title:/release/}, {title:/Faq/}]}, {title:1})

// IN condition
db.getCollection('article_756').find({author:{$in:['david', 'Bens', 'xxh']}})

// LIKE condition (regular expression); note that the regex literal is not quoted
db.getCollection('article').find({name : /ThisName/})
// LIKE, ignoring case
db.getCollection('article').find({name : /ThisName/i})
// Prefix match on a string _id
db.getCollection('article').find({_id : /^756.*/})
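
The unquoted patterns above are ordinary JavaScript regular expressions, so their matching behaviour can be checked without a database. A minimal sketch in plain JavaScript (runnable in Node.js; the sample strings are made up for illustration):

```javascript
const sampleNames = ['ThisName', 'xx thisname yy', 'Other'];  // hypothetical values

// /ThisName/ matches any string that contains "ThisName" (case-sensitive)
const like = sampleNames.filter((name) => /ThisName/.test(name));

// the i flag makes the match case-insensitive
const likeIgnoreCase = sampleNames.filter((name) => /ThisName/i.test(name));

// ^ anchors the pattern at the start of the string, i.e. a prefix match;
// the trailing .* in /^756.*/ does not change which strings match
const sampleIds = ['756.1', '7560', '1756'];                  // hypothetical string ids
const prefixed = sampleIds.filter((id) => /^756/.test(id));

console.log(like, likeIgnoreCase, prefixed);
```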

// Return only the listed fields
db.getCollection('article').find({}, {name: 1, rank: 1})

// Exclude the listed fields
db.getCollection('article').find({}, {name: 0, rank: 0})

// Sort: 1 is ascending, -1 is descending
db.getCollection('article').find({}).sort({updatedAt: -1})

// Paging with limit and skip
db.getCollection('article').find({}).limit(20).skip(20000)

Note: when sorting, if MongoDB cannot use an index on the sort field, the combined size of all documents being sorted must not exceed 32 MB
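
The skip value in the paging query is normally derived from a page number. A minimal sketch of that arithmetic in plain JavaScript, assuming 1-based page numbers; pageWindow is a hypothetical helper name, not part of MongoDB:

```javascript
// Convert a 1-based page number and a page size into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 1001 at 20 documents per page corresponds to limit(20).skip(20000).
const pageArgs = pageWindow(1001, 20);
console.log(pageArgs);
```

In the shell the result would be passed on as find({}).sort({updatedAt: -1}).skip(pageArgs.skip).limit(pageArgs.limit).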

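Updating values

Update a value in a specific document

db.article.update({_id:309},{$set:{'lastPage':1}})

Replacing the value of the _id field. If only the value changes (not its type), this can be done within the same collection: save a copy under the new _id, then remove the old document.

// Note: _id here is of type int32, so the right-hand side of the assignment below is problematic; it ends up being converted to int
db.getCollection('article').find({}).forEach( function(u) {
    var old = u._id;
    u._id = u.boardId+'.'+u._id;
    db.getCollection('article').save(u);
    // remove the original document by its old _id
    db.getCollection('article').remove({_id: old});
})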

If both the type and the value change, the documents cannot be modified in place; write them into a new collection instead.

Note: this fails when run in Robo3T and has to be run from the command-line shell

// Rename the collection
db.getCollection('article').renameCollection('article_old')

// Write the converted documents into the new collection
db.getCollection('article_old').find({}).forEach( function(u) {
    var newId = u.boardId.toString() +'.'+ u._id.toString();
    u._id = newId;
    u.parentId = u.boardId.toString() +'.'+ u.parentId.toString();
    db.getCollection('article').save(u);
})
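
The conversion done inside the forEach callback can be tried on a plain object before touching real data. A minimal sketch in plain JavaScript, assuming documents with numeric _id, boardId and parentId fields; withCompositeId and the sample document are hypothetical:

```javascript
// Build the new string _id and parentId from boardId, as in the forEach callback.
function withCompositeId(doc) {
  const prefix = doc.boardId.toString() + '.';
  // Copy the document so the input object is not mutated.
  return Object.assign({}, doc, {
    _id: prefix + doc._id.toString(),
    parentId: prefix + doc.parentId.toString(),
  });
}

const sample = { _id: 309, boardId: 756, parentId: 300, title: 'hello' }; // hypothetical document
const migrated = withCompositeId(sample);
console.log(migrated._id, typeof migrated._id);
```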

Renaming and removing fields

Format

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)
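// query: the selection criteria for the update, like the WHERE clause of an SQL UPDATE.
// update: the update document and its update operators (such as $, $inc), like the SET clause of an SQL UPDATE.
// upsert: optional; if true, a new document is inserted when no document matches the query. The default is false (no insert).
// multi: optional; the default is false, which updates only the first matching document. If true, all documents matching the query are updated.
// writeConcern: optional; the write concern, i.e. the level of acknowledgment required for the operation.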

Rename a field

db.getCollection('article').update({},{$rename:{"COMMPP":'COMP_NAME'}},false,true)

Remove a field

// Remove the zhLatin field from documents where from equals hengduan and zhLatin is null
db.getCollection('species').update({"from":"hengduan","zhLatin":null},{$unset: {'zhLatin':''}},false, true)
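
To make the effect of the operators concrete, here is a rough in-memory imitation of $set, $rename and $unset on a single plain object. It only illustrates what the resulting document looks like for top-level fields; it is not how MongoDB implements updates, and applyUpdate and the sample documents are hypothetical:

```javascript
// Apply a tiny subset of update operators to a copy of doc (top-level fields only).
function applyUpdate(doc, update) {
  const out = Object.assign({}, doc);
  for (const [field, value] of Object.entries(update.$set || {})) {
    out[field] = value;
  }
  for (const [oldName, newName] of Object.entries(update.$rename || {})) {
    if (oldName in out) {            // documents without the field are left alone
      out[newName] = out[oldName];
      delete out[oldName];
    }
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete out[field];               // the value given to $unset is ignored
  }
  return out;
}

const renamed = applyUpdate({ _id: 1, COMMPP: 'Acme' }, { $rename: { COMMPP: 'COMP_NAME' } });
const unset = applyUpdate({ _id: 2, from: 'hengduan', zhLatin: null }, { $unset: { zhLatin: '' } });
console.log(renamed, unset);
```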

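Deleting

Drop a database

db.dropDatabase()

Drop a collection

db.getCollection('section_to_board').drop()

Delete documents; the value of _id may be an ObjectId, a string, an int, etc.

db.getCollection('article').remove({_id: ObjectId("adfasdfadsf")})
db.getCollection('article').remove({board: 'name'})

MongoDB does not release disk space after data is deleted. To avoid moving large amounts of data after a delete, the space of the deleted records is not freed but only marked as deleted, so it can be reused later. Releasing that space requires the repair command db.repairDatabase(). If mongod dies during the repair and cannot be restarted, it can be repaired with ./mongod --repair --dbpath=/data/mongo/, where dbpath points to the data file directory of the database to repair. The repair may take a long time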

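Indexes

Before MongoDB 3.0.0 indexes were created with db.collection.ensureIndex(); later versions use db.collection.createIndex()

// Create a compound unique index, built in the background without blocking
db.collection.ensureIndex( {"id":1,"name":1}, {background:1,unique:1} )
// Create an index
db.collection.createIndex( { orderDate: 1 } )
// Specify the index name; if omitted, MongoDB generates a name by concatenating the indexed field names and sort orders
db.collection.createIndex( { category: 1 }, { name: "category_fr" } )
// Create a compound index (here with a collation)
db.collection.createIndex( { orderDate: 1, category: 1 }, { name: "date_category_fr", collation: { locale: "fr", strength: 2 } } )

// List the indexes of a collection
db.collection.getIndexes()
// Total size of the indexes of a collection
db.collection.totalIndexSize()
// Drop all indexes of a collection (the _id index is kept)
db.collection.dropIndexes()
// Drop a specific index of a collection
db.collection.dropIndex("index_name")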
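
The generated default name mentioned above joins each field name and its sort order with underscores. A minimal sketch of that naming rule in plain JavaScript; defaultIndexName is a hypothetical helper for illustration, not a shell function:

```javascript
// Derive the default index name from an index key specification,
// e.g. { orderDate: 1, category: 1 } -> "orderDate_1_category_1".
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join('_');
}

console.log(defaultIndexName({ orderDate: 1 }));
console.log(defaultIndexName({ orderDate: 1, category: -1 }));
```

Knowing the generated name is useful when an index created without an explicit name later has to be passed to dropIndex.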

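Statistics

Mainly count, distinct and group

// Count documents
db.getCollection('article').find({name:'name', age:18}).count()

// Distinct
// Format: db.collectionName.distinct(field, query, options)
// Distinct values of flag across all documents; the field name flag must be quoted
db.getCollection('article').distinct('flag')
// distinct with a condition: the distinct values of flag where author is gre
db.getCollection('article').distinct('flag', {author: 'gre'})

// group is effectively a MapReduce-style aggregation
// The statistics below are computed over documents with the following structure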
{
 "_id" : ObjectId("552a333f05c2b62c01cff50e"),
 "_class" : "com.mongo.model.Orders",
 "onumber" : "004",
 "date" : ISODate("2014-01-05T16:03:00Z"),
 "cname" : "zcy",
 "item" : {
   "quantity" : 5,
   "price" : 4.0,
   "pnumber" : "p002"
  }
}

// Group the documents by date and pnumber and accumulate quantity in reduce; the output contains the key fields and the fields of out
db.orders.group({
    key: { date:1,'item.pnumber':1 },
    initial: {"total":0},
    reduce: function Reduce(doc, out) {
        out.total+=doc.item.quantity
    }
})

// Group the documents by date, accumulate quantity and amount in reduce, then compute the average unit price in finalize
db.orders.group({
    key: {date:1},
    initial: {"total":0,"money":0},
    reduce: function Reduce(doc, out) {
        out.total+=doc.item.quantity;
        out.money+=doc.item.quantity*doc.item.price;
    },
    finalize : function Finalize(out) {
        out.avg=out.money/out.total;
        return out;
    }
});

Note: the group command cannot run on sharded collections, and its result set cannot exceed 16 MB
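
How key, initial, reduce and finalize cooperate can be seen in a small in-memory imitation. This is a sketch in plain JavaScript under simplifying assumptions (dates are plain strings, the data is an array), not the server's implementation; groupDocs, getPath and the sample orders are hypothetical:

```javascript
// Read a possibly dotted path such as 'item.pnumber' from a document.
function getPath(doc, path) {
  return path.split('.').reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

// In-memory imitation of group({ key, initial, reduce, finalize }).
function groupDocs(docs, { key, initial, reduce, finalize }) {
  const groups = new Map();
  for (const doc of docs) {
    const keyValues = {};
    for (const field of Object.keys(key)) keyValues[field] = getPath(doc, field);
    const id = JSON.stringify(keyValues);
    // Each new group starts from the key values plus a copy of initial.
    if (!groups.has(id)) groups.set(id, Object.assign({}, keyValues, initial));
    reduce(doc, groups.get(id));
  }
  const results = [...groups.values()];
  // finalize may mutate the accumulator or return a replacement object.
  return finalize ? results.map((out) => finalize(out) || out) : results;
}

const orders = [ // hypothetical sample data; dates simplified to strings
  { date: '2014-01-05', item: { quantity: 5, price: 4.0, pnumber: 'p002' } },
  { date: '2014-01-05', item: { quantity: 1, price: 10.0, pnumber: 'p001' } },
  { date: '2014-01-06', item: { quantity: 2, price: 4.0, pnumber: 'p002' } },
];

const byDate = groupDocs(orders, {
  key: { date: 1 },
  initial: { total: 0, money: 0 },
  reduce(doc, out) {
    out.total += doc.item.quantity;
    out.money += doc.item.quantity * doc.item.price;
  },
  finalize(out) {
    out.avg = out.money / out.total;
    return out;
  },
});
console.log(byDate);
```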

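Aggregation

For the same statistics, aggregate performs better than group

Meaning of the operators. Note that the operators start with $, and so do the field references

$sum	Computes the sum.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg	Computes the average.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min	Returns the minimum of the corresponding values across the grouped documents.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max	Returns the maximum of the corresponding values across the grouped documents.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push	Appends the value to an array in the result document.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet	Adds the value to an array in the result document, without creating duplicates.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first	Returns the value from the first document, according to the order of the source documents.	db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last	Returns the value from the last document, according to the order of the source documents.	db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])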

// Count the articles per author

db.getCollection('article').aggregate([{$group : {_id : "$author", num_articles : {$sum : 1}}}])

// Roughly equivalent, using group (the finalize step here also adds an avg field)

db.getCollection('article_756').group({
    key: {author:1},
    initial: {"total":0},
    reduce: function Reduce(doc, out) {
        out.total += 1;
    },
    finalize : function Finalize(out) {
        out.avg = out.total / 100;
        return out;
    }
});

// Statistics with a condition

db.getCollection('article').aggregate([
    {$match: { author: 'Milton' }},
    {$group: { _id: "$boardId", total: { $sum: 1 } } },
    {$sort: { total: -1 } }
])
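
The three stages of this pipeline map closely onto ordinary array operations, which can help when reasoning about what each stage receives and emits. A minimal in-memory sketch in plain JavaScript, with made-up sample data; it imitates the result shape only and says nothing about how the server executes a pipeline:

```javascript
const articles = [ // hypothetical sample data
  { author: 'Milton', boardId: 756 },
  { author: 'Milton', boardId: 305 },
  { author: 'Milton', boardId: 756 },
  { author: 'david', boardId: 756 },
];

// $match: keep only the documents whose author is 'Milton'
const matched = articles.filter((doc) => doc.author === 'Milton');

// $group with { $sum: 1 }: count the documents per boardId
const counts = new Map();
for (const doc of matched) {
  counts.set(doc.boardId, (counts.get(doc.boardId) || 0) + 1);
}

// $sort { total: -1 }: largest count first
const pipelineResult = [...counts.entries()]
  .map(([boardId, total]) => ({ _id: boardId, total }))
  .sort((a, b) => b.total - a.total);

console.log(pipelineResult);
```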

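Backup and restore

Data is backed up with the mongodump command; the smallest unit that can be selected is a collection. Command-line example

mongodump -h 127.0.0.1:27017 -d demodb -c article_1 -o ./
# -h server IP and port
# -d database
# -c collection; if omitted, all collections are exported
# -o output directory; a subdirectory named after the database is added by default

On export, a directory with the same name as the database is added, and each collection is stored in its own files: one bson file and one metadata.json file per collection

With MongoDB 4 the following options are also available

--gzip writes compressed output; a .gz suffix is appended to the original file names, which saves compressing the files yourself after the export
--dumpDbUsersAndRoles must be used together with -d and also exports users and roles. If no database is specified, mongodump exports all user and role data automatically
--excludeCollection string excludes the given collection; to exclude several, repeat the option
--excludeCollectionsWithPrefix string excludes collections whose names start with the given prefix; to exclude several prefixes, repeat the option
--numParallelCollections int, -j int sets the number of collections exported in parallel, default 4
--viewsAsCollections exports views as collections, so they become collections after a restore. If omitted, only the metadata of a view is exported and the view is recreated on restore

For example

mongodump -h 127.0.0.1:27017 --gzip -d demodb -c article -o ./
mongodump -h 127.0.0.1:27017 -d demodb -o ./ --gzip --excludeCollection=col1 --excludeCollection=col2 --excludeCollection=col3

Dumping a chosen list of collections cannot be done directly with a single mongodump command line; use a shell script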

#!/bin/bash
db=demodb
var=$1
collection_list=${var//,/ }
host=127.0.0.1
port=27017
out_dir="./"

for collection in $collection_list; do
    echo "$collection"
    mongodump -h "$host:$port" -c "$collection" -d "$db" -o "${out_dir}" --gzip
done


# Usage: separate the collections with commas and leave no spaces in between, e.g.
./dump.sh c1,c2,c3,c4,c_5,c_6
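
The same splitting logic can be expressed in JavaScript when the dump is driven from a Node.js tool rather than bash. The sketch below only builds the argument lists and prints them; it does not run mongodump. buildDumpCommands and its default values are assumptions taken from the script above:

```javascript
// Build one mongodump argument list per collection from a comma-separated string.
function buildDumpCommands(collectionCsv, options = {}) {
  const { db = 'demodb', host = '127.0.0.1', port = 27017, outDir = './' } = options;
  return collectionCsv
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0)      // ignore empty entries such as "c1,,c2"
    .map((collection) => [
      'mongodump', '-h', `${host}:${port}`, '-d', db, '-c', collection, '-o', outDir, '--gzip',
    ]);
}

const commands = buildDumpCommands('c1,c2,c_5');
for (const args of commands) console.log(args.join(' '));
```

Each list could then be handed to child_process.execFileSync(args[0], args.slice(1)), which avoids shell quoting issues with unusual collection names.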

Data is restored with the mongorestore command. Command-line example

# Restore a backup that was exported with --gzip; the -c option cannot be used to select a collection
mongorestore -h 127.0.0.1:27017 -d demodb --objcheck --stopOnError --gzip folder/

To include or exclude collections during a restore, use the --nsInclude and --nsExclude options

mongorestore --nsInclude 'transactions.*' --nsExclude 'transactions.*_dev' dump/

Quickly copying collections between databases

mongodump --archive --db src_db -h 127.0.0.1:27017 --excludeCollection board --excludeCollection section --excludeCollection section_to_board --excludeCollection user | mongorestore --archive -j1 -h 127.0.0.1:27017 --nsInclude 'src_db.col_*' --nsFrom 'src_db.col_$A$' --nsTo 'tgt_db.col_$A$'

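# Because --nsInclude is already given in the second half, the --excludeCollection options can be left out of the first half. For the collections in src_db only a count is then performed and nothing is actually transferred, so the run time is barely affected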
mongodump --archive --db src_db -h 127.0.0.1:27017 | mongorestore --archive -j1 -h 127.0.0.1:27017 --nsInclude 'src_db.col_*' --nsFrom 'src_db.col_$A$' --nsTo 'tgt_db.col_$A$'
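
The --nsFrom and --nsTo patterns use $NAME$ placeholders: whatever a placeholder matches in the source namespace is substituted into the target pattern. The sketch below imitates that substitution in plain JavaScript to show what the commands above rename. renameNamespace is a hypothetical helper, and the exact matching rules of mongorestore may differ from this simplified version:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Rename a namespace using $NAME$ placeholders,
// e.g. 'src_db.col_$A$' -> 'tgt_db.col_$A$'.
// Returns null when the namespace does not match the "from" pattern.
function renameNamespace(ns, fromPattern, toPattern) {
  const names = [];
  const source = fromPattern
    .split(/\$([A-Za-z0-9_]+)\$/)       // odd positions hold the placeholder names
    .map((part, i) => {
      if (i % 2 === 1) {
        names.push(part);
        return '(.*?)';                 // a placeholder captures arbitrary text
      }
      return escapeRegExp(part);
    })
    .join('');
  const match = new RegExp(`^${source}$`).exec(ns);
  if (!match) return null;
  return toPattern.replace(/\$([A-Za-z0-9_]+)\$/g, (whole, name) => {
    const index = names.indexOf(name);
    return index === -1 ? whole : match[index + 1];
  });
}

console.log(renameNamespace('src_db.col_305', 'src_db.col_$A$', 'tgt_db.col_$A$'));
```

When the target pattern contains no placeholder, for example 'tgt_db.col_all', every matching source namespace maps to the same target, which is the situation that arises when several collections are merged into one.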

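To merge several collections into one, the command below has to be run repeatedly, where col_305 is the collection name that changes on each run. Using a wildcard reports a conflict error; if you know how to import several collections in one go, please leave a comment.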

mongodump --archive --db src_db -h 127.0.0.1:27017 | mongorestore --archive -j1 -h 127.0.0.1:27017 --nsInclude 'src_db.col_305' --nsFrom 'src_db.col_$A$' --nsTo 'tgt_db.col_all'

This is a script for merging collections in batches

#!/bin/bash

# Expects a file with one collection suffix per line
if [ -z "$1" ]; then
  echo "Usage: $0 [file_name]"
  exit 2
else
    while read -r line
    do
        echo "$line"
        mongodump --archive -d src_db -c "col_${line}" -j1 -h 127.0.0.1:27017 | mongorestore --archive -j1 -h 127.0.0.1:27017 --nsInclude "src_db.col_${line}" --nsFrom 'src_db.col_$A$' --nsTo 'tgt_db.col_all'
    done < "$1"
fi