Web Crawlers 07: MongoDB

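This post covers MongoDB installation, basic usage, advantages, aggregation, index creation, and using MongoDB from Python. It includes MongoDB concepts, a comparison with SQL, database operations, querying and updating data, the aggregation pipeline, why indexes are needed and the commands for them, the steps for operating MongoDB from Python, and the use of MongoDB in Scrapy crawlers.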


MongoDB

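I. Concepts

  • A non-relational database that stores data very flexibly.
  • MongoDB sits between relational and non-relational databases: among non-relational databases it has the richest feature set and behaves most like a relational one. Its data structures are very loose, so it can store fairly complex data types. MongoDB's standout feature is its powerful query language, whose syntax somewhat resembles an object-oriented query language; it can do most of what a single-table query in a relational database can do, and it also supports building indexes on the data.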

II. SQL vs. NoSQL

  • SQL: database ------ table ------ row
  • NoSQL: database ------ collection (table) ------ document (row)
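
The mapping above can be pictured with plain Python containers. The following is only an illustrative in-memory model, not MongoDB itself: the names demo and stu and the helper field_names are made up for this sketch. A database holds collections, and a collection holds documents that do not have to share the same fields.

```python
# Hypothetical in-memory model of the hierarchy: database -> collection -> document.
databases = {
    "demo": {                                   # database
        "stu": [                                # collection (counterpart of a SQL table)
            {"name": "a", "age": 20},           # document (counterpart of a row)
            {"name": "b", "hometown": "长沙"},   # documents need not share fields
        ]
    }
}


def field_names(collection):
    """Return the sorted set of field names used across all documents."""
    return sorted({key for doc in collection for key in doc})


fields = field_names(databases["demo"]["stu"])
print(fields)
```

In a SQL table every row would carry all three columns; here each document carries only the fields it needs.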

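Advantages of MongoDB

  • No fixed schema is imposed on the data
    • This makes application development more convenient
  • High performance
  • Good support
    • It has been under development for a long time and has thorough documentation
    • Good cross-platform support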

III. Installation

MongoDB shell version v4.4.5
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("08893c5f-ca3a-40b5-948f-beadf475331f") }
MongoDB server version: 4.4.5
---
The server generated these startup warnings when booting:
        2021-04-16T05:36:24.282+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
---

IV. Basic MongoDB Usage

1. Listing databases

show dbs 

MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB

The three databases above all ship with MongoDB.

2. Using/creating a database

use admin

MongoDB Enterprise > use admin
switched to db admin

This switches to the admin database.

use demo

MongoDB Enterprise > use demo
switched to db demo
MongoDB Enterprise > db
demo
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB

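Here use creates the new database demo, yet show dbs does not list it.
That is because the new database exists only in memory and has not been written to disk.
It is persisted, and shows up in show dbs, once it holds at least one collection (table) with data.

3. Showing the current database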

db

MongoDB Enterprise > db
admin

db prints the database currently in use.

4. Listing the collections (tables) in a database

Option 1
MongoDB Enterprise > show tables
system.version

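Option 2
MongoDB Enterprise > show collections
system.version

5. Inserting data into the current database

Implicit creation (the collection is created automatically by the first insert)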
MongoDB Enterprise > db
demo
MongoDB Enterprise > db.jerry.insert({s:1})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
demo    0.000GB
local   0.000GB

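Explicit creation
db.createCollection(name, options)
name: the name of the collection (table); note that collection names must be unique within a database
options: optional parameters, for example to limit the size of the collection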
MongoDB Enterprise > db.createCollection('wangjiaxin_cllection')
{ "ok" : 1 }
MongoDB Enterprise > db.createCollection('wangjiaxin1',{capped:true,size:4})
{ "ok" : 1 }
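In the command above, size:4 sets the maximum size of the capped collection in bytes, not the number of documents; to limit the document count, pass the max option in addition to size.
If the requested size is smaller than 256 bytes, MongoDB falls back to 256 bytes.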
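
To make the idea of an upper limit concrete, here is a toy model of a capped collection in Python. It is a sketch under stated assumptions, not how MongoDB is implemented: the class CappedList is hypothetical, and the length of the JSON text stands in for the real BSON document size. What it shares with a capped collection is that, once the size limit is exceeded, the oldest documents are discarded first.

```python
import json
from collections import deque


class CappedList:
    """Toy model of a capped collection: the oldest documents are evicted first."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.docs = deque()
        self.used = 0

    @staticmethod
    def _size(doc):
        # Assumption: JSON text length stands in for the real BSON size.
        return len(json.dumps(doc))

    def insert(self, doc):
        self.docs.append(doc)
        self.used += self._size(doc)
        # Evict from the front until the contents fit within the limit again.
        while self.used > self.size_bytes and len(self.docs) > 1:
            self.used -= self._size(self.docs.popleft())


capped = CappedList(size_bytes=30)
for i in range(5):
    capped.insert({"x": i})

remaining = [doc["x"] for doc in capped.docs]
print(remaining)
```

Each {"x": i} document counts as 8 bytes in this model, so only the three most recent inserts fit into the 30-byte limit.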

6. Dropping a database

MongoDB Enterprise > show tables
jerry
MongoDB Enterprise > db.dropDatabase()
{ "dropped" : "demo", "ok" : 1 }
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB

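7. Viewing the data in a collection

Query data (name stands for the collection name)
db.name.find()

# Explicitly created collection (still empty, so nothing is returned)
MongoDB Enterprise > db.wangjiaxin_cllection.find()

# Implicitly created collection
MongoDB Enterprise > db.wangjiaxin.find()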
{ "_id" : ObjectId("60793c2b9dc82dc5d7862223"), "x" : 1 }

8. Checking whether a collection is capped

MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin.isCapped()
false
A return value of false means the collection is not capped
MongoDB Enterprise > db.wangjiaxin1.isCapped()
true
A return value of true means the collection is capped

9. Dropping a collection

MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin_cllection.drop()
true
MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1

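10. More on inserting data

Inserting data into a collection that already exists (an explicit _id can be supplied)
Option 1: inserting a single document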
db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',id:1})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
> db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',_id:1})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
{ "_id" : 1, "name" : "wangjiaxin", "age" : 25, "gender" : "madl" }

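Note: the primary key _id must be unique; a second insert with _id:1 is rejected with a duplicate key error.

Option 2: inserting multiple documents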
> db.wangjiaxin1.insert({name:'wangjiaxin',age:25,gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
> db.wangjiaxin1.insert([{name:'wangjiaxin',age:25},{name:'lirui',age:23}])
BulkWriteResult({
	"writeErrors" : [ ],
	"writeConcernErrors" : [ ],
	"nInserted" : 2,
	"nUpserted" : 0,
	"nMatched" : 0,
	"nModified" : 0,
	"nRemoved" : 0,
	"upserted" : [ ]
})
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7e5c4474722e11521252"), "name" : "wangjiaxin", "age" : 25 }
{ "_id" : ObjectId("607a7e5c4474722e11521253"), "name" : "lirui", "age" : 23 }
> 

Bulk insertion with a loop
for(i=2;i<10;i++)db.wangjiaxin3.insert({x:i})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }

Updating a document by its primary key with save()
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
> db.wangjiaxin3.save({_id:ObjectId("607a7f824474722e11521254"),name:18,gender:'male'})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
save() also works as a plain insert when no _id is given
> db.wangjiaxin3.save({name:'abc',gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
{ "_id" : ObjectId("607a81d14474722e1152125c"), "name" : "abc", "gender" : "male" } 

11. More on querying

List databases
show dbs

List collections
show tables / show collections

Query the documents in a collection
db.name.find()
> db.stu.find({name:'张三'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }

Pretty-printed query (formatted output)
> db.stu.find({name:'老李'}).pretty()
{
	"_id" : ObjectId("607a96864474722e1152125e"),
	"name" : "老李",
	"hometown" : "广州",
	"age" : 18,
	"gender" : false
}


Query a single document
> db.stu.findOne()
{
	"_id" : ObjectId("607a96864474722e1152125d"),
	"name" : "张三",
	"hometown" : "长沙",
	"age" : 20,
	"gender" : true
}

Query by condition
 db.stu.find({age:18})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }

12. Comparison operators

  • Greater than ($gt)
> db.stu.find({age:{$gt:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
  • Greater than or equal to ($gte)
> db.stu.find({age:{$gte:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
  • Multiple conditions (all of them must hold)
> db.stu.find({age:{$gte:18},hometown:'长沙'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
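
How such a filter is evaluated can be sketched in pure Python. This is a simplified model for illustration only: the functions matches and find are hypothetical helpers written for this example, the data is a reduced copy of the stu collection, and only the four range operators are handled. Every condition in the query document has to hold, which is why listing several fields acts as a logical AND.

```python
import operator

# Reduced copy of the stu collection used in this section.
stu = [
    {"name": "张三", "hometown": "长沙", "age": 20},
    {"name": "老李", "hometown": "广州", "age": 18},
    {"name": "jerry", "hometown": "长沙", "age": 16},
    {"name": "刘六", "hometown": "深圳", "age": 40},
]

# Query operators mapped to the equivalent Python comparison.
OPS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def matches(doc, query):
    """Return True if doc satisfies every condition in query (implicit AND)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"age": {"$gte": 18}}
            if value is None or not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Plain equality, e.g. {"hometown": "长沙"}
            return False
    return True


def find(docs, query):
    return [doc for doc in docs if matches(doc, query)]


older = [d["name"] for d in find(stu, {"age": {"$gt": 18}})]
adults_in_changsha = [d["name"] for d in find(stu, {"age": {"$gte": 18}, "hometown": "长沙"})]
print(older)
print(adults_in_changsha)
```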

13. Logical operators

  • Match documents whose age or gender satisfies a condition ($or)
> db.stu.find({$or:[{age:{$gt:18}},{gender:false}]})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
  • Membership test ($in matches any value in the list, here 18 or 28; it is not a range)
> db.stu.find({age:{$in:[18,28]}})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
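
The difference between $or, $in, and a real range is easy to miss, so here is a small pure-Python comparison. It is a sketch that restates the three filters as Python predicates over a copy of the stu data; the helper names is made up for this example, and the corresponding shell filter is given in a comment above each line.

```python
# Copy of the stu collection used in this section (only the fields needed here).
stu = [
    {"name": "张三", "age": 20, "gender": True},
    {"name": "老李", "age": 18, "gender": False},
    {"name": "王麻子", "age": 18, "gender": False},
    {"name": "刘六", "age": 40, "gender": True},
    {"name": "jerry", "age": 16, "gender": True},
    {"name": "小永", "age": 45, "gender": True},
    {"name": "老amy", "age": 18, "gender": True},
]


def names(docs, predicate):
    """Return the names of all documents for which predicate is true."""
    return [d["name"] for d in docs if predicate(d)]


# {$or: [{age: {$gt: 18}}, {gender: false}]}  -> either condition is enough
either = names(stu, lambda d: d["age"] > 18 or d["gender"] is False)

# {age: {$in: [18, 28]}}  -> age equals one of the listed values
listed = names(stu, lambda d: d["age"] in [18, 28])

# {age: {$gte: 18, $lte: 28}}  -> age lies inside the range
ranged = names(stu, lambda d: 18 <= d["age"] <= 28)

print(either)
print(listed)
print(ranged)
```

The 20-year-old 张三 appears in the range result but not in the $in result, because 20 is not one of the listed values.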

14. Working with query results

  • Counting the results
> db.stu.find().count()
7
  • limit: return only the given number of documents
> db.stu.find().limit(2)
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
  • skip: skip the given number of documents
> db.stu.find().skip(2)
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
  • Projection
> db.stu.find({},{age:1})
{ "_id" : ObjectId("607a96864474722e1152125d"), "age" : 20 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521260"), "age" : 40 }
{ "_id" : ObjectId("607a96864474722e11521261"), "age" : 16 }
{ "_id" : ObjectId("607a96864474722e11521262"), "age" : 45 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }

> db.stu.find({},{age:1,_id:0})
{ "age" : 20 }
{ "age" : 18 }
{ "age" : 18 }
{ "age" : 40 }
{ "age" : 16 }
{ "age" : 45 }
{ "age" : 18 }

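Projecting a field that does not exist (here the misspelled genfder) simply returns nothing for it:
> db.stu.find({age:18},{age:1,genfder:1})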
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }
  • Sorting
Ascending
> db.stu.find().sort({age:1})
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }

Descending
> db.stu.find().sort({age:-1})
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
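
The projection rules from earlier in this section (fields marked 1 are returned, _id is returned unless it is set to 0, and unknown field names are ignored) can be restated as a short Python function. This is a simplified sketch covering inclusion-style projections only; project is a hypothetical helper and the document is made-up sample data.

```python
def project(doc, spec):
    """Apply an inclusion-style projection such as {"age": 1, "_id": 0} to one document."""
    result = {}
    # _id is kept by default and dropped only when the spec sets it to 0.
    if spec.get("_id", 1) and "_id" in doc:
        result["_id"] = doc["_id"]
    for field, flag in spec.items():
        # Fields that are flagged but missing from the document are skipped silently.
        if field != "_id" and flag and field in doc:
            result[field] = doc[field]
    return result


doc = {"_id": 1, "name": "老李", "age": 18, "gender": False}

with_id = project(doc, {"age": 1})
without_id = project(doc, {"age": 1, "_id": 0})
misspelled = project(doc, {"age": 1, "genfder": 1})
print(with_id, without_id, misspelled)
```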

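15. Updating data

db.name.update({query},{update},{multi:boolean})
query: the filter that selects documents
update: the new content
multi: optional, defaults to false, which updates only the first matching document
true: updates every matching document

# Update only the specified field ($set)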
> db.stu.update({name:'张三'},{$set:{name:'zhangsan'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }

# Plain update (without $set the whole document is replaced, so the other fields are lost)

> db.stu.update({name:'jerry'},{name:'wangjiaxin'})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }

Updating every matching document
{multi:true}
> db.stu.update({},{$set:{gender:0}},{multi:true})
WriteResult({ "nMatched" : 7, "nUpserted" : 0, "nModified" : 7 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
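
The three update styles shown above differ in what happens to the untouched fields, which the following pure-Python sketch tries to make explicit. It is a toy model, not the server's logic: update is a hypothetical function that supports only equality filters, $set, and whole-document replacement, and the documents are made-up sample data.

```python
def update(docs, query, change, multi=False):
    """Toy update: change is either {"$set": {...}} or a full replacement document."""
    modified = 0
    for doc in docs:
        if all(doc.get(key) == value for key, value in query.items()):
            if "$set" in change:
                doc.update(change["$set"])      # only the listed fields change
            else:
                doc_id = doc.get("_id")
                doc.clear()                     # replacement drops every other field
                doc.update(change)
                if doc_id is not None:
                    doc["_id"] = doc_id         # the _id survives a replacement
            modified += 1
            if not multi:
                break                           # default: first match only
    return modified


docs = [
    {"_id": 1, "name": "jerry", "age": 16},
    {"_id": 2, "name": "张三", "age": 20},
]

update(docs, {"name": "jerry"}, {"name": "wangjiaxin"})          # plain replacement
update(docs, {"name": "张三"}, {"$set": {"name": "zhangsan"}})    # $set
touched = update(docs, {}, {"$set": {"gender": 0}}, multi=True)  # all documents
print(docs, touched)
```

After the plain replacement the first document has lost its age field, while the $set update leaves age in place on the second one.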

16. Deleting data

Delete by condition
> db.stu.remove({name:'zhangsan'})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }

Delete only one matching document
> db.stu.remove({age:18},{justOne:true})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }

Delete all documents in the collection
> db.stu.remove({})
WriteResult({ "nRemoved" : 5 })
> db.stu.find()

Drop the collection
> show tables
stu
wangjiaxin
wangjiaxin1
wangjiaxin3
> db.stu.drop()
true
> show tables
wangjiaxin
wangjiaxin1
wangjiaxin3

V. Exercises

  • Sample data
> db.persons.find().pretty()
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),
	"name" : "jim",
	"age" : 25,
	"email" : "75431457@qq.com",
	"c" : 89,
	"m" : 96,
	"e" : 87,
	"country" : "USA",
	"books" : [
		"JS",
		"C++",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),
	"name" : "tom",
	"age" : 25,
	"email" : "214557457@qq.com",
	"c" : 75,
	"m" : 66,
	"e" : 97,
	"country" : "USA",
	"books" : [
		"PHP",
		"JAVA",
		"EXTJS",
		"C++"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),
	"name" : "lili",
	"age" : 26,
	"email" : "344521457@qq.com",
	"c" : 75,
	"m" : 63,
	"e" : 97,
	"country" : "USA",
	"books" : [
		"JS",
		"JAVA",
		"C#",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),
	"name" : "zhangsan",
	"age" : 27,
	"email" : "2145567457@qq.com",
	"c" : 89,
	"m" : 86,
	"e" : 67,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),
	"name" : "lisi",
	"age" : 26,
	"email" : "274521457@qq.com",
	"c" : 53,
	"m" : 96,
	"e" : 83,
	"country" : "China",
	"books" : [
		"JS",
		"C#",
		"PHP",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),
	"name" : "wangwu",
	"age" : 27,
	"email" : "65621457@qq.com",
	"c" : 45,
	"m" : 65,
	"e" : 99,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"C++",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),
	"name" : "zhaoliu",
	"age" : 27,
	"email" : "214521457@qq.com",
	"c" : 99,
	"m" : 96,
	"e" : 97,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"PHP"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),
	"name" : "piaoyingjun",
	"age" : 26,
	"email" : "piaoyingjun@uspcat.com",
	"c" : 39,
	"m" : 54,
	"e" : 53,
	"country" : "Korea",
	"books" : [
		"JS",
		"C#",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),
	"name" : "lizhenxian",
	"age" : 27,
	"email" : "lizhenxian@uspcat.com",
	"c" : 35,
	"m" : 56,
	"e" : 47,
	"country" : "Korea",
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),
	"name" : "lixiaoli",
	"age" : 21,
	"email" : "lixiaoli@uspcat.com",
	"c" : 36,
	"m" : 86,
	"e" : 32,
	"country" : "Korea",
	"books" : [
		"JS",
		"JAVA",
		"PHP",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),
	"name" : "zhangsuying",
	"age" : 22,
	"email" : "zhangsuying@uspcat.com",
	"c" : 45,
	"m" : 63,
	"e" : 77,
	"country" : "Korea",
	"books" : [
		"JS",
		"JAVA",
		"C#",
		"MONGODB"
	]
}
  • 1. Query the name and age of people older than 25 and younger than 27
> db.persons.find({age:{$gt:25,$lt:27}},{name:1,age:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "age" : 26 }

Use find() to select the range, then a projection to return name and age
  • 2. Query the names of people who are not from the USA
> db.persons.find({country:{$ne:'USA'}},{name:1,country:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "name" : "zhangsan", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "name" : "wangwu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "name" : "zhangsuying", "country" : "Korea" }

Same approach as above.
Note: $ne means "not equal".
  • 3. Query the students whose country is China or the USA
> db.persons.find({$or:[{country:'USA'},{country:'China'}]}).pretty()
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),
	"name" : "jim",
	"age" : 25,
	"email" : "75431457@qq.com",
	"c" : 89,
	"m" : 96,
	"e" : 87,
	"country" : "USA",
	"books" : [
		"JS",
		"C++",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),
	"name" : "tom",
	"age" : 25,
	"email" : "214557457@qq.com",
	"c" : 75,
	"m" : 66,
	"e" : 97,
	"country" : "USA",
	"books" : [
		"PHP",
		"JAVA",
		"EXTJS",
		"C++"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),
	"name" : "lili",
	"age" : 26,
	"email" : "344521457@qq.com",
	"c" : 75,
	"m" : 63,
	"e" : 97,
	"country" : "USA",
	"books" : [
		"JS",
		"JAVA",
		"C#",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),
	"name" : "zhangsan",
	"age" : 27,
	"email" : "2145567457@qq.com",
	"c" : 89,
	"m" : 86,
	"e" : 67,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),
	"name" : "lisi",
	"age" : 26,
	"email" : "274521457@qq.com",
	"c" : 53,
	"m" : 96,
	"e" : 83,
	"country" : "China",
	"books" : [
		"JS",
		"C#",
		"PHP",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),
	"name" : "wangwu",
	"age" : 27,
	"email" : "65621457@qq.com",
	"c" : 45,
	"m" : 65,
	"e" : 99,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"C++",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),
	"name" : "zhaoliu",
	"age" : 27,
	"email" : "214521457@qq.com",
	"c" : 99,
	"m" : 96,
	"e" : 97,
	"country" : "China",
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"PHP"
	]
}

  • 4. Query the students whose Chinese score (c) is above 85 or whose English score (e) is above 90
> db.persons.find({$or:[{c:{$gt:85}},{e:{$gt:90}}]},{c:1,e:1,name:1})

  • 5. Query the students whose name contains "li"
> db.persons.find({name:/li/},{name:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli" }
  • 6. Query the students who like both MONGODB and PHP
> db.persons.find({books:{$all:['MONGODB','PHP']}},{name:1,books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "books" : [ "JS", "C#", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
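  • 7. Query the students whose second book is JAVA ({books:'JAVA'} matches JAVA at any position in the array; 'books.1' matches only index 1, the second element)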
> db.persons.find({books:'JAVA'},{books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }

> db.persons.find({'books.1':'JAVA'},{books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
  • 8. Query the students who like exactly 4 books
> db.persons.find({books:{$size:4}},{books:1}).pretty()
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),
	"books" : [
		"JS",
		"C++",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),
	"books" : [
		"PHP",
		"JAVA",
		"EXTJS",
		"C++"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),
	"books" : [
		"JS",
		"JAVA",
		"C#",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),
	"books" : [
		"JS",
		"C#",
		"PHP",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),
	"books" : [
		"JS",
		"JAVA",
		"C++",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"PHP"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),
	"books" : [
		"JS",
		"C#",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),
	"books" : [
		"JS",
		"JAVA",
		"EXTJS",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),
	"books" : [
		"JS",
		"JAVA",
		"PHP",
		"MONGODB"
	]
}
{
	"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),
	"books" : [
		"JS",
		"JAVA",
		"C#",
		"MONGODB"
	]
}


$size: matches arrays with exactly the given number of elements
  • 9. Query the distinct countries in persons
>  db.persons.distinct('country') 
[ "China", "Korea", "USA" ]

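VI. Aggregation

  • Aggregation is built on an aggregation pipeline for data processing: each document passes through a pipeline made up of several stages, each stage can group, filter, or otherwise transform the documents, and the result is output after the whole series of stages has run.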

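1. Querying data

  • For aggregation work, aggregate() is generally the more efficient choice because the processing runs on the server; called with no stages, it simply returns every document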
> db.stu.aggregate()
{ "_id" : ObjectId("607d1d7a8b3cfdc5eb3dd1e6"), "name" : "a", "hometown" : "东北", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da18b3cfdc5eb3dd1ea"), "name" : "e", "hometown" : "山东", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607d1dad8b3cfdc5eb3dd1ec"), "name" : "g", "hometown" : "大理", "age" : 18, "gender" : true }

2. Grouping

Group by age
> db.stu.aggregate({$group:{_id:'$age'}})
{ "_id" : 40 }
{ "_id" : 16 }
{ "_id" : 20 }
{ "_id" : 45 }
{ "_id" : 18 }

Group by gender
> db.stu.aggregate({$group:{_id:'$gender'}})
{ "_id" : false }
{ "_id" : true }

3. Counting the documents in each group

> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1}}})
{ "_id" : false, "stu_count" : 2 }
{ "_id" : true, "stu_count" : 5 }

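Note:
$sum:1 adds 1 for every document, so the result is the number of documents in the group
$sum:2 (or any other number) adds that number per document, so the result is that multiple of the count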
> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:2}}})
{ "_id" : false, "stu_count" : 4 }
{ "_id" : true, "stu_count" : 10 }

4. Collecting the data in each group

> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1},name:{$push:"$name"}}})
{ "_id" : true, "stu_count" : 5, "name" : [ "a", "d", "e", "f", "g" ] }
{ "_id" : false, "stu_count" : 2, "name" : [ "b", "c" ] }
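
What $group does with $sum and $push can be reproduced in a few lines of Python. The sketch below is only a model of the semantics: group_by is a hypothetical helper, weight plays the role of the number given to $sum, and the data is a copy of the stu collection from this section.

```python
from collections import defaultdict

# Copy of the stu collection used in the aggregation examples.
stu = [
    {"name": "a", "age": 20, "gender": True},
    {"name": "b", "age": 18, "gender": False},
    {"name": "c", "age": 18, "gender": False},
    {"name": "d", "age": 40, "gender": True},
    {"name": "e", "age": 16, "gender": True},
    {"name": "f", "age": 45, "gender": True},
    {"name": "g", "age": 18, "gender": True},
]


def group_by(docs, key, weight=1):
    """Group docs by doc[key]; add weight per document and collect the names."""
    groups = defaultdict(lambda: {"stu_count": 0, "name": []})
    for doc in docs:
        group = groups[doc[key]]
        group["stu_count"] += weight        # {$sum: weight}
        group["name"].append(doc["name"])   # {$push: "$name"}
    return dict(groups)


by_gender = group_by(stu, "gender")
doubled = group_by(stu, "gender", weight=2)
print(by_gender)
print(doubled)
```

With weight=2 every document contributes 2, which is why the shell reports 10 and 4 instead of 5 and 2.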

5. Filtering by a condition ($match)

> db.stu.aggregate({$match:{age:{$gt:20}}})
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }

> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown'}})
{ "_id" : "江苏" }
{ "_id" : "华山" }

> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown',count:{$sum:1}}})
{ "_id" : "江苏", "count" : 1 }
{ "_id" : "华山", "count" : 1 }

6. Skipping documents and limiting the number returned

> db.stu.aggregate({$skip:1},{$limit:2})
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
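
The pipeline idea, where the output of one stage becomes the input of the next, can be sketched with Python generators. This is an illustrative model rather than MongoDB's implementation: match, skip, limit, and aggregate are hypothetical functions named after the stages they imitate, and the data is a reduced copy of the stu collection.

```python
from itertools import islice

# Reduced copy of the stu collection (name and age only).
stu = [
    {"name": "a", "age": 20},
    {"name": "b", "age": 18},
    {"name": "c", "age": 18},
    {"name": "d", "age": 40},
    {"name": "e", "age": 16},
    {"name": "f", "age": 45},
    {"name": "g", "age": 18},
]


def match(predicate):
    """Stage that keeps only the documents for which predicate is true."""
    return lambda docs: (doc for doc in docs if predicate(doc))


def skip(n):
    """Stage that drops the first n documents."""
    return lambda docs: islice(docs, n, None)


def limit(n):
    """Stage that passes on at most n documents."""
    return lambda docs: islice(docs, n)


def aggregate(docs, *stages):
    """Feed the documents through each stage in order."""
    stream = iter(docs)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


page = [d["name"] for d in aggregate(stu, skip(1), limit(2))]
over_20 = [d["name"] for d in aggregate(stu, match(lambda d: d["age"] > 20))]
print(page)
print(over_20)
```

Stage order matters: skipping one document and then taking two gives b and c, whereas taking two and then skipping one would give only b.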

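VII. Creating Indexes in MongoDB

1. Why create an index?

  • To speed up queries (optimization)
  • To deduplicate data (with a unique index)

2. Commands

View the indexes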
> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
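This is the output before any index has been created (only the default _id index exists)

Show detailed execution statistics for a query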
> db.test.find({name:'test9999'}).explain('executionStats')
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "suoyin.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"name" : {
				"$eq" : "test9999"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"name" : {
					"$eq" : "test9999"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 46,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 100000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"name" : {
					"$eq" : "test9999"
				}
			},
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 1,
			"works" : 100002,
			"advanced" : 1,
			"needTime" : 100000,
			"needYield" : 0,
			"saveState" : 100,
			"restoreState" : 100,
			"isEOF" : 1,
			"direction" : "forward",
			"docsExamined" : 100000
		}
	},
	"serverInfo" : {
		"host" : "wangjiaxindeMacBook-Pro-131.local",
		"port" : 27017,
		"version" : "4.4.4",
		"gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"
	},
	"ok" : 1
}
Create an index (ensureIndex is an older alias of createIndex)
> db.test.ensureIndex({name:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.test.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_"
	},
	{
		"v" : 2,
		"key" : {
			"name" : 1
		},
		"name" : "name_1"
	}
]
Drop the index
> db.test.dropIndex({name:1})
{ "nIndexesWas" : 2, "ok" : 1 }

> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]

> db.test.find({name:'test99999'}).explain('executionStats')
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "suoyin.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"name" : {
				"$eq" : "test99999"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"name" : {
					"$eq" : "test99999"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 47,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 100000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"name" : {
					"$eq" : "test99999"
				}
			},
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 1,
			"works" : 100002,
			"advanced" : 1,
			"needTime" : 100000,
			"needYield" : 0,
			"saveState" : 100,
			"restoreState" : 100,
			"isEOF" : 1,
			"direction" : "forward",
			"docsExamined" : 100000
		}
	},
	"serverInfo" : {
		"host" : "wangjiaxindeMacBook-Pro-131.local",
		"port" : 27017,
		"version" : "4.4.4",
		"gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"
	},
	"ok" : 1
}
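Why the index matters can be shown without a database at all. The sketch below is a pure-Python analogy, not MongoDB code. It assumes 100000 documents named test0 through test99999 (the naming scheme is a guess based on the queries above), and it uses a plain dict as a stand-in for the index, whereas MongoDB's real indexes are B-tree structures.

```python
def collection_scan(docs, name):
    """Check every document, like a COLLSCAN stage; return (matches, docs_examined)."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["name"] == name:
            matches.append(doc)
    return matches, examined


def build_name_index(docs):
    """Map each name to the positions of the documents that hold it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc["name"], []).append(pos)
    return index


def index_lookup(docs, index, name):
    """Jump straight to the matching documents; return (matches, docs_examined)."""
    positions = index.get(name, [])
    return [docs[p] for p in positions], len(positions)


docs = [{"name": f"test{i}"} for i in range(100000)]

scan_matches, scan_examined = collection_scan(docs, "test99999")
name_index = build_name_index(docs)
idx_matches, idx_examined = index_lookup(docs, name_index, "test99999")

print(scan_examined, idx_examined)
```

The scan touches all 100000 documents to find one match, which mirrors totalDocsExamined in the explain output, while the lookup through the index touches only the single matching document.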

VIII. Working with MongoDB from Python

1. Installation

pip install pymongo

2. Import the module

import pymongo

3. Connect to MongoDB

mongo_client = pymongo.MongoClient(host='127.0.0.1', port=27017)
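MongoClient also accepts a connection string such as the mongodb://127.0.0.1:27017/ URI that the shell prints on startup. Here is a small sketch of building one, assuming optional credentials that have to be percent-encoded. `build_mongo_uri` and the sample username and password are hypothetical, for illustration only.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="127.0.0.1", port=27017, username=None, password=None):
    """Return a mongodb:// connection string, with credentials only if both are given."""
    if username and password:
        # Reserved characters such as '@' and '/' must be percent-encoded in credentials.
        return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"
    return f"mongodb://{host}:{port}/"


uri = build_mongo_uri()
auth_uri = build_mongo_uri(username="crawler", password="p@ss/word")
print(uri)
print(auth_uri)
```

Either string can then be passed as the first argument, for example pymongo.MongoClient(uri).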
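
4. Work with MongoDB

# Collection.insert() was deprecated in PyMongo 3 and removed in PyMongo 4; use insert_one() instead
mongo_client['Wjx']['like'].insert_one({'name': 'lirui'})

5. Code summary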

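  • Insert a single document
import pymongo


class MongoData:

    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the collection `name` in the 'Wjx' database
        # (despite its name, self.db holds a collection, not a database)
        self.db = self.client['Wjx'][name]

    # Insert a single document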
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)


if __name__ == '__main__':

    md = MongoData('like')
    md.add_one({'name':'pengli'})
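  • Insert multiple documents
import pymongo


class MongoData:

    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the collection `name` in the 'Wjx' database
        # (despite its name, self.db holds a collection, not a database)
        self.db = self.client['Wjx'][name]

    # Insert multiple documents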
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids


if __name__ == '__main__':

    md = MongoData('like')
    r = md.add_many([{'x': i} for i in range(2)])
    print(r)
  • Query a single document
import pymongo


class MongoData:

    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the collection `name` in the 'Wjx' database
        # (despite its name, self.db holds a collection, not a database)
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query a single document
    # query=None means no filter: the first document found is returned
    def get_one(self, query=None):
        return self.db.find_one(query)

if __name__ == '__main__':

    md = MongoData('like')
    
    # md.add_one({'name':'pengli'})
    
    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)
    
    r = md.get_one({'name':'zhuqi'})
    print(r)
  • Query multiple documents
import pymongo


class MongoData:

    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the collection `name` in the 'Wjx' database
        # (despite its name, self.db holds a collection, not a database)
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query a single document
    # query=None means no filter: the first document found is returned
    def get_one(self, query=None):
        return self.db.find_one(query)

    # Query multiple documents; returns a cursor that is iterated lazily
    def get_many(self, query=None):
        return self.db.find(query)

if __name__ == '__main__':

    md = MongoData('like')

    # md.add_one({'name':'pengli'})

    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)

    # r = md.get_one({'name':'zhuqi'})
    # print(r)

    r = md.get_many()
    for i in r:
        print(i)
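
The four listings above repeat the same class with one more method each time. One possible consolidation, sketched under the assumption that the collection is passed in rather than created inside the class, is shown below. `InMemoryCollection` is a hypothetical stand-in that mimics only the four PyMongo calls used here (with equality filters), so the example runs without a server.

```python
from types import SimpleNamespace


class MongoData:
    """Same helper as above, but the collection is injected instead of created inside."""

    def __init__(self, collection):
        self.collection = collection

    def add_one(self, data):
        return self.collection.insert_one(data).inserted_id

    def add_many(self, data):
        return self.collection.insert_many(data).inserted_ids

    def get_one(self, query=None):
        return self.collection.find_one(query)

    def get_many(self, query=None):
        return self.collection.find(query)


class InMemoryCollection:
    """Hypothetical stand-in for a PyMongo collection; supports equality filters only."""

    def __init__(self):
        self._docs = []

    def insert_one(self, doc):
        # Like PyMongo, add an _id to the document when it has none.
        doc.setdefault("_id", len(self._docs) + 1)
        self._docs.append(doc)
        return SimpleNamespace(inserted_id=doc["_id"])

    def insert_many(self, docs):
        return SimpleNamespace(inserted_ids=[self.insert_one(d).inserted_id for d in docs])

    def find(self, query=None):
        query = query or {}
        return [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]

    def find_one(self, query=None):
        found = self.find(query)
        return found[0] if found else None


md = MongoData(InMemoryCollection())
md.add_one({"name": "pengli"})
ids = md.add_many([{"x": i} for i in range(2)])
first = md.get_one({"name": "pengli"})
everything = list(md.get_many())
print(ids, first, len(everything))
```

Against a real server, the same class would be constructed as MongoData(pymongo.MongoClient(host='127.0.0.1', port=27017)['Wjx']['like']).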

IX. Using MongoDB with Scrapy

  • Spider file
import scrapy
from chaoshenspider.items import ChaoshenspiderItem


class XintiantingSpider(scrapy.Spider):
    name = 'xintianting'
    allowed_domains = ['biduoxs.com']
    start_urls = ['https://www.biduoxs.com/biquge/51_51108/c20390104.html']

    def parse(self, response):
        chapter_name = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bookname"]/h1/text()').get()
        chapter_content = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@id="content"]/text()').getall()
        chapter_text = '\n'.join(chapter_content)
        # print(chapter_name)
        # print(chapter_text)
        item = ChaoshenspiderItem()
        item['chapter_name'] = chapter_name
        item['chapter_text'] = chapter_text

        yield item

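        # Follow the "next chapter" link (the third link in the bottom navigation bar)
        chapter_href = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bottem2"]/a/@href').getall()[2]
        # print(chapter_href)
        # Stop when the link is the book's index page (presumably what the last chapter links to)
        if chapter_href != '/biquge/51_51108/':
            chapter_url = response.urljoin(chapter_href)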
            yield scrapy.Request(
                url=chapter_url,
                callback=self.parse
            )
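
The spider relies on response.urljoin to turn the site-relative href into an absolute URL before requesting it. The resolution rules are those of urllib.parse.urljoin from the standard library, so they can be tried without Scrapy. In this sketch the next-chapter href is hypothetical; the real values come from the page's navigation links.

```python
from urllib.parse import urljoin

# URL of the page currently being parsed (the spider's start URL).
page_url = "https://www.biduoxs.com/biquge/51_51108/c20390104.html"

# Hypothetical href values, for illustration only.
next_href = "/biquge/51_51108/c20390105.html"
index_href = "/biquge/51_51108/"

next_url = urljoin(page_url, next_href)                # root-relative path
index_url = urljoin(page_url, index_href)              # the book's index page
relative_url = urljoin(page_url, "c20390105.html")     # path relative to the current page

print(next_url)
print(index_url)
print(relative_url)
```

A root-relative href replaces the whole path of the current URL, while a bare file name replaces only its last segment, so the first and third results are the same address.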
  • items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ChaoshenspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    chapter_name = scrapy.Field()
    chapter_text = scrapy.Field()
    pass

  • settings.py
# Scrapy settings for chaoshenspider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'chaoshenspider'

SPIDER_MODULES = ['chaoshenspider.spiders']
NEWSPIDER_MODULE = 'chaoshenspider.spiders'

LOG_LEVEL = 'WARNING'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'chaoshenspider (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'chaoshenspider.pipelines.ChaoshenspiderPipeline': 300,
}

MONGODB_HOST = '127.0.0.1'
MONGODB_PORT = 27017
MONGODB_DBNAME = 'fiction'
MONGODB_DBCNAME = '超神学院之新天庭'

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

  • pipelines.py
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
import pymongo
from itemadapter import ItemAdapter
from chaoshenspider import settings


class ChaoshenspiderPipeline:

    def __init__(self):
        host = settings.MONGODB_HOST
        port = settings.MONGODB_PORT
        dbname = settings.MONGODB_DBNAME
        dbcname = settings.MONGODB_DBCNAME

        self.client = pymongo.MongoClient(host=host, port=port)  # connect to MongoDB

        fiction = self.client[dbname]  # select the database

        self.add = fiction[dbcname]  # select the collection

        self.book = open('超神学院之新天庭.txt', 'w', encoding='utf-8')

        print('Spider started!')

    def process_item(self, item, spider):
        print(item['chapter_name'] + ' downloaded!')

        # Append the chapter to the text file
        self.book.write(item['chapter_name']+'\n')
        self.book.write(item['chapter_text']+'\n\n')

        # Store the chapter in MongoDB
        data = dict(item)
        self.add.insert_one(data)

        return item

    def close_spider(self, spider):
        # Scrapy passes the spider (not an item) to close_spider
        self.book.close()
        self.client.close()
        print('Spider finished!')
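
Importing the settings module directly works, but Scrapy pipelines more commonly read their configuration through a from_crawler class method and open the connection in open_spider. Below is a sketch of that variant, reusing the MONGODB_* setting names from settings.py. The class name `MongoPipeline` and the `fake_crawler` object are assumptions made for illustration, and pymongo is imported inside open_spider only so that the sketch can run without the driver installed.

```python
from types import SimpleNamespace


class MongoPipeline:
    """Sketch of the same MongoDB storage step, configured through the crawler."""

    def __init__(self, host, port, dbname, collection_name):
        self.host = host
        self.port = port
        self.dbname = dbname
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection details from the project's settings.
        s = crawler.settings
        return cls(
            host=s.get("MONGODB_HOST", "127.0.0.1"),
            port=s.get("MONGODB_PORT", 27017),
            dbname=s.get("MONGODB_DBNAME"),
            collection_name=s.get("MONGODB_DBCNAME"),
        )

    def open_spider(self, spider):
        import pymongo  # imported lazily; see the note above

        self.client = pymongo.MongoClient(host=self.host, port=self.port)
        self.collection = self.client[self.dbname][self.collection_name]

    def process_item(self, item, spider):
        # Copy the item into a plain dict before inserting it.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()


# Stand-in for Scrapy's crawler object, only to show what from_crawler reads.
fake_crawler = SimpleNamespace(settings={
    "MONGODB_HOST": "127.0.0.1",
    "MONGODB_PORT": 27017,
    "MONGODB_DBNAME": "fiction",
    "MONGODB_DBCNAME": "chapters",
})
pipeline = MongoPipeline.from_crawler(fake_crawler)
print(pipeline.host, pipeline.port, pipeline.dbname, pipeline.collection_name)
```

To use a pipeline like this, its dotted path (here it would be chaoshenspider.pipelines.MongoPipeline) has to be registered in ITEM_PIPELINES in place of, or next to, the existing entry.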