Today's topic is again crawler application development. In crawler work, data storage is a key step, and MySQL, a widely used relational database, is typically accessed from Python through the pymysql library. This post walks through pymysql's role in data storage. The first step is installing pymysql (pip install pymysql); after that, using it breaks down into four steps:
1. Create a database connection object with connect
2. Obtain a cursor object with cursor
3. Execute SQL statements
4. Commit the transaction and close the connection
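These four steps are the generic Python DB-API pattern rather than anything specific to pymysql. As a minimal runnable sketch of the sequence, the example below uses Python's built-in sqlite3 driver, chosen only so it runs without a MySQL server; with pymysql the calls are the same:

```python
import sqlite3

# the same four DB-API steps pymysql uses, shown with the built-in sqlite3 driver
conn = sqlite3.connect(':memory:')        # 1. create a connection object
cur = conn.cursor()                       # 2. get a cursor object
cur.execute('create table t(x integer)')  # 3. execute SQL
cur.execute('insert into t values (1)')
conn.commit()                             # 4. commit the transaction
row = cur.execute('select x from t').fetchone()
print(row[0])  # 1
conn.close()                              #    ... then close the connection
```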
With the database created, add a students table and insert some data. Database writes involve transactions, which have the ACID properties: atomicity, consistency, isolation, and durability. Insert, update, and delete operations all modify the database, and every such change must run inside a transaction, so the standard code for a change looks like this:
import pymysql

data = {
    'id': '20120002',
    'name': 'Andy',
    'age': 20
}
table = 'students'
keys = ','.join(data.keys())
values = ','.join(['%s'] * len(data))

db = pymysql.connect(host='127.0.0.1', user='root', password='123123',
                     db='zy', charset='utf8', port=3306)
cursor = db.cursor()
sql = 'insert into {table}({keys}) values ({values})'.format(
    table=table, keys=keys, values=values)
try:
    if cursor.execute(sql, tuple(data.values())):
        print('success')
    db.commit()
except Exception:
    print('error')
    db.rollback()
db.close()
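To see the rollback side of the ACID discussion in action, the sketch below forces a primary-key violation and rolls the change back. It uses Python's built-in sqlite3 driver as a stand-in for MySQL purely so it runs without a server; the commit/rollback pattern is identical in pymysql:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table students(id text primary key, name text, age integer)')
conn.execute("insert into students values ('20120002', 'Andy', 20)")
conn.commit()

try:
    # same id again: violates the primary key and raises an error
    conn.execute("insert into students values ('20120002', 'Dup', 21)")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # atomicity: the failed change is undone

count = conn.execute('select count(*) from students').fetchone()[0]
print(count)  # 1 -- the table still holds exactly one row
conn.close()
```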
The first insert example builds the SQL statement dynamically from the data dictionary, so if the table gains a new column, say a gender field, only the dictionary needs an extra key; the statement-building code stays unchanged. By contrast, inserting with a hard-coded statement and explicit values looks like this:
import pymysql

id = '20120001'
user = 'bob'
age = 20

db = pymysql.connect(host='127.0.0.1', user='root', password='123123',
                     db='zy', charset='utf8', port=3306)
cursor = db.cursor()
sql = 'insert into students(id, name, age) values (%s, %s, %s)'
try:
    cursor.execute(sql, (id, user, age))
    db.commit()
except Exception:
    db.rollback()
db.close()
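The dictionary-driven statement building from the first example can be checked without touching a database. With a hypothetical gender key added to the record, the placeholder count tracks the dictionary size automatically:

```python
# hypothetical record with an extra gender column (illustration only)
data = {'id': '20120003', 'name': 'Cara', 'age': 21, 'gender': 'F'}
table = 'students'

# same construction as the first insert example
keys = ','.join(data.keys())
values = ','.join(['%s'] * len(data))
sql = 'insert into {table}({keys}) values ({values})'.format(
    table=table, keys=keys, values=values)
print(sql)  # insert into students(id,name,age,gender) values (%s,%s,%s,%s)
```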
Updating data:
import pymysql

db = pymysql.connect(host='127.0.0.1', user='root', password='123123',
                     db='zy', charset='utf8', port=3306)
cursor = db.cursor()
sql = 'update students set age = %s where name = %s'
try:
    cursor.execute(sql, (23, 'bob'))
    db.commit()
except Exception:
    db.rollback()
db.close()
Deleting data:
import pymysql

table = 'students'
condition = 'age > 22'

db = pymysql.connect(host='127.0.0.1', user='root', password='123123',
                     db='zy', charset='utf8', port=3306)
cursor = db.cursor()
sql = 'delete from {table} where {condition}'.format(table=table, condition=condition)
try:
    cursor.execute(sql)
    db.commit()
except Exception:
    db.rollback()
db.close()
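Note that interpolating condition straight into the statement is only safe when it is a trusted literal; if the comparison value ever comes from user input, it should travel as a bound parameter so the driver escapes it. A small sketch (only the statement string is built here, so it can be checked without a server):

```python
table = 'students'
# keep the comparison value out of the SQL text and bind it instead
sql = 'delete from {table} where age > %s'.format(table=table)
params = (22,)
# cursor.execute(sql, params) would let the driver escape the value safely
print(sql)  # delete from students where age > %s
```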
Querying data:
import pymysql

db = pymysql.connect(host='127.0.0.1', user='root', password='123123',
                     db='zy', charset='utf8', port=3306)
cursor = db.cursor()
sql = 'select * from students where age >= 20'
try:
    cursor.execute(sql)
    print('count:', cursor.rowcount)
    row = cursor.fetchone()
    while row:
        print('Row', row)
        row = cursor.fetchone()
except Exception:
    print('error')
finally:
    db.close()
The code below applies this approach to crawl Douban's latest books:
import requests
import pymysql
from lxml import etree

def get_html(url):
    try:
        headers = {"User-Agent": "Mozilla/5.0 Chrome/138.0.0.0 Safari/537.36"}
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception as err:
        print(err)
def parse(html):
    doc = etree.HTML(html)
    out_list = []
    for row in doc.xpath('//*[@id="content"]/div[2]/div[1]/ul/li'):
        # book title
        name = row.xpath("./div[2]/h2/a/text()")[0].strip()
        # rating; books without one default to '0.0'
        score_list = row.xpath("./div[2]/p[2]/span[2]/text()")
        score = score_list[0].strip() if score_list else '0.0'
        # "author / press / publication date" line, split on '/'
        author = row.xpath("./div[2]/p[1]/text()")[0].strip().split('/')
        row_list = [name, score, author[0], author[-2], author[-1]]
        out_list.append(row_list)
    return out_list
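The author field on the page is a single slash-separated string; parse takes the first piece as the author and the last two as press and publication date. That slicing can be checked on its own with a made-up sample line (the sample text below is an assumption, not real page data):

```python
# hypothetical "author / press / pubdate" line like the one parse() splits
raw = 'Andy Weir / Translation Press / 2021-5'
parts = raw.strip().split('/')

author = parts[0].strip()    # first piece: author
press = parts[-2].strip()    # second to last: press
pubdate = parts[-1].strip()  # last: publication date
print(author, press, pubdate)  # Andy Weir Translation Press 2021-5
```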
def save_mysql(sql, val, **dbinfo):
    connect = None
    cursor = None
    try:
        connect = pymysql.connect(**dbinfo)
        cursor = connect.cursor()
        cursor.executemany(sql, val)
        connect.commit()
    except Exception as err:
        if connect:
            connect.rollback()
        print(err)
    finally:
        if cursor:
            try:
                cursor.close()
            except Exception as close_err:
                print(f"failed to close cursor: {close_err}")
        # close the connection only if it was opened
        if connect:
            try:
                connect.close()
            except Exception as close_err:
                print(f"failed to close connection: {close_err}")
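save_mysql relies on executemany, which binds the statement once for every row in val and is the usual way to persist a scraped batch. A runnable illustration with the built-in sqlite3 driver (sqlite3 uses ? placeholders where pymysql uses %s, and the two rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table bookinfo('
             'bookname text, score text, autor text, press text, pubdate text)')

# made-up rows in the same [name, score, author, press, pubdate] shape parse() emits
rows = [
    ['Book A', '8.5', 'Alice', 'Press One', '2021-1'],
    ['Book B', '7.9', 'Bob', 'Press Two', '2021-2'],
]
conn.executemany('insert into bookinfo values (?, ?, ?, ?, ?)', rows)
conn.commit()

n = conn.execute('select count(*) from bookinfo').fetchone()[0]
print(n)  # 2
conn.close()
```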
if __name__ == '__main__':
    url = "https://book.douban.com/latest?subcat=全部"
    html = get_html(url)
    out_list = parse(html)
    parms = {
        "host": "127.0.0.1",
        "user": "root",
        "password": "123123",
        "db": "day12",
        "charset": "utf8",
        "cursorclass": pymysql.cursors.DictCursor,
        "port": 3306,
    }
    sql = "insert into bookinfo(bookname,score,autor,press,pubdate)" \
          " values(%s,%s,%s,%s,%s)"
    save_mysql(sql, out_list, **parms)