开心实习之第十二天~

今天学习的主要模块依旧是爬虫应用开发,而在爬虫开发中,数据存储是关键环节,而 MySQL 作为常用的关系型数据库,常通过 pymysql 库与 Python 进行交互。为大家介绍 pymysql 在数据存储中的具体应用。首先进行的是pymysql的安装及应用,然后它的使用包括四个方面,

1.创建数据库链接对象 connect
2.获取游标对象 cursor
3.执行SQL语句
4.提交事务关闭链接
创建python数据库后,可新建一个数据表students,然后插入数据,数据库涉及事务问题,事务存在ACID特性,即原子性、一致性、隔离型和持久性。插入、更新和删除操作都是对数据库进行更改的操作,更改操作都必须为一个事务,则更改操作标准的代码实现如下:
import pymysql
data = {
    'id':'20120002',
    'name':'Andy',
    'age':20
}
table = 'students'
keys = ','.join(data.keys())
values = ','.join(['%s']*len(data))
db = connect = pymysql.Connect(host="127.0.0.1",user="root",password="123123",db='zy',charset='utf8',port=3306)
cursor = db.cursor()
sql = 'insert into {table}({keys}) values ({values})'.format(table=table, keys=keys, values=values)
try:
    if cursor.execute(sql,tuple(data.values())):
        print('success')
        db.commit()
except:
    print('error')
    db.rollback()
db.close()

上面数据插入的操作是通过构造SQL语句实现的,若表新增字段,比如gender性别字段,SQL语句需要修改,如需使用动态字典插入数据的话代码如下;

import pymysql
id = '20120001'
user = 'bob'
age=20
db = connect = pymysql.Connect(host="127.0.0.1",user="root",password="123123",db='zy',charset='utf8',port=3306)
cursor = db.cursor()
sql='insert into students(id,name,age) values(%s,%s,%s)'
val = ["2012001","bob","20"]
try:
    cursor.execute(sql,val)
    db.commit()
except:
    db.rollback()
db.close()

更新数据:

import pymysql
db = connect = pymysql.Connect(host="127.0.0.1",user="root",password="123123",db='zy',charset='utf8',port=3306)
cursor = db.cursor()
sql = 'update students set age = %s where name =%s'
try:
    cursor.execute(sql,(23,'bob'))
    db.commit()
except:
    db.rollback()
db.close()

删除数据:

import pymysql
table = 'students'
condition = 'age >22'
db = connect = pymysql.Connect(host="127.0.0.1",user="root",password="123123",db='zy',charset='utf8',port=3306)
crsor = db.cursor()
sql = 'delete from {table} where {condition}'.format(table=table,condition=condition)
try:
    crsor.execute(sql)
    db.commit()
except:
    db.rollback()
db.close()

查询数据:

import pymysql
db = connect = pymysql.Connect(host="127.0.0.1",user="root",password="123123",db='zy',charset='utf8',port=3306)
cursor = db.cursor()
sql = "select * from students where age>=20"
try:
    cursor.execute(sql)
    print('count:', cursor.rowcount)
    row = cursor.fetchone()
    while row:
        print('Row',row)
        row = cursor.fetchone()
except:
    print('error')

接下来的代码使用此方式爬取豆瓣图书,示例如下

import requests
import pymysql
from lxml import etree


def get_html(url):
    try:
        headers = {"User-Agent":"Mozilla/5.0 Chrome/138.0.0.0 Safari/537.36"}
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        r.raise_for_status()
        html = r.text
        return html
    except Exception as err:
        print(err)

def parse(html):
    doc = etree.HTML(html)
    out_list = []
    for row in doc.xpath('//*[@id="content"]/div[2]/div[1]/ul/li'):
        #书名
        name = row.xpath("./div[2]/h2/a/text()")[0].strip()
        print("name",name)
        #评分
        score_list = row.xpath("./div[2]/p[2]/span[2]/text()")
        score = score_list[0].strip() if score_list else '0.0'
        author = row.xpath("./div[2]/p[1]/text()")[0].strip().split('/')
        row_list = [name,score,author[0],author[-2],author[-1]]
        out_list.append(row_list)
        print(out_list)
    return out_list
def save_mysql(sql,val,**dbinfo):
    connect = None
    cursor = None
    try:
        connect = pymysql.connect(**dbinfo)
        cursor = connect.cursor()
        cursor.executemany(sql,val)
        connect.commit()
    except Exception as err:
        connect.rollback()
        print(err)
    finally:
        if cursor:
            try:
                cursor.close()
            except Exception as close_err:
                print(f"关闭游标失败:{close_err}")
            # 只有连接存在时才关闭
        if connect:
            try:
                connect.close()
            except Exception as close_err:
                print(f"关闭连接失败:{close_err}")
if __name__ == '__main__':
    url = "https://book.douban.com/latest?subcat=全部"
    html = get_html(url)
    out_list = parse(html)
    parms = {
        "host":"127.0.0.1",
        "user":"root",
        "password":"123123",
        "db":"day12",
        "charset":"utf8",
        "cursorclass":pymysql.cursors.DictCursor,
        "port":3306}
    sql = "insert into bookinfo(bookname,score,autor,press,pubdate)"\
    "values(%s,%s,%s,%s,%s)"
    save_mysql(sql,out_list,**parms)

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值