Scraping Data with Python and Writing It to a MySQL Database

Latest recommended article published on 2025-07-06 20:48:16

꧁星星之火꧂

License: CC 4.0 BY-SA

Category: Python. Tags: python

Article link: https://blog.youkuaiyun.com/weixin_43292784/article/details/124170448

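This code implements a simple web crawler that scrapes poems (title, author, and content) from the gushiwen.cn classical-poetry site and stores them in the `tangshi` table of a MySQL database. It uses the requests library to fetch each page, BeautifulSoup to parse the HTML, and pymysql for the database operations.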
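The parsing step can be tried offline before touching the network or the database. The sketch below applies the same three CSS selectors used in the listing to a small, hand-written HTML sample. The sample markup is a simplified assumption about the page structure, not a copy of the real site. `parse_poems` is a hypothetical helper name, and the stdlib `html.parser` backend is swapped in for `lxml` so that nothing beyond `bs4` needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one listing page (an assumption).
SAMPLE_HTML = """
<div class="sons">
  <p><a href="#"><b>静夜思</b></a></p>
  <p class="source"><a>李白</a><a>〔唐代〕</a></p>
  <div class="contson">床前明月光，疑是地上霜。</div>
</div>
<div class="sons">
  <p><a href="#"><b>春晓</b></a></p>
  <p class="source"><a>孟浩然</a><a>〔唐代〕</a></p>
  <div class="contson">春眠不觉晓，处处闻啼鸟。</div>
</div>
"""


def parse_poems(html):
    """Return a list of (title, author, content) tuples found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    titles = soup.select("div.sons p b")       # bold title inside each poem block
    authors = soup.select(".sons p.source")    # author and dynasty line
    contents = soup.select(".sons .contson")   # poem body
    # zip stops at the shortest list, so a missing element cannot cause an IndexError.
    return [
        (t.get_text(strip=True), a.get_text(strip=True), c.get_text(strip=True))
        for t, a, c in zip(titles, authors, contents)
    ]


poems = parse_poems(SAMPLE_HTML)
for title, author, content in poems:
    print(title, author, content)
```

If the real page nests these elements differently, only the selector strings should need adjusting.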

import requests
import pymysql
from bs4 import BeautifulSoup

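def spideData(page):
    # Scrape one listing page and store every poem found on it in MySQL.
    # charset is set explicitly so Chinese text round-trips regardless of the driver default.
    conn = pymysql.connect(host='118.190.8.4', user='shici', password='shici', database='shici', port=3306, charset='utf8mb4')
    cursor = conn.cursor()
    # Create the target table on the first run.
    cursor.execute(
        "create table if not exists tangshi(id int(11) NOT NULL AUTO_INCREMENT,title varchar(100),author varchar(50),content text,PRIMARY KEY (`id`))")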

    url="https://so.gushiwen.cn/shiwens/default.aspx?page="+str(page)+"&tstr=&astr=&cstr=&xstr=诗"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
    }
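    # Time out instead of hanging forever, and fail loudly on an HTTP error status.
    response=requests.get(url=url,headers=headers,timeout=10)
    response.raise_for_status()
    html=response.text
    soup=BeautifulSoup(html,'lxml')
    # Each poem sits in a div.sons block: bold title, p.source author line, .contson body.
    titleList=soup.select('div.sons p b')
    authorList=soup.select('.sons p.source')
    contentList=soup.select('.sons .contson')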

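    query="insert into tangshi (title,author,content) values (%s,%s,%s)"
    # zip pairs the three lists and stops at the shortest one,
    # so mismatched selector results cannot raise an IndexError.
    for title,author,content in zip(titleList,authorList,contentList):
        cursor.execute(query,(title.text.strip(),author.text.strip(),content.text.strip()))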

    conn.commit()
    cursor.close()
    conn.close()

if __name__=="__main__":
    # Scrape listing pages 2 through 9 (range excludes its upper bound).
    for i in range(2,10):
        spideData(i)
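The insert logic above can be dry-run without a MySQL server. The sketch below swaps in the stdlib `sqlite3` module with an in-memory database, which is not what the article uses. The table name and columns mirror `tangshi`, while `save_poems` and the sample rows are hypothetical. Note that sqlite3 uses `?` placeholders where pymysql uses `%s`, and the column types are SQLite's rather than MySQL's.

```python
import sqlite3


def save_poems(conn, rows):
    """Insert (title, author, content) tuples and return the table's row count."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tangshi ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "title TEXT, author TEXT, content TEXT)"
        )
        # Parameterized statement: values are bound, never formatted into the SQL string.
        conn.executemany(
            "INSERT INTO tangshi (title, author, content) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM tangshi").fetchone()[0]


# Hypothetical sample rows standing in for scraped data.
rows = [
    ("静夜思", "李白", "床前明月光，疑是地上霜。"),
    ("春晓", "孟浩然", "春眠不觉晓，处处闻啼鸟。"),
]

conn = sqlite3.connect(":memory:")
count = save_poems(conn, rows)
stored = conn.execute("SELECT title, author FROM tangshi ORDER BY id").fetchall()
conn.close()
print(count, stored)
```

Batching the rows through `executemany` inside one transaction means a page is stored either completely or not at all. The same idea should carry over to pymysql through `cursor.executemany` followed by `conn.commit()`.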