Scraping Qiushibaike Jokes with Regular Expressions and Saving Them to a Text File


License: CC 4.0 BY-SA

Tags: #python #regular-expressions


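This article shows how to scrape jokes from Qiushibaike (糗事百科) using Python's requests library and a regular expression, and how to save the results to a text file. A random User-Agent header is sent with the request to reduce the chance of being blocked by the site.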
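The random User-Agent idea can be sketched without any third-party dependency. The snippet below is only an illustration under stated assumptions: `USER_AGENTS` and `build_headers` are hypothetical names, and the short list of browser strings stands in for what `fake_useragent` would normally supply.

```python
import random

# Hypothetical pool of browser User-Agent strings (an assumption for
# illustration; the article itself relies on fake_useragent for these).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
]


def build_headers(rng=random):
    """Return request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}
```

The returned dictionary can be passed as the `headers` argument of `requests.get`. A pool like this may also serve as a fallback if `fake_useragent` is not installed.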


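import re

import requests
from fake_useragent import UserAgent

url = 'https://www.qiushibaike.com/text/page/1/'
# Send a randomly chosen User-Agent so the request looks like a regular browser
headers = {
    'User-Agent': UserAgent().random
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html = response.text

# Extract the text inside each <div class="content"><span>...</span> block.
# The lazy (.+?) with re.S lets a joke span several lines without running
# past its own closing </span>.
jokes = re.findall(r'<div class="content">\s*<span>\s*(.+?)\s*</span>', html, re.S)

# Append the jokes to a text file, separated by blank lines
with open('duanzi.txt', 'a', encoding='utf-8') as f:
    for joke in jokes:
        f.write(joke + "\n\n\n")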
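
Because the live page may change or be unreachable, the extraction step can be tried offline. The following is a minimal sketch, assuming the page markup resembles the hypothetical `SAMPLE_HTML` below; `extract_jokes`, `JOKE_PATTERN`, and `BR_PATTERN` are illustrative names, not part of the original script. The sketch also converts `<br/>` tags to newlines and decodes HTML entities, since the regular expression captures the raw inner HTML of each `<span>`.

```python
import html
import re

# Lazy match with re.S so each joke stops at its own closing </span>
JOKE_PATTERN = re.compile(
    r'<div class="content">\s*<span>\s*(.+?)\s*</span>', re.S
)
BR_PATTERN = re.compile(r'<br\s*/?>')

# Hypothetical markup, assumed to mirror the structure the script targets
SAMPLE_HTML = """
<div class="content">
<span>

First joke, line one.<br/>Line two &amp; done.

</span>
</div>
<div class="content">
<span>
Second joke.
</span>
</div>
"""


def extract_jokes(page):
    """Return the joke texts found in an HTML page as clean strings."""
    jokes = []
    for raw in JOKE_PATTERN.findall(page):
        text = BR_PATTERN.sub("\n", raw)           # <br/> becomes a newline
        jokes.append(html.unescape(text).strip())  # decode &amp; and similar
    return jokes
```

Calling `extract_jokes(SAMPLE_HTML)` yields one string per joke, ready to be written to `duanzi.txt` in the same way as above.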