URL:
http://www.jianlaixiaoshuo.com/
Implementation:
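# Import modules
from urllib.parse import urljoin

import requests
from pyquery import PyQuery as pq

BASE_URL = "http://www.jianlaixiaoshuo.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

# Fetch the table of contents and collect the link to every chapter
response = requests.get(BASE_URL, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = "utf-8"
dob = pq(response.text)
links = dob("dl > dd a")

# Open the output file once and append the chapters in table-of-contents order
with open("2.txt", mode="a", encoding="utf-8") as f:
    for link in links.items():
        # urljoin works for both "/path" and absolute hrefs
        chapter_url = urljoin(BASE_URL, link.attr("href"))
        response = requests.get(chapter_url, headers=headers, timeout=10)
        response.raise_for_status()
        response.encoding = "utf-8"
        doc = pq(response.text)
        title = doc("#BookCon > h1").text()
        content = doc("#BookText").text()
        # Separate the title from the body and each chapter from the next
        f.write(title + "\n")
        f.write(content + "\n\n")
        print(title + " download complete")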
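This script scrapes the title and text of every chapter from http://www.jianlaixiaoshuo.com/ and appends them to 2.txt. It uses requests to fetch the pages and pyquery to parse them: it first reads the table of contents to collect the chapter links (selector dl > dd a), then fetches each chapter page, prints its title (#BookCon > h1) followed by a "download complete" message, and appends the chapter text (#BookText) to the file.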
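If pyquery is not available, the same two-step extraction can be approximated with only the standard library. The sketch below is an illustration under stated assumptions, not the original author's method: it swaps pyquery's CSS selectors for hand-written html.parser.HTMLParser subclasses that roughly mimic dl > dd a, #BookCon > h1 and #BookText, uses urllib.parse.urljoin to build absolute chapter URLs, and works on HTML strings so the parsing can be tried without network access. The helper names (chapter_urls, parse_chapter) and the sample markup, including the /book/1.html path, are hypothetical; the real site's HTML may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StackParser(HTMLParser):
    """HTMLParser that keeps a stack of (tag, id) for the currently open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("id")))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        for k in range(len(self.stack) - 1, -1, -1):
            if self.stack[k][0] == tag:
                del self.stack[k:]
                break


class ChapterLinkParser(StackParser):
    """Rough stand-in for 'dl > dd a': an <a> inside a <dd> whose parent is <dl>."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(
            t == "dd" and k > 0 and self.stack[k - 1][0] == "dl"
            for k, (t, _) in enumerate(self.stack)
        ):
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        super().handle_starttag(tag, attrs)


class ChapterParser(StackParser):
    """Rough stand-in for '#BookCon > h1' (title) and '#BookText' (body)."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []

    def _inside(self, element_id):
        return any(i == element_id for _, i in self.stack)

    def _in_title(self):
        # True while inside an <h1> whose direct parent has id="BookCon".
        return any(
            t == "h1" and k > 0 and self.stack[k - 1][1] == "BookCon"
            for k, (t, _) in enumerate(self.stack)
        )

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._inside("BookText"):
            self.text_parts.append(text)
        elif self._in_title():
            self.title_parts.append(text)


def chapter_urls(index_html, base_url):
    """Return absolute chapter URLs found on the table-of-contents page."""
    parser = ChapterLinkParser()
    parser.feed(index_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def parse_chapter(chapter_html):
    """Return (title, body); the body keeps one text node per line."""
    parser = ChapterParser()
    parser.feed(chapter_html)
    return " ".join(parser.title_parts), "\n".join(parser.text_parts)


if __name__ == "__main__":
    # Hypothetical markup, shaped after the selectors in the original script.
    index_html = '<dl><dt>Vol. 1</dt><dd><a href="/book/1.html">Ch. 1</a></dd></dl>'
    chapter_html = ('<div id="BookCon"><h1>Chapter 1</h1>'
                    '<div id="BookText"><p>First line.</p><br><p>Second line.</p></div></div>')
    print(chapter_urls(index_html, "http://www.jianlaixiaoshuo.com/"))
    # ['http://www.jianlaixiaoshuo.com/book/1.html']
    print(parse_chapter(chapter_html))
    # ('Chapter 1', 'First line.\nSecond line.')
```

To use it with the script above, pass response.text from each requests.get call to chapter_urls or parse_chapter. Treating every text node as its own line is a simplification; pyquery's .text() applies its own whitespace handling, so the output may not match it exactly.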