Python Web Scraper: Biquge (笔趣阁)

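On a whim I decided to scrape novels from Biquge (笔趣阁); I'm a longtime bookworm, after all. Without further ado, here is the code. It mainly uses requests and BeautifulSoup.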
from urllib.parse import quote
import time

import requests
from bs4 import BeautifulSoup

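content = input('Enter the title of the novel you want to find: ')
initial_content = content
# Percent-encode the keyword as GB2312 for the search URL
keyword = quote(initial_content, encoding='gb2312')
url = 'http://www.biquge.com.tw/modules/article/soshu.php?searchkey=' + keyword
print(url)
response = requests.get(url)   # Biquge search URL
response.encoding = response.apparent_encoding  # use the encoding detected from the page body
print(response.status_code)
html = response.text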
soup = BeautifulSoup(html, 'html.parser')
fileName = '/Users/john/Desktop/小说/'+initial_content+'.txt'
print(fileName)
file = open(fileName, 'a', encoding='utf-8')

chapters = soup.find_all(id='list')
info = soup.find_all(id='info')
for link in info:
    file.write(link.get_text())  # book and author info
download_soup = BeautifulSoup(str(chapters), 'html.parser')

arr = []
for child in download_soup.dl.children:    # all child nodes of the <dl>
    if hasattr(child, 'href') and child.a is not None:
        arr.append(child.get_text())
numbel = len(arr)
print(numbel)
index = 1
time1 = time.time()  # current time in seconds
for child in download_soup.dl.children:     # all child nodes of the <dl>
    if hasattr(child, 'href') and child.a is not None:
        file.write(child.get_text() + '\n' + '-----------------------------------------------' + '\n')
        url = 'http://www.biquge.com.tw/' + child.a['href']
        # print(url)
        response_dl = requests.get(url)
        response_dl.encoding = response_dl.apparent_encoding
        html_dl = response_dl.text
        soup_dl = BeautifulSoup(html_dl, 'html.parser')
        contents = soup_dl.find_all(id='content')   # matches include the <div id="content"> wrapper
        for link in contents:
            #print(link.get_text())
            file.write(link.get_text() + '\n\n')
        print("Downloaded: %.3f%%" % (index / numbel * 100))  # scraping progress
        index += 1
time2 = time.time()
tt = (time2 - time1)
print('Elapsed time: ' + str(tt) + ' seconds')
file.close()
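The search request in the script depends on percent-encoding the title as GB2312 rather than the UTF-8 default. Here is a minimal sketch of that step in isolation, using only the standard library. The helper name `build_search_url` and the sample keyword are assumptions for illustration; the base URL is the one used in the script.

```python
from urllib.parse import quote, unquote

# Search endpoint taken from the script above.
SEARCH_URL = 'http://www.biquge.com.tw/modules/article/soshu.php?searchkey='


def build_search_url(title, encoding='gb2312'):
    """Hypothetical helper: percent-encode `title` and append it to the search URL."""
    return SEARCH_URL + quote(title, encoding=encoding)


if __name__ == '__main__':
    title = '中文'  # sample keyword, chosen only for illustration
    url = build_search_url(title)
    print(url)
    # Decoding with the same encoding recovers the original keyword.
    encoded = url[len(SEARCH_URL):]
    print(unquote(encoded, encoding='gb2312'))
```

With the default `errors='strict'`, `quote` raises `UnicodeEncodeError` for a title containing characters outside GB2312, so a real script may want to catch that case.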
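Type the title into the console and the novel is written to the 小说 folder on the desktop. I'm on macOS; on Windows you only need to change the file save path. This was mainly an exercise in BeautifulSoup, an essential tool for web scraping. That's all.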
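One way to avoid editing the save path for each operating system is to build it from the user's home directory. This is a sketch, assuming the novels should go in a `小说` folder on the desktop, as in the script. The helper name `novel_path` and its optional `base` parameter are hypothetical.

```python
import tempfile
from pathlib import Path


def novel_path(title, base=None):
    """Hypothetical helper: return <base>/<title>.txt, creating <base> if needed.

    When `base` is omitted it defaults to ~/Desktop/小说 (an assumption
    mirroring the folder used in the script).
    """
    folder = Path(base) if base is not None else Path.home() / 'Desktop' / '小说'
    folder.mkdir(parents=True, exist_ok=True)  # open() fails if the folder is missing
    return folder / (title + '.txt')


if __name__ == '__main__':
    # Demonstrated in a temporary directory so nothing is written to the desktop.
    with tempfile.TemporaryDirectory() as tmp:
        path = novel_path('example', base=tmp)  # 'example' is a placeholder title
        with open(path, 'a', encoding='utf-8') as file:
            file.write('chapter text goes here\n')
        print(path.name)
```

In the scraper, `fileName = novel_path(initial_content)` would stand in for the hard-coded string; `open()` accepts the returned `Path` directly.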