Beginner Web Scraper: Downloading a Specific Novel from the 17k Novel Site

Latest recommended article published on 2024-04-21 08:52:18

loonslo_

Licensed under CC 4.0 BY-SA

Category: Web scraping. Tags: web scraping, python

Article link: https://blog.youkuaiyun.com/weixin_40508682/article/details/89443068

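This article shows how to use basic Python scraping techniques, with the BeautifulSoup4 and requests libraries, to download a specific novel from the 17k novel site. Both packages must be installed before you start.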

Before running the .py file, install these two packages:
pip install beautifulsoup4
pip install requests
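
To confirm that both packages can be imported before running the script, a check along the following lines may help. This is only a minimal sketch: `missing_packages` is a hypothetical helper that is not part of the original script, and it assumes the packages are imported as `bs4` and `requests`, which matches the import statements in the script itself.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment.

    Hypothetical helper for illustration; not part of the original script.
    """
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # beautifulsoup4 is imported as "bs4"; requests is imported as "requests".
    missing = missing_packages(["bs4", "requests"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching pip command above with the same interpreter that will run the script.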

#!/usr/bin/env python3.7
# -*- coding: utf-8 -*-
# author by slo


from bs4 import BeautifulSoup
import requests


class DownLoader(object):
    def __init__(self):
        self.url = 'http://www.17k.com'
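        # Table-of-contents page of the novel you want to download
        self.target = 'http://www.17k.com/list/349579.html'
        # Chapter names of the novel
        self.names = []
        # URL of each corresponding chapter
        self.urls = []
        # Number of chapters downloaded
        self.nums = []

    # Collect the download links
    def get_download_url(self):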
        html = requests.get(url=self.target).content.decode('utf-8')
        soup