Scraping Books from Douban with CSS Selectors

Latest recommended article published on 2023-03-17 22:32:41

Lgs_ning

License: CC 4.0 BY-SA

Tags: basic web scraping

Article link: https://blog.youkuaiyun.com/Lgs_ning/article/details/82726852

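This article shows how to use the requests and BeautifulSoup libraries with CSS selectors to scrape book information from Douban: title, author, publisher, and publication date. It also lists the details of the books that were scraped.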

Main techniques: working knowledge of requests and BeautifulSoup
Book listing URL to scrape: "https://book.douban.com/latest?icn=index-latestbook-all"
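
Before the full script, here is a minimal sketch of how CSS selectors work in BeautifulSoup. It runs on an inline HTML string rather than the live site. The markup, the class names (`book-list`, `meta`), and the helper `extract_books` are all hypothetical and chosen for illustration only; the real Douban page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from Douban.
SAMPLE_HTML = """
<ul class="book-list">
  <li>
    <h2><a href="/subject/1/">Book One</a></h2>
    <p class="meta">Author A / Publisher X / 2018-09</p>
  </li>
  <li>
    <h2><a href="/subject/2/">Book Two</a></h2>
    <p class="meta">Author B / Publisher Y / 2018-08</p>
  </li>
</ul>
"""


def extract_books(html):
    """Return a list of dicts with title, author, publisher, and date."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # select() returns every element matching the CSS selector
    for item in soup.select("ul.book-list > li"):
        # select_one() returns the first match, or None if nothing matches
        title_tag = item.select_one("h2 a")
        meta_tag = item.select_one("p.meta")
        if title_tag is None or meta_tag is None:
            continue
        # Assumed "author / publisher / date" layout of the meta line
        parts = [part.strip() for part in meta_tag.get_text().split("/")]
        if len(parts) != 3:
            continue
        author, publisher, date = parts
        books.append({
            "title": title_tag.get_text(strip=True),
            "author": author,
            "publisher": publisher,
            "date": date,
        })
    return books


books = extract_books(SAMPLE_HTML)
for book in books:
    print(book)
```

To apply the same pattern to a real page, replace the selectors with ones taken from the page source as seen in the browser's developer tools.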

Code

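import requests
from bs4 import BeautifulSoup


def get_film(url):
    """Fetch the page at url and return its HTML text."""
    # Send a browser-like User-Agent so the site is less likely to block the request
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        # Decode using the encoding detected from the response body
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "Scrape failed!"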

def parse_html(html, List):
    film_name1 = []
    film_actor1 = []
    film_actor2 = []
    soup = BeautifulSoup(html, 'html.parser')
    for name in soup.select(