Scraping the Douban TOP250 Movies with Python: A Hands-On Project

Latest recommended article published on 2025-12-03 20:31:34

License: CC 4.0 BY-SA

Article tags: artificial intelligence

Scraping Douban TOP250 Movie Data with Python

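Scraping the Douban TOP250 movie data is a common hands-on project, well suited to practicing web scraping in Python. What follows is a complete implementation covering sending requests, parsing pages, and storing the data.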

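Preparation

Install the following Python libraries:

requests: sends HTTP requests
BeautifulSoup (installed as beautifulsoup4): parses HTML pages
pandas: stores the data as a CSV file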

pip install requests beautifulsoup4 pandas

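Sending Requests to Fetch Pages

The Douban TOP250 list is paginated: 25 entries per page, 10 pages in total. Each page's URL has to be constructed and then fetched with a GET request.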
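
The pagination described above can be sketched as follows. This is a minimal illustration under stated assumptions: the base URL `https://movie.douban.com/top250` and its `start` offset parameter are not given in the text above, and `build_page_urls` is a hypothetical helper name.

```python
# Assumed list address and paging scheme; verify against the live site.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # entries per page, as stated in the text
PAGE_COUNT = 10  # total pages, as stated in the text


def build_page_urls(base_url=BASE_URL, page_size=PAGE_SIZE, page_count=PAGE_COUNT):
    """Return one URL per page, using the offset of the page's first entry."""
    return [f"{base_url}?start={page * page_size}" for page in range(page_count)]


urls = build_page_urls()
print(urls[0])   # https://movie.douban.com/top250?start=0
print(urls[-1])  # https://movie.douban.com/top250?start=225
```

Each URL produced this way can then be passed to a fetch function one page at a time.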

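import requests
from bs4 import BeautifulSoup
import pandas as pd

def get_page(url):
    """Fetch one page and return its HTML text, or None on failure."""
    # Send a browser-like User-Agent; some sites reject the default one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    # Without a timeout, a stalled connection can hang the script indefinitely.
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Request failed, status code: {response.status_code}")
        return None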
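
Since `get_page` returns `None` when a request fails, a caller looping over all pages has to skip those results. One possible way to do that is sketched below; `collect_pages` and its `delay` parameter are hypothetical names introduced for illustration, and the pause between requests is an assumption about polite crawling rather than something the text above prescribes. The fetch function is passed in as an argument so the sketch can be tried without network access.

```python
import time


def collect_pages(fetch, urls, delay=1.0):
    """Call fetch(url) for each URL and keep only the successful results.

    fetch is any callable that returns the page HTML, or None on failure
    (the contract get_page follows). delay is the pause in seconds between
    consecutive requests.
    """
    pages = []
    for index, url in enumerate(urls):
        html = fetch(url)
        if html is not None:
            pages.append(html)
        # Pause between requests, but not after the last one.
        if index < len(urls) - 1:
            time.sleep(delay)
    return pages


# Stand-in fetch function for demonstration: the second URL "fails".
fake_responses = {"page-1": "<html>1</html>", "page-2": None, "page-3": "<html>3</html>"}
print(collect_pages(fake_responses.get, ["page-1", "page-2", "page-3"], delay=0))
```

With the real function, the call would look like `collect_pages(get_page, urls)`.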

Parsing Page Data

Use BeautifulSoup to parse the HTML page and extract each movie's title, rating, number of ratings, and other fields.
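
To see how this kind of extraction behaves, the sketch below runs BeautifulSoup on a small hand-written fragment. The fragment is an assumption: it only imitates the class names that the parsing code in this section looks for (`item`, `title`, `rating_num`, `star`) and is not a copy of the real page markup, and the movie data in it is made up.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one list entry; not the real Douban markup.
SAMPLE_HTML = """
<div class="item">
  <em>1</em>
  <span class="title">Example Movie</span>
  <span class="title">Alternate Title</span>
  <div class="star">
    <span class="rating_num">9.7</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("div", class_="item")

# find() returns the first match, so only the first title span is used.
rank = item.find("em").get_text()
title = item.find("span", class_="title").get_text()
rating = item.find("span", class_="rating_num").get_text()

# get_text() returns strings; convert them when numbers are needed.
print(int(rank), title, float(rating))  # 1 Example Movie 9.7
```

Note that `find` returns `None` when nothing matches, so calling `get_text()` on a missing element raises `AttributeError`; real pages may need a check for that.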

def parse_page(html):
    soup = BeautifulSoup(html, 'html.parser')
    # The code treats each <div class="item"> as one movie entry.
    items = soup.find_all('div', class_='item')
    movies = []
    for item in items:
        rank = item.find('em').get_text()  # position in the list
        title = item.find('span', class_='title').get_text()  # first title span only
        rating = item.find('span', class_='rating_num').get_text()  # score, as text
        num_reviews = item.find('div', class_='star').find_all('span