Daily Practice: Scraping Weibo Hot Search Data into Excel with Python

Article link: https://blog.youkuaiyun.com/qq_59078658/article/details/145915522

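🎯 Project Result

Automatically fetches the top 50 entries of Weibo's real-time hot search list
Stores each entry's title, heat value, category, and related fields in structured form
Writes the result to an Excel file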

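🔍 Scraper Logic

Technical Roadmap

Request the API → Parse the JSON → Clean the data → Save to Excel

Core Steps

API request
Inspecting the Weibo web client reveals the endpoint that serves the hot search data: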
https://weibo.com/ajax/side/hotSearch

Simulating a browser
Add the following request headers so the request looks like ordinary browser traffic:

headers = {
    'cookie': 'YOUR_COOKIE',         # ← replace with your actual value
    'user-agent': 'YOUR_USER_AGENT', # ← replace with your actual value
    'referer': 'https://weibo.com/'
}
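
Since both placeholders must be replaced before the request can succeed, a small guard can catch a forgotten one early. The following is a minimal sketch, not part of the original script: the helper name `build_headers` and the sample values `key=value` and `Mozilla/5.0 (example)` are made up for illustration.

```python
def build_headers(cookie: str, user_agent: str) -> dict:
    """Assemble the three request headers, rejecting unreplaced placeholders."""
    for name, value in (("cookie", cookie), ("user-agent", user_agent)):
        # The tutorial's placeholders all start with "YOUR_"
        if not value or value.startswith("YOUR_"):
            raise ValueError(f"{name} still holds a placeholder; paste the real value")
    return {
        "cookie": cookie,
        "user-agent": user_agent,
        "referer": "https://weibo.com/",
    }


# Hypothetical values, shown only to demonstrate the call
headers = build_headers("key=value", "Mozilla/5.0 (example)")
```

The returned dict can be passed as the `headers` argument of `requests.get`.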

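Data parsing
The endpoint returns JSON. The key structure is shown below (values such as "文娱", meaning entertainment, and "热", meaning hot, come back in Chinese):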

{
  "data": {
    "realtime": [
      {
        "rank": 1,
        "word": "hot search keyword",
        "num": 3855433,
        "category": "文娱",
        "label_name": "热"
      }
    ]
  }
}
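
To make the structure concrete, here is a minimal sketch that walks a payload of this shape and pulls out the five fields, using only the standard library. The function name `extract_rows` is hypothetical, and the second sample entry is invented to show how missing fields fall back to defaults.

```python
import json

# Sample payload in the shape shown above; the second entry is made up
# and deliberately omits some fields.
SAMPLE = """
{"data": {"realtime": [
  {"rank": 1, "word": "hot search keyword", "num": 3855433,
   "category": "文娱", "label_name": "热"},
  {"word": "second topic", "num": 120000}
]}}
"""


def extract_rows(payload: dict) -> list:
    """Return one flat dict per entry under data.realtime."""
    rows = []
    for item in payload.get("data", {}).get("realtime", []):
        rows.append({
            "rank": item.get("rank", 0),
            "word": item.get("word", ""),
            "num": item.get("num", 0),
            "category": item.get("category", ""),
            "label_name": item.get("label_name", ""),
        })
    return rows


rows = extract_rows(json.loads(SAMPLE))
print(len(rows))          # 2
print(rows[0]["num"])     # 3855433
print(rows[1]["rank"])    # 0 (field missing, default used)
```

Using `.get` with defaults at every level means a payload without `data` or `realtime` yields an empty list instead of raising `KeyError`.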

🛠️ Complete Implementation

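import json
from urllib.parse import quote

import requests
import pandas as pd

url = "https://weibo.com/ajax/side/hotSearch"
# Request headers (fill in your own values)
headers = {
    'cookie': 'YOUR_COOKIE_HERE',      # replace with your actual Cookie
    'user-agent': 'YOUR_USER_AGENT',   # replace with your browser's User-Agent
    'referer': 'https://weibo.com/'
}

# A timeout keeps the script from hanging if the server never responds
response = requests.get(url=url, headers=headers, timeout=10)

if response.status_code == 200:
    try:
        data = json.loads(response.text)
        content = data['data']['realtime']

        # Collect one dict per hot search entry
        hot_list = []

        for item in content:
            # Adjust the key names if the fields returned by the endpoint change
            hot_item = {
                'Rank': item.get('rank', 0),
                'Title': item.get('word', 'N/A'),
                'Heat': item.get('num', 0),  # discussion volume
                'Category': item.get('category', 'N/A'),
                'Label': item.get('label_name', 'N/A'),
                # Percent-encode the keyword so it is safe inside the query string
                'Link': f"https://s.weibo.com/weibo?q={quote(item.get('word', ''))}"
            }
            hot_list.append(hot_item)

        # Convert to a DataFrame
        df = pd.DataFrame(hot_list)

        # Save to Excel (pandas needs the openpyxl package to write .xlsx files)
        df.to_excel('weibo_hotsearch.xlsx', index=False)
        print("Data saved to weibo_hotsearch.xlsx")

    except json.JSONDecodeError:
        print("The response is not valid JSON")
        print(response.text)
    except KeyError as e:
        print(f"Missing data field: {e}")
        print("The response structure may have changed; inspect the response body")
else:
    print("Request failed, status code:", response.status_code)
    print("Response body:", response.text)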
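
Because each link embeds the keyword in a query string, keywords containing spaces, `&`, `#`, or non-ASCII characters need percent-encoding to survive intact. The sketch below shows one way to do this with the standard library's `urllib.parse.quote`; the helper name `search_link` is hypothetical.

```python
from urllib.parse import quote


def search_link(word: str) -> str:
    """Build a Weibo search URL with the keyword percent-encoded."""
    # safe="" also encodes "/", so nothing in the keyword can alter the URL
    return f"https://s.weibo.com/weibo?q={quote(word, safe='')}"


print(search_link("a b&c"))  # https://s.weibo.com/weibo?q=a%20b%26c
print(search_link("热搜"))    # https://s.weibo.com/weibo?q=%E7%83%AD%E6%90%9C
```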

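📌 Usage Guide

Configuration

Getting the Cookie

Log in to Weibo in your browser, then press F12 → Network → refresh the page → copy the Cookie value from a request's headers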
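
A copied Cookie is a login credential, so it may be preferable to keep it out of the script itself. The following is a minimal sketch that reads it from an environment variable instead. The variable name `WEIBO_COOKIE` and the helper `load_cookie` are assumptions made for illustration, not something Weibo or the original script defines.

```python
import os


def load_cookie(env_var: str = "WEIBO_COOKIE", environ=None) -> str:
    """Read the Cookie string from an environment variable.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    cookie = env.get(env_var, "").strip()
    if not cookie:
        raise RuntimeError(f"Set {env_var} to the Cookie copied from the browser")
    return cookie


# Demonstration with a made-up value; a real run would rely on os.environ
demo_cookie = load_cookie(environ={"WEIBO_COOKIE": "  key=value  "})
print(demo_cookie)  # key=value
```

In the script, `'cookie': load_cookie()` would then replace the hard-coded placeholder.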