Scraping the Weibo Hot Search List with Simple Python (Saved as an .xls Spreadsheet)

Article link: https://blog.youkuaiyun.com/VinciB/article/details/113983736

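This post presents a Python script that scrapes the Weibo hot search list using the requests and BeautifulSoup libraries. The script fetches the HTML page with requests, parses it with BeautifulSoup to extract each entry's title, link, and hotness value, and then saves the data to an Excel file. The scraper only handles static pages, so it is best suited to learning basic data scraping.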


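Key points to note when scraping the hot search list:

The code first groups the data: each entry's title, link, and click count are collected into one list, and they are split into separate columns only when they are stored.
Note how the data is extracted: with soup.select.
requests fetches the HTML for the URL, and BeautifulSoup then finds the needed content in it.
Calling soup.select() returns the matching elements.
The key is to locate the needed content precisely, which is done with the CSS selector passed inside the parentheses: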

# -*- coding: utf-8 -*-
# @Time : 2021/2/22 20:04
# @Author : Vinci
# @File : 爬取热搜.py
# @Software: PyCharm

from bs4 import BeautifulSoup   # parse the page and extract data (the 'lxml' parser must be installed)
import xlwt                     # write Excel (.xls) files
import requests                 # fetch the page over HTTP


def main():
    savepath = "热搜榜.xls"  # output Excel file, written to the current directory
    baseurl = "https://s.weibo.com/top/summary/"
    r = requests.get(baseurl)
    soup = BeautifulSoup(r.text, 'lxml')
    # Titles and links are in the <a> tags; hotness values are in the <span> tags
    findtitle = soup.select('#pl_top_realtimehot > table > tbody > tr > td.td-02 > a')
    findhot = soup.select('#pl_top_realtimehot > table > tbody > tr > td.td-02 > span')
    news = []
    # The first <a> is skipped, so title i + 1 pairs with hotness i.
    # The title and the URL must come from the same <a> element.
    for i in range(min(len(findtitle) - 1, len(findhot))):
        entry = findtitle[i + 1]
        new = {}
        new['title'] = entry.get_text()
        new['url'] = "https://s.weibo.com" + entry['href']
        new['hotness'] = findhot[i].get_text()
        news.append(new)
    book = xlwt.Workbook(encoding="utf-8", style_compression=0)  # create the workbook object
    sheet = book.add_sheet('Hot Search', cell_overwrite_ok=True)  # create the worksheet
    col = ("Title", "Link", "Hotness")
    for i in range(len(col)):
        sheet.write(0, i, col[i])  # header row
    # One row per entry, starting below the header
    for row, item in enumerate(news, start=1):
        sheet.write(row, 0, item['title'])
        sheet.write(row, 1, item['url'])
        sheet.write(row, 2, item['hotness'])
    book.save(savepath)

if __name__ == "__main__":  # runs only when the file is executed directly
    # call the main function
    main()
    print("Scraping finished")
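
To make the selector-based lookup easier to try without hitting the live site, here is a minimal sketch that runs soup.select() against a small inline HTML sample. The sample markup, the parse_rows helper, and the assumption that the first row is a pinned entry without a hotness value are all hypothetical and only imitate the layout the selectors above target. It also uses Python's built-in "html.parser" instead of lxml so that nothing beyond bs4 is needed. Unlike the script above, it selects each td.td-02 cell first and reads the link and the hotness from the same cell, so the two cannot drift out of step.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup imitating the table layout the selectors target.
SAMPLE_HTML = """
<div id="pl_top_realtimehot">
  <table>
    <tbody>
      <tr><td class="td-01"></td><td class="td-02"><a href="/weibo?q=pinned">Pinned topic</a></td></tr>
      <tr><td class="td-01">1</td><td class="td-02"><a href="/weibo?q=a">Topic A</a><span>1200</span></td></tr>
      <tr><td class="td-01">2</td><td class="td-02"><a href="/weibo?q=b">Topic B</a><span>800</span></td></tr>
    </tbody>
  </table>
</div>
"""


def parse_rows(html):
    """Return one dict per table cell that has both a link and a hotness value."""
    soup = BeautifulSoup(html, "html.parser")
    news = []
    # Select whole cells, then look up the <a> and <span> inside each one.
    for cell in soup.select("#pl_top_realtimehot > table > tbody > tr > td.td-02"):
        link = cell.select_one("a")
        heat = cell.select_one("span")
        if link is None or heat is None:
            continue  # skip cells without a hotness value (assumed pinned entry)
        news.append({
            "title": link.get_text(strip=True),
            "url": "https://s.weibo.com" + link["href"],
            "hotness": heat.get_text(strip=True),
        })
    return news


rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because each dict is built from a single cell, the same list can be passed straight to the sheet.write() loop in the script above.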