Scraping Xiaohongshu with Python

Latest recommended article published on 2025-04-30 17:50:32

小五咔咔咔

Tags: python, programming languages

Python learning resources:

https://edu.51cto.com/video/3832.html

https://edu.51cto.com/video/4102.html

https://edu.51cto.com/video/1158.html

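An introduction to scraping Xiaohongshu data with Python

Xiaohongshu is a popular social e-commerce platform where users share shopping tips and moments from everyday life. This article shows how to use Python to scrape Xiaohongshu data, including user information and note content.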

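Environment setup

Before you start, make sure the following libraries are installed in your Python environment:

requests: sends HTTP requests.
BeautifulSoup: parses HTML documents (installed as the beautifulsoup4 package).
pandas: handles data processing and export.

You can install these libraries with the following command:

pip install requests beautifulsoup4 pandas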

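Scraping Xiaohongshu user information

We start with user information as an example of how to scrape data with Python.

Send an HTTP request to fetch the HTML of the user profile page.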
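The request step could be sketched as follows. This is a minimal sketch, not the author's original code: the helper name fetch_html, the User-Agent value, the 10-second timeout, and the example URL are all assumptions made for illustration, and the real site may require additional headers or cookies.

```python
import requests

# Assumed browser-like header; the site may require other headers or cookies
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising on an HTTP error status."""
    # Use the supplied session if there is one, otherwise the module-level requests API
    client = requests if session is None else session
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


# Usage (the URL is a placeholder, not a real profile address):
# html = fetch_html("https://example.com/user/profile/USER_ID")
```

Passing a requests.Session as session lets several requests share cookies and connections. The returned text can be assigned to html for the parsing step.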

Parse the HTML with BeautifulSoup and extract the user information.

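from bs4 import BeautifulSoup

# html is the page source obtained in the previous step
soup = BeautifulSoup(html, 'html.parser')
# The tag and class names below are placeholders; adjust them to the actual page structure
user_info = soup.find('div', class_='user-info')
if user_info is None:
    raise ValueError('user-info block not found; check the page structure')
username = user_info.find('h2').get_text(strip=True)
followers = user_info.find('span', class_='count').get_text(strip=True)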

Store the scraped data in a pandas DataFrame.
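The storage step might look like the sketch below. The values assigned to username and followers are placeholders standing in for the parsed fields, and the name user_df is an assumption; the conversion with int() only works if the follower count is a plain digit string.

```python
import pandas as pd

# Placeholder values standing in for the fields parsed from the profile page
username = "example_user"
followers = "1234"

# One row per user; int() assumes the follower count is a plain digit string
user_df = pd.DataFrame([{"username": username, "followers": int(followers)}])
print(user_df)

# to_csv() with no path returns the CSV text; pass a filename to write a file instead
csv_text = user_df.to_csv(index=False)
```

Calling user_df.to_csv with a filename as its first argument writes the same data to disk, which covers the export use mentioned for pandas.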

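Scraping Xiaohongshu note content

Next, we look at how to scrape the content of a user's notes.

Get the URL of the user's note list page.

Send an HTTP request to fetch the HTML of the note list page.

Parse the HTML with BeautifulSoup and extract the note links.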
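A possible sketch of the link extraction is shown here. The helper name find_note_links, the class name note-link, and the sample HTML are hypothetical; the real selector has to be read from the actual page. Relative links are rewritten to absolute URLs with urljoin so that they can be passed directly to requests.get.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_note_links(list_html, base_url, link_class="note-link"):
    """Return <a> tags for notes, with each href rewritten to an absolute URL.

    link_class is a hypothetical class name; inspect the real page for the right one.
    """
    soup = BeautifulSoup(list_html, "html.parser")
    # Keep only anchors that have the given class and an href attribute
    links = soup.find_all("a", class_=link_class, href=True)
    for a in links:
        a["href"] = urljoin(base_url, a["href"])
    return links


# Small inline sample so the sketch runs without any network access
sample_html = """
<div class="notes">
  <a class="note-link" href="/note/1">First note</a>
  <a class="note-link" href="https://example.com/note/2">Second note</a>
  <a class="other" href="/about">About</a>
</div>
"""
notes_links = find_note_links(sample_html, "https://example.com")
print([link["href"] for link in notes_links])
```

The function returns tag objects rather than strings, so each element still supports link['href'] when the links are iterated.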

Iterate over the note links and scrape the full content of each note.

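import pandas as pd
import requests
from bs4 import BeautifulSoup

rows = []  # collect one dict per note, then build the DataFrame once at the end
for link in notes_links:
    note_url = link['href']  # URL of the note; must be absolute for requests.get
    note_response = requests.get(note_url, timeout=10)
    note_response.raise_for_status()
    note_html = note_response.text

    note_soup = BeautifulSoup(note_html, 'html.parser')
    # Tag and class names are placeholders; adjust them to the actual page structure
    title = note_soup.find('h1').text.strip()
    content = note_soup.find('div', class_='note-content').text.strip()

    # Store the scraped data for this note
    rows.append({'title': title, 'content': content})

# Building the DataFrame once avoids calling pd.concat on every iteration
df = pd.DataFrame(rows, columns=['title', 'content'])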