A Practical Guide to Fetching All Products of a Shop with a Python Crawler

Originally published 2025-08-27 15:04:22

License: CC BY-SA 4.0

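In e-commerce, accurate information about every product in a shop is essential for market analysis, product selection, and competitive intelligence. 1688, a leading B2B e-commerce platform in China, offers rich product data and API access. With a Python crawler, we can collect the product information of a 1688 shop efficiently. This article walks through how to do that and provides complete code examples.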

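I. Preparation

(1) Python development environment

Make sure Python is installed in your development environment, along with the following libraries:

requests: sends HTTP requests.
BeautifulSoup: parses HTML.
pandas: handles data processing and storage.

Install them with the following command: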

bash

pip install requests beautifulsoup4 pandas
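
Before moving on, it can help to confirm that the three packages are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is an assumption made for illustration. Note that beautifulsoup4 is imported under the name `bs4`.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    # beautifulsoup4 is imported as "bs4", not "beautifulsoup4".
    missing = missing_packages(["requests", "bs4", "pandas"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the pip command above in the same Python environment that runs the crawler.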

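(2) Register a 1688 Open Platform account

Register as a developer on the 1688 Open Platform and create an application to obtain an AppKey and AppSecret. These credentials are used to build requests to the API.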
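
Credentials such as the AppKey and AppSecret are usually better kept out of source code. As a minimal sketch, they could be read from environment variables; the variable names `ALI1688_APP_KEY` and `ALI1688_APP_SECRET` and the helper `load_credentials` are hypothetical and chosen only for this example.

```python
import os

def load_credentials(env=os.environ):
    """Read the AppKey/AppSecret pair from a mapping of environment variables.

    ALI1688_APP_KEY and ALI1688_APP_SECRET are hypothetical names
    used for illustration only.
    """
    app_key = env.get("ALI1688_APP_KEY")
    app_secret = env.get("ALI1688_APP_SECRET")
    if not app_key or not app_secret:
        raise RuntimeError("Set ALI1688_APP_KEY and ALI1688_APP_SECRET first.")
    return app_key, app_secret
```

Passing the mapping as a parameter makes the function easy to exercise with a plain dict, without touching the real environment.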

II. Writing the crawler code

(1) Sending the HTTP request

Use the requests library to send a GET request and retrieve the HTML of the shop's product list page.

Python

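import requests

def get_html(url):
    """Fetch a page and return its HTML text, or None on failure."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    try:
        # A timeout keeps the crawler from hanging on an unresponsive server.
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    if response.status_code == 200:
        return response.text
    print(f"Request failed, status code: {response.status_code}")
    return None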
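
Since get_html returns None on failure, transient errors can be smoothed over by retrying. The following is a minimal sketch of retry with exponential backoff; the wrapper `fetch_with_retry` and its parameters are assumptions for illustration. It accepts any callable that returns the page text on success and None on failure.

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-None value or retries run out.

    `fetch` is any callable with the same contract as get_html:
    it returns the page text on success and None on failure.
    """
    for attempt in range(retries):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < retries - 1:
            # Wait backoff, 2*backoff, 4*backoff, ... between attempts.
            sleep(backoff * (2 ** attempt))
    return None
```

A call would then look like `html = fetch_with_retry(get_html, url)`. The `sleep` parameter exists only so the delay can be replaced in tests.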

(2) Parsing the HTML

Use BeautifulSoup to parse the HTML and extract the product details.

Python

from bs4 import BeautifulSoup

def _text(item, tag, class_name):
    """Return the stripped text of the first matching child, or '' if absent."""
    node = item.find(tag, class_=class_name)
    return node.text.strip() if node else ''

def parse_html(html):
    """Extract title, price and description from each product card."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    # The class names below depend on the page layout and may need updating.
    for item in soup.find_all('div', class_='sm-offer-item'):
        products.append({
            'title': _text(item, 'a', 'offer-title'),
            'price': _text(item, 'span', 'price'),
            'description': _text(item, 'div', 'description'),
        })
    return products
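
parse_html stores the price as raw text, which is awkward to sort or aggregate later. As a minimal sketch, the text could be converted to a number. The helper `parse_price` is hypothetical, and it assumes price strings look like "¥12.50" or a range such as "12.50-18.00", which is a guess about the page content rather than something the page guarantees.

```python
import re

def parse_price(text):
    """Return the first number found in a price string as a float, or None.

    For a range such as "12.50-18.00" this yields the lower bound.
    """
    # Drop thousands separators, then take the first integer or decimal.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None
```

Strings with no digits at all produce None, so callers can tell "no price listed" apart from a price of zero.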

(3) Searching products by keyword

Build the search URL from a keyword and retrieve the HTML of the search results page.

Python

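from urllib.parse import urlencode

def search_products(keyword, page=1):
    """Fetch one page of search results for a keyword."""
    base_url = "https://s.1688.com/selloffer/offer_search.htm"
    # urlencode escapes non-ASCII and reserved characters in the keyword.
    query = urlencode({'keywords': keyword, 'pageno': page})
    html = get_html(f"{base_url}?{query}")
    if html:
        return parse_html(html)
    return []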

(4) Putting it all together

Combine the functions above in a main program to form the complete crawler.

Python

import pandas as pd

def main():
    keyword = "女装"  # search keyword ("women's clothing")
    max_pages = 3
    all_products = []

    for page in range(1, max_pages + 1):
        products = search_products(keyword, page)
        all_products.extend(products)
        print(f"Page {page}: fetched {len(products)} products.")

    df = pd.DataFrame(all_products)
    df.to_csv('shop_products.csv', index=False, encoding='utf-8')
    print("Data saved to shop_products.csv")

if __name__ == "__main__":
    main()
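
The loop in main always requests a fixed number of pages with no pause between them. One possible refinement, sketched below under stated assumptions, is to stop at the first empty page and wait briefly between requests. The helper `collect_pages` and its `delay` parameter are hypothetical. The sketch also assumes an empty page means the results are exhausted, although a failed request produces an empty list as well.

```python
import time

def collect_pages(search, keyword, max_pages, delay=1.0, sleep=time.sleep):
    """Collect results page by page, stopping at the first empty page.

    `search` is any callable shaped like search_products(keyword, page).
    """
    all_products = []
    for page in range(1, max_pages + 1):
        products = search(keyword, page)
        if not products:
            break  # an empty page is treated as the end of the results
        all_products.extend(products)
        if page < max_pages:
            sleep(delay)  # pause between requests
    return all_products
```

In main, the loop could then be replaced by `all_products = collect_pages(search_products, keyword, max_pages)`.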