Python Web Scraping in Practice: Fetching 1688 Product Details

Latest recommended article published on 2025-10-17 11:52:41

License: CC 4.0 BY-SA

Article tags:

#python #web-scraping #programming-languages


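1688 is one of China's leading B2B e-commerce platforms and hosts a very large product catalog. With a Python scraper we can collect the details of a 1688 product, including its name, price, images, and description. This article walks through how to do that and provides complete code examples.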

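I. Preparation

(1) Register a 1688 Open Platform account

First, register a developer account on the 1688 Open Platform and create an application to obtain an App Key and an App Secret. These credentials are used for calls to the official API; the page-scraping code in the rest of this article does not use them.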
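If you do obtain an App Key and App Secret, it is usually safer to keep them out of the source code. The following is a minimal sketch that reads them from environment variables. The variable names `ALI1688_APP_KEY` and `ALI1688_APP_SECRET` are hypothetical names chosen for this example; they are not defined by 1688.

```python
import os

def load_credentials(env=os.environ):
    """Read the app credentials from environment variables.

    ALI1688_APP_KEY and ALI1688_APP_SECRET are hypothetical names
    chosen for this example, not names defined by 1688.
    """
    try:
        return env["ALI1688_APP_KEY"], env["ALI1688_APP_SECRET"]
    except KeyError as missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

Passing the mapping as a parameter makes the function easy to exercise with a plain dictionary, without touching the real environment.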

(2) Install the required Python libraries

Install the following Python libraries, which are used to send HTTP requests, parse HTML, and drive a browser:

bash

pip install requests beautifulsoup4 selenium

selenium is needed only when the page content is loaded dynamically; the command above already installs it.
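One detail that is easy to miss is that the pip package name and the import name can differ: `beautifulsoup4` is imported as `bs4`. The sketch below, which uses only the standard library, reports which of the three packages cannot be found in the current environment. The helper name `missing_packages` is made up for this example.

```python
import importlib.util

# Maps each pip package name to the module name used in `import`.
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "selenium": "selenium"}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Calling `missing_packages()` returns an empty list once everything is installed.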

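(3) Download ChromeDriver

Selenium drives Chrome through ChromeDriver, whose version must match the installed browser and whose location must be discoverable (for example, on PATH). With Selenium 4.6 or later, the bundled Selenium Manager can usually find or download a matching driver automatically, so a manual download is mainly needed for older versions or offline machines.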
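To check whether a manually downloaded driver is actually visible on PATH, a small standard-library check like the following can help. It is only a sketch: it confirms that an executable named `chromedriver` can be found, not that its version matches the browser.

```python
import shutil

def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None."""
    return shutil.which(name)

path = find_driver()
print(path if path else "chromedriver not found on PATH")
```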

II. Implementing the scraper

(1) Send the HTTP request

Use the requests library to send a GET request and retrieve the HTML of the product page:

Python

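import requests

def get_html(url):
    # Browser-like User-Agent; many sites reject the default requests one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    # A timeout keeps a stalled connection from hanging the scraper.
    response = requests.get(url, headers=headers, timeout=10)
    # Raise requests.HTTPError on 4xx/5xx instead of returning an error page.
    response.raise_for_status()
    return response.text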
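Network requests fail intermittently, so a scraper often wraps the fetch in a retry loop. The following is a minimal sketch of retrying with exponential backoff; it is not part of the original article's code. `fetch_with_retry` and its parameters are names invented for this example, and `fetch` can be any callable that returns the page text or raises on failure, such as `get_html`.

```python
import time

def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on any exception with exponential backoff.

    `fetch` is any callable that returns the page text or raises on failure.
    `sleep` is injectable so the delays can be observed without waiting.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A call such as `fetch_with_retry(get_html, url)` makes up to three attempts. Catching every exception is a simplification; a real scraper would likely retry only on network errors and server-side status codes.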

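(2) Parse the HTML

Use BeautifulSoup to parse the HTML and extract the product details. The tag and class names in this example are illustrative; confirm them against the actual page markup before relying on them: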

Python

from bs4 import BeautifulSoup

def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')

    def text_of(tag_name, class_name):
        # Return the stripped text of the first match, or None if it is absent,
        # so a missing element does not raise AttributeError.
        tag = soup.find(tag_name, class_=class_name)
        return tag.text.strip() if tag else None

    # The main image may be missing or may lack a src attribute.
    image = soup.find('img', class_='main-image')
    return {
        'product_name': text_of('h1', 'product-title'),
        'product_price': text_of('span', 'price'),
        'product_description': text_of('div', 'product-description'),
        'product_image': image.get('src') if image else None,
    }
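The `src` value of an image is not always a full URL. It may be protocol-relative (starting with `//`) or relative to the site root. The sketch below resolves such a value against the page URL with the standard library's `urljoin`. The helper name `absolute_url` and the `img.example.com` host are made up for this example.

```python
from urllib.parse import urljoin

def absolute_url(page_url, src):
    """Resolve an image src against the page URL; an empty src gives None."""
    if not src:
        return None
    return urljoin(page_url, src.strip())

page = "https://detail.1688.com/offer/123456789.html"
print(absolute_url(page, "//img.example.com/a.jpg"))  # https://img.example.com/a.jpg
print(absolute_url(page, "/img/b.jpg"))               # https://detail.1688.com/img/b.jpg
```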

(3) Handle dynamically loaded content

If the product detail page loads its content dynamically, use Selenium to obtain the fully rendered page:

Python

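from selenium import webdriver
import time

def get_html_dynamic(url):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Fixed pause to give client-side scripts time to render the page.
        time.sleep(3)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()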
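A fixed three-second pause is either too long or too short depending on the network. A common alternative is to poll until the content of interest appears, with an upper time limit. Selenium provides `WebDriverWait` for this purpose; the standard-library sketch below only illustrates the underlying idea. `wait_until` is a name invented for this example.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

With a Selenium driver, the condition could be something like `lambda: driver.find_elements("css selector", "h1.product-title")`, which returns an empty (falsy) list until the element exists. The selector here is the same illustrative one used in the parsing step, not a confirmed 1688 class name.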

(4) Put it all together

Combine the functions above in a main program to form the complete scraper:

Python

def main():
    # Placeholder offer URL; replace it with a real product page.
    url = "https://detail.1688.com/offer/123456789.html"
    html = get_html_dynamic(url)
    if html:
        product_info = parse_html(html)
        print("Product name:", product_info['product_name'])
        print("Product price:", product_info['product_price'])
        print("Product description:", product_info['product_description'])
        print("Product image:", product_info['product_image'])

if __name__ == "__main__":
    main()
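Printing is enough for a quick check, but scraped records are usually stored for later use. As one possible approach, not something the original program does, the sketch below writes the dictionary to a JSON file. `save_product` and `load_product` are names invented for this example.

```python
import json

def save_product(product_info, path):
    """Write one product record to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file.
        json.dump(product_info, f, ensure_ascii=False, indent=2)

def load_product(path):
    """Read a record back, mainly to check the round trip."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Inside `main`, a call such as `save_product(product_info, "product.json")` after parsing would persist the result.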