Python Web Scraping in Practice: Fetching E-commerce Data with Proxy IPs (A Step-by-Step, Hand-Holding Tutorial)

Latest recommended article published 2025-10-28 15:07:49



Copyright: CC 4.0 BY-SA

Article tags:

#python #web-scraping #tcp/ip #other


1. Why Use Proxy IPs? (You've Got to Understand This Move!)

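Folks, ever been here? The scraper you sweated over suddenly gets its IP banned by the site (cue meltdown)! That's when proxy IPs come to the rescue! This is especially true when scraping e-commerce platforms (think Taobao or JD): without proxy IPs, their anti-scraping systems will school you in no time!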

(Super important) The three core uses of proxy IPs:

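Get around per-IP rate limits by spreading requests across many IPs → avoid IP bans
Hide your real IP address → protect your privacy
Fetch region-specific data → e.g. compare product prices across regions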

2. Environment Setup (A Super-Detailed Configuration Guide)

2.1 Required Libraries

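# Run these in a terminal (on Windows, use an administrator prompt if installing system-wide)
pip install requests
pip install beautifulsoup4
pip install lxml          # parser used by BeautifulSoup(..., 'lxml') in 3.1
pip install fake-useragent
pip install pymysql       # only needed for the MySQL storage in 5.2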

2.2 Tips for Choosing Proxy IPs

Two recommended sources:

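Free proxy sites (good for practice): 西刺代理 (Xici), 快代理 (Kuaidaili)
Paid services (recommended for commercial use): Luminati (now Bright Data), Oxylabs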

(Pitfall alert) Fewer than 30% of free proxies actually work! For real projects, go with a paid service!
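Because so many free proxies are dead on arrival, it pays to test a pool before using it. Below is a minimal sketch. It assumes you test against `https://httpbin.org/ip`, a public echo service chosen here purely for illustration (any stable URL you're allowed to hit will do), and it treats any 200 response within the timeout as "alive". `check_proxy` and `filter_alive` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

TEST_URL = 'https://httpbin.org/ip'  # assumption: any stable page you are allowed to request works


def check_proxy(proxy, test_url=TEST_URL, timeout=5):
    """Return True if `proxy` ('ip:port') can fetch test_url with status 200 within `timeout` seconds."""
    proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}
    try:
        resp = requests.get(test_url, proxies=proxies, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        # Connection refused, proxy error, timeout, etc.: treat the proxy as dead
        return False


def filter_alive(pool, workers=10, **kwargs):
    """Check proxies concurrently in a thread pool and keep only the ones that respond."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(lambda p: check_proxy(p, **kwargs), pool))
    return [proxy for proxy, ok in zip(pool, results) if ok]
```

Usage would be something like `alive = filter_alive(proxy_pool)` before you start crawling. The checks run in threads because each one mostly waits on the network.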

3. Hands-On Code Walkthrough (Straight to the Good Stuff!)

3.1 Basic Scraper Framework

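import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Generate a random User-Agent header (a must-have anti-anti-scraping trick)
ua = UserAgent()
headers = {'User-Agent': ua.random}

# Proxy configuration (a free proxy as an example; free proxies die fast, so swap in a live one)
# For an ordinary (plain HTTP) proxy, the 'https' entry also uses the http:// scheme,
# because HTTPS traffic is tunneled through the proxy with CONNECT
proxies = {
    'http': 'http://118.24.219.151:3289',
    'https': 'http://118.24.219.151:3289'
}

# Target URL (the mobile-phone category of an e-commerce platform as an example)
url = 'https://list.jd.com/list.html?cat=9987,653,655'

try:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    response.encoding = 'utf-8'

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')  # requires the lxml package
        # Your parsing logic goes here...
        print('Data fetched successfully!')
    else:
        print(f'Request failed, status code: {response.status_code}')

except requests.RequestException as e:
    # Covers connection errors, proxy errors and timeouts
    print(f'Request error: {e}')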
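The parsing step is left open above. As a rough illustration only, here is how BeautifulSoup's `select` / `select_one` can pull fields out of a product list. The markup is hypothetical: each item is assumed to be an `<li class="product">` with `.name`, `.price` and `.shop` children. These class names are placeholders, not JD's verified structure, so inspect the live page in your browser's developer tools and adjust them. Also, if the data you need is filled in by JavaScript after the page loads, it won't appear in `response.text` at all. `parse_products` and `text_or_none` are hypothetical helpers.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.text; real class names will differ
SAMPLE_HTML = """
<ul>
  <li class="product"><div class="name"> Phone A </div><div class="price">1999.00</div><div class="shop">Shop X</div></li>
  <li class="product"><div class="name">Phone B</div><div class="price">2999.00</div></li>
</ul>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first element matching selector, or None if it is missing."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_products(html):
    # 'html.parser' ships with Python; use 'lxml' instead if you installed it
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    for item in soup.select('li.product'):
        products.append({
            'name': text_or_none(item, '.name'),
            'price': text_or_none(item, '.price'),
            'shop': text_or_none(item, '.shop'),
        })
    return products


print(parse_products(SAMPLE_HTML))
```

In the framework above, you would call `parse_products(response.text)` inside the `status_code == 200` branch. Returning `None` for missing fields keeps one malformed item from crashing the whole page.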

3.2 Advanced Trick: Rotating Proxy IPs Automatically

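import random

# Proxy pool (in a real project, store it in a database)
proxy_pool = [
    '112.85.131.53:9999',
    '113.121.39.248:9999',
    '114.239.1.155:8089'
]

def get_random_proxy():
    # Use the same proxy for http and https URLs; with only an 'http' key,
    # requests would send the https:// JD URL directly, bypassing the proxy
    proxy = 'http://' + random.choice(proxy_pool)
    return {'http': proxy, 'https': proxy}

# Use it when sending a request (url and headers come from 3.1)
response = requests.get(url, headers=headers, proxies=get_random_proxy(), timeout=10)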
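Picking a random proxy is only half of the job: nothing happens yet when the chosen proxy turns out to be dead. A common extension is to retry with a different proxy a few times. The sketch below assumes that any proxy raising a connection error or timeout should simply be dropped from the pool. `fetch_with_rotation` and `max_tries` are hypothetical names, not part of any library.

```python
import random

import requests


def fetch_with_rotation(url, pool, headers=None, max_tries=3, timeout=10):
    """Try up to max_tries randomly chosen proxies from pool, removing proxies that fail.

    Returns the Response on a 200, or None if every attempt failed.
    """
    for _ in range(max_tries):
        if not pool:
            break  # every proxy has been removed; nothing left to try
        proxy = random.choice(pool)
        proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}
        try:
            resp = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)
        except requests.RequestException:
            pool.remove(proxy)  # dead or unreachable proxy: drop it and try another
            continue
        if resp.status_code == 200:
            return resp
        # Non-200 (e.g. a 403 from anti-scraping): keep the proxy; the next random pick may differ
    return None
```

Usage: `resp = fetch_with_rotation(url, proxy_pool, headers=headers)`. Note that it mutates `proxy_pool` in place, so dead proxies stay gone for later requests too.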

4. The Complete Anti-Anti-Scraping Playbook (Lessons Learned the Hard Way)

4.1 Request Headers You Must Set

headers = {
    'User-Agent': ua.random,
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Referer': 'https://www.jd.com/',
    'Accept-Encoding': 'gzip, deflate'  # leave out 'br' unless the brotli package is installed, or requests can't decode the body
}
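One catch with the dict above: `ua.random` is evaluated once, when the dict is built. Every request then reuses the same User-Agent unless you rebuild the headers. Here is a minimal sketch of building fresh headers per request. To stay dependency-free, it picks from a small hand-written User-Agent list (an illustrative assumption; in this tutorial's setup you would pass `ua.random` instead). `build_headers` is a hypothetical helper name.

```python
import random

# Placeholder User-Agent strings for illustration; in practice pass fake_useragent's ua.random
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
]


def build_headers(user_agent=None, referer='https://www.jd.com/'):
    """Return a new header dict; call it once per request so the User-Agent can change."""
    return {
        'User-Agent': user_agent or random.choice(USER_AGENTS),
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Referer': referer,
        'Accept-Encoding': 'gzip, deflate',
    }
```

Per request: `requests.get(url, headers=build_headers(ua.random), proxies=get_random_proxy(), timeout=10)`.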

4.2 Request Rate Control (Key to Survival!)

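import time
import random

request_count = 0  # number of requests sent so far

# ...send a request, then:
request_count += 1

# Random delay (1-3 seconds)
time.sleep(random.uniform(1, 3))

# Rest 10 seconds after every 10 requests
if request_count % 10 == 0:
    time.sleep(10)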
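If you'd rather not thread a counter through your crawl loop, the two rules above can be packed into a small object. This is a sketch: `Throttle` is a hypothetical class name, and the `sleep` parameter exists only so the timing can be tested without actually waiting.

```python
import random
import time


class Throttle:
    """Random pause after every request, plus a longer rest after every `every` requests."""

    def __init__(self, min_delay=1.0, max_delay=3.0, every=10, rest=10.0, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.every = every
        self.rest = rest
        self.sleep = sleep  # injectable so tests don't have to wait
        self.count = 0

    def wait(self):
        """Call once after each request."""
        self.sleep(random.uniform(self.min_delay, self.max_delay))
        self.count += 1
        if self.count % self.every == 0:
            self.sleep(self.rest)
```

Usage: create `throttle = Throttle()` once, then call `throttle.wait()` after every `requests.get(...)` in your loop.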

5. Data Storage Options (Pick Your Style)

5.1 CSV Storage Example

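import csv
import os

filename = 'product_data.csv'
is_new_file = not os.path.exists(filename)  # write the header only once, not on every append

with open(filename, 'a', newline='', encoding='utf-8-sig') as f:  # utf-8-sig lets Excel detect UTF-8
    writer = csv.writer(f)
    if is_new_file:
        writer.writerow(['Product Name', 'Price', 'Comments', 'Shop'])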

5.2 MySQL Storage (Recommended for Production)

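import pymysql

conn = pymysql.connect(
    host='localhost',
    user='root',
    password='123456',  # demo credentials only; load real ones from a config file or environment variables
    database='spider_data',
    charset='utf8mb4'   # full Unicode support for Chinese product names
)

# Assumes the jd_products table already exists;
# name, price, comments, shop are the values produced by your parsing step
sql = '''
INSERT INTO jd_products
(name, price, comments, shop)
VALUES (%s, %s, %s, %s)
'''

try:
    with conn.cursor() as cursor:
        cursor.execute(sql, (name, price, comments, shop))
    conn.commit()
finally:
    conn.close()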
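If you want to try the insert logic without a MySQL server, Python's built-in `sqlite3` module follows the same DB-API pattern: a parameterized query, `executemany` for batches, then a commit. The differences are the `?` placeholder instead of `%s` and the connection call. The table schema below is an assumption, since the original doesn't show how `jd_products` is defined. `save_products` is a hypothetical helper.

```python
import sqlite3

# Assumed schema; the original article doesn't show the jd_products definition
CREATE_SQL = '''
CREATE TABLE IF NOT EXISTS jd_products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    price REAL,
    comments INTEGER,
    shop TEXT
)
'''

INSERT_SQL = 'INSERT INTO jd_products (name, price, comments, shop) VALUES (?, ?, ?, ?)'


def save_products(conn, rows):
    """Insert a list of (name, price, comments, shop) tuples in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised (does not close)
        conn.execute(CREATE_SQL)
        conn.executemany(INSERT_SQL, rows)


conn = sqlite3.connect(':memory:')  # use a filename such as 'spider_data.db' to keep the data
save_products(conn, [('Phone A', 1999.0, 200, 'Shop X'), ('Phone B', 2999.0, 50, 'Shop Y')])
print(conn.execute('SELECT COUNT(*) FROM jd_products').fetchone()[0])  # → 2
```

With PyMySQL, `cursor.executemany(sql, rows)` does the same batch insert using the `%s` placeholders from 5.2. Batching rows this way is usually faster than one `execute` plus `commit` per product.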