Python's Go-To Tool for Network Requests: The Requests Library from Beginner to Advanced

Original article · last modified 2025-07-29 13:30:13


License: CC 4.0 BY-SA


First published 2025-07-29 13:29:21


Master the art of HTTP requests and let your Python programs connect to the world with ease

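Interacting with web services has become an essential skill for developers, and Python's Requests library is one of the most popular tools for the job. This article starts from the basics and walks through the techniques you need to handle common HTTP request scenarios with Requests.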

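I. Introduction to the Requests Library

Requests is an elegant, simple HTTP library for Python, created by Kenneth Reitz under the motto "HTTP for Humans". It hides the low-level details so that sending an HTTP request is simple and intuitive. Compared with Python's built-in urllib, Requests offers a more concise API and more built-in conveniences, including:

Support for the common HTTP methods (GET, POST, PUT, DELETE, and so on)
Automatic connection pooling and keep-alive (persistent connections)
Concise handling of request parameters
A well-structured exception hierarchy
Sessions with cookie persistence

For these reasons, Requests has become a first choice for many Python developers making network requests, and is widely used in web scraping, API calls, and web service testing.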

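II. Installing Requests

Before using Requests, you need to install it. Installation is straightforward:

Method 1: Install with pip (recommended)

pip install requests

Method 2: Install with conda (for Anaconda users)

conda install requests

Verifying the installation

After installing, you can verify that it worked with the following code:

import requests
print(requests.__version__)  # Prints the installed version number

If a version number is printed (for example 2.31.0), the installation succeeded.

Tip: If installation is slow, you can use a mirror in mainland China to speed it up:
pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple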

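III. Making HTTP Requests

Requests supports all the common HTTP methods. The sections below cover them one by one.

1. GET requests: retrieving a resource

GET is the most frequently used HTTP method. It retrieves a resource from the server:

import requests

response = requests.get('https://api.example.com/data')

# Check whether the request succeeded
if response.status_code == 200:
    print("Request succeeded!")
    print("Response body:", response.text)
else:
    print(f"Request failed, status code: {response.status_code}")

GET requests with parameters

In practice you often need to add query parameters to the URL:

params = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('https://api.example.com/search', params=params)

# The URL actually requested will be:
# https://api.example.com/search?key1=value1&key2=value2
print("Requested URL:", response.url)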
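To see how a params dictionary ends up in the URL without touching the network, one option is to build the request and inspect it before it is sent. The sketch below is only an illustration: the query values are made up, the URL is the same placeholder as above, and it relies on requests.Request(...).prepare(), which assembles the final URL without sending anything.

```python
import requests
from urllib.parse import parse_qs, urlsplit

# Hypothetical query values; a list value becomes a repeated key.
params = {"q": "python requests", "tag": ["http", "api"]}

# Request(...).prepare() builds the final request but does not send it.
prepared = requests.Request(
    "GET", "https://api.example.com/search", params=params
).prepare()

print(prepared.url)

# Decode the query string again to confirm the values round-trip intact.
decoded = parse_qs(urlsplit(prepared.url).query)
print(decoded)
```

Letting Requests encode the parameters, rather than concatenating strings by hand, means spaces and other special characters are escaped for you.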

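2. POST requests: submitting data

POST sends data to the server, for example when submitting a form or calling an API:

Sending form data

data = {'username': 'admin', 'password': 'secret'}
response = requests.post('https://api.example.com/login', data=data)

Sending JSON data

json_data = {'name': 'John', 'age': 30}
response = requests.post('https://api.example.com/users', json=json_data)

When you use the json parameter, Requests serializes the dictionary to a JSON string and sets Content-Type to application/json automatically (unless you supply your own Content-Type header).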
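The difference between data= and json= is easiest to see by comparing what would go on the wire. The following sketch prepares both variants of the same payload without sending them; the endpoint is the article's placeholder and the comparison is illustrative rather than exhaustive.

```python
import json
import requests

URL = "https://api.example.com/users"  # placeholder endpoint
payload = {"name": "John", "age": 30}

# data= encodes the dict as an HTML form body.
form_req = requests.Request("POST", URL, data=payload).prepare()
# json= serializes the dict to JSON and sets the matching Content-Type.
json_req = requests.Request("POST", URL, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)

# The JSON body decodes back to the original dictionary.
round_trip = json.loads(json_req.body)
```

If a server rejects a request with an "unsupported media type" style error, checking which of the two parameters was used is a reasonable first step.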

3. Other HTTP methods

Requests also supports PUT, DELETE, and the other HTTP methods:

# PUT request: update a resource
response = requests.put('https://api.example.com/user/1', data={'name': 'New Name'})

# DELETE request: delete a resource
response = requests.delete('https://api.example.com/user/1')

# HEAD request: fetch the headers only
response = requests.head('https://api.example.com')
print(response.headers)  # Headers only; no response body is returned

# OPTIONS request: ask which HTTP methods the server supports
response = requests.options('https://api.example.com')
print(response.headers.get('allow'))  # Supported methods, if the server sends an Allow header

IV. Advanced Request Techniques

1. Setting request headers

Custom request headers let you mimic a browser or pass authentication information:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Authorization': 'Bearer YOUR_ACCESS_TOKEN',
    'Accept': 'application/json'
}

response = requests.get('https://api.example.com/protected', headers=headers)

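2. Uploading files

Uploading files with Requests is straightforward. Opening the files in a with block ensures they are closed after the upload:

# Upload a single file
with open('report.xls', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://api.example.com/upload', files=files)

# Upload multiple files under the same field name
with open('image1.jpg', 'rb') as img1, open('image2.png', 'rb') as img2:
    files = [
        ('images', ('image1.jpg', img1, 'image/jpeg')),
        ('images', ('image2.png', img2, 'image/png'))
    ]
    response = requests.post('https://api.example.com/upload', files=files)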
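For a look at what the files parameter produces, the sketch below prepares a multipart upload without sending it. In-memory bytes stand in for a real file, and the file name, content, and MIME type are invented for illustration.

```python
import requests

# (filename, content, content_type); the bytes stand in for a real file.
files = {"file": ("report.csv", b"id,name\n1,Alice\n", "text/csv")}

prepared = requests.Request(
    "POST", "https://api.example.com/upload", files=files
).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)  # multipart/form-data with a generated boundary

# The encoded body carries the file name, its MIME type, and its content.
print(b'filename="report.csv"' in prepared.body)
```

Because Requests generates the multipart boundary itself, it is usually best not to set the Content-Type header by hand when using files.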

3. Streaming uploads for large files

For large files, stream the upload so that the whole file is not loaded into memory:

with open('large_file.zip', 'rb') as f:
    response = requests.post('https://api.example.com/upload', data=f)
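V. Handling Responses

Requests offers several ways to work with the response returned by the server.

1. Parsing the response body

response = requests.get('https://api.example.com/data')

# Get the body as text (decoded automatically)
print(response.text)

# Get the body as raw bytes (for images, files, and so on)
with open('image.jpg', 'wb') as f:
    f.write(response.content)

# Parse a JSON response
data = response.json()
print(data['key'])

# Read response headers
print(response.headers['Content-Type'])
print(response.headers.get('Content-Length'))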
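One detail worth guarding against is that response.json() raises an exception when the body is not valid JSON, for example when a server returns an HTML error page. The sketch below shows one defensive pattern. To stay runnable offline it builds Response objects by hand, which is purely a demonstration device; make_response and parse_json_or_none are hypothetical helper names.

```python
import io
import requests

def make_response(body, content_type):
    """Build a Response by hand so the sketch runs offline.

    Real responses come from requests.get() and friends.
    """
    response = requests.Response()
    response.status_code = 200
    response.headers["Content-Type"] = content_type
    response.raw = io.BytesIO(body)  # the body is read from this stream
    return response

def parse_json_or_none(response):
    """Return the decoded JSON body, or None if it is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # JSON decoding errors are ValueError subclasses
        return None

good = make_response(b'{"key": "value"}', "application/json")
bad = make_response(b"<html>Service unavailable</html>", "text/html")

print(parse_json_or_none(good))
print(parse_json_or_none(bad))
```

Whether returning None is appropriate depends on the caller; logging the raw body or re-raising may be the better choice in production code.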

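2. Checking the status code

The HTTP status code indicates how the request was handled:

if response.status_code == 200:
    # Request succeeded
    pass
elif response.status_code == 404:
    # Resource not found
    pass
elif response.status_code == 500:
    # Internal server error
    pass

Requests also provides the raise_for_status() method, which raises an exception when the status code indicates a client or server error (4xx or 5xx). Other codes, such as 201 or 204, do not raise:

try:
    response = requests.get('https://api.example.com/data')
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
except requests.exceptions.HTTPError as err:
    print(f"HTTP error: {err}")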
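The exact set of status codes that trigger raise_for_status() can be checked without a server. The sketch below sets the status code on hand-built Response objects, which is only a demonstration device; raises_http_error is a hypothetical helper name.

```python
import requests

def raises_http_error(status_code):
    """Report whether raise_for_status() raises for the given status code."""
    response = requests.Response()
    response.status_code = status_code
    response.url = "https://api.example.com/data"  # placeholder URL
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return True
    return False

for code in (200, 201, 204, 301, 404, 500):
    print(code, raises_http_error(code))
```

Only the 4xx and 5xx codes raise, so a successful 201 Created or 204 No Content passes through untouched.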

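VI. Session Management and Cookie Persistence

When you need to keep state across requests (such as a logged-in session), use a Session object:

# Create a session object
with requests.Session() as session:
    # Log in; cookies from the response are stored on the session
    login_data = {'username': 'admin', 'password': 'secret'}
    session.post('https://api.example.com/login', data=login_data)

    # Later requests send the stored cookies automatically
    response = session.get('https://api.example.com/dashboard')
    print(response.text)

    # Custom headers set on the session apply to every later request
    session.headers.update({'X-Custom-Header': 'value'})
    response = session.get('https://api.example.com/profile')

The main advantages of using a Session object:

Automatic cookie persistence: after login, cookies are sent with every request
Connection pool reuse: better performance, with less overhead from opening new connections
Shared configuration: common headers or authentication can be set once for all requests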
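The way a session combines its own settings with those of an individual request can be inspected with Session.prepare_request(), which builds the outgoing request without sending it. In the sketch below the cookie name and value are hypothetical, standing in for whatever a real login response would set.

```python
import requests

session = requests.Session()
session.headers.update({"X-Custom-Header": "value", "Accept": "application/json"})
# Hypothetical cookie, as if a login response had set it.
session.cookies.set("sessionid", "abc123", domain="api.example.com", path="/")

# A per-request header with the same name overrides the session default.
request = requests.Request(
    "GET", "https://api.example.com/profile", headers={"Accept": "text/html"}
)
prepared = session.prepare_request(request)

print(prepared.headers["X-Custom-Header"])  # inherited from the session
print(prepared.headers["Accept"])           # overridden by the request
print(prepared.headers.get("Cookie"))       # attached from the session's cookie jar
session.close()
```

Session-level values act as defaults, so a single request can still deviate from them without changing the session.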

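VII. Exception Handling

Network requests can fail for many reasons, and a robust program needs to handle those failures properly:

try:
    # Set a 3-second timeout
    response = requests.get('https://api.example.com/data', timeout=3)
    response.raise_for_status()

except requests.exceptions.Timeout:
    print("The request timed out; check the network connection or try again later")

except requests.exceptions.HTTPError as err:
    print(f"HTTP error: {err}")

except requests.exceptions.ConnectionError:
    print("Network error: could not establish a connection")

except requests.exceptions.RequestException as e:
    print(f"The request failed: {e}")

Key exception types:

Timeout: the request timed out
ConnectionError: a network-level connection problem
HTTPError: raised by raise_for_status() for 4xx/5xx status codes
RequestException: the base class of all Requests exceptions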
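Because these exceptions form a hierarchy, the order of the checks matters: the more specific types have to be tested before the RequestException base class. The sketch below mirrors the order of the except clauses above using isinstance checks on exception objects created by hand; classify is a hypothetical helper name.

```python
import requests

exc = requests.exceptions

def classify(error):
    """Map a Requests exception to a coarse category, most specific first."""
    if isinstance(error, exc.Timeout):
        return "timeout"
    if isinstance(error, exc.ConnectionError):
        return "connection"
    if isinstance(error, exc.HTTPError):
        return "http"
    if isinstance(error, exc.RequestException):
        return "other"
    raise TypeError("not a Requests exception")

# ConnectTimeout inherits from both ConnectionError and Timeout,
# so it lands in whichever of the two categories is checked first.
for error in (exc.ConnectTimeout(), exc.ReadTimeout(), exc.ConnectionError(),
              exc.HTTPError(), exc.TooManyRedirects()):
    print(type(error).__name__, "->", classify(error))
```

Placing the RequestException check first would swallow every other case, which is why it belongs at the end.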

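Best practice: always set a reasonable timeout (for example 3 to 5 seconds). Without one, Requests waits indefinitely and the program can hang.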
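One way to avoid forgetting the timeout is to apply a default at the session level. The sketch below is one possible approach, not a built-in Requests feature: TimeoutSession is a hypothetical subclass, the (connect, read) values are arbitrary, and RecordingAdapter is an offline stand-in transport that only records the timeout it receives so the example runs without network access.

```python
import io
import requests
from requests.adapters import BaseAdapter

class TimeoutSession(requests.Session):
    """Session that applies a default timeout unless the caller passes one."""

    def __init__(self, default_timeout=(3.05, 5)):
        super().__init__()
        # (connect timeout, read timeout) in seconds; illustrative values.
        self.default_timeout = default_timeout

    def request(self, method, url, **kwargs):
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)

class RecordingAdapter(BaseAdapter):
    """Offline stand-in transport: records timeouts, returns a canned reply."""

    def __init__(self):
        super().__init__()
        self.seen_timeouts = []

    def send(self, request, timeout=None, **kwargs):
        self.seen_timeouts.append(timeout)
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        response.raw = io.BytesIO(b"ok")
        return response

    def close(self):
        pass

session = TimeoutSession()
adapter = RecordingAdapter()
session.mount("https://", adapter)  # replace the real HTTPS transport

session.get("https://api.example.com/data")             # uses the default
session.get("https://api.example.com/data", timeout=1)  # explicit value wins
print(adapter.seen_timeouts)
```

In real use the mount() call would be dropped, and the session would send requests over the network with the default timeout applied.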

VIII. Practical Use Cases

Requests is useful in many real-world scenarios:

1. Web scraping

response = requests.get('https://news.example.com/latest')
# Parse the HTML with a library such as BeautifulSoup

2. Calling RESTful APIs

response = requests.get('https://api.weather.com/forecast', params={
    'location': 'Beijing',
    'apikey': 'YOUR_API_KEY'
})
weather_data = response.json()

3. Automated API testing

# Test the user creation flow
def test_user_creation():
    data = {'name': 'test_user', 'email': 'test@example.com'}
    response = requests.post('https://api.example.com/users', json=data)
    assert response.status_code == 201
    user_id = response.json()['id']

    # Verify that the user was created
    response = requests.get(f'https://api.example.com/users/{user_id}')
    assert response.status_code == 200

4. Service monitoring

# Periodically check whether a site is available
def check_service_health():
    try:
        response = requests.get('https://example.com/health', timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Catch only Requests errors; a bare except would also hide KeyboardInterrupt
        return False