Five ways to fix garbled text in Python requests, and how to choose between resp.content.decode() and resp.text

Reposted · Last modified 2025-05-26 12:23:21

License: CC 4.0 BY-SA

Original link: https://huaweicloud.youkuaiyun.com/6380300ddacf622b8df86681.html

Tags: #python #programming-language

First published 2023-07-16 13:54:03

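This article covers the garbled-text problem that can appear when fetching web pages with the requests module, and the ways to fix it: setting the encoding from apparent_encoding, decoding the bytes with an explicit encoding such as utf-8, detecting the encoding with the chardet or cchardet library, and re-encoding and then decoding res.text. cchardet is highlighted as a faster alternative to chardet for performance-sensitive code.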

resp.text

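How it works: requests takes the encoding from the charset in the Content-Type response header (for example text/html; charset=utf-8). If the header declares a text/* type but no charset, requests falls back to ISO-8859-1. Only when no encoding can be derived from the headers at all does it guess from the body with a detection library (chardet or charset_normalizer, depending on the installed version). It then decodes the raw bytes into a Unicode string using that encoding.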
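The header rule described above can be sketched with the standard library alone. This is a simplified illustration, not the code requests actually runs: `encoding_from_content_type` is a hypothetical helper name, and the real library handles more cases than the three shown here.

```python
from email.message import Message
from typing import Optional


def encoding_from_content_type(content_type: str) -> Optional[str]:
    """Simplified, hypothetical mirror of the header rule described above."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # An explicit charset in the header wins.
        return charset.strip("'\"")
    if msg.get_content_maintype() == "text":
        # text/* without a charset: fall back to ISO-8859-1.
        return "ISO-8859-1"
    # Nothing usable in the header: the caller would have to guess from the body.
    return None


print(encoding_from_content_type("text/html; charset=utf-8"))
print(encoding_from_content_type("text/html"))
print(encoding_from_content_type("application/octet-stream"))
```

The second case is the one that most often leads to garbled Chinese pages: the body is UTF-8 or GBK, but the header only says text/html.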

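Advantages:
Convenient: in most cases this is the most direct way to get the text, with no manual encoding handling.

Automatic encoding selection: the encoding is chosen for you from the headers or, failing that, from the body.

Disadvantages:
Possible garbled text: if the encoding requests picks is wrong, or the response header does not declare a charset, the text can come out garbled.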

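When to use it:

When you know the response is text (HTML, JSON, XML, plain text, and so on) and requests identifies its encoding correctly, resp.text is the recommended and most convenient option.
Quick inspection: if you only want a quick look at the response and do not care about the encoding details, try resp.text first.

Prefer resp.text; if the result is garbled, switch to resp.content.decode().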
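One way to express this rule of thumb is a small helper that trusts resp.text only when the server actually declared a charset. This is a sketch under assumptions: `text_or_decoded` and its `fallback` parameter are hypothetical names, and treating ISO-8859-1 as "not declared" is a heuristic that will be wrong for sites that really do serve Latin-1. A stand-in object replaces the real response so the example runs offline.

```python
from types import SimpleNamespace


def text_or_decoded(resp, fallback: str = "utf-8") -> str:
    """Return resp.text when its encoding looks declared, else decode resp.content."""
    declared = (resp.encoding or "").lower()
    if declared and declared not in ("iso-8859-1", "latin-1"):
        return resp.text
    # No usable charset from the headers: decode the raw bytes ourselves.
    return resp.content.decode(fallback, errors="replace")


# Stand-in for a requests.Response whose header had no charset (made-up data).
body = "百度一下".encode("utf-8")
fake = SimpleNamespace(
    encoding="ISO-8859-1",
    content=body,
    text=body.decode("iso-8859-1"),
)
print(text_or_decoded(fake))
```

With a real response the call would be text_or_decoded(requests.get(url)), since the helper only reads the encoding, text, and content attributes.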

======================

Fetching a web page with the requests module often produces garbled text, for example:

import requests
res = requests.get("https://www.baidu.com/")
print(res.text)
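The effect can be reproduced without a network call. The sketch below assumes the page body is UTF-8 while the decoder uses ISO-8859-1, which is one common way such a mismatch arises; the sample string is made up for illustration.

```python
original = "百度一下，你就知道"

# Bytes as a server would send them (assumed to be UTF-8 here).
raw = original.encode("utf-8")

# ISO-8859-1 maps every byte to a character, so the wrong decode never fails;
# it just yields unreadable text.
garbled = raw.decode("iso-8859-1")

# Decoding with the matching codec restores the text.
correct = raw.decode("utf-8")

print(repr(garbled))
print(correct)
```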

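Garbled text appears when the bytes are decoded with a different encoding from the one used to encode them. There are several ways to fix it: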

Method 1: apparent_encoding

import requests
res = requests.get("https://www.baidu.com/")
# Replace the header-derived encoding with the one guessed from the body
res.encoding = res.apparent_encoding
print(res.text)
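A slightly more cautious variant overrides the encoding only when the headers gave requests nothing better than its default. The helper name `apply_apparent_encoding` is hypothetical, and the check against ISO-8859-1 is an assumption about which responses need fixing. Stand-in objects are used so the example runs offline.

```python
from types import SimpleNamespace


def apply_apparent_encoding(resp):
    """Switch resp.encoding to resp.apparent_encoding when no charset was declared."""
    declared = (resp.encoding or "").lower()
    if declared in ("", "iso-8859-1") and resp.apparent_encoding:
        resp.encoding = resp.apparent_encoding
    return resp


# Stand-ins for requests.Response objects (made-up values).
undeclared = SimpleNamespace(encoding="ISO-8859-1", apparent_encoding="utf-8")
declared = SimpleNamespace(encoding="gbk", apparent_encoding="utf-8")

print(apply_apparent_encoding(undeclared).encoding)
print(apply_apparent_encoding(declared).encoding)
```

Leaving a declared charset alone avoids replacing a correct header value with a detector's guess.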

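Method 2: decode content as utf-8
A stopgap only; this approach is not recommended because it hard-codes the encodings.

import requests
res = requests.get("https://www.baidu.com/")
try:
    # Try utf-8 first: its decoder is strict and rejects most non-UTF-8 input,
    # whereas gbk can accept some UTF-8 byte sequences and return wrong characters
    txt = res.content.decode('utf-8')
except UnicodeDecodeError:
    # Fall back to gbk for pages that are not UTF-8
    txt = res.content.decode('gbk')
print(txt)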
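If hard-coded encodings are acceptable for a known set of sites, the try/except chain generalizes to a candidate list. This is a sketch: `decode_with_candidates` is a hypothetical helper, and the default candidates (UTF-8, then GB18030 as a superset of GBK) are an assumption suited to Chinese pages.

```python
def decode_with_candidates(content: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Return the first strict decode that succeeds, else a lossy UTF-8 decode."""
    for name in candidates:
        try:
            return content.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return content.decode("utf-8", errors="replace")


print(decode_with_candidates("你好".encode("utf-8")))
print(decode_with_candidates("你好".encode("gbk")))
```

The order of the candidates matters: the strictest decoder should come first so that a permissive one does not accept the wrong bytes.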

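Method 3: chardet

import requests
import chardet
res = requests.get("https://www.baidu.com/")
# detect() can return None as the encoding, so fall back to utf-8
encoding = chardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding, errors='replace'))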
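Detection results carry a confidence value alongside the guess, and the guess itself can be None. The sketch below guards both cases. `decode_detected`, `min_confidence`, and `default` are hypothetical names, the 0.5 threshold is arbitrary, and the detector is passed in as a parameter so the example runs without chardet installed.

```python
def decode_detected(content: bytes, detect, min_confidence: float = 0.5,
                    default: str = "utf-8") -> str:
    """Decode content using detect(content), falling back to default on weak guesses."""
    result = detect(content) or {}
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if not encoding or confidence < min_confidence:
        encoding = default
    try:
        return content.decode(encoding, errors="replace")
    except LookupError:
        # The detector named a codec that Python does not know.
        return content.decode(default, errors="replace")


def fake_detect(content: bytes) -> dict:
    """Stand-in with the same result shape as chardet.detect (illustration only)."""
    return {"encoding": None, "confidence": 0.0}


print(decode_detected("编码检测".encode("utf-8"), fake_detect))
```

With the real library, chardet.detect would be passed as the detect argument in place of the stand-in.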

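Method 4: cchardet
cchardet must be installed first: pip install cchardet.

import requests
import cchardet
res = requests.get("https://www.baidu.com/")
# detect() can return None as the encoding, so fall back to utf-8
encoding = cchardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding, errors='replace'))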

chardet vs. cchardet

Both are Python libraries for character encoding detection: they determine the encoding of text data (UTF-8, ISO-8859-1, and so on) so that it can be parsed and processed correctly. They differ mainly in implementation language and performance.

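chardet:
- chardet is a character encoding detection library written in Python.
- It is relatively slow because it is pure Python, so it is not well suited to large amounts of text.
- chardet relies on statistical models and heuristics, guessing the encoding from the distribution and frequency of byte sequences.
- Install it with pip: pip install chardet.
cchardet:
- cchardet is a faster alternative to chardet: it is a binding to the native uchardet detector rather than a pure Python implementation, so it performs better.
- Because the detection runs in compiled code, it is faster on large text files and suits applications that need high-performance encoding detection.
- cchardet is commonly used as a replacement for chardet: its detect() function takes bytes and returns a dict with an 'encoding' key in the same way, so the basic call can be swapped without other changes.
- Install it with pip: pip install cchardet.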

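In short, if you need encoding detection and performance matters, consider cchardet. If performance is not the main concern, or you need a pure Python library in some environments, chardet remains a good choice.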
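Because the basic detect() call has the same shape in both libraries, code can prefer cchardet when it is installed and fall back to chardet otherwise. This is a sketch: `detect_encoding` and its `default` parameter are hypothetical names, and when neither library is installed the function simply returns the default.

```python
try:
    import cchardet as detector  # faster, optional
except ImportError:
    try:
        import chardet as detector  # pure Python, optional
    except ImportError:
        detector = None  # neither library is available


def detect_encoding(content: bytes, default: str = "utf-8") -> str:
    """Return the detected encoding name, or default when detection is unavailable."""
    if detector is None:
        return default
    return detector.detect(content).get("encoding") or default


sample = "字符编码检测示例，内容越长越容易判断。".encode("utf-8")
print(detect_encoding(sample))
```

The exact name printed depends on which library is installed, since the two do not always spell or case encoding names identically.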

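Method 5: encode + decode

import requests
import cchardet
res = requests.get("https://www.baidu.com/")
# Encoding requests used to produce res.text (apparent_encoding when none was set)
res_encoding = res.encoding or res.apparent_encoding
# Encoding detected from the body; keep the response encoding if detection fails
con_encoding = cchardet.detect(res.content)['encoding'] or res_encoding
# Turn text back into bytes, then decode again with the detected encoding
print(res.text.encode(res_encoding, errors='replace').decode(con_encoding, errors='replace'))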
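The round trip can be checked offline. The sketch assumes the text was wrongly decoded as ISO-8859-1, which maps every byte to a character and so loses nothing; with other wrong encodings, bytes that were already replaced during decoding cannot be recovered. `redecode` is a hypothetical helper name.

```python
def redecode(text: str, wrong_encoding: str, right_encoding: str) -> str:
    """Undo a decode done with wrong_encoding and decode again with right_encoding."""
    raw = text.encode(wrong_encoding)  # recover the original bytes
    return raw.decode(right_encoding)


# Simulate UTF-8 bytes that were mistakenly decoded as ISO-8859-1.
garbled = "百度一下".encode("utf-8").decode("iso-8859-1")
print(redecode(garbled, "iso-8859-1", "utf-8"))
```

Decoding res.content directly with the detected encoding gives the same result with one step fewer, so this method is mainly useful when only the already-decoded text is available.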