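When reading a response body, the result sometimes comes out garbled (mojibake) or as Unicode escape sequences. To fix it, first find out which encoding is in use, then decode the body accordingly.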
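- encoding is taken from the charset parameter of the Content-Type header in the HTTP response. If the header has no charset, requests falls back to ISO-8859-1 for text/* content. ISO-8859-1 cannot represent Chinese characters, so a UTF-8 body decoded this way comes out garbled.
- apparent_encoding is guessed by analyzing the response body itself, so it is usually more reliable when the header is missing or wrong (it is still a guess). When res.text is garbled, assign it to encoding: res.encoding = res.apparent_encoding.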
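The header rule and the mojibake it causes can be reproduced with the standard library alone. The sketch below uses a hypothetical helper, charset_from_content_type, that mimics the rule above in simplified form: it always falls back to ISO-8859-1, whereas requests applies that fallback only to text/* types (its real logic lives in requests.utils.get_encoding_from_headers).

```python
def charset_from_content_type(content_type: str, default: str = "ISO-8859-1") -> str:
    """Hypothetical helper: return the charset parameter of a Content-Type
    value, or `default` when none is declared (simplified requests-like rule)."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset" and value:
            return value.strip().strip("\"'")
    return default


# UTF-8 bytes as a server might send them (the Baidu page title)
body = "<title>百度一下，你就知道</title>".encode("utf-8")

# Header without charset: ISO-8859-1 fallback, which produces mojibake
garbled = body.decode(charset_from_content_type("text/html"))

# Header with charset (or an encoding taken from apparent_encoding): correct text
correct = body.decode(charset_from_content_type("text/html; charset=utf-8"))

# ISO-8859-1 maps every byte to exactly one character, so the damage is reversible
repaired = garbled.encode("ISO-8859-1").decode("utf-8")

print(repr(garbled))
print(correct)
print(repaired == correct)  # True
```

With requests itself, the equivalent fix is the one-liner res.encoding = res.apparent_encoding before reading res.text.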
Example 1:
(1) Print the body directly without decoding
import unittest
import requests


class APTTest(unittest.TestCase):
    # Test site
    url = "http://httpbin.org"

    # Handling the response body
    # Escaped text -> Chinese characters
    # \u5f20\u4e09 is the Unicode escape form of 张三
    def test_005(self):
        url = self.url + "/get"
        data = {"age": 18, "name": "张三"}
        res = requests.get(url, params=data)
        print(res.apparent_encoding)  # encoding guessed from the body
        if res.status_code == 200:
            print(res.text)  # response body as a decoded str
        else:
            print(res.status_code)


if __name__ == "__main__":
    unittest.main()
Output:
Ran 1 test in 0.493s
OK
Process finished with exit code 0
ascii
{
  "args": {
    "age": "18",
    "name": "\u5f20\u4e09"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-60a36ce3-53304fe747e36c6158b9ee28"
  },
  "origin": "125.76.177.143",
  "url": "http://httpbin.org/get?age=18&name=\u5f20\u4e09"
}
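This output is not mojibake: the body is pure ASCII (hence apparent_encoding reports ascii), and the Chinese name arrives as \uXXXX escapes, presumably because the server serializes its JSON with ASCII-only escaping. Python's own json module shows the same behavior through its ensure_ascii option; the payload below is a shortened, illustrative copy of the httpbin response.

```python
import json

payload = {"args": {"age": "18", "name": "张三"}}

# ensure_ascii=True (the default) escapes non-ASCII characters as \uXXXX
ascii_body = json.dumps(payload)
# ensure_ascii=False keeps the characters themselves
raw_body = json.dumps(payload, ensure_ascii=False)

print(ascii_body)  # {"args": {"age": "18", "name": "\u5f20\u4e09"}}
print(raw_body)    # {"args": {"age": "18", "name": "张三"}}

# Both strings describe the same JSON document once parsed
print(json.loads(ascii_body) == json.loads(raw_body))  # True
```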
(2) Decoded output: get the raw bytes from res.content, then call decode()
    def test_005(self):
        url = self.url + "/get"
        data = {"age": 18, "name": "张三"}
        res = requests.get(url, params=data)
        print(res.apparent_encoding)
        # header, body, cookie
        if res.status_code == 200:
            # Decode manually: take the raw bytes from content, then decode them
            print(res.content.decode("unicode-escape"))  # turns \uXXXX escapes into characters
        else:
            print(res.status_code)
Output:
Process finished with exit code 0
ascii
Ran 1 test in 1.179s
OK
{
  "args": {
    "age": "18",
    "name": "张三"  # the decoded result
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-60a389be-7bc646273989e4784ecbb7db"
  },
  "origin": "125.76.177.143",
  "url": "http://httpbin.org/get?age=18&name=张三"
}
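Decoding with unicode-escape works here only because the httpbin body is pure ASCII. The codec reads every byte that is not part of an escape sequence as Latin-1, so a body that contains raw UTF-8 gets garbled, and it also rewrites other backslash escapes inside JSON strings (such as \" or \n). The sample bodies below are illustrative; the sketch compares the codec with parsing the bytes as JSON.

```python
import json

# Body as httpbin sends it: pure ASCII with \uXXXX escapes
escaped = b'{"name": "\\u5f20\\u4e09"}'
# The same document sent as raw UTF-8 instead of escapes
utf8_body = '{"name": "张三"}'.encode("utf-8")

# unicode-escape turns the \uXXXX sequences into characters...
print(escaped.decode("unicode-escape"))  # {"name": "张三"}

# ...but it reads all other bytes as Latin-1, so raw UTF-8 comes out garbled
print(utf8_body.decode("unicode-escape") == '{"name": "张三"}')  # False

# Parsing the bytes as JSON handles both forms
print(json.loads(escaped) == json.loads(utf8_body))  # True
```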
Example 2:
(1) Print the body directly without decoding
    # The response body comes out garbled
    def test_006(self):
        res = requests.get("http://www.baidu.com")
        print(res.apparent_encoding)  # encoding guessed from the body
        if res.status_code == 200:
            print(res.text)  # response body as a decoded str
        else:
            print(res.status_code)
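Output (the <title> is garbled: apparent_encoding reports utf-8, but res.text was decoded with a different encoding, presumably the ISO-8859-1 fallback because the response header declares no charset):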
utf-8
Ran 1 test in 0.109s
OK
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title>
(2) Decoded output: get the raw bytes from res.content, then call decode()
    def test_006(self):
        res = requests.get("http://www.baidu.com")
        print(res.apparent_encoding)
        if res.status_code == 200:
            # Decode manually: take the raw bytes from content, then decode them
            print(res.content.decode("utf8"))  # decode as UTF-8
        else:
            print(res.status_code)
Output:
utf-8
Ran 1 test in 0.098s
OK
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title>
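Hard-coding utf8 works here because the page declares charset=utf-8 in its <meta> tag. A minimal sketch of reading that declaration from the raw bytes instead of guessing, using a hypothetical helper sniff_meta_charset; the regular expression only covers simple forms like the one on this page (full HTML encoding sniffing, as the WHATWG HTML standard defines it, is more involved). With requests it would be used as res.content.decode(sniff_meta_charset(res.content)).

```python
import re

# Matches charset=... near the top of an HTML document, e.g.
# <meta http-equiv=content-type content=text/html;charset=utf-8>
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_meta_charset(content: bytes, default: str = "utf-8", limit: int = 1024) -> str:
    """Hypothetical helper: return the charset declared in the first `limit`
    bytes of an HTML body, or `default` if none is found."""
    match = META_CHARSET.search(content[:limit])
    return match.group(1).decode("ascii") if match else default


# Shortened copy of the start of the Baidu page
page = (
    b"<!DOCTYPE html><html><head>"
    b"<meta http-equiv=content-type content=text/html;charset=utf-8>"
    + "<title>百度一下，你就知道</title>".encode("utf-8")
)

charset = sniff_meta_charset(page)
print(charset)  # utf-8
print(page.decode(charset))
```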
- In practice we usually call res.json(), which also solves the garbling problem for JSON responses.
Example 3:
    def test_005(self):
        url = self.url + "/get"
        data = {"age": 18, "name": "张三"}
        res = requests.get(url, params=data)
        print(res.apparent_encoding)
        # header, body, cookie
        if res.status_code == 200:
            # Manual decoding alternatives from Example 1:
            # print(res.content.decode("unicode-escape"))
            # print(res.text)
            print(res.json())  # parse the JSON body into a Python dict
        else:
            print(res.status_code)
Output:
Ran 1 test in 0.475s
OK
Process finished with exit code 0
ascii
{'args': {'age': '18', 'name': '张三'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.25.0', 'X-Amzn-Trace-Id': 'Root=1-60a39894-356c8857476a9cae1128cf09'}, 'origin': '117.22.144.67', 'url': 'http://httpbin.org/get?age=18&name=张三'}
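res.json() parses the body as JSON and returns a Python dict, which is why the output above uses single quotes (it is the dict's repr); the \uXXXX escapes are resolved by the JSON parser, not by a charset. A stdlib sketch of the same parsing step on a shortened copy of the httpbin body. Note that res.json() raises an exception for non-JSON bodies such as the Baidu HTML page, so it does not help with Example 2.

```python
import json

# Shortened copy of the httpbin response body (bytes, pure ASCII)
content = b'{"args": {"age": "18", "name": "\\u5f20\\u4e09"}, "url": "http://httpbin.org/get?age=18&name=\\u5f20\\u4e09"}'

data = json.loads(content)  # json.loads accepts bytes and detects UTF-8/16/32
print(type(data).__name__)  # dict
print(data["args"]["name"])  # 张三
print(data)  # {'args': {'age': '18', 'name': '张三'}, 'url': 'http://httpbin.org/get?age=18&name=张三'}

# A non-JSON body (like an HTML page) raises instead of returning text
try:
    json.loads(b"<!DOCTYPE html>")
except json.JSONDecodeError as exc:
    print("not JSON:", exc.msg)
```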