Scrapy: converting Unicode escapes in a JSON response into readable Chinese

License: CC 4.0 BY-SA

Tags: #Python3 #scrapy #crawler-json

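This article describes a Unicode-escape problem encountered when scraping JSON data with the Scrapy framework under Python 3.6.5, and a workaround for it: encoding the text and then decoding it converts the escaped data into readable form.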

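I was recently using the Scrapy framework under Python 3.6.5 to fetch JSON data, and the returned text contained Unicode escape sequences (\uXXXX) instead of readable Chinese. Printing response.text in the spider's parse method shows the problem: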

from scrapy import Spider

class TestSpider(Spider):
    ......  # name, start_urls, etc. are elided in the original

    def parse(self, response):
        print(response.text)

The output looks like this:

{
	"status":"true",
	"last_view_time":null,
	"message":"",
	"shown_offset":0,
	"articles":[
	{
		"channel":"\u8d44\u8bafnew",
		"comments":113,
		"created_at":"09\u670828\u65e5",
		"desc":"  \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0\u5173\u6ce8ITValue\uff0c\u67e5\u770b\u4f01\u4e1a\u7ea7\u5e02\u573a\u6700\u65b0\u9c9c\u3001\u6700\u5177\u4ef7\u503c\u7684\u62a5\u9053\uff01\u4e2d\u56fd\u667a\u6167\u529e\u516c\u54c1\u724c\u6df1\u5733\u5e02\u84dd\u51cc
		.......

Python 3 removed the decode method from str, so the old approach of transcoding with something like mystring.decode("utf-8") no longer works.
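As a quick illustration of that point, here is a minimal sketch. The variable name `escaped` and the sample value are made up for this example; the sample only mimics the escaped text shown in the response above.

```python
# In Python 3, text is str and raw data is bytes; only bytes has .decode().
# `escaped` holds the 12 literal characters \u8d44\u8baf, not two Chinese characters.
escaped = "\\u8d44\\u8baf"

print(hasattr(escaped, "decode"))  # str has no decode method in Python 3
print(hasattr(b"abc", "decode"))   # bytes does

try:
    escaped.decode("utf-8")
except AttributeError as exc:
    # This is the error the Python 2 style call runs into
    print("AttributeError:", exc)
```

So any fix has to go through bytes first, which is what the workaround in this article does.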

There is a workaround, though: encode first, then decode:

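import json  # at the top of the spider module

    def parse(self, response):
        # dumps followed by loads round-trips the text unchanged (kept as in the original)
        datas = json.dumps(response.text, ensure_ascii=False, indent=4, separators=(',', ': '))
        # Encode to bytes, then decode with unicode_escape to turn \uXXXX into characters
        json_data = json.loads(datas).encode('utf-8').decode('unicode_escape')
        print(json_data)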

The key step is: mystr.encode('utf-8').decode('unicode_escape')
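The key step can be tried on its own, outside Scrapy. The sketch below wraps it in a helper called `unescape_unicode`, which is a hypothetical name introduced here, and the sample strings are made up to resemble the response. It also shows a caveat: the `unicode_escape` codec treats non-escape bytes as Latin-1, so the trick is only reliable when the text is pure ASCII plus \uXXXX escapes.

```python
def unescape_unicode(text: str) -> str:
    """Turn literal backslash-u escape sequences in `text` into real characters.

    Hypothetical helper for illustration; only reliable for pure-ASCII input.
    """
    return text.encode("utf-8").decode("unicode_escape")


# Pure ASCII with escapes: the escapes become Chinese characters.
sample = '"created_at":"09\\u670828\\u65e5"'
print(unescape_unicode(sample))

# Caveat: a string that already contains a real non-ASCII character gets garbled,
# because its UTF-8 bytes are decoded one byte at a time as Latin-1.
mixed = "月\\u65e5"
print(repr(unescape_unicode(mixed)))
```

If the response may mix real Chinese characters with escapes, parsing it as JSON is likely the safer route.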

With this change, the spider prints readable text:

{
	"status":"true",
	"last_view_time":null,
	"message":"",
	"shown_offset":0,
	"articles":[
	{
		"channel":"默认",
		"comments":25,
		"created_at":"09月28日",
		"desc":"  了解快捷键能够提升您的生产力。这里有一些实用的 Ubuntu 快捷键助您像专业人士一样使用 Ubuntu。-- Abhishek Prakash有用的原文链接请访问文末的...","downs":0,"id":"82879369","isexpert":0,"sourcetype":1,"tag":"","title"
		............
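As an alternative to the encode/decode trick, and not the method used in this article, the response can be parsed as JSON and serialized again with `ensure_ascii=False`. The JSON parser decodes \uXXXX escapes itself, including surrogate pairs, and leaves real non-ASCII characters untouched. The helper name `pretty_json` and the sample `body` below are assumptions made for this sketch; in a spider, `body` would be `response.text`.

```python
import json


def pretty_json(text: str) -> str:
    """Parse JSON text and re-serialize it with non-ASCII characters kept readable."""
    data = json.loads(text)  # the JSON parser decodes \uXXXX escapes
    return json.dumps(data, ensure_ascii=False, indent=4)


# Made-up sample resembling the response; in a spider this would be response.text
body = '{"channel": "\\u8d44\\u8bafnew", "comments": 113, "created_at": "09\\u670828\\u65e5"}'
print(pretty_json(body))
```

This also yields a regular dict or list from `json.loads`, which is usually what the spider needs for extracting fields anyway. Newer Scrapy releases also offer `response.json()` on text responses, which may make the explicit `json.loads` call unnecessary; check the documentation for the installed version.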