Errors from json.loads in Python -> watch the encoding of the JSON string being decoded


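This post covers points to watch when parsing JSON strings with json.loads in Python, including how to handle strings that are not UTF-8 encoded and how to handle JSON data that contains non-ASCII characters.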


April 3, 2012, 11:09 PM, crifan

Some notes on things to watch out for when using json.loads in Python.

Before listing them, here is the description of json.loads from the Python (2.x) documentation:

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize s (a str or unicode instance containing a JSON document) to a Python object.

If s is a str instance and is encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.

The other arguments have the same meaning as in load().

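1. If the input string is not UTF-8 encoded, specify its encoding with the encoding parameter

For a call such as:

dataDict = json.loads(dataJsonStr)

where dataJsonStr is a JSON byte string in an encoding other than UTF-8, for example GB2312, json.loads assumes UTF-8 and the call will typically fail as soon as the data contains non-ASCII characters. Change it to:

dataDict = json.loads(dataJsonStr, encoding="GB2312")

and it works.

This corresponds to the following part of the function description above: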

If s is a str instance and is encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified
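
The calls above rely on the Python 2 `encoding` parameter, which was deprecated and ignored in Python 3 and removed in Python 3.9. As a minimal sketch of the same idea under Python 3, assuming the payload arrives as GB2312-encoded bytes (the sample data and the name `dataJsonBytes` are made up for illustration), the bytes can be decoded explicitly and the resulting str passed to json.loads:

```python
import json

# Hypothetical sample: a JSON document encoded as GB2312 bytes,
# standing in for dataJsonStr in the text above.
dataJsonBytes = '{"name": "中文"}'.encode("gb2312")

# Python 3's json.loads has no usable `encoding` argument,
# so decode the bytes to str before parsing.
dataJsonStr = dataJsonBytes.decode("gb2312")
dataDict = json.loads(dataJsonStr)

print(dataDict["name"])
```

Decoding explicitly keeps the choice of codec visible at the call site instead of hiding it in a keyword argument.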

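2. If the string to be parsed is not in an ASCII-based encoding, convert it to Unicode before calling json.loads

Take the same call as an example:

dataDict = json.loads(dataJsonStr, encoding="GB2312")

Even though encoding names a suitable codec for dataJsonStr, json.loads can still fail when the string contains characters from another encoding. In my case dataJsonStr was GB2312 text that also contained some Japanese characters. Such a string can no longer be treated as being in a single ASCII-based encoding, so dataJsonStr has to be converted to Unicode first, and json.loads called on the result.

The code is as follows: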

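# Python 2: decode the byte string to unicode first
dataJsonStrUni = dataJsonStr.decode("GB2312")
# encoding is not needed once the input is already unicode
dataDict = json.loads(dataJsonStrUni)

This corresponds to the following part of the description above: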

Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
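
To make the decode-first rule concrete, here is a small sketch written for Python 3. The helper name `loads_bytes` and the sample payloads are hypothetical and not from the original post. It decodes the raw bytes to text with an explicitly named codec before parsing, which covers both an encoding that is not ASCII-based (UTF-16) and GB-family data:

```python
import json


def loads_bytes(raw, encoding):
    """Decode raw bytes with the named codec, then parse the text as JSON."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on a wrong codec
    return json.loads(text)


# UTF-16 is not ASCII-based: each ASCII character is stored as two bytes.
utf16_payload = '{"greeting": "こんにちは"}'.encode("utf-16")
print(loads_bytes(utf16_payload, "utf-16"))

# GB18030 is a superset of GB2312, so it can also represent characters
# beyond the GB2312 repertoire (the sample string is an illustration only).
gb_payload = '{"text": "中文 込"}'.encode("gb18030")
print(loads_bytes(gb_payload, "gb18030"))
```

If a strict `gb2312` decode raises `UnicodeDecodeError`, trying a wider codec such as `gbk` or `gb18030` is one option worth checking, though the right choice depends on how the data was actually produced.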
