Parsing JSON with duplicate keys using Python's json.loads

Latest recommended article published on 2025-06-25 09:06:26

License: CC 4.0 BY-SA

Python's built-in json package makes it easy to parse JSON text, but when the text contains duplicate keys, the parsed result silently loses data. For example:

Python

In [4]: import json

In [5]: d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""

In [6]: d
Out[6]: ' {"key":"1", "key":"2", "key":"3", "key2":"4"}'

In [7]: json.loads(d)
Out[7]: {'key': '3', 'key2': '4'}

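The reason is that the parser builds a dict for each JSON object. It stores the first value it reads for key, and every time it meets the same key again, the later value overwrites the earlier one, so only the last value for key survives.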
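The overwriting is the same thing that happens whenever a plain dict is built from a sequence of key/value pairs. A minimal sketch (the variable names `pairs`, `merged`, and `text` are only for illustration):

```python
import json

# Key/value pairs in document order, with "key" repeated three times.
pairs = [("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

# Building a dict from the pairs keeps only the last value for each key.
merged = dict(pairs)
print(merged)

# json.loads with default settings ends up with the same last-one-wins result.
text = ' {"key":"1", "key":"2", "key":"3", "key2":"4"}'
print(json.loads(text) == merged)
```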

That is usually not what we want. One reasonable alternative is to collect the values of a repeated key into an array, like this:

Python

{
    "key": ["1", "2", "3"],
    "key2": "4"
}

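How do we get this result? Python's json package does leave a way out. First, look at the prototype of the parsing function loads (shown here as it appears in Python 3.6; the encoding argument was already ignored there and was removed in Python 3.9):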

Python

json.loads(s, encoding=None, cls=None,
           object_hook=None, parse_float=None,
           parse_int=None, parse_constant=None,
           object_pairs_hook=None, **kw)

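The parameter to note is object_pairs_hook. It is a callback: each time the parser finishes decoding a JSON object, it calls this function with the object's key/value pairs and uses the return value in place of the default dict. To get the result described above, we define the following hook function: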

Python

def my_obj_pairs_hook(lst):
    # lst is the list of (key, value) pairs of one JSON object, in document order
    result = {}
    count = {}  # number of times each key has been seen so far
    for key, val in lst:
        if key in count:
            count[key] = 1 + count[key]
        else:
            count[key] = 1
        if key in result:
            if count[key] > 2:
                # third or later occurrence: the list already exists
                result[key].append(val)
            else:
                # second occurrence: turn the single value into a list
                result[key] = [result[key], val]
        else:
            result[key] = val
    return result

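Pass this function as an argument when parsing the text. The call is shown below, followed by the IPython help output for json.loads: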

Python

json.loads(data, object_pairs_hook=my_obj_pairs_hook)

Signature: json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring:
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document) to a Python object.

``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).

``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).

``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.

The ``encoding`` argument is ignored and deprecated.
File: /usr/local/anaconda3/lib/python3.6/json/__init__.py
Type: function

This gives the result described above, with the values of identical keys merged into an array.
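Putting the pieces together, a self-contained run might look like the sketch below. The hook is a compact restatement of my_obj_pairs_hook written for this example (it uses `count.get` instead of the explicit if/else), and the name `parsed` is only illustrative:

```python
import json

def my_obj_pairs_hook(lst):
    # lst holds the (key, value) pairs of one JSON object, in document order.
    result = {}
    count = {}
    for key, val in lst:
        count[key] = count.get(key, 0) + 1
        if count[key] == 1:
            result[key] = val                  # first occurrence: keep the plain value
        elif count[key] == 2:
            result[key] = [result[key], val]   # second occurrence: start a list
        else:
            result[key].append(val)            # later occurrences: extend the list
    return result

d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""
parsed = json.loads(d, object_pairs_hook=my_obj_pairs_hook)
print(parsed)  # {'key': ['1', '2', '3'], 'key2': '4'}
```

Note that a key appearing once keeps its plain value while a repeated key becomes a list, so code that consumes the result has to handle both shapes.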
In this example, the argument passed to my_obj_pairs_hook is a list of tuples, roughly like this:

Python

[("key","1"),("key","2"),("key","3"),("key2","4")]
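One way to check what the hook actually receives is to pass a hook that returns its argument unchanged. This is only a sketch; `show_pairs` is a hypothetical helper name, not part of the json package:

```python
import json

def show_pairs(pairs):
    # Hypothetical helper: hand back the pairs untouched instead of building a dict.
    return pairs

d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""
pairs = json.loads(d, object_pairs_hook=show_pairs)
print(pairs)  # [('key', '1'), ('key', '2'), ('key', '3'), ('key2', '4')]
```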

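The argument has this shape because these key/value pairs together make up one JSON object. By default Python turns them into a result with the plain dict constructor, so later values naturally overwrite earlier ones. Once my_obj_pairs_hook is supplied, the parser calls it instead to build the result, which guarantees that no value is lost and gives us the output we want. For a more complex JSON text, the function is called every time an object is parsed, each time with a different list of tuples of the same general shape as in the example.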
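To see the per-object calls on nested input, one can record every list the hook receives. The following is a sketch under stated assumptions: `recording_hook`, `calls`, and the sample `text` are made up for illustration, and the hook deliberately falls back to `dict(pairs)` so that only the call pattern is shown:

```python
import json

calls = []  # every pair list the hook receives, in call order

def recording_hook(pairs):
    # Hypothetical hook: remember the pairs, then build an ordinary dict.
    calls.append(list(pairs))
    return dict(pairs)

text = '{"outer": {"a": 1, "a": 2}, "b": 3}'
result = json.loads(text, object_pairs_hook=recording_hook)

for call in calls:
    print(call)
print(result)
```

The inner object is finished first, so its pairs are recorded before those of the enclosing object, and the enclosing object's pairs already contain the value the hook returned for the inner one.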
