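Python's built-in json package makes it easy to parse JSON text, but when the text contains duplicate keys, the parsed result silently drops data. Consider the following example: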

    In [5]: d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""

    In [6]: d
    Out[6]: ' {"key":"1", "key":"2", "key":"3", "key2":"4"}'

    In [7]: json.loads(d)
    Out[7]: {'key': '3', 'key2': '4'}

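The reason is that Python builds a dictionary while parsing. It stores the first value of key, but each time the same key appears again, the later value overwrites the earlier one, so only the last value of key survives.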
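The overwriting can be reproduced without the json package at all. The following is a minimal sketch, assuming the parser sees the pairs in document order; the names `pairs` and `plain` are made up for illustration.

```python
# The key-value pairs of the sample text, in the order they appear.
pairs = [("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

# Filling a dict one pair at a time keeps only the last value per key.
plain = {}
for key, val in pairs:
    plain[key] = val

print(plain)  # {'key': '3', 'key2': '4'}

# dict(pairs) behaves the same way: the last duplicate wins.
assert plain == dict(pairs)
```

Nothing here raises an error or a warning, which is why the data loss is easy to miss.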
This is certainly not what we want. One reasonable alternative is to collect the values of a repeated key into an array, like this:

    {
        "key": ["1", "2", "3"],
        "key2": "4"
    }

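How do we get this result? Python's json package does leave us a way out. First, look at the prototype of the parsing function loads (shown here as it appeared in older Python versions; the encoding parameter was removed in Python 3.9):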

    json.loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
               parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

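The parameter to note is object_pairs_hook. It is a callback: whenever the parser finishes decoding a JSON object, it calls this function with that object's key-value pairs, and the function's return value is used in place of the default dict. To get the result described above, we define the following hook function: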

    def my_obj_pairs_hook(lst):
        result = {}
        count = {}  # how many times each key has been seen so far
        for key, val in lst:
            if key in count:
                count[key] = 1 + count[key]
            else:
                count[key] = 1
            if key in result:
                if count[key] > 2:
                    # third or later occurrence: the value is already a list
                    result[key].append(val)
                else:
                    # second occurrence: turn the single value into a list
                    result[key] = [result[key], val]
            else:
                # first occurrence: store the value as-is
                result[key] = val
        return result

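Pass this function as an argument when parsing the text, as shown below (the help output that IPython prints for json.loads is included for reference):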

    json.loads(data, object_pairs_hook=my_obj_pairs_hook)

    Signature: json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Docstring:
    Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).

    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs. The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders that rely on the
    order that the key and value pairs are decoded (for example,
    collections.OrderedDict will remember the order of insertion). If
    ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to use another datatype or parser
    for JSON floats (e.g. decimal.Decimal).

    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to use another datatype or parser
    for JSON integers (e.g. float).

    ``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
    This can be used to raise an exception if invalid JSON numbers
    are encountered.

    To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
    kwarg; otherwise ``JSONDecoder`` is used.

    The ``encoding`` argument is ignored and deprecated.
    File:      /usr/local/anaconda3/lib/python3.6/json/__init__.py
    Type:      function

This gives the result described earlier, in which the values of a repeated key are merged into an array.
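To check the whole flow end to end, here is a self-contained sketch. The hook `merge_duplicate_keys` is a hypothetical compact variant written for this illustration, not the author's function; it assumes the same merging rule (first value kept as-is, later values collected into a list).

```python
import json


def merge_duplicate_keys(pairs):
    """Hypothetical hook: collect the values of repeated keys into a list."""
    result = {}
    seen = {}  # key -> number of occurrences so far
    for key, val in pairs:
        seen[key] = seen.get(key, 0) + 1
        if seen[key] == 1:
            result[key] = val                  # first occurrence
        elif seen[key] == 2:
            result[key] = [result[key], val]   # second: start a list
        else:
            result[key].append(val)            # third and later: extend it
    return result


d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""

default_result = json.loads(d)
merged = json.loads(d, object_pairs_hook=merge_duplicate_keys)

print(default_result)  # {'key': '3', 'key2': '4'}
print(merged)          # {'key': ['1', '2', '3'], 'key2': '4'}
```

Counting occurrences, rather than testing whether the stored value is already a list, keeps the hook correct when the first value of a repeated key is itself a JSON array.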
For the sample text above, the argument passed to my_obj_pairs_hook is a list of tuples, roughly as follows:

    [("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

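The argument takes this form because these key-value pairs together make up one JSON object. By default, Python turns them into a dict, which is where later values overwrite earlier ones. When my_obj_pairs_hook is supplied, that function is called instead to build the result, so no value is lost and we get the result we want. For more complex JSON text, the function is called once for every object that is parsed, each time with a different list of tuples in the same shape as the example.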
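The per-object calls can be observed with a small tracing hook. This is only a sketch: `tracing_hook`, `calls`, and the nested sample `text` are made up for illustration, and the hook deliberately returns a plain dict so that the default last-value-wins behaviour stays visible.

```python
import json

calls = []  # every list of pairs the hook receives, in call order


def tracing_hook(pairs):
    """Hypothetical hook: record the pairs, then build an ordinary dict."""
    calls.append(list(pairs))
    return dict(pairs)


text = '{"outer": {"a": 1, "a": 2}, "b": 3}'
parsed = json.loads(text, object_pairs_hook=tracing_hook)

for pairs in calls:
    print(pairs)
# [('a', 1), ('a', 2)]
# [('outer', {'a': 2}), ('b', 3)]
```

The inner object is finished first, so the hook sees it before the outer one, and the outer call already receives the hook's return value for the nested object.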