In this blog post, we are going to learn about JSON. We are going to learn what JSON is, why it exists, why people use it and finally how to utilise the power of Python to help us process it.
As a data scientist, programmer or simply an avid learner of technology, it is important that you understand what JSON is and how to use it. I am certain that you will find numerous applications for it once you understand it.
What is JSON?
JSON stands for JavaScript Object Notation and it is essentially a way to represent data. JSON follows a format that humans can naturally understand, and its true power comes from its ability to capture complex data relationships.
You can see this in the following example:
{
  "usersName": "Costas",
  "website": {
    "host": "www.medium.com",
    "account": "costasandreou",
    "blogpost": [
      {
        "type": "title",
        "description": {
          "main": "Working with JSON in Python",
          "sub": "the info you need"
        }
      }
    ]
  }
}
Notice how JSON is structured in key-value pairs, while also holding arrays (notice that “blogpost” is a list that can hold several entries, although here it has just one). This is very important to note if you’re coming from a flat data structure background (think Excel or Pandas DataFrames).
Why does JSON exist?
JSON was created at the beginning of the millennium to save us from Flash and Java applets. If you happen to not remember those in-browser plugins, consider yourself extremely lucky, as they were the stuff that nightmares are made of! But I digress. JSON was built to provide a protocol for stateless, real-time server-to-browser communication¹.
If you are already familiar with XML, you can think of JSON as a lightweight, easier to use, faster alternative.
Why do people use JSON?
It is believed that Yahoo, in 2005, was the first major company to offer JSON-based services, which helped popularise its adoption¹. JSON is now widely believed to be the most used data format.
The top 3 reasons people use JSON are:
- Very easy to read, write and manipulate
- Very fast to transfer over the network
- Supported by all major browsers and backend tech stacks
Reading the JSON
The very first thing you’d want to do when you have to work with JSON is to read it into your Python application. The json library in Python expects JSON to come through as a string.
Assuming your JSON data is already a string:
import json

obj = '{ "usersName": "Costas", "website": { "host": "www.medium.com", "account": "costasandreou", "blogpost": [ { "type": "title", "description": { "main" : "Working with JSON in Python", "sub" : "the info you need" } } ] } }'
json_obj = json.loads(obj)
print('obj: ',type(obj))
print('obj.usersName: ', json_obj['usersName'])
which returns:
obj: <class 'str'>
obj.usersName: Costas
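In practice the JSON often arrives from a file rather than a string literal, and it is not always well formed. The sketch below is a minimal illustration, assuming a file-like object (here an in-memory io.StringIO standing in for a hypothetical data.json) and a deliberately broken string; the variable names are made up for the example.

```python
import io
import json

# A file-like object standing in for open("data.json") (hypothetical file).
fake_file = io.StringIO('{"usersName": "Costas", "website": {"host": "www.medium.com"}}')

# json.load reads from a file-like object; json.loads reads from a string.
from_file = json.load(fake_file)
print(from_file["website"]["host"])

# Malformed input raises json.JSONDecodeError, which we can catch.
broken = '{"usersName": "Costas", }'  # a trailing comma is not valid JSON
try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("Could not parse:", err.msg)
```

Catching the error close to where the data enters your application tends to make bad input much easier to trace.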
If, on the other hand, you hold the data in a Python object (like a dictionary), the json library allows you to convert it to a JSON string for further processing.
obj1 = { "usersName": "Costas", "website": { "host": "www.medium.com", "account": "costasandreou", "blogpost": [ { "type": "title", "description": { "main" : "Working with JSON in Python", "sub" : "the info you need" } } ] } }
print('obj1: ',type(obj1))
json_obj1str = json.dumps(obj1)
print('json_obj1str: ', type(json_obj1str))
json_obj1 = json.loads(json_obj1str)
print('obj1.usersName: ', json_obj1['usersName'])
which returns:
obj1: <class 'dict'>
json_obj1str: <class 'str'>
obj1.usersName: Costas
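json.dumps also takes a few optional arguments that control how the string looks. The sketch below, using a cut-down version of the record above, shows a compact form (handy for sending over the network) and an indented form (handy for reading); the variable names are only illustrative.

```python
import json

record = {
    "usersName": "Costas",
    "website": {"host": "www.medium.com", "account": "costasandreou"},
}

# Compact: no whitespace after the separators.
compact = json.dumps(record, separators=(",", ":"))

# Pretty: two-space indentation and alphabetically sorted keys.
pretty = json.dumps(record, indent=2, sort_keys=True)
print(pretty)

# Either form parses back to the same dictionary.
round_trip = json.loads(pretty)
```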
Working with JSON
Now that we have seen how to load our JSON data into our Python application, let us look at the options we have for working with it.
Direct Querying
Extracting information from the JSON is as simple as defining the path we are after and providing the key name at each level. Let’s look at a few examples.
>>> print(json_obj['usersName'])
Costas
>>> print(json_obj['website'])
{'host': 'www.medium.com', 'account': 'costasandreou', 'blogpost': [{'type': 'title', 'description': {'main': 'Working with JSON in Python', 'sub': 'the info you need'}}]}
>>> print(json_obj['website']['host'])
www.medium.com
>>> print(json_obj['website']['blogpost'])
[{'type': 'title', 'description': {'main': 'Working with JSON in Python', 'sub': 'the info you need'}}]
>>> print(json_obj['website']['blogpost'][0]['description']['sub'])
the info you need
Once we can extract the data as above, we can easily store them in lists or databases for further processing.
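Direct indexing raises an error as soon as one key or list position is missing, which happens often with real-world records. One possible way around this is a small helper that walks a path and falls back to a default. The dig function below is a hypothetical helper written for this post, not part of the json library.

```python
import json

raw = ('{"usersName": "Costas", "website": {"host": "www.medium.com", '
       '"blogpost": [{"type": "title", "description": '
       '{"main": "Working with JSON in Python", "sub": "the info you need"}}]}}')
doc = json.loads(raw)

def dig(data, *path, default=None):
    """Follow the keys / list indexes in `path`; return `default` if a step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# A path that exists.
sub = dig(doc, "website", "blogpost", 0, "description", "sub")

# A path that does not exist (there is no fourth blog post).
missing = dig(doc, "website", "blogpost", 3, "description", "sub", default="n/a")

# Collect one value per blog post into a plain list for further processing.
titles = [post["description"]["main"] for post in doc["website"]["blogpost"]]
print(sub, missing, titles)
```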
Processing your data with Pandas
If you are already comfortable in Pandas and you’d like to work with flat data, there is a quick way to get your data into data frames. First up, you can take the first level of your data and add it to a DataFrame.
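import pandas as pd
from io import StringIO
df = pd.read_json(StringIO(obj))  # wrap the JSON string; newer pandas versions deprecate passing it directly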

As you can see, read_json takes in JSON text (not a parsed dictionary) and only allows us to see top-level information. In other words, it doesn’t flatten any of the structures. We need a way to navigate through the lower levels of the data.
We can flatten the data by a further level by using the pandas json_normalize function, which works on the parsed dictionary rather than the string.
obj = json.loads(obj)
df1 = pd.json_normalize(obj)
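json_normalize leaves the “blogpost” list untouched in a single column. If you want one row per blog post instead, its record_path and meta parameters can help. The sketch below assumes pandas is installed; the choice of which parent fields to carry along in meta is just an illustration.

```python
import pandas as pd

data = {
    "usersName": "Costas",
    "website": {
        "host": "www.medium.com",
        "account": "costasandreou",
        "blogpost": [
            {
                "type": "title",
                "description": {
                    "main": "Working with JSON in Python",
                    "sub": "the info you need",
                },
            }
        ],
    },
}

# One row per entry in website.blogpost; nested dictionaries inside each
# entry become dotted columns, and the meta fields are repeated on every row.
posts = pd.json_normalize(
    data,
    record_path=["website", "blogpost"],
    meta=["usersName", ["website", "host"]],
)
print(posts.columns.tolist())
```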

Flattening the JSON
There are certain cases where you simply want to flatten out the data. It certainly can make data analysis quicker at times and can allow for the visual inspection of a large number of records (think Excel type data filtering).
You can do that using a third-party library. First up, install the library:
pip install json-flatten
Then, run the following commands:
>>> import json
>>> obj = json.loads(obj)
>>> import json_flatten
>>> print(json_flatten.flatten(obj))
{'usersName': 'Costas', 'website.host': 'www.medium.com', 'website.account': 'costasandreou', 'website.blogpost.0.type': 'title', 'website.blogpost.0.description.main': 'Working with JSON in Python', 'website.blogpost.0.description.sub': 'the info you need'}
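If you would rather not add a dependency, the same idea can be sketched in a few lines of plain Python. The function below is a rough illustration of dotted-key flattening, not the json-flatten library's actual implementation, and it makes no attempt to record value types or to preserve empty containers.

```python
def flatten(data, parent_key="", sep="."):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)  # list positions become key parts
    else:
        return {parent_key: data}  # a leaf value: stop recursing

    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten(value, new_key, sep))
    return items  # note: empty dicts and lists contribute no keys

nested = {
    "usersName": "Costas",
    "website": {
        "host": "www.medium.com",
        "account": "costasandreou",
        "blogpost": [
            {
                "type": "title",
                "description": {
                    "main": "Working with JSON in Python",
                    "sub": "the info you need",
                },
            }
        ],
    },
}

flat = flatten(nested)
for key, value in flat.items():
    print(key, "=", value)
```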
Cross Record Analysis
All the items we have explored so far focused on processing a single JSON record. However, in the real world, you will likely need to process many JSON documents and aggregate across records.
You can consider any of the following methods for enabling your analysis:
- Use MongoDB to store the data and then run aggregations on top
- Flatten the data and use pandas for operations
- Flatten the data and export to Excel for further operations
- Choose the attributes you are after and load them in a SQL DB for further analysis
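As a rough sketch of the last option, the example below picks a few attributes out of several JSON documents, loads them into an in-memory SQLite database (from Python's standard library) and aggregates across records. The three records, the table name and the column names are all made up for illustration.

```python
import json
import sqlite3

# Made-up sample documents, one JSON string per record.
raw_records = [
    '{"usersName": "Costas", "website": {"host": "www.medium.com", "blogpost": [{"type": "title"}]}}',
    '{"usersName": "Alex", "website": {"host": "www.medium.com", "blogpost": [{"type": "title"}, {"type": "title"}]}}',
    '{"usersName": "Sam", "website": {"host": "example.com", "blogpost": []}}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, host TEXT, posts INTEGER)")

# Keep only the attributes we care about from each document.
for raw in raw_records:
    doc = json.loads(raw)
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        (doc["usersName"], doc["website"]["host"], len(doc["website"]["blogpost"])),
    )

# Aggregate across records: users and total posts per host.
rows = conn.execute(
    "SELECT host, COUNT(*), SUM(posts) FROM users GROUP BY host ORDER BY host"
).fetchall()
print(rows)
conn.close()
```

The same pattern should carry over to a server-based SQL database by swapping the connection, although the details depend on the driver you use.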
Conclusions
There you have it. In exactly 5 minutes you know everything you need to know to get you started working with JSON. I encourage you to spend some time and get intimately familiar with it. It is widely used and it is certainly a must-have on your CV!
Translated from: https://towardsdatascience.com/working-with-json-in-python-a53c3b88cc0