The Java serialization algorithm revealed (reference)

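This article takes an in-depth look at Java serialization. It explains why serialization is necessary and shows how to serialize an object and how to restore a serialized object. It analyzes how the serialization algorithm works, including writing the class metadata, recursively writing out superclass descriptions up to java.lang.Object, and writing the actual data starting from the topmost superclass down to the most-derived class. It also shows the serialized format of an object and the details of the Java serialization algorithm.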

Serialization is the process of saving an object's state to a sequence of bytes; deserialization is the process of rebuilding those bytes into a live object. The Java Serialization API provides a standard mechanism for developers to handle object serialization. In this tip, you will see how to serialize an object, and why serialization is sometimes necessary. You'll learn about the serialization algorithm used in Java, and see an example that illustrates the serialized format of an object. By the time you're done, you should have a solid knowledge of how the serialization algorithm works and what entities are serialized as part of the object at a low level.

Why is serialization required?

In today's world, a typical enterprise application will have multiple components and will be distributed across various systems and networks. In Java, everything is represented as objects; if two Java components want to communicate with each other, there needs to be a mechanism to exchange data. One way to achieve this is to define your own protocol and transfer an object. This means that the receiving end must know the protocol used by the sender to re-create the object, which would make it very difficult to talk to third-party components. Hence, there needs to be a generic and efficient protocol to transfer the object between components. Serialization is defined for this purpose, and Java components use this protocol to transfer objects.

Figure 1 shows a high-level view of client/server communication, where an object is transferred from the client to the server through serialization.

 

Figure 1. A high-level view of serialization in action 

How to serialize an object

In order to serialize an object, you need to ensure that the class of the object implements the java.io.Serializable interface, as shown in Listing 1.

Listing 1. Implementing Serializable
import java.io.Serializable;

class TestSerial implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

In Listing 1, the only thing you had to do differently from creating a normal class is implement the java.io.Serializable interface. The Serializable interface is a marker interface; it declares no methods at all. It tells the serialization mechanism that the class can be serialized.
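To see what the marker interface buys you, here is a minimal sketch that tries to write two objects, one whose class carries the marker and one whose class does not. The names MarkerDemo, Marked, Unmarked, and canSerialize are hypothetical and chosen only for this illustration; the behavior relies on ObjectOutputStream throwing NotSerializableException for unmarked classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MarkerDemo {
    // Hypothetical classes, identical except for the marker interface.
    static class Marked implements Serializable {
        byte version = 100;
    }

    static class Unmarked {
        byte version = 100;
    }

    /** Returns true if ObjectOutputStream accepts the object. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class does not implement Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Marked:   " + canSerialize(new Marked()));
        System.out.println("Unmarked: " + canSerialize(new Unmarked()));
    }
}
```

The two classes have the same field, so the only thing that decides the outcome is whether the class declares the interface.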

Now that you have made the class eligible for serialization, the next step is to actually serialize the object. That is done by calling the writeObject() method of the java.io.ObjectOutputStream class, as shown in Listing 2.

Listing 2. Calling writeObject()
public static void main(String args[]) throws IOException {
    FileOutputStream fos = new FileOutputStream("temp.out");
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    TestSerial ts = new TestSerial();
    oos.writeObject(ts);
    oos.flush();
    oos.close();
}

Listing 2 stores the state of the TestSerial object in a file called temp.out. The call oos.writeObject(ts); actually kicks off the serialization algorithm, which in turn writes the object to temp.out.

To re-create the object from the persistent file, you would employ the code in Listing 3.

Listing 3. Recreating a serialized object
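public static void main(String args[]) throws IOException, ClassNotFoundException {
    FileInputStream fis = new FileInputStream("temp.out");
    ObjectInputStream oin = new ObjectInputStream(fis);
    // readObject() declares ClassNotFoundException, so main must declare it too
    TestSerial ts = (TestSerial) oin.readObject();
    System.out.println("version=" + ts.version);
    oin.close();
}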

In Listing 3, the object's restoration occurs with the oin.readObject() method call. This method call reads in the raw bytes that we previously persisted and creates a live object that is an exact replica of the original object graph. Because readObject() can read any serializable object, a cast to the correct type is required.

Executing this code will print version=100 on the standard output.
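The same round trip can be tried without touching the file system. The following is a sketch under a few assumptions: it swaps the file streams of Listings 2 and 3 for in-memory byte array streams, redeclares TestSerial as a nested class so the example is self-contained, and uses the hypothetical helper names toBytes and fromBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class RoundTripDemo {
    // Same shape as TestSerial in Listing 1, redeclared for self-containment.
    static class TestSerial implements Serializable {
        public byte version = 100;
        public byte count = 0;
    }

    /** Serializes the object into a byte array. */
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // the stream was flushed when it was closed
    }

    /** Rebuilds an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TestSerial original = new TestSerial();
        original.version = 42;
        TestSerial copy = (TestSerial) fromBytes(toBytes(original));
        System.out.println("version=" + copy.version);
        System.out.println("same instance? " + (copy == original));
    }
}
```

The restored object carries the same field values but is a distinct instance, which is what "replica" means here.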

The serialized format of an object

What does the serialized version of the object look like? Remember, the sample code in the previous section saved the serialized version of the TestSerial object into the file temp.out. Listing 4 shows the contents of temp.out, displayed in hexadecimal. (You need a hexadecimal editor to see the output in hexadecimal format.)

Listing 4. Hexadecimal form of TestSerial
AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 A0 0C 34 00 FE B1 DD F9 02 00 02 42 00 05
63 6F 75 6E 74 42 00 07 76 65 72 73 69 6F 6E 78
70 00 64

If you look again at the actual TestSerial object, you'll see that it has only two byte members, as shown in Listing 5.

Listing 5. TestSerial's byte members
public byte version = 100;
public byte count = 0;
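If no hexadecimal editor is at hand, a few lines of Java can produce a dump in the style of Listing 4. This is an illustrative sketch rather than part of the original tip: HexDumpDemo, hexDump, and serialize are hypothetical names, and because TestSerial is redeclared here as a nested class, the class name and serialVersionUID bytes in the output will differ from Listing 4.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HexDumpDemo {
    static class TestSerial implements Serializable {
        public byte version = 100;
        public byte count = 0;
    }

    /** Formats bytes as two-digit uppercase hex, 16 bytes per line. */
    static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            // Mask to 0..255 so negative bytes such as 0xAC print as two digits.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(hexDump(serialize(new TestSerial())));
    }
}
```

As in Listing 4, the dump begins with AC ED 00 05 and ends with the two data bytes 00 64, the values of count and version.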

Java's serialization algorithm

By now, you should have a pretty good knowledge of how to serialize an object. But how does the process work under the hood? In general the serialization algorithm does the following:

  • It writes out the metadata of the class associated with an instance.
  • It recursively writes out the description of the superclass until it finds java.lang.Object.
  • Once it finishes writing the metadata information, it then starts with the actual data associated with the instance. But this time, it starts from the topmost superclass.
  • It recursively writes the data associated with the instance, starting from the topmost (least-derived) superclass and ending with the most-derived class.
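The two orderings in the steps above can be sketched with the public ObjectStreamClass API. This is an approximation for illustration, not the JDK's internal implementation: HierarchyDemo, Parent, Child, descriptorOrder, and dataOrder are hypothetical names, and the walk relies on ObjectStreamClass.lookup() returning null for classes that are not serializable, such as java.lang.Object.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HierarchyDemo {
    static class Parent implements Serializable {
        int parentVersion = 10;
    }

    static class Child extends Parent {
        int version = 66;
    }

    /** Class names in the order their descriptions are written: most-derived first. */
    static List<String> descriptorOrder(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            // lookup() returns null for a non-serializable class, which ends the walk.
            if (ObjectStreamClass.lookup(c) == null) {
                break;
            }
            names.add(c.getSimpleName());
        }
        return names;
    }

    /** Instance data goes out in the opposite order: topmost superclass first. */
    static List<String> dataOrder(Class<?> type) {
        List<String> names = descriptorOrder(type);
        Collections.reverse(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("descriptions: " + descriptorOrder(Child.class));
        System.out.println("data:         " + dataOrder(Child.class));
    }
}
```

For Child, the descriptions come out as Child then Parent, while the data comes out as Parent then Child.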

I've written a different example object for this section that will cover all possible cases. The new sample object to be serialized is shown in Listing 6.

Listing 6. Sample serialized object
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class parent implements Serializable {
    int parentVersion = 10;
}

class contain implements Serializable {
    int containVersion = 11;
}

public class SerialTest extends parent implements Serializable {
    int version = 66;
    contain con = new contain();

    public int getVersion() {
        return version;
    }

    public static void main(String args[]) throws IOException {
        FileOutputStream fos = new FileOutputStream("temp.out");
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        SerialTest st = new SerialTest();
        oos.writeObject(st);
        oos.flush();
        oos.close();
    }
}

 

This example is a straightforward one. It serializes an object of type SerialTest, which is derived from parent and has a container object, contain. The serialized format of this object is shown in Listing 7.

Listing 7. Serialized form of sample object
AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07
76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09
4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72
65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00
0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70
00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74
61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00
0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78
70 00 00 00 0B

Figure 2 offers a high-level look at the serialization algorithm for this scenario.

 

Figure 2. An outline of the serialization algorithm

Let's go through the serialized format of the object in detail and see what each byte represents. Begin with the serialization protocol information:

  • AC ED: STREAM_MAGIC. Specifies that this is a serialization protocol.
  • 00 05: STREAM_VERSION. The serialization version.
  • 0x73: TC_OBJECT. Specifies that this is a new Object.
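These three values are also exposed as constants in java.io.ObjectStreamConstants, so a stream's opening bytes can be checked in code. The sketch below is only an illustration; HeaderDemo and startsWithNewObject are hypothetical names, and the check assumes the stream holds an ordinary object (strings and arrays begin with different type codes).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;

class HeaderDemo {
    /** Returns true if the bytes begin with STREAM_MAGIC, STREAM_VERSION, TC_OBJECT. */
    static boolean startsWithNewObject(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            short magic = in.readShort();   // AC ED
            short version = in.readShort(); // 00 05
            byte tag = in.readByte();       // 73
            return magic == ObjectStreamConstants.STREAM_MAGIC
                    && version == ObjectStreamConstants.STREAM_VERSION
                    && tag == ObjectStreamConstants.TC_OBJECT;
        } catch (IOException e) {
            return false; // too short to hold the header
        }
    }

    public static void main(String[] args) {
        byte[] firstBytes = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73};
        System.out.println(startsWithNewObject(firstBytes));
    }
}
```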

The first step of the serialization algorithm is to write the description of the class associated with an instance. The example serializes an object of type SerialTest, so the algorithm starts by writing the description of the SerialTest class.

  • 0x72: TC_CLASSDESC. Specifies that this is a new class.
  • 00 0A: Length of the class name.
  • 53 65 72 69 61 6C 54 65 73 74: SerialTest, the name of the class.
  • 05 52 81 5A AC 66 02 F6: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This particular flag (SC_SERIALIZABLE) says that the object supports serialization.
  • 00 02: Number of fields in this class.
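The layout just described (a length-prefixed name, an eight-byte identifier, a flag byte, and a two-byte field count) can be decoded with DataInputStream. The following sketch reads the first 29 bytes of Listing 7; ClassDescDemo, Header, and readHeader are hypothetical names, and the code assumes the stream begins with a new object followed directly by a new class description.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

class ClassDescDemo {
    // The opening bytes of Listing 7, up to and including the field count.
    static final byte[] LISTING_7_PREFIX = {
        (byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73, 0x72, 0x00, 0x0A,
        0x53, 0x65, 0x72, 0x69, 0x61, 0x6C, 0x54, 0x65, 0x73, 0x74,
        0x05, 0x52, (byte) 0x81, 0x5A, (byte) 0xAC, 0x66, 0x02, (byte) 0xF6,
        0x02, 0x00, 0x02
    };

    // Hypothetical holder for the decoded values.
    static class Header {
        String name;
        long serialVersionUID;
        int flags;
        int fieldCount;
    }

    static Header readHeader(byte[] stream) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            in.skipBytes(4); // STREAM_MAGIC and STREAM_VERSION
            if (in.readByte() != ObjectStreamConstants.TC_OBJECT
                    || in.readByte() != ObjectStreamConstants.TC_CLASSDESC) {
                throw new IllegalArgumentException("not a new object with a new class description");
            }
            Header h = new Header();
            h.name = in.readUTF();              // two-byte length, then the name
            h.serialVersionUID = in.readLong(); // eight bytes
            h.flags = in.readUnsignedByte();    // 0x02 is SC_SERIALIZABLE
            h.fieldCount = in.readUnsignedShort();
            return h;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Header h = readHeader(LISTING_7_PREFIX);
        System.out.println(h.name + " suid=0x" + Long.toHexString(h.serialVersionUID)
                + " flags=" + h.flags + " fields=" + h.fieldCount);
    }
}
```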

Next, the algorithm writes the field int version = 66;.

  • 0x49: Field type code. 49 represents "I", which stands for int.
  • 00 07: Length of the field name.
  • 76 65 72 73 69 6F 6E: version, the name of the field.

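And then the algorithm writes the next field, contain con = new contain();. This is an object, so after the field type code and name it also writes the canonical JVM signature of this field.

  • 0x4C: Field type code. 4C represents "L", which marks an object type.
  • 00 03: Length of the field name.
  • 63 6F 6E: con, the name of the field.
  • 0x74: TC_STRING. Represents a new string.
  • 00 09: Length of the string.
  • 4C 63 6F 6E 74 61 69 6E 3B: Lcontain;, the canonical JVM signature.
  • 0x78: TC_ENDBLOCKDATA, the end of the optional block data for an object.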
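The type code, name, and signature of each field can also be inspected through ObjectStreamClass.getFields(). The sketch below is illustrative only: FieldDescDemo, Holder, Contain, and describeFields are hypothetical names, and because the classes are nested, the signature it prints includes the outer class name instead of the plain Lcontain; seen above.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class FieldDescDemo {
    static class Contain implements Serializable {
        int containVersion = 11;
    }

    static class Holder implements Serializable {
        int version = 66;
        Contain con = new Contain();
    }

    /** One line per serializable field: type code, name, and signature for object fields. */
    static List<String> describeFields(Class<?> type) {
        List<String> lines = new ArrayList<>();
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        for (ObjectStreamField field : desc.getFields()) {
            String line = field.getTypeCode() + " " + field.getName();
            if (!field.isPrimitive()) {
                // getTypeString() is the JVM signature; it is null for primitive fields.
                line += " " + field.getTypeString();
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        describeFields(Holder.class).forEach(System.out::println);
    }
}
```

Primitive fields are listed before object fields, which matches the order in the stream above, where version precedes con.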

The next step of the algorithm is to write the description of the parent class, which is the immediate superclass of SerialTest.

  • 0x72: TC_CLASSDESC. Specifies that this is a new class.
  • 00 06: Length of the class name.
  • 70 61 72 65 6E 74: parent, the name of the class.
  • 0E DB D2 BD 85 EE 63 7A: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This flag notes that the object supports serialization.
  • 00 01: Number of fields in this class.

Now the algorithm will write the field description for the parent class. parent has one field, int parentVersion = 10;.

  • 0x49: Field type code. 49 represents "I", which stands for int.
  • 00 0D: Length of the field name.
  • 70 61 72 65 6E 74 56 65 72 73 69 6F 6E: parentVersion, the name of the field.
  • 0x78: TC_ENDBLOCKDATA, the end of block data for this object.
  • 0x70: TC_NULL, which represents the fact that there are no more superclasses because we have reached the top of the class hierarchy.

So far, the serialization algorithm has written the description of the class associated with the instance and all its superclasses. Next, it will write the actual data associated with the instance. It writes the parent class members first:

  • 00 00 00 0A: 10, the value of parentVersion.

Then it moves on to SerialTest.

  • 00 00 00 42: 66, the value of version.
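The parent-first ordering of the data can be observed directly. The sketch below uses a hypothetical Parent and Child pair that hold only int fields, so the serialized stream ends with the two values; DataOrderDemo, serialize, and trailingInts are names made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

class DataOrderDemo {
    static class Parent implements Serializable {
        int parentVersion = 10;
    }

    static class Child extends Parent {
        int version = 66;
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decodes the last count four-byte big-endian ints of the stream. */
    static int[] trailingInts(byte[] stream, int count) {
        ByteBuffer buf = ByteBuffer.wrap(stream, stream.length - 4 * count, 4 * count);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getInt(); // ByteBuffer is big-endian by default
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [10, 66]: the parent's value precedes the child's.
        System.out.println(Arrays.toString(trailingInts(serialize(new Child()), 2)));
    }
}
```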

The next few bytes are interesting. The algorithm needs to write the information about the contain object, shown in Listing 8.

Listing 8. The contain object
contain con = new contain();

Remember, the serialization algorithm hasn't written the class description for the contain class yet. This is the opportunity to write this description.

  • 0x73: TC_OBJECT, designating a new object.
  • 0x72: TC_CLASSDESC. Specifies that this is a new class.
  • 00 07: Length of the class name.
  • 63 6F 6E 74 61 69 6E: contain, the name of the class.
  • FC BB E6 0E FB CB 60 C7: SerialVersionUID, the serial version identifier of this class.
  • 0x02: Various flags. This flag indicates that this class supports serialization.
  • 00 01: Number of fields in this class.

Next, the algorithm must write the description for contain's only field, int containVersion = 11;.

  • 0x49: Field type code. 49 represents "I", which stands for int.
  • 00 0E: Length of the field name.
  • 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E: containVersion, the name of the field.
  • 0x78: TC_ENDBLOCKDATA.

Next, the serialization algorithm checks to see if contain has any parent classes. If it did, the algorithm would start writing that class; but in this case there is no superclass for contain, so the algorithm writes TC_NULL.

  • 0x70: TC_NULL.

Finally, the algorithm writes the actual data associated with contain.

  • 00 00 00 0B: 11, the value of containVersion.
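The whole walk-through can be condensed into a small reader that follows the same steps over the bytes of Listing 7. This is a deliberately limited sketch, not a general parser: MiniStreamParser and its members are hypothetical names, and it assumes a stream that contains only new objects with int and object fields, with no back references, arrays, or custom write methods.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MiniStreamParser {
    // The bytes of Listing 7.
    static final String LISTING_7 =
          "AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65 "
        + "73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07 "
        + "76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09 "
        + "4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72 "
        + "65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00 "
        + "0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70 "
        + "00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74 "
        + "61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00 "
        + "0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78 "
        + "70 00 00 00 0B";

    // One class description: its name plus the type code and name of each field.
    static class ClassDesc {
        String name;
        List<Character> fieldTypes = new ArrayList<>();
        List<String> fieldNames = new ArrayList<>();
    }

    final DataInputStream in;
    final List<String> values = new ArrayList<>(); // "class.field=value" in stream order

    MiniStreamParser(byte[] bytes) {
        in = new DataInputStream(new ByteArrayInputStream(bytes));
    }

    /** Parses space-separated hex bytes and returns the int values found, in stream order. */
    static List<String> parse(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        MiniStreamParser parser = new MiniStreamParser(bytes);
        try {
            parser.in.skipBytes(4); // STREAM_MAGIC and STREAM_VERSION
            parser.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser.values;
    }

    void expect(int tag) throws IOException {
        int actual = in.readUnsignedByte();
        if (actual != tag) {
            throw new IOException("expected 0x" + Integer.toHexString(tag)
                    + " but found 0x" + Integer.toHexString(actual));
        }
    }

    void readObject() throws IOException {
        expect(0x73); // TC_OBJECT
        // Descriptions arrive most-derived first; data follows topmost superclass first.
        List<ClassDesc> chain = readClassDescChain();
        for (int c = chain.size() - 1; c >= 0; c--) {
            ClassDesc desc = chain.get(c);
            for (int f = 0; f < desc.fieldNames.size(); f++) {
                if (desc.fieldTypes.get(f) == 'I') {
                    values.add(desc.name + "." + desc.fieldNames.get(f) + "=" + in.readInt());
                } else {
                    readObject(); // nested object, such as the contain instance
                }
            }
        }
    }

    List<ClassDesc> readClassDescChain() throws IOException {
        List<ClassDesc> chain = new ArrayList<>();
        int tag = in.readUnsignedByte();
        while (tag == 0x72) { // TC_CLASSDESC
            ClassDesc desc = new ClassDesc();
            desc.name = in.readUTF();
            in.readLong(); // serialVersionUID
            in.readByte(); // flags
            int fieldCount = in.readUnsignedShort();
            for (int i = 0; i < fieldCount; i++) {
                char type = (char) in.readByte();
                desc.fieldTypes.add(type);
                desc.fieldNames.add(in.readUTF());
                if (type == 'L') {
                    expect(0x74); // TC_STRING
                    in.readUTF(); // JVM signature, for example Lcontain;
                }
            }
            expect(0x78); // TC_ENDBLOCKDATA
            chain.add(desc);
            tag = in.readUnsignedByte(); // next superclass, or TC_NULL
        }
        if (tag != 0x70) { // TC_NULL
            throw new IOException("unsupported tag 0x" + Integer.toHexString(tag));
        }
        return chain;
    }

    public static void main(String[] args) {
        parse(LISTING_7).forEach(System.out::println);
    }
}
```

Run against Listing 7, the reader reports parentVersion first, then version, then containVersion, the same order in which the walk-through encountered the data.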

Conclusion

In this tip, you have seen how to serialize an object, and learned how the serialization algorithm works in detail. I hope this article gives you more detail on what happens when you actually serialize an object.

About the author

Sathiskumar Palaniappan has more than four years of experience in the IT industry, and has been working with Java-related technologies for more than three years. Currently, he is working as a system software engineer at the Java Technology Center, IBM Labs. He also has experience in the telecom industry.

Resources

Reference address: http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html
