Protocol Buffers serialization example

Protocol Buffers Serialization and Deserialization Explained

Latest recommended article published on 2023-12-20 18:18:01

glory-of-me

License: CC 4.0 BY-SA

Categories: distributed websites, rpc

Article link: https://blog.youkuaiyun.com/glory1234work2115/article/details/51983537

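This article walks through Protocol Buffers serialization: writing the proto file, generating the Java class, and testing it. A worked example shows the serialized and deserialized output, decodes the serialized bytes, and explains how the Base 128 Varints algorithm and ZigZag encoding reduce the encoded size.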


1. First, write the proto file, addressbook.proto

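// required fields and explicit defaults are proto2 features, so the syntax is declared explicitly.
syntax = "proto2";

package tutorial;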

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person{
    required string name = 1;
    required int32 age = 2;
    optional string email = 3;

    enum PhoneType{
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber{
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}

message AddressBook{
    repeated Person person = 1;
}

2. Use protoc.exe to generate the Java file from the proto file

....protoc.exe --java_out=. addressbook.proto

3. Test the Java class generated from the proto file

package com.example.tutorial;

import com.example.tutorial.AddressBookProtos.Person;
import com.google.protobuf.InvalidProtocolBufferException;

import java.util.Arrays;

public class Main {

    public static void main(String[] args) throws InvalidProtocolBufferException {
        // Build a Person with one phone number; the values match the printed output below.
        Person person = Person.newBuilder()
                .setEmail("gloryofme@163.com")
                .setName("gloryofme")
                .setAge(23)
                .addPhone(Person.PhoneNumber.newBuilder()
                        .setNumber("186")
                        .setType(Person.PhoneType.MOBILE)
                        .build())
                .build();
        System.out.println("person.toString");
        System.out.println(person.toString());
        System.out.println("Serialized");
        System.out.println(Arrays.toString(person.toByteArray()));
        // Deserialize from the same bytes
        System.out.println("Deserialized");
        Person newPerson = Person.parseFrom(person.toByteArray());
        System.out.println(newPerson);
    }
}

Printed output:

person.toString
name: "gloryofme"
age: 23
email: "gloryofme@163.com"
phone {
  number: "186"
  type: MOBILE
}

Serialized
[10, 9, 103, 108, 111, 114, 121, 111, 102, 109, 101, 16, 23, 26, 17, 103, 108, 111, 114, 121, 111, 102, 109, 101, 64, 49, 54, 51, 46, 99, 111, 109, 34, 7, 10, 3, 49, 56, 54, 16, 0]
Deserialized
name: "gloryofme"
age: 23
email: "gloryofme@163.com"
phone {
  number: "186"
  type: MOBILE
}

Analysis of the serialized result above:

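Protocol Buffers compresses integers with the Base 128 Varints algorithm: each byte carries 7 bits of the value, and the highest bit of the byte marks whether another byte follows. A non-negative number smaller than 128 therefore takes one byte instead of four.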
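The varint rule can be sketched in a few lines of Java. This is a minimal illustration and not the protobuf library's own code; the class name VarintSketch and the method encodeVarint are made up for this example. It assumes the value is passed as a 64-bit long, which is how a negative int32 ends up sign-extended on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the protobuf API.
class VarintSketch {

    // Encodes a value as a Base 128 varint: 7 payload bits per byte,
    // least significant group first, high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;                              // unsigned shift to the next group
        }
        out.write((int) value);                        // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeVarint(23)));   // [23]
        System.out.println(Arrays.toString(encodeVarint(300)));  // [-84, 2]
        System.out.println(encodeVarint(-1).length);             // 10
    }
}
```

The value 23 (the age in the example) fits in a single byte, while -1 sign-extended to 64 bits needs ten bytes, which is the cost that sint32 and sint64 are meant to avoid.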

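The sint32 and sint64 types optimize negative numbers with ZigZag encoding, which maps 0->0, -1->1, 1->2, and so on. In two's complement the sign occupies the highest bit, so even a negative number with a small absolute value looks like a very large value and needs the maximum number of varint bytes. After the ZigZag mapping, small absolute values become small unsigned values, so varints can shrink the serialized binary data.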
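The ZigZag mapping can be written with two shifts and an XOR. The sketch below shows the 32-bit variant; the class name ZigZagSketch and its method names are chosen for this example and are not the library's API.

```java
// Hypothetical helper for illustration only; not part of the protobuf API.
class ZigZagSketch {

    // Maps signed to unsigned so that small absolute values stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    static int encodeZigZag32(int n) {
        // n >> 31 is an arithmetic shift: all ones for negative n, all zeros otherwise.
        return (n << 1) ^ (n >> 31);
    }

    // Inverse mapping: the lowest bit holds the sign, the remaining bits the magnitude.
    static int decodeZigZag32(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    public static void main(String[] args) {
        for (int v : new int[]{0, -1, 1, -2, 2}) {
            int encoded = encodeZigZag32(v);
            System.out.println(v + " -> " + encoded + " -> " + decodeZigZag32(encoded));
        }
    }
}
```

After this mapping, -1 becomes 1 and fits in a single varint byte.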

10 --> 0000 1010 --> 00001|010 --> field_number<<3|wire_type --> field_number=1, wire_type=2. Wire type 2 means length-delimited, which is what string uses --> string name = 1;
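The same split can be applied to every tag byte in the output. The sketch below assumes each tag fits in one byte, which holds for field numbers 1 to 15; the class name TagSketch and its methods are illustrative only.

```java
// Hypothetical helper for illustration only; not part of the protobuf API.
class TagSketch {

    // The upper bits of the tag hold the field number.
    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    // The lowest three bits hold the wire type (0 = varint, 2 = length-delimited).
    static int wireType(int tag) {
        return tag & 0x07;
    }

    public static void main(String[] args) {
        // Tag bytes taken from the serialized Person shown earlier.
        for (int tag : new int[]{10, 16, 26, 34}) {
            System.out.println(tag + " -> field " + fieldNumber(tag) + ", wire type " + wireType(tag));
        }
    }
}
```

For the bytes above this gives field 1 with wire type 2 (name), field 2 with wire type 0 (age), field 3 with wire type 2 (email), and field 4 with wire type 2 (phone).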

9 --> 0 0001001 --> the highest bit is 0, so this is a single-byte varint: the length of the name value is 9 --> "gloryofme"

The following 103, 108, ..., 101 correspond to g, l, ..., e (gloryofme).
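Reading the length byte and then that many payload bytes can be sketched as follows. The example assumes the length is below 128, so it fits in a single varint byte, and it decodes the payload as UTF-8; the class StringFieldSketch and the method readString are hypothetical names used only here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the protobuf API.
class StringFieldSketch {

    // Reads a length-delimited string whose length byte sits at lengthOffset.
    // Assumes the length is below 128 and therefore a single-byte varint.
    static String readString(byte[] data, int lengthOffset) {
        int length = data[lengthOffset];
        return new String(data, lengthOffset + 1, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First bytes of the serialized Person: tag 10, length 9, payload, then the age field.
        byte[] data = {10, 9, 103, 108, 111, 114, 121, 111, 102, 109, 101, 16, 23};
        System.out.println(readString(data, 1)); // gloryofme
    }
}
```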

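These are the decimal Unicode code points of the characters. Protocol Buffers encodes strings as UTF-8, and for ASCII characters each UTF-8 byte equals the code point. Below are the Unicode code tables (a~z, A~Z, 0~9):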

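Unicode code points for A-Z
	Decimal　　　Hexadecimal
1. Unicode code point of "A":	65	0x41
2. Unicode code point of "B":	66	0x42
3. Unicode code point of "C":	67	0x43
4. Unicode code point of "D":	68	0x44
5. Unicode code point of "E":	69	0x45
6. Unicode code point of "F":	70	0x46
7. Unicode code point of "G":	71	0x47
8. Unicode code point of "H":	72	0x48
9. Unicode code point of "I":	73	0x49
10. Unicode code point of "J":	74	0x4A
11. Unicode code point of "K":	75	0x4B
12. Unicode code point of "L":	76	0x4C
13. Unicode code point of "M":	77	0x4D
14. Unicode code point of "N":	78	0x4E
15. Unicode code point of "O":	79	0x4F
16. Unicode code point of "P":	80	0x50
17. Unicode code point of "Q":	81	0x51
18. Unicode code point of "R":	82	0x52
19. Unicode code point of "S":	83	0x53
20. Unicode code point of "T":	84	0x54
21. Unicode code point of "U":	85	0x55
22. Unicode code point of "V":	86	0x56
23. Unicode code point of "W":	87	0x57
24. Unicode code point of "X":	88	0x58
25. Unicode code point of "Y":	89	0x59
26. Unicode code point of "Z":	90	0x5A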

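Unicode code points for a-z
	Decimal　　　Hexadecimal
1. Unicode code point of "a":	97	0x61
2. Unicode code point of "b":	98	0x62
3. Unicode code point of "c":	99	0x63
4. Unicode code point of "d":	100	0x64
5. Unicode code point of "e":	101	0x65
6. Unicode code point of "f":	102	0x66
7. Unicode code point of "g":	103	0x67
8. Unicode code point of "h":	104	0x68
9. Unicode code point of "i":	105	0x69
10. Unicode code point of "j":	106	0x6A
11. Unicode code point of "k":	107	0x6B
12. Unicode code point of "l":	108	0x6C
13. Unicode code point of "m":	109	0x6D
14. Unicode code point of "n":	110	0x6E
15. Unicode code point of "o":	111	0x6F
16. Unicode code point of "p":	112	0x70
17. Unicode code point of "q":	113	0x71
18. Unicode code point of "r":	114	0x72
19. Unicode code point of "s":	115	0x73
20. Unicode code point of "t":	116	0x74
21. Unicode code point of "u":	117	0x75
22. Unicode code point of "v":	118	0x76
23. Unicode code point of "w":	119	0x77
24. Unicode code point of "x":	120	0x78
25. Unicode code point of "y":	121	0x79
26. Unicode code point of "z":	122	0x7A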

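Unicode code points for 0-9
	Decimal　　　Hexadecimal
1. Unicode code point of "0":	48	0x30
2. Unicode code point of "1":	49	0x31
3. Unicode code point of "2":	50	0x32
4. Unicode code point of "3":	51	0x33
5. Unicode code point of "4":	52	0x34
6. Unicode code point of "5":	53	0x35
7. Unicode code point of "6":	54	0x36
8. Unicode code point of "7":	55	0x37
9. Unicode code point of "8":	56	0x38
10. Unicode code point of "9":	57	0x39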
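
Putting the pieces of the analysis together, the whole byte array from the run above can be walked by hand without the generated classes. The sketch below is only an illustration under stated assumptions: it handles just wire types 0 (varint) and 2 (length-delimited), the only two that occur in this example, and it does not descend into the nested PhoneNumber message. The class WireWalkSketch and its methods are hypothetical names, not part of the protobuf library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the protobuf API.
class WireWalkSketch {
    private final byte[] data;
    private int pos = 0;

    private WireWalkSketch(byte[] data) {
        this.data = data;
    }

    // Reads one Base 128 varint starting at the current position.
    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos++];
            result |= (long) (b & 0x7F) << shift; // 7 payload bits per byte
            if ((b & 0x80) == 0) {
                return result;                    // high bit clear: last byte
            }
            shift += 7;
        }
    }

    // Lists the top-level fields as "field N: ..." descriptions.
    static List<String> walk(byte[] data) {
        WireWalkSketch w = new WireWalkSketch(data);
        List<String> fields = new ArrayList<>();
        while (w.pos < data.length) {
            int tag = (int) w.readVarint();
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x07;
            if (wireType == 0) {
                fields.add("field " + fieldNumber + ": varint " + w.readVarint());
            } else if (wireType == 2) {
                int length = (int) w.readVarint();
                fields.add("field " + fieldNumber + ": length-delimited, " + length + " bytes");
                w.pos += length; // skip the payload
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] data = {10, 9, 103, 108, 111, 114, 121, 111, 102, 109, 101, 16, 23,
                26, 17, 103, 108, 111, 114, 121, 111, 102, 109, 101, 64, 49, 54, 51, 46, 99, 111, 109,
                34, 7, 10, 3, 49, 56, 54, 16, 0};
        walk(data).forEach(System.out::println);
    }
}
```

Run against the serialized Person, this lists field 1 (9 bytes, the name), field 2 (varint 23, the age), field 3 (17 bytes, the email), and field 4 (7 bytes, the embedded PhoneNumber, whose bytes 10, 3, 49, 56, 54, 16, 0 follow the same tag, length, value pattern).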