2021-03-19 protobuf tracing test

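A byte has 8 bits. The most significant bit is used as the flag that marks where the encoding ends, which leaves 7 payload bits: 128 distinct values, 0 through 127.
When the number is < 128, it is stored directly in a single byte.
When the number is >= 128, varint encoding is used. A varint for such a value always occupies at least 2 bytes, so its first byte always has the most significant bit set; read as an unsigned byte, the first byte is therefore always >= 128.
When reading, read the first byte first:
1. If it is < 128, take it as the value directly.
2. Otherwise, decode it as a varint.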
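
The read rule above can be sketched as follows. This is a minimal illustration rather than protobuf's own implementation; the names WriteVarint and ReadVarint are hypothetical, and a byte vector stands in for the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` as a base-128 varint: 7 payload bits per byte, least
// significant group first, high bit set on every byte except the last.
inline void WriteVarint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode a varint starting at `pos`. On success, stores the result in `value`
// and moves `pos` past the varint. Returns false on truncated or over-long input.
inline bool ReadVarint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                       std::uint64_t& value) {
    if (pos >= in.size()) return false;
    if (in[pos] < 0x80) {  // case 1: first byte < 128, it is the value itself
        value = in[pos++];
        return true;
    }
    std::uint64_t result = 0;  // case 2: high bit set, decode the full varint
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (byte < 0x80) {
            value = result;
            return true;
        }
    }
    return false;
}
```

With this sketch, 127 encodes to the single byte 0x7F, while 300 encodes to the two bytes 0xAC 0x02.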

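Where this applies:

protobuf: integers of the varint types (int32, int64, uint32, uint64, sint32, sint64, bool, enum) are stored with this optimized varint scheme; the fixed-width types (fixed32, fixed64, sfixed32, sfixed64) are not.

TAG
Length of composite types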

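II. Base-class delimiter

If serialization of a class structure is viewed as a recurrence, the base class can be treated as just another value of the derived class.

In terms of what a TAG means, a base class is a kind of type that needs no length marker, unlike a length-delimited (lenDel) type.

It can therefore be written as TAG + VALUE and identified by a unique TAG, with no need for a magic number.

TAG = (index << 3) | (wireType & 7)

So a new wireType can be added to mark a base class. The index carries no meaning here; 0 will do.

The base-class delimiter is BaseWireType.

It takes 1 byte to store.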
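
A small sketch of the tag arithmetic, assuming the new wire type takes the value 6. protobuf itself defines wire types 0 through 5, so 6 is free, but picking it is an assumption of this sketch, as are the names kBaseClass, MakeTag, TagIndex and TagWireType.

```cpp
#include <cstdint>

enum WireType : std::uint32_t {
    kVarint = 0,
    kFixed64 = 1,
    kLengthDelimited = 2,
    kStartGroup = 3,
    kEndGroup = 4,
    kFixed32 = 5,
    kBaseClass = 6,  // hypothetical: not part of protobuf
};

// TAG = (index << 3) | (wireType & 7)
constexpr std::uint32_t MakeTag(std::uint32_t index, WireType type) {
    return (index << 3) | (static_cast<std::uint32_t>(type) & 7u);
}

constexpr std::uint32_t TagIndex(std::uint32_t tag) { return tag >> 3; }

constexpr WireType TagWireType(std::uint32_t tag) {
    return static_cast<WireType>(tag & 7u);
}

// With index 0 the whole tag is just the wire type.
constexpr std::uint32_t kBaseClassTag = MakeTag(0, kBaseClass);
static_assert(kBaseClassTag == 6, "index 0 leaves only the wire type bits");
static_assert(kBaseClassTag < 0x80, "fits in a single varint byte");
```

Since protobuf field numbers start at 1, a tag with index 0 should not collide with any ordinary field.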

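III. Length of composite types

Composite types appear in the recursive formulation rather than in the recurrence. For their length, protobuf's strategy is

len(varint) + value

The key point is that len is written first and value second, so the exact value of len must already be known when len is written.

protobuf computes it in advance, with a fixed function signature:

size_t ByteSizeLong() const;

Each composite type carries a field in its class definition that holds its own (cached) length, and serialization reads it directly.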
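
To make the precomputation concrete, here is a rough sketch of how a size can be computed bottom-up before any byte is written. The Inner and Outer types and their ByteSize methods are hypothetical stand-ins, not protobuf's generated code, and the sketch always writes every field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Number of bytes the varint encoding of `value` occupies (1 to 10).
inline std::size_t VarintSize(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Hypothetical message with one string field at index 1.
struct Inner {
    std::string text;

    std::size_t ByteSize() const {
        const std::uint64_t tag = (1u << 3) | 2u;  // length-delimited
        return VarintSize(tag) + VarintSize(text.size()) + text.size();
    }
};

// Hypothetical message with a varint field at index 1 and a nested
// Inner at index 2.
struct Outer {
    std::uint64_t id = 0;
    Inner inner;

    std::size_t ByteSize() const {
        const std::uint64_t id_tag = (1u << 3) | 0u;     // varint
        const std::uint64_t inner_tag = (2u << 3) | 2u;  // length-delimited
        // The nested length must be known before its len prefix can be sized.
        const std::size_t inner_len = inner.ByteSize();
        return VarintSize(id_tag) + VarintSize(id) +
               VarintSize(inner_tag) + VarintSize(inner_len) + inner_len;
    }
};
```

Without caching, every nesting level recomputes the sizes of the levels below it, which is presumably why the length is stored in the object.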

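There are two corresponding strategies for compressing len:

III.1 Do what protobuf does: run a preprocessing pass before serialization that collects the lengths of all composite types.

Simplified data structure definition: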

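#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct CompoundSize
{
    size_t m_len = 0;                                // encoded length of this object
    std::map<std::string, CompoundSize> m_com_size;  // lengths of composite members, by name
    std::unique_ptr<CompoundSize> m_super;           // base class, if any (a by-value member of its own type does not compile)
};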

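III.2 Swap the positions of len and value, putting len after value.

By the time len is written, value has already been written, so len can be obtained from the difference in byte-stream write offsets.

This way len can still be compressed as a varint.

Reading proceeds in the reverse direction.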
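
A sketch of this layout follows. One detail is an assumption of the sketch rather than something stated above: the varint bytes of len are emitted in reverse order, so that a reader walking backwards from the end meets them in normal varint order and knows where len stops. The names WriteValueThenLen and ReadBackwards are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append `payload`, then its length as a byte-reversed varint.
inline void WriteValueThenLen(const std::string& payload,
                              std::vector<std::uint8_t>& out) {
    const std::size_t start = out.size();
    out.insert(out.end(), payload.begin(), payload.end());
    std::uint64_t len = out.size() - start;  // known from the write offsets

    std::vector<std::uint8_t> varint;
    while (len >= 0x80) {
        varint.push_back(static_cast<std::uint8_t>(len | 0x80));
        len >>= 7;
    }
    varint.push_back(static_cast<std::uint8_t>(len));
    out.insert(out.end(), varint.rbegin(), varint.rend());
}

// Read the record that ends at `end` (one past its last byte). On success,
// fills `payload` and moves `end` back to the first byte of the record.
inline bool ReadBackwards(const std::vector<std::uint8_t>& in,
                          std::size_t& end, std::string& payload) {
    if (end > in.size()) return false;
    std::size_t pos = end;
    std::uint64_t len = 0;
    bool done = false;
    for (int shift = 0; shift < 64 && pos > 0; shift += 7) {
        const std::uint8_t byte = in[--pos];
        len |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (byte < 0x80) {  // last varint byte reached
            done = true;
            break;
        }
    }
    if (!done || len > pos) return false;
    const std::size_t begin = pos - static_cast<std::size_t>(len);
    payload.assign(in.begin() + static_cast<std::ptrdiff_t>(begin),
                   in.begin() + static_cast<std::ptrdiff_t>(pos));
    end = begin;
    return true;
}
```

The trade-off is that the reader needs the whole buffer before it can start, because parsing begins at the last byte.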

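IV. How protobuf handles containers

list: TAG + LEN + VALUE | TAG + LEN + VALUE

map: TAG + LEN + (KTAG + LEN + K + VTAG + LEN + V) | TAG + LEN + (KTAG + LEN + K + VTAG + LEN + V)

Here each map entry is a nested message with the key as field 1 and the value as field 2; the inner LEN is present only for string, bytes and message keys or values.

Analysis: containers are handled rather naively. Every element is simply written out as if it were a member of its own. (Packed repeated scalar fields are the exception: a single TAG + LEN followed by all the values.)

When skipping fields for compatibility, sliding over the byte stream has to be done once per element.

Put bluntly, this costs extra CPU and the TAGs are redundant, so it is not worth borrowing.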
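
To make the per-element TAG visible, here is a sketch that hand-encodes a repeated string field in the layout described above. It is written from the wire-format description rather than taken from protobuf's code, and the names AppendVarint and EncodeRepeatedString are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append `value` to `out` as a varint.
inline void AppendVarint(std::uint64_t value, std::string& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>(static_cast<std::uint8_t>(value | 0x80)));
        value >>= 7;
    }
    out.push_back(static_cast<char>(static_cast<std::uint8_t>(value)));
}

// Encode every element as its own TAG + LEN + VALUE record.
inline std::string EncodeRepeatedString(std::uint32_t field_number,
                                        const std::vector<std::string>& items) {
    std::string out;
    const std::uint32_t tag = (field_number << 3) | 2u;  // 2 = length-delimited
    for (const std::string& item : items) {
        AppendVarint(tag, out);          // same TAG repeated per element
        AppendVarint(item.size(), out);  // LEN
        out += item;                     // VALUE
    }
    return out;
}
```

For field number 4 (the number used by the phone field in caffe.proto) and the items "ab" and "c", this produces 22 02 61 62 22 01 63 in hex: the tag byte 0x22 appears once per element.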

Code:

caffe.proto

syntax = "proto2";
package caffe; // package name (becomes the C++ namespace)


enum PhoneType {  
    MOBILE = 0;  
    HOME = 1;  
    WORK = 2;  
  }  

message PhoneNumber {  
    required string number = 1;  
    optional PhoneType type = 2 [default = HOME];  
}  

message Person {  
  required string name = 1;  
  required int32 age = 2;  
  optional string email = 3;  
  
  repeated PhoneNumber phone = 4;
  map<string, string> mdata = 5;
}

Generation script:

run_proto.sh

#!/bin/bash

# Remove previously generated files (plain files, so -r is not needed).
rm -f caffe.pb.h caffe.pb.cc

protoc -I=. --cpp_out=. ./caffe.proto
# protoc -I=. --lua_out=. ./caffe.proto

write.cpp

#include <iostream>  
#include <fstream>  
#include "caffe.pb.h"  
  
using namespace std;  
using namespace caffe;  
  
int main()  
{  
    Person person;  
  
    person.set_name("flamingo");     
    person.set_age(18);   
    person.set_email("majianfei1023@gmail.com");  
    // phone is a repeated field, so elements are appended with add_phone().

    for (size_t i = 0; i < 2; i++)
    {
        PhoneNumber* phone = person.add_phone();
        phone->set_number("hello world");
    }

    // mdata is a map<string, string> field; mutable_mdata() returns a pointer to the map.
    std::string str = "ss";
    auto* data = person.mutable_mdata();
    (*data)[str] = str;
    
    // Write  
    fstream output("./log", ios::out | ios::trunc | ios::binary);  
  
    if (!person.SerializeToOstream(&output)) {  
        cerr << "Failed to write msg." << endl;  
        return -1;  
    }  
  
    return 0;
}

read.cpp

#include <iostream>  
#include <fstream>  
#include "caffe.pb.h"  
  
 
using namespace std;  
using namespace caffe;  
  
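void PrintInfo(const Person & person) {
    cout << person.name() << endl;
    cout << person.age() << endl;
    cout << person.email() << endl;
    // phone is repeated, so iterate over its elements.
    for (const PhoneNumber& phone : person.phone()) {
        cout << phone.number() << " " << phone.type() << endl;
    }
    // mdata is a map field; its iteration order is not defined.
    for (const auto& kv : person.mdata()) {
        cout << kv.first << " = " << kv.second << endl;
    }
}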
  
int main()  
{  
    Person person;    
  
    fstream input("./log", ios::in | ios::binary);  
      
    if (!person.ParseFromIstream(&input)) {  
        cerr << "Failed to parse msg." << endl;
        return -1;  
    }  
  
    PrintInfo(person);  
 
    return 0;  
}