Protobuf Compiler: Related Classes and the Code Generation Flow

Latest recommended article published 2025-09-24 11:10:14

License: CC 4.0 BY-SA

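This article walks through Protobuf's code generation flow, covering the roles and workings of the core classes CommandLineInterface, SourceTree, Importer, and Tokenizer. It also explains how the Parser turns the token stream produced by a Tokenizer into a FileDescriptorProto, and how a CodeGenerator produces code.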

Code generation flow:

The core flow is shown in the figure below:

[Figure: core code generation flow]

Core data structures

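Class CommandLineInterface

generators_: map<string, GeneratorInfo>. Maps an output flag such as "--cpp_out" to its generator (CppGenerator). The names of the generators to run are taken from the protoc arguments.
plugins_: map<string, string>. A plugin provides a CodeGenerator that protobuf does not ship with, and runs as a separate process. plugins_ records the mapping plugin name -> path of the plugin executable on disk.
plugin_prefix_: set to "protoc-".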
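
The lookup that these three fields support can be sketched as follows. This is a toy model, not protoc's own code: ToyCli and all of its methods are hypothetical names. The rule that "--NAME_out" maps to an executable named plugin_prefix_ + "gen-" + NAME, and the fallback to the bare executable name when no path is registered, are assumptions about protoc's behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for protoc's per-generator bookkeeping.
struct GeneratorInfo {
  std::string help_text;
};

// Toy model: built-in generators are looked up by flag; anything else
// is assumed to be served by an external plugin process.
class ToyCli {
 public:
  void RegisterGenerator(const std::string& flag, const std::string& help) {
    generators_[flag] = GeneratorInfo{help};
  }
  void RegisterPlugin(const std::string& name, const std::string& path) {
    plugins_[name] = path;
  }
  void AllowPlugins(const std::string& prefix) { plugin_prefix_ = prefix; }

  // True if `flag` (e.g. "--cpp_out") names a built-in generator.
  bool HasBuiltin(const std::string& flag) const {
    return generators_.count(flag) > 0;
  }

  // For a non-built-in "--NAME_out" flag, returns the assumed plugin
  // executable name plugin_prefix_ + "gen-" + NAME; otherwise "".
  std::string PluginNameFor(const std::string& flag) const {
    const std::string head = "--";
    const std::string tail = "_out";
    if (plugin_prefix_.empty() || HasBuiltin(flag)) return "";
    if (flag.size() <= head.size() + tail.size()) return "";
    if (flag.compare(0, head.size(), head) != 0) return "";
    if (flag.compare(flag.size() - tail.size(), tail.size(), tail) != 0) {
      return "";
    }
    const std::string name =
        flag.substr(head.size(), flag.size() - head.size() - tail.size());
    return plugin_prefix_ + "gen-" + name;
  }

  // Returns the registered on-disk path for a plugin, or the bare name
  // (assumed to be searched on PATH) when none was registered.
  std::string ResolvePluginPath(const std::string& plugin_name) const {
    std::map<std::string, std::string>::const_iterator it =
        plugins_.find(plugin_name);
    return it == plugins_.end() ? plugin_name : it->second;
  }

 private:
  std::map<std::string, GeneratorInfo> generators_;
  std::map<std::string, std::string> plugins_;
  std::string plugin_prefix_;
};
```

With "--cpp_out" registered and the prefix set to "protoc-", the sketch resolves "--foo_out" to the plugin name "protoc-gen-foo" and then consults plugins_ for an explicit path.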

Class SourceTree

An interface class that represents the directory tree of .proto files.

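Class DiskSourceTree

A subclass of SourceTree that loads files from disk. It maintains a mapping from physical disk paths to locations in the SourceTree; a disk directory can also be mapped to "", the root of the SourceTree. If several mappings could supply the same file, they are searched in the order in which they were added.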
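
The ordered search can be illustrated with a small model. ToySourceTree, Resolve, and the `existing` set (which stands in for the real filesystem) are hypothetical; the real class exposes MapPath(virtual_path, disk_path), and this sketch only imitates the "first mapping that has the file wins" behavior described above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model of ordered virtual-path -> disk-path resolution.
class ToySourceTree {
 public:
  // Maps a virtual directory ("" means the root) onto a disk directory.
  void MapPath(const std::string& virtual_dir, const std::string& disk_dir) {
    mappings_.emplace_back(virtual_dir, disk_dir);
  }

  // Returns the first candidate disk path, in mapping order, that is
  // present in `existing`; returns "" if no mapping supplies the file.
  std::string Resolve(const std::string& virtual_file,
                      const std::set<std::string>& existing) const {
    for (const auto& m : mappings_) {
      std::string rest;
      if (!StripDir(virtual_file, m.first, &rest)) continue;
      const std::string candidate = m.second + "/" + rest;
      if (existing.count(candidate) > 0) return candidate;
    }
    return "";
  }

 private:
  // If `file` lives under `dir`, stores the remainder in `rest`.
  static bool StripDir(const std::string& file, const std::string& dir,
                       std::string* rest) {
    if (dir.empty()) {
      *rest = file;
      return true;
    }
    if (file.size() > dir.size() + 1 &&
        file.compare(0, dir.size(), dir) == 0 && file[dir.size()] == '/') {
      *rest = file.substr(dir.size() + 1);
      return true;
    }
    return false;
  }

  std::vector<std::pair<std::string, std::string> > mappings_;
};
```

If both "/src/a" and "/src/b" are mapped to "" in that order and each contains foo.proto, the sketch resolves "foo.proto" to "/src/a/foo.proto".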

Class Importer

Given the name of a .proto file, returns the corresponding FileDescriptor. The actual work is done by a DescriptorPool.
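
The "look up by name, served by a pool" idea can be sketched as below. ToyImporter, ToyFileDescriptor, and the Loader callback are hypothetical simplifications: the real Importer parses the file through a SourceTree and builds the descriptor in its DescriptorPool, whereas this model only shows that a name is loaded at most once and that later requests return the same object.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily reduced stand-in for FileDescriptor.
struct ToyFileDescriptor {
  std::string name;
};

class ToyImporter {
 public:
  // Returns true if the named file could be loaded (assumed callback).
  typedef std::function<bool(const std::string&)> Loader;

  explicit ToyImporter(Loader loader) : loader_(std::move(loader)) {}

  // Returns the descriptor for `name`, building it on first request.
  // Returns nullptr if the file cannot be loaded.
  const ToyFileDescriptor* Import(const std::string& name) {
    auto it = pool_.find(name);
    if (it != pool_.end()) return it->second.get();  // already in the pool
    if (!loader_(name)) return nullptr;
    std::unique_ptr<ToyFileDescriptor> fd(new ToyFileDescriptor{name});
    const ToyFileDescriptor* raw = fd.get();
    pool_[name] = std::move(fd);
    return raw;
  }

 private:
  Loader loader_;
  // Plays the role of the DescriptorPool: owns every built descriptor.
  std::map<std::string, std::unique_ptr<ToyFileDescriptor> > pool_;
};
```

Calling Import("foo.proto") twice on the same ToyImporter invokes the loader once and yields the same pointer both times.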

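Class io::Tokenizer

A lexer. One Tokenizer object processes one ZeroCopyInputStream, turning a stream of raw text into a stream of tokens that the parser can consume. A caller only needs to call Tokenizer::Next() and Tokenizer::current() in a loop to receive the tokens in order, as if reading from a tokenized stream.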

The Token struct is defined as follows:

      struct Token {
        TokenType type;
        string text;       // The exact text of the token as it appeared in
                           // the input.  e.g. tokens of TYPE_STRING will still
                           // be escaped and in quotes.

        // "line" and "column" specify the position of the first character of
        // the token within the input stream.  They are zero-based.
        int line;
        int column;
        int end_column;
      };
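
The caller-side Next()/current() loop can be tried out with a toy tokenizer. Everything here (ToyTokenizer, ToyToken, ToyTokenType, CollectTexts) is hypothetical and much simpler than io::Tokenizer: it reads from a std::string rather than a ZeroCopyInputStream, has no error collector, and only recognizes identifiers, integers, and single-character symbols. Lines and columns are zero-based, as in the struct above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum ToyTokenType { TOY_START, TOY_END, TOY_IDENTIFIER, TOY_INTEGER, TOY_SYMBOL };

struct ToyToken {
  ToyTokenType type;
  std::string text;
  int line;    // zero-based line of the token's first character
  int column;  // zero-based column of the token's first character
};

class ToyTokenizer {
 public:
  explicit ToyTokenizer(const std::string& input)
      : input_(input), pos_(0), line_(0), column_(0) {
    current_.type = TOY_START;  // Next() has not been called yet
    current_.line = 0;
    current_.column = 0;
  }

  const ToyToken& current() const { return current_; }

  // Advances to the next token; returns false once input is exhausted.
  bool Next() {
    while (!AtEnd() && std::isspace(Peek())) Step();  // skip whitespace
    current_.text.clear();
    current_.line = line_;
    current_.column = column_;
    if (AtEnd()) {
      current_.type = TOY_END;
      return false;
    }
    if (std::isalpha(Peek()) || Peek() == '_') {
      current_.type = TOY_IDENTIFIER;
      while (!AtEnd() && (std::isalnum(Peek()) || Peek() == '_')) Consume();
    } else if (std::isdigit(Peek())) {
      current_.type = TOY_INTEGER;
      while (!AtEnd() && std::isdigit(Peek())) Consume();
    } else {
      current_.type = TOY_SYMBOL;
      Consume();
    }
    return true;
  }

 private:
  bool AtEnd() const { return pos_ >= input_.size(); }
  unsigned char Peek() const { return static_cast<unsigned char>(input_[pos_]); }
  // Moves past one character while tracking line and column.
  void Step() {
    if (input_[pos_] == '\n') { ++line_; column_ = 0; } else { ++column_; }
    ++pos_;
  }
  // Appends the current character to the token text, then moves past it.
  void Consume() { current_.text.push_back(input_[pos_]); Step(); }

  std::string input_;
  std::size_t pos_;
  int line_;
  int column_;
  ToyToken current_;
};

// The caller-side loop: call Next() until it returns false.
std::vector<std::string> CollectTexts(const std::string& input) {
  ToyTokenizer tokenizer(input);
  std::vector<std::string> texts;
  while (tokenizer.Next()) texts.push_back(tokenizer.current().text);
  return texts;
}
```

Feeding a small message definition to CollectTexts returns its identifiers, numbers, and punctuation in source order, which is the shape of input a parser would then consume.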

The token types are defined as follows:

      enum TokenType {
        TYPE_START,       // Next() has not yet been called.
        TYPE_END,         // End of input reached.  "text" is empty.

        TYPE_IDENTIFIER,  // A sequence of letters, digits, and underscores, not