Article outline
Introduction to Parquet
Documentation:
https://parquet.apache.org/documentation/latest/
A well-written introduction in Chinese:
https://blog.youkuaiyun.com/yu616568/article/details/50993491
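The key points to remember:
- Columnar storage
- Self-describing: the schema is stored with the data
- Supports predicate filtering: when a query runs, filters can be applied early so that non-matching data is skipped before it is read.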
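The three points above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not the Parquet library API: `SCHEMA`, `RowGroup`, `make_row_group`, and `scan` are hypothetical names made up for this example. It assumes the data is split into row groups, each holding its values column by column together with per-column min/max statistics, which is how a reader can skip a whole row group without looking at its values.

```python
from dataclasses import dataclass

# Toy schema: column name -> Python type.
# (A real Parquet file carries its schema in the file metadata.)
SCHEMA = {"id": int, "city": str}


@dataclass
class RowGroup:
    columns: dict  # column name -> list of values (column-oriented layout)
    stats: dict    # column name -> (min, max) of that column in this group


def make_row_group(rows):
    """Pivot a list of row dicts into columns and record min/max per column."""
    columns = {name: [row[name] for row in rows] for name in SCHEMA}
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return RowGroup(columns, stats)


def scan(row_groups, column, lower_bound, project):
    """Return values of `project` for rows where `column` >= lower_bound.

    Row groups whose max for `column` is below the bound are skipped
    using the statistics alone, without touching their values.
    """
    out = []
    skipped = 0
    for rg in row_groups:
        _, col_max = rg.stats[column]
        if col_max < lower_bound:
            skipped += 1
            continue
        # Only the two columns involved are read; other columns are ignored.
        for value, projected in zip(rg.columns[column], rg.columns[project]):
            if value >= lower_bound:
                out.append(projected)
    return out, skipped


rows = [{"id": i, "city": f"city_{i % 3}"} for i in range(9)]
row_groups = [make_row_group(rows[i:i + 3]) for i in range(0, 9, 3)]

cities, skipped = scan(row_groups, "id", 6, "city")
print(cities, skipped)
```

Here the filter `id >= 6` lets the scan discard the first two row groups from their statistics alone, and only the `id` and `city` columns of the last group are read.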
The basic types are as follows:
The types supported by the file format are intended to be as minimal as possible, with a focus on how the types effect on disk storage. For example, 16-bit ints are not explicitly supported in the storage format since they are covered by 32-bit ints with an efficient encoding. This reduces the complexity of implementing readers and wr