
ORC
zhixingheyi_tian
Intel Big Data. Spark
Articles in this column
ORC C++: RowReaderOptions
include/orc/Reader.hh:
  /** Options for creating a RowReader. */
  class RowReaderOptions {
   private:
    ORC_UNIQUE_PTR<RowReaderOptionsPrivate> privateBits;
src/Options.hh:
  /** RowReaderOptions Implementation */
  struct RowReaderOpti...
Original · 2021-12-21 09:20:44 · 232 views · 0 comments
ORC C++: Reader
orc/c++/include/orc/Reader.hh
RowReader:
  /**
   * The interface for reading rows in ORC files.
   * This is an abstract class that will be subclassed as necessary.
   */
  class RowReader {
   public:
    virtual ~RowReader();
    /** Get the selec...
Original · 2021-11-23 17:19:23 · 1072 views · 0 comments
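ORC: From Stream Reads to Column Reads
ORC Stream
When ORC reads a stripe, it reads in units of streams. A column in a stripe may consist of a single stream, or of several streams that each carry a different property of the column. A stream is not a sub-unit of a column.
  enum Kind {
    // boolean stream of whether the next value is non-null
    PRESENT = 0;
    ...
Original · 2020-02-14 16:26:23 · 639 views · 0 comments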
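The relationship between streams and columns described above can be sketched as follows. This is a minimal illustration, not ORC's own code: `StreamInfo`, `groupByColumn`, and `sampleStripe` are hypothetical names, the stream lengths are made up, and `Kind` lists only three of the stream kinds that ORC defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StreamGroupingSketch {
    // A few stream kinds, named after ORC's stream kinds; the real enum has more values.
    enum Kind { PRESENT, DATA, LENGTH }

    // Hypothetical stand-in for one stream entry listed in a stripe footer.
    static final class StreamInfo {
        final int column;
        final Kind kind;
        final long length;

        StreamInfo(int column, Kind kind, long length) {
            this.column = column;
            this.kind = kind;
            this.length = length;
        }
    }

    // Groups the flat list of streams by column id, keeping stream order within each column.
    static Map<Integer, List<StreamInfo>> groupByColumn(List<StreamInfo> streams) {
        Map<Integer, List<StreamInfo>> byColumn = new TreeMap<>();
        for (StreamInfo s : streams) {
            byColumn.computeIfAbsent(s.column, c -> new ArrayList<>()).add(s);
        }
        return byColumn;
    }

    // Made-up stripe: column 1 has two streams, column 2 (string-like) has three.
    static List<StreamInfo> sampleStripe() {
        List<StreamInfo> streams = new ArrayList<>();
        streams.add(new StreamInfo(1, Kind.PRESENT, 20));
        streams.add(new StreamInfo(1, Kind.DATA, 400));
        streams.add(new StreamInfo(2, Kind.PRESENT, 20));
        streams.add(new StreamInfo(2, Kind.DATA, 900));
        streams.add(new StreamInfo(2, Kind.LENGTH, 60));
        return streams;
    }
}
```

The reader sees the stripe as a flat sequence of streams; a column reader is only assembled afterwards, by collecting every stream that carries the same column id.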
ORC Vectorized Read: Source Code Analysis
RecordReaderImpl.java:
  @Override
  public boolean nextBatch(VectorizedRowBatch batch) throws IOException {
    try {
      if (rowInStripe >= rowCountInStripe) {
        currentStripe += 1;
        ...
Original · 2019-02-25 13:22:00 · 1023 views · 0 comments
ORC Data Reading: Reading Metadata
ReaderImpl.java:
  protected OrcTail tail;
OrcTail.java:
  public OrcProto.Footer getFooter() {
    return fileTail.getFooter();
  }
Original · 2019-11-05 12:09:04 · 634 views · 0 comments
ORC Read Path Source Analysis: createStreams
Both readAllDataStreams and readPartialDataStreams end by going through createStreams.
createStreams
The final output of createStreams is Map<StreamName, InStream> streams: the in-memory stream for each StreamName.
  void createStreams(List<O...
Original · 2019-11-02 11:47:51 · 292 views · 0 comments
ORC Read Path Source Analysis: readPartialDataStreams
readPartialDataStreams
RecordReaderImpl.java:
  private void readPartialDataStreams(StripeInformation stripe) throws IOException {
    List<OrcProto.Stream> streamList = stripeFooter.getStreamsLi...
Original · 2019-10-31 19:30:20 · 340 views · 0 comments
ORC Read Path Source Analysis: readAllDataStreams
readAllDataStreams
RecordReaderImpl.java:
  private void readAllDataStreams(StripeInformation stripe) throws IOException {
    long start = stripe.getIndexLength();
    long end = start + stripe.getDat...
Original · 2019-10-16 17:29:34 · 424 views · 0 comments
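The offset arithmetic in the excerpt above can be illustrated with a small sketch. It assumes the usual ORC stripe layout (index section, then data section, then stripe footer) and assumes that the truncated call returns the length of the data section. `StripeInfo` and both methods are hypothetical stand-ins, not the ORC API.

```java
final class StripeRangeSketch {
    // Hypothetical stand-in for the stripe metadata a reader holds (all values in bytes).
    static final class StripeInfo {
        final long offset;       // where the stripe starts in the file
        final long indexLength;  // size of the index section
        final long dataLength;   // size of the data section

        StripeInfo(long offset, long indexLength, long dataLength) {
            this.offset = offset;
            this.indexLength = indexLength;
            this.dataLength = dataLength;
        }
    }

    // Returns {start, end} of the data section, relative to the start of the stripe.
    static long[] relativeDataRange(StripeInfo s) {
        long start = s.indexLength;        // the data section begins right after the index
        long end = start + s.dataLength;   // assumption: data length closes the range
        return new long[] {start, end};
    }

    // Returns the same range as absolute file offsets.
    static long[] absoluteDataRange(StripeInfo s) {
        long[] r = relativeDataRange(s);
        return new long[] {s.offset + r[0], s.offset + r[1]};
    }
}
```

Under these assumptions, reading all data streams amounts to one contiguous read that skips the index section and stops before the stripe footer.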
ORC File Format Analysis
ORC provides three levels of indexes within each file:
file level - statistics about the values in each column across the entire file
stripe level - statistics about the values in each column for each ...
Original · 2019-10-12 13:39:43 · 1080 views · 0 comments
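How such statistics let a reader skip data can be sketched with min/max values alone. This is a simplified illustration covering only the file and stripe levels: `MinMax` and `stripesToRead` are hypothetical names, and real ORC statistics hold more than a minimum and a maximum.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexPruningSketch {
    // Hypothetical min/max statistics for one integer column.
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }

        // True if a row with this value could exist within the range.
        boolean mayContain(long value) {
            return value >= min && value <= max;
        }
    }

    // Returns the indexes of stripes that may hold rows where column == value.
    static List<Integer> stripesToRead(MinMax fileStats, List<MinMax> stripeStats, long value) {
        List<Integer> result = new ArrayList<>();
        if (!fileStats.mayContain(value)) {
            return result; // file-level statistics rule out the whole file
        }
        for (int i = 0; i < stripeStats.size(); i++) {
            if (stripeStats.get(i).mayContain(value)) {
                result.add(i); // stripe-level statistics cannot rule this stripe out
            }
        }
        return result;
    }
}
```

Statistics can only prove that a value is absent. A stripe that passes the check still has to be read and filtered.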
Spark Engine Level: OrcBatch Code Analysis
OrcColumnarBatchReader.java:
  /**
   * Return true if there exists more data in the next batch. If exists, prepare the next batch
   * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBa...
Original · 2019-02-25 18:31:30 · 459 views · 0 comments