
Spark Source Code
zhixingheyi_tian
Intel Big Data. Spark
Articles in this column
Various Ways to Build Spark from Source
spark build · Original · 2022-12-02 09:28:02 · 734 views · 0 comments
Spark: Plan (11 month)
Spark Plan · Original · 2022-11-14 20:07:41 · 777 views · 0 comments
Spark Decode parquet
Spark Parquet Decode · Original · 2022-08-17 20:21:02 · 1076 views · 0 comments
parquet meta data and size
Parquet Meta · Original · 2022-08-16 10:12:37 · 588 views · 0 comments
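Spark UT Troubleshooting Notes
checkAnswer series: the test "pivot with null and aggregate type not supported by PivotFirst returns correct result" *** FAILED ***. A null value turned into 0 after several groups of c2r and r2c conversions were introduced; the root cause was in the reads and writes of ArrowWritableColumnVector. · Original · 2022-05-10 18:21:00 · 340 views · 0 comments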
Spark: OnHeapColumnVector
allocateColumns: /** Allocates columns to store elements of each field of the schema on heap. Capacity is the initial capacity of the vector and it will grow as necessary. Capacity is in number of elements, not number of bytes. */ publi · Original · 2022-03-26 16:26:46 · 2301 views · 0 comments
Spark 3.0 Data Source v2
Using Parquet as the example. Basic interface implementation chain: DataSourceV2 => Table => ScanBuilder => Scan => PartitionReaderFactory = (VectorizedParquetRecordReader) ParquetPartitionReaderFactory. ParquetPartitionReaderFactory wraps VectorizedParquetRecordReader... · Original · 2020-09-11 10:47:02 · 872 views · 0 comments
Spark: ListenerBus
ListenerBus is a trait that accepts events and delivers each event to the listeners registered for it. private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging { · Original · 2020-02-27 11:58:12 · 218 views · 0 comments
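The idea in the excerpt above (a bus that accepts events and hands them to registered listeners) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `ListenerBusSketch`, its `dispatcher` parameter, and the choice to log and skip a failing listener are all hypothetical and are not taken from Spark's `ListenerBus` trait.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical stand-in for a listener bus; not Spark's ListenerBus trait. */
final class ListenerBusSketch<L, E> {
  // Copy-on-write list: listeners can be added or removed while an event is being posted.
  private final List<L> listeners = new CopyOnWriteArrayList<>();
  // How one event is handed to one listener (an assumed shape, chosen for this sketch).
  private final BiConsumer<L, E> dispatcher;

  ListenerBusSketch(BiConsumer<L, E> dispatcher) {
    this.dispatcher = dispatcher;
  }

  void addListener(L listener) {
    listeners.add(listener);
  }

  void removeListener(L listener) {
    listeners.remove(listener);
  }

  /** Delivers the event to every registered listener; one failing listener does not stop the rest. */
  void postToAll(E event) {
    for (L listener : listeners) {
      try {
        dispatcher.accept(listener, event);
      } catch (RuntimeException e) {
        System.err.println("Listener " + listener + " threw: " + e);
      }
    }
  }

  public static void main(String[] args) {
    ListenerBusSketch<Consumer<String>, String> bus =
        new ListenerBusSketch<>((listener, event) -> listener.accept(event));
    bus.addListener(event -> System.out.println("received " + event));
    bus.postToAll("jobStart");
  }
}
```

The two type parameters mirror the `L` (listener) and `E` (event) parameters in the quoted trait signature; everything else is a design choice made for this sketch.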
Spark: org.apache.spark.network.util.JavaUtils
Spark's method for recursively deleting a directory tries two approaches: if the first, deleteRecursivelyUsingUnixNative, does not succeed, it immediately tries the second. // org.apache.spark.network.util.JavaUtils.java /** Delete a file or directory and its contents recursively. Don't fo... · Original · 2019-10-22 10:06:44 · 1007 views · 0 comments
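The two-step strategy described above (a native Unix delete first, then a fallback) can be sketched with the Java standard library alone. This is a minimal sketch, not Spark's code: the class `RecursiveDeleteSketch` and its method names are hypothetical, and the 30-second timeout, the use of `rm -rf`, and the decision not to descend into symbolic links are assumptions made for the illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of "native delete first, pure-Java fallback"; not Spark's JavaUtils. */
final class RecursiveDeleteSketch {

  /** Tries `rm -rf` first and falls back to a pure-Java walk if that fails. */
  static void deleteRecursively(File file) throws IOException {
    if (file == null) {
      return;
    }
    try {
      deleteWithUnixNative(file);
      return;
    } catch (IOException e) {
      // The native attempt failed or is unavailable: fall through to the Java fallback.
    }
    deleteWithJavaIO(file);
  }

  /** First approach: delegate to the platform's `rm -rf` (assumed to exist on Unix-like systems). */
  static void deleteWithUnixNative(File file) throws IOException {
    if (File.separatorChar != '/') {
      throw new IOException("Not a Unix-like platform");
    }
    ProcessBuilder builder = new ProcessBuilder("rm", "-rf", file.getAbsolutePath());
    builder.redirectErrorStream(true);
    builder.redirectOutput(new File("/dev/null"));
    try {
      Process process = builder.start();
      // Assumed timeout for the sketch; pick a value that fits the workload.
      if (!process.waitFor(30, TimeUnit.SECONDS)) {
        process.destroyForcibly();
        throw new IOException("rm -rf timed out for " + file);
      }
      if (process.exitValue() != 0 || file.exists()) {
        throw new IOException("rm -rf failed for " + file);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while deleting " + file, e);
    }
  }

  /** Second approach: depth-first delete; symbolic links are removed but not descended into. */
  static void deleteWithJavaIO(File file) throws IOException {
    if (file.isDirectory() && !Files.isSymbolicLink(file.toPath())) {
      File[] children = file.listFiles();
      if (children == null) {
        throw new IOException("Failed to list " + file);
      }
      for (File child : children) {
        deleteWithJavaIO(child);
      }
    }
    Files.deleteIfExists(file.toPath());
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("delete-sketch").toFile();
    Files.createDirectories(new File(dir, "a/b").toPath());
    Files.write(new File(dir, "a/b/data.txt").toPath(), "x".getBytes());
    deleteRecursively(dir);
    System.out.println("still exists: " + dir.exists());
  }
}
```

Trying the native command first is typically faster on large trees, while the Java walk keeps the operation working where the command is missing or fails.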
Spark: InternalRow
InternalRow: Abstract Binary Row Format. InternalRow is also called Catalyst row or Spark SQL row. abstract class InternalRow extends SpecializedGetters with Serializable {} UnsafeRow: UnsafeRow is a... · Original · 2019-04-01 14:32:18 · 955 views · 0 comments
Code Analysis of VectorizedParquet at the Spark Engine Level
VectorizedParquetRecordReader.java { /** The number of rows that have been returned. */ private long rowsReturned; /** The number of rows that have been reading, including the current... · Original · 2019-02-25 11:39:27 · 647 views · 1 comment
Physical Query Operator
BinaryExecNode: Binary physical operator with two child left and right physical operators. LeafExecNode: Leaf physical operator with no children. By default, the set of all attributes that are produce... · Original · 2019-01-12 15:31:56 · 288 views · 0 comments