Spark source code analysis: Catalyst (draft)

object Optimizer extends RuleExecutor[LogicalPlan] {
  val batches =
    Batch("ConstantFolding", Once,
      ConstantFolding,
      BooleanSimplification,
      SimplifyFilters,
      SimplifyCasts) ::
    Batch("Filter Pushdown", Once,
      CombineFilters,
      PushPredicateThroughProject,
      PushPredicateThroughInnerJoin,
      ColumnPruning) :: Nil
}

SimplifyFilters

object SimplifyFilters extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, BooleanType), child) =>
      child
    case Filter(Literal(null, _), child) =>
      LocalRelation(child.output)
    case Filter(Literal(false, BooleanType), child) =>
      LocalRelation(child.output)
  }
}
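This rule prunes filters whose condition is already a constant: a filter on a literal true is replaced by its child, and a filter on a literal false or null is replaced by an empty LocalRelation that keeps only child.output. So where do patterns such as Literal(true, BooleanType) come from? Looking at the batches in Optimizer, they are produced by the rule listed just before SimplifyFilters in the same "ConstantFolding" batch: BooleanSimplification. (It is a rule inside that batch, not a batch of its own.)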
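To make the interplay between the two rules concrete, here is a minimal sketch on a toy plan tree. None of the types below are Spark classes: Lit, Col, And, Scan, Empty, ToyFilter and ToyOptimizer are hypothetical stand-ins, and the logic only imitates the shape of BooleanSimplification and SimplifyFilters.

```scala
// Toy stand-ins for Catalyst expressions and plans; these are NOT Spark classes.
sealed trait Expr
case class Lit(value: Any) extends Expr
case class Col(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr

sealed trait Plan
case class Scan(table: String) extends Plan
case object Empty extends Plan // plays the role of an empty LocalRelation
case class ToyFilter(condition: Expr, child: Plan) extends Plan

object ToyOptimizer {
  // Imitates BooleanSimplification: collapse And nodes that contain a boolean literal.
  def simplifyBoolean(e: Expr): Expr = e match {
    case And(l, r) =>
      (simplifyBoolean(l), simplifyBoolean(r)) match {
        case (Lit(true), other) => other
        case (other, Lit(true)) => other
        case (Lit(false), _) | (_, Lit(false)) => Lit(false)
        case (a, b) => And(a, b)
      }
    case other => other
  }

  // Imitates SimplifyFilters: it only fires once the condition is a literal.
  def simplifyFilters(p: Plan): Plan = p match {
    case ToyFilter(Lit(true), child) => child
    case ToyFilter(Lit(false), _) | ToyFilter(Lit(null), _) => Empty
    case other => other
  }

  // Run the boolean rule first, then the filter rule, bottom-up.
  def optimize(p: Plan): Plan = p match {
    case ToyFilter(cond, child) =>
      simplifyFilters(ToyFilter(simplifyBoolean(cond), optimize(child)))
    case other => other
  }
}
```

For example, ToyOptimizer.optimize(ToyFilter(And(Lit(true), Lit(true)), Scan("t"))) first reduces the condition to Lit(true) and then removes the filter, leaving Scan("t"). Without the boolean step, the filter rule would never see a bare literal.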


SQLContext.createSchemaRDD(RDD<A>, TypeTag<A>) line: 90
BaiJoin$.main(String[]) line: 26
BaiJoin.main(String[]) line: not available

Look at this frame: SQLContext.createSchemaRDD(RDD<A>, TypeTag<A>)
The breakpoint was stopped on the new SchemaRDD line:
implicit def createSchemaRDD[A <: Product: TypeTag](rdd: RDD[A]) =
  new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd)))
The debugger's Variables view showed this variable at the time: evidence$1 TypeTags$TypeTagImpl<T> (id=107)
Its value was TypeTag[com.ailk.test.sql.tb], so A can be taken to be com.ailk.test.sql.tb (a case class).
rdd was:
MappedRDD[2] at map at BaiJoin.scala:16
MappedRDD[1] at textFile at BaiJoin.scala:16
HadoopRDD[0] at textFile at BaiJoin.scala:16
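The evidence$1 variable seen in the debugger is what the context bound A <: Product : TypeTag compiles into: an extra implicit parameter carrying the TypeTag. The sketch below shows both spellings side by side, plus one way to read the field names of a case class from its TypeTag. It assumes Scala 2.11 or later with scala-reflect on the classpath. Tb is a hypothetical stand-in for com.ailk.test.sql.tb, and fieldNames only illustrates the idea; it is not Spark's ScalaReflection.attributesFor.

```scala
import scala.reflect.runtime.universe._

object ContextBoundDemo {
  // Hypothetical case class standing in for com.ailk.test.sql.tb.
  case class Tb(id: Int, name: String)

  // Context-bound form, the same shape as createSchemaRDD's signature.
  def describe[A <: Product : TypeTag](xs: Seq[A]): String =
    typeOf[A].typeSymbol.name.toString

  // Roughly what the compiler generates: the bound becomes an implicit parameter.
  def describeDesugared[A <: Product](xs: Seq[A])(implicit evidence: TypeTag[A]): String =
    evidence.tpe.typeSymbol.name.toString

  // Field names of a case class, read from its primary constructor's parameters.
  def fieldNames[A <: Product : TypeTag]: List[String] =
    typeOf[A].decls.collectFirst {
      case m: MethodSymbol if m.isPrimaryConstructor => m
    }.get.paramLists.head.map(_.name.toString)
}
```

Calling ContextBoundDemo.fieldNames[ContextBoundDemo.Tb] lists the constructor parameters in declaration order, which is the information needed to build one column per field.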

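def fromProductRdd[A <: Product : TypeTag](productRdd: RDD[A]) = {
  ExistingRdd(ScalaReflection.attributesFor[A], productToRowRdd(productRdd))
}
ScalaReflection.attributesFor[A] takes every field of A and returns them as a list, i.e. all the columns defined by com.ailk.test.sql.tb.
So the result of ScalaReflection.attributesFor[A] is a Seq[Attribute]. It becomes the output of ExistingRdd, whose execute simply returns the wrapped RDD[Row]:
case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode {
  override def execute() = rdd
}
productToRowRdd takes an RDD[A] as input and produces an RDD[Row]: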
def productToRowRdd[A <: Product](data: RDD[A]): RDD[Row] = {
  data.mapPartitions { iterator =>
    if (iterator.isEmpty) {
      Iterator.empty
    } else {
      val bufferedIterator = iterator.buffered
      val mutableRow = new GenericMutableRow(bufferedIterator.head.productArity)

      bufferedIterator.map { r =>
        var i = 0
        while (i < mutableRow.length) {
          mutableRow(i) = r.productElement(i)
          i += 1
        }

        mutableRow
      }
    }
  }
}
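One detail worth noticing in productToRowRdd is that a single mutableRow is allocated per partition and refilled for every element, so the iterator keeps handing out the same object. The sketch below reproduces that pattern with plain Scala iterators and an Array[Any] in place of GenericMutableRow. RowReuseDemo and toReusedRows are hypothetical names, and nothing here touches Spark. It shows why such rows are safe to consume one at a time but need a copy before being held.

```scala
object RowReuseDemo {
  // One shared buffer is refilled for every element, like mutableRow in productToRowRdd.
  def toReusedRows(data: Iterator[Product], arity: Int): Iterator[Array[Any]] = {
    val buffer = new Array[Any](arity)
    data.map { r =>
      var i = 0
      while (i < arity) {
        buffer(i) = r.productElement(i)
        i += 1
      }
      buffer
    }
  }

  val input = List(("a", 1), ("b", 2))

  // Consuming each row as soon as it is produced sees the right values.
  val streamed: List[String] =
    toReusedRows(input.iterator, 2).map(_.mkString(",")).toList

  // Holding references without copying sees the last row repeated.
  val held: List[String] =
    toReusedRows(input.iterator, 2).toList.map(_.mkString(","))

  // Copying each row before holding it restores the expected result.
  val copied: List[String] =
    toReusedRows(input.iterator, 2).map(_.clone()).toList.map(_.mkString(","))
}
```

The trade-off is the usual one: reusing the buffer avoids one allocation per record, at the price that any consumer which buffers rows has to copy them first.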

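heap, JIT compiler, GC
dfs3
Memory allocation must be an atomic operation. Per-thread mode: TLAB, one buffer for each thread. Allocation schemes: free list, bump-the-pointer.
Copying algorithm
What gets copied into S0 and S1 is the objects that survive in Eden
Mark-sweep algorithm: leaves memory fragmentation
Mark-compact algorithm: involves heavy memory copying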
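The allocation notes above can be sketched as follows. This is a toy model, not how any particular JVM implements it. Addresses are plain offsets into an imaginary heap, SharedHeap and Tlab are hypothetical names, and the atomicity requirement is met with a compare-and-set retry loop on java.util.concurrent.atomic.AtomicLong.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy shared heap: bump-the-pointer allocation made atomic with CAS.
final class SharedHeap(capacity: Long) {
  private val top = new AtomicLong(0L)

  // Returns the start offset of the block, or -1 when the heap is exhausted.
  @annotation.tailrec
  final def allocate(size: Long): Long = {
    val current = top.get()
    val next = current + size
    if (next > capacity) -1L
    else if (top.compareAndSet(current, next)) current
    else allocate(size) // another thread moved the pointer first; retry
  }
}

// Toy thread-local buffer: one atomic allocation grabs a whole chunk, then the
// owning thread bumps a plain, non-atomic pointer inside that chunk.
final class Tlab(heap: SharedHeap, chunkSize: Long) {
  private var start = -1L
  private var used = chunkSize // forces a refill on first use

  def allocate(size: Long): Long = {
    require(size <= chunkSize, "object larger than a TLAB chunk")
    if (used + size > chunkSize) {
      val fresh = heap.allocate(chunkSize)
      if (fresh < 0) return -1L // shared heap exhausted; keep the old chunk state
      start = fresh
      used = 0L
    }
    val address = start + used
    used += size
    address
  }
}
```

The point of the per-thread buffer is that the contended atomic operation happens once per chunk rather than once per object. The leftover space at the end of an abandoned chunk is simply wasted in this sketch.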

GC root selection: class, thread, stack local, JNI local, monitor, "held by JVM"
dfs3 marking
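As a minimal illustration of marking from roots, the sketch below does a depth-first traversal over an object graph given as adjacency lists. It is a plain marked/unmarked traversal and does not claim to be the exact "dfs3" scheme the note refers to. MarkDemo, mark, and the integer object ids are hypothetical.

```scala
import scala.collection.mutable

object MarkDemo {
  // refs maps an object id to the ids it references; roots are the GC roots.
  // Returns the set of reachable (live) object ids.
  def mark(roots: Seq[Int], refs: Map[Int, Seq[Int]]): Set[Int] = {
    val marked = mutable.Set.empty[Int]
    var stack: List[Int] = roots.toList // explicit stack gives depth-first order
    while (stack.nonEmpty) {
      val obj = stack.head
      stack = stack.tail
      // add returns false for already-marked objects, so cycles terminate.
      if (marked.add(obj)) {
        stack = refs.getOrElse(obj, Nil).toList ::: stack
      }
    }
    marked.toSet
  }
}
```

Anything not in the returned set is unreachable from the roots. A mark-sweep collector would reclaim it in place, while a mark-compact collector would slide the marked objects together.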