aggregate vs treeAggregate

aggregate

aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U)

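aggregate applies seqOp inside each partition, folding over all elements of that partition starting from zeroValue. It then applies combOp to the per-partition results, again starting from zeroValue.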

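Note: zeroValue is applied exactly once per partition by seqOp, and once more by the final combOp. If zeroValue is not a neutral element for these operations, the result therefore depends on the number of partitions.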

Example:

scala> def seq(a:Int,b:Int):Int={
     | println("seq:"+a+":"+b)
     | math.min(a,b)}
seq: (a: Int, b: Int)Int

scala> def comb(a:Int,b:Int):Int={
     | println("comb:"+a+":"+b)
     | a+b}
comb: (a: Int, b: Int)Int

scala> val z = sc.parallelize(List(1,2,4,5,8,9),3)
scala> z.aggregate(3)(seq,comb)
seq:3:4
seq:3:1
seq:1:2
seq:3:8
seq:3:5
seq:3:9
comb:3:1
comb:4:3
comb:7:3
res10: Int = 10
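
The trace above can be checked without a cluster. The following is a minimal sketch in plain Scala, not Spark's implementation: `AggregateModel` and its `aggregate` method are hypothetical names, and each inner `Seq` stands for one partition. It assumes the split `[1,2]`, `[4,5]`, `[8,9]` that the seq lines in the trace suggest.

```scala
object AggregateModel {
  // Hypothetical local model of RDD.aggregate: each inner Seq is one partition.
  def aggregate[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    // seqOp folds each partition, starting from zeroValue once per partition.
    val partials = partitions.map(_.foldLeft(zeroValue)(seqOp))
    // combOp folds the partial results, starting from zeroValue once more.
    partials.foldLeft(zeroValue)(combOp)
  }

  def main(args: Array[String]): Unit = {
    val three = Seq(Seq(1, 2), Seq(4, 5), Seq(8, 9))
    val two = Seq(Seq(1, 2, 4), Seq(5, 8, 9))
    // Partials are 1, 3, 3, so combOp computes 3 + 1 + 3 + 3.
    println(aggregate(three, 3)((a, b) => math.min(a, b), _ + _))
    // Same data in two partitions: partials are 1, 3, so the result is 3 + 1 + 3.
    println(aggregate(two, 3)((a, b) => math.min(a, b), _ + _))
  }
}
```

With three partitions the model gives 10, matching res10; with the same data in two partitions it gives 7, which shows how a non-neutral zeroValue makes the result depend on the partitioning.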

treeAggregate

treeAggregate[U: ClassTag](zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int = 2)

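Unlike aggregate, treeAggregate does not send every per-partition result straight to the driver. The partial results are first merged with combOp on the executors, in two or more levels (controlled by depth), and only the remaining partial values reach the driver. In the test below, the initial value zeroValue also did not take part in combOp, whereas aggregate applies it once more in the final merge. This is an observation from this run and may differ between Spark versions.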

Example:

scala> z.treeAggregate(3)(seq,comb)
seq:3:4
seq:3:5
seq:3:1
seq:1:2
seq:3:8
seq:3:9
comb:3:3
comb:6:1
res12: Int = 7
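
The result 7 is what one would expect if the final merge is a plain reduce over the partial results. The sketch below models only that observed behavior in plain Scala; `TreeAggregateModel` is a hypothetical name, the intermediate tree levels are left out, and it is not a claim about how every Spark version implements treeAggregate.

```scala
object TreeAggregateModel {
  // Hypothetical local model: each inner Seq is one partition.
  def treeAggregate[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    require(partitions.nonEmpty, "model assumes at least one partition")
    // Same per-partition step as aggregate: zeroValue seeds every seqOp fold.
    val partials = partitions.map(_.foldLeft(zeroValue)(seqOp))
    // Final merge modelled as a reduce, so zeroValue is not applied again.
    partials.reduce(combOp)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2), Seq(4, 5), Seq(8, 9))
    // Partials are 1, 3, 3, so combOp computes 1 + 3 + 3.
    println(treeAggregate(parts, 3)((a, b) => math.min(a, b), _ + _))
  }
}
```

The only difference from a model of aggregate is the last line: `reduce` instead of a fold seeded with zeroValue, which accounts for the gap between 10 and 7.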

[Figure: aggregate vs treeAggregate comparison]
Notes:
Aggregate

  1. each executor holds a portion of the learning set
  2. broadcast the model to the executors
  3. collect the results to the driver

TreeAggregate

  1. a simple heuristic decides whether to add a level
  2. perform partial aggregation by shipping results to other executors (by repartitioning)
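
The two TreeAggregate notes can be sketched in plain Scala. The heuristic below follows my recollection of Spark's `RDD.treeAggregate` source and should be read as an assumption, not as something stated in this post; `TreeLevels`, `levels` and `mergeLevel` are hypothetical names.

```scala
import scala.collection.mutable.ListBuffer

object TreeLevels {
  // Partition count at each aggregation level, starting with the input count.
  // Assumed heuristic: add a level only while the current count exceeds
  // scale + ceil(current / scale).
  def levels(numPartitions: Int, depth: Int = 2): List[Int] = {
    require(numPartitions >= 1 && depth >= 1)
    val scale =
      math.max(math.ceil(math.pow(numPartitions.toDouble, 1.0 / depth)).toInt, 2)
    var current = numPartitions
    val out = ListBuffer(current)
    while (current > scale + math.ceil(current.toDouble / scale)) {
      current /= scale
      out += current
    }
    out.toList
  }

  // Hypothetical repartitioning step: partial result i goes to bucket i % target,
  // and each bucket is merged with combOp (on an executor in the real system).
  def mergeLevel[U](partials: Seq[U], target: Int)(combOp: (U, U) => U): Seq[U] =
    partials.zipWithIndex
      .groupBy { case (_, i) => i % target }
      .toSeq
      .sortBy(_._1)
      .map { case (_, group) => group.map(_._1).reduce(combOp) }
}
```

Under this sketch, `TreeLevels.levels(3)` is `List(3)`, so the three-partition example in this post would get no extra level, while `TreeLevels.levels(100)` is `List(100, 10)`: the 100 partial results are merged into 10 on the executors before the final step.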