Function signature:
def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
zeroValue
the initial value for the accumulated result of each partition for the seqOp operator, and also the initial value for the combine results from different partitions for the combOp operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
seqOp
an operator used to accumulate results within a partition
combOp
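an associative operator used to combine results from different partitions

The aggregate function first aggregates the elements within each partition using seqOp, and then uses the combine function (combOp) to merge the per-partition results together with the initial value (zeroValue). The type it finally returns does not need to match the element type of the RDD.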
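To make the two phases concrete, here is a minimal plain-Scala model of the semantics described above. It is a sketch under stated assumptions, not Spark's implementation: partitions are modeled as an in-memory Seq[Seq[T]], U is assumed to be immutable, and the helper name simulateAggregate is hypothetical. Each partition is folded with seqOp starting from zeroValue, and the partition results are then folded with combOp, again starting from zeroValue.

```scala
object AggregateModel {
  // Hypothetical helper (not part of Spark): models RDD.aggregate on
  // in-memory "partitions" so the two phases can be inspected locally.
  // Assumes U is immutable, so sharing zeroValue between folds is safe.
  def simulateAggregate[T, U](partitions: Seq[Seq[T]])(zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Phase 1: fold each partition independently, each starting from zeroValue.
    val partitionResults: Seq[U] = partitions.map(_.foldLeft(zeroValue)(seqOp))
    // Phase 2: merge the partition results, again starting from zeroValue.
    partitionResults.foldLeft(zeroValue)(combOp)
  }
}
```

The model merges partition results in partition order; on a real cluster the results may arrive in a different order, which is one reason combOp should be commutative as well as associative.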
Example:
scala> def seqOP(a:Int, b:Int) : Int = {
| val r = a*b
| println("seqOp: " + a + "\t" + b+"=>"+r)
| r
| }
seqOP: (a: Int, b: Int)Int
scala> def combOp(a:Int, b:Int): Int = {
| val r= a+b
| println("combOp: " + a + "\t" + b+"=>"+r)
| r
| }
combOp: (a: Int, b: Int)Int
scala> val z = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)
z: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[9] at parallelize at <console>:27
scala> z.aggregate(3)(seqOP, combOp)
combOp: 3 18=>21
combOp: 21 360=>381
res20: Int = 381

Computation steps:
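1. Partition List(1, 2, 3, 4, 5, 6) into (1, 2, 3) and (4, 5, 6).
2. Apply seqOp to (1, 2, 3):
3 (initial value) * 1 => 3
3 (previous result) * 2 => 6
6 * 3 => 18
Apply seqOp to (4, 5, 6):
3 (initial value) * 4 => 12
12 (previous result) * 5 => 60
60 * 6 => 360
3. Combine the partition results:
3 (initial value) + 18 (partition result) => 21
21 (previous result) + 360 (partition result) => 381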
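The arithmetic above can be checked with a short, self-contained Scala sketch. It does not call Spark: it hard-codes the same split as the example, (1, 2, 3) and (4, 5, 6), and the object name TraceCheck is illustrative only.

```scala
object TraceCheck {
  val zeroValue = 3
  // The two partitions from the example, hard-coded (Spark is not used here).
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6))

  def seqOp(a: Int, b: Int): Int = a * b   // multiply within a partition
  def combOp(a: Int, b: Int): Int = a + b  // add across partitions

  // Per-partition results: 3*1*2*3 = 18 and 3*4*5*6 = 360.
  val partitionResults: Seq[Int] = partitions.map(_.foldLeft(zeroValue)(seqOp))

  // Final result: 3 + 18 + 360 = 381.
  val result: Int = partitionResults.foldLeft(zeroValue)(combOp)
}
```

Because the zero value 3 is not neutral for either operator, the result depends on the split: with three partitions (1, 2), (3, 4), (5, 6) the same computation gives 6, 36 and 90 per partition and 3 + 6 + 36 + 90 = 135 overall, not 381.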
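Notes:
1. The reduce function (seqOp) and the combine function (combOp) must be commutative and associative.
2. As the signature of aggregate shows (combOp: (U, U) ⇒ U), the output type of the combine function must be the same as its input type, U.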
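Note 2 is easiest to see when U differs from T. The sketch below is a plain-Scala illustration that does not use Spark (the names AverageExample, Acc, seqOp and combOp are hypothetical). It computes an average where T is Int and U is a (sum, count) pair: seqOp has type (U, T) => U, while combOp has type (U, U) => U, so all of combOp's inputs and its output are U. Because the zero value (0, 0) is neutral for both operators, the result does not depend on how the data is partitioned.

```scala
object AverageExample {
  type Acc = (Int, Int) // U = (running sum, element count)

  val zeroValue: Acc = (0, 0) // neutral element for both operators

  // seqOp: (U, T) => U, folds one Int element into the accumulator
  def seqOp(acc: Acc, x: Int): Acc = (acc._1 + x, acc._2 + 1)

  // combOp: (U, U) => U, merges two partial accumulators
  def combOp(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 + b._2)

  // Same two-phase shape as aggregate: fold each partition, then merge.
  def aggregate(partitions: Seq[Seq[Int]]): Acc =
    partitions.map(_.foldLeft(zeroValue)(seqOp)).foldLeft(zeroValue)(combOp)

  def average(partitions: Seq[Seq[Int]]): Double = {
    val (sum, count) = aggregate(partitions)
    sum.toDouble / count
  }
}
```

On a real RDD[Int], assumed here to be named rdd, the same pair of functions would be passed as rdd.aggregate((0, 0))(seqOp, combOp).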
Reference: http://www.iteblog.com/archives/1268