Spark: closure, broadcast, accumulator

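This article takes a close look at how closures work in Apache Spark and the common pitfalls around them. It explains why modifying a variable directly inside an RDD operation produces a wrong result, and shows how to use an Accumulator to implement the accumulation correctly. It also contrasts this with the role of Broadcast variables.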

closure:

Code like this is wrong:
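// Scala; assumes sc is an existing SparkContext (e.g. in spark-shell). data is sample input.
val data = Array(1, 2, 3, 4, 5)
var counter = 0
val rdd = sc.parallelize(data)

// Wrong: Don't do this!!
rdd.foreach(x => counter += x)

println("Counter value: " + counter)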
closure: To execute jobs, Spark breaks up the processing of RDD operations into tasks, each of which is executed by an executor. Prior to execution, Spark computes the task’s closure. The closure is those variables and methods which must be visible for the executor to perform its computations on the RDD (in this case foreach()). This closure is serialized and sent to each executor.

The variables within the closure sent to each executor are now copies and thus, when counter is referenced within the foreach function, it’s no longer the counter on the driver node. There is still a counter in the memory of the driver node but this is no longer visible to the executors! The executors only see the copy from the serialized closure. Thus, the final value of counter will still be zero since all operations on counter were referencing the value within the serialized closure.
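
The copy semantics described above can be reproduced without a cluster. The following is a minimal sketch in plain Java, not Spark code: it serializes a lambda and deserializes it again as a rough stand-in for shipping a closure to an executor. The names ClosureCopyDemo, SerializableIntConsumer, shipToExecutor and driverCounterAfterRemoteRun are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.IntConsumer;

final class ClosureCopyDemo {

    // Hypothetical serializable lambda type, standing in for the function passed to foreach().
    interface SerializableIntConsumer extends IntConsumer, Serializable {}

    // Serialize and then deserialize an object. This only imitates "sending the
    // closure to an executor"; real Spark does considerably more work here.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T shipToExecutor(T closure) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(closure);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the driver-side counter after the "remote" copy of the closure has run.
    static int driverCounterAfterRemoteRun(int[] data) {
        int[] counter = {0};  // driver-side state captured by the closure
        SerializableIntConsumer task = x -> counter[0] += x;
        SerializableIntConsumer remoteTask = shipToExecutor(task);
        for (int x : data) {
            remoteTask.accept(x);  // updates the deserialized copy, not the original array
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("Counter value: "
            + driverCounterAfterRemoteRun(new int[] {1, 2, 3, 4}));
    }
}
```

Running main prints "Counter value: 0": the deserialized lambda holds its own copy of the captured array, so the additions never reach the original.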

The logic in the code above can be implemented correctly with an Accumulator.

Accumulator:
  Warning: When a Spark task finishes, Spark will try to merge the accumulated 
  updates in this task to an accumulator. If it fails, Spark will ignore the failure 
  and still mark the task successful and continue to run other tasks. Hence, a 
  buggy accumulator will not impact a Spark job, but it may not get updated 
  correctly although a Spark job is successful.
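// Usage (Java; assumes jsc is an existing JavaSparkContext):
import java.util.Arrays;
import org.apache.spark.util.LongAccumulator;

LongAccumulator accum = jsc.sc().longAccumulator();

jsc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));
// ...
// 10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s

accum.value();
// returns 10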
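
The merge step mentioned in the warning can be pictured with a toy model. This is a plain Java sketch, not Spark's implementation: ToyLongAccumulator and sumWithPerTaskCopies are hypothetical names, and each inner list stands in for the data that one task processes. Each task adds to its own zero-initialized copy, and the driver merges that copy when the task finishes.

```java
import java.util.Arrays;
import java.util.List;

final class ToyAccumulatorDemo {

    // Hypothetical stand-in for an accumulator: tasks only add, the driver reads the value.
    static final class ToyLongAccumulator {
        private long sum = 0L;

        void add(long v) { sum += v; }

        // Fold another copy's partial result into this one.
        void merge(ToyLongAccumulator other) { sum += other.sum; }

        long value() { return sum; }
    }

    // Simulates one task per partition, run sequentially for simplicity.
    static long sumWithPerTaskCopies(List<List<Integer>> partitions) {
        ToyLongAccumulator driverAccum = new ToyLongAccumulator();
        for (List<Integer> partition : partitions) {
            ToyLongAccumulator taskCopy = new ToyLongAccumulator();  // fresh copy for this task
            for (int x : partition) {
                taskCopy.add(x);
            }
            driverAccum.merge(taskCopy);  // merged on the driver once the task is done
        }
        return driverAccum.value();
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
        System.out.println("Accumulated value: " + sumWithPerTaskCopies(partitions));
    }
}
```

In this model a failed merge would simply leave the driver-side value short, which is the situation the warning describes: the job still succeeds, but the accumulated value may be incomplete.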

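Broadcast is similar to Accumulator in that both are shared variables, but a broadcast variable is read-only: tasks can read its value but are not allowed to update it.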

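// Usage (Java; assumes jsc is an existing JavaSparkContext):
Broadcast<int[]> broadcastVar = jsc.broadcast(new int[] {1, 2, 3});
broadcastVar.value();  // returns [1, 2, 3]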
