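Problem
Spark reads a configuration file successfully, but the values from that file are not available inside RDD operations: the driver has the values, the executors do not.
Solution
Broadcast the required objects to every executor.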
code:
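import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDemo {
  // c1 and c2 are fields of the singleton object
  var c1 = 0
  var c2 = 0

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BroadcastDemo").setMaster(args(0))
    val sc = new SparkContext(conf)
    c1 = 10 // assigned in main, which runs on the driver
    val rdd1 = sc.parallelize(1 to 10, 1)
    val c2_init = 10
    // .value is read on the driver and the result is stored in the object field
    c2 = sc.broadcast(c2_init).value
    val c3_init = 10
    // c3 is a local variable of main
    var c3 = sc.broadcast(c3_init).value
    rdd1.mapPartitions(t => {
      // this closure runs on the executor
      System.out.println("get c1:" + c1)
      System.out.println("get c2:" + c2)
      System.out.println("get c3:" + c3)
      t
    }).collect()
  }
}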
Launch command
spark-submit --class com.blue.spark.demo.BroadcastDemo \
--master yarn-cluster --num-executors 1 \
--driver-memory 1g --executor-memory 1g --executor-cores 1 \
/tmp/broadcast-demo.jar yarn-cluster
Output
get c1:0
get c2:0
get c3:10
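Analysis
- c1 is a field of the singleton object BroadcastDemo. The assignment c1 = 10 happens in main, which runs only in the driver JVM. The closure passed to mapPartitions does not capture the value of c1; it accesses the field through the object, and each executor JVM initializes its own copy of BroadcastDemo. The executor therefore sees the initial value 0, not the value set on the driver.
- c2 is also a field of the object. sc.broadcast(c2_init).value is evaluated on the driver, so it only assigns 10 to the driver's copy of the field. The broadcast has no effect on what the executor sees, which is still 0.
- c3 is a local variable of main. The closure captures it, and its value is serialized and shipped to the executor together with the task, so the executor sees 10. Note that .value is called on the driver here as well, so what reaches the executor is the captured local value rather than a broadcast read. To read a broadcast on the executor, keep the Broadcast handle returned by sc.broadcast and call .value inside the closure.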
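
The pattern in which the executor itself reads the broadcast value can be sketched as follows. This is a minimal sketch, not the original author's code: the object name BroadcastHandleDemo is hypothetical, and it assumes the same launch setup as above, with the master passed as the first argument. The only difference from the demo is that the Broadcast handle is kept and .value is called inside the closure.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

// Hypothetical name, chosen for illustration only.
object BroadcastHandleDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BroadcastHandleDemo").setMaster(args(0))
    val sc = new SparkContext(conf)

    // Keep the Broadcast handle; do not unwrap it with .value on the driver.
    val c2: Broadcast[Int] = sc.broadcast(10)

    val rdd1 = sc.parallelize(1 to 10, 1)
    rdd1.mapPartitions { t =>
      // .value is resolved here, inside the task running on the executor.
      System.out.println("get c2:" + c2.value)
      t
    }.collect()

    sc.stop()
  }
}
```

With this change the executor log should show the broadcast value (10) for c2. For a single small value like this one, a captured local variable works just as well; a broadcast is mainly useful for larger read-only objects, such as a parsed configuration map, because it is sent to each executor once rather than with every task.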