Spark SQL broadcast join configuration:
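--conf spark.sql.autoBroadcastJoinThreshold=31457280 \
The value is in bytes: 31457280 bytes = 30 MB, compared with the 10 MB default. A table whose estimated size does not exceed this threshold is broadcast to all executors automatically. Setting it to -1 disables automatic broadcasting.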
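Because the threshold is given in raw bytes, the number is easy to mistype. The sketch below derives the value from a size in MiB. `BroadcastThreshold`, `mibToBytes` and `confFlag` are hypothetical helpers, not Spark APIs.

```scala
// Hypothetical helpers (not part of Spark) for producing the byte value that
// spark.sql.autoBroadcastJoinThreshold expects.
object BroadcastThreshold {
  // 1 MiB = 1024 * 1024 bytes; Long avoids Int overflow for large sizes.
  def mibToBytes(mib: Long): Long = mib * 1024L * 1024L

  // Builds the spark-submit flag shown above (a value of -1 would disable auto-broadcast).
  def confFlag(bytes: Long): String =
    s"--conf spark.sql.autoBroadcastJoinThreshold=$bytes"

  def main(args: Array[String]): Unit = {
    val bytes = mibToBytes(30)
    println(bytes)           // 31457280
    println(confFlag(bytes)) // --conf spark.sql.autoBroadcastJoinThreshold=31457280
  }
}
```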
Some good write-ups on broadcast joins:
https://blog.youkuaiyun.com/lsshlsw/article/details/48662669
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html
// Force BroadcastHashJoin using SQL's BROADCAST hint
// Supported hints: BROADCAST, BROADCASTJOIN or MAPJOIN
val qBroadcastLeft = """
  SELECT /*+ BROADCAST (lf) */ *
  FROM range(100) lf, range(1000) rt
  WHERE lf.id = rt.id
"""
scala> sql(qBroadcastLeft).explain
== Physical Plan ==
*BroadcastHashJoin [id#34L], [id#35L], Inner, BuildRight
:- *Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 1000, step=1, splits=8)
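You can request the same broadcast from the DataFrame API with `org.apache.spark.sql.functions.broadcast`. The sketch below assumes Spark 2.2 or later on the classpath and a local master. `BroadcastFunctionDemo` and `broadcastJoin` are illustrative names. It mirrors `qBroadcastLeft` by marking the 100-row side for broadcast.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastFunctionDemo {
  // DataFrame-API counterpart of /*+ BROADCAST (lf) */: mark the left side
  // for broadcasting, then equi-join both sides on their "id" column.
  def broadcastJoin(lf: DataFrame, rt: DataFrame): DataFrame =
    broadcast(lf).join(rt, "id")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-function-demo")
      .getOrCreate()
    try {
      val lf = spark.range(100).toDF()  // same data as range(100) lf
      val rt = spark.range(1000).toDF() // same data as range(1000) rt
      val joined = broadcastJoin(lf, rt)
      joined.explain()        // prints the physical plan
      println(joined.count()) // ids 0..99 appear on both sides
    } finally {
      spark.stop()
    }
  }
}
```

The function form is convenient when the small side is built programmatically rather than named in a SQL string.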
val qBroadcastRight = """
  SELECT /*+ MAPJOIN (rt) */ *
  FROM range(100) lf, range(1000) rt
  WHERE lf.id = rt.id
"""
scala> sql(qBroadcastRight).explain
== Physical Plan ==
*BroadcastHashJoin [id#42L], [id#43L], Inner, BuildRight
:- *Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 1000, step=1, splits=8)
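You can also attach hints through `Dataset.hint` (available since Spark 2.2), which accepts the same names as the SQL comment form. The sketch below makes the same assumptions as before. It disables size-based broadcasting with `-1` and then inspects the plan, on the expectation that an explicit hint still forces a BroadcastHashJoin when the threshold is off. `HintApiDemo` and its helpers are hypothetical. Matching on the plan string is a quick check, not a stable API contract.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HintApiDemo {
  // Equivalent of /*+ MAPJOIN (rt) */: "mapjoin" is one of the accepted
  // aliases, alongside "broadcast" and "broadcastjoin".
  def hintedJoin(lf: DataFrame, rt: DataFrame): DataFrame =
    lf.join(rt.hint("mapjoin"), "id")

  // The same join with no hint, left to the planner and the size threshold.
  def plainJoin(lf: DataFrame, rt: DataFrame): DataFrame =
    lf.join(rt, "id")

  // Hypothetical helper: looks for a broadcast hash join in the physical plan text.
  def usesBroadcastHashJoin(df: DataFrame): Boolean =
    df.queryExecution.executedPlan.toString.contains("BroadcastHashJoin")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hint-api-demo")
      .getOrCreate()
    try {
      // Turn off size-based broadcasting so that only the hint can trigger it.
      spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)
      val lf = spark.range(100).toDF()
      val rt = spark.range(1000).toDF()
      println(s"hinted:   ${usesBroadcastHashJoin(hintedJoin(lf, rt))}")
      println(s"unhinted: ${usesBroadcastHashJoin(plainJoin(lf, rt))}")
    } finally {
      spark.stop()
    }
  }
}
```

With the threshold at -1, the unhinted join typically falls back to a SortMergeJoin. The hinted one should still be planned as a broadcast.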
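This post covers the Spark SQL broadcast join configuration and gives an example setting. It also links two articles on broadcast joins. The Scala examples show how the SQL BROADCAST and MAPJOIN hints force a broadcast hash join, along with the resulting physical plans.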