org.apache.spark.util.SparkFatalException

While processing a large dataset, a Spark broadcast operation timed out because the dataset being broadcast was too large. The fixes are to increase the wait time via `spark.sql.broadcastTimeout`, or to disable automatic broadcasting (`spark.sql.autoBroadcastJoinThreshold=-1`) so that large tables are never broadcast.

Preface

This article is part of the author's original column Spark异常问题汇总 (a roundup of Spark exceptions). Please credit the source when quoting, and feel free to point out mistakes or omissions in the comments. Thanks!

For the column's table of contents and references, see Spark异常问题汇总.

Problem Description

While building dimension tables, a join between two of them failed with the following error:

java.util.concurrent.ExecutionException: org.apache.spark.util.SparkFatalException
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$doExecuteBroadcast$2.apply$mcVI$sp(BroadcastExchangeExec.scala:152)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$doExecuteBroadcast$2.apply(BroadcastExchangeExec.scala:150)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$doExecuteBroadcast$2.apply(BroadcastExchangeExec.scala:150)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:150)
        at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:387)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:158)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:154)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:154)
        at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:117)
        at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:211)
        at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:101)
        at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:189)
        at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:41)
        at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:71)
        at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:189)
        at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:91)
        at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:216)
        at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:189)
        at org.apache.spark.sql.execution.FileSourceScanExec.consume(DataSourceScanExec.scala:165)
        at org.apache.spark.sql.execution.ColumnarBatchScan$class.produceBatches(ColumnarBatchScan.scala:144)
        at org.apache.spark.sql.execution.ColumnarBatchScan$class.doProduce(ColumnarBatchScan.scala:83)
        at org.apache.spark.sql.execution.FileSourceScanExec.doProduce(DataSourceScanExec.scala:165)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:90)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.FileSourceScanExec.produce(DataSourceScanExec.scala:165)
        at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:131)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:90)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:91)
        at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:51)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:90)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:41)
        at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:96)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:90)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:40)
        at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:51)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:90)
        at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:169)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

Root Cause

The broadcast exchange timed out while the broadcast relation was being built: the dataset being broadcast was too large to collect and broadcast within the timeout window.
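
To confirm that the failing join was indeed planned as a broadcast join, the physical plan can be inspected before running the job. A minimal sketch, with hypothetical table and column names standing in for the real dimension tables:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-check").getOrCreate()

// dw.dim_a, dw.dim_b and dim_key are assumed names, for illustration only.
val dimA = spark.table("dw.dim_a")
val dimB = spark.table("dw.dim_b")

// Look for BroadcastHashJoin / BroadcastExchange nodes in the plan; these
// correspond to the BroadcastExchangeExec and BroadcastHashJoinExec frames
// in the stack trace above.
dimA.join(dimB, Seq("dim_key")).explain()
```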

Solutions

  1. `spark.sql.broadcastTimeout`

The default is 300 seconds (5 minutes). Try raising it and rerunning the job; of course, whether waiting that long is worthwhile depends on the job as a whole.
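
A minimal sketch of raising the timeout, assuming a Spark Scala application; the 1200-second value is purely illustrative and should be tuned to the actual job:

```scala
import org.apache.spark.sql.SparkSession

// Raise the broadcast timeout from the default 300 seconds.
// The 1200s value is an arbitrary example, not a recommendation.
val spark = SparkSession.builder()
  .appName("dim-table-join")
  .config("spark.sql.broadcastTimeout", "1200") // seconds
  .getOrCreate()

// The same setting can also be changed on a live session:
spark.conf.set("spark.sql.broadcastTimeout", "1200")
```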

  2. `spark.sql.autoBroadcastJoinThreshold=-1`

This turns off automatic broadcast joins altogether, so Spark will no longer attempt to broadcast the oversized table.
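
Alternatively, a sketch of turning auto-broadcast off on the same hypothetical session; with the threshold at -1, Spark falls back to a shuffle-based join (typically a sort-merge join) instead of broadcasting either side:

```scala
// -1 disables size-based automatic broadcast joins entirely;
// the planner will choose a shuffle-based join instead.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

// Equivalent at submit time:
//   spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 ...
```

Note that an explicit broadcast hint (`broadcast(df)` or `/*+ BROADCAST(t) */`) still forces a broadcast regardless of this threshold, so any such hints would also need to be removed.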
