org.apache.spark.SparkException: Failed to get broadcast_270_piece0 of broadcast_270

This post explains the cause of the "Failed to get broadcast" error encountered on Spark 1.6.0 and how to resolve it: tune the spark.cleaner.ttl parameter so RDD data is not cleaned up too early, and manage the SparkContext instance properly so it is not initialized more than once.

Running code on Spark 1.6.0 produces the following error:

org.apache.spark.SparkException: Failed to get broadcast_270_piece0 of broadcast_270

Solutions

1. The error may be caused by spark.cleaner.ttl. This parameter sets a time-to-live: Spark cleans up all RDD data older than this duration to free space for later RDDs, and if the cleaner removes a broadcast block that a still-running job needs, the broadcast fetch fails with the error above. The duration (in seconds) can be set like this:

val conf = new SparkConf().setMaster("local[2]").setAppName("test").set("spark.cleaner.ttl", "2000")
val sc = new SparkContext(conf)
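
To confirm that the setting actually reached the context, it can be read back from the running SparkContext (a minimal sketch; it assumes the sc created above):

// Prints "2000"; SparkConf.get throws if the key was never set.
println(sc.getConf.get("spark.cleaner.ttl"))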

2. The error can also be caused by defining the SparkContext in the body of an object rather than inside one of its methods. When the object's methods execute, the SparkContext ends up being initialized multiple times, and in Spark, creating a new SparkContext before the previous one has been stopped leads to errors.
A simple fix is to move the Spark initialization into a separate class and call it from whichever object method needs the context:

import org.apache.spark.{SparkConf, SparkContext}

class Spark extends Serializable {
  def getContext: SparkContext = {
    // conf and sc are lazy, so they are only built when getContext is called;
    // @transient keeps them out of serialization if this class is captured in a closure.
    @transient lazy val conf: SparkConf =
      new SparkConf()
        .setMaster("local")
        .setAppName("test")

    @transient lazy val sc: SparkContext = new SparkContext(conf)
    sc.setLogLevel("OFF")

    sc
  }
}

Usage:

import org.apache.spark.rdd.RDD

object Test extends Spark {

  def main(args: Array[String]): Unit = {
    val sc = getContext
    val irisRDD: RDD[String] = sc.textFile("...")
    // ...
  }
}
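
An alternative sketch that achieves the same guarantee (the names SparkHolder and Test2 and the local[2] settings are illustrative, not from the original post) is to keep the context behind a lazy val in a singleton object, so it is created exactly once per JVM no matter how many methods use it. SparkContext.getOrCreate(conf), available since Spark 1.4, offers a similar safeguard by returning the already-running context instead of creating a second one.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper: the lazy val is initialized on first access and then
// shared, so only one SparkContext ever exists in the JVM.
object SparkHolder {
  lazy val sc: SparkContext = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("test")
    new SparkContext(conf)
  }
}

object Test2 {
  def main(args: Array[String]): Unit = {
    val sc = SparkHolder.sc
    val lines = sc.textFile("...") // path elided, as in the example above
    println(lines.count())
    sc.stop()
  }
}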

Reference: https://www.jianshu.com/p/33fe0987f715

/usr/java/jdk1.8.0_171/bin/java -javaagent:/root/idea/lib/idea_rt.jar=40900 -Dfile.encoding=UTF-8 -classpath /usr/java/jdk1.8.0_171/jre/lib/charsets.jar:/usr/java/jdk1.8.0_171/jre/lib/deploy.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/cldrdata.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/dnsns.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/jaccess.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/jfxrt.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/localedata.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/nashorn.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/sunec.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/sunjce_provider.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/sunpkcs11.jar:/usr/java/jdk1.8.0_171/jre/lib/ext/zipfs.jar:/usr/java/jdk1.8.0_171/jre/lib/javaws.jar:/usr/java/jdk1.8.0_171/jre/lib/jce.jar:/usr/java/jdk1.8.0_171/jre/lib/jfr.jar:/usr/java/jdk1.8.0_171/jre/lib/jfxswt.jar:/usr/java/jdk1.8.0_171/jre/lib/jsse.jar:/usr/java/jdk1.8.0_171/jre/lib/management-agent.jar:/usr/java/jdk1.8.0_171/jre/lib/plugin.jar:/usr/java/jdk1.8.0_171/jre/lib/resources.jar:/usr/java/jdk1.8.0_171/jre/lib/rt.jar:/root/IdeaProjects/untitled5/out/production/untitled5:/root/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.12.20/scala-library-2.12.20.jar:/root/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-reflect/2.12.20/scala-reflect-2.12.20.jar:/root/mycode/spark-3.2/jars/xz-1.8.jar:/root/mycode/spark-3.2/jars/ivy-2.5.0.jar:/root/mycode/spark-3.2/jars/oro-2.0.8.jar:/root/mycode/spark-3.2/jars/blas-2.2.0.jar:/root/mycode/spark-3.2/jars/core-1.1.2.jar:/root/mycode/spark-3.2/jars/gson-2.8.6.jar:/root/mycode/spark-3.2/jars/tink-1.6.0.jar:/root/mycode/spark-3.2/jars/avro-1.10.2.jar:/root/mycode/spark-3.2/jars/okio-1.14.0.jar:/root/mycode/spark-3.2/jars/opencsv-2.3.jar:/root/mycode/spark-3.2/jars/shims-0.9.0.jar:/root/mycode/spark-3.2/jars/arpack-2.2.0.jar:/root/mycode/spark-3.2/jars/guava-14.0.1.jar:/root/mycode/spark-3.2/jars/jsr305-3.0.0.jar:/root/mycode/spark-3.2/jars/lapack-2.2.0.jar:/root/mycode/spark-3.2/jars/log4j-1.2.17.jar:/root/mycode/spark-3.2/jars/minlog-1.3.0.jar:/root/mycode/spark-3.2/jars/stream-2.9.6.jar:/root/mycode/spark-3.2/jars/generex-1.0.2.jar:/root/mycode/spark-3.2/jars/hk2-api-2.6.1.jar:/root/mycode/spark-3.2/jars/janino-3.0.16.jar:/root/mycode/spark-3.2/jars/objenesis-2.6.jar:/root/mycode/spark-3.2/jars/paranamer-2.8.jar:/root/mycode/spark-3.2/jars/py4j-0.10.9.2.jar:/root/mycode/spark-3.2/jars/pyrolite-4.30.jar:/root/mycode/spark-3.2/jars/commons-io-2.4.jar:/root/mycode/spark-3.2/jars/lz4-java-1.7.1.jar:/root/mycode/spark-3.2/jars/okhttp-3.12.12.jar:/root/mycode/spark-3.2/jars/snakeyaml-1.27.jar:/root/mycode/spark-3.2/jars/JTransforms-3.1.jar:/root/mycode/spark-3.2/jars/avro-ipc-1.10.2.jar:/root/mycode/spark-3.2/jars/breeze_2.12-1.2.jar:/root/mycode/spark-3.2/jars/commons-cli-1.2.jar:/root/mycode/spark-3.2/jars/commons-net-2.2.jar:/root/mycode/spark-3.2/jars/commons-net-3.1.jar:/root/mycode/spark-3.2/jars/hk2-utils-2.6.1.jar:/root/mycode/spark-3.2/jars/jaxb-api-2.2.11.jar:/root/mycode/spark-3.2/jars/jersey-hk2-2.34.jar:/root/mycode/spark-3.2/jars/orc-core-1.6.11.jar:/root/mycode/spark-3.2/jars/JLargeArrays-1.5.jar:/root/mycode/spark-3.2/jars/automaton-1.11-8.jar:/root/mycode/spark-3.2/jars/commons-dbcp-1.4.jar:/root/mycode/spark-3.2/jars/commons-io-2.8.0.jar:/root/mycode/spark-3.2/jars/commons-lang-2.6.jar:/root/mycode/spark-3.2/jars/commons-text-1.6.jar:/root/mycode/spark-3.2/jars/orc-shims-1.6.11.jar:/root/mycode/spark-3.2/jars/slf4j-api-1.7.16.jar:/root/mycode/spark-3.2/jars/zjsonpatch-0.3.0.jar:
/root/mycode/spark-3.2/jars/zstd-jni-1.5.0-4.jar:/root/mycode/spark-3.2/jars/chill-java-0.10.0.jar:/root/mycode/spark-3.2/jars/chill_2.12-0.10.0.jar:/root/mycode/spark-3.2/jars/commons-lang3-3.5.jar:/root/mycode/spark-3.2/jars/hadoop-auth-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-hdfs-2.7.3.jar:/root/mycode/spark-3.2/jars/hk2-locator-2.6.1.jar:/root/mycode/spark-3.2/jars/kryo-shaded-4.0.2.jar:/root/mycode/spark-3.2/jars/metrics-jmx-4.2.0.jar:/root/mycode/spark-3.2/jars/metrics-jvm-4.2.0.jar:/root/mycode/spark-3.2/jars/rocksdbjni-6.20.3.jar:/root/mycode/spark-3.2/jars/spire_2.12-0.17.0.jar:/root/mycode/spark-3.2/jars/aircompressor-0.21.jar:/root/mycode/spark-3.2/jars/algebra_2.12-2.0.1.jar:/root/mycode/spark-3.2/jars/annotations-17.0.0.jar:/root/mycode/spark-3.2/jars/antlr4-runtime-4.8.jar:/root/mycode/spark-3.2/jars/arrow-format-2.0.0.jar:/root/mycode/spark-3.2/jars/arrow-vector-2.0.0.jar:/root/mycode/spark-3.2/jars/avro-mapred-1.10.2.jar:/root/mycode/spark-3.2/jars/commons-codec-1.10.jar:/root/mycode/spark-3.2/jars/commons-codec-1.15.jar:/root/mycode/spark-3.2/jars/commons-pool-1.5.4.jar:/root/mycode/spark-3.2/jars/compress-lzf-1.0.3.jar:/root/mycode/spark-3.2/jars/jaxb-runtime-2.3.2.jar:/root/mycode/spark-3.2/jars/jersey-client-2.34.jar:/root/mycode/spark-3.2/jars/jersey-common-2.34.jar:/root/mycode/spark-3.2/jars/jersey-server-2.34.jar:/root/mycode/spark-3.2/jars/leveldbjni-all-1.8.jar:/root/mycode/spark-3.2/jars/metrics-core-4.2.0.jar:/root/mycode/spark-3.2/jars/metrics-json-4.2.0.jar:/root/mycode/spark-3.2/jars/RoaringBitmap-0.9.0.jar:/root/mycode/spark-3.2/jars/commons-math3-3.4.1.jar:/root/mycode/spark-3.2/jars/hadoop-client-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-common-2.7.3.jar:/root/mycode/spark-3.2/jars/jackson-core-2.12.3.jar:/root/mycode/spark-3.2/jars/javassist-3.25.0-GA.jar:/root/mycode/spark-3.2/jars/jul-to-slf4j-1.7.30.jar:/root/mycode/spark-3.2/jars/snappy-java-1.1.8.4.jar:/root/mycode/spark-3.2/jars/commons-crypto-1.0.0.jar:/root/mycode/spark-3.2/jars/commons-crypto-1.1.0.jar:/root/mycode/spark-3.2/jars/commons-digester-1.8.jar:/root/mycode/spark-3.2/jars/commons-lang3-3.12.0.jar:/root/mycode/spark-3.2/jars/jakarta.inject-2.6.1.jar:/root/mycode/spark-3.2/jars/orc-mapreduce-1.6.11.jar:/root/mycode/spark-3.2/jars/parquet-hadoop-1.8.2.jar:/root/mycode/spark-3.2/jars/scala-xml_2.12-1.2.0.jar:/root/mycode/spark-3.2/jars/shapeless_2.12-2.3.3.jar:/root/mycode/spark-3.2/jars/slf4j-log4j12-1.7.16.jar:/root/mycode/spark-3.2/jars/spark-sql_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/threeten-extra-1.5.0.jar:/root/mycode/spark-3.2/jars/commons-compress-1.21.jar:/root/mycode/spark-3.2/jars/commons-logging-1.1.3.jar:/root/mycode/spark-3.2/jars/hadoop-yarn-api-2.7.3.jar:/root/mycode/spark-3.2/jars/hive-cli-1.2.1.spark2.jar:/root/mycode/spark-3.2/jars/jcl-over-slf4j-1.7.30.jar:/root/mycode/spark-3.2/jars/orc-core-1.4.1-nohive.jar:/root/mycode/spark-3.2/jars/parquet-column-1.12.1.jar:/root/mycode/spark-3.2/jars/parquet-common-1.12.1.jar:/root/mycode/spark-3.2/jars/parquet-hadoop-1.12.1.jar:/root/mycode/spark-3.2/jars/scala-library-2.12.15.jar:/root/mycode/spark-3.2/jars/scala-reflect-2.12.15.jar:/root/mycode/spark-3.2/jars/spark-core_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-hive_2.11-2.3.0.jar:/root/mycode/spark-3.2/jars/spark-repl_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-tags_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-yarn_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/breeze-macros_2.12-1.2.jar:/root/mycode/spark-3.2/jars/cats-kernel_2.12-2.1.1.jar:/root/mycode
/spark-3.2/jars/commons-compiler-3.0.8.jar:/root/mycode/spark-3.2/jars/commons-compress-1.4.1.jar:/root/mycode/spark-3.2/jars/commons-httpclient-3.1.jar:/root/mycode/spark-3.2/jars/flatbuffers-java-1.9.0.jar:/root/mycode/spark-3.2/jars/hive-exec-1.2.1.spark2.jar:/root/mycode/spark-3.2/jars/hive-jdbc-1.2.1.spark2.jar:/root/mycode/spark-3.2/jars/hive-storage-api-2.7.2.jar:/root/mycode/spark-3.2/jars/metrics-graphite-4.2.0.jar:/root/mycode/spark-3.2/jars/netty-all-4.1.68.Final.jar:/root/mycode/spark-3.2/jars/parquet-jackson-1.12.1.jar:/root/mycode/spark-3.2/jars/scala-compiler-2.12.15.jar:/root/mycode/spark-3.2/jars/spark-mesos_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-mllib_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spire-util_2.12-0.17.0.jar:/root/mycode/spark-3.2/jars/xbean-asm9-shaded-4.20.jar:/root/mycode/spark-3.2/jars/arpack_combined_all-0.1.jar:/root/mycode/spark-3.2/jars/arrow-memory-core-2.0.0.jar:/root/mycode/spark-3.2/jars/commons-beanutils-1.7.0.jar:/root/mycode/spark-3.2/jars/commons-compiler-3.0.16.jar:/root/mycode/spark-3.2/jars/jackson-databind-2.12.3.jar:/root/mycode/spark-3.2/jars/jakarta.ws.rs-api-2.1.6.jar:/root/mycode/spark-3.2/jars/kubernetes-client-5.4.1.jar:/root/mycode/spark-3.2/jars/macro-compat_2.12-1.1.1.jar:/root/mycode/spark-3.2/jars/parquet-encoding-1.12.1.jar:/root/mycode/spark-3.2/jars/spark-graphx_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-sketch_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-unsafe_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/univocity-parsers-2.9.1.jar:/root/mycode/spark-3.2/jars/arrow-memory-netty-2.0.0.jar:/root/mycode/spark-3.2/jars/hadoop-annotations-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-yarn-client-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-yarn-common-2.7.3.jar:/root/mycode/spark-3.2/jars/spark-kvstore_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spire-macros_2.12-0.17.0.jar:/root/mycode/spark-3.2/jars/avro-mapred-1.7.7-hadoop2.jar:/root/mycode/spark-3.2/jars/commons-collections-3.2.2.jar:/root/mycode/spark-3.2/jars/commons-configuration-1.6.jar:/root/mycode/spark-3.2/jars/hive-beeline-1.2.1.spark2.jar:/root/mycode/spark-3.2/jars/jakarta.servlet-api-4.0.3.jar:/root/mycode/spark-3.2/jars/json4s-ast_2.12-3.7.0-M11.jar:/root/mycode/spark-3.2/jars/spark-catalyst_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-launcher_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/audience-annotations-0.5.0.jar:/root/mycode/spark-3.2/jars/jackson-annotations-2.12.3.jar:/root/mycode/spark-3.2/jars/jakarta.xml.bind-api-2.3.2.jar:/root/mycode/spark-3.2/jars/json4s-core_2.12-3.7.0-M11.jar:/root/mycode/spark-3.2/jars/orc-mapreduce-1.4.1-nohive.jar:/root/mycode/spark-3.2/jars/spark-streaming_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spire-platform_2.12-0.17.0.jar:/root/mycode/spark-3.2/jars/hive-metastore-1.2.1.spark2.jar:/root/mycode/spark-3.2/jars/kubernetes-model-apps-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-core-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-node-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-rbac-5.4.1.jar:/root/mycode/spark-3.2/jars/logging-interceptor-3.12.12.jar:/root/mycode/spark-3.2/jars/mesos-1.4.0-shaded-protobuf.jar:/root/mycode/spark-3.2/jars/osgi-resource-locator-1.0.3.jar:/root/mycode/spark-3.2/jars/parquet-hadoop-bundle-1.6.0.jar:/root/mycode/spark-3.2/jars/spark-kubernetes_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/spark-tags_2.12-3.2.0-tests.jar:/root/mycode/spark-3.2/jars/aopalliance-repackaged-2.6.1.jar:/root/mycode/spark-3.2/jars/commons-beanutils-core-1.8.0.jar:/root/mycode/
spark-3.2/jars/istack-commons-runtime-3.0.8.jar:/root/mycode/spark-3.2/jars/jakarta.annotation-api-1.3.5.jar:/root/mycode/spark-3.2/jars/jakarta.validation-api-2.0.2.jar:/root/mycode/spark-3.2/jars/json4s-scalap_2.12-3.7.0-M11.jar:/root/mycode/spark-3.2/jars/kubernetes-model-batch-5.4.1.jar:/root/mycode/spark-3.2/jars/spark-mllib-local_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/jersey-container-servlet-2.34.jar:/root/mycode/spark-3.2/jars/json4s-jackson_2.12-3.7.0-M11.jar:/root/mycode/spark-3.2/jars/kubernetes-model-common-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-events-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-policy-5.4.1.jar:/root/mycode/spark-3.2/jars/jackson-dataformat-yaml-2.12.3.jar:/root/mycode/spark-3.2/jars/jackson-datatype-jsr310-2.11.2.jar:/root/mycode/spark-3.2/jars/kubernetes-model-metrics-5.4.1.jar:/root/mycode/spark-3.2/jars/hadoop-yarn-server-common-2.7.3.jar:/root/mycode/spark-3.2/jars/mysql-connector-java-5.1.45-bin.jar:/root/mycode/spark-3.2/jars/spark-network-common_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/jackson-module-scala_2.12-2.12.3.jar:/root/mycode/spark-3.2/jars/kubernetes-model-discovery-5.4.1.jar:/root/mycode/spark-3.2/jars/parquet-format-structures-1.12.1.jar:/root/mycode/spark-3.2/jars/spark-network-shuffle_2.12-3.2.0.jar:/root/mycode/spark-3.2/jars/hadoop-mapreduce-client-app-2.7.3.jar:/root/mycode/spark-3.2/jars/kubernetes-model-extensions-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-networking-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-scheduling-5.4.1.jar:/root/mycode/spark-3.2/jars/hadoop-mapreduce-client-core-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-yarn-server-web-proxy-2.7.3.jar:/root/mycode/spark-3.2/jars/jersey-container-servlet-core-2.34.jar:/root/mycode/spark-3.2/jars/kubernetes-model-autoscaling-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-flowcontrol-5.4.1.jar:/root/mycode/spark-3.2/jars/scala-collection-compat_2.12-2.1.1.jar:/root/mycode/spark-3.2/jars/spark-hive-thriftserver_2.11-2.3.0.jar:/root/mycode/spark-3.2/jars/kubernetes-model-certificates-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-coordination-5.4.1.jar:/root/mycode/spark-3.2/jars/kubernetes-model-storageclass-5.4.1.jar:/root/mycode/spark-3.2/jars/scala-parser-combinators_2.12-1.1.2.jar:/root/mycode/spark-3.2/jars/hadoop-mapreduce-client-common-2.7.3.jar:/root/mycode/spark-3.2/jars/kubernetes-model-apiextensions-5.4.1.jar:/root/mycode/spark-3.2/jars/hadoop-mapreduce-client-shuffle-2.7.3.jar:/root/mycode/spark-3.2/jars/hadoop-mapreduce-client-jobclient-2.7.3.jar:/root/mycode/spark-3.2/jars/kubernetes-model-admissionregistration-5.4.1.jar T2 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 25/05/14 08:05:53 INFO SparkContext: Running Spark version 3.2.0 25/05/14 08:05:53 INFO ResourceUtils: ============================================================== 25/05/14 08:05:53 INFO ResourceUtils: No custom resources configured for spark.driver. 
25/05/14 08:05:53 INFO ResourceUtils: ============================================================== 25/05/14 08:05:53 INFO SparkContext: Submitted application: a 25/05/14 08:05:53 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 25/05/14 08:05:53 INFO ResourceProfile: Limiting resource is cpu 25/05/14 08:05:53 INFO ResourceProfileManager: Added ResourceProfile id: 0 25/05/14 08:05:54 INFO SecurityManager: Changing view acls to: root 25/05/14 08:05:54 INFO SecurityManager: Changing modify acls to: root 25/05/14 08:05:54 INFO SecurityManager: Changing view acls groups to: 25/05/14 08:05:54 INFO SecurityManager: Changing modify acls groups to: 25/05/14 08:05:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 25/05/14 08:05:54 INFO Utils: Successfully started service 'sparkDriver' on port 43958. 25/05/14 08:05:54 INFO SparkEnv: Registering MapOutputTracker 25/05/14 08:05:54 INFO SparkEnv: Registering BlockManagerMaster 25/05/14 08:05:54 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 25/05/14 08:05:54 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 25/05/14 08:05:54 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/05/14 08:05:54 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1b895f24-07c1-405d-abd2-7c3e469de5f2 25/05/14 08:05:54 INFO MemoryStore: MemoryStore started with capacity 861.3 MiB 25/05/14 08:05:54 INFO SparkEnv: Registering OutputCommitCoordinator 25/05/14 08:05:55 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 25/05/14 08:05:55 INFO Utils: Successfully started service 'SparkUI' on port 4041. 25/05/14 08:05:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4041 25/05/14 08:05:55 INFO Executor: Starting executor ID driver on host master 25/05/14 08:05:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41465. 
25/05/14 08:05:55 INFO NettyBlockTransferService: Server created on master:41465 25/05/14 08:05:55 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 25/05/14 08:05:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 41465, None) 25/05/14 08:05:55 INFO BlockManagerMasterEndpoint: Registering block manager master:41465 with 861.3 MiB RAM, BlockManagerId(driver, master, 41465, None) 25/05/14 08:05:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 41465, None) 25/05/14 08:05:55 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 41465, None) 25/05/14 08:05:56 WARN FileSystem: Cannot load filesystem java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.WebHdfsFileSystem could not be instantiated at java.util.ServiceLoader.fail(ServiceLoader.java:232) at java.util.ServiceLoader.access$100(ServiceLoader.java:185) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) at java.util.ServiceLoader$1.next(ServiceLoader.java:480) at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:344) at org.apache.spark.SparkContext.$anonfun$hadoopFile$1(SparkContext.scala:1128) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:792) at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1123) at org.apache.spark.SparkContext.$anonfun$textFile$1(SparkContext.scala:926) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:792) at org.apache.spark.SparkContext.textFile(SparkContext.scala:923) at T2$.main(Main.scala:10) at T2.main(Main.scala) Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.<clinit>(WebHdfsFileSystem.java:129) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) ... 22 more Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.ObjectMapper at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 
29 more 25/05/14 08:05:56 WARN FileSystem: Cannot load filesystem java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.web.SWebHdfsFileSystem could not be instantiated at java.util.ServiceLoader.fail(ServiceLoader.java:232) at java.util.ServiceLoader.access$100(ServiceLoader.java:185) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) at java.util.ServiceLoader$1.next(ServiceLoader.java:480) at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:344) at org.apache.spark.SparkContext.$anonfun$hadoopFile$1(SparkContext.scala:1128) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:792) at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1123) at org.apache.spark.SparkContext.$anonfun$textFile$1(SparkContext.scala:926) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:792) at org.apache.spark.SparkContext.textFile(SparkContext.scala:923) at T2$.main(Main.scala:10) at T2.main(Main.scala) Caused by: java.lang.NoClassDefFoundError: org.apache.hadoop.hdfs.web.WebHdfsFileSystem at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) at java.lang.Class.getConstructor0(Class.java:3075) at java.lang.Class.newInstance(Class.java:412) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) ... 
22 more

Driver log from a Scala job whose tasks fail during deserialization with `java.lang.IllegalStateException: unread block data`, followed by the code that produced it:

```text
25/05/14 08:05:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 239.2 KiB, free 861.1 MiB)
25/05/14 08:05:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.0 KiB, free 861.0 MiB)
25/05/14 08:05:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on master:41465 (size: 23.0 KiB, free: 861.3 MiB)
25/05/14 08:05:56 INFO SparkContext: Created broadcast 0 from textFile at Main.scala:10
25/05/14 08:05:56 INFO FileInputFormat: Total input paths to process : 1
25/05/14 08:05:56 INFO SparkContext: Starting job: count at Main.scala:12
25/05/14 08:05:56 INFO DAGScheduler: Got job 0 (count at Main.scala:12) with 2 output partitions
25/05/14 08:05:56 INFO DAGScheduler: Final stage: ResultStage 0 (count at Main.scala:12)
25/05/14 08:05:56 INFO DAGScheduler: Parents of final stage: List()
25/05/14 08:05:56 INFO DAGScheduler: Missing parents: List()
25/05/14 08:05:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at Main.scala:11), which has no missing parents
25/05/14 08:05:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.1 KiB, free 861.0 MiB)
25/05/14 08:05:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KiB, free 861.0 MiB)
25/05/14 08:05:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on master:41465 (size: 2.9 KiB, free: 861.3 MiB)
25/05/14 08:05:56 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1427
25/05/14 08:05:56 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at filter at Main.scala:11) (first 15 tasks are for partitions Vector(0, 1))
25/05/14 08:05:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
25/05/14 08:05:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (master, executor driver, partition 0, PROCESS_LOCAL, 4499 bytes) taskResourceAssignments Map()
25/05/14 08:05:56 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (master, executor driver, partition 1, PROCESS_LOCAL, 4499 bytes) taskResourceAssignments Map()
25/05/14 08:05:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
25/05/14 08:05:56 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
25/05/14 08:05:57 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
	at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2781)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1603)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:466)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
25/05/14 08:05:57 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.IllegalStateException: unread block data
	... (same stack trace as above)
25/05/14 08:05:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (master executor driver): java.lang.IllegalStateException: unread block data
	... (same stack trace as above)
25/05/14 08:05:57 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
25/05/14 08:05:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/05/14 08:05:57 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on master, executor driver: java.lang.IllegalStateException (unread block data) [duplicate 1]
25/05/14 08:05:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/05/14 08:05:57 INFO TaskSchedulerImpl: Cancelling stage 0
25/05/14 08:05:57 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
25/05/14 08:05:57 INFO DAGScheduler: ResultStage 0 (count at Main.scala:12) failed in 0.260 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (master executor driver): java.lang.IllegalStateException: unread block data
	... (same stack trace as above)
Driver stacktrace:
25/05/14 08:05:57 INFO DAGScheduler: Job 0 failed: count at Main.scala:12, took 0.348659 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (master executor driver): java.lang.IllegalStateException: unread block data
	... (same stack trace as above)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1253)
	at T2$.main(Main.scala:12)
	at T2.main(Main.scala)
Caused by: java.lang.IllegalStateException: unread block data
	... (same stack trace as above)
25/05/14 08:05:57 INFO SparkContext: Invoking stop() from shutdown hook
25/05/14 08:05:57 INFO SparkUI: Stopped Spark web UI at http://master:4041
25/05/14 08:05:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/05/14 08:05:57 INFO MemoryStore: MemoryStore cleared
25/05/14 08:05:57 INFO BlockManager: BlockManager stopped
25/05/14 08:05:57 INFO BlockManagerMaster: BlockManagerMaster stopped
25/05/14 08:05:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/05/14 08:05:57 INFO SparkContext: Successfully stopped SparkContext
25/05/14 08:05:57 INFO ShutdownHookManager: Shutdown hook called
25/05/14 08:05:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-629d72a2-b4e9-4dff-8911-471aa95ce3f9

Process finished with exit code 1
```

The code that produced the log:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object T2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")   // use all local cores
      .setAppName("a")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("file:///root/spark-3.2/README.md")
    val rdd1 = rdd.filter(_.contains("A"))
    val num = rdd1.count()
    println(s"Count: $num")

    sc.stop()
  }
}
```
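A `java.lang.IllegalStateException: unread block data` thrown while an executor deserializes a task is commonly associated with a mismatch between the Spark/Scala versions the application was compiled against and the Spark installation running it (here `/root/spark-3.2`), or with inconsistent serializer settings; for this particular job that is only an assumption, not something the log proves. A minimal diagnostic sketch (not part of the original post) that prints what the job actually runs against:

```scala
// Hypothetical diagnostic: print the runtime Spark version and the serializer
// in effect, to compare against the Spark installation the job is expected to
// run on (e.g. /root/spark-3.2). Both values come from the live SparkContext.
import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("version-check")
    val sc = new SparkContext(conf)

    // SparkContext.version reports the Spark version actually on the classpath.
    println(s"Spark version in use: ${sc.version}")
    // JavaSerializer is the default when spark.serializer is not set explicitly.
    println(s"spark.serializer = ${sc.getConf.get("spark.serializer", "org.apache.spark.serializer.JavaSerializer")}")

    sc.stop()
  }
}
```

If the printed version differs from the installed Spark, aligning the build dependency with the installation is the usual first step.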
The next exception comes from a Spark SQL statement executed through Apache Kyuubi:

```text
org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Error operating ExecuteStatement:
org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:194)
	at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectBase.run(CreateHiveTableAsSelectCommand.scala:71)
	at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectBase.run$(CreateHiveTableAsSelectCommand.scala:40)
	at org.apache.spark.sql.hive.execution.OptimizedCreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:141)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3743)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3741)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Multiple failures in stage materialization.
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.cleanUpAndThrowException(AdaptiveSparkPlanExec.scala:652)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:224)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:179)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:295)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:177)
	... 34 more
	Suppressed: org.apache.spark.SparkException: Could not execute broadcast in 300 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
		at org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec$$anon$1.run(QueryStageExec.scala:217)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
		... 3 more
Caused by: org.apache.spark.SparkException: Could not execute broadcast in 300 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
	at org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec$$anon$1.run(QueryStageExec.scala:217)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	... 3 more

	at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation$$anonfun$onError$1.$anonfun$applyOrElse$1(SparkOperation.scala:181)
	at org.apache.kyuubi.Utils$.withLockRequired(Utils.scala:425)
	at org.apache.kyuubi.operation.AbstractOperation.withLockRequired(AbstractOperation.scala:52)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation$$anonfun$onError$1.applyOrElse(SparkOperation.scala:169)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation$$anonfun$onError$1.applyOrElse(SparkOperation.scala:164)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:92)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted.
	... (the nested FileFormatWriter / CreateHiveTableAsSelect / "Multiple failures in stage materialization" / broadcast-timeout stack repeats here, identical to the trace above)

	at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
	at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:135)
	at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:173)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

Why does this error occur?
The error you are seeing is a very typical **Apache Kyuubi + Spark SQL** compound failure: a `CREATE TABLE AS SELECT` (CTAS) or Hive-table insert is aborted because a **broadcast timed out**.

---

### 🧾 Error summary

```text
org.apache.kyuubi.KyuubiSQLException: Error operating ExecuteStatement:
Caused by: org.apache.spark.SparkException: Job aborted.
Suppressed: org.apache.spark.SparkException: Could not execute broadcast in 300 secs.
You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or
disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
```

---

## 🔍 Root cause

The core of the failure is:

> **Spark decided to broadcast a "small" table for a broadcast join, but the broadcast did not finish within 300 seconds and timed out.**

The top-level operation is a table write (CTAS), but the underlying query contains a **join**, and Spark judged one side small enough to broadcast to every executor. The broadcast then failed for one of the following reasons:

| Cause | Explanation |
|------|------|
| ⚠️ The "small" table is not actually small (> 10 MB) | The default `spark.sql.autoBroadcastJoinThreshold` is 10 MB; above that Spark should not broadcast, but with bad configuration or stale statistics it may still try to broadcast a large table |
| ⚠️ Slow cluster network / long GC pauses | The broadcast takes longer than 300 seconds |
| ⚠️ Insufficient executor memory | Loading the broadcast variable fails and is retried until the timeout expires |
| ⚠️ Data skew | One oversized partition stalls the whole broadcast |

---

## ✅ Solutions

### ✅ Option 1: increase the broadcast timeout (for tables you trust to be small)

If the broadcast table genuinely is small (for example, a dimension table under 10 MB) and the timeout is only due to cluster load or a slow network, extend the timeout.

```scala
val spark = SparkSession.builder()
  .appName("FixBroadcastTimeout")
  .config("spark.sql.broadcastTimeout", "600") // seconds; raise from 300 to 600 (10 minutes)
  .getOrCreate()
```

Or set it in SQL:

```sql
SET spark.sql.broadcastTimeout = 600;
-- then run the query
CREATE TABLE new_table AS SELECT ...;
```

---

### ✅ Option 2: disable automatic broadcast joins (recommended when table sizes are uncertain)

If you cannot tell whether the table should be broadcast, or you suspect a misestimated large table is being broadcast, turn automatic broadcasting off first:

```scala
val spark = SparkSession.builder()
  .config("spark.sql.autoBroadcastJoinThreshold", "-1") // disable automatic broadcast joins
  .getOrCreate()
```

Or via SQL:

```sql
SET spark.sql.autoBroadcastJoinThreshold = -1;
```

This forces Spark to use a **shuffle hash join or sort-merge join** instead, avoiding the risk of broadcasting a large table.

> 💡 Note: performance may drop slightly, but the job becomes more stable.

---

### ✅ Option 3: control broadcasting manually (SQL hint)

If you only want to broadcast specific tables that you know are small, mark them explicitly with a `/*+ BROADCAST */` hint:

```sql
SELECT /*+ BROADCAST(dim) */
       fact.*, dim.name
FROM large_fact_table fact
JOIN small_dim_table dim ON fact.id = dim.id;
```

and at the same time:

```sql
SET spark.sql.autoBroadcastJoinThreshold = -1;  -- only explicitly hinted tables are broadcast
```

This is both safe and efficient.

---

### ✅ Option 4: check the data and the resource configuration

#### 1. Check the table's real size

Run:

```sql
ANALYZE TABLE your_small_table COMPUTE STATISTICS;
DESC FORMATTED your_small_table;
```

and verify that `numRows` and `totalSize` really are below 10 MB.

#### 2. Increase executor memory (to prevent OOM)

```scala
.config("spark.executor.memory", "8g")
.config("spark.driver.memory", "4g")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```

#### 3. Enable Adaptive Query Execution (AQE)

AQE can re-evaluate at runtime whether a join should be broadcast, which is more robust:

```scala
.config("spark.sql.adaptive.enabled", "true")
.config("spark.sql.adaptive.coalescePartitions.enabled", "true")
```

(`spark.sql.broadcastTimeout` also governs broadcast stages built under AQE.)

---

### ✅ Complete example configuration

```scala
val spark = SparkSession.builder()
  .appName("CTAS-Fix-Broadcast-Timeout")
  .config("spark.sql.broadcastTimeout", "600")               // broadcast timeout: 10 minutes
  .config("spark.sql.autoBroadcastJoinThreshold", "10MB")    // default threshold
  // .config("spark.sql.autoBroadcastJoinThreshold", "-1")   // uncomment to disable automatic broadcasts
  .config("spark.sql.adaptive.enabled", "true")              // enable AQE
  .config("spark.executor.memory", "8g")
  .config("spark.driver.memory", "4g")
  .enableHiveSupport()
  .getOrCreate()
```

---

### ✅ How to verify the fix

1. Open the Spark UI and check the **SQL / Stages** pages:
   - look for `BroadcastQueryStage` stages
   - check whether `Task Duration` still shows tasks stuck for a long time
2. Search the logs for the keywords:
   ```
   Could not execute broadcast
   BroadcastQueryStageExec
   ```
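A complementary check, added here as a sketch with placeholder table names: inspect the physical plan before rerunning the statement, to confirm whether Spark still plans a broadcast exchange after the settings change.

```sql
-- Hypothetical tables; substitute your real query.
SET spark.sql.autoBroadcastJoinThreshold = -1;

EXPLAIN FORMATTED
SELECT fact.id, dim.name
FROM large_fact_table fact
JOIN small_dim_table dim ON fact.id = dim.id;

-- With automatic broadcasting disabled, the plan should contain SortMergeJoin
-- (or ShuffledHashJoin) rather than BroadcastHashJoin / BroadcastExchange.
```

---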
## ❓ Summary: what should you do?

| Scenario | Recommendation |
|------|----------|
| A table you know is small takes part in the join | Set `spark.sql.broadcastTimeout=600` |
| Table sizes are uncertain or often misestimated | Set `spark.sql.autoBroadcastJoinThreshold=-1` |
| You want precise control over broadcasting | Use `/*+ BROADCAST(t) */` and disable automatic broadcasts |
| Production stability comes first | Disable automatic broadcasts and enable AQE |
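Since statements submitted through Kyuubi usually arrive as plain SQL over JDBC rather than through a hand-built SparkSession, the same recommendations can be applied per session. A minimal sketch, assuming hypothetical table names and the "control broadcasts explicitly" strategy from the table above:

```sql
-- Session-level settings for a Kyuubi/Spark SQL connection (table names are placeholders)
SET spark.sql.broadcastTimeout = 600;              -- allow slower broadcasts to finish
SET spark.sql.autoBroadcastJoinThreshold = -1;     -- no automatic broadcasts

-- Broadcast only the table known to be small, via an explicit hint
CREATE TABLE result_table AS
SELECT /*+ BROADCAST(dim) */
       fact.id, fact.amount, dim.name
FROM large_fact_table fact
JOIN small_dim_table dim ON fact.id = dim.id;
```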