Step 9: Use reduceByKey to compute the average age per city
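The code for this step is not reproduced here, only its console output and the traceback below. For reference, this is a minimal, hypothetical sketch of a reduceByKey-based per-city average. The names `calculate_avg_age`, `city_avg_age`, and `city_avg_results` are taken from the traceback further down; everything else (the `user_rdd` variable and the "name,age,city" line format of user_data.txt) is an assumption for illustration, not the original readspark.py code.

```python
# Hypothetical sketch, assuming user_data.txt lines look like "name,age,city".
city_age_pairs = user_rdd.map(
    lambda line: (line.split(",")[2], (int(line.split(",")[1]), 1))  # (city, (age, 1))
)

# Sum ages and record counts per city in a single shuffle.
city_totals = city_age_pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))

def calculate_avg_age(city_and_totals):
    city, (age_sum, count) = city_and_totals
    return (city, round(age_sum / count, 1))  # plain built-in round()

city_avg_age = city_totals.map(calculate_avg_age)
city_avg_results = city_avg_age.collect()
for city, avg in city_avg_results:
    print(city, avg)
```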
Average age per city:
2025-12-15 00:19:40,673 INFO spark.SparkContext: Starting job: collect at /root/PycharmProjects/pythonProject1/readspark.py:204
2025-12-15 00:19:40,676 INFO scheduler.DAGScheduler: Registering RDD 22 (reduceByKey at /root/PycharmProjects/pythonProject1/readspark.py:188) as input to shuffle 1
2025-12-15 00:19:40,676 INFO scheduler.DAGScheduler: Got job 15 (collect at /root/PycharmProjects/pythonProject1/readspark.py:204) with 1 output partitions
2025-12-15 00:19:40,676 INFO scheduler.DAGScheduler: Final stage: ResultStage 17 (collect at /root/PycharmProjects/pythonProject1/readspark.py:204)
2025-12-15 00:19:40,676 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 16)
2025-12-15 00:19:40,677 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 16)
2025-12-15 00:19:40,678 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 16 (PairwiseRDD[22] at reduceByKey at /root/PycharmProjects/pythonProject1/readspark.py:188), which has no missing parents
2025-12-15 00:19:40,717 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 13.9 KiB, free 413.3 MiB)
2025-12-15 00:19:40,719 INFO memory.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 8.1 KiB, free 413.3 MiB)
2025-12-15 00:19:40,722 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on 127.0.0.1:44313 (size: 8.1 KiB, free: 413.8 MiB)
2025-12-15 00:19:40,725 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1535
2025-12-15 00:19:40,726 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 16 (PairwiseRDD[22] at reduceByKey at /root/PycharmProjects/pythonProject1/readspark.py:188) (first 15 tasks are for partitions Vector(0))
2025-12-15 00:19:40,726 INFO scheduler.TaskSchedulerImpl: Adding task set 16.0 with 1 tasks resource profile 0
2025-12-15 00:19:40,730 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 16.0 (TID 16) (172.20.10.3, executor driver, partition 0, PROCESS_LOCAL, 7423 bytes)
2025-12-15 00:19:40,731 INFO executor.Executor: Running task 0.0 in stage 16.0 (TID 16)
2025-12-15 00:19:40,737 INFO rdd.HadoopRDD: Input split: file:/root/PycharmProjects/pythonProject1/user_data.txt:0+365
2025-12-15 00:19:40,785 INFO storage.BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:44313 in memory (size: 6.0 KiB, free: 413.8 MiB)
2025-12-15 00:19:40,860 INFO python.PythonRunner: Times: total = 61, boot = -418, init = 479, finish = 0
2025-12-15 00:19:40,922 INFO executor.Executor: Finished task 0.0 in stage 16.0 (TID 16). 1620 bytes result sent to driver
2025-12-15 00:19:40,925 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 16.0 (TID 16) in 196 ms on 172.20.10.3 (executor driver) (1/1)
2025-12-15 00:19:40,925 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 16.0, whose tasks have all completed, from pool
2025-12-15 00:19:40,927 INFO scheduler.DAGScheduler: ShuffleMapStage 16 (reduceByKey at /root/PycharmProjects/pythonProject1/readspark.py:188) finished in 0.248 s
2025-12-15 00:19:40,927 INFO scheduler.DAGScheduler: looking for newly runnable stages
2025-12-15 00:19:40,928 INFO scheduler.DAGScheduler: running: Set()
2025-12-15 00:19:40,928 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 17)
2025-12-15 00:19:40,928 INFO scheduler.DAGScheduler: failed: Set()
2025-12-15 00:19:40,929 INFO scheduler.DAGScheduler: Submitting ResultStage 17 (PythonRDD[25] at collect at /root/PycharmProjects/pythonProject1/readspark.py:204), which has no missing parents
2025-12-15 00:19:40,942 INFO memory.MemoryStore: Block broadcast_18 stored as values in memory (estimated size 10.6 KiB, free 413.3 MiB)
2025-12-15 00:19:40,944 INFO memory.MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 6.3 KiB, free 413.3 MiB)
2025-12-15 00:19:40,947 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on 127.0.0.1:44313 (size: 6.3 KiB, free: 413.8 MiB)
2025-12-15 00:19:40,962 INFO spark.SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1535
2025-12-15 00:19:40,963 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 17 (PythonRDD[25] at collect at /root/PycharmProjects/pythonProject1/readspark.py:204) (first 15 tasks are for partitions Vector(0))
2025-12-15 00:19:40,963 INFO scheduler.TaskSchedulerImpl: Adding task set 17.0 with 1 tasks resource profile 0
2025-12-15 00:19:40,966 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 17.0 (TID 17) (172.20.10.3, executor driver, partition 0, ANY, 7181 bytes)
2025-12-15 00:19:40,967 INFO executor.Executor: Running task 0.0 in stage 17.0 (TID 17)
2025-12-15 00:19:41,013 INFO storage.ShuffleBlockFetcherIterator: Getting 1 (142.0 B) non-empty blocks including 1 (142.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
2025-12-15 00:19:41,014 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
2025-12-15 00:19:41,098 ERROR executor.Executor: Exception in task 0.0 in stage 17.0 (TID 17)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
process()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
serializer.dump_stream(out_iter, outfile)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/root/PycharmProjects/pythonProject1/readspark.py", line 197, in calculate_avg_age
rounded_avg = __builtins__.round(avg_age_value, 1)
AttributeError: 'dict' object has no attribute 'round'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2025-12-15 00:19:41,241 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 17.0 (TID 17) (172.20.10.3 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
process()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
serializer.dump_stream(out_iter, outfile)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/root/PycharmProjects/pythonProject1/readspark.py", line 197, in calculate_avg_age
rounded_avg = __builtins__.round(avg_age_value, 1)
AttributeError: 'dict' object has no attribute 'round'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2025-12-15 00:19:41,243 ERROR scheduler.TaskSetManager: Task 0 in stage 17.0 failed 1 times; aborting job
2025-12-15 00:19:41,244 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks have all completed, from pool
2025-12-15 00:19:41,251 INFO scheduler.TaskSchedulerImpl: Cancelling stage 17
2025-12-15 00:19:41,276 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 17: Stage cancelled
2025-12-15 00:19:41,278 INFO scheduler.DAGScheduler: ResultStage 17 (collect at /root/PycharmProjects/pythonProject1/readspark.py:204) failed in 0.347 s due to Job aborted due to stage failure: Task 0 in stage 17.0 failed 1 times, most recent failure: Lost task 0.0 in stage 17.0 (TID 17) (172.20.10.3 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
process()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
serializer.dump_stream(out_iter, outfile)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/root/PycharmProjects/pythonProject1/readspark.py", line 197, in calculate_avg_age
rounded_avg = __builtins__.round(avg_age_value, 1)
AttributeError: 'dict' object has no attribute 'round'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
2025-12-15 00:19:41,283 INFO scheduler.DAGScheduler: Job 15 failed: collect at /root/PycharmProjects/pythonProject1/readspark.py:204, took 0.610148 s
2025-12-15 00:19:41,395 INFO storage.BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:44313 in memory (size: 8.1 KiB, free: 413.8 MiB)
2025-12-15 00:19:41,553 INFO storage.BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:44313 in memory (size: 6.2 KiB, free: 413.9 MiB)
2025-12-15 00:19:41,704 INFO storage.BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:44313 in memory (size: 6.3 KiB, free: 413.9 MiB)
Traceback (most recent call last):
File "/root/PycharmProjects/pythonProject1/readspark.py", line 204, in <module>
city_avg_results = city_avg_age.collect()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/pyspark/rdd.py", line 1814, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 17.0 failed 1 times, most recent failure: Lost task 0.0 in stage 17.0 (TID 17) (172.20.10.3 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
process()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
serializer.dump_stream(out_iter, outfile)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/root/PycharmProjects/pythonProject1/readspark.py", line 197, in calculate_avg_age
rounded_avg = __builtins__.round(avg_age_value, 1)
AttributeError: 'dict' object has no attribute 'round'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
process()
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
serializer.dump_stream(out_iter, outfile)
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/user/Downloads/spark-3.4.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/root/PycharmProjects/pythonProject1/readspark.py", line 197, in calculate_avg_age
rounded_avg = __builtins__.round(avg_age_value, 1)
AttributeError: 'dict' object has no attribute 'round'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Process finished with exit code 1
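The job aborts inside `calculate_avg_age` at readspark.py:197 with `AttributeError: 'dict' object has no attribute 'round'`. The cause is the call `__builtins__.round(avg_age_value, 1)`: `__builtins__` is only guaranteed to be the `builtins` module inside the `__main__` module; in other execution contexts it is the plain dict of that module's namespace, and the function here runs inside a PySpark worker process where it resolves to a dict, so the attribute lookup fails. The fix is to call `round()` directly, or to import the `builtins` module explicitly. A minimal sketch of the corrected line follows; the rest of the function body is not shown in the log, so only this line is reconstructed.

```python
import builtins

# Corrected version of readspark.py:197 (the surrounding function is not shown in the log).
rounded_avg = round(avg_age_value, 1)            # simplest fix: use the built-in directly
# or, if shadowing of the name `round` is a concern:
rounded_avg = builtins.round(avg_age_value, 1)   # builtins is always a module, unlike __builtins__
```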