I defined a new class (tools.UCleaner) and used it in Spark to do data cleaning; when the job ran, it threw a Task not serializable exception:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.map(RDD.scala:323)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1411)
at main.ScalaMain$.main(ScalaMain.scala:28)
at main.ScalaMain.main(ScalaMain.scala)
Caused by: java.io.NotSerializableException: tools.UCleaner
Serialization stack:
- object not serializable (class: tools.UCleaner, value: tools.UCleaner@ab6ab0)
- field (class: main.ScalaMain$$anonfun$1, name: cleaner$1, type: class tools.UCleaner)
- object (class main.ScalaMain$$anonfun$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 12 more
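The serialization stack pinpoints the cause: the lambda passed to map is compiled to main.ScalaMain$$anonfun$1, which holds the UCleaner instance in its cleaner$1 field, so Spark must Java-serialize that whole object graph before shipping the task to the executors. A minimal sketch of the failing pattern (simplified to a plain RDD for brevity; the clean method is a hypothetical stand-in for the real cleaning logic):

// tools/UCleaner.scala
package tools

class UCleaner {                          // does NOT extend Serializable
  def clean(s: String): String = s.trim  // hypothetical cleaning logic
}

// main/ScalaMain.scala
package main

import org.apache.spark.{SparkConf, SparkContext}
import tools.UCleaner

object ScalaMain {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("uclean").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val cleaner = new UCleaner()          // constructed on the driver
    val data = sc.parallelize(Seq("  a  ", "  b  "))
    // The closure below captures `cleaner`; when Spark's ClosureCleaner
    // tries to serialize it for the executors, it throws
    // java.io.NotSerializableException: tools.UCleaner.
    data.map(line => cleaner.clean(line)).collect()
    sc.stop()
  }
}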
Solution:
Make the tools.UCleaner class serializable by having it extend Serializable, as sketched below:
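A minimal sketch of the change, reusing the hypothetical clean method from the repro above (Scala's Serializable trait extends java.io.Serializable, so Spark's default Java serializer can now handle instances of the class):

// tools/UCleaner.scala
package tools

// Mixing in Serializable lets Spark serialize the captured UCleaner
// instance along with the map closure and ship it to the executors.
class UCleaner extends Serializable {
  def clean(s: String): String = s.trim  // hypothetical cleaning logic
}

An alternative worth considering when the class holds state that cannot or should not be serialized (connections, caches) is to construct the instance inside the closure instead, e.g. once per partition with mapPartitions, so no instance has to cross the driver/executor boundary.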