Solution
Set the following property when creating the SparkContext:
set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
For reference, this is the error before the fix:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 54.0 (TID 60) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = spark-kafka-demo, partition = 0, offset = 80, CreateTime = 1587146352592, checksum = 2237683, serialized key size = 4, serialized value size = 22, key = nick, value = nick shanghai#200000 0))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
This post describes the non-serializable-result error that appears when processing Kafka data with Spark: ConsumerRecord objects cannot be handled by Spark's default Java serializer. Setting the spark.serializer property on the SparkContext to KryoSerializer resolves the problem.
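An alternative not covered in the post, noted here as a sketch: instead of changing the serializer, you can avoid shipping ConsumerRecord to the driver at all by projecting each record down to plain fields before any action such as print() or collect(). The helper below is hypothetical and only illustrates the idea.

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: map each ConsumerRecord to a plain (key, value, offset) tuple
// so the default JavaSerializer never has to serialize ConsumerRecord itself.
def toSerializableFields(stream: DStream[ConsumerRecord[String, String]]): DStream[(String, String, Long)] =
  stream.map(record => (record.key(), record.value(), record.offset()))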
