While reading an HBase table from Spark Scala code, the following error occurred:
15/10/28 16:39:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 2536, slave14.dc.tj): java.lang.RuntimeException: java.io.NotSerializableException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
Serialization stack:
- object not serializable (class: org.apache.hadoop.hbase.io.ImmutableBytesWritable, value: 30 30 5f 39 39 39 38 38 33)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (30 30 5f 39 39 39 38 38 33,keyval......
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153)
at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1190)
......
The solution is to configure Spark to use the KryoSerializer:
sparkconf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
This post described a NotSerializableException encountered while reading an HBase table from Spark Scala code, and resolved it by configuring the Kryo serializer.