Problem description:
A Spark program that reads data from HBase fails after being submitted in standalone mode, with the following exception stack:
2018-02-24 10:05:32,012 INFO [dag-scheduler-event-loop] scheduler.DAGScheduler: ResultStage 0 (count at HbaseApiDemo.scala:22) failed in 1.099 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.109, executor 0): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2773)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1599)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Solution
After much trial and error and some searching online, the cause turned out to be that the executors were missing jars the application needs at runtime. (This assumes, of course, that the jars the driver needs at startup are already configured or given on the command line.) The stack trace above shows why the symptom is so cryptic: the executor fails inside JavaDeserializationStream while deserializing the task, because classes the task references are not on the executor's classpath, and that surfaces as "unread block data" rather than a ClassNotFoundException.
My application simply reads data from HBase, so at runtime it depends on the following HBase-related jars (a minimal sketch of the read itself follows the list):
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar
/home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar
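For context, here is a minimal sketch of the kind of read this application performs (reconstructed, not the original HbaseApiDemo source; the table name "demo" is hypothetical). The count() action is what ships tasks to the executors, and deserializing those tasks is where the HBase classes must already be on the classpath:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HbaseApiDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HbaseApiDemo"))
    val hbaseConf = HBaseConfiguration.create()           // reads hbase-site.xml from the classpath
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "demo")   // hypothetical table name
    val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())   // the action that fails on the executors when the jars are missing
    sc.stop()
  }
}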
So all we need to do is add these dependency jars to the classpath the executor runs with. There are two ways to do it.
Method 1:
Specify the jars with --jars on the command line when submitting the application:
spark-2.2.1/bin/spark-submit --master "spark://master:7077" \
  --jars /home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar,\
/home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar \
  --driver-class-path "/home/zzhan/workspace/spark-2.2.1/jars/hbase/*:/home/zzhan/workspace/hbase-1.2.6/conf" \
  --class HbaseApiDemo jobs/sparkdemo_2.11-0.3.jar
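If maintaining that comma-separated list by hand is error-prone, it can be assembled with a small shell loop (a sketch, assuming the same lib directory as above). Note that --jars actually distributes the listed files to the executors, so nothing needs to be pre-installed on the worker nodes:

HBASE_LIB=/home/zzhan/workspace/hbase-1.2.6/lib
JARS=""
for j in hbase-client-1.2.6 hbase-common-1.2.6 hbase-server-1.2.6 \
         zookeeper-3.4.6 hbase-protocol-1.2.6 \
         htrace-core-3.1.0-incubating metrics-core-2.2.0; do
  JARS="$JARS,$HBASE_LIB/$j.jar"
done
JARS=${JARS#,}   # drop the leading comma
spark-2.2.1/bin/spark-submit --master "spark://master:7077" --jars "$JARS" \
  --driver-class-path "/home/zzhan/workspace/spark-2.2.1/jars/hbase/*:/home/zzhan/workspace/hbase-1.2.6/conf" \
  --class HbaseApiDemo jobs/sparkdemo_2.11-0.3.jar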
Method 2:
Configure spark-defaults.conf, specifying the jars the executor or driver depends on. Unlike --jars, extraClassPath does not copy anything anywhere: the listed paths must already exist on every machine where an executor (or the driver) runs.
spark.driver.extraClassPath /home/zzhan/workspace/spark-2.2.1/jars/hbase/*
# spark.executor.extraClassPath /home/zzhan/workspace/spark-2.2.1/jars/hbase/*
# separate entries with colons (a trailing backslash continues the value onto the next line)
spark.executor.extraClassPath /home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar:\
  /home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar
Tip: you can also copy all the required jars into one designated directory and reference it with a single wildcard, as the spark.driver.extraClassPath line above does; see the sketch below.
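A sketch of that layout (the jars/hbase directory matches the one already used for the driver above; any path works as long as it exists on every node):

# run on every worker node (and the driver machine); uses bash brace expansion
mkdir -p /home/zzhan/workspace/spark-2.2.1/jars/hbase
cp /home/zzhan/workspace/hbase-1.2.6/lib/{hbase-client-1.2.6,hbase-common-1.2.6,hbase-server-1.2.6,zookeeper-3.4.6,hbase-protocol-1.2.6,htrace-core-3.1.0-incubating,metrics-core-2.2.0}.jar \
   /home/zzhan/workspace/spark-2.2.1/jars/hbase/
# now a single wildcard entry in spark-defaults.conf covers all of them:
# spark.executor.extraClassPath /home/zzhan/workspace/spark-2.2.1/jars/hbase/*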
References:
http://blog.youkuaiyun.com/u010842515/article/details/51451883
https://stackoverflow.com/questions/34901331/spark-hbase-error-java-lang-illegalstateexception-unread-block-data
