Spark Exception: java.lang.NoClassDefFoundError: org/apache/htrace/Trace

This post describes an exception hit by a Spark job while reading an HBase table and how it was resolved. The root cause was the missing htrace-core library; deploying the missing jar and configuring it correctly on every node fixed the problem.

Project runtime environment
CDH 5.4.4

flowbaselinetable.sh

#!/bin/bash

sudo -u hdfs spark-submit --class com.xx.FlowBaseLine \
--master yarn-client \
--jars /home/wanghongbin/test/driver_jar/mysql-connector-java-5.1.33.jar \
--executor-cores 5 \
--num-executors 5 \
--driver-memory 5G \
--executor-memory 5G \
/home/wanghongbin/test/TrainModel-1.0.0-SNAPSHOT.jar \
/user/flow/computebaselinetable

Running the flowbaselinetable.sh script above with nohup ./flowbaselinetable.sh & threw the following exception:
16/01/07 13:53:14 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:183)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:230)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1517)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
    at com.zhongxin.FlowBaseLine$.read4hbase(FlowBaseLine.scala:60)
    at com.zhongxin.FlowBaseLine$.main(FlowBaseLine.scala:46)
    at com.zhongxin.FlowBaseLine.main(FlowBaseLine.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 24 more
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:218)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:850)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:635)
    ... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 35 more

Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:241)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1517)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
    at com.zhongxin.FlowBaseLine$.read4hbase(FlowBaseLine.scala:60)
    at com.zhongxin.FlowBaseLine$.main(FlowBaseLine.scala:46)
    at com.zhongxin.FlowBaseLine.main(FlowBaseLine.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:389)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:236)
    ... 20 more
16/01/07 13:53:15 INFO ClientCnxn: Opening socket connection to server hadoop-slave2/10.23.147.64:2181. Will not attempt to authenticate using SASL (unknown error)
16/01/07 13:53:15 INFO ClientCnxn: Socket connection established, initiating session, client: /10.23.147.62:45117, server: hadoop-slave2/10.23.147.64:2181
16/01/07 13:53:15 INFO ClientCnxn: Session establishment complete on server hadoop-slave2/10.23.147.64:2181, sessionid = 0x25216ca47df061a, negotiated timeout = 60000

A Google search turned up a fix in the blog post http://www.qt4.net/spark-load-hbase/: a jar was missing. I downloaded it and placed it under /opt/cloudera/parcels/CDH/jars/ (a sketch of copying the jar is shown after the script below),
then modified flowbaselinetable.sh as follows:

#!/bin/bash

sudo -u hdfs spark-submit --class com.xx.FlowBaseLine \
--master yarn-client \
--jars /home/wanghongbin/test/driver_jar/mysql-connector-java-5.1.33.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar \
--executor-cores 5 \
--num-executors 5 \
--driver-memory 5G \
--executor-memory 5G \
/home/wanghongbin/test/TrainModel-1.0.0-SNAPSHOT.jar \
/user/flow/computebaselinetable
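
Before re-running the script, the jar has to exist at the path referenced by --jars. The lines below are a minimal sketch of locating and copying it; the source path inside the CDH parcel is an assumption and can differ between releases, which is why it is located with find first.

# Sketch: find the htrace jar that ships with the CDH parcel (assumed location).
find /opt/cloudera/parcels/CDH -name 'htrace-core-*.jar'
# Copy it into the shared jars directory used by --jars above
# (use the path reported by the find command).
sudo cp /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar \
        /opt/cloudera/parcels/CDH/jars/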
However, the job still threw the same exception. After several frustrating days I discovered that there is a classpath.txt file under /etc/spark/, and the entry /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar has to be appended to that file.
Then scp htrace-core-3.1.0-incubating.jar to every node, and of course scp classpath.txt to every node as well (see the sketch below).
Restart Spark in Cloudera Manager and run the script again. OK!!!
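
These steps can be scripted. The sketch below assumes the classpath.txt location found here (/etc/spark/classpath.txt; some CDH layouts keep it under /etc/spark/conf/ instead), and the host names and the root user for scp are placeholders for your own node list.

#!/bin/bash
# Append the htrace jar to Spark's classpath file on this node.
JAR=/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar
echo "$JAR" | sudo tee -a /etc/spark/classpath.txt

# Push the jar and the updated classpath.txt to every other node
# (replace the host list and remote user with your own).
for host in hadoop-slave1 hadoop-slave2 hadoop-slave3; do
  scp "$JAR" root@"$host":/opt/cloudera/parcels/CDH/jars/
  scp /etc/spark/classpath.txt root@"$host":/etc/spark/classpath.txt
done
# Finally, restart the Spark service in Cloudera Manager and re-run flowbaselinetable.sh.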
