Environment: Hadoop 2.7.7, Spark 2.2.0, HBase 2.1.1
Tested by following this article:
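The write path is essentially the stock PySpark HBase output example that ships with Spark. A minimal sketch of that kind of job follows (the ZooKeeper quorum, table name, and column family are placeholders, not the values from the original test; the spark-examples and HBase client jars must be on the classpath via --jars):

from pyspark import SparkContext

sc = SparkContext(appName="pyspark_hbase_write")

host = "127.0.0.1"        # placeholder ZooKeeper quorum
table = "test_table"      # placeholder table; must already exist with column family 'cf'

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapred.outputtable": table,
    "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable",
}
keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

# Each record is (rowkey, [rowkey, column_family, qualifier, value]);
# StringListToPutConverter turns the 4-element list into an HBase Put.
rows = [("row1", ["row1", "cf", "col1", "value1"]),
        ("row2", ["row2", "cf", "col1", "value2"])]

sc.parallelize(rows).saveAsNewAPIHadoopDataset(
    conf=conf, keyConverter=keyConv, valueConverter=valueConv)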
Calling saveAsNewAPIHadoopDataset from PySpark to write to HBase fails. Error message:
18/11/12 00:05:42 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at SparkHadoopMapReduceWriter.scala:88) failed in 2.072 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 6, 192.168.1.106, executor 1): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:178)
.
.
.
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.add([B[B[B)Lorg/apache/hadoop/hbase/client/Put;
at org.apache.spark.examples.pythonconverters.StringListToPutConverter.convert(HBaseConverters.scala:81)
at org.apache.spark.examples.pythonconverters.StringListToPutConverter.convert(HBaseConverters.scala:77)
at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:181)
at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:181)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:147)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:144)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:159)
... 8 more
This is not a jar conflict. HBase 2.x removed the Put.add(byte[], byte[], byte[]) overload (addColumn is its replacement), so the StringListToPutConverter shipped in the spark-examples jar, which still calls the old method, fails with NoSuchMethodError.
Downgrading HBase from 2.1.1 to 1.4.8 still fails, now with:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family table does not exist in region hbase:meta,,1.1588230740 in table 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', BLOOMFILTER => 'NONE', VERSIONS => '10', IN_MEMORY => 'true', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', CACHE_DATA_IN_L1 => 'true', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '8192', REPLICATION_SCOPE => '0'}
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:7889)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6893)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2079)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33766)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2205)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Google turned up no answer to this error either.
After a further downgrade from HBase 1.4.8 to HBase 1.2.8, the data is written successfully.
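To double-check that the rows actually landed, the companion read path from the stock PySpark HBase input example can be used. A sketch, reusing the same placeholder host and table as the write sketch above:

from pyspark import SparkContext

sc = SparkContext(appName="pyspark_hbase_read")

host = "127.0.0.1"        # placeholder ZooKeeper quorum
table = "test_table"      # placeholder table written above

conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

# Scan the table back as an RDD of (rowkey, cell-as-string) pairs.
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)

for k, v in hbase_rdd.collect():
    print(k, v)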