作者:Syn良子 出处:http://www.cnblogs.com/cssdongl/p/7347167.html 转载请注明出处
记录自己最近抽空折腾虚拟机环境时用spark2.0的pyspark访问Hbase1.2时遇到的问题及解决过程.
连接准备
快速用pyspark访问Hbase中的表进行测试,代码如下(注意,其中的host和inputtable是已经定义好的主机和表名变量)
spark = SparkSession.builder.master("yarn-client").appName("statistics").getOrCreate()
hbaseconf = {"hbase.zookeeper.quorum":host,"hbase.mapreduce.inputtable":inputtable}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = spark.sparkContext.newAPIHadoopRDD(\
"org.apache.hadoop.hbase.mapreduce.TableInputFormat",\
"org.apache.hadoop.hbase.io.ImmutableBytesWritable",\
"org.apache.hadoop.hbase