Scala learning: reading data from HBase tables in Scala and running a join query


1. Business requirement: Spark SQL on HBase. Spark SQL reads two tables directly from HBase and joins them in one query.
2. Diagram
(diagram not reproduced here)
The green line
The green path in the diagram has already been tested: create the tables directly in Hive and load the data in; the data files are stored on HDFS.
(1) Create the tables

create table mycase(
c_code string,
c_rcode string,
c_region string,
c_cate string,
c_start string,
c_end string,
c_start_m bigint,
c_end_m bigint,
c_name string,
c_mark string) 
row format delimited fields terminated by ',' stored as textfile; 

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/mycase.txt' overwrite into table mycase; 

create table p_case(
p_code string,  
p_status string,
p_isend int
)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/p_case.txt' overwrite into table p_case; 

create table crime_man(
m_acode string,  
m_pcode string)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/crime_man.txt' overwrite into table crime_man; 

create table wb(
w_id bigint,
w_region string,
w_wname string,
w_address string,
w_uname string,
w_code string,
w_start string,
w_end string,
w_start_m bigint,
w_end_m bigint
) 
row format delimited fields terminated by ',' stored as textfile; 

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/wbfile.txt' overwrite into table wb;  

create table hotel(
h_id bigint,
h_region string,
h_hname string,
h_address string,
h_uname string,
h_code string,
h_start string,
h_end string,
h_start_m bigint,
h_end_m bigint,
h_homecode string) 
row format delimited fields terminated by ',' stored as textfile; 

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/hotelfile.txt' overwrite into table hotel;  

(2) Load the sample data
mycase.txt

A0,7,杭州市萧山区,杀人案件,2006/06/23 00:00:00,2006/06/23 21:00:00,1150992000000,1151067600000,案件名称0,暂无
A1,0,杭州市其他区,刑事案件,2006/06/25 00:00:00,2006/06/25 09:00:00,1151164800000,1151197200000,案件名称1,暂无
A2,1,杭州市上城区,强奸案件,2006/06/28 00:00:00,2006/06/28 10:00:00,1151424000000,1151460000000,案件名称2,暂无
A3,7,杭州市萧山区,杀人案件,2006/07/02 00:00:00,2006/07/02 01:00:00,1151769600000,1151773200000,案件名称3,暂无
A4,0,杭州市其他区,盗窃案件,2006/07/05 00:00:00,2006/07/05 16:00:00,1152028800000,1152086400000,案件名称4,暂无
A5,5,杭州市西湖区,强奸案件,2006/07/06 00:00:00,2006/07/06 21:00:00,1152115200000,1152190800000,案件名称5,暂无
A6,3,杭州市拱墅区,杀人案件,2006/07/06 00:00:00,2006/07/06 16:00:00,1152115200000,1152172800000,案件名称6,暂无
A7,3,杭州市拱墅区,杀人案件,2006/07/08 00:00:00,2006/07/08 10:00:00,1152288000000,1152324000000,案件名称7,暂无
A8,3,杭州市拱墅区,盗窃案件,2006/07/10 00:00:00,2006/07/10 02:00:00,1152460800000,1152468000000,案件名称8,暂无
A9,4,杭州市江干区,盗窃案件,2006/07/14 00:00:00,2006/07/14 13:00:00,1152806400000,1152853200000,案件名称9,暂无
A10,4,杭州市江干区,强奸案件,2006/07/17 00:00:00,2006/07/17 00:00:00,1153065600000,1153065600000,案件名称10,暂无
A11,1,杭州市上城区,杀人案件,2006/07/21 00:00:00,2006/07/21 21:00:00,1153411200000,1153486800000,案件名称11,暂无
A12,3,杭州市拱墅区,强奸案件,2006/07/21 00:00:00,2006/07/21 16:00:00,1153411200000,1153468800000,案件名称12,暂无
A13,7,杭州市萧山区,杀人案件,2006/07/21 00:00:00,2006/07/21 21:00:00,1153411200000,1153486800000,案件名称13,暂无
A14,4,杭州市江干区,盗窃案件,2006/07/23 00:00:00,2006/07/23 08:00:00,1153584000000,1153612800000,案件名称14,暂无
A15,2,杭州市下城区,盗窃案件,2006/07/26 00:00:00,2006/07/26 01:00:00,1153843200000,1153846800000,案件名称15,暂无
A16,3,杭州市拱墅区,刑事案件,2006/07/28 00:00:00,2006/07/28 10:00:00,1154016000000,1154052000000,案件名称16,暂无
A17,0,杭州市其他区,杀人案件,2006/07/28 00:00:00,2006/07/28 06:00:00,1154016000000,1154037600000,案件名称17,暂无
A18,0,杭州市其他区,刑事案件,2006/08/01 00:00:00,2006/08/01 15:00:00,1154361600000,1154415600000,案件名称18,暂无
A19,4,杭州市江干区,盗窃案件,2006/08/01 00:00:00,2006/08/01 20:00:00,1154361600000,1154433600000,案件名称19,暂无
A20,8,杭州市余杭区,杀人案件,2006/08/04 00:00:00,2006/08/04 06:00:00,1154620800000,1154642400000,案件名称20,暂无

p_case.txt

A0,移送起诉
A1,破案状态
A2,移送起诉
A3,破案状态
A4,移送起诉
A5,破案状态
A6,移送起诉
A7,移送起诉
A8,破案状态
A9,侦查终结
A10,侦查终结
A11,破案状态
A12,侦查终结
A13,破案状态
A14,移送起诉
A15,破案状态
A16,破案状态
A17,侦查终结
A18,移送起诉
A19,破案状态
A20,侦查终结

crime_man.txt

A0,U0
A0,U1
A1,U0
A1,U1
A1,U2
A1,U3
A1,U4
A1,U5
A1,U6
A1,U7
A1,U8
A2,U0
A2,U1
A2,U2
A2,U3
A2,U4
A2,U5
A2,U6
A3,U0
A3,U1
A4,U0
A4,U1
A4,U2
A4,U3
A5,U0
A6,U0
A6,U1
A6,U2
A6,U3
A6,U4
A6,U5
A6,U6
A7,U0
A8,U0
A8,U1
A8,U2
A8,U3
A8,U4
A8,U5
A9,U0
A9,U1
A10,U0
A10,U1
A10,U2
A10,U3
A10,U4
A10,U5
A10,U6
A11,U0
A11,U1
A11,U2
A11,U3
A12,U0
A13,U0
A13,U1
A13,U2
A13,U3
A13,U4
A13,U5
A13,U6
A13,U7
A13,U8
A14,U0
A14,U1
A14,U2
A14,U3
A14,U4
A14,U5
A14,U6
A14,U7
A15,U0
A15,U1
A15,U2
A15,U3
A16,U0
A16,U1
A17,U0
A17,U1
A17,U2
A17,U3
A17,U4
A18,U0
A18,U1
A19,U0
A19,U1
A19,U2
A19,U3
A19,U4
A19,U5
A19,U6
A20,U0
A20,U1
A20,U2
A20,U3
A20,U4
A20,U5
A20,U6
A20,U7
A20,U8

wbfile.txt

0,1,网吧583,杭州市上城区xx670路280号,姓名58,U86,2006/06/23 00:00:00,2006/06/23 19:00:00,1150992000000,1151060400000
1,0,网吧757,杭州市其他区xx570路266号,姓名55,U636,2006/06/23 00:00:00,2006/06/23 19:00:00,1150992000000,1151060400000
2,0,网吧283,杭州市其他区xx332路89号,姓名30,U793,2006/06/24 00:00:00,2006/06/24 19:00:00,1151078400000,1151146800000
3,3,网吧129,杭州市拱墅区xx662路713号,姓名33,U570,2006/06/27 00:00:00,2006/06/27 04:00:00,1151337600000,1151352000000
4,8,网吧434,杭州市余杭区xx975路721号,姓名59,U766,2006/06/29 00:00:00,2006/06/29 18:00:00,1151510400000,1151575200000
5,4,网吧80,杭州市江干区xx959路481号,姓名80,U318,2006/07/01 00:00:00,2006/07/01 18:00:00,1151683200000,1151748000000
6,6,网吧611,杭州市滨江区xx853路84号,姓名18,U220,2006/07/03 00:00:00,2006/07/03 19:00:00,1151856000000,1151924400000
7,1,网吧913,杭州市上城区xx560路157号,姓名56,U5,2006/07/03 00:00:00,2006/07/03 06:00:00,1151856000000,1151877600000
8,7,网吧684,杭州市萧山区xx754路827号,姓名34,U233,2006/07/07 00:00:00,2006/07/07 16:00:00,1152201600000,1152259200000
9,4,网吧545,杭州市江干区xx765路502号,姓名66,U167,2006/07/09 00:00:00,2006/07/09 21:00:00,1152374400000,1152450000000
10,2,网吧661,杭州市下城区xx690路657号,姓名96,U380,2006/07/09 00:00:00,2006/07/09 04:00:00,1152374400000,1152388800000
11,8,网吧928,杭州市余杭区xx61路688号,姓名90,U386,2006/07/12 00:00:00,2006/07/12 23:00:00,1152633600000,1152716400000
12,0,网吧979,杭州市其他区xx618路41号,姓名40,U378,2006/07/13 00:00:00,2006/07/13 09:00:00,1152720000000,1152752400000
13,1,网吧139,杭州市上城区xx666路869号,姓名97,U685,2006/07/13 00:00:00,2006/07/13 07:00:00,1152720000000,1152745200000
14,7,网吧109,杭州市萧山区xx558路485号,姓名32,U884,2006/07/15 00:00:00,2006/07/15 02:00:00,1152892800000,1152900000000
15,3,网吧866,杭州市拱墅区xx738路6号,姓名51,U629,2006/07/18 00:00:00,2006/07/18 09:00:00,1153152000000,1153184400000
16,0,网吧330,杭州市其他区xx251路887号,姓名79,U239,2006/07/22 00:00:00,2006/07/22 17:00:00,1153497600000,1153558800000
17,7,网吧138,杭州市萧山区xx385路448号,姓名57,U690,2006/07/22 00:00:00,2006/07/22 14:00:00,1153497600000,1153548000000
18,0,网吧816,杭州市其他区xx61路99号,姓名62,U137,2006/07/26 00:00:00,2006/07/26 01:00:00,1153843200000,1153846800000
19,5,网吧147,杭州市西湖区xx612路924号,姓名40,U569,2006/07/28 00:00:00,2006/07/28 17:00:00,1154016000000,1154077200000
20,0,网吧509,杭州市其他区xx569路234号,姓名54,U361,2006/07/30 00:00:00,2006/07/30 12:00:00,1154188800000,1154232000000

hotelfile.txt

1,5,宾馆598,杭州市西湖区xx268路894号,姓名38,U225,2006/06/24 00:00:00,2006/06/24 00:19:00,1151078400000,1151079540000,13
2,3,宾馆758,杭州市拱墅区xx480路729号,姓名92,U651,2006/06/25 00:00:00,2006/06/25 00:01:00,1151164800000,1151164860000,227
3,7,宾馆499,杭州市萧山区xx173路827号,姓名18,U329,2006/06/26 00:00:00,2006/06/26 00:04:00,1151251200000,1151251440000,794
4,7,宾馆478,杭州市萧山区xx620路622号,姓名57,U314,2006/06/27 00:00:00,2006/06/27 00:11:00,1151337600000,1151338260000,65
5,3,宾馆692,杭州市拱墅区xx165路624号,姓名15,U399,2006/06/28 00:00:00,2006/06/28 00:07:00,1151424000000,1151424420000,895
6,2,宾馆31,杭州市下城区xx635路833号,姓名60,U606,2006/06/29 00:00:00,2006/06/29 00:07:00,1151510400000,1151510820000,174
7,4,宾馆198,杭州市江干区xx622路536号,姓名71,U158,2006/06/29 00:00:00,2006/06/29 00:00:00,1151510400000,1151510400000,517
8,8,宾馆390,杭州市余杭区xx328路848号,姓名36,U27,2006/06/30 00:00:00,2006/06/30 00:11:00,1151596800000,1151597460000,670
9,4,宾馆398,杭州市江干区xx53路761号,姓名59,U624,2006/06/30 00:00:00,2006/06/30 00:01:00,1151596800000,1151596860000,878
10,0,宾馆1,杭州市其他区xx715路756号,姓名3,U703,2006/07/01 00:00:00,2006/07/01 00:00:00,1151683200000,1151683200000,898
11,4,宾馆53,杭州市江干区xx813路302号,姓名24,U226,2006/07/01 00:00:00,2006/07/01 00:10:00,1151683200000,1151683800000,983
12,8,宾馆718,杭州市余杭区xx911路813号,姓名1,U548,2006/07/01 00:00:00,2006/07/01 00:20:00,1151683200000,1151684400000,575
13,5,宾馆553,杭州市西湖区xx641路69号,姓名33,U265,2006/07/01 00:00:00,2006/07/01 00:06:00,1151683200000,1151683560000,122
14,4,宾馆179,杭州市江干区xx661路224号,姓名34,U262,2006/07/01 00:00:00,2006/07/01 00:17:00,1151683200000,1151684220000,131
15,4,宾馆582,杭州市江干区xx417路704号,姓名19,U813,2006/07/01 00:00:00,2006/07/01 00:23:00,1151683200000,1151684580000,0
16,8,宾馆895,杭州市余杭区xx527路341号,姓名80,U362,2006/07/02 00:00:00,2006/07/02 00:15:00,1151769600000,1151770500000,11
17,1,宾馆6,杭州市上城区xx62路637号,姓名35,U434,2006/07/02 00:00:00,2006/07/02 00:07:00,1151769600000,1151770020000,939
18,0,宾馆889,杭州市其他区xx943路239号,姓名46,U614,2006/07/02 00:00:00,2006/07/02 00:16:00,1151769600000,1151770560000,565
19,6,宾馆322,杭州市滨江区xx430路162号,姓名71,U911,2006/07/02 00:00:00,2006/07/02 00:10:00,1151769600000,1151770200000,542
20,4,宾馆491,杭州市江干区xx529路615号,姓名63,U911,2006/07/03 00:00:00,2006/07/03 00:09:00,1151856000000,1151856540000,385

(3) Start the Hive metastore service

[root@bigdata01 ~]# hive --service metastore

(4) Launch the Spark SQL command line

bin/spark-sql --master yarn-client --executor-memory 80g --conf spark.sql.warehouse.dir=hdfs://bigdata01.hzjs.co:8020/user/sparksql --conf spark.driver.maxResultSize=10g

(5) Test SQL statement

Query: theft cases ('盗窃案件') with region code 3 that are not yet closed and start before the end of 2017, joined to people who appeared in both an internet cafe and a hotel in the case's region within ten days of the case:
select c_rcode, c_code, c_name, c_region, p_status, h_region, h_hname, h_uname, h_code,
       w_region, w_wname, w_uname, w_code
from mycase
left join p_case on mycase.c_code = p_case.p_code
left join hotel  on mycase.c_rcode = hotel.h_region
left join wb     on mycase.c_rcode = wb.w_region
where p_status != '破案状态'
  and c_cate = '盗窃案件'
  and c_rcode = '3'
  and 3200000000 < c_start_m and c_start_m < 1514736000000
  and h_code = w_code
  and (c_start_m - 86400000 * 10) < w_start_m and w_end_m < (c_start_m + 86400000 * 10)
  and (c_start_m - 86400000 * 10) < h_start_m and h_end_m < (c_start_m + 86400000 * 10);

(6) Execution result

Time taken: 25.288 seconds, Fetched 25 row(s)

The blue line
For the blue path, you only need to create an external table in Hive that points to a table in HBase:

create external table test_lcc_person (rowkey string, name string, sex string, age string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,lcc_liezu:name,lcc_liezu:sex,lcc_liezu:age")
tblproperties ("hbase.table.name" = "test_lcc_person");

The table name test_lcc_person must be the same in both places (the Hive table name and the hbase.table.name property); run this command in the Hive CLI.
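With the external table in place, the blue path can also be exercised from Spark. The following is a minimal sketch, not taken from the original post, that assumes Spark 2.x with Hive support enabled, a running Hive metastore, and the hive-hbase-handler and HBase client jars on the Spark classpath:

import org.apache.spark.sql.SparkSession

object QueryHbaseBackedTable {
  def main(args: Array[String]): Unit = {
    // Assumes the Hive metastore started in step (3) is reachable
    val spark = SparkSession.builder()
      .appName("QueryHbaseBackedTable")
      .enableHiveSupport()
      .getOrCreate()

    // The external table created above; Hive resolves it to the underlying HBase table
    spark.sql("SELECT rowkey, name, sex, age FROM test_lcc_person LIMIT 10").show()

    spark.stop()
  }
}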

3. Idea: take the hbaseRDD read from HBase, transform it into a DataFrame, register it as a table, then join and save. Wouldn't that work?

4. As a Java developer I did not know how; I could only read a single table and could not yet convert it to a DataFrame.

package com.lcc.spark.hbase.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class SparkOnHbase {

	public static void main(String[] args) throws Exception {
		System.setProperty("hadoop.home.dir", "E:\\02-hadoop\\hadoop-2.7.3\\");
		System.setProperty("HADOOP_USER_NAME", "root");

		// System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
	    
		SparkConf conf = new SparkConf();
		conf.setAppName("LG_CALCULATE");
		conf.setMaster("local");

		JavaSparkContext context = new JavaSparkContext(conf);
		
		
		Configuration configuration = HBaseConfiguration.create();  
        configuration.set("hbase.zookeeper.property.clientPort", "2181");  
        configuration.set("hbase.zookeeper.quorum", "192.168.10.82");  
        //configuration.set("hbase.master", "192.168.10.82:60000");  
        
        Scan scan = new Scan();
        String tableName = "test_lcc_person";
        configuration.set(TableInputFormat.INPUT_TABLE, tableName);
        
        ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
        String ScanToString = Base64.encodeBytes(proto.toByteArray());
        
        configuration.set(TableInputFormat.SCAN, ScanToString);
        
        JavaPairRDD<ImmutableBytesWritable, Result> myRDD = context.newAPIHadoopRDD(configuration,TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
        
        System.out.println(myRDD.count());
       
        
        
        myRDD.foreach(new VoidFunction<Tuple2<ImmutableBytesWritable,Result>>(){

			@Override
			public void call(Tuple2<ImmutableBytesWritable, Result> tuple)
					throws Exception {
				Result result = tuple._2();
				String rowkey = Bytes.toString(result.getRow());
	            String name = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("name")));
	            String sex = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("sex")));
	            String age = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("age")));
	        	System.out.print(rowkey);
	            System.out.print("\t");
	            System.out.print(name);
	            System.out.print("\t");
	            System.out.print(sex);
	            System.out.print("\t");
	            System.out.print(age);
	            System.out.println("\t");
				
			}
        	
        });

	}

}

5. Learning to do it in Scala

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}


object Test {
  
  def main(args: Array[String]): Unit = {
        // Run in local mode for easy testing
        val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

        // Create the HBase configuration
        val hBaseConf = HBaseConfiguration.create()
        hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
        hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
        hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

        // Create the Spark context
        val sc = new SparkContext(sparkConf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Read the data from the source table
        val hbaseRDD = sc.newAPIHadoopRDD(hBaseConf,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])

        // Map the data into a table, i.e. convert the RDD into a DataFrame with a schema
        val shop = hbaseRDD.map(r=>(
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("name"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("sex"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("age")))
        )).toDF("name","sex","age")

        shop.registerTempTable("shop")

        // Test
        val df2 = sqlContext.sql("SELECT * FROM shop")
        println(df2.count())
        df2.collect().foreach(print(_))
        //df2.foreach(println)
  }
  
}

Output:

[梁川川1,男,12][梁川川2,男,12][梁川川3,男,12][梁川川4,男,12][梁川川5,男,12][梁川川6,男,12][梁川川7,男,17]

This confirms the idea: the hbaseRDD read from HBase can be transformed into a DataFrame and registered as a table.

6. Trying to read two tables

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}


object Test {
  
  def main(args: Array[String]): Unit = {
        // Run in local mode for easy testing
        val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

        // Create the HBase configuration
        val hBaseConf = HBaseConfiguration.create()
        hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
        hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
        hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

        // Create the Spark context
        val sc = new SparkContext(sparkConf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Read the data from the first table
        val hbaseRDD = sc.newAPIHadoopRDD(hBaseConf,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])

        // Map the data into a table, i.e. convert the RDD into a DataFrame with a schema
        val shop = hbaseRDD.map(r=>(
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("name"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("sex"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("age")))
        )).toDF("name","sex","age")

        shop.registerTempTable("shop")

        // Test
        val df2 = sqlContext.sql("SELECT * FROM shop")
        println(df2.count())
        df2.collect().foreach(print(_))
        //df2.foreach(println)

        // Create a second HBase configuration
        val hBaseConf2 = HBaseConfiguration.create()
        hBaseConf2.set("hbase.zookeeper.property.clientPort", "2181")
        hBaseConf2.set("hbase.zookeeper.quorum", "192.168.10.82")
        hBaseConf2.set(TableInputFormat.INPUT_TABLE, "test_lcc_card")

        // Create a second Spark context (this is the mistake; see the error below)
        val sc2 = new SparkContext(sparkConf)
        val sqlContext2 = new SQLContext(sc2)
        import sqlContext.implicits._

        // Read the data from the second table
        val hbaseRDD2 = sc2.newAPIHadoopRDD(hBaseConf2, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])

        // Map the data into a table, i.e. convert the RDD into a DataFrame with a schema
        val card = hbaseRDD2.map(r => (
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("code"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("money"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("time")))
        )).toDF("code", "money", "time")

        card.registerTempTable("mycard")

        // Test
        val df3 = sqlContext.sql("SELECT * FROM mycard")
        println(df3.count())
        df3.collect().foreach(print(_))
        
  }
  
}

But this fails with an error:

[梁川川1,男,12][梁川川2,男,12][梁川川3,男,12][梁川川4,男,12][梁川川5,男,12][梁川川6,男,12][梁川川7,男,17]Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
Test$.main(Test.scala:25)
Test.main(Test.scala)
	at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2285)

Only one SparkContext may exist in a single JVM.
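The fix, shown in full in section 7 below, is to keep that one SparkContext and simply repoint the shared HBase configuration at the second table before building the second RDD. A minimal sketch (cardRDD is just an illustrative name; section 7 reuses a hbaseRDD var instead):

        // Reuse the existing sc and hBaseConf instead of creating new ones
        hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_card")
        val cardRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])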

7. Improved program

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}


object Test {
  
  def main(args: Array[String]): Unit = {
        // Run in local mode for easy testing
        val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

        // Create the HBase configuration (shared by both tables)
        val hBaseConf = HBaseConfiguration.create()
        hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
        hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
        //var con = ConnectionFactory.createConnection(hBaseConf)
        //var table = con.getTable(TableName.valueOf(""))
        hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

        // Create the single Spark context
        val sc = new SparkContext(sparkConf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Read the data from the first table
        var hbaseRDD = sc.newAPIHadoopRDD(hBaseConf,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])

        // Map the data into a table, i.e. convert the RDD into a DataFrame with a schema
        val shop = hbaseRDD.map(r=>(
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("id"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("name"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("sex"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"),Bytes.toBytes("age")))
        )).toDF("id","name","sex","age")

        shop.registerTempTable("shop")

        // Test
        val df2 = sqlContext.sql("SELECT * FROM shop")
        println(df2.count())
        df2.collect().foreach(print(_))
        //df2.foreach(println)

        // Repoint the same configuration at the second table and rebuild the RDD
        hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_card")
        hbaseRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
        // Map the data into a table, i.e. convert the RDD into a DataFrame with a schema
        val card = hbaseRDD.map(r=>(
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"),Bytes.toBytes("ids"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"),Bytes.toBytes("code"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"),Bytes.toBytes("money"))),
            Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"),Bytes.toBytes("time")))
        )).toDF("ids","code","money","time")

        card.registerTempTable("mycard")
        
        // Test
        val df3 = sqlContext.sql("SELECT * FROM mycard")
        println(df3.count())
        df3.collect().foreach(print(_))

        // Test the join between the two temp tables
        val df4 = sqlContext.sql("SELECT * FROM shop inner join mycard on id=ids")
        println(df4.count())
        df4.collect().foreach(println(_))
        
  }
  
}

Test result:

[7,梁川川7,男,17,7,7777,7777,2015-10-11]
[3,梁川川3,男,12,3,3333,333,2015-10-11]
[5,梁川川5,男,12,5,55,55,2015-10-11]
[6,梁川川6,男,12,6,6666,6666,2015-10-11]
[1,梁川川1,男,12,1,1111111111,1111111,2015-10-11]
[4,梁川川4,男,12,4,444,444,2015-10-11]
[2,梁川川2,男,12,2,22222,22222,2015-10-11]

The test succeeds.
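The idea in section 3 also mentions saving the joined result, which the program above stops short of. A minimal sketch of that last step, not from the original post, continuing from df4 above; the output path is purely illustrative:

import org.apache.spark.sql.SaveMode

// Persist the joined result as Parquet on HDFS (path is an assumption, adjust to your cluster)
df4.write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs://bigdata01.hzjs.co:8020/user/sparksql/shop_card_joined")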
