1. Prerequisites
Accessing Hive on the cluster from a local Windows machine requires the following four things:
(1) Create a resources folder under the project's main directory in IDEA, download the cluster's hive-site.xml configuration file, and place it in the resources folder.
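For reference, the property Spark mainly needs from hive-site.xml is the metastore address. The snippet below is only a sketch: the thrift host, port and warehouse directory are placeholders and must match your own cluster.
<!-- minimal hive-site.xml sketch; replace the host, port and path with your cluster's values -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your-metastore-host:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>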
(2) Add the required Maven dependency (version for reference):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
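Note that ${spark.version} assumes a matching property is defined in the POM. A sketch follows; the version number is only an example and should match the Spark version running on your cluster.
<properties>
  <!-- example only: use the Spark version your cluster runs (the _2.11 artifact is built for Scala 2.11) -->
  <spark.version>2.4.3</spark.version>
</properties>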
(3) Create the program's entry point. There are two ways to do this. The first uses SparkSession with enableHiveSupport(), which turns on Hive support:
val spark = SparkSession.builder().appName(Test.getClass.getSimpleName)
.master("local")
.config("spark.testing.memory", "471859200")
.enableHiveSupport()
.getOrCreate()
The second uses the older API, HiveContext:
// set an app name and a master so this also runs locally; adjust both as needed
val conf = new SparkConf().setAppName("HiveContextTest").setMaster("local")
val sc = new SparkContext(conf)
val hq = new HiveContext(sc)
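Once the HiveContext exists, queries are issued the same way as with SparkSession. A quick smoke test is sketched below; the statement is just an example:
// list the databases visible through the Hive metastore to verify the connection
hq.sql("show databases").show()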
(4) Set the user for accessing the cluster's HDFS:
System.setProperty("HADOOP_USER_NAME", "hdfs"). Since HDFS paths are accessed, two more configuration files are needed: hdfs-site.xml and core-site.xml. Put both of them in the resources folder as well.
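As an optional sanity check that the HDFS configuration files and the user setting are picked up, you can list a directory through Hadoop's FileSystem API before touching Hive. This is only a sketch: the path /user/hive/warehouse is an assumed example and should be replaced with a directory that exists on your cluster.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

System.setProperty("HADOOP_USER_NAME", "hdfs")
// core-site.xml and hdfs-site.xml on the classpath (the resources folder) are loaded automatically
val fs = FileSystem.get(new Configuration())
// list an example directory to confirm access; adjust the path to your cluster
fs.listStatus(new Path("/user/hive/warehouse")).foreach(status => println(status.getPath))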
2. Code Example:
package com.bigdata
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.broadcast
object MingXing {
def main(args: Array[String]): Unit = {
System.setProperty("HADOOP_USER_NAME","hdfs")
val spark = SparkSession.builder().appName(MingXing.getClass.getSimpleName)
.master("local")
.config("spark.testing.memory", "471859200")
.enableHiveSupport()
.getOrCreate()
// Insert data into a Hive table (batch insert)
spark.sql("use shuairui_user")
spark.sql("insert into table mingxing1 select * from hive_to_hbase")
System.err.println("loading data into hive succeeded!")
// Read data from a Hive table
val a = "select name,sex,age from student"
spark.sql(a).show(true)
// Join two tables: broadcast the small tables to the executors, then join
val b = spark.sql("select name,sex,age from student")
val c = spark.sql("select name,department from student")
val d = spark.sql("select id,name from student")
// Mark the two small tables c and d with the broadcast() join hint from org.apache.spark.sql.functions
// (sparkContext.broadcast on a DataFrame does not produce a broadcast join, so the hint is used instead)
// "left", "left_outer" and "leftouter" all mean a left join; the other join types work the same way
val join01 = b.join(broadcast(c), b("name") === c("name"), "left_outer")
  .select(b("name"), b("sex"), c("department"))
val join02 = join01.join(broadcast(d), join01("name") === d("name"), "left")
  .select(join01("name"), join01("sex"), d("id"))
join02.show(true)
spark.stop()
}
}
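A note on the broadcast join above: the broadcast() hint asks Spark SQL to ship the marked table to every executor. Even without the hint, Spark broadcasts a table automatically when its estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default). The sketch below continues the example inside main, before spark.stop(); the 50 MB figure is only an example value.
// raise the auto-broadcast threshold to about 50 MB (value is in bytes, default 10485760)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "52428800")
// print the physical plan to confirm a BroadcastHashJoin is chosen
join02.explain()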