Overview
Spark 2.2.0 does not support reading Hive data directly over JDBC. Making that work requires modifying part of the Spark source code, which is method two in an earlier article:
https://blog.youkuaiyun.com/qq_42213403/article/details/117557610?spm=1001.2014.3001.5501
There is also an approach that avoids rewriting the source: first fetch the data over JDBC, then have Spark wrap it into a DataFrame and work with that.
Implementation
First, fetch the Hive table data with a JDBC query:
import java.sql.{Connection, DriverManager, ResultSet}
import java.util.Properties

def getResult(): ResultSet = {
  val properties = new Properties
  properties.setProperty("url", "jdbc:hive2://192.168.5.61:10000/")
  properties.setProperty("user", "hive")
  properties.setProperty("password", "")
  properties.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")
  // Open a HiveServer2 connection, run the query, and hand the raw ResultSet back to the caller
  val connection = getConnection(properties)
  val statement = connection.createStatement
  val resultSet = statement.executeQuery("select * from test.user_info")
  resultSet
}
def getConnection(prop: Properties): Connection = try {
  // Register the Hive JDBC driver before opening the connection
  Class.forName(prop.getProperty("driver"))
  DriverManager.getConnection(prop.getProperty("url"), prop.getProperty("user"), prop.getProperty("password"))
} catch {
  case e: Exception =>
    e.printStackTrace()
    null
}
Then convert the ResultSet into a DataFrame. A small helper maps the Java class name reported by the JDBC metadata to the corresponding Spark SQL type:
import org.apache.spark.sql.types._

def createStructField(name: String, colType: String): StructField = {
  colType match {
    case "java.lang.String"  => StructField(name, StringType, nullable = true)
    case "java.lang.Integer" => StructField(name, IntegerType, nullable = true)
    case "java.lang.Long"    => StructField(name, LongType, nullable = true)
    case _                   => StructField(name, StringType, nullable = true) // fall back to string for unmapped types
  }
}