Original article: http://blog.youkuaiyun.com/nsrainbow/article/details/43002387 For the latest tutorials and a better reading experience, please follow the original author's blog.
When talking about Hive, you inevitably have to cover how to call Hive from your own programs. Below I use an example to show how to query data in Hive from Java.
Preparing the service
To connect to Hive over JDBC, you first need to start Hive's Thrift server; otherwise you will get the error "Could not establish connection to localhost:10000/default: java.net.ConnectException: Connection refused".
hive --service hiveserver2 (note: that is two "-" characters before service). Since the example below connects with the hive2 JDBC driver, you need the HiveServer2 service, not the older hiveserver.
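When chasing down the "Connection refused" error, it can help to first verify that the Thrift port is actually listening. The portOpen helper below is my own sketch, not part of Hive: it just attempts a TCP connection with a timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ThriftPortCheck {

    // Hypothetical helper: returns true if host:port accepts TCP connections.
    static boolean portOpen(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            try { socket.close(); } catch (IOException ignored) {}
        }
    }

    public static void main(String[] args) {
        // With no Thrift server running on this machine, this prints false.
        System.out.println(portOpen("localhost", 10000, 500));
    }
}
```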
Preparing the data
Create a text file named a.txt with the following content:
1,terry
2,alex
3,jimmy
4,mike
5,kate
and upload it to the /data/ directory on the Hive server.
Calling Hive via JDBC
Loading the driver
Load the driver (this article only covers the hive2 JDBC driver):
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connecting to the database
Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
Here host1 is the hostname, 10000 is Hive's default port, default is the default database, and hive is the default username; the default password is empty.
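To make the URL structure explicit, here is a trivial sketch that assembles the same connection string from its parts (the host name host1 comes from this article; the rest are the stock defaults):

```java
public class HiveUrlDemo {
    public static void main(String[] args) {
        String host = "host1";       // Hive server hostname (from the article's setup)
        int port = 10000;            // HiveServer2 default port
        String database = "default"; // default database
        // Scheme "jdbc:hive2" selects the hive2 JDBC driver.
        String url = "jdbc:hive2://" + host + ":" + port + "/" + database;
        System.out.println(url);
    }
}
```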
SQL statements
Dropping a table
stmt.execute("drop table if exists " + tableName);
Creating a table
stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'");
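A side note on '\054': it is an octal escape for the comma character (054 octal = 44 decimal), which matches the comma-separated a.txt. In fact, because the statement is built as a Java string literal, the Java compiler already turns \054 into , before the SQL ever reaches Hive. A quick demonstration:

```java
public class OctalDelimiterDemo {
    public static void main(String[] args) {
        // '\054' in a Java char literal is an octal escape: 054 octal == 44 decimal == ','
        char viaEscape = '\054';
        System.out.println("delimiter = " + viaEscape);
        System.out.println((int) viaEscape);
        // The same escape inside a string literal, as used in the create-table statement:
        System.out.println("TERMINATED BY '\054'");
    }
}
```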
Querying data
ResultSet res = stmt.executeQuery("select * from " + tableName);
Loading data
stmt.execute("load data local inpath '" + filepath + "' into table " + tableName);
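Since the path is concatenated straight into the statement, a single quote in the file name would break the SQL. Parameter binding support in Hive JDBC drivers of this era was limited, so one defensive option is to escape quotes yourself. The escapePath helper below is my own sketch, not a Hive API:

```java
public class HivePathEscape {

    // Hypothetical helper: escape single quotes before splicing a path
    // into a HiveQL string literal.
    static String escapePath(String path) {
        return path.replace("'", "\\'");
    }

    public static void main(String[] args) {
        System.out.println(escapePath("/data/a.txt"));
        // A path containing a quote gets its quote backslash-escaped:
        System.out.println(escapePath("/data/o'brien.txt"));
    }
}
```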
Example
Creating the project
Open Eclipse and create a Maven project.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.crazycake</groupId>
  <artifactId>play-hive</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>play-hive</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>0.14.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
          <optimize>true</optimize>
          <compilerArgument>-nowarn</compilerArgument>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
          </transformers>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
The most important parts are these two dependencies:
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.14.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
</dependency>
Everything else is incidental.
Creating the table, loading and querying data
Create a class named HiveJdbcClient:
package org.crazycake.play_hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        String tableName = "testhivedrivertable";

        // drop and re-create the table; '\054' is an octal escape for ','
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'");

        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // describe the table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // load data into the table; NOTE: this path is on the Hive server machine
        String filepath = "/data/a.txt";
        sql = "load data local inpath '" + filepath + "' into table " + tableName;
        System.out.println("Running: " + sql);
        stmt.execute(sql);

        // select * query
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
        }

        // count query (this one triggers a MapReduce job)
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}
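One thing the client above omits is cleanup: the ResultSet, Statement and Connection are never closed. With the pom targeting Java 1.6 there is no try-with-resources, so the classic finally-block pattern applies. Here is a sketch using the same connection settings; it degrades gracefully (with a message instead of a crash) when the driver or server is unavailable:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcCleanup {
    public static void main(String[] args) {
        Connection con = null;
        Statement stmt = null;
        ResultSet res = null;
        try {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
            stmt = con.createStatement();
            res = stmt.executeQuery("select * from testhivedrivertable");
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }
        } catch (Exception e) {
            // Driver missing or server unreachable: report instead of crashing.
            System.out.println("Hive not reachable: " + e.getClass().getSimpleName());
        } finally {
            // Close in reverse order of creation; swallow close failures.
            if (res != null) try { res.close(); } catch (SQLException ignored) {}
            if (stmt != null) try { stmt.close(); } catch (SQLException ignored) {}
            if (con != null) try { con.close(); } catch (SQLException ignored) {}
        }
    }
}
```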
The output is:
Running: show tables 'testhivedrivertable'
testhivedrivertable
Running: describe testhivedrivertable
key	int
value	string
Running: load data local inpath '/data/a.txt' into table testhivedrivertable
Running: select * from testhivedrivertable
1	terry
2	alex
3	jimmy
4	mike
5	kate
Running: select count(1) from testhivedrivertable
Calling Hive from Java is actually quite simple: you are just executing, over JDBC, the same statements you would type in the hive shell. The statements run in the environment of the Hive server machine, so any paths they contain are resolved against the Hive server's filesystem, starting from its root. That is why a.txt has to be uploaded to the server for this code to run correctly.