Hive Installation and Usage Guide

Latest recommended article published 2021-11-18 21:57:38

LHYF

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/yangfanor/article/details/84666285

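On host h1, extract apache-hive-0.13.1-bin.tar.gz

In the conf directory, rename the configuration template: hive-default.xml.template --> hive-site.xml

1. Hive data warehouse (apache-hive-0.13.1-bin.tar.gz)

Hive consists of an interpreter, a compiler, an optimizer, and other components.

At run time, Hive stores its metadata in a relational database.

Install a relational database (MySQL) and manually create a database named hive, with its character set set to latin1.

Edit the configuration file hive-site.xml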

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://h1:3306/hive?characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <!-- set <value> to your MySQL user name -->
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- set <value> to that user's password -->
</property>
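Before starting Hive, it can help to confirm that the four metastore properties are actually present in hive-site.xml. The following is a minimal sketch using only the JDK's XML parser. The class name `HiveSiteCheck`, its helper methods, and the user name `hive_user` in the sample are illustrative assumptions, not part of Hive or of the original setup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class HiveSiteCheck {

    // The metastore connection properties this article edits.
    static final String[] REQUIRED = {
        "javax.jdo.option.ConnectionURL",
        "javax.jdo.option.ConnectionDriverName",
        "javax.jdo.option.ConnectionUserName",
        "javax.jdo.option.ConnectionPassword"
    };

    // Returns name -> value for every <property> element in the given XML text.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration", e);
        }
    }

    // Text of the first child element with the given tag, or null if it is absent.
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    // Names of required properties that are missing or have no value.
    static List<String> missing(Map<String, String> props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample content; "hive_user" is a made-up placeholder, and the password is left out on purpose.
        String xml = "<configuration>"
                + "<property><name>javax.jdo.option.ConnectionURL</name>"
                + "<value>jdbc:mysql://h1:3306/hive?characterEncoding=UTF-8</value></property>"
                + "<property><name>javax.jdo.option.ConnectionDriverName</name>"
                + "<value>com.mysql.jdbc.Driver</value></property>"
                + "<property><name>javax.jdo.option.ConnectionUserName</name>"
                + "<value>hive_user</value></property>"
                + "</configuration>";
        System.out.println("missing: " + missing(readProperties(xml)));
    }
}
```

To check a real file, read its contents into a string (for example with `java.nio.file.Files.readAllBytes`) and pass that to `readProperties`.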

Change into the bin directory and run ./hive

Create a table

create table t_emp(

id int,

name string,

age int,

dept_name string

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ',';

Load data

hive> load data local inpath '/home/emp.txt' into table t_emp;

Copying data from file:/home/emp.txt

Copying file: file:/home/emp.txt

Loading data to table default.t_emp

Table default.t_emp stats: [numFiles=1, numRows=0, totalSize=120, rawDataSize=0]

Time taken: 2.791 seconds

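Alternatively, run hive> load data local inpath '/home/emp.txt' overwrite into table t_emp;

The overwrite keyword tells Hive to delete all files already in the table's directory. Without this keyword, Hive simply adds the new file to the directory (unless a file with the same name already exists there, in which case it replaces that file).

Run a query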

hive> select count(*) from t_emp;

======================================

create table t_person(

id int,

name string,

like array<string>,

tedian map<string,string>

)

row format delimited

fields terminated by ','

collection items terminated by '_'

map keys terminated by ':';

1,zhangsan,sports_books_TV,sex:男_color:red

2,lisi,sports_books_TV,sex:男_color:red
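To make the three delimiters concrete, here is a small plain-Java sketch of how one of the sample lines above breaks down into the four columns of t_person. It only imitates the splitting rules declared in the table definition; it is not how Hive's SerDe is implemented, and the class `PersonRowSketch` is a hypothetical name chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PersonRowSketch {

    // Plain holder for the four columns of t_person.
    static final class Person {
        final int id;
        final String name;
        final List<String> like;
        final Map<String, String> tedian;

        Person(int id, String name, List<String> like, Map<String, String> tedian) {
            this.id = id;
            this.name = name;
            this.like = like;
            this.tedian = tedian;
        }
    }

    static Person parse(String line) {
        // fields terminated by ','
        String[] fields = line.split(",", -1);
        int id = Integer.parseInt(fields[0]);
        String name = fields[1];

        // collection items terminated by '_'
        List<String> like = Arrays.asList(fields[2].split("_"));

        // map entries are also separated by '_', and map keys terminated by ':'
        Map<String, String> tedian = new LinkedHashMap<>();
        for (String entry : fields[3].split("_")) {
            String[] keyValue = entry.split(":", 2);
            tedian.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : null);
        }
        return new Person(id, name, like, tedian);
    }

    public static void main(String[] args) {
        // \u7537 is the Chinese character used as the "sex" value in the sample data.
        Person p = parse("1,zhangsan,sports_books_TV,sex:\u7537_color:red");
        System.out.println(p.id + " " + p.name + " " + p.like + " " + p.tedian);
    }
}
```

In HiveQL, the corresponding elements would be read as like[0] or tedian['color'].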

2. HQL scripts can be executed in three ways.

① hive -e 'hql'

② hive -f 'hql.txt'

③ Run the script from code through Hive JDBC

Start the Hive service

Edit the configuration file hive-site.xml

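<property>
  <name>hive.server2.thrift.bind.host</name>
  <!-- set <value> to the host name or IP address HiveServer2 should bind to -->
  <description>Bind host on which to run the HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

<property>
  <name>hive.server2.long.polling.timeout</name>
  <!-- set <value> to the long-polling timeout -->
</property>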

Start the service

[root@h1 bin]# ./hive --service hiveserver2

[root@h1 ~]# netstat -nalp|grep 10000

tcp 0 0 192.168.0.201:10000 0.0.0.0:* LISTEN 3402/java

Connect to Hive with beeline, the client that ships with Hive

[root@h1 bin]# ./beeline

Beeline version 0.13.1 by Apache Hive

beeline> !connect jdbc:hive2://h1:10000/default

scan complete in 3ms

Connecting to jdbc:hive2://h1:10000/default

Enter username for jdbc:hive2://h1:10000/default: root

Enter password for jdbc:hive2://h1:10000/default:

Connected to: Apache Hive (version 0.13.1)

Driver: Hive JDBC (version 0.13.1)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://h1:10000/default> show tables;

+-----------+

| tab_name |

+-----------+

| t_emp |

+-----------+
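The third way of running HQL, from code through Hive JDBC, connects to the same HiveServer2 endpoint that beeline used above. The following is a minimal sketch, assuming the hive-jdbc driver jar and its dependencies are on the classpath and that the server accepts user root with an empty password as in the beeline session. The class name `HiveJdbcSketch` and the helper `url` are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

class HiveJdbcSketch {

    // Builds a HiveServer2 JDBC URL of the same form beeline connected to.
    static String url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver class; it is provided by the hive-jdbc jar.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database, and user are taken from the beeline session;
        // the empty password is an assumption.
        try (Connection conn = DriverManager.getConnection(url("h1", 10000, "default"), "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from t_emp")) {
            while (rs.next()) {
                System.out.println("rows in t_emp: " + rs.getLong(1));
            }
        }
    }
}
```

The try-with-resources block closes the result set, statement, and connection in reverse order even if the query fails.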

==============Analyzing Tomcat access logs with Hive========================

CREATE TABLE t_log (

host STRING,

identity STRING,

user STRING,

time STRING,

request STRING,

status STRING,

size STRING,

referer STRING,

agent STRING

)

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

WITH SERDEPROPERTIES (

"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*?) .*?\\] \"([^ ]*) (.*?)\" ([^ ]*) ([^ ]*)")

STORED AS TEXTFILE;

Add the jar

hive> add jar /home/hive-0.13.1/lib/hive-contrib-0.13.1.jar;

hive> select status, count(*) s from t_log group by status order by s desc;
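Since RegexSerDe fills columns from capture groups in order, it is worth checking how the input.regex of t_log splits a line before trusting query results. The sketch below runs the same pattern through java.util.regex against a made-up log line (the sample line and the class name `AccessLogRegexSketch` are assumptions). Note that this pattern has eight capture groups and splits the request into a method and a remainder, while t_log declares nine columns, so the group order may not line up with the column names as intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogRegexSketch {

    // Same pattern as the input.regex property of t_log.
    static final Pattern LINE = Pattern.compile(
            "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*?) .*?\\] \"([^ ]*) (.*?)\" ([^ ]*) ([^ ]*)");

    // Returns the capture groups in order, or an empty list if the whole line does not match.
    static List<String> groups(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical access log line in the common log format.
        String sample = "192.168.0.1 - - [18/Nov/2021:21:57:38 +0800] \"GET /index.jsp HTTP/1.1\" 200 1024";
        List<String> groups = groups(sample);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + (i + 1) + ": " + groups.get(i));
        }
    }
}
```

For the sample line, group 5 is the HTTP method, group 6 is the rest of the request, group 7 is the status code, and group 8 is the response size.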

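3. Two kinds of Hive functions are covered here: UDF and UDAF.

① UDF: operates on a single row and produces a single row as output. Most functions (for example, math and string functions) belong to this category.

② UDAF: accepts multiple input rows and produces a single output row. The aggregate functions count, avg, min, and max are examples.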
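The difference between the two kinds can be pictured in plain Java, without any Hive classes: a UDF-like function maps one value to one value, while a UDAF-like function folds many rows into a single result. This is only an analogy; the class `RowVsAggregateSketch` and its methods are hypothetical and do not use Hive's UDF or UDAF APIs.

```java
import java.util.Arrays;
import java.util.List;

class RowVsAggregateSketch {

    // UDF-like: called once per row, one input value gives one output value.
    static String upper(String name) {
        return name == null ? null : name.toUpperCase();
    }

    // UDAF-like: consumes all rows and returns a single value (here, the maximum).
    static int max(List<Integer> ages) {
        int best = Integer.MIN_VALUE;
        for (int age : ages) {
            best = Math.max(best, age);
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration.
        List<String> names = Arrays.asList("zhangsan", "lisi");
        List<Integer> ages = Arrays.asList(25, 31);

        for (String name : names) {
            System.out.println(upper(name)); // one output line per input row
        }
        System.out.println(max(ages));       // one output line for all rows
    }
}
```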

Writing a custom UDF

package org.lhyf.hive.udf;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {

    // Reused across calls so that a new Text is not allocated for every row
    private Text result = new Text();

    // Strips leading and trailing whitespace from the string
    public Text evaluate(Text str) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString()));
        return result;
    }

    // Strips any of the given characters from both ends of the string
    public Text evaluate(Text str, String stripChars) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString(), stripChars));
        return result;
    }
}

Export the class as a jar, add the jar to Hive's classpath, and create the function

hive> add jar /home/hive-udf.jar;

hive> create temporary function strip as 'org.lhyf.hive.udf.Strip';

Create a table

create table t_test(

bee string,

banana string

)

row format delimited

fields terminated by ',';

The data to load has the following format

#111* ,# %1@ *

!@##222* ,# %2@ *

For readability, here is the same data with each space shown as 0

000#111*0,#0%1@0*

!@##222*0,#0%2@0*

Run queries

hive> select strip(bee) from t_test;

#111*

!@##222*

hive> select strip(banana,'#') from t_test;

%1@ *

%2@ *

The same output with each space shown as 0

0%1@0*

0%2@0*
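The results above can be reproduced outside Hive. The sketch below is a plain-Java stand-in for the two StringUtils.strip calls used by the UDF, written so that it runs without commons-lang; it is an approximation for non-null input, not the library's actual source. The class name `StripSketch` is hypothetical.

```java
class StripSketch {

    // Removes leading and trailing characters that appear in stripChars.
    // A null stripChars means "strip whitespace", mirroring the one-argument form.
    static String strip(String s, String stripChars) {
        int start = 0;
        int end = s.length();
        while (start < end && shouldStrip(s.charAt(start), stripChars)) {
            start++;
        }
        while (end > start && shouldStrip(s.charAt(end - 1), stripChars)) {
            end--;
        }
        return s.substring(start, end);
    }

    static boolean shouldStrip(char c, String stripChars) {
        return stripChars == null ? Character.isWhitespace(c) : stripChars.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // First row of t_test: bee = "   #111* ", banana = "# %1@ *"
        System.out.println("[" + strip("   #111* ", null) + "]"); // prints [#111*]
        System.out.println("[" + strip("# %1@ *", "#") + "]");    // prints [ %1@ *]
    }
}
```

Only the leading # is removed from the banana column: stripping stops at the first character on each end that is not in the given set, so the trailing * and the inner spaces stay.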