Hive Basics, Part 2: Creating Tables, Querying, and Usage

License: CC 4.0 BY-SA


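This article looks at advanced Hive SQL features: CTAS, the WITH clause, temporary tables, data loading, partitioned tables (static and dynamic), and view operations. It explains where each statement applies and how to use it, with the aim of improving Hive query efficiency and data management.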

Hive CREATE TABLE Statements

Creating an internal (managed) table:
-- Create an internal table
create table if not exists student(
id int, name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/home/hadoop/hive/warehouse/student';

-- Check the table type
desc formatted student;

Anatomy of the Hive CREATE TABLE statement:
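As a rough reading aid, here is the student DDL from above again with each clause annotated. The comments only describe the clauses this example uses; they are not a full grammar. They sit on their own lines on the assumption that the statement is run as a script.

```sql
-- IF NOT EXISTS: do nothing if a table named student already exists.
CREATE TABLE IF NOT EXISTS student (
  -- Each column is a name followed by its data type.
  id   INT,
  name STRING
)
-- Rows are stored as delimited text, with a tab between columns.
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
-- File format of the underlying data files.
STORED AS TEXTFILE
-- Directory that holds the table's data files.
LOCATION '/home/hadoop/hive/warehouse/student';

-- For an internal table, the "Table Type" row of the output reads MANAGED_TABLE.
DESC FORMATTED student;
```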
Advanced Hive Table Creation - CTAS and WITH
CTAS: creating a table with AS SELECT
employee is an existing table; ctas_employee is the new table to be created
CREATE TABLE ctas_employee as SELECT * FROM employee;
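The heading mentions WITH, so a minimal sketch of a common table expression (CTE), alone and combined with CTAS, may help. It assumes that employee has a column called name; that column and the names named and cte_employee are illustrative and do not come from the original schema.

```sql
-- Assumption: employee has a column called name; adjust to the real schema.
-- Standalone CTE: the WITH clause names a subquery that the main query reads.
WITH named AS (
  SELECT name FROM employee WHERE name IS NOT NULL
)
SELECT name FROM named;

-- CTAS combined with a CTE: the new table takes its columns and rows
-- from the final SELECT.
CREATE TABLE cte_employee AS
WITH named AS (
  SELECT name FROM employee WHERE name IS NOT NULL
)
SELECT name FROM named;
```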
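Creating temporary tables
A temporary table lets an application automatically manage the intermediate data generated during complex queries
The table is visible only in the current session and is dropped automatically when the session ends
Its data is stored under /tmp/hive-<user_name> (for security reasons)
If a temporary table has the same name as an existing table, that name refers to the temporary table for the rest of the session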
CREATE TEMPORARY TABLE tmp_table_name1 (c1 string);
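A typical use is to stage intermediate rows once and reuse them within the same session. The sketch below assumes employee has a name column; tmp_employee_names is a hypothetical table name.

```sql
-- Assumption: employee has a column called name.
-- Stage intermediate rows in a session-scoped table.
CREATE TEMPORARY TABLE tmp_employee_names AS
SELECT name FROM employee WHERE name IS NOT NULL;

-- Use it like any other table for the rest of the session.
SELECT COUNT(*) FROM tmp_employee_names;

-- Optional: drop it early; otherwise it is removed when the session ends.
DROP TABLE IF EXISTS tmp_employee_names;
```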
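Loading data: LOAD
LOAD moves or copies data files into a Hive table
LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt'
OVERWRITE INTO TABLE employee;
-- With the LOCAL keyword, the source file is on the local Linux file system and is copied into the table
LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt'
OVERWRITE INTO TABLE employee_partitioned PARTITION (year=2014, month=12);
-- Without the LOCAL keyword, the file is in HDFS and is moved into the table directly
LOAD DATA INPATH '/tmp/employee.txt'
OVERWRITE INTO TABLE employee_partitioned PARTITION (year=2017, month=12);

LOCAL: the file is on the local file system, and loading copies the data
OVERWRITE: replaces the data already in the target table or partition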
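To make the effect of OVERWRITE concrete, the following sketch loads the same file with and without it, reusing the path and table from the examples above.

```sql
-- Without OVERWRITE, the file is added to whatever the table already holds.
LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt'
INTO TABLE employee;

-- With OVERWRITE, the existing data in the table is replaced by the file.
LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt'
OVERWRITE INTO TABLE employee;

-- Quick sanity check after a load.
SELECT COUNT(*) FROM employee;
```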

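Hive Partitions
Partitions are used mainly to improve performance
The values of the partition columns split the table into separate directories
In queries, a partition column is used with the same syntax as a regular column
When a query filters on a partition column, Hive reads only the matching partitions, which makes the query more efficient
Partitions are either static or dynamic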
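As a sketch of what this means in practice, the queries below use the dept_partition table that is created in the next section. The directory names in the comments show the usual column=value layout; the parent path is left elided because it depends on the warehouse configuration.

```sql
-- Each partition value becomes a subdirectory under the table directory, e.g.
--   .../dept_partition/month=201908/
--   .../dept_partition/month=201909/

-- The partition column is filtered like a regular column;
-- only the month=201909 directory needs to be read.
SELECT deptno, dname, loc
FROM dept_partition
WHERE month = '201909';

-- Without a partition filter, the query is not restricted to one partition.
SELECT deptno, dname, loc
FROM dept_partition;
```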

Hive Partition Operations - Static Partitions

Creating a partitioned table
create table dept_partition(
deptno int,
dname string,
loc string )
partitioned by (month string)
row format delimited fields terminated by '\t';

Partitions are defined with PARTITIONED BY

Static partition operations
-- Add partitions
alter table dept_partition add partition(month='201906');
alter table dept_partition add partition(month='201905') partition(month='201904');
-- Drop partitions
alter table dept_partition drop partition (month='201904');
alter table dept_partition drop partition (month='201905'), partition (month='201906');

Static partitions are added with ALTER TABLE
ADD adds partitions; DROP removes them
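When these statements are part of a script that may be run more than once, the optional IF NOT EXISTS and IF EXISTS clauses are worth knowing. A minimal sketch on the same table:

```sql
-- Adding a partition that already exists fails unless IF NOT EXISTS is given.
ALTER TABLE dept_partition ADD IF NOT EXISTS PARTITION (month='201906');

-- IF EXISTS makes the drop safe to repeat.
ALTER TABLE dept_partition DROP IF EXISTS PARTITION (month='201904');

-- Confirm the result.
SHOW PARTITIONS dept_partition;
```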
Partitioned table operations
-- Show the partitions of a partitioned table
show partitions dept_partition;
-- Load data into the partitioned table
load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201909');
Multi-level partitions can also be created
-- Create a table with two partition levels
create table dept_partition2(
deptno int, dname string, loc string)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';
-- Load data into the two-level partitioned table
load data local inpath '/opt/datas/dept.txt' into table dept_partition2 partition(month='201909', day='13');
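Once data is loaded, a two-level partitioned table can be filtered on either level or on both. A short sketch using the table and values from the load statement above:

```sql
-- Filter on both partition levels to read a single month/day directory.
SELECT deptno, dname, loc
FROM dept_partition2
WHERE month = '201909' AND day = '13';

-- List only the day partitions that exist under one month.
SHOW PARTITIONS dept_partition2 PARTITION (month='201909');
```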

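Hive Partition Operations - Dynamic Partitions

Dynamic partitioning requires the following properties to be set
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
The second property sets the partition mode to nonstrict, which allows every partition column to be determined dynamically

The CREATE TABLE statement for dynamic partitions is the same as for static partitions
Inserting data with dynamic partitions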
insert into dynamic_people
partition(year,month)
select id,name,age,start_date,year(start_date),month(start_date)
from people;
-- Query the data
select * from dynamic_people where year = 2018;
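The insert above presumes that dynamic_people already exists. The article does not show its DDL, so the sketch below is only a guess at a compatible definition: the column types are assumptions inferred from the SELECT list.

```sql
-- Assumed schema: column types are guesses based on the INSERT above.
CREATE TABLE IF NOT EXISTS dynamic_people (
  id         INT,
  name       STRING,
  age        INT,
  start_date DATE
)
PARTITIONED BY (year INT, month INT);

-- Dynamic partition values are taken by position from the last columns of the
-- SELECT list, here year(start_date) and month(start_date).
-- After the INSERT, one partition should exist for each distinct
-- (year, month) pair found in people.start_date.
SHOW PARTITIONS dynamic_people;
```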

Hive View Operations
View commands
CREATE, SHOW, DROP, ALTER

-- Create a view; CTEs, ORDER BY, LIMIT, JOIN, etc. are supported
CREATE VIEW view_name AS SELECT statement;
-- List views (SHOW VIEWS is available from Hive 2.2.0 onward)
SHOW TABLES;
-- Show the view definition
SHOW CREATE TABLE view_name;
-- Drop the view
DROP VIEW view_name;
-- Change view properties
ALTER VIEW view_name SET TBLPROPERTIES ('comment' = 'This is a view');
-- Change the view definition
ALTER VIEW view_name AS SELECT statement;
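For a concrete instance of the templates above, here is a small sketch that builds a view over one partition of the dept_partition table from earlier. The view name dept_201909_view is illustrative.

```sql
-- A view over one partition of dept_partition (view name is illustrative).
CREATE VIEW IF NOT EXISTS dept_201909_view AS
SELECT deptno, dname, loc
FROM dept_partition
WHERE month = '201909';

-- Query it like a table.
SELECT * FROM dept_201909_view;

-- Remove it; IF EXISTS avoids an error when the view is missing.
DROP VIEW IF EXISTS dept_201909_view;
```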