020 Creating Partitioned Tables in Hive / Hive Partitions


Copyright: CC 4.0 BY-SA


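This article explains static, dynamic and mixed partitions in Hive. It covers creating partitioned tables, loading data, querying them efficiently, and modifying, adding and dropping partitions.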

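Partitioning splits a table's data into separate directories, one per partition.

The partitioning keyword is partitioned by.
In the statement below, year is a column outside the table data (a partition column),
while id, comment and dt are regular columns stored inside the table data.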

Creating a partitioned table in Hive

create table if not exists comm(
id int,
comment String,
dt String
)
partitioned by(year String)
row format delimited fields terminated by '\t'
;

hdfs dfs -ls /user/hive/warehouse/olqf.db/comm
There is nothing in it yet.

Load the data in three parts, from three files d1, d2 and d3:

1	this is comment1	2015-10-10
2	this is comment2	2015-10-10
3	this is comment3	2015-10-10

4	this is comment1	2016-11-13
5	this is comment2	2016-11-15
6	this is comment3	2016-11-16

7	this is comment1	2017-10-10
8	this is comment2	2017-10-10
9	this is comment3	2017-10-10

load data local inpath '/home/olddata/d1' into table comm partition(year='2015');
load data local inpath '/home/olddata/d2' into table comm partition(year='2016');
load data local inpath '/home/olddata/d3' into table comm partition(year='2017');
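Each load ... partition(...) above places its file in a subdirectory named after the partition value. The Python sketch below models that default naming rule, <table directory>/<column>=<value> with one level per partition column. The helper name partition_dir is hypothetical, and real Hive also escapes special characters in partition values, which this simplified model ignores.

```python
def partition_dir(table_dir: str, spec: dict) -> str:
    """Return the directory a LOAD ... PARTITION(...) writes into.

    Hypothetical, simplified model: assumes Hive's default layout
    <table_dir>/<col>=<value>/... and does no escaping of the values.
    """
    levels = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_dir.rstrip("/")] + levels)


table_dir = "/user/hive/warehouse/olqf.db/comm"
for year in ("2015", "2016", "2017"):
    print(partition_dir(table_dir, {"year": year}))
```

With two partition columns the same rule nests the directories. Assuming comm1 lives in the same database, partition_dir("/user/hive/warehouse/olqf.db/comm1", {"year": "2015", "month": "10"}) gives /user/hive/warehouse/olqf.db/comm1/year=2015/month=10.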

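Now the partitions exist.

Query the results.

A query that targets a single partition is faster this way, because Hive only has to read that partition's directory.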
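A toy Python model of that speed-up (partition pruning): the comm table is represented as one list of rows per partition directory, filled with the d1, d2 and d3 sample data. The scan function is a hypothetical illustration, not Hive's planner; it only counts how many partition directories a query would have to read.

```python
# Hypothetical in-memory model of the comm table: one row list per partition directory.
partitions = {
    "year=2015": [(1, "this is comment1", "2015-10-10"),
                  (2, "this is comment2", "2015-10-10"),
                  (3, "this is comment3", "2015-10-10")],
    "year=2016": [(4, "this is comment1", "2016-11-13"),
                  (5, "this is comment2", "2016-11-15"),
                  (6, "this is comment3", "2016-11-16")],
    "year=2017": [(7, "this is comment1", "2017-10-10"),
                  (8, "this is comment2", "2017-10-10"),
                  (9, "this is comment3", "2017-10-10")],
}


def scan(partitions: dict, year=None):
    """Return (matching rows, number of partition directories read)."""
    if year is None:
        chosen = list(partitions)  # no partition filter: every directory is read
    else:
        # pruning: only the directory whose name matches the filter is read
        chosen = [p for p in partitions if p == f"year={year}"]
    rows = [row for p in chosen for row in partitions[p]]
    return rows, len(chosen)


rows, dirs_read = scan(partitions, year="2016")
print(dirs_read, [r[0] for r in rows])
```

Filtering on year touches one directory out of three; dropping the filter reads all of them.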

Creating a two-level partition

create table if not exists comm1(
id int,
comment String,
dt String
)
partitioned by(year String, month int)
row format delimited fields terminated by '\t'
;


load data local inpath '/home/olddata/d1' into table comm1 partition(year='2015',month=10);
load data local inpath '/home/olddata/d2' into table comm1 partition(year='2016',month=11);

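This creates two levels of directories,

with the actual data files in the bottom level.

The table can also be queried directly.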

Modifying partitions:
Show partitions: show partitions comm1;
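For a multi-level table such as comm1, show partitions prints one line per partition in the path-like form year=2015/month=10. The hypothetical helper below splits such a name back into column/value pairs. It is a sketch that assumes values contain no escaped '/' or '='; note that every value comes back as a string, even though month is declared as int.

```python
def parse_partition_name(name: str) -> dict:
    """Split a partition name like 'year=2015/month=10' into {column: value}.

    Hypothetical helper; assumes no escaped '/' or '=' inside the values.
    """
    spec = {}
    for level in name.split("/"):          # one level per partition column
        col, _, val = level.partition("=")  # split on the first '=' only
        spec[col] = val
    return spec


for line in ["year=2015/month=10", "year=2016/month=11"]:
    print(parse_partition_name(line))
```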
Add partitions: alter table comm add partition(year="2019") partition(year="2020");

Modifying an existing partition:
Changing the metadata (renaming):
alter table comm partition(year="2020") rename to partition(year="2018");
Pointing a partition at existing data:
alter table comm partition(year="2018") set location 'hdfs://qf/user/hive/warehouse/olqf.db/comm/year=2016';

At this point the 2018 partition still showed no data.
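A digression:

Accessing the data failed. The statement above,
alter table comm partition(year="2018") set location 'hdfs://qf/user/hive/warehouse/olqf.db/comm/year=2016';

had originally been written as
alter table comm partition(year="2018") set location 'hdfs://hadoop01::9000/user/hive/warehouse/olqf.db/comm/year=2016';

It has to use qf instead, because our Hadoop cluster runs in high-availability (HA) mode
rather than as an ordinary single-NameNode cluster, so the URI names the HA nameservice (qf) instead of one NameNode's host and port.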
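The mistake in the digression can be caught mechanically. The sketch below, using only Python's urllib.parse, checks that a location URI uses the hdfs scheme and that its authority is exactly the HA nameservice with no host:port. It assumes the nameservice ID is qf, as in this cluster (on a real cluster it is defined by the dfs.nameservices property). The function check_location is hypothetical; it also shows that the original 'hadoop01::9000' authority is malformed on its own, independent of HA.

```python
from urllib.parse import urlsplit


def check_location(uri: str, nameservice: str) -> str:
    """Return 'ok', or a short reason the URI does not fit an HA cluster.

    Hypothetical check; assumes that on an HA cluster the URI authority must be
    the logical nameservice (here 'qf') with no host:port.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        return "not an hdfs:// URI"
    try:
        port = parts.port  # raises ValueError for an authority like 'hadoop01::9000'
    except ValueError:
        return "malformed host:port"
    if parts.hostname != nameservice or port is not None:
        return f"authority should be the nameservice {nameservice!r}"
    return "ok"


path = "/user/hive/warehouse/olqf.db/comm/year=2016"
for uri in (f"hdfs://qf{path}", f"hdfs://hadoop01::9000{path}", f"hdfs://hadoop01:9000{path}"):
    print(uri, "->", check_location(uri, "qf"))
```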

Adding a partition and pointing it at data in one step:
alter table comm add partition(year="2021") location 'hdfs://qf/user/hive/warehouse/olqf.db/comm/year=2016';

Then check whether the data was picked up.

Dropping partitions:
alter table comm drop partition(year="2021");
alter table comm drop partition(year="2020"), partition(year="2019");

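Note that the second statement separates multiple partitions with commas when dropping them,

whereas adding multiple partitions separates them with spaces.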
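The separator rule is easy to get wrong when statements are generated by a script. The two hypothetical Python helpers below build the add and drop statements for the comm table with the correct separator for each. They are a sketch that assumes trusted input, since the table name and values are not escaped.

```python
def add_partitions_sql(table: str, years: list) -> str:
    """ADD takes several partition(...) clauses separated by spaces (hypothetical helper, no escaping)."""
    specs = " ".join(f"partition(year='{y}')" for y in years)
    return f"alter table {table} add {specs};"


def drop_partitions_sql(table: str, years: list) -> str:
    """DROP takes several partition(...) clauses separated by commas (hypothetical helper, no escaping)."""
    specs = ", ".join(f"partition(year='{y}')" for y in years)
    return f"alter table {table} drop {specs};"


print(add_partitions_sql("comm", ["2019", "2020"]))
print(drop_partitions_sql("comm", ["2020", "2019"]))
```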

Partitions 2021, 2020 and 2019 are all gone.

====================================================================

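Hive partitions

Static partitions: the partition values are known in advance, and data can be loaded with load.
Dynamic partitions: the partition values are not known in advance, and data cannot be loaded with load (it has to go through insert ... select).
Mixed partitions: static and dynamic partition columns at the same time.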
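The three kinds differ only in which partition columns get a value in the partition(...) clause. In the Python sketch below, a clause is represented as a dict from partition column to value, with None standing for a column whose value is left out and must come from the data. Both this representation and the name partition_kind are hypothetical.

```python
def partition_kind(spec: dict) -> str:
    """Classify a PARTITION(...) clause as 'static', 'dynamic' or 'mixed'.

    Hypothetical model: spec maps each partition column to its value, or to
    None when the value comes from the query's data (a dynamic column).
    """
    static = [col for col, val in spec.items() if val is not None]
    dynamic = [col for col, val in spec.items() if val is None]
    if not dynamic:
        return "static"
    if not static:
        return "dynamic"
    return "mixed"


print(partition_kind({"year": "2015"}))                 # partition(year='2015')
print(partition_kind({"year": None, "month": None}))    # partition(year, month)
print(partition_kind({"year": "2016", "month": None}))  # partition(year=2016, month)
```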

Create a table, load the data, and look at it

create table if not exists comm_tmp1(
id int,
comment String,
year String,
month String
)
row format delimited fields terminated by '\t'
;

load data local inpath '/home/olddata/ct' into table comm_tmp1;

Next, create a partitioned table comm3 and take a look at

how data is loaded with dynamic partitions.

Hive reports that loading data dynamically like this
requires nonstrict mode:
set hive.exec.dynamic.partition.mode=nonstrict;

Load again.

This time it works, and the query results look correct.

The result files end up in the table's partition directories.
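With dynamic partitions, Hive takes the partition values from the last columns of the SELECT, in the same order as the columns in the partition(...) clause, and writes each row into the matching directory. The Python sketch below models only that routing step. It assumes comm3 is partitioned by (year, month) with id and comment as regular columns (its DDL is not shown above). The rows are hypothetical and shaped like comm_tmp1, because the contents of the ct file are not given.

```python
from collections import defaultdict


def route_dynamic(rows, partition_cols):
    """Group INSERT ... SELECT output rows into partition directories.

    Hypothetical model: each row is (data columns..., partition values...);
    the last len(partition_cols) columns supply the dynamic partition values,
    in the order the PARTITION clause lists them. partition_cols must be non-empty.
    """
    n = len(partition_cols)
    out = defaultdict(list)
    for row in rows:
        data, values = row[:-n], row[-n:]
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, values))
        out[path].append(data)
    return dict(out)


# Hypothetical rows shaped like comm_tmp1 (id, comment, year, month).
rows = [
    (1, "this is comment1", "2015", "10"),
    (4, "this is comment1", "2016", "11"),
    (5, "this is comment2", "2016", "11"),
]
for path, data in route_dynamic(rows, ["year", "month"]).items():
    print(path, data)
```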

Mixed partitions:

create table if not exists comm4(
id int,
comment String
)
partitioned by(year String, month int)
row format delimited fields terminated by '\t'
;

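First create the mixed-partition table with the statement above.

Then set strict mode and load the data. This is a mixed load: the year is specified but the month is not,
which is allowed.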

set hive.exec.dynamic.partition.mode=strict;

In strict mode, at least one partition column must be static; if all partition columns are dynamic, the mode must be nonstrict.
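The rule in the previous sentence can be written as a small check. The Python sketch below uses the same hypothetical dict representation of a partition(...) clause (None means dynamic). The mode argument mirrors hive.exec.dynamic.partition.mode, and the function name is hypothetical.

```python
def check_dynamic_partition_mode(spec: dict, mode: str) -> None:
    """Raise ValueError if the PARTITION(...) clause is not allowed in this mode.

    Hypothetical model: spec maps partition column -> value, with None for a
    dynamic column; mode mirrors hive.exec.dynamic.partition.mode.
    """
    has_static = any(val is not None for val in spec.values())
    if mode == "strict" and not has_static:
        raise ValueError("strict mode requires at least one static partition column")


# comm4 load: year given, month dynamic -> accepted in strict mode
check_dynamic_partition_mode({"year": "2016", "month": None}, "strict")

# fully dynamic: rejected in strict mode, accepted in nonstrict mode
try:
    check_dynamic_partition_mode({"year": None, "month": None}, "strict")
except ValueError as err:
    print("rejected:", err)
check_dynamic_partition_mode({"year": None, "month": None}, "nonstrict")
```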

insert into table comm4 partition(year=2016,month)
select id, comment, month from comm_tmp1
where year=2016
;

Check the results.

====================================================

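First switch to strict mode, then try the queries. This query-level strict mode is set with set hive.mapred.mode=strict; and is a separate setting from hive.exec.dynamic.partition.mode.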

在这里插入图片描述

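2. Queries on a partitioned table with no WHERE clause, or whose WHERE clause does not filter on a partition column

Likewise, select * from comm; is not allowed;
a WHERE condition is needed:
select * from comm where year=2017;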

In strict mode, a query on a partitioned table must filter on a partition column.
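A simplified Python model of this check. It only looks at which column names appear in the WHERE clause, whereas Hive's own check works on the partition-pruning predicate it derives from the query. The function name and inputs are hypothetical.

```python
def check_partition_filter(partition_cols, where_cols) -> None:
    """Reject a query on a partitioned table whose WHERE never mentions a partition column.

    Hypothetical, simplified model: only the column names used in WHERE are inspected.
    """
    if partition_cols and not set(partition_cols) & set(where_cols):
        raise ValueError("strict mode: no partition filter on a partitioned table")


check_partition_filter(["year"], ["year"])   # select * from comm where year=2017 -> allowed
try:
    check_partition_filter(["year"], [])     # select * from comm -> rejected
except ValueError as err:
    print("rejected:", err)
```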


3. Sorting without limit:

That is, in strict mode a query with order by must include limit.

select *
from comm3
where year = 2016
order by id desc
limit 2
;

Without limit the query fails; with limit it succeeds.
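The reason for this rule is that Hive's order by produces a total order through a single reducer. Without a limit, that one reducer has to sort and emit every row. With limit 2, only the top two rows are needed. The Python sketch below shows what order by id desc limit 2 computes, using heapq.nlargest, which keeps at most k items while it scans. It is an illustration of top-k, not Hive's execution plan. The rows are hypothetical, assuming comm3's year=2016 partition holds rows like d2.

```python
import heapq

# Hypothetical contents of comm3's year=2016 partition (shaped like d2).
rows = [
    (4, "this is comment1", "2016-11-13"),
    (5, "this is comment2", "2016-11-15"),
    (6, "this is comment3", "2016-11-16"),
]

# order by id desc limit 2: keep only the 2 rows with the largest id while scanning
top2 = heapq.nlargest(2, rows, key=lambda row: row[0])
for row in top2:
    print(row)
```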

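Those are the five kinds of queries Hive blocks in strict mode:

1. Cartesian-product queries (joins without a join condition)
2. Queries on a partitioned table with no WHERE clause, or whose WHERE clause does not filter on a partition column
3. order by without limit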

The other two are:
4. Comparing bigints with strings
5. Comparing bigints with doubles
Neither kind of comparison is allowed.
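The usual explanation for blocking these comparisons is loss of precision. The comparison is carried out on double values, and a double has a 53-bit significand, so it cannot represent every 64-bit BIGINT exactly. The Python sketch below shows two distinct integers collapsing to the same double. Python's float is an IEEE 754 double, like Hive's double; treating the string case as a conversion to double is an assumption of this sketch.

```python
# A BIGINT is a 64-bit integer; a double only has a 53-bit significand.
big = 2**53 + 1          # 9007199254740993 fits easily in a BIGINT
as_double = float(big)   # the value a double-based comparison works with

print(as_double == float(2**53))  # two different BIGINTs become the same double
print(int(as_double) == big)      # the original value is not recovered

# A numeric string converted to double is rounded the same way (assumed conversion path).
print(float("9007199254740993") == float(9007199254740992))
```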