5. Hive Data Warehouse: Hive Partitions and Dynamic Partitions


Last modified 2022-02-20 16:18:02


License: CC 4.0 BY-SA

Tags: #hive #data-warehouse #hadoop

First published 2022-02-20 15:08:49

Contents

Hive Partitions
Hive Dynamic Partitions

Hive Partitions

Partitions and partitioned tables:

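A partitioned table declares its partition columns when the table is created. On disk, each partition is simply a subdirectory under the table's directory on HDFS.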
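
To make the directory layout concrete, here is a small Python sketch that builds the path of a partition. Python is used only for illustration and is not part of Hive. The sketch follows Hive's default layout of column=value subdirectories under the table location. The function name partition_path and the two-level /warehouse/orders table with dt and region columns are hypothetical, and Hive's escaping of special characters in partition values is not modeled.

```python
from posixpath import join


def partition_path(table_location: str, spec: dict) -> str:
    """Build the directory path of one partition (hypothetical helper).

    Each partition column becomes a `column=value` subdirectory, nested in
    the order the columns appear in `spec` (i.e. the PARTITIONED BY order).
    Escaping of special characters in values is ignored in this sketch.
    """
    parts = [f"{col}={val}" for col, val in spec.items()]
    return join(table_location, *parts)


# Single-level partition of the students_pt1 table created below.
print(partition_path("/student/input1", {"pt": "20220220"}))
# /student/input1/pt=20220220

# Hypothetical two-level table partitioned by date, then region.
print(partition_path("/warehouse/orders", {"dt": "20220220", "region": "beijing"}))
# /warehouse/orders/dt=20220220/region=beijing
```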

When a query names the partitions it needs, Hive reads only those partitions instead of scanning the whole table, which makes the query faster.

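Purpose: partition pruning. Skipping the partitions a query does not need avoids a full table scan and reduces the amount of data MapReduce has to process, which improves efficiency.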
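
Partition pruning can be modeled with a toy Python sketch. The in-memory TABLE dictionary stands in for the table's HDFS directories, and its sample rows are made up. Filtering on pt == "20220219" plays the role of WHERE pt='20220219'. A real Hive query gets the partition list from the metastore and prunes it while planning the query. This sketch only shows the effect: pruned directories are never read.

```python
# Toy stand-in for HDFS: partition directory -> lines stored in it (made-up data).
TABLE = {
    "/student/input1/pt=20220218": ["1,stu_a,22,F,class1"],
    "/student/input1/pt=20220219": ["2,stu_b,21,M,class2", "3,stu_c,23,F,class1"],
    "/student/input1/pt=20220220": ["4,stu_d,22,M,class3"],
}


def pt_of(path):
    """Return the value of the last 'pt=...' path component."""
    _, _, value = path.rsplit("/", 1)[-1].partition("=")
    return value


def scan(table, pt_filter=None):
    """Return (rows, directories_read).

    With pt_filter, directories whose pt fails the filter are skipped without
    being read (partition pruning); without it, every directory is read
    (full table scan).
    """
    rows, dirs_read = [], []
    for path, lines in table.items():
        if pt_filter is not None and not pt_filter(pt_of(path)):
            continue  # pruned: this directory is never opened
        dirs_read.append(path)
        rows.extend(lines)
    return rows, dirs_read


full_rows, full_dirs = scan(TABLE)
pruned_rows, pruned_dirs = scan(TABLE, lambda pt: pt == "20220219")
print(len(full_dirs), len(full_rows))      # 3 4
print(len(pruned_dirs), len(pruned_rows))  # 1 2
```

Without a filter on pt, every directory is read. This is why a query on a partitioned table should always filter on the partition column.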

In a typical company Hive deployment, almost every table is partitioned, usually by date or by region.

When querying a partitioned table, remember to filter on the partition column.

More partitions are not always better. Generally use no more than three levels, and decide based on actual business needs.
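
A quick calculation shows why the number of levels matters. Each level nests under the previous one, so in the worst case the number of leaf partitions is the product of the distinct values at every level. The cardinalities below (365 days, 34 regions, 24 hours) are hypothetical, and the helper name leaf_partitions is made up for this sketch.

```python
from math import prod


def leaf_partitions(distinct_values_per_level):
    """Worst-case number of leaf partitions: each level nests under the
    previous one, so the counts multiply (assuming every combination occurs)."""
    return prod(distinct_values_per_level)


# Hypothetical cardinalities: daily dates for a year, 34 regions, 24 hours.
print(leaf_partitions([365]))          # 365
print(leaf_partitions([365, 34]))      # 12410
print(leaf_partitions([365, 34, 24]))  # 297840
```

Each leaf partition is a separate directory and a separate entry in the metastore. In this example, adding a third level multiplies that bookkeeping by 24.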

Creating a partitioned table

When creating an external table, EXTERNAL is usually used together with LOCATION.

create external table students_pt1
(
    id bigint
    ,name string
    ,age int
    ,gender string
    ,clazz string
)
PARTITIONED BY(pt string)  -- partition column; it is not repeated in the column list above
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/student/input1';
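
Because pt appears only in PARTITIONED BY, the data files under /student/input1 hold just the five regular columns, and Hive fills in pt from the directory name. The Python sketch below models that mapping for one file. The function names, the file name students.txt and the sample line are hypothetical. Type handling is reduced to converting id and age to int.

```python
COLUMNS = ["id", "name", "age", "gender", "clazz"]  # regular columns of students_pt1


def partition_value(path: str, column: str = "pt") -> str:
    """Extract the partition value from a path such as .../pt=20220220/file."""
    for component in path.split("/"):
        if component.startswith(column + "="):
            return component[len(column) + 1:]
    raise ValueError(f"no {column}= directory in {path}")


def read_partition_file(path: str, lines: list) -> list:
    """Turn the raw lines of one data file into rows of students_pt1.

    Fields are split on ',' (FIELDS TERMINATED BY ','); the pt value is not
    in the file, it is taken from the partition directory name.
    """
    pt = partition_value(path)
    rows = []
    for line in lines:
        row = dict(zip(COLUMNS, line.split(",")))
        row["id"] = int(row["id"])    # bigint
        row["age"] = int(row["age"])  # int
        row["pt"] = pt
        rows.append(row)
    return rows


rows = read_partition_file("/student/input1/pt=20220220/students.txt",
                           ["1,stu_a,22,F,class1"])
print(rows)
# [{'id': 1, 'name': 'stu_a', 'age': 22, 'gender': 'F', 'clazz': 'class1', 'pt': '20220220'}]
```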

If you now describe the table (for example with desc students_pt1), the output contains an extra section: # Partition Information

Adding partitions

alter table students_pt1 add partition(pt='20220220');

alter table students_pt1 add partition(pt='20220219');

alter table students_pt1 add partition(pt='20220218');

alter table students_pt1 add partition(pt='20220221');

alter table students_pt1 add partition(pt='20220222');

alter table students_pt1 add partition(pt