文章目录
大家好啊,欢迎来到小张的频道,今天介绍一下Hive的动态分区,动态分区有一点非常重要就是查询一张表,这张表可以是普通表也可以是分区表,然后根据查询的最后一个字段动态的将数据分配到目标表的各个分区中,下面我们就举一个查询普通表的例子,其实查询分区表和查询普通表一样,就是查询时指明一下分区。
1. 创建分区表和普通表
下面我们是要查询普通表,然后动态的插入到分区表中
- 创建分区表
DROP TABLE IF EXISTS test.userinfo;
CREATE TABLE test.userinfo(
userid STRING COMMENT '用户编号',
mobile STRING COMMENT '手机号码',
regdate STRING COMMENT '注册日期')
COMMENT '用户信息'
PARTITIONED BY (dt string)
row format delimited fields terminated by ',';
- 创建普通表
drop table if exists test.tmp1;
create table test.tmp1 as select * from test.userinfo;
alter table test.tmp1 set serdeproperties('field.delim'=',');
说明:tmp1 是普通表,默认的字段分割符’\001’,所以这里我们要修改一下字段分隔符
2. 加载数据到普通表
– 数据(/opt/datas/hive/userinfo.dat)
001,13551111111,2020-03-01,2020-06-20
002,13561111111,2020-04-01,2020-06-20
003,13571111111,2020-05-01,2020-06-20
004,13581111111,2020-06-01,2020-06-20
002,13562222222,2020-04-01,2020-06-21
004,13582222222,2020-06-01,2020-06-21
005,13552222222,2020-06-21,2020-06-21
004,13333333333,2020-06-01,2020-06-22
005,13533333333,2020-06-21,2020-06-22
006,13733333333,2020-06-22,2020-06-22
001,13554444444,2020-03-01,2020-06-23
003,13574444444,2020-05-01,2020-06-23
005,13555554444,2020-06-21,2020-06-23
007,18600744444,2020-06-23,2020-06-23
008,18600844444,2020-06-23,2020-06-23
load data local inpath '/opt/datas/hive/userinfo.dat' into table test.tmp1;
+--------------+--------------+---------------+-------------+
| tmp1.userid | tmp1.mobile | tmp1.regdate | tmp1.dt |
+--------------+--------------+---------------+-------------+
| 001 | 13551111111 | 2020-03-01 | 2020-06-20 |
| 002 | 13561111111 | 2020-04-01 | 2020-06-20 |
| 003 | 13571111111 | 2020-05-01 | 2020-06-20 |
| 004 | 13581111111 | 2020-06-01 | 2020-06-20 |
| 002 | 13562222222 | 2020-04-01 | 2020-06-21 |
| 004 | 13582222222 | 2020-06-01 | 2020-06-21 |
| 005 | 13552222222 | 2020-06-21 | 2020-06-21 |
| 004 | 13333333333 | 2020-06-01 | 2020-06-22 |
| 005 | 13533333333 | 2020-06-21 | 2020-06-22 |
| 006 | 13733333333 | 2020-06-22 | 2020-06-22 |
| 001 | 13554444444 | 2020-03-01 | 2020-06-23 |
| 003 | 13574444444 | 2020-05-01 | 2020-06-23 |
| 005 | 13555554444 | 2020-06-21 | 2020-06-23 |
| 007 | 18600744444 | 2020-06-23 | 2020-06-23 |
| 008 | 18600844444 | 2020-06-23 | 2020-06-23 |
+--------------+--------------+---------------+-------------+
3. 使用动态分区加载数据到分区表
– 首先启用动态分区
set hive.exec.dynamic.partition=true; // 默认是true
set hive.exec.dynamic.partition.mode=nonstrict;
– 查询普通表动态插入分区表
insert overwrite table test.userinfo partition(dt)
select
userid,
mobile,
regdate,
dt
from
test.tmp1;
说明:上面那种写法还可以将 partition(dt) 去掉,最后也可以插入分区表中,可自行验证。所谓动态就是不用具体指明某一个分区日期,根据查询的最后一个字段,自行判断插入某个分区。
+------------------+------------------+-------------------+--------------+
| userinfo.userid | userinfo.mobile | userinfo.regdate | userinfo.dt |
+------------------+------------------+-------------------+--------------+
| 001 | 13551111111 | 2020-03-01 | 2020-06-20 |
| 002 | 13561111111 | 2020-04-01 | 2020-06-20 |
| 003 | 13571111111 | 2020-05-01 | 2020-06-20 |
| 004 | 13581111111 | 2020-06-01 | 2020-06-20 |
| 002 | 13562222222 | 2020-04-01 | 2020-06-21 |
| 004 | 13582222222 | 2020-06-01 | 2020-06-21 |
| 005 | 13552222222 | 2020-06-21 | 2020-06-21 |
| 004 | 13333333333 | 2020-06-01 | 2020-06-22 |
| 005 | 13533333333 | 2020-06-21 | 2020-06-22 |
| 006 | 13733333333 | 2020-06-22 | 2020-06-22 |
| 001 | 13554444444 | 2020-03-01 | 2020-06-23 |
| 003 | 13574444444 | 2020-05-01 | 2020-06-23 |
| 005 | 13555554444 | 2020-06-21 | 2020-06-23 |
| 007 | 18600744444 | 2020-06-23 | 2020-06-23 |
| 008 | 18600844444 | 2020-06-23 | 2020-06-23 |
+------------------+------------------+-------------------+--------------+
0: jdbc:hive2://hadoop102:10000> show partitions test.userinfo;
+----------------+
| partition |
+----------------+
| dt=2020-06-20 |
| dt=2020-06-21 |
| dt=2020-06-22 |
| dt=2020-06-23 |
+----------------+
4 rows selected (0.125 seconds)