Hive Partitioned Tables

Copyright: CC 4.0 BY-SA

Partitioned tables:

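Each partition of a partitioned table corresponds to a separate directory in HDFS, and that directory holds all of the partition's data files. Partitioning in Hive simply means splitting data into subdirectories. A large dataset is divided into smaller ones according to business needs, for example one directory per month.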
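The directory layout can be sketched in Python. This is a simplified model that assumes the default warehouse location /user/hive/warehouse. The helper names partition_dir and parse_partition_dir are hypothetical. Real Hive also escapes special characters in partition values, and this sketch does not.

```python
from posixpath import join


def partition_dir(table_dir: str, spec: dict[str, str]) -> str:
    """Build a partition's directory: one key=value level per partition column, in order."""
    levels = [f"{key}={value}" for key, value in spec.items()]
    return join(table_dir, *levels)


def parse_partition_dir(table_dir: str, path: str) -> dict[str, str]:
    """Recover partition column values from the key=value levels below the table directory."""
    relative = path[len(table_dir):].strip("/")
    spec = {}
    for level in relative.split("/"):
        key, _, value = level.partition("=")
        spec[key] = value
    return spec


table_dir = "/user/hive/warehouse/emp_partition"
path = partition_dir(table_dir, {"month": "20190307"})
print(path)                                  # /user/hive/warehouse/emp_partition/month=20190307
print(parse_partition_dir(table_dir, path))  # {'month': '20190307'}
```

The partition column value is never stored inside the data files. It is recovered from the directory name, which is why the round trip above only needs the path.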

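At query time, an expression on the partition column in the WHERE clause selects only the partitions the query needs. Hive then reads just those directories instead of scanning the whole table, which can make queries much faster.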
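Partition pruning can be modeled with a hypothetical in-memory table that maps each month partition value to its rows. The sample data is made up. Only partitions whose value satisfies the predicate are opened, which mirrors the effect of a condition such as where month='20190307'. The function pruned_scan is an illustration and does not reproduce Hive's query planner.

```python
# Hypothetical table: each month partition value maps to the (empno, deptno) rows in its directory.
table = {
    "20190306": [(1, 20), (2, 30)],
    "20190307": [(3, 30), (4, 20), (5, 30)],
    "20190308": [(6, 30)],
}


def pruned_scan(table, predicate):
    """Open only the partitions whose value satisfies the predicate, then return their rows."""
    partitions_read = [value for value in table if predicate(value)]
    rows = [row for value in partitions_read for row in table[value]]
    return partitions_read, rows


read, rows = pruned_scan(table, lambda month: month == "20190307")
print(read)       # ['20190307']
print(len(rows))  # 3
```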

CREATE EXTERNAL TABLE IF NOT EXISTS default.emp_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
partitioned by(month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Load data into a partition:
load data local inpath '/home/wql/app/hData/emp.txt' into table emp_partition partition(month='20190307');
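The following is a rough Python model of what the LOAD statement above does with a partition spec. A local temporary directory stands in for HDFS, a plain set stands in for the metastore, and load_local is a hypothetical helper. The model creates the partition directory, copies the local file into it, and records the partition in the metadata. With LOCAL, the source file is copied. Without LOCAL, Hive moves a file that is already in HDFS.

```python
import shutil
import tempfile
from pathlib import Path


def load_local(local_file: Path, table_dir: Path, metastore: set, spec: dict[str, str]) -> Path:
    """Simplified model of LOAD DATA LOCAL INPATH ... INTO TABLE ... PARTITION(...)."""
    part_dir = table_dir.joinpath(*(f"{k}={v}" for k, v in spec.items()))
    part_dir.mkdir(parents=True, exist_ok=True)
    target = part_dir / local_file.name
    shutil.copy(local_file, target)      # LOCAL: the source file is copied, not moved
    metastore.add(tuple(spec.items()))   # the partition is recorded in the metadata
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "emp.txt"
    src.write_text("1,A,CLERK,2,2019-03-07,800.0,,20\n")  # made-up sample line
    metastore: set = set()
    target = load_local(src, root / "warehouse" / "emp_partition", metastore, {"month": "20190307"})
    print(target.relative_to(root).as_posix())  # warehouse/emp_partition/month=20190307/emp.txt
    print(metastore)                            # {(('month', '20190307'),)}
```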

Query:
select * from emp_partition where month='20190307';

select count(distinct deptno) from emp_partition where month='20190307';
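
A table can also be partitioned by several columns. Each additional column adds another directory level, for example month=.../day=...: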
CREATE EXTERNAL TABLE IF NOT EXISTS default.emp_partition2(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
partitioned by(month string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
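With two partition columns, the first column listed in partitioned by is the outer directory and the second is nested inside it. The sketch below builds such paths and filters them by equality conditions on the partition keys. The helpers nested_dirs and matching are hypothetical, and the sketch assumes the default warehouse location.

```python
from itertools import product


def nested_dirs(table_dir: str, months: list[str], days: list[str]) -> list[str]:
    """Each (month, day) pair becomes a day=... directory nested inside its month=... directory."""
    return [f"{table_dir}/month={m}/day={d}" for m, d in product(months, days)]


def matching(dirs: list[str], **conditions: str) -> list[str]:
    """Keep directories whose key=value levels satisfy every equality condition."""
    result = []
    for d in dirs:
        levels = dict(level.split("=", 1) for level in d.split("/") if "=" in level)
        if all(levels.get(key) == value for key, value in conditions.items()):
            result.append(d)
    return result


dirs = nested_dirs("/user/hive/warehouse/emp_partition2", ["201903"], ["06", "07"])
print(matching(dirs, month="201903", day="07"))
# ['/user/hive/warehouse/emp_partition2/month=201903/day=07']
print(len(matching(dirs, month="201903")))  # 2: a condition on month alone keeps every day
```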

Query:
select count(distinct deptno) from emp_partition2 where month='201903' and day='07';

Create a non-partitioned dept table:
CREATE TABLE IF NOT EXISTS default.dept_nopart(
deptno int,
dname string,
loc string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
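The ROW FORMAT clause tells Hive how to split each text line into columns. The sketch below is a simplified model that roughly mirrors how Hive's default text SerDe treats a delimited line. Missing trailing fields and values that cannot be converted to the column type become NULL, shown here as None, and extra fields are ignored. The function parse_row and the sample lines are hypothetical.

```python
def parse_row(line: str, schema: list[tuple[str, type]]) -> dict:
    """Split one text line on tabs and convert each field to its column type."""
    fields = line.rstrip("\n").split("\t")
    row = {}
    for i, (name, col_type) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None   # missing trailing field -> NULL
        try:
            row[name] = None if raw is None else col_type(raw)
        except ValueError:
            row[name] = None                           # unconvertible value -> NULL
    return row


DEPT_SCHEMA = [("deptno", int), ("dname", str), ("loc", str)]
print(parse_row("10\tACCOUNTING\tNEW YORK\n", DEPT_SCHEMA))
# {'deptno': 10, 'dname': 'ACCOUNTING', 'loc': 'NEW YORK'}
print(parse_row("x\tSALES", DEPT_SCHEMA))
# {'deptno': None, 'dname': 'SALES', 'loc': None}
```

A practical consequence: if a file uses commas while the table expects tabs, each line arrives as a single field. In this model the whole row then comes back as NULLs.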

Create a partitioned dept table:
CREATE TABLE IF NOT EXISTS default.dept_part(
deptno int,
dname string,
loc string)
partitioned by(day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Create a partition directory manually (dfs runs HDFS file commands from the Hive CLI):
dfs -mkdir -p /user/hive/warehouse/dept_part/day=20190307;

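Upload the data file into that directory:
dfs -put /home/wql/app/hData/dept.txt /user/hive/warehouse/dept_part/day=20190307;

Problem: a select on dept_part now returns no data, because the metastore has no record that this partition exists.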
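The symptom can be modeled in a few lines. In this hypothetical sketch, a dict stands in for the HDFS directories and a list stands in for the partitions registered in the metastore. A query only visits registered partitions, so a directory created by hand stays invisible until it is registered.

```python
# Hypothetical model: directory name -> lines in that directory (state after dfs -put),
# and the list of partition names the metastore knows about (nothing registered yet).
filesystem = {"day=20190307": ["10\tACCOUNTING\tNEW YORK"]}
metastore_partitions: list[str] = []


def select_all(metastore_partitions, filesystem):
    """Read only directories listed in the metastore; any other directory is never opened."""
    rows = []
    for name in metastore_partitions:
        rows.extend(filesystem.get(name, []))
    return rows


print(len(select_all(metastore_partitions, filesystem)))  # 0: the data exists but is invisible
metastore_partitions.append("day=20190307")              # effect of registering the partition
print(len(select_all(metastore_partitions, filesystem)))  # 1
```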

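Solution: repair the partition metadata. Either let Hive scan the table directory and register every partition directory missing from the metastore:
msck repair table dept_part;
or register the specific partition explicitly:
alter table dept_part add partition(day='20190307');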
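The idea behind msck repair table can be sketched as follows. Walk the table directory to the depth of the partition keys, keep only the key=value directories that match the table's partition columns, and register any that the metastore lacks. This model uses a temporary local directory and a set as the metastore, and repair is a hypothetical helper. Only the add-missing-partitions behavior is modeled.

```python
import tempfile
from pathlib import Path


def repair(table_dir: Path, partition_keys: list[str], metastore: set[str]) -> list[str]:
    """Register partition directories that exist on disk but are missing from the metastore."""
    pattern = "/".join("*" for _ in partition_keys)  # one glob level per partition column
    added = []
    for d in sorted(table_dir.glob(pattern)):
        if not d.is_dir():
            continue
        rel = d.relative_to(table_dir).as_posix()
        levels = rel.split("/")
        if not all("=" in lvl for lvl in levels):
            continue  # not a key=value directory
        if [lvl.split("=", 1)[0] for lvl in levels] != partition_keys:
            continue  # keys do not match this table's partition columns
        if rel not in metastore:
            metastore.add(rel)
            added.append(rel)
    return added


with tempfile.TemporaryDirectory() as tmp:
    table_dir = Path(tmp) / "dept_part"
    (table_dir / "day=20190307").mkdir(parents=True)  # as created by dfs -mkdir
    metastore: set[str] = set()
    print(repair(table_dir, ["day"], metastore))  # ['day=20190307']
    print(repair(table_dir, ["day"], metastore))  # [] (already registered)
```

The explicit alter table ... add partition form corresponds to adding a single known name to the metastore, which needs no directory scan.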