Hive Plan (Part 2): Partitions

License: CC 4.0 BY-SA

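A partition works somewhat like an index: a query that filters on the partition columns reads only the matching partition directories, which avoids a full table scan.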
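The pruning idea can be sketched as a small Python toy model. This is not Hive's implementation. The table is modeled as a mapping from partition directory (named in Hive's column=value style) to its data files, and the directory names and file counts are made up for illustration.

```python
from typing import Optional

# Toy model of partition pruning, not Hive's actual implementation.
# Each partition directory (Hive-style column=value naming) maps to the
# data files it holds; names and file counts are made up.
table = {
    "time_month=201701/creator=eric": ["part-0", "part-1"],
    "time_month=201702/creator=eric": ["part-0"],
    "time_month=201703/creator=eric": ["part-0", "part-1", "part-2"],
}


def parse_partition_dir(path: str) -> dict:
    """Turn 'k1=v1/k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def files_to_scan(table: dict, predicate: Optional[dict]) -> list:
    """Return the files a query has to read.

    predicate is an equality filter on partition columns such as
    {"time_month": "201702"}; None means no partition filter, so every
    partition (the whole table) is scanned.
    """
    scanned = []
    for part_dir, files in table.items():
        values = parse_partition_dir(part_dir)
        if predicate is None or all(values.get(k) == v for k, v in predicate.items()):
            scanned.extend(f"{part_dir}/{name}" for name in files)
    return scanned


print(len(files_to_scan(table, None)))  # 6 files: full scan
print(files_to_scan(table, {"time_month": "201702"}))
```

With the filter on time_month, only the one file under time_month=201702 is read instead of all six.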

With an external table, dropping the table does not delete its data from HDFS; only the table's metadata is removed.
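The difference between managed and external tables on DROP TABLE can be pictured with a toy model in Python. It is illustrative only: the metastore and HDFS are plain Python containers, the table names and paths are hypothetical, and details such as the HDFS trash directory are ignored.

```python
# Toy model of DROP TABLE for managed vs. external tables (illustrative
# only; ignores details such as the HDFS trash directory).
# metastore: table name -> {"external": bool, "location": str}
# hdfs: set of directories that currently exist
def drop_table(metastore: dict, hdfs: set, name: str) -> None:
    info = metastore.pop(name)  # table metadata is removed in both cases
    if not info["external"]:
        hdfs.discard(info["location"])  # managed table: data is deleted too
    # external table: the directory on HDFS is left in place


metastore = {
    "bike_managed": {"external": False, "location": "/user/hive/warehouse/bike_managed"},
    "bike_external": {"external": True, "location": "/data/bike"},
}
hdfs = {"/user/hive/warehouse/bike_managed", "/data/bike"}

drop_table(metastore, hdfs, "bike_managed")
drop_table(metastore, hdfs, "bike_external")
print(metastore)  # {}
print(hdfs)       # {'/data/bike'}
```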

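# From the shell, hive -e runs one or more statements separated by semicolons
hive -e '<statement1>; <statement2>;'
--Inside the Hive CLI: show the current local working directory
!pwd;
--Inside the Hive CLI: list an HDFS directory
dfs -ls / ;
--Show a table's columns, including its partition columns
desc <table_name>;
--List the partitions of a partitioned table
show partitions <table_name>;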
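If the output of show partitions is captured as text (for example through hive -e), each line has the form column=value/column=value. A minimal Python sketch for turning that text into dicts follows. It assumes one partition per line. The sample text is made up, and Hive's escaping of special characters in partition values is not handled.

```python
# Parse SHOW PARTITIONS output (one "col=value/col=value" line per
# partition) into dicts. The sample text is made up; Hive's escaping of
# special characters in partition values is not handled here.
sample_output = """time_month=201701/creator=eric
time_month=201702/creator=eric
"""


def parse_show_partitions(text: str) -> list:
    partitions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        partitions.append(dict(seg.split("=", 1) for seg in line.split("/")))
    return partitions


print(parse_show_partitions(sample_output))
```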

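Partitions can also be seen from where the files are stored: each partition is a subdirectory of the table's directory, named column=<value>, with one nesting level per partition column.

The partition values are generally not stored in the loaded data files themselves; Hive takes them from the partition directory names.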
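The way a row is assembled at query time can be modeled in a few lines of Python: the data fields come from the file, and the partition values come from the directory path. This is a toy model, not Hive's reader. Only three of the bike table's data columns are used, and the file path and sample line are hypothetical.

```python
# Toy model of how a row is assembled at query time: data fields come
# from the file, partition values come from the directory path.
# Only three of the bike table's data columns are used; the path and the
# sample line are hypothetical.
data_columns = ["tripduration", "starttime", "stoptime"]
partition_columns = ["time_month", "creator"]


def read_row(file_path: str, line: str) -> dict:
    # Directory segments such as "time_month=201701" carry the partition values.
    dir_segments = file_path.split("/")[:-1]
    part_values = dict(seg.split("=", 1) for seg in dir_segments if "=" in seg)
    row = dict(zip(data_columns, line.split(",")))  # fields stored in the file
    for col in partition_columns:
        row[col] = part_values[col]  # appended from the path, not the file
    return row


row = read_row(
    "/user/hive/warehouse/bikepatition.db/bike/time_month=201701/creator=eric/bike.csv",
    "695,2017-01-01 00:00:21,2017-01-01 00:11:57",
)
print(row)
```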

--Create the database and the partitioned table
create database if not exists bikepatition
comment 'test database';
create table if not exists bikepatition.bike(
tripduration string, 
starttime string,
stoptime string,
start_station_id string,
start_station_name string,
start_station_latitude string,
start_station_longitude string,
end_station_id string,
end_station_name string,
end_station_latitude string,
end_station_longitude string,
bikeid string,
usertype string,
birth_year string,
gender string
)
--Note: PARTITIONED BY must come before the ROW FORMAT clause
--Partition by month (time_month), then by creator
partitioned by (time_month string,creator string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS  TEXTFILE;
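In this statement the partition columns are declared in PARTITIONED BY rather than in the column list. Hive rejects a partition column whose name repeats a data column, and in the schema that queries see, the partition columns come after the data columns. A small Python sketch of that rule follows, using the bike table's columns. The function name is hypothetical and this is not Hive code.

```python
# Sketch of the schema rule behind PARTITIONED BY (not Hive code):
# partition columns are declared separately, must not repeat a data column,
# and are appended after the data columns in the schema queries see.
data_columns = [
    "tripduration", "starttime", "stoptime",
    "start_station_id", "start_station_name",
    "start_station_latitude", "start_station_longitude",
    "end_station_id", "end_station_name",
    "end_station_latitude", "end_station_longitude",
    "bikeid", "usertype", "birth_year", "gender",
]
partition_columns = ["time_month", "creator"]


def table_schema(data_cols: list, part_cols: list) -> list:
    # Hive column names are case-insensitive, so compare in lower case.
    clash = {c.lower() for c in data_cols} & {c.lower() for c in part_cols}
    if clash:
        raise ValueError(f"partition column repeats a data column: {sorted(clash)}")
    return data_cols + part_cols


schema = table_schema(data_columns, partition_columns)
print(len(schema), schema[-2:])  # 17 ['time_month', 'creator']
```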

Loading data

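--Load with LOAD DATA and give the partition values
--Two partition columns produce a two-level directory
--Different files can be loaded into the same table, each into its own partition
load data local inpath '<absolute_local_path>' into table <table_name> partition(time_month='<partition_value>', creator='eric');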
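The two-level directory that such a load writes into can be sketched in Python. Assumptions: the default warehouse layout <warehouse>/<db>.db/<table> (the /user/hive/warehouse prefix is an assumption about the configuration), a full partition spec, and plain partition values, since Hive's escaping of special characters is not modeled. The function name is hypothetical.

```python
# Sketch of the directory LOAD ... PARTITION (...) writes into.
# Assumptions: default warehouse layout <warehouse>/<db>.db/<table>,
# a full partition spec, and plain values (no escaping modeled).
def partition_dir(table_location: str, partition_columns: list, spec: dict) -> str:
    missing = [c for c in partition_columns if c not in spec]
    if missing:
        raise ValueError(f"missing partition values for: {missing}")
    # One directory level per partition column, in PARTITIONED BY order.
    levels = [f"{col}={spec[col]}" for col in partition_columns]
    return "/".join([table_location.rstrip("/")] + levels)


location = "/user/hive/warehouse/bikepatition.db/bike"
print(partition_dir(location, ["time_month", "creator"],
                    {"creator": "eric", "time_month": "201701"}))
# /user/hive/warehouse/bikepatition.db/bike/time_month=201701/creator=eric
```

The levels follow the declared partition column order, so the order in which values are written in the PARTITION clause does not change the path.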

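--Register an existing directory as a partition (no data is copied or moved)
alter table <table_name> add partition(time_month='<partition_value>', creator='eric') location '<hdfs_directory_path>';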
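ADD PARTITION ... LOCATION only records which directory backs a partition. A toy metastore in Python shows the idea. It is illustrative only: partitions are stored as a dict from spec to directory, the paths are hypothetical, and the if_not_exists flag mirrors Hive's ADD IF NOT EXISTS PARTITION form.

```python
# Toy metastore for ALTER TABLE ... ADD PARTITION ... LOCATION (illustrative).
# Adding a partition only records spec -> directory; no files are copied or
# moved, so the directory should already hold data files in the table's format.
def add_partition(partitions: dict, spec: tuple, location: str,
                  if_not_exists: bool = False) -> None:
    if spec in partitions:
        if if_not_exists:
            return  # mirrors ADD IF NOT EXISTS PARTITION
        raise ValueError(f"partition {spec} already exists")
    partitions[spec] = location


partitions = {}
spec = (("time_month", "201701"), ("creator", "eric"))
add_partition(partitions, spec, "/data/bike/2017-01")
add_partition(partitions, spec, "/data/bike/other", if_not_exists=True)
print(partitions)  # location stays /data/bike/2017-01
```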

--Drop a partition
alter table <table_name> drop partition(time_month='<partition_value>', creator='eric');
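The matching done by DROP PARTITION can be modeled in Python. This is a toy model, not Hive code, and the directory values are hypothetical. A spec that names every partition column matches one partition. A spec that names only some of them, such as time_month alone, typically matches every creator subpartition under that month. For a managed table Hive would also delete the dropped directories; for an external table only the metadata goes away.

```python
# Toy model of DROP PARTITION matching (illustrative, not Hive code).
# Partitions are keyed by tuples of (column, value); a spec naming only
# some partition columns matches every partition that agrees on them.
def drop_partitions(partitions: dict, spec: dict) -> list:
    dropped = [key for key in partitions
               if all(dict(key).get(col) == val for col, val in spec.items())]
    for key in dropped:
        del partitions[key]  # only the metadata entry is removed in this model
    return dropped


partitions = {
    (("time_month", "201701"), ("creator", "eric")): "/p1",
    (("time_month", "201701"), ("creator", "anna")): "/p2",
    (("time_month", "201702"), ("creator", "eric")): "/p3",
}
print(len(drop_partitions(partitions, {"time_month": "201701"})))  # 2
print(list(partitions.values()))  # ['/p3']
```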