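1. Hive basic architecture
metastore: stores the metadata
Hadoop cluster
The metadata can be shared with other frameworks
2. Hive data types
3. Hive definitions
Bucketing (bucket)
For example: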
create table t (user_id int, url string)
partitioned by (dt string)
clustered by (user_id) into 96 buckets;
set hive.enforce.bucketing = true;
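To make the bucket assignment concrete, here is a minimal Python sketch of how a user_id could map to one of the 96 buckets. It assumes the hash of an INT column is the value itself and that the bucket is the non-negative hash modulo the bucket count; the function name bucket_for is hypothetical and this is not Hive's own code.

```python
# Illustrative simulation of bucket assignment, not Hive's implementation.
NUM_BUCKETS = 96  # matches "into 96 buckets" in the table definition


def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Return the bucket index for an INT bucketing column.

    Assumption: the hash of an INT is the value itself. The mask clears
    the sign bit so negative ids still land in a non-negative bucket.
    """
    return (user_id & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for uid in (1, 96, 97, 1000):
        print(uid, "-> bucket", bucket_for(uid))
```

Rows with the same user_id always land in the same bucket, which is what makes sampling and bucketed joins on user_id possible.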
Data skew: skewed by (keys) on (values)
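4. Map-side join
Suited to joining a large table with a small one: the small table is loaded into memory. The default is a reduce-side join.
Usage:
select /*+ MAPJOIN(b) */ a.key from a join b on a.key = b.key;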
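The idea behind a map-side join can be sketched in Python as a hash join: the small table is loaded into an in-memory lookup, and the large table is streamed past it, so no shuffle to a reducer is needed. The function name map_side_join and the (key, value) row layout are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (key, value); a hypothetical row layout


def map_side_join(big_rows: Iterable[Row],
                  small_rows: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two row streams on their key, hash-join style."""
    # Build phase: load the small table into an in-memory hash table.
    lookup: Dict[int, List[str]] = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    # Probe phase: stream the large table and look each key up.
    for key, value in big_rows:
        for small_value in lookup.get(key, []):
            yield key, value, small_value


if __name__ == "__main__":
    a = [(1, "a1"), (2, "a2"), (3, "a3"), (1, "a4")]  # large table
    b = [(1, "b1"), (3, "b3")]                        # small table
    for joined in map_side_join(a, b):
        print(joined)
```

The approach only works while the small table fits in memory, which is why it is reserved for large-table-to-small-table joins.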
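5. order by and sort by
sort by orders rows within each reducer but not globally; it is more efficient than order by and is usually used together with distribute by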
select s.ymd,s.symbol,s.price_close
from stocks s
distribute by s.symbol
sort by s.symbol,s.ymd;
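This keeps the rows of each symbol in sorted order. When distribute by and sort by use the same columns, cluster by is the shorthand (it sorts by symbol only):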
select s.ymd,s.symbol,s.price_close
from stocks s
cluster by s.symbol;
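The effect of distribute by plus sort by can be simulated in Python: rows are routed to a reducer by a hash of symbol, then each reducer sorts its own rows by (symbol, ymd). This is a rough sketch; the function name distribute_and_sort, the use of crc32 as the hash, and the sample rows are all made up for illustration and do not reflect Hive's actual hash function.

```python
import zlib
from typing import List, Tuple

Row = Tuple[str, str, float]  # (ymd, symbol, price_close)


def distribute_and_sort(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """Simulate: distribute by symbol sort by symbol, ymd."""
    partitions: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        symbol = row[1]
        # distribute by: the same symbol always goes to the same reducer.
        idx = zlib.crc32(symbol.encode("utf-8")) % num_reducers
        partitions[idx].append(row)
    # sort by: each reducer sorts only its own rows (no global order).
    return [sorted(p, key=lambda r: (r[1], r[0])) for p in partitions]


if __name__ == "__main__":
    sample = [  # made-up rows
        ("2010-01-05", "AAPL", 2.0),
        ("2010-01-04", "IBM", 3.0),
        ("2010-01-04", "AAPL", 1.0),
        ("2010-01-05", "IBM", 4.0),
    ]
    for i, part in enumerate(distribute_and_sort(sample, 2)):
        print("reducer", i, part)
```

Concatenating the reducer outputs does not give a globally sorted result in general; order by is needed for that, at the cost of funnelling all rows through a single reducer.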