Hive SQL整理

最新推荐文章于 2024-08-25 23:18:54 发布

#Super Pig

最新推荐文章于 2024-08-25 23:18:54 发布

阅读量476

点赞数

分类专栏：学习日记

本文链接：https://blog.youkuaiyun.com/zyl_wjl_1413/article/details/112625355

版权

学习日记专栏收录该内容

11 篇文章

订阅专栏

collect_set

作用：把一个字段中的值组合到一个集合中，搭配where（筛选）、group by使用；
用途：可将非group by字段进行转换，使其可以查询；
示例：

-- id：用户；time：登陆日期
select id,collect_set(time) as t from t_action_login where time<='20150906' group by id

该查询结果如下：

  123@163.com                                                                                      | ["20150620","20150619"]                                                                                    |
| abc@163.com                                                                                      | ["20150816"]                                                                                               |
| cde@qq.com                                                                                       | ["20150606","20150608","20150607","20150609","20150613","20150610","20150616","20150615"]                  |
| 789@sohu.com                                                                                     | ["20150827","20150623","20150627","20150820","20150823","20150612","20150717"]                             |
| 987@163.com                                                                                      | ["20150701","20150829","20150626","20150625","20150726","20150722","20150629","20150824","20150716","20150 |
| ddsf@163.com                                                                                     | ["20150804","20150803","20150801","20150809","20150807","20150806","20150905","20150904","20150730","20150 |
| 182@163.com                                                                                      |["20150803","20150801","20150809","20150808","20150805","20150806","20150906","20150904","20150730","20150 |
| 22225@163.com                                                                                    | ["20150604","20150609","20150622","20150827","20150625","20150620","20150613","20150610","20150614","20150 |
| 18697@qq.com                                                                                     | ["20150902"]                                                                                               |
| 1905@qq.com                                                                                      | ["20150709"]

该结果显示每个用户在2015年9月6日之前的全部登陆时间

场景应用：在hive中求出一个数据表中在2015年9月6日首次登陆的人数
对应SQL语句：

select count(a.id) from (select id,collect_set(time) as t from t_action_login where time<='20150906' group by id) as a where size(a.t)=1 and a.t[0]='20150906'

注意：size(a.t)=1 and a.t[0]='20150906'

参考：https://blog.youkuaiyun.com/liyantianmin/article/details/48262109

group by

hive不允许直接查询访问非group by的字段，错误示例：

select name,sex from people group by sex;
该语句会报错，因为name是非group by字段，解决方法如下：

只查询group by字段：
select sex from people group by sex;
对于非group by字段，只通过聚合函数查询：
select sex,count(name) from people group by sex;
对于非group by字段，通过collect_set等函数转换后查询：
select sex,collect_set(name)[0] as name from people group by sex;
注：可以用原字段名称；

grouping sets

用途：可以按照多个标准进行分组，增强group by的功能；
示例：查询每个部门每类工作平均工资、每个部门的平均工资及整个企业中每个雇员的平均工资

select department_id,job_id,avg(salary)
from hr.employees
group by grouping sets ((department_id,job_id),department_id,()); 
-- ()表示对整体进行一次聚类
-- 也可以写成：
select department_id,job_id,avg(salary)
from hr.employees
group by department_id,grouping sets ((department_id,job_id),());

limit

用途:限制查询/返回的条数;
用法:
- 指定初始位置: limit 初始位置,条数
- 不指定初始位置: limit 条数
- 由offset指定初始位置: limit 条数 offset 初始位置
说明:
- 初始位置:下标(从0开始),表示从该位置开始返回;
- 条数:表示返回的记录数

参考:https://blog.youkuaiyun.com/qq_33801641/article/details/107196036