Ant Financial interview question: hive/sql/hql

Latest recommended article published 2023-08-24 10:54:55

Licensed under CC 4.0 BY-SA

Author: baiyun, Email: mitbaiyun@163.com

Tags:

#Ant Financial interview questions hive sql hql

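This article walks through a SQL query that finds low-carbon behavior in Ant Forest: users who reduced carbon emissions by more than 100g per day for three or more consecutive days. It covers the whole process, from filtering the data to producing the final result.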

Principle (as shown in the figure):

-- Step 1: total each user's reduction per day, keep the days with at least 100g,
-- and number the remaining days per user in date order.
SELECT username, time, SUM(carbon_g) c2,
       ROW_NUMBER() OVER (PARTITION BY username ORDER BY time) rn
FROM test191128.user_low_carbon
GROUP BY username, time
HAVING c2 >= 100
-- t1
-------------------------
-- Step 2: subtract the row number from the date. Consecutive days share the same ds.
SELECT username, time, c2,
       date_sub(regexp_replace(time, '/', '-'), rn) ds
FROM t1
---------------------------------------
SELECT username, time, c2,
       date_sub(regexp_replace(time, '/', '-'), rn) ds
FROM (
  SELECT username, time, SUM(carbon_g) c2,
         ROW_NUMBER() OVER (PARTITION BY username ORDER BY time) rn
  FROM test191128.user_low_carbon
  GROUP BY username, time
  HAVING c2 >= 100
) t1
-- t2
-----------------
-- Step 3: count the days in each (username, ds) group, that is, the length of each run.
SELECT username, time,
       COUNT(ds) OVER (PARTITION BY username, ds ORDER BY time
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) cn
FROM t2
-----------------------------------
SELECT username, time,
       COUNT(ds) OVER (PARTITION BY username, ds ORDER BY time
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) cn
FROM (
  SELECT username, time, c2,
         date_sub(regexp_replace(time, '/', '-'), rn) ds
  FROM (
    SELECT username, time, SUM(carbon_g) c2,
           ROW_NUMBER() OVER (PARTITION BY username ORDER BY time) rn
    FROM test191128.user_low_carbon
    GROUP BY username, time
    HAVING c2 >= 100
  ) t1
) t2
-- t4
-------------------------------------
-- Step 4: keep only the days that belong to a run of three or more days.
SELECT * FROM t4 WHERE cn >= 3
------------------------
SELECT *
FROM (
  SELECT username, time,
         COUNT(ds) OVER (PARTITION BY username, ds ORDER BY time
                         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) cn
  FROM (
    SELECT username, time, c2,
           date_sub(regexp_replace(time, '/', '-'), rn) ds
    FROM (
      SELECT username, time, SUM(carbon_g) c2,
             ROW_NUMBER() OVER (PARTITION BY username ORDER BY time) rn
      FROM test191128.user_low_carbon
      GROUP BY username, time
      HAVING c2 >= 100
    ) t1
  ) t2
) t4
WHERE cn >= 3
-- t5
------------------------------
-- Step 5: join back to the source table to return the original records for those days.
SELECT t6.*
FROM test191128.user_low_carbon t6
JOIN t5 ON t6.username = t5.username AND t6.time = t5.time
---------------------------------
-- Final SQL
SELECT t6.*
FROM test191128.user_low_carbon t6
JOIN (
  SELECT *
  FROM (
    SELECT username, time,
           COUNT(ds) OVER (PARTITION BY username, ds ORDER BY time
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) cn
    FROM (
      SELECT username, time, c2,
             date_sub(regexp_replace(time, '/', '-'), rn) ds
      FROM (
        SELECT username, time, SUM(carbon_g) c2,
               ROW_NUMBER() OVER (PARTITION BY username ORDER BY time) rn
        FROM test191128.user_low_carbon
        GROUP BY username, time
        HAVING c2 >= 100
      ) t1
    ) t2
  ) t4
  WHERE cn >= 3
) t5
ON t6.username = t5.username AND t6.time = t5.time

Two caveats. The queries sort time as a string, which is safe for the sample data but would misorder unpadded dates such as 2017/1/10 and 2017/1/2. They also do not filter on the year, which the problem statement requires once the table holds records outside 2017.

Ant Forest low-carbon user ranking analysis

Problem: query the daily records in the user_low_carbon table, subject to the following condition:
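Return the low-carbon records of users who, in 2017, reduced carbon emissions (low_carbon) by more than 100g on every day of a run of three or more consecutive days. (The example and the query both treat the threshold as at least 100g.)
The query must return the records in the user_low_carbon table that satisfy this condition.
For example, user u_002 has qualifying records, because on the four consecutive days 2017/1/2 through 2017/1/5 the daily carbon reduction totals are all at least 100g.

Data provided: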
user_low_carbon:

username  time      carbon_g
u_001     2017/1/1  10
u_001     2017/1/2  150
u_001     2017/1/2  110
u_001     2017/1/2  10
u_001     2017/1/4  50
u_001     2017/1/4  10
u_001     2017/1/6  45
u_001     2017/1/6  90
u_002     2017/1/1  10
u_002     2017/1/2  150
u_002     2017/1/2  70
u_002     2017/1/3  30
u_002     2017/1/3  80
u_002     2017/1/4  150
u_002     2017/1/5  101
u_002     2017/1/6  68
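
The date-minus-row-number trick in the queries above can be cross-checked outside Hive. What follows is a minimal Python sketch, not the author's code: the function name `qualifying_rows` and its parameters are hypothetical, and the rows are copied from the sample data. Like the query, it treats the threshold as at least 100g, and it adds the 2017 filter from the problem statement.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows copied from the user_low_carbon data: (username, time, carbon_g).
ROWS = [
    ("u_001", "2017/1/1", 10), ("u_001", "2017/1/2", 150),
    ("u_001", "2017/1/2", 110), ("u_001", "2017/1/2", 10),
    ("u_001", "2017/1/4", 50), ("u_001", "2017/1/4", 10),
    ("u_001", "2017/1/6", 45), ("u_001", "2017/1/6", 90),
    ("u_002", "2017/1/1", 10), ("u_002", "2017/1/2", 150),
    ("u_002", "2017/1/2", 70), ("u_002", "2017/1/3", 30),
    ("u_002", "2017/1/3", 80), ("u_002", "2017/1/4", 150),
    ("u_002", "2017/1/5", 101), ("u_002", "2017/1/6", 68),
]


def parse_day(text):
    """Parse an unpadded 'YYYY/M/D' string into a date."""
    year, month, day = (int(part) for part in text.split("/"))
    return date(year, month, day)


def qualifying_rows(rows, threshold=100, min_run=3, year=2017):
    """Hypothetical helper mirroring steps t1 to t5 of the query."""
    # t1: daily total per user, keeping only days that reach the threshold.
    totals = {}
    for user, day_text, grams in rows:
        day = parse_day(day_text)
        if day.year == year:
            totals[(user, day)] = totals.get((user, day), 0) + grams
    good_days = sorted(key for key, total in totals.items() if total >= threshold)

    keep = set()
    for user, user_days in groupby(good_days, key=lambda key: key[0]):
        # t2: day minus row number is constant within a run of consecutive days.
        numbered = [(day, day - timedelta(days=rn))
                    for rn, (_, day) in enumerate(user_days, start=1)]
        # t4/t5: group by that constant and keep runs of at least min_run days.
        for _, run in groupby(numbered, key=lambda pair: pair[1]):
            run = list(run)
            if len(run) >= min_run:
                keep.update((user, day) for day, _ in run)

    # Final join: return the original records that fall on the kept days.
    return [row for row in rows if (row[0], parse_day(row[1])) in keep]
```

On the sample data, `qualifying_rows(ROWS)` returns the six u_002 records dated 2017/1/2 through 2017/1/5. User u_001 reaches the threshold only on 2017/1/2 and 2017/1/6, which are not consecutive, so none of that user's records are returned.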