交叉重复问题
题目

数据
oppo 2021-06-05 2021-06-09
oppo 2021-06-11 2021-06-21
VIVO 2021-06-05 2021-06-15
VIVO 2021-06-09 2021-06-21
redmi 2021-06-05 2021-06-21
redmi 2021-06-09 2021-06-15
redmi 2021-06-17 2021-06-26
huawei 2021-06-05 2021-06-26
huawei 2021-06-09 2021-06-15
huawei 2021-06-17 2021-06-21
建表
create table if not exists intersection(
name string,
sdt date,
edt date
)row format delimited fields terminated by ‘\t’;
加载数据
load data local inpath ‘/opt/module/data/hive-interviews/intersection’ into table intersection;
解题步骤
解题步骤1 先将当前行以前的数据中最大的edt放置到当前行
sql语句
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection;
结果
name sdt edt maxedt
VIVO 2021-06-05 2021-06-15 NULL
VIVO 2021-06-09 2021-06-21 2021-06-15
huawei 2021-06-05 2021-06-26 NULL
huawei 2021-06-09 2021-06-15 2021-06-26
huawei 2021-06-17 2021-06-21 2021-06-26
oppo 2021-06-05 2021-06-09 NULL
oppo 2021-06-11 2021-06-21 2021-06-09
redmi 2021-06-05 2021-06-21 NULL
redmi 2021-06-09 2021-06-15 2021-06-21
redmi 2021-06-17 2021-06-26 2021-06-21
解题步骤2:比较sdt和maxEdt
如果maxEdt大,则替换,null不操作,maxEdt小不操作
sql语句
select
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
结果
t1.name sdt t1.edt
VIVO 2021-06-05 2021-06-15
VIVO 2021-06-15 2021-06-21
huawei 2021-06-05 2021-06-26
huawei 2021-06-26 2021-06-15
huawei 2021-06-26 2021-06-21
oppo 2021-06-05 2021-06-09
oppo 2021-06-11 2021-06-21
redmi 2021-06-05 2021-06-21
redmi 2021-06-21 2021-06-15
redmi 2021-06-21 2021-06-26
解题步骤3 求每个的打折时间
sql语句
select
t2.name,datediff(t2.edt,t2.sdt) days
from(
select
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
)t2
结果
t2.name days
VIVO 10
VIVO 6
huawei 21
huawei -11
huawei -5
oppo 4
oppo 10
redmi 16
redmi -6
redmi 5
解题步骤4:求出最大打折时间
sql语句
select t3.name,sum(if(t3.days>0,t3.days,0))
from(
select
t2.name,datediff(t2.edt,t2.sdt) days
from(
select
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
)t2
)t3
group by t3.name
结果
t3.name _c1
VIVO 16
huawei 21
oppo 14
redmi 21
本文详细介绍了如何使用SQL解决数据表中的交叉重复问题,通过建表、加载数据,然后逐步进行数据处理,包括获取每个条目的最大结束日期、比较并替换时间,最后计算每个时间段的打折天数,以及找出最大打折时间。这个过程对于大数据分析和处理具有重要意义。
3557

被折叠的 条评论
为什么被折叠?



