1 需求
要求补充缺失的日期和余额,缺失余额补充为上个最近一次时间的余额
2 表结构
3 期望结果
4 sql实现
1)方法一
select memberid
,create_dt
,balance
from (
select memberid
,date_add(create_dt,m) create_dt
,row_number() over(partition by memberid ,date_add(create_dt,m) order by create_dt desc) as rn
,balance
from(
select memberid
,create_dt
,lead(create_dt,1,create_dt) over(partition by memberid order by create_dt) next_create_dt
,balance
from member
) t1
lateral view posexplode (split (space( datediff(next_create_dt, create_dt)), ' ')) tmp as m,n
) t2
where rn = 1
在上述SQL中,我们先使用子查询得到每个会员的创建日期和下一个创建日期(如果有的话),然后使用datediff函数计算两个日期之间的天数差,并通过space函数生成一个以空格分隔的字符串,例如,如果天数差为5,则生成" "(5个空格)。
接下来,我们使用split函数将这个字符串按照空格分隔为一个字符串数组,然后使用posexplode函数对数组进行拆分,并生成一个新的列n表示数组的下标,以及一个新的列m表示数组的元素。
由于每次的创建日期都会多补一个和下次相邻创建日期相同的日期,所以需要对会员和补充后的日期进行分组,然后按照创建日期进行降序排列,过滤掉多补充的日期数据
2)方法二
select memberid
,date_add(create_dt,m)
,balance
from(
select memberid
,create_dt
,lead(create_dt,1,create_dt) over(partition by memberid order by create_dt) next_create_dt
,balance
from member
) t
lateral view posexplode (split (space( datediff(next_create_dt, create_dt)), ' (?!$)')) tmp as m,n;
方法二和方法一相比,少了去重。在使用lateral view和posexplode函数时,拆分字符串的分隔符需要使用正则表达式,并且在这个正则表达式中,’ (?!$)'表示匹配一个空格,并且这个空格不能在字符串的最后出现。这样可以确保在拆分字符串时不会将两个连续的空格视为一个分隔符。
3)方法三
select t3.memberid
,t3.curr_date as create_dt
,t4.balance
from (
select t1.memberid
,t1.curr_date
,max(t2.create_dt) over(partition by t1.memberid order by t1.curr_date) as dt
,balance
from (
select memberid
,date_add(start_dt,p) curr_date
from(
select memberid
,min(create_dt) start_dt
,max(create_dt) end_dt
from member
group by memberid
) t
lateral VIEW posexplode (
split (space( datediff(end_dt, start_dt)), " ")) tbl_idx AS p,v
) t1
left join member t2 on t1.memberid = t2.memberid and t1.curr_date = t2.create_dt
) t3
inner join member t4
on t3.memberid = t4.memberid and t3.dt = t4.create_dt
4)方法四
select t3.memberid
,t3.curr_date as create_dt
,first_value(balance) over(partition by t3.memberid ,t3.dt order by t3.curr_date) balance
from (
select t1.memberid
,t1.curr_date
,max(t2.create_dt) over(partition by t1.memberid order by t1.curr_date) as dt
,balance
from (
select memberid
,date_add(start_dt,p) curr_date
from(
select memberid
,min(create_dt) start_dt
,max(create_dt) end_dt
from member
group by memberid
) t
lateral VIEW posexplode (
split (space( datediff(end_dt, start_dt)), " ")) tbl_idx AS p,v
) t1
left join member t2 on t1.memberid = t2.memberid and t1.curr_date = t2.create_dt
) t3