hiveSql 京东面试题-有效值问题

hiveSql 京东面试题-有效值问题

需求

有入库成本表,一个商品每次入库就会产生一条数据,里面包含商品id,入库时间time,以及入库采购的成本。但由于某些某些原因,导致表中某些数据的成本是有丢失的。
现在的逻辑是:当成本丢失时,有两种取成本的方式,现在需要把两种成本都取出来,最后取2次成本的平均值作为本次入库的成本。取数逻辑如下:

  • 1、取同一个商品最近一次之前入库的有效成本,即丢失成本商品的丢失成本当前数据的前一条有效成本数据
  • 2、取同一个商品最近一次之后入库的有效成本,即丢失成本商品的丢失成本当前数据的后一条有效成本数据
  • 3、上述中结果依然有无效值时,记为0

具体数据如下:
在这里插入图片描述
可见截图中商品id为2的商品在2022-12-02号和2022-12-03号的入库成本丢失,按照上述取数逻辑,会生成两个新的字段last_cost、next_cost。其中
last_cost是当前丢失成本数据的前一条有效成本数据;
next_cost是当前丢失成本数据的后一条有效成本数据。

还是看商品id为2的数据,在2022-12-02号这条丢失成本数据中:
它的last_cost是商品id同样是2,且它的上一条有效成本数据,即2022-12-01的150,
它的next_cost是商品id同样是2,且它的下一条有效成本数据,即2022-12-04的200。
即上截图中第一条填充色为红色的数据行。

同理id为2的2022-12-03号数据也是它的上一行有效成本 和 它的下一条有效成本。
最后一条商品id为4的2022-12-05号丢失成本数据中next_cost为0,因为它没有下一条有效成本。(即上述逻辑3)

准备数据

select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price

在这里插入图片描述

分析

上述需求中可以看出,其实想要补充丢失成本行的数据,只要拿到相对当前丢失成本数据的前、后同商品的最近有效成本,不论有多少条连续的丢失成本数据行,见下图:
在这里插入图片描述
只要做到将丢失成本数据行与它的前、后有效成本利用重分组思想将他们分组在一组中,取组内max值即可。

实现

一、分组

with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)

select 
    id,itime,price,
    sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
    sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp

先利用重分组思想,根据price值是否为null为界限,顺序,逆序sum开窗,即可将丢失成本数据与它相对应的前、后有效成本分到同一组中。

last_cost分组,可见丢失成本的数据行已经和它的前一行有效成本行分在一组
在这里插入图片描述

next_cost分组,可见丢失成本的数据行已经和它的后一行有效成本行分在一组在这里插入图片描述
二、组内取最大price
按照商品id和last_index、next_index分组,取组内最大的price,其中nullprice赋0值。

with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
    max(if(price is null,0,price)) over(partition by id,next_index) as next_price
from
    (select 
        id,itime,price,
        sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
        sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
    from tmp
    ) t;

在这里插入图片描述
三、取平均值作为最后的成本
取每条数据last_price和next_price的平均值作为最后的成本数据


with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    case when price is null then (last_price + next_price) / 2 else price end as last_price
from
    (select 
        id,itime,price,
        max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
        max(if(price is null,0,price)) over(partition by id,next_index) as next_price
    from
        (select 
            id,itime,price,
            sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
            sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
        from tmp
        ) t
    ) t1;

在这里插入图片描述

最后

喜欢的点赞、关注、收藏吧~ 你的支持是最大的创作动力~~

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

@nanami

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值