腾讯面试官问:解释下时间滑动窗口函数?当时我不会,过后我猜应该是这个东西

在hive快速建表analyst.t_window:
(后面再写一篇用excel +(python现成脚本)快速建hive表的文章)

drop table if exists analyst.t_window;
create table analyst.t_window as select * from ( 
select 'jack' as name, '2015-01-01' as orderdate, 10 as cost
 union all 
select 'tony' as name, '2015-01-02' as orderdate, 15 as cost
 union all 
select 'jack' as name, '2015-02-03' as orderdate, 23 as cost
 union all 
select 'tony' as name, '2015-01-04' as orderdate, 29 as cost
 union all 
select 'jack' as name, '2015-01-05' as orderdate, 46 as cost
 union all 
select 'jack' as name, '2015-04-06' as orderdate, 42 as cost
 union all 
select 'tony' as name, '2015-01-07' as orderdate, 50 as cost
 union all 
select 'jack' as name, '2015-01-08' as orderdate, 55 as cost
 union all 
select 'mart' as name, '2015-04-08' as orderdate, 62 as cost
 union all 
select 'mart' as name, '2015-04-09' as orderdate, 68 as cost
 union all 
select 'neil' as name, '2015-05-10' as orderdate, 12 as cost
 union all 
select 'mart' as name, '2015-04-11' as orderdate, 75 as cost
 union all 
select 'neil' as name, '2015-06-12' as orderdate, 80 as cost
 union all 
select 'mart' as name, '2015-04-13' as orderdate, 94 as cost) t

得到analyst.t_window如下:
在这里插入图片描述
使用window子句:
UNBOUNDED PRECEDING:起点
UNBOUNDED FOLLOWING:终点
CURRENT ROW:当前行
PRECEDING:往前
FOLLOWING:往后
sum()over(partition by 人名 order by 时间):按照name进行分区,按照购物时间进行排序,做cost的累加

select name,orderdate,
cost,sum(cost) over(partition by name order by orderdate ) as one,--order by默认是,在一个partition内,这个partition的首行聚合到本行
cost,sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row) as two,--这个partition的首行聚合到本行
cost,sum(cost) over(partition by name order by orderdate rows between 1 preceding and current row) as three,--前一行聚合到本行
cost,sum(cost) over(partition by name order by orderdate rows between 1 preceding and 1 following) as four,--本行以及前后一行汇聚
cost,sum(cost) over(partition by name order by orderdate rows between current row and unbounded following) as five--本行汇聚到最后一行
from analyst.t_window

时间(order date订单日期)滑动窗口(可见图中的红蓝窗口正在滑动)函数(sum就是这样的函数吧),所以我猜想:这种就是当时我面试时没反应过来的时间滑动窗口函数吧 !很简单呀!(but当时我就是没想出来嘤嘤嘤)在这里插入图片描述

注意:
可以观察到:

sum(cost) over(partition by name order by orderdate ) as one
与
sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row) as two

产生一样的结果
所以使用了order by子句,未使用window子句的情况下(如sum(cost) over(partition by name order by orderdate ) as one),order by默认情况下聚合从起始行到当前行的数据

引用:
https://blog.youkuaiyun.com/weixin_38750084/article/details/82779910?utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值