至今指标是什么
至今指标是什么鬼呢?请看下面的图片:
在上图中有月至今销售额,月至今销售额是从每天所在月的1号开始将销售额累加到当前日期。例如,在上图中,在 2020-01-11 这行上,月至今单量=01号销售额+02号销售额+…+ 11号销售额。
说完,月至今的,再来说说,年至今的销售额=1月1号销售额+1月2号销售额+…+1月31号销售额+…+2月1号销售额+13月31号销售额
周至今的销售额=周一销售额+周二销售额+周三销售额+周四销售额+周五销售额+周六销售额+周日销售额
这就是所谓的至今指标。
全量如何计算
说完来至今指标是什么。我们接下来说说用什么办法计算出至今指标。
首先,我们来谈谈使用开窗函数计算至今值。请看,
with detail as (
select '2019-06-06' as sale_date , 'shop1' as shop_id , 10 as sale_amt
union all select '2019-06-05' as sale_date , 'shop1' as shop_id , 14 as sale_amt
union all select '2019-06-03' as sale_date , 'shop1' as shop_id , 17 as sale_amt
union all select '2019-06-02' as sale_date , 'shop1' as shop_id , 18 as sale_amt
union all select '2019-06-01' as sale_date , 'shop1' as shop_id , 13 as sale_amt
union all select '2019-06-06' as sale_date , 'shop2' as shop_id , 11 as sale_amt
union all select '2019-06-05' as sale_date , 'shop2' as shop_id , 15 as sale_amt
union all select '2019-06-03' as sale_date , 'shop2' as shop_id , 18 as sale_amt
union all select '2019-06-02' as sale_date , 'shop2' as shop_id , 19 as sale_amt
union all select '2019-06-01' as sale_date , 'shop2' as shop_id , 16 as sale_amt
), dim_date as (
select '2019-06-01' as date_d , '2019-06' as month_code
union all select '2019-06-02' as date_d , '2019-06' as month_code
union all select '2019-06-03' as date_d , '2019-06' as month_code
union all select '2019-06-04' as date_d , '2019-06' as month_code
union all select '2019-06-05' as date_d , '2019-06' as month_code
union all select '2019-06-06' as date_d , '2019-06' as month_code
union all select '2019-06-07' as date_d , '2019-06' as month_code
union all select '2019-06-08' as date_d , '2019-06' as month_code
union all select '2019-06-09' as date_d , '2019-06' as month_code
union all select '2019-06-10' as date_d , '2019-06' as month_code
union all select '2019-06-11' as date_d , '2019-06' as month_code
union all select '2019-06-12' as date_d , '2019-06' as month_code
)
select a.sale_date
, a.shop_id
, a.sale_amt
, sum(a.sale_amt) over(partition by b.month_code , a.shop_id order by a.sale_date) as agg_sale_amt
from detail as a
left join dim_date as b
on a.sale_date = b.date_d
上面这段 query 使用开窗函数来实现的。
我们还会遇到下面的问题。
从图中,我们可看到,在2020-01-03中 B 没有营业,所以没有销售额了。在2020-01-04月份,可能因为疫情的原因,A、B两家店都没有营业。那我们在2020-01-03的月至今指标的时候,使用开窗函数的话,就不能计算出 B 门店的销售金额了。
应该怎么做呢?一个先做一个表,此表包含了日期、门店的信息,大概就是下面的样子吧。
我们管这个表叫做full_dim
吧,然后,我们可以像下面这样写了。
with full_dim as (
select '2019-01-01' as date_d, '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-02' as date_d, '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-03' as date_d , '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-04' as date_d , '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-05' as date_d , '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-06' as date_d , '2019-01' as month_code , 'shop1' as shop_code
union all select '2019-01-02' as date_d , '2019-01' as month_code , 'shop2' as shop_code
union all select '2019-01-03' as date_d , '2019-01' as month_code , 'shop2' as shop_code
union all select '2019-01-04' as date_d , '2019-01' as month_code , 'shop2' as shop_code
union all select '2019-01-05' as date_d , '2019-01' as month_code , 'shop2' as shop_code
union all select '2019-01-06' as date_d , '2019-01' as month_code , 'shop2' as shop_code
)
select a.date_d
,a.shop_code
,sum(b.sale_amt) over(partiton by a.shop_code order by a.date_d) as sale_amt
from full_dim as a
left join fct_sale_d as b
on a.shop_code = b.shop_code
and a.date_d = b.date_d
full_dim
的作用是补全所有的维度值。这样补一下,上面说的补连续的问题就解决了。
full_dim
的生成方式,可以使用维度表拼接而成,另外,可以使用事实表来做。
维度表拼接:
select a.date_d
,b.shop_code
,b.shop_name
from dim_date as a-- 日期维表
cross join dim_store as b -- 门店维表
事实表的方式:
select shop_code , shop_name , sale_date
from fct_sale_d
where ts between 'start_date' and 'end_date'
group by shop_code , shop_name , sale_date
增量如何计算
开窗函数可以实现全量的方式来计算至今值。接下来是说说增量的方式了。
我归纳了下面几个步骤:
- 计算T的日期门店销售额
- 计算出T+1的日期门店销售额
- T 和 T+1 的数据做 full join
- 使用 coalesce() 实现相同门店日期下的值相加,不同日期门店的值设置初始值。
- 使用流程判断语句来判断,周至今、月至今、年至今的开始。
第一、计算T的日期门店销售额:
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T' and 'T'
第二、计算出T+1的日期门店销售额
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T+1' and 'T+1'
第三、T 和 T+1 的数据做 full join
with table_t as (
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T' and 'T'
), table_t_1 as (
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T+1' and 'T+1'
)
select 'T+1' as sale_date
, coalesce(a.shop_code ,b.shop_code) as shop_code
, coalesce(a.shop_name ,b.shop_name) as shop_name
, coalesce(a.sale_amt ,0)+coalesce(b.sale_amt ,0) as sale_amt
from table_t as a
full join table_t_1 as b
on a.shop_code = b.shop_code
第四、使用流程判断语句来判断,周至今、月至今、年至今的开始
with table_t as (
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T' and 'T'
), table_t_1 as (
select sale_date
,shop_code
,shop_name
,sum(sale_amt) as sale_amt
from fct_sale_d
where ts between 'T+1' and 'T+1'
) , dim_date as (
select date_d , month_code , month_first_day_ind
from dim_date
where date_d between 'T+1' and 'T+1'
)
select 'T+1' as sale_date
, coalesce(a.shop_code ,b.shop_code) as shop_code
, coalesce(a.shop_name ,b.shop_name) as shop_name
-- 如果sale_date为所在月的第一天,那不用向sale_amt字段中加入 T 的销售额
, if(c.month_first_day_ind= 1 , 0 , coalesce(a.sale_amt ,0))+coalesce(b.sale_amt ,0) as sale_amt
from table_t as a
full join table_t_1 as b
on a.shop_code = b.shop_code
left join dim_date as c
on c.date_d = a.sale_date
增量和全量计算的优缺点分析
总结如下所示:
- 全量的方式适用于跑历史数据。例如,要求跑 1 年的历史数据,我们可以跑一个时间段里面的数据。
- 增量的方式适用于每天更新情况,这中方式天然的解决维度在日期上连续的缺点。但是在跑历史数据的时候,需要每天执行一次,大家知道,一个计算平台为一个查询任务准备运行资源过程都是一样的,一个任务真正执行可能用不了多长时间,但是启动和停止任务用的时间是差不多的,所以这不利于跑历史数据。