至今指标总结

至今指标是什么

至今指标是什么鬼呢?请看下面的图片:

月至今销售额
在上图中有月至今销售额,月至今销售额是从每天所在月的1号开始将销售额累加到当前日期。例如,在上图中,在 2020-01-11 这行上,月至今单量=01号销售额+02号销售额+…+ 11号销售额。

说完,月至今的,再来说说,年至今的销售额=1月1号销售额+1月2号销售额+…+1月31号销售额+…+2月1号销售额+13月31号销售额

周至今的销售额=周一销售额+周二销售额+周三销售额+周四销售额+周五销售额+周六销售额+周日销售额

这就是所谓的至今指标。

全量如何计算

说完来至今指标是什么。我们接下来说说用什么办法计算出至今指标。
首先,我们来谈谈使用开窗函数计算至今值。请看,

with detail as (
             select '2019-06-06' as sale_date , 'shop1' as shop_id , 10 as sale_amt
   union all select '2019-06-05' as sale_date , 'shop1' as shop_id , 14 as sale_amt
   union all select '2019-06-03' as sale_date , 'shop1' as shop_id , 17 as sale_amt
   union all select '2019-06-02' as sale_date , 'shop1' as shop_id , 18 as sale_amt 
   union all select '2019-06-01' as sale_date , 'shop1' as shop_id , 13 as sale_amt 
   union all select '2019-06-06' as sale_date , 'shop2' as shop_id , 11 as sale_amt
   union all select '2019-06-05' as sale_date , 'shop2' as shop_id , 15 as sale_amt
   union all select '2019-06-03' as sale_date , 'shop2' as shop_id , 18 as sale_amt
   union all select '2019-06-02' as sale_date , 'shop2' as shop_id , 19 as sale_amt 
   union all select '2019-06-01' as sale_date , 'shop2' as shop_id , 16 as sale_amt
), dim_date as (
    select '2019-06-01' as date_d , '2019-06' as month_code
    union all select '2019-06-02' as date_d , '2019-06' as month_code
    union all select '2019-06-03' as date_d , '2019-06' as month_code
    union all select '2019-06-04' as date_d , '2019-06' as month_code
    union all select '2019-06-05' as date_d , '2019-06' as month_code
    union all select '2019-06-06' as date_d , '2019-06' as month_code
    union all select '2019-06-07' as date_d , '2019-06' as month_code
    union all select '2019-06-08' as date_d , '2019-06' as month_code        
    union all select '2019-06-09' as date_d , '2019-06' as month_code     
    union all select '2019-06-10' as date_d , '2019-06' as month_code     
    union all select '2019-06-11' as date_d , '2019-06' as month_code     
    union all select '2019-06-12' as date_d , '2019-06' as month_code            
)
select a.sale_date 
     , a.shop_id 
     , a.sale_amt
     , sum(a.sale_amt) over(partition by b.month_code , a.shop_id order by a.sale_date) as agg_sale_amt
from detail as a 
left join dim_date as b 
on a.sale_date = b.date_d 

上面这段 query 使用开窗函数来实现的。

我们还会遇到下面的问题。
在这里插入图片描述
从图中,我们可看到,在2020-01-03中 B 没有营业,所以没有销售额了。在2020-01-04月份,可能因为疫情的原因,A、B两家店都没有营业。那我们在2020-01-03的月至今指标的时候,使用开窗函数的话,就不能计算出 B 门店的销售金额了。

应该怎么做呢?一个先做一个表,此表包含了日期、门店的信息,大概就是下面的样子吧。

full_dim
我们管这个表叫做full_dim吧,然后,我们可以像下面这样写了。

with full_dim as (
  select '2019-01-01' as date_d, '2019-01' as month_code , 'shop1' as shop_code 
  union all select '2019-01-02' as date_d, '2019-01' as month_code , 'shop1' as shop_code 
  union all select '2019-01-03' as date_d , '2019-01' as month_code , 'shop1' as shop_code
  union all select '2019-01-04' as date_d , '2019-01' as month_code , 'shop1' as shop_code
  union all select '2019-01-05' as date_d , '2019-01' as month_code , 'shop1' as shop_code
  union all select '2019-01-06' as date_d , '2019-01' as month_code , 'shop1' as shop_code
  union all select '2019-01-02' as date_d , '2019-01' as month_code , 'shop2' as shop_code 
  union all select '2019-01-03' as date_d , '2019-01' as month_code , 'shop2' as shop_code
  union all select '2019-01-04' as date_d , '2019-01' as month_code , 'shop2' as shop_code
  union all select '2019-01-05' as date_d , '2019-01' as month_code , 'shop2' as shop_code
  union all select '2019-01-06' as date_d , '2019-01' as month_code , 'shop2' as shop_code  
)
select a.date_d 
      ,a.shop_code
      ,sum(b.sale_amt) over(partiton by a.shop_code order by a.date_d) as sale_amt
from full_dim as a
left join fct_sale_d as b 
on a.shop_code = b.shop_code 
and a.date_d = b.date_d 

full_dim的作用是补全所有的维度值。这样补一下,上面说的补连续的问题就解决了。

full_dim的生成方式,可以使用维度表拼接而成,另外,可以使用事实表来做。

维度表拼接:

select a.date_d 
       ,b.shop_code 
       ,b.shop_name 
from dim_date as a-- 日期维表
cross join dim_store as b -- 门店维表

事实表的方式:

select shop_code , shop_name , sale_date
from fct_sale_d 
where ts between 'start_date' and 'end_date' 
group by shop_code , shop_name , sale_date

增量如何计算

开窗函数可以实现全量的方式来计算至今值。接下来是说说增量的方式了。

我归纳了下面几个步骤:

  • 计算T的日期门店销售额
  • 计算出T+1的日期门店销售额
  • T 和 T+1 的数据做 full join
  • 使用 coalesce() 实现相同门店日期下的值相加,不同日期门店的值设置初始值。
  • 使用流程判断语句来判断,周至今、月至今、年至今的开始。

第一、计算T的日期门店销售额:

select sale_date 
      ,shop_code 
      ,shop_name 
      ,sum(sale_amt) as sale_amt 
from fct_sale_d 
where ts between 'T' and 'T' 

第二、计算出T+1的日期门店销售额

select sale_date 
      ,shop_code 
      ,shop_name 
      ,sum(sale_amt) as sale_amt 
from fct_sale_d 
where ts between 'T+1' and 'T+1' 

第三、T 和 T+1 的数据做 full join

with table_t as (
	select sale_date 
	      ,shop_code 
	      ,shop_name 
	      ,sum(sale_amt) as sale_amt 
	from fct_sale_d 
	where ts between 'T' and 'T' 
), table_t_1 as (
	select sale_date 
	      ,shop_code 
	      ,shop_name 
	      ,sum(sale_amt) as sale_amt 
	from fct_sale_d 
	where ts between 'T+1' and 'T+1' 
)
select 'T+1' as sale_date
      , coalesce(a.shop_code ,b.shop_code) as shop_code 
      , coalesce(a.shop_name ,b.shop_name) as shop_name 
      , coalesce(a.sale_amt ,0)+coalesce(b.sale_amt ,0) as sale_amt 
from table_t as a 
full join table_t_1 as b 
on a.shop_code = b.shop_code 

第四、使用流程判断语句来判断,周至今、月至今、年至今的开始

with table_t as (
	select sale_date 
	      ,shop_code 
	      ,shop_name 
	      ,sum(sale_amt) as sale_amt 
	from fct_sale_d 
	where ts between 'T' and 'T' 
), table_t_1 as (
	select sale_date 
	      ,shop_code 
	      ,shop_name 
	      ,sum(sale_amt) as sale_amt 
	from fct_sale_d 
	where ts between 'T+1' and 'T+1' 
) , dim_date as (
    select date_d , month_code , month_first_day_ind
    from dim_date 
    where date_d between 'T+1' and 'T+1' 
)
select 'T+1' as sale_date
      , coalesce(a.shop_code ,b.shop_code) as shop_code 
      , coalesce(a.shop_name ,b.shop_name) as shop_name 
      -- 如果sale_date为所在月的第一天,那不用向sale_amt字段中加入 T 的销售额
      , if(c.month_first_day_ind= 1 , 0 , coalesce(a.sale_amt ,0))+coalesce(b.sale_amt ,0) as sale_amt 
from table_t as a 
full join table_t_1 as b 
on a.shop_code = b.shop_code
left join dim_date as c 
on c.date_d = a.sale_date 

增量和全量计算的优缺点分析

总结如下所示:

  • 全量的方式适用于跑历史数据。例如,要求跑 1 年的历史数据,我们可以跑一个时间段里面的数据。
  • 增量的方式适用于每天更新情况,这中方式天然的解决维度在日期上连续的缺点。但是在跑历史数据的时候,需要每天执行一次,大家知道,一个计算平台为一个查询任务准备运行资源过程都是一样的,一个任务真正执行可能用不了多长时间,但是启动和停止任务用的时间是差不多的,所以这不利于跑历史数据。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值