6.2 离线数仓—ADS层商品主题需求的实现

本文详细介绍了ADS层商品主题在离线数仓中的实现,包括最近7/30日各品牌复购率、各品牌商品交易统计、各品类商品交易统计以及各分类商品购物车存量TOP3的分析和数据处理方法,涉及Hive SQL的建表与数据装载过程。


前言

前面完成了ADS层用户主题相关需求的设计和开发,下面进行ADS层商品主题的设计和开发。

一、最近7/30日各品牌复购率

1.需求说明和分析

1)需求说明

统计周期统计粒度指标说明
最近7、30日品牌复购率重复购买人数占购买人数比例

2)字段分析
根据需求,可以得知要包含以下字段:统计日期、最近天数、品牌ID、品牌名称、复购率

2.建表语句

DROP TABLE IF EXISTS ads_repeat_purchase_by_tm;
CREATE EXTERNAL TABLE ads_repeat_purchase_by_tm
(
    `dt`                STRING COMMENT '统计日期',
    `recent_days`       BIGINT COMMENT '最近天数,7:最近7天,30:最近30天',
    `tm_id`             STRING COMMENT '品牌ID',
    `tm_name`           STRING COMMENT '品牌名称',
    `order_repeat_rate` DECIMAL(16, 2) COMMENT '复购率'
) COMMENT '各品牌复购率统计'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_repeat_purchase_by_tm/';

3.数据装载

1)我的思路
先算7天的,直接从交易域用户商品粒度订单最近n日汇总表获取数据,按用户+品牌分组,对于每个用户的每个品牌,若最近7日下单次数为1,则代表了购买该品牌的人数,若最近7日下单次数大于1,则代表了重复购买该品牌的人数。要注意的是,这个表的粒度是用户——商品粒度,可能出现一个用户多次购买一个品牌的情况,所以要按用户+品牌进行分组语句如下:

select
  tm_id,
  tm_name,
  sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_7d) order_count
  from dws_trade_user_sku_order_nd
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

30天的复购率方法类似,语句如下:

select
  tm_id,
  tm_name,
  sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_30d) order_count
  from dws_trade_user_sku_order_nd
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

相同的数据,聚合逻辑不同,采用炸裂的方式炸裂两份。

select
  recent_days,
  tm_id,
  tm_name,
  sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
  select
    recent_days,
    user_id,
    tm_id,
    tm_name,
    if(recent_days=7,sum(order_count_7d),sum(order_count_30d)) order_count
  from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name,recent_days
)t1
group by tm_id,tm_name,recent_days

其中,子查询语句可以这样写:

sum(if(recent_days=7,order_count_7d,order_count_30d) order_count

2)给的思路

insert overwrite table ads_repeat_purchase_by_tm
select * from ads_repeat_purchase_by_tm
union
select
    '2020-06-14' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0)) as decimal(16,2))
from
(
    select
        '2020-06-14' dt,
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            case recent_days
                when 7 then order_count_7d
                when 30 then order_count_30d
            end order_count
        from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,user_id,tm_id,tm_name
)t2
group by recent_days,tm_id,tm_name;

二、各品牌商品交易统计

1.需求说明和分析

1)需求说明

统计周期统计粒度指标说明
最近1、7、30日品牌订单数
最近1、7、30日品牌订单人数
最近1、7、30日品牌退单数
最近1、7、30日品牌退单人数

2)字段分析
根据需求说明,可以得知,要包含以下字段:统计日期、最近天数、订单数、订单人数、退单数、退单人数、品牌ID、品牌名称

2.建表语句

DROP TABLE IF EXISTS ads_trade_stats_by_tm;
CREATE EXTERNAL TABLE ads_trade_stats_by_tm
(
    `dt`                      STRING COMMENT '统计日期',
    `recent_days`             BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
    `tm_id`                   STRING COMMENT '品牌ID',
    `tm_name`                 STRING COMMENT '品牌名称',
    `order_count`             BIGINT COMMENT '订单数',
    `order_user_count`        BIGINT COMMENT '订单人数',
    `order_refund_count`      BIGINT COMMENT '退单数',
    `order_refund_user_count` BIGINT COMMENT '退单人数'
) COMMENT '各品牌商品交易统计'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_trade_stats_by_tm/';

3.数据装载

1)我的思路
先算1天内订单数:

--最近1天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
  tm_id,
  tm_name,
  count(*)
  sum(order_count)
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_1d) order_count
  from dws_trade_user_sku_order_1d
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

算7天内订单数和订单人数:

--最近7天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
  tm_id,
  tm_name,
  count(*)
  sum(order_count)
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_7d) order_count
  from dws_trade_user_sku_order_nd
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

算30天内订单数和订单人数:

--最近30天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
  tm_id,
  tm_name,
  count(*)
  sum(order_count)
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_30d) order_count
  from dws_trade_user_sku_order_nd
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

7日和30日整合到一块去算:

select
  recent_days,
  tm_id,
  tm_name,
  count(*)
  sum(order_count)
from
(
  select
    recent_days,
    user_id,
    tm_id,
    tm_name,
    sum(if(recent_days=7,order_count_7d,order_count_30d) order_count
  from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
  where dt='2020-06-14'
  group by recent_days,user_id,tm_id,tm_name
)t1
group by recent_days,tm_id,tm_name

1日没办法和多日进行整合,只能分别计算然后join。
退单数和退单人数计算思路跟上面的一样。
2)给的思路:

insert overwrite table ads_trade_stats_by_tm
select * from ads_trade_stats_by_tm
union
select
    '2020-06-14' dt,
    nvl(odr.recent_days,refund.recent_days),
    nvl(odr.tm_id,refund.tm_id),
    nvl(odr.tm_name,refund.tm_name),
    nvl(order_count,0),
    nvl(order_user_count,0),
    nvl(order_refund_count,0),
    nvl(order_refund_user_count,0)
from
(
    select
        1 recent_days,
        tm_id,
        tm_name,
        sum(order_count_1d) order_count,
        count(distinct(user_id)) order_user_count
    from dws_trade_user_sku_order_1d
    where dt='2020-06-14'
    group by tm_id,tm_name
    union all
    select
        recent_days,
        tm_id,
        tm_name,
        sum(order_count),
        count(distinct(if(order_count>0,user_id,null)))
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            case recent_days
                when 7 then order_count_7d
                when 30 then order_count_30d
            end order_count
        from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,tm_id,tm_name
)odr
full outer join
(
    select
        1 recent_days,
        tm_id,
        tm_name,
        sum(order_refund_count_1d) order_refund_count,
        count(distinct(user_id)) order_refund_user_count
    from dws_trade_user_sku_order_refund_1d
    where dt='2020-06-14'
    group by tm_id,tm_name
    union all
    select
        recent_days,
        tm_id,
        tm_name,
        sum(order_refund_count),
        count(if(order_refund_count>0,user_id,null))
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            case recent_days
                when 7 then order_refund_count_7d
                when 30 then order_refund_count_30d
            end order_refund_count
        from dws_trade_user_sku_order_refund_nd lateral view explode(array(7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,tm_id,tm_name
)refund
on odr.recent_days=refund.recent_days
and odr.tm_id=refund.tm_id
and odr.tm_name=refund.tm_name;

注意事项:
可能会存在7天、30天内某品牌只有下单(或者只有退单)的情况,所以要使用全外联,但结果中可能会有部分数据为null,所以要对结果数据判null处理

4.语句说明

trade_user_sku_order_1d表中可能存在的数据情况如下:
在这里插入图片描述
所以计算1日内订单数和订单人数时,有以下两种方法:
第一种,直接按品牌分组,对用户去重(因为一个用户可能下单了一个品类中的多个商品)

select
  tm_id,
  tm_name,
  count(distinct(user_id)),
  sum(order_count_1d)
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by tm_id,tm_name

第二种,先按用户和品牌分组,再直接count(*)

select
  tm_id,
  tm_name,
  count(*)
  sum(order_count)
from
(
  select
    user_id,
    tm_id,
    tm_name,
    sum(order_count_1d) order_count
  from dws_trade_user_sku_order_1d
  where dt='2020-06-14'
  group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name

统计多天的时候也是一样:

select
  recent_days,
  tm_id,
  tm_name,
  count(distinct(if(if(recent_days=7,order_count_7d,order_count_30d)>0,user_id,null)))
  sum(if(recent_days=7,order_count_7d,order_count_30d)) order_count
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,tm_id,tm_name

三、各品牌商品交易统计

1.需求说明和分析

1)需求说明

统计周期统计粒度指标说明
最近1、7、30日品类订单数
最近1、7、30日品类订单人数
最近1、7、30日品类退单数
最近1、7、30日品类退单人数

2)字段分析
根据需求可以得到,该表应该包含以下字段:统计日期、最近天数、三个品类ID、三个品类名称、订单数、订单人数、退单数、退单人数

2.建表语句

DROP TABLE IF EXISTS ads_trade_stats_by_cate;
CREATE EXTERNAL TABLE ads_trade_stats_by_cate
(
    `dt`                      STRING COMMENT '统计日期',
    `recent_days`             BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
    `category1_id`            STRING COMMENT '一级分类id',
    `category1_name`          STRING COMMENT '一级分类名称',
    `category2_id`            STRING COMMENT '二级分类id',
    `category2_name`          STRING COMMENT '二级分类名称',
    `category3_id`            STRING COMMENT '三级分类id',
    `category3_name`          STRING COMMENT '三级分类名称',
    `order_count`             BIGINT COMMENT '订单数',
    `order_user_count`        BIGINT COMMENT '订单人数',
    `order_refund_count`      BIGINT COMMENT '退单数',
    `order_refund_user_count` BIGINT COMMENT '退单人数'
) COMMENT '各分类商品交易统计'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_trade_stats_by_cate/';

3.数据装载

1)我的思路
首先,我觉得这个跟上一个需求很类似,只不过是把品牌换成了品类,所以语句基本上一样
先算订单数和订单人数:

--1天内的订单数和订单人数:
--1个用户可能多个下单了多个品类,按品类分组要去重

select
  sum(order_count_1d),
  count(distinct(user_id))
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

--7天内的订单数和订单人数:

select
  sum(order_count_7d),
  count(distinct(if(order_count_7d>0,user_id,null)))
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

--30天内的订单数和订单人数:

select
  sum(order_count_30d),
  count(distinct(if(order_count_30d>0,user_id,null)))
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

--7日和30日整合:

select
  recent_days,
  sum(if(recent_days=7,order_count_7d,order_count_30d))
  count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

--1日、7日、30日整合:
select
  1 recent_days,
  sum(order_count_1d),
  count(distinct(user_id))
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
  recent_days,
  sum(if(recent_days=7,order_count_7d,order_count_30d))
  count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

再算退单数和退单人数:

1天内的退单数和订单人数:
1个用户可能多个退单了多个品类,按品类分组要去重

select
  sum(order_refund_count_1d),
  count(distinct(user_id))
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

7天内的退单数和订单人数:

select
  sum(order_refund_count_7d),
  count(distinct(if(order_refund_count_7d>0,user_id,null)))
from dws_trade_user_sku_order_refund_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

30天内的退单数和退单人数:

select
  sum(order_refund_count_30d),
  count(distinct(if(order_refund_count_30d>0,user_id,null)))
from dws_trade_user_sku_order_refund_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

7日和30日整合:
select
  recent_days,
  sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
  count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

--1日、7日、30日整合:
select
  1 recent_days,
  sum(order_refund_count_1d),
  count(distinct(user_id))
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
  recent_days,
  sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
  count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name

将两部分整合:

select
  '2020-06-14',
  nvl(t1,recent_days,t2.recent_days),
  nvl(t1.category1_id,t2.category1_id),
  nvl(t1.category1_name,t2.category1_name),
  nvl(t1.category2_id,t2.category2_id),
  nvl(t1.category2_name,t2.category2_name),
  nvl(t1.category3_id,t2.category3_id),
  nvl(t1.category3_name,t2.category3_name),
  nvl(order_count,0),
  nvl(order_user_count,0),
  nvl(order_refund_count,0)
  nvl(order_refund_user_count,0)
from
(
  select
    1 recent_days,
    sum(order_count_1d) order_count,
    count(distinct(user_id)) order_user_count
  from dws_trade_user_sku_order_1d
  where dt='2020-06-14'
  group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
  union all
  select
    recent_days,
    sum(if(recent_days=7,order_count_7d,order_count_30d))
    count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
  from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
  where dt='2020-06-14'
  group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)t1
full outer join
(
  select
    1 recent_days,
    sum(order_refund_count_1d) order_refund_count,
    count(distinct(user_id)) order_refund_user_count
  from dws_trade_user_sku_order_refund_1d
  where dt='2020-06-14'
  group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
  union all
  select
    recent_days,
    sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
    count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
  from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
  where dt='2020-06-14'
  group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)t2
on t1.recent_days=t2.recent_days
and t1.category1_id=t2.category1_name
and t1.category2_id=t2.category2_name
and t1.category3_id=t2.category3_name

2)给的思路

insert overwrite table ads_trade_stats_by_cate
select * from ads_trade_stats_by_cate
union
select
    '2020-06-14' dt,
    nvl(odr.recent_days,refund.recent_days),
    nvl(odr.category1_id,refund.category1_id),
    nvl(odr.category1_name,refund.category1_name),
    nvl(odr.category2_id,refund.category2_id),
    nvl(odr.category2_name,refund.category2_name),
    nvl(odr.category3_id,refund.category3_id),
    nvl(odr.category3_name,refund.category3_name),
    nvl(order_count,0),
    nvl(order_user_count,0),
    nvl(order_refund_count,0),
    nvl(order_refund_user_count,0)
from
(
    select
        1 recent_days,
        category1_id,
        category1_name,
        category2_id,
        category2_name,
        category3_id,
        category3_name,
        sum(order_count_1d) order_count,
        count(distinct(user_id)) order_user_count
    from dws_trade_user_sku_order_1d
    where dt='2020-06-14'
    group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
    union all
    select
        recent_days,
        category1_id,
        category1_name,
        category2_id,
        category2_name,
        category3_id,
        category3_name,
        sum(order_count),
        count(distinct(if(order_count>0,user_id,null)))
    from
    (
        select
            recent_days,
            user_id,
            category1_id,
            category1_name,
            category2_id,
            category2_name,
            category3_id,
            category3_name,
            case recent_days
                when 7 then order_count_7d
                when 30 then order_count_30d
            end order_count
        from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)odr
full outer join
(
    select
        1 recent_days,
        category1_id,
        category1_name,
        category2_id,
        category2_name,
        category3_id,
        category3_name,
        sum(order_refund_count_1d) order_refund_count,
        count(distinct(user_id)) order_refund_user_count
    from dws_trade_user_sku_order_refund_1d
    where dt='2020-06-14'
    group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
    union all
    select
        recent_days,
        category1_id,
        category1_name,
        category2_id,
        category2_name,
        category3_id,
        category3_name,
        sum(order_refund_count),
        count(distinct(if(order_refund_count>0,user_id,null)))
    from
    (
        select
            recent_days,
            user_id,
            category1_id,
            category1_name,
            category2_id,
            category2_name,
            category3_id,
            category3_name,
            case recent_days
                when 7 then order_refund_count_7d
                when 30 then order_refund_count_30d
            end order_refund_count
        from dws_trade_user_sku_order_refund_nd lateral view explode(array(7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)refund
on odr.recent_days=refund.recent_days
and odr.category1_id=refund.category1_id
and odr.category1_name=refund.category1_name
and odr.category2_id=refund.category2_id
and odr.category2_name=refund.category2_name
and odr.category3_id=refund.category3_id
and odr.category3_name=refund.category3_name;

四、各分类商品购物车存量TOP3

1.需求说明和分析

1)需求说明
统计各个分类商品购物车存量TOP3
2)字段分析
根据需求,可以得知要包含以下字段:统计日期、三级分类的ID、三级分类的名称、商品ID、商品名称、购物车中存量数、排名

2.建表语句

DROP TABLE IF EXISTS ads_sku_cart_num_top3_by_cate;
CREATE EXTERNAL TABLE ads_sku_cart_num_top3_by_cate
(
    `dt`             STRING COMMENT '统计日期',
    `category1_id`   STRING COMMENT '一级分类ID',
    `category1_name` STRING COMMENT '一级分类名称',
    `category2_id`   STRING COMMENT '二级分类ID',
    `category2_name` STRING COMMENT '二级分类名称',
    `category3_id`   STRING COMMENT '三级分类ID',
    `category3_name` STRING COMMENT '三级分类名称',
    `sku_id`         STRING COMMENT '商品id',
    `sku_name`       STRING COMMENT '商品名称',
    `cart_num`       BIGINT COMMENT '购物车中商品数量',
    `rk`             BIGINT COMMENT '排名'
) COMMENT '各分类商品购物车存量Top10'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_sku_cart_num_top3_by_cate/';

3.数据装载

1)我的思路
要统计各分类商品购物车存量TOP3,要从交易域购物车周期快照事实表拿相关的数据,然后跟商品维度表join,再按分类进行统计

select
  '2020-06-14',
  category1_id,
  category1_name,
  category2_id,
  category2_name,
  category3_id,
  category3_name,
  sku_id,
  sku_name,
  sku_num,
  rk
from
(
  select
    category1_id,
    category1_name,
    category2_id,
    category2_name,
    category3_id,
    category3_name,
    sku_id,
    sku_name,
    sku_num,
    rank() over(partition by category1_id,category1_name,category2_id,category2_name,
    category3_id,category3_name order by sku_num desc) rk
  from
  (
    select
      sku_id,
      sku_name,
      sum(sku_num) sku_num
    from dwd_trade_cart_full
    where dt='2020-06-14'
    group by sku_id,sku_name
  )t1
  left join
  (
    select
      id,
      category1_id,
      category1_name,
      category2_id,
      category2_name,
      category3_id,
      category3_name
    from dim_sku_full
    where dt='2020-06-14'
  )t2
  on t1.sku_id=t2.id
)t3
where rk<=3

2)给的思路
给的思路跟我的思路一样,一遍就写出来了!!!

insert overwrite table ads_sku_cart_num_top3_by_cate
select * from ads_sku_cart_num_top3_by_cate
union
select
    '2020-06-14' dt,
    category1_id,
    category1_name,
    category2_id,
    category2_name,
    category3_id,
    category3_name,
    sku_id,
    sku_name,
    cart_num,
    rk
from
(
    select
        sku_id,
        sku_name,
        category1_id,
        category1_name,
        category2_id,
        category2_name,
        category3_id,
        category3_name,
        cart_num,
        rank() over (partition by category1_id,category2_id,category3_id order by cart_num desc) rk
    from
    (
        select
            sku_id,
            sum(sku_num) cart_num
        from dwd_trade_cart_full
        where dt='2020-06-14'
        group by sku_id
    )cart
    left join
    (
        select
            id,
            sku_name,
            category1_id,
            category1_name,
            category2_id,
            category2_name,
            category3_id,
            category3_name
        from dim_sku_full
        where dt='2020-06-14'
    )sku
    on cart.sku_id=sku.id
)t1
where rk<=3;
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值