离线数仓—ADS层商品主题需求的实现
前言
前面完成了ADS层用户主题相关需求的设计和开发,下面进行ADS层商品主题的设计和开发。
一、最近7/30日各品牌复购率
1.需求说明和分析
1)需求说明
| 统计周期 | 统计粒度 | 指标 | 说明 |
|---|---|---|---|
| 最近7、30日 | 品牌 | 复购率 | 重复购买人数占购买人数比例 |
2)字段分析
根据需求,可以得知要包含以下字段:统计日期、最近天数、品牌ID、品牌名称、复购率
2.建表语句
DROP TABLE IF EXISTS ads_repeat_purchase_by_tm;
CREATE EXTERNAL TABLE ads_repeat_purchase_by_tm
(
`dt` STRING COMMENT '统计日期',
`recent_days` BIGINT COMMENT '最近天数,7:最近7天,30:最近30天',
`tm_id` STRING COMMENT '品牌ID',
`tm_name` STRING COMMENT '品牌名称',
`order_repeat_rate` DECIMAL(16, 2) COMMENT '复购率'
) COMMENT '各品牌复购率统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_repeat_purchase_by_tm/';
3.数据装载
1)我的思路
先算7天的,直接从交易域用户商品粒度订单最近n日汇总表获取数据,按用户+品牌分组,对于每个用户的每个品牌,若最近7日下单次数为1,则代表了购买该品牌的人数,若最近7日下单次数大于1,则代表了重复购买该品牌的人数。要注意的是,这个表的粒度是用户——商品粒度,可能出现一个用户多次购买一个品牌的情况,所以要按用户+品牌进行分组语句如下:
select
tm_id,
tm_name,
sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_7d) order_count
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
30天的复购率方法类似,语句如下:
select
tm_id,
tm_name,
sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_30d) order_count
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
相同的数据,聚合逻辑不同,采用炸裂的方式炸裂两份。
select
recent_days,
tm_id,
tm_name,
sum(if(order_count>1),1,0)/sum(if(order_count>=1,1,0))
from
(
select
recent_days,
user_id,
tm_id,
tm_name,
if(recent_days=7,sum(order_count_7d),sum(order_count_30d)) order_count
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by user_id,tm_id,tm_name,recent_days
)t1
group by tm_id,tm_name,recent_days
其中,子查询语句可以这样写:
sum(if(recent_days=7,order_count_7d,order_count_30d) order_count
2)给的思路
insert overwrite table ads_repeat_purchase_by_tm
select * from ads_repeat_purchase_by_tm
union
select
'2020-06-14' dt,
recent_days,
tm_id,
tm_name,
cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0)) as decimal(16,2))
from
(
select
'2020-06-14' dt,
recent_days,
user_id,
tm_id,
tm_name,
sum(order_count) order_count
from
(
select
recent_days,
user_id,
tm_id,
tm_name,
case recent_days
when 7 then order_count_7d
when 30 then order_count_30d
end order_count
from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
where dt='2020-06-14'
)t1
group by recent_days,user_id,tm_id,tm_name
)t2
group by recent_days,tm_id,tm_name;
二、各品牌商品交易统计
1.需求说明和分析
1)需求说明
| 统计周期 | 统计粒度 | 指标 | 说明 |
|---|---|---|---|
| 最近1、7、30日 | 品牌 | 订单数 | 略 |
| 最近1、7、30日 | 品牌 | 订单人数 | 略 |
| 最近1、7、30日 | 品牌 | 退单数 | 略 |
| 最近1、7、30日 | 品牌 | 退单人数 | 略 |
2)字段分析
根据需求说明,可以得知,要包含以下字段:统计日期、最近天数、订单数、订单人数、退单数、退单人数、品牌ID、品牌名称
2.建表语句
DROP TABLE IF EXISTS ads_trade_stats_by_tm;
CREATE EXTERNAL TABLE ads_trade_stats_by_tm
(
`dt` STRING COMMENT '统计日期',
`recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
`tm_id` STRING COMMENT '品牌ID',
`tm_name` STRING COMMENT '品牌名称',
`order_count` BIGINT COMMENT '订单数',
`order_user_count` BIGINT COMMENT '订单人数',
`order_refund_count` BIGINT COMMENT '退单数',
`order_refund_user_count` BIGINT COMMENT '退单人数'
) COMMENT '各品牌商品交易统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_trade_stats_by_tm/';
3.数据装载
1)我的思路
先算1天内订单数:
--最近1天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
tm_id,
tm_name,
count(*)
sum(order_count)
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_1d) order_count
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
算7天内订单数和订单人数:
--最近7天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
tm_id,
tm_name,
count(*)
sum(order_count)
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_7d) order_count
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
算30天内订单数和订单人数:
--最近30天内订单数:
--1个用户对一个品牌的下单次数(一个品牌可能对应多个用户)
--1个品牌的下单次数和下单人数
select
tm_id,
tm_name,
count(*)
sum(order_count)
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_30d) order_count
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
7日和30日整合到一块去算:
select
recent_days,
tm_id,
tm_name,
count(*)
sum(order_count)
from
(
select
recent_days,
user_id,
tm_id,
tm_name,
sum(if(recent_days=7,order_count_7d,order_count_30d) order_count
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,user_id,tm_id,tm_name
)t1
group by recent_days,tm_id,tm_name
1日没办法和多日进行整合,只能分别计算然后join。
退单数和退单人数计算思路跟上面的一样。
2)给的思路:
insert overwrite table ads_trade_stats_by_tm
select * from ads_trade_stats_by_tm
union
select
'2020-06-14' dt,
nvl(odr.recent_days,refund.recent_days),
nvl(odr.tm_id,refund.tm_id),
nvl(odr.tm_name,refund.tm_name),
nvl(order_count,0),
nvl(order_user_count,0),
nvl(order_refund_count,0),
nvl(order_refund_user_count,0)
from
(
select
1 recent_days,
tm_id,
tm_name,
sum(order_count_1d) order_count,
count(distinct(user_id)) order_user_count
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by tm_id,tm_name
union all
select
recent_days,
tm_id,
tm_name,
sum(order_count),
count(distinct(if(order_count>0,user_id,null)))
from
(
select
recent_days,
user_id,
tm_id,
tm_name,
case recent_days
when 7 then order_count_7d
when 30 then order_count_30d
end order_count
from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
where dt='2020-06-14'
)t1
group by recent_days,tm_id,tm_name
)odr
full outer join
(
select
1 recent_days,
tm_id,
tm_name,
sum(order_refund_count_1d) order_refund_count,
count(distinct(user_id)) order_refund_user_count
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by tm_id,tm_name
union all
select
recent_days,
tm_id,
tm_name,
sum(order_refund_count),
count(if(order_refund_count>0,user_id,null))
from
(
select
recent_days,
user_id,
tm_id,
tm_name,
case recent_days
when 7 then order_refund_count_7d
when 30 then order_refund_count_30d
end order_refund_count
from dws_trade_user_sku_order_refund_nd lateral view explode(array(7,30)) tmp as recent_days
where dt='2020-06-14'
)t1
group by recent_days,tm_id,tm_name
)refund
on odr.recent_days=refund.recent_days
and odr.tm_id=refund.tm_id
and odr.tm_name=refund.tm_name;
注意事项:
可能会存在7天、30天内某品牌只有下单(或者只有退单)的情况,所以要使用全外联,但结果中可能会有部分数据为null,所以要对结果数据判null处理
4.语句说明
trade_user_sku_order_1d表中可能存在的数据情况如下:

所以计算1日内订单数和订单人数时,有以下两种方法:
第一种,直接按品牌分组,对用户去重(因为一个用户可能下单了一个品类中的多个商品)
select
tm_id,
tm_name,
count(distinct(user_id)),
sum(order_count_1d)
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by tm_id,tm_name
第二种,先按用户和品牌分组,再直接count(*)
select
tm_id,
tm_name,
count(*)
sum(order_count)
from
(
select
user_id,
tm_id,
tm_name,
sum(order_count_1d) order_count
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by user_id,tm_id,tm_name
)t1
group by tm_id,tm_name
统计多天的时候也是一样:
select
recent_days,
tm_id,
tm_name,
count(distinct(if(if(recent_days=7,order_count_7d,order_count_30d)>0,user_id,null)))
sum(if(recent_days=7,order_count_7d,order_count_30d)) order_count
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,tm_id,tm_name
三、各品牌商品交易统计
1.需求说明和分析
1)需求说明
| 统计周期 | 统计粒度 | 指标 | 说明 |
|---|---|---|---|
| 最近1、7、30日 | 品类 | 订单数 | 略 |
| 最近1、7、30日 | 品类 | 订单人数 | 略 |
| 最近1、7、30日 | 品类 | 退单数 | 略 |
| 最近1、7、30日 | 品类 | 退单人数 | 略 |
2)字段分析
根据需求可以得到,该表应该包含以下字段:统计日期、最近天数、三个品类ID、三个品类名称、订单数、订单人数、退单数、退单人数
2.建表语句
DROP TABLE IF EXISTS ads_trade_stats_by_cate;
CREATE EXTERNAL TABLE ads_trade_stats_by_cate
(
`dt` STRING COMMENT '统计日期',
`recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
`category1_id` STRING COMMENT '一级分类id',
`category1_name` STRING COMMENT '一级分类名称',
`category2_id` STRING COMMENT '二级分类id',
`category2_name` STRING COMMENT '二级分类名称',
`category3_id` STRING COMMENT '三级分类id',
`category3_name` STRING COMMENT '三级分类名称',
`order_count` BIGINT COMMENT '订单数',
`order_user_count` BIGINT COMMENT '订单人数',
`order_refund_count` BIGINT COMMENT '退单数',
`order_refund_user_count` BIGINT COMMENT '退单人数'
) COMMENT '各分类商品交易统计'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_trade_stats_by_cate/';
3.数据装载
1)我的思路
首先,我觉得这个跟上一个需求很类似,只不过是把品牌换成了品类,所以语句基本上一样
先算订单数和订单人数:
--1天内的订单数和订单人数:
--1个用户可能多个下单了多个品类,按品类分组要去重
select
sum(order_count_1d),
count(distinct(user_id))
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
--7天内的订单数和订单人数:
select
sum(order_count_7d),
count(distinct(if(order_count_7d>0,user_id,null)))
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
--30天内的订单数和订单人数:
select
sum(order_count_30d),
count(distinct(if(order_count_30d>0,user_id,null)))
from dws_trade_user_sku_order_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
--7日和30日整合:
select
recent_days,
sum(if(recent_days=7,order_count_7d,order_count_30d))
count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
--1日、7日、30日整合:
select
1 recent_days,
sum(order_count_1d),
count(distinct(user_id))
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
sum(if(recent_days=7,order_count_7d,order_count_30d))
count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
再算退单数和退单人数:
1天内的退单数和订单人数:
1个用户可能多个退单了多个品类,按品类分组要去重
select
sum(order_refund_count_1d),
count(distinct(user_id))
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
7天内的退单数和订单人数:
select
sum(order_refund_count_7d),
count(distinct(if(order_refund_count_7d>0,user_id,null)))
from dws_trade_user_sku_order_refund_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
30天内的退单数和退单人数:
select
sum(order_refund_count_30d),
count(distinct(if(order_refund_count_30d>0,user_id,null)))
from dws_trade_user_sku_order_refund_nd
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
7日和30日整合:
select
recent_days,
sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
--1日、7日、30日整合:
select
1 recent_days,
sum(order_refund_count_1d),
count(distinct(user_id))
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
将两部分整合:
select
'2020-06-14',
nvl(t1,recent_days,t2.recent_days),
nvl(t1.category1_id,t2.category1_id),
nvl(t1.category1_name,t2.category1_name),
nvl(t1.category2_id,t2.category2_id),
nvl(t1.category2_name,t2.category2_name),
nvl(t1.category3_id,t2.category3_id),
nvl(t1.category3_name,t2.category3_name),
nvl(order_count,0),
nvl(order_user_count,0),
nvl(order_refund_count,0)
nvl(order_refund_user_count,0)
from
(
select
1 recent_days,
sum(order_count_1d) order_count,
count(distinct(user_id)) order_user_count
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
sum(if(recent_days=7,order_count_7d,order_count_30d))
count(distinct(if(recent_days=7,order_count_7d,order_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)t1
full outer join
(
select
1 recent_days,
sum(order_refund_count_1d) order_refund_count,
count(distinct(user_id)) order_refund_user_count
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
sum(if(recent_days=7,order_refund_count_7d,order_refund_count_30d))
count(distinct(if(recent_days=7,order_refund_count_7d,order_refund_count_30d)>0,user_did,null))
from dws_trade_user_sku_order_nd LATERAL VIEW explode(array(7,30)) tmp AS recent_days
where dt='2020-06-14'
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)t2
on t1.recent_days=t2.recent_days
and t1.category1_id=t2.category1_name
and t1.category2_id=t2.category2_name
and t1.category3_id=t2.category3_name
2)给的思路
insert overwrite table ads_trade_stats_by_cate
select * from ads_trade_stats_by_cate
union
select
'2020-06-14' dt,
nvl(odr.recent_days,refund.recent_days),
nvl(odr.category1_id,refund.category1_id),
nvl(odr.category1_name,refund.category1_name),
nvl(odr.category2_id,refund.category2_id),
nvl(odr.category2_name,refund.category2_name),
nvl(odr.category3_id,refund.category3_id),
nvl(odr.category3_name,refund.category3_name),
nvl(order_count,0),
nvl(order_user_count,0),
nvl(order_refund_count,0),
nvl(order_refund_user_count,0)
from
(
select
1 recent_days,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sum(order_count_1d) order_count,
count(distinct(user_id)) order_user_count
from dws_trade_user_sku_order_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sum(order_count),
count(distinct(if(order_count>0,user_id,null)))
from
(
select
recent_days,
user_id,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
case recent_days
when 7 then order_count_7d
when 30 then order_count_30d
end order_count
from dws_trade_user_sku_order_nd lateral view explode(array(7,30)) tmp as recent_days
where dt='2020-06-14'
)t1
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)odr
full outer join
(
select
1 recent_days,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sum(order_refund_count_1d) order_refund_count,
count(distinct(user_id)) order_refund_user_count
from dws_trade_user_sku_order_refund_1d
where dt='2020-06-14'
group by category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
union all
select
recent_days,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sum(order_refund_count),
count(distinct(if(order_refund_count>0,user_id,null)))
from
(
select
recent_days,
user_id,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
case recent_days
when 7 then order_refund_count_7d
when 30 then order_refund_count_30d
end order_refund_count
from dws_trade_user_sku_order_refund_nd lateral view explode(array(7,30)) tmp as recent_days
where dt='2020-06-14'
)t1
group by recent_days,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name
)refund
on odr.recent_days=refund.recent_days
and odr.category1_id=refund.category1_id
and odr.category1_name=refund.category1_name
and odr.category2_id=refund.category2_id
and odr.category2_name=refund.category2_name
and odr.category3_id=refund.category3_id
and odr.category3_name=refund.category3_name;
四、各分类商品购物车存量TOP3
1.需求说明和分析
1)需求说明
统计各个分类商品购物车存量TOP3
2)字段分析
根据需求,可以得知要包含以下字段:统计日期、三级分类的ID、三级分类的名称、商品ID、商品名称、购物车中存量数、排名
2.建表语句
DROP TABLE IF EXISTS ads_sku_cart_num_top3_by_cate;
CREATE EXTERNAL TABLE ads_sku_cart_num_top3_by_cate
(
`dt` STRING COMMENT '统计日期',
`category1_id` STRING COMMENT '一级分类ID',
`category1_name` STRING COMMENT '一级分类名称',
`category2_id` STRING COMMENT '二级分类ID',
`category2_name` STRING COMMENT '二级分类名称',
`category3_id` STRING COMMENT '三级分类ID',
`category3_name` STRING COMMENT '三级分类名称',
`sku_id` STRING COMMENT '商品id',
`sku_name` STRING COMMENT '商品名称',
`cart_num` BIGINT COMMENT '购物车中商品数量',
`rk` BIGINT COMMENT '排名'
) COMMENT '各分类商品购物车存量Top10'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_sku_cart_num_top3_by_cate/';
3.数据装载
1)我的思路
要统计各分类商品购物车存量TOP3,要从交易域购物车周期快照事实表拿相关的数据,然后跟商品维度表join,再按分类进行统计
select
'2020-06-14',
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sku_id,
sku_name,
sku_num,
rk
from
(
select
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sku_id,
sku_name,
sku_num,
rank() over(partition by category1_id,category1_name,category2_id,category2_name,
category3_id,category3_name order by sku_num desc) rk
from
(
select
sku_id,
sku_name,
sum(sku_num) sku_num
from dwd_trade_cart_full
where dt='2020-06-14'
group by sku_id,sku_name
)t1
left join
(
select
id,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name
from dim_sku_full
where dt='2020-06-14'
)t2
on t1.sku_id=t2.id
)t3
where rk<=3
2)给的思路
给的思路跟我的思路一样,一遍就写出来了!!!
insert overwrite table ads_sku_cart_num_top3_by_cate
select * from ads_sku_cart_num_top3_by_cate
union
select
'2020-06-14' dt,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
sku_id,
sku_name,
cart_num,
rk
from
(
select
sku_id,
sku_name,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name,
cart_num,
rank() over (partition by category1_id,category2_id,category3_id order by cart_num desc) rk
from
(
select
sku_id,
sum(sku_num) cart_num
from dwd_trade_cart_full
where dt='2020-06-14'
group by sku_id
)cart
left join
(
select
id,
sku_name,
category1_id,
category1_name,
category2_id,
category2_name,
category3_id,
category3_name
from dim_sku_full
where dt='2020-06-14'
)sku
on cart.sku_id=sku.id
)t1
where rk<=3;
本文详细介绍了ADS层商品主题在离线数仓中的实现,包括最近7/30日各品牌复购率、各品牌商品交易统计、各品类商品交易统计以及各分类商品购物车存量TOP3的分析和数据处理方法,涉及Hive SQL的建表与数据装载过程。
650

被折叠的 条评论
为什么被折叠?



