离线数仓—DWS层交易域、用户域汇总表设计实现
前言
前面完成了所有1d和nd表的设计和开发,下面进行td表的设计和开发。
一、交易域用户粒度订单历史至今汇总表
1.建表语句
DROP TABLE IF EXISTS dws_trade_user_order_td;
CREATE EXTERNAL TABLE dws_trade_user_order_td
(
`user_id` STRING COMMENT '用户id',
`order_date_first` STRING COMMENT '首次下单日期',
`order_date_last` STRING COMMENT '末次下单日期',
`order_count_td` BIGINT COMMENT '下单次数',
`order_num_td` BIGINT COMMENT '购买商品件数',
`original_amount_td` DECIMAL(16, 2) COMMENT '原始金额',
`activity_reduce_amount_td` DECIMAL(16, 2) COMMENT '活动优惠金额',
`coupon_reduce_amount_td` DECIMAL(16, 2) COMMENT '优惠券优惠金额',
`total_amount_td` DECIMAL(16, 2) COMMENT '最终金额'
) COMMENT '交易域用户粒度订单历史至今汇总事实表'
PARTITIONED BY (`dt` STRING)
STORED AS ORC
LOCATION '/warehouse/gmall/dws/dws_trade_user_order_td'
TBLPROPERTIES ('orc.compress' = 'snappy');
2.首日数据装载
数据装载分为首日和每日,首日的时候用的是1d表,每日的时候用的是前一天的td表和当天对应的1d表
首日数据装载语句:
insert overwrite table dws_trade_user_order_td partition(dt='2020-06-14')
select
user_id,
min(dt) login_date_first,
max(dt) login_date_last,
sum(order_count_1d) order_count,
sum(order_num_1d) order_num,
sum(order_original_amount_1d) original_amount,
sum(activity_reduce_amount_1d) activity_reduce_amount,
sum(coupon_reduce_amount_1d) coupon_reduce_amount,
sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
group by user_id;
3.每日数据装载
方式一:
可以跟首日的思路一样,只不过要读取所有的分区,常此以后,效率不高
insert overwrite table dws_trade_user_order_td partition(dt='2020-06-15')
select
user_id,
min(dt) login_date_first,
max(dt) login_date_last,
sum(order_count_1d) order_count,
sum(order_num_1d) order_num,
sum(order_original_amount_1d) original_amount,
sum(activity_reduce_amount_1d) activity_reduce_amount,
sum(coupon_reduce_amount_1d) coupon_reduce_amount,
sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
group by user_id;
方式二:
方式二是利用前一天的td表和今天的1d表,将两部分数据结合到一起
使用full outer join的方法:
insert overwrite table dws_trade_user_order_td partition(dt='2020-06-15')
select
nvl(old.user_id,new.user_id),
if(new.user_id is not null and old.user_id is null,'2020-06-15',old.order_date_first),
if(new.user_id is not null,'2020-06-15',old.order_date_last),
nvl(old.order_count_td,0)+nvl(new.order_count_1d,0),
nvl(old.order_num_td,0)+nvl(new.order_num_1d,0),
nvl(old.original_amount_td,0)+nvl(new.order_original_amount_1d,0),
nvl(old.activity_reduce_amount_td,0)+nvl(new.activity_reduce_amount_1d,0),
nvl(old.coupon_reduce_amount_td,0)+nvl(new.coupon_reduce_amount_1d,0),
nvl(old.total_amount_td,0)+nvl(new.order_total_amount_1d,0)
from
(
select
user_id,
order_date_first,
order_date_last,
order_count_td,
order_num_td,
original_amount_td,
activity_reduce_amount_td,
coupon_reduce_amount_td,
total_amount_td
from dws_trade_user_order_td
where dt=date_add('2020-06-15',-1)
)old
full outer join
(
select
user_id,
order_count_1d,
order_num_1d,
order_original_amount_1d,
activity_reduce_amount_1d,
coupon_reduce_amount_1d,
order_total_amount_1d
from dws_trade_user_order_1d
where dt='2020-06-15'
)new
on old.user_id=new.user_id;
使用union all的方法:
select
user_id,
min(order_date_first) login_date_first,
max(order_date_last) login_date_last,
sum(order_count_td) order_count,
sum(order_num_td) order_num,
sum(original_amount_td) original_amount,
sum(activity_reduce_amount_td) activity_reduce_amount,
sum(coupon_reduce_amount_td) coupon_reduce_amount,
sum(total_amount_td) total_amount
from
(
select
user_id,
order_date_first,
order_date_last,
order_count_td,
order_num_td,
original_amount_td,
activity_reduce_amount_td,
coupon_reduce_amount_td,
total_amount_td
from dws_trade_user_order_td
where dt='2020-06-14'
union all
select
user_id,
'2020-06-15',
'2020-06-15',
sum(order_count_1d) order_count,
sum(order_num_1d) order_num,
sum(order_original_amount_1d) original_amount,
sum(activity_reduce_amount_1d) activity_reduce_amount,
sum(coupon_reduce_amount_1d) coupon_reduce_amount,
sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
where dt='2020-06-15'
)t1
group by t1.user_id
二、交易域用户粒度支付历史至今汇总表
1.建表语句
DROP TABLE IF EXISTS dws_trade_user_payment_td;
CREATE EXTERNAL TABLE dws_trade_user_payment_td
(
`user_id` STRING COMMENT '用户id',
`payment_date_first` STRING COMMENT '首次支付日期',
`payment_date_last` STRING COMMENT '末次支付日期',
`payment_count_td` BIGINT COMMENT '历史至今支付次数',
`payment_num_td` BIGINT COMMENT '历史至今支付商品件数',
`payment_amount_td` DECIMAL(16, 2) COMMENT '历史至今支付金额'
) COMMENT '交易域用户粒度支付历史至今汇总事实表'
PARTITIONED BY (`dt` STRING)
STORED AS ORC
LOCATION '/warehouse/gmall/dws/dws_trade_user_payment_td'
TBLPROPERTIES ('orc.compress' = 'snappy');
2.首日数据装载
insert overwrite table dws_trade_user_payment_td partition(dt='2020-06-14')
select
user_id,
min(dt) payment_date_first,
max(dt) payment_date_last,
sum(payment_count_1d) payment_count,
sum(payment_num_1d) payment_num,
sum(payment_amount_1d) payment_amount
from dws_trade_user_payment_1d
group by user_id;
3.每日数据装载
每日装载和一中的一样
insert overwrite table dws_trade_user_payment_td partition(dt='2020-06-15')
select
nvl(old.user_id,new.user_id),
if(old.user_id is null and new.user_id is not null,'2020-06-15',old.payment_date_first),
if(new.user_id is not null,'2020-06-15',old.payment_date_last),
nvl(old.payment_count_td,0)+nvl(new.payment_count_1d,0),
nvl(old.payment_num_td,0)+nvl(new.payment_num_1d,0),
nvl(old.payment_amount_td,0)+nvl(new.payment_amount_1d,0)
from
(
select
user_id,
payment_date_first,
payment_date_last,
payment_count_td,
payment_num_td,
payment_amount_td
from dws_trade_user_payment_td
where dt=date_add('2020-06-15',-1)
)old
full outer join
(
select
user_id,
payment_count_1d,
payment_num_1d,
payment_amount_1d
from dws_trade_user_payment_1d
where dt='2020-06-15'
)new
on old.user_id=new.user_id;
三、用户域用户粒度登录历史至今汇总表
1.建表语句
DROP TABLE IF EXISTS dws_user_user_login_td;
CREATE EXTERNAL TABLE dws_user_user_login_td
(
`user_id` STRING COMMENT '用户id',
`login_date_last` STRING COMMENT '末次登录日期',
`login_count_td` BIGINT COMMENT '累计登录次数'
) COMMENT '用户域用户粒度登录历史至今汇总事实表'
PARTITIONED BY (`dt` STRING)
STORED AS ORC
LOCATION '/warehouse/gmall/dws/dws_user_user_login_td'
TBLPROPERTIES ('orc.compress' = 'snappy');
这里可以把注册日期当作用户的首次登录日期,所以可以不写这个字段。
2.首日装载语句
insert overwrite table dws_user_user_login_td partition(dt='2020-06-14')
select
u.id,
nvl(login_date_last,date_format(create_time,'yyyy-MM-dd')),
nvl(login_count_td,1)
from
(
select
id,
create_time
from dim_user_zip
where dt='9999-12-31'
)u
left join
(
select
user_id,
max(dt) login_date_last,
count(*) login_count_td
from dwd_user_login_inc
group by user_id
)l
on u.id=l.user_id;
首日装载语句说明:
1)在首日装载时,因为登录表里的数据来自于日志,所以只有部分用户的登录状态,因此要从用户表中拿到所有用户的信息,若没有某用户的登录信息,那么就根据用户的注册时间作为登录时间,登录次数设置为1(这里使用用户9999-12-31分区的数据跟登录表做left join操作)
3.每日装载语句
insert overwrite table dws_user_user_login_td partition(dt='2020-06-15')
select
nvl(old.user_id,new.user_id),
if(new.user_id is null,old.login_date_last,'2020-06-15'),
nvl(old.login_count_td,0)+nvl(new.login_count_1d,0)
from
(
select
user_id,
login_date_last,
login_count_td
from dws_user_user_login_td
where dt=date_add('2020-06-15',-1)
)old
full outer join
(
select
user_id,
count(*) login_count_1d
from dwd_user_login_inc
where dt='2020-06-15'
group by user_id
)new
on old.user_id=new.user_id;
每日装载语句说明:
1)每日装载时,先拿到前一天的td表的数据,在拿到当天登录用户的数据,需要注意的是,当天登录的用户可能存在第一天注册登录的情况,即前一天的td表中没有该用户的数据,所以要使用full outer join,取两张表的并集,再根据字段的值取对应的值。
本文详细介绍了如何设计和实现离线数据仓库DWS层中的交易域和用户域汇总表。内容包括交易域用户粒度订单历史至今汇总表、支付历史至今汇总表及用户域用户粒度登录历史至今汇总表的建表语句、首日及每日数据装载方法,重点讨论了全外连接和并集操作在数据装载中的应用。
783

被折叠的 条评论
为什么被折叠?



