5.4 离线数仓—DWS层交易域、用户域汇总表设计实现

本文详细介绍了如何设计和实现离线数据仓库DWS层中的交易域和用户域汇总表。内容包括交易域用户粒度订单历史至今汇总表、支付历史至今汇总表及用户域用户粒度登录历史至今汇总表的建表语句、首日及每日数据装载方法,重点讨论了全外连接和并集操作在数据装载中的应用。


前言

前面完成了所有1d和nd表的设计和开发,下面进行td表的设计和开发。

一、交易域用户粒度订单历史至今汇总表

1.建表语句

DROP TABLE IF EXISTS dws_trade_user_order_td;
CREATE EXTERNAL TABLE dws_trade_user_order_td
(
    `user_id`                   STRING COMMENT '用户id',
    `order_date_first`          STRING COMMENT '首次下单日期',
    `order_date_last`           STRING COMMENT '末次下单日期',
    `order_count_td`            BIGINT COMMENT '下单次数',
    `order_num_td`              BIGINT COMMENT '购买商品件数',
    `original_amount_td`        DECIMAL(16, 2) COMMENT '原始金额',
    `activity_reduce_amount_td` DECIMAL(16, 2) COMMENT '活动优惠金额',
    `coupon_reduce_amount_td`   DECIMAL(16, 2) COMMENT '优惠券优惠金额',
    `total_amount_td`           DECIMAL(16, 2) COMMENT '最终金额'
) COMMENT '交易域用户粒度订单历史至今汇总事实表'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_order_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');

2.首日数据装载

数据装载分为首日和每日,首日的时候用的是1d表,每日的时候用的是前一天的td表和当天对应的1d表
首日数据装载语句:

insert overwrite table dws_trade_user_order_td partition(dt='2020-06-14')
select
    user_id,
    min(dt) login_date_first,
    max(dt) login_date_last,
    sum(order_count_1d) order_count,
    sum(order_num_1d) order_num,
    sum(order_original_amount_1d) original_amount,
    sum(activity_reduce_amount_1d) activity_reduce_amount,
    sum(coupon_reduce_amount_1d) coupon_reduce_amount,
    sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
group by user_id;

3.每日数据装载

方式一:
可以跟首日的思路一样,只不过要读取所有的分区,常此以后,效率不高

insert overwrite table dws_trade_user_order_td partition(dt='2020-06-15')
select
    user_id,
    min(dt) login_date_first,
    max(dt) login_date_last,
    sum(order_count_1d) order_count,
    sum(order_num_1d) order_num,
    sum(order_original_amount_1d) original_amount,
    sum(activity_reduce_amount_1d) activity_reduce_amount,
    sum(coupon_reduce_amount_1d) coupon_reduce_amount,
    sum(order_total_amount_1d) total_amount
from dws_trade_user_order_1d
group by user_id;

方式二:
方式二是利用前一天的td表和今天的1d表,将两部分数据结合到一起
使用full outer join的方法:

insert overwrite table dws_trade_user_order_td partition(dt='2020-06-15')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is not null and old.user_id is null,'2020-06-15',old.order_date_first),
    if(new.user_id is not null,'2020-06-15',old.order_date_last),
    nvl(old.order_count_td,0)+nvl(new.order_count_1d,0),
    nvl(old.order_num_td,0)+nvl(new.order_num_1d,0),
    nvl(old.original_amount_td,0)+nvl(new.order_original_amount_1d,0),
    nvl(old.activity_reduce_amount_td,0)+nvl(new.activity_reduce_amount_1d,0),
    nvl(old.coupon_reduce_amount_td,0)+nvl(new.coupon_reduce_amount_1d,0),
    nvl(old.total_amount_td,0)+nvl(new.order_total_amount_1d,0)
from
(
    select
        user_id,
        order_date_first,
        order_date_last,
        order_count_td,
        order_num_td,
        original_amount_td,
        activity_reduce_amount_td,
        coupon_reduce_amount_td,
        total_amount_td
    from dws_trade_user_order_td
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        user_id,
        order_count_1d,
        order_num_1d,
        order_original_amount_1d,
        activity_reduce_amount_1d,
        coupon_reduce_amount_1d,
        order_total_amount_1d
    from dws_trade_user_order_1d
    where dt='2020-06-15'
)new
on old.user_id=new.user_id;

使用union all的方法:

select
  user_id,
  min(order_date_first) login_date_first,
  max(order_date_last) login_date_last,
  sum(order_count_td) order_count,
  sum(order_num_td) order_num,
  sum(original_amount_td) original_amount,
  sum(activity_reduce_amount_td) activity_reduce_amount,
  sum(coupon_reduce_amount_td) coupon_reduce_amount,
  sum(total_amount_td) total_amount
from
(
  select
    user_id,
    order_date_first,
    order_date_last,
    order_count_td,
    order_num_td,
    original_amount_td,
    activity_reduce_amount_td,
    coupon_reduce_amount_td,
    total_amount_td
  from dws_trade_user_order_td
  where dt='2020-06-14'
  union all
  select
    user_id,
    '2020-06-15',
    '2020-06-15',
    sum(order_count_1d) order_count,
    sum(order_num_1d) order_num,
    sum(order_original_amount_1d) original_amount,
    sum(activity_reduce_amount_1d) activity_reduce_amount,
    sum(coupon_reduce_amount_1d) coupon_reduce_amount,
    sum(order_total_amount_1d) total_amount
  from dws_trade_user_order_1d
  where dt='2020-06-15'
)t1
group by t1.user_id

二、交易域用户粒度支付历史至今汇总表

1.建表语句

DROP TABLE IF EXISTS dws_trade_user_payment_td;
CREATE EXTERNAL TABLE dws_trade_user_payment_td
(
    `user_id`            STRING COMMENT '用户id',
    `payment_date_first` STRING COMMENT '首次支付日期',
    `payment_date_last`  STRING COMMENT '末次支付日期',
    `payment_count_td`   BIGINT COMMENT '历史至今支付次数',
    `payment_num_td`     BIGINT COMMENT '历史至今支付商品件数',
    `payment_amount_td`  DECIMAL(16, 2) COMMENT '历史至今支付金额'
) COMMENT '交易域用户粒度支付历史至今汇总事实表'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_trade_user_payment_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');

2.首日数据装载

insert overwrite table dws_trade_user_payment_td partition(dt='2020-06-14')
select
    user_id,
    min(dt) payment_date_first,
    max(dt) payment_date_last,
    sum(payment_count_1d) payment_count,
    sum(payment_num_1d) payment_num,
    sum(payment_amount_1d) payment_amount
from dws_trade_user_payment_1d
group by user_id;

3.每日数据装载

每日装载和一中的一样

insert overwrite table dws_trade_user_payment_td partition(dt='2020-06-15')
select
    nvl(old.user_id,new.user_id),
    if(old.user_id is null and new.user_id is not null,'2020-06-15',old.payment_date_first),
    if(new.user_id is not null,'2020-06-15',old.payment_date_last),
    nvl(old.payment_count_td,0)+nvl(new.payment_count_1d,0),
    nvl(old.payment_num_td,0)+nvl(new.payment_num_1d,0),
    nvl(old.payment_amount_td,0)+nvl(new.payment_amount_1d,0)
from
(
    select
        user_id,
        payment_date_first,
        payment_date_last,
        payment_count_td,
        payment_num_td,
        payment_amount_td
    from dws_trade_user_payment_td
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        user_id,
        payment_count_1d,
        payment_num_1d,
        payment_amount_1d
    from dws_trade_user_payment_1d
    where dt='2020-06-15'
)new
on old.user_id=new.user_id;

三、用户域用户粒度登录历史至今汇总表

1.建表语句

DROP TABLE IF EXISTS dws_user_user_login_td;
CREATE EXTERNAL TABLE dws_user_user_login_td
(
    `user_id`         STRING COMMENT '用户id',
    `login_date_last` STRING COMMENT '末次登录日期',
    `login_count_td`  BIGINT COMMENT '累计登录次数'
) COMMENT '用户域用户粒度登录历史至今汇总事实表'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dws/dws_user_user_login_td'
    TBLPROPERTIES ('orc.compress' = 'snappy');

这里可以把注册日期当作用户的首次登录日期,所以可以不写这个字段。

2.首日装载语句

insert overwrite table dws_user_user_login_td partition(dt='2020-06-14')
select
    u.id,
    nvl(login_date_last,date_format(create_time,'yyyy-MM-dd')),
    nvl(login_count_td,1)
from
(
    select
        id,
        create_time
    from dim_user_zip
    where dt='9999-12-31'
)u
left join
(
    select
        user_id,
        max(dt) login_date_last,
        count(*) login_count_td
    from dwd_user_login_inc
    group by user_id
)l
on u.id=l.user_id;

首日装载语句说明:
1)在首日装载时,因为登录表里的数据来自于日志,所以只有部分用户的登录状态,因此要从用户表中拿到所有用户的信息,若没有某用户的登录信息,那么就根据用户的注册时间作为登录时间,登录次数设置为1(这里使用用户9999-12-31分区的数据跟登录表做left join操作)

3.每日装载语句

insert overwrite table dws_user_user_login_td partition(dt='2020-06-15')
select
    nvl(old.user_id,new.user_id),
    if(new.user_id is null,old.login_date_last,'2020-06-15'),
    nvl(old.login_count_td,0)+nvl(new.login_count_1d,0)
from
(
    select
        user_id,
        login_date_last,
        login_count_td
    from dws_user_user_login_td
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        user_id,
        count(*) login_count_1d
    from dwd_user_login_inc
    where dt='2020-06-15'
    group by user_id
)new
on old.user_id=new.user_id;

每日装载语句说明:
1)每日装载时,先拿到前一天的td表的数据,在拿到当天登录用户的数据,需要注意的是,当天登录的用户可能存在第一天注册登录的情况,即前一天的td表中没有该用户的数据,所以要使用full outer join,取两张表的并集,再根据字段的值取对应的值。

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值