DM层
DM即data market 数据集市,主要是对DWS层中的数据进行一个按时间(年月日)进行聚合的操作。
DM层存在三张表:
- dm_sale (销售相关)
- dm_sku (商品相关)
- dm_user (用户相关)
销售相关
针对dws中的表dws_sale_daycount
进行聚合操作,由于这两个表(dm_sale
和dws_sale_daycount
)在聚合过程中存在相同字段group_type
,两个表的纬度需要相互匹配,匹配的条件是where group_type_dws=group_type_dm
- 关键SQL
insert into hive1.yp_dm.dm_sale
with groupby as
(
select
case
when grouping(dim_date_id)=0
then 'date'
when grouping(year_week_name_cn)=0
then 'week'
when grouping(year_month)=0
then 'month'
when grouping(year_code)=0
then 'year'
end
as time_type,
year_code,
year_month,
month_code,
day_month_num,
dim_date_id,
year_week_name_cn,
case
when grouping(brand_id)=0
then 'brand'
when grouping(min_class_id)=0
then 'min_class'
when grouping(mid_class_id)=0
then 'mid_class'
when grouping(max_class_id)=0
then 'max_class'
when grouping(store_id)=0
then 'store'
when grouping(trade_area_id)=0
then 'trade_area'
when grouping(city_id)=0
then 'city'
when grouping(year_code)=0
then 'all'
end
as group_type_new,
dws.group_type as group_type_old,
province_id,
...,
sum(sale_amt) as sale_amt,
...,
'2021-11-01' as date_time
from hive1.yp_dws.dws_sale_daycount dws
left join hive1.yp_dwd.dim_date d
on dws.dt = d.dim_date_id
group by
grouping sets(
-- 年
(year_code, dws.group_type),
(year_code, city_id, city_name, province_id, province_name, dws.group_type),
(year_code, trade_area_id, trade_area_name, city_id, city_name, province_id, province_name, dws.group_type),
(year_code, store_id, store_name, trade_area_id, trade_area_name, city_id, city_name, province_id, province_name, dws.group_type),
(year_code, brand_id, brand_name, dws.group_type),
(year_code, max_class_id, max_class_name, dws.group_type),
(year_code, mid_class_id, mid_class_name, max_class_id, max_class_name, dws.group_type),
(year_code, min_class_id, min_class_name, mid_class_id, mid_class_name, max_class_id, max_class_name, dws.group_type),
-- 月
...,
-- 日
...,
-- 周 和月、年是多对多关系,不再是父子关系
...
)
)
select
...,
group_type_new as group_type,
...
from groupby
where group_type_new=group_type_old
;
商品相关
商品和用户纬度涉及到两个方面的数据导入:全量和增量,首次数据从dws到dm时使用全量,按照一定时间周期增长的数据采用增量的方式进行采集。
关键SQL
- 全量
insert into yp_dm.dm_sku
with all_count as (
...
),
month as (
...
)
select ac.*, '2021-08-31' date_time
from all_count ac
union all
select m.*, '2021-08-31' date_time
from month m;
- 增量
-- 合并新旧数据
-- 增量
insert into yp_dm.dm_sku_tmp
-- dm旧数据
with max_time as (
...
),
old as (
...
),
-- 昨天 的dws新数据
last_day as (
...
),
-- 总累积数值=旧数据+昨日新数据
all as (
...
),
-- 月统计值
last_month as (
...
)
-- 合并总累积和月度数据
select *, '2021-08-31' date_time from all
union all
select *, '2021-08-31' date_time from last_month;
用户相关与商品类似,这里不过多赘述。