环境信息:
hive1.2.1
spark1.6.1
hadoop2.6.0-cdh5.4.2
memory:1918752, vCores:506
表结构:
表名称 | 表容量 | 主键 | hive存储类型 |
---|---|---|---|
temp_01_pc_order | 5G | PC_ORDER_ID | RCFile |
TST_ORDER_RISK | 9.4G | 非 PC_ORDER_ID | RCFile |
TST_ORDER_VEHICLE | 36G | PC_ORDER_VEHICLE_ID | RCFile |
TST_ORDER_ASSIST | 800M | 非ORDER_ID | RCFile |
TST_PC_ORDER | 90G | PC_ORDER_ID | RCFile |
原hive sql:
(为求简便,所有字段以XX代替,最终有90个字段)
INSERT OVERWRITE TABLE H_DW_ORDER
SELECT A.PC_ORDER_ID, A.XX90
FROM TST_PC_ORDER A
LEFT JOIN (SELECT PC_ORDER_ID, XX
FROM (SELECT PC_ORDER_ID
,XX
,ROW_NUMBER() OVER(PARTITION BY PC_ORDER_ID ORDER BY UPDATED_DATE DESC) AS ROWNO
FROM TST_ORDER_RISK) K
WHERE ROWNO = 1) B ON A.PC_ORDER_ID = B.PC_ORDER_ID
LEFT JOIN TST_ORDER_VEHICLE C ON B.CLIENT_ID = C.PC_ORDER_VEHICLE_ID
LEFT JOIN (SELECT T.* FROM TEMP_01_PC_ORDER T WHERE FLAG_L = 1) D ON A.PC_ORDER_ID =
D.PC_ORDER_ID
LEFT JOIN (SELECT T1.* FROM TEMP_01_PC_ORDER T1 WHERE FLAG_F = 1) F ON A.PC_ORDER_ID =
F.PC_ORDER_ID
LEFT JOIN (SELECT H.ORDER_ID, XX
FROM (SELECT E.XX
,E.ORDER_ID
,ROW_NUMBER() OVER(PARTITION BY