Based on Hive 0.14
- In a left join, the left table is filtered by a predicate on the join column
Statement that triggers the bug:
explain
select t.cookie, t.datetime, c.cookie
from log t
left join cookie c
on t.cookie = c.cookie
where t.dt = '2015_01_01_10'
and t.cookie = 'xxxxx'
Cause:
Inspecting the execution plan:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t
            Statistics: Num rows: 782692 Data size: 97053857 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (cookie = 'xxxxx') (type: boolean)
              Statistics: Num rows: 391346 Data size: 48526928 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: 'xxxxx' (type: string)
                sort order: +
                Statistics: Num rows: 391346 Data size: 48526928 Basic stats: COMPLETE Column stats: NONE
                value expressions: datetime (type: string)
          TableScan
            alias: c
            Statistics: Num rows: 26938739 Data size: 1728178442 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (cookie = 'xxxxx') (type: boolean)
              Statistics: Num rows: 13469369 Data size: 864089188 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: cookie (type: string)
                sort order: +
                Map-reduce partition columns: cookie (type: string)
                Statistics: Num rows: 13469369 Data size: 864089188 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
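In the plan above, the map-side Reduce Output Operator for table t has no "Map-reduce partition columns" entry, and its key expression is the constant 'xxxxx' rather than the cookie column. The operator for table c still partitions on cookie. As a result, rows of t are not guaranteed to be sent to the same reducer as the rows of c with the same cookie.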
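Tests:
1. Set the number of reducers to 1:
The join result is correct.
2. Rewrite the SQL, changing t.cookie = 'xxxxx' to an IN predicate:
The execution plan shows that the map-side Reduce Output Operator for table t has its partition columns again, and the query runs correctly.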
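The two tests above could be reproduced with statements along the following lines. This is only a sketch: the note does not show the exact commands that were run, so the use of the mapred.reduce.tasks property and the single-value form of the IN list are assumptions.

```sql
-- Test 1: force a single reducer, then re-run the original query.
-- Assumption: the reducer count was set through mapred.reduce.tasks
-- (newer Hadoop releases name this property mapreduce.job.reduces).
set mapred.reduce.tasks=1;

select t.cookie, t.datetime, c.cookie
from log t
left join cookie c
  on t.cookie = c.cookie
where t.dt = '2015_01_01_10'
  and t.cookie = 'xxxxx';

-- Restore the default (-1 lets Hive choose the reducer count) before test 2.
set mapred.reduce.tasks=-1;

-- Test 2: replace the equality predicate on the join column with IN,
-- then check the plan for "Map-reduce partition columns" on alias t.
-- Assumption: a one-element IN list; the note does not show the exact rewrite.
explain
select t.cookie, t.datetime, c.cookie
from log t
left join cookie c
  on t.cookie = c.cookie
where t.dt = '2015_01_01_10'
  and t.cookie in ('xxxxx');
```

With a single reducer every row reaches the same reduce task, so the missing partition columns cannot separate matching rows. That is consistent with test 1 producing a correct join.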