问:hive explain在生产实践中能帮助我们做什么?
1、join语句会过滤null值吗?——可以!
执行如下语句:
explain select a.loan_typ from ods.ods_s01_orc_lc_appl_i a join ods.ods_s01_orc_p_loan_typ_s b on a.loan_typ =b.typ_cde;
查看运行结果(只截取相关部分):
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
$hdt$_1:b
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
$hdt$_1:b
TableScan
alias: b
Statistics: Num rows: 39064 Data size: 354481431 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: typ_cde is not null (type: boolean)
Statistics: Num rows: 39064 Data size: 354481431 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: typ_cde (type: string)
outputColumnNames: _col0
Statistics: Num rows: 39064 Data size: 354481431 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
可以看到predicate: typ_cde is not null (type: boolean)记录,说明join时自动过滤了关联字段为null的记录。
实验left join是不会过滤的。
2、group by分组语句会排序吗?——会!
explain select a.loan_typ,count(1) from ods.ods_s01_orc_lc_appl_i a group by a.loan_typ;
如上语句,查看explain执行结果:
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 39128923 Data size: 324039568673 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: loan_typ (type: string)
outputColumnNames: _col0
Statistics: Num rows: 39128923 Data size: 324039568673 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count(1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 39128923 Data size: 324039568673 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 39128923 Data size: 324039568673 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
可以看到Group By Operator操作里,有keys: _col0 (type: string),说明是按照loan_typ 进行分组的,下面还有sort order: +,说明是按照loan_typ 字段进行正序排序的。
3、哪条sql执行效率高呢?
对比如下两条sql:
select a.loan_typ,count(1)
from