While running some statistics in Hive today, I ran into a problem: two SQL statements similar to the following returned different results:
select count(*) from tbl_simpledata where column = '-1';
select count(column) from tbl_simpledata where column = '-1';
Checking the Hive reference documentation turned up the following:
count(*) - Returns the total number of retrieved rows, including rows containing NULL values;
count(expr) - Returns the number of rows for which the supplied expression is non-NULL;
count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
In other words, count(*) includes rows that contain NULL values, while count(expr) skips every row for which the expression is NULL.
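To see the three behaviours side by side without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module. It is only a stand-in: it assumes SQLite follows the same standard SQL NULL handling for count as the Hive documentation quoted above, and it has not been run against Hive. The column name `col`, the `id` column, the helper `count_variants` and the sample rows are all made up for illustration.

```python
import sqlite3


def count_variants(rows, where=""):
    """Return (count(*), count(col), count(DISTINCT col)) for the given rows.

    `rows` is a list of (id, col) tuples; `where` is an optional, trusted
    WHERE clause appended to the query (hypothetical helper, for demo only).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table layout; only the table name comes from the post.
        conn.execute("CREATE TABLE tbl_simpledata (id INTEGER, col TEXT)")
        conn.executemany("INSERT INTO tbl_simpledata VALUES (?, ?)", rows)
        query = (
            "SELECT count(*), count(col), count(DISTINCT col) "
            "FROM tbl_simpledata " + where
        )
        return conn.execute(query).fetchone()
    finally:
        conn.close()


# Made-up sample data: two '-1' values, one NULL, one '0'.
sample_rows = [(1, "-1"), (2, "-1"), (3, None), (4, "0")]

# No filter: count(*) sees the NULL row, count(col) does not.
print(count_variants(sample_rows))                      # (4, 3, 2)

# Filter on the same column: the NULL row never passes the predicate.
print(count_variants(sample_rows, "WHERE col = '-1'"))  # (2, 2, 1)
```

Note the second call: a predicate such as `col = '-1'` is never true for a NULL value, so in this sketch the NULL row is already dropped by the WHERE clause and both counts agree. The gap between count(*) and count(col) only appears when the counted column is not the one being filtered, or when there is no filter at all.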