文章目录
场景
源MySQL数据导入HIVE数据仓库ODS层后,统计主键行数、主键NULL行数、主键重复行数
统计HIVE表主键行数、主键NULL行数、主键重复行数的SQL
-- 统计HIVE表源MySQL主键行数、主键NULL行数、主键重复行数
WITH
w AS (
SELECT
`id`
,ROW_NUMBER()OVER(PARTITION BY `id` ORDER BY `id`)r
FROM `{hive_tb}`
WHERE ymd="{ymd}"
)
SELECT
COUNT(1) -- 行数
,COUNT(if(`id` IS NOT NULL AND `r`==1,1,NULL)) -- 非空主键去重后行数
,COUNT(if(`id` IS NULL,1,NULL)) -- 主键NULL行数
,COUNT(if(`id` IS NOT NULL AND `r`>1,1,NULL)) -- 主键重复行数
FROM w;
```### 联合主键的情况
`i1`和`i2`是联合主键
```sql
WITH
w AS (
SELECT
concat(`i1`,`i2`) AS `pk`
,ROW_NUMBER()OVER(PARTITION BY `i1`,`i2` ORDER BY `i1`,`i2`)r
FROM `{hive_tb}`
WHERE ymd="{ymd}"
)
SELECT
COUNT(1) AS target_rows -- 行数
,COUNT(if(`pk` IS NOT NULL AND `r`==1,1,NULL)) AS target_distinct_

最低0.47元/天 解锁文章
5319

被折叠的 条评论
为什么被折叠?



