1. 去重技巧—用group by替换distinct
- 取出user_trade中的全部支付用户
## 原有写法
SELECT distinct user_name
FROM user_trade
WHERE dt>'0';
## 优化写法
SELECT user_name
FROM user_trade
WHERE dt>'0'
GROUP BY user_name;
- 在2019年购买又退款的用户
## 原有写法
SELECT a.user_name
FROM
(SELECT distinct user_name
FROM user_trade
WHERE year(dt)=2019) a
JOIN
(SELECT distinct user_name
FROM user_refund
WHERE year(dt)=2019) b ON a.user_name=b.user_name;
## 优化写法
SELECT a.user_name
FROM
(SELECT user_name
FROM user_trade
WHERE year(dt)=2019
GROUP BY user_name) a
JOIN
(SELECT user_name
FROM user_refund
WHERE year(dt)=2019
GROUP BY user_name) b ON a.user_name=b.user_name;
- 在极大的数据量(且很多重复值)时,可以先group by去重,再count()计数,效率高于直接count(distinct **)
2. 聚合技巧–利用窗口函数grouping sets、cube、rollup
- 性别、城市和等级用户分布
SELECT sex,
city,
level,
count(user_id)
FROM user_info
GROUP BY sex,city,level
GROUPING SETS (sex,city,level);
-
GROUPING SETS() 在group by查询中,根据不同的维度进行聚合,等价于将不同维度的group by结果集进行union all。聚合规则在括号中进行指定。
-
性别、性别&城市的用户分布
SELECT sex,
city,
count(user_id)
FROM user_info
GROUP BY