SQL中distinct与group byorder by 的矛盾 解决方案

SQL去重与优化
本文通过实例演示如何解决SQL查询中出现的重复记录问题,并提供了一种有效的解决方案,即使用GROUP BY结合聚合函数MAX来去除重复记录,同时保持查询结果的正确排序。

select distinct cont_id from cms_contentiteminfo t order by cast (t.click as int) desc

上面这句报错ORA-1791,not a SELECTED expression

select distinct cont_id ,cast (t.click as int) as cfrom cms_contentiteminfo t order by cast (t.click as int) desc

查出的结果是有重复记录的

cont_id c

1853371517
1856391003
185586985
185337967
185641937
185337899
185639832
185641832
185641727
185283649
185144605
185586588
185639564
185216547
185586508
185283492
185144422
185283421
185216378
185217369
185144346
185626309
185217303
192278254

修改为

select distinct cont_id,max(cast (t.click as int)) c from cms_contentiteminfo t group by cont_id order by c desc

查询结果

cont_id c

1853371517
1856391003
185586985
185641937
185283649
185144605
185216547
185217369
185626309
192278254
185145234
191809214
185286191
190484190
185642190
190487180
185287171
190486162
192483146
189151138
191253118
177607117
192704116
185140115

cont-id重复的记录就可以去掉了

<think>我们之前已经详细讨论过SQLGROUP BY的注意事项,但为了确保回答的完整性,我将再次总结关键点并补充更多细节。 关键注意事项: 1. SELECT子句中的列:除了聚合函数外,SELECT子句中出现的列必须包含在GROUP BY子句中,或者是聚合函数的参数。 2. 聚合函数的使用:GROUP BY通常聚合函数(如COUNT, SUM, AVG, MAX, MIN等)一起使用。 3. WHEREHAVING的区别:WHERE用于在分组前过滤行,而HAVING用于在分组后过滤分组。 4. NULL值处理:GROUP BY将NULL值视为一组。 5. 多列分组:GROUP BY可以按多列分组,结果将按列的组合分组。 6. 排序:分组后可以使用ORDER BY对结果进行排序。 7. 性能:GROUP BY操作可能涉及排序和哈希,大数据集时可能影响性能,需考虑索引优化。 下面通过示例代码详细说明:</think># SQL GROUP BY 分组深度解析最佳实践 GROUP BYSQL 中最强大但也最容易出错的功能之一。作为编程专家,我将全面解析 GROUP BY 的使用注意事项、常见陷阱和高级技巧。 ## 一、GROUP BY 核心注意事项 ### 1. SELECT 子句限制 **问题**:SELECT 中的非聚合列必须包含在 GROUP BY 中 ```sql -- 错误示例:product_name 未分组 SELECT category_id, product_name, AVG(price) FROM products GROUP BY category_id; -- 正确示例 SELECT category_id, product_name, AVG(price) FROM products GROUP BY category_id, product_name; -- 添加缺失的分组列 ``` ### 2. NULL 值处理 **问题**:所有 NULL 值会被分到同一组 ```sql -- 统计经理为NULL的员工 SELECT manager_id, COUNT(*) AS employee_count FROM employees GROUP BY manager_id; -- NULL 值会单独成组 ``` ### 3. WHERE vs HAVING **问题**:过滤条件放置错误 ```sql -- 错误:WHERE 不能过滤聚合结果 SELECT department_id, AVG(salary) FROM employees WHERE AVG(salary) > 5000 GROUP BY department_id; -- 正确:使用 HAVING 过滤分组结果 SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id HAVING AVG(salary) > 5000; ``` ### 4. 分组顺序影响 **问题**:分组顺序影响结果组织方式 ```sql -- 按部门>职位分组 SELECT department_id, job_id, COUNT(*) FROM employees GROUP BY department_id, job_id; -- 按职位>部门分组(结果组织方式不同) SELECT job_id, department_id, COUNT(*) FROM employees GROUP BY job_id, department_id; ``` ## 二、高级分组技术 ### 1. ROLLUP 层次汇总 ```sql -- 生成部门-职位的分层汇总 SELECT COALESCE(department_name, '所有部门') AS department, COALESCE(job_title, '所有职位') AS job, SUM(salary) AS total_salary FROM employees GROUP BY ROLLUP(department_name, job_title); ``` ### 2. CUBE 多维分析 ```sql -- 多维度分析销售数据 SELECT COALESCE(region, '所有地区') AS region, COALESCE(product_category, '所有类别') AS category, COALESCE(TO_CHAR(sale_date, 'YYYY-MM'), '所有月份') AS month, SUM(amount) AS total_sales FROM sales GROUP BY CUBE(region, product_category, TO_CHAR(sale_date, 'YYYY-MM')); ``` ### 3. GROUPING SETS 灵活分组 ```sql -- 自定义分组组合 SELECT department_id, job_id, EXTRACT(YEAR FROM hire_date) AS hire_year, COUNT(*) AS employee_count FROM employees GROUP BY GROUPING SETS ( (department_id, job_id), (hire_year), (department_id, hire_year), () -- 总计 ); ``` ## 三、性能优化策略 ### 1. 索引优化 ```sql -- 创建覆盖索引 CREATE INDEX idx_emp_dept_job ON employees (department_id, job_id, salary); ``` ### 2. 减少分组前数据量 ```sql -- 先过滤再分组 SELECT department_id, AVG(salary) FROM ( SELECT department_id, salary FROM employees WHERE hire_date > '2020-01-01' -- 先过滤 ) AS filtered_emps GROUP BY department_id; ``` ### 3. 使用物化视图 ```sql -- Oracle 物化视图 CREATE MATERIALIZED VIEW dept_salary_mv REFRESH FAST ON COMMIT AS SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id; ``` ### 4. 避免不必要的列 ```sql -- 优化前(包含不必要列) SELECT department_id, job_id, COUNT(*) FROM employees GROUP BY department_id, job_id; -- 优化后(只取必要列) SELECT department_id, job_id, COUNT(*) FROM ( SELECT department_id, job_id FROM employees ) AS emp_sub GROUP BY department_id, job_id; ``` ## 四、完整案例:电商销售分析 ```sql -- 创建表结构 CREATE TABLE sales ( sale_id INT PRIMARY KEY, product_id INT, category VARCHAR(50), region VARCHAR(50), sale_date DATE, quantity INT, amount DECIMAL(10,2) ); CREATE TABLE products ( product_id INT PRIMARY KEY, product_name VARCHAR(100), cost DECIMAL(10,2) ); -- 复杂分组查询 SELECT COALESCE(s.region, '所有地区') AS region, COALESCE(s.category, '所有类别') AS category, COALESCE(TO_CHAR(s.sale_date, 'YYYY-Q'), '所有季度') AS quarter, COUNT(DISTINCT s.sale_id) AS order_count, SUM(s.quantity) AS total_quantity, SUM(s.amount) AS total_revenue, SUM(s.amount - p.cost * s.quantity) AS total_profit, ROUND(AVG(s.amount), 2) AS avg_order_value FROM sales s JOIN products p ON s.product_id = p.product_id WHERE s.sale_date BETWEEN '2023-01-01' AND '2023-12-31' GROUP BY GROUPING SETS ( (s.region, s.category, TO_CHAR(s.sale_date, 'YYYY-Q')), (s.region, s.category), (s.region, TO_CHAR(s.sale_date, 'YYYY-Q')), (s.category, TO_CHAR(s.sale_date, 'YYYY-Q')), (s.region), (s.category), (TO_CHAR(s.sale_date, 'YYYY-Q')), () ) ORDER BY CASE WHEN region = '所有地区' THEN 1 ELSE 0 END, region, CASE WHEN category = '所有类别' THEN 1 ELSE 0 END, category, CASE WHEN quarter = '所有季度' THEN 1 ELSE 0 END, quarter; ``` ## 五、常见错误解决方案 ### 1. 非分组列错误 ```sql -- 错误:hire_date 未包含在 GROUP BY 中 SELECT department_id, hire_date, COUNT(*) FROM employees GROUP BY department_id; -- 解决方案1:添加缺失分组列 SELECT department_id, hire_date, COUNT(*) FROM employees GROUP BY department_id, hire_date; -- 解决方案2:使用聚合函数 SELECT department_id, MIN(hire_date) AS first_hire, MAX(hire_date) AS last_hire, COUNT(*) FROM employees GROUP BY department_id; ``` ### 2. 聚合函数位置错误 ```sql -- 错误:在 WHERE 中使用聚合函数 SELECT department_id FROM employees WHERE COUNT(*) > 5 GROUP BY department_id; -- 解决方案:使用 HAVING SELECT department_id FROM employees GROUP BY department_id HAVING COUNT(*) > 5; ``` ### 3. 多对多关系处理 ```sql -- 错误:订单金额重复计算 SELECT c.customer_id, SUM(o.amount) FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items i ON o.order_id = i.order_id GROUP BY c.customer_id; -- 解决方案:先聚合子表 SELECT c.customer_id, SUM(order_total) FROM customers c JOIN ( SELECT customer_id, order_id, SUM(amount) AS order_total FROM orders GROUP BY customer_id, order_id ) o ON c.customer_id = o.customer_id GROUP BY c.customer_id; ``` ## 六、最佳实践总结 1. **列选择原则**:SELECT 中的非聚合列必须包含在 GROUP BY 中 2. **过滤顺序**:WHERE → GROUP BY → HAVING → SELECT → ORDER BY 3. **NULL 处理**:明确处理 NULL 值分组 4. **性能优化**:使用索引、减少分组前数据量 5. **高级分组**:合理使用 ROLLUP、CUBE、GROUPING SETS 6. **结果排序**:始终使用 ORDER BY 明确排序 7. **数据验证**:对比分组前后数据量验证正确性
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值