006-高级SQL查询与优化
学习目标
通过本章学习,您将能够:
- 掌握复杂SQL查询的编写技巧
- 理解子查询和连接查询的使用场景
- 学会使用窗口函数进行高级数据分析
- 掌握索引的设计和优化策略
- 理解查询执行计划和性能调优
- 学会使用存储过程和触发器
6.1 复杂查询技术
6.1.1 子查询
子查询是嵌套在其他SQL语句中的查询语句。
标量子查询:
-- 查找工资高于平均工资的员工
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
表子查询:
-- 查找每个部门工资最高的员工
SELECT e.name, e.department_id, e.salary
FROM employees e
WHERE (e.department_id, e.salary) IN (
SELECT department_id, MAX(salary)
FROM employees
GROUP BY department_id
);
相关子查询:
-- 查找每个部门工资排名前3的员工
SELECT e1.name, e1.department_id, e1.salary
FROM employees e1
WHERE 3 > (
SELECT COUNT(*)
FROM employees e2
WHERE e2.department_id = e1.department_id
AND e2.salary > e1.salary
);
6.1.2 公用表表达式(CTE)
-- 使用CTE计算递归层次结构
WITH RECURSIVE employee_hierarchy AS (
-- 基础查询:找到顶级管理者
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归查询:找到下级员工
SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;
6.2 窗口函数
窗口函数在结果集的分区上执行计算,不会改变结果集的行数。
6.2.1 排名函数
-- 员工工资排名
SELECT
name,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as row_num,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as rank_num,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as dense_rank_num
FROM employees;
6.2.2 聚合窗口函数
-- 累计销售额和移动平均
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) as cumulative_sum,
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3
FROM sales
ORDER BY sale_date;
6.2.3 偏移函数
-- 计算销售额环比增长
SELECT
sale_date,
amount,
LAG(amount, 1) OVER (ORDER BY sale_date) as prev_amount,
LEAD(amount, 1) OVER (ORDER BY sale_date) as next_amount,
amount - LAG(amount, 1) OVER (ORDER BY sale_date) as growth
FROM sales
ORDER BY sale_date;
6.3 索引设计与优化
6.3.1 索引类型
6.3.2 索引创建策略
-- 单列索引
CREATE INDEX idx_employee_name ON employees(name);
-- 复合索引
CREATE INDEX idx_employee_dept_salary ON employees(department_id, salary DESC);
-- 唯一索引
CREATE UNIQUE INDEX idx_employee_email ON employees(email);
-- 部分索引
CREATE INDEX idx_active_employees ON employees(name) WHERE status = 'active';
-- 函数索引
CREATE INDEX idx_employee_upper_name ON employees(UPPER(name));
6.3.3 索引使用原则
- 最左前缀原则:复合索引按最左列开始匹配
- 选择性原则:选择性高的列适合建索引
- 覆盖索引:索引包含查询所需的所有列
-- 利用覆盖索引
CREATE INDEX idx_employee_cover ON employees(department_id, name, salary);
-- 以下查询可以完全使用索引
SELECT name, salary
FROM employees
WHERE department_id = 10;
6.4 查询执行计划
6.4.1 执行计划分析
-- 查看执行计划
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 50000;
6.4.2 常见执行计划操作
graph TD
A[执行计划操作] --> B[Seq Scan 顺序扫描]
A --> C[Index Scan 索引扫描]
A --> D[Index Only Scan 仅索引扫描]
A --> E[Nested Loop 嵌套循环]
A --> F[Hash Join 哈希连接]
A --> G[Merge Join 归并连接]
A --> H[Sort 排序]
A --> I[Hash 哈希]
B --> B1[全表扫描,成本高]
C --> C1[通过索引访问数据]
D --> D1[仅扫描索引,不访问表]
E --> E1[小表驱动大表]
F --> F1[构建哈希表进行连接]
G --> G1[两个有序集合归并]
6.5 查询优化技术
6.5.1 SQL重写优化
-- 优化前:使用NOT IN
SELECT * FROM employees
WHERE department_id NOT IN (SELECT department_id FROM departments WHERE budget < 100000);
-- 优化后:使用NOT EXISTS
SELECT * FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM departments d
WHERE d.department_id = e.department_id AND d.budget < 100000
);
6.5.2 分页查询优化
-- 传统分页(大偏移量时性能差)
SELECT * FROM employees ORDER BY employee_id LIMIT 10000, 20;
-- 优化后的分页(使用游标)
SELECT * FROM employees
WHERE employee_id > 10000
ORDER BY employee_id
LIMIT 20;
6.5.3 统计信息更新
-- 更新表统计信息
ANALYZE employees;
-- 查看表统计信息
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
last_analyze
FROM pg_stat_user_tables
WHERE tablename = 'employees';
6.6 存储过程和函数
6.6.1 存储过程
-- 创建存储过程
CREATE OR REPLACE PROCEDURE update_employee_salary(
p_employee_id INTEGER,
p_salary_increase DECIMAL(10,2)
)
LANGUAGE plpgsql
AS $$
BEGIN
-- 更新员工工资
UPDATE employees
SET salary = salary + p_salary_increase,
updated_at = CURRENT_TIMESTAMP
WHERE employee_id = p_employee_id;
-- 记录操作日志
INSERT INTO salary_history (employee_id, old_salary, new_salary, change_date)
SELECT
p_employee_id,
salary - p_salary_increase,
salary,
CURRENT_TIMESTAMP
FROM employees
WHERE employee_id = p_employee_id;
COMMIT;
END;
$$;
-- 调用存储过程
CALL update_employee_salary(1001, 5000.00);
6.6.2 用户定义函数
-- 创建函数计算员工年龄
CREATE OR REPLACE FUNCTION calculate_age(birth_date DATE)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
BEGIN
RETURN EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date));
END;
$$;
-- 使用函数
SELECT name, birth_date, calculate_age(birth_date) as age
FROM employees;
6.7 触发器
6.7.1 创建触发器
-- 创建触发器函数
CREATE OR REPLACE FUNCTION audit_employee_changes()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO employee_audit (operation, employee_id, old_data, new_data, change_time)
VALUES ('INSERT', NEW.employee_id, NULL, row_to_json(NEW), CURRENT_TIMESTAMP);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO employee_audit (operation, employee_id, old_data, new_data, change_time)
VALUES ('UPDATE', NEW.employee_id, row_to_json(OLD), row_to_json(NEW), CURRENT_TIMESTAMP);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO employee_audit (operation, employee_id, old_data, new_data, change_time)
VALUES ('DELETE', OLD.employee_id, row_to_json(OLD), NULL, CURRENT_TIMESTAMP);
RETURN OLD;
END IF;
RETURN NULL;
END;
$$;
-- 创建触发器
CREATE TRIGGER employee_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW
EXECUTE FUNCTION audit_employee_changes();
6.8 实践练习
练习1:复杂查询编写
编写SQL查询,找出每个部门中工资排名前2的员工,并显示他们与部门平均工资的差值。
-- 参考答案
WITH dept_avg AS (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
),
ranked_employees AS (
SELECT
employee_id,
name,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rn
FROM employees
)
SELECT
re.name,
re.department_id,
re.salary,
da.avg_salary,
re.salary - da.avg_salary as salary_diff
FROM ranked_employees re
JOIN dept_avg da ON re.department_id = da.department_id
WHERE re.rn <= 2
ORDER BY re.department_id, re.rn;
练习2:索引优化
为以下查询设计最优的索引策略:
SELECT e.name, e.salary, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.hire_date >= '2020-01-01'
AND e.status = 'active'
ORDER BY e.salary DESC;
练习3:性能调优
分析并优化一个慢查询,包括执行计划分析和索引建议。
6.9 常见问题与解答
Q1:什么时候使用子查询,什么时候使用连接?
A1:一般情况下,连接的性能优于子查询。但在以下情况下可以考虑子查询:
- 逻辑更清晰时
- 需要使用EXISTS/NOT EXISTS时
- 子查询结果集很小时
Q2:如何选择合适的索引?
A2:索引选择原则:
- WHERE条件中的列
- JOIN条件中的列
- ORDER BY中的列
- 选择性高的列
- 避免过多索引影响写性能
Q3:窗口函数与GROUP BY的区别?
A3:主要区别:
- GROUP BY会减少结果集行数
- 窗口函数保持原有行数
- 窗口函数可以访问分区内的其他行
6.10 本章小结
本章介绍了高级SQL查询与优化技术,包括:
- 复杂查询技术:子查询、CTE、递归查询
- 窗口函数:排名、聚合、偏移函数的使用
- 索引优化:索引类型、创建策略、使用原则
- 查询优化:执行计划分析、SQL重写、性能调优
- 存储过程和函数:业务逻辑封装和代码复用
- 触发器:数据变更的自动响应机制
掌握这些高级技术对于开发高性能的数据库应用至关重要。
参考资料
- PostgreSQL官方文档 - 查询规划器
- MySQL性能调优指南
- SQL反模式 - Bill Karwin
- 高性能MySQL - Baron Schwartz
- 数据库查询优化器的艺术 - 李海翔
本章内容涵盖了高级SQL查询与优化的核心概念,为深入理解数据库性能调优奠定基础。
1296

被折叠的 条评论
为什么被折叠?



