📢 本文是《MySQL从入门到精通》系列的第五篇,将带你深入学习MySQL中的复杂查询技术,包括多表联结、子查询、集合操作和高级分组等,掌握这些技术将显著提升你的数据查询能力。
📋 文章目录
多表联结(JOIN)详解
关系型数据库的核心优势之一就是能够通过表之间的关系来组合数据。JOIN操作让我们能够从多个表中检索相关数据。
JOIN的类型和语法
MySQL支持多种类型的JOIN操作:
1. 内联结(INNER JOIN)
返回两个表中匹配行的结果。
-- 基本语法
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;
-- 示例:查询订单及对应的客户信息
SELECT
o.order_id,
o.order_date,
c.customer_name,
c.email
FROM
orders o
INNER JOIN
customers c ON o.customer_id = c.customer_id;
2. 左外联结(LEFT JOIN)
返回左表中的所有行,以及右表中的匹配行。如果右表中没有匹配,右表的列显示为NULL。
-- 基本语法
SELECT columns
FROM table1
LEFT JOIN table2 ON table1.column = table2.column;
-- 示例:查询所有客户及其订单(包括没有订单的客户)
SELECT
c.customer_id,
c.customer_name,
o.order_id,
o.order_date
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id;
3. 右外联结(RIGHT JOIN)
返回右表中的所有行,以及左表中的匹配行。如果左表中没有匹配,左表的列显示为NULL。
-- 基本语法
SELECT columns
FROM table1
RIGHT JOIN table2 ON table1.column = table2.column;
-- 示例:查询所有产品及其订单(包括未被订购的产品)
SELECT
p.product_id,
p.product_name,
oi.order_id,
oi.quantity
FROM
order_items oi
RIGHT JOIN
products p ON oi.product_id = p.product_id;
4. 全外联结(FULL JOIN)
返回两个表中的所有行。MySQL不直接支持FULL JOIN,但可以通过UNION结合LEFT JOIN和RIGHT JOIN来模拟。
-- 模拟FULL JOIN
SELECT
c.customer_id,
c.customer_name,
o.order_id
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id
UNION
SELECT
c.customer_id,
c.customer_name,
o.order_id
FROM
customers c
RIGHT JOIN
orders o ON c.customer_id = o.customer_id
WHERE
c.customer_id IS NULL;
5. 自联结(SELF JOIN)
同一个表与自身进行联结,通常用于处理层次结构数据。
-- 示例:查询员工及其直接经理
SELECT
e.employee_id,
e.employee_name,
m.employee_name AS manager_name
FROM
employees e
LEFT JOIN
employees m ON e.manager_id = m.employee_id;
多表联结
可以联结两个以上的表来获取更复杂的数据关系。
-- 示例:查询订单、客户和销售人员信息
SELECT
o.order_id,
o.order_date,
c.customer_name,
s.salesperson_name
FROM
orders o
JOIN
customers c ON o.customer_id = c.customer_id
JOIN
salespeople s ON o.salesperson_id = s.salesperson_id;
-- 包含订单明细的多表查询
SELECT
o.order_id,
c.customer_name,
p.product_name,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) AS item_total
FROM
orders o
JOIN
customers c ON o.customer_id = c.customer_id
JOIN
order_items oi ON o.order_id = oi.order_id
JOIN
products p ON oi.product_id = p.product_id
ORDER BY
o.order_id, p.product_name;
JOIN的性能优化
- 确保JOIN条件列有索引:这是最重要的优化手段
- 合理排序表的顺序:从小表到大表联结通常更高效
- 只选择必要的列:减少数据传输量
- 使用适当的JOIN类型:不要过度使用LEFT/RIGHT JOIN
- 控制结果集大小:使用WHERE和LIMIT限制结果数量
子查询和嵌套查询
子查询是嵌套在另一个查询中的SELECT语句,可用于构建更复杂的查询逻辑。
子查询的类型
1. 标量子查询
返回单个值的子查询,可用在需要单个值的表达式中。
-- 示例:查询价格高于平均价格的产品
SELECT
product_id,
product_name,
price
FROM
products
WHERE
price > (SELECT AVG(price) FROM products);
-- 在SELECT列表中使用标量子查询
SELECT
product_name,
price,
(SELECT AVG(price) FROM products) AS avg_price,
price - (SELECT AVG(price) FROM products) AS price_diff
FROM
products;
2. 行子查询
返回单行多列的子查询,可用于比较多个列的值。
-- 示例:查找与特定产品价格和类别相同的其他产品
SELECT
product_id,
product_name
FROM
products
WHERE
(category_id, price) = (
SELECT category_id, price
FROM products
WHERE product_id = 10
)
AND
product_id <> 10;
3. 列子查询
返回单列多行的子查询,通常与IN、ANY、ALL等运算符一起使用。
-- 示例:查询有订单的客户
SELECT
customer_id,
customer_name
FROM
customers
WHERE
customer_id IN (SELECT DISTINCT customer_id FROM orders);
-- 使用NOT IN查询没有订单的客户
SELECT
customer_id,
customer_name
FROM
customers
WHERE
customer_id NOT IN (SELECT DISTINCT customer_id FROM orders);
4. 表子查询
返回多行多列的子查询,可用在FROM子句中作为派生表。
-- 示例:从派生表中查询数据
SELECT
category_name,
avg_price
FROM
(
SELECT
c.category_name,
AVG(p.price) AS avg_price
FROM
products p
JOIN
categories c ON p.category_id = c.category_id
GROUP BY
c.category_name
) AS category_stats
WHERE
avg_price > 100;
子查询运算符
1. IN 和 NOT IN
检查值是否在子查询结果集中。
-- 示例:查询销售额前3名产品的订单
SELECT
order_id,
customer_id
FROM
orders
WHERE
order_id IN (
SELECT order_id
FROM order_items
WHERE product_id IN (
SELECT product_id
FROM products
ORDER BY sales DESC
LIMIT 3
)
);
2. EXISTS 和 NOT EXISTS
检查子查询是否返回任何行。比IN更高效,尤其是对大型数据集。
-- 示例:查询过去30天内有订单的客户
SELECT
customer_id,
customer_name
FROM
customers c
WHERE
EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date > DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
);
3. ANY/SOME 和 ALL
与比较运算符一起使用,分别检查条件是否对子查询结果集中的任意行或所有行成立。
-- 使用ANY:查询价格高于任何经济型产品的豪华产品
SELECT
product_name,
price
FROM
products
WHERE
category = 'luxury'
AND price > ANY (
SELECT price
FROM products
WHERE category = 'economy'
);
-- 使用ALL:查询价格高于所有经济型产品的豪华产品
SELECT
product_name,
price
FROM
products
WHERE
category = 'luxury'
AND price > ALL (
SELECT price
FROM products
WHERE category = 'economy'
);
相关子查询
相关子查询是引用外部查询表的子查询,每行外部查询都会执行一次子查询。
-- 示例:查询每个类别中价格高于该类别平均价格的产品
SELECT
p1.product_id,
p1.product_name,
p1.category_id,
p1.price
FROM
products p1
WHERE
p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p1.category_id
);
子查询优化技巧
- 尽可能使用JOIN代替子查询:在许多情况下,JOIN的性能更好
- 优先使用EXISTS而非IN:对大型数据集更高效
- 避免在子查询中使用相关子查询:可能导致执行效率低下
- 使用索引优化子查询性能:确保子查询中使用的列有合适的索引
- 考虑使用临时表:有时将复杂子查询的结果存储在临时表中更高效
集合操作(UNION/INTERSECT)
集合操作允许合并多个查询的结果集。
UNION 和 UNION ALL
UNION合并两个或多个SELECT语句的结果,并删除重复行。UNION ALL保留所有行,包括重复行。
-- UNION示例:合并活跃客户和潜在客户列表
SELECT
customer_id,
customer_name,
email,
'Active' AS status
FROM
customers
WHERE
status = 'active'
UNION
SELECT
lead_id AS customer_id,
lead_name AS customer_name,
email,
'Lead' AS status
FROM
leads
WHERE
quality_score > 80;
-- UNION ALL示例:查询所有交易记录(收入和支出)
SELECT
transaction_date,
amount,
'Income' AS type
FROM
income_transactions
UNION ALL
SELECT
transaction_date,
amount,
'Expense' AS type
FROM
expense_transactions
ORDER BY
transaction_date;
模拟INTERSECT
MySQL不直接支持INTERSECT,但可以使用IN或EXISTS模拟。INTERSECT返回两个查询结果中都存在的行。
-- 使用IN模拟INTERSECT:查询同时是客户和供应商的公司
SELECT
company_name,
contact_email
FROM
customers
WHERE
company_name IN (
SELECT company_name
FROM suppliers
);
-- 使用EXISTS模拟INTERSECT
SELECT
c.company_name,
c.contact_email
FROM
customers c
WHERE
EXISTS (
SELECT 1
FROM suppliers s
WHERE s.company_name = c.company_name
);
模拟EXCEPT/MINUS
MySQL同样不直接支持EXCEPT/MINUS,但可以使用NOT IN或NOT EXISTS模拟。EXCEPT返回第一个查询中存在但第二个查询中不存在的行。
-- 使用NOT IN模拟EXCEPT:查询有订单的客户
SELECT
customer_id,
customer_name
FROM
customers
WHERE
customer_id NOT IN (
SELECT DISTINCT customer_id
FROM orders
);
-- 使用NOT EXISTS模拟EXCEPT
SELECT
c.customer_id,
c.customer_name
FROM
customers c
WHERE
NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
高级分组和聚合
GROUP BY子句用于对数据进行分组,通常与聚合函数一起使用。
基本分组和聚合
-- 基本分组查询:按类别统计产品数量和平均价格
SELECT
category_id,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id;
-- 多列分组:按类别和供应商统计
SELECT
category_id,
supplier_id,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id, supplier_id;
HAVING子句
HAVING用于过滤分组后的结果,类似于WHERE过滤行,但适用于分组和聚合后的结果。
-- 示例:查找平均价格超过100的类别
SELECT
category_id,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id
HAVING
AVG(price) > 100;
-- 组合WHERE和HAVING:查找有超过5个激活产品且平均价格超过50的类别
SELECT
category_id,
COUNT(*) AS active_products,
AVG(price) AS avg_price
FROM
products
WHERE
status = 'active'
GROUP BY
category_id
HAVING
COUNT(*) > 5 AND AVG(price) > 50;
GROUP_CONCAT函数
GROUP_CONCAT将组内的值连接成一个字符串,非常适合生成逗号分隔列表。
-- 示例:每个类别的产品名称列表
SELECT
category_id,
GROUP_CONCAT(product_name) AS products
FROM
products
GROUP BY
category_id;
-- 自定义分隔符和排序
SELECT
category_id,
GROUP_CONCAT(
product_name
ORDER BY price DESC
SEPARATOR ' | '
) AS products
FROM
products
GROUP BY
category_id;
WITH ROLLUP
WITH ROLLUP修饰符生成额外的汇总行,提供分层汇总信息。
-- 示例:按类别和供应商统计产品,带汇总行
SELECT
IFNULL(category_id, 'All Categories') AS category,
IFNULL(supplier_id, 'All Suppliers') AS supplier,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM
products
GROUP BY
category_id, supplier_id
WITH ROLLUP;
窗口函数(MySQL 8.0+)
MySQL 8.0引入了窗口函数(也称为分析函数),它们对结果集的行执行计算,类似于聚合函数,但不会减少结果集的行数。
窗口函数语法
function_name() OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
[frame_clause]
)
其中:
PARTITION BY
:定义窗口分区(类似于GROUP BY)ORDER BY
:定义窗口内行的排序frame_clause
:定义当前行的窗口框架(范围)
排名函数
-- ROW_NUMBER():为每行分配唯一的序号
SELECT
product_id,
category_id,
product_name,
price,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM
products;
-- RANK():相同值获得相同排名,但会有间隔
SELECT
product_id,
category_id,
product_name,
price,
RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM
products;
-- DENSE_RANK():相同值获得相同排名,没有间隔
SELECT
product_id,
category_id,
product_name,
price,
DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM
products;
分析函数
-- LAG():访问当前行之前的行
SELECT
product_id,
product_name,
price,
LAG(price) OVER (ORDER BY product_id) AS prev_price,
price - LAG(price) OVER (ORDER BY product_id) AS price_diff
FROM
products;
-- LEAD():访问当前行之后的行
SELECT
product_id,
product_name,
price,
LEAD(price) OVER (ORDER BY product_id) AS next_price,
LEAD(price) OVER (ORDER BY product_id) - price AS price_diff
FROM
products;
-- FIRST_VALUE():返回窗口第一行的值
SELECT
product_id,
category_id,
product_name,
price,
FIRST_VALUE(price) OVER (
PARTITION BY category_id
ORDER BY price
) AS lowest_price_in_category
FROM
products;
聚合窗口函数
-- 计算累计总和
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM
orders;
-- 计算移动平均
SELECT
order_date,
amount,
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day
FROM
orders;
-- 按类别计算产品价格百分比
SELECT
product_id,
category_id,
product_name,
price,
price / SUM(price) OVER (PARTITION BY category_id) * 100 AS pct_of_category
FROM
products;
窗口函数应用场景
- 排名与分组:识别每组中的前N条记录
- 同比环比分析:与前期数据比较
- 累计统计:计算累计和、累计平均等
- 移动统计:计算移动平均、移动总和等
- 前后对比:当前值与前后值的比较
- 百分位数计算:计算数据的分布
常见复杂查询模式
动态透视表(PIVOT)
MySQL没有内置PIVOT功能,但可以使用CASE表达式和聚合函数模拟。
-- 示例:将产品类别从行转为列
SELECT
supplier_id,
SUM(CASE WHEN category_id = 1 THEN 1 ELSE 0 END) AS category_1_count,
SUM(CASE WHEN category_id = 2 THEN 1 ELSE 0 END) AS category_2_count,
SUM(CASE WHEN category_id = 3 THEN 1 ELSE 0 END) AS category_3_count
FROM
products
GROUP BY
supplier_id;
-- 按季度汇总销售额
SELECT
YEAR(order_date) AS year,
SUM(CASE WHEN QUARTER(order_date) = 1 THEN amount ELSE 0 END) AS Q1,
SUM(CASE WHEN QUARTER(order_date) = 2 THEN amount ELSE 0 END) AS Q2,
SUM(CASE WHEN QUARTER(order_date) = 3 THEN amount ELSE 0 END) AS Q3,
SUM(CASE WHEN QUARTER(order_date) = 4 THEN amount ELSE 0 END) AS Q4,
SUM(amount) AS yearly_total
FROM
orders
GROUP BY
YEAR(order_date);
查找缺失值
查找序列中缺失的值是一个常见需求。
-- 示例:查找缺失的订单编号
WITH RECURSIVE seq AS (
SELECT MIN(order_id) AS value
FROM orders
UNION ALL
SELECT value + 1
FROM seq
WHERE value < (SELECT MAX(order_id) FROM orders)
)
SELECT s.value AS missing_order_id
FROM seq s
LEFT JOIN orders o ON s.value = o.order_id
WHERE o.order_id IS NULL;
最新记录查询
每组中查找最新的记录是另一个常见需求。
-- 方法1:使用子查询
SELECT p.*
FROM products p
JOIN (
SELECT product_code, MAX(version) AS latest_version
FROM products
GROUP BY product_code
) latest ON p.product_code = latest.product_code AND p.version = latest.latest_version;
-- 方法2:使用窗口函数(MySQL 8.0+)
WITH latest_products AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY product_code ORDER BY version DESC) AS rn
FROM products
)
SELECT * FROM latest_products WHERE rn = 1;
累计分布分析
分析数据的累计分布可以帮助了解数据分布情况。
-- 示例:计算价格的累计分布
WITH price_stats AS (
SELECT
price,
COUNT(*) AS product_count,
(SELECT COUNT(*) FROM products) AS total_products
FROM
products
GROUP BY
price
)
SELECT
price,
product_count,
SUM(product_count) OVER (ORDER BY price) AS cum_count,
ROUND(SUM(product_count) OVER (ORDER BY price) / total_products * 100, 2) AS cum_percentage
FROM
price_stats
ORDER BY
price;
总结与预告
本文详细介绍了MySQL中的复杂查询技术,包括多表联结、子查询、集合操作、高级分组和窗口函数等。掌握这些技术将大大提升你分析和处理数据的能力,让你能够从数据库中提取更有价值的信息。
在下一篇文章中,我们将深入探讨索引优化,包括:
- 索引基本原理与类型
- 索引设计最佳实践
- 执行计划分析
- 常见索引问题与解决方案
- 索引维护策略
敬请期待!
🚀 关注我,不错过更多MySQL干货!
🔔 点赞、收藏,支持创作!
💬 评论区留下你的问题,我们一起讨论!