MySQL从入门到精通(五):复杂查询详解

📢 本文是《MySQL从入门到精通》系列的第五篇,将带你深入学习MySQL中的复杂查询技术,包括多表联结、子查询、集合操作和高级分组等,掌握这些技术将显著提升你的数据查询能力。

📋 文章目录

多表联结(JOIN)详解

关系型数据库的核心优势之一就是能够通过表之间的关系来组合数据。JOIN操作让我们能够从多个表中检索相关数据。

JOIN的类型和语法

MySQL支持多种类型的JOIN操作:

1. 内联结(INNER JOIN)

返回两个表中匹配行的结果。

-- 基本语法
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;

-- 示例:查询订单及对应的客户信息
SELECT 
    o.order_id,
    o.order_date,
    c.customer_name,
    c.email
FROM 
    orders o
INNER JOIN 
    customers c ON o.customer_id = c.customer_id;
2. 左外联结(LEFT JOIN)

返回左表中的所有行,以及右表中的匹配行。如果右表中没有匹配,右表的列显示为NULL。

-- 基本语法
SELECT columns
FROM table1
LEFT JOIN table2 ON table1.column = table2.column;

-- 示例:查询所有客户及其订单(包括没有订单的客户)
SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date
FROM 
    customers c
LEFT JOIN 
    orders o ON c.customer_id = o.customer_id;
3. 右外联结(RIGHT JOIN)

返回右表中的所有行,以及左表中的匹配行。如果左表中没有匹配,左表的列显示为NULL。

-- 基本语法
SELECT columns
FROM table1
RIGHT JOIN table2 ON table1.column = table2.column;

-- 示例:查询所有产品及其订单(包括未被订购的产品)
SELECT 
    p.product_id,
    p.product_name,
    oi.order_id,
    oi.quantity
FROM 
    order_items oi
RIGHT JOIN 
    products p ON oi.product_id = p.product_id;
4. 全外联结(FULL JOIN)

返回两个表中的所有行。MySQL不直接支持FULL JOIN,但可以通过UNION结合LEFT JOIN和RIGHT JOIN来模拟。

-- 模拟FULL JOIN
SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id
FROM 
    customers c
LEFT JOIN 
    orders o ON c.customer_id = o.customer_id

UNION

SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id
FROM 
    customers c
RIGHT JOIN 
    orders o ON c.customer_id = o.customer_id
WHERE 
    c.customer_id IS NULL;
5. 自联结(SELF JOIN)

同一个表与自身进行联结,通常用于处理层次结构数据。

-- 示例:查询员工及其直接经理
SELECT 
    e.employee_id,
    e.employee_name,
    m.employee_name AS manager_name
FROM 
    employees e
LEFT JOIN 
    employees m ON e.manager_id = m.employee_id;

多表联结

可以联结两个以上的表来获取更复杂的数据关系。

-- 示例:查询订单、客户和销售人员信息
SELECT 
    o.order_id,
    o.order_date,
    c.customer_name,
    s.salesperson_name
FROM 
    orders o
JOIN 
    customers c ON o.customer_id = c.customer_id
JOIN 
    salespeople s ON o.salesperson_id = s.salesperson_id;

-- 包含订单明细的多表查询
SELECT 
    o.order_id,
    c.customer_name,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS item_total
FROM 
    orders o
JOIN 
    customers c ON o.customer_id = c.customer_id
JOIN 
    order_items oi ON o.order_id = oi.order_id
JOIN 
    products p ON oi.product_id = p.product_id
ORDER BY 
    o.order_id, p.product_name;

JOIN的性能优化

  1. 确保JOIN条件列有索引:这是最重要的优化手段
  2. 合理排序表的顺序:从小表到大表联结通常更高效
  3. 只选择必要的列:减少数据传输量
  4. 使用适当的JOIN类型:不要过度使用LEFT/RIGHT JOIN
  5. 控制结果集大小:使用WHERE和LIMIT限制结果数量

子查询和嵌套查询

子查询是嵌套在另一个查询中的SELECT语句,可用于构建更复杂的查询逻辑。

子查询的类型

1. 标量子查询

返回单个值的子查询,可用在需要单个值的表达式中。

-- 示例:查询价格高于平均价格的产品
SELECT 
    product_id, 
    product_name, 
    price
FROM 
    products
WHERE 
    price > (SELECT AVG(price) FROM products);

-- 在SELECT列表中使用标量子查询
SELECT 
    product_name,
    price,
    (SELECT AVG(price) FROM products) AS avg_price,
    price - (SELECT AVG(price) FROM products) AS price_diff
FROM 
    products;
2. 行子查询

返回单行多列的子查询,可用于比较多个列的值。

-- 示例:查找与特定产品价格和类别相同的其他产品
SELECT 
    product_id, 
    product_name
FROM 
    products
WHERE 
    (category_id, price) = (
        SELECT category_id, price 
        FROM products 
        WHERE product_id = 10
    )
AND 
    product_id <> 10;
3. 列子查询

返回单列多行的子查询,通常与IN、ANY、ALL等运算符一起使用。

-- 示例:查询有订单的客户
SELECT 
    customer_id, 
    customer_name
FROM 
    customers
WHERE 
    customer_id IN (SELECT DISTINCT customer_id FROM orders);

-- 使用NOT IN查询没有订单的客户
SELECT 
    customer_id, 
    customer_name
FROM 
    customers
WHERE 
    customer_id NOT IN (SELECT DISTINCT customer_id FROM orders);
4. 表子查询

返回多行多列的子查询,可用在FROM子句中作为派生表。

-- 示例:从派生表中查询数据
SELECT 
    category_name, 
    avg_price
FROM 
    (
        SELECT 
            c.category_name,
            AVG(p.price) AS avg_price
        FROM 
            products p
        JOIN 
            categories c ON p.category_id = c.category_id
        GROUP BY 
            c.category_name
    ) AS category_stats
WHERE 
    avg_price > 100;

子查询运算符

1. IN 和 NOT IN

检查值是否在子查询结果集中。

-- 示例:查询销售额前3名产品的订单
SELECT 
    order_id, 
    customer_id
FROM 
    orders
WHERE 
    order_id IN (
        SELECT order_id 
        FROM order_items 
        WHERE product_id IN (
            SELECT product_id 
            FROM products 
            ORDER BY sales DESC 
            LIMIT 3
        )
    );
2. EXISTS 和 NOT EXISTS

检查子查询是否返回任何行。比IN更高效,尤其是对大型数据集。

-- 示例:查询过去30天内有订单的客户
SELECT 
    customer_id, 
    customer_name
FROM 
    customers c
WHERE 
    EXISTS (
        SELECT 1 
        FROM orders o 
        WHERE o.customer_id = c.customer_id 
        AND o.order_date > DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
    );
3. ANY/SOME 和 ALL

与比较运算符一起使用,分别检查条件是否对子查询结果集中的任意行或所有行成立。

-- 使用ANY:查询价格高于任何经济型产品的豪华产品
SELECT 
    product_name, 
    price
FROM 
    products
WHERE 
    category = 'luxury' 
    AND price > ANY (
        SELECT price 
        FROM products 
        WHERE category = 'economy'
    );

-- 使用ALL:查询价格高于所有经济型产品的豪华产品
SELECT 
    product_name, 
    price
FROM 
    products
WHERE 
    category = 'luxury' 
    AND price > ALL (
        SELECT price 
        FROM products 
        WHERE category = 'economy'
    );

相关子查询

相关子查询是引用外部查询表的子查询,每行外部查询都会执行一次子查询。

-- 示例:查询每个类别中价格高于该类别平均价格的产品
SELECT 
    p1.product_id,
    p1.product_name,
    p1.category_id,
    p1.price
FROM 
    products p1
WHERE 
    p1.price > (
        SELECT AVG(p2.price) 
        FROM products p2 
        WHERE p2.category_id = p1.category_id
    );

子查询优化技巧

  1. 尽可能使用JOIN代替子查询:在许多情况下,JOIN的性能更好
  2. 优先使用EXISTS而非IN:对大型数据集更高效
  3. 避免在子查询中使用相关子查询:可能导致执行效率低下
  4. 使用索引优化子查询性能:确保子查询中使用的列有合适的索引
  5. 考虑使用临时表:有时将复杂子查询的结果存储在临时表中更高效

集合操作(UNION/INTERSECT)

集合操作允许合并多个查询的结果集。

UNION 和 UNION ALL

UNION合并两个或多个SELECT语句的结果,并删除重复行。UNION ALL保留所有行,包括重复行。

-- UNION示例:合并活跃客户和潜在客户列表
SELECT 
    customer_id,
    customer_name,
    email,
    'Active' AS status
FROM 
    customers
WHERE 
    status = 'active'

UNION

SELECT 
    lead_id AS customer_id,
    lead_name AS customer_name,
    email,
    'Lead' AS status
FROM 
    leads
WHERE 
    quality_score > 80;

-- UNION ALL示例:查询所有交易记录(收入和支出)
SELECT 
    transaction_date,
    amount,
    'Income' AS type
FROM 
    income_transactions

UNION ALL

SELECT 
    transaction_date,
    amount,
    'Expense' AS type
FROM 
    expense_transactions
ORDER BY 
    transaction_date;

模拟INTERSECT

MySQL不直接支持INTERSECT,但可以使用IN或EXISTS模拟。INTERSECT返回两个查询结果中都存在的行。

-- 使用IN模拟INTERSECT:查询同时是客户和供应商的公司
SELECT 
    company_name, 
    contact_email
FROM 
    customers
WHERE 
    company_name IN (
        SELECT company_name 
        FROM suppliers
    );

-- 使用EXISTS模拟INTERSECT
SELECT 
    c.company_name, 
    c.contact_email
FROM 
    customers c
WHERE 
    EXISTS (
        SELECT 1 
        FROM suppliers s 
        WHERE s.company_name = c.company_name
    );

模拟EXCEPT/MINUS

MySQL同样不直接支持EXCEPT/MINUS,但可以使用NOT IN或NOT EXISTS模拟。EXCEPT返回第一个查询中存在但第二个查询中不存在的行。

-- 使用NOT IN模拟EXCEPT:查询有订单的客户
SELECT 
    customer_id, 
    customer_name
FROM 
    customers
WHERE 
    customer_id NOT IN (
        SELECT DISTINCT customer_id 
        FROM orders
    );

-- 使用NOT EXISTS模拟EXCEPT
SELECT 
    c.customer_id, 
    c.customer_name
FROM 
    customers c
WHERE 
    NOT EXISTS (
        SELECT 1 
        FROM orders o 
        WHERE o.customer_id = c.customer_id
    );

高级分组和聚合

GROUP BY子句用于对数据进行分组,通常与聚合函数一起使用。

基本分组和聚合

-- 基本分组查询:按类别统计产品数量和平均价格
SELECT 
    category_id,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM 
    products
GROUP BY 
    category_id;

-- 多列分组:按类别和供应商统计
SELECT 
    category_id,
    supplier_id,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM 
    products
GROUP BY 
    category_id, supplier_id;

HAVING子句

HAVING用于过滤分组后的结果,类似于WHERE过滤行,但适用于分组和聚合后的结果。

-- 示例:查找平均价格超过100的类别
SELECT 
    category_id,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM 
    products
GROUP BY 
    category_id
HAVING 
    AVG(price) > 100;

-- 组合WHERE和HAVING:查找有超过5个激活产品且平均价格超过50的类别
SELECT 
    category_id,
    COUNT(*) AS active_products,
    AVG(price) AS avg_price
FROM 
    products
WHERE 
    status = 'active'
GROUP BY 
    category_id
HAVING 
    COUNT(*) > 5 AND AVG(price) > 50;

GROUP_CONCAT函数

GROUP_CONCAT将组内的值连接成一个字符串,非常适合生成逗号分隔列表。

-- 示例:每个类别的产品名称列表
SELECT 
    category_id,
    GROUP_CONCAT(product_name) AS products
FROM 
    products
GROUP BY 
    category_id;

-- 自定义分隔符和排序
SELECT 
    category_id,
    GROUP_CONCAT(
        product_name 
        ORDER BY price DESC 
        SEPARATOR ' | '
    ) AS products
FROM 
    products
GROUP BY 
    category_id;

WITH ROLLUP

WITH ROLLUP修饰符生成额外的汇总行,提供分层汇总信息。

-- 示例:按类别和供应商统计产品,带汇总行
SELECT 
    IFNULL(category_id, 'All Categories') AS category,
    IFNULL(supplier_id, 'All Suppliers') AS supplier,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM 
    products
GROUP BY 
    category_id, supplier_id
WITH ROLLUP;

窗口函数(MySQL 8.0+)

MySQL 8.0引入了窗口函数(也称为分析函数),它们对结果集的行执行计算,类似于聚合函数,但不会减少结果集的行数。

窗口函数语法

function_name() OVER (
    [PARTITION BY column1, column2, ...]
    [ORDER BY column3, column4, ...]
    [frame_clause]
)

其中:

  • PARTITION BY:定义窗口分区(类似于GROUP BY)
  • ORDER BY:定义窗口内行的排序
  • frame_clause:定义当前行的窗口框架(范围)

排名函数

-- ROW_NUMBER():为每行分配唯一的序号
SELECT 
    product_id,
    category_id,
    product_name,
    price,
    ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM 
    products;

-- RANK():相同值获得相同排名,但会有间隔
SELECT 
    product_id,
    category_id,
    product_name,
    price,
    RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM 
    products;

-- DENSE_RANK():相同值获得相同排名,没有间隔
SELECT 
    product_id,
    category_id,
    product_name,
    price,
    DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank
FROM 
    products;

分析函数

-- LAG():访问当前行之前的行
SELECT 
    product_id,
    product_name,
    price,
    LAG(price) OVER (ORDER BY product_id) AS prev_price,
    price - LAG(price) OVER (ORDER BY product_id) AS price_diff
FROM 
    products;

-- LEAD():访问当前行之后的行
SELECT 
    product_id,
    product_name,
    price,
    LEAD(price) OVER (ORDER BY product_id) AS next_price,
    LEAD(price) OVER (ORDER BY product_id) - price AS price_diff
FROM 
    products;

-- FIRST_VALUE():返回窗口第一行的值
SELECT 
    product_id,
    category_id,
    product_name,
    price,
    FIRST_VALUE(price) OVER (
        PARTITION BY category_id 
        ORDER BY price
    ) AS lowest_price_in_category
FROM 
    products;

聚合窗口函数

-- 计算累计总和
SELECT 
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM 
    orders;

-- 计算移动平均
SELECT 
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3day
FROM 
    orders;

-- 按类别计算产品价格百分比
SELECT 
    product_id,
    category_id,
    product_name,
    price,
    price / SUM(price) OVER (PARTITION BY category_id) * 100 AS pct_of_category
FROM 
    products;

窗口函数应用场景

  1. 排名与分组:识别每组中的前N条记录
  2. 同比环比分析:与前期数据比较
  3. 累计统计:计算累计和、累计平均等
  4. 移动统计:计算移动平均、移动总和等
  5. 前后对比:当前值与前后值的比较
  6. 百分位数计算:计算数据的分布

常见复杂查询模式

动态透视表(PIVOT)

MySQL没有内置PIVOT功能,但可以使用CASE表达式和聚合函数模拟。

-- 示例:将产品类别从行转为列
SELECT 
    supplier_id,
    SUM(CASE WHEN category_id = 1 THEN 1 ELSE 0 END) AS category_1_count,
    SUM(CASE WHEN category_id = 2 THEN 1 ELSE 0 END) AS category_2_count,
    SUM(CASE WHEN category_id = 3 THEN 1 ELSE 0 END) AS category_3_count
FROM 
    products
GROUP BY 
    supplier_id;

-- 按季度汇总销售额
SELECT 
    YEAR(order_date) AS year,
    SUM(CASE WHEN QUARTER(order_date) = 1 THEN amount ELSE 0 END) AS Q1,
    SUM(CASE WHEN QUARTER(order_date) = 2 THEN amount ELSE 0 END) AS Q2,
    SUM(CASE WHEN QUARTER(order_date) = 3 THEN amount ELSE 0 END) AS Q3,
    SUM(CASE WHEN QUARTER(order_date) = 4 THEN amount ELSE 0 END) AS Q4,
    SUM(amount) AS yearly_total
FROM 
    orders
GROUP BY 
    YEAR(order_date);

查找缺失值

查找序列中缺失的值是一个常见需求。

-- 示例:查找缺失的订单编号
WITH RECURSIVE seq AS (
    SELECT MIN(order_id) AS value
    FROM orders
    UNION ALL
    SELECT value + 1
    FROM seq
    WHERE value < (SELECT MAX(order_id) FROM orders)
)
SELECT s.value AS missing_order_id
FROM seq s
LEFT JOIN orders o ON s.value = o.order_id
WHERE o.order_id IS NULL;

最新记录查询

每组中查找最新的记录是另一个常见需求。

-- 方法1:使用子查询
SELECT p.*
FROM products p
JOIN (
    SELECT product_code, MAX(version) AS latest_version
    FROM products
    GROUP BY product_code
) latest ON p.product_code = latest.product_code AND p.version = latest.latest_version;

-- 方法2:使用窗口函数(MySQL 8.0+)
WITH latest_products AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY product_code ORDER BY version DESC) AS rn
    FROM products
)
SELECT * FROM latest_products WHERE rn = 1;

累计分布分析

分析数据的累计分布可以帮助了解数据分布情况。

-- 示例:计算价格的累计分布
WITH price_stats AS (
    SELECT 
        price,
        COUNT(*) AS product_count,
        (SELECT COUNT(*) FROM products) AS total_products
    FROM 
        products
    GROUP BY 
        price
)
SELECT 
    price,
    product_count,
    SUM(product_count) OVER (ORDER BY price) AS cum_count,
    ROUND(SUM(product_count) OVER (ORDER BY price) / total_products * 100, 2) AS cum_percentage
FROM 
    price_stats
ORDER BY 
    price;

总结与预告

本文详细介绍了MySQL中的复杂查询技术,包括多表联结、子查询、集合操作、高级分组和窗口函数等。掌握这些技术将大大提升你分析和处理数据的能力,让你能够从数据库中提取更有价值的信息。

在下一篇文章中,我们将深入探讨索引优化,包括:

  • 索引基本原理与类型
  • 索引设计最佳实践
  • 执行计划分析
  • 常见索引问题与解决方案
  • 索引维护策略

敬请期待!


🚀 关注我,不错过更多MySQL干货!

🔔 点赞、收藏,支持创作!

💬 评论区留下你的问题,我们一起讨论!

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值