- 定义
- 窗口函数(Window Functions)的命名源于其核心机制:通过定义一个动态的“窗口”(Window)来确定函数计算的数据范围。这个“窗口”并非固定不变,而是为每一行数据单独划定一个相关的数据集,函数仅在该范围内执行计算。
- 与聚合函数(如
GROUP BY
)不同,窗口函数不会折叠数据,而是为每行返回结果,同时保留原始行。
- 案例
- 环境准备
- 创建一张成绩表
CREATE TABLE `academic_records` ( `id` int NOT NULL AUTO_INCREMENT COMMENT '自增主键', `student_id` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '学生唯一标识符', `student_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '学生姓名', `subject` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '学科名称', `exam_date` datetime NOT NULL COMMENT '考试时间', `score` decimal(5,1) NOT NULL COMMENT '数值成绩(0-100分,允许一位小数)', PRIMARY KEY (`id`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=DYNAMIC COMMENT='学生学业成绩记录表';
- 填充数据
INSERT INTO academic_records (student_id, student_name, subject, exam_date, score) VALUES -- 学生 S001 张三 ('S001', '张三', '语文', '2023-10-01 09:00:00', 85.5), ('S001', '张三', '语文', '2023-11-05 09:00:00', 88.0), ('S001', '张三', '语文', '2023-12-10 09:00:00', 90.5), ('S001', '张三', '数学', '2023-10-01 14:00:00', 92.0), ('S001', '张三', '数学', '2023-11-05 14:00:00', 94.5), ('S001', '张三', '数学', '2023-12-10 14:00:00', 95.0), ('S001', '张三', '英语', '2023-10-01 10:00:00', 80.0), ('S001', '张三', '英语', '2023-11-05 10:00:00', 85.5), ('S001', '张三', '英语', '2023-12-10 10:00:00', 88.0), -- 学生 S002 李四 ('S002', '李四', '语文', '2023-10-01 09:00:00', 78.5), ('S002', '李四', '语文', '2023-11-05 09:00:00', 82.0), ('S002', '李四', '语文', '2023-12-10 09:00:00', 85.5), ('S002', '李四', '数学', '2023-10-01 14:00:00', 88.0), ('S002', '李四', '数学', '2023-11-05 14:00:00', 91.5), ('S002', '李四', '数学', '2023-12-10 14:00:00', 93.0), ('S002', '李四', '英语', '2023-10-01 10:00:00', 92.5), ('S002', '李四', '英语', '2023-11-05 10:00:00', 89.0), ('S002', '李四', '英语', '2023-12-10 10:00:00', 95.5), -- 学生 S003 王五 ('S003', '王五', '语文', '2023-10-01 09:00:00', 65.0), ('S003', '王五', '语文', '2023-11-05 09:00:00', 70.5), ('S003', '王五', '语文', '2023-12-10 09:00:00', 75.0), ('S003', '王五', '数学', '2023-10-01 14:00:00', 95.0), ('S003', '王五', '数学', '2023-11-05 14:00:00', 97.5), ('S003', '王五', '数学', '2023-12-10 14:00:00', 99.0), ('S003', '王五', '英语', '2023-10-01 10:00:00', 72.0), ('S003', '王五', '英语', '2023-11-05 10:00:00', 68.5), ('S003', '王五', '英语', '2023-12-10 10:00:00', 80.0), -- 学生 S004 赵六 ('S004', '赵六', '语文', '2023-10-01 09:00:00', 92.5), ('S004', '赵六', '语文', '2023-11-05 09:00:00', 94.0), ('S004', '赵六', '语文', '2023-12-10 09:00:00', 96.5), ('S004', '赵六', '数学', '2023-10-01 14:00:00', 85.0), ('S004', '赵六', '数学', '2023-11-05 14:00:00', 88.5), ('S004', '赵六', '数学', '2023-12-10 14:00:00', 90.0), ('S004', '赵六', '英语', '2023-10-01 10:00:00', 89.5), ('S004', '赵六', '英语', '2023-11-05 10:00:00', 93.0), ('S004', '赵六', '英语', '2023-12-10 10:00:00', 91.5), -- 学生 S005 陈七 ('S005', '陈七', '语文', '2023-10-01 09:00:00', 60.5), ('S005', '陈七', '语文', '2023-11-05 09:00:00', 65.0), ('S005', '陈七', '语文', '2023-12-10 09:00:00', 68.5), ('S005', '陈七', '数学', '2023-10-01 14:00:00', 72.0), ('S005', '陈七', '数学', '2023-11-05 14:00:00', 75.5), ('S005', '陈七', '数学', '2023-12-10 14:00:00', 79.0), ('S005', '陈七', '英语', '2023-10-01 10:00:00', 84.0), ('S005', '陈七', '英语', '2023-11-05 10:00:00', 88.5), ('S005', '陈七', '英语', '2023-12-10 10:00:00', 92.0);
- 创建一张成绩表
- 函数语法
函数名([参数]) OVER ( [PARTITION BY 分组列] [ORDER BY 排序列 [ASC|DESC]] [ROWS|RANGE 框架范围] )
PARTITION BY
: 按列分组,类似GROUP BY
但结果不折叠ORDER BY
: 定义窗口内排序方式,影响函数计算逻辑- 框架范围(Frame Clause): 指定窗口的滑动范围,例如:
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
: 包含当前行及前3行。RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
: 默认范围(从第一行到当前行)ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
: 覆盖整个分区
- 函数应用
-
排名函数
- ROW_NUMBER(): 为每行分配唯一序号(如:1,2,3)。
- RANK(): 相同值排名相同,后续序号跳过(如:1,1,3)。
- DENSE_RANK(): 相同值排名相同,后续序号不跳过(如:1,1,2)
- demo:
SELECT student_id, student_name, subject, exam_date, score, -- 唯一序号(无并列) ROW_NUMBER() OVER (PARTITION BY subject ORDER BY score DESC) AS row_num_rank, -- 允许并列且跳过后续序号 RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS `rank`, -- 用反引号包裹 -- 允许并列且不跳过序号 DENSE_RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS `dense_rank` FROM academic_records WHERE subject = '数学';
-
聚合函数作为窗口函数
- SUM(), AVG(), COUNT(), MIN(), MAX() 等可配合窗口使用,实现累积或分组统计
- demo
SELECT student_id, student_name, subject, exam_date, score, -- 按考试顺序(id)计算累积平均分 AVG(score) OVER ( PARTITION BY student_id, subject ORDER BY id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_avg, -- 按考试顺序(id)计算近2次考试(包括本次)的平均分 AVG(score) OVER ( PARTITION BY student_id, subject ORDER BY id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS cumulative_avg_last_2_months FROM academic_records WHERE student_id = 'S001';
-
分布函数
- CUME_DISTO :计算累积分布(<=当前值的行占比,范围[0,1])。
- demo
SELECT student_id, student_name, score, CUME_DIST() OVER (ORDER BY score DESC) AS `cume_dist` FROM academic_records WHERE subject = '数学' and exam_date = '2023-10-01 14:00:00';
-
前后行访问
- LAG(column,n):获取当前行前第 n行的值。
- LEAD(column,n):获取当前行后第 n行的值。
- demo
SELECT student_id, student_name, subject, exam_date, score, -- 获取上一次考试成绩 LAG(score) OVER ( PARTITION BY student_id, subject ORDER BY exam_date ) AS previous_score, -- 计算本次与上次成绩差值 score - LAG(score) OVER ( PARTITION BY student_id, subject ORDER BY exam_date ) AS score_change FROM academic_records where student_id = 'S001';
-
首尾值函数
- FIRST_VALUE(column):返回窗口第一行的值。
- LAST VALUE(column):返回窗口最后一行的值。
- demo
SELECT student_id, student_name, subject, exam_date, score, -- 首次考试成绩 FIRST_VALUE(score) OVER ( PARTITION BY student_id, subject ORDER BY exam_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- 覆盖整个分区 ) AS first_score, -- 最后一次考试成绩 LAST_VALUE(score) OVER ( PARTITION BY student_id, subject ORDER BY exam_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- 覆盖整个分区 ) AS last_score FROM academic_records where student_id = 'S001' ORDER BY student_id, subject, exam_date;
-
- 环境准备