在SQL中,LAG()、LEAD() 和 ROW_NUMBER() 是处理连续性问题的强大窗口函数。让我详细解释它们各自的用法和区别。
1. ROW_NUMBER() - 标识连续性
功能:为结果集中的每一行分配一个唯一的连续序号
处理连续性的典型用法:
-- 找出连续记录(如连续登录用户)
WITH numbered_data AS (
SELECT
user_id,
login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) as rn
FROM login_records
)
SELECT
user_id,
login_date,
DATE_SUB(login_date, INTERVAL rn DAY) as group_id
FROM numbered_data
GROUP BY user_id, DATE_SUB(login_date, INTERVAL rn DAY)
HAVING COUNT(*) >= 3; -- 连续3天登录
2. LAG() - 与前一行比较
功能:访问当前行之前的行的数据
处理连续性的典型用法:
-- 检测连续递增/递减序列
WITH lag_data AS (
SELECT
date,
value,
LAG(value) OVER (ORDER BY date) as prev_value
FROM metrics
)
SELECT
date,
value,
CASE
WHEN value = prev_value + 1 THEN '连续递增'
WHEN value = prev_value - 1 THEN '连续递减'
ELSE '不连续'
END as continuity_status
FROM lag_data;
-- 找出连续登录的用户
SELECT
user_id,
login_date,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) as prev_login
FROM user_logins
WHERE login_date = DATE_ADD(prev_login, INTERVAL 1 DAY);
3. LEAD() - 与后一行比较
功能:访问当前行之后的行的数据
处理连续性的典型用法:
-- 预测连续性
WITH lead_data AS (
SELECT
date,
value,
LEAD(value) OVER (ORDER BY date) as next_value
FROM metrics
)
SELECT
date,
value,
CASE
WHEN next_value = value + 1 THEN '下一项连续'
ELSE '下一项不连续'
END as next_continuity
FROM lead_data;
综合示例:找出最长连续序列
-- 找出每个用户的最长连续登录天数
WITH login_sequences AS (
SELECT
user_id,
login_date,
DATE_SUB(login_date, INTERVAL ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
) DAY) as group_id
FROM user_logins
),
sequence_lengths AS (
SELECT
user_id,
group_id,
MIN(login_date) as start_date,
MAX(login_date) as end_date,
COUNT(*) as consecutive_days
FROM login_sequences
GROUP BY user_id, group_id
)
SELECT
user_id,
start_date,
end_date,
consecutive_days
FROM sequence_lengths
WHERE consecutive_days = (
SELECT MAX(consecutive_days)
FROM sequence_lengths sl2
WHERE sl2.user_id = sequence_lengths.user_id
);
处理连续性问题的通用模式
- 使用ROW_NUMBER()创建序列号
- 用日期减去序列号创建分组标识
- 对分组进行聚合分析
-- 通用连续性分析模板
WITH numbered_data AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY order_column) as rn
FROM your_table
),
grouped_data AS (
SELECT
*,
DATE_SUB(order_date, INTERVAL rn DAY) as continuity_group -- 根据实际情况调整
FROM numbered_data
)
SELECT
group_column,
continuity_group,
MIN(order_date) as start_date,
MAX(order_date) as end_date,
COUNT(*) as consecutive_count
FROM grouped_data
GROUP BY group_column, continuity_group
HAVING consecutive_count >= desired_threshold;
关键区别总结
| 函数 | 用途 | 连续性处理场景 |
|---|---|---|
ROW_NUMBER() | 生成唯一序列号 | 标识和分组连续记录 |
LAG() | 访问前一行数据 | 检测与前一项的连续性 |
LEAD() | 访问后一行数据 | 检测与后一项的连续性 |
这些函数通常结合使用来解决复杂的连续性问题,如寻找最长连续序列、检测连续性中断等。

327

被折叠的 条评论
为什么被折叠?



