04 数据分析与SQL Lesson4 SQL 子查询和临时表格
文章目录
1.视频:简介
有时候,你要回答的问题无法通过直接处理数据库中的现有表格获得答案。子查询和表格表达式都是用来通过查询创建一个表格,然后再编写一个查询来与这个新创建的表格进行互动,这涉及了本节课的3个内容:
- 子查询
- 表格表达式
- 持久衍生表格
2.视频:子查询简介
Subqueries: Allow you to answer more complex questions than you can with a single database table. 其实就是把子查询(也叫嵌套查询),放在 FROM() sub 中。
4.解决方案:编写你的首个子查询
- 要想计算出每个渠道每天的事件发生次数。首先,我们需要按照日期和渠道分组。然后按事件数(第三列)排序,这样可以快速得出第一个问题的答案。
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;
- 要想创建一个子查询,用来直接输出上面的值。查询的附加部分包括 *,并且我们需要为表格设置别名。此外,是在 SELECT 语句中(而不是 FROM)中提供表格。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
- 最后,我们在以下语句中能够获得显示每个渠道一天的平均事件数的表格。
- 这里()sub 的输出是 day,channel,event 3列。在根下再显示 channel 和 events的avg。
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
5.文本:子查询格式
最清晰的方式:
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub
GROUP BY channel
ORDER BY 2 DESC;
8.解决方案:关于子查询的更多内容
- 以下是从 orders 表格中获取第一个订单的年/月信息的查询。
- 这里注意 DATE_TRUNC 是可以对输出进行函数处理的(比如MIN)
SELECT DATE_TRUNC('month', MIN(occurred_at))
FROM orders;
- 然后,为了获取每个订单的平均值,我们可以在一个查询中执行所有的任务。但是为了便于阅读,我在下面提供了两个查询,单独执行每一步。
- 注意这里(SELECT)是 WHERE 判断的条件,所以不能使用别名,页不用设定别名
SELECT AVG(standard_qty) avg_std, AVG(gloss_qty) avg_gls, AVG(poster_qty) avg_pst
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
SELECT SUM(total_amt_usd)
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
10.解决方案:爱上子查询
这节系统的扩展了子查询的复杂例子(在 JOIN 里面也可以选中)。推荐学有余力的同学尝试。
11.视频:WITH
WITH 语句经常称为公式表达式,可以理解为用 WITH 把 SUBQUERY 抽离出来并起一个别名进行使用,在逻辑上更容易看懂:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;
这种方式在有多个筛选表的时候特别方便:
WITH table1 AS (
SELECT *
FROM web_events),
table2 AS (
SELECT *
FROM accounts)
SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;