最近,一个朋友和以前同事问起我是如何书写SQL的。起先这个问题让我很惊讶,然后表现出没什么不同的样子,但是在此问题上经过一番讨论,这个问题变得清晰起来了,大多数人都没有书写干净的可读性强的SQL的概念。后会无期这里介绍我是如何书写SQL的,带示例。
首先让我们了解一个数据库对象集合(schema):
# \dt
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | app_rating | table | craig
public | app_recommendation | table | craig
public | app_userprofile | table | craig
public | app_wine | table | craig
public | app_winemakeup | table | craig
public | app_winery | table | craig
public | auth_user | table | craig
上面的对象集合包括了葡萄酒厂的葡萄酒,品尝者对此评等级和作评论。特别是app_rating表,它包含了太多我们即将说明的东西。
# \d app_rating
Table "public.app_rating"
Column | Type | Modifiers
------------+--------------------------+---------------------------------------------------------
id | integer | not null default nextval('app_rating_id_seq'::regclass)
user_id | integer | not null
wine_id | integer | not null
rated_at | date | not null
rating | integer | not null
notes | text |
tags | character varying(255)[] |
created_at | timestamp with time zone | not null
上面大多数字段都是十分明确的,如果你对postgres 的数组(array)不熟悉请看看
这篇早期文章。
让我们书写一些查询从所有数据中找出所给的葡萄酒所对应的葡萄酒厂,平均评价等级和标签。我通常会分别书写每个关键字部分,然后把它们聚合在一起。让我们从处理平均[评价等价]开始吧。
首先是基本结构,为了最大可读性,我会保证所有SQL保留字使用大写。对于很大的查询语句我确保所有的列/条件都独自占用一行。所以为了得到平均的查询语句看起来像这样的:
SELECT
avg(rating),
wine_id
FROM
app_wine
GROUP BY
wine_id;
SELECT DISTINCT
unnest(tags) as tag,
wine_id
FROM
app_rating
GROUP BY
wine_id, tags;
WITH
wine_ratings as (
SELECT
avg(rating) as rating,
wine_id
FROM
app_rating
GROUP BY
wine_id),
wine_tags as (
SELECT DISTINCT
unnest(tags) as tag,
wine_id
FROM
app_rating
GROUP BY
wine_id, tags),
wine_detail as (
SELECT
app_wine.name as name,
app_wine.id,
app_winery.name as winery
FROM
app_wine,
app_winery
WHERE app_wine.winery_id = app_winery.id
)
SELECT
name,
rating,
array_agg(tag),
winery
FROM
wine_ratings,
wine_detail
LEFT OUTER JOIN
wine_tags ON wine_detail.id = wine_tags.wine_id
WHERE wine_detail.id = wine_ratings.wine_id
GROUP BY
name,
rating,
winery
ORDER BY
rating DESC
值得一提的是,像 SELECT, FROM 和 ORDER BY在其它语句之后新建一行。当WHERE有多个条件时,我会确保AND和条件写在同一行。这是有意去使不仅阅读更容易,而且容易删除/添加。使之更具可读性的关键是在AND之前添加额外的两个空格使之与条件对齐。这可能就会像:
SELECT foo
FROM bar
WHERE foo.id = bar.foo_id
AND foo.created_at > now() - '7 days'::INTERVAL
对于这个例子而言,我们就会从这查询语句中得到这个结果:
name | rating | array_agg | winery
-----------------------+--------+--------------------+------------------------
Bordeaux Blend | 5.0 | {'dry', 'smooth'} | Chateau Rahoul
Cabernet Franc | 5.0 | {'chocolate'} | Beaucanon
Cabernet Sauvignon | 5.0 | {'young', 'dry'} | Hawkes
尽管长了点,但是完美优雅。