单关系查询
消除重复——DISTINCT
SELECT默认不消除重复元组,要加关键字DISTINCT。
-- 查询class的所有的学院名,不能重复
SELECT DISTINCT institute
FROM class
选择运算 WHERE
1.比较查询
包括比较运算符>, >=, <, <=, =, <>(不等于)
-- 对于数值型指定范围的选择
SELECT *
FROM films
WHERE release_year>2016;
-- 对于文本类,比较时,值要用‘’括起来
SELECT *
FROM films
WHERE language='French';
-- 选择满足条件的所有列的数量,用COUNT()函数
SELECT COUNT(*)
FROM films
WHERE release_year<2000;
2.多条件(单边范围+点)查询 WHERE…AND…OR…+逻辑查询
WHERE…AND…OR…与比较运算符结合,比较运算符控制某个条件的单边范围或者点范围,AND/OR组合多个条件。
逻辑查询仅用于实现逻辑与或非,即只使用=,!=比较符。此外,逻辑运算不能对同一个属性进行连续的逻辑与运算。这种只能使用连接查询或者嵌套子查询。
SELECT title, release_year
FROM films
WHERE language='Spanish' AND release_year<2000;
-- 可以接很多个AND
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND gross>2000000;
-- 下面是错误例子
SELECT *
FROM score
WHERE courseNo='001' AND courseNo='002'
3. 范围查询(双边) BETWEEN…AND…
控制条件的两个边界。BETWEEN后是属性的下限值,AND后是上限值。NOT…BETWEEN…AND表示查询属性值不在某一个范围内的元组。也可以使用OR和比较运算符组合来实现。
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND (budget > 100000000)
AND language='Spanish'
4. 集合查询 IN
用于查询属性值在某个集合内的元组,NOT IN 用于查询属性值不在某个集合内的元组。
SELECT title, language
FROM films
WHERE language IN('English','Spanish','French');
-- 注意IN后直接跟集合,没有空格
5. 空值查询 IS NULL/IS NOT NULL
IS不能用=替代。
SELECT name
FROM people
WHERE deathdate IS NULL;
6. 字符匹配查询 LIKE
对于字符型数据来说,LIKE用于字符匹配查询。语法格式为:
[NOT] LIKE <匹配字符串> [ESCAPE<换码字符>]
解释——
有NOT:表示查询指定的属性列值不和<匹配字符串>匹配的元组;
无NOT:表示查询指定的属性列值和<匹配字符串>匹配的元组;
<匹配字符串>:可以是个具体的字符串,也可以包括通配符%和_。在语句中用一对引号’’ 括起来。
%:表示任意长度的字符串,如ab%,表示所有以ab开头的任意长度字符串;ab%ab表示以ab开头ab结束,中间长度任意的字符串。
_:表示任意一个字符。如ab_,表示所有以ab开头的3个字符的字符串,第3个字符任意。
ESCAPE<换码字符>:查询的字符串中本身包含通配符%和_,就使用ESCAPE<换码字符>对通配符进行转义处理。
-- 选择所有人名以B开头的名字
SELECT name
FROM people
WHERE name LIKE 'B%'
-- 选择第二个字符为r的名字
SELECT name
FROM people
WHERE name LIKE '_r%'
-- 选择班级名称中含有‘16_'的班级名称
SELECT className
FROM class
WHERE className LIKE '%16\_' ESCAPE '\'
-- \被ESCAPE解释为换码字符,表示它后面的不是通配符,也可以换成#
7. 聚合查询 (聚合函数+分组聚合)
聚合函数包括了几种统计函数——
count(DISTINCT | ALL *|列名):个数
sum(DISTINCT | ALL 列名):必须是数值型
avg(DISTINCT | ALL 列名):必须是数值型
max(DISTINCT | ALL 列名):最大值
min(DISTINCT | ALL 列名):最小值
SELECT SUM(gross)
FROM films
WHERE release_year>=2000
SELECT AVG(gross)
FROM films
WHERE title LIKE 'A%';
-- 下例包含两个知识点,浮点数相除才能得到浮点数,int型没有;
-- AS 重命名列
SELECT COUNT(deathdate)*100.0/COUNT(*) AS percentage_dead
FROM people
分组聚合(目的是为了细化聚合函数的作用对象)——
GROUP BY:对查询结果按某一列或某几列进行分组,值相等分为一组;分组后聚合函数分别作用于每个组,查询结果按组聚合输出**(按那一列分组,这一列要被SELECT)**
HAVING:对分组的结果进行选择,仅输出满足条件的组,必须与GROUP BY 配合使用。(有点类似于WHERE,但WHERE后的条件列不能使用聚合函数,HAVING可以)
SELECT language,SUM(gross)
FROM films
GROUP BY language
-- HAVING普通条件
SELECT release_year
FROM films
GROUP BY release_year
HAVING release_year > 1990;
-- HAVING 聚合函数条件
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget)>60000000
排序 ORDER BY
SQL默认排序是升序排列,在ORDER BY后加DESC关键字是降序排列。
可以对多列进行排序,首先按照第一列,然后按照下一列,类推。
SELECT birthdate,name
FROM people
ORDER BY birthdate
SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration
SELECT birthdate,name
FROM people
ORDER BY birthdate, name
单表查询语句组合示例
对多条查询语句组合使用时,需要按照下例所示的顺序使用:
/* SELECT...
FROM...
WHERE...
GROUP BY...
HAVING...
ORDER BY...按照这个次序*/
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY avg_gross DESC
/*解释:查询year>1990的元组,并按照year分组,year相同为一组,并计算每一组的budget、gross平均值,之后筛选出budget平均值>60000000的组,查询到的按照gross平均值降序排列*/
数量限制 LIMIT
限制查询到的结果打印的数量。
SELECT country,AVG(budget) as avg_budget,AVG(gross) as avg_gross
FROM films
GROUP BY country
HAVING COUNT(title)>10
ORDER BY country
LIMIT 5
多关系查询
1. 连接
强烈提醒: 连接的语句作用,是先根据连接语句的语法规则,将表中符合规则的元组找到并完整连接,在这里不会筛选掉任何属性,连接后再对新形成的虚表进行SELECT查询需要的字段。
1.1 内连接 INNER JOIN/等值连接
假设我们有两个表,左表和右表。匹配连接字段相等的行并查询。这里常用AS起别名简化代码。当两个表的连接字段名相同时,可以不写ON语句,直接用USING(列名)代替即可。
-- 给列起别名
SELECT cities.name AS city, countries.name AS country,countries.region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
-- 给表起别名
SELECT c.code AS country_code,c.name,e.year, e.inflation_rate
FROM countries AS c
-- Join to economies (alias e)
INNER JOIN economies AS e
-- Match on code
ON c.code = e.code
-- 三个表的内连接
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code and year
ON c.code = e.code AND e.year = p.year;
-- 使用USING
SELECT c.name AS country, continent, l.name AS language,official
FROM countries AS c
INNER JOIN languages AS l
USING(code)
1.2 自连接 SEL-fish JOIN
用于某一字段的值与同一表中同一字段的其他值进行比较。使用的语句依旧是INNER JOIN语句。
-- 示例查询了所有国家当年与5年前人口的总数,通过AND串起了两个连接字段,实现了自表比较
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
FROM populations as p1
INNER JOIN populations as p2
ON p1.country_code = p2.country_code
AND p1.year = p2.year-5
自表比较的一种方式——CASE…WHEN…THEN。实现的是if…then…else的功能。对某一列进行等级判定,生成一个判定后的等级新列。
-- 示例对领土面积进行了等级划分,生成新列
SELECT name, continent, code, surface_area,
-- First case
CASE WHEN surface_area > 2000000 THEN 'large'
-- Second case
WHEN surface_area>350000 THEN 'medium'
-- Else clause + end
ELSE 'small' END
-- Alias name
AS geosize_group
-- From table
FROM countries;
暂存查询为表——INTO。INTO能够把目前的查询结果存在一张新表中,后续查询可以针对该新表。
SELECT country_code, size,
CASE WHEN size > 50000000 THEN 'large'
WHEN size > 1000000 THEN 'medium'
ELSE 'small' END
AS popsize_group
-- Into table
INTO pop_plus
FROM populations
WHERE year = 2015;
-- Select all columns of pop_plus
SELECT * FROM pop_plus
自然连接在内连接的基础上消除重复列。
1.3 外连接
外连接包括左连接和右连接。
1. 3.1 左连接 LEFT JOIN
内连接是对连接字段匹配的元组进行连接,而左连接是右表匹配到后,匹配到的全部连接到左表上,没有得到匹配的抛弃,而左表其他元组会没有右表属性的值,全部赋值为NULL。相当于将右表扩充到左表上。
-- 两个简单例子
/*
Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
FROM countries AS c
LEFT JOIN languages AS l
ON c.code = l.code
ORDER BY country DESC;
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC;
右连接(RIGHT JOIN)与左连接语法类似,是把左表扩充到右表上,没有的属性填充NULL。
-- 要注意,右连接是把左表连接到右表上,因此FROM后的是左表,RIGHT JOIN后的是右表。
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
indep_year, languages.name AS language, percent
FROM languages
RIGHT JOIN countries
ON countries.code = languages.code
RIGHT JOIN cities
ON cities.country_code = countries.code
ORDER BY city, language;
1.4 全连接 FULL JOIN
全连接相当于左右连接的组合,是将连接字段相同的两个表匹配到的关系扩充到一起,保留左表所有字段同时保留右表所有字段,连接字段匹配到的连接在一起,没有匹配到的缺乏属性则填充NULL。
SELECT c1.name AS country, region, l.name AS language,
basic_unit, frac_unit
FROM countries AS c1
FULL JOIN languages AS l
USING (code)
FULL JOIN currencies AS c2
USING (code)
WHERE region LIKE 'M%esia';
1.5 交叉连接 CROSS JOIN
交叉连接类似于组合,左表的每一个字段匹配右表的所有字段。
-- 不需要连接字段
SELECT c.name AS city, l.name AS language
FROM cities AS c
CROSS JOIN languages AS l
WHERE c.name LIKE 'Hyder%';
以上连接都是附加连接,即查询到的结果都是附加在左表上的,还有两种连接,它们类似于WHERE子句依赖于右表,根据右表来确定保留哪些左表的数据。
1.6 半连接 Semi-JOIN
从SQL语言上来说,它并没有内置的语句规则来实现,而是组合WHERE…IN与简单SELECT嵌套来表现。即用SELECT嵌套子句限定右表的条件查询,再以此条件来限定左表的结果。
SELECT DISTINCT name
FROM languages
WHERE code IN
(SELECT code
FROM countries
WHERE region = 'Middle East')
ORDER BY name;
1.7 反连接 anti-JOIN
与半连接原理相同,但是左表查询的结果在右表的限定条件之外。用到NOT IN。
SELECT code,name
FROM countries
WHERE continent = 'Oceania'
AND code not IN
(SELECT code
FROM currencies
WHERE continent = 'Oceania');
2. 集合运算
集合运算时多个SELECT语句的查询组合。与连接不同,连接先作用于整张表,然后执行SELECT,集合运算则是分别执行SELECT再组合。主要包括并UNION,交INTERSECT,差EXCEPT,全UNION ALL。使用集合运算的前提是两张表中需要查询的字段是含义相同、数据类型一致的。
2.1 并集 UNION
它只保留出现过的属性值,不会保留重复。
-- 获取cities表country_code字段与currencies表code字段的并集
SELECT country_code
FROM cities
UNION
SELECT code
FROM currencies
ORDER BY country_code;
2.2 全集 UNION ALL
与UNION的区别在于,它不会去掉重复的,而是将所有都保留,类似于全连接。
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
UNION ALL
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code, year
ORDER BY code, year;
2.3 交集INTERSECT
顾名思义,获取两张表中含义相同字段的交集。即都出现的部分。
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;
2.4 差集EXCEPT
与交集相反,它获取的是不相交的部分。但是在应用时,查询到的是只在SELECT后的表中出现,而不在EXCEPT后的表中出现的数据。
-- Select field
SELECT name
-- From cities
FROM cities
-- Set theory clause
EXCEPT
-- Select field
SELECT capital
-- From countries
FROM countries
-- Order by result
ORDER BY name;