15445 lecture#2 Intermediate SQL

最新推荐文章于 2024-06-03 22:30:27 发布

RedemptionC

最新推荐文章于 2024-06-03 22:30:27 发布

阅读量416

点赞数

CC 4.0 BY-SA版权

本文链接：https://blog.youkuaiyun.com/RedemptionC/article/details/115215466

15-445 专栏收录该内容

6 篇文章

订阅专栏

这篇博客介绍了SQL的历史，它起源于IBM的System R项目，是一种声明性语言，分为DML、DDL和DCL三类命令。文章还讨论了SQL中的聚合函数，如COUNT、AVG，以及它们在查询中的应用。此外，提到了字符串操作、子查询和窗口函数的概念。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

(本来在想，Intermediate指啥，貌似应该是和advanced相对，中级的意思

1Relational Languages
Edgar Codd published a major paper on relational models in the early 1970s. Originally, he only defined the mathematical notation for how a DBMS could execute queries on a relational model DBMS.
The user only needs to specify the result that they want using a declarative language (i.e., SQL). The DBMS is responsible for determining the most efficient plan to produce that answer.
Relational algebra is based on sets (unordered, no duplicates). SQL is based on bags (unordered, allows duplicates).

SQL的来历

SQL只需要用户指定自己需要什么，DBMS去决定如何高效执行

关系代数基于集合（无序，不重复。那为什么之前slides里有重复，应该是slides出错了）

SQL基于bag（无序，允许重复）

2 SQL History
Declarative query lanaguage for relational databases. It was originally developed in the 1970s as part of the IBM System R project. IBM originally called it “SEQUEL” (Structured English Query Language). The
name changed in the 1980s to just “SQL” (Structured Query Language).
The language is comprised of different classes of commands:
1. Data Manipulation Language (DML): SELECT, INSERT, UPDATE, and DELETE statements.
2. Data Definition Language (DDL): Schema definitions for tables, indexes, views, and other objects.
3. Data Control Language (DCL): Security, access controls.
SQL is not a dead language. It is being updated with new features every couple of years. SQL-92 is the
minimum that a DBMS has to support to claim they support SQL. Each vendor follows the standard to a
certain degree but there are many proprietary extensions.

3 Aggregates
An aggregation function takes in a bag of tuples as its input and then produces a single scalar value as its output. Aggregate functions can only be used in SELECT output list.

aggregation的输入是一系列tuple，输出是一个统计值

仅能用于？

Example: Get # of students with a ‘@cs’ login. The following three queries are equivalent:
SELECT COUNT(*) FROM student WHERE login LIKE '%@cs';
SELECT COUNT(login) FROM student WHERE login LIKE '%@cs';
SELECT COUNT(1) FROM student WHERE login LIKE '%@cs';

Can use multiple aggregates within a single SELECT statement:

SELECT AVG(gpa), COUNT(sid)
FROM student WHERE login LIKE '%@cs';

Some aggregate functions support the DISTINCT keyword:
SELECT COUNT(DISTINCT login)
FROM student WHERE login LIKE '%@cs';

去重，有多少个不同的login

Output of other columns outside of an aggregate is undefined (e.cid is undefined below)
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid;

混用是undefined behavior

如上所说，直接混用是undefined，但是这里用了group by，所以每一组都是一样的cid，就没有问题

where和having的区别：

前者发生在聚集之前，后者发生在聚集之后；前者是对tuple进行filter，后者是对group进行filter

4 String Operations
The SQL standard says that strings are case sensitive and single-quotes only. There are functions to ma-nipulate strings that can be used in any part of a query.
Pattern Matching: The LIKE keyword is used for string matching in predicates.
• “%” matches any substrings (including empty).
• “_ ” matches any one character.
Concatenation: Two vertical bars (“||”) will concatenate two or more strings together into a single string.

SQL标准里字符串是大小写敏感的，并且只使用单引号

用来操作字符串的函数：

模式匹配：%匹配任何字符串，_匹配任何单个字符（是这样的？）

连接函数：||（是这样？）

sql查询的输出，除了可以打印在命令行，还可以转存到其他表

如上述第一个是将所有课程编号保存到一个课程编号表（新创建的）

还可以存进已有的表，输出的relation和已有表的attr数量和类型要一致，但是名字不用相同（？？）

可以用order by对输出结果进行排序

可以指定多个排序的依据，如果第一个排序依据相同，就使用第二个 break tie

order by里也可以使用各种表达式

limit限制输出行数

感觉如果没有order by，offset是不是就没有意义？不是乱序的吗，你怎么知道跳过哪些呢

Unless you use an ORDER BY clause with a LIMIT, the DBMS could produce different tuples in the result on
each invocation of the query because the relational model does not impose an ordering.

这是不是对上面的补充？

7 Nested Queries
Invoke queries inside of other queries to execute more complex logic within a single query. The scope of outer query is included in inner query (i.e. inner query can access attributes from outer query), but not the other way around.

子查询，能够访问外部query的数据，外部不能访问内部的

Note that sid has different scope depending on where it appears in the query.

上面是为了说明子查询可以出现在select后面，from后面，where后面

但是除了第三个，前两个在干嘛？

https://stackoverflow.com/a/10379739/9045233

看了下这个，大概明白了exists的作用：

比如对于

SELECT *
  FROM suppliers
 WHERE EXISTS (select *
                 from orders
                where suppliers.supplier_id = orders.supplier_id);

他是针对supplier的每一行，判断order中存不存在一个supplier_id和外面那一行的supplier_id相同，如果有，那么就输出supplier的那一行

看了下这个链接：

https://learnsql.com/blog/sql-window-functions-examples/

window function和aggregation的区别主要在于，后者是对一组tuple算出一个统计值，而前者是为每个tuple算出一个值，比如行号，比如排名

另外按上面的提示，RANK是在排序之后进行的，说明确实可以用来计算排名，但是行号是在排序之前进行的，那行号的意义是什么呢？

看了下这个对比：

https://saveenr.gitbooks.io/usql-tutorial/content/window-functions/row-number-rank-dense-rank.html

在一个排序结果中，如果a=10,b=8,c=8,那么row_number就是1，2，3

而rank则是1，1，3；也就是说rank中，如果值相同，那么rank相同，而row_number则仅看他们排序之后实际的行数

https://learnsql.com/blog/difference-between-sql-cte-and-view/

CTE其实和view很像，都相当于创建了一个临时表，有一些细微的差别：

如：

Although there are some differences between them, common table expressions and views seem to perform very similarly. So, when should you use each one?

Ad-hoc queries. For queries that are referenced occasionally (or just once), it’s usually better to use a CTE. If you need the query again, you can just copy the CTE and modify it if necessary.
Frequently used queries. If you tend to reference the same query often, creating a corresponding view is a good idea. However, you’ll need create view permission in your database to create a view.
Access management. A view might be used to restrict particular users’ database access while still allowing them to get the information they need. You can give users access to specific views that query the data they’re allowed to see without exposing the whole database. In such a case, a view provides an additional access layer.

另外，view创建之后，在之后的query里，仍然可以引用，但是CTE需要在每次query里都执行这个