[Coursework] 24 | Database Fundamentals | The Relational Model

RussellHan

Last modified 2023-01-21 22:02:01

License: CC 4.0 BY-SA

Category: Coursework. Tags: database, mysql, sql

First published 2020-08-11 03:36:54

Article link: https://blog.youkuaiyun.com/RussellHan/article/details/107926127

Contents

Concepts
Examples

Concepts

Terminology

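Data Model: a set of definitions describing how real-world data is represented as computerized information; it also describes the types of operations available to access and update that information
Two sets of standard terminology
relation = table; tuple = row; attribute = column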

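The CAP database

[Figure: the tables of the CAP database]
[Figure: the data model of the CAP database]
In the CAP database every row of a table is distinct, and every table has a unique identifier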

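Naming the parts of a database

Table: also called a relation
Column name: also called an attribute
Row: also called a tuple
Table heading (not the table name): also called the schema, i.e. the set of attributes
Two definitions
CAP = {Customers, Agents, Products, Orders}: the set of table names in the database
Head(Customers) = {cid, cname, city, discnt}: the set of attributes of one table
Program-data independence: a query written to answer a question still answers it even if all the data changes; the query depends only on the column names, not on the table contents
The domain of a column is an enumerated type
A domain can generally be defined as the set of constants that may be used as values of a table attribute (in some programming languages a domain corresponds to a specific enumerated type)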
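Cartesian product
Let CID = Domain(cid), CNAME = Domain(cname), CITY = Domain(city), DISCNT = Domain(discnt)
The Cartesian product CID $\times$ CNAME $\times$ CITY $\times$ DISCNT of these four domains is the set of all tuples (w, x, y, z) with w from CID, x from CNAME, y from CITY and z from DISCNT; a relation over these four domains is a subset of that product (the CUSTOMERS table is contained in CID $\times$ CNAME $\times$ CITY $\times$ DISCNT)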
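
As a rough illustration, the subset relationship can be checked in Python. The domains below are tiny, hypothetical stand-ins (real domains would be far larger), and `customers` holds two illustrative rows.

```python
from itertools import product

# Hypothetical, deliberately tiny domains (assumption for illustration).
CID = {"c001", "c002"}
CNAME = {"TipTop", "Basics"}
CITY = {"Duluth", "Dallas"}
DISCNT = {10.0, 12.0}

# Every possible (w, x, y, z) combination of the four domains.
cartesian = set(product(CID, CNAME, CITY, DISCNT))

# A relation keeps only the combinations that are actually recorded.
customers = {
    ("c001", "TipTop", "Duluth", 10.0),
    ("c002", "Basics", "Dallas", 12.0),
}

print(len(cartesian))          # 2 * 2 * 2 * 2 combinations
print(customers <= cartesian)  # True: the relation is a subset of the product
```

Most of the 16 combinations describe no real customer, which is why a table is only a subset of the product.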

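Relational rules

Rule 1: First normal form: no multi-valued fields
[Figure: an example of a multi-valued field]

[Figure: the same table revised according to Rule 1]

That revision wastes space; a more efficient approach is to move the repeating values into a separate table [figure]
Rule 2: Access rows by content only
You cannot ask for "the third row", because neither rows nor columns are ordered
Pointers to particular rows are not allowed
Note: most relational database products break this rule by letting users fetch rows by row ID (RID)
Rule 3: The unique row rule
No two tuples may be identical in every column at the same time
Note: many database products nevertheless allow duplicate rows, so that rows can be added and removed efficiently
For the rest of this chapter, assume all three rules hold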
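
Since the original figures are missing, here is a small sketch of Rules 1 and 3 in Python. The `employees` data and its `dependents` field are hypothetical and are not taken from the CAP database.

```python
# Hypothetical non-1NF data: one "dependents" cell holds several values.
employees = [
    {"eid": "e001", "ename": "Smith", "dependents": ["Michael", "Susan"]},
    {"eid": "e002", "ename": "Jones", "dependents": ["David"]},
]

# Rule 1: move the repeating values into their own table, one value per row.
emps = {(e["eid"], e["ename"]) for e in employees}
dependents = {(e["eid"], d) for e in employees for d in e["dependents"]}

# Rule 3: a set cannot hold the same row twice, so this insert changes nothing.
emps.add(("e001", "Smith"))

print(sorted(emps))
print(sorted(dependents))
```

Storing each table as a Python `set` of tuples also mirrors Rule 2: a set has no "third row", so rows can only be found by their content.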

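Keys, superkeys and null values

Key: a set of columns used to distinguish the rows of a table (e.g. cid in the CUSTOMERS table)
Keys are defined by the DBA, who decides which columns have this property
Other tables can use the key to refer to a row of this table
Superkey: a set of columns whose combined values are unique across rows, i.e. a set of attributes that uniquely identifies a tuple; e.g. (cid, cname) is a superkey of CUSTOMERS
A key is a minimal superkey: no proper subset of the key's columns is itself a superkey (a key contains nothing redundant)
A single column can serve as a key only if its value differs in every row; two equal values are enough to disqualify it
Candidate key: another name for a key
Primary key: the candidate key chosen by the database designer to identify rows uniquely when they are referenced from other tables; the primary key is what other tables normally use as a reference (e.g. Orders refers to Customers through cid)
Theorem: every table has at least one key (by Rule 3 the set of all columns is a superkey, and a minimal subset of it that is still a superkey is a key)
Null value: a placeholder stored in a field of a table to indicate that the actual value is unknown or not yet defined
For example:
Inserting a new agent: (a12, Beowulf, unknown, unknown)
Some of the information has not been assigned yet, but the table still needs a record for the agent, so null values stand in for it
A null value can appear in a numeric or a character field, but it is different from every real value of that field's domain
In particular, it is not the same as 0 or the empty string
Commercial databases treat nulls specially: when filtering on a column, e.g. selecting agents with percent < 6, rows whose value is unknown are not returned
Rule 4: Entity integrity rule
No row of a table T may have a null value in any primary-key column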
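
The difference between a superkey and a key can be sketched in Python. Note the assumption: the check below looks only at the rows currently stored, whereas a real key is a design decision about every state the table may ever be in. The four rows are illustrative.

```python
from itertools import combinations

head = ("cid", "cname", "city")
rows = [  # illustrative rows only
    ("c001", "TipTop", "Duluth"),
    ("c002", "Basics", "Dallas"),
    ("c003", "Allied", "Dallas"),
    ("c004", "ACME", "Duluth"),
]

def is_superkey(attrs):
    """True if no two stored rows agree on all of the given attributes."""
    idx = [head.index(a) for a in attrs]
    return len({tuple(r[i] for i in idx) for r in rows}) == len(rows)

def candidate_keys():
    """Superkeys that have no proper subset which is also a superkey."""
    keys = []
    for size in range(1, len(head) + 1):  # smaller sets first
        for attrs in combinations(head, size):
            if is_superkey(attrs) and not any(set(k) < set(attrs) for k in keys):
                keys.append(attrs)
    return keys

print(is_superkey(("cid", "cname")))  # True, but not minimal
print(is_superkey(("city",)))         # False: Dallas and Duluth repeat
print(candidate_keys())               # [('cid',), ('cname',)]
```

On this sample `cname` happens to be unique, so it shows up as a candidate key; a designer who expects two customers to share a name would not declare it as one.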

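Relational algebra (an abstract language invented by E. F. Codd)

Relational algebra can be viewed as a collection of methods for building new tables that hold query results
Applying relational-algebra operations to the information stored in tables yields query results that are themselves tables
The basic operations fall into two groups: set-theoretic operations and native relational operations
Set-theoretic operations rely on the fact that a table is a set of rows
Native relational operations rely on the structure of the table
Precedence of relational operations
From highest to lowest: projection; selection; product; join and division; intersection; union and difference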

Set operations

Set-theoretic operations

Name	Symbol	Keyboard form	Example
Union	$\cup$	UNION	R $\cup$ S
Intersection	$\cap$	INTERSECT	R $\cap$ S
Difference	–	MINUS	R – S
Product	$\times$	TIMES	R $\times$ S

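Compatible tables: tables R and S are compatible if they have the same heading, i.e. Head(R) = Head(S), and corresponding attributes are drawn from the same domains and have the same meaning
The columns of a table are unordered
Intersection, union and difference
These three operations are defined only when the two tables are compatible
Intersection and union are commutative and associative (proof omitted)
Difference is not commutative

That is, in general R – S $\neq$ S – R
Identities satisfied by difference
R – S = R – (R $\cap$ S)
S – R = S – (R $\cap$ S)
R $\cap$ S = R – (R – S) = S – (S – R)
(R – S) $\cap$ R = R – S
(R – S) $\cap$ S = $\emptyset$
(R – S) $\cap$ (S – R) = $\emptyset$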
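
Because compatible tables are just sets of rows, identities of this kind can be spot-checked with Python's built-in `set`. This is a check on one pair of made-up tables, not a proof.

```python
# Two compatible tables with made-up rows.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

identities = {
    "R - S = R - (R ∩ S)": R - S == R - (R & S),
    "S - R = S - (R ∩ S)": S - R == S - (R & S),
    "R ∩ S = R - (R - S)": R & S == R - (R - S),
    "R ∩ S = S - (S - R)": R & S == S - (S - R),
    "(R - S) ∩ R = R - S": (R - S) & R == R - S,
    "(R - S) ∩ S = ∅": (R - S) & S == set(),
    "(R - S) ∩ (S - R) = ∅": (R - S) & (S - R) == set(),
}
for name, holds in identities.items():
    print(name, holds)

print(R - S != S - R)  # True here: difference is not commutative
```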

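Assignment and aliases
Let R be a table with Head(R) = {A₁, …, A_n}, and let B₁, …, B_n be n attributes with Domain(B_i) = Domain(A_i) for all 1 $\leq$ i $\leq$ n; define a new table S with Head(S) = {B₁, …, B_n}; then after

S(B₁, …, B_n) := R(A₁, …, A_n) the two tables have the same contents

The symbol ":=" used here is called the assignment operator
Assignment lets us define new tables and so save the intermediate results of an expression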

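Product
R $\times$ S: if Head(R) = {A₁, …, A_n} and Head(S) = {B₁, …, B_m}, then for T := R $\times$ S, Head(T) = {R.A₁, …, R.A_n, S.B₁, …, S.B_m}
t is a row of T if and only if there are a row u of R and a row v of S such that t is the concatenation of u and v, i.e. t(R.A_i) = u(A_i) and t(S.B_j) = v(B_j)
If R has C_R columns and S has C_S columns, R $\times$ S has C_R + C_S columns
If R has N_R rows and S has N_S rows, R $\times$ S has N_R $\times$ N_S rows
Note: not every element of a Cartesian product belongs to a relation. Each row of a table associates values from different columns that genuinely go together, and it is that complete set of rows that is called a relation; many rows of a product table are therefore meaningless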
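
A minimal sketch of the product, assuming a table is stored as a heading list plus a set of tuples. The helper name `product` and the aliases `C` and `A` are made up for this example.

```python
def product(r_name, r_head, r_rows, s_name, s_head, s_rows):
    """Return (head, rows) of R x S with qualified column names."""
    head = [f"{r_name}.{a}" for a in r_head] + [f"{s_name}.{b}" for b in s_head]
    rows = {u + v for u in r_rows for v in s_rows}  # concatenate every pair
    return head, rows

# Illustrative rows only.
customers = {("c001", "Duluth"), ("c002", "Dallas")}
agents = {("a01", "New York"), ("a05", "Duluth"), ("a06", "Dallas")}

head, rows = product("C", ["cid", "city"], customers, "A", ["aid", "city"], agents)
print(head)       # ['C.cid', 'C.city', 'A.aid', 'A.city']
print(len(rows))  # 6 = 2 rows * 3 rows
```

Rows such as `("c001", "Duluth", "a01", "New York")` pair a customer with an unrelated agent, which is what the note about meaningless rows refers to.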

Native relational operations

Native relational operations

Name	Symbol	Keyboard form	Example
Projection	$\pi$	R[ ]	R[A_{i_1}, …, A_{i_k}], also written $\pi_{A_{i_1}, \dots, A_{i_k}}$(R)
Selection	$\sigma$	R where C	R where A₁ = 5, also written $\sigma_{A_1 = 5}$(R)
Join	$\bowtie$	JOIN	R $\bowtie$ S
Division	$\div$	DIVIDEBY	R $\div$ S

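Projection
Projection picks out columns of a table
A projection is written R[A_{i_1}, A_{i_2}, …, A_{i_k}], where A_{i_1}, A_{i_2}, …, A_{i_k} $\in$ {A₁, …, A_n}
The projection of R on A_{i_1}, …, A_{i_k} is the table T with Head(T) = {A_{i_1}, …, A_{i_k}} such that for every row r of R there is exactly one row t of T with r[A_{i_j}] = t[A_{i_j}] for every projected attribute A_{i_j}
Duplicate rows in the result of a projection are removed
Selection
Given a table S with Head(S) = {A₁, …, A_n}, selection defines a new table written S where C
It has the same set of attributes as S and contains the tuples of S that satisfy the selection condition C
Defining the selection condition C
C can be any comparison of the form A_i $\theta$ A_j or A_i $\theta$ a, where A_i and A_j are attributes of S with the same domain, a is a constant from Domain(A_i), and $\theta$ is a comparison operator such as <, >, =, $\leq$, $\geq$ or <>; if C and C' are conditions, so are C AND C', C OR C' and NOT C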
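
Projection and selection can be sketched in a few lines of Python, assuming each row is a dict keyed by attribute name. The helper names `project` and `select` and the sample rows are illustrative.

```python
customers = [  # illustrative rows
    {"cid": "c001", "cname": "TipTop", "city": "Duluth", "discnt": 10.0},
    {"cid": "c002", "cname": "Basics", "city": "Dallas", "discnt": 12.0},
    {"cid": "c003", "cname": "Allied", "city": "Dallas", "discnt": 8.0},
]

def project(rows, attrs):
    """R[attrs]: keep the listed columns and drop duplicate rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def select(rows, cond):
    """R where C: keep the rows for which the condition holds."""
    return [r for r in rows if cond(r)]

print(project(customers, ["city"]))  # Dallas appears once
print(select(customers, lambda r: r["discnt"] < 10 and r["city"] == "Dallas"))
```

The lambda plays the role of the condition C; `and`, `or` and `not` in Python correspond to AND, OR and NOT.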
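Join
Purpose: combine the current table with another table, or with itself; rows that agree on the common attributes are concatenated, and rows that do not are discarded
Definition: equijoin, natural join
Let Head(R) = {A₁, …, A_n, B₁, …, B_k} and Head(S) = {B₁, …, B_k, C₁, …, C_m}, where {B₁, …, B_k} is the complete set of attributes the two tables share; it is empty when k = 0, and {A₁, …, A_n} and {C₁, …, C_m} may be empty as well
Then Head(R $\bowtie$ S) = {A₁, …, A_n, B₁, …, B_k, C₁, …, C_m}
A row t is in R $\bowtie$ S if and only if there are two rows u $\in$ R and v $\in$ S with u[B_i] = v[B_i] for all 1 $\leq$ i $\leq$ k
The column values of t are then:
t[A_j] = u[A_j]
t[B_i] = u[B_i] = v[B_i]
t[C_l] = v[C_l]
Moreover
Head(R) – Head(S) = {A₁, …, A_n}
Head(S) – Head(R) = {C₁, …, C_m}
Head(S) $\cap$ Head(R) = {B₁, …, B_k}
When Head(R) $\cap$ Head(S) = $\emptyset$, R $\bowtie$ S = R $\times$ S
When Head(R) = Head(S), R $\bowtie$ S = R $\cap$ S
When Head(R) $\subseteq$ Head(S), R $\bowtie$ S $\subseteq$ S (same attributes as S, possibly fewer tuples)
Note: a join does two things: it combines attributes, and it drops some tuples
Both product and join are associative and commutative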
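
A natural join can be sketched as a nested loop, again assuming rows are dicts. The function name `natural_join` and the sample data are illustrative, and no attempt is made at efficiency.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    out = []
    for u in r:
        for v in s:
            common = u.keys() & v.keys()
            if all(u[b] == v[b] for b in common):
                t = {**u, **v}          # shared columns appear once
                if t not in out:
                    out.append(t)
    return out

customers = [{"cid": "c001", "cname": "TipTop"}, {"cid": "c002", "cname": "Basics"}]
orders = [
    {"ordno": 1011, "cid": "c001", "pid": "p01"},
    {"ordno": 1012, "cid": "c001", "pid": "p02"},
]

joined = natural_join(customers, orders)
print(joined)  # two rows for c001; c002 has no order and is dropped

# With no shared attribute the join degenerates to the product.
print(len(natural_join(customers, [{"x": 1}, {"x": 2}])))  # 4
```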
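Division
Consider two tables R and S with Head(S) $\subset$ Head(R)
Specifically, assume Head(R) = {A₁, …, A_n, B₁, …, B_k} and Head(S) = {B₁, …, B_k}
The table T is the result of R $\div$ S
Head(T) = {A₁, …, A_n}, and T contains the largest possible set of rows satisfying the following condition
For a row t of T: for every row s of S there is a row r of R with t(A_j) = r(A_j) and s(B_j) = r(B_j) (i.e. the concatenation of t with every row s of S can be found in R)
The result has none of the divisor's attributes, and it contains no row of the dividend whose divisor-attribute values fail to match the divisor
The divisor's attributes must be a subset of the dividend's attributes
Note: division does two things: it removes the divisor's attributes, and it removes rows. Group the rows of the dividend by their values on the attributes that only the dividend has; a group survives (as a single row, duplicates removed) only if its values on the divisor's attributes cover every row of the divisor
[Figure: an example of division]
Theorem: the relationship between division and Cartesian product
If R = T $\times$ S (with S and T non-empty), then T = R $\div$ S and S = R $\div$ T
If T = R $\div$ S, then T $\times$ S $\subseteq$ R
Applications of division
Questions containing words such as "all" or "every" can be answered with division
The part qualified by "all" becomes the divisor, and the thing that must cover all of it becomes the attributes that belong only to the dividend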
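
Division follows directly from its definition. The sketch below assumes rows are dicts and that the caller passes the A attributes and B attributes explicitly; `divide` and the sample rows are illustrative.

```python
def divide(r, s, a_attrs, b_attrs):
    """R ÷ S, where Head(R) = a_attrs + b_attrs and Head(S) = b_attrs."""
    r_set = {tuple(row[x] for x in a_attrs + b_attrs) for row in r}
    s_set = {tuple(row[b] for b in b_attrs) for row in s}
    candidates = {tuple(row[a] for a in a_attrs) for row in r}
    # Keep t only if t concatenated with every row of S occurs in R.
    return {t for t in candidates if all(t + v in r_set for v in s_set)}

orders = [  # ORDERS[cid, pid], illustrative rows
    {"cid": "c001", "pid": "p01"},
    {"cid": "c001", "pid": "p02"},
    {"cid": "c002", "pid": "p01"},
]
products = [{"pid": "p01"}, {"pid": "p02"}]  # PRODUCTS[pid]

print(divide(orders, products, ["cid"], ["pid"]))  # {('c001',)}
```

Only c001 is paired with every pid, which is the "orders all products" pattern.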

Dependencies between operations

Theorem 1: A $\cap$ B = A – (A – B)
Theorem 2:
Head(R) = {A₁, …, A_n, B₁, …, B_k}, Head(S) = {B₁, …, B_k, C₁, …, C_m}
T₁[R.A₁, …, R.A_n, R.B₁, …, R.B_k, S.B₁, …, S.B_k, S.C₁, …, S.C_m] := R $\times$ S (take the Cartesian product)
T₂ := T₁ where R.B₁ = S.B₁ and … and R.B_k = S.B_k (drop tuples whose shared attributes differ)
T₃ := T₂[R.A₁, …, R.A_n, R.B₁, …, R.B_k, S.C₁, …, S.C_m] (drop the duplicated attributes)
Then: T₃ = R $\bowtie$ S
Theorem 3:
Head(R) = {A₁, …, A_n, B₁, …, B_k}, Head(S) = {B₁, …, B_k}
T₁ := R[A₁, …, A_n]
T₂ := T₁ $\times$ S
T₃ := T₂ – R
T₄ := T₃[A₁, …, A_n]
T₅ := T₁ – T₄
Then: T₅ = R $\div$ S
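
The five steps of Theorem 3 can be traced on a small example with Python sets. R has one A column and one B column here, and the rows are made up.

```python
# Head(R) = {A, B}; Head(S) = {B}. Made-up rows.
R = {("c001", "p01"), ("c001", "p02"), ("c002", "p01")}
S = {("p01",), ("p02",)}

T1 = {(a,) for (a, b) in R}          # R[A]
T2 = {t + s for t in T1 for s in S}  # T1 x S: every pairing that would be needed
T3 = T2 - R                          # the pairings that are missing from R
T4 = {(a,) for (a, b) in T3}         # A values with at least one missing pairing
T5 = T1 - T4                         # A values with nothing missing

# Division computed straight from its definition, for comparison.
direct = {t for t in T1 if all(t + s in R for s in S)}

print(T3)            # {('c002', 'p02')}
print(T5 == direct)  # True
```

The same data also illustrates the containment T $\times$ S $\subseteq$ R: pairing `T5` with every row of `S` yields only rows of `R`.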

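Other relational operations

Left outer join, right outer join
The join described earlier is called an inner join
An inner join keeps only the rows that match on the common attributes of the two tables; a left outer join keeps every row of the left operand: the attributes of the right operand are added, positions with no matching right-hand row are filled with null, and right-hand rows with no match on the left are dropped; a right outer join is symmetric
$\theta$-join (theta join)
R $\bowtie_F$ S = (R $\times$ S) where F, where F is some condition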
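
A left outer join can be sketched as follows, using Python's `None` to stand in for null. The function name and its `s_only` parameter (the attributes that occur only in the right operand) are choices made for this example.

```python
def left_outer_join(r, s, s_only):
    """Keep every row of r; pad the s-only columns with None when nothing matches."""
    out = []
    for u in r:
        matches = [v for v in s
                   if all(u[k] == v[k] for k in u.keys() & v.keys())]
        if matches:
            out.extend({**u, **v} for v in matches)
        else:
            out.append({**u, **{a: None for a in s_only}})
    return out

customers = [{"cid": "c001", "cname": "TipTop"}, {"cid": "c002", "cname": "Basics"}]
orders = [{"cid": "c001", "pid": "p01"}, {"cid": "c009", "pid": "p02"}]

for row in left_outer_join(customers, orders, ["pid"]):
    print(row)
# c001 is matched, c002 is kept with pid=None, the c009 order is dropped
```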

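Examples

e.g. 1. Find the customers located in Tokyo
Solution:
T := CUSTOMERS where city = 'Tokyo'

e.g. 2. Retrieve all agents whose commission percentage is at least 6%
Solution:
T := AGENTS where percent >= 6

e.g. 3. Retrieve all pairs of agents who both have a commission percentage of at least 6% and are based in the same city
Solution:
T := AGENTS where percent >= 6
L := AGENTS where percent >= 6
PAIRS := (L $\times$ T) where L.city = T.city
Note: do not multiply a table by itself under a single name; after the product it would be impossible to tell which operand each attribute came from, so use two aliases as above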
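
The alias trick from e.g. 3 can be tried on a few illustrative agent rows (tuples of aid, city, percent). One observation that is not in the original solution: the expression as written also pairs each agent with itself and lists every pair in both orders; an extra condition such as L.aid < T.aid would remove those.

```python
agents = [  # (aid, city, percent), illustrative rows
    ("a01", "New York", 6),
    ("a02", "Newark", 6),
    ("a03", "Tokyo", 7),
    ("a04", "New York", 6),
    ("a05", "Duluth", 5),
]

T = [a for a in agents if a[2] >= 6]
L = list(T)  # second alias of the same table

# (L x T) where L.city = T.city
pairs = [(l[0], t[0]) for l in L for t in T if l[1] == t[1]]
print(pairs)  # includes ('a01', 'a01') and both ('a01', 'a04'), ('a04', 'a01')

# Optional extra condition L.aid < T.aid keeps each real pair once.
distinct = [p for p in pairs if p[0] < p[1]]
print(distinct)  # [('a01', 'a04')]
```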

e.g. 4. Find the cities of customers whose discount is below 10% or of agents whose commission is below 6%
Solution:
T₁ := (CUSTOMERS where discnt < 10)[city]
T₂ := (AGENTS where percent < 6)[city]
T₃ := T₁ $\cup$ T₂

Note: the following solution is wrong
R₁ := CUSTOMERS where discnt < 10
R₂ := AGENTS where percent < 6
R₃ := (R₁ $\cup$ R₂)[city]
because R₁ and R₂ are not compatible tables

e.g. 5. List the name and city of each customer together with the name and city of each agent, where the two are based in the same city
Solution:
T := ((CUSTOMERS $\times$ AGENTS) where CUSTOMERS.city = AGENTS.city)[CUSTOMERS.cname, AGENTS.aname, AGENTS.city]
Explanation: first form every (customer, agent) pair, then filter
or
T := (CUSTOMERS $\bowtie$ AGENTS)[cname, aname, city]

e.g. 6. Find the pid, month and quantity of the orders placed by the customer named Allied
Solution:
T := ((CUSTOMERS $\times$ ORDERS) where CUSTOMERS.cid = ORDERS.cid and CUSTOMERS.cname = 'Allied')[ORDERS.pid, ORDERS.month, ORDERS.quantity]
or
T := (((CUSTOMERS where cname = 'Allied')[cid]) $\bowtie$ ORDERS)[pid, month, quantity]
or
T := ((CUSTOMERS $\bowtie$ ORDERS) where cname = 'Allied')[pid, month, quantity]

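e.g. 7. Find the order numbers of orders for which the customer's city, the agent's city and the product's city are all the same
Solution:
T := ((CUSTOMERS $\times$ AGENTS $\times$ PRODUCTS $\times$ ORDERS) where CUSTOMERS.city = AGENTS.city and CUSTOMERS.city = PRODUCTS.city and ORDERS.pid = PRODUCTS.pid and ORDERS.aid = AGENTS.aid and ORDERS.cid = CUSTOMERS.cid)[ORDERS.ordno]

e.g. 8. Find the cid of the customers with the highest discount
Solution:
T := CUSTOMERS[cid, discnt]
M := CUSTOMERS[cid, discnt]
L := ((T $\times$ M) where T.discnt < M.discnt)[T.cid, T.discnt]
P := (T – L)[cid]
Summary: to find the extreme value of some measure, compare every tuple of the table with every tuple of a copy of the table on that measure; here L holds the rows beaten by some other row, so T – L keeps the rows with the maximum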
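
The "compare the table with a copy of itself" idea can be traced with Python sets. The rows are hypothetical (cid, discount) pairs, chosen so that two customers tie for the maximum.

```python
# Hypothetical (cid, discount) rows; c002 and c004 tie for the highest discount.
customers = {("c001", 10.0), ("c002", 12.0), ("c003", 8.0), ("c004", 12.0)}

T = customers
M = customers  # the copy that T is compared against

# Rows of T that are beaten by at least one row of M.
L = {t for t in T for m in M if t[1] < m[1]}

# Whatever is never beaten holds the maximum.
P = {cid for (cid, _) in T - L}

print(sorted(L))  # [('c001', 10.0), ('c003', 8.0)]
print(sorted(P))  # ['c002', 'c004']
```

Relational algebra has no MAX operator, which is why the maximum is expressed as "everything minus what is smaller than something else".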

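e.g. 9. Find the names of customers who have ordered product p01
Solution:
T := ((ORDERS $\bowtie$ CUSTOMERS) where pid = 'p01')[cname]

e.g. 10. Find the names of customers who order at least one product priced at $0.50
Solution:
T := (((CUSTOMERS $\bowtie$ ORDERS) $\bowtie$ PRODUCTS[pid, price]) where price = 0.50)[cname]
Note:
The following answer is wrong: PRODUCTS shares other attributes, such as city, with the joined table, so the natural join would also match on them and lose tuples
R := (((CUSTOMERS $\bowtie$ ORDERS) $\bowtie$ PRODUCTS) where price = 0.50)[cname]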
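
The pitfall in e.g. 10 is easy to reproduce. The sketch below uses a small `natural_join` helper over dict rows and one illustrative row per table; only the join step is shown, not the price filter.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    return [{**u, **v} for u in r for v in s
            if all(u[k] == v[k] for k in u.keys() & v.keys())]

# One illustrative row per table; customer and product are in different cities.
customers = [{"cid": "c001", "cname": "TipTop", "city": "Duluth"}]
orders = [{"ordno": 1011, "cid": "c001", "pid": "p01"}]
products = [{"pid": "p01", "pname": "comb", "city": "Dallas", "price": 0.50}]

co = natural_join(customers, orders)

# Wrong: the join matches on pid AND city, so the Duluth/Dallas row is lost.
wrong = natural_join(co, products)

# Right: project PRODUCTS to [pid, price] first, leaving pid as the only shared column.
trimmed = [{"pid": p["pid"], "price": p["price"]} for p in products]
right = natural_join(co, trimmed)

print(wrong)                        # []
print([r["cname"] for r in right])  # ['TipTop']
```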

e.g. 11. Get cids of customers who order product p01
Solution:
T₁ := (ORDERS where pid = 'p01')[cid]

e.g. 12. Get cids of customers who order products p01 and p02
Solution (T₁ is the table from e.g. 11):
T₂ := (ORDERS where pid = 'p02')[cid]
T := T₁ $\cap$ T₂

e.g. 13. Get names of customers who order all products
Solution:
T := ((ORDERS[cid, pid] $\div$ PRODUCTS[pid]) $\bowtie$ CUSTOMERS)[cname]
Explanation: the division keeps the cids that are paired in ORDERS with every pid in PRODUCTS

e.g. 13b. Get names of customers who order all products ordered by c006
Solution:
T := ((ORDERS[cid, pid] $\div$ ((ORDERS where cid = 'c006')[pid])) $\bowtie$ CUSTOMERS)[cname]

e.g. 14. Get pids of products ordered through all agents
Solution:
T := ORDERS[pid, aid] $\div$ AGENTS[aid]

e.g. 15. Get names of products ordered by all customers who live in Dallas
Solution:
T := ((ORDERS[cid, pid] $\div$ (CUSTOMERS where city = 'Dallas')[cid]) $\bowtie$ PRODUCTS)[pname]

e.g. 16. Get cids of customers who order all products priced at $0.50
Solution:
T := ORDERS[cid, pid] $\div$ ((PRODUCTS where price = 0.50)[pid])

e.g. 17. Get cids of customers who order all products that anybody orders
Solution:
T := ORDERS[cid, pid] $\div$ ORDERS[pid]

e.g. 18. Get aids of agents who do not supply p02
Solution:
T := AGENTS[aid] – (ORDERS where pid = 'p02')[aid]

e.g. 19. Get aids of agents who supply only product p02
Solution:
T := ORDERS[aid] – (ORDERS where pid <> 'p02')[aid]
Note: the following answer is wrong
R := AGENTS[aid] – (ORDERS where pid <> 'p02')[aid]
because it also includes agents who have not handled any product at all
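
The difference between the two answers to e.g. 19 shows up as soon as one agent has no orders. The rows below are hypothetical; a03 is the agent without orders.

```python
agents = {"a01", "a02", "a03"}  # AGENTS[aid]; a03 has no orders (hypothetical)
orders = {("a01", "p01"), ("a01", "p02"), ("a02", "p02")}  # ORDERS[aid, pid]

orders_aid = {aid for aid, _ in orders}                   # ORDERS[aid]
not_p02 = {aid for aid, pid in orders if pid != "p02"}    # (ORDERS where pid <> 'p02')[aid]

only_p02 = orders_aid - not_p02  # correct: start from agents that have orders
wrong = agents - not_p02         # also returns a03, who supplies nothing

print(sorted(only_p02))  # ['a02']
print(sorted(wrong))     # ['a02', 'a03']
```

The choice of the left operand of the difference decides whether rows with no orders at all can slip into the answer.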

e.g. 20. Get aids of agents who take orders on at least the set of products ordered by c004
Solution:
T := ORDERS[aid, pid] $\div$ ((ORDERS where cid = 'c004')[pid])

e.g. 21. Get cids of customers who order p01 and p07
Solution:
T := (ORDERS where pid = 'p01')[cid] $\cap$ (ORDERS where pid = 'p07')[cid]
or
T := ORDERS[cid, pid] $\div$ (PRODUCTS where pid = 'p01' or pid = 'p07')[pid]

e.g. 22. Get cids of customers who order p01 or p07
Solution:
T := (ORDERS where pid = 'p01')[cid] $\cup$ (ORDERS where pid = 'p07')[cid]

e.g. 23. List all cities inhabited by customers who order p02 or agents who place an order for p02
Solution:
T := ((ORDERS where pid = 'p02')[cid] $\bowtie$ CUSTOMERS)[city] $\cup$ ((ORDERS where pid = 'p02')[aid] $\bowtie$ AGENTS)[city]

e.g. 24. Get aids of agents who place an order for at least one customer that uses product p01
Solution:
T := (ORDERS $\bowtie$ (ORDERS where pid = 'p01')[cid])[aid]

e.g. 25. Get aids of agents who place orders for all customers that use product p01
Solution:
T := ORDERS[aid, cid] $\div$ (ORDERS where pid = 'p01')[cid]

e.g. 26. Retrieve product ids for all products that are not ordered by any customer living in a city beginning with the letter 'D'
Solution:
T := PRODUCTS[pid] – (ORDERS $\bowtie$ (CUSTOMERS where city >= 'D' and city < 'E')[cid])[pid]

e.g. 27. Retrieve cids of customers with the largest discounts
Solution:
M := CUSTOMERS
T := CUSTOMERS[cid] – ((CUSTOMERS $\times$ M) where CUSTOMERS.discnt < M.discnt)[CUSTOMERS.cid]

e.g. 28. Get the names of customers who order at least one product priced at $0.50
Solution:
T := (((PRODUCTS where price = 0.50)[pid] $\bowtie$ ORDERS) $\bowtie$ CUSTOMERS)[cname]

e.g. 29. Find cids of all customers who don't place any order through agent a03
Solution:
T := CUSTOMERS[cid] – (ORDERS where aid = 'a03')[cid]
Note: the following answer is wrong
R := ORDERS[cid] – (ORDERS where aid = 'a03')[cid]
because it leaves out customers who have never placed an order

e.g. 30. Retrieve customers who place orders only through a03
Solution:
T := ORDERS[cid] – (ORDERS where aid <> 'a03')[cid]

e.g. 31. Find products that have never been ordered by a customer based in NYC through an agent based in Boston
Solution:
T₁ := (CUSTOMERS where city = 'NYC')[cid]
T₂ := (AGENTS where city = 'Boston')[aid]
T₃ := ((ORDERS $\bowtie$ T₁) $\bowtie$ T₂)[pid]
T := PRODUCTS[pid] – T₃

e.g. 32. Get names of customers who order all products priced at $0.50
Solution:
T := ((ORDERS[cid, pid] $\div$ (PRODUCTS where price = 0.50)[pid]) $\bowtie$ CUSTOMERS)[cname]

e.g. 33. Get cids of customers who order all products that anybody orders
Solution:
T := ORDERS[pid, cid] $\div$ ORDERS[pid]

e.g. 34. Get aids of agents who take orders on at least the set of products ordered by c004
Solution:
T := ORDERS[aid, pid] $\div$ (ORDERS where cid = 'c004')[pid]

e.g. 35. Get cids of customers who place an order through at least one agent who places an order for product p03
Solution:
T := (ORDERS $\bowtie$ (ORDERS where pid = 'p03')[aid])[cid]

e.g. 36. Get cids of all customers who have the same discount as any customer in Dallas or Boston
Solution:
T := (CUSTOMERS $\bowtie$ (CUSTOMERS where city = 'Boston' or city = 'Dallas')[discnt])[cid]

e.g. 37. List pids of products that are ordered through agents who place orders for (possibly different) customers who order at least one product from an agent who has placed an order for customer c001
Solution:
T := (ORDERS $\bowtie$ (ORDERS $\bowtie$ ((ORDERS $\bowtie$ (ORDERS where cid = 'c001')[aid])[cid]))[aid])[pid]

e.g. 38. Get pids of products not ordered by any customer living in a city whose name begins with the letter 'D'
Solution:
T := PRODUCTS[pid] – (ORDERS $\bowtie$ (CUSTOMERS where city >= 'D' and city < 'E')[cid])[pid]

e.g. 39. Find all order number values for orders whose order quantity exceeds the current quantity on hand for the product
Solution:
T := (ORDERS $\bowtie_{ORDERS.pid = PRODUCTS.pid \text{ and } ORDERS.quantity > PRODUCTS.quantity}$ PRODUCTS)[ORDERS.ordno]