Recap of Data Storage in Files • Data is stored in files using primary organization — Unordered (heap) — Ordered (sequential) — Hashed 数据主要存储方式无序有序哈希• To speed up data retrieval, indexes are defined on the data files based on — Ordering Key field – unique key values for all the records – primary index OR — Ordering Non-key field – clustering index AND — Non-ordering non-key fields – Secondary indexes 关键字段排序 非关键字段排序 非排序非关键字段• To search for a required record (whose key is given in the WHERE part of the query) in the data file, DBMS first searches the index — Once index is located the pointer field of the index leads the DBMS to the disk page where the required record is located — Binary search can be performed on the ordered index DBMS 首先搜索索引 索引的指针字段会将 DBMS 引向所需记录所在的磁盘 可在有序索引上执行二进制搜索
Primary Indexes • The data file is sequentially ordered on the key field • Index file stores all (dense) or some (sparse) values of the key field and the page number of the data file in which the corresponding record is stored主索引 数据按关键字段顺序排序 索引文件存储关键字段全部或部分值以及文件页码
Multi-level Index • If the index information is large it needs to be stored on the hard disk • This means efficient techniques are required for searching indexes as well — Faster than a binary search on the ordered index • The key idea used to improve search efficiency is to add another level of index to the initial level of index • This idea can be repeated several times to define several levels of index — The top level index is made to fit into a single disk page — This top level search gives the pointer to the required lower level index page or the pointer to the required data page • This is the central idea behind Multi-level indexes • ISAM uses a Multi-level index多级索引 索引信息较大,存储在硬盘上 币有序索引上进行二进制搜索更快 用于提高搜索效率的关键想法是在初始索引级别上添加另一个索引级别 定义多个索引级别 将顶级索引制作成适合单个磁盘页面 核心思想:顶级搜索提供指向所需低级索引页面的指针或指向所需数据页面的指针 ISAM 使用多级索引
Dynamic Multi-level Index • Although multi-level indexes (as described earlier) can speed up search they perform poorly with insertions and deletions • Dynamic multi-level index addresses this problem by leaving out some space in each of its pages for new entries • Dynamic multi-level index is implemented using data structures called B-Trees and B+-Trees — B+-Trees are a variation on B-Trees — B+-Trees are more commonly used for indexing than B-Trees动态多级索引 多级索引在插入和删除时表现不佳 动态多级索引在每个页面中为新条目留出一些空间 使用称为 B 树和 B+ 树的数据结构来实现 B+ 树是 B 树的一种变体,更常用于编制索引
B-Tree B-Tree stands for a Balanced tree • All the paths through a B-Tree from root to different leaf nodes are of the same length (balanced path lengths) — All leaf nodes are at the same depth (level) • This ensures that number of disk accesses required for all the searches are same — The lesser the depth (level) of an index tree the faster the search B-Tree 从根节点到不同叶节点的所有路径长度相同 所有叶节点的深度相同 确保了所有搜索所需的磁盘访问次数相同 索引树的深度越小,搜索速度越快
B+-Tree • B-Tree stores data pointers in non-leaf nodes and also leaf nodes (i.e., 1,3,5,6,7,8,9,12) • B+-Tree stores data pointers in leaf nodes only — This means leaf nodes and non-leaf nodes are structured differently in B+-Tree — The saved space in the non-leaf (internal) nodes is used to store more keys and more tree pointers ◦ Reduction in the depth of a B+-Tree ◦ Faster search B-Tree 在非叶节点和叶节点(即 1,3,5,6,7,8,9,12)中存储数据指针 - B+-Tree 仅在叶节点中存储数据指针 - 这意味着叶节点和非叶节点在 B+-Tree 中的结构不同 - 在非叶(内部)节点中节省的空间用于存储更多的键和更多的树指针 ◦ 减少 B+-Tree 的深度 ◦ 加快搜索速度
• B+ Tree is a Balanced Tree with the following properties • The structure of a B+-Tree is defined based on a parameter called ‘Order’ denoted by p — Order of a B+-Tree depends upon the page size and the sizes of different fields in the tree nodes • The internal and leaf nodes in a B+-Tree are structured differently • Therefore the order of leaf node is different from the order of the internal nodes and we use — p – order of internal node — pleaf – order of leaf node- B+ 树的结构是根据一个名为 "顺序 "的参数定义的,用 p 表示 - B+ 树的顺序取决于页面大小和树节点中不同字段的大小 - B+ 树中内部节点和叶子节点的结构不同 - 因此叶子节点的顺序与内部节点的顺序不同,我们使用 - p - 内部节点的顺序 - pleaf - 叶子节点的顺序
Internal Node For a B+-Tree of order p internal nodes are structured as follows • Each internal node is of the form < P1, K1, P2, K2, ..., Pq−1, Kq−1, Pq > where q ≤ p and each Pi is a tree pointer and Ki is an index • Within each internal node, K1 < K2 < ... < Kq−1 – indexes are sorted • For all search field values X in the subtree pointed at by Pi , — Ki−1 < X ≤ Ki and 1 < i < q; X ≤ Ki for i = 1; and Ki−1 < X for i = q • Each internal node has at most p tree pointers • Each internal node, except the root has at least ceiling(p/2) tree pointers • The root node has at least two tree pointers if it is an internal node • An internal node with q pointers, q ≤ p, has q − 1 index values内部节点 对于阶数为 p 的 B+ 树,内部节点的结构如下 - 每个内部节点的形式为 < P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq >,其中 q ≤ p,每个 Pi 是一个树指针,Ki 是一个索引 - 在每个内部节点内,K1 < K2 < ... < Kq-1 - 索引被排序 - 对于 Pi 所指向的子树上的所有搜索字段值 X, - Ki-1 < X ≤ Ki,且 1 < i < q;X ≤ Ki-1 < Kq-1 - 索引排序 - 对于 Pi 指向的子树上的所有搜索字段值 X, - Ki-1 < X ≤ Ki,且 1 < i < q;i = 1 时,X ≤ Ki;i = q 时,Ki-1 < X - 每个内部节点最多有 p 个树指针 - 除根节点外,每个内部节点至少有 ceiling(p/2) 个树指针 - 如果根节点是内部节点,则至少有两个树指针 - 有 q 个指针的内部节点,q ≤ p,有 q - 1 个索引值
Leaf Node • Leaf nodes are structured as follows — Each leaf node is of the form , < K2, Pr2 >, ..., < Kq−1, Prq−1 >, Pnext > where q ≤ p,each Pri is a data pointer, and Pnext points to the next leaf node in the B+-tree — Within each leaf node, K1 <= K2, ..., Kq−1, q ≤ p — Each leaf node has at least ceiling(p/2) values — All leaf nodes are the same level – balanced • In B+-tree all the leaf nodes are linked together — First level of index as linked list (could be doubly linked as well)叶节点的结构如下 - 每个叶节点的形式为 << K1, Pr1 >, < K2, Pr2 >, ..., < Kq-1, Prq-1 >, Pnext >,其中 q ≤ p,每个 Pri 是一个数据指针,Pnext 指向 B+ 树中的下一个叶节点 - 在每个叶节点内,K1 <= K2, ..., Kq-1, q ≤ p - 每个叶节点至少有 ceiling(p/2) 值 - 所有叶节点都是同级的 - 平衡的 - 在 B+ 树中,所有叶节点都是平衡的。 ...,Kq-1,q ≤ p - 每个叶节点至少有 ceiling(p/2) 值 - 所有叶节点都在同一层次 - 平衡 - 在 B+ 树中,所有叶节点都链接在一起 - 第一层索引为链表(也可以是双链表)
We illustrate index insertion with an example • We want to insert the following indexes into an empty B+-Tree of p = 3 and pleaf = 2 — 8, 5, 1, 7, 3, 12 • Initially you start with the root node which is of type leaf node (no children yet)
• Overflow in leaf node • Split the leaf node the first j = ceiling((pleaf +1)/2) entries are kept in the original node and the remaining moved to the new leaf node • Create a new internal node and the j th index value is replicated in the parent internal node • A pointer is added to the newly formed leaf node 拆分叶节点 前 j = ceiling((pleaf +1)/2) 条目保留在原节点中,其余的移至新的叶节点 - 创建新的内部节点,第 j 个索引值复制到父节点内部节点中 - 将指针添加到新形成的叶节点中
• Space available in nodes to store new entries without creating new nodes节点中可用来存储新条目而无需创建新节点的空间
• Overflow in leaf node • Split the leaf node the first j = ceiling((pleaf +1)/2) entries are kept in the original node and the remaining moved to the new leaf node the j th index value is replicated in the parent internal node • A pointer is added to the newly formed leaf node同上述操作除了第二部不需要创建新的内部节点而是将第 j 个索引值复制到父节点内部
• Overflow in internal node • Split the internal node the entries upto Pj where j = floor((p+1)/2) are kept in the original node and the remaining moved to the new internal node • Create a new internal node and the j th index value is moved to the parent internal node (without replication) • Pointers are added to the newly formed nodes内部节点溢出 - 拆分内部节点,将 Pj 以下的条目(其中 j = floor((p+1)/2) 保留在原始节点中,其余的移至新的内部节点 - 创建新的内部节点,将第 j 个索引值移至父内部节点(无复制) - 将指针添加到新形成的节点中
• You can see that not all insertions required creation of new nodes • B+-Trees ensure that some space is always left in nodes for new entries • Also B+-Trees also make sure all nodes are at least half full并非所有的插入都需要创建新节点 - B+ 树确保节点中总是留有一些空间供新条目使用 - 此外,B+ 树还确保所有节点至少有一半是满的
Search • Given an index, K to be searched — Start at the root node — Search for the pointer to follow to the lower level of the tree until a leaf node is found — Search for the key in the leaf node给定要搜索的索引 K - 从根节点开始 - 搜索指向树下层的指针,直到找到叶节点 - 搜索叶节点中的键
Deletion • We illustrate index deletion with an example • We want to delete the following indexes from a B+-Tree of p = 3 and pleaf = 2 — 5, 12, 9
• Underflow in leaf node — If a sibling node (right or left) exists redistribute entries among • The node and its siblings so that both are at least half full — Else merge the node with its siblings to reduce the number of leaf nodes — Modify the parent internal node to reflect the redistribution 如果存在同级节点(右侧或左侧),则在节点及其同级节点之间重新分配条目,使两者至少都满一半 - 否则将节点与其同级节点合并,以减少叶节点的数量 - 修改父节点内部节点,以反映重新分配的情况
Summary • B+-Trees provide efficient operations of — Search, insert and delete • Real databases have nodes of size equal to one disk page (say of 1KB size) — Thus each node stores lot more indexes than the examples shown here — Therefore achieve short search trees (small depth values) leading to faster search • B+ trees offer dynamic multilevel index — Dynamic ◦ Allow simple insertion and deletion operations in majority of cases — Multilevel ◦ First level index in the form of the linked list of all its leaf nodes ◦ Each subsequent internal level in a B+-Tree offers another level of index B+ 树提供了高效的搜索操作、 B+ 树提供动态多级索引 - 动态 ◦ 在大多数情况下允许简单的插入和删除操作 - 多级 ◦ 第一级索引以其所有叶节点的链表形式存在 ◦ B+ 树中每个后续内部层提供另一级索引