【CS 61b study notes 5】Abstract Data Type and Binary Search Tree

Abstract Data Type

We know that a data type signifies the type and space taken by the data used in programs. An abstract data type is a special data type that is defined by a set of values and a set of operations on that type.

We call these data types “abstract” because they are independent of any implementation. We can use them and perform operations on them, but we do not need to know how those operations work internally.

An Abstract Data Type is defined only by its operations, not by its implementation.
Note: interfaces in Java are not purely abstract, as they can contain some implementation details, e.g. default methods.
Some commonly used ADTs are:

  • Stacks: structures that support last-in, first-out retrieval of elements
    • push(int x): puts x on the top of the stack
    • int pop(): removes and returns the element on the top of the stack
  • Lists: an ordered sequence of elements
    • add(int i): adds an element
    • int get(int i): gets the element at index i
  • Sets: an unordered collection of unique elements (no repeats)
    • add(int i): adds an element
    • contains(int i): returns a boolean for whether or not the set contains the value
  • Maps: a set of key/value pairs
    • put(K key, V value): puts a key/value pair into the map
    • V get(K key): gets the value corresponding to the key
  • In java.util, List and Set are sub-interfaces of a bigger overarching interface called Collection; Map is a separate interface that does not extend Collection.
    • Among the most important interfaces in the java.util library are those that extend the Collection interface (interfaces can extend other interfaces).
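
To make the "defined by its operations, not its implementation" idea concrete, here is a minimal sketch. The names `IntStack`, `ArrayIntStack`, `LinkedIntStack`, and `IntStackDemo` are hypothetical and exist only for this illustration; they are not part of java.util. The interface lists the operations, and two unrelated implementations satisfy it.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical ADT for illustration: only the operations are specified here.
interface IntStack {
    void push(int x);   // put x on top of the stack
    int pop();          // remove and return the top element
    boolean isEmpty();
}

// Implementation 1: a resizing array.
class ArrayIntStack implements IntStack {
    private int[] items = new int[4];
    private int size = 0;

    public void push(int x) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow when full
        }
        items[size++] = x;
    }

    public int pop() {
        if (size == 0) throw new NoSuchElementException("stack is empty");
        return items[--size];
    }

    public boolean isEmpty() { return size == 0; }
}

// Implementation 2: a singly linked list.
class LinkedIntStack implements IntStack {
    private static final class Node {
        final int item;
        final Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    private Node top = null;

    public void push(int x) { top = new Node(x, top); }

    public int pop() {
        if (top == null) throw new NoSuchElementException("stack is empty");
        int item = top.item;
        top = top.next;
        return item;
    }

    public boolean isEmpty() { return top == null; }
}

class IntStackDemo {
    // Client code depends only on the IntStack operations.
    static int sumAndDrain(IntStack s) {
        int sum = 0;
        while (!s.isEmpty()) sum += s.pop();
        return sum;
    }
}
```

`sumAndDrain` behaves identically whichever implementation it is given, which is the point of programming against the ADT.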

Types and Operations of ADT

  • Types
    • Abstract data types can be classified as built-in or user-defined, and as mutable or immutable.
  • Operations
    • Creators: creators create new objects of the type. A creator may take an object as an argument.
    • Producers: producers create new objects from old objects of the type.
    • Observers: observers take objects of the abstract data type and return objects of a different type, e.g. the size() method of List returns an int.
    • Mutators: mutators change objects, e.g. the add() method of List changes a list by adding an element to the end.
    • Some abstract data types, with their operations classified by kind:
      • int is Java’s primitive integer type and is immutable. Its operations are:
        • creators : the numeric literals 0,1,2,3
        • producers : arithmetic operators : + - * /
        • observers: comparison operators ==, !=, <, >
        • mutators: none (it’s immutable)
      • List is Java’s list interface and is mutable. Its operations are:
        • creators : ArrayList and LinkedList constructors, Collections.singletonList
        • producers : Collections.unmodifiableList
        • observers : size, get
        • mutators: add, remove, addAll, Collections.sort
      • String is Java’s string type and is immutable. Its operations are:
        • creators : String constructors
        • producers : concat, substring, toUpperCase
        • observers : length, charAt
        • mutators: none (it’s immutable)
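
The classification above can be seen directly in a few lines of Java. The following is a small sketch; the class name `AdtOperationsDemo` and the method `run` are made up for this example, while every library call is a standard `java.lang` or `java.util` method named in the lists above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AdtOperationsDemo {
    static String run() {
        // String is immutable: producers return new objects.
        String s = new String("bst");           // creator
        String upper = s.toUpperCase();          // producer (s itself is unchanged)
        int length = s.length();                 // observer

        // List is mutable: mutators change the object in place.
        List<Integer> list = new ArrayList<>();  // creator
        list.add(3);                             // mutator
        list.add(1);
        list.add(2);
        Collections.sort(list);                  // mutator
        List<Integer> readOnly = Collections.unmodifiableList(list); // producer
        int size = readOnly.size();              // observer

        return s + " " + upper + " " + length + " " + readOnly + " " + size;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints: bst BST 3 [1, 2, 3] 3
    }
}
```

Note that `s` still reads "bst" after `toUpperCase()`, whereas `list` itself was changed by `add` and `Collections.sort`.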

Binary Search Tree

  • Properties of trees
    • Trees are composed of
      • Nodes
      • Edges that connect those nodes
        • Constraint : there is only one path between any two nodes
      • Leaves: nodes with no children
      • Root: the one node with no parent
  • Binary Trees
    • binary property constraint: each node has 0, 1, or 2 children
  • Binary Search Tree
    • For every node X in the tree, every key in the left subtree is less than X’s key
    • For every node X in the tree, every key in the right subtree is greater than X’s key
    • Search: if the tree is relatively “bushy”, the find operation runs in O(log N) time, because the height of the tree is about log N.
    • Insert: always insert at a leaf position. First we search the tree for the key. If we find it, we do nothing. If we do not find it, the search has ended at a leaf, and we add the new element as the left or right child of that leaf, whichever preserves the BST property.
    • Delete: 3 cases:
      • the node we are trying to delete has no children: just delete it
      • it has 1 child: reassign the parent’s child pointer to the node’s child
      • it has 2 children: we need to choose another node to replace the deleted one. The candidates are the right-most node in the left subtree and the left-most node in the right subtree; either one preserves the BST property. This is called Hibbard deletion.
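
The search, insert, and delete rules above can be sketched as follows. This is a minimal illustration on `int` keys, not the course's reference implementation; the class name `IntBST` and its method names are chosen for this example. For the two-children case it uses the left-most node of the right subtree (the successor), one of the two choices described above.

```java
import java.util.ArrayList;
import java.util.List;

class IntBST {
    private static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Search: go left or right by comparing with the current key.
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key < n.key) n = n.left;
            else if (key > n.key) n = n.right;
            else return true;
        }
        return false;
    }

    // Insert: search for the key; if absent, attach a new leaf where the search ended.
    void insert(int key) { root = insert(root, key); }

    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n; // key already present: nothing to do
    }

    // Delete: handles the 0-, 1- and 2-children cases.
    void delete(int key) { root = delete(root, key); }

    private Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) n.left = delete(n.left, key);
        else if (key > n.key) n.right = delete(n.right, key);
        else {
            if (n.left == null) return n.right;   // 0 children or only a right child
            if (n.right == null) return n.left;   // only a left child
            Node successor = n.right;             // 2 children: left-most node of right subtree
            while (successor.left != null) successor = successor.left;
            n.key = successor.key;
            n.right = delete(n.right, successor.key);
        }
        return n;
    }

    // In-order traversal visits the keys in sorted order.
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    private void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }
}
```

Checking that `inOrder()` stays sorted after each operation is a quick way to confirm the BST property has been preserved.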

BST Performance

  • depth : the number of links between a node and the root
  • height: the depth of the deepest leaf in the tree
  • average depth: the average of the depths of all nodes in the tree. We can calculate this as $\frac{\sum_{i=0}^{D} d_i n_i}{N}$, where $d_i$ is a depth, $n_i$ is the number of nodes at that depth, $D$ is the height of the tree, and $N$ is the total number of nodes.
  • The height of the tree determines the worst-case runtime, because in the worst case the node we are looking for is at the bottom of the tree. If a tree is spindly, it is basically a linked list and the runtime is linear, $\Theta(N)$. If the tree is bushy, the height is $\Theta(\log N)$ and the runtime grows logarithmically.
  • The average depth determines the average-case runtime.
  • The order you insert nodes into a BST determines its height.
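
As a rough illustration of how insertion order drives height and average depth, the sketch below builds two BSTs from the same seven keys and measures both. The class `DepthDemo` and its helpers are hypothetical names for this example; height and depth follow the definitions above (the root has depth 0).

```java
class DepthDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Inserts the keys in the given order and returns the root.
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    // Height = depth of the deepest node; an empty tree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    // Sum of the depths of all nodes in the subtree rooted at n, which sits at the given depth.
    static int totalDepth(Node n, int depth) {
        if (n == null) return 0;
        return depth + totalDepth(n.left, depth + 1) + totalDepth(n.right, depth + 1);
    }

    static double averageDepth(Node root) {
        return (double) totalDepth(root, 0) / size(root);
    }

    public static void main(String[] args) {
        Node spindly = build(1, 2, 3, 4, 5, 6, 7); // sorted order
        Node bushy = build(4, 2, 6, 1, 3, 5, 7);   // middle element first
        System.out.println(height(spindly) + " " + averageDepth(spindly));
        System.out.println(height(bushy) + " " + averageDepth(bushy));
    }
}
```

Inserting in sorted order gives height 6 and average depth 21/7 = 3, while the second order gives height 2 and average depth 10/7.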

B-Tree

The problem with a BST is that we always insert at a leaf position, which is what causes the height to increase. As we keep inserting nodes, we can break the balanced structure.
Idea: when we insert, just add the item to an existing leaf node, and set a limit on the number of items in a single node. Say the limit is 4: if we need to add a new item to a node that already has 4 items, we split the node in half by bumping the middle-left item up into the parent. These trees are called B-Trees; 2-3-4 trees and 2-3 trees are the special cases that allow at most 3 and at most 2 items per node.
2-3-4 Tree (L = 3): max 3 items per node, and max 4 non-null children per node.

  • Insertion process: adding an item to a 2-3-4 tree (L = 3)

    1. We still always insert into a leaf node, so take the item you want to insert and traverse down the tree with it, going left or right according to whether the item is smaller or greater than the items in each node.
    2. After adding the item to the leaf node, if that node now has 4 items, pop up the middle-left item into the parent and rearrange the children accordingly.
    3. If this results in the parent node having 4 items, pop up its middle-left item again, rearranging the children accordingly.
    4. Repeat this process until the parent node can accommodate the item or we reach the root.
  • Exercise 1: Add the items 1-7 in the order 1, 2, 3, 4, 5, 6, 7 into a 2-3 tree.

    • (1 2 3) -> pop up 2 to be the root of the tree -> (1) (2 [up]) (3) -> add 4 and 5: (1) (2 [up]) (3 4 5) -> pop up 4 into the node (2), giving (1) (2 4 [up]) (3) (5) -> add 6 and 7: (1) (2 4 [up]) (3) (5 6 7) -> pop up 6 into (2 4), giving (2 4 6) -> pop up 4 to the top, giving (1) (2 [up]) (3) (4 [root]) (5) (6 [up]) (7)

    • the height of this tree is 2.

           4
         2   6
        1 3 5 7
  • Exercise 2: Find an order such that if you add the items 1-7 in that order, the resulting 2-3 tree has height 1.

    • one such order is 2 3 4 5 6 1 7

            (3 5)
      (1 2)  (4)  (6 7)
  • A B-tree has the following helpful invariants, which cause the tree to always be bushy.

    • All leaves must be the same distance from the root.
    • A non-leaf node with k items must have exactly k+1 children.
  • B-Tree runtime analysis

    • The worst case for search in a B-Tree is when every node holds the maximum number of items and we have to traverse all the way to the bottom. Let $L$ denote the maximum number of items per node. We would need to visit $O(\log N)$ nodes (since the height is at most about $\log N$), and at each node we would need to examine up to $L$ items. In total we perform $O(L \log N)$ operations. Since $L$ is a constant, the total runtime is $O(\log N)$.
  • Summary of B-Tree

    • B-Trees are a modification of the binary search tree that avoids the $\Theta(N)$ worst case.
    • Nodes may contain between 1 and L items.
    • contains works almost exactly like in a normal BST.
    • add works by adding items to existing leaf nodes (if a node becomes too full, it splits).
    • The resulting tree has perfect balance. The runtime of operations is $O(\log N)$.
    • Deletion is not discussed here.
    • B-Trees are more complex, but they can efficiently handle any insertion order.
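
The insertion process and the two exercises above can be reproduced with a short sketch. This is one possible way to code it, under stated assumptions: the class name `BTreeSketch` and its methods are hypothetical, keys are `int`, duplicates are ignored, a node is split only after it overflows past `maxItems` (the L of the notes), and the item pushed up is the middle-left one at index `(size - 1) / 2`.

```java
import java.util.ArrayList;
import java.util.List;

class BTreeSketch {
    static final class Node {
        final List<Integer> items = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // stays empty for leaves
        boolean isLeaf() { return children.isEmpty(); }
    }

    private final int maxItems; // L: 2 for a 2-3 tree, 3 for a 2-3-4 tree
    private Node root = new Node();

    BTreeSketch(int maxItems) { this.maxItems = maxItems; }

    void insert(int key) {
        insert(root, key);
        if (root.items.size() > maxItems) {
            // The root overflowed: split it under a fresh root, so the height grows by 1.
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    private void insert(Node n, int key) {
        int i = 0;
        while (i < n.items.size() && key > n.items.get(i)) i++;
        if (i < n.items.size() && n.items.get(i) == key) return; // already present
        if (n.isLeaf()) {
            n.items.add(i, key); // always add to an existing leaf
        } else {
            insert(n.children.get(i), key);
            if (n.children.get(i).items.size() > maxItems) splitChild(n, i);
        }
    }

    // Splits the overfull child at position idx, pushing its middle-left item into parent.
    private void splitChild(Node parent, int idx) {
        Node full = parent.children.get(idx);
        int mid = (full.items.size() - 1) / 2;
        Node right = new Node();
        right.items.addAll(full.items.subList(mid + 1, full.items.size()));
        if (!full.isLeaf()) {
            right.children.addAll(full.children.subList(mid + 1, full.children.size()));
            full.children.subList(mid + 1, full.children.size()).clear();
        }
        int up = full.items.get(mid);
        full.items.subList(mid, full.items.size()).clear(); // "full" keeps the left half
        parent.items.add(idx, up);
        parent.children.add(idx + 1, right);
    }

    // All leaves are at the same depth, so following the left-most path is enough.
    int height() {
        int h = 0;
        for (Node n = root; !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }

    List<Integer> rootItems() { return new ArrayList<>(root.items); }
}
```

With `new BTreeSketch(2)`, inserting 1 through 7 in increasing order yields a root holding 4 and height 2, while the order 2 3 4 5 6 1 7 yields a root holding 3 and 5 and height 1, matching Exercises 1 and 2.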

Rotating Trees

  • Formal definition
    • rotateLeft(G): let x be the right child of G. Make G the new left child of x; x’s old left subtree becomes G’s new right subtree.

    • rotateRight(G): let x be the left child of G. Make G the new right child of x; x’s old right subtree becomes G’s new left subtree.

    • example: 3 is the right child of 1, and 2 is the left child of 3. We rotate so that 2 becomes the root of the tree, with 1 as its left child and 3 as its right child.

      • original:
        1
         \
          3
         /
        2
      • step 1: rotateRight(3): 2 is the left child of 3, so 3 becomes the new right child of 2
        1
         \
          2
           \
            3
      • step 2: rotateLeft(1): 2 is the right child of 1, so 1 becomes the new left child of 2
          2
         / \
        1   3
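
The two rotations and the worked example above can be sketched in a few lines. The class `RotationDemo` and its bare-bones `Node` are assumptions made for this illustration; each rotation returns the new root of the rotated subtree, so the caller must re-attach it.

```java
class RotationDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // rotateLeft(g): g's right child x becomes the subtree root, g becomes x's left child.
    static Node rotateLeft(Node g) {
        Node x = g.right;
        g.right = x.left; // x's old left subtree moves under g
        x.left = g;
        return x;
    }

    // rotateRight(g): g's left child x becomes the subtree root, g becomes x's right child.
    static Node rotateRight(Node g) {
        Node x = g.left;
        g.left = x.right; // x's old right subtree moves under g
        x.right = g;
        return x;
    }

    // Builds 1 -> (right) 3 -> (left) 2 and applies the two steps from the example.
    static Node example() {
        Node one = new Node(1);
        Node two = new Node(2);
        Node three = new Node(3);
        one.right = three;
        three.left = two;

        one.right = rotateRight(one.right); // step 1: rotateRight(3)
        return rotateLeft(one);             // step 2: rotateLeft(1)
    }
}
```

Both rotations preserve the BST ordering; only the shape of the tree changes.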

Red-Black Tree

We build this tree by starting from a 2-3 tree and asking what modifications would convert it into a BST.

  • For a 2-3 tree that only has 2-nodes (nodes with 2 children), we already have a BST, so we do not need any modifications.

  • If we get a 3-node, one thing we can do is create a “glue” link that does not hold any information and only serves to show that the 2 nodes it connects are actually parts of one node.

  • We choose arbitrarily to make the left element a child of the right one. This results in a left-leaning tree. We show that a link is a glue link by making it red; normal links are black. For this reason we call these structures left-leaning red-black trees (LLRBs).

  • Left-leaning red-black trees have a 1-1 correspondence with 2-3 trees: every 2-3 tree has a unique LLRB associated with it. 2-3-4 trees, in turn, correspond to standard red-black trees.

  • Properties of LLRBs

    • 1-1 correspondence with 2-3 trees.
    • No node has 2 red links.
    • There are no red right links.
    • Every path from the root to a leaf has the same number of black links (because in a 2-3 tree every leaf is the same number of links from the root).
    • Height is no more than roughly 2x the height of the corresponding 2-3 tree (at most 2H + 1 for a 2-3 tree of height H).
  • Example: draw the LLRB corresponding to the 2-3 tree shown below

            (u w)
      (a s)  (v)  (x y)

    note: the links from w to u, from s to a, and from y to x are red (highlighted in the original figure)

            w
          /   \
         u     y
        / \   /
       s   v x
      /
     a
  • Inserting into LLRB

    • We could always insert into an LLRB tree by inserting into the corresponding 2-3 tree and converting the result using the scheme above. However, this would defeat our original purpose in creating the LLRB, which was to avoid the complicated code of a 2-3 tree. Instead, we insert into the LLRB as we would into a normal BST. This can break its 1-1 mapping to a 2-3 tree, so we use rotations to massage the tree back into a proper structure.
    • Task 1: insertion color: because in a 2-3 tree we always insert by adding to a leaf node, the link we add should always be red.
    • Task 2: insertion on the right: we are using left-leaning red-black trees, which means we can never have a red right link. If we insert on the right, we need a rotation to restore the LLRB invariant. However, if we insert on the right with a red link and the left child is also attached by a red link, we temporarily allow it, for reasons that become clearer in Task 3.
    • Task 3: double insertion on the left: if there are 2 consecutive red left links, we have a 4-node, which is illegal. First we rotate to create the same tree as in Task 2 (a node with two red children). Then, in both situations, we flip the colors of all links touching that middle node (labelled S in the original figure). This is equivalent to pushing up the middle item in a 2-3 tree.
    • Summary of all the operations
      • When inserting: use a red link.
      • If there is a right-leaning “3-node”, we have a left-leaning violation.
        • Rotate left the appropriate node to fix.
      • If there are two consecutive red left links, we have an incorrect 4-node violation.
        • Rotate right the appropriate node to fix.
      • If any node has two red children, we have a temporary 4-node.
        • Color flip the node to emulate the split operation.
  • LLRB runtime

    • An LLRB tree has height $O(\log N)$
    • contains is trivially $O(\log N)$
    • insert is $O(\log N)$
      • $O(\log N)$ to add the new node
      • $O(\log N)$ rotation and color flip operations per insert
  • LLRB Implementation

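    // abstract code for insertion into an LLRB
    // (Node, RED, isRed, rotateLeft, rotateRight and flipColors are assumed to be defined elsewhere in the class)
    private Node put(Node h, Key key, Value val){
    	if(h == null){
    		// a new node is always attached with a red link
    		return new Node(key, val, RED);
    	}
    	int cmp = key.compareTo(h.key);
    	if(cmp < 0) {h.left = put(h.left, key, val);}
    	else if (cmp > 0) {h.right = put(h.right, key, val);}
    	else {h.val = val;}
    	
    	// restore the LLRB invariants on the way back up
    	if (isRed(h.right) && !isRed(h.left)) {h = rotateLeft(h);}      // right-leaning red link
    	if (isRed(h.left) && isRed(h.left.left)) {h = rotateRight(h);}  // two consecutive red left links
    	if (isRed(h.left) && isRed(h.right)) { flipColors(h); }         // temporary 4-node
    	return h;
    }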
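
The `put` method above relies on helpers the notes do not show. Below is a self-contained sketch that fills them in so the insertion code can be run. The class name `LLRBSketch`, the checking methods `height` and `isLeftLeaning`, and the choice to store each link's color in the child node are assumptions made for this illustration, not necessarily how the course code is written.

```java
class LLRBSketch<K extends Comparable<K>, V> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private final class Node {
        K key;
        V val;
        Node left, right;
        boolean color; // color of the link from the parent to this node
        Node(K key, V val, boolean color) { this.key = key; this.val = val; this.color = color; }
    }

    private Node root;

    private boolean isRed(Node n) { return n != null && n.color == RED; } // null links are black

    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color; // x takes over h's link from above
        h.color = RED;     // the glue link now leans left
        return x;
    }

    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Emulates splitting a temporary 4-node: the middle item moves up.
    private void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    void put(K key, V val) {
        root = put(root, key, val);
        root.color = BLACK; // the root has no parent link, so keep it black
    }

    private Node put(Node h, K key, V val) {
        if (h == null) return new Node(key, val, RED);
        int cmp = key.compareTo(h.key);
        if (cmp < 0) h.left = put(h.left, key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else h.val = val;

        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    V get(K key) {
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp < 0) n = n.left;
            else if (cmp > 0) n = n.right;
            else return n.val;
        }
        return null;
    }

    int height() { return height(root); }

    private int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // True if no node in the tree has a red right link.
    boolean isLeftLeaning() { return isLeftLeaning(root); }

    private boolean isLeftLeaning(Node n) {
        if (n == null) return true;
        if (isRed(n.right)) return false;
        return isLeftLeaning(n.left) && isLeftLeaning(n.right);
    }
}
```

Inserting keys in sorted order, which would produce a linked-list-shaped plain BST, is a useful check here: the height should stay logarithmic and `isLeftLeaning()` should remain true.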
    

Summary

  • Binary Search Trees are simple, but they are subject to imbalance, which leads to crappy runtime
  • 2-3 Trees (B-Trees) are balanced, but painful to implement and relatively slow
  • LLRB insertion is simple to implement (but deletion is hard)
    • Works by maintaining a mathematical bijection with 2-3 trees
  • Java’s TreeMap is a red-black tree (but not left-leaning)
  • LLRBs maintain correspondence with 2-3 trees; standard red-black trees maintain correspondence with 2-3-4 trees.