A hash table is a generalization of the simpler notion of an ordinary array. Directly addressing into an ordinary array makes effective use of our ability to examine an arbitrary position in an array in O(1) time. Direct addressing is applicable when we can afford to allocate an array that has one position for every possible key.
When the number of keys actually stored is small relative to the total number of possible keys, hash tables become an effective alternative to directly addressing an array, since a hash table typically uses an array of size proportional to the number of keys actually stored.
Instead of using the key as an array index directly, the array index is computed from the key.
Direct addressing, using a direct-address table, is a simple technique that works well when the universe U of keys is reasonably small.
When the set K of keys stored in a dictionary is much smaller than the universe U of all possible keys, a hash table requires much less storage than a direct-address table: Θ(|K|) rather than Θ(|U|).
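To make direct addressing concrete, here is a minimal Python sketch of a direct-address table. It assumes integer keys drawn from U = {0, 1, …, |U| − 1}; the class and method names are hypothetical, and None marks an empty slot (so this sketch cannot store None as a value).

```python
from typing import Any, Optional


class DirectAddressTable:
    """Direct-address table for integer keys in U = {0, 1, ..., universe_size - 1}.

    Slot k holds the element whose key is k, or None if no such element is stored.
    """

    def __init__(self, universe_size: int) -> None:
        # One slot for every possible key, so storage grows with |U|, not with |K|.
        self.slots: list[Optional[Any]] = [None] * universe_size

    def insert(self, key: int, value: Any) -> None:
        self.slots[key] = value  # O(1): the key itself is the array index

    def search(self, key: int) -> Optional[Any]:
        return self.slots[key]  # O(1)

    def delete(self, key: int) -> None:
        self.slots[key] = None  # O(1)


table = DirectAddressTable(universe_size=100)
table.insert(42, "answer")
print(table.search(42))  # answer
print(table.search(7))   # None
```

Every operation is a single array access, which is why direct addressing is fast but only practical when |U| is small enough to allocate.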
With direct addressing, an element with key k is stored in slot k. With hashing, the element is stored in slot h(k), where h is a hash function and h(k) is the hash value of key k. There is one hitch: two keys may hash to the same slot. We call this situation a collision. In chaining, we resolve collisions by putting all the elements that hash to the same slot into a linked list.
A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to.
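The following is a minimal Python sketch of collision resolution by chaining, not a definitive implementation. A Python list stands in for each slot's linked list, and the hash function is the division method h(k) = k mod m, used here only as a placeholder; the class and method names are hypothetical.

```python
from typing import Any, Optional


class ChainedHashTable:
    """Hash table that resolves collisions by chaining.

    Each of the m slots holds a Python list standing in for the linked list
    of (key, value) pairs whose keys hash to that slot.
    """

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: list[list[tuple[int, Any]]] = [[] for _ in range(m)]

    def _hash(self, key: int) -> int:
        return key % self.m  # placeholder hash: division method

    def insert(self, key: int, value: Any) -> None:
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))  # new key: add it to the end of the chain

    def search(self, key: int) -> Optional[Any]:
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

    def delete(self, key: int) -> None:
        slot = self._hash(key)
        self.slots[slot] = [(k, v) for k, v in self.slots[slot] if k != key]


table = ChainedHashTable(m=12)
table.insert(100, "a")   # 100 mod 12 = 4
table.insert(16, "b")    # 16 mod 12 = 4: a collision, so slot 4's chain grows
print(table.slots[4])    # [(100, 'a'), (16, 'b')]
print(table.search(16))  # b
```

A search only scans the one chain that the key hashes to, so its cost depends on the chain length rather than on the total number of stored keys.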
The division method computes the hash value as the remainder when the key is divided by a specified prime number. This method frequently gives good results, assuming that the prime number is chosen to be unrelated to any patterns in the distribution of keys.
In the division method for creating hash functions, we map a key k into one of m slots by taking the remainder of k divided by m. That is, h(k) = k mod m. For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 100 mod 12 = 4.
When using the division method, we usually avoid certain values of m. For example, m should not be a power of 2: if m = 2^p, then h(k) is just the p lowest-order bits of k, so the hash value ignores all the higher-order bits of the key.
A prime not too close to an exact power of 2 is often a good choice for m.
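A one-line Python sketch of the division method, checked against the m = 12, k = 100 example; the function name division_hash is hypothetical, and the sketch assumes non-negative integer keys.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: map key k to one of m slots via h(k) = k mod m."""
    return k % m


print(division_hash(100, 12))  # 4
```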
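The effect of choosing m as a power of 2 can be seen in a small Python experiment. The keys are a hypothetical set chosen so that they differ only in their high-order bits, and m = 701 is just one example of a prime that is not close to a power of 2.

```python
def division_hash(k: int, m: int) -> int:
    return k % m


# Hypothetical keys that differ only in their high-order bits: 7, 263, 519, ...
keys = [7 + 256 * i for i in range(8)]

# m = 16 = 2**4: h(k) is just the 4 lowest-order bits of k, so every key collides.
print(sorted({division_hash(k, 16) for k in keys}))  # [7]

# m = 701, a prime not too close to a power of 2: the same keys land in 8 distinct slots.
print(len({division_hash(k, 701) for k in keys}))    # 8
```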
The multiplication method for creating hash functions operates in two steps. First, we multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. Then, we multiply this value by m and take the floor of the result.
$h(k) = \lfloor m \, (kA \bmod 1) \rfloor$
Here kA mod 1 denotes the fractional part of kA, that is, kA − ⌊kA⌋.
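A direct floating-point transcription of the multiplication method in Python. The constant A = (√5 − 1)/2 ≈ 0.618 is an assumption here (a value Knuth suggested as likely to work reasonably well); any 0 < A < 1 fits the definition. Floating-point rounding makes this sketch unreliable for very large keys.

```python
import math

# Assumed constant: A = (sqrt(5) - 1) / 2 ≈ 0.618; any 0 < A < 1 is allowed.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    frac = (k * a) % 1.0  # fractional part of kA, i.e. kA - floor(kA)
    return math.floor(m * frac)


print(multiplication_hash(123456, 10000))  # 41
```

For k = 123456 and m = 10000, kA ≈ 76300.0041, so the fractional part is about 0.0041 and ⌊10000 × 0.0041…⌋ = 41.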
An advantage of the multiplication method is that the value of m is not critical. We typically choose it to be a power of 2 (m = 2^p for some integer p), since the function can then be implemented efficiently on most computers using only an integer multiplication and a shift.
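One common integer-only formulation, sketched in Python under stated assumptions: a machine word of w = 32 bits, m = 2^p, and A restricted to the form s/2^w for an integer 0 < s < 2^w. The product k·s is a 2w-bit value r1·2^w + r0, and the hash value is the p most significant bits of the low word r0. The constant s = 2654435769 (⌊A · 2^32⌋ for A = (√5 − 1)/2) and the function name are assumptions for illustration; because Python integers are unbounded, a mask emulates the fixed-width word.

```python
W = 32          # assumed word size in bits; keys must satisfy 0 <= k < 2**W
S = 2654435769  # assumed s = floor(A * 2**32) for A = (sqrt(5) - 1) / 2


def multiply_shift_hash(k: int, p: int, w: int = W, s: int = S) -> int:
    """Multiplication method for m = 2**p using only integer operations.

    k*s = r1 * 2**w + r0; the hash value is the p most significant bits of r0.
    """
    r0 = (k * s) & ((1 << w) - 1)  # low-order w-bit word of the product
    return r0 >> (w - p)           # keep its top p bits


print(multiply_shift_hash(123456, p=14))  # 67
```

For k = 123456 and p = 14 (m = 16384), k·s = 327706022297664 = 76300 · 2^32 + 17612864, and the top 14 bits of r0 = 17612864 give h(k) = 67.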
The main idea behind universal hashing is to select the hash function at random from a carefully designed class of functions at the beginning of execution, independently of the keys that are actually going to be stored.
To design a universal class of hash functions, we begin by choosing a prime number p large enough so that every possible key k is in the range 0 to p − 1, inclusive. Let Z1 denote the set {0, 1, 2, …, p − 1}, and let Z2 denote the set {1, 2, 3, …, p − 1}. Because we assume that the size of the universe of keys is greater than the number of slots in the hash table, we have p > m. For any a ∈ Z2 and b ∈ Z1, we define the hash function h_{a,b} as follows:
$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$
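For example, with p = 17 and m = 6:
$h_{3,4}(8) = ((3 \cdot 8 + 4) \bmod 17) \bmod 6 = (28 \bmod 17) \bmod 6 = 11 \bmod 6 = 5$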
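A Python sketch of this family of hash functions, including a check of the p = 17, m = 6 example. The function names are hypothetical; random_universal_hash draws a and b once, up front, so the same function is then used for every key, as universal hashing requires.

```python
import random
from typing import Callable


def universal_hash(k: int, a: int, b: int, p: int, m: int) -> int:
    """h_{a,b}(k) = ((a*k + b) mod p) mod m, with a in Z2 and b in Z1."""
    return ((a * k + b) % p) % m


def random_universal_hash(p: int, m: int, rng: random.Random) -> Callable[[int], int]:
    """Choose one h_{a,b} from the family at random; the choice is then fixed."""
    a = rng.randint(1, p - 1)  # a from Z2 = {1, ..., p - 1} (randint is inclusive)
    b = rng.randint(0, p - 1)  # b from Z1 = {0, ..., p - 1}
    return lambda k: universal_hash(k, a, b, p, m)


# Worked example: p = 17, m = 6, a = 3, b = 4, k = 8.
print(universal_hash(8, a=3, b=4, p=17, m=6))  # 5

h = random_universal_hash(p=17, m=6, rng=random.Random())
print(h(8))  # some slot in 0..5; which one depends on the random a and b
```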
Perfect hashing: for a static set of keys, the basic idea to create a perfect hashing scheme is simple. We use a two-level hashing scheme with universal hashing at each level.
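A compact Python sketch of the two-level scheme for a static set of integer keys (duplicates are removed first). It assumes every key lies in the range 0 to p − 1 with p = 2^31 − 1, a prime, so the same universal family can serve both levels; the class and helper names are hypothetical. Level 1 hashes the n keys into n buckets; bucket j, holding n_j keys, gets a secondary table of n_j² slots and its own universal hash function, redrawn until no two of its keys collide. The sketch omits the refinement of redrawing the first-level function when the total secondary space turns out too large.

```python
import random
from typing import Callable, Optional

P = 2_147_483_647  # the prime 2**31 - 1; assumes every key is an integer in 0..P-1


def pick_hash(m: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h_{a,b}(k) = ((a*k + b) mod P) mod m at random from the universal family."""
    a = rng.randint(1, P - 1)
    b = rng.randint(0, P - 1)
    return lambda k: ((a * k + b) % P) % m


class PerfectHashTable:
    """Two-level perfect hashing for a static set of integer keys."""

    def __init__(self, keys: list[int], seed: Optional[int] = None) -> None:
        rng = random.Random(seed)
        unique = sorted(set(keys))  # the scheme assumes distinct keys
        self.m = max(len(unique), 1)
        # Level 1: hash the n keys into m = n buckets.
        self.h = pick_hash(self.m, rng)
        buckets: list[list[int]] = [[] for _ in range(self.m)]
        for k in unique:
            buckets[self.h(k)].append(k)
        # Level 2: a bucket with n_j keys gets n_j**2 slots and its own hash
        # function, redrawn until those keys land in distinct slots.
        self.secondary: list[tuple[Callable[[int], int], list[Optional[int]]]] = []
        for bucket in buckets:
            size = len(bucket) ** 2
            while True:
                hj = pick_hash(max(size, 1), rng)
                slots: list[Optional[int]] = [None] * size
                collision = False
                for k in bucket:
                    j = hj(k)
                    if slots[j] is not None:
                        collision = True
                        break
                    slots[j] = k
                if not collision:
                    break
            self.secondary.append((hj, slots))

    def contains(self, k: int) -> bool:
        """Worst-case O(1): one level-1 probe, then one level-2 probe."""
        hj, slots = self.secondary[self.h(k)]
        return len(slots) > 0 and slots[hj(k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = PerfectHashTable(keys, seed=1)
print(all(table.contains(k) for k in keys))  # True
print(table.contains(11))                    # False
```

With a table of n_j² slots, a randomly drawn universal function is collision-free with probability at least 1/2, so the redraw loop for each bucket is expected to finish within two attempts.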