Hash function:
Solution: Use a hash function h(k) to map the universe U of all keys into the contiguous slots {0, 1, ..., m–1} of a table T.
Collision: When a record to be inserted maps to an already occupied slot in T.
Resolving collisions:
First, we assume simple uniform hashing: each key k in the set K of keys is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.
Let n be the number of keys in the table, and let m be the number of slots.
Define the load factor of T to be α = n/m = average number of keys per slot.
1) By chaining
Records in the same slot are linked into a list.
The expected time to search for an element is the time to reach the correct slot plus the time to scan its list: T(n) = Θ(1 + α).
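The chaining scheme above can be sketched as follows. This is a minimal illustration, not a production table: the class name ChainedHashTable and its methods are made up for this example, Python lists stand in for linked lists, and Python's built-in hash() stands in for h(k).

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, m=8):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]     # one chain per slot

    def _h(self, key):
        # Assumption: the built-in hash() plays the role of h(k).
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Cost: O(1) to reach the slot, plus a scan of that slot's chain.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.n / self.m                  # alpha = n / m


table = ChainedHashTable(m=4)
for k in range(10):
    table.insert(k, k * k)
```

With 10 keys in 4 slots the load factor is 2.5, so a search scans a chain of about that length on average.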
Recommended hash functions:
- h(k) = k mod m. Pick m to be a prime not too close to a power of 2 or 10.
- h(k) = (A·k mod 2^w) rsh (w - r), where rsh is the bitwise right-shift operator. Assume m = 2^r and that the computer has w-bit words. Don't pick A too close to 2^w.
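Both recommended functions are short enough to write out directly. The sketch below is illustrative only: the function names and the sample values of k, m, A, w and r are chosen for this example and do not come from the notes.

```python
def h_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def h_multiplication(k, A, w, r):
    """Multiplication method: h(k) = (A*k mod 2^w) rsh (w - r), with m = 2^r."""
    low_w_bits = (A * k) & ((1 << w) - 1)   # A*k mod 2^w
    return low_w_bits >> (w - r)            # keep the top r of those w bits


# Illustrative values only: m = 701 is a prime not close to a power of 2 or 10.
print(h_division(123456, 701))              # 80

# Illustrative values only: 7-bit words, m = 2^3 = 8 slots.
print(h_multiplication(107, A=89, w=7, r=3))  # 3
```

In the second call, 89 · 107 = 9523, the low 7 bits are 51 (0110011 in binary), and shifting right by 4 leaves 011 = 3.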
2) By open addressing
No storage is used outside of the hash table itself. Insertion systematically probes the table until an empty slot is found.
The hash function depends on both the key and probe number.
Probing strategies:
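- Linear probing
h(k,i) = (h′(k) + i) mod m. This is simple, but it suffers from primary clustering: long runs of occupied slots build up, which lengthens the average search.
- Double hashing
h(k,i) = (h1(k) + i⋅h2(k)) mod m. To guarantee that the probe sequence hits every slot, h2(k) must be relatively prime to m. One way is to make m a power of 2 and design h2(k) to produce only odd numbers.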
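The two probing strategies can be compared with a small sketch. Everything here is an assumption made for illustration: the class name OpenAddressTable, non-negative integer keys, h1(k) = k mod m, and the particular odd-valued h2. Deletion is left out.

```python
class OpenAddressTable:
    """Open addressing with linear probing or double hashing (sketch)."""

    def __init__(self, m=8, probe="double"):
        # m must be a power of 2 (and at least 2) so an odd h2 is coprime to m.
        assert m >= 2 and m & (m - 1) == 0, "m must be a power of 2"
        self.m = m
        self.probe = probe
        self.slots = [None] * m

    def _h1(self, k):
        return k % self.m

    def _h2(self, k):
        # Hypothetical second hash: always odd, hence relatively prime to m.
        return (2 * (k // self.m) + 1) % self.m

    def _probe_sequence(self, k):
        step = 1 if self.probe == "linear" else self._h2(k)
        for i in range(self.m):
            yield (self._h1(k) + i * step) % self.m

    def insert(self, k):
        for j in self._probe_sequence(k):
            if self.slots[j] is None or self.slots[j] == k:
                self.slots[j] = k
                return j                    # slot where k ended up
        raise OverflowError("hash table is full")

    def search(self, k):
        for j in self._probe_sequence(k):
            if self.slots[j] is None:       # empty slot: k cannot be further on
                return None
            if self.slots[j] == k:
                return j
        return None


linear = OpenAddressTable(m=8, probe="linear")
double = OpenAddressTable(m=8, probe="double")
print([linear.insert(k) for k in (0, 8, 16)])   # [0, 1, 2]
print([double.insert(k) for k in (0, 8, 16)])   # [0, 3, 5]
```

All three keys share h1(k) = 0. Linear probing packs them into adjacent slots, while double hashing gives each key its own step size and spreads them out.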
Performance:
The probability that the first probe hits an occupied slot is n/m, which is α. Given that, the probability that the second probe also hits an occupied slot is (n-1)/(m-1), and so on. Since (n-i)/(m-i) ≤ n/m = α whenever n < m, the expected number of probes for an unsuccessful search is:
1 + n/m (1 + (n-1)/(m-1) (1 + (n-2)/(m-2) (...))) ≤ 1 + α(1 + α(1 + α(...))) = 1 + α + α^2 + α^3 + ... = 1/(1 - α).
Problem: If the table is 90% full, the expected number of probes can be as high as 1/(1 - 0.9) = 10. As the table approaches 100% full (say 99.99%), the bound grows without limit.
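To get a feel for these numbers, the nested expression and the 1/(1 - α) bound can both be evaluated directly. This is a small numeric sketch; the function names and the table size m = 10 are chosen only for illustration.

```python
def expected_probes_nested(n, m):
    """Evaluate 1 + n/m (1 + (n-1)/(m-1) (1 + ...)) from the innermost term out."""
    total = 1.0
    for i in range(n - 1, -1, -1):
        total = 1.0 + (n - i) / (m - i) * total
    return total


def expected_probes_bound(alpha):
    """Upper bound 1 / (1 - alpha), valid for alpha < 1."""
    return 1.0 / (1.0 - alpha)


m = 10
for n in (1, 5, 9):
    alpha = n / m
    print(n, expected_probes_nested(n, m), expected_probes_bound(alpha))
```

For every n < m the nested value stays at or below the bound, and both climb steeply as n approaches m.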
Universal hashing and perfect hashing.
1) Universal hashing
H_{p,m} = {h_{a,b} : a ∈ Z_p*, b ∈ Z_p}. Here H is a family of hash functions with parameters p and m. Each member is determined by two constants a and b.
h_{a,b}(k) = ((a·k + b) mod p) mod m.
p is a prime larger than the largest key k. m is the number of slots, as before. a and b are chosen at random, with a ∈ Z_p* = {1, ..., p-1} and b ∈ Z_p = {0, ..., p-1}.
Before doing anything else, we randomly choose one hash function from H.
Performance:
If we resolve collisions by chaining, the expected length of the chain that a key hashes to is at most 1 + α, where α is the load factor.
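Drawing a random member of the family takes only a few lines. In this sketch the function names, the seed parameter and the sample values p = 17, m = 6 are illustrative assumptions. Keys are assumed to be integers in {0, ..., p-1}.

```python
import random


def h_ab(k, a, b, p, m):
    """One member of the family: h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def draw_universal_hash(p, m, seed=None):
    """Pick a in Z_p* and b in Z_p at random and return the resulting h."""
    rng = random.Random(seed)
    a = rng.randrange(1, p)     # a in {1, ..., p-1}
    b = rng.randrange(0, p)     # b in {0, ..., p-1}
    return lambda k: h_ab(k, a, b, p, m)


# Fixed parameters, to check the formula by hand:
print(h_ab(8, a=3, b=4, p=17, m=6))     # (3*8 + 4) mod 17 = 11, 11 mod 6 = 5

# Random draw, as done once before the table is used:
h = draw_universal_hash(p=17, m=6)
print([h(k) for k in range(17)])
```

The second print differs from run to run unless a seed is passed, which is the point: the function is fixed only after the random choice is made.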
2) Perfect hashing
Rather than chaining, perfect hashing resolves collisions by placing a second, smaller hash table in each slot.
The size of the subtable in slot j is m_j = n_j^2, where n_j is the number of keys that hash to slot j. Its hash function is also drawn from a universal family, with a and b chosen at random again and m replaced by m_j.
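The two-level scheme can be sketched as below. Several things are assumptions made for this illustration and are not stated in the notes: the key set is fixed, non-empty, and consists of distinct integers below the prime p. The first level uses m = n slots, and each subtable is rebuilt with fresh a and b until it has no collisions. All names are made up for the example.

```python
import random


def _draw(p, m, rng):
    """Random member h_{a,b} of the universal family for table size m."""
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def build_perfect_table(keys, p, seed=0):
    rng = random.Random(seed)
    m = len(keys)                           # assumption: first level has n slots
    h = _draw(p, m, rng)
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[h(k)].append(k)

    secondary = []
    for bucket in buckets:
        mj = len(bucket) ** 2               # m_j = n_j^2
        while True:                         # redraw a, b until collision-free
            hj = _draw(p, mj, rng) if mj else None
            table = [None] * mj
            ok = True
            for k in bucket:
                s = hj(k)
                if table[s] is not None:    # collision inside the subtable
                    ok = False
                    break
                table[s] = k
            if ok:
                break
        secondary.append((hj, table))
    return h, secondary


def contains(structure, k):
    """Lookup with exactly two hash evaluations and no chain to scan."""
    h, secondary = structure
    hj, table = secondary[h(k)]
    return bool(table) and table[hj(k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]    # sample keys, all below p = 101
structure = build_perfect_table(keys, p=101)
print(all(contains(structure, k) for k in keys), contains(structure, 11))
```

Because each subtable has quadratic size relative to its bucket, a random draw is collision-free with probability at least 1/2, so the retry loop is expected to finish after a couple of attempts.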