Weakness of hashing: For any fixed choice of hash function, there always exists a bad set of keys that all hash to the same slot.
Idea: Choose the hash function at random, independently of the keys.
Universal Hashing
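Definition 1: Let $U$ be a universe of keys, and let $H$ be a finite collection of hash functions, each mapping $U$ to $\{0, 1, \dots, m-1\}$.
$H$ is universal if, for all $x, y \in U$ with $x \ne y$, $|\{h \in H : h(x) = h(y)\}| = |H|/m$.
I.e. if $h$ is chosen randomly from $H$, the probability that $x$ and $y$ collide is $1/m$.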
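Theorem 1:
Choose $h$ randomly from a universal set $H$, and use it to hash $n$ arbitrary keys into the $m$ slots of a table $T$. Then for a given key $x$, $E[\#\text{collisions with } x] < n/m$.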
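Proof:
Let $C_x$ be the random variable denoting the total number of collisions of keys in $T$ with $x$, and let $c_{xy} = 1$ if $h(x) = h(y)$ and $c_{xy} = 0$ otherwise.
Note: $E[c_{xy}] = 1/m$ and $C_x = \sum_{y \in T - \{x\}} c_{xy}$, where $y$ ranges over the keys in table $T$ other than $x$.
So, by linearity of expectation:
$$E[C_x] = E\Big[\sum_{y \in T - \{x\}} c_{xy}\Big] = \sum_{y \in T - \{x\}} E[c_{xy}] = \sum_{y \in T - \{x\}} \frac{1}{m} = \frac{n-1}{m} < \frac{n}{m}.$$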
Constructing a universal hash function
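Let $m$ be prime, and decompose the key $k$ into $r+1$ digits $k = \langle k_0, k_1, \dots, k_r \rangle$, where $0 \le k_i \le m-1$.
Here we are treating $k$ as a number written in base $m$.
Now we pick a vector $a = \langle a_0, a_1, \dots, a_r \rangle$ at random, where each $a_i$ is chosen independently and uniformly from $\{0, 1, \dots, m-1\}$.
Definition 2: $h_a(k) = \Big(\sum_{i=0}^{r} a_i k_i\Big) \bmod m$.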
How big is this set of hash functions? That is, how many different hash functions $h_a$ does it contain?
Conclusion: $|H| = m^{r+1}$.
Explanation:
There are $m$ choices for each $a_i$, and there are $r+1$ of them.
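The construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the dot-product form $h_a(k) = (\sum_i a_i k_i) \bmod m$ described in this section; the names `digits_base_m` and `make_hash` are hypothetical helpers introduced here, and digits are stored least significant first.

```python
import random

def digits_base_m(k, m, r):
    """Decompose k into r+1 base-m digits <k_0, ..., k_r>, least significant first."""
    digits = []
    for _ in range(r + 1):
        digits.append(k % m)
        k //= m
    if k != 0:
        raise ValueError("key does not fit in r+1 base-m digits")
    return digits

def make_hash(m, r, rng=random):
    """Pick a = <a_0, ..., a_r> uniformly at random and return (h_a, a)."""
    a = [rng.randrange(m) for _ in range(r + 1)]

    def h(k):
        # Dot product of a with the digits of k, reduced modulo m.
        return sum(ai * ki for ai, ki in zip(a, digits_base_m(k, m, r))) % m

    return h, a

# Illustrative parameters: m = 7 (prime), r = 2, so keys range over 0 .. 7**3 - 1.
h, a = make_hash(7, 2, random.Random(1))
slots = [h(k) for k in range(7 ** 3)]
```

Each call to `make_hash` selects one of the $m^{r+1}$ functions in the family; the hash value depends on the keys only through their base-$m$ digits.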
Theorem2: H is universal.
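Proof:
Let $x = \langle x_0, x_1, \dots, x_r \rangle$ and $y = \langle y_0, y_1, \dots, y_r \rangle$ be distinct keys. They differ in at least one digit position; without loss of generality, position 0, so $x_0 \ne y_0$.
Question: For how many hash functions $h_a \in H$ do $x$ and $y$ collide?
If they collide, we must have $h_a(x) = h_a(y)$, that is,
$$\sum_{i=0}^{r} a_i x_i \equiv \sum_{i=0}^{r} a_i y_i \pmod m,$$
which is equivalent to
$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod m.$$
Since $x_0 \ne y_0$, the Number Theory Fact below guarantees that $(x_0 - y_0)^{-1}$ exists, so
$$a_0 \equiv \Big(-\sum_{i=1}^{r} a_i (x_i - y_i)\Big) (x_0 - y_0)^{-1} \pmod m.$$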
Number Theory Fact:
Let $m$ be prime. For any $z \in \mathbb{Z}_m$ (the integers modulo $m$) with $z \not\equiv 0$, there exists a unique inverse $z^{-1} \in \mathbb{Z}_m$ such that $z z^{-1} \equiv 1 \pmod m$.
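A small numeric check can make the fact concrete. The sketch below uses Python's built-in three-argument `pow` with exponent `-1` (available from Python 3.8) to compute modular inverses; the moduli 7 and 6 are illustrative choices, not values from the notes.

```python
m = 7  # a prime modulus (illustrative choice)

# Every nonzero z in Z_m has an inverse z_inv with z * z_inv = 1 (mod m).
inverses = {z: pow(z, -1, m) for z in range(1, m)}
for z, z_inv in inverses.items():
    assert (z * z_inv) % m == 1

# Uniqueness: exactly one w in Z_m satisfies z * w = 1 (mod m).
unique = all(
    sum(1 for w in range(m) if (z * w) % m == 1) == 1
    for z in range(1, m)
)

# With a composite modulus the fact can fail: 2 has no inverse modulo 6.
try:
    pow(2, -1, 6)
    composite_has_inverse = True
except ValueError:
    composite_has_inverse = False
```

The failure for modulus 6 shows why the construction requires $m$ to be prime.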
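Conclusion:
Thus, for any choice of $a_1, a_2, \dots, a_r$, exactly 1 of the $m$ choices for $a_0$ causes $x$ and $y$ to collide, and the other $m-1$ choices cause no collision.
So the number of $h_a$'s that cause $x$ and $y$ to collide is $m^r \cdot 1 = m^r = |H|/m$,
because each of $a_1, a_2, \dots, a_r$ has $m$ choices, and $a_0$ is then forced to a single value.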
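The counting argument can be checked by brute force on a small instance. The sketch below assumes the dot-product family $h_a(k) = (\sum_i a_i k_i) \bmod m$ with keys given directly as digit vectors; the parameters $m = 5$, $r = 1$ and the helper name `count_colliding` are illustrative assumptions.

```python
from itertools import combinations, product

def count_colliding(x, y, m):
    """Count the vectors a for which h_a(x) == h_a(y), by full enumeration."""
    count = 0
    for a in product(range(m), repeat=len(x)):
        hx = sum(ai * xi for ai, xi in zip(a, x)) % m
        hy = sum(ai * yi for ai, yi in zip(a, y)) % m
        if hx == hy:
            count += 1
    return count

m, r = 5, 1                                     # m prime, keys have r + 1 = 2 digits
keys = list(product(range(m), repeat=r + 1))    # all 25 digit vectors
counts = {count_colliding(x, y, m) for x, y in combinations(keys, 2)}
family_size = m ** (r + 1)
```

Every distinct pair collides under the same number of functions, $m^r = |H|/m$, which is exactly the universality condition.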
Perfect Hashing
Situation: Given $n$ keys, construct a static hash table of size $m = O(n)$ such that a search takes $\Theta(1)$ time in the worst case.
Idea: Use a 2-level scheme with universal hashing at both levels. We arrange for no collisions at level 2 and tolerate any collisions at level 1.
If $n_i$ items hash to level-1 slot $i$, then use $m_i = n_i^2$ slots in the level-2 table $S_i$.
Level-2 Analysis
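Theorem: Hash $n$ keys into $m = n^2$ slots using a random hash function $h$ from a universal set $H$. Then the expected number of collisions is less than $1/2$.
Proof: The probability that 2 given keys collide under $h$ is $1/m = 1/n^2$. There are $\binom{n}{2}$ pairs of keys, so
$$E[\#\text{collisions}] = \binom{n}{2} \cdot \frac{1}{n^2} = \frac{n(n-1)}{2} \cdot \frac{1}{n^2} < \frac{1}{2}.$$
Note: $\binom{n}{2} = C_n^2 = n(n-1)/2$ is the number of ways to choose 2 keys out of $n$.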
Markov inequality
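For a random variable $X \ge 0$ and any $t > 0$, $\Pr\{X \ge t\} \le E[X]/t$.
Proof (discrete case):
$$E[X] = \sum_{x=0}^{\infty} x \Pr\{X = x\} \ge \sum_{x \ge t} x \Pr\{X = x\} \ge \sum_{x \ge t} t \Pr\{X = x\} = t \Pr\{X \ge t\}.$$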
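Corollary: $\Pr\{\text{no collisions}\} \ge 1/2$.
We can now use the Markov inequality to prove the corollary.
Proof: Apply the Markov inequality with $t = 1$ to the number of collisions: $\Pr\{\ge 1 \text{ collision}\} \le E[\#\text{collisions}]/1 < 1/2$.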
Conclusion: To find a good level-2 hash function, just test a few at random. We will find one quickly, since at least half of them work.
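The "at least half will work" claim can be verified exhaustively on a small example. Because the table size $n^2$ is generally not prime, this sketch swaps in a different universal family, $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ with $p$ prime and larger than every key. That family, the prime $p = 101$, and the five keys are assumptions made for illustration and are not taken from the notes.

```python
def collision_free_fraction(keys, p):
    """Fraction of functions h_{a,b} that map the n keys into n*n slots without collision."""
    n = len(keys)
    m = n * n
    good = total = 0
    for a in range(1, p):          # a ranges over 1 .. p-1
        for b in range(p):         # b ranges over 0 .. p-1
            slots = {((a * k + b) % p) % m for k in keys}
            total += 1
            if len(slots) == n:    # all keys landed in distinct slots
                good += 1
    return good / total

keys = [3, 17, 42, 58, 99]         # distinct keys, all smaller than p
fraction = collision_free_fraction(keys, p=101)
```

Since more than half of the family is collision-free, picking functions at random needs fewer than 2 trials in expectation.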
Analysis of storage
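For level 1 choose $m = n$, let $n_i$ be the random variable for the number of keys that hash to slot $i$ in table $T$, and use $m_i = n_i^2$ slots in each level-2 table $S_i$.
$$E[\text{total storage}] = n + E\Big[\sum_{i=0}^{m-1} \Theta(n_i^2)\Big] = \Theta(n)$$
Note: $E\big[\sum_{i=0}^{m-1} \Theta(n_i^2)\big] = \Theta(n)$ follows from the same calculation as in the bucket sort analysis.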
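Putting the pieces together, the two-level scheme might be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: it uses the family $h_{a,b}(k) = ((ak + b) \bmod P) \bmod m$ with the prime $P = 2^{31} - 1$ at both levels, it assumes a non-empty set of distinct integer keys smaller than $P$, and the names `random_hash`, `build`, `contains`, and `storage` are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; assumed larger than every key

def random_hash(m, rng):
    """Draw h(k) = ((a*k + b) mod P) mod m at random (an assumed universal family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build(keys, rng):
    """Build a static two-level table for a non-empty list of distinct keys."""
    n = len(keys)
    h1 = random_hash(n, rng)                  # level 1 uses m = n slots
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h1(k)].append(k)
    level2 = []
    for bucket in buckets:
        if not bucket:
            level2.append((None, []))         # empty slot needs no level-2 table
            continue
        size = len(bucket) ** 2               # m_i = n_i ** 2 slots
        while True:                           # retry until no level-2 collision
            h2 = random_hash(size, rng)
            table = [None] * size
            if all(_place(table, h2(k), k) for k in bucket):
                break
        level2.append((h2, table))
    return h1, level2

def _place(table, j, k):
    """Store k in slot j; report False if the slot is already taken."""
    if table[j] is not None:
        return False
    table[j] = k
    return True

def contains(structure, k):
    """Look up k with one level-1 hash and at most one level-2 hash."""
    h1, level2 = structure
    h2, table = level2[h1(k)]
    return bool(table) and table[h2(k)] == k

def storage(structure):
    """Total slots used: n at level 1 plus the sum of n_i ** 2 at level 2."""
    _, level2 = structure
    return len(level2) + sum(len(table) for _, table in level2)

rng = random.Random(0)
keys = rng.sample(range(1_000_000), 200)
table = build(keys, rng)
```

A lookup never scans a chain, so its worst-case cost is constant; `storage(table)` reports the quantity whose expectation the analysis bounds by $\Theta(n)$.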