MIT Introduction to Algorithms (8): Hash

weixin_34270606

CC 4.0 BY-SA license

Original link: http://www.cnblogs.com/louis-sherren/archive/2012/09/03/hash.html

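This article covers the principles and applications of hash functions, including how a hash function maps the universe of all keys into a contiguous range of slots, and describes collision-resolution strategies such as chaining and open addressing.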

Hash function:

Solution: Use a hash function h to map the universe U of all keys into the m slots {0, 1, ..., m–1} of a table T, so that key k is stored in slot h(k).

Collision: a record to be inserted maps to a slot in T that is already occupied.

Resolving collisions:

We first assume simple uniform hashing: each key k in the set K of keys is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.

Let n be the number of keys in the table, and let m be the number of slots.

Define the load factor of T to be α = n/m = average number of keys per slot.

1) By chaining

Records in the same slot are linked into a list.

The expected time to search for an element is the time to access the correct slot plus the time to search that slot's list, which gives T(n) = Θ(1 + α).
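To make the Θ(1 + α) cost concrete, here is a minimal sketch of chaining. The class name `ChainedHashTable`, the integer keys, and the use of k mod m as the hash are assumptions made for illustration, not part of the lecture.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, m=7):
        self.m = m                               # number of slots
        self.n = 0                               # number of stored keys
        self.slots = [[] for _ in range(m)]      # one chain (list) per slot

    def _h(self, key):
        return key % self.m                      # assumed hash: k mod m on integer keys

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                         # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # O(1) to reach the slot, then a scan of that slot's chain only
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.n / self.m                   # alpha = n / m


table = ChainedHashTable(m=7)
for key in (10, 17, 24, 3):                      # all four keys map to slot 3
    table.insert(key, str(key))
print(table.search(17), table.load_factor())
```

The four keys above deliberately collide, so the search for 17 walks a chain of length 4. Under the uniform-hashing assumption the chains are much more even and have average length α.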

Recommended hash functions:

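h(k) = k mod m (division method). Pick m to be a prime not too close to a power of 2 or 10.
h(k) = (A·k mod 2^w) rsh (w – r) (multiplication method). Assume m = 2^r and that the computer has w-bit words; rsh is the bitwise right shift, and A is an odd integer with 2^(w–1) < A < 2^w. Don’t pick A too close to 2^(w–1) or 2^w.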
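Both recommended functions fit in a few lines. This is a sketch for non-negative integer keys; the function names and the sample parameters (w = 7, r = 3, A = 89, k = 107, m = 701) are chosen here for illustration. In the multiplication method the table size is m = 2^r and the word size is w bits.

```python
def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, A, w, r):
    """Multiplication method: h(k) = (A*k mod 2^w) rsh (w - r), with m = 2^r slots."""
    low_word = (A * k) % (1 << w)      # keep only the low w bits of the product
    return low_word >> (w - r)         # the top r bits of that word are the slot index


# Illustrative parameters: 7-bit words, 2^3 = 8 slots, odd A between 2^6 and 2^7
print(hash_multiplication(107, A=89, w=7, r=3))   # -> 3
print(hash_division(1000, 701))                   # -> 299
```

In the first call, 89 · 107 = 9523 and 9523 mod 128 = 51 (binary 0110011); shifting right by 4 bits leaves binary 011, which is slot 3.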

2) By open addressing

No storage is used outside of the hash table itself. Insertion systematically probes the table until an empty slot is found.

The hash function depends on both the key and probe number.

Probing strategies:

-Linear probing.

h(k,i) = (h′(k) + i) mod m. This is simple, but it suffers from primary clustering: long runs of occupied slots build up, and they lengthen the average search.

-Double hashing

h(k,i) = (h1(k) + i⋅h2(k)) mod m. For the probe sequence to hit every slot, h2(k) must be relatively prime to m. One way to guarantee this is to make m a power of 2 and design h2(k) to produce only odd numbers.
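The double-hashing probe sequence can be sketched as follows. The class name and the particular choices of h1 and h2 are assumptions for illustration; the only property they rely on is that m is a power of 2 and h2 is always odd. Deletion is left out because it would need tombstone markers.

```python
class OpenAddressTable:
    """Open addressing with double hashing (illustrative sketch, no deletion)."""

    def __init__(self, r=3):
        self.m = 1 << r                        # m = 2^r slots
        self.slots = [None] * self.m

    def _h1(self, k):
        return k % self.m                      # assumed primary hash

    def _h2(self, k):
        return 2 * (k % (self.m // 2)) + 1     # assumed secondary hash: always odd

    def _probe(self, k, i):
        # h(k, i) = (h1(k) + i * h2(k)) mod m
        return (self._h1(k) + i * self._h2(k)) % self.m

    def insert(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] is None or self.slots[j] == k:
                self.slots[j] = k
                return j                       # slot where the key ended up
        raise OverflowError("hash table is full")

    def search(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] is None:          # empty slot: the key cannot be further along
                return None
            if self.slots[j] == k:
                return j
        return None


t = OpenAddressTable(r=3)
print([t.insert(k) for k in (3, 11, 19)])      # all three start at slot 3 -> [3, 2, 1]
```

Replacing `_h2` with a function that always returns 1 turns this into linear probing, which makes it easy to compare how the two strategies spread colliding keys.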

Performance:

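Assuming uniform hashing, the probability that the first probe hits an occupied slot is n/m = α. Given that, the probability that the second probe also hits an occupied slot is (n–1)/(m–1), and so on. Since (n–i)/(m–i) < n/m = α for i = 1, 2, ..., n–1 when n < m, the expected number of probes in an unsuccessful search is:

1 + n/m · (1 + (n–1)/(m–1) · (1 + (n–2)/(m–2) · (...))) ≤ 1 + α(1 + α(1 + α(1 + α ...))) = 1 + α + α² + α³ + ... = 1/(1 – α).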

Problem: If the table is 90% full, then the expected number of probes is 1/(1–0.9) = 10. As the table approaches 100% full (α → 1), the bound grows without limit.
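A quick numeric check shows how fast the bound deteriorates as the table fills. The sketch below evaluates the nested expression from the inside out and compares it with 1/(1 – α); the function names and the table size of 1000 slots are arbitrary choices for illustration.

```python
def expected_probes_nested(n, m):
    """Evaluate 1 + n/m (1 + (n-1)/(m-1) (1 + ...)) from the innermost term outward."""
    total = 1.0
    for i in range(n - 1, -1, -1):
        total = 1.0 + (n - i) / (m - i) * total
    return total


def geometric_bound(alpha):
    """Upper bound 1 / (1 - alpha) from the geometric series."""
    return 1.0 / (1.0 - alpha)


m = 1000                                        # assumed table size
for n in (500, 900, 990, 999):
    alpha = n / m
    print(alpha, expected_probes_nested(n, m), geometric_bound(alpha))
```

For every load factor printed, the nested value stays below the geometric bound, and both climb steeply once α passes 0.9.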

Universal hashing and perfect hashing.

1) Universal hashing

H_p,m = {h_a,b : a ∈ Z_p^*, b ∈ Z_p}. Here H_p,m is a family of hash functions with parameters p and m; each member h_a,b is determined by two constants a and b.

h_a,b(k) = ((a·k + b) mod p) mod m.

p is a prime larger than every key k; m is the number of slots, as before; a and b are chosen at random from Z_p^* = {1, ..., p–1} and Z_p = {0, ..., p–1} respectively.

Before hashing any key, we choose one hash function from H at random and then use it for every operation.
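Drawing a function from the family can be sketched like this, with a taken from {1, ..., p–1} and b from {0, ..., p–1}. The helper names and the small parameters p = 17, m = 6 are assumptions for illustration. The second helper enumerates every (a, b) pair and counts how many of the resulting functions make two fixed keys collide; for a universal family that count is at most the family size divided by m.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h_{a,b}(k) = ((a*k + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, p)            # a in {1, ..., p-1}
    b = rng.randrange(0, p)            # b in {0, ..., p-1}

    def h(k):
        return ((a * k + b) % p) % m

    h.a, h.b = a, b                    # expose the drawn constants for inspection
    return h


def count_collisions(x, y, p, m):
    """Count the functions in the whole family that map x and y to the same slot."""
    collisions = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * x + b) % p) % m == ((a * y + b) % p) % m:
                collisions += 1
    return collisions, p * (p - 1)     # (colliding functions, family size)


h = make_universal_hash(p=17, m=6, rng=random.Random(0))
print(h.a, h.b, [h(k) for k in range(5)])
print(count_collisions(3, 8, p=17, m=6))
```

Passing a seeded `random.Random` instance only makes the sketch reproducible; in real use the draw should be unpredictable, since the guarantee relies on the keys being chosen independently of a and b.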

Performance:

If we resolve collisions by chaining, the expected length of the chain that a key hashes to is at most the load factor α when the key is not in the table, and at most 1 + α when it is.

2) Perfect hashing

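Rather than chaining, perfect hashing handles collisions by placing a second, smaller hash table in each slot. For a static set of keys, a slot that receives n_j keys gets a secondary table of n_j² slots, with its own hash function chosen from a universal family so that no collisions remain at the second level.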
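A two-level scheme along these lines can be sketched for a static set of distinct integer keys. Everything named here (`build_perfect_table`, `contains`, the sample keys, p = 101) is an assumption for illustration. Each slot's secondary table is given size equal to the square of its key count and is redrawn until it has no collision. The sketch does not retry the first-level function to keep total space small, which a fuller construction would do.

```python
import random


def build_perfect_table(keys, p, rng):
    """Two-level table for distinct integer keys, each smaller than the prime p."""
    def draw(m):
        # one random member of the family ((a*k + b) mod p) mod m
        a, b = rng.randrange(1, p), rng.randrange(0, p)
        return lambda k: ((a * k + b) % p) % m

    m = len(keys)
    h = draw(m)                                    # first-level hash function
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[h(k)].append(k)

    secondary = []
    for bucket in buckets:
        size = len(bucket) ** 2                    # quadratic space per slot
        while True:                                # redraw until collision-free
            hj = draw(size) if size else None
            slots = [None] * size
            ok = True
            for k in bucket:
                j = hj(k)
                if slots[j] is not None:           # collision at the second level
                    ok = False
                    break
                slots[j] = k
            if ok:
                break
        secondary.append((hj, slots))
    return h, secondary


def contains(table, k):
    """Membership test using exactly two hash evaluations and no chain scan."""
    h, secondary = table
    hj, slots = secondary[h(k)]
    return bool(slots) and slots[hj(k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]        # sample static key set
table = build_perfect_table(keys, p=101, rng=random.Random(1))
print(all(contains(table, k) for k in keys), contains(table, 11))
```

Because every secondary table is collision-free by construction, a lookup never scans a list, so the worst-case search time is constant.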