MIT Introduction to Algorithms (8): Hashing

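This post covers the principles and applications of hash functions: how a hash function maps the universe of all keys into a contiguous range of table slots, and how collisions are resolved by chaining and by open addressing.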

Hash function:

Solution: Use a hash function h(k) to map the universe U of all keys into a contiguous range of table slots {0, 1, ..., m–1}.

Collision: A collision occurs when a record to be inserted maps to an already occupied slot in the table T.

 

Resolving collisions:

We start from the assumption of simple uniform hashing: each key k ∈ K is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.

Let n be the number of keys in the table, and let m be the number of slots.

Define the load factor of T to be α = n/m = average number of keys per slot.

1) By chaining

Records in the same slot are linked into a list.

The expected time to search for an element is the time to access the correct slot plus the time to search the list in that slot. Under the assumption above, the expected list length is α, so T(n) = Θ(1 + α).
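To make the Θ(1 + α) cost concrete, here is a minimal sketch of a chained table. The class name ChainedHashTable, the integer keys, and the use of h(k) = k mod m are assumptions made for illustration, not something the lecture prescribes.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, m=7):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]     # one chain (list) per slot

    def _h(self, key):
        # Assumption: integer keys and the division method h(k) = k mod m.
        return key % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Cost: O(1) to reach the slot plus the length of its chain.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.n / self.m                  # alpha = n / m


table = ChainedHashTable(m=7)
for key in (10, 17, 24, 3):                     # all four keys map to slot 3
    table.insert(key, str(key))
```

The four keys were picked to collide on purpose, so a search for any of them walks a single chain of length 4. That is the worst case that the uniform-hashing assumption makes unlikely.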

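Recommended hash functions:

  • Division method: h(k) = k mod m.  Pick m to be a prime not too close to a power of 2 or 10.
  • Multiplication method: h(k) = (A·k mod 2^w) rsh (w – r).  Assume m = 2^r and that our computer has w-bit words. A is an odd integer with 2^(w–1) < A < 2^w; don't pick A too close to 2^(w–1) or 2^w. Here rsh is the bitwise right-shift operator.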
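Both methods fit in a few lines. The sketch below uses hypothetical function names (hash_division, hash_multiplication). The sample values m = 701 and the 7-bit word with A = 89, k = 107 are small numbers chosen so the arithmetic can be checked by hand.

```python
def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, A, w, r):
    """Multiplication method: h(k) = (A*k mod 2^w) rsh (w - r), with m = 2^r slots."""
    low_w_bits = (A * k) & ((1 << w) - 1)   # A*k mod 2^w keeps the low w bits
    return low_w_bits >> (w - r)            # the top r of those bits are the slot


# Division: 123456 = 176 * 701 + 80
slot_div = hash_division(123456, 701)

# Multiplication with w = 7, r = 3 (m = 8), A = 0b1011001 = 89, k = 0b1101011 = 107:
# A*k = 9523, 9523 mod 128 = 51 = 0b0110011, and 51 >> 4 = 0b011 = 3
slot_mul = hash_multiplication(107, A=89, w=7, r=3)
```

On real hardware w would be the machine word size (for example 64), where the mod 2^w step is what unsigned integer overflow gives for free. Python integers do not overflow, so the mask is written out explicitly.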

2) By open addressing

No storage is used outside of the hash table itself. Insertion systematically probes the table until an empty slot is found.

The hash function depends on both the key and probe number.

Probing strategies:

- Linear probing

h(k,i) = (h′(k) + i) mod m. This is simple, but it suffers from primary clustering: long runs of occupied slots build up, and they make later insertions and searches slower.
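The clustering effect is easy to see in a small sketch. The class LinearProbingTable and the choice h′(k) = k mod m are assumptions for illustration, and deletion is left out on purpose because it needs extra care under open addressing.

```python
class LinearProbingTable:
    """Open addressing with linear probing: h(k, i) = (h'(k) + i) mod m."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [None] * m                 # None marks an empty slot

    def _probe(self, key, i):
        # Assumption: integer keys and h'(k) = k mod m.
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j                        # slot where the key ended up
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:           # an empty slot ends the search
                return None
            if self.slots[j] == key:
                return j
        return None


t = LinearProbingTable(m=8)
placed = [t.insert(k) for k in (3, 11, 19, 4)]  # 3, 11 and 19 all have h'(k) = 3
```

Keys 3, 11 and 19 fill slots 3, 4 and 5. Key 4 then finds its own home slot taken by a key that hashed elsewhere and lands in slot 6, extending the run.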

- Double hashing

h(k,i) = (h1(k) + i⋅h2(k)) mod m. The probe sequence is guaranteed to hit every slot if h2(k) is relatively prime to m. One way to ensure this is to make m a power of 2 and design h2(k) to produce only odd numbers.
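The "hits every slot" claim can be checked directly. In this sketch the particular h1 and h2 are hypothetical choices; the only property that matters is that h2 is always odd while m is a power of 2.

```python
def double_hash_probe_sequence(k, m):
    """Return the full probe sequence h(k, 0), ..., h(k, m-1) for key k."""
    h1 = k % m                   # assumption: h1(k) = k mod m
    h2 = 2 * (k // m) + 1        # assumption: any function that is always odd
    return [(h1 + i * h2) % m for i in range(m)]


seq = double_hash_probe_sequence(13, 8)      # h1 = 5, h2 = 3
covers_all = sorted(seq) == list(range(8))   # every slot appears exactly once
```

With an even step the sequence would cycle early. For example, a step of 2 in a table of 8 slots only ever visits 4 of them.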

Performance:

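Assume α < 1 and consider an unsuccessful search. The probability that the first probe hits an occupied slot is n/m = α. Given that, the probability that the second probe also hits an occupied slot is (n–1)/(m–1), and so on. The expected number of probes is therefore:

1 + n/m (1 + (n–1)/(m–1) (1 + (n–2)/(m–2) (...))) ≤ 1 + α(1 + α(1 + α(...))) = 1 + α + α^2 + α^3 + ... = 1/(1 – α).

The inequality holds because (n–i)/(m–i) ≤ n/m = α for every i when n < m.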

Problem: If the table is 90% full, the bound on the expected number of probes is 1/(1–0.9) = 10. At 99.99% full it is already 10,000, and it grows without limit as α approaches 1.
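To see how tight the 1/(1 – α) bound is, the nested expression can be evaluated numerically from the inside out. The function names below are hypothetical, and the sketch assumes n < m.

```python
def nested_expected_probes(n, m):
    """Evaluate 1 + n/m (1 + (n-1)/(m-1) (1 + ...)) starting at the innermost term."""
    total = 1.0
    for i in range(n - 1, -1, -1):
        # Factor for probe i+1: (n - i) occupied slots among (m - i) untried ones.
        total = 1.0 + (n - i) / (m - i) * total
    return total


def geometric_bound(alpha):
    """Upper bound 1 / (1 - alpha), valid for alpha < 1."""
    return 1.0 / (1.0 - alpha)


exact_90 = nested_expected_probes(90, 100)   # table 90% full
bound_90 = geometric_bound(90 / 100)
```

For n = 90 and m = 100 the nested expression comes out a little above 9, just under the bound of 10, so the bound is not far off at high load.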

 

Universal hashing and perfect hashing.

1) Universal hashing

H_{p,m} = {h_{a,b} : a ∈ Z_p*, b ∈ Z_p}. Here H_{p,m} is a family of hash functions with parameters p and m; each member h_{a,b} is determined by two constants a and b.

h_{a,b}(k) = ((a·k + b) mod p) mod m.

p is a prime larger than the largest key k; m is the number of slots, as before; a and b are chosen at random from Z_p* = {1, ..., p–1} and Z_p = {0, ..., p–1} respectively.

Before hashing any keys, we choose one hash function at random from H_{p,m}.
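Choosing a function from the family amounts to drawing a and b once, up front. The helper names h_ab and pick_universal_hash are hypothetical, and the small parameters p = 17, m = 6 are only there to keep the arithmetic checkable by hand.

```python
import random


def h_ab(a, b, p, m, k):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def pick_universal_hash(p, m, rng=random):
    """Draw a from Z_p* and b from Z_p once, then return the fixed function h_{a,b}."""
    a = rng.randrange(1, p)      # a in {1, ..., p-1}
    b = rng.randrange(0, p)      # b in {0, ..., p-1}
    return lambda k: h_ab(a, b, p, m, k)


# With p = 17, m = 6, a = 3, b = 4: ((3*8 + 4) mod 17) mod 6 = (28 mod 17) mod 6 = 11 mod 6 = 5
example = h_ab(3, 4, 17, 6, 8)

h = pick_universal_hash(p=17, m=6, rng=random.Random(1))  # seeded only for repeatability
```

The randomness lives in the choice of the function, not in the keys. Once h is drawn it is an ordinary deterministic hash function, so lookups stay consistent.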

Performance:

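If we resolve collisions by chaining, the expected number of other keys that collide with any given key is less than n/m = α. The expected chain length is therefore on the order of the load factor α, no matter which keys are stored.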

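2) Perfect hashing

Rather than keeping a chain in each slot, perfect hashing resolves collisions by putting a second, smaller hash table in each slot. It applies when the set of keys is fixed in advance.

If n_j keys hash to slot j, the subtable for that slot has m_j = n_j^2 slots. Its hash function is again drawn from a universal family: a and b are chosen at random again, and m_j takes the place of m. With m_j = n_j^2, a randomly chosen function has no collisions in the subtable with probability at least 1/2, so a few retries are enough to find one.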
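The two-level construction can be sketched as follows. The function names are hypothetical, the keys are assumed to be distinct non-negative integers smaller than the prime p, and the sketch skips the usual extra step of retrying the first-level function until the total subtable size is linear in n.

```python
import random


def _random_hash(p, m, rng):
    """Draw h_{a,b}(k) = ((a*k + b) mod p) mod m from the universal family."""
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def build_perfect_table(keys, p, rng=None):
    """Build a two-level table for a fixed, non-empty set of integer keys in [0, p)."""
    rng = rng or random.Random()
    keys = list(set(keys))                       # duplicates would never separate
    assert keys and all(0 <= k < p for k in keys)
    m = len(keys)
    h = _random_hash(p, m, rng)                  # first-level hash function
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[h(k)].append(k)

    subtables = []
    for bucket in buckets:
        m_j = len(bucket) ** 2                   # second-level size m_j = n_j^2
        while True:                              # retry until collision-free
            h_j = _random_hash(p, m_j, rng) if m_j else None
            slots = [None] * m_j
            collision = False
            for k in bucket:
                i = h_j(k)
                if slots[i] is not None:
                    collision = True
                    break
                slots[i] = k
            if not collision:
                break
        subtables.append((h_j, slots))
    return h, subtables


def contains(table, key):
    """Look up a key with exactly one probe per level."""
    h, subtables = table
    h_j, slots = subtables[h(key)]
    return bool(slots) and slots[h_j(key)] == key


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys, p=101, rng=random.Random(0))
```

Because no second-level table contains a collision, a lookup costs two hash evaluations and one comparison in the worst case, not just in expectation.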

Posted on 2012-09-03 22:51 by 我是正常蛇

Reposted from: https://www.cnblogs.com/louis-sherren/archive/2012/09/03/hash.html
