Hash Table Collision Handling

最新推荐文章于 2024-07-13 17:50:59 发布

转载最新推荐文章于 2024-07-13 17:50:59 发布 · 308 阅读

0 ·

CC 4.0 BY-SA版权

原文链接：http://www.cnblogs.com/grainy/p/7244183.html

文章标签：

#人工智能

本文探讨了哈希表中两种基本的碰撞解决方法：分离链接法和开放寻址法。分离链接法通过在每个桶中附加一个额外的数据结构，如链表，来存储碰撞的元素，而开放寻址法则在遇到碰撞时寻找下一个可用位置。文章详细介绍了这两种方法的优缺点，并讨论了线性探测、二次探测和双重哈希等开放寻址策略。

Two basic methods; separate chaining and open address.

Separate Chain

Hangs an additional data structure off of the buckets. For example the bucket array becomes an array of link list. So to find an item we first go to the bucket then compare keys.. This is a popular method, and if link list is used the hash never fills up.

Illustrate

load factor, f = n/N where n is number of items stored in the hash table. Like for the load factor to be less then 1.

The cost for get(k) is on average O(n/N)

Open Addressing

The problem with separate chaining is that the data structure can grow with out bounds. Sometimes this is not appropriate because of finite storage, for example in embedded processors.

Open addressing does not introduce a new structure. If a collision occurs then we look for availability in the next spot generated by an algorithm. Open Addressing is generally used where storage space is a premium, i.e. embedded processors. Open addressing not necessarily faster then separate chaining.

Methods for Open Addressing:

1. Linear Probing:
  
  We try to insert Item = (k, e) into bucket A[i] and find it full so the next bucket we try is:
  
  A[(i + 1) mod N]
  
  then try A[(i + 1) mod N], etc.
  
  Illustrate with 11 buckets: Note the probing is linear.
  
  Note the hash table can be filled up.
  
  Also what to do if we remove an Item. Should repair the array A but this is too costly. Instead we mark the bucket as available/deactivated. Then the next use of findElement(k) would skip over the available/deactivated bucket. insertItem(k, e) would insert into a available/deactivated.
  
  Clustering slows down searches.
2. Quadratic Probing:
  
  A[ (i + f(j) )mod N] where j = 0, 1, 2, ... and f(j) = j²
  
  Helps avoids clustering. Secondary clustering can occur. We can imagine a more complicated function for f.
3. Double Hashing:
  
  Use a second hash function h'.
  
  A[ (i + f(j) )mod N] where f(j) = j*h'(k) should not evaluate to zero. Example: h'(k) = q - (k mod q). Note that still i = h(k).