The Notes of Algorithms ---- Data Structures ---- Hash Table - 优快云 Blog

Article link: https://blog.youkuaiyun.com/DirkNow/article/details/8803720

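This article takes an in-depth look at how hash tables work, covering the advantages and limitations of direct-access tables, the design principles of hash functions, and collision-resolution methods such as chaining and open addressing. It also explains how to choose a suitable hash function for efficiency and discusses the concept of universal hashing and its applications.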


Table S holds n records.

Each record has a structure like this:

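struct record {
    int key;               /* Key (an int key is assumed here) */
    void *satellite_data;  /* Satellite data: anything carried along with the key */
};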

x is a pointer to one such record.

Operations

- Insert(S,x); //S<--S∪{x}

- Delete(S,x); //S<--S - {x}

- Search(S,k); //return x such that key[x]=k, or nil if no such x exists. (Note: the second parameter is the key k, not the record x.)

Direct Access Table:

Suppose keys are drawn from U={0,1,...,m-1}.

Set up an array T[0,...,m-1] to represent the dynamic set S:

T[k]=x if x∈S and key[x]=k

T[k]=nil otherwise

Each operation takes Θ(1) time.
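A minimal C sketch of the direct-access table above, assuming integer keys in {0,...,M-1} with M=16 chosen only for illustration. The names dat_insert, dat_delete and dat_search are hypothetical, and nil is represented by NULL.

```c
#include <assert.h>
#include <stddef.h>

#define M 16  /* size of the universe U = {0, ..., M-1}; assumed small */

struct record {
    int key;
    const char *data;  /* satellite data (a string here, for illustration) */
};

/* T[k] holds the record whose key is k, or NULL (nil) if there is none. */
static struct record *T[M];

/* Insert(S, x): store x at index key[x]. O(1). */
void dat_insert(struct record *x) {
    assert(x->key >= 0 && x->key < M);
    T[x->key] = x;
}

/* Delete(S, x): clear the slot of key[x]. O(1). */
void dat_delete(struct record *x) {
    assert(x->key >= 0 && x->key < M);
    T[x->key] = NULL;
}

/* Search(S, k): return the record with key k, or NULL. O(1). */
struct record *dat_search(int k) {
    assert(k >= 0 && k < M);
    return T[k];
}
```

Every operation is a single array access, which is where the Θ(1) bound comes from; the price is an array of size |U|, however few keys are actually stored.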

Disadvantage: if the universe U is very large and only a small subset of U is actually stored in S, most of the array is wasted space. Furthermore, if the keys are not integers, they cannot be used directly as array indices. So we need another data structure ------ the hash table.

Hashing:

m: the number of slots in table T.

n: the number of records (keys) to be stored in the slots.

Hash function h maps keys "randomly" into the slots of table T. When a record to be inserted maps to an already occupied slot, a collision occurs.

Resolving collisions

The first method --- by chaining:

Idea: link records in same slot into a list.
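A minimal C sketch of chaining, assuming non-negative int keys and a placeholder division-method hash with m=8 slots (both chosen only for illustration). chain_insert, chain_search and chain_delete are hypothetical names; for simplicity the sketch deletes by key rather than by record pointer x.

```c
#include <assert.h>
#include <stdlib.h>

#define NSLOTS 8  /* m: number of slots, chosen only for illustration */

/* One element of a slot's linked list (its chain). */
struct node {
    int key;
    struct node *next;
};

static struct node *table[NSLOTS];  /* each slot points to a chain, or NULL */

/* Placeholder hash: division method h(k) = k mod m (keys assumed >= 0). */
static unsigned slot_of(int key) {
    return (unsigned)key % NSLOTS;
}

/* Insert at the head of the chain: O(1). Assumes key is not already present. */
void chain_insert(int key) {
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->next = table[slot_of(key)];
    table[slot_of(key)] = n;
}

/* Search walks a single chain: time proportional to 1 + its length. */
struct node *chain_search(int key) {
    for (struct node *p = table[slot_of(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Delete unlinks and frees the first node with this key, if any. */
void chain_delete(int key) {
    struct node **pp = &table[slot_of(key)];
    while (*pp != NULL) {
        if ((*pp)->key == key) {
            struct node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Keys 3 and 11 both land in slot 3 when m=8, so they share one chain and both remain findable; that is exactly the collision case chaining is meant to absorb.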

Analysis:

Worst case: every key hashes to the same slot, so access takes Θ(n) time.

Average case: assume simple uniform hashing:

- each key is equally likely to be hashed to any slot of T, independently of where the other keys are hashed.

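P{h(k_i)=s}=1/m for every slot s (e.g. P{h(k_i)=2}=1/m)

For i≠j: P{h(k_i)=h(k_j)} = Σ over slots s of P{h(k_i)=s}*P{h(k_j)=s} = m*(1/m)*(1/m) = 1/m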

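The load factor of a hash table with n keys and m slots is α=n/m = "average number of keys per slot".

So the expected time for an unsuccessful search is Θ(1+α): Θ(1) to compute h(k) and access the slot, plus Θ(α) to scan the list, whose expected length is α.

The expected time for a successful search is also Θ(1+α), so both are Θ(1) if α=O(1), i.e. if n=O(m).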
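The Θ(1+α) bound rests on the expected length of the chain an unsuccessful search scans. A short derivation, assuming the searched key k is not in the table and is hashed under simple uniform hashing like the n stored keys (X_j is an indicator introduced here: X_j=1 if h(k_j)=h(k), else 0):

```latex
\mathbb{E}\bigl[n_{h(k)}\bigr]
  = \sum_{j=1}^{n} \mathbb{E}[X_j]
  = \sum_{j=1}^{n} \Pr\{h(k_j) = h(k)\}
  = n \cdot \frac{1}{m}
  = \alpha
```

Adding the Θ(1) cost of computing h(k) and reaching the slot gives Θ(1+α).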

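The second method --- by open addressing --- no storage for links.

- Probe the table systematically until an empty slot is found.

- h: U×{0,1,...,m-1} ---> {0,1,...,m-1}

U: universe of keys

first {0,1,...,m-1}: probe number

second {0,1,...,m-1}: slot

- The probe sequence <h(k,0), h(k,1), ..., h(k,m-1)> should be a permutation of {0,1,...,m-1}, so that every slot is eventually examined.

- The table may fill up, so we must have n<=m.

- Deletion is difficult: simply emptying a slot can cut off the probe sequence of another key stored further along it. A common workaround is to mark such a slot as "deleted" instead of empty.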
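A minimal C sketch of open addressing with linear probing, assuming non-negative int keys stored directly in a table of m=7 slots and h'(k)=k mod m. EMPTY and DELETED are hypothetical sentinel values, and the oa_* function names are made up for illustration. The DELETED marker is one common way to make deletion work: searches skip over it instead of stopping.

```c
#include <assert.h>
#include <stdbool.h>

#define MSLOTS 7      /* m: number of slots (illustrative) */
#define EMPTY (-1)    /* sentinel: slot never used; keys assumed >= 0 */
#define DELETED (-2)  /* sentinel: slot whose key was removed */

static int slots[MSLOTS];

/* Linear probing: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m. */
static int probe(int k, int i) {
    return (k % MSLOTS + i) % MSLOTS;
}

void oa_init(void) {
    for (int s = 0; s < MSLOTS; s++)
        slots[s] = EMPTY;
}

/* Insert into the first EMPTY or DELETED slot on k's probe sequence.
   Assumes k is not already present. Returns false if the table is full. */
bool oa_insert(int k) {
    assert(k >= 0);
    for (int i = 0; i < MSLOTS; i++) {
        int s = probe(k, i);
        if (slots[s] == EMPTY || slots[s] == DELETED) {
            slots[s] = k;
            return true;
        }
    }
    return false;
}

/* Follow the probe sequence until k is found (return its slot) or an
   EMPTY slot ends the sequence (return -1). DELETED slots are skipped. */
int oa_search(int k) {
    for (int i = 0; i < MSLOTS; i++) {
        int s = probe(k, i);
        if (slots[s] == EMPTY)
            return -1;
        if (slots[s] == k)
            return s;
    }
    return -1;
}

/* Mark the slot DELETED rather than EMPTY, so later searches keep probing. */
void oa_delete(int k) {
    int s = oa_search(k);
    if (s >= 0)
        slots[s] = DELETED;
}
```

For example, 3 goes to slot 3 and 10 (10 mod 7 = 3) collides and moves on to slot 4. After deleting 3, a search for 10 still succeeds because slot 3 holds DELETED rather than EMPTY.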

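Probing strategies:

- Linear probing: h(k,i)=(h'(k)+i) mod m, for an ordinary hash function h'. It suffers from "primary clustering": long runs of occupied slots build up and slow down searches.

- Double hashing: h(k,i)=(h_1(k)+i*h_2(k)) mod m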

To guarantee that the probe sequence covers the whole table, h_2(k) must be relatively prime to m. There are two common ways to ensure this.

Method 1: pick m=2^r and make h_2(k) always odd.

Method 2: pick m prime, let m' be a positive integer slightly smaller than m (e.g. m'=m-1), and set h_2(k)=1+(k mod m').
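A small C check of the two methods above, assuming the table size is at most MAX_M=64: covers_table tests whether the m probes (h_1+i*h_2) mod m visit every slot exactly once. covers_table, h2_method1 and h2_method2 are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_M 64  /* largest table size this sketch supports */

/* True if probes i = 0..m-1 of (h1 + i*h2) mod m hit every slot exactly once. */
bool covers_table(unsigned m, unsigned h1, unsigned h2) {
    assert(m > 0 && m <= MAX_M);
    bool seen[MAX_M] = {false};
    for (unsigned i = 0; i < m; i++) {
        unsigned s = (h1 + i * h2) % m;
        if (seen[s])
            return false;  /* a slot repeats before all m slots were probed */
        seen[s] = true;
    }
    return true;
}

/* Method 1 (m = 2^r): force the step to be odd, hence coprime to m. */
unsigned h2_method1(unsigned k, unsigned m) {
    assert(m > 0 && (m & (m - 1)) == 0);  /* m must be a power of 2 */
    return (k % m) | 1u;
}

/* Method 2 (m prime, m' = m - 1): the step 1 + (k mod m') lies in [1, m-1]. */
unsigned h2_method2(unsigned k, unsigned m) {
    assert(m >= 2);
    return 1 + k % (m - 1);
}
```

For example, with m=16 the odd step h2_method1(10, 16)=11 covers the table, while an even step such as 6 returns to slot 3 after 8 probes; with m=13 (prime), h2_method2(100, 13)=5 covers it.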

Choosing a hash function:

Division method:

h(k)=k mod m

Notice: 1. Don't pick m with a small divisor d such as d=2.

Ex: if d=2 and all keys are even, then k mod m is always even, so the odd-indexed slots are never used.

2. Don't pick m=2^r: then h(k) is just the low-order r bits of k, so the hash doesn't depend on all bits of k.

3. Pick m to be a prime that is not too close to a power of 2 or 10.
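A minimal C sketch of the division method, plus a hypothetical helper slots_used that counts how many distinct slots a regular set of keys (0, step, 2*step, ...) lands in. It illustrates notes 1 and 2; the prime 13 is picked only to keep the example tiny, not as a recommended table size.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 1024  /* largest m this sketch supports */

/* Division method: h(k) = k mod m. */
unsigned div_hash(unsigned k, unsigned m) {
    assert(m > 0);
    return k % m;
}

/* Number of distinct slots used when hashing the keys 0, step, 2*step, ...
   (count keys in total) into m slots with the division method. */
unsigned slots_used(unsigned m, unsigned step, unsigned count) {
    assert(m > 0 && m <= MAX_SLOTS);
    bool used[MAX_SLOTS] = {false};
    unsigned distinct = 0;
    for (unsigned j = 0; j < count; j++) {
        unsigned s = div_hash(j * step, m);
        if (!used[s]) {
            used[s] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With m=16, 100 even keys use only 8 slots and 100 multiples of 16 all land in slot 0; with m=13 both key sets spread over all 13 slots.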

Multiplication method:

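Condition: 1. m=2^r; 2. the computer has w-bit words.

Then: h(k)=(A*k mod 2^w) rsh (w-r), where A is an odd integer with 2^(w-1) < A < 2^w and rsh is the bitwise right shift.

Don't pick A too close to 2^(w-1) or 2^w.

Multiplication and mod 2^w are much faster than division, and the right shift is also fast.

It is like a modular wheel with 2^w positions: A*k mod 2^w means going around the wheel k times in steps of size A, and where the wheel stops (its top r bits) determines the slot.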
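A minimal C sketch of the multiplication method for w=32, assuming A=2654435769, an odd constant close to 2^32*(√5-1)/2 (the value Knuth suggests); any odd A with 2^31 < A < 2^32 that is not too close to either bound fits the recipe. mul_hash is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* A: odd, 2^31 < A < 2^32; about 2^32 * (sqrt(5) - 1) / 2 (0x9E3779B9). */
#define A_MULT UINT32_C(2654435769)

/* Multiplication method with w = 32: h(k) = (A*k mod 2^w) rsh (w - r).
   Returns a slot in {0, ..., 2^r - 1}. */
uint32_t mul_hash(uint32_t k, unsigned r) {
    assert(r >= 1 && r <= 32);
    /* Multiply in 64 bits, then keep the low 32 bits: that is A*k mod 2^32. */
    uint32_t product = (uint32_t)((uint64_t)A_MULT * k);
    return product >> (32 - r);  /* the top r bits select the slot */
}
```

For example, with r=8, mul_hash(1, 8) is 158 (the top 8 bits of A, 0x9E) and mul_hash(2, 8) is 60. No division is performed: one multiply, one truncation, one shift.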

Universal hashing:

For any fixed hash function, an adversary can pick a set of keys that all hash to the same slot. How to deal with it: randomize, i.e. choose the hash function at random, independently of the keys.

Definition: let U be a universe of keys, and let H be a finite collection of hash functions, each mapping U to {0,1,...,m-1}.

We call H universal if for all x,y∈U with x≠y the following holds:

|{h∈H : h(x)=h(y)}| = |H|/m

(That is, of the |H| hash functions in H, exactly |H|/m map x and y to the same slot; equivalently, for h chosen at random from H, P{h(x)=h(y)}=1/m.)

Thm: choose h at random from a universal H and use it to hash n keys into the m slots of table T. Then for a given key x among them:

E[# collisions with x] < n/m
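A sketch of the proof, assuming x is one of the n keys stored in T (C_x counts the keys colliding with x, and c_xy=1 if h(x)=h(y), else 0; both are notation introduced here). Universality gives E[c_xy]=P{h(x)=h(y)}=1/m for y≠x, and linearity of expectation does the rest:

```latex
\mathbb{E}[C_x]
  = \mathbb{E}\Bigl[\sum_{y \in T \setminus \{x\}} c_{xy}\Bigr]
  = \sum_{y \in T \setminus \{x\}} \mathbb{E}[c_{xy}]
  = \frac{n-1}{m}
  < \frac{n}{m}
```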

Constructing a universal set of hash functions:

Let m be prime. Decompose key k into r+1 digits: k=<k_0,k_1,...,k_r>, where 0<=k_i<=m-1.

(That is, write k in base m; the digits can be obtained by repeatedly taking k mod m and k/m.)

Pick a=<a_0,a_1,...,a_r> (base m), where each a_i is chosen at random from {0,1,...,m-1}.

(This step is what introduces randomness into the algorithm.)

Define h_a(k)=(∑_{i=0}^{r} a_i*k_i) mod m

H={h_a}, and the size of H is m^(r+1).
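A minimal C sketch of the family above, assuming m is a prime below 2^32 (so each product a_i*k_i fits in 64 bits) and at most 16 base-m digits. struct univ_hash, univ_pick and univ_eval are hypothetical names, and rand() is used only for illustration; it is not a high-quality source of randomness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DIGITS 16  /* largest r + 1 this sketch supports */

/* One member h_a of the family: m prime, a = <a_0, ..., a_r>. */
struct univ_hash {
    uint64_t m;              /* prime number of slots */
    unsigned ndigits;        /* r + 1 */
    uint64_t a[MAX_DIGITS];  /* digits a_i in {0, ..., m-1} */
};

/* Choose h_a at random from H by picking each a_i from {0, ..., m-1}.
   rand() % m is only approximately uniform; fine for illustration. */
void univ_pick(struct univ_hash *h, uint64_t m, unsigned ndigits) {
    assert(m >= 2 && m <= UINT32_MAX);
    assert(ndigits >= 1 && ndigits <= MAX_DIGITS);
    h->m = m;
    h->ndigits = ndigits;
    for (unsigned i = 0; i < ndigits; i++)
        h->a[i] = (uint64_t)rand() % m;
}

/* h_a(k) = (sum of a_i * k_i) mod m, where k_i are the base-m digits of k.
   Requires k < m^(r+1), i.e. k has at most ndigits digits. */
uint64_t univ_eval(const struct univ_hash *h, uint64_t k) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < h->ndigits; i++) {
        uint64_t k_i = k % h->m;  /* next base-m digit of k */
        k /= h->m;
        sum = (sum + h->a[i] * k_i) % h->m;
    }
    assert(k == 0);  /* k had more digits than the family supports */
    return sum;
}
```

As a check of universality: for m=7 and r=1 (|H|=49), enumerating all 49 vectors a shows that exactly 7 = |H|/m of them send the keys 20 = <6,2> and 33 = <5,4> to the same slot.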

Perfect hashing:

Given n keys, construct a static hash table of size m=O(n) such that search takes O(1) time in the worst case.

Idea: use a two-level scheme with universal hashing at both levels, and no collisions at level 2.

Level 1: if n_i items hash to level-1 slot i, use m_i=n_i^2 slots in the level-2 table S_i.

Each level-1 slot stores m_i, the index of its level-2 universal hash function, and a pointer to the level-2 table.

Level 2: to make sure there are no collisions at this level, just try a few level-2 hash functions chosen at random. A good one is found quickly, since with m_i=n_i^2 at least half of the functions are collision-free.
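A minimal C sketch of the level-2 step only, under stated assumptions: keys are distinct integers below the prime P=2^31-1, a bucket holds at most 8 keys, and, because m_i=n_i^2 is generally not prime, it swaps in the universal family h(k)=((a*k+b) mod P) mod m_i described in CLRS instead of the h_a family above. struct level2, l2_build and l2_contains are hypothetical names; rand() is for illustration only. The level-1 table is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define P UINT64_C(2147483647)           /* prime 2^31 - 1; keys assumed < P */
#define MAX_BUCKET 8                      /* largest n_i this sketch supports */
#define MAX_L2 (MAX_BUCKET * MAX_BUCKET)  /* m_i = n_i^2 */
#define EMPTY_SLOT UINT64_MAX             /* marks an unused level-2 slot */

/* A level-2 table S_i: its hash parameters plus m_i slots. */
struct level2 {
    uint64_t a, b, m;  /* h(k) = ((a*k + b) mod P) mod m */
    uint64_t slot[MAX_L2];
};

static uint64_t l2_hash(const struct level2 *t, uint64_t k) {
    return ((t->a * k + t->b) % P) % t->m;  /* a, k < 2^31: no overflow */
}

/* Build a collision-free table for keys[0..n-1] with m = n^2 slots by
   retrying random (a, b). Keys must be distinct, or this never finishes.
   Returns the number of attempts used. */
unsigned l2_build(struct level2 *t, const uint64_t *keys, unsigned n) {
    assert(n >= 1 && n <= MAX_BUCKET);
    t->m = (uint64_t)n * n;
    for (unsigned attempt = 1;; attempt++) {
        t->a = 1 + (uint64_t)rand() % (P - 1);  /* a in {1, ..., P-1} */
        t->b = (uint64_t)rand() % P;            /* b in {0, ..., P-1} */
        for (uint64_t s = 0; s < t->m; s++)
            t->slot[s] = EMPTY_SLOT;
        bool ok = true;
        for (unsigned j = 0; j < n && ok; j++) {
            uint64_t s = l2_hash(t, keys[j]);
            if (t->slot[s] != EMPTY_SLOT)
                ok = false;  /* collision: discard this function and retry */
            else
                t->slot[s] = keys[j];
        }
        if (ok)
            return attempt;
    }
}

/* Worst-case O(1) lookup: one hash and one comparison. */
bool l2_contains(const struct level2 *t, uint64_t k) {
    return t->slot[l2_hash(t, k)] == k;
}
```

If at least half of the functions are collision-free, as stated above, the number of attempts is geometric with expectation at most 2.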