Notes on Algorithms ---- Data Structures ---- Hash Tables

A table S holds n records.

Each record has the following structure:

struct {
    key;              // the key that identifies the record
    satellite data;   // the remaining data carried along with the key
}

x is a pointer to one record.

Operations

- Insert(S, x);  // S <-- S ∪ {x}

- Delete(S, x);  // S <-- S - {x}

- Search(S, k);  // return x such that key[x] = k, or nil if no such x exists. (Note: the second parameter is a key k, not a pointer x.)


Direct Access Table:

Suppose keys are drawn from U = {0, 1, ..., m-1} and are distinct.

Set up an array T[0..m-1] to represent the dynamic set S:


T[k]=x if x∈S and key[x]=k

T[k]=nil otherwise

Each operation takes Θ(1) time.
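As a minimal sketch of the direct-access table described above, assuming integer keys in {0, 1, ..., m-1}: the class name `DirectAccessTable` and the use of `None` for nil are choices made for this illustration only.

```python
class DirectAccessTable:
    """Direct-access table over the key universe {0, 1, ..., m-1}."""

    def __init__(self, m):
        # T[k] is None (nil) when no record with key k is stored.
        self.slots = [None] * m

    def insert(self, key, data):
        # The key itself is the array index, so this is Θ(1).
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]


table = DirectAccessTable(8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
print(table.search(3))  # (3, 'three')
print(table.search(5))  # None
```

Note that the array always has m entries, however few records are stored.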

Disadvantage: if the universe U is very large and only a small subset of U is actually stored in S, most of the table is wasted space. Furthermore, if the keys are not numbers, they cannot be used directly as array indices. So we need another data structure: the hash table.


Hashing:

m: the number of slots in the table.

n: the number of records to be stored in the slots.


A hash function h maps keys "randomly" into the slots of table T. When a record to be inserted maps to an already occupied slot, a collision occurs.


Resolving collisions 

The first method --- by chaining:

Idea: link the records that hash to the same slot into a list.

Analysis:

Worst case: every key hashes to the same slot, so access takes Θ(n) time.

Average case: assume simple uniform hashing:

- each key is equally likely to be hashed to any slot in T, independent of where the other keys are hashed.

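Under this assumption, for two distinct keys k_i and k_j: P{h(k_i) = h(k_j)} = m * (1/m) * (1/m) = 1/m  (the sum, over the m slots, of the probability that both keys land in that slot).

For any particular slot, for example slot 2: P{h(k_i) = 2} = 1/m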

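The load factor of a hash table with n keys and m slots is α = n/m = "average number of keys per slot".

So the expected unsuccessful search time is Θ(1+α): Θ(1) to apply the hash function and reach the slot, plus Θ(α) to scan the list.

The expected successful search time has the same Θ(1+α) bound, so the expected search time is Θ(1) if α = O(1), i.e. if n = O(m).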
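Chaining can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: keys are non-negative integers, the hash function is simply `key % m`, and the class and method names are invented for this example.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one list (chain) per slot

    def _h(self, key):
        # Placeholder hash function (assumption): division method.
        return key % self.m

    def insert(self, key, data):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite
                chain[i] = (key, data)
                return
        chain.append((key, data))

    def search(self, key):
        # Scans only the chain of one slot: expected Θ(1 + α).
        for k, data in self.slots[self._h(key)]:
            if k == key:
                return data
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, d) for k, d in self.slots[idx] if k != key]

    def load_factor(self):
        n = sum(len(chain) for chain in self.slots)
        return n / self.m


t = ChainedHashTable(4)
for key in (1, 5, 9):          # all three keys hash to slot 1
    t.insert(key, str(key))
print(len(t.slots[1]))         # 3
print(t.search(5))             # 5
print(t.load_factor())         # 0.75
```

The three colliding keys end up in one chain, which is the Θ(n) worst case in miniature.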

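The second method --- by open addressing --- no storage for links.

- Probe the table systematically until an empty slot is found.

- h: U × {0, 1, ..., m-1} ---> {0, 1, ..., m-1}

U: universe of keys

first {0, 1, ..., m-1}: probe number

second {0, 1, ..., m-1}: slot

- The table may fill up, so we must make sure n <= m.

- Deletion is difficult (but not impossible): simply emptying a slot would break the probe sequences of keys that were inserted past it, so a deleted slot is usually marked with a special "deleted" value instead.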

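Probing strategies:

- Linear probing: h(k,i) = (h'(k) + i) mod m. It suffers from "primary clustering": long runs of occupied slots build up and keep growing.

- Double hashing: h(k,i) = (h_1(k) + i*h_2(k)) mod m

To guarantee that the probe sequence visits the whole table, h_2(k) must be relatively prime to m. There are two common ways to achieve this.

Method 1: pick m = 2^r and design h_2(k) to always be odd.

Method 2: pick m prime and m' a positive integer slightly less than m, and set h_2(k) = 1 + (k mod m').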
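Open addressing with double hashing (Method 2) might look like the sketch here. The values m = 13 and m' = 11, the choice h_1(k) = k mod m, the `DELETED` marker and all names are assumptions made for the example; keys are taken to be non-negative integers.

```python
EMPTY = None
DELETED = object()  # marker for a deleted slot (an assumption of this sketch)


class DoubleHashTable:
    def __init__(self, m=13, m_prime=11):
        self.m = m                # prime table size
        self.m_prime = m_prime    # slightly smaller than m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        h1 = key % self.m
        h2 = 1 + (key % self.m_prime)   # never 0, so the probe always moves
        for i in range(self.m):
            yield (h1 + i * h2) % self.m

    def insert(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                   # key cannot appear past an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx    # reusable, but keep looking for key
            elif slot == key:
                return idx              # already present
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None             # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return idx
        return None

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED   # do not reset to EMPTY


t = DoubleHashTable()
for key in (14, 27, 40):       # all have h_1(k) = 1
    t.insert(key)
t.delete(14)                   # slot 1 becomes DELETED, not EMPTY
print(t.search(27))            # 7
print(t.search(14))            # None
```

Because slot 1 is marked deleted rather than emptied, the search for 27 continues past it and still finds the key in slot 7.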


Choosing a hash function:

Division method:

h(k) = k mod m

Notes: 1. Don't pick m with a small divisor d, such as d = 2.

Example: if d = 2 (m is even) and all keys are even, then the odd-numbered slots are never used.

2. Don't pick m = 2^r, because then the hash depends only on the r low-order bits of k, not on all of its bits.

3. Pick m to be a prime not too close to a power of 2 or of 10.
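The first two notes can be checked with a small experiment. The function name `h_div` and the sample keys are chosen only for this illustration.

```python
def h_div(k, m):
    """Division-method hash: h(k) = k mod m."""
    return k % m


even_keys = range(0, 40, 2)    # 20 even keys: 0, 2, ..., 38

# m = 8 has the divisor 2, so even keys only reach the even slots.
used_with_m8 = sorted({h_div(k, 8) for k in even_keys})
# m = 7 is prime, so the same keys spread over every slot.
used_with_m7 = sorted({h_div(k, 7) for k in even_keys})
print(used_with_m8)            # [0, 2, 4, 6]
print(used_with_m7)            # [0, 1, 2, 3, 4, 5, 6]

# m = 2^3: keys that agree on their 3 low-order bits always collide.
print(h_div(0b10101011, 8), h_div(0b00000011, 8))   # 3 3
```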

Multiplication method:

Conditions: 1. m = 2^r.     2. The computer has w-bit words.

Then: h(k) = (A*k mod 2^w) rsh (w-r), where A is an odd integer with 2^(w-1) < A < 2^w and rsh is a bitwise right shift.

Don't pick A too close to 2^(w-1) or 2^w.

Multiplication modulo 2^w is much faster than division, and rsh is also fast.

It works like a modular wheel: A determines where the wheel stops.
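A small worked instance of the multiplication method, assuming a toy word size of w = 7 bits and m = 2^3 slots. The function name `h_mul` and the particular A and k are illustrative choices.

```python
def h_mul(k, A, w, r):
    """Multiplication-method hash: (A*k mod 2^w) rsh (w - r)."""
    # Keep the low w bits of the product, then take the top r of those bits.
    return ((A * k) % (1 << w)) >> (w - r)


A = 0b1011001    # 89: odd, and between 2^6 and 2^7
k = 0b1101011    # 107
print(h_mul(k, A, w=7, r=3))   # 3
```

By hand: A*k = 9523, 9523 mod 2^7 = 51 = 0110011 in binary, and shifting right by w - r = 4 bits leaves 011 = 3.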

Universal hashing:

For any fixed hash function, an adversary can construct a set of keys that all hash to the same slot. The remedy is randomization: choose the hash function at random, independently of the keys.

Definition: let U be a universe of keys, and let H be a finite collection of hash functions, each mapping U to {0, 1, ..., m-1}.

We say H is universal if for any x, y ∈ U with x != y the following holds:

|{h ∈ H : h(x) = h(y)}| = |H|/m

(That is, of the |H| hash functions in H, only |H|/m make x and y collide, so if h is chosen uniformly at random from H, the chance that x and y collide is 1/m.)

Theorem: choose h randomly from H, and suppose h is used to hash n keys into the m slots of table T. Then, for a given key x,

E[# collisions with x] < n/m
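The bound follows from a short indicator-variable argument, sketched here. The names C_x (number of keys in T that collide with x) and c_{xy} are introduced for this sketch, T stands for the set of keys in the table, and x is assumed to be one of the n keys.

```latex
\begin{align*}
c_{xy} &=
  \begin{cases}
    1 & \text{if } h(x) = h(y) \\
    0 & \text{otherwise}
  \end{cases},
  \qquad
  E[c_{xy}] = \Pr\{h(x) = h(y)\} = \frac{1}{m} \\
E[C_x] &= E\Big[\sum_{y \in T \setminus \{x\}} c_{xy}\Big]
        = \sum_{y \in T \setminus \{x\}} E[c_{xy}]
        = \frac{n-1}{m} < \frac{n}{m}
\end{align*}
```

The first line uses the definition of universality; the second uses linearity of expectation.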

Constructing a universal set of hash functions:

Let m be prime. Decompose key k into r+1 digits: k = <k_0, k_1, ..., k_r> where 0 <= k_i <= m-1.

(That is, write k in base m; the digits can be extracted by repeatedly taking k mod m and the integer quotient k/m.)

Pick a = <a_0, a_1, ..., a_r> (also in base m), where each a_i is chosen randomly from {0, 1, ..., m-1}.

(This step is where randomness enters the algorithm.)

Define h_a(k) = (∑ a_i*k_i) mod m, where the sum runs over i = 0, ..., r.

The size of H is m^(r+1), one function for each possible a.
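The construction can be sketched and checked by brute force for a small case. The values m = 7 and r = 2 and the helper names `digits_base_m` and `h_a` are assumptions for this example; the count at the end enumerates every a to confirm that exactly |H|/m functions make one particular pair of keys collide.

```python
import itertools
import random


def digits_base_m(k, m, r):
    """Return the r+1 base-m digits <k_0, ..., k_r> of k, low digit first."""
    digits = []
    for _ in range(r + 1):
        digits.append(k % m)
        k //= m
    return digits


def h_a(a, k, m):
    """h_a(k) = (sum of a_i * k_i) mod m, with a = <a_0, ..., a_r>."""
    ks = digits_base_m(k, m, len(a) - 1)
    return sum(ai * ki for ai, ki in zip(a, ks)) % m


m, r = 7, 2                      # keys must be below m^(r+1) = 343
a = [random.randrange(m) for _ in range(r + 1)]   # the random choice
slot = h_a(a, 25, m)             # some slot in 0..6, depending on a

# Brute-force check of universality for one pair x != y.
x, y = 10, 25
colliding = sum(
    1
    for cand in itertools.product(range(m), repeat=r + 1)
    if h_a(cand, x, m) == h_a(cand, y, m)
)
print(colliding, m ** (r + 1) // m)   # 49 49
```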

Perfect hashing:

Given n keys, construct a static hash table of size m = O(n) such that a search takes O(1) time in the worst case.

Idea: use a 2-level scheme with universal hashing at both levels, with no collisions at level 2.

Level 1: if n_i items hash to level-1 slot i, then use m_i = n_i^2 slots in the level-2 table S_i.

Each level-1 slot stores m_i, the parameters of the universal hash function chosen for that slot, and a pointer to the level-2 table.

Level 2: to make sure there are no collisions at this level, just try a few hash functions chosen at random from the universal set. One is found quickly, because with m_i = n_i^2 slots at least half of the functions cause no collision among the n_i keys.
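A compact sketch of the two-level scheme follows, with several simplifications that should be read as assumptions. Keys are non-negative integers below the prime P. The hash family used is ((a*k + b) mod P) mod m rather than the digit-based construction in these notes, because it works for any table size. The level-1 function is picked once, without re-trying to keep the total space small. All names are hypothetical.

```python
import random

P = 2_147_483_647   # the prime 2^31 - 1; keys are assumed to lie in [0, P)


def random_hash(m, rng):
    """Pick random parameters (a, b) for a table with m slots."""
    return (rng.randrange(1, P), rng.randrange(P), m)


def apply_hash(params, k):
    a, b, m = params
    return ((a * k + b) % P) % m


def build_perfect_table(keys, seed=0):
    rng = random.Random(seed)
    keys = list(set(keys))          # static set of distinct keys, non-empty
    n = len(keys)
    level1 = random_hash(n, rng)    # level-1 table with m = n slots
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[apply_hash(level1, k)].append(k)

    level2 = []
    for bucket in buckets:
        m_i = len(bucket) ** 2      # m_i = n_i^2 slots at level 2
        while True:                 # retry until no collision occurs
            params = random_hash(m_i, rng) if m_i else None
            slots = [None] * m_i
            collision = False
            for k in bucket:
                j = apply_hash(params, k)
                if slots[j] is not None:
                    collision = True
                    break
                slots[j] = k
            if not collision:
                break
        level2.append((params, slots))
    return level1, level2


def search(table, k):
    """Worst-case O(1): one level-1 hash and one level-2 hash."""
    level1, level2 = table
    params, slots = level2[apply_hash(level1, k)]
    if not slots:
        return False
    return slots[apply_hash(params, k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys)
print(all(search(table, k) for k in keys))   # True
print(search(table, 11))                     # False
```

Each level-2 retry succeeds with probability at least 1/2, so the inner loop is expected to run only a couple of times per slot.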
