Hash Table - Reviews


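This post introduces the basic concepts of hash tables, how they work, and how to implement them, including key techniques such as hash functions and collision handling. A simple C++ example demonstrates the insert, retrieve, and delete operations.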

Intro

The concept, usage, and implementation of hash tables come up constantly in software engineer interviews. Google's interview guidance lists hash tables as a requirement and says: "Hashtables: Arguably the single most important data structure known to mankind." There is a lot to know about hash tables (hash functions, collisions, etc.), but a short interview cannot test all of it thoroughly. With that in mind, in this post I'd like to go over the basics of hash tables and try to implement some sample code.

What is a hash table?

This is a very common interview question. In my own words: "A hash table is a data structure that stores key-value pairs. A hash function maps each key to the index where its value is stored, so a value can be looked up by key in O(1) time on average."

You can find many definitions of a hash table. Generally speaking, you can picture it as an array. Normally we access an array element by index, e.g. A[1], A[2]. In a hash table we access an element by key instead, e.g. A["Monday"], D["Mary"]. Its great advantage is lookup speed (O(1) time on average).
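To make key-based access concrete, here is a minimal sketch using std::unordered_map, the hash table in the C++ standard library. The function name demo_key_access and the stored values are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical demo: access elements by key instead of by numeric index.
void demo_key_access() {
    std::unordered_map<std::string, int> A;

    A["Monday"] = 1;    // insert by key
    A["Tuesday"] = 2;

    assert(A["Monday"] == 1);             // look up by key
    assert(A.find("Sunday") == A.end());  // a key that was never inserted
    assert(A.size() == 2);
}
```

Note that find is used for the missing key: operator[] would insert a default value for a key that is not present.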

How does a hash table work?

First, hash tables can be built on many data structures, e.g. a linked list, an array combined with linked lists, a binary search tree, etc. The idea is to store the <key, value> pairs and provide a way to access them. To make this concrete, consider an array in which each <key, value> pair is placed at a specific position. Locating a <key, value> pair from its key is called hashing. A hash function takes the key as input and outputs the location of the <key, value> pair in the array. A simple hash function uses the "mod" operation: "key mod array size" gives the hash, which is the index of the desired value.
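The "key mod array size" idea can be sketched as a one-line function. The name slot_for is hypothetical, and the sketch assumes non-negative integer keys, because in C++ the % operator can return a negative result for a negative operand.

```cpp
#include <cassert>

// Hypothetical helper: map a non-negative integer key to a slot index
// in a table with table_size slots.
int slot_for(int key, int table_size) {
    assert(key >= 0 && table_size > 0);  // precondition assumed by this sketch
    return key % table_size;             // always in the range [0, table_size)
}
```

With a table of size 5, slot_for(12, 5) is 2 and slot_for(29, 5) is 4.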

An example

Let's see a simple example.

We have a storage of size 5:

idx  key  value
0    -1   0
1    -1   0
2    -1   0
3    -1   0
4    -1   0

key=-1 means the slot is empty.

The hash function is hash(key) = key % 5;

First we insert <12, 12> (first is key, second is value)

Compute the hash(12) = 2;

Store the <key, value> into the storage of idx 2.

idx  key  value
0    -1   0
1    -1   0
2    12   12
3    -1   0
4    -1   0

Next we insert <29,29>, hash(29)=4;

idx  key  value
0    -1   0
1    -1   0
2    12   12
3    -1   0
4    29   29

Then we insert <27,27>, where the hash code is 2. When we check the location 2, it is already in use.

This is called a collision: different keys are mapped to the same hash code. There are many ways to deal with collisions, such as chaining (keep a linked list at each location) and rehashing (use a second function to map to another location, an approach also known as open addressing). You should know at least these two methods.
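For comparison with the rehashing approach used in the rest of this post, here is a minimal sketch of chaining. The class name ChainedTable and its methods are hypothetical, and keys are assumed to be non-negative integers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical chained table: each slot holds a linked list of <key, value> pairs.
class ChainedTable {
    std::vector<std::list<std::pair<int, int>>> buckets;

    std::size_t slot_of(int key) const {
        assert(key >= 0);  // assumption of this sketch
        return static_cast<std::size_t>(key) % buckets.size();
    }

public:
    explicit ChainedTable(std::size_t size) : buckets(size) { assert(size > 0); }

    void insert(int key, int value) {
        auto& chain = buckets[slot_of(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }  // overwrite existing key
        }
        chain.push_back({key, value});  // colliding keys share the same chain
    }

    // Returns true and writes the value to out if key is present.
    bool find(int key, int& out) const {
        for (const auto& kv : buckets[slot_of(key)]) {
            if (kv.first == key) { out = kv.second; return true; }
        }
        return false;
    }

    std::size_t chain_length(std::size_t slot) const { return buckets.at(slot).size(); }
};
```

With 5 slots, keys 12 and 27 both hash to slot 2 and simply end up in the same list, so no second location is needed.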

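Here we use rehashing.

The rehashing function takes the current location (not the original key) and returns the next location to try: rehash(idx) = (idx+1)%5. Stepping to the next slot like this is commonly called linear probing.

Continuing the step above, rehash(2) = 3; location 3 is empty, so we store <27,27> at location 3.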

idx  key  value
0    -1   0
1    -1   0
2    12   12
3    27   27
4    29   29

If we further insert <32,32>, hash(32) = 2; location 2 is in use. rehash(2) = 3, and location 3 is also in use.

So we rehash again: rehash(3) = 4, which is not available; rehash(4) = 0, OK! Store <32,32> in slot 0.

idx  key  value
0    32   32
1    -1   0
2    12   12
3    27   27
4    29   29

That is the basic insert operation for a hash table.
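The insert walkthrough above can be replayed with a small sketch that tracks only the keys. The names kSize, kEmpty, and probe_insert are hypothetical; the hash and rehash steps are the ones from the example (key % 5, then (idx+1) % 5).

```cpp
#include <array>
#include <cassert>

const int kSize = 5;    // table size used in the example
const int kEmpty = -1;  // marker for an empty slot

// Hypothetical helper: insert key using hash(key) = key % kSize and
// rehash(idx) = (idx + 1) % kSize. Returns the slot used, or -1 if the table is full.
int probe_insert(std::array<int, kSize>& keys, int key) {
    assert(key >= 0);  // negative keys would clash with the empty marker
    int idx = key % kSize;
    for (int tries = 0; tries < kSize; ++tries) {
        if (keys[idx] == kEmpty) {
            keys[idx] = key;
            return idx;
        }
        idx = (idx + 1) % kSize;  // slot taken: move on to the next one
    }
    return -1;  // every slot was probed and none was free
}
```

Starting from an array filled with -1 and inserting 12, 29, 27, and 32 in that order places them in slots 2, 4, 3, and 0, matching the tables above.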

To retrieve a value, e.g. the value for key 27, compute hash(27) = 2 and check the key stored at location 2. It is 12 != 27, so rehashing is needed: rehash(2) = 3. The key stored there is 27, so we return the value 27.
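Retrieval follows the same probe sequence as insertion. Below is a minimal sketch; the name find_slot is hypothetical. It assumes that -1 marks a slot that has never been used, so reaching such a slot means the key is absent. That shortcut only holds if deleted entries get a different marker.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: return the slot holding key, or -1 if key is not stored.
int find_slot(const std::array<int, 5>& keys, int key) {
    assert(key >= 0);
    const int size = static_cast<int>(keys.size());
    int idx = key % size;                // hash(key)
    for (int tries = 0; tries < size; ++tries) {
        if (keys[idx] == key) return idx;
        if (keys[idx] == -1) return -1;  // never-used slot: key cannot be further along
        idx = (idx + 1) % size;          // rehash(idx)
    }
    return -1;                           // probed every slot without a match
}
```

On the final table from the example (keys 32, -1, 12, 27, 29 in slots 0 to 4), looking up 27 checks slot 2 and then finds it in slot 3.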

A simple implementation (in C++)

 
    #include <iostream>

    const int sz = 5;  // number of slots in the table

    struct data {
        int id;   // key (must be >= 0); -1 marks an empty slot, -2 a deleted one
        int val;  // value
    };

    class Hashtable {
        data dt[sz];  // fixed-size storage
        int numel;    // number of stored elements
    public:
        Hashtable();
        int hash(int id) const;
        int rehash(int idx) const;
        int insert(const data &d);
        int remove(const data &d);
        int retrieve(int id) const;
        void output() const;
    };

    // Mark every slot as empty.
    Hashtable::Hashtable() {
        for (int i = 0; i < sz; i++) {
            dt[i].id = -1;
            dt[i].val = 0;
        }
        numel = 0;
    }

    // Map a non-negative key to its home slot.
    int Hashtable::hash(int id) const {
        return id % sz;
    }

    // Given the current slot, return the next slot to try.
    int Hashtable::rehash(int idx) const {
        return (idx + 1) % sz;
    }

    // Returns 0 on success, -1 if the table is full or the key is negative.
    int Hashtable::insert(const data &d) {
        if (numel >= sz || d.id < 0) {
            return -1;
        }
        int hashid = hash(d.id);
        if (dt[hashid].id != -1 && dt[hashid].id != -2) {
            std::cout << "collision! rehashing..." << std::endl;
        }
        // Probe at most sz slots, starting from the home slot.
        for (int i = 0; i < sz; i++) {
            if (dt[hashid].id == -1 || dt[hashid].id == -2) {
                dt[hashid].id = d.id;
                dt[hashid].val = d.val;
                numel++;
                return 0;
            }
            hashid = rehash(hashid);
        }
        return -1;
    }

    // Returns 0 if the key was found and removed, -1 otherwise.
    int Hashtable::remove(const data &d) {
        if (d.id < 0) {
            return -1;
        }
        int hashid = hash(d.id);
        for (int i = 0; i < sz; i++) {
            if (dt[hashid].id == d.id) {
                // Mark as deleted (-2), not empty (-1), so that lookups of
                // keys stored further along the probe sequence keep going.
                dt[hashid].id = -2;
                dt[hashid].val = 0;
                numel--;
                return 0;
            }
            if (dt[hashid].id == -1) {
                break;  // never-used slot: the key is not in the table
            }
            hashid = rehash(hashid);
        }
        return -1;
    }

    // Returns the stored value, or 0 if the key is not found
    // (so a stored value of 0 cannot be told apart from "not found").
    int Hashtable::retrieve(int id) const {
        if (id < 0) {
            return 0;
        }
        int hashid = hash(id);
        for (int i = 0; i < sz; i++) {
            if (dt[hashid].id == id) {
                return dt[hashid].val;
            }
            if (dt[hashid].id == -1) {
                break;  // never-used slot: the key is not in the table
            }
            hashid = rehash(hashid);
        }
        return 0;
    }

    // Print every slot, including empty and deleted ones.
    void Hashtable::output() const {
        std::cout << "idx  id  val" << std::endl;
        for (int i = 0; i < sz; i++) {
            std::cout << i << "    " << dt[i].id << "    " << dt[i].val << std::endl;
        }
    }

    int main() {
        Hashtable hashtable;
        data d;

        d.id = 27;
        d.val = 27;
        hashtable.insert(d);
        hashtable.output();

        d.id = 99;
        d.val = 99;
        hashtable.insert(d);
        hashtable.output();

        d.id = 32;
        d.val = 32;
        hashtable.insert(d);
        hashtable.output();

        d.id = 77;
        d.val = 77;
        hashtable.insert(d);
        hashtable.output();

        // retrieve data
        int id = 77;
        int val = hashtable.retrieve(id);
        std::cout << std::endl;
        std::cout << "Retrieving ... " << std::endl;
        std::cout << "hashtable[" << id << "]=" << val << std::endl;
        std::cout << std::endl;

        // delete element
        d.id = 32;
        d.val = 32;
        hashtable.remove(d);
        hashtable.output();

        d.id = 77;
        d.val = 77;
        hashtable.remove(d);
        hashtable.output();

        return 0;
    }