Hash Table - Reviews

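This post introduces the basic concepts of hash tables, how they work, and how to implement them, including key techniques such as hash functions and collision handling. It demonstrates insertion, retrieval, and deletion with a simple C++ example.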

Intro

The concept, usage, and implementation of hash tables come up again and again in software engineering interviews. Google's interview guidance lists hash tables as a requirement and says: "Hashtables: Arguably the single most important data structure known to mankind." There is a great deal of knowledge and technique around hash tables (hash functions, collisions, etc.), but a short interview cannot test all of it thoroughly. Taking advantage of that, in this post I'd like to review the basics of hash tables and try to implement some sample code.

What is a Hash Table?

This is a very common question in IT interviews. In my own words: "A hash table is a data structure that stores key-value pairs. A value can be accessed by its key in O(1) time on average, and a hash function is used to map the key to the index where the value is stored."

You can find many definitions of a hash table. Generally speaking, you can think of a hash table as an array. Normally we access an array element by its index, e.g. A[1], A[2]. In a hash table, we access an element by its key, e.g. A["Monday"], D["Marry"]. Its great advantage is lookup speed: O(1) time on average.
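
To make the array analogy concrete, here is a minimal sketch using std::unordered_map, the hash-table container in the C++ standard library. The helper names (make_day_table, value_or, demo) and the day numbers are made up for this illustration and are not part of the original post.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sample data: day names mapped to arbitrary day numbers.
std::unordered_map<std::string, int> make_day_table() {
    return {{"Monday", 1}, {"Tuesday", 2}, {"Wednesday", 3}};
}

// Looks up `key`; returns `fallback` when the key is not in the table.
// find() does not insert anything, unlike operator[] on a missing key.
int value_or(const std::unordered_map<std::string, int>& table,
             const std::string& key, int fallback) {
    auto it = table.find(key);
    return it == table.end() ? fallback : it->second;
}

// Small self-check showing access by key instead of by index.
void demo() {
    std::unordered_map<std::string, int> A = make_day_table();
    assert(A["Monday"] == 1);                 // access by key, like A[1] on an array
    assert(value_or(A, "Sunday", -1) == -1);  // missing key, nothing inserted
    assert(A.size() == 3);
}
```

Calling demo() from main runs the checks. The point is only that the key "Monday" plays the role an integer index plays for a plain array.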

How does a Hash Table work?

First, hash tables can be built on top of many data structures, e.g. a linked list, an array combined with linked lists, a binary search tree, etc. The idea is to store the <key, value> pair and provide a way to access it. For a better understanding, just consider an array in which each <key, value> pair is placed at a specific position. Locating the <key, value> pair from its key is called hashing. A hash function takes the key as input and outputs the location of the <key, value> pair in the array. A simple hash function uses the "mod" operation: "key mod array size" gives the hash, which is the index of the desired value.
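
As a rough sketch of the "key mod array size" idea, the function below maps an integer key to a slot index. The name mod_hash and the handling of negative keys are assumptions added for illustration. The post itself only uses non-negative keys.

```cpp
#include <cassert>

// Maps an integer key to a slot index in [0, table_size).
// Assumes table_size > 0. In C++ the % operator can produce a negative
// remainder for a negative key, so the result is shifted back into range.
int mod_hash(int key, int table_size) {
    assert(table_size > 0);
    int r = key % table_size;
    if (r < 0) {
        r += table_size;
    }
    return r;
}
```

With a table of size 5, mod_hash(12, 5) gives 2 and mod_hash(29, 5) gives 4.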


An example

Let's see a simple example.
We have a storage of  size 5:
idx      key       value
0         -1           0
1         -1           0
2         -1           0
3         -1           0
4         -1           0
key=-1 means the slot is empty.
The hash function is   hash(key) = key % 5;
First we insert <12, 12>  (first is key, second is value)
Compute the hash(12) = 2;
Store the <key, value> into the storage of idx 2.
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         -1           0
4         -1           0
Next we insert <29,29>, hash(29)=4;
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         -1           0
4         29          29
Then we insert <27,27>, whose hash code is 2. When we check location 2, it is already in use.
This is called a collision: different keys are mapped to the same hash code. There are many ways to deal with collisions, such as chaining (keep a linked list at each location) and rehashing (use a second function to map to another location). Usually we need to know at least these two methods.
Here we use rehashing.
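
For comparison, the chaining method mentioned above could look roughly like the sketch below. It is not the approach used in the rest of this post, and the class name ChainedTable, its member names, and the constant kSlots are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

const int kSlots = 5;  // same table size as the example in this post

// Hypothetical chained table for non-negative int keys and int values.
class ChainedTable {
    std::vector<std::list<std::pair<int, int>>> buckets_;
public:
    ChainedTable() : buckets_(kSlots) {}

    // Inserts or updates; colliding keys simply share one slot's list.
    void put(int key, int value) {
        assert(key >= 0);  // keeps key % kSlots non-negative
        for (auto& kv : buckets_[key % kSlots]) {
            if (kv.first == key) { kv.second = value; return; }
        }
        buckets_[key % kSlots].push_back({key, value});
    }

    // Returns true and writes the value to `out` if the key is present.
    bool get(int key, int& out) const {
        if (key < 0) return false;
        for (const auto& kv : buckets_[key % kSlots]) {
            if (kv.first == key) { out = kv.second; return true; }
        }
        return false;
    }

    // Number of entries chained at one slot.
    std::size_t chain_length(int slot) const { return buckets_[slot].size(); }
};
```

With chaining, keys 12, 27 and 32 would all sit in the list at slot 2 rather than spilling into neighbouring slots.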

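The rehashing function is:  rehash(idx) = (idx+1)%5, where idx is the location that was just found to be in use. (Stepping to the next slot like this is commonly called linear probing.)
So, continuing the step above, rehash(2) = 3; location 3 is empty, so we store <27,27> at location 3.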
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         27          27
4         29          29

If we further insert <32,32>, hash(32) = 2; location 2 is in use, rehash(2) = 3, and location 3 is also in use.
So we rehash again: rehash(3) = 4, not available; rehash(4) = 0, OK! Store <32,32> in slot 0.

idx      key       value
0         32          32
1         -1           0
2         12          12
3         27          27
4         29          29

That is the basic insert operation for a hash table.

To retrieve a value, e.g. to find the value for key 27: hash(27) = 2. The key stored at location 2 is 12 != 27, so rehashing is needed. rehash(2) = 3, and the key stored there is 27, so we return the value 27.
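
The insert and retrieve steps traced above can be sketched compactly as follows. The names (Slot, Table, probe_insert, probe_find, kSize, kEmpty) are chosen for this illustration only, and the sketch leaves out deletion.

```cpp
#include <array>
#include <cassert>

const int kSize = 5;
const int kEmpty = -1;  // same "empty slot" marker as in the tables above

struct Slot { int key = kEmpty; int value = 0; };
using Table = std::array<Slot, kSize>;

int hash_index(int key) { return key % kSize; }
int rehash(int idx) { return (idx + 1) % kSize; }

// Places <key, value> in the first free slot on the probe path.
// Returns the slot used, or -1 if every slot is occupied.
int probe_insert(Table& t, int key, int value) {
    assert(key >= 0);  // negative keys would clash with kEmpty
    int idx = hash_index(key);
    for (int probes = 0; probes < kSize; ++probes) {
        if (t[idx].key == kEmpty) {
            t[idx].key = key;
            t[idx].value = value;
            return idx;
        }
        idx = rehash(idx);
    }
    return -1;
}

// Follows the same probe path; returns the slot holding key, or -1.
int probe_find(const Table& t, int key) {
    if (key < 0) return -1;
    int idx = hash_index(key);
    for (int probes = 0; probes < kSize; ++probes) {
        if (t[idx].key == key) return idx;
        if (t[idx].key == kEmpty) return -1;  // an empty slot ends the search
        idx = rehash(idx);
    }
    return -1;
}
```

Inserting 12, 29, 27 and 32 in that order into an empty Table lands them in slots 2, 4, 3 and 0, matching the tables above, and probe_find for key 27 returns slot 3.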

A simple implementation (in C++)

#include <iostream>

const int sz = 5;        // number of slots in the table
const int EMPTY = -1;    // slot has never been used
const int DELETED = -2;  // slot held an entry that was removed

struct data{
  int id;   // key, must be non-negative (-1 and -2 are reserved markers)
  int val;
};

class Hashtable{
  data dt[sz];
  int numel;
  int find(int id) const;
public:
  Hashtable();
  int hash(int id) const;
  int rehash(int idx) const;
  int insert(const data &d);
  int remove(const data &d);
  int retrieve(int id) const;
  void output() const;
};

Hashtable::Hashtable(){
  for (int i=0;i<sz;i++){
    dt[i].id = EMPTY;
    dt[i].val = 0;
  }
  numel = 0;
}

int Hashtable::hash(int id) const {
  return id%sz;
}

// Step to the next slot, wrapping around at the end of the array.
int Hashtable::rehash(int idx) const {
  return (idx+1)%sz;
}

// Returns the slot that holds id, or -1 if id is not in the table.
int Hashtable::find(int id) const {
  if (id<0){ return -1; }
  int idx = hash(id);
  for (int i=0;i<sz;i++){
    if (dt[idx].id==id){ return idx; }
    // A never-used slot ends the search; a DELETED slot does not,
    // because the entry we want may have been stored past it.
    if (dt[idx].id==EMPTY){ return -1; }
    idx = rehash(idx);
  }
  return -1;
}

// Returns 0 on success, -1 if the table is full or the id is invalid.
int Hashtable::insert(const data &d){
  if (d.id<0 || numel>=sz){ return -1; }
  int idx = hash(d.id);
  for (int i=0;i<sz;i++){
    if (dt[idx].id==EMPTY || dt[idx].id==DELETED){
      dt[idx].id = d.id;
      dt[idx].val = d.val;
      numel++;
      return 0;
    }
    std::cout << "collision! rehashing..." << std::endl;
    idx = rehash(idx);
  }
  return -1;
}

// Returns 0 on success, -1 if the id is not in the table.
int Hashtable::remove(const data &d){
  int idx = find(d.id);
  if (idx<0){ return -1; }
  dt[idx].id = DELETED;
  dt[idx].val = 0;
  numel--;
  return 0;
}

// Returns the stored value, or 0 if the id is not in the table.
int Hashtable::retrieve(int id) const {
  int idx = find(id);
  if (idx<0){ return 0; }
  return dt[idx].val;
}

void Hashtable::output() const {
  std::cout << "idx  id  val" << std::endl;
  for (int i=0;i<sz;i++){
    std::cout << i << "    " << dt[i].id << "    " << dt[i].val << std::endl;
  }
}

int main(){
  Hashtable hashtable;
  data d;
  d.id = 27;
  d.val = 27;
  hashtable.insert(d);
  hashtable.output();

  d.id = 99;
  d.val = 99;
  hashtable.insert(d);
  hashtable.output();

  d.id = 32;
  d.val = 32;
  hashtable.insert(d);
  hashtable.output();

  d.id = 77;
  d.val = 77;
  hashtable.insert(d);
  hashtable.output();

  //retrieve data
  int id = 77;
  int val = hashtable.retrieve(id);
  std::cout << std::endl;
  std::cout << "Retrieving ... " << std::endl;
  std::cout << "hashtable[" << id << "]=" << val << std::endl;
  std::cout << std::endl;

  //delete element
  d.id = 32;
  d.val = 32;
  hashtable.remove(d);
  hashtable.output();

  d.id = 77;
  d.val = 77;
  hashtable.remove(d);
  hashtable.output();

  return 0;
}


Original article:

http://yucoding.blogspot.com/2013/08/re-viewhash-table-basics.html


