Intro:
API: Python: Dict; JAVA: HashMap
Applications: File Systems; Password Verification; Store Optimization
IP Address:
Main Loop |
log - array of log lines(time,IP) i←0 j←0 UpdateAccessList(log, i, j, C) |
UpdateAccessList(log, i, j, C) |
while log[i].time≤ Now(): C[log[i].IP]← C[log[i].IP] + 1 i←i+1 while log[j].time≤ Now()− 3600: C[log[j].IP]← C[log[j].IP]− 1 j←j+1 |
AccessedLastHour(IP, C) |
return C[IP]> 0 |
Direct Addressing
Need a data structure forC
There are 232different IP(v4) addresses
Convert IP to 32-bit integer
Create an integer array A of size 232
Use A[int(IP)]as C[IP]
int(IP) |
return IP[1]·224+IP[2]·216+IP[3]·28+IP[4] |
UpdateAccessList(log, i, j, A) |
while log[i].time≤ Now(): A[int(log[i].IP)]← A[int(log[i].IP)] + 1 i←i+1 while log[j].time≤ Now()− 3600: A[int(log[j].IP)]← A[int(log[j].IP)]− 1 j←j+1 |
AccessedLastHour(IP) |
return A[int(IP)]> 0 |
Asymptotics
UpdateAccessListis O(1)per log line
AccessedLastHouris
O(1)
But need 232memory even for few IPs
IPv6: 2128won’t fit in memory
In general: O(N)memory, N= |S|
List-based Mapping:
Direct addressing requires too much memory
Let’s store only active IPs
Store them in a list
Store only last occurrence of each IP
Keep the order of occurrence
UpdateAccessList(log, i, L) |
while log[i].time≤ Now(): log_line← L.FindByIP(log[i].IP) if log_line!=NULL: L.Erase(log_line) L.Append(log[i]) i←i+1 while L.Top().time≤ Now()− 3600: L.Pop() |
AccessedLastHour(IP, L) |
return L.FindByIP(IP)!=NULL |
Asymptotics
n is number of active IPs
Memory usage is Θ(n)
L.Append,L.Top,L.Popare
Θ(1)
L.Findand L.Eraseare Θ(n)
UpdateAccessListis Θ(n)per log line
AccessedLastHouris Θ(n)
Encoding IPs
Encode IPs with small numbers
I.e. numbers from 0 to 999
Different codes for currently active IPs
Hash Function
De nition |
For any set of objectsS
and any integer |
De nition |
m is called thecardinality of hash function h. |
Desirable Properties
h should be fast to compute.
Different values for different objects.
Direct addressing withO(m)memory.
Want small cardinalitym.
Impossible to have all different values ifnumber of objects|S|is more than m.
Collisions
De nition |
When h(o1) = h(o2)and o1̸=o2, this is acollision. |
Map
Store mapping from objects to other objects:
Filename → location of the file on disk
Student ID
→ student name
Contact name → contact phone number
Definition |
Map from S to V is a data structure with methodsHasKey(O),Get(O),Set(O,v),whereO ∈S,v∈ V. |
h :S
→ {0,1, . . . ,m
−1}
O,O′∈S
v,v′∈V
A ← array ofm
lists (chains) of pairs(O,v)
HasKey(O) |
L ←A[h(O)] if O′== O: return true return false |
Get(O) |
L ←A[h(O)] if O′== O: return v′ return n/a |
Set(O,v) |
L ←A[h(O)] for pin L: if p.O== O: p.v← v return L.Append(O, v) |
Set
De nition |
Set is a data structure with methodsAdd(O),Remove(O),Find(O). |
Examples |
IPs accessed during last hourStudents on campus |
h :S
→ {0,1, . . . ,m
−1}
O,O′∈S
A ← array ofm
lists (chains) of objectsO
Find(O) |
L ←A[h(O)] for O′in L: if O′== O: return true return false |
Add(O) |
L ←A[h(O)] for O′in L: if O′== O: return L.Append(O) |
Remove(O) |
if not Find(O): return L ←A[h(O)] L.Erase(O) |
Hash Table:
Definition |
An implementation of a set or a map usinghashing is called a hash table. |
Programming Language:
Set:
unordered_set inC++
HashSet
in Java
set in Python
Map:
unordered_map inC++
HashMap
in Java
dict in Python
Conclusion
Chaining is a technique to implement ahash table
Memory consumption isO(n+ m)
Operations work in timeO(c+ 1)
How to make bothm andc small?