I. Principles of Hash Table Lookup
Hash table lookup is based on a hash function, which maps keys to specific storage locations. Ideally, a lookup completes in constant time. The core principle is to compute the storage location of record i directly from its key using the hash function, Loc(i) = H(key_i), enabling fast access.
(1) Construction of Hash Functions
Direct Addressing Method
- Principle: Use a linear function of the key as the hash address, such as Hash(key) = a * key + b (where a and b are constants). Distinct keys never collide (for a ≠ 0), but the address space must span the entire key range, so space efficiency is low when the keys are sparse.
- Applicable Scenarios: Suitable for cases where the key distribution is relatively continuous and predictable, such as specific numbering systems.
Division Method
- Principle: Hash(key) = key mod p (where p is an integer, typically a prime number with p ≤ m, and m is the hash table length). This is one of the most commonly used methods for constructing hash functions.
- Applicable Scenarios: Widely used for processing various types of keys, effectively dispersing keys across the hash table.
- Key Points: Choosing an appropriate value for p is critical. It requires comprehensive consideration of the key distribution, execution speed, key length, hash table size, and lookup frequency.
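To illustrate why p is usually chosen to be prime, the sketch below hashes a set of keys that share a common factor, once with a composite modulus and once with a prime one. The key set (multiples of 4) and the helper name bucket_counts are made up for this illustration.

```python
from collections import Counter

def bucket_counts(keys, p):
    """Count how many keys land on each address under Hash(key) = key mod p."""
    return Counter(key % p for key in keys)

# Hypothetical key set: 30 multiples of 4 (all keys share the factor 4).
keys = [4 * i for i in range(30)]

composite = bucket_counts(keys, 12)  # 12 shares the factor 4 with every key
prime = bucket_counts(keys, 13)      # 13 shares no factor with the keys

# Number of distinct addresses used, and size of the fullest address.
print(len(composite), max(composite.values()))
print(len(prime), max(prime.values()))
```

With p = 12 the keys pile up on only 3 addresses (10 keys each), while p = 13 spreads them over all 13 addresses with at most 3 keys each.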
(2) Collision Handling Methods
Open Addressing Method
Linear Probing
- Principle: When a collision occurs, compute the next probe address using H_i = (Hash(key) + d_i) mod m, where d_i = i and 1 ≤ i < m, and search for an empty address to store the element.
- Pros and Cons:
  - Advantages: As long as the hash table is not full, an empty address can always be found to store the collided element.
  - Disadvantages: Prone to the “clustering” phenomenon, which reduces lookup efficiency.
Quadratic Probing
- Principle: H_i = (Hash(key) + d_i) mod m, where m is the hash table length (a prime number of the form 4k + 3) and d_i is the incremental sequence 1², -1², 2², -2², ..., q², -q².
- Advantages: Compared to linear probing, it reduces the “clustering” phenomenon to some extent, improving lookup efficiency.
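The difference between the two probing schemes is easiest to see by listing the addresses each one visits. The following is a minimal sketch, assuming a home address of 3 and m = 11; the function names are illustrative only.

```python
def linear_probe_sequence(home, m):
    """Addresses visited by linear probing: home, home + 1, home + 2, ... (mod m)."""
    return [(home + i) % m for i in range(m)]

def quadratic_probe_sequence(home, m):
    """Addresses visited with increments 0, +1², -1², +2², -2², ... (mod m)."""
    sequence = [home % m]
    for q in range(1, (m - 1) // 2 + 1):
        sequence.append((home + q * q) % m)
        sequence.append((home - q * q) % m)
    return sequence

m = 11  # prime of the form 4k + 3 (k = 2)
print(linear_probe_sequence(3, m))
print(quadratic_probe_sequence(3, m))
# [3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2]
# [3, 4, 2, 7, 10, 1, 5, 8, 9, 6, 0]
```

Linear probing walks through neighbouring addresses, which is what lets runs of occupied slots grow into clusters. The quadratic sequence jumps around the table and, for this m, still reaches every address exactly once.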
Chaining Method
- Principle: Records with the same hash address are linked into a singly linked list, and an array stores the head pointers of the m linked lists.
- Advantages:
  - Keys that are not synonyms (keys with different hash addresses) never collide.
  - No “clustering” phenomenon.
  - Linked list nodes are dynamically allocated, making it suitable for cases where the table length is uncertain.
(3) Hash Table Lookup Process
- Compute the hash function value of the given key to determine its initial storage location in the hash table.
- Check if the location is empty:
  - If empty, the lookup fails.
  - If not empty, compare the key stored at the location with the given key:
    - If they match, the lookup succeeds.
    - If not, compute a new address using the selected collision handling method and continue comparing until the key is found or the lookup fails.
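The lookup steps above can be sketched for a table that resolves collisions with linear probing. This is a minimal illustration that assumes integer keys, Hash(key) = key mod m, and no deletions (deleting from an open-addressing table needs extra bookkeeping that is not covered here). The function names are hypothetical.

```python
def linear_probe_insert(table, key):
    """Store key in the first free slot on its linear probe sequence."""
    m = len(table)
    for i in range(m):
        index = (key % m + i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")

def linear_probe_search(table, key):
    """Return the index of key, or None if the lookup fails."""
    m = len(table)
    for i in range(m):
        index = (key % m + i) % m
        if table[index] is None:   # empty slot: the key cannot be further along
            return None
        if table[index] == key:    # match: the lookup succeeds
            return index
    return None                    # every slot probed without a match

table = [None] * 11
for key in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    linear_probe_insert(table, key)

print(linear_probe_search(table, 29))  # 8 (home address 7 is taken by key 7)
print(linear_probe_search(table, 5))   # None (probing stops at empty slot 10)
```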
II. Python Implementation of Hash Table Lookup
(1) Direct Addressing Method Implementation
def direct_addressing(key, a, b):
    """Return the hash address a * key + b."""
    return a * key + b

# Example Usage
keys = [100, 300, 500, 700, 800, 900]
a, b = 1, 0  # Simplified example; adjust as needed for practical applications
addresses = [direct_addressing(key, a, b) for key in keys]
print(addresses)  # [100, 300, 500, 700, 800, 900]
(2) Division Method Implementation
def division_method(key, p):
    """Return the hash address key mod p."""
    return key % p

# Example Usage
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
p = 11  # Select an appropriate prime number
hash_values = [division_method(key, p) for key in keys]
print(hash_values)  # [3, 7, 7, 0, 5, 4, 0, 8, 3]
(3) Open Addressing Method - Linear Probing Implementation
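def linear_probing(hash_table, key, m):
    """Return the first free address on the linear probe sequence of key."""
    hash_value = key % m
    # Probe at most m slots so that a full table raises instead of looping forever.
    for i in range(m):
        index = (hash_value + i) % m
        if hash_table[index] is None:
            return index
    raise RuntimeError("hash table is full")

# Example Usage
m = 11  # Hash table length
hash_table = [None] * m
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
for key in keys:
    hash_table[linear_probing(hash_table, key, m)] = key
print(hash_table)  # [11, 22, None, 47, 92, 16, 3, 7, 29, 8, None]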
(4) Open Addressing Method - Quadratic Probing Implementation
def quadratic_probing(hash_table, key, m):
    """Return the first free address on the sequence 0, +1², -1², +2², -2², ..."""
    hash_value = key % m
    if hash_table[hash_value] is None:
        return hash_value
    # Stop at the first free slot; try the positive increment before the negative one.
    for i in range(1, m // 2 + 1):
        for offset in (i ** 2, -(i ** 2)):
            index = (hash_value + offset) % m
            if hash_table[index] is None:
                return index
    raise RuntimeError("no free address found on the probe sequence")

# Example Usage
m = 11  # Hash table length, must satisfy the form 4k + 3
hash_table = [None] * m
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
for key in keys:
    hash_table[quadratic_probing(hash_table, key, m)] = key
print(hash_table)  # [11, 22, 3, 47, 92, 16, None, 7, 29, 8, None]
(5) Chaining Method Implementation
class ListNode:
    def __init__(self, key):
        self.key = key
        self.next = None

class HashTableChain:
    def __init__(self, m):
        self.m = m
        self.table = [None] * m  # Head pointers of the m chains

    def hash_function(self, key):
        return key % self.m

    def insert(self, key):
        hash_value = self.hash_function(key)
        if self.table[hash_value] is None:
            self.table[hash_value] = ListNode(key)
        else:
            # Walk to the tail of the chain and append the new node
            current = self.table[hash_value]
            while current.next is not None:
                current = current.next
            current.next = ListNode(key)

    def search(self, key):
        hash_value = self.hash_function(key)
        current = self.table[hash_value]
        while current is not None:
            if current.key == key:
                return True
            current = current.next
        return False

# Example Usage
hash_table_chain = HashTableChain(13)
keys = [19, 14, 23, 1, 68, 20, 84, 27, 55, 11, 10, 79]
for key in keys:
    hash_table_chain.insert(key)
print(hash_table_chain.search(23))  # Output: True
print(hash_table_chain.search(99))  # Output: False
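The chaining example covers insertion and search only. Deletion, which chaining handles by unlinking a node, could be sketched as below. This is a standalone illustration, not part of the class above: the names Node, chain_insert, chain_delete, and chain_keys are hypothetical, and it inserts at the head of each chain rather than at the tail.

```python
class Node:
    """Singly linked list node holding one key."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def chain_insert(table, key):
    """Insert key at the head of its chain (assumption: head insertion)."""
    index = key % len(table)
    table[index] = Node(key, table[index])

def chain_delete(table, key):
    """Unlink the first node holding key; return True if a node was removed."""
    index = key % len(table)
    previous, current = None, table[index]
    while current is not None:
        if current.key == key:
            if previous is None:
                table[index] = current.next   # removing the head of the chain
            else:
                previous.next = current.next  # bypass the matching node
            return True
        previous, current = current, current.next
    return False

def chain_keys(table, index):
    """List the keys stored in one chain, head first."""
    keys, current = [], table[index]
    while current is not None:
        keys.append(current.key)
        current = current.next
    return keys

table = [None] * 13
for key in [19, 14, 23, 1, 68, 20, 84, 27, 55, 11, 10, 79]:
    chain_insert(table, key)

print(chain_keys(table, 1))                           # [79, 27, 1, 14]
print(chain_delete(table, 27), chain_keys(table, 1))  # True [79, 1, 14]
print(chain_delete(table, 99))                        # False
```

Only the affected chain is touched, and no other key has to move.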
III. Performance Analysis
Average Search Length (ASL)
The performance of hash table lookup is mainly measured by the average search length (ASL), which depends on the hash function, the collision handling method, and the load factor α (α = number of records in the table / hash table length).
Impact of Load Factor (α)
- Smaller α: Fewer records in the table, lower probability of collisions, and higher lookup efficiency.
- Larger α: More records in the table, higher probability of collisions, and reduced performance.
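The effect of α can be made concrete with the approximations commonly quoted in textbooks. These formulas are not taken from this article and assume uniformly distributed hash values. The function names below are illustrative.

```python
def asl_linear_probing_success(alpha):
    """Approximate ASL of a successful lookup with linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha))

def asl_linear_probing_failure(alpha):
    """Approximate ASL of an unsuccessful lookup with linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

def asl_chaining_success(alpha):
    """Approximate ASL of a successful lookup with chaining."""
    return 1 + alpha / 2

for alpha in (0.25, 0.5, 0.75, 0.9):
    print(
        alpha,
        round(asl_linear_probing_success(alpha), 2),
        round(asl_linear_probing_failure(alpha), 2),
        round(asl_chaining_success(alpha), 2),
    )
```

Under these approximations the cost of linear probing grows sharply as α approaches 1, while chaining grows only linearly in α.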
Performance Comparison of Collision Handling Methods
- Chaining Method: Performs well in handling collisions. Non-synonymous keys do not collide, and there is no “clustering” phenomenon. Suitable for scenarios with frequent insertions and deletions, offering relatively short average search lengths.
- Open Addressing Method: Quadratic probing is better than linear probing in reducing the “clustering” effect, but overall, open addressing is slightly less efficient than chaining when handling collisions.
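As a small check of this comparison, the sketch below counts the key comparisons needed to find each key from the earlier examples (m = 11) under the three collision handling methods. It assumes successful lookups only, no deletions, and chains that append at the tail. The helper names are hypothetical, and a single small key set is an illustration, not a benchmark.

```python
def open_addressing_probes(keys, m, offsets):
    """Insert keys using the given probe offsets; return comparisons needed to find each key."""
    table = [None] * m
    probes = []
    for key in keys:
        for count, offset in enumerate(offsets, start=1):
            index = (key % m + offset) % m
            if table[index] is None:
                table[index] = key
                probes.append(count)  # a later lookup repeats exactly these probes
                break
        else:
            raise RuntimeError("no free address on the probe sequence")
    return probes

def chaining_probes(keys, m):
    """Comparisons needed to find each key when chains append at the tail."""
    chain_lengths = [0] * m
    probes = []
    for key in keys:
        chain_lengths[key % m] += 1
        probes.append(chain_lengths[key % m])
    return probes

def asl(probes):
    """Average search length over all stored keys."""
    return sum(probes) / len(probes)

keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
m = 11
linear_offsets = list(range(m))
quadratic_offsets = [0] + [s * q * q for q in range(1, (m - 1) // 2 + 1) for s in (1, -1)]

linear = open_addressing_probes(keys, m, linear_offsets)
quadratic = open_addressing_probes(keys, m, quadratic_offsets)
chaining = chaining_probes(keys, m)
print(round(asl(linear), 2), round(asl(quadratic), 2), round(asl(chaining), 2))
# 1.67 1.56 1.33
```

For this key set the totals are 15, 14, and 12 comparisons over 9 keys, which matches the ordering described above.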
Impact of Hash Function Selection
The division method, a commonly used hash function, disperses keys effectively when an appropriate prime p is selected. Different hash functions suit different data distributions and application scenarios, making the choice of a suitable hash function crucial for hash table performance.
Hash tables are an important data structure for efficient data lookup. By carefully selecting the hash function and the collision handling method, and by tuning the load factor α for the specific application scenario, efficient data lookup operations can be achieved. Understanding the principles and implementations of hash table lookup is essential for improving program performance and optimizing data processing workflows.