Hash Table Lookup: Python Implementation and Performance Analysis

I. Principles of Hash Table Lookup

Hash table lookup relies on a hash function that maps each key to a storage location, so that, ideally, a lookup completes in constant time. The core idea is to compute the storage location of record i directly from its key: Loc(i) = H(key_i).

(1) Construction of Hash Functions

Direct Addressing Method

Principle: Use a linear function of the key as the hash address, such as Hash(key) = a * key + b (where a and b are constants). Distinct keys never collide, but the table must span the entire range of possible addresses, so space is wasted when the keys are sparse.
Applicable Scenarios: Suitable for cases where the key distribution is relatively continuous and predictable, such as specific numbering systems.

Division Method

Principle: Hash(key) = key mod p (where p is an integer, typically a prime number with p ≤ m, and m is the hash table length). This is one of the most commonly used methods for constructing hash functions.
Applicable Scenarios: Widely used for processing various types of keys, effectively dispersing keys across the hash table.
Key Points: Choosing an appropriate value for p is critical. It requires comprehensive consideration of the key distribution, execution speed, key length, hash table size, and lookup frequency.
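
To see why the choice of p matters, the following minimal sketch compares a composite modulus with a prime one. The key set (multiples of 12) is an assumption chosen purely for illustration; real key distributions will behave differently.

```python
def division_method(key, p):
    """Division-method hash: the remainder of key divided by p."""
    return key % p

# Illustrative keys (an assumption for this sketch): all multiples of 12.
keys = [12 * k for k in range(1, 9)]  # 12, 24, ..., 96

# A composite modulus that shares a factor with every key collapses them.
composite_addresses = [division_method(key, 12) for key in keys]
# A prime modulus that shares no factor with the keys spreads them out.
prime_addresses = [division_method(key, 11) for key in keys]

print(composite_addresses)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(prime_addresses)      # [1, 2, 3, 4, 5, 6, 7, 8]
```

With p = 12 every key lands on address 0, while p = 11 gives eight distinct addresses for the same keys.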

(2) Collision Handling Methods

Open Addressing Method

Linear Probing

Principle: When a collision occurs, compute the next probe address as H_i = (Hash(key) + d_i) mod m, where d_i = i for 1 ≤ i < m, and keep probing until an empty address is found to store the element.
Pros and Cons:
- Advantages: As long as the hash table is not full, an empty address can always be found to store the collided element.
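- Disadvantages: Prone to the “clustering” phenomenon: keys with different hash addresses end up competing for the same run of occupied slots, which lengthens probe sequences and reduces lookup efficiency.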

Quadratic Probing

Principle: H_i = (Hash(key) + d_i) mod m, where m is the hash table length (a prime number of the form 4k + 3) and d_i is the incremental sequence 1², -1², 2², -2², ..., q², -q² (q ≤ m/2).
Advantages: Compared to linear probing, it reduces the “clustering” phenomenon to some extent, improving lookup efficiency.
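
The requirement that m be a prime of the form 4k + 3 can be checked with a small sketch. The helper name quadratic_probe_sequence is hypothetical; it simply lists the addresses visited for the offsets 0, +1², -1², ..., +q², -q² with q = m // 2.

```python
def quadratic_probe_sequence(home, m):
    """Return the addresses probed from hash address `home` in a table of length m."""
    sequence = [home % m]
    for i in range(1, m // 2 + 1):
        sequence.append((home + i * i) % m)  # offset +i^2
        sequence.append((home - i * i) % m)  # offset -i^2
    return sequence

print(quadratic_probe_sequence(0, 11))
# [0, 1, 10, 4, 7, 9, 2, 5, 6, 3, 8]

# m = 11 = 4 * 2 + 3: every slot is reachable.
print(len(set(quadratic_probe_sequence(0, 11))))  # 11
# m = 13 = 4 * 3 + 1: only some slots are reachable.
print(len(set(quadratic_probe_sequence(0, 13))))  # 7
```

For m = 11 the sequence visits all 11 addresses, whereas for m = 13 it reaches only 7 of the 13, so an insertion could fail even though empty slots remain.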

Chaining Method

Principle: Records with the same hash address are linked into a singly linked list, and an array stores the head pointers of m linked lists.
Advantages:
- Keys with different hash addresses (non-synonyms) never collide; only synonyms share a chain.
- No “clustering” phenomenon.
- Linked list nodes are dynamically allocated, making it suitable for cases where the table length is uncertain.

(3) Hash Table Lookup Process

Compute the hash function value of the given key to determine its initial storage location in the hash table.
Check if the location is empty:
- If empty, the lookup fails.
- If not empty, compare the key stored at the location with the given key:
  - If they match, the lookup succeeds.
  - If not, compute a new address using the selected collision handling method and continue comparing until the key is found or the lookup fails.
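
The lookup steps above can be sketched for a table that resolves collisions by linear probing. The function names insert_linear and search_linear are hypothetical, and the sketch assumes keys are non-negative integers, the hash function is key mod m, and no deletions have taken place.

```python
def insert_linear(hash_table, key):
    """Store key at the first empty address on its linear probe sequence."""
    m = len(hash_table)
    home = key % m
    for i in range(m):
        address = (home + i) % m
        if hash_table[address] is None:
            hash_table[address] = key
            return address
    raise RuntimeError("hash table is full")

def search_linear(hash_table, key):
    """Return the address of key, or -1 if the lookup fails."""
    m = len(hash_table)
    home = key % m
    for i in range(m):
        address = (home + i) % m
        if hash_table[address] is None:   # empty slot: the key is absent
            return -1
        if hash_table[address] == key:    # match: the lookup succeeds
            return address
    return -1                             # every slot probed without a match

table = [None] * 11
for key in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    insert_linear(table, key)

print(search_linear(table, 3))   # 6
print(search_linear(table, 99))  # -1
```

Key 3 hashes to address 3 but is found at address 6 after three collisions; key 99 hashes to address 0 and the search stops at the first empty slot, address 2.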

II. Python Implementation of Hash Table Lookup

(1) Direct Addressing Method Implementation

def direct_addressing(key, a, b):
    return a * key + b

# Example Usage
keys = [100, 300, 500, 700, 800, 900]
a, b = 1, 0  # Simplified example; adjust as needed for practical applications
addresses = [direct_addressing(key, a, b) for key in keys]
print(addresses)

(2) Division Method Implementation

def division_method(key, p):
    return key % p

# Example Usage
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
p = 11  # Select an appropriate prime number
hash_values = [division_method(key, p) for key in keys]
print(hash_values)

(3) Open Addressing Method - Linear Probing Implementation

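def linear_probing(hash_table, key, m):
    """Return the first empty address for key, probing linearly from its hash address."""
    hash_value = key % m
    # Probe at most m addresses so that a full table cannot cause an infinite loop
    for i in range(m):
        address = (hash_value + i) % m
        if hash_table[address] is None:
            return address
    raise RuntimeError("hash table is full")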

# Example Usage
m = 11  # Hash table length
hash_table = [None] * m
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
for key in keys:
    hash_table[linear_probing(hash_table, key, m)] = key
print(hash_table)  # [11, 22, None, 47, 92, 16, 3, 7, 29, 8, None]

(4) Open Addressing Method - Quadratic Probing Implementation

def quadratic_probing(hash_table, key, m):
    """Return the first empty address on the sequence 0, +1², -1², +2², -2², ..."""
    hash_value = key % m
    # q = m // 2 is enough to reach every slot when m is a prime of the form 4k + 3
    for i in range(m // 2 + 1):
        for address in ((hash_value + i ** 2) % m, (hash_value - i ** 2) % m):
            if hash_table[address] is None:
                return address
    raise RuntimeError("no empty slot found")

# Example Usage
m = 11  # Hash table length; must be a prime of the form 4k + 3 (11 = 4 * 2 + 3)
hash_table = [None] * m
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
for key in keys:
    hash_table[quadratic_probing(hash_table, key, m)] = key
print(hash_table)

(5) Chaining Method Implementation

class ListNode:
    def __init__(self, key):
        self.key = key
        self.next = None

class HashTableChain:
    def __init__(self, m):
        self.m = m
        self.table = [None] * m

    def hash_function(self, key):
        return key % self.m

    def insert(self, key):
        hash_value = self.hash_function(key)
        if self.table[hash_value] is None:
            self.table[hash_value] = ListNode(key)
        else:
            current = self.table[hash_value]
            while current.next is not None:
                current = current.next
            current.next = ListNode(key)

    def search(self, key):
        hash_value = self.hash_function(key)
        current = self.table[hash_value]
        while current is not None:
            if current.key == key:
                return True
            current = current.next
        return False

# Example Usage
hash_table_chain = HashTableChain(13)
keys = [19, 14, 23, 1, 68, 20, 84, 27, 55, 11, 10, 79]
for key in keys:
    hash_table_chain.insert(key)
print(hash_table_chain.search(23))  # Output: True
print(hash_table_chain.search(99))  # Output: False

III. Performance Analysis

Average Search Length (ASL)

The performance of hash table lookup is mainly measured by the average search length (ASL), which depends on the hash function, collision handling method, and load factor α (α = number of records in the table / hash table length).
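
As a worked example, the sketch below computes the ASL of successful lookups for the linear probing table built from the example keys used earlier (m = 11). The helper names are hypothetical, and the sketch assumes every key is looked up with equal probability.

```python
def build_linear_table(keys, m):
    """Insert keys with hash function key mod m and linear probing."""
    table = [None] * m
    for key in keys:
        address = key % m
        while table[address] is not None:  # assumes len(keys) < m
            address = (address + 1) % m
        table[address] = key
    return table

def comparisons_to_find(table, key):
    """Count the key comparisons needed to find a key that is in the table."""
    m = len(table)
    address = key % m
    count = 1
    while table[address] != key:  # assumes key is present
        address = (address + 1) % m
        count += 1
    return count

keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
m = 11
table = build_linear_table(keys, m)
counts = [comparisons_to_find(table, key) for key in keys]
asl = sum(counts) / len(keys)
load_factor = len(keys) / m

print(counts)                 # [1, 1, 2, 1, 1, 1, 2, 2, 4]
print(round(asl, 3))          # 1.667
print(round(load_factor, 3))  # 0.818
```

The nine keys need 15 comparisons in total, so ASL = 15 / 9 ≈ 1.667 at a load factor of α = 9 / 11 ≈ 0.818.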

Impact of Load Factor (α)

Smaller α: Fewer records in the table, lower probability of collisions, and higher lookup efficiency.
Larger α: More records in the table, higher probability of collisions, and reduced performance.
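
The effect of α can be made concrete with the approximations for successful-lookup ASL commonly quoted in data structures textbooks. These formulas are not taken from this article; they assume a hash function that spreads keys uniformly, and real tables will deviate from them. The function names are hypothetical.

```python
import math

def asl_linear_probing(alpha):
    """Approximate ASL for linear probing: (1 + 1 / (1 - alpha)) / 2."""
    return 0.5 * (1 + 1 / (1 - alpha))

def asl_quadratic_probing(alpha):
    """Approximate ASL for quadratic (non-linear) probing: -ln(1 - alpha) / alpha."""
    return -math.log(1 - alpha) / alpha

def asl_chaining(alpha):
    """Approximate ASL for chaining: 1 + alpha / 2."""
    return 1 + alpha / 2

for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha={alpha:.2f}  "
          f"linear={asl_linear_probing(alpha):.3f}  "
          f"quadratic={asl_quadratic_probing(alpha):.3f}  "
          f"chaining={asl_chaining(alpha):.3f}")
```

Under these approximations the ASL depends only on α, not on the number of records, and open addressing degrades sharply as α approaches 1 while chaining grows only linearly.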

Performance Comparison of Collision Handling Methods

Chaining Method: Only synonyms share a chain, so keys with different hash addresses never interfere and no “clustering” occurs. Deleting a record only requires unlinking a node, which makes chaining suitable for scenarios with frequent insertions and deletions, and its average search length is relatively short.
Open Addressing Method: Quadratic probing reduces the “clustering” effect compared with linear probing, but at the same load factor open addressing generally has a longer average search length than chaining, and the gap widens as α approaches 1.
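
The comparison can be illustrated on the example keys used earlier (m = 11). This is a minimal sketch on a single small data set, not a general benchmark; the helper names are hypothetical, and the number of probes needed to insert a key is taken as its later search cost, which assumes no deletions.

```python
def asl_open_addressing(keys, m, offsets):
    """Insert keys along the given probe offsets; return the successful-lookup ASL."""
    table = [None] * m
    total_probes = 0
    for key in keys:
        for probes, offset in enumerate(offsets, start=1):
            address = (key + offset) % m
            if table[address] is None:
                table[address] = key
                total_probes += probes
                break
        else:
            raise RuntimeError("no empty slot found")
    return total_probes / len(keys)

def measured_asl_chaining(keys, m):
    """With tail insertion, a key's search cost is its position in its chain."""
    chain_lengths = [0] * m
    total = 0
    for key in keys:
        chain_lengths[key % m] += 1
        total += chain_lengths[key % m]
    return total / len(keys)

keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
m = 11
linear_offsets = list(range(m))  # 0, 1, 2, ...
quadratic_offsets = [0] + [sign * i * i
                           for i in range(1, m // 2 + 1)
                           for sign in (1, -1)]  # 0, 1, -1, 4, -4, ...

linear_asl = asl_open_addressing(keys, m, linear_offsets)
quadratic_asl = asl_open_addressing(keys, m, quadratic_offsets)
chaining_asl = measured_asl_chaining(keys, m)

print(round(linear_asl, 3))     # 1.667
print(round(quadratic_asl, 3))  # 1.556
print(round(chaining_asl, 3))   # 1.333
```

For this key set the totals are 15, 14, and 12 comparisons respectively, which matches the ordering described above.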

Impact of Hash Function Selection

The division method, a commonly used hash function, disperses keys effectively when an appropriate prime p is selected. Different hash functions suit different data distributions and application scenarios, making the choice of a suitable hash function crucial for hash table performance.

Hash tables are an important data structure for efficient data lookup. Choosing the hash function and the collision handling method carefully, and keeping the load factor α appropriate for the application scenario, makes efficient lookups achievable. Understanding the principles and implementations of hash table lookup is essential for improving program performance and optimizing data processing workflows.