An In-Depth Look at C++ Hash Tables: Implementation Principles and Optimization Practices for Efficient Storage and Retrieval

Latest recommended article published 2025-12-05 20:00:00

License: CC 4.0 BY-SA

Tags:

#c++ #hash table #programming languages

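Preface: "Implement a hash table" is a classic interview question at large tech companies. A top-scoring answer requires a firm grasp of hash function design principles, the relationship between load factor and resizing, and the trade-offs among the different collision resolution strategies. This article covers these core topics and also points to what interviewers treat as a bonus, such as how to evaluate the avalanche effect of a hash function or how to explain the key differences between Java's HashMap and C++'s unordered_map.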

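I. Background of Hash Tables

In sequential structures and balanced trees there is no direct correspondence between an element's key and its storage location, so finding an element requires multiple key comparisons. Sequential search takes O(N) time; search in a balanced tree is bounded by the tree height, O(log_2 N). In both cases search efficiency depends on the number of comparisons made.

The ideal search method retrieves the target element from the table in one step, without any comparisons. If we build a storage structure in which some function (hashFunc) maps each element's key to its storage location, ideally one-to-one, then a lookup can use that function to locate the element very quickly.

(Summary: hash tables come from improving this key-to-location mapping.)

Insertion: compute the storage location from the key of the element being inserted using this function, and store the element there.

Search: apply the same computation to the key, treat the result as the storage location, and compare against the element stored there. If the keys are equal, the search succeeds.

This approach is called hashing. The conversion function it uses is called a hash function, and the resulting structure is called a hash table.

A hash table can store its elements in two common ways, which differ in how they resolve hash collisions: linear probing and hash buckets.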

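For example, take the data set {1, 7, 6, 4, 5, 9}.
The hash function is hash(key) = key % capacity, where capacity is the total size of the underlying storage.

Searching this way does not require multiple key comparisons, so it is fast.
Question: with this scheme, what goes wrong if we insert 44 into the set?

This is where hash collisions come in.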
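The mapping in this example can be sketched in a few lines. This is only an illustration: `hash_index` is a hypothetical helper name, and the capacity of 10 used in the note below is an assumption (the text does not state it, but it is consistent with 44 colliding with 4).

```cpp
#include <cstddef>

// Hypothetical helper: division-method hash, hash(key) = key % capacity.
// Assumes a non-negative integer key and capacity > 0.
std::size_t hash_index(std::size_t key, std::size_t capacity) {
    return key % capacity;
}
```

With a capacity of 10, the keys 1, 7, 6, 4, 5, 9 land on slots 1, 7, 6, 4, 5, 9. `hash_index(44, 10)` is 4, the same slot as key 4, which is exactly the collision the question above asks about.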

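II. Hash Collisions

1. Load factor

Definition: the load factor is the number of elements stored in the hash table divided by the table's total capacity (the number of buckets).

Formula: λ = n / m

n: the number of valid elements currently stored in the hash table
m: the total capacity of the hash table (the length of the bucket array, e.g. the size of a vector<Node*>)

The load factor balances collision probability against memory utilization:

(1) The smaller the load factor, the lower the collision probability, and the closer insertion, lookup, and deletion stay to O(1).

(2) The larger the load factor, the higher the collision probability; in the worst case lookup can degrade to O(n).

What happens when the load factor exceeds the threshold?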

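Load-factor-driven resizing:

When the load factor exceeds the threshold, the hash table triggers a resize, which proceeds as follows:

Allocate a larger bucket array: the new capacity is usually twice the old one (or a nearby prime, depending on the implementation).
Remap every element: traverse all elements of the old table and insert each into the new buckets using the new hash function (or by taking the modulus with the new capacity).
Release the old memory: destroy the old bucket array and replace it with the new one.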
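The resize trigger can be written as a small check. The sketch below is an illustration under assumptions: the function names are hypothetical, and the default threshold of 0.75 is just one common choice, not a fixed rule.

```cpp
#include <cstddef>

// Load factor: stored elements (n) divided by bucket count (m). Assumes m > 0.
double load_factor(std::size_t n, std::size_t m) {
    return static_cast<double>(n) / static_cast<double>(m);
}

// Returns true when the table should grow.
// The 0.75 default is an assumed threshold for illustration only.
bool needs_rehash(std::size_t n, std::size_t m, double max_load = 0.75) {
    return load_factor(n, m) > max_load;
}
```

For a table with 10 buckets, 7 elements give a load factor of 0.7 and no resize, while 8 elements give 0.8 and trigger one.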

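2. What is a hash collision?

Hash collision: at its root, a collision comes from the mismatch between finite storage space and an unbounded set of possible inputs.

When different keys produce the same hash address under the same hash function, that is, when they map to the same bucket or slot of the hash table, the result is called a hash collision.

Elements that have different keys but the same hash address are called "synonyms".

One possible cause of hash collisions is a poorly designed hash function.

Hash function design principles:

The domain of the hash function must include every key that needs to be stored, and if the hash table has m addresses, its range must lie between 0 and m-1.
The addresses it produces should be distributed evenly over the whole space.
The hash function should be fairly simple.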

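III. Resolving Hash Collisions

The two common ways to resolve hash collisions are closed hashing and open hashing.

[1] Closed hashing

Closed hashing, also called open addressing: when a collision occurs and the table is not full, there must be an empty slot somewhere in the table, so the key can be stored in the "next" empty slot after the collision position.

Linear probing

Linear probing: starting from the position where the collision occurred, probe successive slots one by one until an empty slot is found.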

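Insertion

Use the hash function to compute the position of the element in the table.
If that position is empty, insert the new element directly. If it is occupied, a collision has occurred:
use linear probing to find the next empty position and insert the new element there.

In the example above, inserting 44 finds slot 4 already occupied, so the element moves forward to the next free slot.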
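The insertion steps above can be sketched as follows. This is a minimal illustration, not the article's final implementation: `linear_probe_insert` is a hypothetical name, empty slots are modelled with `std::optional` (C++17), and the table is assumed to be non-empty, fixed-size, and to hold non-negative keys.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Inserts key into a fixed-size open-addressing table using linear probing.
// An empty slot is std::nullopt. Returns the slot used, or table.size() if
// every slot is occupied. No resizing and no duplicate check are done here.
std::size_t linear_probe_insert(std::vector<std::optional<int>>& table, int key) {
    const std::size_t m = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % m;
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t slot = (home + i) % m;  // wrap around at the end
        if (!table[slot].has_value()) {
            table[slot] = key;
            return slot;
        }
    }
    return m;  // table is full
}
```

With 10 slots and the keys 1, 7, 6, 4, 5, 9 already inserted, 44 hashes to slot 4, finds slots 4 through 7 occupied, and ends up in slot 8.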

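Deletion raises a problem, though:
With closed hashing, an existing element cannot simply be physically removed from the table, because doing so would affect searches for other elements. For example, if element 4 is removed outright, a later search for 44 may stop at the now-empty slot and fail. Linear probing therefore deletes an element by marking it as deleted instead of removing it (lazy deletion).

The implementation later in this article does it this way:

// Each slot in the hash table carries a state flag
// EMPTY: the slot is empty; EXIST: the slot holds an element; DELETE: the element was deleted
enum State{EMPTY, EXIST, DELETE};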
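To see why the DELETE marker matters, consider a lookup that stops at EMPTY slots but keeps probing past DELETE slots. The sketch below is an illustration only; `Slot` and `probe_find` are hypothetical names, and integer keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

enum State { EMPTY, EXIST, DELETE };

// Hypothetical slot type: one key plus its state flag.
struct Slot {
    int key = 0;
    State state = EMPTY;
};

// Linear-probing lookup. Stops at an EMPTY slot, but continues past DELETE.
// Returns the slot index, or table.size() if the key is absent.
std::size_t probe_find(const std::vector<Slot>& table, int key) {
    const std::size_t m = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % m;
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t slot = (home + i) % m;
        if (table[slot].state == EMPTY) {
            break;  // the probe chain ends here
        }
        if (table[slot].state == EXIST && table[slot].key == key) {
            return slot;
        }
    }
    return m;
}
```

If 4 sits in slot 4 and 44 in slot 5, marking slot 4 as DELETE still lets `probe_find` reach 44, whereas resetting slot 4 to EMPTY cuts the probe chain and 44 is no longer found.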

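As noted above, the table must grow once the load factor exceeds its threshold. So when exactly should it grow? The linear-probing implementation later in this article uses 0.75 as the threshold.

Advantage of linear probing: it is very simple to implement.
Disadvantage of linear probing: once collisions occur, the colliding elements pile up next to one another and form clusters. Keys end up occupying slots that other keys would otherwise use, so locating a given key takes many comparisons and search efficiency drops. How can this be mitigated?

This is the motivation for quadratic probing.

Quadratic probing (builds on linear probing)

Quadratic probing: when a collision occurs, start from the collision position and jump forward with steps that grow as squares (1, 4, 9, ...) until a slot holding no data is found.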

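In essence, squaring produces a pseudo-random probe sequence: Hᵢ = (H₀ + c₁·i + c₂·i²) % m, where H₀ is the home slot, i is the probe number, and m is the table length.
With c₁ = 0 and c₂ = 1 this is standard quadratic probing, which avoids the arithmetic progression that linear probing follows.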

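It has been shown that when the table length is prime and the load factor a does not exceed 0.5, a new entry can always be inserted, and no position is probed twice before an empty one is found.

So as long as half of the table is empty, the table-full problem does not arise. Searches need not consider a full table, but insertion must keep the load factor a at or below 0.5 and grow the table when that limit would be exceeded.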
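The probe sequence of standard quadratic probing (c₁ = 0, c₂ = 1) is easy to generate and inspect. The helper below is a hypothetical sketch written for this illustration; it only lists the slots that would be visited and does not touch a real table.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` slots visited by standard quadratic probing:
// slot_i = (h0 + i * i) % m, for i = 0, 1, 2, ...
// Assumes m > 0.
std::vector<std::size_t> quadratic_probe_sequence(std::size_t h0,
                                                  std::size_t m,
                                                  std::size_t count) {
    std::vector<std::size_t> slots;
    slots.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        slots.push_back((h0 + i * i) % m);
    }
    return slots;
}
```

For a prime table length m = 11 and home slot 0, the first six probes are 0, 1, 4, 9, 5, 3. They are all distinct, which matches the claim that a table that is at most half full always yields a free slot.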

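[2] Open hashing

Open hashing is also called separate chaining. The hash function first computes a hash address for each key in the key set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of each bucket are linked together in a singly linked list, and the head of each list is stored in the hash table.

The core idea is to combine an array with linked lists (or other dynamic structures) so that colliding elements are chained together, which is both simple and efficient.

Inserting an element:

Compute the hash of the key with the hash function to decide which bucket (array index) it goes into.
If the list for that bucket is empty, insert directly.
If elements are already present (a collision), append the new element to the end of the list.

Finding or deleting an element:

Use the hash function to locate the bucket.
Then traverse the list and compare keys one by one.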

In open hashing, then, each bucket holds the elements that collided with one another.
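Before the hand-written linked-list version later in this article, the idea can be sketched with standard containers. This is an illustration under assumptions: `ChainedSet` is a hypothetical class, it stores non-negative int keys only, the bucket count is fixed and non-zero, and new keys are pushed to the front of the chain (O(1)) instead of appended to the end.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Minimal separate-chaining set: an array of singly linked lists.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count) {}

    bool contains(int key) const {
        for (int k : buckets_[index_of(key)]) {
            if (k == key) return true;
        }
        return false;
    }

    // Returns false if the key is already present.
    bool insert(int key) {
        if (contains(key)) return false;
        buckets_[index_of(key)].push_front(key);  // head insertion
        return true;
    }

    // Returns false if the key was not present.
    bool erase(int key) {
        if (!contains(key)) return false;
        buckets_[index_of(key)].remove(key);
        return true;
    }

private:
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::forward_list<int>> buckets_;
};
```

With 10 buckets, keys 4 and 44 share bucket 4 and simply sit in the same chain; erasing 4 physically removes its node without affecting lookups of 44, so no DELETE marker is needed.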

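Growing the bucket array

The number of buckets is fixed, so as elements keep arriving the number of elements per bucket keeps growing. In the extreme case a single bucket's list can hold a very large number of nodes, which hurts the performance of the hash table. Under some condition the table therefore has to grow. How is that condition chosen?

The best case for open hashing is exactly one node per bucket. Once that point is reached, every further insertion causes a collision, so the table can be grown when the number of elements equals the number of buckets.

Questions about open hashing

1. The table can only store integer keys. How are other key types handled?

We give the hash table a functor (also called a hash function object) whose job is to convert the key into an integer that can be used in the modulus.

1. If the key converts to an integer easily and the converted values are unlikely to collide, the table's default functor can be used as is.

2. If the key cannot be converted to an integer directly, we implement our own functor and pass it to the hash table.

The key requirement for such a functor is that every part of the key (characters, fields, and so on) takes part in the computation, so that different keys convert to different integers as far as possible.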

// The hash function uses the division method, so the key must be an integer before
// the modulus is taken. These functors convert a key into an integer.
// Integer keys need no conversion
template<class T>
class DefHashF
{
public:
    size_t operator()(const T& val) const
    {
        return val;
    }
};
// String keys must be converted to an integer
class Str2Int
{
public:
    size_t operator()(const string& s) const
    {
        const char* str = s.c_str();
        unsigned int seed = 131; // 31 131 1313 13131 131313
        unsigned int hash = 0;
        while (*str)
        {
            hash = hash * seed + (*str++);
        }

        return (hash & 0x7FFFFFFF);
    }
};
// To keep the implementation simple, this hash table ties the comparison directly to the element
template<class V, class HF>
class HashBucket
{
    // ……
private:
    size_t HashFunc(const V& data)
    {
        return HF()(data.first)%_ht.capacity();
    }
};
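For a key made of several fields, the same multiply-and-add step can fold every field into the result. The sketch below is an illustration only: `Person` and `PersonHash` are hypothetical names that do not appear in the article, and the multiplier 131 is reused from the string functor above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical key type used only for illustration.
struct Person {
    std::string name;
    int age;
};

// Hypothetical functor: every field of the key takes part in the hash.
struct PersonHash {
    std::size_t operator()(const Person& p) const {
        std::size_t hash = 0;
        // Fold in each character of the name; order matters.
        for (unsigned char ch : p.name) {
            hash = hash * 131 + ch;
        }
        // Fold in the second field with the same step.
        hash = hash * 131 + static_cast<std::size_t>(p.age);
        return hash;
    }
};
```

A functor like this would be passed as the hash template argument of the table. Because both the character order and the age feed into the result, {"ab", 1}, {"ba", 1}, and {"ab", 2} all hash to different values.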

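2. With the division method it is best to take the modulus by a prime. How can we quickly get a prime that is roughly twice the previous one each time?

Method 1: a lookup table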

// Precomputed primes for growing a hash table; each entry is roughly twice the previous one
const std::vector<size_t> prime_table = {
    53, 97, 193, 389, 769,
    1543, 3079, 6151, 12289, 24593,
    49157, 98317, 196613, 393241, 786433,
    1572869, 3145739, 6291469, 12582917, 25165843,
    50331653, 100663319, 201326611, 402653189, 805306457,
    1610612741,
};
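Because the table is sorted, picking the next capacity is a binary search. The helper below is a hypothetical sketch (the name `next_capacity` is not from the article); it takes the table as a parameter and assumes it is sorted in ascending order and non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the smallest entry strictly greater than `current`,
// or the largest entry if no greater one exists.
std::size_t next_capacity(const std::vector<std::size_t>& sorted_primes,
                          std::size_t current) {
    auto it = std::upper_bound(sorted_primes.begin(), sorted_primes.end(), current);
    return it != sorted_primes.end() ? *it : sorted_primes.back();
}
```

Calling it with twice the current bucket count gives a prime a little above double the old size; for example, a table of 53 buckets would look up 106 and grow to 193.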

Method 2: a lookup function

size_t GetNextPrime(size_t prime)
{
    const int PRIMECOUNT = 28;
    static const size_t primeList[PRIMECOUNT] =
    {
        53ul, 97ul, 193ul, 389ul, 769ul,
        1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
        49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
        1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
        50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
        1610612741ul, 3221225473ul, 4294967291ul
    };
    // Return the first prime larger than the current size
    for (int i = 0; i < PRIMECOUNT; ++i)
    {
        if (primeList[i] > prime)
            return primeList[i];
    }
    // Already at or beyond the largest entry: stay at the largest prime
    // (indexing primeList[PRIMECOUNT] here would read past the end of the array)
    return primeList[PRIMECOUNT - 1];
}

IV. Implementation

Linear probing

1. Node definition

enum State{EMPTY, EXIST, DELETE};

// Node structure
template<class T,class V>
struct Node
{
	// Key-value data
	pair<T, V> dict;
	// Slot state
	State state = EMPTY;
};

2. Hash table structure

// Ha_Sh structure
template<class T,class V>
class Ha_Sh
{
public:
 
	Ha_Sh()
	{
		_table.resize(10);
	}
 
 
private:
	// Slot storage
	vector<Node<T, V>> _table;
	// Number of valid elements
	int size = 0;
};

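3. Insertion

// Compute the home index (assumes a non-negative integer key)
size_t val = date.first % _table.size();

// Probe forward until a slot that is not occupied is found
while (_table[val].state == EXIST)
{
	// Move to the next slot, wrapping around at the end of the table
	val = (val + 1) % _table.size();
}
// Insert
_table[val].dict = date;
_table[val].state = EXIST;
size++;

// Compute the load factor (threshold 0.75)
double factor = (double)size / _table.size();
// Grow the table when the threshold is exceeded
if (factor > 0.75)
{
	vector<Node<T, V>> table(2 * _table.size());
	// Move the elements: each one must be rehashed against the new size,
	// since copying to the same index would break later lookups
	for (size_t i = 0; i < _table.size(); i++)
	{
		if (_table[i].state == EXIST)
		{
			size_t pos = _table[i].dict.first % table.size();
			while (table[pos].state == EXIST)
			{
				pos = (pos + 1) % table.size();
			}
			table[pos] = _table[i];
		}
	}
	_table.swap(table);
}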

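// Convert a key to an integer
template<class T>
struct HaShiFunc
{
	// Integer keys: convert directly
	size_t operator()(const T& date) const
	{
		return size_t(date);
	}
};
// String keys (specialization)
template<>
struct HaShiFunc<string>
{
	size_t operator()(const string& date) const
	{
		size_t _date = 0;
		for (auto e : date)
		{
			// Multiply by 131 before adding each character, so strings made of
			// the same characters in a different order get different values
			_date *= 131;
			_date += e;
		}
		return _date;
	}
};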

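4. Lookup

For now, lookup returns a pointer to the value stored at the matching position, or a null pointer if the key is not present.

Note: the key can be wrapped (for example, stored as const) to prevent it from being modified.

Approach: start from the mapped index and probe around the table in a circle. If an empty slot is reached before the key is found, return a null pointer.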

V* find(const T& key) {
    size_t index = find_index(key);
    if (index == _table.size()) {
        return nullptr;
    }
    return &_table[index].data.second;
}

5. Deletion

bool erase(const T& key) {
    size_t index = find_index(key);
    if (index == _table.size()) {
        return false;
    }

    _table[index].state = DELETE;
    _size--;
    return true;
}

Complete code: test.h

#define _CRT_SECURE_NO_WARNINGS
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Slot state
enum State { EMPTY, EXIST, DELETE };

// Hash node
template<class T, class V>
struct HashNode {
    std::pair<T, V> data;
    State state = EMPTY;
};

// Hash function object
template<class T>
struct HashFunc {
    size_t operator()(const T& key) const {
        return static_cast<size_t>(key);
    }
};

// Specialization for strings
template<>
struct HashFunc<std::string> {
    size_t operator()(const std::string& key) const {
        size_t hash = 0;
        for (char ch : key) {
            hash = hash * 131 + ch;
        }
        return hash;
    }
};

// Hash table
template<class T, class V, class Hash = HashFunc<T>>
class HashTable {
public:
    HashTable(size_t initial_size = 10) : _table(initial_size), _size(0) {}

    bool insert(const std::pair<T, V>& data) {
        // The key already exists: insertion fails. This must be checked over the
        // whole probe chain, because the loop below may stop early at a DELETE slot.
        if (find_index(data.first) != _table.size()) {
            return false;
        }

        // Grow first if the load factor is above the threshold
        if (load_factor() > 0.75) {
            rehash();
        }

        size_t index = hash_index(data.first);
        size_t start = index;

        // Linear probing: EMPTY and DELETE slots can both be reused.
        // For quadratic probing, the i-th probe would be (start + i * i) % _table.size().
        while (_table[index].state == EXIST) {
            index = (index + 1) % _table.size();

            // Wrapped around without finding a free slot
            if (index == start) {
                return false;
            }
        }

        // Store the data
        _table[index].data = data;
        _table[index].state = EXIST;
        _size++;
        return true;
    }

    bool erase(const T& key) {
        size_t index = find_index(key);
        if (index == _table.size()) {
            return false;
        }

        _table[index].state = DELETE;
        _size--;
        return true;
    }

    V* find(const T& key) {
        size_t index = find_index(key);
        if (index == _table.size()) {
            return nullptr;
        }
        return &_table[index].data.second;
    }

    size_t size() const { return _size; }
    bool empty() const { return _size == 0; }

private:
    std::vector<HashNode<T, V>> _table;
    size_t _size;

    double load_factor() const {
        return static_cast<double>(_size) / _table.size();
    }

    size_t hash_index(const T& key) const {
        Hash hash_func;
        return hash_func(key) % _table.size();
    }

    size_t find_index(const T& key) const {
        size_t index = hash_index(key);
        size_t start = index;

        do {
            if (_table[index].state == EXIST && _table[index].data.first == key) {
                return index;
            }

            if (_table[index].state == EMPTY) {
                break;
            }

            index = (index + 1) % _table.size();
        } while (index != start);

        return _table.size(); // not found
    }

    void rehash() {
        size_t new_size = get_next_prime(_table.size() * 2);
        std::vector<HashNode<T, V>> new_table(new_size);

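        for (size_t i = 0; i < _table.size(); ++i) {
            if (_table[i].state == EXIST) {
                // Use the table's Hash parameter, not the default HashFunc<T>,
                // so a custom hash functor stays consistent after growing
                size_t new_index = Hash()(_table[i].data.first) % new_size;

                // Resolve collisions in the new table
                while (new_table[new_index].state == EXIST) {
                    new_index = (new_index + 1) % new_size;
                }

                new_table[new_index] = _table[i];
            }
        }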

        _table.swap(new_table);
    }

    size_t get_next_prime(size_t prime) const {
        static constexpr size_t prime_list[] = {
            53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593,
            49157, 98317, 196613, 393241, 786433, 1572869, 3145739,
            6291469, 12582917, 25165843, 50331653, 100663319,
            201326611, 402653189, 805306457, 1610612741, 3221225473
        };

        auto it = std::upper_bound(std::begin(prime_list), std::end(prime_list), prime);
        return it != std::end(prime_list) ? *it : prime_list[std::size(prime_list) - 1];
    }
};

Separate chaining

1. Node structure

// Node structure
template<class T, class V>
struct Node
{
	Node(const pair<T, V>& date)
		:dict(date)
	{ }
	// Key-value data
	pair<T, V> dict;
	// Next node in the bucket
	Node<T, V>* _next = nullptr;
};

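2. Hash table structure

// Ha_Sh structure
template<class T, class V, class HashFunc = HaShiFunc<T>>
class Ha_Sh
{
public:
	// Qualified with :: so the typedef does not change the meaning of the name
	// Node inside the class (some compilers reject the unqualified form)
	typedef ::Node<const T, V> Node;
 
	Ha_Sh()
	{
		_table.resize(5);
	}
	// The table owns raw pointers, so copying is disabled to avoid double deletes
	Ha_Sh(const Ha_Sh&) = delete;
	Ha_Sh& operator=(const Ha_Sh&) = delete;
	~Ha_Sh()
	{
		Node* cur = nullptr;
		Node* next = nullptr;
		for (size_t i = 0; i < _table.size(); i++)
		{
			cur = _table[i];
			while (cur)
			{
				next = cur->_next;
				// Free the current node
				delete cur;
				cur = next;
			}
			_table[i] = nullptr;
		}
        size = 0;
	}
 
private:
	// Bucket array
	vector<Node*> _table;
	// Number of valid elements
	size_t size = 0;
};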

3. Insertion

// Insert
bool insert(const pair<T, V>& date)
{
	// Reject duplicate keys
	if (Find(date.first))
	{
		return false;
	}
	HashFunc Hash;
	// Grow when the load factor reaches 1
	if (size == _table.size())
	{
		vector<Node*> newtable(_table.size() * 2, nullptr);
		Node* cur = nullptr;
		Node* next = nullptr;
		// Remap every node into the new bucket array (nodes are reused, not copied)
		for (size_t i = 0; i < _table.size(); i++)
		{
			cur = _table[i];
			while (cur)
			{
				next = cur->_next;
				size_t val = Hash(cur->dict.first) % newtable.size();
				// Head insertion
				cur->_next = newtable[val];
				newtable[val] = cur;
				// Next node
				cur = next;
			}
			_table[i] = nullptr;
		}
		_table.swap(newtable);
	}
	// Compute the bucket index
	size_t val = Hash(date.first) % _table.size();
	Node* _date = new Node(date);
	// Head insertion
	_date->_next = _table[val];
	_table[val] = _date;
	size++;
	return true;
}

4. Lookup

// Find
Node* Find(const T& date)
{
	if (size == 0)
	{
		return nullptr;
	}
	// Compute the bucket index
	HashFunc Hash;
	size_t val = Hash(date) % _table.size();
	// Traverse the list in that bucket
	Node* cur = _table[val];
	while (cur)
	{
		if (cur->dict.first == date)
		{
			return cur;
		}
		cur = cur->_next;
	}
	return nullptr;
}

5. Deletion

// Erase
bool Erase(const T& date)
{
	HashFunc Hash;
	size_t val = Hash(date) % _table.size();
	Node* prev = nullptr;
	Node* cur = _table[val];
	// Walk the bucket, remembering the previous node
	while (cur)
	{
		if (cur->dict.first == date)
		{
			if (prev == nullptr)
			{
				// Removing the head of the bucket
				_table[val] = cur->_next;
			}
			else
			{
				// Removing a middle or tail node
				prev->_next = cur->_next;
			}
			delete cur;
			size--;
			return true;
		}
		prev = cur;
		cur = cur->_next;
	}
	return false;
}