From Hash Table to Container Wrappers: Implementing myunordered_map and myunordered_set Step by Step


License: CC 4.0 BY-SA


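In C++ development, unordered_map and unordered_set are heavily used unordered containers whose efficient insert, erase, lookup, and update operations rest on a hash table. Many developers use them without knowing how they are built: how do you wrap a hash table into these two containers, and how do you handle core features such as iterators and the [] operator? Using the SGI-STL source layout as a reference, this article starts by reworking a hash table and then builds myunordered_map and myunordered_set step by step, connecting the hash table underneath to the container interface on top.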

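1. Core idea: reuse one hash table and absorb the difference in stored types

myunordered_set stores a single Key (deduplicated), while myunordered_map stores key-value pairs pair<const K, V> (Key deduplicated, Value modifiable). Both rely on the same two hash-table capabilities: mapping a Key to a bucket and resolving collisions. The design therefore comes down to three points:

Reuse one hash table template: the stored data type is a template parameter T, so the table supports both T=K (for myunordered_set) and T=pair<const K, V> (for myunordered_map);
Extract the Key with a functor (KeyOfT): the table needs the Key to compute the hash and to test equality, but T may be either a bare Key or a pair, so a functor extracts the Key from T (for example, first from pair<const K, V>);
Adapt the interface in the container layer: myunordered_map and myunordered_set wrap the table's interface and expose STL-style insert, find, erase, and so on.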
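To make the KeyOfT idea concrete before the full table appears, here is a minimal standalone sketch. The names IdentityKey, FirstOfPair, has_key, and demo_key_of_t are hypothetical and exist only for this illustration; the article's own functors (SetKeyOfT and MapKeyOfT) are defined later inside the containers.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical extractor for a set-like container: the stored value is the key.
struct IdentityKey {
    template <class K>
    const K& operator()(const K& key) const { return key; }
};

// Hypothetical extractor for a map-like container: the key is pair.first.
struct FirstOfPair {
    template <class K, class V>
    const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};

// Generic code sees only T and an extractor, the way the hash table does.
template <class KeyOfT, class T, class K>
bool has_key(const T& data, const K& key) {
    KeyOfT kot;
    return kot(data) == key;
}

// Usage: the same has_key works for a bare key and for a key-value pair.
inline void demo_key_of_t() {
    std::pair<const std::string, int> kv("left", 1);
    assert(has_key<IdentityKey>(42, 42));
    assert(has_key<FirstOfPair>(kv, std::string("left")));
    assert(!has_key<FirstOfPair>(kv, std::string("right")));
}
```

The generic function never mentions pair or first; swapping the extractor is the only change needed to move from set-like to map-like data.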

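2. Preparation: reworking the hash table to be generic

The first step is to rework the separate-chaining hash table implemented earlier so that it stores a generic type T and extracts the Key through the KeyOfT functor. The main changes are:

A new template parameter KeyOfT (the functor that extracts the Key from T);

An iterator (a forward iterator supporting ++, *, and ->);

An insert interface that returns pair<Iterator, bool> (groundwork for the [] operator of myunordered_map).

2.1 Hash table node and functor definitions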

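#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// 1. Hash functor (supports built-in integral types and string)
template <class K>
struct HashFunc {
    size_t operator()(const K& key) const {
        return (size_t)key; // built-in types convert directly to size_t
    }
};

// Specialization for string: BKDR hash (reduces collisions)
template <>
struct HashFunc<string> {
    size_t operator()(const string& key) const {
        size_t hash = 0;
        for (unsigned char ch : key) { // unsigned so that bytes >= 0x80 do not sign-extend
            hash = hash * 131 + ch; // 131 is prime, which helps spread the values
        }
        return hash;
    }
};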

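namespace hash_bucket {
    // 2. Hash table node (stores the generic data T)
    template <class T>
    struct HashNode {
        T _data;            // stored data (K or pair<const K, V>)
        HashNode<T>* _next; // next node in the same bucket

        HashNode(const T& data) : _data(data), _next(nullptr) {}
    };

    // 3. Hash table iterator (forward declaration so HashTable can befriend it)
    template <class K, class T, class Ptr, class Ref, class KeyOfT, class Hash>
    struct HTIterator;

    // 4. Generic hash table (separate chaining)
    template <class K, class T, class KeyOfT, class Hash = HashFunc<K>>
    class HashTable {
        // The iterator reads private members (_tables and so on), so it is a friend.
        // The friend template uses its own parameter names: reusing K, T, ... here
        // would shadow the enclosing template's parameters, which standard C++ rejects.
        template <class K2, class T2, class Ptr2, class Ref2, class KeyOfT2, class Hash2>
        friend struct HTIterator;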

    private:
        using Node = HashNode<T>;
        vector<Node*> _tables; // pointer array: the head node of each bucket
        size_t _n = 0;         // number of stored elements

        // Smallest listed prime >= n (as in SGI-STL, keeps the bucket count prime)
        unsigned long __stl_next_prime(unsigned long n) {
            static const unsigned long prime_list[] = {
                53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593,
                49157, 98317, 196613, 393241, 786433, 1572869, 3145739,
                6291469, 12582917, 25165843, 50331653, 100663319, 201326611,
                402653189, 805306457, 1610612741, 3221225473, 4294967291
            };
            const unsigned long* first = prime_list;
            const unsigned long* last = prime_list + sizeof(prime_list) / sizeof(prime_list[0]);
            const unsigned long* pos = lower_bound(first, last, n);
            return pos == last ? *(last - 1) : *pos;
        }

    public:
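        // Iterator types (mutable iterator and const iterator)
        using Iterator = HTIterator<K, T, T*, T&, KeyOfT, Hash>;
        using ConstIterator = HTIterator<K, T, const T*, const T&, KeyOfT, Hash>;

        // Constructor: start with the smallest prime bucket count (53)
        HashTable() {
            _tables.resize(__stl_next_prime(0), nullptr);
        }

        // Copying is disabled in this version: a memberwise copy would share the
        // nodes between two tables and free them twice. A complete container
        // would implement a deep copy instead.
        HashTable(const HashTable&) = delete;
        HashTable& operator=(const HashTable&) = delete;

        // Destructor: free every node (prevents memory leaks)
        ~HashTable() {
            for (size_t i = 0; i < _tables.size(); ++i) {
                Node* cur = _tables[i];
                while (cur) {
                    Node* next = cur->_next;
                    delete cur;
                    cur = next;
                }
                _tables[i] = nullptr;
            }
        }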

        // -------------------------- Iterator interface --------------------------
        // Begin(): the first node of the first non-empty bucket
        Iterator Begin() {
            if (_n == 0) return End();
            for (size_t i = 0; i < _tables.size(); ++i) {
                if (_tables[i]) {
                    return Iterator(_tables[i], this);
                }
            }
            return End();
        }

        // End(): an iterator built from a null node marks the end
        Iterator End() {
            return Iterator(nullptr, this);
        }

        ConstIterator Begin() const {
            if (_n == 0) return End();
            for (size_t i = 0; i < _tables.size(); ++i) {
                if (_tables[i]) {
                    return ConstIterator(_tables[i], this);
                }
            }
            return End();
        }

        ConstIterator End() const {
            return ConstIterator(nullptr, this);
        }

        // -------------------------- Core operations --------------------------
        // insert: add data, return (iterator, whether the insertion happened)
        pair<Iterator, bool> Insert(const T& data) {
            KeyOfT kot;
            // 1. Check whether the Key already exists (deduplication)
            Iterator it = Find(kot(data));
            if (it != End()) {
                return make_pair(it, false); // present: return the existing iterator and false
            }

            Hash hs;
            // 2. Grow when the load factor reaches 1 (every node is re-bucketed)
            if (_n == _tables.size()) {
                size_t new_size = __stl_next_prime(_tables.size() + 1);
                vector<Node*> new_tables(new_size, nullptr);

                for (size_t i = 0; i < _tables.size(); ++i) {
                    Node* cur = _tables[i];
                    while (cur) {
                        Node* next = cur->_next;
                        // bucket index of this node in the new table
                        size_t new_hashi = hs(kot(cur->_data)) % new_size;
                        // push the existing node onto the front of the new bucket
                        cur->_next = new_tables[new_hashi];
                        new_tables[new_hashi] = cur;
                        cur = next;
                    }
                    _tables[i] = nullptr; // the old bucket no longer owns any node
                }
                _tables.swap(new_tables); // swap the old and new tables
            }

            // 3. Insert the new node at the front of its bucket
            size_t hashi = hs(kot(data)) % _tables.size();
            Node* new_node = new Node(data);
            new_node->_next = _tables[hashi];
            _tables[hashi] = new_node;
            ++_n;

            return make_pair(Iterator(new_node, this), true); // new node's iterator and true
        }

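        // find: look up by Key and return an iterator
        Iterator Find(const K& key) {
            KeyOfT kot;
            Hash hs;
            size_t hashi = hs(key) % _tables.size();
            Node* cur = _tables[hashi];

            while (cur) {
                if (kot(cur->_data) == key) { // compare against the Key extracted from T
                    return Iterator(cur, this);
                }
                cur = cur->_next;
            }
            return End(); // not found: return End()
        }

        // const overload: lets the containers' const member functions call Find
        ConstIterator Find(const K& key) const {
            KeyOfT kot;
            Hash hs;
            Node* cur = _tables[hs(key) % _tables.size()];
            for (; cur; cur = cur->_next) {
                if (kot(cur->_data) == key) return ConstIterator(cur, this);
            }
            return End();
        }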

        // erase: remove by Key, return whether an element was removed
        bool Erase(const K& key) {
            KeyOfT kot;
            Hash hs;
            size_t hashi = hs(key) % _tables.size();
            Node* prev = nullptr;
            Node* cur = _tables[hashi];

            while (cur) {
                if (kot(cur->_data) == key) {
                    // removing the head node of the bucket
                    if (prev == nullptr) {
                        _tables[hashi] = cur->_next;
                    } else {
                        // removing a middle or tail node
                        prev->_next = cur->_next;
                    }
                    delete cur;
                    --_n;
                    return true;
                }
                prev = cur;
                cur = cur->_next;
            }
            return false; // not found
        }

        // number of stored elements (used by the containers' size())
        size_t Size() const { return _n; }
    };

    // -------------------------- Hash table iterator --------------------------
    template <class K, class T, class Ptr, class Ref, class KeyOfT, class Hash>
    struct HTIterator {
    private:
        using Node = HashNode<T>;
        using Self = HTIterator<K, T, Ptr, Ref, KeyOfT, Hash>;

        Node* _node;                               // current node
        const HashTable<K, T, KeyOfT, Hash>* _pht; // owning table (needed to find the next bucket)

    public:
        // Iterator category: forward iterator (only ++ is supported)
        using iterator_category = forward_iterator_tag;
        using value_type = T;
        using pointer = Ptr;
        using reference = Ref;
        using difference_type = ptrdiff_t;

        // Constructor
        HTIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht)
            : _node(node), _pht(pht) {}

        // operator*: reference to the current node's data
        Ref operator*() const {
            return _node->_data;
        }

        // operator->: pointer to the current node's data (enables it->first / it->second)
        Ptr operator->() const {
            return &_node->_data;
        }

        // operator!=: compares node pointers
        bool operator!=(const Self& s) const {
            return _node != s._node;
        }

        // operator==: compares node pointers
        bool operator==(const Self& s) const {
            return _node == s._node;
        }

        // operator++: the core of the forward iterator (the tricky part).
        // Precondition: the iterator is not End().
        Self& operator++() {
            // 1. If the current bucket has another node, step to it
            if (_node->_next) {
                _node = _node->_next;
            } else {
                // 2. The current bucket is exhausted: look for the next non-empty bucket
                KeyOfT kot;
                Hash hs;
                // bucket index of the current node
                size_t cur_bucket = hs(kot(_node->_data)) % _pht->_tables.size();
                // scan forward from the following bucket
                size_t next_bucket = cur_bucket + 1;
                while (next_bucket < _pht->_tables.size()) {
                    if (_pht->_tables[next_bucket]) {
                        _node = _pht->_tables[next_bucket];
                        break;
                    }
                    ++next_bucket;
                }
                // 3. No bucket left: become null, which equals End()
                if (next_bucket == _pht->_tables.size()) {
                    _node = nullptr;
                }
            }
            return *this;
        }

        // Postfix ++ (implemented with prefix ++)
        Self operator++(int) {
            Self tmp = *this;
            ++(*this);
            return tmp;
        }
    };
} // namespace hash_bucket
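The growth policy in Insert (grow once the element count equals the bucket count, then jump to the next listed prime) is easier to follow with numbers. The sketch below is standalone and simulates only the bucket count. The names next_prime, buckets_after, and demo_growth are hypothetical, and the prime table is deliberately shortened to six entries for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Smallest listed prime >= n, clamped to the largest entry.
// Shortened table for illustration; the table in the listing above is longer.
inline unsigned long next_prime(unsigned long n) {
    static const unsigned long primes[] = {53, 97, 193, 389, 769, 1543};
    const unsigned long* first = primes;
    const unsigned long* last = primes + sizeof(primes) / sizeof(primes[0]);
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

// Bucket count after inserting `count` distinct keys, growing whenever the
// element count equals the bucket count just before an insertion.
inline unsigned long buckets_after(std::size_t count) {
    unsigned long buckets = next_prime(0);
    for (std::size_t n = 0; n < count; ++n) {
        if (n == buckets) buckets = next_prime(buckets + 1);
    }
    return buckets;
}

// Usage: the 54th insertion triggers the first growth, from 53 to 97 buckets.
inline void demo_growth() {
    assert(next_prime(0) == 53);
    assert(buckets_after(53) == 53);
    assert(buckets_after(54) == 97);
}
```

Because the lookup clamps to the last entry, a table that already has the largest listed bucket count is rebuilt at the same size; the load factor can then exceed 1, which separate chaining tolerates.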

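3. Wrapping myunordered_set: storing deduplicated single keys

myunordered_set stores single keys and must be deduplicated and unordered; its iterators must not allow modifying a Key (they are read-only). Key points:

The hash table's T is const K (the Key cannot be modified);
The KeyOfT functor returns T itself (T already is the Key);
insert, find, erase, and the other public functions forward to the generic hash table interface.

3.1 Full implementation of myunordered_set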

namespace bit {
    template <class K, class Hash = HashFunc<K>>
    class unordered_set {
    private:
        // KeyOfT functor: extract the Key from T (const K) by returning it unchanged
        struct SetKeyOfT {
            const K& operator()(const K& key) const {
                return key;
            }
        };

        // Reuse the hash table: T=const K (Key cannot be modified), KeyOfT=SetKeyOfT
        using HashTable = hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>;
        HashTable _ht;

    public:
        // Iterator types (the table's iterators; with T=const K they are read-only,
        // and Iterator and ConstIterator are in fact the same type)
        using iterator = typename HashTable::ConstIterator;
        using const_iterator = typename HashTable::ConstIterator;

        // -------------------------- Iterator interface --------------------------
        iterator begin() const {
            return _ht.Begin();
        }

        iterator end() const {
            return _ht.End();
        }

        // -------------------------- Core operations --------------------------
        // insert: add a Key, return (iterator, whether the insertion happened)
        pair<iterator, bool> insert(const K& key) {
            return _ht.Insert(key);
        }

        // find: look up by Key and return an iterator.
        // Being const, this requires HashTable to provide a const overload of Find.
        iterator find(const K& key) const {
            return _ht.Find(key);
        }

        // erase: remove by Key, return whether an element was removed
        bool erase(const K& key) {
            return _ht.Erase(key);
        }

        // Number of elements.
        // _n is private in HashTable, so this requires HashTable to expose
        //   size_t Size() const { return _n; }
        size_t size() const {
            return _ht.Size();
        }
    };
} // namespace bit

3.2 Using myunordered_set

void test_unordered_set() {
    bit::unordered_set<int> s;
    int a[] = {4, 2, 6, 1, 3, 5, 15, 7, 16, 14, 3, 3, 15};

    // Insert elements (duplicates are rejected)
    for (auto e : a) {
        auto [it, success] = s.insert(e); // C++17 structured binding
        if (success) {
            cout << "inserted: " << *it << endl;
        } else {
            cout << "not inserted (already present): " << *it << endl;
        }
    }

    // Traverse (order is unspecified)
    cout << "\nunordered_set contents: ";
    for (auto e : s) {
        cout << e << " ";
    }
    cout << endl;

    // Find
    auto it = s.find(5);
    if (it != s.end()) {
        cout << "\nfound element: " << *it << endl;
    } else {
        cout << "\nelement not found: 5" << endl;
    }

    // Erase
    bool ret = s.erase(3);
    cout << "\nerase 3: " << (ret ? "succeeded" : "failed") << endl;

    // Traverse again to confirm the erase
    cout << "\ncontents after erase: ";
    for (auto e : s) {
        cout << e << " ";
    }
    cout << endl;
}

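4. Wrapping myunordered_map: storing key-value pairs and supporting []

myunordered_map stores pair<const K, V>. Keys are deduplicated, values are modifiable, and the container supports the [] operator, its signature feature. Key points:

The hash table's T is pair<const K, V> (Key not modifiable, Value modifiable);
The KeyOfT functor extracts first (the Key) from the pair;
The [] operator is built on insert: insert the Key if it is absent, and in either case return a reference to the Value.

4.1 Full implementation of myunordered_map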

namespace bit {
    template <class K, class V, class Hash = HashFunc<K>>
    class unordered_map {
    private:
        // KeyOfT functor: extract the Key (first) from T (pair<const K, V>)
        struct MapKeyOfT {
            const K& operator()(const pair<const K, V>& kv) const {
                return kv.first;
            }
        };

        // Reuse the hash table: T=pair<const K, V>, KeyOfT=MapKeyOfT
        using HashTable = hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>;
        HashTable _ht;

    public:
        // Iterator types (iterator can modify the Value, const_iterator is read-only)
        using iterator = typename HashTable::Iterator;
        using const_iterator = typename HashTable::ConstIterator;

        // -------------------------- Iterator interface --------------------------
        iterator begin() {
            return _ht.Begin();
        }

        iterator end() {
            return _ht.End();
        }

        const_iterator begin() const {
            return _ht.Begin();
        }

        const_iterator end() const {
            return _ht.End();
        }

        // -------------------------- Core operations --------------------------
        // insert: add a key-value pair, return (iterator, whether the insertion happened)
        pair<iterator, bool> insert(const pair<K, V>& kv) {
            return _ht.Insert(kv); // pair<K, V> converts implicitly to pair<const K, V>
        }

        // find: look up by Key and return an iterator
        iterator find(const K& key) {
            return _ht.Find(key);
        }

        // const find: requires HashTable to provide a const overload of Find
        const_iterator find(const K& key) const {
            return _ht.Find(key);
        }

        // erase: remove by Key, return whether an element was removed
        bool erase(const K& key) {
            return _ht.Erase(key);
        }

        // -------------------------- operator[] --------------------------
        // 1. Gives access to the Value; 2. inserts a default Value if the Key is absent
        V& operator[](const K& key) {
            // Insert returns (iterator, whether the insertion happened);
            // the iterator is valid in both cases
            auto [it, success] = _ht.Insert(make_pair(key, V()));
            // return a reference to the Value (it->second)
            return it->second;
        }

        // Number of elements.
        // As in unordered_set, this requires HashTable to expose
        //   size_t Size() const { return _n; }
        size_t size() const {
            return _ht.Size();
        }
    };
} // namespace bit

4.2 Using myunordered_map

void test_unordered_map() {
    bit::unordered_map<string, string> dict;

    // Insert key-value pairs (three ways)
    dict.insert(pair<string, string>("sort", "to arrange in order"));
    dict.insert({"left", "the left side"});
    dict["right"] = "the right side"; // insert through []

    // Access and modify a Value
    dict["left"] += " (direction)";   // modify an existing Value
    dict["insert"] = "to put in";     // insert a new key-value pair

    // Traverse (order unspecified; Key is read-only, Value is modifiable)
    cout << "unordered_map contents:" << endl;
    for (auto& [key, val] : dict) { // C++17 structured binding
        val += ";"; // modify the Value
        cout << key << " : " << val << endl;
    }

    // Find
    auto it = dict.find("sort");
    if (it != dict.end()) {
        cout << "\nfound pair: " << it->first << " : " << it->second << endl;
    }

    // Erase
    bool ret = dict.erase("right");
    cout << "\nerase key right: " << (ret ? "succeeded" : "failed") << endl;

    // Traverse again to confirm the erase
    cout << "\ncontents after erase:" << endl;
    for (auto it = dict.begin(); it != dict.end(); ++it) {
        cout << it->first << " : " << it->second << endl;
    }
}

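5. Key details

5.1 Why the iterator is forward-only

The hash table's iterator is a forward iterator (it supports only ++). The reason is structural rather than a matter of ordering: each bucket is a singly linked list, so a node holds no pointer to its predecessor, and stepping backward would mean rescanning the current bucket, or an earlier one, from its head. set/map, by contrast, provide bidirectional iterators (++ and --), because a red-black tree node can reach both its in-order successor and its predecessor through its parent and child pointers.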
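The same contrast can be checked against the standard containers. The sketch below assumes only what the C++ standard guarantees: iterators of unordered containers are at least forward iterators, while std::set iterators are bidirectional. The helpers category_of, at_least, count_by_increment, and demo_categories are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <type_traits>
#include <unordered_set>

template <class It>
using category_of = typename std::iterator_traits<It>::iterator_category;

// True when iterator type It offers at least the operations of category Tag.
template <class Tag, class It>
constexpr bool at_least() {
    return std::is_base_of<Tag, category_of<It>>::value;
}

// Counts elements using only != and ++, which is all a forward iterator promises.
template <class It>
std::size_t count_by_increment(It first, It last) {
    std::size_t n = 0;
    for (; first != last; ++first) ++n;
    return n;
}

// Usage: traversal needs nothing beyond forward iteration.
inline void demo_categories() {
    std::unordered_set<int> s{4, 2, 6, 2};
    assert(count_by_increment(s.begin(), s.end()) == 3); // the duplicate 2 is dropped
}
```

Some standard library implementations give their unordered containers stronger iterators than the minimum, which is why the check uses "at least forward" rather than "exactly forward".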

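5.2 Why the Key cannot be modified

T is const K for myunordered_set: iterators cannot modify a Key, so they cannot break deduplication or leave an element in the wrong bucket;

T is pair<const K, V> for myunordered_map: the Key cannot be modified (which would break the Key-to-bucket mapping), but the Value can (which is what callers need).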
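std::unordered_map makes the same choice, which the following sketch demonstrates with the standard container rather than the article's class. Dict, key_is_const, bump, and demo_const_key are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

using Dict = std::unordered_map<std::string, int>;

// The element type carries a const Key and a mutable Value.
constexpr bool key_is_const =
    std::is_same<Dict::value_type, std::pair<const std::string, int>>::value;

// Increments the Value stored under key; returns -1 if the key is absent.
inline int bump(Dict& d, const std::string& key) {
    auto it = d.find(key);
    if (it == d.end()) return -1;
    it->second += 1;        // allowed: the Value is mutable
    // it->first = "other"; // would not compile: first is const std::string
    return it->second;
}

// Usage: the Value changes in place while the Key stays fixed.
inline void demo_const_key() {
    Dict d{{"left", 1}};
    assert(bump(d, "left") == 2);
    assert(d.count("left") == 1);
}
```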

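5.3 How the `[]` operator works

The [] operator of myunordered_map relies on the pair<Iterator, bool> returned by the hash table's insert:

If the Key is absent: insert adds pair(key, V()) (a default-constructed Value) and returns the new node's iterator with true; [] returns a reference to the new Value;
If the Key is present: insert adds nothing and returns the existing node's iterator with false; [] returns a reference to the existing Value.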
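Both cases can be exercised with a small free function that layers the same logic on std::unordered_map, so the example compiles on its own. subscript and demo_subscript are hypothetical names; this is a sketch of the technique, not how any particular standard library implements operator[].

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Mirrors the insert-based operator[]: try to insert (key, V()), then return
// a reference to the Value the resulting iterator points at.
template <class K, class V>
V& subscript(std::unordered_map<K, V>& m, const K& key) {
    using value_type = typename std::unordered_map<K, V>::value_type;
    auto result = m.insert(value_type(key, V()));
    return result.first->second; // valid whether or not the insertion happened
}

// Usage: the first call inserts a default Value, the second reuses the node.
inline void demo_subscript() {
    std::unordered_map<std::string, int> m;
    subscript(m, std::string("left")) = 5;  // Key absent: inserted, then assigned
    subscript(m, std::string("left")) += 1; // Key present: existing Value updated
    assert(m.size() == 1);
    assert(m.at("left") == 6);
}
```

One consequence of this design is that merely reading through [] inserts a default Value for a missing Key, so find is the right tool for a pure lookup.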

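6. Summary

This article built myunordered_map and myunordered_set in two steps, first making the hash table generic and then wrapping it in containers. The main takeaways:

Reuse: template parameters and functors let one hash table store different data types (a bare Key or a key-value pair), which avoids duplicated code;
Interface adaptation: the container layer wraps the table's low-level interface and exposes a uniform STL-style interface, which lowers the cost of use;
Attention to detail: the forward-only iterator, the non-modifiable Key, and the logic of the [] operator are what keep the containers correct and easy to use.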