From Red-Black Tree to Containers: A Step-by-Step Guide to Wrapping MyMap and MySet


License: CC 4.0 BY-SA

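In the C++ STL, map and set are heavily used associative containers, and in mainstream implementations such as SGI-STL both are built on a red-black tree. This design of "reusing one underlying tree and differentiating only in the upper-layer wrapper" shows the flexibility of generic programming and is an excellent case study in container implementation. Using the SGI-STL source as a reference, this article starts from the red-black tree and builds MyMap and MySet step by step, so that you come away with the core logic of container design.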

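1. Underlying Logic: Why Can One Red-Black Tree Support Both Map and Set?

To understand how map and set are wrapped, we first need to answer one key question: how can a red-black tree store both key-only data (set) and key-value data (map)?

The answer lies in the tree's generic design. SGI-STL's red-black tree achieves this generality through three core template parameters (its Compare and allocator parameters are orthogonal to this question):

Key: the key type used for lookup and erasure (both set and map search by Key).
Value: the type actually stored in each tree node (Key for set, pair<const Key, T> for map).
KeyOfValue: a functor that extracts the Key from a Value (it answers "how do I get the key to compare from the stored Value?").

The elegance of this design is that the tree's core logic (insertion, lookup, rotation) is decoupled from the concrete Value type. The KeyOfValue functor only has to tell the tree how to obtain the key, and the same tree then serves both set and map.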
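To make the decoupling concrete, here is a minimal sketch that replaces the tree with a plain vector so that only the KeyOfValue idea remains. The names `Identity`, `Select1st`, `FindByKey`, and `KeyOfValueDemo` are hypothetical helpers written for this illustration; they are not SGI-STL code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical extractors, named after the SGI-STL functors they imitate.
struct Identity {
    template <class V>
    const V& operator()(const V& v) const { return v; }
};

struct Select1st {
    template <class P>
    const typename P::first_type& operator()(const P& p) const { return p.first; }
};

// One lookup routine for any Value type: it only compares keys obtained
// through KeyOfValue and never looks inside the stored values itself.
template <class Key, class Value, class KeyOfValue>
const Value* FindByKey(const std::vector<Value>& items, const Key& key) {
    KeyOfValue kov;
    for (const Value& item : items) {
        // Equivalence expressed with operator< only, as ordered containers do.
        if (!(kov(item) < key) && !(key < kov(item))) {
            return &item;
        }
    }
    return nullptr;
}

using Entry = std::pair<const int, std::string>;

inline void KeyOfValueDemo() {
    std::vector<int> keys = {4, 2, 6};                        // set-like: Value == Key
    std::vector<Entry> entries = {{4, "four"}, {2, "two"}};   // map-like: Value == pair

    const int* k = FindByKey<int, int, Identity>(keys, 6);
    const Entry* e = FindByKey<int, Entry, Select1st>(entries, 2);
    const int* missing = FindByKey<int, int, Identity>(keys, 5);

    assert(k != nullptr && *k == 6);
    assert(e != nullptr && e->second == "two");
    assert(missing == nullptr);
}
```

The same `FindByKey` body handles both element types; only the functor passed as the third template argument changes, which is the role KeyOfValue plays inside the tree.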

1.1 Anatomy of the SGI-STL framework

Let's first look at (simplified) source fragments to understand the STL's design:

// 1. Red-black tree (stl_tree.h): generic, not tied to any concrete Value type
template <class Key, class Value, class KeyOfValue, class Compare>
class rb_tree {
protected:
    struct __rb_tree_node {
        Value value_field;  // the stored data (Key for set / pair for map)
        __rb_tree_node* parent;
        __rb_tree_node* left;
        __rb_tree_node* right;
        Colour color;
    };
public:
    pair<iterator, bool> insert_unique(const Value& x);  // insert, rejecting duplicates
    iterator find(const Key& x);                         // look up by Key
private:
    __rb_tree_node* _root;
};

// 2. set (stl_set.h): Value = Key, KeyOfValue returns the Key itself
template <class Key, class Compare>
class set {
public:
    typedef Key key_type;
    typedef Key value_type;  // for set, Value is simply Key
private:
    // Functor: extract the Key from a Value (which is the Key).
    // Shown nested for brevity; in SGI-STL it is a standalone template in stl_function.h.
    struct identity {
        const Key& operator()(const value_type& x) const { return x; }
    };
    // Instantiate the tree: Key=Key, Value=Key, KeyOfValue=identity
    typedef rb_tree<key_type, value_type, identity, Compare> rep_type;
    rep_type t;  // the underlying red-black tree
};

// 3. map (stl_map.h): Value = pair<const Key, T>, KeyOfValue returns pair.first
template <class Key, class T, class Compare>
class map {
public:
    typedef Key key_type;
    typedef T mapped_type;
    typedef pair<const Key, T> value_type;  // for map, Value is a key-value pair
private:
    // Functor: extract the Key (pair.first) from a Value (the pair)
    struct select1st {
        const Key& operator()(const value_type& x) const { return x.first; }
    };
    // Instantiate the tree: Key=Key, Value=pair, KeyOfValue=select1st
    typedef rb_tree<key_type, value_type, select1st, Compare> rep_type;
    rep_type t;  // the underlying red-black tree
};

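Key takeaways:

The only differences between set and map are the type of Value and the implementation of the KeyOfValue functor; the core red-black tree logic is reused in full.

map's Value is pair<const Key, T>. Key is const so that it cannot be modified through an iterator (which preserves the binary-search-tree ordering), while T (the mapped value) remains modifiable.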
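The const-key rule can be checked against the standard library itself. The sketch below uses `std::map` (not the article's MyMap); `IntStrMap` and `RenameValue` are names made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

using IntStrMap = std::map<int, std::string>;

// The element type of std::map really is pair<const Key, T>.
static_assert(
    std::is_same<IntStrMap::value_type, std::pair<const int, std::string>>::value,
    "map stores pair<const Key, T>");

inline std::string RenameValue() {
    IntStrMap m;
    m.insert({1, "one"});

    auto it = m.begin();
    // it->first = 2;     // would not compile: first is const int
    it->second = "uno";   // allowed: the mapped value is not const

    return m[1];
}
```

Because the key is const at the type level, an attempt to reorder the tree by writing through an iterator is rejected at compile time rather than corrupting the container at run time.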

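2. Before Wrapping: Making the Red-Black Tree Generic

Starting from the red-black tree implemented previously, three changes are needed before it can support MyMap and MySet:

Template parameters: add a KeyOfValue functor parameter that extracts the Key from a Value.
Iterators: the tree must provide an in-order iterator (map and set iterate in sorted order, which corresponds to in-order traversal).
Insert return value: map's operator[] depends on the insertion result (the position and whether a new node was inserted), so Insert must return pair<iterator, bool>.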

2.1 Reworking the tree node and the class template

// 1. Colour enumeration
enum Colour { RED, BLACK };

// 2. Tree node: stores a generic Value
template <class T>
struct RBTreeNode {
    T _data;                  // the stored data (Key for set / pair for map)
    RBTreeNode<T>* _left;     // left child
    RBTreeNode<T>* _right;    // right child
    RBTreeNode<T>* _parent;   // parent (needed to walk back up when rebalancing)
    Colour _col;              // node colour

    RBTreeNode(const T& data)
        : _data(data)
        , _left(nullptr)
        , _right(nullptr)
        , _parent(nullptr)
        , _col(RED)  // new nodes are red: this never changes any path's black count
    {}
};

// 3. Red-black tree: gains the KeyOfValue functor parameter
//    (RBTreeIterator, defined in section 2.2, must be declared before this class)
template <class K, class T, class KeyOfValue>
class RBTree {
    typedef RBTreeNode<T> Node;
public:
    // Iterators, Insert, Find and the other interfaces are implemented below
    typedef RBTreeIterator<T, T&, T*> Iterator;                   // mutable iterator
    typedef RBTreeIterator<T, const T&, const T*> ConstIterator;  // read-only iterator

    Iterator Begin();
    Iterator End();
    ConstIterator Begin() const;
    ConstIterator End() const;

    pair<Iterator, bool> Insert(const T& data);
    Iterator Find(const K& key);

private:
    // Helpers: rotations (same as the earlier implementation, no changes needed)
    void RotateL(Node* parent);
    void RotateR(Node* parent);
    // Destroys the whole tree (used by the destructor)
    void Destroy(Node* root);

    Node* _root = nullptr;  // root node
};
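RotateL and Destroy are declared above but their bodies are not shown in this article. The following is a minimal sketch of what they could look like, written as free functions over a simplified node so that it compiles on its own. The `Node` struct, the `root` reference parameter, the node-count return value of `Destroy`, and `RotateDemo` are all assumptions made for this illustration, not the author's code.

```cpp
#include <cassert>

// Simplified stand-in for RBTreeNode (colour omitted: a rotation only
// rewires pointers and never touches colours).
template <class T>
struct Node {
    T _data;
    Node* _left = nullptr;
    Node* _right = nullptr;
    Node* _parent = nullptr;
    explicit Node(const T& data) : _data(data) {}
};

// Left rotation around `parent`: its right child subR moves up, parent
// becomes subR's left child, and subR's old left subtree becomes parent's
// right subtree. `root` is a reference because the rotation may replace it.
template <class T>
void RotateL(Node<T>*& root, Node<T>* parent) {
    Node<T>* subR = parent->_right;
    Node<T>* subRL = subR->_left;
    Node<T>* grand = parent->_parent;

    parent->_right = subRL;
    if (subRL) subRL->_parent = parent;

    subR->_left = parent;
    parent->_parent = subR;

    subR->_parent = grand;
    if (grand == nullptr) {
        root = subR;                 // parent was the root
    } else if (grand->_left == parent) {
        grand->_left = subR;
    } else {
        grand->_right = subR;
    }
}

// Post-order destruction: both subtrees are freed before their parent.
// Returns the number of nodes freed so the demo can check it.
template <class T>
int Destroy(Node<T>* root) {
    if (root == nullptr) return 0;
    int freed = Destroy(root->_left) + Destroy(root->_right);
    delete root;
    return freed + 1;
}

// Builds the right-leaning chain 1 -> 2 -> 3, rotates left at 1,
// checks the resulting shape, then frees the tree.
inline int RotateDemo() {
    Node<int>* n1 = new Node<int>(1);
    Node<int>* n2 = new Node<int>(2);
    Node<int>* n3 = new Node<int>(3);
    n1->_right = n2; n2->_parent = n1;
    n2->_right = n3; n3->_parent = n2;

    Node<int>* root = n1;
    RotateL(root, n1);

    assert(root == n2 && n2->_parent == nullptr);
    assert(n2->_left == n1 && n2->_right == n3);
    assert(n1->_parent == n2 && n1->_right == nullptr);
    return Destroy(root);
}
```

RotateR is the mirror image: swap every `_left` with `_right`. In the member-function form used by the article, `root` would simply be the tree's `_root`.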

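2.2 Implementing the tree iterator (the hard part)

map and set expose bidirectional iterators that visit elements in in-order sequence, which is what makes iteration sorted. The heart of the iterator is operator++ and operator--; the hard part is finding the in-order successor or predecessor of a node.

2.2.1 Iterator class design

An iterator is essentially a wrapper around a node pointer that exposes a pointer-like interface. It stores the current node plus the root (needed for the special case --end()):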

template <class T, class Ref, class Ptr>
struct RBTreeIterator {
    typedef RBTreeNode<T> Node;
    typedef RBTreeIterator<T, Ref, Ptr> Self;

    Node* _node;   // the node currently pointed to
    Node* _root;   // the tree's root (needed for --end())

    // Constructor
    RBTreeIterator(Node* node, Node* root)
        : _node(node)
        , _root(root)
    {}

    // Dereference: returns the current node's data
    Ref operator*() { return _node->_data; }
    // Arrow: returns a pointer to the data (enables it->first / it->second)
    Ptr operator->() { return &_node->_data; }

    // Equality / inequality
    bool operator==(const Self& s) const { return _node == s._node; }
    bool operator!=(const Self& s) const { return _node != s._node; }

    // Core: operator++ (in-order successor)
    Self& operator++();
    // Core: operator-- (in-order predecessor)
    Self& operator--();
};
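The Ref and Ptr parameters are what let one class template serve as both the mutable and the read-only iterator. Here is a stripped-down sketch of that mechanism; `Node`, `Iter`, `Value`, and `RefPtrDemo` are simplified stand-ins invented for the example, not the article's classes.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <class T>
struct Node {
    T _data;
};

// Same shape as RBTreeIterator, reduced to the two accessors.
template <class T, class Ref, class Ptr>
struct Iter {
    Node<T>* _node;
    Ref operator*() const { return _node->_data; }
    Ptr operator->() const { return &_node->_data; }
};

using Value = std::pair<const int, int>;
using Iterator = Iter<Value, Value&, Value*>;                    // mutable access
using ConstIterator = Iter<Value, const Value&, const Value*>;   // read-only access

// The only difference between the two iterators is what * and -> return.
static_assert(
    std::is_same<decltype(*std::declval<Iterator>()), Value&>::value,
    "Iterator yields a mutable reference");
static_assert(
    std::is_same<decltype(*std::declval<ConstIterator>()), const Value&>::value,
    "ConstIterator yields a const reference");

inline int RefPtrDemo() {
    Node<Value> n{{1, 10}};

    Iterator it{&n};
    it->second += 5;         // fine: Ptr is Value*

    ConstIterator cit{&n};
    // cit->second += 5;     // would not compile: Ptr is const Value*
    return cit->second;
}
```

Writing the iterator once and instantiating it twice avoids maintaining two nearly identical classes whose traversal logic would otherwise have to be kept in sync by hand.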

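2.2.2 Implementing operator++ (in-order successor)

In-order traversal visits left subtree → root → right subtree, so finding the next node splits into two cases:

The current node has a right subtree: the next node is the leftmost node of that right subtree (the first node an in-order traversal of the subtree visits).
The current node has no right subtree: the next node is the nearest ancestor that has the current node in its left subtree. Walk up through the parents until you arrive from a left child; if no such ancestor exists, the current node was the last one and the result is end().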

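// Out-of-class definition: the template header is required, and the return
// type must be spelled out because the Self typedef is not in scope here.
template <class T, class Ref, class Ptr>
RBTreeIterator<T, Ref, Ptr>& RBTreeIterator<T, Ref, Ptr>::operator++() {
    if (_node->_right) {
        // Case 1: right subtree exists → take its leftmost node
        Node* leftMost = _node->_right;
        while (leftMost->_left) {
            leftMost = leftMost->_left;
        }
        _node = leftMost;
    } else {
        // Case 2: no right subtree → climb until we arrive from a left child
        Node* cur = _node;
        Node* parent = cur->_parent;
        // Keep climbing while cur is its parent's right child
        while (parent && cur == parent->_right) {
            cur = parent;
            parent = cur->_parent;
        }
        // parent is now the successor (or nullptr, i.e. end())
        _node = parent;
    }
    return *this;
}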
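To see both cases of operator++ at work, the sketch below applies the same successor logic to a small hand-built tree. `Node`, `Next`, `Link`, and `WalkForward` are hypothetical helpers for this walkthrough; the tree shape is chosen so that every branch of the algorithm is exercised.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int _data;
    Node* _left = nullptr;
    Node* _right = nullptr;
    Node* _parent = nullptr;
    explicit Node(int d) : _data(d) {}
};

// In-order successor, with the same two cases as operator++.
inline Node* Next(Node* node) {
    if (node->_right) {
        Node* leftMost = node->_right;
        while (leftMost->_left) leftMost = leftMost->_left;
        return leftMost;
    }
    Node* cur = node;
    Node* parent = cur->_parent;
    while (parent && cur == parent->_right) {
        cur = parent;
        parent = cur->_parent;
    }
    return parent;  // nullptr once we step past the last node
}

// Attaches children to a parent and sets their back-pointers.
inline void Link(Node& parent, Node* left, Node* right) {
    parent._left = left;
    parent._right = right;
    if (left) left->_parent = &parent;
    if (right) right->_parent = &parent;
}

// Hand-built tree:      4
//                     /   \
//                    2     6
//                   / \   /
//                  1   3 5
inline std::vector<int> WalkForward() {
    Node n1(1), n2(2), n3(3), n4(4), n5(5), n6(6);
    Link(n4, &n2, &n6);
    Link(n2, &n1, &n3);
    Link(n6, &n5, nullptr);

    std::vector<int> out;
    for (Node* cur = &n1; cur != nullptr; cur = Next(cur)) {
        out.push_back(cur->_data);
    }
    return out;
}
```

Stepping from 3 to 4 shows case 2 climbing two levels (3 is a right child, 2 is a left child), while stepping from 4 to 5 shows case 1 descending to the leftmost node of the right subtree.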

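2.2.3 Implementing operator-- (in-order predecessor)

The logic mirrors operator++, with one extra case for end(), giving three cases in total:

The current node is nullptr (end()): the previous node is the rightmost node of the whole tree (the last node of the in-order traversal).
The current node has a left subtree: the previous node is the rightmost node of that left subtree (the last node an in-order traversal of the subtree visits).
The current node has no left subtree: the previous node is the nearest ancestor that has the current node in its right subtree.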

// Out-of-class definition: template header and full return type are required.
template <class T, class Ref, class Ptr>
RBTreeIterator<T, Ref, Ptr>& RBTreeIterator<T, Ref, Ptr>::operator--() {
    if (_node == nullptr) {
        // Case 1: end() → take the rightmost node of the whole tree
        Node* rightMost = _root;
        while (rightMost && rightMost->_right) {
            rightMost = rightMost->_right;
        }
        _node = rightMost;
    } else if (_node->_left) {
        // Case 2: left subtree exists → take its rightmost node
        Node* rightMost = _node->_left;
        while (rightMost->_right) {
            rightMost = rightMost->_right;
        }
        _node = rightMost;
    } else {
        // Case 3: no left subtree → climb until we arrive from a right child
        Node* cur = _node;
        Node* parent = cur->_parent;
        // Keep climbing while cur is its parent's left child
        while (parent && cur == parent->_left) {
            cur = parent;
            parent = cur->_parent;
        }
        _node = parent;
    }
    return *this;
}
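The three cases of operator-- can be traced the same way, starting from the nullptr that represents end(). As before, `Node`, `Prev`, `Link`, and `WalkBackward` are hypothetical helpers written for this sketch.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int _data;
    Node* _left = nullptr;
    Node* _right = nullptr;
    Node* _parent = nullptr;
    explicit Node(int d) : _data(d) {}
};

// In-order predecessor, with the same three cases as operator--.
// `root` is only consulted when `node` is nullptr, i.e. for --end().
inline Node* Prev(Node* node, Node* root) {
    if (node == nullptr) {
        Node* rightMost = root;
        while (rightMost && rightMost->_right) rightMost = rightMost->_right;
        return rightMost;
    }
    if (node->_left) {
        Node* rightMost = node->_left;
        while (rightMost->_right) rightMost = rightMost->_right;
        return rightMost;
    }
    Node* cur = node;
    Node* parent = cur->_parent;
    while (parent && cur == parent->_left) {
        cur = parent;
        parent = cur->_parent;
    }
    return parent;
}

inline void Link(Node& parent, Node* left, Node* right) {
    parent._left = left;
    parent._right = right;
    if (left) left->_parent = &parent;
    if (right) right->_parent = &parent;
}

// Same tree as before: 4 at the root, 2 and 6 below it, leaves 1, 3, 5.
inline std::vector<int> WalkBackward() {
    Node n1(1), n2(2), n3(3), n4(4), n5(5), n6(6);
    Link(n4, &n2, &n6);
    Link(n2, &n1, &n3);
    Link(n6, &n5, nullptr);

    std::vector<int> out;
    // Start from "end()" (nullptr) and step backwards until we fall off the front.
    for (Node* cur = Prev(nullptr, &n4); cur != nullptr; cur = Prev(cur, &n4)) {
        out.push_back(cur->_data);
    }
    return out;
}
```

One consequence of representing end() as nullptr is visible in the loop condition: stepping back from the first node also yields nullptr, so with this design --begin() is indistinguishable from end().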

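2.2.4 The tree's Begin() and End()

Begin(): returns an iterator to the first node of the in-order traversal (the leftmost node of the tree).

End(): returns an iterator holding nullptr as the past-the-end marker. It plays the same role as the header sentinel in SGI-STL, although SGI-STL uses a real header node rather than nullptr.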

template <class K, class T, class KeyOfValue>
typename RBTree<K, T, KeyOfValue>::Iterator RBTree<K, T, KeyOfValue>::Begin() {
    Node* leftMost = _root;
    // Walk down to the leftmost node
    while (leftMost && leftMost->_left) {
        leftMost = leftMost->_left;
    }
    return Iterator(leftMost, _root);
}

template <class K, class T, class KeyOfValue>
typename RBTree<K, T, KeyOfValue>::Iterator RBTree<K, T, KeyOfValue>::End() {
    // end() is represented by nullptr
    return Iterator(nullptr, _root);
}

// The const versions (returning ConstIterator) are analogous and omitted here

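2.3 Changing the return value of Insert

To support map's operator[], Insert must report both an iterator to the relevant node (the newly inserted one, or the existing one when the key is already present) and whether the insertion took place. Its return type therefore becomes pair<Iterator, bool>: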

template <class K, class T, class KeyOfValue>
pair<typename RBTree<K, T, KeyOfValue>::Iterator, bool> 
RBTree<K, T, KeyOfValue>::Insert(const T& data) {
    KeyOfValue kot;  // extracts the Key from data

    // 1. Empty tree
    if (_root == nullptr) {
        _root = new Node(data);
        _root->_col = BLACK;
        return make_pair(Iterator(_root, _root), true);
    }

    // 2. Find the insertion point using binary-search-tree rules
    Node* parent = nullptr;
    Node* cur = _root;
    while (cur) {
        if (kot(cur->_data) < kot(data)) {  // compare keys only
            parent = cur;
            cur = cur->_right;
        } else if (kot(cur->_data) > kot(data)) {
            parent = cur;
            cur = cur->_left;
        } else {
            // Key already present: insertion fails, return the existing node
            return make_pair(Iterator(cur, _root), false);
        }
    }

    // 3. Link in the new node (red)
    cur = new Node(data);
    Node* newnode = cur;  // remember it: cur may move during rebalancing
    cur->_col = RED;
    if (kot(parent->_data) < kot(data)) {
        parent->_right = cur;
    } else {
        parent->_left = cur;
    }
    cur->_parent = parent;

    // 4. Restore the red-black rules (recolour / rotate, same as the earlier implementation)
    while (parent && parent->_col == RED) {
        Node* grandfather = parent->_parent;
        if (parent == grandfather->_left) {
            Node* uncle = grandfather->_right;
            // Case 1: uncle exists and is red → recolour and continue upwards
            if (uncle && uncle->_col == RED) {
                parent->_col = BLACK;
                uncle->_col = BLACK;
                grandfather->_col = RED;
                cur = grandfather;
                parent = cur->_parent;
            } else {
                // Cases 2/3: uncle is absent or black → rotate and recolour
                if (cur == parent->_right) {
                    RotateL(parent);
                    swap(parent, cur);  // after the rotation the two roles are exchanged
                }
                RotateR(grandfather);
                parent->_col = BLACK;
                grandfather->_col = RED;
                break;
            }
        } else {
            // Mirror case: parent is grandfather's right child
            Node* uncle = grandfather->_left;
            if (uncle && uncle->_col == RED) {
                parent->_col = BLACK;
                uncle->_col = BLACK;
                grandfather->_col = RED;
                cur = grandfather;
                parent = cur->_parent;
            } else {
                if (cur == parent->_left) {
                    RotateR(parent);
                    swap(parent, cur);
                }
                RotateL(grandfather);
                parent->_col = BLACK;
                grandfather->_col = RED;
                break;
            }
        }
    }
    _root->_col = BLACK;  // the root must always be black

    // 5. Return an iterator to the inserted node and true.
    //    Use newnode, not cur: cur may now point at an ancestor.
    return make_pair(Iterator(newnode, _root), true);
}
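Find is declared in the class but its body is not shown in this article. A minimal sketch of the lookup is given below, assuming it descends the tree exactly as step 2 of Insert does. It is written as a free function returning a node pointer so that it compiles on its own; `Node`, `Entry`, `EntryKey`, `FindNode`, and `LookupDemo` are names invented for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <class T>
struct Node {
    T _data;
    Node* _left = nullptr;
    Node* _right = nullptr;
    explicit Node(const T& d) : _data(d) {}
};

// Descends from the root comparing only keys, obtained via KeyOfValue.
// Returns nullptr when the key is absent.
template <class K, class T, class KeyOfValue>
Node<T>* FindNode(Node<T>* root, const K& key) {
    KeyOfValue kot;
    Node<T>* cur = root;
    while (cur) {
        if (kot(cur->_data) < key) {
            cur = cur->_right;       // stored key is smaller: go right
        } else if (key < kot(cur->_data)) {
            cur = cur->_left;        // stored key is larger: go left
        } else {
            return cur;              // neither is smaller: found
        }
    }
    return nullptr;
}

using Entry = std::pair<const std::string, std::string>;

struct EntryKey {
    const std::string& operator()(const Entry& e) const { return e.first; }
};

// Three hand-linked nodes: "left" at the root, "insert" < "left" < "sort".
inline std::string LookupDemo(const std::string& key) {
    Node<Entry> root(Entry("left", "L"));
    Node<Entry> lo(Entry("insert", "I"));
    Node<Entry> hi(Entry("sort", "S"));
    root._left = &lo;
    root._right = &hi;

    Node<Entry>* hit = FindNode<std::string, Entry, EntryKey>(&root, key);
    return hit ? hit->_data.second : std::string("<not found>");
}
```

In the member-function form, the hit would be wrapped as `Iterator(cur, _root)` and a miss would return `End()`, which is what lets callers write `if (it != s.end())`.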

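3. Wrapping MySet: the Key-Only Case

MySet stores single keys that are unique and immutable. On top of the modified red-black tree, all that is needed is a KeyOfValue functor (which returns the key itself) and a thin layer of public interface.

3.1 Complete implementation of MySet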

// Myset.h
#include <iostream>
#include "RBTree.h"
// Unqualified pair / cout / endl below assume `using namespace std;`
// is in effect (for example inside RBTree.h).

namespace bit {
template <class K>
class set {
public:
    // Functor: extract the Key from a Value (the Value is the Key, so return it)
    struct SetKeyOfT {
        const K& operator()(const K& key) const {
            return key;
        }
    };

    // Iterator types (reuse the tree's iterators)
    typedef typename RBTree<K, const K, SetKeyOfT>::Iterator iterator;
    typedef typename RBTree<K, const K, SetKeyOfT>::ConstIterator const_iterator;

    // Iterator interface
    iterator begin() { return _t.Begin(); }
    iterator end() { return _t.End(); }
    const_iterator begin() const { return _t.Begin(); }
    const_iterator end() const { return _t.End(); }

    // Insert (duplicates are rejected)
    pair<iterator, bool> insert(const K& key) {
        return _t.Insert(key);
    }

    // Lookup by Key
    iterator find(const K& key) {
        return _t.Find(key);
    }

private:
    // Underlying tree: Key=K, Value=const K (keys cannot be modified), KeyOfValue=SetKeyOfT
    RBTree<K, const K, SetKeyOfT> _t;
};

// Test function
void test_set() {
    bit::set<int> s;
    int a[] = {4, 2, 6, 1, 3, 5, 15, 7};
    for (auto e : a) {
        s.insert(e);
    }

    // Traverse the set (in-order, hence sorted)
    for (auto it = s.begin(); it != s.end(); ++it) {
        // *it = 10;  // compile error: Value is const K and cannot be modified
        cout << *it << " ";  // prints: 1 2 3 4 5 6 7 15
    }
    cout << endl;

    // Lookup
    auto it = s.find(6);
    if (it != s.end()) {
        cout << "Found: " << *it << endl;  // prints: Found: 6
    }
}
}  // namespace bit

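Key details:

The Value type is const K: this makes set's keys immutable, so the tree's ordering cannot be broken through an iterator.

The SetKeyOfT functor returns its argument unchanged, because for set the Value is the Key.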
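The standard library enforces the same rule. The sketch below uses `std::set` (not the article's bit::set, which has no erase) to show that elements are read-only through iterators and that "changing" a key therefore means erasing and re-inserting. `IntSet` and `ReplaceKey` are names made up for the example.

```cpp
#include <cassert>
#include <set>
#include <type_traits>
#include <utility>

using IntSet = std::set<int>;

// Dereferencing a set iterator yields a reference to const.
static_assert(
    std::is_const<
        std::remove_reference<decltype(*std::declval<IntSet::iterator>())>::type
    >::value,
    "set elements are read-only through iterators");

// Since a key cannot be edited in place, replace it by erase + insert,
// which lets the container put the new key at its correct sorted position.
inline IntSet ReplaceKey(IntSet s, int oldKey, int newKey) {
    IntSet::iterator it = s.find(oldKey);
    if (it != s.end()) {
        s.erase(it);
        s.insert(newKey);
    }
    return s;
}
```

Had in-place assignment been allowed, replacing 2 with 9 in {1, 2, 3} would leave 9 sitting between 1 and 3, and every later lookup that relies on ordering could take a wrong turn.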

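4. Wrapping MyMap: the Key-Value Case

MyMap stores key-value pairs of type pair<const Key, T>: the key is immutable but the mapped value can be modified. It differs from MySet only in the Value type and the KeyOfValue functor.

4.1 Complete implementation of MyMap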

// Mymap.h
#include <iostream>
#include <string>
#include "RBTree.h"
// Unqualified pair / make_pair / string / cout below assume
// `using namespace std;` is in effect (for example inside RBTree.h).

namespace bit {
template <class K, class V>
class map {
public:
    // Functor: extract the Key (pair.first) from a Value (the pair)
    struct MapKeyOfT {
        const K& operator()(const pair<const K, V>& kv) const {
            return kv.first;
        }
    };

    // Iterator types (reuse the tree's iterators)
    typedef typename RBTree<K, pair<const K, V>, MapKeyOfT>::Iterator iterator;
    typedef typename RBTree<K, pair<const K, V>, MapKeyOfT>::ConstIterator const_iterator;

    // Iterator interface
    iterator begin() { return _t.Begin(); }
    iterator end() { return _t.End(); }
    const_iterator begin() const { return _t.Begin(); }
    const_iterator end() const { return _t.End(); }

    // Insert a key-value pair
    pair<iterator, bool> insert(const pair<K, V>& kv) {
        // The tree stores pair<const K, V>, so convert explicitly
        return _t.Insert(pair<const K, V>(kv.first, kv.second));
    }

    // Lookup by Key
    iterator find(const K& key) {
        return _t.Find(key);
    }

    // Core: operator[] (insert if absent, then access / modify)
    V& operator[](const K& key) {
        // Try to insert (key, default-constructed V); a no-op if key already exists
        pair<iterator, bool> ret = insert(make_pair(key, V()));
        // Either way ret.first points at the entry for key; return its value by reference
        return ret.first->second;
    }

private:
    // Underlying tree: Key=K, Value=pair<const K, V> (key is immutable), KeyOfValue=MapKeyOfT
    RBTree<K, pair<const K, V>, MapKeyOfT> _t;
};

// Test function (an English-to-Chinese dictionary)
void test_map() {
    bit::map<string, string> dict;
    // Insert key-value pairs
    dict.insert({"sort", "排序"});
    dict.insert({"left", "左边"});

    // Access and modify through []
    dict["left"] = "左边（剩余）";  // overwrite an existing value
    dict["insert"] = "插入";        // insert a new key-value pair
    dict["string"];                 // insert a default value (empty string)

    // Traverse the map (in-order, hence sorted by key)
    for (auto it = dict.begin(); it != dict.end(); ++it) {
        // it->first = "test";  // compile error: first is const K and cannot be modified
        it->second += "x";       // the mapped value can be modified
        cout << it->first << ":" << it->second << endl;
    }
    // Output:
    // insert:插入x
    // left:左边（剩余）x
    // sort:排序x
    // string:x
}
}  // namespace bit

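Key details:

The Value type is pair<const K, V>: first (the key) cannot be modified, while second (the mapped value) can.

How operator[] works: it calls insert with a default-constructed value and returns a reference to second. If the key already exists, insert changes nothing and the reference refers to the existing value; otherwise it refers to the freshly inserted default value, ready to be assigned.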
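The same insert-then-return-reference pattern can be exercised against `std::map`, whose insert also returns pair<iterator, bool>. `Subscript` and `SubscriptDemo` below are hypothetical helpers written for this sketch; `Subscript` is intended to behave like `m[key]` for default-constructible value types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hand-written equivalent of operator[], built only from insert.
template <class K, class V>
V& Subscript(std::map<K, V>& m, const K& key) {
    // insert is a no-op when key exists; ret.first points at the entry either way.
    std::pair<typename std::map<K, V>::iterator, bool> ret =
        m.insert(std::make_pair(key, V()));
    return ret.first->second;
}

// Classic word count: the first sighting inserts 0, then ++ makes it 1;
// later sightings find the existing entry and increment it.
inline std::string SubscriptDemo() {
    std::map<std::string, int> counts;
    const char* words[] = {"left", "sort", "left"};
    for (const char* w : words) {
        ++Subscript(counts, std::string(w));
    }
    return std::to_string(counts["left"]) + "," +
           std::to_string(counts["sort"]) + "," +
           std::to_string(counts.size());
}
```

The bool half of the pair is ignored here on purpose: operator[] does not care whether the entry is new, only where it is. Callers that do care (for example, to detect duplicates) should call insert directly and inspect `ret.second`.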

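5. Summary: Core Ideas of Container Wrapping

Wrapping MyMap and MySet brings out the core ideas behind STL container design:

Reuse at the bottom: one generic red-black tree supports several upper-layer containers, which removes duplicated code and lowers maintenance cost.
Interface isolation: the upper-layer containers (map/set) use a functor (KeyOfValue) and type definitions to hide the underlying implementation, exposing only the interface that fits their own semantics.
Semantic constraints: type qualifiers such as const K enforce each container's semantics (neither set's keys nor map's keys can be modified).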