Redis Internals: IntSet

Originally published 2025-08-11 09:36:25

License: CC 4.0 BY-SA

Tags: #redis #database #cache

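The IntSet data structure

`contents[]` is declared as `int8_t`

Declared type vs. actual type
int8_t is a 1-byte integer type in C, so the array is effectively a raw byte buffer. The type that the contents array actually stores is determined by encoding:
- If encoding = INTSET_ENC_INT16 → contents is really int16_t[] (2 bytes per element)
- If encoding = INTSET_ENC_INT32 → contents is really int32_t[] (4 bytes per element)
- If encoding = INTSET_ENC_INT64 → contents is really int64_t[] (8 bytes per element)
Design purpose: interpreting one block of memory dynamically
- int8_t* provides a single, uniform pointer type, and encoding decides at run time how the bytes behind it are interpreted
- Example: when encoding = INT16, element i is conceptually read as *(int16_t*)(contents + i*2)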
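The encoding-driven read can be sketched as follows. This is a minimal illustration rather than the Redis implementation: the names `ENC_INT16`, `ENC_INT32`, `ENC_INT64`, and `read_element` are hypothetical, the sketch assumes each encoding value equals the element width in bytes, and it reads in host byte order. It uses `memcpy` instead of a pointer cast so the read stays valid even when the address is not aligned for the wider type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for this sketch: the encoding value is assumed
 * to equal the element width in bytes. */
enum { ENC_INT16 = 2, ENC_INT32 = 4, ENC_INT64 = 8 };

/* Read element `pos` from a raw byte buffer, interpreting the bytes
 * according to `enc`. The value is copied out with memcpy, which avoids
 * unaligned access and aliasing problems. */
static int64_t read_element(const int8_t *contents, uint32_t pos, uint8_t enc) {
    assert(enc == ENC_INT16 || enc == ENC_INT32 || enc == ENC_INT64);
    const int8_t *src = contents + (size_t)pos * enc;

    if (enc == ENC_INT64) {
        int64_t v;
        memcpy(&v, src, sizeof v);
        return v;
    }
    if (enc == ENC_INT32) {
        int32_t v;
        memcpy(&v, src, sizeof v);
        return v;            /* sign-extended to int64_t */
    }
    int16_t v;
    memcpy(&v, src, sizeof v);
    return v;                /* sign-extended to int64_t */
}
```

The same buffer yields different values depending on the `enc` argument, which is why the structure has to carry its encoding alongside the bytes.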

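What `encoding` does

It records the integer type currently stored
- INTSET_ENC_INT16 → stores int16_t (like Java's short)
- INTSET_ENC_INT32 → stores int32_t (like Java's int)
- INTSET_ENC_INT64 → stores int64_t (like Java's long)
It supports dynamic encoding upgrades
- When an inserted integer falls outside the range of the current encoding (for example, inserting 32768 into an INT16 set, whose maximum is 32767), the IntSet automatically:
  1. Upgrades encoding (e.g. INT16 → INT32)
  2. Widens the existing data to the new type (e.g. each element goes from 2 bytes to 4 bytes)
  3. Inserts the new value and updates length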
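The upgrade decision comes down to picking the narrowest encoding that can hold a value and comparing it with the current one. The sketch below assumes, for illustration, that each encoding constant equals its element width in bytes; `value_encoding` and `needs_upgrade` are hypothetical names, not functions from the Redis source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: encoding value == element width in bytes,
 * so a wider type always compares greater than a narrower one. */
enum { ENC_INT16 = 2, ENC_INT32 = 4, ENC_INT64 = 8 };

/* Return the narrowest encoding able to represent v. */
static uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return ENC_INT32;
    return ENC_INT16;
}

/* Return 1 if storing v requires a wider encoding than current_enc. */
static int needs_upgrade(uint8_t current_enc, int64_t v) {
    assert(current_enc == ENC_INT16 || current_enc == ENC_INT32 ||
           current_enc == ENC_INT64);
    return value_encoding(v) > current_enc;
}
```

With these definitions, 32767 still fits INT16 while 32768 forces INT32, which matches the example in the list above. Note that the encoding only ever grows in this scheme; nothing here narrows it again.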

The intset header source is shown below:

#ifndef __INTSET_H
#define __INTSET_H
#include <stddef.h> /* size_t, used by intsetBlobLen and intsetValidateIntegrity */
#include <stdint.h>

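/*
 * intset structure definition
 */
typedef struct intset {
    // Encoding in use
    // The real element type of the contents array depends on this encoding
    uint32_t encoding;

    // Number of elements in the array
    uint32_t length;

    // Array declared with element type int8_t
    // 1. Holds the data; the actual element type is determined by encoding
    // 2. In memory, the array is one contiguous block, which avoids fragmentation and improves memory efficiency
    // 3. Elements are located by computing an address offset into the array, not by plain indexing
    int8_t contents[];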
} intset;

intset *intsetNew(void);
intset *intsetAdd(intset *is, int64_t value, uint8_t *success);
intset *intsetRemove(intset *is, int64_t value, int *success);
uint8_t intsetFind(intset *is, int64_t value);
int64_t intsetRandom(intset *is);
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);
uint32_t intsetLen(const intset *is);
size_t intsetBlobLen(intset *is);
int intsetValidateIntegrity(const unsigned char *is, size_t size, int deep);

#ifdef REDIS_TEST
int intsetTest(int argc, char *argv[], int accurate);
#endif

#endif // __INTSET_H
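Because the header fields and the element array sit in one contiguous block, the total size of an intset follows directly from length and encoding. The sketch below mirrors the struct under a hypothetical name (`sketch_intset`) and assumes the encoding value equals the element width in bytes and that the fields are passed in host byte order; `sketch_blob_len` is an illustrative helper, not the library's `intsetBlobLen`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct above, so the sketch is self-contained. */
typedef struct sketch_intset {
    uint32_t encoding;   /* assumed: element width in bytes (2, 4 or 8) */
    uint32_t length;     /* number of elements */
    int8_t contents[];   /* flexible array member: adds nothing to sizeof */
} sketch_intset;

/* Total bytes needed for a set with `length` elements of width `encoding`:
 * the fixed header plus one slot per element. */
static size_t sketch_blob_len(uint32_t length, uint32_t encoding) {
    assert(encoding == 2 || encoding == 4 || encoding == 8);
    return sizeof(sketch_intset) + (size_t)length * encoding;
}
```

On common platforms the header is 8 bytes, so three INT16 elements take 14 bytes in total, and four INT32 elements take 24.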

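IntSet upgrade

Step ①: upgrade the encoding and grow the array. The encoding becomes INTSET_ENC_INT32 (4 bytes per integer), and the array is reallocated so that every element, including the new one, fits at that width.
Step ②: copy the existing elements back to front. Copying in reverse preserves the order and prevents a widened element from overwriting old elements that have not been read yet. In the example from the figure, element 20 moves from its old position (bytes 4-5 of contents) to its new position (bytes 8-11).
Step ③: add the new element. The new element 50000 is placed at the end of the array (bytes 12-15).
Step ④: update the metadata. The encoding field holds INTSET_ENC_INT32 and length becomes 4 (the number of elements). In the source below, encoding is actually written first, before the resize, and only length is updated at the end.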
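The byte positions in these steps follow from simple arithmetic: element i of width w occupies bytes i*w through i*w + w - 1, counted from the start of contents. The sketch below assumes that 20 is the third element (index 2), which is what a final length of 4 suggests; `byte_range` and `element_bytes` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive byte range, relative to the start of contents. */
typedef struct byte_range {
    size_t first;
    size_t last;
} byte_range;

/* Bytes occupied by the element at `index` when each element is
 * `width` bytes wide. */
static byte_range element_bytes(uint32_t index, uint8_t width) {
    assert(width == 2 || width == 4 || width == 8);
    byte_range r;
    r.first = (size_t)index * width;
    r.last = r.first + width - 1;
    return r;
}
```

Under that assumption, index 2 covers bytes 4-5 at width 2 and bytes 8-11 at width 4, and the appended element at index 3 covers bytes 12-15.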

Source code behind the IntSet upgrade

intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    // 1. Determine the encoding that value requires
    uint8_t valenc = _intsetValueEncoding(value);
    // Position at which value will be inserted
    uint32_t pos;

    // 2. Assume by default that the insert succeeds
    if (success) *success = 1;

    /* Upgrade encoding if necessary. If we need to upgrade, we know that
     * this value should be either appended (if > 0) or prepended (if < 0),
     * because it lies outside the range of existing values. */
    // 3. If value needs a wider encoding than the set currently uses:
    // 3.1 value can always be added, because a value that needs a wider encoding cannot already be in the set
    // 3.2 the set has to be upgraded before it can hold value
    if (valenc > intrev32ifbe(is->encoding)) {
        /* This always succeeds, so we don't need to curry *success. */
        return intsetUpgradeAndAdd(is, value);
    } else {
        // 4. Reaching this branch means the current encoding can hold value

        /* Abort if the value is already present in the set.
         * This call will populate "pos" with the right position to insert
         * the value when it cannot be found. */
        // 4.1 Check whether value is already in the set
        // If it is, set *success to 0 and return the set unchanged.
        // If it is not, pos holds the index where value belongs; note that this index keeps the array sorted
        if (intsetSearch(is, value, &pos)) {
            if (success) *success = 0;
            return is;
        }

        // 4.2 Allocate room in the set for value
        is = intsetResize(is, intrev32ifbe(is->length) + 1);

        // 4.3 If the new element does not go at the end of the array, shift the existing tail to free the slot at pos
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is, pos, pos + 1);
    }

    // 4.4 Store the new value at position pos in the array
    _intsetSet(is, pos, value);

    // 5. Increment the element count
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}
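The non-upgrade path is a binary search followed by a tail shift. The sketch below shows that pattern on a plain `int64_t` array instead of the encoded buffer; `sorted_search` and `sorted_add` are hypothetical names, the caller is assumed to supply enough capacity, and the real `intsetSearch`, `intsetResize`, and `intsetMoveTail` differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Binary search in a sorted array. Returns 1 and sets *pos to the index of
 * value if present; otherwise returns 0 and sets *pos to the index where
 * value must be inserted to keep the array sorted. */
static int sorted_search(const int64_t *a, uint32_t len, int64_t value,
                         uint32_t *pos) {
    assert(pos != NULL);
    uint32_t lo = 0, hi = len;              /* half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < len && a[lo] == value;
}

/* Insert value into a sorted array without duplicates. Returns the new
 * length; *added (if non-NULL) reports whether the array changed. */
static uint32_t sorted_add(int64_t *a, uint32_t len, uint32_t cap,
                           int64_t value, int *added) {
    uint32_t pos;
    if (sorted_search(a, len, value, &pos)) {
        if (added) *added = 0;              /* already present: no change */
        return len;
    }
    assert(len < cap);                      /* caller must provide room */
    /* Shift the tail [pos, len) one slot to the right to free slot pos. */
    memmove(a + pos + 1, a + pos, (size_t)(len - pos) * sizeof a[0]);
    a[pos] = value;
    if (added) *added = 1;
    return len + 1;
}
```

Keeping the array sorted is what makes the duplicate check a logarithmic lookup; the cost is the linear tail shift on insert.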


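static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    // 1. Get the set's current encoding
    uint8_t curenc = intrev32ifbe(is->encoding);

    // 2. Get the encoding required by the new value
    uint8_t newenc = _intsetValueEncoding(value);

    // 3. Get the number of elements in the set
    int length = intrev32ifbe(is->length);

    // 4. Based on the sign of value, decide whether it goes at the very front or the very end of the array
    int prepend = value < 0 ? 1 : 0;

    /* First set new encoding and resize */
    // 5. Update the set's encoding
    is->encoding = intrev32ifbe(newenc);

    // 6. Resize the set for the new encoding (a wider element type means the memory footprint changes too)
    is = intsetResize(is, intrev32ifbe(is->length) + 1);

    /* Upgrade back-to-front so we don't overwrite values.
     * Note that the "prepend" variable is used to make sure we have an empty
     * space at either the beginning or the end of the intset. */
    /**
     * 7. Re-place the elements of the old-encoding array into the new-encoding array
     *    (the element type changed, so each value has to be converted to the new type)
     * 1) Read the elements from back to front using the set's old encoding.
     * 2) Write each element to its new position using the new encoding.
     * Notes:
     * 1) Once this loop finishes, every existing element has been converted from the old encoding to the new one.
     * 2) The loop runs back to front because widened elements land at higher offsets; the newly allocated
     *    space is at the end of the array, so starting there never overwrites an element that has not been read yet.
     */
    while (length--)
        _intsetSet(is, length + prepend, _intsetGetEncoded(is, length, curenc));

    /* Set the value at the beginning or the end. */
    // 8. Depending on prepend, store value at the head or the tail of the array
    if (prepend)
        _intsetSet(is, 0, value);
    else
        _intsetSet(is, intrev32ifbe(is->length), value);

    // 9. Update the set's element count
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}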
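The back-to-front conversion can be tried in isolation. The sketch below widens int16_t values to int32_t inside a single buffer and then stores the new value at the front or the back depending on its sign. It is an illustration only: `widen16to32_and_add` is a hypothetical name, the buffer is assumed to be already large enough (the real code reallocates), and values are stored in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen `len` int16_t values stored at the start of `buf` into int32_t
 * slots in the same buffer, then store `value` in the free slot.
 * `buf` must hold at least (len + 1) * sizeof(int32_t) bytes.
 * Returns the new element count. */
static uint32_t widen16to32_and_add(uint8_t *buf, uint32_t len, int32_t value) {
    /* An upgrade only happens for values outside the int16_t range, so the
     * new value is either below every element or above every element. */
    assert(value < INT16_MIN || value > INT16_MAX);
    int prepend = value < 0;   /* negative: goes first; positive: goes last */

    /* Walk from the last element to the first. The write offset
     * (i + prepend) * 4 is never below the read offset i * 2, so no
     * unread 2-byte element is overwritten. */
    uint32_t i = len;
    while (i--) {
        int16_t old;
        int32_t wide;
        memcpy(&old, buf + (size_t)i * sizeof old, sizeof old);
        wide = old;            /* sign-extending conversion */
        memcpy(buf + ((size_t)i + prepend) * sizeof wide, &wide, sizeof wide);
    }

    /* Store the new value in the slot left free at the front or the back. */
    size_t slot = prepend ? 0 : (size_t)len;
    memcpy(buf + slot * sizeof value, &value, sizeof value);
    return len + 1;
}
```

Running the loop front to back instead would let the first widened element spill over the second 2-byte element before it is read, which is the overwrite the reverse order avoids.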