Redis source code study -- data structures: ziplist implementation


Carson_zhong


License: CC 4.0 BY-SA


Article link: https://blog.youkuaiyun.com/dmgy614262711/article/details/105883313


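This article analyzes the design and implementation of Redis's ziplist data structure. It focuses on the entry layout, including what the prevlen, encoding and entry-data fields mean and how they are computed, and walks through ziplist creation and insertion by following the test code.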


This article continues the previous post "redis源码学习–数据结构：ziplist设计" (Redis source code study -- data structures: ziplist design):
https://blog.youkuaiyun.com/dmgy614262711/article/details/105879969

The data structure that describes an entry is defined as follows:

/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len */
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* Header size: prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;
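To make the relationship between these fields concrete, here is a minimal sketch of a decoder that fills a zlentry-like struct from raw bytes, in the spirit of Redis's zipEntry. The struct demo_zlentry and the function demo_zip_entry are hypothetical names. The sketch handles only the string encodings (prefixes 00, 01 and 10) and reads the 5-byte prevlen as 0xFE followed by a little-endian uint32, which is how ziplist stores it.

```c
#include <stdint.h>

/* Hypothetical mirror of zlentry, for illustration only. */
typedef struct {
    unsigned int prevrawlensize; /* 1 or 5 */
    unsigned int prevrawlen;     /* length of the previous entry */
    unsigned int lensize;        /* bytes of the encoding field */
    unsigned int len;            /* bytes of entry-data */
    unsigned int headersize;     /* prevrawlensize + lensize */
    unsigned char encoding;      /* 0x00, 0x40 or 0x80 for strings */
    const unsigned char *p;      /* start of the entry (its prevlen field) */
} demo_zlentry;

/* Decode one entry starting at p. Returns 1 on success, 0 for integer
 * encodings (11xxxxxx), which this sketch does not handle. */
static int demo_zip_entry(const unsigned char *p, demo_zlentry *e) {
    const unsigned char *q = p;

    /* prevlen: 1 byte if < 254, else 0xFE followed by a little-endian uint32 */
    if (q[0] < 0xFE) {
        e->prevrawlensize = 1;
        e->prevrawlen = q[0];
    } else {
        e->prevrawlensize = 5;
        e->prevrawlen = (uint32_t)q[1] | (uint32_t)q[2] << 8 |
                        (uint32_t)q[3] << 16 | (uint32_t)q[4] << 24;
    }
    q += e->prevrawlensize;

    /* encoding: the top two bits select the string length format */
    switch (q[0] & 0xC0) {
    case 0x00: /* 00pppppp: 6-bit length */
        e->encoding = 0x00; e->lensize = 1; e->len = q[0] & 0x3F;
        break;
    case 0x40: /* 01pppppp qqqqqqqq: 14-bit big-endian length */
        e->encoding = 0x40; e->lensize = 2;
        e->len = (uint32_t)(q[0] & 0x3F) << 8 | q[1];
        break;
    case 0x80: /* 10000000 followed by a 4-byte big-endian length */
        e->encoding = 0x80; e->lensize = 5;
        e->len = (uint32_t)q[1] << 24 | (uint32_t)q[2] << 16 |
                 (uint32_t)q[3] << 8 | q[4];
        break;
    default:   /* 11xxxxxx: integer encodings, not covered here */
        return 0;
    }
    e->headersize = e->prevrawlensize + e->lensize;
    e->p = p;
    return 1;
}
```

The total size of an entry is then e.headersize + e.len. Redis's zipRawEntryLength computes the same sum (prevlensize + lensize + len).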

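Having gone through the struct definition, we can turn straight to the test code and follow its flow through the ziplist implementation. The test entry point is ziplistTest.
The first source function to look at is ziplistNew, which creates an empty ziplist. It does not touch zlentry at all. It allocates the header and the end marker and initializes the zlbytes, zltail, zllen and zlend fields.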

/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    // Size of the fixed ziplist fields, excluding entries: zlbytes + zltail + zllen + zlend
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes); // store bytes in zlbytes, in little-endian order
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE); // zltail: no entries yet, so it points at zlend
    ZIPLIST_LENGTH(zl) = 0; // zllen
    zl[bytes-1] = ZIP_END; // zlend (0xFF)
    return zl;
}
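The layout that ziplistNew produces can be reproduced with plain byte writes. This is a minimal sketch that uses a hypothetical demo_ziplist_new. It writes the 32-bit and 16-bit fields byte by byte in little-endian order, which is the order intrev32ifbe guarantees in Redis.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_HEADER_SIZE 10   /* zlbytes(4) + zltail(4) + zllen(2) */
#define DEMO_END_SIZE    1    /* zlend */
#define DEMO_ZIP_END     0xFF

/* Store v as little-endian regardless of host byte order. */
static void demo_wr_le32(unsigned char *p, uint32_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}

static void demo_wr_le16(unsigned char *p, uint16_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
}

/* Hypothetical equivalent of ziplistNew: header + zlend, no entries. */
static unsigned char *demo_ziplist_new(void) {
    uint32_t bytes = DEMO_HEADER_SIZE + DEMO_END_SIZE;   /* 11 */
    unsigned char *zl = malloc(bytes);
    if (zl == NULL) return NULL;
    demo_wr_le32(zl, bytes);                 /* zlbytes = 11 */
    demo_wr_le32(zl + 4, DEMO_HEADER_SIZE);  /* zltail = 10: points at zlend */
    demo_wr_le16(zl + 8, 0);                 /* zllen = 0 */
    zl[bytes - 1] = DEMO_ZIP_END;            /* zlend = 0xFF */
    return zl;
}
```

Because zltail equals the header size, the tail offset of an empty list points at zlend itself.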

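Next, look at the insertion function ziplistPush. It first determines the pointer p. For head insertion, p points just past zllen, at the first entry. For tail insertion, it points at the zlend field. zlend is always 255 (0xFF), so checking whether p[0] is 255 tells __ziplistInsert whether it is appending at the end. Otherwise p points at an existing entry. For an empty list, both cases point at zlend. (This is presumably also why the first byte of an entry, its prevlen, can be at most 254.)
Takeaway: p is computed through macros rather than raw pointer arithmetic at the call site. This is less error-prone and lets the same logic be reused elsewhere.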

/*
zl:    the ziplist
s:     data to insert
slen:  length of the data
where: insert at the head (ZIPLIST_HEAD) or the tail (ZIPLIST_TAIL)
 */
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) {
    unsigned char *p;
    p = (where == ZIPLIST_HEAD) ? ZIPLIST_ENTRY_HEAD(zl) : ZIPLIST_ENTRY_END(zl);
    return __ziplistInsert(zl,p,s,slen);
}

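ziplistPush delegates to the internal function __ziplistInsert(zl,p,s,slen), which is too long to paste here. For head insertion, p is the start of the list plus two uint32_t and one uint16_t (10 bytes). For tail insertion, p is zl + zlbytes - 1, which is the address of zlend. If the first byte at p is 255, the new entry is appended at the tail. Otherwise it is inserted before the entry at p, which for ziplistPush means head insertion.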
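The pointer arithmetic behind ZIPLIST_ENTRY_HEAD and ZIPLIST_ENTRY_END can be written as two small functions. This is a sketch with hypothetical names (demo_entry_head, demo_entry_end, demo_at_end). It reads zlbytes as a little-endian 32-bit value.

```c
#include <stdint.h>

#define DEMO_HEADER_SIZE 10   /* 2 * uint32_t + uint16_t */
#define DEMO_ZIP_END     0xFF

static uint32_t demo_rd_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* First entry: right after zlbytes, zltail and zllen. */
static unsigned char *demo_entry_head(unsigned char *zl) {
    return zl + DEMO_HEADER_SIZE;
}

/* zlend: the last byte of the list, i.e. zl + zlbytes - 1. */
static unsigned char *demo_entry_end(unsigned char *zl) {
    return zl + demo_rd_le32(zl) - 1;
}

/* The check __ziplistInsert makes: is p sitting on zlend? */
static int demo_at_end(const unsigned char *p) {
    return p[0] == DEMO_ZIP_END;
}
```

Both helpers derive their result from the header alone, so they stay correct after the list has been reallocated.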
__ziplistInsert can be summarized as follows:
1. Build the new entry structure.

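| Field | Meaning |
| --- | --- |
| prevlen | Number of bytes occupied by the previous entry. Takes 1 or 5 bytes; a first byte of 0xFE means the 5-byte form is used. |
| encoding | The encoding type plus the length of the encoded data. |
| entry-data | The encoded value. For small integers 0 to 12 the value is stored inside encoding itself and entry-data is absent. |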
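The prevlen rule in the table can be sketched as follows. demo_store_prevlen is a hypothetical stand-in for Redis's zipStorePrevEntryLength. As with that function, passing p == NULL only returns the number of bytes needed. The 4-byte length after the 0xFE marker is written little-endian, matching the ziplist layout.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_BIG_PREVLEN 0xFE  /* 254: marker for the 5-byte form */

/* Write prevlen at p (if p != NULL) and return its size: 1 or 5 bytes. */
static unsigned int demo_store_prevlen(unsigned char *p, uint32_t len) {
    if (len < DEMO_BIG_PREVLEN) {
        if (p != NULL) p[0] = (unsigned char)len;
        return 1;
    }
    if (p != NULL) {
        p[0] = DEMO_BIG_PREVLEN;
        p[1] = len & 0xFF;
        p[2] = (len >> 8) & 0xFF;
        p[3] = (len >> 16) & 0xFF;
        p[4] = (len >> 24) & 0xFF;
    }
    return 5;
}
```

A length of exactly 254 already needs the 5-byte form. A first byte of 254 is reserved as the marker, and 255 is reserved for zlend.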

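For head insertion, prevlen is 0 because there is no previous entry, and it occupies 1 byte.
The code that computes the size of the new entry first tries to encode the value as an integer (zipTryEncoding). If that succeeds, the data length is the size of that integer encoding; otherwise it is the string length slen. It then adds the size of the prevlen field, and finally the size of the encoding field.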
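Here is a minimal sketch of that length computation, using hypothetical helper names. The integer test is simplified. Redis's zipTryEncoding relies on its own string2ll parser, while this sketch approximates it with strtoll plus a round-trip check that rejects non-canonical forms such as "012" or "+1".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for zipTryEncoding: accept only strings that are the
 * canonical decimal form of a long long ("12", "-7", but not "012" or "+1"). */
static int demo_try_int(const char *s, size_t slen, long long *out) {
    char buf[32], back[32];
    char *end;
    if (slen == 0 || slen >= sizeof(buf)) return 0;
    memcpy(buf, s, slen);
    buf[slen] = '\0';
    if (strlen(buf) != slen) return 0;            /* embedded NUL byte */
    errno = 0;
    long long v = strtoll(buf, &end, 10);
    if (errno != 0 || *end != '\0') return 0;     /* out of range or trailing junk */
    snprintf(back, sizeof(back), "%lld", v);
    if (strcmp(back, buf) != 0) return 0;         /* not canonical */
    *out = v;
    return 1;
}

/* Bytes of entry-data for an integer, following the ziplist integer ranges. */
static unsigned int demo_int_size(long long v) {
    if (v >= 0 && v <= 12) return 0;              /* stored inside the encoding byte */
    if (v >= INT8_MIN && v <= INT8_MAX) return 1;
    if (v >= INT16_MIN && v <= INT16_MAX) return 2;
    if (v >= -(1LL << 23) && v < (1LL << 23)) return 3;  /* 24-bit */
    if (v >= INT32_MIN && v <= INT32_MAX) return 4;
    return 8;
}

/* Size of the encoding field for a string of slen bytes. */
static unsigned int demo_str_enc_size(size_t slen) {
    if (slen <= 0x3F) return 1;
    if (slen <= 0x3FFF) return 2;
    return 5;
}

/* Same order as described above: data, then prevlen, then encoding. */
static size_t demo_required_len(const char *s, size_t slen, uint32_t prevlen) {
    long long v;
    size_t reqlen, enclen;
    if (demo_try_int(s, slen, &v)) {
        reqlen = demo_int_size(v);
        enclen = 1;                               /* integer encodings take 1 byte */
    } else {
        reqlen = slen;
        enclen = demo_str_enc_size(slen);
    }
    reqlen += (prevlen < 254) ? 1 : 5;            /* prevlen field */
    reqlen += enclen;                             /* encoding field */
    return reqlen;
}
```

For example, "1024" is stored as a 16-bit integer and needs 4 bytes in total when it is the first entry. "01" is not canonical, so it stays a 2-byte string.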
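For head insertion (and in general whenever p does not point at zlend), the prevlen field of the entry that follows the new one must be updated to hold the new entry's length. If that field is too small (1 byte when 5 are now needed), it must grow to 5 bytes. Because a ziplist is a single contiguous block of memory, everything after it then has to move. __ziplistInsert checks whether the next entry's prevlen field is large enough, and if it is not, stores the number of extra bytes needed in nextdiff. At this point only the immediately following entry is checked. If that entry grows, the prevlen of the entry after it may also have to grow, possibly all the way to the last entry. The source handles this separately later on.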
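Whether the next entry's prevlen field has to change size is recorded in nextdiff, which can be 0, 4 or -4. A negative value means the field would shrink from 5 bytes to 1 byte. In the special case where nextdiff == -4 and reqlen < 4, the shrink is skipped: nextdiff is reset to 0 and forcelarge is set to 1, so the next entry keeps its 5-byte prevlen field.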
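The nextdiff/forcelarge decision reduces to a few lines. This sketch uses hypothetical names and mirrors the logic described above. In Redis, the first part is computed by zipPrevLenByteDiff.

```c
/* Hypothetical result type for the nextdiff step. */
typedef struct {
    int nextdiff;    /* 0, 4 or -4: change in the next entry's prevlen field size */
    int forcelarge;  /* 1: keep a 5-byte prevlen field that would otherwise shrink */
} demo_nextdiff;

/* next_prevlensize: current size (1 or 5) of the next entry's prevlen field.
 * reqlen: total size of the entry being inserted.
 * at_end: nonzero when inserting at zlend (there is no next entry). */
static demo_nextdiff demo_compute_nextdiff(unsigned int next_prevlensize,
                                           unsigned int reqlen, int at_end) {
    demo_nextdiff r = {0, 0};
    if (at_end) return r;
    unsigned int needed = (reqlen < 254) ? 1 : 5;
    r.nextdiff = (int)needed - (int)next_prevlensize;
    if (r.nextdiff == -4 && reqlen < 4) {
        r.nextdiff = 0;      /* do not shrink ... */
        r.forcelarge = 1;    /* ... write the small length in 5-byte form */
    }
    return r;
}
```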
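Next, the memory is reallocated with ziplistResize, growing the block to curlen+reqlen+nextdiff bytes. The reallocated block may move to a different address, so the offset of p from zl is computed first and p is restored from it afterwards.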
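Here is a minimal sketch of the resize-and-restore pattern. demo_resize plays the role of ziplistResize: it reallocates, then rewrites zlbytes and zlend. demo_grow_keep_p shows why p must be saved as an offset. Both names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ZIP_END 0xFF

static void demo_wr_le32(unsigned char *p, uint32_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}

/* Sketch of ziplistResize: realloc, then fix zlbytes and zlend. */
static unsigned char *demo_resize(unsigned char *zl, uint32_t len) {
    unsigned char *nzl = realloc(zl, len);
    if (nzl == NULL) abort();          /* Redis's zrealloc aborts on OOM by default */
    demo_wr_le32(nzl, len);            /* zlbytes */
    nzl[len - 1] = DEMO_ZIP_END;       /* zlend */
    return nzl;
}

/* Grow the list while keeping the insertion point p valid. */
static unsigned char *demo_grow_keep_p(unsigned char **zl, unsigned char *p,
                                       uint32_t newlen) {
    size_t offset = (size_t)(p - *zl); /* save p as an offset ... */
    *zl = demo_resize(*zl, newlen);    /* ... because the block may move ... */
    return *zl + offset;               /* ... and rebuild p afterwards */
}
```

Using p directly after the realloc would be undefined behaviour whenever the block moved. The offset is the only value that survives the move.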
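For tail insertion, zltail can be updated directly at this point. The new entry becomes the last one, so zltail is set to its offset, p - zl.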

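Head insertion is more involved and takes four steps:
1) Move memory (memmove) to make room for the new entry.
2) Using the earlier result, write the new entry's length into the next entry's prevlen field, in 1 or 5 bytes (always 5 when forcelarge is set).
3) Update zltail (the offset from the start of the ziplist to the last entry) by adding reqlen, without nextdiff for now. If the next entry is itself the last entry, a change in its own size does not affect zltail. Otherwise zltail must also include nextdiff, which the next step handles.
4) Decode the next entry and check whether it is the last one. If it is not, add nextdiff to zltail.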
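The following deliberately small sketch shows the steps above operating on real bytes during a head push. It is not Redis's code. All names are hypothetical, it accepts only strings of up to 63 bytes, and it assumes every entry stays below 254 bytes. Under those assumptions prevlen is always 1 byte and nextdiff is always 0, so step 4 never adds anything. The else branch is the tail case, where zltail is simply set to p - zl.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEADER_SIZE 10
#define DEMO_ZIP_END     0xFF

static uint32_t demo_rd32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void demo_wr32(unsigned char *p, uint32_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF; p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}
static uint16_t demo_rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
static void demo_wr16(unsigned char *p, uint16_t v) { p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF; }

/* Empty list: zlbytes=11, zltail=10, zllen=0, zlend. */
static unsigned char *demo_new(void) {
    unsigned char *zl = malloc(DEMO_HEADER_SIZE + 1);
    if (zl == NULL) abort();
    demo_wr32(zl, DEMO_HEADER_SIZE + 1);
    demo_wr32(zl + 4, DEMO_HEADER_SIZE);
    demo_wr16(zl + 8, 0);
    zl[DEMO_HEADER_SIZE] = DEMO_ZIP_END;
    return zl;
}

/* Head push of a short string (assumption: slen <= 63, all entries < 254 bytes). */
static unsigned char *demo_push_head(unsigned char *zl, const char *s, uint8_t slen) {
    if (slen > 0x3F) abort();                  /* outside this sketch's assumptions */
    uint32_t curlen = demo_rd32(zl);
    uint32_t reqlen = 1 + 1 + slen;            /* prevlen(0) + encoding + data */
    size_t offset = DEMO_HEADER_SIZE;          /* p = head of the list */
    int has_next = zl[offset] != DEMO_ZIP_END;

    unsigned char *nzl = realloc(zl, curlen + reqlen);
    if (nzl == NULL) abort();
    zl = nzl;
    demo_wr32(zl, curlen + reqlen);            /* zlbytes */
    zl[curlen + reqlen - 1] = DEMO_ZIP_END;    /* zlend */
    unsigned char *p = zl + offset;

    if (has_next) {
        memmove(p + reqlen, p, curlen - offset - 1);   /* 1) shift old entries */
        p[reqlen] = (unsigned char)reqlen;             /* 2) next prevlen = our size */
        demo_wr32(zl + 4, demo_rd32(zl + 4) + reqlen); /* 3) tail moves by reqlen */
        /* 4) nextdiff is always 0 under our assumptions: nothing to add */
    } else {
        demo_wr32(zl + 4, (uint32_t)offset);   /* tail case: new entry is the tail */
    }
    p[0] = 0;                                  /* prevlen: no previous entry */
    p[1] = slen;                               /* 00pppppp string encoding */
    memcpy(p + 2, s, slen);
    demo_wr16(zl + 8, demo_rd16(zl + 8) + 1);  /* zllen++ */
    return zl;
}
```

After pushing "b" and then "a" at the head, the entry bytes read 00 01 'a' 03 01 'b' FF. The old first entry's prevlen is now 3, the size of "a", and zltail moves from 10 to 13.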
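If nextdiff is non-zero, the next entry's length has changed. Each following entry must then be checked in turn to see whether its prevlen value, and the number of bytes that field occupies, need updating. This is done by __ziplistCascadeUpdate. Because the list may be reallocated again, p is once more saved as an offset around that call.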
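The cascade can be modelled without touching raw memory by tracking only sizes. This sketch uses hypothetical names and is a model, not Redis's code. Each entry is reduced to three values: its prevlen, the size of its prevlen field, and the size of everything else. The loop roughly follows the stopping rules of __ziplistCascadeUpdate as described above. It stops when there is no next entry, when the stored prevlen is already correct, or when the field is already large enough, and a 5-byte field is never shrunk.

```c
/* Hypothetical model of one entry: only the sizes matter here. */
typedef struct {
    unsigned int prevlen;      /* value stored in the prevlen field */
    unsigned int prevlensize;  /* 1 or 5 */
    unsigned int rest;         /* encoding + entry-data bytes */
} demo_entry;

static unsigned int demo_size(const demo_entry *e) {
    return e->prevlensize + e->rest;
}

/* e[start] has just changed size. Walk forward, enlarging prevlen fields that
 * are too small and stopping at the first one that fits.
 * Returns how many prevlen fields were enlarged from 1 to 5 bytes. */
static int demo_cascade(demo_entry *e, int n, int start) {
    int grown = 0;
    for (int i = start; i + 1 < n; i++) {
        unsigned int rawlen = demo_size(&e[i]);
        unsigned int need = (rawlen < 254) ? 1 : 5;
        demo_entry *next = &e[i + 1];
        if (next->prevlen == rawlen) break;    /* nothing changed */
        if (next->prevlensize < need) {
            next->prevlensize = need;          /* next grows by 4 bytes ... */
            next->prevlen = rawlen;
            grown++;                           /* ... so check the one after it */
        } else {
            next->prevlen = rawlen;            /* fits; a 5-byte field is never shrunk */
            break;
        }
    }
    return grown;
}
```

A run of entries that are each exactly 253 bytes is the worst case. Once the first entry grows past 253 bytes, every later entry grows by 4 bytes in turn.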

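With all existing entries adjusted, the new entry itself is written in order: its prevlen, its encoding, then the data (memcpy for a string, zipSaveInteger for an integer). Finally zllen is incremented by 1.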
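For a string value, writing the new entry comes down to three stores: prevlen, the encoding header, and the raw bytes. This minimal sketch covers the last two. It uses hypothetical names modelled on zipStoreEntryEncoding. Integers, which go through zipSaveInteger instead, are not covered.

```c
#include <stdint.h>
#include <string.h>

/* Write a string encoding header at p and return its size (1, 2 or 5).
 * String lengths in the encoding field are big-endian. */
static unsigned int demo_store_str_encoding(unsigned char *p, uint32_t len) {
    if (len <= 0x3F) {                /* 00pppppp */
        p[0] = (unsigned char)len;
        return 1;
    }
    if (len <= 0x3FFF) {              /* 01pppppp qqqqqqqq */
        p[0] = 0x40 | ((len >> 8) & 0x3F);
        p[1] = len & 0xFF;
        return 2;
    }
    p[0] = 0x80;                      /* 10000000 followed by 4 length bytes */
    p[1] = (len >> 24) & 0xFF;
    p[2] = (len >> 16) & 0xFF;
    p[3] = (len >> 8) & 0xFF;
    p[4] = len & 0xFF;
    return 5;
}

/* Encoding, then data, as written right after prevlen; returns bytes written. */
static size_t demo_store_str_body(unsigned char *p, const char *s, uint32_t slen) {
    unsigned int n = demo_store_str_encoding(p, slen);
    memcpy(p + n, s, slen);
    return n + slen;
}
```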

This completes the insertion of one entry.

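Summary:
1. Reads and writes of structure fields are wrapped in macros, such as ZIP_DECODE_PREVLEN and ZIPLIST_ENTRY_TAIL. This hides the details and keeps the code compact.
2. The code is well commented; every functional block has a comment.
3. The main function __ziplistInsert is cleanly layered. It works at the level of ziplist elements, and lower-level details are wrapped in helpers that it simply calls.
4. Head insertion is less efficient than tail insertion. It always has to memmove all existing entries, may have to update the prevlen field of every following entry, and may have to enlarge those entries.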

Next, we plan to study the function __ziplistCascadeUpdate.