Redis basic data structures, the list family (adlist, ziplist, quicklist): part 1

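Preface

The repost I promised last Sunday fell through yet again, so this Saturday's original post continues the Redis series.

The three topics in today's post all come from the same source: the linear list. It is one of the most basic data structures; whatever the language or university, it is usually the first thing taught after hello world. Redis has put a lot of work into optimizing linear lists, and some of the ideas are worth understanding.

A quick review of linear lists

Linear lists come in two main forms: sequential lists and linked lists. A sequential list, such as an array, is one large contiguous block of storage. Because elements sit at consecutive positions, addressing by index is O(1). Insertion and deletion, however, may require moving every element after the affected position, which is O(n) in the worst case.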
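To make the O(n) cost of insertion concrete, here is a minimal sketch in C. The function name `array_insert` is made up for this illustration and is not part of Redis; it assumes the caller has already reserved room for one more element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert `value` at index `pos` of an int array that currently holds `len`
 * elements. The caller must guarantee capacity for len + 1 elements.
 * Returns the new length. */
size_t array_insert(int *arr, size_t len, size_t pos, int value)
{
    assert(arr != NULL && pos <= len);
    /* Shift arr[pos..len-1] one slot to the right. In the worst case
     * (pos == 0) every element moves, hence O(n). */
    memmove(arr + pos + 1, arr + pos, (len - pos) * sizeof(*arr));
    arr[pos] = value;
    return len + 1;
}
```

Reading `arr[i]` stays O(1) no matter how long the array is; only the shift depends on the length.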

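In a linked list, every node records its successor. Variants record more as needed; a doubly linked list, for example, also records the predecessor. The cost is that every node carries a good deal of extra storage.

Here insertion and deletion are O(1) once you already hold the node, but locating a single node takes O(n). Range lookups can use a skip list, which a later post on data structures will cover.

Whatever the data structure, as long as the storage is linear it is an extension of these two cases. Different scenarios simply call for a targeted choice and targeted optimization.

adlist: the doubly linked list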

https://github.com/redis/redis/blob/6.0/src/adlist.h

typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct listIter {
    listNode *next;
    int direction;
} listIter;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

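The header defines three structures: the node, the iterator, and the list.

The list structure holds the head and tail nodes, the dup (copy), free, and match (compare) callbacks, and the length.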

https://github.com/redis/redis/blob/6.0/src/adlist.c

Head insertion:

/* Add a new node to the list, to head, containing the specified 'value'
 * pointer as value.
 *
 * On error, NULL is returned and no operation is performed (i.e. the
 * list remains unaltered).
 * On success the 'list' pointer you pass to the function is returned. */
list *listAddNodeHead(list *list, void *value)
{
    listNode *node;

    if ((node = zmalloc(sizeof(*node))) == NULL)
        return NULL;
    node->value = value;
    if (list->len == 0) {
        list->head = list->tail = node;
        node->prev = node->next = NULL;
    } else {
        node->prev = NULL;
        node->next = list->head;
        list->head->prev = node;
        list->head = node;
    }
    list->len++;
    return list;
}

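The logic is simple, ordinary linked-list work: allocate a new node and assign its value.

If it is the first node, its predecessor and successor are both NULL, and the list's head and tail both point to it.

Otherwise its predecessor is NULL, its successor is the old head, the old head's predecessor becomes the new node, and the new node becomes the list head.

Search: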

/* Search the list for a node matching a given key.
 * The match is performed using the 'match' method
 * set with listSetMatchMethod(). If no 'match' method
 * is set, the 'value' pointer of every node is directly
 * compared with the 'key' pointer.
 *
 * On success the first matching node pointer is returned
 * (search starts from head). If no matching node exists
 * NULL is returned. */
listNode *listSearchKey(list *list, void *key)
{
    listIter iter;
    listNode *node;

    listRewind(list, &iter);
    while((node = listNext(&iter)) != NULL) {
        if (list->match) {
            if (list->match(node->value, key)) {
                return node;
            }
        } else {
            if (key == node->value) {
                return node;
            }
        }
    }
    return NULL;
}

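This is just a traversal from the head that calls the list's match method to compare each value with the key. If no match method was set, the key pointer is compared directly with each node's value pointer.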
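The difference between the two branches is easy to miss: without a match method, two equal strings stored at different addresses do not match. The sketch below shows this with a stripped-down stand-in for the list; `node`, `match_cstr`, and `find_first` are hypothetical names for this example, not adlist API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for listNode: only the fields the search needs. */
typedef struct node {
    struct node *next;
    void *value;
} node;

/* Hypothetical match callback: treat both value and key as C strings. */
int match_cstr(void *value, void *key)
{
    return strcmp((const char *)value, (const char *)key) == 0;
}

/* Same decision as listSearchKey: use `match` when it is set,
 * otherwise fall back to comparing the raw pointers. */
node *find_first(node *head, int (*match)(void *, void *), void *key)
{
    for (node *n = head; n != NULL; n = n->next) {
        if (match ? match(n->value, key) : (n->value == key))
            return n;
    }
    return NULL;
}
```

With `match_cstr` a lookup for "bar" finds a node whose value is a different buffer containing "bar"; with `match` set to NULL the same lookup returns NULL unless the key is the very pointer stored in the node.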

Rewinding the iterator (and traversing in reverse):

/* Create an iterator in the list private iterator structure */
void listRewind(list *list, listIter *li) {
    li->next = list->head;
    li->direction = AL_START_HEAD;
}

void listRewindTail(list *list, listIter *li) {
    li->next = list->tail;
    li->direction = AL_START_TAIL;
}

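These two functions do not reverse the list itself. They point an iterator at the head or the tail and record the direction. To walk a doubly linked list backwards, flipping the iterator's direction is enough.

Getting the next node: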

listNode *listNext(listIter *iter)
{
    listNode *current = iter->next;

    if (current != NULL) {
        if (iter->direction == AL_START_HEAD)
            iter->next = current->next;
        else
            iter->next = current->prev;
    }
    return current;
}

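The direction stored in the iterator decides which node comes next: the successor when starting from the head, the predecessor when starting from the tail.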
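As a rough illustration of how one `next` function serves both directions, here is a self-contained sketch. The types `dnode` and `diter` and the helper `collect` are invented for this example; only the stepping logic mirrors listNext.

```c
#include <assert.h>
#include <stddef.h>

enum { FROM_HEAD = 0, FROM_TAIL = 1 };

typedef struct dnode {
    struct dnode *prev, *next;
    int value;
} dnode;

typedef struct diter {
    dnode *next;     /* node that the next call will return */
    int direction;   /* FROM_HEAD or FROM_TAIL */
} diter;

/* Return the current node and advance according to the direction. */
dnode *diter_next(diter *it)
{
    assert(it != NULL);
    dnode *cur = it->next;
    if (cur != NULL)
        it->next = (it->direction == FROM_HEAD) ? cur->next : cur->prev;
    return cur;
}

/* Copy up to `cap` values into `out`, starting at `start` and walking
 * in `direction`. Returns how many values were written. */
size_t collect(dnode *start, int direction, int *out, size_t cap)
{
    diter it = { start, direction };
    size_t n = 0;
    dnode *cur;
    while (n < cap && (cur = diter_next(&it)) != NULL)
        out[n++] = cur->value;
    return n;
}
```

Calling `collect` with the head and `FROM_HEAD` yields the values in order; calling it with the tail and `FROM_TAIL` yields them reversed, without touching any node.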

Duplicating:

/* Duplicate the whole list. On out of memory NULL is returned.
 * On success a copy of the original list is returned.
 *
 * The 'Dup' method set with listSetDupMethod() function is used
 * to copy the node value. Otherwise the same pointer value of
 * the original node is used as value of the copied node.
 *
 * The original list both on success or error is never modified. */
list *listDup(list *orig)
{
    list *copy;
    listIter iter;
    listNode *node;

    if ((copy = listCreate()) == NULL)
        return NULL;
    copy->dup = orig->dup;
    copy->free = orig->free;
    copy->match = orig->match;
    listRewind(orig, &iter);
    while((node = listNext(&iter)) != NULL) {
        void *value;

        if (copy->dup) {
            value = copy->dup(node->value);
            if (value == NULL) {
                listRelease(copy);
                return NULL;
            }
        } else
            value = node->value;
        if (listAddNodeTail(copy, value) == NULL) {
            listRelease(copy);
            return NULL;
        }
    }
    return copy;
}

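It traverses the original list and appends a node holding the current value to the tail of the copy.

If a dup method is set, the value is produced by that method; otherwise the copy shares the original node's value pointer.

Releasing: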

/* Free the whole list.
 *
 * This function can't fail. */
void listRelease(list *list)
{
    listEmpty(list);
    zfree(list);
}

/* Remove all the elements from the list without destroying the list itself. */
void listEmpty(list *list)
{
    unsigned long len;
    listNode *current, *next;

    current = list->head;
    len = list->len;
    while(len--) {
        next = current->next;
        if (list->free) list->free(current->value);
        zfree(current);
        current = next;
    }
    list->head = list->tail = NULL;
    list->len = 0;
}

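Releasing is simple: free every node first (calling the free method on each value if one is set), then free the list structure itself.

ziplist

Overview

adlist is fairly simple; everything is basic linked-list work. The drawbacks of a linked list are just as visible: the many small allocations fragment memory, and every node stores a lot of extra information. So for small lists Redis uses another structure, the ziplist (compressed list).

Strictly speaking it is not a linked list at all. It allocates one large block of memory and stores all of its contents next to each other inside that block, with no predecessor or successor pointers. Unlike an array, though, each element may occupy a different amount of memory. It also supports push and pop at either end, which the source comment describes as O(1), with the caveat that every operation reallocates the block, so the real cost depends on how much memory the ziplist uses.

Converting between ziplist and other data structures

A newly created list uses a ziplist by default; once a threshold is reached it is automatically converted to the corresponding adlist. This describes Redis before 3.2. Since 3.2, lists are stored as a quicklist (covered in part 2), and the two list-max-ziplist-* options below were replaced by list-max-ziplist-size. The values below are the defaults and can be changed in redis.conf.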

list-max-ziplist-entries 512
list-max-ziplist-value 64

Similar options exist for hashes and sorted sets:

hash-max-ziplist-entries 512
hash-max-ziplist-value 64
zset-max-ziplist-entries 128
zset-max-ziplist-value 64

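These settings mean that as soon as either of the following conditions holds, the ziplist is automatically converted to an adlist, dict, and so on (taking the 512 / 64 defaults as the example):

1. The number of entries exceeds 512.

2. The length of a single value exceeds 64 bytes.

The reason is that once a ziplist grows, the realloc triggered by each operation is more likely to cause a memory copy, and each copy moves a larger block of data, so performance drops.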
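The two conditions can be written down as a single predicate. This is only a sketch of the rule as described above, not Redis's conversion code; `compact_limits` and `must_convert` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits mirroring a pair such as
 * hash-max-ziplist-entries / hash-max-ziplist-value. */
typedef struct compact_limits {
    size_t max_entries;     /* e.g. 512 */
    size_t max_value_len;   /* e.g. 64 bytes */
} compact_limits;

/* True when the compact encoding must be abandoned: either the entry
 * count or the length of the value being stored exceeds its limit. */
bool must_convert(const compact_limits *lim, size_t entries, size_t value_len)
{
    assert(lim != NULL);
    return entries > lim->max_entries || value_len > lim->max_value_len;
}
```

With limits of 512 and 64, a structure holding exactly 512 entries of 64 bytes stays compact; a 513th entry or a single 65-byte value triggers the conversion.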

Principles and key method implementations

https://github.com/redis/redis/blob/6.0/src/ziplist.h

https://github.com/redis/redis/blob/6.0/src/ziplist.c

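The comment at the top of ziplist.c explains the design very clearly; it runs to nearly 200 lines. The explanation here mostly follows that comment, so I have pasted it below.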

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data>
 *
 * Sometimes the encoding represents the entry itself, like for small integers
 * as we'll see later. In such a case the <entry-data> part is missing, and we
 * could have just:
 *
 * <prevlen> <encoding>
 *
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsigned 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <entry>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
 *
 * The encoding field of the entry depends on the content of the
 * entry. When the entry is a string, the first 2 bits of the encoding first
 * byte will hold the type of encoding used to store the length of the string,
 * followed by the actual length of the string. When the entry is an integer
 * the first 2 bits are both set to 1. The following 2 bits are used to specify
 * what kind of integer will be stored after this header. An overview of the
 * different types and encodings is as follows. The first byte is always enough
 * to determine the kind of entry.
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0001 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
 *
 * Like for the ziplist header, all the integers are represented in little
 * endian byte order, even when this code is compiled in big endian systems.
 *
 * EXAMPLES OF ACTUAL ZIPLISTS
 * ===========================
 *
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".
 *
 * ----------------------------------------------------------------------------
 *
 * Copyright (c) 2009-2012, Pieter Noordhuis
 * Copyright (c) 2009-2017, Salvatore Sanfilippo
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *   * Redistributions of source code must retain the above copyright notice,
 *     this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of Redis nor the names of its contributors may be used
 *     to endorse or promote products derived from this software without
 *     specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

Line 16 of the comment

 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

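This line gives the overall storage layout of a ziplist: one contiguous region of memory.

zlbytes: the total number of bytes the ziplist occupies, including the zlbytes field itself (uint32_t, 32 bits)

zltail: the offset in bytes from the start of the ziplist to its last entry, which allows popping from the tail without a full traversal (uint32_t, 32 bits)

zllen: the number of entries; when there are more than 2^16-2 entries it is set to 2^16-1 and the real count requires a traversal (uint16_t, 16 bits)

entry: each entry is one element

zlend: a fixed end marker whose value is 255 (uint8_t, 8 bits)

Unless stated otherwise, all fields are stored in little endian.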
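To check the layout against the 15-byte example from the source comment, the header can be decoded by hand. This is a sketch assuming the layout described above; `zl_header` and `zl_read_header` are names made up here (the Redis source uses macros for the same fields).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct zl_header {
    uint32_t zlbytes;  /* total bytes, including the header and zlend */
    uint32_t zltail;   /* offset of the last entry */
    uint16_t zllen;    /* number of entries (saturates at 2^16-1) */
} zl_header;

/* Read a 32-bit little-endian value independent of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 10-byte header at the start of a ziplist blob. */
zl_header zl_read_header(const unsigned char *zl, size_t size)
{
    assert(zl != NULL && size >= 11);  /* 10-byte header + 1-byte zlend */
    zl_header h;
    h.zlbytes = load_le32(zl);
    h.zltail = load_le32(zl + 4);
    h.zllen = (uint16_t)(zl[8] | (zl[9] << 8));
    return h;
}
```

Fed the bytes `0f 00 00 00 0c 00 00 00 02 00 00 f3 02 f6 ff`, this yields zlbytes = 15, zltail = 12 and zllen = 2, matching the walkthrough in the comment.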

Line 47 of the comment describes the structure of an entry

* <prevlen> <encoding> <entry-data>

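Each entry consists of the following three parts:

entry-data: the stored content itself. For small integers the value lives inside the encoding byte and this part is absent.

prevlen: the length of the previous entry. To reach the previous entry you simply step back prevlen bytes. This field is variable-length.

encoding: how this entry's data is encoded, that is, whether it is an integer or a string and, for strings, how long the payload is.

The variable-length rule for prevlen is as follows:

If the previous entry is shorter than 254 bytes, prevlen is a single byte holding an unsigned 8-bit integer.

If the length is greater than or equal to 254, prevlen takes 5 bytes. The first byte is set to 254 (0xFE) to signal that a larger value follows, and the remaining 4 bytes hold the length. The overall structure is then: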

0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
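The rule translates into a few lines of C. The sketch below follows the description above; `decode_prevlen` is a hypothetical helper name, and it assumes the pointer is at the start of a real entry rather than at the 0xFF end marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the prevlen field starting at `p`.
 * Stores the previous entry's length in *prevlen and returns the number
 * of bytes the field itself occupies (1 or 5). */
size_t decode_prevlen(const unsigned char *p, uint32_t *prevlen)
{
    assert(p != NULL && prevlen != NULL);
    assert(p[0] != 0xFF);  /* 0xFF is zlend, not an entry */
    if (p[0] < 0xFE) {
        /* Lengths 0..253 fit in the single byte. */
        *prevlen = p[0];
        return 1;
    }
    /* 0xFE marker followed by a 4-byte little-endian length. */
    *prevlen = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
               ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}
```

For example, the single byte `02` decodes to a length of 2, while `fe 00 01 00 00` decodes to 256 and occupies five bytes.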

The encoding field:

Depending on the entry's content there are quite a few cases (excerpted from lines 80 to 106 of the comment):

 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0001 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
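The table reads more easily once it is turned into a decoder. The following is a sketch written from the table above, not the code in ziplist.c; `enc_info` and `decode_encoding` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ENC_STRING, ENC_INT } enc_kind;

typedef struct enc_info {
    enc_kind kind;
    size_t header_len;   /* bytes used by the encoding field itself */
    size_t payload_len;  /* bytes of entry-data that follow it */
    int has_immediate;   /* 1 when the value lives in the encoding byte */
    int immediate;       /* 0..12, valid only when has_immediate is 1 */
} enc_info;

/* Classify the encoding field that starts at `p`. */
enc_info decode_encoding(const unsigned char *p)
{
    assert(p != NULL && p[0] != 0xFF);  /* 0xFF is zlend */
    enc_info e = { ENC_STRING, 1, 0, 0, 0 };
    unsigned char b = p[0];

    switch (b & 0xC0) {            /* top two bits select the family */
    case 0x00:                     /* |00pppppp| 6-bit length */
        e.payload_len = b & 0x3F;
        break;
    case 0x40:                     /* |01pppppp|q| 14-bit big-endian length */
        e.header_len = 2;
        e.payload_len = ((size_t)(b & 0x3F) << 8) | p[1];
        break;
    case 0x80:                     /* |10000000|qrst| 32-bit big-endian length */
        e.header_len = 5;
        e.payload_len = ((size_t)p[1] << 24) | ((size_t)p[2] << 16) |
                        ((size_t)p[3] << 8) | p[4];
        break;
    default:                       /* |11......| integers */
        e.kind = ENC_INT;
        if (b == 0xC0) e.payload_len = 2;         /* int16_t */
        else if (b == 0xD0) e.payload_len = 4;    /* int32_t */
        else if (b == 0xE0) e.payload_len = 8;    /* int64_t */
        else if (b == 0xF0) e.payload_len = 3;    /* 24-bit signed */
        else if (b == 0xFE) e.payload_len = 1;    /* 8-bit signed */
        else if (b >= 0xF1 && b <= 0xFD) {        /* immediate 0..12 */
            e.has_immediate = 1;
            e.immediate = (b & 0x0F) - 1;
        } else {
            assert(0 && "unknown encoding byte");
        }
        break;
    }
    return e;
}
```

The byte F3 from the example ziplist decodes as an immediate integer with value 2, and 0B decodes as a string header announcing an 11-byte payload ("Hello World").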

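Impressions

That is the complete definition of the data structure. Once you understand it, the insert, delete, update, and lookup operations built on it are very easy to follow.

Having ploughed through this whole comment, my main reaction is that the design is rather neat. A ziplist is essentially one contiguous block of memory, but because its elements are variable-length you cannot pull out a value by index the way you would with an array.

The Redis implementation instead uses prevlen and encoding to move freely in both directions through the block: prevlen gives the step back to the previous entry, and an entry's own header plus the length given by its encoding give the step forward to the next. This is the same idea as predecessor and successor pointers! The only difference is that adlist stores pointers while ziplist stores lengths; different roads, same destination.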
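The "lengths instead of pointers" idea can be sketched as two small functions that step forward and backward through a ziplist blob. This follows the layout described in the comment and is not the code from ziplist.c; `zl_next`, `zl_prev`, and the helpers are hypothetical names, and 0 is used as a "no such entry" result because no entry can live at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER_SIZE 10  /* zlbytes(4) + zltail(4) + zllen(2) */
#define ZL_END 0xFF

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bytes occupied by the prevlen field, and the value it holds. */
static size_t prevlen_size(const unsigned char *p) { return p[0] < 0xFE ? 1 : 5; }
static uint32_t prevlen_value(const unsigned char *p) { return p[0] < 0xFE ? p[0] : le32(p + 1); }

/* Bytes occupied by encoding + entry-data, given the encoding byte. */
static size_t body_size(const unsigned char *e)
{
    switch (e[0] & 0xC0) {
    case 0x00: return 1 + (size_t)(e[0] & 0x3F);
    case 0x40: return 2 + ((((size_t)e[0] & 0x3F) << 8) | e[1]);
    case 0x80: return 5 + (((size_t)e[1] << 24) | ((size_t)e[2] << 16) |
                           ((size_t)e[3] << 8) | e[4]);
    default:
        switch (e[0]) {
        case 0xC0: return 3;   /* int16_t */
        case 0xD0: return 5;   /* int32_t */
        case 0xE0: return 9;   /* int64_t */
        case 0xF0: return 4;   /* 24-bit signed */
        case 0xFE: return 2;   /* 8-bit signed */
        default:   return 1;   /* immediate 4-bit integer */
        }
    }
}

/* "Successor": skip this entry's own bytes. Returns 0 at the end. */
size_t zl_next(const unsigned char *zl, size_t off)
{
    assert(zl != NULL && off >= ZL_HEADER_SIZE && zl[off] != ZL_END);
    size_t hdr = prevlen_size(zl + off);
    size_t next = off + hdr + body_size(zl + off + hdr);
    return zl[next] == ZL_END ? 0 : next;
}

/* "Predecessor": step back by prevlen. Returns 0 for the first entry. */
size_t zl_prev(const unsigned char *zl, size_t off)
{
    assert(zl != NULL && off >= ZL_HEADER_SIZE && zl[off] != ZL_END);
    uint32_t plen = prevlen_value(zl + off);
    return plen == 0 ? 0 : off - plen;
}
```

On the 15-byte example, the first entry sits at offset 10, `zl_next` moves to offset 12 (the zltail value), and `zl_prev` from 12 returns to 10.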

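This approach solves the problem of a linked list's memory being scattered all over the place, and it also removes some unnecessary overhead. Of course, the biggest drawback of contiguous memory is that inserting or removing content is O(n), because everything after the change has to be moved. Since a ziplist is only used at small sizes, that is acceptable.

Some important interfaces, and quicklist

To be continued in part 2.

Summary

For reasons of length, the implementation of ziplist's key interfaces and quicklist, the product of combining adlist and ziplist, are left to part 2. (It's late and the library is about to close and kick everyone out, haha.)

Many data structures in Redis follow this pattern, not just linear lists: use one structure while the data is small and a different one once it grows. The same idea is worth borrowing in everyday development. Every structure has its strengths, weaknesses, and suitable scenarios, so pick flexibly according to what the project actually needs.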


WeChat ID | xiaoxingblog

You are welcome to follow Xiao Xing's tech blog!
