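Redis String Data Structure: A Source-Level Look at the Underlying Storage
All source code referenced below comes from https://github.com/redis/redis (branch 6.2)
Background
Redis 6.2 defines the following object types: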
<server.h>
/* The actual Redis Object */
#define OBJ_STRING 0 /* String object. */
#define OBJ_LIST 1 /* List object. */
#define OBJ_SET 2 /* Set object. */
#define OBJ_ZSET 3 /* Sorted set object. */
#define OBJ_HASH 4 /* Hash object. */
#define OBJ_MODULE 5 /* Module object. */
#define OBJ_STREAM 6 /* Stream object. */
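Most readers will be familiar with the first five types, which are also the most widely used (the last two were added later).
Redis object encoding types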
#define OBJ_ENCODING_RAW 0 /* Raw representation */
#define OBJ_ENCODING_INT 1 /* Encoded as integer */
#define OBJ_ENCODING_HT 2 /* Encoded as hash table */
#define OBJ_ENCODING_ZIPMAP 3 /* Encoded as zipmap */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6 /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7 /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8 /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */
#define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */
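The encodings used by string objects are:
OBJ_ENCODING_RAW (raw)
OBJ_ENCODING_INT (integer)
OBJ_ENCODING_EMBSTR (embedded, compact; requires the string length to be at most 44 bytes; where that number comes from is analyzed later)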
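As a rough illustration of how the three string encodings relate, the sketch below picks an encoding for a byte string. It is not the Redis implementation: the real decision is made in object.c (for example in `tryObjectEncoding`) and involves further conditions. `pick_string_encoding`, `parses_as_long`, and `EMBSTR_LIMIT_SKETCH` are hypothetical names, and the integer check is deliberately simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define OBJ_ENCODING_RAW 0
#define OBJ_ENCODING_INT 1
#define OBJ_ENCODING_EMBSTR 8
#define EMBSTR_LIMIT_SKETCH 44 /* assumption: the 44-byte limit mentioned above */

/* Hypothetical helper: returns 1 if the whole buffer is a canonical
 * decimal integer that fits in a long (no spaces, no leading zeros). */
static int parses_as_long(const char *s, size_t len) {
    char buf[32];
    char *end;
    size_t i = 0;

    if (len == 0 || len > 20) return 0;        /* 20 chars cover any 64-bit value */
    if (s[0] == '-') i = 1;
    if (i == len) return 0;                    /* a lone "-" is not a number */
    if (s[i] == '0' && len > 1) return 0;      /* reject "007" and "-0" */
    for (size_t j = i; j < len; j++)
        if (s[j] < '0' || s[j] > '9') return 0;

    memcpy(buf, s, len);
    buf[len] = '\0';
    errno = 0;
    (void)strtol(buf, &end, 10);
    return errno != ERANGE && *end == '\0';    /* in range and fully consumed */
}

/* Simplified decision: integer first, then short embedded string, else raw. */
int pick_string_encoding(const char *s, size_t len) {
    if (parses_as_long(s, len)) return OBJ_ENCODING_INT;
    if (len <= EMBSTR_LIMIT_SKETCH) return OBJ_ENCODING_EMBSTR;
    return OBJ_ENCODING_RAW;
}
```

For example, `pick_string_encoding("12345", 5)` yields `OBJ_ENCODING_INT`, while a 45-byte non-numeric value yields `OBJ_ENCODING_RAW`.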
How these objects are represented inside Redis
<server.h>
#define LRU_BITS 24
typedef struct redisObject {
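//object type (one of the OBJ_* types listed above)
unsigned type:4;
//encoding (one of the OBJ_ENCODING_* values above: how the value is stored in memory)
unsigned encoding:4;
//data used by the eviction policy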
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;//reference count of this object
void *ptr;//pointer to the actual data
} robj;
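The size of redisObject follows from its fields: the three bit-fields (4 + 4 + 24 bits) share one 4-byte unit, refcount takes 4 bytes, and ptr takes one pointer.
4 bits + 4 bits + 24 bits + 4 bytes + 8 bytes = 16 bytes (64-bit build; with 4-byte pointers the total is 12 bytes)
That is, sizeof(struct redisObject) = 16 on a 64-bit build (this is used later)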
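To check this arithmetic, a small sketch can mirror the struct and ask the compiler. The names `robj_sketch`, `sdshdr8_sketch`, and `embstr_limit_sketch` are made up here, and the layout assumes GCC or Clang packing the three bit-fields into one 4-byte unit. With an 8-byte pointer the struct is 16 bytes; with a 4-byte pointer it is 12. Assuming the object, a 3-byte sdshdr8-style header, and the trailing '\0' must share one 64-byte allocation (the reason given in the comment next to the limit in object.c), the 44-byte EMBSTR limit works out as 64 - 16 - 3 - 1.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_BITS 24

/* Mirror of redisObject, for size inspection only (hypothetical name). */
struct robj_sketch {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : LRU_BITS;
    int refcount;
    void *ptr;
};

/* Same fields as sdshdr8: len + alloc + flags = 3 bytes when packed. */
struct __attribute__((__packed__)) sdshdr8_sketch {
    unsigned char len;
    unsigned char alloc;
    unsigned char flags;
    char buf[];
};

/* Size of the mirrored object header on the current platform. */
size_t robj_sketch_size(void) {
    return sizeof(struct robj_sketch);
}

/* Bytes left for string data when the object, the sds header, and the
 * terminating '\0' all live in one allocation of `arena` bytes. */
size_t embstr_limit_sketch(size_t arena) {
    return arena - sizeof(struct robj_sketch)
                 - sizeof(struct sdshdr8_sketch) - 1;
}
```

On a typical 64-bit build, `embstr_limit_sketch(64)` evaluates to 44.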
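Sds: the internal data structure behind Redis strings
<sds.h>
typedef char *sds;
Judging by this typedef, sds looks no different from an ordinary C string. In fact, to save memory, Redis splits it into several header variants.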
<sds.h>
//sds header types
#define SDS_TYPE_5 0
#define SDS_TYPE_8 1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
//for len < 2^5
struct __attribute__ ((__packed__)) sdshdr5 {
//upper 5 bits hold the string length, lower 3 bits hold the sds type
unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
//the actual string data
char buf[];
};
//for 2^5 <= len < 2^8
struct __attribute__ ((__packed__)) sdshdr8 {
uint8_t len; /* used */
uint8_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
//for 2^8 <= len < 2^16
struct __attribute__ ((__packed__)) sdshdr16 {
uint16_t len; /* used */
uint16_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
//for 2^16 <= len < 2^32
struct __attribute__ ((__packed__)) sdshdr32 {
uint32_t len; /* used */
uint32_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
//for len >= 2^32
struct __attribute__ ((__packed__)) sdshdr64 {
uint64_t len; /* used */
uint64_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
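Apart from sdshdr5, all headers share the same layout; they differ only in field width, to suit strings of different lengths.
len -> number of bytes in use
alloc -> number of bytes allocated, excluding the sds header and the terminating '\0', so the free space is simply avail = alloc - len
flags -> lower 3 bits hold the sds type; the upper 5 bits are currently unused
buf[] -> the actual string data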
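The length ranges from the comments in the listing and the flags layout can be sketched as follows. The function names and the two `*_SKETCH` macros are made up for illustration (Redis has its own `sdsReqType` in sds.c, which also special-cases 32-bit builds); the `SDS_TYPE_*` values are copied from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SDS_TYPE_5 0
#define SDS_TYPE_8 1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define TYPE_BITS_SKETCH 3 /* low 3 bits of flags hold the type */
#define TYPE_MASK_SKETCH 7

/* Smallest header type whose length field can hold `len`. */
int req_type_sketch(uint64_t len) {
    if (len < (1ULL << 5)) return SDS_TYPE_5;
    if (len < (1ULL << 8)) return SDS_TYPE_8;
    if (len < (1ULL << 16)) return SDS_TYPE_16;
    if (len < (1ULL << 32)) return SDS_TYPE_32;
    return SDS_TYPE_64;
}

/* sdshdr5 only: length in the upper 5 bits, type in the lower 3 bits. */
unsigned char pack_flags5(unsigned len) {
    return (unsigned char)(SDS_TYPE_5 | (len << TYPE_BITS_SKETCH));
}

/* Read the type back out of any flags byte. */
unsigned flags_type(unsigned char flags) {
    return flags & TYPE_MASK_SKETCH;
}

/* Read the length back out of an sdshdr5 flags byte. */
unsigned flags5_len(unsigned char flags) {
    return flags >> TYPE_BITS_SKETCH;
}

/* Free space for the header types that carry len and alloc. */
size_t avail_sketch(size_t alloc, size_t len) {
    return alloc - len;
}
```

A 5-byte string in an sdshdr5 would therefore carry the flags byte `5 << 3 | 0`, that is 40.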
Why the sds header stores len, as explained in the sds documentation:
You can print the string with printf() as there is an implicit \0 at the end of the string. However the string is binary safe and can contain \0 characters in the middle, as the length is stored in the sds header.
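The quoted note can be made concrete with a tiny sketch: because the length lives in a header just before the returned pointer, embedded `\0` bytes do not truncate the string, unlike with `strlen`. `mini_hdr8`, `mini_sds_new`, `mini_sds_len`, and `mini_sds_free` are hypothetical helpers that only handle an sdshdr8-style layout (lengths up to 255); they are not the sds API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field layout as sdshdr8 (hypothetical name). */
struct __attribute__((__packed__)) mini_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Allocate header + data + '\0' and return a pointer to the data,
 * so the result can still be passed to printf-style functions. */
char *mini_sds_new(const void *init, size_t len) {
    struct mini_hdr8 *sh;

    if (len > 255) return NULL;                /* this sketch has one header type */
    sh = malloc(sizeof(*sh) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = 1;                             /* SDS_TYPE_8 */
    if (len > 0) {
        if (init != NULL) memcpy(sh->buf, init, len);
        else memset(sh->buf, 0, len);
    }
    sh->buf[len] = '\0';                       /* implicit terminator */
    return sh->buf;
}

/* Step back over the header to read the stored length in O(1). */
size_t mini_sds_len(const char *s) {
    const struct mini_hdr8 *sh =
        (const struct mini_hdr8 *)(s - sizeof(struct mini_hdr8));
    return sh->len;
}

/* Free the whole allocation, starting from the header. */
void mini_sds_free(char *s) {
    if (s != NULL) free(s - sizeof(struct mini_hdr8));
}
```

For the 5-byte input `"ab\0cd"`, `mini_sds_len` reports 5 while `strlen` stops at the first `\0` and reports 2.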
Walking through how an sds is created makes this clearer.
Creating an sds
<sds.c>
//function that creates an sds
sds _sdsnewlen(const void *init, size_t initlen, int trymalloc) {
void *sh;
sds s;
//1. choose the sds type based on the length
char type = sdsReqType(initlen);
/* Empty strings are usually created in order to append. Use type 8
* since type 5 is not good at this. */
if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
//2. compute the sds header size
int hdrlen = sdsHdrSize(type
