Lua Source Code: Tables

-沧海流云-

Copyright: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/wyk223344/article/details/107300061

Lua version: 5.3.5

Data Structures

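The table is Lua's sole data-structuring mechanism and one of its most distinctive features; users can build all kinds of data structures on top of it. A table can act as an array and as a hash table at the same time, and the implementation mirrors this by splitting a table's internal storage into an array part and a hash part.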

/*
** Tables
*/

typedef union TKey {
  struct {
    TValuefields;
    int next;  /* for chaining (offset for next node) */
  } nk;
  TValue tvk;
} TKey;


typedef struct Node {
  TValue i_val;
  TKey i_key;
} Node;


typedef struct Table {
  CommonHeader;
  lu_byte flags;  /* 1<<p means tagmethod(p) is not present */
  lu_byte lsizenode;  /* log2 of size of 'node' array */
  unsigned int sizearray;  /* size of 'array' array */
  TValue *array;  /* array part */
  Node *node;
  Node *lastfree;  /* any free position is before this position */
  struct Table *metatable;
  GCObject *gclist;
} Table;

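TKey: the key of a node in the hash part. Besides the key value itself, it holds next, the offset (not a pointer) from this node to the next node in the same collision chain.
Node: a node of the hash part, holding a key and a value.
Table: the table structure.
- flags: a cache of absent metamethods; bit p set means metamethod p is not present.
- lsizenode: log2 of the size of the hash part (the hash part always has a power-of-2 number of nodes).
- sizearray: size of the array part. Sizes chosen by rehash are powers of 2, but a constructor such as {1, 2, 3} can create an array part of any size.
- array: pointer to the start of the array part.
- node: pointer to the start of the hash part.
- lastfree: a cursor into the hash part; every free node lies before it, and getfreepos searches downward from it.
- metatable: the metatable.
- gclist: link used by the garbage collector.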
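Because lsizenode stores only the exponent, the hash part always holds 2^lsizenode nodes, so a hash value can be reduced to a node index with a bit mask instead of a division. Lua 5.3 does this through its sizenode and lmod macros for integer, string and boolean keys (floats and pointers use a modulo by an odd number instead). The sketch below illustrates only that mask; node_count and slot_for_hash are hypothetical names, not Lua API.

```c
#include <assert.h>

/* Hypothetical mirror of sizenode(t): the hash part has 2^lsizenode nodes. */
static unsigned int node_count(unsigned char lsizenode) {
  return 1u << lsizenode;
}

/* Hypothetical mirror of lmod(h, size): with a power-of-2 size,
   h & (size - 1) equals h % size but needs no division. */
static unsigned int slot_for_hash(unsigned int h, unsigned char lsizenode) {
  unsigned int size = node_count(lsizenode);
  assert((size & (size - 1)) == 0);  /* size really is a power of 2 */
  return h & (size - 1);
}
```

For example, with lsizenode = 3 there are 8 nodes, and a hash of 13 lands in node 13 & 7 = 5.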

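The structure itself is simple and clear. The only part worth explaining further is the hash part, for which I recommend a separate article.
In short, it is a closed hash table (open addressing): the next node that a TKey points to is another node in the same node array, not a separately allocated list cell. When the main position of a new element (new) is already occupied, Lua checks whether the old element (old) there belongs in that position:

If old is in its own main position, new takes a free node and is linked into old's chain right after old.
If old is not in its own main position, old is moved to a free node (its predecessor in the chain is relinked to the new location) and new takes over the position.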
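The collision scheme above can be sketched as a toy in C. This is a simplification under stated assumptions: integer keys only, a fixed table of 8 nodes, no deletion and no rehash (toy_insert just reports failure when no free node is left). The used flag stands in for a nil key, and every name here is hypothetical; the branch structure follows the luaH_newkey code in the next section, including the lastfree scan that getfreepos performs.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SIZE 8  /* hypothetical fixed size; a power of 2 like Lua's hash part */

typedef struct {
  bool used;  /* stands in for "key is not nil" */
  long key;
  long val;
  int next;   /* offset to the next node in the chain; 0 ends the chain */
} ToyNode;

typedef struct {
  ToyNode node[TOY_SIZE];
  int lastfree;  /* every free node lies below this index */
} ToyTable;

static void toy_init(ToyTable *t) {
  for (int i = 0; i < TOY_SIZE; i++)
    t->node[i] = (ToyNode){false, 0, 0, 0};
  t->lastfree = TOY_SIZE;  /* start just past the last node */
}

/* Main position of an integer key: the low bits of the key. */
static int mainpos(long key) {
  return (int)((unsigned long)key & (TOY_SIZE - 1));
}

/* Like getfreepos: move lastfree down until it hits a node without a key. */
static int getfree(ToyTable *t) {
  while (t->lastfree > 0) {
    t->lastfree--;
    if (!t->node[t->lastfree].used)
      return t->lastfree;
  }
  return -1;  /* no free node: real Lua would rehash here */
}

/* Insert a key that is not yet present; false means no free node is left. */
static bool toy_insert(ToyTable *t, long key, long val) {
  int mp = mainpos(key);
  if (t->node[mp].used) {  /* main position taken? */
    int f = getfree(t);
    if (f < 0)
      return false;
    int other = mainpos(t->node[mp].key);
    if (other != mp) {
      /* colliding node is a guest here: move it to f */
      while (other + t->node[other].next != mp)  /* find its predecessor */
        other += t->node[other].next;
      t->node[other].next = f - other;  /* predecessor now points to f */
      t->node[f] = t->node[mp];         /* copy node, next included */
      if (t->node[mp].next != 0) {
        t->node[f].next += mp - f;      /* re-base the offset on f */
        t->node[mp].next = 0;
      }
      /* mp is now free for the new key */
    } else {
      /* colliding node owns mp: new key goes to f, right after mp */
      if (t->node[mp].next != 0)
        t->node[f].next = (mp + t->node[mp].next) - f;
      else
        assert(t->node[f].next == 0);
      t->node[mp].next = f - mp;
      mp = f;
    }
  }
  t->node[mp].used = true;
  t->node[mp].key = key;
  t->node[mp].val = val;
  return true;
}

/* Lookup: start at the main position and follow the next offsets. */
static bool toy_get(const ToyTable *t, long key, long *out) {
  int n = mainpos(key);
  for (;;) {
    if (t->node[n].used && t->node[n].key == key) {
      *out = t->node[n].val;
      return true;
    }
    if (t->node[n].next == 0)
      return false;
    n += t->node[n].next;
  }
}
```

Inserting 1, 9 and 17 (all with main position 1) builds the chain 1 → 6 → 7. Inserting 7 then finds node 7 occupied by 9, whose main position is 1, so 9 is moved to free node 5 and 7 takes its own main position.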

Implementation

Inserting Elements

/*
** inserts a new key into a hash table; first, check whether key's main
** position is free. If not, check whether colliding node is in its main
** position or not: if it is not, move colliding node to an empty place and
** put new key in its main position; otherwise (colliding node is in its main
** position), new key goes to an empty position.
*/
TValue *luaH_newkey (lua_State *L, Table *t, const TValue *key) {
  Node *mp;
  TValue aux;
  if (ttisnil(key)) luaG_runerror(L, "table index is nil");
  else if (ttisfloat(key)) {
    lua_Integer k;
    if (luaV_tointeger(key, &k, 0)) {  /* does index fit in an integer? */
      setivalue(&aux, k);
      key = &aux;  /* insert it as an integer */
    }
    else if (luai_numisnan(fltvalue(key)))
      luaG_runerror(L, "table index is NaN");
  }
  mp = mainposition(t, key);
  if (!ttisnil(gval(mp)) || isdummy(t)) {  /* main position is taken? */
    Node *othern;
    Node *f = getfreepos(t);  /* get a free place */
    if (f == NULL) {  /* cannot find a free place? */
      rehash(L, t, key);  /* grow table */
      /* whatever called 'newkey' takes care of TM cache */
      return luaH_set(L, t, key);  /* insert key into grown table */
    }
    lua_assert(!isdummy(t));
    othern = mainposition(t, gkey(mp));
    if (othern != mp) {  /* is colliding node out of its main position? */
      /* yes; move colliding node into free position */
      while (othern + gnext(othern) != mp)  /* find previous */
        othern += gnext(othern);
      gnext(othern) = cast_int(f - othern);  /* rechain to point to 'f' */
      *f = *mp;  /* copy colliding node into free pos. (mp->next also goes) */
      if (gnext(mp) != 0) {
        gnext(f) += cast_int(mp - f);  /* correct 'next' */
        gnext(mp) = 0;  /* now 'mp' is free */
      }
      setnilvalue(gval(mp));
    }
    else {  /* colliding node is in its own main position */
      /* new node will go into free position */
      if (gnext(mp) != 0)
        gnext(f) = cast_int((mp + gnext(mp)) - f);  /* chain new position */
      else lua_assert(gnext(f) == 0);
      gnext(mp) = cast_int(f - mp);
      mp = f;
    }
  }
  setnodekey(L, &mp->i_key, key);
  luaC_barrierback(L, t, key);
  lua_assert(ttisnil(gval(mp)));
  return gval(mp);
}

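The comments in the code explain it well. Here, mainposition returns the key's main position, i.e. the node its hash maps to; getfreepos returns a free node by moving lastfree downward; and rehash recomputes the table's sizes (normally growing it) when no free node is left, after which luaH_set inserts the key into the resized table. Before any of this, a float key with an integral value such as 2.0 is converted to an integer, and a nil or NaN key raises an error.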
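The float-key branch at the top of luaH_newkey means t[2.0] and t[2] refer to the same slot, while a NaN key is rejected. A minimal sketch of that decision, assuming the default 64-bit lua_Integer; float_to_int_exact and normalize_float_key are hypothetical names standing in for luaV_tointeger with mode 0 and the surrounding if/else.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { KEY_INT, KEY_FLOAT, KEY_NAN_ERROR } KeyKind;

/* Hypothetical stand-in for luaV_tointeger(key, &k, 0): accept only a
   double with no fractional part that fits in int64_t. */
static bool float_to_int_exact(double d, int64_t *out) {
  /* [-2^63, 2^63) range check first, so the cast below is defined;
     NaN fails both comparisons */
  if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
    return false;
  int64_t k = (int64_t)d;  /* truncates toward zero */
  if ((double)k != d)
    return false;          /* d had a fractional part */
  *out = k;
  return true;
}

/* Mirrors the float-key branch at the top of luaH_newkey. */
static KeyKind normalize_float_key(double d, int64_t *as_int) {
  if (float_to_int_exact(d, as_int))
    return KEY_INT;        /* e.g. 2.0 is inserted as the integer 2 */
  if (isnan(d))
    return KEY_NAN_ERROR;  /* Lua raises "table index is NaN" */
  return KEY_FLOAT;        /* e.g. 2.5 stays a float key */
}
```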

Rehash

/*
** nums[i] = number of keys 'k' where 2^(i - 1) < k <= 2^i
*/
static void rehash (lua_State *L, Table *t, const TValue *ek) {
  unsigned int asize;  /* optimal size for array part */
  unsigned int na;  /* number of keys in the array part */
  unsigned int nums[MAXABITS + 1];
  int i;
  int totaluse;
  for (i = 0; i <= MAXABITS; i++) nums[i] = 0;  /* reset counts */
  na = numusearray(t, nums);  /* count keys in array part */
  totaluse = na;  /* all those keys are integer keys */
  totaluse += numusehash(t, nums, &na);  /* count keys in hash part */
  /* count extra key */
  na += countint(ek, nums);
  totaluse++;
  /* compute new size for array part */
  asize = computesizes(nums, &na);
  /* resize the table to new computed sizes */
  luaH_resize(L, t, asize, totaluse - na);
}

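The main job of rehash is to count how many key-value pairs the table actually holds and to decide how much space the array part should get; the rule is that the array part must end up more than 50% full.
It uses the array nums for this count: nums[i] is the number of integer keys k with $2^{i-1} < k \le 2^{i}$.
numusearray counts the keys in the array part.
numusehash counts the keys in the hash part. Note that na (the number of integer keys counted so far) is passed in as well: each hash key that could serve as an array index is added to nums and increments na. The key being inserted (ek) is counted the same way by countint.
computesizes computes the largest power-of-2 array size for which more than 50% of the slots would be in use, and updates na to the number of keys that will go into the array part.
luaH_resize then resizes the table to the computed sizes (asize slots for the array part, room for totaluse - na keys in the hash part) and redistributes the entries: keys that fit in the new array part go there, and the rest go into the hash part.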
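The counting and sizing steps can be sketched as follows. count_keys is a hypothetical simplification of numusearray, numusehash and countint (it takes a plain list of integer keys), ceil_log2 plays the role of luaO_ceillog2, and compute_sizes closely follows the loop in computesizes. It assumes a 32-bit int, so MAXABITS is 31.

```c
#include <assert.h>

#define MAXABITS 31  /* assumption: 32-bit int, i.e. sizeof(int) * CHAR_BIT - 1 */

/* ceil(log2(x)) for x >= 1, the role luaO_ceillog2 plays in Lua */
static int ceil_log2(unsigned int x) {
  int l = 0;
  assert(x >= 1);
  x--;
  while (x > 0) {
    l++;
    x >>= 1;
  }
  return l;
}

/* Hypothetical counting step: nums[i] = number of keys k with
   2^(i-1) < k <= 2^i. Returns how many keys are valid array indices;
   0 and keys above 2^MAXABITS are skipped. */
static unsigned int count_keys(const unsigned int keys[], int n,
                               unsigned int nums[]) {
  unsigned int na = 0;
  for (int i = 0; i <= MAXABITS; i++) nums[i] = 0;
  for (int i = 0; i < n; i++) {
    if (keys[i] >= 1 && keys[i] <= (1u << MAXABITS)) {
      nums[ceil_log2(keys[i])]++;
      na++;
    }
  }
  return na;
}

/* Pick the largest power of 2 whose slots 1..2^i would be more than half full. */
static unsigned int compute_sizes(const unsigned int nums[], unsigned int *pna) {
  int i;
  unsigned int twotoi;       /* candidate size 2^i */
  unsigned int a = 0;        /* number of keys <= 2^i */
  unsigned int na = 0;       /* keys that will go to the array part */
  unsigned int optimal = 0;  /* best array size found so far */
  /* continue while the keys could still fill more than half of 2^i;
     twotoi > 0 stops the loop if 2^i overflows */
  for (i = 0, twotoi = 1; twotoi > 0 && *pna > twotoi / 2; i++, twotoi *= 2) {
    if (nums[i] > 0) {
      a += nums[i];
      if (a > twotoi / 2) {  /* more than half of the slots in use? */
        optimal = twotoi;
        na = a;
      }
    }
  }
  *pna = na;
  return optimal;
}
```

For the keys 1, 2, 3, 5 and 100, sizes 1, 2 and 4 are each more than half full (1/1, 2/2, 3/4), but size 8 would use only 4 of its 8 slots, which is not more than half. So the array part gets 4 slots holding keys 1 to 3, and the remaining totaluse - na = 2 keys (5 and 100) go to the hash part.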

Lookup

/*
** main search function
*/
const TValue *luaH_get (Table *t, const TValue *key) {
  switch (ttype(key)) {
    case LUA_TSHRSTR: return luaH_getshortstr(t, tsvalue(key));
    case LUA_TNUMINT: return luaH_getint(t, ivalue(key));
    case LUA_TNIL: return luaO_nilobject;
    case LUA_TNUMFLT: {
      lua_Integer k;
      if (luaV_tointeger(key, &k, 0)) /* index is int? */
        return luaH_getint(t, k);  /* use specialized version */
      /* else... */
    }  /* FALLTHROUGH */
    default:
      return getgeneric(t, key);
  }
}

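When the key is an integer within the array range (1 to sizearray), the lookup goes straight to the array part. Otherwise it searches the hash part: starting from the key's main position, it compares keys and follows the next offsets along the collision chain until it finds the key or the chain ends, in which case the result is nil. A float key with an integral value is converted to an integer first, and short strings take the specialized luaH_getshortstr path.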
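luaH_getint (the integer path of luaH_get) decides between the two parts with a single unsigned comparison: converting the key to unsigned and subtracting 1 maps 1..sizearray onto 0..sizearray-1, while any key <= 0 wraps around to a huge value. A minimal sketch of that test, with hypothetical names and the hash-part fallback left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy holding only the array part of a table. */
typedef struct {
  const long *array;
  unsigned int sizearray;
} ToyArrayPart;

/* One comparison checks 1 <= key <= sizearray: key - 1 wraps to a
   huge unsigned value when key <= 0. */
static int in_array_part(int64_t key, unsigned int sizearray) {
  return (uint64_t)key - 1u < sizearray;
}

static const long *toy_getint(const ToyArrayPart *t, int64_t key) {
  assert(t->array != NULL || t->sizearray == 0);
  if (in_array_part(key, t->sizearray))
    return &t->array[key - 1];  /* Lua indices start at 1 */
  return NULL;  /* real Lua would now walk the hash chain from hashint(t, key) */
}
```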

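Summary

Lookup:
- If the key is an integer with 1 <= key <= sizearray, look it up directly in the array part.
- Otherwise, look it up in the hash part: hash the key to find its main position in the Node array, then follow the next offsets along the collision chain from there.
Inserting an element:
- Hash part:
  - If the new element's main position is taken, getfreepos moves lastfree downward to find a free node; once lastfree reaches the start of the node array, no free node is left and the table is rehashed.
  - When the main position of the new element (new) is occupied, check whether the old element (old) there belongs in that position:
    - If old is in its own main position, new takes a free node and is linked into old's chain
    - If old is not in its own main position, old is moved to a free node and new takes over the position
Rehash: the basic rule is that the array part must end up more than 50% full.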