STL Source Code Analysis: hashtable (linear hash and chaining)

Article link: https://blog.youkuaiyun.com/ThorKing01/article/details/105864704

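hashtable overview

The RB-tree covered earlier performs insertion, lookup, and deletion in O(log n) time. For a plain binary search tree, that logarithmic average rests on an assumption: the input data is sufficiently random (an RB-tree removes this dependence by rebalancing). A hash table, by contrast, offers "average constant time" O(1) for insertion, deletion, and lookup; this behavior is statistical in nature and does not depend on the randomness of the input data.

hashtable implementations differ mainly in how they resolve collisions: linear probing, quadratic probing, separate chaining, and so on.

SGI STL uses separate chaining. Each element of the hash table is a bucket. The linked list maintained by a bucket is not an STL list or slist, but a list formed from the hashtable node shown below. The collection of buckets is implemented with a vector.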

template <class Value>
struct __hashtable_node
{
    __hashtable_node* next; // next node in the same bucket's chain
    Value val;              // the stored value
};
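To make the layout concrete, here is a minimal sketch of a vector of buckets with chained nodes. It assumes int values and a plain modulo as the hash; the names `Node`, `Buckets`, `index_of`, `push_front`, and `contains` are illustrative and not part of SGI STL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative node: same shape as __hashtable_node, fixed to int.
struct Node {
    Node* next;
    int val;
};

// The bucket array: each slot holds the head of a singly linked chain, or null.
struct Buckets {
    std::vector<Node*> heads;

    explicit Buckets(std::size_t n) : heads(n, nullptr) { assert(n > 0); }
    Buckets(const Buckets&) = delete;
    Buckets& operator=(const Buckets&) = delete;

    // Free every node of every chain.
    ~Buckets() {
        for (Node* head : heads) {
            while (head) {
                Node* next = head->next;
                delete head;
                head = next;
            }
        }
    }

    // Assumed hash: the value itself, modulo the number of buckets.
    std::size_t index_of(int v) const {
        return static_cast<std::size_t>(static_cast<unsigned>(v)) % heads.size();
    }

    // Link a new node at the front of its chain.
    void push_front(int v) {
        std::size_t i = index_of(v);
        heads[i] = new Node{heads[i], v};
    }

    // Walk a single chain looking for v.
    bool contains(int v) const {
        for (const Node* cur = heads[index_of(v)]; cur; cur = cur->next)
            if (cur->val == v) return true;
        return false;
    }
};
```

With 7 buckets, the values 3 and 10 share bucket 3, and a lookup only has to walk that one chain.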

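Usage notes

1. When using hashtable, do not include <stl_hashtable.h> directly. Include the header of a container that is built on hashtable instead, for example <hash_set> and <hash_map> (SGI also ships <hash_set.h> and <hash_map.h>).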

#include<hash_set>
#include<hash_map>

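2. The built-in hash function only handles int, short, long, char (including their unsigned variants) and char*.

It cannot handle string, double, or float; for these three types the user must define a hash function.

3. Elements with the same key always fall into the same bucket list; elements with different keys may also fall into the same bucket list.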
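Notes 2 and 3 can be illustrated together. The sketch below defines a user-supplied hash functor for std::string and a bucket-selection helper. `StringHash` and `bucket_index` are hypothetical names; the functor borrows the h = 5*h + c recurrence that SGI applies to C strings, but casts each character to unsigned char, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical user-defined hash functor for std::string, built on the
// h = 5*h + c recurrence that SGI applies to C strings.
struct StringHash {
    std::size_t operator()(const std::string& s) const {
        unsigned long h = 0;
        for (std::string::size_type i = 0; i < s.size(); ++i)
            h = 5 * h + static_cast<unsigned char>(s[i]);
        return static_cast<std::size_t>(h);
    }
};

// Bucket selection: hash value modulo the number of buckets.
inline std::size_t bucket_index(std::size_t hash_value, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return hash_value % bucket_count;
}
```

With 53 buckets, "a" (hash 97) and "," (hash 44) are different keys that both land in bucket 44, while the same key always maps to the same bucket. With SGI's containers the functor would be passed as the hash template argument, for example hash_set<std::string, StringHash>.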

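Insertion and table rebuilding

When an element is inserted into a hashtable, the table first checks whether it needs to be rebuilt, that is, whether the buckets vector must grow.

The book says: the rule for deciding whether to rebuild the table is rather peculiar. It compares the number of elements (counting the new element) with the size of the buckets vector. If the former is larger, the table is rebuilt. From this one can infer that the maximum capacity of each bucket (list) equals the size of the buckets vector.

One way to read this: in the worst case every element lands in the same bucket, so that list can grow as long as the size of the buckets vector. In the best case every bucket holds exactly one element, which makes lookup very fast. In practice some buckets will hold more than one element, which leaves other buckets empty.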
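The rebuild test itself is a single comparison. A minimal sketch, where `needs_rebuild` is a hypothetical helper name (in SGI STL the check lives inside resize(), and whether the table actually grows also depends on the prime list):

```cpp
#include <cassert>
#include <cstddef>

// Rebuild test described above: compare the element count, including the
// element about to be inserted, with the current number of buckets.
inline bool needs_rebuild(std::size_t num_elements, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return num_elements + 1 > bucket_count;
}
```

With 53 buckets, the first 53 insertions pass the check untouched and the 54th triggers a rebuild, so the average chain length stays at or below one element per bucket.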

Insertion without duplicates

pair<iterator, bool> insert_unique(const value_type& obj)
{
    resize(num_elements + 1); // resize() decides whether the table must be rebuilt, and rebuilds it if so
    return insert_unique_noresize(obj);
}

Table rebuilding

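template <class V, class K, class HF, class Ex, class Eq, class A>
void hashtable<V, K, HF, Ex, Eq, A>::resize(size_type num_elements_hint)
{
  const size_type old_n = buckets.size(); // current size of the bucket vector
  /* Rebuild only if the element count (new element included) exceeds the bucket vector size */
  if (num_elements_hint > old_n) {
    const size_type n = next_size(num_elements_hint); // find the next prime

    if (n > old_n) { // growth is possible only while old_n is not the largest prime in the list
      vector<node*, A> tmp(n, (node*)0); // new bucket vector of size n
      // Process each old bucket in turn
      for (size_type bucket = 0; bucket < old_n; ++bucket) {
        node* first = buckets[bucket]; // head node of this bucket's chain
        while (first) { // walk the chain of a single bucket
          size_type new_bucket = bkt_num(first->val, n); // bucket the node falls into in the new table
          buckets[bucket] = first->next; // old bucket now points at the next node, ready for the next iteration
          /* Link the current node into the new bucket as the first node of its chain.
             This is the same as a push-front insertion into the new bucket vector, except that
             the node already exists, so no memory is allocated; only the pointers are changed. */
          first->next = tmp[new_bucket];
          tmp[new_bucket] = first;
          first = buckets[bucket]; // back to the rest of the old chain for the next node
        }
      }
      buckets.swap(tmp); // vector::swap exchanges the old and new bucket vectors (a shallow exchange)
      /* If the two vectors differ in size, the sizes are exchanged too;
         the local tmp, now holding the old bucket array, is freed on return */
    }
  }
}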
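The pointer relinking in resize() can be tried in isolation. The following is a simplified sketch, not the SGI code: it assumes unsigned long values hashed by plain modulo, and `Node` and `rehash` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    Node* next;
    unsigned long val;
};

// Move every node from `buckets` into a new bucket vector of size n.
// No node is allocated or copied; only the next pointers are rewired.
inline void rehash(std::vector<Node*>& buckets, std::size_t n) {
    assert(n > 0);
    std::vector<Node*> tmp(n, nullptr);
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        Node* first = buckets[b];
        while (first) {
            Node* next = first->next;         // remember the rest of the old chain
            std::size_t nb = first->val % n;  // assumed hash: value modulo n
            first->next = tmp[nb];            // push onto the front of the new chain
            tmp[nb] = first;
            first = next;
        }
        buckets[b] = nullptr;                 // the old slot no longer owns any node
    }
    buckets.swap(tmp);                        // tmp now holds the old, emptied array
}
```

Because each node is pushed onto the front of its new chain, nodes that land in the same new bucket end up in the reverse of their old order.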

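Note: a rebuild does not simply double the table. The new size is taken from a list of primes, each roughly twice the previous one.

/* Prime list */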
// Note: assumes long is at least 32 bits.
static const int __stl_num_primes = 28;
static const unsigned long __stl_prime_list[__stl_num_primes] =
{
	53, 97, 193, 389, 769,
	1543, 3079, 6151, 12289, 24593,
	49157, 98317, 196613, 393241, 786433,
	1572869, 3145739, 6291469, 12582917, 25165843,
	50331653, 100663319, 201326611, 402653189, 805306457,
	1610612741, 3221225473, 4294967291
};
 
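/* Among the 28 primes above, find the smallest one that is not less than n; if there is none, return the largest */
inline unsigned long __stl_next_prime(unsigned long n)
{
	const unsigned long* first = __stl_prime_list; // first element
	const unsigned long* last = __stl_prime_list + __stl_num_primes; // one past the last element
	/* Generic algorithm: returns an iterator to the first element that is not less than n */
	const unsigned long* pos = lower_bound(first, last, n);
	return pos == last ? *(last - 1) : *pos; // if no prime is large enough, use the largest one
}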

size_type next_size (size_type n) const {return __stl_next_prime(n);}
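The lookup can be checked on a shortened table. This sketch keeps only the first five primes; `kPrimes`, `kNumPrimes`, and `next_prime` are illustrative names, while the logic mirrors __stl_next_prime.

```cpp
#include <algorithm>
#include <cstddef>

// A shortened copy of the prime list, enough for the sketch.
static const unsigned long kPrimes[] = {53ul, 97ul, 193ul, 389ul, 769ul};
static const std::size_t kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

// First listed prime that is not less than n, or the largest one if none is.
inline unsigned long next_prime(unsigned long n) {
    const unsigned long* first = kPrimes;
    const unsigned long* last = kPrimes + kNumPrimes;
    const unsigned long* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

Since lower_bound returns the first element not less than n, next_prime(53) is 53 itself, next_prime(54) is 97, and any request beyond the end of the list is capped at the last entry.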

Insertion without duplicates, when no rebuild is needed

/* Insert an element; duplicates are not allowed */
template <class V, class K, class HF, class Ex, class Eq, class A>
pair<typename hashtable<V, K, HF, Ex, Eq, A>::iterator, bool>
hashtable<V, K, HF, Ex, Eq, A>::insert_unique_noresize(const value_type& obj)
{
	const size_type n = bkt_num(obj); // locate the bucket
	node* first = buckets[n];

	/* Check whether an element with the same key already exists */
	for (node* cur = first; cur; cur = cur->next)
		if (equals(get_key(cur->val), get_key(obj)))
			return pair<iterator, bool>(iterator(cur, this), false);

	node* tmp = new_node(obj); // create a new node via node_allocator::allocate()
	/* Place the new node at the front of the chain */
	tmp->next = first;
	buckets[n] = tmp;
	++num_elements; // one more element
	return pair<iterator, bool>(iterator(tmp, this), true);
}
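The same logic can be exercised outside the template machinery. A minimal sketch under stated assumptions: int values, a plain modulo hash, raw nodes owned by the caller, and a free function named `insert_unique` that is not the SGI member function.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
    Node* next;
    int val;
};

// Insert v into its chain unless an equal value is already there.
// Returns the node holding v and a flag telling whether a node was created.
// Assumes v >= 0 so that v % size is a valid index; the caller frees the nodes.
inline std::pair<Node*, bool> insert_unique(std::vector<Node*>& buckets, int v) {
    assert(!buckets.empty() && v >= 0);
    std::size_t n = static_cast<std::size_t>(v) % buckets.size();
    for (Node* cur = buckets[n]; cur; cur = cur->next)
        if (cur->val == v) return std::make_pair(cur, false);  // duplicate found
    Node* tmp = new Node{buckets[n], v};  // the new node becomes the chain head
    buckets[n] = tmp;
    return std::make_pair(tmp, true);
}
```

With 5 buckets, inserting 7 and then 12 puts both in bucket 2 with 12 at the head; inserting 7 again creates nothing and returns the existing node with the flag set to false.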

Insertion allowing duplicates

iterator insert_equal(const value_type& obj)
{
    resize(num_elements + 1); // resize() decides whether the table must be rebuilt, and rebuilds it if so
    return insert_equal_noresize(obj);
}

Insertion allowing duplicates, when no rebuild is needed

/* Insert an element; duplicates are allowed */
template <class V, class K, class HF, class Ex, class Eq, class A>
typename hashtable<V, K, HF, Ex, Eq, A>::iterator
hashtable<V, K, HF, Ex, Eq, A>::insert_equal_noresize(const value_type& obj)
{
	const size_type n = bkt_num(obj); // locate the bucket
	node* first = buckets[n]; // head node of the chain

	for (node* cur = first; cur; cur = cur->next)
		if (equals(get_key(cur->val), get_key(obj))) { // the new element has the same key as cur->val
			node* tmp = new_node(obj);
			tmp->next = cur->next; // insert the new element right after the matching one
			cur->next = tmp;
			++num_elements;
			return iterator(tmp, this);
		}
	// No matching key: same as insert_unique_noresize()
	node* tmp = new_node(obj);
	tmp->next = first;
	buckets[n] = tmp;
	++num_elements;
	return iterator(tmp, this);
}
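The effect of inserting right after the first match is that equal elements stay adjacent in the chain. A minimal sketch under the same assumptions as before (int values, modulo hash, caller-owned nodes); the free function `insert_equal` here is illustrative, not the SGI member function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    Node* next;
    int val;
};

// Insert v, allowing duplicates. An equal value already in the chain gets the
// new node linked directly behind it; otherwise the new node becomes the head.
// Assumes v >= 0 so that v % size is a valid index; the caller frees the nodes.
inline Node* insert_equal(std::vector<Node*>& buckets, int v) {
    assert(!buckets.empty() && v >= 0);
    std::size_t n = static_cast<std::size_t>(v) % buckets.size();
    for (Node* cur = buckets[n]; cur; cur = cur->next) {
        if (cur->val == v) {
            Node* tmp = new Node{cur->next, v};  // splice in after the first match
            cur->next = tmp;
            return tmp;
        }
    }
    Node* tmp = new Node{buckets[n], v};         // no match: push to the front
    buckets[n] = tmp;
    return tmp;
}
```

With 5 buckets, inserting 7, 12, and 7 again leaves bucket 2 holding 12, 7, 7 in that order: the second 7 sits directly behind the first rather than at the head.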