STL unordered_set (hash_set) Explained

MrZhanglver

Licensed under CC 4.0 BY-SA

Category: STL. Tags: STL, containers, algorithms

Article link: https://blog.youkuaiyun.com/qq_27802405/article/details/50917829

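This article describes how unordered_set in the C++ STL is implemented underneath, covering the hash function, the load factor, and collision resolution, and demonstrates the commonly used member functions with an example.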

STL unordered_set (hash_set) internals
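
As a rough companion to the hashtable notes in the listing that follows, here is a minimal sketch of a separately chained hash set: a vector of buckets, each bucket a singly linked list, a prime bucket count, and a rebuild once the element count (including the new element) exceeds the bucket count. The names `ChainedSet`, `next_prime`, and `demo_chained_set` are hypothetical, the prime list is cut down to its first few entries, and `std::hash` stands in for the hash function. It illustrates the idea only and is not SGI's actual hashtable code.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Smallest listed prime that is not less than n.
// Assumption: only the first few primes of an SGI-style list are kept here.
inline std::size_t next_prime(std::size_t n) {
    static const std::size_t primes[] = {53, 97, 193, 389, 769, 1543};
    for (std::size_t p : primes)
        if (p >= n) return p;
    return 1543;  // sketch only: stop growing past the last listed prime
}

// Hypothetical teaching class: a hash set of strings using separate chaining.
class ChainedSet {
public:
    ChainedSet() : buckets_(next_prime(0)), size_(0) {}

    // Same idea as insert_unique: a key that is already present is rejected.
    bool insert_unique(const std::string& key) {
        if (contains(key)) return false;
        // Rebuild when the element count, including the new one, exceeds the bucket count.
        if (size_ + 1 > buckets_.size()) rehash();
        buckets_[index(key, buckets_.size())].push_front(key);
        ++size_;
        return true;
    }

    bool contains(const std::string& key) const {
        const Chain& chain = buckets_[index(key, buckets_.size())];
        for (const std::string& k : chain)
            if (k == key) return true;
        return false;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    typedef std::forward_list<std::string> Chain;

    static std::size_t index(const std::string& key, std::size_t n) {
        return std::hash<std::string>()(key) % n;
    }

    // Walk every chain, recompute each bucket index, and relink into a larger table.
    void rehash() {
        const std::size_t n = next_prime(buckets_.size() + 1);
        std::vector<Chain> fresh(n);
        for (const Chain& chain : buckets_)
            for (const std::string& k : chain)
                fresh[index(k, n)].push_front(k);
        buckets_.swap(fresh);
    }

    std::vector<Chain> buckets_;
    std::size_t size_;
};

// Usage sketch: the 54th insertion pushes the table from 53 to 97 buckets.
inline void demo_chained_set() {
    ChainedSet s;
    for (int i = 0; i < 54; ++i) s.insert_unique("k" + std::to_string(i));
    assert(s.size() == 54);
    std::cout << "buckets: " << s.bucket_count() << '\n';
}
```

Calling `demo_chained_set()` from a `main` function is enough to try it. Copying keys during the rebuild keeps the sketch short; a real implementation would relink the existing nodes instead.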

#include<iostream>
#include<vector>
#include<string>
#include<unordered_set>
#include<iterator>
#include<algorithm>
using namespace std;

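// Code written in VS2015

// Before looking at hash_set and hash_map, first understand the underlying hashtable

// C++11 standardizes unordered_set and unordered_map, which replace the non-standard hash_set and hash_map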
/*
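Hash function: a mapping function that maps a value to a position (bucket) in the table
Load factor: number of elements / table size (may exceed 1 when separate chaining is used)
Collision: different elements are mapped to the same position
Collision resolution: linear probing, quadratic probing, separate chaining, ...
Lazy deletion: an element is only marked as deleted rather than physically removed

In SGI STL the buckets are held in a vector, and each bucket is a linked list of nodes

A hashtable iterator must keep a reference to the whole bucket array,
so that it can move on to the next non-empty bucket when the current list ends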
// The following is excerpted from operator++ in the SGI STL source
const node* old = cur;
cur = cur->next;
if (!cur)
{
	size_type bucket = ht->bkt_num(old->val);
	while (!cur && ++bucket < ht->buckets.size())
		cur = ht->buckets[bucket];
}
return *this;
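hashtable defines only forward traversal, so there is no operator-- and no reverse iterator
An array-based (open addressing) hashtable needs a prime table size, because a composite
size leaves some bits of the hash value unused and raises the collision probability
Separate chaining does not strictly require a prime table size, but SGI STL always uses primes:
a fixed list of unsigned primes from 53 to 4294967291ul, each roughly twice the previous one
When the user requests a size, the smallest listed prime not less than it becomes the table size
Addendum: C++11 replaces hash_set with unordered_set
In the VS2015 implementation its bucket count grows through powers of two (16 32 64 ...);
the standard does not prescribe this sequence
hashtable offers two kinds of insertion: insert_unique and insert_equal
How hashtable decides whether to rebuild:
when the element count (including the element being inserted) exceeds the bucket count, it picks the next prime,
then walks the whole table, recomputes the hash of every element, and links it into its new bucket
hashtable's clear and copy_from delete and copy the nodes one by one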
*/
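// The commonly used functions of hash_set are introduced below
// The program was written in VS2015, where the compiler warned that hash_set will be removed,
// so unordered_set from C++11 is used instead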
/*
Constructors
unordered_set(size_type n, const hasher& hf, const key_equal& eq);
unordered_set(InputIterator f, InputIterator l, size_type n, const hasher& hf, const key_equal& eq);
Member functions
pair<iterator, bool> insert(const value_type& obj);
void insert(InputIterator f, InputIterator l);
iterator find(const key_type& key);
size_type count(const key_type& key) const; // number of elements equal to key: 0 if absent, 1 if present
size_type bucket_count() const;             // number of buckets (the length of the vector in SGI's hashtable)
size_type bucket_size(size_type n) const;   // number of elements in the n-th bucket
*/
struct Hash
{
	size_t operator()(const string& Str)const
	{
		unsigned long h = 0;
		for (size_t i = 0; i < Str.size(); ++i)
		{
			h = 5 * h + Str[i];
		}
		return (size_t)h;
	}
};

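// Never forget the trailing const qualifier
// The third template argument of unordered_set is an equality predicate (key_equal),
// so it must return true only for equal keys; an ordering such as > is not valid here
struct Compare
{
	bool operator()(const string& Str1, const string& Str2)const
	{
		return Str1 == Str2;
	}
};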
int main(void)
{
	string array[7] = {"zhang","li","sun","zhao","wang","wu","zheng"};
	int len = sizeof(array) / sizeof(array[0]);// this works too; I simply had not tried it before
	unordered_set<string> MySet(array,array+len/*,100*/);// the bucket count can be specified here
	//unordered_set<string, Hash, Compare> MySet(array, array + len);
	typedef unordered_set<string>::iterator S_Ite;
	pair<S_Ite, bool> bInsert;
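	// "zhang" is already in the set, so bInsert.second is false here
	bInsert = MySet.insert("zhang");// in SGI hash_set this calls the hashtable's insert_unique underneath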
	if (bInsert.second)
	{
		cout <<"insert success!"<< *(bInsert.first) << endl;
	}
	string array2[] = {"1","2","3"};
	MySet.insert(array2,array2+3);
	cout << "bucket count: " << MySet.bucket_count() << endl;
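	// Prints 64 under VS2015: the bucket count is a power of two there (implementation-specific, not mandated by C++11); oddly it is 64 rather than 16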
	size_t Lastcount = MySet.bucket_count();
	/*
	int i = 0;
	while (i < 1000)// bounded so that the experiment terminates
	{
		MySet.reserve(i);
		if (Lastcount != MySet.bucket_count())
		{
			Lastcount = MySet.bucket_count();
			cout << Lastcount << endl;
		}
		++i;
	}
	*/
	// The loop above printed 16 32 64 128: the bucket count doubles each time, unlike SGI's prime sequence
	S_Ite ite = MySet.begin();
	while (ite != MySet.end())
	{
		cout << *ite << ":" << MySet.count(*ite) << endl;
		// Prints the count of each element; for unordered_set it can only be 1 or 0
		++ite;
	}
	puts("");
	for (int i = 0; i < (int)MySet.bucket_count(); ++i)
	{
		if (0 != MySet.bucket_size(i))
		{
			// Print the number of elements in each non-empty bucket
			cout << MySet.bucket_size(i);
		}
	}
	ite=MySet.find("zhang");
	if (MySet.end() != ite)
	{
		cout << "\nI find :" << *ite << endl;
	}
	return 0;
}
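
The listing above leaves the custom hasher and equality predicate commented out. The sketch below shows one way they could be wired in, together with an explicit bucket count and the bucket interface. The names `StrHash`, `StrEqual`, `NameSet`, `make_names`, `total_in_buckets`, and `demo_names` are hypothetical. The hash formula follows the `Hash` struct above, and the requested bucket count of 100 is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>

// Same multiplicative hash as the Hash struct in the listing.
struct StrHash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = 5 * h + c;
        return h;
    }
};

// key_equal must be an equality test, not an ordering.
struct StrEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        return a == b;
    }
};

typedef std::unordered_set<std::string, StrHash, StrEqual> NameSet;

// Build a set that starts with at least 100 buckets.
inline NameSet make_names() {
    const std::string names[] = {"zhang", "li", "sun", "zhao", "wang", "wu", "zheng"};
    NameSet s(100, StrHash(), StrEqual());
    s.insert(names, names + 7);
    return s;
}

// Sum of bucket_size over all buckets; this always equals size().
inline std::size_t total_in_buckets(const NameSet& s) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < s.bucket_count(); ++i) total += s.bucket_size(i);
    return total;
}

// Usage sketch: print the bucket statistics of the set.
inline void demo_names() {
    NameSet s = make_names();
    assert(total_in_buckets(s) == s.size());
    std::cout << "buckets: " << s.bucket_count()
              << ", load factor: " << s.load_factor()
              << ", bucket of \"wu\": " << s.bucket("wu") << '\n';
}
```

The exact bucket count and the bucket index of each key depend on the standard library implementation, so the printed numbers will differ between compilers. The standard only guarantees at least the requested number of buckets.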