Hash Tables

f2016913

License: CC 4.0 BY-SA

Category: Data Structures. Tags: hash, hash buckets, linear probing

Article link: https://blog.youkuaiyun.com/f2016913/article/details/70849292


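I. A HashTable (hash table, also called a scatter table) is a data structure that reaches a record's storage location in memory directly from its key.
A function of the key maps each record to a position in the table, and the record is accessed at that position. This mapping function is called the hash function, and the array that stores the records is called the hash table.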
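Common ways to build the hash function:
1: Direct addressing: take a linear function of the key as the hash address, hash(key) = key or hash(key) = key*A + B;
2: Division method: take the remainder of the key divided by some number no larger than the table length m as the hash address; in the simplest case hash(key) = key % m;
3: Mid-square method;
4: Folding method;
5: Random number method;
6: Digit analysis method;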
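Direct addressing (efficient, but only for a limited key range):
Its drawback is that the set of keys actually stored may be much smaller than the range reserved for them, which wastes memory.
Division method:
Different keys may end up at the same table position after hash(key) is applied; this is called a hash collision. No hash function that maps a large key space into a smaller table can avoid collisions entirely.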
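The two hash functions described above, and what a collision looks like, can be sketched in a few lines. The function names `direct_address`, `division_hash`, and `collides` are made up for this illustration and do not appear in the article's code.

```cpp
#include <cassert>
#include <cstddef>

// Direct addressing: a linear function of the key, hash(key) = key * a + b.
// a and b are illustrative parameters, not values taken from the article.
std::size_t direct_address(std::size_t key, std::size_t a, std::size_t b) {
    return key * a + b;
}

// Division method: hash(key) = key % m, where m is the table length.
std::size_t division_hash(std::size_t key, std::size_t m) {
    assert(m > 0);  // a table of length 0 has no valid slot
    return key % m;
}

// Two different keys collide when they map to the same slot.
bool collides(std::size_t k1, std::size_t k2, std::size_t m) {
    return k1 != k2 && division_hash(k1, m) == division_hash(k2, m);
}
```

With a table length of 53, the keys 7 and 60 both map to slot 7, so `collides(7, 60, 53)` is true, while 9 and 18 map to different slots.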
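II. Resolving hash collisions: closed hashing (open addressing)
1: Linear probing: on a collision, check the following slots one by one (index+1, index+2, ...) until a free slot is found.
2: Quadratic probing: on a collision, check the slots at offsets 1, 4, 9, ... (i*i on the i-th attempt) from the home slot.
Drawback of linear probing: as more and more colliding elements accumulate, they form long runs of occupied slots and lookups become slower and slower.
Quadratic probing spreads the colliding elements further apart, and the table also needs more space to hold them.

Growing the table is not simply a matter of doubling it: if the new table is just twice the size of the old one, the same clustering problem shows up again,
so the new capacity is taken from a table of primes and used as the divisor, which reduces the number of collisions.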
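To make the difference between the two probing strategies concrete, here is a minimal sketch that lists the slots each one visits. The helper names `linear_probes` and `quadratic_probes` are hypothetical, and the quadratic variant uses the plain i*i offset, which is only one of several common formulations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Slots visited by linear probing: home, home+1, home+2, ... (mod m).
std::vector<std::size_t> linear_probes(std::size_t home, std::size_t m,
                                       std::size_t count) {
    assert(m > 0);
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < count; ++i)
        slots.push_back((home + i) % m);
    return slots;
}

// Slots visited by quadratic probing: home, home+1, home+4, home+9, ... (mod m).
std::vector<std::size_t> quadratic_probes(std::size_t home, std::size_t m,
                                          std::size_t count) {
    assert(m > 0);
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < count; ++i)
        slots.push_back((home + i * i) % m);
    return slots;
}
```

For a home slot of 51 in a table of length 53, linear probing visits 51, 52, 0, 1, while quadratic probing visits 51, 52, 2, 7, so keys that collide at the same home slot are scattered more widely.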
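3: Prime table (using a prime as the divisor can reduce hash collisions)
// Use a table of primes as the hash-table capacities to reduce hash collisions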

const int _PrimeSize= 28;
static const unsigned long _PrimeList[_PrimeSize] =
{
53ul, 97ul, 193ul, 389ul, 769ul,
1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
49157ul, 98317ul, 196613ul, 393241ul,
786433ul,
1572869ul, 3145739ul, 6291469ul, 12582917ul,
25165843ul,
50331653ul, 100663319ul, 201326611ul, 402653189ul,
805306457ul,
1610612741ul, 3221225473ul, 4294967291ul
};

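4: For open addressing, the factor that drives collisions is the load factor:
Definition: load factor a = number of elements stored in the table / length of the table;
a indicates how full the table is. For a fixed table length, a is proportional to the number of stored elements, so the larger a is, the more elements the table holds and the more likely collisions become.
The load factor matters a great deal for open addressing and should be kept at or below roughly 0.7 to 0.8;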
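As a small illustration of the definition above, the load factor and a growth check can be written as follows. The names `load_factor` and `needs_growth` are assumptions made for this sketch, and the 0.7 threshold is just the lower end of the range mentioned in the text.

```cpp
#include <cassert>
#include <cstddef>

// Load factor = number of stored elements / table length.
double load_factor(std::size_t count, std::size_t capacity) {
    assert(capacity > 0);
    return static_cast<double>(count) / static_cast<double>(capacity);
}

// Integer-only test for "load factor >= 0.7"; avoids floating point.
bool needs_growth(std::size_t count, std::size_t capacity) {
    assert(capacity > 0);
    return count * 10 >= capacity * 7;
}
```

For a table of length 53, `needs_growth` is false with 37 elements (370 < 371) and becomes true at 38 elements (380 >= 371).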
III. Implementing open addressing:

#include<iostream>
#include<vector>
#include<string>
using namespace std;
namespace first
{
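    enum State
    {
        EMPTY,  // slot has never held an element
        EXIT,   // slot holds a live element (it "exists")
        DELETE, // slot held an element that has been removed
    };

    template<class K, class V>
    struct  HashNode
    {
        pair<K, V>_kv;
        State _s;// state of this slot
    };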
    // Functor: default hash for keys that convert to size_t
    template<class K>
    struct __HashFunc
    {
        size_t operator()(const K&key)
        {
            return key;
        }
    };
    // String hash (BKDR hash)
    template<>
    struct __HashFunc<string>
    {
        static size_t BKDRHash(const char* str)
        {
            unsigned int seed = 131;// 31 131 1313 13131 131313
            unsigned int hash = 0;
            while (*str)
            {
                hash = hash*seed + (*str++);
            }
            return(hash & 0x7FFFFFFF);
        }
        size_t operator()(const string &s)
        {
            return BKDRHash(s.c_str());
        }
    };
    template<class K, class V, class HashFunc = __HashFunc<K>>
    class HashTable
    {
        typedef HashNode<K, V> Node;
    public:
        HashTable()
            :_size(0)
        {}


        HashTable(size_t n)
            :_size(0)
        {
            _tables.resize(n);
        }
        // Insert
        pair<Node*, bool> Insert(const pair<K, V> &kv)
        {
            _Check();
            size_t index = _HashFunc(kv.first);// home slot via the division method

            while (_tables[index]._s == EXIT)// slot occupied: probe linearly
            {
                if (_tables[index]._kv.first == kv.first)
                {
                    return make_pair(&_tables[index], false);// key is already in the table
                }
                index += 1;

                if (index == _tables.size())
                {
                    index = 0;// wrap around to the start of the table
                }
            }
            // Reached a free slot (EMPTY or DELETE): store the element here
            _tables[index]._kv = kv;
            _tables[index]._s = EXIT;// mark the slot as occupied
            ++_size;

            return make_pair(&_tables[index], true);// inserted
        }

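        // Find
        Node*Find(const K&key)
        {
            if (_tables.empty())
                return NULL;// empty table: avoids "% 0" in _HashFunc

            size_t index = _HashFunc(key);
            while (_tables[index]._s != EMPTY)// an EMPTY slot ends the probe sequence
            {
                if (_tables[index]._kv.first == key)
                {
                    if (_tables[index]._s == EXIT)
                    {
                        return &_tables[index];
                    }
                    else
                        return NULL;// key was here but has been removed
                }
                index += 1;
                if (index == _tables.size())// was "_index", an undeclared name
                {
                    index = 0;// wrap around to the start of the table
                }
            }
            return NULL;
        }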
        // Remove (lazy deletion: the slot is only marked DELETE)
        bool Remove(const K&key)
        {
            Node*node = Find(key);
            if (node)
            {
                node->_s = DELETE;
                --_size;
                return true;
            }
            else
                return false;
        }

        void Swap(HashTable<K, V, HashFunc>&hf)
        {
            swap(_size, hf._size);
            _tables.swap(hf._tables);
        }
        // operator[] lets the table be used like a dictionary
        V&operator[](const K&key)
        {
            pair<Node*, bool>ret = Insert(make_pair(key, V()));
            return ((ret.first)->_kv).second;
        }
    protected:
        // Compute the home slot with the division method
        size_t _HashFunc(const K&key)
        {
            HashFunc hf;
            return hf(key) % _tables.size();
        }
        // Prime table: returns the first prime capacity larger than num
        size_t GetNextPeime(size_t num)
        {
            const int _PrimeSize = 28;
            static const unsigned long _PrimeList[_PrimeSize] =
            {
                53ul, 97ul, 193ul, 389ul, 769ul,
                1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
                49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
                1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
                50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
                1610612741ul, 3221225473ul, 4294967291ul
            };

            for (size_t i = 0; i < _PrimeSize; ++i)
            {
                if (_PrimeList[i] > num)
                {
                    return _PrimeList[i];
                }
            }

            return _PrimeList[_PrimeSize - 1];
        }
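        // Check whether the table needs to grow
        void _Check()
        {
            if (_tables.size() == 0)
            {
                _tables.resize(GetNextPeime(0));
            }
            // Load factor check: grow once it reaches 0.7 (">=" rather than "==" so the threshold cannot be skipped)
            if (_size * 10 / _tables.size() >= 7)
            {
                size_t NewSize = GetNextPeime(_tables.size());// size of the new table
                HashTable<K, V, HashFunc>NewHf(NewSize);
                // Re-insert the live elements of the old table into the new one
                for (size_t i = 0; i < _tables.size(); ++i)
                {
                    if (_tables[i]._s == EXIT)
                    {
                        NewHf.Insert(_tables[i]._kv);
                    }
                }
                Swap(NewHf);
            }
        }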
    protected:
        vector<Node>_tables;
        size_t _size;// number of elements actually stored in the table

    };
    void HashTables()
    {
        int a[] = { 9, 18, 60, 29, 58 };

        HashTable<int, int> ht;
        for (size_t i = 0; i < sizeof(a) / sizeof(a[0]); ++i)
        {
            ht.Insert(make_pair(a[i], 1));
        }
        ht.Insert(make_pair(5, 1));
        ht.Insert(make_pair(6, 1));
        ht.Insert(make_pair(8, 1));
        ht.Insert(make_pair(30, 1));
        ht.Insert(make_pair(42, 1));

        HashTable<string, string> dict;
        dict["insert"] = "插入";
        dict["erase"] = "删除";
        dict["find"] = "查找";
    }
}
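The DELETE state in the code above exists because an open-addressing table cannot simply reset a removed slot to EMPTY: a later key in the same probe sequence would become unreachable. The following stripped-down sketch shows the idea in isolation. `TinySet` and `Slot` are hypothetical names, the table has a fixed capacity with no growth, and keys are assumed to be non-negative ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Slot { Empty, Full, Deleted };

// Fixed-capacity set of non-negative ints using linear probing (no resizing).
class TinySet {
public:
    explicit TinySet(std::size_t capacity)
        : keys_(capacity), state_(capacity, Slot::Empty) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (contains(key)) return false;
        std::size_t i = home(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (state_[i] != Slot::Full) {  // Empty and Deleted slots are reusable
                keys_[i] = key;
                state_[i] = Slot::Full;
                return true;
            }
            i = (i + 1) % keys_.size();
        }
        return false;
    }

    bool contains(int key) const { return find(key) != keys_.size(); }

    // Lazy deletion: mark the slot Deleted instead of Empty.
    bool erase(int key) {
        std::size_t i = find(key);
        if (i == keys_.size()) return false;
        state_[i] = Slot::Deleted;
        return true;
    }

private:
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % keys_.size();
    }

    // Returns the index of key, or keys_.size() when it is absent.
    std::size_t find(int key) const {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (state_[i] == Slot::Empty) break;  // only a never-used slot ends the search
            if (state_[i] == Slot::Full && keys_[i] == key) return i;
            i = (i + 1) % keys_.size();
        }
        return keys_.size();
    }

    std::vector<int> keys_;
    std::vector<Slot> state_;
};
```

With a capacity of 7, the keys 1, 8, and 15 all share home slot 1 and occupy slots 1, 2, and 3. After erasing 8, a search for 15 still passes over the Deleted slot 2 and finds it in slot 3; had slot 2 been reset to Empty, the search would have stopped early.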

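IV. Implementing chaining (open hashing):
As noted above, once the load factor of a closed-hashing table exceeds 0.8, lookups become very inefficient. How can this be solved?
A hash-bucket table is a container of linked lists: each table position can be seen as a bucket that holds the linked list of keys hashing to that position (the bucket array is a vector, which can grow dynamically).

Chaining (hash buckets)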
namespace second
{
    // Node of a hash bucket (singly linked list)
    template<class K,class V>
    struct HashNode
    {
        pair<K, V> _kv;
        HashNode<K, V>*_next;

        HashNode(const pair<K, V>&kv)
            :_kv(kv)
            ,_next(NULL)
        {}
    };
     template<class K,class V,class HashFunc>
     class HashTable;
     template<class K, class V, class HashFunc = first::__HashFunc<K>>
    class HashTable
    {
        typedef HashNode<K, V>Node;
    public:
        HashTable()
            :_size(0)
        {}

        void Resize(size_t n)
        {
            _Check(n);
        }
        // Copy constructor
        HashTable(const HashTable<K, V, HashFunc>&ht)
            :_size(0)
        {
            _tables.resize(ht._tables.size());// allocate as many buckets as the source table
            for (size_t i = 0; i <ht._tables.size(); ++i)// copy the data of every node
            {
                Node*cur = ht._tables[i];
                while (cur)
                {
                    Insert(cur->_kv);
                    cur = cur->_next;
                }
            }
        }
        // Assignment operator (copy-and-swap: ht is passed by value)
        HashTable<K, V, HashFunc>& operator =(HashTable<K, V, HashFunc> ht)
        {
            swap(_size, ht._size);
            _tables.swap(ht._tables);
            return *this;
        }
        ~HashTable()
        {
            _Clear();
        }

        // Insert
        pair<Node*, bool> Insert(const pair<K, V>&kv)
        {
            _Check();
            size_t index = _HashFunc(kv.first,_tables.size());
            Node*cur = _tables[index];

            while (cur)
            {
                if (cur->_kv.first == kv.first)
                    return make_pair(cur, false);// key is already in the table
                cur = cur->_next;
            }
            // Key not found: push a new node onto the front of the bucket
            Node*tmp = new Node(kv);
            tmp->_next = _tables[index];
            _tables[index] = tmp;

            ++_size;
            return make_pair(tmp, true);// the original fell off the end without returning
        }
        // Find
        Node*Find(const K&key)
        {
            if (_tables.empty())
                return NULL;// empty table: avoids "% 0" in _HashFunc

            size_t index = _HashFunc(key, _tables.size());// bucket index
            Node*cur = _tables[index];
            while (cur)
            {
                if (cur->_kv.first == key)
                {
                    return cur;
                }

                cur = cur->_next;
            }
            return NULL;// not found
        }
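        // Remove
        bool Remove(const K&key)
        {
            if (_tables.empty())
                return false;// empty table: nothing to remove

            size_t index = _HashFunc(key, _tables.size());// bucket index
            Node*prev = NULL;
            Node*cur = _tables[index];
            while (cur)
            {
                if (cur->_kv.first == key)// found the element to remove
                {
                    if (prev == NULL)
                    {
                        _tables[index] = cur->_next;// removing the head: the bucket now starts at the next node
                    }
                    else
                        prev->_next = cur->_next;

                    delete cur;
                    --_size;
                    return true;// stop here: cur must not be touched after delete
                }
                prev = cur;
                cur = cur->_next;
            }
            return false;// no node with this key
        }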

    protected:
        size_t _HashFunc(const K&key, size_t size)
        {
                 //HashFunc hf;
                 return HashFunc()(key) % size;
        }

        // Grow when the load factor reaches 1 or when more than the current number of buckets is requested
        void _Check(size_t  n =0)
        {
            if (_size == _tables.size() || n > _tables.size())
            {
                if (_tables.size() > n)
                {
                    n = _tables.size();
                }

                vector<Node*> NewTables;
                NewTables.resize(GetNextPeime(n));// create the new table

                for (size_t i =0; i < _tables.size(); ++i)
                {
                    Node*cur = _tables[i];// unlink nodes and relink them into the new table
                    while (cur)
                    {
                        Node*next = cur->_next;
                        size_t index = _HashFunc(cur->_kv.first, NewTables.size());
                        cur->_next = NewTables[index];
                        NewTables[index] = cur;

                        cur = next;// move on to the next node
                    }
                    _tables[i] = NULL;// the old bucket no longer owns any node
                }
                _tables.swap(NewTables);
            }
        }

        size_t GetNextPeime(size_t num)
        {
            const int _PrimeSize = 28;
            static const unsigned long _PrimeList[_PrimeSize] =
            {
                53ul, 97ul, 193ul, 389ul, 769ul,
                1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
                49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
                1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
                50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
                1610612741ul, 3221225473ul, 4294967291ul
            };

            for (size_t i = 0; i < _PrimeSize; ++i)
            {
                if (_PrimeList[i] > num)
                {
                    return _PrimeList[i];
                }
            }

            return _PrimeList[_PrimeSize - 1];
        }

        void _Clear()
        {
            for (size_t i = 0; i < _tables.size(); ++i)
            {
                Node*cur = _tables[i];// each array slot stores the head node of a chain
                while (cur)
                {
                    Node*next = cur->_next;
                    delete cur;
                    cur = next;
                }
                _tables[i] = NULL;
            }
            _size = 0;
        }
    protected:
        vector<Node*>_tables;
        size_t _size;// number of elements actually stored in the table
    };

    void TestHashTable()
    {
        int a1[] = { 51, 105, 52, 3, 55, 2, 106, 53, 0 };
        HashTable<int, int> ht1;


        for (size_t i = 0; i < sizeof(a1) / sizeof(a1[0]); ++i)
        {
            ht1.Insert(make_pair(a1[i], i));
        }
    }
}

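Here namespaces let both implementations use the same class names, and the second one reuses first::__HashFunc. The defining feature of chaining is that nodes hang off the table: the address of each chain's head node is stored in the table, which makes it easy to traverse the chains and to insert and delete nodes.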
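Differences:
1: Closed hashing computes a position with the division method and then inserts, searches, or deletes at that position, checking the state of each slot along the way; chaining operates on the nodes hanging off the vector.
2: Growth: closed hashing allocates a larger table (in the code above, the next prime from the prime table rather than exactly twice the old size), recomputes every position with the division method, re-inserts the elements of the old table into the new one, and frees the old table (limiting factor: the load factor should not exceed 0.8)
Chaining allocates a new table, unlinks each node from the old table, relinks it at its recomputed position in the new table, and then frees the old table. The load factor may exceed 0.8 (limiting factor: lookups slow down when many nodes hang off one bucket)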
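The same growth behavior can be observed through the bucket interface of the standard library's `std::unordered_map`, which common implementations build with separate chaining. The sketch below only uses `load_factor()`, `max_load_factor()`, `rehash()`, and `bucket_count()`; the wrapper functions `stays_within_max_load` and `buckets_after_rehash` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Insert n consecutive int keys and report whether the container kept its
// load factor at or below max_load_factor() by growing its bucket array.
bool stays_within_max_load(std::size_t n) {
    std::unordered_map<int, int> m;
    for (std::size_t i = 0; i < n; ++i)
        m[static_cast<int>(i)] = 1;
    return m.load_factor() <= m.max_load_factor();
}

// Number of buckets after requesting at least `requested` buckets.
std::size_t buckets_after_rehash(std::size_t requested) {
    std::unordered_map<int, int> m;
    m.rehash(requested);  // the bucket count ends up >= requested
    return m.bucket_count();
}
```

The default `max_load_factor()` is 1.0, which matches the chaining code above, where the table grows only once the element count equals the number of buckets.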
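V. Summary:
1: Direct addressing: for keys in a known, not very large range (it does not work for strings); time complexity is O(1);
2: Division method: hash(key) = key % len (the table length)
Different values can be mapped to the same position by the hash function, so collisions have to be handled (closed hashing, i.e. open addressing)
3: Linear probing: finds a free position (runs of colliding elements can interfere with each other and hurt efficiency)
4: Quadratic probing: reduces the clustering caused by linear probing (the load factor should not exceed 0.8)
5: Chaining: avoids the probing collisions and the poor space utilization of open addressing (drawback: a bucket must not hold too many nodes or lookups slow down; remedy: a bucket can hold a red-black tree instead of a list, which greatly improves lookup efficiency)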
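The red-black tree remedy mentioned in point 5 can be sketched by letting each bucket be a `std::map`, which is typically (though not required by the standard to be) implemented as a red-black tree. `TreeBucketTable` is a hypothetical name, the bucket count is fixed, and keys and values are plain ints to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Chained hash table whose buckets are ordered maps instead of linked lists,
// so a lookup inside a crowded bucket takes O(log n) instead of O(n).
class TreeBucketTable {
public:
    explicit TreeBucketTable(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Returns true if the key was newly inserted, false if it already existed.
    bool insert(int key, int value) {
        return buckets_[index(key)].emplace(key, value).second;
    }

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    const int* find(int key) const {
        const std::map<int, int>& bucket = buckets_[index(key)];
        std::map<int, int>::const_iterator it = bucket.find(key);
        return it == bucket.end() ? nullptr : &it->second;
    }

    // Returns true if an element with this key was removed.
    bool erase(int key) { return buckets_[index(key)].erase(key) == 1; }

private:
    std::size_t index(int key) const {
        return std::hash<int>()(key) % buckets_.size();
    }

    std::vector<std::map<int, int> > buckets_;
};
```

Even in the worst case, where every key lands in the same bucket (for example a table built with a single bucket), lookups stay logarithmic in the number of stored elements.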