Algorithms_6_HashTable

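This article explores the basic concept of hash tables and how they differ from direct-address arrays. It describes how hash tables reduce storage requirements, and explains several common ways to construct hash functions, including the division method and the multiplication method. It also discusses how to handle hash collisions and how to choose a suitable hash function.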

A hash table is a generalization of the simpler notion of an ordinary array. Directly addressing into an ordinary array makes effective use of our ability to examine an arbitrary position in an array in O(1) time. Direct addressing is applicable when we can afford to allocate an array that has one position for every possible key.

  When the number of keys actually stored is small relative to the total number of possible keys, hash tables become an effective alternative to directly addressing an array, since a hash table typically uses an array of size proportional to the number of keys actually stored.

  Instead of using the key as an array index directly, the array index is computed from the key.

  Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.
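
  As a rough illustration of direct addressing, the sketch below allocates one slot per possible key. The class name `DirectAddressTable` and the parameter `universe_size` are hypothetical names chosen for this example, and keys are assumed to be integers in the range 0 to universe_size - 1.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in range(universe_size)."""

    def __init__(self, universe_size):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        # The key itself is the array index, so this takes O(1) time.
        self.slots[key] = value

    def search(self, key):
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
print(table.search(3))  # three
print(table.search(4))  # None
```

  Every operation is a single array access, but the array needs as many slots as there are possible keys, no matter how few are actually stored.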

  When the set K of keys stored in a dictionary is much smaller than the universe U of all possible keys, a hash table requires much less storage than a direct-address table.

  With direct addressing, an element with key k is stored in slot k. With hashing, the element is stored in slot h(k), and we say that h(k) is the hash value of key k. There is one hitch: two keys may hash to the same slot. We call this situation a collision. In chaining, we put all the elements that hash to the same slot into a linked list.
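
  Collision resolution by chaining could be sketched as follows. This is a minimal example under a few assumptions: keys are non-negative integers, the hash function is simply k mod m, and a Python list stands in for each linked list. The class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; assumes non-negative integer keys."""

    def __init__(self, m=8):
        self.m = m
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function for this sketch: h(k) = k mod m.
        return key % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]


table = ChainedHashTable(m=8)
table.insert(1, "a")
table.insert(9, "b")      # 9 mod 8 == 1, so this collides with key 1
print(table.search(9))    # b
print(len(table.slots[1]))  # 2
```

  Both colliding keys live in the chain of slot 1, so a search only has to scan that one chain.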

  A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to.

  The division method computes the hash value as the remainder when the key is divided by a specified prime number. This method frequently gives good results, assuming that the prime number is chosen to be unrelated to any patterns in the distribution of keys.

 In the division method for creating hash functions, we map a key k into one of m slots by taking the remainder of k divided by m. That is, h(k) = k mod m. For example, if the table has size m = 12 and the key is k = 100, then h(k) = 4.

 When using the division method, we usually avoid certain values of m. For example, m should not be a power of 2: if m = 2^p, then h(k) is just the p lowest-order bits of k, so the higher-order bits of the key are ignored.

 A prime not too close to an exact power of 2 is often a good choice for m.
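
 The division method and the effect of the choice of m can be tried out with a short sketch like the one below. The function name `division_hash` and the sample keys are made up for illustration; the keys are deliberately chosen to differ only in their higher-order bits.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


print(division_hash(100, 12))  # 4

# Keys that differ only above the low 4 bits.
keys = [0x10, 0x20, 0x30, 0x40]

# m = 16 is a power of 2, so only the low 4 bits matter: all keys collide.
print([division_hash(k, 16) for k in keys])  # [0, 0, 0, 0]

# m = 13 is prime: the same keys spread over different slots.
print([division_hash(k, 13) for k in keys])  # [3, 6, 9, 12]
```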

 The multiplication method for creating hash functions operates in 2 steps. First, we multiply the key k by a constant A in the range 0<A<1 and extract the fractional part of kA. Then, we multiply this value by m and take the floor of the result.

 h(k) = ⌊m (kA mod 1)⌋

 kA mod 1 means the fractional part of kA

 An advantage of the multiplication method is that the value of m is not critical. We typically choose it to be a power of 2 (m = 2^p for some integer p), since the function can then be implemented on most computers with one integer multiplication and a bit shift.
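
 A possible sketch of the multiplication method is shown below, once with floating-point arithmetic and once with integer arithmetic for m = 2^p. Several things here are assumptions rather than part of the text above: the constant A = (√5 - 1)/2, which is a commonly suggested value, the word size w = 32, and the function names.

```python
import math

# Assumed constant for this sketch; any A with 0 < A < 1 fits the definition.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k, m, a=A):
    """h(k) = floor(m * (k*a mod 1)), computed with floating point."""
    return math.floor(m * ((k * a) % 1))


def multiplication_hash_shift(k, p, w=32, a=A):
    """Same idea for m = 2**p, using only integer operations.

    s approximates a as a w-bit fixed-point fraction. The low w bits of
    k*s hold the fractional part of k*a, and their top p bits are the hash.
    """
    s = math.floor(a * 2**w)
    low_word = (k * s) % 2**w
    return low_word >> (w - p)


print(multiplication_hash_shift(123456, p=14))  # 67
print(multiplication_hash(123456, m=2**14))     # 67
```

 The two versions agree on this key. In general they may differ slightly, because the integer version truncates A to w bits.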

 The main idea behind universal hashing is to select the hash function at random from a carefully designed class of functions at the beginning of execution.

 We can design a universal class of hash functions by choosing a prime number p large enough that every possible key k is in the range 0 to p-1, inclusive. Let Z1 denote the set {0, 1, 2, ..., p-1} and let Z2 denote the set {1, 2, 3, ..., p-1}. Because we assume that the size of the universe of keys is greater than the number of slots in the hash table, we have p > m.

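 For any a in Z2 and any b in Z1, define

 h_{a,b}(k) = ((ak + b) mod p) mod m

 For example, with p = 17 and m = 6:

 h_{3,4}(8) = ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5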
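
 The family of functions ((ak + b) mod p) mod m can be sketched in a few lines. The function names `universal_hash` and `random_hash_function` are hypothetical, and drawing a and b with Python's `random` module is just one way to make the random choice.

```python
import random


def universal_hash(k, a, b, p, m):
    """((a*k + b) mod p) mod m, with 1 <= a <= p-1 and 0 <= b <= p-1."""
    return ((a * k + b) % p) % m


def random_hash_function(p, m, rng=random):
    """Pick a and b at random once, then return the resulting function."""
    a = rng.randrange(1, p)  # a is drawn from {1, ..., p-1}
    b = rng.randrange(0, p)  # b is drawn from {0, ..., p-1}
    return lambda k: universal_hash(k, a, b, p, m)


print(universal_hash(8, a=3, b=4, p=17, m=6))  # 5

h = random_hash_function(p=17, m=6)
print(all(0 <= h(k) < 6 for k in range(17)))  # True
```

 The random choice is made once, when the table is created. After that the same function must be used for every operation on that table.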

Perfect hashing: The basic idea behind a perfect hashing scheme is simple. We use a two-level hashing scheme with universal hashing at each level.
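
One way the two-level idea could be sketched is shown below. The details go beyond the text above and are assumptions of this example: the key set is static, non-empty, and consists of distinct integers smaller than the prime p; the first level has as many slots as there are keys; and each second-level table has size equal to the square of its bucket size, with its hash function redrawn until the bucket has no collisions. All function names and the sample keys are hypothetical.

```python
import random


def random_hash(p, m, rng):
    """Draw one function k -> ((a*k + b) mod p) mod m at random."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def build_perfect_table(keys, p, seed=0):
    """Two-level table for a static, non-empty set of distinct keys < p."""
    rng = random.Random(seed)
    m = len(keys)
    h1 = random_hash(p, m, rng)

    # First level: distribute the keys into m buckets.
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[h1(k)].append(k)

    # Second level: one small collision-free table per bucket.
    secondary = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h2 = random_hash(p, size, rng) if size else None
            slots = [None] * size
            collision = False
            for k in bucket:
                j = h2(k)
                if slots[j] is not None:
                    collision = True  # redraw h2 and try again
                    break
                slots[j] = k
            if not collision:
                break
        secondary.append((h2, slots))
    return h1, secondary


def contains(table, k):
    """Membership test: one probe at each level."""
    h1, secondary = table
    h2, slots = secondary[h1(k)]
    return bool(slots) and slots[h2(k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys, p=101)
print(all(contains(table, k) for k in keys))  # True
print(contains(table, 11))                    # False
```

Because every second-level table is free of collisions, a lookup never has to scan a chain.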

 