#Study Notes# Concept of Hashing

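This article takes a close look at hash functions and their use in hash tables, showing how hashing speeds up search compared with conventional search algorithms. It explains how a hash table is created and the steps for applying a hash function, presents how Java's hashCode() method is implemented along with examples, and discusses collision resolution and the advantages of hash tables.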


http://www.cs.cmu.edu/~adamchik/15-121/lectures/Hashing/hashing.html

Introduction

How can we speed up searching? Searching an unsorted array requires scanning the whole array, which is O(n). If the array is sorted, we can use binary search, which is O(log n).
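To make the two baselines concrete, here is a minimal sketch in Java. The class name SearchDemo and the sample data are made up for illustration; Arrays.sort and Arrays.binarySearch are the standard java.util.Arrays methods.

```java
import java.util.Arrays;

class SearchDemo {
    // Linear scan: may have to look at every element, O(n).
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i;
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] unsorted = {42, 7, 19, 3, 25};
        System.out.println(linearSearch(unsorted, 19)); // 2

        int[] sorted = unsorted.clone();
        Arrays.sort(sorted); // {3, 7, 19, 25, 42}
        // Binary search halves the range at each step, O(log n),
        // but it only works on a sorted array.
        System.out.println(Arrays.binarySearch(sorted, 19)); // 2
    }
}
```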

If we knew in advance where in the array (at which index) the value is stored, we could find it in O(1) time. A hash function is one way to achieve this.

  • A hash function is a function that, given a key, generates an address in the table.



A hash function has the following properties:

  • It always returns a number for an object.
  • Two equal objects always have the same number.
  • Two unequal objects do not always have different numbers.
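The first two properties can be checked on a small user-defined class. The sketch below is only an illustration: the Point class is hypothetical, and it relies on the standard Objects.hash helper so that hashCode() stays consistent with equals().

```java
import java.util.Objects;

class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields that equals() compares,
    // so equal points always get the same number.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2); // a different object with equal content
        System.out.println(p.equals(q));                  // true
        System.out.println(p.hashCode() == q.hashCode()); // true
    }
}
```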

The procedure for using a hash function:

Create an array of size M. Choose a hash function h, that is, a mapping from objects into the integers 0, 1, 2, ... , M - 1. Put the objects into the array at the indexes computed via the hash function, index = h(object). Such an array is called a hash table.
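The procedure above can be sketched as follows. This is a deliberately simplified illustration, not a complete hash table: the class SimpleHashTable and the choice M = 11 are assumptions, h is built from hashCode() reduced with Math.floorMod, and collisions are ignored (a later key simply overwrites an earlier one at the same index).

```java
class SimpleHashTable {
    private final String[] table;

    // Step 1: create an array of size M.
    SimpleHashTable(int m) {
        table = new String[m];
    }

    // Step 2: h maps an object to one of 0 .. M-1.
    // Math.floorMod never returns a negative index, even for a negative hashCode().
    int h(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Step 3: store the object at index = h(object). Collisions are NOT handled here.
    void put(String key) {
        table[h(key)] = key;
    }

    boolean contains(String key) {
        return key.equals(table[h(key)]);
    }

    public static void main(String[] args) {
        SimpleHashTable t = new SimpleHashTable(11);
        t.put("ABC");
        System.out.println(t.h("ABC"));        // 8, because 64578 mod 11 = 8
        System.out.println(t.contains("ABC")); // true
        System.out.println(t.contains("XYZ")); // false
    }
}
```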



How do we choose a hash function? One option is Java's hashCode() method, which is implemented in the Object class and is therefore inherited by every class in Java. It provides a numeric representation of an object (in contrast to toString(), which provides a text representation).

Example:
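As a minimal sketch of the difference between the two representations (the class name RepresentationDemo is made up for illustration), the following prints both for a plain Object and for an Integer. The number printed for the plain Object varies between runs, so no value is shown for it.

```java
class RepresentationDemo {
    public static void main(String[] args) {
        Object obj = new Object();
        // Numeric representation: an int chosen by the JVM for this object.
        System.out.println(obj.hashCode());
        // Text representation: class name, '@', and the hash code in hexadecimal.
        System.out.println(obj.toString());

        Integer n = 42;
        System.out.println(n.hashCode()); // 42: an Integer hashes to its own value
        System.out.println(n.toString()); // 42 (as text)
    }
}
```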


The hashCode() method is implemented differently for different classes. For String, it is computed as follows:

s.charAt(0) * 31^(n-1) + s.charAt(1) * 31^(n-2) + ... + s.charAt(n-1)

where s is a string and n is its length. Example:

"ABC" = 'A' * 31^2 + 'B' * 31 + 'C' = 65 * 31^2 + 66 * 31 + 67 = 64578
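The formula can be reproduced by hand and compared with the built-in method. The sketch below uses Horner's rule, which is one common way to evaluate such a polynomial; the class StringHashDemo and the method polyHash are hypothetical names, not part of the Java library.

```java
class StringHashDemo {
    // Horner's rule: (((s[0]) * 31 + s[1]) * 31 + ...) + s[n-1],
    // which expands to s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ABC"));   // 64578
        System.out.println("ABC".hashCode());  // 64578
    }
}
```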

Note that Java's hashCode() method might return a negative integer. If a string is long enough, the value of the formula exceeds the largest value that fits in a 32-bit signed int (Integer.MAX_VALUE; Java's int is 32 bits regardless of the CPU). In that case, because of integer overflow, the value returned by hashCode() can be negative.
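A six-character string is already enough to see the overflow. In this sketch the class NegativeHashDemo and the helper exactHash are illustrative names; exactHash evaluates the same formula in 64-bit arithmetic so the unwrapped value can be compared with what hashCode() returns. Using Math.floorMod to turn a possibly negative hash code into a table index is one common choice, assumed here with a table size of 11.

```java
class NegativeHashDemo {
    // Same formula as String.hashCode(), but in long arithmetic,
    // so a short string like the one below does not overflow.
    static long exactHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "aaaaaa";
        System.out.println(exactHash(s));      // 2869595232, above Integer.MAX_VALUE (2147483647)
        System.out.println(s.hashCode());      // -1425372064 after wrapping around to 32 bits

        // A negative hash code cannot be used as an array index directly;
        // Math.floorMod always yields a value in 0 .. 10 here.
        System.out.println(Math.floorMod(s.hashCode(), 11));
    }
}
```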

Review the code in HashCodeDemo.java.


Collisions

When we put objects into a hash table, it is possible that different objects (as determined by the equals() method) have the same hash code. This is called a collision. Here is an example of a collision. Two different strings, "Aa" and "BB", have the same hash code:
"Aa" = 'A' * 31 + 'a' = 2112
"BB" = 'B' * 31 + 'B' = 2112
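The collision is easy to confirm in code. The short sketch below (the class name CollisionDemo is made up) also shows a consequence of the String formula: because "Aa" and "BB" have the same length and the same hash code, strings built by concatenating them collide as well.

```java
class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false: unequal objects, same hash code

        // Concatenations of colliding blocks of equal length collide too.
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // true
    }
}
```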

How do we resolve collisions?
One approach is to keep the colliding keys in a linked list, which is known as separate chaining collision resolution.
The advantage of a hash table is that the basic operations add, remove, contains, and size run in constant time. Because of collisions, however, we cannot guarantee constant running time in the worst case. Imagine that all objects collide into the same index: searching the hash table is then the same as searching a linked list, which takes linear time. What we can guarantee is expected constant running time.
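Separate chaining can be sketched as an array of linked lists, one list per index. This is a minimal illustration under stated assumptions rather than a production implementation: the class ChainedHashSet is hypothetical, it stores only String keys, uses a fixed number of buckets, and never resizes.

```java
import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<String>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Colliding keys end up in the same chain instead of overwriting each other.
    boolean add(String key) {
        LinkedList<String> chain = buckets[index(key)];
        if (chain.contains(key)) return false; // already present
        chain.add(key);
        size++;
        return true;
    }

    // Only one chain is scanned; the scan is linear in that chain's length.
    boolean contains(String key) {
        return buckets[index(key)].contains(key);
    }

    boolean remove(String key) {
        boolean removed = buckets[index(key)].remove(key);
        if (removed) size--;
        return removed;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(11);
        set.add("Aa");
        set.add("BB"); // same hash code as "Aa", so it joins the same chain
        System.out.println(set.contains("Aa")); // true
        System.out.println(set.contains("BB")); // true
        System.out.println(set.size());         // 2
    }
}
```

If every key landed in one chain, contains would degrade to a linear scan of that single list, which is the worst case described above.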


=========TO BE CONTINUED...

