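The purpose of the hash method in Java's HashMap is to distribute keys more evenly across the hash table
Why does it apply an unsigned right shift of 16 bits to the key's hashCode?
The bucket index is computed as hash & (n - 1), where n is the length of the bucket array. Because n is a power of two, only the low bits of the hash select the bucket. The hash method XORs the high 16 bits into the low 16 bits, so the high bits also influence the index and collisions become less likely.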
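The masking and mixing steps can be illustrated with a minimal sketch. The class name BucketIndexSketch, the helper indexFor, and the sample hash codes are assumptions made up for this illustration and are not part of the JDK; spread applies the same h ^ (h >>> 16) transform to a raw int hash code.

```java
final class BucketIndexSketch {
    // Same transform as the body of HashMap.hash, applied to a raw int hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n; n is assumed to be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // mask is n - 1 = 0b1111, so only the low 4 bits pick the bucket
        // Hypothetical hash codes that differ only in bits above the mask.
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int h : hashCodes) {
            System.out.printf("h=0x%08x  raw index=%d  spread index=%d%n",
                    h, indexFor(h, n), indexFor(spread(h), n));
        }
    }
}
```

With n = 16, the three raw hash codes all land in bucket 0, while the spread values land in buckets 1, 2, and 3.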
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
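This comment comes from Java's HashMap class. It explains why and how the key's hash code is transformed so that hash values are spread more evenly across the table. The points below walk through it: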
Key points
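- Computing and spreading the hash code:
  key.hashCode(): first obtains the key object's raw hash code.
  "spreads (XORs) higher bits of hash to lower": XORs the high bits of the hash code into the low bits, so the information carried by the high bits is mixed into the low bits.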
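- Effect of power-of-two masking:
  "Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide.": HashMap's bucket array always has a power-of-two length, and the bucket index is the hash code ANDed with (n - 1), where n is the length of the bucket array. If a set of hash codes differs only in bits above the current mask (n - 1), all of them map to the same bucket and collide.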
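- Addressing the collisions:
  "So we apply a transform that spreads the impact of higher bits downward.": To address this, HashMap applies a transform that lets the high bits of the hash code affect the low bits, which reduces this kind of systematic collision.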
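- In practice:
  "There is a tradeoff between speed, utility, and quality of bit-spreading.": Speed, utility, and the quality of bit-spreading have to be traded off against each other. Overly elaborate spreading costs performance, while overly simple spreading may not reduce collisions enough.
  "Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.": Many common sets of hash codes are already reasonably distributed and gain nothing from extra spreading. In addition, HashMap uses red-black trees to handle bins with many collisions. HashMap therefore picks the cheapest option, an unsigned right shift by 16 bits followed by an XOR, to reduce systematic collisions and to make sure the highest bits take part in the bucket index calculation.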
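The Float case mentioned in the JDK comment can be checked with a small experiment. This is only a sketch under stated assumptions: the class name FloatKeySpreadSketch, the helper distinctBuckets, the table length of 256, and the key range 1 to 16 are all choices made for illustration, and the code computes bucket indexes by hand rather than inspecting a real HashMap.

```java
import java.util.HashSet;
import java.util.Set;

final class FloatKeySpreadSketch {
    // Same transform as the body of HashMap.hash, applied to a raw int hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many distinct buckets the Float keys 1.0f..maxKey occupy
    // in a table of length n (assumed to be a power of two).
    static int distinctBuckets(int n, int maxKey, boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 1; i <= maxKey; i++) {
            int h = Float.valueOf((float) i).hashCode();
            int hash = useSpread ? spread(h) : h;
            buckets.add((n - 1) & hash);
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        int n = 256; // illustrative table length
        System.out.println("buckets used without spreading: " + distinctBuckets(n, 16, false));
        System.out.println("buckets used with spreading:    " + distinctBuckets(n, 16, true));
    }
}
```

Small whole-number Float values keep all of their varying bits in the upper half of the 32-bit hash code, so without spreading every key in this range falls into a single bucket. After the XOR, the keys occupy several buckets, although the distribution is still not perfect, which matches the comment's point that this is a cheap mitigation rather than a full-strength mixing function.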
Summary
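This comment shows that HashMap balances speed against quality when improving the distribution of hash values. By XORing the high bits of the hash code into the low bits, HashMap lets the high bits influence the bucket index at the cost of a single shift and a single XOR. This helps most when the hash codes differ mainly in their high bits, as with sets of Float keys holding consecutive whole numbers in small tables. For the majority of hash codes, which are already well distributed, the transform adds only negligible overhead.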