WeakHashMap and HashMap use different hash functions
WeakHashMap
/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions. This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits.
 */
final int hash(Object k) {
    int h = k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
HashMap
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
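To see what the two functions above buy you, here is a minimal sketch. It assumes a made-up set of hash codes, `i << 16` for `i` in `0..15`, which differ only in bits above the mask of a 16-slot table. The class name `HashSpreadDemo` and the helper `distinctBuckets` are hypothetical and exist only for this illustration; the two spreading functions are copied from the snippets quoted above, rewritten to take an `int` hash code directly.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

final class HashSpreadDemo {
    // Supplemental hash from the WeakHashMap snippet above.
    static int weakHashMapHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Spreading step from the HashMap snippet above.
    static int hashMapHash(int h) {
        return h ^ (h >>> 16);
    }

    // Counts the distinct buckets hit by the hypothetical hash codes
    // (i << 16) for i in [0, count), in a table of the given length.
    // tableLength is assumed to be a power of two.
    static int distinctBuckets(int count, int tableLength, IntUnaryOperator spread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < count; i++) {
            int rawHash = i << 16; // differs only in the high bits
            buckets.add(spread.applyAsInt(rawHash) & (tableLength - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("no spreading:     " + distinctBuckets(16, 16, h -> h));
        System.out.println("HashMap hash:     " + distinctBuckets(16, 16, HashSpreadDemo::hashMapHash));
        System.out.println("WeakHashMap hash: " + distinctBuckets(16, 16, HashSpreadDemo::weakHashMapHash));
    }
}
```

Without spreading, all 16 keys land in a single bucket because the low four bits of every hash code are zero. For this particular input, both spreading functions distribute them over all 16 buckets. Other inputs will behave differently, so treat this as an illustration and not as a benchmark.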
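How to design a hash function
It took me a long time to realize that there is no silver bullet: a hash function has to balance dispersion, performance, security, and other concerns. This article gave me a lot of ideas: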
http://ticki.github.io/blog/designing-a-good-non-cryptographic-hash-function/
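The scatter plots in it are especially useful: they show that you can check how well a hash function disperses its outputs by plotting them.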
Why XOR
The hash functions in common map implementations almost always involve shift and XOR operations. Why the preference for XOR?
Assuming uniformly random (1-bit) inputs, the AND function output probability distribution is 75% 0 and 25% 1. Conversely, OR is 25% 0 and 75% 1. The XOR function is 50% 0 and 50% 1, therefore it is good for combining uniform probability distributions. This can be seen by writing out truth tables:
 a | b | a AND b
---+---+---------
 0 | 0 |    0
 0 | 1 |    0
 1 | 0 |    0
 1 | 1 |    1

 a | b | a OR b
---+---+--------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    1

 a | b | a XOR b
---+---+---------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    0
Exercise: How many logical functions of two 1-bit inputs a and b have this uniform output distribution? Why is XOR the most suitable for the purpose stated in your question?
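The distributions quoted above can also be checked mechanically. The following sketch enumerates the four equally likely input pairs and counts how many produce a 1 for each operator. The class `BitOpDistribution` and the method `onesOutOfFour` are names invented for this example.

```java
import java.util.function.IntBinaryOperator;

final class BitOpDistribution {
    // Counts how many of the four equally likely (a, b) input pairs
    // make the given 1-bit operator output 1.
    static int onesOutOfFour(IntBinaryOperator op) {
        int ones = 0;
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                ones += op.applyAsInt(a, b) & 1;
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + onesOutOfFour((a, b) -> a & b) + "/4 ones");
        System.out.println("OR:  " + onesOutOfFour((a, b) -> a | b) + "/4 ones");
        System.out.println("XOR: " + onesOutOfFour((a, b) -> a ^ b) + "/4 ones");
    }
}
```

AND yields a 1 in one case out of four, OR in three, and XOR in two, which matches the 25%, 75%, and 50% figures in the quoted answer.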
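Why the table length is a power of 2
A map is usually backed by an array. Why must the array length be a power of 2? When the length is a power of 2, length - 1 is a binary number whose low bits are all 1s,
so ANDing it with the hash value yields that value's position in the array. For a power-of-2 length this gives the same result as taking the hash modulo the length, but with a single cheap bit operation.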
/**
 * Returns index for hash code h.
 */
private static int indexFor(int h, int length) {
    return h & (length - 1);
}
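The sketch below checks the claim that masking with `length - 1` behaves like a modulo only when the length is a power of 2. `indexFor` is copied from the snippet above; `IndexForDemo` and `matchesFloorMod` are hypothetical names used only here, and `Math.floorMod` serves as the reference because it always returns a non-negative result for a positive length.

```java
final class IndexForDemo {
    // Same masking step as the indexFor method shown above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Returns true if masking agrees with Math.floorMod for every
    // hash value in [-range, range].
    static boolean matchesFloorMod(int length, int range) {
        for (int h = -range; h <= range; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 16 is a power of 2: mask is 0b1111, so every slot is reachable.
        System.out.println("length 16 matches modulo: " + matchesFloorMod(16, 1000));
        // 10 is not: mask is 0b1001, so only slots 0, 1, 8 and 9 are ever used.
        System.out.println("length 10 matches modulo: " + matchesFloorMod(10, 1000));
    }
}
```

With a length of 10 the mask is `1001` in binary, so the two middle bits of the hash are always discarded and most slots can never be selected. That is why the table length is kept at a power of 2.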