How to Design a Hash Function

Article link: https://blog.youkuaiyun.com/qq_29454347/article/details/94555069

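This article looks at the different hash functions used in WeakHashMap and HashMap, explains why hash tables usually have a power-of-two length, and describes the role XOR plays in a hash function. Comparing the two implementations brings out the factors that hash function design has to weigh: uniform distribution, performance, and security.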


WeakHashMap and HashMap use different hash functions.

WeakHashMap

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits.
 */
final int hash(Object k) {
    int h = k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

HashMap

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
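
The Javadoc above mentions Float keys holding consecutive whole numbers as a known collision case. The following is a minimal sketch of that case, not JDK code: the class FloatSpreadDemo and the helper distinctSlots are hypothetical names introduced here, and the table size of 1024 is an arbitrary choice for illustration. It counts how many distinct slots the keys 1.0f to 8.0f land in, with and without the spreading step.

```java
import java.util.HashSet;
import java.util.Set;

final class FloatSpreadDemo {
    // Same transform as the HashMap.hash method quoted above,
    // applied to an already computed hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts the distinct slots used by the keys 1.0f .. (float) count
    // in a table of the given power-of-two size.
    static int distinctSlots(int count, int tableSize, boolean useSpread) {
        Set<Integer> slots = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            // Float.hashCode() returns exactly these bits.
            int h = Float.floatToIntBits((float) i);
            if (useSpread) {
                h = spread(h);
            }
            slots.add(h & (tableSize - 1));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println("without spreading: " + distinctSlots(8, 1024, false));
        System.out.println("with spreading:    " + distinctSlots(8, 1024, true));
    }
}
```

Small whole-number floats have all-zero low bits, so without spreading every key masks to slot 0. The single XOR is a cheap compromise rather than a full mixer: in a 16-slot table these same eight keys still share one slot even after spreading, which matches the tradeoff the Javadoc describes.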

How to design a hash function

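After a long search I concluded that there is no silver bullet: a hash function has to balance dispersion, performance, and security. This article gave me a lot of ideas: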

http://ticki.github.io/blog/designing-a-good-non-cryptographic-hash-function/

Its plots are especially useful. They show that you can check how well a hash function disperses its inputs by plotting the outputs.
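
When a plot is not at hand, a rough numeric stand-in is to count how many keys fall into each bucket. The sketch below is my own simplification, not the linked article's method: BucketHistogram, histogram, and maxLoad are hypothetical names, and the key pattern (i << shift, so keys that differ only above the table mask) is an assumption chosen to stress the function. It compares the identity function with the supplemental hash quoted in the WeakHashMap section.

```java
import java.util.function.IntUnaryOperator;

final class BucketHistogram {
    // The supplemental hash quoted in the WeakHashMap section.
    static int supplemental(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Hashes the keys (i << shift) for i in [0, n) and counts how many
    // land in each slot of a power-of-two sized table.
    static int[] histogram(IntUnaryOperator hash, int n, int shift, int tableSize) {
        int[] counts = new int[tableSize];
        for (int i = 0; i < n; i++) {
            counts[hash.applyAsInt(i << shift) & (tableSize - 1)]++;
        }
        return counts;
    }

    // Largest bucket; n / tableSize would be a perfectly even spread.
    static int maxLoad(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        int identity = maxLoad(histogram(h -> h, 64, 8, 16));
        int mixed = maxLoad(histogram(BucketHistogram::supplemental, 64, 8, 16));
        System.out.println("identity max load:     " + identity);
        System.out.println("supplemental max load: " + mixed);
    }
}
```

A max load close to n / tableSize suggests good dispersion for that key pattern only. It says nothing about other patterns, so it is worth trying several input families before trusting a function.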

Why XOR

The hash functions in common map implementations almost always rely on shifts and XOR. Why is XOR the favorite?

Assuming uniformly random (1-bit) inputs, the AND function output probability distribution is 75% 0 and 25% 1. Conversely, OR is 25% 0 and 75% 1.

The XOR function is 50% 0 and 50% 1, therefore it is good for combining uniform probability distributions.

This can be seen by writing out truth tables:
 a | b | a AND b
---+---+--------
 0 | 0 |    0
 0 | 1 |    0
 1 | 0 |    0
 1 | 1 |    1

 a | b | a OR b
---+---+--------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    1

 a | b | a XOR b
---+---+--------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    0
Exercise: How many logical functions of two 1-bit inputs a and b have this uniform output distribution? Why is XOR the most suitable for the purpose stated in your question?
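
The percentages quoted above can be checked mechanically by walking the four input pairs. This is a small sketch with hypothetical names (BitOpBias, onesFraction). It only covers AND, OR, and XOR and leaves the exercise itself open.

```java
import java.util.function.IntBinaryOperator;

final class BitOpBias {
    // Fraction of the four (a, b) input pairs for which op outputs 1,
    // which is the probability of a 1 under uniform independent inputs.
    static double onesFraction(IntBinaryOperator op) {
        int ones = 0;
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                ones += op.applyAsInt(a, b) & 1;
            }
        }
        return ones / 4.0;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + onesFraction((a, b) -> a & b));
        System.out.println("OR:  " + onesFraction((a, b) -> a | b));
        System.out.println("XOR: " + onesFraction((a, b) -> a ^ b));
    }
}
```

AND pulls the result toward 0 and OR pulls it toward 1, so repeatedly combining bits with either one skews the hash. XOR keeps the balance of its inputs.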

Why the table length is a power of two

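A map is usually backed by an array. Why must the array length be a power of two (not merely a multiple of two)? When the length is a power of two, length - 1 is a binary number whose low bits are all 1s. ANDing it with the hash value yields the entry's position in the array, which gives the same result as taking the hash modulo the length but costs only a single bitwise operation.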

/**
 * Returns index for hash code h.
 */
private static int indexFor(int h, int length) {
    return h & (length-1);
}
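
To see why the mask trick needs a power-of-two length, the sketch below reuses indexFor and compares a length of 16 with a length of 10. IndexMaskDemo, distinctIndices, and matchesFloorMod are hypothetical helper names, and the ranges of test hashes are arbitrary.

```java
import java.util.HashSet;
import java.util.Set;

final class IndexMaskDemo {
    // Same computation as the indexFor method quoted above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Number of distinct slots reachable for hashes 0..999.
    static int distinctIndices(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen.size();
    }

    // True if masking agrees with a true modulo for hashes -1000..1000.
    static boolean matchesFloorMod(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("length 16: slots used = " + distinctIndices(16)
                + ", equals floorMod = " + matchesFloorMod(16));
        System.out.println("length 10: slots used = " + distinctIndices(10)
                + ", equals floorMod = " + matchesFloorMod(10));
    }
}
```

With length 10 the mask is binary 1001, so only slots 0, 1, 8, and 9 can ever be selected and the rest of the array is wasted. With length 16 the mask is 1111, every slot is reachable, and the result matches Math.floorMod even for negative hash values.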