NanguoCoffee: Do you know why the array size inside HashMap must be a power of two?


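Recently I was writing a simple lock-striping (split lock) class:

Requirement: hash each key to a Lock, with every Lock roughly equally likely to be chosen. For example, with 160 keys spread over 16 locks, about 10 keys should map to each lock; only then will concurrency be efficient.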

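package hash; // same package as the test below, which calls the package-private index()

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SplitReentrantLock {

    private Lock[] locks;

    private int LOCK_NUM;

    public SplitReentrantLock(int lockNum) {
        super();
        LOCK_NUM = lockNum;
        locks = new Lock[LOCK_NUM];
        for (int i = 0; i < LOCK_NUM; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    /**
     * Gets the lock for a key, using HashMap's hash algorithm.
     *
     * @param key the key to map to a lock
     * @return the lock the key maps to
     */
    public Lock getLock(String key) {
        int lockIndex = index(key);
        return locks[lockIndex];
    }

    int index(String key) {
        int hash = hash(key.hashCode());
        return hash & (LOCK_NUM - 1);
    }

    int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
}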

Usage:

    public void test1() {
        method(32, 1000);
    }

I had assumed that using HashMap's hash algorithm would be enough to meet the requirement above, but the test results startled me.

Test code:

package hash;

import java.util.Map;
import java.util.TreeMap;

import org.apache.commons.lang3.RandomStringUtils;

import junit.framework.TestCase;

public class SplitReenterLockTest extends TestCase {

    public void method(int lockNum, int testNum) {

        SplitReentrantLock splitLock = new SplitReentrantLock(lockNum);
        // Hit count per lock index, pre-filled with 0 so unused indices are printed too.
        Map<Integer, Integer> map = new TreeMap<Integer, Integer>();
        for (int i = 0; i < lockNum; i++) {
            map.put(i, 0);
        }
        // Map random 128-character keys to lock indices and count the hits.
        for (int i = 0; i < testNum; i++) {
            Integer key = splitLock.index(RandomStringUtils.random(128));
            map.put(key, map.get(key) + 1);
        }

        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }

    public void test1() {
        method(50, 1000);
    }

}

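Result: the hashes of the 1000 random keys map to only 8 Locks, instead of being spread evenly over all 50.

Moreover, they always land on the Locks at array indices 0, 1, 16, 17, 32, 33, 48 and 49. Why?

If the test is changed to: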

    public void test1() {
        method(32, 1000);
    }

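Result: the hashes of the 1000 random keys map to all 32 Locks, and the distribution is roughly even.

Question: why do 50 and 32 give such different hashing results?

Testing again with 2, 4, 8, 16, 64 and 128, the keys are spread roughly evenly over all the Locks in every case.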
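The experiment can be reproduced without JUnit or commons-lang. The following is a minimal sketch, assuming that pseudo-random `int` values are an acceptable stand-in for the `hashCode()` of random strings; the class name `LockIndexCoverage`, the method `indicesHit` and the seed are made up for illustration.

```java
import java.util.Random;
import java.util.TreeSet;

final class LockIndexCoverage {

    // Same supplemental hash as in SplitReentrantLock.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Returns the distinct lock indices hit by `samples` pseudo-random hash codes.
    // Assumption: random ints stand in for String.hashCode() of random keys.
    static TreeSet<Integer> indicesHit(int lockNum, int samples, long seed) {
        Random random = new Random(seed);
        TreeSet<Integer> hit = new TreeSet<>();
        for (int i = 0; i < samples; i++) {
            hit.add(hash(random.nextInt()) & (lockNum - 1));
        }
        return hit;
    }

    public static void main(String[] args) {
        for (int lockNum : new int[] {2, 4, 8, 16, 32, 50, 64, 128}) {
            TreeSet<Integer> hit = indicesHit(lockNum, 1000, 42L);
            System.out.println(lockNum + " locks -> " + hit.size() + " distinct indices");
        }
    }
}
```

For 50 locks the number of distinct indices can never exceed 8, however many keys are sampled, while for the power-of-two sizes every index is reachable.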

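The sizes that give an even distribution are all powers of two. Could the hash algorithm have something to do with binary?

Let's look at the hash algorithm: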

    public int index(String key) {
        int hash = hash(key.hashCode());
        return hash & (LOCK_NUM - 1);
    }

    private int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

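First comes some magical bit manipulation (PS: I don't know why it is computed this way, so in my ignorance I can only call it magical), and finally the result is ANDed with LOCK_NUM - 1.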
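As far as the comments in the older JDK HashMap source explain it, the shifts fold the high bits of the hash code into the low bits, so that hash codes differing only in their high bits do not all collide once the value is masked down to a small index. The sketch below illustrates that effect with two hand-picked hash codes; the class name `HashSpreadDemo` and its helper methods are made up for illustration.

```java
final class HashSpreadDemo {

    // Same supplemental hash as in the post.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Index taken straight from the raw hash code: only the low bits matter.
    static int indexWithoutSpread(int hashCode, int mask) {
        return hashCode & mask;
    }

    // Index taken after the shifts have mixed high bits into the low bits.
    static int indexWithSpread(int hashCode, int mask) {
        return hash(hashCode) & mask;
    }

    public static void main(String[] args) {
        int mask = 16 - 1; // 16 slots
        int a = 0x10000;   // differs from b only above bit 15
        int b = 0x20000;
        System.out.println(indexWithoutSpread(a, mask) + " " + indexWithoutSpread(b, mask)); // 0 0
        System.out.println(indexWithSpread(a, mask) + " " + indexWithSpread(b, mask));       // 1 2
    }
}
```

Without the shifts both hash codes fall into slot 0; with them they land in slots 1 and 2.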

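The crux of this post lies in that AND operation: whether the results are evenly distributed depends on which bits of LOCK_NUM - 1 are 1. If every bit of LOCK_NUM - 1 is 1 (which is exactly the case when LOCK_NUM is a power of two), every index from 0 to LOCK_NUM - 1 is reachable and the results are spread evenly over them. Otherwise the results can only land on the few indices whose 1 bits all appear in LOCK_NUM - 1.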
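To make this concrete, the sketch below lists every index that `h & (LOCK_NUM - 1)` can possibly produce for a given lock count, by keeping only the values whose 1 bits all lie inside the mask. The class name `ReachableIndices` and the method `reachable` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ReachableIndices {

    // Returns all values that (h & (lockNum - 1)) can take for some int h.
    static List<Integer> reachable(int lockNum) {
        int mask = lockNum - 1;
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= mask; i++) {
            // i is reachable only if it uses no bit that the mask clears.
            if ((i & mask) == i) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reachable(50));        // [0, 1, 16, 17, 32, 33, 48, 49]
        System.out.println(reachable(32).size()); // 32
    }
}
```

The number of reachable indices is 2 raised to the number of 1 bits in the mask: 49 has three 1 bits, giving 8 indices, while 31 has five, giving all 32.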

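Below, 50 and 32 are used to illustrate this.

Suppose hashing a key yields the hash value h.

For example, here are the binary forms of some of the h values from my test data: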

1100000010000110110101010001001
10111100001001110111000100010001
11111011111010101010000111001001
11001010011000100110110111011111
10001010100010111101011010011110

50 in binary: 110010. After subtracting 1: 110001

32 in binary: 100000. After subtracting 1: 11111

Therefore, ANDing h with 49 (i.e. 110001) can only yield one of:

000000 : 0
000001 : 1
010000 : 16
010001 : 17
100000 : 32
100001 : 33
110000 : 48
110001 : 49
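As a quick check, the sample h values listed earlier can be ANDed with both masks. This is a small sketch; the class name `SampleMaskCheck` and the method `maskAll` are made up for illustration, and `Integer.parseUnsignedInt` (Java 8+) is used because some of the samples are full 32-bit patterns with the top bit set.

```java
final class SampleMaskCheck {

    // The sample h values from the post, as binary strings.
    static final String[] SAMPLES = {
        "1100000010000110110101010001001",
        "10111100001001110111000100010001",
        "11111011111010101010000111001001",
        "11001010011000100110110111011111",
        "10001010100010111101011010011110",
    };

    // ANDs every sample with the given mask and returns the resulting indices.
    static int[] maskAll(int mask) {
        int[] out = new int[SAMPLES.length];
        for (int i = 0; i < SAMPLES.length; i++) {
            int h = Integer.parseUnsignedInt(SAMPLES[i], 2);
            out[i] = h & mask;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(maskAll(49))); // [1, 17, 1, 17, 16]
        System.out.println(java.util.Arrays.toString(maskAll(31))); // [9, 17, 9, 31, 30]
    }
}
```

With mask 49 the five samples collapse onto just three of the eight possible indices, whereas mask 31 keeps all five low bits and so distinguishes more of them.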

Whereas ANDing h with 31 (i.e. 11111) yields: