HashMap工作原理分析

最新推荐文章于 2025-04-18 14:39:21 发布

mxiangjian

最新推荐文章于 2025-04-18 14:39:21 发布

阅读量225

点赞数 1

CC 4.0 BY-SA版权

分类专栏： Java语言基础

本文链接：https://blog.youkuaiyun.com/mxiangjian/article/details/52082787

Java语言基础专栏收录该内容

4 篇文章

订阅专栏

本文深入剖析HashMap的内部机制，包括初始化、put/get方法实现、hash算法及resize过程。

本文基于JDK1.8源码进行分析,阅读本文你可了解：

1、hashmap是什么，它有什么特点

2、hashmap的工作原理

3、初始化,put、get、resize方法是怎样实现的

4、hash算法是怎么实现的

官方对hashmap的描述如下：

1、概述

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it isunsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

几个关键的信息：基于Map接口实现、允许null键/值、非同步、不保证有序(比如插入的顺序)、也不保证序不随时间变化。

2、初始化

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

以上是hashmap最重要的初始化方法，该方法有两个重要的参数容量(initialCapatiicy) 和负载因子(loadFactor)

容量是指在初始化hashmap的时候，bucket的大小，负载因子是对hashmap中bucket填满程度的一个比例,当hashmap的size大于bucket*loadFactor的

时候，hashmap会自动扩容。

3、put函数的实现

对key值进行hash,计算index

若该index上的bucket无值，直接存放

若该index上存在值oldValue,则存放在以该oldValue为头结点的链表下

如该链表长度过长则将该链表结构改造后红黑树结构

如果hashmap的填满程度超过了负载因子,进行resize

具体代码如下：

public V put(K key, V value) {
    // 对key的hashCode()做hash
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // tab为空则创建
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 计算index，并对null做处理
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 节点存在
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 该链为树
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 该链为链表
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 写入
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 超过load factor*current capacity，resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

4、get函数的实现

对key值进行hash，计算index

若该index位置上是该值，直接命中返回

若出现冲突：

如是链表，则在链表中通过k.equals(key)进行查找 O(N)

如是红黑树，则在红黑树中通过k.equals(key)进行查找 O(logn)

函数如下：

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // 直接命中
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 未命中
        if ((e = first.next) != null) {
            // 在树中get
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 在链表中get
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

5、hash函数的实现

在源码中涉及到hash计算的共有两个地方：

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

该函数时计算参数key的hash值，与获取普通对象hash值不同的地方在于，该函数将key的原hash值右移16位之后再与其本身做异或操作之后得到最终的hash值，这叫做扰动函数，关于这么做的目的待会解释，现在我们先来看下源码中是怎么通过hashcode得到最终index的

if ((p = tab[i = (n - 1) & hash]) == null)

以上 (n - 1) & hash]即是计算索引的方法，利用位运算代替模运算提高了效率，将hashmap的长度减1再与hashcode做相与操作，这也解释了为什么hashmap的长度一定要是2的幂次放，因为如果是这样的话n-1在二进制的表示就是全1，正好相当于一个地位掩码，它与hashcode做相与操作的结果就是将hashcode的高位全部清零只保留地位，用来当做索引，例如以N=16为例 n-1等于15 二进制表示为00000000 00001111，与某hash值做与操作的结果就是截取了低4位

10101010 10101010

&00000000 00001111

--------------------------------------

1010

但这时候问题就来了，hashcode就算在散列，只取后4位的话，冲突的可能性还是很大，所以这就体现出扰动函数的作用了，我们来看下图

右移16正好是32位的一半,自己的高半区和低半区做异或操作，就是为了混合原hash码的高位和地位，依次来加大低位的随机性，混合后的低位掺杂了高位的部分特征，这也变相保留了高位的特征。

以上就是hash的整个过程。

6、resize方法

当put时，如果发现目前的bucket占用程度已经超过了Load Factor所希望的比例，那么就会发生resize。在resize的过程，简单的

说就是把bucket扩充为2倍，之后重新计算index，把节点再放到新的bucket中。resize的注释是这样描述的：

Initializes or doubles table size. If null, allocates in accord with initial capacity target held in field threshold. Otherwise,

because we are using power-of-two expansion, the elements from each bin must either stay at same index,

or move with a power of two offset in the new table.

大致意思就是说，当超过限制的时候会resize，然而又因为我们使用的是2次幂的扩展(指长度扩为原来2倍)，所以，元素的位置

要么是在原位置，要么是在原位置再移动2次幂的位置。

下面是代码的具体实现：

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // 超过最大值就不再扩充了，就只好随你碰撞去吧
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // 没超过最大值，就扩充为原来的2倍
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // 计算新的resize上限
    if (newThr == 0) {

        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // 把每个bucket都移动到新的buckets中
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // 原索引
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // 原索引+oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // 原索引放到bucket里
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // 原索引+oldCap放到bucket里
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

需要注意的一点是，处于链表中的值，在经过resize后可能还在原位置，可能会在原来位置2的的bucket中，也可能处理该2倍

bucket位置的链表中。

当扩充的容量size大于hashmap最大容量时，此时不能简单地将容量加倍，阈值加倍，只能将阈值调到最高（源码中是int类型的

最大值），来存储。因为size是int类型的故阈值是不会超过int类型最大值的。